vachoppy.fingerprint

Provides tools for calculating and analyzing atomic environment fingerprints, which are based on the partial pair correlation function, g(r).

This module characterizes crystal structures and tracks their evolution over time. A “fingerprint” is a quantitative descriptor of the local atomic environment. The module supports two primary workflows:

  1. Static Analysis: Calculating the fingerprint for a single, static crystal structure to characterize its atomic arrangement.

  2. Dynamic Analysis: Tracking how a system’s fingerprint changes over the course of a molecular dynamics trajectory by comparing it to a reference structure.

Main Components

  • FingerPrint: The core class for calculating the g(r) fingerprint for a single pair of atom types in a given crystal structure.

  • get_fingerprint: A convenience function that uses FingerPrint to calculate and concatenate fingerprints for multiple atom pairs.

  • cosine_distance: A utility function to compute a scaled distance metric (from 0 to 1) between two fingerprint vectors.

  • plot_cosine_distance: A high-level analysis function that takes an MD trajectory, generates snapshots, and plots the cosine distance of each snapshot’s fingerprint to a reference, revealing structural changes over time.

Typical Usage

1. Static Fingerprint Calculation:

from vachoppy.fingerprint import get_fingerprint

# Define atom pairs to analyze
pairs = [('Hf', 'Hf'), ('Hf', 'O'), ('O', 'O')]

# Calculate and save the combined fingerprint for a structure
get_fingerprint(
    path_structure='path/to/hfo2.cif',
    filename='hfo2_fingerprint.txt',
    atom_pairs=pairs,
    disp=True  # Also display a plot
)

2. Dynamic Trajectory Analysis:

from vachoppy.fingerprint import plot_cosine_distance

# Trace the cosine distance of a trajectory relative to a reference
plot_cosine_distance(
    path_traj='path/to/trajectory.h5',
    t_interval=0.1,  # Generate a snapshot every 0.1 ps
    reference_structure='path/to/initial_structure.cif',
    # Atom pairs will be auto-generated if not specified
)
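When atom_pairs is omitted, the documented behavior is to generate all unique pair combinations. Below is a minimal standard-library sketch of one way to enumerate them. The helper name is hypothetical. Keeping like pairs such as ('O', 'O') and ordering elements by first appearance are assumptions about how the library builds the list.

```python
from itertools import combinations_with_replacement


def unique_pairs_sketch(symbols):
    """Hypothetical helper: every unordered element pair, including like pairs."""
    # Unique chemical symbols, kept in order of first appearance (an assumption).
    elements = list(dict.fromkeys(symbols))
    # (A, A), (A, B), (B, B), ... with no duplicate (B, A) entries.
    return list(combinations_with_replacement(elements, 2))


print(unique_pairs_sketch(["Hf", "Hf", "O", "O", "O", "O"]))
```

For HfO2, this reproduces the three pairs used in the static example.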

class vachoppy.fingerprint.FingerPrint(A: str, B: str, path_structure: str, Rmax: float = 10.0, delta: float = 0.08, sigma: float = 0.03, dirac: str = 'g', verbose: bool = True)

Calculates the atomic environment fingerprint between two atom types.

This class computes the partial pair correlation function, g(r), between two specified atom types (A and B) within a crystal structure. It serves as a “fingerprint” to characterize the local atomic environment. The calculation is vectorized using NumPy for performance.

The primary workflow is to initialize the class, which automatically triggers the calculation. Results can then be visualized using .plot_fingerprint().
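For reference, the g(r) - 1 described here appears to follow the standard Oganov-Valle fingerprint definition. This is an assumption: this section does not state the exact normalization FingerPrint uses. In that form, the fingerprint can be written as:

```latex
F_{AB}(R) = \sum_{i \in A} \sum_{j \in B}
  \frac{\tilde{\delta}(R - R_{ij})}{4\pi R_{ij}^{2} \, \left( N_A N_B / V \right) \Delta} - 1
```

The symbols in this expression are as follows:

- i runs over the A atoms in the cell.
- j runs over the B atoms, including periodic images, within Rmax of atom i.
- R_ij is the distance between atoms i and j.
- N_A and N_B are the atom counts in the cell, and V is the cell volume.
- Δ is the bin width (delta).
- δ̃ is the broadened delta function (a Gaussian of width σ for dirac='g') integrated over the bin centered at R.

For uncorrelated atoms F approaches 0. At distances shorter than any A-B contact, F equals -1.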

Parameters:
  • A (str) – Chemical symbol of the central atom type.

  • B (str) – Chemical symbol of the neighboring atom type.

  • path_structure (str) – Path to the crystallographic structure file (e.g., POSCAR, cif).

  • Rmax (float, optional) – Cutoff radius in Angstroms for the calculation. Defaults to 10.0.

  • delta (float, optional) – Discretization step (bin size), in Angstroms, for the distance axis (r). Defaults to 0.08.

  • sigma (float, optional) – Gaussian broadening width, in Angstroms, applied to interatomic distances. Defaults to 0.03.

  • dirac (str, optional) – Type of Dirac delta function approximation: ‘g’ for Gaussian or ‘s’ for a square function. Defaults to ‘g’.

  • verbose (bool, optional) – Verbosity flag. Defaults to True.
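The two dirac options can be pictured as two normalized approximations to δ(r - r0). The function names below are hypothetical. The width of the square function is passed in explicitly because this section does not say which width the 's' option uses; treat that as an assumption.

```python
import numpy as np


def dirac_gaussian(r, r0, sigma):
    """Normalized Gaussian centered at r0 with standard deviation sigma (the 'g' idea)."""
    return np.exp(-0.5 * ((r - r0) / sigma) ** 2) / (np.sqrt(2.0 * np.pi) * sigma)


def dirac_square(r, r0, width):
    """Normalized box of the given width centered at r0 (the 's' idea; width is assumed)."""
    return np.where(np.abs(r - r0) < 0.5 * width, 1.0 / width, 0.0)


r = np.arange(0.0, 5.0, 0.001)
print(np.sum(dirac_gaussian(r, 2.0, 0.03)) * 0.001)  # Riemann sum of the area
print(np.sum(dirac_square(r, 2.0, 0.08)) * 0.001)
```

Both shapes integrate to about 1, so a single neighbor contributes the same total weight either way. The Gaussian spreads that weight smoothly, while the square function places it in a sharp-edged step.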

Attributes:
  • fingerprint (numpy.ndarray) – The calculated fingerprint, g(r) - 1, as a 1D NumPy array.

  • R (numpy.ndarray) – The distance values (r), in Angstroms, at which the fingerprint is calculated.

  • num_A (int) – The number of atoms of type A found in the structure.

  • num_B (int) – The number of atoms of type B found in the structure.

Raises:
  • FileNotFoundError – If the specified structure file does not exist.

  • IOError – If the structure file cannot be read by ASE.

  • ValueError – If atom types A or B are not found in the structure, or if an invalid dirac type is specified.

Examples

>>> from vachoppy.fingerprint import FingerPrint
>>> fp = FingerPrint(
...     A='Ti',
...     B='O',
...     path_structure='POSCAR',
...     Rmax=10.0,
...     verbose=True
... )
>>> fp.plot_fingerprint()

calculate() → None

Runs the main fingerprint calculation.

This method computes the extended coordinates for neighbor atoms to handle periodic boundaries, then iterates through each central atom to calculate its partial fingerprint, and finally averages the results. The final g(r) - 1 is stored in the self.fingerprint attribute.
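The three steps listed above can be sketched in plain NumPy: extend the neighbor coordinates, compute one partial fingerprint per central atom, then average. This is a minimal illustration under stated assumptions, not the FingerPrint implementation:

- The helper name pair_fingerprint_sketch is hypothetical.
- The structure is passed as a lattice matrix plus fractional coordinates, not read from a file with ASE.
- Each distance is broadened with a normalized Gaussian (the dirac='g' case) evaluated on the r grid, not integrated over bins.

```python
import itertools

import numpy as np


def pair_fingerprint_sketch(cell, frac, symbols, A, B, rmax=10.0, delta=0.08, sigma=0.03):
    """Hypothetical helper: Gaussian-broadened g_AB(r) - 1 for one periodic cell.

    cell    -- (3, 3) array whose rows are the lattice vectors (Angstrom)
    frac    -- (N, 3) fractional coordinates
    symbols -- N chemical symbols
    """
    cell = np.asarray(cell, dtype=float)
    pos = np.asarray(frac, dtype=float) @ cell  # Cartesian coordinates
    symbols = np.asarray(symbols)
    pos_A, pos_B = pos[symbols == A], pos[symbols == B]
    if len(pos_A) == 0 or len(pos_B) == 0:
        raise ValueError(f"atom type {A!r} or {B!r} not found in the structure")
    rho_B = len(pos_B) / abs(np.linalg.det(cell))  # number density of B atoms

    # Extended coordinates: the spacing between lattice planes is 1/|b_j|, where
    # the b_j are the columns of inv(cell); one extra image covers any in-cell offset.
    reps = np.ceil(rmax * np.linalg.norm(np.linalg.inv(cell), axis=0)).astype(int) + 1
    shifts = np.array(list(itertools.product(*(range(-n, n + 1) for n in reps))))
    images_B = (pos_B[None, :, :] + (shifts @ cell)[:, None, :]).reshape(-1, 3)

    r = np.arange(0.0, rmax, delta)
    partials = []
    for a in pos_A:  # one central A atom at a time
        d = np.linalg.norm(images_B - a, axis=1)
        d = d[(d > 1e-8) & (d < rmax + 5.0 * sigma)]  # drop the self-pair and far pairs
        # Normalized Gaussian per neighbor, divided by the ideal-gas shell 4*pi*R_ij^2*rho_B.
        gauss = np.exp(-0.5 * ((r[:, None] - d[None, :]) / sigma) ** 2)
        gauss /= np.sqrt(2.0 * np.pi) * sigma
        partials.append((gauss / (4.0 * np.pi * d**2)).sum(axis=1) / rho_B)
    return r, np.mean(partials, axis=0) - 1.0  # average over central atoms, then g - 1
```

For a simple cubic lattice with a = 1 Å, integrating g(r)·4πr²ρ over the first peak should give a coordination number close to 6. This is a quick sanity check of the normalization.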

plot_fingerprint(title: str | None = None, disp: bool = True, save: bool = True, filename: str | None = None, dpi: int = 300) → None

Plots the calculated fingerprint, g(r) - 1.

This method generates a 2D plot of the atomic fingerprint, showing g(r) - 1 as a function of distance (r). The plot can be displayed interactively and optionally saved to a file.

Parameters:
  • title (str | None, optional) – A custom title for the plot. If None, no title is set. Defaults to None.

  • disp (bool, optional) – If True, displays the plot interactively (plt.show()). Defaults to True.

  • save (bool, optional) – If True, saves the plot to a file. Defaults to True.

  • filename (str | None, optional) – The filename for the saved plot. If None, a default filename is automatically generated (e.g., ‘FP_A-B.png’). Defaults to None.

  • dpi (int, optional) – The resolution (dots per inch) for the saved figure. Defaults to 300.

Returns:

This method does not return any value.

Return type:

None

Raises:

RuntimeError – If the fingerprint has not been calculated. Call the .calculate() method first.

Examples

>>> from vachoppy.fingerprint import FingerPrint
>>> fp = FingerPrint(A='Ti', B='O', path_structure='POSCAR')
>>> fp.calculate()
>>> # Display the plot and save it with a default name
>>> fp.plot_fingerprint()
>>> # Save the plot with a custom name without displaying it
>>> fp.plot_fingerprint(
...     title="Ti-O Fingerprint",
...     disp=False,
...     save=True,
...     filename="tio2_fp.png"
... )

summary()

Prints a summary of the fingerprint analysis settings.

vachoppy.fingerprint.cosine_distance(fp1: ndarray, fp2: ndarray) → float

Calculates a scaled cosine distance between two fingerprint vectors.

This function computes the cosine similarity between two vectors and transforms it into a distance metric scaled to the range [0, 1]. A distance of 0 indicates vectors pointing in the same direction (for example, identical vectors), 0.5 indicates orthogonal vectors, and 1 indicates opposite vectors.

Note

The formula used is 0.5 * (1 - cos_similarity), where cos_similarity is the dot product of the unit vectors.
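Following the formula in the note, a minimal NumPy version might look like the sketch below. The name cosine_distance_sketch is hypothetical. The clipping step is an added safeguard against round-off, not something this documentation says the library does.

```python
import numpy as np


def cosine_distance_sketch(fp1, fp2):
    """Hypothetical helper: 0.5 * (1 - cos_similarity), scaled to [0, 1]."""
    fp1 = np.asarray(fp1, dtype=float)
    fp2 = np.asarray(fp2, dtype=float)
    if fp1.shape != fp2.shape:
        raise ValueError(f"shape mismatch: {fp1.shape} vs {fp2.shape}")
    # Dot product of the unit vectors is the cosine similarity.
    cos_sim = np.dot(fp1 / np.linalg.norm(fp1), fp2 / np.linalg.norm(fp2))
    # Clip so round-off cannot push the result slightly outside [0, 1].
    return float(0.5 * (1.0 - np.clip(cos_sim, -1.0, 1.0)))
```

Because only the direction of each vector matters, two fingerprints that differ by an overall scale factor have a distance of 0.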

Parameters:
  • fp1 (np.ndarray) – The first fingerprint vector (1D NumPy array).

  • fp2 (np.ndarray) – The second fingerprint vector (1D NumPy array).

Returns:

The scaled cosine distance, a value between 0.0 and 1.0.

Return type:

float

Raises:

ValueError – If the input arrays have mismatched shapes, which would prevent the dot product calculation.

Examples

>>> import numpy as np
>>> from vachoppy.fingerprint import cosine_distance
>>> vec1 = np.array([1.0, 0.0, 0.0])
>>> vec2 = np.array([0.0, 1.0, 0.0])
>>> # Identical vectors
>>> cosine_distance(vec1, vec1)
0.0
>>> # Orthogonal vectors
>>> cosine_distance(vec1, vec2)
0.5
>>> # Opposite vectors
>>> cosine_distance(vec1, -vec1)
1.0

vachoppy.fingerprint.get_fingerprint(path_structure: str, filename: str | None, atom_pairs: List[Tuple[str, str]], Rmax: float = 10.0, delta: float = 0.08, sigma: float = 0.03, dirac: str = 'g', disp: bool = True, verbose: bool = True) → ndarray

Calculates, concatenates, and saves fingerprints for multiple atom pairs.

This function serves as a convenient wrapper around the FingerPrint class. It iterates through a list of specified atom pairs (A, B), calculates the fingerprint for each, and concatenates them into a single 1D array.

When filename is not None, the result is saved to a two-column text file: the first column is a composite distance axis and the second is the concatenated fingerprint data. If disp is True, a plot shows all fingerprints one after another.
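The concatenate-and-save step can be sketched as follows, assuming each per-pair fingerprint is sampled on the same grid np.arange(0, Rmax, delta). This section does not document how the library builds its composite distance axis. This hypothetical helper simply offsets the k-th pair's r values by k * Rmax, so the pairs sit side by side on one increasing axis.

```python
import numpy as np


def concat_fingerprints_sketch(fps, rmax, delta, filename=None):
    """Hypothetical helper: join per-pair fingerprints and optionally save them.

    fps -- list of 1D arrays, one per atom pair, each sampled on np.arange(0, rmax, delta)
    """
    r = np.arange(0.0, rmax, delta)
    for fp in fps:
        if len(fp) != len(r):
            raise ValueError("every fingerprint must be sampled on the same r grid")
    combined = np.concatenate(fps)
    # Assumed composite axis: the k-th pair's r values are offset by k * rmax,
    # so the pairs sit side by side on one monotonically increasing axis.
    axis = np.concatenate([r + k * rmax for k in range(len(fps))])
    if filename is not None:
        # Two columns: composite distance axis, concatenated fingerprint.
        np.savetxt(filename, np.column_stack([axis, combined]))
    return axis, combined
```

Passing filename=None skips the file, which mirrors the documented behavior of get_fingerprint.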

Parameters:
  • path_structure (str) – Path to the crystallographic structure file (e.g., POSCAR, cif).

  • filename (str | None) – The name of the output file to save the concatenated fingerprint data. If None, no output file is written.

  • atom_pairs (List[Tuple[str, str]]) – A list of tuples, each containing the chemical symbols for an atom pair, e.g., [(‘Ti’, ‘O’), (‘O’, ‘O’)].

  • Rmax (float, optional) – Cutoff radius in Angstroms for the calculation. Defaults to 10.0.

  • delta (float, optional) – Discretization step (bin size), in Angstroms, for the distance axis (r). Defaults to 0.08.

  • sigma (float, optional) – Gaussian broadening width, in Angstroms, for interatomic distances. Defaults to 0.03.

  • dirac (str, optional) – Type of Dirac delta function approximation: ‘g’ for Gaussian or ‘s’ for a square function. Defaults to ‘g’.

  • disp (bool, optional) – If True, displays a plot of the concatenated fingerprints. Defaults to True.

  • verbose (bool, optional) – Verbosity flag. Defaults to True.

Returns:

A single 1D NumPy array containing the concatenated fingerprints of all specified atom pairs.

Return type:

np.ndarray

Raises:
  • FileNotFoundError – If the path_structure file does not exist.

  • ValueError – If an atom type specified in atom_pairs is not found in the structure file.

Examples

>>> from vachoppy.fingerprint import get_fingerprint
>>> pairs = [('Ti', 'Ti'), ('Ti', 'O'), ('O', 'O')]
>>> combined_fp = get_fingerprint(
...     path_structure='POSCAR',
...     filename='tio2_full_fp.txt',
...     atom_pairs=pairs,
...     Rmax=10.0,
...     disp=True
... )

vachoppy.fingerprint.plot_cosine_distance(path_traj: str | list[str], t_interval: float, reference_structure: str, atom_pairs: list[tuple[str, str]] | None = None, Rmax: float = 10.0, delta: float = 0.08, sigma: float = 0.03, dirac: str = 'g', prefix: str = 'cosine_distance_trace', dpi: int = 300, path_dir: str = 'fingerprint_trace', n_jobs: int = -1, find_fluctuations: bool = True, window_size: int = 50, threshold_std: float | None = None, disp: bool = True, verbose: bool = True) → None

Traces structural evolution by plotting fingerprint cosine distance over time.

This function provides a comprehensive workflow to analyze how a system’s atomic structure deviates from a reference state over a trajectory. It generates structural snapshots, calculates the fingerprint for each in parallel, and computes the cosine distance to a reference fingerprint. Optionally, it can analyze the resulting time-series data to detect significant fluctuations. The final data is saved to a text file and plotted.
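Once every snapshot has a fingerprint, the distance-to-reference step reduces to a batch of scaled cosine distances. The sketch below vectorizes it over snapshots. The following are assumptions:

- the helper name
- the (n_snapshots, n_features) input layout
- the convention that the first snapshot sits at t = 0

Snapshot generation from the HDF5 trajectory and the parallel fingerprinting controlled by n_jobs are not shown.

```python
import numpy as np


def cosine_distance_trace_sketch(ref_fp, snapshot_fps, t_interval):
    """Hypothetical helper: scaled cosine distance of each snapshot to a reference."""
    ref = np.asarray(ref_fp, dtype=float)
    snaps = np.atleast_2d(np.asarray(snapshot_fps, dtype=float))  # (n_snapshots, n_features)
    if snaps.shape[1] != ref.shape[0]:
        raise ValueError("snapshot and reference fingerprints differ in length")
    ref_unit = ref / np.linalg.norm(ref)
    units = snaps / np.linalg.norm(snaps, axis=1, keepdims=True)
    cos_sim = np.clip(units @ ref_unit, -1.0, 1.0)  # one dot product per snapshot
    distances = 0.5 * (1.0 - cos_sim)
    times = t_interval * np.arange(len(snaps))  # assumed: first snapshot at t = 0
    return times, distances
```

A single matrix-vector product replaces a Python loop over snapshots, which helps when a long trajectory yields thousands of snapshots.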

Parameters:
  • path_traj (str | list[str]) – Path to a single HDF5 trajectory file or a list of such paths.

  • t_interval (float) – The time interval in picoseconds (ps) for generating snapshots.

  • reference_structure (str) – Path to the reference structure file (e.g., POSCAR of the initial phase).

  • atom_pairs (list[tuple[str, str]] | None, optional) – A list of atom pairs to include in the fingerprint calculation. If None, all unique pair combinations of the elements in reference_structure are generated automatically. Defaults to None.

  • Rmax (float, optional) – Cutoff radius (Å) for the fingerprint calculation. Defaults to 10.0.

  • delta (float, optional) – Discretization step (Å) for the fingerprint. Defaults to 0.08.

  • sigma (float, optional) – Gaussian broadening width (Å) for the fingerprint. Defaults to 0.03.

  • dirac (str, optional) – Dirac function type (‘g’ for Gaussian or ‘s’ for square). Defaults to ‘g’.

  • prefix (str, optional) – A prefix for the output plot and data files. Defaults to ‘cosine_distance_trace’.

  • dpi (int, optional) – Resolution in dots per inch for the saved plot. Defaults to 300.

  • path_dir (str, optional) – Directory to save the final output files. Defaults to ‘fingerprint_trace’.

  • n_jobs (int, optional) – Number of CPU cores for parallel processing. -1 uses all available cores. Defaults to -1.

  • find_fluctuations (bool, optional) – If True, analyzes the data for significant deviations from the mean. Defaults to True.

  • window_size (int, optional) – The window size (number of data points) for the moving average filter. Defaults to 50.

  • threshold_std (float | None, optional) – The threshold in standard deviations (σ) from the global mean to define a fluctuation. If None, fluctuation intervals are not detected. Defaults to None.

  • disp (bool, optional) – If True, displays a plot of the cosine distance vs. time. Defaults to True.

  • verbose (bool, optional) – Verbosity flag. Defaults to True.
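The interplay of window_size and threshold_std can be illustrated with a hypothetical detector. It smooths the trace with a moving average, then flags points whose smoothed value lies more than threshold_std standard deviations from the global mean of the raw trace. This is one plausible reading of the parameter descriptions, not the library's confirmed algorithm. The helper name and the (start, stop) interval output are assumptions.

```python
import numpy as np


def find_fluctuations_sketch(values, window_size=50, threshold_std=2.5):
    """Hypothetical helper: moving average plus a global-mean +/- k*sigma test.

    Returns the smoothed series and a list of half-open (start, stop) index intervals
    where the smoothed value lies more than threshold_std standard deviations
    from the global mean of the raw series.
    """
    values = np.asarray(values, dtype=float)
    window = np.ones(min(window_size, len(values)))
    # Normalized convolution: divide by the number of points actually inside the
    # window so the average is not biased toward zero at the ends of the series.
    smoothed = np.convolve(values, window, mode="same") / np.convolve(
        np.ones_like(values), window, mode="same"
    )
    mean, std = values.mean(), values.std()
    flagged = np.abs(smoothed - mean) > threshold_std * std
    # Rising/falling edges of the boolean mask mark interval boundaries.
    edges = np.diff(np.concatenate(([0], flagged.astype(int), [0])))
    starts = np.flatnonzero(edges == 1)
    stops = np.flatnonzero(edges == -1)
    return smoothed, list(zip(starts.tolist(), stops.tolist()))
```

A larger window suppresses short spikes but blurs the edges of genuine transitions. A larger threshold_std reports only the strongest departures from the mean.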

Returns:

This function does not return any value. It saves a plot (.png) and a data file (.txt) to the specified path_dir.

Return type:

None

Raises:
  • IOError – If the reference_structure cannot be read to auto-generate atom pairs.

  • FileNotFoundError – If any of the input trajectory files are not found.

Examples

>>> from vachoppy.fingerprint import plot_cosine_distance
>>> plot_cosine_distance(
...     path_traj='path/to/trajectory.h5',
...     t_interval=0.1,
...     reference_structure='path/to/initial_POSCAR',
...     atom_pairs=[('Ti', 'O'), ('O', 'O')],
...     threshold_std=2.5
... )