vachoppy.core

Provides the primary user-facing classes and functions for setting up and running a complete vacancy diffusion analysis with VacHopPy.

Core Components

parse_md / parse_lammps: Functions to convert raw MD trajectories from common simulation packages (e.g., VASP, LAMMPS) into the standardized HDF5 format required for analysis.
Site: A fundamental class for analyzing a crystal structure to identify symmetrically inequivalent sites and potential hopping paths. This is the first object to create in an analysis workflow.
Calculator: A unified function that simplifies the setup of an analysis. It accepts either a single trajectory file or a directory of trajectory files and returns a configured CalculatorEnsemble instance, ready for computation.

Typical Workflow

A standard analysis involves two main stages: data preparation and calculation.

1. Data Preparation (CLI)

First, convert your raw MD trajectory into the required HDF5 format. This is typically done once using the command-line interface:

vachoppy convert path/to/vasprun.xml 2000 1.0 --label 2000K

2. Analysis Workflow (Python)

Once the HDF5 files are ready, the analysis is performed in Python:

from vachoppy.core import Site, Calculator

# a. Define sites and paths from the crystal structure
site_info = Site(path_structure="path/to/POSCAR", symbol="O")

# b. Set up the analysis for the HDF5 data
#    (This handles both single files and directories automatically)
calc = Calculator(
    path_traj="path/to/hdf5_files/",
    site=site_info,
    t_interval=0.1  # Coarse-graining time in ps
)

# c. Run the full analysis pipeline in parallel
calc.calculate()

# d. View results and generate plots
calc.summary()
calc.plot_D()

vachoppy.core.Calculator(path_traj: str, site: Site, *, t_interval: float | None = None, **kwargs) → CalculatorEnsemble

Initializes and configures a CalculatorEnsemble for analysis.

This function serves as the primary user entry point for setting up a calculation. It accepts either a single HDF5 trajectory file or a directory containing multiple files, using the TrajectoryBundle class to handle both cases.

If t_interval is not provided, this function will automatically estimate an optimal value based on the mean vibration frequency of the atoms in a representative trajectory.
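As a rough illustration of how a time interval could be derived from a vibration frequency, the sketch below takes one mean vibration period as the interval. The function name `estimate_t_interval`, the THz input, and the one-period rule are assumptions made for illustration only; the estimator used by VacHopPy may follow a different rule.

```python
def estimate_t_interval(frequencies_thz):
    """Return one mean vibration period in ps.

    Hypothetical rule for illustration; not taken from VacHopPy's source.
    """
    if not frequencies_thz:
        raise ValueError("no frequencies supplied")
    mean_freq = sum(frequencies_thz) / len(frequencies_thz)
    if mean_freq == 0:
        # Mirrors the documented ValueError for a zero mean frequency.
        raise ValueError("mean frequency is zero; cannot estimate t_interval")
    # 1 THz = 1 / ps, so the reciprocal of a THz frequency is a period in ps.
    return 1.0 / mean_freq
```

Under this assumed rule, a mean frequency of 10 THz corresponds to an interval of 0.1 ps, the same order of magnitude as the value used in the workflow example.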

Parameters:

path_traj (str) – Path to a single HDF5 file or a root directory to search for files.
site (Site) – An initialized Site object containing lattice and hopping path data.
t_interval (float | None, optional) – The time interval in picoseconds (ps) for analysis. If None, the interval is automatically estimated. Defaults to None.
**kwargs – Additional keyword arguments passed to underlying classes. Accepted arguments include:
    - prefix (str, optional): File prefix for directory scans. Defaults to “TRAJ”.
    - depth (int, optional): Directory search depth. Defaults to 2.
    - sampling_size (int, optional): Frames for t_interval estimation. Defaults to 5000.
    - use_incomplete_encounter (bool, optional): Flag for Encounter analysis. Defaults to True.
    - eps (float, optional): Tolerance for float comparisons. Defaults to 1.0e-3.
    - verbose (bool, optional): Verbosity flag. Defaults to True.
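To make the roles of prefix and depth concrete, here is a minimal sketch of a prefix- and depth-limited file search. The helper `find_trajectories`, the `.h5` suffix, and the convention that files directly inside the root count as depth 1 are all assumptions for illustration; the search performed by VacHopPy may differ.

```python
from pathlib import Path


def find_trajectories(path_traj, prefix="TRAJ", depth=2):
    """Collect HDF5 files under path_traj (hypothetical helper)."""
    root = Path(path_traj)
    if not root.exists():
        raise FileNotFoundError(f"{root} does not exist")
    if root.is_file():
        # A single trajectory file is returned as a one-element list.
        return [root]
    found = []
    for path in root.rglob(f"{prefix}*.h5"):
        # Assumed convention: a file directly inside root is at depth 1.
        level = len(path.relative_to(root).parts)
        if level <= depth:
            found.append(path)
    if not found:
        raise FileNotFoundError(f"no '{prefix}*.h5' files under {root}")
    return sorted(found)
```

With the default depth of 2, this sketch picks up files in the root directory and in its immediate subdirectories, such as one folder per temperature.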

Returns:

An initialized and ready-to-use CalculatorEnsemble instance.

Return type:

CalculatorEnsemble

Raises:

FileNotFoundError – If path_traj does not exist or no valid files are found.
ValueError – If t_interval cannot be estimated due to zero mean frequency.

Examples

>>> # Analyze a single trajectory file
>>> site_info = Site("POSCAR", symbol="O")
>>> calc = Calculator("TRAJ_O.h5", site=site_info)

>>> # Analyze a directory of trajectories
>>> calc = Calculator("trajectories/", site=site_info)

class vachoppy.core.Site(path_structure: str, symbol: str, format: str | None = None, rmax: float = 3.25, eps: float = 0.001, verbose: bool = False)

Analyzes a crystal structure to find inequivalent sites and hopping paths.

This class reads a standard crystallographic structure file (e.g., POSCAR, CIF), identifies symmetrically inequivalent sites for a specified element, and calculates all unique nearest-neighbor hopping paths up to a given cutoff radius. It is a foundational tool for setting up kinetic Monte Carlo or diffusion analyses.

Parameters:

path_structure (str) – Path to the crystallographic structure file.
symbol (str) – The atomic symbol of the diffusing species to analyze.
format (str | None, optional) – Format of the structure file, as recognized by ASE. If None, ASE will attempt to determine the format automatically. Defaults to None.
rmax (float, optional) – The maximum distance (in Angstroms) to search for neighbors when defining hopping paths. Defaults to 3.25.
eps (float, optional) – A small tolerance value for distance comparisons and identifying atomic coordinates. Defaults to 1.0e-3.
verbose (bool, optional) – Verbosity flag. Defaults to False.

structure

The crystal structure represented as a pymatgen Structure object.

Type: pymatgen.core.structure.Structure

symbol

The atomic symbol of the diffusing species being analyzed.

Type: str

path

A list of dictionaries, each describing a unique hopping path with details like distance, coordination number (z), and coordinates.

Type: list[dict]

path_name

A list of unique names for each hopping path (e.g., ‘A1’, ‘B1’).

Type: list[str]

site_name

A list of names for the inequivalent sites (e.g., ‘site1’, ‘site2’).

Type: list[str]

lattice_sites

A list detailing each atomic site of the specified symbol, including its fractional and Cartesian coordinates.

Type: list[dict]

lattice_parameter

The 3x3 lattice matrix of the crystal structure.

Type: numpy.ndarray

Raises:

FileNotFoundError – If the specified structure file does not exist.
IOError – If the structure file cannot be read or converted by ASE/pymatgen.
ValueError – If the specified symbol is not found in the structure.

Examples

>>> tio2_site = Site(
...     path_structure='TiO2_rutile.cif',
...     symbol='Ti',
...     rmax=3.5,
...     verbose=True
... )
# This will create the Site object and print a summary of the analysis.
# Hopping path data can then be accessed via tio2_site.path
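The idea of collecting hops up to the cutoff rmax can be sketched with a plain periodic neighbor search. This is an illustrative simplification, not the Site implementation: the function `neighbors_within` is hypothetical, it ignores symmetry, and it only checks the 27 nearest periodic images, which assumes rmax is smaller than the cell dimensions.

```python
import itertools
import math


def frac_to_cart(frac, lattice):
    """Convert fractional coordinates to Cartesian (lattice rows are a, b, c)."""
    return [sum(frac[k] * lattice[k][axis] for k in range(3)) for axis in range(3)]


def neighbors_within(lattice, frac_sites, rmax=3.25, eps=1.0e-3):
    """Return (i, j, image, distance) for every site pair closer than rmax."""
    hops = []
    for i, site_i in enumerate(frac_sites):
        for j, site_j in enumerate(frac_sites):
            for image in itertools.product((-1, 0, 1), repeat=3):
                d_frac = [site_j[k] + image[k] - site_i[k] for k in range(3)]
                d_cart = frac_to_cart(d_frac, lattice)
                dist = math.sqrt(sum(c * c for c in d_cart))
                # Skip the zero-length "hop" from a site onto itself.
                if eps < dist <= rmax + eps:
                    hops.append((i, j, image, dist))
    return hops
```

For a simple cubic cell with a 3.0 Å edge and one site, the default rmax of 3.25 Å keeps only the six nearest neighbors, so the coordination number z of that path would be 6.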

summary() → None

Prints a formatted summary of the site and path analysis to the console.

The summary includes the number of inequivalent sites and paths found, followed by a detailed table of each unique path, including its name, initial and final sites, distance, coordination number, and fractional coordinates.

vachoppy.core.parse_lammps(lammps_data: str, lammps_dump: str, atom_style_data: str, atom_style_dump: str, atom_symbols: dict[int, str], temperature: float, dt: float = 1.0, label: str | None = None, chunk_size: int = 5000, dtype: DTypeLike = numpy.float64, verbose: bool = True) → None

Parses a LAMMPS trajectory using MDAnalysis and saves data to HDF5 files.

This function leverages the MDAnalysis library to efficiently process large LAMMPS data and dump files. It reads the trajectory in chunks to maintain low memory usage while converting atomic positions to unwrapped fractional coordinates for accurate periodic boundary handling.

Key Features:
- Efficiently processes large LAMMPS dump files using MDAnalysis.
- Unwraps atomic coordinates across periodic boundaries for accurate analysis.
- Separates atom data by chemical symbol into distinct HDF5 files.
- Stores simulation parameters as metadata in each output file.
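The unwrapping step mentioned above can be illustrated with a short sketch. It assumes wrapped fractional coordinates and that no atom moves more than half a cell length between consecutive frames; the function `unwrap_fractional` is a hypothetical helper, not part of VacHopPy.

```python
def unwrap_fractional(frames):
    """Unwrap fractional coordinates across periodic boundaries.

    frames: list of frames, each a list of [x, y, z] wrapped fractional
    coordinates, with atoms in the same order in every frame.
    """
    if not frames:
        return []
    unwrapped = [[list(atom) for atom in frames[0]]]
    for prev, cur in zip(frames, frames[1:]):
        new_frame = []
        for last, atom_prev, atom_cur in zip(unwrapped[-1], prev, cur):
            new_atom = []
            for u, a, b in zip(last, atom_prev, atom_cur):
                step = b - a
                # Minimum-image step: remove the jump caused by wrapping.
                step -= round(step)
                new_atom.append(u + step)
            new_frame.append(new_atom)
        unwrapped.append(new_frame)
    return unwrapped
```

An atom whose wrapped x coordinate goes from 0.9 to 0.1 has crossed the cell boundary, so its unwrapped coordinate continues to about 1.1 instead of jumping back.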

Parameters:

lammps_data (str) – Path to the LAMMPS data file containing topology and box info.
lammps_dump (str) – Path to the LAMMPS dump file containing the trajectory.
atom_style_data (str) – Atom style string for the LAMMPS data file, given as a column list (e.g., ‘id type x y z’).
atom_style_dump (str) – Atom style string for the LAMMPS dump file, given as a column list (e.g., ‘id type x y z fx fy fz’).
atom_symbols (dict[int, str]) – A dictionary mapping atom type IDs (int) to chemical symbols (str), e.g., {1: ‘Ti’, 2: ‘O’}.
temperature (float) – Simulation temperature in Kelvin, to be stored as metadata.
dt (float, optional) – Timestep in femtoseconds (fs). Defaults to 1.0.
label (str | None, optional) – A custom suffix for output filenames (e.g., ‘TRAJ_SYMBOL_LABEL.h5’). Defaults to None.
chunk_size (int, optional) – Number of frames to read into memory per chunk. Larger values may improve speed but increase RAM usage. Defaults to 5000.
dtype (DTypeLike, optional) – NumPy data type for storing positions and forces, affecting precision and file size. Defaults to np.float64.
verbose (bool, optional) – Verbosity flag. Defaults to True.

Returns:

This function does not return a value; it writes one or more HDF5 files to disk in the current directory.

Return type:

None

Raises:

ImportError – If the MDAnalysis library is not installed.
FileNotFoundError – If a specified LAMMPS input file is not found.
ValueError – If an atom type in the trajectory is missing from atom_symbols, or if force data is not found in the dump file.
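The atom_symbols check and the per-symbol split can be pictured with the following sketch. It is an assumption-laden illustration: `group_atoms_by_symbol` is a hypothetical helper, and the real function may validate and group atoms differently.

```python
def group_atoms_by_symbol(type_ids, atom_symbols):
    """Map each chemical symbol to the indices of its atoms.

    type_ids: LAMMPS type ID of every atom, in trajectory order.
    atom_symbols: mapping from type ID to chemical symbol.
    """
    missing = sorted(set(type_ids) - set(atom_symbols))
    if missing:
        # Same failure mode as the documented ValueError.
        raise ValueError(f"atom types missing from atom_symbols: {missing}")
    groups = {}
    for index, type_id in enumerate(type_ids):
        groups.setdefault(atom_symbols[type_id], []).append(index)
    return groups
```

Each group would then correspond to one output file, for example one for Ti and one for O when atom_symbols is {1: 'Ti', 2: 'O'}.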

Examples

>>> parse_lammps(
...     lammps_data='tio2.data',
...     lammps_dump='tio2.dump',
...     atom_style_data='id type x y z',
...     atom_style_dump='id type x y z fx fy fz',
...     atom_symbols={1: 'Ti', 2: 'O'},
...     temperature=1200.0,
...     dt=1.0,
...     label='1200K'
... )
# This will create files like 'TRAJ_Ti_1200K.h5' and 'TRAJ_O_1200K.h5'.

vachoppy.core.parse_md(filename: str, format: str, temperature: float, dt: float = 1.0, label: str = None, chunk_size: int = 5000, dtype: DTypeLike = numpy.float64, verbose: bool = True) → None

Parses a molecular dynamics (MD) trajectory and saves the data to HDF5 files.

This function provides a memory-efficient way to process large MD trajectory files supported by ASE (e.g., VASP outputs, extxyz). It reads and processes the trajectory in chunks, converting atomic positions to unwrapped fractional coordinates to correctly handle periodic boundary conditions.

Key Features:
- Processes large files in memory-efficient chunks using ASE’s iread.
- Unwraps atomic coordinates across periodic boundaries for accurate analysis.
- Separates data by chemical symbol into distinct HDF5 files.
- Includes essential simulation metadata (lattice, temp, etc.) in each file.
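Chunked processing of a frame iterator can be sketched as follows. The generator `iter_chunks` is a hypothetical helper shown only to illustrate the idea; it is not taken from VacHopPy.

```python
from itertools import islice


def iter_chunks(frames, chunk_size=5000):
    """Yield lists of at most chunk_size frames from any iterable."""
    iterator = iter(frames)
    while True:
        # Only one chunk is held in memory at a time.
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk
```

With ASE, `frames` could be the lazy iterator returned by `ase.io.iread(filename, format=...)`, so the full trajectory never has to be loaded at once.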

Parameters:

filename (str) – Path to the input MD trajectory file (e.g., ‘vasprun.xml’).
format (str) – The file format string recognized by ASE (e.g., ‘vasp-xml’, ‘extxyz’).
temperature (float) – Simulation temperature in Kelvin, to be stored as metadata.
dt (float, optional) – Timestep in femtoseconds (fs). Defaults to 1.0.
label (str | None, optional) – A custom suffix for output filenames (e.g., ‘TRAJ_SYMBOL_LABEL.h5’). Defaults to None.
chunk_size (int, optional) – Number of frames to read into memory per chunk. Larger values may improve speed but increase RAM usage. Defaults to 5000.
dtype (DTypeLike, optional) – NumPy data type for storing positions and forces, affecting precision and file size. Defaults to np.float64.
verbose (bool, optional) – Verbosity flag. Defaults to True.

Returns:

This function does not return a value; it writes one or more HDF5 files to disk in the current directory.

Return type:

None

Raises:

FileNotFoundError – If the input trajectory file is not found.
ValueError – If force data is missing from the trajectory frames.

Examples

>>> parse_md(
...     filename='path/to/vasprun.xml',
...     format='vasp-xml',
...     temperature=2000.0,
...     dt=2.0,
...     label='2000K'
... )
# This will create files like 'TRAJ_Ti_2000K.h5' and 'TRAJ_O_2000K.h5'.