vachoppy.utils

Provides utility functions and classes for managing, inspecting, and preparing trajectory and structural data for the VacHopPy analysis workflow.

This module contains a collection of helper tools for common data-handling tasks. These tools operate on the standardized HDF5 trajectory files used by the package or help in generating data for other analyses.

Main Components

  • Data Management Functions:
    • concat_traj: Joins two HDF5 trajectory files into a single, continuous file.

    • cut_traj: Extracts a specific range of frames from a trajectory file.

    • show_traj: Prints a human-readable summary of a trajectory file’s metadata.

  • Data Processing Classes:
    • Snapshots: A class that reads one or more trajectories and generates time-averaged structural snapshots at regular intervals.

  • Performance Decorators:
    • monitor_performance: A decorator to measure the execution time and peak memory usage of a function.

Typical Usage

1. Inspecting and Combining Files:

from vachoppy.utils import show_traj, concat_traj

# Inspect two trajectory files
show_traj('run1.h5')
show_traj('run2.h5')

# Concatenate them into a new file
concat_traj('run1.h5', 'run2.h5', label='concat')

2. Generating Structural Snapshots:

from vachoppy.utils import Snapshots

# Process a trajectory to get snapshots every 0.1 ps
snapshot_generator = Snapshots(path_traj='full_run.h5', t_interval=0.1)

# Save the snapshots as POSCAR files
snapshot_generator.save_snapshots(path_dir='snapshots_for_analysis')
class vachoppy.utils.Snapshots(path_traj: str | list[str], t_interval: float, eps: float = 0.001, verbose: bool = True)

Generates step-wise structure files from a set of HDF5 trajectory files.

This class reads one or more HDF5 trajectory files, validates their consistency, and reconstructs the full, continuous atomic trajectory. It then calculates averaged atomic positions for user-specified time intervals. The resulting structures (“snapshots”) are stored in memory and can be saved to individual files using the save_snapshots method.

Parameters:
  • path_traj (str | list[str]) – A path to a single HDF5 trajectory file or a list of such paths.

  • t_interval (float) – The time interval in picoseconds (ps) over which atomic positions are averaged for each snapshot. This must be a multiple of the simulation timestep dt; note that dt is stored in femtoseconds (fs).

  • eps (float, optional) – A tolerance for floating-point comparisons of metadata. Defaults to 1.0e-3.

  • verbose (bool, optional) – Verbosity flag. Defaults to True.
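Because t_interval is given in ps while dt is stored in fs, checking the "multiple of dt" requirement involves a unit conversion. The sketch below shows one way such a check could look. It is not the package's implementation: the helper name frames_per_interval is hypothetical, and it assumes one stored frame per dt and a tolerance applied to the frame ratio.

```python
def frames_per_interval(t_interval_ps, dt_fs, eps=1.0e-3):
    """Return how many frames span t_interval, or raise if it is not a multiple of dt.

    Hypothetical helper: assumes one stored frame per timestep dt.
    """
    if t_interval_ps <= 0 or dt_fs <= 0:
        raise ValueError("t_interval and dt must be positive")
    ratio = t_interval_ps * 1000.0 / dt_fs  # 1 ps = 1000 fs
    n_frames = round(ratio)
    # Reject intervals that do not land on a whole number of frames.
    if n_frames < 1 or abs(ratio - n_frames) > eps:
        raise ValueError(
            f"t_interval={t_interval_ps} ps is not a multiple of dt={dt_fs} fs"
        )
    return n_frames


print(frames_per_interval(0.1, 2.0))  # 50
```

With dt = 2 fs, an interval of 0.1 ps spans 50 frames, whereas dt = 3 fs would be rejected because 100 fs is not a whole number of 3 fs steps.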

pos

A 3D array of shape (num_steps, num_atoms, 3) containing the time-averaged, wrapped fractional coordinates for each snapshot.

Type:

numpy.ndarray
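To illustrate what "time-averaged, wrapped fractional coordinates" means, the following sketch block-averages an unwrapped fractional trajectory and wraps the result into [0, 1). It is an illustration under stated assumptions, not the class's actual code: the function name average_snapshots is hypothetical, the input is assumed to be already unwrapped, and trailing frames that do not fill a complete block are assumed to be dropped.

```python
import numpy as np


def average_snapshots(frac_unwrapped, frames_per_snapshot):
    """Block-average unwrapped fractional coordinates, then wrap into [0, 1).

    frac_unwrapped: array of shape (num_frames, num_atoms, 3).
    frames_per_snapshot: number of consecutive frames averaged per snapshot.
    """
    frac_unwrapped = np.asarray(frac_unwrapped, dtype=float)
    num_frames, num_atoms, _ = frac_unwrapped.shape
    num_steps = num_frames // frames_per_snapshot
    # Drop trailing frames that do not fill a complete block (an assumption).
    used = frac_unwrapped[: num_steps * frames_per_snapshot]
    blocks = used.reshape(num_steps, frames_per_snapshot, num_atoms, 3)
    averaged = blocks.mean(axis=1)
    # Wrap only after averaging, so boundary crossings do not distort the mean.
    return averaged % 1.0


# One atom drifting across the cell boundary along x
traj = np.zeros((4, 1, 3))
traj[:, 0, 0] = [0.8, 1.0, 1.2, 1.4]
pos = average_snapshots(traj, frames_per_snapshot=2)
print(pos.shape)  # (2, 1, 3)
```

Averaging the unwrapped coordinates first matters: averaging already-wrapped values such as 0.95 and 0.05 would place the atom near 0.5 instead of near the cell boundary.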

num_steps

The total number of snapshots generated.

Type:

int

dt

The simulation timestep in femtoseconds (fs), read from metadata.

Type:

float

lattice

The 3x3 lattice matrix of the simulation cell.

Type:

numpy.ndarray

atom_counts

A dictionary of atom counts for each chemical symbol.

Type:

dict

num_atoms

The total number of atoms in the system.

Type:

int

Raises:
  • FileNotFoundError – If any of the input trajectory files are not found.

  • ValueError – If path_traj is empty, if t_interval is invalid, or if metadata is inconsistent across multiple trajectory files.

Examples

>>> # Create snapshots every 0.1 ps from a list of files
>>> snapshot_generator = Snapshots(
...     path_traj=['TRAJ_Hf_run1.h5', 'TRAJ_O_run1.h5'],
...     t_interval=0.1
... )
>>> # Save the snapshots as VASP POSCAR files
>>> snapshot_generator.save_snapshots(
...     path_dir='poscar_snapshots',
...     format='vasp',
...     prefix='POSCAR'
... )
save_snapshots(path_dir: str = 'snapshots', format: str = 'vasp', prefix: str = 'POSCAR')

Saves the averaged snapshots as a series of structure files using ASE.

This method uses the Atomic Simulation Environment (ASE) to write the atomic structure of each generated snapshot to a separate file. It also creates a description.txt file in the output directory summarizing the snapshot parameters and the mapping from filename to simulation time.

Parameters:
  • path_dir (str, optional) – The directory where output files will be saved. It will be created if it does not exist. Defaults to ‘snapshots’.

  • format (str, optional) – The output file format supported by ase.io.write. Defaults to ‘vasp’.

  • prefix (str, optional) – The prefix for the output filenames (e.g., ‘POSCAR’, ‘snapshot’). Defaults to ‘POSCAR’.

Returns:

This method does not return a value; it writes files to disk.

Return type:

None

vachoppy.utils.concat_traj(path_traj1: str, path_traj2: str, label: str = 'CONCAT', chunk_size: int = 10000, eps: float = 0.001, dtype: DTypeLike = numpy.float64, verbose: bool = True) -> None

Concatenates two HDF5 trajectory files after checking for consistency.

This function joins two sequential trajectory simulations. It performs a thorough check to ensure that critical metadata (symbol, composition, temperature, dt, lattice) are consistent between the two files before proceeding. It also handles periodic boundary conditions by calculating an offset to create a continuous, unwrapped trajectory.
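The offset idea described above can be sketched as follows. This is a minimal illustration, not the function's actual code: the helper name continuity_offset is hypothetical, coordinates are assumed to be fractional, and each atom is assumed to move less than half a cell between the last frame of the first file and the first frame of the second.

```python
import numpy as np


def continuity_offset(last_frame, first_frame):
    """Integer lattice shift per atom that places first_frame nearest to last_frame.

    Both arrays hold fractional coordinates of shape (num_atoms, 3).
    """
    last_frame = np.asarray(last_frame, dtype=float)
    first_frame = np.asarray(first_frame, dtype=float)
    # Rounding the difference picks the periodic image closest to last_frame.
    return np.round(last_frame - first_frame)


# Last frame of run 1 (unwrapped) and first frame of run 2 (wrapped)
last_of_run1 = np.array([[1.98, 0.50, 0.10]])
first_of_run2 = np.array([[0.02, 0.49, 0.09]])

offset = continuity_offset(last_of_run1, first_of_run2)
# Adding the same offset to every frame of run 2 continues run 1 smoothly.
continued_first = first_of_run2 + offset
```

Here the atom had crossed the cell boundary along x during the first run, so the second run is shifted by two lattice vectors along x and left unchanged along y and z.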

Parameters:
  • path_traj1 (str) – Path to the first HDF5 trajectory file.

  • path_traj2 (str) – Path to the second HDF5 trajectory file.

  • label (str, optional) – A label for the output concatenated file, creating a filename like ‘TRAJ_SYMBOL_LABEL.h5’. Defaults to “CONCAT”.

  • chunk_size (int, optional) – The number of frames to process at once during the copy operation. Defaults to 10000.

  • eps (float, optional) – Tolerance for comparing floating-point metadata values. Defaults to 1.0e-3.

  • dtype (DTypeLike, optional) – NumPy data type for the output arrays. Defaults to np.float64.

  • verbose (bool, optional) – Verbosity flag. Defaults to True.

Returns:

This function saves a new, concatenated HDF5 file to disk.

Return type:

None

Raises:
  • FileNotFoundError – If path_traj1 or path_traj2 is not found.

  • ValueError – If the metadata of the two files is inconsistent or if either file contains no frames.
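A tolerance-based metadata comparison of the kind controlled by eps might look like the sketch below. The helper name metadata_mismatches and the dictionary keys are hypothetical and only mirror the fields named in the description; the real attribute names and layout in the HDF5 files may differ.

```python
import numpy as np


def metadata_mismatches(meta1, meta2, eps=1.0e-3):
    """Return the sorted keys whose values differ between two metadata dicts."""
    mismatched = []
    for key in sorted(set(meta1) | set(meta2)):
        if key not in meta1 or key not in meta2:
            mismatched.append(key)
            continue
        a, b = meta1[key], meta2[key]
        if isinstance(a, (str, dict)) or isinstance(b, (str, dict)):
            # Non-numeric entries (symbol, composition) must match exactly.
            same = a == b
        else:
            # Numeric entries (temperature, dt, lattice) match within eps.
            a_arr = np.asarray(a, dtype=float)
            b_arr = np.asarray(b, dtype=float)
            same = a_arr.shape == b_arr.shape and np.allclose(
                a_arr, b_arr, rtol=0.0, atol=eps
            )
        if not same:
            mismatched.append(key)
    return mismatched


run1 = {"symbol": "O", "temperature": 1000.0, "dt": 2.0, "lattice": np.eye(3) * 10.0}
run2 = {"symbol": "O", "temperature": 1000.0004, "dt": 2.0, "lattice": np.eye(3) * 10.0}
run3 = dict(run2, dt=1.0)

print(metadata_mismatches(run1, run2))  # []
print(metadata_mismatches(run1, run3))  # ['dt']
```

A caller could raise ValueError whenever the returned list is non-empty, which matches the documented behavior for inconsistent files.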

vachoppy.utils.cut_traj(path_traj: str, start_frame: int | None = None, end_frame: int | None = None, label: str = 'CUT', chunk_size: int = 5000) -> None

Cuts a portion of a trajectory file and saves it as a new file.

This function extracts a specific range of frames (from start_frame to end_frame) from a source HDF5 trajectory and saves it into a new HDF5 file with updated metadata.

Parameters:
  • path_traj (str) – Path to the source HDF5 trajectory file.

  • start_frame (int | None, optional) – The starting frame number to include in the new trajectory (inclusive). Defaults to None.

  • end_frame (int | None, optional) – The ending frame number to include in the new trajectory (exclusive). Defaults to None.

  • label (str, optional) – A label for the output cut file, creating a filename like ‘TRAJ_SYMBOL_LABEL.h5’. Defaults to “CUT”.

  • chunk_size (int, optional) – The number of frames to process in each chunk to conserve memory. Defaults to 5000.

Returns:

This function saves a new, shorter HDF5 file to disk.

Return type:

None

Raises:
  • FileNotFoundError – If path_traj is not found.

  • ValueError – If the specified frame range is invalid.
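The combination of a half-open frame range (start inclusive, end exclusive) and chunked processing can be sketched as below. The generator name chunk_ranges is hypothetical and the validation rules are an assumption; the sketch only shows how a range might be split into memory-friendly pieces.

```python
def chunk_ranges(start_frame, end_frame, chunk_size=5000):
    """Yield (start, stop) index pairs covering [start_frame, end_frame)."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    if start_frame < 0 or end_frame <= start_frame:
        raise ValueError("invalid frame range")
    for start in range(start_frame, end_frame, chunk_size):
        # The final chunk is shortened so it never runs past end_frame.
        yield start, min(start + chunk_size, end_frame)


print(list(chunk_ranges(1000, 13000)))
# [(1000, 6000), (6000, 11000), (11000, 13000)]
```

Each pair can be used directly as a slice, for example source[start:stop], so only chunk_size frames are held in memory at a time.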

vachoppy.utils.monitor_performance(func)

A decorator that measures and prints the execution time and peak memory usage of a function.
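For readers unfamiliar with this pattern, a decorator of this kind can be built from the standard library alone. The sketch below is an illustrative stand-in named monitor_performance_sketch, not the package's implementation; in particular, it relies on tracemalloc, which only sees memory allocated through Python's allocator, and the output format is invented for the example.

```python
import functools
import time
import tracemalloc


def monitor_performance_sketch(func):
    """Print wall-clock time and peak traced memory of each call to func."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        tracemalloc.start()
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Report even if func raises, then stop tracing.
            elapsed = time.perf_counter() - start
            _, peak = tracemalloc.get_traced_memory()
            tracemalloc.stop()
            print(f"{func.__name__}: {elapsed:.3f} s, peak {peak / 1024**2:.2f} MiB")

    return wrapper


@monitor_performance_sketch
def build_list(n):
    return [i * i for i in range(n)]


result = build_list(100_000)
```

Using functools.wraps keeps the decorated function's name and docstring intact, which is helpful when the decorator is applied to documented analysis routines.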

vachoppy.utils.show_traj(path_traj: str) -> None

Displays metadata and dataset info from a trajectory HDF5 file.

This utility function reads the metadata and dataset shapes from a given HDF5 trajectory file and prints a formatted, human-readable summary to the console.

Parameters:

path_traj (str) – Path to the HDF5 trajectory file to inspect.

Returns:

This function prints information to the console.

Return type:

None

Raises:

FileNotFoundError – If the input HDF5 file is not found.

Examples

>>> show_traj('path/to/my_trajectory.h5')