vachoppy.einstein

vachoppy.einstein

Provides tools for calculating diffusivity from molecular dynamics trajectories using the Einstein relation (MSD = 6Dt).

This module contains the primary classes for Mean Squared Displacement (MSD) analysis. It is designed to handle both individual simulations and ensembles of simulations at various temperatures.

Main Components

  • MSDCalculator: A class for analyzing a single trajectory file. It calculates the MSD over time and fits it to determine the diffusivity.

  • MSDEnsemble: A class that manages a collection of trajectories (typically at different temperatures). It runs MSDCalculator for each, aggregates the results, and performs Arrhenius analysis to find the activation energy for diffusion.

  • Einstein: The main factory function and user entry point. It automatically creates either an MSDCalculator or an MSDEnsemble object based on the input path (file or directory), simplifying the analysis workflow.

Typical Usage

The primary way to use this module is through the Einstein factory function:

from vachoppy import Einstein

# Analyze a single simulation at 1000 K
analyzer_single = Einstein(
    path_traj='TRAJ_O_1000K.h5',
    symbol='O',
    skip=1.0  # Skip first 1 ps for equilibration
)
analyzer_single.calculate()
analyzer_single.summary()

# Analyze a multi-temperature ensemble
analyzer_ensemble = Einstein(
    path_traj='path/to/all_trajectories/',
    symbol='O',
    skip=1.0
)
analyzer_ensemble.calculate()
analyzer_ensemble.plot_D() # Show Arrhenius plot
vachoppy.einstein.Einstein(path_traj: str, symbol: str, skip: float = 0.0, segment_length: float | list[float] | None = None, start: float = 1.0, end: float | None = None, **kwargs) MSDEnsemble | MSDCalculator[source]

Creates an appropriate MSD analysis object based on the input path.

This factory function serves as the main user entry point for Einstein relation analysis. It intelligently selects the correct calculator class: - MSDEnsemble: If path_traj is a directory (for multi-temperature analysis). - MSDCalculator: If path_traj is a single file.

Parameters:
  • path_traj (str) – Path to a directory containing HDF5 trajectory files or to a single HDF5 trajectory file.

  • symbol (str) – The chemical symbol of the target diffusing species to analyze.

  • skip (float, optional) – Initial time in picoseconds (ps) to skip for equilibration. Defaults to 0.0.

  • segment_length (float | list[float] | None, optional) – Time length in ps for statistical averaging segments. If a list, it must match the number of temperatures (for ensemble analysis). If None, the entire trajectory is used. Defaults to None.

  • start (float, optional) – Start time in picoseconds (ps) for the linear fitting range. Defaults to 1.0.

  • end (float | None, optional) – End time in picoseconds (ps) for the linear fitting range. If None, the end of the trajectory is used. Defaults to None.

  • **kwargs (Any) – Additional keyword arguments passed to MSDEnsemble or TrajectoryBundle (e.g., prefix, depth).

Returns:

An initialized analysis object ready for calculation.

Return type:

Union[MSDEnsemble, MSDCalculator]

Raises:
  • FileNotFoundError – If the provided path_traj does not exist or is not a file/directory.

  • TypeError – If segment_length is a list when path_traj points to a single file.

Examples

>>> # Analyze an ensemble of trajectories in a directory
>>> ensemble_analyzer = einstein_analyzer(
...     path_traj='path/to/trajectories/',
...     symbol='Li'
... )
>>> ensemble_analyzer.calculate()
>>> # Analyze a single trajectory file
>>> single_analyzer = einstein_analyzer(
...     path_traj='path/to/TRAJ_Li_1000K.h5',
...     symbol='Li'
... )
>>> single_analyzer.calculate()
class vachoppy.einstein.MSDCalculator(path_traj: str, symbol: str, skip: float = 0.0, segment_length: float | None = None, start: float = 1.0, end: float | None = None, verbose: bool = True)[source]

“Analyzes a trajectory to calculate Mean Squared Displacement and diffusivity.

This class reads a single HDF5 trajectory file for a specific atomic species to calculate its Mean Squared Displacement (MSD) and subsequently its diffusivity via the Einstein relation.

The primary workflow is to initialize the class and then call the .calculate() method. Results are stored as attributes and can be plotted using .plot_msd() or summarized with .summary(). The class is designed to be memory-efficient by discarding the large position array after the MSD calculation is complete.

Parameters:
  • path_traj (str) – Path to the HDF5 trajectory file.

  • symbol (str) – The chemical symbol of the target diffusing species to analyze.

  • skip (float, optional) – Initial time in picoseconds (ps) to skip for thermal equilibration. Defaults to 0.0.

  • segment_length (float | None, optional) – Time length in picoseconds (ps) for each segment used in statistical averaging. If None, the entire trajectory after ‘skip’ is treated as a single segment. Defaults to None.

  • start (float, optional) – Start time in picoseconds (ps) for the linear fitting range of the MSD curve. Defaults to 1.0.

  • end (float | None, optional) – End time in picoseconds (ps) for the linear fitting range. If None, the end of the trajectory is used. Defaults to None.

  • verbose (bool, optional) – Verbosity flag. Defaults to True.

msd

The calculated Mean Squared Displacement values in Ų.

Type:

numpy.ndarray

timestep

The time values in picoseconds (ps) corresponding to each MSD point.

Type:

numpy.ndarray

diffusivity

The calculated diffusivity in m²/s, derived from the linear fit.

Type:

float

intercept

The y-intercept of the linear fit of the MSD curve.

Type:

float

temperature

Temperature in Kelvin, read from the trajectory file’s metadata.

Type:

float

dt

Timestep in femtoseconds (fs), read from metadata.

Type:

float

nsw

Total number of steps in the trajectory, read from metadata.

Type:

int

lattice

The 3x3 lattice matrix, read from metadata.

Type:

numpy.ndarray

Raises:
  • FileNotFoundError – If the input trajectory file is not found.

  • ValueError – If the specified symbol is not present in the trajectory file, or if segment_length is too small.

Examples

>>> msd_calc = MSDCalculator(
...     path_traj='TRAJ_Li_1000K.h5',
...     symbol='Li',
...     skip=10.0,
...     start=5.0,
...     end=20.0
... )
>>> msd_calc.calculate()
>>> msd_calc.summary()
# Access results directly:
# print(f"Calculated Diffusivity: {msd_calc.diffusivity:.3e} m^2/s")
calculate()[source]

Runs the MSD and diffusivity calculation pipeline.

plot_msd(disp: bool = True, save: bool = True, filename: str = 'msd.png', dpi: int = 300, ax: Axes | None = None, **kwargs) Line2D[source]

Plots the calculated MSD versus time.

This method visualizes the MSD curve. If diffusivity has been calculated, it also overlays the linear fit and the fitting range boundaries. The plot can be displayed interactively, saved to a file, or drawn on an existing matplotlib Axes object for creating complex figures.

Parameters:
  • disp (bool, optional) – If True, displays the plot interactively (plt.show()). Defaults to True.

  • save (bool, optional) – If True, saves the plot to a file. Defaults to True.

  • filename (str, optional) – The path and name of the file to save the plot. Defaults to ‘msd.png’.

  • dpi (int, optional) – The resolution in dots per inch for the saved figure. Defaults to 300.

  • ax (Axes | None, optional) – A matplotlib Axes object to plot on. If None, a new figure and axes are created automatically. This is useful for creating subplots. Defaults to None.

  • **kwargs – Additional keyword arguments to be passed directly to the ax.plot() function for the main MSD curve (e.g., label, color, linewidth).

Returns:

The Line2D object for the plotted MSD curve, which can be used for further customization (e.g., adding to a legend).

Return type:

Line2D

Examples

>>> # Simple plot, shown and saved as 'msd.png'
>>> msd_calc.calculate()
>>> msd_calc.plot_msd(label='My System')
>>> # Plotting on an existing subplot figure
>>> import matplotlib.pyplot as plt
>>> fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
>>> msd_calc_1.plot_msd(ax=ax1, disp=False, save=False, label='Run 1')
>>> msd_calc_2.plot_msd(ax=ax2, disp=False, save=False, label='Run 2')
>>> fig.suptitle("Comparison of MSD")
>>> fig.tight_layout()
>>> fig.savefig("msd_comparison.png")
summary()[source]

Prints a formatted summary of the MSD analysis results.

class vachoppy.einstein.MSDEnsemble(path_traj: str, symbol: str, skip: float = 0.0, segment_length: float | list[float] | None = None, start: float = 1.0, end: float | None = None, verbose: bool = True, **kwargs)[source]

Analyzes an ensemble of trajectories to find temperature-dependent diffusivity.

This class manages multiple MSDCalculator instances, typically for simulations run at different temperatures. It calculates the diffusivity for each trajectory (or temperature group) in parallel and then performs an Arrhenius fit to determine the activation energy (Ea_D) and pre-exponential factor (D0).

The main workflow is to initialize the class, call .calculate() to run the analysis, and then use plotting methods like .plot_msd() and .plot_D() or .summary() to view the results.

Parameters:
  • path_traj (str) – Path to a directory containing HDF5 trajectory files.

  • symbol (str) – The chemical symbol of the target diffusing species to analyze.

  • skip (float, optional) – Initial time in picoseconds (ps) to skip for thermal equilibration. Defaults to 0.0.

  • segment_length (float | list[float] | None, optional) – Time length in ps for statistical averaging segments. Can be a single float for all temperatures or a list of floats corresponding to each temperature. If None, the entire trajectory is used. Defaults to None.

  • start (float, optional) – Start time in picoseconds (ps) for the linear fitting range of the MSD curve. Defaults to 1.0.

  • end (float | None, optional) – End time in picoseconds (ps) for the linear fitting range. If None, the end of each trajectory is used. Defaults to None.

  • verbose (bool, optional) – Verbosity flag. Defaults to True.

  • **kwargs (Any) – Additional keyword arguments passed to the TrajectoryBundle class for file discovery (e.g., prefix, depth).

D

An array of the average diffusivity (m²/s) for each temperature.

Type:

numpy.ndarray

Ea_D

Activation energy (eV) from the Arrhenius fit.

Type:

float

D0

Pre-exponential factor (m²/s) from the Arrhenius fit.

Type:

float

R2

The R-squared value indicating the goodness of the Arrhenius fit.

Type:

float

calculators

A list of the successfully completed MSDCalculator instances.

Type:

list[MSDCalculator]

temperatures

An array of the unique temperatures found in the ensemble.

Type:

numpy.ndarray

Raises:

ValueError – If the length of segment_length (if provided as a list) does not match the number of unique temperatures found.

calculate(n_jobs: int = -1, verbose=True) MSDEnsemble[source]

Runs the full analysis pipeline in parallel for all trajectories.

This method orchestrates the MSD calculation for each trajectory file, aggregates the results by temperature, and performs an Arrhenius fit if at least two temperatures are present.

Parameters:

n_jobs (int, optional) – Number of CPU cores for parallel processing via joblib. -1 uses all available cores. Defaults to -1.

Returns:

Returns the instance itself to allow for method chaining (e.g., ensemble.calculate().summary()).

Return type:

MSDEnsemble

plot_D(disp: bool = True, save: bool = True, filename: str = 'D_atom.png', dpi: int = 300) None[source]

Plots the Arrhenius plot of diffusivity (D) vs. inverse temperature (1/T).

Parameters:
  • disp (bool, optional) – If True, displays the plot interactively (plt.show()). Defaults to True.

  • save (bool, optional) – If True, saves the plot to a file. Defaults to True.

  • filename (str, optional) – The path and name of the file to save the plot. Defaults to ‘D_arrhenius.png’.

  • dpi (int, optional) – The resolution for the saved figure. Defaults to 300.

plot_msd(disp: bool = True, save: bool = True, filename: str = 'msd.png', dpi: int = 300) None[source]

Plots temperature-averaged MSD curves for each temperature.

This method visualizes the averaged MSD curve for each unique temperature in the ensemble. The linear fit for each curve is also shown.

Parameters:
  • disp (bool, optional) – If True, displays the plot interactively (plt.show()). Defaults to True.

  • save (bool, optional) – If True, saves the plot to a file. Defaults to True.

  • filename (str, optional) – The path and name of the file to save the plot. Defaults to ‘msd_ensemble.png’.

  • dpi (int, optional) – The resolution in dots per inch for the saved figure. Defaults to 300.

save_parameters(filename: str = 'einstein.json') None[source]

Saves calculated diffusivities and Arrhenius parameters to a JSON file.

The output JSON is self-documenting, including a ‘description’ key that explains each saved parameter. NumPy arrays are converted to lists for JSON compatibility.

Parameters:

filename (str, optional) – The name of the output JSON file. Defaults to “einstein_ensemble.json”.

Raises:

RuntimeError – If .calculate() has not been run successfully.

Examples

>>> ensemble = MSDEnsemble(...)
>>> ensemble.calculate()
>>> ensemble.save_parameters("results.json")
summary() None[source]

Prints a formatted summary of the ensemble analysis results.

The summary includes a table of average diffusivities per temperature and the results of the Arrhenius fit if applicable.