vachoppy.einstein
vachoppy.einstein
Provides tools for calculating diffusivity from molecular dynamics trajectories using the Einstein relation (MSD = 6Dt).
This module contains the primary classes for Mean Squared Displacement (MSD) analysis. It is designed to handle both individual simulations and ensembles of simulations at various temperatures.
Main Components
MSDCalculator: A class for analyzing a single trajectory file. It calculates the MSD over time and fits it to determine the diffusivity.
MSDEnsemble: A class that manages a collection of trajectories (typically at different temperatures). It runs MSDCalculator for each, aggregates the results, and performs Arrhenius analysis to find the activation energy for diffusion.
Einstein: The main factory function and user entry point. It automatically creates either an MSDCalculator or an MSDEnsemble object based on the input path (file or directory), simplifying the analysis workflow.
Typical Usage
The primary way to use this module is through the Einstein factory function:
from vachoppy import Einstein
# Analyze a single simulation at 1000 K
analyzer_single = Einstein(
path_traj='TRAJ_O_1000K.h5',
symbol='O',
skip=1.0 # Skip first 1 ps for equilibration
)
analyzer_single.calculate()
analyzer_single.summary()
# Analyze a multi-temperature ensemble
analyzer_ensemble = Einstein(
path_traj='path/to/all_trajectories/',
symbol='O',
skip=1.0
)
analyzer_ensemble.calculate()
analyzer_ensemble.plot_D() # Show Arrhenius plot
- vachoppy.einstein.Einstein(path_traj: str, symbol: str, skip: float = 0.0, segment_length: float | list[float] | None = None, start: float = 1.0, end: float | None = None, **kwargs) MSDEnsemble | MSDCalculator[source]
Creates an appropriate MSD analysis object based on the input path.
This factory function serves as the main user entry point for Einstein relation analysis. It intelligently selects the correct calculator class: - MSDEnsemble: If path_traj is a directory (for multi-temperature analysis). - MSDCalculator: If path_traj is a single file.
- Parameters:
path_traj (str) – Path to a directory containing HDF5 trajectory files or to a single HDF5 trajectory file.
symbol (str) – The chemical symbol of the target diffusing species to analyze.
skip (float, optional) – Initial time in picoseconds (ps) to skip for equilibration. Defaults to 0.0.
segment_length (float | list[float] | None, optional) – Time length in ps for statistical averaging segments. If a list, it must match the number of temperatures (for ensemble analysis). If None, the entire trajectory is used. Defaults to None.
start (float, optional) – Start time in picoseconds (ps) for the linear fitting range. Defaults to 1.0.
end (float | None, optional) – End time in picoseconds (ps) for the linear fitting range. If None, the end of the trajectory is used. Defaults to None.
**kwargs (Any) – Additional keyword arguments passed to MSDEnsemble or TrajectoryBundle (e.g., prefix, depth).
- Returns:
An initialized analysis object ready for calculation.
- Return type:
Union[MSDEnsemble, MSDCalculator]
- Raises:
FileNotFoundError – If the provided path_traj does not exist or is not a file/directory.
TypeError – If segment_length is a list when path_traj points to a single file.
Examples
>>> # Analyze an ensemble of trajectories in a directory >>> ensemble_analyzer = einstein_analyzer( ... path_traj='path/to/trajectories/', ... symbol='Li' ... ) >>> ensemble_analyzer.calculate()
>>> # Analyze a single trajectory file >>> single_analyzer = einstein_analyzer( ... path_traj='path/to/TRAJ_Li_1000K.h5', ... symbol='Li' ... ) >>> single_analyzer.calculate()
- class vachoppy.einstein.MSDCalculator(path_traj: str, symbol: str, skip: float = 0.0, segment_length: float | None = None, start: float = 1.0, end: float | None = None, verbose: bool = True)[source]
“Analyzes a trajectory to calculate Mean Squared Displacement and diffusivity.
This class reads a single HDF5 trajectory file for a specific atomic species to calculate its Mean Squared Displacement (MSD) and subsequently its diffusivity via the Einstein relation.
The primary workflow is to initialize the class and then call the .calculate() method. Results are stored as attributes and can be plotted using .plot_msd() or summarized with .summary(). The class is designed to be memory-efficient by discarding the large position array after the MSD calculation is complete.
- Parameters:
path_traj (str) – Path to the HDF5 trajectory file.
symbol (str) – The chemical symbol of the target diffusing species to analyze.
skip (float, optional) – Initial time in picoseconds (ps) to skip for thermal equilibration. Defaults to 0.0.
segment_length (float | None, optional) – Time length in picoseconds (ps) for each segment used in statistical averaging. If None, the entire trajectory after ‘skip’ is treated as a single segment. Defaults to None.
start (float, optional) – Start time in picoseconds (ps) for the linear fitting range of the MSD curve. Defaults to 1.0.
end (float | None, optional) – End time in picoseconds (ps) for the linear fitting range. If None, the end of the trajectory is used. Defaults to None.
verbose (bool, optional) – Verbosity flag. Defaults to True.
- msd
The calculated Mean Squared Displacement values in Ų.
- Type:
numpy.ndarray
- timestep
The time values in picoseconds (ps) corresponding to each MSD point.
- Type:
numpy.ndarray
- diffusivity
The calculated diffusivity in m²/s, derived from the linear fit.
- Type:
float
- intercept
The y-intercept of the linear fit of the MSD curve.
- Type:
float
- temperature
Temperature in Kelvin, read from the trajectory file’s metadata.
- Type:
float
- dt
Timestep in femtoseconds (fs), read from metadata.
- Type:
float
- nsw
Total number of steps in the trajectory, read from metadata.
- Type:
int
- lattice
The 3x3 lattice matrix, read from metadata.
- Type:
numpy.ndarray
- Raises:
FileNotFoundError – If the input trajectory file is not found.
ValueError – If the specified symbol is not present in the trajectory file, or if segment_length is too small.
Examples
>>> msd_calc = MSDCalculator( ... path_traj='TRAJ_Li_1000K.h5', ... symbol='Li', ... skip=10.0, ... start=5.0, ... end=20.0 ... ) >>> msd_calc.calculate() >>> msd_calc.summary() # Access results directly: # print(f"Calculated Diffusivity: {msd_calc.diffusivity:.3e} m^2/s")
- plot_msd(disp: bool = True, save: bool = True, filename: str = 'msd.png', dpi: int = 300, ax: Axes | None = None, **kwargs) Line2D[source]
Plots the calculated MSD versus time.
This method visualizes the MSD curve. If diffusivity has been calculated, it also overlays the linear fit and the fitting range boundaries. The plot can be displayed interactively, saved to a file, or drawn on an existing matplotlib Axes object for creating complex figures.
- Parameters:
disp (bool, optional) – If True, displays the plot interactively (plt.show()). Defaults to True.
save (bool, optional) – If True, saves the plot to a file. Defaults to True.
filename (str, optional) – The path and name of the file to save the plot. Defaults to ‘msd.png’.
dpi (int, optional) – The resolution in dots per inch for the saved figure. Defaults to 300.
ax (Axes | None, optional) – A matplotlib Axes object to plot on. If None, a new figure and axes are created automatically. This is useful for creating subplots. Defaults to None.
**kwargs – Additional keyword arguments to be passed directly to the ax.plot() function for the main MSD curve (e.g., label, color, linewidth).
- Returns:
The Line2D object for the plotted MSD curve, which can be used for further customization (e.g., adding to a legend).
- Return type:
Line2D
Examples
>>> # Simple plot, shown and saved as 'msd.png' >>> msd_calc.calculate() >>> msd_calc.plot_msd(label='My System')
>>> # Plotting on an existing subplot figure >>> import matplotlib.pyplot as plt >>> fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4)) >>> msd_calc_1.plot_msd(ax=ax1, disp=False, save=False, label='Run 1') >>> msd_calc_2.plot_msd(ax=ax2, disp=False, save=False, label='Run 2') >>> fig.suptitle("Comparison of MSD") >>> fig.tight_layout() >>> fig.savefig("msd_comparison.png")
- class vachoppy.einstein.MSDEnsemble(path_traj: str, symbol: str, skip: float = 0.0, segment_length: float | list[float] | None = None, start: float = 1.0, end: float | None = None, verbose: bool = True, **kwargs)[source]
Analyzes an ensemble of trajectories to find temperature-dependent diffusivity.
This class manages multiple MSDCalculator instances, typically for simulations run at different temperatures. It calculates the diffusivity for each trajectory (or temperature group) in parallel and then performs an Arrhenius fit to determine the activation energy (Ea_D) and pre-exponential factor (D0).
The main workflow is to initialize the class, call .calculate() to run the analysis, and then use plotting methods like .plot_msd() and .plot_D() or .summary() to view the results.
- Parameters:
path_traj (str) – Path to a directory containing HDF5 trajectory files.
symbol (str) – The chemical symbol of the target diffusing species to analyze.
skip (float, optional) – Initial time in picoseconds (ps) to skip for thermal equilibration. Defaults to 0.0.
segment_length (float | list[float] | None, optional) – Time length in ps for statistical averaging segments. Can be a single float for all temperatures or a list of floats corresponding to each temperature. If None, the entire trajectory is used. Defaults to None.
start (float, optional) – Start time in picoseconds (ps) for the linear fitting range of the MSD curve. Defaults to 1.0.
end (float | None, optional) – End time in picoseconds (ps) for the linear fitting range. If None, the end of each trajectory is used. Defaults to None.
verbose (bool, optional) – Verbosity flag. Defaults to True.
**kwargs (Any) – Additional keyword arguments passed to the TrajectoryBundle class for file discovery (e.g., prefix, depth).
- D
An array of the average diffusivity (m²/s) for each temperature.
- Type:
numpy.ndarray
- Ea_D
Activation energy (eV) from the Arrhenius fit.
- Type:
float
- D0
Pre-exponential factor (m²/s) from the Arrhenius fit.
- Type:
float
- R2
The R-squared value indicating the goodness of the Arrhenius fit.
- Type:
float
- calculators
A list of the successfully completed MSDCalculator instances.
- Type:
list[MSDCalculator]
- temperatures
An array of the unique temperatures found in the ensemble.
- Type:
numpy.ndarray
- Raises:
ValueError – If the length of segment_length (if provided as a list) does not match the number of unique temperatures found.
- calculate(n_jobs: int = -1, verbose=True) MSDEnsemble[source]
Runs the full analysis pipeline in parallel for all trajectories.
This method orchestrates the MSD calculation for each trajectory file, aggregates the results by temperature, and performs an Arrhenius fit if at least two temperatures are present.
- Parameters:
n_jobs (int, optional) – Number of CPU cores for parallel processing via joblib. -1 uses all available cores. Defaults to -1.
- Returns:
Returns the instance itself to allow for method chaining (e.g., ensemble.calculate().summary()).
- Return type:
- plot_D(disp: bool = True, save: bool = True, filename: str = 'D_atom.png', dpi: int = 300) None[source]
Plots the Arrhenius plot of diffusivity (D) vs. inverse temperature (1/T).
- Parameters:
disp (bool, optional) – If True, displays the plot interactively (plt.show()). Defaults to True.
save (bool, optional) – If True, saves the plot to a file. Defaults to True.
filename (str, optional) – The path and name of the file to save the plot. Defaults to ‘D_arrhenius.png’.
dpi (int, optional) – The resolution for the saved figure. Defaults to 300.
- plot_msd(disp: bool = True, save: bool = True, filename: str = 'msd.png', dpi: int = 300) None[source]
Plots temperature-averaged MSD curves for each temperature.
This method visualizes the averaged MSD curve for each unique temperature in the ensemble. The linear fit for each curve is also shown.
- Parameters:
disp (bool, optional) – If True, displays the plot interactively (plt.show()). Defaults to True.
save (bool, optional) – If True, saves the plot to a file. Defaults to True.
filename (str, optional) – The path and name of the file to save the plot. Defaults to ‘msd_ensemble.png’.
dpi (int, optional) – The resolution in dots per inch for the saved figure. Defaults to 300.
- save_parameters(filename: str = 'einstein.json') None[source]
Saves calculated diffusivities and Arrhenius parameters to a JSON file.
The output JSON is self-documenting, including a ‘description’ key that explains each saved parameter. NumPy arrays are converted to lists for JSON compatibility.
- Parameters:
filename (str, optional) – The name of the output JSON file. Defaults to “einstein_ensemble.json”.
- Raises:
RuntimeError – If .calculate() has not been run successfully.
Examples
>>> ensemble = MSDEnsemble(...) >>> ensemble.calculate() >>> ensemble.save_parameters("results.json")