Hopping Parameter Extraction
You can download source files to follow this tutorial from this link (26 GB).
Note
This is the same example set used in the Hopping Parameter Extraction or Mean Square Displacement section in CLI Tutorial documentation. If you have already completed that tutorial, you do not need to download the files again.
Calculating effective hopping parameters relies on the Calculator() factory function and the Site class. For a detailed explanation of these core components, please refer to the previous tutorial.
First, navigate to the Example3 directory you downloaded. Inside, you will find three items: the TRAJ_TiO2 directory, a POSCAR_TiO2 file, and a neb.csv file.
The TRAJ_TiO2 directory contains a thermal ensemble of HDF5 trajectory files from MD simulations performed at various temperatures. This example includes simulations from 1700 K to 2100 K, with 20 independent runs at each temperature.
The file structure is as follows:
TRAJ_TiO2/
├── TRAJ_1700K/
│ ├── TRAJ_O_01.h5
│ ├── ...
│ └── TRAJ_O_20.h5
├── TRAJ_1800K/
│ ├── TRAJ_O_01.h5
│ ├── ...
│ └── TRAJ_O_20.h5
├── TRAJ_1900K/
│ ├── TRAJ_O_01.h5
│ ├── ...
│ └── TRAJ_O_20.h5
├── TRAJ_2000K/
│ ├── TRAJ_O_01.h5
│ ├── ...
│ └── TRAJ_O_20.h5
└── TRAJ_2100K/
├── TRAJ_O_01.h5
├── ...
└── TRAJ_O_20.h5
A key advantage of VacHopPy is its ability to process a large number of trajectories from multiple NVT ensembles simultaneously and in a memory-efficient way.
import os
import numpy as np
import pandas as pd
from vachoppy.core import Site, Calculator
path_traj = 'TRAJ_TiO2'
path_structure = 'POSCAR_TiO2'
if not os.path.exists(path_traj): print(f"{path_traj} not found.")
if not os.path.exists(path_structure): print(f"{path_structure} not found.")
Usage Example
- Creating the Site and CalculatorEnsemble Instances
The first step in our analysis is to define the crystal’s structural framework by creating a Site instance. This object analyzes a perfect, vacancy-free supercell to identify all lattice sites and hopping paths.
Next, we pass this Site object to the Calculator() factory function. This function automatically processes all trajectories found in path_traj. A key feature is its ability to automatically determine an optimal t_interval.
site = Site(path_structure, 'O')
calc_ensemble = Calculator(path_traj, site)
====================================================================
Automatic t_interval Estimation
====================================================================
[1700.0 K] Estimating from TRAJ_1700K/TRAJ_O_01.h5
-> t_interval : 0.075 ps
[1800.0 K] Estimating from TRAJ_1800K/TRAJ_O_01.h5
-> t_interval : 0.075 ps
[1900.0 K] Estimating from TRAJ_1900K/TRAJ_O_01.h5
-> t_interval : 0.075 ps
[2000.0 K] Estimating from TRAJ_2000K/TRAJ_O_01.h5
-> t_interval : 0.075 ps
[2100.0 K] Estimating from TRAJ_2100K/TRAJ_O_01.h5
-> t_interval : 0.075 ps
====================================================================
Adjusting t_interval to the nearest multiple of dt
====================================================================
- dt : 0.0020 ps
- Original t_interval : 0.0750 ps
- Adjusted t_interval : 0.0740 ps (37 frames)
====================================================================
When path_traj points to a directory, Calculator() automatically discovers HDF5 files within it. By default, this search extends down to two subdirectory levels (a depth of 2). You can control this search depth using the depth argument if your file structure is different.
- Calculating Hopping Parameters
Once the CalculatorEnsemble is configured, you can initiate the main analysis by calling the .calculate() method.
This method is highly optimized for performance and is designed to handle very large datasets.
Memory Efficiency
It processes trajectories in a streaming fashion, loading data in small chunks to minimize RAM usage.
Speed
It leverages parallel processing to perform computations on multiple CPU cores simultaneously, significantly speeding up the analysis. You can control the number of CPU cores used for this parallel computation with the
n_jobsargument. Settingn_jobs=-1will use all available cores.
calc_ensemble.calculate()
Analysis complete: 100 successful, 0 failed.
Execution Time: 16.900 seconds
Peak RAM Usage: 1.153 GB
This method processes all HDF5 files found within the path_traj directory, calculating the vacancy hopping properties for each one. The results from each individual file are then stored in the .calculators attribute as a list of CalculatorSingle objects.
You can inspect the first few results to see which files were processed:
print(f"Number of Data: {len(calc_ensemble.calculators)}\n")
# Display the file paths for the first three results
for i, calc in enumerate(calc_ensemble.calculators[:3]):
print(f"[Index {i}] : {calc.path_traj}")
Number of Data: 100
[Index 0] : /home/jty/Examples/Example3/TRAJ_TiO2/TRAJ_1700K/TRAJ_O_01.h5
[Index 1] : /home/jty/Examples/Example3/TRAJ_TiO2/TRAJ_1700K/TRAJ_O_02.h5
[Index 2] : /home/jty/Examples/Example3/TRAJ_TiO2/TRAJ_1700K/TRAJ_O_03.h5
You can now view the calculated hopping parameters for each trajectory by calling the .summary() method.
calc_ensemble.summary()
====================================================================
Summary for Trajectory dataset
- Path to TRAJ bundle : TRAJ_TiO2 (depth=2)
- Lattice structure : POSCAR_TiO2
- t_interval : 0.074 ps (37 frames)
- Temperatures (K) : [1700.0, 1800.0, 1900.0, 2000.0, 2100.0]
- Num. of TRAJ files : [20, 20, 20, 20, 20]
====================================================================
==================== Temperature-Dependent Data ====================
T (K) D (m2/s) D_rand (m2/s) f tau (ps) a (Ang)
------- ---------- --------------- ------ ---------- ---------
1700 4.176e-10 6.413e-10 0.6511 19.092 2.7104
1800 6.097e-10 9.179e-10 0.6642 13.3759 2.7142
1900 8.228e-10 1.249e-09 0.6589 9.938 2.7287
2000 1.099e-09 1.663e-09 0.661 7.4301 2.7227
2100 1.48e-09 2.262e-09 0.6543 5.4597 2.7224
====================================================================
===================== Final Fitted Parameters ======================
Diffusivity (D):
- Ea : 0.961 eV
- D0 : 2.944e-07 m^2/s
- R-squared : 0.9991
Random Walk Diffusivity (D_rand):
- Ea : 0.958 eV
- D0 : 4.413e-07 m^2/s
- R-squared : 0.9988
Correlation Factor (f):
- Ea : 0.002 eV
- f0 : 0.667
- R-squared : 0.0218
Residence Time (tau):
- Ea (fixed) : 0.958 eV
- tau0 : 2.777e-02 ps
- R-squared : 0.9987
====================================================================
- Plotting the Hopping Parameters
VacHopPy includes a suite of convenient methods to visualize the calculated hopping parameters as a function of temperature.
The following table summarizes the available plotting functions:
Method |
Description |
|---|---|
|
Generates an Arrhenius plot of the calculated diffusivity (\(D\)). |
|
Creates an Arrhenius plot for the random walk diffusivity (\(D_{rand}\)). |
|
Plots the correlation factor (\(f\)) as a function of temperature. |
|
Visualizes the vacancy residence time (\(\tau\)) across different temperatures. |
|
Shows the average hopping distance (\(a\)) as a function of temperature. |
|
Displays the total count for each unique type of hopping path. |
calc_ensemble.plot_D()
Example output:
- Saving the Hopping Parameters
Once your analysis is complete, you can save all the calculated effective hopping parameters to a file using the .save_parameters() method. This command exports the results into a parameters.json file, making it easy to reload or analyze them later without re-running the entire calculation.
calc_ensemble.save_parameters()
Parameters saved to 'parameters.json'
- Calculating the Attempt Frequency
To calculate the effective attempt frequency (\(\nu\)) and the effective coordination number (\(z\)), VacHopPy requires the pre-calculated energy barriers for each unique type of hopping path.
This information must be provided in a CSV file. Each row in this file should correspond to a specific hopping path and contain its calculated hopping barrier, which is typically obtained from methods like the nudged elastic band (NEB).
Below is an example of the CSV file:
neb_csv = 'neb_TiO2.csv'
if not os.path.exists(neb_csv):
print(f"'{neb_csv}' not found.")
else:
data = pd.read_csv(neb_csv)
print(data)
A1 0.8698
0 A2 1.058
1 A3 1.766
This calculation is performed by calling the .calculate_attempt_frequency() method. Simply provide the path to the CSV file containing the hopping barriers as an argument.
Upon completion, the method will print a summary of the calculation. Additionally, if a parameters.json file from a previous run exists, this method will automatically update it by adding the newly calculated attempt frequency and coordination number.
calc_ensemble.calculate_attempt_frequency(neb_csv)
Parameters saved to 'parameters.json'
====================================================================
Attempt Frequency Analysis Summary
====================================================================
-- Temperature-Dependent Effective Parameters --
Temp (K) nu (THz) z
---------- ---------- ------
1700.0 6.1675 5.8920
1800.0 6.0200 5.9904
1900.0 5.7611 6.0862
2000.0 5.6642 6.1788
2100.0 5.8308 6.2682
-------- -------- -
Mean 5.8887 6.0831
-- Path-Wise Parameters (per Temperature) --
Temperature: 1700.0 K
Path Name Hop Count nu_path (THz)
----------- ----------- ---------------
A1 167 7.8923
A2 252 5.3794
A3 1 10.7228
----------------------------------------
Temperature: 1800.0 K
Path Name Hop Count nu_path (THz)
----------- ----------- ---------------
A1 171 7.7409
A2 279 5.3119
A3 0 0
----------------------------------------
Temperature: 1900.0 K
Path Name Hop Count nu_path (THz)
----------- ----------- ---------------
A1 161 6.5072
A2 344 5.4858
A3 0 0
----------------------------------------
Temperature: 2000.0 K
Path Name Hop Count nu_path (THz)
----------- ----------- ---------------
A1 187 7.2352
A2 353 5.0879
A3 1 3.5067
----------------------------------------
Temperature: 2100.0 K
Path Name Hop Count nu_path (THz)
----------- ----------- ---------------
A1 191 7.7369
A2 362 5.1858
A3 0 0
----------------------------------------
====================================================================
NOTE: nu_paths can be unreliable for paths with low hop counts,
as they are sensitive to statistical noise.
====================================================================
After running the .calculate_attempt_frequency() method, you can visualize its primary results using two newly available plotting functions:
Method |
Description |
|---|---|
|
Plots the attempt frequency (\(\nu\)) as a function of temperature. |
|
Plots the coordination number (\(z\)) as a function of temperature. |
calc_ensemble.plot_nu()
Example output:
List of Extractable Parameters
The specific effective hopping parameters that VacHopPy can calculate depend entirely on the data you provide. There are two factors:
Number of Temperatures
Do your HDF5 files represent a single temperature or a range of temperatures?
NEB_CSV
Have you provided a
NEB_CSVfile with pre-calculated hopping barriers?
The following table summarizes what can be calculated under different conditions. (O = Calculable, X = Not Calculable).
Parameter |
Symbol |
Single T |
Multiple T |
|
|---|---|---|---|---|
Diffusivity |
\(D\) |
O |
O |
No |
Residence Time |
\(\tau\) |
O |
O |
No |
Correlation Factor |
\(f\) |
O |
O |
No |
Hopping Barrier |
\(E_a\) |
X |
O |
No |
Hopping Distance |
\(a\) |
O |
O |
No |
Coordination Number |
\(z\) |
X |
O |
Yes |
Attempt Frequency |
\(\nu\) |
O |
O |
Yes |
To demonstrate this, we can run the analysis on a subset of the data corresponding to a single temperature (2100 K).
# Create an analysis object for only the 2100 K simulations
# A single HDF5 also can be used (e.g., TRAJ_TiO2/TRAJ_2100K/TRAJ_O_01.h5)
path_single_temp = os.path.join('TRAJ_TiO2', 'TRAJ_2100K')
calc_ensemble_single = Calculator(path_single_temp, site)
calc_ensemble_single.calculate()
====================================================================
Automatic t_interval Estimation
====================================================================
[2100.0 K] Estimating from TRAJ_2100K/TRAJ_O_01.h5
-> t_interval : 0.075 ps
====================================================================
Adjusting t_interval to the nearest multiple of dt
====================================================================
- dt : 0.0020 ps
- Original t_interval : 0.0751 ps
- Adjusted t_interval : 0.0760 ps (38 frames)
====================================================================
Analysis complete: 20 successful, 0 failed.
Execution Time: 2.088 seconds
Peak RAM Usage: 0.136 GB
# Display the results
calc_ensemble_single.summary()
====================================================================
Summary for Trajectory dataset
- Path to TRAJ bundle : TRAJ_TiO2/TRAJ_2100K (depth=2)
- Lattice structure : POSCAR_TiO2
- t_interval : 0.076 ps (38 frames)
- Temperatures (K) : [2100.0]
- Num. of TRAJ files : [20]
====================================================================
==================== Temperature-Dependent Data ====================
T (K) D (m2/s) D_rand (m2/s) f tau (ps) a (Ang)
------- ---------- --------------- ------ ---------- ---------
2100 1.481e-09 2.254e-09 0.6568 5.4786 2.7221
====================================================================
===================== Final Fitted Parameters ======================
Diffusivity (D):
- Ea : -
- D0 : -
- R-squared : -
Random Walk Diffusivity (D_rand):
- Ea : -
- D0 : -
- R-squared : -
Correlation Factor (f):
- Ea : -
- f0 : -
- R-squared : -
Residence Time (tau):
- Ea (fixed) : -
- tau0 : -
- R-squared : -
====================================================================
As you can see in the .summary() output, VacHopPy successfully calculated all the hopping parameters available for the single temperature of 2100 K.
However, the Final Fitted Parameters section is empty. This is because parameters like the activation energy (\(E_a\)) are determined by performing an Arrhenius fit across a range of temperatures. Since we only provided data for a single temperature, this fitting is not possible.
Next, we will calculate the attempt frequency and coordination number by calling the .calculate_attempt_frequency() method.
calc_ensemble_single.calculate_attempt_frequency(neb_csv)
Parameters saved to 'parameters.json'
====================================================================
Attempt Frequency Analysis Summary
====================================================================
-- Temperature-Dependent Effective Parameters --
Temp (K) nu (THz) z
---------- ---------- ---
2100 5.8106 nan
-- Path-Wise Parameters (per Temperature) --
Temperature: 2100.0 K
Path Name Hop Count nu_path (THz)
----------- ----------- ---------------
A1 191 7.7382
A2 360 5.158
A3 0 0
----------------------------------------
====================================================================
NOTE: nu_paths can be unreliable for paths with low hop counts,
as they are sensitive to statistical noise.
====================================================================
The summary shows that the effective attempt frequency (\(\nu\)) for 2100 K has been successfully calculated. However, just like the activation energy, the effective coordination number (\(z\)) is reported as nan (Not a Number). This is because calculating \(z\) also requires fitting data across a range of temperatures.
Accessing Individual Trajectory Data
While the CalculatorEnsemble object provides results aggregated over the entire bundle of files, you may want to inspect the results from a specific, individual trajectory. This is where the .calculators attribute comes in.
This attribute is a list of CalculatorSingle objects, each containing the detailed analysis for one HDF5 file. Let’s select the first trajectory from our ensemble for a closer look.
calc_single = calc_ensemble.calculators[0]
print(f"Selected HDF5 File : {calc_single.path_traj}")
Selected HDF5 File : /home/jty/Examples/Example3/TRAJ_TiO2/TRAJ_1700K/TRAJ_O_01.h5
Just like its ensemble counterpart, the CalculatorSingle object has its own .calculate() and .summary() methods. Let’s run the analysis and view the results for this single file.
calc_single.calculate()
calc_single.summary()
============================================================
Summary for Trajectory dataset
- Path to TRAJ file : /home/jty/Examples/Example3/TRAJ_TiO2/TRAJ_1700K/TRAJ_O_01.h5
- Lattice structure : POSCAR_TiO2
- t_interval : 0.074 ps (37 frames)
- Temperatures (K) : [1700.0]
- Num. of TRAJ files : [1]
============================================================
================ Temperature-Dependent Data ================
Temp (K) D (m2/s) D_rand (m2/s) f tau (ps)
---------- ---------- --------------- ------ ----------
1700 4.071e-10 7.015e-10 0.5803 17.4318
============================================================
================= Final Fitted Parameters ==================
Diffusivity (D):
- Ea : -
- D0 : -
- R-squared : -
Random Walk Diffusivity (D_rand):
- Ea : -
- D0 : -
- R-squared : -
Correlation Factor (f):
- Ea : -
- f0 : -
- R-squared : -
Residence Time (tau):
- Ea (fixed) : -
- tau0 : -
- R-squared : -
Effective Hopping Distance (a) : -
============================================================
The real power of a CalculatorSingle object is the ability to inspect the fine-grained details of every hop.
You can see not just how many hops occurred for each path type, but also the chronological sequence of those events.
First, let’s see the total count for each type of hopping path for each vacancy:
path_names = calc_single.site.path_name
for i, counts in enumerate(calc_single.counts):
print(f"[Path Counts (Vacancy {i})]")
for name, count in zip(path_names, counts):
print(f" {name}: {int(count)}")
print('')
[Path Counts (Vacancy 0)]
A1: 10
A2: 12
A3: 1
Next, we can view the chronological sequence of these hops using the .hopping_history attribute:
for i, history in enumerate(calc_single.hopping_history):
print(f"[Hopping Sequence (Vacancy {i})]")
for hop in history:
print(f" Step {hop['step']:<4d}: {hop['name']}")
[Hopping Sequence (Vacancy 0)]
Step 368 : A2
Step 584 : A2
Step 797 : A1
... (rest of sequence) ...
While .hopping_history gives you the raw data, it’s often more useful to see a formatted summary. For this, VacHopPy provides the convenient .show_hopping_history() method.
calc_single.show_hopping_history()
====================================================================================================================
Hopping Sequence of Vacancy 0
====================================================================================================================
Num Time (ps) Path a (Ang) Initial Site (Fractional Coordinate) Final Site (Fractional Coordinate)
----- ----------- ------ --------- -------------------------------------- ------------------------------------
1 27.23 A2 2.80308 site1 [0.65249, 0.15249, 0.33333] site1 [0.59751, 0.40249, 0.16667]
2 43.22 A2 2.80308 site1 [0.59751, 0.40249, 0.16667] site1 [0.65249, 0.65249, 0.33333]
3 58.98 A1 2.56255 site1 [0.65249, 0.65249, 0.33333] site1 [0.84751, 0.84751, 0.33333]
... (rest of table) ...
====================================================================================================================
Advanced Features
Assessing Data Reliability
Setting the argument error_bar=True will display error bars on the plots. The argument can be applied to plot_D(), plot_D_rand(), plot_f(), and plot_tau() methods. The error bars represent the standard error of the mean (SEM), which allows for an assessment of the data’s reliability. It is calculated as follows:
Here, \(\sigma\) represents the standard deviation of the physical quantity at each temperature, and \(n\) is the number of data points (i.e., the number of HDF5 files used for that temperature).
calc_ensemble.plot_D(error_bar=True)
Example output:
Decomposing Vacancy Diffusivity
You can decompose the total diffusivity (\(D\)) into its x, y, and z components (\(D_x\), \(D_y\), \(D_z\)) at each temperature using the decompose_diffusivity() method. This functionality is particularly useful for analyzing anisotropic systems where diffusion varies by direction, allowing you to calculate the specific diffusion barrier (\(E_a\)) for each axis.
Note that obtaining statistically reliable directional results may require more simulation data than calculating the total diffusivity alone.
calc_ensemble.decompose_diffusivity()
=========== Temperature-Dependent Decomposed Diffusivity ===========
T (K) Dx (m2/s) Dy (m2/s) Dz (m2/s)
------- ----------- ----------- -----------
1700 4.729e-10 5.324e-10 2.475e-10
1800 6.673e-10 7.123e-10 4.496e-10
1900 1.144e-09 6.331e-10 6.915e-10
2000 1.25e-09 1.497e-09 5.507e-10
2100 2.069e-09 1.525e-09 8.47e-10
====================================================================
=================== Decomposed Diffusivity Fits ====================
Directional Diffusivity (Dx):
- Ea : 1.101 eV
- D0 : 8.523e-07 m^2/s
- R-squared : 0.9710
Directional Diffusivity (Dy):
- Ea : 0.867 eV
- D0 : 1.818e-07 m^2/s
- R-squared : 0.8022
Directional Diffusivity (Dz):
- Ea : 0.834 eV
- D0 : 8.645e-08 m^2/s
- R-squared : 0.8231
====================================================================
The plot_msd_xyz() and plot_D_xyz() methods can be used to obtain component-wise Mean Squared Displacement (MSD) and Arrhenius plots for diffusivity, respectively.
calc_ensemble.plot_msd_xyz()
Example output:
calc_ensemble.plot_D_xyz()
Example output: