Hopping Parameter Extraction

You can download source files to follow this tutorial from this link (26 GB).

Note

This is the same example set used in the Hopping Parameter Extraction or Mean Square Displacement section in CLI Tutorial documentation. If you have already completed that tutorial, you do not need to download the files again.


Calculating effective hopping parameters relies on the Calculator() factory function and the Site class. For a detailed explanation of these core components, please refer to the previous tutorial.

First, navigate to the Example3 directory you downloaded. Inside, you will find three items: the TRAJ_TiO2 directory, a POSCAR_TiO2 file, and a neb.csv file.

The TRAJ_TiO2 directory contains a thermal ensemble of HDF5 trajectory files from MD simulations performed at various temperatures. This example includes simulations from 1700 K to 2100 K, with 20 independent runs at each temperature.

The file structure is as follows:

TRAJ_TiO2/
├── TRAJ_1700K/
│   ├── TRAJ_O_01.h5
│   ├──    ...
│   └── TRAJ_O_20.h5
├── TRAJ_1800K/
│   ├── TRAJ_O_01.h5
│   ├──    ...
│   └── TRAJ_O_20.h5
├── TRAJ_1900K/
│   ├── TRAJ_O_01.h5
│   ├──    ...
│   └── TRAJ_O_20.h5
├── TRAJ_2000K/
│   ├── TRAJ_O_01.h5
│   ├──    ...
│   └── TRAJ_O_20.h5
└── TRAJ_2100K/
    ├── TRAJ_O_01.h5
    ├──    ...
    └── TRAJ_O_20.h5

A key advantage of VacHopPy is its ability to process a large number of trajectories from multiple NVT ensembles simultaneously and in a memory-efficient way.

import os
import numpy as np
import pandas as pd
from vachoppy.core import Site, Calculator

path_traj = 'TRAJ_TiO2'
path_structure = 'POSCAR_TiO2'
if not os.path.exists(path_traj): print(f"{path_traj} not found.")
if not os.path.exists(path_structure): print(f"{path_structure} not found.")

Usage Example

- Creating the Site and CalculatorEnsemble Instances

The first step in our analysis is to define the crystal’s structural framework by creating a Site instance. This object analyzes a perfect, vacancy-free supercell to identify all lattice sites and hopping paths.

Next, we pass this Site object to the Calculator() factory function. This function automatically processes all trajectories found in path_traj. A key feature is its ability to automatically determine an optimal t_interval.

site = Site(path_structure, 'O')
calc_ensemble = Calculator(path_traj, site)
====================================================================
                   Automatic t_interval Estimation
====================================================================
  [1700.0 K] Estimating from TRAJ_1700K/TRAJ_O_01.h5
             -> t_interval : 0.075 ps
  [1800.0 K] Estimating from TRAJ_1800K/TRAJ_O_01.h5
             -> t_interval : 0.075 ps
  [1900.0 K] Estimating from TRAJ_1900K/TRAJ_O_01.h5
             -> t_interval : 0.075 ps
  [2000.0 K] Estimating from TRAJ_2000K/TRAJ_O_01.h5
             -> t_interval : 0.075 ps
  [2100.0 K] Estimating from TRAJ_2100K/TRAJ_O_01.h5
             -> t_interval : 0.075 ps
====================================================================
        Adjusting t_interval to the nearest multiple of dt
====================================================================
    - dt                  : 0.0020 ps
    - Original t_interval : 0.0750 ps
    - Adjusted t_interval : 0.0740 ps (37 frames)
====================================================================

When path_traj points to a directory, Calculator() automatically discovers HDF5 files within it. By default, this search extends down to two subdirectory levels (a depth of 2). You can control this search depth using the depth argument if your file structure is different.


- Calculating Hopping Parameters

Once the CalculatorEnsemble is configured, you can initiate the main analysis by calling the .calculate() method.

This method is highly optimized for performance and is designed to handle very large datasets.

  • Memory Efficiency

    It processes trajectories in a streaming fashion, loading data in small chunks to minimize RAM usage.

  • Speed

    It leverages parallel processing to perform computations on multiple CPU cores simultaneously, significantly speeding up the analysis. You can control the number of CPU cores used for this parallel computation with the n_jobs argument. Setting n_jobs=-1 will use all available cores.

calc_ensemble.calculate()
Analysis complete: 100 successful, 0 failed.
Execution Time: 16.900 seconds
Peak RAM Usage: 1.153 GB

This method processes all HDF5 files found within the path_traj directory, calculating the vacancy hopping properties for each one. The results from each individual file are then stored in the .calculators attribute as a list of CalculatorSingle objects.

You can inspect the first few results to see which files were processed:

print(f"Number of Data: {len(calc_ensemble.calculators)}\n")

# Display the file paths for the first three results
for i, calc in enumerate(calc_ensemble.calculators[:3]):
    print(f"[Index {i}] : {calc.path_traj}")
Number of Data: 100

[Index 0] : /home/jty/Examples/Example3/TRAJ_TiO2/TRAJ_1700K/TRAJ_O_01.h5
[Index 1] : /home/jty/Examples/Example3/TRAJ_TiO2/TRAJ_1700K/TRAJ_O_02.h5
[Index 2] : /home/jty/Examples/Example3/TRAJ_TiO2/TRAJ_1700K/TRAJ_O_03.h5

You can now view the calculated hopping parameters for each trajectory by calling the .summary() method.

calc_ensemble.summary()
====================================================================
Summary for Trajectory dataset
  - Path to TRAJ bundle : TRAJ_TiO2 (depth=2)
  - Lattice structure   : POSCAR_TiO2
  - t_interval          : 0.074 ps (37 frames)
  - Temperatures (K)    : [1700.0, 1800.0, 1900.0, 2000.0, 2100.0]
  - Num. of TRAJ files  : [20, 20, 20, 20, 20]
====================================================================

==================== Temperature-Dependent Data ====================
T (K)    D (m2/s)    D_rand (m2/s)    f       tau (ps)    a (Ang)
-------  ----------  ---------------  ------  ----------  ---------
1700     4.176e-10   6.413e-10        0.6511  19.092      2.7104
1800     6.097e-10   9.179e-10        0.6642  13.3759     2.7142
1900     8.228e-10   1.249e-09        0.6589  9.938       2.7287
2000     1.099e-09   1.663e-09        0.661   7.4301      2.7227
2100     1.48e-09    2.262e-09        0.6543  5.4597      2.7224
====================================================================

===================== Final Fitted Parameters ======================
Diffusivity (D):
  - Ea          : 0.961 eV
  - D0          : 2.944e-07 m^2/s
  - R-squared   : 0.9991
Random Walk Diffusivity (D_rand):
  - Ea          : 0.958 eV
  - D0          : 4.413e-07 m^2/s
  - R-squared   : 0.9988
Correlation Factor (f):
  - Ea          : 0.002 eV
  - f0          : 0.667
  - R-squared   : 0.0218
Residence Time (tau):
  - Ea (fixed)  : 0.958 eV
  - tau0        : 2.777e-02 ps
  - R-squared   : 0.9987
====================================================================

- Plotting the Hopping Parameters

VacHopPy includes a suite of convenient methods to visualize the calculated hopping parameters as a function of temperature.

The following table summarizes the available plotting functions:

Method

Description

plot_D()

Generates an Arrhenius plot of the calculated diffusivity (\(D\)).

plot_D_rand()

Creates an Arrhenius plot for the random walk diffusivity (\(D_{rand}\)).

plot_f()

Plots the correlation factor (\(f\)) as a function of temperature.

plot_tau()

Visualizes the vacancy residence time (\(\tau\)) across different temperatures.

plot_a()

Shows the average hopping distance (\(a\)) as a function of temperature.

plot_counts()

Displays the total count for each unique type of hopping path.

calc_ensemble.plot_D()

Example output:

Image

- Saving the Hopping Parameters

Once your analysis is complete, you can save all the calculated effective hopping parameters to a file using the .save_parameters() method. This command exports the results into a parameters.json file, making it easy to reload or analyze them later without re-running the entire calculation.

calc_ensemble.save_parameters()
Parameters saved to 'parameters.json'

- Calculating the Attempt Frequency

To calculate the effective attempt frequency (\(\nu\)) and the effective coordination number (\(z\)), VacHopPy requires the pre-calculated energy barriers for each unique type of hopping path.

This information must be provided in a CSV file. Each row in this file should correspond to a specific hopping path and contain its calculated hopping barrier, which is typically obtained from methods like the nudged elastic band (NEB).

Below is an example of the CSV file:

neb_csv = 'neb_TiO2.csv'
if not os.path.exists(neb_csv):
    print(f"'{neb_csv}' not found.")
else:
    data = pd.read_csv(neb_csv)
    print(data)
   A1  0.8698
0  A2   1.058
1  A3   1.766

This calculation is performed by calling the .calculate_attempt_frequency() method. Simply provide the path to the CSV file containing the hopping barriers as an argument.

Upon completion, the method will print a summary of the calculation. Additionally, if a parameters.json file from a previous run exists, this method will automatically update it by adding the newly calculated attempt frequency and coordination number.

calc_ensemble.calculate_attempt_frequency(neb_csv)
Parameters saved to 'parameters.json'
====================================================================
                Attempt Frequency Analysis Summary
====================================================================

-- Temperature-Dependent Effective Parameters --

Temp (K)    nu (THz)    z
----------  ----------  ------
1700.0      6.1675      5.8920
1800.0      6.0200      5.9904
1900.0      5.7611      6.0862
2000.0      5.6642      6.1788
2100.0      5.8308      6.2682
--------    --------    -
Mean        5.8887      6.0831

-- Path-Wise Parameters (per Temperature) --

Temperature: 1700.0 K
Path Name    Hop Count    nu_path (THz)
-----------  -----------  ---------------
A1           167          7.8923
A2           252          5.3794
A3           1            10.7228
----------------------------------------
Temperature: 1800.0 K
Path Name    Hop Count    nu_path (THz)
-----------  -----------  ---------------
A1           171          7.7409
A2           279          5.3119
A3           0            0
----------------------------------------
Temperature: 1900.0 K
Path Name    Hop Count    nu_path (THz)
-----------  -----------  ---------------
A1           161          6.5072
A2           344          5.4858
A3           0            0
----------------------------------------
Temperature: 2000.0 K
Path Name    Hop Count    nu_path (THz)
-----------  -----------  ---------------
A1           187          7.2352
A2           353          5.0879
A3           1            3.5067
----------------------------------------
Temperature: 2100.0 K
Path Name    Hop Count    nu_path (THz)
-----------  -----------  ---------------
A1           191          7.7369
A2           362          5.1858
A3           0            0
----------------------------------------
====================================================================
NOTE: nu_paths can be unreliable for paths with low hop counts,
      as they are sensitive to statistical noise.
====================================================================

After running the .calculate_attempt_frequency() method, you can visualize its primary results using two newly available plotting functions:

Method

Description

plot_nu()

Plots the attempt frequency (\(\nu\)) as a function of temperature.

plot_z()

Plots the coordination number (\(z\)) as a function of temperature.

calc_ensemble.plot_nu()

Example output:

Image

List of Extractable Parameters

The specific effective hopping parameters that VacHopPy can calculate depend entirely on the data you provide. There are two factors:

  • Number of Temperatures

    Do your HDF5 files represent a single temperature or a range of temperatures?

  • NEB_CSV

    Have you provided a NEB_CSV file with pre-calculated hopping barriers?

The following table summarizes what can be calculated under different conditions. (O = Calculable, X = Not Calculable).

Parameter

Symbol

Single T

Multiple T

NEB_CSV Required?

Diffusivity

\(D\)

O

O

No

Residence Time

\(\tau\)

O

O

No

Correlation Factor

\(f\)

O

O

No

Hopping Barrier

\(E_a\)

X

O

No

Hopping Distance

\(a\)

O

O

No

Coordination Number

\(z\)

X

O

Yes

Attempt Frequency

\(\nu\)

O

O

Yes

To demonstrate this, we can run the analysis on a subset of the data corresponding to a single temperature (2100 K).

# Create an analysis object for only the 2100 K simulations
# A single HDF5 also can be used (e.g., TRAJ_TiO2/TRAJ_2100K/TRAJ_O_01.h5)
path_single_temp = os.path.join('TRAJ_TiO2', 'TRAJ_2100K')

calc_ensemble_single = Calculator(path_single_temp, site)
calc_ensemble_single.calculate()
====================================================================
                   Automatic t_interval Estimation
====================================================================
  [2100.0 K] Estimating from TRAJ_2100K/TRAJ_O_01.h5
             -> t_interval : 0.075 ps
====================================================================
        Adjusting t_interval to the nearest multiple of dt
====================================================================
    - dt                  : 0.0020 ps
    - Original t_interval : 0.0751 ps
    - Adjusted t_interval : 0.0760 ps (38 frames)
====================================================================
Analysis complete: 20 successful, 0 failed.
Execution Time: 2.088 seconds
Peak RAM Usage: 0.136 GB
# Display the results
calc_ensemble_single.summary()
====================================================================
Summary for Trajectory dataset
  - Path to TRAJ bundle : TRAJ_TiO2/TRAJ_2100K (depth=2)
  - Lattice structure   : POSCAR_TiO2
  - t_interval          : 0.076 ps (38 frames)
  - Temperatures (K)    : [2100.0]
  - Num. of TRAJ files  : [20]
====================================================================

==================== Temperature-Dependent Data ====================
T (K)    D (m2/s)    D_rand (m2/s)    f       tau (ps)    a (Ang)
-------  ----------  ---------------  ------  ----------  ---------
2100     1.481e-09   2.254e-09        0.6568  5.4786      2.7221
====================================================================

===================== Final Fitted Parameters ======================
Diffusivity (D):
  - Ea          : - 
  - D0          : - 
  - R-squared   : -
Random Walk Diffusivity (D_rand):
  - Ea          : -
  - D0          : -
  - R-squared   : -
Correlation Factor (f):
  - Ea          : -
  - f0          : -
  - R-squared   : -
Residence Time (tau):
  - Ea (fixed)  : -
  - tau0        : -
  - R-squared   : -
====================================================================

As you can see in the .summary() output, VacHopPy successfully calculated all the hopping parameters available for the single temperature of 2100 K.

However, the Final Fitted Parameters section is empty. This is because parameters like the activation energy (\(E_a\)) are determined by performing an Arrhenius fit across a range of temperatures. Since we only provided data for a single temperature, this fitting is not possible.

Next, we will calculate the attempt frequency and coordination number by calling the .calculate_attempt_frequency() method.

calc_ensemble_single.calculate_attempt_frequency(neb_csv)
Parameters saved to 'parameters.json'
====================================================================
                Attempt Frequency Analysis Summary
====================================================================

-- Temperature-Dependent Effective Parameters --

Temp (K)    nu (THz)    z
----------  ----------  ---
2100        5.8106      nan

-- Path-Wise Parameters (per Temperature) --

Temperature: 2100.0 K
Path Name    Hop Count    nu_path (THz)
-----------  -----------  ---------------
A1           191          7.7382
A2           360          5.158
A3           0            0
----------------------------------------
====================================================================
NOTE: nu_paths can be unreliable for paths with low hop counts,
      as they are sensitive to statistical noise.
====================================================================

The summary shows that the effective attempt frequency (\(\nu\)) for 2100 K has been successfully calculated. However, just like the activation energy, the effective coordination number (\(z\)) is reported as nan (Not a Number). This is because calculating \(z\) also requires fitting data across a range of temperatures.


Accessing Individual Trajectory Data

While the CalculatorEnsemble object provides results aggregated over the entire bundle of files, you may want to inspect the results from a specific, individual trajectory. This is where the .calculators attribute comes in.

This attribute is a list of CalculatorSingle objects, each containing the detailed analysis for one HDF5 file. Let’s select the first trajectory from our ensemble for a closer look.

calc_single = calc_ensemble.calculators[0]
print(f"Selected HDF5 File : {calc_single.path_traj}")
Selected HDF5 File : /home/jty/Examples/Example3/TRAJ_TiO2/TRAJ_1700K/TRAJ_O_01.h5

Just like its ensemble counterpart, the CalculatorSingle object has its own .calculate() and .summary() methods. Let’s run the analysis and view the results for this single file.

calc_single.calculate()
calc_single.summary()
============================================================
Summary for Trajectory dataset
  - Path to TRAJ file   : /home/jty/Examples/Example3/TRAJ_TiO2/TRAJ_1700K/TRAJ_O_01.h5
  - Lattice structure   : POSCAR_TiO2
  - t_interval          : 0.074 ps (37 frames)
  - Temperatures (K)    : [1700.0]
  - Num. of TRAJ files  : [1]
============================================================

================ Temperature-Dependent Data ================
Temp (K)    D (m2/s)    D_rand (m2/s)    f       tau (ps)
----------  ----------  ---------------  ------  ----------
1700        4.071e-10   7.015e-10        0.5803  17.4318
============================================================

================= Final Fitted Parameters ==================
Diffusivity (D):
  - Ea          : - 
  - D0          : - 
  - R-squared   : -
Random Walk Diffusivity (D_rand):
  - Ea          : -
  - D0          : -
  - R-squared   : -
Correlation Factor (f):
  - Ea          : -
  - f0          : -
  - R-squared   : -
Residence Time (tau):
  - Ea (fixed)  : -
  - tau0        : -
  - R-squared   : -
Effective Hopping Distance (a) : -
============================================================

The real power of a CalculatorSingle object is the ability to inspect the fine-grained details of every hop. You can see not just how many hops occurred for each path type, but also the chronological sequence of those events.

First, let’s see the total count for each type of hopping path for each vacancy:

path_names = calc_single.site.path_name

for i, counts in enumerate(calc_single.counts):
    print(f"[Path Counts (Vacancy {i})]")
    for name, count in zip(path_names, counts):
        print(f"  {name}: {int(count)}")
    print('')
[Path Counts (Vacancy 0)]
  A1: 10
  A2: 12
  A3: 1

Next, we can view the chronological sequence of these hops using the .hopping_history attribute:

for i, history in enumerate(calc_single.hopping_history):
    print(f"[Hopping Sequence (Vacancy {i})]")
    for hop in history:
        print(f"  Step {hop['step']:<4d}: {hop['name']}")
[Hopping Sequence (Vacancy 0)]
  Step 368  : A2
  Step 584  : A2
  Step 797  : A1
... (rest of sequence) ...

While .hopping_history gives you the raw data, it’s often more useful to see a formatted summary. For this, VacHopPy provides the convenient .show_hopping_history() method.

calc_single.show_hopping_history()
====================================================================================================================
                                       Hopping Sequence of Vacancy 0
====================================================================================================================
Num     Time (ps)    Path    a (Ang)    Initial Site (Fractional Coordinate)    Final Site (Fractional Coordinate)
-----   -----------  ------  ---------  --------------------------------------  ------------------------------------
1       27.23        A2      2.80308    site1 [0.65249, 0.15249, 0.33333]      site1 [0.59751, 0.40249, 0.16667]
2       43.22        A2      2.80308    site1 [0.59751, 0.40249, 0.16667]      site1 [0.65249, 0.65249, 0.33333]
3       58.98        A1      2.56255    site1 [0.65249, 0.65249, 0.33333]      site1 [0.84751, 0.84751, 0.33333]
... (rest of table) ...
====================================================================================================================

Advanced Features

Assessing Data Reliability

Setting the argument error_bar=True will display error bars on the plots. The argument can be applied to plot_D(), plot_D_rand(), plot_f(), and plot_tau() methods. The error bars represent the standard error of the mean (SEM), which allows for an assessment of the data’s reliability. It is calculated as follows:

\[ SEM = {\sigma}/{\sqrt{n}} \]

Here, \(\sigma\) represents the standard deviation of the physical quantity at each temperature, and \(n\) is the number of data points (i.e., the number of HDF5 files used for that temperature).

calc_ensemble.plot_D(error_bar=True)

Example output:

Image

Decomposing Vacancy Diffusivity

You can decompose the total diffusivity (\(D\)) into its x, y, and z components (\(D_x\), \(D_y\), \(D_z\)) at each temperature using the decompose_diffusivity() method. This functionality is particularly useful for analyzing anisotropic systems where diffusion varies by direction, allowing you to calculate the specific diffusion barrier (\(E_a\)) for each axis.

Note that obtaining statistically reliable directional results may require more simulation data than calculating the total diffusivity alone.

calc_ensemble.decompose_diffusivity()
=========== Temperature-Dependent Decomposed Diffusivity ===========
T (K)    Dx (m2/s)    Dy (m2/s)    Dz (m2/s)
-------  -----------  -----------  -----------
1700     4.729e-10    5.324e-10    2.475e-10
1800     6.673e-10    7.123e-10    4.496e-10
1900     1.144e-09    6.331e-10    6.915e-10
2000     1.25e-09     1.497e-09    5.507e-10
2100     2.069e-09    1.525e-09    8.47e-10
====================================================================

=================== Decomposed Diffusivity Fits ====================
Directional Diffusivity (Dx):
  - Ea        : 1.101 eV
  - D0        : 8.523e-07 m^2/s
  - R-squared : 0.9710
Directional Diffusivity (Dy):
  - Ea        : 0.867 eV
  - D0        : 1.818e-07 m^2/s
  - R-squared : 0.8022
Directional Diffusivity (Dz):
  - Ea        : 0.834 eV
  - D0        : 8.645e-08 m^2/s
  - R-squared : 0.8231
====================================================================

The plot_msd_xyz() and plot_D_xyz() methods can be used to obtain component-wise Mean Squared Displacement (MSD) and Arrhenius plots for diffusivity, respectively.

calc_ensemble.plot_msd_xyz()

Example output:

Image
calc_ensemble.plot_D_xyz()

Example output:

Image