# Hopping Parameter Extraction To follow this tutorial, please download and unzip `Example3/` from [this link](https://drive.google.com/file/d/1_gMT74f_1PqxQ8Um1-y_i11Y2tKg7-3f/view?usp=drivesdk) (26 GB). ---- ## How to Extract Hopping Parameters The core feature of `VacHopPy`—calculating **effective hopping parameters**—is automated by the `analyze` command. The basic syntax is as follows: ```bash vachoppy analyze [PATH_TRAJ] [PATH_STRUCTURE] [SYMBOL] --neb [NEB_CSV] ``` ### Command Arguments This command takes the following primary arguments: * **`PATH_TRAJ`** Path to either a single HDF5 trajectory file or a root directory containing multiple HDF5 files. If files from multiple temperatures are provided, an Arrhenius-type analysis is automatically performed. * **`PATH_STRUCTURE`** Path to a structure file of the perfect, vacancy-free material. While this example uses VASP POSCAR, any format supported by the **Atomic Simulation Environment (ASE)** is compatible. `VacHopPy` uses this file to determine lattice sites and potential hopping paths. * **`SYMBOL`** The chemical symbol of the diffusing species (e.g., O for oxygen). * **`--neb [NEB_CSV]`** (Optional) Path to a CSV file containing the activation energy barrier for each hopping path, as pre-calculated from a method like nudged elastic band (NEB). ### The NEB_CSV File This optional CSV file provides essential energy information for calculating certain parameters like the attempt frequency ($\nu$) and coordination number ($z$). The format is a simple two-column list of path names and their corresponding energy barriers in eV. Example `neb_TiO2.csv`: ```bash A1,0.8698 A2,1.058 A3,1.766 ``` The required path names (e.g., `A1`, `A2`) can be identified using the `Site` class. ```python from vachoppy.core import Site site = Site([PATH_STRUCTURE], [SYMBOL]) site.summary() ``` Running the `summary()` method provides the necessary structural information about crystal sites and potential hopping paths. Example output: ```bash ============================================================================================================== Site Analysis Summary ============================================================================================================== - Structure File : POSCAR_TiO2 - Diffusing Symbol : O - Inequivalent Sites : 1 found - Inequivalent Paths : 3 found (with Rmax = 3.25 Å) -- Hopping Path Information -- Name Init Site Final Site a (Å) z Initial Coord (Frac) Final Coord (Frac) ------ ----------- ------------ ------- --- ------------------------ ------------------------- A1 site1 site1 2.5626 1 [0.0975, 0.4025, 0.1667] [-0.0975, 0.5975, 0.1667] A2 site1 site1 2.8031 8 [0.0975, 0.4025, 0.1667] [0.3475, 0.3475, 0.3333] A3 site1 site1 2.967 2 [0.0975, 0.4025, 0.1667] [0.0975, 0.4025, -0.1667] ============================================================================================================== ``` ### List of Extractable Parameters The specific effective hopping parameters that `VacHopPy` can calculate depend entirely on the data you provide. There are two factors: * **Number of Temperatures** Do your HDF5 files represent a single temperature or a range of temperatures? * **NEB_CSV** Have you provided a CSV file with pre-calculated hopping barriers? The following table summarizes what can be calculated under different conditions. (**O** = Calculable, **X** = Not Calculable).
| Parameter | Symbol | Single T | Multiple T | `NEB_CSV` Required? | | :--- | :---: | :---: | :---: | :---: | | Diffusivity | $D$ | O | O | No | | Residence Time | $\tau$ | O | O | No | | Correlation Factor | $f$ | O | O | No | | Hopping Barrier | $E_a$ | X | O | No | | Hopping Distance | $a$ | O | O | No | | Coordination Number | $z$ | X | O | **Yes** | | Attempt Frequency | $\nu$ | O | O | **Yes** |
----- ## Running the Analysis Navigate into the `Example3/` directory you downloaded. In this directory, you will find one subdirectory and two files: ```bash cd path/to/Example3/ ls # >> TRAJ_TiO2/ POSCAR_TiO2 neb_TiO2.csv ``` * **`TRAJ_TiO2/`** This directory contains the HDF5 files from MD simulations performed at multiple temperatures. Inside, it has subdirectories for each temperature (`TRAJ_1700K/` through `TRAJ_2100K/`). Each of these subdirectories contains 20 individual HDF5 files (e.g., `TRAJ_O_01.h5` to `TRAJ_O_20.h5`) of the same NVT ensemble. * **`POSCAR_TiO2`** This is the structure file corresponding to the `PATH_STRUCTURE` argument. It contains the perfect, vacancy-free rutile TiO₂ supercell. * **`neb_TiO2.csv`** This is the `NEB_CSV` file, which contains the pre-calculated hopping barrier information for the vacancy hopping paths identified from `POSCAR_TiO2`. ### Usage Scenarios for the analyze Command The `analyze` command is flexible and can be used in several ways depending on your analysis needs: * **To analyze a single HDF5 file**: ```bash # Analyzes a single trajectory file from the 2100 K simulation vachoppy analyze TRAJ_TiO2/TRAJ_2100K/TRAJ_O_01.h5 POSCAR_TiO2 O --neb neb_TiO2.csv ``` * **To analyze multiple HDF5 files from a single-temperature ensemble**: ```bash # Analyzes all 20 trajectory files from the 2100 K simulation vachoppy analyze TRAJ_TiO2/TRAJ_2100K/ POSCAR_TiO2 O --neb neb_TiO2.csv ``` * **To analyze multiple HDF5 files across a range of temperatures**: ```bash # Analyzes all 100 trajectory files from 1700 K to 2100 K vachoppy analyze TRAJ_TiO2/ POSCAR_TiO2 O --neb neb_TiO2.csv ``` ### Key Optional Flags * The `--neb [NEB_CSV]` flag can be omitted if you do not need to calculate parameters that depend on NEB data (like $\nu$ and $z$). * When `PATH_TRAJ` is a directory, `VacHopPy` automatically searches for HDF5 files up to two levels deep. You can adjust this search range using the `--depth` flag (default is 2). * `VacHopPy` processes large datasets efficiently using streaming and parallel processing. You can control the number of CPU cores used with the `--n_jobs` flag. The default is -1, which uses all available cores. ### Running the Full Tutorial Analysis For this tutorial, we will demonstrate the full capabilities of `VacHopPy` by analyzing all 100 HDF5 files from 1700-2100 K, including the `NEB_CSV` file. Execute the following command: ```bash vachoppy analyze TRAJ_TiO2/ POSCAR_TiO2 O --neb neb_TiO2.csv ``` ---- ## Understanding the Output The command prints a detailed summary in the terminal and generates two types of files: several **image files** in a new `imgs/` directory and a comprehensive `parameters.json` file. ### Terminal Output The command prints a multi-step summary direct to the terminal. Let's break down each step. * **Step 1: Automatic t_interval Estimation** The analysis begins by estimating the optimal time interval for analysis (`t_interval`) from a representative file for each temperature. ```{code-block} bash :class: scrollable-output [STEP1] Automatic t_interval Estimation: ============================================================ Automatic t_interval Estimation ============================================================ [1700.0 K] Estimating from TRAJ_1700K/TRAJ_O_01.h5 -> t_interval : 0.075 ps [1800.0 K] Estimating from TRAJ_1800K/TRAJ_O_01.h5 -> t_interval : 0.075 ps [1900.0 K] Estimating from TRAJ_1900K/TRAJ_O_01.h5 -> t_interval : 0.075 ps [2000.0 K] Estimating from TRAJ_2000K/TRAJ_O_01.h5 -> t_interval : 0.075 ps [2100.0 K] Estimating from TRAJ_2100K/TRAJ_O_01.h5 -> t_interval : 0.075 ps ============================================================ Adjusting t_interval to the nearest multiple of dt ============================================================ - dt : 0.0020 ps - Original t_interval : 0.0750 ps - Adjusted t_interval : 0.0740 ps (37 frames) ============================================================ ``` * **Step 2: Trajectory Analysis Progress** Next, `VacHopPy` analyzes all 100 trajectory files in parallel, showing a progress bar. ```bash [STEP2] Vacancy Trajectory Identification: Analyze Trajectory: 100%|##############################| 100/100 [00:09<00:00, 10.70it/s] ``` * **Step 3: Summary on Effective Hopping Parameters** Once the analysis is complete, a summary of the calculated effective hopping parameters is displayed. This includes both the temperature-dependent data and the final parameters from the Arrhenius fits. ```{code-block} bash :class: scrollable-output [STEP3] Hopping Parameter Calculation: ============================================================ Summary for Trajectory dataset - Path to TRAJ bundle : TRAJ_TiO2 (depth=2) - Lattice structure : POSCAR_TiO2 - t_interval : 0.074 ps (37 frames) - Temperatures (K) : [1700.0, 1800.0, 1900.0, 2000.0, 2100.0] - Num. of TRAJ files : [20, 20, 20, 20, 20] ============================================================ ================ Temperature-Dependent Data ================ Temp (K) D (m2/s) D_rand (m2/s) f tau (ps) a (Ang) ---------- ---------- --------------- ------ ---------- --------- 1700 4.176e-10 6.413e-10 0.6511 19.092 2.7104 1800 6.097e-10 9.179e-10 0.6642 13.3759 2.7142 1900 8.228e-10 1.249e-09 0.6589 9.938 2.7287 2000 1.099e-09 1.663e-09 0.661 7.4301 2.7227 2100 1.48e-09 2.262e-09 0.6543 5.4597 2.7224 ============================================================ ================= Final Fitted Parameters ================== Diffusivity (D): - Ea : 0.961 eV - D0 : 2.944e-07 m^2/s - R-squared : 0.9991 Random Walk Diffusivity (D_rand): - Ea : 0.958 eV - D0 : 4.413e-07 m^2/s - R-squared : 0.9988 Correlation Factor (f): - Ea : 0.002 eV - f0 : 0.667 - R-squared : 0.0218 Residence Time (tau): - Ea (fixed) : 0.958 eV - tau0 : 2.777e-02 ps - R-squared : 0.9987 ============================================================ ``` * **Step 4: Attempt Frequency Summary** Finally, if `NEB_CSV` file was provided, the results of the attempt frequency analysis are shown. ```{code-block} bash :class: scrollable-output [STEP4] Attempt Frequency Calculation: ============================================================ Attempt Frequency Analysis Summary ============================================================ -- Temperature-Dependent Effective Parameters -- Temp (K) nu (THz) z ---------- ---------- ------ 1700.0 6.1675 5.8920 1800.0 6.0200 5.9904 1900.0 5.7611 6.0862 2000.0 5.6642 6.1788 2100.0 5.8308 6.2682 -------- -------- - Mean 5.8887 6.0831 -- Path-Wise Parameters (per Temperature) -- Temperature: 1700.0 K Path Name Hop Count nu_path (THz) ----------- ----------- --------------- A1 167 7.8923 A2 252 5.3794 A3 1 10.7228 ---------------------------------------- Temperature: 1800.0 K Path Name Hop Count nu_path (THz) ----------- ----------- --------------- A1 171 7.7409 A2 279 5.3119 A3 0 0 ---------------------------------------- Temperature: 1900.0 K Path Name Hop Count nu_path (THz) ----------- ----------- --------------- A1 161 6.5072 A2 344 5.4858 A3 0 0 ---------------------------------------- Temperature: 2000.0 K Path Name Hop Count nu_path (THz) ----------- ----------- --------------- A1 187 7.2352 A2 353 5.0879 A3 1 3.5067 ---------------------------------------- Temperature: 2100.0 K Path Name Hop Count nu_path (THz) ----------- ----------- --------------- A1 191 7.7369 A2 362 5.1858 A3 0 0 ---------------------------------------- ============================================================ NOTE: nu_paths can be unreliable for paths with low hop counts, as they are sensitive to statistical noise. ============================================================ ``` ```{note} The automatic `t_interval` estimation (Step 1) is skipped if you manually provide a value with the `--t_interval` flag. The attempt frequency calculation (Step 4) is only performed if an `NEB_CSV` file is provided via the `--neb` flag. ``` ### Generated Files `VacHopPy` generates several output files to help you understand and use the calculated parameters. #### Image Files for Visualization A series of plots are saved in the `imgs/` directory to visualize the results. Below are a few key examples: * **Total Hopping Counts per Path** This plot shows how frequently each type of hopping path was observed across all simulations. ```{image} ../../_static/counts.png :height: 350px :align: center :alt: Counts Plot ``` * **Arrhenius plot for diffusivity** This plot shows the temperature dependence of the vacancy diffusivity and the corresponding Arrhenius fit. ```{image} ../../_static/D.png :height: 350px :align: center :alt: Diffusivity Plot ``` * **Attempt Frequency vs Temperature** This plot shows how the effective attempt frequency changes with temperature. ```{image} ../../_static/nu.png :height: 350px :align: center :alt: Attempt Frequency Plot ``` Other generated images (`D_rand.png`, `f.png`, `tau.png`, `a.png`, `z.png`) can also be found in the `imgs/` directory. (section-parameters)= #### The parameters.json File This file contains all the raw calculated data in a machine-readable format. It also includes a helpful `description` key that provides a brief explanation for each parameter. You can easily inspect the contents of this file with a simple Python script: ```python import json with open('parameters.json', 'r') as f: params = json.load(f) data = params['description'] max_key_length = max(len(k) for k in data.keys()) for k, v in data.items(): print(f"{k:<{max_key_length}} : {v}") ``` Example output: ```{code-block} bash :class: scrollable-output D : Diffusivity (m2/s): (n_temperatures,) D0 : Pre-exponential factor for diffusivity (m2/s) Ea_D : Activation barrier for diffusivity (eV) D_rand : Random walk diffusivity (m2/s): (n_temperatures,) D_rand0 : Pre-exponential factor for random walk diffusivity (m2/s) Ea_D_rand : Activation barrier for random walk diffusivity (eV) f : Correlation factor: (n_temperatures,) f0 : Pre-exponential factor for correlation factor Ea_f : Activation barrier for correlation factor (eV) tau : Residence time (ps): (n_temperatures,) tau0 : Pre-exponential factor for residence time (ps) Ea_tau : Activation barrier for residence time (eV) a : Effective hopping distance (Ang) (n_temperatures,) a_path : Path-wise hopping distance (Ang): (n_paths,) nu : Effective attempt frequency (THz): (n_temperatures,) nu_path : Path-wise attempt frequency (THz): (n_temperatures, n_paths) Ea_path : Path-wise hopping barrier (eV): (n_paths) z : Effective number of the equivalent paths: (n_temperatures,) z_path : Number of the equivalent paths of each path: (n_paths,) z_mean : Mean number of the equivalent paths per path type: (n_temperatures,) m_mean : Mean number of path types: (n_temperatures,) P_site : Site occupation probability: (n_temperatures, n_sites) P_esc : Escape probability: (n_temperature, n_paths) times_site : Total residence times at each site (ps): (n_temperature, n_sites) counts_hop : Counts of hops at each temperature: (n_temperature, n_paths) ``` ```{note} **A Note on Activation Barriers** In this context, it is crucial to distinguish between the **hopping barrier** and the **diffusion barrier**. * The **hopping barrier**, represented by `Ea_D_rand`, is the activation energy for a single, random hop. * The **diffusion barrier**, represented by `Ea_D`, is the total activation energy for the macroscopic diffusion process. It encompasses both the hopping barrier and the energetic contributions from jump correlations. This subtle distinction is important. For instance, the activation energy that governs physical quantities like the residence time ($\tau$) or the hopping rate ($\Gamma$) corresponds to the **hopping barrier** (`Ea_D_rand`), not the overall diffusion barrier. ``` ---- ## Advanced Features ### Assessing Data Reliability Using the `--error_bar` flag will display error bars on the plots for $D$, $D_{rand}$, $f$, $\tau$. The error bars represent the **standard error of the mean (SEM)**, which allows for an assessment of the data's reliability. It is calculated as follows: $$ SEM = {\sigma}/{\sqrt{n}} $$ Here, $\sigma$ represents the standard deviation of the physical quantity at each temperature, and $n$ is the number of data points (i.e., the number of HDF5 files used for that temperature). Below is an example of the output plot: ```{image} ../../_static/D_sem.png :height: 350px :align: center :alt: Diffusivity Plot ``` ---- ### Decomposing Vacancy Diffusivity The `--xyz` flag enables the decomposition and analysis of vacancy diffusivity into its x, y, and z components. This analysis can be particularly effective when dealing with anisotropic systems. Example terminal output: ```{code-block} bash :class: scrollable-output [STEP5] Decomposing Diffusivity into xyz-Components: =========== Temperature-Dependent Decomposed Diffusivity =========== T (K) Dx (m2/s) Dy (m2/s) Dz (m2/s) ------- ----------- ----------- ----------- 1700 4.729e-10 5.324e-10 2.475e-10 1800 6.673e-10 7.123e-10 4.496e-10 1900 1.144e-09 6.331e-10 6.915e-10 2000 1.25e-09 1.497e-09 5.507e-10 2100 2.069e-09 1.525e-09 8.47e-10 ==================================================================== =================== Decomposed Diffusivity Fits ==================== Directional Diffusivity (Dx): - Ea : 1.101 eV - D0 : 8.523e-07 m^2/s - R-squared : 0.9710 Directional Diffusivity (Dy): - Ea : 0.867 eV - D0 : 1.818e-07 m^2/s - R-squared : 0.8022 Directional Diffusivity (Dz): - Ea : 0.834 eV - D0 : 8.645e-08 m^2/s - R-squared : 0.8231 ==================================================================== ``` This flag also generates additional plots for the component-wise Mean Squared Displacement (MSD) and component-wise Arrhenius fits. ```{image} ../../_static/msd_xyz.png :width: 750px :align: center :alt: Diffusivity Plot ``` ```{image} ../../_static/D_xyz.png :width: 750px :align: center :alt: Diffusivity Plot ``` ```{note} This component-wise analysis requires significantly more sampling of hopping events than calculating the isotropic vacancy diffusivity. As seen in the example plots, insufficient sampling can lead to poor linearity in the MSD and reduced R² values for the Arrhenius fits. ```