Input Generation

To follow this tutorial, please download and unzip Example1/ from this link (177 MB).

VacHopPy uses the HDF5 format as its primary input for analysis. This format allows for highly efficient, streaming-based data access. As a result, VacHopPy can process massive trajectory datasets (hundreds of gigabytes) quickly while consuming minimal RAM, typically only a few gigabytes.

Before starting an analysis, you must convert your MD trajectory into this HDF5 format. While this can be done using the Python API (see parse_md() and parse_lammps()), the most convenient method is the command-line interface (CLI). VacHopPy employs the Atomic Simulation Environment (ASE) to parse position and force data from a wide variety of MD formats. For a full list of compatible file formats, please see the official ASE documentation.

Warning

Your MD trajectory file must contain both position and force data for each atom. VacHopPy uses force information to accurately determine site occupations.

Note

Based on our experience, ASE can have issues parsing the lammps-dump-text format. To ensure robust handling of these files, VacHopPy automatically uses MDAnalysis as a backend for this specific format.

How to Convert Trajectories to HDF5

To convert your MD trajectory into the HDF5 format, VacHopPy provides the convert command.

vachoppy convert [PATH_TRAJ] [TEMPERATURE] [TIMESTEP] --label [LABEL]

This command takes the following primary arguments:

PATH_TRAJ: The path to your source MD trajectory file (e.g., vasprun.xml).
TEMPERATURE: The simulation temperature in Kelvin.
TIMESTEP: The time step between frames in femtoseconds (fs).
--label [LABEL] (Optional): A suffix to append to the output HDF5 filenames.

For a full list of all available options, use the -h flag:

vachoppy convert -h

Converting VASP Results into HDF5

Navigate into the example folder you downloaded (path/to/Example1/). The directory contains two subfolders: VASP/ and LAMMPS/. Each subfolder contains MD outputs produced with the corresponding code. First, enter the VASP/ directory:

cd path/to/Example1/VASP
ls
# >> OUTCAR_RUN01  OUTCAR_RUN02

The VASP/ folder contains two files: OUTCAR_RUN01 and OUTCAR_RUN02. These are two consecutive MD runs. It is common to continue a simulation after the initial run if hopping events were not sufficiently sampled — the two files here represent such a continuation. The example system is rutile TiO₂ containing two oxygen vacancies.

Run the following vachoppy convert commands to convert each OUTCAR run into HDF5 input:

vachoppy convert OUTCAR_RUN01 2100 2 --label 01
# produces: TRAJ_Ti_01.h5, TRAJ_O_01.h5

vachoppy convert OUTCAR_RUN02 2100 2 --label 02
# produces: TRAJ_Ti_02.h5, TRAJ_O_02.h5

Each command creates two HDF5 files (one per atomic species) because VacHopPy stores trajectories split by atomic species to save disk space and enable efficient streaming access. For this tutorial, we are interested only in oxygen vacancies, so from now on we will use the TRAJ_O_*.h5 files.

Note

The explanations in this section apply equally to files in formats other than lammps-dump-text (e.g., extxyz).

Concatenating Two HDF5 Files

The generated TRAJ_O_01.h5 and TRAJ_O_02.h5 files contain two consecutive segments of the same MD trajectory. Combining them into a single HDF5 file simplifies file management and subsequent analyses. VacHopPy provides the concat command to join two consecutive HDF5 files.

vachoppy concat TRAJ_O_01.h5 TRAJ_O_02.h5
# produces: TRAJ_O_CONCAT.h5

This command writes a single output file (by default TRAJ_O_CONCAT.h5) containing frames from the first file followed by frames from the second file. When concatenating, VacHopPy takes periodic boundary conditions (PBC) into account and applies a positional offset to the second file.

Converting LAMMPS Results into HDF5

Enter the LAMMPS/ directory:

cd path/to/Example1/LAMMPS
ls
# >> lammps.data  lammps.dump

The directory contains two files: lammps.data, which defines the initial atomic structure, and lammps.dump, which stores the MD trajectory. Because LAMMPS outputs can vary widely depending on user settings, converting its trajectories requires explicitly specifying the file format and atom styles.

To convert the trajectory into HDF5 format, run:

vachoppy convert lammps.dump 2100 2 \
  --label 01 \
  --format lammps-dump-text \
  --lammps_data lammps.data \
  --atom_style_data 'id type x y z' \
  --atom_style_dump 'id type x y z fx fy fz' \
  --atom_symbols 1=Ti 2=O
# produces: TRAJ_Ti_01.h5  TRAJ_O_01.h5

This command includes several additional arguments compared to the VASP example:

--format: Specifies the input file format (lammps-dump-text).
--lammps_data: Path to the LAMMPS data file defining the initial structure.
--atom_style_data: Atom style used in the data file (e.g., ‘id type x y z’).
--atom_style_dump: Atom style used in the dump file (e.g., ‘id type x y z fx fy fz’).
--atom_symbols: Mapping between atom types and chemical symbols (e.g., 1=Ti 2=O).

This command creates two HDF5 files, TRAJ_Ti_01.h5 and TRAJ_O_01.h5, just like in the VASP example. For more details on parsing LAMMPS trajectories and supported atom styles, refer to the official MDAnalysis documentation.

How to Read HDF5 Files

You can inspect the contents of an HDF5 trajectory using the show command:

vachoppy show TRAJ_O_01.h5

This command prints key information about the stored MD simulation.

==================================================
  Trajectory File: TRAJ_O_01.h5
==================================================

[Simulation Parameters]
  - Atomic Symbol:      O
  - Number of Frames:   10000
  - Temperature:        2100.0 K
  - Time Step:          2.0 fs

[Composition]
  - Counts:             O: 46, Ti: 24
  - Total Atoms:        70

[Lattice Vectors (Å)]
  [  9.16177,   0.00000,   0.00000]
  [  0.00000,   9.16177,   0.00000]
  [  0.00000,   0.00000,   8.92242]

[Stored Datasets]
  - positions:          Shape = (10000, 46, 3)
  - forces:             Shape = (10000, 46, 3)
==================================================

If you want to directly access the stored data for your own analysis, you can use the following Python script:

import h5py
import json
import numpy as np

path_traj = 'TRAJ_O_01.h5'  # HDF5 file to read

with h5py.File(path_traj, 'r') as f:
    conditions = json.loads(f.attrs['metadata'])
    positions = np.array(f['positions'][:], dtype=np.float64)
    forces = np.array(f['forces'][:], dtype=np.float64)

print('=' * 40)
print('  Metadata')
print('=' * 40)
print(f"    Chemical Symbol   : {conditions['symbol']}")
print(f"    Number of Frames  : {conditions['nsw']}")
print(f"    Temperature (K)   : {conditions['temperature']}")
print(f"    Timestep (fs)     : {conditions['dt']}")
print('=' * 40)
print('  Main Data')
print('=' * 40)
print(f"    Shape of Positions: {positions.shape}")
print(f"    Shape of Forces   : {forces.shape}")
print('=' * 40)

Example output:

========================================
  Metadata
========================================
    Chemical Symbol   : O
    Number of Frames  : 10000
    Temperature (K)   : 2100.0
    Timestep (fs)     : 2.0
========================================
  Main Data
========================================
    Shape of Positions: (10000, 46, 3)
    Shape of Forces   : (10000, 46, 3)
========================================

The datasets positions and forces are stored in the shape (n_frames, n_atoms, 3). Note that the positions contains PBC-unwrapped fractional coordinates. It can be converted to Cartesian coordiantes using lattice parameters, which can be obtained from conditions['lattice'].