{ "cells": [ { "cell_type": "markdown", "id": "34123934", "metadata": {}, "source": [ "# Vacancy Trajectory Analysis\n", "\n", "You can download source files to follow this tutorial from this [link](https://drive.google.com/file/d/1OV7nbabecWXPq6DxNCrkphOIG-InZ7my/view?usp=drivesdk) (552 MB).\n", "\n", "---\n", "Analyzing vacancy hopping in `VacHopPy` begins with the `Calculator()` factory function and the `Site` class.\n", "\n", "---\n", "## The `Calculator()` Factory Function\n", "\n", "The `Calculator()` function is the main entry point for `VacHopPy`'s analysis pipeline. It gathers all necessary inputs, performs initial setup, and returns a fully configured `CalculatorEnsemble` object, ready for analysis.\n", "\n", "To get started, you call this function as follows:\n", "\n", "```bash\n", "calc_ensemble = Calculator(path_traj, site, t_interval)\n", "```\n", "### Key Arguments\n", "\n", "* **`path_traj` (*str*)** \n", "\n", " Path to the trajectory data. This can be either a single HDF5 trajectory file or a directory containing a bundle of HDF5 files. `VacHopPy` will automatically discover and process all valid files in the specified directory.\n", "\n", "* **`site` (*Site*)**\n", "\n", " An instance of the `Site` class.\n", "\n", "* **`t_interval` (*float, Optional*)**\n", "\n", " The time interval in **picoseconds (ps)** used for averaging atomic positions to determine site occupation. To distinguish true hopping events from thermal vibrations, `VacHopPy` averages the atomic coordinates and forces over each `t_interval`. This averaged quantities are then used to assign an atom to a specific lattice site. Each interval corresponds to a single analysis step. If this argument is not provided, `Calculator()` will automatically search for an optimal `t_interval` for the analysis.\n", "\n", "### Return Value\n", "\n", "Calling the `Calculator()` function returns an instance of the `CalculatorEnsemble` class, which holds all the processed data and methods required for subsequent analysis steps.\n", "\n", "---\n", "## The `Site` Class\n", "\n", "The `Site` class is used to define the structural backbone of your material. It identifies all **lattice sites** from a vacancy-free reference structure and determines the possible **hopping paths** between them.\n", "\n", "You can create a `Site` object like this:\n", "\n", "```bash\n", "site = Site(path_structure, symbol)\n", "```\n", "\n", "### Key Arguments\n", "\n", "* **`path_structure` (*str*)**\n", "\n", " Path to a structure file of the perfect, vacancy-free material. Any format supported by the **Atomic Simulation Environment (ASE)** is compatible.\n", "\n", "* **`symbol` (*str*)**\n", "\n", " The chemical symbol of the diffusing species.\n", "\n", "----\n", "\n", "## Usage Example\n", "\n", "Navigate into the `Example6/` directory you downloaded. In this directory, you will find two files:\n", "\n", "* **`TRAJ_O.h5`**\n", "\n", " This is the HDF5 file containing the MD trajectory data. The example system is **rutile TiO₂** containing **two oxygen vacancies**, simulated at **2100 K**.\n", "\n", "* **`POSCAR_TiO2`**\n", "\n", " This file contains the crystal structure of the perfect, **vacancy-free** rutile rutile TiO₂ supercell.\n" ] }, { "cell_type": "code", "execution_count": 1, "id": "0e5533cd", "metadata": {}, "outputs": [], "source": [ "import os\n", "import numpy as np\n", "from vachoppy.utils import show_traj\n", "from vachoppy.core import Site, Calculator\n", "\n", "path_traj, path_structure = 'TRAJ_O.h5', 'POSCAR_TiO2'\n", "\n", "if not os.path.exists(path_traj): print(f\"{path_traj} not found.\")\n", "if not os.path.exists(path_structure): print(f\"{path_structure} not found.\")" ] }, { "cell_type": "markdown", "id": "09bbb32a", "metadata": {}, "source": [ "---\n", "### 1. How to inspect the HDF5 file\n", "\n", "You can inspect the contents of the HDF5 trajectory file using the `show_traj` method:" ] }, { "cell_type": "code", "execution_count": 2, "id": "2e4a8f69", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "==================================================\n", " Trajectory File: TRAJ_O.h5\n", "==================================================\n", "\n", "[Simulation Parameters]\n", " - Atomic Symbol: O\n", " - Number of Frames: 300000\n", " - Temperature: 2100.0 K\n", " - Time Step: 2.0 fs\n", "\n", "[Composition]\n", " - Counts: O: 46, Ti: 24\n", " - Total Atoms: 70\n", "\n", "[Lattice Vectors (Ang)]\n", " [ 9.29133, 0.00000, 0.00000]\n", " [ 0.00000, 9.29133, 0.00000]\n", " [ 0.00000, 0.00000, 8.90126]\n", "\n", "[Stored Datasets]\n", " - positions: Shape = (300000, 46, 3)\n", " - forces: Shape = (300000, 46, 3)\n", "==================================================\n" ] } ], "source": [ "show_traj(path_traj)" ] }, { "cell_type": "markdown", "id": "ebf6e7e6", "metadata": {}, "source": [ "From the output, the MD run was performed at 2100 K for 600 ps (300,000 iterations × 2.0 fs). The composition shows the system contains **two oxygen vacancies**." ] }, { "cell_type": "markdown", "id": "b1e6a22a", "metadata": {}, "source": [ "---\n", "### 2. Creating a `Site` Instance" ] }, { "cell_type": "code", "execution_count": 3, "id": "097068c5", "metadata": {}, "outputs": [], "source": [ "site = Site(path_structure, 'O')" ] }, { "cell_type": "markdown", "id": "3b36bb90", "metadata": {}, "source": [ "You can inspect the generated lattice information by calling the `.summary()` method.\n", "\n", "`VacHopPy` automatically identifies all possible hopping paths by employing the **Voronoi tessellation** technique. The maximum distance for this search can be adjusted using the `rmax` parameter, which defaults to **3.25 Å**." ] }, { "cell_type": "code", "execution_count": 5, "id": "4b0a5562", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "====================================================================================================\n", " Structure File: POSCAR_TiO2\n", "====================================================================================================\n", "[Structure Information]\n", " - Structure Composition : Ti24 O48\n", " - Lattice Vectors (Ang) :\n", " [ 9.29133, 0.00000, 0.00000]\n", " [ 0.00000, 9.29133, 0.00000]\n", " [ 0.00000, 0.00000, 8.90126]\n", "\n", "[Hopping Path Information]\n", " - Diffusing Symbol : O\n", " - Inequivalent Sites : 1 found\n", " - Inequivalent Paths : 3 found (with Rmax = 3.25 Å)\n", "\n", "Name Init Site Final Site a (Å) z Initial Coord (Frac) Final Coord (Frac)\n", "------ ----------- ------------ ------- --- ------------------------ -------------------------\n", "A1 site1 site1 2.5625 1 [0.0975, 0.4025, 0.1667] [-0.0975, 0.5975, 0.1667]\n", "A2 site1 site1 2.8031 8 [0.0975, 0.4025, 0.1667] [0.3475, 0.3475, 0.0000]\n", "A3 site1 site1 2.9671 2 [0.0975, 0.4025, 0.1667] [0.0975, 0.4025, -0.1667]\n", "====================================================================================================\n", "\n" ] } ], "source": [ "site.summary()" ] }, { "cell_type": "markdown", "id": "6cbfafb0", "metadata": {}, "source": [ "---\n", "### 3. Creating the `CalculatorEnsemble` Instance\n", "\n", "While you can create a `CalculatorEnsemble` instance by calling its constructor directly, the recommended approach is to use the `Calculator()` factory function. This helper function offers a significant advantage: it can automatically search for an optimal `t_interval` if one is not provided, streamlining your analysis setup" ] }, { "cell_type": "code", "execution_count": 6, "id": "e9ac7ca9", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "====================================================================\n", " Automatic t_interval Estimation\n", "====================================================================\n", " Estimating from TRAJ_O.h5\n", " -> t_interval : 0.076 ps\n", "====================================================================\n", " Adjusting t_interval to the nearest multiple of dt\n", "====================================================================\n", " - dt : 0.0020 ps\n", " - Original t_interval : 0.0764 ps\n", " - Adjusted t_interval : 0.0760 ps (38 frames)\n", "====================================================================\n" ] } ], "source": [ "calc_ensemble = Calculator(path_traj, site)" ] }, { "cell_type": "markdown", "id": "021ee656", "metadata": {}, "source": [ "Now that the `calc_ensemble` object is prepared, you can kick off the main analysis by calling the `.calculate()` method.\n", "\n", "This powerful method handles the entire analysis pipeline for you, which automatically includes **identifying the trajectory of each vacancy** and **calculating hopping parameters**." ] }, { "cell_type": "code", "execution_count": 7, "id": "99f761f6", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "73ade4a805e74b958e8fb675938535fc", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Analyze Trajectory: 0%| | 0/1 [00:00, ?it/s]" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Analysis complete: 1 successful, 0 failed.\n", "Execution Time: 2.528 seconds\n", "Peak RAM Usage: 0.043 GB\n" ] } ], "source": [ "calc_ensemble.calculate()" ] }, { "cell_type": "markdown", "id": "af90e699", "metadata": {}, "source": [ "Although this tutorial uses a single HDF5 file, the `CalculatorEnsemble` class is specifically designed to process multiple trajectory files at once.\n", "\n", "The results for each individual HDF5 file are stored in the `.calculators` attribute. This attribute is a list of `CalculatorSingle` objects, sorted first by **temperature** (ascending) and then **alphabetically by file path**.\n", "\n", "You can verify which `CalculatorSingle` instance corresponds to which HDF5 file using the following code:" ] }, { "cell_type": "code", "execution_count": 8, "id": "6cb01bac", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "calculators[0] : /home/jty/Examples/Example6/TRAJ_O.h5\n" ] } ], "source": [ "for i, calc in enumerate(calc_ensemble.calculators):\n", " print(f\"calculators[{i}] : {calc.path_traj}\")" ] }, { "cell_type": "markdown", "id": "a2fccc57", "metadata": {}, "source": [ "For the remainder of this tutorial, we will focus on the results from the first trajectory by selecting its corresponding object." ] }, { "cell_type": "code", "execution_count": null, "id": "76e9079c", "metadata": {}, "outputs": [], "source": [ "calc = calc_ensemble.calculators[0]" ] }, { "cell_type": "markdown", "id": "e107d7c4", "metadata": {}, "source": [ "Key analysis parameters are now stored as attributes in the `calc` object and can be viewed as follows:" ] }, { "cell_type": "code", "execution_count": 9, "id": "385eaff1", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "- Number of Vacancies : 2\n", "- Time Step for the MD Simulation (dt) : 2.0 fs\n", "- Time interval for Averaging (t_interval) : 0.076 ps\n", "- Number of Frames in the MD Simulatoin : 299972 frames\n", "- Number of Frames per Step : 38 frames\n", "- Number of Steps for Analysis : 7894 steps\n" ] } ], "source": [ "print(f\"- Number of Vacancies : {calc.num_vacancies}\")\n", "print(f\"- Time Step for the MD Simulation (dt) : {calc.dt} fs\")\n", "print(f\"- Time interval for Average (t_interval) : {calc.t_interval} ps\")\n", "print(f\"- Number of Frames in the MD Simulatoin : {calc.num_frames} frames\")\n", "print(f\"- Number of Frames per Step : {int(calc.t_interval * 1000 / calc.dt)} frames\")\n", "print(f\"- Number of Steps for Analysis : {calc.num_steps} steps\")" ] }, { "cell_type": "markdown", "id": "b12820ab", "metadata": {}, "source": [ "You might notice that while the MD simulation contains 300,000 frames, only 299,972 are used in the analysis. This is a direct consequence of the time-averaging process controlled by `t_interval`.\n", "\n", "`VacHopPy` groups the raw simulation frames into blocks, with each block corresponding to a single analysis step. The duration of each step is defined by `t_interval`.\n", "\n", "In this example, with a `t_interval` of 0.076 ps (or 76 fs) and a simulation timestep (`dt`) of 2 fs, each analysis step is derived by averaging **76 fs / 2 fs = 38** frames.\n", "\n", "Because the analysis requires complete steps, `VacHopPy` uses the largest number of frames that is perfectly divisible by 38. Any remaining frames at the end of the trajectory are discarded. Therefore, the total number of analysis steps is **int(300000 / 38) = 7894**, and the total frames used is **7894 * 38 = 299972**." ] }, { "cell_type": "markdown", "id": "15eb1a93", "metadata": {}, "source": [ "---\n", "### 4. Analyzing Vacancy Hopping Trajectories\n", "\n", "Once the calculation is complete, you can access the detailed hopping trajectory of each vacancy through two key attributes. These attributes provide different but complementary views of the vacancy movement.\n", "\n", "* **`.hopping_history` (*list*)**\n", "\n", " A list where the length is equal to the number of vacancies (`num_vacancies`). Each element in the list corresponds to a single vacancy and contains its complete hopping history.\n", "\n", "* **`.unwrapped_vacancy_trajectory_coord_cart` (*dict*)**\n", "\n", " A dictionary where each key is an analysis step (an integer) and the value contains the corresponding spatial coordinates. The coordinates are the **PBC-unwrapped Cartesian positions (in Å)** of all vacancies at that specific step. \n", "\n", "---\n", "#### - Usage of `.hopping_history`\n", "\n", "Let's examine the history of the first vacancy (`calc.hopping_history[0]`) to see this in action:" ] }, { "cell_type": "code", "execution_count": 23, "id": "1f3a4799", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of Vacancies: 2\n", "Number of Hopping Events for the First Vacancy: 111\n" ] } ], "source": [ "print(f\"Number of Vacancies: {len(calc.hopping_history)}\")\n", "print(f\"Number of Hopping Events for the First Vacancy: {len(calc.hopping_history[0])}\")" ] }, { "cell_type": "markdown", "id": "32312d7e", "metadata": {}, "source": [ "The raw dictionary output for a single event can be hard to read. We can easily format it for better clarity:" ] }, { "cell_type": "code", "execution_count": 26, "id": "f1bcb75b", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "--- Details of the Last Hopping Event ---\n", "- site_init : site1\n", "- site_final : site1\n", "- distance : 2.5624908680896756\n", "- z : 1\n", "- coord_init : [0.098, 0.402, 0.167]\n", "- coord_final : [-0.098, 0.598, 0.167]\n", "- name : A1\n", "- step : 7825\n", "- index_init : 24\n", "- index_final : 36\n" ] } ], "source": [ "# Get the last hopping event of the first vacancy\n", "last_event = calc.hopping_history[0][-1]\n", "\n", "print(\"\\n--- Details of the Last Hopping Event ---\")\n", "for key, value in last_event.items():\n", " if 'coord' in key and isinstance(value, np.ndarray):\n", " print(f\"- {key:<12}: [{value[0]:.3f}, {value[1]:.3f}, {value[2]:.3f}]\")\n", " else:\n", " print(f\"- {key:<12}: {value}\")" ] }, { "cell_type": "markdown", "id": "0f555bfd", "metadata": {}, "source": [ "---\n", "#### - Usage of `.unwrapped_vacancy_trajectory_coord_cart`\n", "\n", "To get the coordinates at the 100th step:" ] }, { "cell_type": "code", "execution_count": 44, "id": "ddae8323", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[0.90597733 3.73968856 1.48354252]\n", " [3.22881028 3.22881028 2.96708503]]\n" ] } ], "source": [ "coords_at_step_100 = calc.unwrapped_vacancy_trajectory_coord_cart[100]\n", "print(coords_at_step_100)" ] }, { "cell_type": "markdown", "id": "e7e4365f", "metadata": {}, "source": [ "The result is a NumPy array where each row represents the **[x, y, z]** coordinates of a single vacancy at the specified step. The first row corresponds to the first vacancy, the second row to the second, and so on. " ] }, { "cell_type": "markdown", "id": "1467f1c8", "metadata": {}, "source": [ "---\n", "#### - Save the Vacancy Hopping Histories\n", "\n", "You can save this unwrapped trajectory into the `trajectory.json` file using `.save_trajectory()` method." ] }, { "cell_type": "code", "execution_count": 45, "id": "e297690c", "metadata": {}, "outputs": [], "source": [ "calc.save_trajectory()" ] }, { "cell_type": "markdown", "id": "ea7dad96", "metadata": {}, "source": [ "---\n", "#### - Summary of Vacancy Hopping Histories\n", "\n", "You can print the hopping history of each vacancy easily using `.show_hopping_history()`:" ] }, { "cell_type": "code", "execution_count": null, "id": "b76cf6af", "metadata": {}, "outputs": [], "source": [ "calc.show_hopping_history()" ] }, { "cell_type": "markdown", "id": "1843d4de", "metadata": {}, "source": [ "```bash\n", "====================================================================================================================\n", " Hopping Sequence of Vacancy 0\n", "====================================================================================================================\n", "Num Time (ps) Path a (Ang) Initial Site (Fractional Coordinate) Final Site (Fractional Coordinate)\n", "----- ----------- ------ --------- -------------------------------------- ------------------------------------\n", "1 11.93 A2 2.80311 site1 [0.09751, 0.40249, 0.16667] site1 [0.15249, 0.15249, 0.33333]\n", "2 13.6 A1 2.56249 site1 [0.15249, 0.15249, 0.33333] site1 [0.34751, 0.34751, 0.33333]\n", "3 14.44 A1 2.56249 site1 [0.34751, 0.34751, 0.33333] site1 [0.15249, 0.15249, 0.33333]\n", "... (rest of table) ...\n", "====================================================================================================================\n", "\n", "====================================================================================================================\n", " Hopping Sequence of Vacancy 1\n", "====================================================================================================================\n", "Num Time (ps) Path a (Ang) Initial Site (Fractional Coordinate) Final Site (Fractional Coordinate)\n", "----- ----------- ------ --------- -------------------------------------- ------------------------------------\n", "1 7.3 A2 2.80311 site1 [0.40249, 0.59751, 0.50000] site1 [0.34751, 0.34751, 0.33333]\n", "2 8.13 A1 2.56249 site1 [0.34751, 0.34751, 0.33333] site1 [0.15249, 0.15249, 0.33333]\n", "3 8.36 A1 2.56249 site1 [0.15249, 0.15249, 0.33333] site1 [0.34751, 0.34751, 0.33333]\n", "... (rest of table) ...\n", "====================================================================================================================\n", "```" ] }, { "cell_type": "markdown", "id": "0f4dfb94", "metadata": {}, "source": [ "---\n", "### 4. Visualizing Vacancy Hopping Trajectories\n", "\n", "`VacHopPy` provides three powerful methods for visualizing the vacancy hopping trajectories, each offering a unique perspective on the diffusion process.\n", "\n", "* **`plot_vacancy_trajectory()`**\n", "\n", " This method generates a static **interactive 3D plot** of the complete vacancy paths.\n", "\n", "* **`animate_occupation()`**\n", "\n", " This method creates a **GIF animation** showing the time evolution of **site occupations**. It visualizes how individual atoms move between lattice sites and, as a consequence, how vacancies propagate through the structure. \n", "\n", "* **`animate_vacancy_trajectory()`**\n", "\n", " This method generates an **interactive HTML-based video** of the vacancy trajectories." ] }, { "cell_type": "markdown", "id": "65b7e06c", "metadata": {}, "source": [ "---\n", "#### - Usage of `plot_vacancy_trajectory()`\n", "\n", "You can customize which trajectories are displayed using the following key arguments:\n", "\n", "* **`vacancy_indices`**\n", "\n", " Specifies which vacancies to visualize. You can pass a single index (e.g., `0`) or a list of indices (e.g., `[0, 2]`).\n", "\n", "* **`unwrap`**\n", "\n", " Set this argument to `True` to plot the PBC-unwrapped trajectory.\n", "\n", "Furthermore, you can adjust the range of steps using `step_init` and `step_final` arguments" ] }, { "cell_type": "code", "execution_count": null, "id": "b22a267c", "metadata": {}, "outputs": [], "source": [ "calc.plot_vacancy_trajectory(vacancy_indices=[0, 1], unwrap=True)" ] }, { "cell_type": "markdown", "id": "a1a74463", "metadata": {}, "source": [ "Example output:\n", "