{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "23347dcb",
   "metadata": {},
   "source": [
    "# Hopping Parameter Extraction\n",
    "\n",
    "You can download source files to follow this tutorial from this [link](https://drive.google.com/file/d/1_gMT74f_1PqxQ8Um1-y_i11Y2tKg7-3f/view?usp=drivesdk) (26 GB).\n",
    "\n",
    "```{note}\n",
    "This is the same example set used in the **Hopping Parameter Extraction** or **Mean Square Displacement** section in `CLI Tutorial` documentation. If you have already completed that tutorial, you do not need to download the files again.\n",
    "```\n",
    "\n",
    "---\n",
    "\n",
    "Calculating effective hopping parameters relies on the `Calculator()` factory function and the `Site` class. For a detailed explanation of these core components, please refer to the previous tutorial.\n",
    "\n",
    "First, navigate to the `Example3` directory you downloaded. Inside, you will find three items: the `TRAJ_TiO2` directory, a `POSCAR_TiO2` file, and a `neb.csv` file.\n",
    "\n",
    "The `TRAJ_TiO2` directory contains a thermal ensemble of HDF5 trajectory files from MD simulations performed at various temperatures. This example includes simulations from 1700 K to 2100 K, with 20 independent runs at each temperature.\n",
    "\n",
    "The file structure is as follows:\n",
    "\n",
    "```bash\n",
    "TRAJ_TiO2/\n",
    "├── TRAJ_1700K/\n",
    "│   ├── TRAJ_O_01.h5\n",
    "│   ├──    ...\n",
    "│   └── TRAJ_O_20.h5\n",
    "├── TRAJ_1800K/\n",
    "│   ├── TRAJ_O_01.h5\n",
    "│   ├──    ...\n",
    "│   └── TRAJ_O_20.h5\n",
    "├── TRAJ_1900K/\n",
    "│   ├── TRAJ_O_01.h5\n",
    "│   ├──    ...\n",
    "│   └── TRAJ_O_20.h5\n",
    "├── TRAJ_2000K/\n",
    "│   ├── TRAJ_O_01.h5\n",
    "│   ├──    ...\n",
    "│   └── TRAJ_O_20.h5\n",
    "└── TRAJ_2100K/\n",
    "    ├── TRAJ_O_01.h5\n",
    "    ├──    ...\n",
    "    └── TRAJ_O_20.h5\n",
    " ```\n",
    "\n",
    "A key advantage of `VacHopPy` is its ability to process a large number of trajectories from multiple NVT ensembles simultaneously and in a memory-efficient way."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "96c12618",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "from vachoppy.core import Site, Calculator\n",
    "\n",
    "path_traj = 'TRAJ_TiO2'\n",
    "path_structure = 'POSCAR_TiO2'\n",
    "if not os.path.exists(path_traj): print(f\"{path_traj} not found.\")\n",
    "if not os.path.exists(path_structure): print(f\"{path_structure} not found.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ff28ce42",
   "metadata": {},
   "source": [
    "---\n",
    "## Usage Example\n",
    "\n",
    "### - Creating the `Site` and `CalculatorEnsemble` Instances\n",
    "\n",
    "The first step in our analysis is to define the crystal's structural framework by creating a `Site` instance. This object analyzes a **perfect, vacancy-free supercell** to identify all **lattice sites** and **hopping paths**.\n",
    "\n",
    "Next, we pass this `Site` object to the `Calculator()` factory function. This function automatically processes all trajectories found in `path_traj`. A key feature is its ability to automatically determine an optimal `t_interval`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "8a70cf94",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "====================================================================\n",
      "                   Automatic t_interval Estimation\n",
      "====================================================================\n",
      "  [1700.0 K] Estimating from TRAJ_1700K/TRAJ_O_01.h5\n",
      "             -> t_interval : 0.075 ps\n",
      "  [1800.0 K] Estimating from TRAJ_1800K/TRAJ_O_01.h5\n",
      "             -> t_interval : 0.075 ps\n",
      "  [1900.0 K] Estimating from TRAJ_1900K/TRAJ_O_01.h5\n",
      "             -> t_interval : 0.075 ps\n",
      "  [2000.0 K] Estimating from TRAJ_2000K/TRAJ_O_01.h5\n",
      "             -> t_interval : 0.075 ps\n",
      "  [2100.0 K] Estimating from TRAJ_2100K/TRAJ_O_01.h5\n",
      "             -> t_interval : 0.075 ps\n",
      "====================================================================\n",
      "        Adjusting t_interval to the nearest multiple of dt\n",
      "====================================================================\n",
      "    - dt                  : 0.0020 ps\n",
      "    - Original t_interval : 0.0750 ps\n",
      "    - Adjusted t_interval : 0.0740 ps (37 frames)\n",
      "====================================================================\n"
     ]
    }
   ],
   "source": [
    "site = Site(path_structure, 'O')\n",
    "calc_ensemble = Calculator(path_traj, site)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ecd8b508",
   "metadata": {},
   "source": [
    "When `path_traj` points to a directory, `Calculator()` automatically discovers HDF5 files within it. By default, this search extends down to two subdirectory levels (a depth of 2). You can control this search depth using the `depth` argument if your file structure is different.\n",
    "\n",
    "---\n",
    "### - Calculating Hopping Parameters\n",
    "\n",
    "Once the `CalculatorEnsemble` is configured, you can initiate the main analysis by calling the `.calculate()` method.\n",
    "\n",
    "This method is highly optimized for performance and is designed to handle very large datasets.\n",
    "\n",
    "* **Memory Efficiency**\n",
    "    \n",
    "    It processes trajectories in a **streaming** fashion, loading data in small chunks to minimize RAM usage.\n",
    "\n",
    "* **Speed** \n",
    "\n",
    "    It leverages **parallel processing** to perform computations on multiple CPU cores simultaneously, significantly speeding up the analysis. You can control the number of CPU cores used for this parallel computation with the `n_jobs` argument. Setting `n_jobs=-1` will use all available cores."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "209c110d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "317674348d6241e0b2b7bf9f0e5ec249",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Analyze Trajectory:   0%|                              | 0/100 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Analysis complete: 100 successful, 0 failed.\n",
      "Execution Time: 16.900 seconds\n",
      "Peak RAM Usage: 1.153 GB\n"
     ]
    }
   ],
   "source": [
    "calc_ensemble.calculate()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7b4376af",
   "metadata": {},
   "source": [
    "This method processes all HDF5 files found within the `path_traj` directory, calculating the vacancy hopping properties for each one. The results from each individual file are then stored in the `.calculators` attribute as a list of `CalculatorSingle` objects.\n",
    "\n",
    "You can inspect the first few results to see which files were processed:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "9e840c5e",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Number of Data: 100\n",
      "\n",
      "[Index 0] : /home/jty/Examples/Example3/TRAJ_TiO2/TRAJ_1700K/TRAJ_O_01.h5\n",
      "[Index 1] : /home/jty/Examples/Example3/TRAJ_TiO2/TRAJ_1700K/TRAJ_O_02.h5\n",
      "[Index 2] : /home/jty/Examples/Example3/TRAJ_TiO2/TRAJ_1700K/TRAJ_O_03.h5\n"
     ]
    }
   ],
   "source": [
    "print(f\"Number of Data: {len(calc_ensemble.calculators)}\\n\")\n",
    "\n",
    "# Display the file paths for the first three results\n",
    "for i, calc in enumerate(calc_ensemble.calculators[:3]):\n",
    "    print(f\"[Index {i}] : {calc.path_traj}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "16e4ca76",
   "metadata": {},
   "source": [
    "You can now view the calculated hopping parameters for each trajectory by calling the `.summary()` method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "4ad64cfb",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "====================================================================\n",
      "Summary for Trajectory dataset\n",
      "  - Path to TRAJ bundle : TRAJ_TiO2 (depth=2)\n",
      "  - Lattice structure   : POSCAR_TiO2\n",
      "  - t_interval          : 0.074 ps (37 frames)\n",
      "  - Temperatures (K)    : [1700.0, 1800.0, 1900.0, 2000.0, 2100.0]\n",
      "  - Num. of TRAJ files  : [20, 20, 20, 20, 20]\n",
      "====================================================================\n",
      "\n",
      "==================== Temperature-Dependent Data ====================\n",
      "T (K)    D (m2/s)    D_rand (m2/s)    f       tau (ps)    a (Ang)\n",
      "-------  ----------  ---------------  ------  ----------  ---------\n",
      "1700     4.176e-10   6.413e-10        0.6511  19.092      2.7104\n",
      "1800     6.097e-10   9.179e-10        0.6642  13.3759     2.7142\n",
      "1900     8.228e-10   1.249e-09        0.6589  9.938       2.7287\n",
      "2000     1.099e-09   1.663e-09        0.661   7.4301      2.7227\n",
      "2100     1.48e-09    2.262e-09        0.6543  5.4597      2.7224\n",
      "====================================================================\n",
      "\n",
      "===================== Final Fitted Parameters ======================\n",
      "Diffusivity (D):\n",
      "  - Ea          : 0.961 eV\n",
      "  - D0          : 2.944e-07 m^2/s\n",
      "  - R-squared   : 0.9991\n",
      "Random Walk Diffusivity (D_rand):\n",
      "  - Ea          : 0.958 eV\n",
      "  - D0          : 4.413e-07 m^2/s\n",
      "  - R-squared   : 0.9988\n",
      "Correlation Factor (f):\n",
      "  - Ea          : 0.002 eV\n",
      "  - f0          : 0.667\n",
      "  - R-squared   : 0.0218\n",
      "Residence Time (tau):\n",
      "  - Ea (fixed)  : 0.958 eV\n",
      "  - tau0        : 2.777e-02 ps\n",
      "  - R-squared   : 0.9987\n",
      "====================================================================\n"
     ]
    }
   ],
   "source": [
    "calc_ensemble.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f88f15d6",
   "metadata": {},
   "source": [
    "### - Plotting the Hopping Parameters\n",
    "\n",
    "`VacHopPy` includes a suite of convenient methods to visualize the calculated hopping parameters as a function of temperature. \n",
    "\n",
    "The following table summarizes the available plotting functions:\n",
    "\n",
    "<div align=\"center\">\n",
    "\n",
    "| Method | Description |\n",
    "| :---: | :--- |\n",
    "| **`plot_D()`** | Generates an Arrhenius plot of the calculated **diffusivity ($D$)**. |\n",
    "| **`plot_D_rand()`** | Creates an Arrhenius plot for the **random walk diffusivity ($D_{rand}$)**. |\n",
    "| **`plot_f()`** | Plots the **correlation factor ($f$)** as a function of temperature. |\n",
    "| **`plot_tau()`** | Visualizes the **vacancy residence time ($\\tau$)** across different temperatures. |\n",
    "| **`plot_a()`** | Shows the average **hopping distance ($a$)** as a function of temperature. |\n",
    "| **`plot_counts()`** | Displays the total **count** for each unique type of hopping path. |\n",
    "\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e3ef6178",
   "metadata": {},
   "outputs": [],
   "source": [
    "calc_ensemble.plot_D()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "08ff16fb",
   "metadata": {},
   "source": [
    "Example output:\n",
    "\n",
    "<div align=\"center\">\n",
    "\n",
    "  <img src=\"https://github.com/user-attachments/assets/98dc26c5-a7c2-4b9d-9c5b-8540d0803963\" alt=\"Image\" width=\"400\">\n",
    "</div>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a69a705e",
   "metadata": {},
   "source": [
    "#### - Saving the Hopping Parameters\n",
    "\n",
    "Once your analysis is complete, you can save all the calculated effective hopping parameters to a file using the `.save_parameters()` method. This command exports the results into a `parameters.json` file, making it easy to reload or analyze them later without re-running the entire calculation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "4ba806e1",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Parameters saved to 'parameters.json'\n"
     ]
    }
   ],
   "source": [
    "calc_ensemble.save_parameters()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c6273fc6",
   "metadata": {},
   "source": [
    "### - Calculating the Attempt Frequency\n",
    "\n",
    "To calculate the **effective attempt frequency** ($\\nu$) and the **effective coordination number** ($z$), `VacHopPy` requires the pre-calculated energy barriers for each unique type of hopping path.\n",
    "\n",
    "This information must be provided in a CSV file. Each row in this file should correspond to a specific hopping path and contain its calculated hopping barrier, which is typically obtained from methods like the nudged elastic band (NEB).\n",
    "\n",
    "Below is an example of the CSV file:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "5e789932",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "   A1  0.8698\n",
      "0  A2   1.058\n",
      "1  A3   1.766\n"
     ]
    }
   ],
   "source": [
    "neb_csv = 'neb_TiO2.csv'\n",
    "if not os.path.exists(neb_csv):\n",
    "    print(f\"'{neb_csv}' not found.\")\n",
    "else:\n",
    "    data = pd.read_csv(neb_csv)\n",
    "    print(data)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e0f62c74",
   "metadata": {},
   "source": [
    "This calculation is performed by calling the `.calculate_attempt_frequency()` method. Simply provide the path to the CSV file containing the hopping barriers as an argument.\n",
    "\n",
    "Upon completion, the method will print a summary of the calculation. Additionally, if a `parameters.json` file from a previous run exists, this method will automatically update it by adding the newly calculated attempt frequency and coordination number.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "23b87a1d",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Parameters saved to 'parameters.json'\n",
      "====================================================================\n",
      "                Attempt Frequency Analysis Summary\n",
      "====================================================================\n",
      "\n",
      "-- Temperature-Dependent Effective Parameters --\n",
      "\n",
      "Temp (K)    nu (THz)    z\n",
      "----------  ----------  ------\n",
      "1700.0      6.1675      5.8920\n",
      "1800.0      6.0200      5.9904\n",
      "1900.0      5.7611      6.0862\n",
      "2000.0      5.6642      6.1788\n",
      "2100.0      5.8308      6.2682\n",
      "--------    --------    -\n",
      "Mean        5.8887      6.0831\n",
      "\n",
      "-- Path-Wise Parameters (per Temperature) --\n",
      "\n",
      "Temperature: 1700.0 K\n",
      "Path Name    Hop Count    nu_path (THz)\n",
      "-----------  -----------  ---------------\n",
      "A1           167          7.8923\n",
      "A2           252          5.3794\n",
      "A3           1            10.7228\n",
      "----------------------------------------\n",
      "Temperature: 1800.0 K\n",
      "Path Name    Hop Count    nu_path (THz)\n",
      "-----------  -----------  ---------------\n",
      "A1           171          7.7409\n",
      "A2           279          5.3119\n",
      "A3           0            0\n",
      "----------------------------------------\n",
      "Temperature: 1900.0 K\n",
      "Path Name    Hop Count    nu_path (THz)\n",
      "-----------  -----------  ---------------\n",
      "A1           161          6.5072\n",
      "A2           344          5.4858\n",
      "A3           0            0\n",
      "----------------------------------------\n",
      "Temperature: 2000.0 K\n",
      "Path Name    Hop Count    nu_path (THz)\n",
      "-----------  -----------  ---------------\n",
      "A1           187          7.2352\n",
      "A2           353          5.0879\n",
      "A3           1            3.5067\n",
      "----------------------------------------\n",
      "Temperature: 2100.0 K\n",
      "Path Name    Hop Count    nu_path (THz)\n",
      "-----------  -----------  ---------------\n",
      "A1           191          7.7369\n",
      "A2           362          5.1858\n",
      "A3           0            0\n",
      "----------------------------------------\n",
      "====================================================================\n",
      "NOTE: nu_paths can be unreliable for paths with low hop counts,\n",
      "      as they are sensitive to statistical noise.\n",
      "====================================================================\n",
      "\n"
     ]
    }
   ],
   "source": [
    "calc_ensemble.calculate_attempt_frequency(neb_csv)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "61585cfe",
   "metadata": {},
   "source": [
    "After running the `.calculate_attempt_frequency()` method, you can visualize its primary results using two newly available plotting functions:\n",
    "\n",
    "<div align=\"center\">\n",
    "\n",
    "| Method          | Description                                       |\n",
    "| :-------------: | :----------------------------------------------------------------- |\n",
    "| **`plot_nu()`** | Plots the **attempt frequency ($\\nu$)** as a function of temperature. |\n",
    "| **`plot_z()`** | Plots the **coordination number ($z$)**  as a function of temperature.  |\n",
    "\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e44a7291",
   "metadata": {},
   "outputs": [],
   "source": [
    "calc_ensemble.plot_nu()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d1b53aa2",
   "metadata": {},
   "source": [
    "Example output:\n",
    "\n",
    "<div align=\"center\">\n",
    "\n",
    "  <img src=\"https://github.com/user-attachments/assets/fb3b6dbe-0cbe-4efd-baa7-9aac247962f1\" alt=\"Image\" width=\"400\">\n",
    "\n",
    "</div>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "13d24d6b",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "## List of Extractable Parameters\n",
    "\n",
    "The specific effective hopping parameters that `VacHopPy` can calculate depend entirely on the data you provide. There are two factors:\n",
    "\n",
    "* **Number of Temperatures**\n",
    "\n",
    "    Do your HDF5 files represent a single temperature or a range of temperatures?\n",
    "\n",
    "* **NEB_CSV** \n",
    "\n",
    "    Have you provided a `NEB_CSV` file with pre-calculated hopping barriers?\n",
    "\n",
    "The following table summarizes what can be calculated under different conditions. (**O** = Calculable, **X** = Not Calculable).\n",
    "\n",
    "<div align=\"center\">\n",
    "\n",
    "| Parameter | Symbol | Single T | Multiple T | `NEB_CSV` Required? |\n",
    "| :--- | :---: | :---: | :---: | :---: |\n",
    "| Diffusivity | $D$ | O | O | No |\n",
    "| Residence Time | $\\tau$ | O | O | No |\n",
    "| Correlation Factor | $f$ | O | O | No |\n",
    "| Hopping Barrier | $E_a$ | X | O | No |\n",
    "| Hopping Distance | $a$ | O | O | No |\n",
    "| Coordination Number | $z$ | X | O | **Yes** |\n",
    "| Attempt Frequency | $\\nu$ | O | O | **Yes** |\n",
    "\n",
    "</div>\n",
    "\n",
    "To demonstrate this, we can run the analysis on a subset of the data corresponding to a single temperature (2100 K).\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "7d7e4a35",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "====================================================================\n",
      "                   Automatic t_interval Estimation\n",
      "====================================================================\n",
      "  [2100.0 K] Estimating from TRAJ_2100K/TRAJ_O_01.h5\n",
      "             -> t_interval : 0.075 ps\n",
      "====================================================================\n",
      "        Adjusting t_interval to the nearest multiple of dt\n",
      "====================================================================\n",
      "    - dt                  : 0.0020 ps\n",
      "    - Original t_interval : 0.0751 ps\n",
      "    - Adjusted t_interval : 0.0760 ps (38 frames)\n",
      "====================================================================\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "5be2f0dbc5034486be39629701a84824",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Analyze Trajectory:   0%|                              | 0/20 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Analysis complete: 20 successful, 0 failed.\n",
      "Execution Time: 2.088 seconds\n",
      "Peak RAM Usage: 0.136 GB\n"
     ]
    }
   ],
   "source": [
    "# Create an analysis object for only the 2100 K simulations\n",
    "# A single HDF5 also can be used (e.g., TRAJ_TiO2/TRAJ_2100K/TRAJ_O_01.h5)\n",
    "path_single_temp = os.path.join('TRAJ_TiO2', 'TRAJ_2100K')\n",
    "\n",
    "calc_ensemble_single = Calculator(path_single_temp, site)\n",
    "calc_ensemble_single.calculate()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "425df73d",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "====================================================================\n",
      "Summary for Trajectory dataset\n",
      "  - Path to TRAJ bundle : TRAJ_TiO2/TRAJ_2100K (depth=2)\n",
      "  - Lattice structure   : POSCAR_TiO2\n",
      "  - t_interval          : 0.076 ps (38 frames)\n",
      "  - Temperatures (K)    : [2100.0]\n",
      "  - Num. of TRAJ files  : [20]\n",
      "====================================================================\n",
      "\n",
      "==================== Temperature-Dependent Data ====================\n",
      "T (K)    D (m2/s)    D_rand (m2/s)    f       tau (ps)    a (Ang)\n",
      "-------  ----------  ---------------  ------  ----------  ---------\n",
      "2100     1.481e-09   2.254e-09        0.6568  5.4786      2.7221\n",
      "====================================================================\n",
      "\n",
      "===================== Final Fitted Parameters ======================\n",
      "Diffusivity (D):\n",
      "  - Ea          : - \n",
      "  - D0          : - \n",
      "  - R-squared   : -\n",
      "Random Walk Diffusivity (D_rand):\n",
      "  - Ea          : -\n",
      "  - D0          : -\n",
      "  - R-squared   : -\n",
      "Correlation Factor (f):\n",
      "  - Ea          : -\n",
      "  - f0          : -\n",
      "  - R-squared   : -\n",
      "Residence Time (tau):\n",
      "  - Ea (fixed)  : -\n",
      "  - tau0        : -\n",
      "  - R-squared   : -\n",
      "====================================================================\n"
     ]
    }
   ],
   "source": [
    "# Display the results\n",
    "calc_ensemble_single.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e97b7e12",
   "metadata": {},
   "source": [
    "As you can see in the `.summary()` output, `VacHopPy` successfully calculated all the hopping parameters available for the single temperature of 2100 K.\n",
    "\n",
    "However, the **Final Fitted Parameters** section is empty. This is because parameters like the activation energy ($E_a$) are determined by performing an Arrhenius fit across **a range of temperatures**. Since we only provided data for a single temperature, this fitting is not possible.\n",
    "\n",
    "Next, we will calculate the attempt frequency and coordination number by calling the `.calculate_attempt_frequency()` method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "313aa6bb",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Parameters saved to 'parameters.json'\n",
      "====================================================================\n",
      "                Attempt Frequency Analysis Summary\n",
      "====================================================================\n",
      "\n",
      "-- Temperature-Dependent Effective Parameters --\n",
      "\n",
      "Temp (K)    nu (THz)    z\n",
      "----------  ----------  ---\n",
      "2100        5.8106      nan\n",
      "\n",
      "-- Path-Wise Parameters (per Temperature) --\n",
      "\n",
      "Temperature: 2100.0 K\n",
      "Path Name    Hop Count    nu_path (THz)\n",
      "-----------  -----------  ---------------\n",
      "A1           191          7.7382\n",
      "A2           360          5.158\n",
      "A3           0            0\n",
      "----------------------------------------\n",
      "====================================================================\n",
      "NOTE: nu_paths can be unreliable for paths with low hop counts,\n",
      "      as they are sensitive to statistical noise.\n",
      "====================================================================\n",
      "\n"
     ]
    }
   ],
   "source": [
    "calc_ensemble_single.calculate_attempt_frequency(neb_csv)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "61221f0b",
   "metadata": {},
   "source": [
    "The summary shows that the **effective attempt frequency ($\\nu$)** for 2100 K has been successfully calculated.\n",
    "However, just like the activation energy, the **effective coordination number ($z$)** is reported as **nan** (Not a Number). This is because calculating $z$ also requires fitting data across a **range of temperatures**."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "26054aa8",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "## Accessing Individual Trajectory Data\n",
    "\n",
    "While the `CalculatorEnsemble` object provides results aggregated over the entire bundle of files, you may want to inspect the results from a specific, individual trajectory. This is where the `.calculators` attribute comes in.\n",
    "\n",
    "This attribute is a list of `CalculatorSingle` objects, each containing the detailed analysis for one HDF5 file. Let's select the first trajectory from our ensemble for a closer look."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "ddc88de9",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Selected HDF5 File : /home/jty/Examples/Example3/TRAJ_TiO2/TRAJ_1700K/TRAJ_O_01.h5\n"
     ]
    }
   ],
   "source": [
    "calc_single = calc_ensemble.calculators[0]\n",
    "print(f\"Selected HDF5 File : {calc_single.path_traj}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5a89befa",
   "metadata": {},
   "source": [
    "Just like its ensemble counterpart, the `CalculatorSingle` object has its own `.calculate()` and `.summary()` methods. Let's run the analysis and view the results for this single file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "a090a533",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "============================================================\n",
      "Summary for Trajectory dataset\n",
      "  - Path to TRAJ file   : /home/jty/Examples/Example3/TRAJ_TiO2/TRAJ_1700K/TRAJ_O_01.h5\n",
      "  - Lattice structure   : POSCAR_TiO2\n",
      "  - t_interval          : 0.074 ps (37 frames)\n",
      "  - Temperatures (K)    : [1700.0]\n",
      "  - Num. of TRAJ files  : [1]\n",
      "============================================================\n",
      "\n",
      "================ Temperature-Dependent Data ================\n",
      "Temp (K)    D (m2/s)    D_rand (m2/s)    f       tau (ps)\n",
      "----------  ----------  ---------------  ------  ----------\n",
      "1700        4.071e-10   7.015e-10        0.5803  17.4318\n",
      "============================================================\n",
      "\n",
      "================= Final Fitted Parameters ==================\n",
      "Diffusivity (D):\n",
      "  - Ea          : - \n",
      "  - D0          : - \n",
      "  - R-squared   : -\n",
      "Random Walk Diffusivity (D_rand):\n",
      "  - Ea          : -\n",
      "  - D0          : -\n",
      "  - R-squared   : -\n",
      "Correlation Factor (f):\n",
      "  - Ea          : -\n",
      "  - f0          : -\n",
      "  - R-squared   : -\n",
      "Residence Time (tau):\n",
      "  - Ea (fixed)  : -\n",
      "  - tau0        : -\n",
      "  - R-squared   : -\n",
      "Effective Hopping Distance (a) : -\n",
      "============================================================\n"
     ]
    }
   ],
   "source": [
    "calc_single.calculate()\n",
    "calc_single.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cf996be3",
   "metadata": {},
   "source": [
    "The real power of a `CalculatorSingle` object is the ability to inspect the fine-grained details of every hop. \n",
    "You can see not just **how many hops** occurred for each path type, but also the chronological sequence of those events.\n",
    "\n",
    "First, let's see the total count for each type of hopping path for each vacancy:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "05582cc1",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[Path Counts (Vacancy 0)]\n",
      "  A1: 10\n",
      "  A2: 12\n",
      "  A3: 1\n",
      "\n"
     ]
    }
   ],
   "source": [
    "path_names = calc_single.site.path_name\n",
    "\n",
    "for i, counts in enumerate(calc_single.counts):\n",
    "    print(f\"[Path Counts (Vacancy {i})]\")\n",
    "    for name, count in zip(path_names, counts):\n",
    "        print(f\"  {name}: {int(count)}\")\n",
    "    print('')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "983644dc",
   "metadata": {},
   "source": [
    "Next, we can view the chronological sequence of these hops using the `.hopping_history` attribute:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "69f509ec",
   "metadata": {},
   "outputs": [],
   "source": [
    "for i, history in enumerate(calc_single.hopping_history):\n",
    "    print(f\"[Hopping Sequence (Vacancy {i})]\")\n",
    "    for hop in history:\n",
    "        print(f\"  Step {hop['step']:<4d}: {hop['name']}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8b20438e",
   "metadata": {},
   "source": [
    "```bash\n",
    "[Hopping Sequence (Vacancy 0)]\n",
    "  Step 368  : A2\n",
    "  Step 584  : A2\n",
    "  Step 797  : A1\n",
    "... (rest of sequence) ...\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c33c28a3",
   "metadata": {},
   "source": [
    "While `.hopping_history` gives you the raw data, it's often more useful to see a formatted summary. For this, `VacHopPy` provides the convenient `.show_hopping_history()` method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6b9b94cb",
   "metadata": {},
   "outputs": [],
   "source": [
    "calc_single.show_hopping_history()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2fb77afd",
   "metadata": {},
   "source": [
    "```bash\n",
    "====================================================================================================================\n",
    "                                       Hopping Sequence of Vacancy 0\n",
    "====================================================================================================================\n",
    "Num     Time (ps)    Path    a (Ang)    Initial Site (Fractional Coordinate)    Final Site (Fractional Coordinate)\n",
    "-----   -----------  ------  ---------  --------------------------------------  ------------------------------------\n",
    "1       27.23        A2      2.80308    site1 [0.65249, 0.15249, 0.33333]      site1 [0.59751, 0.40249, 0.16667]\n",
    "2       43.22        A2      2.80308    site1 [0.59751, 0.40249, 0.16667]      site1 [0.65249, 0.65249, 0.33333]\n",
    "3       58.98        A1      2.56255    site1 [0.65249, 0.65249, 0.33333]      site1 [0.84751, 0.84751, 0.33333]\n",
    "... (rest of table) ...\n",
    "====================================================================================================================\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a74745c8",
   "metadata": {},
   "source": [
    "---\n",
    "## Advanced Features\n",
    "\n",
    "### Assessing Data Reliability\n",
    "\n",
    "Setting the argument `error_bar=True` will display error bars on the plots. The argument can be applied to `plot_D()`, `plot_D_rand()`, `plot_f()`, and `plot_tau()` methods. The error bars represent the **standard error of the mean (SEM)**, which allows for an assessment of the data's reliability. It is calculated as follows:\n",
    "\n",
    "$$\n",
    "SEM = {\\sigma}/{\\sqrt{n}}\n",
    "$$\n",
    "\n",
    "Here, $\\sigma$ represents the standard deviation of the physical quantity at each temperature, and $n$ is the number of data points (i.e., the number of HDF5 files used for that temperature).\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "61c653c8",
   "metadata": {},
   "outputs": [],
   "source": [
    "calc_ensemble.plot_D(error_bar=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5bfb0863",
   "metadata": {},
   "source": [
    "Example output:\n",
    "\n",
    "<div align=\"center\">\n",
    "\n",
    "  <img src=\"https://github.com/user-attachments/assets/48fa8c51-a183-4efd-b9c2-e44dad0ac483\" alt=\"Image\" width=\"400\">\n",
    "\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ad50b088",
   "metadata": {},
   "source": [
    "---\n",
    "### Decomposing Vacancy Diffusivity\n",
    "\n",
    "You can decompose the total diffusivity ($D$) into its x, y, and z components ($D_x$, $D_y$, $D_z$) at each temperature using the `decompose_diffusivity()` method. This functionality is particularly useful for analyzing **anisotropic systems** where diffusion varies by direction, allowing you to calculate the specific diffusion barrier ($E_a$) for each axis.\n",
    "\n",
    "Note that obtaining statistically reliable directional results may require more simulation data than calculating the total diffusivity alone."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "2c3ac629",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "=========== Temperature-Dependent Decomposed Diffusivity ===========\n",
      "T (K)    Dx (m2/s)    Dy (m2/s)    Dz (m2/s)\n",
      "-------  -----------  -----------  -----------\n",
      "1700     4.729e-10    5.324e-10    2.475e-10\n",
      "1800     6.673e-10    7.123e-10    4.496e-10\n",
      "1900     1.144e-09    6.331e-10    6.915e-10\n",
      "2000     1.25e-09     1.497e-09    5.507e-10\n",
      "2100     2.069e-09    1.525e-09    8.47e-10\n",
      "====================================================================\n",
      "\n",
      "=================== Decomposed Diffusivity Fits ====================\n",
      "Directional Diffusivity (Dx):\n",
      "  - Ea        : 1.101 eV\n",
      "  - D0        : 8.523e-07 m^2/s\n",
      "  - R-squared : 0.9710\n",
      "Directional Diffusivity (Dy):\n",
      "  - Ea        : 0.867 eV\n",
      "  - D0        : 1.818e-07 m^2/s\n",
      "  - R-squared : 0.8022\n",
      "Directional Diffusivity (Dz):\n",
      "  - Ea        : 0.834 eV\n",
      "  - D0        : 8.645e-08 m^2/s\n",
      "  - R-squared : 0.8231\n",
      "====================================================================\n"
     ]
    }
   ],
   "source": [
    "calc_ensemble.decompose_diffusivity()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "73f3911e",
   "metadata": {},
   "source": [
    "The `plot_msd_xyz()` and `plot_D_xyz()` methods can be used to obtain component-wise Mean Squared Displacement (MSD) and Arrhenius plots for diffusivity, respectively."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "05f0784a",
   "metadata": {},
   "outputs": [],
   "source": [
    "calc_ensemble.plot_msd_xyz()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "66e1e85b",
   "metadata": {},
   "source": [
    "Example output:\n",
    "\n",
    "<div align=\"center\">\n",
    "\n",
    "  <img src=\"https://github.com/user-attachments/assets/8247872d-c6a9-498d-9a90-a89246bee2d5\" alt=\"Image\" width=\"750\">\n",
    "\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4387b6b0",
   "metadata": {},
   "outputs": [],
   "source": [
    "calc_ensemble.plot_D_xyz()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8d9337eb",
   "metadata": {},
   "source": [
    "Example output:\n",
    "\n",
    "<div align=\"center\">\n",
    "\n",
    "  <img src=\"https://github.com/user-attachments/assets/f2a58778-6c2a-4e76-8638-60df71b73fc2\" alt=\"Image\" width=\"750\">\n",
    "\n",
    "</div>"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "vhp_ver3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.18"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}