
Conversation

@AsymmetryChou (Contributor) commented Sep 21, 2025

Add a hands-on tutorial for transport in 2D materials.

Summary by CodeRabbit

  • New Features

    • Added support to load previously saved self-energies, with automatic computation and saving when absent.
    • Batch/parallel self-energy computation across k-points and energies with flexible save/load directories and formats, plus clearer logging.
    • Updated runner to centralize self-energy preparation and respect saved artifacts.
    • CLI/API change: introduce use_saved_se and treat self_energy_save_path as a directory for save/load (see the config sketch after this summary).
  • Documentation

    • New Jupyter tutorial on 2D quantum transport (end-to-end DPNEGF/DeePTB workflow).
    • Added comprehensive graphene and hBN examples: band-structure configs, training templates, datasets, and pretrained checkpoints.
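
A minimal sketch of how the renamed options might appear in a task configuration. Only the names use_saved_se and self_energy_save_path come from this PR; the surrounding structure is an illustrative assumption:

```python
# Hypothetical options fragment: key placement is assumed, only the two
# flag names are confirmed by this PR.
negf_options = {
    "use_saved_se": True,                 # load previously saved self-energies when present
    "self_energy_save_path": "./se_out",  # treated as a directory, used for both save and load
}
```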

@coderabbitai bot commented Sep 21, 2025

Walkthrough

Adds tutorial and example datasets for graphene and hBN, including training inputs, k-path metadata, POSCARs, model templates, and checkpoints. Refactors self-energy handling: new save/load path logic, batch computation utilities, and NEGF orchestration via prepare_self_energy. Updates CLI arg parsing and tests to align with renamed/added parameters.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Documentation: Tutorial notebook<br>docs/hands_on/tutorial2_2d_mat.ipynb | New end-to-end tutorial demonstrating DPNEGF/DeePTB workflow, directory checks, logging, model build attempt, and band computation/plotting. |
| Graphene examples: configs, data, models<br>examples/graphene/band.json, examples/graphene/extr_baseline/c_spds.json, examples/graphene/extr_baseline/grap_spd_model/sktb.json, examples/graphene/extr_baseline/input_templete.json, examples/graphene/train/data/POSCAR, examples/graphene/train/data/kpath.0/info.json, examples/graphene/train/input.json | Add band-task config, basis presets, full SPD TB model params, training template, POSCAR, kpath metadata, and training config. |
| hBN examples: configs, data, models<br>examples/hBN/band.json, examples/hBN/extra_baseline/bn_spds.json, examples/hBN/extra_baseline/hbn_spd_model/sktb.json, examples/hBN/extra_baseline/input_templete.json, examples/hBN/train/data/POSCAR, examples/hBN/train/data/kpath.0/info.json, examples/hBN/train/input.json | Add band-task config, basis presets for B/N, full SPD TB model params, training template, POSCAR, kpath metadata, and training config. |
| Example checkpoints<br>examples/graphene/train/train_out/checkpoint/nnsk.best.pth, examples/hBN/train/train_out/checkpoint/nnsk.best.pth, examples/hBN/train/train_out/checkpoint/nnsk.latest.pth | Add model checkpoint artifacts (epoch/iter references). |
| NEGF self-energy refactor<br>dpnegf/negf/lead_property.py, dpnegf/runner/NEGF.py | Overhaul self-energy save/load paths, add loaders, default pathing, batch computation (compute_all_self_energy, worker), and centralize NEGF self-energy prep via prepare_self_energy. Rename flag to use_saved_se. |
| CLI/args<br>dpnegf/utils/argcheck.py | Replace self_energy_save with use_saved_se; clarify self_energy_save_path as directory for save/load; adjust ordering/docs. |
| Tests<br>dpnegf/tests/test_negf_negf_hamiltonian_init.py | Update calls to self_energy to remove deprecated save parameter. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
  autonumber
  actor User
  participant NEGF
  participant FS as FileSystem
  participant Lead as LeadProperty
  participant Batch as compute_all_self_energy

  User->>NEGF: negf_compute(...)
  NEGF->>NEGF: prepare_self_energy(scf_require)

  alt use_saved_se == True
    NEGF->>FS: check _has_saved_self_energy(self_energy_save_path)
    FS-->>NEGF: exists / not exists
    opt not exists
      NEGF-->>User: assert fail (missing saved SE)
    end
  else use_saved_se == False
    NEGF->>FS: ensure save dir exists
    alt scf_require == True
      NEGF->>Batch: compute_all_self_energy(eta_grid_all)
      Batch->>Lead: self_energy_worker(k,e,...,save_path)
      Lead->>FS: save temp .h5
      Batch->>FS: merge temps → final .h5
    else scf_require == False
      NEGF->>Batch: compute_all_self_energy(eta_grid_dirichlet)
      Batch->>Lead: self_energy_worker(...)
      Lead->>FS: save temp .h5
      Batch->>FS: merge temps → final .h5
    end
  end

  NEGF-->>User: proceed with NEGF solve (SE ready)
```
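
A minimal Python sketch of the branch above, assuming simplified signatures; the real logic lives in prepare_self_energy (dpnegf/runner/NEGF.py) and compute_all_self_energy (dpnegf/negf/lead_property.py) and may differ in detail:

```python
import glob
import os

def prepare_self_energy_sketch(use_saved_se, save_dir, compute_all_self_energy, eta_grid):
    """Sketch of the use_saved_se dispatch; arguments are simplified stand-ins."""
    if use_saved_se:
        # Mirrors _has_saved_self_energy: fail fast if no saved artifacts exist.
        saved = glob.glob(os.path.join(save_dir, "*.h5")) + glob.glob(os.path.join(save_dir, "*.pth"))
        assert saved, f"use_saved_se=True but no saved self-energy found in {save_dir}"
    else:
        os.makedirs(save_dir, exist_ok=True)
        # Batch-compute over all (k, E) pairs; per the diagram, workers write
        # temporary .h5 files that are merged into the final per-lead file.
        compute_all_self_energy(eta_grid, save_path=save_dir)
```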
```mermaid
sequenceDiagram
  autonumber
  participant Caller
  participant Lead as LeadProperty.self_energy
  participant FS as FileSystem

  Caller->>Lead: self_energy(energy,kpoint,eta,save_path,save_format)
  Lead->>Lead: _get_save_path(k,e,format,save_path)
  alt save_path points to file and exists
    Lead->>Lead: _load_self_energy(path,k,e,format)
    Lead-->>Caller: return Σ from disk
  else
    Lead->>Lead: compute Σ(energy,k)
    Lead->>FS: save Σ to resolved path (h5|pth)
    Lead-->>Caller: return Σ
  end
```
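
The second diagram reduces to a load-or-compute pattern. A minimal sketch, assuming the path has already been resolved by _get_save_path and using a callback in place of the actual Σ computation (a later review comment notes that in the final code persistence is handled by compute_all_self_energy rather than by self_energy itself):

```python
import os
import torch

def load_or_compute_self_energy(resolved_path, compute_sigma):
    """Sketch of the resolution above; compute_sigma is a stand-in callable."""
    if resolved_path is not None and os.path.isfile(resolved_path):
        # Fast path: return the self-energy from disk.
        return torch.load(resolved_path, weights_only=False)
    sigma = compute_sigma()  # fall back to computing Sigma(energy, k)
    if resolved_path is not None:
        torch.save(sigma, resolved_path)  # the diagram shows a save step here
    return sigma
```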

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

I nibbled on k-points, crunchy and fine,
Packed bands in a basket, Γ—K—M—Γ line.
Saved my self-energy, neat as can be,
Load if it’s there, compute if it’s free.
Graphene and hBN hop in delight—
Carrots compiled, plots in the night. 🥕✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 46.67%, which is insufficient; the required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The title "Add Tutorial2. 2D mat" accurately reflects the primary change in the PR — addition of a tutorial for 2D materials and associated example assets (notebook, example configs, and data) — so it is directly related to the changeset. It is concise and highlights the main intent, but the punctuation/spacing ("Tutorial2. 2D mat") is slightly awkward and could be clearer. Overall the title conveys the main change sufficiently for a reviewer scanning history. |
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai bot left a comment

Actionable comments posted: 14

🧹 Nitpick comments (13)
examples/hBN/extra_baseline/bn_spds.json (1)

1-16: LGTM on B/N basis; align directory naming with graphene

Basis arrays look fine. Once the graphene directory typo is fixed, use “extra_baseline” everywhere so the two examples' directory names match.

examples/graphene/band.json (1)

6-13: nkpoints duplication

nkpoints (151) is derivable from kpath (50+50+50+1). If your loader recomputes it, drop the field to avoid drift; else keep and ensure validation checks consistency.
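
For example, a loader-side consistency check is one line (the [kx, ky, kz, n] segment format is taken from the examples in this PR):

```python
kpath = [[0, 0, 0, 50], [0.5, 0, 0, 50], [0.3333333, 0.3333333, 0, 50], [0, 0, 0, 1]]
nkpoints = 151
# 50 + 50 + 50 + 1 = 151; flag any drift between the two fields.
assert nkpoints == sum(seg[3] for seg in kpath), "nkpoints inconsistent with kpath"
```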

examples/graphene/graphene.vasp (1)

1-1: Optional: rename title to “graphene”

Minor clarity tweak.

-graphite                                
+graphene                                
examples/hBN/extra_baseline/input_templete.json (2)

65-65: Typo: Fix "prexfix" to "prefix" in dataset prefix fields

There's a consistent typo in the dataset prefix placeholder text.

Apply this diff to fix the typo:

-            "prefix": "prexfix_for_dataset",
+            "prefix": "prefix_for_dataset",

Also applies to: 73-73, 80-80, 87-87


93-93: Add trailing newline to JSON file

JSON files should end with a newline character for better compatibility with version control and text editors.

Add a newline character at the end of the file.

examples/graphene/extr_baseline/input_templete.json (2)

65-65: Typo: Fix "prexfix" to "prefix" in dataset prefix fields

Same typo as in the hBN template file.

Apply this diff to fix the typo:

-            "prefix": "prexfix_for_dataset",
+            "prefix": "prefix_for_dataset",

Also applies to: 72-72, 79-79


85-85: Add trailing newline to JSON file

JSON files should end with a newline character for better compatibility with version control and text editors.

Add a newline character at the end of the file.

examples/graphene/extr_baseline/grap_spd_model/sktb.json (2)

32-32: Device mismatch for example configuration

The model is configured to use "device": "cuda" which requires GPU availability. For example/tutorial files, it's typically better to default to CPU to ensure broader compatibility.

Consider changing the default device to CPU for better accessibility:

-        "device": "cuda",
+        "device": "cpu",

Or add a comment explaining GPU requirements if CUDA is intentional for this baseline model.
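
If the config is assembled in Python rather than hand-written JSON, a runtime fallback is a reasonable middle ground:

```python
import torch

# Use the GPU when available, otherwise degrade gracefully to CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
```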


212-212: Add trailing newline to JSON file

JSON files should end with a newline character.

Add a newline character at the end of the file.

examples/hBN/extra_baseline/hbn_spd_model/sktb.json (2)

40-40: Device mismatch for example configuration

Same issue as the graphene model - uses CUDA by default which may not be available for all users.

Consider changing the default device to CPU for better accessibility:

-        "device": "cuda",
+        "device": "cpu",

477-477: Add trailing newline to JSON file

JSON files should end with a newline character.

Add a newline character at the end of the file.

examples/hBN/train/input.json (1)

79-79: Add trailing newline to JSON file

JSON files should end with a newline character for better compatibility.

Add a newline character at the end of the file.

examples/graphene/train/input.json (1)

71-71: Add trailing newline to JSON file

JSON files should end with a newline character for better compatibility.

Add a newline character at the end of the file.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c58a1a0 and 52066fb.

⛔ Files ignored due to path filters (2)
  • examples/graphene/train/data/kpath.0/band_structure.png is excluded by !**/*.png
  • examples/hBN/train/data/kpath.0/band_structure.png is excluded by !**/*.png
📒 Files selected for processing (20)
  • docs/hands_on/tutorial2_2d_mat.ipynb (1 hunks)
  • examples/graphene/band.json (1 hunks)
  • examples/graphene/extr_baseline/c_spds.json (1 hunks)
  • examples/graphene/extr_baseline/grap_spd_model/sktb.json (1 hunks)
  • examples/graphene/extr_baseline/input_templete.json (1 hunks)
  • examples/graphene/graphene.vasp (1 hunks)
  • examples/graphene/train/data/POSCAR (1 hunks)
  • examples/graphene/train/data/kpath.0/info.json (1 hunks)
  • examples/graphene/train/input.json (1 hunks)
  • examples/graphene/train/train_out/checkpoint/nnsk.best.pth (1 hunks)
  • examples/graphene/train/train_out/checkpoint/nnsk.latest.pth (1 hunks)
  • examples/hBN/band.json (1 hunks)
  • examples/hBN/extra_baseline/bn_spds.json (1 hunks)
  • examples/hBN/extra_baseline/hbn_spd_model/sktb.json (1 hunks)
  • examples/hBN/extra_baseline/input_templete.json (1 hunks)
  • examples/hBN/train/data/POSCAR (1 hunks)
  • examples/hBN/train/data/kpath.0/info.json (1 hunks)
  • examples/hBN/train/input.json (1 hunks)
  • examples/hBN/train/train_out/checkpoint/nnsk.best.pth (1 hunks)
  • examples/hBN/train/train_out/checkpoint/nnsk.latest.pth (1 hunks)
🧰 Additional context used
🪛 Ruff (0.13.1)
docs/hands_on/tutorial2_2d_mat.ipynb

10-10: Avoid specifying long messages outside the exception class

(TRY003)


31-31: Undefined name model_json

(F821)


32-32: Undefined name model_json

(F821)


59-59: Undefined name uni_cell_atoms

(F821)

🔇 Additional comments (9)
examples/hBN/train/train_out/checkpoint/nnsk.latest.pth (1)

1-1: Avoid committing training outputs; track checkpoints with Git LFS or publish as release assets.

  • Remove examples/hBN/train/train_out/checkpoint/nnsk.latest.pth from Git; track *.pth with LFS and ignore train_out/.
  • .gitattributes:
*.pth filter=lfs diff=lfs merge=lfs -text
  • .gitignore:
examples/**/train_out/
  • Keep a “latest” indirection as a tiny relative .txt next to the checkpoint if needed; publish full checkpoints as release assets or provide a download/fetch script.
  • Verify where this file is loaded and switch consumers to a pointer‑aware loader (or adapt loading code).

Verification note: the prior ripgrep run returned "No files were searched". Re-run locally and paste the output:

#!/bin/bash
set -euo pipefail
git ls-files --error-unmatch "examples/hBN/train/train_out/checkpoint/nnsk.latest.pth" >/dev/null 2>&1 && echo "TRACKED" || echo "NOT_TRACKED"
rg -n -C2 --hidden --no-ignore 'nnsk\.latest\.pth|torch\.load\(|load_state_dict\(' || true
examples/graphene/extr_baseline/c_spds.json (1)

2-10: LGTM on basis

"C": ["2s","2p","d*"] matches the hBN style. Just confirm “d*” is recognized by your parser.

examples/graphene/train/data/POSCAR (1)

1-10: Format check

Valid POSCAR. Coordinates are Cartesian at z=15 with c=30 (z=0.5). Ensure downstream readers expect Cartesian (not Direct).
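
A quick way to verify the coordinate mode without a full parser is to inspect the POSCAR mode line (a sketch assuming the standard VASP 5 layout with a species-symbols line):

```python
with open("examples/graphene/train/data/POSCAR") as f:
    lines = f.read().splitlines()
# Line 8 declares the mode (line 9 if "Selective dynamics" is present);
# a leading 'c' or 'k' means Cartesian, 'd' means Direct.
mode_line = lines[8] if lines[7].strip().lower().startswith("s") else lines[7]
print("Cartesian" if mode_line.strip().lower().startswith(("c", "k")) else "Direct")
```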

examples/graphene/train/data/kpath.0/info.json (1)

1-12: Band window sanity

band_min/max (0–8) should cover the number of bands produced by your chosen basis. With SPD-like configs this may be low; please confirm.
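
As a rough back-of-the-envelope check, assuming one tight-binding band per orbital and treating the "d*" polarization shell as a d shell:

```python
orbitals_per_shell = {"s": 1, "p": 3, "d": 5}
basis = {"C": ["2s", "2p", "d*"]}   # from c_spds.json
n_atoms = {"C": 2}                  # two C atoms in the graphene cell
n_bands = sum(
    n_atoms[el] * sum(orbitals_per_shell[orb.strip("0123456789*")] for orb in shells)
    for el, shells in basis.items()
)
print(n_bands)  # 18 here, so a 0-8 band window covers less than half the bands
```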

examples/hBN/train/data/kpath.0/info.json (1)

1-12: Mirror graphene check

Same note as graphene: verify band_min/max span all bands for the used basis.

examples/graphene/band.json (2)

13-16: Label glyph

Some tools expect "Γ" instead of "G". Confirm accepted labels.


2-2: Make structure path robust

Relative path "train/data/POSCAR" can resolve incorrectly if CWD ≠ file dir. Prefer "./train/data/POSCAR".

Apply:

-    "structure":"train/data/POSCAR",
+    "structure":"./train/data/POSCAR",

Likely an incorrect or invalid review comment.

examples/hBN/band.json (1)

16-16: Valence counts look correct

nel_atom {"B":3,"N":5} matches expectations.

examples/hBN/train/data/POSCAR (1)

1-11: LGTM! Valid POSCAR structure file

The POSCAR file correctly defines a hexagonal boron nitride structure with appropriate lattice vectors and atomic positions in Cartesian coordinates. The 30 Å vacuum spacing in the z-direction is suitable for 2D material calculations.

Comment on lines +22 to +30
"dptb esk bn_spds.json -o hbn_spd_model\n",
"dptb config -m hbn_spd_model/sktb.json -tr -sk ./\n",
"\n",
"\"onsite\": {\n",
" \"method\": \"strain\"\n",
"}\n",
"\n",
"dptb train input.json -i ../extra_baseline/hbn_spd_model/sktb.json -o train_out\n",
"dptb run band.json -i train/train_out/checkpoint/nnsk.best.pth -o band_plot"

⚠️ Potential issue

Clean up shell commands in code cell

The code cell contains shell commands mixed with Python configuration snippet. These should either be in a markdown cell as documentation or properly formatted as shell commands using ! prefix or %%bash magic.

Convert this to a markdown cell with proper code formatting:

## Setup Commands

Run these commands to set up the hBN model:

```bash
dptb esk bn_spds.json -o hbn_spd_model
dptb config -m hbn_spd_model/sktb.json -tr -sk ./
```

For the configuration, modify the onsite method:

```json
"onsite": {
    "method": "strain"
}
```

Then train and run:

```bash
dptb train input.json -i ../extra_baseline/hbn_spd_model/sktb.json -o train_out
dptb run band.json -i train/train_out/checkpoint/nnsk.best.pth -o band_plot
```

🤖 Prompt for AI Agents

In docs/hands_on/tutorial2_2d_mat.ipynb around lines 22–30, the notebook cell
mixes shell commands with a JSON snippet; replace that cell with a single
Markdown cell titled "Setup Commands" that contains three fenced code blocks: a
bash block listing the two dptb setup/config commands, a json block showing the
onsite configuration with "method": "strain", and a final bash block with the
dptb train and dptb run commands; ensure proper fences (bash and json) and
brief explanatory text before each block.




Comment on lines +84 to +85
"results_path = '../band_plot'\n",
"log_path = os.path.join(results_path, 'log')\n",

🛠️ Refactor suggestion

Ensure results directory exists before logging

The code sets up logging to a directory that may not exist yet.

Create the results directory before setting up logging:

 results_path = '../band_plot'
+os.makedirs(results_path, exist_ok=True)
 log_path = os.path.join(results_path, 'log')
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

results_path = '../band_plot'
os.makedirs(results_path, exist_ok=True)
log_path = os.path.join(results_path, 'log')
🤖 Prompt for AI Agents
In docs/hands_on/tutorial2_2d_mat.ipynb around lines 84-85, the code sets
results_path and log_path but doesn't ensure the results/log directories exist
before configuring logging; create the results directory (and the nested log
directory) before any logging calls by calling os.makedirs(results_path,
exist_ok=True) and os.makedirs(log_path, exist_ok=True) (or a single
os.makedirs(log_path, exist_ok=True)) so the logging target path exists.

Comment on lines 93 to 117
"id": "5c4797d7",
"metadata": {},
"outputs": [
{
"ename": "KeyError",
"evalue": "'config'",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mKeyError\u001b[0m Traceback (most recent call last)",
"Cell \u001b[0;32mIn[3], line 5\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;21;01mdptb\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mnn\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mbuild\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;28;01mimport\u001b[39;00m build_model\n\u001b[1;32m 3\u001b[0m model \u001b[38;5;241m=\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mlatest_dptb_b3.300_c2.600_w0.200.pth\u001b[39m\u001b[38;5;124m\"\u001b[39m \u001b[38;5;66;03m# the model for demonstration\u001b[39;00m\n\u001b[0;32m----> 5\u001b[0m model \u001b[38;5;241m=\u001b[39m \u001b[43mbuild_model\u001b[49m\u001b[43m(\u001b[49m\u001b[43mmodel\u001b[49m\u001b[43m)\u001b[49m\n",
"File \u001b[0;32m/opt/mamba/envs/dpnegf-dev/lib/python3.10/site-packages/dptb/nn/build.py:50\u001b[0m, in \u001b[0;36mbuild_model\u001b[0;34m(checkpoint, model_options, common_options, no_check)\u001b[0m\n\u001b[1;32m 48\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[1;32m 49\u001b[0m f \u001b[38;5;241m=\u001b[39m torch\u001b[38;5;241m.\u001b[39mload(checkpoint, map_location\u001b[38;5;241m=\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mcpu\u001b[39m\u001b[38;5;124m\"\u001b[39m, weights_only\u001b[38;5;241m=\u001b[39m\u001b[38;5;28;01mFalse\u001b[39;00m)\n\u001b[0;32m---> 50\u001b[0m ckptconfig \u001b[38;5;241m=\u001b[39m \u001b[43mf\u001b[49m\u001b[43m[\u001b[49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[38;5;124;43mconfig\u001b[39;49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[43m]\u001b[49m\n\u001b[1;32m 51\u001b[0m \u001b[38;5;28;01mdel\u001b[39;00m f\n\u001b[1;32m 53\u001b[0m \u001b[38;5;66;03m# init model from checkpoint\u001b[39;00m\n",
"\u001b[0;31mKeyError\u001b[0m: 'config'"
]
}
],
"source": [
"from dptb.nn.build import build_model\n",
"\n",
"model = \"latest_dptb_b3.300_c2.600_w0.200.pth\" # the model for demonstration\n",
"\n",
"model = build_model(model,\n",
" model_options= model_json['model_options'],\n",
" common_options=model_json['common_options'])"
]

⚠️ Potential issue

Fix undefined variable in model building code

The code references the undefined variable model_json, which will cause a NameError. The model loading also fails with a KeyError on 'config', suggesting the checkpoint format may be incompatible with build_model.

Fix the undefined variables and handle the missing config gracefully:

 from dptb.nn.build import build_model
+import json
 
 model =  "latest_dptb_b3.300_c2.600_w0.200.pth" # the model for demonstration
 
+# Load model configuration from a JSON file
+with open('band.json', 'r') as f:
+    model_json = json.load(f)
+
+# Build model with proper error handling
+try:
+    model = build_model(model,
+                       model_options=model_json.get('model_options', {}),
+                       common_options=model_json.get('common_options', {}))
+except KeyError as e:
+    print(f"Model checkpoint may be missing configuration. Error: {e}")
+    # Alternative: load model directly if it's a pre-configured checkpoint
+    import torch
+    model = torch.load(model, map_location="cpu", weights_only=False)
-model = build_model(model,
-                    model_options= model_json['model_options'],
-                    common_options=model_json['common_options'])
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
from dptb.nn.build import build_model
import json

model = "latest_dptb_b3.300_c2.600_w0.200.pth" # the model for demonstration

# Load model configuration from a JSON file
with open('band.json', 'r') as f:
    model_json = json.load(f)

# Build model with proper error handling
try:
    model = build_model(model,
                        model_options=model_json.get('model_options', {}),
                        common_options=model_json.get('common_options', {}))
except KeyError as e:
    print(f"Model checkpoint may be missing configuration. Error: {e}")
    # Alternative: load model directly if it's a pre-configured checkpoint
    import torch
    model = torch.load(model, map_location="cpu", weights_only=False)
🤖 Prompt for AI Agents
In docs/hands_on/tutorial2_2d_mat.ipynb around lines 93 to 117, the snippet uses
an undefined model_json and the checkpoint load fails with KeyError 'config';
define or load model_json before calling build_model (e.g., read a JSON/dict
with 'model_options' and 'common_options' or construct defaults) and change the
build step to handle checkpoints without a 'config' key by catching KeyError
when accessing ckpt['config'] and falling back to sensible defaults or passing
explicit model_options/common_options; ensure the checkpoint path is correct and
use torch.load(map_location='cpu') to inspect its keys before calling
build_model so you can adapt the call to the actual checkpoint format.
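
For instance, the inspection step could look like this (a sketch; the checkpoint name is the one used in the notebook):

```python
import torch

ckpt = torch.load("latest_dptb_b3.300_c2.600_w0.200.pth",
                  map_location="cpu", weights_only=False)
# A dict-like checkpoint with a 'config' key lets build_model self-configure;
# otherwise model_options/common_options must be supplied explicitly.
print(list(ckpt.keys()) if hasattr(ckpt, "keys") else type(ckpt))
```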

Comment on lines 125 to 155
"source": [
"from dptb.postprocess.bandstructure.band import Band\n",
"import shutil\n",
"\n",
"task_options = {\n",
" \"task\": \"band\",\n",
" \"kline_type\":\"abacus\",\n",
" \"kpath\":[\n",
" [0, 0, 0, 50],\n",
" [0.5, 0, 0, 50],\n",
" [0.3333333, 0.3333333, 0, 50],\n",
" [0, 0, 0, 1]\n",
" ],\n",
" \"klabels\":[\"G\", \"M\", \"K\", \"G\"],\n",
" \"emin\":-20,\n",
" \"emax\": 20,\n",
" \"nel_atom\":{\"C\": 4} \n",
"\n",
" }\n",
"\n",
"if os.path.isdir(results_path):\n",
" shutil.rmtree(results_path, ignore_errors=True) \n",
"\n",
"band = Band(model, results_path)\n",
"AtomicData_options = {\"r_max\": 3.0, \"pbc\": True}\n",
"band.get_bands(data = uni_cell_atoms, \n",
" kpath_kwargs = task_options,\n",
" AtomicData_options = AtomicData_options)\n",
"band.band_plot(emin = task_options['emin'],\n",
" emax = task_options['emax'])"
]

⚠️ Potential issue

Fix undefined variable uni_cell_atoms

The code references an undefined variable uni_cell_atoms which needs to be defined before use.

Add the missing structure definition before using it:

 from dptb.postprocess.bandstructure.band import Band
 import shutil
+from ase.io import read
+
+# Load the structure from POSCAR or graphene.vasp
+uni_cell_atoms = read("graphene.vasp")  # or appropriate structure file
 
 task_options = {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
from dptb.postprocess.bandstructure.band import Band
import shutil
from ase.io import read

# Load the structure from POSCAR or graphene.vasp
uni_cell_atoms = read("graphene.vasp")  # or appropriate structure file

task_options = {
    "task": "band",
    "kline_type": "abacus",
    "kpath": [
        [0, 0, 0, 50],
        [0.5, 0, 0, 50],
        [0.3333333, 0.3333333, 0, 50],
        [0, 0, 0, 1]
    ],
    "klabels": ["G", "M", "K", "G"],
    "emin": -20,
    "emax": 20,
    "nel_atom": {"C": 4}
}

if os.path.isdir(results_path):
    shutil.rmtree(results_path, ignore_errors=True)

band = Band(model, results_path)
AtomicData_options = {"r_max": 3.0, "pbc": True}
band.get_bands(data=uni_cell_atoms,
               kpath_kwargs=task_options,
               AtomicData_options=AtomicData_options)
band.band_plot(emin=task_options['emin'],
               emax=task_options['emax'])

Comment on lines +1 to +11
{
"common_options": {
"basis": {
"C": [
"2s",
"2p",
"d*"
]
}
}
} No newline at end of file

⚠️ Potential issue

🧩 Analysis chain

Fix directory name: extr_baseline vs extra_baseline

Graphene uses "extr_baseline" while hBN uses "extra_baseline". This inconsistency will break scripts relying on a common pattern. Rename the directory or align all references.

Run to spot all occurrences:


🏁 Script executed:

#!/bin/bash
rg -n 'extra_baseline|extr_baseline' -C2

Length of output: 396


Fix directory name: extr_baseline vs extra_baseline

PR adds examples/graphene/extr_baseline (e.g., examples/graphene/extr_baseline/c_spds.json) while docs reference ../extra_baseline/hbn_spd_model/sktb.json (docs/hands_on/tutorial2_2d_mat.ipynb:29). Align the directory name or update all references to avoid breaking scripts.

🤖 Prompt for AI Agents
In examples/graphene/extr_baseline/c_spds.json (lines 1-11) the new directory is
named "extr_baseline" but docs reference "extra_baseline" (e.g.,
docs/hands_on/tutorial2_2d_mat.ipynb line ~29); fix by making the paths
consistent across the repo: either rename the examples/graphene/extr_baseline
directory to examples/graphene/extra_baseline, or update all documentation and
scripts to use "extr_baseline". Search the repo for both "extra_baseline" and
"extr_baseline" and apply the chosen name consistently, updating notebook
references (docs/hands_on/tutorial2_2d_mat.ipynb) and any CI/scripts that
consume these example paths.

@@ -0,0 +1 @@
/personal/DeepTB/dptb_Zjj/dpnegf/examples/graphene/train/train_out/checkpoint/nnsk.ep5000.pth No newline at end of file

⚠️ Potential issue

🧩 Analysis chain

Absolute path leaked; make symlink target relative (portability + repo hygiene).

This file looks like a Git symlink whose target was captured as an absolute, machine‑local path. It will break for everyone else and in CI, and it leaks a personal filesystem path.

Fix by using a relative target within the same directory:

-/personal/DeepTB/dptb_Zjj/dpnegf/examples/graphene/train/train_out/checkpoint/nnsk.ep5000.pth
+nnsk.ep5000.pth

Also consider not committing large checkpoints; prefer Git LFS or a download script referenced by the tutorial. Ensure the notebook handles “checkpoint missing” gracefully.

Run to verify symlink correctness and to find any other absolute targets:


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# 1) Confirm it's a symlink in Git (expects mode 120000)
git ls-files -s examples/graphene/train/train_out/checkpoint/nnsk.best.pth

# 2) Show target and flag if absolute
echo "Target:"; cat examples/graphene/train/train_out/checkpoint/nnsk.best.pth
if grep -qE '^/' examples/graphene/train/train_out/checkpoint/nnsk.best.pth; then
  echo "ERROR: Absolute path detected."
fi

# 3) Check if the relative target exists (from the symlink's dir)
pushd examples/graphene/train/train_out/checkpoint >/dev/null
tgt="$(cat nnsk.best.pth)"
if [[ -e "$tgt" ]]; then
  ls -lh "$tgt"
else
  echo "WARNING: Target '$tgt' not found in repo."
fi
popd >/dev/null

# 4) Scan for other absolute-path symlink pointers in example checkpoints
rg -n '^/' -g 'examples/**/checkpoint/*.pth' -S || true

Length of output: 207


🏁 Script executed:

#!/bin/bash
set -euo pipefail

path="examples/graphene/train/train_out/checkpoint/nnsk.best.pth"

echo "1) git index entry for $path"
git ls-files -s -- "$path" || true

echo
echo "2) OID -> blob (symlink target stored in Git)"
oid=$(git ls-files -s -- "$path" | awk '{print $2}' || true)
if [[ -z "$oid" ]]; then
  echo "No OID found for $path"
else
  echo "OID: $oid"
  echo "---- git cat-file -p $oid ----"
  git cat-file -p "$oid" || true
  echo "-----------------------------"
fi

echo
echo "3) Show file from HEAD (if present)"
git show HEAD:"$path" || true

echo
echo "4) Worktree symlink info (if file exists in checkout)"
if [[ -e "$path" || -L "$path" ]]; then
  echo "Worktree entry:"
  ls -l "$path" || true
  if [[ -L "$path" ]]; then
    echo "readlink -> $(readlink -n "$path" || true)"
  fi
else
  echo "No worktree file at $path"
fi

echo
echo "5) Repo-wide: list all git-tracked symlinks and flag absolute targets"
git ls-files -s | awk '$1=="120000" {print $2 "|" $4}' | while IFS='|' read -r sha p; do
  tgt=$(git cat-file -p "$sha" 2>/dev/null || true)
  if [[ -z "$tgt" ]]; then
    echo "SYMLINK: $p -> (empty)"
  else
    if [[ "$tgt" == /* ]]; then
      echo "SYMLINK ABS: $p -> $tgt"
    else
      echo "SYMLINK: $p -> $tgt"
    fi
  fi
done

echo
echo "Done."

Length of output: 1725


Replace absolute symlink targets (leaks + CI breakage).

Git-tracked symlinks contain machine-local absolute paths; update them to relative targets or remove them and provide checkpoints via Git LFS / a download script. Affected entries (from git blob):

  • examples/graphene/train/train_out/checkpoint/nnsk.best.pth -> /personal/DeepTB/dptb_Zjj/dpnegf/examples/graphene/train/train_out/checkpoint/nnsk.ep5000.pth
  • examples/graphene/train/train_out/checkpoint/nnsk.latest.pth -> /personal/DeepTB/dptb_Zjj/dpnegf/examples/graphene/train/train_out/checkpoint/nnsk.iter5000.pth
  • examples/hBN/train/train_out/checkpoint/nnsk.best.pth -> /personal/DeepTB/dptb_Zjj/dpnegf/examples/hBN/train/train_out/checkpoint/nnsk.ep5000.pth
  • examples/hBN/train/train_out/checkpoint/nnsk.latest.pth -> /personal/DeepTB/dptb_Zjj/dpnegf/examples/hBN/train/train_out/checkpoint/nnsk.iter5000.pth

Immediate fixes:

  • Replace each symlink blob with a relative target (e.g., nnsk.ep5000.pth / nnsk.iter5000.pth) or remove the symlink and commit a small placeholder.
  • Do not commit large checkpoint files; move them to Git LFS or provide a documented download script and update notebooks to handle missing checkpoints gracefully.
  • Re-run the verification script to ensure no other git-tracked symlinks use absolute paths.
🤖 Prompt for AI Agents
examples/graphene/train/train_out/checkpoint/nnsk.best.pth (and related entries:
examples/graphene/train/train_out/checkpoint/nnsk.latest.pth,
examples/hBN/train/train_out/checkpoint/nnsk.best.pth,
examples/hBN/train/train_out/checkpoint/nnsk.latest.pth) around line 1: the repo
currently contains git-tracked symlinks with absolute, machine-local targets;
replace each symlink blob with a relative symlink target (e.g., point
nnsk.best.pth -> nnsk.ep5000.pth and nnsk.latest.pth -> nnsk.iter5000.pth) or
remove the symlink and commit a small placeholder file, move any actual large
checkpoint binaries out of git (use Git LFS or host a download and add a script
and README), update notebooks/code to handle missing checkpoints gracefully, and
re-run the verification script to ensure no remaining git-tracked symlinks use
absolute paths.

@@ -0,0 +1 @@
/personal/DeepTB/dptb_Zjj/dpnegf/examples/graphene/train/train_out/checkpoint/nnsk.iter5000.pth No newline at end of file

⚠️ Potential issue

“.pth” points to text, not a real checkpoint; likely to break loaders.

If any code calls torch.load(".../nnsk.latest.pth"), this text file will raise an error. If you intend a pointer file, rename to .txt (or .json) and adjust tutorial code to read it; otherwise replace this file with the actual checkpoint (tracked via Git‑LFS).

I can propose a tiny helper that resolves <name>.latest.{txt,json} to a checkpoint path and falls back to the file itself if it’s a real .pth.
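
One possible shape for that helper, as a sketch (the .json pointer schema with a "checkpoint" key is an assumption):

```python
import json
import os

def resolve_latest_checkpoint(name, ckpt_dir):
    """Resolve <name>.latest.{txt,json} pointer files; fall back to the .pth itself."""
    txt = os.path.join(ckpt_dir, f"{name}.latest.txt")
    js = os.path.join(ckpt_dir, f"{name}.latest.json")
    if os.path.isfile(txt):
        with open(txt) as f:
            target = f.read().strip()
    elif os.path.isfile(js):
        with open(js) as f:
            target = json.load(f)["checkpoint"]  # assumed pointer schema
    else:
        # No pointer file: assume <name>.latest.pth is a real checkpoint.
        return os.path.join(ckpt_dir, f"{name}.latest.pth")
    return target if os.path.isabs(target) else os.path.join(ckpt_dir, target)
```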

🤖 Prompt for AI Agents
In examples/graphene/train/train_out/checkpoint/nnsk.latest.pth (line 1), the
file is a text pointer but has a .pth extension which will cause torch.load(...)
to error; either replace this with the real checkpoint file (tracked via
Git-LFS) or rename the pointer to .txt or .json and update tutorial code to read
the pointer; implement a small helper that given "<name>.latest" resolves
"<name>.latest.txt" or ".json" to the actual checkpoint path and otherwise
returns the .pth file path, then update any loader calls in the tutorial to use
that helper so torch.load receives a real .pth file path.

⚠️ Potential issue

Remove absolute personal path; use a portable reference.

Committing an absolute path from a personal filesystem is non-portable and leaks environment/PII. This will break for all users and CI.

Apply this minimal fix to make it relative within the checkpoint directory:

-/personal/DeepTB/dptb_Zjj/dpnegf/examples/graphene/train/train_out/checkpoint/nnsk.iter5000.pth
+nnsk.iter5000.pth

Optionally, make nnsk.latest.pth a symlink or an actual copy of the latest checkpoint instead of a pointer file, to match common tooling expectations.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In examples/graphene/train/train_out/checkpoint/nnsk.latest.pth around lines 1
to 1, the file currently contains an absolute personal filesystem path which is
non-portable and exposes PII; replace that absolute path with a relative
reference inside the checkpoint directory (e.g., the filename of the latest
checkpoint relative to the checkpoint folder) or make nnsk.latest.pth a symlink
or actual copy of the latest checkpoint file so it works across environments;
update the repo to commit the relative/symlinked file instead of the absolute
path (and remove the absolute path from history if necessary) to prevent future
leakage.

⚠️ Potential issue

🧩 Analysis chain

Ignore or LFS large model artifacts.

Checkpoints shouldn’t live in Git history. Add Git‑LFS or .gitignore rules (e.g., **/checkpoint/*.pth, **/train_out/**) unless you intend to distribute small demo weights.

Run this to spot absolute paths and committed large artifacts:


🏁 Script executed:

#!/bin/bash
set -euo pipefail
echo "Absolute paths embedded:"
rg -nP '(^/|[A-Za-z]:\\\\)' -g '!**/.git/**' -S

echo -e "\nPotential checkpoint files and sizes:"
fd -t f -e pth | xargs -I{} sh -c 'printf "%s\t" "{}"; { wc -c < "{}" 2>/dev/null || echo "?"; }' | awk '{sz=$2/1024/1024; printf "%s\t%.2f MB\n",$1,sz}'

echo -e "\nLatest .pth consumers (torch.load):"
rg -nP 'torch\.load\([^)]*latest\.pth' -S || true

Length of output: 173


🏁 Script executed:

#!/bin/bash
set -euo pipefail
echo "PWD: $(pwd)"
echo -e "\nTop-level entries:"
ls -la | sed -n '1,200p' || true

echo -e "\nGit repo check:"
if git rev-parse --is-inside-work-tree >/dev/null 2>&1; then
  echo "Inside git repo. Branch: $(git branch --show-current 2>/dev/null || echo '-')"
else
  echo "Not inside a git repo."
fi

echo -e "\nFinding .pth files (case-insensitive):"
mapfile -t PTHS < <(find . -type f -iname '*.pth' -print)
if [ "${#PTHS[@]}" -eq 0 ]; then
  echo "No .pth files found."
else
  for f in "${PTHS[@]}"; do
    sz=$(wc -c < "$f" 2>/dev/null || echo 0)
    printf "%s\t%.2f MB\n" "$f" "$(awk "BEGIN{printf $sz/1024/1024}")"
  done
fi

echo -e "\nTop large files (>1MB):"
find . -type f -size +1M -printf "%p\t%k KB\n" | sort -k2 -n | tail -n 50 || true

echo -e "\nSearching for references to 'latest.pth':"
rg -n 'latest\.pth' -S . || echo "No references to latest.pth found."

echo -e "\nSearching for torch.load(...) with .pth:"
rg -nP "torch\.load\([^)]*\.pth" -S . || echo "No torch.load(...) referencing .pth found."

echo -e "\nSearching for absolute path literals:"
rg -nP '(^/|[A-Za-z]:\\\\)' -S . || echo "No absolute path literals found via rg."

echo -e "\nCommitted .pth files (git ls-files):"
if git rev-parse --is-inside-work-tree >/dev/null 2>&1; then
  git ls-files | rg '\.pth$' -n || echo "No .pth files tracked in git index."
else
  echo "Not a git repo - cannot list tracked files."
fi

echo -e "\nContents of .gitignore (first 200 lines):"
if [ -f .gitignore ]; then sed -n '1,200p' .gitignore; else echo ".gitignore not present"; fi

echo -e "\nContents of .gitattributes (first 200 lines):"
if [ -f .gitattributes ]; then sed -n '1,200p' .gitattributes; else echo ".gitattributes not present"; fi

echo -e "\nCheckpoint directories and sizes:"
find . -type d -iname 'checkpoint' -print0 | xargs -0 -I{} sh -c 'printf "%s\t" "{}"; du -sh "{}" 2>/dev/null || echo "?"' || true

Length of output: 1062


🏁 Script executed:

#!/bin/bash
set -euo pipefail
echo "PWD: $(pwd)"
echo
echo "Top-level entries:"
ls -la | sed -n '1,200p' || true

echo
echo "Git repo check:"
if git rev-parse --is-inside-work-tree >/dev/null 2>&1; then
  echo "Inside git repo. Branch: $(git branch --show-current 2>/dev/null || echo '-')"
else
  echo "Not inside a git repo."
fi

echo
echo "Check specific path mentioned in review:"
target="examples/graphene/train/train_out/checkpoint/nnsk.latest.pth"
if [ -f "$target" ]; then
  if stat -c%s "$target" >/dev/null 2>&1; then bytes=$(stat -c%s "$target"); else bytes=$(wc -c < "$target"); fi
  printf "%s\t%.2f MB\n" "$target" "$(awk "BEGIN{printf $bytes/1024/1024}")"
else
  echo "$target not found"
fi

echo
echo "Find any .pth files (case-insensitive):"
found=0
find . -type f -iname '*.pth' -print0 | while IFS= read -r -d '' f; do
  found=1
  if stat -c%s "$f" >/dev/null 2>&1; then bytes=$(stat -c%s "$f"); else bytes=$(wc -c < "$f"); fi
  printf "%s\t%.2f MB\n" "$f" "$(awk "BEGIN{printf $bytes/1024/1024}")"
done
# indicate if none found (the while runs in a subshell, so check via another find)
if [ -z "$(find . -type f -iname '*.pth' -print -quit)" ]; then
  echo "No .pth files found."
fi

echo
echo "Top large files (>1MB):"
find . -type f -size +1M -printf "%p\t%k KB\n" 2>/dev/null | sort -k2 -n | tail -n 50 || true

echo
echo "Searching for references to 'latest.pth' (including hidden/ignored files):"
rg -n --hidden -uu 'latest\.pth' -S || echo "No references to latest.pth found."

echo
echo "Searching for torch.load(...) referencing .pth files:"
rg -n --hidden -uu -P "torch\.load\([^)]*\.pth" -S || echo "No torch.load(...) referencing .pth found."

echo
echo "Searching for absolute path literals:"
rg -n --hidden -uu -P '(^/|[A-Za-z]:\\\\)' -S || echo "No absolute path literals found via rg."

echo
echo "Git-tracked .pth files (git ls-files):"
if git rev-parse --is-inside-work-tree >/dev/null 2>&1; then
  git ls-files | rg '\.pth$' -n || echo "No .pth files tracked in git index."
else
  echo "Not a git repo - cannot list tracked files."
fi

echo
echo "Contents of .gitignore (first 200 lines):"
if [ -f .gitignore ]; then sed -n '1,200p' .gitignore; else echo ".gitignore not present"; fi

echo
echo "Contents of .gitattributes (first 200 lines):"
if [ -f .gitattributes ]; then sed -n '1,200p' .gitattributes; else echo ".gitattributes not present"; fi

echo
echo "Checkpoint directories and sizes:"
find . -type d -iname 'checkpoint' -print0 | while IFS= read -r -d '' d; do
  du -sh "$d" 2>/dev/null || echo "Could not get size for $d"
done || true

echo
echo "Done."

Length of output: 8543


Ignore or track model checkpoints with Git‑LFS; remove committed .pth files from Git history.

Multiple .pth checkpoint files are currently tracked — remove them from history or migrate to Git‑LFS and add ignore rules unless you intentionally ship small demo weights.

Tracked files discovered:

  • dpnegf/tests/data/test_get_fermi/nnsk.best.pth
  • examples/graphene/train/train_out/checkpoint/nnsk.best.pth
  • examples/graphene/train/train_out/checkpoint/nnsk.ep4997.pth
  • examples/graphene/train/train_out/checkpoint/nnsk.ep4998.pth
  • examples/graphene/train/train_out/checkpoint/nnsk.ep4999.pth
  • examples/graphene/train/train_out/checkpoint/nnsk.ep5000.pth
  • examples/graphene/train/train_out/checkpoint/nnsk.iter4997.pth
  • examples/graphene/train/train_out/checkpoint/nnsk.iter4998.pth
  • examples/graphene/train/train_out/checkpoint/nnsk.iter4999.pth
  • examples/graphene/train/train_out/checkpoint/nnsk.iter5000.pth
  • examples/graphene/train/train_out/checkpoint/nnsk.latest.pth
  • examples/hBN/train/train_out/checkpoint/nnsk.best.pth
  • examples/hBN/train/train_out/checkpoint/nnsk.ep4997.pth
  • examples/hBN/train/train_out/checkpoint/nnsk.ep4998.pth
  • examples/hBN/train/train_out/checkpoint/nnsk.ep4999.pth
  • examples/hBN/train/train_out/checkpoint/nnsk.ep5000.pth
  • examples/hBN/train/train_out/checkpoint/nnsk.iter4997.pth
  • examples/hBN/train/train_out/checkpoint/nnsk.iter4998.pth
  • examples/hBN/train/train_out/checkpoint/nnsk.iter4999.pth
  • examples/hBN/train/train_out/checkpoint/nnsk.iter5000.pth
  • examples/hBN/train/train_out/checkpoint/nnsk.latest.pth

Recommended remediation (short):

  • Add ignore patterns (example): **/checkpoint/*.pth and **/train_out/** to .gitignore.
  • If you want versioned binaries, enable Git‑LFS and add .gitattributes (e.g., '*.pth filter=lfs diff=lfs merge=lfs -text') and run git lfs migrate import --include="*.pth".
  • To stop tracking current files: git rm --cached <paths> && commit; to purge from history use BFG or git‑filter‑repo.
🤖 Prompt for AI Agents
In examples/graphene/train/train_out/checkpoint/nnsk.latest.pth (lines 1-1)
there are committed .pth model checkpoints that should not be tracked by Git; to
fix, add ignore patterns (e.g., **/checkpoint/*.pth and **/train_out/**) to
.gitignore, enable Git‑LFS and add a .gitattributes entry for '*.pth filter=lfs
diff=lfs merge=lfs -text' if you want to store large model files in LFS, run git
rm --cached on the tracked .pth files and commit to stop tracking them, and if
you need to remove them from repo history run git lfs migrate import
--include="*.pth" or use BFG/git‑filter‑repo to purge past commits.

@@ -0,0 +1,18 @@
{
"structure":"train/data/POSCAR",

⚠️ Potential issue

Make structure path robust

Same as graphene: use "./train/data/POSCAR".

-    "structure":"train/data/POSCAR",
+    "structure":"./train/data/POSCAR",
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

"structure":"./train/data/POSCAR",
🤖 Prompt for AI Agents
In examples/hBN/band.json around line 2, the structure path is not robust;
change the value from "train/data/POSCAR" to "./train/data/POSCAR" so the
relative path is explicit and matches the graphene fix—update the JSON entry to
use the "./" prefix to ensure correct resolution from different working
directories.

@@ -0,0 +1 @@
/personal/DeepTB/dptb_Zjj/dpnegf/examples/hBN/train/train_out/checkpoint/nnsk.ep5000.pth No newline at end of file

⚠️ Potential issue

🧩 Analysis chain

Blocker: absolute, machine-specific symlink; leaks local path and will be broken for everyone else

This file appears to be a Git symlink pointing to an absolute path on a personal machine. It will not resolve in other environments and exposes a local filesystem path. Use a relative symlink (or remove checkpoints from the repo and provide a download path/LFS pointer).

Apply this minimal fix to point to the colocated checkpoint relatively:

-/personal/DeepTB/dptb_Zjj/dpnegf/examples/hBN/train/train_out/checkpoint/nnsk.ep5000.pth
+nnsk.ep5000.pth

Verification script to ensure all checkpoint symlinks are relative and resolvable:


🏁 Script executed:

#!/bin/bash
set -euo pipefail

p="examples/hBN/train/train_out/checkpoint/nnsk.best.pth"

# 1) Confirm it's a symlink in git
echo "Git mode for $p:"
git ls-files -s "$p"

# 2) Confirm it's a symlink in the working tree
if [[ -L "$p" ]]; then
  tgt="$(readlink "$p")"
  echo "Symlink target: $tgt"
else
  echo "ERROR: $p is not a symlink in the working tree" >&2
  exit 1
fi

# 3) Reject absolute targets
if [[ "$tgt" = /* ]]; then
  echo "ERROR: Absolute symlink target detected. Use a relative path." >&2
  exit 1
fi

# 4) Check that the target exists and is readable
if [[ -e "$(dirname "$p")/$tgt" ]]; then
  echo "OK: Target exists."
else
  echo "ERROR: Target $(dirname "$p")/$tgt does not exist." >&2
  exit 1
fi

Length of output: 338


Blocker: absolute, machine-specific symlink — leaks local path and will break elsewhere

Confirmed: examples/hBN/train/train_out/checkpoint/nnsk.best.pth is a git symlink (mode 120000) pointing to absolute target /personal/DeepTB/dptb_Zjj/dpnegf/examples/hBN/train/train_out/checkpoint/nnsk.ep5000.pth. Replace with a relative symlink or remove the checkpoint from the repo (use LFS or external hosting).

Apply minimal fix:

-/personal/DeepTB/dptb_Zjj/dpnegf/examples/hBN/train/train_out/checkpoint/nnsk.ep5000.pth
+nnsk.ep5000.pth
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

nnsk.ep5000.pth
🤖 Prompt for AI Agents
In examples/hBN/train/train_out/checkpoint/nnsk.best.pth around lines 1-1, the
file is a git symlink pointing to an absolute machine-specific path which leaks
local filesystem details and will break for other users; replace it with either
(A) a relative symlink that points to the repository-relative target (e.g.,
../../.. path to nnsk.ep5000.pth) or (B) remove the checkpoint from the repo
entirely and track large binaries via Git LFS or external storage, then update
.gitignore accordingly; after making the change ensure you remove the absolute
symlink from history if needed or add a note in the commit message explaining
the removal and push the corrected commit.

@@ -0,0 +1 @@
/personal/DeepTB/dptb_Zjj/dpnegf/examples/hBN/train/train_out/checkpoint/nnsk.iter5000.pth No newline at end of file

⚠️ Potential issue

.pth file is a plain-text pointer — loaders will break.

A .pth is expected to be a binary checkpoint (e.g., torch.save). Storing a text path in a .pth will cause torch.load to error unless you have custom resolution logic. Either:

  • Commit the actual checkpoint (prefer via Git LFS), or
  • Rename this file to .txt/.pointer and update loader code to resolve it before calling torch.load.

Example loader (if you must keep a pointer file):

import os, torch

def load_ckpt(ptr_path):
    with open(ptr_path) as f:
        target = f.read().strip()
    if not os.path.isabs(target):
        target = os.path.join(os.path.dirname(ptr_path), target)
    return torch.load(target, map_location='cpu')
🤖 Prompt for AI Agents
In examples/hBN/train/train_out/checkpoint/nnsk.latest.pth around lines 1 to 1,
the file currently contains a plaintext pointer rather than a binary PyTorch
checkpoint which will cause torch.load to fail; either replace this pointer with
the actual serialized checkpoint file (store via Git LFS if large) or rename the
pointer to .txt/.pointer and update the loader to read the pointer and resolve
it to the actual checkpoint path before calling torch.load (handle relative
paths by joining with the pointer's directory and validate existence).

⚠️ Potential issue

Absolute personal path leaks environment and is non‑portable.

Use a relative path so the repo works on any machine:

-/personal/DeepTB/dptb_Zjj/dpnegf/examples/hBN/train/train_out/checkpoint/nnsk.iter5000.pth
+nnsk.iter5000.pth

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In examples/hBN/train/train_out/checkpoint/nnsk.latest.pth around line 1, the
file path is an absolute personal path which leaks your environment and breaks
portability; replace the absolute path with a repository-relative path (e.g.,
relative to the repo root or train_out/checkpoint), or derive it at runtime
using a project-root or config variable (use path.join or equivalent) instead of
hardcoding a home directory; update any references/configs that point to the old
absolute location so they load the relative path and ensure the repository works
on any machine.

@coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (3)
dpnegf/negf/lead_property.py (2)

121-130: Docstring is stale: mentions a removed 'save' param and saving behavior.

self_energy() no longer accepts save nor persists results; saving is handled by compute_all_self_energy. Please update the docstring to avoid confusion.

Apply this diff:

@@
-        save :
-            whether to save the self energy. 
-        save_path :
-            the path to save the self energy. If not specified, the self energy will be saved in the results_path.
+        save_path :
+            If provided, try to load self-energy from this directory/file.
+            For 'h5', a lead-specific file is expected (self_energy_leadL.h5/self_energy_leadR.h5).
+            For 'pth', a per (k,E) file path is used.
+            Note: computation here does not persist; batch saving is done by compute_all_self_energy().

613-624: Saved-SE presence check may yield false positives.

Any “*.h5” under the tree will pass, even if unrelated or missing required k/E coverage. Consider checking for expected filenames (self_energy_leadL.h5/self_energy_leadR.h5) and optionally verifying coverage vs grids.

Example:

-        patterns = ("*.h5", "*.pth")
+        patterns = ("self_energy_leadL.h5", "self_energy_leadR.h5", "*.pth")

Optionally, add a lightweight coverage check helper that reads group/dataset keys and compares against requested grids.
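
A sketch of such a helper, assuming (k, E) pairs are stored under hierarchical HDF5 keys; the key layout (key_fmt) is hypothetical and must match whatever compute_all_self_energy actually writes:

```python
import h5py
import numpy as np

def missing_self_energies(h5_path, kpoints, energies,
                          key_fmt="k_{:.4f}_{:.4f}_{:.4f}/E_{:.6f}"):
    """Return the (k, E) keys requested by the grids but absent from the file."""
    missing = []
    with h5py.File(h5_path, "r") as f:
        for k in kpoints:
            kx, ky, kz = np.asarray(k, dtype=float).reshape(3)
            for e in energies:
                key = key_fmt.format(kx, ky, kz, float(e))
                if key not in f:
                    missing.append(key)
    return missing
```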

dpnegf/runner/NEGF.py (1)

517-555: Centralized self-energy flow is solid; minor ask: log energy/k coverage when using saved SE.

When use_saved_se=True, consider logging the unique E and k counts found to aid mismatch debugging.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 52066fb and 1d03c18.

⛔ Files ignored due to path filters (1)
  • examples/graphene/train/data/kpath.0/band_structure.png is excluded by !**/*.png
📒 Files selected for processing (4)
  • dpnegf/negf/lead_property.py (3 hunks)
  • dpnegf/runner/NEGF.py (5 hunks)
  • dpnegf/tests/test_negf_negf_hamiltonian_init.py (1 hunks)
  • dpnegf/utils/argcheck.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
dpnegf/negf/lead_property.py (1)
dpnegf/negf/negf_hamiltonian_init.py (1)
  • get_hs_device (732-855)
dpnegf/runner/NEGF.py (2)
dpnegf/utils/argcheck.py (2)
  • negf (1014-1068)
  • poisson_options (1142-1151)
dpnegf/negf/lead_property.py (3)
  • LeadProperty (28-454)
  • compute_all_self_energy (458-517)
  • _has_saved_self_energy (614-624)
🪛 Ruff (0.13.1)

dpnegf/negf/lead_property.py
  • 157-157: PEP 484 prohibits implicit Optional; convert to Optional[T] (RUF013)
  • 194-194: Avoid specifying long messages outside the exception class (TRY003)
  • 211-211: Avoid specifying long messages outside the exception class (TRY003)
  • 214-214: Avoid specifying long messages outside the exception class (TRY003)
  • 241-241: Avoid specifying long messages outside the exception class (TRY003)
  • 250-253: Avoid specifying long messages outside the exception class (TRY003)
  • 255-255: Avoid specifying long messages outside the exception class (TRY003)
  • 258-258: Avoid specifying long messages outside the exception class (TRY003)

dpnegf/runner/NEGF.py
  • 51-51: PEP 484 prohibits implicit Optional; convert to Optional[T] (RUF013)
  • 562-562: Multiple statements on one line (semicolon) (E702)
  • 563-563: Multiple statements on one line (colon) (E701)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: run-tests
🔇 Additional comments (5)
dpnegf/negf/lead_property.py (1)

157-162: Type hint: make save_path Optional to satisfy type checkers (RUF013).

Apply this diff and import Optional:

@@
-from typing import List
+from typing import List, Optional
@@
-    def _get_save_path(self, kpoint, energy, save_format: str, save_path: str = None):
+    def _get_save_path(self, kpoint, energy, save_format: str, save_path: Optional[str] = None):
dpnegf/tests/test_negf_negf_hamiltonian_init.py (1)

74-79: Test call updated correctly to new API (no save=).

Matches the refactor to centralized self-energy handling.

dpnegf/runner/NEGF.py (2)

51-52: Type hints: make possibly-None paths Optional to satisfy type checkers (RUF013).

Apply this diff:

-                use_saved_se: bool=False, self_energy_save_path: str=None, se_info_display: bool=False,
+                use_saved_se: bool=False, self_energy_save_path: Optional[str]=None, se_info_display: bool=False,

562-565: Style nit: split multiple statements per line (E701/E702).

Apply this diff:

-        self.out['k']=[];self.out['wk']=[]
-        if hasattr(self, "uni_grid"): self.out["uni_grid"] = self.uni_grid
+        self.out['k'] = []
+        self.out['wk'] = []
+        if hasattr(self, "uni_grid"):
+            self.out["uni_grid"] = self.uni_grid
dpnegf/utils/argcheck.py (1)

1050-1052: Arg additions LGTM — downstream usage updated; docs/CLI require verification.
Found updated usage in dpnegf/runner/NEGF.py and dpnegf/negf/lead_property.py; no legacy "self_energy_save" occurrences. Verify CLI/help/tutorial JSONs and user docs reference the new flags.
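
For reference, a minimal sketch of how the two options could be declared, assuming dpnegf's argcheck follows the dargs Argument pattern used elsewhere in DeePTB; the doc strings and defaults below are assumptions, and the authoritative definitions live in dpnegf/utils/argcheck.py:

from dargs import Argument

# Hypothetical declarations mirroring the renamed flags.
use_saved_se = Argument(
    "use_saved_se", bool, optional=True, default=False,
    doc="If True, load previously saved self-energies instead of recomputing them.")
self_energy_save_path = Argument(
    "self_energy_save_path", [str, None], optional=True, default=None,
    doc="Directory used both to save newly computed and to load saved self-energy files.")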

Comment on lines +237 to 256
if save_format == "pth":
try:
se = torch.load(save_path, weights_only=False)
except Exception as e:
raise IOError(f"Failed to load self-energy from {save_path} (pth format).") from e

elif save_format == "h5":
try:
data = read_from_hdf5(save_path, kpoint, energy)
se = torch.as_tensor(data, dtype=torch.complex128) # 自能一般是复数
except KeyError as e:
kx, ky, kz = np.asarray(kpoint, dtype=float).reshape(3)
ev = energy.item() if hasattr(energy, "item") else float(energy)
raise KeyError(
f"Cannot find self-energy in {save_path} "
f"for k=({kx:.4f},{ky:.4f},{kz:.4f}), E={ev:.6f}"
) from e
except Exception as e:
raise IOError(f"Failed to read HDF5 self-energy from {save_path}.") from e

⚠️ Potential issue

Torch scalar formatting bug: f-strings with :.8f on torch.Tensors will raise TypeError.

Energy values arrive as torch scalars in several paths; formatting them directly with :.8f breaks. Coerce to float first for filenames/keys and HDF5 I/O.

Apply these diffs:

@@
-                data = read_from_hdf5(save_path, kpoint, energy)
+                data = read_from_hdf5(save_path, kpoint, energy)
@@
-def read_from_hdf5(h5_path, k, e):
+def read_from_hdf5(h5_path, k, e):
+    # normalize tensor/scalar to float for naming
+    try:
+        e_val = float(e.item() if hasattr(e, "item") else e)
+    except Exception:
+        e_val = float(e)
     with h5py.File(h5_path, "r") as f:
-        group_name = f"E_{e:.8f}"
+        group_name = f"E_{e_val:.8f}"
         dset_name = f"k_{k[0]}_{k[1]}_{k[2]}"
@@
-def self_energy_worker(k, e, eta, lead_L, lead_R, self_energy_save_path):
+def self_energy_worker(k, e, eta, lead_L, lead_R, self_energy_save_path):
@@
-    save_tmp_L = os.path.join(self_energy_save_path, f"tmp_leadL_k{k[0]}_{k[1]}_{k[2]}_E{e:.8f}.h5")
-    save_tmp_R = os.path.join(self_energy_save_path, f"tmp_leadR_k{k[0]}_{k[1]}_{k[2]}_E{e:.8f}.h5")
+    # ensure float energy for consistent naming
+    e_val = float(e.item() if hasattr(e, "item") else e)
+    save_tmp_L = os.path.join(self_energy_save_path, f"tmp_leadL_k{k[0]}_{k[1]}_{k[2]}_E{e_val:.8f}.h5")
+    save_tmp_R = os.path.join(self_energy_save_path, f"tmp_leadR_k{k[0]}_{k[1]}_{k[2]}_E{e_val:.8f}.h5")
@@
-    write_to_hdf5(save_tmp_L, k, e, seL)
-    write_to_hdf5(save_tmp_R, k, e, seR)
+    write_to_hdf5(save_tmp_L, k, e_val, seL)
+    write_to_hdf5(save_tmp_R, k, e_val, seR)
@@
-def write_to_hdf5(h5_path, k, e, se):
+def write_to_hdf5(h5_path, k, e, se):
     with h5py.File(h5_path, "a") as f:
-        group_name = f"E_{e:.8f}"
+        group_name = f"E_{float(e):.8f}"
         dset_name = f"k_{k[0]}_{k[1]}_{k[2]}"

Also applies to: 571-579, 549-556, 560-567
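
To see the failure mode and the fix in isolation (a standalone sketch, not repository code):

import torch

e = torch.tensor(0.123456789, dtype=torch.float64)    # energies arrive as 0-dim tensors
e_val = float(e.item() if hasattr(e, "item") else e)  # coerce before any formatting

# Using the plain float keeps the key stable and independent of whether the
# installed PyTorch version supports format specs on 0-dim tensors.
group_name = f"E_{e_val:.8f}"  # -> "E_0.12345679"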

🧰 Tools
🪛 Ruff (0.13.1)
  • 241-241: Avoid specifying long messages outside the exception class (TRY003)
  • 250-253: Avoid specifying long messages outside the exception class (TRY003)
  • 255-255: Avoid specifying long messages outside the exception class (TRY003)

🤖 Prompt for AI Agents
In dpnegf/negf/lead_property.py around lines 237-256 (also apply same fix at
549-556, 560-567, 571-579), f-strings attempt to format torch scalar tensors
(energy) with numeric format specifiers which raises TypeError; coerce energy to
a Python float (use energy.item() or float(energy)) before any formatted string
interpolation or HDF5 key/filename generation, and replace direct uses of energy
in f"{energy:.8f}" (or similar) with f"{float(energy):.8f}" (or use a local ev =
energy.item() / float(energy) and use ev in the formatted messages and HDF5
reads).

Comment on lines +491 to 496
if self_energy_save_path is None:
    if lead_L.results_path != lead_R.results_path:
        log.warning("The results_path of lead_L and lead_R are different. "
                    "Self energy files will be saved in lead_L's results_path.")
    self_energy_save_path = os.path.join(lead_L.results_path, "self_energy")

⚠️ Potential issue

Ensure output dir exists in all code paths (robustness for scheduled workers).

prepare_self_energy creates the dir, but compute_all_self_energy and workers should be defensive.

Apply this diff:

@@
-    if self_energy_save_path is None:
+    if self_energy_save_path is None:
         if lead_L.results_path != lead_R.results_path:
             log.warning("The results_path of lead_L and lead_R are different. "
                         "Self energy files will be saved in lead_L's results_path.")
         self_energy_save_path = os.path.join(lead_L.results_path, "self_energy")
+    os.makedirs(self_energy_save_path, exist_ok=True)
@@
 def self_energy_worker(k, e, eta, lead_L, lead_R, self_energy_save_path):
@@
+    os.makedirs(self_energy_save_path, exist_ok=True)

Also applies to: 523-552

🤖 Prompt for AI Agents
In dpnegf/negf/lead_property.py around lines 491-496 (and similarly 523-552),
after determining self_energy_save_path ensure the directory exists by calling
os.makedirs(self_energy_save_path, exist_ok=True) (or
pathlib.Path(...).mkdir(parents=True, exist_ok=True)); wrap this in a try/except
to log any OSError/PermissionError with a clear message and re-raise so
scheduled workers fail loudly rather than silently proceeding.
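
A minimal sketch of that defensive pattern (the helper name is illustrative, not part of the existing module):

import logging
import os

log = logging.getLogger(__name__)

def ensure_save_dir(self_energy_save_path):
    # Fail loudly with a clear message instead of letting a scheduled worker die silently.
    try:
        os.makedirs(self_energy_save_path, exist_ok=True)
    except OSError as e:
        log.error("Cannot create self-energy save directory %s: %s", self_energy_save_path, e)
        raise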

Comment on lines +405 to 412
profiler = Profiler()
profiler.start()
self.negf_compute(scf_require=False, Vbias=None)
# profiler.stop()
# output_path = os.path.join(self.results_path, "profile_report.html")
# with open(output_path, 'w') as report_file:
#     report_file.write(profiler.output_html())
profiler.stop()
output_path = os.path.join(self.results_path, "profile_report.html")
with open(output_path, 'w') as report_file:
    report_file.write(profiler.output_html())

⚠️ Potential issue

Make profiling optional; avoid hard dependency on pyinstrument.

Importing pyinstrument unconditionally can break environments. Guard it and fall back gracefully.

Apply this diff:

-            profiler = Profiler()
-            profiler.start() 
-            self.negf_compute(scf_require=False,Vbias=None)
-            profiler.stop()
-            output_path = os.path.join(self.results_path, "profile_report.html")
-            with open(output_path, 'w') as report_file:
-                report_file.write(profiler.output_html())
+            try:
+                from pyinstrument import Profiler
+                profiler = Profiler()
+                profiler.start()
+                self.negf_compute(scf_require=False, Vbias=None)
+                profiler.stop()
+                output_path = os.path.join(self.results_path, "profile_report.html")
+                with open(output_path, 'w') as report_file:
+                    report_file.write(profiler.output_html())
+            except Exception as ex:
+                log.debug("Profiling disabled (%s). Running without profiler.", ex)
+                self.negf_compute(scf_require=False, Vbias=None)

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In dpnegf/runner/NEGF.py around lines 405 to 412, make pyinstrument optional by
wrapping the Profiler import in a try/except and only using profiling when
import succeeds: attempt to import Profiler from pyinstrument, set profiler =
None on ImportError, and then guard profiler.start(), profiler.stop(), and
writing profiler.output_html() with an if profiler check so the code falls back
silently (or logs a debug message) instead of raising when pyinstrument is not
installed; ensure output_path handling is unchanged and no profiler-related
calls run when profiler is None.
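
A self-contained sketch of the profiler=None variant this prompt describes (compute_fn stands in for self.negf_compute, and the helper name is illustrative):

import logging
import os

log = logging.getLogger(__name__)

def run_with_optional_profiling(compute_fn, results_path):
    try:
        from pyinstrument import Profiler
        profiler = Profiler()
    except ImportError:
        profiler = None
        log.debug("pyinstrument not installed; running without profiler.")

    if profiler:
        profiler.start()
    compute_fn()
    if profiler:
        profiler.stop()
        output_path = os.path.join(results_path, "profile_report.html")
        with open(output_path, "w") as report_file:
            report_file.write(profiler.output_html())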
