Add Tutorial2. 2D mat #38
base: main
Conversation
Walkthrough
Adds tutorial and example datasets for graphene and hBN, including training inputs, k-path metadata, POSCARs, model templates, and checkpoints. Refactors self-energy handling: new save/load path logic, batch computation utilities, and NEGF orchestration via prepare_self_energy. Updates CLI arg parsing and tests to align with renamed/added parameters.
Changes
Sequence Diagram(s)
sequenceDiagram
autonumber
actor User
participant NEGF
participant FS as FileSystem
participant Lead as LeadProperty
participant Batch as compute_all_self_energy
User->>NEGF: negf_compute(...)
NEGF->>NEGF: prepare_self_energy(scf_require)
alt use_saved_se == True
NEGF->>FS: check _has_saved_self_energy(self_energy_save_path)
FS-->>NEGF: exists / not exists
opt not exists
NEGF-->>User: assert fail (missing saved SE)
end
else use_saved_se == False
NEGF->>FS: ensure save dir exists
alt scf_require == True
NEGF->>Batch: compute_all_self_energy(eta_grid_all)
Batch->>Lead: self_energy_worker(k,e,...,save_path)
Lead->>FS: save temp .h5
Batch->>FS: merge temps → final .h5
else scf_require == False
NEGF->>Batch: compute_all_self_energy(eta_grid_dirichlet)
Batch->>Lead: self_energy_worker(...)
Lead->>FS: save temp .h5
Batch->>FS: merge temps → final .h5
end
end
NEGF-->>User: proceed with NEGF solve (SE ready)
sequenceDiagram
autonumber
participant Caller
participant Lead as LeadProperty.self_energy
participant FS as FileSystem
Caller->>Lead: self_energy(energy,kpoint,eta,save_path,save_format)
Lead->>Lead: _get_save_path(k,e,format,save_path)
alt save_path points to file and exists
Lead->>Lead: _load_self_energy(path,k,e,format)
Lead-->>Caller: return Σ from disk
else
Lead->>Lead: compute Σ(energy,k)
Lead->>FS: save Σ to resolved path (h5|pth)
Lead-->>Caller: return Σ
end
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~60 minutes
Poem
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
Actionable comments posted: 14
🧹 Nitpick comments (13)
examples/hBN/extra_baseline/bn_spds.json (1)
1-16: LGTM on B/N basis; align directory naming with graphene
Basis arrays look fine. Also consider aligning directory naming with graphene after fixing its typo ("extra_baseline" everywhere).
examples/graphene/band.json (1)
6-13: nkpoints duplication
nkpoints (151) is derivable from kpath (50+50+50+1). If your loader recomputes it, drop the field to avoid drift; else keep and ensure validation checks consistency.
examples/graphene/graphene.vasp (1)
1-1: Optional: rename title to "graphene"
Minor clarity tweak.
-graphite
+graphene
examples/hBN/extra_baseline/input_templete.json (2)
65-65: Typo: Fix "prexfix" to "prefix" in dataset prefix fields
There's a consistent typo in the dataset prefix placeholder text.
Apply this diff to fix the typo:
- "prefix": "prexfix_for_dataset", + "prefix": "prefix_for_dataset",Also applies to: 73-73, 80-80, 87-87
93-93: Add trailing newline to JSON fileJSON files should end with a newline character for better compatibility with version control and text editors.
Add a newline character at the end of the file.
examples/graphene/extr_baseline/input_templete.json (2)
65-65: Typo: Fix "prexfix" to "prefix" in dataset prefix fields
Same typo as in the hBN template file.
Apply this diff to fix the typo:
- "prefix": "prexfix_for_dataset", + "prefix": "prefix_for_dataset",Also applies to: 72-72, 79-79
85-85: Add trailing newline to JSON fileJSON files should end with a newline character for better compatibility with version control and text editors.
Add a newline character at the end of the file.
examples/graphene/extr_baseline/grap_spd_model/sktb.json (2)
32-32: Device mismatch for example configuration
The model is configured to use "device": "cuda", which requires GPU availability. For example/tutorial files, it's typically better to default to CPU to ensure broader compatibility.
Consider changing the default device to CPU for better accessibility:
- "device": "cuda",
+ "device": "cpu",
Or add a comment explaining GPU requirements if CUDA is intentional for this baseline model.
212-212: Add trailing newline to JSON file
JSON files should end with a newline character.
Add a newline character at the end of the file.
examples/hBN/extra_baseline/hbn_spd_model/sktb.json (2)
40-40: Device mismatch for example configuration
Same issue as the graphene model - uses CUDA by default, which may not be available for all users.
Consider changing the default device to CPU for better accessibility:
- "device": "cuda", + "device": "cpu",
477-477: Add trailing newline to JSON fileJSON files should end with a newline character.
Add a newline character at the end of the file.
examples/hBN/train/input.json (1)
79-79: Add trailing newline to JSON file
JSON files should end with a newline character for better compatibility.
Add a newline character at the end of the file.
examples/graphene/train/input.json (1)
71-71: Add trailing newline to JSON file
JSON files should end with a newline character for better compatibility.
Add a newline character at the end of the file.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
examples/graphene/train/data/kpath.0/band_structure.png is excluded by !**/*.png
examples/hBN/train/data/kpath.0/band_structure.png is excluded by !**/*.png
📒 Files selected for processing (20)
docs/hands_on/tutorial2_2d_mat.ipynb (1 hunks)
examples/graphene/band.json (1 hunks)
examples/graphene/extr_baseline/c_spds.json (1 hunks)
examples/graphene/extr_baseline/grap_spd_model/sktb.json (1 hunks)
examples/graphene/extr_baseline/input_templete.json (1 hunks)
examples/graphene/graphene.vasp (1 hunks)
examples/graphene/train/data/POSCAR (1 hunks)
examples/graphene/train/data/kpath.0/info.json (1 hunks)
examples/graphene/train/input.json (1 hunks)
examples/graphene/train/train_out/checkpoint/nnsk.best.pth (1 hunks)
examples/graphene/train/train_out/checkpoint/nnsk.latest.pth (1 hunks)
examples/hBN/band.json (1 hunks)
examples/hBN/extra_baseline/bn_spds.json (1 hunks)
examples/hBN/extra_baseline/hbn_spd_model/sktb.json (1 hunks)
examples/hBN/extra_baseline/input_templete.json (1 hunks)
examples/hBN/train/data/POSCAR (1 hunks)
examples/hBN/train/data/kpath.0/info.json (1 hunks)
examples/hBN/train/input.json (1 hunks)
examples/hBN/train/train_out/checkpoint/nnsk.best.pth (1 hunks)
examples/hBN/train/train_out/checkpoint/nnsk.latest.pth (1 hunks)
🧰 Additional context used
🪛 Ruff (0.13.1)
docs/hands_on/tutorial2_2d_mat.ipynb
10-10: Avoid specifying long messages outside the exception class
(TRY003)
31-31: Undefined name model_json
(F821)
32-32: Undefined name model_json
(F821)
59-59: Undefined name uni_cell_atoms
(F821)
🔇 Additional comments (9)
examples/hBN/train/train_out/checkpoint/nnsk.latest.pth (1)
1-1: Avoid committing training outputs; track checkpoints with Git LFS or publish as release assets.
- Remove examples/hBN/train/train_out/checkpoint/nnsk.latest.pth from Git; track *.pth with LFS and ignore train_out/.
- .gitattributes:
*.pth filter=lfs diff=lfs merge=lfs -text
- .gitignore:
examples/**/train_out/
- Keep a “latest” indirection as a tiny relative .txt next to the checkpoint if needed; publish full checkpoints as release assets or provide a download/fetch script.
- Verify where this file is loaded and switch consumers to a pointer‑aware loader (or adapt loading code).
Verification note: the prior ripgrep run returned "No files were searched". Re-run locally and paste the output:
#!/bin/bash
set -euo pipefail
git ls-files --error-unmatch "examples/hBN/train/train_out/checkpoint/nnsk.latest.pth" >/dev/null 2>&1 && echo "TRACKED" || echo "NOT_TRACKED"
rg -n -C2 --hidden --no-ignore 'nnsk\.latest\.pth|torch\.load\(|load_state_dict\(' || true
examples/graphene/extr_baseline/c_spds.json (1)
2-10: LGTM on basis
"C": ["2s","2p","d*"] matches the hBN style. Just confirm "d*" is recognized by your parser.
examples/graphene/train/data/POSCAR (1)
1-10: Format check
Valid POSCAR. Coordinates are Cartesian at z=15 with c=30 (z=0.5). Ensure downstream readers expect Cartesian (not Direct).
examples/graphene/train/data/kpath.0/info.json (1)
1-12: Band window sanity
band_min/max (0–8) should cover the number of bands produced by your chosen basis. With SPD-like configs this may be low; please confirm.
examples/hBN/train/data/kpath.0/info.json (1)
1-12: Mirror graphene check
Same note as graphene: verify band_min/max span all bands for the used basis.
examples/graphene/band.json (2)
13-16: Label glyph
Some tools expect "Γ" instead of "G". Confirm accepted labels.
2-2: Make structure path robust
Relative path "train/data/POSCAR" can resolve incorrectly if CWD ≠ file dir. Prefer "./train/data/POSCAR".
Apply:
- "structure":"train/data/POSCAR", + "structure":"./train/data/POSCAR",Likely an incorrect or invalid review comment.
examples/hBN/band.json (1)
16-16: Valence counts look correct
nel_atom {"B":3,"N":5} matches expectations.
examples/hBN/train/data/POSCAR (1)
1-11: LGTM! Valid POSCAR structure file
The POSCAR file correctly defines a hexagonal boron nitride structure with appropriate lattice vectors and atomic positions in Cartesian coordinates. The 30 Å vacuum spacing in the z-direction is suitable for 2D material calculations.
| "dptb esk bn_spds.json -o hbn_spd_model\n", | ||
| "dptb config -m hbn_spd_model/sktb.json -tr -sk ./\n", | ||
| "\n", | ||
| "\"onsite\": {\n", | ||
| " \"method\": \"strain\"\n", | ||
| "}\n", | ||
| "\n", | ||
| "dptb train input.json -i ../extra_baseline/hbn_spd_model/sktb.json -o train_out\n", | ||
| "dptb run band.json -i train/train_out/checkpoint/nnsk.best.pth -o band_plot" |
Clean up shell commands in code cell
The code cell contains shell commands mixed with a JSON configuration snippet. These should either be in a markdown cell as documentation or properly formatted as shell commands using the ! prefix or %%bash magic.
Convert this to a markdown cell with proper code formatting:
## Setup Commands
Run these commands to set up the hBN model:
```bash
dptb esk bn_spds.json -o hbn_spd_model
dptb config -m hbn_spd_model/sktb.json -tr -sk ./
```
For the configuration, modify the onsite method:
```json
"onsite": {
    "method": "strain"
}
```
Then train and run:
```bash
dptb train input.json -i ../extra_baseline/hbn_spd_model/sktb.json -o train_out
dptb run band.json -i train/train_out/checkpoint/nnsk.best.pth -o band_plot
```
🤖 Prompt for AI Agents
In docs/hands_on/tutorial2_2d_mat.ipynb around lines 22–30, the notebook cell
mixes shell commands with a JSON snippet; replace that cell with a single
Markdown cell titled "Setup Commands" that contains three fenced code blocks: a
bash block listing the two dptb setup/config commands, a json block showing the
onsite configuration with "method": "strain", and a final bash block with the
dptb train and dptb run commands; ensure proper fences (bash and json) and
brief explanatory text before each block.
| "results_path = '../band_plot'\n", | ||
| "log_path = os.path.join(results_path, 'log')\n", |
🛠️ Refactor suggestion
Ensure results directory exists before logging
The code sets up logging to a directory that may not exist yet.
Create the results directory before setting up logging:
results_path = '../band_plot'
+os.makedirs(results_path, exist_ok=True)
log_path = os.path.join(results_path, 'log')
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| "results_path = '../band_plot'\n", | |
| "log_path = os.path.join(results_path, 'log')\n", | |
| results_path = '../band_plot' | |
| os.makedirs(results_path, exist_ok=True) | |
| log_path = os.path.join(results_path, 'log') |
🤖 Prompt for AI Agents
In docs/hands_on/tutorial2_2d_mat.ipynb around lines 84-85, the code sets
results_path and log_path but doesn't ensure the results/log directories exist
before configuring logging; create the results directory (and the nested log
directory) before any logging calls by calling os.makedirs(results_path,
exist_ok=True) and os.makedirs(log_path, exist_ok=True) (or a single
os.makedirs(log_path, exist_ok=True)) so the logging target path exists.
| "id": "5c4797d7", | ||
| "metadata": {}, | ||
| "outputs": [ | ||
| { | ||
| "ename": "KeyError", | ||
| "evalue": "'config'", | ||
| "output_type": "error", | ||
| "traceback": [ | ||
| "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", | ||
| "\u001b[0;31mKeyError\u001b[0m Traceback (most recent call last)", | ||
| "Cell \u001b[0;32mIn[3], line 5\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;21;01mdptb\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mnn\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mbuild\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;28;01mimport\u001b[39;00m build_model\n\u001b[1;32m 3\u001b[0m model \u001b[38;5;241m=\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mlatest_dptb_b3.300_c2.600_w0.200.pth\u001b[39m\u001b[38;5;124m\"\u001b[39m \u001b[38;5;66;03m# the model for demonstration\u001b[39;00m\n\u001b[0;32m----> 5\u001b[0m model \u001b[38;5;241m=\u001b[39m \u001b[43mbuild_model\u001b[49m\u001b[43m(\u001b[49m\u001b[43mmodel\u001b[49m\u001b[43m)\u001b[49m\n", | ||
| "File \u001b[0;32m/opt/mamba/envs/dpnegf-dev/lib/python3.10/site-packages/dptb/nn/build.py:50\u001b[0m, in \u001b[0;36mbuild_model\u001b[0;34m(checkpoint, model_options, common_options, no_check)\u001b[0m\n\u001b[1;32m 48\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[1;32m 49\u001b[0m f \u001b[38;5;241m=\u001b[39m torch\u001b[38;5;241m.\u001b[39mload(checkpoint, map_location\u001b[38;5;241m=\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mcpu\u001b[39m\u001b[38;5;124m\"\u001b[39m, weights_only\u001b[38;5;241m=\u001b[39m\u001b[38;5;28;01mFalse\u001b[39;00m)\n\u001b[0;32m---> 50\u001b[0m ckptconfig \u001b[38;5;241m=\u001b[39m \u001b[43mf\u001b[49m\u001b[43m[\u001b[49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[38;5;124;43mconfig\u001b[39;49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[43m]\u001b[49m\n\u001b[1;32m 51\u001b[0m \u001b[38;5;28;01mdel\u001b[39;00m f\n\u001b[1;32m 53\u001b[0m \u001b[38;5;66;03m# init model from checkpoint\u001b[39;00m\n", | ||
| "\u001b[0;31mKeyError\u001b[0m: 'config'" | ||
| ] | ||
| } | ||
| ], | ||
| "source": [ | ||
| "from dptb.nn.build import build_model\n", | ||
| "\n", | ||
| "model = \"latest_dptb_b3.300_c2.600_w0.200.pth\" # the model for demonstration\n", | ||
| "\n", | ||
| "model = build_model(model,\n", | ||
| " model_options= model_json['model_options'],\n", | ||
| " common_options=model_json['common_options'])" | ||
| ] |
Fix undefined variables in model building code
The code references the undefined variable model_json, which will cause a NameError. The model loading also appears to fail with a KeyError for 'config', suggesting the checkpoint format might be incompatible.
Fix the undefined variables and handle the missing config gracefully:
from dptb.nn.build import build_model
+import json
model = "latest_dptb_b3.300_c2.600_w0.200.pth" # the model for demonstration
+# Load model configuration from a JSON file
+with open('band.json', 'r') as f:
+ model_json = json.load(f)
+
+# Build model with proper error handling
+try:
+ model = build_model(model,
+ model_options=model_json.get('model_options', {}),
+ common_options=model_json.get('common_options', {}))
+except KeyError as e:
+ print(f"Model checkpoint may be missing configuration. Error: {e}")
+ # Alternative: load model directly if it's a pre-configured checkpoint
+ import torch
+ model = torch.load(model, map_location="cpu", weights_only=False)
-model = build_model(model,
- model_options= model_json['model_options'],
- common_options=model_json['common_options'])
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| "id": "5c4797d7", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "ename": "KeyError", | |
| "evalue": "'config'", | |
| "output_type": "error", | |
| "traceback": [ | |
| "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", | |
| "\u001b[0;31mKeyError\u001b[0m Traceback (most recent call last)", | |
| "Cell \u001b[0;32mIn[3], line 5\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;21;01mdptb\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mnn\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mbuild\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;28;01mimport\u001b[39;00m build_model\n\u001b[1;32m 3\u001b[0m model \u001b[38;5;241m=\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mlatest_dptb_b3.300_c2.600_w0.200.pth\u001b[39m\u001b[38;5;124m\"\u001b[39m \u001b[38;5;66;03m# the model for demonstration\u001b[39;00m\n\u001b[0;32m----> 5\u001b[0m model \u001b[38;5;241m=\u001b[39m \u001b[43mbuild_model\u001b[49m\u001b[43m(\u001b[49m\u001b[43mmodel\u001b[49m\u001b[43m)\u001b[49m\n", | |
| "File \u001b[0;32m/opt/mamba/envs/dpnegf-dev/lib/python3.10/site-packages/dptb/nn/build.py:50\u001b[0m, in \u001b[0;36mbuild_model\u001b[0;34m(checkpoint, model_options, common_options, no_check)\u001b[0m\n\u001b[1;32m 48\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[1;32m 49\u001b[0m f \u001b[38;5;241m=\u001b[39m torch\u001b[38;5;241m.\u001b[39mload(checkpoint, map_location\u001b[38;5;241m=\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mcpu\u001b[39m\u001b[38;5;124m\"\u001b[39m, weights_only\u001b[38;5;241m=\u001b[39m\u001b[38;5;28;01mFalse\u001b[39;00m)\n\u001b[0;32m---> 50\u001b[0m ckptconfig \u001b[38;5;241m=\u001b[39m \u001b[43mf\u001b[49m\u001b[43m[\u001b[49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[38;5;124;43mconfig\u001b[39;49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[43m]\u001b[49m\n\u001b[1;32m 51\u001b[0m \u001b[38;5;28;01mdel\u001b[39;00m f\n\u001b[1;32m 53\u001b[0m \u001b[38;5;66;03m# init model from checkpoint\u001b[39;00m\n", | |
| "\u001b[0;31mKeyError\u001b[0m: 'config'" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "from dptb.nn.build import build_model\n", | |
| "\n", | |
| "model = \"latest_dptb_b3.300_c2.600_w0.200.pth\" # the model for demonstration\n", | |
| "\n", | |
| "model = build_model(model,\n", | |
| " model_options= model_json['model_options'],\n", | |
| " common_options=model_json['common_options'])" | |
| ] | |
from dptb.nn.build import build_model
import json

model = "latest_dptb_b3.300_c2.600_w0.200.pth" # the model for demonstration

# Load model configuration from a JSON file
with open('band.json', 'r') as f:
    model_json = json.load(f)

# Build model with proper error handling
try:
    model = build_model(model,
                        model_options=model_json.get('model_options', {}),
                        common_options=model_json.get('common_options', {}))
except KeyError as e:
    print(f"Model checkpoint may be missing configuration. Error: {e}")
    # Alternative: load model directly if it's a pre-configured checkpoint
    import torch
    model = torch.load(model, map_location="cpu", weights_only=False)
🤖 Prompt for AI Agents
In docs/hands_on/tutorial2_2d_mat.ipynb around lines 93 to 117, the snippet uses
an undefined model_json and the checkpoint load fails with KeyError 'config';
define or load model_json before calling build_model (e.g., read a JSON/dict
with 'model_options' and 'common_options' or construct defaults) and change the
build step to handle checkpoints without a 'config' key by catching KeyError
when accessing ckpt['config'] and falling back to sensible defaults or passing
explicit model_options/common_options; ensure the checkpoint path is correct and
use torch.load(map_location='cpu') to inspect its keys before calling
build_model so you can adapt the call to the actual checkpoint format.
| "source": [ | ||
| "from dptb.postprocess.bandstructure.band import Band\n", | ||
| "import shutil\n", | ||
| "\n", | ||
| "task_options = {\n", | ||
| " \"task\": \"band\",\n", | ||
| " \"kline_type\":\"abacus\",\n", | ||
| " \"kpath\":[\n", | ||
| " [0, 0, 0, 50],\n", | ||
| " [0.5, 0, 0, 50],\n", | ||
| " [0.3333333, 0.3333333, 0, 50],\n", | ||
| " [0, 0, 0, 1]\n", | ||
| " ],\n", | ||
| " \"klabels\":[\"G\", \"M\", \"K\", \"G\"],\n", | ||
| " \"emin\":-20,\n", | ||
| " \"emax\": 20,\n", | ||
| " \"nel_atom\":{\"C\": 4} \n", | ||
| "\n", | ||
| " }\n", | ||
| "\n", | ||
| "if os.path.isdir(results_path):\n", | ||
| " shutil.rmtree(results_path, ignore_errors=True) \n", | ||
| "\n", | ||
| "band = Band(model, results_path)\n", | ||
| "AtomicData_options = {\"r_max\": 3.0, \"pbc\": True}\n", | ||
| "band.get_bands(data = uni_cell_atoms, \n", | ||
| " kpath_kwargs = task_options,\n", | ||
| " AtomicData_options = AtomicData_options)\n", | ||
| "band.band_plot(emin = task_options['emin'],\n", | ||
| " emax = task_options['emax'])" | ||
| ] |
Fix undefined variable uni_cell_atoms
The code references an undefined variable uni_cell_atoms which needs to be defined before use.
Add the missing structure definition before using it:
from dptb.postprocess.bandstructure.band import Band
import shutil
+from ase.io import read
+
+# Load the structure from POSCAR or graphene.vasp
+uni_cell_atoms = read("graphene.vasp") # or appropriate structure file
task_options = {
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| "source": [ | |
| "from dptb.postprocess.bandstructure.band import Band\n", | |
| "import shutil\n", | |
| "\n", | |
| "task_options = {\n", | |
| " \"task\": \"band\",\n", | |
| " \"kline_type\":\"abacus\",\n", | |
| " \"kpath\":[\n", | |
| " [0, 0, 0, 50],\n", | |
| " [0.5, 0, 0, 50],\n", | |
| " [0.3333333, 0.3333333, 0, 50],\n", | |
| " [0, 0, 0, 1]\n", | |
| " ],\n", | |
| " \"klabels\":[\"G\", \"M\", \"K\", \"G\"],\n", | |
| " \"emin\":-20,\n", | |
| " \"emax\": 20,\n", | |
| " \"nel_atom\":{\"C\": 4} \n", | |
| "\n", | |
| " }\n", | |
| "\n", | |
| "if os.path.isdir(results_path):\n", | |
| " shutil.rmtree(results_path, ignore_errors=True) \n", | |
| "\n", | |
| "band = Band(model, results_path)\n", | |
| "AtomicData_options = {\"r_max\": 3.0, \"pbc\": True}\n", | |
| "band.get_bands(data = uni_cell_atoms, \n", | |
| " kpath_kwargs = task_options,\n", | |
| " AtomicData_options = AtomicData_options)\n", | |
| "band.band_plot(emin = task_options['emin'],\n", | |
| " emax = task_options['emax'])" | |
| ] | |
from dptb.postprocess.bandstructure.band import Band
import shutil
from ase.io import read

# Load the structure from POSCAR or graphene.vasp
uni_cell_atoms = read("graphene.vasp") # or appropriate structure file

task_options = {
    "task": "band",
    "kline_type":"abacus",
    "kpath":[
        [0, 0, 0, 50],
        [0.5, 0, 0, 50],
        [0.3333333, 0.3333333, 0, 50],
        [0, 0, 0, 1]
    ],
    "klabels":["G", "M", "K", "G"],
    "emin":-20,
    "emax": 20,
    "nel_atom":{"C": 4}
}

if os.path.isdir(results_path):
    shutil.rmtree(results_path, ignore_errors=True)

band = Band(model, results_path)
AtomicData_options = {"r_max": 3.0, "pbc": True}
band.get_bands(data = uni_cell_atoms,
               kpath_kwargs = task_options,
               AtomicData_options = AtomicData_options)
band.band_plot(emin = task_options['emin'],
               emax = task_options['emax'])
{
    "common_options": {
        "basis": {
            "C": [
                "2s",
                "2p",
                "d*"
            ]
        }
    }
}
\ No newline at end of file
🧩 Analysis chain
Fix directory name: extr_baseline vs extra_baseline
Graphene uses "extr_baseline" while hBN uses "extra_baseline". This inconsistency will break scripts relying on a common pattern. Rename the directory or align all references.
Run to spot all occurrences:
🏁 Script executed:
#!/bin/bash
rg -n 'extra_baseline|extr_baseline' -C2
Length of output: 396
Fix directory name: extr_baseline vs extra_baseline
PR adds examples/graphene/extr_baseline (e.g., examples/graphene/extr_baseline/c_spds.json) while docs reference ../extra_baseline/hbn_spd_model/sktb.json (docs/hands_on/tutorial2_2d_mat.ipynb:29). Align the directory name or update all references to avoid breaking scripts.
🤖 Prompt for AI Agents
In examples/graphene/extr_baseline/c_spds.json (lines 1-11) the new directory is
named "extr_baseline" but docs reference "extra_baseline" (e.g.,
docs/hands_on/tutorial2_2d_mat.ipynb line ~29); fix by making the paths
consistent across the repo: either rename the examples/graphene/extr_baseline
directory to examples/graphene/extra_baseline, or update all documentation and
scripts to use "extr_baseline". Search the repo for both "extra_baseline" and
"extr_baseline" and apply the chosen name consistently, updating notebook
references (docs/hands_on/tutorial2_2d_mat.ipynb) and any CI/scripts that
consume these example paths.
@@ -0,0 +1 @@
/personal/DeepTB/dptb_Zjj/dpnegf/examples/graphene/train/train_out/checkpoint/nnsk.ep5000.pth
\ No newline at end of file
🧩 Analysis chain
Absolute path leaked; make symlink target relative (portability + repo hygiene).
This file looks like a Git symlink whose target was captured as an absolute, machine‑local path. It will break for everyone else and in CI, and it leaks a personal filesystem path.
Fix by using a relative target within the same directory:
-/personal/DeepTB/dptb_Zjj/dpnegf/examples/graphene/train/train_out/checkpoint/nnsk.ep5000.pth
+nnsk.ep5000.pth
Also consider not committing large checkpoints; prefer Git LFS or a download script referenced by the tutorial. Ensure the notebook handles "checkpoint missing" gracefully.
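On the notebook side, graceful handling could be as small as the following sketch (the checkpoint path is illustrative):

```python
# Sketch: fail early with a helpful message when the demo checkpoint is absent,
# instead of letting torch.load raise a bare FileNotFoundError.
import os

ckpt = "train/train_out/checkpoint/nnsk.best.pth"  # illustrative path
if not os.path.isfile(ckpt):
    raise FileNotFoundError(
        f"Checkpoint '{ckpt}' not found. Run the training step first, "
        "or fetch the published checkpoint as described in the tutorial."
    )
```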
Run to verify symlink correctness and to find any other absolute targets:
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# 1) Confirm it's a symlink in Git (expects mode 120000)
git ls-files -s examples/graphene/train/train_out/checkpoint/nnsk.best.pth
# 2) Show target and flag if absolute
echo "Target:"; cat examples/graphene/train/train_out/checkpoint/nnsk.best.pth
if grep -qE '^/' examples/graphene/train/train_out/checkpoint/nnsk.best.pth; then
echo "ERROR: Absolute path detected."
fi
# 3) Check if the relative target exists (from the symlink's dir)
pushd examples/graphene/train/train_out/checkpoint >/dev/null
tgt="$(cat nnsk.best.pth)"
if [[ -e "$tgt" ]]; then
ls -lh "$tgt"
else
echo "WARNING: Target '$tgt' not found in repo."
fi
popd >/dev/null
# 4) Scan for other absolute-path symlink pointers in example checkpoints
rg -n '^/' -g 'examples/**/checkpoint/*.pth' -S || true
Length of output: 207
🏁 Script executed:
#!/bin/bash
set -euo pipefail
path="examples/graphene/train/train_out/checkpoint/nnsk.best.pth"
echo "1) git index entry for $path"
git ls-files -s -- "$path" || true
echo
echo "2) OID -> blob (symlink target stored in Git)"
oid=$(git ls-files -s -- "$path" | awk '{print $2}' || true)
if [[ -z "$oid" ]]; then
echo "No OID found for $path"
else
echo "OID: $oid"
echo "---- git cat-file -p $oid ----"
git cat-file -p "$oid" || true
echo "-----------------------------"
fi
echo
echo "3) Show file from HEAD (if present)"
git show HEAD:"$path" || true
echo
echo "4) Worktree symlink info (if file exists in checkout)"
if [[ -e "$path" || -L "$path" ]]; then
echo "Worktree entry:"
ls -l "$path" || true
if [[ -L "$path" ]]; then
echo "readlink -> $(readlink -n "$path" || true)"
fi
else
echo "No worktree file at $path"
fi
echo
echo "5) Repo-wide: list all git-tracked symlinks and flag absolute targets"
git ls-files -s | awk '$1=="120000" {print $2 "|" $4}' | while IFS='|' read -r sha p; do
tgt=$(git cat-file -p "$sha" 2>/dev/null || true)
if [[ -z "$tgt" ]]; then
echo "SYMLINK: $p -> (empty)"
else
if [[ "$tgt" == /* ]]; then
echo "SYMLINK ABS: $p -> $tgt"
else
echo "SYMLINK: $p -> $tgt"
fi
fi
done
echo
echo "Done."Length of output: 1725
Replace absolute symlink targets (leaks + CI breakage).
Git-tracked symlinks contain machine-local absolute paths; update them to relative targets or remove them and provide checkpoints via Git LFS / a download script. Affected entries (from git blob):
- examples/graphene/train/train_out/checkpoint/nnsk.best.pth -> /personal/DeepTB/dptb_Zjj/dpnegf/examples/graphene/train/train_out/checkpoint/nnsk.ep5000.pth
- examples/graphene/train/train_out/checkpoint/nnsk.latest.pth -> /personal/DeepTB/dptb_Zjj/dpnegf/examples/graphene/train/train_out/checkpoint/nnsk.iter5000.pth
- examples/hBN/train/train_out/checkpoint/nnsk.best.pth -> /personal/DeepTB/dptb_Zjj/dpnegf/examples/hBN/train/train_out/checkpoint/nnsk.ep5000.pth
- examples/hBN/train/train_out/checkpoint/nnsk.latest.pth -> /personal/DeepTB/dptb_Zjj/dpnegf/examples/hBN/train/train_out/checkpoint/nnsk.iter5000.pth
Immediate fixes:
- Replace each symlink blob with a relative target (e.g., nnsk.ep5000.pth / nnsk.iter5000.pth) or remove the symlink and commit a small placeholder.
- Do not commit large checkpoint files; move them to Git LFS or provide a documented download script and update notebooks to handle missing checkpoints gracefully.
- Re-run the verification script to ensure no other git-tracked symlinks use absolute paths.
🤖 Prompt for AI Agents
examples/graphene/train/train_out/checkpoint/nnsk.best.pth (and related entries:
examples/graphene/train/train_out/checkpoint/nnsk.latest.pth,
examples/hBN/train/train_out/checkpoint/nnsk.best.pth,
examples/hBN/train/train_out/checkpoint/nnsk.latest.pth) around line 1: the repo
currently contains git-tracked symlinks with absolute, machine-local targets;
replace each symlink blob with a relative symlink target (e.g., point
nnsk.best.pth -> nnsk.ep5000.pth and nnsk.latest.pth -> nnsk.iter5000.pth) or
remove the symlink and commit a small placeholder file, move any actual large
checkpoint binaries out of git (use Git LFS or host a download and add a script
and README), update notebooks/code to handle missing checkpoints gracefully, and
re-run the verification script to ensure no remaining git-tracked symlinks use
absolute paths.
@@ -0,0 +1 @@
/personal/DeepTB/dptb_Zjj/dpnegf/examples/graphene/train/train_out/checkpoint/nnsk.iter5000.pth
\ No newline at end of file
“.pth” points to text, not a real checkpoint; likely to break loaders.
If any code calls torch.load(".../nnsk.latest.pth"), this text file will raise an error. If you intend a pointer file, rename to .txt (or .json) and adjust tutorial code to read it; otherwise replace this file with the actual checkpoint (tracked via Git‑LFS).
I can propose a tiny helper that resolves <name>.latest.{txt,json} to a checkpoint path and falls back to the file itself if it’s a real .pth.
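One possible shape for that helper (a sketch under the assumed pointer-file convention, not code from this PR):

```python
import json
import os

def resolve_checkpoint(path):
    """Resolve a <name>.latest.{txt,json} pointer to a checkpoint path;
    fall back to `path` itself when it is already a real .pth file."""
    base, _ = os.path.splitext(path)  # e.g. ".../nnsk.latest"
    for ext in (".txt", ".json"):
        ptr = base + ext
        if os.path.isfile(ptr):
            with open(ptr) as f:
                target = json.load(f) if ext == ".json" else f.read().strip()
            # Pointer targets are taken relative to the pointer's directory.
            if not os.path.isabs(target):
                target = os.path.join(os.path.dirname(path), target)
            return target
    return path  # assume a real checkpoint
```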
🤖 Prompt for AI Agents
In examples/graphene/train/train_out/checkpoint/nnsk.latest.pth (line 1), the
file is a text pointer but has a .pth extension which will cause torch.load(...)
to error; either replace this with the real checkpoint file (tracked via
Git-LFS) or rename the pointer to .txt or .json and update tutorial code to read
the pointer; implement a small helper that given "<name>.latest" resolves
"<name>.latest.txt" or ".json" to the actual checkpoint path and otherwise
returns the .pth file path, then update any loader calls in the tutorial to use
that helper so torch.load receives a real .pth file path.
Remove absolute personal path; use a portable reference.
Committing an absolute path from a personal filesystem is non-portable and leaks environment/PII. This will break for all users and CI.
Apply this minimal fix to make it relative within the checkpoint directory:
-/personal/DeepTB/dptb_Zjj/dpnegf/examples/graphene/train/train_out/checkpoint/nnsk.iter5000.pth
+nnsk.iter5000.pth
Optionally, make nnsk.latest.pth a symlink or an actual copy of the latest checkpoint instead of a pointer file, to match common tooling expectations.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In examples/graphene/train/train_out/checkpoint/nnsk.latest.pth around lines 1
to 1, the file currently contains an absolute personal filesystem path which is
non-portable and exposes PII; replace that absolute path with a relative
reference inside the checkpoint directory (e.g., the filename of the latest
checkpoint relative to the checkpoint folder) or make nnsk.latest.pth a symlink
or actual copy of the latest checkpoint file so it works across environments;
update the repo to commit the relative/symlinked file instead of the absolute
path (and remove the absolute path from history if necessary) to prevent future
leakage.
🧩 Analysis chain
Ignore or LFS large model artifacts.
Checkpoints shouldn’t live in Git history. Add Git‑LFS or .gitignore rules (e.g., **/checkpoint/*.pth, **/train_out/**) unless you intend to distribute small demo weights.
Run this to spot absolute paths and committed large artifacts:
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "Absolute paths embedded:"
rg -nP '(^/|[A-Za-z]:\\\\)' -g '!**/.git/**' -S
echo -e "\nPotential checkpoint files and sizes:"
fd -t f -e pth | xargs -I{} sh -c 'printf "%s\t" "{}"; { wc -c < "{}" 2>/dev/null || echo "?"; }' | awk '{sz=$2/1024/1024; printf "%s\t%.2f MB\n",$1,sz}'
echo -e "\nLatest .pth consumers (torch.load):"
rg -nP 'torch\.load\([^)]*latest\.pth' -S || true
Length of output: 173
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "PWD: $(pwd)"
echo -e "\nTop-level entries:"
ls -la | sed -n '1,200p' || true
echo -e "\nGit repo check:"
if git rev-parse --is-inside-work-tree >/dev/null 2>&1; then
echo "Inside git repo. Branch: $(git branch --show-current 2>/dev/null || echo '-')"
else
echo "Not inside a git repo."
fi
echo -e "\nFinding .pth files (case-insensitive):"
mapfile -t PTHS < <(find . -type f -iname '*.pth' -print)
if [ "${#PTHS[@]}" -eq 0 ]; then
echo "No .pth files found."
else
for f in "${PTHS[@]}"; do
sz=$(wc -c < "$f" 2>/dev/null || echo 0)
printf "%s\t%.2f MB\n" "$f" "$(awk "BEGIN{printf $sz/1024/1024}")"
done
fi
echo -e "\nTop large files (>1MB):"
find . -type f -size +1M -printf "%p\t%k KB\n" | sort -k2 -n | tail -n 50 || true
echo -e "\nSearching for references to 'latest.pth':"
rg -n 'latest\.pth' -S . || echo "No references to latest.pth found."
echo -e "\nSearching for torch.load(...) with .pth:"
rg -nP "torch\.load\([^)]*\.pth" -S . || echo "No torch.load(...) referencing .pth found."
echo -e "\nSearching for absolute path literals:"
rg -nP '(^/|[A-Za-z]:\\\\)' -S . || echo "No absolute path literals found via rg."
echo -e "\nCommitted .pth files (git ls-files):"
if git rev-parse --is-inside-work-tree >/dev/null 2>&1; then
git ls-files | rg '\.pth$' -n || echo "No .pth files tracked in git index."
else
echo "Not a git repo - cannot list tracked files."
fi
echo -e "\nContents of .gitignore (first 200 lines):"
if [ -f .gitignore ]; then sed -n '1,200p' .gitignore; else echo ".gitignore not present"; fi
echo -e "\nContents of .gitattributes (first 200 lines):"
if [ -f .gitattributes ]; then sed -n '1,200p' .gitattributes; else echo ".gitattributes not present"; fi
echo -e "\nCheckpoint directories and sizes:"
find . -type d -iname 'checkpoint' -print0 | xargs -0 -I{} sh -c 'printf "%s\t" "{}"; du -sh "{}" 2>/dev/null || echo "?"' || true
Length of output: 1062
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "PWD: $(pwd)"
echo
echo "Top-level entries:"
ls -la | sed -n '1,200p' || true
echo
echo "Git repo check:"
if git rev-parse --is-inside-work-tree >/dev/null 2>&1; then
echo "Inside git repo. Branch: $(git branch --show-current 2>/dev/null || echo '-')"
else
echo "Not inside a git repo."
fi
echo
echo "Check specific path mentioned in review:"
target="examples/graphene/train/train_out/checkpoint/nnsk.latest.pth"
if [ -f "$target" ]; then
if stat -c%s "$target" >/dev/null 2>&1; then bytes=$(stat -c%s "$target"); else bytes=$(wc -c < "$target"); fi
printf "%s\t%.2f MB\n" "$target" "$(awk "BEGIN{printf $bytes/1024/1024}")"
else
echo "$target not found"
fi
echo
echo "Find any .pth files (case-insensitive):"
found=0
find . -type f -iname '*.pth' -print0 | while IFS= read -r -d '' f; do
found=1
if stat -c%s "$f" >/dev/null 2>&1; then bytes=$(stat -c%s "$f"); else bytes=$(wc -c < "$f"); fi
printf "%s\t%.2f MB\n" "$f" "$(awk "BEGIN{printf $bytes/1024/1024}")"
done
# indicate if none found (the while runs in a subshell, so check via another find)
if [ -z "$(find . -type f -iname '*.pth' -print -quit)" ]; then
echo "No .pth files found."
fi
echo
echo "Top large files (>1MB):"
find . -type f -size +1M -printf "%p\t%k KB\n" 2>/dev/null | sort -k2 -n | tail -n 50 || true
echo
echo "Searching for references to 'latest.pth' (including hidden/ignored files):"
rg -n --hidden -uu 'latest\.pth' -S || echo "No references to latest.pth found."
echo
echo "Searching for torch.load(...) referencing .pth files:"
rg -n --hidden -uu -P "torch\.load\([^)]*\.pth" -S || echo "No torch.load(...) referencing .pth found."
echo
echo "Searching for absolute path literals:"
rg -n --hidden -uu -P '(^/|[A-Za-z]:\\\\)' -S || echo "No absolute path literals found via rg."
echo
echo "Git-tracked .pth files (git ls-files):"
if git rev-parse --is-inside-work-tree >/dev/null 2>&1; then
git ls-files | rg '\.pth$' -n || echo "No .pth files tracked in git index."
else
echo "Not a git repo - cannot list tracked files."
fi
echo
echo "Contents of .gitignore (first 200 lines):"
if [ -f .gitignore ]; then sed -n '1,200p' .gitignore; else echo ".gitignore not present"; fi
echo
echo "Contents of .gitattributes (first 200 lines):"
if [ -f .gitattributes ]; then sed -n '1,200p' .gitattributes; else echo ".gitattributes not present"; fi
echo
echo "Checkpoint directories and sizes:"
find . -type d -iname 'checkpoint' -print0 | while IFS= read -r -d '' d; do
du -sh "$d" 2>/dev/null || echo "Could not get size for $d"
done || true
echo
echo "Done."Length of output: 8543
Ignore or track model checkpoints with Git‑LFS; remove committed .pth files from Git history.
Multiple .pth checkpoint files are currently tracked — remove them from history or migrate to Git‑LFS and add ignore rules unless you intentionally ship small demo weights.
Tracked files discovered:
- dpnegf/tests/data/test_get_fermi/nnsk.best.pth
- examples/graphene/train/train_out/checkpoint/nnsk.best.pth
- examples/graphene/train/train_out/checkpoint/nnsk.ep4997.pth
- examples/graphene/train/train_out/checkpoint/nnsk.ep4998.pth
- examples/graphene/train/train_out/checkpoint/nnsk.ep4999.pth
- examples/graphene/train/train_out/checkpoint/nnsk.ep5000.pth
- examples/graphene/train/train_out/checkpoint/nnsk.iter4997.pth
- examples/graphene/train/train_out/checkpoint/nnsk.iter4998.pth
- examples/graphene/train/train_out/checkpoint/nnsk.iter4999.pth
- examples/graphene/train/train_out/checkpoint/nnsk.iter5000.pth
- examples/graphene/train/train_out/checkpoint/nnsk.latest.pth
- examples/hBN/train/train_out/checkpoint/nnsk.best.pth
- examples/hBN/train/train_out/checkpoint/nnsk.ep4997.pth
- examples/hBN/train/train_out/checkpoint/nnsk.ep4998.pth
- examples/hBN/train/train_out/checkpoint/nnsk.ep4999.pth
- examples/hBN/train/train_out/checkpoint/nnsk.ep5000.pth
- examples/hBN/train/train_out/checkpoint/nnsk.iter4997.pth
- examples/hBN/train/train_out/checkpoint/nnsk.iter4998.pth
- examples/hBN/train/train_out/checkpoint/nnsk.iter4999.pth
- examples/hBN/train/train_out/checkpoint/nnsk.iter5000.pth
- examples/hBN/train/train_out/checkpoint/nnsk.latest.pth
Recommended remediation (short):
Add ignore patterns (example): **/checkpoint/*.pth and **/train_out/** to .gitignore.
If you want versioned binaries, enable Git‑LFS, add a .gitattributes entry (e.g., '*.pth filter=lfs diff=lfs merge=lfs -text'), and run git lfs migrate import --include="*.pth".
To stop tracking current files: run git rm --cached on the tracked .pth files and commit; to purge them from history, use BFG or git‑filter‑repo.
🤖 Prompt for AI Agents
In examples/graphene/train/train_out/checkpoint/nnsk.latest.pth (lines 1-1)
there are committed .pth model checkpoints that should not be tracked by Git; to
fix, add ignore patterns (e.g., **/checkpoint/*.pth and **/train_out/**) to
.gitignore, enable Git‑LFS and add a .gitattributes entry for '*.pth filter=lfs
diff=lfs merge=lfs -text' if you want to store large model files in LFS, run git
rm --cached on the tracked .pth files and commit to stop tracking them, and if
you need to remove them from repo history run git lfs migrate import
--include="*.pth" or use BFG/git‑filter‑repo to purge past commits.
@@ -0,0 +1,18 @@
{
    "structure":"train/data/POSCAR",
Make structure path robust
Same as graphene: use "./train/data/POSCAR".
- "structure":"train/data/POSCAR",
+ "structure":"./train/data/POSCAR",📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| "structure":"train/data/POSCAR", | |
| "structure":"./train/data/POSCAR", |
🤖 Prompt for AI Agents
In examples/hBN/band.json around line 2, the structure path is not robust;
change the value from "train/data/POSCAR" to "./train/data/POSCAR" so the
relative path is explicit and matches the graphene fix—update the JSON entry to
use the "./" prefix to ensure correct resolution from different working
directories.
@@ -0,0 +1 @@
/personal/DeepTB/dptb_Zjj/dpnegf/examples/hBN/train/train_out/checkpoint/nnsk.ep5000.pth
\ No newline at end of file
🧩 Analysis chain
Blocker: absolute, machine-specific symlink; leaks local path and will be broken for everyone else
This file appears to be a Git symlink pointing to an absolute path on a personal machine. It will not resolve in other environments and exposes a local filesystem path. Use a relative symlink (or remove checkpoints from the repo and provide a download path/LFS pointer).
Apply this minimal fix to point to the colocated checkpoint relatively:
-/personal/DeepTB/dptb_Zjj/dpnegf/examples/hBN/train/train_out/checkpoint/nnsk.ep5000.pth
+nnsk.ep5000.pth
Verification script to ensure all checkpoint symlinks are relative and resolvable:
🏁 Script executed:
#!/bin/bash
set -euo pipefail
p="examples/hBN/train/train_out/checkpoint/nnsk.best.pth"
# 1) Confirm it's a symlink in git
echo "Git mode for $p:"
git ls-files -s "$p"
# 2) Confirm it's a symlink in the working tree
if [[ -L "$p" ]]; then
tgt="$(readlink "$p")"
echo "Symlink target: $tgt"
else
echo "ERROR: $p is not a symlink in the working tree" >&2
exit 1
fi
# 3) Reject absolute targets
if [[ "$tgt" = /* ]]; then
echo "ERROR: Absolute symlink target detected. Use a relative path." >&2
exit 1
fi
# 4) Check that the target exists and is readable
if [[ -e "$(dirname "$p")/$tgt" ]]; then
echo "OK: Target exists."
else
echo "ERROR: Target $(dirname "$p")/$tgt does not exist." >&2
exit 1
fi
Length of output: 338
Blocker: absolute, machine-specific symlink — leaks local path and will break elsewhere
Confirmed: examples/hBN/train/train_out/checkpoint/nnsk.best.pth is a git symlink (mode 120000) pointing to absolute target /personal/DeepTB/dptb_Zjj/dpnegf/examples/hBN/train/train_out/checkpoint/nnsk.ep5000.pth. Replace with a relative symlink or remove the checkpoint from the repo (use LFS or external hosting).
Apply minimal fix:
-/personal/DeepTB/dptb_Zjj/dpnegf/examples/hBN/train/train_out/checkpoint/nnsk.ep5000.pth
+nnsk.ep5000.pth
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
- /personal/DeepTB/dptb_Zjj/dpnegf/examples/hBN/train/train_out/checkpoint/nnsk.ep5000.pth
+ nnsk.ep5000.pth
🤖 Prompt for AI Agents
In examples/hBN/train/train_out/checkpoint/nnsk.best.pth around lines 1-1, the
file is a git symlink pointing to an absolute machine-specific path which leaks
local filesystem details and will break for other users; replace it with either
(A) a relative symlink that points to the repository-relative target (e.g.,
../../.. path to nnsk.ep5000.pth) or (B) remove the checkpoint from the repo
entirely and track large binaries via Git LFS or external storage, then update
.gitignore accordingly; after making the change ensure you remove the absolute
symlink from history if needed or add a note in the commit message explaining
the removal and push the corrected commit.
@@ -0,0 +1 @@
/personal/DeepTB/dptb_Zjj/dpnegf/examples/hBN/train/train_out/checkpoint/nnsk.iter5000.pth
\ No newline at end of file
.pth file is a plain-text pointer — loaders will break.
A .pth is expected to be a binary checkpoint (e.g., torch.save). Storing a text path in a .pth will cause torch.load to error unless you have custom resolution logic. Either:
- Commit the actual checkpoint (prefer via Git LFS), or
Rename this file to .txt/.pointer and update loader code to resolve it before calling torch.load.
Example loader (if you must keep a pointer file):
import os, torch
def load_ckpt(ptr_path):
    with open(ptr_path) as f:
        target = f.read().strip()
    if not os.path.isabs(target):
        target = os.path.join(os.path.dirname(ptr_path), target)
    return torch.load(target, map_location='cpu')
🤖 Prompt for AI Agents
In examples/hBN/train/train_out/checkpoint/nnsk.latest.pth around lines 1 to 1,
the file currently contains a plaintext pointer rather than a binary PyTorch
checkpoint which will cause torch.load to fail; either replace this pointer with
the actual serialized checkpoint file (store via Git LFS if large) or rename the
pointer to .txt/.pointer and update the loader to read the pointer and resolve
it to the actual checkpoint path before calling torch.load (handle relative
paths by joining with the pointer's directory and validate existence).
Absolute personal path leaks environment and is non‑portable.
Use a relative path so the repo works on any machine:
-/personal/DeepTB/dptb_Zjj/dpnegf/examples/hBN/train/train_out/checkpoint/nnsk.iter5000.pth
+nnsk.iter5000.pth
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In examples/hBN/train/train_out/checkpoint/nnsk.latest.pth around line 1, the
file path is an absolute personal path which leaks your environment and breaks
portability; replace the absolute path with a repository-relative path (e.g.,
relative to the repo root or train_out/checkpoint), or derive it at runtime
using a project-root or config variable (use path.join or equivalent) instead of
hardcoding a home directory; update any references/configs that point to the old
absolute location so they load the relative path and ensure the repository works
on any machine.
Actionable comments posted: 3
🧹 Nitpick comments (3)
dpnegf/negf/lead_property.py (2)
121-130: Docstring is stale: mentions a removed 'save' param and saving behavior.
self_energy() no longer accepts save nor persists results; saving is handled by compute_all_self_energy. Please update the docstring to avoid confusion.
Apply this diff:
@@
-        save :
-            whether to save the self energy.
-        save_path :
-            the path to save the self energy. If not specified, the self energy will be saved in the results_path.
+        save_path :
+            If provided, try to load self-energy from this directory/file.
+            For 'h5', a lead-specific file is expected (self_energy_leadL.h5/self_energy_leadR.h5).
+            For 'pth', a per (k,E) file path is used.
+            Note: computation here does not persist; batch saving is done by compute_all_self_energy().
613-624: Saved-SE presence check may yield false positives.
Any "*.h5" under the tree will pass, even if unrelated or missing required k/E coverage. Consider checking for expected filenames (self_energy_leadL.h5/self_energy_leadR.h5) and optionally verifying coverage vs grids.
Example:
- patterns = ("*.h5", "*.pth")
+ patterns = ("self_energy_leadL.h5", "self_energy_leadR.h5", "*.pth")
Optionally, add a lightweight coverage check helper that reads group/dataset keys and compares against requested grids.
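One possible sketch of that helper, assuming the E_<energy>/k_<kx>_<ky>_<kz> HDF5 layout used by write_to_hdf5 and the lead-specific file names above (otherwise unverified):

```python
import glob
import os

import h5py

def saved_se_covers(save_dir, energies, kpoints):
    """Return True if saved lead self-energy files cover all (E, k) pairs.
    Assumes E_<e> groups holding k_<kx>_<ky>_<kz> datasets."""
    files = glob.glob(os.path.join(save_dir, "self_energy_lead*.h5"))
    if not files:
        return False
    wanted_E = {f"E_{float(e):.8f}" for e in energies}
    wanted_k = {f"k_{k[0]}_{k[1]}_{k[2]}" for k in kpoints}
    for path in files:
        with h5py.File(path, "r") as f:
            if not wanted_E.issubset(f.keys()):
                return False
            if any(not wanted_k.issubset(f[g].keys()) for g in wanted_E):
                return False
    return True
```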
dpnegf/runner/NEGF.py (1)
517-555: Centralized self-energy flow is solid; minor ask: log energy/k coverage when using saved SE.
When use_saved_se=True, consider logging the unique E and k counts found to aid mismatch debugging.
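A minimal sketch of such logging (file name and HDF5 layout assumed as above):

```python
import logging
import os

import h5py

log = logging.getLogger(__name__)

def log_saved_se_coverage(save_dir):
    # Assumed lead-L filename convention; adjust if the batch writer differs.
    se_file = os.path.join(save_dir, "self_energy_leadL.h5")
    with h5py.File(se_file, "r") as f:
        e_groups = list(f.keys())  # E_* groups
        k_sets = list(f[e_groups[0]].keys()) if e_groups else []  # k_* datasets
    log.info("Saved self-energy covers %d energies x %d k-points.",
             len(e_groups), len(k_sets))
```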
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
examples/graphene/train/data/kpath.0/band_structure.png is excluded by !**/*.png
📒 Files selected for processing (4)
dpnegf/negf/lead_property.py (3 hunks)
dpnegf/runner/NEGF.py (5 hunks)
dpnegf/tests/test_negf_negf_hamiltonian_init.py (1 hunks)
dpnegf/utils/argcheck.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
dpnegf/negf/lead_property.py (1)
dpnegf/negf/negf_hamiltonian_init.py (1)
get_hs_device(732-855)
dpnegf/runner/NEGF.py (2)
dpnegf/utils/argcheck.py (2)
negf (1014-1068)
poisson_options (1142-1151)
dpnegf/negf/lead_property.py (3)
LeadProperty (28-454)
compute_all_self_energy (458-517)
_has_saved_self_energy (614-624)
🪛 Ruff (0.13.1)
dpnegf/negf/lead_property.py
157-157: PEP 484 prohibits implicit Optional
Convert to Optional[T]
(RUF013)
194-194: Avoid specifying long messages outside the exception class
(TRY003)
211-211: Avoid specifying long messages outside the exception class
(TRY003)
214-214: Avoid specifying long messages outside the exception class
(TRY003)
241-241: Avoid specifying long messages outside the exception class
(TRY003)
250-253: Avoid specifying long messages outside the exception class
(TRY003)
255-255: Avoid specifying long messages outside the exception class
(TRY003)
258-258: Avoid specifying long messages outside the exception class
(TRY003)
dpnegf/runner/NEGF.py
51-51: PEP 484 prohibits implicit Optional
Convert to Optional[T]
(RUF013)
562-562: Multiple statements on one line (semicolon)
(E702)
563-563: Multiple statements on one line (colon)
(E701)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: run-tests
🔇 Additional comments (5)
dpnegf/negf/lead_property.py (1)
157-162: Type hint: make save_path Optional to satisfy type checkers (RUF013). [suggest_optional_refactor]
Apply this diff and import Optional:
@@
-from typing import List
+from typing import List, Optional
@@
-    def _get_save_path(self, kpoint, energy, save_format: str, save_path: str = None):
+    def _get_save_path(self, kpoint, energy, save_format: str, save_path: Optional[str] = None):
dpnegf/tests/test_negf_negf_hamiltonian_init.py (1)
74-79: Test call updated correctly to new API (no save=).
Matches the refactor to centralized self-energy handling.
dpnegf/runner/NEGF.py (2)
51-52: Type hints: make possibly-None paths Optional to satisfy type checkers (RUF013). [suggest_optional_refactor]
Apply this diff:
-    use_saved_se: bool=False, self_energy_save_path: str=None, se_info_display: bool=False,
+    use_saved_se: bool=False, self_energy_save_path: Optional[str]=None, se_info_display: bool=False,
562-565: Style nit: split multiple statements per line (E701/E702). [suggest_optional_refactor]
Apply this diff:
-        self.out['k']=[];self.out['wk']=[]
-        if hasattr(self, "uni_grid"): self.out["uni_grid"] = self.uni_grid
+        self.out['k'] = []
+        self.out['wk'] = []
+        if hasattr(self, "uni_grid"):
+            self.out["uni_grid"] = self.uni_grid
dpnegf/utils/argcheck.py (1)
1050-1052: Arg additions LGTM — downstream usage updated; docs/CLI require verification.
Found updated usage in dpnegf/runner/NEGF.py and dpnegf/negf/lead_property.py; no legacy "self_energy_save" occurrences. Verify CLI/help/tutorial JSONs and user docs reference the new flags.
if save_format == "pth":
    try:
        se = torch.load(save_path, weights_only=False)
    except Exception as e:
        raise IOError(f"Failed to load self-energy from {save_path} (pth format).") from e

elif save_format == "h5":
    try:
        data = read_from_hdf5(save_path, kpoint, energy)
        se = torch.as_tensor(data, dtype=torch.complex128)  # self-energy is generally complex
    except KeyError as e:
        kx, ky, kz = np.asarray(kpoint, dtype=float).reshape(3)
        ev = energy.item() if hasattr(energy, "item") else float(energy)
        raise KeyError(
            f"Cannot find self-energy in {save_path} "
            f"for k=({kx:.4f},{ky:.4f},{kz:.4f}), E={ev:.6f}"
        ) from e
    except Exception as e:
        raise IOError(f"Failed to read HDF5 self-energy from {save_path}.") from e
Torch scalar formatting bug: f-strings with :.8f on torch.Tensors will raise TypeError.
Energy values arrive as torch scalars in several paths; formatting them directly with :.8f breaks. Coerce to float first for filenames/keys and HDF5 I/O.
Apply these diffs:
@@
- data = read_from_hdf5(save_path, kpoint, energy)
+ data = read_from_hdf5(save_path, kpoint, energy)
@@
-def read_from_hdf5(h5_path, k, e):
+def read_from_hdf5(h5_path, k, e):
+ # normalize tensor/scalar to float for naming
+ try:
+ e_val = float(e.item() if hasattr(e, "item") else e)
+ except Exception:
+ e_val = float(e)
with h5py.File(h5_path, "r") as f:
- group_name = f"E_{e:.8f}"
+ group_name = f"E_{e_val:.8f}"
dset_name = f"k_{k[0]}_{k[1]}_{k[2]}"
@@
-def self_energy_worker(k, e, eta, lead_L, lead_R, self_energy_save_path):
+def self_energy_worker(k, e, eta, lead_L, lead_R, self_energy_save_path):
@@
- save_tmp_L = os.path.join(self_energy_save_path, f"tmp_leadL_k{k[0]}_{k[1]}_{k[2]}_E{e:.8f}.h5")
- save_tmp_R = os.path.join(self_energy_save_path, f"tmp_leadR_k{k[0]}_{k[1]}_{k[2]}_E{e:.8f}.h5")
+ # ensure float energy for consistent naming
+ e_val = float(e.item() if hasattr(e, "item") else e)
+ save_tmp_L = os.path.join(self_energy_save_path, f"tmp_leadL_k{k[0]}_{k[1]}_{k[2]}_E{e_val:.8f}.h5")
+ save_tmp_R = os.path.join(self_energy_save_path, f"tmp_leadR_k{k[0]}_{k[1]}_{k[2]}_E{e_val:.8f}.h5")
@@
- write_to_hdf5(save_tmp_L, k, e, seL)
- write_to_hdf5(save_tmp_R, k, e, seR)
+ write_to_hdf5(save_tmp_L, k, e_val, seL)
+ write_to_hdf5(save_tmp_R, k, e_val, seR)
@@
-def write_to_hdf5(h5_path, k, e, se):
+def write_to_hdf5(h5_path, k, e, se):
with h5py.File(h5_path, "a") as f:
- group_name = f"E_{e:.8f}"
+ group_name = f"E_{float(e):.8f}"
dset_name = f"k_{k[0]}_{k[1]}_{k[2]}"Also applies to: 571-579, 549-556, 560-567
🧰 Tools
🪛 Ruff (0.13.1)
241-241: Avoid specifying long messages outside the exception class
(TRY003)
250-253: Avoid specifying long messages outside the exception class
(TRY003)
255-255: Avoid specifying long messages outside the exception class
(TRY003)
🤖 Prompt for AI Agents
In dpnegf/negf/lead_property.py around lines 237-256 (also apply same fix at
549-556, 560-567, 571-579), f-strings attempt to format torch scalar tensors
(energy) with numeric format specifiers which raises TypeError; coerce energy to
a Python float (use energy.item() or float(energy)) before any formatted string
interpolation or HDF5 key/filename generation, and replace direct uses of energy
in f"{energy:.8f}" (or similar) with f"{float(energy):.8f}" (or use a local ev =
energy.item() / float(energy) and use ev in the formatted messages and HDF5
reads).
if self_energy_save_path is None:
    if lead_L.results_path != lead_R.results_path:
        log.warning("The results_path of lead_L and lead_R are different. "
                    "Self energy files will be saved in lead_L's results_path.")
    self_energy_save_path = os.path.join(lead_L.results_path, "self_energy")
Ensure output dir exists in all code paths (robustness for scheduled workers).
prepare_self_energy creates the dir, but compute_all_self_energy and workers should be defensive.
Apply this diff:
@@
- if self_energy_save_path is None:
+ if self_energy_save_path is None:
if lead_L.results_path != lead_R.results_path:
log.warning("The results_path of lead_L and lead_R are different. "
"Self energy files will be saved in lead_L's results_path.")
self_energy_save_path = os.path.join(lead_L.results_path, "self_energy")
+ os.makedirs(self_energy_save_path, exist_ok=True)
@@
def self_energy_worker(k, e, eta, lead_L, lead_R, self_energy_save_path):
@@
+ os.makedirs(self_energy_save_path, exist_ok=True)
Also applies to: 523-552
🤖 Prompt for AI Agents
In dpnegf/negf/lead_property.py around lines 491-496 (and similarly 523-552),
after determining self_energy_save_path ensure the directory exists by calling
os.makedirs(self_energy_save_path, exist_ok=True) (or
pathlib.Path(...).mkdir(parents=True, exist_ok=True)); wrap this in a try/except
to log any OSError/PermissionError with a clear message and re-raise so
scheduled workers fail loudly rather than silently proceeding.
profiler = Profiler()
profiler.start()
self.negf_compute(scf_require=False,Vbias=None)
# profiler.stop()
# output_path = os.path.join(self.results_path, "profile_report.html")
# with open(output_path, 'w') as report_file:
#     report_file.write(profiler.output_html())
profiler.stop()
output_path = os.path.join(self.results_path, "profile_report.html")
with open(output_path, 'w') as report_file:
    report_file.write(profiler.output_html())
Make profiling optional; avoid hard dependency on pyinstrument.
Importing pyinstrument unconditionally can break environments. Guard it and fall back gracefully.
Apply this diff:
- profiler = Profiler()
- profiler.start()
- self.negf_compute(scf_require=False,Vbias=None)
- profiler.stop()
- output_path = os.path.join(self.results_path, "profile_report.html")
- with open(output_path, 'w') as report_file:
- report_file.write(profiler.output_html())
+ try:
+ from pyinstrument import Profiler
+ profiler = Profiler()
+ profiler.start()
+ self.negf_compute(scf_require=False, Vbias=None)
+ profiler.stop()
+ output_path = os.path.join(self.results_path, "profile_report.html")
+ with open(output_path, 'w') as report_file:
+ report_file.write(profiler.output_html())
+ except Exception as ex:
+ log.debug("Profiling disabled (%s). Running without profiler.", ex)
+ self.negf_compute(scf_require=False, Vbias=None)
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In dpnegf/runner/NEGF.py around lines 405 to 412, make pyinstrument optional by
wrapping the Profiler import in a try/except and only using profiling when
import succeeds: attempt to import Profiler from pyinstrument, set profiler =
None on ImportError, and then guard profiler.start(), profiler.stop(), and
writing profiler.output_html() with an if profiler check so the code falls back
silently (or logs a debug message) instead of raising when pyinstrument is not
installed; ensure output_path handling is unchanged and no profiler-related
calls run when profiler is None.
Add a hands-on tutorial for transport in 2D materials.
Summary by CodeRabbit
New Features
Documentation