Commit f189921

run through LLM
1 parent e972795 commit f189921

1 file changed: +118 -89 lines changed

README.md

+118 -89
@@ -13,16 +13,22 @@
1313

1414
![Logo](https://raw.githubusercontent.com/zincware/ZnTrack/main/docs/source/_static/logo_ZnTrack.png)
1515

16-
# ZnTrack: Make your Python Code reproducible!
16+
# ZnTrack: Make Your Python Code Reproducible!
1717

18-
ZnTrack enables you to convert your existing Python code into reproducible
19-
workflows by converting them into directed graph structure with well defined
20-
inputs and outputs per node.
18+
ZnTrack (`zɪŋk træk`) is a lightweight and easy-to-use Python package for converting your existing Python code into reproducible workflows. By structuring your code as a directed graph with well-defined inputs and outputs, ZnTrack ensures reproducibility, scalability, and ease of collaboration.
2119

22-
## Example
20+
## Key Features
2321

24-
Let us take the following workflow that constructs a periodic, atomistic system
25-
of Ethanol and runs a geometry optimization using MACE-MP-0.
22+
- **Reproducible Workflows**: Convert Python scripts into reproducible workflows with minimal effort.
23+
- **Parameter, Output, and Metric Tracking**: Easily track parameters, outputs, and metrics in your Python code.
24+
- **Lightweight and Database-Free**: ZnTrack is lightweight and does not require any databases.
25+
- **DVC Integration**: Seamlessly integrates with [DVC](https://dvc.org) for data version control.
26+
27+
## Example: Molecular Dynamics Workflow
28+
29+
Let’s take a workflow that constructs a periodic, atomistic system of Ethanol and runs a geometry optimization using MACE-MP-0.
30+
31+
### Original Workflow
2632

2733
```python
2834
from ase.optimize import LBFGS
@@ -41,14 +47,16 @@ dyn.run(fmax=0.5)
4147
```
4248

4349
<details>
44-
<summary>Dependencencyes</summary>
45-
For this example to work you will need
46-
- https://github.com/ACEsuit/mace
47-
- https://github.com/m3g/packmol
48-
- https://github.com/zincware/rdkit2ase
50+
<summary>Dependencies</summary>
51+
For this example to work, you will need:
52+
- [MACE](https://github.com/ACEsuit/mace)
53+
- [Packmol](https://github.com/m3g/packmol)
54+
- [rdkit2ase](https://github.com/zincware/rdkit2ase)
4955
</details>
5056

51-
To make this reproducible, we convert it into the following graph structure:
57+
### Converted Workflow with ZnTrack
58+
59+
To make this workflow reproducible, we convert it into a graph structure:
5260

5361
```mermaid
5462
flowchart LR
@@ -57,118 +65,93 @@ Smiles2Conformers --> Pack --> StructureOptimization
5765
MACE_MP --> StructureOptimization
5866
```
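
The flowchart above implies an execution order: a node can only run once everything it depends on has finished. As a rough illustration (this is not how ZnTrack or DVC resolve the graph internally), the same edges can be written as a plain dictionary and ordered with the standard library's `graphlib`. The variable names `dependencies` and `order` are chosen for this sketch only.

```python
from graphlib import TopologicalSorter

# Each key maps a node to the set of nodes it depends on (its predecessors),
# mirroring the edges in the flowchart:
#   Smiles2Conformers --> Pack --> StructureOptimization
#   MACE_MP --> StructureOptimization
dependencies = {
    "Smiles2Conformers": set(),
    "MACE_MP": set(),
    "Pack": {"Smiles2Conformers"},
    "StructureOptimization": {"Pack", "MACE_MP"},
}

# static_order() yields the nodes so that every node appears after its predecessors.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

The exact position of `MACE_MP` in the printed order may vary, since it only has to come before `StructureOptimization`.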
5967

68+
#### Node Definitions
69+
6070
```python
6171
import zntrack
6272
import ase.io
6373
from pathlib import Path
6474

6575
class Smiles2Conformers(zntrack.Node):
66-
smiles: str = zntrack.params()
67-
numConfs: int = zntrack.params(32)
76+
smiles: str = zntrack.params()
77+
numConfs: int = zntrack.params(32)
6878

69-
frames_path: Path = zntrack.outs_path(zntrack.nwd / "frames.xyz")
79+
frames_path: Path = zntrack.outs_path(zntrack.nwd / "frames.xyz")
7080

71-
def run(self) -> None:
72-
frames = smiles2conformers(smiles=self.smiles, numConfs=self.numConfs)
73-
ase.io.write(frames, self.frames_path)
81+
def run(self) -> None:
82+
frames = smiles2conformers(smiles=self.smiles, numConfs=self.numConfs)
83+
ase.io.write(frames, self.frames_path)
7484

75-
@property
76-
def frames(self) -> list[ase.Atoms]:
77-
with self.state.fs.open(self.frames_path, "r") as f:
78-
return list(ase.io.iread(f, ":", format="extxyz"))
85+
@property
86+
def frames(self) -> list[ase.Atoms]:
87+
with self.state.fs.open(self.frames_path, "r") as f:
88+
return list(ase.io.iread(f, ":", format="extxyz"))
7989

8090

8191
class Pack(zntrack.Node):
82-
data: list[list[ase.Atoms]] = zntrack.deps()
83-
counts: list[int] = zntrack.params()
84-
density: float = zntrack.params()
85-
86-
frames_path: Path = zntrack.outs_path(zntrack.nwd / "frames.xyz")
87-
88-
def run(self) -> None:
89-
box = pack(data=self.data, counts=self.counts, density=self.density)
90-
ase.io.write(box, self.frames_path)
92+
data: list[list[ase.Atoms]] = zntrack.deps()
93+
counts: list[int] = zntrack.params()
94+
density: float = zntrack.params()
9195

92-
@property
93-
def frames(self) -> list[ase.Atoms]:
94-
with self.state.fs.open(self.frames_path, "r") as f:
95-
return list(ase.io.iread(f, ":", format="extxyz"))
96+
frames_path: Path = zntrack.outs_path(zntrack.nwd / "frames.xyz")
9697

97-
```
98+
def run(self) -> None:
99+
box = pack(data=self.data, counts=self.counts, density=self.density)
100+
ase.io.write(box, self.frames_path)
98101

99-
We could hardcode the MACE_MP model into the StructureOptimization Node, but we
100-
can also define it as a dependency. In contrast to `Smiles2Conformers` and
101-
`Pack` the model does not require a `run` method and thus we can define it as a
102-
`@dataclass`
102+
@property
103+
def frames(self) -> list[ase.Atoms]:
104+
with self.state.fs.open(self.frames_path, "r") as f:
105+
return list(ase.io.iread(f, ":", format="extxyz"))
103106

104-
```python
105-
from dataclasses import dataclass
106107

107108
@dataclass
108109
class MACE_MP:
109-
model: str = "medium"
110+
model: str = "medium"
110111

111-
def get_calculator(self, **kwargs):
112-
return mace_mp(model=self.model)
112+
def get_calculator(self, **kwargs):
113+
return mace_mp(model=self.model)
113114

114115

115116
class StructureOptimization(zntrack.Node):
116-
model: MACE_MP = zntrack.deps()
117-
data: list[ase.Atoms] = zntrack.deps()
118-
data_id: int = zntrack.params()
119-
fmax: float = zntrack.params(0.05)
120-
121-
frames_path: Path = zntrack.outs_path(zntrack.nwd / "frames.traj")
122-
123-
def run(self):
124-
atoms = self.data[self.data_id]
125-
atoms.calc = self.model.get_calculator()
126-
dyn = LBFGS(atoms, trajectory=self.frames_path)
127-
dyn.run(fmax=0.5)
128-
129-
@property
130-
def frames(self) -> list[ase.Atoms]:
131-
with self.state.fs.open(self.frames_path, "rb") as f:
132-
return list(ase.io.iread(f, ":", format="traj"))
117+
model: MACE_MP = zntrack.deps()
118+
data: list[ase.Atoms] = zntrack.deps()
119+
data_id: int = zntrack.params()
120+
fmax: float = zntrack.params(0.05)
121+
122+
frames_path: Path = zntrack.outs_path(zntrack.nwd / "frames.traj")
123+
124+
def run(self):
125+
atoms = self.data[self.data_id]
126+
atoms.calc = self.model.get_calculator()
127+
dyn = LBFGS(atoms, trajectory=self.frames_path)
128+
dyn.run(fmax=0.5)
129+
130+
@property
131+
def frames(self) -> list[ase.Atoms]:
132+
with self.state.fs.open(self.frames_path, "rb") as f:
133+
return list(ase.io.iread(f, ":", format="traj"))
133134
```
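
The `MACE_MP` class in the block above is decorated with `@dataclass`, which requires `from dataclasses import dataclass` to be in scope. A minimal, self-contained sketch of that pattern follows: a parameter-only dependency with no `run` method. `ModelConfig` is a hypothetical stand-in for `MACE_MP`, and its `get_calculator` returns a plain dictionary rather than a real calculator so that the example runs without third-party packages.

```python
from dataclasses import asdict, dataclass


@dataclass
class ModelConfig:
    """Hypothetical stand-in for MACE_MP: holds parameters, defines no run method."""

    model: str = "medium"

    def get_calculator(self, **kwargs) -> dict:
        # A real implementation would construct and return a calculator object here.
        # A plain dict keeps this sketch free of third-party imports.
        return {"model": self.model, **kwargs}


config = ModelConfig()
calculator = config.get_calculator()

# asdict() exposes the dataclass fields, i.e. the values a tracking tool could record.
fields = asdict(config)
print(fields)
```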
134135

135-
Now that we have defined all necessary Nodes we can put them to use and build
136-
our graph. Best to go into a new and empty directory, run `git init` followed by
137-
`dvc init`. Then we create a file `src/__init__.py` and place the Node
138-
definitions in there. Finally we create a new file `main.py` as described bellow
139-
and execute it using `python main.py` to build and access our workflow.
136+
#### Building and Running the Workflow
140137

141138
```python
142139
import zntrack
143-
144140
from src import MACE_MP, Smiles2Conformers, Pack, StructureOptimization
145141

146142
project = zntrack.Project()
147143

148144
model = MACE_MP()
149145

150146
with project:
151-
etoh = Smiles2Conformers(
152-
smiles="CCO",
153-
numConfs=32
154-
)
155-
box = Pack(
156-
data=[etoh.frames],
157-
counts=[32],
158-
density=789
159-
)
160-
optm = StructureOptimization(
161-
model=model,
162-
data=box.frames,
163-
data_id=-1,
164-
fmax=0.5
165-
)
147+
etoh = Smiles2Conformers(smiles="CCO", numConfs=32)
148+
box = Pack(data=[etoh.frames], counts=[32], density=789)
149+
optm = StructureOptimization(model=model, data=box.frames, data_id=-1, fmax=0.5)
166150

167151
project.repro()
168152
```
169153

170-
We can now see that the files have been created in `nodes/StructureOptimization>/frames.traj` which contains our final trajectory.
171-
To look at the results, we can also run the following Python script:
154+
#### Accessing Results
172155

173156
```python
174157
import zntrack
@@ -177,6 +160,52 @@ optm = zntrack.from_rev(name="StructureOptimization")
177160
print(optm.frames)
178161
```
179162

180-
For more examples checkout the following packages that build ontop of ZnTrack
181-
- https://mlipx.readthedocs.io/en/latest/
182-
- https://github.com/zincware/IPSuite
163+
For more examples, check out the following packages that build on top of ZnTrack:
164+
- [MLIPx](https://mlipx.readthedocs.io/en/latest/)
165+
- [IPSuite](https://github.com/zincware/IPSuite)
166+
167+
---
168+
169+
## Technical Details
170+
171+
### ZnTrack as an Object-Relational Mapping for DVC
172+
173+
ZnTrack provides an easy-to-use interface for DVC directly from Python. It handles all the computational overhead of reading config files, defining outputs in the `dvc.yaml`, and much more.
174+
175+
For more information on DVC, visit their [homepage](https://dvc.org/doc).
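
To make "defining outputs in the `dvc.yaml`" more concrete, the sketch below builds the general shape of a DVC stage entry (`cmd`, `deps`, `outs`) as a Python dictionary. It is an illustration under assumptions: the stage name and paths are borrowed from the example above, the command is a hypothetical placeholder, and the result is not the exact entry ZnTrack writes.

```python
import json

# General shape of one stage in a dvc.yaml file, expressed as a Python dict.
# The command below is a hypothetical placeholder, not ZnTrack's real command.
stage = {
    "stages": {
        "StructureOptimization": {
            "cmd": "python run_node.py StructureOptimization",  # hypothetical
            "deps": ["nodes/Pack/frames.xyz"],
            "outs": ["nodes/StructureOptimization/frames.traj"],
        }
    }
}

# Pretty-print the structure; DVC itself stores this as YAML rather than JSON.
rendered = json.dumps(stage, indent=2)
print(rendered)
```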
176+
177+
---
178+
179+
## References
180+
181+
If you use ZnTrack in your research, please cite us:
182+
183+
```bibtex
184+
@misc{zillsZnTrackDataCode2024,
185+
title = {{{ZnTrack}} -- {{Data}} as {{Code}}},
186+
author = {Zills, Fabian and Sch{\"a}fer, Moritz and Tovey, Samuel and K{\"a}stner, Johannes and Holm, Christian},
187+
year = {2024},
188+
eprint={2401.10603},
189+
archivePrefix={arXiv},
190+
}
191+
```
192+
193+
---
194+
195+
## Copyright
196+
197+
This project is distributed under the [Apache License Version 2.0](https://github.com/zincware/ZnTrack/blob/main/LICENSE).
198+
199+
---
200+
201+
## Similar Tools
202+
203+
Here’s a list of other projects that either work together with ZnTrack or achieve similar results with slightly different goals or programming languages:
204+
205+
- [DVC](https://dvc.org/) - Main dependency of ZnTrack for Data Version Control.
206+
- [dvthis](https://github.com/jcpsantiago/dvthis) - Introduce DVC to R.
207+
- [DAGsHub Client](https://github.com/DAGsHub/client) - Logging parameters from within Python.
208+
- [MLFlow](https://mlflow.org/) - A Machine Learning Lifecycle Platform.
209+
- [Metaflow](https://metaflow.org/) - A framework for real-life data science.
210+
- [Hydra](https://hydra.cc/) - A framework for elegantly configuring complex applications.
211+
- [Snakemake](https://snakemake.readthedocs.io/en/stable/) - Workflow management system for reproducible and scalable data analyses.
