fixed scheduler checkpoint loading, typo #63
Conversation
- Add doc; center badges
- Add badges to README
- Update `shear_flow` results
- Add arXiv badge; update link to arXiv paper
- Califronia -> California
- …eadme: Fix shear_flow README.md
- List all the Well datasets in `utils.py`; use the dataset list in the download script; order MHD datasets by dimension
- Data: add Rayleigh-Benard uniform dataset; edit information about Shear Flow data. Statistics and metrics: add RMS statistics; add Pearson correlation metric. Code refactoring: refine video generation control; refactor sample loading from HDF5; add transformation and augmentation based on resizing and rotation; allow specifying the dataset split; format with ruff
- Update citation after NeurIPS release; update citation in docs too
- Add the Well dataset collection mention to the HF card; ignore streamlit local runs; make uploaded datasets public by default; add option to skip repacking HDF5 files; increase CPU resources in the uploading script
- Factorize models with a `BaseModel`; improve AFNO typing; add tests for the different models; do not pass dataset metadata to models; remove unnecessary arguments in `super` (co-authored-by: François Rozet)
- Make models inherit from `PyTorchModelHubMixin`; rename `upload` -> `upload_dataset`; add a script and config template for uploading models; add READMEs for the 4 models to upload to HF; specify data `n_inputs` in the model upload config; fix path, name retrieval, and instantiation issues in the uploading script; add headers, tables, and model-loading code samples to the model READMEs (including the FNO README); simplify the uploading script and logic; convert OmegaConf containers to be JSON-serializable; improve type-checking enforcement; apply suggested pathlib edits; factorize model card generation and the model README files; fix the dataset and model names in the model cards; fix a typo in the spatial resolution of UNetConvNext; update links in the README and model README files (co-authored-by: Ruben Ohana, Miles Cranmer)
- Change the HF link to point to the Well collection; document retrieval of checkpoints through HF
- Refactor `DeltaWellDataset` for time-step differences; refactor normalization; fix the AFNO and AViT models (co-authored-by: Payel Mukhopadhyay, Mike McCabe)
- Increment version from 1.0.1 to 1.1.0; add list of maintainers; add 3.13 to supported Python versions; test the max and min supported Python versions
- Add missing statistics; remove a try-except block causing silent failure; add `DeltaWellDataset` to the list of data imports; add dataset tests to check delta statistics; round statistics to 4 decimal places; fix argument in the round function; make the compute-statistics script parallel; write stats with 4-decimal scientific notation; edit YAML dumping for scientific notation; factorize dataset download tests with fixtures; reorganize dataset tests; add comments to pytest fixtures; simplify step selection; raise an error when stride and normalization are both set (co-authored-by: Payel Mukhopadhyay)
- Rewrite normalization tests: now only test the normalization class instead of the actual dataset stats (co-authored-by: Lucas Meyer)
- Add max rollout steps to the dataset docstring; update `the_well/data/datasets.py` (co-authored-by: Lucas Meyer)
- Add a template for bug reports; update the already-existing-issue message; add version and environment to the issue template; add a code snippet to obtain version and environment; fix a typo in the code snippet
- …g_page: Add missing symbolic link to rayleigh_benard_uniform
- fix: stop overwriting `best.pt` every validation
- fix: denominator calculation for short validation
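The last two fixes above can be illustrated with a minimal sketch. The class and function names here are hypothetical, not from the repo; the assumptions are that `best.pt` should only be written when validation loss improves, and that a short validation run should average over the number of batches actually seen rather than a fixed expected count.

```python
import math


class BestCheckpointTracker:
    """Sketch of 'stop overwriting best.pt every validation':
    remember the best validation loss seen so far and only signal
    a save when the new loss improves on it."""

    def __init__(self):
        self.best_val_loss = math.inf

    def should_save_best(self, val_loss):
        if val_loss < self.best_val_loss:
            self.best_val_loss = val_loss
            return True  # caller writes best.pt only in this case
        return False


def mean_val_loss(batch_losses):
    """Sketch of the denominator fix for short validation:
    divide by the actual number of batches, guarding against zero."""
    return sum(batch_losses) / max(len(batch_losses), 1)
```

With this shape, a trainer calls `should_save_best(...)` after each validation pass instead of unconditionally overwriting the best checkpoint.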
Thanks for the contribution @AnnihilatorChess. The change looks good at a quick glance, but I think it'll be a few days before someone can do a more detailed check. For now I'll trigger the workflow and make sure it doesn't break any of the tests.
@AnnihilatorChess This was accidentally closed during a restructuring of the repo. We would love to have your contribution, so once we are done with the restructuring and release, we will ping you for submitting a PR again.

Hello,
I found a bug where resuming a run from a checkpoint incorrectly restarts the LR scheduler's warmup and cosine decay. This happens because the `Trainer` in `training.py` saves and loads the optimizer state but not the LR scheduler state.
This PR fixes the `save_model` and `load_checkpoint` methods to include `lr_scheduler.state_dict()` in the checkpoint, ensuring that training resumes with the correct learning rate.
(I also fixed a small typo: `optimizer_state_dit` -> `optimizer_state_dict`.)
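A minimal sketch of the fix described above. The function names and checkpoint keys here are illustrative, not the actual `Trainer` methods in `training.py`; the point is that the scheduler's `state_dict()` must round-trip alongside the optimizer's.

```python
import torch


def save_checkpoint(path, model, optimizer, lr_scheduler, epoch):
    torch.save(
        {
            "epoch": epoch,
            "model_state_dict": model.state_dict(),
            "optimizer_state_dict": optimizer.state_dict(),
            # The fix: persist scheduler state alongside optimizer state.
            "lr_scheduler_state_dict": lr_scheduler.state_dict(),
        },
        path,
    )


def load_checkpoint(path, model, optimizer, lr_scheduler):
    checkpoint = torch.load(path, map_location="cpu")
    model.load_state_dict(checkpoint["model_state_dict"])
    optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
    # Without this line, the scheduler restarts warmup/cosine decay from step 0.
    lr_scheduler.load_state_dict(checkpoint["lr_scheduler_state_dict"])
    return checkpoint["epoch"]
```

After loading, the scheduler's internal step counter (`last_epoch`) and the optimizer's learning rate match where training left off, instead of resetting to the start of the schedule.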