Labels: checkpointing, feature, loops, pl
Description
Bug description
When restoring a checkpoint with trainer.validate, global_step and epoch are overwritten with 0. They should keep the values stored in the checkpoint; otherwise the loggers record validation metrics at the wrong step. This prevents correctly validating a model's checkpoints as a post-processing step.
How to reproduce the bug
Run a model with `trainer.validate(model, datamodule=datamodule, ckpt_path=ckpt_path)` and log a metric; the result will be logged at step 0.
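For reference, a minimal self-contained sketch of the reproduction (assumes the PL 1.x API; the toy module, random dataset, and sizes are illustrative only):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl


class BoringModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        loss = self.layer(batch[0]).sum()
        self.log("train_loss", loss)
        return loss

    def validation_step(self, batch, batch_idx):
        # This metric is logged against trainer.global_step, which is 0
        # after restoring the checkpoint via trainer.validate(...).
        self.log("val_loss", self.layer(batch[0]).sum())

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)


def dataloader():
    return DataLoader(TensorDataset(torch.randn(64, 32)), batch_size=8)


if __name__ == "__main__":
    model = BoringModel()
    trainer = pl.Trainer(max_epochs=2, limit_train_batches=4, limit_val_batches=2)
    trainer.fit(model, train_dataloaders=dataloader(), val_dataloaders=dataloader())
    ckpt_path = trainer.checkpoint_callback.best_model_path

    # Validate from the checkpoint in a fresh Trainer, as a post-processing
    # job would. global_step/epoch are reset to 0 here, so "val_loss" is
    # logged at step 0 instead of at the checkpointed step.
    trainer2 = pl.Trainer(limit_val_batches=2)
    trainer2.validate(model, dataloaders=dataloader(), ckpt_path=ckpt_path)
    print("global_step after validate:", trainer2.global_step)  # prints 0
```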
Environment
- PyTorch Lightning Version: 1.9
- PyTorch Version: 1.13
- Python version: 3.10
- OS: Linux
- CUDA/cuDNN version: 11.7