This is a feature request to be able to run validation/checkpointing more often than once per epoch. For very large datasets, it feels unreasonable to only be able to run validation once per epoch, especially if an epoch takes a couple of hours to complete. Running validation more often would be useful, if only for faster feedback when tuning parameters.
I was able to work around this in our usage of PyMarlin with a hack: we set `max_steps_per_epoch` to the desired validation frequency. However, this requires modifying the input dataset to track where it currently is in the actual epoch, and modifying the number of epochs supplied to the trainer to account for these "logging epochs". It also causes PyMarlin to report the number of epochs the model is trained on inaccurately.
Overall, the request would be either to integrate the hack into PyMarlin's logic for a better user experience, or to implement more frequent validation/checkpointing through a different method. I am more than happy to supply the code for the hack.
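For reference, the bookkeeping behind the hack is roughly the following. This is only a sketch of the arithmetic, not PyMarlin's API: the helper name and its arguments are hypothetical, and it assumes the trainer stops each "epoch" after exactly `max_steps_per_epoch` steps.

```python
import math

def logging_epoch_schedule(steps_per_real_epoch: int,
                           real_epochs: int,
                           validate_every_n_steps: int) -> tuple[int, int]:
    """Hypothetical helper: translate a real training plan into "logging
    epochs" so that a once-per-epoch validation hook fires every
    `validate_every_n_steps` optimizer steps instead.

    Returns (max_steps_per_epoch, num_logging_epochs) to pass to the
    trainer in place of the real epoch count.
    """
    # Each "logging epoch" is one validation interval.
    max_steps_per_epoch = validate_every_n_steps
    # A real epoch spans this many logging epochs (last one may be short).
    logging_epochs_per_real_epoch = math.ceil(
        steps_per_real_epoch / validate_every_n_steps)
    return max_steps_per_epoch, real_epochs * logging_epochs_per_real_epoch
```

For example, a dataset of 10,000 steps per epoch trained for 3 real epochs with validation every 2,500 steps would be run as 12 logging epochs of 2,500 steps each. This also shows the reporting problem in the issue: the trainer now counts 12 epochs, and the dataset must keep its own cursor so it does not restart from the beginning at each logging-epoch boundary.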