--load argument should be required when in evaluation mode #105

Open
observingClouds opened this issue Jan 26, 2025 · 3 comments
Labels
enhancement New feature or request

Comments

@observingClouds
Contributor

In evaluation mode a checkpoint needs to be loaded, so a check for the --load argument should be added; otherwise the code fails further down the line with an unhelpful error:

0: [rank0]:   File "neural-lam/neural_lam/models/ar_model.py", line 394, in <dictcomp>
0: [rank0]:     f"test_loss_unroll{step}": time_step_loss[step - 1]
0: [rank0]: IndexError: index 14 is out of bounds for dimension 0 with size 10
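
For example, an early check could look roughly like this (a minimal sketch assuming an argparse-based CLI; the --eval flag name and parser wiring are illustrative, only --load is taken from the actual issue):

```python
import argparse

parser = argparse.ArgumentParser()
# Illustrative argument definitions, not the real neural-lam parser setup
parser.add_argument("--eval", type=str, default=None, help="Eval split to run on")
parser.add_argument("--load", type=str, default=None, help="Checkpoint to load")
args = parser.parse_args()

# Fail early with a clear message instead of an obscure IndexError later on
if args.eval is not None and args.load is None:
    parser.error("--load must point to a checkpoint when running in evaluation mode")
```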
@joeloskarsson
Collaborator

This error is not due to a missing checkpoint, but to running with an --ar_steps_eval value lower than the maximum step in --val_steps_to_log, i.e. trying to log validation error for a lead time that is never forecasted. However, the defaults for these arguments do trigger this issue, which is probably not a great setup and something we should change. It would also be good to have a check that all steps given in --val_steps_to_log are <= --ar_steps_eval, instead of failing in this unhelpful way.
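
Such a consistency check could be a sketch along these lines (the defaults shown are illustrative placeholders, not the actual neural-lam defaults):

```python
import argparse

parser = argparse.ArgumentParser()
# Placeholder defaults for illustration only
parser.add_argument("--ar_steps_eval", type=int, default=10)
parser.add_argument("--val_steps_to_log", type=int, nargs="+", default=[1, 2, 3, 5, 10, 15])
args = parser.parse_args()

# Reject lead times that can never be forecasted with the chosen unroll length
too_large = [s for s in args.val_steps_to_log if s > args.ar_steps_eval]
if too_large:
    parser.error(
        f"--val_steps_to_log contains steps {too_large}, but only "
        f"{args.ar_steps_eval} steps are unrolled (--ar_steps_eval)"
    )
```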

@joeloskarsson
Collaborator

Is there still an interest in requiring --load in eval mode? (even though that did not cause the issue)

Otherwise maybe we can change this issue to be about the default --val_steps_to_log argument causing this problem, as that is something I think we should fix.

@sadamov
Collaborator

sadamov commented Feb 10, 2025

The error you are getting is now being discussed here: #120

I think we should still throw a warning when the model is in eval mode and no checkpoint was loaded, as that is probably not what most users want to do: evaluate freshly initialized weights.
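
Something along these lines, as a rough sketch (the args attributes mirror the hypothetical parser sketched above, not the actual CLI):

```python
import warnings

def check_eval_checkpoint(args):
    """Warn if evaluation is requested without loading a checkpoint."""
    if args.eval is not None and args.load is None:
        warnings.warn(
            "Running in evaluation mode without --load: this evaluates "
            "freshly initialized weights, which is probably not intended."
        )
```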
