Wrong scaling of 1-step diff_std, should be standardized #112
Comments
This is a hotfix that works for the mdp datastore, but it will instead make the scalings wrong for the npy-MEPS datastore: joeloskarsson@2e46ecc. So the problem should really be solved by making this consistent: decide whether these statistics should be returned in standardized scale from the datastore or not.
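To illustrate why an unconditional fix only works for one datastore: a loader that always divides by the state std would correct the mdp stats but divide the already-standardized npy-MEPS stats a second time. The sketch below is hypothetical (the function name, arguments, and flag are made up for illustration); the loader must know which convention was used on disk.

```python
import numpy as np

def load_diff_std(stored_diff_std, state_std, stored_standardized):
    """Hypothetical loader: return diff_std in standardized units.

    stored_standardized records which convention the datastore used on
    disk. Applying the division unconditionally would divide the
    already-standardized npy-MEPS stats a second time.
    """
    stored_diff_std = np.asarray(stored_diff_std, dtype=float)
    if stored_standardized:
        # npy-MEPS convention: stats already in standardized units
        return stored_diff_std
    # mdp convention: stats stored in physical units, rescale here
    return stored_diff_std / np.asarray(state_std, dtype=float)
```

Either convention on disk is workable as long as this boundary is the single place where the rescaling decision is made.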
Good catch. I had a feeling that I didn't completely grok this while we were writing the datastores code 😔 I am happy with returning the standardized time/space-mean time-step differences (as npy-MEPS does it) rather than just the time/space-mean time-step differences as we do now (from the MDP datastore). But I would change the name accordingly.
Yes, it would. That sounds good, and I agree that we should update the variable name and possibly add a comment about this to avoid confusion.
Yes, that is the case. When the model outputs std-devs, it will anyhow decide the weights for the different variables itself. So even if you, for example, increased the weighting for u10 by a factor of 10, it would be optimal for the model to simply adjust its std-dev output by a factor of 10, and you would end up in the same situation. Any manually specified weights would therefore have no impact during training. It could however make sense to use manually specified weights as an initialization for the model, but this is currently not implemented.
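A toy numeric check of that argument, assuming a loss of the (hypothetical) form `w * mean(r**2 / sigma**2)` per variable: a manual weight `w` is exactly cancelled if the model inflates its predicted std-dev by `sqrt(w)`. If the actual loss also contains a `log sigma` term, the weight only adds a constant offset at the optimum, which does not change the gradients on the predictions.

```python
import numpy as np

r = np.array([0.3, -1.2, 0.7])  # toy residuals for one variable
sigma = 0.5                      # model-predicted std-dev for that variable
w = 10.0                         # manual weight placed on this variable

unweighted = np.mean(r**2 / sigma**2)

# If the variable is up-weighted by w, the model can cancel the weight
# by inflating its predicted std-dev by sqrt(w):
reweighted = w * np.mean(r**2 / (np.sqrt(w) * sigma) ** 2)

assert np.isclose(unweighted, reweighted)
```

So under this loss form, the training signal the model sees is the same with or without the manual weight, which is why such weights are only useful as an initialization.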
Oh, I didn't realize that there is an option of standardizing both in the datastore and in the WeatherDataset. In that case I don't think I have anything against doing it in the datastore.
After having issues with one of my variables having a really high loss, I realized that they stem from it having a very low value for diff_std in ar_model. This is the std-dev of the 1-step differences. Digging a bit, I realized that in mllam-data-prep this is computed for the non-standardized variables (https://github.com/mllam/mllam-data-prep/blob/3a48c997144e66ef2478413668e669e124400cdd/mllam_data_prep/ops/statistics.py#L45), whereas it was earlier (and still is for the MEPS datastore) computed for the standardized variables:
neural-lam: neural_lam/datastore/npyfilesmeps/compute_standardization_stats.py, line 283 (at f233f87)
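The two conventions differ by exactly the variable's overall std: for a series with roughly stationary statistics, the std of one-step differences of the standardized series equals the raw-difference std divided by the variable's std. A small illustrative sketch (not neural-lam code; the array names are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
raw = np.cumsum(rng.normal(size=10_000))  # toy time series for one variable

mean, std = raw.mean(), raw.std()
standardized = (raw - mean) / std

diff_std_raw = np.diff(raw).std()                    # non-standardized convention
diff_std_standardized = np.diff(standardized).std()  # standardized convention

# The two conventions differ exactly by the variable's overall std:
assert np.isclose(diff_std_standardized, diff_std_raw / std)
```

For a variable with a large physical std, the raw-scale diff_std can thus be orders of magnitude away from the standardized one, which matches the symptom of one variable dominating the loss.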
Having the wrong scaling of these is actually a significant problem, and lines like neural_lam/models/base_graph_model.py, line 174 (at f233f87) are affected.
I don't have any strong opinions about where the standardization should happen:
1. when computing the stats and storing them on disk,
2. when loading them in the datastore, or
3. in the model itself.
We just need to decide on some strategy to keep this clear and consistent.
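As one possible shape for option 2, the datastore could rescale the stats once on load, so every downstream consumer sees them in standardized units regardless of what is stored on disk. This is a hypothetical sketch; the class and attribute names are made up and do not reflect the actual neural-lam API:

```python
import numpy as np

class DatastoreSketch:
    """Hypothetical datastore that exposes diff stats in standardized units."""

    def __init__(self, diff_std_raw, state_std):
        # Stats as stored on disk, in physical units
        self._diff_std_raw = np.asarray(diff_std_raw, dtype=float)
        self._state_std = np.asarray(state_std, dtype=float)

    @property
    def diff_std_standardized(self):
        # Rescale once, here, so the convention is fixed at the
        # datastore boundary rather than in each consumer.
        return self._diff_std_raw / self._state_std

ds = DatastoreSketch(diff_std_raw=[2.0, 0.5], state_std=[4.0, 0.25])
print(ds.diff_std_standardized)  # -> [0.5 2. ]
```

Doing it at the datastore boundary (rather than on disk or in the model) keeps the stored files in interpretable physical units while still guaranteeing a single consistent convention inside the model code.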