Wrong scaling of 1-step diff_std, should be standardized #112

Closed
joeloskarsson opened this issue Feb 4, 2025 · 6 comments · Fixed by #122
Comments

@joeloskarsson
Collaborator

After having issues with one of my variables having a really high loss, I realized that this stems from it having a very low value for diff_std in ar_model. This is the std-dev of the 1-step differences. Digging a bit, I realized that in mllam-data-prep this is computed for the non-standardized variables (https://github.com/mllam/mllam-data-prep/blob/3a48c997144e66ef2478413668e669e124400cdd/mllam_data_prep/ops/statistics.py#L45), whereas earlier (and still for the MEPS datastore) it was computed for the standardized variables.

These differences should be converted to the standardized scale before being used in the loss. It is not necessarily wrong to compute them for the non-standardized variables, they just have to be re-scaled. And we should be consistent about the scaling of these values as returned from the datastore.

Having the wrong scaling of these is actually a significant problem, and lines like `rescaled_delta_mean = pred_delta_mean * self.diff_std + self.diff_mean` will make it harder for the model rather than help. So this is something that needs fixing.

I don't have any strong opinions about where the standardization should happen: (1) when computing the stats and storing them on disk, (2) when loading them in the datastore, or (3) in the model itself. We just need to decide on some strategy to keep this clear and consistent.
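
To make the re-scaling concrete, here is a minimal numpy sketch of the relationship (purely illustrative, not the mllam-data-prep or datastore code): the 1-step differences of the standardized state are just the raw 1-step differences divided by the per-variable std, so the stats can be converted after the fact.

```python
import numpy as np

# Illustrative only: "state" is a (time, grid, variable) array and
# mean/std are the per-variable standardization statistics.
rng = np.random.default_rng(0)
state = rng.normal(size=(100, 50, 3)) * np.array([1.0, 10.0, 0.01])
mean, std = state.mean(axis=(0, 1)), state.std(axis=(0, 1))

# Diff stats computed on the raw (non-standardized) data,
# as mllam-data-prep currently does.
raw_diff = np.diff(state, axis=0)
diff_mean, diff_std = raw_diff.mean(axis=(0, 1)), raw_diff.std(axis=(0, 1))

# Diff stats of the standardized data, as npy-MEPS / neural-lam v0.1.0 expects.
std_diff = np.diff((state - mean) / std, axis=0)

# The two are related by a simple per-variable rescaling: diff_* / std
assert np.allclose(std_diff.mean(axis=(0, 1)), diff_mean / std)
assert np.allclose(std_diff.std(axis=(0, 1)), diff_std / std)
```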

@joeloskarsson joeloskarsson added the bug Something isn't working label Feb 4, 2025
@joeloskarsson
Collaborator Author

This is a hotfix that works for the mdp datastore, but it will instead make the scalings wrong for the npy-MEPS datastore: joeloskarsson@2e46ecc. So the problem should really be solved by deciding whether these values should be returned from the datastore in standardized scale or not, and then making that consistent.

@leifdenby
Member

> These differences should be converted to the standardized scale before being used in the loss. It is not necessarily wrong to compute them for the non-standardized variables, they just have to be re-scaled. And we should be consistent about the scaling of these values as returned from the datastore.

Good catch. I had a feeling that I didn't completely grok this while we were writing the datastores code 😔 I am happy with returning the standardized time/space-mean time-step differences (as npy-MEPS does it) rather than just the time/space-mean time-step differences* as we do now (from the MDP datastore). But I would change the name from `diff_mean` to `diff_mean_standardized` or something like that (also in base_graph_model.py), just to make it explicit that these mean time-step differences have been standardized. Hopefully that should avoid this confusion arising again. Returning the standardized values would make the datastore function closer to how v0.1.0 of neural-lam behaved, right?

@joeloskarsson
Collaborator Author

Yes, it would. I think that sounds good, and agree that we should update the variable name and possibly add a comment about this to avoid confusion.

@joeloskarsson joeloskarsson added the help wanted Extra attention is needed label Feb 7, 2025
@SimonKamuk SimonKamuk self-assigned this Feb 10, 2025
@leifdenby leifdenby added this to the v0.4.0 milestone Feb 10, 2025
@SimonKamuk
Contributor

SimonKamuk commented Feb 10, 2025

1. Okay, I've had a look at this. The mdp datastore currently gives non-standardized states and their standard deviation, with the standardization happening in WeatherDataset. Since diff_std and diff_mean do not go through WeatherDataset, they cannot be standardized there. I guess it could happen in the datastore, which would then mean that the datastore returns non-standardized states together with standardized state diffs. For that reason I would maybe vote for standardizing the differences in the model instead (e.g. in https://github.com/mllam/neural-lam/blob/main/neural_lam/models/ar_model.py#L71; see the sketch after this list), but if you prefer, I will implement it in the datastore as @joeloskarsson and @leifdenby suggested.

2. Just to be clear, @joeloskarsson, when you say that code like `rescaled_delta_mean = pred_delta_mean * self.diff_std + self.diff_mean` actually hurts performance, you just mean as long as the diff_std and diff_mean values are wrong (meaning not standardized), right?

3. https://github.com/mllam/neural-lam/blob/main/neural_lam/models/ar_model.py#L98 is where you saw the issue, @joeloskarsson, as that value is supposed to be the standardized diff_std, right?

4. On another note, I was a bit surprised that diff_std is combined with the variable weights in per_var_std. Is there a specific reason for this? I was mostly confused because, looking at the loss functions in metrics.py, it is not clear that the std is sometimes weighted and sometimes not. It would also mean that anytime --output_std is set, the weighting is not applied, which I would not expect. Or maybe I just haven't been paying attention.
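
A sketch of what standardizing in the model (point 1) could look like; the helper and call site here are hypothetical, not the actual ar_model.py code:

```python
import torch


def standardize_diff_stats(
    diff_mean: torch.Tensor, diff_std: torch.Tensor, state_std: torch.Tensor
) -> tuple[torch.Tensor, torch.Tensor]:
    """Convert 1-step difference stats from original units to standardized
    units, by dividing by the per-variable std used to standardize the state."""
    return diff_mean / state_std, diff_std / state_std


# Hypothetical use during model setup (names assumed, not the real code):
# self.diff_mean, self.diff_std = standardize_diff_stats(
#     diff_mean, diff_std, state_std
# )
```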

@joeloskarsson
Collaborator Author

1. Datastores do optionally return standardized dataarrays now, via `get_dataarray(self, category: str, split: str, standardize: bool = False) -> xr.DataArray`, but the default is still non-standardized (see the usage sketch after this list). So it is a bit up to the user of the datastore to keep track of whether they want things standardized or not. But I think I agree with you that it would be a bit odd to always return these standardized from the datastore. So I am perfectly fine with just doing the standardization in the model. Then you will instead have to modify the NpyMEPS datastore to return them non-standardized, as they are currently standardized even before being saved to disk there.

2. Yes, nothing more. Just that those values will scale things to be way off and the model will have to learn to compensate for that.

3. Yes, the resulting per_var_std took very strange values due to self.diff_std not being standardized.

4. Combining these originates from the GraphCast paper, and I think it also makes a lot of sense, to balance between fields that change a lot between time steps and ones that do not. If I directly put weight 1 on both z500 and u10, then most of the training would anyhow focus on u10, since there is so much more difference in those fields over 1 time step.

   I have realized that the whole structure of this, with std-devs being either computed or output and then passed to the loss functions in metrics.py, is very confusing to many people. It makes total sense to me, because I view deep learning training as maximum likelihood estimation (see the sketch at the end of this comment), but that requires thinking about ML in a specific way. I probably still think that viewpoint is a good idea, but it could require some clarifications.
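
For point 1, usage of the standardize flag might look like this (assuming `datastore` is an already-constructed datastore instance; the category/split values are just examples):

```python
# Hypothetical usage: `datastore` is an existing datastore instance
# (e.g. the MDP datastore); category/split values below are illustrative.
da_raw = datastore.get_dataarray(category="state", split="train")
da_standardized = datastore.get_dataarray(
    category="state", split="train", standardize=True
)
```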

> it would also mean that anytime --output_std is set, the weighting is not applied

Yes, that is the case. When you are outputting std-devs from the model, the model will anyhow decide the weights for different variables. So even if you would, for example, increase the weighting for u10 by a factor of 10, it would be optimal for the model to just adjust its std-dev output by a factor of 10, and you end up in the same situation. So any manually specified weights would not have any impact as you train. It could however make sense to use manually specified weights as an initialization for the model, but this is currently not implemented.
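
To make that maximum-likelihood view a bit more concrete, here is a minimal sketch (not the actual metrics.py implementation): a weighted MSE with a fixed per-variable std-dev is, up to constants, a Gaussian negative log-likelihood with that std-dev, and when the model predicts its own std-dev it can absorb any fixed manual weighting.

```python
import torch


def weighted_mse(pred, target, per_var_std):
    # Fixed per-variable scaling (e.g. diff_std combined with manual weights):
    # this is the Gaussian NLL with a known std-dev, up to additive constants.
    return (((pred - target) / per_var_std) ** 2).mean()


def gaussian_nll(pred_mean, pred_std, target):
    # Model-predicted std-dev (--output_std): the model can rescale pred_std
    # to compensate for any fixed manual weighting of the loss, which is why
    # such weights have no lasting effect in this setting.
    return (((pred_mean - target) / pred_std) ** 2 + 2 * torch.log(pred_std)).mean()
```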

@SimonKamuk
Contributor

Oh, I didn't realize that there is an option of standardizing in the datastore in addition to the WeatherDataset. In that case I don't think I have anything against doing it in the datastore.
