Wrong scaling of 1-step diff_std, should be standardized #112

Closed
joeloskarsson opened this issue Feb 4, 2025 · 6 comments · Fixed by #122
Comments

@joeloskarsson
Collaborator

After having issues with one of my variables having a really high loss, I realized that this stems from it having a very low value for diff_std in ar_model. This is the std-dev of the 1-step differences. Digging a bit, I realized that in mllam-data-prep this is computed for the non-standardized variables (https://github.com/mllam/mllam-data-prep/blob/3a48c997144e66ef2478413668e669e124400cdd/mllam_data_prep/ops/statistics.py#L45), whereas earlier (and still for the MEPS datastore) it was computed for the standardized variables.

These differences should be converted to the standardized scale before being used in the loss. It is not necessarily wrong to compute them for the non-standardized variables, they just have to be re-scaled. And we should be consistent about the scaling of these values as returned from the datastore.

Having the wrong scaling of these is actually a significant problem, and lines like `rescaled_delta_mean = pred_delta_mean * self.diff_std + self.diff_mean` will make it harder for the model rather than help. So this is something that needs fixing.

I don't have any strong opinions about where the standardization should happen: (1) when computing the stats and storing them on disk, (2) when loading them in the datastore, or (3) in the model itself. We just need to decide on some strategy to keep this clear and consistent.
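
To make the re-scaling concrete, here is a minimal numpy sketch of the relationship (purely illustrative, not the mllam-data-prep or datastore code): the 1-step differences of the standardized state are just the raw 1-step differences divided by the per-variable std, so the stats can be converted after the fact.

```python
import numpy as np

# Illustrative only: "state" is a (time, grid, variable) array and
# mean/std are the per-variable standardization statistics.
rng = np.random.default_rng(0)
state = rng.normal(size=(100, 50, 3)) * np.array([1.0, 10.0, 0.01])
mean, std = state.mean(axis=(0, 1)), state.std(axis=(0, 1))

# Diff stats computed on the raw (non-standardized) data,
# as mllam-data-prep currently does.
raw_diff = np.diff(state, axis=0)
diff_mean, diff_std = raw_diff.mean(axis=(0, 1)), raw_diff.std(axis=(0, 1))

# Diff stats of the standardized data, as npy-MEPS / neural-lam v0.1.0 expects.
std_diff = np.diff((state - mean) / std, axis=0)

# The two are related by a simple per-variable rescaling: diff_* / std
assert np.allclose(std_diff.mean(axis=(0, 1)), diff_mean / std)
assert np.allclose(std_diff.std(axis=(0, 1)), diff_std / std)
```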

@joeloskarsson joeloskarsson added the bug Something isn't working label Feb 4, 2025
@joeloskarsson
Collaborator Author

This is a hotfix that works for the mdp datastore, but it will instead make the scalings wrong for the npy-MEPS datastore: joeloskarsson@2e46ecc. So the problem should really be solved by deciding whether these values should be returned from the datastore in standardized scale or not, and then making that consistent.

@leifdenby
Member

> These differences should be converted to the standardized scale before being used in the loss. It is not necessarily wrong to compute them for the non-standardized variables, they just have to be re-scaled. And we should be consistent about the scaling of these values as returned from the datastore.

Good catch. I had a feeling that I didn't completely grok this while we were writing the datastores code 😔 I am happy with returning the standardized time/space-mean time-step differences (as npy-MEPS does it) rather than just the time/space-mean time-step differences* as we do now (from the MDP datastore). But I would change the name from `diff_mean` to `diff_mean_standardized` or something like that (also in base_graph_model.py), just to make it explicit that these mean time-step differences have been standardized. Hopefully that should avoid this confusion arising again. Returning the standardized values would make the datastore function closer to how v0.1.0 of neural-lam behaved, right?

@joeloskarsson
Collaborator Author

Yes, it would. I think that sounds good, and agree that we should update the variable name and possibly add a comment about this to avoid confusion.

@joeloskarsson joeloskarsson added the help wanted Extra attention is needed label Feb 7, 2025
@SimonKamuk SimonKamuk self-assigned this Feb 10, 2025
@leifdenby leifdenby added this to the v0.4.0 milestone Feb 10, 2025
@SimonKamuk
Contributor

SimonKamuk commented Feb 10, 2025

1. Okay, I've had a look at this. The mdp datastore currently gives non-standardized states and their standard deviation, with the standardization happening in WeatherDataset. Since diff_std and diff_mean do not go through WeatherDataset, they cannot be standardized there. I guess it could happen in the datastore, which would then mean that the datastore returns non-standardized states together with standardized state diffs. For that reason I would maybe vote for standardizing the differences in the model instead (e.g. in https://github.com/mllam/neural-lam/blob/main/neural_lam/models/ar_model.py#L71; see the sketch after this list), but if you prefer, I will implement it in the datastore as @joeloskarsson and @leifdenby suggested.

2. Just to be clear, @joeloskarsson, when you say that code like `rescaled_delta_mean = pred_delta_mean * self.diff_std + self.diff_mean` actually hurts performance, you just mean as long as the diff_std and diff_mean values are wrong (meaning not standardized), right?

3. https://github.com/mllam/neural-lam/blob/main/neural_lam/models/ar_model.py#L98 is where you saw the issue, @joeloskarsson, as that value is supposed to be the standardized diff_std, right?

4. On another note, I was a bit surprised that diff_std is combined with the variable weights in per_var_std. Is there a specific reason for this? I was mostly confused because, looking at the loss functions in metrics.py, it is not clear that the std is sometimes weighted and sometimes not. It would also mean that anytime --output_std is set, the weighting is not applied, which I would not expect. Or maybe I just haven't been paying attention.
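
A sketch of what standardizing in the model (point 1) could look like; the helper and call site here are hypothetical, not the actual ar_model.py code:

```python
import torch


def standardize_diff_stats(
    diff_mean: torch.Tensor, diff_std: torch.Tensor, state_std: torch.Tensor
) -> tuple[torch.Tensor, torch.Tensor]:
    """Convert 1-step difference stats from original units to standardized
    units, by dividing by the per-variable std used to standardize the state."""
    return diff_mean / state_std, diff_std / state_std


# Hypothetical use during model setup (names assumed, not the real code):
# self.diff_mean, self.diff_std = standardize_diff_stats(
#     diff_mean, diff_std, state_std
# )
```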

@joeloskarsson
Collaborator Author

1. Datastores do optionally return standardized dataarrays now, via `get_dataarray(self, category: str, split: str, standardize: bool = False) -> xr.DataArray`, but the default is still non-standardized (see the usage sketch after this list). So it is a bit up to the user of the datastore to keep track of whether they want things standardized or not. But I think I agree with you that it would be a bit odd to always return these standardized from the datastore. So I am perfectly fine with just doing the standardization in the model. Then you will instead have to modify the NpyMEPS datastore to return them non-standardized, as they are currently standardized even before being saved to disk there.

2. Yes, nothing more. Just that those values will scale things to be way off and the model will have to learn to compensate for that.

3. Yes, the resulting per_var_std took very strange values due to self.diff_std not being standardized.

4. Combining these originates from the GraphCast paper, and I think it also makes a lot of sense, to balance between fields that change a lot between time steps and ones that do not. If I directly put weight 1 on both z500 and u10, then most of the training would anyhow focus on u10, since there is so much more difference in those fields over 1 time step.

   I have realized that the whole structure of this, with std-devs being either computed or output and then passed to the loss functions in metrics.py, is very confusing to many people. It makes total sense to me, because I view deep learning training as maximum likelihood estimation (see the sketch at the end of this comment), but that requires thinking about ML in a specific way. I probably still think that viewpoint is a good idea, but it could require some clarifications.
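
For point 1, usage of the standardize flag might look like this (assuming `datastore` is an already-constructed datastore instance; the category/split values are just examples):

```python
# Hypothetical usage: `datastore` is an existing datastore instance
# (e.g. the MDP datastore); category/split values below are illustrative.
da_raw = datastore.get_dataarray(category="state", split="train")
da_standardized = datastore.get_dataarray(
    category="state", split="train", standardize=True
)
```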

> it would also mean that anytime --output_std is set, the weighting is not applied

Yes, that is the case. When you are outputting std-devs from the model, the model will anyhow decide the weights for different variables. So even if you would, for example, increase the weighting for u10 by a factor of 10, it would be optimal for the model to just adjust its std-dev output by a factor of 10, and you end up in the same situation. So any manually specified weights would not have any impact as you train. It could however make sense to use manually specified weights as an initialization for the model, but this is currently not implemented.
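
To make that maximum-likelihood view a bit more concrete, here is a minimal sketch (not the actual metrics.py implementation): a weighted MSE with a fixed per-variable std-dev is, up to constants, a Gaussian negative log-likelihood with that std-dev, and when the model predicts its own std-dev it can absorb any fixed manual weighting.

```python
import torch


def weighted_mse(pred, target, per_var_std):
    # Fixed per-variable scaling (e.g. diff_std combined with manual weights):
    # this is the Gaussian NLL with a known std-dev, up to additive constants.
    return (((pred - target) / per_var_std) ** 2).mean()


def gaussian_nll(pred_mean, pred_std, target):
    # Model-predicted std-dev (--output_std): the model can rescale pred_std
    # to compensate for any fixed manual weighting of the loss, which is why
    # such weights have no lasting effect in this setting.
    return (((pred_mean - target) / pred_std) ** 2 + 2 * torch.log(pred_std)).mean()
```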

@SimonKamuk
Contributor

Oh, I didn't realize that there is an option of standardizing in the datastore in addition to the WeatherDataset. In that case I don't think I have anything against doing it in the datastore.
