-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
added depth ave currents #144
base: main
Are you sure you want to change the base?
Conversation
I think this is related to issue #19 . Using |
OG_Format.adoc
Outdated
//// | ||
== Depth average currents | ||
|
||
Depth average currents should be included as a geophysical value in the N_MEASUREMENTS dimension along with an associated QC channel at the time when the calculation was completed by the glider. The methodology for derivation of the depth average current can be included as a methodology attribute to the variable. Results of the OceanGliders working group on depth average currents are pending at the time of writing (25th April 2023). Once, the working group provides result the approach for inclusion of depth average currents should be reviewed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using N_MEASUREMENTS
would impose a lot of redundancy, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@castelao Not sure what you mean by this. Do you mean as opposed to defining them as scalars? Can you explain? We discussed the pros and cons at length on the call today and came to agreement that defining these parameters using the unlimited record dimension provides the most flexibility for the data providers. Yes, there will be lots of missing/_FillValues but compression will make the file size increase negligible. There are some cons (ERDDAP aggregation, distribution, etc.) depending on whether the data providers are submitting one file/underwater segment as opposed to one file that contains all of the underwater segments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kerfoot , let's first check if we are understanding the same from the proposed text. My understanding is that as temperature is defined as TEMP(N_MEASUREMENTS)
, so would be depth_averaged_velocity(N_MEASUREMENS)
. Thus, even though such an average is only possible between two surfaces, the same value would be repeated as many times as temperature measurements during a dive, which could be on the order of thousands. Did I understand it correctly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. The calculated depth-average current value would be reported at the appropriate time(N_MEASUREMENTS) row only. The time of the reported depth-averaged current would be left up to the data provider, according to their own determination. While the underlying calculation of the depth-averaged current is likely similar for all glider types, the timing of the reported current will be left up to the data provider.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still confused, but I think we are getting there. So the depth_averaged_velocity
is not a scalar, but has the same dimension of time, which is N_MEASUREMENTS
. Only one value of depth_averaged_velocity
would be reported, which would be aligned with the respective time that the average is representative for. Is it the proposed idea?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@castelao I'll offer my take as well. I think your last comment matches what @kerfoot proposed on the call. As I understood it, assuming the file contains multiple profiles, the suggestion was to include a sparse depth averaged current variable where there are mostly fill values but one depth averaged current value per profile. @kerfoot proposed file size would not be an issue if nc compression is used. An alternative is to follow the NGDAC approach where each .nc file is one indiv. underwater segment. In this case the depth averaged velocity can be stored as a scalar. I don't personally have experience aggregating scalars like this but @kerfoot reports this works well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HI all, just to remind everyone, the main intended use of OG1.0 is for the fundamental data object of "mission" a.k.a. "deployment." So I believe the proposal we have to consider is whether to include DAC as a variable with same dimensions as the fundamental geophysical dimension (N_MEASUREMENTS), with FillValue on most points. I am fine leaving this to the next version, and then we could add surface velocity as well at the same time. My reason for pushing this is because all the glider types have been computing this for years and the data are there for the taking already (with many caveats).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry for the delay. I think that there are different points here. My opinion on each one of those:
- I think it is worth re-phrasing this part of the text. I don't think the text is clear for any of the two discussed paths. As it is, it can create confusion.
- I agree with @glidermann , and independent if it is possible to be used with arbitrary chunks of a mission, it should be expected to work well with a full mission/deployment. No problem in using this format with just a 'segment', but it should be a priority to accommodate well a full mission.
- There is no problem for the average velocity to be a scalar in a segment sub-dataset, and there is no problem in aggregating several segments, but that would require a dimension to align such an array of depth averages. Better we name it, or it would be free for each one decides. Alternatively, if the idea in the case of aggregation of multiple segments is to jump straight to a long array with FillValues, it would be required a new variable with the representative time for that average.
- I understand that a long array with
N_MEASUREMENTS
with mostly FillValues is possible. It is true that compression minimizes the problem (although there is a small overhead for the compression since nothing is free). If storage is not a problem, I'm not so sure that on the time to process/analyze it that overhead would be larger. The most important issue with that option in my opinion is that the depth average is a quantity representative for a period of time, not a specific time. Different than the temperature where each measurement could be assumed instantaneous (in fact an average of a short time window), the depth average is for the period underwater. If that is not implicit within the data structure, it would be a good idea to designate what is the 'segment', otherwise we will let every user figure that out by themselves. I strongly think that the priority should be on the final user and the format should be the easiest way that science can be done with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm the only one seeing an issue here, feel free to 'resolve conversation'.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi all, I think we should leave this issue until the next version. Looking at the manual as a whole, a dedicated, but cryptic, section devoted to DAC seems out of place. After all, we have the generic and reference to vocabularies in the Geophysical Variables section and do not prescribe that any particular quantity should be included, only how it should be done. Since current is in the vocab, users can do it now as they wish, or wait until the next version for more guidance. this discussion above will be a good starting point, and is here for posterity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the team decides to move forward with the long array with FillValues, please move ahead. I won't stop it.
OG_Format.adoc
Outdated
//// | ||
== Depth average currents | ||
|
||
Depth average currents should be included as a geophysical value in the N_MEASUREMENTS dimension along with an associated QC channel at the time when the calculation was completed by the glider. The methodology for derivation of the depth average current can be included as a methodology attribute to the variable. Results of the OceanGliders working group on depth average currents are pending at the time of writing (25th April 2023). Once, the working group provides result the approach for inclusion of depth average currents should be reviewed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry for the delay. I think that there are different points here. My opinion on each one of those:
- I think it is worth re-phrasing this part of the text. I don't think the text is clear for any of the two discussed paths. As it is, it can create confusion.
- I agree with @glidermann , and independent if it is possible to be used with arbitrary chunks of a mission, it should be expected to work well with a full mission/deployment. No problem in using this format with just a 'segment', but it should be a priority to accommodate well a full mission.
- There is no problem for the average velocity to be a scalar in a segment sub-dataset, and there is no problem in aggregating several segments, but that would require a dimension to align such an array of depth averages. Better we name it, or it would be free for each one decides. Alternatively, if the idea in the case of aggregation of multiple segments is to jump straight to a long array with FillValues, it would be required a new variable with the representative time for that average.
- I understand that a long array with
N_MEASUREMENTS
with mostly FillValues is possible. It is true that compression minimizes the problem (although there is a small overhead for the compression since nothing is free). If storage is not a problem, I'm not so sure that on the time to process/analyze it that overhead would be larger. The most important issue with that option in my opinion is that the depth average is a quantity representative for a period of time, not a specific time. Different than the temperature where each measurement could be assumed instantaneous (in fact an average of a short time window), the depth average is for the period underwater. If that is not implicit within the data structure, it would be a good idea to designate what is the 'segment', otherwise we will let every user figure that out by themselves. I strongly think that the priority should be on the final user and the format should be the easiest way that science can be done with it.
@rtodd-WHOI , maybe this discussion is of your interest. |
Is this PR ready to merge? I believe it's one of the few remaining decisions to finalize for the first release |
@callumrollo , thanks for raising this point again. I still think that this approach will create confusion, and even if moving forward with this, the text is not clear. But if everyone else is happy as it is, feel free to 'resolve conversation' and merge it. |
I do not think we should get into details of a potential geophysical variable we think should be included in this document. We can discuss details of currents, or other derived/processed quantities in a best practice document as they become available and mature. Then reference those.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have strong opinion here.
I would tag this issue in order to discover it back easily.
Proponents:
Moderator: @OceanGlidersCommunity/format-mantainers
Type of PR
Related Issues
Dates when it got review approvals
Release checklist
ask reviewers to update their review.
to merge.
according to the final decision.
For maintainers
best to have one single moderator to guide and help this PR to move
forward. It is OK to update the moderador pass it to another one.
Comments