Skip to content

Merge reload_boutdataset() functionality into open_boutdataset()#137

Merged
johnomotani merged 5 commits intomasterfrom
merge-reload-into-open
Sep 10, 2020
Merged

Merge reload_boutdataset() functionality into open_boutdataset()#137
johnomotani merged 5 commits intomasterfrom
merge-reload-into-open

Conversation

@johnomotani
Copy link
Collaborator

Removes reload_boutdataset(). Instead updates open_boutdataset() to automatically detect and reload datasets saved from xBOUT, including those saved with separate_vars=True. pre_squashed argument is no longer needed so is removed.

Replaces #136.

@TomNicholas maybe overkill for your issue, but I think this solution is nicer! Does it fix the problem for you? If the broken backward compatibility with pre_squashed is going to be much effort for you to fix (the re-opened Datasets will be different now, e.g. will include a metadata attribute), then we could keep pre_squashed with the old behaviour on a deprecation cycle.

@johnomotani johnomotani added the enhancement New feature or request label Sep 9, 2020
@pep8speaks
Copy link

pep8speaks commented Sep 9, 2020

Hello @johnomotani! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-09-10 12:24:52 UTC

Global coordinates are useful for later re-loading (using
combine="by_coords") Datasets which were split up, so use add_geometry()
to include a coordinate for each dimension even if geometry=None so no
physical coordinates are needed.
Use combine='by_coords' option to xr.open_mfdataset() so
reload_boutdataset() can load datasets that were saved split (in any
dimension) into several files.

pre_squashed option to reload_boutdataset() is no longer needed, so is
removed.
Remove reload_boutdataset() function, putting its functionality into
open_boutdataset.

Also removes pre_squashed argument - datasets saved with xBOUT are
automatically detected now.
@johnomotani johnomotani force-pushed the merge-reload-into-open branch from f255b4a to e83f3a0 Compare September 9, 2020 21:34
@johnomotani
Copy link
Collaborator Author

@TomNicholas about backward compatibility - the test for a file that was saved by xBOUT relies on metadata being saved in the way implemented in #124, which was only merged in May, so I guess you probably have data files that don't have metadata:keep_yboundaries in the attributes? So I'll put pre_squashed back in for backward compatibility.

@TomNicholas
Copy link
Collaborator

@johnomotani I was actually just testing this!

the test for a file that was saved by xBOUT relies on metadata being saved in the way implemented in #124, which was only merged in May, so I guess you probably have data files that don't have metadata:keep_yboundaries in the attributes?

I don't think I do - I tried it on my big 3D run that I resaved and it seems to work...

This necessitates corresponding changes in xstorm which I was just trying to do though

Copy link
Collaborator

@TomNicholas TomNicholas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this @johnomotani

I like the idea, I probably should have just used reload_boudataset originally but this is a better solution.

Only thing I would say is that I don't think there should be a filetype_fake return type, because code should never behave differently depending on whether or not it thinks it is being tested, otherwise the tests aren't testing the real code!

@johnomotani
Copy link
Collaborator Author

Only thing I would say is that I don't think there should be a filetype_fake return type, because code should never behave differently depending on whether or not it thinks it is being tested, otherwise the tests aren't testing the real code!

I agree, but the trouble is the test suite was timing out on Travis (taking more than 50 minutes). Having the ability to create a test dataset without writing to disk and reading from disk, when testing non-I/O related features, gets this down to ~34 minutes. Pragmatically then we have a choice between testing fewer options, or introducing the complication of a second code path to avoid necessarily writing-to/reading-from disk... I prefer the second one (and pushed it in in #132), but as I write this I realise that there probably needs to be a test that the from-disk and 'faked' versions do exactly the same thing... I'll make another PR for that.

@johnomotani
Copy link
Collaborator Author

...also maybe 'fake' was a bad choice of name. I was thinking of the ability to create a BoutDataset from a list of xr.Dataset as 'just for testing' but in principle it's a feature, e.g. you could run a simulation with boutcore, stick the outputs into one or several Datasets (might need more features added in boutcore to gather arrays to a single process or something, so this might not actually be possible right now...) and then use open_boutdataset() to convert them into a BoutDataset. Honestly though, it's just useful for speeding the tests up!

Copy link
Collaborator

@TomNicholas TomNicholas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't realise there was a practical problem with tests timing out. I'm quite surprised that they take that long... But we can deal with that later - this general approach should still be merged I think.

@johnomotani
Copy link
Collaborator Author

Yes, when I added the Region functionality, I added a lot of tests for different cases, and when they all run the test suite gets pretty long! I've added more tests of the new open-from-list-of-Datasets approach in #139.

@codecov-commenter
Copy link

codecov-commenter commented Sep 10, 2020

Codecov Report

Merging #137 into master will decrease coverage by 0.25%.
The diff coverage is 75.55%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #137      +/-   ##
==========================================
- Coverage   77.88%   77.62%   -0.26%     
==========================================
  Files          14       14              
  Lines        2139     2150      +11     
  Branches      480      486       +6     
==========================================
+ Hits         1666     1669       +3     
- Misses        304      308       +4     
- Partials      169      173       +4     
Impacted Files Coverage Δ
xbout/geometries.py 78.87% <73.33%> (+0.30%) ⬆️
xbout/load.py 79.40% <75.34%> (-1.83%) ⬇️
xbout/__init__.py 100.00% <100.00%> (ø)
xbout/utils.py 80.42% <100.00%> (-0.07%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ec95524...59bf07e. Read the comment docs.

@johnomotani johnomotani merged commit c90b790 into master Sep 10, 2020
@johnomotani johnomotani deleted the merge-reload-into-open branch September 10, 2020 13:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants