-
Notifications
You must be signed in to change notification settings - Fork 21
Add functionality to use arbitrary singular values in ConcatDiskArrays #256
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: fg/cat
Are you sure you want to change the base?
Conversation
src/cat.jl
Outdated
et = Base.nonmissingtype(eltype(arrays)) | ||
T = Union{Missing,eltype(et)} | ||
T = promotetype(typeof(fill), eltype(et)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the passed in array would still contain either arrays or missing
values, but we would just return fill
for those sections, so fill
would need to be a struct field.
So what we have here is kinda half way. Probably we should either drop the fill
keyword (and just rely on the array values) or keep it as a struct field and use missing
in the array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer to drop the fill keyword. I actual thought I already did so.
With my approach we can have different fill values in different parts of the concatenated array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But is that actually useful? I can't imagine a real use case for multiple fill values. And putting missing
in the array seems clearer for representing missing arrays.
The missing value used in the array of arrays doesn't have to be coupled to the fill value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that it might be a bit dangerous to allow arbitrary values to represent missings. OTOH one never knows when this might become useful for some reason. I am ok with both ways and would currently slightly prefer @felixcremer s solution because there is already an implementation. If we want to be a bit more on the safe side, we could introduce a small wrapper type like
struct MissingTile{F}
fillvalue::F
end
that explicitly marks missing entries to be concatenated and how they should be filled. The main reason why the current solution might be problematic is for DiskArrays with arrays as eltype, but maybe this is overthinking things a bit too much.
Co-authored-by: Rafael Schouten <[email protected]>
Co-authored-by: Rafael Schouten <[email protected]>
src/cat.jl
Outdated
et = Base.nonmissingtype(eltype(arrays)) | ||
T = Union{Missing,eltype(et)} | ||
T = promotetype(typeof(fill), eltype(et)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
T = promotetype(typeof(fill), eltype(et)) | |
T = Union{Missing,eltype(et)} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't you have to read all the fill values in the array and promote them? Missing is not necessarily in the type?
Co-authored-by: Rafael Schouten <[email protected]>
Why do the tests not run on this pr? |
I took this for a spin and I am not sure, whether it is this change or #256 or ConcatDiskArray even before but saving a ConcatDiskArray where I have some missings takes a very long time. Saving the european dataset took the whole weekend. |
This adds the possibility to use any value instead of only missing in ConcatDiskArrays.
We can even mix the singular values.