Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for loading variable length packets from CSV #115

Closed
jmbhughes opened this issue Feb 20, 2024 · 10 comments
Closed

Add support for loading variable length packets from CSV #115

jmbhughes opened this issue Feb 20, 2024 · 10 comments

Comments

@jmbhughes
Copy link
Contributor

When completing #114 , I noticed that you cannot load variable length packets from a CSV. There is no way to do the reference linking or specifying an expanding field. PUNCH would like to load variable length packets from CSV. I'm happy to work on this, but I'd like guidance on how you want this structured. At the moment, the loading CSV is in the _BasePacket class. This change would make the loading different for fixed and variable length.

@ddasilva
Copy link
Collaborator

Thanks for pulling on this thread!

I was thinking of a couple ways this could work:

  1. Have the data_type column support values like uint8(*) (expanding) or uint8(other_field) (reference linking) or uint8(3,3) (3x3 array)
  2. Add a new CSV column for array_shape, which maps directly to the keyword argument

Option 1 seems maybe more readable when skimming the file, but Option 2 is easier for code to work with.

What do you think?

@jmbhughes
Copy link
Contributor Author

I can see pros and cons with both. Option 1 does not require existing CSV files to be modified for anyone already using this feature. Option 2 is simpler to write (both code on our side to process and the CSV on the user side) and understand, but it feels less elegant to have empty entries for PacketFields. I like the idea of keeping the data type and array shape columns separate and making them directly correspond to the code keywords.

If there is already wide adoption of the CSV format, I would lean towards Option 1 since it doesn't break anything for users. However, I think I like Option 2 if the CSV loading feature is not currently widely used. Do you know if the CSV loading is currently being used by any mission?

If we go with Option 1, I wonder if it might be clearer to just literally write uint8(expand) so one doesn't have to remember the mapping between * and expanding... however said mapping does feel pretty obvious to me. So I'm not strongly committed to that opinion.

@ddasilva
Copy link
Collaborator

Any thoughts @ehsteve ? I think that some missions are using the CSV format, but we can do something like

from_file(file, version="v2") and change the default the next major CCSDSPy release

@ehsteve
Copy link
Member

ehsteve commented Feb 26, 2024

@ddasilva thanks for considering my opinion! I'm much more in favor of option 1 both because it does not break existing use of csv but also because it feels right to add new data types that expand.

Also I would use * and not the word 'expand' just in case someone wants to call one of their fields 'expand'. It also makes the code for parsing clearer.

I am really not a fan of something like from_file(file, version="v2") to ever be used because then someone would have to pay attention to version numbers of things and we'd have to support code to support multiple versions of csv files which would be a pain (and also hard to document).

@ddasilva
Copy link
Collaborator

Great! I'll go with @ehsteve's suggestion. I think for array order we can do uint(8, 8; F) to e.g. specify Fortran order, where what's after the semicolon is optional and uint(8, 8) uses the default array order.

Regarding uint(*) vs uint(expand), I think we should stick with expand, because that's what the API does. I think having two parallel vocabularies to describe things is going to get confusing.

Are there any other missing features of CSV formats what we haven't covered?

@ddasilva
Copy link
Collaborator

@tloubrieu-jpl feel free to offer your thoughts if you are interested

@ehsteve
Copy link
Member

ehsteve commented Feb 27, 2024

Do we need to support specifying Fortran order? The array can be easily flipped after the fact with the numpy array, right?

@ddasilva
Copy link
Collaborator

Probably not tbh. We can wait until someone complains to reconsider.
Does this cover everything @jmbhughes ?

@jmbhughes
Copy link
Contributor Author

Yes @ddasilva this gives me enough to get something started.

@jmbhughes
Copy link
Contributor Author

Closing since the branch was merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants