Issue when first <field/> does not have an 'index' attribute #17
Thanks @nielsklazenga. I've fixed the bug with the DwC-A `meta.xml` format used by this dataset. I've also added test cases for DwC-A extraction and validation, including one that uses this dataset's format: https://github.com/AtlasOfLivingAustralia/dwcahandler/blob/feature/v0.4.0/tests/input_files/dwca/dwca-sample2/meta.xml
* #17 - Fix for metadata fields and add validate dwca test cases
* #17 - Remove commented code
* #17 - Validate unit test
* Update macOS version for test
* AtlasOfLivingAustralia/preingestion#272 - Strip column header spaces in csv files. Add test cases to test the core and ext dataframe
* increase test version
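The preingestion#272 commit strips spaces from CSV column headers. The commit message doesn't show how dwcahandler does this (it works on the core and extension dataframes), so this is only a rough, stdlib-only sketch, assuming a comma-separated file with a single header line. `read_csv_stripped` is a hypothetical name, not part of dwcahandler.

```python
import csv
import io


def read_csv_stripped(text: str) -> list[dict[str, str]]:
    """Hypothetical helper: read CSV rows keyed by header names with surrounding spaces removed."""
    reader = csv.reader(io.StringIO(text))
    # Strip whitespace from each header so "scientificName " matches "scientificName".
    header = [name.strip() for name in next(reader)]
    return [dict(zip(header, row)) for row in reader]


print(read_csv_stripped("occurrenceID , scientificName\n1,Acacia\n"))
# [{'occurrenceID': '1', 'scientificName': 'Acacia'}]
```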
Some data sets we get through the GBIF repatriation do not pass pre-ingestion and give the following error message:
The error can be traced to the extraction of metadata from the `meta.xml` file in the DwcaHandler, https://github.com/AtlasOfLivingAustralia/dwcahandler/blob/develop/src/dwcahandler/dwca/dwca_meta.py/#L160:

The fields list is created in https://github.com/AtlasOfLivingAustralia/dwcahandler/blob/develop/src/dwcahandler/dwca/dwca_meta.py/#L146:
The `meta.xml` file of the DwC-A that failed starts with:

So, the first `<field/>`, or `field[0]`, indeed does not have an 'index' attribute. Default fields that are not associated with columns in the CSV are part of the DwC-A specification, so we should be able to deal with them, or at least they should not break pre-ingestion.
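To make the failure concrete, here is a minimal sketch, assuming a hypothetical `meta.xml` (the failing dataset's file isn't reproduced here) whose first `<field/>` carries only `default` and `term`. This is not dwcahandler's code: `naive_field_indexes` is a hypothetical helper showing how any lookup that assumes every `<field/>` has an 'index' breaks on a default-only field.

```python
import xml.etree.ElementTree as ET

# Hypothetical meta.xml: the first <field/> has a default value but no 'index'.
META_XML = """<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core rowType="http://rs.tdwg.org/dwc/terms/Occurrence" ignoreHeaderLines="1">
    <files><location>occurrence.txt</location></files>
    <id index="0"/>
    <field default="PreservedSpecimen" term="http://rs.tdwg.org/dwc/terms/basisOfRecord"/>
    <field index="0" term="http://rs.tdwg.org/dwc/terms/occurrenceID"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/scientificName"/>
  </core>
</archive>"""

NS = {"dwc": "http://rs.tdwg.org/dwc/text/"}


def naive_field_indexes(meta_xml: str) -> list[int]:
    """Hypothetical helper that assumes every <field/> has an 'index' attribute."""
    core = ET.fromstring(meta_xml).find("dwc:core", NS)
    return [int(f.attrib["index"]) for f in core.findall("dwc:field", NS)]


try:
    naive_field_indexes(META_XML)
except KeyError as err:
    # The default-only basisOfRecord field has no 'index' key in its attributes.
    print(f"KeyError: {err}")
```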
The easy solution would be to only include `<field/>` elements with an 'index' attribute in the `fields` list, but this could also be taken as an opportunity to deal with default values that are provided in the DwC-A `meta.xml` (maybe later).
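A sketch of the easy solution that also keeps the default values for later use. It assumes a hypothetical `meta.xml` whose core has one default-only `<field/>`, and it only looks at the core, not extensions. `MetaFields`, `read_core_fields` and `apply_defaults` are hypothetical names, not dwcahandler's API.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

NS = {"dwc": "http://rs.tdwg.org/dwc/text/"}

# Hypothetical meta.xml with a default-only field before the indexed ones.
META_XML = """<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core rowType="http://rs.tdwg.org/dwc/terms/Occurrence" ignoreHeaderLines="1">
    <files><location>occurrence.txt</location></files>
    <id index="0"/>
    <field default="PreservedSpecimen" term="http://rs.tdwg.org/dwc/terms/basisOfRecord"/>
    <field index="0" term="http://rs.tdwg.org/dwc/terms/occurrenceID"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/scientificName"/>
  </core>
</archive>"""


@dataclass
class MetaFields:
    columns: dict[int, str]   # CSV column index -> term URI
    defaults: dict[str, str]  # term URI -> default value from meta.xml


def read_core_fields(meta_xml: str) -> MetaFields:
    """Split core <field/> elements into indexed columns and default values."""
    core = ET.fromstring(meta_xml).find("dwc:core", NS)
    columns: dict[int, str] = {}
    defaults: dict[str, str] = {}
    for field in core.findall("dwc:field", NS):
        term = field.get("term")
        index = field.get("index")    # None for default-only fields
        default = field.get("default")
        if index is not None:
            columns[int(index)] = term
        if default is not None:
            defaults[term] = default
    return MetaFields(columns, defaults)


def apply_defaults(record: dict[str, str], defaults: dict[str, str]) -> dict[str, str]:
    """Fill terms that are missing or empty in a record with their meta.xml default."""
    filled = dict(record)
    for term, value in defaults.items():
        if not filled.get(term):
            filled[term] = value
    return filled


fields = read_core_fields(META_XML)
print(fields.defaults)
# {'http://rs.tdwg.org/dwc/terms/basisOfRecord': 'PreservedSpecimen'}
```

Defaults are recorded independently of `index` because, as far as I understand the Darwin Core text guide, a `<field/>` may have both, in which case the default fills empty values in that column.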