Add proper migration for "Organization" -> "Affiliation" change #276

@yarikoptic

In #266 (comment) @candleindark identified an oddity in our metadata records: Affiliation records include fields which are not part of the Affiliation model, e.g.

https://api.dandiarchive.org/api/dandisets/000029/versions/draft/info/ ATM has

                "affiliation": [
                    {
                        "name": "An Institution",
                        "roleName": [],
                        "schemaKey": "Affiliation",
                        "contactPoint": [],
                        "includeInCitation": false
                    }
                ],

After an archaeological expedition through the metadata history, we figured out that this is 99% likely due to

  • edd44f6 by @satra
    released in version 0.2.0 of the library, corresponding to schema-0.4.0

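where affiliations got their own Affiliation class, but the migrate() function was not adjusted to filter the leftover fields out. We might not even need an explicit migration here, since pydantic is likely to do the right thing. With model_construct() (which skips validation, so the schemaKey is kept exactly as passed in):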

In [10]: Affiliation.model_construct(**{
    ...:                         "name": "An Institution",
    ...:                         "roleName": [],
    ...:                         "schemaKey": "Organization",
    ...:                         "contactPoint": [],
    ...:                         "includeInCitation": False
    ...:                                             }).model_dump()
Out[10]: 
{'id': None,
 'schemaKey': 'Organization',
 'identifier': None,
 'name': 'An Institution'}

and here it is with the regular, fully validating constructor:

In [11]: Affiliation(**{
    ...:                         "name": "An Institution",
    ...:                         "roleName": [],
    ...:                         "contactPoint": [],
    ...:                         "includeInCitation": False
    ...:                                             }).model_dump()
Out[11]: 
{'id': None,
 'schemaKey': 'Affiliation',
 'identifier': None,
 'name': 'An Institution'}

So the hypothesis is that the absence of metadata migration on the dandi-archive side, ref:

keeps old metadata versions around, and that is indeed the case:

dandi@drogon:/mnt/backup/dandi/dandisets$ grep -h schemaVersion */dandiset.yaml  | sort | uniq -c
      8 schemaVersion: 0.4.4
    139 schemaVersion: 0.6.0
     26 schemaVersion: 0.6.2
     85 schemaVersion: 0.6.3
    311 schemaVersion: 0.6.4
     12 schemaVersion: 0.6.6
    111 schemaVersion: 0.6.7
    109 schemaVersion: 0.6.8

This would prevent us from validating against stricter models, such as ones that disallow extra fields, and could also simply leave "bugs" in place because migration was never carried out.
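To illustrate the concern about stricter models, here is a minimal sketch. The two classes are hypothetical stand-ins (not the actual dandischema models) whose fields mirror the model_dump() output above. It relies on pydantic v2's default extra="ignore" behavior versus extra="forbid".

```python
from typing import Optional

from pydantic import BaseModel, ConfigDict, ValidationError


class LenientAffiliation(BaseModel):
    """Hypothetical stand-in; pydantic's default is extra='ignore'."""

    id: Optional[str] = None
    schemaKey: str = "Affiliation"
    identifier: Optional[str] = None
    name: str


class StrictAffiliation(LenientAffiliation):
    """Same fields, but unknown keys are rejected."""

    model_config = ConfigDict(extra="forbid")


# Legacy record carrying fields left over from the Organization days.
legacy_record = {
    "name": "An Institution",
    "roleName": [],
    "contactPoint": [],
    "includeInCitation": False,
}

# The lenient model silently drops the unknown keys.
lenient_dump = LenientAffiliation(**legacy_record).model_dump()

# The strict model refuses the very same record.
try:
    StrictAffiliation(**legacy_record)
    rejected_fields = []
except ValidationError as exc:
    rejected_fields = sorted(str(err["loc"][0]) for err in exc.errors())

print(lenient_dump)
print(rejected_fields)
```

So records that were never migrated would pass today's lenient validation but fail as soon as extra fields are forbidden.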

On the dandi-schema side, I would like us to check what happens if we .migrate() the metadata records of existing dandisets: would the migration succeed or fail, and would it get rid of those irrelevant values?
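As a rough idea of what "getting rid of those irrelevant values" could look like, here is a plain-Python sketch. The helper strip_extra_affiliation_fields is hypothetical and not part of dandischema; the allowed field set is taken from the model_dump() output above; and the assumption that affiliations sit under "contributor" entries is mine.

```python
import copy

# Assumption: fields of the Affiliation model, per the model_dump() output above.
AFFILIATION_FIELDS = {"id", "schemaKey", "identifier", "name"}


def strip_extra_affiliation_fields(metadata):
    """Return (cleaned copy, list of removed keys); the input is not modified."""
    cleaned = copy.deepcopy(metadata)
    removed = []
    # Assumption: affiliations are nested under contributor entries.
    for contributor in cleaned.get("contributor", []):
        for affiliation in contributor.get("affiliation", []):
            for key in sorted(set(affiliation) - AFFILIATION_FIELDS):
                del affiliation[key]
                removed.append(key)
    return cleaned, removed


record = {
    "contributor": [
        {
            "name": "Someone",
            "affiliation": [
                {
                    "name": "An Institution",
                    "roleName": [],
                    "schemaKey": "Affiliation",
                    "contactPoint": [],
                    "includeInCitation": False,
                }
            ],
        }
    ]
}

cleaned, removed = strip_extra_affiliation_fields(record)
print(cleaned["contributor"][0]["affiliation"])
print(removed)
```

Running something like this over the backed-up dandiset.yaml files in dry-run mode would report which records still carry the stale fields, without touching them.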
