[BUG] correctly annotated column not detected as categorical if the BIDS "Levels" key is missing #153

Closed
4 tasks done
surchs opened this issue Jun 10, 2023 · 0 comments · Fixed by #154

surchs commented Jun 10, 2023

Is there an existing issue for this?

  • I have searched the existing issues

Expected Behavior

"sex": {
    "Annotations": {
        "IsAbout": {
            "TermURL": "nb:Sex",
            "Label": ""
        },
        "Levels": {
            "F": {
                "TermURL": "snomed:248152002",
                "Label": ""
            },
            "M": {
                "TermURL": "snomed:248153007",
                "Label": ""
            }
        }
    },
    "Description": "There should have been a description here, but there wasn't. :("
}

is a valid data dictionary entry for a categorical column under our current data model for data dictionaries.
It does not, however, have the "Levels" key in the BIDS portion of the dictionary (i.e. outside the "Annotations" key).
The fact that this is valid is its own issue (#151 and #152).

But we do rely on the presence of the BIDS "Levels" key to determine whether a column is categorical or not:

if is_column_categorical(col, data_dict):
    transf_val.append(map_cat_val_to_term(value, col, data_dict))
else:
    # TODO: replace with more flexible solution when we have more
    # continuous variables than just age
    transf_val.append(
        transform_age(str(value), get_age_heuristic(col, data_dict))
    )

and

def is_column_categorical(column: str, data_dict: dict) -> bool:
    """Determine whether a column in a Neurobagel data dictionary is categorical"""
    if "Levels" in data_dict[column]:
        return True
    return False

And as a consequence, in the above example these things will happen:

  1. The CLI accepts the data dictionary as valid (incorrectly, but see the other bugs: [BUG] data dictionary schema let's anything with an annotation pass #151 and [BUG] Data dictionary model allows for a continuous BIDS column to have categorical Neurobagel annotations #152)
  2. The CLI determines that the "sex" column is continuous because it does not have a BIDS "Levels" key and is therefore not considered categorical (we do not positively assert continuous columns; "continuous" is just the catch-all bucket for anything that is neither missing nor categorical)
  3. The CLI tries to run get_age_heuristic on the "sex" column because it thinks the column is continuous

def get_age_heuristic(column: str, data_dict: dict) -> str:
    return data_dict[column]["Annotations"]["Transformation"]["TermURL"]

  4. We get a KeyError because our column does not have a "Transformation" key
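
The chain of events above can be reproduced in isolation. The following is a minimal sketch: the two helper functions are copied from the snippets quoted in this issue, the dictionary is a trimmed copy of the "sex" entry, and `reproduce` is a hypothetical wrapper (not part of the codebase) that only mimics the branch taken in get_transformed_values.

```python
def is_column_categorical(column: str, data_dict: dict) -> bool:
    """Current behaviour as quoted above: only the BIDS-level "Levels" key is checked."""
    if "Levels" in data_dict[column]:
        return True
    return False


def get_age_heuristic(column: str, data_dict: dict) -> str:
    return data_dict[column]["Annotations"]["Transformation"]["TermURL"]


# Trimmed copy of the example entry: "Levels" exists only inside "Annotations".
data_dict = {
    "sex": {
        "Annotations": {
            "IsAbout": {"TermURL": "nb:Sex", "Label": ""},
            "Levels": {
                "F": {"TermURL": "snomed:248152002", "Label": ""},
                "M": {"TermURL": "snomed:248153007", "Label": ""},
            },
        },
        "Description": "There should have been a description here, but there wasn't. :(",
    }
}


def reproduce(column: str, data_dict: dict) -> str:
    """Hypothetical helper: follow the same categorical/continuous branching."""
    if is_column_categorical(column, data_dict):
        return "categorical"
    try:
        # The column falls into the catch-all "continuous" bucket,
        # so the age heuristic is looked up and the lookup fails.
        get_age_heuristic(column, data_dict)
    except KeyError as err:
        return f"KeyError: {err}"
    return "continuous"


print(reproduce("sex", data_dict))  # prints: KeyError: 'Transformation'
```

The printed message matches the last line of the traceback in the "Error message" section.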

Current Behavior

No response

Error message

Long snippet
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /usr/local/lib/python3.10/site-packages/bagel/cli.py:80 in pheno             │
│                                                                              │
│    77 │   │                                                                  │
│    78 │   │   subject = models.Subject(hasLabel=str(participant))            │
│    79 │   │   if "sex" in column_mapping.keys():                             │
│ ❱  80 │   │   │   _sex_val = putil.get_transformed_values(                   │
│    81 │   │   │   │   column_mapping["sex"], _sub_pheno, data_dictionary     │
│    82 │   │   │   )                                                          │
│    83 │   │   │   if _sex_val:                                               │
│                                                                              │
│ ╭───────────────────────────────── locals ─────────────────────────────────╮ │
│ │      _sub_pheno = participant_id    sub-01                               │ │
│ │                   sex                    M                               │ │
│ │                   age                   25                               │ │
│ │                   Name: 0, dtype: object                                 │ │
│ │  column_mapping = {                                                      │ │
│ │                   │   'participant': ['participant_id'],                 │ │
│ │                   │   'sex': ['sex'],                                    │ │
│ │                   │   'age': ['age']                                     │ │
│ │                   }                                                      │ │
│ │ data_dictionary = {                                                      │ │
│ │                   │   'age': {                                           │ │
│ │                   │   │   'Annotations': {                               │ │
│ │                   │   │   │   'IsAbout': {                               │ │
│ │                   │   │   │   │   'TermURL': 'nb:Age',                   │ │
│ │                   │   │   │   │   'Label': ''                            │ │
│ │                   │   │   │   },                                         │ │
│ │                   │   │   │   'Transformation': {                        │ │
│ │                   │   │   │   │   'TermURL': 'nb:float',                 │ │
│ │                   │   │   │   │   'Label': 'hello'                       │ │
│ │                   │   │   │   }                                          │ │
│ │                   │   │   },                                             │ │
│ │                   │   │   'Description': "There should have been a       │ │
│ │                   description here, but there wasn't. :("                │ │
│ │                   │   },                                                 │ │
│ │                   │   'participant_id': {                                │ │
│ │                   │   │   'Annotations': {                               │ │
│ │                   │   │   │   'IsAbout': {                               │ │
│ │                   │   │   │   │   'TermURL': 'nb:ParticipantID',         │ │
│ │                   │   │   │   │   'Label': ''                            │ │
│ │                   │   │   │   }                                          │ │
│ │                   │   │   },                                             │ │
│ │                   │   │   'Description': "There should have been a       │ │
│ │                   description here, but there wasn't. :("                │ │
│ │                   │   },                                                 │ │
│ │                   │   'sex': {                                           │ │
│ │                   │   │   'Annotations': {                               │ │
│ │                   │   │   │   'IsAbout': {                               │ │
│ │                   │   │   │   │   'TermURL': 'nb:Sex',                   │ │
│ │                   │   │   │   │   'Label': ''                            │ │
│ │                   │   │   │   },                                         │ │
│ │                   │   │   │   'Levels': {                                │ │
│ │                   │   │   │   │   'F': {                                 │ │
│ │                   │   │   │   │   │   'TermURL': 'snomed:248152002',     │ │
│ │                   │   │   │   │   │   'Label': ''                        │ │
│ │                   │   │   │   │   },                                     │ │
│ │                   │   │   │   │   'M': {                                 │ │
│ │                   │   │   │   │   │   'TermURL': 'snomed:248153007',     │ │
│ │                   │   │   │   │   │   'Label': ''                        │ │
│ │                   │   │   │   │   }                                      │ │
│ │                   │   │   │   }                                          │ │
│ │                   │   │   },                                             │ │
│ │                   │   │   'Description': "There should have been a       │ │
│ │                   description here, but there wasn't. :("                │ │
│ │                   │   }                                                  │ │
│ │                   }                                                      │ │
│ │      dictionary = PosixPath('/home/surchs/Repositories/Testplace/get_mo… │ │
│ │            name = 'ds000003'                                             │ │
│ │          output = PosixPath('/home/surchs/Repositories/Testplace/get_mo… │ │
│ │     participant = 'sub-01'                                               │ │
│ │    participants = 'participant_id'                                       │ │
│ │           pheno = PosixPath('/home/surchs/Repositories/Testplace/get_mo… │ │
│ │        pheno_df =    participant_id sex age                              │ │
│ │                   0          sub-01   M  25                              │ │
│ │                   1          sub-02   M  18                              │ │
│ │                   2          sub-03   F  22                              │ │
│ │                   3          sub-04   F  25                              │ │
│ │                   4          sub-05   M  22                              │ │
│ │                   5          sub-06   M  38                              │ │
│ │                   6          sub-07   M  36                              │ │
│ │                   7          sub-08   M  19                              │ │
│ │                   8          sub-09   M  20                              │ │
│ │                   9          sub-10   F  19                              │ │
│ │                   10         sub-11   F  21                              │ │
│ │                   11         sub-12   M  19                              │ │
│ │                   12         sub-13   F  29                              │ │
│ │         subject = Subject(                                               │ │
│ │                   │                                                      │ │
│ │                   identifier='nb:b6e692fe-0ade-476c-b79a-0ba12f86ad37',  │ │
│ │                   │   hasLabel='sub-01',                                 │ │
│ │                   │   hasSession=None,                                   │ │
│ │                   │   hasAge=None,                                       │ │
│ │                   │   hasSex=None,                                       │ │
│ │                   │   isSubjectGroup=None,                               │ │
│ │                   │   hasDiagnosis=None,                                 │ │
│ │                   │   hasAssessment=None,                                │ │
│ │                   │   schemaKey='Subject'                                │ │
│ │                   )                                                      │ │
│ │    subject_list = []                                                     │ │
│ │    tool_mapping = defaultdict(<class 'list'>, {})                        │ │
│ ╰──────────────────────────────────────────────────────────────────────────╯ │
│                                                                              │
│ /usr/local/lib/python3.10/site-packages/bagel/pheno_utils.py:172 in          │
│ get_transformed_values                                                       │
│                                                                              │
│   169 │   │   │   # TODO: replace with more flexible solution when we have m │
│   170 │   │   │   # continuous variables than just age                       │
│   171 │   │   │   transf_val.append(                                         │
│ ❱ 172 │   │   │   │   transform_age(str(value), get_age_heuristic(col, data_ │
│   173 │   │   │   )                                                          │
│   174 │                                                                      │
│   175 │   # TODO: once we can handle multiple columns, this section should b │
│                                                                              │
│ ╭───────────────────────────────── locals ─────────────────────────────────╮ │
│ │        col = 'sex'                                                       │ │
│ │    columns = ['sex']                                                     │ │
│ │  data_dict = {                                                           │ │
│ │              │   'age': {                                                │ │
│ │              │   │   'Annotations': {                                    │ │
│ │              │   │   │   'IsAbout': {'TermURL': 'nb:Age', 'Label': ''},  │ │
│ │              │   │   │   'Transformation': {                             │ │
│ │              │   │   │   │   'TermURL': 'nb:float',                      │ │
│ │              │   │   │   │   'Label': 'hello'                            │ │
│ │              │   │   │   }                                               │ │
│ │              │   │   },                                                  │ │
│ │              │   │   'Description': "There should have been a            │ │
│ │              description here, but there wasn't. :("                     │ │
│ │              │   },                                                      │ │
│ │              │   'participant_id': {                                     │ │
│ │              │   │   'Annotations': {                                    │ │
│ │              │   │   │   'IsAbout': {                                    │ │
│ │              │   │   │   │   'TermURL': 'nb:ParticipantID',              │ │
│ │              │   │   │   │   'Label': ''                                 │ │
│ │              │   │   │   }                                               │ │
│ │              │   │   },                                                  │ │
│ │              │   │   'Description': "There should have been a            │ │
│ │              description here, but there wasn't. :("                     │ │
│ │              │   },                                                      │ │
│ │              │   'sex': {                                                │ │
│ │              │   │   'Annotations': {                                    │ │
│ │              │   │   │   'IsAbout': {'TermURL': 'nb:Sex', 'Label': ''},  │ │
│ │              │   │   │   'Levels': {                                     │ │
│ │              │   │   │   │   'F': {                                      │ │
│ │              │   │   │   │   │   'TermURL': 'snomed:248152002',          │ │
│ │              │   │   │   │   │   'Label': ''                             │ │
│ │              │   │   │   │   },                                          │ │
│ │              │   │   │   │   'M': {                                      │ │
│ │              │   │   │   │   │   'TermURL': 'snomed:248153007',          │ │
│ │              │   │   │   │   │   'Label': ''                             │ │
│ │              │   │   │   │   }                                           │ │
│ │              │   │   │   }                                               │ │
│ │              │   │   },                                                  │ │
│ │              │   │   'Description': "There should have been a            │ │
│ │              description here, but there wasn't. :("                     │ │
│ │              │   }                                                       │ │
│ │              }                                                           │ │
│ │        row = participant_id    sub-01                                    │ │
│ │              sex                    M                                    │ │
│ │              age                   25                                    │ │
│ │              Name: 0, dtype: object                                      │ │
│ │ transf_val = []                                                          │ │
│ │      value = 'M'                                                         │ │
│ ╰──────────────────────────────────────────────────────────────────────────╯ │
│                                                                              │
│ /usr/local/lib/python3.10/site-packages/bagel/pheno_utils.py:122 in          │
│ get_age_heuristic                                                            │
│                                                                              │
│   119                                                                        │
│   120                                                                        │
│   121 def get_age_heuristic(column: str, data_dict: dict) -> str:            │
│ ❱ 122 │   return data_dict[column]["Annotations"]["Transformation"]["TermURL │
│   123                                                                        │
│   124                                                                        │
│   125 def transform_age(value: str, heuristic: str) -> float:                │
│                                                                              │
│ ╭───────────────────────────────── locals ─────────────────────────────────╮ │
│ │    column = 'sex'                                                        │ │
│ │ data_dict = {                                                            │ │
│ │             │   'age': {                                                 │ │
│ │             │   │   'Annotations': {                                     │ │
│ │             │   │   │   'IsAbout': {'TermURL': 'nb:Age', 'Label': ''},   │ │
│ │             │   │   │   'Transformation': {                              │ │
│ │             │   │   │   │   'TermURL': 'nb:float',                       │ │
│ │             │   │   │   │   'Label': 'hello'                             │ │
│ │             │   │   │   }                                                │ │
│ │             │   │   },                                                   │ │
│ │             │   │   'Description': "There should have been a description │ │
│ │             here, but there wasn't. :("                                  │ │
│ │             │   },                                                       │ │
│ │             │   'participant_id': {                                      │ │
│ │             │   │   'Annotations': {                                     │ │
│ │             │   │   │   'IsAbout': {                                     │ │
│ │             │   │   │   │   'TermURL': 'nb:ParticipantID',               │ │
│ │             │   │   │   │   'Label': ''                                  │ │
│ │             │   │   │   }                                                │ │
│ │             │   │   },                                                   │ │
│ │             │   │   'Description': "There should have been a description │ │
│ │             here, but there wasn't. :("                                  │ │
│ │             │   },                                                       │ │
│ │             │   'sex': {                                                 │ │
│ │             │   │   'Annotations': {                                     │ │
│ │             │   │   │   'IsAbout': {'TermURL': 'nb:Sex', 'Label': ''},   │ │
│ │             │   │   │   'Levels': {                                      │ │
│ │             │   │   │   │   'F': {                                       │ │
│ │             │   │   │   │   │   'TermURL': 'snomed:248152002',           │ │
│ │             │   │   │   │   │   'Label': ''                              │ │
│ │             │   │   │   │   },                                           │ │
│ │             │   │   │   │   'M': {                                       │ │
│ │             │   │   │   │   │   'TermURL': 'snomed:248153007',           │ │
│ │             │   │   │   │   │   'Label': ''                              │ │
│ │             │   │   │   │   }                                            │ │
│ │             │   │   │   }                                                │ │
│ │             │   │   },                                                   │ │
│ │             │   │   'Description': "There should have been a description │ │
│ │             here, but there wasn't. :("                                  │ │
│ │             │   }                                                        │ │
│ │             }                                                            │ │
│ ╰──────────────────────────────────────────────────────────────────────────╯ │
╰──────────────────────────────────────────────────────────────────────────────╯
KeyError: 'Transformation'

ToDo:

  • Add a test and test example for a data dictionary that currently breaks in the way described
  • Change the categorical detection heuristic (is_column_categorical) to look for the "Levels" key inside the "Annotations" section
  • Make sure tests pass
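
One possible shape for the heuristic change and its test, offered as a sketch only. The function name and signature come from the code quoted earlier in this issue; the use of `.get` and the decision to treat a column without an "Annotations" section as non-categorical are assumptions, and the eventual fix may differ.

```python
def is_column_categorical(column: str, data_dict: dict) -> bool:
    """Treat a column as categorical if its Neurobagel annotations define "Levels"."""
    # Assumption: a column without an "Annotations" section is not categorical.
    annotations = data_dict[column].get("Annotations", {})
    return "Levels" in annotations


# Example dictionary for a test: "sex" has "Levels" only under "Annotations"
# (the case that currently breaks), "age" is annotated as continuous.
example_dict = {
    "sex": {
        "Annotations": {
            "IsAbout": {"TermURL": "nb:Sex", "Label": ""},
            "Levels": {
                "F": {"TermURL": "snomed:248152002", "Label": ""},
                "M": {"TermURL": "snomed:248153007", "Label": ""},
            },
        },
    },
    "age": {
        "Annotations": {
            "IsAbout": {"TermURL": "nb:Age", "Label": ""},
            "Transformation": {"TermURL": "nb:float", "Label": ""},
        },
    },
}


def test_annotated_levels_are_detected() -> None:
    assert is_column_categorical("sex", example_dict)
    assert not is_column_categorical("age", example_dict)


test_annotated_levels_are_detected()
```

With this version the "sex" column takes the categorical branch, so get_age_heuristic is never called on it.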

surchs moved this to Backlog in Neurobagel Jun 10, 2023
surchs moved this from Backlog to Specify - Active in Neurobagel Jun 10, 2023
surchs moved this from Specify - Active to Specify - Done in Neurobagel Jun 10, 2023
surchs moved this from Specify - Done to Implement - Active in Neurobagel Jun 10, 2023
surchs self-assigned this Jun 10, 2023
surchs moved this from Implement - Active to Implement - Done in Neurobagel Jun 10, 2023
rmanaem moved this from Implement - Done to Review - Active in Neurobagel Jun 10, 2023
github-project-automation bot moved this from Review - Active to Review - Done in Neurobagel Jun 10, 2023