AD-2727: Improve anatomy term mapping completeness #54

Draft
wants to merge 6 commits into
base: master
28 changes: 3 additions & 25 deletions README.md
@@ -9,8 +9,6 @@

This repository will initially serve as a staging point for the source and data files associated with the C2M2 submission process. It might eventually grow into a pipeline for the C2M2 process, but we are starting small.

-"A journey of a thousand miles begins with a single step" -Laozi
-
### Important Links

1. [Base Wiki Page for C2M2 Submissions](https://github.com/nih-cfde/published-documentation/wiki/Quickstart)
@@ -31,7 +29,7 @@ This repository will initially serve as a staging point for the source and data
- Installs package dependencies from requirements.txt

```bash
-source setup_evn.sh
+source setup_env.sh
```

2. Acquire submission tools from OSF
@@ -61,7 +59,7 @@ OR
* Also, adds empty tables required for submission

```bash
-python /kf_to_c2m2_etl/etl.py {FHIR|DS}
+python3 ./kf_to_c2m2_etl/etl.py {FHIR|DS}
```

4. Execute osf script for preparing c2m2 submission
@@ -84,24 +82,4 @@ python /kf_to_c2m2_etl/etl.py {FHIR|DS}

6. Submit data to CFDE portal

-*** Refer to Important Links #7 for additional info***
-- Login with submit tool
-- Execute submission
-* Check tables for conformance to C2M2's latest release notes
-* Set DCC for submission (cfde_registry_dcc:kidsfirst)
-- Verify submission in progress
-- Review submission results
-
-```bash
-# Will be redirected to web browser for credentials
-cfde-submit login
-
-# Command starts submission and sets data coordinating center
-cfde-submit run path-to-frictionless-validation --ignore-git --dcc-id cfde_registry_dcc:kidsfirst
-
-# Can be executed intermittently to verify submission status
-cfde-submit status
-
-# Logout when submission is completed
-cfde-submit logout
-```
+- Upload the C2M2 zip file via the CFDE Data Portal at https://data.cfde.cloud/submit/form.
2 changes: 1 addition & 1 deletion kf_to_c2m2_etl/README.md
@@ -14,6 +14,6 @@ The mapping that serves as the guideline/requirement for the transform portion o

4. Value Conversion: The pipeline reads the target column data from the input DataFrame and merges it with the conversion table. The merge is performed based on the "name" column in the conversion table and the target column in the input DataFrame. The resulting merged DataFrame contains the converted values in the C2M2 column, which are then stored in the input DataFrame using the KF column name retrieved from the column mapping.

-5. Anatomy/More nuanced Mapping: The pipeline performs additional mapping for the "composition_term" column to lowercase using an OrderedDict object that maps the composition terms to their corresponding anatomical entities in lowercase format.
+5. Anatomy/More nuanced Mapping: The pipeline performs additional mapping for the "source_text_anatomical_site" column to lowercase using an OrderedDict object that maps the source text anatomical terms to their corresponding UBERON terms in lowercase format.

The ETL flow described above is performed for each target column in the input DataFrame, allowing for the conversion of multiple columns from the KF data model to the NIH's C2M2 Crosscut Metadata Model in a single run of the pipeline. This pipeline is intended to facilitate data integration and analysis for human subjects research related to cancer and birth defects, and can be used to convert data from diverse sources into a standardized format for further analysis and insights.
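
Reviewer sketch of the value-conversion step (item 4 above), written only to make the merge concrete. It assumes the conversion table has a "name" column, as the README states; the function name `convert_column`, the `c2m2_value` column name, and the sample values are hypothetical and not taken from the repository.

```python
import pandas as pd


def convert_column(target_df, target_column, conversion_df, c2m2_column="c2m2_value"):
    """Return a copy of target_df with target_column replaced by C2M2 values.

    Hypothetical helper: the column name "c2m2_value" is an assumption.
    """
    # Keep one row per source value so the left merge cannot add rows.
    lookup = conversion_df[["name", c2m2_column]].drop_duplicates("name")
    merged = target_df[[target_column]].merge(
        lookup, how="left", left_on=target_column, right_on="name"
    )
    converted = target_df.copy()
    # A left merge preserves the row order of the left frame, so the
    # converted values line up positionally with the input rows.
    converted[target_column] = merged[c2m2_column].to_numpy()
    return converted


# Made-up sample data; the codes are placeholders, not real C2M2 terms.
subjects = pd.DataFrame({"gender": ["Male", "Female", "Other"]})
conversion = pd.DataFrame(
    {"name": ["Male", "Female"], "c2m2_value": ["code:M", "code:F"]}
)
result = convert_column(subjects, "gender", conversion)
print(result)
```

In this sketch, values with no row in the conversion table come back as missing, so unmapped terms stay visible rather than being dropped silently.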
6 changes: 3 additions & 3 deletions kf_to_c2m2_etl/cfde_convert.py
@@ -135,8 +135,8 @@ def fhir_to_cfde_value_converter(target_df: pd.DataFrame, target_column: str):
return target_df


-uberon_mapping_df = pd.read_csv(os.path.join(file_locations.get_ontology_mappings_path(),'anatomy_fixed_tabs.tsv'),sep='\t').fillna('')
-uberon_mapping_df['composition_term'] = uberon_mapping_df['composition_term'].apply(str.lower)
+uberon_mapping_df = pd.read_csv(os.path.join(file_locations.get_ontology_mappings_path(),'anatomy_mappings.tsv'),sep='\t').fillna('')
+uberon_mapping_df['source_text_anatomical_site'] = uberon_mapping_df['source_text_anatomical_site'].apply(str.lower)
 uberon_mapping_df['uberon_id'] = uberon_mapping_df['uberon_id'].apply(str.lower)
 uberon_mapping_df = uberon_mapping_df[::-1]
-uberon_mapping_dict = OrderedDict(zip(uberon_mapping_df.composition_term,uberon_mapping_df.uberon_id))
+uberon_mapping_dict = OrderedDict(zip(uberon_mapping_df.source_text_anatomical_site,uberon_mapping_df.uberon_id))
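
Reviewer note on the hunk above: because the frame is reversed before the dictionary is built, and later keys overwrite earlier ones, a source term that appears more than once in the TSV ends up mapped to the ID from its first row in the file. The standalone sketch below illustrates that behavior with an in-memory table. The sample rows (including the placeholder ID `UBERON:0000000`) and the `lookup_uberon` helper are assumptions for illustration, not part of this PR.

```python
import io
from collections import OrderedDict

import pandas as pd

# Made-up sample rows standing in for the mapping TSV.
# "brain" appears twice on purpose to show which row wins.
SAMPLE_TSV = (
    "source_text_anatomical_site\tuberon_id\n"
    "Brain\tUBERON:0000955\n"
    "Blood\tUBERON:0000178\n"
    "brain\tUBERON:0000000\n"
)

mapping_df = pd.read_csv(io.StringIO(SAMPLE_TSV), sep="\t").fillna("")
for column in ("source_text_anatomical_site", "uberon_id"):
    mapping_df[column] = mapping_df[column].str.lower()

# Reverse the rows so that, when duplicate keys overwrite each other,
# the value from the earliest row in the file is the one that survives.
mapping_df = mapping_df[::-1]
uberon_mapping = OrderedDict(
    zip(mapping_df["source_text_anatomical_site"], mapping_df["uberon_id"])
)


def lookup_uberon(term):
    """Hypothetical lookup: normalize the term, return '' when unmapped."""
    return uberon_mapping.get(term.strip().lower(), "")


print(uberon_mapping)
print(lookup_uberon(" Blood "))
```

If first-row-wins is the intended precedence, a short comment next to the `[::-1]` line would make that explicit for future readers.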

This file was deleted.
