diff --git a/.idea/aws.xml b/.idea/aws.xml
index b63b642..fce7d91 100644
--- a/.idea/aws.xml
+++ b/.idea/aws.xml
@@ -1,4 +1,4 @@
-
+
@@ -8,4 +8,4 @@
-
\ No newline at end of file
+
diff --git a/.idea/codeStyles/Project.xml b/.idea/codeStyles/Project.xml
index 5fc93d9..99a48c8 100644
--- a/.idea/codeStyles/Project.xml
+++ b/.idea/codeStyles/Project.xml
@@ -13,4 +13,4 @@
-
\ No newline at end of file
+
diff --git a/.idea/codeStyles/codeStyleConfig.xml b/.idea/codeStyles/codeStyleConfig.xml
index 79ee123..0f7bc51 100644
--- a/.idea/codeStyles/codeStyleConfig.xml
+++ b/.idea/codeStyles/codeStyleConfig.xml
@@ -2,4 +2,4 @@
-
\ No newline at end of file
+
diff --git a/.idea/inspectionProfiles/Project_Default.xml b/.idea/inspectionProfiles/Project_Default.xml
index c5b9e39..358fddb 100644
--- a/.idea/inspectionProfiles/Project_Default.xml
+++ b/.idea/inspectionProfiles/Project_Default.xml
@@ -1,19 +1,84 @@
-
-
-
-
-
-
-
-
-
-
-
-
-
+
+
+
+
+
+
+
+
+
+
+
+
+
-
-
+
+
-
-
-
-
-
+
+
+
+
+
-
+
-
-
-
-
+
+
+
+
-
+
-
+
-
+
-
+
-
-
+
+
-
-
+
+
-
-
-
-
+
+
+
+
-
-
-
-
-
-
-
+
+
+
+
+
+
+
-
\ No newline at end of file
+
diff --git a/.idea/misc.xml b/.idea/misc.xml
index 9715c22..ec803ea 100644
--- a/.idea/misc.xml
+++ b/.idea/misc.xml
@@ -1,6 +1,6 @@
-
-
-
-
-
-
\ No newline at end of file
+
+
+
+
+
+
diff --git a/.idea/modules.xml b/.idea/modules.xml
index 5bb2b3d..a9e0f4a 100644
--- a/.idea/modules.xml
+++ b/.idea/modules.xml
@@ -1,8 +1,11 @@
-
-
-
-
-
-
-
-
\ No newline at end of file
+
+
+
+
+
+
+
+
diff --git a/.idea/vcs.xml b/.idea/vcs.xml
index c8397c9..ab603ac 100644
--- a/.idea/vcs.xml
+++ b/.idea/vcs.xml
@@ -1,6 +1,6 @@
-
-
-
-
-
-
\ No newline at end of file
+
+
+
+
+
+
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
new file mode 100644
index 0000000..430d6b1
--- /dev/null
+++ b/.pre-commit-config.yaml
@@ -0,0 +1,15 @@
+repos:
+  - repo: https://github.com/psf/black
+    rev: 24.2.0
+    hooks:
+      - id: black-jupyter
+        args: ['--line-length','80']
+  - repo: https://github.com/pre-commit/mirrors-prettier
+    rev: 'v4.0.0-alpha.8'
+    hooks:
+      - id: prettier
+        types_or: [java, xml, javascript, ts, html, css, markdown]
+        additional_dependencies:
+          - prettier@2.8.8
+          - "prettier-plugin-java@1.6.1"
+          - "@prettier/plugin-xml@2.2.0"
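This new config wires black (for Python and notebooks) and prettier (for Java, XML, JS/TS, HTML, CSS, and Markdown) into pre-commit; the `pre-commit-3.6.0.pyz` zipapp added at the bottom of this diff lets the hooks run without a separate install. A minimal sketch of invoking it from the repository root (the zipapp filename comes from this diff; the rest is standard pre-commit usage):

```python
# Run every hook from .pre-commit-config.yaml across the whole tree,
# using the pre-commit zipapp committed alongside this change.
import subprocess
import sys

result = subprocess.run(
    [sys.executable, "pre-commit-3.6.0.pyz", "run", "--all-files"],
    check=False,  # pre-commit exits nonzero when any hook modifies files
)
print(f"pre-commit exited with {result.returncode}")
```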
diff --git a/README.md b/README.md
index a258b48..a6ced08 100644
--- a/README.md
+++ b/README.md
@@ -3,13 +3,12 @@
This is the documentation repository for the [dbGaP](https://www.ncbi.nlm.nih.gov/gap/) [FHIR](https://hl7.org/fhir/) API. ([API base URL](http://dbgap-api.ncbi.nlm.nih.gov/fhir/x1))
dbGaP is the Database of Genotypes and Phenotypes.
-FHIR is HL7's REST API standard for transmission of electronic health record data.
+FHIR is HL7's REST API standard for transmission of electronic health record data.
- [**Quickstart**](quickstart.md)
- [**Obtaining a Task-Specific Token for Controlled Data**](obtaining_a_token.md)
- [**Notebooks**](jupyter)
-
## Prerequisites
You should already have a basic understanding of FHIR - especially
@@ -24,27 +23,31 @@ dataframes](https://pandas.pydata.org/docs/user_guide/dsintro.html#dataframe).
## Issues
If you have a query that returns after 20 seconds with an error like the following:
+
```json
{
- "error": {
- "status": 500,
- "message": "error forwarding request",
- "api-key": "192.168.0.1"
- }
+ "error": {
+ "status": 500,
+ "message": "error forwarding request",
+ "api-key": "192.168.0.1"
+ }
}
```
+
then you have probably hit the 20-second timeout. Removing sorting, simplifying your query, or including fewer
sub-queries in a batch can sometimes help.
For other issues please see [the Issues list in this GitHub repository][issues].
## Privacy
+
The example code in this repository does not collect user data or send it to
NCBI. However, using the code to access our FHIR servers will send data to
NCBI. To learn more about how we handle that data, see the ["NCBI Website
and Data Usage Policies and Disclaimers"][policies] page.
-## Contact
+## Contact
+
The dbGaP FHIR API is provided by NCBI. Please [contact us][contact] with any
questions.
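For readers who want to reproduce the timeout handling the Issues section describes, here is a rough sketch with `requests`; the helper name and fallback strategy are illustrative, and only the error body and the advice to drop sorting come from the README:

```python
import requests

BASE = "https://dbgap-api.ncbi.nlm.nih.gov/fhir/x1"
HEADERS = {"Accept": "application/fhir+json"}


def run_with_fallback(query, simpler_query):
    """Run a query; on the 500 'error forwarding request' body shown
    above, retry once with a simpler variant (e.g. without sorting)."""
    r = requests.get(f"{BASE}/{query}", headers=HEADERS)
    if r.status_code == 500 and "error forwarding request" in r.text:
        r = requests.get(f"{BASE}/{simpler_query}", headers=HEADERS)
    r.raise_for_status()
    return r.json()


# Dropping the _sort parameter is one of the simplifications suggested above.
bundle = run_with_fallback("ResearchStudy?_sort=title", "ResearchStudy")
print(bundle["resourceType"])
```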
diff --git a/jupyter/Notebook01_intro.ipynb b/jupyter/Notebook01_intro.ipynb
index 00585f2..faecbce 100644
--- a/jupyter/Notebook01_intro.ipynb
+++ b/jupyter/Notebook01_intro.ipynb
@@ -47,11 +47,12 @@
"outputs": [],
"source": [
"import os\n",
- "API_KEY_PATH = '~/.keys/ncbi_api_key.txt'\n",
+ "\n",
+ "API_KEY_PATH = \"~/.keys/ncbi_api_key.txt\"\n",
"\n",
"# the os.path.expanduser expands file paths which include ~/ representation for the user's home directory\n",
- "with open(os.path.expanduser(API_KEY_PATH)) as f: \n",
- " api_key = f.read()\n"
+ "with open(os.path.expanduser(API_KEY_PATH)) as f:\n",
+ " api_key = f.read()"
]
},
{
@@ -79,12 +80,12 @@
"import json\n",
"from dbgap_fhir import DbGapFHIR\n",
"\n",
- "FHIR_SERVER = 'https://dbgap-api.ncbi.nlm.nih.gov/fhir/x1'\n",
+ "FHIR_SERVER = \"https://dbgap-api.ncbi.nlm.nih.gov/fhir/x1\"\n",
"\n",
"mf = DbGapFHIR(FHIR_SERVER, api_key=api_key)\n",
"\n",
"# The client may be created without an API Key as follows\n",
- "#mf = DbGapFHIR(FHIR_SERVER)"
+ "# mf = DbGapFHIR(FHIR_SERVER)"
]
},
{
@@ -378,11 +379,11 @@
"\n",
"for s in documents:\n",
"\n",
- " print (\"Study id: {}\".format(s['id']))\n",
- " print (\"Study title: {}\".format(s['title']))\n",
- " print (\"Full resource\")\n",
+ " print(\"Study id: {}\".format(s[\"id\"]))\n",
+ " print(\"Study title: {}\".format(s[\"title\"]))\n",
+ " print(\"Full resource\")\n",
" print(json.dumps(s, indent=3))\n",
- " print('_'*40)"
+ " print(\"_\" * 40)"
]
},
{
diff --git a/jupyter/Notebook02_studies.ipynb b/jupyter/Notebook02_studies.ipynb
index 32b4085..27fde41 100644
--- a/jupyter/Notebook02_studies.ipynb
+++ b/jupyter/Notebook02_studies.ipynb
@@ -47,12 +47,12 @@
"from dbgap_fhir import DbGapFHIR\n",
"import pandas as pd\n",
"\n",
- "FHIR_SERVER = 'https://dbgap-api.ncbi.nlm.nih.gov/fhir/x1'\n",
- "API_KEY_PATH = '~/.keys/ncbi_api_key.txt'\n",
+ "FHIR_SERVER = \"https://dbgap-api.ncbi.nlm.nih.gov/fhir/x1\"\n",
+ "API_KEY_PATH = \"~/.keys/ncbi_api_key.txt\"\n",
"\n",
- "with open(os.path.expanduser(API_KEY_PATH)) as f: \n",
+ "with open(os.path.expanduser(API_KEY_PATH)) as f:\n",
" api_key = f.read()\n",
- " \n",
+ "\n",
"mf = DbGapFHIR(FHIR_SERVER, api_key=api_key)"
]
},
@@ -349,11 +349,11 @@
"\n",
"for s in documents:\n",
"\n",
- " print (\"Study id: {}\".format(s['id']))\n",
- " print (\"Study title: {}\".format(s['title']))\n",
- " print (\"Full resource\")\n",
+ " print(\"Study id: {}\".format(s[\"id\"]))\n",
+ " print(\"Study title: {}\".format(s[\"title\"]))\n",
+ " print(\"Full resource\")\n",
" print(json.dumps(s, indent=3))\n",
- " print('_'*40)"
+ " print(\"_\" * 40)"
]
},
{
@@ -1314,11 +1314,11 @@
"\n",
"for s in documents:\n",
"\n",
- " print (\"Study id: {}\".format(s['id']))\n",
- " print (\"Study title: {}\".format(s['title']))\n",
- " print (\"Full resource\")\n",
+ " print(\"Study id: {}\".format(s[\"id\"]))\n",
+ " print(\"Study title: {}\".format(s[\"title\"]))\n",
+ " print(\"Full resource\")\n",
" print(json.dumps(s, indent=3))\n",
- " print('_'*40)"
+ " print(\"_\" * 40)"
]
},
{
@@ -1412,8 +1412,8 @@
"\n",
"for s in documents:\n",
"\n",
- " print (\"Study id: {}\".format(s['id']))\n",
- " print (\"Study title: {}\".format(s['title']))"
+ " print(\"Study id: {}\".format(s[\"id\"]))\n",
+ " print(\"Study title: {}\".format(s[\"title\"]))"
]
},
{
@@ -1515,8 +1515,8 @@
"# The assumption is that there is only one such extension within a given resource\n",
"# For the dbGaP ResearchStudy resource that is true\n",
"def getExtension(resource, uri):\n",
- " exts = [d for d in resource['extension'] if d['url'] == uri]\n",
- " if len(exts) > 0 :\n",
+ " exts = [d for d in resource[\"extension\"] if d[\"url\"] == uri]\n",
+ " if len(exts) > 0:\n",
" return exts[0]\n",
" else:\n",
" return None"
@@ -1546,45 +1546,62 @@
" for s in documents:\n",
"\n",
" if verbose:\n",
- " print (s['id'])\n",
- " print (s['title'])\n",
+ " print(s[\"id\"])\n",
+ " print(s[\"title\"])\n",
" # use our function to find the \"study content\" extension\n",
- " content = getExtension(s, \"https://dbgap-api.ncbi.nlm.nih.gov/fhir/x1/StructureDefinition/ResearchStudy-Content\")\n",
+ " content = getExtension(\n",
+ " s,\n",
+ " \"https://dbgap-api.ncbi.nlm.nih.gov/fhir/x1/StructureDefinition/ResearchStudy-Content\",\n",
+ " )\n",
" # use our function again to find the \"number of subjects\" extension nested within the content extension\n",
- " subject_ext = getExtension(content, \"https://dbgap-api.ncbi.nlm.nih.gov/fhir/x1/StructureDefinition/ResearchStudy-Content-NumSubjects\")\n",
- " #print(subject_ext)\n",
+ " subject_ext = getExtension(\n",
+ " content,\n",
+ " \"https://dbgap-api.ncbi.nlm.nih.gov/fhir/x1/StructureDefinition/ResearchStudy-Content-NumSubjects\",\n",
+ " )\n",
+ " # print(subject_ext)\n",
" # Handle the fact that not all studies may have this extension\n",
- " if subject_ext != None and 'value' in subject_ext['valueCount']:\n",
- " subject_count = subject_ext['valueCount']['value']\n",
+ " if subject_ext != None and \"value\" in subject_ext[\"valueCount\"]:\n",
+ " subject_count = subject_ext[\"valueCount\"][\"value\"]\n",
" else:\n",
" subject_count = 0\n",
"\n",
" # Now find the extension containing the study consents\n",
- " consent_ext = getExtension(s, \"https://dbgap-api.ncbi.nlm.nih.gov/fhir/x1/StructureDefinition/ResearchStudy-StudyConsents\")\n",
+ " consent_ext = getExtension(\n",
+ " s,\n",
+ " \"https://dbgap-api.ncbi.nlm.nih.gov/fhir/x1/StructureDefinition/ResearchStudy-StudyConsents\",\n",
+ " )\n",
" # extract the display name for each consent group and print them\n",
" if consent_ext != None:\n",
- " consents = [d['valueCoding']['display'] for d in consent_ext['extension'] ]\n",
+ " consents = [\n",
+ " d[\"valueCoding\"][\"display\"] for d in consent_ext[\"extension\"]\n",
+ " ]\n",
" if verbose:\n",
" print(consents)\n",
" else:\n",
" consents = []\n",
- " \n",
+ "\n",
" # focus\n",
- " if 'focus' in s:\n",
- " focus = s['focus'][0]['text']\n",
- " if 'coding' in s['focus'][0]:\n",
- " focus_code = s['focus'][0]['coding'][0]['code']\n",
+ " if \"focus\" in s:\n",
+ " focus = s[\"focus\"][0][\"text\"]\n",
+ " if \"coding\" in s[\"focus\"][0]:\n",
+ " focus_code = s[\"focus\"][0][\"coding\"][0][\"code\"]\n",
" else:\n",
- " focus_code = ''\n",
+ " focus_code = \"\"\n",
" else:\n",
- " focus = ''\n",
- " focus_code = ''\n",
+ " focus = \"\"\n",
+ " focus_code = \"\"\n",
" # Add the relevant details to our list of studies\n",
- " study = {\"id\":s['id'], \"title\":s[\"title\"], \"num_subjects\":subject_count,\n",
- " \"focus\":focus,\"focus_mesh\":focus_code,\"consents\":consents}\n",
+ " study = {\n",
+ " \"id\": s[\"id\"],\n",
+ " \"title\": s[\"title\"],\n",
+ " \"num_subjects\": subject_count,\n",
+ " \"focus\": focus,\n",
+ " \"focus_mesh\": focus_code,\n",
+ " \"consents\": consents,\n",
+ " }\n",
" studies.append(study)\n",
" if verbose:\n",
- " print('_'*40)\n",
+ " print(\"_\" * 40)\n",
" df = pd.DataFrame(studies)\n",
" return df"
]
@@ -2031,9 +2048,9 @@
}
],
"source": [
- "pd.set_option('display.max_colwidth', 0)\n",
- "df.sort_values(by=['id'], inplace=True)\n",
- "df\n"
+ "pd.set_option(\"display.max_colwidth\", 0)\n",
+ "df.sort_values(by=[\"id\"], inplace=True)\n",
+ "df"
]
},
{
@@ -2451,7 +2468,7 @@
}
],
"source": [
- "studies_to_df(documents) "
+ "studies_to_df(documents)"
]
},
{
@@ -3047,7 +3064,7 @@
}
],
"source": [
- "studies_to_df(ic_studies) "
+ "studies_to_df(ic_studies)"
]
},
{
@@ -3243,7 +3260,7 @@
],
"source": [
"az_studies = mf.run_query(\"ResearchStudy?focus=D000544\")\n",
- "studies_to_df(az_studies) "
+ "studies_to_df(az_studies)"
]
},
{
diff --git a/jupyter/Notebook03_test_study.ipynb b/jupyter/Notebook03_test_study.ipynb
index 6ac79d3..07a349c 100644
--- a/jupyter/Notebook03_test_study.ipynb
+++ b/jupyter/Notebook03_test_study.ipynb
@@ -43,7 +43,7 @@
"source": [
"from dbgap_fhir import DbGapFHIR\n",
"\n",
- "FHIR_SERVER = 'https://dbgap-api.ncbi.nlm.nih.gov/fhir/x1'\n",
+ "FHIR_SERVER = \"https://dbgap-api.ncbi.nlm.nih.gov/fhir/x1\"\n",
"mf = DbGapFHIR(FHIR_SERVER)"
]
},
@@ -77,8 +77,10 @@
}
],
"source": [
- "study_id = 'phs002409'\n",
- "patients = mf.run_query(f\"Patient?_has:ResearchSubject:individual:study={study_id}\")"
+ "study_id = \"phs002409\"\n",
+ "patients = mf.run_query(\n",
+ " f\"Patient?_has:ResearchSubject:individual:study={study_id}\"\n",
+ ")"
]
},
{
@@ -241,12 +243,13 @@
"outputs": [],
"source": [
"import os\n",
- "API_KEY_PATH = '~/.keys/ncbi_api_key.txt'\n",
+ "\n",
+ "API_KEY_PATH = \"~/.keys/ncbi_api_key.txt\"\n",
"\n",
"# the os.path.expanduser expands file paths which include ~/ representation for the user's home directory\n",
- "with open(os.path.expanduser(API_KEY_PATH)) as f: \n",
+ "with open(os.path.expanduser(API_KEY_PATH)) as f:\n",
" api_key = f.read()\n",
- " \n",
+ "\n",
" mf = DbGapFHIR(FHIR_SERVER, api_key=api_key)"
]
},
@@ -292,21 +295,21 @@
"from ipywidgets import IntProgress\n",
"from IPython.display import display\n",
"\n",
- "prog = IntProgress(min=0, max=len(patients)) # instantiate the bar\n",
- "display(prog) # display the bar\n",
+ "prog = IntProgress(min=0, max=len(patients)) # instantiate the bar\n",
+ "display(prog) # display the bar\n",
"\n",
"all_obs = []\n",
"patients_with_obs = []\n",
"for p in patients:\n",
" # print(p['id'])\n",
" obs = mf.run_query(f\"Observation?subject={p['id']}\", show_stats=False)\n",
- " if len(obs)>0:\n",
- " all_obs += obs\n",
- " patients_with_obs.append(p)\n",
+ " if len(obs) > 0:\n",
+ " all_obs += obs\n",
+ " patients_with_obs.append(p)\n",
" prog.value += 1\n",
"\n",
"print(f\"{len(patients_with_obs)} patients had observations\")\n",
- "print(f\"{len(all_obs)} total observations\")\n"
+ "print(f\"{len(all_obs)} total observations\")"
]
},
{
@@ -341,7 +344,7 @@
"patient_observations_dict = {}\n",
"variable_definitions = {}\n",
"observations = []\n",
- "obsCounter = Counter()\n",
+ "obsCounter = Counter()\n",
"codeCounter = Counter()\n",
"vccCounter = Counter()\n",
"printObsCounts = False\n",
@@ -349,39 +352,41 @@
"nn = 0\n",
"for r in all_obs:\n",
"\n",
- " if r['resourceType'] == 'Observation':\n",
- " #print(json.dumps(r,indent=3))\n",
- " #nn+=1\n",
- " #if nn > rlimit:\n",
+ " if r[\"resourceType\"] == \"Observation\":\n",
+ " # print(json.dumps(r,indent=3))\n",
+ " # nn+=1\n",
+ " # if nn > rlimit:\n",
" # break\n",
- " subject_id = r['subject']['reference']\n",
- " obsCounter[subject_id] +=1\n",
- " obs_display_name = r['code']['coding'][0]['display']\n",
- " if 'valueQuantity' in r:\n",
- " value_text = r['valueQuantity']['value']\n",
- " #value_unit = r['valueQuantity']['unit']\n",
- " elif 'valueCodeableConcept' in r:\n",
- " value_text = r['valueCodeableConcept']['coding'][0]['display']\n",
+ " subject_id = r[\"subject\"][\"reference\"]\n",
+ " obsCounter[subject_id] += 1\n",
+ " obs_display_name = r[\"code\"][\"coding\"][0][\"display\"]\n",
+ " if \"valueQuantity\" in r:\n",
+ " value_text = r[\"valueQuantity\"][\"value\"]\n",
+ " # value_unit = r['valueQuantity']['unit']\n",
+ " elif \"valueCodeableConcept\" in r:\n",
+ " value_text = r[\"valueCodeableConcept\"][\"coding\"][0][\"display\"]\n",
" else:\n",
- " value_text = 'unknown'\n",
- " codeCounter[obs_display_name] +=1\n",
- " #vccCounter[vcc_text] +=1\n",
+ " value_text = \"unknown\"\n",
+ " codeCounter[obs_display_name] += 1\n",
+ " # vccCounter[vcc_text] +=1\n",
" observations.append(r)\n",
- " \n",
+ "\n",
" if subject_id not in patient_observations_dict:\n",
- " patient_observations_dict[subject_id] = {obs_display_name: value_text}\n",
+ " patient_observations_dict[subject_id] = {\n",
+ " obs_display_name: value_text\n",
+ " }\n",
" else:\n",
" patient_observations_dict[subject_id][obs_display_name] = value_text\n",
- " \n",
- "#Summarize\n",
+ "\n",
+ "# Summarize\n",
"print(f\"Number of patients with observations {len(obsCounter.keys())}\")\n",
"\n",
"if printObsCounts:\n",
" print(\"Observation count per patient\")\n",
" print(json.dumps(obsCounter, indent=3))\n",
- "#print(\"Coding counts\")\n",
- "#print(json.dumps(codeCounter, indent=3))\n",
- "df = pd.DataFrame.from_dict(codeCounter, orient='index')"
+ "# print(\"Coding counts\")\n",
+ "# print(json.dumps(codeCounter, indent=3))\n",
+ "df = pd.DataFrame.from_dict(codeCounter, orient=\"index\")"
]
},
{
@@ -1289,8 +1294,8 @@
],
"source": [
"pd.set_option(\"display.max_rows\", 30, \"display.max_columns\", None)\n",
- "patient_df = pd.DataFrame.from_dict(patient_observations_dict, orient='index')\n",
- "#patient_df.fillna('', inplace=True)\n",
+ "patient_df = pd.DataFrame.from_dict(patient_observations_dict, orient=\"index\")\n",
+ "# patient_df.fillna('', inplace=True)\n",
"display(patient_df)"
]
},
@@ -1317,8 +1322,8 @@
"metadata": {},
"outputs": [],
"source": [
- "txt_file_path = 'results/phs002409_workaround_obs.txt'\n",
- "patient_df.to_csv(txt_file_path, sep='\\t')"
+ "txt_file_path = \"results/phs002409_workaround_obs.txt\"\n",
+ "patient_df.to_csv(txt_file_path, sep=\"\\t\")"
]
},
{
@@ -2233,7 +2238,7 @@
}
],
"source": [
- "patient_df[patient_df.ENV_SMOKE=='1']"
+ "patient_df[patient_df.ENV_SMOKE == \"1\"]"
]
},
{
@@ -3144,7 +3149,7 @@
}
],
"source": [
- "patient_df[patient_df.PREFVC_baseline<1.2]"
+ "patient_df[patient_df.PREFVC_baseline < 1.2]"
]
},
{
@@ -3181,7 +3186,9 @@
}
],
"source": [
- "vals = mf.run_query(\"Observation?combo-code-value-quantity=phv00492057.v1.p1$gt1\")"
+ "vals = mf.run_query(\n",
+ " \"Observation?combo-code-value-quantity=phv00492057.v1.p1$gt1\"\n",
+ ")"
]
},
{
@@ -3202,7 +3209,9 @@
}
],
"source": [
- "vals = mf.run_query(\"Observation?combo-code-value-quantity=PX091601370000$gt150\")"
+ "vals = mf.run_query(\n",
+ " \"Observation?combo-code-value-quantity=PX091601370000$gt150\"\n",
+ ")"
]
},
{
diff --git a/jupyter/Notebook04_controlled_COPDGene.ipynb b/jupyter/Notebook04_controlled_COPDGene.ipynb
index cf49a0d..5cc44ba 100644
--- a/jupyter/Notebook04_controlled_COPDGene.ipynb
+++ b/jupyter/Notebook04_controlled_COPDGene.ipynb
@@ -68,15 +68,16 @@
"source": [
"from dbgap_fhir import DbGapFHIR\n",
"import os\n",
- "API_KEY_PATH = '~/.keys/ncbi_api_key.txt'\n",
- "FHIR_SERVER = 'https://dbgap-api.ncbi.nlm.nih.gov/fhir/x1'\n",
"\n",
- "with open(os.path.expanduser(API_KEY_PATH)) as f: \n",
+ "API_KEY_PATH = \"~/.keys/ncbi_api_key.txt\"\n",
+ "FHIR_SERVER = \"https://dbgap-api.ncbi.nlm.nih.gov/fhir/x1\"\n",
+ "\n",
+ "with open(os.path.expanduser(API_KEY_PATH)) as f:\n",
" api_key = f.read()\n",
- " \n",
- "mf = DbGapFHIR(FHIR_SERVER,\n",
- " api_key=api_key,\n",
- " passport='~/Downloads/task-specific-token.txt')\n"
+ "\n",
+ "mf = DbGapFHIR(\n",
+ " FHIR_SERVER, api_key=api_key, passport=\"~/Downloads/task-specific-token.txt\"\n",
+ ")"
]
},
{
@@ -110,9 +111,11 @@
}
],
"source": [
- "study_id = 'phs000179'\n",
+ "study_id = \"phs000179\"\n",
"\n",
- "patients = mf.run_query(f\"Patient?_has:ResearchSubject:individual:study={study_id}&_count=250\")"
+ "patients = mf.run_query(\n",
+ " f\"Patient?_has:ResearchSubject:individual:study={study_id}&_count=250\"\n",
+ ")"
]
},
{
@@ -181,7 +184,9 @@
}
],
"source": [
- "studies = mf.run_query(\"ResearchStudy?_has:ResearchSubject:study:status=on-study\")"
+ "studies = mf.run_query(\n",
+ " \"ResearchStudy?_has:ResearchSubject:study:status=on-study\"\n",
+ ")"
]
},
{
@@ -426,13 +431,13 @@
"source": [
"sdict = []\n",
"for s in studies:\n",
- " sdict.append({'id':s['id'], 'title':s['title']})\n",
+ " sdict.append({\"id\": s[\"id\"], \"title\": s[\"title\"]})\n",
"\n",
"\n",
"import pandas as pd\n",
"\n",
"df = pd.DataFrame.from_dict(sdict)\n",
- "df.sort_values(by=['id'])"
+ "df.sort_values(by=[\"id\"])"
]
},
{
@@ -508,21 +513,21 @@
"from ipywidgets import IntProgress\n",
"from IPython.display import display\n",
"\n",
- "prog = IntProgress(min=0, max=len(patients)) # instantiate the bar\n",
- "display(prog) # display the bar\n",
+ "prog = IntProgress(min=0, max=len(patients)) # instantiate the bar\n",
+ "display(prog) # display the bar\n",
"\n",
"increment = 100\n",
"\n",
"all_obs = []\n",
"patients_with_obs = []\n",
- "for p in patients[10:len(patients):increment]:\n",
+ "for p in patients[10 : len(patients) : increment]:\n",
" obs = mf.run_query(f\"Observation?subject={p['id']}\", show_stats=False)\n",
- " if len(obs)>0:\n",
- " all_obs += obs\n",
- " patients_with_obs.append(p)\n",
+ " if len(obs) > 0:\n",
+ " all_obs += obs\n",
+ " patients_with_obs.append(p)\n",
" prog.value += increment\n",
"print(f\"{len(patients_with_obs)} patients had observations\")\n",
- "print(f\"{len(all_obs)} total observations\")\n"
+ "print(f\"{len(all_obs)} total observations\")"
]
},
{
@@ -558,48 +563,48 @@
"patient_observations_dict = {}\n",
"variable_definitions = {}\n",
"observations = []\n",
- "obsCounter = Counter()\n",
+ "obsCounter = Counter()\n",
"codeCounter = Counter()\n",
"printObsCounts = False\n",
"rlimit = 20\n",
"nn = 0\n",
"for r in all_obs:\n",
"\n",
- " if r['resourceType'] == 'Observation':\n",
- " subject_id = r['subject']['reference']\n",
- " obsCounter[subject_id] +=1\n",
- " obs_display_name = r['code']['coding'][0]['display']\n",
- " if 'valueQuantity' in r:\n",
- " value_text = r['valueQuantity']['value']\n",
- " elif 'valueCodeableConcept' in r:\n",
- " value_text = r['valueCodeableConcept']['coding'][0]['display']\n",
+ " if r[\"resourceType\"] == \"Observation\":\n",
+ " subject_id = r[\"subject\"][\"reference\"]\n",
+ " obsCounter[subject_id] += 1\n",
+ " obs_display_name = r[\"code\"][\"coding\"][0][\"display\"]\n",
+ " if \"valueQuantity\" in r:\n",
+ " value_text = r[\"valueQuantity\"][\"value\"]\n",
+ " elif \"valueCodeableConcept\" in r:\n",
+ " value_text = r[\"valueCodeableConcept\"][\"coding\"][0][\"display\"]\n",
" else:\n",
- " value_text = 'unknown'\n",
- " codeCounter[obs_display_name] +=1\n",
+ " value_text = \"unknown\"\n",
+ " codeCounter[obs_display_name] += 1\n",
" observations.append(r)\n",
- " \n",
+ "\n",
" if subject_id not in patient_observations_dict:\n",
- " patient_observations_dict[subject_id] = {obs_display_name: value_text}\n",
+ " patient_observations_dict[subject_id] = {\n",
+ " obs_display_name: value_text\n",
+ " }\n",
" else:\n",
" patient_observations_dict[subject_id][obs_display_name] = value_text\n",
- " \n",
- " var_def = {\"extension\": r['extension'], \"code\":r['code']}\n",
+ "\n",
+ " var_def = {\"extension\": r[\"extension\"], \"code\": r[\"code\"]}\n",
" if obs_display_name not in variable_definitions:\n",
" variable_definitions[obs_display_name] = var_def\n",
"\n",
"\n",
- "\n",
- "#Summarize\n",
+ "# Summarize\n",
"print(f\"Number of patients with observations {len(obsCounter.keys())}\")\n",
"\n",
"if printObsCounts:\n",
" print(\"Observation count per patient\")\n",
" print(json.dumps(obsCounter, indent=3))\n",
"\n",
- "df = pd.DataFrame.from_dict(codeCounter, orient='index')\n",
+ "df = pd.DataFrame.from_dict(codeCounter, orient=\"index\")\n",
"pd.set_option(\"display.max_rows\", 30, \"display.max_columns\", None)\n",
- "patient_df = pd.DataFrame.from_dict(patient_observations_dict, orient='index')\n",
- "\n"
+ "patient_df = pd.DataFrame.from_dict(patient_observations_dict, orient=\"index\")"
]
},
{
@@ -650,15 +655,17 @@
"from matplotlib.ticker import PercentFormatter\n",
"import matplotlib.pyplot as plt\n",
"\n",
- "var_name = 'FEV1_FVC_pre'\n",
+ "var_name = \"FEV1_FVC_pre\"\n",
"\n",
- "fig = plt.figure(figsize=(12,3))\n",
+ "fig = plt.figure(figsize=(12, 3))\n",
"ax = fig.gca()\n",
"\n",
- "plt.hist(patient_df[var_name], \n",
- " bins=18,\n",
- " weights=np.ones(len(patient_df)) / len(patient_df), \n",
- " edgecolor='black')\n",
+ "plt.hist(\n",
+ " patient_df[var_name],\n",
+ " bins=18,\n",
+ " weights=np.ones(len(patient_df)) / len(patient_df),\n",
+ " edgecolor=\"black\",\n",
+ ")\n",
"plt.gca().yaxis.set_major_formatter(PercentFormatter(1))\n",
"plt.show()"
]
@@ -705,7 +712,7 @@
}
],
"source": [
- "print(patient_df[['FEV1_FVC_pre']].describe())"
+ "print(patient_df[[\"FEV1_FVC_pre\"]].describe())"
]
},
{
@@ -726,28 +733,29 @@
"outputs": [],
"source": [
"def plot_dbgap_var(var_name, covariate, df=None):\n",
- " \n",
+ "\n",
" if not type(df) == pd.core.frame.DataFrame:\n",
" df = patient_df\n",
- " #fig = plt.figure(figsize=(12,3))\n",
- " plt.rcParams['figure.figsize'] = [12, 3]\n",
- " #ax = fig.gca()\n",
+ " # fig = plt.figure(figsize=(12,3))\n",
+ " plt.rcParams[\"figure.figsize\"] = [12, 3]\n",
+ " # ax = fig.gca()\n",
" plt.gca().yaxis.set_major_formatter(PercentFormatter(1))\n",
" covariate_vals = df[covariate].unique().tolist()\n",
" for c in covariate_vals:\n",
" cdf = df.loc[df[covariate] == c, var_name]\n",
- " plt.hist(cdf, alpha=0.5, label=c,\n",
- " weights=np.ones(len(cdf)) / len(patient_df))\n",
+ " plt.hist(\n",
+ " cdf, alpha=0.5, label=c, weights=np.ones(len(cdf)) / len(patient_df)\n",
+ " )\n",
"\n",
- " #add plot title and axis labels\n",
- " plt.title(f'{var_name} Distribution by {covariate}')\n",
+ " # add plot title and axis labels\n",
+ " plt.title(f\"{var_name} Distribution by {covariate}\")\n",
" plt.xlabel(var_name)\n",
- " plt.ylabel('Frequency')\n",
- " \n",
- " #add legend\n",
+ " plt.ylabel(\"Frequency\")\n",
+ "\n",
+ " # add legend\n",
" plt.legend(title=covariate)\n",
"\n",
- " #display plot\n",
+ " # display plot\n",
" plt.show()"
]
},
@@ -783,7 +791,7 @@
}
],
"source": [
- "plot_dbgap_var('FEV1_FVC_pre', 'AFFECTION_STATUS')"
+ "plot_dbgap_var(\"FEV1_FVC_pre\", \"AFFECTION_STATUS\")"
]
},
{
@@ -821,16 +829,16 @@
],
"source": [
"# Create a dictionary key-value pairs to translate Affection status to something meaningful.\n",
- "affection_status_map = {'1' : 'COPD Case', '2' : 'Controls'}\n",
+ "affection_status_map = {\"1\": \"COPD Case\", \"2\": \"Controls\"}\n",
"\n",
- "# subset the data for only cases and controls \n",
- "sub_df = patient_df.loc[patient_df['AFFECTION_STATUS'].isin(['1','2'])]\n",
+ "# subset the data for only cases and controls\n",
+ "sub_df = patient_df.loc[patient_df[\"AFFECTION_STATUS\"].isin([\"1\", \"2\"])]\n",
"\n",
- "# Use our dictionary to translate affection status \n",
+ "# Use our dictionary to translate affection status\n",
"sub_df = sub_df.replace({\"AFFECTION_STATUS\": affection_status_map})\n",
"\n",
"# Plot the new dataframe\n",
- "plot_dbgap_var('FEV1_FVC_pre', 'AFFECTION_STATUS', df=sub_df)"
+ "plot_dbgap_var(\"FEV1_FVC_pre\", \"AFFECTION_STATUS\", df=sub_df)"
]
},
{
@@ -876,7 +884,7 @@
}
],
"source": [
- "plot_dbgap_var('Resting_SaO2', 'AFFECTION_STATUS', df=sub_df)"
+ "plot_dbgap_var(\"Resting_SaO2\", \"AFFECTION_STATUS\", df=sub_df)"
]
},
{
@@ -925,10 +933,11 @@
"outputs": [],
"source": [
"import json\n",
+ "\n",
"with open(\"data/COPDGene_interactive_vars.json\") as f:\n",
" ivars = json.load(f)\n",
- "varlist = ivars['dependent_vars']\n",
- "independent_vars = ivars['independent_vars']"
+ "varlist = ivars[\"dependent_vars\"]\n",
+ "independent_vars = ivars[\"independent_vars\"]"
]
},
{
@@ -959,10 +968,15 @@
"from io import BytesIO\n",
"\n",
"vars = patient_df.columns.tolist()\n",
+ "\n",
+ "\n",
"@interact\n",
- "def var_selector(varname=widgets.Dropdown(options = varlist, description = \"Variable\"),\n",
- " covariate=widgets.Dropdown(options = independent_vars, description = \"Group by\")\n",
- " ):\n",
+ "def var_selector(\n",
+ " varname=widgets.Dropdown(options=varlist, description=\"Variable\"),\n",
+ " covariate=widgets.Dropdown(\n",
+ " options=independent_vars, description=\"Group by\"\n",
+ " ),\n",
+ "):\n",
" plot_dbgap_var(varname, covariate)"
]
},
@@ -983,9 +997,9 @@
"outputs": [],
"source": [
"# create dataframe\n",
- "heatmap_df = pd.DataFrame.from_dict(patient_observations_dict, orient='index')\n",
+ "heatmap_df = pd.DataFrame.from_dict(patient_observations_dict, orient=\"index\")\n",
"# convert to 0,1 based on whether null or not\n",
- "df2 = heatmap_df.notnull().astype(\"int\")\n"
+ "df2 = heatmap_df.notnull().astype(\"int\")"
]
},
{
@@ -997,14 +1011,15 @@
"source": [
"import matplotlib\n",
"\n",
+ "\n",
"def observation_heat_map(plot_df, study_label):\n",
- " cmap = matplotlib.colors.ListedColormap(['xkcd:eggshell', 'green'])\n",
- " #fig, ax = plt.subplots(figsize=(10,5), layout='constrained')\n",
- " fig, ax = plt.subplots(figsize=(10,5))\n",
- " ax.imshow(plot_df, cmap='Greens')\n",
- " ax.set_xlabel('Observation')\n",
- " ax.set_ylabel('Subject')\n",
- " ax.set_title(f'Observations present per subject in {study_label}')\n"
+ " cmap = matplotlib.colors.ListedColormap([\"xkcd:eggshell\", \"green\"])\n",
+ " # fig, ax = plt.subplots(figsize=(10,5), layout='constrained')\n",
+ " fig, ax = plt.subplots(figsize=(10, 5))\n",
+ " ax.imshow(plot_df, cmap=\"Greens\")\n",
+ " ax.set_xlabel(\"Observation\")\n",
+ " ax.set_ylabel(\"Subject\")\n",
+ " ax.set_title(f\"Observations present per subject in {study_label}\")"
]
},
{
diff --git a/jupyter/dbgap_fhir.py b/jupyter/dbgap_fhir.py
index 15f29be..2cd19b4 100644
--- a/jupyter/dbgap_fhir.py
+++ b/jupyter/dbgap_fhir.py
@@ -2,23 +2,32 @@
import sys
import json
import requests
-import pandas as pd
+import pandas as pd
import numpy as np
from pathlib import Path
from datetime import datetime
import time
import pprint
+
class DbGapFHIR:
- def __init__(self, fhir_server, verify_ssl = True, api_key=None, passport=None, debug=False, show_stats=True):
-
+ def __init__(
+ self,
+ fhir_server,
+ verify_ssl=True,
+ api_key=None,
+ passport=None,
+ debug=False,
+ show_stats=True,
+ ):
+
# Optional: Turn off SSL verification. Useful when dealing with a corporate proxy with self-signed certificates.
# This should be set to True unless you actually see certificate errors.
if not verify_ssl:
requests.packages.urllib3.disable_warnings()
-
+
self.fhir_server = fhir_server
self.api_key = api_key
self.passport = passport
@@ -27,59 +36,64 @@ def __init__(self, fhir_server, verify_ssl = True, api_key=None, passport=None,
# We make a requests.Session to ensure consistent headers/cookie across all the requests we make
self.s = requests.Session()
- self.s.headers.update({'Accept': 'application/fhir+json'})
+ self.s.headers.update({"Accept": "application/fhir+json"})
# handle security needed for dbGaP
self.__add_passport()
- self.s.verify = verify_ssl
+ self.s.verify = verify_ssl
self.bytes_retrieved = 0
-
+
# Test out the client by querying the server metadata#
r = self.s.get(f"{self.fhir_server}/metadata")
if "" in r.text:
- sys.stderr.write('ERROR: Could not get the server capability statement. ')
+ sys.stderr.write(
+ "ERROR: Could not get the server capability statement. "
+ )
# Resolves all pages for the bundle. Returns an array with all Bundles, including the original Bundle.
def resolve_pages(self, bundle, debug=False, sleep=None):
- max_tries = 10 # maximum number of tries to get next page
- retry_sleep = 10 # after multiple failures, wait this number of seconds for a retry
+ max_tries = 10 # maximum number of tries to get next page
+ retry_sleep = 10 # after multiple failures, wait this number of seconds for a retry
try:
- next_page_link = next(filter(lambda link: link['relation'] == 'next', bundle['link']), None)
+ next_page_link = next(
+ filter(lambda link: link["relation"] == "next", bundle["link"]),
+ None,
+ )
except KeyError:
- print('Key error link/next_page')
+ print("Key error link/next_page")
print(json.dumps(bundle, indent=3))
raise
n = 1
if next_page_link:
if sleep != None:
time.sleep(sleep)
- fhir_query = next_page_link['url']
+ fhir_query = next_page_link["url"]
if self.api_key != None:
fhir_query += f"&api_key={self.api_key}"
if debug:
- print('_'*80)
+ print("_" * 80)
print(fhir_query)
tries = 1
r = self.s.get(fhir_query)
while r.status_code == 500 and tries < max_tries:
- tries += 1
- if tries > 6:
- time.sleep(retry_sleep)
- print (f"trying again - waiting {retry_sleep}s")
- else:
- print ("trying again")
- r = self.s.get(fhir_query)
+ tries += 1
+ if tries > 6:
+ time.sleep(retry_sleep)
+ print(f"trying again - waiting {retry_sleep}s")
+ else:
+ print("trying again")
+ r = self.s.get(fhir_query)
if tries > 1:
- print(f'took {tries} tries')
+ print(f"took {tries} tries")
next_page = r.json()
self.bytes_retrieved += len(r.content)
- if 'link' not in next_page:
+ if "link" not in next_page:
print(json.dumps(next_page, indent=3))
- nl = [l for l in next_page['link'] if l['relation'] == 'next']
+ nl = [l for l in next_page["link"] if l["relation"] == "next"]
if debug:
if len(nl) < 1:
- print('Full last response')
+ print("Full last response")
print(json.dumps(next_page, indent=3))
return [bundle] + self.resolve_pages(next_page, debug, sleep)
else:
@@ -89,16 +103,18 @@ def resolve_pages(self, bundle, debug=False, sleep=None):
# Run a query, and get the whole set of results back as a list of resources
# Set limit to True if you only want the first page
- def run_query(self, query, limit=None, debug=False, sleep=None, show_stats=None):
-
+ def run_query(
+ self, query, limit=None, debug=False, sleep=None, show_stats=None
+ ):
+
if show_stats == None:
show_stats = self.show_stats
-
+
t_start = time.perf_counter()
self.bytes_retrieved = 0
subset = False
-
+
fhir_query = f"{self.fhir_server}/{query}"
if self.api_key != None:
fhir_query += f"&api_key={self.api_key}"
@@ -109,11 +125,15 @@ def run_query(self, query, limit=None, debug=False, sleep=None, show_stats=None)
self.bytes_retrieved += len(r.content)
if debug:
print(json.dumps(first_bundle, indent=3))
- print('got response')
+ print("got response")
# if it's just a summary
- if 'meta' in first_bundle and 'tag' in first_bundle['meta'] and first_bundle['meta']['tag'][0]['code'] == 'SUBSETTED':
- subset = True
- all_bundles = [first_bundle]
+ if (
+ "meta" in first_bundle
+ and "tag" in first_bundle["meta"]
+ and first_bundle["meta"]["tag"][0]["code"] == "SUBSETTED"
+ ):
+ subset = True
+ all_bundles = [first_bundle]
elif limit == None:
all_bundles = self.resolve_pages(first_bundle, debug, sleep)
else:
@@ -121,16 +141,18 @@ def run_query(self, query, limit=None, debug=False, sleep=None, show_stats=None)
t_end = time.perf_counter()
pagecount = len(all_bundles)
-
+
resources = []
if subset:
resources = [first_bundle]
else:
resources = []
for bundle in all_bundles:
- if 'entry' in bundle:
- resources.extend([entry['resource'] for entry in bundle['entry']])
-
+ if "entry" in bundle:
+ resources.extend(
+ [entry["resource"] for entry in bundle["entry"]]
+ )
+
elapsed = t_end - t_start
if show_stats:
print(f"Total Resources: {len(resources)}")
@@ -138,50 +160,56 @@ def run_query(self, query, limit=None, debug=False, sleep=None, show_stats=None)
print(f"Total Pages: {pagecount}")
print(f"Time elapsed {elapsed:0.4f} seconds")
return resources
-
-
+
def __add_passport(self, passport=None):
- '''Adds Passport/TST to session header
- '''
+ """Adds Passport/TST to session header"""
if passport == None:
passport = self.passport
-
+
if passport != None:
full_key_path = os.path.expanduser(passport)
file_content = ""
- if self.debug: print(f"passport path {full_key_path}")
+ if self.debug:
+ print(f"passport path {full_key_path}")
try:
with open(full_key_path) as f:
file_content = f.read()
- if self.debug: print(f"content of passport file {file_content}")
- self.s.headers.update({'Authorization': f'Bearer {file_content}'})
+ if self.debug:
+ print(f"content of passport file {file_content}")
+ self.s.headers.update(
+ {"Authorization": f"Bearer {file_content}"}
+ )
except:
print("Could not find passport file")
+
def obs_to_df(observations):
patient_observations_dict = {}
-
+
for obs in observations:
- subject_id = obs['subject']['reference']
- obs_display_name = obs['code']['coding'][0]['display']
- if obs_display_name in ['SUBJECT_ID','SAMPLE_ID']:
- obs_code = obs['code']['coding'][0]['code']
- obs_display_name = f'{obs_display_name}_{obs_code}'
- if 'valueQuantity' in obs:
- value_text = obs['valueQuantity']['value']
- #value_unit = obs['valueQuantity']['unit']
- elif 'valueCodeableConcept' in obs:
- value_text = obs['valueCodeableConcept']['coding'][0]['display']
+ subject_id = obs["subject"]["reference"]
+ obs_display_name = obs["code"]["coding"][0]["display"]
+ if obs_display_name in ["SUBJECT_ID", "SAMPLE_ID"]:
+ obs_code = obs["code"]["coding"][0]["code"]
+ obs_display_name = f"{obs_display_name}_{obs_code}"
+ if "valueQuantity" in obs:
+ value_text = obs["valueQuantity"]["value"]
+ # value_unit = obs['valueQuantity']['unit']
+ elif "valueCodeableConcept" in obs:
+ value_text = obs["valueCodeableConcept"]["coding"][0]["display"]
else:
- value_text = 'unknown'
+ value_text = "unknown"
if subject_id not in patient_observations_dict:
- patient_observations_dict[subject_id] = {obs_display_name: value_text}
+ patient_observations_dict[subject_id] = {
+ obs_display_name: value_text
+ }
else:
patient_observations_dict[subject_id][obs_display_name] = value_text
- df = pd.DataFrame.from_dict(patient_observations_dict, orient='index')
+ df = pd.DataFrame.from_dict(patient_observations_dict, orient="index")
return df
-
+
+
def prettyprint(some_json):
- print(json.dumps(some_json, indent=3))
\ No newline at end of file
+ print(json.dumps(some_json, indent=3))
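The `resolve_pages` method above implements standard FHIR paging: follow the Bundle `link` entry whose `relation` is `next` until none remains. A stripped-down, iterative sketch of the same loop (function name and structure are illustrative, not part of the module):

```python
import requests


def fetch_all_pages(base_url, query):
    """Collect entry resources from every page of a FHIR search,
    following Bundle.link[relation="next"] until it disappears."""
    session = requests.Session()
    session.headers.update({"Accept": "application/fhir+json"})
    url = f"{base_url}/{query}"
    resources = []
    while url:
        bundle = session.get(url).json()
        resources.extend(e["resource"] for e in bundle.get("entry", []))
        # The server advertises the next page, if any, in bundle["link"]
        url = next(
            (l["url"] for l in bundle.get("link", []) if l["relation"] == "next"),
            None,
        )
    return resources
```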
diff --git a/jupyter/pilot/Notebook01_phs002921_URECA_subject_phenotype.ipynb b/jupyter/pilot/Notebook01_phs002921_URECA_subject_phenotype.ipynb
index 1808cf0..578cf39 100644
--- a/jupyter/pilot/Notebook01_phs002921_URECA_subject_phenotype.ipynb
+++ b/jupyter/pilot/Notebook01_phs002921_URECA_subject_phenotype.ipynb
@@ -15,58 +15,70 @@
"import csv\n",
"from datetime import datetime\n",
"from time import sleep\n",
- "from fhir_fetcher import fetch_all_data # Ensure this module is available \n",
+ "from fhir_fetcher import fetch_all_data # Ensure this module is available\n",
+ "\n",
"# and handles paging through all records\n",
"\n",
+ "\n",
"def fetch_patient_observations(session, fhir_base_url, patient_id):\n",
- " qstr = f'Observation?subject=Patient/{patient_id}'\n",
+ " qstr = f\"Observation?subject=Patient/{patient_id}\"\n",
" start_url = f\"{fhir_base_url}/{qstr}\"\n",
- " observations = fetch_all_data(session, start_url, 0) # Fetch all observations for the patient\n",
+ " observations = fetch_all_data(\n",
+ " session, start_url, 0\n",
+ " ) # Fetch all observations for the patient\n",
" return observations\n",
"\n",
+ "\n",
"def extract_observation_data(observations):\n",
" data = {}\n",
" for entry in observations:\n",
- " resource = entry.get('resource', {})\n",
- " code = resource.get('code', {}).get('coding', [{}])[0]\n",
- " attribute_name = code.get('display', '')\n",
- " value_string = resource.get('valueString', '')\n",
- " value_quantity = resource.get('valueQuantity', {}).get('value', '')\n",
+ " resource = entry.get(\"resource\", {})\n",
+ " code = resource.get(\"code\", {}).get(\"coding\", [{}])[0]\n",
+ " attribute_name = code.get(\"display\", \"\")\n",
+ " value_string = resource.get(\"valueString\", \"\")\n",
+ " value_quantity = resource.get(\"valueQuantity\", {}).get(\"value\", \"\")\n",
" if attribute_name:\n",
" value = value_string if value_string else value_quantity\n",
" if value:\n",
" data[attribute_name] = value\n",
" return data\n",
"\n",
+ "\n",
"def fetch_patient_ids(session, fhir_base_url, study_reference):\n",
" query_url = f\"{fhir_base_url}/ResearchSubject?study={study_reference}\"\n",
- " print ( query_url)\n",
- " research_subjects = fetch_all_data(session, query_url, 0, 'n')\n",
- " patient_ids = [entry['resource']['individual']['reference'].split('/')[-1] for entry in research_subjects]\n",
+ " print(query_url)\n",
+ " research_subjects = fetch_all_data(session, query_url, 0, \"n\")\n",
+ " patient_ids = [\n",
+ " entry[\"resource\"][\"individual\"][\"reference\"].split(\"/\")[-1]\n",
+ " for entry in research_subjects\n",
+ " ]\n",
" return patient_ids\n",
"\n",
+ "\n",
"def main():\n",
" starttime = datetime.now()\n",
- " starttimeStr = starttime.strftime('%Y-%m-%d %H:%M:%S')\n",
+ " starttimeStr = starttime.strftime(\"%Y-%m-%d %H:%M:%S\")\n",
" print(\"====== start time:\", starttimeStr)\n",
"\n",
" ###############################################################################################\n",
- " # get the token from https://www.ncbi.nlm.nih.gov/gap/power-user-portal/. \n",
+ " # get the token from https://www.ncbi.nlm.nih.gov/gap/power-user-portal/.\n",
" # Scroll down and click on the \"Task Specific Token\" button to get the light-weight version of the dbGaP RAS Passport.\n",
- " # Save the file into a text. In my example, it is saved to task-specific-token_all.txt. \n",
+ "    # Save the token to a text file. In my example, it is saved to task-specific-token-all.txt.\n",
" ###############################################################################################\n",
- " TST_PATH = '~/dev/fhir/task-specific-token-all.txt' \n",
+ " TST_PATH = \"~/dev/fhir/task-specific-token-all.txt\"\n",
" fhir_base_url = \"https://dbgap-api.ncbi.nlm.nih.gov/fhir-jpa-pilot1/x1\"\n",
- " \n",
- " with open(os.path.expanduser(TST_PATH), 'r') as f: \n",
+ "\n",
+ " with open(os.path.expanduser(TST_PATH), \"r\") as f:\n",
" tst_token = f.read().strip()\n",
- " \n",
+ "\n",
" session = requests.Session()\n",
- " session.headers.update({\n",
- " 'Accept': 'application/fhir+json',\n",
- " 'Authorization': f'Bearer {tst_token}',\n",
- " 'Content-Type': 'application/x-www-form-urlencoded',\n",
- " })\n",
+ " session.headers.update(\n",
+ " {\n",
+ " \"Accept\": \"application/fhir+json\",\n",
+ " \"Authorization\": f\"Bearer {tst_token}\",\n",
+ " \"Content-Type\": \"application/x-www-form-urlencoded\",\n",
+ " }\n",
+ " )\n",
"\n",
" # study_reference = \"phs002921.v2.p1.c1\"\n",
" study_reference = \"phs002921\"\n",
@@ -75,7 +87,7 @@
" # https://dbgap-api.ncbi.nlm.nih.gov/fhir-jpa-pilot/x1/ResearchSubject?study=phs002921\n",
" # https://dbgap-api.ncbi.nlm.nih.gov/fhir-jpa-pilot/x1/Observation?subject=Patient/4317770\n",
" # ###############################################################################################\n",
- " \n",
+ "\n",
" patient_ids = fetch_patient_ids(session, fhir_base_url, study_reference)\n",
" print(f\"Total patients fetched: {len(patient_ids)}\")\n",
"\n",
@@ -83,22 +95,30 @@
" columns = set()\n",
" patients_with_observations = 0\n",
" for patient_id in patient_ids[:2]:\n",
- " observations = fetch_patient_observations(session, fhir_base_url, patient_id)\n",
+ " observations = fetch_patient_observations(\n",
+ " session, fhir_base_url, patient_id\n",
+ " )\n",
" observation_data = extract_observation_data(observations)\n",
" if observation_data:\n",
- " observation_data['Patient'] = patient_id\n",
+ " observation_data[\"Patient\"] = patient_id\n",
" columns.update(observation_data.keys())\n",
" data.append(observation_data)\n",
" patients_with_observations += 1\n",
" # print(f\"Observations obtained for patient: {patient_id}\")\n",
- " print(f\"Accumulative patients with observations: {patients_with_observations}\")\n",
+ " print(\n",
+ "            f\"Cumulative patients with observations: {patients_with_observations}\"\n",
+ " )\n",
"\n",
- " sleep(1) # Add a delay of 1 second between each patient API request to avoid rate limits\n",
+ " sleep(\n",
+ " 1\n",
+ " ) # Add a delay of 1 second between each patient API request to avoid rate limits\n",
"\n",
- " columns = ['Patient'] + sorted(columns) # Ensure 'Patient' is the first column\n",
+ " columns = [\"Patient\"] + sorted(\n",
+ " columns\n",
+ " ) # Ensure 'Patient' is the first column\n",
"\n",
- " output_file = 'patient_observations.csv'\n",
- " with open(output_file, 'w', newline='') as csvfile:\n",
+ " output_file = \"patient_observations.csv\"\n",
+ " with open(output_file, \"w\", newline=\"\") as csvfile:\n",
" csvwriter = csv.DictWriter(csvfile, fieldnames=columns)\n",
" csvwriter.writeheader()\n",
" csvwriter.writerows(data)\n",
@@ -106,7 +126,7 @@
" print(f\"Data written to {output_file}\")\n",
"\n",
" endtime = datetime.now()\n",
- " endtimeStr = endtime.strftime('%Y-%m-%d %H:%M:%S')\n",
+ " endtimeStr = endtime.strftime(\"%Y-%m-%d %H:%M:%S\")\n",
" print(\"====== end time:\", endtimeStr)\n",
"\n",
" elapsed_time = endtime - starttime\n",
@@ -114,10 +134,13 @@
" eminutes = elapsed_seconds // 60\n",
" eseconds = elapsed_seconds % 60\n",
"\n",
- " print(f\"===========Elapsed time: {int(eminutes)} minutes and {int(eseconds)} seconds.\")\n",
+ " print(\n",
+ " f\"===========Elapsed time: {int(eminutes)} minutes and {int(eseconds)} seconds.\"\n",
+ " )\n",
+ "\n",
"\n",
"if __name__ == \"__main__\":\n",
- " main()\n"
+ " main()"
],
"outputs": [
{
diff --git a/jupyter/pilot/README.md b/jupyter/pilot/README.md
index a98eb56..bf82f95 100644
--- a/jupyter/pilot/README.md
+++ b/jupyter/pilot/README.md
@@ -1,4 +1,5 @@
# OVERVIEW
+
## dir content:
-pilot subdir will contain sample jupyter scripts to access the dbGaP pilot FHIR server.
+The pilot subdirectory contains sample Jupyter notebooks for accessing the dbGaP pilot FHIR server.
diff --git a/jupyter/pilot/fhir_fetcher.py b/jupyter/pilot/fhir_fetcher.py
index 80b33b7..d1a6413 100644
--- a/jupyter/pilot/fhir_fetcher.py
+++ b/jupyter/pilot/fhir_fetcher.py
@@ -1,7 +1,8 @@
import requests
import json
-def fetch_all_data(session, url, num_pages=0, print_entry='n'):
+
+def fetch_all_data(session, url, num_pages=0, print_entry="n"):
all_entries = []
page_counter = 0 # Initialize page counter
@@ -10,22 +11,22 @@ def fetch_all_data(session, url, num_pages=0, print_entry='n'):
response = session.get(url) # Use session to make the GET request
response.raise_for_status()
data = response.json()
- entries = data.get('entry', [])
+ entries = data.get("entry", [])
all_entries.extend(entries)
# Increment the page counter after successfully fetching a page
page_counter += 1
# Pretty-print the JSON data returned in this iteration
- if print_entry == 'y':
+ if print_entry == "y":
print("Data returned in this iteration:")
for entry in entries:
print(json.dumps(entry, indent=4))
next_link = None
- for link in data.get('link', []):
- if link.get('relation') == 'next':
- next_link = link.get('url')
+ for link in data.get("link", []):
+ if link.get("relation") == "next":
+ next_link = link.get("url")
break
# If num_pages is 0, keep fetching until there are no more pages.
@@ -38,9 +39,13 @@ def fetch_all_data(session, url, num_pages=0, print_entry='n'):
print(f"HTTP error occurred: {http_err}") # HTTP error
try:
error_content = response.json()
- print(f"Response content: {json.dumps(error_content, indent=4)}") # Pretty-print JSON response
+ print(
+ f"Response content: {json.dumps(error_content, indent=4)}"
+ ) # Pretty-print JSON response
except ValueError:
- print(f"Response content: {response.content}") # Response content for debugging
+ print(
+ f"Response content: {response.content}"
+ ) # Response content for debugging
break
except Exception as err:
print(f"Other error occurred: {err}") # Other errors
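A short usage sketch for the fetcher above (the query URL and study accession are taken from the pilot notebook earlier in this diff, and the session setup mirrors it; the pilot server additionally requires the task-specific token header shown in that notebook, omitted here):

```python
import requests
from fhir_fetcher import fetch_all_data

session = requests.Session()
session.headers.update({"Accept": "application/fhir+json"})
# The pilot server also expects an Authorization: Bearer <TST> header.

base = "https://dbgap-api.ncbi.nlm.nih.gov/fhir-jpa-pilot1/x1"
# num_pages=0 means keep following "next" links until the server runs out;
# print_entry="y" would pretty-print every returned entry.
subjects = fetch_all_data(session, f"{base}/ResearchSubject?study=phs002921", 0, "n")
print(f"{len(subjects)} ResearchSubject entries fetched")
```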
diff --git a/postman/dbGaP_FHIR_API.postman_collection.json.md b/postman/dbGaP_FHIR_API.postman_collection.json.md
index 4be744a..43a72c0 100644
--- a/postman/dbGaP_FHIR_API.postman_collection.json.md
+++ b/postman/dbGaP_FHIR_API.postman_collection.json.md
@@ -3,63 +3,64 @@
Once you have installed Postman, import the collection JSON file.
## Collection Structure
-* Collection: dbGaP FHIR API
- * Folder: FHIR Resource
- * HTTP Requests
-
-
+- Collection: dbGaP FHIR API
+  - Folder: FHIR Resource
+    - HTTP Requests
+
## Pre-request Scripts
-Postman allows you to run a script before the request is executed. The
-scripts can be defined at each level (collection, folder, request). If the
-script is defined at the collection level, then it will be executed before every request in the collection. If
-it is defined at the folder level, then it will be run only for the requests
-in that folder.
-Since the FHIR API standard requires an http accept header, I have defined a
+Postman allows you to run a script before the request is executed. The
+scripts can be defined at each level (collection, folder, request). If the
+script is defined at the collection level, then it will be executed before every request in the collection. If
+it is defined at the folder level, then it will be run only for the requests
+in that folder.
+
+Since the FHIR API standard requires an HTTP Accept header, I have defined a
script at the collection level to set a header for every request.
[screenshot: collection-level pre-request script in Postman]
-
-
-
## Variables
+
Postman allows variables to be set at each level (collection, folder,
request). These variables can be used in various places to help with testing
the api.
In order to minimize effort for similar requests for multiple resources, I
-have defined two variables:
+have defined two variables:
#### url
-```url``` is defined at the collection level. This is the base url for the API.
+
+`url` is defined at the collection level. This is the base URL for the API.
[screenshot: the url collection variable]
#### resourceName
-```resourceName``` Is defined at the folder level. (Notice this is
-actually a script that is setting the variable since there is no variables
+
+`resourceName` is defined at the folder level. (Notice this is
+actually set by a script, since there is no Variables
tab at the folder level.)
-
+
-When the two variables are combined in the request url field (accessed using
+When the two variables are combined in the request URL field (accessed using
curly braces), a fully formed parameterized url is created.
[screenshot: parameterized request URL]
## Tests
-You can define a test script using javascript to execute immediately after a
-request has been executed. You can test for things like response time,
+
+You can define a JavaScript test script that runs immediately after a
+request completes. You can test for things like response time,
response code, and response body.
-I created a test that will parse the response body for the metadata
-resource for each resource type to determine their search parameters. The
+I created a test that will parse the response body for the metadata
+resource for each resource type to determine its search parameters. The
search parameters will be displayed in the console.
[screenshot: Postman console output]
Patient search parameters:
-
\ No newline at end of file
+
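The collection-level pre-request script and the `url`/`resourceName` variables described above have a direct Python analogue, which is how the notebooks in this repository handle it with `requests.Session`; a rough equivalent (variable values are the ones used throughout these docs):

```python
import requests

# Counterpart of the collection-level pre-request script: set the
# Accept header once so every request in the "collection" carries it.
session = requests.Session()
session.headers.update({"Accept": "application/fhir+json"})

url = "https://dbgap-api.ncbi.nlm.nih.gov/fhir/x1"  # the {{url}} variable
resource_name = "Patient"  # the {{resourceName}} variable

# Counterpart of the parameterized request {{url}}/{{resourceName}}
r = session.get(f"{url}/{resource_name}")
print(r.status_code)
```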
diff --git a/pre-commit-3.6.0.pyz b/pre-commit-3.6.0.pyz
new file mode 100755
index 0000000..2eb3528
Binary files /dev/null and b/pre-commit-3.6.0.pyz differ