Biobanks

This repository contains the code to build a table of articles, drawn from the Microsoft Academic Graph, that introduce biobanks.

  • Version: V1
  • Creator: Rodrigo Dorantes Gilardi [email protected]
  • Last code update: 03/14/2022
  • Last data update: 08/31/2021
  • Keywords: Biobanks; Cohort Studies
  • Rights Statement: Open Data Commons Attribution License (ODC-By) v1.0

Methods

To build the dataset, you first need a full copy of the Microsoft Academic Graph (MAG) in a directory accessible from where the code will be run.

Set the path to the MAG in the script serendipity.py, located in the python subdirectory. Once the path is set, running the script keywords.py returns an initial list of articles. These articles are selected using a set of keywords commonly found in papers that introduce biobanks (in the broad sense, including cohorts of different sizes and purposes).
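The keyword step can be illustrated with a minimal sketch of case-insensitive title matching over MAG records. It is not the implementation in keywords.py: the MAG_PATH value, the keyword list, and the helper names (build_pattern, parse_titles, match_titles) are hypothetical, and the sketch assumes a tab-separated, unquoted Papers.txt with the paper ID in the first column and the title in a column you pass in.

```python
import csv
import re
from pathlib import Path
from typing import Iterable, Iterator, List, Tuple

# Hypothetical MAG location; in this repository the real path is set in serendipity.py.
MAG_PATH = Path("/path/to/MAG")

# Hypothetical keyword list; the terms actually used by keywords.py may differ.
KEYWORDS: List[str] = ["biobank", "cohort profile", "cohort study"]


def build_pattern(keywords: Iterable[str]) -> re.Pattern:
    """Compile one case-insensitive regex matching any keyword at the start of a word."""
    alternatives = "|".join(re.escape(k) for k in keywords)
    # Only a leading word boundary: "biobank" also matches "biobanks",
    # but not a "biobank" buried inside a longer word.
    return re.compile(rf"\b(?:{alternatives})", re.IGNORECASE)


def parse_titles(lines: Iterable[str], title_column: int) -> Iterator[Tuple[str, str]]:
    """Yield (paper_id, title) pairs from tab-separated, unquoted lines (assumed layout)."""
    for fields in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(fields) > title_column:  # skip malformed or truncated rows
            yield fields[0], fields[title_column]


def match_titles(
    rows: Iterable[Tuple[str, str]], keywords: Iterable[str]
) -> Iterator[Tuple[str, str]]:
    """Yield only the (paper_id, title) pairs whose title contains a keyword."""
    pattern = build_pattern(keywords)
    for paper_id, title in rows:
        if pattern.search(title):
            yield paper_id, title
```

On a MAG copy laid out as assumed, opening MAG_PATH / "mag" / "Papers.txt" and passing the file handle through match_titles(parse_titles(fh, title_column=4), KEYWORDS) would stream the matching (paper ID, title) pairs without loading the whole file. Both the file location and column 4 (PaperTitle) are assumptions to check against your copy of the MAG schema.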

Next, run the Jupyter notebook initial.ipynb to obtain the final list from the automated process. The notebook explains both steps.

This list was then curated manually to remove duplicates and biobanks that are not human-based. To obtain the latest version of the table, run the script manual_table.py in the python directory. Note that this requires the Python module gspread (installable with pip install gspread).
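The curation itself was done by hand, but the clean-up rules it describes (drop duplicates, drop biobanks that are not human-based) can be sketched in code. The example below assumes, hypothetically, that the curated table is a Google Sheet with a name column and a human column holding yes/no; the column names and helper names are placeholders, and manual_table.py may work differently. Fetching uses gspread's service_account, open_by_key, and get_all_records; gspread is imported inside fetch_records so the other helpers run without it.

```python
import csv
from pathlib import Path
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def fetch_records(sheet_key: str) -> List[Row]:
    """Read the first worksheet of a Google Sheet as a list of dicts (needs credentials)."""
    import gspread  # lazy import: only this function requires the gspread package

    client = gspread.service_account()  # reads gspread's default service-account file
    return client.open_by_key(sheet_key).sheet1.get_all_records()


def normalize(name: Any) -> str:
    """Lower-case a name and collapse whitespace so near-identical entries compare equal."""
    return " ".join(str(name).lower().split())


def curate(rows: Iterable[Row]) -> List[Row]:
    """Drop rows not marked as human-based and keep the first row for each biobank name."""
    seen = set()
    kept: List[Row] = []
    for row in rows:
        # Hypothetical flag column: anything not clearly affirmative is excluded.
        if str(row.get("human", "")).strip().lower() not in {"yes", "true", "1"}:
            continue
        key = normalize(row.get("name", ""))
        if not key or key in seen:
            continue  # skip unnamed rows and duplicates of an earlier entry
        seen.add(key)
        kept.append(row)
    return kept


def write_csv(rows: List[Row], path: Path) -> None:
    """Write curated rows to CSV, using the first row's keys as the header."""
    if not rows:
        path.write_text("", encoding="utf-8")
        return
    with path.open("w", encoding="utf-8", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
```

A possible end-to-end call is write_csv(curate(fetch_records(SHEET_KEY)), Path("biobanks.csv")), where SHEET_KEY is a placeholder for your own sheet's key. Keeping the first occurrence means the earliest row in the sheet wins when duplicate entries disagree.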

The final list of biobanks is provided in biobanks.csv.