This project provides two main functionalities for GEO datasets:
- Generation and updating of dataset quality reports
- Creation of an autoscan library based on these quality reports
We use pipenv for managing Python dependencies and virtual environments.
- Install dependencies:
  pipenv install
- Run a script within the pipenv environment, for example:
  pipenv run python new_report.py
This ensures that all scripts are executed with the correct dependencies.
The project consists of three main scripts:
- new_report.py: Generates a new GEO dataset quality report from clockbase_geo_list.csv. Note that this script takes a long time to run because it downloads the data from GEO and processes it.
- update_report.py: Updates an existing GEO Series, or adds a new GEO Series, in the generated report. This is convenient for modifying individual entries without regenerating the entire report.
- generate_library.py: Creates an autoscan library from the GEO report.
src/analyzer.py handles the quality analysis of GEO data using biolearn.
src/scraper.py manages web scraping of GEO metadata.
The new_report.py script generates a new quality report from a CSV file containing GEO dataset IDs.
# Before running:
# 1. Create a 'dist' directory
# 2. Place your clockbase_geo_list.csv file in the root directory
pipenv run python new_report.py
- Output:
  - Generated report geo_report.yaml in the dist/ directory
  - Failed IDs list in dist/failed_geo_ids.txt (if any failures occur)
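The two setup steps for new_report.py can also be checked from Python before starting the long-running job. The following is a minimal sketch, not part of the project's scripts: the helper name `prepare_workspace` is hypothetical, and it assumes the CSV lives in the project root and that the output directory is named dist, as described above.

```python
from pathlib import Path


def prepare_workspace(root: Path, csv_name: str = "clockbase_geo_list.csv") -> Path:
    """Confirm the input CSV exists under root and create dist/ if needed.

    Hypothetical helper for illustration; the project scripts do not
    necessarily perform these checks themselves.
    """
    csv_path = root / csv_name
    if not csv_path.is_file():
        # Fail early instead of partway through a long download.
        raise FileNotFoundError(f"Expected input file at {csv_path}")
    dist_dir = root / "dist"
    # exist_ok=True makes the call safe to repeat on later runs.
    dist_dir.mkdir(exist_ok=True)
    return dist_dir
```

Calling `prepare_workspace(Path.cwd())` from the project root before `pipenv run python new_report.py` would surface a missing input file immediately.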
The update_report.py script updates an existing report by processing a list of GEO Series IDs.
# Ensure you have:
# 1. dist/geo_report.yaml (existing report)
# 2. dist/geo_series_ids.txt (file containing GEO Series IDs separated by commas, for example: GSE123456,GSE123457,GSE123458)
pipenv run python update_report.py
- Output: Updated report in the dist/ directory
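The comma-separated ID file expected by update_report.py can be produced with a few lines of Python. This is a sketch under stated assumptions: the helper name `write_series_ids` is hypothetical, the "GSE followed by digits" check reflects the usual GEO Series accession pattern, and the file is written without a trailing newline to match the example above. How update_report.py itself parses the file is not shown in this document.

```python
from pathlib import Path


def write_series_ids(ids, path):
    """Write GEO Series IDs to path as a single comma-separated line.

    Hypothetical helper for illustration. Returns the cleaned list of IDs
    that was written.
    """
    cleaned = []
    for raw in ids:
        gse = raw.strip().upper()
        if not gse:
            continue  # skip blank entries
        # Assumption: valid IDs look like GSE123456.
        if not (gse.startswith("GSE") and gse[3:].isdigit()):
            raise ValueError(f"Not a GEO Series ID: {raw!r}")
        if gse not in cleaned:
            cleaned.append(gse)  # keep first occurrence, drop duplicates
    Path(path).write_text(",".join(cleaned))
    return cleaned
```

For example, `write_series_ids(["GSE123456", "GSE123457"], "dist/geo_series_ids.txt")` writes the two IDs joined by a comma, provided the dist directory already exists.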
The generate_library.py script creates an autoscan library from a GEO report.
# Ensure you have:
# dist/geo_report.yaml file
pipenv run python generate_library.py
- Output: Generated autoscan library in dist/geo_autoscan_library.yaml