bio-learn/biolearn-autoscan-library

Overview

This project provides two main functionalities for GEO datasets:

  1. Generating and updating a dataset quality report
  2. Creating an autoscan library based on that quality report

Prerequisites

We use pipenv for managing Python dependencies and virtual environments.

Setting Up the Environment

  1. Install dependencies:

    pipenv install
  2. Run a script within the pipenv environment, for example:

    pipenv run python new_report.py

This ensures that all scripts are executed with the correct dependencies.

Main functionalities

The project consists of three main scripts:

  1. new_report.py: Generates a new GEO dataset quality report from clockbase_geo_list.csv. This script takes a long time to run because it downloads the data from GEO and processes it.
  2. update_report.py: Updates an existing GEO Series, or adds a new one, in the generated report. This is convenient for modifying individual entries without regenerating the entire report.
  3. generate_library.py: Creates an autoscan library from the GEO report.

Core Modules

The src/analyzer.py module handles the quality analysis of GEO data using biolearn. The src/scraper.py module manages web scraping of GEO metadata.

Usage

1. Generating a New Report

The new_report.py script generates a new quality report from a CSV file containing GEO dataset IDs.

# Before running:
# 1. Create a 'dist' directory
# 2. Place your clockbase_geo_list.csv file in the root directory

pipenv run python new_report.py
  • Output:
    • Generated report in dist/geo_report.yaml
    • Failed IDs list in dist/failed_geo_ids.txt (if any failures occur)

2. Updating an Existing Report

The update_report.py script updates an existing report by processing a list of GEO Series IDs.

# Ensure you have:
# 1. dist/geo_report.yaml (existing report)
# 2. dist/geo_series_ids.txt (file containing GEO Series IDs separated by commas, for example: GSE123456,GSE123457,GSE123458)

pipenv run python update_report.py
  • Output: Updated report in the dist/ directory
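
The comma-separated ID file described above can be prepared by hand, but a small helper may reduce formatting mistakes. The following is a minimal sketch, not part of this repository: the function names parse_series_ids and write_series_ids are hypothetical, and the assumption that every ID looks like "GSE" followed by digits is inferred from the example IDs. How update_report.py itself parses the file is not shown here.

```python
import re
from pathlib import Path

# Assumption: GEO Series IDs are "GSE" followed by digits, as in the README example.
GSE_PATTERN = re.compile(r"GSE\d+")


def parse_series_ids(text: str) -> list[str]:
    """Split comma-separated text into IDs, rejecting anything that is not GSE<digits>."""
    ids = [part.strip() for part in text.split(",") if part.strip()]
    bad = [i for i in ids if not GSE_PATTERN.fullmatch(i)]
    if bad:
        raise ValueError(f"Not GEO Series IDs: {bad}")
    return ids


def write_series_ids(ids: list[str], path: Path = Path("dist/geo_series_ids.txt")) -> None:
    """Validate the IDs and write them to the file joined by commas."""
    validated = parse_series_ids(",".join(ids))
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(",".join(validated))
```

For example, write_series_ids(["GSE123456", "GSE123457"]) writes GSE123456,GSE123457 to dist/geo_series_ids.txt, the location update_report.py expects.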

3. Generating Autoscan Library

The generate_library.py script creates an autoscan library from a GEO report.

# Ensure you have:
# dist/geo_report.yaml file

pipenv run python generate_library.py
  • Output: Generated autoscan library dist/geo_autoscan_library.yaml
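
Each of the three scripts expects certain files to exist before it runs. As a rough sketch (not part of this repository), the prerequisites listed in the usage sections above can be collected in one table and checked up front. The names REQUIRED_INPUTS and missing_inputs are hypothetical, and the scripts themselves may check for more or fewer inputs than this.

```python
from pathlib import Path

# Prerequisites as listed in the usage sections, relative to the repository root.
REQUIRED_INPUTS = {
    "new_report.py": ["clockbase_geo_list.csv", "dist"],
    "update_report.py": ["dist/geo_report.yaml", "dist/geo_series_ids.txt"],
    "generate_library.py": ["dist/geo_report.yaml"],
}


def missing_inputs(script: str, root: Path = Path(".")) -> list[str]:
    """Return the prerequisite paths for a script that do not exist under root."""
    return [rel for rel in REQUIRED_INPUTS[script] if not (root / rel).exists()]


if __name__ == "__main__":
    # Report what each script is still missing in the current directory.
    for script in REQUIRED_INPUTS:
        missing = missing_inputs(script)
        status = "ready" if not missing else f"missing: {', '.join(missing)}"
        print(f"{script}: {status}")
```

Running this from the repository root before a long new_report.py run gives an early warning if the dist directory or the input CSV is absent.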
