
Yggdrasil is an in-house orchestration framework designed to automate well-defined workflows. It watches directories, CouchDB changes, etc., then calls realm modules (external or internal packages) to do the heavy lifting. Example realms today:
- tenx (internal) - 10x Genomics best practice analysis
- smartseq3 (internal) - Smart-seq3 best practice analysis
- dataflow-dmx (external) - [under development] Demultiplexing pipeline for Illumina / Aviti / ONT
- (more to come)
External realms self-register through the entry-point group ygg.handler.
- Installation
- Install External Realms
- Project Structure
- Usage
- Configuration
- Development Guidelines
- Contributing
- License
# Clone & create an isolated env
git clone https://github.com/NationalGenomicsInfrastructure/Yggdrasil.git
cd Yggdrasil
conda create -n ygg-dev python=3.11 pip
conda activate ygg-dev
# Editable install with dev extras (ruff, mypy, ...)
pip install -e .[dev]
# Run Yggdrasil
yggdrasil
# Or alternatively
python -m yggdrasil
- Runtime dependencies come from [project] dependencies in pyproject.toml.
- Dev tooling is pulled from [project.optional-dependencies] dev.
# 1. Clone & create an isolated env
git clone https://github.com/NationalGenomicsInfrastructure/Yggdrasil.git
cd Yggdrasil
conda create -n ygg python=3.11 pip
conda activate ygg
# 2. Install locked runtime stack
pip install -r requirements/lock.txt
# 3. Install Yggdrasil itself (no dev extras)
pip install -e .
requirements/lock.txt is generated from the dependency list with pip-compile --strip-extras.
# Clone next to Yggdrasil or organize in a `realms` dir (any folder works)
git clone https://github.com/NationalGenomicsInfrastructure/dmx.git
pip install -e ./dmx
Restart Yggdrasil so it re-scans entry-points. Startup log shows the handler is active:
✓ registered external handler flowcell-dmx for FLOWCELL_READY
When a new event is detected, Yggdrasil schedules the appropriate handler as an async background task in its event loop.
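The scheduling pattern described above can be sketched with plain asyncio. This is an illustration under assumptions, not Yggdrasil's actual code: handle_event and the PROJECT_CHANGE event type are hypothetical, and only FLOWCELL_READY is taken from the startup log shown earlier.

```python
import asyncio


async def handle_event(event: dict) -> str:
    """Hypothetical handler: stands in for a realm doing real work."""
    await asyncio.sleep(0.01)  # simulate I/O-bound processing
    return f"handled {event['type']}"


async def main() -> list:
    events = [{"type": "FLOWCELL_READY"}, {"type": "PROJECT_CHANGE"}]
    tasks = []
    for event in events:
        # create_task schedules the coroutine on the running loop and returns
        # immediately, so the loop can keep reacting to further events.
        tasks.append(asyncio.create_task(handle_event(event)))
    # A long-running service would keep looping; this sketch waits and exits.
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Keeping a reference to each task (here, the tasks list) matters: the event loop holds only weak references to tasks, so an unreferenced task can be garbage-collected before it finishes.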
Brief overview of the main components and directories:
Yggdrasil/
├── lib/
│ ├── base/
│ ├── core_utils/
│ ├── couchdb/
│ ├── handlers/
│ ├── module_utils/
│ ├── realms/
│ │ ├── tenx/
│ │ └── smartseq3/
│ └── watchers/
├── tests/
├── .github/
│ └── workflows/
├── requirements/
├── yggdrasil.py
├── ygg_trunk.py (depr)
├── ygg-mule.py (depr)
├── pyproject.toml
├── LICENSE
└── README.md
- lib/: Core library containing base classes and utilities.
  - base/: Abstract base classes and interfaces.
  - core_utils/: Utility modules for Yggdrasil core functionalities.
  - couchdb/: Classes specific for Yggdrasil-CouchDB interactions and document management.
  - handlers/: Base classes and built-in event/data handlers for processing and workflow orchestration.
  - module_utils/: Utility modules for various Yggdrasil module functionalities.
  - realms/: Internal modules specific to different sequencing technologies (e.g. TenX, SmartSeq3, etc.)
  - watchers/: File system and CouchDB watchers for monitoring and triggering events.
- tests/: Test cases for the application.
- .github/workflows/: GitHub Actions workflows for CI/CD.
- requirements/: Dependency lock files and requirements management for reproducible environments.
Yggdrasil has a single entry-point for both daemon operation (background watchers + handlers) and one-off project processing. After you have installed Yggdrasil in an environment, call it as follows:
yggdrasil [--dev] {daemon | run-doc} [OPTIONS]
Global flag | Description |
---|---|
--dev | Turns on development mode: • DEBUG-level logging • Dev-mode configuration overrides (useful on a laptop) |
You can also run the CLI via python -m yggdrasil or python -m yggdrasil.cli if you prefer.
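For readers who want to see how the command shape yggdrasil [--dev] {daemon | run-doc} [OPTIONS] maps onto a parser, here is a small argparse sketch. It mirrors only the flags documented in this README; build_parser is a hypothetical name and the real CLI may be implemented differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented command line."""
    parser = argparse.ArgumentParser(prog="yggdrasil")
    # Global flag: must come before the subcommand, as in `yggdrasil --dev daemon`.
    parser.add_argument("--dev", action="store_true", help="development mode")

    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("daemon", help="start the long-running service")

    run_doc = sub.add_parser("run-doc", help="process one CouchDB document")
    run_doc.add_argument("doc_id", metavar="DOC_ID")
    run_doc.add_argument("--manual-submit", action="store_true",
                         help="force manual HPC submission")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

Because --dev belongs to the top-level parser in this sketch, it is accepted before the subcommand name, which matches the usage line above.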
The daemon subcommand starts the long-running service:
- instantiates all configured watchers (file-system, CouchDB, ...);
- auto-registers built-in and external handlers;
- processes events until you stop it with Ctrl-C.
# production-style run
yggdrasil daemon
# verbose local run
yggdrasil --dev daemon
Logs are written to the directory set in yggdrasil_workspace/common/configurations/config.json → yggdrasil_log_dir.
The run-doc subcommand processes exactly one CouchDB project document and then exits. Useful for manual re-processing or debugging.
yggdrasil run-doc DOC_ID [--manual-submit]
Option | Meaning |
---|---|
--manual-submit | Force manual HPC submission for this invocation (handlers check a session flag instead of auto-calling sbatch). |
Objective: Rerun project N.Surname (CouchDB doc_id: a1b2c3d4e5f), but stop before Slurm submission because we need to manually edit the project's configurations.
# Initially run
yggdrasil run-doc a1b2c3d4e5f --manual-submit
After you run this, manually edit the project as needed and submit to Slurm. Copy the Slurm job_id to the respective field in the project's CouchDB doc, and re-run the same command:
yggdrasil run-doc a1b2c3d4e5f --manual-submit
Yggdrasil will pick up the running Slurm job and wait for it until it finishes, to continue with post-processing.
You want to… | Command |
---|---|
Run Yggdrasil as a background service | yggdrasil daemon |
Same, but with dev logging & dev servers | yggdrasil --dev daemon |
(re)Process one document | yggdrasil run-doc <DOC_ID> |
(re)Process with manual Slurm submission | yggdrasil run-doc <DOC_ID> --manual-submit |
When developing, use module form instead of console-script | python -m yggdrasil ... |
Yggdrasil uses a configuration loader to manage settings. Configuration files should be placed in the yggdrasil_workspace/common/configurations directory. This directory path can be adjusted in the lib/core_utils/common.py script if needed.
config.json: This file contains global settings for Yggdrasil.
Fields:
- yggdrasil_log_dir: Directory where logs will be stored.
- couchdb_url: URL of the CouchDB server (host:port format).
- couchdb_database: Name of the CouchDB project database.
- couchdb_status_tracking: Name of the CouchDB yggdrasil database for project status tracking.
- couchdb_poll_interval: Interval (in seconds) for polling CouchDB for changes.
- job_monitor_poll_interval: Interval (in seconds) for polling the job monitor.
- activate_ngi_cmd: Command to activate NGI environment (can be "None" if not used).
- report_transfer: Settings for transferring reports (server, user, destination, ssh_key).
Example Configuration File (config.json)
{
"yggdrasil_log_dir": "yggdrasil_workspace/logs",
"couchdb_url": "<host>:<port>",
"couchdb_database": "my_projects",
"couchdb_status_tracking": "my_yggdrasil_db",
"couchdb_poll_interval": 3,
"job_monitor_poll_interval": 5,
"activate_ngi_cmd": "None",
"report_transfer": {
"server": "<server>",
"user": "<username>",
"destination": "<destination_path>",
"ssh_key": "<ssh_key_path>"
}
}
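A quick sanity check of config.json before starting the service can catch missing fields early. The sketch below is not Yggdrasil's configuration loader: load_config, validate_config and the choice of which keys to treat as required are assumptions made for illustration, based on the field list above.

```python
import json
from pathlib import Path

# Assumption: these documented fields are treated as mandatory in this sketch.
REQUIRED_KEYS = {
    "yggdrasil_log_dir",
    "couchdb_url",
    "couchdb_database",
    "couchdb_status_tracking",
    "couchdb_poll_interval",
    "job_monitor_poll_interval",
}


def validate_config(config: dict) -> dict:
    """Return config unchanged, or raise KeyError naming the missing fields."""
    missing = sorted(REQUIRED_KEYS - config.keys())
    if missing:
        raise KeyError(f"missing config keys: {', '.join(missing)}")
    return config


def load_config(path) -> dict:
    """Read a JSON config file and validate it."""
    return validate_config(json.loads(Path(path).read_text(encoding="utf-8")))
```

Calling load_config("yggdrasil_workspace/common/configurations/config.json") would then either return the parsed settings or fail with a message listing what is missing.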
module_registry.json: This file maps library construction methods to their respective internal processing modules. The modules specified here are dynamically loaded and executed when the library_prep_method specified in the CouchDB document matches a registry key, either by its full name or, for entries marked with "prefix": true, by its prefix.
Example:
{
"SmartSeq 3": {
"module": "lib.realms.smartseq3.smartseq3.SmartSeq3"
},
"10X": {
"module": "lib.realms.tenx.tenx_project.TenXProject",
"prefix": true
}
}
- SmartSeq 3:
  - module: The path to the module handling SmartSeq 3 library data.
- 10X:
  - module: The path to the module handling 10X-prefixed library data.
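The matching rule (full name first, then prefix entries) can be sketched as follows. This is an illustrative reading of the registry format, not the loader Yggdrasil ships: resolve_module_path and load_class are hypothetical helpers, and the precedence of exact matches over prefix matches is an assumption.

```python
import importlib

REGISTRY = {
    "SmartSeq 3": {"module": "lib.realms.smartseq3.smartseq3.SmartSeq3"},
    "10X": {"module": "lib.realms.tenx.tenx_project.TenXProject", "prefix": True},
}


def resolve_module_path(method: str, registry: dict):
    """Return the dotted class path for a library_prep_method, or None."""
    entry = registry.get(method)  # 1) exact match on the full name
    if entry is not None:
        return entry["module"]
    for key, entry in registry.items():  # 2) entries flagged "prefix": true
        if entry.get("prefix") and method.startswith(key):
            return entry["module"]
    return None


def load_class(dotted_path: str):
    """Import 'package.module.ClassName' and return the class object."""
    module_name, _, class_name = dotted_path.rpartition(".")
    return getattr(importlib.import_module(module_name), class_name)
```

With this registry, a method name such as "10X Chromium" (a made-up value) resolves to the TenX module, while "SmartSeq 3 variant" resolves to nothing because the SmartSeq 3 entry is not flagged as a prefix.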
The following variables can also be set in config.json, but for security reasons you are encouraged to set them as environment variables instead:
- COUCH_USER: Your CouchDB username.
- COUCH_PASS: Your CouchDB password.
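On the consuming side, reading these two variables and failing early when they are absent might look like the sketch below. get_couch_credentials is a hypothetical helper, not part of Yggdrasil; the env parameter exists only so the function can be exercised without touching the real environment.

```python
import os


def get_couch_credentials(env=None):
    """Return (user, password) from COUCH_USER / COUCH_PASS, or raise."""
    env = os.environ if env is None else env
    user = env.get("COUCH_USER")
    password = env.get("COUCH_PASS")
    if not user or not password:
        # Fail loudly instead of attempting an unauthenticated connection.
        raise RuntimeError("COUCH_USER and COUCH_PASS must both be set")
    return user, password
```

Keeping credentials out of config.json also keeps them out of any backup or repository that happens to include the configuration directory.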
Yggdrasil uses a custom logging utility to manage logs. Logs are stored in the directory specified by the yggdrasil_log_dir configuration.
Debug Logging: Passing the --dev flag when running Yggdrasil enables debug logging automatically.
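The effect of --dev on logging can be approximated with the standard logging module. This is a sketch only and does not reproduce Yggdrasil's custom logging utility: configure_logging, the logger name, the yggdrasil.log file name and the INFO level used outside dev mode are all assumptions.

```python
import logging
from pathlib import Path


def configure_logging(log_dir, dev: bool = False) -> logging.Logger:
    """Write logs to <log_dir>/yggdrasil.log; DEBUG in dev mode, else INFO."""
    log_path = Path(log_dir)
    log_path.mkdir(parents=True, exist_ok=True)

    logger = logging.getLogger("yggdrasil_sketch")  # hypothetical logger name
    logger.setLevel(logging.DEBUG if dev else logging.INFO)

    # Drop handlers from earlier calls so repeated setup does not duplicate lines.
    for old in list(logger.handlers):
        logger.removeHandler(old)
        old.close()

    handler = logging.FileHandler(log_path / "yggdrasil.log", encoding="utf-8")
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )
    logger.addHandler(handler)
    return logger
```

In this sketch, a call such as configure_logging("yggdrasil_workspace/logs", dev=True) makes logger.debug(...) messages appear in the file, while the default call filters them out.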
Ensure you have activated the Conda environment, and have installed runtime + dev tools. The latter can be done in one go with:
pip install -e .[dev]
.[dev] pulls:
- ruff (lint) · black (format) · mypy (type-check)
- pip-tools (pip-compile)
- pre-commit itself, so no separate pip install is needed.
Use pre-commit to automate code formatting and linting on each commit.
# Install Git hooks (runs ruff / black / mypy automatically)
pre-commit install
Task | Command |
---|---|
Format everything | black . |
Lint | ruff check . |
Static types | mypy . |
Run all hooks | pre-commit run --all-files |
(Hooks fire automatically on git commit; run manually only if you want a full pass before staging.)
Install extensions:
- Python (Microsoft)
- Ruff (Astral Software)
- Black Formatter (Microsoft)
- Mypy Type Checker (Microsoft)
VSCode Settings
Add to settings.json (user or workspace):
{
"editor.defaultFormatter": "ms-python.black-formatter",
"editor.formatOnSave": true,
"ruff.configuration": "pyproject.toml",
"mypy-type-checker.args": [ "--config-file=pyproject.toml" ]
}
Ignore bulk-format commits so git blame stays useful:
git config blame.ignoreRevsFile .git-blame-ignore-revs
Append the full commit hashes of large "black-only" or "ruff-fix" commits to the .git-blame-ignore-revs file (one hash per line), e.g.:
a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5p6q7r8s9t0
b1c2d3e4f5g6h7i8j9k0l1m2n3o4p5q6r7s8t9u0
GitHub Actions are set up to automatically run ruff, black, and mypy on pushes and pull requests.
- Workflow File: .github/workflows/lint.yml
- Jobs: ruff-check, black-check, mypy-check
- Each job installs exact runtime versions from requirements/lock.txt, then the tool it needs.
Contributions are very welcome! For as smooth an experience as possible, we recommend the following guidelines:
- Forking: Fork the main repository to your personal GitHub account.
- Git workflow: Open pull requests against the dev branch.
- Code Style: Format with black and lint with ruff.
- Type Annotations: If you use type annotations, make sure mypy checks are set up and pass.
- Pre-commit: black, ruff, and mypy run automatically. Make sure you have run pre-commit install and that the hooks pass before pushing.
- Documentation: Documented contributions are easier to understand and review.
Suggested contributions: Tests, Bug Fixes, Code Optimization, New Modules (reach out to Anastasios if you don't know where to start with developing a new module).
Yggdrasil is licensed under the MIT License - see the LICENSE file for details.