Dev runs #135

javfg · 2025-06-19T16:10:59Z

This PR adds a way to run a single step from ETL or Gentropy in the orchestrator. See the updated README.md for more info.

It also sets idle_ttl to 5 minutes for all clusters, since all steps now respawn clusters when needed.

ireneisdoomed · 2025-08-04T10:22:37Z

I tried running the L2G training step using this branch, but did not manage to.

I first ran into an issue with the new dependency, deepmerge: it was installed in my project, but not on the Airflow instance.
I reset the Airflow instance by running docker compose build --no-cache, followed by make dev.
I then got a DAG error indicating that PIS_L2G was expected to run, even though I had specified that only the gentropy step should run.

Broken DAG: [/opt/airflow/dags/src/orchestration/dags/unified_pipeline.py]
Traceback (most recent call last):
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "/opt/airflow/dags/src/orchestration/dags/unified_pipeline.py", line 495, in <module>
    step_tasks["start"].set_upstream(steps[dep]["end"])
                                     ~~~~~^^^^^
KeyError: 'pis_l2g'
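
The KeyError above is what a plain dictionary lookup raises when a step names a dependency that is not part of the current run. The sketch below reproduces that situation without Airflow and shows one possible guard: only wire an edge when the upstream step is also selected. The function `wire_dependencies`, the `depends_on` key, and the step graph are hypothetical and are not taken from unified_pipeline.py.

```python
def wire_dependencies(steps, selected):
    """Return (upstream, downstream) edges between selected steps only.

    `steps` maps a step name to {"depends_on": [...]}; `selected` is the
    set of step names that are part of this run. Both shapes are assumed.
    """
    edges = []
    for name in sorted(selected):
        for dep in steps[name].get("depends_on", []):
            if dep not in selected:
                # In a dev run the upstream step is not instantiated, so
                # skip the edge instead of indexing a missing key.
                continue
            edges.append((dep, name))
    return edges


# Hypothetical step graph: the training step depends on a PIS step.
steps = {
    "pis_l2g": {"depends_on": []},
    "gentropy_l2g_training": {"depends_on": ["pis_l2g"]},
}

# Dev run: only the training step is selected, so no edges are created.
dev_edges = wire_dependencies(steps, {"gentropy_l2g_training"})
# Full run: both steps are selected, so the edge is kept.
full_edges = wire_dependencies(steps, set(steps))
```

Whether a dev run should skip upstream steps or fail with a clearer message is a design choice for the PR author; the sketch only illustrates the skip variant.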

This is the config I used:

# unified_pipeline.yaml

################################################################################
# UNIFIED PIPELINE CONFIGURATION
################################################################################

# `release_name` is the prefix used as work path when the pipeline is run in
# _production mode_ (`is_dev: false`). It is where step inputs will be read
# from, and outputs will be written to.
release_name: '25.06'

# `is_dev` decides whether this is a run for a release or development purposes.
# On a dev run, only one step of the pipeline can be selected to run; and that
# step can only be from the `etl` or `gentropy` stages (PIS and PTS steps can be
# run locally in a very easy way without using the orchestrator).
# If `is_dev` is true, the `dev_run_name` and `dev_run_step` parameters must be set.
is_dev: true

### NOTE: The next two settings are exclusive for dev runs.
# `dev_run_name` is the output folder for a dev run. The convention is:
# `<username>/<release_name>-<description>`
dev_run_name: 'il/new-l2g'
# `dev_run_step` is the step that will be run in a dev run.
dev_run_step: gentropy_l2g_training
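
The rules stated in the comments above (a dev run needs both dev settings, and the step must come from the `etl` or `gentropy` stages) could be checked up front with something like the following. This is a minimal sketch on a plain dict; the function name `validate_run_config` is hypothetical, and deriving the stage from the prefix of the step name is an assumption based on the `gentropy_l2g_training` example, not the orchestrator's actual logic.

```python
ALLOWED_DEV_STAGES = ("etl", "gentropy")  # stages named in the config comments


def validate_run_config(config):
    """Return a list of problems found in a unified pipeline config dict."""
    problems = []
    if not config.get("is_dev", False):
        # The dev-only checks below do not apply to release runs.
        return problems
    for key in ("dev_run_name", "dev_run_step"):
        if not config.get(key):
            problems.append(f"{key} must be set when is_dev is true")
    step = config.get("dev_run_step") or ""
    # Assumption: the stage is the part of the step name before the first "_".
    stage = step.split("_", 1)[0]
    if step and stage not in ALLOWED_DEV_STAGES:
        problems.append(f"dev_run_step '{step}' is not an etl or gentropy step")
    return problems


# The values from the config in this comment.
config = {
    "release_name": "25.06",
    "is_dev": True,
    "dev_run_name": "il/new-l2g",
    "dev_run_step": "gentropy_l2g_training",
}
problems = validate_run_config(config)
```

Under these assumptions the config shown here passes, which suggests the failure comes from dependency wiring in the DAG and not from the settings themselves.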

# gentropy.yaml - where I configured the step
  l2g_training:
    params:
      step: locus_to_gene
      step.session.write_mode: overwrite
      step.run_mode: train
      step.wandb_run_name: '{{l2g_training_version}}'
      step.cross_validate: false
      step.hf_hub_repo_id: opentargets/locus_to_gene_xgboost
      step.hf_model_commit_message: 'chore: update model base model for {{l2g_training_version}} run'
      +step.session.extended_spark_conf: "{spark.kryoserializer.buffer.max:500m, spark.sql.autoBroadcastJoinThreshold:'-1'}"
      # INPUTS
      step.credible_set_path: '{{release_uri}}/output/credible_set'
      step.feature_matrix_path: '{{release_uri}}/intermediate/l2g_feature_matrix'
      step.gold_standard_curation_path: '{{release_uri}}/input/l2g/gold_standard.json'
      # OUTPUTS
      step.model_path: '{{output_uri}}/etc/model/locus_to_gene_model/classifier.skops'

@javfg Whenever you have some time, could you let me know if I've done anything wrong here?

javfg added 5 commits June 19, 2025 16:55

chore: clean up and format etl.conf

a5cb85c

feat: dev runs

c519df7

chore: move manifest stuff into differ

889bb5c

chore: decomission clusters faster

abf650b

chore: set local instance as default in make

851336d
