
interval index dag #120


Open: xyg123 wants to merge 3 commits into dev


xyg123 commented Mar 27, 2025

This PR adds the step config for running the interval step, which builds the interval index needed to generate interval features for L2G.

It processes the interval files from their raw format, then identifies the interval regions that overlap our variant index.
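The overlap step can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the gentropy implementation: the tuple layouts, the function name overlap_intervals_with_variants and the inclusive coordinate convention are all hypothetical. Variants are grouped and sorted per chromosome, so each interval is resolved with two binary searches instead of a scan over the whole variant index.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record shapes (not gentropy's schema):
#   interval: (chromosome, start, end, gene_id), inclusive coordinates
#   variant:  (chromosome, position, variant_id)
Interval = Tuple[str, int, int, str]
Variant = Tuple[str, int, str]


def overlap_intervals_with_variants(
    intervals: Iterable[Interval], variants: Iterable[Variant]
) -> List[Tuple[str, str]]:
    """Return (variant_id, gene_id) pairs for variants lying inside an interval."""
    # Group variants by chromosome and sort them by position.
    by_chrom: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for chrom, pos, variant_id in variants:
        by_chrom[chrom].append((pos, variant_id))
    positions: Dict[str, List[int]] = {}
    for chrom, entries in by_chrom.items():
        entries.sort()
        positions[chrom] = [pos for pos, _ in entries]

    pairs: List[Tuple[str, str]] = []
    for chrom, start, end, gene_id in intervals:
        pos_list = positions.get(chrom)
        if not pos_list:
            continue  # no variants on this chromosome
        # Binary search for the slice of variants with start <= pos <= end.
        lo = bisect_left(pos_list, start)
        hi = bisect_right(pos_list, end)
        for _, variant_id in by_chrom[chrom][lo:hi]:
            pairs.append((variant_id, gene_id))
    return pairs


intervals = [("1", 100, 200, "GENE_A"), ("1", 150, 300, "GENE_B"), ("2", 10, 20, "GENE_C")]
variants = [("1", 120, "1_120_A_G"), ("1", 250, "1_250_C_T"), ("2", 5, "2_5_G_A")]
print(overlap_intervals_with_variants(intervals, variants))
```

At the data volumes discussed in this PR the same matching logic would typically run as a distributed range join (for example in Spark); the sketch only shows which variant/interval pairs are kept.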

xyg123 requested a review from project-defiant (and removed a review request for Copilot) on March 27, 2025 13:43.
project-defiant (Collaborator) left a comment

This PR will require additional changes to gentropy.yaml and pis.yaml:

  • Ideally, we want PIS to move the raw interval sources to $release_bucket/input/interval. Have a look at pis.yaml.
  • Once the raw intervals are in input/intervals, we want to refer to them from gentropy.yaml in a similar way to how you defined them in genetics_etl.yaml.

Note

genetics_etl.yaml is going to be superseded by gentropy.yaml once we define the development workflow for the orchestration and allow running a subset of the unified pipeline DAG.

project-defiant (Collaborator) commented Apr 1, 2025

@xyg123 can you post in the PR the link to the Google Cloud Dataproc job that executed the intervalStep, along with the other steps that depend on it?

xyg123 (Author) commented Apr 4, 2025

Successful run of the interval step to generate the interval index; it took ~10 hours.

project-defiant (Collaborator) commented

@xyg123 we need to make some strategic decisions here, because we cannot have a step that takes ~10 hours to compute in every release.

3 participants