Scanner reliability #325

Open
2 tasks
mwhamgenomics opened this issue Mar 22, 2018 · 1 comment

Comments

@mwhamgenomics
Collaborator

At the moment, we sometimes see two datasets kicked off at the same time, probably because of lag in the dataset scan. We should:

  • Add some logging to the scanner so we can see everything being run
    • This should be set up independently of the (dataset-specific) pipeline logging
  • Decouple the scan from the pipeline processing so there is always a single scanning process
    • This would mean the core pipeline is run with something like analysis_driver --sample <sample_id>
    • The status manipulation (reset, resume, etc.) could be moved somewhere else
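To illustrate the first two points, here is a minimal sketch of a scanner that has its own log file and refuses to run if another scan is already in progress. The logger name, the file paths, and the `scan_once` helper are all hypothetical and not part of the current code. It also assumes the lock file sits on a filesystem where `flock` is honoured, which may not hold for every shared storage setup.

```python
import fcntl
import logging
import os
import tempfile


def get_scanner_logger(log_file):
    """Return a logger used only by the scanner, separate from pipeline logs."""
    logger = logging.getLogger('scanner')  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep scanner messages out of other handlers
    if not logger.handlers:
        handler = logging.FileHandler(log_file)
        handler.setFormatter(
            logging.Formatter('[%(asctime)s][pid %(process)d] %(message)s')
        )
        logger.addHandler(handler)
    return logger


def acquire_scan_lock(lock_file):
    """Try to take an exclusive, non-blocking lock.

    Returns the open file object (keep it open for the duration of the scan),
    or None if another scan already holds the lock.
    """
    handle = open(lock_file, 'w')
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        handle.close()
        return None
    return handle


def scan_once(lock_file, log_file, scan):
    """Run scan() only if no other scan is running. Returns True if it ran."""
    logger = get_scanner_logger(log_file)
    lock = acquire_scan_lock(lock_file)
    if lock is None:
        logger.info('Another scan is already running, skipping')
        return False
    try:
        logger.info('Scan started')
        scan()
        logger.info('Scan finished')
        return True
    finally:
        lock.close()  # closing the file releases the lock
```

With something like this in place, the scan could be launched from a single cron entry or service, and each dataset it finds could be handed to a separate per-sample invocation of the pipeline.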
@tcezard

tcezard commented Mar 24, 2018

It seems that the scanner is not the only part that could be slow.
When the Lustre storage is busy, Python can take a long time to load. This means that the scanners could all end up starting at the same time. We should also explore:

  • Reducing the number of dependencies of the analysis driver (I've often seen it stuck loading pandas).
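As a rough sketch of how this could be explored: measure how long each import takes, and move heavy imports into the functions that need them so that start-up does not pay for them. `timed_import` and `load_metrics_table` are hypothetical names for illustration only, and the sketch assumes pandas is only needed on some code paths.

```python
import importlib
import time


def timed_import(module_name):
    """Import a module by name and return it with the time the import took."""
    start = time.perf_counter()
    module = importlib.import_module(module_name)
    return module, time.perf_counter() - start


def load_metrics_table(path):
    """Hypothetical helper: pandas is imported only when this is called."""
    import pandas as pd  # deferred import, not paid for at start-up
    return pd.read_csv(path)


if __name__ == '__main__':
    # 'json' is just a stand-in; swap in the real dependencies to compare them
    _, elapsed = timed_import('json')
    print('json imported in %.4f s' % elapsed)
```

On Python 3.7 and later, `python -X importtime` gives a per-module breakdown without any code changes, which may be a quicker way to find the slowest dependencies.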


2 participants