Scanner reliability #325

mwhamgenomics · 2018-03-22T11:56:15Z

At the moment, we sometimes see two datasets kicked off at the same time, probably because of lag in the dataset scan. We should:

- Add some logging to the scanner so we can see everything being run
  - This should be set up independently of the (dataset-specific) pipeline logging
- Decouple the scan from the pipeline processing so there is always a single scanning process
  - This would mean the core pipeline is run with something like `analysis_driver --sample <sample_id>`
  - The status manipulation (reset, resume, etc.) could be moved somewhere else
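
A rough sketch of what the two points above could look like together, not a proposal for the actual implementation. It assumes a Unix host, and the names `acquire_scanner_lock`, `get_scanner_logger`, `run_scan` and `find_ready_samples`, as well as the lock and log paths, are all hypothetical. `flock` behaviour on networked file systems depends on how they are mounted, so the lock file would probably need to live on local disk.

```python
import fcntl
import logging


def acquire_scanner_lock(lock_path):
    """Return an open file holding an exclusive lock, or None if it is already held."""
    lock_file = open(lock_path, 'a')
    try:
        # Non-blocking: fail immediately if another scanner holds the lock
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        lock_file.close()
        return None
    return lock_file


def get_scanner_logger(log_path):
    """Return a logger that writes only to the scanner's own log file."""
    logger = logging.getLogger('scanner')
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep scanner messages out of any pipeline handlers
    if not logger.handlers:
        handler = logging.FileHandler(log_path)
        handler.setFormatter(
            logging.Formatter('%(asctime)s %(process)d %(levelname)s %(message)s')
        )
        logger.addHandler(handler)
    return logger


def run_scan(lock_path, log_path, find_ready_samples):
    """Run one scan if no other scanner is active; return the samples it found."""
    lock = acquire_scanner_lock(lock_path)
    if lock is None:
        return []  # another scanner is already running
    logger = get_scanner_logger(log_path)
    try:
        samples = list(find_ready_samples())
        for sample_id in samples:
            # The real launch step is left out; this only records the intent
            logger.info('Would launch: analysis_driver --sample %s', sample_id)
        return samples
    finally:
        lock.close()  # closing the file releases the lock
```

The lock is released when the file is closed or the process exits, so a crashed scanner should not leave a stale lock behind.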

tcezard · 2018-03-24T06:52:39Z

It seems that the scanner is not the only part that could be slow.
When the Lustre storage is busy, Python can take a long time to load. This means that the scanners would all start off at the same time. We should also explore:

- Reducing the number of dependencies of analysis driver (I've often seen it stuck loading pandas).
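
One possible way to cut the startup cost, sketched under assumptions: defer heavy imports until the code path that needs them actually runs, so the scan itself never loads them. The helper `lazy_import` and the functions `scan_datasets` and `summarise` are hypothetical, and `json` stands in for a heavy dependency so the example runs anywhere.

```python
import importlib


def lazy_import(name):
    """Return a function that imports `name` on first call and caches the module."""
    module = None

    def load():
        nonlocal module
        if module is None:
            module = importlib.import_module(name)
        return module

    return load


# 'json' is a stand-in here for a slow-to-load dependency
get_heavy = lazy_import('json')


def scan_datasets():
    # Scanning path: does not trigger the heavy import
    return ['dataset_1']


def summarise(rows):
    # Only this path pays the import cost, and only once
    heavy = get_heavy()
    return heavy.dumps(rows)
```

To see which imports are actually slow, `python -X importtime` (Python 3.7 and later) prints per-module import timings to stderr.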

mwhamgenomics added the "new feature" and "enhancement" labels Mar 22, 2018
