-
Notifications
You must be signed in to change notification settings - Fork 4
test full runs in WorkflowFactory.py #131
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -14,6 +14,19 @@ INSERT INTO | |||
study_abstract | |||
) | |||
VALUES | |||
( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amplicon actually inserts preps to study 1 (exists as default in qiita) and 2, which doesn't exist.
@AmandaBirmingham, this is ready for review. This supersedes #104. This version is actually running all the pipelines to the end: submitting new preps to Qiita; which is nice and should help for future improvements; this additionally means that now we know all the steps for all pipelines and all of them have the same steps (even if they are just "pass"). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few questions and a typo :)
tests/test_WorkflowFactory.py
Outdated
|
||
# Metagenomic is a valid data type in the default qiita test | ||
# database but job-id: 78901 doesn't exist; however, if we get | ||
# to here, it means that all steps have ran to completion |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: should be "have run"
self._inject_data(wf) | ||
# Metatranscriptomic is not a valid data type in the default qiita test | ||
# database; however, if we get to here, it means that all steps have | ||
# ran to completion and the system is trying to create the preps. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same typo as above
# to here, it means that all steps have ran to completion | ||
# and the system is trying to create the preps. | ||
with self.assertRaisesRegex(RuntimeError, 'invalid input ' | ||
'syntax for type uuid: "78901"'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am confused by the error message and the comment here. What does "invalid input syntax for type uuid " mean? The comment mentions that 78901 is a job id, so why does the error message refer to it as a "type uuid"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Qiita job ids are UUID in the database, this tests uses 78901 as the job id so the database complains about the format. IMOO it is fine to let it fail here as is the last step of the processing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aha, got it! For the benefit of non-experts like me reading the tests, would you consider having these repeated comments say essentially what you just did?
# Metagenomic is a valid data type in the default qiita test
# database but the test job-id, 78901, isn't in the
# format of a uuid as expected; however, if we get
# to here, it means that all steps have run to completion
tests/test_WorkflowFactory.py
Outdated
|
||
# Amplicon is a valid data type in the default qiita test | ||
# database but job-id: 78901 doesn't exist; however, if we get | ||
# to here, it means that all steps have ran to completion |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same typo as above
tests/test_WorkflowFactory.py
Outdated
|
||
# Metagenomic is a valid data type in the default qiita test | ||
# database but job-id: 78901 doesn't exist; however, if we get | ||
# to here, it means that all steps have ran to completion |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same typo as above
src/qp_klp/Assays.py
Outdated
""" | ||
pass | ||
|
||
def quality_control(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, post_process_raw_fastq_output
is only for amplicons and quality_control
is only for metaomics; they are mutually exclusive, and whichever one the assay in question has gets called in "qc-ing reads" step of execute_pipeline
:
self.update_status("QC-ing reads", 2, 9)
if "NuQCJob" not in self.skip_steps:
self.post_process_raw_fastq_output()
self.quality_control()
Given this, why have two separate methods? It seems like we could have a def qc_reads()
method on Assay
with no default contents and we could override it in Amplicon
with whatever is currently inAmplicon.post_process_raw_fastq_output
and in MetaOmics
with whatever is currently in MetaOmics.quality_control
. Then execute_pipeline
could just say:
self.update_status("QC-ing reads", 2, 9)
if "NuQCJob" not in self.skip_steps:
self.qc_reads()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great catch and suggestion, thank you!
src/qp_klp/Assays.py
Outdated
if "GenPrepFileJob" not in self.skip_steps: | ||
# preps are currently associated with array mixin, but only | ||
# because there are currently some slight differences in how | ||
# FastQCJob gets instantiated(). This could get moved into a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A little confused by the mention of FastQCJob
here since I thought we handled that at line 189 ... is this some sort of downstream consequence of that stuff?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removing, left behind by mistake.
# prep is pointing to. | ||
self.load_preps_into_qiita() | ||
|
||
self.fsr.generate_report() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we toss this line a comment, too? I see that it was in MetaOmic.execute_pipeline
but not in StandardAmpliconWorkflow.execute_pipeline
, so now I'm curious about why ... :D. I also notice that at one point in MetaOmic
the code checks if hasattr(self, 'fsr'):
before calling self.fsr.<something>()
... do we need to worry that not every assay type or every individual assay instance will have an fsr
property?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fsr is an instance of FailedSamplesRecord, which is its own object, and as far as I can tell is used with Job.audit to keep track of the samples lost on each of the steps of the pipeline. Here we call FailedSamplesRecord. generate_report to output the report so it's moved to the final output and the user can get access. Adding this info in the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the changes and explanations. One more request for changes--only to comments, though!--and it looks good to me.
# to here, it means that all steps have ran to completion | ||
# and the system is trying to create the preps. | ||
with self.assertRaisesRegex(RuntimeError, 'invalid input ' | ||
'syntax for type uuid: "78901"'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aha, got it! For the benefit of non-experts like me reading the tests, would you consider having these repeated comments say essentially what you just did?
# Metagenomic is a valid data type in the default qiita test
# database but the test job-id, 78901, isn't in the
# format of a uuid as expected; however, if we get
# to here, it means that all steps have run to completion
Thank you @AmandaBirmingham ! |
No description provided.