Skip to content

test full runs in WorkflowFactory.py #131

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 34 commits into from
May 6, 2025
Merged

Conversation

antgonza
Copy link
Member

@antgonza antgonza commented May 1, 2025

No description provided.

@antgonza antgonza changed the title test full runs in WorkflowFactory.py [WIP] test full runs in WorkflowFactory.py May 1, 2025
@@ -14,6 +14,19 @@ INSERT INTO
study_abstract
)
VALUES
(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amplicon actually inserts preps to study 1 (exists as default in qiita) and 2, which doesn't exist.

@antgonza
Copy link
Member Author

antgonza commented May 5, 2025

@AmandaBirmingham, this is ready for review. This supersedes #104. This version is actually running all the pipelines to the end: submitting new preps to Qiita; which is nice and should help for future improvements; this additionally means that now we know all the steps for all pipelines and all of them have the same steps (even if they are just "pass").

@antgonza antgonza requested a review from AmandaBirmingham May 5, 2025 19:23
@antgonza antgonza changed the title [WIP] test full runs in WorkflowFactory.py test full runs in WorkflowFactory.py May 5, 2025
Copy link
Contributor

@AmandaBirmingham AmandaBirmingham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few questions and a typo :)


# Metagenomic is a valid data type in the default qiita test
# database but job-id: 78901 doesn't exist; however, if we get
# to here, it means that all steps have ran to completion
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: should be "have run"

self._inject_data(wf)
# Metatranscriptomic is not a valid data type in the default qiita test
# database; however, if we get to here, it means that all steps have
# ran to completion and the system is trying to create the preps.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same typo as above

# to here, it means that all steps have ran to completion
# and the system is trying to create the preps.
with self.assertRaisesRegex(RuntimeError, 'invalid input '
'syntax for type uuid: "78901"'):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused by the error message and the comment here. What does "invalid input syntax for type uuid " mean? The comment mentions that 78901 is a job id, so why does the error message refer to it as a "type uuid"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Qiita job ids are UUID in the database, this tests uses 78901 as the job id so the database complains about the format. IMOO it is fine to let it fail here as is the last step of the processing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha, got it! For the benefit of non-experts like me reading the tests, would you consider having these repeated comments say essentially what you just did?

        # Metagenomic is a valid data type in the default qiita test
        # database but the test job-id, 78901, isn't in the 
        # format of a uuid as expected; however, if we get
        # to here, it means that all steps have run to completion


# Amplicon is a valid data type in the default qiita test
# database but job-id: 78901 doesn't exist; however, if we get
# to here, it means that all steps have ran to completion
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same typo as above


# Metagenomic is a valid data type in the default qiita test
# database but job-id: 78901 doesn't exist; however, if we get
# to here, it means that all steps have ran to completion
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same typo as above

"""
pass

def quality_control(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, post_process_raw_fastq_output is only for amplicons and quality_control is only for metaomics; they are mutually exclusive, and whichever one the assay in question has gets called in "qc-ing reads" step of execute_pipeline:

self.update_status("QC-ing reads", 2, 9)
if "NuQCJob" not in self.skip_steps:
            self.post_process_raw_fastq_output()
            self.quality_control()

Given this, why have two separate methods? It seems like we could have a def qc_reads() method on Assay with no default contents and we could override it in Amplicon with whatever is currently inAmplicon.post_process_raw_fastq_output and in MetaOmics with whatever is currently in MetaOmics.quality_control. Then execute_pipeline could just say:

self.update_status("QC-ing reads", 2, 9)
if "NuQCJob" not in self.skip_steps:
           self.qc_reads()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch and suggestion, thank you!

if "GenPrepFileJob" not in self.skip_steps:
# preps are currently associated with array mixin, but only
# because there are currently some slight differences in how
# FastQCJob gets instantiated(). This could get moved into a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A little confused by the mention of FastQCJob here since I thought we handled that at line 189 ... is this some sort of downstream consequence of that stuff?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removing, left behind by mistake.

# prep is pointing to.
self.load_preps_into_qiita()

self.fsr.generate_report()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we toss this line a comment, too? I see that it was in MetaOmic.execute_pipeline but not in StandardAmpliconWorkflow.execute_pipeline, so now I'm curious about why ... :D. I also notice that at one point in MetaOmic the code checks if hasattr(self, 'fsr'): before calling self.fsr.<something>() ... do we need to worry that not every assay type or every individual assay instance will have an fsr property?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fsr is an instance of FailedSamplesRecord, which is its own object, and as far as I can tell is used with Job.audit to keep track of the samples lost on each of the steps of the pipeline. Here we call FailedSamplesRecord. generate_report to output the report so it's moved to the final output and the user can get access. Adding this info in the code.

Copy link
Contributor

@AmandaBirmingham AmandaBirmingham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the changes and explanations. One more request for changes--only to comments, though!--and it looks good to me.

# to here, it means that all steps have ran to completion
# and the system is trying to create the preps.
with self.assertRaisesRegex(RuntimeError, 'invalid input '
'syntax for type uuid: "78901"'):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha, got it! For the benefit of non-experts like me reading the tests, would you consider having these repeated comments say essentially what you just did?

        # Metagenomic is a valid data type in the default qiita test
        # database but the test job-id, 78901, isn't in the 
        # format of a uuid as expected; however, if we get
        # to here, it means that all steps have run to completion

@antgonza
Copy link
Member Author

antgonza commented May 6, 2025

Thank you @AmandaBirmingham !

@antgonza antgonza merged commit c0230b9 into qiita-spots:main May 6, 2025
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants