WIP: production hotfixes #98

Open · wants to merge 1 commit into base: main
9 changes: 2 additions & 7 deletions qp_klp/Assays.py
@@ -253,7 +253,7 @@ def generate_prep_file(self):
seqpro_path,
config['modules_to_load'],
self.master_qiita_job_id,
join(self.pipeline.output_path, 'ConvertJob'),
self.reports_path,
is_amplicon=True)

if 'GenPrepFileJob' not in self.skip_steps:
@@ -417,11 +417,6 @@ def generate_reports(self):
def generate_prep_file(self):
config = self.pipeline.get_software_configuration('seqpro')

if 'ConvertJob' in self.raw_fastq_files_path:
reports_dir = join(self.pipeline.output_path, 'ConvertJob')
elif 'TRIntegrateJob' in self.raw_fastq_files_path:
reports_dir = join(self.pipeline.output_path, 'SeqCountsJob')

job = GenPrepFileJob(self.pipeline.run_dir,
self.raw_fastq_files_path,
join(self.pipeline.output_path, 'NuQCJob'),
@@ -430,7 +425,7 @@ def generate_prep_file(self):
config['seqpro_path'],
config['modules_to_load'],
self.master_qiita_job_id,
reports_dir)
self.reports_path)

if 'GenPrepFileJob' not in self.skip_steps:
job.run(callback=self.job_callback)
13 changes: 13 additions & 0 deletions qp_klp/Protocol.py
@@ -77,6 +77,14 @@ def get_config(command):
if 'ConvertJob' not in self.skip_steps:
job.run(callback=self.job_callback)

# if successful, set self.reports_path
self.reports_path = join(self.pipeline.output_path,
'ConvertJob',
'Reports',
'Demultiplex_Stats.csv')
# TODO: Include alternative path when using bcl2fastq instead of
# bcl-convert.

# audit the results to determine which samples failed to convert
# properly. Append these to the failed-samples report and also
# return the list directly to the caller.
@@ -157,6 +165,11 @@ def generate_sequence_counts(self):
if 'SeqCountsJob' not in self.skip_steps:
job.run(callback=self.job_callback)

# if successful, set self.reports_path
self.reports_path = join(self.pipeline.output_path,
'SeqCountsJob',
'SeqCounts.csv')
Contributor:

Is this file consistent in schema with Demultiplex_Stats.csv used in the other object for reports_path?

Contributor (author):

No. Demultiplex_Stats.csv's schema is defined and maintained by Illumina and carries additional stats beyond sequence counts. SeqCounts.csv is defined and maintained by us and just records, per sample, the total raw-read counts for forward and reverse reads plus the lane number.

Consistency in this case is not an issue, since the consumer of the metadata is the metapool module, and it only uses Demultiplex_Stats.csv for its raw-read counts. The equivalent bcl2fastq output file is similarly different. Our new method is more accurate in that it counts both the forward and reverse reads, while I believe Demultiplex_Stats.csv just doubles the forward count.
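To illustrate the difference described in this thread, here is a minimal sketch of extracting per-sample raw-read counts from each report. The column names for SeqCounts.csv are assumptions based on the description above, and the Demultiplex_Stats.csv columns are abbreviated; neither is the authoritative file format.

```python
import csv
import io

# Hypothetical SeqCounts.csv: forward and reverse raw-read counts plus lane
# number per sample, as described above. Column names are illustrative.
seq_counts_csv = """Sample_ID,Lane,Forward_Reads,Reverse_Reads
sampleA,1,1000,1000
sampleB,1,1500,1490
"""

# Abbreviated Demultiplex_Stats.csv-style content (Illumina's schema has
# more columns); '# Reads' is a single per-sample figure.
demux_stats_csv = """SampleID,# Reads
sampleA,1000
"""


def raw_read_counts_from_seq_counts(text):
    """Total raw reads per sample = forward count + reverse count."""
    totals = {}
    for row in csv.DictReader(io.StringIO(text)):
        totals[row['Sample_ID']] = (int(row['Forward_Reads'])
                                    + int(row['Reverse_Reads']))
    return totals


def raw_read_counts_from_demux_stats(text):
    """Approximate forward + reverse by doubling the single '# Reads'
    figure, per the comment above."""
    totals = {}
    for row in csv.DictReader(io.StringIO(text)):
        totals[row['SampleID']] = 2 * int(row['# Reads'])
    return totals


print(raw_read_counts_from_seq_counts(seq_counts_csv))
# {'sampleA': 2000, 'sampleB': 2990}
print(raw_read_counts_from_demux_stats(demux_stats_csv))
# {'sampleA': 2000}
```

Note how the doubled figure can only ever be even, while summing forward and reverse counts captures the true total (2990 for sampleB here).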

Contributor:

Isn't it risky, and unexpected, for an attribute common across subclasses to have a different interpretation?
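One way to reduce that risk would be to document the shared attribute as an explicit contract on the common base class. The following is a sketch under assumed class names, not code from this PR:

```python
from abc import ABC, abstractmethod


class BaseProtocol(ABC):
    """Hypothetical common base; the real hierarchy in qp_klp may differ."""

    @property
    @abstractmethod
    def reports_path(self):
        """Path to the per-sample raw-read-count report produced by this
        protocol's conversion step. Subclasses may point this at files
        with different schemas (e.g. Demultiplex_Stats.csv vs
        SeqCounts.csv), so consumers must not assume a single format."""


class IlluminaProtocol(BaseProtocol):
    def __init__(self, output_path):
        # mirrors the path assembled in the diff above
        self._reports_path = (f"{output_path}/ConvertJob/Reports/"
                              f"Demultiplex_Stats.csv")

    @property
    def reports_path(self):
        return self._reports_path
```

Making the attribute abstract forces every subclass to provide it, and the docstring is the single place that warns consumers the schema varies.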


# Do not add an entry to fsr because w/respect to counting, either
# all jobs are going to fail or none are going to fail. It's not
# likely that we're going to fail to count sequences for only some
2 changes: 1 addition & 1 deletion qp_klp/StandardAmpliconWorkflow.py
@@ -35,7 +35,7 @@ def __init__(self, **kwargs):
# NB: Amplicon workflows don't have failed samples records because
# the fastq files are not demultiplexed.

self.master_qiita_job_id = None
self.master_qiita_job_id = self.kwargs['job_id']

self.lane_number = self.kwargs['lane_number']
self.is_restart = bool(self.kwargs['is_restart'])
40 changes: 10 additions & 30 deletions qp_klp/StandardMetagenomicWorkflow.py
@@ -35,7 +35,7 @@ def __init__(self, **kwargs):
self.fsr = FailedSamplesRecord(self.kwargs['output_dir'],
self.pipeline.sample_sheet.samples)

self.master_qiita_job_id = None
self.master_qiita_job_id = self.kwargs['job_id']

self.lane_number = self.kwargs['lane_number']
self.is_restart = bool(self.kwargs['is_restart'])
@@ -73,45 +73,25 @@ def execute_pipeline(self):
Executes steps of pipeline in proper sequence.
:return: None
'''
if not self.is_restart:
self.pre_check()
self.pre_check()

# this is performed even in the event of a restart.
self.generate_special_map()

# even if a job is being skipped, it's being skipped because it was
# determined that it already completed successfully. Hence,
# increment the status because we are still iterating through them.

self.update_status("Converting data", 1, 9)
if "ConvertJob" not in self.skip_steps:
# converting raw data to fastq depends heavily on the instrument
# used to generate the run_directory. Hence this method is
# supplied by the instrument mixin.
# NB: convert_raw_to_fastq() now generates fsr on its own.
self.convert_raw_to_fastq()

self.convert_raw_to_fastq()

self.update_status("Performing quality control", 2, 9)
if "NuQCJob" not in self.skip_steps:
# quality_control generates its own fsr now
self.quality_control(self.pipeline)

self.quality_control()

self.update_status("Generating reports", 3, 9)
if "FastQCJob" not in self.skip_steps:
# reports are currently implemented by the assay mixin. This is
# only because metagenomic runs currently require a failed-samples
# report to be generated. This is not done for amplicon runs since
# demultiplexing occurs downstream of SPP.
results = self.generate_reports()
self.fsr_write(results, 'FastQCJob')

self.generate_reports()

self.update_status("Generating preps", 4, 9)
if "GenPrepFileJob" not in self.skip_steps:
# preps are currently associated with array mixin, but only
# because there are currently some slight differences in how
# FastQCJob gets instantiated(). This could get moved into a
# shared method, but probably still in Assay.
self.generate_prep_file()

self.generate_prep_file()

# moved final component of genprepfilejob outside of object.
# obtain the paths to the prep-files generated by GenPrepFileJob
40 changes: 10 additions & 30 deletions qp_klp/StandardMetatranscriptomicWorkflow.py
@@ -36,7 +36,7 @@ def __init__(self, **kwargs):
self.fsr = FailedSamplesRecord(self.kwargs['output_dir'],
self.pipeline.sample_sheet.samples)

self.master_qiita_job_id = None
self.master_qiita_job_id = self.kwargs['job_id']

self.lane_number = self.kwargs['lane_number']
self.is_restart = bool(self.kwargs['is_restart'])
@@ -74,45 +74,25 @@ def execute_pipeline(self):
Executes steps of pipeline in proper sequence.
:return: None
'''
if not self.is_restart:
self.pre_check()
self.pre_check()

# this is performed even in the event of a restart.
self.generate_special_map()

# even if a job is being skipped, it's being skipped because it was
# determined that it already completed successfully. Hence,
# increment the status because we are still iterating through them.

self.update_status("Converting data", 1, 9)
if "ConvertJob" not in self.skip_steps:
# converting raw data to fastq depends heavily on the instrument
# used to generate the run_directory. Hence this method is
# supplied by the instrument mixin.
# NB: convert_raw_to_fastq() now generates fsr on its own
results = self.convert_raw_to_fastq()

self.convert_raw_to_fastq()

self.update_status("Performing quality control", 2, 9)
if "NuQCJob" not in self.skip_steps:
# NB: quality_control generates its own fsr
self.quality_control(self.pipeline)

self.quality_control()

self.update_status("Generating reports", 3, 9)
if "FastQCJob" not in self.skip_steps:
# reports are currently implemented by the assay mixin. This is
# only because metatranscriptomic runs currently require a failed-
# samples report to be generated. This is not done for amplicon
# runs since demultiplexing occurs downstream of SPP.
results = self.generate_reports()
self.fsr_write(results, 'FastQCJob')

self.generate_reports()

self.update_status("Generating preps", 4, 9)
if "GenPrepFileJob" not in self.skip_steps:
# preps are currently associated with array mixin, but only
# because there are currently some slight differences in how
# FastQCJob gets instantiated(). This could get moved into a
# shared method, but probably still in Assay.
self.generate_prep_file()

self.generate_prep_file()
Contributor:

It seems like the stuff in this method is the same as what's in the StandardMetagenomicWorkflow, can execute_pipeline be pushed to a base class?

Contributor (author):

Off the top of my head, I could push much of both into the MetaOmics base class; those two child classes are very similar and currently differ only in the string constants we define. More than happy to include that in my next push.
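A sketch of what hoisting the shared method might look like. The method and attribute names mirror the diff above, but the base-class name and the claim that nothing else differs are assumptions, not the actual refactor:

```python
class MetaOmics:
    """Hypothetical shared base for the metagenomic and
    metatranscriptomic workflows; subclasses supply the step
    implementations and their class-specific constants."""

    def execute_pipeline(self):
        '''Executes steps of pipeline in proper sequence.'''
        self.pre_check()

        # this is performed even in the event of a restart.
        self.generate_special_map()

        self.update_status("Converting data", 1, 9)
        self.convert_raw_to_fastq()

        self.update_status("Performing quality control", 2, 9)
        self.quality_control()

        self.update_status("Generating reports", 3, 9)
        self.generate_reports()

        self.update_status("Generating preps", 4, 9)
        self.generate_prep_file()


class StandardMetagenomicWorkflow(MetaOmics):
    # only __init__ and class-specific constants would remain here
    pass
```

With the skip-step checks now handled inside the individual step methods (per this PR), the duplicated `execute_pipeline` bodies collapse to the same sequence of calls, so a single base-class copy suffices.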

Contributor:

Seems like a good idea?


# moved final component of genprepfilejob outside of object.
# obtain the paths to the prep-files generated by GenPrepFileJob
2 changes: 1 addition & 1 deletion qp_klp/TellseqMetagenomicWorkflow.py
@@ -41,7 +41,7 @@ def __init__(self, **kwargs):

self.iseq_run = True if type == 'iSeq' else False

self.master_qiita_job_id = None
self.master_qiita_job_id = self.kwargs['job_id']

self.lane_number = self.kwargs['lane_number']
self.is_restart = bool(self.kwargs['is_restart'])