
rm duplicated execute_pipeline #104


Open · wants to merge 3 commits into base: main

Conversation

@antgonza (Member) commented Apr 3, 2025

No description provided.


@AmandaBirmingham left a comment


Some requests for comments but no requests for changes :) Thank you for hunting down such duplications and excising them ... I hope that someday soon we will be able to do the same with MetaOmic.execute_pipeline too!

def execute_pipeline(self):
    '''
    Executes steps of pipeline in proper sequence.
    :return: None
    '''
    self.pre_check()
    self.generate_special_map()
    self.update_status("Converting data", 1, 9)
    self.convert_raw_to_fastq()
    self.integrate_results()
    self.generate_sequence_counts()
    self.update_status("Performing quality control", 2, 9)
    self.quality_control()
    self.update_status("Generating reports", 3, 9)
    self.generate_reports()
    self.update_status("Generating preps", 4, 9)
    self.generate_prep_file()
    # moved final component of genprepfilejob outside of object.
    # obtain the paths to the prep-files generated by GenPrepFileJob
    # w/out having to recover full state.
    tmp = join(self.pipeline.output_path, 'GenPrepFileJob', 'PrepFiles')
    self.has_replicates = False
    prep_paths = []
    self.prep_file_paths = {}
    rematch = re.compile(
        r"(?P<runid>[a-zA-Z0-9_-]+)\.(?P<qname>[a-zA-Z0-9_]+)"
        r"(?P<qid>[0-9]{5,6})\..\.tsv")
    for root, dirs, files in walk(tmp):
        for _file in files:
            # breakup the prep-info-file into segments
            # (run-id, project_qid, other) and cleave
            # the qiita-id from the project_name.
            rer = rematch.match(_file)
            if rer is None:
                continue
            _, _, qid = rer.groups()
            if qid not in self.prep_file_paths:
                self.prep_file_paths[qid] = []
            _path = abspath(join(root, _file))
            if _path.endswith('.tsv'):
                prep_paths.append(_path)
                self.prep_file_paths[qid].append(_path)
        for _dir in dirs:
            if _dir == '1':
                # if PrepFiles contains the '1' directory, then it's a
                # given that this sample-sheet contains replicates.
                self.has_replicates = True
    # currently imported from Assay although it is a base method. it
    # could be imported into Workflows potentially, since it is a post-
    # processing step. All pairings of assay and instrument type need to
    # generate prep-info files in the same format.
    self.overwrite_prep_files(prep_paths)
    # for now, simply re-run any line below as if it was a new job, even
    # for a restart. functionality is idempotent, except for the
    # registration of new preps in Qiita. These will simply be removed
    # manually.
    # post-processing steps are by default associated with the Workflow
    # class, since they deal with fastq files and Qiita, and don't depend
    # on assay or instrument type.
    self.update_status("Generating sample information", 5, 9)
    self.sifs = self.generate_sifs()
    # post-processing step.
    self.update_status("Registering blanks in Qiita", 6, 9)
    if self.update:
        self.update_blanks_in_qiita()
    self.update_status("Loading preps into Qiita", 7, 9)
    if self.update:
        self.update_prep_templates()
        # before we load preps into Qiita we need to copy the fastq
        # files n times for n preps and correct the file-paths each
        # prep is pointing to.
        self.load_preps_into_qiita()
    self.fsr.generate_report()
    self.update_status("Generating packaging commands", 8, 9)
    self.generate_commands()
    self.update_status("Packaging results", 9, 9)
    if self.update:
        self.execute_commands()
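As a side note, the prep-file name parsing above is easy to sanity-check in isolation. The pattern below is copied from the method; the filenames fed to it are invented for illustration:

```python
import re

# Same pattern as in execute_pipeline: a run-id, a project name
# ("qname"), a 5-6 digit Qiita id, then a one-character segment
# and the ".tsv" extension.
rematch = re.compile(
    r"(?P<runid>[a-zA-Z0-9_-]+)\.(?P<qname>[a-zA-Z0-9_]+)"
    r"(?P<qid>[0-9]{5,6})\..\.tsv")

# Hypothetical filename: run-id "20250403_run1", project "MyProject",
# Qiita id "12345", prep segment "1".
rer = rematch.match("20250403_run1.MyProject12345.1.tsv")
print(rer.groups())  # ('20250403_run1', 'MyProject', '12345')

# A name without a trailing 5-6 digit Qiita id does not match,
# so execute_pipeline's `continue` branch would skip it.
print(rematch.match("README.tsv"))  # None
```

Note that the greedy `qname` group backtracks just far enough to leave the 5-6 digit `qid`, which is how the qiita-id gets cleaved from the project name.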

qp_klp/Assays.py Outdated
if "NuQCJob" not in self.skip_steps:
    # there is no failed samples reporting for amplicon runs.
    self.post_process_raw_fastq_output()
    self.integrate_results()


It looks like, in the previous implementation of Assays.py's execute_pipeline method, the integrate_results() and generate_sequence_counts() calls were made while the status/step was set to "Converting data". Has it been decided that they fit more accurately under the "Performing quality control" step?

# supplied by the instrument mixin.
self.convert_raw_to_fastq()

self.update_status("Performing quality control", 2, 9)


Is it possible to add a bit of additional commenting for this section? My understanding is that this PR is merging the previous Amplicon execute_pipeline method with the StandardAmpliconWorkflow execute_pipeline method, both of which are for (duh) Amplicon processing, right? But now we are moving the method up into the Assay class, which is the parent of both Amplicon and MetaOmic. I assume I am not reading the codebase right, because it looks to me like the post_process_raw_fastq_output() method only exists in Amplicon and the quality_control method only exists in MetaOmic. What is going to happen when an Amplicon assay invokes this method and gets to the quality_control() call, but doesn't have a mixin providing that method?
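For what it's worth, the failure mode this question describes is easy to reproduce in isolation. This is a hypothetical sketch (class and method names invented to mirror the question, not taken from qp_klp): a base-class method that calls `self.quality_control()` fails only at call time, and only for subclasses whose mixins don't supply it:

```python
class AssayBase:
    def execute_pipeline(self):
        # the base class calls a method it does not define itself;
        # each subclass is expected to provide it via a mixin.
        self.quality_control()

class QCMixin:
    def quality_control(self):
        print("running quality control")

class MetaOmicLike(QCMixin, AssayBase):
    pass  # quality_control resolved through the mixin via the MRO

class AmpliconLike(AssayBase):
    pass  # no mixin providing quality_control

MetaOmicLike().execute_pipeline()  # prints "running quality control"
try:
    AmpliconLike().execute_pipeline()
except AttributeError:
    # raised only when the missing method is actually invoked
    print("no quality_control available")
```

So the merged method would import fine and only blow up when an Amplicon-style subclass actually reaches the `quality_control()` call, which is why the question about mixin coverage matters.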

if "GenPrepFileJob" not in self.skip_steps:
    # preps are currently associated with array mixin, but only
    # because there are currently some slight differences in how
    # FastQCJob gets instantiated(). This could get moved into a


A little confused by the mention of FastQCJob here since I thought we handled that at line 178 ... is this some sort of downstream consequence of that stuff?

    # shared method, but probably still in Assay.
    self.generate_prep_file()
else:
    # if GenPrepFileJob is not run, then we need to find the


Hmm, this is a pretty large change in logic from both the previous versions of this code, right? In both the previous versions, this big block of code that populates the public self.prep_file_paths property was being performed in all cases, not just when GenPrepFileJob was being skipped or generate_prep_file wasn't being called. What has changed that makes it no longer necessary to explicitly populate this property every time?
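As context for this question: the property in dispute is just a qiita-id to prep-file-paths mapping built from the filenames. A minimal standalone sketch of that grouping (filenames invented, no filesystem walk, and using `defaultdict` instead of the explicit key check):

```python
import re
from collections import defaultdict

# pattern copied from the method under review
rematch = re.compile(
    r"(?P<runid>[a-zA-Z0-9_-]+)\.(?P<qname>[a-zA-Z0-9_]+)"
    r"(?P<qid>[0-9]{5,6})\..\.tsv")

# hypothetical prep-file names for two Qiita studies
files = ["run1.ProjA11111.1.tsv", "run1.ProjB22222.1.tsv",
         "run1.ProjA11111.2.tsv", "notes.txt"]

prep_file_paths = defaultdict(list)
for _file in files:
    rer = rematch.match(_file)
    if rer is None:
        continue  # non-matching files are skipped
    prep_file_paths[rer.group('qid')].append(_file)

print(dict(prep_file_paths))
# {'11111': ['run1.ProjA11111.1.tsv', 'run1.ProjA11111.2.tsv'],
#  '22222': ['run1.ProjB22222.1.tsv']}
```

Which is why it seems important to confirm whether this mapping still gets populated on the code path where `generate_prep_file()` does run.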

# prep is pointing to.
self.load_preps_into_qiita()

self.fsr.generate_report()


Could we toss this line a comment, too? I see that it was in Amplicon.execute_pipeline but not in StandardAmpliconWorkflow.execute_pipeline, so now I'm curious about why ... :D
