-
Notifications
You must be signed in to change notification settings - Fork 4
fixes after deploy 03.25 #100
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
qp_klp/Assays.py
Outdated
@@ -540,6 +567,8 @@ def execute_pipeline(self): | |||
# (run-id, project_qid, other) and cleave | |||
# the qiita-id from the project_name. | |||
qid = _file.split('.')[1].split('_')[-1] | |||
if not qid.isnumeric(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we know the structure of the filename sufficiently to express this as a regex?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure about regex and if adding that will actually help. Anyway, for clarity, as the comment says the file name is: run-id, project_qid, other - for example: 20231009_FS10001773_73_BTR67718-2515.U19_ADRC_14736.1.tsv so we could do something like '{self.pipeline.run_id}.{project_qid}.{extra}' where we are looking for the digits after '_', in this case '14736'. Does this help?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be this, right? Which would allow for matching 5 or 6 digit study IDs to future proof for a while?
>>> x = "20231009_FS10001773_73_BTR67718-2515.U19_ADRC_14736.1.tsv"
>>> re.compile(r"(?P<runid>[a-zA-Z0-9_-]+)\.(?P<qname>[a-zA-Z0-9_]+)(?P<qid>[0-9]{5,6})\..\.tsv").match(x).groups()
('20231009_FS10001773_73_BTR67718-2515', 'U19_ADRC_', '14736')
>>>
If we know the pattern, then it seems reasonable to formally describe the pattern as presumably anything which does not fit is erroneous. If we do not know the pattern, then it suggests we should enforce control upstream to minimize inference and favor explicit behavior
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sounds reasonable but before making the change, how do you think we should handle errors, like in this example:
>>> x = 'XX'
>>> re.compile(r"(?P<runid>[a-zA-Z0-9_-]+)\.(?P<qname>[a-zA-Z0-9_]+)(?P<qid>[0-9]{5,6})\..\.tsv").match(x).groups()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'groups'
just a try/except-pass?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just store the result of match
in a variable and test whether it is None
?
Thank you for the review @wasade ! |
No description provided.