Skip to content

fixes after deploy 03.25 #100

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Mar 10, 2025

Conversation

antgonza
Copy link
Member

@antgonza antgonza commented Mar 6, 2025

No description provided.

@antgonza antgonza changed the title fixes after deploy 03.25 WIP: fixes after deploy 03.25 Mar 6, 2025
qp_klp/Assays.py Outdated
@@ -540,6 +567,8 @@ def execute_pipeline(self):
# (run-id, project_qid, other) and cleave
# the qiita-id from the project_name.
qid = _file.split('.')[1].split('_')[-1]
if not qid.isnumeric():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we know the structure of the filename sufficiently to express this as a regex?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about regex and if adding that will actually help. Anyway, for clarity, as the comment says the file name is: run-id, project_qid, other - for example: 20231009_FS10001773_73_BTR67718-2515.U19_ADRC_14736.1.tsv so we could do something like '{self.pipeline.run_id}.{project_qid}.{extra}' where we are looking for the digits after '_', in this case '14736'. Does this help?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be this, right? Which would allow for matching 5 or 6 digit study IDs to future proof for a while?

>>> x = "20231009_FS10001773_73_BTR67718-2515.U19_ADRC_14736.1.tsv"
>>> re.compile(r"(?P<runid>[a-zA-Z0-9_-]+)\.(?P<qname>[a-zA-Z0-9_]+)(?P<qid>[0-9]{5,6})\..\.tsv").match(x).groups()
('20231009_FS10001773_73_BTR67718-2515', 'U19_ADRC_', '14736')
>>>

If we know the pattern, then it seems reasonable to formally describe the pattern as presumably anything which does not fit is erroneous. If we do not know the pattern, then it suggests we should enforce control upstream to minimize inference and favor explicit behavior

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds reasonable but before making the change, how do you think we should handle errors, like in this example:

>>> x = 'XX'
>>> re.compile(r"(?P<runid>[a-zA-Z0-9_-]+)\.(?P<qname>[a-zA-Z0-9_]+)(?P<qid>[0-9]{5,6})\..\.tsv").match(x).groups()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'groups'

just a try/except-pass?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just store the result of match in a variable and test whether it is None?

@antgonza antgonza changed the title WIP: fixes after deploy 03.25 fixes after deploy 03.25 Mar 10, 2025
@antgonza
Copy link
Member Author

Thank you for the review @wasade !

@antgonza antgonza merged commit 2d0f041 into qiita-spots:main Mar 10, 2025
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants