Transformers should include concepts that fail to normalize #417
Comments
MOA does not have internal identifiers for therapy, genes, or diseases. In cases where normalization succeeds, we use `f"moa.{norm_resp.{therapy|disease|gene}.id}"` (examples: …) I'm not sure what we want the …

    import re

    def _sanitize_name(name: str) -> str:
        """Trim leading and trailing whitespace and replace whitespace characters with
        underscores

        :param name: Name to sanitize
        :return: Sanitized string with whitespace characters replaced by underscores
        """
        return re.sub(r"\s+", "_", name.strip())

    id_ = f"moa.{therapy|disease|gene}:{_sanitize_name(name)}"

@ahwagner is this okay? Would you like something different? We require an …
Seems like a reasonable approach to me. 👍
…427) close #417
* This adds support for transforming concepts (such as gene, variant, disease, and therapy) that fail to normalize.
* Since this work is mainly needed for ClinVar Submission work, this is *only* focused on adding support in CDM files generated by the Transformers. The database will NOT load statements that fail to normalize concepts, and therefore the QueryHandler is unable to search on these non-normalizable concepts.
* Concepts that fail to normalize will have the following `Extension`: `{"name": "vicc_normalizer_failure", "value": True}`
* Since MOA does not have record IDs for some objects, the IDs used in these cases will be `f"moa.{therapy|disease|gene}:{name}"`, where name is the MOA label for the concept where whitespace characters are replaced with underscores.
Closed by #427.
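The ID and extension conventions described in the commit message above can be sketched roughly as follows. This is only an illustration, not the code merged in #427: `build_fallback_concept` and the dictionary layout are hypothetical, and only the `moa.{therapy|disease|gene}:{name}` ID pattern, the underscore replacement, and the `vicc_normalizer_failure` extension come from the thread.

```python
import re


def _sanitize_name(name: str) -> str:
    """Trim surrounding whitespace and replace whitespace runs with underscores."""
    return re.sub(r"\s+", "_", name.strip())


def build_fallback_concept(concept_type: str, label: str) -> dict:
    """Build a minimal record for a MOA concept that failed to normalize.

    Hypothetical helper: the function name and dict layout are assumptions.
    """
    if concept_type not in {"therapy", "disease", "gene"}:
        raise ValueError(f"Unsupported concept type: {concept_type}")
    return {
        # ID pattern from the thread: moa.{therapy|disease|gene}:{name}
        "id": f"moa.{concept_type}:{_sanitize_name(label)}",
        "label": label,
        # Extension from the thread that flags the normalization failure
        "extensions": [{"name": "vicc_normalizer_failure", "value": True}],
    }


concept = build_fallback_concept("disease", "  Acute Myeloid  Leukemia ")
print(concept["id"])  # moa.disease:Acute_Myeloid_Leukemia
```

Keeping the original label next to the sanitized ID means the MOA name is still recoverable even though the ID collapses whitespace.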
Currently, we only include concepts that succeed in VICC normalization. However, in the CDMs we also want to be able to include concepts that fail to normalize. For these cases, we'll simply add an extension (`{"name": "vicc_normalizer_failure", "value": True}`).
I'm not really sure how we want to handle this in /search... @mcannon068nw may have some guidance. We can create a separate issue for this. For now, we'll skip loading concepts in the DB that have this extension.
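As a rough sketch of the "skip loading" step, assuming concepts are plain dictionaries carrying an `extensions` list: the helper names `failed_normalization` and `loadable_concepts` are hypothetical and do not reflect the project's actual loader.

```python
FAILURE_EXTENSION = "vicc_normalizer_failure"


def failed_normalization(concept: dict) -> bool:
    """Return True if the concept carries the normalizer-failure extension."""
    return any(
        ext.get("name") == FAILURE_EXTENSION and ext.get("value") is True
        for ext in concept.get("extensions", [])
    )


def loadable_concepts(concepts: list[dict]) -> list[dict]:
    """Keep only the concepts that should be loaded into the database."""
    return [c for c in concepts if not failed_normalization(c)]


concepts = [
    {"id": "example:normalized", "extensions": []},
    {
        "id": "moa.therapy:Some_Therapy",
        "extensions": [{"name": FAILURE_EXTENSION, "value": True}],
    },
]
print([c["id"] for c in loadable_concepts(concepts)])  # ['example:normalized']
```

Filtering at load time leaves the CDM files complete while the database, and therefore /search, only sees normalized concepts.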