
Transformers should include concepts that fail to normalize #417

Closed
korikuzma opened this issue Jan 20, 2025 · 3 comments
Labels
enhancement New feature or request priority:medium Medium priority

Comments

@korikuzma
Member

Currently, we only include concepts that succeed in VICC normalization. However, in the CDMs we also want to be able to include concepts that fail to normalize. For these cases, we'll simply add an extension (`{"name": "vicc_normalizer_failure", "value": True}`).
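A minimal sketch of that rule, assuming concepts are modeled as plain dicts shaped loosely like a `MappableConcept` (the actual transformers use model classes). `mark_normalizer_failure` is a hypothetical helper name, not part of the project:

```python
from typing import Any

# Extension proposed in this issue for concepts that fail VICC normalization
NORMALIZER_FAILURE_EXTENSION = {"name": "vicc_normalizer_failure", "value": True}


def mark_normalizer_failure(concept: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of ``concept`` with the normalizer-failure extension appended.

    Hypothetical helper: concepts are plain dicts here, not model objects.
    """
    marked = dict(concept)
    # Copy any existing extensions so the caller's concept is not mutated
    extensions = list(marked.get("extensions") or [])
    extensions.append(dict(NORMALIZER_FAILURE_EXTENSION))
    marked["extensions"] = extensions
    return marked


# Placeholder concept for illustration only
example = {"id": "moa.disease:Example_disease", "name": "Example disease"}
print(mark_normalizer_failure(example)["extensions"])
```

Copying the extensions list keeps the helper side-effect free, so the same input concept can safely be reused elsewhere in a transformer run.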

I'm not really sure how we want to handle this in /search... @mcannon068nw may have some guidance. We can create a separate issue for this. For now, we'll skip loading concepts in the DB that have this extension.
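Skipping these concepts at load time could look like the following sketch. It assumes plain-dict concepts, and both function names are hypothetical; the real database loader may filter at a different stage (e.g. per statement rather than per concept):

```python
from typing import Any


def has_normalizer_failure(concept: dict[str, Any]) -> bool:
    """Return True if the concept carries the vicc_normalizer_failure extension."""
    return any(
        ext.get("name") == "vicc_normalizer_failure" and ext.get("value") is True
        for ext in concept.get("extensions") or []
    )


def concepts_to_load(concepts: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep only concepts that normalized successfully (hypothetical DB-load filter)."""
    return [c for c in concepts if not has_normalizer_failure(c)]


concepts = [
    {"id": "moa.normalize.gene.hgnc:427"},
    {
        "id": "moa.gene:Example_gene",  # placeholder for a non-normalizable gene
        "extensions": [{"name": "vicc_normalizer_failure", "value": True}],
    },
]
print([c["id"] for c in concepts_to_load(concepts)])  # ['moa.normalize.gene.hgnc:427']
```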

@korikuzma korikuzma added enhancement New feature or request priority:medium Medium priority labels Jan 20, 2025
@korikuzma korikuzma self-assigned this Jan 20, 2025
@korikuzma
Member Author

MOA does not have internal identifiers for therapies, genes, or diseases.

In cases where normalization succeeds, we use `f"moa.{norm_resp.{therapy|disease|gene}.id}"` for the `MappableConcept.id` (examples: `moa.normalize.therapy.rxcui:1727455`, `moa.normalize.disease.ncit:C2926`, `moa.normalize.gene.hgnc:427`).
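To make the `norm_resp.{therapy|disease|gene}.id` pattern concrete, here is a sketch that selects the attribute by concept type and prefixes `moa.`. The normalizer response is faked with `SimpleNamespace` purely for illustration, and `normalized_moa_id` and its `concept_type` argument are hypothetical:

```python
from types import SimpleNamespace


def normalized_moa_id(norm_resp: object, concept_type: str) -> str:
    """Build the MappableConcept.id for a concept that normalized successfully.

    ``concept_type`` is "therapy", "disease", or "gene" and selects the matching
    attribute on the normalizer response (e.g. ``norm_resp.therapy``).
    """
    if concept_type not in {"therapy", "disease", "gene"}:
        raise ValueError(f"Unsupported concept type: {concept_type}")
    normalized_concept = getattr(norm_resp, concept_type)
    return f"moa.{normalized_concept.id}"


# Stand-in for a therapy normalizer response (illustrative only)
resp = SimpleNamespace(therapy=SimpleNamespace(id="normalize.therapy.rxcui:1727455"))
print(normalized_moa_id(resp, "therapy"))  # moa.normalize.therapy.rxcui:1727455
```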

I'm not sure what we want the id to be for concepts that fail normalization. Currently, I'm just doing:

import re


def _sanitize_name(name: str) -> str:
    """Trim leading and trailing whitespace and replace whitespace characters with
    underscores.

    :param name: Name to sanitize
    :return: Sanitized string with whitespace characters replaced by underscores
    """
    return re.sub(r"\s+", "_", name.strip())


# concept_type is "therapy", "disease", or "gene"
id_ = f"moa.{concept_type}:{_sanitize_name(name)}"

@ahwagner is this okay? Would you like something different? We require an id for these concepts in the database.

@ahwagner
Member

Seems like a reasonable approach to me. 👍

korikuzma added a commit that referenced this issue Feb 21, 2025
…427)

close #417 

* This adds support for transforming concepts (such as gene, variant,
disease, and therapy) that fail to normalize.
* Since this work is mainly needed for ClinVar Submission work, this is
*only* focused on adding support in CDM files generated by the
Transformers. The database will NOT load statements whose concepts fail to
normalize, and therefore the QueryHandler is unable to search
on these non-normalizable concepts.
* Concepts that fail to normalize will have the following `Extension`:
`{"name": "vicc_normalizer_failure", "value": True}`
* Since MOA does not have record IDs for some objects, the IDs used in
these cases will be `f"moa.{therapy|disease|gene}:{name}"`, where `name`
is the MOA label for the concept with leading and trailing whitespace
trimmed and each run of whitespace replaced with an underscore.

Closed by #427.

korikuzma added a commit that referenced this issue Mar 9, 2025
…427)