feat: add batch search by variations #361
variations: list[str] | None = Query(  # noqa: B008
    None, description=_batch_search_studies_descr["arg_variations"]
Is there a reason why we can't do below?
- variations: list[str] | None = Query(  # noqa: B008
-     None, description=_batch_search_studies_descr["arg_variations"]
+ variations: list[str] = Query(  # noqa: B008
+     [], description=_batch_search_studies_descr["arg_variations"]
Does the FastAPI Query method shield from the usual mutable defaults problem?
Oh, apparently this is the "old way", anyway
Because this method could be expanded to include other kinds of search terms,
``variations`` is optionally nullable.
Same comment as above. Can we have it default to empty list?
We definitely can't use mutable defaults here, right?
I think if it's documented well that variations will never be mutated, then we could, but it may not follow best practices. We can leave as is.
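For readers following the mutable-defaults question in this thread, here is a minimal plain-Python sketch of the usual pitfall: a default value is evaluated once, when the function is defined, so a list default is shared across calls. The function names are hypothetical and not from this PR, and the sketch says nothing about how FastAPI's Query handles its own default.

```python
from __future__ import annotations


def collect_shared(term: str, seen: list[str] = []) -> list[str]:
    """The default list is created once and reused by every call."""
    seen.append(term)
    return seen


def collect_safe(term: str, seen: list[str] | None = None) -> list[str]:
    """A None sentinel lets each call build its own fresh list."""
    if seen is None:
        seen = []
    seen.append(term)
    return seen


# Two calls that rely on the default end up sharing one list object.
first = collect_shared("a")
second = collect_shared("b")
shared_result = list(second)

# With the None sentinel, each call gets an independent list.
safe_first = collect_safe("a")
safe_second = collect_safe("b")
```

If the function never mutates the argument, a shared default is harmless in practice, which is the trade-off raised above.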
Co-authored-by: Kori Kuzma
* PR suggestions were applied, but forgot to update tests
close #290
The current use case is variations only (e.g. in the context of a VCF), but the objects/API are structured in such a way as to (relatively) smoothly accommodate other kinds of terms*.
Very naively implemented. Tune-ups in #311 should include fetching everything in a single query rather than iteratively fetching each study once the study IDs are acquired. I also wonder whether there's anything else we can do cache-wise to optimize for cases where a user performs repeated lookups on things that are probably closely located in the genome, such as VCFs. (seqrepo already uses a big LRU cache for sequence lookups, so that might be all that's necessary.)
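As a rough illustration of the caching idea, repeated lookups of the same term could be memoized with the standard library's functools.lru_cache. This is only a sketch: normalize_variation, its body, and the input strings are hypothetical stand-ins, not the project's actual normalizer or data.

```python
from __future__ import annotations

from functools import lru_cache

# Counter used only to show how often the slow path actually runs.
calls = {"count": 0}


@lru_cache(maxsize=1024)
def normalize_variation(term: str) -> str | None:
    """Hypothetical stand-in for an expensive normalization lookup."""
    calls["count"] += 1
    cleaned = term.strip()
    return f"normalized:{cleaned}" if cleaned else None


# Illustrative inputs: one repeated term and one blank term that fails.
terms = ["NC_000007.13:g.140453136A>T", "NC_000007.13:g.140453136A>T", " "]
results = [normalize_variation(t) for t in terms]
```

The repeated term is served from the cache on its second lookup, so the slow path runs once per distinct input.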
The ability to submit multiple terms for the same kind of entity raises questions about transparent management of redundancy, failed lookups, etc. I added an extra field to supply the normalized ID, if available, for each term, so that the client can tell if terms are normalizing to the same ID or if they fail to normalize at all:
This also makes it possible to understand which studies correspond to each search term.
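A client might use the per-term normalized ID along these lines. This is a sketch under assumptions: the QueryResult class and its field names (term, normalized_id) are hypothetical, and only the idea of pairing each term with an optional normalized ID comes from the description above.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class QueryResult:
    """Hypothetical shape of one entry in response.queries."""

    term: str
    normalized_id: str | None  # None when the term failed to normalize


def summarize(queries: list[QueryResult]) -> tuple[dict[str, list[str]], list[str]]:
    """Return (IDs shared by more than one term, terms that failed to normalize)."""
    by_id: dict[str, list[str]] = defaultdict(list)
    failed: list[str] = []
    for query in queries:
        if query.normalized_id is None:
            failed.append(query.term)
        else:
            by_id[query.normalized_id].append(query.term)
    redundant = {nid: terms for nid, terms in by_id.items() if len(terms) > 1}
    return redundant, failed


# Placeholder terms and IDs, purely for illustration.
queries = [
    QueryResult("term-a", "id:1"),
    QueryResult("term-a-alias", "id:1"),
    QueryResult("term-b", "id:2"),
    QueryResult("not-a-variation", None),
]
redundant, failed = summarize(queries)
```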
get_search_studies -- response.studies is just a list of studies. If you wanted to figure out which ones came from a given search term, you could take the normalized ID from response.queries and then filter through response.studies based on the value in study.variant.definingContext.id (for ProteinSequenceConsequences, or something else for CategoricalVariations). Alternatively, you could group response.studies into a dict where the key is the normalized ID and the value is the list of studies for that ID. However, that becomes VERY messy if you do want to support searching by multiple entity types (what would the key be, a concatenated string or something?).
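Both client-side options could be sketched as follows, using plain dicts that mirror the study.variant.definingContext.id path. The study contents and IDs are placeholders, the helper names are hypothetical, and only the definingContext case is handled (CategoricalVariations would need a different path).

```python
from __future__ import annotations

from collections import defaultdict


def defining_context_id(study: dict) -> str | None:
    """Read study.variant.definingContext.id, or None if any level is missing."""
    return study.get("variant", {}).get("definingContext", {}).get("id")


def studies_for_id(studies: list[dict], normalized_id: str) -> list[dict]:
    """Option 1: filter the flat list of studies by one normalized ID."""
    return [s for s in studies if defining_context_id(s) == normalized_id]


def group_by_id(studies: list[dict]) -> dict[str, list[dict]]:
    """Option 2: build a dict mapping normalized ID to its studies."""
    grouped: dict[str, list[dict]] = defaultdict(list)
    for study in studies:
        key = defining_context_id(study)
        if key is not None:
            grouped[key].append(study)
    return dict(grouped)


# Placeholder studies, purely for illustration.
studies = [
    {"id": "study-1", "variant": {"definingContext": {"id": "id:1"}}},
    {"id": "study-2", "variant": {"definingContext": {"id": "id:2"}}},
    {"id": "study-3", "variant": {"definingContext": {"id": "id:1"}}},
]
matches = [s["id"] for s in studies_for_id(studies, "id:1")]
grouped_ids = {k: [s["id"] for s in v] for k, v in group_by_id(studies).items()}
```

The grouped form works while there is a single entity type; with several, the key would have to encode the type as well, which is the messiness noted above.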