Add AlignedSegment.query_qualities_str and fix sequence and qualities caching #1341

Open
wants to merge 8 commits into
base: master

Conversation

jmarshall
Member

On PR #1324 @nh13 requested being able to set AlignedSegment.query_qualities via a SAM/FASTQ-style base quality string. This PR extends that request so that QUAL can be retrieved as a string too.

Add a query_qualities_str property implemented directly against the underlying bam1_t data structure rather than translating to an array first via qualitystring_to_array()/array_to_qualitystring(). Rewrite query_qualities so that it also works directly against the bam1_t, and fix several bugs in its caching layer. (See #121 for why this caching is important.) Allow query_qualities to be set from a string too, by delegating to the new property.
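For readers unfamiliar with the two representations, here is a minimal sketch of the conversions involved. It is not the code in this PR: the helper names are hypothetical, and it assumes only what the SAM/BAM specification states (BAM stores raw Phred values, uses 0xFF bytes when qualities are absent, and SAM/FASTQ text adds an offset of 33). Treating a zero-length record as "no qualities" is an assumption of this sketch.

```python
from array import array

MISSING = 0xFF  # BAM fills QUAL with 0xFF bytes when qualities are absent


def raw_qual_to_str(raw: bytes):
    """Return the Phred+33 ASCII string for raw BAM quality bytes, or None."""
    if len(raw) == 0 or raw[0] == MISSING:
        return None
    return bytes(q + 33 for q in raw).decode("ascii")


def str_to_raw_qual(qual: str) -> bytes:
    """Convert a Phred+33 string back to raw Phred values."""
    return bytes(ord(c) - 33 for c in qual)


def raw_qual_to_array(raw: bytes):
    """Return the raw Phred values as an unsigned-byte array, or None."""
    if len(raw) == 0 or raw[0] == MISSING:
        return None
    return array("B", raw)
```

Going from the raw bytes straight to a string avoids building an intermediate array only to serialise it again.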

Fix some similar caching bugs in query_sequence too.

jmarshall added 8 commits May 3, 2025 21:27
These deprecated properties forward to corresponding query_alignment_XYZ
properties which are all read-only as they are derived from query_XYZ.
Rewrite query_qualities directly, and have query_alignment_qualities
trim that final value rather than computing its own similarly.

Clear caches in __set__ et al and populate them only in __get__ routines.
Previously the input value was cached rather than the canonical one
reconverted from the raw data, and setting to None did not clear the
previously cached value.
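The caching rule described here can be illustrated with a small pure-Python sketch. The Segment class and its attribute names are hypothetical stand-ins, not pysam's AlignedSegment; the point is only that the setter invalidates and the getter repopulates from the raw data.

```python
from array import array


class Segment:
    """Hypothetical stand-in for a record; not pysam's AlignedSegment."""

    def __init__(self):
        self._raw_qual = b""          # stands in for the bam1_t QUAL bytes
        self._cache_qualities = None  # populated lazily, by the getter only

    @property
    def query_qualities(self):
        if not self._raw_qual:
            return None
        if self._cache_qualities is None:
            # Rebuild from the raw data so the cached value is canonical
            self._cache_qualities = array("B", self._raw_qual)
        return self._cache_qualities

    @query_qualities.setter
    def query_qualities(self, value):
        # Invalidate first, so a stale value can never survive a set
        self._cache_qualities = None
        if value is None:
            self._raw_qual = b""
        elif isinstance(value, str):
            self._raw_qual = bytes(ord(c) - 33 for c in value)
        else:
            self._raw_qual = bytes(value)
```

With this shape, setting from a list returns an array on the next get, and setting to None leaves nothing behind in the cache.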

The existing query_qualities property provides QUAL as a Python array
(and in fact has long allowed it to be set from most iterables other
than strings and tuples). This new query_qualities_str property enables
QUAL to be accessed as the usual ASCII-encoded base quality string (and
now query_qualities also allows it to be set from such a string).

Also add (read-only) query_alignment_qualities_str paralleling
query_alignment_qualities similarly.
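As a rough illustration of what "paralleling" means, the aligned portion is the full QUAL value with soft-clipped ends trimmed. The sketch below uses a hypothetical helper and a simplified CIGAR representation (a list of (op letter, length) pairs); it is not the implementation in this PR.

```python
def aligned_slice(qual, cigar):
    """Trim soft-clipped ends from a full-length QUAL string.

    cigar is a list of (op, length) pairs using SAM op letters,
    e.g. [("S", 2), ("M", 5), ("S", 1)].
    """
    if qual is None:
        return None
    # Hard-clipped bases are not present in SEQ/QUAL, so ignore them
    ops = [(op, n) for op, n in cigar if op != "H"]
    start = ops[0][1] if ops and ops[0][0] == "S" else 0
    trailing = ops[-1][1] if len(ops) > 1 and ops[-1][0] == "S" else 0
    return qual[start:len(qual) - trailing]
```

Deriving the aligned value by slicing the already-computed full value keeps the two properties consistent by construction.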

Updating query_sequence did not invalidate cache_query_alignment_sequence
previously. Clear caches in __set__ et al and populate them only in
__get__ routines.
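A sketch of the failure mode and its fix, under the assumption that the aligned sequence is cached separately from the full sequence. The Record class and its clip parameters are hypothetical; only the two cache names echo the ones mentioned above.

```python
class Record:
    """Hypothetical record with a full sequence and a derived aligned slice."""

    def __init__(self, seq="", clip_start=0, clip_end=0):
        self._raw_seq = seq
        self._clip_start = clip_start  # soft-clipped bases at the start
        self._clip_end = clip_end      # soft-clipped bases at the end
        self.cache_query_sequence = None
        self.cache_query_alignment_sequence = None

    @property
    def query_sequence(self):
        if not self._raw_seq:
            return None
        if self.cache_query_sequence is None:
            self.cache_query_sequence = self._raw_seq
        return self.cache_query_sequence

    @query_sequence.setter
    def query_sequence(self, value):
        # Clear every cache derived from the sequence, not just the direct one
        self.cache_query_sequence = None
        self.cache_query_alignment_sequence = None
        self._raw_seq = value or ""

    @property
    def query_alignment_sequence(self):
        full = self.query_sequence
        if full is None:
            return None
        if self.cache_query_alignment_sequence is None:
            end = len(full) - self._clip_end
            self.cache_query_alignment_sequence = full[self._clip_start:end]
        return self.cache_query_alignment_sequence
```

If the setter cleared only cache_query_sequence, a later read of query_alignment_sequence would still return the slice of the old sequence.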