multi-chunk gzip files are truncated at end of first block #61
Comments
Here's a small example file with multiple chunks: small.fastq.gz
sorry, dup of #41. closing.
Ah yeah this is the same as #41, but it's common enough that it seems prudent to add something to process these sorts of streams to the library directly. I wouldn't mind adding a helper alongside the existing ones!
What would this helper look like? Apparently it's not #43, because that's been sitting there for quite a while.
@veldsla oh my I'm so sorry! That must have fallen out of my inbox by accident, I had no idea that PR was open! I'll review promptly
'Block' gzip files (see page 10 here: http://samtools.github.io/hts-specs/SAMv1.pdf) are commonly used in bioinformatics. They are valid gzip files, but use the somewhat rare scheme of storing the data in many consecutive gzip blocks. System gunzip or zcat will yield all the data in all blocks; however, flate2::GzDecoder will only yield data from the first block. I believe the correct behavior is to continue reading the stream to find another block until EOF.