multi-chunk gzip files are truncated at end of first block #61

Closed
pmarks opened this issue Feb 1, 2017 · 5 comments

Comments

@pmarks

pmarks commented Feb 1, 2017

"Block" gzip files (BGZF; see page 10 of http://samtools.github.io/hts-specs/SAMv1.pdf) are commonly used in bioinformatics. They are valid gzip files, but they use the somewhat rare scheme of storing the data as many consecutive gzip members (blocks). System gunzip or zcat yields the data from all blocks, whereas flate2's GzDecoder yields only the data from the first block. I believe the correct behavior is to keep reading the stream and decode each subsequent block until EOF.
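To make the block structure concrete: each BGZF block is a complete gzip member whose header sets FLG.FEXTRA and carries an extra subfield with SI1='B', SI2='C' holding BSIZE, the total block size minus 1 (per the BGZF section of the SAM spec linked above). The Rust sketch below walks a byte buffer block by block using only that header field. For each block it reports the offset, the compressed length, and the uncompressed length from the ISIZE trailer. It does not decompress anything, and error handling is deliberately minimal. The names BgzfBlock, bgzf_blocks and BGZF_EOF are hypothetical, chosen for illustration. A decoder that stops after the first member would return only the first entry's uncompressed_len bytes of output. (Later flate2 releases provide MultiGzDecoder for multi-member streams; this sketch sticks to std so it runs without the crate.)

```rust
/// One BGZF block (a complete gzip member) located inside a byte buffer.
#[derive(Debug, PartialEq)]
pub struct BgzfBlock {
    pub offset: usize,           // byte offset of the block within the buffer
    pub block_len: usize,        // BSIZE + 1: total compressed size of the member
    pub uncompressed_len: u32,   // ISIZE trailer: uncompressed length of this block
}

/// The 28-byte empty block the SAM spec places at the end of a BGZF file.
pub const BGZF_EOF: [u8; 28] = [
    0x1f, 0x8b, 0x08, 0x04, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0x06, 0x00, 0x42, 0x43,
    0x02, 0x00, 0x1b, 0x00, 0x03, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
];

/// Reads a little-endian u16 at index i and widens it to usize.
fn le_u16(b: &[u8], i: usize) -> usize {
    u16::from_le_bytes([b[i], b[i + 1]]) as usize
}

/// Hypothetical helper: splits concatenated BGZF blocks using each header's BSIZE.
pub fn bgzf_blocks(data: &[u8]) -> Result<Vec<BgzfBlock>, String> {
    let mut blocks = Vec::new();
    let mut pos = 0;
    while pos < data.len() {
        let rest = &data[pos..];
        // 10-byte fixed gzip header plus the 2-byte XLEN field.
        if rest.len() < 12 {
            return Err(format!("truncated header at offset {pos}"));
        }
        // ID1/ID2 magic, CM = 8 (deflate), and FLG.FEXTRA (0x04) must be set.
        if rest[0] != 0x1f || rest[1] != 0x8b || rest[2] != 8 || (rest[3] & 0x04) == 0 {
            return Err(format!("not a BGZF block at offset {pos}"));
        }
        let xlen = le_u16(rest, 10);
        if rest.len() < 12 + xlen {
            return Err(format!("truncated extra field at offset {pos}"));
        }
        // Scan the extra subfields (SI1, SI2, SLEN, data) for SI1='B', SI2='C'.
        let extra = &rest[12..12 + xlen];
        let mut bsize = None;
        let mut i = 0;
        while i + 4 <= extra.len() {
            let slen = le_u16(extra, i + 2);
            if extra[i] == b'B' && extra[i + 1] == b'C' && slen == 2 && i + 6 <= extra.len() {
                bsize = Some(le_u16(extra, i + 4));
            }
            i += 4 + slen;
        }
        let block_len = bsize.ok_or_else(|| format!("no BC subfield at offset {pos}"))? + 1;
        // The member must hold its header plus the 8-byte CRC32/ISIZE trailer.
        if block_len < 12 + xlen + 8 || rest.len() < block_len {
            return Err(format!("truncated block at offset {pos}"));
        }
        let t = block_len - 4; // ISIZE is the last four bytes of the member
        let uncompressed_len = u32::from_le_bytes([rest[t], rest[t + 1], rest[t + 2], rest[t + 3]]);
        blocks.push(BgzfBlock { offset: pos, block_len, uncompressed_len });
        pos += block_len; // continue with the next member until the buffer is exhausted
    }
    Ok(blocks)
}
```

For example, bgzf_blocks(&BGZF_EOF) returns a single block of length 28 with uncompressed_len 0, because the spec's end-of-file marker is itself a 28-byte gzip member; two copies back to back yield two blocks, the second at offset 28.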

@pmarks
Author

pmarks commented Feb 1, 2017

Here's a small example file with multiple chunks: small.fastq.gz

@pmarks
Author

pmarks commented Feb 1, 2017

Sorry, dup of #41. Closing.

pmarks closed this as completed Feb 1, 2017
@alexcrichton
Member

Ah yeah this is the same as #41, but it's common enough that it seems prudent to add something to process these sorts of streams to the library directly. I wouldn't mind adding a helper alongside the existing ones!

@veldsla
Contributor

veldsla commented Feb 2, 2017

What would this helper look like? Apparently it's not #43, because that's been sitting there for quite a while.

@alexcrichton
Member

@veldsla oh my, I'm so sorry! That must have fallen out of my inbox by accident; I had no idea that PR was open! I'll review it promptly.
