multi-chunk gzip files are truncated at end of first block #61

pmarks · 2017-02-01T22:19:13Z

'Block' gzip files (see page 10 here: http://samtools.github.io/hts-specs/SAMv1.pdf) are commonly used in bioinformatics. They are valid gzip files, but use the somewhat rare scheme of storing the data in many consecutive gzip blocks. System gunzip or zcat will yield all the data in all blocks; however, flate2::GzDecoder will only yield data from the first block. I believe the correct behavior is to continue reading the stream to find another block until EOF.
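To illustrate the difference between the two behaviors, here is a minimal sketch in Python using only the standard library. It is not flate2 code; the function names `first_member_only` and `all_members` and the sample FASTQ-like records are made up for this example. The first function stops after the first gzip block, which is the truncation described above, while the second keeps starting a fresh decoder on the leftover bytes until the input is exhausted.

```python
import gzip
import zlib

# 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
GZIP_WBITS = 16 + zlib.MAX_WBITS


def first_member_only(data: bytes) -> bytes:
    """Decode only the first gzip block; bytes after it are ignored."""
    d = zlib.decompressobj(GZIP_WBITS)
    return d.decompress(data) + d.flush()


def all_members(data: bytes) -> bytes:
    """Decode every consecutive gzip block until the input is exhausted."""
    out = []
    while data:
        d = zlib.decompressobj(GZIP_WBITS)
        out.append(d.decompress(data))
        out.append(d.flush())
        # Bytes past the end of this block are the start of the next one.
        data = d.unused_data
    return b"".join(out)


# Hypothetical sample: two independently compressed blocks, concatenated.
blob = gzip.compress(b"@read1\nACGT\n") + gzip.compress(b"@read2\nTTGA\n")

truncated = first_member_only(blob)  # only the first record
complete = all_members(blob)         # both records, like gunzip/zcat
```

The sketch assumes the input contains nothing but back-to-back gzip blocks; trailing padding or garbage after the last block would raise an error here and would need separate handling.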

pmarks · 2017-02-01T22:24:13Z

Here's a small example file with multiple chunks: small.fastq.gz

pmarks · 2017-02-01T22:46:55Z

sorry, dup of #41. closing.

alexcrichton · 2017-02-02T01:21:38Z

Ah yeah this is the same as #41, but it's common enough that it seems prudent to add something to process these sorts of streams to the library directly. I wouldn't mind adding a helper alongside the existing ones!

veldsla · 2017-02-02T08:20:22Z

What would this helper look like? Apparently it's not #43, because that's been sitting there for quite a while.

alexcrichton · 2017-02-02T16:58:31Z

@veldsla oh my, I'm so sorry! That must have fallen out of my inbox by accident; I had no idea that PR was open! I'll review it promptly.

pmarks closed this as completed Feb 1, 2017
