@@ -6,7 +6,7 @@ Read lines from a gzip-compressed file
 
 The following snippet is an example of using CodecZlib.jl, which exports
 `GzipDecompressorStream{S}` as an alias of
-`TranscodingStream{GzipDecompressor,S} where S<:IO`:
+`TranscodingStream{GzipDecompressor,S}`, where `S` is a subtype of `IO`:
 ```julia
 using CodecZlib
 stream = GzipDecompressorStream(open("data.txt.gz"))
 close(stream)
 ```
 
-Note that the last `close` call will close the file as well. Alternatively,
-`open(<stream type>, <filepath>) do ... end` syntax will close the file at the
-end:
+Note that the last `close` call closes the wrapped file as well.
+Alternatively, `open(<stream type>, <filepath>) do ... end` syntax closes the
+file at the end:
 ```julia
 using CodecZlib
 open(GzipDecompressorStream, "data.txt.gz") do stream
@@ -32,11 +32,11 @@ Read compressed data from a pipe
 --------------------------------
 
 The input is not limited to usual files. You can read data from a pipe
-(actually, any `IO` object that implements basic I/O methods) as follows:
+(actually, any `IO` object that implements standard I/O methods) as follows:
 ```julia
 using CodecZlib
-pipe, proc = open(`cat some.data.gz`)
-stream = GzipDecompressorStream(pipe)
+proc = open(`cat some.data.gz`)
+stream = GzipDecompressorStream(proc)
 for line in eachline(stream)
     # do something...
 end
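The revised pipe example works because, in Julia 1.x, `open` on a `Cmd` returns a single `Base.Process` object, and a `Process` is itself an `IO` whose reads come from the command's standard output; older (pre-1.0) Julia versions returned a `(stream, process)` tuple instead. The following Base-only sketch, with no codec involved, checks this behavior; it assumes a Unix-like system where the `echo` command is available:

```julia
# Base-only sketch: no compression codec is involved.
# Assumes a Unix-like system with `echo` on the PATH.
proc = open(`echo hello`)        # a Base.Process opened for reading
lines = collect(eachline(proc))  # read the command's stdout line by line
ok = success(proc)               # wait for the command to exit; true if status is 0
```

Since `proc` is an ordinary `IO`, the same object can be passed to `GzipDecompressorStream` exactly as in the diff above.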
@@ -50,15 +50,17 @@ Writing compressed data is easy. One thing you need to keep in mind is to call
 `close` after writing data; otherwise, the output file will be incomplete:
 ```julia
 using CodecZstd
+using DelimitedFiles
 mat = randn(100, 100)
 stream = ZstdCompressorStream(open("data.mat.zst", "w"))
 writedlm(stream, mat)
 close(stream)
 ```
 
-Of course, `open(<stream type>, ...) do ... end` works well:
+Of course, `open(<stream type>, ...) do ... end` just works:
 ```julia
 using CodecZstd
+using DelimitedFiles
 mat = randn(100, 100)
 open(ZstdCompressorStream, "data.mat.zst", "w") do stream
     writedlm(stream, mat)
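The added `using DelimitedFiles` lines are needed because, since Julia 0.7, `writedlm` is provided by the `DelimitedFiles` standard library rather than by `Base`. Because `writedlm` accepts any writable `IO`, it can target a `ZstdCompressorStream` just as well as a file. A minimal, compression-free round trip through an in-memory `IOBuffer` illustrates the same call:

```julia
using DelimitedFiles

mat = [1.0 2.0; 3.0 4.0]
io = IOBuffer()
writedlm(io, mat)                  # tab-delimited text, one row per line
text = String(take!(io))           # grab everything written so far
back = readdlm(IOBuffer(text))     # parse the text back into a numeric matrix
```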
@@ -69,10 +71,11 @@ Explicitly finish transcoding by writing `TOKEN_END`
 ----------------------------------------------------
 
 When writing data, the end of a data stream is indicated by calling `close`,
-which may write an epilogue if necessary and flush all buffered data to the
-underlying I/O stream. If you want to explicitly specify the end position of a
-stream for some reason, you can write `TranscodingStreams.TOKEN_END` to the
-transcoding stream as follows:
+which writes an epilogue if necessary and flushes all buffered data to the
+underlying I/O stream. If you want to explicitly specify the end of a data
+chunk for some reason, you can write `TranscodingStreams.TOKEN_END` to the
+transcoding stream, which finishes the current transcoding process without
+closing the underlying stream:
 ```julia
 using CodecZstd
 using TranscodingStreams
@@ -87,34 +90,35 @@ close(stream)
 Use a noop codec
 ----------------
 
-Sometimes, the `Noop` codec, which does nothing, may be useful. The following
-example creates a decompressor stream based on the extension of a filepath:
+The `Noop` codec does nothing but buffer data, passing it through without
+transformation. `NoopStream` is an alias of `TranscodingStream{Noop}`. The
+following example creates a decompressor stream based on a filepath's extension:
 ```julia
 using CodecZlib
-using CodecBzip2
+using CodecXz
 using TranscodingStreams
 
 function makestream(filepath)
     if endswith(filepath, ".gz")
         codec = GzipDecompressor()
-    elseif endswith(filepath, ".bz2")
-        codec = Bzip2Decompressor()
+    elseif endswith(filepath, ".xz")
+        codec = XzDecompressor()
     else
         codec = Noop()
     end
     return TranscodingStream(codec, open(filepath))
 end
 
 makestream("data.txt.gz")
-makestream("data.txt.bz2")
+makestream("data.txt.xz")
 makestream("data.txt")
 ```
 
 Change the codec of a file
 --------------------------
 
 `TranscodingStream`s are composable: a stream can be an input/output of another
-stream. You can use this to chage the codec of a file by composing different
+stream. You can use this to change the format of a file by composing different
 codecs as below:
 ```julia
 using CodecZlib
@@ -135,11 +139,13 @@ Effectively, this is equivalent to the following pipeline:
135139Stop decoding on the end of a block
136140-----------------------------------
137141
138- Most codecs support decoding concatenated data blocks. For example, if you
139- concatenate two gzip files into a file and read it using
140- ` GzipDecompressorStream ` , you will see the byte stream of concatenation of two
141- files. If you need the first part of the file, you can set ` stop_on_end ` to
142- ` true ` to stop transcoding at the end of the first block:
142+ Many codecs support decoding concatenated data blocks (or chunks). For example,
143+ if you concatenate two gzip files into a single file and read it using
144+ ` GzipDecompressorStream ` , you will see the byte stream of concatenation of the
145+ two files. If you need the part corresponding the first file, you can set
146+ ` stop_on_end ` to ` true ` to stop transcoding at the end of the first block.
147+ Note that setting ` stop_on_end ` to ` true ` does not close the wrapped stream
148+ because you will often want to reuse it.
143149``` julia
144150using CodecZlib
145151# cat foo.txt.gz bar.txt.gz > foobar.txt.gz
@@ -150,8 +156,8 @@ eof(stream) #> true
 
 In the case where you need to reuse the wrapped stream, the code above must be
 slightly modified because the transcoding stream may read more bytes than
-necessary from the wrapped stream. By wrapping a stream with `NoopStream`, the
-problem of overreading is resolved:
+necessary from the wrapped stream. Wrapping the stream with `NoopStream` solves
+the problem because adjacent transcoding streams share the same buffer:
 ```julia
 using CodecZlib
 using TranscodingStreams
@@ -170,9 +176,9 @@ error:
 using CodecZlib
 
 function decompress(input, output)
-    buffer = Vector{UInt8}(16 * 1024)
+    buffer = Vector{UInt8}(undef, 16 * 1024)
     while !eof(input)
-        n = min(nb_available(input), length(buffer))
+        n = min(bytesavailable(input), length(buffer))
         unsafe_read(input, pointer(buffer), n)
         unsafe_write(output, pointer(buffer), n)
         stats = TranscodingStreams.stats(input)
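Both edits in this hunk track Julia 0.7/1.0 API changes: an uninitialized array is now requested explicitly with `Vector{UInt8}(undef, n)`, and `nb_available` was renamed to `bytesavailable`. The same chunked-copy loop can be exercised with `Base` alone by reading from an `IOBuffer` instead of a decompressor stream. In this sketch, `copychunks` is a hypothetical helper name, and `GC.@preserve` is added to keep `buffer` alive while raw pointers to it are in use:

```julia
# Hypothetical helper: copy `input` to `output` in chunks of at most 16 KiB.
function copychunks(input::IO, output::IO)
    buffer = Vector{UInt8}(undef, 16 * 1024)   # uninitialized scratch buffer
    while !eof(input)
        # never read more than is already available or than fits in `buffer`
        n = min(bytesavailable(input), length(buffer))
        GC.@preserve buffer begin
            unsafe_read(input, pointer(buffer), n)
            unsafe_write(output, pointer(buffer), n)
        end
    end
    return output
end
```

Feeding it more than 16 KiB of data forces several iterations, which is where the `min(...)` bound matters.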
@@ -207,11 +213,17 @@ Transcode lots of strings
 `transcode(<codec type>, data)` method is convenient but suboptimal when
 transcoding a number of objects. This is because the method reallocates a new
 codec object for every call. Instead, you can use `transcode(<codec object>,
-data)` method that reuses the allocated object as follows:
+data)` method that reuses the allocated object as follows. In this usage, you
+need to explicitly allocate and free resources by calling
+`TranscodingStreams.initialize` and `TranscodingStreams.finalize`,
+respectively.
+
 ```julia
 using CodecZstd
+using TranscodingStreams
 strings = ["foo", "bar", "baz"]
 codec = ZstdCompressor()
+TranscodingStreams.initialize(codec)  # allocate resources
 try
     for s in strings
         data = transcode(codec, s)
 catch
     rethrow()
 finally
-    CodecZstd.TranscodingStreams.finalize(codec)
+    TranscodingStreams.finalize(codec)  # free resources
 end
 ```
 
@@ -240,9 +252,9 @@ data2 = read(stream, 8)
 @assert data1 == data2
 ```
 
-The unread operaion is different from the write operation in that the unreaded
+The unread operation is different from the write operation in that the unread
 data are not written to the wrapped stream. The unreaded data are stored in the
 internal buffer of a transcoding stream.
 
 Unfortunately, *unwrite* operation is not provided because there is no way to
-cancel write operations that are already commited to the wrapped stream.
+cancel write operations that are already committed to the wrapped stream.