@@ -6,7 +6,7 @@ Read lines from a gzip-compressed file
6
6
7
7
The following snippet is an example of using CodecZlib.jl, which exports
8
8
`GzipDecompressorStream{S}` as an alias of
9
- `TranscodingStream{GzipDecompressor,S} where S<:IO`:
9
+ `TranscodingStream{GzipDecompressor,S}`, where `S` is a subtype of `IO`:
10
10
``` julia
11
11
using CodecZlib
12
12
stream = GzipDecompressorStream(open("data.txt.gz"))
16
16
close(stream)
17
17
```
18
18
19
- Note that the last ` close ` call will close the file as well. Alternatively,
20
- ` open(<stream type>, <filepath>) do ... end ` syntax will close the file at the
21
- end:
19
+ Note that the last `close` call closes the wrapped file as well.
20
+ Alternatively, the `open(<stream type>, <filepath>) do ... end` syntax closes
21
+ the file automatically when the block exits:
22
22
``` julia
23
23
using CodecZlib
24
24
open(GzipDecompressorStream, "data.txt.gz") do stream
@@ -32,11 +32,11 @@ Read compressed data from a pipe
32
32
--------------------------------
33
33
34
34
The input is not limited to regular files. You can read data from a pipe
35
- (actually, any ` IO ` object that implements basic I/O methods) as follows:
35
+ (actually, any ` IO ` object that implements standard I/O methods) as follows:
36
36
``` julia
37
37
using CodecZlib
38
- pipe, proc = open(`cat some.data.gz`)
39
- stream = GzipDecompressorStream(pipe)
38
+ proc = open(`cat some.data.gz`)
39
+ stream = GzipDecompressorStream(proc)
40
40
for line in eachline(stream)
41
41
    # do something...
42
42
end
@@ -50,15 +50,17 @@ Writing compressed data is easy. One thing you need to keep in mind is to call
50
50
` close ` after writing data; otherwise, the output file will be incomplete:
51
51
``` julia
52
52
using CodecZstd
53
+ using DelimitedFiles
53
54
mat = randn(100, 100)
54
55
stream = ZstdCompressorStream(open("data.mat.zst", "w"))
55
56
writedlm(stream, mat)
56
57
close(stream)
57
58
```
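One way to check that the file was finalized properly is to read it back. The following is a minimal sketch, not part of the original example: it assumes `ZstdDecompressorStream` from CodecZstd and `readdlm` from DelimitedFiles, and the temporary `path` and the `restored` variable are hypothetical names chosen for illustration.

```julia
using CodecZstd
using DelimitedFiles

mat = randn(100, 100)
path = joinpath(mktempdir(), "data.mat.zst")  # hypothetical temporary path

# Write the matrix; `close` finishes the compressed data and closes the file.
stream = ZstdCompressorStream(open(path, "w"))
writedlm(stream, mat)
close(stream)

# Read the matrix back through the matching decompressor stream.
restored = open(ZstdDecompressorStream, path) do stream
    readdlm(stream)
end
```

If `close` were skipped, the read-back step would be expected to fail or return truncated data, which is why the round trip is a useful sanity check.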
58
59
59
- Of course, ` open(<stream type>, ...) do ... end ` works well :
60
+ Of course, `open(<stream type>, ...) do ... end` just works:
60
61
``` julia
61
62
using CodecZstd
63
+ using DelimitedFiles
62
64
mat = randn(100, 100)
63
65
open(ZstdCompressorStream, "data.mat.zst", "w") do stream
64
66
    writedlm(stream, mat)
@@ -69,10 +71,11 @@ Explicitly finish transcoding by writing `TOKEN_END`
69
71
----------------------------------------------------
70
72
71
73
When writing data, the end of a data stream is indicated by calling ` close ` ,
72
- which may write an epilogue if necessary and flush all buffered data to the
73
- underlying I/O stream. If you want to explicitly specify the end position of a
74
- stream for some reason, you can write ` TranscodingStreams.TOKEN_END ` to the
75
- transcoding stream as follows:
74
+ which writes an epilogue if necessary and flushes all buffered data to the
75
+ underlying I/O stream. If you want to explicitly specify the end of a data
76
+ chunk for some reason, you can write ` TranscodingStreams.TOKEN_END ` to the
77
+ transcoding stream, which finishes the current transcoding process without
78
+ closing the underlying stream:
76
79
``` julia
77
80
using CodecZstd
78
81
using TranscodingStreams
@@ -87,34 +90,35 @@ close(stream)
87
90
Use a noop codec
88
91
----------------
89
92
90
- Sometimes, the ` Noop ` codec, which does nothing, may be useful. The following
91
- example creates a decompressor stream based on the extension of a filepath:
93
+ The `Noop` codec does nothing (i.e., it buffers data without transforming it).
94
+ ` NoopStream ` is an alias of ` TranscodingStream{Noop} ` . The following example
95
+ creates a decompressor stream based on the extension of a filepath:
92
96
``` julia
93
97
using CodecZlib
94
- using CodecBzip2
98
+ using CodecXz
95
99
using TranscodingStreams
96
100
97
101
function makestream(filepath)
98
102
    if endswith(filepath, ".gz")
99
103
        codec = GzipDecompressor()
100
-     elseif endswith(filepath, ".bz2")
101
-         codec = Bzip2Decompressor()
104
+     elseif endswith(filepath, ".xz")
105
+         codec = XzDecompressor()
102
106
    else
103
107
        codec = Noop()
104
108
    end
105
109
    return TranscodingStream(codec, open(filepath))
106
110
end
107
111
108
112
makestream("data.txt.gz")
109
- makestream("data.txt.bz2")
113
+ makestream("data.txt.xz")
110
114
makestream("data.txt")
111
115
```
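To see that `Noop` leaves the data untouched, the sketch below reads a string through `NoopStream` and gets the same text back. It is an illustration rather than part of the original example, and it uses an in-memory `IOBuffer` in place of a file so that it runs without any input data.

```julia
using TranscodingStreams

# `NoopStream` buffers the data but applies no transformation.
stream = NoopStream(IOBuffer("hello, noop"))
text = read(stream, String)
close(stream)
```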
112
116
113
117
Change the codec of a file
114
118
--------------------------
115
119
116
120
` TranscodingStream ` s are composable: a stream can be an input/output of another
117
- stream. You can use this to chage the codec of a file by composing different
121
+ stream. You can use this to change the format of a file by composing different
118
122
codecs as below:
119
123
``` julia
120
124
using CodecZlib
@@ -135,11 +139,13 @@ Effectively, this is equivalent to the following pipeline:
135
139
Stop decoding on the end of a block
136
140
-----------------------------------
137
141
138
- Most codecs support decoding concatenated data blocks. For example, if you
139
- concatenate two gzip files into a file and read it using
140
- ` GzipDecompressorStream ` , you will see the byte stream of concatenation of two
141
- files. If you need the first part of the file, you can set ` stop_on_end ` to
142
- ` true ` to stop transcoding at the end of the first block:
142
+ Many codecs support decoding concatenated data blocks (or chunks). For example,
143
+ if you concatenate two gzip files into a single file and read it using
144
+ `GzipDecompressorStream`, you will see the concatenated byte stream of the
145
+ two files. If you need the part corresponding to the first file, you can set
146
+ ` stop_on_end ` to ` true ` to stop transcoding at the end of the first block.
147
+ Note that setting ` stop_on_end ` to ` true ` does not close the wrapped stream
148
+ because you will often want to reuse it.
143
149
``` julia
144
150
using CodecZlib
145
151
# cat foo.txt.gz bar.txt.gz > foobar.txt.gz
@@ -150,8 +156,8 @@ eof(stream) #> true
150
156
151
157
In the case where you need to reuse the wrapped stream, the code above must be
152
158
slightly modified because the transcoding stream may read more bytes than
153
- necessary from the wrapped stream. By wrapping a stream with ` NoopStream ` , the
154
- problem of overreading is resolved:
159
+ necessary from the wrapped stream. Wrapping the stream with ` NoopStream ` solves
160
+ the problem because adjacent transcoding streams share the same buffer.
155
161
``` julia
156
162
using CodecZlib
157
163
using TranscodingStreams
@@ -170,9 +176,9 @@ error:
170
176
using CodecZlib
171
177
172
178
function decompress(input, output)
173
-     buffer = Vector{UInt8}(16 * 1024)
179
+     buffer = Vector{UInt8}(undef, 16 * 1024)
174
180
    while !eof(input)
175
-         n = min(nb_available(input), length(buffer))
181
+         n = min(bytesavailable(input), length(buffer))
176
182
        unsafe_read(input, pointer(buffer), n)
177
183
        unsafe_write(output, pointer(buffer), n)
178
184
        stats = TranscodingStreams.stats(input)
@@ -207,11 +213,17 @@ Transcode lots of strings
207
213
The `transcode(<codec type>, data)` method is convenient but suboptimal when
208
214
transcoding a number of objects. This is because the method allocates a new
209
215
codec object for every call. Instead, you can use the `transcode(<codec object>,
210
- data)` method that reuses the allocated object as follows:
216
+ data)` method that reuses the allocated object as follows. In this usage, you
217
+ need to explicitly allocate and free resources by calling
218
+ ` TranscodingStreams.initialize ` and ` TranscodingStreams.finalize ` ,
219
+ respectively.
220
+
211
221
``` julia
212
222
using CodecZstd
223
+ using TranscodingStreams
213
224
strings = ["foo", "bar", "baz"]
214
225
codec = ZstdCompressor()
226
+ TranscodingStreams.initialize(codec)  # allocate resources
215
227
try
216
228
    for s in strings
217
229
        data = transcode(codec, s)
220
232
catch
221
233
    rethrow()
222
234
finally
223
-     CodecZstd.TranscodingStreams.finalize(codec)
235
+     TranscodingStreams.finalize(codec)  # free resources
224
236
end
225
237
```
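The same pattern can be applied in both directions. The following is a minimal round-trip sketch under a few assumptions: it pairs the compressor with a `ZstdDecompressor`, converts each string to a `Vector{UInt8}` before transcoding, and the names `compressor`, `decompressor`, and `roundtrip` are hypothetical and chosen only for this illustration.

```julia
using CodecZstd
using TranscodingStreams

strings = ["foo", "bar", "baz"]
compressor = ZstdCompressor()
decompressor = ZstdDecompressor()
TranscodingStreams.initialize(compressor)    # allocate resources
TranscodingStreams.initialize(decompressor)

roundtrip = String[]
try
    for s in strings
        # Reuse both codec objects for every string.
        compressed = transcode(compressor, Vector{UInt8}(s))
        push!(roundtrip, String(transcode(decompressor, compressed)))
    end
finally
    TranscodingStreams.finalize(compressor)  # free resources
    TranscodingStreams.finalize(decompressor)
end
```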
226
238
@@ -240,9 +252,9 @@ data2 = read(stream, 8)
240
252
@assert data1 == data2
241
253
```
242
254
243
- The unread operaion is different from the write operation in that the unreaded
255
+ The unread operation is different from the write operation in that the unread
244
256
data are not written to the wrapped stream. The unread data are stored in the
245
257
internal buffer of a transcoding stream.
246
258
247
259
Unfortunately, an *unwrite* operation is not provided because there is no way to
248
- cancel write operations that are already commited to the wrapped stream.
260
+ cancel write operations that are already committed to the wrapped stream.