Commit b87d348

authored
update docs (#61)
1 parent 324c990 commit b87d348

8 files changed (+115 -94 lines)

docs/make.jl

+2 -2

@@ -5,12 +5,12 @@ makedocs(
     format=:html,
     sitename="TranscodingStreams.jl",
     modules=[TranscodingStreams],
-    pages=["index.md", "examples.md", "references.md", "devnotes.md"],
+    pages=["index.md", "examples.md", "reference.md", "devnotes.md"],
     assets=["assets/custom.css"])
 
 deploydocs(
     repo="github.com/bicycle1885/TranscodingStreams.jl.git",
-    julia="0.6",
+    julia="0.7",
     target="build",
     deps=nothing,
     make=nothing)

docs/src/assets/custom.css

+1 -12

@@ -1,14 +1,3 @@
-h1 {
-    font-size: 2.0em;
-}
-
-h2 {
-    font-size: 1.8em;
-    margin-top: 40px;
-    border-bottom: 1px solid #eeeeee;
-}
-
 table {
-    width: 125%;
-    font-size: 13px;
+    font-size: 0.8em;
 }

docs/src/devnotes.md

+1 -1

@@ -1,4 +1,4 @@
-Developer's Notes
+Developer's notes
 =================
 
 These notes are not for end users but rather for developers who are interested

docs/src/examples.md

+44 -32

@@ -6,7 +6,7 @@ Read lines from a gzip-compressed file
 
 The following snippet is an example of using CodecZlib.jl, which exports
 `GzipDecompressorStream{S}` as an alias of
-`TranscodingStream{GzipDecompressor,S} where S<:IO`:
+`TranscodingStream{GzipDecompressor,S}`, where `S` is a subtype of `IO`:
 ```julia
 using CodecZlib
 stream = GzipDecompressorStream(open("data.txt.gz"))
@@ -16,9 +16,9 @@ end
 close(stream)
 ```
 
-Note that the last `close` call will close the file as well. Alternatively,
-`open(<stream type>, <filepath>) do ... end` syntax will close the file at the
-end:
+Note that the last `close` call closes the wrapped file as well.
+Alternatively, `open(<stream type>, <filepath>) do ... end` syntax closes the
+file at the end:
 ```julia
 using CodecZlib
 open(GzipDecompressorStream, "data.txt.gz") do stream
@@ -32,11 +32,11 @@ Read compressed data from a pipe
 --------------------------------
 
 The input is not limited to usual files. You can read data from a pipe
-(actually, any `IO` object that implements basic I/O methods) as follows:
+(actually, any `IO` object that implements standard I/O methods) as follows:
 ```julia
 using CodecZlib
-pipe, proc = open(`cat some.data.gz`)
-stream = GzipDecompressorStream(pipe)
+proc = open(`cat some.data.gz`)
+stream = GzipDecompressorStream(proc)
 for line in eachline(stream)
     # do something...
 end
@@ -50,15 +50,17 @@ Writing compressed data is easy. One thing you need to keep in mind is to call
 `close` after writing data; otherwise, the output file will be incomplete:
 ```julia
 using CodecZstd
+using DelimitedFiles
 mat = randn(100, 100)
 stream = ZstdCompressorStream(open("data.mat.zst", "w"))
 writedlm(stream, mat)
 close(stream)
 ```
 
-Of course, `open(<stream type>, ...) do ... end` works well:
+Of course, `open(<stream type>, ...) do ... end` just works:
 ```julia
 using CodecZstd
+using DelimitedFiles
 mat = randn(100, 100)
 open(ZstdCompressorStream, "data.mat.zst", "w") do stream
     writedlm(stream, mat)
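The hunk above warns that output is incomplete unless `close` is called, and points to the `open(<stream type>, ...) do ... end` form as a way to get that for free. A minimal round-trip sketch of the pattern, assuming CodecZlib.jl is installed; it uses gzip rather than zstd and a temporary path rather than `data.mat.zst`, both purely for illustration:

```julia
using CodecZlib  # assumed installed; provides GzipCompressorStream and GzipDecompressorStream

path = tempname() * ".gz"  # illustrative temporary path

# The do-block form closes the stream on exit, which writes the gzip
# epilogue and closes the underlying file.
open(GzipCompressorStream, path, "w") do stream
    write(stream, "foo\nbar\n")
end

# Read the file back through a decompressor stream.
lines = String[]
open(GzipDecompressorStream, path) do stream
    for line in eachline(stream)
        push!(lines, line)
    end
end
rm(path)
```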
@@ -69,10 +71,11 @@ Explicitly finish transcoding by writing `TOKEN_END`
 ----------------------------------------------------
 
 When writing data, the end of a data stream is indicated by calling `close`,
-which may write an epilogue if necessary and flush all buffered data to the
-underlying I/O stream. If you want to explicitly specify the end position of a
-stream for some reason, you can write `TranscodingStreams.TOKEN_END` to the
-transcoding stream as follows:
+which writes an epilogue if necessary and flushes all buffered data to the
+underlying I/O stream. If you want to explicitly specify the end of a data
+chunk for some reason, you can write `TranscodingStreams.TOKEN_END` to the
+transcoding stream, which finishes the current transcoding process without
+closing the underlying stream:
 ```julia
 using CodecZstd
 using TranscodingStreams
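The reworded paragraph says that writing `TOKEN_END` finishes the current transcoding process without closing the underlying stream. A sketch of that behavior with an in-memory buffer, assuming CodecZlib.jl and TranscodingStreams.jl are both available in the environment; gzip is used here only so the example needs a single codec package:

```julia
using CodecZlib           # assumed installed; provides the gzip codecs and streams
using TranscodingStreams  # assumed available; provides TOKEN_END

buf = IOBuffer()
stream = GzipCompressorStream(buf)
write(stream, "hello, world\n")
write(stream, TranscodingStreams.TOKEN_END)  # finish the current gzip member
flush(stream)                                # move buffered output into `buf`
compressed = take!(buf)                      # `buf` is still open at this point
close(stream)                                # this also closes `buf`

text = String(transcode(GzipDecompressor, compressed))
```

Because `take!` runs before `close`, the complete compressed member is captured while the wrapped buffer is still usable.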
@@ -87,34 +90,35 @@ close(stream)
 Use a noop codec
 ----------------
 
-Sometimes, the `Noop` codec, which does nothing, may be useful. The following
-example creates a decompressor stream based on the extension of a filepath:
+The `Noop` codec does nothing (i.e., buffering data without transformation).
+`NoopStream` is an alias of `TranscodingStream{Noop}`. The following example
+creates a decompressor stream based on the extension of a filepath:
 ```julia
 using CodecZlib
-using CodecBzip2
+using CodecXz
 using TranscodingStreams
 
 function makestream(filepath)
     if endswith(filepath, ".gz")
         codec = GzipDecompressor()
-    elseif endswith(filepath, ".bz2")
-        codec = Bzip2Decompressor()
+    elseif endswith(filepath, ".xz")
+        codec = XzDecompressor()
     else
         codec = Noop()
     end
     return TranscodingStream(codec, open(filepath))
 end
 
 makestream("data.txt.gz")
-makestream("data.txt.bz2")
+makestream("data.txt.xz")
 makestream("data.txt")
 ```
 
 Change the codec of a file
 --------------------------
 
 `TranscodingStream`s are composable: a stream can be an input/output of another
-stream. You can use this to chage the codec of a file by composing different
+stream. You can use this to change the format of a file by composing different
 codecs as below:
 ```julia
 using CodecZlib
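The new text defines `NoopStream` as an alias of `TranscodingStream{Noop}`, and the `else` branch of `makestream` produces exactly such a stream. A minimal sketch that checks the alias and shows bytes passing through unchanged, assuming TranscodingStreams.jl is installed and using an `IOBuffer` in place of an uncompressed file:

```julia
using TranscodingStreams  # assumed installed; provides Noop and NoopStream

# Wrap an in-memory buffer (standing in for an uncompressed file) with Noop.
stream = TranscodingStream(Noop(), IOBuffer("foo\nbar\n"))

# NoopStream{S} is an alias of TranscodingStream{Noop,S}.
is_noop = stream isa NoopStream

# The bytes come through unchanged.
lines = collect(eachline(stream))
close(stream)
```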
@@ -135,11 +139,13 @@ Effectively, this is equivalent to the following pipeline:
 Stop decoding on the end of a block
 -----------------------------------
 
-Most codecs support decoding concatenated data blocks. For example, if you
-concatenate two gzip files into a file and read it using
-`GzipDecompressorStream`, you will see the byte stream of concatenation of two
-files. If you need the first part of the file, you can set `stop_on_end` to
-`true` to stop transcoding at the end of the first block:
+Many codecs support decoding concatenated data blocks (or chunks). For example,
+if you concatenate two gzip files into a single file and read it using
+`GzipDecompressorStream`, you will see the byte stream of concatenation of the
+two files. If you need the part corresponding the first file, you can set
+`stop_on_end` to `true` to stop transcoding at the end of the first block.
+Note that setting `stop_on_end` to `true` does not close the wrapped stream
+because you will often want to reuse it.
 ```julia
 using CodecZlib
 # cat foo.txt.gz bar.txt.gz > foobar.txt.gz
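The `stop_on_end` behavior described above can be observed without preparing gzip files by building the two members in memory. A sketch, assuming CodecZlib.jl is installed; the strings `"foo"` and `"bar"` are illustrative stand-ins for the contents of `foo.txt.gz` and `bar.txt.gz`:

```julia
using CodecZlib  # assumed installed

# In-memory equivalent of `cat foo.txt.gz bar.txt.gz`: two gzip members back to back.
data = vcat(transcode(GzipCompressor, Vector{UInt8}("foo")),
            transcode(GzipCompressor, Vector{UInt8}("bar")))

# By default, the members are decoded as one continuous byte stream.
whole = read(GzipDecompressorStream(IOBuffer(data)), String)

# With stop_on_end=true, decoding stops at the end of the first member.
first_part = read(GzipDecompressorStream(IOBuffer(data); stop_on_end=true), String)
```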
@@ -150,8 +156,8 @@ eof(stream) #> true
 
 In the case where you need to reuse the wrapped stream, the code above must be
 slightly modified because the transcoding stream may read more bytes than
-necessary from the wrapped stream. By wrapping a stream with `NoopStream`, the
-problem of overreading is resolved:
+necessary from the wrapped stream. Wrapping the stream with `NoopStream` solves
+the problem because adjacent transcoding streams share the same buffer.
 ```julia
 using CodecZlib
 using TranscodingStreams
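The revised sentence explains that `NoopStream` prevents overreading because adjacent transcoding streams share its buffer. A sketch that reads both members one after the other through a single shared `NoopStream`, assuming CodecZlib.jl and TranscodingStreams.jl are installed; in-memory data again stands in for `foobar.txt.gz`:

```julia
using CodecZlib           # assumed installed
using TranscodingStreams  # assumed available; provides NoopStream

data = vcat(transcode(GzipCompressor, Vector{UInt8}("foo")),
            transcode(GzipCompressor, Vector{UInt8}("bar")))

# Wrap the source once. Decompressor streams built on top of the NoopStream
# share its buffer, so bytes read ahead by the first one are not lost.
source = NoopStream(IOBuffer(data))
part1 = read(GzipDecompressorStream(source; stop_on_end=true), String)
part2 = read(GzipDecompressorStream(source; stop_on_end=true), String)
close(source)
```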
@@ -170,9 +176,9 @@ error:
 using CodecZlib
 
 function decompress(input, output)
-    buffer = Vector{UInt8}(16 * 1024)
+    buffer = Vector{UInt8}(undef, 16 * 1024)
     while !eof(input)
-        n = min(nb_available(input), length(buffer))
+        n = min(bytesavailable(input), length(buffer))
         unsafe_read(input, pointer(buffer), n)
         unsafe_write(output, pointer(buffer), n)
         stats = TranscodingStreams.stats(input)
@@ -207,11 +213,17 @@ Transcode lots of strings
 `transcode(<codec type>, data)` method is convenient but suboptimal when
 transcoding a number of objects. This is because the method reallocates a new
 codec object for every call. Instead, you can use `transcode(<codec object>,
-data)` method that reuses the allocated object as follows:
+data)` method that reuses the allocated object as follows. In this usage, you
+need to explicitly allocate and free resources by calling
+`TranscodingStreams.initialize` and `TranscodingStreams.finalize`,
+respectively.
+
 ```julia
 using CodecZstd
+using TranscodingStreams
 strings = ["foo", "bar", "baz"]
 codec = ZstdCompressor()
+TranscodingStreams.initialize(codec) # allocate resources
 try
     for s in strings
         data = transcode(codec, s)
@@ -220,7 +232,7 @@ try
 catch
     rethrow()
 finally
-    CodecZstd.TranscodingStreams.finalize(codec)
+    TranscodingStreams.finalize(codec) # free resources
 end
 ```
 
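The new paragraph asks callers of `transcode(<codec object>, data)` to pair `TranscodingStreams.initialize` with `TranscodingStreams.finalize`. A sketch of the same pattern with gzip, assuming CodecZlib.jl and TranscodingStreams.jl are installed; the strings are converted to `Vector{UInt8}` first so the input type is unambiguous, and each output is checked with the type-based `transcode` method:

```julia
using CodecZlib           # assumed installed
using TranscodingStreams  # assumed available; provides initialize/finalize

strings = ["foo", "bar", "baz"]
codec = GzipCompressor()
TranscodingStreams.initialize(codec)  # allocate resources once
compressed = Vector{UInt8}[]
try
    for s in strings
        # The same codec object is reused for every string.
        push!(compressed, transcode(codec, Vector{UInt8}(s)))
    end
finally
    TranscodingStreams.finalize(codec)  # free resources once
end

# Decode each chunk with the type-based method to check the round trip.
roundtrip = [String(transcode(GzipDecompressor, c)) for c in compressed]
```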
@@ -240,9 +252,9 @@ data2 = read(stream, 8)
 @assert data1 == data2
 ```
 
-The unread operaion is different from the write operation in that the unreaded
+The unread operation is different from the write operation in that the unreaded
 data are not written to the wrapped stream. The unreaded data are stored in the
 internal buffer of a transcoding stream.
 
 Unfortunately, *unwrite* operation is not provided because there is no way to
-cancel write operations that are already commited to the wrapped stream.
+cancel write operations that are already committed to the wrapped stream.
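The paragraph above notes that unread bytes are kept in the transcoding stream's internal buffer rather than written to the wrapped stream. A minimal sketch with a `NoopStream` over an `IOBuffer`, assuming TranscodingStreams.jl is installed:

```julia
using TranscodingStreams  # assumed installed; provides NoopStream and unread

stream = NoopStream(IOBuffer("foobarbaz"))
data1 = read(stream, 3)                   # the first three bytes
TranscodingStreams.unread(stream, data1)  # push them back into the stream's buffer
data2 = read(stream, 3)                   # the same bytes are read again
rest = read(stream, String)               # reading continues after them
close(stream)
```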

docs/src/index.md

+50 -42

@@ -1,30 +1,41 @@
-TranscodingStreams.jl
-=====================
+# Home
 
-Overview
---------
+![TranscodingStream](./assets/transcodingstream.png)
 
-TranscodingStreams.jl is a package for transcoding (e.g. compression) data
-streams. It exports a type `TranscodingStream`, which is a subtype of `IO` and
-supports various I/O operations like other usual I/O streams in the standard
-library. Operations are quick, simple, and consistent.
+## Overview
 
-In this page, we intorduce the basic concepts of TranscodingStreams.jl and
-available packages. The [Examples](@ref) page demonstrates common usage. The
-[References](@ref) page offers a comprehensive API document.
+TranscodingStreams.jl is a package for transcoding data streams. Transcoding
+may be compression, decompression, ASCII encoding, and any other codec. The
+package exports a data type `TranscodingStream`, which is a subtype of `IO` and
+wraps other `IO` object to transcode data read from or written to the wrapped
+stream.
 
+In this page, we introduce the basic concepts of TranscodingStreams.jl and
+currently available packages. The [Examples](@ref) page demonstrates common
+usage. The [Reference](@ref) page offers a comprehensive API document.
 
-Introduction
-------------
+## Introduction
 
 `TranscodingStream` has two type parameters, `C<:Codec` and `S<:IO`, and hence
-the actual type should be written as `TranscodingStream{C<:Codec,S<:IO}`. This
-type wraps an underlying I/O stream `S` by a codec `C`. The codec defines
-transformation (or transcoding) of the stream. For example, when `C` is a
-lossless decompressor type and `S` is a file, `TranscodingStream{C,S}` behaves
-like a data stream that incrementally decompresses data from the file.
+the concrete data type is written as `TranscodingStream{C<:Codec,S<:IO}`. This
+type wraps an underlying I/O stream `S` by a transcoding codec `C`. `C` and `S`
+are orthogonal and hence you can use any combination of these two types. The
+underlying stream may be any stream that supports I/O operations defined by the
+`Base` module. For example, it may be `IOStream`, `TTY`, `IOBuffer`, or
+`TranscodingStream`. The codec `C` must define the transcoding protocol defined
+in this package. We already have various codecs in packages listed below. Of
+course, you can define your own codec by implementing the transcoding protocol
+described in [`TranscodingStreams.Codec`](@ref).
 
-Codecs are defined in other packages listed below:
+You can install codec packages using the standard package manager. These codec
+packages are independent of each other and can be installed separately. You
+won't need to explicitly install the TranscodingStreams.jl package unless you
+will use lower-level interfaces of it. Each codec package defines some codec
+types, which is a subtype of `TranscodingStreams.Codec`, and their
+corresponding transcoding stream aliases. These aliases are partially
+instantiated by a codec type; for example, `GzipDecompressionStream{S}` is an
+alias of `TranscodingStream{GzipDecompressor,S}`, where `S` is a subtype of
+`IO`.
 
 ```@raw html
 <table>
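The rewritten introduction says that `C` and `S` are orthogonal and that `S` may itself be a `TranscodingStream`. A sketch that stacks a zlib compressor on a gzip decompressor in read mode, converting gzip data to zlib format in memory; it assumes CodecZlib.jl and TranscodingStreams.jl are installed:

```julia
using CodecZlib           # assumed installed
using TranscodingStreams  # assumed available; provides the TranscodingStream type

original = Vector{UInt8}("foo\nbar\nbaz\n")
gzipped = transcode(GzipCompressor, original)

# S is an IOBuffer: decode gzip bytes held in memory.
inner = GzipDecompressorStream(IOBuffer(gzipped))

# S is another TranscodingStream: re-encode the decoded bytes as zlib.
outer = ZlibCompressorStream(inner)

inner_ok = inner isa TranscodingStream{GzipDecompressor,<:IO}
outer_ok = outer isa TranscodingStream{ZlibCompressor,typeof(inner)}

zlibbed = read(outer)
close(outer)  # also closes `inner` and the IOBuffer
```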
@@ -33,7 +44,7 @@ Codecs are defined in other packages listed below:
 <th>Library</th>
 <th>Format</th>
 <th>Codec</th>
-<th>Stream</th>
+<th>Stream alias</th>
 <th>Description</th>
 </tr>
 <tr>
@@ -100,7 +111,7 @@
 <tr>
 <td rowspan="2"><a href="https://github.com/bicycle1885/CodecZstd.jl">CodecZstd.jl</a></td>
 <td rowspan="2"><a href="http://facebook.github.io/zstd/">zstd</a></td>
-<td rowspan="2"><a href="https://github.com/facebook/zstd/blob/dev/doc/zstd_compressor_format.md">Zstandard Compressor Format</a></td>
+<td rowspan="2"><a href="https://github.com/facebook/zstd/blob/dev/doc/zstd_compression_format.md">Zstandard Compression Format</a></td>
 <td><code>ZstdCompressor</code></td>
 <td><code>ZstdCompressorStream</code></td>
 <td>Compress data in zstd (.zst) format.</td>
@@ -146,27 +157,24 @@
 </table>
 ```
 
-Install packages you need by calling `Pkg.add(<package name>)` in a Julia
-session. For example, if you want to read gzip-compressed files, call
-`Pkg.add("CodecZlib")` to use `GzipDecompressor` or `GzipDecompressorStream`.
-By convention, codec types have a name that matches `.*(Co|Deco)mpression` and
-I/O types have a codec name with `Stream` suffix. All codecs are a subtype
-`TranscodingStreams.Codec` and streams are a subtype of `Base.IO`. An important
-thing is these packages depend on TranscodingStreams.jl and not *vice versa*.
-This means you can install any codec package you need without installing all
-codec packages. Also, if you want to define your own codec, it is totally
-feasible like these packages. TranscodingStreams.jl requests a codec to
-implement some interface functions which will be described later.
 
+## Notes
 
-Error handling
---------------
+### Wrapped streams
 
-You may encounter an error while processing data with this package. For example,
-your compressed data may be corrupted or truncated and the decompressor codec
-cannot handle it properly. In this case, the codec informs the stream of the
-error and the stream goes to an unrecoverable mode. In this mode, the only
-possible operations are `isopen` and `close`. Other operations, such as `read`
-or `write`, will result in an argument error exception. Resources allocated in
-the codec will be released by the stream and hence you must not call the
-finalizer of a codec that is once passed to a transcoding stream object.
+The wrapper stream takes care of the wrapped stream. Reading or writing data
+from or to the wrapped stream outside the management will result in unexpected
+behaviors. When you close the wrapped stream, you must call the `close` method
+of the wrapper stream, which releases allocated resources and closes the
+wrapped stream.
+
+### Error handling
+
+You may encounter an error while processing data with this package. For
+example, your compressed data may be corrupted or truncated for some reason,
+and the decompressor cannot recover the original data. In such a case, the
+codec informs the stream of the error, and the stream goes to an unrecoverable
+mode. In this mode, the only possible operations are `isopen` and `close`.
+Other operations, such as `read` or `write`, will result in an argument error
+exception. Resources allocated by the codec will be released by the stream, and
+hence you must not call the finalizer of the codec.
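The new error-handling note says that a stream which hits a codec error becomes unrecoverable, leaving only `isopen` and `close` usable. A sketch that triggers this with bytes that are not gzip data, assuming CodecZlib.jl is installed; it checks only that both reads fail, not which exception types are raised:

```julia
using CodecZlib  # assumed installed

# Bytes that are not a gzip stream stand in for corrupted or truncated input.
stream = GzipDecompressorStream(IOBuffer(Vector{UInt8}("definitely not gzip data")))

# The first read fails inside the codec.
first_failed = try
    read(stream)
    false
catch
    true
end

# After that, the stream is unrecoverable: reading again fails as well.
second_failed = try
    read(stream)
    false
catch
    true
end

close(stream)  # close remains allowed
```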

docs/src/references.md renamed to docs/src/reference.md

+4 -3

@@ -1,5 +1,5 @@
-References
-==========
+Reference
+=========
 
 ```@meta
 CurrentModule = TranscodingStreams
@@ -10,7 +10,8 @@ TranscodingStream
 
 ```@docs
 TranscodingStream(codec::Codec, stream::IO)
-transcode(codec::Codec, data::Vector{UInt8})
+transcode(::Type{<:Codec}, data::ByteData)
+transcode(codec::Codec, data::ByteData)
 TranscodingStreams.TOKEN_END
 TranscodingStreams.unsafe_read
 TranscodingStreams.unread
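The reference page now documents `transcode` for a codec type as well as for a codec object, both taking `ByteData`. A sketch of the type-based form, assuming CodecZlib.jl is installed and that a `b"..."` literal (a `Base.CodeUnits`) is accepted as `ByteData`:

```julia
using CodecZlib  # assumed installed

data = b"abracadabra"  # a Base.CodeUnits byte-string literal

# transcode(<codec type>, data): a codec object is created and freed for each call.
compressed = transcode(GzipCompressor, data)
decompressed = transcode(GzipDecompressor, compressed)
```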

src/state.jl

+5

@@ -2,6 +2,11 @@
 # =================
 
 # See docs/src/devnotes.md.
+"""
+A mutable state type of transcoding streams.
+
+See Developer's notes for details.
+"""
 mutable struct State
     # current stream mode
     mode::Symbol # {:idle, :read, :write, :stop, :close, :panic}
