Commit 3db47b1

doc: add notes on important details from the RFC
1 parent be23977 commit 3db47b1

format_notes.md

Lines changed: 80 additions & 0 deletions
@@ -0,0 +1,80 @@
This module implements the text format described in [RFC 7464] for representing
streams of JSON texts (`application/json-seq`).

[RFC 7464]: https://datatracker.ietf.org/doc/html/rfc7464

The format is simple. A stream consists of ordinary UTF-8 JSON texts, each
preceded by an ASCII Record Separator character (0x1E) and terminated by an
ASCII Line Feed (`\n`, 0x0A). Because the Record Separator, not the line
break, delimits elements, the JSON texts can use whatever indentation is
desired, so the encoded format can be as human-readable as regular JSON.

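
As a rough illustration of the framing described above, the following sketch encodes a list of values as a JSON text sequence. The helper name `encodeJsonSeq` and its `indent` parameter are hypothetical and are not taken from this module's API; the sketch also assumes every value is JSON-serialisable.

```javascript
const RS = "\x1e"; // ASCII Record Separator
const LF = "\n"; // ASCII Line Feed

// Hypothetical helper: frame each value as RS + JSON text + LF.
function encodeJsonSeq(values, indent) {
  // JSON.stringify escapes control characters inside strings (0x1E becomes
  // \u001e), so a raw RS never appears inside an encoded JSON text, even when
  // the text is pretty-printed across several lines.
  return values
    .map((value) => RS + JSON.stringify(value, null, indent) + LF)
    .join("");
}

const encoded = encodeJsonSeq([{ id: 1 }, [true, null], "text"]);
// encoded === '\x1e{"id":1}\n\x1e[true,null]\n\x1e"text"\n'
```

Passing an `indent` such as `2` produces multi-line JSON texts, which remain unambiguous because only the Record Separator marks the start of an element.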
## Key points from the RFC

- The spec emphasises the importance of parsers recovering from stream errors
  rather than failing on a problematic item:
  - Section 2:
    > "Having two different sets of rules permits recovery by parsers from
    > sequences where some of the elements are truncated for whatever reason."
  - Section 2.1:
    > "If parsing of such an octet string as a UTF-8-encoded JSON text fails,
    > the parser SHOULD nonetheless continue parsing the remainder of the
    > sequence."
  - Sections 2.1 and 2.2: The spec follows the [Robustness Principle] in
    specifying a reasonably precise encoding for the stream format, but
    reasonably lenient rules for parsing.

[Robustness Principle]: https://en.wikipedia.org/wiki/Robustness_principle

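
The recovery behaviour quoted above might be sketched as follows: split the input on RS, try each piece with `JSON.parse`, and record a warning instead of aborting when a piece fails. The names `parseJsonSeqLenient` and `isBlank` are hypothetical, the input is assumed to be a complete, already-decoded string rather than a byte stream, and reporting data found before the first RS is a choice made for this sketch. It deliberately leaves out the Section 2.4 truncation check.

```javascript
const RS = "\x1e"; // ASCII Record Separator

// True for text that is empty or only JSON whitespace (space, tab, LF, CR).
const isBlank = (text) => /^[ \t\n\r]*$/.test(text);

// Hypothetical helper, not this module's actual API.
function parseJsonSeqLenient(input) {
  const values = [];
  const warnings = [];
  const [preamble, ...chunks] = input.split(RS);
  // Assumption: data before the first RS belongs to no element, so report it.
  if (!isBlank(preamble)) {
    warnings.push({ text: preamble, reason: "data before first RS" });
  }
  for (const chunk of chunks) {
    if (isBlank(chunk)) continue; // consecutive RS characters denote no element
    try {
      values.push(JSON.parse(chunk));
    } catch (err) {
      // Record the failure and carry on with the text after the next RS.
      warnings.push({ text: chunk, reason: err.message });
    }
  }
  return { values, warnings };
}

// The second element is cut off mid-object; the elements around it survive.
const result = parseJsonSeqLenient('\x1e{"a":1}\n\x1e{"b":\x1e\x1e[2]\n');
// result.values holds {a: 1} and [2]; result.warnings has one entry for '{"b":'
```

Returning the warnings alongside the values, rather than discarding failures, keeps dropped elements visible to the caller.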
- Section 2.3:

  > "Per Section 2.1, JSON text sequence parsers should not abort when an
  > octet string contains a malformed JSON text. Instead, the JSON text
  > sequence parser should skip to the next RS."

- Section 2.4:

  > Parsers MUST check that any JSON texts that are a top-level number, or
  > that might be 'true', 'false', or 'null', include JSON whitespace (at
  > least one byte matching the "ws" ABNF rule from [RFC7159]) after that
  > value; otherwise, the JSON-text may have been truncated. Note that the LF
  > following each JSON text matches the "ws" ABNF rule.

  > Parsers MUST drop JSON-text sequence elements consisting of
  > non-self-delimited top-level values that may have been truncated (that
  > are not delimited by whitespace). Parsers can report such texts as
  > warnings (including, optionally, the parsed text and/or the original
  > octet string).

  For example, it's not possible to know whether a top-level number that isn't
  followed by whitespace was truncated, so it's not safe for the parser to
  output it as a complete value:

  > For example, `'<RS>123<RS>'` might have been intended to carry the
  > top-level number 1234, but it got truncated.

  However, strings and other unambiguously terminated values are safe to
  report:

  > Implementations may produce a value when parsing `'<RS>"foo"<RS>'`

  The spec allows skipping over junk that follows a JSON value, up to the next
  `<RS>` character:

  > [...] Such implementations ought to skip to the next RS byte, possibly
  > reporting any intervening non-whitespace bytes.

  However, the `JSON.parse` function offers no way to identify and skip such
  junk content (short of hacks such as inspecting a thrown `SyntaxError` to
  guess where the valid JSON prefix ends).

- Section 3 (Security Considerations) warns of the danger of implementations
  disagreeing on whether to emit a value for valid JSON followed by a junk
  suffix:

  > Note that incremental JSON text parsers can produce partial results and
  > later indicate failure to parse the remainder of a text. A sequence parser
  > that uses an incremental JSON text parser might treat a sequence like
  > `'<RS>"foo"<LF>456<LF><RS>'` as a sequence of one element ("foo"), while a
  > sequence parser that uses a non-incremental JSON text parser might treat
  > the same sequence as being empty. This effect, and texts that fail to
  > parse and are ignored, can be used to smuggle data past sequence parsers
  > that don't warn about JSON text failures.
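
To make the Section 2.4 and Section 3 points concrete, here is a minimal sketch of a per-element check built on `JSON.parse`. The names `parseElement` and `endsWithJsonWhitespace` and the shape of the returned object are hypothetical; the sketch assumes the element text (everything between two RS characters) is already available as a string.

```javascript
// True when the text ends with JSON whitespace (the "ws" rule: space, tab, LF, CR).
function endsWithJsonWhitespace(text) {
  return /[ \t\n\r]$/.test(text);
}

// Hypothetical helper: parse the text found between two RS characters.
function parseElement(text) {
  let value;
  try {
    value = JSON.parse(text);
  } catch (err) {
    // JSON.parse is non-incremental: valid JSON followed by junk fails as a whole.
    return { ok: false, reason: "malformed JSON text", text };
  }
  // Strings, arrays and objects carry their own closing delimiter. Numbers,
  // true, false and null do not, so they need trailing whitespace to prove
  // that they were not cut short.
  const selfDelimited =
    typeof value === "string" || (typeof value === "object" && value !== null);
  if (!selfDelimited && !endsWithJsonWhitespace(text)) {
    return { ok: false, reason: "possibly truncated", text };
  }
  return { ok: true, value };
}

const truncated = parseElement("123"); // dropped: may have been 1234
const complete = parseElement("123\n"); // accepted: the LF delimits the number
const quoted = parseElement('"foo"'); // accepted: the closing quote delimits it
const smuggled = parseElement('"foo"\n456\n'); // rejected, and reported
```

Returning the rejected text with a reason, rather than silently dropping it, lets a caller surface the warnings that Section 3 asks for.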
