|
| 1 | +This module implements the text format, described in [RFC 7464], to represent |
| 2 | +streams of JSON objects (application/json-seq). |
| 3 | + |
| 4 | +[RFC 7464]: https://datatracker.ietf.org/doc/html/rfc7464 |
| 5 | + |
| 6 | +The format is simple. A stream consists of normal UTF-8 JSON texts, each |
| 7 | +preceded by an ASCII Record Separator character (0x1E) and terminated by an |
| 8 | +ASCII Line Feed (\n or 0x0A). The JSON texts can have whatever indentation is |
| 9 | +desired, so the encoded format can be as human-readable as regular JSON. |
| 10 | + |
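
As an illustration of the format described above, here is a minimal encoder sketch. The names `encodeJsonSeq`, `RS` and `LF` are hypothetical and not taken from this module; the sketch assumes every value is JSON-serialisable and simply emits RS, the JSON text, then LF for each value.

```javascript
// Hypothetical helper names, for illustration only (not this module's API).
const RS = "\x1e"; // ASCII Record Separator
const LF = "\n"; // ASCII Line Feed

// Encode an array of JSON-serialisable values as a JSON text sequence.
// Each element is written as: RS, JSON text, LF.
// `indent` is passed through to JSON.stringify, so elements may be
// pretty-printed without affecting how the stream is delimited.
function encodeJsonSeq(values, indent) {
  return values
    .map((value) => RS + JSON.stringify(value, null, indent) + LF)
    .join("");
}

const encoded = encodeJsonSeq([{ a: 1 }, "foo", 42]);
// Print with control characters escaped so the RS bytes are visible.
console.log(JSON.stringify(encoded));
```

Passing an `indent` such as `2` produces multi-line elements, which stay unambiguous because RS cannot appear unescaped inside a JSON text.
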
| 11 | +## Key points from the RFC |
| 12 | + |
| 13 | +- The spec emphasises the importance of parsers recovering from stream errors |
| 14 | + rather than failing on a problematic item: |
| 15 | + - Section 2: |
| 16 | + > "Having two different sets of rules permits recovery by parsers from |
| 17 | + > sequences where some of the elements are truncated for whatever reason." |
| 18 | + - Section 2.1: |
| 19 | + > "If parsing of such an octet string as a UTF-8-encoded JSON text fails, |
| 20 | + > the parser SHOULD nonetheless continue parsing the remainder of the |
| 21 | + > sequence." |
| 22 | + - Sections 2.1 and 2.2: The spec follows the [Robustness Principle] by |
| 23 | + specifying a reasonably precise encoding for the stream format, but |
| 24 | + reasonably lenient rules for parsing it. |
| 25 | + |
| 26 | + [Robustness Principle]: https://en.wikipedia.org/wiki/Robustness_principle |
| 27 | + |
| 28 | + - Section 2.3: |
| 29 | + |
| 30 | + > "Per Section 2.1, JSON text sequence parsers should not abort when an |
| 31 | + > octet string contains a malformed JSON text. Instead, the JSON text |
| 32 | + > sequence parser should skip to the next RS." |
| 33 | +
|
| 34 | + - Section 2.4: |
| 35 | + |
| 36 | + > Parsers MUST check that any JSON texts that are a top-level number, or |
| 37 | + > that might be 'true', 'false', or 'null', include JSON whitespace (at |
| 38 | + > least one byte matching the "ws" ABNF rule from [RFC7159]) after that |
| 39 | + > value; otherwise, the JSON-text may have been truncated. Note that the LF |
| 40 | + > following each JSON text matches the "ws" ABNF rule. |
| 41 | +
|
| 42 | + > Parsers MUST drop JSON-text sequence elements consisting of non-self- |
| 43 | + > delimited top-level values that may have been truncated (that are not |
| 44 | + > delimited by whitespace). Parsers can report such texts as warnings |
| 45 | + > (including, optionally, the parsed text and/or the original octet string). |
| 46 | +
|
| 47 | + For example, it's not possible to know if a top-level number that isn't |
| 48 | + followed by whitespace was truncated, so it's not safe for the parser to |
| 49 | + output it as a complete value: |
| 50 | + |
| 51 | + > For example, `'<RS>123<RS>'` might have been intended to carry the top- |
| 52 | + > level number 1234, but it got truncated. |
| 53 | +
|
| 54 | + However, strings and other unambiguously-terminated values are safe to |
| 55 | + report: |
| 56 | + |
| 57 | + > Implementations may produce a value when parsing `'<RS>"foo"<RS>'` |
| 58 | +
|
| 59 | + The spec allows skipping over junk after a JSON value prior to the next |
| 60 | + `<RS>` char: |
| 61 | + |
| 62 | + > [...] Such implementations ought to skip to the next RS byte, possibly |
| 63 | + > reporting any intervening non-whitespace bytes. |
| 64 | +
|
| 65 | + However, the `JSON.parse` function doesn't really allow identifying and |
| 66 | + skipping such junk content (short of hacks involving parsing a thrown |
| 67 | + `SyntaxError` to guess the end of a valid JSON string prefix). |
| 68 | + |
| 69 | + - Section 3 (Security Considerations) warns that implementations can |
| 70 | + disagree on whether to emit a value for valid JSON that is followed by |
| 71 | + junk: |
| 72 | + |
| 73 | + > Note that incremental JSON text parsers can produce partial results and |
| 74 | + > later indicate failure to parse the remainder of a text. A sequence parser |
| 75 | + > that uses an incremental JSON text parser might treat a sequence like |
| 76 | + > `'<RS>"foo"<LF>456<LF><RS>'` as a sequence of one element ("foo"), while a |
| 77 | + > sequence parser that uses a non-incremental JSON text parser might treat |
| 78 | + > the same sequence as being empty. This effect, and texts that fail to |
| 79 | + > parse and are ignored, can be used to smuggle data past sequence parsers |
| 80 | + > that don't warn about JSON text failures. |
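
The parsing rules quoted above can be sketched as follows. This is an illustration under stated assumptions, not this module's implementation: `parseJsonSeq` and the shape of its warning objects are hypothetical, the input is assumed to be an already-decoded string, and each element is handed to the non-incremental `JSON.parse`, so valid JSON followed by junk is rejected as a whole rather than partially reported.

```javascript
// Hypothetical sketch, for illustration only (not this module's API).
const RS = "\x1e"; // ASCII Record Separator
// True when the text ends with a character matching the JSON "ws" rule.
const ENDS_WITH_WS = /[ \t\n\r]$/;

function parseJsonSeq(input) {
  const values = [];
  const warnings = [];
  const [head, ...chunks] = input.split(RS);
  if (head !== "") {
    warnings.push({ reason: "data before first RS", text: head });
  }
  for (const chunk of chunks) {
    // Consecutive RS characters do not denote empty elements.
    if (chunk === "") continue;
    let value;
    try {
      value = JSON.parse(chunk);
    } catch (err) {
      // Sections 2.1 and 2.3: don't abort, skip to the next RS.
      warnings.push({ reason: "malformed", text: chunk });
      continue;
    }
    // Objects, arrays and strings have closing delimiters; numbers, true,
    // false and null do not, so they need trailing whitespace (Section 2.4).
    const selfDelimited =
      typeof value === "string" || (typeof value === "object" && value !== null);
    if (!selfDelimited && !ENDS_WITH_WS.test(chunk)) {
      warnings.push({ reason: "possibly truncated", text: chunk });
      continue;
    }
    values.push(value);
  }
  return { values, warnings };
}

// `123` is dropped as possibly truncated; `"foo"` and `456\n` are kept.
console.log(parseJsonSeq('\x1e123\x1e"foo"\x1e456\n'));
// The Section 3 example: JSON.parse rejects the whole element, so no
// values are produced, but the failure is reported as a warning.
console.log(parseJsonSeq('\x1e"foo"\n456\n\x1e'));
```

Returning warnings alongside the values is one way to avoid the silent data smuggling that Section 3 describes; a caller can decide whether to log them or treat them as errors.
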