Commit 2e50d5b

BJData optimized binary array type (nlohmann#4513)
1 parent 60c4875 commit 2e50d5b

7 files changed: 479 additions and 260 deletions.


docs/mkdocs/docs/api/basic_json/to_bjdata.md

Lines changed: 10 additions & 3 deletions
@@ -4,13 +4,16 @@
 // (1)
 static std::vector<std::uint8_t> to_bjdata(const basic_json& j,
                                            const bool use_size = false,
-                                           const bool use_type = false);
+                                           const bool use_type = false,
+                                           const bjdata_version_t version = bjdata_version_t::draft2);

 // (2)
 static void to_bjdata(const basic_json& j, detail::output_adapter<std::uint8_t> o,
-                      const bool use_size = false, const bool use_type = false);
+                      const bool use_size = false, const bool use_type = false,
+                      const bjdata_version_t version = bjdata_version_t::draft2);
 static void to_bjdata(const basic_json& j, detail::output_adapter<char> o,
-                      const bool use_size = false, const bool use_type = false);
+                      const bool use_size = false, const bool use_type = false,
+                      const bjdata_version_t version = bjdata_version_t::draft2);
 ```

 Serializes a given JSON value `j` to a byte vector using the BJData (Binary JData) serialization format. BJData aims to
@@ -34,6 +37,9 @@ The exact mapping and its limitations is described on a [dedicated page](../../f
 `use_type` (in)
 : whether to add type annotations to container types (must be combined with `#!cpp use_size = true`); optional,
+
+`version` (in)
+: which version of BJData to use (see [draft 3](../../features/binary_formats/bjdata.md#draft-3-binary-format)); optional,
 `#!cpp false` by default.

 ## Return value
@@ -68,3 +74,4 @@ Linear in the size of the JSON value `j`.
 ## Version history

 - Added in version 3.11.0.
+- BJData version parameter (for draft3 binary encoding) added in version 3.12.0.
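How `use_size`, `use_type`, and `version` interact for a binary value can be sketched as follows. This is a minimal, self-contained illustration inferred from the writer branches in this commit, not the library's implementation. `bjdata_version_sketch` and `encode_binary_sketch` are hypothetical names, the payload is assumed to be shorter than 256 bytes, and the count is always written with the `U` (uint8) marker, whereas the library may choose a different integer marker.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the enum added by this commit.
enum class bjdata_version_sketch { draft2, draft3 };

// Encode a byte string as a BJData array (sketch only).
// Assumptions: fewer than 256 bytes; the count always uses the 'U' marker.
std::vector<std::uint8_t> encode_binary_sketch(
    const std::vector<std::uint8_t>& data,
    bool use_size = false,
    bool use_type = false,
    bjdata_version_sketch version = bjdata_version_sketch::draft2)
{
    assert(data.size() < 256);
    assert(!use_type || use_size);  // use_type must be combined with use_size

    const bool draft3 = (version == bjdata_version_sketch::draft3);
    const std::uint8_t marker = static_cast<std::uint8_t>(draft3 ? 'B' : 'U');

    std::vector<std::uint8_t> out{'['};

    // Draft 3 writes the "$B" type annotation even for an empty payload.
    const bool typed = use_type && (draft3 || !data.empty());
    if (typed)
    {
        out.push_back('$');
        out.push_back(marker);
    }
    if (use_size)
    {
        out.push_back('#');
        out.push_back('U');
        out.push_back(static_cast<std::uint8_t>(data.size()));
    }
    for (const std::uint8_t byte : data)
    {
        if (!typed)
        {
            out.push_back(marker);  // untyped arrays tag every element
        }
        out.push_back(byte);
    }
    if (!use_size)
    {
        out.push_back(']');  // unsized arrays need an explicit terminator
    }
    return out;
}
```

With the default arguments, each byte carries its own `U` marker and the array is closed with `]`. Passing `true, true` and the draft3 value yields a `[$B#` header followed by the raw bytes.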

docs/mkdocs/docs/features/binary_formats/bjdata.md

Lines changed: 22 additions & 10 deletions
@@ -2,10 +2,10 @@
 The [BJData format](https://neurojson.org) was derived from and improved upon
 [Universal Binary JSON(UBJSON)](https://ubjson.org) specification (Draft 12). Specifically, it introduces an optimized
-array container for efficient storage of N-dimensional packed arrays (**ND-arrays**); it also adds 4 new type markers -
-`[u] - uint16`, `[m] - uint32`, `[M] - uint64` and `[h] - float16` - to unambiguously map common binary numeric types;
-furthermore, it uses little-endian (LE) to store all numerics instead of big-endian (BE) as in UBJSON to avoid
-unnecessary conversions on commonly available platforms.
+array container for efficient storage of N-dimensional packed arrays (**ND-arrays**); it also adds 5 new type markers -
+`[u] - uint16`, `[m] - uint32`, `[M] - uint64`, `[h] - float16` and `[B] - byte` - to unambiguously map common binary
+numeric types; furthermore, it uses little-endian (LE) to store all numerics instead of big-endian (BE) as in UBJSON to
+avoid unnecessary conversions on commonly available platforms.

 Compared to other binary JSON-like formats such as MessagePack and CBOR, both BJData and UBJSON demonstrate a rare
 combination of being both binary and **quasi-human-readable**. This is because all semantic elements in BJData and
@@ -49,6 +49,7 @@ The library uses the following mapping from JSON values types to BJData types ac
 | string | *with shortest length indicator* | string | `S` |
 | array | *see notes on optimized format/ND-array* | array | `[` |
 | object | *see notes on optimized format* | map | `{` |
+| binary | *see notes on binary values* | array | `[$B` |

 !!! success "Complete mapping"
@@ -128,15 +129,24 @@ The library uses the following mapping from JSON values types to BJData types ac
 Due to diminished space saving, hampered readability, and increased security risks, in BJData, the allowed data
 types following the `$` marker in an optimized array and object container are restricted to
-**non-zero-fixed-length** data types. Therefore, the valid optimized type markers can only be one of `UiuImlMLhdDC`.
-This also means other variable (`[{SH`) or zero-length types (`TFN`) can not be used in an optimized array or object
-in BJData.
+**non-zero-fixed-length** data types. Therefore, the valid optimized type markers can only be one of
+`UiuImlMLhdDCB`. This also means other variable (`[{SH`) or zero-length types (`TFN`) can not be used in an
+optimized array or object in BJData.

 !!! info "Binary values"

-If the JSON data contains the binary type, the value stored is a list of integers, as suggested by the BJData
-documentation. In particular, this means that the serialization and the deserialization of JSON containing binary
-values into BJData and back will result in a different JSON object.
+BJData provides a dedicated `B` marker (defined in the [BJData specification (Draft 3)][BJDataBinArr]) that is used
+in optimized arrays to designate binary data. This means that, unlike UBJSON, binary data can be both serialized and
+deserialized.
+
+To preserve compatibility with BJData Draft 2, the Draft 3 optimized binary array must be explicitly enabled using
+the `version` parameter of [`to_bjdata`](../../api/basic_json/to_bjdata.md).
+
+In Draft2 mode (default), if the JSON data contains the binary type, the value is stored as a list of integers, as
+suggested by the BJData documentation. In particular, this means that the serialization and the deserialization of
+JSON containing binary values into BJData and back will result in a different JSON object.
+
+[BJDataBinArr]: https://github.com/NeuroJSON/bjdata/blob/master/Binary_JData_Specification.md#optimized-binary-array

 ??? example
@@ -171,11 +181,13 @@ The library maps BJData types to JSON value types as follows:
 | int32 | number_integer | `l` |
 | uint64 | number_unsigned | `M` |
 | int64 | number_integer | `L` |
+| byte | number_unsigned | `B` |
 | string | string | `S` |
 | char | string | `C` |
 | array | array (optimized values are supported) | `[` |
 | ND-array | object (in JData annotated array format)|`[$.#[.`|
 | object | object (optimized values are supported) | `{` |
+| binary | binary (strongly-typed byte array) | `[$B` |

 !!! success "Complete mapping"
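The restriction on which markers may follow `$` amounts to a simple membership test. The sketch below is illustrative only: `is_optimized_type_marker_sketch` is a hypothetical helper, and the marker list is copied from the documentation text in this diff rather than from the library source.

```cpp
#include <cassert>  // used by the checks in the usage note
#include <string>

// Returns true if `marker` is one of the non-zero-fixed-length types that the
// documentation allows after '$' in an optimized container (Draft 3 adds 'B').
bool is_optimized_type_marker_sketch(char marker)
{
    static const std::string fixed_length_markers = "UiuImlMLhdDCB";
    return fixed_length_markers.find(marker) != std::string::npos;
}
```

Under this sketch, `is_optimized_type_marker_sketch('B')` holds, while variable-length markers such as `'S'` and zero-length markers such as `'T'` are rejected.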

include/nlohmann/detail/input/binary_reader.hpp

Lines changed: 19 additions & 1 deletion
@@ -2313,6 +2313,16 @@ class binary_reader
 case 'Z': // null
     return sax->null();

+case 'B': // byte
+{
+    if (input_format != input_format_t::bjdata)
+    {
+        break;
+    }
+    std::uint8_t number{};
+    return get_number(input_format, number) && sax->number_unsigned(number);
+}
+
 case 'U':
 {
     std::uint8_t number{};
@@ -2513,7 +2523,7 @@ class binary_reader
     return false;
 }

-if (size_and_type.second == 'C')
+if (size_and_type.second == 'C' || size_and_type.second == 'B')
 {
     size_and_type.second = 'U';
 }
@@ -2535,6 +2545,13 @@ class binary_reader
     return (sax->end_array() && sax->end_object());
 }

+// If BJData type marker is 'B' decode as binary
+if (input_format == input_format_t::bjdata && size_and_type.first != npos && size_and_type.second == 'B')
+{
+    binary_t result;
+    return get_binary(input_format, size_and_type.first, result) && sax->binary(result);
+}
+
 if (size_and_type.first != npos)
 {
     if (JSON_HEDLEY_UNLIKELY(!sax->start_array(size_and_type.first)))
@@ -3008,6 +3025,7 @@ class binary_reader
 #define JSON_BINARY_READER_MAKE_BJD_TYPES_MAP_ \
     make_array<bjd_type>( \
+    bjd_type{'B', "byte"}, \
     bjd_type{'C', "char"}, \
     bjd_type{'D', "double"}, \
     bjd_type{'I', "int16"}, \
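The reader-side change can be pictured with a small standalone decoder: a sized array whose type marker is `B` is returned as one binary blob instead of a list of numbers. This is a hedged sketch, not the library's `binary_reader`. `decode_binary_sketch` is a hypothetical function, and it assumes the count is encoded with the `U` (uint8) marker.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Decode "[$B#U<count><count raw bytes>" into `out` (sketch only).
// Returns false for any other header or for a truncated payload.
bool decode_binary_sketch(const std::vector<std::uint8_t>& in,
                          std::vector<std::uint8_t>& out)
{
    constexpr std::size_t header_size = 6;  // '[' '$' 'B' '#' 'U' <count>
    if (in.size() < header_size)
    {
        return false;
    }
    if (in[0] != '[' || in[1] != '$' || in[2] != 'B' || in[3] != '#' || in[4] != 'U')
    {
        return false;  // not a Draft 3 optimized binary array
    }
    const std::size_t count = in[5];
    if (in.size() - header_size < count)
    {
        return false;  // fewer payload bytes than announced
    }
    out.assign(in.data() + header_size, in.data() + header_size + count);
    return true;
}
```

A `[$U#U...` header is rejected here on purpose, since in this sketch only the `B` marker designates binary data.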

include/nlohmann/detail/output/binary_writer.hpp

Lines changed: 22 additions & 11 deletions
@@ -28,6 +28,13 @@ NLOHMANN_JSON_NAMESPACE_BEGIN
 namespace detail
 {

+/// how to encode BJData
+enum class bjdata_version_t
+{
+    draft2,
+    draft3,
+};
+
 ///////////////////
 // binary writer //
 ///////////////////
@@ -735,11 +742,14 @@ class binary_writer
 @param[in] use_type whether to use '$' prefixes (optimized format)
 @param[in] add_prefix whether prefixes need to be used for this value
 @param[in] use_bjdata whether write in BJData format, default is false
+@param[in] bjdata_version which BJData version to use, default is draft2
 */
 void write_ubjson(const BasicJsonType& j, const bool use_count,
                   const bool use_type, const bool add_prefix = true,
-                  const bool use_bjdata = false)
+                  const bool use_bjdata = false, const bjdata_version_t bjdata_version = bjdata_version_t::draft2)
 {
+    const bool bjdata_draft3 = bjdata_version == bjdata_version_t::draft3;
+
     switch (j.type())
     {
         case value_t::null:
@@ -829,7 +839,7 @@ class binary_writer
 for (const auto& el : *j.m_data.m_value.array)
 {
-    write_ubjson(el, use_count, use_type, prefix_required, use_bjdata);
+    write_ubjson(el, use_count, use_type, prefix_required, use_bjdata, bjdata_version);
 }

 if (!use_count)
@@ -847,11 +857,11 @@ class binary_writer
     oa->write_character(to_char_type('['));
 }

-if (use_type && !j.m_data.m_value.binary->empty())
+if (use_type && ((use_bjdata && bjdata_draft3) || !j.m_data.m_value.binary->empty()))
 {
     JSON_ASSERT(use_count);
     oa->write_character(to_char_type('$'));
-    oa->write_character('U');
+    oa->write_character(use_bjdata && bjdata_draft3 ? 'B' : 'U');
 }

 if (use_count)
@@ -870,7 +880,7 @@ class binary_writer
 {
     for (size_t i = 0; i < j.m_data.m_value.binary->size(); ++i)
     {
-        oa->write_character(to_char_type('U'));
+        oa->write_character(to_char_type((use_bjdata && bjdata_draft3) ? 'B' : 'U'));
         oa->write_character(j.m_data.m_value.binary->data()[i]);
     }
 }
@@ -887,7 +897,7 @@ class binary_writer
 {
     if (use_bjdata && j.m_data.m_value.object->size() == 3 && j.m_data.m_value.object->find("_ArrayType_") != j.m_data.m_value.object->end() && j.m_data.m_value.object->find("_ArraySize_") != j.m_data.m_value.object->end() && j.m_data.m_value.object->find("_ArrayData_") != j.m_data.m_value.object->end())
     {
-        if (!write_bjdata_ndarray(*j.m_data.m_value.object, use_count, use_type)) // decode bjdata ndarray in the JData format (https://github.com/NeuroJSON/jdata)
+        if (!write_bjdata_ndarray(*j.m_data.m_value.object, use_count, use_type, bjdata_version)) // decode bjdata ndarray in the JData format (https://github.com/NeuroJSON/jdata)
         {
             break;
         }
@@ -931,7 +941,7 @@ class binary_writer
     oa->write_characters(
         reinterpret_cast<const CharType*>(el.first.c_str()),
         el.first.size());
-    write_ubjson(el.second, use_count, use_type, prefix_required, use_bjdata);
+    write_ubjson(el.second, use_count, use_type, prefix_required, use_bjdata, bjdata_version);
 }

 if (!use_count)
@@ -1615,10 +1625,11 @@ class binary_writer
 /*!
 @return false if the object is successfully converted to a bjdata ndarray, true if the type or size is invalid
 */
-bool write_bjdata_ndarray(const typename BasicJsonType::object_t& value, const bool use_count, const bool use_type)
+bool write_bjdata_ndarray(const typename BasicJsonType::object_t& value, const bool use_count, const bool use_type, const bjdata_version_t bjdata_version)
 {
     std::map<string_t, CharType> bjdtype = {{"uint8", 'U'}, {"int8", 'i'}, {"uint16", 'u'}, {"int16", 'I'},
-        {"uint32", 'm'}, {"int32", 'l'}, {"uint64", 'M'}, {"int64", 'L'}, {"single", 'd'}, {"double", 'D'}, {"char", 'C'}
+        {"uint32", 'm'}, {"int32", 'l'}, {"uint64", 'M'}, {"int64", 'L'}, {"single", 'd'}, {"double", 'D'},
+        {"char", 'C'}, {"byte", 'B'}
     };

     string_t key = "_ArrayType_";
@@ -1648,10 +1659,10 @@ class binary_writer
 oa->write_character('#');

 key = "_ArraySize_";
-write_ubjson(value.at(key), use_count, use_type, true, true);
+write_ubjson(value.at(key), use_count, use_type, true, true, bjdata_version);

 key = "_ArrayData_";
-if (dtype == 'U' || dtype == 'C')
+if (dtype == 'U' || dtype == 'C' || dtype == 'B')
 {
     for (const auto& el : value.at(key))
     {
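The ND-array path looks up the `_ArrayType_` name in a table and then branches on the resulting marker. The standalone sketch below mirrors the table as it reads after this commit. The function names are hypothetical, and the `'\0'` return value for unknown names is a convention of this sketch, not of the library.

```cpp
#include <cassert>  // used by the checks in the usage note
#include <map>
#include <string>

// Map a JData "_ArrayType_" name to its BJData type marker (sketch only).
// Returns '\0' when the name is not in the table.
char ndarray_type_marker_sketch(const std::string& array_type)
{
    static const std::map<std::string, char> bjdtype = {
        {"uint8", 'U'}, {"int8", 'i'}, {"uint16", 'u'}, {"int16", 'I'},
        {"uint32", 'm'}, {"int32", 'l'}, {"uint64", 'M'}, {"int64", 'L'},
        {"single", 'd'}, {"double", 'D'}, {"char", 'C'}, {"byte", 'B'}
    };
    const auto it = bjdtype.find(array_type);
    return it == bjdtype.end() ? '\0' : it->second;
}

// The condition the diff extends: 'B' now takes the same branch as 'U' and 'C'.
bool shares_uint8_branch_sketch(char marker)
{
    return marker == 'U' || marker == 'C' || marker == 'B';
}
```

For example, `"byte"` resolves to `'B'`, which in turn satisfies the extended condition, whereas `"uint16"` resolves to `'u'` and does not.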

include/nlohmann/json.hpp

Lines changed: 11 additions & 6 deletions
@@ -171,6 +171,8 @@ class basic_json // NOLINT(cppcoreguidelines-special-member-functions,hicpp-spec
 using error_handler_t = detail::error_handler_t;
 /// how to treat CBOR tags
 using cbor_tag_handler_t = detail::cbor_tag_handler_t;
+/// how to encode BJData
+using bjdata_version_t = detail::bjdata_version_t;
 /// helper type for initializer lists of basic_json values
 using initializer_list_t = std::initializer_list<detail::json_ref<basic_json>>;
@@ -4352,27 +4354,30 @@ class basic_json // NOLINT(cppcoreguidelines-special-member-functions,hicpp-spec
 /// @sa https://json.nlohmann.me/api/basic_json/to_bjdata/
 static std::vector<std::uint8_t> to_bjdata(const basic_json& j,
                                            const bool use_size = false,
-                                           const bool use_type = false)
+                                           const bool use_type = false,
+                                           const bjdata_version_t version = bjdata_version_t::draft2)
 {
     std::vector<std::uint8_t> result;
-    to_bjdata(j, result, use_size, use_type);
+    to_bjdata(j, result, use_size, use_type, version);
     return result;
 }

 /// @brief create a BJData serialization of a given JSON value
 /// @sa https://json.nlohmann.me/api/basic_json/to_bjdata/
 static void to_bjdata(const basic_json& j, detail::output_adapter<std::uint8_t> o,
-                      const bool use_size = false, const bool use_type = false)
+                      const bool use_size = false, const bool use_type = false,
+                      const bjdata_version_t version = bjdata_version_t::draft2)
 {
-    binary_writer<std::uint8_t>(o).write_ubjson(j, use_size, use_type, true, true);
+    binary_writer<std::uint8_t>(o).write_ubjson(j, use_size, use_type, true, true, version);
 }

 /// @brief create a BJData serialization of a given JSON value
 /// @sa https://json.nlohmann.me/api/basic_json/to_bjdata/
 static void to_bjdata(const basic_json& j, detail::output_adapter<char> o,
-                      const bool use_size = false, const bool use_type = false)
+                      const bool use_size = false, const bool use_type = false,
+                      const bjdata_version_t version = bjdata_version_t::draft2)
 {
-    binary_writer<char>(o).write_ubjson(j, use_size, use_type, true, true);
+    binary_writer<char>(o).write_ubjson(j, use_size, use_type, true, true, version);
 }

 /// @brief create a BSON serialization of a given JSON value
