17 changes: 12 additions & 5 deletions source/bson-binary-vector/bson-binary-vector.md
@@ -180,13 +180,18 @@ End Function

#### Validation

Drivers MUST validate vector metadata and raise an error if any invariant is violated:
Drivers MUST validate vector metadata and raise an exception if any invariant is violated:

- Padding MUST be 0 for all dtypes where padding doesn’t apply, and MUST be within [0, 7] for PACKED_BIT.
- A PACKED_BIT vector MUST NOT be empty if padding is in the range [1, 7].
- For a PACKED_BIT vector, ignored bits must be zero.
- When unpacking binary data into a FLOAT32 Vector structure, the length of the binary data following the dtype and
padding MUST be a multiple of 4 bytes.
- Padding MUST be 0 for all dtypes where padding doesn’t apply, and MUST be within [0, 7] for PACKED_BIT.
- A PACKED_BIT vector MUST NOT be empty if padding is in the range [1, 7].
- For a PACKED_BIT vector with non-zero padding, ignored bits SHOULD be zero.
    - When encoding, drivers SHOULD raise an exception if ignored bits are non-zero, but MAY leave them as-is if
      backwards compatibility is a concern.
    - When decoding, drivers SHOULD raise an exception if ignored bits are non-zero, but MAY choose not to for
      backwards compatibility.
    - Drivers SHOULD use the next major release to conform to ignored bits being zero.

Drivers MUST perform this validation when a numeric vector and padding are provided through the API, and when unpacking
binary data (BSON or similar) into a Vector structure.
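
As a rough illustration of the invariants listed above, the checks might be sketched as follows. This is not taken from any driver: `validate_vector` and its `strict` flag are hypothetical names, the FLOAT32 dtype byte `0x27` comes from the spec's dtype table rather than from this excerpt, and the sketch assumes the ignored bits are the least-significant bits of the final byte.

```python
PACKED_BIT = 0x10  # dtype byte shown in the PACKED_BIT test cases
FLOAT32 = 0x27     # assumption: dtype byte from the spec's dtype table


def validate_vector(dtype: int, padding: int, data: bytes, strict: bool = True) -> None:
    """Hypothetical helper: raise ValueError if an invariant is violated.

    strict=False models the backwards-compatible option of tolerating
    non-zero ignored bits.
    """
    if dtype != PACKED_BIT:
        if padding != 0:
            raise ValueError("padding must be 0 for dtypes other than PACKED_BIT")
        if dtype == FLOAT32 and len(data) % 4 != 0:
            raise ValueError("FLOAT32 data length must be a multiple of 4 bytes")
        return
    if not 0 <= padding <= 7:
        raise ValueError("padding must be within [0, 7] for PACKED_BIT")
    if padding and not data:
        raise ValueError("an empty PACKED_BIT vector cannot have padding")
    if strict and padding:
        ignored_mask = (1 << padding) - 1  # low `padding` bits of the last byte
        if data[-1] & ignored_mask:
            raise ValueError("ignored bits must be zero")


validate_vector(PACKED_BIT, 3, bytes([127, 8]))                # passes: 8 == 0b00001000
validate_vector(PACKED_BIT, 3, bytes([127, 7]), strict=False)  # tolerated when lenient
```

With the default `strict=True`, the second call would raise, since the three ignored bits of `7` (`0b00000111`) are all ones.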
@@ -249,13 +254,15 @@ See the [README](tests/README.md) for tests.
example in Python, see
[numpy.unpackbits](https://numpy.org/doc/2.0/reference/generated/numpy.unpackbits.html#numpy.unpackbits).

- In PACKED_BIT, why are ignored bits required to be zero?
- In PACKED_BIT, why are ignored bits recommended to be zero?

- To ensure the same data representation has the same encoding. For drivers supporting comparison operations, this
avoids comparing different unused bits.
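
To make the comparison concern concrete, here is a minimal sketch of two PACKED_BIT payloads that hold the same single bit but differ in their ignored bits. The helpers `same_bytes` and `same_bits` are hypothetical and only illustrate the two possible notions of equality; they assume the ignored bits are the least-significant bits of the final byte.

```python
def same_bytes(a: bytes, b: bytes) -> bool:
    """Compare two PACKED_BIT payloads byte for byte, ignored bits included."""
    return a == b


def same_bits(a: bytes, b: bytes, padding: int) -> bool:
    """Compare only the meaningful bits, masking out the ignored low bits."""
    if len(a) != len(b):
        return False
    if not a:
        return True
    keep = 0xFF & ~((1 << padding) - 1)  # high bits of the last byte that carry data
    return a[:-1] == b[:-1] and (a[-1] & keep) == (b[-1] & keep)


clean = bytes([0b10000000])  # one meaningful bit, 7 ignored bits are zero
dirty = bytes([0b11111111])  # same meaningful bit, 7 ignored bits are ones

print(same_bits(clean, dirty, padding=7))  # True
print(same_bytes(clean, dirty))            # False
```

If ignored bits are always zero, the two notions agree, so a plain byte comparison is sufficient.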

## Changelog

- 2025-06-23: In PACKED_BIT vectors, ignored bits SHOULD be zero; drivers MAY tolerate non-zero ignored bits for backwards compatibility. Prose tests added.

- 2025-04-08: In PACKED_BIT vectors, ignored bits must be zero.

- 2025-03-07: Update tests to use Extended JSON representation of +/-Infinity. (DRIVERS-3095)
60 changes: 60 additions & 0 deletions source/bson-binary-vector/tests/README.md
@@ -55,6 +55,66 @@ MUST assert that the input float array is the same after encoding and decoding.
- if the canonical_bson field is present, raise an exception when attempting to deserialize it into the corresponding
numeric values, as the field contains corrupted data.

## Prose Tests

### Treatment of non-zero ignored bits

All drivers MUST test encoding and decoding behavior according to their design and version. For drivers whose
implementation is not yet complete, raise exceptions in both cases. For completed drivers, adopt this behavior
according to semantic versioning rules and update tests accordingly.

In both cases, [255], a single-byte PACKED_BIT vector of length 1 (hence a padding of 7), is a good example to use,
as all of its bits are ones.

#### 1. Encoding

- Test encoding with non-zero ignored bits. Use the driver API that validates vector metadata.
- If the driver validates that ignored bits are zero (preferred), expect an error. Otherwise, expect the ignored bits
to be preserved.

```python
import pytest
from bson.binary import Binary, BinaryVectorDtype
with pytest.raises(ValueError):
    Binary.from_vector([0b11111111], BinaryVectorDtype.PACKED_BIT, padding=7)
```

#### 2. Decoding

- Test the behavior of your driver when decoding binary data with non-zero ignored bits into a vector.
- For example, as of pymongo 4.14, a warning is raised. From 5.0, it will be an exception.

```python
import pytest
from bson.binary import Binary

b = Binary(b'\x10\x07\xff', subtype=9)
with pytest.warns():
    Binary.as_vector(b)
```

Drivers MAY skip this test if they choose not to implement a `Vector` type.
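
For drivers deciding between the two decoding behaviors, the following is a minimal, library-independent sketch of a decoder that warns by default and raises when asked to be strict. `decode_packed_bit` and its `strict` flag are hypothetical names, not part of any driver API. The sketch assumes bits are unpacked most-significant first and omits the other metadata checks for brevity.

```python
import warnings


def decode_packed_bit(payload: bytes, strict: bool = False) -> list[int]:
    """Hypothetical decoder: unpack (dtype, padding, data) bytes into a list of bits."""
    dtype, padding, data = payload[0], payload[1], payload[2:]
    if dtype != 0x10:
        raise ValueError("not a PACKED_BIT vector")
    # The ignored bits are the low `padding` bits of the final byte.
    if padding and data and data[-1] & ((1 << padding) - 1):
        if strict:
            raise ValueError("non-zero ignored bits")
        warnings.warn("non-zero ignored bits", stacklevel=2)
    # Unpack every byte most-significant bit first, then drop the ignored tail.
    bits = [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]
    return bits[: len(bits) - padding]


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    bits = decode_packed_bit(b"\x10\x07\xff")

print(bits)         # [1]
print(len(caught))  # 1
```

Calling the same function with `strict=True` on this payload raises instead of warning, which corresponds to the preferred behavior.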

#### 3. Comparison

Once we can guarantee that all ignored bits are zero, equality can be tested on the binary subtype. Until then,
equality is ambiguous and depends on whether one compares by bits (uint1) or by bytes (uint8). Drivers SHOULD test
equality behavior according to their design and version.

For example, in `pymongo < 5.0`, we define equality of a BinaryVector by matching padding, dtype, and integer values.
This means that two single-bit vectors in which 7 bits are ignored are not equal unless all bits match, including the
ignored ones. This mirrors what the server does.

```python
from bson.binary import Binary, BinaryVectorDtype

b1 = Binary.from_vector([0b10000000], BinaryVectorDtype.PACKED_BIT, padding=7)
assert b1 == Binary(b'\x10\x07\x80', subtype=9) # This is effectively a roundtrip.
v1 = Binary.as_vector(b1)

b2 = Binary.from_vector([0b11111111], BinaryVectorDtype.PACKED_BIT, padding=7)
# Review comment (Contributor): I expect Binary.from_vector would raise an exception for
# non-zero ignored bits. Suggest instead directly doing:
#     b1 = Binary(b'\x10\x07\x80', subtype=9)
#     b2 = Binary(b'\x10\x07\xff', subtype=9)
# Reply (Contributor): Tracked in #1819.
assert b2 == Binary(b'\x10\x07\xff', subtype=9)
v2 = Binary.as_vector(b2)

assert b1 != b2 # Unequal at naive Binary level
assert v2 != v1 # Also chosen to be unequal at BinaryVector level as [255] != [128]
```

Drivers MAY skip this test if they choose not to implement a `Vector` type.

Review comment (Contributor), suggested change:

Original: Drivers MAY skip this test if they choose not to implement a `Vector` type.

Suggested: Drivers MAY skip this test if they choose not to implement a `Vector` type, or the type does not support comparison.

Suggest permitting drivers to skip this test if the Vector type does not support comparison. The C driver's `bson_vector_packed_bit_view_t` (which I expect is the closest analog to the Vector type) does not support comparison.

Reply (Contributor): Tracked in #1819.

## FAQ

- What MongoDB Server version does this apply to?
9 changes: 0 additions & 9 deletions source/bson-binary-vector/tests/packed_bit.json
@@ -29,15 +29,6 @@
"padding": 3,
"canonical_bson": "1600000005766563746F7200040000000910037F0800"
},
{
"description": "PACKED_BIT with inconsistent padding",
"valid": false,
"vector": [127, 7],
"dtype_hex": "0x10",
"dtype_alias": "PACKED_BIT",
"padding": 3,
"canonical_bson": "1600000005766563746F7200040000000910037F0700"
},
{
"description": "Empty Vector PACKED_BIT",
"valid": true,