Commit 067ae4e

docs: explain surrogate replacement
1 parent da49137 commit 067ae4e

1 file changed: +16 -0 lines changed


__test__/strings.spec.ts

@@ -17,6 +17,22 @@ describe("Binarypack", () => {
       expect(await packAndUnpack(v)).toEqual(v);
     }
   });
+
+  /**
+   * A Javascript string with unpaired surrogates is not actually valid
+   * UTF-16, and so it cannot be round-tripped to UTF-8 and back.
+   * The recommended way to handle this is to replace each unpaired surrogate
+   * with \uFFFD (the "replacement character").
+   *
+   * Note a surrogate pair means two adjacent Javascript characters where the
+   * first is in the range \uD800 - \uDBFF and the second is in the
+   * range \uDC00 - \uDFFF.
+   * To be valid UTF-16, Javascript characters from these ranges must *only*
+   * appear in surrogate pairs. An *unpaired* surrogate means any such
+   * Javascript character that is not paired up properly.
+   *
+   * https://github.com/peers/js-binarypack/issues/11#issuecomment-1445129237
+   */
   it("should replace unpaired surrogates", async () => {
     const v = "un\ud800paired\udfffsurrogates";
     const expected = v.replace(
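
The replacement rule described in the comment above can be sketched as a standalone function. This is only an illustration under stated assumptions: the name replaceUnpairedSurrogates is hypothetical, and the code is neither the library's implementation nor the truncated v.replace(...) call in the test. It walks the string one UTF-16 code unit at a time, keeps properly paired surrogates, and substitutes \uFFFD for any surrogate that is not part of a pair.

```typescript
// Hypothetical helper (not part of js-binarypack): replaces every unpaired
// surrogate code unit with U+FFFD and leaves valid surrogate pairs intact.
function replaceUnpairedSurrogates(input: string): string {
  const isHigh = (u: number): boolean => u >= 0xd800 && u <= 0xdbff;
  const isLow = (u: number): boolean => u >= 0xdc00 && u <= 0xdfff;

  let out = "";
  for (let i = 0; i < input.length; i++) {
    const unit = input.charCodeAt(i);
    if (isHigh(unit)) {
      // A high surrogate is valid only when a low surrogate follows it.
      if (i + 1 < input.length && isLow(input.charCodeAt(i + 1))) {
        out += input[i] + input[i + 1];
        i++; // skip the low half of the pair we just copied
      } else {
        out += "\ufffd";
      }
    } else if (isLow(unit)) {
      // A low surrogate reached here was not preceded by a high surrogate.
      out += "\ufffd";
    } else {
      out += input[i];
    }
  }
  return out;
}

// The string from the test: both surrogates are unpaired.
const sample = "un\ud800paired\udfffsurrogates";
const cleaned = replaceUnpairedSurrogates(sample);

// A valid pair (U+1F600) passes through unchanged.
const emoji = replaceUnpairedSurrogates("ok \ud83d\ude00");
```

With the test's input, each of the two lone surrogates becomes a single \uFFFD, so the string length stays the same and the result can be encoded to UTF-8 and decoded back without further change.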
