File: docs/string-representation.md
Lines changed: 20 additions & 7 deletions
@@ -59,13 +59,13 @@ A given ICU4X operation must always provide a version that accepts potentially i
This is what Rust uses natively. C and C++ programmers who want maximum performance and are confident in the correctness of their code at a particular call site could use this for specific things. For input, the type is `&str`, which is idiomatic Rust. For output, the type is `&mut str`, which isn't quite idiomatic Rust. (In particular, the language currently doesn't allow materializing a zero-initialized stack-allocated `&mut str` without `unsafe`, even though there is no fundamental reason why this operation could not be provided in the language without `unsafe`.) The pointer types in the C and C++ headers are the same as in the potentially-invalid case.
-In the Rust context, the name annotation is `_str`. In the FFI context on the input side, the name annotation is `_utf8_unsafe`. The point of the different annotations is that on the Rust side, thanks to the guarantees of Rust, calling these functions is safe. However, when called from a language that doesn't have Rust's guarantees as part of the language, it is up to the programmer to ensure that the input is valid UTF-8. Otherwise, Undefined Behavior ensues. Hence the "unsafe" designation in FFI, i.e. the C API. (This type is irrelevant in the FFI context on the output side.) It is
+In the Rust context, the name annotation is `_str`. In the FFI context on the input side, the name annotation is `_utf8_unsafe`. The point of the different annotations is that on the Rust side, thanks to the guarantees of Rust, calling these functions is safe. However, when called from a language that doesn't have Rust's guarantees as part of the language, it is up to the programmer to ensure that the input is valid UTF-8. Otherwise, Undefined Behavior ensues. Hence the "unsafe" designation in FFI, i.e. the C API. (This type is irrelevant in the FFI context on the output side.)
On the input side, `&str` differs from `&[u8]` holding potentially-invalid UTF-8 in performance. The former can be iterated over by Unicode scalar value without having to have branches that take into account invalid byte sequences. In this sense, the distinction between the function that takes `&str` and a function that takes UTF-8 `&[u8]` is one of performance. Correctness-wise, it is always okay for the function that takes `&str` to call `as_bytes()` on the argument and then delegate to the function that takes `&[u8]`. When performance of the operation is more important than minimizing binary size, the function that takes `&str` should make use of the knowledge that there are no ill-formed sequences.
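The delegation described above can be sketched as follows. This is a minimal illustration, not ICU4X code: the operation (counting scalar values) and all three function names are hypothetical, and the potentially-invalid variant is assumed to treat each ill-formed sequence as one U+FFFD, which is what `String::from_utf8_lossy` does.

```rust
/// Hypothetical operation on potentially-invalid UTF-8: counts Unicode scalar
/// values, with each ill-formed sequence counted as a single U+FFFD.
fn count_scalars_utf8(input: &[u8]) -> usize {
    // from_utf8_lossy has to branch on ill-formed sequences while decoding.
    String::from_utf8_lossy(input).chars().count()
}

/// Size-optimized `&str` variant: always correct, because every `&str`
/// is also valid input for the `&[u8]` version.
fn count_scalars_str(input: &str) -> usize {
    count_scalars_utf8(input.as_bytes())
}

/// Speed-optimized `&str` variant: relies on the `&str` invariant, so the
/// iteration needs no handling of ill-formed sequences.
fn count_scalars_str_fast(input: &str) -> usize {
    input.chars().count()
}
```

Both `&str` variants return the same result for any input. The choice between them is only a trade-off between binary size and speed.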
-On the output side, the function that takes `&mut str` always delegates to the version that takes `&[u8]` and then runs a trailing loop that zeros bytes after the last byte written by the delegate function until either the end of the slice or a UTF-8 lead byte is reached. This is enough to uphold the invariant of `&mut str`.
+On the output side, the function that takes `&mut str` always delegates to the version that takes `&mut [u8]` and then runs a trailing loop that zeros bytes after the last byte written by the delegate function until either the end of the slice or a UTF-8 lead byte is reached. This is enough to uphold the invariant of `&mut str`.
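A sketch of the trailing zeroing loop, under stated assumptions: the operation (copying with ASCII uppercasing) and both function names are hypothetical, and the delegate is assumed to write only whole scalar values and to return the number of bytes written. "Lead byte" is taken here to mean any byte that is not a continuation byte (`0b10xxxxxx`), which includes ASCII.

```rust
/// Hypothetical delegate with `&mut [u8]` output. Writes only whole scalar
/// values and returns the number of bytes written.
fn copy_upper_utf8(src: &str, dst: &mut [u8]) -> usize {
    let mut written = 0;
    for c in src.chars() {
        let c = c.to_ascii_uppercase();
        let len = c.len_utf8();
        if written + len > dst.len() {
            break; // Never write a partial sequence.
        }
        c.encode_utf8(&mut dst[written..written + len]);
        written += len;
    }
    written
}

/// Hypothetical `&mut str` wrapper around the delegate.
fn copy_upper_str(src: &str, dst: &mut str) -> usize {
    // SAFETY: the delegate writes only complete UTF-8 sequences, and the loop
    // below removes any orphaned continuation bytes left over from the old
    // contents, so `dst` is valid UTF-8 again before this borrow ends.
    let bytes = unsafe { dst.as_bytes_mut() };
    let written = copy_upper_utf8(src, bytes);
    let mut i = written;
    // Zero bytes until the end of the slice or the next non-continuation byte.
    while i < bytes.len() && (bytes[i] & 0xC0) == 0x80 {
        bytes[i] = 0;
        i += 1;
    }
    written
}
```

For example, writing `"abc"` into a buffer that previously held `"ééé"` overwrites the first `é` and half of the second. The loop zeros the one orphaned continuation byte and stops at the lead byte of the third `é`, which stays intact.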
-A given ICU4X operation that outputs text must always provide a version whose input type is `&str` and output type is `&[u8]` (this one gets exposed via FFI as `_unsafe_utf8`), a version whose input type is `&str` and output type is `&mut str`, and a version whose input type is `&str` and that returns a `String` instead of having an output argument. The last two are always implemented as thin mechanical wrappers around the first one. If performance considerations permit, the first one may just delegate to the version that takes `&[u8]` input. The function that returns a `String` has no name annotation.
+A given ICU4X operation that outputs text must always provide a version whose input type is `&str` and output type is `&mut [u8]` (this one gets exposed via FFI as `_unsafe_utf8`), a version whose input type is `&str` and output type is `&mut str`, and a version whose input type is `&str` and that returns a `String` instead of having an output argument. The last two are always implemented as thin mechanical wrappers around the first one. If performance considerations permit, the first one may just delegate to the version that takes `&[u8]` input. The function that returns a `String` has no name annotation.
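As an illustration of the `String`-returning wrapper, here is a minimal sketch. The operation (ASCII lowercasing) and the names `lowercase_str_to_utf8` and `lowercase` are hypothetical and are not taken from ICU4X. The sketch assumes the first version returns the number of bytes written and that the output never exceeds the input length. The `&mut str` wrapper is omitted; it would call the same first version and then zero any trailing continuation bytes.

```rust
/// Hypothetical first version: `&str` input, `&mut [u8]` output.
/// Writes only whole scalar values and returns the number of bytes written.
fn lowercase_str_to_utf8(src: &str, dst: &mut [u8]) -> usize {
    let mut written = 0;
    for c in src.chars() {
        let c = c.to_ascii_lowercase();
        let len = c.len_utf8();
        if written + len > dst.len() {
            break; // Stop before a scalar value that does not fit.
        }
        c.encode_utf8(&mut dst[written..written + len]);
        written += len;
    }
    written
}

/// Hypothetical `String`-returning version (no name annotation):
/// a thin mechanical wrapper around the first version.
fn lowercase(src: &str) -> String {
    // Assumption for this operation: output length never exceeds input length.
    let mut buf = vec![0u8; src.len()];
    let written = lowercase_str_to_utf8(src, &mut buf);
    buf.truncate(written);
    // A checked conversion keeps the sketch free of `unsafe`; a real wrapper
    // might skip the validation given the delegate's guarantee.
    String::from_utf8(buf).expect("delegate writes only whole scalar values")
}
```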