|
12 | 12 | //!
|
13 | 13 | //! This module provides utilities to handle data across non-Rust
|
14 | 14 | //! interfaces, like other programming languages and the underlying
|
15 |
| -//! operating system. It is mainly of use for FFI (Foreign Function |
| 15 | +//! operating system. It is mainly of use for FFI (Foreign Function |
16 | 16 | //! Interface) bindings and code that needs to exchange C-like strings
|
17 | 17 | //! with other languages.
|
18 | 18 | //!
|
19 | 19 | //! # Overview
|
20 | 20 | //!
|
21 | 21 | //! Rust represents owned strings with the [`String`] type, and
|
22 |
| -//! borrowed slices of strings with the [`str`] primitive. Both are |
| 22 | +//! borrowed slices of strings with the [`str`] primitive. Both are |
23 | 23 | //! always in UTF-8 encoding, and may contain nul bytes in the middle,
|
24 | 24 | //! i.e. if you look at the bytes that make up the string, there may
|
25 |
| -//! be a `\0` among them. Both `String` and `str` store their length |
| 25 | +//! be a `\0` among them. Both `String` and `str` store their length |
26 | 26 | //! explicitly; there are no nul terminators at the end of strings
|
27 | 27 | //! like in C.
|
28 | 28 | //!
|
29 | 29 | //! C strings are different from Rust strings:
|
30 | 30 | //!
|
31 | 31 | //! * **Encodings** - Rust strings are UTF-8, but C strings may use
|
32 |
| -//! other encodings. If you are using a string from C, you should |
| 32 | +//! other encodings. If you are using a string from C, you should |
33 | 33 | //! check its encoding explicitly, rather than just assuming that it
|
34 | 34 | //! is UTF-8 like you can do in Rust.
|
35 | 35 | //!
|
36 | 36 | //! * **Character size** - C strings may use `char` or `wchar_t`-sized
|
37 | 37 | //! characters; please **note** that C's `char` is different from Rust's.
|
38 | 38 | //! The C standard leaves the actual sizes of those types open to
|
39 | 39 | //! interpretation, but defines different APIs for strings made up of
|
40 |
| -//! each character type. Rust strings are always UTF-8, so different |
| 40 | +//! each character type. Rust strings are always UTF-8, so different |
41 | 41 | //! Unicode characters will be encoded in a variable number of bytes
|
42 |
| -//! each. The Rust type [`char`] represents a '[Unicode scalar |
| 42 | +//! each. The Rust type [`char`] represents a '[Unicode scalar |
43 | 43 | //! value]', which is similar to, but not the same as, a '[Unicode
|
44 | 44 | //! code point]'.
|
45 | 45 | //!
|
46 | 46 | //! * **Nul terminators and implicit string lengths** - Often, C
|
47 | 47 | //! strings are nul-terminated, i.e. they have a `\0` character at the
|
48 |
| -//! end. The length of a string buffer is not stored, but has to be |
| 48 | +//! end. The length of a string buffer is not stored, but has to be |
49 | 49 | //! calculated; to compute the length of a string, C code must
|
50 | 50 | //! manually call a function like `strlen()` for `char`-based strings,
|
51 |
| -//! or `wcslen()` for `wchar_t`-based ones. Those functions return |
| 51 | +//! or `wcslen()` for `wchar_t`-based ones. Those functions return |
52 | 52 | //! the number of characters in the string excluding the nul
|
53 | 53 | //! terminator, so the buffer length is really `len+1` characters.
|
54 | 54 | //! Rust strings don't have a nul terminator; their length is always
|
55 |
| -//! stored and does not need to be calculated. While in Rust |
| 55 | +//! stored and does not need to be calculated. While in Rust |
56 | 56 | //! accessing a string's length is an O(1) operation (because the
|
57 | 57 | //! length is stored), in C it is an O(length) operation because the
|
58 | 58 | //! length needs to be computed by scanning the string for the nul
|
|
61 | 61 | //! * **Internal nul characters** - When C strings have a nul
|
62 | 62 | //! terminator character, this usually means that they cannot have nul
|
63 | 63 | //! characters in the middle — a nul character would essentially
|
64 |
| -//! truncate the string. Rust strings *can* have nul characters in |
| 64 | +//! truncate the string. Rust strings *can* have nul characters in |
65 | 65 | //! the middle, because nul does not have to mark the end of the
|
66 | 66 | //! string in Rust.
|
67 | 67 | //!
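The differences listed above can be checked with a few standard-library calls. This is a minimal sketch, not part of the patch: the helper names `byte_and_char_len` and `to_c_string` are hypothetical, and `CString` is assumed to be the owned C string type that this module documents.

```rust
use std::ffi::CString;

/// Returns (length in bytes, number of `char`s) for a Rust string slice.
/// Illustrative helper only; it is not a std API.
fn byte_and_char_len(s: &str) -> (usize, usize) {
    // `len()` is O(1) because the length is stored next to the pointer.
    // `chars().count()` has to walk the UTF-8 data, so it is O(n).
    (s.len(), s.chars().count())
}

/// Tries to build a nul-terminated C string from a Rust string.
/// On failure, returns the byte position of the first interior nul.
fn to_c_string(s: &str) -> Result<CString, usize> {
    CString::new(s).map_err(|e| e.nul_position())
}
```

For instance, `byte_and_char_len("héllo")` reports more bytes than `char`s because `é` takes two bytes in UTF-8, and `to_c_string("a\0b")` fails because the interior nul would truncate the string on the C side.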
|
|
80 | 80 | //!
|
81 | 81 | //! * **From C to Rust:** [`CStr`] represents a borrowed C string; it
|
82 | 82 | //! is what you would use to wrap a raw `*const c_char` that you got from
|
83 |
| -//! a C function. A `CStr` is guaranteed to be a nul-terminated array |
84 |
| -//! of bytes. Once you have a `CStr`, you can convert it to a Rust |
| 83 | +//! a C function. A `CStr` is guaranteed to be a nul-terminated array |
| 84 | +//! of bytes. Once you have a `CStr`, you can convert it to a Rust |
85 | 85 | //! `&str` if it's valid UTF-8, or lossily convert it by adding
|
86 | 86 | //! replacement characters.
|
87 | 87 | //!
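The two conversions described for `CStr` might look like the sketch below. To stay free of `unsafe`, it starts from a byte slice via `CStr::from_bytes_with_nul` rather than from a raw pointer; with a real pointer from C you would use the unsafe `CStr::from_ptr` instead. The function names are hypothetical.

```rust
use std::borrow::Cow;
use std::ffi::CStr;

/// Strict conversion: `Some` only if `bytes` ends in exactly one nul
/// (with none in the middle) and the contents are valid UTF-8.
fn c_bytes_to_str(bytes: &[u8]) -> Option<&str> {
    let c_str = CStr::from_bytes_with_nul(bytes).ok()?;
    c_str.to_str().ok()
}

/// Lossy conversion: invalid UTF-8 sequences are replaced with
/// U+FFFD REPLACEMENT CHARACTER instead of causing an error.
fn c_bytes_to_string_lossy(bytes: &[u8]) -> Option<Cow<'_, str>> {
    let c_str = CStr::from_bytes_with_nul(bytes).ok()?;
    Some(c_str.to_string_lossy())
}
```

The lossy variant returns a `Cow`, so it borrows when the input is already valid UTF-8 and allocates only when it has to insert replacement characters.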
|
88 | 88 | //! [`OsString`] and [`OsStr`] are useful when you need to transfer
|
89 | 89 | //! strings to and from the operating system itself, or when capturing
|
90 |
| -//! the output of external commands. Conversions between `OsString`, |
| 90 | +//! the output of external commands. Conversions between `OsString`, |
91 | 91 | //! `OsStr` and Rust strings work similarly to those for [`CString`]
|
92 | 92 | //! and [`CStr`].
|
93 | 93 | //!
|
94 | 94 | //! * [`OsString`] represents an owned string in whatever
|
95 |
| -//! representation the operating system prefers. In the Rust standard |
| 95 | +//! representation the operating system prefers. In the Rust standard |
96 | 96 | //! library, various APIs that transfer strings to/from the operating
|
97 |
| -//! system use `OsString` instead of plain strings. For example, |
| 97 | +//! system use `OsString` instead of plain strings. For example, |
98 | 98 | //! [`env::var_os()`] is used to query environment variables; it
|
99 |
| -//! returns an `Option<OsString>`. If the environment variable exists |
| 99 | +//! returns an `Option<OsString>`. If the environment variable exists |
100 | 100 | //! you will get a `Some(os_string)`, which you can *then* try to
|
101 |
| -//! convert to a Rust string. This yields a [`Result<>`], so that |
| 101 | +//! convert to a Rust string. This yields a [`Result<>`], so that |
102 | 102 | //! your code can detect errors in case the environment variable did
|
103 | 103 | //! not in fact contain valid Unicode data.
|
104 | 104 | //!
|
105 | 105 | //! * [`OsStr`] represents a borrowed reference to a string in a
|
106 |
| -//! format that can be passed to the operating system. It can be |
| 106 | +//! format that can be passed to the operating system. It can be |
107 | 107 | //! converted into a UTF-8 Rust string slice in a similar way to
|
108 | 108 | //! `OsString`.
|
109 | 109 | //!
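The `env::var_os()` flow described above can be sketched as follows. The helper names are hypothetical; the std calls used are `env::var_os`, `OsString::into_string` (which returns the original `OsString` as the error value) and `OsStr::to_str`.

```rust
use std::env;
use std::ffi::{OsStr, OsString};

/// Looks up an environment variable and tries to convert it to a `String`.
/// - `None`: the variable is not set.
/// - `Some(Ok(s))`: the variable is set and holds valid Unicode.
/// - `Some(Err(raw))`: the variable is set but is not valid Unicode;
///   the untouched `OsString` is handed back to the caller.
fn env_as_string(name: &str) -> Option<Result<String, OsString>> {
    env::var_os(name).map(OsString::into_string)
}

/// Borrowed counterpart: `Some` only if the OS string is valid Unicode.
fn os_str_as_str(s: &OsStr) -> Option<&str> {
    s.to_str()
}
```

Keeping the failed `OsString` in the `Err` case means the caller can still pass the raw value back to the operating system even when it cannot be shown as a Rust string.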
|
|
125 | 125 | //!
|
126 | 126 | //! On Windows, [`OsStr`] implements the
|
127 | 127 | //! `std::os::windows::ffi::`[`OsStrExt`][windows.OsStrExt] trait,
|
128 |
| -//! which provides an [`encode_wide`] method. This provides an |
| 128 | +//! which provides an [`encode_wide`] method. This provides an |
129 | 129 | //! iterator that can be [`collect`]ed into a vector of [`u16`].
|
130 | 130 | //!
|
131 | 131 | //! Additionally, on Windows [`OsString`] implements the
|
132 | 132 | //! `std::os::windows::ffi::`[`OsStringExt`][windows.OsStringExt]
|
133 |
| -//! trait, which provides a [`from_wide`] method. The result of this |
| 133 | +//! trait, which provides a [`from_wide`] method. The result of this |
134 | 134 | //! method is an `OsString` which can be round-tripped to a Windows
|
135 | 135 | //! string losslessly.
|
136 | 136 | //!
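A round trip through the two Windows extension traits could look like the sketch below. It only compiles on Windows, so it is wrapped in `#[cfg(windows)]`; the module name `wide` and its helper functions are hypothetical.

```rust
#[cfg(windows)]
mod wide {
    use std::ffi::{OsStr, OsString};
    use std::os::windows::ffi::{OsStrExt, OsStringExt};

    /// Collects the UTF-16 code units of an OS string.
    /// No nul terminator is appended; add one yourself before passing
    /// the buffer to a Windows API that expects it.
    pub fn to_wide(s: &OsStr) -> Vec<u16> {
        s.encode_wide().collect()
    }

    /// Rebuilds an `OsString` from 16-bit code units.
    pub fn from_wide(units: &[u16]) -> OsString {
        OsString::from_wide(units)
    }
}
```

On Windows, `wide::from_wide(&wide::to_wide(s))` should compare equal to the original `s`, which is the lossless round trip the text describes.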
|
|