|
12 | 12 | //! |
13 | 13 | //! This module provides utilities to handle data across non-Rust |
14 | 14 | //! interfaces, like other programming languages and the underlying |
15 | | -//! operating system.  It is mainly of use for FFI (Foreign Function |
| 15 | +//! operating system. It is mainly of use for FFI (Foreign Function |
16 | 16 | //! Interface) bindings and code that needs to exchange C-like strings |
17 | 17 | //! with other languages. |
18 | 18 | //! |
19 | 19 | //! # Overview |
20 | 20 | //! |
21 | 21 | //! Rust represents owned strings with the [`String`] type, and |
22 | | -//! borrowed slices of strings with the [`str`] primitive.  Both are |
| 22 | +//! borrowed slices of strings with the [`str`] primitive. Both are |
23 | 23 | //! always in UTF-8 encoding, and may contain nul bytes in the middle, |
24 | 24 | //! i.e. if you look at the bytes that make up the string, there may |
25 | | -//! be a `\0` among them.  Both `String` and `str` store their length |
| 25 | +//! be a `\0` among them. Both `String` and `str` store their length |
26 | 26 | //! explicitly; there are no nul terminators at the end of strings |
27 | 27 | //! like in C. |
28 | 28 | //! |
29 | 29 | //! C strings are different from Rust strings: |
30 | 30 | //! |
31 | 31 | //! * **Encodings** - Rust strings are UTF-8, but C strings may use |
32 | | -//! other encodings.  If you are using a string from C, you should |
| 32 | +//! other encodings. If you are using a string from C, you should |
33 | 33 | //! check its encoding explicitly, rather than just assuming that it |
34 | 34 | //! is UTF-8 like you can do in Rust. |
35 | 35 | //! |
36 | 36 | //! * **Character size** - C strings may use `char` or `wchar_t`-sized |
37 | 37 | //! characters; please **note** that C's `char` is different from Rust's. |
38 | 38 | //! The C standard leaves the actual sizes of those types open to |
39 | 39 | //! interpretation, but defines different APIs for strings made up of |
40 | | -//! each character type.  Rust strings are always UTF-8, so different |
| 40 | +//! each character type. Rust strings are always UTF-8, so different |
41 | 41 | //! Unicode characters will be encoded in a variable number of bytes |
42 | | -//! each.  The Rust type [`char`] represents a '[Unicode scalar |
| 42 | +//! each. The Rust type [`char`] represents a '[Unicode scalar |
43 | 43 | //! value]', which is similar to, but not the same as, a '[Unicode |
44 | 44 | //! code point]'. |
45 | 45 | //! |
46 | 46 | //! * **Nul terminators and implicit string lengths** - Often, C |
47 | 47 | //! strings are nul-terminated, i.e. they have a `\0` character at the |
48 | | -//! end.  The length of a string buffer is not stored, but has to be |
| 48 | +//! end. The length of a string buffer is not stored, but has to be |
49 | 49 | //! calculated; to compute the length of a string, C code must |
50 | 50 | //! manually call a function like `strlen()` for `char`-based strings, |
51 | | -//! or `wcslen()` for `wchar_t`-based ones.  Those functions return |
| 51 | +//! or `wcslen()` for `wchar_t`-based ones. Those functions return |
52 | 52 | //! the number of characters in the string excluding the nul |
53 | 53 | //! terminator, so the buffer length is really `len+1` characters. |
54 | 54 | //! Rust strings don't have a nul terminator; their length is always |
55 | | -//! stored and does not need to be calculated.  While in Rust |
| 55 | +//! stored and does not need to be calculated. While in Rust |
56 | 56 | //! accessing a string's length is an O(1) operation (because the |
57 | 57 | //! length is stored), in C it is an O(length) operation because the |
58 | 58 | //! length needs to be computed by scanning the string for the nul |
|
61 | 61 | //! * **Internal nul characters** - When C strings have a nul |
62 | 62 | //! terminator character, this usually means that they cannot have nul |
63 | 63 | //! characters in the middle — a nul character would essentially |
64 | | -//! truncate the string.  Rust strings *can* have nul characters in |
| 64 | +//! truncate the string. Rust strings *can* have nul characters in |
65 | 65 | //! the middle, because nul does not have to mark the end of the |
66 | 66 | //! string in Rust. |
67 | 67 | //! |
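The points above about nul terminators and interior nul characters can be illustrated with a minimal sketch. `c_buffer_len` is a hypothetical helper invented for this example, not part of the module under review; it relies only on `CString::new`, `as_bytes_with_nul`, and `NulError::nul_position` from the standard library.

```rust
use std::ffi::{CString, NulError};

/// Hypothetical helper: number of bytes a C buffer needs to hold `s`,
/// including the trailing nul terminator.
fn c_buffer_len(s: &str) -> Result<usize, NulError> {
    // `CString::new` appends the terminator and rejects interior nul bytes.
    let c = CString::new(s)?;
    Ok(c.as_bytes_with_nul().len())
}

fn main() {
    // The Rust length is stored and does not count any terminator.
    assert_eq!("hello".len(), 5);
    // The equivalent C buffer is `len + 1` bytes long.
    assert_eq!(c_buffer_len("hello").unwrap(), 6);

    // A Rust string may contain a nul in the middle...
    let with_nul = "he\0llo";
    assert_eq!(with_nul.len(), 6);
    // ...but converting it to a C string fails and reports where the nul is.
    let err = c_buffer_len(with_nul).unwrap_err();
    assert_eq!(err.nul_position(), 2);
}
```

Returning the `NulError` rather than truncating keeps the caller aware that data would otherwise be lost at the first nul.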
|
80 | 80 | //! |
81 | 81 | //! * **From C to Rust:** [`CStr`] represents a borrowed C string; it |
82 | 82 | //! is what you would use to wrap a raw `*const c_char` that you got from |
83 | | -//! a C function.  A `CStr` is guaranteed to be a nul-terminated array |
84 | | -//! of bytes.  Once you have a `CStr`, you can convert it to a Rust |
| 83 | +//! a C function. A `CStr` is guaranteed to be a nul-terminated array |
| 84 | +//! of bytes. Once you have a `CStr`, you can convert it to a Rust |
85 | 85 | //! `&str` if it's valid UTF-8, or lossily convert it by adding |
86 | 86 | //! replacement characters. |
87 | 87 | //! |
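As a rough sketch of the C-to-Rust direction described above: the helper `c_to_rust` below is hypothetical and assumes the pointer is non-null and nul-terminated. In real code the pointer would come from a C function; here a byte literal stands in for it.

```rust
use std::borrow::Cow;
use std::ffi::CStr;
use std::os::raw::c_char;

/// Hypothetical helper: view a C string as Rust text, replacing any
/// invalid UTF-8 sequences with U+FFFD.
///
/// # Safety
/// `ptr` must be non-null and point to a nul-terminated buffer that
/// stays valid and unmodified for the lifetime `'a`.
unsafe fn c_to_rust<'a>(ptr: *const c_char) -> Cow<'a, str> {
    let c_str: &'a CStr = unsafe { CStr::from_ptr(ptr) };
    // Borrows when the bytes are valid UTF-8, allocates otherwise.
    c_str.to_string_lossy()
}

fn main() {
    // Valid UTF-8 converts without loss.
    let valid = b"hello\0";
    let s = unsafe { c_to_rust(valid.as_ptr() as *const c_char) };
    assert_eq!(s, "hello");

    // An invalid byte fails the strict conversion...
    let invalid = b"hi\xFF\0";
    let strict = CStr::from_bytes_with_nul(invalid).unwrap().to_str();
    assert!(strict.is_err());
    // ...and becomes a replacement character in the lossy one.
    let lossy = unsafe { c_to_rust(invalid.as_ptr() as *const c_char) };
    assert_eq!(lossy, "hi\u{FFFD}");
}
```

Use `to_str` when invalid data should be treated as an error, and `to_string_lossy` when a best-effort rendering is acceptable.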
88 | 88 | //! [`OsString`] and [`OsStr`] are useful when you need to transfer |
89 | 89 | //! strings to and from the operating system itself, or when capturing |
90 | | -//! the output of external commands.  Conversions between `OsString`, |
| 90 | +//! the output of external commands. Conversions between `OsString`, |
91 | 91 | //! `OsStr` and Rust strings work similarly to those for [`CString`] |
92 | 92 | //! and [`CStr`]. |
93 | 93 | //! |
94 | 94 | //! * [`OsString`] represents an owned string in whatever |
95 | | -//! representation the operating system prefers.  In the Rust standard |
| 95 | +//! representation the operating system prefers. In the Rust standard |
96 | 96 | //! library, various APIs that transfer strings to/from the operating |
97 | | -//! system use `OsString` instead of plain strings.  For example, |
| 97 | +//! system use `OsString` instead of plain strings. For example, |
98 | 98 | //! [`env::var_os()`] is used to query environment variables; it |
99 | | -//! returns an `Option<OsString>`.  If the environment variable exists |
| 99 | +//! returns an `Option<OsString>`. If the environment variable exists |
100 | 100 | //! you will get a `Some(os_string)`, which you can *then* try to |
101 | | -//! convert to a Rust string.  This yields a [`Result<>`], so that |
| 101 | +//! convert to a Rust string. This yields a [`Result<>`], so that |
102 | 102 | //! your code can detect errors in case the environment variable did |
103 | 103 | //! not in fact contain valid Unicode data. |
104 | 104 | //! |
105 | 105 | //! * [`OsStr`] represents a borrowed reference to a string in a |
106 | | -//! format that can be passed to the operating system.  It can be |
| 106 | +//! format that can be passed to the operating system. It can be |
107 | 107 | //! converted into a UTF-8 Rust string slice in a similar way to |
108 | 108 | //! `OsString`. |
109 | 109 | //! |
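The `env::var_os()` flow described above might look like the following sketch. `describe` is a hypothetical helper, and `PATH` is only an assumed example of a variable that is usually set; `OsString::into_string` returns the original `OsString` in the `Err` case when the data is not valid Unicode.

```rust
use std::env;
use std::ffi::OsString;

/// Hypothetical helper: report whether an optional OS string is unset,
/// valid Unicode, or something else.
fn describe(value: Option<OsString>) -> String {
    match value {
        None => "unset".to_string(),
        // `into_string` yields `Result<String, OsString>`.
        Some(os) => match os.into_string() {
            Ok(s) => format!("unicode: {}", s),
            Err(raw) => format!("not unicode: {}", raw.to_string_lossy()),
        },
    }
}

fn main() {
    // `env::var_os` returns `Option<OsString>`.
    println!("{}", describe(env::var_os("PATH")));

    assert_eq!(describe(None), "unset");
    assert_eq!(describe(Some(OsString::from("abc"))), "unicode: abc");
}
```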
|
125 | 125 | //! |
126 | 126 | //! On Windows, [`OsStr`] implements the |
127 | 127 | //! `std::os::windows::ffi::`[`OsStrExt`][windows.OsStrExt] trait, |
128 | | -//! which provides an [`encode_wide`] method.  This provides an |
| 128 | +//! which provides an [`encode_wide`] method. This provides an |
129 | 129 | //! iterator that can be [`collect`]ed into a vector of [`u16`]. |
130 | 130 | //! |
131 | 131 | //! Additionally, on Windows [`OsString`] implements the |
132 | 132 | //! `std::os::windows::ffi::`[`OsStringExt`][windows.OsStringExt] |
133 | | -//! trait, which provides a [`from_wide`] method.  The result of this |
| 133 | +//! trait, which provides a [`from_wide`] method. The result of this |
134 | 134 | //! method is an `OsString` which can be round-tripped to a Windows |
135 | 135 | //! string losslessly. |
136 | 136 | //! |
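The Windows round trip described above could be sketched as follows. The wrapper functions `to_wide` and `from_wide` are hypothetical names for this example. Because the extension traits exist only on Windows targets, the `#[cfg(not(windows))]` variants are lossy stand-ins added purely so the sketch compiles elsewhere; they are an assumption of this example and not an equivalent of the Windows behaviour.

```rust
use std::ffi::{OsStr, OsString};

/// Hypothetical wrapper around `OsStrExt::encode_wide`.
#[cfg(windows)]
fn to_wide(s: &OsStr) -> Vec<u16> {
    use std::os::windows::ffi::OsStrExt;
    // `encode_wide` returns an iterator of `u16` units.
    s.encode_wide().collect()
}

/// Hypothetical wrapper around `OsStringExt::from_wide`.
#[cfg(windows)]
fn from_wide(units: &[u16]) -> OsString {
    use std::os::windows::ffi::OsStringExt;
    OsString::from_wide(units)
}

/// Stand-in for other platforms (assumption): goes through UTF-16 and
/// is lossy for non-Unicode data.
#[cfg(not(windows))]
fn to_wide(s: &OsStr) -> Vec<u16> {
    s.to_string_lossy().encode_utf16().collect()
}

/// Stand-in for other platforms (assumption), lossy for invalid UTF-16.
#[cfg(not(windows))]
fn from_wide(units: &[u16]) -> OsString {
    OsString::from(String::from_utf16_lossy(units))
}

fn main() {
    let original = OsString::from("héllo");
    let wide = to_wide(&original);
    // Converting back yields the string we started with.
    assert_eq!(from_wide(&wide), original);
}
```

When passing the vector to a Windows API that expects a nul-terminated wide string, the caller would still need to append a `0` unit, since `encode_wide` does not add one.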
|