|
8 | 8 | // option. This file may not be copied, modified, or distributed
|
9 | 9 | // except according to those terms.
|
10 | 10 |
|
11 |
| -//! This module provides utilities to handle C-like strings. It is |
12 |
| -//! mainly of use for FFI (Foreign Function Interface) bindings and |
13 |
| -//! code that needs to exchange C-like strings with other languages. |
| 11 | +//! This module provides utilities to handle data across non-Rust |
| 12 | +//! interfaces, like other programming languages and the underlying |
| 13 | +//! operating system. It is mainly of use for FFI (Foreign Function |
| 14 | +//! Interface) bindings and code that needs to exchange C-like strings |
| 15 | +//! with other languages. |
14 | 16 | //!
|
15 | 17 | //! # Overview
|
16 | 18 | //!
|
17 | 19 | //! Rust represents owned strings with the [`String`] type, and
|
18 | 20 | //! borrowed slices of strings with the [`str`] primitive. Both are
|
19 | 21 | //! always in UTF-8 encoding, and may contain nul bytes in the middle,
|
20 | 22 | //! i.e. if you look at the bytes that make up the string, there may
|
21 |
| -//! be a `0` among them. Both `String` and `str` know their length; |
22 |
| -//! there are no nul terminators at the end of strings like in C. |
| 23 | +//! be a `\0` among them. Both `String` and `str` store their length |
| 24 | +//! explicitly; there are no nul terminators at the end of strings |
| 25 | +//! like in C. |
23 | 26 | //!
|
24 | 27 | //! C strings are different from Rust strings:
|
25 | 28 | //!
|
26 |
| -//! * **Encodings** - C strings may have different encodings. If |
27 |
| -//! you are bringing in strings from C APIs, you should check what |
28 |
| -//! encoding you are getting. Rust strings are always UTF-8. |
| 29 | +//! * **Encodings** - Rust strings are UTF-8, but C strings may use |
| 30 | +//! other encodings. If you are using a string from C, you should |
| 31 | +//! check its encoding explicitly, rather than just assuming that it |
| 32 | +//! is UTF-8 like you can do in Rust. |
29 | 33 | //!
|
30 |
| -//! * **Character width** - C strings may use "normal" or "wide" |
31 |
| -//! characters, i.e. `char` or `wchar_t`, respectively. The C |
32 |
| -//! standard leaves the actual sizes of those types open to |
| 34 | +//! * **Character size** - C strings may use `char` or `wchar_t`-sized |
| 35 | +//! characters; please **note** that C's `char` is different from Rust's. |
| 36 | +//! The C standard leaves the actual sizes of those types open to |
33 | 37 | //! interpretation, but defines different APIs for strings made up of
|
34 | 38 | //! each character type. Rust strings are always UTF-8, so different
|
35 | 39 | //! Unicode characters will be encoded in a variable number of bytes
|
36 |
| -//! each. The Rust type [`char`] represents a '[Unicode |
37 |
| -//! scalar value]', which is similar to, but not the same as, a |
38 |
| -//! '[Unicode code point]'. |
| 40 | +//! each. The Rust type [`char`] represents a '[Unicode scalar |
| 41 | +//! value]', which is similar to, but not the same as, a '[Unicode |
| 42 | +//! code point]'. |
39 | 43 | //!
|
40 | 44 | //! * **Nul terminators and implicit string lengths** - Often, C
|
41 |
| -//! strings are nul-terminated, i.e. they have a `0` character at the |
42 |
| -//! end. The length of a string buffer is not known *a priori*; |
43 |
| -//! instead, to compute the length of a string, C code must manually |
44 |
| -//! call a function like `strlen()` for `char`-based strings, or |
45 |
| -//! `wcslen()` for `wchar_t`-based ones. Those functions return the |
46 |
| -//! number of characters in the string excluding the nul terminator, |
47 |
| -//! so the buffer length is really `len+1` characters. Rust strings |
48 |
| -//! don't have a nul terminator, and they always know their length. |
49 |
| -//! |
50 |
| -//! * **No nul characters in the middle of the string** - When C |
51 |
| -//! strings have a nul terminator character, this usually means that |
52 |
| -//! they cannot have nul characters in the middle — a nul character |
53 |
| -//! would essentially truncate the string. Rust strings *can* have |
54 |
| -//! nul characters in the middle, since they don't use nul |
55 |
| -//! terminators. |
| 45 | +//! strings are nul-terminated, i.e. they have a `\0` character at the |
| 46 | +//! end. The length of a string buffer is not stored, but has to be |
| 47 | +//! calculated; to compute the length of a string, C code must |
| 48 | +//! manually call a function like `strlen()` for `char`-based strings, |
| 49 | +//! or `wcslen()` for `wchar_t`-based ones. Those functions return |
| 50 | +//! the number of characters in the string excluding the nul |
| 51 | +//! terminator, so the buffer length is really `len+1` characters. |
| 52 | +//! Rust strings don't have a nul terminator; their length is always |
| 53 | +//! stored and does not need to be calculated. While in Rust |
| 54 | +//! accessing a string's length is an O(1) operation (because the |
| 55 | +//! length is stored), in C it is an O(length) operation because the |
| 56 | +//! length needs to be computed by scanning the string for the nul |
| 57 | +//! terminator. |
| 58 | +//! |
| 59 | +//! * **Internal nul characters** - When C strings have a nul |
| 60 | +//! terminator character, this usually means that they cannot have nul |
| 61 | +//! characters in the middle — a nul character would essentially |
| 62 | +//! truncate the string. Rust strings *can* have nul characters in |
| 63 | +//! the middle, because nul does not have to mark the end of the |
| 64 | +//! string in Rust. |
56 | 65 | //!
|
57 | 66 | //! # Representations of non-Rust strings
|
58 | 67 | //!
|
59 | 68 | //! [`CString`] and [`CStr`] are useful when you need to transfer
|
60 |
| -//! UTF-8 strings to and from C, respectively: |
| 69 | +//! UTF-8 strings to and from languages with a C ABI, like Python. |
61 | 70 | //!
|
62 | 71 | //! * **From Rust to C:** [`CString`] represents an owned, C-friendly
|
63 |
| -//! UTF-8 string: it is valid UTF-8, it is nul-terminated, and has no |
64 |
| -//! nul characters in the middle. Rust code can create a `CString` |
65 |
| -//! out of a normal string (provided that the string doesn't have nul |
66 |
| -//! characters in the middle), and then use a variety of methods to |
67 |
| -//! obtain a raw `*mut u8` that can then be passed as an argument to C |
68 |
| -//! functions. |
| 72 | +//! string: it is nul-terminated, and has no internal nul characters. |
| 73 | +//! Rust code can create a `CString` out of a normal string (provided |
| 74 | +//! that the string doesn't have nul characters in the middle), and |
| 75 | +//! then use a variety of methods to obtain a raw `*mut u8` that can |
| 76 | +//! then be passed as an argument to functions which use the C |
| 77 | +//! conventions for strings. |
69 | 78 | //!
|
70 | 79 | //! * **From C to Rust:** [`CStr`] represents a borrowed C string; it
|
71 | 80 | //! is what you would use to wrap a raw `*const u8` that you got from
|
72 |
| -//! a C function. A `CStr` is just guaranteed to be a nul-terminated |
73 |
| -//! array of bytes; the UTF-8 validation step only happens when you |
74 |
| -//! request to convert it to a `&str`. |
| 81 | +//! a C function. A `CStr` is guaranteed to be a nul-terminated array |
| 82 | +//! of bytes. Once you have a `CStr`, you can convert it to a Rust |
| 83 | +//! `&str` if it's valid UTF-8, or lossily convert it by adding |
| 84 | +//! replacement characters. |
75 | 85 | //!
|
76 | 86 | //! [`OsString`] and [`OsStr`] are useful when you need to transfer
|
77 |
| -//! strings to and from operating system calls. If you need Rust |
78 |
| -//! strings out of them, they can take care of conversion to and from |
79 |
| -//! the operating system's preferred form for strings — of course, it |
80 |
| -//! may not be possible to convert all valid operating system strings |
81 |
| -//! into valid UTF-8; the `OsString` and `OsStr` functions let you know |
82 |
| -//! when this is the case. |
| 87 | +//! strings to and from the operating system itself, or when capturing |
| 88 | +//! the output of external commands. Conversions between `OsString`, |
| 89 | +//! `OsStr` and Rust strings work similarly to those for [`CString`] |
| 90 | +//! and [`CStr`]. |
83 | 91 | //!
|
84 | 92 | //! * [`OsString`] represents an owned string in whatever
|
85 | 93 | //! representation the operating system prefers. In the Rust standard
|
|
101 | 109 | //!
|
102 | 110 | //! ## On Unix
|
103 | 111 | //!
|
104 |
| -//! On Unix, [`OsStr`] implements the `std::os::unix:ffi::`[`OsStrExt`][unix.OsStrExt] trait, which |
105 |
| -//! augments it with two methods, [`from_bytes`] and [`as_bytes`]. These do inexpensive conversions |
106 |
| -//! from and to UTF-8 byte slices. |
| 112 | +//! On Unix, [`OsStr`] implements the |
| 113 | +//! `std::os::unix::ffi::`[`OsStrExt`][unix.OsStrExt] trait, which |
| 114 | +//! augments it with two methods, [`from_bytes`] and [`as_bytes`]. |
| 115 | +//! These do inexpensive conversions from and to byte slices. |
107 | 116 | //!
|
108 | 117 | //! Additionally, on Unix [`OsString`] implements the
|
109 | 118 | //! `std::os::unix::ffi::`[`OsStringExt`][unix.OsStringExt] trait,
|
|
112 | 121 | //!
|
113 | 122 | //! ## On Windows
|
114 | 123 | //!
|
115 |
| -//! On Windows, [`OsStr`] implements the `std::os::windows::ffi::`[`OsStrExt`][windows.OsStrExt] |
116 |
| -//! trait, which provides an [`encode_wide`] method. This provides an iterator that can be |
117 |
| -//! [`collect`]ed into a vector of [`u16`]. |
| 124 | +//! On Windows, [`OsStr`] implements the |
| 125 | +//! `std::os::windows::ffi::`[`OsStrExt`][windows.OsStrExt] trait, |
| 126 | +//! which provides an [`encode_wide`] method. This provides an |
| 127 | +//! iterator that can be [`collect`]ed into a vector of [`u16`]. |
118 | 128 | //!
|
119 | 129 | //! Additionally, on Windows [`OsString`] implements the
|
120 |
| -//! `std::os::windows:ffi::`[`OsStringExt`][windows.OsStringExt] trait, which provides a |
121 |
| -//! [`from_wide`] method. The result of this method is an `OsString` which can be round-tripped to |
122 |
| -//! a Windows string losslessly. |
| 130 | +//! `std::os::windows::ffi::`[`OsStringExt`][windows.OsStringExt] |
| 131 | +//! trait, which provides a [`from_wide`] method. The result of this |
| 132 | +//! method is an `OsString` which can be round-tripped to a Windows |
| 133 | +//! string losslessly. |
123 | 134 | //!
|
124 | 135 | //! [`String`]: ../string/struct.String.html
|
125 | 136 | //! [`str`]: ../primitive.str.html
|
|
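
The "From Rust to C" and "From C to Rust" bullets describe a round trip that can be sketched as follows. This is an illustration under stated assumptions, not code from the commit: the helper name `round_trip` is hypothetical, and no real C function is called. The raw pointer is simply handed straight back to `CStr::from_ptr`.

```rust
use std::ffi::{CStr, CString, NulError};
use std::os::raw::c_char;

/// Hypothetical helper, for illustration only: passes a Rust string through
/// a C-style nul-terminated pointer and converts it back.
fn round_trip(s: &str) -> Result<String, NulError> {
    // Rust to C: fails with `NulError` if `s` has an internal nul byte.
    let owned = CString::new(s)?;

    // The raw pointer a C function would receive as its argument.
    let ptr: *const c_char = owned.as_ptr();

    // C to Rust: wrap the raw pointer in a borrowed `CStr`.
    // SAFETY: `ptr` points at a valid nul-terminated buffer owned by
    // `owned`, which stays alive until the end of this function.
    let borrowed: &CStr = unsafe { CStr::from_ptr(ptr) };

    // Lossy conversion: invalid UTF-8 would become U+FFFD replacement
    // characters. `CStr::to_str` is the strict alternative.
    Ok(borrowed.to_string_lossy().into_owned())
}
```

With this sketch, `round_trip("hello")` returns `Ok("hello".to_string())`, while `round_trip("he\0llo")` returns an error because the input contains an internal nul.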