Commit 50505aa

Clarify the ffi module's toplevel docs, per @clarcharr's comments
1 parent 9854e83 commit 50505aa

src/libstd/ffi/mod.rs

Lines changed: 65 additions & 54 deletions
@@ -8,78 +8,86 @@
88
// option. This file may not be copied, modified, or distributed
99
// except according to those terms.
1010

11-
//! This module provides utilities to handle C-like strings. It is
12-
//! mainly of use for FFI (Foreign Function Interface) bindings and
13-
//! code that needs to exchange C-like strings with other languages.
11+
//! This module provides utilities to handle data across non-Rust
12+
//! interfaces, like other programming languages and the underlying
13+
//! operating system. It is mainly of use for FFI (Foreign Function
14+
//! Interface) bindings and code that needs to exchange C-like strings
15+
//! with other languages.
1416
//!
1517
//! # Overview
1618
//!
1719
//! Rust represents owned strings with the [`String`] type, and
1820
//! borrowed slices of strings with the [`str`] primitive. Both are
1921
//! always in UTF-8 encoding, and may contain nul bytes in the middle,
2022
//! i.e. if you look at the bytes that make up the string, there may
21-
//! be a `0` among them. Both `String` and `str` know their length;
22-
//! there are no nul terminators at the end of strings like in C.
23+
//! be a `\0` among them. Both `String` and `str` store their length
24+
//! explicitly; there are no nul terminators at the end of strings
25+
//! like in C.
2326
//!
2427
//! C strings are different from Rust strings:
2528
//!
26-
//! * **Encodings** - C strings may have different encodings. If
27-
//! you are bringing in strings from C APIs, you should check what
28-
//! encoding you are getting. Rust strings are always UTF-8.
29+
//! * **Encodings** - Rust strings are UTF-8, but C strings may use
30+
//! other encodings. If you are using a string from C, you should
31+
//! check its encoding explicitly, rather than just assuming that it
32+
//! is UTF-8 like you can do in Rust.
2933
//!
30-
//! * **Character width** - C strings may use "normal" or "wide"
31-
//! characters, i.e. `char` or `wchar_t`, respectively. The C
32-
//! standard leaves the actual sizes of those types open to
34+
//! * **Character size** - C strings may use `char` or `wchar_t`-sized
35+
//! characters; please **note** that C's `char` is different from Rust's.
36+
//! The C standard leaves the actual sizes of those types open to
3337
//! interpretation, but defines different APIs for strings made up of
3438
//! each character type. Rust strings are always UTF-8, so different
3539
//! Unicode characters will be encoded in a variable number of bytes
36-
//! each. The Rust type [`char`] represents a '[Unicode
37-
//! scalar value]', which is similar to, but not the same as, a
38-
//! '[Unicode code point]'.
40+
//! each. The Rust type [`char`] represents a '[Unicode scalar
41+
//! value]', which is similar to, but not the same as, a '[Unicode
42+
//! code point]'.
3943
//!
4044
//! * **Nul terminators and implicit string lengths** - Often, C
41-
//! strings are nul-terminated, i.e. they have a `0` character at the
42-
//! end. The length of a string buffer is not known *a priori*;
43-
//! instead, to compute the length of a string, C code must manually
44-
//! call a function like `strlen()` for `char`-based strings, or
45-
//! `wcslen()` for `wchar_t`-based ones. Those functions return the
46-
//! number of characters in the string excluding the nul terminator,
47-
//! so the buffer length is really `len+1` characters. Rust strings
48-
//! don't have a nul terminator, and they always know their length.
49-
//!
50-
//! * **No nul characters in the middle of the string** - When C
51-
//! strings have a nul terminator character, this usually means that
52-
//! they cannot have nul characters in the middle — a nul character
53-
//! would essentially truncate the string. Rust strings *can* have
54-
//! nul characters in the middle, since they don't use nul
55-
//! terminators.
45+
//! strings are nul-terminated, i.e. they have a `\0` character at the
46+
//! end. The length of a string buffer is not stored, but has to be
47+
//! calculated; to compute the length of a string, C code must
48+
//! manually call a function like `strlen()` for `char`-based strings,
49+
//! or `wcslen()` for `wchar_t`-based ones. Those functions return
50+
//! the number of characters in the string excluding the nul
51+
//! terminator, so the buffer length is really `len+1` characters.
52+
//! Rust strings don't have a nul terminator; their length is always
53+
//! stored and does not need to be calculated. While in Rust
54+
//! accessing a string's length is an O(1) operation (because the
55+
//! length is stored), in C it is an O(length) operation because the
56+
//! length needs to be computed by scanning the string for the nul
57+
//! terminator.
58+
//!
59+
//! * **Internal nul characters** - When C strings have a nul
60+
//! terminator character, this usually means that they cannot have nul
61+
//! characters in the middle — a nul character would essentially
62+
//! truncate the string. Rust strings *can* have nul characters in
63+
//! the middle, because nul does not have to mark the end of the
64+
//! string in Rust.
5665
//!
5766
//! # Representations of non-Rust strings
5867
//!
5968
//! [`CString`] and [`CStr`] are useful when you need to transfer
60-
//! UTF-8 strings to and from C, respectively:
69+
//! UTF-8 strings to and from languages with a C ABI, like Python.
6170
//!
6271
//! * **From Rust to C:** [`CString`] represents an owned, C-friendly
63-
//! UTF-8 string: it is valid UTF-8, it is nul-terminated, and has no
64-
//! nul characters in the middle. Rust code can create a `CString`
65-
//! out of a normal string (provided that the string doesn't have nul
66-
//! characters in the middle), and then use a variety of methods to
67-
//! obtain a raw `*mut u8` that can then be passed as an argument to C
68-
//! functions.
72+
//! string: it is nul-terminated, and has no internal nul characters.
73+
//! Rust code can create a `CString` out of a normal string (provided
74+
//! that the string doesn't have nul characters in the middle), and
75+
//! then use a variety of methods to obtain a raw `*mut u8` that can
76+
//! then be passed as an argument to functions which use the C
77+
//! conventions for strings.
6978
//!
7079
//! * **From C to Rust:** [`CStr`] represents a borrowed C string; it
7180
//! is what you would use to wrap a raw `*const u8` that you got from
72-
//! a C function. A `CStr` is just guaranteed to be a nul-terminated
73-
//! array of bytes; the UTF-8 validation step only happens when you
74-
//! request to convert it to a `&str`.
81+
//! a C function. A `CStr` is guaranteed to be a nul-terminated array
82+
//! of bytes. Once you have a `CStr`, you can convert it to a Rust
83+
//! `&str` if it's valid UTF-8, or lossily convert it by adding
84+
//! replacement characters.
7585
//!
7686
//! [`OsString`] and [`OsStr`] are useful when you need to transfer
77-
//! strings to and from operating system calls. If you need Rust
78-
//! strings out of them, they can take care of conversion to and from
79-
//! the operating system's preferred form for strings — of course, it
80-
//! may not be possible to convert all valid operating system strings
81-
//! into valid UTF-8; the `OsString` and `OsStr` functions let you know
82-
//! when this is the case.
87+
//! strings to and from the operating system itself, or when capturing
88+
//! the output of external commands. Conversions between `OsString`,
89+
//! `OsStr` and Rust strings work similarly to those for [`CString`]
90+
//! and [`CStr`].
8391
//!
8492
//! * [`OsString`] represents an owned string in whatever
8593
//! representation the operating system prefers. In the Rust standard
@@ -101,9 +109,10 @@
101109
//!
102110
//! ## On Unix
103111
//!
104-
//! On Unix, [`OsStr`] implements the `std::os::unix:ffi::`[`OsStrExt`][unix.OsStrExt] trait, which
105-
//! augments it with two methods, [`from_bytes`] and [`as_bytes`]. These do inexpensive conversions
106-
//! from and to UTF-8 byte slices.
112+
//! On Unix, [`OsStr`] implements the
113+
//! `std::os::unix::ffi::`[`OsStrExt`][unix.OsStrExt] trait, which
114+
//! augments it with two methods, [`from_bytes`] and [`as_bytes`].
115+
//! These do inexpensive conversions from and to UTF-8 byte slices.
107116
//!
108117
//! Additionally, on Unix [`OsString`] implements the
109118
//! `std::os::unix::ffi::`[`OsStringExt`][unix.OsStringExt] trait,
@@ -112,14 +121,16 @@
112121
//!
113122
//! ## On Windows
114123
//!
115-
//! On Windows, [`OsStr`] implements the `std::os::windows::ffi::`[`OsStrExt`][windows.OsStrExt]
116-
//! trait, which provides an [`encode_wide`] method. This provides an iterator that can be
117-
//! [`collect`]ed into a vector of [`u16`].
124+
//! On Windows, [`OsStr`] implements the
125+
//! `std::os::windows::ffi::`[`OsStrExt`][windows.OsStrExt] trait,
126+
//! which provides an [`encode_wide`] method. This provides an
127+
//! iterator that can be [`collect`]ed into a vector of [`u16`].
118128
//!
119129
//! Additionally, on Windows [`OsString`] implements the
120-
//! `std::os::windows:ffi::`[`OsStringExt`][windows.OsStringExt] trait, which provides a
121-
//! [`from_wide`] method. The result of this method is an `OsString` which can be round-tripped to
122-
//! a Windows string losslessly.
130+
//! `std::os::windows::ffi::`[`OsStringExt`][windows.OsStringExt]
131+
//! trait, which provides a [`from_wide`] method. The result of this
132+
//! method is an `OsString` which can be round-tripped to a Windows
133+
//! string losslessly.
123134
//!
124135
//! [`String`]: ../string/struct.String.html
125136
//! [`str`]: ../primitive.str.html
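
A minimal sketch of the `CString`/`CStr` round trip described in the "Representations of non-Rust strings" section. It is an illustration, not part of this commit. The helper name `round_trip` is hypothetical. In the current standard library `CString::as_ptr` returns a `*const c_char`, not the `*mut u8` the doc text mentions, so the sketch uses that type.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hypothetical helper: sends `s` through a C-style pointer and back.
/// Returns `None` if `s` contains an interior nul byte.
fn round_trip(s: &str) -> Option<String> {
    // From Rust to C: CString::new appends the nul terminator and
    // rejects input that has a nul byte in the middle.
    let owned = CString::new(s).ok()?;
    let ptr: *const c_char = owned.as_ptr();

    // From C to Rust: wrap the raw pointer in a borrowed CStr.
    // SAFETY: `ptr` comes from `owned`, which is still alive and is
    // nul-terminated by construction.
    let borrowed: &CStr = unsafe { CStr::from_ptr(ptr) };

    // UTF-8 validation happens here, when a &str is requested.
    borrowed.to_str().ok().map(str::to_owned)
}

fn main() {
    let owned = CString::new("hello").expect("no interior nul");
    // The contents are 5 bytes; the buffer holds len + 1 bytes
    // because of the nul terminator.
    assert_eq!(owned.as_bytes().len(), 5);
    assert_eq!(owned.as_bytes_with_nul().len(), 6);

    assert_eq!(round_trip("hello"), Some(String::from("hello")));

    // A Rust string may hold an interior nul, but a C string cannot.
    assert_eq!("he\0llo".len(), 6);
    assert_eq!(round_trip("he\0llo"), None);
}
```

When the bytes coming from C may not be valid UTF-8, `CStr::to_string_lossy` can be used instead of `to_str`; it substitutes replacement characters rather than failing.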

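
The `OsString`/`OsStr` conversions mentioned above can be sketched along the same lines. This assumes only the platform-independent conversion methods; the helper name `describe` is hypothetical and is not part of the commit.

```rust
use std::ffi::{OsStr, OsString};

/// Hypothetical helper: reports whether an OS string is valid UTF-8,
/// falling back to a lossy conversion when it is not.
fn describe(os: &OsStr) -> String {
    match os.to_str() {
        // Strict conversion succeeds only for valid UTF-8.
        Some(s) => format!("valid UTF-8: {}", s),
        // Lossy conversion substitutes replacement characters.
        None => format!("lossy: {}", os.to_string_lossy()),
    }
}

fn main() {
    // Rust strings always convert into OS strings.
    let os: OsString = OsString::from("file.txt");
    println!("{}", describe(&os));

    // Going back can fail; on failure the original OsString is
    // returned unchanged in the Err variant.
    let back: Result<String, OsString> = os.into_string();
    assert_eq!(back, Ok(String::from("file.txt")));
}
```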