Commit 9329dc4

Merge pull request #1457 from mattheww/2024-01_c_string_literal_expr
C string literal expressions
2 parents a0b1195 + deac889 commit 9329dc4

2 files changed: +50 -3 lines changed
src/expressions/literal-expr.md

Lines changed: 49 additions & 1 deletion
@@ -252,7 +252,48 @@ b"\\x52"; br"\x52"; // \x52
 
 A C string literal expression consists of a single [C_STRING_LITERAL] or [RAW_C_STRING_LITERAL] token.
 
-> **Note**: This section is incomplete.
+The expression's type is a shared reference (with `static` lifetime) to the standard library [CStr] type.
+That is, the type is `&'static core::ffi::CStr`.
+
+The token must not have a suffix.
+
+The token's _literal content_ is the sequence of characters following the first `"` and preceding the last `"` in the string representation of the token.
+
+The literal expression's _represented bytes_ are a sequence of bytes derived from the literal content as follows:
+
+* If the token is a [C_STRING_LITERAL], the literal content is treated as a sequence of items, each of which is either a single Unicode character other than `\` or an [escape].
+The sequence of items is converted to a sequence of bytes as follows:
+* Each single Unicode character contributes its UTF-8 representation.
+* Each [simple escape] contributes the [Unicode scalar value] of its escaped value.
+* Each [8-bit escape] contributes a single byte containing the [Unicode scalar value] of its escaped value.
+* Each [unicode escape] contributes the UTF-8 representation of its escaped value.
+* Each [string continuation escape] contributes no bytes.
+
+* If the token is a [RAW_C_STRING_LITERAL], the represented bytes are the UTF-8 encoding of the literal content.
+
+> **Note**: the permitted forms of [C_STRING_LITERAL] and [RAW_C_STRING_LITERAL] tokens ensure that the represented bytes never include a null byte.
+
+The expression's value is a reference to a statically allocated [CStr] whose array of bytes contains the represented bytes followed by a null byte.
+
+Examples of C string literal expressions:
+
+```rust
+c"foo"; cr"foo"; // foo
+c"\"foo\""; cr#""foo""#; // "foo"
+
+c"foo #\"# bar";
+cr##"foo #"# bar"##; // foo #"# bar
+
+c"\x52"; c"R"; cr"R"; // R
+c"\\x52"; cr"\x52"; // \x52
+
+c"æ"; // LATIN SMALL LETTER AE (U+00E6)
+c"\u{00E6}"; // LATIN SMALL LETTER AE (U+00E6)
+c"\xC3\xA6"; // LATIN SMALL LETTER AE (U+00E6)
+
+c"\xE6".to_bytes(); // [230]
+c"\u{00E6}".to_bytes(); // [195, 166]
+```
 
 ## Integer literal expressions

@@ -365,13 +406,20 @@ The expression's type is the primitive [boolean type], and its value is:
 * false if the keyword is `false`
 
 
+[Escape]: #escapes
+[Simple escape]: #simple-escapes
 [Simple escapes]: #simple-escapes
+[8-bit escape]: #8-bit-escapes
 [8-bit escapes]: #8-bit-escapes
+[7-bit escape]: #7-bit-escapes
 [7-bit escapes]: #7-bit-escapes
+[Unicode escape]: #unicode-escapes
 [Unicode escapes]: #unicode-escapes
+[String continuation escape]: #string-continuation-escapes
 [String continuation escapes]: #string-continuation-escapes
 [boolean type]: ../types/boolean.md
 [constant expression]: ../const_eval.md#constant-expressions
+[CStr]: ../../core/ffi/struct.CStr.html
 [floating-point types]: ../types/numeric.md#floating-point-types
 [lint check]: ../attributes/diagnostics.md#lint-check-attributes
 [literal tokens]: ../tokens.md#literals
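
The conversion rules added to `literal-expr.md` can be checked directly against the compiler. The following is a minimal sketch, assuming Rust 1.77 or later (where C string literals are stable). The helper names `bytes_with_nul` and `check_represented_bytes` are hypothetical and are not part of the commit. `to_bytes` and `to_bytes_with_nul` are existing `CStr` methods.

```rust
use core::ffi::CStr;

/// Hypothetical helper: the stored bytes of a C string literal expression,
/// that is, the represented bytes followed by a null byte.
fn bytes_with_nul(s: &'static CStr) -> &'static [u8] {
    s.to_bytes_with_nul()
}

/// Hypothetical helper: exercises each conversion rule from the added text.
fn check_represented_bytes() {
    // The expression's type is `&'static core::ffi::CStr`.
    let foo: &'static CStr = c"foo";
    assert_eq!(foo.to_bytes(), b"foo");

    // A Unicode character and a unicode escape both contribute UTF-8 bytes.
    assert_eq!(c"æ".to_bytes(), &[0xC3_u8, 0xA6]);
    assert_eq!(c"\u{00E6}".to_bytes(), &[0xC3_u8, 0xA6]);

    // An 8-bit escape contributes exactly one byte, even above 0x7F.
    assert_eq!(c"\xE6".to_bytes(), &[0xE6_u8]);
    assert_eq!(c"\xC3\xA6", c"æ");

    // A simple escape contributes the value of the escaped character.
    assert_eq!(c"\n".to_bytes(), &[0x0A_u8]);

    // A string continuation escape contributes no bytes.
    assert_eq!(c"foo\
                 bar".to_bytes(), b"foobar");

    // A raw C string literal performs no escape processing.
    assert_eq!(cr"\x52".to_bytes(), b"\\x52");
}
```

Comparing `c"\xE6"` with `c"\u{00E6}"` shows why the 8-bit and unicode escape rules are listed separately: the first yields one byte, the second yields the two-byte UTF-8 encoding.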

src/tokens.md

Lines changed: 1 addition & 2 deletions
@@ -330,8 +330,7 @@ A _C string literal_ is a sequence of Unicode characters and _escapes_,
 preceded by the characters `U+0063` (`c`) and `U+0022` (double-quote), and
 followed by the character `U+0022`. If the character `U+0022` is present within
 the literal, it must be _escaped_ by a preceding `U+005C` (`\`) character.
-Alternatively, a C string literal can be a _raw C string literal_, defined
-below. The type of a C string literal is [`&core::ffi::CStr`][CStr].
+Alternatively, a C string literal can be a _raw C string literal_, defined below.
 
 [CStr]: ../core/ffi/struct.CStr.html
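
The `tokens.md` text describes two ways to put a `U+0022` character inside a C string literal: escape it with `\`, or use the raw form. A small sketch of that equivalence follows, assuming Rust 1.77 or later. The function names `quoted_forms` and `hash_forms` are hypothetical and exist only for illustration.

```rust
use core::ffi::CStr;

/// Hypothetical helper: the same content written with an escaped
/// double-quote and as a raw C string literal with one `#` delimiter.
fn quoted_forms() -> [&'static CStr; 2] {
    [c"\"foo\"", cr#""foo""#]
}

/// Hypothetical helper: content containing `"#` needs two `#` delimiters
/// in the raw form, or an escaped double-quote in the non-raw form.
fn hash_forms() -> [&'static CStr; 2] {
    [c"foo #\"# bar", cr##"foo #"# bar"##]
}
```

In both pairs the two literals should produce identical bytes. Only the token spelling differs.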
