Propose code string literals #3450
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,268 @@ | ||
- Feature Name: code_literals | ||
- Start Date: 2023-06-18 | ||
- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000) | ||
- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000) | ||
|
||
# Summary | ||
[summary]: #summary | ||
|
||
Add a new kind of multi-line string literal for embedding code which | ||
plays nicely with `rustfmt`. | ||
|
||
# Motivation | ||
[motivation]: #motivation | ||
|
||
- Embedding code as a literal string within a Rust program is often | ||
necessary. A prominent example is the `sqlx` crate, which | ||
has the user write SQL queries as string literals within the program. | ||
- Rust already supports several kinds of multi-line string literal, | ||
but none of them are well suited for embedding code. | ||
|
||
1. Normal string literals, eg. `"a string literal"`. These can be | ||
written over multiple lines, but require special characters | ||
to be escaped. Whitespace is significant within the literal, | ||
which means that `rustfmt` cannot fix the indentation of the | ||
code block. For example, beginning with this code: | ||
|
||
```rust | ||
if some_condition { | ||
    do_something_with( | ||
        " | ||
        a nicely | ||
        indented code | ||
        string | ||
        " | ||
    ); | ||
} | ||
``` | ||
|
||
If the indentation is changed, such as by removing the | ||
conditional, then `rustfmt` must re-format the code like so: | ||
|
||
```rust | ||
do_something_with( | ||
    " | ||
        a nicely | ||
        indented code | ||
        string | ||
        " | ||
); | ||
``` | ||
|
||
To do otherwise would be to change the value of | ||
the string literal. | ||
|
||
2. Normal string literals with backslash escaping, eg. | ||
```rust | ||
" | ||
this way\ | ||
whitespace at\ | ||
the beginning\ | ||
of lines can\ | ||
be ignored\ | ||
" | ||
``` | ||
|
||
This approach still suffers from the need to escape special | ||
characters. The backslashes at the end of every line are | ||
tedious to write, and are problematic if whitespace is | ||
meaningful within the code. For example, if Python code | ||
were being embedded, then the indentation would be lost. | ||
Finally, although `rustfmt` could in principle reformat | ||
these strings, in practice doing so in a reasonable way | ||
is complicated and so this has never been enabled by default. | ||
|
||
3. Raw string literals, eg. `r#"I can use "s!"#` | ||
|
||
This solves the problem of special characters, but suffers | ||
from the same inability to be reformatted, and the trick | ||
of using a `\` at the end of each line cannot be applied | ||
because escape characters are not recognised. | ||
|
||
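The behaviour described in items 2 and 3 can be checked with a small sketch. The function names are hypothetical and exist only for illustration; the string values follow from Rust's existing literal rules, not from anything this RFC adds.

```rust
/// Item 2: a backslash at the end of a line removes the newline *and* all
/// leading whitespace on the next line, so the Python indentation is lost.
fn with_continuations() -> &'static str {
    "if a:\n\
         print(42)\n"
}

/// Item 3: in a raw string the backslash is an ordinary character, so the
/// same trick leaves a literal `\` and the newline in the value.
fn raw_with_backslash() -> &'static str {
    r"if a:\
    print(42)"
}
```

In the first function the embedded `print(42)` ends up at column zero, which would not be valid Python. In the second, the indentation survives but so does the backslash.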
# Guide-level explanation | ||
[guide-level-explanation]: #guide-level-explanation | ||
|
||
In addition to string literals and raw string literals, a third type | ||
of string literal exists: code string literals. | ||
|
||
```rust | ||
let code = ``` | ||
This is a code string literal | ||
|
||
I can use special characters like "" and \ freely. | ||
|
||
Indentation is preserved *relative* to the indentation level | ||
of the first line. | ||
|
||
It is an error for a line to have "negative" indentation (ie. be | ||
indented less than the indentation of the opening backticks) unless | ||
the line is empty. | ||
```; | ||
``` | ||
|
||
`rustfmt` will automatically adjust the indentation of the code string | ||
literal as a whole to match the surrounding context, but will never | ||
change the relative indentation within such a literal. | ||
|
||
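As a rough illustration of what "adjusting the literal as a whole" could mean, the sketch below shifts every non-empty line of a literal's body from its current indentation to a new one. `reindent` is a hypothetical helper written for this example; it is not `rustfmt`'s API, and it assumes the current indentation is the leading whitespace of the first non-empty line.

```rust
/// Hypothetical helper: replace the common indentation of `body` with
/// `new_indent`, leaving relative indentation and empty lines untouched.
fn reindent(body: &str, new_indent: &str) -> String {
    // Leading spaces/tabs of the first non-empty line.
    let old_indent = body
        .split('\n')
        .find(|l| !l.is_empty())
        .map(|l| {
            let rest = l.trim_start_matches(|c: char| c == ' ' || c == '\t');
            &l[..l.len() - rest.len()]
        })
        .unwrap_or("");

    body.split('\n')
        .map(|line| match line.strip_prefix(old_indent) {
            // Swap the old prefix for the new one on non-empty lines.
            Some(rest) if !line.is_empty() => format!("{}{}", new_indent, rest),
            // Empty lines (and lines this sketch cannot handle) stay as-is.
            _ => line.to_string(),
        })
        .collect::<Vec<String>>()
        .join("\n")
}
```

Because only the shared prefix changes, the nested `print(42)` in a Python snippet stays one level deeper than its `if`, whatever the surrounding Rust indentation becomes.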
Anything directly after the opening backticks is not considered | ||
part of the string literal. It may be used as a language hint or | ||
processed by macros (similar to the treatment of doc comments). | ||
|
||
```rust | ||
let sql = ```sql | ||
SELECT * FROM table; | ||
```; | ||
``` | ||
|
||
Similar to raw string literals, there is no way to escape characters | ||
within a code string literal. It is expected that procedural macros | ||
would build upon code string literals to add support for such | ||
functionality as required. | ||
|
||
If it is necessary to include triple backticks within a code string | ||
literal, more than three backticks may be used to enclose the | ||
literal, eg. | ||
|
||
Review comment: Hmm, I'm torn here. Using the same thing as in doccomments makes sense, but at the same time when we already have "use more
|
||
```rust | ||
let code = ```` | ||
``` | ||
````; | ||
``` | ||
|
||
# Reference-level explanation | ||
[reference-level-explanation]: #reference-level-explanation | ||
|
||
A code string literal will begin and end with three or more backticks. | ||
The number of backticks in the terminator must match the number used | ||
to begin the literal. | ||
|
||
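The delimiter rule can be sketched as a small scanner. This is an illustration under stated assumptions, not the proposed lexer: `RawCodeLiteral` and `split_code_literal` are hypothetical names, the input is assumed to start at the opening backticks, and the first run of the same number of backticks after the first newline is assumed to close the literal.

```rust
/// Pieces of a code string literal as split by this sketch.
#[derive(Debug, PartialEq)]
struct RawCodeLiteral<'a> {
    /// Text between the opening backticks and the first newline.
    info: &'a str,
    /// Text after that newline up to the closing backticks (not yet dedented).
    body: &'a str,
    /// Remaining source after the closing backticks.
    rest: &'a str,
}

/// Hypothetical helper: returns `None` if `src` does not start with at least
/// three backticks, has no newline after them, or is never terminated.
fn split_code_literal(src: &str) -> Option<RawCodeLiteral<'_>> {
    // Count the opening run of backticks.
    let ticks = src.bytes().take_while(|&b| b == b'`').count();
    if ticks < 3 {
        return None;
    }
    let after_open = &src[ticks..];
    let newline = after_open.find('\n')?;
    let body_start = newline + 1;

    // The terminator must use the same number of backticks as the opener.
    let fence = "`".repeat(ticks);
    let close = after_open[body_start..].find(fence.as_str())?;

    Some(RawCodeLiteral {
        info: &after_open[..newline],
        body: &after_open[body_start..body_start + close],
        rest: &after_open[body_start + close + ticks..],
    })
}
```

With a four-backtick opener, a run of three backticks inside the body does not terminate the literal, which is the nesting behaviour described in the guide-level section.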
The value of the string literal will be determined using the following | ||
steps: | ||
|
||
1. Start from the first newline after the opening backticks. | ||
2. Take the string exactly as written until the closing backticks. | ||
3. Remove equal numbers of spaces or tabs from every non-empty line | ||
until the first character of the first non-empty line is neither | ||
a space nor a tab, or until every line is empty. | ||
Raise a compile error if this cannot be done | ||
due to a "negative" indent or inconsistent whitespace (eg. if | ||
some lines are indented using tabs and some using spaces). | ||
|
||
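The steps above can be sketched as an ordinary function. This is one possible reading, not the proposed compiler implementation: `dedent_code_literal` is a hypothetical name, `raw` is assumed to be the text produced by steps 1 and 2, and a line is treated as empty only if it has zero characters.

```rust
/// Hypothetical helper (not part of the RFC): applies step 3 to `raw`, the
/// text between the first newline after the opening backticks and the
/// closing backticks.
fn dedent_code_literal(raw: &str) -> Result<String, String> {
    let lines: Vec<&str> = raw.split('\n').collect();

    // The indentation to remove is the leading run of spaces/tabs on the
    // first non-empty line.
    let indent = match lines.iter().copied().find(|l| !l.is_empty()) {
        Some(first) => {
            let rest = first.trim_start_matches(|c: char| c == ' ' || c == '\t');
            &first[..first.len() - rest.len()]
        }
        None => "", // every line is empty
    };

    let mut out = Vec::with_capacity(lines.len());
    for (i, line) in lines.iter().copied().enumerate() {
        if line.is_empty() {
            out.push(line);
        } else if let Some(rest) = line.strip_prefix(indent) {
            out.push(rest);
        } else {
            // Either a "negative" indent or a tab/space mismatch.
            return Err(format!(
                "line {}: negative or inconsistent indentation",
                i + 1
            ));
        }
    }
    Ok(out.join("\n"))
}
```

Comparing the prefix byte for byte means a line indented with tabs never matches an indentation made of spaces, so mixed whitespace is reported as an error rather than silently normalised.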
Here are some edge case examples: | ||
|
||
```rust | ||
// Empty string | ||
assert_eq!(```foo | ||
```, ""); | ||
|
||
// Newline | ||
assert_eq!(``` | ||
|
||
```, "\n"); | ||
|
||
// No terminating newline | ||
assert_eq!(``` | ||
bar```, "bar"); | ||
|
||
// Terminating newline | ||
assert_eq!(``` | ||
bar | ||
```, "bar\n"); | ||
|
||
// Preserved indent | ||
assert_eq!(``` | ||
if a: | ||
    print(42) | ||
```, "if a:\n    print(42)\n"); | ||
|
||
// Relative indent | ||
assert_eq!(``` | ||
    if a: | ||
        print(42) | ||
    ```, "if a:\n    print(42)\n"); | ||
|
||
// Relative to first non-empty line | ||
assert_eq!(``` | ||
|
||
|
||
    if a: | ||
        print(42) | ||
    ```, "\n\nif a:\n    print(42)\n"); | ||
``` | ||
|
||
The text between the opening backticks and the first newline is | ||
preserved within the AST, but is otherwise unused. | ||
|
||
# Drawbacks | ||
[drawbacks]: #drawbacks | ||
|
||
The main drawback is increased complexity of the language: | ||
|
||
1. It adds a new symbol (the backtick) to the language, which was not previously used. | ||
2. It adds a third way of writing string literals. | ||
|
||
# Rationale and alternatives | ||
Review comment: Another alternative I have mentioned on zulip: Improve Pros:
Cons:
Since this is motivated by making things easier for rustfmt I recommend contacting the maintainers of other tools (syntax highlighters, editors, IDEs, ...) to see if this change helps or adds complexity for them.

Review comment: I don't consider this an alternative. Requiring powerful editor support to even use the feature makes it a no-go, and having to store things in separate files is a maintenance burden that's worse than the current situation, since it requires coming up with a naming scheme for those files that makes sense, makes it harder to resolve merge conflicts since tools like The advantages you list I also consider to be problems with your approach. You say it works better with simple tools, but the opposite is true: you end up with something unworkable without powerful editor features. In contrast this RFC doesn't require any editor features at all to be an improvement over the status quo. Any support for nested language is an optional extra that doesn't affect the core functionality. Your example of "stacking complexity" seems very straightforward tbh. Infinitely better than having to go to a separate file.
It by definition does not add any complexity for tools other than rustfmt, since the only required change as a result of this RFC is allowing a new prefix letter (

Review comment: Anything that adds syntax complicates

Review comment: How? Even many simple editors at least have tabs, panes or similar UI elements to view more than one file at a time. At its most primitive you rely on your window manager and file browser to open multiple files at the same time in separate windows and show them side by side.
A simple editor can have primitive syntax-highlighting that will work with separate files based on file extensions but won't work with inlined content. So this RFC makes things worse for simple editors
I don't see how it would make things more difficult for git? If anything it makes diffs simple due to fewer whitespace adjustments.
Where did I say that a powerful editor would be required? Rather I'm suggesting This covers both.
What is straight-forward about it? If you actually have to edit, indent, copy-paste, syntax-highlight or auto-complete then there are lots of pitfalls.

Review comment: I don't think improving support for

Review comment: Separation of languages is the norm and should be encouraged. See the HTML/CSS/JS split that is encouraged instead of having inline script handlers and styles. See template files. See module trees. You say my approach is a no-go because it makes things more difficult for simple editors. And yet you acknowledge that this RFC will primarily benefit complex editors. While I think my approach would benefit simple editors because they can then work with the outlined language.
I assume they'd conventionally still be placed in the same directory and show up in the diffs next to each other.
Not necessarily. E.g. when you have an SQL query

Review comment: My expectation is that this RFC will not affect complex editors (in cases where they are not acting as simple editors). A complex editor that is using heuristics to determine when to apply other-language syntax highlighting to a literal could similarly use those heuristics to determine when to apply other-language auto-formatting to a literal. This RFC simply provides support for auto-indentation (but not formatting) of literals for simple editors (and complex editors where their heuristics don't apply) that use rustfmt. EDIT: actually, I forgot that this RFC also included language hints, which would allow a very strong hint to the complex editor heuristics of what other-language to treat a literal as, but it also likely allows editors in between simple and complex to use very simple heuristics and start multi-language highlighting where they couldn't previously. EDIT2: To clarify some of my categorical assumptions to make sure there's no misunderstanding:

Review comment: That is not my experience. I've almost never seen sql queries pulled out into separate files. Most assembly I've seen is inline. Shader languages are a bit of a mix, and I don't have as much familiarity with it, but I don't think it is at all unusual to include shader code inline, especially if it is small. And this feature would be very useful for help text for cli programs. I can't imagine using a separate file for the help comment for every option in my cli that uses clap.
But we also have frameworks like react, where html and css are embedded in Javascript. Or svelte where the JS is included in an html template.

Review comment: Shader languages are the one case I can think of where people actually care about "separation of languages", and then it has to do more with the fact that GPU code inherently has a modularity to it, because it is run in passes, and people tend to pull out modules into, well, modules. So you may as well have, e.g.
But ofc you may well just encounter something like
Depending.

Review comment: Has syntax highlighting.
Yes, and I have encountered issues with that kind of multi-language, framework-specific file formats that makes me prefer separate files. Simple editors just didn't support it at all or mistook it as only one of the languages, complex editors had configuration issues because they picked up the wrong preprocessor version or something which led to lots of bogus squiggles in those files while vanilla JS files had no issues.
https://github.com/xiph/rav1e/tree/master/src/arm Though none of that needs to be
|
||
[rationale-and-alternatives]: #rationale-and-alternatives | ||
|
||
There is lots of room to bike-shed syntax. | ||
If there is significant opposition to the backtick syntax, then an | ||
alternative syntax such as: | ||
``` | ||
code" | ||
string | ||
" | ||
``` | ||
could be used. | ||
|
||
Similarly, the use of more than three backticks may be unpopular. | ||
It's not clear how important it is to be able to nest backticks | ||
within backticks, but a syntax mirroring raw string literals could | ||
be used instead, eg. | ||
``` | ||
`# foo | ||
string | ||
#` | ||
``` | ||
|
||
There is also the question of whether the backtick syntax would | ||
interfere with the ability to paste Rust code snippets into such | ||
blocks. Experimentally, markdown parsers do not seem to have any | ||
problems with this (as demonstrated in this document). | ||
|
||
# Prior art | ||
scottmcm marked this conversation as resolved.
Review comment: Python's
|
||
[prior-art]: #prior-art | ||
|
||
The proposed syntax is primarily based on markdown code block syntax, | ||
which is widely used and should be familiar to most programmers. | ||
|
||
|
||
# Unresolved questions | ||
[unresolved-questions]: #unresolved-questions | ||
|
||
- None | ||
|
||
# Future possibilities | ||
[future-possibilities]: #future-possibilities | ||
|
||
- Macro authors could perform further processing | ||
on code string literals. These macros could add support for string | ||
interpolation, escaping, etc. without needing to further complicate | ||
the language itself. | ||
|
||
- Procedural macros could look at the text following the opening triple | ||
backticks and use that to influence code generation, eg. | ||
|
||
```rust | ||
query!(```postgresql | ||
<query> | ||
```) | ||
``` | ||
|
||
could parse the query in a PostgreSQL specific way. | ||
|
||
- Code literals could be used by crates like `html-macro` | ||
or `quote` to provide better surface syntax and faster | ||
compilation. | ||
|
||
- Code literals could be used with the `asm!` macro to avoid | ||
needing a new string on every line. |