How much should rustc understand the template string? #6

Open
fintelia opened this issue Dec 18, 2019 · 5 comments

Comments

@fintelia

There is a range of options here, some of which have already been ruled out on feasibility or functionality grounds. Some of these might be possible to add after an MVP, but it is worth considering whether doing so would be a breaking change, or whether small actions now could ensure it isn't:

  1. No understanding, not even of escape characters + register replacements
  2. Register replacements + escape characters, but no concern for whether replacements are reasonable. {} is allowed inside quoted strings and will directly concatenate register names with adjacent characters.
  3. Commented regions are stripped out, but no other semantic understanding of asm.
  4. A "C-like" preprocessor is run over the code
  5. Rustc and/or clippy do some tokenization to sanity check the string. Only issues that would definitely result in an assembler error are reported.
  6. Deprecated syntax and/or blacklisted assembler directives trigger warnings/errors
  7. Only whitelisted syntax/directives, but no code transformation or semantic understanding of what the asm code or assembler directives actually do.
  8. Simple "pseudo assembler directives" which act as aliases for more complicated ones, use Rust-formatted octal literals rather than C-formatted ones, etc.
  9. More substantial syntax level transformations on the format string, without understanding individual instructions
  10. Assembly instructions for each architecture are validated against a whitelist, maybe also validating operands
  11. Inline asm as syntax: the template string is really a DSL compiled by rustc directly to LLVM IR / machine code.
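
To make option 2 concrete, here is a minimal sketch of what "register replacements + escape characters" could look like. It is illustrative only: `expand_template` is a hypothetical helper, not rustc's implementation, and it assumes `format!`-style `{{` / `}}` escapes and positional `{}` placeholders. Note that it deliberately does not check whether a replacement is reasonable, so `{}` inside a quoted string or next to other characters is substituted as-is.

```rust
/// Hypothetical helper: expand positional `{}` placeholders with register names.
/// `{{` and `}}` are escapes for literal braces; all other text is copied verbatim.
/// Leftover (unused) register names are not reported in this sketch.
pub fn expand_template(template: &str, regs: &[&str]) -> Result<String, String> {
    let mut out = String::with_capacity(template.len());
    let mut chars = template.chars();
    let mut next_reg = regs.iter();
    while let Some(c) = chars.next() {
        match c {
            '{' => match chars.next() {
                // `{{` is an escaped literal `{`.
                Some('{') => out.push('{'),
                // `{}` consumes the next register name, with no sanity checks.
                Some('}') => {
                    let reg = next_reg
                        .next()
                        .ok_or_else(|| "too few operands".to_string())?;
                    out.push_str(reg);
                }
                _ => return Err("only `{}` and `{{` are supported".to_string()),
            },
            '}' => match chars.next() {
                // `}}` is an escaped literal `}`.
                Some('}') => out.push('}'),
                _ => return Err("unmatched `}`".to_string()),
            },
            other => out.push(other),
        }
    }
    Ok(out)
}
```

With this sketch, `expand_template("mov {}, {}", &["rax", "rbx"])` yields `mov rax, rbx`, while `expand_template("x{}y", &["r1"])` yields `xr1y`, which is the "concatenate with adjacent characters" behavior described in option 2.
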
@fintelia
Author

My personal opinion is that this is likely the only time Rust will ever be able to ban assembly syntax or enforce rules to make it more interpretable, so we should take advantage of it (provided these things aren't too unreasonable to implement on the compiler side). Unbanning things later would always be an option, but I doubt many people would be too upset if things like "leading zero means octal literal" or "# maybe starts a comment except when it doesn't" were no longer around.
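
As a rough illustration of how cheap such a rule could be to enforce, the sketch below flags integer tokens with a leading zero, the "leading zero means octal literal" case. `leading_zero_literals` is a hypothetical name, and the tokenization (split on anything that is not alphanumeric, `_`, or `.`) is a simplifying assumption, not a real assembler lexer.

```rust
/// Hypothetical lint helper: return tokens that look like C-style octal
/// literals, i.e. all-digit tokens longer than one character starting with `0`.
/// Tokens such as `0x10`, `0b`, or `0.5` contain a non-digit and are ignored.
pub fn leading_zero_literals(template: &str) -> Vec<&str> {
    template
        // Crude tokenization: anything outside [A-Za-z0-9_.] separates tokens.
        .split(|c: char| !(c.is_ascii_alphanumeric() || c == '_' || c == '.'))
        .filter(|tok| {
            tok.len() > 1
                && tok.starts_with('0')
                && tok.bytes().all(|b| b.is_ascii_digit())
        })
        .collect()
}
```

For `"mov $010, %eax"` this reports `010`, while `0x10` and a bare `0` pass through untouched.
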

@comex

comex commented Dec 19, 2019

I favor minimal preprocessing. I think it's easier for users to understand a rule that "outside the format string is Rust syntax, inside the format string is native-assembler syntax", rather than creating some hybrid of the two syntaxes. Admittedly, register replacements force us to have some Rust syntax inside the format string. But I'd rather keep that as essentially a variant of format! string interpolation, where the output string just happens to be in assembler syntax.
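
A small sketch of that mental model, using the ordinary `format!` macro: the only Rust syntax inside the string is the placeholders and the `{{` / `}}` escapes, and everything else passes through as assembler text. `render_masked_add` is a hypothetical helper, and the register names are plain strings chosen for illustration; in real inline asm the compiler would pick them.

```rust
/// Hypothetical helper: build one line of assembly with plain `format!`.
/// `{{k1}}` is escaped so the output keeps the literal braces that
/// AVX-512 opmask syntax uses; `{dst}`, `{a}`, `{b}` are interpolated.
pub fn render_masked_add(dst: &str, a: &str, b: &str) -> String {
    format!("vaddps {dst} {{k1}}, {a}, {b}", dst = dst, a = a, b = b)
}
```

Calling `render_masked_add("zmm0", "zmm1", "zmm2")` produces `vaddps zmm0 {k1}, zmm1, zmm2`: the output string just happens to be in assembler syntax.
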

@joshtriplett
Member

Rust should not attempt to interpret the string at all; the assembler has far more depth than we want to teach Rust on every architecture.

@Lokathor

Clarification question: Do you mean other than the in reg and out reg values being formatted into place?

@fintelia
Author

fintelia commented Dec 19, 2019

> Rust should not attempt to interpret the string at all; the assembler has far more depth than we want to teach Rust on every architecture.

I don't think this follows. Just because the assembler has a ton of depth doesn't mean that rustc can't try to interpret the string at all. The current RFC says that the syntax used is GNU assembler syntax, which means that lines starting with a period are assembler directives and should have the same meaning regardless of the architecture. Thus it shouldn't, for instance, be an issue for the compiler to figure out which directives are used in an inline assembly statement.
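
A rough sketch of that kind of architecture-independent scan follows. It assumes one statement per line and treats the first token of any line beginning with a period as a directive; `directives` is a hypothetical helper, and it skips period-prefixed labels such as `.Lloop:` but does not handle `;`-separated statements, comments, or a directive that follows a label on the same line.

```rust
/// Hypothetical helper: list the assembler directives used in a template.
/// A directive is taken to be the first token of a line whose first
/// non-blank character is `.`, excluding labels (tokens ending in `:`).
pub fn directives(template: &str) -> Vec<&str> {
    template
        .lines()
        .map(str::trim_start)
        .filter(|line| line.starts_with('.'))
        .filter_map(|line| line.split_whitespace().next())
        .filter(|tok| !tok.ends_with(':'))
        .collect()
}
```

Given the template `".section .text\n  .align 4\nmov {}, {}\n.Lloop:\n  .byte 0x90"`, this returns `.section`, `.align`, and `.byte`, which is enough to drive a whitelist or blacklist of directives without understanding any instruction.
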
