Skip to content

Add unstable frontmatter support #137193

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: master
Choose a base branch
from
Open

Conversation

epage
Copy link
Contributor

@epage epage commented Feb 17, 2025

This is an implementation for #136889

This strips the frontmatter, like shebang, because nothing within rustc will be using this. rustfmt folks have said this should be good enough for now as rustfmt should successfully work with this (ie not remove it) even though proper formatting would require AST support. I had considered making this part of lexing of tokens but that led to problems with ambiguous tokens. See also zulip.

This does not do any fancy error reporting for invalid syntax. Besides keeping this first PR simple, we can punt on this because Cargo will be the front line of parsing in the majority of cases and it will need to do have as good or better error reporting than rustc, for this syntax, to avoid masking the quality of rustc's error reporting.

This includes supporting the syntax in

  • the parser
  • pretty printing
  • rustfmt (nop, just needed a test)

This leaves to the future

  • r-a support (called out in the tracking issue)

@rustbot
Copy link
Collaborator

rustbot commented Feb 17, 2025

r? @fee1-dead

rustbot has assigned @fee1-dead.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@epage epage changed the title Add u Add unstable frontmatter support Feb 17, 2025
@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Feb 17, 2025
@rust-log-analyzer

This comment has been minimized.

@ytmimi
Copy link
Contributor

ytmimi commented Feb 18, 2025

Might be worth adding a test case to src/tools/rustfmt/tests/target just to show that rustfmt won't strip the frontmatter.

@rust-log-analyzer

This comment has been minimized.

@rustbot
Copy link
Collaborator

rustbot commented Feb 18, 2025

Some changes occurred in src/tools/rustfmt

cc @rust-lang/rustfmt

@fmease fmease added the F-frontmatter `#![feature(frontmatter)]` label Feb 19, 2025
@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 19, 2025
@epage epage mentioned this pull request Feb 20, 2025
12 tasks
@epage epage force-pushed the frontmatter branch 3 times, most recently from ed5aad9 to 64cecab Compare February 20, 2025 16:03
@fee1-dead fee1-dead added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Feb 22, 2025
return None;
}
let (fence_pattern, rest) = rest.split_at(fence_end);
let (info, rest) = rest.split_once("\n").unwrap_or((rest, ""));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is a frontmatter like ---cargo--- going to be parsed? or ---cargo ---?

It looks like they aren't considered to be valid. Consider making that a bit clearer (maybe here? to say that if there is no newline it will be invalid) and add tests

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything after the 3+ - is the infostring, including any dashes that appear after something else. We check if its an identifier on the following lines which will reject that. We have a test case for invalid identifiers.

Or is the concern more with

strip_frontmatter("---")

(ie a test case where there is no newline after the opening)

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 27, 2025
@fee1-dead fee1-dead added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Feb 28, 2025
@bors
Copy link
Collaborator

bors commented Mar 4, 2025

☔ The latest upstream changes (presumably #137927) made this pull request unmergeable. Please resolve the merge conflicts.

@oli-obk
Copy link
Contributor

oli-obk commented Apr 1, 2025

Ok, so wrt code review. I would like it to be implemented in the lexer (rustc_lexer::Cursor::advance_token). It's impossible for --- to occur anywhere in Rust source, so we can basically parse it similar to /*, even if we later (in parsing) reject it anywhere but at the beginning of a file based module (so not in inline modules).

I think that would simplify a few things as you

  • can reuse the lexing functions to process the bytes instead of having to manually handle spans and such
  • get diagnostics about accidentally adding things in front of the frontmatter (which the playground likes to do even for feature gates, which also have to be before everything else)
  • don't need to rewrite it when rustc/rustfmt are supposed to handle this in the future

@epage
Copy link
Contributor Author

epage commented Apr 1, 2025

But look, I already offered to implement it, so unless you want to actually write the code that are deemed important by at least two compiler maintainers now, I'm always up for doing the work.

If you are willing to implement syntax support for frontmatters in rustc, I'm fine with you taking this over. I must have missed that offer. I only remember you offering to implement the byte stripping though I lost track of that through the conversation

Ok, so wrt code review. I would like it to be implemented in the lexer (rustc_lexer::Cursor::advance_token). It's impossible for --- to occur anywhere in Rust source, so we can basically parse it similar to /*, even if we later (in parsing) reject it anywhere but at the beginning of a file based module (so not in inline modules).

I think I might be missing some of what you are intending because it appears you can have --- in Rust source (playground).

What would be the recommended Token break down for this? Rustfmt would eventually need to track open, info string, body, close, and maybe any of the whitespace between any of that. Some options I see:

  • Be like TokenKind::BlockComment and they need to re-parse the content inside (which ends up not too different from this PR)
  • Like TokenKind::Literal, we could have offsets all inside one token though that could get unwieldy and bloat the size of TokenKind which has the potential to negatively impact performance.
  • As multiple tokens, we'd need to implement more of a state machine inside of Cursor to know to specially handle the inside content

I switched to this PR's approach at the recommendation of others when working through these kinds of problems

@wesleywiser
Copy link
Member

wesleywiser commented Apr 1, 2025

There seem to be many conflicting requirements that led to this confusing situation that we're trying to understand and you're trying to explain to us.

This list doesn't seem quite right to me but perhaps I'm missing some of the details here. After reading the RFC, my understanding is that:

  • Rust (the language) now supports a single frontmatter that may optionally appear after the shebang line. This has no effect whatsoever on the compiler and should be entirely ignored except for the purpose of lexing/parsing to find the closing delimiter.
  • Cargo supports reading this frontmatter section and interpreting it.

The RFC says:

As cargo will be the first step in the process to parse this, the responsibility for high quality error messages will largely fall on cargo.

I agree that is true of errors originating from the content of the frontmatter block, but the block itself is a syntactic element of the language now and should be properly treated for the purposes of error reporting just like block comments are. We will want to have proper error reporting for things like unclosed frontmatter sections even if Cargo explicitly told rustc where the file content should start because a user could always place one later in the file and we want to give them a useful error message.

Therefore, I think the core requirement from the compiler side is that the frontend should recognize the frontmatter block and produce errors if the block is not immediately at the start of the file, the file contains more than one block at any position or the block is not closed. Whether that happens in the lexer or the parser, I'm not sure of the best approach.

Since Cargo also needs to locate and parse the frontmatter block, it seems wasteful to implement duplicate error reporting for malformed frontmatter. I would suggest that if Cargo cannot successfully parse the frontmatter block itself (eg it does not find 3 or more - characters immediately after the shebang line that are terminated by the same number of characters), it should just invoke rustc and let rustc handle the error reporting for the malformed block.

I agree that we need proper error reporting for syntactically invalid frontmatter blocks, but I don't personally think that has to be done in this very PR provided one or more bugs are opened to track that work and marked as open issues on the tracking issue and there is no regression in quality of other error messages.

cc @Noratrieb since you also suggested an implementation for this feature on that Zulip thread 🙂

@epage
Copy link
Contributor Author

epage commented Apr 2, 2025

@wesleywiser

We will want to have proper error reporting for things like unclosed frontmatter sections even if Cargo explicitly told rustc where the file content should start because a user could always place one later in the file and we want to give them a useful error message.

Therefore, I think the core requirement from the compiler side is that the frontend should recognize the frontmatter block and produce errors if the block is not immediately at the start of the file, the file contains more than one block at any position or the block is not closed. Whether that happens in the lexer or the parser, I'm not sure of the best approach.

...

I agree that we need proper error reporting for syntactically invalid frontmatter blocks, but I don't personally think that has to be done in this very PR provided one or more bugs are opened to track that work and marked as open issues on the tracking issue and there is no regression in quality of other error messages.

So if I'm understanding, you see reporting of meanigful errors for misplaced, malformed, and multiple frontmatters as a blocker for stabilizing an MVP?

Since Cargo also needs to locate and parse the frontmatter block, it seems wasteful to implement duplicate error reporting for malformed frontmatter. I would suggest that if Cargo cannot successfully parse the frontmatter block itself (eg it does not find 3 or more - characters immediately after the shebang line that are terminated by the same number of characters), it should just invoke rustc and let rustc handle the error reporting for the malformed block.

This approach was initially discussed at #137193 (comment)

@wesleywiser
Copy link
Member

wesleywiser commented Apr 2, 2025

So if I'm understanding, you see reporting of meanigful errors for misplaced, malformed, and multiple frontmatters as a blocker for stabilizing an MVP?

For me, it's a blocker for stabilization of the feature. The helpfulness/friendliness of the errors is certainly debatable and I wouldn't block stabilization on having them be absolutely perfect but the status quo does not reach the bar for me. I would say the errors at least need to describe what element the compiler thinks it saw and some useful suggestions for resolving common issues (eg, the unclosed block case should tell you how to close the frontmatter block).

As I said, I wouldn't consider that to be a blocker for landing an initial PR though as long we have issues tracking the necessary improvements to error messages and the changes do not regress the UX for any other cases.

In case it is helpful, let me provide concrete suggestions for what I think some of the errors should look like before stabilization:

  1. Unclosed block:
---
//~^ ERROR unclosed frontmatter block
//~| HELP use `---` to close the block

#![feature(frontmatter)]

fn main() {
}
  1. Invalid infostring:
---cargo,hello-world
//~^ ERROR invalid infostring identifier
//~| NOTE identifiers may not contain `,` characters

---

#![feature(frontmatter)]

fn main() {
}
  1. Multiple frontmatters
---cargo
---

---buck
//~^ ERROR only a single frontmatter may appear in a Rust file
---

#![feature(frontmatter)]

fn main() {
}

@fee1-dead
Copy link
Member

Just let me know when I can start the work, either after this lands or now :)

@oli-obk
Copy link
Contributor

oli-obk commented Apr 2, 2025

r? @wesleywiser

@rustbot rustbot assigned wesleywiser and unassigned oli-obk Apr 2, 2025
@oli-obk
Copy link
Contributor

oli-obk commented Apr 2, 2025

In case it is helpful, let me provide concrete suggestions for what I think some of the errors should look like before stabilization:

I think we should also detect

#![feature(frontmatter)]
---cargo
//~^ ERROR frontmatter block after items
---

fn main() {
}

@epage
Copy link
Contributor Author

epage commented Apr 2, 2025

Just let me know when I can start the work, either after this lands or now :)

Some circumstances have changed at work such that I will be unavailable to give this any of my attention, even to discuss this, for the next week. If you are able to move anything forward in that time, feel free to do so. Otherwise, the exact order depends on what direction the implementation for this PR needs to go. Either way, I appreciate the help!

@bors
Copy link
Collaborator

bors commented Apr 3, 2025

☔ The latest upstream changes (presumably #139301) made this pull request unmergeable. Please resolve the merge conflicts.

@fee1-dead
Copy link
Member

Update: I got a considerable amount of progress today implementing it in the lexer -> parser level with proper diagnostics (WIP commit). I expect to have a PR ready early next week.

@fmease
Copy link
Member

fmease commented Apr 13, 2025

Whatever PR wins out, please make sure that frontmatter is not observable via <TokenStream as FromStr>::from_str (in proc-macro crates) just like shebangs. That's still something everybody agrees on hopefully (I've been subscribed to this thread since the very beginning and did read all messages but it's been a while)?

Potential UI test (draft):

extern proc_macro;
use proc_macro::TokenStream;

#[proc_macro]
pub fn check(_: TokenStream) -> TokenStream {
    assert!("---\n---".parse::<TokenStream>().unwrap().is_empty());
    Default::default()
}
//@ check-pass
//@ proc-macro: makro.rs
//@ edition: 2021

makro::check!();

fn main() {}

fee1-dead added a commit to fee1-dead-contrib/rust that referenced this pull request Apr 19, 2025
fee1-dead added a commit to fee1-dead-contrib/rust that referenced this pull request Apr 30, 2025
fee1-dead added a commit to fee1-dead-contrib/rust that referenced this pull request Apr 30, 2025
fee1-dead added a commit to fee1-dead-contrib/rust that referenced this pull request Apr 30, 2025
fee1-dead added a commit to fee1-dead-contrib/rust that referenced this pull request May 1, 2025
fee1-dead added a commit to fee1-dead-contrib/rust that referenced this pull request May 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
F-frontmatter `#![feature(frontmatter)]` S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants