Skip to content

Add comma to grmtools section syntax #542

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 27, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
13 changes: 9 additions & 4 deletions .buildbot.sh
Original file line number Diff line number Diff line change
Expand Up @@ -15,30 +15,35 @@ export PATH=`pwd`/.cargo_install/bin/:$WASMTIME_HOME/bin:$PATH

# Install wasmtime once debian trixie is stablized
# we can likely just use rust-wasmtime.
#
# Needed for wasm32-wasip2
touch .wasmtime_profile
if [ "X`which wasmtime`" = "X" ]; then
PROFILE=".wasmtime_profile" bash -c 'curl https://wasmtime.dev/install.sh -sSf | bash'
fi
. ./.wasmtime_profile

# Needed for wasm32-unknown-unknown
mkdir -p $NVM_DIR
PROFILE=/dev/null bash -c 'curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.2/install.sh | bash'
. "$NVM_DIR/nvm.sh"
# Download and install Node.js:
nvm install 22

rustup target add wasm32-unknown-unknown
rustup target add wasm32-wasip2
cargo install wasm-bindgen-cli

cargo fmt --all -- --check

rustup toolchain install stable
rustup default stable

cargo test
cargo test --release

rustup target add wasm32-unknown-unknown
cargo install wasm-bindgen-cli
cargo test --target wasm32-unknown-unknown

rustup target add wasm32-wasip2
cargo install workspace_runner
cargo test --target wasm32-wasip2

cargo test --lib cfgrammar --features serde
Expand Down
2 changes: 1 addition & 1 deletion .cargo/config.toml
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
[target.wasm32-wasip2]
runner = "wasmtime run --dir ."
runner = "workspace_runner --target wasm32-wasip2 --"

[target.wasm32-unknown-unknown]
# Provided by the crate wasm-bindgen-cli.
Expand Down
2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -46,7 +46,7 @@ lexer can be found in `src/calc.l`:
and where the definitions for the parser can be found in `src/calc.y`:

```rust
%grmtools{yacckind Grmtools}
%grmtools{yacckind: Grmtools}
%start Expr
%avoid_insert "INT"
%%
Expand Down
53 changes: 48 additions & 5 deletions cfgrammar/src/lib/yacc/parser.rs
Original file line number Diff line number Diff line change
Expand Up @@ -455,6 +455,14 @@ impl YaccParser {
let (key_end_pos, key) = self.parse_name(i)?;
i = self.parse_ws(key_end_pos, false)?;
if key == "yacckind" {
if let Some(j) = self.lookahead_is(":", i) {
i = self.parse_ws(j, false)?;
} else {
return Err(YaccGrammarError {
kind: YaccGrammarErrorKind::InvalidGrmtoolsHeaderKey,
spans: vec![Span::new(i, i)],
});
}
let val_end_pos = self.parse_yacckind(i, update_yacc_kind)?;
if let Some(orig) = yacc_kind_key_span {
let dupe = Span::new(key_start_pos, key_end_pos);
Expand All @@ -466,6 +474,12 @@ impl YaccParser {
yacc_kind_key_span = Some(Span::new(key_start_pos, key_end_pos));
i = self.parse_ws(val_end_pos, true)?;
}
if let Some(j) = self.lookahead_is(",", i) {
i = self.parse_ws(j, true)?;
continue;
} else {
break;
}
} else {
return Err(YaccGrammarError {
kind: YaccGrammarErrorKind::InvalidGrmtoolsHeaderKey,
Expand Down Expand Up @@ -2832,20 +2846,20 @@ B";
#[test]
fn test_grmtools_section_yacckinds() {
let srcs = [
"%grmtools{yacckind Original(NoAction)}
"%grmtools{yacckind: Original(NoAction)}
%%
Start: ;",
"%grmtools{yacckind YaccKind::Original(GenericParseTree)}
"%grmtools{yacckind: YaccKind::Original(GenericParseTree)}
%%
Start: ;",
"%grmtools{yacckind YaccKind::Original(yaccoriginalactionkind::useraction)}
"%grmtools{yacckind: YaccKind::Original(yaccoriginalactionkind::useraction)}
%actiontype ()
%%
Start: ;",
"%grmtools{yacckind Original(YACCOriginalActionKind::NoAction)}
"%grmtools{yacckind: Original(YACCOriginalActionKind::NoAction)}
%%
Start: ;",
"%grmtools{yacckind YaccKind::Grmtools}
"%grmtools{yacckind: YaccKind::Grmtools}
%%
Start -> () : ;",
];
Expand All @@ -2856,6 +2870,35 @@ B";
}
}

#[test]
fn test_grmtools_section_commas() {
// We can't actually test much here, because
// We don't have a second value to test.
//
// `RecoveryKind` seemed like an option for an additional value to allow,
// but that is part of `lrpar` which cfgrammar doesn't depend upon.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This highlights another interesting thing, and potential place where we might want to parse
the grmtools section multiple times. E.g. recognizing yacckind in cfgrammar, but recoverer in lrpar.
Presumably that means cfgrammar should ignore unknown values (rather than it's current strict checking).

This is very similar to the "before LexerKind" vs "after LRNonStreamingLexer" thing with lexerkind except across crates.

let src = r#"
%grmtools{
yacckind: YaccKind::Grmtools,
}
%%
Start -> () : ;
"#;
YaccParser::new(YaccKindResolver::NoDefault, src.to_string())
.parse()
.unwrap();
let src = r#"
%grmtools{
yacckind: YaccKind::Grmtools
}
%%
Start -> () : ;
"#;
YaccParser::new(YaccKindResolver::NoDefault, src.to_string())
.parse()
.unwrap();
}

#[test]
fn test_empty_production_spans_issue_473() {
let empty_prod_conflicts = [
Expand Down
4 changes: 2 additions & 2 deletions doc/src/errorrecovery.md
Original file line number Diff line number Diff line change
Expand Up @@ -49,7 +49,7 @@ make use of error recovery.
A simple calculator grammar looks as follows:

```rust,noplaypen
%grmtools{yacckind Grmtools}
%grmtools{yacckind: Grmtools}
%start Expr
%%
Expr -> u64:
Expand Down Expand Up @@ -182,7 +182,7 @@ We thus change the grammar so that inserted integers prevent evaluation from
occurring:

```rust,noplaypen
%grmtools{yacckind Grmtools}
%grmtools{yacckind: Grmtools}
%start Expr
%%
Expr -> Result<u64, ()>:
Expand Down
8 changes: 4 additions & 4 deletions doc/src/lexextensions.md
Original file line number Diff line number Diff line change
Expand Up @@ -15,9 +15,9 @@ other flags should specify their value immediately after the flag name.

```
%grmtools {
allow_wholeline_comments
!octal
size_limit 1024
allow_wholeline_comments,
!octal,
size_limit: 1024,
}
%%
. "rule"
Expand Down Expand Up @@ -56,7 +56,7 @@ The following flags can change the behavior to match posix lex more closely.

```
%grmtools {
!dot_matches_new_line
!dot_matches_new_line,
posix_escapes
}
%%
Expand Down
2 changes: 1 addition & 1 deletion doc/src/quickstart.md
Original file line number Diff line number Diff line change
Expand Up @@ -101,7 +101,7 @@ is lexed, but no lexemes are created from it.

Our initial version of calc.y looks as follows:
```rust,noplaypen
%grmtools{yacckind Grmtools}
%grmtools{yacckind: Grmtools}
%start Expr
%%
Expr -> Result<u64, ()>:
Expand Down
2 changes: 1 addition & 1 deletion doc/src/yaccextensions.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@ But a default can be set or forced by using a `YaccKindResolver`.
## Example

```
%grmtools{yacckind Grmtools}
%grmtools{yacckind: Grmtools}
%%
Start: ;
```
2 changes: 1 addition & 1 deletion lrlex/examples/calc_manual_lex/src/calc.y
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
%grmtools{yacckind Grmtools}
%grmtools{yacckind: Grmtools}
%start Expr
%avoid_insert "INT"
%expect-unused Unmatched "UNMATCHED"
Expand Down
50 changes: 42 additions & 8 deletions lrlex/src/lib/parser.rs
Original file line number Diff line number Diff line change
Expand Up @@ -304,6 +304,15 @@ where
});
}
i = self.parse_spaces(i)?;
if let Some(j) = self.lookahead_is(":", i) {
i = j
} else {
return Err(LexBuildError {
kind: LexErrorKind::InvalidGrmtoolsSectionValue,
spans: vec![Span::new(i, i)],
});
}
i = self.parse_spaces(i)?;
let j = self.parse_digits(i)?;
// This checks that the digits are valid numbers, but currently just returns `None`
// when the values are actually out of range for that type. This could be improved.
Expand Down Expand Up @@ -351,11 +360,18 @@ where
&mut grmtools_section_span_map,
&mut grmtools_section_lex_flags,
)?;
if i == self.src.len() {
return Err(self.mk_error(LexErrorKind::PrematureEnd, i));
if let Some(j) = self.lookahead_is(",", i) {
i = self.parse_ws(j)?;
continue;
} else {
break;
}
}
i = self.lookahead_is("}", i).unwrap();
if let Some(j) = self.lookahead_is("}", i) {
i = j
} else {
return Err(self.mk_error(LexErrorKind::PrematureEnd, i));
}
i = self.parse_ws(i)?;
}
grmtools_section_lex_flags.merge_from(&self.force_lex_flags);
Expand Down Expand Up @@ -864,7 +880,9 @@ mod test {
.expect_err("Parsed ok while expecting error");
assert_eq!(errs.len(), 1);
let e = &errs[0];
assert!(e.kind.is_same_kind(&kind));
if !e.kind.is_same_kind(&kind) {
panic!("expected {:?}.is_same_kind({:?})", &e.kind, &kind);
}
assert_eq!(
e.spans()
.iter()
Expand Down Expand Up @@ -1864,7 +1882,7 @@ b "A"
fn test_grmtools_section_fails() {
let src = r#"
%grmtools {
!unicode
!unicode,
unicode
}
%%
Expand All @@ -1879,8 +1897,8 @@ b "A"

let src = r#"
%grmtools {
size_limit 5
size_limit 6
size_limit: 5,
size_limit: 6,
}
%%
. "dot";
Expand All @@ -1894,7 +1912,7 @@ b "A"

let src = r#"
%grmtools {
!size_limit 5
!size_limit: 5,
}
%%
. "dot"
Expand All @@ -1906,6 +1924,22 @@ b "A"
3,
3,
);
// The following is missing comma separators for more than 2 elements
// This is to avoid parsing it as "key value" number of elements.
// However we actually error after the first element since the parser
// knows "case_insensitive" is a flag with no arguments.
let src = r#"
%grmtools {unicode, case_insensitive posix_escapes allow_comments}
%%
Copy link
Collaborator Author

@ratmice ratmice Mar 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment indicates a difference between the grmtools_section.y parser, and the
lrlex/src/lib/parser.rs.

the grmtools_section.y parser returns a case_insensitive posix_escapes key/value pair, then fails on allow_comments.

I suppose we should consider using key: value here and to be honest, because of the closeness to rust syntax I've found myself adding the ":" accidentally a few times, so maybe it is just right?

So it'd be something like:

%grmtools{
    yacckind: Original(NoAction),
    x,
    !y,
}

Copy link
Collaborator Author

@ratmice ratmice Mar 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess another option would be %yacckind...

I don't think %yacckind ... plays as well with %flag, and %!flag, I suppose it works, but I don't recall any case where %foo omits a subsequent value, it always seems to be %foo xyz.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried key: value in 10e6d74

. "dot"
\n+ ;
"#;
LRNonStreamingLexerDef::<DefaultLexerTypes<u8>>::from_str(src).expect_error_at_line_col(
src,
LexErrorKind::PrematureEnd,
2,
38,
);
}

#[test]
Expand Down
2 changes: 1 addition & 1 deletion lrpar/cttests/src/calc_nodefault_yacckind.test
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
name: Test basic user actions using the calculator grammar
grammar: |
%grmtools {yacckind Original(UserAction)}
%grmtools {yacckind: Original(UserAction)}
%start Expr
%actiontype Result<u64, ()>
%avoid_insert 'INT'
Expand Down
2 changes: 1 addition & 1 deletion lrpar/cttests/src/calc_wasm.test
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
name: Test running on wasm targets
grammar: |
%grmtools {yacckind Grmtools}
%grmtools {yacckind: Grmtools}
%start Expr
%avoid_insert "INT"
%expect-unused Unmatched "UNMATCHED"
Expand Down
4 changes: 2 additions & 2 deletions lrpar/cttests/src/cgen_helper.rs
Original file line number Diff line number Diff line change
Expand Up @@ -79,10 +79,10 @@ pub(crate) fn run_test_path<P: AsRef<Path>>(path: P) -> Result<(), Box<dyn std::
// Create grammar files
let base = path.file_stem().unwrap().to_str().unwrap();
let mut pg = PathBuf::from(&out_dir);
pg.push(format!("{}.y.rs", base));
pg.push(format!("{}.test.y", base));
Copy link
Collaborator Author

@ratmice ratmice Mar 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had believed that this .y.rs thing was a mistake of sorts, because this is a .y file not a .rs file.
But it occurred to me that emitting of this file with a .rs extension may have been a way to get the same behavior as cargo:rerun-if-changed=path behavior.

In which case I don't think it's necessary we emit rerun-if-changed for the .test file these derive from.
But I do think that means we should also emit rerun-if-changed for glob(src/*.rs) which I don't think we're doing.

Perhaps best to do that and an audit in a follow up.

Or should I revert this part of the change, and change the tests to glob for *.y.rs for the time being?
Then do the whole renaming/rerun audit together in a follow up

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps best to do that and an audit in a follow up.

Good idea!

fs::write(&pg, grm).unwrap();
let mut pl = PathBuf::from(&out_dir);
pl.push(format!("{}.l.rs", base));
pl.push(format!("{}.test.l", base));
fs::write(&pl, lex).unwrap();

// Build parser and lexer
Expand Down
Loading