Skip to content

Add comma to grmtools section syntax #542

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 27, 2025

Conversation

ratmice
Copy link
Collaborator

@ratmice ratmice commented Mar 24, 2025

This uses the modified grmtools section grammar from #490 which adds a comma to make it unambiguous.
This isn't what is currently parsed by the section parsing code in parser.rs.

using a self hosted grammar isn't something we currently do,
This patch doesn't intend to add it. However it occurred to me that we could still run it in
cttests/ over the lrpar/example/*.{l,y} and those generated by cttests.

This currently works for all the files in the repository,
you need to have at least 3 entries in the grmtools section before it realizes something is awry.
because it reads the first and second entries as key/value.

Marking this as draft for the following purposes

  1. until there is agreement on whether this change of syntax is acceptable and good...
  2. until we change the parser.rs files to accept it.

Edit:
The other thing this doesn't try to do is add a specialized parser for lex or one for yacc files.
Accepting just the keys for the respective files, I guess we could do a separate parser each?

// knows "case_insensitive" is a flag with no arguments.
let src = r#"
%grmtools {unicode, case_insensitive posix_escapes allow_comments}
%%
Copy link
Collaborator Author

@ratmice ratmice Mar 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment indicates a difference between the grmtools_section.y parser, and the
lrlex/src/lib/parser.rs.

the grmtools_section.y parser returns a case_insensitive posix_escapes key/value pair, then fails on allow_comments.

I suppose we should consider using key: value here and to be honest, because of the closeness to rust syntax I've found myself adding the ":" accidentally a few times, so maybe it is just right?

So it'd be something like:

%grmtools{
    yacckind: Original(NoAction),
    x,
    !y,
}

Copy link
Collaborator Author

@ratmice ratmice Mar 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess another option would be %yacckind...

I don't think %yacckind ... plays as well with %flag, and %!flag, I suppose it works, but I don't recall any case where %foo omits a subsequent value, it always seems to be %foo xyz.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried key: value in 10e6d74

@@ -351,11 +351,19 @@ where
&mut grmtools_section_span_map,
&mut grmtools_section_lex_flags,
)?;
if i == self.src.len() {
return Err(self.mk_error(LexErrorKind::PrematureEnd, i));
eprintln!("foo '{:?}'", &self.src[i..]);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be removed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in 57b94aa

// We don't have a second value to test.
//
// `RecoveryKind` seemed like an option for an additional value to allow,
// but that is part of `lrpar` which cfgrammar doesn't depend upon.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This highlights another interesting thing, and potential place where we might want to parse
the grmtools section multiple times. E.g. recognizing yacckind in cfgrammar, but recoverer in lrpar.
Presumably that means cfgrammar should ignore unknown values (rather than it's current strict checking).

This is very similar to the "before LexerKind" vs "after LRNonStreamingLexer" thing with lexerkind except across crates.

@ltratt
Copy link
Member

ltratt commented Mar 24, 2025

Apart from s/comma/colon/ I think this is ready to go. Please squash (and reword the commit) appropriately.

@ratmice
Copy link
Collaborator Author

ratmice commented Mar 24, 2025

Well, it adds both comma and colon, so not just s/comma/colon/ but I'll reword it appropriately.

@ltratt
Copy link
Member

ltratt commented Mar 24, 2025

Aha, yes, now I understand the "," aspect!

@@ -340,6 +340,7 @@ where
let mut grmtools_section_span_map = HashMap::new();
let mut grmtools_section_lex_flags = UNSPECIFIED_LEX_FLAGS;
if let Some(j) = self.lookahead_is("%grmtools", i) {
// lrlex currently doesn't have any `key: value' settings.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually wrong, I totally forgot that regex defined some limit related values.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be fixed in 54f907f

@ratmice ratmice force-pushed the cttest_grmtools_section branch from 10e6d74 to 54f907f Compare March 24, 2025 17:39
@ratmice ratmice marked this pull request as ready for review March 24, 2025 17:44
@ratmice
Copy link
Collaborator Author

ratmice commented Mar 24, 2025

Will need another squash, I fixed field-like values on lrlex, and updated the docs.

@ltratt
Copy link
Member

ltratt commented Mar 24, 2025

Good catch! Please squash.

@ratmice ratmice force-pushed the cttest_grmtools_section branch from d2e6a83 to 9fcf7e1 Compare March 24, 2025 18:03
@ratmice
Copy link
Collaborator Author

ratmice commented Mar 24, 2025

Squashed, and cleaned up the commit message regarding fields in lex.

@ltratt ltratt added this pull request to the merge queue Mar 24, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 24, 2025
@ratmice
Copy link
Collaborator Author

ratmice commented Mar 24, 2025

Curious curious, I have a theory about what is wrong, the build logs aren't informative about it,
The new test I added runs on native and wasm32-unknown, but fails on wasmtime.

I believe it is probably wasmtime sandboxing filesystem access to the current directory, and that test tries to read lex/yacc files in ../examples/ and OUT_DIR.

I'll just run it locally to confirm

@ratmice
Copy link
Collaborator Author

ratmice commented Mar 24, 2025

So, yes that seems to be the issue. Unfortunately I'm not certain how to fix it (properly)...

My attempt was to add a .cargo/scripts/wasi_test_runner.sh
.cargo/config.toml

runner = ".cargo/scripts/wasi_test_runner.sh"

Unfortunately it looks like the test runner may get executed without the environment variable OUT_DIR set.
(assuming I'm not doing something silly)

#!/bin/sh -x
export top_dir="$CARGO_MANIFEST_DIR"
while [ -f "${top_dir}/Cargo.toml" ]; do
        export top_dir="$top_dir/.."
done
wasmtime run --dir . \
        --dir $CARGO_MANIFEST_DIR \
        --dir $top_dir \
        --dir $OUT_DIR \
        --env CARGO_MANIFEST_DIR="$CARGO_MANIFEST_DIR" \
        --env OUT_DIR=$OUT_DIR \
        $1

So if that is the case, this would seem to indicate a cargo problem with setting that environment variable on that target,
it is possible we may have to conditionally disable the test, perhaps: #[cfg(all(not(wasm32_wasip2), test)]

@ltratt
Copy link
Member

ltratt commented Mar 24, 2025

This is outside my comfort zone, but are we suggesting this is a cargo bug? or a wasmtime limitation? or ... ?

@ratmice
Copy link
Collaborator Author

ratmice commented Mar 24, 2025

It's actually just me misunderstanding cargo's weirdness.

Cargo sets the environment variable, but only when build.rs exists in the project,
and the various crates exist in the workspace, some of them have a build.rs and some don't.
If the project doesn't have a build.rs a projects tests shouldn't rely on it since it can't have any effect.

So I think it is just normal behavior that my test runner needs to account for the fact that it may or may not be set.

@ltratt
Copy link
Member

ltratt commented Mar 24, 2025

Can we fix things here?

@ratmice
Copy link
Collaborator Author

ratmice commented Mar 24, 2025

I think so but still working on it, at worst we just disable the test on that particular target.
I'm also not certain whether or not the hacks required to fix it are be so terrible that we'd rather just disable it or not.

E.g. there is no way I know of to just get the workspace dir, and CARGO_MANIFEST_DIR/../examples isn't valid for all crates in the workspace. So what i'm trying to do is search in manifest_dir/../, manifest_dir/../.., up one directory at a time looking for a Cargo.toml with the [Workspace] key, and passing that path.

I think it is possible we end up with something that fixes all issues of this kind in the workspace once and for all with a unfortunate but perhaps tolerable hack.

But I'm definitely not there yet.

@ratmice
Copy link
Collaborator Author

ratmice commented Mar 25, 2025

For the day at least, i've declared bankruptcy on trying to get it working,
I couldn't even seem to get it working while hard coding paths.
I could sometimes get the tests to pass but not run because the globs were returning nothing.
Thus the test was just an empty loop.

Plus it's also becoming of a size more reasonable for it's own patch if we manage to get it working.
So for now I just disabled that test on the wasm32-wasi target

@ltratt
Copy link
Member

ltratt commented Mar 25, 2025

If you have a bright idea after a good night's sleep, go for it, but if not: please squash.

@ratmice
Copy link
Collaborator Author

ratmice commented Mar 25, 2025

There is something I can try, all the paths I was setting manually were about setting up which directory was allowed inside the sandbox...
I was still using the cargo/rust generated paths to OUT_DIR, and MANIFEST_DIR, those get set through the environment as absolute paths. It is possible that in order to get them to work in the sandbox those must be changed into relative paths.

That doesn't explain exactly why it didn't work when I tried to add / to the sandbox, so it could also be some interaction between wasmtime and how the glob crate works. But there is no crate boundary involved with seeing if I can get relative paths in place of what cargo gives.

@ratmice
Copy link
Collaborator Author

ratmice commented Mar 25, 2025

I did mange to get something working, and fixed up most of the hacks

So for the time being i've put it up on my github rather than add it to the workspace,
Until we decide if grmtools wants to take it.

Note that because .cargo/config.toml is committed to the repo people will need it to run cargo.test
on wasm targets, so having it in repo allows us to say cargo install --path grmtools_workspace_runner or something
Where otherwise it'll be a separate tool they'll have to install.

I don't have much of a preference, but i very rarely ever cargo publish anything...

https://github.com/ratmice/workspace_time_test_runner

Edit: It is probably worth renaming this to be more accurate, the runner flag is apparently also used for cargo run.
e.g. when you cargo run nimbleparse --target wasm32-wasip2 the runner gets invoked first.
So, calling it a 'test runner' isn't really accurate.

In .buildbot.sh add:

cargo install --git 'https://github.com/ratmice/workspace_time_test_runner.git'

then in .cargo/config set

[target.wasm32-wasip2]
runner = "workspace_time_test_runner"

then removing the

- #[cfg(not(all(target_arch = "wasm32", target_os = "wasi")))]
 fn test_grmtools_section_files() {

finally once it is installed cargo test --target wasm32-wasip2 -- --nocapture 2>&1 | grep %grmtools should list all the grmtools sections from the examples/**/.{l,y} and cttests/.test files

@ltratt
Copy link
Member

ltratt commented Mar 26, 2025

Wow, I must admit, this feels like a lot of complexity for one (apparently weird!) platform. Do I understand correctly that this would require normal grmtool developers to install stuff before cargo test works?

@ratmice
Copy link
Collaborator Author

ratmice commented Mar 26, 2025

It only requires people to install stuff before cargo test --target wasm32-wasip2, technically that also requires
you to install stuff currently, but it is the normal wasmtime installation that provides the target.

cargo test --target wasm32-unknown-unknown acts similarly in that it requires cargo install wasm-bindgen-cli

But yeah I agree that the question of additional complexity vs how worthwhile this is is a tough call for one test.
Either way I'm not sorry I did this, I'm sure it'll be useful for others in some way, even if you choose not to use it.

One of the things I do like about that specific test though is that it is kind of a complete test of the stack:
by that I mean, YaccKind::Grmtools with rust code, actually parsing input, and running user actions.
We probably have other tests like that? I cannot off-hand say which one, I guess calc_actions does get run on ci but not in cargo test.

Edit:

I had originally hoped to avoid the need to cargo install by doing

runner = "cargo run -p grmtools_wasi_runner"

Putting the runner inside the project. Unfortunately the recursive invocation of cargo overwrites environment variables which causes the environment variables for the grmtools_wasi_runner to be specified instead of the ones we want. I filed rust-lang/cargo#15350
for that

Anyhow, I do find the need to only cargo install when testing this one weird target to make it somewhat more tolerable.

Edit: I suppose that one thing we could do is attempt to get it added upstream in wasmtime so that it gets installed when we install the target. I'm not sure that they would accept it but we could try (I chatted with them informally but they were pretty hesitant, from their perspective cargo is just one of many languages and toolchains so it's a bit weird to add this special stuff for cargo).

@ratmice ratmice force-pushed the cttest_grmtools_section branch from bb96395 to 06cef3e Compare March 26, 2025 13:08
@ratmice
Copy link
Collaborator Author

ratmice commented Mar 26, 2025

Anyhow, I added some minor comments + assertions + fixed whitespace, then squashed in case you wanted to just go ahead without the runner.

@@ -79,10 +79,10 @@ pub(crate) fn run_test_path<P: AsRef<Path>>(path: P) -> Result<(), Box<dyn std::
// Create grammar files
let base = path.file_stem().unwrap().to_str().unwrap();
let mut pg = PathBuf::from(&out_dir);
pg.push(format!("{}.y.rs", base));
pg.push(format!("{}.test.y", base));
Copy link
Collaborator Author

@ratmice ratmice Mar 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had believed that this .y.rs thing was a mistake of sorts, because this is a .y file not a .rs file.
But it occurred to me that emitting of this file with a .rs extension may have been a way to get the same behavior as cargo:rerun-if-changed=path behavior.

In which case I don't think it's necessary we emit rerun-if-changed for the .test file these derive from.
But I do think that means we should also emit rerun-if-changed for glob(src/*.rs) which I don't think we're doing.

Perhaps best to do that and an audit in a follow up.

Or should I revert this part of the change, and change the tests to glob for *.y.rs for the time being?
Then do the whole renaming/rerun audit together in a follow up

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps best to do that and an audit in a follow up.

Good idea!

@ltratt
Copy link
Member

ltratt commented Mar 27, 2025

@ratmice OK, phew, since it's limited to that target, I'm fine with this. Please squash.

@ratmice
Copy link
Collaborator Author

ratmice commented Mar 27, 2025

Alright, Just so I'm clear we're okay with this additional cargo install requirement when it is isolated to --target wasm32-wasip2, so I'll add that and squash.

If so please give me a day or so to figure out a better name for this runner,
I figure I'll just keep it under my personal account and upload to crates.io,
which makes it easier for external projects to use it, than if it was in this repo.
and as the author I would be reluctant to try and heave it upon softdevteam/ unless it's something you all want to adopt.

The leading naming candidate for the moment is just plain old workspace_runner then add --target wasm32-wasip2 -- flags. So that in theory it could expand to other targets/sandboxes in the future. I think that is both clear in it's function,
and avoids stepping toes on any potential trademarks, and allows for future expansion.

runner = "workspace_runner --target wasm32-wasip2 --"

@ltratt
Copy link
Member

ltratt commented Mar 27, 2025

Just so I'm clear we're okay with this additional cargo install requirement when it is isolated to --target wasm32-wasip2, so I'll add that and squash.

Yep!

I figure I'll just keep it under my personal account and upload to crates.io

This seems like a good idea to me. It might be useful to others too!

This reworks the syntax for the grmtools section to be unambiguous,
adding a lexer and grammar file for it in `cttests/grmtools_section.test`.
The grammar is run over the grmtools sections extracted from the .l and .y
files in the repository.

Previous unreleased versions of grmtools including this syntax used a
space separated sequence of values. Where some values were flags,
and other values acted like key value pairs.

before:
    `%grmtools{!dot_matches_new_line case_insensitive size_limit 0}`
    `%grmtools{yacckind Grmtools}`

This changes the grammar such that values must be separated by commas,
and `key: value` pairs must be separated by colons, with an optional
trailing comma.

    `%grmtools{!dot_matches_new_line, case_insensitive, size_limit: 0}`
    `%grmtools{yacckind: Grmtools}`
@ratmice ratmice force-pushed the cttest_grmtools_section branch from 06cef3e to 5b7a126 Compare March 27, 2025 18:53
@ratmice
Copy link
Collaborator Author

ratmice commented Mar 27, 2025

Updated to use the test runner,

I moved the cargo install of the runners for wasm targets to be after
the cargo test line, so it was more apparent that they aren't required for default targets.

Anyhow I think this should be ready now 🤞

@ltratt ltratt added this pull request to the merge queue Mar 27, 2025
Merged via the queue into softdevteam:master with commit eb831ab Mar 27, 2025
2 checks passed
@ratmice ratmice deleted the cttest_grmtools_section branch March 28, 2025 00:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants