Skip to content

Conversation

@csnover
Copy link

@csnover csnover commented Oct 18, 2025

This is based on the commits in #416, because I did not want to deal with slow tests :-)

I was adapting a parser for a format (wikitext) which has pathological backtracking behaviour which demands caching to avoid quadratic parsing, but also the format is context-sensitive so rules need to receive arguments in order to know which tokens are terminators or not according to the enclosing context.

This patch adds support for combining #[cache] with arguments for the specific use case I had, which is passing around a context object with interior mutability ( 😱 ) by reference.

If you have separate suggestions about how to deal with such an unpleasant format, I am all ears :-)

Thanks, and after years, I still really enjoy working with rust-peg every time I have a need for a non-trivial parser!

There are a few of them, do not know which one it is, do not care
to spend more time figuring out exactly which one it is, since this
stole over an hour of my life.
This prevents inconsistencies, like only having an MSRV on one
of the packages.
trybuild is VERY SLOW and it hides the test content from tooling
(clippy, fmt, rust-analyzer, etc.). These tests must compile and
pass so there is no reason to run them through trybuild.

This change reduces test run time from 8.7s seconds to 0.7s.

The recent-rustc tests are still run through trybuild even though
they do not need to be simply because this avoids adding a build.rs
and build dependency for versioning. Once the MSRV is increased
this can also just go away.
The code generation does a little more work than it needs to
(the cache key could always just be a tuple, or the expression
could be written as `(usize #(, #name)*)`) simply because my
experience writing proc-macros is that lints have a tendency to
leak out.

The code generation also does a little less work than it needs to
because the way it strips references is extremely naïve and, at
least, will not handle lifetimes. It also means that if someone
actually has a lifetime for a reference which is long enough to
outlive `'input` it will not be possible for them to simply cache
the reference instead of a clone of the argument.
_ => true,
})
.collect(),
RuleParamTy::Rule(ty) => ty.clone(),
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this will work? This is for a parameter declared as rule<T>, which at the Rust level is a closure returning RuleResult<T>. To work properly, the cache key would need to be the identity of the closure definition site plus any closed-over upvars, which would be quite difficult to do. So this should probably remain forbidden.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You’re right, I improperly assumed that prohibiting generic types was sufficient for this case because I almost never used rule with a concrete type. Now that you point it out, it is embarrassingly obvious that RuleParamTy appearing here actually means I needed to think about it more :-)

Comment on lines +169 to +179
RuleParamTy::Rust(ty) => ty
.clone()
.into_iter()
.filter(|tt| match tt {
// TODO: This needs to do more; at the least, there
// may be lifetimes. Doing this correctly for every
// edge case probably requires syn, which is a heavy
// dependency.
TokenTree::Punct(punct) => punct.as_char() != '&',
_ => true,
})
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides lifetimes, this is going to break for things like &str. I don't think trying to turn a borrowed type into an owned type syntactically is going to be robust enough, even using syn.

The right way would be to use traits to get rustc to do it for you in the type system. ToOwned is close, but is implemented for T rather than &T. It might be hard to handle that because &T implements Clone so there couldn't be a T: Clone blanket impl and a separate one for references.

The simplest and most consistent way would be to just require Clone + Hash + 'static. If you're ok with this cloning the argument implicitly to use as the cache key, it seems like it should also be fine to pass by value.

Copy link
Author

@csnover csnover Oct 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What you are saying makes sense.

I approached this largely pragmatically where a move away from “you can’t do this at all” was a step in the right direction, rather than doing the absolute best engineering. (It feels bad to say that when contributing to someone else’s project…) So I considered it OK to just have a compilation error in situations like what you describe with &str, though you make a great point that using <T as ToOwned>::Owned makes more sense—I always forget about this. (AFAIK it would still require stripping the reference in a hacky way like this, though?)

The issue of value versus reference types in cases similar to this is one which has come up in another crate I maintain and I never really came up with a good solution for what to do in the absence of type metaprogramming, and so I am not sure if there is any general solution for this case either.

I think the main issue with requiring cacheable arguments to be passed by value is that cloning is only necessary when inserting to the cache. For rules where there is a cache hit, this would just be wasted work. Additionally, in my case, requiring to pass by value would also require adding Rc, whereas right now it is not needed. These are not dealbreakers but it does feel bad to artificially restrict performance in a feature intended to improve performance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants