-
Notifications
You must be signed in to change notification settings - Fork 594
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Creating strategies from grammars #170
Comments
@DRMacIver, is this closed by |
It's definitely not closed by #792. I don't think this is necessarily a single strategy function. But an example of a capability that we don't currently support and that I'd like to is to be able to take a grammar definition in EBNF and turn it into a strategy. |
Having looked into grammar definition languages, the clear winner for context-free grammars is ABNF. This is equivalent to the many, many EBNF description syntaxes, but has actually been standardised by RFC 5234 (and case-sensitivity added in RFC 7405). ABNF is easy to write, and even easier to copy from an RFC if you're implementing a standard protocol! Sadly I couldn't find a finished ABNF definition parser in Python, but it's easy enough to write one as the structure is pretty simple. |
Another thing to consider is ANTLR grammar definitions, mostly on grounds that there are a lot of them! picireny uses these from Python to do test-case reduction, although I haven't looked into the details. We probably don't want people writing ANTLR grammar definitions by hand, but it might be a nice starting point for getting a lot of strategies almost for free. |
We could support both pretty easily; I think the ANTLR and ABNF syntax is totally disjoint so we can even autodetect which one has been passed to a single def from_grammar(
production: Union[str, tuple[str, ...]], # Name or names of the production(s) to produce
grammar: str, # The grammar definition
) -> SearchStrategy[str]: ...
def get_grammar_definition(
name: str, # A name from the ANTLR grammars repo
) -> str: ... Heck, we can probably just generate the parsers for ANTLR and ABNF grammar definitions with Thoughts on that as an API? I've deliberately gone minimalist for a first pass as it's easier to add arguments than to take them away... |
Aaaand, ugh, I couldn't find an ABNF or Antlr parser that exposes a nice structured representation of the grammar 😭 On the upside, That would make the function |
As a follow-up to close this issue, it should be fairly straightforward to write a Lark EBNF grammar for rfc5234 ABNF grammars, and use Lark's parse tree transformer functions to go translate directly to a strategy tree. We could use the same trick for ANTLR definitions too! Translating from RFC 5234 to Lark gives me the following, which works in the prototype r"""
// The production rules for ABNF grammar descriptions
rulelist : ( rule | (c_wsp* c_nl) )+
rule : rulename defined_as elements c_nl
rulename : ALPHA (ALPHA | DIGIT | "-")*
defined_as : c_wsp* ("=" | "=/") c_wsp*
elements : alternation c_wsp*
c_wsp : c_nl? WSP
c_nl : comment | CRLF
comment : ";" (WSP | VCHAR)* CRLF
alternation : concatenation (c_wsp* "/" c_wsp* concatenation)*
concatenation : repetition (c_wsp+ repetition)*
repetition : repeat? element
repeat : DIGIT+ | DIGIT* "*" DIGIT*
element : rulename | group | option | char_val | num_val | prose_val
group : "(" c_wsp* alternation c_wsp* ")"
option : "[" c_wsp* alternation c_wsp* "]"
char_val : DQUOTE (SP | VCHAR)* DQUOTE
num_val : "%" (bin_val | dec_val | hex_val)
bin_val : "b" BIT+ ( ("." | "-") BIT+ )?
dec_val : "d" DIGIT+ ( ("." | "-") DIGIT+ )?
hex_val : "x" HEXDIG+ ( ("." | "-") HEXDIG+ )?
prose_val : /<[\x20-\x3d\x3f-\x7e]*>/
// Standard terminals defined for all ABNF grammars
ALPHA : /[A-Za-z]/
BIT : "0" | "1"
CHAR : /[\x01-\x7f]/
CR : "\r"
CRLF : "\r\n"
CTL : /[\x00-\x1f\x7f]/
DIGIT : /[0-9]/
DQUOTE : "\""
HEXDIG : /[0-9A-Fa-f]/
HTAB : "\t"
LF : "\n"
LWSP : /((\r\n)?[ \t])+/
OCTET : /[\x00-\xFF]/
SP : " "
VCHAR : /[\x21-\x7e]/
WSP : " " | "\t"
""" |
On further investigation, I think our best option would actually be to handle this upstream by adding a |
I think it's worth having a wrapper |
I’d like to be able to generate strings matching some grammar. I’ve previously implemented this in a very early version of Hypothesis for regular grammars but never finished it off.
The text was updated successfully, but these errors were encountered: