Creating strategies from grammars #170

DRMacIver · 2015-09-25T09:55:32Z

I’d like to be able to generate strings matching some grammar. I’ve previously implemented this in a very early version of Hypothesis for regular grammars but never finished it off.

Zac-HD · 2017-08-28T00:44:15Z

@DRMacIver, is this closed by from_regex() in #792? If not, what additional capabilities do you want and what would the arguments to such a strategy be?

DRMacIver · 2017-08-28T07:35:23Z

It's definitely not closed by #792.

I don't think this is necessarily a single strategy function. But an example of a capability that we don't currently support and that I'd like to is to be able to take a grammar definition in EBNF and turn it into a strategy.

Zac-HD · 2018-12-14T06:53:47Z

Having looked into grammar definition languages, the clear winner for context-free grammars is ABNF. This is equivalent to the many, many EBNF description syntaxes, but has actually been standardised by RFC 5234 (and case-sensitivity added in RFC 7405).

ABNF is easy to write, and even easier to copy from an RFC if you're implementing a standard protocol!

Sadly I couldn't find a finished ABNF definition parser in Python, but it's easy enough to write one as the structure is pretty simple.

DRMacIver · 2018-12-14T10:38:02Z

Another thing to consider is ANTLR grammar definitions, mostly on grounds that there are a lot of them! picireny uses these from Python to do test-case reduction, although I haven't looked into the details.

We probably don't want people writing ANTLR grammar definitions by hand, but it might be a nice starting point for getting a lot of strategies almost for free.

Zac-HD · 2018-12-14T12:06:49Z

We could support both pretty easily; I think the ANTLR and ABNF syntax is totally disjoint so we can even autodetect which one has been passed to a single from_grammar function. For ~~bonus points~~ convenience I'd add a function to fetch grammars from antlr/grammars-v4 by name - caching the .g4 files in .hypothesis/grammars/ for obvious performance reasons.

def from_grammar(
    production: Union[str, tuple[str, ...]],  # Name or names of the production(s) to produce
    grammar: str,  # The grammar definition
) -> SearchStrategy[str]: ...

def get_grammar_definition(
    name: str,  # A name from the ANTLR grammars repo
) -> str: ...

Heck, we can probably just generate the parsers for ANTLR and ABNF grammar definitions with antlr4 and stuff them in the hypothesis.vendor namespace!

Thoughts on that as an API? I've deliberately gone minimalist for a first pass as it's easier to add arguments than to take them away...

Zac-HD · 2018-12-28T13:49:48Z

Aaaand, ugh, I couldn't find an ABNF or Antlr parser that exposes a nice structured representation of the grammar 😭

On the upside, Lark looks very nice and does. A lark.lark.Lark (...lark) object can be constructed from an EBNF grammar, and has a tree representation thereof along with the name of the start node. So I'd be all in favor of accepting one of those, and making all the validation happen before the problem gets passed off to Hypothesis.

That would make the function hypothesis.extra.grammar.from_lark, I think.

Zac-HD · 2019-01-08T13:58:34Z

As a follow-up to close this issue, it should be fairly straightforward to write a Lark EBNF grammar for rfc5234 ABNF grammars, and use Lark's parse tree transformer functions to go translate directly to a strategy tree. We could use the same trick for ANTLR definitions too!

Translating from RFC 5234 to Lark gives me the following, which works in the prototype from_lark to generate (syntactically, but usually not semantically) valid ABNF grammars:

r"""
// The production rules for ABNF grammar descriptions
rulelist        : ( rule | (c_wsp* c_nl) )+
rule            : rulename defined_as elements c_nl
rulename        : ALPHA (ALPHA | DIGIT | "-")*
defined_as      : c_wsp* ("=" | "=/") c_wsp*
elements        : alternation c_wsp*

c_wsp           : c_nl? WSP
c_nl            : comment | CRLF
comment         : ";" (WSP | VCHAR)* CRLF

alternation     : concatenation (c_wsp* "/" c_wsp* concatenation)*
concatenation   : repetition (c_wsp+ repetition)*
repetition      : repeat? element
repeat          : DIGIT+ | DIGIT* "*" DIGIT*
element         : rulename | group | option | char_val | num_val | prose_val
group           : "(" c_wsp* alternation c_wsp* ")"
option          : "[" c_wsp* alternation c_wsp* "]"

char_val        : DQUOTE (SP | VCHAR)* DQUOTE
num_val         : "%" (bin_val | dec_val | hex_val)
bin_val         : "b" BIT+ ( ("." | "-") BIT+ )?
dec_val         : "d" DIGIT+ ( ("." | "-") DIGIT+ )?
hex_val         : "x" HEXDIG+ ( ("." | "-") HEXDIG+ )?
prose_val       : /<[\x20-\x3d\x3f-\x7e]*>/

// Standard terminals defined for all ABNF grammars
ALPHA           : /[A-Za-z]/
BIT             : "0" | "1" 
CHAR            : /[\x01-\x7f]/
CR              : "\r"
CRLF            : "\r\n"
CTL             : /[\x00-\x1f\x7f]/
DIGIT           : /[0-9]/
DQUOTE          : "\""
HEXDIG          : /[0-9A-Fa-f]/
HTAB            : "\t"
LF              : "\n"
LWSP            : /((\r\n)?[ \t])+/
OCTET           : /[\x00-\xFF]/
SP              : " "
VCHAR           : /[\x21-\x7e]/
WSP             : " " | "\t"
"""

Zac-HD · 2019-02-04T02:56:57Z

On further investigation, I think our best option would actually be to handle this upstream by adding a Lark.from_abnf constructor method. I'm therefore closing this as implemented by #1740.

DRMacIver · 2019-02-04T12:45:38Z

On further investigation, I think our best option would actually be to handle this upstream by adding a Lark.from_abnf constructor method. I'm therefore closing this as implemented by #1740.

I think it's worth having a wrapper from_abnf function on our side as well. It can just be a thin shim over lark, but it's nice to have an API that doesn't tie us to too tightly to the lark implementation.

DRMacIver added the for-a-modest-fee label Sep 25, 2015

DRMacIver added this to the Hypothesis 2.0 milestone Sep 25, 2015

DRMacIver added the enhancement it's not broken, but we want it to be better label Sep 25, 2015

lucaswiman mentioned this issue Jun 18, 2016

Fixing several problems with hypothes.extra.django #335

Closed

mulkieran mentioned this issue Jan 13, 2017

D-Bus specification specifies various maxima which strategy does not conform to mulkieran/hs-dbus-signature#73

Open

Zac-HD added new-feature entirely novel capabilities or strategies and removed enhancement it's not broken, but we want it to be better labels Jun 10, 2017

DRMacIver removed the for-a-modest-fee label Feb 23, 2018

Zac-HD removed this from the Hypothesis 2.0 milestone Mar 1, 2018

Zac-HD mentioned this issue Jan 5, 2019

Create strategies for strings matching Lark (EBNF) context-free grammars #1740

Merged

Zac-HD self-assigned this Jan 8, 2019

Zac-HD removed their assignment Jan 26, 2019

Zac-HD mentioned this issue Feb 4, 2019

Construct a Lark grammar from ABNF format (RFC 5234) lark-parser/lark#318

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Creating strategies from grammars #170

Creating strategies from grammars #170

DRMacIver commented Sep 25, 2015

Zac-HD commented Aug 28, 2017

DRMacIver commented Aug 28, 2017

Zac-HD commented Dec 14, 2018 •

edited

Loading

DRMacIver commented Dec 14, 2018

Zac-HD commented Dec 14, 2018 •

edited

Loading

Zac-HD commented Dec 28, 2018

Zac-HD commented Jan 8, 2019 •

edited

Loading

Zac-HD commented Feb 4, 2019

DRMacIver commented Feb 4, 2019

Creating strategies from grammars #170

Creating strategies from grammars #170

Comments

DRMacIver commented Sep 25, 2015

Zac-HD commented Aug 28, 2017

DRMacIver commented Aug 28, 2017

Zac-HD commented Dec 14, 2018 • edited Loading

DRMacIver commented Dec 14, 2018

Zac-HD commented Dec 14, 2018 • edited Loading

Zac-HD commented Dec 28, 2018

Zac-HD commented Jan 8, 2019 • edited Loading

Zac-HD commented Feb 4, 2019

DRMacIver commented Feb 4, 2019

Zac-HD commented Dec 14, 2018 •

edited

Loading

Zac-HD commented Dec 14, 2018 •

edited

Loading

Zac-HD commented Jan 8, 2019 •

edited

Loading