Description
Hi!
Thanks for your efforts on this project!
I have tried the working example with the latest version of this project.
The commands are listed below:
grammarinator-process examples/grammars/HTMLLexer.g4 examples/grammars/HTMLParser.g4 -o examples/fuzzer/
grammarinator-generate HTMLGenerator.HTMLGenerator -r htmlDocument -d 20 -o /test/test_%d.html -n 10 -s HTMLGenerator.html_space_serializer --sys-path /grammarinator/examples/fuzzer
The first line of the generated Generator file contains the version information:
# Generated by Grammarinator 23.7
However, the generated test cases do not look as expected.
Specifically, the files are either blank or filled with invalid code (i.e., content that does not conform to the original antlr grammar file).
Case 1:
<%%>
<%𰢆%>
<聉‿·>
<g3_F="" ⺣27 씨·= "" ><![CDATA[䬓꭪]]></g3_>
<![]><ﹲ⁀><![谻]>ꯚ𢝬<![]></ﹲ⁀> <!---->
<Z-/><!--䜵-->
<script𗉊𬱦></script>
Case 2:
<?xml>
Case 3:
<?xml𤢬><!>
The issues persist when I directly use the HTMLCustomGenerator.py from the repository, e.g.,
<?𗠭?> <??> <?xml𧞢䯸><?𭆉ꑓ?> <![把]><script>왘𥍾𤏖</><!--渇--><!----><![𬲱榄]><bodyonbeforeprint onbeforeunload ><style놂>* { background: green; }</></body>
Another problem is that when I use the grammar files to generate queries for PostgreSQL
(i.e., https://github.com/antlr/grammars-v4/blob/master/sql/postgresql/PostgreSQLLexer.g4 and
https://github.com/antlr/grammars-v4/blob/master/sql/postgresql/PostgreSQLParser.g4),
the generated PostgreSQLGenerator class subclasses the PostgreSQLLexerBase class even when I comment out the option in the grammar files. Could I ask how I can fix this issue, e.g., where to find the PostgreSQLLexerBase file?
class PostgreSQLGenerator(PostgreSQLLexerBase):
When I change the base class from PostgreSQLLexerBase to Generator, the generated files are all blank.
Activity
renatahodovan commented on May 8, 2024
Hi @Beliefuture !
Thanks for your interest in Grammarinator!
HTMLGenerator produces empty output if all the quantified components of the starting htmlDocument rule decide to stop generation at the first iteration. Since each loop stops or continues with a 0.5 chance at every iteration, and since there are 6 quantified components in htmlDocument, empty output happens with a 0.5^6 (about 1.6%) chance.

As for the invalid output: these outputs might look like invalid HTML (and some of them indeed are), but they fulfill all the requirements defined by the grammar. The grammar doesn't have any information about tag or attribute names or attribute values. It doesn't know anything about spaces between the tokens. It doesn't know the semantics of style, script, or xml tags, and so on. This is simply because these grammars are parser grammars: they are responsible only for checking the syntax of an input, and all further checks are usually implemented manually.

Similarly, if these grammars are used to generate output, then the additional information needs to be defined manually, either by editing the grammar itself with rule rewrites, custom predicates, or actions (probably losing the possibility of using the grammar for parsing), or by implementing custom generator subclasses and/or models, listeners, serializers, etc. HTMLCustomGenerator is a basic example of such a custom generator.
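As a quick sanity check of the 0.5^6 figure for empty htmlDocument output, here is a minimal simulation sketch. It does not use Grammarinator itself; it only assumes the simplified model described above (6 independent quantifiers, each stopping before its first iteration with probability 0.5). The function names `is_empty_output` and `estimate_empty_rate` are hypothetical helpers made up for this illustration.

```python
import random


def is_empty_output(rng, quantifiers=6, p_stop=0.5):
    """Return True if every quantified component stops before its first iteration.

    Toy model only: each quantifier is treated as an independent coin flip.
    """
    return all(rng.random() < p_stop for _ in range(quantifiers))


def estimate_empty_rate(trials=100_000, seed=0):
    """Estimate the fraction of empty outputs over many simulated generations."""
    rng = random.Random(seed)
    empties = sum(is_empty_output(rng) for _ in range(trials))
    return empties / trials


exact = 0.5 ** 6  # closed-form probability under the toy model
estimate = estimate_empty_rate()
print(f"exact: {exact:.6f}, simulated: {estimate:.6f}")
```

Under this model, roughly one generated file in 64 is expected to be empty, so an occasional blank test case is consistent with the explanation and does not by itself indicate a bug.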
Regarding the PostgreSQL issue, are you sure you commented out the superClass options in both the lexer and parser grammars and regenerated the generator? Another option to control the superclass of the produced generator is overriding the superClass option from the CLI like this:
Getting only empty output from PostgreSQL is weird. Although stmtmulti is completely quantified, it should result in empty output at most 50% of the time. Could you paste the command that produced only empty output?
Beliefuture commented on May 9, 2024
Hi @renatahodovan !
Thanks for your detailed explanation and sorry for the late response.
For the empty output of HTMLGenerator, is there a way to customize the rules so that the quantified components do not stop generation at the first or a later iteration?

Meanwhile, how can I control the complexity (i.e., the number of tokens) of the generated files (by specifying the value of the -d parameter?), since I have found that the generated files are mostly short, with few tokens?

Since there are messy characters in the generated files (e.g., 𧞢䯸, 왘𥍾𤏖, 𗉊𬱦, ...) in the cases demonstrated above, I wonder how this tool populates these values and how to specify the set of values to make the generated files more reasonable.

For the class issue of PostgreSQL, I have checked the files, and I am sorry that I forgot to comment out the class in the PostgreSQLLexer.g4 file.

For the empty-output issue of PostgreSQL, I generated ten cases again for testing, and only three of them were non-empty (≈0.3). Given your explanation that the default probability of empty output is 0.5, I don't think this is an issue. Does this tool currently use a random strategy to generate test cases? Could I set preferences to make it generate the specific clauses or expressions I want?
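To put the "3 non-empty out of 10" observation in context, the following sketch computes how likely it is to see at least 7 empty outputs in 10 runs if each run is empty with probability 0.5. It assumes independent runs and the 0.5 figure mentioned in this thread; `prob_at_least` is a hypothetical helper, not part of Grammarinator.

```python
from math import comb


def prob_at_least(k_min, n, p):
    """P(X >= k_min) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


# Chance of 7 or more empty outputs among 10 runs at a 0.5 empty rate.
tail = prob_at_least(7, 10, 0.5)
print(f"P(at least 7 empty out of 10) = {tail:.4f}")
```

Under these assumptions the result is roughly 17%, so a sample of ten runs is too small to distinguish the observed rate from 50%; a larger batch would give a clearer picture.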
Again, I would like to know how to specify the literal values of the generated SQL to make them more reasonable as test cases (since they look strange). I have listed a case below:
PostgreSQLGenerator
file, but I think they might be attributable to the antlr
source grammar file. But I am not sure whether these errors will prevent this tool from functioning properly.
Please leave a message if you have any questions :)
Beliefuture commented on May 9, 2024
@renatahodovan
Besides, I have found that the generated SQL statements for PostgreSQL are typically incomplete and not executable. Do they fail to obey the grammar rules strictly?
Beliefuture commented on May 10, 2024
Maybe the incomplete generated queries can be attributed to truncation caused by the -d parameter?