This wiki page explains the different parsers, with their benefits and drawbacks.
Here is the list of the test classes and the feature each one tests:
- Antlr4Mojo: uses annotations with parameters.
- Calculator: defines some lambda expressions. It was taken from the Sun tutorial.
- HeavyParsing: the source of LocalDiscovery copied and pasted twice, to check performance.
- HelloWorld: a simple hello world.
- LocalDiscovery: a class taken from the Elasticsearch project's GitHub repository. It is ordinary Java code, but from a real project.
- OnlyMethod: a single method without the enclosing class declaration, meant to test ANTLR's parse error recovery.
- PredictionModule: a huge class to test performance. It also tests support for the `<? extends T>` type.
- RandomLongFile: the content of PredictionModule.java copied and pasted three times, to check performance with huge files.
At the moment, ANTLR seems to be the best of the parsers tested. Its only drawback is performance: it is about 100x slower than the others. The main problem with the other parsers is that they cannot handle comments correctly, so a round trip is impossible with them :(
Even with malformed classes, it parses them with only a few errors and still builds a rough but sufficiently correct tree. Its parse error recovery is very good.
The number of lines of code does not seem to have a significant influence on performance; instead, parsing is slower when there are many nested blocks of code. For example, LocalDiscovery (404 rows) takes about eight times as long to parse as PredictionModule (380 rows). This is confirmed by RandomLongFile, which contains the content of PredictionModule copied three times: its parsing time increases by only 25.4%. Similarly, copying the content of LocalDiscovery twice into HeavyParsing.java increases the parsing time by only 15%. The first results are shown in the following table.
| File | Rows | Time (ms) | Time with comments (ms) |
|---|---:|---:|---:|
| Antlr4Mojo | 522 | 6654 | 6799 |
| Calculator | 24 | 510 | 488 |
| HeavyParsing | 790 | 16213 | 16455 |
| HelloWorld | 3 | 235 | 229 |
| LocalDiscovery | 404 | 14092 | 14549 |
| OnlyMethod | 86 | 34 | 33 |
| PredictionModule | 380 | 1716 | 1765 |
| RandomLongFile | 1070 | 2152 | 1921 |
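The percentages quoted above can be recomputed from the "Time (ms)" column. The small class below is only an arithmetic check (the class and method names are illustrative, not part of the project): it gives +25.4% for RandomLongFile over PredictionModule, about +15% for HeavyParsing over LocalDiscovery, and a ratio of about 8.2 between LocalDiscovery and PredictionModule despite their similar row counts.

```java
/** Recomputes the relative parsing-time changes quoted in the text from the table values. */
public final class ParsingTimeIncrease {

    /** Percentage increase going from {@code baseMs} to {@code newMs}. */
    static double percentIncrease(double baseMs, double newMs) {
        return (newMs - baseMs) / baseMs * 100.0;
    }

    public static void main(String[] args) {
        // RandomLongFile = PredictionModule copied three times
        System.out.printf("RandomLongFile vs PredictionModule: +%.1f%%%n",
                percentIncrease(1716, 2152));
        // HeavyParsing = LocalDiscovery copied twice
        System.out.printf("HeavyParsing vs LocalDiscovery: +%.1f%%%n",
                percentIncrease(14092, 16213));
        // Similar row counts (404 vs 380), very different parsing times
        System.out.printf("LocalDiscovery / PredictionModule: %.1fx%n", 14092.0 / 1716.0);
    }
}
```

Tripling the file roughly triples the row count but raises the time by only a quarter, which is what the "lines of code matter little" observation rests on.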
These times were measured with the TimeParsing class, run 10 times from a bash for loop (each run starts a fresh JVM, so the JIT cannot warm up across runs), and the results were averaged.
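TimeParsing itself is not shown on this page, so the following is only a minimal sketch of the same methodology under stated assumptions: one JVM invocation times exactly one parse with `System.nanoTime`, an external bash loop repeats the invocation, and the printed times are averaged afterwards. The class name `TimingRunSketch` and the stand-in workload (which replaces the real ANTLR parse call) are hypothetical.

```java
import java.util.Arrays;
import java.util.function.Supplier;

/**
 * Hypothetical sketch in the spirit of TimeParsing. Each JVM invocation times a
 * single task, so repeated measurements never share a warmed-up JIT. An external
 * loop such as
 *   for i in $(seq 10); do java TimingRunSketch; done
 * collects the per-run times, which can then be averaged with this same class.
 */
public final class TimingRunSketch {

    /** Keeps the task's result reachable so the measured work is not optimized away. */
    static volatile Object sink;

    /** Wall-clock time of one call to {@code task}, in milliseconds. */
    static double timeMillis(Supplier<?> task) {
        long start = System.nanoTime();
        sink = task.get();
        long elapsed = System.nanoTime() - start;
        return elapsed / 1_000_000.0;
    }

    /** Arithmetic mean of the per-run times. */
    static double mean(double[] timesMs) {
        return Arrays.stream(timesMs).average().orElseThrow();
    }

    public static void main(String[] args) {
        if (args.length > 0) {
            // Averaging mode: pass the times printed by the individual runs as arguments.
            double[] times = Arrays.stream(args).mapToDouble(Double::parseDouble).toArray();
            System.out.printf("mean over %d runs: %.1f ms%n", times.length, mean(times));
            return;
        }
        // Timing mode: hypothetical stand-in workload where the real code would parse a file.
        double ms = timeMillis(() -> "class A {}".repeat(100_000).length());
        System.out.printf("%.3f%n", ms);
    }
}
```

Measuring in a cold JVM each time deliberately reports first-run cost, which is what a tool parsing a file once would see; a JMH-style warmed-up benchmark would answer a different question.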
A performance bottleneck is the error-handling strategy and the way the DFA is constructed. As discussed in issue #192 and issue #400, disabling error recovery improves performance. The SLL prediction mode reduced the parsing time of the LocalDiscovery class by about 5 s (~33%) but, as expected, it cannot correctly handle the full Java grammar: for some classes it reports errors and fails to parse them correctly.
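A common way to keep most of the SLL speed-up without losing correctness is two-stage parsing: try the fast mode first and reparse with the full mode only when it gives up. In ANTLR 4 this is usually done with `PredictionMode.SLL` plus a `BailErrorStrategy` (which throws `ParseCancellationException`), falling back to `PredictionMode.LL` with the default error strategy after rewinding the token stream. This page does not say whether that was tried here. The sketch below shows only the control flow, with plain functions as hypothetical stand-ins for the two stages so it runs without the ANTLR runtime.

```java
import java.util.function.Function;

/** Generic "fast stage first, full stage as fallback" parsing helper. */
public final class TwoStageParse {

    /** Thrown by the fast stage when it gives up (plays the role of ParseCancellationException). */
    static final class FastStageFailed extends RuntimeException {
        FastStageFailed(String message) {
            super(message);
        }
    }

    /** The parse result plus which stage produced it. */
    record Result<T>(T tree, boolean usedFallback) {}

    static <S, T> Result<T> parse(S source, Function<S, T> fastStage, Function<S, T> fullStage) {
        try {
            return new Result<>(fastStage.apply(source), false);
        } catch (FastStageFailed e) {
            // Only inputs the fast stage cannot handle pay the cost of the full stage.
            return new Result<>(fullStage.apply(source), true);
        }
    }

    public static void main(String[] args) {
        // Hypothetical toy stages: the "fast" one refuses wildcard generics.
        Function<String, String> fast = src -> {
            if (src.contains("<?")) {
                throw new FastStageFailed("fast stage gave up");
            }
            return "fast tree for: " + src;
        };
        Function<String, String> full = src -> "full tree for: " + src;

        System.out.println(parse("class A {}", fast, full));
        System.out.println(parse("List<? extends T> xs;", fast, full));
    }
}
```

The design relies on the fast stage failing loudly rather than producing a wrong tree; that is why the ANTLR version pairs SLL with a bail-out error strategy instead of the default recovering one.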
The LL_EXACT_AMBIG_DETECTION prediction mode does not seem to improve parsing time.
At the moment we can perform round-trip transformations: src -> java8CommentSupportedAST -> src. This is possible because a new grammar supports comment handling.
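Why comment handling decides whether a round trip is possible can be shown with a toy lexer. This page does not describe how the new grammar stores comments, so the following assumes the usual ANTLR idea of sending comments and whitespace to a hidden channel (`-> channel(HIDDEN)`) instead of discarding them (`-> skip`). The class and its tokenizer are hypothetical and far simpler than a Java lexer: the parser would see only DEFAULT tokens, while printing every token restores the exact source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

/** Toy illustration: keeping comments as hidden tokens makes src -> tokens -> src lossless. */
public final class RoundTripSketch {

    enum Channel { DEFAULT, HIDDEN }

    record Token(String text, Channel channel) {}

    /** Splits {@code src} into comments, whitespace runs, and other character runs. */
    static List<Token> lex(String src, boolean keepComments) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            int start = i;
            Channel channel = Channel.HIDDEN;
            boolean comment = false;
            if (src.startsWith("//", i)) {              // line comment, up to (not including) '\n'
                int nl = src.indexOf('\n', i);
                i = nl < 0 ? src.length() : nl;
                comment = true;
            } else if (src.startsWith("/*", i)) {       // block comment, including "*/"
                int end = src.indexOf("*/", i + 2);
                i = end < 0 ? src.length() : end + 2;
                comment = true;
            } else if (Character.isWhitespace(src.charAt(i))) {
                while (i < src.length() && Character.isWhitespace(src.charAt(i))) i++;
            } else {                                    // any other run of characters
                channel = Channel.DEFAULT;
                while (i < src.length() && !Character.isWhitespace(src.charAt(i))
                        && !src.startsWith("//", i) && !src.startsWith("/*", i)) i++;
            }
            if (comment && !keepComments) continue;     // behaves like a "-> skip" rule
            tokens.add(new Token(src.substring(start, i), channel));
        }
        return tokens;
    }

    /** The tokens a parser would consume. */
    static List<Token> parserView(List<Token> tokens) {
        return tokens.stream().filter(t -> t.channel() == Channel.DEFAULT).collect(Collectors.toList());
    }

    /** Concatenates every token, hidden ones included. */
    static String unparse(List<Token> tokens) {
        StringBuilder sb = new StringBuilder();
        for (Token t : tokens) sb.append(t.text());
        return sb.toString();
    }

    public static void main(String[] args) {
        String src = "class A { // greeting\n  int x; /* counter */ }\n";
        List<Token> kept = lex(src, true);
        List<Token> skipped = lex(src, false);
        System.out.println("parser sees " + parserView(kept).size() + " tokens");
        System.out.println("round trip with hidden comments: " + unparse(kept).equals(src));
        System.out.println("round trip with skipped comments: " + unparse(skipped).equals(src));
    }
}
```

Skipping comments leaves the parser's view unchanged, which is why a grammar can parse correctly and still make the round trip impossible.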
The correctness of the new grammar was tested on JHotDraw, Hadoop and Spark: it correctly parses every single Java file in these projects.
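A corpus check like the one above amounts to walking each project's source tree, parsing every `.java` file, and listing the failures. The sketch below assumes a hypothetical `Predicate<Path>` that returns true when a file parses without errors; in the real setup that would wrap the generated ANTLR lexer and parser, which are not shown here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical corpus check: reports every .java file under a root that fails to parse. */
public final class CorpusCheck {

    static List<Path> failingFiles(Path root, Predicate<Path> parsesCleanly) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {   // close the directory stream when done
            return paths.filter(Files::isRegularFile)
                        .filter(p -> p.toString().endsWith(".java"))
                        .filter(parsesCleanly.negate())
                        .sorted()
                        .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        // Hypothetical stand-in for the real parser: treat any non-empty file as parseable.
        Predicate<Path> parser = p -> {
            try {
                return Files.size(p) > 0;
            } catch (IOException e) {
                return false;
            }
        };
        List<Path> failures = failingFiles(root, parser);
        System.out.println(failures.isEmpty() ? "all files parsed" : "failed: " + failures);
    }
}
```

An empty failure list for each checkout is the condition behind "it correctly parses every single Java file".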