msta-4.html

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
 <META NAME="GENERATOR" CONTENT="LinuxDoc-Tools 0.9.66">
 <TITLE>MSTA (syntax description translator): Implementation</TITLE>
 <LINK HREF="msta-5.html" REL=next>
 <LINK HREF="msta-3.html" REL=previous>
 <LINK HREF="msta.html#toc4" REL=contents>
</HEAD>
<BODY>
<A HREF="msta-5.html">Next</A>
<A HREF="msta-3.html">Previous</A>
<A HREF="msta.html#toc4">Contents</A>
<HR>
<H2><A NAME="s4">4.</A> <A HREF="msta.html#toc4">Implementation</A></H2>

<P>The following figure shows what major algorithms MSTA can uses to
generate the parsers:</P>
<P>
<BLOCKQUOTE><CODE>
<PRE>
1. Building   -----> 4. LALR-optimization -----> 5. regular-optimization
   LR(k)-sets                                ^             |
                                             |             |
                                             |             |
2. Building LALR(k)-sets-------->------------|             V
   by generalized                            |    6. equivalent tokens
   DeRemer algorithm                         |          
                                             |          
                                             |          
3. Building LALR(k)-sets------->------------- 
   by general algorithm
</PRE>
</CODE></BLOCKQUOTE>
</P>
<P>MSTA can generate LR(k)-parsers (see MSTA usage).  After building
canonical LR(k)-sets, MSTA usually makes LALR-optimization which
significantly decreases size of the result parser.  You can switch off
LALR-optimization, but my advice is to use it.  This optimization
results in not only less size of the parser, but significantly speeds
up MSTA work and decreases memory requirements (because before and
during LALR-optimization, only LR-sets are represented only by their
essential LR-situations).</P>
<P>LALR-optimization is to search for equivalent LR-sets with the same
LR-core set (LR-set with LR-situations without contexts) and to merge
them.  In other words LALR-optimization is an extracting LALR-parts of
grammars and implementing parsing them by adequate methods.  If the
input grammar describes LALR-language, the result LR-sets will be
LALR-sets.  The optimization algorithm is analogous to searching for
minimal DFA (deterministic final automaton).  There is my article
describing the optimization in one russian journal (Programming,
1989), unfortunately only on russian.  Now I have no time and place to
describe it more detail here.</P>
<P>Usually MSTA generates LALR(k)-parsers with a generalized DeRemer
algorithm.  But when you want to see contexts of LR-situations in the
description file (see MSTA usage), MSTA will use canonical algorithm
of building LALR-sets (see for example old book of Aho and Ulman).
Remember that this algorithm is slower than DeRemer ones in several
times.</P>
<P>When k > 1, the length look ahead of the parser can be less than k.
The parser generated by MSTA always look ahead only on minimal number
of tokens necessary for correct resolution of the conflicts in given
state and given input.</P>
<P>After building LR-sets, MSTA usually runs pass of so called
regular-optimization.  Unfortunately this algorithm is described by me
nowhere.  The idea of algorithm is to searching for all transitions of
parser from the one state to another which are independent from the
parser state (more correctly from the parser stack).  In other words
regular-optimization is an extracting regular-parts of grammars and
implementing parsing them by adequate methods.  As a result the
generated parser will be faster and will use less the parser stack.</P>
<P>If the input grammar describes regular language, the result parser will
not use stack at all.  This permits to use MSTA for generation of
effective scanner too.  Moreover, scanner for a language with
non-regular parts (e.g. nested comments) is described much more simply
on MSTA and is effectively implemented by MSTA.  To extract more
regular parts a splitting LR-sets can be used (this is used for
`%scanner' by default).  Usage of splitting LR-sets is not recommended
for usual programming languages grammars because this requires very
much memory during optimizations.</P>
<P>Implementation of regular-optimization requires more number of
classical LR-parser instructions (not only shift-reduce-error).  This
means that MSTA parser implementation is more oriented to compilation
model than classical YACC or BISON.  This also speeds up the parser
generated by MSTA.  The new instructions "(pop)-(shift)-goto"
(optional parts are in parentheses here) are added.  Moreover, more
one actions for different rules can be executed during one of such
instructions.  Also, each parser state has status of necessity of
pushing the result state and/or corresponding attribute into state and
attribute stacks.  MSTA parser generated with usage of the regular
optimization may have several copies of rule actions, but usually this
only slightly increases size of the parser.</P>
<P>MSTA also searches for equivalent tokens to decrease the generated
parser size.  This optimization is especially effective for scanners.</P>

<HR>
<A HREF="msta-5.html">Next</A>
<A HREF="msta-3.html">Previous</A>
<A HREF="msta.html#toc4">Contents</A>
</BODY>
</HTML>