
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
 <META NAME="GENERATOR" CONTENT="LinuxDoc-Tools 0.9.66">
 <TITLE>SHILKA (keywords description translator): Introduction</TITLE>
 <LINK HREF="shilka-2.html" REL=next>

 <LINK HREF="shilka.html#toc1" REL=contents>
</HEAD>
<BODY>
<A HREF="shilka-2.html">Next</A>
Previous
<A HREF="shilka.html#toc1">Contents</A>
<HR>
<H2><A NAME="s1">1.</A> <A HREF="shilka.html#toc1">Introduction</A></H2>

<P>SHILKA is a translator of keyword descriptions into code for fast
recognition of keywords and standard identifiers in compilers.  SHILKA
is implemented with the aid of other components of the COCOM toolset.</P>
<P>SHILKA is analogous to the GNU package `gperf' but is not based on perfect
hash tables.  Instead, SHILKA uses a minimal pruned O-trie for keyword
recognition.  As a consequence, SHILKA can take the presumed frequency
of keyword occurrences in the program into account, which gperf cannot
do.  Therefore, as a rule, keyword recognition code generated by SHILKA is
up to 50% faster than the code generated by gperf.</P>
<P>SHILKA is suitable for fast recognition of anything from a few keywords
to a huge dictionary of words (strings).</P>
<P>What is a minimal pruned O-trie?  Let us first consider what a trie is.
Suppose we have four keywords: case, char, else, enum.  We can recognize the
keywords with the following structure, called a trie.
<BLOCKQUOTE><CODE>
<PRE>
                             |
                    -----------------
                 c |               e |
                -------          --------- 
             a |     h |       l|        n|
             s |     a |       s|        u|
             e |     r |       e|        m|
</PRE>
</CODE></BLOCKQUOTE>

The corresponding code for keyword recognition based on this
structure could be
<BLOCKQUOTE><CODE>
<PRE>
              if (kw[0] == 'c')
                {
                  if (kw[1] == 'a')
                    {
                      if (kw[2] == 's')
                        {
                          if (kw[3] == 'e')
                            {
                              /* we recognize keyword case */
                            }
                          else
                            /* this is not a keyword */
                        }
                      else
                       /* this is not a keyword */
                    }
                  else if (kw[1] == 'h')
                    {
                      if (kw[2] == 'a')
                        {
                          if (kw[3] == 'r')
                            {
                              /* we recognize keyword char */
                            }
                          else
                            /* this is not a keyword */
                        }
                      else
                       /* this is not a keyword */
                    }
                  else
                    /* this is not a keyword */
                }
              else if (kw[0] == 'e')
                {
                  if (kw[1] == 'l')
                    {
                      if (kw[2] == 's')
                        {
                          if (kw[3] == 'e')
                            {
                              /* we recognize keyword else */
                            }
                          else
                            /* this is not a keyword */
                        }
                      else
                       /* this is not a keyword */
                    }
                  else if (kw[1] == 'n')
                    {
                      if (kw[2] == 'u')
                        {
                          if (kw[3] == 'm')
                            {
                              /* we recognize keyword enum */
                            }
                          else
                            /* this is not a keyword */
                        }
                      else
                       /* this is not a keyword */
                    }
                  else
                    /* this is not a keyword */
                }
              else
                /* this is not a keyword */
</PRE>
</CODE></BLOCKQUOTE>
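The same trie can also be kept as data and walked at run time.  The
following is only a minimal sketch under stated assumptions, not the code
SHILKA generates: the names trie_node, trie_insert and trie_lookup are
hypothetical, keywords are assumed to consist of lowercase letters only,
and the letters 'a' to 'z' are assumed to be contiguous (as in ASCII).

```c
#include <stdlib.h>

/* Hypothetical trie node: one child slot per lowercase letter. */
struct trie_node
{
  struct trie_node *child[26];
  int keyword_code;             /* 0 means "no keyword ends here" */
};

/* Insert keyword KW (lowercase letters only) with a nonzero CODE.
   Return 0 on success, -1 on bad input or allocation failure. */
int
trie_insert (struct trie_node *root, const char *kw, int code)
{
  struct trie_node *node = root;

  for (; *kw != '\0'; kw++)
    {
      int i;

      if (*kw < 'a' || *kw > 'z')
        return -1;
      i = *kw - 'a';
      if (node->child[i] == NULL)
        {
          node->child[i] = calloc (1, sizeof (struct trie_node));
          if (node->child[i] == NULL)
            return -1;
        }
      node = node->child[i];
    }
  node->keyword_code = code;
  return 0;
}

/* Return the code of keyword STR, or 0 if STR is not a keyword. */
int
trie_lookup (const struct trie_node *root, const char *str)
{
  const struct trie_node *node = root;

  for (; *str != '\0'; str++)
    {
      if (*str < 'a' || *str > 'z')
        return 0;
      node = node->child[*str - 'a'];
      if (node == NULL)
        return 0;
    }
  return node->keyword_code;
}
```

Like the nested tests, this lookup examines every character of the string;
the nested tests are the same walk with the nodes unrolled into code.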

You can see in the example above that it is not necessary to test all
characters of the keywords.  Instead, we can test only a few
characters and compare the whole keyword once, at the end, to make the
final decision that the given string is a keyword.  This technique results
in another structure, called a pruned trie:
<BLOCKQUOTE><CODE>
<PRE>
                             |
                    -----------------
                 c |               e |
                -------          --------- 
             a |     h |      l |       n |
               |       |        |         |
             case     char     else      enum
</PRE>
</CODE></BLOCKQUOTE>

The corresponding code for keyword recognition based on this
structure could be
<BLOCKQUOTE><CODE>
<PRE>
              if (kw[0] == 'c')
                {
                  if (kw[1] == 'a')
                    {
                      if (strcmp (kw, "case") == 0)
                        /* we recognize keyword case */
                      else
                        /* this is not a keyword */
                    }
                  else if (kw[1] == 'h')
                    {
                      if (strcmp (kw, "char") == 0)
                        /* we recognize keyword char */
                      else
                        /* this is not a keyword */
                    }
                  else
                    /* this is not a keyword */
                }
              else if (kw[0] == 'e')
                {
                  if (kw[1] == 'l')
                    {
                      if (strcmp (kw, "else") == 0)
                        /* we recognize keyword else */
                      else
                        /* this is not a keyword */
                    }
                  else if (kw[1] == 'n')
                    {
                      if (strcmp (kw, "enum") == 0)
                        /* we recognize keyword enum */
                      else
                        /* this is not a keyword */
                    }
                  else
                    /* this is not a keyword */
                }
              else
                /* this is not a keyword */
</PRE>
</CODE></BLOCKQUOTE>
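For readers who want to compile the pruned-trie idea, it can be written as
a complete function.  This is only an illustrative sketch: the enumeration
pt_keyword and the function recognize_pruned are hypothetical names, and
the input is assumed to be a NUL-terminated string.

```c
#include <string.h>

/* Hypothetical result codes for the four example keywords. */
enum pt_keyword { PT_NONE, PT_CASE, PT_CHAR, PT_ELSE, PT_ENUM };

/* Recognize KW (NUL-terminated) by testing two characters and then
   comparing the whole string once. */
enum pt_keyword
recognize_pruned (const char *kw)
{
  switch (kw[0])
    {
    case 'c':
      /* kw[0] is not '\0' here, so reading kw[1] is safe. */
      if (kw[1] == 'a')
        return strcmp (kw, "case") == 0 ? PT_CASE : PT_NONE;
      if (kw[1] == 'h')
        return strcmp (kw, "char") == 0 ? PT_CHAR : PT_NONE;
      return PT_NONE;
    case 'e':
      if (kw[1] == 'l')
        return strcmp (kw, "else") == 0 ? PT_ELSE : PT_NONE;
      if (kw[1] == 'n')
        return strcmp (kw, "enum") == 0 ? PT_ENUM : PT_NONE;
      return PT_NONE;
    default:
      return PT_NONE;
    }
}
```

The final strcmp is what makes the pruning safe: a string such as "cast"
passes both character tests but is still rejected.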

You have probably noticed that if we test keyword characters in another
order (not sequentially), we can recognize keywords faster.
This approach results in another structure, called a pruned
O-trie:
<BLOCKQUOTE><CODE>
<PRE>
                               |
                   ------------2-------------
                a |      h |       l |     n |
                  |        |         |       |
                case      char     else     enum
</PRE>
</CODE></BLOCKQUOTE>

Here the number at the branching point shows which keyword character (1st,
2nd, ...) is tested.  The corresponding code for keyword
recognition based on this structure could be
<BLOCKQUOTE><CODE>
<PRE>
              if (kw[1] == 'a')
                {
                  if (strcmp (kw, "case") == 0)
                    /* we recognize keyword case */
                  else
                    /* this is not a keyword */
                }
              else if (kw[1] == 'h')
                {
                  if (strcmp (kw, "char") == 0)
                    /* we recognize keyword char */
                  else
                    /* this is not a keyword */
                }
              else if (kw[1] == 'l')
                {
                  if (strcmp (kw, "else") == 0)
                    /* we recognize keyword else */
                  else
                    /* this is not a keyword */
                }
              else if (kw[1] == 'n')
                {
                  if (strcmp (kw, "enum") == 0)
                    /* we recognize keyword enum */
                  else
                    /* this is not a keyword */
                }
              else
                /* this is not a keyword */
</PRE>
</CODE></BLOCKQUOTE>
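Testing the second character first is only safe if the string is known to
have a second character.  The sketch below makes that explicit by taking
the length as a parameter and rejecting any string whose length is not 4.
It is an assumption-laden illustration, not SHILKA's generated code: the
names ot_keyword and recognize_otrie and the explicit length argument are
hypothetical.

```c
#include <string.h>

/* Hypothetical result codes for the four example keywords. */
enum ot_keyword { OT_NONE, OT_CASE, OT_CHAR, OT_ELSE, OT_ENUM };

/* Recognize the LEN characters starting at KW.  KW need not be
   NUL-terminated because only LEN characters are examined. */
enum ot_keyword
recognize_otrie (const char *kw, size_t len)
{
  /* All four keywords have length 4, so any other length is rejected
     before kw[1] is read. */
  if (len != 4)
    return OT_NONE;
  switch (kw[1])
    {
    case 'a':
      return memcmp (kw, "case", 4) == 0 ? OT_CASE : OT_NONE;
    case 'h':
      return memcmp (kw, "char", 4) == 0 ? OT_CHAR : OT_NONE;
    case 'l':
      return memcmp (kw, "else", 4) == 0 ? OT_ELSE : OT_NONE;
    case 'n':
      return memcmp (kw, "enum", 4) == 0 ? OT_ENUM : OT_NONE;
    default:
      return OT_NONE;
    }
}
```

A single character test now selects the candidate keyword, where the
pruned trie needed two.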

And finally, "minimal" in the phrase "minimal pruned O-trie" means that
we look for the pruned O-trie with the minimal number of keyword character
tests.  Generally speaking, we can introduce a notion of cost for a
pruned O-trie and search for the pruned O-trie with minimal cost.  SHILKA
takes the probability of keyword occurrences in the program into account
when evaluating the cost of a pruned O-trie.</P>
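<P>One possible way to make the notion of cost concrete is the average
number of single-character comparisons performed before the final
whole-keyword comparison, weighted by keyword frequency.  This cost model
is an assumption made here for illustration; this section does not state
the exact cost function SHILKA uses, and the name expected_tests is
hypothetical.</P>

```c
#include <stddef.h>

/* Hypothetical cost model: the frequency-weighted average number of
   single-character comparisons made before the final whole-keyword
   comparison.  freq[i] is the relative frequency of keyword i and
   tests[i] is the number of character comparisons needed to reach it.
   Returns 0.0 if the frequencies sum to zero. */
double
expected_tests (const double *freq, const int *tests, size_t n)
{
  double weighted = 0.0, total = 0.0;
  size_t i;

  for (i = 0; i < n; i++)
    {
      weighted += freq[i] * tests[i];
      total += freq[i];
    }
  return total > 0.0 ? weighted / total : 0.0;
}
```

<P>For the if-else chains in this section, with the keywords in the order
case, char, else, enum, the pruned trie needs 2, 3, 3 and 4 character
comparisons and the pruned O-trie needs 1, 2, 3 and 4.  With equal
frequencies this model gives 3.0 and 2.5 respectively.  If one keyword is
much more frequent than the others, testing for it first lowers the cost
further.</P>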


<HR>
<A HREF="shilka-2.html">Next</A>
Previous
<A HREF="shilka.html#toc1">Contents</A>
</BODY>
</HTML>