-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathshilka-1.html
243 lines (233 loc) · 8.33 KB
/
shilka-1.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
<META NAME="GENERATOR" CONTENT="LinuxDoc-Tools 0.9.66">
<TITLE>SHILKA (keywords description translator): Introduction</TITLE>
<LINK HREF="shilka-2.html" REL=next>
<LINK HREF="shilka.html#toc1" REL=contents>
</HEAD>
<BODY>
<A HREF="shilka-2.html">Next</A>
Previous
<A HREF="shilka.html#toc1">Contents</A>
<HR>
<H2><A NAME="s1">1.</A> <A HREF="shilka.html#toc1">Introduction</A></H2>
<P>SHILKA is translator of keywords description into code for fast
recognition of keywords and standard identifiers in compilers. SHILKA
is implemented with the aid of other components of COCOM toolset.</P>
<P>SHILKA is analogous to GNU package `gperf' but not based on perfect
hash tables. SHILKA rather uses minimal pruned O-trie for for keyword
recognition. As consequence SHILKA can take the presumable frequency
of keyword occurences in the program into accout. Gperf can not make
it. Therefore as rule keyword recognition code generated by SHILKA is
faster than one generated by Gperf up to 50%.</P>
<P>SHILKA is suitable for fast recognition from few keywords to huge
dictionary of words (strings).</P>
<P>What is minimal pruned O-trie? Let us consider what is trie. If we
have four keywords: case, char, else, enum. We can recognize the
keywords with the following structure called trie.
<BLOCKQUOTE><CODE>
<PRE>
|
-----------------
c | e |
------- ---------
a | h | l| n|
s | a | s| u|
e | r | e| m|
</PRE>
</CODE></BLOCKQUOTE>
The corresponding code for the keywords recognition based on this
structure could be
<BLOCKQUOTE><CODE>
<PRE>
if (kw[0] == 'c')
{
if (kw[1] == 'a')
{
if (kw[2] == 's')
{
if (kw[3] == 'e')
{
/* we recognize keyword case */
}
else
/* this is not a keyword */
}
else
/* this is not a keyword */
}
else if (kw[1] == 'h')
{
if (kw[2] == 'a')
{
if (kw[3] == 'r')
{
/* we recognize keyword char */
}
else
/* this is not a keyword */
}
else
/* this is not a keyword */
}
else
/* this is not a keyword */
}
else if (kw[0] = 'e')
{
if (kw[1] == 'l')
{
if (kw[2] == 's')
{
if (kw[3] == 'e')
{
/* we recognize keyword else */
}
else
/* this is not a keyword */
}
else
/* this is not a keyword */
}
else if (kw[1] == 'n')
{
if (kw[2] == 'u')
{
if (kw[3] == 'm')
{
/* we recognize keyword enum */
}
else
/* this is not a keyword */
}
else
/* this is not a keyword */
}
else
/* this is not a keyword */
}
</PRE>
</CODE></BLOCKQUOTE>
You can see in the example above that it is not necessary to test all
characters of the keywords. Instead of this, we can test only several
characters of the keywords and test all kewyord at the end of final
decision that given string is a keyword. Such technique results in
another structure called pruned trie:
<BLOCKQUOTE><CODE>
<PRE>
|
-----------------
c | e |
------- ---------
a | h | l | n |
| | | |
case char else enum
</PRE>
</CODE></BLOCKQUOTE>
The corresponding code for the keywords recognition based on this
structure could be
<BLOCKQUOTE><CODE>
<PRE>
if (kw[0] == 'c')
{
if (kw[1] == 'a')
{
if (strcmp (kw, "case") == 0)
/* we recognize keyword case */
else
/* this is not a keyword */
}
else if (kw[1] == 'h')
{
if (strcmp (kw, "char") == 0)
/* we recognize keyword char */
else
/* this is not a keyword */
}
else
/* this is not a keyword */
}
else if (kw[0] = 'e')
{
if (kw[1] == 'l')
{
if (strcmp (kw, "else") == 0)
/* we recognize keyword else */
else
/* this is not a keyword */
}
else if (kw[1] == 'n')
{
if (strcmp (kw, "enum") == 0)
/* we recognize keyword enum */
else
/* this is not a keyword */
}
else
/* this is not a keyword */
}
</PRE>
</CODE></BLOCKQUOTE>
Probably you found that if we test keywords characters in another
order (not in sequential order), we could recognize keywords faster.
Using such approach results in another structure called pruned
O-trie:
<BLOCKQUOTE><CODE>
<PRE>
|
------------2-------------
a | h | l | n |
| | | |
case char else enum
</PRE>
</CODE></BLOCKQUOTE>
Here number on the intersection means what keyword character (1st,
2nd, ...) is tested. The corresponding code for the keywords
recognition based on this structure could be
<BLOCKQUOTE><CODE>
<PRE>
if (kw[1] == 'a')
{
if (strcmp (kw, "case") == 0)
/* we recognize keyword case */
else
/* this is not a keyword */
}
else if (kw[1] == 'h')
{
if (strcmp (kw, "char") == 0)
/* we recognize keyword char */
else
/* this is not a keyword */
}
else if (kw[1] == 'l')
{
if (strcmp (kw, "else") == 0)
/* we recognize keyword else */
else
/* this is not a keyword */
}
else if (kw[1] == 'n')
{
if (strcmp (kw, "enum") == 0)
/* we recognize keyword enum */
else
/* this is not a keyword */
}
else
/* this is not a keyword */
</PRE>
</CODE></BLOCKQUOTE>
And finally, minimal in the phrase "minimal pruned O-trie" means that
we found pruned O-trie with minimal number of testing the keyword
characters. Generally speaking we can introduce notion cost for
pruned O-trie and search for prunned O-trie with minimal cost. Shilka
takes probability of keyword occurences in program into account for
evaluation of the cost of prunned O-trie.</P>
<HR>
<A HREF="shilka-2.html">Next</A>
Previous
<A HREF="shilka.html#toc1">Contents</A>
</BODY>
</HTML>