-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathlanguage.txt
395 lines (307 loc) · 16.6 KB
/
language.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
The below language is a (lightly adapted) version of the Tiger language from
Andrew Appel's Modern Compiler Implementation series of books.
------------------------------------------------------------------------------
1. Lexical Issues
------------------------------------------------------------------------------
Identifiers: An identifier is a sequence of letters, digits, and underscores,
starting with a letter. Uppercase and lowercase letters are distinct. The
symbol <id> stands for an identifier.
Comments: A comment may appear between any two tokens. Comments start with /*
and end with */ and may be nested.
------------------------------------------------------------------------------
2. Declarartion
------------------------------------------------------------------------------
A declaration-sequence is a sequence of type, value, and function declarations.
No punctuation separates or teminates individual declarations.
<decs> -> { <dec> }
<dec> -> <tydec>
-> <vardec>
-> <fundec>
In the syntactic notation used here \empty stands for the empty string,
and { x } stands for a possibly empty sequence of x's.
---------------------------------------
Datatypes
---------------------------------------
The syntax of types and type declarations in Tiger is
<tydec> -> type <type-id> = <ty>
<ty> -> <type-id>
-> { <tyfields> } (these braces stand for themselves)
-> array of <type-id>
<tyfields> -> \empty
-> <id> : <type-id> {, <id> : <type-id> }
Built-in types: Two named types "int" and "string" are predefined.
Additional named types may be defined or redefined (including predefined types)
by type declarations.
Records: Record types are defined by a listing of their fields enclosed in braces, with
each filed described by fieldname : <type-id>, where <type-id> is an identifier
defined by a type declaration.
Arrays: An array of any named type may be made by array of <type-id>. The length
of the array is not specified as part of its type; each array of that type
can have a different length, and the length will be decied upon array creation,
at run time.
Record distinction: Each declaration of a record or array type creates a new type,
incompatible with all other record or array types (even if all the fields are similar.)
Mutually recursive types: A collection of types may be recursive or mutually recursive.
Mutually recursive types are declared by a consecutive sequence of type declarations
without intervening value or function declarations. Each recursion cyle must pass through
a record or array type.
This is legal:
type tree = { key:int, children:treelist }
type treelist = { hd:tree, tl:treelist }
This is not:
type b = c
type c = b
Field name reusability: Different record types may use the same field names.
---------------------------------------
Variables
---------------------------------------
<vardec> -> var <id> := <exp>
-> var <id> : <type-id> := <exp>
In the short form of variable declaration, the name of the variable is given,
followed by an expression representing the initial value of the variable. In this
case, the type of the variable is determined from the type of the expression.
In the long form, the type of the variable is also given. The expression must
have the same type.
If the initializing expression is "nil", then the long form must be used.
Each variable declaration creates a new variable, which lasts as long as the
scope of the declaration.
---------------------------------------
Functions
---------------------------------------
<fundec> -> function <id> ( <tyfields> ) = <exp>
-> function <id> ( <tyfields> ) : <type-id> = <exp>
The first of these is a procedure declaration; the second is a function decla-
ration. Procedures do not return result values; functions do, and the type is
specified after the colon. The exp is the body of the procedure or function,
and the ryfields specify the names and type of the parameters. All parameters
are passed by value.
Functions may be recursive. Mutually recursive functions and procedures
are declared by a sequence of consecutive function declarations (with no
intervening type or variable declarations):
function treeLeaves (t: tree) : int
if t=nil then 1
else treelistLeaves(t.children)
function treelistLeaves (L : treelist) : int =
if L=nil then 0
else treeLeaves (L.hd) + treelistLeaves (L.t1)
---------------------------------------
Scope Rules
---------------------------------------
Local variables: In the expression
let . .. vardec ... in exp end
the scope of the declared variable starts just after its vardec and lasts until the end.
Parameters: In function
id ( ... id : id2, . .. ) = exp
the scope of the parameter id lasts throughout the function body exp.
Nested scopes: The scope of a variable or parameter includes the bodies of any
function definitions in that scope. That is, access to variables in outer scopes
is permitted, as in Pascal and Algol.
Types: In the expression
let ... tydecs ... in exps end
the scope of a type identifier starts at the beginning of the consecutive sequence
of type declarations defining it and lasts until the end. This includes the headers
and bodies of any functions within the scope.
Functions: In the expression
let ... fundecs ... in exps end
the scope of a function identifier starts at the beginning of the consecutive sequence
of function declarations defining it and lasts until the end. This includes the headers and
bodies of any functions within the scope.
Name spaces: There are two different name spaces: one for types, and one for
functions and variables. A type a can be "in scope" at the same time as a
variable a or a function a, but variables and functions of the same name cannot
both be in scope simultaneously (one will hide the other).
Local redeclarations: A variable or function declaration may be hidden by the
redeclaration of the same name (as a variable or function) in a smaller scope;
for example, this function prints "6 7 6 8 6" when applied to 5:
function f(v: int)
let var v := 6
in print (v);
let var v : = 7 in print(v) end;
print (v);
let var v := 8 in print(v) end;
print (v)
end
Functions hide variables of the same name, and vice versa. Similarly, a type
declaration may be hidden by the redeclaration of the same name (as a type)
in a smaller scope. However, no two functions in a sequence of mutually re-
cursive functions may have the same name; and no two types in a sequence of
mutually recursive types may have the same name.
------------------------------------------------------------------------------
3. Variables and Expressions
------------------------------------------------------------------------------
---------------------------------------
L-values
---------------------------------------
An l-value is a location whose value may be read or assigned. Variables, procedure
parameters, fields of records, and elements of arrays are all l-values.
<lvalue> -> <id>
-> <lvalue> . <id>
-> <lvalue> [ <exp> ]
Variable: The form <id> refers to a variable or parameter accessible by scope
rules.
Record field: The dot notation allows the selection of the correspondingly named
field of a record value.
Array subscript: The bracket notation allows the selection of the correspond-
ingly numbered slot of an array. Arrays are indexed by consecutive integers
starting at zero (up to the size of the array minus one).
---------------------------------------
Expressions
---------------------------------------
l-value: An l-value, when used as an expression, evaluates to the contents of the
corresponding location.
Valueless expressions: Certain expressions produce no value: procedure calls,
assignment, if-then, while, break, and sometimes if-then-else. Therefore the
expression
(a := b) + c
is syntactically correct but fails to type-check.
Nil: The expression nil (a reserved word) denotes a value nil belonging to every
record type. If a record variable v contains the value nil, it is a checked run-
time error to select a field from v. Nil must be used in a context where its type
can be determined, that is:
var a : my_record := nil OK
a := nil OK
if a <> nil then ... OK
if nil <> a then ... OK
if a = nil then ... OK
function f(p: my _record) = ... f(nil) OK
var a := nil ILLEGAL
if nil = nil then ... ILLEGAL
Sequencing: A sequence of two or more expressions, surrounded by parenthe-
ses and separated by semicolons
(exp; exp : ... exp)
evaluates all the expressions in order. The result of a sequence is the result
(if any) yielded by the last of the expressions.
No value: An open parenthesis followed by a close parenthesis (two separate
tokens) is an expression that yields no value. Similarly, a let expression with
nothing between the in and end yields no value.
Integer literal: A sequence of decimal digits is an integer constant that denotes
the corresponding integer value.
String literal: A string constant is a sequence, between quotes ("), of zero or
more printable characters, spaces, or escape sequences. Each escape sequence
is introduced by the escape character \, and stands for a character sequence.
The allowed escape sequences are as follows (all other uses of \ being illegal):
\n A character interpreted by the system as end-of-line.
\t Tab.
\^c The control character c, for any appropriate c.
\ddd The single character with ASCII code ddd (3 decimal digits).
\" The double quote character (").
\\ The backslash character (\).
\f__f\ This sequence is ignored, where f__f stands for a sequence of one
or more formatting characters (a subset of the non-printable characters
including at least space, tab, newline, formfeed). This allows one to
write long strings on more than one line, by writing \ at the end of one line
and at the start of the next.
Negation: An integer-valued expression may be prefixed by a minus sign.
Function call: A function application
id()
or
id (exp{, exp})
indicates the application of function id to a list of actual parameter values
obtained by evaluating the expressions left to right. The actual parameters are
bound to the corresponding formal parameters of the function definition and the function
body is bound using conventional static scoping rules to obtain a result. If id actually
stands for a procedure (a function returning no result), then the function body must produce
no value, and the function application also produces no value.
Arithmetic: Expressions of the form exp op exp, where op is +, -, *, /, require
integer arguments and produce an integer result.
Comparison: Expressions of the form exp op exp, where op is =, <>, >, <, >=, <=,
compare their operands for equality or inequality and produce the integer 1 for
true and O for false. All these operators can be applied to integer operands. The
equals and not-equals operators can also be applied to two record or array
operands of the same type, and compare for "reference" or "pointer" equality
(they test whether two records are the same instance, not whether they have
the same contents).
String comparison: The comparison operators may also be applied to strings.
Two strings are equal if their contents are equal; there is no way to distinguish
strings whose component characters are the same. Inequality is according to
lexicographic order.
Boolean operators: Expressions of the form exp op exp, where op is & or |,
are short-circuit boolean conjunctions and disjunctions: they do not evaluate
the right-hand operand if the result is detemined by the left-hand one. Any
nonzero integer value is considered true, and an integer value of zero is false.
Precedence of operators: Unary minus (negation) has the highest precedence.
Then operators *,/ have the next highest (tightest binding) precedence, followed
by +, -, then by =, <>, >, <, >=, <=, then by &, then by |.
Associativity of operators: The operators *, /, +, - are all left-associative. The
comparison operators do not associate, so a=b=c is not a legal expression,
although a=(b=c) is legal.
Record creation: The expression
type-id { id=exp {, id=exp} }
or (for an empty record type)
type-id { }
creates a new record instance of type type-id. The field names and types of the record expression
must match those of the named type, in the order given. The braces { } stand for themselves.
Array creation: The expression
type-id [exp_1] of exp_2
evaluates exp_1, and exp_2,
(in that order) to find n, the number of elements, and v the initial value. The
type type-id must be declared as an array type. The result of the expression
is a new array of type type-id, indexed from O to n - 1, in which each slot is
initialized to the value v.
Array and record assignment: When an array or record variable A is assigned
a value B, then A references the same array or record as B. Future updates of
elements of A will affect B, and vice versa, until A is reassigned. Parameter
passing of arrays and records is similarly by reference, not by copying.
Extent: Records and arrays have infinite extent: each record or array value lasts
forever, even after control exits from the scope in which it was created.
Assignment: The assignment statement
lvalue : = exp
evaluates the lvalue, then evaluates the exp, then sets the contents of the lvalue
to the result of the expression. Syntactically, := binds weaker than the boolean operators & and |.
The assignment expression produces no value, so that (a := b) +c is illegal.
If-then-else: The if-expression
if exp1, then exp2 else exp3
evaluates the integer expression exp1. If the result is nonzero it yields the result of evaluating exp2;
otherwise it yields the result of exp3. The expressions exp2 and exp3 must
have the same type, which is also the type of the entire if-expression (or both
expressions must produce no value).
If-then: The if-expression
if exp1 then exp2
evaluates the integer expression exp1. If the result is nonzero, then exp2 (which must produce no value) is
evaluated. The entire if-expression produces no value.
While: The expression
while exp1 do exp2
evaluates the integer expression exp1. If the result is nonzero, then exp2 (which must produce no value) is
executed, and then the entire while-expression is reevaluated.
For: The expression
for id := exp1 to exp2 do exp3
iterates exp3 over each integer value of id between exp1 and exp2. The variable id is a new variable
implicitly declared by the for statement, whose scope covers only exp3, and
may not be assigned to. The body exp3 must produce no value. The upper and
lower bounds are evaluated only once, prior to entering the body of the loop.
If the upper bound is less than the lower, the body is not executed.
Break: The break expression terminates evaluation of the nearest enclosing
while-expression or for-expression. A break in procedure p cannot terminate a loop in procedure q,
even if p is nested within q. A break that is not within a while or for is illegal.
Let: The expression
let decs in expseq end
evaluates the declarations decs, binding types, variables, and procedures whose scope then extends over the
expseq. The expseq is a sequence of zero or more expressions, separated by semi-colons. The result (if any) of
the last exp in the sequence is then the result of the entire let-expression.
Parentheses: Parentheses around any expression enforce syntactic grouping, as
in most programming languages.
PROGRAMS
Tiger programs do not have arguments: a program is just an expression exp.
STANDARD LIBRARY
Several functions are predefined:
function print(s: string)
Prints on standard output.
function flush()
Flush the standard output buffer.
function getchar() : string
Read a character from standard input; return empty string on end of file.
function ord(s: string) : int
Give ASCII value of first character of s; yields -1 if s is empty string.
function chr(i: int) : string
Single-character string from ASCII value 1; halt program if i out of range.
function size(s: string) : int
Number of characters in s.
function substring (s:string, first: int, n:int) : string
Substring of string s, starting with character first, n characters long. Characters
are numbered starting at 0.
function concat (sl: string, s2: string) : string
Concatenation of s1 and s2.
function not (i : integer) : integer
Return (i=0).
function exit(i: int)
Terminate execution with code i.