Commit 555d507

Add new test: cc
This is a small self-hosting C compiler, supporting RV32 target and host. Usage: tests/cc-selfhost.sh Source: https://github.com/bobbl/punycc
1 parent ea0d9d2 commit 555d507

8 files changed: +1294 -0 lines changed
README.md

Lines changed: 1 addition & 0 deletions
@@ -148,6 +148,7 @@ See [CONTRIBUTING.md](CONTRIBUTING.md) for contribution guidelines.
 In `rv32emu` repository, there are some prebuilt ELF files for testing purpose.
 * `aes.elf` : See [tests/aes.c](tests/aes.c)
 * `captcha.elf` : See [tests/captcha.c](tests/captcha.c)
+* `cc.elf` : See [tests/cc](tests/cc)
 * `chacha20.elf` : See [tests/chacha20](tests/chacha20.c)
 * `coremark.elf` : See [eembc/coremark](https://github.com/eembc/coremark) [RV32M]
 * `dhrystone.elf` : See [rv8-bench](https://github.com/michaeljclark/rv8-bench)

build/cc.elf

9.5 KB
Binary file not shown.

tests/cc-selfhost.sh

Lines changed: 28 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,28 @@
#!/usr/bin/env bash

function fail()
{
    echo "*** Fail"
    exit 1
}

O=build
S=tests/cc
RUN=$O/rv32emu

if [ ! -f $RUN ]; then
    echo "No build/rv32emu found!"
    exit 1
fi

echo "Generating cross compiler..."
cat $S/stdlib.c $S/emit.c $S/cc.c | cc -o $O/cc-native -x c - || fail
echo "Generating native compiler..."
cat $S/stdlib.c $S/emit.c $S/cc.c | $O/cc-native > $O/cc.elf || fail
echo "Self-hosting C compiler..."
cat $S/stdlib.c $S/emit.c $S/cc.c | $RUN $O/cc.elf > out.elf || fail
echo "Build 'hello' program with the self-hosting compiler..."
cat $S/stdlib.c $S/hello.c | $RUN out.elf > hello.elf || fail
echo "Executing the compiled program..."
$RUN hello.elf
rm -f out.elf hello.elf $O/cc-native

tests/cc/README.md

Lines changed: 89 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,89 @@
# A small cross-compiler for a subset of C

Origin: [Puny C](https://github.com/bobbl/punycc)

Features
* Supported target and host architectures: **RISC-V**.
* Valid source code for Puny C is also valid C99 and can be written in a way
  that gcc or clang compile it without any warning.
* Code generation is designed to be easily portable to other target
  architectures.
* Fast compilation, small code size.

Inspired by
* [cc500](https://github.com/8l/cc500) -
  a tiny self-hosting C compiler by Edmund Grimley Evans
* [Obfuscated Tiny C Compiler](https://bellard.org/otcc/) -
  very small self compiling C compiler by Fabrice Bellard
* [Tiny C Compiler](https://savannah.nongnu.org/projects/tinycc) -
  a small but hyper fast C compiler.
* [Compiler Construction](https://people.inf.ethz.ch/wirth/CompilerConstruction/index.html) -
  brief but comprehensive book by Niklaus Wirth.

Run the following command from the top-level directory:
```shell
tests/cc-selfhost.sh
```

## Language restrictions

* No linker.
* No preprocessor.
* No standard library.
* No `typedef`.
* No type checking. Variable types are always `unsigned int`, except if
  indexed with `[]` then the type is `char *`.
* Any combination of `unsigned`, `long`, `int`, `char`, `void` and `*` is
  accepted as a valid type.
* Type casts are allowed, but ignored.
* Constants: only decimal, character and string without backslash escape
* Statements: `if`, `while`, `return`.
* Variable declaration: C99-style statements.
* Operators: no unary, ternary, extended assignment.
* Operator precedence: simplified, use parentheses instead.

| level | operator             | description             |
| ----- | -------------------- | ----------------------- |
| 1     | [] ()                | array and function call |
| 2     | + - << >> & ^ &#124; | binary operation        |
| 3     | < <= > >= == !=      | comparison              |
| 4     | =                    | assignment              |
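
The restrictions above can be illustrated with a few small functions that are ordinary C99 and use only what the list and the table mention: `if`, `while`, `return`, decimal constants, no unary operators, and explicit parentheses wherever two level-2 operators meet. This is a sketch, not code from the Puny C sources. The function names are hypothetical, and whether Puny C accepts the file unmodified is an assumption that has not been checked here.

```c
/* Hypothetical examples restricted to the constructs listed above. */

/* Shift-and-add multiplication: the operator table lists no `*`. */
unsigned int shift_mul(unsigned int a, unsigned int b)
{
    unsigned int result = 0;
    while (b != 0) {
        if ((b & 1) != 0) {
            result = result + a;
        }
        a = a << 1;
        b = b >> 1;
    }
    return result;
}

/* Indexing with [] reads single bytes, so string length needs no `*s`. */
unsigned int str_len(char *s)
{
    unsigned int i = 0;
    while (s[i] != 0) {
        i = i + 1;
    }
    return i;
}

/* `&` and `+` share level 2 in the table, so the grouping is spelled out. */
unsigned int and_then_add(unsigned int a, unsigned int b, unsigned int c)
{
    return (a & b) + c;
}

unsigned int and_of_sum(unsigned int a, unsigned int b, unsigned int c)
{
    return a & (b + c);
}
```

In standard C, `a & b + c` groups as `a & (b + c)` because `+` binds tighter than `&`. A compiler that places both operators on one level may group the unparenthesized form differently, which is why both variants above carry explicit parentheses.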

## Low-Level Functions

There is no inline assembler for functions that directly access the operating
system (e.g. file I/O). But code can be written in pure binary:

    void exit(int) _Pragma("emit \x58\x5b\x31\xc0\x40\xcd\x80");
    /* 58     pop eax
       5b     pop ebx
       31 c0  xor eax, eax
       40     inc eax
       cd 80  int 128 */

Other compilers ignore the `_Pragma` statement, which turns the line into a
forward declaration that can be resolved by linking against libc.
67+
## Implementation Details
68+
69+
Each compiler consists of three parts:
70+
71+
1. Host-specific standard functions for I/O in `stdlib.c`
72+
2. Target-specific code generation in `emit.c`
73+
3. Architecture independent compiler parts (scanner, parser and symbol table)
74+
75+
Concatenate the three files and compile it.
76+
Cross compilers can be built by using a different `ARCH` for `host_` and `emit_`.
77+
78+
### Memory Management
79+
80+
There is only one buffer `buf`.
81+
The code grows from 0 upwards, the symbol table grows from the top downwards.
82+
The token buffer for strings and identifiers is dynamically allocated in the
83+
space between them:
84+
85+
0 code_pos code_pos+256 sym_head-256 sym_head buf_size
86+
token_buf token_buf+token_size
87+
+------+---------------+-------------------+---------------+--------------+
88+
| code | 256 bytes | identifier/string | 256 bytes | symbol table |
89+
+------+---------------+-------------------+---------------+--------------+
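
The layout in the diagram can be sketched in C as two offsets that move towards each other inside one array. This is an illustration under assumptions, not the actual Puny C code: `BUF_SIZE`, `GAP`, `emit_byte`, `sym_alloc`, `token_buf_offset` and `token_capacity` are hypothetical names, the buffer size is arbitrary, and the README does not state what the two 256-byte margins are used for.

```c
#include <stddef.h>

/* Hypothetical sizes; only the 256-byte margin comes from the diagram. */
enum { BUF_SIZE = 4096, GAP = 256 };

static unsigned char buf[BUF_SIZE];
static size_t code_pos = 0;         /* next free byte for generated code   */
static size_t sym_head = BUF_SIZE;  /* lowest byte used by the symbol table */

/* Offset of the token buffer: always GAP bytes above the code. */
size_t token_buf_offset(void)
{
    return code_pos + GAP;
}

/* Bytes left for an identifier or string between the two margins. */
size_t token_capacity(void)
{
    if (sym_head < code_pos + 2 * GAP) return 0;
    return sym_head - code_pos - 2 * GAP;
}

/* Append one byte of code at the bottom; returns 0 when the buffer is full. */
int emit_byte(unsigned char b)
{
    if (code_pos >= sym_head) return 0;
    buf[code_pos] = b;
    code_pos = code_pos + 1;
    return 1;
}

/* Reserve len bytes for a symbol at the top; returns its offset,
   or (size_t)-1 when code and symbol table would collide. */
size_t sym_alloc(size_t len)
{
    if (len > sym_head - code_pos) return (size_t)-1;
    sym_head = sym_head - len;
    return sym_head;
}
```

Because the token buffer is defined relative to `code_pos` and `sym_head`, it needs no allocation of its own: it shifts and shrinks automatically as code is emitted and symbols are added.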
