| 1 | +# A small cross-compiler for a subset of C |
| 2 | + |
| 3 | +Origin: [Puny C](https://github.com/bobbl/punycc) |
| 4 | + |
| 5 | +Features |
| 6 | + * Supported target and host architectures: **RISC-V**. |
| 7 | + * Valid source code for Puny C is also valid C99 and can be written so that |
| 8 | + gcc or clang compiles it without any warnings. |
| 9 | + * Code generation is designed to be easily portable to other target |
| 10 | + architectures. |
| 11 | + * Fast compilation, small code size. |
| 12 | + |
| 13 | +Inspired by |
| 14 | + * [cc500](https://github.com/8l/cc500) - |
| 15 | + a tiny self-hosting C compiler by Edmund Grimley Evans |
| 16 | + * [Obfuscated Tiny C Compiler](https://bellard.org/otcc/) - |
| 17 | + a very small self-compiling C compiler by Fabrice Bellard |
| 18 | + * [Tiny C Compiler](https://savannah.nongnu.org/projects/tinycc) - |
| 19 | + a small but hyper fast C compiler. |
| 20 | + * [Compiler Construction](https://people.inf.ethz.ch/wirth/CompilerConstruction/index.html) - |
| 21 | + brief but comprehensive book by Niklaus Wirth. |
| 22 | + |
| 23 | +Run the following command from the top-level directory: |
| 24 | +```shell |
| 25 | +tests/cc-selfhost.sh |
| 26 | +``` |
| 27 | + |
| 28 | +## Language restrictions |
| 29 | + |
| 30 | + * No linker. |
| 31 | + * No preprocessor. |
| 32 | + * No standard library. |
| 33 | + * No `typedef`. |
| 34 | + * No type checking. Variables are always treated as `unsigned int`, except when |
| 35 | + indexed with `[]`, in which case the type is `char *`. |
| 36 | + * Any combination of `unsigned`, `long`, `int`, `char`, `void` and `*` is |
| 37 | + accepted as a valid type. |
| 38 | + * Type casts are allowed, but ignored. |
| 39 | + * Constants: only decimal, character and string constants, without backslash escapes. |
| 40 | + * Statements: `if`, `while`, `return`. |
| 41 | + * Variable declaration: C99-style statements. |
| 42 | + * Operators: no unary, ternary or extended assignment (e.g. `+=`) operators. |
| 43 | + * Operator precedence: simplified to the four levels below; use parentheses for anything else. |
| 44 | + |
| 45 | +| level | operator | description | |
| 46 | +| ----- | -------------------- | ----------------------- | |
| 47 | +| 1 | [] () | array and function call | |
| 48 | +| 2 | + - << >> & ^ \| | binary operation | |
| 49 | +| 3 | < <= > >= == != | comparison | |
| 50 | +| 4 | = | assignment | |
| 51 | + |
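As an illustration of these restrictions, the following sketch multiplies two numbers by shift-and-add, since the operator table lists no multiplication operator. It uses only `while`, `if`, `return` and binary operators from the table. The function name `mul` is made up for this example, and the code has not been checked against the Puny C compiler itself; in particular, it assumes that a declaration without an initializer is accepted.

```c
/* Multiply a by b using only operators from the precedence table. */
unsigned int mul(unsigned int a, unsigned int b)
{
    unsigned int result;
    result = 0;
    while (b != 0) {
        /* Explicit parentheses: the grouping is then the same under C99
           and under the simplified precedence levels. */
        if ((b & 1) != 0) {
            result = result + a;    /* written out, since `+=` is not available */
        }
        a = a << 1;
        b = b >> 1;
    }
    return result;
}
```

Without the parentheses, C99 would group `b & 1 != 0` as `b & (1 != 0)`, whereas the table above places `&` one level before the comparison operators. Writing the parentheses keeps the source unambiguous for both compilers.
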
| 52 | +## Low-Level Functions |
| 53 | + |
| 54 | +There is no inline assembler for functions that directly access the operating |
| 55 | +system (e.g. file I/O). Instead, their machine code can be written as raw bytes: |
| 56 | + |
| 57 | +    void exit(int) _Pragma("emit \x58\x5b\x31\xc0\x40\xcd\x80"); |
| 58 | +    /* 58       pop eax |
| 59 | +       5b       pop ebx |
| 60 | +       31 c0    xor eax, eax |
| 61 | +       40       inc eax |
| 62 | +       cd 80    int 128 */ |
| 63 | + |
| 64 | +Other compilers ignore the `_Pragma` operator, which turns the line into a |
| 65 | +forward declaration that can be resolved by linking against libc. |
| 66 | + |
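To make the payload of the `emit` string concrete, here is a minimal sketch that decodes a run of `\xNN` escapes into raw bytes. It is not taken from the Puny C sources and does not claim to match how the compiler parses the pragma; `decode_emit` and `hex_digit` are hypothetical helpers written for this illustration.

```c
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_digit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decode text of the form \xNN\xNN... into out[].
   Returns the number of bytes written, or -1 on malformed input
   or if out[] (capacity cap) is too small. */
int decode_emit(const char *text, unsigned char *out, size_t cap)
{
    size_t n = 0;
    while (*text != '\0') {
        int hi;
        int lo;
        if (text[0] != '\\' || text[1] != 'x') return -1;
        hi = hex_digit(text[2]);
        if (hi < 0) return -1;      /* also catches a truncated escape */
        lo = hex_digit(text[3]);
        if (lo < 0) return -1;
        if (n >= cap) return -1;
        out[n] = (unsigned char)((hi << 4) | lo);
        n = n + 1;
        text = text + 4;            /* each escape is exactly 4 characters */
    }
    return (int)n;
}
```

Applied to the escapes in the `exit` example above, this yields the seven bytes listed in the comment, starting with `58` and ending with `80`.
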
| 67 | +## Implementation Details |
| 68 | + |
| 69 | +Each compiler consists of three parts: |
| 70 | + |
| 71 | + 1. Host-specific standard functions for I/O in `stdlib.c` |
| 72 | + 2. Target-specific code generation in `emit.c` |
| 73 | + 3. Architecture-independent compiler parts (scanner, parser and symbol table) |
| 74 | + |
| 75 | +Concatenate the three files and compile the result. |
| 76 | +Cross compilers can be built by using a different `ARCH` for `host_` and `emit_`. |
| 77 | + |
| 78 | +### Memory Management |
| 79 | + |
| 80 | +All memory lives in a single buffer, `buf`. |
| 81 | +The code grows upwards from 0, while the symbol table grows downwards from the top. |
| 82 | +The token buffer for strings and identifiers is dynamically allocated in the |
| 83 | +space between them: |
| 84 | + |
| 85 | +    0      code_pos        code_pos+256        sym_head-256    sym_head       buf_size |
| 86 | +                           token_buf  token_buf+token_size |
| 87 | +    +------+---------------+-------------------+---------------+--------------+ |
| 88 | +    | code |   256 bytes   | identifier/string |   256 bytes   | symbol table | |
| 89 | +    +------+---------------+-------------------+---------------+--------------+ |
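
The layout can be expressed as a little arithmetic on `code_pos` and `sym_head`. The sketch below is not taken from the Puny C sources: the helper names `token_buf_offset` and `token_capacity` and the constant `GAP` are made up here, and it assumes the diagram means that a 256-byte gap is kept free on each side of the token buffer.

```c
#include <stddef.h>

#define GAP 256     /* assumed gap on each side of the token buffer */

/* Offset of the token buffer: 256 bytes past the current end of the code. */
size_t token_buf_offset(size_t code_pos)
{
    return code_pos + GAP;
}

/* Largest token that fits between the two gaps,
   or 0 if code and symbol table are already too close. */
size_t token_capacity(size_t code_pos, size_t sym_head)
{
    if (sym_head < code_pos + 2 * GAP) return 0;
    return sym_head - code_pos - 2 * GAP;
}
```

Because both values are recomputed from `code_pos` and `sym_head`, the token buffer moves and shrinks as code is emitted and symbols are added, which matches the description of it being allocated dynamically in the space between them.
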