A simple compiler written in Rust that uses LLVM to compile C code.
- Function definitions with return types
- Return statements with optional expressions
- Function parameters (basic parameter parsing implemented)
- Declarations and Assignments
- Block statements with curly braces `{}`
- Integers (`int`)
- Void (`void`)
- Char (`char`)
- Long (`long`)
- Double (`double`)
- Float (`float`)
- Arithmetic operations: `+`, `-`, `*`, `/`
- Unary operations:
  - Unary minus (`-`)
  - Logical NOT (`!`)
  - Bitwise NOT (`~`)
- Binary expressions with proper operator precedence
- Parenthesized expressions
- Variable references (identifier lookup)
- Numeric literals
- Ternary expressions
- Single-line comments (`//`)
- Block comments (`/* */`)
- Variable declarations and assignments
- Local variable scoping
- Function calls with argument passing
- Multiple parameter support (currently limited to 6, the number of integer arguments the System V calling convention passes in registers)
- Global variable declaration
- Conditional statements (`if`/`else`)
- Loops (`while`, `for`, `continue`, `break`)
- Comparison operators (`==`, `!=`, `<`, `>`, `<=`, `>=`)
- Logical operators (`&&`, `||`)
- Arrays and indexing
- Nested Blocks
- Pointer declaration (`int *ptr`)
- Address-of operator (`&variable`)
- Dereference operator (`*ptr`)
- Pointer arithmetic (`ptr + 1`)
- Pointer assignment (`ptr = &x`)
- Structures/records
- String handling
- Multiple source files
- Basic optimizations (constant folding, dead code elimination)
- Better error messages with line numbers and suggestions
- Debugging information generation
- Standard library functions (`printf`, etc.)
- Type system improvements
- Generic/template support
- Module system
- Memory management features
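
The feature list above mentions binary expressions with proper operator precedence. One common way to get that behavior in a recursive descent parser is precedence climbing. The following is a minimal sketch of the technique, not this project's actual parser: `Expr`, `Parser`, `binding_power`, `parse`, and `to_sexpr` are hypothetical names, and the sketch reads characters directly instead of scanner tokens.

```rust
// Illustrative sketch only: every name below is hypothetical and is not
// taken from the project's source.

#[derive(Debug, PartialEq)]
enum Expr {
    Num(i64),
    Binary(Box<Expr>, char, Box<Expr>),
}

struct Parser {
    chars: Vec<char>,
    pos: usize,
}

impl Parser {
    fn new(src: &str) -> Self {
        // Whitespace is dropped up front to keep the sketch short (so "1 2"
        // would read as 12); a real parser would consume scanner tokens.
        Parser {
            chars: src.chars().filter(|c| !c.is_whitespace()).collect(),
            pos: 0,
        }
    }

    fn peek(&self) -> Option<char> {
        self.chars.get(self.pos).copied()
    }

    /// Parses an integer literal or a parenthesized expression.
    fn primary(&mut self) -> Result<Expr, String> {
        match self.peek() {
            Some('(') => {
                self.pos += 1;
                let inner = self.expression(0)?;
                if self.peek() != Some(')') {
                    return Err(format!("expected ')' at position {}", self.pos));
                }
                self.pos += 1;
                Ok(inner)
            }
            Some(c) if c.is_ascii_digit() => {
                let start = self.pos;
                while self.peek().map_or(false, |d| d.is_ascii_digit()) {
                    self.pos += 1;
                }
                let text: String = self.chars[start..self.pos].iter().collect();
                text.parse::<i64>().map(Expr::Num).map_err(|e| e.to_string())
            }
            other => Err(format!("unexpected {:?} at position {}", other, self.pos)),
        }
    }

    /// Precedence climbing: only consume operators whose binding power is at
    /// least `min_bp`; tighter operators are handled by the recursive call.
    fn expression(&mut self, min_bp: u8) -> Result<Expr, String> {
        let mut lhs = self.primary()?;
        while let Some(op) = self.peek() {
            let bp = match binding_power(op) {
                Some(bp) if bp >= min_bp => bp,
                _ => break,
            };
            self.pos += 1;
            // Using bp + 1 for the right operand makes operators left-associative.
            let rhs = self.expression(bp + 1)?;
            lhs = Expr::Binary(Box::new(lhs), op, Box::new(rhs));
        }
        Ok(lhs)
    }
}

/// Higher numbers bind tighter; `None` means "not a binary operator".
fn binding_power(op: char) -> Option<u8> {
    match op {
        '+' | '-' => Some(1),
        '*' | '/' => Some(2),
        _ => None,
    }
}

fn parse(src: &str) -> Result<Expr, String> {
    let mut parser = Parser::new(src);
    let expr = parser.expression(0)?;
    if parser.pos != parser.chars.len() {
        return Err(format!("trailing input at position {}", parser.pos));
    }
    Ok(expr)
}

/// Renders the tree in prefix form so the grouping is visible.
fn to_sexpr(expr: &Expr) -> String {
    match expr {
        Expr::Num(n) => n.to_string(),
        Expr::Binary(l, op, r) => format!("({} {} {})", op, to_sexpr(l), to_sexpr(r)),
    }
}
```

With this sketch, `to_sexpr(&parse("1 + 2 * 3").unwrap())` yields `(+ 1 (* 2 3))`, and `8 - 3 - 2` groups as `(- (- 8 3) 2)`, which matches the usual C precedence and associativity for these four operators.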
The compiler follows a traditional multi-phase design, with four stages:
- Scanner tokenizes the source code
  - Handles keywords, operators, identifiers, numbers, and strings
  - Supports both single-line and block comments
  - Implements the "maximal munch" principle for token recognition
- Recursive descent parser builds an Abstract Syntax Tree (AST)
  - Implements operator precedence for binary expressions
  - Handles unary expressions and function definitions
  - Reports syntax errors
- SemanticAnalyzer performs semantic analysis on the parsed AST
  - Checks for use-before-declaration errors
- LLVMCodeGenerator translates the AST into LLVM-generated assembly
  - Follows System V ABI calling conventions
  - Manages stack frame allocation and variable storage
  - Generates complete executable assembly with proper prologue/epilogue
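
To illustrate the "maximal munch" principle mentioned for the scanner: at each position the scanner takes the longest lexeme that forms a valid token, so `<=` becomes one token rather than `<` followed by `=`. The sketch below is a simplified stand-in, not the project's scanner: `Token`, `OPERATORS`, and `scan` are hypothetical names, and keywords, strings, and comments are left out.

```rust
// Illustrative sketch only: every name below is hypothetical and is not
// taken from the project's source.

#[derive(Debug, Clone, PartialEq)]
enum Token {
    Num(i64),
    Ident(String),
    Op(&'static str),
}

/// Two-character operators come first so the longest match wins.
const OPERATORS: &[&str] = &[
    "==", "!=", "<=", ">=", "&&", "||",
    "+", "-", "*", "/", "<", ">", "!", "=", "(", ")",
];

fn scan(src: &str) -> Result<Vec<Token>, String> {
    let mut tokens = Vec::new();
    let mut i = 0; // byte offset; always kept on a character boundary
    while i < src.len() {
        let rest = &src[i..];
        let c = rest.chars().next().unwrap();
        if c.is_whitespace() {
            i += c.len_utf8();
        } else if c.is_ascii_digit() {
            // Consume the longest run of digits.
            let len = rest.chars().take_while(|ch| ch.is_ascii_digit()).count();
            let value = rest[..len].parse::<i64>().map_err(|e| e.to_string())?;
            tokens.push(Token::Num(value));
            i += len;
        } else if c.is_ascii_alphabetic() || c == '_' {
            // Consume the longest run of identifier characters.
            let len = rest
                .chars()
                .take_while(|ch| ch.is_ascii_alphanumeric() || *ch == '_')
                .count();
            tokens.push(Token::Ident(rest[..len].to_string()));
            i += len;
        } else if let Some(op) = OPERATORS.iter().find(|op| rest.starts_with(**op)) {
            // The table order guarantees "<=" is tried before "<".
            tokens.push(Token::Op(*op));
            i += op.len();
        } else {
            return Err(format!("unexpected character '{c}' at byte {i}"));
        }
    }
    Ok(tokens)
}
```

Under these assumptions, `scan("a<=b")` produces the three tokens `Ident("a")`, `Op("<=")`, `Ident("b")`. Ordering the operator table longest-first is one simple way to get maximal munch; a hand-written scanner could equally peek at the next character after seeing `<`.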
- Rust compiler
- Clang
- LLVM 18.1 (only needed to build Catalyst)
- Polly (libpolly-18-dev)
- Compile the compiler: `cargo build --release`
- Compile a source file: `./target/release/your_compiler_name source_file.c`
- Run the generated executable: `./source_file`

Example:

    # Create a simple program
    echo 'int main() { return 42; }' > test.c
    # Compile it
    cargo run test.c
    # Run the generated executable
    ./test
    # Check the exit code
    echo $? # Should output: 42

This is a learning project, but contributions are welcome! Areas that need attention:
- Parser improvements - Better error recovery and reporting
- Code generation - More expression types and optimizations
- Testing - Comprehensive test suite for all components
- Documentation - Code comments and usage examples
This compiler is a work in progress and serves as an educational project for understanding compiler construction principles.