cmc-rep/LLVM-Code-Generation
LLVM Code Generation, First Edition

A deep dive into compiler backend development

Quentin Colombet

This is the code repository for LLVM Code Generation, First Edition, published by Packt.

About the book

The LLVM infrastructure is a popular compiler ecosystem widely used in the tech industry and academia. This technology is crucial for both experienced and aspiring compiler developers looking to make an impact in the field. Written by Quentin Colombet, a veteran LLVM contributor and architect of the GlobalISel framework, this book provides a primer on the main aspects of LLVM, with an emphasis on its backend infrastructure; that is, everything needed to transform the intermediate representation (IR) produced by frontends like Clang into assembly code and object files. You’ll learn how to write an optimizing code generator for a toy backend in LLVM. The chapters will guide you step by step through building this backend while exploring key concepts, such as the ABI, cost model, and register allocation. You’ll also find out how to express these concepts using LLVM's existing infrastructure and how established backends address these challenges. Furthermore, the book features code snippets that demonstrate the actual APIs. By the end of this book, you’ll have gained a deeper understanding of LLVM. The concepts presented are expected to remain stable across different LLVM versions, making this book a reliable quick reference guide for understanding LLVM.

Key Learnings

  • Understand essential compiler concepts, such as SSA, dominance, and ABI
  • Build and extend LLVM backends for creating custom compiler features
  • Optimize code by manipulating LLVM's Intermediate Representation
  • Contribute effectively to LLVM open-source projects and development
  • Develop debugging skills for LLVM optimizations and passes
  • Grasp how encoding and (dis)assembling work in the context of compilers
  • Utilize LLVM's TableGen DSL for creating custom compiler models

Chapters

  1. Building LLVM and Understanding the Directory Structure
  2. Contributing to LLVM
  3. Compiler Basics and How They Map to LLVM APIs
  4. Writing Your First Optimization
  5. Dealing with Pass Managers
  6. TableGen – LLVM Swiss Army Knife for Modeling
  7. Understanding LLVM IR
  8. Survey of the Existing Passes
  9. Introducing Target-Specific Constructs
  10. Hands-On Debugging LLVM IR Passes
  11. Getting Started with the Backend
  12. Getting Started with the Machine Code Layer
  13. The Machine Pass Pipeline
  14. Getting Started with Instruction Selection
  15. Instruction Selection: The IR Building Phase
  16. Instruction Selection: The Legalization Phase
  17. Instruction Selection: The Selection Phase and Beyond
  18. Instruction Scheduling
  19. Register Allocation
  20. Lowering of the Stack Layout
  21. Getting Started with the Assembler

Requirements for this book

To follow the instructions in this book, you need LLVM 20 installed on your system, running on Windows, macOS, or Linux operating systems.

Navigate to each chX directory, study the examples provided, and do the exercises where applicable. Each directory has its own README.md with specific directions.

Note: The exercises have been tested with the open source repository of LLVM at the Git hash 424c2d9b7e4d from February 13th, 2025, which is LLVM 20.1.1.

Some of the exercises interact directly with the LLVM C++ API. This API has no stability guarantee, so newer or older versions of LLVM may not work with these exercises.

For the exercises that require an installed version of LLVM: if you build your own, set the install path with the CMAKE_INSTALL_PREFIX CMake variable, then build the install target.

Then provide this install path to CMake in each of these exercises.

Follow the READMEs in the different directories when you get there.
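
The build-and-install steps above can be sketched as a small shell script. This is a minimal sketch under stated assumptions, not the book's official procedure: the paths are hypothetical placeholders, and passing `LLVM_DIR` assumes the exercise locates LLVM through CMake's `find_package`. The README.md of each chX directory remains the authority on the exact variable to set. The script prints the commands by default and only executes them when invoked with `RUN=` (empty).

```shell
#!/bin/sh
# Sketch only: the paths below are hypothetical placeholders, not taken from the book.
LLVM_SRC="${LLVM_SRC:-$HOME/llvm-project}"          # llvm-project checkout
LLVM_INSTALL="${LLVM_INSTALL:-$HOME/llvm-install}"  # value for CMAKE_INSTALL_PREFIX
EXERCISE_DIR="${EXERCISE_DIR:-chX}"                 # one of the chX directories

# Dry run by default: commands are printed, not executed.
# Invoke the script with RUN= (empty) to actually run them.
RUN="${RUN-echo}"

# 1. Configure LLVM with an explicit install path.
$RUN cmake -GNinja -S "$LLVM_SRC/llvm" -B "$LLVM_SRC/build" \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_INSTALL_PREFIX="$LLVM_INSTALL"

# 2. Build the install target.
$RUN cmake --build "$LLVM_SRC/build" --target install

# 3. Point the exercise's CMake at the installed LLVM.
# Assumption: the exercise finds LLVM via find_package, for which
# LLVM_DIR (the directory holding LLVMConfig.cmake) is the usual hint.
LLVM_CMAKE_DIR="$LLVM_INSTALL/lib/cmake/llvm"
$RUN cmake -GNinja -S "$EXERCISE_DIR" -B "$EXERCISE_DIR/build" \
    -DLLVM_DIR="$LLVM_CMAKE_DIR"
```

Printing first makes it easy to check the paths before committing to a full LLVM build, which can take a long time.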

Get to know the author

Quentin Colombet is a veteran LLVM contributor specializing in compiler backends. He is the architect of the new instruction selection framework (GlobalISel) and code owner of the LLVM register allocators. With over two decades of experience, he has worked on compiler backends for a variety of architectures, including GPU, CPU, microcontrollers, DSP, and ASICs. Quentin joined Apple in 2012 and has contributed to the x86, AArch64, and Apple GPU backends. He is passionate about helping newcomers get started with the LLVM infrastructure, having mentored interns and new hires over the years.

Errata

  • Page 6: Under the heading Identifying the right version of the tools, in step 1 the hyperlink on the URL [https://releases.llvm.org/] in the digital formats redirects to [https://www.python.org/downloads/]. Please copy and paste the link [https://releases.llvm.org/] in the browser to navigate to the correct webpage.
  • Page 11: In the command $ git clone https://github.com/llvm/llvm/project.git, the URL should be https://github.com/llvm/llvm-project.git. Therefore, the first line becomes $ git clone https://github.com/llvm/llvm-project.git.
  • Page 31: In the command $ cmake –GNinja -DCMAKE_C_COMPILER=${BUILD_DIR}/bin/clang ${TESTSUITE_DIR}, the en dash (–) before GNinja should be replaced with a hyphen (-). In the command $ cmake -GNinja -DCMAKE_C_COMPILER=${BUILD_DIR}/bin/clang -C${TESTSUITE_DIR}/cmake/cache/<specificOption>.cmake ${TESTSUITE_DIR}, cache should be replaced with caches. The same applies to /cmake/cache/O3.cmake.
  • Page 71: In the description below Figure 3.4, the sentence "Because of that, inserting a store in A and reloading in B means that the whole dotted region needs to play nicely with this memory location." should be "Because of that, inserting a store in A and reloading in C means that the whole dotted region needs to play nicely with this memory location."
  • Page 71: The sentence "At the IR level, you can check if an edge is critical using the isCriticalEdge function from the IR library." should be "At the IR level, you can check if an edge is critical using the isCriticalEdge function from the Analysis library.". (Library name is changed from IR to Analysis.)
  • Page 75: In the sentence "For this exercise, the linkage type does not matter; you can pick whatever enum value you want – for example, GlobalValue::CommonLinkage", CommonLinkage is actually the one linkage type that doesn't work for functions. Instead, one can use the GlobalValue::ExternalLinkage enum value in this case.
  • Page 80: In the C program in the first question under Quiz, res should be declared as int res = 0; after the first line (int foo(int b) {) and before the for loop. Also, in the following diagram, there should be an additional arrow from the top rectangle to the bottommost rectangle, so that it reflects the case where the loop is not entered and the function directly returns res.
  • Page 86: In the last paragraph first sentence "Using the same code snippet as the previous section, a frontend generates an IR that resembles what is depicted in Figure 4.2", the correct image reference is Figure 4.1.
  • Page 92: In Figure 4.5, the labels on the left side should just be getOperand(0) and getOperand(1). In both the lines, getopenand(0) => and getopenand(n) => are to be disregarded.
  • Page 106: In Figure 4.6, the block at the center labelled as "excluding" should be "exiting".
  • Page 361: The term MCInstrPrinter should be MCInstPrinter.
  • Page 363: Both the instances of the term XXXInstrPrinter should be XXXInstPrinter.
  • Page 455: In Table 17.1, under the Original code column on the left side, the line %vec1 = insertelement <2 x i32> %vec, i32 %a, i32 1 should be %vec1 = insertelement <2 x i32> %vec, i32 %b, i32 1 (i.e. %a should be %b).
  • Page 504: The sentence "The bindings are simple instantiations of the ReadAdvance and WriteRes classes or instantiations of InstR." should be "The bindings are simple instantiations of the ReadAdvance and WriteRes classes or instantiations of InstRW.". (InstR is changed to InstRW.)
