Augment notes on LLVM IR debugging #171

Merged
merged 4 commits into from
Jul 17, 2018
2 changes: 1 addition & 1 deletion src/SUMMARY.md
@@ -54,7 +54,7 @@
- [Constant evaluation](./const-eval.md)
- [miri const evaluator](./miri.md)
- [Parameter Environments](./param_env.md)
- [Generating LLVM IR](./codegen.md)
- [Code Generation](./codegen.md)
- [Emitting Diagnostics](./diag.md)

---
55 changes: 54 additions & 1 deletion src/codegen.md
@@ -1 +1,54 @@
# Generating LLVM IR
# Code generation

Code generation or "codegen" is the part of the compiler that actually
generates an executable binary. rustc uses LLVM for code generation.

> NOTE: If you are looking for hints on how to debug code generation bugs,
> please see [this section of the debugging chapter][debug].

[debug]: compiler-debugging.html#debugging-llvm

## What is LLVM?

All of the preceding chapters of this guide have one thing in common: we never
generated any executable machine code at all! With this chapter, all of that
changes.

Like most compilers, rustc is composed of a "frontend" and a "backend". The
"frontend" is responsible for taking raw source code, checking it for
correctness, and getting it into a format `X` from which we can generate
executable machine code. The "backend" then takes that format `X` and produces
(possibly optimized) executable machine code for some platform. All of the
previous chapters deal with rustc's frontend.

rustc's backend is [LLVM](https://llvm.org), "a collection of modular and
reusable compiler and toolchain technologies". In particular, the LLVM project
contains a pluggable compiler backend (also called "LLVM"), which is used by
many compiler projects, including the `clang` C compiler and our beloved
`rustc`.

LLVM's "format `X`" is called LLVM IR. It is basically assembly code with
additional low-level types and annotations added. These annotations are helpful
for doing optimizations on the LLVM IR and outputted machine code. The end
result of all this is (at long last) something executable (e.g. an ELF object
or wasm).

There are a few benefits to using LLVM:

- We don't have to write a whole compiler backend. This reduces implementation
and maintenance burden.
- We benefit from the large suite of advanced optimizations that the LLVM
project has been collecting.
- We automatically can compile Rust to any of the platforms for which LLVM has
support. For example, as soon as LLVM added support for wasm, voila! rustc,
clang, and a bunch of other languages were able to compile to wasm! (Well,
there was some extra stuff to be done, but we were 90% there anyway).
- We and other compiler projects benefit from each other. For example, when the
[Spectre and Meltdown security vulnerabilities][spectre] were discovered,
only LLVM needed to be patched.

[spectre]: https://meltdownattack.com/

## Generating LLVM IR

TODO
170 changes: 117 additions & 53 deletions src/compiler-debugging.md
@@ -1,4 +1,4 @@
**Note: This is copied from the
[rust-forge](https://github.com/rust-lang-nursery/rust-forge). If anything needs
updating, please open an issue or make a PR on the github repo.**

@@ -16,7 +16,7 @@ normal Rust programs. IIRC backtraces **don't work** on Mac and on MinGW,
sorry. If you have trouble or the backtraces are full of `unknown`,
you might want to find some way to use Linux or MSVC on Windows.

In the default configuration, you don't have line numbers enabled, so the
backtrace looks like this:

```text
@@ -36,8 +36,8 @@ stack backtrace:
37: rustc_driver::run_compiler
```

If you want line numbers for the stack trace, you can enable
`debuginfo-lines=true` or `debuginfo=true` in your config.toml and rebuild the
compiler. Then the backtrace will look like this:

```text
@@ -110,16 +110,16 @@ note: rustc 1.24.0-dev running on x86_64-unknown-linux-gnu

note: run with `RUST_BACKTRACE=1` for a backtrace

thread 'rustc' panicked at 'encountered error with `-Z treat_err_as_bug',
/home/user/rust/src/librustc_errors/lib.rs:411:12
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose
backtrace.
stack backtrace:
(~~~ IRRELEVANT PART OF BACKTRACE REMOVED BY ME ~~~)
7: rustc::traits::error_reporting::<impl rustc::infer::InferCtxt<'a, 'gcx,
'tcx>>::report_selection_error
at /home/user/rust/src/librustc/traits/error_reporting.rs:823
8: rustc::traits::error_reporting::<impl rustc::infer::InferCtxt<'a, 'gcx,
'tcx>>::report_fulfillment_errors
at /home/user/rust/src/librustc/traits/error_reporting.rs:160
at /home/user/rust/src/librustc/traits/error_reporting.rs:112
@@ -136,7 +136,7 @@ $ # Cool, now I have a backtrace for the error

The compiler has a lot of `debug!` calls, which print out logging information
at many points. These are very useful to at least narrow down the location of
a bug if not to find it entirely, or just to orient yourself as to why the
compiler is doing a particular thing.

To see the logs, you need to set the `RUST_LOG` environment variable to
@@ -191,37 +191,37 @@ want to call `x.py clean` to force one.
### Logging etiquette

Because calls to `debug!` are removed by default, in most cases, don't worry
about adding "unnecessary" calls to `debug!` and leaving them in code you
commit - they won't slow down the performance of what we ship, and if they
helped you pin down a bug, they will probably help someone else with a
different one.

However, there are still a few concerns that you might care about:

### Expensive operations in logs

A note of caution: the expressions *within* the `debug!` call are run
whenever RUST_LOG is set, even if the filter would exclude the log. This means
that if in the module `rustc::foo` you have a statement

```Rust
debug!("{:?}", random_operation(tcx));
```

If someone then runs a debug `rustc` with `RUST_LOG=rustc::bar`,
`random_operation()` will still run - even though its output will never be
needed!

to crash there - that would annoy anyone who wants to use logging for their own
module. Note that if `RUST_LOG` is unset (the default), then the code will not
run - this means that if your logging code panics, then no one will know it
until someone tries to use logging to find *another* bug.

If you *need* to do an expensive operation in a log, be aware that while log
expressions are *evaluated* even if logging is not enabled in your module,
they are not *formatted* unless it *is*. This means you can put your
expensive/crashy operations inside an `fmt::Debug` impl, and they will not be
run unless your log is enabled:

```Rust
@@ -246,7 +246,7 @@ debug!("{:?}", ExpensiveOperationContainer { tcx });
## Formatting Graphviz output (.dot files)
[formatting-graphviz-output]: #formatting-graphviz-output

Some compiler options for debugging specific features yield graphviz graphs -
e.g. the `#[rustc_mir(borrowck_graphviz_postflow="suffix.dot")]` attribute
dumps various borrow-checker dataflow graphs.

@@ -261,30 +261,66 @@ $ firefox maybe_init_suffix.pdf # Or your favorite pdf viewer
## Debugging LLVM
[debugging-llvm]: #debugging-llvm

> NOTE: If you are looking for info about code generation, please see [this
> chapter][codegen] instead.

[codegen]: codegen.html

This section is about debugging compiler bugs in code generation (e.g. why the
compiler generated some piece of code or crashed in LLVM). LLVM is a big
project on its own that probably needs to have its own debugging document (not
that I could find one). But here are some tips that are important in a rustc
context:

As a general rule, compilers generate lots of information from analyzing code.
Thus, a useful first step is usually to find a minimal example. One way to do
this is to

1. create a new crate that reproduces the issue (e.g. adding whatever crate is
at fault as a dependency, and using it from there)

2. minimize the crate by removing external dependencies; that is, moving
everything relevant to the new crate

3. further minimize the issue by making the code shorter (there are tools that
help with this like `creduce`)

The official compilers (including nightlies) have LLVM assertions disabled,
which means that LLVM assertion failures can show up as compiler crashes (not
ICEs but "real" crashes) and other sorts of weird behavior. If you are
encountering these, it is a good idea to try using a compiler with LLVM
assertions enabled - either an "alt" nightly or a compiler you build yourself
by setting `[llvm] assertions=true` in your config.toml - and see whether
anything turns up.

The rustc build process builds the LLVM tools into
`./build/<host-triple>/llvm/bin`. They can be called directly.
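
As a rough sketch of how to reach these tools from a script: the helper below
only assembles the path described above, and it assumes you are in the root of
a rustc checkout. The function name `llvm_tool` is made up for this example;
`rustc -vV` is used to read the host triple.

```bash
# Path to an LLVM tool inside a rustc checkout's build directory.
# $1 = host triple, $2 = tool name (opt, llc, llvm-dis, ...)
# `llvm_tool` is a hypothetical helper, not something rustc ships.
llvm_tool() {
    printf './build/%s/llvm/bin/%s\n' "$1" "$2"
}

# Resolve `opt` for the current host, if rustc is on PATH.
if command -v rustc >/dev/null 2>&1; then
    TRIPLE=$(rustc -vV | sed -n 's/^host: //p')
    OPT=$(llvm_tool "$TRIPLE" opt)
    echo "opt should be at: $OPT"
fi
```

The tool only exists at that path once the build has actually compiled LLVM.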

The default rustc compilation pipeline has multiple codegen units, which is
hard to replicate manually and means that LLVM is called multiple times in
parallel. If you can get away with it (i.e. if it doesn't make your bug
disappear), passing `-C codegen-units=1` to rustc will make debugging easier.

To get rustc to generate LLVM IR, you need to pass the `--emit=llvm-ir` flag. If
you are building via cargo, use the `RUSTFLAGS` environment variable (e.g.
`RUSTFLAGS='--emit=llvm-ir'`). This causes rustc to spit out LLVM IR into the
target directory.
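
For instance, a minimal sketch of both routes might look like this. The file
name `ir_demo.rs` and the function `add_one` are invented for the example, and
the flags are the ones mentioned in this section.

```bash
# Create a tiny library crate to experiment with (hypothetical file name).
cat > ir_demo.rs <<'EOF'
#[no_mangle]
pub fn add_one(x: u32) -> u32 {
    x.wrapping_add(1)
}
EOF

# Emit LLVM IR with a single codegen unit and no debuginfo.
# When called directly like this, rustc writes ir_demo.ll to the current directory.
if command -v rustc >/dev/null 2>&1; then
    rustc --crate-type=lib --emit=llvm-ir -C codegen-units=1 -C debuginfo=0 ir_demo.rs
    grep -n 'add_one' ir_demo.ll
fi

# With cargo, the same flag goes through RUSTFLAGS; the .ll files then
# show up under the target directory (typically in target/debug/deps/).
# RUSTFLAGS='--emit=llvm-ir' cargo build
```

`#[no_mangle]` is only there so that the function is easy to find by name in
the generated IR.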

`cargo llvm-ir [options] path` spits out the LLVM IR for a particular function
at `path`. (`cargo install cargo-asm` installs `cargo asm` and `cargo
llvm-ir`). `--build-type=debug` emits code for debug builds. There are also
other useful options. Also, debug info in LLVM IR can clutter the output a lot:
`RUSTFLAGS="-C debuginfo=0"` is really useful.

`RUSTFLAGS="-C save-temps"` outputs LLVM bitcode (not the same as IR) at
different stages during compilation, which is sometimes useful. One just needs
to convert the bitcode files to `.ll` files using `llvm-dis`, which should be
available in your local build of rustc.
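
The conversion step could be scripted roughly as follows. `disassemble_all` is
a made-up helper name, and where the `.bc` files end up depends on how you
invoked the build, so the directory is passed in explicitly.

```bash
# Convert every bitcode file in a directory into textual LLVM IR.
# $1 = directory to scan, $2 = path to llvm-dis
# `disassemble_all` is a hypothetical helper for this example.
disassemble_all() {
    for bc in "$1"/*.bc; do
        [ -e "$bc" ] || continue    # no .bc files: the glob stays literal
        "$2" "$bc" -o "${bc%.bc}.ll"
    done
}
```

After something like `RUSTFLAGS="-C save-temps" cargo build`, you would call it
as `disassemble_all <dir-with-bc-files> ./build/$TRIPLE/llvm/bin/llvm-dis`.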

If you want to play with the optimization pipeline, you can use the `opt` tool
from `./build/<host-triple>/llvm/bin/` with the LLVM IR emitted by rustc. Note
that rustc emits different IR depending on whether `-O` is enabled, even
without LLVM's optimizations, so if you want to play with the IR rustc emits,
you should:

```bash
@@ -295,21 +331,21 @@ $ $OPT -S -O2 < my-file.ll > my
```

If you just want to get the LLVM IR during the LLVM pipeline, to e.g. see which
IR causes an optimization-time assertion to fail, or to see when LLVM performs
a particular optimization, you can pass the rustc flag `-C
llvm-args=-print-after-all`, and possibly add `-C
llvm-args='-filter-print-funcs=EXACT_FUNCTION_NAME'` (e.g. `-C
llvm-args='-filter-print-funcs=_ZN11collections3str21_$LT$impl$u20$str$GT$\
7replace17hbe10ea2e7c809b0bE'`).

That produces a lot of output into standard error, so you'll want to pipe that
to some file. Also, if you are using neither `-filter-print-funcs` nor `-C
codegen-units=1`, then, because the multiple codegen units run in parallel, the
printouts will mix together and you won't be able to read anything.
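
Putting those pieces together, a sketch of a wrapper that captures the dump in
a file might look like this. The name `dump_passes` and the `RUSTC` override
variable are assumptions of this example, not part of rustc.

```bash
# Dump the IR after every LLVM pass for one source file.
# $1 = Rust source file, $2 = log file that receives standard error
# Set RUSTC to point at a different compiler (e.g. one with LLVM assertions).
dump_passes() {
    "${RUSTC:-rustc}" -O -C codegen-units=1 \
        -C llvm-args=-print-after-all "$1" 2> "$2"
}
```

Usage would be `dump_passes my-file.rs passes.log`; the log can get very large,
so adding `-filter-print-funcs` as described above is usually worth it.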

If you want just the IR for a specific function (say, you want to see why it
causes an assertion or doesn't optimize correctly), you can use `llvm-extract`,
e.g.

```bash
$ ./build/$TRIPLE/llvm/bin/llvm-extract \
@@ -319,4 +355,32 @@ $ ./build/$TRIPLE/llvm/bin/llvm-extract \
> extracted.ll
```

### Filing LLVM bug reports

When filing an LLVM bug report, you will probably want some sort of minimal
working example that demonstrates the problem. The Godbolt compiler explorer is
really helpful for this.

1. Once you have some LLVM IR for the problematic code (see above), you can
create a minimal working example with Godbolt. Go to
[gcc.godbolt.org](https://gcc.godbolt.org).

2. Choose `LLVM-IR` as the programming language.

3. Use `llc` to compile the IR to a particular target as is:
- There are some useful flags: `-mattr` enables target features, `-march=`
selects the target, `-mcpu=` selects the CPU, etc.
- Commands like `llc -march=help` output all architectures available, which
is useful because sometimes the Rust arch names and the LLVM names do not
match.
- If you have compiled rustc yourself somewhere, in the target directory
you have binaries for `llc`, `opt`, etc.

4. If you want to optimize the LLVM-IR, you can use `opt` to see how the LLVM
optimizations transform it.

5. Once you have a godbolt link demonstrating the issue, it is pretty easy to
file an LLVM bug report.
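
The `llc` step can also be tried locally before going to Godbolt. The wrapper
below is a sketch under the assumption that you have an `llc` binary from your
own build; `compile_ir` is a hypothetical name, and the flags in the usage line
are only examples.

```bash
# Compile an LLVM IR file to assembly with a given llc binary.
# $1 = path to llc, $2 = input .ll file, remaining args = extra llc flags
# `compile_ir` is a hypothetical helper for this example.
compile_ir() {
    llc_bin=$1
    input=$2
    shift 2
    "$llc_bin" "$@" "$input" -o "${input%.ll}.s"
}
```

For example, `compile_ir ./build/$TRIPLE/llvm/bin/llc extracted.ll -march=x86-64
-mcpu=haswell -mattr=+avx2` would write `extracted.s` next to the input.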


[env-logger]: https://docs.rs/env_logger/0.4.3/env_logger/