Add documentation for MemProf. #172238
Generated with the help of Gemini CLI, commands validated with a local build of LLVM from head and tcmalloc.
mingmingl-llvm
left a comment
Thanks for putting this together!
I'll revisit #124991 to add clang options for data partitioning, and then update this doc on the compile options.
    ----------------
    * **Runtime:** ``compiler-rt/lib/memprof``
    * Contains the runtime implementation, including shadow memory mapping, interceptors (malloc, free, etc.), and the thread-local storage for recording stats.
In the rich diff view, the lines after each header (runtime, instrumentation, etc.) don't start on a new line and don't get a +2 indentation.
I wonder if ninja docs-llvm-html reproduces this.
Ditto for me in preview mode (https://github.com/llvm/llvm-project/blob/fd6fb04f5dbc07ee2953a6cdfc1d1b0d1d64da20/llvm/docs/MemProf.rst).
    ----------------------
    The runtime uses a **shadow memory** scheme similar to AddressSanitizer (ASan) but optimized for profiling.
    * **Shadow Mapping:** Application memory is mapped to shadow memory.
Similarly, in the GitHub rich diff, each point (shadow mapping, granularity, etc.) doesn't start on a new line.
    To support static data partitioning, the profile format includes a payload for symbolized data access profiles. This maps data addresses to canonical symbol names (or module source location for internal data) and access counts. This enables the compiler to identify which global variables are hot.

    Testing
I wonder if we want to mention code pointers for static data partitioning somewhere (non-exhaustive examples under 'llvm/lib/CodeGen/': 'AsmPrinter/AsmPrinter.cpp' and 'StaticDataSplitter.cpp'). If yes, should they go in this doc section, or in a different section so that each section's description stays focused?
teresajohnson
left a comment
Thanks for doing this! Misc comments and suggestions below.
    This information enables optimizations such as:

    * **Heap Layout Optimization:** Grouping objects with similar lifetimes or access density.
Do we want to mention here that for now this requires tcmalloc? Or more generally, an allocator that supports the necessary interfaces?
    .. code-block:: bash

        clang++ -fmemory-profile -fdebug-info-for-profiling -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -gmlt -O2 source.cpp -o app
Do we want to include -fno-optimize-sibling-calls here or at least note that it will yield more accurate contexts?
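For illustration, the instrumentation build with the suggested flag added might look like the sketch below. Every flag except -fno-optimize-sibling-calls comes from the quoted command; source.cpp and app are the doc's placeholder names, and the command is only printed here rather than run.

```shell
# Flags from the quoted command, plus -fno-optimize-sibling-calls so that
# tail calls keep their own frames and recorded contexts are more accurate.
MEMPROF_FLAGS="-fmemory-profile -fdebug-info-for-profiling \
-fno-omit-frame-pointer -mno-omit-leaf-frame-pointer \
-fno-optimize-sibling-calls -gmlt -O2"

# Placeholder input and output names taken from the quoted command.
CMD="clang++ $MEMPROF_FLAGS source.cpp -o app"

# Dry run: print the command; drop the echo wrapper to build for real.
echo "$CMD"
```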
    This section describes how to use MemProf to profile and optimize your application.

    Building with MemProf
Maybe "with MemProf instrumentation"? Since we use "MemProf" to also refer to the feedback compile.
    llvm-profdata show --memory memprof.memprofdata > memprof.yaml

    Merge MemProf profiles with standard PGO instrumentation profiles if you have both.
Note that this is optional. They are passed via different flags and can be separate.
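As a rough sketch of the unmerged case, the two profiles could be passed through their own flags as below. The file name pgo.profdata is hypothetical, and -fprofile-use= is assumed here to be the flag carrying the standard PGO profile; the command is printed, not run.

```shell
# Assumption: pgo.profdata is a hypothetical indexed PGO profile; the
# MemProf profile name is the one used elsewhere in the doc.
PGO_FLAG="-fprofile-use=pgo.profdata"
MEMPROF_FLAG="-fmemory-profile-use=memprof.memprofdata"

# Each profile goes in through its own flag, so no merge step is needed.
CMD="clang++ $PGO_FLAG $MEMPROF_FLAG -O2 source.cpp -o optimized_app"

# Dry run: print the command instead of invoking the compiler.
echo "$CMD"
```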
    .. code-block:: bash

        clang++ -fmemory-profile-use=memprof.memprofdata -O2 source.cpp -o optimized_app -ltcmalloc
Note the necessary debug info flags for matching
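One possible shape for the feedback compile with debug info is sketched below. It assumes the flags needed for matching are the same -gmlt and -fdebug-info-for-profiling used in the instrumented build; the comment does not name them, so treat that as a guess to confirm. The command is printed, not run.

```shell
# Assumption: matching needs the same debug info flags as the
# instrumented build (-gmlt and -fdebug-info-for-profiling).
DEBUG_FLAGS="-gmlt -fdebug-info-for-profiling"

# Feedback compile from the quoted command, with the debug flags added.
CMD="clang++ -fmemory-profile-use=memprof.memprofdata $DEBUG_FLAGS -O2 source.cpp -o optimized_app -ltcmalloc"

# Dry run: print the command instead of invoking the compiler.
echo "$CMD"
```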
Also, while -fmemory-profile-use will give the matching, in order to get everything hinted properly, there are a few other internal options that currently need to be explicitly enabled during the LTO link:
1. -Wl,-mllvm,-enable-memprof-context-disambiguation
   This enables the context disambiguation pass (both in the LTO link and in the ThinLTO backend). Otherwise cloning is not enabled.
2. -Wl,-mllvm,-optimize-hot-cold-new
   This is used during the LTO backends to enable rewriting of the allocations during SimplifyLibCalls.
3. -Wl,-mllvm,-supports-hot-cold-new
   Used during the LTO link to indicate that we are linking with a library supporting the hot cold new interfaces.
We should probably enable 1 and 2 by default now, and I can look at doing so. They shouldn't have any effect without a profile anyway. 3 should be off by default so that it can optionally be enabled at LTO link time when linking with the appropriate linker. So perhaps I can flip the default for 1 and 2 asap, so they don't need to be called out here, but 3 should be mentioned.
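Put together, a link line carrying the three internal options might look like the sketch below. The -flto=thin and -fuse-ld=lld flags are assumptions added to make this an LTO link; they are not part of the quoted command. The command is printed, not run.

```shell
# The three internal options from the comment, forwarded via -Wl,-mllvm.
LTO_MEMPROF_OPTS="-Wl,-mllvm,-enable-memprof-context-disambiguation \
-Wl,-mllvm,-optimize-hot-cold-new \
-Wl,-mllvm,-supports-hot-cold-new"

# Assumption: -flto=thin and -fuse-ld=lld select a ThinLTO link with lld.
CMD="clang++ -flto=thin -fuse-ld=lld -fmemory-profile-use=memprof.memprofdata -O2 source.cpp -o optimized_app -ltcmalloc $LTO_MEMPROF_OPTS"

# Dry run: print the command instead of invoking the compiler.
echo "$CMD"
```

If the defaults for the first two options are flipped as proposed, only -supports-hot-cold-new would still need to be passed explicitly.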
    * Reads the profile and annotates the IR with metadata.
    * **Context Disambiguation:** ``llvm/lib/transforms/ipo/MemProfContextDisambiguation.cpp``
    * Implements the analysis and transformations (e.g., cloning) for resolving ambiguous allocation contexts, particularly during ThinLTO.
Also mention SimplifyLibCalls.cpp for transforming the allocation calls?
    * **Use Pass:** ``llvm/lib/Transforms/Instrumentation/MemProfUse.cpp``
    * Reads the profile and annotates the IR with metadata.
    * **Context Disambiguation:** ``llvm/lib/transforms/ipo/MemProfContextDisambiguation.cpp``
    * Implements the analysis and transformations (e.g., cloning) for resolving ambiguous allocation contexts, particularly during ThinLTO.
I would change ", particularly during ThinLTO" to something like " using LTO".
    * Location: ``llvm/test/Transforms/PGOProfile``
    * Purpose: Verify the correctness of the ``MemProfUse`` pass, metadata annotation, and IR transformations.

    4. **ThinLTO & Context Disambiguation Tests:**
Also llvm/test/Transforms/MemProfContextDisambiguation for regular LTO tests.
    * Purpose: Verify the correctness of the ``MemProfUse`` pass, metadata annotation, and IR transformations.

    4. **ThinLTO & Context Disambiguation Tests:**
    * Location: ``llvm/test/ThinLTO/X86``
Maybe add "/memprof*", since there are a lot of ThinLTO tests in this directory.
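As a sketch of how the narrower locations could be exercised, the tests might be run with llvm-lit roughly as below. The build-directory layout and the ../llvm-project source path are assumptions; the command is printed, not run, so the glob stays unexpanded.

```shell
# Assumption: run from an LLVM build directory with the sources checked
# out at ../llvm-project; adjust both paths for a real setup.
LLVM_SRC="../llvm-project"

# ThinLTO memprof tests plus the regular LTO disambiguation tests.
CMD="./bin/llvm-lit -v $LLVM_SRC/llvm/test/ThinLTO/X86/memprof* $LLVM_SRC/llvm/test/Transforms/MemProfContextDisambiguation"

# Dry run: print the command instead of invoking llvm-lit.
echo "$CMD"
```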
