@snehasish commented Dec 15, 2025

Generated with the help of Gemini CLI, commands validated with a local
build of LLVM from head and tcmalloc.

@snehasish marked this pull request as ready for review December 15, 2025 00:19

@mingmingl-llvm left a comment

Thanks for putting this together!

I'll revisit #124991 to add clang options for data partitioning, and then update the compile options in this doc.

----------------

* **Runtime:** ``compiler-rt/lib/memprof``
* Contains the runtime implementation, including shadow memory mapping, interceptors (malloc, free, etc.), and the thread-local storage for recording stats.

In the rich diff, the lines after each header (runtime, instrumentation, etc.) don't start on a new line and don't get a +2 indentation.

I wonder if ninja docs-llvm-html reproduces this.

----------------------

The runtime uses a **shadow memory** scheme similar to AddressSanitizer (ASan) but optimized for profiling.
* **Shadow Mapping:** Application memory is mapped to shadow memory.

Similarly, in the GitHub rich diff, each point (shadow mapping, granularity, etc.) doesn't start on a new line.


To support static data partitioning, the profile format includes a payload for symbolized data access profiles. This maps data addresses to canonical symbol names (or module source location for internal data) and access counts. This enables the compiler to identify which global variables are hot.

Testing

I wonder if we want to mention code pointers for static data partitioning somewhere (non-exhaustive examples: 'AsmPrinter/AsmPrinter.cpp' and 'StaticDataSplitter.cpp' under 'llvm/lib/CodeGen/'). If so, should they go in this doc section, or in a different section so that the description of each section stays focused?

@teresajohnson left a comment

Thanks for doing this! Misc comments and suggestions below.


This information enables optimizations such as:

* **Heap Layout Optimization:** Grouping objects with similar lifetimes or access density.

Do we want to mention here that for now this requires tcmalloc? Or more generally, an allocator that supports the necessary interfaces?


.. code-block:: bash

    clang++ -fmemory-profile -fdebug-info-for-profiling -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -gmlt -O2 source.cpp -o app

Do we want to include -fno-optimize-sibling-calls here or at least note that it will yield more accurate contexts?
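
One way to picture the suggestion is the instrumentation command quoted above with `-fno-optimize-sibling-calls` appended. The sketch below is a dry run that only prints the command line. The variable names are made up for illustration, and whether the flag should be part of the default recipe is still the open question here.

```shell
# Dry-run sketch: prints the instrumented build command instead of running it.
# BASE_FLAGS mirrors the command quoted from the doc.
BASE_FLAGS="-fmemory-profile -fdebug-info-for-profiling -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -gmlt -O2"
# Proposed addition: disabling sibling (tail) call optimization keeps caller
# frames on the stack, which should yield more accurate allocation contexts.
EXTRA_FLAGS="-fno-optimize-sibling-calls"
INSTR_CMD="clang++ $BASE_FLAGS $EXTRA_FLAGS source.cpp -o app"
echo "$INSTR_CMD"
```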


This section describes how to use MemProf to profile and optimize your application.

Building with MemProf

Maybe "with MemProf instrumentation"? Since we use "MemProf" to also refer to the feedback compile.


llvm-profdata show --memory memprof.memprofdata > memprof.yaml

Merge MemProf profiles with standard PGO instrumentation profiles if you have both.

Note that this is optional. They are passed via different flags and can be separate.
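
As a rough illustration of keeping the two profiles separate, the sketch below passes an instrumentation PGO profile via `-fprofile-use=` and the MemProf profile via `-fmemory-profile-use=`. The file name `pgo.profdata` and the variable names are placeholders rather than anything from the doc, and the script only prints the command.

```shell
# Dry-run sketch: prints a feedback compile that consumes two separate profiles.
PGO_PROFILE="pgo.profdata"             # hypothetical instrumentation PGO profile
MEMPROF_PROFILE="memprof.memprofdata"  # MemProf profile name used in the doc
USE_CMD="clang++ -fprofile-use=$PGO_PROFILE -fmemory-profile-use=$MEMPROF_PROFILE -O2 source.cpp -o optimized_app"
echo "$USE_CMD"
```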


.. code-block:: bash

    clang++ -fmemory-profile-use=memprof.memprofdata -O2 source.cpp -o optimized_app -ltcmalloc

Please also note the debug info flags needed for matching.

Also, while -fmemory-profile-use will give the matching, a few other internal options currently need to be explicitly enabled during the LTO link in order to get everything hinted properly:

  1. -Wl,-mllvm,-enable-memprof-context-disambiguation

This enables the context disambiguation pass (both in the LTO link and in the ThinLTO backend). Otherwise cloning is not enabled.

  2. -Wl,-mllvm,-optimize-hot-cold-new

This is used during the LTO backends to enable rewriting of the allocations during SimplifyLibCalls.

  3. -Wl,-mllvm,-supports-hot-cold-new

This is used during the LTO link to indicate that we are linking with a library supporting the hot cold new interfaces.

We should probably enable 1 and 2 by default now, and I can look at doing so. They shouldn't have any effect without a profile anyway. 3 should be off by default so that it can optionally be enabled at LTO link time when linking with the appropriate linker. So perhaps I can flip the default for 1 and 2 ASAP, so they don't need to be called out here, but 3 should be mentioned.
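
For reference, the three options could be spelled out on the link line roughly as follows. This is a dry-run sketch that only prints the command. `-flto=thin` and `-fuse-ld=lld` are assumptions about the build setup rather than something taken from the doc, and the variable names are illustrative.

```shell
# Dry-run sketch: prints an LTO link command carrying the MemProf options listed above.
# -flto=thin and -fuse-ld=lld are assumed; they are not taken from the doc.
MEMPROF_LINK_FLAGS="-Wl,-mllvm,-enable-memprof-context-disambiguation"
MEMPROF_LINK_FLAGS="$MEMPROF_LINK_FLAGS -Wl,-mllvm,-optimize-hot-cold-new"
# Only meaningful when linking against a library that supports the hot/cold new interfaces.
MEMPROF_LINK_FLAGS="$MEMPROF_LINK_FLAGS -Wl,-mllvm,-supports-hot-cold-new"
LINK_CMD="clang++ -flto=thin -fuse-ld=lld -fmemory-profile-use=memprof.memprofdata -O2 source.cpp -o optimized_app -ltcmalloc $MEMPROF_LINK_FLAGS"
echo "$LINK_CMD"
```

If the defaults for the first two options are flipped later, only the last flag would still need to appear on the link line.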

* Reads the profile and annotates the IR with metadata.
* **Context Disambiguation:** ``llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp``
* Implements the analysis and transformations (e.g., cloning) for resolving ambiguous allocation contexts, particularly during ThinLTO.


Also mention SimplifyLibCalls.cpp for transforming the allocation calls?

* **Use Pass:** ``llvm/lib/Transforms/Instrumentation/MemProfUse.cpp``
* Reads the profile and annotates the IR with metadata.
* **Context Disambiguation:** ``llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp``
* Implements the analysis and transformations (e.g., cloning) for resolving ambiguous allocation contexts, particularly during ThinLTO.

I would change ", particularly during ThinLTO" to something like " using LTO".

* Location: ``llvm/test/Transforms/PGOProfile``
* Purpose: Verify the correctness of the ``MemProfUse`` pass, metadata annotation, and IR transformations.

4. **ThinLTO & Context Disambiguation Tests:**

Also llvm/test/Transforms/MemProfContextDisambiguation for regular LTO tests.

* Purpose: Verify the correctness of the ``MemProfUse`` pass, metadata annotation, and IR transformations.

4. **ThinLTO & Context Disambiguation Tests:**
* Location: ``llvm/test/ThinLTO/X86``

Maybe add "/memprof*", since there are a lot of ThinLTO tests in this directory.
