[libc][cmake][linux] require new LLVM_LIBC_USE_HOST_KERNEL_HEADERS or LIBC_KERNEL_HEADERS #123820

nickdesaulniers · 2025-01-21T21:16:41Z

When cross compiling, we DO NOT want to use the host's kernel headers. Adding
all of /usr/include via -dirafter also isn't very hermetic since it can pull
in more than just kernel headers. But peeking at /usr/include simplifies
setting up the libc, since then the kernel headers don't have to be built from
source. (Building kernel headers from source is quite trivial though).

We already support setting -DLIBC_KERNEL_HEADERS=/path/to/kernel/headers. Add a
new boolean cmake flag, LLVM_LIBC_USE_HOST_KERNEL_HEADERS, which indicates that
the user is opting into just using the hosts kernel headers. Existing host
builds may break and need to set this.

Setting LIBC_KERNEL_HEADERS currently produces a few unit test failures that I
still need to debug, but ideally users targeting linux will always pass
LIBC_KERNEL_HEADERS in the future.

Add some checks to ensure that at least one of these flags are set, that
they're mutually exclusive, that the headers are required when cross compiling,
and that the headers path is a directory.

llvmbot · 2025-01-21T21:19:21Z

@llvm/pr-subscribers-github-workflow

@llvm/pr-subscribers-libc

Author: Nick Desaulniers (nickdesaulniers)

Changes

When cross compiling, we DO NOT want to use the host's kernel headers. Adding
all of /usr/include via -dirafter also isn't very hermetic since it can pull
in more than just kernel headers. But peeking at /usr/include simplifies
setting up the libc, since then the kernel headers don't have to be built from
source. (Building kernel headers from source is quite trivial though).

We already support setting -DLIBC_KERNEL_HEADERS=/path/to/kernel/headers. Add a
new boolean cmake flag, LLVM_LIBC_USE_HOST_KERNEL_HEADERS, which indicates that
the user is opting into just using the hosts kernel headers. Existing host
builds may break and need to set this.

Setting LIBC_KERNEL_HEADERS currently produces a few unit test failures that I
still need to debug, but ideally users targeting linux will always pass
LIBC_KERNEL_HEADERS in the future.

Add some checks to ensure that at least one of these flags are set, that
they're mutually exclusive, that the headers are required when cross compiling,
and that the headers path is a directory.

Full diff: https://github.com/llvm/llvm-project/pull/123820.diff

3 Files Affected:

(modified) libc/CMakeLists.txt (+5-1)
(modified) libc/cmake/modules/LLVMLibCArchitectures.cmake (+21)
(modified) libc/docs/getting_started.rst (+1)

diff --git a/libc/CMakeLists.txt b/libc/CMakeLists.txt
index 6f1c180a3f192e..11cb7ced4e6073 100644
--- a/libc/CMakeLists.txt
+++ b/libc/CMakeLists.txt
@@ -46,7 +46,11 @@ set(LIBC_BUILD_DIR ${CMAKE_CURRENT_BINARY_DIR})
 
 set(LIBC_ENABLE_USE_BY_CLANG OFF CACHE BOOL "Whether or not to place libc in a build directory findable by a just built clang")
 
-set(LIBC_KERNEL_HEADERS "/usr/include" CACHE STRING "Path to Linux kernel headers")
+# LLVM_LIBC_USE_HOST_KERNEL_HEADERS is brittle, and frowned upon (prefer
+# setting LIBC_KERNEL_HEADERS), but is simpler from a hello world linux host
+# build.
+set(LLVM_LIBC_USE_HOST_KERNEL_HEADERS OFF CACHE BOOL "Add /usr/include to the header seach path")
+set(LIBC_KERNEL_HEADERS "" CACHE STRING "Path to Linux kernel headers")
 
 # Defining a global namespace to enclose all libc functions.
 set(default_namespace "__llvm_libc")
diff --git a/libc/cmake/modules/LLVMLibCArchitectures.cmake b/libc/cmake/modules/LLVMLibCArchitectures.cmake
index fbb1091ddabab4..8cf4dfac70e498 100644
--- a/libc/cmake/modules/LLVMLibCArchitectures.cmake
+++ b/libc/cmake/modules/LLVMLibCArchitectures.cmake
@@ -216,6 +216,27 @@ if (LIBC_TARGET_OS_IS_WINDOWS AND LLVM_LIBC_FULL_BUILD)
   message(FATAL_ERROR "Windows does not support full mode build.")
 endif ()
 
+if (LIBC_TARGET_OS_IS_LINUX)
+  if (NOT LLVM_LIBC_USE_HOST_KERNEL_HEADERS AND NOT LIBC_KERNEL_HEADERS STREQUAL "")
+    message(FATAL_ERROR "MUST specify either LIBC_KERNEL_HEADERS or LLVM_LIBC_USE_HOST_KERNEL_HEADERS")
+  endif()
+  if (LLVM_LIBC_USE_HOST_KERNEL_HEADERS AND NOT LIBC_KERNEL_HEADERS STREQUAL "")
+    message(FATAL_ERROR "LLVM_LIBC_USE_HOST_KERNEL_HEADERS and LIBC_USE_HOST_KERNEL_HEADERS are mutually exclusive")
+  endif()
+  if(LIBC_TARGET_TRIPLE AND LLVM_LIBC_USE_HOST_KERNEL_HEADERS)
+    # This is because the syscall numbers are frequently different between
+    # different target architectures. Code may compile, but you'll get spooky
+    # runtime failures.
+    message(FATAL_ERROR "LLVM_LIBC_USE_HOST_KERNEL_HEADERS should not be set when using LIBC_TARGET_TRIPLE, set LIBC_KERNEL_HEADERS instead")
+  endif()
+  if (NOT LIBC_KERNEL_HEADERS STREQUAL "" AND NOT IS_DIRECTORY LIBC_KERNEL_HEADERS)
+    message(FATAL_ERROR "LIBC_KERNEL_HEADERS should be a path to an existing directory")
+  endif()
+  if (LLVM_LIBC_USE_HOST_KERNEL_HEADERS)
+    set(LIBC_KERNEL_HEADERS "/usr/include" CACHE STRING "Path to Linux kernel headers" FORCE)
+  endif()
+endif()
+
 
 message(STATUS
         "Building libc for ${LIBC_TARGET_ARCHITECTURE} on ${LIBC_TARGET_OS} with
diff --git a/libc/docs/getting_started.rst b/libc/docs/getting_started.rst
index 51c295f384a06f..1f62d94d3c3fd5 100644
--- a/libc/docs/getting_started.rst
+++ b/libc/docs/getting_started.rst
@@ -21,6 +21,7 @@ Install dependencies first:
     -DCMAKE_CXX_COMPILER=clang++ \
     -DCMAKE_C_COMPILER=clang \
     -DLLVM_LIBC_FULL_BUILD=ON \
+    -DLLVM_LIBC_USE_HOST_KERNEL_HEADERS=ON \
     -DLLVM_LIBC_INCLUDE_SCUDO=ON \
     -DCOMPILER_RT_BUILD_SCUDO_STANDALONE_WITH_LLVM_LIBC=ON \
     -DCOMPILER_RT_BUILD_GWP_ASAN=OFF \

nickdesaulniers · 2025-01-21T21:24:44Z

Alternatively, we could just always require folks to provide linux kernel headers. As I said, it's quite trivial to build them from source.

libc/CMakeLists.txt

nickdesaulniers · 2025-01-21T22:20:14Z

libc/CMakeLists.txt

+# setting LIBC_KERNEL_HEADERS), but is simpler from a hello world linux host
+# build.
+set(LLVM_LIBC_USE_HOST_KERNEL_HEADERS OFF CACHE BOOL "Add /usr/include to the header seach path")
+set(LIBC_KERNEL_HEADERS "" CACHE STRING "Path to Linux kernel headers")


oh, I wonder if we should be using option instead? I see that used elsewhere in our cmake. I'm not familiar with option...

There are two ways to define a boolean flag in CMake:

set(FOO true CACHE BOOL "description") option(FOO "description" ON)

The difference between the two is that the option command does not create a CACHE variable if a normal variable with the same name exists, but using set with CACHE variable named FOO when FOO exists as a normal variable will yield two copies of FOO, a normal variable and a CACHE one.

Which is one more appropriate really depends on your intention. There are places in LLVM build system where we intentionally rely on the set semantics (e.g. we would use the CACHE variable as the default value to initialize the normal variable).

Ok, so it sounds like cmake has distinct "namespace-esque" differentiation between "normal variables" and "cache variables."

For cmake variables that we encourage folks to use to configure llvm-libc, I think we may want option. That seems very common in libcxx/CMakeLists.txt.

michaelrj-google · 2025-01-22T00:46:37Z

libc/cmake/modules/LLVMLibCArchitectures.cmake

@@ -216,6 +216,27 @@ if (LIBC_TARGET_OS_IS_WINDOWS AND LLVM_LIBC_FULL_BUILD)
  message(FATAL_ERROR "Windows does not support full mode build.")
 endif ()

+if (LIBC_TARGET_OS_IS_LINUX)


Would it make sense to automatically set LLVM_LIBC_USE_HOST_KERNEL_HEADERS for overlay builds?

It's worthwhile to consider this. Let me jot down my thoughts.

Doing so would effectively change the implicit default value of LIBC_USE_HOST_KERNEL_HEADERS based on whether we were doing full build (off) vs overlay build (on).

Two invariants I'd like to maintain:

In the spirit of [libc][docs] reorder docs to be more beginner friendly #122376, I would like us to consider overlay builds the "more advanced" of the build configurations relative to full builds (due to the ABI compatibility concerns). So if we require folks doing overlay builds to add one more cmake flag, in this case -DLIBC_USE_HOST_KERNEL_HEADERS=ON, then I don't feel too bad.

We encourage folks targeting linux to just build the kernel headers; it's pretty straightforward. If they can't do this, then forcing them to admit to bad behavior via another cmake var is somewhat part of the intent of this PR. Perhaps that's being malevolent. Implying LIBC_USE_HOST_KERNEL_HEADERS for overlay builds removes the point that folks own up to looking at /usr/include. (Besides, it hurts our hermeticity; once we add /usr/include to the directory search path, we MAY end up including other non-kernel headers since kernel and non-kernel headers are all mixed together in /usr/include. This alone gives me heartburn).

Thinking through the case of wanting to do an overlay build, is there ever a case where we'd want to use newer kernel headers (or perhaps older kernel headers) than what was available on the host? A hypothetical use case for newer headers: you want to build on one host (that has old headers) but support newer syscalls. The result can only run on other machines with newer kernels (when newer syscalls are made at runtime). A hypothetical use case for older headers: you want to build on one host (that has newer headers), but run the libc on another target that is running an older kernel. Both of those hypotheticals seem to be a stretch in my mind. Perhaps overlay mode should (or does) imply "literally the result will only run on my host machine+kernel?"

An inspiration for LIBC_USE_HOST_KERNEL_HEADERS is -enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang which was a flag in clang that was necessary to use -ftrivial-auto-var-init= for some time. Basically, it was intended as a way to not outright block people from using -ftrivial-auto-var-init=, but make it such that there was one more hoop for them to jump through to use it.

In the same vein, I don't want to outright prevent people from looking at /usr/include for linux kernel headers (indeed, maybe on the buildbots we'll skip building kernel headers for each run), but I would like to gradually move us away from doing so.

By adding LIBC_USE_HOST_KERNEL_HEADERS, I think that strikes the right balance where folks (and build bots) can transition to LIBC_KERNEL_HEADERS over time, rather than forcing them to do so right now (and not providing any option otherwise).

So I guess, yes, it would make sense to automatically set it. But that betrays an intent in this PR to add one more hoop for folks who refuse to just build the kernel headers from source to sign on the X that they are potentially including more than just kernel headers when they add /usr/include to the compiler's header search path. Is that being too malicious?

My main concern is around default configurations. Currently, LLVM_LIBC_FULL_BUILD defaults to off, so overlay is the default. We should probably change that at some point, but that's a separate issue. The important thing is that our cmake shouldn't error if you don't specify any flags.

Currently, if you run cmake ../runtimes -G Ninja -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DLLVM_ENABLE_RUNTIMES="libc" you get an overlay mode build with no allocator. It's not a particularly useful configuration, but it will build and pass our tests. This is important for two reasons:

First impressions are important, a build failure looks bad, and lots of people don't read the docs.

A surprising number of people download the llvm-project repo, set the enabled projects/runtimes to "all", and run make install. They then file bugs when they run into a build error, even if it's for a project they didn't even plan to use. This is annoying and wastes both their time and ours, so I'd like to avoid it if possible.

The other issue is that for overlay mode we do actually need to look at the other libc's headers. All of our type/macro headers in libc/hdr/ use the system's headers in overlay mode, and a couple places (specifically in printf, assert, and a few other places iirc) even call functions from the system's libc. Overlay mode can't be hermetic, and I don't think we can practically make it independent of the system it's targeting.

The other issue is that for overlay mode we do actually need to look at the other libc's headers.

D'oh, you're right!

My main concern is around default configurations.

This is something I hadn't even considered. I can get behind wanting an optionless configure to build out of the box. Let me think more about this over lunch.

Currently, LLVM_LIBC_FULL_BUILD defaults to off, so overlay is the default. We should probably change that at some point, but that's a separate issue.

I agree. Filed #124973.

… LIBC_KERNEL_HEADERS When cross compiling, we DO NOT want to use the host's kernel headers. Adding all of /usr/include via `-dirafter` also isn't very hermetic since it can pull in more than just kernel headers. But peeking at /usr/include simplifies setting up the libc, since then the kernel headers don't have to be built from source. (Building kernel headers from source is quite trivial though). We already support setting -DLIBC_KERNEL_HEADERS=/path/to/kernel/headers. Add a new boolean cmake flag, LLVM_LIBC_USE_HOST_KERNEL_HEADERS, which indicates that the user is opting into just using the hosts kernel headers. Existing host builds may break and need to set this. Setting LIBC_KERNEL_HEADERS currently produces a few unit test failures that I still need to debug, but ideally users targeting linux will always pass LIBC_KERNEL_HEADERS in the future. Add some checks to ensure that at least one of these flags are set, that they're mutually exclusive, that the headers are required when cross compiling, and that the headers path is a directory.

lntue · 2025-01-29T19:16:17Z

precommit errors:

CMake Error at /home/runner/work/llvm-project/llvm-project/libc/cmake/modules/LLVMLibCArchitectures.cmake:221 (message):
  MUST specify either LIBC_KERNEL_HEADERS or LIBC_USE_HOST_KERNEL_HEADERS

michaelrj-google · 2025-01-29T19:05:53Z

libc/docs/full_host_build.rst

   $> make LLVM=1 INSTALL_HDR_PATH=/path/to/sysroot -C /tmp/linux headers_install

 The headers can be built to target non-host architectures by adding the
-``ARCH={arm|arm64|i386}`` to the above invocation of ``make``.
+``ARCH={arm|arm64|i386}`` to the above invocation of ``make``. Then you should
+set the cmake variable `-DLIBC_KERNEL_HEADERS=/path/to/sysroot` when


for consistency, would it make sense to use /path/to/kernel/headers/ instead of /path/to/sysroot here?

Ah, you're right about the inconsistency. I think I will resolve it the other way, since existing docs talk about setting

$> SYSROOT=/path/to/sysroot # Remember to set this!

above. So I will use /path/to/sysroot/ more consistently and get rid of /path/to/kernel/headers.

Done in db5b2f9.

michaelrj-google · 2025-01-29T19:18:56Z

libc/cmake/modules/LLVMLibCArchitectures.cmake

@@ -216,6 +216,27 @@ if (LIBC_TARGET_OS_IS_WINDOWS AND LLVM_LIBC_FULL_BUILD)
  message(FATAL_ERROR "Windows does not support full mode build.")
 endif ()

+if (LIBC_TARGET_OS_IS_LINUX)


My main concern is around default configurations. Currently, LLVM_LIBC_FULL_BUILD defaults to off, so overlay is the default. We should probably change that at some point, but that's a separate issue. The important thing is that our cmake shouldn't error if you don't specify any flags.

Currently, if you run cmake ../runtimes -G Ninja -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DLLVM_ENABLE_RUNTIMES="libc" you get an overlay mode build with no allocator. It's not a particularly useful configuration, but it will build and pass our tests. This is important for two reasons:

First impressions are important, a build failure looks bad, and lots of people don't read the docs.

A surprising number of people download the llvm-project repo, set the enabled projects/runtimes to "all", and run make install. They then file bugs when they run into a build error, even if it's for a project they didn't even plan to use. This is annoying and wastes both their time and ours, so I'd like to avoid it if possible.

The other issue is that for overlay mode we do actually need to look at the other libc's headers. All of our type/macro headers in libc/hdr/ use the system's headers in overlay mode, and a couple places (specifically in printf, assert, and a few other places iirc) even call functions from the system's libc. Overlay mode can't be hermetic, and I don't think we can practically make it independent of the system it's targeting.

nickdesaulniers · 2025-01-29T19:22:52Z

precommit errors:

CMake Error at /home/runner/work/llvm-project/llvm-project/libc/cmake/modules/LLVMLibCArchitectures.cmake:221 (message):
  MUST specify either LIBC_KERNEL_HEADERS or LIBC_USE_HOST_KERNEL_HEADERS

Ah! Forgot to drop the LLVM_ prefix in the .github changes!

nickdesaulniers · 2025-01-29T19:37:13Z

presubmit test failures

probably need to update .github/workflows/libc-overlay-tests.yml, too.

nickdesaulniers · 2025-01-29T20:51:09Z

Talked more with @michaelrj-google about this after lunch. Taking a step back, I think what I care more about is requiring the use of LIBC_KERNEL_HEADERS when LIBC_TARGET_TRIPLE is set. And then, perhaps warning when LLVM_LIBC_FULL_BUILD is set that LIBC_KERNEL_HEADERS should be used (but is not required).

nickdesaulniers requested a review from petrhosek January 21, 2025 21:16

llvmbot added the libc label Jan 21, 2025

nickdesaulniers requested review from SchrodingerZhu, jhuber6, lntue and michaelrj-google January 21, 2025 21:17

llvmbot added the github:workflow label Jan 21, 2025

lntue reviewed Jan 21, 2025

View reviewed changes

libc/CMakeLists.txt Outdated Show resolved Hide resolved

nickdesaulniers commented Jan 21, 2025

View reviewed changes

michaelrj-google reviewed Jan 22, 2025

View reviewed changes

nickdesaulniers added 5 commits January 29, 2025 10:42

set LLVM_LIBC_USE_HOST_KERNEL_HEADERS for presubmit tests

d918f85

drop LLVM_ prefix from option name

e2c259d

switch set BOOL to option

76e422c

moar docs

d4c49a6

nickdesaulniers force-pushed the kernel_headers2 branch from bcfb656 to d4c49a6 Compare January 29, 2025 18:58

nickdesaulniers requested review from michaelrj-google and lntue January 29, 2025 18:58

michaelrj-google reviewed Jan 29, 2025

View reviewed changes

fixup cmake var name and path

db5b2f9

nickdesaulniers mentioned this pull request Jan 29, 2025

[libc] switch implicit default mode from overlay to fullbuild #124973

Open

nickdesaulniers closed this Jan 29, 2025

[libc][cmake][linux] require new LLVM_LIBC_USE_HOST_KERNEL_HEADERS or LIBC_KERNEL_HEADERS #123820

[libc][cmake][linux] require new LLVM_LIBC_USE_HOST_KERNEL_HEADERS or LIBC_KERNEL_HEADERS #123820

Uh oh!

Conversation

nickdesaulniers commented Jan 21, 2025

Uh oh!

llvmbot commented Jan 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nickdesaulniers commented Jan 21, 2025

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

lntue commented Jan 29, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

nickdesaulniers commented Jan 29, 2025

Uh oh!

nickdesaulniers commented Jan 29, 2025

Uh oh!

nickdesaulniers commented Jan 29, 2025

Uh oh!

Uh oh!

llvmbot commented Jan 21, 2025 •

edited

Loading