[codex] Avoid untrusted binary lookup paths #2111

Draft
rparolin wants to merge 1 commit into NVIDIA:main from rparolin:codex/fix-nvidia-binary-search-path

@rparolin
Collaborator

Summary

This PR updates cuda.pathfinder.find_nvidia_binary_utility() to search only the trusted binary directories it constructs internally instead of delegating to shutil.which().

It was created in response to a security scan that flagged a vulnerability: on Windows, shutil.which(..., path=...) may search the process's current working directory before the explicitly supplied trusted search path. That could allow binary planting if a caller executes the returned utility path.

Root Cause

On Windows, CPython shutil.which() can prepend the current working directory to executable resolution. Because find_nvidia_binary_utility() is a public API intended to return paths that callers execute, a malicious nvcc.exe or other supported utility in an attacker-controlled CWD could be returned before the real CUDA, Conda, or NVIDIA wheel binary.

Changes

  • Replace shutil.which() with an explicit loop over trusted directories.
  • Preserve Windows extension normalization and Unix executable-bit checks.
  • Return normalized absolute paths for discovered binaries.
  • Document that CWD and ambient PATH are not searched.
  • Add regression coverage proving a planted Windows CWD binary is ignored.
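
For illustration, the explicit loop described in the changes above might look roughly like the sketch below. This is not the code in the PR: the function name `find_binary_in_dirs`, its `windows` parameter, and the order in which extensions are tried are assumptions made for the example. Only the extension set (`.exe`, `.bat`, `.cmd`) and the `X_OK` check come from the documented behavior.

```python
import os
import sys


def find_binary_in_dirs(name, trusted_dirs, windows=None):
    """Hypothetical helper: return the absolute path of `name` found in the
    first trusted directory that contains it, or None.

    Only the directories passed in are searched; the current working
    directory and the ambient PATH are never consulted.
    """
    if windows is None:
        windows = sys.platform == "win32"

    if windows:
        # Assumed extension order; the real implementation may differ.
        exts = (".exe", ".bat", ".cmd")
        if name.lower().endswith(exts):
            candidates = [name]
        else:
            candidates = [name + ext for ext in exts]
    else:
        candidates = [name]

    for directory in trusted_dirs:
        # An empty or relative entry would resolve against the CWD, so skip it.
        if not directory or not os.path.isabs(directory):
            continue
        for candidate in candidates:
            full = os.path.join(directory, candidate)
            if not os.path.isfile(full):
                continue
            # On Unix-like systems, require the execute permission bit.
            if not windows and not os.access(full, os.X_OK):
                continue
            return os.path.normpath(os.path.abspath(full))
    return None
```

A regression test in this spirit would plant a same-named file in the CWD and check that the helper still returns only the copy under the trusted directory.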

Validation

  • pixi run pytest tests/test_find_nvidia_binaries.py
  • pixi run pytest tests/
  • git diff --check

@rparolin rparolin added this to the cuda.pathfinder next milestone on May 19, 2026
@rparolin rparolin added the bug and cuda.pathfinder labels on May 19, 2026
@copy-pr-bot
Contributor

copy-pr-bot Bot commented May 19, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

(``.exe``, ``.bat``, ``.cmd``). On Unix-like systems, executables
are identified by the ``X_OK`` (execute) permission bit.

The process current working directory and ``PATH`` are not searched.
Collaborator

I could understand the logic of not searching the current working directory for these binaries, but not searching PATH seems like a mistake. If someone did a userspace installation of the toolkit in a non-standard way, inserting something like nvcc on the PATH would be the correct behavior and expected way for us to find it.

Additionally, given that we already allow environment-variable-based overrides, searching PATH doesn't open up any attack surface that isn't already exposed.

Collaborator Author

The spirit of the warning was to prevent searching any directory that wasn't explicitly programmed into pathfinder. So if we agree that we want to prevent that, then it follows that we should also prevent solutions that use PATH, because it's not a formalized path that was baked into the tool.

@rwgk
Contributor

rwgk commented May 19, 2026

I asked Cursor GPT-5.5 1M Extra High for an analysis and recommendation, based on enumerating pros-and-cons of shutil.which vs a custom function (in the spirit of _find_binary_in_dirs), and considering that we have the CTK-root canary feature in our toolbox already. My prompt ended with:

Could you please analyze carefully, to make a strong argument for the

  • safest implementation
  • that does not inconvenience common use cases?

Below is what it gave me. The recommendation is exactly what I was hoping for :-)


PR 2111 is really about making find_nvidia_binary_utility behave like a deterministic pathfinder API, not like a shell command lookup. The safest direction is to search only known NVIDIA installation layouts, add the existing CTK-root canary fallback, and avoid implicit PATH lookup.

shutil.which vs. a Custom Directory Finder

shutil.which is a good fit when the intended behavior is "find what the shell would run." That is not quite the contract we want for find_nvidia_binary_utility.

Pros of shutil.which:

  • It is stdlib, well-tested, and familiar.
  • It handles platform executable conventions, especially Windows PATHEXT.
  • It already handles existence and executable checks.
  • It de-duplicates searched directories.
  • It is concise if shell-like discovery is the desired behavior.

Cons of shutil.which:

  • It is not a pure "search exactly these directories" primitive.
  • On Windows, Python 3.12 may prepend the current directory even when a custom path= is supplied.
  • It carries ambient OS/Python policy such as PATHEXT, current-directory behavior, PATH fallback when path=None, and version-specific details.
  • Empty or relative path segments can become current-directory-sensitive.
  • It returns the matched spelling/path, not necessarily an absolute normalized path.
  • It makes the security story harder to state: "we searched only trusted roots" is no longer obviously true.

A custom _find_binary_in_dirs has more code and needs tests, but it gives us the policy surface we actually want. We can decide exactly which directories are searched, skip empty or relative directories if desired, normalize results, and define Windows extension handling explicitly.
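
As a rough sketch of the "policy surface" point, a pre-filter for the search list could drop empty and relative entries, normalize what remains, and de-duplicate while keeping order. The name `sanitize_search_dirs` is hypothetical and is not taken from pathfinder; whether the real finder should skip or reject such entries is a design decision this example does not settle.

```python
import os


def sanitize_search_dirs(dirs):
    """Hypothetical pre-filter for a bounded finder.

    Keeps only absolute directories, normalizes them, and removes
    duplicates while preserving the original search order.
    """
    seen = set()
    result = []
    for d in dirs:
        # Empty and relative entries would be resolved against the CWD.
        if not d or not os.path.isabs(d):
            continue
        norm = os.path.normpath(d)
        key = os.path.normcase(norm)  # case-insensitive comparison on Windows
        if key in seen:
            continue
        seen.add(key)
        result.append(norm)
    return result
```

Running the finder only over the sanitized list makes the claim "we searched only trusted roots" straightforward to check in a test.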

CTK Root Canary And PATH

The CTK-root canary feature is a better fit than PATH for this API. It answers: "is there a system-visible CUDA Toolkit root, and can we derive its root from a known CTK library layout?" That keeps binary lookup tied to an actual CTK installation layout rather than to an arbitrary executable name.

The safe and common search order should be:

  1. NVIDIA Python wheel bin directories from site-packages
  2. CONDA_PREFIX bin layout
  3. Explicit CUDA_PATH / CUDA_HOME
  4. CTK root derived from the canary, then CTK bin layouts
  5. Stop

I would not add implicit PATH search after that. PATH can find wrappers, stale tools, user-writable directories, or a different CUDA version than the canary-resolved libraries. That creates the exact class of "it worked, but with the wrong thing" behavior that pathfinder should avoid.
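
The search order listed above can be sketched as a function that assembles the candidate bin directories. Everything here is an assumption made for illustration: the function name, its parameters, the precedence of CUDA_PATH over CUDA_HOME, and the `bin` and `Library\bin` sub-directory layouts. The wheel directories and the canary-derived root are passed in rather than discovered, because how pathfinder finds them is not shown in this thread.

```python
import os


def ordered_trusted_bin_dirs(wheel_bin_dirs, env, canary_ctk_root=None, windows=False):
    """Hypothetical sketch of the proposed search order.

    `env` is a mapping such as os.environ. PATH is deliberately never read.
    """
    # 1. NVIDIA Python wheel bin directories (supplied by the caller here).
    dirs = list(wheel_bin_dirs)

    # 2. CONDA_PREFIX bin layout (assumed: Library\bin on Windows, bin elsewhere).
    conda_prefix = env.get("CONDA_PREFIX")
    if conda_prefix:
        sub = ("Library", "bin") if windows else ("bin",)
        dirs.append(os.path.join(conda_prefix, *sub))

    # 3. Explicit CUDA_PATH / CUDA_HOME (the precedence is an assumption).
    for var in ("CUDA_PATH", "CUDA_HOME"):
        root = env.get(var)
        if root:
            dirs.append(os.path.join(root, "bin"))

    # 4. CTK root derived from the canary, if one was resolved.
    if canary_ctk_root:
        dirs.append(os.path.join(canary_ctk_root, "bin"))

    # 5. Stop: nothing else is added.
    return dirs
```

Because the function receives the environment as a plain mapping, a test can pass a dictionary containing a hostile PATH entry and confirm it never appears in the result.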

There are real nonstandard cases outside CTK, conda, and site-packages: nvcc wrappers, ccache-style wrappers, distro symlinks in /usr/bin, HPC module wrappers, and local site policy scripts. Those are shell/toolchain policy cases, not deterministic NVIDIA binary discovery cases. If we support them, it should be explicit, for example through a future opt-in parameter or dedicated environment variable, not default PATH.

Recommendation

Use a custom bounded finder, add CTK-root canary fallback, and do not implicitly search PATH.

That implementation is safest because it avoids current-directory and ambient PATH surprises. It also does not inconvenience common supported installs: wheels, conda environments, explicit toolkit roots, and system-visible toolkit roots are all covered. Users who only rely on PATH may need to set CUDA_PATH, but that is a reasonable explicitness cost for avoiding untrusted or mismatched binaries.
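
For users who rely only on PATH, one way to pay that explicitness cost is to derive the toolkit root themselves and export it as CUDA_PATH before using pathfinder. The sketch below is a user-side illustration, not part of pathfinder: `ctk_root_from_path` is a hypothetical name, and it assumes the utility lives in a `bin` directory directly under the toolkit root.

```python
import os
import shutil


def ctk_root_from_path(utility="nvcc", search_path=None):
    """Hypothetical user-side helper: locate `utility` via a PATH-style
    lookup and return the directory above its `bin` directory, or None.

    This is an explicit, opt-in PATH lookup performed by the caller.
    """
    found = shutil.which(utility, path=search_path)
    if found is None:
        return None
    # Resolve symlinks so wrapper links point at the real installation.
    real = os.path.realpath(found)
    bin_dir = os.path.dirname(real)
    if os.path.basename(bin_dir) != "bin":
        # Layout does not match the assumed <root>/bin structure.
        return None
    return os.path.dirname(bin_dir)
```

A caller could then set `os.environ["CUDA_PATH"]` to the returned root after checking that it points at the installation they expect.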

@kkraus14
Collaborator

Part of the Linux installation guide for CUDA indicates that PATH should be set to include the bin directory so that binary executables can be found: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/#environment-setup

If someone follows this installation guide and sets LD_LIBRARY_PATH for libraries and PATH for executables, we'd end up in a situation where they need to manually intervene in order for pathfinder to find the extent of their CTK installation properly.

@leofang
Member

leofang commented May 20, 2026

It would be helpful if every PR had a corresponding issue created first, so that we can triage the issue and confirm that work needs to happen. I am clueless about what we're trying to solve here... By my standard I would just close this PR right away 😛

@leofang
Member

leofang commented May 20, 2026

I read the security report that Keith shared offline. The way I look at it, this is not even a security issue. This PR actually ensures we honor the pathfinder contract better.

Assume a pathological case where my CWD is %CONDA_PREFIX%\Library\bin for whatever reason, and I have NVCC there, but I also have NVCC installed from a wheel. Following the pathfinder contract, I would expect to get the latter NVCC. But because shutil.which prepends CWD (TIL!), I would actually get the former.
