
feat: uv lock rule instead of genrule #2657

Open
wants to merge 16 commits into main from uv-lock-rule-instead-of-genrule
Conversation

Collaborator

@aignas aignas commented Mar 11, 2025

This change implements `uv pip compile` as a rule.

To make things easier to debug, we also provide
a runnable rule that takes the same arguments and updates
the output file in the source tree automatically.

The main design is to have a regular lock rule that
returns a custom provider containing all of the recipe
ingredients needed to construct an executable rule. Running
the debugging rule target depends on having bash, or bat
files on Windows.

There are integration tests that exercise the locker. However,
things that are untested:

  • Windows support: the current CI Windows runners do not support
    running the uv binary. Help from Windows users is needed.
  • Running the integration tests within RBE does not seem to work,
    but locking when using RBE still works - there is a native_test
    exercising this.
  • Keyring integration for pulling packages from private
    index servers; https://docs.astral.sh/uv/configuration/authentication/
    should be supported.

Work towards #1325
Work towards #1975
Related #2663

Collaborator Author

@aignas aignas left a comment

OK, after doing a self-review, I think I want to have:

  • A script for launching uv from bash that would be compatible with UNIX.
  • A script for launching uv from powershell that would be compatible with Windows (generative AI may help with initial translation here).
  • Rewrite the internals a little to drop the Python usage (or at least most of it).

@aignas aignas force-pushed the uv-lock-rule-instead-of-genrule branch from fcc17c3 to 7ddfd28 Compare March 13, 2025 13:52
@aignas aignas changed the title uv lock rule instead of genrule feat: uv lock rule instead of genrule Mar 13, 2025
@aignas aignas force-pushed the uv-lock-rule-instead-of-genrule branch from 928f1e2 to 2c29ae2 Compare March 13, 2025 15:23
@aignas aignas marked this pull request as ready for review March 13, 2025 15:23
@aignas aignas requested a review from rickeylev as a code owner March 13, 2025 15:23
@bazel-contrib bazel-contrib deleted a comment from aignas Mar 13, 2025
@rickeylev
Collaborator

The core of the implementation looks good to me (a rule that looks up toolchains to run a build action; lock_run uses the info that lock provides via a provider). I have a variety of smaller comments about some particulars, but gtg now, so I'll have to wait until, ah, probably the weekend sometime.

The maybe_out behavior is interesting/clever, I just noticed that and will take a closer look.

Collaborator

@rickeylev rickeylev left a comment

Got about halfway through.

Collaborator

@rickeylev rickeylev left a comment

Also, to clarify, this is the sort of interface I imagined:

  1. bazel build //:requirements generates a requirements file using a build action
  2. bazel run //:requirements.update generates a requirements file using a build action and copies it into the local client to update the requirements file
  3. bazel run //:requirements.run -- <args> more of a direct invocation; runs uv directly; this is for debugging or experimenting with settings without having to modify the BUILD file.

The reason for (2) to use the output of (1) is that that's where the magic of Bazel happens. By this I mean: build actions have isolated/deterministic/hermetic capabilities, can run remotely, are more amenable to having transitions applied, and are better about not producing different output per user machine.
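In BUILD-file terms, the three entry points above might look roughly like this (the load path, rule name, and attribute names here are illustrative sketches, not the final API):

```starlark
# Hypothetical usage; names are illustrative.
load("@rules_python//python/uv:lock.bzl", "lock")

lock(
    name = "requirements",       # (1) bazel build //:requirements
    srcs = ["pyproject.toml"],
    out = "requirements.txt",
)

# The rule would also define:
#   //:requirements.update  (2) bazel run target that copies the built
#                               lock file back into the source tree
#   //:requirements.run     (3) bazel run //:requirements.run -- <args>
#                               invokes uv directly, for debugging
```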

@aignas
Collaborator Author

aignas commented Mar 16, 2025

OK, addressed the comments, PTAL.

I see that Windows is failing, because something is not compatible with the CI Windows version. I wonder if this means we should just tell users that Windows may be unsupported?

@aignas aignas force-pushed the uv-lock-rule-instead-of-genrule branch 2 times, most recently from daab523 to c67cdcd Compare March 18, 2025 09:23
@aignas
Collaborator Author

aignas commented Mar 19, 2025

Open questions:

  • Currently, keyring and other authentication configuration needs to happen via the system, and this has not been fully tested. This might be relevant for #2663 (compile_pip_requirements does not use credential helper), where users need to be able to configure credential helpers for pulling packages from internal mirrors.
  • Should we use python_toolchain or a target pointing at the current Python interpreter? Passing //python:none could act as a way to indicate that Python should come from the system, i.e. python. This would mean that we could use //python/bin:interpreter, which could include dependencies used during locking (e.g. keyring and similar). Right now I am not sure how to get that wired through. Maybe we should have a few more tests that set up a mirror that needs the keyring dep to connect; otherwise it is hard not to break the behaviour. But I would like to merge this as is for now, because the PR is getting big and it is hard to keep track of everything.

@rickeylev
Collaborator

Sorry, some short-notice deadlines came up. I'll be able to have another look Wednesday evening or after.

@rickeylev
Collaborator

Should we use python_toolchain or a target to the current python interpreter?
Passing an interpreter means additional deps (keyring) can be captured

Since uv is coming from a toolchain, and we need python to match uv, I think both should come from a toolchain. A behavior unique to toolchains is that a group of toolchains can get resolved with the same config state. Getting the same behavior using labels, or a mix of a label and toolchain, might be tricky (it might be possible using exec groups?).

    return [
        DefaultInfo(
            executable = executable,
            runfiles = ctx.runfiles(transitive_files = info.srcs),
Collaborator

I think there's a subtle config mismatch here: info.srcs contains uv in exec config, but here it's going to run in target config.

I'm OK with ignoring this for now, though, to keep progress moving along.

Collaborator Author

I have added a transition, but I am still unsure how to ensure that this will be in the exec configuration.

I can follow this up with a separate PR.

Collaborator

The transition LGTM.

When a transition is applied to the rule itself, it decides what the "target" configuration is for the current target. It doesn't affect the exec config directly.

When toolchain resolution occurs, Bazel finds a toolchain that is compatible with the current target configuration. e.g. if python_version=3.12.1 is in the target configuration, then Bazel looks for a matching exec_tools toolchain with target_compatible_with=3.12. The e.g. exec_interpreter attribute will be in the exec config, but that's fine; the toolchain is claiming all its pieces are intended to produce output valid for 3.12.

HTH
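A minimal sketch of such a rule-level (incoming-edge) transition, following the mechanics described above. The flag label and the allowlist follow common Bazel conventions; the rule and attribute names are illustrative, not the PR's actual code:

```starlark
# Illustrative sketch only; not the PR's actual implementation.
def _python_version_impl(settings, attr):
    # Pin the *target* configuration's python_version for this target.
    return {"//python/config_settings:python_version": attr.python_version}

_python_version_transition = transition(
    implementation = _python_version_impl,
    inputs = [],
    outputs = ["//python/config_settings:python_version"],
)

lock = rule(
    implementation = _lock_impl,
    # An incoming-edge transition: it changes the target configuration,
    # and toolchain resolution then matches toolchains against it. Pieces
    # of the resolved toolchain sit in the exec config, which is fine.
    cfg = _python_version_transition,
    attrs = {
        "python_version": attr.string(),
        "_allowlist_function_transition": attr.label(
            default = "@bazel_tools//tools/allowlists/function_transition_allowlist",
        ),
    },
)
```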

    args.run_shell.add("--no-progress")
    args.run_shell.add("--quiet")

    ctx.actions.run_shell(
Collaborator

I'm having trouble understanding why this step needs the copy step as part of its execution.

Isn't this the same behavior?

srcs = list(ctx.attr.srcs)
if existing_file:
  srcs.append(existing_file)
output = declare_file(name + ".out")
ctx.actions.run([uv, "--output={output}"] + srcs, inputs=srcs, output=output)

uv is going to overwrite whatever --output specifies, right?

Collaborator Author

uv expects the previous output to be at the location defined by --output={output}. If you pass it as a source, you will get extra log lines referencing requirements-existing.txt, which is not what you want here.

Comment on lines +376 to +438
# FIXME @aignas 2025-03-17: should we have one more target that transitions
# the python_version to ensure that if somebody calls `bazel build
# :requirements` that it is locked with the right `python_version`?
Collaborator

A separate target, no. An attribute with rule-level cfg transition, yes.

This also enables tricks like this:

lock(python_version = "3.10", srcs = select({":py310": ["requirements_310.txt"], ...}))

Similarly, because an attr.output is not used, the outputs can be varied to e.g. include the python version (or platform, etc) into the output file name.
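Concretely, the trick described above could allow something like the following (the config_setting label and file names here are hypothetical):

```starlark
# Illustrative only: a rule-level transition pins python_version, and a
# select() picks matching inputs.
lock(
    name = "requirements_3_10",
    python_version = "3.10",
    srcs = select({
        "//python/config_settings:is_python_3.10": ["requirements_310.in"],
        "//conditions:default": ["requirements.in"],
    }),
    # Because no attr.output is used, the rule itself can name the output,
    # e.g. requirements_3_10.txt, embedding the version (or platform).
)
```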

Collaborator Author

Thanks for the nudge. I have added the transition to the lock rule and I think the overall design is now simpler.

I have also created an internal expand_template rule that makes the wiring of files easier. I have added the Python version to the file name; that might turn out to be a good or a bad decision, and we will see. :)

@aignas
Collaborator Author

aignas commented Mar 20, 2025

RBE is failing with an error:

bazel-out/k8-opt-exec-ST-150d2d5f4ddd/bin/python/private/python3: error while loading shared libraries: /b/f/w/bazel-out/k8-opt-exec-ST-150d2d5f4ddd/bin/python/private/../lib/libpython3.11.so.1.0: cannot open shared object file: No such file or directory

Not sure exactly why this is happening because I am passing interpreter.files_to_run to the action.

I think the documentation mentioning to use .files_to_run may be wrong.

EDIT: this guess was wrong. I am fiddling with the cfg for the uv_toolchain now. I think it should be exec instead of target.

EDIT2: the uv_toolchain has nothing to do with that, because the error is coming from python not finding the .so file, which to me suggests that the files are missing, but that should not be the case.

This change implements the uv pip compile as a rule.

In order to also make things easier to debug we provide
a runnable rule that has the same arguments and updates
the source tree output file automatically.

The main design is to have a regular lock rule and then
it returns a custom provider that has all of the recipe
ingredients to construct an executable rule. The execution
depends on having bash or PowerShell; however, the
PowerShell script is not yet complete and requires some
help from the community.

Work towards bazel-contrib#1975.

Address all of the comments
@aignas aignas force-pushed the uv-lock-rule-instead-of-genrule branch from f6a052f to 7a37d5d Compare March 22, 2025 12:34
@aignas
Collaborator Author

aignas commented Mar 22, 2025

It seems that the current_py_exec_toolchain had a bug that prevented me from using it in RBE. My analysis went as follows:

  1. Set up a minimal RBE on my machine as mentioned in Third party dependencies are incorrect when using RBE because host != exec #2241
  2. Add print statements to inspect what is in the sandbox:
    i. The python toolchain with all of the files was there
    ii. The current_py_exec_toolchain symlink was not a symlink and instead it was a copy.
  3. Given that the symlink was dereferenced, there are multiple ways to solve this:
    i. Use a dangling symlink, like the one pointing outside the sandbox
    that we use when a non-hermetic toolchain is selected.
    ii. Somehow symlink all of the directories so that the lib and other
    folders can be found.

I chose method 3.i because it only needs one extra symlink and does not
require exposing extra data through py_runtime. E.g. I would most likely
need to expose the contents of the bin folder, plus the lib, etc folders,
for everything to work properly.

Maybe the 3.ii solution could be also beneficial for creating #2156 and setting up
the venv inside the py_executable rules, but we can look at that later.

EDIT: it seems that I have to find yet another option, as 3.i
breaks the integration tests.

EDIT2: I wonder if I am hitting bazelbuild/bazel#23620

EDIT3: OK, so in the end the solution was to forward the runtime field from
the TARGET_TOOLCHAIN_TYPE to the EXEC_TOOLS_TOOLCHAIN_TYPE, which means that
we stop relying on the symlinks created by current_interpreter_executable. To
be honest, I am not sure it makes sense to keep it for anything other than
returning the toolchain; the interpreter symlink will not work properly
in RBE, and it is quite difficult to debug when that is the case.

@amaranthjinn

Would this feature fix the issue #2640?

@aignas
Collaborator Author

aignas commented Mar 25, 2025

Would this feature fix the issue #2640?

It would not, because the linked issue is about building wheels rather than locking them.

@rickeylev
Collaborator

python can't find its .so files when a "regular" symlink action is used ... bazel #23620

Yeah, I'm pretty sure that's what you're seeing. I haven't looked at the PR code yet, but what I recall was you can't simply ctx.actions.symlink(<output>, <underlying interpreter>) because Bazel RBE is prone to creating a copy instead of a symlink. It works locally because Python has a behavior where it will check if argv[0] is a symlink, and if so, realpath() to find the actual location of the python interpreter (and thus all the runtime's files). I think this is to support stuff like venvs, or creating a convenience symlink to the interpreter in one place while it's actually installed in another (e.g. /usr/lib/python3 is a symlink to /usr/lib/python3.10, or whatever).

Using declare_symlink, or a wrapper script, should work, though. I'll have a look at the PR now.

@rickeylev
Collaborator

Beh. Looking at current_interpreter_executable.bzl, I think changing L93 to use declare_symlink() should work?

I tried setting up a local RBE to test it, but couldn't get bazel and the RBE talking. I'll have to try again when I have more time

@aignas
Collaborator Author

aignas commented Mar 25, 2025

Beh. Looking at current_interpreter_executable.bzl, I think changing L93 to use declare_symlink() should work?

I tried setting up a local RBE to test it, but couldn't get bazel and the RBE talking. I'll have to try again when I have more time

If you use declare_symlink then you cannot use target_file; you need to use target_path. In my tests, only target_path worked. This is documented in https://bazel.build/rules/lib/builtins/actions#symlink.
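A minimal sketch contrasting the two flavours of ctx.actions.symlink discussed here (the `interpreter` attribute is a hypothetical label attribute with allow_single_file):

```starlark
# Sketch only: the two symlink flavours per the actions.symlink docs.
def _impl(ctx):
    # declare_file + target_file: RBE may materialize the "symlink" as a
    # copy, which breaks Python's realpath()-based install discovery.
    out_file = ctx.actions.declare_file(ctx.label.name)
    ctx.actions.symlink(output = out_file, target_file = ctx.file.interpreter)

    # declare_symlink + target_path: guaranteed to stay a symlink (possibly
    # dangling), but it only accepts a path string, never a File.
    out_link = ctx.actions.declare_symlink(ctx.label.name + "_link")
    ctx.actions.symlink(output = out_link, target_path = ctx.file.interpreter.path)

    return [DefaultInfo(files = depset([out_file, out_link]))]
```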

    ],
    # It seems that the CI remote executors for the RBE do not have network
    # connectivity. Is it only our setup or is it a property of RBE?
    tags = ["no-remote-exec"],
Collaborator Author

FYI this is where the RBE is disabled for our tests

@rickeylev
Collaborator

I got an RBE setup locally. Yeah, current_interpreter_executable (the thing that backs ExecTools.exec_interpreter) does indeed look entirely broken with RBE. Argh >.<

So...

  • An executable rule has to define its own output file. It can't forward on another file. Hence e.g. current_interpreter_executable has to call e.g. declare_file/symlink
  • If it uses declare_file(), it can't use symlink(), because RBE will create a copy, and Python can no longer traverse back to its actual install location.
  • If it uses declare_symlink, then symlink() has to write either a bin-relative path (the file that is a sibling of the runfiles directory), or a runfiles-relative path (the copy of the executable within the runfiles tree). The former allows e.g. ctx.actions.run(executable=...) to work; the latter allows e.g. ctx.actions.run_shell("<runfiles path to the executable>") to work.

The only way I can think of to make both work is, essentially, to make the output executable a wrapper. e.g. shell code that figures out how to locate the file it wants to run and execs it.

Or, maybe if py_runtime() is directly executable? i.e. sets executable=True, and returns DefaultInfo(executable=...), and the whole target gets forwarded on.

The reason I'm so keen on having an executable=True attribute (i.e. a thing with a FilesToRun provider) that is fed to ctx.actions.run() is that it is supposed to be the proper abstraction -- "run the interpreter, let its executable rule figure out the details". This is in comparison to e.g. having to directly use PyRuntime, then manually pass PyRuntime.files etc. to ctx.actions.run.

Well, its late now, so gotta log off.

@aignas
Collaborator Author

aignas commented Mar 25, 2025

Hmmm... since I have a question on RB slack about the requires-network, I can wait on this.

I kind of see where you are coming from - just running python should not be that hard and should not require py_runtime. I'll think about it as well; gotta go.

@rickeylev
Collaborator

I poked this some more and have some ideas, but want to explore them a bit more.

I don't want to block this PR on them, though. How about for now, revert the changes to the exec toolchain stuff. The uv rule can still get at the PyRuntime object via exec_interpreter: exec_interpreter[ToolchainInfo].py3_runtime.
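Inside the rule implementation, that access path might look roughly like this (the toolchain type label, module name, and action arguments are illustrative assumptions, not the final code):

```starlark
# Sketch: reach PyRuntime via the exec tools toolchain, per the suggestion
# above, instead of changing the toolchain internals.
def _lock_impl(ctx):
    exec_tools = ctx.toolchains["@rules_python//python:exec_tools_toolchain_type"]
    # exec_interpreter is a Target; index it with ToolchainInfo as suggested.
    runtime = exec_tools.exec_interpreter[platform_common.ToolchainInfo].py3_runtime

    out = ctx.actions.declare_file(ctx.label.name + ".txt")
    ctx.actions.run(
        executable = runtime.interpreter,
        tools = [runtime.files],  # the runtime's support files (lib/, etc.)
        arguments = ["-m", "some_locker", "--output", out.path],  # placeholder
        outputs = [out],
        mnemonic = "UvLock",
    )
    return [DefaultInfo(files = depset([out]))]
```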

Collaborator

@rickeylev rickeylev left a comment

Just remove the exec tools toolchain changes as I mentioned in the other comment, otherwise LGTM

-    return [SentinelInfo()]
+    return [
+        SentinelInfo(),
+        # Also output ToolchainInfo
Collaborator

Suggested change:
-    # Also output ToolchainInfo
+    # Also output ToolchainInfo to allow it to be used for no-op toolchains

Comment on lines +148 to +151
    progress_message = "Creating a requirements.txt with uv: //{}:{}".format(
        ctx.label.package,
        ctx.label.name,
    ),
Collaborator

nit: Use %{label} instead

Suggested change:
-    progress_message = "Creating a requirements.txt with uv: //{}:{}".format(
-        ctx.label.package,
-        ctx.label.name,
-    ),
+    progress_message = "Creating a requirements.txt with uv: %{label}",

Comment on lines +191 to +192
doc = """\
""",
Collaborator

nit: add doc or just omit the doc attribute


Comment on lines +38 to +39
# It seems that the CI remote executors for the RBE do not have network
# connectivity. Is it only our setup or is it a property of RBE?
Collaborator

It's not intrinsic to RBE, so it must be something with our RBE setup.
