Conversation

@dougbtv (Contributor) commented Jul 17, 2025

Main goal is in the context of CI, in order to not build wheels when unnecessary, and speed up CI builds overall.

  • added VLLM_DOCKER_BUILD_CONTEXT to keep precompiled wheel logic in setup.py but add parameterization for use during a docker build.
  • normalized VLLM_USE_PRECOMPILED so that only "1" or "true" is treated as true (previously it was more awkward to force it off in a CI context); see the sketch after this list
  • setup.py now copies the contextually-named precompiled wheel into dist/ during docker builds.
  • overall, a smoother precompiled wheel flow in docker
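
For illustration, a minimal sketch of the truthiness normalization described above (the helper name is illustrative, not the exact code in vllm's envs.py or setup.py):

import os

def env_flag_is_true(name: str) -> bool:
    # Treat only "1" or "true" (case-insensitive) as true; anything else is false.
    return os.environ.get(name, "").strip().lower() in ("1", "true")

# VLLM_USE_PRECOMPILED=0 (or unset) now cleanly disables the precompiled-wheel
# path, which is what a CI job needs in order to force a full wheel build.
VLLM_USE_PRECOMPILED = env_flag_is_true("VLLM_USE_PRECOMPILED")
VLLM_DOCKER_BUILD_CONTEXT = env_flag_is_true("VLLM_DOCKER_BUILD_CONTEXT")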

See also: vllm-project/ci-infra#125
In follow up of: #20943

Notably: setup.py would automatically fetch upstream main and rebase your work on top of it -- in the docker context, it always takes the remote main commitish and uses that. This does require that your work be rebased if it depends on upstream changes currently in main.
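
For illustration, one way the remote main commitish could be resolved without relying on the local checkout's state (a sketch of the approach, not the exact code in setup.py):

import subprocess

def resolve_upstream_main_commit(
        repo_url: str = "https://github.com/vllm-project/vllm.git") -> str:
    # `git ls-remote <url> refs/heads/main` prints "<sha>\trefs/heads/main",
    # so this works even when .git is missing or immutable in the build context.
    output = subprocess.check_output(
        ["git", "ls-remote", repo_url, "refs/heads/main"], text=True)
    return output.split()[0]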

Essential Elements of an Effective PR Description Checklist

  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Purpose

Allows pre-built wheels to be used in a docker build context, especially for CI build improvements where building wheels isn't necessary (currently, we're building wheels on every CI run).

Test Plan

I'd build like this:

time docker build --no-cache=true --progress plain --file docker/Dockerfile --build-arg max_jobs=16 --build-arg USE_SCCACHE=0 --build-arg USE_FLASHINFER_PREBUILT_WHEEL=true --build-arg VLLM_USE_PRECOMPILED=1 --tag dougbtv/vllm:precomp-nocache . > /tmp/doug.docker.precomp.log 2>&1

This gave build times of around 3m5.478s on my test system -- the bottleneck is now downloads from external repositories, such as apt and pip installs.

Back-of-the-napkin math (just from looking at a few of my own runs): a full build in Buildkite CI currently takes about 40 minutes.

Test Result

I'd then validate that it would run using:

docker run -it --rm --gpus device=4 -e VLLM_LOGGING_LEVEL=DEBUG -v /home/dougtest/network-share/vllm/tests:/workdir dougbtv/vllm:precomp-nocache-rebase

Currently runs.

As it stands, without the corresponding implementation in the ci-infra repo, builds should happen the same way they do today (i.e. with VLLM_USE_PRECOMPILED falsy).

(Optional) Documentation Update


👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run fastcheck CI, which runs a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added the ci/build label Jul 17, 2025
@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request introduces support for using precompiled wheels within Docker builds to accelerate CI. It adds a VLLM_DOCKER_BUILD_CONTEXT flag to alter setup.py behavior, normalizes the VLLM_USE_PRECOMPILED environment variable, and adds logic to copy the correct wheel into the dist/ directory.

The changes are logical and well-implemented to achieve the goal. However, I've identified a significant issue with hardcoded architecture tags (x86_64) in both setup.py and docker/Dockerfile. This will cause build failures on other architectures like arm64, which the Dockerfile appears to support. I've left comments with suggestions on how to make this logic platform-aware.

RUN if [ "$VLLM_USE_PRECOMPILED" = "1" ]; then \
echo "Cleaning up extra wheels in dist/..." && \
# Identify the most recent manylinux1_x86_64 wheel
KEEP_WHEEL=$(ls -t dist/*manylinux1_x86_64.whl 2>/dev/null | head -n1) && \
Contributor

Severity: high

The wheel filename pattern *manylinux1_x86_64.whl is hardcoded. This will not work for other architectures like arm64, for which there is build logic in this Dockerfile (using TARGETPLATFORM).

When using precompiled wheels on arm64, this step will fail to find the correct wheel to keep. If there are multiple wheels in dist/, it might not clean up correctly, potentially leading to the wrong wheel being installed in the final image.

This should be parameterized. You could use a shell variable set based on TARGETPLATFORM to specify the wheel pattern.

For example:

if [ "$TARGETPLATFORM" = "linux/arm64" ]; then
    WHEEL_PLATFORM_TAG="*manylinux2014_aarch64.whl"
else
    WHEEL_PLATFORM_TAG="*manylinux1_x86_64.whl"
fi
KEEP_WHEEL=$(ls -t dist/${WHEEL_PLATFORM_TAG} 2>/dev/null | head -n1)

This change is necessary to support multi-architecture builds with precompiled wheels.

Contributor Author

In the context of other hardcoding here, we're currently specifying:

https://wheels.vllm.ai/nightly/vllm-1.0.0.dev-cp38-abi3-manylinux1_x86_64.whl

elsewhere as a statically defined string. So, I think we can directly assume the arch.
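
In other words, with the arch statically assumed, the wheel filename can be derived directly from the version (a rough sketch; the values here are illustrative):

arch_tag = "cp38-abi3-manylinux1_x86_64"  # assumed, matching the nightly URL above
version = "1.0.0.dev"  # in practice, read from the precompiled wheel's metadata
wheel_filename = f"vllm-{version}-{arch_tag}.whl"
# -> "vllm-1.0.0.dev-cp38-abi3-manylinux1_x86_64.whl"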

version = version_line.split(": ")[1].strip()

# Build correct filename using internal version
arch_tag = "cp38-abi3-manylinux1_x86_64"
Contributor

Severity: high

The arch_tag is hardcoded to cp38-abi3-manylinux1_x86_64. This will cause issues when building for other architectures, such as arm64, which is handled in the Dockerfile via the TARGETPLATFORM build argument. This will lead to incorrect wheel names and build failures on non-x86_64 platforms when using precompiled wheels.

To make this more robust, you should determine the architecture tag dynamically. A good approach would be to set an environment variable in the Dockerfile based on TARGETPLATFORM and read it here.

For example, in your Dockerfile:

ARG TARGETPLATFORM
RUN if [ "$TARGETPLATFORM" = "linux/arm64" ]; then \
        export VLLM_ARCH_TAG="cp38-abi3-manylinux2014_aarch64" ; \
    else \
        export VLLM_ARCH_TAG="cp38-abi3-manylinux1_x86_64" ; \
    fi && \
    ...
    python3 setup.py bdist_wheel ...

Then in setup.py, you could read this environment variable. This would make the build process platform-aware.
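
A minimal sketch of what reading that variable in setup.py could look like (VLLM_ARCH_TAG is the hypothetical name from the Dockerfile example above; the x86_64 default mirrors the currently hardcoded value):

import os

# Fall back to the current hardcoded tag when the Dockerfile doesn't provide one.
arch_tag = os.environ.get("VLLM_ARCH_TAG", "cp38-abi3-manylinux1_x86_64")
version = "1.0.0.dev"  # placeholder; derived from the wheel metadata in the real flow
wheel_filename = f"vllm-{version}-{arch_tag}.whl"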

Contributor Author

+1 above.

@dougbtv dougbtv force-pushed the use-precompiled-truthiness branch from b34f6f6 to 2c095c4 Compare July 17, 2025 16:32
@simon-mo
Collaborator

Confirming that the release wheel-building workflow still works? (so the per-commit wheel is still published post-merge)

@dougbtv
Contributor Author

dougbtv commented Jul 17, 2025

Confirming that the release wheel-building workflow still works? (so the per-commit wheel is still published post-merge)

Thanks for asking @simon-mo -- that behavior shouldn't change yet with this PR. The gist is that this PR just enables using VLLM_USE_PRECOMPILED during docker builds but doesn't actually call the docker build yet. That change will (I believe) go into ci-infra (however, I haven't found the publish workflow for wheels.vllm.ai -- happy to take a pointer if you have it handy).

That being said, I still need to take another pass at the related ci-infra PR (#125) and ensure that we implement the pre-merge/post-merge logic we discussed (optionally run the wheel build pre-merge; ensure build and publish post-merge).

@dougbtv dougbtv force-pushed the use-precompiled-truthiness branch from 2c095c4 to 6488c11 Compare July 17, 2025 17:43
Main goal is in the context of CI, in order to not build wheels when unnecessary, and speed up CI builds overall.

- added VLLM_DOCKER_BUILD_CONTEXT to envs to skip git + unzip logic in setup.py
- normalized VLLM_USE_PRECOMPILED, treat only "1" or "true" as true
- setup.py now copies contextually-named precompiled wheel into dist/ during docker builds.
- smoother precompiled wheel flow, overall, in docker

Signed-off-by: dougbtv <[email protected]>
@dougbtv dougbtv force-pushed the use-precompiled-truthiness branch from 6488c11 to 3dcc491 Compare July 17, 2025 17:52
@dougbtv
Contributor Author

dougbtv commented Jul 18, 2025

Confirmed, this will not impact the behavior of uploading wheels for availability on wheels.vllm.ai -- the wheel upload is triggered here:

https://github.com/vllm-project/vllm/blob/main/.buildkite/release-pipeline.yaml#L2

The Dockerfile build there doesn't ask for VLLM_USE_PRECOMPILED, so the wheel build will still happen via setup.py when executed at docker build time.

That action in turn uploads the wheel at: https://github.com/vllm-project/vllm/blob/main/.buildkite/scripts/upload-wheels.sh#L77-L78

(Thanks Kevin Luu for the pointer, too)

@simon-mo (Collaborator) left a comment

There's a bug in finding the commit to use, but the other places LGTM.

Comment on lines +301 to +304
# In Docker build context, .git may be immutable or missing.
if envs.VLLM_DOCKER_BUILD_CONTEXT:
    return upstream_main_commit

Collaborator

This will be problematic as the main commit might not have the wheels ready (e.g. when it is just merged).

Collaborator

Hmm we still have nightly in that case.

            # Fallback to nightly wheel if latest commit wheel is unavailable,
            # in this rare case, the nightly release CI hasn't finished on main.
            if not is_url_available(wheel_location):
                wheel_location = "https://wheels.vllm.ai/nightly/vllm-1.0.0.dev-cp38-abi3-manylinux1_x86_64.whl"

Collaborator

I guess the same problem will arise if the PR merge base is not compatible with the latest main/nightly. We can address this as a follow-up.

@simon-mo simon-mo merged commit a1873db into vllm-project:main Jul 29, 2025
14 checks passed