[Doc] RayServe Single-Host TPU v6e Example with vLLM #47814
Conversation
Signed-off-by: Ryan O'Leary <[email protected]>
cc: @andrewsykim
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.
I went through and updated this guide for the newer TPU Trillium (v6e) so that it'll be more useful. It no longer requires multi-host TPUs, since a single v6e node can fit Llama 70B. I'll create a follow-up PR with a separate guide showcasing serving Llama 405B on multi-host TPUs. A rough sketch of the single-host setup is below. cc: @andrewsykim @kevin85421
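For readers skimming the thread, here is a minimal sketch of the kind of deployment the updated single-host guide describes: a Ray Serve deployment wrapping vLLM on one TPU v6e host. The model name, chip count (v6e-8), resource label, and parallelism values below are illustrative assumptions, not values taken from this PR or the linked sample code.

```python
# Hypothetical sketch of a single-host TPU v6e Ray Serve + vLLM deployment.
# Model, chip count, and route are assumptions for illustration only.
from fastapi import FastAPI
from ray import serve
from vllm import LLM, SamplingParams

app = FastAPI()


@serve.deployment(
    # Reserve all 8 chips of an assumed v6e-8 host for this replica.
    ray_actor_options={"resources": {"TPU": 8}},
)
@serve.ingress(app)
class VLLMDeployment:
    def __init__(self):
        # vLLM shards the model across the chips via tensor parallelism.
        self.llm = LLM(
            model="meta-llama/Llama-3.1-70B-Instruct",  # assumed model
            tensor_parallel_size=8,
        )

    @app.post("/generate")
    def generate(self, prompt: str) -> str:
        params = SamplingParams(max_tokens=256, temperature=0.7)
        outputs = self.llm.generate([prompt], params)
        return outputs[0].outputs[0].text


# Bound application that `serve run` or a RayService can deploy.
llm_app = VLLMDeployment.bind()
```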
Why are these changes needed?
This PR adds a new guide to the Ray docs detailing how to serve an LLM with vLLM and single-host TPUs on GKE. The guide has been tested by running through its steps and verifying correct output. It uses sample code from https://github.com/GoogleCloudPlatform/kubernetes-engine-samples/tree/main/ai-ml/gke-ray/rayserve/llm/tpu.
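As a rough illustration of how a deployment like the one in the guide would be exercised, a Python client call might look like the following. The address, port, route, and prompt-passing style are assumptions, not details taken from the guide; substitute the Serve endpoint exposed by your RayService on GKE.

```python
# Hypothetical client request against the Serve endpoint; adjust host/route.
import requests

resp = requests.post(
    "http://localhost:8000/generate",
    params={"prompt": "What are the benefits of TPU v6e for LLM inference?"},
    timeout=120,
)
resp.raise_for_status()
print(resp.text)
```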
Related issue number
Checks
- I've signed off every commit (by using the `-s` flag, i.e., `git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- If I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.