[Doc] RayServe Single-Host TPU v6e Example with vLLM #47814
Conversation
Signed-off-by: Ryan O'Leary <[email protected]>
cc: @andrewsykim
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.
I went through and updated this guide for the newer TPU Trillium (v6e) so that it'll be more useful. It no longer requires multi-host TPUs, since a single v6e node can fit Llama 70B. I'll create a follow-up PR with a separate guide showcasing serving Llama 405B on multi-host TPUs. A rough sketch of the single-host setup is below. cc: @andrewsykim @kevin85421
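For readers skimming the thread, here is a minimal sketch of the kind of deployment the updated single-host guide describes: a Ray Serve deployment wrapping vLLM on one TPU v6e host. The model name, chip count (v6e-8), resource label, and parallelism values below are illustrative assumptions, not values taken from this PR or the linked sample code.

```python
# Hypothetical sketch of a single-host TPU v6e Ray Serve + vLLM deployment.
# Model, chip count, and route are assumptions for illustration only.
from fastapi import FastAPI
from ray import serve
from vllm import LLM, SamplingParams

app = FastAPI()


@serve.deployment(
    # Reserve all 8 chips of an assumed v6e-8 host for this replica.
    ray_actor_options={"resources": {"TPU": 8}},
)
@serve.ingress(app)
class VLLMDeployment:
    def __init__(self):
        # vLLM shards the model across the chips via tensor parallelism.
        self.llm = LLM(
            model="meta-llama/Llama-3.1-70B-Instruct",  # assumed model
            tensor_parallel_size=8,
        )

    @app.post("/generate")
    def generate(self, prompt: str) -> str:
        params = SamplingParams(max_tokens=256, temperature=0.7)
        outputs = self.llm.generate([prompt], params)
        return outputs[0].outputs[0].text


# Bound application that `serve run` or a RayService can deploy.
llm_app = VLLMDeployment.bind()
```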
Why are these changes needed?
This PR adds a new guide to the Ray docs detailing how to serve an LLM with vLLM and single-host TPUs on GKE. The guide has been tested by running through its steps and verifying correct output. It uses sample code from https://github.com/GoogleCloudPlatform/kubernetes-engine-samples/tree/main/ai-ml/gke-ray/rayserve/llm/tpu.
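As a rough illustration of how a deployment like the one in the guide would be exercised, a Python client call might look like the following. The address, port, route, and prompt-passing style are assumptions, not details taken from the guide; substitute the Serve endpoint exposed by your RayService on GKE.

```python
# Hypothetical client request against the Serve endpoint; adjust host/route.
import requests

resp = requests.post(
    "http://localhost:8000/generate",
    params={"prompt": "What are the benefits of TPU v6e for LLM inference?"},
    timeout=120,
)
resp.raise_for_status()
print(resp.text)
```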
Related issue number
Checks
- I've signed off every commit (by using the `-s` flag, i.e., `git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- If I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.