@@ -361,18 +361,12 @@ triton profile -m llama-3.1-8b-instruct --service-kind openai --endpoint-type ch
361361
362362## Serving a HuggingFace LLM Model with LLM API
363363
364- > [ !NOTE]
365- > LLM API has not yet been integrated into the official triton server tensorrt_llm backend image yet.
366- > To start the LLM API functionality, the user will only
367-
368364The LLM API is a high-level Python API and designed for Tensorrt LLM workflows. It could
369365convert a LLM model in Hugging Face format into a Tensorrt LLM engine and serve the engine with a unified Python API without invoking different
370366engine build and converting scripts.
371367To use the LLM API with Triton CLI, import the model with ` --backend llmapi `
372368``` bash
373- export MODEL_NAME=" llama-3.1-8b-instruct"
374- export HF_ID=" meta-llama/Llama-3.1-8B-Instruct"
375- triton import -m $MODEL_NAME --source " hf:$HF_ID " --backend llmapi
369+ triton import -m " llama-3.1-8b-instruct" --backend llmapi
376370```
377371
378372Huggingface models will be downloaded at runtime when starting the LLM API engine if not found
@@ -383,6 +377,15 @@ startup time. tensorrt_llm>=0.18.0 is required.
383377#### Example
384378
385379``` bash
380+ docker run -ti \
381+ --gpus all \
382+ --network=host \
383+ --shm-size=1g --ulimit memlock=-1 \
384+ -v /tmp:/tmp \
385+ -v ${HOME} /models:/root/models \
386+ -v ${HOME} /.cache/huggingface:/root/.cache/huggingface \
387+ nvcr.io/nvidia/tritonserver:25.03-trtllm-python-py3
388+
386389# Install the Triton CLI
387390pip install git+https://github.com/triton-inference-server/triton_cli.git@main
388391
@@ -394,7 +397,7 @@ triton remove -m all
394397triton import -m llama-3.1-8b-instruct --backend llmapi
395398
396399# Start Triton pointing at the default model repository
397- triton start --frontend openai --mode docker
400+ triton start --frontend openai
398401
399402# Interact with model at http://localhost:9000
400403curl -s http://localhost:9000/v1/chat/completions -H ' Content-Type: application/json' -d ' {
0 commit comments