[Frontend] Adding the "User Defined Custom Tool Calling" parser for the Llama models #12752

Open

wants to merge 1,243 commits into base: main from llama_usr_defined_toolcall_template

Changes from all commits (1,243 commits)
c116565
[V1][Sampler] Faster top-k only implementation (#15478)
njhill Mar 26, 2025
0e2d515
Support SHA256 as hash function in prefix caching (#15297)
dr75 Mar 26, 2025
63eb14c
Applying some fixes for K8s agents in CI (#15493)
Alexei-V-Ivanov-AMD Mar 26, 2025
63cd4fe
[V1] TPU - Revert to exponential padding by default (#15565)
alexm-redhat Mar 26, 2025
930df2b
[V1] TPU CI - Fix test_compilation.py (#15570)
alexm-redhat Mar 26, 2025
0a9f2e4
Use Cache Hinting for fused_moe kernel (#15511)
wrmedford Mar 26, 2025
32bbe1d
[TPU] support disabling xla compilation cache (#15567)
yaochengji Mar 27, 2025
cc52a9d
Support FIPS enabled machines with MD5 hashing (#15299)
MattTheCuber Mar 27, 2025
d503e7d
[Kernel] CUTLASS grouped gemm fp8 MoE kernel (#13972)
ElizaWszola Mar 27, 2025
85249b4
Add automatic tpu label to mergify.yml (#15560)
mgoin Mar 27, 2025
e620406
add platform check back (#15578)
Chenyaaang Mar 27, 2025
ca15761
[misc] LoRA: Remove unused long context test data (#15558)
varun-sundar-rabindranath Mar 27, 2025
ac4d911
[Doc] Update V1 user guide for fp8 kv cache support (#15585)
wayzeng Mar 27, 2025
bf7b9ac
[moe][quant] add weight name case for offset (#15515)
MengqingCao Mar 27, 2025
8d977af
[V1] Refactor num_computed_tokens logic (#15307)
comaniac Mar 27, 2025
ea76699
Allow torchao quantization in SiglipMLP (#15575)
jerryzh168 Mar 27, 2025
1675513
[ROCm] Env variable to trigger custom PA (#15557)
gshtras Mar 27, 2025
adadb10
[TPU] [V1] fix cases when max_num_reqs is set smaller than MIN_NUM_SE…
yaochengji Mar 27, 2025
5d17ed8
[Misc] Restrict ray version dependency and update PP feature warning …
ruisearch42 Mar 27, 2025
5bf0806
[TPU] Avoid Triton Import (#15589)
robertgshaw2-redhat Mar 27, 2025
5d648f7
[Misc] Consolidate LRUCache implementations (#15481)
Avabowler Mar 27, 2025
52426d5
[Quantization] Fp8 Channelwise Dynamic Per Token GroupedGEMM (#15587)
robertgshaw2-redhat Mar 27, 2025
8b8e4cc
[Misc] Clean up `scatter_patch_features` (#15559)
DarkLight1337 Mar 27, 2025
b24abe0
[Misc] Use model_redirect to redirect the model name to a local folde…
noooop Mar 27, 2025
dc53ae9
Fix incorrect filenames in vllm_compile_cache.py (#15494)
zou3519 Mar 27, 2025
93f27b2
[Doc] update --system for transformers installation in docker doc (#1…
reidliu41 Mar 27, 2025
6859fad
[Model] MiniCPM-V/O supports V1 (#15487)
DarkLight1337 Mar 27, 2025
5fb8471
[Bugfix] Fix use_cascade_attention handling for Alibi-based models on…
h-sugi Mar 27, 2025
4dfda0a
[Doc] Link to onboarding tasks (#15629)
DarkLight1337 Mar 27, 2025
7a2867e
[Misc] Replace `is_encoder_decoder_inputs` with `split_enc_dec_inputs…
DarkLight1337 Mar 27, 2025
44e203f
[Feature] Add middleware to log API Server responses (#15593)
terrytangyuan Mar 27, 2025
ab95221
[Misc] Avoid direct access of global `mm_registry` in `compute_encode…
DarkLight1337 Mar 27, 2025
abf7481
Use absolute placement for Ask AI button (#15628)
hmellor Mar 27, 2025
873f81c
[Bugfix][TPU][V1] Fix recompilation (#15553)
NickLucche Mar 27, 2025
0403684
Correct PowerPC to modern IBM Power (#15635)
clnperez Mar 27, 2025
2ede7da
[CI] Update rules for applying `tpu` label. (#15634)
russellb Mar 27, 2025
2f4bbe3
[V1] AsyncLLM data parallel (#13923)
njhill Mar 27, 2025
2a48d90
[TPU] Lazy Import (#15656)
robertgshaw2-redhat Mar 28, 2025
58aac98
[Quantization][V1] BitsAndBytes support V1 (#15611)
jeejeelee Mar 28, 2025
3c3b7fd
[Bugfix] Fix failure to launch in Tensor Parallel TP mode on macOS. (…
kebe7jun Mar 28, 2025
bc51b9f
[Doc] Fix dead links in Job Board (#15637)
wwl2755 Mar 28, 2025
2e438c8
[CI][TPU] Temporarily Disable Quant Test on TPU (#15649)
robertgshaw2-redhat Mar 28, 2025
4771a27
Revert "Use Cache Hinting for fused_moe kernel (#15511)" (#15645)
wrmedford Mar 28, 2025
f52842f
[Misc]add coding benchmark for speculative decoding (#15303)
CXIAAAAA Mar 28, 2025
aff825a
[Quantization][FP8] Adding support for fp8 gemm layer input in fp8 (#…
gshtras Mar 28, 2025
6596ab9
Refactor error handling for multiple exceptions in preprocessing (#15…
JasonZhu1313 Mar 28, 2025
164af0f
[Bugfix] Fix `mm_hashes` forgetting to be passed (#15668)
DarkLight1337 Mar 28, 2025
4cd614b
[V1] Remove legacy input registry (#15673)
DarkLight1337 Mar 28, 2025
9ebaf77
[TPU][CI] Fix TPUModelRunner Test (#15667)
robertgshaw2-redhat Mar 28, 2025
7ff23c2
[Refactor][Frontend] Keep all logic about reasoning into one class (#…
gaocegege Mar 28, 2025
84b8a2c
[CPU][CI] Improve CPU Dockerfile (#15690)
bigPYJ1151 Mar 28, 2025
f6bee17
[Bugfix] Fix 'InductorAdaptor object has no attribute 'cache_dir' (#1…
jeejeelee Mar 28, 2025
6dce0da
[Misc] Fix test_sleep to use query parameters (#14373)
lizzzcai Mar 28, 2025
4960fe3
[Bugfix][Frontend] Eliminate regex based check in reasoning full gene…
gaocegege Mar 28, 2025
c33b13f
[Frontend] update priority for --api-key and VLLM_API_KEY (#15588)
reidliu41 Mar 28, 2025
0bead01
[Docs] Add "Generation quality changed" section to troubleshooting (#…
hmellor Mar 28, 2025
163c0e0
[Model] Adding torch compile annotations to chatglm (#15624)
jeejeelee Mar 28, 2025
a429138
[Bugfix][v1] xgrammar structured output supports Enum. (#15594)
chaunceyjiang Mar 28, 2025
389e8cc
[Bugfix] `embed_is_patch` for Idefics3 (#15696)
DarkLight1337 Mar 28, 2025
ec75d81
[V1] Support disable_any_whtespace for guidance backend (#15584)
russellb Mar 28, 2025
87ab13f
[doc] add missing imports (#15699)
reidliu41 Mar 28, 2025
e417f82
[Bugfix] Fix regex compile display format (#15368)
kebe7jun Mar 28, 2025
35d942b
Fix cpu offload testing for gptq/awq/ct (#15648)
mgoin Mar 28, 2025
3f0c76c
[Minor] Remove TGI launching script (#15646)
WoosukKwon Mar 28, 2025
404e7b7
[Misc] Remove unused utils and clean up imports (#15708)
DarkLight1337 Mar 28, 2025
6fc07da
[Misc] Remove stale func in KVTransferConfig (#14746)
ShangmingCai Mar 28, 2025
f43ac32
[TPU] [Perf] Improve Memory Usage Estimation (#15671)
robertgshaw2-redhat Mar 28, 2025
b924c5e
[Bugfix] [torch.compile] Add Dynamo metrics context during compilatio…
ProExpertProg Mar 28, 2025
355aec5
[V1] TPU - Fix the chunked prompt bug (#15713)
alexm-redhat Mar 28, 2025
44a2d4d
[Misc] cli auto show default value (#15582)
reidliu41 Mar 28, 2025
a31a106
implement prometheus fast-api-instrumentor for http service metrics (…
daniel-salib Mar 29, 2025
2857da3
[Docs][V1] Optimize diagrams in prefix caching design (#15716)
simpx Mar 29, 2025
26b79cb
[ROCm][AMD][Build] Update AMD supported arch list (#15632)
gshtras Mar 29, 2025
072be34
[Model] Support Skywork-R1V (#15397)
pengyuange Mar 29, 2025
3423bc2
[Docs] Document v0 engine support in reasoning outputs (#15739)
gaocegege Mar 29, 2025
4ca67b9
[Misc][V1] Misc code streamlining (#15723)
njhill Mar 29, 2025
660c709
[Bugfix] LoRA V1: add and fix entrypoints tests (#15715)
varun-sundar-rabindranath Mar 29, 2025
eb95f62
[CI] Speed up V1 structured output tests (#15718)
russellb Mar 29, 2025
c648c16
Use numba 0.61 for python 3.10+ to support numpy>=2 (#15692)
cyyever Mar 29, 2025
aa4af0e
[Bugfix] set VLLM_WORKER_MULTIPROC_METHOD=spawn for vllm.entrypoionts…
jinzhen-lin Mar 29, 2025
d65ddc2
[TPU][V1][Bugfix] Fix w8a8 recompiilation with GSM8K (#15714)
NickLucche Mar 29, 2025
c98e52c
[Kernel][TPU][ragged-paged-attn] vLLM code change for PR#8896 (#15659)
yarongmu-google Mar 29, 2025
67e4b4c
[doc] update doc (#15740)
reidliu41 Mar 29, 2025
5860555
[FEAT] [ROCm] Add AITER int8 scaled gemm kernel (#15433)
tjtanaa Mar 29, 2025
039e288
[V1] [Feature] Collective RPC (#15444)
wwl2755 Mar 29, 2025
cd16bf3
[Feature][Disaggregated] Support XpYd disaggregated prefill with Moon…
ShangmingCai Mar 29, 2025
4051628
[V1] Support interleaved modality items (#15605)
ywang96 Mar 29, 2025
0e2175f
[V1][Minor] Simplify rejection sampler's parse_output (#15741)
WoosukKwon Mar 29, 2025
8e99bff
[Bugfix] Fix Mllama interleaved images input support (#15564)
Isotr0py Mar 29, 2025
2d849fc
[CI] xgrammar structured output supports Enum. (#15757)
chaunceyjiang Mar 30, 2025
9366565
[Bugfix] Fix Mistral guided generation using xgrammar (#15704)
juliendenize Mar 30, 2025
8bf940f
[doc] update conda to usage link in installation (#15761)
reidliu41 Mar 30, 2025
5694675
fix test_phi3v (#15321)
pansicheng Mar 30, 2025
184ddd6
[V1] Override `mm_counts` for dummy data creation (#15703)
DarkLight1337 Mar 30, 2025
fbb5f78
fix: lint fix a ruff checkout syntax error (#15767)
yihong0618 Mar 30, 2025
6dc0ec6
[Bugfix] Added `embed_is_patch` mask for fuyu model (#15731)
kylehh Mar 30, 2025
9de3323
fix: Comments to English for better dev experience (#15768)
yihong0618 Mar 30, 2025
62967a4
[V1][Scheduler] Avoid calling `_try_schedule_encoder_inputs` for ever…
WoosukKwon Mar 30, 2025
1f7129c
[Misc] update the comments (#15780)
lcy4869 Mar 31, 2025
33ae51b
[Benchmark] Update Vision Arena Dataset and HuggingFaceDataset Setup …
JenZhao Mar 31, 2025
d4dd879
[Feature][ROCm]Enable fusion pass for torch.compile on ROCm (#15050)
charlifu Mar 31, 2025
736a97c
Recommend developing with Python 3.12 in developer guide (#15811)
hmellor Mar 31, 2025
2f8a219
fix: better install requirement for install in setup.py (#15796)
yihong0618 Mar 31, 2025
e522bff
[V1] Fully Transparent Implementation of CPU Offloading (#15354)
youkaichao Mar 31, 2025
d9130af
[Model] Update support for NemotronNAS models (#15008)
Naveassaf Mar 31, 2025
65977ee
[Bugfix] Fix Crashing When Loading Modules With Batchnorm Stats (#15813)
alex-jw-brooks Mar 31, 2025
c124f0d
[Bugfix] Fix missing return value in load_weights method of adapters.…
noc-turne Mar 31, 2025
85cd82c
Upgrade `transformers` to `v4.50.3` (#13905)
hmellor Mar 31, 2025
f6c1a50
[Bugfix] Check dimensions of multimodal embeddings in V1 (#15816)
DarkLight1337 Mar 31, 2025
f0265bb
[V1][Spec Decode] Remove deprecated spec decode config params (#15466)
ShangmingCai Mar 31, 2025
cabe554
fix: change GB to GiB in logging close #14979 (#15807)
yihong0618 Mar 31, 2025
f275ca3
[V1] TPU CI - Add basic perf regression test (#15414)
alexm-redhat Mar 31, 2025
14c4498
Fix Transformers backend compatibility check (#15290)
hmellor Mar 31, 2025
b08d157
[V1][Core] Remove unused speculative config from scheduler (#15818)
markmc Mar 31, 2025
84aed20
Move dockerfiles into their own directory (#14549)
hmellor Mar 31, 2025
37e67c7
[Distributed] Add custom allreduce support for ROCM (#14125)
ilmarkov Apr 1, 2025
7647ce0
Rename fallback model and refactor supported models section (#15829)
hmellor Apr 1, 2025
8a14237
[Frontend] Add Phi-4-mini function calling support (#14886)
kinfey Apr 1, 2025
0f62fd2
[Bugfix][Model] fix mllama multi-image (#14883)
yma11 Apr 1, 2025
ed015a9
[Bugfix] Fix extra comma (#15851)
haochengxia Apr 1, 2025
a132afd
[Bugfix]: Fix is_embedding_layer condition in VocabParallelEmbedding …
alexwl Apr 1, 2025
1a8832d
[V1] TPU - Fix fused MOE (#15834)
alexm-redhat Apr 1, 2025
e1c55e1
[sleep mode] clear pytorch cache after sleep (#15248)
lionelvillard Apr 1, 2025
0af00e1
[ROCm] Use device name in the warning (#15838)
gshtras Apr 1, 2025
2ea26ba
[V1] Implement sliding window attention in kv_cache_manager (#14097)
heheda12345 Apr 1, 2025
9d2e02b
fix: can not use uv run collect_env close #13888 (#15792)
yihong0618 Apr 1, 2025
8dbdca5
[Feature] specify model in config.yaml (#15798)
wayzeng Apr 1, 2025
c1bda21
[Misc] Enable V1 LoRA by default (#15320)
varun-sundar-rabindranath Apr 1, 2025
6f64b2a
[Misc] Fix speculative config repr string (#15860)
ShangmingCai Apr 1, 2025
0351c0b
[Docs] Fix small error in link text (#15868)
hmellor Apr 1, 2025
e003840
[Bugfix] Fix no video/image profiling edge case for `MultiModalDataPa…
Isotr0py Apr 1, 2025
ebc2914
[Misc] Use envs.VLLM_USE_RAY_COMPILED_DAG_CHANNEL_TYPE (#15831)
ruisearch42 Apr 1, 2025
7232ad4
setup correct nvcc version with CUDA_HOME (#15725)
chenyang78 Apr 1, 2025
5bcb069
[Model] Support Mistral3 in the HF Transformers format (#15505)
mgoin Apr 1, 2025
b72a115
[Misc] remove unused script (#15746)
reidliu41 Apr 1, 2025
1ebfd45
Remove `format.sh` as it's been unsupported >70 days (#15884)
hmellor Apr 1, 2025
fb6ea0d
[New Model]: jinaai/jina-reranker-v2-base-multilingual (#15876)
noooop Apr 1, 2025
e80f7bc
[Doc] Quark quantization documentation (#15861)
cha557 Apr 1, 2025
59a0b04
Reinstate `format.sh` and make `pre-commit` installation simpler (#15…
hmellor Apr 1, 2025
28dff57
[Misc] Allow using OpenCV as video IO fallback (#15055)
Isotr0py Apr 1, 2025
0f12cde
[ROCm][Build][Bugfix] Bring the base dockerfile in sync with the ROCm…
gshtras Apr 1, 2025
f875c98
Add option to use DeepGemm contiguous grouped gemm kernel for fused M…
bnellnm Apr 1, 2025
2388497
[CI/Build] Clean up LoRA tests (#15867)
jeejeelee Apr 1, 2025
56d0c43
[Model] Aya Vision (#15441)
JenZhao Apr 1, 2025
a8a1c0d
[Model] Add module name prefixes to gemma3 (#15889)
cloud11665 Apr 1, 2025
632e7fc
[CI] Disable flaky structure decoding test temporarily. (#15892)
ywang96 Apr 1, 2025
bb13560
[V1][Metrics] Initial speculative decoding metrics (#15151)
markmc Apr 1, 2025
7d0c77b
[V1][Spec Decode] Implement Eagle Proposer [1/N] (#15729)
WoosukKwon Apr 1, 2025
8aaa97c
[Docs] update usage stats language (#15898)
simon-mo Apr 1, 2025
3915484
[BugFix] make sure socket close (#15875)
yihong0618 Apr 1, 2025
dc8f2a3
[Model][MiniMaxText01] Support MiniMaxText01 model inference (#13454)
ZZBoom Apr 1, 2025
1328c07
[Docs] Add Ollama meetup slides (#15905)
simon-mo Apr 1, 2025
28ea8c6
[Docs] Add Intel as Sponsor (#15913)
simon-mo Apr 2, 2025
e029f46
[Spec Decode] Fix input triton kernel for eagle (#15909)
ekagra-ranjan Apr 2, 2025
213cecc
[V1] Fix: make sure `k_index` is int64 for `apply_top_k_only` (#15907)
b8zhong Apr 2, 2025
593412f
[Bugfix] Fix imports for MoE on CPU (#15841)
gau-nernst Apr 2, 2025
ce541f5
[V1][Minor] Enhance SpecDecoding Metrics Log in V1 (#15902)
WoosukKwon Apr 2, 2025
6d02986
[Doc] Update rocm.inc.md (#15917)
chun37 Apr 2, 2025
3c6260c
[V1][Bugfix] Fix typo in MoE TPU checking (#15927)
ywang96 Apr 2, 2025
3400a42
[Benchmark]Fix error message (#15866)
Potabk Apr 2, 2025
260bf45
[Misc] Replace print with logger (#15923)
chaunceyjiang Apr 2, 2025
16300be
[CI/Build] Further clean up LoRA tests (#15920)
jeejeelee Apr 2, 2025
29c90a6
[Bugfix] Fix cache block size calculation for CPU MLA (#15848)
gau-nernst Apr 2, 2025
41f00db
[Build/CI] Update lm-eval to 0.4.8 (#15912)
cthi Apr 2, 2025
fdd0e40
[Kernel] Add more dtype support for GGUF dequantization (#15879)
LukasBluebaum Apr 2, 2025
8276af7
[core] Add tags parameter to wake_up() (#15500)
erictang000 Apr 2, 2025
ccee0c3
[V1] Fix json_object support with xgrammar (#15488)
russellb Apr 2, 2025
742a24b
Add minimum version for `huggingface_hub` to enable Xet downloads (#1…
hmellor Apr 2, 2025
7584fba
[Bugfix][Benchmarks] Ensure `async_request_deepspeed_mii` uses the Op…
b8zhong Apr 2, 2025
508e239
[CI] Remove duplicate entrypoints-test (#15940)
yankay Apr 2, 2025
76fb3b8
[Bugfix] Fix the issue where the model name is empty string, causing …
chaunceyjiang Apr 2, 2025
0c1a98c
[Metrics] Hide deprecated metrics (#15458)
markmc Apr 2, 2025
4cbcc51
[Frontend] Implement Tool Calling with `tool_choice='required'` (#13483)
meffmadd Apr 2, 2025
482e9e9
[CPU][Bugfix] Using custom allreduce for CPU backend (#15934)
bigPYJ1151 Apr 2, 2025
e6a451b
[Model] use AutoWeightsLoader in model load_weights (#15770)
lengrongfu Apr 2, 2025
2ce1ad9
[Misc] V1 LoRA support CPU offload (#15843)
jeejeelee Apr 2, 2025
448fd9e
Restricted cmake to be less than version 4 as 4.x breaks the build of…
npanpaliya Apr 2, 2025
ac71778
[misc] instruct pytorch to use nvml-based cuda check (#15951)
youkaichao Apr 2, 2025
479223c
[V1] Support Mistral3 in V1 (#15950)
mgoin Apr 2, 2025
2a3f6fb
Fix `huggingface-cli[hf-xet]` -> `huggingface-cli[hf_xet]` (#15969)
hmellor Apr 2, 2025
cf6b741
[V1][TPU] TPU-optimized top-p implementation (avoids scattering). (#1…
hyeygit Apr 3, 2025
5280c49
[TPU] optimize the all-reduce performance (#15903)
yaochengji Apr 3, 2025
ee357b7
[V1][TPU] Do not compile sampling more than needed (#15883)
NickLucche Apr 3, 2025
cd3e150
[ROCM][KERNEL] Paged attention for V1 (#15720)
maleksan85 Apr 3, 2025
fc59723
fix: better error message for get_config close #13889 (#15943)
yihong0618 Apr 3, 2025
ec96101
[bugfix] add seed in torchrun_example.py (#15980)
youkaichao Apr 3, 2025
99d507c
[ROCM][V0] PA kennel selection when no sliding window provided (#15982)
maleksan85 Apr 3, 2025
bcaf1c1
[Benchmark] Add AIMO Dataset to Benchmark (#15955)
StevenShi-23 Apr 3, 2025
e0cb32d
[misc] improve error message for "Failed to infer device type" (#15994)
youkaichao Apr 3, 2025
40fb323
[Bugfix][V1] Fix bug from putting llm_engine.model_executor in a back…
wwl2755 Apr 3, 2025
28cf2ea
[doc] update contribution link (#15922)
reidliu41 Apr 3, 2025
3e6ddbb
fix: tiny fix make format.sh excutable (#16015)
yihong0618 Apr 3, 2025
b662373
[SupportsQuant] Bert, Blip, Blip2, Bloom (#15573)
kylesayrs Apr 3, 2025
2c2c0e1
[SupportsQuant] Chameleon, Chatglm, Commandr (#15952)
kylesayrs Apr 3, 2025
bfb1edf
[Neuron][kernel] Fuse kv cache into a single tensor (#15911)
liangfu Apr 3, 2025
453ee0d
[Minor] Fused experts refactor (#15914)
bnellnm Apr 3, 2025
7433003
[Misc][Performance] Advance tpu.txt to the most recent nightly torch …
yarongmu-google Apr 3, 2025
f2c55b2
Re-enable the AMD Testing for the passing tests. (#15586)
Alexei-V-Ivanov-AMD Apr 3, 2025
2a97e63
[TPU] Support sliding window and logit soft capping in the paged atte…
vanbasten23 Apr 3, 2025
8fe8826
[TPU] Switch Test to Non-Sliding Window (#15981)
robertgshaw2-redhat Apr 3, 2025
cfcd364
[Bugfix] Fix function names in test_block_fp8.py (#16033)
bnellnm Apr 3, 2025
7a9a9f4
[ROCm] Tweak the benchmark script to run on ROCm (#14252)
huydhn Apr 4, 2025
138b7a7
[Misc] improve gguf check (#15974)
reidliu41 Apr 4, 2025
80b338a
[TPU][V1] Remove ragged attention kernel parameter hard coding (#16041)
yaochengji Apr 4, 2025
757b2ed
doc: add info for macos clang errors (#16049)
yihong0618 Apr 4, 2025
8fac27c
[V1][Spec Decode] Avoid logging useless nan metrics (#16023)
markmc Apr 4, 2025
b74d450
[Model] use AutoWeightsLoader for baichuan, gpt-neox, mpt (#15939)
jonghyunchoe Apr 4, 2025
37c557f
[Hardware][Gaudi][BugFix] fix arguments of hpu fused moe (#15945)
zhenwei-intel Apr 4, 2025
6ab7513
[Bugfix][kernels] Fix half2float conversion in gguf kernels (#15995)
Isotr0py Apr 4, 2025
9a1deb0
[Benchmark][Doc] Update throughput benchmark and README (#15998)
StevenShi-23 Apr 4, 2025
a67be08
[CPU] Change default block_size for CPU backend (#16002)
bigPYJ1151 Apr 4, 2025
6146b93
[Distributed] [ROCM] Fix custom allreduce enable checks (#16010)
ilmarkov Apr 4, 2025
cd1add6
[ROCm][Bugfix] Use platform specific FP8 dtype (#15717)
gshtras Apr 4, 2025
e314193
[ROCm][Bugfix] Bring back fallback to eager mode removed in #14917, b…
gshtras Apr 4, 2025
7e3a129
[Bugfix] Fix default behavior/fallback for pp in v1 (#16057)
mgoin Apr 4, 2025
00b0576
[CI] Reorganize .buildkite directory (#16001)
khluu Apr 4, 2025
d96927b
[V1] DP scale-out (1/N): Use zmq ROUTER/DEALER sockets for input queu…
njhill Apr 4, 2025
e26aaad
[V1] Scatter and gather placeholders in the model runner (#15712)
DarkLight1337 Apr 4, 2025
b01b6c4
Revert "[V1] Scatter and gather placeholders in the model runner" (#1…
ywang96 Apr 4, 2025
a1fbed6
[Kernel][Minor] Re-fuse triton moe weight application (#16071)
bnellnm Apr 4, 2025
e55a619
[Bugfix][TPU] Fix V1 TPU worker for sliding window (#16059)
mgoin Apr 4, 2025
87d1b97
[V1][Spec Decode] Update N-gram Proposer Interface (#15750)
WoosukKwon Apr 4, 2025
2f64ba2
[Misc] Auto detect bitsandbytes pre-quantized models (#16027)
tristanleclercq Apr 5, 2025
5d11672
[CI] Fix benchmark script level (#16089)
khluu Apr 5, 2025
e919b62
fix: support clang17 for macos and fix the real libomp (#16086)
yihong0618 Apr 5, 2025
c3ee3e8
[doc] fix 404 (#16082)
reidliu41 Apr 5, 2025
32d8fab
Revert "doc: add info for macos clang errors (#16049)" (#16091)
yihong0618 Apr 5, 2025
4650dec
Fix some capitalisations in generated examples doc titles (#16094)
hmellor Apr 5, 2025
544db76
[Misc] format output for encoder_decoder.py (#16095)
reidliu41 Apr 6, 2025
0d7ac17
[Misc] Remove redundant code (#16098)
chaunceyjiang Apr 6, 2025
94e02f8
[Bugfix] fix use_atomic_add support of marlin kernel when using v1 en…
jinzhen-lin Apr 6, 2025
21bc099
[Model] use AutoWeightsLoader for phi, gemma, deepseek (#16088)
jonghyunchoe Apr 6, 2025
1836b82
[Model] fix model testing for TeleChat2ForCausalLM and V0 llama4 (#16…
luccafong Apr 6, 2025
6f88773
[Benchmark] Add sampling parameters to benchmark_serving. (#16022)
hyeygit Apr 6, 2025
792c9a3
[Frontend] Fix typo in tool chat templates for llama3.2 and toolace (…
bjj Apr 6, 2025
ed870a6
[CI][V1] Fix passing `tokenizer` as kwarg to `validate_guidance_gramm…
ywang96 Apr 6, 2025
d17c534
[Misc] refactor example eagle (#16100)
reidliu41 Apr 6, 2025
aa05264
[Doc][Bugfix] Add missing EOF in k8s deploy doc (#16025)
psschwei Apr 6, 2025
e305582
[Misc] Improve model redirect to accept json dictionary (#16119)
Isotr0py Apr 6, 2025
78fe1d2
[Model] use AutoWeightsLoader for stablelm,starcoder2,zamba2 (#16103)
lengrongfu Apr 6, 2025
bb25eba
[Bugfix] LoRA : Fix the order in which the kernels process LoRAs (#1…
varun-sundar-rabindranath Apr 6, 2025
f01e5c5
fix: fixing typing deprecations issues
lulmer Apr 6, 2025
c4eb330
fix: fixing typing deprecations issues
lulmer Apr 6, 2025
28a5ba8
fix: fixing typing deprecations issues (3)
lulmer Apr 6, 2025
164128b
fix: fixing typing deprecations issues (4)
lulmer Apr 6, 2025
8994964
fix: running pre commits
lulmer Apr 6, 2025
3dc2641
fix: solving merge conflict due to rebase to comply with DCO
lulmer Apr 7, 2025
c36ac71
fix: adding pre commit for examples/offline_inference/mistral-small.py
lulmer Apr 7, 2025
3182aa2
Merge branch 'main' into llama_usr_defined_toolcall_template
lulmer Apr 22, 2025
3e102ce
fix: removing the accidental modification of the kv_cache_manager
lulmer Apr 22, 2025
131 changes: 131 additions & 0 deletions examples/tool_chat_template_llama3.1_usr_def_tool_call.jinja
@@ -0,0 +1,131 @@
{{- bos_token }}
{%- if custom_tools is defined %}
{%- set tools = custom_tools %}
{%- endif %}
{%- if not tools_in_user_message is defined %}
{%- set tools_in_user_message = false %}
{%- endif %}
{%- if not date_string is defined %}
{%- set date_string = "26 Jul 2024" %}
{%- endif %}

{#- This block extracts the system message, so we can slot it into the right place. #}
{%- if messages[0]['role'] == 'system' %}
{%- set system_message = messages[0]['content']|trim %}
{%- set messages = messages[1:] %}
{%- else %}
{%- set system_message = "" %}
{%- endif %}

{#- System message + builtin tools #}
{{- "<|start_header_id|>system<|end_header_id|>\n\n" }}
{%- if builtin_tools is defined or tools is not none %}
{{- "Environment: ipython\n" }}
{%- endif %}
{%- if builtin_tools is defined %}
{{- "Tools: " + builtin_tools | reject('equalto', 'code_interpreter') | join(", ") + "\n\n"}}
{%- endif %}
{{- "Cutting Knowledge Date: December 2023\n" }}
{{- "Today Date: " + date_string + "\n\n" }}

{%- if builtin_tools is defined %}
{{- "# Tool Instructions\n"}}
{{- "- Always execute python code in messages that you share.\n"}}
{{- "- When looking for real time information use relevant functions if available else fallback to brave_search\n\n\n"}}
{%- endif %}

{%- if tools is not none and not tools_in_user_message %}
{{- "You have access to the following functions:\n\n"}}

{%- for t in tools %}
{%- if t.function is defined %}
{%- set t = t.function %}
{%- endif -%}
{{- "Use the function '"+t.name+"' to: "+t.description+"\n"}}
{{- t | tojson(indent=4) }}
{{- "\n\n" }}
{%- endfor %}
{{- "If a you choose to call a function ONLY reply in the following format:\n"}}
{{- "<{start_tag}={function_name}>{parameters}{end_tag}\n" }}
{{- "where\n\n"}}
{{- "start_tag => `<function`\n" }}
{{- "parameters => a JSON dict with the function argument name as key and function argument value as value.\n"}}
{{- "end_tag => `</function>`" }}
{{- "\n\n" }}
{{- "Here is an example,\n"}}
{{- "<function=example_function_name>{\"example_name\": \"example_value\"}</function>"}}
{{- "\n\n" }}
{{- "Reminder:\n"}}
{{- "- Function calls MUST follow the specified format\n"}}
{{- "- Required parameters MUST be specified\n"}}
{{- "- Only call one function at a time\n"}}
{{- "- Put the entire function call reply on one line\n"}}
{{- "- Always use the information returned by the function to answer to the user\n"}}
{{- "- If there is no relevant function available, do NOT call any function: respond directly to the user\n\n"}}

{%- endif %}
{{- system_message }}
{{- "<|eot_id|>" }}

{#- Custom tools are passed in a user message with some extra guidance #}
{%- if tools_in_user_message and not tools is none %}
{#- Extract the first user message so we can plug it in here #}
{%- if messages | length != 0 %}
{%- set first_user_message = messages[0]['content']|trim %}
{%- set messages = messages[1:] %}
{%- else %}
{{- raise_exception("Cannot put tools in the first user message when there's no first user message!") }}
{%- endif %}
{{- '<|start_header_id|>user<|end_header_id|>\n\n' -}}
{{- "Given the following functions, please respond with a JSON for a function call " }}
{{- "with its proper arguments that best answers the given prompt.\n\n" }}
{{- 'Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}.' }}
{{- "Do not use variables.\n\n" }}
{%- for t in tools %}
{{- t | tojson }}
{{- "\n\n" }}
{%- endfor %}
{{- first_user_message + "<|eot_id|>"}}
{%- endif %}

{%- for message in messages %}
{%- if not (message.role == 'ipython' or message.role == 'tool' or 'tool_calls' in message) %}
{{- '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' }}
{%- elif 'tool_calls' in message %}
{%- if not message.tool_calls|length == 1 %}
{{- raise_exception("This model only supports single tool-calls at once!") }}
{%- endif %}
{%- set tool_call = message.tool_calls[0].function %}
{%- if builtin_tools is defined and tool_call.name in builtin_tools %}
{{- '<|start_header_id|>assistant<|end_header_id|>\n\n' -}}
{{- "<|python_tag|>" + tool_call.name + ".call(" }}
{%- for arg_name, arg_val in tool_call.arguments | items %}
{{- arg_name + '="' + arg_val + '"' }}
{%- if not loop.last %}
{{- ", " }}
{%- endif %}
{%- endfor %}
{{- ")" }}
{%- else %}
{{- '<|start_header_id|>assistant<|end_header_id|>\n\n' -}}
{{- '<function=' + tool_call.name + '>' + tool_call.arguments + '</function>'}}
{%- endif %}
{%- if builtin_tools is defined or tools is not none%}
{#- This means we're in ipython mode #}
{{- "<|eom_id|>" }}
{%- else %}
{{- "<|eot_id|>" }}
{%- endif %}
{%- elif message.role == "tool" or message.role == "ipython" %}
{{- "<|start_header_id|>ipython<|end_header_id|>\n\n" }}
{%- if message.content is mapping or message.content is iterable %}
{{- message.content | tojson }}
{%- else %}
{{- message.content }}
{%- endif %}
{{- "<|eot_id|>" }}
{%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
{{- '<|start_header_id|>assistant<|end_header_id|>\n\n' }}
{%- endif %}
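
The template above instructs the model to wrap user-defined tool calls as `<function=NAME>{json arguments}</function>` on a single line. As a rough illustration of what a parser for that output has to do, here is a minimal, self-contained sketch — not the PR's actual `Llama3UserDefinedCustomToolParser` implementation (its source is not shown in this diff), and the helper name is illustrative only:

```python
import json
import re
from typing import Optional

# Matches the single-line tool-call format the chat template asks the model to
# produce, e.g. <function=example_function_name>{"example_name": "example_value"}</function>
_TOOL_CALL_RE = re.compile(
    r"<function=(?P<name>[^>]+)>(?P<args>.*?)</function>", re.DOTALL
)


def extract_user_defined_tool_call(model_output: str) -> Optional[dict]:
    """Return {"name": ..., "arguments": ...} for the first tool call, else None."""
    match = _TOOL_CALL_RE.search(model_output)
    if match is None:
        return None  # ordinary assistant text, no tool call present
    try:
        arguments = json.loads(match.group("args"))
    except json.JSONDecodeError:
        return None  # malformed arguments; a real parser would surface an error
    return {"name": match.group("name").strip(), "arguments": arguments}


if __name__ == "__main__":
    sample = '<function=example_function_name>{"example_name": "example_value"}</function>'
    print(extract_user_defined_tool_call(sample))
    # {'name': 'example_function_name', 'arguments': {'example_name': 'example_value'}}
```

Note that the template itself enforces at most one tool call per assistant message (it raises "This model only supports single tool-calls at once!"), which is why a single `search` rather than `findall` suffices in this sketch.
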
4 changes: 3 additions & 1 deletion vllm/entrypoints/openai/tool_parsers/__init__.py
@@ -7,6 +7,7 @@
 from .internlm2_tool_parser import Internlm2ToolParser
 from .jamba_tool_parser import JambaToolParser
 from .llama_tool_parser import Llama3JsonToolParser
+from .llama_usr_defined_tool_parser import Llama3UserDefinedCustomToolParser
 from .mistral_tool_parser import MistralToolParser
 from .phi4mini_tool_parser import Phi4MiniJsonToolParser
 from .pythonic_tool_parser import PythonicToolParser
@@ -15,5 +16,6 @@
     "ToolParser", "ToolParserManager", "Granite20bFCToolParser",
     "GraniteToolParser", "Hermes2ProToolParser", "MistralToolParser",
     "Internlm2ToolParser", "Llama3JsonToolParser", "JambaToolParser",
-    "PythonicToolParser", "Phi4MiniJsonToolParser"
+    "PythonicToolParser", "Llama3UserDefinedCustomToolParser",
+    "Phi4MiniJsonToolParser"
 ]
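
Beyond this export, tool parsers in vLLM are looked up by name at serving time through the `ToolParserManager` registry. The snippet below is a hedged sketch of that lookup; the registration name `llama3_user_defined` is an assumption for illustration, since the name actually registered by `Llama3UserDefinedCustomToolParser` does not appear in this diff.

```python
# Hedged sketch: resolving an exported tool parser by its registered name.
# "llama3_user_defined" is an assumed registration name, not confirmed by this diff.
from vllm.entrypoints.openai.tool_parsers import ToolParserManager

parser_cls = ToolParserManager.get_tool_parser("llama3_user_defined")
print(parser_cls.__name__)  # expected: Llama3UserDefinedCustomToolParser, if registered under that name
```

At the API-server level this corresponds to selecting the parser with `--tool-call-parser` and pointing `--chat-template` at the new `tool_chat_template_llama3.1_usr_def_tool_call.jinja` file, both existing vLLM serving options.
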