Pull requests: vllm-project/vllm
[Bugfix] fused_experts_impl wrong compute type for float32
#11921
opened Jan 10, 2025 by
shaochangxu
[Misc] Update benchmark_prefix_caching.py fixed example usage
#11920
opened Jan 10, 2025 by
remimin
[Doc] links Tensorizer example
documentation
Improvements or additions to documentation
#11918
opened Jan 10, 2025 by
guspan-tanadi
[Doc] Correct the spelling of GitHub
documentation
Improvements or additions to documentation
#11915
opened Jan 10, 2025 by
Yaminyam
[V1] APC + prompt logprobs unsupported (PR 2/N for v1 sample and prompt logprobs support)
#11910
opened Jan 10, 2025 by
afeldman-nm
•
Draft
[FP8][Kernel] Dynamic kv cache scaling factors computation
documentation
Improvements or additions to documentation
#11906
opened Jan 9, 2025 by
gshtras
[Bugfix] support to run partially 2:4 model with CompressedTensors24 scheme
#11889
opened Jan 9, 2025 by
jiangjiadi
Add device as parameter to TP and rotary_embedding functions
#11888
opened Jan 9, 2025 by
chunyuan-w
•
Draft
[optimization] remove python function call for custom activation op
ready
ONLY add when PR is ready to merge/full CI is needed
#11885
opened Jan 9, 2025 by
cennn
[CI] Add auto update workflow for Dockerfile graph
ci/build
#11879
opened Jan 9, 2025 by
WineChord
Updating the high performance vllm docker for AMD Rocm.
documentation
Improvements or additions to documentation
#11877
opened Jan 9, 2025 by
haic0
[Hardware][Gaudi] Support loading checkpoints quantized using Autofp8
#11869
opened Jan 9, 2025 by
zhenwei-intel
•
Draft
[WIP][Kernel] Update cutlass_scaled_mm to support 2d group (blockwise) scaling
ci/build
#11868
opened Jan 8, 2025 by
LucasWilkinson
•
Draft
3 tasks
[Spec Decode] Add Script for converting HF Eagle checkpoint to vLLM compatible checkpoint
documentation
Improvements or additions to documentation
#11866
opened Jan 8, 2025 by
sroy745
[CI/Build] Add markdown linter
ci/build
documentation
Improvements or additions to documentation
#11857
opened Jan 8, 2025 by
rafvasq
[Bugfix] Fix start_idx for computing slot mapping to avoid uninitiali…
#11851
opened Jan 8, 2025 by
ShawnD200
Implements dual-chunk-flash-attn backend for dual chunk attention with sparse attention support
ci/build
#11844
opened Jan 8, 2025 by
sighingnow
[Hardware][CPU] Support MOE models on x86 CPU
documentation
Improvements or additions to documentation
ready
ONLY add when PR is ready to merge/full CI is needed
x86 CPU
#11831
opened Jan 8, 2025 by
bigPYJ1151