vllm-project / vllm-gaudi Public

Pull requests: vllm-project/vllm-gaudi

Labels 12 Milestones 0

61 Open 443 Closed

Pull requests list

[SW-243111] Add correctors for decode buckets

#509 opened Oct 31, 2025 by jbyczkow

[New Feature] Add cpu core pinning to vllm-server to improve performance.

#502 opened Oct 29, 2025 by louie-tsai

Port: Fix bucketing of query + num_blocks neighbor expansion #350, #355

#500 opened Oct 29, 2025 by iboiko-habana

DP: allreduce on the host

#498 opened Oct 29, 2025 by xinyu-intel

Udpate TESTOWNERS

#495 opened Oct 28, 2025 by jbyczkow

Documentation updates - part 1

Labels: documentation, skip-gaudi-tests

#493 opened Oct 28, 2025 by mhelf-intel

Initial Commit GPT-OSS

#485 opened Oct 28, 2025 by hlahkar

[SW-242794] Fix not warmed up decode buckets

#484 opened Oct 28, 2025 by jbyczkow • Draft

[SW-242523] Support per-tensor FP8 scaling

#483 opened Oct 27, 2025 by skavulya

Port "[Bugfix] Fix bucketing of query + num_blocks neighbor expansion" #350

#482 opened Oct 27, 2025 by iboiko-habana

Fix docker cmdlines for v0.10.2_next workarounds

#477 opened Oct 25, 2025 by nngokhale

[Attention Metadata Overhaul 1/N] Add per-layer attention metadata

#475 opened Oct 24, 2025 by kzawora-intel • Draft

Simplify requirements

#458 opened Oct 23, 2025 by pawel-olejniczak

Add tests for custom operator implementation correctness

#457 opened Oct 23, 2025 by Kacper-Pietkun

Fix requirements filtering in HPU Dockerfiles

#455 opened Oct 23, 2025 by jakub-sochacki

Enable triangular mask with valid_seq_lengths

#454 opened Oct 23, 2025 by kamil-kaczor

Remove VLLM_DELAYED_SAMPLING

#433 opened Oct 21, 2025 by xwu-intel

Automatically adjust VLLM_DECODE_BLOCK_BUCKET_MIN if it exceeds max_blocks

#432 opened Oct 20, 2025 by dsocek

enable gdr on 10.2 baseline

#431 opened Oct 20, 2025 by hsubramony • Draft

Fix for Llama4 static quantization

#430 opened Oct 20, 2025 by vidyasiv

Implementing softmax kernels in partial_attn_unique

#429 opened Oct 20, 2025 by ksmusz • Draft

multimodal support for unified attn

#423 opened Oct 19, 2025 by attafosu

initial port for heterogenous support

#420 opened Oct 17, 2025 by hsubramony • Draft

Fix docker cmdlines for v.0.11.0 work arounds

#417 opened Oct 17, 2025 by nngokhale

Add async scheduling for unified attention

#414 opened Oct 16, 2025 by tianmu-li
