vllm-project / vllm Public

[Kernel] Enable FP16 and BF16 CUTLASS MoE kernels #15932

Open

ElizaWszola wants to merge 26 commits into vllm-project:main from neuralmagic:cutlass-moe-bf16-weights

+1,647 −803

Conversation 35 Commits 26 Checks 4 Files changed 23

Format, cleanup

d5995b2

Checks
- Mergify: Summary
- DCO: DCO
- pre-commit (on: pull_request): pre-commit
- Lint and Deploy Charts (on: pull_request): lint-and-deploy

Mergify / Summary succeeded Aug 11, 2025 in 1s

4 rules match and 16 potential rules

Rule: label-documentation (label)

any of:
- files~=^[^/]+\.md$
- files~=^docs/
- files~=^examples/

✅ Rule: label-ci-build (label)

any of:
- files=CMakeLists.txt
- files=setup.py
- files~=\.buildkite/
- files~=^\.github/
- files~=^cmake/
- files~=^docker/Dockerfile
- files~=^requirements.*\.txt
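
As a rough illustration of how an "any of" rule like label-ci-build could be evaluated, here is a minimal Python sketch. It assumes that `files=` means an exact path match and that `files~=` means an unanchored regular-expression search over the changed paths; the helper names `condition_matches` and `any_of` and the sample file lists are hypothetical, and this is not Mergify's actual implementation.

```python
import re

# Conditions copied from the label-ci-build rule above.
CONDITIONS = [
    "files=CMakeLists.txt",
    "files=setup.py",
    r"files~=\.buildkite/",
    r"files~=^\.github/",
    r"files~=^cmake/",
    r"files~=^docker/Dockerfile",
    r"files~=^requirements.*\.txt",
]


def condition_matches(condition, changed_files):
    """Return True if at least one changed file satisfies the condition.

    Assumption: `~=` is a regex search, `=` is an exact path comparison.
    """
    if "~=" in condition:
        _, pattern = condition.split("~=", 1)
        return any(re.search(pattern, path) for path in changed_files)
    _, expected = condition.split("=", 1)
    return expected in changed_files


def any_of(conditions, changed_files):
    """An "any of" group holds as soon as one condition holds."""
    return any(condition_matches(c, changed_files) for c in conditions)


# Hypothetical file lists, for illustration only.
print(any_of(CONDITIONS, ["CMakeLists.txt", "csrc/example.cu"]))
print(any_of(CONDITIONS, ["vllm/example.py"]))
```

Under these assumptions, the `^` anchors matter: a path such as `docs/cmake/notes.md` would not satisfy `files~=^cmake/`, because the pattern only matches at the start of the path.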

Rule: label-deepseek (label)

any of:
- files~=^examples/.*deepseek.*\.py
- files~=^tests/.*deepseek.*\.py
- files~=^vllm/entrypoints/openai/tool_parsers/.*deepseek.*\.py
- files~=^vllm/model_executor/models/.*deepseek.*\.py
- files~=^vllm/reasoning/.*deepseek.*\.py
- files~=^vllm/transformers_utils/.*deepseek.*\.py
- title~=(?i)DeepSeek

Rule: label-frontend (label)

files~=^vllm/entrypoints/

Rule: label-llama (label)

any of:
- files~=^examples/.*llama.*\.py
- files~=^tests/.*llama.*\.py
- files~=^vllm/entrypoints/openai/tool_parsers/llama.*\.py
- files~=^vllm/model_executor/models/.*llama.*\.py
- files~=^vllm/transformers_utils/configs/.*llama.*\.py
- title~=(?i)llama

Rule: label-multi-modality (label)

any of:
- files=tests/models/test_vision.py
- files~=^tests/models/multimodal/
- files~=^tests/multimodal/
- files~=^vllm/multimodal/

Rule: label-new-model (label)

all of:
- files=vllm/model_executor/models/registry.py
- files~=^vllm/model_executor/models/

✅ Rule: label-performance (label)

any of:
- files~=^benchmarks/
- files~=^\.buildkite/nightly-benchmarks/
- files~=^tests/benchmarks/
- files~=^vllm/benchmarks/

Rule: label-qwen (label)

any of:
- files~=^examples/.*qwen.*\.py
- files~=^tests/.*qwen.*\.py
- files~=^vllm/model_executor/models/.*qwen.*\.py
- files~=^vllm/reasoning/.*qwen.*\.py
- title~=(?i)Qwen

Rule: label-gpt-oss (label)

any of:
- files~=^examples/.*gpt[-_]?oss.*\.py
- files~=^tests/.*gpt[-_]?oss.*\.py
- files~=^vllm/model_executor/layers/.*gpt[-_]?oss.*\.py
- files~=^vllm/model_executor/models/.*gpt[-_]?oss.*\.py
- title~=(?i)gpt[-_]?oss

Rule: label-rocm (label)

any of:
- files=vllm/platforms/rocm.py
- files~=^csrc/rocm/
- files~=^docker/Dockerfile.rocm
- files~=^requirements/rocm.*\.txt
- files~=^tests/kernels/.*_rocm.*\.py
- files~=^vllm/attention/backends/rocm.*\.py
- files~=^vllm/attention/ops/rocm.*\.py
- files~=^vllm/model_executor/layers/fused_moe/rocm.*\.py
- files~=^vllm/v1/attention/backends/mla/rocm.*\.py
- title~=(?i)AMD
- title~=(?i)ROCm

Rule: label-structured-output (label)

any of:
- files=benchmarks/benchmark_serving_structured_output.py
- files=benchmarks/run_structured_output_benchmark.sh
- files=docs/features/structured_outputs.md
- files=examples/offline_inference/structured_outputs.py
- files=examples/online_serving/openai_chat_completion_structured_outputs.py
- files=examples/online_serving/openai_chat_completion_structured_outputs_with_reasoning.py
- files=tests/v1/entrypoints/llm/test_guided_generate.py
- files~=^benchmarks/structured_schemas/
- files~=^tests/v1/structured_output/
- files~=^vllm/v1/structured_output/

Rule: label-speculative-decoding (label)

any of:
- files=vllm/model_executor/models/mlp_speculator.py
- files~=^examples/.*(spec_decode|mlpspeculator|eagle|speculation).*\.py
- files~=^tests/v1/spec_decode/
- files~=^vllm/model_executor/models/.*eagle.*\.py
- files~=^vllm/transformers_utils/configs/(eagle|medusa|mlp_speculator)\.py
- files~=^vllm/v1/spec_decode/

Rule: label-v1 (label)

any of:
- files~=^tests/v1/
- files~=^vllm/v1/

Rule: label-tpu (label)

any of:
- files~=/tpu/
- files~=_tpu
- files~=pallas
- files~=tpu.py
- files~=tpu_

✅ Rule: label-tpu-remove (label)

all of:
- -files~=/tpu/
- -files~=_tpu
- -files~=pallas
- -files~=tpu.py
- -files~=tpu_
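
The negated conditions in label-tpu-remove can be read as "no changed file matches this pattern". The sketch below shows that reading in Python, assuming a leading `-` negates the condition and that `~=` is an unanchored regex search; `no_file_matches` and `tpu_remove_applies` are hypothetical names, the file paths are made up, and the logic is an approximation rather than Mergify's own code.

```python
import re

# Patterns copied from the label-tpu-remove rule above (without the leading "-").
TPU_PATTERNS = ["/tpu/", "_tpu", "pallas", "tpu.py", "tpu_"]


def no_file_matches(pattern, changed_files):
    """Negated condition: true only if no changed file matches the pattern."""
    return not any(re.search(pattern, path) for path in changed_files)


def tpu_remove_applies(changed_files):
    """An "all of" group holds only if every negated condition holds."""
    return all(no_file_matches(p, changed_files) for p in TPU_PATTERNS)


# Hypothetical file lists, for illustration only.
print(tpu_remove_applies(["csrc/example.cu"]))
print(tpu_remove_applies(["vllm/platforms/tpu.py"]))
```

Read this way, the rule matches exactly when the label-tpu rule would not, which is consistent with label-tpu-remove being marked as matched while label-tpu is not.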

Rule: label-tool-calling (label)

any of:
- files=docs/features/tool_calling.md
- files=examples/offline_inference/chat_with_tools.py
- files=examples/online_serving/openai_chat_completion_client_with_tools.py
- files=examples/online_serving/openai_chat_completion_client_with_tools_required.py
- files=examples/online_serving/openai_chat_completion_tool_calls_with_reasoning.py
- files=tests/entrypoints/openai/test_chat_with_tool_reasoning.py
- files~=^examples/tool_chat_*
- files~=^tests/entrypoints/openai/tool_parsers/
- files~=^tests/mistral_tool_use/
- files~=^tests/tool_use/
- files~=^vllm/entrypoints/openai/tool_parsers/

✅ Rule: ping author on conflicts and add 'needs-rebase' label (comment, label)

-closed
conflict

Rule: assign reviewer for tensorizer changes (assign)

files~=^tests/entrypoints/openai/test_tensorizer_entrypoint.py
files~=^tests/tensorizer_loader/
files~=^vllm/model_executor/model_loader/tensorizer.py
files~=^vllm/model_executor/model_loader/tensorizer_loader.py

Rule: remove 'needs-rebase' label when conflict is resolved (label)

-conflict
-closed

Mergify commands and options

More conditions and actions can be found in the documentation.

You can also trigger Mergify actions by commenting on this pull request:

- @Mergifyio refresh will re-evaluate the rules
- @Mergifyio rebase will rebase this PR on its base branch
- @Mergifyio update will merge the base branch into this PR
- @Mergifyio backport <destination> will backport this PR on <destination> branch

Additionally, on Mergify dashboard you can:

- look at your merge queues
- generate the Mergify configuration with the config editor

Finally, you can contact us on https://mergify.com
