
[feat] Introduce platform-specific sparse trigger thresholds for GPU and NPU #762

Merged
ygwpz merged 3 commits into ModelEngine-Group:develop from wangwenxin0312:dev_gsa_opt on Mar 2, 2026

Conversation


wangwenxin0312 commented Feb 27, 2026

Purpose

This PR introduces platform-specific sparse triggering thresholds for GSAOnDevice.

Modifications

  1. Add platform-specific configuration fields in GSAOnDeviceConfig:
     • gpu_seq_len_threshold
     • gpu_concurrency_threshold
     • npu_seq_len_threshold
     • npu_concurrency_threshold
  2. Update the sparse gating logic in build_sparse_meta.
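
The two modifications can be pictured with a minimal sketch. Only the four field names come from this PR description. The class name `SketchGSAOnDeviceConfig`, the helper functions, the default values, and above all the comparison direction (sparse triggers for long sequences at low concurrency) are assumptions made for illustration. They do not show the actual logic in build_sparse_meta.

```python
from dataclasses import dataclass


@dataclass
class SketchGSAOnDeviceConfig:
    # Field names are taken from the PR description.
    # The default values below are hypothetical placeholders.
    gpu_seq_len_threshold: int = 4096
    gpu_concurrency_threshold: int = 8
    npu_seq_len_threshold: int = 8192
    npu_concurrency_threshold: int = 4


def select_thresholds(config, platform):
    """Return (seq_len_threshold, concurrency_threshold) for 'gpu' or 'npu'."""
    if platform == "gpu":
        return config.gpu_seq_len_threshold, config.gpu_concurrency_threshold
    if platform == "npu":
        return config.npu_seq_len_threshold, config.npu_concurrency_threshold
    raise ValueError(f"unsupported platform: {platform!r}")


def should_use_sparse(config, platform, seq_len, concurrency):
    # ASSUMPTION: sparse mode triggers when the sequence is at least as long
    # as the platform's threshold and concurrency does not exceed its limit.
    # The real gating condition in build_sparse_meta may differ.
    seq_thr, conc_thr = select_thresholds(config, platform)
    return seq_len >= seq_thr and concurrency <= conc_thr
```

With these placeholder defaults, the same request (sequence length 5000, concurrency 4) would trigger sparse mode on GPU but not on NPU, which is the kind of per-platform divergence that separate thresholds allow.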

Test

python examples/offline_inference_gsaondevice.py
image

wangwenxin0312 force-pushed the dev_gsa_opt branch 6 times, most recently from e27e098 to 149a879 on March 2, 2026 01:56
Infinite666 previously approved these changes Mar 2, 2026
ygwpz merged commit de27651 into ModelEngine-Group:develop on Mar 2, 2026
8 checks passed

4 participants