
Conversation

noemotiovon (Collaborator)
What does this PR do?

  • Added a check to skip aclrtSetDevice when the requested device is already the current device on the calling thread (see the sketch below).
  • Avoids unnecessary device context switches while preserving thread/device consistency.
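
For illustration, a minimal sketch of the check described above. It assumes the CANN runtime's `aclrtGetDevice`/`aclrtSetDevice` APIs; the wrapper name `cann_set_device_cached` and the error reporting are hypothetical and not the exact code in this patch (the backend's own wrapper and ACL_CHECK macro are used there).

```cpp
#include <acl/acl.h>
#include <cstdio>

// Hypothetical sketch: query the device currently bound to this thread and
// only call aclrtSetDevice when it actually differs from the requested one.
static void cann_set_device_cached(int32_t device) {
    int32_t current = -1;
    // aclrtGetDevice returns an error if no device has been bound on this
    // thread yet; in that case fall through and bind the device explicitly.
    if (aclrtGetDevice(&current) == ACL_SUCCESS && current == device) {
        return;  // already bound to the requested device, skip the context switch
    }
    const aclError err = aclrtSetDevice(device);
    if (err != ACL_SUCCESS) {
        // the real backend aborts through its error-checking macro; here we just report
        std::fprintf(stderr, "aclrtSetDevice(%d) failed with error %d\n", device, err);
    }
}
```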

@noemotiovon noemotiovon added the labels Ascend NPU (issues specific to Ascend NPUs) and ggml (changes relating to the ggml tensor library for machine learning) on Sep 11, 2025
@noemotiovon noemotiovon (Collaborator, Author) commented Sep 11, 2025

Qwen2.5-0.5B model inference test on 2 NPUs:

......
llama_perf_sampler_print:    sampling time =      40.70 ms /   175 runs   (    0.23 ms per token,  4299.54 tokens per second)
llama_perf_context_print:        load time =    7582.66 ms
llama_perf_context_print: prompt eval time =      26.60 ms /    20 tokens (    1.33 ms per token,   751.94 tokens per second)
llama_perf_context_print:        eval time =     709.67 ms /   154 runs   (    4.61 ms per token,   217.00 tokens per second)
llama_perf_context_print:       total time =    1885.73 ms /   174 tokens
llama_perf_context_print:    graphs reused =        153

@hipudding hipudding (Collaborator) left a comment


LGTM

@noemotiovon noemotiovon force-pushed the set_device_opti branch 2 times, most recently from b711387 to 4a99538 on September 13, 2025 at 09:25
Labels: Ascend NPU (issues specific to Ascend NPUs), ggml (changes relating to the ggml tensor library for machine learning)