
Retrieve the correct cached model batch size in Neuron config checker for Neuron Backend #3300


Open
wants to merge 1 commit into base: main

Conversation

jimburtoft

@jimburtoft jimburtoft commented Jul 19, 2025

The Neuron config checker looks at each of the cached models to find any that meet the specified criteria. Specifically, it will DROP anything with:

A tp_degree greater than specified:

if neuron_config_dict["tp_degree"] > available_cores:

(leaving anything < or =)

A batch size LESS THAN what was specified:

if batch_size is not None and neuron_config_dict["batch_size"] < int(batch_size):

(leaving anything > or =)

It then sorts anything that isn't dropped LARGEST to smallest (on both batch size and tp_degree):

return -dictionary["tp_degree"], -dictionary["batch_size"]

Then, it selects the first entry on the sorted list to use.

This works great for tp_degree, where you (probably) don't want to run a tp=2 model on a system with 8 cores. But if you specify a batch size of 4, you probably want a batch size of 4, not one of 8.
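The selection behavior described above can be sketched as follows. This is an illustrative reconstruction, not the actual optimum-neuron source: the function name `select_cached_config` and the shape of the input dicts are assumptions, but the two drop conditions and the sort key mirror the snippets quoted in this PR.

```python
def select_cached_config(neuron_config_dicts, available_cores, batch_size=None):
    """Sketch of the pre-PR cached-config selection (illustrative names)."""
    candidates = []
    for cfg in neuron_config_dicts:
        # Drop configs that need more cores than the host provides.
        if cfg["tp_degree"] > available_cores:
            continue
        # Drop configs whose batch size is below the requested one.
        if batch_size is not None and cfg["batch_size"] < int(batch_size):
            continue
        candidates.append(cfg)
    # Pre-PR sort: largest tp_degree first, then largest batch size first.
    candidates.sort(key=lambda d: (-d["tp_degree"], -d["batch_size"]))
    return candidates[0] if candidates else None
```

With this ordering, a request for batch size 4 against a cache holding batch sizes 4 and 8 (same tp_degree) selects the batch-8 config, which is the behavior this PR changes.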

What does this PR do?

It changes the sort order for batch size so that the smallest valid batch size comes first. Since anything smaller than the requested batch size has already been dropped, the smallest remaining entry is the closest match that is still >= the request.
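The sort-key change can be illustrated in isolation. The helper names below are hypothetical; only the key expressions are taken from the PR description (negating `batch_size` before, leaving it positive after):

```python
def sort_key_before(d):
    # Pre-PR: largest batch size first (both fields negated).
    return -d["tp_degree"], -d["batch_size"]

def sort_key_after(d):
    # Post-PR: smallest valid batch size first (tp_degree still largest-first).
    return -d["tp_degree"], d["batch_size"]

configs = [
    {"tp_degree": 2, "batch_size": 8},
    {"tp_degree": 2, "batch_size": 4},
]

picked_before = sorted(configs, key=sort_key_before)[0]["batch_size"]  # 8
picked_after = sorted(configs, key=sort_key_after)[0]["batch_size"]    # 4
```

For a requested batch size of 4, the new key picks the cached batch-4 config instead of the batch-8 one, while tp_degree selection is unchanged.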

No new tests because it should be covered by existing tests.

Fixes #3299

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • [x] Did you read the contributor guideline,
    Pull Request section?
  • [x] Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@jimburtoft jimburtoft changed the title Retrieve the correct cached model in Neuron config checker for Neuron Backend Retrieve the correct cached model batch size in Neuron config checker for Neuron Backend Jul 19, 2025