Skip to content

Conversation

@dacorvo
Copy link
Collaborator

@dacorvo dacorvo commented Nov 28, 2025

What does this PR do?

This pull-request adds utilities to read and log neuron device memory usage.
This pull request also refactors and simplifies the sharding logic for attention heads and key/value heads across the codebase, improves logging for cache allocation and head adjustment.
Finally it cleans up unused or redundant code in the Makefile, adding an install target and using uv whenever possible.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds utilities for reading and logging Neuron device memory usage, refactors the attention head sharding logic to be more straightforward, and cleans up the Makefile to better utilize uv for package management.

Key changes:

  • Introduces neuron_device_memory.py module to read device memory statistics from Neuron sysfs
  • Simplifies GQA sharding strategy determination by eliminating the desired_sharding_strategy parameter and consolidating logic into a single get_shardable_head_counts() function
  • Removes unused parameters (scatter_index, kvcache_buffer, inputs_embeds) from the decoder and KV cache manager interfaces
  • Adds logging for KV cache allocation and head count adjustments
  • Updates Makefile with new install target and better uv integration

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
optimum/neuron/version.py Bumps development version from 0.4.3.dev0 to 0.4.3.dev1
optimum/neuron/utils/neuron_device_memory.py Adds new utility module to read and format Neuron device memory usage from sysfs
optimum/neuron/models/inference/backend/pretrained_model.py Adds device memory logging after weight initialization
optimum/neuron/models/inference/backend/modules/kvcache/kv_cache_manager.py Refactors to accept actual KV head count directly, removes unused parameters, adds cache size logging
optimum/neuron/models/inference/backend/modules/decoder/modeling_decoder.py Integrates refactored head count logic, removes unused parameters and methods
optimum/neuron/models/inference/backend/modules/attention/gqa.py Consolidates sharding strategy logic into get_shardable_head_counts(), removes desired_sharding_strategy parameter, adds logging for head adjustments
Makefile Removes unused targets, adds install target with uv support, simplifies build command

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

# Installation
install: $(PACKAGE_DIST)
# Force CPU version of torch to speed up installation time
$(UV) pip install --upgrade $(PACKAGE_DIST)[neuronx] \
Copy link

Copilot AI Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The install target unconditionally uses $(UV) without checking if it exists. If uv is not installed, this will fail. Consider adding a check similar to line 41, or use $(if $(UV), $(UV) pip, python -m pip) to fall back to standard pip when uv is not available.

Suggested change
$(UV) pip install --upgrade $(PACKAGE_DIST)[neuronx] \
$(if $(UV),$(UV) pip,python -m pip) install --upgrade $(PACKAGE_DIST)[neuronx] \

Copilot uses AI. Check for mistakes.
Comment on lines 53 to 57
try:
with open(file_path, "r") as f:
return f.read().strip()
except (FileNotFoundError, PermissionError) as e:
raise e
Copy link

Copilot AI Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The exception handling in lines 56-57 catches exceptions only to re-raise them immediately. This serves no purpose and should be simplified to just let the exceptions propagate naturally by removing the try-except block entirely.

Suggested change
try:
with open(file_path, "r") as f:
return f.read().strip()
except (FileNotFoundError, PermissionError) as e:
raise e
with open(file_path, "r") as f:
return f.read().strip()

Copilot uses AI. Check for mistakes.
for category in DEVICE_MEM_CATEGORIES:
try:
categories[category] = read_memory_counter(device_mem_path, category)
except Exception:
Copy link

Copilot AI Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a bare except Exception: is too broad and will silently skip categories that fail for any reason, making debugging difficult. Consider either logging the exception or being more specific about which exceptions to catch (e.g., FileNotFoundError, PermissionError, ValueError).

Suggested change
except Exception:
except (FileNotFoundError, PermissionError, ValueError):

Copilot uses AI. Check for mistakes.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the comments, plus I would add a constant EXPECTED_EXCEPTIONS = (FileNotFoundError, PermissionError, ValueError) to avoid repeating over and over

weights = sharder.shard_checkpoint()
start_rank_tensor = torch.tensor([start_rank_id], dtype=torch.int32, device="cpu")
self._traced_model.nxd_model.initialize(weights, start_rank_tensor)
logger.info(str(get_neuron_device_memory()))
Copy link

Copilot AI Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The call to get_neuron_device_memory() will raise a RuntimeError if the Neuron sysfs is not found. This could cause weight loading to fail even when weights are loaded successfully. Consider wrapping this in a try-except block to log the memory usage only when available, or handle the case when Neuron driver is not present gracefully.

Suggested change
logger.info(str(get_neuron_device_memory()))
try:
logger.info(str(get_neuron_device_memory()))
except RuntimeError as e:
logger.info(f"Neuron device memory information unavailable: {e}")

Copilot uses AI. Check for mistakes.
Comment on lines 120 to 123
:param position_ids: tensor of size (batch_sz, seq_len)
:param new_key_values: list of tuple, the latest kv obtained at the end of the network from forward pass
:param seq_len: sequence length
:param scatter_index: tensor representing index to update
:param active_mask: tensor representing index to update
:param kvcache_buffer: if passed key states are updates to this buffer.
kvcache_buffer is 2D list where, 1st dim for layer and the second denotes K and V.
For example,
kvcache_buffer[1][0] is the K cache of the 1st layer
kvcache_buffer[4][1] is the V cache of the 4th layer
:return: list of tuple of (K, V)
Copy link

Copilot AI Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring references a removed parameter scatter_index at line 117. This outdated documentation should be removed as scatter_index is no longer a parameter of the update_cache method.

Copilot uses AI. Check for mistakes.
Copy link
Collaborator

@tengomucho tengomucho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot! Few comments, otherwise seems good!

SHELL := /bin/bash
CURRENT_DIR = $(shell pwd)
UV = $(shell command -v uv)
UV_ARGS = $(if $(UV),--index-strategy unsafe-best-match)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already assume a given system is available (the AMI for example). If that is the case, I think we could install uv there and just use it, without adding additional checks

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uv is not installed by default on the AWS DLAMI, which I personally often use when bumping versions.

for category in DEVICE_MEM_CATEGORIES:
try:
categories[category] = read_memory_counter(device_mem_path, category)
except Exception:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the comments, plus I would add a constant EXPECTED_EXCEPTIONS = (FileNotFoundError, PermissionError, ValueError) to avoid repeating over and over

@dacorvo dacorvo force-pushed the device_memory_logs branch 2 times, most recently from a90fe5d to 203b5b4 Compare December 1, 2025 09:25
@dacorvo dacorvo force-pushed the device_memory_logs branch from 203b5b4 to 0b97e83 Compare December 1, 2025 10:01
@dacorvo dacorvo requested review from Copilot and tengomucho December 1, 2025 10:40
Copilot finished reviewing on behalf of dacorvo December 1, 2025 10:43
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +133 to +134
except Exception:
# Skip categories that can't be read
Copy link

Copilot AI Dec 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bare except Exception: on line 133 silently catches all exceptions. Consider logging a warning when a category cannot be read, or at least being more specific about which exceptions to catch (e.g., OSError, ValueError) to avoid masking unexpected errors.

Suggested change
except Exception:
# Skip categories that can't be read
except (OSError, ValueError) as e:
logger.warning(f"Could not read memory category '{category}' in {device_mem_path}: {e}")

Copilot uses AI. Check for mistakes.
Copy link
Collaborator

@tengomucho tengomucho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some nits, otherwise LGTM!

@@ -0,0 +1,12 @@
import pytest

from optimum.neuron.utils.neuron_device_memory import get_neuron_device_memory
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not really a decoder test, but I understand it had to be placed somewhere

assert neuron_device_memory is not None
assert len(neuron_device_memory.devices) == get_neuron_devices_count()
assert neuron_device_memory.get_total_memory() > 0
assert neuron_device_memory.get_total_memory() >= 0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when is it going to be 0?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the CI, it looks like it is zero when no other tests have been run

@dacorvo dacorvo merged commit 433407f into main Dec 1, 2025
10 of 11 checks passed
@dacorvo dacorvo deleted the device_memory_logs branch December 1, 2025 12:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants