@shanjiaz shanjiaz commented Oct 6, 2025

We were loading the full verifier model only to get the embeddings and a few other specific layers. This util saves memory by loading only the shards that are needed, and returns a dict of layer name -> weight tensor.
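As a rough illustration of the idea (not the code in this PR), the sketch below picks out only the shard files that hold the requested tensors. It assumes the checkpoint is sharded and ships the usual `model.safetensors.index.json`, whose `weight_map` maps each tensor name to a shard filename. The helper name `shards_for_layers` is hypothetical.

```python
import json
from collections import defaultdict


def shards_for_layers(index_path, layer_names):
    """Group requested tensor names by the shard file that stores them.

    Hypothetical helper: assumes `index_path` points to a
    `model.safetensors.index.json` file with a `weight_map` entry.
    """
    with open(index_path, encoding="utf-8") as f:
        weight_map = json.load(f)["weight_map"]

    # Fail early if a requested tensor is not listed in the index.
    missing = [name for name in layer_names if name not in weight_map]
    if missing:
        raise KeyError(f"tensors not found in index: {missing}")

    # Only shards that contain at least one requested tensor appear here,
    # so the remaining shards never need to be downloaded or opened.
    by_shard = defaultdict(list)
    for name in layer_names:
        by_shard[weight_map[name]].append(name)
    return dict(by_shard)
```

Each returned shard could then be opened with `safetensors.safe_open(path, framework="pt")` and read tensor by tensor with `get_tensor(name)`, so that weights outside the requested set are never materialized. Whether `load_model_layers` works exactly this way is not shown in this thread.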

Tested locally:

# Load by Hugging Face Hub repo id
model_path = "shanjiaz/Meta-Llama-3-8B-Instruct-FP8-BLOCK"
layer_names = ["lm_head.weight", "model.embed_tokens.weight"]
layer = load_model_layers(layer_names, model_path)
for k, v in layer.items():
    print(k, v.shape)
    print(f"sample data: {v.flatten()[:5]}")

# Load from a local snapshot directory
model_path = "/home/hzhao/.cache/huggingface/hub/models--shanjiaz--Meta-Llama-3-8B-Instruct-FP8-BLOCK/snapshots/ea6d7c1a6a0874d9db6511ce93da2b777f24376f"
layer_names = ["lm_head.weight", "model.embed_tokens.weight"]
layer = load_model_layers(layer_names, model_path)
for k, v in layer.items():
    print(k, v.shape)
    print(f"sample data: {v.flatten()[:5]}")
2025-10-06 14:02:21.042 | INFO     | __main__:_resolve_file:64 - Loading from local directory:
2025-10-06 14:02:21.043 | INFO     | __main__:_resolve_file:64 - Loading from local directory:
2025-10-06 14:02:21.044 | INFO     | __main__:_resolve_file:64 - Loading from local directory:
lm_head.weight torch.Size([128256, 4096])
sample data: tensor([ 0.0098,  0.0175,  0.0037,  0.0222, -0.0194], dtype=torch.bfloat16)
model.embed_tokens.weight torch.Size([128256, 4096])
sample data: tensor([ 0.0013,  0.0054, -0.0022,  0.0003, -0.0024], dtype=torch.bfloat16)
2025-10-06 14:02:52.385 | INFO     | __main__:_resolve_file:70 - Loading from huggingface directory:
2025-10-06 14:02:52.468 | INFO     | __main__:_resolve_file:70 - Loading from huggingface directory:
2025-10-06 14:02:52.508 | INFO     | __main__:_resolve_file:70 - Loading from huggingface directory:
lm_head.weight torch.Size([128256, 4096])
sample data: tensor([ 0.0098,  0.0175,  0.0037,  0.0222, -0.0194], dtype=torch.bfloat16)
model.embed_tokens.weight torch.Size([128256, 4096])
sample data: tensor([ 0.0013,  0.0054, -0.0022,  0.0003, -0.0024], dtype=torch.bfloat16)

github-actions bot commented Oct 6, 2025

📦 Build Artifacts Available
The build artifacts (`.whl` and `.tar.gz`) have been successfully generated and are available for download: https://github.com/vllm-project/speculators/actions/runs/18291068632/artifacts/4195955811.
They will be retained for up to 30 days.
Commit: e251fc9

Signed-off-by: shanjiaz <[email protected]>
@shanjiaz shanjiaz marked this pull request as ready for review October 6, 2025 18:53