added loading util for specific layers #144

shanjiaz · 2025-10-06T18:17:24Z

We were loading the full verifier model just to get a few specific layers, such as the embeddings. This util saves memory by loading only the shards that contain the requested layers, and it returns a dict mapping layer name -> tensor.
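
The core idea is to read the checkpoint index to find which shards hold the requested tensors, then open only those shards. The sketch below illustrates this. It is a minimal sketch, not the util in this PR. It assumes a sharded checkpoint in a local directory with a standard `model.safetensors.index.json` and does not handle single-file checkpoints without an index. The helper names `read_weight_map`, `group_by_shard`, and `load_layers_from_dir` are hypothetical.

```python
import json
import os
from collections import defaultdict

# Hypothetical helpers illustrating shard-selective loading; not the PR's actual code.


def read_weight_map(index_path):
    """Read the tensor-name -> shard-filename map from a sharded checkpoint index."""
    with open(index_path) as f:
        return json.load(f)["weight_map"]


def group_by_shard(weight_map, layer_names):
    """Group the requested tensor names by the shard file that stores them."""
    missing = [name for name in layer_names if name not in weight_map]
    if missing:
        raise KeyError(f"not found in checkpoint index: {missing}")
    shards = defaultdict(list)
    for name in layer_names:
        shards[weight_map[name]].append(name)
    return dict(shards)


def load_layers_from_dir(model_dir, layer_names):
    """Load only the requested tensors, opening only the shards that hold them."""
    # Imported here so the helpers above work without safetensors/torch installed.
    from safetensors import safe_open

    weight_map = read_weight_map(os.path.join(model_dir, "model.safetensors.index.json"))
    layers = {}
    for shard, names in group_by_shard(weight_map, layer_names).items():
        # get_tensor loads only the named tensor, not the whole shard.
        with safe_open(os.path.join(model_dir, shard), framework="pt", device="cpu") as f:
            for name in names:
                layers[name] = f.get_tensor(name)
    return layers
```

Shards that contain none of the requested names are never opened, which is where the memory savings come from.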

Tested locally:

```python
# load_model_layers is the util added in this PR.

# Load from a Hugging Face Hub repo id
model_path = "shanjiaz/Meta-Llama-3-8B-Instruct-FP8-BLOCK"
layer_names = ["lm_head.weight", "model.embed_tokens.weight"]
layer = load_model_layers(layer_names, model_path)
for k, v in layer.items():
    print(k, v.shape)
    print(f"sample data: {v.flatten()[:5]}")

# Load from a local snapshot directory
model_path = "/home/hzhao/.cache/huggingface/hub/models--shanjiaz--Meta-Llama-3-8B-Instruct-FP8-BLOCK/snapshots/ea6d7c1a6a0874d9db6511ce93da2b777f24376f"
layer_names = ["lm_head.weight", "model.embed_tokens.weight"]
layer = load_model_layers(layer_names, model_path)
for k, v in layer.items():
    print(k, v.shape)
    print(f"sample data: {v.flatten()[:5]}")
```

```
2025-10-06 14:02:21.042 | INFO     | __main__:_resolve_file:64 - Loading from local directory:
2025-10-06 14:02:21.043 | INFO     | __main__:_resolve_file:64 - Loading from local directory:
2025-10-06 14:02:21.044 | INFO     | __main__:_resolve_file:64 - Loading from local directory:
lm_head.weight torch.Size([128256, 4096])
sample data: tensor([ 0.0098,  0.0175,  0.0037,  0.0222, -0.0194], dtype=torch.bfloat16)
model.embed_tokens.weight torch.Size([128256, 4096])
sample data: tensor([ 0.0013,  0.0054, -0.0022,  0.0003, -0.0024], dtype=torch.bfloat16)
2025-10-06 14:02:52.385 | INFO     | __main__:_resolve_file:70 - Loading from huggingface directory:
2025-10-06 14:02:52.468 | INFO     | __main__:_resolve_file:70 - Loading from huggingface directory:
2025-10-06 14:02:52.508 | INFO     | __main__:_resolve_file:70 - Loading from huggingface directory:
lm_head.weight torch.Size([128256, 4096])
sample data: tensor([ 0.0098,  0.0175,  0.0037,  0.0222, -0.0194], dtype=torch.bfloat16)
model.embed_tokens.weight torch.Size([128256, 4096])
sample data: tensor([ 0.0013,  0.0054, -0.0022,  0.0003, -0.0024], dtype=torch.bfloat16)
```
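The log lines come from a `_resolve_file` helper that accepts either a local snapshot directory or a Hub repo id. The following is a minimal sketch of that kind of dispatch, assuming the `huggingface_hub` package for the remote case. The name `resolve_file` and its signature are hypothetical and are not the PR's actual code.

```python
import os


def resolve_file(model_path, filename):
    """Hypothetical resolver: return a local path to `filename`.

    `model_path` is either a local snapshot directory or a Hugging Face Hub repo id.
    """
    if os.path.isdir(model_path):
        # Local snapshot directory: read the file in place.
        return os.path.join(model_path, filename)
    # Otherwise treat model_path as a Hub repo id. hf_hub_download fetches only this
    # one file (or reuses the cached copy) and returns its local path.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=model_path, filename=filename)
```

Resolving one file at a time means only the index and the shards that are actually needed are downloaded, instead of the whole checkpoint.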

Signed-off-by: shanjiaz <[email protected]>

github-actions · 2025-10-06T18:19:56Z

📦 Build Artifacts Available
The build artifacts (`.whl` and `.tar.gz`) have been successfully generated and are available for download: https://github.com/vllm-project/speculators/actions/runs/18291068632/artifacts/4195955811.
They will be retained for up to 30 days.
Commit: e251fc9

- 4ef4362 added loading util for specific layers

shanjiaz added 4 commits October 6, 2025 14:34

- 47b921b quality
- 405b9b1 end of file
- 9affbbc type
- e251fc9 type

Each commit is signed off by shanjiaz.

shanjiaz marked this pull request as ready for review October 6, 2025 18:53
