
fix: security hardening — unsafe deserialization, RCE, input validation#263

Open
devatsecure wants to merge 2 commits into microsoft:main from devatsecure:fix/security-hardening

Conversation


@devatsecure devatsecure commented Mar 28, 2026

Summary

Security hardening addressing 6 vulnerabilities found during automated security review using multiple AI models and static analysis.

Changes (6 fixes across 4 files)

1. torch.load → weights_only=True (CWE-502 — Medium)

| File | Line |
| --- | --- |
| demo/web/app.py | 161 |
| demo/realtime_model_inference_from_file.py | 225 |
| vibevoice/scripts/convert_nnscaler_checkpoint_to_transformers.py | 32 |

With weights_only=False, torch.load can execute arbitrary code via pickle deserialization when given a malicious .pt file. Changed to weights_only=True so only tensor data is deserialized.

Note: If voice preset .pt files contain non-tensor objects, this may need adjustment. Please verify with your voice preset files.
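A minimal sketch of the hardened load (the wrapper name `load_voice_preset` is hypothetical; the PR applies the change inline at the lines listed above):

```python
import torch

def load_voice_preset(path: str):
    """Load a .pt voice preset without executing arbitrary pickle code."""
    # weights_only=True restricts unpickling to tensors and primitive
    # containers, so a crafted .pt file cannot run code on load. Presets
    # that store arbitrary Python objects will raise here instead.
    return torch.load(path, map_location="cpu", weights_only=True)
```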

2. --trust-remote-code made opt-in (CWE-94 — Medium)

  • Removed from default vLLM command in _build_vllm_cmd()
  • Added --trust-remote-code CLI flag (default: False)

The default model (microsoft/VibeVoice-ASR) is from a trusted source, but hardcoding --trust-remote-code is a risk if users point --model at untrusted repos.

3. shell=True removed (CWE-78 — Low)

  • run_command() now always uses shell=False
  • All current callers pass command lists, so shell=True was unnecessary
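The shape of the hardened helper, sketched (the real `run_command` may handle logging and errors differently):

```python
import subprocess
import sys

def run_command(cmd: list) -> int:
    """Run a command given as an argv list; shell metacharacters are inert."""
    # With shell=False each list element is passed to the process as a
    # literal argument, so input like "; rm -rf ~" is just text, never
    # interpreted by a shell.
    return subprocess.run(cmd, shell=False).returncode
```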

4. WebSocket text size limit (CWE-770 — Low)

  • Added 10K character limit on /stream endpoint text parameter
  • Returns WebSocket close code 1008 for oversized inputs
  • Prevents excessive GPU inference from large text inputs
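The guard can be factored as a pure check (function name hypothetical); in the actual `/stream` handler the rejection would be `await websocket.close(code=1008)`:

```python
MAX_TEXT_CHARS = 10_000      # limit added on the /stream endpoint
WS_POLICY_VIOLATION = 1008   # RFC 6455 "policy violation" close code

def oversized_close_code(text: str):
    """Return 1008 if the text exceeds the limit, else None (accept)."""
    if len(text) > MAX_TEXT_CHARS:
        return WS_POLICY_VIOLATION
    return None
```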

5. vLLM server API key authentication (CWE-306 — Medium)

  • Added --api-key CLI flag and VLLM_API_KEY env var support
  • Propagated through _build_vllm_cmd, start_vllm_server, start_dp_server
  • Without auth, anyone on the network can access the inference API
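The precedence between the CLI flag and the env var can be sketched like this (helper name hypothetical; the resolved key would then be appended to the vLLM command as `--api-key <value>`):

```python
import os

def resolve_api_key(cli_value=None):
    """--api-key on the CLI wins; otherwise fall back to VLLM_API_KEY."""
    # Returns None when neither is set, i.e. the server stays unauthenticated
    # only if the operator provided no key at all.
    return cli_value or os.environ.get("VLLM_API_KEY")
```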

6. Config path traversal guard (CWE-22 — Low)

  • Sanitized init_config_name in checkpoint conversion script using Path.name + validation
  • Prevents ../ sequences from escaping the configs/ directory
  • Config name originates from untrusted checkpoint data

Test Plan

  • Verify voice preset loading works with weights_only=True
  • Verify server starts correctly without --trust-remote-code (may need the flag for custom model architectures)
  • Verify run_command() callers work without shell=True
  • Verify WebSocket connections with text > 10K chars are rejected
  • Verify --api-key / VLLM_API_KEY enables token-based auth on vLLM endpoints
  • Verify checkpoint conversion works with path sanitization
  • Verify normal TTS generation still works end-to-end

References

Found by Argus Security — AI-powered 6-phase security pipeline.

…on, input validation

1. torch.load(weights_only=False) → weights_only=True (CWE-502)
   - demo/web/app.py: voice preset loading
   - demo/realtime_model_inference_from_file.py: voice sample loading
   - vibevoice/scripts/convert_nnscaler_checkpoint_to_transformers.py: checkpoint loading
   Prevents arbitrary code execution via malicious .pt/.pth files.

2. --trust-remote-code made opt-in instead of default (CWE-94)
   - vllm_plugin/scripts/start_server.py: removed from default vLLM command
   - Added --trust-remote-code CLI flag (default: False)
   Prevents automatic execution of remote Python code from model repositories.

3. subprocess shell=True removed (CWE-78)
   - vllm_plugin/scripts/start_server.py: run_command() now always uses shell=False
   Eliminates command injection vector.

4. WebSocket text size limit added (CWE-770)
   - demo/web/app.py: 10K char limit on /stream endpoint text parameter
   Prevents denial of service via excessive GPU inference from oversized inputs.

Found by: Argus Security (https://github.com/devatsecure/Argus-Security)
@devatsecure
Author

@microsoft-github-policy-service agree

5. Missing vLLM server authentication (CWE-306)
   - Added --api-key CLI flag and VLLM_API_KEY env var support
   - Propagated through _build_vllm_cmd, start_vllm_server, start_dp_server
   Without auth, anyone on the network can access the inference API.

6. Config path traversal in checkpoint conversion (CWE-22)
   - Sanitized init_config_name using Path.name + validation
   - Prevents "../" sequences from escaping the configs directory
   Config name comes from untrusted checkpoint data.

Found by: Argus Security (https://github.com/devatsecure/Argus-Security)
