fix: load voice presets with a restricted unpickler (CWE-502) by officialasishkumar · Pull Request #397 · microsoft/VibeVoice

officialasishkumar · 2026-05-25T00:23:51Z

Summary

The voice-preset loaders in `demo/web/app.py` and `demo/realtime_model_inference_from_file.py` crash on startup on PyTorch >= 2.6, which makes both the streaming web demo and the realtime file demo unusable (see #392, "App/Demo exception since fix: use weights_only=True (CWE-502)").
Load the presets through a restricted `pickle.Unpickler` that keeps the CWE-502 protection while still being able to reconstruct the preset objects.

Problem

Commit 303b283 changed both loaders to `torch.load(..., weights_only=True)` inside a `torch.serialization.safe_globals([BaseModelOutputWithPast, DynamicCache])` context to close a CWE-502 risk (arbitrary code execution from a tampered `.pt` file).

The presets are dicts of `transformers.modeling_outputs.BaseModelOutputWithPast` objects, each holding a `last_hidden_state` tensor and a `DynamicCache`. `BaseModelOutputWithPast` is itself a `dict` subclass (through transformers' `ModelOutput`, which subclasses `OrderedDict`), and PyTorch's weights-only unpickler refuses the `SETITEMS` opcode on `dict` subclasses: it accepts only the exact `dict`, `OrderedDict` and `Counter` types, even when the subclass is allowlisted via `safe_globals`. The load can therefore never succeed and aborts on startup with:

_pickle.UnpicklingError: Weights only load failed. ...
Can only SETITEMS for dict, collections.OrderedDict, collections.Counter,
but got <class 'transformers.modeling_outputs.BaseModelOutputWithPast'>
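The `SETITEMS` in that error is emitted by ordinary pickling. As a stdlib-only sketch (`FakeModelOutput` is a hypothetical stand-in for `BaseModelOutputWithPast`, which is likewise an `OrderedDict` subclass), pickling such an object with protocol 2, the default for `torch.save`, references the class with a `GLOBAL` opcode and then fills the instance with `SETITEMS`. A plain `dict` produces the same opcode; what the weights-only unpickler objects to is the type of the object `SETITEMS` is applied to.

```python
import pickle
import pickletools
from collections import OrderedDict


class FakeModelOutput(OrderedDict):
    """Hypothetical stand-in for an OrderedDict subclass such as BaseModelOutputWithPast."""


def opcode_names(obj, protocol=2):
    """Pickle obj and return the names of the opcodes in the resulting stream."""
    data = pickle.dumps(obj, protocol=protocol)
    return [op.name for op, _arg, _pos in pickletools.genops(data)]


preset = {"voice": FakeModelOutput(last_hidden_state=[0.1, 0.2], past_key_values=None)}
ops = opcode_names(preset)

# The subclass is referenced by a GLOBAL opcode and populated via SETITEMS.
print("GLOBAL" in ops, "SETITEMS" in ops)  # prints: True True
```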

`BaseModelOutputWithPast` has to stay an object (the model accesses `outputs.past_key_values` and `outputs.last_hidden_state`), so the presets cannot simply be flattened to plain tensors.

Fix

Add a shared `load_voice_preset` helper in `vibevoice/processor/vibevoice_streaming_processor.py` (imported by both demos) that loads presets via a restricted `pickle.Unpickler`:

- `find_class` resolves only the container classes the presets are built from (`OrderedDict`, `BaseModelOutputWithPast`, `DynamicCache`) and torch's tensor-rebuilding primitives (`torch._utils._rebuild_*`, `torch.*Storage`). Any other global, e.g. `os.system`, raises `UnpicklingError`, so a tampered preset still cannot execute arbitrary code and the CWE-502 protection is preserved.
- The same restriction is applied to the module-level `load` / `loads` functions that `torch.load` uses for legacy-format metadata, so both the zip and legacy serialization formats stay safe.

Because the restricted unpickler follows standard pickle semantics, `SETITEMS` on `BaseModelOutputWithPast` works and the objects load correctly. The now-unused `BaseModelOutputWithPast` / `DynamicCache` imports are removed from both demos.
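That standard pickle semantics handle `SETITEMS` on a `dict` subclass can be checked without torch or transformers. In this sketch `MiniOutput` is a hypothetical stand-in that, like transformers' `ModelOutput`, mirrors each key as an attribute in `__setitem__`, and `AllowlistUnpickler` (also hypothetical) resolves only that one class:

```python
import io
import pickle
from collections import OrderedDict


class MiniOutput(OrderedDict):
    """Hypothetical stand-in for ModelOutput: every key is mirrored as an attribute."""

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        super().__setattr__(key, value)


class AllowlistUnpickler(pickle.Unpickler):
    """Resolves only MiniOutput; every other global is refused."""

    def find_class(self, module, name):
        if (module, name) == (MiniOutput.__module__, MiniOutput.__qualname__):
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is not allowed")


out = MiniOutput()
out["last_hidden_state"] = [0.1, 0.2]
out["past_key_values"] = "cache"

data = pickle.dumps(out, protocol=2)
restored = AllowlistUnpickler(io.BytesIO(data)).load()

# The standard unpickler rebuilds the subclass and fills it via SETITEMS,
# so attribute access and item access both work afterwards.
print(restored.last_hidden_state, restored["past_key_values"])  # prints: [0.1, 0.2] cache
```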

Test plan

Validated on PyTorch 2.12 + transformers 4.51.3:

- Reproduced #392: the bundled presets fail to load under `weights_only=True` + `safe_globals` with the exact `SETITEMS` error.
- All 25 bundled `demo/voices/streaming_model/*.pt` files load via `load_voice_preset` and match a trusted full unpickle exactly (`torch.equal` holds for every `last_hidden_state` and every cache key/value tensor).
- Security: a `.pt` whose payload calls `os.system` is rejected with `UnpicklingError` in both the zip and legacy formats, and the side effect never runs.
- Downstream attribute access (`outputs.past_key_values`, `outputs.last_hidden_state`) and item access (`outputs["last_hidden_state"]`) keep working.
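The equality check in the second test item can be sketched as a recursive comparison. This hypothetical `deep_equal` helper is not the PR's test code: it walks mappings, lists/tuples and plain objects (such as a cache's attribute dict) and delegates leaves to a caller-supplied predicate, so against real presets one would pass `is_leaf=lambda x: isinstance(x, torch.Tensor)` and `leaf_eq=torch.equal`. The demo uses plain Python values so it runs without torch.

```python
import copy
from collections import OrderedDict
from collections.abc import Mapping
from types import SimpleNamespace


def deep_equal(a, b, is_leaf=lambda x: False, leaf_eq=lambda x, y: x == y):
    """Recursively compare two loaded presets (hypothetical helper).

    Leaves selected by is_leaf (e.g. tensors) are compared with leaf_eq
    (e.g. torch.equal); mappings, lists/tuples and plain objects are walked.
    """
    if type(a) is not type(b):
        return False
    if is_leaf(a):
        return bool(leaf_eq(a, b))
    if isinstance(a, Mapping):
        return list(a.keys()) == list(b.keys()) and all(
            deep_equal(a[k], b[k], is_leaf, leaf_eq) for k in a
        )
    if isinstance(a, (list, tuple)):
        return len(a) == len(b) and all(
            deep_equal(x, y, is_leaf, leaf_eq) for x, y in zip(a, b)
        )
    if hasattr(a, "__dict__"):
        # Plain objects such as a cache: compare their attribute dicts.
        return deep_equal(vars(a), vars(b), is_leaf, leaf_eq)
    return bool(a == b)


# Plain-Python stand-ins for a trusted load and a load via the restricted unpickler.
trusted = {
    "voice": OrderedDict(
        last_hidden_state=[0.5, 1.5],
        past_key_values=SimpleNamespace(key_cache=[[1.0]], value_cache=[[2.0]]),
    )
}
restricted = copy.deepcopy(trusted)
modified = copy.deepcopy(trusted)
modified["voice"]["past_key_values"].value_cache[0][0] = 9.0

print(deep_equal(trusted, restricted), deep_equal(trusted, modified))  # prints: True False
```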

Closes #392


officialasishkumar wants to merge 1 commit into `microsoft:main` from `officialasishkumar:fix/voice-preset-weights-only-load`.