D3D11 VERIFY_OK aborts on GPU device-lost (DXGI_ERROR_DEVICE_REMOVED) — Flutter Windows #86

@unspokenlanguage

Summary
The Rive Flutter runtime on Windows crashes with an unrecoverable abort() when the GPU device is lost during fence synchronization. This happens during common user scenarios like sleep/wake, GPU TDR (Timeout Detection & Recovery), and driver resets. The crash originates from VERIFY_OK macros wrapping ID3D11Fence::SetEventOnCompletion() and ID3D11DeviceContext4::Signal() in rive_native_windows.cpp, which call abort() on any non-S_OK HRESULT, including DXGI_ERROR_DEVICE_REMOVED, a legitimate runtime condition.

Environment
Rive Flutter Runtime: rive_native (custom fork, D3D11 backend)
Platform: Windows 10/11, D3D11 with ID3D11Device5 fence path
Flutter: 3.38+
GPU: Reproduced on both NVIDIA and Intel adapters

Reproduction Steps
1. Run a Flutter app using the Rive renderer on Windows.
2. Start rendering animations (one or more RiveWidget instances).
3. Put the machine to sleep (Win+X → Sleep) or trigger a GPU TDR (e.g., heavy GPU load causing a driver reset).
4. Wake the machine.

Result: the app crashes immediately.

Crash Log
........\platform\windows\rive_native_windows.cpp:220:
D3D error unknown error: m_lastFrameFence->SetEventOnCompletion(m_activeFenceWaitIndex, nullptr)
Lost connection to device.

Root Cause
The crash occurs in WindowsContextPLS::fenceWaitThread():

cpp
// rive_native_windows.cpp — fenceWaitThread()
VERIFY_OK(m_lastFrameFence->SetEventOnCompletion(m_activeFenceWaitIndex, nullptr));

And in WindowsContextPLS::end():

cpp
VERIFY_OK(m_gpuContext4->Signal(m_lastFrameFence.Get(), ++m_lastFrameIndex));

VERIFY_OK calls abort() on failure. When the GPU device is removed, these D3D calls return DXGI_ERROR_DEVICE_REMOVED (0x887A0005), which the DXGI documentation describes as an expected runtime condition rather than a programming error. Applications are expected to detect it and then either re-create the device and its resources or degrade gracefully, for example by stopping rendering and notifying the user.
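A non-aborting check could separate device-lost results from other failures before deciding what to do. The following is a minimal sketch, not the runtime's actual code: `HResult`, `GpuCallResult`, `failed` and `classify` are hypothetical names, the constants mirror the documented DXGI values so the snippet builds without `<windows.h>`, and the choice of which codes count as "device lost" is an assumption.

```cpp
#include <cstdint>

// Stand-in for the Win32 HRESULT type so the sketch builds without <windows.h>.
using HResult = std::int32_t;

// Documented DXGI values; real code would take these from <dxgi.h>.
constexpr HResult kOk = 0;                                                      // S_OK
constexpr HResult kDeviceRemoved = static_cast<HResult>(0x887A0005u);           // DXGI_ERROR_DEVICE_REMOVED
constexpr HResult kDeviceHung = static_cast<HResult>(0x887A0006u);              // DXGI_ERROR_DEVICE_HUNG
constexpr HResult kDeviceReset = static_cast<HResult>(0x887A0007u);             // DXGI_ERROR_DEVICE_RESET
constexpr HResult kDriverInternalError = static_cast<HResult>(0x887A0020u);     // DXGI_ERROR_DRIVER_INTERNAL_ERROR

enum class GpuCallResult { Ok, DeviceLost, OtherFailure };

// Same test as the Win32 FAILED() macro: failure codes are negative.
constexpr bool failed(HResult hr) { return hr < 0; }

// Maps an HRESULT to a coarse outcome instead of aborting.
// Assumption: all four codes below are treated as a lost device.
constexpr GpuCallResult classify(HResult hr) {
    if (!failed(hr)) {
        return GpuCallResult::Ok;
    }
    switch (hr) {
        case kDeviceRemoved:
        case kDeviceHung:
        case kDeviceReset:
        case kDriverInternalError:
            return GpuCallResult::DeviceLost;
        default:
            return GpuCallResult::OtherFailure;
    }
}
```

With a helper like this, only `OtherFailure` would need to keep the existing abort path, while `DeviceLost` could route to recovery or a degraded mode.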

Why This Happens

| Scenario | Frequency | Impact |
| --- | --- | --- |
| Laptop sleep/wake | Very common | App crashes on wake, user loses session |
| GPU driver update/reset | Occasional | App crashes during driver install |
| GPU TDR (long shader/compute) | Rare | App crashes under heavy GPU load |
| Remote Desktop attach/detach | Occasional | App crashes when GPU context changes |

Impact
Severity: Critical — unrecoverable crash, no user workaround
User experience: Users lose all unsaved work when their laptop sleeps. On production broadcast systems (our use case), this can cause live broadcast interruptions.
Affected users: All Windows users of the Rive Flutter runtime who use the D3D11 fence path (Windows 10 Creators Update 1703+, which is essentially all supported Windows machines)

Our Local Patch (Workaround)
We applied the following changes to our fork to prevent the crash:

1. Replace VERIFY_OK with HRESULT checks

diff
// fenceWaitThread()
- VERIFY_OK(m_lastFrameFence->SetEventOnCompletion(m_activeFenceWaitIndex, nullptr));
+ HRESULT hr = m_lastFrameFence->SetEventOnCompletion(m_activeFenceWaitIndex, nullptr);
+ if (FAILED(hr)) {
+     // Written on the fence thread and read on the render thread,
+     // so m_deviceLost should be a std::atomic<bool>.
+     m_deviceLost = true;
+     // Notify app, unblock main thread, exit fence thread
+     break;
+ }

diff
// end()
- VERIFY_OK(m_gpuContext4->Signal(m_lastFrameFence.Get(), ++m_lastFrameIndex));
+ HRESULT hrSignal = m_gpuContext4->Signal(m_lastFrameFence.Get(), ++m_lastFrameIndex);
+ if (FAILED(hrSignal)) {
+     // Stop submitting work; the guards in begin()/end() skip later frames.
+     m_deviceLost = true;
+     return;
+ }
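
The "unblock main thread, exit fence thread" step is only a comment in the diff above. One way it could look is sketched below in portable C++. `DeviceLostState` and `fenceLoop` are hypothetical names, and `waitOnFence` is a stand-in for the real SetEventOnCompletion() call, so this illustrates the pattern rather than the code in our fork.

```cpp
#include <atomic>
#include <condition_variable>
#include <cstdint>
#include <functional>
#include <mutex>

// Hypothetical helper shared by the fence thread and the render thread.
class DeviceLostState {
public:
    // Fence thread: record the loss and wake any thread blocked in waitForFrame().
    void markLost() {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_lost.store(true, std::memory_order_release);
        }
        m_cv.notify_all();
    }

    // Fence thread: record that the GPU finished frame `index`.
    void frameCompleted(std::uint64_t index) {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_completed = index;
        }
        m_cv.notify_all();
    }

    // Any thread: cheap check, usable as the guard in begin()/end().
    bool isLost() const { return m_lost.load(std::memory_order_acquire); }

    // Render thread: block until frame `index` completes or the device is lost.
    // Returns false if the device was lost.
    bool waitForFrame(std::uint64_t index) {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cv.wait(lock, [&] { return m_lost.load() || m_completed >= index; });
        return !m_lost.load();
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_cv;
    std::atomic<bool> m_lost{false};
    std::uint64_t m_completed = 0;
};

// Stand-in for the body of fenceWaitThread(). `waitOnFence` returns an
// HRESULT-style code: negative means failure, as with FAILED(hr).
inline void fenceLoop(DeviceLostState& state, std::uint64_t lastFrame,
                      const std::function<std::int32_t(std::uint64_t)>& waitOnFence) {
    for (std::uint64_t frame = 1; frame <= lastFrame; ++frame) {
        if (waitOnFence(frame) < 0) {
            state.markLost();  // replaces abort(): flag the loss and leave the loop
            break;
        }
        state.frameCompleted(frame);
    }
}
```

Setting the flag while holding the mutex avoids a missed wakeup: a render thread that has already checked the predicate cannot go to sleep between the store and the notify.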


2. Guard all render entry points

cpp
void begin(bool clear, uint32_t color) {
    if (m_deviceLost) return;
    // ...
}
void end(float devicePixelRatio) {
    if (m_deviceLost) return;
    // ...
}

3. Event-based notification to Dart
We used PostMessage(WM_APP + 0x52) from the fence thread → Win32 WndProc subclass → MethodChannel.InvokeMethod("onGpuDeviceLost") to notify the Dart layer, which then shows a user-facing toast.
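
The hop through the window message queue matters because Flutter expects platform channel calls to be made on the platform thread, not on the fence thread. The sketch below shows the same hand-off pattern in portable C++. `PlatformThreadQueue` is a hypothetical stand-in for the Win32 message queue: `post` plays the role of PostMessage, and `drain` plays the role of the WndProc subclass that forwards the event to the method channel.

```cpp
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <string>
#include <utility>

// Hypothetical stand-in for the Win32 message queue used in the patch.
class PlatformThreadQueue {
public:
    // Safe to call from any thread, e.g. the fence thread.
    void post(std::string event) {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_pending.push(std::move(event));
    }

    // Call on the platform thread only. Delivers every pending event to
    // `deliver` (in the patch: the method channel call) and returns the
    // number of events delivered.
    std::size_t drain(const std::function<void(const std::string&)>& deliver) {
        std::queue<std::string> batch;
        {
            // Take the pending events under the lock, deliver them outside it.
            std::lock_guard<std::mutex> lock(m_mutex);
            std::swap(batch, m_pending);
        }
        std::size_t delivered = 0;
        while (!batch.empty()) {
            deliver(batch.front());
            batch.pop();
            ++delivered;
        }
        return delivered;
    }

private:
    std::mutex m_mutex;
    std::queue<std::string> m_pending;
};
```

Delivering outside the lock keeps the fence thread from blocking on `post` while the platform thread is busy in the callback.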
