[pthreads] Broadcast fresh SAB to non-growing workers on memory.grow#1
[pthreads] Broadcast fresh SAB to non-growing workers on memory.grow#1ricardscandit wants to merge 2 commits into
Conversation
4964992 to
375eccb
Compare
Fix a pre-existing missing cross-thread notification in SHARED_MEMORY + ALLOW_MEMORY_GROWTH + PTHREADS, exposed deterministically on Chromium 149+ (reported in emscripten-core#27084) and latent on earlier engines. The original growMemViews check is the only synchronization between "thread A called wasmMemory.grow()" and "thread B is about to issue Atomics.* against HEAP*": function growMemViews() { if (wasmMemory.buffer != HEAP8.buffer) { updateMemoryViews(); } } That is an identity compare of a JS-engine getter accessed across threads, with no fence, message, Atomics.wait, or shared flag. The WebAssembly threads proposal does not require `Memory.prototype.buffer` to return a SAB that reflects a grow done on another thread — it is implementation-defined whether non-growing workers ever observe a fresh buffer. The code has therefore always assumed an implementation detail (re-query on every access) that several engines happen to provide; any engine that caches or memoizes the getter leaves non-growing workers holding a stale view, and the next Atomics.* at a post-grow address throws RangeError: Invalid atomic access index at which point the pthread worker dies uncaught and the calling code hangs. Chromium 149+ added that caching (almost certainly as a getter-side optimization), turning the latent race into a deterministic reproducer for any -pthread -sPROXY_TO_PTHREAD -sALLOW_MEMORY_GROWTH -sDISABLE_EXCEPTION_CATCHING=0 build that triggers a heap grow. Diagnosed by polling the worker's wasmMemory.buffer at the throw site across microtask + setTimeout(0,10,50,200,1000) + after an explicit updateMemoryViews(); on affected workers the buffer is pinned at the pre-grow byteLength and maxByteLength with growable=false. The growing thread, in contrast, sees a fresh SAB returned from wasmMemory.buffer immediately after grow(). Fix: replace the implicit `buffer != HEAP8.buffer` synchronization with an explicit broadcast of the fresh SAB to every other pthread worker. * src/lib/libcore.js: after a successful wasmMemory.grow() in $growMemory, publish the fresh SAB to `freshSharedBuffer` (declared in runtime_common.js) and broadcast it via postMessage as `{cmd: 'memBufferRefresh', buffer: newBuf}` — from a pthread worker to the main thread, and from the main thread directly to every worker in PThread.pthreads. Broadcast block is wrapped in `#if PTHREADS` so it isn't generated as an empty try/catch in WASM_WORKERS-only builds. * src/runtime_pthread.js: worker handleMessage stashes the SAB and calls updateMemoryViews() on 'memBufferRefresh'. Handled before the CMD_LOAD arm so it works for pre-runtime-init workers too. * src/lib/libpthread.js: main-thread worker.onmessage refreshes its own views on 'memBufferRefresh' from any worker, then fans the message out to every other worker. * src/runtime_common.js: declares `freshSharedBuffer` next to growMemViews. growMemViews() and updateMemoryViews() consult it before falling back to wasmMemory.buffer, so views are built on the fresh SAB even when wasmMemory.buffer remains stale. All five edits are gated by the existing `#if SHARED_MEMORY && ALLOW_MEMORY_GROWTH && !GROWABLE_ARRAYBUFFERS` condition, so the fix is inert for any build that does not need it. The new message uses a string `cmd` so it sits alongside the existing numeric CMD_* (CMD_LOAD = 1 … CMD_CALL_HANDLER = 9) protocol without extending the numeric namespace. The three try/catch blocks (broadcast, worker receive, main-thread fan-out) log via `err()` on failure so a broken broadcast is observable instead of silent. Empirical validation on Chromium 149.0.7827.102 (headless), running three independent wasm32-pthread C++/GTest executables with stock INITIAL_MEMORY=64MB and MAXIMUM_MEMORY=300MB. Each executable spawns many short-lived pthread workers that allocate enough memory to force multiple cross-thread grows past INITIAL_MEMORY, and tens to thousands of inter-thread synchronization points exercising Atomics.waitAsync via __emscripten_thread_mailbox_await: * Backported onto emsdk 5.0.3: - small binary (75 tests, 8 suites) — 75/75 PASSED, 434 ms - medium binary (1287 tests, 185 suites) — 1284/1287 PASSED, 33.1 s (3 unrelated SKIPs) - large binary (1221 tests, 150 suites) — 1221/1221 PASSED, 32.6 s * Applied onto a fresh emsdk 6.0.0 install of the same release used in the linked issue: - small binary — 75/75 PASSED, 406 ms - medium binary — 1284/1287 PASSED, 32.6 s (3 SKIPs) - large binary — 1221/1221 PASSED, 32.6 s Without this patch all three executables hang at the first Atomics.waitAsync that targets a post-grow address, on either toolchain — the worker dies uncaught and the main thread's heartbeat to emrun stalls until --timeout. Closes emscripten-core#27084
375eccb to
df1ddaa
Compare
This is an automatic change generated by tools/maint/rebaseline_tests.py. The following (1) test expectation files were updated by running the tests with `--rebaseline`: ``` codesize/test_codesize_minimal_pthreads_memgrowth.json: 26421 => 27162 [+741 bytes / +2.80%] Average change: +2.80% (+2.80% - +2.80%) ```
|
I don't think that fix will work in the general case since most pthreads never return to the even loop so can never receive anything via |
|
Hi @sbc100, thanks for the feedback. You're right - for pthreads that never return to the event loop (compute loops, blocking The approach I had in mind covers specifically the I could see landing this as a partial mitigation for that specific case, but it wouldn't be a general fix. What would you suggest - going for something heavier (e.g. a synchronous freshness check on the JS-side Atomics.* paths), or just waiting for the Chromium side to be addressed? Thanks. |
|
I think that the vast majority of pthread in the wild are going to be ones that have a single entry point and never return the event loop (since that is how pthreads works in C). So I'm not sure there is much value in landing a mitigation that only works for event-loop-using pthreads. I'm not sure what you mean by |
|
Another possible solution here is to feature-test the Could it be the recent version of chromium support |
|
Closing this PR. The issue was filed to the chromium project already. Here https://issues.chromium.org/issues/522454846 And duplicated by mistake also here https://issues.chromium.org/issues/522454849 |
Fix a pre-existing missing cross-thread notification in
SHARED_MEMORY + ALLOW_MEMORY_GROWTH + PTHREADS, exposed deterministically on Chromium 149+ (reported in emscripten-core#27084) and latent on earlier engines.The original
growMemViewscheck is the only synchronization between "thread A calledwasmMemory.grow()" and "thread B is about to issueAtomics.*againstHEAP*":That is an identity compare of a JS-engine getter accessed across threads, with no fence, message,
Atomics.wait, or shared flag. The WebAssembly threads proposal does not requireMemory.prototype.bufferto return a SAB that reflects a grow done on another thread — it is implementation-defined whether non-growing workers ever observe a fresh buffer. The code has therefore always assumed an implementation detail (re-query on every access) that several engines happen to provide; any engine that caches or memoizes the getter leaves non-growing workers holding a stale view, and the nextAtomics.*at a post-grow address throwsat which point the pthread worker dies uncaught and the calling code hangs.
Chromium 149+ added that caching (almost certainly as a getter-side optimization), turning the latent race into a deterministic reproducer for any
-pthread -sPROXY_TO_PTHREAD -sALLOW_MEMORY_GROWTH -sDISABLE_EXCEPTION_CATCHING=0build that triggers a heap grow. Diagnosed by polling the worker'swasmMemory.bufferat the throw site across microtask +setTimeout(0,10,50,200,1000)+ after an explicitupdateMemoryViews(); on affected workers the buffer is pinned at the pre-growbyteLengthandmaxByteLengthwithgrowable=false. The growing thread, in contrast, sees a fresh SAB returned fromwasmMemory.bufferimmediately aftergrow().Fix
Replace the implicit
buffer != HEAP8.buffersynchronization with an explicit broadcast of the fresh SAB to every other pthread worker.src/lib/libcore.js: after a successfulwasmMemory.grow()in$growMemory, publish the fresh SAB tofreshSharedBuffer(declared inruntime_common.js) and broadcast it viapostMessageas{cmd: 'memBufferRefresh', buffer: newBuf}— from a pthread worker to the main thread, and from the main thread directly to every worker inPThread.pthreads. Broadcast block is wrapped in#if PTHREADSso it isn't generated as an empty try/catch inWASM_WORKERS-only builds.src/runtime_pthread.js: workerhandleMessagestashes the SAB and callsupdateMemoryViews()on'memBufferRefresh'. Handled before theCMD_LOADarm so it works for pre-runtime-init workers too.src/lib/libpthread.js: main-threadworker.onmessagerefreshes its own views on'memBufferRefresh'from any worker, then fans the message out to every other worker.src/runtime_common.js: declaresfreshSharedBuffernext togrowMemViews.growMemViews()andupdateMemoryViews()consult it before falling back towasmMemory.buffer, so views are built on the fresh SAB even whenwasmMemory.bufferremains stale.All five edits are gated by the existing
#if SHARED_MEMORY && ALLOW_MEMORY_GROWTH && !GROWABLE_ARRAYBUFFERScondition, so the fix is inert for any build that does not need it. The new message uses a stringcmdso it sits alongside the existing numericCMD_*(CMD_LOAD = 1…CMD_CALL_HANDLER = 9) protocol without extending the numeric namespace. The three try/catch blocks (broadcast, worker receive, main-thread fan-out) log viaerr()on failure so a broken broadcast is observable instead of silent.Empirical validation
Chromium 149.0.7827.102 (headless), running three independent wasm32-pthread C++/GTest executables with stock
INITIAL_MEMORY=64MBandMAXIMUM_MEMORY=300MB. Each executable spawns many short-lived pthread workers that allocate enough memory to force multiple cross-thread grows pastINITIAL_MEMORY, and tens to thousands of inter-thread synchronization points exercisingAtomics.waitAsyncvia__emscripten_thread_mailbox_await:Backported onto emsdk 5.0.3:
Applied onto a fresh emsdk 6.0.0 install of the same release used in the linked issue:
Without this patch all three executables hang at the first
Atomics.waitAsyncthat targets a post-grow address, on either toolchain — the worker dies uncaught and the main thread's heartbeat toemrunstalls until--timeout.Closes emscripten-core#27084