Skip to content

Commit ac04c91

Browse files
committed
gfx/gfx_thumbnail: port hand-rolled atomic shim to retro_atomic.h
The file had its own local __atomic_* / _Interlocked / volatile shim guarding the cross-thread .status field, written before retro_atomic.h existed. Replace it with the portable API now that the header is upstream (75ea1e4). The shim was 18 lines, well-commented, correct, and operated on plain `int*` via GCC's __atomic_* extension form. The C11 stdatomic and C++11 std::atomic backends in retro_atomic.h cannot operate on plain pointers -- their typedefs are _Atomic int and std::atomic<int>, which forbid plain dereference -- so this port changes the .status field type from `enum gfx_thumbnail_status` to `retro_atomic_int_t` and routes every read and write through retro_atomic_*_int. Twenty-five access sites are converted; all but three are single-threaded main-thread accesses that were already non-atomic in shape, but now go through the API because the field's atomic type requires it. On x86 TSO the acquire/release barriers compile out entirely; on weak-memory ARM/PowerPC the cost is one extra ldar/stlr per cold-path access (menu list updates, frame draws -- never per-sample). The two cross-thread sites that drove the original shim are preserved verbatim: - gfx_thumbnail_handle_upload publishes texture/width/height via release-store of AVAILABLE. - gfx_thumbnail_draw acquire-loads the status before reading the texture/width/height. Both already had explanatory comments; those are kept. Field-type ABI: retro_atomic_int_t is int-sized and int-aligned on every supported backend, so gfx_thumbnail_t's layout is unchanged. Verified across 7 backends and 2 architectures (x86_64, AArch64); also asserted at compile-time in the new test (see below). Intra-file initialisation of stack-local gfx_thumbnail_t in gfx_thumbnail_draw uses retro_atomic_int_init() for the .status field; plain assignment to an _Atomic int is illegal under C11 stdatomic. CXX_BUILD compatibility ----------------------- RetroArch's CXX_BUILD mode (Apple platforms, etc.) compiles every .c file as C++. Once .status is std::atomic<int>, two C-valid patterns become C++-invalid: - memset(&t, 0, sizeof(t)) of a struct containing std::atomic<int> warns -Wclass-memaccess (the struct is no longer trivially-copyable). - `*dst = *src` for a struct embedding std::atomic<int> fails to compile (the deleted copy-assignment of std::atomic<int> propagates upward through aggregates). The first case fires once: xmb.c:667's xmb_alloc_node() did memset(&node->thumbnail_icon.icon, 0, ...) instead of calloc'ing the whole node, intentionally to avoid zeroing a multi-KB thumbnail_path_data substruct on the hot path. Replaced with gfx_thumbnail_init_blank() -- a new field-by-field zero-init helper added to gfx_thumbnail.h. The second case does not fire: the only struct copy in the menu drivers (ozone.c:4652 ozone_copy_node) operates on ozone_node_t, which does not embed gfx_thumbnail_t (the gfx_thumbnail_t fields are in ozone_handle_t, which is calloc'd, never struct-copied). calloc-init of structs containing std::atomic<int> (xmb_handle_t, ozone_handle_t, materialui_handle_t) is technically UB per C++11 [atomics.types.generic] (the default constructor leaves the atomic uninitialised; reads should be preceded by atomic_init). In practice every libstdc++/libc++/ MSVC STL treats a zero-bit pattern of std::atomic<int> as a zero-valued atomic, and reads via atomic_load_explicit return 0. Verified with all three menu drivers under CXX_BUILD. Wrappers (GFX_THUMB_STATUS_STORE / GFX_THUMB_STATUS_LOAD) are preserved as static-inline functions rather than function-like macros. GCC 13+ at -O3 emits a -Wstringop-overflow false- positive on the inlined __atomic_* primitive when called from gfx_thumbnail_request -- the optimiser cannot prove the @thumbnail argument non-NULL across every `goto end:` flow-graph path. The pre-port code did not trip this because the shim's explicit (int*) cast on enum* defeated GCC's analysis; retro_atomic_int_t is int directly on the GCC backend, so no cast remains to do that. The wrappers carry a targeted _Pragma("GCC diagnostic ignored \"-Wstringop-overflow\"") that suppresses the warning at the function boundary; codegen is unchanged on every backend (verified via aarch64 objdump: 14 ldar / 14 stlr emitted, identical to pre-port). Other toolchains (clang, MSVC, older GCC) skip the suppression. Regression test --------------- samples/gfx/gfx_thumbnail_status_atomic/ exercises the synchronisation pattern in isolation. The test does not link gfx_thumbnail.c; it mirrors the relevant types and macros and runs an SPSC producer/consumer stress. Verified failure mode: deliberately moving the producer's generation-counter bump before the data writes (simulating a removed release-store fence) makes the stress fail with hundreds of mismatched reads on x86 and a "ThreadSanitizer: data race in consumer_thread" report under TSan (halt_on_error=1 -> exit 66). Restoring the fence gives 1M iterations with 0 mismatches and a clean TSan run. Static checks asserted at compile time: - sizeof(gfx_thumbnail_t.status) == sizeof(int), so the field-type change preserves struct layout. - HAVE_RETRO_ATOMIC is set after the include. Runtime checks: - All four GFX_THUMBNAIL_STATUS_* values round-trip through GFX_THUMB_STATUS_STORE / GFX_THUMB_STATUS_LOAD. - 1M-iteration SPSC handshake stress (HAVE_THREADS only): producer stages (texture, width, height) = (i, i, i), stores AVAILABLE via the macro under test, bumps a `produced` counter via release-store; consumer waits for the new generation, reads the four fields via the macro under test, verifies all four equal i, acks via a `consumed` counter. The handshake prevents the producer from rewriting the data while the consumer is reading it; without it, torn reads would be expected even with correct barriers (a property of the test design, not the primitives). The test mirrors gfx_thumbnail_init_blank() locally rather than including gfx_thumbnail.h, since the test deliberately shadows the header's type definitions to verify layout invariants. CI -- .github/workflows/Linux-samples-gfx.yml gets one new step that builds the sample with SANITIZER=thread and runs it with TSAN_OPTIONS=halt_on_error=1. TSan is the right validator for this test specifically: missing-barrier regressions show up as data races even on x86 TSO, where the hardware would otherwise hide them at runtime. ASan/UBSan would catch nothing here (no out-of-bounds access, no UB), so they are not added. Verified locally ---------------- Compile-clean across: - gcc -O2 / -O3 -Wall -Werror, x86_64, all 5 affected files (gfx_thumbnail.c, xmb.c, ozone.c, materialui.c, runloop.c) - g++ -xc++ -std=c++11 -DCXX_BUILD, same 5 files - clang -O2, same 5 files - aarch64-linux-gnu-gcc + qemu-user, gfx_thumbnail.c - arm-linux-gnueabihf-gcc, gfx_thumbnail.c - clang -x objective-c / -x objective-c++ smoke (header-only inclusion check; mirrors cocoa_common.m's transitive path via menu_driver.h) - C89 -ansi -pedantic, gfx_thumbnail.c - Forced backends: C11 stdatomic, GCC __sync_*, volatile fallback - Clang static analyser - HAVE_THREADS=0 (test sample, static checks only) Functional: - 1M-iter SPSC stress passes on x86_64 native - 1M-iter SPSC stress passes under qemu-aarch64 - TSan halt_on_error=1 catches deliberately-injected missing-barrier bug -> exit 66 - aarch64 objdump shows 14 ldar / 14 stlr in gfx_thumbnail.o, identical between pre-port and post-port - x86 codegen: plain mov for cross-thread sites (TSO-correct) - ARM32 codegen: 28 dmb instructions emitted Behavior when no real atomic backend is available ------------------------------------------------- retro_atomic.h's cascade always selects something; the last tier is a volatile fallback (`#warning` unless suppressed, `RETRO_ATOMIC_LOCK_FREE` undefined). On that tier, expansions are plain volatile loads and stores -- no memory barriers -- which is correct only on single-core targets or x86 TSO. The header's docstring documents this and the intended caller-side gate is `#if defined(RETRO_ATOMIC_LOCK_FREE)`. gfx_thumbnail does NOT add such a gate, matching pre-port behavior: the original shim's `#else` branch was also a plain volatile read/write with no barriers, and gfx_thumbnail never checked whether barriers were available. That tradeoff was explicit in 91166d0 ("Note on correctness"), where the volatile fallback's SMP PowerPC G5 weakness was acknowledged and accepted: the failure mode is at worst a fade-in animation starting one frame late, not corruption (status is a single aligned int -- no torn reads possible -- and a cosmetic mistime is recoverable on the next frame). What changes with the port: the cascade gets richer. Pre- port, anything that wasn't the GCC __atomic_* or MSVC Interlocked* branch landed on the volatile fallback. Post- port, four additional tiers sit between modern GCC and volatile -- C11 stdatomic, C++11 std::atomic, Apple OSAtomic*, GCC __sync_* -- which means every real RetroArch target with threaded video falls into a tier that gives proper release/ acquire ordering. The G5 case the historical note acknowledged is now covered by GCC __sync_* (which existed in GCC 4.1+ and was named as the proper fix if anyone wanted to harden the G5 path later); G5 with modern devkitPPC would land on __atomic_* and get real lwsync. Net: on every platform where atomics are "unavailable" in the sense of "no real backend matches", pre-port and post-port emit identical plain-volatile codegen (verified by aarch64- gcc -DRETRO_ATOMIC_FORCE_VOLATILE: gfx_thumbnail_handle_upload emits plain `str` rather than `stlr`, same as pre-port's volatile branch on the same target). The port introduces no new "atomics unavailable" footgun. Not verified on real hardware: - MSVC ARM64 (relies on retro_atomic.h's MSVC backend, which landed in 75ea1e4 and is itself not yet hardware-validated) - Real PowerPC SMP (Wii U, Xbox 360) - End-to-end Apple build (macOS / iOS / tvOS)
1 parent 75ea1e4 commit ac04c91

7 files changed

Lines changed: 1552 additions & 60 deletions

File tree

.github/workflows/Linux-samples-gfx.yml

Lines changed: 36 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -205,3 +205,39 @@ jobs:
205205
test -x vulkan_ctx_double_free_test
206206
timeout 60 ./vulkan_ctx_double_free_test
207207
echo "[pass] vulkan_ctx_double_free_test"
208+
209+
- name: Build and run gfx_thumbnail_status_atomic_test (TSan)
210+
shell: bash
211+
working-directory: samples/gfx/gfx_thumbnail_status_atomic
212+
run: |
213+
set -eu
214+
# Regression test for the cross-thread thumbnail status
215+
# synchronisation in gfx/gfx_thumbnail.c. The .status
216+
# field is read by the video thread (gfx_thumbnail_draw)
217+
# and written by the upload-callback thread
218+
# (gfx_thumbnail_handle_upload), with a release-store /
219+
# acquire-load pairing that guards the texture / width /
220+
# height fields published alongside the AVAILABLE
221+
# status. Pre-port the file used a hand-rolled
222+
# __atomic_* / _Interlocked / volatile shim. Post-port
223+
# the .status field is retro_atomic_int_t and the
224+
# accesses route through retro_atomic_*_int macros.
225+
#
226+
# ThreadSanitizer is the right validator for this test:
227+
# it instruments every atomic load and store and would
228+
# flag a missing-barrier regression as a data race, even
229+
# on x86 TSO where the hardware otherwise hides the bug.
230+
# ASan/UBSan would not catch barrier removal at all
231+
# (no out-of-bounds access, no UB). The test also
232+
# carries static-assertion checks that validate the
233+
# struct layout invariant (.status size == sizeof(int))
234+
# so the field's atomic type does not change the
235+
# gfx_thumbnail_t ABI. If gfx_thumbnail.c amends
236+
# GFX_THUMB_STATUS_STORE / GFX_THUMB_STATUS_LOAD or the
237+
# .status field type, the verbatim copies in
238+
# gfx_thumbnail_status_atomic_test.c must follow.
239+
make clean all SANITIZER=thread
240+
test -x gfx_thumbnail_status_atomic_test
241+
TSAN_OPTIONS=halt_on_error=1 timeout 60 \
242+
./gfx_thumbnail_status_atomic_test
243+
echo "[pass] gfx_thumbnail_status_atomic_test"

0 commit comments

Comments
 (0)