Commit ac04c91
committed
gfx/gfx_thumbnail: port hand-rolled atomic shim to retro_atomic.h
The file had its own local __atomic_* / _Interlocked / volatile
shim guarding the cross-thread .status field, written before
retro_atomic.h existed. Replace it with the portable API now
that the header is upstream (75ea1e4).
The shim was 18 lines, well-commented, correct, and operated on
plain `int*` via GCC's __atomic_* extension form. The C11
stdatomic and C++11 std::atomic backends in retro_atomic.h cannot
operate on plain pointers -- their typedefs are _Atomic int and
std::atomic<int>, which forbid plain dereference -- so this port
changes the .status field type from `enum gfx_thumbnail_status`
to `retro_atomic_int_t` and routes every read and write through
retro_atomic_*_int. Twenty-five access sites are converted; all
but three are single-threaded main-thread accesses that were
already non-atomic in shape, but now go through the API because
the field's atomic type requires it. On x86 TSO the
acquire/release barriers compile out entirely; on weak-memory
ARM/PowerPC the cost is one extra ldar/stlr per cold-path
access (menu list updates, frame draws -- never per-sample).
The two cross-thread sites that drove the original shim are
preserved verbatim:
- gfx_thumbnail_handle_upload publishes texture/width/height
via release-store of AVAILABLE.
- gfx_thumbnail_draw acquire-loads the status before reading
the texture/width/height.
Both already had explanatory comments; those are kept.
Field-type ABI: retro_atomic_int_t is int-sized and int-aligned
on every supported backend, so gfx_thumbnail_t's layout is
unchanged. Verified across 7 backends and 2 architectures
(x86_64, AArch64); also asserted at compile-time in the new
test (see below).
Intra-file initialisation of stack-local gfx_thumbnail_t in
gfx_thumbnail_draw uses retro_atomic_int_init() for the .status
field; plain assignment to an _Atomic int is illegal under C11
stdatomic.
CXX_BUILD compatibility
-----------------------
RetroArch's CXX_BUILD mode (Apple platforms, etc.) compiles
every .c file as C++. Once .status is std::atomic<int>, two
C-valid patterns become C++-invalid:
- memset(&t, 0, sizeof(t)) of a struct containing
std::atomic<int> warns -Wclass-memaccess (the struct is no
longer trivially-copyable).
- `*dst = *src` for a struct embedding std::atomic<int>
fails to compile (the deleted copy-assignment of
std::atomic<int> propagates upward through aggregates).
The first case fires once: xmb.c:667's xmb_alloc_node() did
memset(&node->thumbnail_icon.icon, 0, ...) instead of calloc'ing
the whole node, intentionally to avoid zeroing a multi-KB
thumbnail_path_data substruct on the hot path. Replaced with
gfx_thumbnail_init_blank() -- a new field-by-field zero-init
helper added to gfx_thumbnail.h. The second case does not
fire: the only struct copy in the menu drivers
(ozone.c:4652 ozone_copy_node) operates on ozone_node_t, which
does not embed gfx_thumbnail_t (the gfx_thumbnail_t fields are
in ozone_handle_t, which is calloc'd, never struct-copied).
calloc-init of structs containing std::atomic<int>
(xmb_handle_t, ozone_handle_t, materialui_handle_t) is
technically UB per C++11 [atomics.types.generic] (the default
constructor leaves the atomic uninitialised; reads should be
preceded by atomic_init). In practice every libstdc++/libc++/
MSVC STL treats a zero-bit pattern of std::atomic<int> as a
zero-valued atomic, and reads via atomic_load_explicit return 0.
Verified with all three menu drivers under CXX_BUILD.
Wrappers (GFX_THUMB_STATUS_STORE / GFX_THUMB_STATUS_LOAD) are
preserved as static-inline functions rather than function-like
macros. GCC 13+ at -O3 emits a -Wstringop-overflow false-
positive on the inlined __atomic_* primitive when called from
gfx_thumbnail_request -- the optimiser cannot prove the
@thumbnail argument non-NULL across every `goto end:` flow-graph
path. The pre-port code did not trip this because the shim's
explicit (int*) cast on enum* defeated GCC's analysis;
retro_atomic_int_t is int directly on the GCC backend, so no
cast remains to do that. The wrappers carry a targeted
_Pragma("GCC diagnostic ignored \"-Wstringop-overflow\"") that
suppresses the warning at the function boundary; codegen is
unchanged on every backend (verified via aarch64 objdump:
14 ldar / 14 stlr emitted, identical to pre-port). Other
toolchains (clang, MSVC, older GCC) skip the suppression.
Regression test
---------------
samples/gfx/gfx_thumbnail_status_atomic/ exercises the
synchronisation pattern in isolation. The test does not link
gfx_thumbnail.c; it mirrors the relevant types and macros and
runs an SPSC producer/consumer stress.
Verified failure mode: deliberately moving the producer's
generation-counter bump before the data writes (simulating a
removed release-store fence) makes the stress fail with
hundreds of mismatched reads on x86 and a "ThreadSanitizer:
data race in consumer_thread" report under TSan
(halt_on_error=1 -> exit 66). Restoring the fence gives 1M
iterations with 0 mismatches and a clean TSan run.
Static checks asserted at compile time:
- sizeof(gfx_thumbnail_t.status) == sizeof(int), so the
field-type change preserves struct layout.
- HAVE_RETRO_ATOMIC is set after the include.
Runtime checks:
- All four GFX_THUMBNAIL_STATUS_* values round-trip through
GFX_THUMB_STATUS_STORE / GFX_THUMB_STATUS_LOAD.
- 1M-iteration SPSC handshake stress (HAVE_THREADS only):
producer stages (texture, width, height) = (i, i, i),
stores AVAILABLE via the macro under test, bumps a
`produced` counter via release-store; consumer waits for
the new generation, reads the four fields via the macro
under test, verifies all four equal i, acks via a
`consumed` counter. The handshake prevents the producer
from rewriting the data while the consumer is reading it;
without it, torn reads would be expected even with correct
barriers (a property of the test design, not the
primitives).
The test mirrors gfx_thumbnail_init_blank() locally rather than
including gfx_thumbnail.h, since the test deliberately shadows
the header's type definitions to verify layout invariants.
CI
--
.github/workflows/Linux-samples-gfx.yml gets one new step that
builds the sample with SANITIZER=thread and runs it with
TSAN_OPTIONS=halt_on_error=1. TSan is the right validator for
this test specifically: missing-barrier regressions show up as
data races even on x86 TSO, where the hardware would otherwise
hide them at runtime. ASan/UBSan would catch nothing here
(no out-of-bounds access, no UB), so they are not added.
Verified locally
----------------
Compile-clean across:
- gcc -O2 / -O3 -Wall -Werror, x86_64, all 5 affected files
(gfx_thumbnail.c, xmb.c, ozone.c, materialui.c, runloop.c)
- g++ -xc++ -std=c++11 -DCXX_BUILD, same 5 files
- clang -O2, same 5 files
- aarch64-linux-gnu-gcc + qemu-user, gfx_thumbnail.c
- arm-linux-gnueabihf-gcc, gfx_thumbnail.c
- clang -x objective-c / -x objective-c++ smoke (header-only
inclusion check; mirrors cocoa_common.m's transitive path
via menu_driver.h)
- C89 -ansi -pedantic, gfx_thumbnail.c
- Forced backends: C11 stdatomic, GCC __sync_*, volatile
fallback
- Clang static analyser
- HAVE_THREADS=0 (test sample, static checks only)
Functional:
- 1M-iter SPSC stress passes on x86_64 native
- 1M-iter SPSC stress passes under qemu-aarch64
- TSan halt_on_error=1 catches deliberately-injected
missing-barrier bug -> exit 66
- aarch64 objdump shows 14 ldar / 14 stlr in
gfx_thumbnail.o, identical between pre-port and post-port
- x86 codegen: plain mov for cross-thread sites (TSO-correct)
- ARM32 codegen: 28 dmb instructions emitted
Behavior when no real atomic backend is available
-------------------------------------------------
retro_atomic.h's cascade always selects something; the last
tier is a volatile fallback (`#warning` unless suppressed,
`RETRO_ATOMIC_LOCK_FREE` undefined). On that tier, expansions
are plain volatile loads and stores -- no memory barriers --
which is correct only on single-core targets or x86 TSO. The
header's docstring documents this and the intended caller-side
gate is `#if defined(RETRO_ATOMIC_LOCK_FREE)`.
gfx_thumbnail does NOT add such a gate, matching pre-port
behavior: the original shim's `#else` branch was also a plain
volatile read/write with no barriers, and gfx_thumbnail never
checked whether barriers were available. That tradeoff was
explicit in 91166d0 ("Note on correctness"), where the
volatile fallback's SMP PowerPC G5 weakness was acknowledged
and accepted: the failure mode is at worst a fade-in
animation starting one frame late, not corruption (status is
a single aligned int -- no torn reads possible -- and a
cosmetic mistime is recoverable on the next frame).
What changes with the port: the cascade gets richer. Pre-
port, anything that wasn't the GCC __atomic_* or MSVC
Interlocked* branch landed on the volatile fallback. Post-
port, four additional tiers sit between modern GCC and
volatile -- C11 stdatomic, C++11 std::atomic, Apple OSAtomic*,
GCC __sync_* -- which means every real RetroArch target with
threaded video falls into a tier that gives proper release/
acquire ordering. The G5 case the historical note
acknowledged is now covered by GCC __sync_* (which existed in
GCC 4.1+ and was named as the proper fix if anyone wanted to
harden the G5 path later); G5 with modern devkitPPC would land
on __atomic_* and get real lwsync.
Net: on every platform where atomics are "unavailable" in the
sense of "no real backend matches", pre-port and post-port
emit identical plain-volatile codegen (verified by aarch64-
gcc -DRETRO_ATOMIC_FORCE_VOLATILE: gfx_thumbnail_handle_upload
emits plain `str` rather than `stlr`, same as pre-port's
volatile branch on the same target). The port introduces no
new "atomics unavailable" footgun.
Not verified on real hardware:
- MSVC ARM64 (relies on retro_atomic.h's MSVC backend, which
landed in 75ea1e4 and is itself not yet hardware-validated)
- Real PowerPC SMP (Wii U, Xbox 360)
- End-to-end Apple build (macOS / iOS / tvOS)1 parent 75ea1e4 commit ac04c91
7 files changed
Lines changed: 1552 additions & 60 deletions
File tree
- .github/workflows
- gfx
- samples/gfx/gfx_thumbnail_status_atomic
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
205 | 205 | | |
206 | 206 | | |
207 | 207 | | |
| 208 | + | |
| 209 | + | |
| 210 | + | |
| 211 | + | |
| 212 | + | |
| 213 | + | |
| 214 | + | |
| 215 | + | |
| 216 | + | |
| 217 | + | |
| 218 | + | |
| 219 | + | |
| 220 | + | |
| 221 | + | |
| 222 | + | |
| 223 | + | |
| 224 | + | |
| 225 | + | |
| 226 | + | |
| 227 | + | |
| 228 | + | |
| 229 | + | |
| 230 | + | |
| 231 | + | |
| 232 | + | |
| 233 | + | |
| 234 | + | |
| 235 | + | |
| 236 | + | |
| 237 | + | |
| 238 | + | |
| 239 | + | |
| 240 | + | |
| 241 | + | |
| 242 | + | |
| 243 | + | |
0 commit comments