
Commit b894479

libretro-common: add retro_spsc.h portable lock-free SPSC byte queue
A lock-free single-producer / single-consumer byte queue built on
retro_atomic.h. Wired into the standard libretro-common build
(Makefile.common, griffin) so it ships with every RetroArch
configuration, but no production callers yet -- this commit lands the
primitive only. Subsequent commits can convert hand-rolled SPSC
patterns (audio/drivers/coreaudio.c's atomic_size_t ring,
audio/drivers/opensl.c's __sync_*-guarded buffered_blocks counter) to
use it.

Design
------

Two-cursor (head + tail) Lamport / Disruptor style, NOT the single
shared count. Each cursor is written by exactly one thread and
acquire-loaded by the other. The producer publishes new data with a
release-store on head; the consumer publishes consumed bytes with a
release-store on tail. Neither thread issues atomic RMW operations on
the hot path -- only retro_atomic_load_acquire_size and
retro_atomic_store_release_size, which retro_atomic.h provides across
all 7 backends.

Capacity is power-of-2 (rounded up at init), masked with capacity - 1.
Wraparound uses the standard two-memcpy pattern. size_t modular
arithmetic on (head - tail) is well-defined as long as the difference
stays within capacity, which the producer enforces by checking
write_avail before each push.

head and tail are placed on separate cache lines via explicit
char-array padding (RETRO_SPSC_CACHE_LINE, default 64). Without this,
the producer's release-store on head would invalidate the consumer's
tail cache line and vice versa, halving throughput on contended SMP.
Padding is a performance hint; correctness does not depend on it. The
padding macros are guarded against underflow if RETRO_SPSC_CACHE_LINE
is misconfigured to be smaller than the prefix fields.

Comparison with the existing fifo_queue_t:

- fifo_queue takes an slock_t internally; it's MPMC-safe but every
  push/pop costs a mutex.
- retro_spsc is lock-free but limited to one producer / one consumer.
  Use it in code paths where (a) producer/consumer counts are fixed at
  one each (audio render thread <-> audio submission, video thread <->
  task queue, etc.) and (b) the fifo_queue lock contention measurably
  matters.

Comparison with audio/drivers/coreaudio.c's hand-rolled ring:

- coreaudio uses a single shared atomic_size_t `filled` count rather
  than two cursors; producer fetch_add's it, consumer fetch_sub's it.
  Correct, but optimises for "give me write_avail" being a single
  atomic load (audio drivers do that query every fill) at the cost of
  RMW on every push and pop.
- retro_spsc inverts that trade-off: write_avail / read_avail each cost
  two atomic loads, but push and pop only do load_acquire +
  store_release. Better for the general case where push/pop count
  vastly exceeds availability checks.
- retro_spsc also pads head and tail; coreaudio doesn't (its single
  shared atomic doesn't suffer false sharing).

Build wiring
------------

Build requirements:

- retro_atomic.h (header-only, always available)
- <stddef.h>, <stdint.h>, <stdbool.h>, <stdlib.h>, <string.h>
- That's it.

No HAVE_THREADS gate is needed for compilation: retro_spsc.c builds on
every target retro_atomic.h supports, including pre-thread console
targets (PSP, original Wii). On a single-threaded build it just sits
there as dead code.

Wired in through:

- Makefile.common adds retro_spsc.o next to fifo_queue.o under the
  unconditional libretro-common object list.
- griffin.c #includes retro_spsc.c next to fifo_queue.c (Apple builds
  and console unity builds pick it up via griffin).

Test
----

libretro-common/samples/queues/retro_spsc_test/ exercises the queue
under producer/consumer concurrency:

- Property checks (single-threaded): power-of-2 round-up at init,
  fresh-queue avails, push/pop content round-trip, peek does not
  advance, wraparound across the buffer end.
- SPSC stress (HAVE_THREADS): producer pushes 10M sequential 32-bit
  tokens through a 4 KB buffer; consumer reads and verifies each token
  matches the expected sequence. The small buffer relative to the
  message volume forces heavy interleaving between producer and
  consumer, exercising the wraparound path on every iteration.

The stress harness has no per-iteration handshake -- both threads spin
on avail() in tight loops -- so the test is sensitive to real
synchronisation bugs. Verified: deliberately moving the producer's
release-store on head BEFORE the buffer memcpys (so a consumer
observing the new head can read uninitialised bytes) makes the stress
fail with hundreds of mismatched tokens out of 10M, even on x86 TSO
without TSan. The same bug under TSan with halt_on_error=1 exits 66
with "data race in retro_spsc_read_avail".

CI -- .github/workflows/Linux-libretro-common-samples.yml:

- retro_spsc_test added to RUN_TARGETS so the auto-discovery job builds
  and runs it under SANITIZER=address,undefined (the workflow default).
- A new step builds and runs the test under Clang + SANITIZER=thread
  with TSAN_OPTIONS=halt_on_error=1, mirroring the retro_atomic_test
  TSan lane. TSan is the strict validator for this primitive
  specifically: missing-barrier regressions show up as data races even
  on x86 TSO, where the hardware would otherwise hide them at runtime.

Verified locally:

- gcc -O0/-O2/-O3 -Wall -Werror, x86_64
- clang -O2, x86_64
- g++ -xc++ -std=c++11 (CXX_BUILD-style)
- aarch64-linux-gnu-gcc + qemu-user, 10M tokens clean, objdump shows
  real ldar/stlr at the cursor accesses
- arm-linux-gnueabihf-gcc compile-clean
- Forced backends: C11 stdatomic, GCC __sync_*, volatile fallback
  (volatile fallback correct only on x86 TSO, same contract as every
  other retro_atomic.h caller)
- TSan halt_on_error=1: clean run, exit 0
- TSan halt_on_error=1 on bug-injected build: exit 66
- ASan/UBSan: clean
- Bug-injected without sanitizer: 815 mismatches / 10M tokens

Not verified on real hardware:

- MSVC ARM64 (inherits retro_atomic.h's untested-on-hardware state for
  that backend)
- Real PowerPC SMP (Wii U, Xbox 360)
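
To make the two-cursor design, the power-of-2 masking, the two-memcpy wraparound and the cursor padding from the commit message concrete, here is a minimal sketch. It is not the code from retro_spsc.c: it assumes C11 <stdatomic.h> as a stand-in for retro_atomic.h, and every sketch_* name and SKETCH_CACHE_LINE is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_CACHE_LINE 64 /* assumed line size, mirrors the commit's default */

/* Hypothetical stand-in for retro_spsc_t; C11 atomics replace retro_atomic.h. */
typedef struct
{
   uint8_t      *buffer;
   size_t        capacity; /* power of 2; mask = capacity - 1 */
   uint8_t       pad0[SKETCH_CACHE_LINE - sizeof(uint8_t*) - sizeof(size_t)];
   atomic_size_t head;     /* stored only by the producer */
   uint8_t       pad1[SKETCH_CACHE_LINE - sizeof(atomic_size_t)];
   atomic_size_t tail;     /* stored only by the consumer */
} sketch_spsc_t;

static bool sketch_init(sketch_spsc_t *q, size_t min_capacity)
{
   size_t cap = 1;
   if (min_capacity == 0 || min_capacity > SIZE_MAX / 2)
      return false;
   while (cap < min_capacity) /* round up to the next power of 2 */
      cap <<= 1;
   if (!(q->buffer = (uint8_t*)malloc(cap)))
      return false;
   q->capacity = cap;
   atomic_init(&q->head, 0);
   atomic_init(&q->tail, 0);
   return true;
}

/* Producer side: copy the bytes in, then publish them with a release-store. */
static size_t sketch_write(sketch_spsc_t *q, const void *data, size_t bytes)
{
   size_t head  = atomic_load_explicit(&q->head, memory_order_acquire);
   size_t tail  = atomic_load_explicit(&q->tail, memory_order_acquire);
   size_t avail = q->capacity - (head - tail); /* modular size_t arithmetic */
   size_t off   = head & (q->capacity - 1);
   size_t first = q->capacity - off;           /* bytes before the buffer end */

   assert(data != NULL);
   if (bytes > avail)
      bytes = avail;
   if (first > bytes)
      first = bytes;
   memcpy(q->buffer + off, data, first);
   memcpy(q->buffer, (const uint8_t*)data + first, bytes - first);
   /* Publishing head only after both memcpys is the ordering the stress
    * test's bug injection deliberately breaks. */
   atomic_store_explicit(&q->head, head + bytes, memory_order_release);
   return bytes;
}

/* Consumer side: acquire-load head, copy the bytes out, then publish tail. */
static size_t sketch_read(sketch_spsc_t *q, void *data, size_t bytes)
{
   size_t tail  = atomic_load_explicit(&q->tail, memory_order_acquire);
   size_t head  = atomic_load_explicit(&q->head, memory_order_acquire);
   size_t avail = head - tail;
   size_t off   = tail & (q->capacity - 1);
   size_t first = q->capacity - off;

   assert(data != NULL);
   if (bytes > avail)
      bytes = avail;
   if (first > bytes)
      first = bytes;
   memcpy(data, q->buffer + off, first);
   memcpy((uint8_t*)data + first, q->buffer, bytes - first);
   atomic_store_explicit(&q->tail, tail + bytes, memory_order_release);
   return bytes;
}
```

In this sketch head and tail sit exactly SKETCH_CACHE_LINE bytes apart, so they cannot share a 64-byte line whatever the struct's base alignment is. Unlike the commit's padding macros, the sketch does not guard the array sizes against underflow.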
1 parent b84cdac commit b894479

7 files changed

Lines changed: 758 additions & 0 deletions


.github/workflows/Linux-libretro-common-samples.yml

Lines changed: 26 additions & 0 deletions
@@ -80,6 +80,7 @@ jobs:
             task_queue_title_error_test
             tpool_wait_test
             retro_atomic_test
+            retro_spsc_test
           )
 
           # Per-binary run command (overrides ./<binary> if present).
@@ -289,6 +290,31 @@ jobs:
 
           TSAN_OPTIONS=halt_on_error=1 ./retro_atomic_test
 
+      - name: Run retro_spsc_test under Clang + ThreadSanitizer
+        shell: bash
+        working-directory: libretro-common/samples/queues/retro_spsc_test
+        run: |
+          # retro_spsc.c is a lock-free SPSC byte queue built on
+          # retro_atomic.h. Its correctness contract is acquire-load
+          # / release-store on the head and tail cursors, with the
+          # buffer reads/writes between them ordered by those barriers.
+          # Missing or weakened barriers produce torn data on the
+          # consumer side, observable as content mismatches in the
+          # stress harness AND as TSan-reported races. The default
+          # ASan/UBSan pass above catches the content mismatches but
+          # not the races; this lane catches both.
+          #
+          # halt_on_error=1 makes TSan exit non-zero on the first race
+          # rather than continuing -- which is what we want for CI:
+          # any race here means the SPSC contract is broken.
+          set -u
+          set -o pipefail
+
+          make clean
+          CC=clang make all SANITIZER=thread
+
+          TSAN_OPTIONS=halt_on_error=1 ./retro_spsc_test
+
   # Cross-architecture validation lane for retro_atomic_test.
   #
   # The samples job above runs on x86_64, which is a strongly-ordered

Makefile.common

Lines changed: 1 addition & 0 deletions
@@ -364,6 +364,7 @@ OBJ += \
 	input/input_autodetect_builtin.o \
 	input/input_keymaps.o \
 	$(LIBRETRO_COMM_DIR)/queues/fifo_queue.o \
+	$(LIBRETRO_COMM_DIR)/queues/retro_spsc.o \
 	$(LIBRETRO_COMM_DIR)/compat/compat_fnmatch.o \
 	$(LIBRETRO_COMM_DIR)/compat/compat_posix_string.o
 

griffin/griffin.c

Lines changed: 1 addition & 0 deletions
@@ -815,6 +815,7 @@ INPUT (HID)
 FIFO BUFFER
 ============================================================ */
 #include "../libretro-common/queues/fifo_queue.c"
+#include "../libretro-common/queues/retro_spsc.c"
 
 /*============================================================
 AUDIO RESAMPLER
Lines changed: 250 additions & 0 deletions
@@ -0,0 +1,250 @@
+/* Copyright (C) 2010-2026 The RetroArch team
+ *
+ * ---------------------------------------------------------------------------------------
+ * The following license statement only applies to this file (retro_spsc.h).
+ * ---------------------------------------------------------------------------------------
+ *
+ * Permission is hereby granted, free of charge,
+ * to any person obtaining a copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation the rights to
+ * use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software,
+ * and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
+ * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+ * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
+ * WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#ifndef __LIBRETRO_SDK_SPSC_H
+#define __LIBRETRO_SDK_SPSC_H
+
+/*
+ * retro_spsc.h - portable single-producer / single-consumer byte queue
+ *
+ * Lock-free byte-stream queue with one writer thread and one reader
+ * thread. Uses release-store / acquire-load on two cursors (the
+ * Lamport / Disruptor design) rather than a single shared count, so
+ * neither thread issues atomic RMW operations on the hot path.
+ *
+ * Constraints:
+ *   - Exactly ONE producer and ONE consumer. No locking is performed;
+ *     two producers or two consumers will corrupt the queue.
+ *   - Capacity must be power-of-2. Init rounds up.
+ *   - Maximum capacity is SIZE_MAX/2 (so head - tail never overflows
+ *     the well-defined unsigned modular range).
+ *   - The buffer is byte-addressed. Callers pushing structured records
+ *     are responsible for framing.
+ *
+ * Memory model:
+ *   - Producer writes data into the buffer (non-atomic stores), then
+ *     publishes the new head via release-store.
+ *   - Consumer acquire-loads head (sees data writes that preceded the
+ *     release), reads data (non-atomic loads), then publishes the new
+ *     tail via release-store.
+ *   - Producer acquire-loads tail to know how much space is free.
+ *   - This pairing gives the SPSC invariant on every backend
+ *     retro_atomic.h supports lock-freely. On the volatile fallback
+ *     (no real backend), correctness reduces to single-core or x86 TSO,
+ *     same as every other retro_atomic.h caller.
+ *
+ * Cache behaviour:
+ *   - head and tail are placed on separate cache lines via explicit
+ *     padding to RETRO_SPSC_CACHE_LINE. Without this, the producer's
+ *     publish would invalidate the consumer's tail line and vice versa,
+ *     halving throughput on contended SMP. The padding is a
+ *     performance hint; correctness does not depend on it.
+ *
+ * Lifetime:
+ *   - retro_spsc_init allocates an internal buffer; retro_spsc_free
+ *     releases it. The caller owns the retro_spsc_t struct itself.
+ *   - The struct is NOT itself thread-safe to construct or destroy
+ *     while in use. Build it on one thread, hand the producer pointer
+ *     to one thread and the consumer pointer to another, then tear
+ *     down on one thread after both producer and consumer have stopped.
+ *
+ * Example:
+ *
+ *    retro_spsc_t q;
+ *    retro_spsc_init(&q, 4096);
+ *
+ *    // Producer thread:
+ *    uint8_t msg[64] = ...;
+ *    if (retro_spsc_write_avail(&q) >= sizeof(msg))
+ *       retro_spsc_write(&q, msg, sizeof(msg));
+ *
+ *    // Consumer thread:
+ *    uint8_t msg[64];
+ *    if (retro_spsc_read_avail(&q) >= sizeof(msg))
+ *       retro_spsc_read(&q, msg, sizeof(msg));
+ *
+ *    retro_spsc_free(&q);
+ *
+ * Comparison with libretro-common's existing fifo_queue_t:
+ *   - fifo_queue uses a slock_t internally; safe with multiple producers
+ *     and multiple consumers, but every push/pop takes a mutex.
+ *   - retro_spsc is lock-free but limited to one producer and one
+ *     consumer. Use this when (a) the producer/consumer count is fixed
+ *     at one each (audio driver -> backend, video -> task system, etc.)
+ *     and (b) the lock contention in fifo_queue measurably matters.
+ *   - For most code paths, fifo_queue is the better default. This
+ *     primitive is for hot paths where lock-free is a measured win.
+ */
98+
#include <stddef.h>
99+
#include <stdint.h>
100+
#include <stdbool.h>
101+
102+
#include <retro_common_api.h>
103+
#include <retro_atomic.h>
104+
105+
RETRO_BEGIN_DECLS
106+
107+
/* Cache-line padding size. 64 covers x86-64, AArch64, most modern
108+
* ARMs, and modern PowerPC. Apple Silicon's effective coherency line
109+
* is 128 due to its M-series cluster topology; over-padding to 64
110+
* still avoids false sharing, just slightly less efficiently. Older
111+
* 32-bit ARMs (ARMv6, Cortex-A8) used 32-byte lines but tolerate the
112+
* larger pad without correctness impact. */
113+
#ifndef RETRO_SPSC_CACHE_LINE
114+
#define RETRO_SPSC_CACHE_LINE 64
115+
#endif
116+
117+
/* Padding helper. C89 has no _Static_assert; the array-size trick is
118+
* portable. We pad to the next multiple of cache-line size after the
119+
* pointer-and-size_t prefix and after each cursor. */
120+
121+
/* Effective cache line for padding. We pad each cursor to a full
122+
* cache line. If the pre-cursor fields exceed RETRO_SPSC_CACHE_LINE
123+
* on some hypothetical small-cache target, the underflow on the
124+
* subtraction would produce a giant array; guard with a max() so
125+
* the pad is always at least 1 byte and never underflows. */
126+
#define RETRO_SPSC_PAD0_BYTES \
127+
((RETRO_SPSC_CACHE_LINE > (sizeof(uint8_t*) + sizeof(size_t))) \
128+
? (RETRO_SPSC_CACHE_LINE - (sizeof(uint8_t*) + sizeof(size_t))) \
129+
: 1)
130+
#define RETRO_SPSC_PAD1_BYTES \
131+
((RETRO_SPSC_CACHE_LINE > sizeof(retro_atomic_size_t)) \
132+
? (RETRO_SPSC_CACHE_LINE - sizeof(retro_atomic_size_t)) \
133+
: 1)
134+
135+
typedef struct retro_spsc
136+
{
137+
uint8_t *buffer;
138+
size_t capacity; /* power of 2; mask = capacity - 1 */
139+
/* Pad so head sits on a fresh cache line, isolating it from
140+
* the buffer/capacity fields that init may touch. */
141+
uint8_t _pad0[RETRO_SPSC_PAD0_BYTES];
142+
retro_atomic_size_t head; /* producer publishes; consumer reads */
143+
/* Pad so tail sits on its own cache line, isolating it from head. */
144+
uint8_t _pad1[RETRO_SPSC_PAD1_BYTES];
145+
retro_atomic_size_t tail; /* consumer publishes; producer reads */
146+
} retro_spsc_t;
147+
+/**
+ * retro_spsc_init:
+ * @q            : The queue.
+ * @min_capacity : Requested capacity in bytes. Rounded up to the
+ *                 next power of 2. Must be > 0 and <= SIZE_MAX/2.
+ *
+ * Allocates the internal buffer and zero-initialises both cursors.
+ * Both cursors begin at 0; the queue is empty.
+ *
+ * Returns: true on success, false on allocation failure or invalid
+ * @min_capacity.
+ *
+ * After this returns true, the producer thread can call
+ * retro_spsc_write / retro_spsc_write_avail and the consumer thread can
+ * call retro_spsc_read / retro_spsc_read_avail.
+ */
+bool retro_spsc_init(retro_spsc_t *q, size_t min_capacity);
+
+/**
+ * retro_spsc_free:
+ * @q : The queue.
+ *
+ * Releases the internal buffer. Caller must ensure both producer
+ * and consumer have stopped using @q before calling. Safe to call
+ * on a queue that retro_spsc_init failed on (no-op).
+ */
+void retro_spsc_free(retro_spsc_t *q);
+
+/**
+ * retro_spsc_write_avail:
+ * @q : The queue.
+ *
+ * Producer-side query: how many bytes can be written before the
+ * queue is full.
+ *
+ * Returns: byte count, in [0, capacity].
+ *
+ * SAFETY: callable only from the producer thread.
+ */
+size_t retro_spsc_write_avail(const retro_spsc_t *q);
+
+/**
+ * retro_spsc_read_avail:
+ * @q : The queue.
+ *
+ * Consumer-side query: how many bytes are available to read.
+ *
+ * Returns: byte count, in [0, capacity].
+ *
+ * SAFETY: callable only from the consumer thread.
+ */
+size_t retro_spsc_read_avail(const retro_spsc_t *q);
+
+/**
+ * retro_spsc_write:
+ * @q     : The queue.
+ * @data  : Source bytes.
+ * @bytes : Number of bytes to attempt to write.
+ *
+ * Writes up to @bytes from @data into the queue. If the queue has
+ * less than @bytes free, writes only what fits.
+ *
+ * Returns: number of bytes actually written.
+ *
+ * SAFETY: callable only from the producer thread. Concurrent calls
+ * with another producer corrupt the queue.
+ */
+size_t retro_spsc_write(retro_spsc_t *q, const void *data, size_t bytes);
+
+/**
+ * retro_spsc_read:
+ * @q     : The queue.
+ * @data  : Destination bytes.
+ * @bytes : Number of bytes to attempt to read.
+ *
+ * Reads up to @bytes from the queue into @data. If the queue has
+ * less than @bytes available, reads only what is present.
+ *
+ * Returns: number of bytes actually read.
+ *
+ * SAFETY: callable only from the consumer thread. Concurrent calls
+ * with another consumer corrupt the queue.
+ */
+size_t retro_spsc_read(retro_spsc_t *q, void *data, size_t bytes);
+
+/**
+ * retro_spsc_peek:
+ * @q     : The queue.
+ * @data  : Destination bytes.
+ * @bytes : Number of bytes to peek.
+ *
+ * Like retro_spsc_read but does not advance the read cursor. The
+ * peeked bytes remain available for the next read.
+ *
+ * Returns: number of bytes peeked.
+ *
+ * SAFETY: callable only from the consumer thread.
+ */
+size_t retro_spsc_peek(const retro_spsc_t *q, void *data, size_t bytes);
+
+RETRO_END_DECLS
+
+#endif /* __LIBRETRO_SDK_SPSC_H */
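
The header leaves record framing to the caller and documents a peek that does not advance the read cursor. One possible way to combine the two is a length-prefixed record, sketched below. This is an illustration only: demo_q_t and the demo_* functions are a hypothetical single-threaded stand-in that merely mirrors the shape of the write / read / peek / avail calls, and frame_push / frame_pop are not part of the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical single-threaded stand-in for the queue (no atomics here);
 * the 64-byte capacity is a power of 2 so indices can be masked. */
typedef struct { uint8_t buf[64]; size_t head, tail; } demo_q_t;

static size_t demo_read_avail(const demo_q_t *q)
{
   return q->head - q->tail;
}

static size_t demo_write_avail(const demo_q_t *q)
{
   return sizeof(q->buf) - (q->head - q->tail);
}

static size_t demo_write(demo_q_t *q, const void *data, size_t n)
{
   const uint8_t *src = (const uint8_t*)data;
   size_t i;
   if (n > demo_write_avail(q))
      n = demo_write_avail(q);
   for (i = 0; i < n; i++)
      q->buf[(q->head + i) & (sizeof(q->buf) - 1)] = src[i];
   q->head += n;
   return n;
}

static size_t demo_peek(const demo_q_t *q, void *data, size_t n)
{
   uint8_t *dst = (uint8_t*)data;
   size_t i;
   if (n > demo_read_avail(q))
      n = demo_read_avail(q);
   for (i = 0; i < n; i++)
      dst[i] = q->buf[(q->tail + i) & (sizeof(q->buf) - 1)];
   return n;
}

static size_t demo_read(demo_q_t *q, void *data, size_t n)
{
   n = demo_peek(q, data, n);
   q->tail += n;
   return n;
}

/* Producer: push a 2-byte length prefix plus payload, all or nothing.
 * Returns 1 on success, 0 if the whole record does not fit. */
static int frame_push(demo_q_t *q, const void *payload, uint16_t len)
{
   assert(payload != NULL || len == 0);
   if (demo_write_avail(q) < sizeof(len) + (size_t)len)
      return 0;
   demo_write(q, &len, sizeof(len));
   demo_write(q, payload, len);
   return 1;
}

/* Consumer: peek the prefix and only consume once the whole record is
 * queued. Returns the payload length, or -1 if no complete record is
 * available or @out is too small. */
static int frame_pop(demo_q_t *q, void *out, size_t out_cap)
{
   uint16_t len;
   if (demo_peek(q, &len, sizeof(len)) != sizeof(len))
      return -1;
   if (demo_read_avail(q) < sizeof(len) + (size_t)len || len > out_cap)
      return -1;
   demo_read(q, &len, sizeof(len));
   demo_read(q, out, len);
   return (int)len;
}
```

Because the prefix and the payload are pushed with two separate writes, a consumer on another thread could observe the prefix before the payload has been published. Checking the available byte count against the peeked length before consuming anything is what keeps a half-written record from being read.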
