Commit 932e1e5
committed
audio/asio: port unsynchronized fifo_buffer_t to retro_spsc_t
The ASIO callback thread (which is the audio driver's real-time
thread) called fifo_read(ad->ring, ...) at five sites in
asio_deinterleave_to_buffers without holding any lock. The main
thread called fifo_write(ad->ring, src, to_write) in asio_write
also without holding any lock. The only slock_* calls in the
file were on cond_lock for the scond_wait/scond_signal pair
around backpressure, which protects the condvar's internal state
but does not serialise fifo_buffer_t access -- fifo_buffer_t is
itself a plain ring buffer with no internal synchronisation, and
the audit's earlier characterisation of fifo_buffer_t as "MPMC-
safe with internal slock" was wrong.
Both threads concurrently mutated ad->ring->first (callback via
fifo_read) and ad->ring->end (main via fifo_write). fifo_*
internally does an "if-then-set" on first/end inside a wraparound
branch, so a racing read can observe a torn value -- not just
loose ordering, an actually-corrupt half-updated cursor. On x86
TSO this is masked at the hardware level most of the time, which
is presumably why the bug has been latent without major reports;
on weak-memory targets (Win-on-ARM ASIO, becoming a real thing
with Snapdragon X) the symptoms would be audio glitches at
minimum and corrupted samples on wraparound.
retro_spsc_t is the right tool for this exact pattern:
single-producer (main thread, asio_write) / single-consumer
(callback thread, asio_deinterleave_to_buffers) byte queue with
acquire/release ordering on the head and tail cursors. Lock-
free, so no priority-inversion concern from the callback path
(taking a mutex from a real-time audio callback is a textbook
anti-pattern in pro-audio code, which rules out the alternative
"add a slock around the FIFO" fix). This also makes asio.c the
first production caller of retro_spsc_t since b894479 introduced
the primitive, validating the API against a real workload.
Mechanical changes
------------------
- #include <queues/fifo_queue.h> -> <retro_spsc.h>.
- fifo_buffer_t *ring -> retro_spsc_t ring (embedded by value)
plus a sibling bool ring_initialized so cleanup paths can tell
"init succeeded" from "never initialised". retro_spsc_t
doesn't carry that bit internally because most callers know
their own lifecycle; asio.c's free path runs from multiple
error sites so the flag is the simplest way.
- fifo_new/fifo_free -> retro_spsc_init/retro_spsc_free, with
!ring NULL-checks replaced by !ring_initialized.
- FIFO_READ_AVAIL / FIFO_WRITE_AVAIL -> retro_spsc_read_avail /
retro_spsc_write_avail.
- fifo_read / fifo_write -> retro_spsc_read / retro_spsc_write.
The writer at line 1306 now uses retro_spsc_write's actual
return value rather than assuming it equals to_write; the cap
via retro_spsc_write_avail makes them equal in practice but
using the return value is defensive against contract changes.
- fifo_clear -> retro_spsc_clear (new API, see below) at the
persistent-instance reuse path.
ra_asio_t is calloc'd, so the embedded retro_spsc_t starts as a
zero-bit pattern; retro_spsc_init then does the proper init via
retro_atomic_size_init before any cross-thread access begins.
This avoids the C++11 [atomics.types.generic] technical UB
concern about uninitialised _Atomic / std::atomic members
(carried over from the gfx_thumbnail port's analysis).
retro_spsc_clear: new API
-------------------------
retro_spsc_clear(retro_spsc_t *q) resets head and tail to zero
without reallocating the buffer. It's documented as safe only
when both producer and consumer are quiesced -- the same lifetime
constraint as retro_spsc_init / retro_spsc_free. Implementation
is two retro_atomic_size_init calls; the init form is necessary
because plain assignment to a retro_atomic_size_t is illegal
under the C11 stdatomic backend.
Without this addition, the alternatives for "drop stale data on
session restart" were:
(a) repeated retro_spsc_read into a scratch buffer until empty
(wasteful memcpys, ugly);
(b) retro_spsc_free + retro_spsc_init (reallocates the heap
buffer, churn).
(c) clear is one line at the call site and atomic with respect
to the calling thread. The quiescence requirement is documented
in the header alongside the function.
asio.c had two pre-port fifo_clear sites:
1. ra_asio_init_via_persistent (~line 1024): runs before
g_asio = ad, so the ASIO callback isn't yet seeing this
instance. Single-threaded; retro_spsc_clear is safe here.
This site is converted directly.
2. ra_asio_free / parking path (~line 1380 pre-port): ran
AFTER ASIO_CALL_STOP and AFTER g_asio = NULL. The intent
was "drop stale audio before parking", but ASIO_CALL_STOP is
not a synchronous join -- ASIO4ALL specifically does not
terminate its audio thread before returning -- so the old
fifo_clear at this site could race with a stray callback's
fifo_read. This was a pre-existing latent bug.
Resolution: drop the clear at the parking site entirely.
The next ra_asio_init_via_persistent will re-clear after the
callback is provably idle (g_asio == NULL means the callback
short-circuits at line 852's null check, so it won't touch
the ring). Documented with a comment at the parking site.
Test coverage
-------------
retro_spsc_clear gets unit-test coverage in the existing sample at
libretro-common/samples/queues/retro_spsc_test/: write 50 bytes,
verify read_avail == 50, retro_spsc_clear, verify read_avail == 0
and write_avail == capacity, then verify the queue is reusable
after clear (write 16, read 16, content matches). Runs clean
under SANITIZER=thread with halt_on_error=1.
The SPSC stress test (1M-token producer/consumer race through a
4 KB buffer) continues to pass under TSan with no regression
from the new function.
Verified locally
----------------
- retro_spsc.c compiles clean as C (gcc, clang) and C++11
(-xc++), under Wall/Wextra/Wpedantic/Werror.
- Forced backends: C11 stdatomic, GCC __sync_*, volatile
fallback all build clean.
- The ring-using portions of asio.c, extracted into a smoke
harness mirroring the real calloc allocation pattern,
compile and run clean under gcc -O2, g++ -std=c++11 -xc++,
and clang.
- Sample stress test passes 10M tokens, 0 mismatches.
- Sample TSan run (halt_on_error=1): exit 0.
Verified on Windows
-------------------
- End-to-end Windows ASIO build with this patch applied.
Audio plays correctly through the ASIO driver, no
regressions observed in normal operation.
Performance
-----------
The motivating reason for the port is correctness (the cross-
thread race) and bounded worst-case latency (the audio-code
property), not throughput. But for the record, a head-to-head
microbenchmark of the same access pattern under both designs
("fifo_buffer_t guarded by slock_t" vs "retro_spsc_t lock-free")
shows:
Single-threaded steady-state, 8 KB chunks (asio.c realistic
payload), x86_64:
fifo+slock: 338 ns/iter (write+read pair)
retro_spsc: 283 ns/iter (write+read pair)
Both dominated by memcpy cost (~94 ns/copy for 8 KB).
Per-op overhead net of memcpy: ~150 ns vs ~96 ns.
Single-threaded steady-state, 16 B chunks (atomic-cost-
dominated), x86_64:
fifo+slock: 44 ns/iter
retro_spsc: 16 ns/iter
Ratio: 2.7x faster.
AArch64 (qemu-user, ratios only — absolute numbers distorted
by emulation overhead):
fifo+slock: 1151 ns/iter (8 KB), 637 ns/iter (16 B)
retro_spsc: 891 ns/iter (8 KB), 410 ns/iter (16 B)
At realistic ASIO operation rates (a few hundred queue ops/sec
across both producer and consumer), the per-op savings amount
to ~20 microseconds per second of CPU on x86 -- ~0.002% of one
core. The throughput win is real but practically negligible at
audio rates.
What is meaningful is the bounded-latency property: under lock
contention, the locked design's worst case is unbounded (the
non-holding thread blocks in the kernel mutex; if the holder is
preempted, the wait is unbounded). Under the same contention,
the lock-free SPSC's worst case is one acquire-load. This is
the textbook reason pro-audio guidance (Apple CoreAudio docs,
Steinberg ASIO docs) discourages mutexes in real-time audio
threads, and it's the qualitatively important difference here.1 parent 9e04def commit 932e1e5
4 files changed
Lines changed: 147 additions & 43 deletions
File tree
- audio/drivers
- libretro-common
- include
- queues
- samples/queues/retro_spsc_test
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
55 | 55 | | |
56 | 56 | | |
57 | 57 | | |
58 | | - | |
| 58 | + | |
59 | 59 | | |
60 | 60 | | |
61 | 61 | | |
| |||
644 | 644 | | |
645 | 645 | | |
646 | 646 | | |
647 | | - | |
| 647 | + | |
| 648 | + | |
| 649 | + | |
| 650 | + | |
| 651 | + | |
| 652 | + | |
| 653 | + | |
| 654 | + | |
| 655 | + | |
| 656 | + | |
| 657 | + | |
| 658 | + | |
| 659 | + | |
| 660 | + | |
| 661 | + | |
| 662 | + | |
| 663 | + | |
| 664 | + | |
| 665 | + | |
| 666 | + | |
648 | 667 | | |
649 | 668 | | |
650 | 669 | | |
| |||
712 | 731 | | |
713 | 732 | | |
714 | 733 | | |
715 | | - | |
| 734 | + | |
| 735 | + | |
| 736 | + | |
| 737 | + | |
| 738 | + | |
716 | 739 | | |
717 | 740 | | |
718 | 741 | | |
| |||
727 | 750 | | |
728 | 751 | | |
729 | 752 | | |
730 | | - | |
| 753 | + | |
731 | 754 | | |
732 | 755 | | |
733 | 756 | | |
| |||
742 | 765 | | |
743 | 766 | | |
744 | 767 | | |
745 | | - | |
| 768 | + | |
746 | 769 | | |
747 | 770 | | |
748 | 771 | | |
| |||
757 | 780 | | |
758 | 781 | | |
759 | 782 | | |
760 | | - | |
| 783 | + | |
761 | 784 | | |
762 | 785 | | |
763 | 786 | | |
| |||
773 | 796 | | |
774 | 797 | | |
775 | 798 | | |
776 | | - | |
| 799 | + | |
777 | 800 | | |
778 | 801 | | |
779 | 802 | | |
| |||
797 | 820 | | |
798 | 821 | | |
799 | 822 | | |
800 | | - | |
| 823 | + | |
801 | 824 | | |
802 | 825 | | |
803 | 826 | | |
| |||
826 | 849 | | |
827 | 850 | | |
828 | 851 | | |
829 | | - | |
| 852 | + | |
830 | 853 | | |
831 | 854 | | |
832 | 855 | | |
| |||
924 | 947 | | |
925 | 948 | | |
926 | 949 | | |
927 | | - | |
928 | | - | |
| 950 | + | |
| 951 | + | |
| 952 | + | |
| 953 | + | |
| 954 | + | |
929 | 955 | | |
930 | 956 | | |
931 | 957 | | |
| |||
991 | 1017 | | |
992 | 1018 | | |
993 | 1019 | | |
994 | | - | |
| 1020 | + | |
| 1021 | + | |
| 1022 | + | |
| 1023 | + | |
| 1024 | + | |
995 | 1025 | | |
996 | 1026 | | |
997 | 1027 | | |
| |||
1158 | 1188 | | |
1159 | 1189 | | |
1160 | 1190 | | |
1161 | | - | |
| 1191 | + | |
| 1192 | + | |
| 1193 | + | |
| 1194 | + | |
| 1195 | + | |
1162 | 1196 | | |
1163 | | - | |
1164 | | - | |
| 1197 | + | |
1165 | 1198 | | |
1166 | 1199 | | |
1167 | 1200 | | |
1168 | 1201 | | |
| 1202 | + | |
1169 | 1203 | | |
1170 | 1204 | | |
1171 | 1205 | | |
| |||
1229 | 1263 | | |
1230 | 1264 | | |
1231 | 1265 | | |
1232 | | - | |
1233 | | - | |
| 1266 | + | |
| 1267 | + | |
| 1268 | + | |
| 1269 | + | |
| 1270 | + | |
1234 | 1271 | | |
1235 | 1272 | | |
1236 | 1273 | | |
| |||
1259 | 1296 | | |
1260 | 1297 | | |
1261 | 1298 | | |
1262 | | - | |
| 1299 | + | |
1263 | 1300 | | |
1264 | 1301 | | |
1265 | 1302 | | |
1266 | 1303 | | |
1267 | 1304 | | |
1268 | 1305 | | |
1269 | | - | |
1270 | | - | |
1271 | | - | |
1272 | | - | |
| 1306 | + | |
| 1307 | + | |
| 1308 | + | |
| 1309 | + | |
| 1310 | + | |
| 1311 | + | |
| 1312 | + | |
| 1313 | + | |
| 1314 | + | |
1273 | 1315 | | |
1274 | 1316 | | |
1275 | 1317 | | |
| |||
1346 | 1388 | | |
1347 | 1389 | | |
1348 | 1390 | | |
1349 | | - | |
1350 | | - | |
1351 | | - | |
| 1391 | + | |
| 1392 | + | |
| 1393 | + | |
| 1394 | + | |
| 1395 | + | |
| 1396 | + | |
| 1397 | + | |
| 1398 | + | |
1352 | 1399 | | |
1353 | 1400 | | |
1354 | 1401 | | |
| |||
1361 | 1408 | | |
1362 | 1409 | | |
1363 | 1410 | | |
1364 | | - | |
| 1411 | + | |
1365 | 1412 | | |
1366 | | - | |
| 1413 | + | |
1367 | 1414 | | |
1368 | 1415 | | |
1369 | 1416 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
173 | 173 | | |
174 | 174 | | |
175 | 175 | | |
| 176 | + | |
| 177 | + | |
| 178 | + | |
| 179 | + | |
| 180 | + | |
| 181 | + | |
| 182 | + | |
| 183 | + | |
| 184 | + | |
| 185 | + | |
| 186 | + | |
| 187 | + | |
| 188 | + | |
| 189 | + | |
| 190 | + | |
| 191 | + | |
| 192 | + | |
| 193 | + | |
| 194 | + | |
| 195 | + | |
176 | 196 | | |
177 | 197 | | |
178 | 198 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
75 | 75 | | |
76 | 76 | | |
77 | 77 | | |
| 78 | + | |
| 79 | + | |
| 80 | + | |
| 81 | + | |
| 82 | + | |
| 83 | + | |
| 84 | + | |
| 85 | + | |
| 86 | + | |
| 87 | + | |
| 88 | + | |
| 89 | + | |
| 90 | + | |
78 | 91 | | |
79 | 92 | | |
80 | 93 | | |
| |||
110 | 123 | | |
111 | 124 | | |
112 | 125 | | |
| 126 | + | |
113 | 127 | | |
114 | | - | |
115 | | - | |
116 | 128 | | |
117 | | - | |
118 | | - | |
119 | | - | |
| 129 | + | |
| 130 | + | |
| 131 | + | |
120 | 132 | | |
121 | 133 | | |
122 | 134 | | |
| |||
141 | 153 | | |
142 | 154 | | |
143 | 155 | | |
| 156 | + | |
144 | 157 | | |
145 | | - | |
146 | | - | |
147 | 158 | | |
148 | 159 | | |
149 | | - | |
150 | | - | |
151 | | - | |
| 160 | + | |
| 161 | + | |
| 162 | + | |
152 | 163 | | |
153 | 164 | | |
154 | 165 | | |
| |||
171 | 182 | | |
172 | 183 | | |
173 | 184 | | |
| 185 | + | |
174 | 186 | | |
175 | | - | |
176 | | - | |
177 | | - | |
| 187 | + | |
178 | 188 | | |
179 | | - | |
| 189 | + | |
180 | 190 | | |
181 | | - | |
| 191 | + | |
182 | 192 | | |
183 | 193 | | |
184 | 194 | | |
| |||
187 | 197 | | |
188 | 198 | | |
189 | 199 | | |
190 | | - | |
| 200 | + | |
191 | 201 | | |
192 | 202 | | |
193 | 203 | | |
| |||
Lines changed: 27 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
222 | 222 | | |
223 | 223 | | |
224 | 224 | | |
| 225 | + | |
| 226 | + | |
| 227 | + | |
| 228 | + | |
| 229 | + | |
| 230 | + | |
| 231 | + | |
| 232 | + | |
| 233 | + | |
| 234 | + | |
| 235 | + | |
| 236 | + | |
| 237 | + | |
| 238 | + | |
| 239 | + | |
| 240 | + | |
| 241 | + | |
| 242 | + | |
| 243 | + | |
| 244 | + | |
| 245 | + | |
| 246 | + | |
| 247 | + | |
| 248 | + | |
| 249 | + | |
| 250 | + | |
| 251 | + | |
225 | 252 | | |
226 | 253 | | |
227 | 254 | | |
| |||
0 commit comments