audio: skip redundant float clamp on s16 driver paths

LibretroAdmin · LibretroAdmin · commit 8a2d293cca1c · 2026-04-21T22:20:33.000+02:00
audio_driver_flush() unconditionally clamped the float output buffer to [-1.0, 1.0] before handing it to the active driver. This is necessary on float-output drivers (WASAPI, PulseAudio, PipeWire, CoreAudio, JACK) which pass the values straight to hardware; finding the existing clamp landed in master to fix that case. But on s16-output drivers (ALSA s16, OSS, DirectSound, and most embedded backends -- PSP, PS2, Vita, Switch, GX, etc.) the very next step is convert_float_to_s16(), which saturates internally: scalar: explicit if/else clamp to [-0x8000, 0x7FFF] (libretro-common/audio/conversion/float_to_s16.c:170-176) SSE2: _mm_packs_epi32 -- saturating narrow NEON: vqmovn_s32 -- saturating narrow AltiVec: vec_packs -- saturating narrow Allegrex: vi2s.q -- saturating narrow So on s16 paths the clamp loop visited every sample to do work that convert_float_to_s16 was about to do anyway. The same total_samples floats get touched twice, then read again during conversion -- three buffer passes where one would do. Add USE_FLOAT to the guard so the clamp only runs when its output is actually consumed verbatim: if ((USE_FLOAT) && !MIXER_ACTIVE) clamp(buf); Behavior change is bit-identical for finite inputs, verified against a test harness covering: in-range, slightly over [-1.0, 1.0], +12 dB gain (3.98x), and 100x amplification. The harness shows that for every finite input the s16 output matches with or without the pre-clamp -- because saturation of (val * 0x8000) at the int16 boundary produces the same result as clamp-to-[-1,1] then saturation. One non-finite edge case worth noting: +Inf in the buffer. Under the old code this was clamped to 1.0f then converted to +32767. Under the new code +Inf * 0x8000 = +Inf, cast to int32 is undefined behavior; on x86-64 it wraps to INT32_MIN which then saturates to -32768. This would manifest as a single-sample full-scale negative spike where the old code produced full-scale positive. -Inf and NaN behave identically before and after on tested platforms. In practice +Inf in the audio buffer indicates an upstream bug -- volume gain (at most 3.98x for +12 dB), sinc resampler overshoot (~28% for Gibbs), and mixer summation are all finite operations on finite inputs, so a non-finite value would have to come from a misbehaving core or DSP plugin. Per-driver behavior on Inf was already inconsistent (float drivers always passed Inf through, s16 drivers caught it via the clamp); a consistent upstream check belongs above audio_driver_flush, not here. Hot path impact: on s16 drivers (most embedded targets), one fewer buffer pass per audio frame -- 2 to 4 KiB less L1 traffic per call at typical audio chunk sizes, plus skipping the SIMD/scalar clamp loop entirely. Fixes: audit finding #30
diff --git a/audio/audio_driver.c b/audio/audio_driver.c
@@ -723,10 +723,21 @@ static void audio_driver_flush(audio_driver_state_t *audio_st,
        * clamps the entire buffer (core audio + voices) as its final step,
        * so an additional pass here would be redundant.  We rely on this
        * because audio_mixer_mix() is called on the same buffer immediately
-       * above, with no intervening modifications. */
+       * above, with no intervening modifications.
+       *
+       * Likewise, when the active driver consumes int16 (the path below
+       * via convert_float_to_s16), saturation is performed by the
+       * conversion itself: the scalar fallback clamps to the s16 range,
+       * and every SIMD variant (SSE2 _mm_packs_epi32, NEON vqmovn_s32,
+       * AltiVec vec_packs, PSP/Allegrex vi2s.q) uses a saturating narrow.
+       * Skipping the float-clamp pass on the s16 path produces a
+       * bit-identical result with one fewer touch of the buffer. */
+      if (
+            (audio_st->flags & AUDIO_FLAG_USE_FLOAT)
 #ifdef HAVE_AUDIOMIXER
-      if (!(audio_st->flags & AUDIO_FLAG_MIXER_ACTIVE))
+         && !(audio_st->flags & AUDIO_FLAG_MIXER_ACTIVE)
 #endif
+         )
       {
          unsigned i              = 0;
          unsigned total_samples  = output_frames * 2; /* stereo */