Commit 930bcca
perf(blitter): inline ADDARRAY / ADD16SAT / DATA / COMP_CTRL hot path
Profile data on AvP gameplay (state6, accurate blitter) showed
ADDARRAY as the single largest leaf in the entire emulator at
1910 sample-of-stack hits, followed by DATA (759) and COMP_CTRL
(318). These three and ADD16SAT are called from the
BlitterMidsummer2 inner loop only, and most call sites pass
compile-time-constant flags for
daddasel/daddbsel/daddmode/sat/eightbit/hicinh/etc, which makes
them ideal candidates for per-call-site specialisation by the
compiler once the bodies are visible at the call site.
This commit moves the four definitions above BlitterMidsummer2
(in the order ADD16SAT -> ADDARRAY -> COMP_CTRL -> DATA so each
sees its dependencies) and marks them
`static INLINE __attribute__((always_inline))`. No body changes;
this is purely a re-arrangement so the compiler can do dead-arm
elimination and constant propagation across the call boundary.
Removed the matching extern forward declarations now that the
definitions provide the prototype.
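The pattern can be sketched as follows. This is a minimal illustration, not the emulator's code: `add16_sketch`, `addarray_sketch`, and the two `inner_loop_*` callers are hypothetical names, the arithmetic is a simplified unsigned add rather than the real ADD16SAT logic, and the `INLINE` fallback definition is an assumption about how the project macro behaves. It shows callees defined before their callers with static linkage, and literal flags at the call sites that let the compiler keep only the live arm.

```c
#include <stdint.h>

/* Assumed stand-in for the project's INLINE macro. */
#ifndef INLINE
#define INLINE inline
#endif

/* Leaf helper, defined first so every later caller sees its body.
 * Illustrative only: a simplified unsigned add, not the real ADD16SAT. */
static INLINE __attribute__((always_inline))
uint16_t add16_sketch(uint16_t a, uint16_t b, int sat)
{
    uint32_t sum = (uint32_t)a + (uint32_t)b;
    if (sat)  /* with a literal flag at the call site, one arm is dead */
        return (sum > 0xFFFFu) ? (uint16_t)0xFFFFu : (uint16_t)sum;
    return (uint16_t)sum;  /* wrapping arm */
}

/* Defined after its dependency, so no forward declaration is needed. */
static INLINE __attribute__((always_inline))
void addarray_sketch(uint16_t out[4], const uint16_t a[4],
                     const uint16_t b[4], int sat)
{
    for (int i = 0; i < 4; i++)
        out[i] = add16_sketch(a[i], b[i], sat);
}

/* Two call sites with constant flags: after inlining, each can be
 * specialised to a single arm of add16_sketch. */
void inner_loop_wrap(uint16_t out[4], const uint16_t a[4],
                     const uint16_t b[4])
{
    addarray_sketch(out, a, b, 0);
}

void inner_loop_sat(uint16_t out[4], const uint16_t a[4],
                    const uint16_t b[4])
{
    addarray_sketch(out, a, b, 1);
}
```

Whether the compiler actually folds the flag away depends on optimisation level; `always_inline` only guarantees that the body is available at the call site.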
Measured (Apple M-series, headless `make benchmark` against the
private AvP ROM with state6 loaded, accurate blitter, 600 frames
after 60 warmup, 3-run median):
  BlitterMidsummer2 + callees, sample-of-stack
    before: ~5268 (BM2 2281, ADDARRAY 1910, DATA 759, COMP_CTRL 318)
    after:  ~4592 (BM2 absorbs the four inlinees)
  AvP accurate FPS
    baseline:   173-176
    +ADDARRAY:  192-195
    +DATA+COMP: 198-201 (~+15% net)
Fast-blitter perf unchanged (within ~3% run-to-run noise).
test_blitter_compare and the rest of `make test` pass.
Bit-exactness preserved: the function bodies are byte-for-byte
identical to the originals; only their linkage and source-file
position changed.
Addresses a real-world AvP-on-RetroArch slowdown / audio-dropout
report on Apple Silicon, where the extra ~25 FPS recovers enough
budget for presentation + audio mixing to fit in 16.6 ms.
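As a rough cross-check of the figures above, the FPS ranges can be converted into a relative gain and a per-frame time. This is a sketch under stated assumptions: it uses the midpoints of the reported headless benchmark ranges (174.5 and 199.5 FPS), the helper names are hypothetical, and the real RetroArch workload will not match the headless numbers.

```c
#include <stdio.h>

/* Milliseconds spent per frame at a given frame rate. */
static double frame_ms(double fps)
{
    return 1000.0 / fps;
}

/* Relative speed-up of `after` over `before`, in percent. */
static double gain_pct(double before, double after)
{
    return (after / before - 1.0) * 100.0;
}

/* Print the gain and the per-frame time saved for two frame rates. */
void report(double before_fps, double after_fps)
{
    printf("gain: %.1f%%\n", gain_pct(before_fps, after_fps));
    printf("per-frame time: %.2f ms -> %.2f ms\n",
           frame_ms(before_fps), frame_ms(after_fps));
}
```

Calling `report(174.5, 199.5)` gives a gain a little over 14%, in line with the "~+15% net" figure, and roughly 0.7 ms saved per emulated frame on the benchmark machine.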
Co-Authored-By: Claude Opus 4.7 <[email protected]>
1 parent d124ed9 commit 930bcca
1 file changed
Lines changed: 1738 additions & 1734 deletions