@@ -23,11 +23,21 @@ Reports `Frames/sec`, `Time/frame`, total wall time. Boots the core via `dlopen
 
 **Instruments (Time Profiler)** is the easiest way to get a flame graph on macOS.
 
+The wrapper at `scripts/profile-mac.sh` builds the core, runs the benchmark
+under `xctrace`, and writes a `.trace` bundle you can open in Instruments:
+
+```bash
+scripts/profile-mac.sh  # default: Time Profiler, accurate blitter
+scripts/profile-mac.sh --template "CPU Counters"  # PMU: cycles, instructions, branch misses
+scripts/profile-mac.sh --rom test/roms/yarc.j64 --open  # auto-open the trace
+```
+
+Manual invocation if you'd rather attach to a running process:
+
 ```bash
 make benchmark BENCH_FRAMES=6000 BENCH_WARMUP=120 &
 BENCH_PID=$!
 
-# Sample for 30 seconds, output to .trace bundle
 xcrun xctrace record --template "Time Profiler" --attach $BENCH_PID --output bench.trace --time-limit 30s
 open bench.trace
 ```
@@ -41,6 +51,57 @@ sample $BENCH_PID 5 -file /tmp/sample.txt
 # 5-second sample. Read /tmp/sample.txt for collapsed call stacks.
 ```
 
+## Bespoke counters — `BENCH_PROFILE=1`
+
+Sampling profilers tell you *where* time goes; counters tell you *how often*
+something happens. When you want exact iteration counts (e.g., "did my
+fast-path actually skip the inner loop?"), use the `perf_counters` system in
+`src/core/perf_counters.h`.
+
+```bash
+make benchmark BENCH_PROFILE=1 BENCH_BLITTER=accurate BENCH_FRAMES=300
+# ...
+# [perf] counter dump:
+# [perf] blitter_phrase_writes 3034993
+# [perf] blitter_phrase_reads 931821
+# [perf] blitter_inner_io 3966814
+# [perf] blitter_inner 4131151
+# [perf] blitter_outer 337722
+# [perf] blitter_calls 131628
+```
+
+The macros are zero-overhead when `BENCH_PROFILE` is undefined (the default
+build): every `PERF_INC` becomes `((void)0)` and every `PERF_COUNTER`
+becomes a typedef, so you can use them freely in hot paths to test
+hypotheses.
+
+Adding a counter:
+
+```c
+#include "perf_counters.h"
+
+PERF_COUNTER(my_event);  /* file scope */
+
+void hot(int n) {
+    PERF_INC(my_event);  /* in-loop: count one event */
+    PERF_ADD(my_event, n);  /* batch: count n events at once */
+}
+```
+
+The harness (`test/tools/test_benchmark.c`) calls
+`perf_counters_dump(stderr)` at exit; counter values appear right
+before the `BENCHMARK RESULTS` block.
+
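For intuition, the on/off switching described above could be implemented roughly as follows. This is a minimal sketch under assumptions, not the contents of `src/core/perf_counters.h`: the `perf_` name prefix, the `uint64_t` storage, and the `PERF_GET` helper are hypothetical, and the sketch omits whatever registration `perf_counters_dump` relies on.

```c
#include <stdint.h>

#ifdef BENCH_PROFILE
/* Profiling build: each counter is a file-scope 64-bit integer. */
#define PERF_COUNTER(name) static uint64_t perf_##name
#define PERF_INC(name)     ((void)(perf_##name += 1))
#define PERF_ADD(name, n)  ((void)(perf_##name += (uint64_t)(n)))
#define PERF_GET(name)     (perf_##name)      /* hypothetical read helper */
#else
/* Default build: no storage and no code. The typedef exists only so that
 * `PERF_COUNTER(x);` at file scope still parses as a declaration. */
#define PERF_COUNTER(name) typedef int perf_unused_##name
#define PERF_INC(name)     ((void)0)
#define PERF_ADD(name, n)  ((void)0)
#define PERF_GET(name)     ((uint64_t)0)      /* hypothetical read helper */
#endif

PERF_COUNTER(my_event);

/* Counts one event per loop iteration, then n more in a single batch. */
static void hot(int n) {
    for (int i = 0; i < n; i++) {
        PERF_INC(my_event);
    }
    PERF_ADD(my_event, n);
}
```

Compiled with `BENCH_PROFILE` defined, `hot(5)` leaves the counter at 10 (five increments plus one batch add of 5). Without it, the macros expand to nothing and no counter storage exists.
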
+When to reach for this vs. Time Profiler:
+
+| Question | Tool |
+|---|---|
+| "Where are we spending cycles?" | `xctrace` Time Profiler |
+| "How many times does the inner loop run per frame?" | `BENCH_PROFILE=1` |
+| "What fraction of inner iterations are no-ops?" | `BENCH_PROFILE=1` |
+| "Are we hitting L1 / branch-mispredicting?" | `xctrace` CPU Counters |
+| "Did this optimization change behavior, not just timing?" | `BENCH_PROFILE=1` (deltas in counts) |
+
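The per-frame and fraction questions in the table reduce to simple arithmetic on the dumped counts. A minimal sketch follows; `per_frame` and `idle_fraction` are hypothetical helper names, not part of the harness.

```c
#include <stdint.h>

/* Average number of events per frame; 0 when no frames were measured. */
static double per_frame(uint64_t count, uint64_t frames) {
    return frames ? (double)count / (double)frames : 0.0;
}

/* Fraction of `total` events that are not accounted for by `active`
 * events. Returns 0 for an empty or inconsistent pair of counts. */
static double idle_fraction(uint64_t total, uint64_t active) {
    if (total == 0 || active > total) {
        return 0.0;
    }
    return (double)(total - active) / (double)total;
}
```

In the sample dump above, `blitter_inner_io` equals the sum of writes and reads (3034993 + 931821 = 3966814), and `idle_fraction(4131151, 3966814)` is about 0.04. Reading that remainder as no-op iterations assumes each inner iteration performs at most one phrase read or write, which the source does not state. Likewise, `per_frame(4131151, 300)` is about 13,770 only if the counters cover exactly the 300 measured frames and exclude warm-up.
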
 ## Linux — `perf` + flamegraph
 
 ```bash