gdi: replace per-pixel /255 divides with shift+add equivalent
The hot pixel paths in gfx_display_gdi_draw and gdi_font_render_line
do many `(uint32_t)x / 255u` operations per pixel. Each one is a
20-30 cycle integer divide on x86, versus a few cycles for the
shift+add form. For a typical Ozone-with-widgets frame:
- General 4-corner gradient: 14 divides per pixel.
- 1D gradients (vertical/horizontal): 4 divides per row/column,
plus 3 per non-opaque pixel. Less hot since the previous
commit collapsed those to 1D loops, but still a free win.
- Tinted-glyph font composite: 4 divides per glyph pixel.
Add a GDI_DIV255 macro:
#define GDI_DIV255(x) ((((x) + 1) + ((x) >> 8)) >> 8)
Verified bit-exact equivalent of `(uint32_t)x / 255u` for every
input in [0, 255*255 = 65025] — a brute-force comparison against
integer division across all 65026 values produces zero diffs.
That's exactly the input range that products of two 8-bit values
land in, which is what every divide-by-255 site here computes.
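The brute-force comparison described above can be reproduced with a
sketch along these lines. The macro is copied from this commit; the
helper name count_div255_mismatches is hypothetical and not part of the
tree.

```c
#include <stdint.h>

/* Macro as introduced by this commit. */
#define GDI_DIV255(x) ((((x) + 1) + ((x) >> 8)) >> 8)

/* Hypothetical helper (not in the tree): counts the inputs in
 * [0, limit] where the shift+add form disagrees with a true
 * truncating integer divide by 255. */
static unsigned count_div255_mismatches(uint32_t limit)
{
   uint32_t x;
   unsigned diffs = 0;

   for (x = 0; x <= limit; x++)
   {
      if (GDI_DIV255(x) != x / 255u)
         diffs++;
   }
   return diffs;
}
```

Calling count_div255_mismatches(255u * 255u) walks all 65026 values
and should return 0 if the macro matches the divide over that range.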
Applied at every hot per-pixel /255 site:
- Gradient bilinear (general 4-corner path): 14 sites per
pixel.
- 1D gradient paths (vertical-only, horizontal-only): 4 sites
per row/column plus 3 sites per non-opaque pixel.
- Tinted-glyph font scratch composite: 4 sites per pixel.
- 1x1 translucent-solid premultiply: 3 sites per draw.
- Texture-modulated tint (out_a only): 1 site per pixel.
- Font line outer premultiply: 3 sites per line.
- gdi_load_texture / gdi_overlay_load: 3 sites per non-opaque
pixel. Load-time only, but free to apply for consistency.
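As an illustration of what a converted site looks like, here is a
minimal sketch of a three-site premultiply. It assumes a packed ARGB
pixel layout; the function name premultiply_argb is hypothetical and
the real code in gdi_load_texture may be structured differently.

```c
#include <stdint.h>

/* Macro as introduced by this commit. */
#define GDI_DIV255(x) ((((x) + 1) + ((x) >> 8)) >> 8)

/* Hypothetical example (not the tree's code): premultiply one packed
 * 0xAARRGGBB pixel by its own alpha. Each channel product is at most
 * 255 * 255, so it stays inside the range the macro was verified on. */
static uint32_t premultiply_argb(uint32_t argb)
{
   uint32_t a = (argb >> 24) & 0xFFu;
   uint32_t r = (argb >> 16) & 0xFFu;
   uint32_t g = (argb >>  8) & 0xFFu;
   uint32_t b =  argb        & 0xFFu;

   /* Previously three `(c * a) / 255u` divides. */
   r = GDI_DIV255(r * a);
   g = GDI_DIV255(g * a);
   b = GDI_DIV255(b * a);

   return (a << 24) | (r << 16) | (g << 8) | b;
}
```

Because every operand is an 8-bit channel times an 8-bit alpha, the
result should match the divide-based version byte for byte.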
Deliberately NOT changed:
- The `/ (255u * 255u)` divides for out_r/g/b in
gdi_blit_texture_modulated. Collapsing those to two
sequential GDI_DIV255 calls would introduce up to 1 LSB of
rounding error compared to the single divide, since
(a/255)*(b/255) has a different rounding boundary than
(a*b)/(255*255). The cost saving isn't worth a visible
drift in tinted-icon pixels.
- The `(x + 127) / 255` rounded form in gdi_blit_rgui_alpha.
That's deliberately round-to-nearest rather than truncate,
which GDI_DIV255 doesn't reproduce. RGUI's per-frame cost
is dominated by syscall / blit overhead, not the divides.
- The `(iy * 255u) / (dst_h - 1)` interp-factor divides.
Divisor varies per draw; not a constant-255 case.
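To illustrate why the rounded form was left alone, the sketch below
compares `(x + 127) / 255` with the truncating macro. The helper names
div255_round and count_round_vs_trunc_diffs are hypothetical and exist
only for this example.

```c
#include <stdint.h>

/* Macro as introduced by this commit (truncating). */
#define GDI_DIV255(x) ((((x) + 1) + ((x) >> 8)) >> 8)

/* The round-to-nearest form quoted above for gdi_blit_rgui_alpha.
 * Hypothetical wrapper, not a function in the tree. */
static uint32_t div255_round(uint32_t x)
{
   return (x + 127u) / 255u;
}

/* Hypothetical helper: counts inputs in [0, 255*255] where the
 * rounded form and the truncating macro give different results. */
static unsigned count_round_vs_trunc_diffs(void)
{
   uint32_t x;
   unsigned diffs = 0;

   for (x = 0; x <= 255u * 255u; x++)
   {
      if (div255_round(x) != GDI_DIV255(x))
         diffs++;
   }
   return diffs;
}
```

For example, x = 128 rounds to 1 under `(x + 127) / 255` but truncates
to 0 under GDI_DIV255, so substituting the macro there would change
output.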
No visual change intended. Output is byte-identical to the
divide-based code at every converted site.