erasure_code: branch-free gf_mul via sentinel log + zeroed antilog by scopedog · Pull Request #421 · intel/isa-l

scopedog · 2026-06-03T22:45:00Z

Summary

The default (non-GF_LARGE_TABLES) gf_mul() does a log + antilog lookup
guarded by two branches: an (a == 0 || b == 0) zero test and a
"> 254 ? i - 255" wrap that reduces the summed log index i modulo 255.
This PR replaces both with a single, fully branch-free table lookup:

return gff_base[gflog_base[a] + gflog_base[b]];

The polynomial is unchanged (0x11D), so results are bit-identical to
the previous gf_mul() — there is no change to encoded data.

How it stays correct without branches

Two table changes:

- gflog_base is widened to uint16_t, and gflog_base[0] is set to the
  sentinel 511, outside the normal log range [0, 254].
- gff_base (antilog) holds two full periods followed by a zeroed tail.
  The sum gflog_base[a] + gflog_base[b] is at most 254 + 254 = 508 for
  nonzero operands, so it always lands inside the doubled table and
  never needs a reduction wrap.

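When either operand is 0, the sentinel pushes the index to at least 511,
past the two periods (indices 0 to 509) and into the zeroed tail, so the
result is 0 with no zero-guard. The doubled antilog removes the reduction
wrap. The maximum index, reached when both operands are 0, is
511 + 511 = 1022.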
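
The table layout described above can be sketched as follows. This is a minimal illustration, not the code in the PR: the names log_tbl, antilog_tbl, build_tables and gf_mul_sketch are hypothetical, the tables are generated at run time from generator 2 rather than stored as constants, and log(1) = 0 is an assumption that matches the [0, 254] log range stated above.

```c
#include <stdint.h>

#define GF_POLY     0x11D  /* field polynomial named in the PR */
#define LOG_ZERO    511    /* sentinel log value for operand 0 */
#define ANTILOG_LEN 1023   /* largest index is 511 + 511 = 1022 */

static uint16_t log_tbl[256];             /* 256 * 2 B = 512 B */
static uint8_t  antilog_tbl[ANTILOG_LEN]; /* 1023 B, roughly 1 KB */

/* Fill both tables. Indices 0..509 of antilog_tbl hold two periods of
 * the powers of 2; indices 510..1022 stay zero (the "zeroed tail"). */
static void build_tables(void)
{
    unsigned x = 1;

    for (int i = 0; i < ANTILOG_LEN; i++)
        antilog_tbl[i] = 0;

    for (int i = 0; i < 255; i++) {
        antilog_tbl[i] = (uint8_t)x;        /* first period */
        antilog_tbl[i + 255] = (uint8_t)x;  /* second period */
        log_tbl[x] = (uint16_t)i;
        x <<= 1;                            /* multiply by the generator 2 */
        if (x & 0x100)
            x ^= GF_POLY;                   /* reduce modulo 0x11D */
    }
    log_tbl[0] = LOG_ZERO;                  /* any sum with a zero operand is >= 511 */
}

/* Branch-free multiply: one add and one lookup, no zero test, no wrap. */
static uint8_t gf_mul_sketch(uint8_t a, uint8_t b)
{
    return antilog_tbl[log_tbl[a] + log_tbl[b]];
}
```

After build_tables(), gf_mul_sketch(0x80, 2) yields 0x1D (the product 0x100 reduced by 0x11D), and any call with a zero operand reads from the zeroed tail.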

Scope / compatibility

- Only the default arm changes; the GF_LARGE_TABLES path is untouched.
- gf_inv() is unchanged.
- Public ABI unchanged.
- Tables grow from 256 B + 256 B to ~1 KB (gff_base) + 512 B
  (gflog_base), still far smaller than the 64 KB GF_LARGE_TABLES
  table, which this makes largely redundant (see numbers below).
- Only erasure_code/ec_base.{c,h} are touched.

This speeds up the scalar multiply that feeds the base dot-product/encode
fallback paths and matrix construction (gf_gen_rs_matrix,
gf_invert_matrix); it does not touch the SIMD paths.

Performance

Microbenchmark, ns/mul (lower is better), -O2 -march=native, minimum of
6 passes, with the gf_mul(data, coeff) argument order used by the base
functions, over a table build and a region multiply (fixed coefficient,
streaming data) on dense and 50%-zero data:

                          build   dense   sparse(50% zero)
  Zen 3 (Ryzen 7 5800X)
    stock (default)       0.75    0.72    0.73
    this change           0.36    0.36    0.42
    GF_LARGE_TABLES       0.50    0.42    0.22
  Raptor Lake (i5-1340P)
    stock (default)       0.81    1.04    1.30
    this change           0.45    0.54    0.54
    GF_LARGE_TABLES       0.73    0.71    0.71

About 2× the previous default in most cases (from ~1.7× on Zen 3's sparse
case to ~2.4× on Raptor Lake's sparse case), and nearly flat across data
because there is no data-dependent branch, whereas the default zero-guard
degrades on zero-heavy input (1.04 to 1.30 ns/mul on Raptor Lake). It
delivers most of the 64 KB GF_LARGE_TABLES benefit at ~1/40th the memory:
it is faster than GF_LARGE_TABLES in every Raptor Lake column, while on
Zen 3 the 64 KB table still wins on zero-heavy data (0.22 vs 0.42 ns/mul).
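
For readers who want to reproduce this kind of measurement, here is a rough sketch of a "minimum of N passes" harness over a region multiply. It is not the benchmark used for the numbers above. Every name in it (fill_data, region_mul_kernel, min_ns_per_mul, gf_mul_bitwise) is hypothetical, the bitwise multiply is only a stand-in for whichever gf_mul is under test, and the standard C clock() is used for portability even though its resolution is coarse, so buffers need to be large for the result to mean anything.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Stand-in multiply over 0x11D (shift-and-add); swap in the gf_mul under test. */
static uint8_t gf_mul_bitwise(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    while (b) {
        if (b & 1)
            p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1D : 0));
        b >>= 1;
    }
    return p;
}

/* Region multiply: fixed coefficient, streaming data, gf_mul(data, coeff) order. */
static void region_mul_kernel(const uint8_t *src, uint8_t *dst, size_t len,
                              uint8_t coeff)
{
    for (size_t i = 0; i < len; i++)
        dst[i] = gf_mul_bitwise(src[i], coeff);
}

/* Fill buf so that about zero_percent of the bytes are 0 and the rest are
 * nonzero, using a small xorshift32 generator. seed must be nonzero. */
static void fill_data(uint8_t *buf, size_t len, unsigned zero_percent,
                      uint32_t seed)
{
    for (size_t i = 0; i < len; i++) {
        seed ^= seed << 13;
        seed ^= seed >> 17;
        seed ^= seed << 5;
        if ((seed >> 8) % 100 < zero_percent)
            buf[i] = 0;
        else
            buf[i] = (uint8_t)(seed % 255 + 1);  /* 1..255 */
    }
}

typedef void (*region_kernel)(const uint8_t *, uint8_t *, size_t, uint8_t);

/* Run the kernel `passes` times and return the fastest pass in ns per byte. */
static double min_ns_per_mul(region_kernel kernel, const uint8_t *src,
                             uint8_t *dst, size_t len, uint8_t coeff,
                             int passes)
{
    double best = -1.0;
    for (int p = 0; p < passes; p++) {
        clock_t t0 = clock();
        kernel(src, dst, len, coeff);
        clock_t t1 = clock();
        double ns = (double)(t1 - t0) * 1e9 / CLOCKS_PER_SEC / (double)len;
        if (best < 0.0 || ns < best)
            best = ns;
    }
    return best;
}
```

Timing the whole region through one kernel call keeps per-multiply call overhead out of the measurement. Calling fill_data with zero_percent set to 0 or 50 gives inputs comparable in spirit to the dense and sparse columns.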

Testing

- gf_mul checked bit-identical to an independent carryless-multiply
  reference (mod 0x11D) over all 65 536 input pairs.
- gf_inv round-trips for every nonzero input.
- erasure_code_test, erasure_code_update_test, gf_inverse_test,
  gf_vect_mul_test, gf_vect_dot_prod_test all pass.
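
The first two checks could look roughly like the sketch below. It is an assumption about the shape of such a test, not the author's harness: gf_mul_ref, gf_inv_ref, count_mul_mismatches and count_inv_failures are hypothetical names, and the implementations under test are passed in as function pointers so the block stays independent of ec_base.

```c
#include <stdint.h>

/* Reference: carryless (polynomial) multiply of two bytes, then reduce
 * the up-to-15-bit product modulo 0x11D, highest bit first. */
static uint8_t gf_mul_ref(uint8_t a, uint8_t b)
{
    uint16_t prod = 0;

    for (int i = 0; i < 8; i++)
        if (b & (1u << i))
            prod ^= (uint16_t)((uint16_t)a << i);

    for (int bit = 14; bit >= 8; bit--)
        if (prod & (1u << bit))
            prod ^= (uint16_t)(0x11D << (bit - 8));

    return (uint8_t)prod;
}

/* Reference inverse by brute-force search; returns 0 for input 0. */
static uint8_t gf_inv_ref(uint8_t a)
{
    for (unsigned x = 1; x < 256; x++)
        if (gf_mul_ref(a, (uint8_t)x) == 1)
            return (uint8_t)x;
    return 0;
}

typedef uint8_t (*gf_mul_fn)(uint8_t, uint8_t);
typedef uint8_t (*gf_inv_fn)(uint8_t);

/* Compare `mul` with the reference over all 256 * 256 = 65 536 pairs
 * and return the number of mismatches. */
static unsigned count_mul_mismatches(gf_mul_fn mul)
{
    unsigned bad = 0;
    for (unsigned a = 0; a < 256; a++)
        for (unsigned b = 0; b < 256; b++)
            if (mul((uint8_t)a, (uint8_t)b) !=
                gf_mul_ref((uint8_t)a, (uint8_t)b))
                bad++;
    return bad;
}

/* Round trip: a * inv(a) must equal 1 for every nonzero a. */
static unsigned count_inv_failures(gf_mul_fn mul, gf_inv_fn inv)
{
    unsigned bad = 0;
    for (unsigned a = 1; a < 256; a++)
        if (mul((uint8_t)a, inv((uint8_t)a)) != 1)
            bad++;
    return bad;
}
```

Passing the library's gf_mul and gf_inv to these two counters and expecting 0 from both would correspond to the first two items in the list above.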

Provenance

The branch-free sentinel + zeroed-tail technique is from the gf-nishida-16
library: https://github.com/scopedog/gf-nishida-16

Commit carries Signed-off-by: per the DCO.

Replace the default (non-GF_LARGE_TABLES) gf_mul with a single
branch-free table lookup:

    return gff_base[gflog_base[a] + gflog_base[b]];

Two table changes make this correct without any branch:
- gflog_base is widened to uint16_t and gflog_base[0] is set to a
  sentinel (511) outside the normal log range [0,254].
- gff_base (antilog) holds two full periods so the index never needs a
  reduction wrap, followed by a zeroed tail.

A zero operand sends the index into the zeroed tail, so the product is 0
without an (a == 0 || b == 0) test. This removes both the zero-guard and
the reduction branch from the hottest scalar multiply, which feeds the
scalar dot-product/encode fallback paths and matrix construction
(gf_gen_rs_matrix, gf_invert_matrix).

The polynomial is unchanged (0x11D), so results are bit-identical to the
previous gf_mul. Tables grow from 256 B + 256 B to ~1 KB (gff_base) +
512 B (gflog_base), still far smaller than the 64 KB GF_LARGE_TABLES
path, which this makes largely redundant. gf_inv is unchanged.

Microbenchmark, ns/mul (lower is better), -O2 -march=native, min of 6
passes, gf_mul(data, coeff) order as called by the base functions.
Comparing this change to the previous default (stock) and the 64 KB
GF_LARGE_TABLES, over a table build and a region multiply (fixed
coefficient, streaming data) on dense and 50%-zero data:

                          build   dense   sparse(50% zero)
  Zen 3 (Ryzen 7 5800X)
    stock (default)       0.75    0.72    0.73
    this change           0.36    0.36    0.42
    GF_LARGE_TABLES       0.50    0.42    0.22
  Raptor Lake (i5-1340P)
    stock (default)       0.81    1.04    1.30
    this change           0.45    0.54    0.54
    GF_LARGE_TABLES       0.73    0.71    0.71

This change is ~2x the previous default everywhere and stays consistent
across data (no data-dependent branch). It delivers most of the 64 KB
GF_LARGE_TABLES benefit at ~1/40th the memory: faster on Raptor Lake,
while on Zen 3 the 64 KB table can still edge it on zero-heavy data.

Technique from the gf-nishida-16 library
(https://github.com/scopedog/gf-nishida-16).

Verified:
- gf_mul bit-identical to a carryless-multiply reference over all 65536
  input pairs; gf_inv round-trips for every nonzero input
- erasure_code_test, erasure_code_update_test, gf_inverse_test,
  gf_vect_mul_test, gf_vect_dot_prod_test all pass

Co-Authored-By: Claude Opus 4.8 <[email protected]>
Signed-off-by: Hiroshi Nishida <[email protected]>

scopedog wants to merge 1 commit into intel:master from scopedog:upstream-gfmul-nishida
