Commit ebca788
feat(decompress): optimize offset 13 loop unrolling (#379)
Unrolls the `decompress_offset_13` loop to 8 stores (104 bytes per iteration)
instead of 4 stores (52 bytes). This reduces loop overhead for long matches
and improves throughput by ~4.1%.
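The diff body is not reproduced on this page, so the following is only a minimal safe-Rust sketch of the unrolling shape described above, not the code from this commit. The function name `copy_match_offset_13`, the `Vec<u8>` output buffer, and the use of `extend_from_slice` are all assumptions made for illustration; the real `decompress_offset_13` path may well use raw pointers and wider stores.

```rust
/// Hypothetical helper (not from the commit): append `len` bytes to `out`,
/// where each new byte equals the byte 13 positions before it.
fn copy_match_offset_13(out: &mut Vec<u8>, len: usize) {
    const OFFSET: usize = 13;
    assert!(out.len() >= OFFSET, "need at least 13 bytes of history");

    // An offset-13 match repeats the last 13 bytes of output indefinitely,
    // so the pattern can be captured once up front.
    let mut pattern = [0u8; OFFSET];
    pattern.copy_from_slice(&out[out.len() - OFFSET..]);

    let mut remaining = len;

    // Unrolled body: 8 stores of 13 bytes = 104 bytes per iteration,
    // so the loop condition is checked once per 104 bytes instead of once per 52.
    while remaining >= 8 * OFFSET {
        out.extend_from_slice(&pattern);
        out.extend_from_slice(&pattern);
        out.extend_from_slice(&pattern);
        out.extend_from_slice(&pattern);
        out.extend_from_slice(&pattern);
        out.extend_from_slice(&pattern);
        out.extend_from_slice(&pattern);
        out.extend_from_slice(&pattern);
        remaining -= 8 * OFFSET;
    }

    // Tail: whole 13-byte stores, then a final partial store.
    while remaining >= OFFSET {
        out.extend_from_slice(&pattern);
        remaining -= OFFSET;
    }
    out.extend_from_slice(&pattern[..remaining]);
}
```

Calling `copy_match_offset_13(&mut buf, 250)` on a buffer holding at least 13 bytes would run the unrolled body twice (208 bytes), then three whole stores (39 bytes), then a 3-byte partial store.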
Benchmark (Decompress offset13/libdeflate-rs offset13):
Before: ~8.03 GiB/s
After: ~8.36 GiB/s
Change: +4.1%
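As a quick consistency check on the figures above, the relative change can be recomputed from the two throughput numbers. The helper name `percent_change` is hypothetical and is not part of the benchmark harness.

```rust
/// Hypothetical helper: relative change from `before` to `after`, in percent.
fn percent_change(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}
```

With the reported values, `percent_change(8.03, 8.36)` comes out slightly above 4.1, which agrees with the stated +4.1% after rounding.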
Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
1 parent 15c4d16 commit ebca788
1 file changed
Lines changed: 17 additions & 0 deletions
Diff content not captured: 17 lines added at new lines 966-982, with unchanged context at lines 963-965 before and at new lines 983-985 (original lines 966-968) after.