Commit 183fc18
Optimize `decompress_offset_alignr_cycle` for Offset 25 (Shift 7) (#381)

Optimized the decompression hot path for Offset 25 by specializing `decompress_offset_alignr_cycle` for `SHIFT=7`.
The loop was unrolled to a stride of 96 bytes (six 16-byte vectors).
The serial dependency chain of `alignr` instructions was shortened by computing `v_next2`, `v_next4`, and `v_next5` with accumulated shift constants (for example, applying shift 14 to `v_prev` and `v_align` directly instead of going through `v_next1`). This reduces the dependency depth and increases instruction-level parallelism.

Performance impact:
- Throughput for `Decompress offset25` improved by ~1.4% (from ~10.07 GiB/s to ~10.23 GiB/s).
- Correctness verified with `cargo test`.

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>

1 parent 3bd7ff7
1 file changed
Lines changed: 22 additions & 0 deletions
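The accumulated-shift trick can be checked with a small scalar model. This is a sketch under stated assumptions, not the code from this commit: it assumes each `alignr` behaves like SSSE3 `_mm_alignr_epi8` on 16-byte vectors, and `alignr`, `base_pair`, `run_chained`, and `run_direct` are hypothetical helpers. With a period of 25 bytes, `alignr(v[k], v[k-1], 7)` yields `v[k+1]`. More generally, `alignr(v[j+1], v[j], s)` yields `v[j+m]` whenever 16·m ≡ s (mod 25) and s ≤ 16. For instance, shift 14 skips ahead four vectors because 64 ≡ 14 (mod 25). Under this arithmetic, four of the six vectors in a 96-byte iteration come straight from the base pair (shifts 7, 14, 5, 12), and the other two need one extra step. The exact mapping of shifts to the commit's `v_next*` names may differ.

```rust
/// Scalar stand-in (hypothetical) for SSSE3 `_mm_alignr_epi8(hi, lo, n)`:
/// view `hi:lo` as 32 bytes with `lo` in the low half, shift right by `n`
/// bytes, and keep the low 16 bytes.
fn alignr(hi: [u8; 16], lo: [u8; 16], n: usize) -> [u8; 16] {
    assert!(n <= 16);
    let mut out = [0u8; 16];
    for i in 0..16 {
        let k = i + n;
        out[i] = if k < 16 { lo[k] } else { hi[k - 16] };
    }
    out
}

const OFFSET: usize = 25;
const SHIFT: usize = 7; // 32 - OFFSET: alignr(v[k], v[k-1], SHIFT) == v[k+1]

/// First two 16-byte vectors of the periodic output x[p] = seed[p % 25].
fn base_pair(seed: &[u8; OFFSET]) -> ([u8; 16], [u8; 16]) {
    let mut v0 = [0u8; 16];
    let mut v1 = [0u8; 16];
    for i in 0..16 {
        v0[i] = seed[i % OFFSET];
        v1[i] = seed[(16 + i) % OFFSET];
    }
    (v0, v1)
}

/// Serial chain: every new vector waits on the one produced just before it,
/// so one 96-byte iteration is six dependent `alignr` steps deep.
fn run_chained(seed: &[u8; OFFSET], iters: usize) -> Vec<u8> {
    let (mut a, mut b) = base_pair(seed);
    let mut out: Vec<u8> = a.iter().chain(b.iter()).copied().collect();
    for _ in 0..iters {
        for _ in 0..6 {
            let c = alignr(b, a, SHIFT);
            out.extend_from_slice(&c);
            a = b;
            b = c;
        }
    }
    out
}

/// 96-byte unroll with accumulated shifts from the base pair
/// (a, b) = (v[j], v[j+1]): alignr(b, a, s) == v[j+m] when 16*m ≡ s (mod 25).
fn run_direct(seed: &[u8; OFFSET], iters: usize) -> Vec<u8> {
    let (mut a, mut b) = base_pair(seed);
    let mut out: Vec<u8> = a.iter().chain(b.iter()).copied().collect();
    for _ in 0..iters {
        // Depth 1: computed straight from the base pair.
        let v2 = alignr(b, a, 7); // 32 ≡ 7 (mod 25)
        let v4 = alignr(b, a, 14); // 64 ≡ 14 (mod 25)
        let v5 = alignr(b, a, 5); // 80 ≡ 5 (mod 25)
        let v7 = alignr(b, a, 12); // 112 ≡ 12 (mod 25)
        // Depth 2: 48 ≡ 23 and 96 ≡ 21 exceed 16, so take one SHIFT step.
        let v3 = alignr(v2, b, SHIFT);
        let v6 = alignr(v5, v4, SHIFT);
        for v in [v2, v3, v4, v5, v6, v7] {
            out.extend_from_slice(&v);
        }
        // The next iteration's base pair is (v[j+6], v[j+7]).
        a = v6;
        b = v7;
    }
    out
}

fn main() {
    let seed: [u8; OFFSET] = core::array::from_fn(|i| i as u8);
    let chained = run_chained(&seed, 4);
    let direct = run_direct(&seed, 4);
    // Reference: a byte-by-byte match copy at distance 25 repeats the seed.
    let expected: Vec<u8> = (0..chained.len()).map(|p| seed[p % OFFSET]).collect();
    assert_eq!(chained, expected);
    assert_eq!(direct, expected);
    println!("{} bytes match", direct.len());
}
```

Both variants emit the same bytes. In this model, the loop-carried chain of `run_direct` is two `alignr` steps per iteration (through `v6`) instead of six, which gives an out-of-order core more independent shuffles to overlap.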