Commit 41115a1
Optimize decompression for offset 6 using AVX2 pshufb
This commit implements a specialized SIMD optimization for `offset == 6` in `decompress_bmi2`.
By using `_mm_shuffle_epi8` with precomputed cyclic masks (`OFFSET6_MASKS`), we can replicate the 6-byte repeating pattern into 16-byte vectors and process data in 48-byte chunks (LCM of 6 and 16).
This avoids the slow scalar fallback loop for offsets < 8.
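The mask-based replication described above can be sketched as follows. This is a simplified illustration, not the code from this commit. The names `MASKS`, `fill_pattern6`, `fill_scalar`, and `fill_pattern6_ssse3` are hypothetical, and the contents of the real `OFFSET6_MASKS` table are assumed to follow the same `(16 * k + i) % 6` rule. The sketch takes the 6-byte pattern as an argument, whereas a real decompressor would read it from the last 6 bytes of output. Note that `_mm_shuffle_epi8` itself only requires SSSE3, so the sketch checks for that feature and falls back to a scalar loop otherwise.

```rust
// Hypothetical mask table: MASKS[k][i] is the index of the pattern byte that
// belongs at output position 16 * k + i. Three vectors cover 48 bytes, and
// 48 is a multiple of 6, so the same three vectors repeat for every chunk.
static MASKS: [[u8; 16]; 3] = build_masks();

const fn build_masks() -> [[u8; 16]; 3] {
    let mut m = [[0u8; 16]; 3];
    let mut k = 0;
    while k < 3 {
        let mut i = 0;
        while i < 16 {
            m[k][i] = ((16 * k + i) % 6) as u8;
            i += 1;
        }
        k += 1;
    }
    m
}

/// Fill `dst` with `pattern` repeated cyclically, starting at pattern[0].
pub fn fill_pattern6(pattern: [u8; 6], dst: &mut [u8]) {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("ssse3") {
            // SAFETY: SSSE3 support was verified at runtime just above.
            unsafe { fill_pattern6_ssse3(pattern, dst) };
            return;
        }
    }
    fill_scalar(pattern, dst);
}

/// Byte-at-a-time fallback, also used for the tail shorter than 48 bytes.
fn fill_scalar(pattern: [u8; 6], dst: &mut [u8]) {
    for (i, b) in dst.iter_mut().enumerate() {
        *b = pattern[i % 6];
    }
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "ssse3")]
unsafe fn fill_pattern6_ssse3(pattern: [u8; 6], dst: &mut [u8]) {
    use std::arch::x86_64::*;

    // Place the pattern in the low 6 bytes of a 16-byte register.
    let mut src = [0u8; 16];
    src[..6].copy_from_slice(&pattern);
    let p = _mm_loadu_si128(src.as_ptr() as *const __m128i);

    // Precompute the three shuffled vectors once; they are loop-invariant.
    let mut v = [_mm_setzero_si128(); 3];
    for k in 0..3 {
        let mask = _mm_loadu_si128(MASKS[k].as_ptr() as *const __m128i);
        v[k] = _mm_shuffle_epi8(p, mask);
    }

    // Each 48-byte chunk is written with three unaligned 16-byte stores.
    let mut chunks = dst.chunks_exact_mut(48);
    for chunk in &mut chunks {
        for k in 0..3 {
            _mm_storeu_si128(chunk.as_mut_ptr().add(16 * k) as *mut __m128i, v[k]);
        }
    }

    // The tail begins at a multiple of 48, so it starts at pattern phase 0.
    fill_scalar(pattern, chunks.into_remainder());
}
```

Because 48 is the least common multiple of 6 and 16, every chunk begins at the same phase of the pattern, which is why the three shuffled vectors can be computed once and reused for the whole run.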
Benchmark results (bench_decompress_offset6_micro):
- Baseline: ~2.06 GiB/s
- Optimized: ~10.75 GiB/s
- Improvement: +423%
Also added `bench_decompress_offset6_micro` to `benches/bench_main.rs` to verify and track this optimization.
Co-authored-by: 404Setup <[email protected]>
1 parent 656c1b6 commit 41115a1
2 files changed
Lines changed: 95 additions & 1 deletion
File 1: 33 additions (new line 1446; new lines 1684-1715). Diff content not preserved.
File 2: 62 additions, 1 deletion (new lines 58-63, 998-1052, and 1058; old line 997 removed). Diff content not preserved.