Commit d120e2c
Optimize adler32 tail processing
Rewrote the `adler32_tail!` macro in `src/adler32/x86.rs` to handle small tails (1, 2, or 3 bytes) with explicitly unrolled accumulation instead of a sequence of conditional branches.
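As a rough illustration of the idea (not the macro body from this commit, which is not shown here), the sketch below compares a byte-at-a-time tail update with closed-form updates for 1, 2, and 3 bytes. The names `tail_sequential` and `tail_unrolled` are hypothetical, and the sketch assumes `a` and `b` are already reduced modulo 65521 on entry.

```rust
/// Adler-32 modulus (largest prime below 2^16).
const MOD_ADLER: u32 = 65_521;

/// Reference version: fold the tail one byte at a time.
/// Assumes `a` and `b` are already below MOD_ADLER and the tail is short,
/// so the u32 sums cannot overflow before the final reduction.
fn tail_sequential(mut a: u32, mut b: u32, tail: &[u8]) -> (u32, u32) {
    for &byte in tail {
        a += u32::from(byte);
        b += a;
    }
    (a % MOD_ADLER, b % MOD_ADLER)
}

/// Hypothetical unrolled version: one arm per tail length, each computing
/// the same sums as the loop above in closed form.
fn tail_unrolled(a: u32, b: u32, tail: &[u8]) -> (u32, u32) {
    let (a, b) = match tail {
        [] => (a, b),
        [d0] => {
            let d0 = u32::from(*d0);
            (a + d0, b + a + d0)
        }
        [d0, d1] => {
            let (d0, d1) = (u32::from(*d0), u32::from(*d1));
            // b accumulates a twice, d0 twice, and d1 once.
            (a + d0 + d1, b + 2 * a + 2 * d0 + d1)
        }
        [d0, d1, d2] => {
            let (d0, d1, d2) = (u32::from(*d0), u32::from(*d1), u32::from(*d2));
            // b accumulates a three times, d0 three times, d1 twice, d2 once.
            (a + d0 + d1 + d2, b + 3 * a + 3 * d0 + 2 * d1 + d2)
        }
        // Longer tails are outside the scope of this sketch.
        _ => return tail_sequential(a, b, tail),
    };
    (a % MOD_ADLER, b % MOD_ADLER)
}
```

Starting from the initial state `(1, 0)`, both functions should agree on every tail of up to three bytes; whether the unrolled form is actually faster depends on the compiler and target, as the benchmark figures suggest.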
Performance impact (bench_adler32_tail):
- 1 byte: -8.3% time (faster)
- 2 bytes: -5.2% time (faster)
- 3 bytes: +3.8% time (slower)
- 0 bytes (implicit): no figure reported; expected to be faster due to better branch prediction or a single jump.
This improves throughput for 1- and 2-byte unaligned tails, which are common in chunked processing, at the cost of a small regression for 3-byte tails.
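To show where such tails come from, here is a minimal sketch of splitting a buffer into full blocks plus a leftover tail. The 16-byte block width and the `split_blocks_and_tail` helper are assumptions for illustration only, not values or functions taken from `src/adler32/x86.rs`.

```rust
/// Hypothetical block width; the real SIMD width used by the crate may differ.
const BLOCK: usize = 16;

/// Returns the number of full blocks and the leftover tail bytes.
fn split_blocks_and_tail(data: &[u8]) -> (usize, &[u8]) {
    let chunks = data.chunks_exact(BLOCK);
    // `remainder` yields the bytes that did not fill a whole block.
    let tail = chunks.remainder();
    (chunks.len(), tail)
}
```

With a 35-byte input and this assumed width, two full blocks are consumed and a 3-byte tail is left for the scalar path.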
Co-authored-by: 404Setup <[email protected]>
1 parent bb9bcc7 commit d120e2c
2 files changed
Lines changed: 45 additions & 18 deletions
First changed file (line contents not preserved): 25 additions, 0 deletions.

| Hunk | Original lines | New lines | Change |
|---|---|---|---|
| 1 | 419-424 | 419-448 | 24 lines added (new lines 422-445) |
| 2 | 1412-1417 | 1436-1442 | 1 line added (new line 1439) |
Second changed file (line contents not preserved): 20 additions, 18 deletions.

| Hunk | Original lines | New lines | Change |
|---|---|---|---|
| 1 | 61-84 | 61-86 | 18 lines removed (original lines 64-81), 20 lines added (new lines 64-83) |