Hello!
I would like to propose an optimization for XZ decompression, which currently runs sequentially in a single thread. We have implemented a parallel block decompressor in a fork of the underlying dependency at github.com/unxed/xz (which functions as a drop-in replacement).
The XZ format records its block boundaries in an index at the end of the stream. The parallel decompressor reads the stream footer to locate that index in O(1) time, without scanning the compressed data, and then decompresses the independent blocks concurrently using a worker pool. To keep memory use bounded and reduce garbage-collector overhead, we also introduced sync.Pool pooling for both the decompressed block buffers and the large LZMA decoder dictionary slices.
Benchmarks (on a 2-core / 4-thread CPU with a 20 MB payload):
- Vanilla decompressor: ~14 MB/s
- Optimized sequential decompressor: ~30 MB/s
- Optimized parallel decompressor: ~70 MB/s
On systems with 4, 8, or more physical cores, throughput scales near-linearly and exceeds 120 MB/s.
This optimization applies when the input source implements io.ReaderAt and the total compressed stream size is known.
Integration Instructions
To use the parallel reader in your Go code, check whether the input supports random access (io.ReaderAt) and has a known size, and if so pass it to the parallel decompressor. Otherwise, fall back to the sequential reader.
package main

import (
	"io"
	"os"

	"github.com/unxed/xz"
)

// DecompressXZ reads an XZ stream. If the input is a regular file, it uses
// multi-core parallel block decompression. Otherwise, it falls back to
// sequential decompression.
func DecompressXZ(r io.Reader) (io.Reader, error) {
	// 1. Take the parallel path only for regular files. *os.File always
	// implements io.ReaderAt, but ReadAt fails on pipes and terminals,
	// so the file mode has to be checked as well.
	if file, ok := r.(*os.File); ok {
		if fi, err := file.Stat(); err == nil && fi.Mode().IsRegular() {
			// Use the parallel reader; the XZ stream is assumed to span the whole file.
			return xz.ReaderConfig{}.NewParallelReader(file, fi.Size())
		}
	}
	// 2. Fall back to standard sequential decompression (pipes, sockets, etc.).
	return xz.NewReader(r)
}
CC mholt/archives#75