Support parallel multi-core XZ decompression when input is seekable (io.ReaderAt) #470

Description

@unxed

Hello!

I would like to propose an optimization for XZ decompression. Currently, XZ decompression is performed sequentially in a single thread. We have implemented a parallel block decompressor in a fork of the underlying dependency, github.com/unxed/xz, which works as a drop-in replacement.

The XZ format stores an index of block boundaries at the end of the stream. The parallel decompressor locates that index from the stream footer with a constant number of reads, without scanning the compressed data, and then decompresses the independent blocks concurrently using a worker pool. To keep memory usage bounded and reduce garbage collector overhead, we also use sync.Pool to recycle both the decompressed block buffers and the large LZMA decoder dictionary slices.
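The worker-pool and buffer-pooling idea can be sketched with the standard library alone. This is only an illustration under stated assumptions, not the code in github.com/unxed/xz: the names block, bufPool, decodeBlock and decodeParallel are hypothetical, and decodeBlock merely upper-cases its input as a stand-in for a real LZMA2 block decoder.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// block is one independently decodable unit, as an XZ index would describe
// it once parsed. Hypothetical type for this sketch.
type block struct {
	seq  int    // position of the block in the output
	data []byte // compressed bytes of the block
}

// bufPool recycles output buffers so that steady-state decoding allocates little.
var bufPool = sync.Pool{New: func() any { return new(bytes.Buffer) }}

// decodeBlock is a placeholder for a real LZMA2 block decoder. It only
// upper-cases ASCII so the example has no external dependencies.
func decodeBlock(dst *bytes.Buffer, src []byte) {
	dst.Write(bytes.ToUpper(src))
}

// decodeParallel decodes the blocks on `workers` goroutines and returns the
// concatenated output in the original block order.
func decodeParallel(blocks [][]byte, workers int) []byte {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan block)
	results := make([]*bytes.Buffer, len(blocks))

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for b := range jobs {
				buf := bufPool.Get().(*bytes.Buffer)
				buf.Reset()
				decodeBlock(buf, b.data)
				results[b.seq] = buf // each job writes its own slot
			}
		}()
	}
	for i, d := range blocks {
		jobs <- block{seq: i, data: d}
	}
	close(jobs)
	wg.Wait()

	var out []byte
	for _, buf := range results {
		out = append(out, buf.Bytes()...)
		bufPool.Put(buf) // hand the buffer back for reuse
	}
	return out
}

func main() {
	blocks := [][]byte{[]byte("alpha "), []byte("beta "), []byte("gamma")}
	fmt.Printf("%s\n", decodeParallel(blocks, 2))
}
```

For brevity this sketch collects every decoded block before returning. A streaming reader would presumably hand blocks to the caller in order and cap the number in flight, which is what keeps memory bounded.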

Benchmarks (on a 2-core / 4-thread CPU with a 20 MB payload):

  • Vanilla decompressor: ~14 MB/s
  • Optimized sequential decompressor: ~30 MB/s
  • Optimized parallel decompressor: ~70 MB/s

On systems with 4, 8, or more physical cores, throughput scales near-linearly and exceeds 120 MB/s.

This optimization applies when the input source implements io.ReaderAt and the total size of the compressed stream is known.
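As a rough illustration of why both conditions matter: in the XZ file format the stream footer occupies the last 12 bytes of the stream, so finding the index requires knowing the total size and being able to read at an arbitrary offset. The sketch below uses only the standard library. The names parseFooter, locateIndex and buildFooter are hypothetical and are not the API of github.com/unxed/xz, and the code assumes a single stream with no trailing stream padding.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"hash/crc32"
	"io"
)

// footerSize: CRC32 (4) + Backward Size (4) + Stream Flags (2) + magic "YZ" (2).
const footerSize = 12

// parseFooter validates a 12-byte XZ stream footer and returns the size in
// bytes of the index that precedes it.
func parseFooter(f []byte) (int64, error) {
	if len(f) != footerSize {
		return 0, errors.New("xz: footer must be 12 bytes")
	}
	if f[10] != 'Y' || f[11] != 'Z' {
		return 0, errors.New("xz: bad footer magic")
	}
	// The CRC32 covers the Backward Size and Stream Flags fields.
	if crc32.ChecksumIEEE(f[4:10]) != binary.LittleEndian.Uint32(f[0:4]) {
		return 0, errors.New("xz: footer CRC32 mismatch")
	}
	// Backward Size is stored as (index size / 4) - 1.
	stored := binary.LittleEndian.Uint32(f[4:8])
	return (int64(stored) + 1) * 4, nil
}

// locateIndex returns the offset and length of the index, using one read at
// the end of the input. This is why both ReadAt and the total size are needed.
func locateIndex(r io.ReaderAt, size int64) (offset, length int64, err error) {
	if size < footerSize {
		return 0, 0, errors.New("xz: stream shorter than a footer")
	}
	footer := make([]byte, footerSize)
	// ReadAt may return io.EOF together with a full read at the end of the input.
	if n, rerr := r.ReadAt(footer, size-footerSize); n < footerSize {
		return 0, 0, rerr
	}
	length, err = parseFooter(footer)
	if err != nil {
		return 0, 0, err
	}
	offset = size - footerSize - length
	if offset < 0 {
		return 0, 0, errors.New("xz: index size exceeds stream size")
	}
	return offset, length, nil
}

// buildFooter is a demo helper that encodes a footer for a given index size
// (a multiple of 4) so the example can run without a real .xz file.
func buildFooter(indexSize int64, flags [2]byte) []byte {
	f := make([]byte, footerSize)
	binary.LittleEndian.PutUint32(f[4:8], uint32(indexSize/4-1))
	copy(f[8:10], flags[:])
	binary.LittleEndian.PutUint32(f[0:4], crc32.ChecksumIEEE(f[4:10]))
	f[10], f[11] = 'Y', 'Z'
	return f
}

func main() {
	// A fake 32-byte "stream": 12 header bytes, an 8-byte index, then the footer.
	stream := append(make([]byte, 12+8), buildFooter(8, [2]byte{0x00, 0x04})...)
	off, n, err := locateIndex(bytes.NewReader(stream), int64(len(stream)))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("index at offset %d, %d bytes\n", off, n)
}
```

With a plain io.Reader (a pipe, for instance) the footer is only reachable after consuming the whole stream, which is why the sequential path remains the fallback.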


Integration Instructions

To use the parallel reader with random-access inputs in your Go code, check whether the underlying stream implements io.ReaderAt and has a known size, then pass it to the parallel decompressor.

package main

import (
	"io"
	"os"

	"github.com/unxed/xz"
)

// DecompressXZ reads an XZ stream. If the input is a regular file, it uses
// multi-core parallel block decompression. Otherwise, it falls back to
// sequential decompression.
func DecompressXZ(r io.Reader) (io.Reader, error) {
	// 1. Check that the input is a regular file. Pipes and terminals are also
	//    *os.File values, but they do not support ReadAt and report no usable size.
	if file, ok := r.(*os.File); ok {
		if fi, err := file.Stat(); err == nil && fi.Mode().IsRegular() {
			// Use the parallel reader; *os.File implements io.ReaderAt.
			return xz.ReaderConfig{}.NewParallelReader(file, fi.Size())
		}
	}

	// 2. Fall back to standard sequential decompression, e.g. when streaming from a pipe.
	return xz.NewReader(r)
}
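The same check could be extended beyond *os.File. The sketch below is a hypothetical helper, not part of github.com/unxed/xz or this project: randomAccess, sizedReaderAt, inMemory and streamOnly are names invented for illustration. It accepts any reader that offers ReadAt plus a Size method (as *bytes.Reader, *strings.Reader and *io.SectionReader do), and it assumes the compressed stream starts at offset 0 of the input.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
)

// sizedReaderAt is satisfied by *bytes.Reader, *strings.Reader and
// *io.SectionReader, which all expose ReadAt and Size.
type sizedReaderAt interface {
	io.ReaderAt
	Size() int64
}

// randomAccess reports whether r can be read at arbitrary offsets and, if
// so, returns it as an io.ReaderAt together with its total size.
func randomAccess(r io.Reader) (io.ReaderAt, int64, bool) {
	switch v := r.(type) {
	case *os.File:
		// Pipes and terminals are *os.File too, so require a regular file.
		fi, err := v.Stat()
		if err != nil || !fi.Mode().IsRegular() {
			return nil, 0, false
		}
		return v, fi.Size(), true
	case sizedReaderAt:
		return v, v.Size(), true
	}
	return nil, 0, false
}

// inMemory wraps b in a reader that supports random access.
func inMemory(b []byte) io.Reader { return bytes.NewReader(b) }

// streamOnly wraps b in a reader that hides ReadAt, like a pipe would.
func streamOnly(b []byte) io.Reader {
	return io.LimitReader(bytes.NewReader(b), int64(len(b)))
}

func main() {
	data := []byte("pretend this is an .xz file")

	_, size, ok := randomAccess(inMemory(data))
	fmt.Println("in-memory:", ok, size)

	_, _, ok = randomAccess(streamOnly(data))
	fmt.Println("stream-only:", ok)
}
```

A caller would use the parallel reader when the third return value is true and the sequential reader otherwise.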

CC mholt/archives#75
