pccl

A point cloud compression library (positions only) with a tiny public API. The caller passes a frame of float xyz positions, picks one of four presets, and gets back a self-describing byte buffer. The receiver calls decode() on those bytes and gets the cloud back.

Ultra-low latency compared to MPEG G-PCC compression (100-300x faster encoding / decoding) at the cost of slightly worse compression ratio (between 1.1x and 2.5x the MPEG G-PCC compressed size depending on the cloud). Most LIDAR point clouds will encode in a few milliseconds at most ; a 3.2 million point PCL encodes to 9 megabytes in ~32 milliseconds and decodes in ~10 milliseconds on a 5950X CPU. On smaller point clouds (for instance an Orbbec scan) compression within a 60FPS frame budget is possible.

Maximum input precision is 21 bits per axis, way above the actual precision of an actual LIDAR.

Building

cmake -S . -B build -GNinja
cmake --build build
ctest --test-dir build

Build options:

Option	Default	Effect
`PCCL_BUILD_TESTS`	ON	Build the Catch2 test suite.
`PCCL_BUILD_BENCHMARKS`	ON	Build the benchmark and explorer tools (needs zlib, zstd, lz4, brotli, snappy, lzo system packages; miniply is vendored).
`PCCL_ENABLE_LTO`	ON	Enable interprocedural optimisation.
`PCCL_NATIVE_ARCH`	ON	Tune for the host CPU (`-march=native` on x86_64, `-mcpu=native` on aarch64). Turn OFF for portable CI builds.

Compile flags applied to library and test code are -O3 -ffp-contract=fast plus the arch-tuning flag selected by PCCL_NATIVE_ARCH. The project does not use -Ofast by default because, in our measurements, -ffast-math regressed std::unique codegen under gcc and forced a workaround for an infinity-sentinel pattern in the AABB reduction. The sentinel was rewritten anyway (it would have been a latent bug under any -ffinite-math-only build), so a build with -Ofast now works correctly; it is not faster in practice for this workload.

Quick start

Dependencies: C++23, CMake, boost. Multiple compression libraries available for benchmarks (zstd etc.).

The API is two stateful classes that own their scratch buffers. Reuse one Encoder across the whole stream so the steady state stops allocating after the first frame.

#include <pccl/pccl.hpp>
#include <span>

pccl::Encoder enc;                       // reuse across frames
pccl::AlignedVec<std::uint8_t> bytes;    // reuse across frames

pccl::EncodeOptions opts;
opts.preset = pccl::Preset::Balanced;    // Fast, Balanced, Small, Smallest
opts.bits   = 16;                        // 1..21 bits per axis

for (const auto& frame : frames)
{
  // xyz is a flat span of floats whose size is a multiple of 3.
  std::span<const float> xyz{frame.xyz, frame.num_points * 3};
  pccl::EncodeStats stats;
  enc.encode(xyz, opts, bytes, &stats);
  send(bytes.data(), bytes.size());
}

// On the receiver:
pccl::Decoder dec;                       // reuse across frames
pccl::AlignedVec<float>  decoded;        // reuse across frames
pccl::FrameMetadata meta;

while (auto packet = recv())
{
  std::span<const std::uint8_t> data{packet.data, packet.size};
  dec.decode(data, decoded, &meta);
  // decoded has 3 * meta.unique_voxels floats
}

For streaming, set a shared AABB explicitly so consecutive frames quantise to the same grid:

opts.aabb_min   = {-1.f, -1.f, -1.f};
opts.aabb_max   = { 1.f,  1.f,  1.f};
opts.aabb_valid = true;

pccl::peek_header() reads the bitstream metadata without touching the payload, useful for a relay that wants to route or transcode without decoding.

Presets

Each preset is a fixed pipeline. All four voxelise the input (collapse duplicate quantised points). The receiver gets unique_voxels floats, not the original num_points.

Preset	Pipeline	When to pick
`Fast`	TurboPFor `vbxenc64` (delta + variable byte on u64)	Lowest decode latency. Best ratio per decode millisecond at low/mid bit depth.
`Balanced`	TurboPFor `p4ndenc64` on sorted Morton	Densest output at mid/high bit depth.
`Small`	delta + LEB128 + FSE entropy coding	Best size/speed compromise when decode budget is tight.
`Smallest`	octree occupancy bytes + AVX2 8-way interleaved rANS	Absolute smallest output. Encode/decode are 10× the cost of the others.

At low bit depths (10-12) Fast becomes attractive: smaller decode-time-per-byte than Balanced. At high bit depths (>=20) Balanced wins on size. See the per-bit numbers below for the actual tradeoff.

Wire format

Little-endian throughout.

Offset	Bytes	Field
0	4	magic `'P','C','C','1'`
4	1	preset (0..3)
5	1	bits per axis (1..21)
6	2	flags (reserved, must be 0)
8	4	input_points (caller's count, for stats only)
12	4	unique_voxels (count the receiver reconstructs)
16	24	aabb: 6 little-endian float32, min[3] then max[3]
40	4	payload_size
44	payload_size	preset-specific payload

decode() validates magic, preset range, bits range, and payload bounds; it throws std::runtime_error on any malformed input.

Benchmarks

Hardware: AMD 5950X (AVX2). Build: gcc 16.1 release, default flags listed above.

Input: 14 Ladybrook Road 10.ply, 3 202 232 points, 36.65 MiB of raw float xyz (12 B per point). The benchmark loads with miniply, computes the AABB once, then runs each preset and reports per-stage timings. Numbers below are medians of three runs.

Compression ratio and encode time

bits/axis	preset	bytes	B/point	ratio vs raw	enc (ms)	dec (ms)
10	Fast	3 468 925	1.08	0.090	26	4
10	Balanced	2 626 992	0.82	0.068	38	18
10	Small	2 239 540	0.70	0.058	31	11
10	Smallest	2 214 719	0.69	0.058	143	120
14	Fast	9 078 152	2.83	0.236	32	10
14	Balanced	7 569 295	2.36	0.197	45	25
14	Small	7 897 539	2.47	0.206	53	31
14	Smallest	7 533 729	2.35	0.196	434	412
16	Fast	12 457 251	3.89	0.324	34	12
16	Balanced	9 970 344	3.11	0.259	34	14
16	Small	10 513 751	3.28	0.274	48	28
16	Smallest	10 068 355	3.14	0.262	563	544
20	Fast	18 570 532	5.79	0.483	29	8
20	Balanced	14 774 637	4.61	0.384	35	14
20	Small	15 618 329	4.87	0.406	54	36
20	Smallest	15 100 113	4.71	0.393	844	824

Pick-by-use-case

bits=10 (dense voxel grid, lots of duplicates collapsed. Good enough for tracking):

Preset	Output	Encode	Decode	Notes
Fast	3.47 MB	26 ms	4 ms	Best decode-time-per-byte.
Small	2.24 MB	31 ms	11 ms	Best size/speed compromise.
Balanced	2.63 MB	38 ms	18 ms	Dominated by Small here. Don't pick.
Smallest	2.21 MB	143 ms	120 ms	1% smaller than Small at 11× the cost.

bits=14 (live-streaming sweet spot for lidar/photogrammetry):

Preset	Output	Encode	Decode	Notes
Fast	9.08 MB	32 ms	10 ms	Vetter than Small at this depth.
Balanced	7.57 MB	45 ms	25 ms	Best size when 25 ms decode fits.
Small	7.90 MB	53 ms	31 ms	Slower and bigger than Balanced.
Smallest	7.53 MB	434 ms	412 ms	0.5% smaller than Balanced; not worth it.

bits=16 (default quality):

Preset	Output	Encode	Decode	Notes
Fast	12.5 MB	34 ms	12 ms	Fastest decode.
Balanced	9.97 MB	34 ms	14 ms	Same encode time as Fast, 20% smaller.
Small	10.5 MB	48 ms	28 ms	Dominated by Balanced.
Smallest	10.1 MB	563 ms	544 ms	1% smaller than Balanced at 16× the cost.

bits=20 (lossy archival, voxel grid very sparse):

Preset	Output	Encode	Decode
Fast	18.6 MB	29 ms	8 ms
Balanced	14.8 MB	35 ms	14 ms
Small	15.6 MB	54 ms	36 ms
Smallest	15.1 MB	844 ms	824 ms

Decision summary:

bits ≤ 12: pick Fast for the streaming path, Small if you need the absolute smallest at sub-30-ms encode.
bits 14-16: Fast if decode time matters more than wire bytes; Balanced otherwise.
bits ≥ 18: Balanced wins on size with the smallest size/speed penalty; Fast still has the cheapest decoder.
Smallest: as useful as the placebo x264 preset.

Comparison vs MPEG G-PCC

Reference implementation: mpeg-pcc-tmc13 release-v23.0-rc2 (the official ISO/IEC MPEG G-PCC encoder). Both codecs built with -O3 -march=native -ffp-contract=fast. tmc13 was given an inputScale that matches each pccl bit-depth. Geometry-only on both sides; medians of three runs.

Example 1 (3.2M points, architectural):

bits	codec	enc (ms)	dec (ms)	output	size vs tmc13
10	tmc13	2106	803	1.46 MB	1.00×
10	pccl Fast	25	4	3.47 MB	2.38×
10	pccl Balanced	39	19	2.63 MB	1.80×
10	pccl Smallest	125	105	2.21 MB	1.52×
14	tmc13	4616	2758	6.50 MB	1.00×
14	pccl Fast	33	11	9.08 MB	1.39×
14	pccl Balanced	45	25	7.57 MB	1.16×
14	pccl Smallest	383	362	7.53 MB	1.15×
16	tmc13	5719	3724	8.94 MB	1.00×
16	pccl Fast	32	12	12.5 MB	1.39×
16	pccl Balanced	32	13	9.97 MB	1.11×
16	pccl Smallest	426	404	10.07 MB	1.12×

Example 2 (150K points with majority duplicates, LIDAR scan of a group of persons):

bits	codec	enc (ms)	dec (ms)	output	size vs tmc13
14	tmc13	65	27	52.8 KB	1.00×
14	pccl Balanced	6	0.1	60.7 KB	1.14×
14	pccl Smallest	10	3	61.3 KB	1.16×
16	tmc13	71	33	67.1 KB	1.00×
16	pccl Balanced	8	0.1	74.8 KB	1.11×

Determinism

encode() is deterministic at the decoded-result level: two encode calls with the same input produce bitstreams that decode to the same cloud. The encoded bytes themselves can differ in a handful of positions when TurboPFor (used by the Balanced preset) picks among several legal block encodings, but the decoded result does not.

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
.github/workflows		.github/workflows
3rdparty		3rdparty
benchmarks		benchmarks
include/pccl		include/pccl
src		src
tests		tests
.clang-format		.clang-format
.gitignore		.gitignore
.gitmodules		.gitmodules
CMakeLists.txt		CMakeLists.txt
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

pccl

Building

Quick start

Presets

Wire format

Benchmarks

Compression ratio and encode time

Pick-by-use-case

Comparison vs MPEG G-PCC

Determinism

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

pccl

Building

Quick start

Presets

Wire format

Benchmarks

Compression ratio and encode time

Pick-by-use-case

Comparison vs MPEG G-PCC

Determinism

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages