bintools

bintools is a small, stdlib-only collection of command-line utilities for examining, comparing, and experimenting with binary files and byte streams.

The tools help answer practical questions such as:

Which offsets are stable across a set of related files?
Which bytes changed between two versions of a file?
Where did known input bytes appear in an output artifact?
Which offsets look like candidate counters, lengths, timestamps, offsets, or other scalar fields?
Does a proposed JSON layout description fit a collection of samples?
Can a validated layout be exported as an overlay for a hex viewer?

bintools produces evidence and candidate results. It does not attempt automatic file type detection, complete format discovery, parsing, decompression, checksum identification, or full layout inference.

Current Status

Public draft (version 0.1).

The core commands are implemented and tested on synthetic corpora. The project has unit tests, shell integration checks, and runnable example workflows. The runtime package has no third-party dependencies and requires Python 3.10 or newer.

This is still an early public release:

CLI text output may change.
Heuristic tools report candidates, not conclusions.
The example corpora are deliberately small and synthetic.
GitHub Actions CI runs on pull requests and pushes to main, installing from a clean checkout across the supported Python versions, verifying installed console scripts, and running lint, coverage-backed pytest, and shell integration checks.
Coverage is enforced in CI and can also be measured locally through the validation suite (see Development below). An opt-in stress lane exercises the tools at scale to catch algorithmic blowups, memory pathologies, and hangs, and an opt-in performance lane guards against complexity-class regressions (an O(n) path silently becoming O(n^2)) with scale-relative checks; a mutation-test lane is not yet part of the regular validation suite.

The intended contract is: small tools, explicit inputs, reproducible outputs, and conservative claims.

What It Is For

Use bintools when you have related binary artifacts and want to interrogate how they differ.

Common uses include:

comparing outputs produced from controlled inputs
separating stable bytes from variable bytes across a corpus
finding copied, truncated, repeated, hex-encoded, or base64-encoded payloads
checking whether candidate length or count fields track known manifest values
masking known noisy byte ranges and confirming what differences remain
validating a layout hypothesis against one or more sample files
exporting layout overlays for the sibling multihex viewer

The tools are intentionally composable. A typical workflow uses several of them together rather than expecting one command to explain a file by itself.

What It Is Not

bintools is not:

a file type detector
an automatic format discovery engine
a parser generator
a checksum or compression recognizer
a disassembler
a GUI
a format database
a replacement for human validation

Some tools are useful when investigating unknown formats, but the project is better described as a toolkit for examining binary data than as a reverse engineering or file format detection system.

Most commands read their input files fully into memory. bintools is intended for small-to-moderate artifacts, controlled corpora, fixtures, and focused investigations, not huge disk images or streaming analysis.

Installed Tools

The package installs nine command-line tools:

| Tool | Purpose | Result type |
| --- | --- | --- |
| `genbytes` | Generate deterministic binary inputs or an input corpus with a manifest. | exact generation |
| `bindiffmap` | Build byte-level and bit-level stability maps across many sample files. | exact map |
| `mapranges` | Collapse map bytes into contiguous labeled spans. | exact summary |
| `bindelta` | Compare one base binary against one or more changed binaries. | exact diff, optional alignment heuristic |
| `fieldscan` | Find candidate scalar fields at fixed offsets. | candidate evidence |
| `payloadscan` | Find where input bytes appear in related output artifacts. | exact matches plus candidate observations |
| `maskdiff` | Compare files after masking known noisy ranges. | exact comparison |
| `offsetstats` | Compute per-offset and per-range corpus statistics. | statistics and heuristic labels |
| `layoutcheck` | Validate a user-written JSON layout against sample files and optionally export an overlay. | validation result |

See TOOLS.md for detailed usage notes and caveats for each command.

Installation

Install from a checkout:

python3 -m pip install -e .

Install development dependencies too:

python3 -m pip install -e ".[dev]"

The project requires Python 3.10 or newer and has no runtime third-party dependencies.

During local development, commands can also be run as modules without installing:

PYTHONPATH=src python3 -m bintools.bindiffmap --help
PYTHONPATH=src python3 -m bintools.mapranges --help
PYTHONPATH=src python3 -m bintools.genbytes --help
PYTHONPATH=src python3 -m bintools.bindelta --help
PYTHONPATH=src python3 -m bintools.fieldscan --help
PYTHONPATH=src python3 -m bintools.payloadscan --help
PYTHONPATH=src python3 -m bintools.maskdiff --help
PYTHONPATH=src python3 -m bintools.offsetstats --help
PYTHONPATH=src python3 -m bintools.layoutcheck --help

Quick Start

Generate one deterministic input file:

genbytes --length 16 --pattern fill --fill 0x41 --out probe.bin

Generate a small corpus and manifest:

genbytes --lengths 0-256:16 --pattern zeros,increment,random \
         --seeds 1,2 --replicates 2 --out-dir corpus/

genbytes writes inputs under corpus/inputs/ and records the experiment plan in corpus/manifest.csv. It does not run any target program. A separate runner or script should execute the target once per manifest row, write each output to the recorded output_path, and fill in the status fields.

After you have related output files, build common maps:

bindiffmap --mode eq     corpus/outputs/*.out -o stable.eq
bindiffmap --mode values corpus/outputs/*.out -o stable.values
bindiffmap --mode zero   corpus/outputs/*.out -o always-zero.map
bindiffmap --mode fixed  corpus/outputs/*.out -o stable-bits.map

Summarize stable and variable regions:

mapranges stable.eq --ref stable.values

Compare two concrete outputs:

bindelta base.bin changed.bin
bindelta --mode offset --bytes --xor base.bin changed.bin

Look for candidate scalar fields:

fieldscan --manifest corpus/manifest.csv --path-column output_path \
          --expect input_len=length corpus/outputs/*.out

Find copied payload bytes:

payloadscan --manifest corpus/manifest.csv \
            --input-column input_path --output-column output_path

Validate a proposed layout:

layoutcheck --layout layout.json corpus/outputs/*.out

Export an overlay for one sample:

layoutcheck --layout layout.json sample.bin --overlay-out sample.overlay.json
multihex sample.bin --layout sample.overlay.json

A Runnable First Workflow

The repository ships a small synthetic corpus generator so you can try the tools without supplying your own files. It writes four related sample files using a toy format with a 3-byte ODD identifier, a big-endian count, and intentionally unaligned fields:

python3 tests/integration/generators/make_odd_corpus.py /tmp/odd
cd /tmp/odd

Build an equality map and a values map:

bindiffmap --mode eq     odd_*.bin -o stable.eq
bindiffmap --mode values odd_*.bin -o stable.values

Collapse the equality map into labeled spans:

mapranges stable.eq --ref stable.values

The full walkthrough - from samples to evidence, layout hypothesis, validation, and overlay export - is in docs/workflows/unknown-format-walkthrough.md.

Worked Documentation

The main workflow guides are:

Unknown-format walkthrough - raw samples to evidence, layout hypothesis, layoutcheck, and overlay JSON.
Comparing related files - stable vs. variable regions, masking known noise, and candidate scalar fields.
Validating layouts - writing a layout spec, reading failure diagnostics, strict EOF, and allowed trailing data.
Generating overlays - exporting layout-overlay-v1 JSON for the sibling multihex viewer.

The repository also includes runnable example workflows under examples/workflows/. These generate small deterministic artifacts, run relevant bintools commands, explain the evidence, and assert a few stable facts.

Run all example workflows:

examples/workflows/run_all.sh

Run one workflow:

examples/workflows/checksum/run.sh

Keep generated files for inspection:

KEEP_WORK=1 examples/workflows/checksum/run.sh

Example Generators

examples/generators/ contains small deterministic target programs used by the example workflows. They are not installed as bintools commands.

Included generators cover text-like and binary-like outputs:

CSV, JSON, and SVG wrappers around input text
PPM, BMP, and WAV artifact generation
length-prefixed records
offset tables
checksum trailers
simple run-length encoding

These examples are fixtures. They are useful for learning and regression checks, but they are not a claim that real-world formats all have magic values, fixed headers, u32 fields, aligned payloads, checksums, or counts.

How To Interpret Results

bintools deliberately separates exact observations from candidates.

Exact tools report facts about the files you provided:

bindiffmap reports which bytes or bits agree across the current corpus.
mapranges reports contiguous spans from a map.
bindelta reports concrete differences between files.
maskdiff reports whether differences remain after selected masks.
layoutcheck reports whether a supplied layout fits each sample.

Heuristic tools narrow the search space:

fieldscan reports scalar-field candidates.
payloadscan reports exact byte matches plus cross-pair observations.
offsetstats reports statistics and heuristic labels.

A stable byte is not proof that a field is always constant. A candidate length is not proof of meaning. A passing layout check proves only that the supplied layout fits the supplied samples under the expressed rules.

Good practice is to vary one input property at a time, keep the manifest, compare repeated identical inputs for determinism, and validate any candidate meaning with new samples.

Command Notes

`genbytes`

Creates deterministic binary inputs. In single mode it writes one sample to stdout or a file. In corpus mode it creates inputs/ plus manifest.csv.

Supported patterns include zeros, fill, increment, random, and ascii. Seeded random output uses a deterministic SHA-256 counter-mode stream, so it is reproducible across machines and Python versions.
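To illustrate why a hash-based counter-mode stream is reproducible, the sketch below derives bytes from SHA-256 over a seed and a block counter. The function name `sha256_ctr_stream` and the seed/counter encoding are assumptions made for illustration; genbytes may build its stream differently, so these bytes are not expected to match its output.

```python
import hashlib


def sha256_ctr_stream(seed: int, length: int) -> bytes:
    """Return `length` deterministic bytes derived from `seed`.

    Hypothetical construction: block i is SHA-256(seed_bytes || counter_bytes).
    This is not necessarily the encoding genbytes uses.
    """
    seed_bytes = seed.to_bytes(8, "big")
    out = bytearray()
    counter = 0
    while len(out) < length:
        # Each block depends only on the seed and the counter, never on
        # platform state, so the stream is the same on every machine.
        block = hashlib.sha256(seed_bytes + counter.to_bytes(8, "big")).digest()
        out.extend(block)
        counter += 1
    return bytes(out[:length])
```

Because the output depends only on the seed and the counter, a shorter request is always a prefix of a longer one with the same seed.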

`bindiffmap`

Compares many files and writes a byte map or bit map.

Common modes:

eq - 0xff where every input has the same byte, otherwise 0x00
values - the stable byte value where inputs agree, otherwise 0x00
zero - 0xff where every input byte is 0x00, otherwise 0x00
and - bitwise AND across all inputs
or - bitwise OR across all inputs
fixed - each bit is 1 if that bit is identical across all inputs

By default, differing-length files are compared up to the shortest length. --pad extends shorter files to the longest length using --pad-byte, but be careful with zero and values maps because synthetic padding can look like real observed bytes.
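The mode definitions above can be sketched in a few lines of Python. This is a minimal illustration of the semantics as described, not the bindiffmap implementation; the function name `build_maps` is hypothetical, and the sketch always truncates to the shortest sample (no padding).

```python
from functools import reduce


def build_maps(samples: list[bytes]) -> dict[str, bytes]:
    """Compute the map modes described above, truncated to the shortest sample."""
    if not samples:
        raise ValueError("need at least one sample")
    n = min(len(s) for s in samples)
    # One column per offset, holding that offset's byte from every sample.
    columns = [bytes(s[i] for s in samples) for i in range(n)]
    and_map = bytes(reduce(lambda a, b: a & b, col) for col in columns)
    or_map = bytes(reduce(lambda a, b: a | b, col) for col in columns)
    same = [len(set(col)) == 1 for col in columns]
    return {
        "eq": bytes(0xFF if s else 0x00 for s in same),
        "values": bytes(col[0] if s else 0x00 for col, s in zip(columns, same)),
        "zero": bytes(0xFF if not any(col) else 0x00 for col in columns),
        "and": and_map,
        "or": or_map,
        # A bit is identical in every input exactly when its AND and OR agree.
        "fixed": bytes(~(a ^ o) & 0xFF for a, o in zip(and_map, or_map)),
    }
```

In a values map, a stable 0x00 byte and a variable byte both appear as 0x00, which is why the values map is normally read alongside the eq map.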

`mapranges`

Converts a map into contiguous spans. For an eq map, the defaults are:

0xff -> STABLE
0x00 -> VARIABLE
any other byte -> PARTIAL

When --ref is provided, invariant spans include hex and ASCII content. For non-invariant spans, --show-variable-bytes can show the reference sample value while marking it as varying sample data.
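The default eq-map labeling can be sketched as a run-length grouping. The names `label` and `collapse` are hypothetical, and reporting spans as half-open (start, end) pairs is an assumption of this sketch rather than a statement about the mapranges output format.

```python
from itertools import groupby


def label(byte: int) -> str:
    """Apply the default eq-map labels listed above."""
    if byte == 0xFF:
        return "STABLE"
    if byte == 0x00:
        return "VARIABLE"
    return "PARTIAL"


def collapse(eq_map: bytes) -> list[tuple[int, int, str]]:
    """Return (start, end, label) spans; end is exclusive in this sketch."""
    spans = []
    offset = 0
    # groupby yields one group per run of consecutive bytes with the same label.
    for name, group in groupby(eq_map, key=label):
        length = sum(1 for _ in group)
        spans.append((offset, offset + length, name))
        offset += length
    return spans
```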

`bindelta`

Compares a base file against one or more comparison files.

Modes:

auto - equal-length files use fixed-offset comparison; unequal lengths use alignment-aware comparison
offset - compare bytes at fixed positions
align - use an alignment hunk model for insertions, deletions, and replacements

Exit status follows cmp-style behavior: 0 for identical, 1 for different, and 2 for command or file errors.
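A fixed-offset comparison with a cmp-style status can be sketched as follows. The function name `offset_diff` and the tuple layout are hypothetical, and treating a length mismatch as "different" is an assumption of this sketch. Status 2 (command or file errors) is not modeled because the sketch works on in-memory bytes.

```python
def offset_diff(
    base: bytes, changed: bytes
) -> tuple[int, list[tuple[int, int, int, int]]]:
    """Compare two byte strings at fixed offsets.

    Returns (status, diffs). status is 0 when identical and 1 when different.
    Each diff is (offset, base_byte, changed_byte, xor_of_the_two).
    """
    diffs = [
        (i, a, b, a ^ b)
        for i, (a, b) in enumerate(zip(base, changed))
        if a != b
    ]
    # Assumption: a trailing length difference also counts as "different".
    status = 1 if diffs or len(base) != len(changed) else 0
    return status, diffs
```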

`fieldscan`

Scans fixed offsets across a corpus, decodes bytes there as integers, and reports offsets whose values look meaningful. Widths 1, 2, 4, and 8 are supported with little-endian or big-endian decoding and signed or unsigned interpretation.

Candidate types include manifest matches, file-size matches, Unix-time-like values, absolute-offset-like values, size or length candidates, constants, zeros, monotonic values, small integers, and low-variance fields.

Every row is a scored candidate. Confirm semantics with other tools and new samples.
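The core idea, decoding the same offset in every sample and comparing against known values, can be sketched with `int.from_bytes`. The function name `scan_fields` is hypothetical, and the sketch covers only the manifest-match case: it keeps an offset only when every sample matches exactly, whereas fieldscan scores candidates across several categories.

```python
def scan_fields(
    samples: list[bytes],
    expected: list[int],
    width: int = 4,
    byteorder: str = "big",
    signed: bool = False,
) -> list[int]:
    """Return offsets where sample i decodes to expected[i] for every sample."""
    n = min(len(s) for s in samples)
    hits = []
    for off in range(0, n - width + 1):
        if all(
            int.from_bytes(s[off:off + width], byteorder, signed=signed) == want
            for s, want in zip(samples, expected)
        ):
            hits.append(off)
    return hits
```

For example, given samples built as a 3-byte identifier, a 2-byte big-endian payload length, and the payload, scanning with `width=2` against the known payload lengths reports offset 3.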

`payloadscan`

Takes input/output artifact pairs and finds where input bytes appear in output artifacts. It can report copied, truncated, split, repeated, hex-encoded, or base64-encoded payloads.

Pairs may be supplied directly with --pair INPUT:OUTPUT or through a manifest. Short coincidental matches are suppressed by --min-run, which defaults to 4. Use a lower value when deliberately investigating very small copied payloads.
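A much-simplified version of the search is sketched below. It looks only for the whole payload, either raw, lowercase hex, or standard base64 with padding stripped, and it treats `min_run` as a minimum payload length. The function name `find_payload` is hypothetical, and payloadscan's handling of truncated, split, and repeated payloads is not modeled here.

```python
import base64
import binascii


def find_payload(
    payload: bytes, output: bytes, min_run: int = 4
) -> list[tuple[str, int]]:
    """Return (encoding, offset) pairs where the whole payload appears in output."""
    if len(payload) < min_run:
        return []  # too short to distinguish from coincidence
    candidates = {
        "raw": payload,
        "hex": binascii.hexlify(payload),
        # Padding is stripped so the needle matches with or without "=".
        "base64": base64.b64encode(payload).rstrip(b"="),
    }
    hits = []
    for name, needle in candidates.items():
        start = output.find(needle)
        while start != -1:
            hits.append((name, start))
            start = output.find(needle, start + 1)
    return hits
```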

`maskdiff`

Compares binary artifacts while ignoring byte ranges you mark as noisy, such as timestamps, counters, checksums, or reserved fields.

Mask ranges are half-open [start, end) and can be supplied on the command line or in a mask file. Masked bytes may be treated as wildcards or normalized to a chosen byte before comparison.

maskdiff proves only that the selected ranges explain the selected differences. It does not prove the meaning of those ranges.
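The half-open range convention and wildcard-style masking can be sketched as follows. The function name `masked_diff` is hypothetical, and the sketch compares only the common prefix of the two inputs; how maskdiff itself treats a length difference is not covered here.

```python
def masked_diff(a: bytes, b: bytes, masks: list[tuple[int, int]]) -> list[int]:
    """Return offsets that differ outside the half-open [start, end) masks."""
    masked = set()
    for start, end in masks:
        masked.update(range(start, end))  # end itself is not masked
    limit = min(len(a), len(b))
    return [i for i in range(limit) if i not in masked and a[i] != b[i]]
```

With a difference at offset 5, the mask `(4, 6)` hides it but `(4, 5)` does not, because the end offset is exclusive.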

`offsetstats`

Scans a corpus offset-by-offset and reports how each byte position behaves: presence, unique-value count, min/max, most-common value, zero ratio, printable ratio, control ratio, mean, variance, entropy, and heuristic labels.

With --ranges, adjacent like-behaving offsets collapse into spans. This is a useful first look at a corpus, but labels are statistical hints, not semantic proof.
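Several of the per-offset statistics listed above can be sketched for a single byte position. The function name `offset_stats` and the dictionary keys are hypothetical, and the use of population variance and Shannon entropy in bits is an assumption of this sketch rather than a statement about what offsetstats computes.

```python
import math
from collections import Counter


def offset_stats(samples: list[bytes], offset: int) -> dict[str, float]:
    """Summarize how one byte position behaves across a corpus."""
    # Samples shorter than the offset simply do not contribute a value.
    values = [s[offset] for s in samples if offset < len(s)]
    if not values:
        raise ValueError("offset is beyond every sample")
    counts = Counter(values)
    total = len(values)
    mean = sum(values) / total
    return {
        "present": total,
        "unique": len(counts),
        "min": min(values),
        "max": max(values),
        "most_common": counts.most_common(1)[0][0],
        "zero_ratio": counts.get(0, 0) / total,
        "mean": mean,
        "variance": sum((v - mean) ** 2 for v in values) / total,
        # Shannon entropy in bits: 0.0 for a constant byte, up to 8.0.
        "entropy": sum((c / total) * math.log2(total / c) for c in counts.values()),
    }
```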

`layoutcheck`

Validates a user-written JSON layout against one or more sample files. It checks fixed bytes, strings, scalar integers, length-prefixed payloads, count-driven repeats, padding, EOF behavior, and trailing-data policy.

A layout can be strict about EOF or allow trailing bytes. --dump-fields reports parsed values. --csv and --json write structured validation results. --overlay-out PATH writes a bintools.layout-overlay version 1 document for a single binary.

layoutcheck checks whether samples fit a layout you wrote. It does not discover the layout.

Layout Overlays

layoutcheck --overlay-out exports bintools.layout-overlay version 1 JSON. The schema is documented in docs/layout-overlay-v1.md.

The sibling multihex viewer can load these overlays to color and label byte ranges while displaying the file. The overlay records resolved offsets, lengths, field paths, decoded values where available, validation status, and diagnostics.

Development

Install development dependencies:

python3 -m pip install -e ".[dev]"

Run unit tests:

python3 -m pytest

Run linting:

ruff check .

Measure test coverage (opt-in; requires the dev extra above):

python3 -m coverage run -m pytest
python3 -m coverage report

Coverage is configured in pyproject.toml (branch coverage on, with a fail_under floor) and is deliberately kept out of the default python3 -m pytest invocation, so the fast lane needs no coverage plugin. The validation wrapper runs the same measurement as its coverage lane (see below) and writes coverage.xml and an htmlcov/ report.

Run shell integration checks:

scripts/integration/run_all.sh

Run the opt-in stress lane (scale/resource survival tests):

scripts/stress/run_all.sh                       # CI-default tier
BINTOOLS_STRESS_TIER=local scripts/stress/run_all.sh   # heavier local soak

The stress lane exercises the tools at scale (wide corpora, large files, near-cap align inputs) to catch algorithmic blowups, memory pathologies, and hangs. It is deselected from the default pytest and coverage runs via the stress marker, and its budgets are scale-relative (loose time ratios plus a memory-peak bound), not absolute wall-clock numbers. Scale is env-driven: BINTOOLS_STRESS_TIER=ci|local plus per-dimension overrides BINTOOLS_STRESS_L, _N, _P, and _ALIGN. This lane is about survival under scale, not performance-regression benchmarking or malformed-input fuzzing.

Run the opt-in performance lane (scale-relative complexity guards):

scripts/performance/run_all.sh                       # small tier
BINTOOLS_PERF_SCALE=large scripts/performance/run_all.sh   # heavier local soak

The performance lane catches complexity-class regressions (an O(n) path silently becoming O(n^2)) and catastrophic slowdowns for the high-cost-dimension tools (bindelta, bindiffmap, offsetstats, fieldscan, payloadscan). It is deselected from the default pytest and coverage runs via the performance marker. It makes no absolute wall-clock assertions: each test measures a tool at a base and a ~10x-larger input and asserts the runtime ratio did not jump a complexity class, backed by a uniform hang ceiling. Scale is env-driven (BINTOOLS_PERF_SCALE=small|large). This is the complement of the stress lane: timing/complexity, not survival. A printed per-tool timing summary closes each run; the script warns if coverage instrumentation (COVERAGE_PROCESS_START) would distort the numbers.
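The scale-relative idea can be sketched as a pure function over two timings. The function names and the `max_exponent` threshold are hypothetical and are not taken from the performance lane; the sketch only shows how a runtime ratio at a known scale factor implies a growth exponent.

```python
import math


def implied_exponent(base_seconds: float, scaled_seconds: float, scale: float) -> float:
    """Estimate k in runtime ~ n**k from two measurements `scale` apart in size."""
    return math.log(scaled_seconds / base_seconds) / math.log(scale)


def within_complexity_class(
    base_seconds: float,
    scaled_seconds: float,
    scale: float = 10.0,
    max_exponent: float = 1.5,  # hypothetical tolerance for a nominally O(n) path
) -> bool:
    """Return True when the runtime ratio stays below the allowed growth rate."""
    return implied_exponent(base_seconds, scaled_seconds, scale) <= max_exponent
```

A 10x larger input that takes 10x longer implies an exponent near 1, while 100x longer implies an exponent near 2, so the check depends on the ratio rather than on absolute wall-clock time.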

Run example workflows:

examples/workflows/run_all.sh

Run the local validation wrapper:

scripts/run-full-test-suite.sh

The validation wrapper runs the implemented local lanes, including the coverage lane when coverage.py is installed, and reports skipped lanes for validation layers that are opt-in or not yet present. Pass --include-stress to run the stress lane (at the CI-default tier) or --include-performance to run the performance lane (at the small tier) as part of the wrapper; --all runs every opt-in lane.

GitHub Actions CI

The primary GitHub Actions workflow runs on pull requests and pushes to main. It tests Python 3.10 through 3.14 on Ubuntu, installs .[dev] from a clean checkout, verifies that the installed console scripts match pyproject.toml and respond to --help, then runs:

ruff check .
scripts/coverage/run_coverage.sh
scripts/integration/run_all.sh

scripts/coverage/run_coverage.sh runs the default pytest suite, so packaging and SPDX/license-header tests are included in the automatic path.

The stress workflow is available through manual dispatch and also runs weekly at the ci tier. The performance workflow is manual only and should be treated as a scale-relative guardrail, not a precise benchmark gate.

To mirror the automatic CI checks locally from a fresh environment:

python3 -m pip install -e ".[dev]"
for tool in bindiffmap bindelta genbytes mapranges fieldscan payloadscan maskdiff offsetstats layoutcheck; do
  "$tool" --help >/dev/null
done
ruff check .
scripts/coverage/run_coverage.sh
scripts/integration/run_all.sh

Repository Layout

src/bintools/
  bindiffmap.py        byte/bit stability map builder
  bindelta.py          concrete binary diff reporter
  genbytes.py          deterministic input and corpus generator
  mapranges.py         map span summarizer
  fieldscan.py         candidate scalar-field finder
  payloadscan.py       input-bytes-in-output locator
  maskdiff.py          masked binary comparison reporter
  offsetstats.py       per-offset and per-range corpus statistics
  layoutcheck.py       declarative layout validator
  overlay_export.py    layoutcheck to layout-overlay-v1 exporter
  layout_overlay_v1.py layout-overlay-v1 schema validator

tests/
  pytest unit tests for the tools and packaging

scripts/integration/
  shell integration checks over synthetic corpora

examples/generators/
  deterministic example target programs

examples/workflows/
  runnable, self-narrating examples

docs/workflows/
  worked user-facing guides

TOOLS.md
  detailed command notes and caveats

TODO.md
  current follow-up work

Release Notes For This Draft

Before treating this as more than a public draft, useful next steps include:

expanding CI only if future release workflows need additional package or platform coverage
raising coverage toward the 90%+ line-coverage stretch target (an opt-in coverage lane with a fail_under floor is already in place; see Development)
adding the separate fuzzing/mutation lane, which is still open (the opt-in stress and performance lanes are already in place: the stress lane covers scale/resource survival and the performance lane covers complexity-class regressions via scale-relative guards, both with env-driven tiers; see Development)
adding the remaining scalar-width, endian, alignment, and tiny-payload tests listed in TODO.md
adding package metadata needed for a future PyPI release, if desired

License

Apache-2.0. See LICENSE.
