bintools is a small, stdlib-only collection of command-line utilities for
examining, comparing, and experimenting with binary files and byte streams.
The tools help answer practical questions such as:
- Which offsets are stable across a set of related files?
- Which bytes changed between two versions of a file?
- Where did known input bytes appear in an output artifact?
- Which offsets look like candidate counters, lengths, timestamps, offsets, or other scalar fields?
- Does a proposed JSON layout description fit a collection of samples?
- Can a validated layout be exported as an overlay for a hex viewer?
bintools produces evidence and candidate results. It does not attempt automatic
file type detection, complete format discovery, parsing, decompression, checksum
identification, or full layout inference.
Current status: public draft / 0.1.
The core commands are implemented and tested on synthetic corpora. The project has unit tests, shell integration checks, and runnable example workflows. The runtime package has no third-party dependencies and requires Python 3.10 or newer.
This is still an early public release:
- CLI text output may change.
- Heuristic tools report candidates, not conclusions.
- The example corpora are deliberately small and synthetic.
- GitHub Actions CI runs on pull requests and pushes to `main`, installing from a clean checkout across the supported Python versions, verifying installed console scripts, and running lint, coverage-backed pytest, and shell integration checks.
- Coverage is enforced in CI and can also be measured locally through the validation suite (see Development below).
- An opt-in stress lane exercises the tools at scale to catch algorithmic blowups, memory pathologies, and hangs. An opt-in performance lane uses scale-relative checks to guard against complexity-class regressions (an O(n) path silently becoming O(n^2)). A mutation-test lane is not yet part of the regular validation suite.
The intended contract is: small tools, explicit inputs, reproducible outputs, and conservative claims.
Use bintools when you have related binary artifacts and want to interrogate
how they differ.
Common uses include:
- comparing outputs produced from controlled inputs
- separating stable bytes from variable bytes across a corpus
- finding copied, truncated, repeated, hex-encoded, or base64-encoded payloads
- checking whether candidate length or count fields track known manifest values
- masking known noisy byte ranges and confirming what differences remain
- validating a layout hypothesis against one or more sample files
- exporting layout overlays for the sibling `multihex` viewer
The tools are intentionally composable. A typical workflow uses several of them together rather than expecting one command to explain a file by itself.
bintools is not:
- a file type detector
- an automatic format discovery engine
- a parser generator
- a checksum or compression recognizer
- a disassembler
- a GUI
- a format database
- a replacement for human validation
Some tools are useful when investigating unknown formats, but the project is better described as a toolkit for examining binary data than as a reverse engineering or file format detection system.
Most commands read their input files fully into memory. bintools is
intended for small-to-moderate artifacts, controlled corpora, fixtures, and
focused investigations, not huge disk images or streaming analysis.
The package installs nine command-line tools:

| Tool | Purpose | Result type |
|---|---|---|
| `genbytes` | Generate deterministic binary inputs or an input corpus with a manifest. | exact generation |
| `bindiffmap` | Build byte-level and bit-level stability maps across many sample files. | exact map |
| `mapranges` | Collapse map bytes into contiguous labeled spans. | exact summary |
| `bindelta` | Compare one base binary against one or more changed binaries. | exact diff, optional alignment heuristic |
| `fieldscan` | Find candidate scalar fields at fixed offsets. | candidate evidence |
| `payloadscan` | Find where input bytes appear in related output artifacts. | exact matches plus candidate observations |
| `maskdiff` | Compare files after masking known noisy ranges. | exact comparison |
| `offsetstats` | Compute per-offset and per-range corpus statistics. | statistics and heuristic labels |
| `layoutcheck` | Validate a user-written JSON layout against sample files and optionally export an overlay. | validation result |
See TOOLS.md for detailed usage notes and caveats for each command.
Install from a checkout:

```sh
python3 -m pip install -e .
```

Install development dependencies too:

```sh
python3 -m pip install -e .[dev]
```

The project requires Python 3.10 or newer and has no runtime third-party dependencies.
During local development, commands can also be run as modules without installing:

```sh
PYTHONPATH=src python3 -m bintools.bindiffmap --help
PYTHONPATH=src python3 -m bintools.mapranges --help
PYTHONPATH=src python3 -m bintools.genbytes --help
PYTHONPATH=src python3 -m bintools.bindelta --help
PYTHONPATH=src python3 -m bintools.fieldscan --help
PYTHONPATH=src python3 -m bintools.payloadscan --help
PYTHONPATH=src python3 -m bintools.maskdiff --help
PYTHONPATH=src python3 -m bintools.offsetstats --help
PYTHONPATH=src python3 -m bintools.layoutcheck --help
```

Generate one deterministic input file:
```sh
genbytes --length 16 --pattern fill --fill 0x41 --out probe.bin
```

Generate a small corpus and manifest:

```sh
genbytes --lengths 0-256:16 --pattern zeros,increment,random \
  --seeds 1,2 --replicates 2 --out-dir corpus/
```

`genbytes` writes inputs under `corpus/inputs/` and records the experiment plan in `corpus/manifest.csv`. It does not run any target program. A separate runner or script should execute the target once per manifest row, write each output to the recorded `output_path`, and fill in the status fields.
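The runner does not need to be elaborate. The Python sketch below is a hypothetical `run_manifest` helper and is not part of bintools. It makes several assumptions: the manifest's `input_path` and `output_path` columns hold paths relative to the manifest's directory, and the target reads its input on stdin and writes its output to stdout. It also records each outcome in a hypothetical `status` column, and the real status fields in `manifest.csv` may use different names.

```python
import csv
import subprocess
from pathlib import Path


def run_manifest(manifest: Path, target: list[str]) -> int:
    """Run `target` once per manifest row and return the number of failures.

    Hypothetical runner: assumes input_path/output_path columns relative to
    the manifest directory, feeds the input on stdin, saves stdout as the
    output, and writes a hypothetical "status" column back to the manifest.
    """
    base = manifest.parent
    with manifest.open(newline="") as fh:
        reader = csv.DictReader(fh)
        fieldnames = list(reader.fieldnames or [])
        rows = list(reader)
    if "status" not in fieldnames:
        fieldnames.append("status")

    failures = 0
    for row in rows:
        out_path = base / row["output_path"]
        out_path.parent.mkdir(parents=True, exist_ok=True)
        data = (base / row["input_path"]).read_bytes()
        proc = subprocess.run(target, input=data, capture_output=True)
        out_path.write_bytes(proc.stdout)
        if proc.returncode == 0:
            row["status"] = "ok"
        else:
            row["status"] = f"exit {proc.returncode}"
            failures += 1

    # Rewrite the manifest so each row carries its status.
    with manifest.open("w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return failures
```

For example, `run_manifest(Path("corpus/manifest.csv"), ["./target"])` would run a hypothetical `./target` program once per row. A target that takes file arguments instead of stdin would need a different `subprocess.run` invocation.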
After you have related output files, build common maps:

```sh
bindiffmap --mode eq corpus/outputs/*.out -o stable.eq
bindiffmap --mode values corpus/outputs/*.out -o stable.values
bindiffmap --mode zero corpus/outputs/*.out -o always-zero.map
bindiffmap --mode fixed corpus/outputs/*.out -o stable-bits.map
```

Summarize stable and variable regions:

```sh
mapranges stable.eq --ref stable.values
```

Compare two concrete outputs:

```sh
bindelta base.bin changed.bin
bindelta --mode offset --bytes --xor base.bin changed.bin
```

Look for candidate scalar fields:

```sh
fieldscan --manifest corpus/manifest.csv --path-column output_path \
  --expect input_len=length corpus/outputs/*.out
```

Find copied payload bytes:

```sh
payloadscan --manifest corpus/manifest.csv \
  --input-column input_path --output-column output_path
```

Validate a proposed layout:

```sh
layoutcheck --layout layout.json corpus/outputs/*.out
```

Export an overlay for one sample:

```sh
layoutcheck --layout layout.json sample.bin --overlay-out sample.overlay.json
multihex sample.bin --layout sample.overlay.json
```

The repository ships a small synthetic corpus generator so you can try the tools without supplying your own files. It writes four related sample files using a toy format with a 3-byte `ODD` identifier, a big-endian count, and intentionally unaligned fields:

```sh
python3 tests/integration/generators/make_odd_corpus.py /tmp/odd
cd /tmp/odd
```

Build an equality map and a values map:

```sh
bindiffmap --mode eq odd_*.bin -o stable.eq
bindiffmap --mode values odd_*.bin -o stable.values
```

Collapse the equality map into labeled spans:

```sh
mapranges stable.eq --ref stable.values
```

The full walkthrough, from samples to evidence, layout hypothesis, validation, and overlay export, is in `docs/workflows/unknown-format-walkthrough.md`.
The main workflow guides are:

- Unknown-format walkthrough - raw samples to evidence, layout hypothesis, `layoutcheck`, and overlay JSON.
- Comparing related files - stable vs. variable regions, masking known noise, and candidate scalar fields.
- Validating layouts - writing a layout spec, reading failure diagnostics, strict EOF, and allowed trailing data.
- Generating overlays - exporting `layout-overlay-v1` JSON for the sibling `multihex` viewer.
The repository also includes runnable example workflows under
examples/workflows/. These generate small deterministic
artifacts, run relevant bintools commands, explain the evidence, and assert a
few stable facts.
Run all example workflows:

```sh
examples/workflows/run_all.sh
```

Run one workflow:

```sh
examples/workflows/checksum/run.sh
```

Keep generated files for inspection:

```sh
KEEP_WORK=1 examples/workflows/checksum/run.sh
```

`examples/generators/` contains small deterministic target programs used by the example workflows. They are not installed as bintools commands.
Included generators cover text-like and binary-like outputs:
- CSV, JSON, and SVG wrappers around input text
- PPM, BMP, and WAV artifact generation
- length-prefixed records
- offset tables
- checksum trailers
- simple run-length encoding
These examples are fixtures. They are useful for learning and regression checks,
but they are not a claim that real-world formats all have magic values, fixed
headers, u32 fields, aligned payloads, checksums, or counts.
bintools deliberately separates exact observations from candidates.
Exact tools report facts about the files you provided:

- `bindiffmap` reports which bytes or bits agree across the current corpus.
- `mapranges` reports contiguous spans from a map.
- `bindelta` reports concrete differences between files.
- `maskdiff` reports whether differences remain after selected masks.
- `layoutcheck` reports whether a supplied layout fits each sample.

Heuristic tools narrow the search space:

- `fieldscan` reports scalar-field candidates.
- `payloadscan` reports exact byte matches plus cross-pair observations.
- `offsetstats` reports statistics and heuristic labels.
A stable byte is not proof that a field is always constant. A candidate length is not proof of meaning. A passing layout check proves only that the supplied layout fits the supplied samples under the expressed rules.
Good practice is to vary one input property at a time, keep the manifest, compare repeated identical inputs for determinism, and validate any candidate meaning with new samples.
`genbytes` creates deterministic binary inputs. In single mode it writes one sample to stdout or a file. In corpus mode it creates `inputs/` plus `manifest.csv`.
Supported patterns include zeros, fill, increment, random, and ascii.
Seeded random output uses a deterministic SHA-256 counter-mode stream, so it is
reproducible across machines and Python versions.
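In general terms, a SHA-256 counter-mode stream hashes the seed together with an incrementing block counter and concatenates the digests. The sketch below illustrates that general construction. The `counter_stream` name and the 8-byte big-endian encodings of the seed and the counter are assumptions, so its output will not match `genbytes` byte for byte.

```python
import hashlib


def counter_stream(seed: int, length: int) -> bytes:
    """Deterministic bytes built from SHA-256(seed || counter) blocks.

    Assumed encoding: seed and counter are each 8-byte big-endian integers.
    This is an illustration, not the genbytes byte layout.
    """
    seed_bytes = seed.to_bytes(8, "big")
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(seed_bytes + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])
```

Each 32-byte block depends only on SHA-256, whose output is fixed by its specification. For that reason, the stream does not depend on the implementation details of any particular pseudo-random generator.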
`bindiffmap` compares many files and writes a byte map or bit map.
Common modes:

- `eq` - `0xff` where every input has the same byte, otherwise `0x00`
- `values` - the stable byte value where inputs agree, otherwise `0x00`
- `zero` - `0xff` where every input byte is `0x00`, otherwise `0x00`
- `and` - bitwise AND across all inputs
- `or` - bitwise OR across all inputs
- `fixed` - each bit is `1` if that bit is identical across all inputs
By default, differing-length files are compared up to the shortest length.
--pad extends shorter files to the longest length using --pad-byte, but be
careful with zero and values maps because synthetic padding can look like
real observed bytes.
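The mode definitions above fit in a few lines of Python. The sketch is meant to help with reading the maps and is not the `bindiffmap` source. `build_map` is hypothetical, always truncates to the shortest sample (no `--pad`), and emits one map byte per offset.

```python
import operator
from functools import reduce


def build_map(samples: list[bytes], mode: str) -> bytes:
    """Return one map byte per offset, truncating to the shortest sample."""
    n = min(len(s) for s in samples)
    out = bytearray()
    for i in range(n):
        col = [s[i] for s in samples]
        same = all(b == col[0] for b in col)
        bit_and = reduce(operator.and_, col)
        bit_or = reduce(operator.or_, col)
        if mode == "eq":
            out.append(0xFF if same else 0x00)
        elif mode == "values":
            out.append(col[0] if same else 0x00)
        elif mode == "zero":
            out.append(0xFF if all(b == 0 for b in col) else 0x00)
        elif mode == "and":
            out.append(bit_and)
        elif mode == "or":
            out.append(bit_or)
        elif mode == "fixed":
            # A bit is identical across inputs exactly when AND and OR agree on it.
            out.append(~(bit_and ^ bit_or) & 0xFF)
        else:
            raise ValueError(f"unknown mode: {mode}")
    return bytes(out)
```

A `values` map stores `0x00` both for a variable offset and for a stable zero byte. That is why it is usually read together with an `eq` map.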
`mapranges` converts a map into contiguous spans. For an `eq` map, the defaults are:

- `0xff` -> `STABLE`
- `0x00` -> `VARIABLE`
- any other byte -> `PARTIAL`
When --ref is provided, invariant spans include hex and ASCII content. For
non-invariant spans, --show-variable-bytes can show the reference sample value
while marking it as varying sample data.
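Collapsing spans amounts to a run-length grouping over the map. The sketch below applies the default `eq` labels and returns half-open `(start, end, label)` tuples. `map_spans` is hypothetical, and the sketch reproduces neither the real `mapranges` output format nor its `--ref` hex/ASCII columns.

```python
from itertools import groupby


def label(byte: int) -> str:
    """Default labels for an eq map, as listed above."""
    if byte == 0xFF:
        return "STABLE"
    if byte == 0x00:
        return "VARIABLE"
    return "PARTIAL"


def map_spans(map_bytes: bytes) -> list[tuple[int, int, str]]:
    """Collapse consecutive same-label map bytes into half-open spans."""
    spans = []
    offset = 0
    for lab, run in groupby(map_bytes, key=label):
        length = sum(1 for _ in run)
        spans.append((offset, offset + length, lab))
        offset += length
    return spans
```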
`bindelta` compares a base file against one or more comparison files.
Modes:

- `auto` - equal-length files use fixed-offset comparison; unequal lengths use alignment-aware comparison
- `offset` - compare bytes at fixed positions
- `align` - use an alignment hunk model for insertions, deletions, and replacements
Exit status follows cmp-style behavior: 0 for identical, 1 for different,
and 2 for command or file errors.
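The standard library can illustrate how `auto` chooses between fixed-offset and alignment-aware comparison. The sketch below substitutes `difflib.SequenceMatcher` for the alignment step. bintools' own hunk model is not documented here and probably differs. The `delta` function and its return shape are hypothetical. Only the cmp-style exit statuses 0 and 1 are modeled, because status 2 covers command or file errors.

```python
import difflib


def delta(base: bytes, changed: bytes) -> tuple[int, list]:
    """Return (status, hunks) with cmp-style status: 0 identical, 1 different."""
    if len(base) == len(changed):
        # Fixed-offset comparison: (offset, base_byte, changed_byte, xor).
        hunks = [
            (i, a, b, a ^ b)
            for i, (a, b) in enumerate(zip(base, changed))
            if a != b
        ]
    else:
        # Alignment-aware stand-in: non-equal opcodes are
        # (tag, base_start, base_end, changed_start, changed_end).
        matcher = difflib.SequenceMatcher(None, base, changed, autojunk=False)
        hunks = [op for op in matcher.get_opcodes() if op[0] != "equal"]
    return (1 if hunks else 0), hunks
```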
`fieldscan` scans fixed offsets across a corpus, decodes bytes there as integers, and
reports offsets whose values look meaningful. Widths 1, 2, 4, and 8 are
supported with little-endian or big-endian decoding and signed or unsigned
interpretation.
Candidate types include manifest matches, file-size matches, Unix-time-like values, absolute-offset-like values, size or length candidates, constants, zeros, monotonic values, small integers, and low-variance fields.
Every row is a scored candidate. Confirm semantics with other tools and new samples.
`payloadscan` takes input/output artifact pairs and finds where input bytes appear in output artifacts. It can report copied, truncated, split, repeated, hex-encoded, or base64-encoded payloads.
Pairs may be supplied directly with --pair INPUT:OUTPUT or through a manifest.
Short coincidental matches are suppressed by --min-run, which defaults to 4.
Use a lower value when deliberately investigating very small copied payloads.
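A simplified version of the search is sketched below. It looks for an exact copy of the input, its lowercase and uppercase hex encodings, and its base64 encoding, then falls back to the longest copied prefix of at least `min_run` bytes. `find_payload` is hypothetical, and the sketch omits the split, repeated-segment, and cross-pair observations that `payloadscan` reports.

```python
import base64


def find_payload(inp: bytes, out: bytes, min_run: int = 4) -> list[tuple[str, int]]:
    """Return (kind, offset) observations of `inp` inside `out`."""
    found: list[tuple[str, int]] = []
    if len(inp) < min_run:
        return found  # too short to separate from coincidental matches
    hex_lower = inp.hex().encode("ascii")
    needles = [("copy", inp), ("hex", hex_lower), ("base64", base64.b64encode(inp))]
    if hex_lower.upper() != hex_lower:
        needles.append(("hex-upper", hex_lower.upper()))
    for kind, needle in needles:
        start = out.find(needle)
        while start != -1:
            found.append((kind, start))
            start = out.find(needle, start + 1)
    if not any(kind == "copy" for kind, _ in found):
        # Truncated copy: the longest input prefix of at least min_run bytes.
        for length in range(len(inp) - 1, min_run - 1, -1):
            pos = out.find(inp[:length])
            if pos != -1:
                found.append((f"prefix[{length}]", pos))
                break
    return found
```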
`maskdiff` compares binary artifacts while ignoring byte ranges you mark as noisy, such as timestamps, counters, checksums, or reserved fields.
Mask ranges are half-open [start, end) and can be supplied on the command line
or in a mask file. Masked bytes may be treated as wildcards or normalized to a
chosen byte before comparison.
maskdiff proves only that the selected ranges explain the selected
differences. It does not prove the meaning of those ranges.
`offsetstats` scans a corpus offset by offset and reports how each byte position behaves: presence, unique-value count, min/max, most-common value, zero ratio, printable ratio, control ratio, mean, variance, entropy, and heuristic labels.
With --ranges, adjacent like-behaving offsets collapse into spans. This is a
useful first look at a corpus, but labels are statistical hints, not semantic
proof.
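As a rough guide to what those statistics mean, the sketch below computes them for one byte position. `offset_stats` is hypothetical. It assumes that "printable" means `0x20` through `0x7e`, that "control" means bytes below `0x20` plus `0x7f`, that variance is the population variance, and that entropy is Shannon entropy in bits over the observed values. It assigns no heuristic labels.

```python
import math
from collections import Counter


def offset_stats(samples: list[bytes], offset: int) -> dict:
    """Statistics for one byte position across the samples long enough to reach it."""
    col = [s[offset] for s in samples if len(s) > offset]
    if not col:
        return {"presence": 0.0}
    counts = Counter(col)
    n = len(col)
    mean = sum(col) / n
    return {
        "presence": n / len(samples),
        "unique": len(counts),
        "min": min(col),
        "max": max(col),
        "most_common": counts.most_common(1)[0][0],
        "zero_ratio": counts[0] / n,
        "printable_ratio": sum(1 for b in col if 0x20 <= b < 0x7F) / n,
        "control_ratio": sum(1 for b in col if b < 0x20 or b == 0x7F) / n,
        "mean": mean,
        "variance": sum((b - mean) ** 2 for b in col) / n,
        # Shannon entropy in bits over observed values (0.0 to 8.0).
        "entropy": -sum(c / n * math.log2(c / n) for c in counts.values()),
    }
```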
`layoutcheck` validates a user-written JSON layout against one or more sample files. It checks fixed bytes, strings, scalar integers, length-prefixed payloads, count-driven repeats, padding, EOF behavior, and trailing-data policy.
A layout can be strict about EOF or allow trailing bytes. --dump-fields reports
parsed values. --csv and --json write structured validation results.
--overlay-out PATH writes a bintools.layout-overlay version 1 document for a
single binary.
layoutcheck checks whether samples fit a layout you wrote. It does not discover
the layout.
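The kind of check involved is a walk over the declared fields that stops at the first mismatch. The sketch below illustrates it with a deliberately tiny, **hypothetical** layout schema. This schema is *not* the `layoutcheck` JSON format. It is loosely modeled on the ODD toy format's 3-byte identifier and big-endian count, and the 2-byte width and the payload field are assumptions. The sketch covers only fixed bytes, one unsigned integer, a length-prefixed payload, and a strict-EOF versus allowed-trailing-data policy.

```python
import json

# Hypothetical schema for illustration only; not the layoutcheck format.
LAYOUT = json.loads("""
{
  "fields": [
    {"name": "magic", "type": "fixed", "hex": "4f4444"},
    {"name": "count", "type": "uint", "width": 2, "endian": "big"},
    {"name": "payload", "type": "bytes", "length_from": "count"}
  ],
  "allow_trailing": false
}
""")


def check(layout: dict, data: bytes) -> tuple[bool, list[str], dict]:
    """Walk fields in order; return (ok, diagnostics, parsed values)."""
    pos, values, errors = 0, {}, []
    for field in layout["fields"]:
        name, kind = field["name"], field["type"]
        if kind == "fixed":
            want = bytes.fromhex(field["hex"])
            got = data[pos:pos + len(want)]
            if got != want:
                errors.append(f"{name}@{pos}: expected {want.hex()}, got {got.hex()}")
                return False, errors, values
            pos += len(want)
        elif kind == "uint":
            width = field["width"]
            chunk = data[pos:pos + width]
            if len(chunk) < width:
                errors.append(f"{name}@{pos}: unexpected EOF")
                return False, errors, values
            values[name] = int.from_bytes(chunk, field["endian"])
            pos += width
        elif kind == "bytes":
            length = values[field["length_from"]]
            if pos + length > len(data):
                errors.append(f"{name}@{pos}: needs {length} bytes, {len(data) - pos} left")
                return False, errors, values
            values[name] = data[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"unsupported type: {kind}")
    # Strict EOF unless the layout allows trailing bytes.
    if pos != len(data) and not layout["allow_trailing"]:
        errors.append(f"trailing data: {len(data) - pos} bytes after offset {pos}")
    return not errors, errors, values
```

A passing result here, as with the real tool, shows only that this sample fits this layout under these rules.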
layoutcheck --overlay-out exports bintools.layout-overlay version 1 JSON.
The schema is documented in docs/layout-overlay-v1.md.
The sibling multihex viewer can load these overlays to color and label byte
ranges while displaying the file. The overlay records resolved offsets, lengths,
field paths, decoded values where available, validation status, and diagnostics.
Install development dependencies:

```sh
python3 -m pip install -e .[dev]
```

Run unit tests:

```sh
python3 -m pytest
```

Run linting:

```sh
ruff check .
```

Measure test coverage (opt-in; requires the dev extra above):

```sh
python3 -m coverage run -m pytest
python3 -m coverage report
```

Coverage is configured in `pyproject.toml` (branch coverage on, with a `fail_under` floor). It is deliberately kept out of the default `python3 -m pytest` invocation, so the fast lane needs no coverage plugin. The validation wrapper runs the same measurement as its coverage lane (see below) and writes `coverage.xml` and an `htmlcov/` report.
Run shell integration checks:

```sh
scripts/integration/run_all.sh
```

Run the opt-in stress lane (scale/resource survival tests):

```sh
scripts/stress/run_all.sh                             # CI-default tier
BINTOOLS_STRESS_TIER=local scripts/stress/run_all.sh  # heavier local soak
```

The stress lane exercises the tools at scale (wide corpora, large files, near-cap `align` inputs) to catch algorithmic blowups, memory pathologies, and hangs. It is deselected from the default pytest and coverage runs via the `stress` marker. Its budgets are scale-relative (loose time ratios plus a memory-peak bound), not absolute wall-clock numbers. Scale is env-driven: `BINTOOLS_STRESS_TIER=ci|local` plus the per-dimension overrides `BINTOOLS_STRESS_L`, `_N`, `_P`, and `_ALIGN`. This lane is about survival under scale, not performance-regression benchmarking or malformed-input fuzzing.
Run the opt-in performance lane (scale-relative complexity guards):

```sh
scripts/performance/run_all.sh                            # small tier
BINTOOLS_PERF_SCALE=large scripts/performance/run_all.sh  # heavier local soak
```

The performance lane catches complexity-class regressions (an O(n) path silently becoming O(n^2)) and catastrophic slowdowns for the tools with high-cost dimensions (`bindelta`, `bindiffmap`, `offsetstats`, `fieldscan`, `payloadscan`). It is deselected from the default pytest and coverage runs via the `performance` marker. It makes no absolute wall-clock assertions. Instead, each test measures a tool on a base input and on an input roughly 10x larger, then asserts that the runtime ratio did not jump a complexity class. A uniform hang ceiling backs every test. Scale is env-driven (`BINTOOLS_PERF_SCALE=small|large`). This lane complements the stress lane: it covers timing and complexity rather than survival. Each run ends with a printed per-tool timing summary, and the script warns if coverage instrumentation (`COVERAGE_PROCESS_START`) would distort the numbers.
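The scale-relative idea can be sketched as follows. `best_time` and `jumped_class` are hypothetical helpers. The default scale of 10 and slack factor of 4 are assumptions, not the thresholds the performance lane uses, and the sketch does not model the hang ceiling.

```python
import time
from typing import Callable


def best_time(fn: Callable[[int], object], n: int, repeats: int = 3) -> float:
    """Best-of-N wall-clock seconds for fn(n), to damp scheduler noise."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(n)
        best = min(best, time.perf_counter() - start)
    return best


def jumped_class(t_base: float, t_large: float, scale: float = 10.0,
                 slack: float = 4.0) -> bool:
    """True if the runtime ratio exceeds linear growth plus slack.

    At a 10x input, O(n) predicts a ratio near 10 and O(n^2) near 100,
    so a threshold of scale * slack (40 by default) separates them with margin.
    """
    return t_large / max(t_base, 1e-9) > scale * slack
```

A guard along these lines would call `jumped_class(best_time(f, n), best_time(f, 10 * n))`. Comparing a ratio against a threshold keeps the assertion independent of how fast the machine is.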
Run example workflows:

```sh
examples/workflows/run_all.sh
```

Run the local validation wrapper:

```sh
scripts/run-full-test-suite.sh
```

The validation wrapper runs the implemented local lanes, including the coverage lane when coverage.py is installed, and reports skipped lanes for validation layers that are opt-in or not yet present. Pass `--include-stress` to run the stress lane (at the CI-default tier) or `--include-performance` to run the performance lane (at the small tier) as part of the wrapper; `--all` runs every opt-in lane.
The primary GitHub Actions workflow runs on pull requests and pushes to main.
It tests Python 3.10 through 3.14 on Ubuntu, installs .[dev] from a clean
checkout, verifies that the installed console scripts match pyproject.toml and
respond to --help, then runs:
```sh
ruff check .
scripts/coverage/run_coverage.sh
scripts/integration/run_all.sh
```

`scripts/coverage/run_coverage.sh` runs the default pytest suite, so packaging and SPDX/license-header tests are included in the automatic path.
The stress workflow is available through manual dispatch and also runs weekly at
the ci tier. The performance workflow is manual only and should be treated as
a scale-relative guardrail, not a precise benchmark gate.
To mirror the automatic CI checks locally from a fresh environment:

```sh
python3 -m pip install -e .[dev]
for tool in bindiffmap bindelta genbytes mapranges fieldscan payloadscan maskdiff offsetstats layoutcheck; do
  "$tool" --help >/dev/null
done
ruff check .
scripts/coverage/run_coverage.sh
scripts/integration/run_all.sh
```

Repository layout:

```text
src/bintools/
  bindiffmap.py         byte/bit stability map builder
  bindelta.py           concrete binary diff reporter
  genbytes.py           deterministic input and corpus generator
  mapranges.py          map span summarizer
  fieldscan.py          candidate scalar-field finder
  payloadscan.py        input-bytes-in-output locator
  maskdiff.py           masked binary comparison reporter
  offsetstats.py        per-offset and per-range corpus statistics
  layoutcheck.py        declarative layout validator
  overlay_export.py     layoutcheck to layout-overlay-v1 exporter
  layout_overlay_v1.py  layout-overlay-v1 schema validator
tests/                  pytest coverage for the tools and packaging
scripts/integration/    shell integration checks over synthetic corpora
examples/generators/    deterministic example target programs
examples/workflows/     runnable, self-narrating examples
docs/workflows/         worked user-facing guides
TOOLS.md                detailed command notes and caveats
TODO.md                 current follow-up work
```
Before treating this as more than a public draft, useful next steps include:
- expanding CI only if future release workflows need additional package or platform coverage
- raising line coverage toward the 90%+ stretch target (an opt-in coverage lane with a `fail_under` floor is already in place; see Development)
- adding the separate fuzzing/mutation-test lane (opt-in stress and performance lanes with env-driven tiers are already in place; see Development)
- adding the remaining scalar-width, endian, alignment, and tiny-payload tests listed in `TODO.md`
- adding package metadata needed for a future PyPI release, if desired
Apache-2.0. See LICENSE.