From de3db7e27b01c4751fc0fa071313b5f4a29c6b95 Mon Sep 17 00:00:00 2001
From: Brad Geesaman <3769609+bgeesaman@users.noreply.github.com>
Date: Tue, 30 Jun 2026 12:42:21 -0400
Subject: [PATCH 1/3] docs: add agent guide and document local build, test, and run workflow

Poltergeist shipped without any agent-facing instructions, and its README
sent setup details to external docs, so a newcomer or coding agent cloning
the repo had no local reference for how to build, test, or run it. The
repository .gitignore also excluded these files outright, which is why none
of them could be tracked.

Add a CLAUDE.md agent guide covering the architecture, the make-based
commands, the Hyperscan build requirement, and the rule format, an AGENTS.md
symlink so non-Claude tools read the same guidance, a scoped .claude
permission allowlist for the build and test commands, README sections for
usage and the build and test workflow, and the .gitignore allowlist entries
that let all of these be tracked.
---
 .claude/settings.json |  25 ++++++++
 .gitignore            |   3 +
 AGENTS.md             |   1 +
 CLAUDE.md             | 130 ++++++++++++++++++++++++++++++++++++++++++
 README.md             |  56 ++++++++++++++++++
 5 files changed, 215 insertions(+)
 create mode 100644 .claude/settings.json
 create mode 120000 AGENTS.md
 create mode 100644 CLAUDE.md

diff --git a/.claude/settings.json b/.claude/settings.json
new file mode 100644
index 0000000..6081582
--- /dev/null
+++ b/.claude/settings.json
@@ -0,0 +1,25 @@
+{
+  "permissions": {
+    "allow": [
+      "Bash(make build)",
+      "Bash(make test)",
+      "Bash(make test-rules)",
+      "Bash(make lint)",
+      "Bash(make docs)",
+      "Bash(make benchmarks)",
+      "Bash(make benchmarks-go)",
+      "Bash(make benchmarks-hyperscan)",
+      "Bash(make help)",
+      "Bash(make deps)",
+      "Bash(make clean)",
+      "Bash(go build:*)",
+      "Bash(go test:*)",
+      "Bash(go vet:*)",
+      "Bash(go run:*)",
+      "Bash(go mod tidy)",
+      "Bash(go mod download)",
+      "Bash(golangci-lint run:*)",
+      "Bash(./poltergeist:*)"
+    ]
+  }
+}
diff --git a/.gitignore b/.gitignore
index 160b5cf..57cbd59 100644
--- a/.gitignore
+++ b/.gitignore
@@ -6,6 +6,7 @@
 !.gitattributes
 !.github/**
 !.vscode/*
+!.claude/**
 !.goreleaser.yaml
 !cmd/**
 !docs/*
@@ -25,6 +26,8 @@
 !LICENSE
 !NOTICE
 !Makefile
+!CLAUDE.md
+!AGENTS.md
 
 # ...even if they are in subdirectories
 !*/
diff --git a/AGENTS.md b/AGENTS.md
new file mode 120000
index 0000000..681311e
--- /dev/null
+++ b/AGENTS.md
@@ -0,0 +1 @@
+CLAUDE.md
\ No newline at end of file
diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 100644
index 0000000..b03a56a
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1,130 @@
+# Poltergeist - Agent Guide
+
+Guidance for AI agents and humans working in this repository. For rule-writing
+specifics, see [docs/rule-authoring.md](docs/rule-authoring.md). For the
+published user documentation, see [ghostsecurity.ai](https://ghostsecurity.ai).
+
+## What Poltergeist Is
+
+Poltergeist is a high-performance secret scanner for source code, shipped both
+as a standalone CLI and as an importable Go library under the module path
+`github.com/ghostsecurity/poltergeist/v2`. It matches many regex rules at once
+using Vectorscan/Hyperscan through CGO, falls back to a pure-Go regex engine
+when Hyperscan is unavailable, and applies Shannon-entropy filtering so that
+low-entropy matches are treated as likely false positives rather than findings.
+
+## Layout
+
+| Path | Purpose |
+|------|---------|
+| `cmd/poltergeist/` | CLI entry point. Parses flags, loads rules, scans a path, and prints text, JSON, or Markdown output. |
+| `cmd/docs/` | Generates `docs/rules.md` from the embedded rule set. Invoked by `make docs`. |
+| `cmd/benchmark/` | Benchmark harness that compares the Go and Hyperscan engines. |
+| `pkg/poltergeist.go` | Public library API. `Scanner`, `ScanResult`, `LoadRules`, `LoadRulesFromFile`, `LoadRulesFromDirectory`, `SelectEngine`, `IsHyperscanAvailable`, `NewScanner`. |
+| `pkg/engine.go` | The `PatternEngine` interface plus its two implementations, `HyperscanEngine` and `GoRegexEngine`. |
+| `pkg/rule.go` | Rule data model, embedded-rule loading, extended-regex normalization, and Shannon entropy. |
+| `pkg/rules/` | The 100-plus built-in detection rules, one YAML file per provider, embedded into the binary via `//go:embed`. |
+| `rules` | A symlink to `pkg/rules` so the embedded rules are also reachable from the repo root. |
+| `pkg/testdata/` | Fixtures used by the package tests. |
+| `examples/` | A minimal library-usage example. |
+
+## Key Commands
+
+All routine tasks run through the `Makefile`, so prefer the targets below over
+ad-hoc `go` invocations. Run `make help` to list every target.
+
+| Command | What it does |
+|---------|--------------|
+| `make build` | Build the CLI binary `poltergeist` from `cmd/poltergeist` with `CGO_ENABLED=1`. |
+| `make test` | Run the full test suite with `go test -v ./...`. |
+| `make test-rules` | Run only `TestRulesValidation`, which checks every embedded rule against its own `assert` and `assert_not` cases. |
+| `make lint` | Run `golangci-lint run`, whose default linter set includes go vet checks, installing golangci-lint first if it is missing. |
+| `make docs` | Regenerate `docs/rules.md` from the embedded rules. |
+| `make benchmarks` | Run the benchmark harness across both engines. |
+
+## Build Prerequisites
+
+Building and testing require the Vectorscan/Hyperscan development headers
+because the default engine binds to them through CGO. On Debian and Ubuntu the
+package is `libhyperscan-dev`, which is exactly what CI installs before running
+`make test` and `make lint`. On macOS install `vectorscan` or `hyperscan` with
+Homebrew. When you cannot install the native library, the scanner still works
+by selecting the pure-Go regex engine, either automatically through `-engine
+auto` or explicitly with `-engine go`, so a Hyperscan-less environment can run
+the tool even though it cannot compile the Hyperscan engine.
+
+## Running the CLI
+
+```bash
+# Scan a directory with the built-in rules and redacted output.
+./poltergeist /path/to/code
+
+# Force the pure-Go engine and emit JSON to a file.
+./poltergeist -engine go -format json -output findings.json /path/to/code
+
+# Use a custom rule file or directory instead of the embedded rules.
+./poltergeist -rules ./my-rules.yaml /path/to/code
+```
+
+The main flags are `-engine` for engine selection across `auto`, `go`, and
+`hyperscan`, `-rules` for a custom rule file or directory, `-format` for `text`,
+`json`, or `md`, `-output` for writing to a file, `-dnr` to show unredacted
+matches, `-low-entropy` to include matches below their entropy threshold, and
+`-no-color` to disable terminal coloring.
+
+## Rules
+
+Detection rules live in `pkg/rules/` as one YAML file per provider and are
+compiled into the binary at build time, so adding or editing a file there
+changes the built-in rule set without any external configuration. Each rule
+carries a human name, a machine `id`, a regex `pattern`, an `entropy`
+threshold, a `redact` pair giving the number of leading and trailing bytes to
+keep when the match is masked, and inline `tests` that the validation suite
+enforces. A representative rule looks like this:
+
+```yaml
+rules:
+  - name: Anthropic API Key
+    id: ghost.anthropic.1
+    description: Anthropic API key.
+    tags: [api, anthropic]
+    pattern: |
+      (?x)
+      \b
+      (sk-ant-api\d{2}-(?i)[A-Z0-9_-]{86}-(?i)[A-Z0-9_]{6}AA)
+      \b
+    entropy: 5.1
+    redact: [16, 4]
+    tests:
+      assert:
+        - sk-ant-api03-...AA
+      assert_not:
+        - sk-ant-admin01-...AA
+    history:
+      - 2025-08-02 initial version
+```
+
+Patterns use the PCRE extended `(?x)` form for readability, and `rule.go`
+normalizes that syntax for the Go engine before compilation. After any change
+under `pkg/rules/`, run `make docs` to regenerate `docs/rules.md`, because the
+Docs workflow fails the pull request when that file is stale. Run `make
+test-rules` to confirm the new or edited rule still passes its own assertions.
+
+## Conventions
+
+Tests rely only on the standard library `testing` package with table-driven
+cases and `t.Fatalf` or `t.Errorf`, so do not introduce testify or any other
+assertion framework. Wrap errors with `fmt.Errorf` and `%w` to preserve the
+chain, following the pattern already used throughout `rule.go`. Keep new
+detection logic inside `pkg/` so that both the CLI and library consumers benefit
+from it, and reserve `cmd/` for thin entry points. Match the existing file
+organization rather than introducing new top-level packages.
+
+## Continuous Integration
+
+Pull requests run four required checks. The Test workflow installs
+`libhyperscan-dev` and runs `make test`, the Lint workflow installs the same
+library and runs `make build` followed by `make lint`, the Docs workflow
+confirms `docs/rules.md` is regenerated, and a secret-scanning workflow runs
+TruffleHog. Run `make lint`, `make test`, and `make docs` locally before
+opening a pull request so these checks pass on the first try.
diff --git a/README.md b/README.md
index cc9b445..2acd039 100644
--- a/README.md
+++ b/README.md
@@ -22,6 +22,62 @@ As a Go library:
 go get github.com/ghostsecurity/poltergeist
 ```
 
+## Usage
+
+Point Poltergeist at a file or directory and it scans with the built-in rules,
+printing redacted matches by default:
+
+```bash
+poltergeist /path/to/code
+```
+
+Common flags let you change the engine, the output format, and the destination:
+
+```bash
+# Emit JSON to a file using the pure-Go engine.
+poltergeist -engine go -format json -output findings.json /path/to/code
+
+# Scan with a custom rule file instead of the embedded rules.
+poltergeist -rules ./my-rules.yaml /path/to/code
+```
+
+Use `-engine` to choose between `auto`, `go`, and `hyperscan`, `-format` to
+choose `text`, `json`, or `md`, `-dnr` to show unredacted matches, and
+`-low-entropy` to include matches below their entropy threshold. Run
+`poltergeist -help` for the full list.
+
+## Building from Source
+
+Building requires Go and the Vectorscan/Hyperscan development library, since the
+default engine binds to it through CGO. On Debian and Ubuntu install
+`libhyperscan-dev`, and on macOS install `vectorscan` or `hyperscan` with
+Homebrew. When the native library is unavailable you can still run the tool with
+the pure-Go engine by passing `-engine go`.
+
+```bash
+git clone https://github.com/ghostsecurity/poltergeist.git
+cd poltergeist
+make build
+./poltergeist --version
+```
+
+## Development
+
+The `Makefile` drives the common workflows, and `make help` lists every target:
+
+```bash
+make test        # run the full test suite
+make test-rules  # validate the built-in rules against their own test cases
+make lint        # run golangci-lint, whose default checks include go vet
+make docs        # regenerate docs/rules.md after editing pkg/rules
+```
+
+Run `make test` and `make lint` before opening a pull request, and run `make
+docs` whenever you change a rule so the generated documentation stays current.
+See [CONTRIBUTING](.github/CONTRIBUTING.md) for the full contribution workflow
+and [CLAUDE.md](CLAUDE.md) for an architecture-level guide aimed at coding
+agents.
+
 ## Comprehensive Documentation
 
 Full documentation, tutorials, and video guides at [ghostsecurity.ai](https://ghostsecurity.ai).

From 3e290a2afdfcdcf4798fec55cfec0fcf0ce37f1f Mon Sep 17 00:00:00 2001
From: Brad Geesaman <3769609+bgeesaman@users.noreply.github.com>
Date: Tue, 30 Jun 2026 13:18:52 -0400
Subject: [PATCH 2/3] Update README.md

Co-authored-by: Josh Larsen <2565382+joshlarsen@users.noreply.github.com>
Signed-off-by: Brad Geesaman <3769609+bgeesaman@users.noreply.github.com>
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 2acd039..b4412bc 100644
--- a/README.md
+++ b/README.md
@@ -80,7 +80,7 @@ agents.
 
 ## Comprehensive Documentation
 
-Full documentation, tutorials, and video guides at [ghostsecurity.ai](https://ghostsecurity.ai).
+Full documentation, tutorials, and video guides at [oss.ghostsecurity.ai](https://oss.ghostsecurity.ai).
 
 ## Contributions, Feedback, Feature Requests, and Issues
 

From 46517ac37c158c07c9e47905764fadd2b9bc5920 Mon Sep 17 00:00:00 2001
From: Brad Geesaman <3769609+bgeesaman@users.noreply.github.com>
Date: Tue, 30 Jun 2026 13:18:59 -0400
Subject: [PATCH 3/3] Update CLAUDE.md

Co-authored-by: Josh Larsen <2565382+joshlarsen@users.noreply.github.com>
Signed-off-by: Brad Geesaman <3769609+bgeesaman@users.noreply.github.com>
---
 CLAUDE.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/CLAUDE.md b/CLAUDE.md
index b03a56a..28ce600 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -2,7 +2,7 @@
 
 Guidance for AI agents and humans working in this repository. For rule-writing
 specifics, see [docs/rule-authoring.md](docs/rule-authoring.md). For the
-published user documentation, see [ghostsecurity.ai](https://ghostsecurity.ai).
+published user documentation, see the Poltergeist section of [oss.ghostsecurity.ai](https://oss.ghostsecurity.ai/tools/poltergeist).
 
 ## What Poltergeist Is
 
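
Reviewer note: the `entropy` threshold and `redact` pair that the CLAUDE.md rule format above describes might be applied roughly as follows. This is only a sketch under stated assumptions, not the code in `pkg/rule.go`. The names `shannonEntropy` and `redact`, the bits-per-byte definition of entropy, the `*` mask character, the mask-everything fallback for short matches, and the sample threshold are all hypothetical.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// shannonEntropy returns the Shannon entropy of s in bits per byte.
// Assumption: entropy is computed over raw bytes with a base-2 logarithm.
func shannonEntropy(s string) float64 {
	if len(s) == 0 {
		return 0
	}
	var counts [256]int
	for i := 0; i < len(s); i++ {
		counts[s[i]]++
	}
	n := float64(len(s))
	entropy := 0.0
	for _, c := range counts {
		if c == 0 {
			continue
		}
		p := float64(c) / n
		entropy -= p * math.Log2(p)
	}
	return entropy
}

// redact keeps the first lead and last trail bytes of match and masks the
// bytes in between. Assumption: when the match is too short to keep both
// ends, the whole match is masked.
func redact(match string, lead, trail int) string {
	if lead < 0 || trail < 0 || lead+trail >= len(match) {
		return strings.Repeat("*", len(match))
	}
	masked := strings.Repeat("*", len(match)-lead-trail)
	return match[:lead] + masked + match[len(match)-trail:]
}

func main() {
	// Hypothetical threshold and sample strings, chosen only for illustration.
	const threshold = 3.0
	for _, m := range []string{"aaaaaaaaaaaaaaaa", "k3Jx9QzT7mWb2LpR"} {
		e := shannonEntropy(m)
		fmt.Printf("%s entropy=%.2f report=%t\n", redact(m, 4, 2), e, e >= threshold)
	}
}
```

With a rule's `redact: [16, 4]`, the equivalent call in this sketch would be `redact(match, 16, 4)`, and a match whose entropy falls below the rule's `entropy` value would be dropped unless `-low-entropy` is set.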