Merged
25 changes: 25 additions & 0 deletions .claude/settings.json
@@ -0,0 +1,25 @@
{
  "permissions": {
    "allow": [
      "Bash(make build)",
      "Bash(make test)",
      "Bash(make test-rules)",
      "Bash(make lint)",
      "Bash(make docs)",
      "Bash(make benchmarks)",
      "Bash(make benchmarks-go)",
      "Bash(make benchmarks-hyperscan)",
      "Bash(make help)",
      "Bash(make deps)",
      "Bash(make clean)",
      "Bash(go build:*)",
      "Bash(go test:*)",
      "Bash(go vet:*)",
      "Bash(go run:*)",
      "Bash(go mod tidy)",
      "Bash(go mod download)",
      "Bash(golangci-lint run:*)",
      "Bash(./poltergeist:*)"
    ]
  }
}
3 changes: 3 additions & 0 deletions .gitignore
@@ -6,6 +6,7 @@
!.gitattributes
!.github/**
!.vscode/*
!.claude/**
!.goreleaser.yaml
!cmd/**
!docs/*
@@ -25,6 +26,8 @@
!LICENSE
!NOTICE
!Makefile
!CLAUDE.md
!AGENTS.md

# ...even if they are in subdirectories
!*/
1 change: 1 addition & 0 deletions AGENTS.md
130 changes: 130 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,130 @@
# Poltergeist - Agent Guide

Guidance for AI agents and humans working in this repository. For rule-writing
specifics, see [docs/rule-authoring.md](docs/rule-authoring.md). For the
published user documentation, see the Poltergeist section of [oss.ghostsecurity.ai](https://oss.ghostsecurity.ai/tools/poltergeist).

## What Poltergeist Is

Poltergeist is a high-performance secret scanner for source code, shipped both
as a standalone CLI and as an importable Go library under the module path
`github.com/ghostsecurity/poltergeist/v2`. It matches many regex rules at once
using Vectorscan/Hyperscan through CGO, falls back to a pure-Go regex engine
when Hyperscan is unavailable, and applies Shannon-entropy filtering so that
low-entropy matches are treated as likely false positives rather than findings.
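
The entropy filter described above can be illustrated with a minimal Go sketch. This is not the implementation in `pkg/rule.go`: the names `shannonEntropy` and `keepMatch` are hypothetical, and both the bits-per-byte unit and the "at or above the threshold" comparison are assumptions made for the example.

```go
package main

import (
	"fmt"
	"math"
)

// shannonEntropy returns the Shannon entropy of s in bits per byte.
// Hypothetical helper for illustration only.
func shannonEntropy(s string) float64 {
	if len(s) == 0 {
		return 0
	}
	// Count how often each byte value occurs.
	var counts [256]int
	for i := 0; i < len(s); i++ {
		counts[s[i]]++
	}
	n := float64(len(s))
	var h float64
	for _, c := range counts {
		if c == 0 {
			continue
		}
		p := float64(c) / n
		h -= p * math.Log2(p)
	}
	return h
}

// keepMatch reports whether a match meets the rule's entropy threshold.
// Treating "equal to the threshold" as a keep is an assumption.
func keepMatch(match string, threshold float64) bool {
	return shannonEntropy(match) >= threshold
}

func main() {
	for _, m := range []string{"aaaaaaaaaaaaaaaa", "q7Zp2Lx9Vb4Tn8Kd"} {
		fmt.Printf("%-18s entropy=%.2f keep=%v\n", m, shannonEntropy(m), keepMatch(m, 3.0))
	}
}
```

A repeated character carries no information and scores zero, while a string of distinct characters scores high, which is why a threshold separates placeholder values from plausible secrets.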

## Layout

| Path | Purpose |
|------|---------|
| `cmd/poltergeist/` | CLI entry point. Parses flags, loads rules, scans a path, and prints text, JSON, or Markdown output. |
| `cmd/docs/` | Generates `docs/rules.md` from the embedded rule set. Invoked by `make docs`. |
| `cmd/benchmark/` | Benchmark harness that compares the Go and Hyperscan engines. |
| `pkg/poltergeist.go` | Public library API. `Scanner`, `ScanResult`, `LoadRules`, `LoadRulesFromFile`, `LoadRulesFromDirectory`, `SelectEngine`, `IsHyperscanAvailable`, `NewScanner`. |
| `pkg/engine.go` | The `PatternEngine` interface plus its two implementations, `HyperscanEngine` and `GoRegexEngine`. |
| `pkg/rule.go` | Rule data model, embedded-rule loading, extended-regex normalization, and Shannon entropy. |
| `pkg/rules/` | The 100-plus built-in detection rules, one YAML file per provider, embedded into the binary via `//go:embed`. |
| `rules` | A symlink to `pkg/rules` so the embedded rules are also reachable from the repo root. |
| `pkg/testdata/` | Fixtures used by the package tests. |
| `examples/` | A minimal library-usage example. |

## Key Commands

All routine tasks run through the `Makefile`, so prefer the targets below over
ad-hoc `go` invocations. Run `make help` to list every target.

| Command | What it does |
|---------|--------------|
| `make build` | Build the CLI binary `poltergeist` from `cmd/poltergeist` with `CGO_ENABLED=1`. |
| `make test` | Run the full test suite with `go test -v ./...`. |
| `make test-rules` | Run only `TestRulesValidation`, which checks every embedded rule against its own `assert` and `assert_not` cases. |
| `make lint` | Run `golangci-lint run`, installing golangci-lint first if it is missing. Its default linter set includes `go vet` checks. |
| `make docs` | Regenerate `docs/rules.md` from the embedded rules. |
| `make benchmarks` | Run the benchmark harness across both engines. |

## Build Prerequisites

Building and testing require the Vectorscan/Hyperscan development headers
because the default engine binds to them through CGO. On Debian and Ubuntu the
package is `libhyperscan-dev`, which is exactly what CI installs before running
`make test` and `make lint`. On macOS install `vectorscan` or `hyperscan` with
Homebrew. Without the native library, the scanner can still run on the pure-Go
regex engine, selected automatically by `-engine auto` or explicitly by
`-engine go`. Such an environment can run the tool but cannot compile the
Hyperscan engine.

## Running the CLI

```bash
# Scan a directory with the built-in rules and redacted output.
./poltergeist /path/to/code

# Force the pure-Go engine and emit JSON to a file.
./poltergeist -engine go -format json -output findings.json /path/to/code

# Use a custom rule file or directory instead of the embedded rules.
./poltergeist -rules ./my-rules.yaml /path/to/code
```

The main flags are:

| Flag | Purpose |
|------|---------|
| `-engine` | Select the engine: `auto`, `go`, or `hyperscan`. |
| `-rules` | Use a custom rule file or directory. |
| `-format` | Choose the output format: `text`, `json`, or `md`. |
| `-output` | Write the output to a file. |
| `-dnr` | Show unredacted matches. |
| `-low-entropy` | Include matches below their entropy threshold. |
| `-no-color` | Disable terminal coloring. |

## Rules

Detection rules live in `pkg/rules/` as one YAML file per provider and are
compiled into the binary at build time, so adding or editing a file there
changes the built-in rule set without any external configuration. Each rule
carries a human name, a machine `id`, a regex `pattern`, an `entropy`
threshold, a `redact` pair giving the number of leading and trailing bytes to
keep when the match is masked, and inline `tests` that the validation suite
enforces. A representative rule looks like this:

```yaml
rules:
  - name: Anthropic API Key
    id: ghost.anthropic.1
    description: Anthropic API key.
    tags: [api, anthropic]
    pattern: |
      (?x)
      \b
      (sk-ant-api\d{2}-(?i)[A-Z0-9_-]{86}-(?i)[A-Z0-9_]{6}AA)
      \b
    entropy: 5.1
    redact: [16, 4]
    tests:
      assert:
        - sk-ant-api03-...AA
      assert_not:
        - sk-ant-admin01-...AA
    history:
      - 2025-08-02 initial version
```
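
To make the `redact` pair concrete, here is a minimal Go sketch of masking a match while keeping a fixed number of leading and trailing bytes. The function name `redact`, the `*` mask character, and the choice to mask the whole string when it is too short are all assumptions for illustration, not the behavior of the Poltergeist code.

```go
package main

import (
	"fmt"
	"strings"
)

// redact keeps the first lead and last trail bytes of match and masks the
// rest with '*'. Hypothetical helper: when nothing would be hidden, it masks
// the entire match rather than revealing it (an assumed policy).
func redact(match string, lead, trail int) string {
	if lead < 0 || trail < 0 || lead+trail >= len(match) {
		return strings.Repeat("*", len(match))
	}
	hidden := len(match) - lead - trail
	return match[:lead] + strings.Repeat("*", hidden) + match[len(match)-trail:]
}

func main() {
	// A made-up value, not a real credential format.
	fmt.Println(redact("sk-example-0123456789abcdef", 6, 4))
}
```

With a pair such as `[16, 4]`, enough of the prefix stays visible to identify the provider and key type without exposing the secret body.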

Patterns use the PCRE extended `(?x)` form for readability, and `rule.go`
normalizes that syntax for the Go engine before compilation. After any change
under `pkg/rules/`, run `make docs` to regenerate `docs/rules.md`, because the
Docs workflow fails the pull request when that file is stale. Run `make
test-rules` to confirm the new or edited rule still passes its own assertions.
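
Go's `regexp` package does not accept the `x` flag, which is why an extended pattern needs normalizing before the Go engine can compile it. The sketch below shows one way that step could work. `normalizeExtended` is a hypothetical name, the token format is invented, and the actual logic in `rule.go` may differ, for example in how it treats comments or character-class edge cases.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// normalizeExtended is a hypothetical helper. It drops a leading (?x) flag,
// then removes unescaped whitespace and #-comments outside character classes,
// which is roughly what PCRE's extended mode ignores.
func normalizeExtended(pattern string) string {
	p := strings.TrimSpace(pattern)
	if !strings.HasPrefix(p, "(?x)") {
		return pattern // not in extended form; leave it untouched
	}
	p = strings.TrimPrefix(p, "(?x)")

	var b strings.Builder
	inClass := false
	for i := 0; i < len(p); i++ {
		c := p[i]
		switch {
		case c == '\\' && i+1 < len(p):
			// Keep escape sequences, including escaped spaces, verbatim.
			b.WriteByte(c)
			i++
			b.WriteByte(p[i])
		case inClass:
			// Whitespace and '#' are literal inside a character class.
			if c == ']' {
				inClass = false
			}
			b.WriteByte(c)
		case c == '[':
			inClass = true
			b.WriteByte(c)
		case c == '#':
			// Skip a comment through the end of the line.
			for i < len(p) && p[i] != '\n' {
				i++
			}
		case c == ' ' || c == '\t' || c == '\n' || c == '\r':
			// Insignificant whitespace in extended mode: drop it.
		default:
			b.WriteByte(c)
		}
	}
	return b.String()
}

func main() {
	ext := `(?x)
	\b
	(tok_[a-z0-9]{8})   # hypothetical token format
	\b`
	re := regexp.MustCompile(normalizeExtended(ext))
	fmt.Println(re.String(), re.MatchString("key=tok_ab12cd34"))
}
```

Inline flags such as `(?i)` in the middle of a pattern are supported by Go's regex syntax, so only the `x` flag and the layout it permits need rewriting in this sketch.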

## Conventions

Tests rely only on the standard library `testing` package with table-driven
cases and `t.Fatalf` or `t.Errorf`, so do not introduce testify or any other
assertion framework. Wrap errors with `fmt.Errorf` and `%w` to preserve the
chain, following the pattern already used throughout `rule.go`. Keep new
detection logic inside `pkg/` so that both the CLI and library consumers benefit
from it, and reserve `cmd/` for thin entry points. Match the existing file
organization rather than introducing new top-level packages.

## Continuous Integration

Pull requests run four required checks. The Test workflow installs
`libhyperscan-dev` and runs `make test`, the Lint workflow installs the same
library and runs `make build` followed by `make lint`, the Docs workflow
confirms `docs/rules.md` is regenerated, and a secret-scanning workflow runs
TruffleHog. Run `make lint`, `make test`, and `make docs` locally before
opening a pull request so these checks pass on the first try.
58 changes: 57 additions & 1 deletion README.md
@@ -22,9 +22,65 @@ As a Go library:
go get github.com/ghostsecurity/poltergeist
```

## Usage

Point Poltergeist at a file or directory and it scans with the built-in rules,
printing redacted matches by default:

```bash
poltergeist /path/to/code
```

Common flags let you change the engine, the output format, and the destination:

```bash
# Emit JSON to a file using the pure-Go engine.
poltergeist -engine go -format json -output findings.json /path/to/code

# Scan with a custom rule file instead of the embedded rules.
poltergeist -rules ./my-rules.yaml /path/to/code
```

Use `-engine` to choose between `auto`, `go`, and `hyperscan`, `-format` to
choose `text`, `json`, or `md`, `-dnr` to show unredacted matches, and
`-low-entropy` to include matches below their entropy threshold. Run
`poltergeist -help` for the full list.

## Building from Source

Building requires Go and the Vectorscan/Hyperscan development library, since the
default engine binds to it through CGO. On Debian and Ubuntu install
`libhyperscan-dev`, and on macOS install `vectorscan` or `hyperscan` with
Homebrew. When the native library is unavailable you can still run the tool with
the pure-Go engine by passing `-engine go`.

```bash
git clone https://github.com/ghostsecurity/poltergeist.git
cd poltergeist
make build
./poltergeist --version
```

## Development

The `Makefile` drives the common workflows, and `make help` lists every target:

```bash
make test # run the full test suite
make test-rules # validate the built-in rules against their own test cases
make lint # run golangci-lint, whose default checks include go vet
make docs # regenerate docs/rules.md after editing pkg/rules
```

Run `make test` and `make lint` before opening a pull request, and run `make
docs` whenever you change a rule so the generated documentation stays current.
See [CONTRIBUTING](.github/CONTRIBUTING.md) for the full contribution workflow
and [CLAUDE.md](CLAUDE.md) for an architecture-level guide aimed at coding
agents.

## Comprehensive Documentation

-Full documentation, tutorials, and video guides at [ghostsecurity.ai](https://ghostsecurity.ai).
+Full documentation, tutorials, and video guides at [oss.ghostsecurity.ai](https://oss.ghostsecurity.ai).

## Contributions, Feedback, Feature Requests, and Issues
