Fix line numbers for duplicate secrets within a chunk by amanfcp · Pull Request #4910 · trufflesecurity/trufflehog

amanfcp · 2026-04-22T16:54:04Z

Summary

When the same secret appears multiple times in a chunk, every finding was reported with the line number of the first occurrence. This fixes it so each finding reports the line where it actually sits.

Closes #2502

Root cause

FragmentLineOffset in pkg/engine/engine.go uses bytes.Cut(chunk.Data, secret), which always splits at the first match. Because each detectors.Result only carries the secret bytes (not its byte position in the chunk), every duplicate result went through this first-match lookup and inherited the same line number.
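The first-match behavior can be reproduced in isolation. The sketch below is a simplification rather than the engine's actual code: `firstMatchLine` is a hypothetical helper that, like the lookup described above, cuts the chunk at the first occurrence of the secret and counts the newlines before it.

```go
package main

import (
	"bytes"
	"fmt"
)

// firstMatchLine is a hypothetical stand-in for the first-match lookup.
// It cuts the chunk at the first occurrence of secret and returns the
// zero-based line offset of that occurrence, or -1 if secret is absent.
func firstMatchLine(chunk, secret []byte) int {
	before, _, found := bytes.Cut(chunk, secret)
	if !found {
		return -1
	}
	return bytes.Count(before, []byte("\n"))
}

func main() {
	chunk := []byte("a\nFAKE_SECRET\nb\nc\nFAKE_SECRET\n")
	secret := []byte("FAKE_SECRET")

	// Two results carry identical secret bytes and no position, so both
	// lookups resolve to the first occurrence.
	fmt.Println(firstMatchLine(chunk, secret), firstMatchLine(chunk, secret))
}
```

Because the only input is the secret's bytes, no amount of repeated lookups can distinguish the second occurrence from the first.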

Duplicates survive to this function under normal scans. CleanResults deduplication only runs under --only-verified, or when a detector's ShouldCleanResultsIrrespectiveOfConfiguration() returns true. By default, any detector that emits one result per regex match hits the bug.

Fix

Localized to the engine and the shared detectors.Result type. No changes to individual detectors are required.

pkg/detectors/detectors.go: add a private chunkOffset / chunkOffsetSet pair to Result with SetChunkOffset / ChunkOffset / HasChunkOffset accessors.
pkg/engine/engine.go: add AssignDuplicateLineOffsets(chunk, results), which groups results by secret value, walks the chunk with bytes.Index to find each successive occurrence, and stamps the offset onto the corresponding result. Called once per result batch inside detectChunk, before results are dispatched. Unique secrets are skipped (zero overhead).
FragmentLineOffset: gains a fast path. When result.HasChunkOffset() is true, compute the line directly from the pre-assigned offset and run the ignore-tag check against that occurrence's line. The original bytes.Cut logic is preserved as a fallback for any caller that didn't go through AssignDuplicateLineOffsets, so the change is backward compatible.
Test fixtures: pkg/detectors/datadogapikey/datadogapikey_test.go and pkg/custom_detectors/custom_detectors_test.go compared detectors.Result values via cmp.Diff with IgnoreFields enumerating the then-existing unexported fields. Switched them to cmpopts.IgnoreUnexported(detectors.Result{}) so they tolerate the new private fields and are future-proof against similar additions.
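As a rough illustration of the approach in the list above, the following sketch groups results by secret value, walks the chunk with bytes.Index, and stamps each successive offset onto the matching result. It is not the code from this PR: `result` is a hypothetical stand-in for detectors.Result, and `assignDuplicateOffsets` and `lineAt` are illustrative names.

```go
package main

import (
	"bytes"
	"fmt"
)

// result is a hypothetical stand-in for detectors.Result.
type result struct {
	Raw            []byte
	chunkOffset    int64
	chunkOffsetSet bool
}

func (r *result) SetChunkOffset(off int64) { r.chunkOffset, r.chunkOffsetSet = off, true }
func (r *result) ChunkOffset() int64       { return r.chunkOffset }
func (r *result) HasChunkOffset() bool     { return r.chunkOffsetSet }

// assignDuplicateOffsets stamps a byte offset onto every result whose
// secret value is shared with at least one other result in the batch.
func assignDuplicateOffsets(chunk []byte, results []result) {
	// Group result indices by secret value, preserving result order.
	groups := make(map[string][]int, len(results))
	for i := range results {
		key := string(results[i].Raw)
		groups[key] = append(groups[key], i)
	}
	for secret, idxs := range groups {
		if len(idxs) < 2 || len(secret) == 0 {
			continue // unique secrets keep using the fallback path
		}
		pos := 0
		for _, i := range idxs {
			rel := bytes.Index(chunk[pos:], []byte(secret))
			if rel < 0 {
				break // fewer occurrences than results; leave the rest unset
			}
			results[i].SetChunkOffset(int64(pos + rel))
			pos += rel + len(secret)
		}
	}
}

// lineAt returns the zero-based line that contains byte offset off.
func lineAt(chunk []byte, off int64) int {
	return bytes.Count(chunk[:off], []byte("\n"))
}

func main() {
	chunk := []byte("x\nSECRET\ny\nz\nSECRET\nw\nw\nSECRET\n")
	results := []result{{Raw: []byte("SECRET")}, {Raw: []byte("SECRET")}, {Raw: []byte("SECRET")}}
	assignDuplicateOffsets(chunk, results)
	for i := range results {
		// Print one-based line numbers for readability.
		fmt.Println(lineAt(chunk, results[i].ChunkOffset()) + 1)
	}
}
```

The sketch assumes the n-th result for a given secret corresponds to the n-th occurrence in the chunk, which holds only if results arrive in match order.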

A side effect worth calling out: the trufflehog:ignore check now runs per-occurrence instead of always inspecting the first occurrence's line. That's a real behavioral improvement and is covered explicitly by a test.
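A minimal sketch of what a per-occurrence check might look like, assuming the secret sits on a single line and the tag is matched as a plain substring. `lineContaining` and `ignoredAt` are hypothetical helpers, not functions from the engine.

```go
package main

import (
	"bytes"
	"fmt"
)

// lineContaining returns the line of chunk that contains byte offset off,
// without the trailing newline.
func lineContaining(chunk []byte, off int) []byte {
	start := bytes.LastIndexByte(chunk[:off], '\n') + 1
	end := bytes.IndexByte(chunk[off:], '\n')
	if end < 0 {
		return chunk[start:]
	}
	return chunk[start : off+end]
}

// ignoredAt reports whether the line holding the occurrence at off
// carries the trufflehog:ignore tag.
func ignoredAt(chunk []byte, off int) bool {
	return bytes.Contains(lineContaining(chunk, off), []byte("trufflehog:ignore"))
}

func main() {
	chunk := []byte("k=SECRET\nk=SECRET # trufflehog:ignore\n")
	// Offsets 2 and 11 are where SECRET starts on each line.
	fmt.Println(ignoredAt(chunk, 2), ignoredAt(chunk, 11))
}
```

With a first-match lookup, both occurrences would be judged by the first line alone, so the tag on the second line would never be seen.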

Why not fix it in detectors?

Threading byte offsets through every detector's FromData would touch 700+ implementations and require switching most of them from regex.FindAllStringSubmatch to FindAllStringIndex. The engine is the single place in the pipeline that has both the full chunk data and the full result batch, so a localized engine-level fix is the minimal correct change.
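To make the trade-off concrete, the sketch below contrasts the two standard-library calls: FindAllStringSubmatch returns only the matched text, while FindAllStringIndex returns byte positions. The pattern and the `occurrenceOffsets` helper are illustrative assumptions, not taken from any detector.

```go
package main

import (
	"fmt"
	"regexp"
)

// occurrenceOffsets returns the start offset of every match of pattern
// in data, which is the information a detector would need to thread
// through to the engine.
func occurrenceOffsets(pattern, data string) []int {
	re := regexp.MustCompile(pattern)
	var starts []int
	for _, loc := range re.FindAllStringIndex(data, -1) {
		starts = append(starts, loc[0])
	}
	return starts
}

func main() {
	data := "a FAKE_ABC1\nb FAKE_ABC1\n"
	re := regexp.MustCompile(`FAKE_[A-Z0-9]+`)

	// Matched text only: the two duplicates are indistinguishable.
	fmt.Println(re.FindAllStringSubmatch(data, -1))

	// Byte positions: each occurrence is uniquely located.
	fmt.Println(occurrenceOffsets(`FAKE_[A-Z0-9]+`, data))
}
```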

Performance

Result struct: 160 → 176 bytes (+16 bytes for the int64 + bool + alignment padding). Results are short-lived per-chunk, so there is no meaningful memory pressure.
AssignDuplicateLineOffsets: O(N) with one map allocation per call. The inner bytes.Index loop only runs for groups with duplicates.
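The 16-byte growth follows from Go's struct layout on 64-bit platforms: the int64 takes 8 bytes, and the 1-byte bool is padded out to the struct's 8-byte alignment. The sketch below checks this with unsafe.Sizeof; `base` is a hypothetical 160-byte placeholder, not the real Result layout, and the figure assumes a 64-bit target such as amd64 or arm64.

```go
package main

import (
	"fmt"
	"unsafe"
)

// base is a hypothetical 160-byte, 8-byte-aligned placeholder for the
// fields Result had before this change.
type base struct {
	payload [20]uint64
}

// withOffset appends the two new private fields to the placeholder.
type withOffset struct {
	base
	chunkOffset    int64
	chunkOffsetSet bool
}

// addedBytes reports how much the two new fields grow the struct.
func addedBytes() uintptr {
	return unsafe.Sizeof(withOffset{}) - unsafe.Sizeof(base{})
}

func main() {
	fmt.Println(unsafe.Sizeof(base{}), unsafe.Sizeof(withOffset{}), addedBytes())
}
```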

End-to-end verification

Reproduced the reporter's scenario against a fresh filesystem scan using a CustomRegex detector and a file containing FAKE_SECRET_ABC123XYZ on lines 2, 5, and 8:

| Binary | Reported lines |
| --- | --- |
| baseline | `line=2, line=2, line=2` |
| with this fix | `line=2, line=5, line=8` |

Test plan

TestFragmentLineOffset_DuplicateSecrets: regression test for the bug (fails without the fix, passes with it)
TestAssignDuplicateLineOffsets: unit test covering unique secrets, duplicates, and result ordering
TestFragmentLineOffset_DuplicateSecretsWithIgnoreTag: verifies per-occurrence trufflehog:ignore handling
Existing TestFragmentLineOffset and TestFragmentLineOffsetWithPrimarySecret* still pass (fallback path unchanged)
Full go test ./pkg/engine/... and go test ./pkg/detectors/... pass
Manual CLI verification against a CustomRegex detector matches the expected line numbers

Checklist:

Tests passing (make test-community)?
Lint passing (make lint; this requires golangci-lint)?

Note

Low Risk
Low-risk bug fix localized to line-number assignment; the main behavioral change is per-occurrence trufflehog:ignore handling when the same secret appears multiple times.

Overview
Fixes a bug where multiple findings for the same secret value within a chunk were all reported with the first occurrence’s line number.

The engine now precomputes per-result byte offsets for duplicate secret values (AssignDuplicateLineOffsets) and FragmentLineOffset uses this offset to compute the correct line number and run the trufflehog:ignore check against the matching occurrence; detectors.Result gains internal chunkOffset storage to carry this through. Adds regression/unit tests covering duplicate occurrences and ignore-tag behavior.

Reviewed by Cursor Bugbot for commit b086057. Bugbot is set up for automated code reviews on this repo.

fix line numbers for duplicate secrets within a chunk

b086057

amanfcp requested a review from a team April 22, 2026 16:54

amanfcp requested review from a team as code owners April 22, 2026 16:54


fix-test: use cmpopts.IgnoreUnexported instead of cmpopts.IgnoreFields

02ec427

amanfcp wants to merge 2 commits into main from INS-30
