Skip to content

Fix line numbers for duplicate secrets within a chunk#4910

Open
amanfcp wants to merge 2 commits intomainfrom
INS-30
Open

Fix line numbers for duplicate secrets within a chunk#4910
amanfcp wants to merge 2 commits intomainfrom
INS-30

Conversation

@amanfcp
Copy link
Copy Markdown
Contributor

@amanfcp amanfcp commented Apr 22, 2026

Summary

When the same secret appears multiple times in a chunk, every finding was reported with the line number of the first occurrence. This fixes it so each finding reports the line where it actually sits.

Closes #2502

Root cause

FragmentLineOffset in pkg/engine/engine.go uses bytes.Cut(chunk.Data, secret), which always splits at the first match. Because each detectors.Result only carries the secret bytes (not its byte position in the chunk), every duplicate result went through this first-match lookup and inherited the same line number.

Duplicates survive to this function under normal scans. CleanResults deduplication only runs under --only-verified, or when a detector's ShouldCleanResultsIrrespectiveOfConfiguration() returns true. By default, any detector that emits one result per regex match hits the bug.

Fix

Localized to the engine. No detector changes required.

  1. pkg/detectors/detectors.go: add a private chunkOffset / chunkOffsetSet pair to Result with SetChunkOffset / ChunkOffset / HasChunkOffset accessors.
  2. pkg/engine/engine.go: add AssignDuplicateLineOffsets(chunk, results), which groups results by secret value, walks the chunk with bytes.Index to find each successive occurrence, and stamps the offset onto the corresponding result. Called once per result batch inside detectChunk, before results are dispatched. Unique secrets are skipped (zero overhead).
  3. FragmentLineOffset: gains a fast path. When result.HasChunkOffset() is true, compute the line directly from the pre-assigned offset and run the ignore-tag check against that occurrence's line. The original bytes.Cut logic is preserved as a fallback for any caller that didn't go through AssignDuplicateLineOffsets, so the change is backward compatible.
  4. Test fixtures: pkg/detectors/datadogapikey/datadogapikey_test.go and pkg/custom_detectors/custom_detectors_test.go compared detectors.Result values via cmp.Diff with IgnoreFields enumerating the then-existing unexported fields. Switched them to cmpopts.IgnoreUnexported(detectors.Result{}) so they tolerate the new private fields and are future-proof against similar additions.

A side effect worth calling out: the trufflehog:ignore check now runs per-occurrence instead of always inspecting the first occurrence's line. That's a real behavioral improvement and is covered explicitly by a test.

Why not fix it in detectors?

Threading byte offsets through every detector's FromData would touch 700+ implementations and require switching most of them from regex.FindAllStringSubmatch to FindAllStringIndex. The engine is the single place in the pipeline that has both the full chunk data and the full result batch, so a localized engine-level fix is the minimal correct change.

Performance

  • Result struct: 160 → 176 bytes (+16 bytes for the int64 + bool + alignment padding). Results are short-lived per-chunk, so there is no meaningful memory pressure.
  • AssignDuplicateLineOffsets: O(N) with one map allocation per call. The inner bytes.Index loop only runs for groups with duplicates.

End-to-end verification

Reproduced the reporter's scenario against a fresh filesystem scan using a CustomRegex detector and a file containing FAKE_SECRET_ABC123XYZ on lines 2, 5, and 8:

Binary Reported lines
baseline line=2, line=2, line=2
with this fix line=2, line=5, line=8

Test plan

  • TestFragmentLineOffset_DuplicateSecrets: regression test for the bug (fails without the fix, passes with it)
  • TestAssignDuplicateLineOffsets: unit test covering unique secrets, duplicates, and result ordering
  • TestFragmentLineOffset_DuplicateSecretsWithIgnoreTag: verifies per-occurrence trufflehog:ignore handling
  • Existing TestFragmentLineOffset and TestFragmentLineOffsetWithPrimarySecret* still pass (fallback path unchanged)
  • Full go test ./pkg/engine/... and go test ./pkg/detectors/... pass
  • Manual CLI verification against a CustomRegex detector matches the expected line numbers

Checklist:

  • Tests passing (make test-community)?
  • Lint passing (make lint this requires golangci-lint)?

Note

Low Risk
Low risk bugfix localized to line-number assignment; main behavioral change is per-occurrence trufflehog:ignore handling when the same secret appears multiple times.

Overview
Fixes a bug where multiple findings for the same secret value within a chunk were all reported with the first occurrence’s line number.

The engine now precomputes per-result byte offsets for duplicate secret values (AssignDuplicateLineOffsets) and FragmentLineOffset uses this offset to compute the correct line number and run the trufflehog:ignore check against the matching occurrence; detectors.Result gains internal chunkOffset storage to carry this through. Adds regression/unit tests covering duplicate occurrences and ignore-tag behavior.

Reviewed by Cursor Bugbot for commit b086057. Bugbot is set up for automated code reviews on this repo. Configure here.

@amanfcp amanfcp requested a review from a team April 22, 2026 16:54
@amanfcp amanfcp requested review from a team as code owners April 22, 2026 16:54
@amanfcp amanfcp requested a review from a team April 22, 2026 16:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Line of code calculation is wrong for sequential identical secrets

1 participant