Summary
When the same secret appears multiple times in a chunk, every finding was reported with the line number of the first occurrence. This fixes it so each finding reports the line where it actually sits.
Closes #2502
Root cause
FragmentLineOffset in pkg/engine/engine.go uses bytes.Cut(chunk.Data, secret), which always splits at the first match. Because each detectors.Result only carries the secret bytes (not its byte position in the chunk), every duplicate result went through this first-match lookup and inherited the same line number. Duplicates survive to this function under normal scans.
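A minimal sketch of the first-match lookup, using a hypothetical helper (firstMatchLine) rather than the engine's actual code. It builds a chunk with the same secret on lines 2, 5 and 8. Each of the three findings carries only the secret bytes, so each lookup lands on the first occurrence:

```go
package main

import (
	"bytes"
	"fmt"
)

// firstMatchLine is a hypothetical stand-in for the pre-fix lookup.
// bytes.Cut splits at the FIRST occurrence of secret, so the returned
// 1-based line is always the first match's line, whichever finding asks.
func firstMatchLine(data, secret []byte) (int, bool) {
	before, _, found := bytes.Cut(data, secret)
	if !found {
		return 0, false
	}
	return bytes.Count(before, []byte("\n")) + 1, true
}

func main() {
	// Same secret on lines 2, 5 and 8.
	data := []byte("header\nFAKE_SECRET_ABC123XYZ\nx\ny\nFAKE_SECRET_ABC123XYZ\nz\nw\nFAKE_SECRET_ABC123XYZ\n")
	secret := []byte("FAKE_SECRET_ABC123XYZ")

	// Three findings, one per occurrence. Each lookup sees only the secret bytes.
	for i := 0; i < 3; i++ {
		line, _ := firstMatchLine(data, secret)
		fmt.Printf("finding %d: line=%d\n", i+1, line)
	}
}
```

Every iteration prints line=2, which reproduces the symptom from the issue.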
CleanResults deduplication only runs under --only-verified, or when a detector's ShouldCleanResultsIrrespectiveOfConfiguration() returns true. By default, any detector that emits one result per regex match hits the bug.
Fix
Localized to the engine. No detector changes required.
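The core idea can be sketched in isolation. This sketch assumes that results for a repeated secret arrive in the same order as their occurrences in the chunk. The result type, its fields and both helpers are hypothetical stand-ins for detectors.Result, AssignDuplicateLineOffsets and the FragmentLineOffset fast path. They are not trufflehog's actual code:

```go
package main

import (
	"bytes"
	"fmt"
)

// result is a hypothetical stand-in for detectors.Result: the secret bytes
// plus the private offset pair this PR adds.
type result struct {
	Raw            []byte
	chunkOffset    int64
	chunkOffsetSet bool
}

// assignDuplicateLineOffsets groups results by secret value. For each group
// with more than one member, it walks the chunk with bytes.Index and stamps
// the k-th occurrence's byte offset onto the k-th result in that group
// (assumption: results are in chunk order). Unique secrets are skipped.
func assignDuplicateLineOffsets(data []byte, results []result) {
	groups := make(map[string][]int) // secret -> indices into results, in order
	for i := range results {
		key := string(results[i].Raw)
		groups[key] = append(groups[key], i)
	}
	for secret, idxs := range groups {
		if len(idxs) < 2 || secret == "" {
			continue
		}
		needle := []byte(secret)
		pos := 0
		for _, i := range idxs {
			rel := bytes.Index(data[pos:], needle)
			if rel < 0 {
				break // fewer occurrences than results: the rest use the fallback path
			}
			abs := pos + rel
			results[i].chunkOffset = int64(abs)
			results[i].chunkOffsetSet = true
			pos = abs + len(needle) // continue searching after this occurrence
		}
	}
}

// lineForOffset returns the 1-based line that contains byte offset off.
func lineForOffset(data []byte, off int64) int {
	return bytes.Count(data[:off], []byte("\n")) + 1
}

func main() {
	data := []byte("header\nFAKE_SECRET_ABC123XYZ\nx\ny\nFAKE_SECRET_ABC123XYZ\nz\nw\nFAKE_SECRET_ABC123XYZ\n")
	secret := []byte("FAKE_SECRET_ABC123XYZ")
	results := []result{{Raw: secret}, {Raw: secret}, {Raw: secret}, {Raw: []byte("unique")}}

	assignDuplicateLineOffsets(data, results)
	for _, r := range results {
		if r.chunkOffsetSet {
			fmt.Printf("%s: line=%d\n", r.Raw, lineForOffset(data, r.chunkOffset))
		} else {
			fmt.Printf("%s: no offset assigned (fallback path)\n", r.Raw)
		}
	}
}
```

The three duplicates report lines 2, 5 and 8. Because each search resumes after the previous match, the walk makes a single front-to-back pass per duplicated secret. A secret seen only once never reaches bytes.Index.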
- pkg/detectors/detectors.go: add a private chunkOffset/chunkOffsetSet pair to Result, with SetChunkOffset/ChunkOffset/HasChunkOffset accessors.
- pkg/engine/engine.go: add AssignDuplicateLineOffsets(chunk, results), which groups results by secret value, walks the chunk with bytes.Index to find each successive occurrence, and stamps the offset onto the corresponding result. It is called once per result batch inside detectChunk, before results are dispatched. Unique secrets are skipped, so they incur no offset search.
- FragmentLineOffset: gains a fast path. When result.HasChunkOffset() is true, it computes the line directly from the pre-assigned offset and runs the ignore-tag check against that occurrence's line. The original bytes.Cut logic is preserved as a fallback for any caller that didn't go through AssignDuplicateLineOffsets, so the change is backward compatible.
- pkg/detectors/datadogapikey/datadogapikey_test.go and pkg/custom_detectors/custom_detectors_test.go compared detectors.Result values via cmp.Diff with IgnoreFields enumerating the then-existing unexported fields. Switched them to cmpopts.IgnoreUnexported(detectors.Result{}) so they tolerate the new private fields and are future-proof against similar additions.
A side effect worth calling out: the trufflehog:ignore check now runs per occurrence instead of always inspecting the first occurrence's line. That's a real behavioral improvement and is covered explicitly by a test.
Why not fix it in detectors?
Threading byte offsets through every detector's FromData would touch 700+ implementations and require switching most of them from regex.FindAllStringSubmatch to FindAllStringIndex. The engine is the single place in the pipeline that has both the full chunk data and the full result batch, so a localized engine-level fix is the minimal correct change.
Performance
Result struct: 160 → 176 bytes (+16 bytes for the int64 + bool + alignment padding). Results are short-lived per chunk, so there is no meaningful memory pressure.
AssignDuplicateLineOffsets: O(N) in the number of results, with one map allocation per call. The inner bytes.Index loop only runs for groups with duplicates, and each such group scans the chunk once, front to back.
End-to-end verification
Reproduced the reporter's scenario against a fresh filesystem scan using a CustomRegex detector and a file containing FAKE_SECRET_ABC123XYZ on lines 2, 5, and 8:
Before the fix: line=2, line=2, line=2
With the fix: line=2, line=5, line=8
Test plan
- TestFragmentLineOffset_DuplicateSecrets: regression test for the bug (fails without the fix, passes with it)
- TestAssignDuplicateLineOffsets: unit test covering unique secrets, duplicates, and result ordering
- TestFragmentLineOffset_DuplicateSecretsWithIgnoreTag: verifies per-occurrence trufflehog:ignore handling
- TestFragmentLineOffset and TestFragmentLineOffsetWithPrimarySecret* still pass (fallback path unchanged)
- go test ./pkg/engine/... and go test ./pkg/detectors/... pass
- CustomRegex detector matches the expected line numbers
Checklist:
- make test-community)?
- make lint this requires golangci-lint)?
Note
Low Risk
Low risk bugfix localized to line-number assignment; main behavioral change is per-occurrence trufflehog:ignore handling when the same secret appears multiple times.
Overview
Fixes a bug where multiple findings for the same secret value within a chunk were all reported with the first occurrence’s line number.
The engine now precomputes per-result byte offsets for duplicate secret values (AssignDuplicateLineOffsets), and FragmentLineOffset uses this offset to compute the correct line number and run the trufflehog:ignore check against the matching occurrence; detectors.Result gains internal chunkOffset storage to carry this through. Adds regression/unit tests covering duplicate occurrences and ignore-tag behavior.
Reviewed by Cursor Bugbot for commit b086057.
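The per-occurrence ignore check comes down to inspecting the line that contains each occurrence's own offset. A minimal sketch follows. It assumes a plain substring match for the tag, and lineAt and ignoredAt are hypothetical helpers, not the engine's functions:

```go
package main

import (
	"bytes"
	"fmt"
)

// ignoreTag is the inline marker named in this PR.
var ignoreTag = []byte("trufflehog:ignore")

// lineAt returns the 1-based line number and the bytes of the line that
// contains byte offset off.
func lineAt(data []byte, off int) (int, []byte) {
	start := bytes.LastIndexByte(data[:off], '\n') + 1
	end := len(data)
	if i := bytes.IndexByte(data[off:], '\n'); i >= 0 {
		end = off + i
	}
	return bytes.Count(data[:off], []byte("\n")) + 1, data[start:end]
}

// ignoredAt reports whether the occurrence at off sits on a line carrying
// the ignore tag. It checks this occurrence's own line, not the first one's.
func ignoredAt(data []byte, off int) bool {
	_, line := lineAt(data, off)
	return bytes.Contains(line, ignoreTag)
}

func main() {
	data := []byte("S\nS // trufflehog:ignore\nS\n")
	secret := []byte("S")
	// Visit each occurrence of the secret in order.
	for pos := 0; ; {
		i := bytes.Index(data[pos:], secret)
		if i < 0 {
			break
		}
		off := pos + i
		line, _ := lineAt(data, off)
		fmt.Printf("line=%d ignored=%v\n", line, ignoredAt(data, off))
		pos = off + len(secret)
	}
}
```

Only the occurrence on line 2 is reported as ignored. A first-occurrence check would inspect line 1 for all three findings, so it would never honor the tag.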