Improve Algolia detector context matching #4905

Open
heyfunwhoa wants to merge 2 commits into trufflesecurity:main from heyfunwhoa:algolia-improve-context

@heyfunwhoa heyfunwhoa commented Apr 22, 2026

Summary

Adds common Algolia environment variable and header names to improve real-world detection coverage:

  • ALGOLIA_API_KEY
  • ALGOLIA_APPLICATION_ID
  • x-algolia-api-key
  • x-algolia-application-id

Motivation

These are standard patterns used in Algolia integrations and are commonly found in real-world codebases. This improves detection reliability without changing core logic.

Risk

Low: only expands keyword matching, no changes to verification or detection logic.


Note

Low Risk
Low risk: only broadens keyword-based prefiltering for these detectors and does not change matching regexes or verification behavior.

Overview
Improves real-world secret discovery by expanding Keywords() prefilter terms for the Algolia Admin Key and Metabase detectors.

Algolia now also prefilters on common env var and header names (e.g., ALGOLIA_API_KEY, ALGOLIA_APPLICATION_ID, x-algolia-api-key, x-algolia-application-id), and Metabase adds typical session/API key header/env identifiers (e.g., X-Metabase-Session, X-API-Key, METABASE_API_KEY).

Reviewed by Cursor Bugbot for commit 33bc856. Bugbot is set up for automated code reviews on this repo.

@heyfunwhoa heyfunwhoa requested a review from a team April 22, 2026 05:07
@heyfunwhoa heyfunwhoa requested a review from a team as a code owner April 22, 2026 05:07
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

"ALGOLIA_API_KEY",
"ALGOLIA_APPLICATION_ID",
"x-algolia-api-key",
"x-algolia-application-id",

New keywords are redundant: each contains an existing keyword as a substring

Low Severity

All four new keywords (ALGOLIA_API_KEY, ALGOLIA_APPLICATION_ID, x-algolia-api-key, x-algolia-application-id) contain algolia as a substring. The Aho-Corasick pre-filter in ahocorasickcore.go already lowercases both keywords and chunk data, and performs substring matching. Since "algolia" is already a keyword, it will match any text containing the new keywords, making them entirely redundant. They don't expand detection coverage but do add unnecessary entries to the trie and keyword-to-detector map.
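The redundancy described above can be checked with a minimal sketch. It assumes only what the comment states: the pre-filter lowercases both keywords and chunk data and does substring matching. `matchesAny` and `redundantKeywords` are hypothetical helpers written for illustration; they use `strings.Contains` rather than the actual Aho-Corasick code in ahocorasickcore.go.

```go
package main

import (
	"fmt"
	"strings"
)

// matchesAny is a simplified stand-in for the keyword pre-filter (an assumption,
// not the real implementation): it lowercases the text and each keyword and
// reports whether any keyword occurs as a substring.
func matchesAny(text string, keywords []string) bool {
	lower := strings.ToLower(text)
	for _, kw := range keywords {
		if strings.Contains(lower, strings.ToLower(kw)) {
			return true
		}
	}
	return false
}

// redundantKeywords returns the candidates that already contain one of the
// existing keywords. Any chunk containing such a candidate also contains the
// existing keyword, so the candidate cannot widen the pre-filter.
func redundantKeywords(existing, candidates []string) []string {
	var out []string
	for _, c := range candidates {
		if matchesAny(c, existing) {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	existing := []string{"algolia"}
	added := []string{
		"ALGOLIA_API_KEY",
		"ALGOLIA_APPLICATION_ID",
		"x-algolia-api-key",
		"x-algolia-application-id",
	}
	// Lists every added keyword that is already covered by "algolia".
	fmt.Println(redundantKeywords(existing, added))
}
```

Under these assumptions, all four added keywords are reported as already covered.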

Reviewed by Cursor Bugbot for commit 7fd27bb.

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Reviewed by Cursor Bugbot for commit 33bc856.

return []string{
"metabase",
"X-Metabase-Session",
"X-API-Key",

Overly generic "X-API-Key" keyword triggers false pre-filter matches

Medium Severity

The keyword X-API-Key is an extremely common HTTP header used by many APIs (findl, interseller, langsmith, cloudsmith, etc.). Adding it to the Metabase detector's Keywords() causes the detector's FromData to be invoked on any chunk containing this generic header, even though both of the detector's regexes (keyPat and baseURL) require the prefix metabase to match. This wastes CPU running regex scans on chunks that can never produce a Metabase result. The other new Metabase keywords (X-Metabase-Session, METABASE_API_KEY) already contain metabase as a substring, making them redundant but harmless. X-API-Key is uniquely problematic because it doesn't contain metabase at all.
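A rough sketch of the wasted work, under stated assumptions: `illustrativeKeyPat` is a stand-in written for this example, not the detector's real keyPat, and the only property it shares with the description above is that it requires the literal prefix metabase. `passesPrefilter` is likewise a hypothetical substring-based stand-in for the pre-filter.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// illustrativeKeyPat is NOT the detector's actual pattern. It is an assumed
// example that, like the regexes described in the review, can only match
// when the text contains the prefix "metabase".
var illustrativeKeyPat = regexp.MustCompile(`(?i)metabase(?:.|[\n\r]){0,40}?\b([a-z0-9-]{36})\b`)

// passesPrefilter mimics a case-insensitive substring pre-filter: it reports
// whether the detector would be invoked on this chunk for the given keywords.
func passesPrefilter(chunk string, keywords []string) bool {
	lower := strings.ToLower(chunk)
	for _, kw := range keywords {
		if strings.Contains(lower, strings.ToLower(kw)) {
			return true
		}
	}
	return false
}

func main() {
	// A chunk from an unrelated API (hypothetical URL and key value).
	chunk := "curl -H 'X-API-Key: 3f1c2d4e-5a6b-7c8d-9e0f-a1b2c3d4e5f6' https://api.example.com/v1/items"

	before := []string{"metabase"}
	after := []string{"metabase", "X-Metabase-Session", "X-API-Key"}

	fmt.Println("invoked with original keywords:", passesPrefilter(chunk, before))
	fmt.Println("invoked with added keywords:   ", passesPrefilter(chunk, after))
	fmt.Println("regex can match:               ", illustrativeKeyPat.MatchString(chunk))
}
```

With the added keyword the chunk reaches the detector, yet a pattern anchored on metabase has nothing to match, so the scan is pure overhead.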

Reviewed by Cursor Bugbot for commit 33bc856.
