Skip to content

[INS-455] Unify common logic in Atlassian Data Center detectors#4907

Open
mustansir14 wants to merge 3 commits intomainfrom
INS-455-Unify-common-code-in-Atlassian-Data-Center-detectors
Open

[INS-455] Unify common logic in Atlassian Data Center detectors#4907
mustansir14 wants to merge 3 commits intomainfrom
INS-455-Unify-common-code-in-Atlassian-Data-Center-detectors

Conversation

@mustansir14
Copy link
Copy Markdown
Contributor

@mustansir14 mustansir14 commented Apr 22, 2026

Background

Three detectors target Atlassian Data Center (self-hosted) products: JiraDataCenterPAT, ConfluenceDataCenter, and BitbucketDataCenter. Reviewing them side-by-side revealed significant duplication: identical structural-validation logic, the same bearer-auth HTTP plumbing, and the same URL-extraction pipeline repeated verbatim in each file.

What changed

New shared package: pkg/detectors/atlassiandatacenter/

Following the pattern established by pkg/detectors/aws/, the three detectors are now co-located under a common parent that also houses shared utilities in common.go.

GetDCTokenPat(prefixes []string) *regexp.Regexp
Returns the compiled PAT regex for Jira/Confluence DC tokens (44-char base64 strings whose decoded form is <numeric-id>:<random-bytes>). Previously each detector hand-rolled the same pattern with inline literal strings.

IsStructuralPAT(candidate string) bool
Validates that a base64 candidate decodes to the <digits>:<bytes> structure. This function was copy-pasted verbatim between Jira and Confluence.

GetURLPat(prefixes []string) *regexp.Regexp
Returns the compiled URL regex for self-hosted Atlassian instance URLs. Each detector calls this once at package init time and stores the result in a package-level var, preserving the same compile-once behaviour as before. go-re2 compiles via CGo/FFI so doing this per-chunk would be a meaningful regression.

FindEndpoints(data string, urlPat *regexp.Regexp, resolve func(...string) []string) []string
Extracts keyword-scoped URLs from a chunk using the pre-compiled urlPat, passes them through s.Endpoints (which merges configured endpoints), deduplicates, and strips trailing slashes. All three detectors had their own multi-step version of this pipeline.

MakeVerifyRequest(ctx, client, fullURL, token string) (bool, map[string]any, error)
Sends a Bearer-authenticated GET and interprets the response: 200(true, decoded-JSON-body, nil), 401(false, nil, nil), other → (false, nil, error). Previously each detector built the request inline and contained its own copy of the 200 / 401 / default status-code switch. Jira reads body["displayName"] and body["emailAddress"] from the returned map; Confluence and Bitbucket discard it.

Detector changes

Each detector now imports the shared package and delegates to it:

Before After
Private isStructuralPAT (Jira + Confluence) atlassiandatacenter.IsStructuralPAT
Inline PAT regex literal (Jira + Confluence) atlassiandatacenter.GetDCTokenPat(keywords)
10–15 line URL-find + dedup + resolve block (all three) atlassiandatacenter.FindEndpoints(...)
Inline request build + status-code switch (all three) atlassiandatacenter.MakeVerifyRequest(...)

The URL regex is also standardised across all three detectors. Confluence and Bitbucket previously used an unbounded \d+ port pattern and allowed hostnames starting with . or -; all three now use [a-zA-Z0-9][a-zA-Z0-9.\-]* with \d{1,5} for the port (matching the stricter Jira original).

A keywords package-level variable is introduced in Jira (Confluence and Bitbucket already had one) so the keyword list is defined once and reused by the regex, FindEndpoints, and Keywords().

TestIsStructuralPAT migrated

The structural-PAT test previously lived in jiradatacenterpat_test.go and called the private function. It is now in common_test.go and covers tokens from all three products. common_test.go also adds tests for GetDCTokenPat, FindEndpoints, and MakeVerifyRequest.

What did not change

  • Each detector retains its own package, Scanner type, Keywords(), Type(), Description(), and FromData() logic.
  • Jira's JSON response parsing (displayName, emailAddress) is unchanged.
  • Confluence's invalidHosts cache and errNoHost sentinel are unchanged.
  • Bitbucket's ?limit=1 query parameter is unchanged.
  • All existing tests pass without modification.

Corpora Tests

image image

Checklist:

  • Tests passing (make test-community)?
  • Lint passing (make lint this requires golangci-lint)?

Note

Medium Risk
Mostly a refactor, but it changes shared URL matching/endpoint extraction and verification plumbing across three detectors, which could impact detection/verification behavior and false positive/negative rates.

Overview
Consolidates duplicated logic across Atlassian Data Center detectors by introducing a new shared atlassiandatacenter helper package for PAT/URL regexes, endpoint extraction, structural PAT validation, and a common Bearer-auth verification request helper.

Updates the Jira/Confluence/Bitbucket DC detectors to delegate to these shared helpers (including a standardized, slightly stricter URL pattern), removes their inlined structural/HTTP code, and relocates detector imports in defaults.go/tests (plus registering Bitbucket DC as a no-cloud-endpoint detector). Adds a comprehensive common_test.go covering the new helpers and moves structural-PAT tests out of Jira-specific tests.

Reviewed by Cursor Bugbot for commit 7c19046. Bugbot is set up for automated code reviews on this repo. Configure here.

@mustansir14 mustansir14 requested a review from a team April 22, 2026 08:48
@mustansir14 mustansir14 requested a review from a team as a code owner April 22, 2026 08:48
Comment thread pkg/detectors/atlassiandatacenter/common.go Outdated
@mustansir14 mustansir14 requested a review from a team as a code owner April 22, 2026 09:20
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 7c19046. Configure here.

&billomat.Scanner{},
&bingsubscriptionkey.Scanner{},
&bitbar.Scanner{},
&bitbucketdatacenter.Scanner{},
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bitbucket detector silently added to default scanner list

Medium Severity

The bitbucketdatacenter.Scanner{} is being newly registered in the default detector list (both import and scanner instantiation are pure additions with no corresponding removal). The detector previously existed as code but was not included in the default scan engine. This PR's stated goal is to "unify common logic," but silently enabling a previously-unregistered detector in production scans could produce unexpected new findings and has performance implications. The addition to noCloudEndpointDetectors in the test file confirms this is a net-new registration.

Additional Locations (2)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 7c19046. Configure here.

@MuneebUllahKhan222
Copy link
Copy Markdown
Contributor

This is completely irrelevant to the goal of this PR. But can you please add similar logic in bitbucketdatacenter detector

if !emitted {
	// No reachable URL in context — emit an unverified token-only
	// result and annotate why we couldn't verify.
	results = append(results, detectors.Result{
		DetectorType: detector_typepb.DetectorType_ConfluenceDataCenter,
		Raw:          []byte(token),
		RawV2:        []byte(token),
		ExtraData: map[string]string{
			"verification_note": "no reachable Confluence Data Center URL found in context; token reported unverified",
		},
	})
}

Copy link
Copy Markdown
Contributor

@MuneebUllahKhan222 MuneebUllahKhan222 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of changes that need to be addressed, Other than that the PR is good to go.

@@ -170,35 +136,13 @@ func (s Scanner) FromData(ctx context.Context, verify bool, data []byte) (result

func verifyPAT(ctx context.Context, client *http.Client, baseURL, token string) (bool, error) {
endpoint := strings.TrimRight(baseURL, "/") + "/rest/api/user/current"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to do TrimRight here because we are already doing that in FindEndpoints function.

Comment on lines 85 to 92
u, err := detectors.ParseURLAndStripPathAndParams(baseURL)
if err != nil {
return false, err
}
u.Path = "rest/api/1.0/projects"
q := u.Query()
q.Set("limit", "1")
u.RawQuery = q.Encode()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can get rid of all this and do

baseURL + "/rest/api/1.0/projects?limit=1"

Comment on lines 104 to 108
u, err := detectors.ParseURLAndStripPathAndParams(baseURL)
if err != nil {
return false, nil, err
}
u.Path = "/rest/api/2/myself"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can get rid of all this and do

baseURL + "/rest/api/2/myself"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants