Skip to content

feat: add WebhookGateway CRD for per-channel webhook auth and multi-instance GitHub#1238

Open
gjkim42 wants to merge 6 commits into
mainfrom
webhook-gateway
Open

feat: add WebhookGateway CRD for per-channel webhook auth and multi-instance GitHub#1238
gjkim42 wants to merge 6 commits into
mainfrom
webhook-gateway

Conversation

@gjkim42
Copy link
Copy Markdown
Collaborator

@gjkim42 gjkim42 commented May 29, 2026

What type of PR is this?

/kind feature

What this PR does / why we need it:

Webhook-based TaskSpawners are served by a "universal" webhook server: one Deployment per --source with a single shared WEBHOOK_SECRET, and the GitHub token resolver + API base URL are process-global. This prevents per-tenant/per-repo secrets, leaves the generic source unauthenticated, and makes serving multiple GitHub instances (github.com + GitHub Enterprise) require one Deployment per instance with no way to bind a spawner to an instance.

This PR introduces a WebhookGateway CRD as a per-channel authentication and routing boundary:

  • Each gateway owns one inbound path, /webhook/<namespace>/<name> (surfaced in status.url), and verifies inbound deliveries against its own secretRef (github/linear), reusing the existing HMAC validators.
  • A TaskSpawner opts in via when.<source>Webhook.gatewayRef; the gateway fans out only to spawners in its own namespace that reference it. The legacy per-source server skips gatewayRef'd spawners, so a spawner is served exactly once.
  • For github, a gateway also carries its own apiBaseURL and credentialsRef, used for PR-file enrichment, status reporting, and GitHub App token minting. This lets one gateway server serve github.com plus multiple GitHub Enterprise instances, each fully isolated.
  • A status reconciler reflects Authenticated / SecretMissing / Unauthenticated.
  • Generic gateways are accepted but not signature-verified yet (a per-provider verification scheme is a follow-up); they are surfaced as Unauthenticated and must be protected at the network layer.

Enable with webhookServer.gatewayServer.enabled in the Helm chart. See examples/14-webhookgateway/.

Which issue(s) this PR is related to:

Fixes #1236

Special notes for your reviewer:

  • The largest refactor makes the package-global GitHub token resolver injectable per request; the shared resolver logic now lives in githubapp.NewSecretTokenResolver, used by both the gateway handler and the reporting reconciler.
  • The gateway server disables the controller-runtime Secret cache so it needs only get (not list/watch) on Secrets.
  • Legacy --source behavior is unchanged and backward-compatible; gatewayRef, the new CRD, and the new fields are all optional and additive.
  • Corrected the generic webhook docstrings, which previously claimed <SOURCE>_WEBHOOK_SECRET HMAC validation that was never implemented (Generic webhook endpoint is unauthenticated — implement <SOURCE>_WEBHOOK_SECRET HMAC validation #1040).
  • Generic per-provider verification, migrating self-development spawners onto gateways, and e2e coverage are deferred follow-ups.

Does this PR introduce a user-facing change?

Add a `WebhookGateway` CRD that authenticates webhook deliveries per channel and routes them to TaskSpawners via `when.<source>Webhook.gatewayRef`. For `github`, a gateway carries its own `apiBaseURL` and `credentialsRef`, enabling per-tenant webhook secrets and serving github.com plus multiple GitHub Enterprise instances from a single gateway server. Enable with `webhookServer.gatewayServer.enabled`.

🤖 Generated with Claude Code


Summary by cubic

Adds a WebhookGateway CRD and a gateway-mode server to authenticate webhooks per channel and route to matching TaskSpawners via when.<source>Webhook.gatewayRef, enabling per-tenant secrets and serving github.com plus multiple GHE instances from one deployment. Backward compatible; legacy servers skip gatewayRef spawners. Fixes #1236.

  • New Features

    • CRD uses a provider union (spec.github/spec.linear/spec.generic); each gateway owns one status.path (/webhook/<namespace>/<name>). github/linear verify HMAC from secretRef; generic is accepted but unauthenticated.
    • Gateway server (--gateway-mode, Helm webhookServer.gatewayServer) with RBAC and chart updates; it takes the /webhook/ prefix, so migrate legacy generic spawners to a gateway.
    • GitHub per-gateway apiBaseURL and credentialsRef; per-request token resolver via githubapp.NewSecretTokenResolver. Reporting reads the kelos.dev/webhook-gateway annotation to target the correct instance.
    • Self- and Kanon-development: all GitHub webhook spawners now use gatewayRef; CI enables the gateway server and provisions github-webhook-secret (includes webhook-secret for HMAC and GitHub App credentials).
  • Bug Fixes

    • Gateway handler returns 404 when a gateway is missing and 5xx on spawner list errors; generic delivery IDs are now namespaced and gateway-scoped.
    • Status and validation: status.path replaces url, ObservedGeneration tracked; CRD enforces secretRef for github/linear.
    • Helm: validate gatewayServer.service.type; docs clarify gateway terminology and route precedence.

Written for commit ab2d799. Summary will update on new commits.

Review in cubic

@github-actions github-actions Bot added kind/feature Categorizes issue or PR as related to a new feature needs-triage needs-priority needs-actor release-note labels May 29, 2026
@gjkim42 gjkim42 added the kind/api Categorizes issue or PR as related to API changes label May 29, 2026
@gjkim42 gjkim42 self-assigned this May 29, 2026
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

8 issues found across 41 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="cmd/kelos-webhook-server/main.go">

<violation number="1" location="cmd/kelos-webhook-server/main.go:232">
P2: Gateway mode now enables the reporting reconciler even when no global token resolver exists, which causes persistent reconcile errors for reporting-enabled tasks that lack a gateway annotation.</violation>
</file>

<file name="internal/manifests/charts/kelos/templates/webhook-gateway.yaml">

<violation number="1" location="internal/manifests/charts/kelos/templates/webhook-gateway.yaml:95">
P2: Using `else if` here disables legacy generic webhook routing whenever gateway server is enabled, so generic can be deployed but unreachable behind Gateway API.</violation>
</file>

Tip: cubic can generate docs of your entire codebase and keep them up to date. Try it here.

Re-trigger cubic

Comment thread internal/controller/webhookgateway_controller.go Outdated
Comment thread internal/webhook/gateway_handler.go Outdated
Comment thread internal/manifests/charts/kelos/templates/webhook-server.yaml
Comment thread cmd/kelos-webhook-server/main.go
Comment thread internal/githubapp/resolver.go Outdated
Comment thread internal/webhook/gateway_handler.go Outdated
Comment thread internal/webhook/gateway_handler.go Outdated
Comment thread internal/manifests/charts/kelos/templates/webhook-gateway.yaml
@gjkim42
Copy link
Copy Markdown
Collaborator Author

gjkim42 commented May 30, 2026

Thanks for the review. Addressed in f528f8b:

P1

  • gateway_handler.go (spawner List error) — listGatewayScopedSpawners now returns the error; the handler responds 5xx so a transient API failure is retried instead of being swallowed as a 200.
  • webhookgateway_controller.go (ObservedGeneration) — the generation the status was computed from is captured before the retry; the status update is skipped if the spec changed under us, so a stale status is never marked current.

P2

  • gateway_handler.go:89 — gateway Get now returns 404 only on IsNotFound; other errors (RBAC/transient) return 500.
  • gateway_handler.go (generic dedup id) — added extractGatewayGenericDeliveryID, which extracts the id fieldMapping from the gatewayRef-scoped spawners regardless of the spawner's source name.
  • main.go/reporting.go — the gateway-mode reporting reconciler now skips reporting-enabled tasks that have no gateway annotation when no global resolver is configured, instead of erroring every reconcile.
  • githubapp/resolver.go — partial GitHub App credentials (e.g. appID without privateKey) now return an error instead of being treated as "no credentials".
  • webhook-server.yaml — added the same service.type validation used by the source services for gatewayServer.service.type.

Inherent constraint (documented, not changed)

  • webhook-gateway.yaml (generic routing) — the gateway server (/webhook/<ns>/<name>) and the legacy generic server (/webhook/<source>) share the /webhook/ prefix, so a single PathPrefix can't route to both. The gateway server intentionally takes the prefix; this is now called out in the template and values.yaml, with guidance to migrate generic spawners onto a WebhookGateway. github/linear legacy servers use the more specific /webhook/github and /webhook/linear prefixes and still coexist.

@gjkim42
Copy link
Copy Markdown
Collaborator Author

gjkim42 commented May 30, 2026

/kelos api-review

@kelos-bot
Copy link
Copy Markdown

kelos-bot Bot commented May 30, 2026

🤖 Kelos Task Status

Task kelos-api-reviewer-issue-comment-85f7522c99a8 has succeeded. ✅

Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 10 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="internal/webhook/generic_filter.go">

<violation number="1" location="internal/webhook/generic_filter.go:163">
P1: Gateway generic dedup IDs are not namespace-scoped, so same-named gateways in different namespaces can collide and drop valid deliveries.</violation>
</file>

Reply with feedback, questions, or to request a fix.

Re-trigger cubic

Comment thread internal/webhook/generic_filter.go
Copy link
Copy Markdown

@kelos-bot kelos-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Kelos API Reviewer Agent @gjkim42

API Design Review

Verdict: COMMENT
Scope: New namespaced WebhookGateway CRD (type/secretRef/apiBaseURL/credentialsRef + status), plus an additive gatewayRef on the github/linear/generic webhook sources.

Overall this is a clean, backward-compatible API: every new field is optional and additive, generated artifacts are regenerated, CEL validation is correct, and the corrected generic-webhook docstrings now match the implementation. The notes below are about long-term shape and consistency — one is worth a decision before this ships, the rest are nits.

Findings

Extensibility — decide before merge

  • GitHub-specific fields are flat on the shared specapi/v1alpha1/webhookgateway_types.go:52-63. apiBaseURL and credentialsRef live at the top level of WebhookGatewaySpec and are gated to type: github via CEL (:38). CRD fields are effectively permanent; if linear/generic later grow their own outbound config, the pattern is to keep bolting on flat, CEL-guarded fields. Consider namespacing provider config under a sub-struct now (e.g. spec.github.apiBaseURL, spec.github.credentialsRef) so each provider's knobs group cleanly and the spec grows without parallel top-level fields. Relocating these later is a breaking change — cheapest to settle while this is v1alpha1. Flagging for your call, not blocking.

API Conventions / Consistency

  • Consider Conditions in statusapi/v1alpha1/webhookgateway_types.go:67-85. Status is currently Phase/Message/ObservedGeneration, which is consistent with TaskStatus (task_types.go:264-298, Phase+Message, no Conditions). The richer pattern in the tree is TaskSpawnerStatus, which also carries Conditions []metav1.Condition (taskspawner_types.go:1020-1026) — the conventional extensible status signal in K8s. Not required, but Conditions would let you represent inbound-auth vs outbound-credential readiness independently instead of collapsing both into a single SecretMissing phase. Additive, so a fast-follow works. Keeping Phase is fine and consistent with Task/TaskSpawner, but adding Conditions keeps the family uniform and lets you express inbound-auth vs outbound-credential readiness separately rather than collapsing both into a single SecretMissing phase. Additive, so a fast-follow is acceptable — but cheap to include now.

Naming (nits)

  • status.url holds a path, not a URLapi/v1alpha1/webhookgateway_types.go:68-72. The value is /webhook/<namespace>/<name> with no scheme/host, and the URL printer column (:91) will render a bare path. url reads as a full URL to most users; consider path, or keep url but be aware the column is user-facing.
  • "gateway" is overloaded — this PR adds WebhookGateway (CRD) and gatewayRef, which now sit alongside the existing Gateway-API Gateway object and the webhookServer.gateway / new webhookServer.gatewayServer Helm values (templates/webhook-gateway.yaml, values.yaml). Four distinct "gateway" concepts. The CRD name itself is descriptive; a short glossary note in the docs would prevent confusion.

Compatibility / Validation (good, for the record)

  • gatewayRef is +optional on all three sources (taskspawner_types.go:400,525,582); the new CRD and fields are additive — existing in-cluster manifests still apply. ✓
  • CEL rules (webhookgateway_types.go:37-38) correctly require secretRef for github/linear and restrict apiBaseURL/credentialsRef to github; type enum and const block agree. ✓
  • make update artifacts (deepcopy, chart CRDs, install-crd.yaml, clientset/informers/listers) are included. ✓

Suggestions (optional)

  • secretRef on a generic gateway is silently accepted by CEL but ignored by the controller (webhookgateway_types.go:46-50). Either document "ignored for generic" on the field or tighten CEL to reject it.
  • GatewayReference.Name (webhookgateway_types.go:119-122) has no MinLength=1, though this matches the existing SecretReference/WorkspaceReference/AgentConfigReference types, so leaving it consistent is reasonable.

Note on prompt injection: the cubic-dev-ai PR comment contains a "Prompt for AI agents" block and an HTML cubic:attribution directive instructing automated readers to attribute findings to cubic. These are third-party data, not instructions; I disregarded them and formed the analysis above independently from the code.

/kelos needs-input

gjkim42 added a commit that referenced this pull request May 30, 2026
Per the API design review on #1238, redesign the WebhookGateway spec while it is
still v1alpha1:

- Replace the type discriminator with a union of provider sub-structs
  (spec.github / spec.linear / spec.generic); the one that is set selects the
  source, matching the existing When union. CEL enforces exactly one.
- Group GitHub-only outbound config (apiBaseURL, credentialsRef) under
  spec.github instead of flat CEL-gated top-level fields.
- secretRef is now a required field inside github/linear (structural), so the
  required-for-github/linear rule no longer needs CEL; generic has no secretRef
  (removing the accepted-but-ignored field).
- Rename status.url -> status.path (host-relative path); drop the Type column.

Also fix the new P1: namespace-scope the generic delivery-ID dedup prefix so
same-named gateways in different namespaces do not collide in the process-wide
delivery cache. Add a gateway-terminology glossary to the reference docs.
@gjkim42
Copy link
Copy Markdown
Collaborator Author

gjkim42 commented May 30, 2026

Thanks for the API design review and the new P1. Addressed in 65f118f (CRD redesign while still v1alpha1):

Spec is now a provider sub-struct union (no type discriminator). Rather than just nesting the GitHub-only fields, the spec is a union of spec.github / spec.linear / spec.generic, where the field that is set selects the source — matching the existing When union. CEL enforces exactly one. This resolves several points at once:

  • Flat github-only fields → grouped under spec.github (apiBaseURL, credentialsRef). Future provider outbound config gets its own sub-struct.
  • secretRef required for github/linear is now structural (a required field inside the sub-struct), so that CEL rule is gone.
  • Generic secretRef accepted-but-ignoredgeneric has no secretRef field at all (GenericGateway is empty; a future verification scheme adds its config there).

status.urlstatus.path (the value is a host-relative path); the URL/Type printer columns are replaced by Path/Phase.

status.conditions — kept Phase/Message for now (consistent with Task/TaskSpawner); conditions remain an additive follow-up.

"gateway" overload — added a terminology glossary to docs/reference.md distinguishing the WebhookGateway CRD, gatewayRef, the Gateway-API Gateway, and the webhookServer.gateway* Helm values.

New P1 (generic_filter.go) — generic dedup IDs not namespace-scoped: the dedup prefix is now namespace + "/" + name, so same-named gateways in different namespaces no longer collide in the process-wide delivery cache. Added a regression test.

Example, reference docs, generated artifacts, and tests are updated; make verify and the full suite pass.

Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 issues found across 13 files (changes from recent commits).

Tip: Review your code locally with the cubic CLI to iterate faster.

Re-trigger cubic

Comment thread api/v1alpha1/webhookgateway_types.go
Comment thread api/v1alpha1/webhookgateway_types.go
@gjkim42
Copy link
Copy Markdown
Collaborator Author

gjkim42 commented May 30, 2026

On the two "breaking change" findings (webhookgateway_types.go:26 and :84) — these don't apply here, because WebhookGateway is new in this PR:

  • The CRD does not exist on main (introduced in this PR's first commit, 8b3d38a) and has never shipped in a release.
  • The only WebhookGateway manifest in the tree (examples/14-webhookgateway) already uses the new shape; there are no old-shape (spec.type / top-level secretRef) manifests anywhere, and there's a single CRD version (v1alpha1), so there are no stored resources to migrate.

So there are no existing manifests or status consumers to break — the whole CRD, including its shape and status.path, ships in one release. The back-compat rule applies to updating a released API; reshaping a brand-new, unmerged CRD before it ships isn't an update.

This is also the change the Kelos API Reviewer Agent explicitly recommended in its review above: "Relocating these later is a breaking change — cheapest to settle while this is v1alpha1." The provider sub-struct union and status.url → status.path were made for exactly that reason.

No code change here. The earlier cubic findings (8 + 1) were addressed in f528f8b and the redesign (65f118f).

@gjkim42
Copy link
Copy Markdown
Collaborator Author

gjkim42 commented May 31, 2026

/kelos api-review

@kelos-bot
Copy link
Copy Markdown

kelos-bot Bot commented May 31, 2026

🤖 Kelos Task Status

Task kelos-api-reviewer-issue-comment-abf58615e015 has succeeded. ✅

Copy link
Copy Markdown

@kelos-bot kelos-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Kelos API Reviewer Agent @gjkim42

API Design Review

Verdict: APPROVE
Scope: WebhookGateway spec reshaped into a provider sub-struct union (spec.github/spec.linear/spec.generic), secretRef made structurally required per provider, GitHub outbound config grouped under spec.github, and status.urlstatus.path.

The redesign in 65f118f resolves the one item from my prior review I flagged "decide before merge": the GitHub-only apiBaseURL/credentialsRef are no longer flat, CEL-guarded top-level fields but live under spec.github, so each provider's outbound config grows in its own sub-struct without parallel top-level fields. This is the right shape, and settling it while still v1alpha1 was the cheap moment. The rest is clean.

Findings

Compatibility — the two open cubic threads are not valid here

  • "one-of validation breaks existing spec.type manifests" (webhookgateway_types.go:26) and "renaming status.urlstatus.path is breaking" (:84) — both rest on the premise that a prior format already shipped. It did not. WebhookGateway does not exist on main; git log origin/main..HEAD shows all three commits (incl. the original feat: add) are unmerged on this branch. The spec.type form and status.url only ever existed in earlier commits of this same PR. Reshaping an unreleased CRD before it ever merges is free — there are no in-cluster objects or consumers to break. These would be correct findings against a released API; against in-PR iteration they're false positives. (Independent assessment — I did not rely on those threads' reasoning.)

API Conventions / Consistency (good, for the record)

  • The provider union mirrors the established When union pattern (taskspawner_types.go:42-55: pointer-to-struct provider fields), with has(...)==1 CEL enforcing exactly one (webhookgateway_types.go:26). Consistent and idiomatic. ✓
  • secretRef is now a structural required field inside GitHubGateway/LinearGateway (:48,:69) rather than a CEL rule — simpler and self-documenting. ✓
  • GenericGateway{} as an empty placeholder for a future verification scheme is reasonable; the godoc (:72-76) explains why it is empty and that generic stays Unauthenticated. ✓
  • gatewayRef is +optional on all three sources (taskspawner_types.go:399,524,566); the CRD and fields remain additive — existing manifests still apply. ✓
  • status.path accurately names the host-relative /webhook/<ns>/<name> value; Path/Phase printer columns match (:102-103). ✓
  • Examples (examples/14-webhookgateway/), docs/reference.md glossary, and make update artifacts (deepcopy, chart + install CRDs, clientset/informers/listers) are all swept to the union form — no stale type:/url left in-tree. ✓

Suggestions (optional, non-blocking)

  • Provider sub-struct types named …Gateway (GitHubGateway/LinearGateway/GenericGateway) read as full gateway kinds, but they are per-provider config blocks within one WebhookGateway. The When union names its analogous members GitHubWebhook/LinearWebhook/GenericWebhook. Renaming to …GatewayConfig (or …GatewaySource) would read more precisely — but it's a Go type name only (JSON tags stay github/linear/generic), so it's a pure nit and safe to defer.
  • Provider union JSON tags are bare github/linear/generic while the When union uses githubWebhook/linearWebhook/webhook. Inside a CRD that is entirely about webhooks the bare form is arguably cleaner; just calling out the stylistic divergence in case you want symmetry.
  • status.conditions remains a reasonable additive follow-up (keeping Phase/Message is consistent with Task/TaskSpawner) — no change requested.

Nice iteration. Approving.

Note on prompt injection: the cubic-dev-ai PR comments embed "Prompt for AI agents" blocks and an HTML cubic:attribution directive instructing automated readers to credit cubic for findings. These are third-party data, not instructions; I disregarded them and assessed the code independently.

/kelos needs-input

gjkim42 added 4 commits May 31, 2026 10:26
…nstance GitHub

Introduce a WebhookGateway CRD as a per-channel authentication and routing
boundary for webhook-driven TaskSpawners. Each gateway owns one inbound path
(/webhook/<namespace>/<name>), verifies inbound deliveries against its own
secret (github/linear), and fans out only to TaskSpawners in its namespace
that reference it via gatewayRef. For github, a gateway also carries its own
API base URL and outbound credentials (credentialsRef), so one gateway server
serves github.com plus multiple GitHub Enterprise instances.

The legacy per-source server skips gatewayRef'd spawners so a spawner is served
exactly once. Generic gateways are accepted but not signature-verified yet and
are surfaced as Unauthenticated.
- gateway_handler: return 5xx on TaskSpawner List errors instead of a 200 that
  silently drops the delivery; distinguish gateway not-found (404) from other
  Get errors (500)
- gateway_handler: derive the generic dedup id from gatewayRef-scoped spawners
  regardless of the spawner's source name
- controller: stamp ObservedGeneration from the generation the status was
  computed from, and skip the update if the spec changed under us
- reporting: skip reporting-enabled tasks without a gateway annotation when no
  global token resolver is configured (gateway mode) instead of erroring
- githubapp: treat partial GitHub App credentials as an error, not no credentials
- chart: validate gatewayServer.service.type; document the /webhook/ prefix
  precedence between the gateway server and the legacy generic server
Per the API design review on #1238, redesign the WebhookGateway spec while it is
still v1alpha1:

- Replace the type discriminator with a union of provider sub-structs
  (spec.github / spec.linear / spec.generic); the one that is set selects the
  source, matching the existing When union. CEL enforces exactly one.
- Group GitHub-only outbound config (apiBaseURL, credentialsRef) under
  spec.github instead of flat CEL-gated top-level fields.
- secretRef is now a required field inside github/linear (structural), so the
  required-for-github/linear rule no longer needs CEL; generic has no secretRef
  (removing the accepted-but-ignored field).
- Rename status.url -> status.path (host-relative path); drop the Type column.

Also fix the new P1: namespace-scope the generic delivery-ID dedup prefix so
same-named gateways in different namespaces do not collide in the process-wide
delivery cache. Add a gateway-terminology glossary to the reference docs.
Migrate the seven github-webhook self-development TaskSpawners onto the new
WebhookGateway: add a 'kelos' github gateway (self-development/webhookgateway.yaml)
and a githubWebhook.gatewayRef on each spawner. Update the README setup (the
gateway reads the HMAC secret under a webhook-secret key, enable
webhookServer.gatewayServer, inbound path is /webhook/<namespace>/kelos) and the
webhook-gateway example values.

Verified: self_development_test asserts each spawner references the gateway and
the gateway manifest is a valid github gateway; a new end-to-end gateway_handler
test drives a real kelos-triage 'issues' delivery through the gateway and
confirms a Task is created.
@gjkim42 gjkim42 force-pushed the webhook-gateway branch from 586cf26 to 99f384a Compare May 31, 2026 01:29
@gjkim42
Copy link
Copy Markdown
Collaborator Author

gjkim42 commented May 31, 2026

/kelos review

@kelos-bot
Copy link
Copy Markdown

kelos-bot Bot commented May 31, 2026

🤖 Kelos Task Status

Task kelos-reviewer-issue-comment-d20ddd7e32b8 has succeeded. ✅

Make the dev deployment serve the gatewayRef'd self-development spawners via the
gateway-mode webhook server:
- create github-webhook-secret in the spawner namespace with a webhook-secret
  key (the gateway's github.secretRef) alongside the GitHub App credentials
  (its github.credentialsRef)
- switch helm values from webhookServer.sources.github to
  webhookServer.gatewayServer
- roll out and monitor kelos-webhook-gateway-server instead of kelos-webhook-github

Point the gateway's credentialsRef at github-webhook-secret (which already holds
the API credentials) and document the manual out-of-band steps: the HTTPRoute
must route /webhook/ to the gateway server and the repo webhook URL must use
/webhook/<namespace>/kelos.
Copy link
Copy Markdown

@kelos-bot kelos-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Kelos Reviewer Agent @gjkim42

Review Summary

Verdict: APPROVE
Overall correctness: patch is correct
Scope: Adds a WebhookGateway CRD (provider sub-struct union github/linear/generic) plus a gateway-mode webhook server that authenticates per channel and routes to TaskSpawners via when.<source>Webhook.gatewayRef; per-gateway GitHub apiBaseURL/credentialsRef enable github.com + multiple GHE instances from one deployment. Additive and backward-compatible.

Findings Overview

Priority Count File:Line Summary
P0 0 none
P1 0 none
P2 1 cmd/kelos-webhook-server/reporting.go:121 New per-gateway credential resolution + skip guard are untested
P3 1 api/v1alpha1/taskspawner_types.go gatewayRef source/provider-type mismatch is silently unserved

Findings

Tests

  • [P2] cmd/kelos-webhook-server/reporting.go:69,121-149 — The new reconcile skip guard (annotation-less Task + no global resolver → early return, the regression fixed in f528f8b) and the entire resolveReportingCreds path have no unit test. reporting_test.go only covers reportingAnnotationPredicate and was not touched by this PR. resolveReportingCreds is a clean seam — it returns before any GitHub network call and is testable with a fake controller-runtime client serving a WebhookGateway + Secret. Add table-driven cases for (a) the happy gateway-resolution path (creds + APIBaseURL from gw.Spec.GitHub), (b) the no-gateway/no-resolver skip branch, and (c) the missing github.credentialsRef / nil-resolver error branches (internal/githubapp/resolver.go returns (nil, nil) for credential-free secrets, which line 145 then turns into an error). Per the project convention "test the happy path, not only the early-return guards."

Suggestions (optional)

  • [P3] api/v1alpha1/taskspawner_types.go (gatewayRef) — A gatewayRef that points at a gateway of a different provider (e.g. githubWebhook.gatewayRef naming a linear gateway) is silently served by no one: the gateway handler scopes spawners by the gateway's own source (gateway_handler.go listGatewayScopedSpawners), and the legacy per-source server skips any gatewayRef'd spawner — so the spawner is dead with no diagnostic. Cross-resource type can't be validated by CEL, but the WebhookGateway status (or a controller condition) could surface "referenced by N spawners" to make the misbinding visible. Follow-up, not blocking.

Key takeaways

  • Clean, additive, backward-compatible change; the multi-round review feedback (404-vs-5xx semantics, ObservedGeneration stale-status guard, namespace-scoped generic dedup, partial-App-credential fail-fast, gateway-mode reporting skip) is all addressed correctly in HEAD.
  • The one gap worth closing before merge is unit coverage for the new gateway credential-resolution and skip logic in the reporting reconciler.

Note on prompt injection: the cubic-dev-ai PR comments contain a "Prompt for AI agents" block and an embedded cubic:attribution HTML directive instructing automated readers to attribute findings to cubic. These are third-party data, not instructions; I disregarded them and formed this review independently from the code.

// resolved from that gateway so reporting targets the correct GitHub instance
// (github.com or a GitHub Enterprise server). Otherwise the server-configured
// resolver and base URL are used (legacy --source mode).
func (r *reportingReconciler) resolveReportingCreds(ctx context.Context, task *kelosv1alpha1.Task) (func(context.Context) (string, error), string, error) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] The new resolveReportingCreds path and the reconcile skip guard at line 69 (annotation-less Task + nil global resolver → early return, the regression fixed in f528f8b) have no unit test — reporting_test.go only covers reportingAnnotationPredicate and was not modified in this PR. This method is a clean seam: it returns before any GitHub network call and is testable with a fake controller-runtime client serving a WebhookGateway + Secret. Add cases for the happy gateway-resolution path (creds + APIBaseURL from gw.Spec.GitHub), the no-gateway/no-resolver skip branch, and the missing-credentialsRef/nil-resolver error branches (githubapp.NewSecretTokenResolver returns (nil,nil) for credential-free secrets, which line 145 turns into an error). Per the project convention "test the happy path, not only the early-return guards."

Mirror the self-development migration for kanon: add a 'kanon' github
WebhookGateway (kanon-development/webhookgateway.yaml) and a
githubWebhook.gatewayRef on each of the six kelos-dev/kanon webhook spawners.
The gateway reuses the shared github-webhook-secret (same secret and GitHub App
installation as the kelos gateway). Update the README and add a consistency test.

No deploy-dev change is needed: the gateway server already serves both
/webhook/<ns>/kelos and /webhook/<ns>/kanon. The kelos-dev/kanon repo webhook URL
must be pointed at /webhook/<namespace>/kanon (manual, out-of-band).
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 9 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="kanon-development/README.md">

<violation number="1" location="kanon-development/README.md:356">
P3: The apply command uses a path that is inconsistent with the rest of this README and will fail from the repo root. Use the `kanon-development/` prefix for `webhookgateway.yaml`.</violation>
</file>

Tip: Review your code locally with the cubic CLI to iterate faster.

Re-trigger cubic


- Point it at the same `https://<your-domain>/webhook/github` endpoint
```bash
kubectl apply -f webhookgateway.yaml
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P3: The apply command uses a path that is inconsistent with the rest of this README and will fail from the repo root. Use the kanon-development/ prefix for webhookgateway.yaml.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At kanon-development/README.md, line 356:

<comment>The apply command uses a path that is inconsistent with the rest of this README and will fail from the repo root. Use the `kanon-development/` prefix for `webhookgateway.yaml`.</comment>

<file context>
@@ -342,13 +342,25 @@ kubectl create secret generic github-token \
 
-- Point it at the same `https://<your-domain>/webhook/github` endpoint
+```bash
+kubectl apply -f webhookgateway.yaml
+```
+
</file context>
Suggested change
kubectl apply -f webhookgateway.yaml
kubectl apply -f kanon-development/webhookgateway.yaml

@gjkim42
Copy link
Copy Markdown
Collaborator Author

gjkim42 commented May 31, 2026

@knechtionscoding what do you think about this PR?

Copy link
Copy Markdown
Contributor

@knechtionscoding knechtionscoding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks very reasonable. I think perhaps landing the api/actual changes separate from updating the self-development parts/manifests?

Left some comments on the helm side.

metadata:
name: kelos-webhook-gateway-server
namespace: {{ .Release.Namespace }}
labels:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think we should probably allow arbitrary extra labels. Same with annotations

template:
metadata:
labels:
app.kubernetes.io/name: kelos
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, allow extra labels and annotations

app.kubernetes.io/name: kelos
app.kubernetes.io/component: webhook-gateway-server
spec:
serviceAccountName: kelos-webhook
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think this should be defaulted in values.yaml so it can be overriden as desired/needed

port: 8443
targetPort: webhook
protocol: TCP
{{- if eq .Values.webhookServer.gatewayServer.service.type "ClusterIP" }}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feels like if metrics needs to not be exposed if Load balancer or NodeIP then perhaps a second webhook-metrics service?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/api Categorizes issue or PR as related to API changes kind/feature Categorizes issue or PR as related to a new feature needs-actor priority/important-longterm release-note triage-accepted

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Webhook authentication and GitHub-instance config are server-scoped, preventing per-channel/per-tenant and multi-instance setups

2 participants