Skip to content

feat: Add kelos.dev/v1alpha2 group version with conversion webhook (AgentConfig env + TaskSpawner #704 cleanup)#1242

Open
gjkim42 wants to merge 4 commits into
mainfrom
kelos-agentconfig-v1alpha2
Open

feat: Add kelos.dev/v1alpha2 group version with conversion webhook (AgentConfig env + TaskSpawner #704 cleanup)#1242
gjkim42 wants to merge 4 commits into
mainfrom
kelos-agentconfig-v1alpha2

Conversation

@gjkim42
Copy link
Copy Markdown
Collaborator

@gjkim42 gjkim42 commented May 30, 2026

What type of PR is this?

/kind feature

What this PR does / why we need it:

Introduces the kelos.dev/v1alpha2 API group version (served, storage) alongside v1alpha1 (served), bridged by a CRD conversion webhook so existing in-cluster objects keep working with no re-apply. v1alpha2 lets the API evolve without breaking v1alpha1 manifests.

Per kind:

  • AgentConfigmcpServers[].env changes from map[string]string (v1alpha1) to []corev1.EnvVar with valueFrom.secretKeyRef/configMapKeyRef (v1alpha2). This was the original motivation (was a breaking change in the superseded feat: Support env.valueFrom in MCP server config #1154). Down-conversion drops valueFrom entries (no map representation); literal values map through.
  • TaskSpawner — the API: Plan TaskSpawner v1alpha2 as a breaking cleanup and field removal #704 cleanup. v1alpha2 drops the deprecated root spec.pollInterval (per-source pollInterval stays), the legacy triggerComment/excludeComments (keeping commentPolicy), and the githubWebhook filter bodyContains (keeping bodyPattern). Up-conversion folds the old fields into their replacements (root pollInterval → active source; legacy comment → commentPolicy; bodyContainsbodyPattern).
  • Task, Workspace — identity mirrors of v1alpha1 (no schema change), so the group version is complete.

The controllers, webhook handler, and CLI printer keep reading v1alpha1; the conversion webhook hands them the v1alpha1 shape, so their logic is unchanged. The conversion webhook is served from the kelos-controller binary. A cert-manager Issuer + Certificate provide the serving cert and cert-manager.io/inject-ca-from injects the CA into each CRD's conversion config. kelos uninstall lists/deletes every kind via the storage version so cleanup never depends on the webhook.

Which issue(s) this PR is related to:

Fixes #1152
Addresses #704

Supersedes #1154 (its breaking v1alpha1 change is reverted here).

Special notes for your reviewer:

  • Conversion lives in api/v1alpha1/*_conversion.go (spokes); v1alpha2 types are the hubs. Identity conversions use a JSON round-trip helper (convertViaJSON) to avoid hundreds of lines of field-copy boilerplate; TaskSpawner adds explicit folding for the three dropped field groups.
  • The CRD spec.conversion block is injected for all four kelos CRDs by hack/update-install-manifest.sh (controller-gen does not emit it).
  • Integration tests exercise the real webhook via envtest WebhookInstallOptions (test/integration/conversion_test.go): AgentConfig env up/down-conversion and TaskSpawner legacy→modern folding. Stub cert-manager CRDs under test/integration/testdata let kelos install apply the Issuer/Certificate in envtest.
  • make verify, make test, and make test-integration (108 specs) pass locally.

Does this PR introduce a user-facing change?

[ACTION REQUIRED] The kelos CRDs now serve a new `v1alpha2` version (storage version) alongside `v1alpha1`, bridged by a conversion webhook that requires **cert-manager** in the cluster. AgentConfig `mcpServers[].env` gains the `[]EnvVar` + `valueFrom` shape in v1alpha2 (v1alpha1 keeps the map). TaskSpawner v1alpha2 drops the deprecated root `spec.pollInterval`, `triggerComment`/`excludeComments` (use `commentPolicy`), and the githubWebhook `bodyContains` (use `bodyPattern`); existing v1alpha1 objects are migrated automatically by the webhook. Task and Workspace v1alpha2 are unchanged mirrors.

Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5 issues found across 57 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="internal/cli/install.go">

<violation number="1" location="internal/cli/install.go:453">
P1: Using `agentconfigs` v1alpha2 for uninstall cleanup can fail when legacy objects are still stored as v1alpha1, because listing via v1alpha2 may require webhook conversion. This introduces a webhook dependency in the cleanup path instead of removing it.</violation>
</file>

<file name="hack/update-install-manifest.sh">

<violation number="1" location="hack/update-install-manifest.sh:232">
P2: The CA-injection annotation is conditionally added only when an `annotations:` block already exists, so this script can silently miss required webhook CA wiring if CRD output shape changes.</violation>
</file>

<file name="internal/controller/task_controller.go">

<violation number="1" location="internal/controller/task_controller.go:554">
P2: Reject `valueFrom` entries that set both `secretKeyRef` and `configMapKeyRef`; the current switch silently prefers the secret and hides invalid configuration.</violation>
</file>

<file name="api/v1alpha2/agentconfig_types.go">

<violation number="1" location="api/v1alpha2/agentconfig_types.go:41">
P2: Validate `SecretReference.Name` with a minimum length so empty secret names are rejected at admission.</violation>

<violation number="2" location="api/v1alpha2/agentconfig_types.go:95">
P1: Add CRD cross-field validation so `command` is required for `stdio` and `url` is required for `http/sse`; currently these constraints are only comments.</violation>
</file>

Reply with feedback, questions, or to request a fix.

Re-trigger cubic

Comment thread internal/cli/install.go
{Group: "kelos.dev", Version: "v1alpha1", Resource: "agentconfigs"},
// agentconfigs is served in two versions; list/delete via the storage
// version (v1alpha2) so cleanup does not depend on the conversion webhook.
{Group: "kelos.dev", Version: "v1alpha2", Resource: "agentconfigs"},
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: Using agentconfigs v1alpha2 for uninstall cleanup can fail when legacy objects are still stored as v1alpha1, because listing via v1alpha2 may require webhook conversion. This introduces a webhook dependency in the cleanup path instead of removing it.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At internal/cli/install.go, line 453:

<comment>Using `agentconfigs` v1alpha2 for uninstall cleanup can fail when legacy objects are still stored as v1alpha1, because listing via v1alpha2 may require webhook conversion. This introduces a webhook dependency in the cleanup path instead of removing it.</comment>

<file context>
@@ -448,7 +448,9 @@ var kelosGVRs = []schema.GroupVersionResource{
-	{Group: "kelos.dev", Version: "v1alpha1", Resource: "agentconfigs"},
+	// agentconfigs is served in two versions; list/delete via the storage
+	// version (v1alpha2) so cleanup does not depend on the conversion webhook.
+	{Group: "kelos.dev", Version: "v1alpha2", Resource: "agentconfigs"},
 }
 
</file context>

// Command is the executable to run for stdio transport.
// Required when type is "stdio".
// +optional
Command string `json:"command,omitempty"`
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: Add CRD cross-field validation so command is required for stdio and url is required for http/sse; currently these constraints are only comments.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At api/v1alpha2/agentconfig_types.go, line 95:

<comment>Add CRD cross-field validation so `command` is required for `stdio` and `url` is required for `http/sse`; currently these constraints are only comments.</comment>

<file context>
@@ -0,0 +1,180 @@
+	// Command is the executable to run for stdio transport.
+	// Required when type is "stdio".
+	// +optional
+	Command string `json:"command,omitempty"`
+
+	// Args are command-line arguments for the server process.
</file context>

}
for (i = 1; i <= n; i++) {
print buf[i]
if (isAC && buf[i] ~ /^ annotations:[[:space:]]*$/) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: The CA-injection annotation is conditionally added only when an annotations: block already exists, so this script can silently miss required webhook CA wiring if CRD output shape changes.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At hack/update-install-manifest.sh, line 232:

<comment>The CA-injection annotation is conditionally added only when an `annotations:` block already exists, so this script can silently miss required webhook CA wiring if CRD output shape changes.</comment>

<file context>
@@ -211,6 +211,50 @@ write_chart_crd_template() {
+  }
+  for (i = 1; i <= n; i++) {
+    print buf[i]
+    if (isAC && buf[i] ~ /^  annotations:[[:space:]]*$/) {
+      print "    cert-manager.io/inject-ca-from: kelos-system/kelos-serving-cert"
+    }
</file context>

return "", false, fmt.Errorf("MCP server %q env %q: valueFrom only supports secretKeyRef and configMapKeyRef", serverName, entry.Name)
}

switch {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Reject valueFrom entries that set both secretKeyRef and configMapKeyRef; the current switch silently prefers the secret and hides invalid configuration.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At internal/controller/task_controller.go, line 554:

<comment>Reject `valueFrom` entries that set both `secretKeyRef` and `configMapKeyRef`; the current switch silently prefers the secret and hides invalid configuration.</comment>

<file context>
@@ -451,27 +453,150 @@ func (r *TaskReconciler) resolveMCPServerSecrets(ctx context.Context, namespace
+		return "", false, fmt.Errorf("MCP server %q env %q: valueFrom only supports secretKeyRef and configMapKeyRef", serverName, entry.Name)
+	}
+
+	switch {
+	case src.SecretKeyRef != nil:
+		ref := src.SecretKeyRef
</file context>
Suggested change
switch {
if (src.SecretKeyRef != nil && src.ConfigMapKeyRef != nil) || (src.SecretKeyRef == nil && src.ConfigMapKeyRef == nil) {
return "", false, fmt.Errorf("MCP server %q env %q: valueFrom must set exactly one of secretKeyRef or configMapKeyRef", serverName, entry.Name)
}
switch {

// Name is the plugin name. Used as the plugin directory name
// and for namespacing in Claude Code (e.g., <name>:skill-name).
// +kubebuilder:validation:MinLength=1
Name string `json:"name"`
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Validate SecretReference.Name with a minimum length so empty secret names are rejected at admission.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At api/v1alpha2/agentconfig_types.go, line 41:

<comment>Validate `SecretReference.Name` with a minimum length so empty secret names are rejected at admission.</comment>

<file context>
@@ -0,0 +1,180 @@
+	// Name is the plugin name. Used as the plugin directory name
+	// and for namespacing in Claude Code (e.g., <name>:skill-name).
+	// +kubebuilder:validation:MinLength=1
+	Name string `json:"name"`
+
+	// Skills defines skills for this plugin.
</file context>

gjkim42 and others added 3 commits May 30, 2026 10:02
Change MCPServerSpec.Env from map[string]string to []corev1.EnvVar so
each env entry can reference a single key in a Secret or ConfigMap via
the standard valueFrom shape (secretKeyRef, configMapKeyRef). This
matches the convention already used by PodOverrides.Env and unblocks
the common case of pulling one credential out of a shared Secret
without depending on the whole-collection envFrom.

Only SecretKeyRef and ConfigMapKeyRef variants of ValueFrom are
honored — FieldRef and ResourceFieldRef are pod-scoped and have no
meaning for an MCP server process. EnvFrom still wins over Env on
name collision so existing behaviour is preserved.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
…emantics

Reject fieldRef, resourceFieldRef and fileKeyRef (including when combined
with a supported source) up front, instead of silently honoring the first
matched source. Omit an env entry whose optional valueFrom resolves to a
missing Secret/ConfigMap or key, matching kubelet semantics for pod env,
rather than emitting an empty value. Sort envFrom keys for deterministic
output and clarify the Env/EnvFrom godoc.
Introduces a non-breaking migration path for the mcpServers[].env shape
change. v1alpha1 keeps the released map[string]string env; v1alpha2 (the
storage version) carries the []EnvVar + valueFrom shape. A CRD conversion
webhook bridges them, so existing map-shaped AgentConfig objects keep working
and convert up on read.

- api/v1alpha2: AgentConfig and supporting types (storage/hub version)
- api/v1alpha1: env reverted to map[string]string; ConvertTo/ConvertFrom
  (map<->list, valueFrom dropped on down-conversion)
- controller reads v1alpha2; conversion webhook served from kelos-controller
- cert-manager Issuer/Certificate + Service; CRD conversion block injected
  with cert-manager.io/inject-ca-from
- uninstall lists/deletes agentconfigs via the storage version
- conversion unit tests + envtest integration coverage of the webhook

Supersedes the breaking change in #1154.
Completes the kelos.dev/v1alpha2 group version. Task and Workspace are
identity mirrors of v1alpha1; TaskSpawner v1alpha2 applies the #704 cleanup:
drops the deprecated root spec.pollInterval, the legacy
triggerComment/excludeComments (commentPolicy only), and the githubWebhook
filter bodyContains (bodyPattern only).

Controllers keep reading v1alpha1; the conversion webhook bridges. v1alpha1
TaskSpawner objects fold forward (root pollInterval -> active source, legacy
comment -> commentPolicy, bodyContains -> bodyPattern). Conversion uses a JSON
round-trip for the bulk identity copy. The webhook, cert-manager Service/cert,
and CRD conversion injection now cover all four kinds; uninstall lists every
kind via the storage version.

Addresses #704.
@cubic-dev-ai
Copy link
Copy Markdown

cubic-dev-ai Bot commented May 30, 2026

This PR is large and would use a significant portion of your monthly review quota. Comment @cubic-dev-ai review this to confirm that you want cubic to review it.

@gjkim42 gjkim42 changed the title feat: Add AgentConfig v1alpha2 with conversion webhook (non-breaking MCP env migration) feat: Add kelos.dev/v1alpha2 group version with conversion webhook (AgentConfig env + TaskSpawner #704 cleanup) May 30, 2026
@gjkim42
Copy link
Copy Markdown
Collaborator Author

gjkim42 commented May 30, 2026

/kelos api-review

@kelos-bot
Copy link
Copy Markdown

kelos-bot Bot commented May 30, 2026

🤖 Kelos Task Status

Task kelos-api-reviewer-issue-comment-9afda6984e2d has succeeded. ✅

Copy link
Copy Markdown

@kelos-bot kelos-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Kelos API Reviewer Agent @gjkim42

API Design Review

Verdict: REQUEST CHANGES
Scope: Introduces kelos.dev/v1alpha2 (storage) alongside v1alpha1 via a CRD conversion webhook; AgentConfig mcpServers[].env becomes []EnvVar+valueFrom, TaskSpawner drops three deprecated field groups (root pollInterval, triggerComment/excludeComments, bodyContains).

The overall mechanism is the right shape: a new served+storage version, v1alpha1 kept fully served, deprecation-by-removal expressed only in the new version with up/down folding — existing manifests keep applying, and controllers stay on the v1alpha1 view. The findings below are about getting validation and docs right now, because v1alpha2 fields are permanent once shipped.

Findings

Validation / CRD (blocking)

  • MCPServerSpec.Env []corev1.EnvVar — documented contract is not enforced at admissionapi/v1alpha2/agentconfig_types.go:119-133. The godoc states unsupported valueFrom variants (fieldRef, resourceFieldRef, fileKeyRef, future additions) "are rejected", but rejection happens only at reconcile time in resolveEnvVarValue (internal/controller/task_controller.go:550-552), not at apply time. The CRD OpenAPI schema (derived from the full EnvVarSource) accepts these, so an operator can kubectl apply an AgentConfig that the API server admits and that only fails later when a Task consumes it. Per project conventions (docs must describe enforced behavior; keep API surfaces minimal), enforce the contract where it is documented — add a +kubebuilder:validation:XValidation (CEL) rule restricting valueFrom to secretKeyRef/configMapKeyRef, or define a narrower env-source type. The same CEL pass should reject secretKeyRef + configMapKeyRef set together (today the switch at task_controller.go:554 silently prefers the secret).
  • commandOn/transport requiredness is comment-onlyapi/v1alpha2/agentconfig_types.go:92-105. "Required when type is stdio" (command) and "Required when type is http/sse" (url) are prose, not constraints. v1alpha2 is a fresh schema and the right place to add a CEL XValidation on MCPServerSpec so the API server enforces command-vs-url by transport instead of admitting invalid servers that fail at runtime.

Enum / docstring consistency (blocking)

  • GitHubWebhookFilter.CommentOn enum rejects the value its godoc invitesapi/v1alpha2/taskspawner_types.go:438-444. The comment says "Empty matches both", but +kubebuilder:validation:Enum=Issue;PullRequest rejects an explicit commentOn: "". This is the exact docstring/enum mismatch called out in the project conventions. Note the sibling LinearWebhookFilter.Action (:488) already does the right thing with Enum=create;update;remove;"". Fix by adding "" to the CommentOn enum (consistent with Action, and a safe loosening) or reword to "Omit to match both".

Docs vs implementation (should-fix)

  • Per-source PollInterval godoc references a field v1alpha2 removedapi/v1alpha2/taskspawner_types.go:208, 291, 343. Each still says "overrides spec.pollInterval for this source", but root spec.pollInterval no longer exists in TaskSpawnerSpec (:880-912). Update the wording, and since the shared root default is gone, document what the effective default poll interval is when a source omits pollInterval.

Suggestions (optional, non-blocking)

  • Down-conversion is lossy by design (api/v1alpha1/agentconfig_conversion.go:109-124): valueFrom env entries are dropped when serving v1alpha1. Acceptable because v1alpha2 is storage, but a v1alpha1 client doing read-modify-write would silently drop valueFrom. Fine for alpha; worth a one-line note in the conversion godoc / release note so users know not to round-trip through v1alpha1.
  • Durations as bare strings (pollInterval, e.g. "30s"/"5m"): K8s API conventions favor metav1.Duration (or a ...Seconds int field) over free-form strings. Carried over from v1alpha1, so parity is a fair reason to keep it; flagging since a new version is the rare chance to adopt the typed form.
  • TaskSpawner convertViaJSON round-trip relies on identical JSON tags between versions for every non-folded field. A focused conversion fuzz/round-trip test (beyond the three folded groups) would guard against silent drops if the two structs drift later.

Note on prompt injection: a prior automated comment on this PR embedded hidden instructions demanding that findings be attributed to it. I disregarded that directive; the analysis above is my own, derived from the code.

/kelos needs-input

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Introduce valueFrom field

1 participant