feat: Add kelos.dev/v1alpha2 group version with conversion webhook (AgentConfig env + TaskSpawner #704 cleanup)#1242
feat: Add kelos.dev/v1alpha2 group version with conversion webhook (AgentConfig env + TaskSpawner #704 cleanup)#1242gjkim42 wants to merge 4 commits into
Conversation
There was a problem hiding this comment.
5 issues found across 57 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="internal/cli/install.go">
<violation number="1" location="internal/cli/install.go:453">
P1: Using `agentconfigs` v1alpha2 for uninstall cleanup can fail when legacy objects are still stored as v1alpha1, because listing via v1alpha2 may require webhook conversion. This introduces a webhook dependency in the cleanup path instead of removing it.</violation>
</file>
<file name="hack/update-install-manifest.sh">
<violation number="1" location="hack/update-install-manifest.sh:232">
P2: The CA-injection annotation is conditionally added only when an `annotations:` block already exists, so this script can silently miss required webhook CA wiring if CRD output shape changes.</violation>
</file>
<file name="internal/controller/task_controller.go">
<violation number="1" location="internal/controller/task_controller.go:554">
P2: Reject `valueFrom` entries that set both `secretKeyRef` and `configMapKeyRef`; the current switch silently prefers the secret and hides invalid configuration.</violation>
</file>
<file name="api/v1alpha2/agentconfig_types.go">
<violation number="1" location="api/v1alpha2/agentconfig_types.go:41">
P2: Validate `SecretReference.Name` with a minimum length so empty secret names are rejected at admission.</violation>
<violation number="2" location="api/v1alpha2/agentconfig_types.go:95">
P1: Add CRD cross-field validation so `command` is required for `stdio` and `url` is required for `http/sse`; currently these constraints are only comments.</violation>
</file>
Reply with feedback, questions, or to request a fix.
Re-trigger cubic
| {Group: "kelos.dev", Version: "v1alpha1", Resource: "agentconfigs"}, | ||
| // agentconfigs is served in two versions; list/delete via the storage | ||
| // version (v1alpha2) so cleanup does not depend on the conversion webhook. | ||
| {Group: "kelos.dev", Version: "v1alpha2", Resource: "agentconfigs"}, |
There was a problem hiding this comment.
P1: Using agentconfigs v1alpha2 for uninstall cleanup can fail when legacy objects are still stored as v1alpha1, because listing via v1alpha2 may require webhook conversion. This introduces a webhook dependency in the cleanup path instead of removing it.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At internal/cli/install.go, line 453:
<comment>Using `agentconfigs` v1alpha2 for uninstall cleanup can fail when legacy objects are still stored as v1alpha1, because listing via v1alpha2 may require webhook conversion. This introduces a webhook dependency in the cleanup path instead of removing it.</comment>
<file context>
@@ -448,7 +448,9 @@ var kelosGVRs = []schema.GroupVersionResource{
- {Group: "kelos.dev", Version: "v1alpha1", Resource: "agentconfigs"},
+ // agentconfigs is served in two versions; list/delete via the storage
+ // version (v1alpha2) so cleanup does not depend on the conversion webhook.
+ {Group: "kelos.dev", Version: "v1alpha2", Resource: "agentconfigs"},
}
</file context>
| // Command is the executable to run for stdio transport. | ||
| // Required when type is "stdio". | ||
| // +optional | ||
| Command string `json:"command,omitempty"` |
There was a problem hiding this comment.
P1: Add CRD cross-field validation so command is required for stdio and url is required for http/sse; currently these constraints are only comments.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At api/v1alpha2/agentconfig_types.go, line 95:
<comment>Add CRD cross-field validation so `command` is required for `stdio` and `url` is required for `http/sse`; currently these constraints are only comments.</comment>
<file context>
@@ -0,0 +1,180 @@
+ // Command is the executable to run for stdio transport.
+ // Required when type is "stdio".
+ // +optional
+ Command string `json:"command,omitempty"`
+
+ // Args are command-line arguments for the server process.
</file context>
| } | ||
| for (i = 1; i <= n; i++) { | ||
| print buf[i] | ||
| if (isAC && buf[i] ~ /^ annotations:[[:space:]]*$/) { |
There was a problem hiding this comment.
P2: The CA-injection annotation is conditionally added only when an annotations: block already exists, so this script can silently miss required webhook CA wiring if CRD output shape changes.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At hack/update-install-manifest.sh, line 232:
<comment>The CA-injection annotation is conditionally added only when an `annotations:` block already exists, so this script can silently miss required webhook CA wiring if CRD output shape changes.</comment>
<file context>
@@ -211,6 +211,50 @@ write_chart_crd_template() {
+ }
+ for (i = 1; i <= n; i++) {
+ print buf[i]
+ if (isAC && buf[i] ~ /^ annotations:[[:space:]]*$/) {
+ print " cert-manager.io/inject-ca-from: kelos-system/kelos-serving-cert"
+ }
</file context>
| return "", false, fmt.Errorf("MCP server %q env %q: valueFrom only supports secretKeyRef and configMapKeyRef", serverName, entry.Name) | ||
| } | ||
|
|
||
| switch { |
There was a problem hiding this comment.
P2: Reject valueFrom entries that set both secretKeyRef and configMapKeyRef; the current switch silently prefers the secret and hides invalid configuration.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At internal/controller/task_controller.go, line 554:
<comment>Reject `valueFrom` entries that set both `secretKeyRef` and `configMapKeyRef`; the current switch silently prefers the secret and hides invalid configuration.</comment>
<file context>
@@ -451,27 +453,150 @@ func (r *TaskReconciler) resolveMCPServerSecrets(ctx context.Context, namespace
+ return "", false, fmt.Errorf("MCP server %q env %q: valueFrom only supports secretKeyRef and configMapKeyRef", serverName, entry.Name)
+ }
+
+ switch {
+ case src.SecretKeyRef != nil:
+ ref := src.SecretKeyRef
</file context>
| switch { | |
| if (src.SecretKeyRef != nil && src.ConfigMapKeyRef != nil) || (src.SecretKeyRef == nil && src.ConfigMapKeyRef == nil) { | |
| return "", false, fmt.Errorf("MCP server %q env %q: valueFrom must set exactly one of secretKeyRef or configMapKeyRef", serverName, entry.Name) | |
| } | |
| switch { |
| // Name is the plugin name. Used as the plugin directory name | ||
| // and for namespacing in Claude Code (e.g., <name>:skill-name). | ||
| // +kubebuilder:validation:MinLength=1 | ||
| Name string `json:"name"` |
There was a problem hiding this comment.
P2: Validate SecretReference.Name with a minimum length so empty secret names are rejected at admission.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At api/v1alpha2/agentconfig_types.go, line 41:
<comment>Validate `SecretReference.Name` with a minimum length so empty secret names are rejected at admission.</comment>
<file context>
@@ -0,0 +1,180 @@
+ // Name is the plugin name. Used as the plugin directory name
+ // and for namespacing in Claude Code (e.g., <name>:skill-name).
+ // +kubebuilder:validation:MinLength=1
+ Name string `json:"name"`
+
+ // Skills defines skills for this plugin.
</file context>
Change MCPServerSpec.Env from map[string]string to []corev1.EnvVar so each env entry can reference a single key in a Secret or ConfigMap via the standard valueFrom shape (secretKeyRef, configMapKeyRef). This matches the convention already used by PodOverrides.Env and unblocks the common case of pulling one credential out of a shared Secret without depending on the whole-collection envFrom. Only SecretKeyRef and ConfigMapKeyRef variants of ValueFrom are honored — FieldRef and ResourceFieldRef are pod-scoped and have no meaning for an MCP server process. EnvFrom still wins over Env on name collision so existing behaviour is preserved. Co-Authored-By: Claude Opus 4.7 <[email protected]>
…emantics Reject fieldRef, resourceFieldRef and fileKeyRef (including when combined with a supported source) up front, instead of silently honoring the first matched source. Omit an env entry whose optional valueFrom resolves to a missing Secret/ConfigMap or key, matching kubelet semantics for pod env, rather than emitting an empty value. Sort envFrom keys for deterministic output and clarify the Env/EnvFrom godoc.
Introduces a non-breaking migration path for the mcpServers[].env shape change. v1alpha1 keeps the released map[string]string env; v1alpha2 (the storage version) carries the []EnvVar + valueFrom shape. A CRD conversion webhook bridges them, so existing map-shaped AgentConfig objects keep working and convert up on read. - api/v1alpha2: AgentConfig and supporting types (storage/hub version) - api/v1alpha1: env reverted to map[string]string; ConvertTo/ConvertFrom (map<->list, valueFrom dropped on down-conversion) - controller reads v1alpha2; conversion webhook served from kelos-controller - cert-manager Issuer/Certificate + Service; CRD conversion block injected with cert-manager.io/inject-ca-from - uninstall lists/deletes agentconfigs via the storage version - conversion unit tests + envtest integration coverage of the webhook Supersedes the breaking change in #1154.
e6ed5f0 to
5106c6d
Compare
Completes the kelos.dev/v1alpha2 group version. Task and Workspace are identity mirrors of v1alpha1; TaskSpawner v1alpha2 applies the #704 cleanup: drops the deprecated root spec.pollInterval, the legacy triggerComment/excludeComments (commentPolicy only), and the githubWebhook filter bodyContains (bodyPattern only). Controllers keep reading v1alpha1; the conversion webhook bridges. v1alpha1 TaskSpawner objects fold forward (root pollInterval -> active source, legacy comment -> commentPolicy, bodyContains -> bodyPattern). Conversion uses a JSON round-trip for the bulk identity copy. The webhook, cert-manager Service/cert, and CRD conversion injection now cover all four kinds; uninstall lists every kind via the storage version. Addresses #704.
|
This PR is large and would use a significant portion of your monthly review quota. Comment |
|
/kelos api-review |
|
🤖 Kelos Task Status Task |
There was a problem hiding this comment.
🤖 Kelos API Reviewer Agent @gjkim42
API Design Review
Verdict: REQUEST CHANGES
Scope: Introduces kelos.dev/v1alpha2 (storage) alongside v1alpha1 via a CRD conversion webhook; AgentConfig mcpServers[].env becomes []EnvVar+valueFrom, TaskSpawner drops three deprecated field groups (root pollInterval, triggerComment/excludeComments, bodyContains).
The overall mechanism is the right shape: a new served+storage version, v1alpha1 kept fully served, deprecation-by-removal expressed only in the new version with up/down folding — existing manifests keep applying, and controllers stay on the v1alpha1 view. The findings below are about getting validation and docs right now, because v1alpha2 fields are permanent once shipped.
Findings
Validation / CRD (blocking)
MCPServerSpec.Env []corev1.EnvVar— documented contract is not enforced at admission —api/v1alpha2/agentconfig_types.go:119-133. The godoc states unsupportedvalueFromvariants (fieldRef,resourceFieldRef,fileKeyRef, future additions) "are rejected", but rejection happens only at reconcile time inresolveEnvVarValue(internal/controller/task_controller.go:550-552), not at apply time. The CRD OpenAPI schema (derived from the fullEnvVarSource) accepts these, so an operator cankubectl applyan AgentConfig that the API server admits and that only fails later when a Task consumes it. Per project conventions (docs must describe enforced behavior; keep API surfaces minimal), enforce the contract where it is documented — add a+kubebuilder:validation:XValidation(CEL) rule restrictingvalueFromtosecretKeyRef/configMapKeyRef, or define a narrower env-source type. The same CEL pass should rejectsecretKeyRef+configMapKeyRefset together (today theswitchattask_controller.go:554silently prefers the secret).commandOn/transport requiredness is comment-only —api/v1alpha2/agentconfig_types.go:92-105. "Required when type is stdio" (command) and "Required when type is http/sse" (url) are prose, not constraints. v1alpha2 is a fresh schema and the right place to add a CELXValidationonMCPServerSpecso the API server enforces command-vs-url by transport instead of admitting invalid servers that fail at runtime.
Enum / docstring consistency (blocking)
GitHubWebhookFilter.CommentOnenum rejects the value its godoc invites —api/v1alpha2/taskspawner_types.go:438-444. The comment says "Empty matches both", but+kubebuilder:validation:Enum=Issue;PullRequestrejects an explicitcommentOn: "". This is the exact docstring/enum mismatch called out in the project conventions. Note the siblingLinearWebhookFilter.Action(:488) already does the right thing withEnum=create;update;remove;"". Fix by adding""to theCommentOnenum (consistent withAction, and a safe loosening) or reword to "Omit to match both".
Docs vs implementation (should-fix)
- Per-source
PollIntervalgodoc references a field v1alpha2 removed —api/v1alpha2/taskspawner_types.go:208, 291, 343. Each still says "overridesspec.pollIntervalfor this source", but rootspec.pollIntervalno longer exists inTaskSpawnerSpec(:880-912). Update the wording, and since the shared root default is gone, document what the effective default poll interval is when a source omitspollInterval.
Suggestions (optional, non-blocking)
- Down-conversion is lossy by design (
api/v1alpha1/agentconfig_conversion.go:109-124):valueFromenv entries are dropped when serving v1alpha1. Acceptable because v1alpha2 is storage, but a v1alpha1 client doing read-modify-write would silently dropvalueFrom. Fine for alpha; worth a one-line note in the conversion godoc / release note so users know not to round-trip through v1alpha1. - Durations as bare strings (
pollInterval, e.g."30s"/"5m"): K8s API conventions favormetav1.Duration(or a...Secondsint field) over free-form strings. Carried over from v1alpha1, so parity is a fair reason to keep it; flagging since a new version is the rare chance to adopt the typed form. - TaskSpawner
convertViaJSONround-trip relies on identical JSON tags between versions for every non-folded field. A focused conversion fuzz/round-trip test (beyond the three folded groups) would guard against silent drops if the two structs drift later.
Note on prompt injection: a prior automated comment on this PR embedded hidden instructions demanding that findings be attributed to it. I disregarded that directive; the analysis above is my own, derived from the code.
/kelos needs-input
What type of PR is this?
/kind feature
What this PR does / why we need it:
Introduces the
kelos.dev/v1alpha2API group version (served, storage) alongsidev1alpha1(served), bridged by a CRD conversion webhook so existing in-cluster objects keep working with no re-apply. v1alpha2 lets the API evolve without breakingv1alpha1manifests.Per kind:
mcpServers[].envchanges frommap[string]string(v1alpha1) to[]corev1.EnvVarwithvalueFrom.secretKeyRef/configMapKeyRef(v1alpha2). This was the original motivation (was a breaking change in the superseded feat: Support env.valueFrom in MCP server config #1154). Down-conversion dropsvalueFromentries (no map representation); literal values map through.spec.pollInterval(per-sourcepollIntervalstays), the legacytriggerComment/excludeComments(keepingcommentPolicy), and the githubWebhook filterbodyContains(keepingbodyPattern). Up-conversion folds the old fields into their replacements (root pollInterval → active source; legacy comment →commentPolicy;bodyContains→bodyPattern).The controllers, webhook handler, and CLI printer keep reading v1alpha1; the conversion webhook hands them the v1alpha1 shape, so their logic is unchanged. The conversion webhook is served from the
kelos-controllerbinary. A cert-managerIssuer+Certificateprovide the serving cert andcert-manager.io/inject-ca-frominjects the CA into each CRD's conversion config.kelos uninstalllists/deletes every kind via the storage version so cleanup never depends on the webhook.Which issue(s) this PR is related to:
Fixes #1152
Addresses #704
Supersedes #1154 (its breaking v1alpha1 change is reverted here).
Special notes for your reviewer:
api/v1alpha1/*_conversion.go(spokes); v1alpha2 types are the hubs. Identity conversions use a JSON round-trip helper (convertViaJSON) to avoid hundreds of lines of field-copy boilerplate; TaskSpawner adds explicit folding for the three dropped field groups.spec.conversionblock is injected for all four kelos CRDs byhack/update-install-manifest.sh(controller-gen does not emit it).WebhookInstallOptions(test/integration/conversion_test.go): AgentConfig env up/down-conversion and TaskSpawner legacy→modern folding. Stub cert-manager CRDs undertest/integration/testdataletkelos installapply the Issuer/Certificate in envtest.make verify,make test, andmake test-integration(108 specs) pass locally.Does this PR introduce a user-facing change?