Skip to content

feat(deps): bump k8s.io to v0.36.1, lift ceiling, harden lint/vuln#3

Merged
mfacenet merged 11 commits into
mainfrom
feat/bump-k8s-libs
May 18, 2026
Merged

feat(deps): bump k8s.io to v0.36.1, lift ceiling, harden lint/vuln#3
mfacenet merged 11 commits into
mainfrom
feat/bump-k8s-libs

Conversation

@mfacenet
Copy link
Copy Markdown

Summary

Lifts the k8s.io/* ceiling from v0.34.0 to v0.36.1 (kubernetes v1.36.1) and uses the resulting bump as the lever to clear inherited lint/staticcheck findings, fix a real cache bug surfaced by SA4004, regenerate the static schema, and harden CI.

Why now

The user asked for the latest k8s libs. The ADR-feared structural blocker (autoscaling v2beta removal at v0.35) was already resolved by the existing Skaphos cleanup commit (eb643e2) plus upstream evolution of k8s.io/kubernetes/pkg/apis/*/install packages — so the bump itself is mechanical. The substantive work in this PR is everything around the bump.

Commits (chronological)

  1. chore(deps): bump k8s.io to v0.35.5 — clean.
  2. chore(deps): bump k8s.io to v0.36.1 — clean.
  3. chore: make hack/update_static_schema.sh portable — original used BSD sed -i '' which silently no-ops on GNU sed (Linux CI), leaving parser.go un-transformed.
  4. chore(scheme): regenerate static parser from k8s.io/[email protected] — drops 96 stale autoscaling/v2beta1 schema entries, picks up new v0.35–v0.36 fields.
  5. chore: drop two unused helpers flagged by staticcheck (U1000)func max predates Go 1.21 builtin; newCRDVersionConverter was never wired in.
  6. fix(cache): iterate all namespaces in checkPermission (SA4004)real bug: an inner return false, nil short-circuited the loop, so multi-namespace checkPermission only ever inspected the first configured namespace. Author had left //FIXME acknowledging the bug.
  7. chore: clear SA1019 deprecations on Endpoints + FieldsV1 — file-level //lint:file-ignore on the vendor-borrowed endpoint sort helpers; migrate FieldsV1.Raw direct access to the new GetRawReader() / SetRawBytes() API.
  8. fix(scheme,deps): prune runtime-only API groups + bump golang.org/x/net — clears GO-2026-4918 (x/net HTTP/2 infinite loop). Drops 4 install registrations for runtime-only API groups (admission, authentication, authorization, imagepolicy) that never appear in user manifests — eliminates needless call-graph edges that govulncheck was traversing.
  9. chore(lint): baseline minimal .golangci.yaml; fix errcheck + QF1008 — adds .golangci.yaml scoped to SA* checks, fixes the two real findings (Fprintf return-value discard, redundant .Fake. selector).
  10. ci: promote lint + staticcheck from advisory to required — both jobs report 0 issues. vuln stays advisory: see below.

Known accepted false positives (vuln only)

After this PR, govulncheck still flags two CVEs as advisory:

  • GO-2025-3547 — Kubernetes kube-apiserver race condition.
  • GO-2025-3521 — Kubernetes GitRepo Volume local-repo access.

Both report Fixed in: N/A upstream. The call traces reach generated *.DeepCopyObject methods on admissionregistration types (which we MUST register because users ship MutatingWebhookConfiguration / ValidatingWebhookConfiguration in GitOps manifests), but the actual vulnerable kube-apiserver code paths are server-side and never exercised by a client library. Documented in the workflow comment and in the prune commit message; the job is advisory until upstream releases fixes.

Test plan

  • go build ./... — clean against v0.36.1.
  • go vet ./... — clean.
  • go test -race -short ./... — all packages green.
  • go run honnef.co/go/tools/cmd/[email protected] ./... — 0 issues.
  • go run github.com/golangci/golangci-lint/v2/cmd/[email protected] run ./... — 0 issues.
  • reuse lint — compliant.
  • bash hack/update_static_schema.sh — runs cleanly on Linux now (was a no-op).
  • actionlint .github/workflows/ci.yml — clean.

Friendly-fork upstreamability

Per the fork's posture, every commit here is shaped to be PR'd back to argoproj/argo-cd verbatim or with a trivial rebase. Listed for outbound tracking in UPSTREAMING.md (follow-on commit, not in this PR).

Refs: SKA-421, ADR 0007.

mfacenet added 11 commits May 17, 2026 19:17
Combined with the existing Skaphos patch (drop dead autoscaling/v2beta1
and v2beta2 references), v0.35 builds clean: the v2beta install
registrations in pkg/utils/kube/scheme/scheme.go transitively reach
install packages that upstream kubernetes/kubernetes already updated to
no longer reference the removed types.

The k8s.io <= v0.34 ceiling identified in ADR 0006 / ADR 0007 § Context
was effectively unblocked by the same cleanup that motivated the
fork; this commit is the empirical demonstration. The cleanup commit
plus this bump are jointly upstreamable to argoproj/argo-cd.

Refs: SKA-421, ADR 0007.
Signed-off-by: Shawn Stratton <[email protected]>
Step two of lifting the k8s.io ceiling: v0.35.5 → v0.36.1.

build, vet, race tests all clean. parser.go (the embedded static schema)
is regenerated separately in the next commit since
hack/update_static_schema.sh ships BSD sed syntax that doesn't work on
Linux.

Refs: SKA-421, ADR 0007.
Signed-off-by: Shawn Stratton <[email protected]>
The original `sed -i ''` is BSD/macOS-only and silently does nothing on
GNU sed (Linux CI), leaving parser.go un-transformed (still `package
internal`, still `func Parser`). The script also had no error handling
on a failed curl.

Replace with a portable form:
- awk extracts the client-go version (tolerates extra whitespace).
- curl uses `-f` to fail on HTTP errors and `mktemp` for the staging file.
- Header + sed transforms write through a single redirect, no in-place
  flag needed.

Works on macOS (BSD sed) and Linux (GNU sed). Skaphos-original (this
script's prior form ships under upstream argo-cd's BSD-only assumption);
the fix is upstreamable to argoproj.

Refs: SKA-421.
Signed-off-by: Shawn Stratton <[email protected]>
Generated by hack/update_static_schema.sh against the pinned client-go
version. Drops 96 stale autoscaling/v2beta1 entries (removed from
k8s.io/api at v0.35) and picks up new fields added in v0.35–v0.36.

The file is generated; do not hand-edit.

Refs: SKA-421.
Signed-off-by: Shawn Stratton <[email protected]>
- pkg/cache/cluster_test.go: drop func max(a, b int). Go 1.21 added a
  built-in generic max(); the local helper was leftover from before the
  module's go directive moved to 1.21+ and has no callers.
- pkg/diff/internal/fieldmanager/borrowed_versionconverter.go: drop
  newCRDVersionConverter. The function was vendored from upstream
  Kubernetes but never wired into gitops-engine's diff path. The
  matching newVersionConverter (still used) is left intact.

Tests in pkg/cache and pkg/diff still pass.

Both edits are upstreamable to argoproj/argo-cd (the test helper) and
to kubernetes/kubernetes (the borrowed helper).

Refs: SKA-421.
Signed-off-by: Shawn Stratton <[email protected]>
The multi-namespace branch of checkPermission ran a SelfSubjectAccessReview
against every configured namespace, but the early `return false, nil`
inside the loop fired on the first iteration that wasn't Allowed —
the loop only ever inspected the first namespace and never visited
the rest. A `//FIXME` and a `//nolint:staticcheck` had been left in
place to acknowledge the bug.

Move the negative return outside the loop so the function now returns:
- `true` if any configured namespace grants list permission (early exit
  on the first Allowed response, as before), and
- `false` only after all configured namespaces have been queried and
  none granted permission.

This matches the cluster-scope branch's "return true on Allowed, drop
otherwise" semantics, and matches the docstring's stated intent
("check if the controller has permissions to list the resource").

Upstreamable to argoproj/argo-cd.

Refs: SKA-421.
Signed-off-by: Shawn Stratton <[email protected]>
Five staticcheck SA1019 hits resolved across three files:

- internal/kubernetes_vendor/pkg/api/v1/endpoints/util.go: this is a
  vendored copy of upstream k8s.io/kubernetes/pkg/api/v1/endpoints and
  intentionally targets the deprecated v1.Endpoints / v1.EndpointSubset
  types because gitops-engine still has to read user-provided legacy
  Endpoints objects in the diff normalizer. File-level
  `//lint:file-ignore SA1019` with rationale, switched from a
  golangci-lint-flavored `//nolint:staticcheck` (which staticcheck
  itself doesn't honor).

- pkg/diff/diff.go: same situation in `normalizeEndpoint`. Migrated the
  managed-fields read path to `FieldsV1.GetRawReader()` (the new
  metav1.FieldsV1 API in apimachinery v0.36) and added a localized
  `//lint:ignore SA1019` directive on the remaining intentional
  corev1.Endpoints reference.

- pkg/diff/internal/fieldmanager/borrowed_fields.go: this is a vendored
  copy of upstream k8s.io/apiserver fieldmanager.Fields helpers. Migrated
  `f.Raw` accesses to `f.GetRawReader()` / `f.SetRawBytes(raw)`. The
  `bytes` import drops out. Forward-compatible with the upstream
  deprecation of direct .Raw access.

Both code migrations and both suppression directive switches are
upstreamable to argoproj/argo-cd.

Refs: SKA-421.
Signed-off-by: Shawn Stratton <[email protected]>
Three wins for govulncheck:

1. Bump golang.org/x/net v0.49.0 → v0.53.0. Clears GO-2026-4918
   (infinite loop in HTTP/2 transport on bad SETTINGS_MAX_FRAME_SIZE),
   which our discovery client transitively reached. Pulled by `go get`
   plus `go mod tidy`; x/sync, x/sys, x/term, x/text auto-bumped to
   match.

2. Drop four runtime-only API-group install registrations from
   pkg/utils/kube/scheme/scheme.go:
     - admission/install      (AdmissionReview)
     - authentication/install (TokenReview)
     - authorization/install  (SubjectAccessReview)
     - imagepolicy/install    (ImageReview)
   These types are runtime webhook/auth payloads, NEVER user manifests,
   so registering them in legacyscheme.Scheme served no purpose for a
   diff normalizer. The registration was inherited from upstream
   argo-cd's bulk-register-everything pattern. Pruning eliminates
   the call-graph edges that govulncheck was traversing into
   kube-apiserver code, and clears the admission-side false-positive
   trace for GO-2025-3547 (now only reaches via admissionregistration,
   see below).

3. admissionregistration/install is deliberately kept because
   ValidatingWebhookConfiguration / MutatingWebhookConfiguration DO
   appear in user GitOps manifests (operators, cert-manager, etc.).

After this commit:
   GO-2025-3547 (kube-apiserver race) — still flagged via
       admissionregistration.MutatingWebhookConfiguration.DeepCopyObject.
       This is a govulncheck false positive: the trace reaches the
       Go *type* in the same module as the vulnerability, but the
       actual vulnerable kube-apiserver code path is never executed
       from a client library. No upstream fix exists yet (`Fixed in:
       N/A`). Tracked as known-accepted in CI's vuln advisory comment.
   GO-2025-3521 (GitRepo Volume) — same as above; same trace shape.

Build, vet, full test suite, and -race all pass. Upstreamable to
argoproj/argo-cd as a security-hardening + dependency-hygiene PR.

Refs: SKA-421.
Signed-off-by: Shawn Stratton <[email protected]>
Adds .golangci.yaml configured to surface real defects, not style nags:

- enabled linters: errcheck, govet, ineffassign, staticcheck, unused.
- staticcheck.checks scoped to "SA*" (Static Analysis / bug family);
  ST* (naming), S* (simplifications), QF* (quick fixes) all left off
  to keep the lint signal focused and keep our diff against upstream
  argo-cd minimal.
- exclusions narrow SA1019 deprecation suppression to the two files
  that legitimately handle the deprecated corev1.Endpoints /
  v1.EndpointSubset types (file-level //lint:file-ignore directives
  are honored by staticcheck-the-tool but not always by
  golangci-lint's bundled staticcheck; configuring it here is the
  portable form).

Two real fixes the lint surfaced:

- internal/kubernetes_vendor/pkg/util/hash/hash.go: explicitly discard
  the (n, err) return from printer.Fprintf with `_, _ =`. hash.Hash
  never returns an error from Write so this is intentional, but
  silencing errcheck via the canonical blank-assign reads cleaner
  than //nolint.
- pkg/sync/sync_context_test.go: drop the explicit `.Fake.` selector
  on fakedisco.FakeDiscovery before PrependReactor (the Fake field
  is embedded, the selector is reachable directly). Two occurrences.

Run produces 0 issues.

Refs: SKA-421.
Signed-off-by: Shawn Stratton <[email protected]>
After clearing all inherited findings (U1000 unused helpers, SA4004
unconditional break, SA1019 deprecations, FieldsV1 migration) and
baselining .golangci.yaml, the lint and staticcheck jobs both report
0 issues. Promote them to required so regressions are caught at PR
time instead of merging and triggering a CI summary surprise.

vuln stays advisory: two server-side kube-apiserver CVEs
(GO-2025-3547, GO-2025-3521) are reached transitively via
admissionregistration types registered for the diff normalizer. Both
are false positives for a client library (the traces touch generated
DeepCopyObject methods, not the vulnerable apiserver code) and both
report Fixed in: N/A. The job's comment documents the deal so a future
contributor doesn't promote it prematurely.

Refs: SKA-421.
Signed-off-by: Shawn Stratton <[email protected]>
Records pending-outbound entries for the k8s.io ceiling-lift series
(commits 2-9 of feat/bump-k8s-libs), plus a sensible bundling
recommendation: commits 1+5+8 ship as a single "lift the v0.34
ceiling" PR to argoproj/argo-cd, the rest as individual small PRs.

Also moves the four fork-mechanism commits (module rename, dressing,
CI scaffold, the now-superseded advisory-CI) under "Deliberately not
upstreamed" with explicit reasons so future maintainers don't
mistakenly try to upstream them.

Refs: SKA-430, SKA-421.
Signed-off-by: Shawn Stratton <[email protected]>
@mfacenet mfacenet merged commit 98c81d7 into main May 18, 2026
11 of 12 checks passed
@mfacenet mfacenet deleted the feat/bump-k8s-libs branch May 18, 2026 01:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant