feat(k8s): add agent service and LiteLLM to the Helm chart by bobbai00 · Pull Request #5272 · apache/texera

bobbai00 · 2026-05-28T18:11:56Z

What changes were proposed in this PR?

The agent-service image is built and runs under single-node compose but had no Helm deployment, and the chart had no in-cluster LLM gateway. This adds both, mirroring the proven preview/production chart while aligning the agent service's env to what the code on main actually reads.

Agent service (gated on agentService.enabled)

agent-service-deployment.yaml + agent-service-service.yaml, wired to in-cluster service DNS using the env names from agent-service/src/config/env.ts: TEXERA_DASHBOARD_SERVICE_ENDPOINT (webserver), LLM_ENDPOINT (access-control-service), WORKFLOW_COMPILING_SERVICE_ENDPOINT, and a per-CU EXECUTION_ENDPOINT_TEMPLATE.
A dedicated /api/agents HTTPRoute (REST + the /api/agents/:id/react WebSocket) plus a BackendTrafficPolicy that consistent-hashes on the X-Agent-Workflow-Id header. Agents are held in memory per pod, so a workflow's requests must always reach the same replica (the client already stamps that header).
Readiness/liveness on /api/healthcheck.
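
To illustrate why hashing on X-Agent-Workflow-Id keeps a workflow pinned to one replica, here is a minimal sketch of a consistent-hash ring. It is a toy model, not the gateway's actual load-balancing algorithm; the HashRing class, the pod names, and the vnodes parameter are all hypothetical.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Stable 64-bit hash (unlike built-in hash(), not randomized per process)."""
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Toy consistent-hash ring; each replica gets several virtual nodes."""

    def __init__(self, replicas, vnodes=100):
        self._ring = sorted(
            (_hash(f"{replica}#{i}"), replica)
            for replica in replicas
            for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    def pick(self, header_value: str) -> str:
        """Return the replica owning the first ring point at or after the key's hash."""
        idx = bisect.bisect_left(self._keys, _hash(header_value)) % len(self._ring)
        return self._ring[idx][1]


# Hypothetical pod names, for illustration only.
ring = HashRing(["agent-service-0", "agent-service-1", "agent-service-2"])
first = ring.pick("workflow-42")
# Repeated requests carrying the same X-Agent-Workflow-Id land on the same replica.
assert all(ring.pick("workflow-42") == first for _ in range(10))
```

In this model, removing a replica only remaps the keys that replica owned; workflows pinned to the surviving replicas stay where they were, which is the property that matters when agent state lives in pod memory.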

LiteLLM — in-cluster LLM gateway (gated on litellm.enabled)

litellm-deployment.yaml + litellm-service.yaml + litellm-config.yaml (ConfigMap).
Postgres persistence on by default: a texera_litellm database is created by the Postgres init script, and the deployment sets DATABASE_URL + STORE_MODEL_IN_DB=true so keys, spend, and model config survive restarts.
access-control-service wired to LiteLLM (LITELLM_BASE_URL, LITELLM_MASTER_KEY, copilot enabled).
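
As a rough sketch of the persistence wiring, the snippet below assembles a DATABASE_URL of the shape described above and pairs it with STORE_MODEL_IN_DB. The helper build_database_url and the user, password, host, and port values are assumptions for illustration; only the texera_litellm database name and the two env names come from this PR.

```python
from urllib.parse import quote


def build_database_url(user: str, password: str, host: str, port: int, database: str) -> str:
    """Assemble a postgresql:// URL, percent-encoding the credentials."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(database, safe='')}"
    )


# Hypothetical values: the user, password, and host are placeholders, not the chart's.
env = {
    "DATABASE_URL": build_database_url(
        "litellm", "p@ss/word", "postgres-svc", 5432, "texera_litellm"
    ),
    "STORE_MODEL_IN_DB": "true",
}
print(env["DATABASE_URL"])
```

Percent-encoding the credentials matters when a password supplied via --set contains characters such as @ or /, which would otherwise break URL parsing.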

A shared Opaque Secret holds the agent gateway key, the LiteLLM master key, and the provider API keys (supply via --set/override; none committed).

Any related issues, documentation, discussions?

Closes #5269

Also implements the in-cluster LiteLLM Helm support tracked by #4108 (supersedes the approach in #4109).

How was this PR tested?

Ran helm lint and helm template against the chart (the subchart dependencies: block was stripped locally so the render needs no remote charts):

helm lint .                                        # 1 chart(s) linted, 0 chart(s) failed
helm template texera . -f values-development.yaml  # RC=0, 50 objects, no errors

Verified the rendered output: agent deployment env + /api/healthcheck probes; agent-service-svc; the /api/agents -> agent-service-svc HTTPRoute and the BackendTrafficPolicy targeting it (consistent hash on X-Agent-Workflow-Id); LiteLLM DATABASE_URL=postgresql://…/texera_litellm + STORE_MODEL_IN_DB; access-control-service LITELLM_BASE_URL/MASTER_KEY; and the idempotent CREATE DATABASE texera_litellm in the Postgres init script. Not applied to a live cluster.
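
A check like the one above can be partly automated. The sketch below counts top-level kind: values in rendered helm template output using only the standard library; it is a rough approximation rather than a real YAML parser, and the sample manifest and resource names in it are made up for illustration.

```python
import re
from collections import Counter


def count_kinds(rendered: str) -> Counter:
    """Count top-level `kind:` values across the YAML documents in a rendered manifest."""
    kinds = Counter()
    # Documents in a multi-document YAML stream are separated by lines containing only '---'.
    for doc in re.split(r"^---\s*$", rendered, flags=re.MULTILINE):
        # Anchoring at line start skips indented `kind:` keys nested inside a spec.
        match = re.search(r"^kind:\s*(\S+)", doc, flags=re.MULTILINE)
        if match:
            kinds[match.group(1)] += 1
    return kinds


# Made-up sample standing in for real `helm template` output.
SAMPLE = """\
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-agent-service
---
apiVersion: v1
kind: Service
metadata:
  name: agent-service-svc
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: example-agents-route
"""

kinds = count_kinds(SAMPLE)
print(sum(kinds.values()), dict(kinds))
```

To run it against the chart, redirect the helm template command shown above to a file and pass the file's contents to count_kinds.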

Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.8 (1M context)

The agent-service image is built and runs under single-node compose but had no Helm deployment, and there was no in-cluster LLM gateway. This adds both, mirroring the proven preview/production chart:

Agent service
- deployment + service (gated on agentService.enabled), wired to in-cluster service DNS using the env names the service actually reads (TEXERA_DASHBOARD_SERVICE_ENDPOINT, LLM_ENDPOINT, WORKFLOW_COMPILING_SERVICE_ENDPOINT, EXECUTION_ENDPOINT_TEMPLATE).
- a dedicated /api/agents HTTPRoute plus a BackendTrafficPolicy that consistent-hashes on X-Agent-Workflow-Id, so a workflow's requests always reach the replica holding its in-memory agent.
- readiness/liveness on /api/healthcheck.

LiteLLM (in-cluster LLM gateway)
- deployment + service + config ConfigMap (gated on litellm.enabled).
- Postgres persistence enabled by default: a texera_litellm database created by the postgres init script, with DATABASE_URL + STORE_MODEL_IN_DB so keys, spend, and model config survive restarts.
- access-control-service wired to LiteLLM (LITELLM_BASE_URL/MASTER_KEY, copilot enabled).

A shared Opaque Secret holds the agent gateway key, the LiteLLM master key, and the provider API keys (supply via --set / override; none committed).

Closes apache#5269

github-actions Bot added feature dev labels May 28, 2026

github-actions Bot assigned bobbai00 May 28, 2026

bobbai00 force-pushed the feat/5269-agent-service-k8s branch from ede4eec to 7620a52 Compare May 29, 2026 00:15

bobbai00 changed the title ~~feat(k8s): add agent service deployment to Helm chart~~ feat(k8s): add agent service and LiteLLM to the Helm chart May 29, 2026

bobbai00 wants to merge 1 commit into apache:main from bobbai00:feat/5269-agent-service-k8s
