feat(appkit): agents() plugin, createAgent(def), and markdown-driven agents by MarioCadenas · Pull Request #304 · databricks/appkit

MarioCadenas · 2026-04-21T17:58:43Z

The main product layer. Turns an AppKit app into an AI-agent host with
markdown-driven agent discovery, code-defined agents, sub-agents,
human-in-the-loop approval, zero-trust MCP, and a standalone
run-without-HTTP executor.

`createAgent(def)` — pure factory

packages/appkit/src/core/create-agent-def.ts. Returns the passed-in
definition after cycle-detecting the sub-agent graph. No adapter
construction, no side effects — safe at module top-level. The returned
AgentDefinition is plain data, consumable by either agents({ agents })
or runAgent(def, input).
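A minimal sketch of what "cycle-detecting the sub-agent graph" can look like, assuming sub-agents are nested definition objects keyed by name. `AgentDef` and `createAgentSketch` are hypothetical stand-ins, not the real `AgentDefinition` or `createAgent`. The DFS tracks the current path separately from fully checked nodes, so a shared sub-agent (a diamond) is accepted and only a true back edge throws:

```ts
// Hypothetical, simplified stand-in for AgentDefinition: only the fields
// the cycle check needs. Sub-agents are assumed to be nested definitions.
interface AgentDef {
  name?: string;
  instructions?: string;
  agents?: Record<string, AgentDef>;
}

// Depth-first walk of the sub-agent graph. Throws on a back edge (a node
// already on the current path) and otherwise returns `def` unchanged.
function createAgentSketch<T extends AgentDef>(def: T): T {
  const onPath = new Set<AgentDef>(); // nodes on the current DFS path
  const done = new Set<AgentDef>(); // nodes whose subtree is fully checked

  const visit = (node: AgentDef, trail: string[]): void => {
    if (done.has(node)) return; // shared sub-agent, already verified
    if (onPath.has(node)) {
      throw new Error(`Sub-agent cycle detected: ${trail.join(" -> ")}`);
    }
    onPath.add(node);
    for (const [key, child] of Object.entries(node.agents ?? {})) {
      visit(child, [...trail, key]);
    }
    onPath.delete(node);
    done.add(node);
  };

  visit(def, [def.name ?? "<root>"]);
  return def; // same object: no copies, no adapters, no side effects
}
```

Because nothing is constructed, calling it at module top level only costs one graph walk.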

`agents()` plugin

packages/appkit/src/plugins/agents/agents.ts. AgentsPlugin class:

Loads markdown agents from config/agents/*.md (configurable dir)
via real YAML frontmatter parsing (js-yaml). Frontmatter schema:
endpoint, model, toolkits, tools, agents, default,
maxSteps, maxTokens, baseSystemPrompt, ephemeral. Unknown
keys are logged; invalid YAML throws at boot.
Merges code-defined agents passed via agents({ agents: { name: def } }).
Code wins on key collision.
For each agent, builds a per-agent tool index from:
1. Sub-agents (agents: frontmatter or agents: code field) —
  synthesized as agent-<key> tools on the parent. Markdown
  agents: [...] resolves against both markdown siblings and
  code-defined agents passed via LoadContext.codeAgents, so a
  markdown orchestrator can delegate to a code-defined specialist.
2. Explicit tool record entries: ToolkitEntry, inline
  FunctionTool, or HostedTool values.
3. Auto-inherit (if nothing explicit) — pulls every registered
  ToolProvider plugin's tools whose author marked
  autoInheritable: true. Asymmetric default: markdown agents
  inherit (file: true), code-defined agents don't (code: false).
Mounts POST /api/agents/invocations (OpenAI Responses compatible) +
POST /api/agents/chat, POST /api/agents/cancel,
POST /api/agents/approve, GET /api/agents/threads/:id,
DELETE /api/agents/threads/:id, GET /api/agents/info.
SSE streaming via executeStream. Tool calls dispatch through
PluginContext.executeTool(req, pluginName, localName, args, signal)
for OBO, telemetry, and timeout.
Exposes appkit.agents.{register, list, get, reload, getDefault, getThreads}
runtime helpers.
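The three-step per-agent tool index can be sketched roughly as follows. All type names, the `pluginName.toolName` key format, and the reading of "nothing explicit" as "no explicit tool entries" are assumptions for illustration. The `{ file, code }` flags mirror the asymmetric auto-inherit default described above:

```ts
// Hypothetical, simplified shapes; the real ToolkitEntry / FunctionTool /
// HostedTool types carry schemas, handlers, and annotations.
interface ToolSketch {
  name: string;
  autoInheritable?: boolean;
}
interface ToolProviderSketch {
  pluginName: string;
  tools: ToolSketch[];
}
interface AgentSketch {
  agents?: string[]; // sub-agent keys
  tools?: Record<string, ToolSketch>; // explicit tool record entries
}
type Origin = "file" | "code";

// Builds a map of tool name -> provenance for one agent.
function buildToolIndex(
  agent: AgentSketch,
  origin: Origin,
  providers: ToolProviderSketch[],
  autoInherit: Record<Origin, boolean>,
): Map<string, string> {
  const index = new Map<string, string>();

  // 1. Each sub-agent becomes an `agent-<key>` tool on the parent.
  for (const key of agent.agents ?? []) {
    index.set(`agent-${key}`, `subagent:${key}`);
  }

  // 2. Explicit tool record entries.
  const explicit = Object.entries(agent.tools ?? {});
  for (const [name, tool] of explicit) {
    index.set(name, `explicit:${tool.name}`);
  }

  // 3. Auto-inherit only when no explicit tools were given, the origin
  //    opts in, and the plugin author marked the tool autoInheritable.
  if (explicit.length === 0 && autoInherit[origin]) {
    for (const provider of providers) {
      for (const tool of provider.tools) {
        if (tool.autoInheritable === true) {
          index.set(`${provider.pluginName}.${tool.name}`, `inherited:${provider.pluginName}`);
        }
      }
    }
  }
  return index;
}
```

Requiring both the origin flag and the per-tool marking means neither the app developer nor the plugin author can widen an agent's tool surface alone.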

Human-in-the-loop approval gate

Any tool annotated destructive: true pauses the stream, emits an
appkit.approval_pending SSE event, and waits for a
POST /api/agents/approve decision from the same user who initiated
the run. If no decision arrives within approval.timeoutMs, the call is auto-denied.
Enabled by default (approval.requireForDestructive: true); opt out
for dev. Per-user ownership enforced (x-forwarded-user).
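A minimal in-memory sketch of such a gate, assuming one pending entry per approval id. `ApprovalGateSketch`, `wait`, and `submit` are hypothetical names. SSE emission and the HTTP route are left out, and the user ids would come from `x-forwarded-user`. A decision from anyone other than the run's owner is ignored, and a timer resolves the wait as "deny":

```ts
type Decision = "approve" | "deny";

interface Pending {
  ownerUserId: string;
  resolve: (d: Decision) => void;
  timer: ReturnType<typeof setTimeout>;
}

// Hypothetical human-in-the-loop gate; not the plugin's actual class.
class ApprovalGateSketch {
  private readonly pending = new Map<string, Pending>();
  private readonly timeoutMs: number;

  constructor(timeoutMs: number) {
    this.timeoutMs = timeoutMs;
  }

  // Called by the stream before a destructive tool runs. Resolves with the
  // owner's decision, or "deny" once timeoutMs elapses.
  wait(approvalId: string, ownerUserId: string): Promise<Decision> {
    return new Promise<Decision>((resolve) => {
      const timer = setTimeout(() => {
        this.pending.delete(approvalId);
        resolve("deny"); // no decision in time: auto-deny
      }, this.timeoutMs);
      this.pending.set(approvalId, { ownerUserId, resolve, timer });
    });
  }

  // Called by the approve route. Returns false for unknown ids or when the
  // caller is not the user who started the run.
  submit(approvalId: string, callerUserId: string, decision: Decision): boolean {
    const entry = this.pending.get(approvalId);
    if (!entry || entry.ownerUserId !== callerUserId) return false;
    clearTimeout(entry.timer);
    this.pending.delete(approvalId);
    entry.resolve(decision);
    return true;
  }
}
```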

Zero-trust MCP host policy

tools/mcp-host-policy.ts enforces an allowlist on every MCP URL
before the first byte is sent. Same-origin Databricks workspace URLs
are admitted by default; any other host must be explicitly trusted
via agents({ mcp: { trustedHosts: [...] } }). Blocks link-local
(cloud metadata at 169.254/16), RFC1918, CGNAT, loopback,
ULA, multicast, and IPv4-mapped IPv6 equivalents at DNS-resolve
time. Workspace credentials (service-principal on initialize /
tools/list; caller OBO on tools/call) are never attached to
non-workspace hosts.
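The address check can be sketched as a pure function over already-resolved addresses. In practice each hostname would first be resolved (for example with Node's `dns.promises.lookup(host, { all: true })`) and every returned address checked before connecting. This sketch is not the code in `mcp-host-policy.ts`. It covers only the ranges listed above, and it assumes IPv6 input in the compressed form resolvers normally return. IPv4-mapped addresses are unwrapped in both dotted and colon-hex notation:

```ts
// True if an IPv4 dotted-quad falls in a range the policy refuses.
function isBlockedIPv4(ip: string): boolean {
  const parts = ip.split(".").map(Number);
  if (parts.length !== 4 || parts.some((n) => !Number.isInteger(n) || n < 0 || n > 255)) {
    return true; // unparseable: fail closed
  }
  const [a, b] = parts;
  return (
    a === 10 || // RFC1918 10.0.0.0/8
    (a === 172 && b >= 16 && b <= 31) || // RFC1918 172.16.0.0/12
    (a === 192 && b === 168) || // RFC1918 192.168.0.0/16
    (a === 169 && b === 254) || // link-local, incl. cloud metadata
    (a === 100 && b >= 64 && b <= 127) || // CGNAT 100.64.0.0/10
    a === 127 || // loopback
    (a >= 224 && a <= 239) // multicast 224.0.0.0/4
  );
}

// True if a (compressed) IPv6 address is blocked, incl. IPv4-mapped forms.
function isBlockedIPv6(ip: string): boolean {
  const lower = ip.toLowerCase();
  const dotted = lower.match(/^::ffff:(\d+\.\d+\.\d+\.\d+)$/);
  if (dotted) return isBlockedIPv4(dotted[1]);
  const hex = lower.match(/^::ffff:([0-9a-f]{1,4}):([0-9a-f]{1,4})$/);
  if (hex) {
    // colon-hex mapped form, e.g. ::ffff:a9fe:a9fe == 169.254.169.254
    const hi = parseInt(hex[1], 16);
    const lo = parseInt(hex[2], 16);
    return isBlockedIPv4(`${hi >> 8}.${hi & 0xff}.${lo >> 8}.${lo & 0xff}`);
  }
  if (lower === "::1") return true; // loopback
  const first = parseInt(lower.split(":")[0] || "0", 16);
  return (
    (first & 0xfe00) === 0xfc00 || // ULA fc00::/7
    (first & 0xffc0) === 0xfe80 || // link-local fe80::/10, full range
    (first & 0xff00) === 0xff00 // multicast ff00::/8
  );
}
```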

DoS caps

limits: { maxConcurrentStreamsPerUser, maxToolCalls, maxSubAgentDepth }
(defaults 5 / 50 / 3). Chat bodies are capped at 64k characters via
Zod schema; with the default cap, a 6th concurrent stream for the same user returns 429; tool
budget exhaustion aborts the run with a clear error.
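The per-user stream cap amounts to a counter keyed by user id. This is a hypothetical sketch (`StreamLimiter` is not a name from the PR). A route handler would call `tryAcquire` before streaming, answer 429 when it returns false, and call `release` in a `finally` block so aborted or failed streams free their slot:

```ts
// Hypothetical per-user concurrency counter (default cap 5, as above).
class StreamLimiter {
  private readonly active = new Map<string, number>();
  private readonly maxPerUser: number;

  constructor(maxPerUser = 5) {
    this.maxPerUser = maxPerUser;
  }

  // false means "reject with 429"; true means a slot was reserved.
  tryAcquire(userId: string): boolean {
    const current = this.active.get(userId) ?? 0;
    if (current >= this.maxPerUser) return false;
    this.active.set(userId, current + 1);
    return true;
  }

  // Call from a finally block when the stream ends, however it ends.
  release(userId: string): void {
    const current = this.active.get(userId) ?? 0;
    if (current <= 1) this.active.delete(userId);
    else this.active.set(userId, current - 1);
  }
}
```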

`runAgent(def, input)` — standalone executor

packages/appkit/src/core/run-agent.ts. Runs an AgentDefinition
without createApp or HTTP. Drives the adapter's event stream to
completion, executing inline tools + sub-agents along the way.
Aggregates events into { text, events }. Useful for tests, CLI
scripts, and offline pipelines.
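The aggregation into `{ text, events }` can be sketched as a loop over an async event stream. The event shape is a hypothetical reduction using the `message_delta` / `message` names that appear later in this PR. The rule that a complete `message` replaces accumulated deltas is an assumption, not confirmed behaviour:

```ts
// Hypothetical event shape; the real AgentEvent union is richer.
type AgentEventSketch =
  | { type: "message_delta"; delta: string }
  | { type: "message"; content: string }
  | { type: "tool_call"; name: string };

// Drains the stream, keeping every event and building the final text.
async function collectRun(
  events: AsyncIterable<AgentEventSketch>,
): Promise<{ text: string; events: AgentEventSketch[] }> {
  const seen: AgentEventSketch[] = [];
  let text = "";
  for await (const ev of events) {
    seen.push(ev);
    if (ev.type === "message_delta") text += ev.delta;
    // assumed policy: a complete message supersedes partial deltas
    else if (ev.type === "message") text = ev.content;
  }
  return { text, events: seen };
}
```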

Event translation and thread storage

AgentEventTranslator — stateful converter from internal
AgentEvents to OpenAI Responses API ResponseStreamEvents with
strictly monotonic sequence_number and output_index.
InMemoryThreadStore — per-user conversation persistence with
explicit ephemeral: true opt-in on AgentDefinition for
stateless agents (autocomplete, one-shot tools).
buildBaseSystemPrompt + composeSystemPrompt — formats the
AppKit base prompt (with plugin names and tool names) and layers
the agent's instructions on top.
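One way to keep `sequence_number` and `output_index` monotonic is to hold both counters in the translator instance. The sketch below is hypothetical. It emits the real Responses event type names `response.output_item.added` and `response.output_text.delta`, but omits the `.done` events and content-part bookkeeping. It also shows message interruption: a tool call closes the open message, so the next text opens a new output item:

```ts
// Hypothetical internal events; the real AgentEvent union is larger.
type InternalEvent =
  | { type: "message_delta"; delta: string }
  | { type: "tool_call"; callId: string; name: string; args: string };

interface OutEvent {
  type: string;
  sequence_number: number;
  output_index: number;
  [extra: string]: unknown;
}

class TranslatorSketch {
  private seq = 0; // incremented for every emitted event
  private outputIndex = -1; // incremented when a new output item opens
  private messageOpen = false;

  translate(ev: InternalEvent): OutEvent[] {
    const out: OutEvent[] = [];
    const emit = (type: string, extra: Record<string, unknown> = {}): void => {
      out.push({ ...extra, type, sequence_number: this.seq++, output_index: this.outputIndex });
    };

    if (ev.type === "message_delta") {
      if (!this.messageOpen) {
        // first text after start or after a tool call opens a message item
        this.outputIndex++;
        this.messageOpen = true;
        emit("response.output_item.added", { item: { type: "message", role: "assistant" } });
      }
      emit("response.output_text.delta", { delta: ev.delta });
    } else {
      // a tool call interrupts any open message and gets its own item
      this.messageOpen = false;
      this.outputIndex++;
      emit("response.output_item.added", {
        item: { type: "function_call", call_id: ev.callId, name: ev.name, arguments: ev.args },
      });
    }
    return out;
  }
}
```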

Frontmatter loader

load-agents.ts — reads *.md files, parses YAML frontmatter with
js-yaml, resolves toolkits: [...] entries against the plugin
provider index at load time, wraps ambient tools (from agents({ tools: {...} })) for tools: [...] frontmatter references.
loadAgentsFromDir runs a two-pass resolver so agents: references
can be resolved regardless of file-system iteration order; supports
markdown siblings + code-defined agents (via LoadContext.codeAgents)
with code precedence on collision.
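The two-pass idea can be sketched as follows: register every name first, then resolve references against the complete set, so file iteration order cannot cause a false "unknown agent" error. `ParsedAgent`, `ResolvedAgent`, and `resolveAgentRefs` are hypothetical. The behaviours (code precedence, deduplication, self- and missing-reference errors, mutual delegation allowed) follow the loader test list below:

```ts
// Hypothetical shapes: a parsed markdown agent with its raw `agents:` list.
interface ParsedAgent {
  name: string;
  agentRefs: string[];
}
interface ResolvedAgent {
  name: string;
  subAgents: string[];
}

function resolveAgentRefs(
  fileAgents: ParsedAgent[],
  codeAgentNames: string[],
): Map<string, ResolvedAgent> {
  const codeNames = new Set(codeAgentNames);
  const known = new Set(codeAgentNames);
  const fileByName = new Map<string, ParsedAgent>();

  // Pass 1: register every name before resolving any reference.
  for (const agent of fileAgents) {
    if (codeNames.has(agent.name)) continue; // code-defined agent wins
    fileByName.set(agent.name, agent);
    known.add(agent.name);
  }

  // Pass 2: resolve each `agents:` list against the complete name set.
  const resolved = new Map<string, ResolvedAgent>();
  for (const agent of Array.from(fileByName.values())) {
    const subAgents = Array.from(new Set(agent.agentRefs)); // dedupe, keep order
    for (const ref of subAgents) {
      if (ref === agent.name) throw new Error(`Agent "${agent.name}" references itself`);
      if (!known.has(ref)) throw new Error(`Agent "${agent.name}" references unknown agent "${ref}"`);
    }
    resolved.set(agent.name, { name: agent.name, subAgents });
  }
  return resolved;
}
```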

Plumbing

Adds js-yaml + @types/js-yaml deps.
Manifest mounts routes at /api/agents/* (plural — matches the
appkit.agents.* runtime handle).
Exports from the main barrel: agents, createAgent, runAgent,
AgentDefinition, AgentsPluginConfig, AgentTool, ToolkitEntry,
ToolkitOptions, BaseSystemPromptOption, PromptContext,
isToolkitEntry, loadAgentFromFile, loadAgentsFromDir.

Test plan

Loader tests (25): parse errors, toolkits/tools resolution,
agents: sibling resolution regardless of order, mutual delegation,
missing/self/non-array refs, deduplication, loadAgentFromFile
rejection of agents:, markdown → code references, code precedence
on collision.
Approval gate tests: HITL wait + decision, timeout auto-deny,
stream-owner enforcement.
DoS caps tests: body-size rejection, per-user concurrency 429,
tool-call budget, sub-agent depth cap.
Event translator tests: output-index monotonicity,
message-interruption-by-tool-call, undefined-result coalescing.
MCP host policy tests: URL allowlist, IP blocklist, IPv6 link-local
full range, IPv4-mapped colon-hex normalization.
Full appkit vitest suite: 1552 tests passing at stack tip.
Typecheck clean across all 8 workspace projects.

Signed-off-by: MarioCadenas [email protected]

PR Stack

Shared agent types + LLM adapters — feat(appkit): shared agent types and LLM adapter implementations #301
Tool primitives + ToolProvider surfaces — feat(appkit): tool primitives and ToolProvider surfaces on core plugins #302
Plugin infrastructure (attachContext + PluginContext) — feat(appkit): plugin infrastructure — attachContext + PluginContext mediator #303
agents() plugin + createAgent(def) + markdown-driven agents (this PR)
fromPlugin() DX + runAgent plugins arg + toolkit-resolver — feat(appkit): fromPlugin() DX, runAgent plugins arg, shared toolkit-resolver #305
Reference app + dev-playground + docs — feat(appkit): reference agent-app, dev-playground chat UI, docs, and template #306

Demo

agent-demo.mp4

…template Final layer of the agents feature stack. Everything needed to exercise, demonstrate, and learn the feature.

### Reference application: agent-app

`apps/agent-app/`: a standalone app purpose-built around the agents feature. Ships with:

- `server.ts`: full example of code-defined agents via `fromPlugin`:

```ts
const support = createAgent({
  instructions: "…",
  tools: {
    ...fromPlugin(analytics),
    ...fromPlugin(files),
    get_weather,
    "mcp.vector-search": mcpServer("vector-search", "https://…"),
  },
});

await createApp({
  plugins: [server({ port }), analytics(), files(), agents({ agents: { support } })],
});
```

- `config/agents/assistant.md`: markdown-driven agent alongside the code-defined one, showing the asymmetric auto-inherit default.
- Vite + React 19 + TailwindCSS frontend with a chat UI.
- Databricks deployment config (`databricks.yml`, `app.yaml`) and deploy scripts.

### dev-playground chat UI + demo agent

`apps/dev-playground/client/src/routes/agent.route.tsx`: chat UI with inline autocomplete (hits the `autocomplete` markdown agent) and a full threaded conversation panel (hits the default agent).

`apps/dev-playground/server/index.ts`: adds a code-defined `helper` agent using `fromPlugin(analytics)` alongside the markdown-driven `autocomplete` agent in `config/agents/`. Exercises the mixed-style setup (markdown + code) against the same plugin list.

`apps/dev-playground/config/agents/*.md`: both agents defined with valid YAML frontmatter.

### Docs

`docs/docs/plugins/agents.md`: progressive five-level guide:

1. Drop a markdown file → it just works.
2. Scope tools via `toolkits:` / `tools:` frontmatter.
3. Code-defined agents with `fromPlugin()`.
4. Sub-agents.
5. Standalone `runAgent()` (no `createApp` or HTTP).

Plus a configuration reference, runtime API reference, and frontmatter schema table.

`docs/docs/api/appkit/`: regenerated typedoc for the new public surface (fromPlugin, runAgent, AgentDefinition, AgentsPluginConfig, ToolkitEntry, ToolkitOptions, all adapter types, and the agents plugin factory).

### Template

`template/appkit.plugins.json`: adds the `agent` plugin entry so `npx @databricks/appkit init --features agent` scaffolds the plugin correctly.

### Test plan

- Full appkit vitest suite: 1311 tests passing
- Typecheck clean across all 8 workspace projects
- `pnpm docs:build` clean (no broken links)
- `pnpm --filter=@databricks/appkit build:package` clean, publint clean

Signed-off-by: MarioCadenas <[email protected]>

### Zero-trust MCP host policy documentation (S1 security)

Documents the new `mcp` configuration block and the rules it enforces: same-origin-only by default, explicit `trustedHosts` for external MCP servers, plaintext `http://` refused outside localhost-in-dev, and DNS-level blocking of private / link-local IP ranges (covers cloud metadata services). See PR #302 for the policy implementation and PR #304 for the `AgentsPluginConfig.mcp` wiring.

Signed-off-by: MarioCadenas <[email protected]>

### HITL approval UI + SQL safety docs (S2 security, Layer 1-3)

- `docs/docs/plugins/agents.md`: new "SQL agent tools" subsection covering `analytics.query` readOnly enforcement, `lakebase.query` opt-in via `exposeAsAgentTool`, and the approval flow. New "Human-in-the-loop approval for destructive tools" subsection documents the config, SSE event shape, and `POST /chat/approve` contract.
- `apps/agent-app`: approval-card component rendered inline in the chat stream whenever an `appkit.approval_pending` event arrives. Destructive badge + Approve/Deny buttons POST to `/api/agent/approve` with the carried `streamId`/`approvalId`.
- `apps/dev-playground/client`: matching approval-card on the agent route, using the existing appkit-ui `Button` component and Tailwind utility classes.

Signed-off-by: MarioCadenas <[email protected]>

### Auto-inherit posture documentation (S3 security, Layer 3)

Updates `docs/docs/plugins/agents.md` to document the new two-key auto-inherit model introduced in PR #302 (per-tool `autoInheritable` flag) and PR #304 (safe-by-default `autoInheritTools: { file: false, code: false }`). Adds an "Auto-inherit posture" subsection explaining that the developer must opt into `autoInheritTools` AND the plugin author must mark each tool `autoInheritable: true` for a tool to spread without explicit wiring. Includes a table documenting the `autoInheritable` marking on each core plugin tool, plus an example of the setup-time audit log so operators can see exactly what's inherited vs. skipped.

Signed-off-by: MarioCadenas <[email protected]>

- **Reference app no longer ships hardcoded dogfood URLs.** The three `https://e2-dogfood.staging.cloud.databricks.com/...` and `https://mario-mcp-hello-*.staging.aws.databricksapps.com/...` MCP URLs in `apps/agent-app/server.ts` are replaced with optional env-driven `VECTOR_SEARCH_MCP_URL` / `CUSTOM_MCP_URL` config. When set, their hostnames are auto-added to `agents({ mcp: { trustedHosts } })`. `.env.example` uses placeholder values the reader can replace instead of another team's workspace.
- **`appkit.agent` → `appkit.agents` in the reference app.** The prior `appkit.agent as { list, getDefault }` cast papered over the plugin-name mismatch fixed in PR #304. The runtime key now matches the docs, the manifest, and the factory name; the cast is gone.
- **Auto-inherit opt-in added to the reference config.** Since the defaults flipped to `{ file: false, code: false }` (PR #304, S-3), the reference now explicitly enables `autoInheritTools: { file: true }` so the markdown agents that ship alongside the code-defined one still pick up the analytics / files read-only tools. This is the pattern a real deployment should follow: opt in deliberately.

Signed-off-by: MarioCadenas <[email protected]>

- `apps/dev-playground/config/agents/autocomplete.md` sets `ephemeral: true`. Each debounced autocomplete keystroke no longer leaves an orphan thread in `InMemoryThreadStore`; the server now deletes the thread in the stream's `finally` (PR #304). Closes R1 from the MVP re-review.
- `docs/docs/plugins/agents.md` documents the new `ephemeral` frontmatter key alongside the other AgentDefinition knobs.

Signed-off-by: MarioCadenas <[email protected]>

Documents the MVP resource caps landed in PR #304: the static request-body caps (enforced by the Zod schemas) and the three configurable runtime limits (`maxConcurrentStreamsPerUser`, `maxToolCalls`, `maxSubAgentDepth`). Includes the config-block shape in the main reference and a new "Resource limits" subsection under the Configuration section explaining the intent and per-user semantics of each cap.

Signed-off-by: MarioCadenas <[email protected]>

…agents The main product layer. Turns an AppKit app into an AI-agent host with markdown-driven agent discovery, code-defined agents, sub-agents, and a standalone run-without-HTTP executor.

Agent runtime files land in core/agent/ from day one:

- core/agent/create-agent.ts: createAgent() definition factory
- core/agent/run-agent.ts: standalone adapter loop (no HTTP)
- core/agent/load-agents.ts: markdown agent discovery
- core/agent/system-prompt.ts: base system prompt + composition
- core/agent/types.ts: updated with AgentDefinition, AgentsPluginConfig, RegisteredAgent, etc.

HTTP-facing concerns stay in plugins/agents/: agents.ts, thread-store.ts, tool-approval-gate.ts, event-channel.ts, event-translator.ts, schemas.ts, defaults.ts, manifest.json.

Tool-agnostic guidelines instead of SQL/files-specific defaults; accept full PromptContext in buildBaseSystemPrompt for parity with custom callbacks. Signed-off-by: MarioCadenas <[email protected]>

Register DATABRICKS_SERVING_ENDPOINT_NAME as optional CAN_QUERY so apps using Databricks-hosted agent models get resource wiring; optional when agents use only external adapters. Sync template/appkit.plugins.json. Signed-off-by: MarioCadenas <[email protected]>

Align optional serving resource with `DatabricksAdapter.fromModelServing()`, which reads `DATABRICKS_AGENT_ENDPOINT` — not `DATABRICKS_SERVING_ENDPOINT_NAME` (serving plugin). Sync template. Signed-off-by: MarioCadenas <[email protected]>

BREAKING CHANGE: top-level config/agents/*.md is no longer loaded. Use <agentId>/agent.md. The skills directory name is reserved and skipped. Orphan top-level .md files error at load; subdirs without agent.md error. Export agentIdFromMarkdownPath for path-based id resolution.
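The folder-based discovery rules can be sketched as a check over a directory listing. This is not the exported `agentIdFromMarkdownPath`, whose signature is not shown here. `discoverAgentIds` is a hypothetical helper that takes paths relative to the agents directory with `/` separators:

```ts
// Hypothetical helper: `files` are paths relative to the agents directory,
// using "/" separators (e.g. from a recursive directory listing).
function discoverAgentIds(files: string[]): string[] {
  const present = new Set(files);
  const dirs = new Set<string>();

  for (const file of files) {
    const parts = file.split("/");
    if (parts.length === 1) {
      // top-level markdown is no longer loaded, so fail loudly instead
      if (file.endsWith(".md")) {
        throw new Error(`Orphan top-level file "${file}": move it to <agentId>/agent.md`);
      }
      continue;
    }
    if (parts[0] === "skills") continue; // reserved directory name, skipped
    dirs.add(parts[0]);
  }

  const ids: string[] = [];
  for (const dir of Array.from(dirs)) {
    if (!present.has(`${dir}/agent.md`)) {
      throw new Error(`Directory "${dir}" has no agent.md`);
    }
    ids.push(dir); // the agent id is the directory name
  }
  return ids.sort();
}
```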

The MCP transport client and host policy aren't agents-specific; they are HTTP + JSON-RPC transport with URL/DNS allowlisting. Move them under packages/appkit/src/connectors/mcp/ so they sit alongside the other transport-layer modules (serving, genie, sql-warehouse, lakebase, …) and stop being reachable only through the agents plugin.

- Move mcp-client.ts -> connectors/mcp/client.ts
- Move mcp-host-policy.ts -> connectors/mcp/host-policy.ts
- Move McpEndpointConfig type -> connectors/mcp/types.ts
- Add connectors/mcp/index.ts barrel; re-export from connectors/index.ts
- Move mcp-client / mcp-host-policy tests to connectors/mcp/tests/
- Agents plugin keeps hosted-tools.ts (HostedTool sugar + resolve) and imports connector types from ../../connectors/mcp.
- tools/ barrel no longer re-exports AppKitMcpClient (never was public).

No behaviour change. All existing tests pass against the new paths.

…dispatchToolCall Three small helpers pulled out of the AgentsPlugin streaming path to cut duplication and shrink the two large methods.

- normalize-result.ts: void->"", JSON-stringify, 50K truncation with a human-readable marker. Unit-testable (previously covered only via the HTTP path).
- consume-adapter-stream.ts: the 'message_delta' + 'message' accumulation loop shared between _streamAgent and runSubAgent. Accepts an optional signal and per-event side-effect callback (for SSE translation).
- tool-dispatch.ts: one place that fans out toolkit/function/mcp/subagent entries. A 'never'-typed default forces exhaustiveness: adding a fifth source is now a compile error at every call site.

_streamAgent: the executeTool closure shrinks from ~60 lines of dispatch + normalize to a single dispatchToolCall + normalizeToolResult call. Stream consumption collapses to consumeAdapterStream.

runSubAgent: childExecute shrinks from ~30 lines of if/else dispatch to one dispatchToolCall call. The adapter loop collapses to consumeAdapterStream.

Behaviour change (minor): childExecute previously fell through silently to 'Unsupported sub-agent tool source' when mcpClient or PluginContext was missing; now it throws the same specific error as the main stream, matching the main-path behaviour.

Tests: 15 new unit tests for normalizeToolResult + consumeAdapterStream. dispatchToolCall is exercised transitively through the full agent suite (288 existing tests still pass, 303 total on this branch).
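The normalization rules listed for normalize-result.ts can be sketched as one small function. The name, the 50,000-character default, and the exact truncation marker text are assumptions. Passing strings through unquoted is also an assumption; everything else goes through `JSON.stringify`:

```ts
// Hypothetical sketch of tool-result normalization; not the real module.
function normalizeToolResultSketch(result: unknown, max = 50_000): string {
  if (result === undefined) return ""; // void tools produce an empty result
  // strings pass through; everything else is JSON-encoded
  const text = typeof result === "string" ? result : (JSON.stringify(result) ?? "");
  if (text.length <= max) return text;
  const dropped = text.length - max;
  // keep the head and append a marker the model (and a human) can read
  return `${text.slice(0, max)}\n[truncated ${dropped} characters]`;
}
```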

… → def The `annotations` field (notably `destructive: true`) was silently dropped as tools flowed from `tool({...})` into the resolved `AgentToolDefinition`, so user-defined destructive tools never triggered the approval gate.

- `ToolConfig` now accepts `annotations?: ToolAnnotations`.
- `tool()` forwards it to the returned `FunctionTool`.
- `FunctionTool` exposes `annotations` and `functionToolToDefinition` preserves it on the definition it builds.
- `AgentsPlugin` reads the flag via `isDestructiveToolEntry()` (falls back to `functionTool.annotations` so a future divergence between def and function cannot re-introduce the bug) and emits the merged annotations via `combinedToolAnnotations()` on the `approval_pending` SSE payload.

Covered by `tests/tool-approval-gate.test.ts` and `tests/function-tool.test.ts`.

ToolAnnotations.destructive is binary and has started to mislead: "save_view" captures a screenshot and creates a new file, which is nothing like deleting a dashboard, yet both trip the same red "destructive" approval card. This adds a semantic `effect` enum with four tiers (`read`, `write`, `update`, `destructive`) so tool authors can tell the UI what blast radius they actually have. The approval gate fires for any mutating effect (`write`/`update`/`destructive`) and continues to honour the legacy `destructive: true` flag so existing tools keep their current red treatment without migration. Callers consuming `annotations` over the wire (MCP clients, approval UIs) can now differentiate; the playground will ship a tiered approval card as a follow-up.
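The gate rule described in this commit reduces to a small predicate. `ToolEffect`, `ToolAnnotationsSketch`, and `requiresApproval` are hypothetical names. The four tiers and the legacy-flag fallback come from the text above:

```ts
type ToolEffect = "read" | "write" | "update" | "destructive";

// Hypothetical subset of ToolAnnotations relevant to the approval gate.
interface ToolAnnotationsSketch {
  effect?: ToolEffect;
  destructive?: boolean; // legacy boolean flag
}

function requiresApproval(ann: ToolAnnotationsSketch | undefined): boolean {
  if (!ann) return false;
  if (ann.destructive === true) return true; // legacy flag keeps working
  // any mutating tier forces the gate; "read" does not
  return ann.effect === "write" || ann.effect === "update" || ann.effect === "destructive";
}
```

Checking `effect` and the legacy flag in the same predicate keeps a tool that sets only `effect: "destructive"` from bypassing the gate.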

Follow-up for connector relocate: re-export AppKitMcpClient from connectors/mcp. Adjust Vitest mock pool typing without non-null mocks. Signed-off-by: MarioCadenas <[email protected]>

pkosiec · 2026-05-06T13:51:22Z

should we hide this plugin until all PRs are merged?

pkosiec · 2026-05-06T13:52:14Z

btw maybe we should rename this file as beta.gen.ts? so that we don't review an autogenerated file?

(and add "generated, do not edit" header)

pkosiec · 2026-05-06T13:52:54Z

    "@types/semver": "7.7.1",
    "dotenv": "16.6.1",
    "express": "4.22.0",
+    "js-yaml": "^4.1.1",


Suggested change

"js-yaml": "^4.1.1",

"js-yaml": "4.1.1",

pkosiec · 2026-05-06T13:56:35Z

Agentic review:

Code Review: agent/v2/4-agents-plugin

Scope

Branch: agent/v2/4-agents-plugin (16 commits, 48 files, +5563/-32 lines)
Base: main at a7ebc57
Mode: report-only (plan mode)

Intent

Add a complete agents() plugin to AppKit: markdown-driven agent definitions with folder-based discovery (<id>/agent.md), code-defined agents via createAgent(), a human-in-the-loop approval gate for destructive tool calls, sub-agent delegation with depth limiting, SSE streaming with Responses API-compatible event translation, a pluggable ThreadStore, MCP tool integration, toolkit inheritance, DoS protection (concurrent stream limits, tool-call budgets), relocation of the MCP client, and extraction of shared utilities.

Review Team

correctness (always)

testing (always)

maintainability (always)

security -- tool approval gates, OBO token handling, user auth, input validation

api-contract -- new REST routes, SSE events, Responses API compat

reliability -- abort signals, timeouts, event channel, async error paths

adversarial -- 5500+ line diff, external APIs, user input

kieran-typescript -- large TypeScript codebase, discriminated unions, type guards

Findings

P1 -- Critical

| # | File | Issue | Reviewer | Confidence |
|---|------|-------|----------|------------|
| 1 | `agents.ts:751` | Approval gate ignores `effect` field. The gate checks `entry.def.annotations?.destructive === true` but never reads `effect`. A tool with `effect: "destructive"` (the new preferred API) and no legacy `destructive: true` boolean bypasses approval entirely. The JSDoc on `tool.ts:13-16` and `function-tool.ts:12-15` explicitly states "any mutating value forces the agents-plugin approval gate." The implementation contradicts the documented contract. Fix: check `const isDestructive = ann?.destructive === true \|\| ann?.effect === "destructive";` (or broaden to `write`/`update` if the JSDoc intent holds). | security, correctness | 0.95 |
| 2 | `agents.ts:991-1023` | Sub-agent `childExecute` bypasses tool-call budget. The `toolCallsUsed` counter and budget check live in the top-level `executeTool` closure (line 735-745). `runSubAgent` creates its own `childExecute` that never increments or checks the budget. A sub-agent can make unlimited tool calls, defeating `limits.maxToolCalls`. Fix: pass the budget counter (or a shared budget object) into `runSubAgent` and check on every child tool call. | correctness, adversarial | 0.95 |
| 3 | `agents.ts:991-1023` | Sub-agent `childExecute` bypasses approval gate. The top-level `executeTool` checks `approvalPolicy.requireForDestructive` (line 751) before executing destructive tools and emits `approval_pending` events. `childExecute` in `runSubAgent` does none of this. A sub-agent calling a destructive tool executes it without human approval. Fix: apply the same approval check in `childExecute`, or factor the gate logic into a shared helper. | security | 0.90 |
| 4 | `agents.ts:661-707` | `/invocations` bypasses concurrent stream limit. `_handleChat` checks `countUserStreams(userId) >= limits.maxConcurrentStreamsPerUser` before streaming. `_handleInvocations` calls `_streamAgent` without the same check. A client can bypass the rate limit by hitting `/invocations` instead of `/chat`. Fix: add the same guard to `_handleInvocations`. | security, reliability | 0.92 |
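Findings #2 and #3 share a root cause: the sub-agent path builds its own executor. One way to close both, sketched here under stated assumptions, is a single guarded executor that owns the budget and the approval check and is handed to both the parent and `runSubAgent`. `ToolCallBudget`, `createGuardedExecutor`, `requestApproval`, and the `ToolEntry` shape are hypothetical, not the plugin's actual internals.

```typescript
// Sketch only: ToolCallBudget, createGuardedExecutor and the ToolEntry shape
// are hypothetical, not the plugin's actual internals.
type Effect = "read" | "write" | "update" | "destructive";

interface ToolEntry {
  name: string;
  annotations?: { effect?: Effect; destructive?: boolean };
  run: (args: unknown) => Promise<unknown>;
}

/** One counter shared by the top-level run and every sub-agent it spawns. */
class ToolCallBudget {
  private used = 0;
  private readonly max: number;

  constructor(max: number) {
    this.max = max;
  }

  /** Throws once the shared limit is reached; otherwise records one call. */
  consume(toolName: string): void {
    if (this.used >= this.max) {
      throw new Error(`Tool-call budget of ${this.max} exhausted at ${toolName}`);
    }
    this.used += 1;
  }

  get count(): number {
    return this.used;
  }
}

function needsApproval(tool: ToolEntry): boolean {
  const ann = tool.annotations;
  return ann?.destructive === true || (ann?.effect !== undefined && ann.effect !== "read");
}

/**
 * Returns the single execute function that both the parent's executeTool and
 * runSubAgent's childExecute would call, so neither path can skip the
 * budget check or the approval gate.
 */
function createGuardedExecutor(
  budget: ToolCallBudget,
  requestApproval: (tool: ToolEntry, args: unknown) => Promise<boolean>,
): (tool: ToolEntry, args: unknown) => Promise<unknown> {
  return async (tool, args) => {
    budget.consume(tool.name); // denied calls still count, which caps approval spam
    if (needsApproval(tool) && !(await requestApproval(tool, args))) {
      return { error: `Tool "${tool.name}" was not approved` };
    }
    return tool.run(args);
  };
}
```

Because the same `ToolCallBudget` instance would be passed down to every nested agent, calls made by sub-agents count against the parent's `limits.maxToolCalls`, and the gate logic lives in one helper as finding #3 suggests.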

P2 -- Moderate

| # | File | Issue | Reviewer | Confidence |
|---|------|-------|----------|------------|
| 5 | `consume-adapter-stream.ts`, `normalize-result.ts` | Extracted utilities are dead code. `consumeAdapterStream` and `normalizeToolResult` were extracted into `core/agent/` (with tests), but no production code imports them. The same logic is still inlined in `agents.ts:815-826` (result normalization), `agents.ts:885-896` (stream consumption), `run-agent.ts:97-105`, and `agents.ts:1058-1067`. Three call sites duplicate the accumulation pattern. Either use the extracted functions or remove them. | maintainability | 0.95 |
| 6 | `agents.ts:148-156` | `reload()` is non-atomic. `this.agents.clear()` runs before `await this.loadAgents()`. If `loadAgents` throws (e.g. malformed markdown), the registry is empty and new requests get "No agent registered" errors. In-flight streams holding old references still work but new ones fail. Fix: build into a new Map, then swap on success. | reliability | 0.85 |
| 7 | `agents.ts:1072` | `_handleCancel` uses unsafe type assertion for `streamId`. Unlike `/chat`, `/approve`, and `/invocations` which use Zod schemas, the cancel route extracts `streamId` via `req.body as { streamId?: string }`. This is inconsistent and skips validation. Fix: add a small Zod schema (or validate inline). | api-contract, security | 0.80 |
| 8 | `agents.ts:815-826` | Tool result type inconsistency. When `serialized.length > MAX`, the return is a truncated string. When `<= MAX`, the return is the raw `result` (which may be an object). Adapters receive different types depending on length. The extracted `normalizeToolResult` has the same behavior (returns `result`, not `serialized`). This is intentional (preserving structured data for short results), but consider documenting the contract or always returning strings. | correctness | 0.70 |
| 9 | `event-channel.ts:23-31` | EventChannel has no backpressure. `push()` accumulates into `this.queue` without bound. If the SSE consumer is slow (network backpressure, paused tab), the queue grows indefinitely. For typical chat streams this is fine; for high-throughput tool-calling agents it could cause memory pressure. Consider a max queue size with drop/error policy. | performance | 0.65 |
| 10 | `load-agents.ts:106,141,145` | Synchronous filesystem I/O in agent loader. `fs.readFileSync`, `fs.existsSync`, `fs.readdirSync` block the event loop. Acceptable during startup, but `reload()` can be called at runtime via the exported `appkit.agents.reload()` API, which would block the event loop while reading files. | performance | 0.65 |
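Finding #8 offers two options: document the mixed return type, or always return strings. A minimal sketch of the second option follows; the 50,000-character limit and the truncation marker are assumptions, since the review does not show the real `MAX` or message format. A single helper like this would also give the duplicated call sites in finding #5 one function to import.

```typescript
// Sketch only: the limit and the truncation marker are assumptions; the review
// does not show the real MAX or message format.
const MAX_TOOL_RESULT_CHARS = 50_000;

/** Serializes any tool result to a string without throwing. */
function serializeToolResult(result: unknown): string {
  if (typeof result === "string") return result;
  try {
    // JSON.stringify returns undefined for undefined, functions and symbols.
    return JSON.stringify(result) ?? String(result);
  } catch {
    return String(result); // circular structures, BigInt, etc.
  }
}

/** Always returns a string, truncated to `max` characters plus a marker. */
function normalizeToolResult(result: unknown, max: number = MAX_TOOL_RESULT_CHARS): string {
  const serialized = serializeToolResult(result);
  if (serialized.length <= max) return serialized;
  return `${serialized.slice(0, max)}\n[truncated ${serialized.length - max} chars]`;
}
```

The trade-off is the one the review names: adapters lose structured data for short results, in exchange for a single, predictable contract.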

P3 -- Low

| # | File | Issue | Reviewer | Confidence |
|---|------|-------|----------|------------|
| 11 | `agents.ts:1176-1188` | `printRegistry()` uses `console.log` instead of `logger`. Inconsistent with the rest of the file, which uses `createLogger("agents")`. The formatted output is intentionally styled with picocolors, so this may be deliberate for terminal aesthetics. | maintainability | 0.60 |

Coverage

Suppressed: 0 findings below 0.60 confidence.

Untracked files: None.

Testing: Test coverage is extensive (~1800 lines of tests across 7 test files). Tests cover plugin lifecycle, event translation, thread store, approval gate, DoS limits, and load-agents. The tool-call budget bypass in sub-agents (#2) is not tested, and neither is the interaction between the `effect` field and the approval gate (#1).

Residual risks: The InMemoryThreadStore has no eviction, bounds, or TTL. The code warns about this in both prod and dev, which is good. A follow-up for bounded eviction is mentioned in comments.
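For the bounded eviction follow-up, one common approach is an LRU cap built on `Map` insertion order. The sketch below assumes a simple thread shape; `Thread` and `BoundedThreadStore` are hypothetical and AppKit's `ThreadStore` interface may differ. A TTL could be layered on in the same way by storing a timestamp per entry.

```typescript
// Sketch only: Thread and BoundedThreadStore are hypothetical; AppKit's
// ThreadStore interface may differ.
interface Thread {
  id: string;
  messages: { role: string; content: string }[];
}

/** In-memory store that evicts the least recently used thread past `maxThreads`. */
class BoundedThreadStore {
  // Map preserves insertion order, so the first key is always the LRU entry.
  private readonly threads = new Map<string, Thread>();
  private readonly maxThreads: number;

  constructor(maxThreads: number) {
    if (maxThreads < 1) throw new RangeError("maxThreads must be >= 1");
    this.maxThreads = maxThreads;
  }

  get(id: string): Thread | undefined {
    const thread = this.threads.get(id);
    if (thread) {
      // Re-insert to mark as most recently used.
      this.threads.delete(id);
      this.threads.set(id, thread);
    }
    return thread;
  }

  set(thread: Thread): void {
    this.threads.delete(thread.id);
    this.threads.set(thread.id, thread);
    while (this.threads.size > this.maxThreads) {
      const oldest = this.threads.keys().next().value as string;
      this.threads.delete(oldest);
    }
  }

  get size(): number {
    return this.threads.size;
  }
}
```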

Verdict: Not ready -- fix P1 items before merge

Fix order:

#1 (approval gate `effect` gap) -- security hole; contradicts the documented API

#2 + #3 (sub-agent budget + approval bypass) -- these are the same code path; fix together

#4 (`/invocations` rate-limit bypass) -- one-line fix

#5 (dead code) -- either wire up the extracted utilities or delete them

P2 items #6-#10 are lower priority and could be addressed in follow-ups, though #6 (`reload()` atomicity) is worth fixing now since it's small.
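The fix the review proposes for #6 (build into a new `Map`, then swap on success) can be sketched as follows. `AgentRegistry`, `AgentDefinition`'s fields, and the injected `load` callback are hypothetical stand-ins for the plugin's `agents` map and `loadAgents()`. Making the loader asynchronous (for example with `fs/promises`) would also address #10.

```typescript
// Sketch only: AgentRegistry and the loader signature are hypothetical.
interface AgentDefinition {
  name: string;
  instructions: string;
}

class AgentRegistry {
  private agents = new Map<string, AgentDefinition>();
  private readonly load: () => Promise<AgentDefinition[]>;

  constructor(load: () => Promise<AgentDefinition[]>) {
    this.load = load;
  }

  /**
   * Builds the replacement map first and swaps it in only on success,
   * so a malformed agent file leaves the previous registry serving requests.
   */
  async reload(): Promise<void> {
    const next = new Map<string, AgentDefinition>();
    for (const def of await this.load()) {
      next.set(def.name, def);
    }
    this.agents = next; // single assignment: readers see old or new, never empty
  }

  get(name: string): AgentDefinition | undefined {
    return this.agents.get(name);
  }
}
```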

pkosiec · 2026-05-06T13:59:25Z

You might consider using an external lib:

EventChannel (~70 lines) -- This is a basic unbounded async queue (push/consume as async iterable). Existing alternatives:

@repeaterjs/repeater -- Almost exactly this API: push-based async iterable creation with close/error semantics and optional backpressure. Would replace EventChannel entirely.

Node.js built-in events.on(emitter, event) -- Returns an AsyncIterator from an EventEmitter. Could work but is clunkier (requires wrapping in an EventEmitter, less clean close/error semantics).

Web Streams ReadableStream -- The ReadableStream constructor with a controller gives push/pull with built-in backpressure. Slightly heavier API surface.

@repeaterjs/repeater is the cleanest drop-in. That said, 70 lines of zero-dependency code is defensible in an SDK -- adding a dependency has its own cost.
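If the zero-dependency route is kept, finding #9's "max queue size with drop/error policy" could look roughly like the following. `BoundedEventChannel` and `OverflowPolicy` are hypothetical; AppKit's `EventChannel` API may differ. Dropping the oldest event is only safe for events the client can afford to lose (not, say, `approval_pending`), so `error` is the conservative default here.

```typescript
// Sketch only: BoundedEventChannel and OverflowPolicy are hypothetical.
type OverflowPolicy = "drop-oldest" | "error";

interface Waiter<T> {
  resolve: (result: IteratorResult<T>) => void;
  reject: (err: Error) => void;
}

class BoundedEventChannel<T> implements AsyncIterable<T> {
  private readonly queue: T[] = [];
  // Consumers only wait when the queue is empty, so waiters and queue never both hold items.
  private readonly waiters: Waiter<T>[] = [];
  private readonly maxQueue: number;
  private readonly policy: OverflowPolicy;
  private closed = false;
  private failure: Error | undefined;

  constructor(maxQueue: number, policy: OverflowPolicy = "error") {
    this.maxQueue = maxQueue;
    this.policy = policy;
  }

  push(value: T): void {
    if (this.closed) return; // late events after close/fail are ignored
    const waiter = this.waiters.shift();
    if (waiter) {
      waiter.resolve({ value, done: false }); // hand off directly, no buffering
      return;
    }
    if (this.queue.length >= this.maxQueue) {
      if (this.policy === "drop-oldest") {
        this.queue.shift();
      } else {
        this.fail(new Error(`EventChannel overflow: more than ${this.maxQueue} queued events`));
        return;
      }
    }
    this.queue.push(value);
  }

  close(): void {
    if (this.closed) return;
    this.closed = true;
    for (const w of this.waiters.splice(0)) w.resolve({ value: undefined, done: true });
  }

  fail(err: Error): void {
    if (this.closed) return;
    this.closed = true;
    this.failure = err;
    for (const w of this.waiters.splice(0)) w.reject(err);
  }

  [Symbol.asyncIterator](): AsyncIterator<T> {
    return {
      next: (): Promise<IteratorResult<T>> => {
        // Already-buffered events are drained before a failure is surfaced.
        if (this.queue.length > 0) {
          return Promise.resolve({ value: this.queue.shift() as T, done: false });
        }
        if (this.failure) return Promise.reject(this.failure);
        if (this.closed) return Promise.resolve({ value: undefined, done: true });
        return new Promise((resolve, reject) => this.waiters.push({ resolve, reject }));
      },
    };
  }
}
```

This keeps the push/consume-as-async-iterable shape while making the memory bound explicit; `@repeaterjs/repeater`'s buffers or a `ReadableStream` queuing strategy would provide the same guarantee off the shelf.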

MarioCadenas force-pushed the agent/v2/3-plugin-infra branch from a5642df to e26795b on April 21, 2026 20:41

MarioCadenas force-pushed the agent/v2/4-agents-plugin branch from 3c7c35e to cb7fe2b on April 21, 2026 20:41

MarioCadenas force-pushed the agent/v2/3-plugin-infra branch from e26795b to d73e138 on April 22, 2026 08:45

MarioCadenas force-pushed the agent/v2/4-agents-plugin branch from cb7fe2b to 0afea5e on April 22, 2026 08:45

MarioCadenas mentioned this pull request on Apr 22, 2026 in feat(appkit): zero-trust MCP host policy with URL allowlist and scoped auth #307 (closed, 7 tasks)

MarioCadenas force-pushed the agent/v2/4-agents-plugin branch from 0afea5e to 983461c on April 22, 2026 09:24

MarioCadenas force-pushed the agent/v2/4-agents-plugin branch from 983461c to a7b0444 on April 22, 2026 09:46

MarioCadenas force-pushed the agent/v2/4-agents-plugin branch from a7b0444 to 623792d on April 22, 2026 09:59

MarioCadenas force-pushed the agent/v2/4-agents-plugin branch from 89ce0e8 to 6c7291b on April 27, 2026 14:33

MarioCadenas force-pushed the agent/v2/3-plugin-infra branch from f361bd8 to e5ec02f on April 29, 2026 17:44

MarioCadenas force-pushed the agent/v2/4-agents-plugin branch from 6c7291b to d0a4596 on April 29, 2026 17:44

MarioCadenas force-pushed the agent/v2/3-plugin-infra branch from e5ec02f to a02ab55 on April 29, 2026 18:09

MarioCadenas force-pushed the agent/v2/4-agents-plugin branch from d0a4596 to 85603f7 on April 29, 2026 18:09

MarioCadenas force-pushed the agent/v2/3-plugin-infra branch from a02ab55 to 5bf6b22 on April 29, 2026 18:19

MarioCadenas force-pushed the agent/v2/4-agents-plugin branch 2 times, most recently from 1b72080 to a6567bc on April 29, 2026 18:36

MarioCadenas force-pushed the agent/v2/3-plugin-infra branch from 5bf6b22 to 863439e on April 29, 2026 18:36

MarioCadenas force-pushed the agent/v2/3-plugin-infra branch from 863439e to cca914f on April 29, 2026 20:18

MarioCadenas force-pushed the agent/v2/4-agents-plugin branch from a6567bc to 5d0fae2 on April 29, 2026 20:18

MarioCadenas force-pushed the agent/v2/4-agents-plugin branch from 5d0fae2 to af9b6ee on May 4, 2026 09:22

MarioCadenas force-pushed the agent/v2/3-plugin-infra branch from cca914f to a3d2cc6 on May 4, 2026 09:22

MarioCadenas force-pushed the agent/v2/3-plugin-infra branch from a3d2cc6 to f495962 on May 4, 2026 09:41

MarioCadenas force-pushed the agent/v2/4-agents-plugin branch from af9b6ee to e4b1322 on May 4, 2026 09:41

MarioCadenas added 10 commits May 5, 2026 12:38

refactor(appkit): generalize default base system prompt

b07606d

Tool-agnostic guidelines instead of SQL/files-specific defaults; accept full PromptContext in buildBaseSystemPrompt for parity with custom callbacks. Signed-off-by: MarioCadenas <[email protected]>

fix(appkit): MCP barrel export + lakebase mock typing after move

5d8d684

Follow-up for connector relocate: re-export AppKitMcpClient from connectors/mcp. Adjust Vitest mock pool typing without non-null mocks. Signed-off-by: MarioCadenas <[email protected]>

pkosiec reviewed May 6, 2026


feat(appkit): agents() plugin, createAgent(def), and markdown-driven agents#304

MarioCadenas wants to merge 10 commits into agent/v2/3-plugin-infra from agent/v2/4-agents-plugin

MarioCadenas commented Apr 21, 2026 (edited)



createAgent(def) — pure factory

agents() plugin

Human-in-the-loop approval gate

Zero-trust MCP host policy

DoS caps

runAgent(def, input) — standalone executor

Event translation and thread storage

Frontmatter loader

Plumbing

Test plan

PR Stack

Demo
