An always-on codebase engineer that audits, optimizes, and keeps your code healthy — delivered with proof, not promises.
Quick start · What Kai does · Proof · Run anywhere · Extend · Architecture
/teleport: hand your live session to a cloud sandbox and keep the conversation going from any browser.
Kai is an autonomous AI engineer that runs on your codebase. Talk to it like you'd talk to a senior teammate; it reads your code, picks what matters, runs the work, and comes back with verified results.
- Code audits — finds real vulnerabilities and proves them with working exploit code, then proposes verified fixes.
- Code optimization — hunts the most expensive functions, runs evolutionary search against a custom fitness function, and ships benchmarked improvements as PRs.
- Codebase hygiene — unifies patterns left by different AI tools, removes dead code, fixes naming drift, keeps the codebase coherent as it grows.
- PR reviews that run the code — simulates edge cases instead of leaving comments.
- Continuous monitoring — schedules recurring work via cron and reports back only when something matters.
- Run anywhere: /teleport a session to a cloud sandbox and continue in the browser, or /hand-off a long task to a sandboxed agent while you keep working locally.
Everything Kai ships has been through an isolated verification harness. If it can't prove a finding or confirm a fix, it marks the result as unverified. No confidence theater.
Kai's two specialist harnesses don't just describe problems — they produce artifacts you can run and review. Both return a normalized report (harness, status, summary, findings, artifacts, metrics), so a finding always comes with the evidence attached.
Security — security_scan runs the kai-security pipeline: discover → verify → write a PoC exploit → patch. A finding only lands as verified once the exploit actually fires.
Optimization — evolve_optimize runs AlphaEvolve-style evolutionary search (mutate → evaluate → select) against a fitness function you define, and returns the best program it found plus the metrics that beat the baseline.
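To make the mutate → evaluate → select loop concrete, here is a toy sketch in plain Python. It is not the kai-evolve implementation: in AlphaEvolve-style systems the candidates are programs and mutations typically come from a language model, whereas here a candidate is just a list of numbers. The fitness function, the Gaussian mutation, and every parameter name are assumptions chosen for illustration.

```python
import random


def fitness(candidate: list[float]) -> float:
    """Hypothetical fitness: higher is better, best when every value is 3.0."""
    return -sum((x - 3.0) ** 2 for x in candidate)


def mutate(candidate: list[float], rng: random.Random, scale: float = 0.5) -> list[float]:
    """Return a copy of the candidate with Gaussian noise added to each value."""
    return [x + rng.gauss(0.0, scale) for x in candidate]


def evolve(seed_candidate, generations=20, population=16, survivors=4, seed=0):
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    pool = [seed_candidate]
    for _ in range(generations):
        # mutate: each child is derived from a randomly chosen survivor
        children = [mutate(rng.choice(pool), rng) for _ in range(population)]
        # evaluate + select: rank parents and children together, keep the top few
        pool = sorted(pool + children, key=fitness, reverse=True)[:survivors]
    return pool[0], fitness(pool[0])


baseline = [0.0, 0.0]
best, best_score = evolve(baseline)
print(f"baseline={fitness(baseline):.3f} best={best_score:.3f}")
```

Because parents compete with their children at selection time, the best score in this sketch never drops below the baseline from one generation to the next.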
// evolve_optimize on a hot path, scored by your evaluator (illustrative output)
{
  "harness": "kai_evolve",
  "status": "ok",
  "summary": "best of 40 candidates over 12 generations",
  "metrics": { "score": 0.991, "p50_latency_ms": 31.4, "baseline_latency_ms": 88.7 },
  "artifacts": ["runs/best/best_program.py"]
}

Numbers depend entirely on your repo and fitness function. The point is that every result is backed by a runnable artifact (a PoC, a patch diff, an evolved program), not a confidence score. See Extend Kai to enable these harnesses.
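If you want to consume a report in your own tooling, a small typed wrapper is one way to do it. The sketch below assumes only the six field names listed above and the illustrative metrics from the sample output; the HarnessReport class and its speedup helper are hypothetical and not part of Kai.

```python
import json
from dataclasses import dataclass, field


@dataclass
class HarnessReport:
    """Hypothetical container for the normalized report fields."""
    harness: str
    status: str
    summary: str
    findings: list = field(default_factory=list)
    artifacts: list = field(default_factory=list)
    metrics: dict = field(default_factory=dict)

    @classmethod
    def from_json(cls, text: str) -> "HarnessReport":
        data = json.loads(text)
        # Ignore keys this sketch does not know about.
        known = {k: data[k] for k in cls.__dataclass_fields__ if k in data}
        return cls(**known)

    def speedup(self):
        """Baseline latency divided by the new p50 latency, or None if either is absent."""
        base = self.metrics.get("baseline_latency_ms")
        new = self.metrics.get("p50_latency_ms")
        if not base or not new:
            return None
        return base / new


raw = """
{
  "harness": "kai_evolve",
  "status": "ok",
  "summary": "best of 40 candidates over 12 generations",
  "metrics": {"score": 0.991, "p50_latency_ms": 31.4, "baseline_latency_ms": 88.7},
  "artifacts": ["runs/best/best_program.py"]
}
"""
report = HarnessReport.from_json(raw)
print(report.harness, report.status, f"{report.speedup():.2f}x")
```

The metric key names are taken from the illustrative output only; a real evaluator may report different metrics, in which case speedup returns None.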
Requirements: Python 3.12+ and an LLM API key (OpenRouter / Anthropic / OpenAI).
Clone with submodules — the terminal tool depends on the mini-swe-agent submodule:
git clone --recurse-submodules https://github.com/firstbatchxyz/kai-agent.git
cd kai-agent
# already cloned without --recurse-submodules? run: git submodule update --init

No account or backend needed; bring your own LLM key. Use either the web UI or the terminal CLI:
pip install -e ".[ui]" # FastAPI + the web UI extras
kai ui # opens http://127.0.0.1:3001 and walks you through setup

kai ui forces local mode (KAI_LOCAL_MODE=1): your key, tokens, and data live in ~/.kai-agent/ and nothing is sent to a Kai backend. See ui/README.md for make run / make dev and the local architecture. For the terminal REPL instead:
pip install -e ".[all]"
cp .env.example .env # set OPENROUTER_API_KEY (or ANTHROPIC_API_KEY / OPENAI_API_KEY)
cp cli-config.yaml.example cli-config.yaml
python cli.py

For a single-shot query instead of the interactive REPL:

python cli.py --query="Audit my top repo for exploits. Lead with the highest-impact finding."

To run against the managed Kai platform (hosted audits, evolution, etc.), additionally set a KAI_JWT_TOKEN from the Kai platform in .env. The JWT is only needed for hosted features; local mode above never requires it.
More providers (z.ai / GLM, Kimi, MiniMax, Nous Portal) are configurable in cli-config.yaml. Terminal execution can run locally, in Docker, on Modal, or over SSH — see .env.example.
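Before the first run, a quick preflight check can confirm that the environment matches the setup described above. This is a sketch under assumptions, not something Kai ships: it only knows the three key variables and KAI_JWT_TOKEN named in this section, and the preflight function itself is hypothetical. Providers configured through cli-config.yaml are outside its scope.

```python
import os

# Variable names taken from the setup steps above.
KEY_VARS = ("OPENROUTER_API_KEY", "ANTHROPIC_API_KEY", "OPENAI_API_KEY")


def configured_keys(env=None):
    """Return the names of the LLM key variables that are set and non-empty."""
    env = os.environ if env is None else env
    return [name for name in KEY_VARS if env.get(name, "").strip()]


def preflight(env=None, hosted=False):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    problems = []
    if not configured_keys(env):
        problems.append("set one of: " + ", ".join(KEY_VARS))
    # The JWT only matters when hosted platform features are wanted.
    if hosted and not env.get("KAI_JWT_TOKEN", "").strip():
        problems.append("hosted features need KAI_JWT_TOKEN")
    return problems


for problem in preflight():
    print("missing:", problem)
```

Passing a plain dict as env makes the check easy to exercise without touching the real environment.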
- It speaks in findings, not capabilities. After a brief intro, no more self-description.
- It's concrete. "142K LOC across 3 services, 12 vulnerable deps, 47 changes to payment-service in 30 days" beats "I can see your repos."
- It ranks by impact × confidence × visibility. The first thing it surfaces is the most impressive thing it can prove.
- Read-only work (browsing repos, listing resources, checking infrastructure) happens without asking. Anything that costs credits or modifies external systems waits for your confirmation.
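The impact × confidence × visibility ranking from the list above can be sketched as a simple product score. The 0-to-1 scales, the Finding class, and the sample findings are all hypothetical; how Kai actually estimates each factor is not specified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    title: str
    impact: float      # assumed scale: 0.0 (trivial) to 1.0 (critical)
    confidence: float  # assumed scale: 0.0 (guess) to 1.0 (proven)
    visibility: float  # assumed scale: 0.0 (obscure) to 1.0 (user-facing)

    @property
    def priority(self) -> float:
        return self.impact * self.confidence * self.visibility


findings = [
    Finding("hypothetical: outdated dev-only dependency", 0.3, 0.9, 0.2),
    Finding("hypothetical: SQL injection in a public endpoint", 0.9, 0.8, 0.9),
    Finding("hypothetical: slow hot-path function", 0.6, 0.7, 0.5),
]

# Highest product first: this is what would be surfaced first.
ranked = sorted(findings, key=lambda f: f.priority, reverse=True)
for f in ranked:
    print(f"{f.priority:.3f}  {f.title}")
```

A product, unlike a sum, pushes a finding down sharply when any single factor is weak, which matches the idea of leading with what can be proven.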
Kai's session isn't tied to your terminal. Two commands move the work to a cloud sandbox over a provider-agnostic control plane:
- /teleport: copies your workspace and the live conversation to a sandbox, then resumes the same session in your browser at the sandbox URL (that's the demo at the top). Close the laptop; pick it back up from the web. Your local TUI stays attached and can end the sandbox at any time.
- /hand-off: dispatches a full autonomous Kai into a sandbox as a background sub-agent. Keep talking to your local session while it works; it reports back when finished. Check status with /handoffs.
Pick the backend with KAI_SANDBOX_PROVIDER:
| Provider | Value | Notes |
|---|---|---|
| E2B | e2b (default) | hosted micro-VM sandboxes |
| Vercel Sandbox | vercel | ephemeral sandboxes on Vercel |
| Local | local | a working dir on your machine; no cloud, great for testing |
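The selection rule in the table (three accepted values, e2b when nothing is set) can be sketched as follows. The resolve_provider function is hypothetical, and treating the value case-insensitively and rejecting unknown values with an error are assumptions about behavior the table does not specify.

```python
import os

# Values and notes copied from the provider table.
SANDBOX_PROVIDERS = {
    "e2b": "hosted micro-VM sandboxes",
    "vercel": "ephemeral sandboxes on Vercel",
    "local": "a working dir on your machine",
}
DEFAULT_PROVIDER = "e2b"


def resolve_provider(env=None) -> str:
    """Pick the sandbox backend from KAI_SANDBOX_PROVIDER, falling back to the default."""
    env = os.environ if env is None else env
    # Assumption: the value is matched case-insensitively.
    value = env.get("KAI_SANDBOX_PROVIDER", "").strip().lower() or DEFAULT_PROVIDER
    if value not in SANDBOX_PROVIDERS:
        allowed = ", ".join(sorted(SANDBOX_PROVIDERS))
        raise ValueError(f"unknown sandbox provider {value!r}; expected one of: {allowed}")
    return value


print(resolve_provider({"KAI_SANDBOX_PROVIDER": "local"}))
```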
KAI_SANDBOX_PROVIDER=local python cli.py # then: /teleport or /hand-off "investigate TODO.md and report back"

Kai has two sets of tools:
Built-in — always available, run inside the agent process: web search (Firecrawl), a full terminal (local / Docker / Modal / SSH), file operations, browser automation, session memory, skill management, cron scheduling, delegation, image generation, TTS, and code execution.
Kai platform (MCP) — tools the backend exposes over Model Context Protocol. A small always-on core (workspaces, repos, lifecycle actions, agent data). The rest are grouped into categories and activated on demand:
- audit: start audits, list vulnerabilities, generate exploits
- optimization: run evolutions, manage evaluators, monitor progress
- repo_management: add GitHub repos, upload local code
- budget: credits, billing, subscription usage
- integrations: create GitHub issues, Jira tickets
- artifacts: generate and share security reports
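One way to picture "a small core plus categories activated on demand" is a catalog that only exposes a category's tools once it has been switched on. The ToolCatalog class below is a hypothetical sketch, not the MCP integration itself; the entries are the capability descriptions from the list above, not real tool identifiers.

```python
# Always-on core, as described above (descriptions, not real tool names).
CORE_TOOLS = ("workspaces", "repos", "lifecycle actions", "agent data")

# Category names from the list above, mapped to the capabilities they describe.
CATEGORIES = {
    "audit": ("start audits", "list vulnerabilities", "generate exploits"),
    "optimization": ("run evolutions", "manage evaluators", "monitor progress"),
    "repo_management": ("add GitHub repos", "upload local code"),
    "budget": ("credits", "billing", "subscription usage"),
    "integrations": ("create GitHub issues", "Jira tickets"),
    "artifacts": ("generate and share security reports",),
}


class ToolCatalog:
    """Hypothetical catalog: core tools are always visible, categories are opt-in."""

    def __init__(self):
        self._active = []

    def activate(self, category: str) -> None:
        if category not in CATEGORIES:
            raise KeyError(f"unknown category: {category}")
        if category not in self._active:  # activating twice is a no-op
            self._active.append(category)

    def visible(self) -> list:
        tools = list(CORE_TOOLS)
        for category in self._active:
            tools.extend(CATEGORIES[category])
        return tools


catalog = ToolCatalog()
before = len(catalog.visible())
catalog.activate("audit")
print(before, "->", len(catalog.visible()))
```

Keeping inactive categories out of the visible set is a common way to keep the tool list an agent sees short until a task actually needs more.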
Skills are reusable workflow playbooks Kai can load mid-session. Out of the box Kai ships with skills for audit workflows, optimization runs, evaluator writing, vulnerability triage, continuous monitoring, serverless optimization, and more. Kai can also save new workflows it discovers as skills, so it improves at your codebase over time.
Browse skills/ for the full catalog.
| Platform | Transport | Use case |
|---|---|---|
| CLI | interactive REPL or --query | local dev, scripting |
| Slack | Socket Mode / Events API | team workspace, shared assistant |
| Discord | Gateway | community research bot |
| Telegram | Bot API | personal assistant |
| Email | IMAP / SMTP | async reports |
Gateway adapters live under gateway/platforms/.
Kai has a small hook-based plugin system (the OSS extensibility seam). A plugin is a directory with a plugin.yaml and an __init__.py that registers tools and/or lifecycle hooks — pre_tool_call, post_tool_call, transform_tool_result, on_session_start, on_session_end. Drop one into ~/.kai-agent/plugins/ (or the bundled plugins/) and it's discovered at startup. plugins/example_echo/ is a minimal template.
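To show how lifecycle hooks of this kind typically wrap a tool call, here is a self-contained sketch. Only the hook names come from the paragraph above. The HookRegistry class, the hook signatures, and the order in which hooks fire are assumptions for illustration and are not Kai's plugin API; plugins/example_echo/ is the place to look for the real registration pattern.

```python
from collections import defaultdict

# Hook names as listed above.
HOOK_NAMES = (
    "pre_tool_call",
    "post_tool_call",
    "transform_tool_result",
    "on_session_start",
    "on_session_end",
)


class HookRegistry:
    """Hypothetical registry; signatures and call order are illustrative."""

    def __init__(self):
        self._hooks = defaultdict(list)

    def register(self, name, fn):
        if name not in HOOK_NAMES:
            raise ValueError(f"unknown hook: {name}")
        self._hooks[name].append(fn)

    def call_tool(self, tool_name, tool_fn, **kwargs):
        for hook in self._hooks["pre_tool_call"]:
            hook(tool_name, kwargs)  # observe the call before it runs
        result = tool_fn(**kwargs)
        for hook in self._hooks["transform_tool_result"]:
            result = hook(tool_name, result)  # each transform may rewrite the result
        for hook in self._hooks["post_tool_call"]:
            hook(tool_name, result)  # observe the final result
        return result


calls = []
registry = HookRegistry()
registry.register("pre_tool_call", lambda name, args: calls.append(("pre", name)))
registry.register("transform_tool_result", lambda name, result: result.upper())
registry.register("post_tool_call", lambda name, result: calls.append(("post", name, result)))


def echo(text):
    return text


out = registry.call_tool("echo", echo, text="hello")
print(out, calls)
```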
Two reference plugins ship in plugins/ — Kai's specialist harnesses. Each installs into its own isolated uv venv, so their heavy dependencies never touch the agent's environment, and each can run locally or in a remote sandbox through the same control plane as /hand-off:
- kai-security (security_scan) is the security pipeline behind the example above: vulnerability discovery → verification → PoC exploit → patch.
- kai-evolve (evolve_optimize) runs AlphaEvolve-style evolutionary optimization against a fitness function and returns the best evolved program.
Enable them in ~/.kai-agent/config.yaml:
plugins:
  enabled: [kai_security, kai_evolve]

Then start with the matching toolset:
python cli.py --toolsets security # security_scan available
python cli.py --toolsets evolve # evolve_optimize available

┌────────────────────────────────────────────────────────────┐
│                         Kai Agent                          │
│                                                            │
│  ┌────────────┐      ┌────────────────────────────────┐    │
│  │ Persona +  │      │ Tool registry                  │    │
│  │ guidance   │──────▶  built-in: web, terminal,      │    │
│  │ blocks     │      │  file, browser, memory, cron,  │    │
│  └────────────┘      │  execute_code, delegate …      │    │
│        │             │                                │    │
│        ▼             │ MCP (kai_*): workspaces, repos,│    │
│  ┌────────────┐      │  audit, optimization, billing, │    │
│  │ Skills     │─────▶│  integrations, artifacts       │    │
│  │ (loaded on │      └────────────────────────────────┘    │
│  │ demand)    │                                            │
│  └────────────┘                                            │
└──────────────┬─────────────────────────────────────────────┘
               │
      ┌────────┼────────┐
      ▼        ▼        ▼
     CLI     Slack   Other gateways
- run_agent.py: the AIAgent class; conversation loop, tool-call dispatch, context management.
- agent/prompt_builder.py: Kai's identity and the guidance blocks that shape behavior.
- tools/: tool implementations + central registry.
- toolsets.py: which tools belong to which toolset, filtering logic.
- skills/: workflow playbooks (each is a directory with a SKILL.md).
- plugins/: hook-based plugins + the bundled sub-harnesses (kai-security, kai-evolve).
- deploy/sandbox/: the provider-agnostic sandbox control plane behind /teleport and /hand-off.
- gateway/: messaging-platform adapters.
- deploy/e2b-template/: the E2B sandbox template Kai runs in when hosted.
Deeper dive: docs/AGENTS.md for a contributor-facing tour of the codebase.
The essentials live in two files:
- .env: secrets and provider keys. Copy .env.example.
- cli-config.yaml: model defaults, provider routing, terminal backend, display preferences. Copy cli-config.yaml.example.
User-level settings (MCP server URLs, saved sessions, skills, memory, enabled plugins) live in ~/.kai-agent/. The CLI writes to this location, so it persists across updates.
See CONTRIBUTING.md for the dev setup, what to build (and what not to), and the PR process. Priorities: bug fixes first, then cross-platform compatibility and security hardening, then skills. New tools are rare — most capabilities should be skills.
Run the test suite:
pytest

MIT. See LICENSE.
Kai Agent was originally forked from Hermes Agent by Nous Research (MIT). The core conversation loop and tool-dispatch architecture trace back to that project; Kai has since grown its own identity, skill system, MCP integration with the Kai platform, and domain-specific toolset.
