
feat: web search + fetch tools for agents (SearXNG-backed) #661

Open
Wirasm wants to merge 1 commit into main from feature/web-search

Wirasm (Owner) commented Jun 8, 2026

What

Gives every kild agent two web tools so any model — including MiniMax, which has no provider-native search — can use the web:

  • webfetch: URL → markdown, in-process (turndown), keyless. Guards: 5 MB cap, 30/120 s timeouts, a UA fallback for bot walls, and an SSRF block on private/loopback hosts.
  • web_search: query → {title, url, snippet}, backed by a self-hosted SearXNG instance that kild only points at via KILD_SEARXNG_URL. The engine never spawns or manages the container. The backend sits behind a SearchProvider seam, so adding DDG / fastCRW / Tavily later means a new implementation plus a switch arm, with no tool or worker changes.
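
To make the SearchProvider seam concrete, here is a minimal sketch of how a SearXNG-backed provider could map results to {title, url, snippet}. Only the `SearchProvider` name and the hit fields come from this description; `SearchHit`, `FetchJson`, `mapSearxngResults`, `createSearxngProvider`, and `createProvider` are hypothetical names, and the code is not the implementation in this PR. It assumes SearXNG's JSON output (`/search?q=...&format=json`, which has to be enabled in the instance settings) returns a `results` array whose entries carry `title`, `url`, and `content`.

```typescript
// Hypothetical shapes: only the {title, url, snippet} fields and the
// SearchProvider name come from the PR description.
interface SearchHit {
  title: string;
  url: string;
  snippet: string;
}

interface SearchProvider {
  search(query: string): Promise<SearchHit[]>;
}

// Minimal fetch-like signature, injected so the sketch needs no network.
type FetchJson = (
  url: string,
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Map a SearXNG JSON payload to hits, skipping entries without a usable URL.
function mapSearxngResults(payload: unknown, limit = 5): SearchHit[] {
  const results =
    typeof payload === "object" && payload !== null
      ? (payload as { results?: unknown }).results
      : undefined;
  if (!Array.isArray(results)) return [];

  const hits: SearchHit[] = [];
  for (const entry of results) {
    if (typeof entry !== "object" || entry === null) continue;
    const { title, url, content } = entry as Record<string, unknown>;
    if (typeof url !== "string" || url === "") continue;
    hits.push({
      title: typeof title === "string" ? title : url, // fall back to the URL
      url,
      snippet: typeof content === "string" ? content : "",
    });
    if (hits.length >= limit) break;
  }
  return hits;
}

function createSearxngProvider(baseUrl: string, fetchJson: FetchJson): SearchProvider {
  const root = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return {
    async search(query: string): Promise<SearchHit[]> {
      const res = await fetchJson(
        `${root}/search?q=${encodeURIComponent(query)}&format=json`,
      );
      if (!res.ok) throw new Error(`SearXNG responded with HTTP ${res.status}`);
      return mapSearxngResults(await res.json());
    },
  };
}

// The "switch arm": a new backend would add one case here (assumed layout).
function createProvider(kind: string, baseUrl: string, fetchJson: FetchJson): SearchProvider {
  switch (kind) {
    case "searxng":
      return createSearxngProvider(baseUrl, fetchJson);
    default:
      throw new Error(`unknown search provider: ${kind}`);
  }
}
```

In a runtime with a global `fetch`, something like `createProvider("searxng", "http://localhost:8888", fetch)` would be the expected wiring. Injecting the fetch function keeps the mapping testable without a live container.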

How it wires in

Both tools register into the worker's customTools (beside the room tools) and the CLI's in-process fallback, so the engine and a standalone kild run behave the same. webfetch is available whenever web is enabled; web_search is available only when a backend is configured (otherwise the worker logs a one-line notice). KILD_WEB=off disables both. There are no protocol or cockpit changes: the tools surface as the generic tool events the UI already renders.
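
The gating rules above can be sketched as a small pure function. This is an illustration under assumptions, not the code in this PR: `planWebTools`, `WebToolPlan`, and the notice wording are hypothetical, and treating `KILD_WEB` case-insensitively is a guess. The environment is passed in as a plain record so the sketch does not depend on any runtime global.

```typescript
type Env = Record<string, string | undefined>;

interface WebToolPlan {
  tools: string[]; // tool names to register into customTools
  notice?: string; // one-line notice to log, if any
}

// Decide which web tools to register from environment settings.
function planWebTools(env: Env): WebToolPlan {
  // KILD_WEB=off disables both tools (case handling is an assumption).
  if ((env["KILD_WEB"] ?? "").trim().toLowerCase() === "off") {
    return { tools: [] };
  }

  // webfetch needs no backend, so it is available whenever web is enabled.
  const tools = ["webfetch"];

  // web_search needs a configured backend.
  const searxngUrl = (env["KILD_SEARXNG_URL"] ?? "").trim();
  if (searxngUrl === "") {
    return {
      tools,
      notice: "web_search disabled: KILD_SEARXNG_URL is not set", // hypothetical wording
    };
  }

  tools.push("web_search");
  return { tools };
}
```

A caller in the worker or the CLI fallback would pass its process environment and register whatever names come back, which is one way to keep both entry points behaving the same.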

Adds kild web search "<q>" / kild web fetch <url> debug commands (exercise the backend without spending agent tokens) and an optional infra/searxng/ compose for one-command bring-up.

Boundary

kild owns the tools; the search backend is external and swappable. webfetch needs no backend at all. This keeps the engine free of any container-lifecycle concern.

Validation

  • bun run typecheck, bun run lint, bun run compile — all green
  • bun test — 49 pass (+9 new: SearXNG JSON→hits mapping, fetch markdown/html/truncation/SSRF)
  • Live: kild web fetch https://example.com → clean markdown; search error paths + SSRF guard verified
  • ⏭️ SearXNG live container not run here (Docker daemon was down); provider mapping is unit-tested and infra/searxng/README.md has the bring-up + curl check
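
For reviewers who want a feel for what the SSRF tests exercise, a host check along these lines would cover the private/loopback cases named above. It is a minimal sketch with hypothetical names (`parseIpv4`, `isBlockedHost`), not the guard shipped in this PR. It only inspects the literal hostname, so it does not handle DNS names that resolve to private addresses, alternative IPv4 encodings (decimal or octal), or IPv4-mapped IPv6 addresses.

```typescript
// Parse a dotted-quad IPv4 literal; returns null for anything else.
function parseIpv4(host: string): number[] | null {
  const parts = host.split(".");
  if (parts.length !== 4) return null;
  const octets: number[] = [];
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const n = Number(part);
    if (n > 255) return null;
    octets.push(n);
  }
  return octets;
}

// True when the hostname is a loopback, private, or link-local literal.
function isBlockedHost(hostname: string): boolean {
  const host = hostname
    .toLowerCase()
    .replace(/^\[|\]$/g, "") // strip brackets around IPv6 literals
    .replace(/\.$/, ""); // strip a trailing root dot

  if (host === "localhost" || host.endsWith(".localhost")) return true;

  const v4 = parseIpv4(host);
  if (v4 !== null) {
    const a = v4[0] ?? 0;
    const b = v4[1] ?? 0;
    return (
      a === 0 || // 0.0.0.0/8
      a === 10 || // 10.0.0.0/8
      a === 127 || // loopback
      (a === 169 && b === 254) || // link-local
      (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
      (a === 192 && b === 168) // 192.168.0.0/16
    );
  }

  if (host.includes(":")) {
    return (
      host === "::1" || // loopback
      host === "::" || // unspecified
      /^f[cd][0-9a-f]{2}:/.test(host) || // unique local fc00::/7
      /^fe[89ab][0-9a-f]:/.test(host) // link-local fe80::/10
    );
  }

  return false;
}
```

A fetch tool would typically call this with the hostname of the parsed URL (for example `new URL(raw).hostname`) before making the request, and again on every redirect target.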

Try it

cd infra/searxng && docker compose up -d
export KILD_SEARXNG_URL=http://localhost:8888
cd ../../engine && bun run cli -- web search "anthropic claude opus" --json
bun run cli -- run --model minimax/MiniMax-M3 "Use web_search to find X, then summarize."

Follow-ups (deferred)

Per-session --web toggle + cockpit provider picker; DDG/fastCRW providers behind the seam; a text format for webfetch.

Give every kild agent two tools so any model — including MiniMax, which has
no provider-native search — can use the web:

- webfetch: URL → markdown, in-process (turndown), keyless. Guards: 5 MB
  cap, 30/120s timeouts, a UA fallback for bot walls, and an SSRF block on
  private/loopback hosts.
- web_search: query → {title,url,snippet}, backed by a self-hosted SearXNG
  kild only points at via KILD_SEARXNG_URL. The engine never spawns or
  manages the container; the backend sits behind a SearchProvider seam, so
  DDG/fastCRW/Tavily are a new impl + a switch arm later.

Both register into the worker's customTools (beside the room tools) and the
CLI's in-process fallback. webfetch is available whenever web is enabled;
web_search only when a backend is configured (else a one-line notice).
KILD_WEB=off disables both. Adds `kild web search/fetch` debug commands and
an optional infra/searxng compose for one-command bring-up.