Full Elixir #19
Merged
Build a tests.yaml conformance runner (loader, runner, expect) that exercises the Elixir implementation against the shared 71-case behavioral spec. Fixed all discovered divergences:
- call_entity raises on error in code medium (COMP-6, COMP-8)
- cantrip_error propagation through child entities (COMP-8)
- child turn sequence preservation in parent loom (COMP-5, LOOM-8)
- ACP session inference when sessionId omitted (PROD-6, ENTITY-5)
- malformed done call treated as error, not termination (LOOP-7)
- tool result ID mismatch validation (LLM-7)
- circle rejects missing medium declaration (MEDIUM-1)
198 tests, 0 failures.
Six sections covering basic cast, multi-turn gates, streaming events, custom gates, composition with call_entity, and loom table rendering. All sections use FakeLLM — no API keys needed.
Instrument EntityServer with :telemetry events for entity lifecycle, turn lifecycle, gate execution, and code medium evaluation. 8 new tests.
Events:
- [:cantrip, :entity, :start/:stop]
- [:cantrip, :turn, :start/:stop]
- [:cantrip, :gate, :start/:stop]
- [:cantrip, :code, :eval]
206 tests, 0 failures.
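A minimal sketch of how a consumer might subscribe to the events listed above. The event names come from the commit message; the module name, the handler id, and the shape of the measurements and metadata maps are assumptions, since the commit does not describe them. `:telemetry.attach_many/4` is the standard attach call from the `telemetry` package.

```elixir
defmodule TelemetrySketch do
  # Event names as listed in the commit message, with :start/:stop expanded.
  @events [
    [:cantrip, :entity, :start],
    [:cantrip, :entity, :stop],
    [:cantrip, :turn, :start],
    [:cantrip, :turn, :stop],
    [:cantrip, :gate, :start],
    [:cantrip, :gate, :stop],
    [:cantrip, :code, :eval]
  ]

  def events, do: @events

  # Attaches one handler to every event. Requires the :telemetry dependency.
  # The handler id "cantrip-sketch" is a hypothetical name.
  def attach(pid \\ self()) do
    :telemetry.attach_many("cantrip-sketch", @events, &__MODULE__.handle_event/4, pid)
  end

  # Forwards each event to the subscribing process. The contents of
  # measurements and metadata are not documented here, so they are passed
  # through untouched.
  def handle_event(event, measurements, metadata, pid) do
    send(pid, {:cantrip_telemetry, event, measurements, metadata})
    :ok
  end
end
```

Forwarding to a process keeps the handler cheap, which matters because telemetry handlers run inline in the process that emits the event.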
Cantrip.Familiar builds a production-ready persistent coding assistant with read_file, list_dir, search, and done gates. JSONL loom persistence.
Mix tasks:
- mix cantrip.familiar: REPL mode with persistent entity
- mix cantrip.cast "intent": single-shot mode
12 new tests. 218 total, 0 failures.
Section 7 wires telemetry events into Kino widgets with color-coded real-time display, plus summary tables for turn/gate metrics.
LLMs write done(x) instead of done.(x) — now both work. Source-level transform adds dots before parsing, skipping strings and module-qualified calls. 8 new tests. 230 total, 0 failures.
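One way such a source-level transform could look, as a sketch only. It assumes the gate names are known up front and handles just double-quoted strings; heredocs, sigils, charlists, and comments are not covered, and the actual implementation in this commit may work differently. `DotCallSketch` and `add_dots/2` are hypothetical names.

```elixir
defmodule DotCallSketch do
  # Rewrites `name(` to `name.(` for each gate name, leaving double-quoted
  # strings and module-qualified calls (Foo.done(...)) untouched.
  def add_dots(source, gate_names) when is_binary(source) and is_list(gate_names) do
    names = Enum.map_join(gate_names, "|", &Regex.escape/1)

    # Alternative 1 consumes a whole string literal so nothing inside it is
    # rewritten. Alternative 2 matches a gate name that is not preceded by a
    # word character, "." or ":".
    pattern =
      Regex.compile!(~S{"(?:\\.|[^"\\])*"|(?<![\w.:])(} <> names <> ~S{)\(})

    Regex.replace(pattern, source, fn
      # The capture group is empty, so the match was a string literal.
      literal, "" -> literal
      # The capture group holds the gate name: insert the dot.
      _call, name -> name <> ".("
    end)
  end
end
```

With `done` bound to an anonymous function, the rewritten source evaluates where the original would fail to compile, for example `Code.eval_string(DotCallSketch.add_dots("done(41 + 1)", ["done"]), done: fn x -> {:done, x} end)`.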
mix cantrip.familiar --acp starts an ACP stdio server using the Familiar's gates and identity. New Runtime.Familiar module handles session construction.
The familiar now uses code medium and constructs child cantrips at runtime via cantrip()/cast()/cast_batch()/dispose() gates. Entity writes Elixir that observes the codebase, builds specialized children with chosen LLMs/mediums/gates/wards, and composes their results. Replaces the previous conversation-medium filesystem assistant. 20 tests. 234 total, 0 failures.
The code-medium familiar can return maps from done(). Handle gracefully with inspect/2 instead of crashing on String.Chars protocol.
Code medium naturally produces Elixir terms. The done gate now renders non-binary values with inspect/2 instead of passing raw maps/lists through to callers.
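The rendering rule from the two commits above can be sketched as follows. The module and function names are hypothetical, and the `inspect/2` options shown are an assumption rather than the ones the done gate actually passes.

```elixir
defmodule DoneRenderSketch do
  # Valid UTF-8 binaries are returned as-is; every other term (maps, lists,
  # tuples, raw bytes) is rendered with inspect/2, which never raises the
  # Protocol.UndefinedError that to_string/1 raises for maps.
  def render(value) when is_binary(value) do
    if String.valid?(value), do: value, else: inspect(value)
  end

  def render(value), do: inspect(value, pretty: true, limit: :infinity)
end
```

Callers then always receive a string, whatever Elixir term the entity passed to done.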
The entity's loom is now available as a plain variable in code medium. No gate, no file read — just `loom.turns` to access conversation history directly from process state.
Document the code-medium familiar with orchestration gates, loom as data binding, ACP editor setup for Zed, Livebook notebook, telemetry events, and 71/71 conformance. Remove stale limitations.
- Add BashMedium: shell command execution via System.cmd with SUBMIT:
termination pattern, output truncation, configurable cwd/timeout
- Fix systemic normalize_opts bug: bare values (strings, numbers) passed
to code-medium gates were silently erased to %{}. Gate closures now
pass bare values through; call_entity wraps strings as %{intent: value};
cantrip/cast_batch raise clear errors on invalid input
- Fix fork message reconstruction: include tool_calls on assistant messages
and tool_call_id on tool messages; code-medium turns use user-message
format instead of orphaned tool messages
- Add bash support to Circle (type normalization, tool_view, capability text),
EntityServer (execute_turn routing, message construction), and Familiar
(system prompt documents bash children, cast() return value clarity)
- Rewrite livebook demo with real LLM calls
- Add tests: bash medium (14), code medium bare-value ergonomics (3),
fork message format (2)
The Agent Client Protocol uses "sessionUpdate" as the discriminant key
in session/update notifications, not "kind". Also strip extra fields
from PromptResponse — spec says only stopReason + optional _meta.
- protocol.ex: "kind" → "sessionUpdate", result is just {stopReason}
- Update all tests and fixtures to match corrected wire format
- Conformance expect checker now searches across all replies per step
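For orientation, the corrected wire shapes can be sketched as plain maps. The "sessionUpdate" discriminant and the stopReason-only result come from the commit message; the "agent_message_chunk" variant, the content block shape, and the function names are assumptions based on the published Agent Client Protocol schema, not on protocol.ex itself.

```elixir
defmodule AcpWireSketch do
  # A session/update notification. The variant is selected by the
  # "sessionUpdate" key inside "update", not by "kind".
  def session_update(session_id, text) do
    %{
      "jsonrpc" => "2.0",
      "method" => "session/update",
      "params" => %{
        "sessionId" => session_id,
        "update" => %{
          "sessionUpdate" => "agent_message_chunk",
          "content" => %{"type" => "text", "text" => text}
        }
      }
    }
  end

  # The response to session/prompt: only stopReason (plus optional _meta,
  # omitted here) belongs in the result.
  def prompt_response(request_id, stop_reason \\ "end_turn") do
    %{
      "jsonrpc" => "2.0",
      "id" => request_id,
      "result" => %{"stopReason" => stop_reason}
    }
  end
end
```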
The ACP client sends the project working directory but the Familiar had no way to know where to look. Now the runtime appends the cwd to the system prompt so the Familiar orients itself on first turn.
The "Familiar ACP server starting on stdio..." message was written to stdout via Mix.shell().info, corrupting the JSON-RPC stream.
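A stdio JSON-RPC server has to keep stdout reserved for protocol frames, so diagnostics go to stderr instead. A minimal sketch of that separation, with a hypothetical module name (the commit does not say which call replaced Mix.shell().info):

```elixir
defmodule StdioLogSketch do
  # Human-readable diagnostics default to :stderr so they never interleave
  # with JSON-RPC frames on stdout. The device is a parameter so it can be
  # swapped for a StringIO device in tests.
  def log(message, device \\ :stderr) do
    IO.puts(device, message)
  end
end
```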
Add conformance runner, telemetry, Livebook, and familiar
Sandbox: Add Cantrip.CodeMedium.DuneSandbox as opt-in restricted
evaluation via %{sandbox: :dune} ward. Blocks File, System, Node,
Process, spawn. Gate closures work through Dune sessions with
persistent bindings across turns. 20 tests.
ReqLLM: Add Cantrip.LLMs.ReqLLM adapter (req_llm v1.9.0) supporting
18+ providers. llm_from_env now prefers ReqLLM for all known providers,
falling back to legacy adapters only when unavailable. 15 tests.
Verified: real LLM smoke test passes through both OpenAI and Anthropic
via ReqLLM. 288 tests, 0 failures.
Delete the old Protocol module and its tests (m11, m14, m15, m16). AgentHandler is now the single ACP path: a plain module with ETS state, no GenServer bottleneck. Each request runs in a Connection Task concurrently.
- AgentHandler stores sessions and last_answer in public ETS
- meta field on NewSessionRequest passes through to runtime (for LLM injection in tests)
- Conformance runner updated to use AgentHandler with JSON reply reconstruction
- Familiar and divergence tests migrated to typed ACP structs
- Stdio integration test covers full JSON wire format via spawned BEAM
Use Dotenvy.source/2 with side_effect callback instead of the custom load_dotenv function. Only sets env vars not already defined.
The catch-all clause in normalize_opts converted bare values (strings, numbers) to empty maps. compile_and_load now uses the same inline normalization as gate closures: maps/lists normalize, bare values pass through.
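The pass-through behaviour can be sketched as below. The function names are hypothetical, and treating "normalize" as "convert keyword lists to maps" is an assumption; the commit only states that maps and lists are normalized, bare values pass through, and call_entity wraps strings as %{intent: value}.

```elixir
defmodule NormalizeSketch do
  # Maps are kept as they are.
  def normalize_arg(arg) when is_map(arg), do: arg

  # Keyword lists become maps (assumed meaning of "normalize");
  # other lists are left alone.
  def normalize_arg(arg) when is_list(arg) do
    if Keyword.keyword?(arg), do: Map.new(arg), else: arg
  end

  # Bare values (strings, numbers, atoms) pass through. The buggy catch-all
  # returned %{} here, silently discarding the caller's argument.
  def normalize_arg(arg), do: arg

  # call_entity additionally wraps a bare string as an intent.
  def call_entity_arg(arg) when is_binary(arg), do: %{intent: arg}
  def call_entity_arg(arg), do: normalize_arg(arg)
end
```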
Production deps: dotenvy ~> 0.8, nimble_options ~> 1.1, agent_client_protocol (f1729 GitHub). Test-only deps: mox ~> 1.2.
- Add EventBridge: translates {:cantrip_event, _} messages into ACP
session_notification calls (tool_call, tool_call_update, thought chunks)
- AgentHandler spawns a bridge per prompt and injects stream_to into session
- Cantrip.summon/3 accepts opts (e.g. stream_to:) passed to EntityServer
- Runtimes (Cantrip, Familiar) forward stream_to from session to summon
Replace hand-rolled normalize_retry with NimbleOptions schema validation. Provides clear error messages for invalid retry config (e.g. wrong types).
- WARD-1/COMP-6: extract_numerics guard n>0 → n>=0 so max_depth:0 is preserved during ward composition and delegation gates are stripped
- A.12: Save/restore :cantrip_familiar_store across eval Tasks so child cantrips constructed on turn N survive to turn N+1
- LLM-3: Preserve base_url and api_key through ReqLLM normalize_state and pass them in build_opts; extract OPENAI_BASE_URL in llm_from_env
- COMP-8: cast_batch sequential fallback now raises on child failure (is_error: true) matching cast behavior
Red-green TDD for each fix.
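The WARD-1/COMP-6 fix above hinges on a single guard. A sketch of why `n >= 0` matters, assuming wards are maps and that composition keeps the most restrictive (smallest) value per key; both are assumptions for illustration, and only the guard change itself comes from the commit message.

```elixir
defmodule WardNumericsSketch do
  # Hypothetical set of numeric ward keys.
  @numeric_keys [:max_turns, :max_depth]

  # Collects {key, n} pairs from a list of ward maps. With the old guard
  # (n > 0) a ward of max_depth: 0 was dropped here, so a parent's larger
  # limit won during composition.
  def extract_numerics(wards) do
    for ward <- wards,
        {key, n} <- ward,
        key in @numeric_keys,
        is_integer(n) and n >= 0,
        do: {key, n}
  end

  # Assumed composition rule: the smallest value per key wins.
  def compose(wards) do
    wards
    |> extract_numerics()
    |> Enum.reduce(%{}, fn {key, n}, acc -> Map.update(acc, key, n, &min(&1, n)) end)
  end
end
```

With the corrected guard, `WardNumericsSketch.compose([%{max_depth: 3}, %{max_depth: 0}])` keeps the zero, which is what lets the caller strip delegation gates.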
cantrip.cast was routing everything through the Familiar orchestrator (code medium, filesystem gates, child cantrips). Now it creates a minimal conversation cantrip with just a done gate — the simplest useful cast per the spec. Use --familiar / -f for the orchestrator.
The spec says "if no medium is specified, the default is conversation" but validate_medium rejected empty medium_sources. Now it accepts them, matching Circle.new which already defaulted type to :conversation.
* fix: harden bash sandbox workloads
* test: show bash workload sandbox failures
* fix: restore bubblewrap /dev mount behavior
* fix: unshare user for bubblewrap network isolation
* fix: allow bwrap loopback setup
* test: split bash workload and netns coverage
* fix: avoid bwrap user namespace requirement
* ci: install uidmap for bubblewrap workloads
* ci: enable bubblewrap workload tests
* fix: address bash workload review
---------
Co-authored-by: copilot-swe-agent[bot] <[email protected]>
* fix: constrain public api docs surface
* test: derive public api guard from compiled modules
* docs: avoid supervisor names as module links
…owup chore: post-v1.3.2 hardening followup
…icted-115 fix: make Familiar default sandbox unrestricted
max_turns accumulated across sends in a summoned entity (REPL / ACP session). Once cumulative turns crossed the limit, every later intent truncated immediately and the session was bricked. This was the visible 'How can that possibly be the max turn limit' symptom from dogfooding mix cantrip.familiar.

max_turns is meant to bound the work for ONE intent, not the lifetime of the entity. Reset the per-episode turn counter on each new intent; message history, loom, and code_state still persist across sends.

Also point TMPDIR at the always-writable per-session sandbox dir so shell heredocs / process substitution work on TMPDIR-honoring shells (modern bash on Linux) without widening the sandbox. macOS bash 3.2 ignores TMPDIR and uses /tmp, so heredocs there still need an explicit bash_writable_paths entry; the sandbox stays deny-by-default.

Regression coverage: test/persistent_turn_budget_test.exs.

mix verify green: 642 tests, 0 failures, credo clean.
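The per-intent budget described in this commit can be illustrated with a toy model. The struct, its fields, and the `turns_needed` argument are all hypothetical stand-ins for EntityServer state; the sketch only shows the counter being reset per intent while history persists.

```elixir
defmodule TurnBudgetSketch do
  defstruct max_turns: 3, turns_used: 0, history: []

  # Each new intent resets the per-episode counter. History is kept, so the
  # entity still remembers earlier sends.
  def send_intent(%__MODULE__{} = entity, intent, turns_needed) do
    entity = %{entity | turns_used: 0, history: entity.history ++ [intent]}
    step(entity, turns_needed)
  end

  # No work left: the intent finished within budget.
  defp step(entity, 0), do: {:done, entity}

  # Budget exhausted for this intent only.
  defp step(%__MODULE__{turns_used: used, max_turns: max} = entity, _remaining)
       when used >= max,
       do: {:truncated, entity}

  # Take one turn and continue.
  defp step(entity, remaining) do
    step(%{entity | turns_used: entity.turns_used + 1}, remaining - 1)
  end
end
```

With `max_turns: 3`, two consecutive intents that each need two turns both finish. Under the old cumulative counter the second would have been truncated once the running total passed three.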
Reattaches the work done in `deepfates/grimoire` (162 commits, full history preserved) onto cantrip's template history.

grimoire was created from the cantrip template, so its root "Initial commit" was a fresh snapshot of cantrip `main`'s tip (identical tree) with no ancestry link. This branch re-parents that root onto cantrip `main` (a0841ee) via graft + history rewrite, so the branch descends from `main` and merges as a normal fast-forwardable PR.

Net change: replatform from the `clj/` + `ex/` + `ts/` + `py/` monorepo scaffold to the Elixir implementation promoted to the repo root (`lib/`, `test/`, `docs/`).