Full Elixir #19

Merged: deepfates merged 161 commits into main from import-copied-work on May 29, 2026

Conversation

@deepfates

Owner

Reattaches the work done in deepfates/grimoire (162 commits, full history preserved) onto cantrip's template history.

grimoire was created from the cantrip template, so its root "Initial commit" was a fresh snapshot of cantrip main's tip (identical tree) with no ancestry link. This branch re-parents that root onto cantrip main (a0841ee) via graft + history rewrite, so the branch descends from main and merges as a normal fast-forwardable PR.

Net change: replatform from the clj/ + ex/ + ts/ + py/ monorepo scaffold to the Elixir implementation, which is promoted to the repo root (lib/, test/, docs/).

deepfates and others added 30 commits March 22, 2026 21:31
Build a tests.yaml conformance runner (loader, runner, expect) that
exercises the Elixir implementation against the shared 71-case behavioral
spec. Fixed all discovered divergences:

- call_entity raises on error in code medium (COMP-6, COMP-8)
- cantrip_error propagation through child entities (COMP-8)
- child turn sequence preservation in parent loom (COMP-5, LOOM-8)
- ACP session inference when sessionId omitted (PROD-6, ENTITY-5)
- malformed done call treated as error, not termination (LOOP-7)
- tool result ID mismatch validation (LLM-7)
- circle rejects missing medium declaration (MEDIUM-1)

198 tests, 0 failures.

Six sections covering basic cast, multi-turn gates, streaming events,
custom gates, composition with call_entity, and loom table rendering.
All sections use FakeLLM, so no API keys are needed.

Instrument EntityServer with :telemetry events for entity lifecycle,
turn lifecycle, gate execution, and code medium evaluation. 8 new tests.

Events: [:cantrip, :entity, :start/:stop], [:cantrip, :turn, :start/:stop],
[:cantrip, :gate, :start/:stop], [:cantrip, :code, :eval]

206 tests, 0 failures.

Cantrip.Familiar builds a production-ready persistent coding assistant
with read_file, list_dir, search, and done gates. JSONL loom persistence.

Mix tasks:
- mix cantrip.familiar — REPL mode with persistent entity
- mix cantrip.cast "intent" — single-shot mode

12 new tests. 218 total, 0 failures.

Section 7 wires telemetry events into Kino widgets with color-coded
real-time display, plus summary tables for turn/gate metrics.

LLMs write done(x) instead of done.(x); now both work. Source-level
transform adds dots before parsing, skipping strings and module-qualified
calls. 8 new tests. 230 total, 0 failures.
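
A rough sketch of the kind of source-level transform described above, assuming a regex-based pass rather than whatever the PR actually implements. The module name `DotCallSketch` and the `gate_names` parameter are hypothetical; only double-quoted strings are skipped (sigils and charlists are not handled).

```elixir
defmodule DotCallSketch do
  @moduledoc "Hypothetical sketch: rewrite bare gate calls like done(x) into done.(x)."

  # Matches a double-quoted string literal, allowing escaped characters inside.
  defp string_literal, do: ~r/"(?:\\.|[^"\\])*"/s

  def add_dots(source, gate_names) do
    names = Enum.map_join(gate_names, "|", &Regex.escape/1)

    # A gate name that is not preceded by an identifier character or a dot
    # (so Mod.done(x) and undone(x) are left alone), followed by "(".
    # done.(x) never matches because the name is followed by "." there.
    call = Regex.compile!("(?<![A-Za-z0-9_.?!])(#{names})\\(")

    string_literal()
    |> Regex.split(source, include_captures: true)
    |> Enum.map_join(fn segment ->
      if String.starts_with?(segment, "\"") do
        # String literal: leave the contents untouched.
        segment
      else
        Regex.replace(call, segment, "\\1.(")
      end
    end)
  end
end

IO.puts(DotCallSketch.add_dots(~s|done(%{answer: 42})|, ["done"]))
IO.puts(DotCallSketch.add_dots(~s|IO.puts("done(1)"); Mod.done(2)|, ["done"]))
```

A token-level or AST-level rewrite would be more robust than regexes; this version only illustrates the skip rules named in the commit message.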
mix cantrip.familiar --acp starts an ACP stdio server using the
Familiar's gates and identity. New Runtime.Familiar module handles
session construction.

The familiar now uses code medium and constructs child cantrips at
runtime via cantrip()/cast()/cast_batch()/dispose() gates. Entity
writes Elixir that observes the codebase, builds specialized children
with chosen LLMs/mediums/gates/wards, and composes their results.

Replaces the previous conversation-medium filesystem assistant.
20 tests. 234 total, 0 failures.

The code-medium familiar can return maps from done(). Handle gracefully
with inspect/2 instead of crashing on String.Chars protocol.

Code medium naturally produces Elixir terms. The done gate now renders
non-binary values with inspect/2 instead of passing raw maps/lists
through to callers.
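
The rendering rule in the two commits above can be sketched as a pair of function clauses. `DoneRenderSketch` is a hypothetical name, not a module from the PR.

```elixir
defmodule DoneRenderSketch do
  # Binaries are already displayable; return them unchanged.
  def render(value) when is_binary(value), do: value

  # Maps, tuples, and many other terms do not implement String.Chars,
  # so to_string/1 or string interpolation would raise Protocol.UndefinedError.
  # inspect/2 works for any term.
  def render(value), do: inspect(value, limit: :infinity)
end

IO.puts(DoneRenderSketch.render("plain answer"))
IO.puts(DoneRenderSketch.render(%{status: :ok, files: 3}))
```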
The entity's loom is now available as a plain variable in code medium.
No gate, no file read — just `loom.turns` to access conversation history
directly from process state.
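
To illustrate what "loom as a plain variable" could look like, here is a small sketch using `Code.eval_string/2` with a binding. The loom shape (a map with a `turns` list) is an assumption made for the example; the real struct is not shown on this page.

```elixir
# Hypothetical loom shape, for illustration only.
loom = %{
  turns: [
    %{role: :user, content: "list the files"},
    %{role: :assistant, content: "lib/, test/, docs/"}
  ]
}

# Code the entity might write: `loom` is just a bound variable, no gate call.
source = "Enum.map(loom.turns, & &1.role)"

{roles, _binding} = Code.eval_string(source, loom: loom)
IO.inspect(roles)
# → [:user, :assistant]
```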
Document the code-medium familiar with orchestration gates, loom as
data binding, ACP editor setup for Zed, Livebook notebook, telemetry
events, and 71/71 conformance. Remove stale limitations.

- Add BashMedium: shell command execution via System.cmd with SUBMIT:
  termination pattern, output truncation, configurable cwd/timeout
- Fix systemic normalize_opts bug: bare values (strings, numbers) passed
  to code-medium gates were silently erased to %{}. Gate closures now
  pass bare values through; call_entity wraps strings as %{intent: value};
  cantrip/cast_batch raise clear errors on invalid input
- Fix fork message reconstruction: include tool_calls on assistant messages
  and tool_call_id on tool messages; code-medium turns use user-message
  format instead of orphaned tool messages
- Add bash support to Circle (type normalization, tool_view, capability text),
  EntityServer (execute_turn routing, message construction), and Familiar
  (system prompt documents bash children, cast() return value clarity)
- Rewrite livebook demo with real LLM calls
- Add tests: bash medium (14), code medium bare-value ergonomics (3),
  fork message format (2)

The Agent Client Protocol uses "sessionUpdate" as the discriminant key
in session/update notifications, not "kind". Also strip extra fields
from PromptResponse — spec says only stopReason + optional _meta.

- protocol.ex: "kind" → "sessionUpdate", result is just {stopReason}
- Update all tests and fixtures to match corrected wire format
- Conformance expect checker now searches across all replies per step
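
For orientation, the corrected wire shapes might look like the maps below. Only the "sessionUpdate" key and the "stopReason"-only result come from the commit message; the remaining field names ("sessionId", "update", "agent_message_chunk", "content") follow my reading of the ACP schema and should be checked against the spec. `AcpWireSketch` is a hypothetical module.

```elixir
defmodule AcpWireSketch do
  # session/update notification: the variant is selected by "sessionUpdate"
  # (previously, and incorrectly, "kind").
  def message_chunk(session_id, text) do
    %{
      "jsonrpc" => "2.0",
      "method" => "session/update",
      "params" => %{
        "sessionId" => session_id,
        "update" => %{
          "sessionUpdate" => "agent_message_chunk",
          "content" => %{"type" => "text", "text" => text}
        }
      }
    }
  end

  # PromptResponse: the result carries only stopReason (plus optional _meta).
  def prompt_response(request_id, stop_reason) do
    %{"jsonrpc" => "2.0", "id" => request_id, "result" => %{"stopReason" => stop_reason}}
  end
end

IO.inspect(AcpWireSketch.prompt_response(1, "end_turn"))
```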
The ACP client sends the project working directory but the Familiar
had no way to know where to look. Now the runtime appends the cwd
to the system prompt so the Familiar orients itself on first turn.

The "Familiar ACP server starting on stdio..." message was written
to stdout via Mix.shell().info, corrupting the JSON-RPC stream.
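
The commit message does not say how the fix was made; one common approach, sketched here under that assumption, is to route diagnostics to stderr so stdout carries only JSON-RPC frames. `StdioLogSketch` is a hypothetical module.

```elixir
defmodule StdioLogSketch do
  # Diagnostics go to stderr, leaving stdout free for protocol traffic.
  def log(message), do: IO.puts(:stderr, message)

  # Protocol frames are the only thing written to stdout.
  def send_frame(json) when is_binary(json), do: IO.puts(:stdio, json)
end

StdioLogSketch.log("Familiar ACP server starting on stdio...")
StdioLogSketch.send_frame(~s|{"jsonrpc":"2.0","id":1,"result":{"stopReason":"end_turn"}}|)
```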
Add conformance runner, telemetry, Livebook, and familiar

Sandbox: Add Cantrip.CodeMedium.DuneSandbox as opt-in restricted
evaluation via %{sandbox: :dune} ward. Blocks File, System, Node,
Process, spawn. Gate closures work through Dune sessions with
persistent bindings across turns. 20 tests.

ReqLLM: Add Cantrip.LLMs.ReqLLM adapter (req_llm v1.9.0) supporting
18+ providers. llm_from_env now prefers ReqLLM for all known providers,
falling back to legacy adapters only when unavailable. 15 tests.

Verified: real LLM smoke test passes through both OpenAI and Anthropic
via ReqLLM. 288 tests, 0 failures.

Delete the old Protocol module and its tests (m11, m14, m15, m16).
AgentHandler is now the single ACP path — a plain module with ETS
state, no GenServer bottleneck. Each request runs in a Connection
Task concurrently.

- AgentHandler stores sessions and last_answer in public ETS
- meta field on NewSessionRequest passes through to runtime (for LLM injection in tests)
- Conformance runner updated to use AgentHandler with JSON reply reconstruction
- Familiar and divergence tests migrated to typed ACP structs
- Stdio integration test covers full JSON wire format via spawned BEAM
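
A minimal sketch of the "plain module with ETS state" idea, assuming a single named public table. The table name, function names, and session shape are hypothetical, not AgentHandler's actual API.

```elixir
defmodule SessionStoreSketch do
  @table :acp_sessions_sketch

  # Create the table once. :public lets any request Task read and write
  # directly, so no single GenServer serializes access.
  def init do
    if :ets.whereis(@table) == :undefined do
      :ets.new(@table, [:named_table, :public, :set, read_concurrency: true])
    end

    :ok
  end

  def put(session_id, session), do: :ets.insert(@table, {session_id, session})

  def fetch(session_id) do
    case :ets.lookup(@table, session_id) do
      [{^session_id, session}] -> {:ok, session}
      [] -> :error
    end
  end
end

SessionStoreSketch.init()

# Concurrent writers, standing in for per-request Tasks.
tasks =
  for n <- 1..3 do
    Task.async(fn -> SessionStoreSketch.put("s#{n}", %{last_answer: n}) end)
  end

Task.await_many(tasks)
IO.inspect(SessionStoreSketch.fetch("s2"))
```

Note that an ETS table is destroyed when its owning process exits, so in a real system the table would need a long-lived owner (for example a supervisor-started process).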
Use Dotenvy.source/2 with side_effect callback instead of the
custom load_dotenv function. Only sets env vars not already defined.

The catch-all clause in normalize_opts converted bare values (strings,
numbers) to empty maps. compile_and_load now uses the same inline
normalization as gate closures: maps/lists normalize, bare values pass
through.
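
The normalization rule described above (maps and lists normalize, bare values pass through) could be sketched like this. `NormalizeOptsSketch` is hypothetical, and converting keyword lists to maps is an assumption about what "normalize" means here.

```elixir
defmodule NormalizeOptsSketch do
  # Maps are already in the expected shape.
  def normalize(opts) when is_map(opts), do: opts

  # Keyword lists become maps; other lists are passed through as data.
  def normalize(opts) when is_list(opts) do
    if Keyword.keyword?(opts), do: Map.new(opts), else: opts
  end

  # The buggy version had a catch-all returning %{} here, which silently
  # erased strings and numbers. Passing the value through preserves it.
  def normalize(value), do: value
end

IO.inspect(NormalizeOptsSketch.normalize("summarize lib/"))
IO.inspect(NormalizeOptsSketch.normalize(path: "lib/"))
```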
Production deps: dotenvy ~> 0.8, nimble_options ~> 1.1,
agent_client_protocol (f1729 GitHub).
Test-only deps: mox ~> 1.2.

- Add EventBridge: translates {:cantrip_event, _} messages into ACP
  session_notification calls (tool_call, tool_call_update, thought chunks)
- AgentHandler spawns a bridge per prompt and injects stream_to into session
- Cantrip.summon/3 accepts opts (e.g. stream_to:) passed to EntityServer
- Runtimes (Cantrip, Familiar) forward stream_to from session to summon

Replace hand-rolled normalize_retry with NimbleOptions schema validation.
Provides clear error messages for invalid retry config (e.g. wrong types).

- WARD-1/COMP-6: extract_numerics guard n>0 → n>=0 so max_depth:0 is
  preserved during ward composition and delegation gates are stripped
- A.12: Save/restore :cantrip_familiar_store across eval Tasks so child
  cantrips constructed on turn N survive to turn N+1
- LLM-3: Preserve base_url and api_key through ReqLLM normalize_state
  and pass them in build_opts; extract OPENAI_BASE_URL in llm_from_env
- COMP-8: cast_batch sequential fallback now raises on child failure
  (is_error: true) matching cast behavior

Red-green TDD for each fix.
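
To make the WARD-1/COMP-6 fix concrete, here is a sketch of why the guard matters. It assumes wards are maps and that composition takes the most restrictive (smallest) numeric limit; both are assumptions for illustration, and only the name `extract_numerics` and the `n > 0` to `n >= 0` change come from the commit message.

```elixir
defmodule WardComposeSketch do
  # Collect numeric limits for `key`. With the old guard (n > 0) an explicit
  # max_depth: 0 was dropped, so "no delegation" looked like "no limit set".
  def extract_numerics(wards, key) do
    for %{^key => n} <- wards, is_number(n) and n >= 0, do: n
  end

  # Assumed composition rule: the smallest limit wins; nil means undeclared.
  def compose_limit(wards, key) do
    case extract_numerics(wards, key) do
      [] -> nil
      values -> Enum.min(values)
    end
  end
end

wards = [%{max_depth: 3}, %{max_depth: 0}, %{max_turns: 10}]
IO.inspect(WardComposeSketch.compose_limit(wards, :max_depth))
# → 0
```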
cantrip.cast was routing everything through the Familiar orchestrator
(code medium, filesystem gates, child cantrips). Now it creates a
minimal conversation cantrip with just a done gate — the simplest
useful cast per the spec. Use --familiar / -f for the orchestrator.

The spec says "if no medium is specified, the default is conversation"
but validate_medium rejected empty medium_sources. Now it accepts them,
matching Circle.new which already defaulted type to :conversation.

deepfates and others added 23 commits May 28, 2026 09:19
* fix: harden bash sandbox workloads

* test: show bash workload sandbox failures

* fix: restore bubblewrap /dev mount behavior

* fix: unshare user for bubblewrap network isolation

* fix: allow bwrap loopback setup

* test: split bash workload and netns coverage

* fix: avoid bwrap user namespace requirement

* ci: install uidmap for bubblewrap workloads

* ci: enable bubblewrap workload tests

* fix: address bash workload review

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>

* fix: constrain public api docs surface

* test: derive public api guard from compiled modules

* docs: avoid supervisor names as module links
…icted-115

fix: make Familiar default sandbox unrestricted

max_turns accumulated across sends in a summoned entity (REPL / ACP
session). Once cumulative turns crossed the limit, every later intent
truncated immediately and the session was bricked — the visible
'How can that possibly be the max turn limit' symptom from dogfooding
mix cantrip.familiar. max_turns is meant to bound the work for ONE
intent, not the lifetime of the entity.

Reset the per-episode turn counter on each new intent; message history,
loom, and code_state still persist across sends.
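
The per-intent budget can be sketched with a small state struct. The field and function names below are hypothetical and stand in for whatever EntityServer tracks internally.

```elixir
defmodule TurnBudgetSketch do
  defstruct max_turns: 3, turns_this_intent: 0, messages: []

  # A new intent starts a fresh episode: zero the counter, keep the history.
  def start_intent(%__MODULE__{} = state, intent) do
    %{state | turns_this_intent: 0, messages: state.messages ++ [{:user, intent}]}
  end

  # Allow a turn only while this intent's budget is not exhausted.
  def take_turn(%__MODULE__{turns_this_intent: n, max_turns: max} = state) when n < max do
    {:ok, %{state | turns_this_intent: n + 1}}
  end

  def take_turn(%__MODULE__{} = state), do: {:truncated, state}
end

state = TurnBudgetSketch.start_intent(struct(TurnBudgetSketch, max_turns: 1), "first")
{:ok, state} = TurnBudgetSketch.take_turn(state)
{:truncated, state} = TurnBudgetSketch.take_turn(state)

# Without the reset in start_intent/2, this second intent would truncate at once.
state = TurnBudgetSketch.start_intent(state, "second")
{:ok, state} = TurnBudgetSketch.take_turn(state)
IO.inspect(length(state.messages))
# → 2
```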

Also point TMPDIR at the always-writable per-session sandbox dir so shell
heredocs / process substitution work on TMPDIR-honoring shells (modern
bash on Linux) without widening the sandbox. macOS bash 3.2 ignores
TMPDIR and uses /tmp, so heredocs there still need an explicit
bash_writable_paths entry — the sandbox stays deny-by-default.

Regression coverage: test/persistent_turn_budget_test.exs.
mix verify green: 642 tests, 0 failures, credo clean.
@deepfates deepfates force-pushed the import-copied-work branch from 3eb121a to fb3c893 Compare May 29, 2026 06:33
@deepfates deepfates requested a review from Copilot May 29, 2026 06:41

Copilot AI left a comment

Copilot wasn't able to review this pull request because it exceeds the maximum number of files (300). Try reducing the number of changed files and requesting a review from Copilot again.

@deepfates deepfates changed the title Import Elixir rewrite from grimoire Convert to full Elixir and level up May 29, 2026
@deepfates deepfates changed the title Convert to full Elixir and level up Full Elixir May 29, 2026
@deepfates deepfates merged commit 5de6e23 into main May 29, 2026
5 of 6 checks passed
@deepfates deepfates deleted the import-copied-work branch May 29, 2026 06:56
3 participants