ACL2 Verified Agent

A formally verified ReAct agent implemented in ACL2 with FTY types. The agent's decision logic is mathematically proven correct, while external tools (LLMs, code execution) are accessed via the Model Context Protocol (MCP).

What is This?

This project demonstrates how to build AI agents with formally verified decision logic:

Proven Safety: ACL2 proves that the agent respects permissions, stays within budget, and terminates
Proven Correctness: State transitions preserve invariants; context management preserves system prompts
Practical Integration: LLM integration via local (LM Studio) or cloud (OpenAI) providers, code execution via MCP

Features

✅ FTY-typed agent state with step counters, budgets, permissions, and conversation history
✅ Permission model with file access levels and code execution controls
✅ Budget tracking for tokens and time
✅ Context management with sliding window truncation that preserves system prompts
✅ Proven termination via max-steps bound
✅ MCP integration for ACL2 code execution with persistent sessions
✅ LLM integration via local (LM Studio) or cloud (OpenAI) providers
✅ Cloud provider support with OpenAI, custom endpoints (Anthropic, Azure planned)
✅ Parinfer integration to auto-fix unbalanced parens in LLM-generated code

Proven Properties

All theorems are in verified-agent.lisp unless noted otherwise.

Safety & Termination

Theorem	What It Proves
`permission-safety`	Tool invocation requires permission
`budget-bounds-after-deduct`	Budgets remain non-negative after deduction
`termination-by-max-steps`	Reaching max-steps forces agent to respond
`remaining-steps-decreases-after-increment`	Step counter progress guarantees termination
`error-state-forces-must-respond`	Internal errors halt the agent

State Partitioning

Theorem	What It Proves
`continue-respond-partition`	Agent is always in exactly one of: must-respond, should-continue, or satisfied
`step-increases-after-increment`	Step counter strictly increases each iteration

Conversation Safety (tool results don't break the agent)

Theorem	What It Proves
`add-tool-result-preserves-error-state`	Tool results don't change internal error state
`add-tool-result-preserves-has-error-p`	Tool results don't affect error status
`add-tool-result-preserves-done`	Tool results don't change done flag
`add-assistant-msg-preserves-must-respond-p`	Assistant messages don't change termination status

Context Management (context-manager.lisp)

Theorem	What It Proves
`truncate-preserves-system-prompt`	System message survives context truncation
`truncate-to-fit-length-bound`	Truncated list never exceeds original length
`drop-oldest-until-fit-is-sublist`	Dropped messages are a sublist of original
`add-message-returns-list`	Adding messages preserves list type

External Tool Axioms (encapsulated)

Axiom	What It Guarantees
`external-tool-call-returns-list`	Tool calls return a proper list
`external-tool-call-bounded`	Response length is bounded (resource safety)

Type Preservation

Theorem	What It Proves
`react-step-preserves-agent-state`	ReAct step returns valid agent state
`deduct-preserves-agent-state`	Budget deduction returns valid agent state
`increment-preserves-agent-state`	Step increment returns valid agent state

Note on error handling: Tool execution errors are not internal errors—they are returned to the agent as messages so it can see and recover from them. The add-tool-result-preserves-* theorems prove this is safe. Only infrastructure failures (LLM unreachable, budget exhausted) halt the loop.

Quick Start

Prerequisites

VS Code with Dev Containers extension
Docker
LM Studio (optional, for local LLM)
OpenAI API Key (optional, for cloud LLM)

1. Clone and Open in Dev Container

git clone https://github.com/YOUR_USERNAME/verified-agent.git
cd verified-agent
code .
# When prompted, click "Reopen in Container"

2. Certify the ACL2 Books

cd src
cert.pl verified-agent.lisp

3. Start acl2-mcp

pip install mcp-proxy
mcp-proxy acl2-mcp --transport streamablehttp --port 8000 --pass-environment

4. Run Interactive Demo (requires LM Studio)

Start LM Studio with a model loaded, then in ACL2:

(ld "chat-demo.lisp")
(interactive-chat-loop *agent-v1* *model-id* state)

4b. Alternative: Use OpenAI (Cloud Provider)

You can use OpenAI instead of a local LLM:

(ld "chat-openai.lisp")

;; Quick start with your API key
(chat-with-openai "sk-your-api-key-here" state)

;; Or use GPT-4o for best results
(chat-with-gpt4o "sk-your-api-key-here" state)

Available models: gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-3.5-turbo, etc.

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                    Verified Agent (ACL2)                        │
│  Proven: permission-safety, budget-bounds, termination          │
├─────────────────────────────────────────────────────────────────┤
│  ReAct Loop: react-step → LLM → extract code → execute → loop   │
├─────────────────────────────────────────────────────────────────┤
│  Context Manager: truncate-to-fit preserves system prompt       │
├─────────────────────────────────────────────────────────────────┤
│  Decision Functions: can-invoke-tool-p, must-respond-p          │
├─────────────────────────────────────────────────────────────────┤
│  FTY Types: agent-state, tool-spec, chat-message, error-kind    │
└─────────────────────────────────────────────────────────────────┘
                              │
                              │ MCP / HTTP
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│  External Tools: acl2-mcp (code execution), LM Studio (LLM)     │
└─────────────────────────────────────────────────────────────────┘

File Structure

verified-agent/
├── .devcontainer/
│   └── devcontainer.json       # Dev container config for ACL2 environment
├── .github/
│   └── copilot-instructions.md # AI assistant guidance
├── src/                        # ACL2 source files
│   ├── verified-agent.lisp     # Core: FTY types, decision functions, safety theorems
│   ├── context-manager.lisp    # Conversation history with truncation proofs
│   ├── llm-types.lisp          # FTY types for chat messages
│   ├── llm-client.lisp         # HTTP client for LM Studio
│   ├── llm-client-raw.lsp      # Raw Lisp JSON serialization
│   ├── http-json.lisp          # HTTP POST/GET with JSON
│   ├── http-json-raw.lsp       # Raw Lisp HTTP implementation
│   ├── mcp-client.lisp         # MCP JSON-RPC client
│   ├── mcp-client-raw.lsp      # Raw Lisp MCP serialization
│   ├── agent-runner.lisp       # Runtime driver for code execution
│   ├── parinfer-fixer.lisp     # Fix unbalanced parens in LLM output
│   ├── chat-demo.lisp          # Interactive demo (local LLM)
│   ├── chat-openai.lisp        # Interactive demo (OpenAI cloud)
│   └── Verified_Agent_Spec.md  # Full specification
├── acl2-mcp/                   # Python MCP server
│   ├── acl2_mcp/
│   │   ├── __init__.py
│   │   └── server.py           # MCP server (15 tools for ACL2)
│   ├── pyproject.toml
│   ├── README.md
│   ├── LICENSE
│   └── SECURITY.md
├── .gitignore
├── CLAUDE.md                   # Quick context for AI assistants
├── LICENSE                     # BSD 3-Clause
├── Makefile                    # Build automation
└── README.md                   # This file

How It Works

The Agent State

(fty::defprod agent-state
  ((step-counter natp :default 0)
   (max-steps natp :default 100)
   (token-budget natp :default 10000)
   (time-budget natp :default 3600)
   (file-access natp :default 0)          ; 0=none, 1=read, 2=write
   (execute-allowed booleanp :default nil)
   (messages chat-message-list-p :default nil)
   (satisfaction natp :default 0)
   (done booleanp :default nil)
   (error-state error-kind-p :default '(:none)))
  :layout :list)

Decision Logic

The agent decides what to do based on pure functions:

;; Can we invoke this tool?
(can-invoke-tool-p tool st) = (tool-permitted-p tool st) 
                             AND (tool-budget-sufficient-p tool st)

;; Must we stop?
(must-respond-p st) = done OR has-error OR (step-counter >= max-steps)
                      OR (token-budget = 0) OR (time-budget = 0)

Code Execution via MCP

The agent can execute ACL2 code through the MCP protocol:

;; LLM writes code in markdown blocks
;; ```acl2
;; (+ 1 2 3)
;; ```

;; Agent extracts and executes via MCP
(mcp-acl2-evaluate conn "(+ 1 2 3)")  ; => "6"

LLM Providers

The agent supports both local and cloud LLM providers:

Provider	Description	Setup
Local (LM Studio)	Run models locally on your machine	Install LM Studio, load a model
OpenAI	Cloud-hosted GPT-4, GPT-3.5, etc.	Get API key from OpenAI
Custom	Any OpenAI-compatible API	Provide endpoint URL

Provider Configuration:

;; Local LM Studio (default)
(make-local-provider-config "model-name")

;; OpenAI
(make-openai-provider-config "sk-..." "gpt-4o-mini")

;; Custom OpenAI-compatible endpoint
(make-custom-provider-config "https://my-api.com/v1/chat/completions" 
                             "api-key" 
                             "model-name")

Using a provider:

;; Single chat completion
(llm-chat-completion-with-provider config messages state)

;; Interactive chat loop
(interactive-chat-loop-with-provider agent-state config state)

Parinfer: Fixing LLM Code Errors

LLMs often generate Lisp code with unbalanced parentheses, even though the indentation is correct. The agent uses parinfer-rust to automatically fix these errors before execution:

;; LLM output (missing closing parens):
(defun factorial (n)
  (if (zp n)
      1
    (* n (factorial (1- n)

;; After parinfer fix:
(defun factorial (n)
  (if (zp n)
      1
    (* n (factorial (1- n)))))

Install parinfer-rust:

make install-parinfer  # Installs Rust + parinfer-rust from GitHub
make test-parinfer     # Verify installation

Development

Running Tests

# Certify all books
cd src && cert.pl verified-agent.lisp

# Run MCP server tests
cd acl2-mcp && python -m pytest

Starting MCP Server for Testing

# Via mcp-proxy for HTTP transport
mcp-proxy acl2-mcp --transport streamablehttp --port 8000 --pass-environment

Design Philosophy

Verify the decision logic, not the world — ACL2 proves properties about how the agent decides, given any external responses
FTY over STObj — Cleaner types, auto-generated theorems, easier reasoning
MCP for external tools — Standard protocol for tool integration
Keep verified core simple — Complex I/O in external driver, proofs in ACL2
Fix LLM output with parinfer — Automatically correct unbalanced parens using indentation

License

BSD 3-Clause License. See LICENSE.

Acknowledgments

Built with ACL2
LLM integration via LM Studio
MCP implementation using MCP Python SDK
Paren fixing via parinfer-rust

Name		Name	Last commit message	Last commit date
Latest commit History 68 Commits
.devcontainer		.devcontainer
.github		.github
acl2-mcp		acl2-mcp
docs		docs
src		src
utils		utils
.gitignore		.gitignore
CLAUDE.md		CLAUDE.md
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
slides.html		slides.html
slides.ipynb		slides.ipynb
verified-agent.code-workspace		verified-agent.code-workspace

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

ACL2 Verified Agent

What is This?

Features

Proven Properties

Safety & Termination

State Partitioning

Conversation Safety (tool results don't break the agent)

Context Management (context-manager.lisp)

External Tool Axioms (encapsulated)

Type Preservation

Quick Start

Prerequisites

1. Clone and Open in Dev Container

2. Certify the ACL2 Books

3. Start acl2-mcp

4. Run Interactive Demo (requires LM Studio)

4b. Alternative: Use OpenAI (Cloud Provider)

Architecture

File Structure

How It Works

The Agent State

Decision Logic

Code Execution via MCP

LLM Providers

Parinfer: Fixing LLM Code Errors

Development

Running Tests

Starting MCP Server for Testing

Design Philosophy

License

Acknowledgments

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages