agent-loop

A framework-free agent runtime: a single LLM driving a tool-use loop, until it answers.

agent-loop is a minimal, provider-agnostic agent runtime. No graph engine, no orchestration DSL (domain-specific language) — just one interpreting LLM that selects tools, sees their raw results, and composes the final answer. Works with both OpenAI and Gemini from the same code.

from agent_loop import AgentLoop, tool
from agent_loop.providers import OpenAIProvider

@tool
def city_population(city: str) -> dict:
    """Look up a city's population. Returns raw data, not prose."""
    return {"city": city, "population": {"osaka": 2691000}.get(city.lower())}

agent = AgentLoop(provider=OpenAIProvider(), tools=[city_population], model="gpt-5.4-nano")
print(agent.run("How many people live in Osaka?").text)

Why this exists: three generations of design

This library is the distilled core of a production chat system. Its value is not "I built an agent" — anyone can. The value is the judgment that collapsed an over-built multi-agent system back into a single loop, and being able to tell the real story of how that happened — including the wrong turn in the middle.

The honest version: Gen 1 worked fine. Gen 2 was not a fix for a broken Gen 1 — it was a design bet (part learning experiment) to scale to new question categories, and it backfired. Gen 3 is the recovery.

Gen 1  Structured RAG          The app is in charge
   |   LLM only extracts params (NLU); the app builds parameterized SQL from
   |   fixed dictionaries over ground-truth data. No free-form SQL, columns are
   |   dictionary-bound, params whitelisted. Correct, but rigid and weak at
   |   combining multiple data sources.
   |
   |   "We want to add more categories (split stats, etc.) in one chat.
   |    If categories grow, won't dividing the work across specialist agents
   |    scale better — and maybe keep things accurate as it gets complex?"
   v
Gen 2  Supervisor + sub-agents (graph)   Many LLMs in charge
   |   One question gets re-interpreted by 4 separate LLMs
   |   (routing -> sub-agent planner -> in-tool NLU -> synthesizer).
   |   Result: a "telephone game" — intent drifts at each hop, accuracy DROPS.
   |   Dividing the work was counterproductive. (We even considered reverting.)
   v
Gen 3  Agent Loop (tool use)   ONE interpreting LLM in charge
       Collapse interpretation into a single LLM that calls verified tools.
       Telephone game gone; cross-source composition handled by one mind;
       4 LLM hops -> 1-2.

The counter-intuitive lesson: when scope grew, the answer was fewer interpreters, not more agents. Adding specialist agents lost accuracy because every hop re-interpreted the user's intent. The fix was not to revert to Gen 1, but to keep one interpreting LLM and let it call verified, raw-data tools.

Note what this story does not claim: Gen 1 was never the villain. It had no hallucination problem (it never let the LLM write SQL), and it was not "fixed" by Gen 3. Gen 2's telephone game was a new problem we introduced, and Gen 3 is the correction. Owning that — rather than inventing a tidy linear rationale — is the point.

This is also not framework evangelism, nor framework denial. The original system kept a graph engine only for the one path that genuinely needs plan-and-execute with parallel fan-out — and used this loop for everything else. Choose by requirement, not by ideology.

The four ideas it generalizes

#	Idea	What it means
L1	Tool-use loop	One LLM picks tools with structured args -> execute -> feed results back -> repeat until a final answer (bounded by `max_iterations`).
L2	Tools return raw data; the orchestrator composes	Tools are pure data fetchers. The calling LLM synthesizes — which lets it combine results across multiple tools.
L3	Bounded, classified reflection	On failure, classify the error (retryable / not) and retry only the retryable ones, up to a limit. No unbounded self-correction.
L4	Right tool for the job	The library never forces the loop. It coexists with heavier orchestration for the paths that need it.

Install

pip install -e ".[openai]"     # or ".[gemini]", or ".[openai,gemini]"

The core is dependency-free. Provider SDKs are optional extras and load lazily, so importing agent_loop never requires openai or google-genai.

Quickstart (clone -> one API key -> running in minutes)

pip install -e ".[openai]"
export OPENAI_API_KEY=...                       # never commit your key
python examples/toy_agent/run.py "What is 12 * 8, and how big is Osaka?"

Switch providers with no code change:

pip install -e ".[gemini]"
export GEMINI_API_KEY=...
export AGENT_LOOP_PROVIDER=gemini
python examples/toy_agent/run.py "What is 12 * 8, and how big is Osaka?"

Bounded reflection

from agent_loop import AgentLoop, BoundedReflection

agent = AgentLoop(
    provider=provider,
    tools=[...],
    model="...",
    reflection=BoundedReflection(max_retries=2),   # retry only retryable errors
)

When a tool fails, the policy retries retryable errors up to the limit, then feeds the error back to the LLM as raw data instead of crashing the loop.

Observability

The loop emits tool_start / tool_end / final events (SSE-friendly). The default is zero-dependency; plug in any sink (a function) to stream progress or record cost.

def on_event(event):
    print(event.type, event.data)

agent = AgentLoop(provider=provider, tools=[...], model="...", events=on_event)

Design boundaries

Not a full framework. Plan-and-execute is explicitly out of scope; that defeats the reason this loop exists.
Loose coupling by default. No required integration with any other library; optional adapters only.
Provider differences absorbed in one layer. Gemini calls it function calling, OpenAI calls it tool use — same idea, normalized behind one contract.

Layout

src/agent_loop/
  loop.py            # the tool-use loop + max_iterations
  tools.py           # @tool decorator / ToolSpec (tools return raw data)
  reflection.py      # bounded, classified retry policy
  providers/         # gemini + openai (tool-calling), behind one contract
  events.py          # SSE-friendly events
examples/toy_agent/  # domain-free agent (calculator + lookup)
tests/

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
.github/workflows		.github/workflows
examples/toy_agent		examples/toy_agent
notebooks		notebooks
src/agent_loop		src/agent_loop
tests		tests
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

agent-loop

Why this exists: three generations of design

The four ideas it generalizes

Install

Quickstart (clone -> one API key -> running in minutes)

Bounded reflection

Observability

Design boundaries

Layout

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

agent-loop

Why this exists: three generations of design

The four ideas it generalizes

Install

Quickstart (clone -> one API key -> running in minutes)

Bounded reflection

Observability

Design boundaries

Layout

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages