A framework-free agent runtime: a single LLM driving a tool-use loop, until it answers.
agent-loop is a minimal, provider-agnostic agent runtime. No graph engine, no
orchestration DSL (domain-specific language) — just one interpreting LLM that
selects tools, sees their raw results, and composes the final answer. Works with
both OpenAI and Gemini from the same code.
from agent_loop import AgentLoop, tool
from agent_loop.providers import OpenAIProvider
@tool
def city_population(city: str) -> dict:
"""Look up a city's population. Returns raw data, not prose."""
return {"city": city, "population": {"osaka": 2691000}.get(city.lower())}
agent = AgentLoop(provider=OpenAIProvider(), tools=[city_population], model="gpt-5.4-nano")
print(agent.run("How many people live in Osaka?").text)This library is the distilled core of a production chat system. Its value is not "I built an agent" — anyone can. The value is the judgment that collapsed an over-built multi-agent system back into a single loop, and being able to tell the real story of how that happened — including the wrong turn in the middle.
The honest version: Gen 1 worked fine. Gen 2 was not a fix for a broken Gen 1 — it was a design bet (part learning experiment) to scale to new question categories, and it backfired. Gen 3 is the recovery.
Gen 1 Structured RAG The app is in charge
| LLM only extracts params (NLU); the app builds parameterized SQL from
| fixed dictionaries over ground-truth data. No free-form SQL, columns are
| dictionary-bound, params whitelisted. Correct, but rigid and weak at
| combining multiple data sources.
|
| "We want to add more categories (split stats, etc.) in one chat.
| If categories grow, won't dividing the work across specialist agents
| scale better — and maybe keep things accurate as it gets complex?"
v
Gen 2 Supervisor + sub-agents (graph) Many LLMs in charge
| One question gets re-interpreted by 4 separate LLMs
| (routing -> sub-agent planner -> in-tool NLU -> synthesizer).
| Result: a "telephone game" — intent drifts at each hop, accuracy DROPS.
| Dividing the work was counterproductive. (We even considered reverting.)
v
Gen 3 Agent Loop (tool use) ONE interpreting LLM in charge
Collapse interpretation into a single LLM that calls verified tools.
Telephone game gone; cross-source composition handled by one mind;
4 LLM hops -> 1-2.
The counter-intuitive lesson: when scope grew, the answer was fewer interpreters, not more agents. Adding specialist agents lost accuracy because every hop re-interpreted the user's intent. The fix was not to revert to Gen 1, but to keep one interpreting LLM and let it call verified, raw-data tools.
Note what this story does not claim: Gen 1 was never the villain. It had no hallucination problem (it never let the LLM write SQL), and it was not "fixed" by Gen 3. Gen 2's telephone game was a new problem we introduced, and Gen 3 is the correction. Owning that — rather than inventing a tidy linear rationale — is the point.
This is also not framework evangelism, nor framework denial. The original system kept a graph engine only for the one path that genuinely needs plan-and-execute with parallel fan-out — and used this loop for everything else. Choose by requirement, not by ideology.
| # | Idea | What it means |
|---|---|---|
| L1 | Tool-use loop | One LLM picks tools with structured args -> execute -> feed results back -> repeat until a final answer (bounded by max_iterations). |
| L2 | Tools return raw data; the orchestrator composes | Tools are pure data fetchers. The calling LLM synthesizes — which lets it combine results across multiple tools. |
| L3 | Bounded, classified reflection | On failure, classify the error (retryable / not) and retry only the retryable ones, up to a limit. No unbounded self-correction. |
| L4 | Right tool for the job | The library never forces the loop. It coexists with heavier orchestration for the paths that need it. |
pip install -e ".[openai]" # or ".[gemini]", or ".[openai,gemini]"The core is dependency-free. Provider SDKs are optional extras and load
lazily, so importing agent_loop never requires openai or google-genai.
pip install -e ".[openai]"
export OPENAI_API_KEY=... # never commit your key
python examples/toy_agent/run.py "What is 12 * 8, and how big is Osaka?"Switch providers with no code change:
pip install -e ".[gemini]"
export GEMINI_API_KEY=...
export AGENT_LOOP_PROVIDER=gemini
python examples/toy_agent/run.py "What is 12 * 8, and how big is Osaka?"from agent_loop import AgentLoop, BoundedReflection
agent = AgentLoop(
provider=provider,
tools=[...],
model="...",
reflection=BoundedReflection(max_retries=2), # retry only retryable errors
)When a tool fails, the policy retries retryable errors up to the limit, then feeds the error back to the LLM as raw data instead of crashing the loop.
The loop emits tool_start / tool_end / final events (SSE-friendly). The
default is zero-dependency; plug in any sink (a function) to stream progress or
record cost.
def on_event(event):
print(event.type, event.data)
agent = AgentLoop(provider=provider, tools=[...], model="...", events=on_event)- Not a full framework. Plan-and-execute is explicitly out of scope; that defeats the reason this loop exists.
- Loose coupling by default. No required integration with any other library; optional adapters only.
- Provider differences absorbed in one layer. Gemini calls it function calling, OpenAI calls it tool use — same idea, normalized behind one contract.
src/agent_loop/
loop.py # the tool-use loop + max_iterations
tools.py # @tool decorator / ToolSpec (tools return raw data)
reflection.py # bounded, classified retry policy
providers/ # gemini + openai (tool-calling), behind one contract
events.py # SSE-friendly events
examples/toy_agent/ # domain-free agent (calculator + lookup)
tests/
MIT