Open-source tools for AI agent reliability - debug, evaluate, and decide what's worth building.
Most AI agents and most product plans share the same failure mode: they operate in a frictionless vacuum. They assume perfect state, perfect users, perfect coordination. Reality isn't like that. We build tools that make the friction visible - before you ship, and after something breaks.
Rewind: time-travel debugger for AI agents. Fork at any step, replay from the point of failure, and prove the fix works. Runs alongside Langfuse / LangSmith or standalone.
Jensen Way: physics-of-reality decision framework for AI coding assistants, extended from Jensen Huang's framing of Physical AI. Produces Build / Pivot / Kill verdicts backed by evidence.
Claude Code plugin for observing agent sessions in real time.
Two questions drive everything we build:
- When an AI agent fails, can you see what actually happened? Most teams can't. That's the problem Rewind solves.
- When you're about to commit engineering time, does the plan obey physics? Most teams don't ask. That's the question Jensen Way surfaces.
Both problems come from the same place - abstractions that hide reality until it breaks.
- Star a repo that looks useful
- Open an issue with a real-world run (build or kill)
- Send a PR; they are welcome on all projects