🤖 Mini Agentic Code Verifier

An AI-powered agent that generates Python solutions to coding problems and automatically checks them against test cases, with no human inspection of the generated code.

Built as a hands-on exploration of ProjCode-003 themes: How do we build software with AI that we can actually trust?

🧠 What It Does

1. Takes a coding problem as a natural language description
2. Generates a Python solution using an LLM (Groq / LLaMA3-70B)
3. Executes the code in a sandboxed Python environment
4. Runs test cases and checks outputs against expected results
5. Retries on failure: feeds error context back to the LLM for a smarter second attempt

This retry-with-feedback loop is what makes it agentic: the system detects its own failures and attempts self-correction.
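The loop described above can be sketched roughly as follows. This is a minimal illustration, not the code in main.py: the names `run_tests`, `solve_with_retries`, and `fake_generate` are hypothetical, and `fake_generate` is a stand-in for the real LLM call so the example runs without an API key.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical shapes: a test case is (positional args, expected output);
# a generator maps (description, feedback from the last attempt) to source code.
TestCase = Tuple[tuple, object]
Generator = Callable[[str, Optional[str]], str]


def run_tests(source: str, func_name: str, tests: List[TestCase]) -> List[str]:
    """Execute `source` and return failure messages (an empty list means all passed)."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # NOTE: exec on its own is not a security boundary
        func = namespace[func_name]
    except Exception as exc:
        return [f"code failed to load: {exc!r}"]
    failures = []
    for args, expected in tests:
        try:
            got = func(*args)
        except Exception as exc:
            failures.append(f"input {args}: raised {exc!r}")
            continue
        if got != expected:
            failures.append(f"input {args}: expected {expected!r}, got {got!r}")
    return failures


def solve_with_retries(description: str, func_name: str, tests: List[TestCase],
                       generate: Generator, max_attempts: int = 3):
    """Return (source, attempts_used) on success, or (None, max_attempts) on failure."""
    feedback = None
    for attempt in range(1, max_attempts + 1):
        source = generate(description, feedback)
        failures = run_tests(source, func_name, tests)
        if not failures:
            return source, attempt
        feedback = "\n".join(failures)  # becomes extra context for the next attempt
    return None, max_attempts


def fake_generate(description: str, feedback: Optional[str]) -> str:
    """Stand-in for the LLM: wrong on the first call, correct once it sees feedback."""
    if feedback is None:
        return "def fibonacci(n):\n    return n\n"
    return (
        "def fibonacci(n):\n"
        "    a, b = 0, 1\n"
        "    for _ in range(n):\n"
        "        a, b = b, a + b\n"
        "    return a\n"
    )


tests = [((0,), 0), ((1,), 1), ((6,), 8), ((10,), 55)]
source, attempts = solve_with_retries("nth Fibonacci", "fibonacci", tests, fake_generate)
print(f"verified after {attempts} attempt(s)")
```

With the stand-in generator, the first attempt fails on inputs 6 and 10, the failure messages are passed back, and the second attempt passes all four tests.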

📊 Sample Output

============================================================
PROBLEM: Fibonacci
Description: Return the nth Fibonacci number (0-indexed).
------------------------------------------------------------

🔄 Attempt 1/3 → Generating code...
Generated function:
def fibonacci(n: int) -> int:
    if n <= 1:
        return n
    a, b = 0, 1
    for _ in range(2, n + 1):
        a, b = b, a + b
    return b

📊 Result: 4/4 tests passed
   ✅ PASS | Input: (0,)  → Expected: 0  | Got: 0
   ✅ PASS | Input: (1,)  → Expected: 1  | Got: 1
   ✅ PASS | Input: (6,)  → Expected: 8  | Got: 8
   ✅ PASS | Input: (10,) → Expected: 55 | Got: 55

🎉 SUCCESS! VERIFIED after 1 attempt(s)

📈 OVERALL PERFORMANCE
Accuracy: 12/12 = 1.00

Result: 12/12 tests passed (100% accuracy) across 4 DSA problems on first attempt.

🏗️ Architecture

Input Problem
     │
     ▼
 LLM Agent (Groq / LLaMA3-70B)
     │  generates Python function
     ▼
 Sandbox Runner (exec)
     │  runs test cases
     ▼
 Verifier
     │  pass → done ✅
     │  fail → feed error back to LLM → retry 🔄
     ▼
 Report (pass/fail per test case)
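
The runner, verifier, and report stages of the diagram could look something like the sketch below. It assumes the runner uses `exec` in-process, as the diagram suggests; `CaseResult`, `verify`, and `format_report` are hypothetical names, not the actual contents of runner.py. Note that `exec` in the same process gives no real isolation from the generated code.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass
class CaseResult:
    args: tuple
    expected: Any
    got: Any
    passed: bool


def verify(source: str, func_name: str, tests: List[Tuple[tuple, Any]]) -> List[CaseResult]:
    """Load the generated function and record one result per test case."""
    namespace: dict = {}
    exec(source, namespace)  # in-process execution; an assumption based on the diagram
    func = namespace[func_name]
    results = []
    for args, expected in tests:
        try:
            got = func(*args)
        except Exception as exc:
            # A crash counts as a failed test; keep the error text for the report.
            got, passed = repr(exc), False
        else:
            passed = got == expected
        results.append(CaseResult(args, expected, got, passed))
    return results


def format_report(results: List[CaseResult]) -> str:
    """Render a pass/fail line per test case, similar to the sample output."""
    passed = sum(r.passed for r in results)
    lines = [f"Result: {passed}/{len(results)} tests passed"]
    for r in results:
        status = "PASS" if r.passed else "FAIL"
        lines.append(f"   {status} | Input: {r.args} -> Expected: {r.expected} | Got: {r.got}")
    return "\n".join(lines)


src = "def reverse_string(s):\n    return s[::-1]\n"
print(format_report(verify(src, "reverse_string", [(("abc",), "cba"), (("",), "")])))
```

Keeping the per-test results as structured records, rather than printing directly, is one way to let the same data drive both the console report and the feedback sent to the LLM on a retry.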

📁 Project Structure

code-verifier/
├── main.py        # Agentic orchestrator — retry loop with error feedback
├── llm.py         # Groq API wrapper — prompt engineering for clean code output
├── runner.py      # Sandbox executor — runs generated code against test cases
├── problems.py    # DSA problem bank with test cases
└── .env           # GROQ_API_KEY=your_key_here

⚙️ Setup & Run

1. Clone the repo

git clone https://github.com/Vedant0527/code-verifier
cd code-verifier

2. Install dependencies

pip install groq python-dotenv

3. Get a free Groq API key

Sign up at console.groq.com — no credit card required.

echo "GROQ_API_KEY=your_key_here" > .env

4. Run

python main.py

🧪 Problems Included

| Problem | Tests | Result |
| --- | --- | --- |
| Two Sum | 3 | ✅ 3/3 |
| Reverse String | 2 | ✅ 2/2 |
| Fibonacci | 4 | ✅ 4/4 |
| Longest Substring Without Repeating Characters | 3 | ✅ 3/3 |
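
As an illustration of how a problem bank entry and the overall accuracy figure might be represented, here is a small sketch. The `Problem` dataclass and its field names are assumptions rather than the actual layout of problems.py; the Fibonacci test cases are the ones shown in the sample output, and the (passed, total) pairs come from the table above.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass
class Problem:
    name: str
    description: str
    func_name: str                   # hypothetical: name the generated function must use
    tests: List[Tuple[tuple, Any]]   # (positional args, expected output)


FIBONACCI = Problem(
    name="Fibonacci",
    description="Return the nth Fibonacci number (0-indexed).",
    func_name="fibonacci",
    tests=[((0,), 0), ((1,), 1), ((6,), 8), ((10,), 55)],
)


def overall_accuracy(per_problem: List[Tuple[int, int]]) -> float:
    """Aggregate (passed, total) pairs, one per problem, into a single ratio."""
    passed = sum(p for p, _ in per_problem)
    total = sum(t for _, t in per_problem)
    return passed / total if total else 0.0


# (passed, total) per problem, in table order.
table = [(3, 3), (2, 2), (4, 4), (3, 3)]
passed_total = sum(p for p, _ in table)
test_total = sum(t for _, t in table)
print(f"Accuracy: {passed_total}/{test_total} = {overall_accuracy(table):.2f}")
# prints: Accuracy: 12/12 = 1.00
```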

🔬 Why This Matters

AI code generation is powerful but not trustworthy on its own. A model can write an elegant-looking solution that still silently fails on edge cases. This project is a minimal prototype of a verification layer: a system that wraps AI generation with automated correctness checking.

This directly mirrors the central question of IITB Trust Lab's ProjCode-003:

"How do we build software with AI that we can actually trust?"

The answer this project proposes: don't trust the output, verify it.

🔧 Planned Improvements

Error-aware retry — feed exact failure reason back to LLM for smarter correction
Time-limit enforcement per test case (prevent infinite loops)
JSON logging of all attempts and results
CLI interface via argparse
Expand problem bank to LeetCode Easy/Medium set
Support C++ solution generation + compilation via subprocess
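
For the planned time-limit enforcement, one possible approach is to run each call in a child interpreter via `subprocess` and let the `timeout` argument stop runaway code. The sketch below is only an assumption about how this could be done, not a committed design; `run_with_time_limit` and its parameters are hypothetical.

```python
import subprocess
import sys


def run_with_time_limit(source: str, call_expr: str, timeout_s: float = 5.0) -> dict:
    """Run `source`, then print repr(call_expr), in a separate Python process.

    The child is killed if it exceeds `timeout_s` seconds, so an infinite loop
    in generated code cannot hang the verifier.
    """
    program = f"{source}\nprint(repr({call_expr}))\n"
    try:
        proc = subprocess.run(
            [sys.executable, "-c", program],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return {"status": "timeout", "output": None}
    if proc.returncode != 0:
        # Uncaught exceptions in the child end up on stderr.
        return {"status": "error", "output": proc.stderr.strip()}
    return {"status": "ok", "output": proc.stdout.strip()}
```

For example, `run_with_time_limit("def f(n):\n    return n * 2", "f(21)")` returns a dict with status "ok" and output "42", while a function containing `while True: pass` comes back with status "timeout". The returned dicts are plain data, so they could also feed the planned JSON logging. A separate process also isolates crashes better than in-process `exec`, though it is still not a full security sandbox.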

👤 Author

Vedant Shri Agarwal
B.Tech — Electrical and Computer Engineering, Thapar University
GitHub · LinkedIn · [email protected]
