Vedant0527/Agentic-Code-Generation-Verification-System

🤖 Mini Agentic Code Verifier

An AI-powered agent that generates Python solutions to coding problems and automatically verifies their correctness — without any human inspection.

Built as a hands-on exploration of the themes of IITB Trust Lab's ProjCode-003: how do we build software with AI that we can actually trust?


🧠 What It Does

  1. Takes a coding problem as a natural language description
  2. Generates a Python solution using an LLM (Groq / LLaMA3-70B)
  3. Executes the code in a sandboxed Python environment
  4. Runs test cases and checks outputs against expected results
  5. Retries on failure — feeds error context back to the LLM for a smarter second attempt

This retry-with-feedback loop is what makes it agentic: the system detects its own failures and attempts self-correction.
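The loop described above can be sketched roughly as follows. This is a minimal illustration, not the code in main.py: `solve_with_retries`, `fake_generate`, and `fake_verify` are hypothetical names, and the LLM is replaced by a stub so the example runs without an API key. The limit of 3 attempts is taken from the "Attempt 1/3" line in the sample output.

```python
from typing import Callable, Optional, Tuple

# Hypothetical signatures; the real main.py, llm.py and runner.py may differ.
GenerateFn = Callable[[str, Optional[str]], str]  # (problem, feedback) -> source code
VerifyFn = Callable[[str], Optional[str]]         # source -> error text, or None if all tests pass


def solve_with_retries(problem: str, generate: GenerateFn, verify: VerifyFn,
                       max_attempts: int = 3) -> Tuple[Optional[str], int]:
    """Return (verified_source, attempts_used); source is None if every attempt failed."""
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        source = generate(problem, feedback)  # the previous error (if any) goes back to the model
        feedback = verify(source)
        if feedback is None:
            return source, attempt
    return None, max_attempts


# Stand-in for the LLM: the first reply is buggy, the reply after feedback is fixed.
def fake_generate(problem: str, feedback: Optional[str]) -> str:
    if feedback is None:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


# Stand-in for the runner: executes the source and checks a single test case.
def fake_verify(source: str) -> Optional[str]:
    namespace: dict = {}
    exec(source, namespace)
    got = namespace["add"](2, 3)
    return None if got == 5 else f"add(2, 3) returned {got}, expected 5"


source, attempts = solve_with_retries("Add two integers.", fake_generate, fake_verify)
print(f"VERIFIED after {attempts} attempt(s)")
```

With the stub above, the first attempt fails and the second succeeds, so the loop reports 2 attempts.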


📊 Sample Output

============================================================
PROBLEM: Fibonacci
Description: Return the nth Fibonacci number (0-indexed).
------------------------------------------------------------

🔄 Attempt 1/3 → Generating code...
Generated function:
def fibonacci(n: int) -> int:
    if n <= 1:
        return n
    a, b = 0, 1
    for _ in range(2, n + 1):
        a, b = b, a + b
    return b

📊 Result: 4/4 tests passed
   ✅ PASS | Input: (0,)  → Expected: 0  | Got: 0
   ✅ PASS | Input: (1,)  → Expected: 1  | Got: 1
   ✅ PASS | Input: (6,)  → Expected: 8  | Got: 8
   ✅ PASS | Input: (10,) → Expected: 55 | Got: 55

🎉 SUCCESS! VERIFIED after 1 attempt(s)

📈 OVERALL PERFORMANCE
Accuracy: 12/12 = 1.00

Result: 12/12 tests passed (100% accuracy) across 4 DSA (data structures and algorithms) problems, each on the first attempt.


🏗️ Architecture

Input Problem
     │
     ▼
 LLM Agent (Groq / LLaMA3-70B)
     │  generates Python function
     ▼
 Sandbox Runner (exec)
     │  runs test cases
     ▼
 Verifier
     │  pass → done ✅
     │  fail → feed error back to LLM → retry 🔄
     ▼
 Report (pass/fail per test case)
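As a rough sketch of the Sandbox Runner and Verifier stages, the snippet below loads a generated function with `exec` and compares each result with the expected value. The name `run_tests` and the result dictionary layout are assumptions made for illustration; runner.py may be organised differently. Note that `exec` in the same process isolates names, not privileges, so it is not a security boundary for untrusted code.

```python
from typing import Any, Dict, List, Tuple


def run_tests(source: str, func_name: str,
              tests: List[Tuple[tuple, Any]]) -> List[Dict[str, Any]]:
    """Execute `source`, call `func_name` on each input tuple, and record pass/fail."""
    namespace: Dict[str, Any] = {}
    try:
        exec(source, namespace)        # defines the generated function in a fresh namespace
        func = namespace[func_name]
    except Exception as exc:           # syntax error, or the function is missing
        return [{"input": args, "expected": expected, "got": None,
                 "passed": False, "error": repr(exc)} for args, expected in tests]

    results = []
    for args, expected in tests:
        try:
            got, error = func(*args), None
        except Exception as exc:       # a runtime error counts as a failed test
            got, error = None, repr(exc)
        results.append({"input": args, "expected": expected, "got": got,
                        "passed": error is None and got == expected, "error": error})
    return results


# The Fibonacci function and test cases shown in the sample output.
FIB_SOURCE = """
def fibonacci(n: int) -> int:
    if n <= 1:
        return n
    a, b = 0, 1
    for _ in range(2, n + 1):
        a, b = b, a + b
    return b
"""
FIB_TESTS = [((0,), 0), ((1,), 1), ((6,), 8), ((10,), 55)]

results = run_tests(FIB_SOURCE, "fibonacci", FIB_TESTS)
for r in results:
    status = "PASS" if r["passed"] else "FAIL"
    print(f"{status} | Input: {r['input']} -> Expected: {r['expected']} | Got: {r['got']}")
```

The `error` field is what a retry step could send back to the model when a test fails.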

📁 Project Structure

code-verifier/
├── main.py        # Agentic orchestrator — retry loop with error feedback
├── llm.py         # Groq API wrapper — prompt engineering for clean code output
├── runner.py      # Sandbox executor — runs generated code against test cases
├── problems.py    # DSA problem bank with test cases
└── .env           # GROQ_API_KEY=your_key_here
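The tree above says llm.py relies on prompt engineering to get clean code output. A complementary safeguard, sketched here as an assumption rather than as something the repo is known to do, is to strip any Markdown fence the model wraps around its answer before the code reaches the runner. `extract_code` is a hypothetical helper name.

```python
import re

FENCE = "`" * 3  # a Markdown code fence, built this way to keep the example readable

# Matches an optional "python"/"py" language tag and captures the fenced body.
_FENCED = re.compile(FENCE + r"(?:python|py)?[ \t]*\n(.*?)" + FENCE,
                     re.DOTALL | re.IGNORECASE)


def extract_code(reply: str) -> str:
    """Return the first fenced block in `reply`, or the whole reply if there is none."""
    match = _FENCED.search(reply)
    code = match.group(1) if match else reply
    return code.strip() + "\n"


reply = ("Here is the solution:\n" + FENCE + "python\n"
         "def reverse_string(s):\n    return s[::-1]\n" + FENCE + "\nHope this helps!")
print(extract_code(reply))
```

If the reply contains no fence, the text is returned unchanged apart from trimmed whitespace, so a well-behaved model response passes straight through.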

⚙️ Setup & Run

1. Clone the repo

git clone https://github.com/Vedant0527/Agentic-Code-Generation-Verification-System
cd Agentic-Code-Generation-Verification-System

2. Install dependencies

pip install groq python-dotenv

3. Get a free Groq API key

Sign up at console.groq.com — no credit card required.

echo "GROQ_API_KEY=your_key_here" > .env

4. Run

python main.py

🧪 Problems Included

| Problem | Tests | Result |
|---|---|---|
| Two Sum | 3 | ✅ 3/3 |
| Reverse String | 2 | ✅ 2/2 |
| Fibonacci | 4 | ✅ 4/4 |
| Longest Substring Without Repeating Characters | 3 | ✅ 3/3 |
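One plausible shape for a problem-bank entry is sketched below. The `Problem` dataclass and its field names are hypothetical and may not match problems.py; the Fibonacci description and test cases are copied from the sample output, and the per-problem counts come from the table above.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass
class Problem:
    """Hypothetical layout for one entry in the problem bank."""
    name: str
    description: str                # natural-language prompt sent to the LLM
    func_name: str                  # function the generated code must define
    tests: List[Tuple[tuple, Any]]  # (input arguments, expected output) pairs


FIBONACCI = Problem(
    name="Fibonacci",
    description="Return the nth Fibonacci number (0-indexed).",
    func_name="fibonacci",
    tests=[((0,), 0), ((1,), 1), ((6,), 8), ((10,), 55)],
)


def accuracy_line(passed: int, total: int) -> str:
    """Format the overall score in the style of the sample report."""
    ratio = passed / total if total else 0.0
    return f"Accuracy: {passed}/{total} = {ratio:.2f}"


# Test counts per problem, as listed in the table: 3 + 2 + 4 + 3 = 12.
counts = {"Two Sum": 3, "Reverse String": 2, "Fibonacci": 4,
          "Longest Substring Without Repeating Characters": 3}
total = sum(counts.values())
print(accuracy_line(total, total))
```

Storing inputs as tuples lets the runner call `func(*args)` uniformly, whether a problem takes one argument or several.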

🔬 Why This Matters

AI code generation is powerful but not trustworthy on its own. A model can write an elegant-looking solution that still silently fails on edge cases. This project is a minimal prototype of a verification layer: a system that wraps AI generation with automated correctness checking.

This directly mirrors the central question of IITB Trust Lab's ProjCode-003:

"How do we build software with AI that we can actually trust?"

The answer this project proposes: don't trust the output, verify it.


🔧 Planned Improvements

  • Error-aware retry — feed exact failure reason back to LLM for smarter correction
  • Time-limit enforcement per test case (prevent infinite loops)
  • JSON logging of all attempts and results
  • Command-line interface via argparse
  • Expand problem bank to LeetCode Easy/Medium set
  • Support C++ solution generation + compilation via subprocess
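For the planned time-limit enforcement, one possible approach (an assumption, not a description of existing code) is to run each test call in a child interpreter and rely on the `timeout` argument of `subprocess.run`, which kills the child when the limit is exceeded. `run_with_timeout` and its string-based calling convention are hypothetical.

```python
import subprocess
import sys


def run_with_timeout(source: str, call_expr: str, timeout_s: float = 2.0) -> str:
    """Run `source`, evaluate `call_expr` in a child Python process, and return its repr.

    Returns "TIMEOUT" if the child exceeds `timeout_s` seconds, or "ERROR: ..." if it crashes.
    """
    program = source + "\nprint(repr(" + call_expr + "))\n"
    try:
        proc = subprocess.run([sys.executable, "-c", program],
                              capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "TIMEOUT"              # subprocess.run has already killed the child
    if proc.returncode != 0:
        lines = proc.stderr.strip().splitlines()
        return "ERROR: " + (lines[-1] if lines else "unknown failure")
    return proc.stdout.strip()


good = "def double(n):\n    return n * 2\n"
stuck = "def spin():\n    while True:\n        pass\n"

print(run_with_timeout(good, "double(21)", timeout_s=10.0))
print(run_with_timeout(stuck, "spin()", timeout_s=0.5))
```

A separate process also keeps a crashing or hanging solution from taking the orchestrator down with it, at the cost of interpreter start-up time per call.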

👤 Author

Vedant Shri Agarwal
B.Tech — Electrical and Computer Engineering, Thapar University
GitHub · LinkedIn · [email protected]

About

A self-correcting AI system that generates, executes, and validates Python code using LLMs through an automated test-and-retry loop.
