Skip to content

codcreater1/agent-assignment

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Task Execution AI Agent

A small CLI agent that takes a real-world request ("book me a dentist next week after 5pm", "find 3 coworking spaces in Warsaw under $20/day") and tries to finish it: planning subtasks, asking clarifying questions, calling tools, and producing a clear final summary.

Take-home assignment for the Junior AI Agentic Engineer role (uv edition).

Architecture

agent-assignment/
├── pyproject.toml         # uv-managed dependencies
├── uv.lock                # locked versions (generated by `uv sync`)
├── .python-version        # 3.11
├── .env.example           # copy to .env and fill in your key
├── main.py                # CLI entry point (rich UI)
└── src/
    ├── __init__.py
    ├── llm.py             # thin OpenAI wrapper
    ├── prompts.py         # system prompt
    ├── tools.py           # mock tools + JSON schemas
    └── agent.py           # tool-calling loop

The agent follows a standard tool-calling loop:

  1. Send conversation + tool schemas to the LLM.
  2. If the model returns a tool call, run it (or, for ask_user, prompt the human) and append the result to the conversation.
  3. Repeat until the model returns a plain message — that's the final answer.
  4. Bail after AGENT_MAX_STEPS iterations (default 10) so a confused agent can't loop forever.

Tools

Tool Purpose
calendar_check(start_date, end_date) List free slots in a date range
search_service(query, category?, city?, max_price?) Search a small mock catalogue (dentists, coworking, hotels, transport)
booking_service(option_id, when?, notes?) Book a previously-found option (10% simulated transient failure)
reminder_create(title, when, notes?) Create a reminder
ask_user(question) Pseudo-tool — pauses and asks the user for clarification

All tools return {"ok": True, ...} or {"ok": False, "error": "..."} so the agent can read errors and adapt instead of crashing.

Setup

Requires Python 3.10+ and uv.

# 1. Install deps and create a virtualenv
uv sync

# 2. Configure your provider (see .env.example for options)
cp .env.example .env
# edit .env — fill in OPENAI_API_KEY (and OPENAI_BASE_URL if not OpenAI)

# 3. Run the agent
uv run python main.py                           # interactive
uv run python main.py "Find 3 coworking spaces in Warsaw under $20/day"

Provider

The agent talks to any OpenAI-compatible chat-completions endpoint. Defaults in .env.example point at Groq (free, fast) with meta-llama/llama-4-scout-17b-16e-instruct — chosen because it produces well-formed tool calls more reliably than older Llamas. llama-3.3-70b also works but occasionally emits malformed function-name strings; the agent loop catches that and retries with a corrective hint.

To use OpenAI, OpenRouter, Together, etc., just swap OPENAI_BASE_URL, OPENAI_API_KEY, and OPENAI_MODEL.

Environment variables

Var Default Notes
OPENAI_API_KEY Required. Provider key (sk-, gsk_, etc.)
OPENAI_BASE_URL OpenAI's Override for Groq / OpenRouter / etc.
OPENAI_MODEL gpt-4o-mini Any model the provider serves with tool calling
AGENT_MAX_STEPS 10 Hard cap on tool-calling iterations

Demo

Each screenshot is a real run against the Groq backend (llama-3.3-70b-versatile).

1. Filtered search — single tool call

"Find 3 coworking spaces in Warsaw under $20/day"

The agent calls search_service once with the right category, city, and max_price filters and produces the final summary. No clarifications needed.

Coworking search

2. Clarifying questions + self-correction after a tool error

"Book me a dentist appointment next week after 5pm"

This run shows everything the assignment is grading on:

  • Three clarifying questions via ask_user (city, date, time choice between the 17:00 and 18:00 free slots).
  • A real recovery from a model mistake. The agent's first booking_service call passed a hallucinated option_id: "12345". The tool rejected it (Unknown option_id '12345'. Call search_service first to get a valid id.). The agent read the error, called search_service again to get the real id (dent-003), and then booked successfully — confirmation BK-63254422.

That's exactly the failure-handling path the system prompt is designed to push the model into.

Dentist with clarifications and recovery

3. Multi-tool: schedule + reminder, and a budgeted trip plan

"Schedule a meeting with John next Tuesday afternoon and remind me 30 minutes before" "Plan a 2-day trip to Prague under 300 euro"

calendar_checkask_userreminder_create for the meeting; then a chain of search_service (hotel, transport) → ask_user (dates) → booking_service × 2 → calendar_check for the trip — all under the €300 cap.

Meeting reminder and Prague trip Trip booking and Brain Embassy

Design notes

  • Why a pseudo-tool for clarifications? It keeps the agent loop uniform — every model output is either a final message or one or more tool calls. No special-case parsing.
  • Why mock data over real APIs? Reproducibility and zero credentials. Swapping search_service to call a real provider is a one-function change; the agent code doesn't move.
  • Why a 10% booking failure? To force the failure-handling path during evaluation — you'll occasionally see the agent retry once and recover.
  • Why limit max steps? A misconfigured prompt or a model that loves re-checking the calendar can spiral. A hard cap is the cheapest safety net.

Trade-offs / things I'd do next

  • Add an integration test with a recorded LLM response (vcrpy / pytest).
  • Stream tokens for nicer UX on long final answers.
  • Persist _REMINDERS / _BOOKINGS to disk so a follow-up command (list my bookings) can see them across runs.
  • Replace the keyword-match search with a tiny vector store once the catalogue grows past ~50 items.

About

Task Execution AI Agent — plans subtasks, asks clarifying questions, calls tools, recovers from errors. Python + uv + OpenAI-compatible (Groq).

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages