Task Execution AI Agent

A small CLI agent that takes a real-world request ("book me a dentist next week after 5pm", "find 3 coworking spaces in Warsaw under $20/day") and tries to finish it: planning subtasks, asking clarifying questions, calling tools, and producing a clear final summary.

Take-home assignment for the Junior AI Agentic Engineer role (uv edition).

Architecture

agent-assignment/
├── pyproject.toml         # uv-managed dependencies
├── uv.lock                # locked versions (generated by `uv sync`)
├── .python-version        # 3.11
├── .env.example           # copy to .env and fill in your key
├── main.py                # CLI entry point (rich UI)
└── src/
    ├── __init__.py
    ├── llm.py             # thin OpenAI wrapper
    ├── prompts.py         # system prompt
    ├── tools.py           # mock tools + JSON schemas
    └── agent.py           # tool-calling loop

The agent follows a standard tool-calling loop:

Send conversation + tool schemas to the LLM.
If the model returns a tool call, run it (or, for ask_user, prompt the human) and append the result to the conversation.
Repeat until the model returns a plain message — that's the final answer.
Bail after AGENT_MAX_STEPS iterations (default 10) so a confused agent can't loop forever.

Tools

Tool	Purpose
`calendar_check(start_date, end_date)`	List free slots in a date range
`search_service(query, category?, city?, max_price?)`	Search a small mock catalogue (dentists, coworking, hotels, transport)
`booking_service(option_id, when?, notes?)`	Book a previously-found option (10% simulated transient failure)
`reminder_create(title, when, notes?)`	Create a reminder
`ask_user(question)`	Pseudo-tool — pauses and asks the user for clarification

All tools return {"ok": True, ...} or {"ok": False, "error": "..."} so the agent can read errors and adapt instead of crashing.

Setup

Requires Python 3.10+ and uv.

# 1. Install deps and create a virtualenv
uv sync

# 2. Configure your provider (see .env.example for options)
cp .env.example .env
# edit .env — fill in OPENAI_API_KEY (and OPENAI_BASE_URL if not OpenAI)

# 3. Run the agent
uv run python main.py                           # interactive
uv run python main.py "Find 3 coworking spaces in Warsaw under $20/day"

Provider

The agent talks to any OpenAI-compatible chat-completions endpoint. Defaults in .env.example point at Groq (free, fast) with meta-llama/llama-4-scout-17b-16e-instruct — chosen because it produces well-formed tool calls more reliably than older Llamas. llama-3.3-70b also works but occasionally emits malformed function-name strings; the agent loop catches that and retries with a corrective hint.

To use OpenAI, OpenRouter, Together, etc., just swap OPENAI_BASE_URL, OPENAI_API_KEY, and OPENAI_MODEL.

Environment variables

Var	Default	Notes
`OPENAI_API_KEY`	—	Required. Provider key (`sk-`, `gsk_`, etc.)
`OPENAI_BASE_URL`	OpenAI's	Override for Groq / OpenRouter / etc.
`OPENAI_MODEL`	`gpt-4o-mini`	Any model the provider serves with tool calling
`AGENT_MAX_STEPS`	`10`	Hard cap on tool-calling iterations

Demo

Each screenshot is a real run against the Groq backend (llama-3.3-70b-versatile).

1. Filtered search — single tool call

"Find 3 coworking spaces in Warsaw under $20/day"

The agent calls search_service once with the right category, city, and max_price filters and produces the final summary. No clarifications needed.

2. Clarifying questions + self-correction after a tool error

"Book me a dentist appointment next week after 5pm"

This run shows everything the assignment is grading on:

Three clarifying questions via ask_user (city, date, time choice between the 17:00 and 18:00 free slots).
A real recovery from a model mistake. The agent's first booking_service call passed a hallucinated option_id: "12345". The tool rejected it (Unknown option_id '12345'. Call search_service first to get a valid id.). The agent read the error, called search_service again to get the real id (dent-003), and then booked successfully — confirmation BK-63254422.

That's exactly the failure-handling path the system prompt is designed to push the model into.

3. Multi-tool: schedule + reminder, and a budgeted trip plan

"Schedule a meeting with John next Tuesday afternoon and remind me 30 minutes before" "Plan a 2-day trip to Prague under 300 euro"

calendar_check → ask_user → reminder_create for the meeting; then a chain of search_service (hotel, transport) → ask_user (dates) → booking_service × 2 → calendar_check for the trip — all under the €300 cap.

Design notes

Why a pseudo-tool for clarifications? It keeps the agent loop uniform — every model output is either a final message or one or more tool calls. No special-case parsing.
Why mock data over real APIs? Reproducibility and zero credentials. Swapping search_service to call a real provider is a one-function change; the agent code doesn't move.
Why a 10% booking failure? To force the failure-handling path during evaluation — you'll occasionally see the agent retry once and recover.
Why limit max steps? A misconfigured prompt or a model that loves re-checking the calendar can spiral. A hard cap is the cheapest safety net.

Trade-offs / things I'd do next

Add an integration test with a recorded LLM response (vcrpy / pytest).
Stream tokens for nicer UX on long final answers.
Persist _REMINDERS / _BOOKINGS to disk so a follow-up command (list my bookings) can see them across runs.
Replace the keyword-match search with a tiny vector store once the catalogue grows past ~50 items.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Task Execution AI Agent

Architecture

Tools

Setup

Provider

Environment variables

Demo

1. Filtered search — single tool call

2. Clarifying questions + self-correction after a tool error

3. Multi-tool: schedule + reminder, and a budgeted trip plan

Design notes

Trade-offs / things I'd do next

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
docs/screenshots		docs/screenshots
src		src
.env.example		.env.example
.gitignore		.gitignore
.python-version		.python-version
README.md		README.md
main.py		main.py
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Folders and files

Latest commit

History

Repository files navigation

Task Execution AI Agent

Architecture

Tools

Setup

Provider

Environment variables

Demo

1. Filtered search — single tool call

2. Clarifying questions + self-correction after a tool error

3. Multi-tool: schedule + reminder, and a budgeted trip plan

Design notes

Trade-offs / things I'd do next

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages