Describe an agent in plain English → get a model recommendation → pick one → run a production-style agentic RAG workflow with the whole system architecture visible. Built with LangGraph, LangChain, hybrid retrieval (BM25 + vector), an exact-match cache, and LangSmith tracing, in a Streamlit UI.
The agentic workflow reimplements the decision-node pattern from the "arXiv Paper Curator" production RAG course (jamwithai), in original code: guardrail → retrieve → grade documents → rewrite query → generate, with adaptive retrieval (rewrite-and-retry) when results come up short.
`pip install -r requirements.txt && streamlit run app.py`

Open the printed URL (usually http://localhost:8501). Add one model API key in the sidebar (OpenAI, Anthropic, or Groq). Embeddings default to a small local model, so no embedding key is needed.
    START → guardrail ─(on-topic?)→ retrieve → grade ─(relevant?)→ generate → END
                │ no                             │ no & attempts left
                └────────→ generate              └────────→ rewrite → retrieve (loop)
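The two branch points in the diagram above can be sketched as plain routing functions. This is a minimal illustration, not the app's LangGraph code: the state fields, the `MAX_REWRITES` constant, and the `walk` helper are assumptions made for the example, and only the name `after_grade` comes from this README.

```python
from typing import List, TypedDict

MAX_REWRITES = 2  # assumed constant mirroring the "up to 2 times" retry budget


class RAGState(TypedDict):
    """Hypothetical state shape; the real workflow may carry different fields."""
    question: str
    on_topic: bool
    graded_docs: List[str]
    rewrites: int


def after_guardrail(state: RAGState) -> str:
    """Off-topic questions skip retrieval and go straight to generate."""
    return "retrieve" if state["on_topic"] else "generate"


def after_grade(state: RAGState) -> str:
    """Generate if any document survived grading or the retry budget is spent."""
    if state["graded_docs"] or state["rewrites"] >= MAX_REWRITES:
        return "generate"
    return "rewrite"


def walk(on_topic: bool, hits_per_attempt: List[int]) -> List[str]:
    """Return the node sequence for a simulated run.

    hits_per_attempt[i] is how many documents pass grading on retrieval i,
    a stand-in for real LLM grading.
    """
    state: RAGState = {
        "question": "", "on_topic": on_topic, "graded_docs": [], "rewrites": 0,
    }
    path = ["guardrail"]
    node = after_guardrail(state)
    attempt = 0
    while node != "generate":
        if node == "retrieve":
            path += ["retrieve", "grade"]
            hits = hits_per_attempt[attempt] if attempt < len(hits_per_attempt) else 0
            state["graded_docs"] = ["doc"] * hits
            attempt += 1
            node = after_grade(state)
        else:  # node == "rewrite"
            path.append("rewrite")
            state["rewrites"] += 1
            node = "retrieve"
    path.append("generate")
    return path
```

Calling `walk(True, [0, 0, 0])` visits `rewrite` twice and then falls through to `generate`, which is the point where the agent answers that it does not know.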
- guardrail: LLM checks if the question is inside the agent's domain; off-topic questions are refused instead of hallucinated.
- retrieve: hybrid search with BM25 (keyword) + Chroma (vector), fused with reciprocal rank fusion via `EnsembleRetriever`.
- grade: LLM keeps only the documents it judges relevant (structured output, with a graceful fallback for models that don't support it).
- rewrite: when graded results are empty, the query is rephrased and retrieval retries (up to 2 times).
- generate: answers from graded context only, or says it doesn't know.
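To make the fusion step in the retrieve node concrete, here is a small, unweighted sketch of reciprocal rank fusion in plain Python. It is not the `EnsembleRetriever` implementation: the function name, the document ids, and the constant `k = 60` (a commonly used value) are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Ties are broken alphabetically by id.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from the keyword and vector retrievers.
bm25_hits = ["a", "b", "c"]
vector_hits = ["c", "a", "d"]
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
```

Documents that appear in both lists ("a" and "c" here) rise above documents found by only one retriever, which is the behaviour hybrid search relies on.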
Solid components run here; the rest are the heavier production pieces you'd swap in. The architecture diagram in the app shows both.
| Concern | This app (runnable) | Course / full production |
|---|---|---|
| UI | Streamlit | Gradio + Telegram bot |
| Orchestration | LangGraph agentic RAG | same |
| Hybrid search | BM25 + Chroma + RRF | OpenSearch (BM25 + vector) |
| Vector store | Chroma | OpenSearch |
| LLM | OpenAI / Anthropic / Groq | Ollama (local) |
| Embeddings | local HF or OpenAI | Jina AI |
| Cache | in-memory exact-match | Redis |
| Observability | LangSmith | Langfuse |
| Ingestion | pasted KB text | Airflow + PostgreSQL + arXiv API |
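
The in-memory exact-match cache in the table can be pictured as a dictionary keyed on the question text. The sketch below is an assumption about how such a cache might look, not the code in `retrieval.py`; the class name and the normalization step (lowercasing and collapsing whitespace) are hypothetical choices.

```python
import hashlib
from typing import Dict, Optional


class ExactMatchCache:
    """Hypothetical in-memory cache: same question text in, same answer out."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    @staticmethod
    def _key(question: str) -> str:
        # Assumed normalization: lowercase and collapse runs of whitespace.
        normalized = " ".join(question.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def get(self, question: str) -> Optional[str]:
        """Return the cached answer, or None on a miss."""
        return self._store.get(self._key(question))

    def put(self, question: str, answer: str) -> None:
        """Store an answer under the normalized question."""
        self._store[self._key(question)] = answer


cache = ExactMatchCache()
cache.put("What is hybrid retrieval?", "BM25 plus vector search, fused.")
hit = cache.get("  what is   hybrid retrieval? ")
miss = cache.get("What is BM25?")
```

An exact-match cache only helps with repeated questions; a paraphrase misses the cache and runs the full workflow again.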
    agent-builder/
    ├── app.py                 # Streamlit UI and run loop
    ├── builder/
    │   ├── recommender.py     # model registry + keyword recommender
    │   ├── providers.py       # LLM + embedding factories
    │   ├── retrieval.py       # hybrid retriever (BM25 + vector) + cache
    │   ├── agentic_rag.py     # LangGraph workflow (guardrail/grade/rewrite/...)
    │   └── diagrams.py        # system-architecture Mermaid
    └── requirements.txt
- The agentic logic is reimplemented from the course's described architecture, not copied from its source. Edit the model registry in `recommender.py` to match the models you have access to.
- To make this defensible as your own, be able to explain why each node exists, what `after_grade` decides, and the BM25/vector trade-off. That understanding is what survives an interview, not the code sitting in a file.