Agent-driven Transcription, Learning, Analysis and Summarization
ATLAS is a Streamlit app that transcribes and analyzes lecture audio, video, or YouTube content using Groq Whisper and a multi-agent LLM pipeline. It produces a summary, structured overview, key points, and a quiz, then exports results to PDF, Word, or JSON with validation scores.
- Live microphone recording with start/stop and preview
- File uploads for audio/video (wav, mp3, m4a, mp4, avi, mov)
- YouTube URL transcription via yt-dlp
- Groq Whisper transcription with chunking for large files
- Multi-agent pipeline with validator feedback and scoring
- Summary, overview, key points, and quiz generation
- Export to PDF, Word, or JSON
- Session history and validation panels
- Artifacts stored per run (inputs and outputs)
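As a rough illustration of the "chunking for large files" feature, the sketch below splits a recording into consecutive time windows that could each be transcribed separately. The function name `chunk_windows` and the 600-second default are assumptions made for this example; the actual chunking strategy in ATLAS (by duration, by file size, or otherwise) is not specified here.

```python
from typing import List, Tuple


def chunk_windows(duration_s: float, chunk_s: float = 600.0) -> List[Tuple[float, float]]:
    """Return consecutive (start, end) windows in seconds covering the recording.

    Hypothetical helper: the chunk length is an assumed value, not an ATLAS setting.
    """
    if chunk_s <= 0:
        raise ValueError("chunk_s must be positive")
    windows: List[Tuple[float, float]] = []
    start = 0.0
    while start < duration_s:
        # The last window is shortened so it never runs past the end of the audio.
        end = min(start + chunk_s, duration_s)
        windows.append((start, end))
        start = end
    return windows


if __name__ == "__main__":
    for start, end in chunk_windows(1500, 600):
        print(f"{start:.0f}s to {end:.0f}s")
```

Each window would then be cut from the audio and sent to the transcription service, with the partial transcripts joined in order.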
ATLAS runs 8 agents in total: 4 worker agents that generate content and 4 validator agents that check quality.
Worker agents:
- Summarizer: writes the overall summary
- Overview: creates a structured, hierarchical outline
- Key Points: extracts definitions, facts, methods, and tips
- Quiz: generates MCQ and short-answer questions
Validator agents (one per worker):
- Summary Validator
- Overview Validator
- Key Points Validator
- Quiz Validator
How it works:
- The transcript is sent to a worker agent for a task.
- The paired validator scores the output (1-10) and provides feedback.
- If the score is below the threshold, the worker retries using that feedback.
- This loop repeats up to the configured retry limit, and the best output is kept.
- Final results include validation scores shown in the UI and exports.
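The loop described above can be sketched as follows. This is a minimal illustration, not the code in agents/: the names `run_with_validation` and `Attempt`, the callable signatures, and the default threshold and retry limit are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Attempt:
    output: str
    score: int      # validator score on the 1-10 scale
    feedback: str


def run_with_validation(
    transcript: str,
    worker: Callable[[str, Optional[str]], str],
    validator: Callable[[str, str], Tuple[int, str]],
    threshold: int = 7,    # assumed default, not ATLAS's configured value
    max_retries: int = 2,  # assumed default, not ATLAS's configured value
) -> Attempt:
    """Run a worker, score it with its validator, and retry with feedback."""
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    best: Optional[Attempt] = None
    feedback: Optional[str] = None
    for _ in range(max_retries + 1):
        # The first call gets no feedback; retries get the validator's last feedback.
        output = worker(transcript, feedback)
        score, feedback = validator(transcript, output)
        if best is None or score > best.score:
            best = Attempt(output, score, feedback)
        if score >= threshold:
            break  # good enough, stop retrying
    assert best is not None  # the loop always runs at least once
    return best
```

Keeping the highest-scoring attempt, rather than the last one, means a retry that scores worse than an earlier draft does not replace it.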
🎤 Input Layer
├── Microphone (live start/stop)
├── Audio/Video Upload
└── YouTube URL
↓
🔊 Transcription (Groq Whisper)
- Chunking for large files
- Full transcript
↓
┌──────────────────────────────┐
│ 🤖 Agent Manager (8 agents) │
└──────────────────────────────┘
↓
Summarizer → Summary Validator → retry if below threshold
↓
Overview → Overview Validator → retry if below threshold
↓
Key Points → Key Points Validator → retry if below threshold
↓
Quiz → Quiz Validator → retry if below threshold
↓
📄 Export Layer
├── PDF
├── Word
└── JSON
↓
Artifacts stored in artifacts/<source>/<timestamp>/{inputs,outputs}
| Component | Technology |
|---|---|
| UI | Streamlit + custom CSS |
| Transcription | Groq Whisper (OpenAI-compatible) |
| LLM | Groq Llama 3.3 70B (OpenAI-compatible) |
| Audio | sounddevice, scipy, pydub |
| YouTube | yt-dlp |
| Export | fpdf, python-docx, json |
main.py
pipeline.py
config.py
core/
├── audio.py
├── transcription.py
├── llm.py
└── export.py
agents/
├── agent_base.py
├── pipeline.py
├── summarizer.py
├── overview.py
├── keypoints.py
├── quiz.py
└── validators.py
style.css
artifacts/
- Clone the repo.
- Install dependencies (Python 3.9+): pip install -r requirements.txt
- Create a local env file. On Windows: copy example.env .env. On macOS/Linux: cp example.env .env
- Edit .env and set GROQ_API_KEY if you are using Groq. The pipeline is model-agnostic, so you can swap in another provider or even a local LLM.
- Run the app: streamlit run main.py
Note: Do not commit .env. Commit example.env to GitHub as the template instead.
The app is model-agnostic. Groq is the default in this repo, but you can plug in any provider or a local LLM by adapting the client in core/llm.py.
- GROQ_API_KEY (required only if using Groq)
- GROQ_BASE_URL (optional, default https://api.groq.com/openai/v1)
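A minimal sketch of how these two variables might be read, assuming they are already present in the process environment (for example, loaded from .env before this runs). The function name `load_groq_settings` and the returned dictionary shape are hypothetical and are not taken from config.py.

```python
import os
from typing import Dict, Mapping, Optional

DEFAULT_GROQ_BASE_URL = "https://api.groq.com/openai/v1"


def load_groq_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Read Groq settings, falling back to the documented default base URL."""
    env = os.environ if env is None else env
    api_key = env.get("GROQ_API_KEY", "").strip()
    if not api_key:
        # Only relevant when Groq is the chosen provider.
        raise RuntimeError("GROQ_API_KEY is not set; add it to .env or the environment")
    base_url = env.get("GROQ_BASE_URL", "").strip() or DEFAULT_GROQ_BASE_URL
    return {"api_key": api_key, "base_url": base_url}
```

Because the Groq endpoint is OpenAI-compatible, the resulting key and base URL are the two values an OpenAI-style client typically needs; pointing GROQ_BASE_URL at another compatible server is one way to swap providers.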
- Choose input method: YouTube URL, file upload, or microphone recording.
- Process the input.
- Review summary, overview, key points, and quiz.
- Download results in the desired format.
Each run is stored under:
artifacts/<source>/<timestamp>/{inputs,outputs}
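The layout above could be created with a helper along these lines. The function name `make_run_dirs`, the example source label, and the `YYYYMMDD_HHMMSS` timestamp format are assumptions for illustration; the format ATLAS actually uses is not stated here.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional, Tuple


def make_run_dirs(root: Path, source: str, now: Optional[datetime] = None) -> Tuple[Path, Path]:
    """Create <root>/<source>/<timestamp>/{inputs,outputs} and return both paths."""
    # Timestamp format is an assumed choice that sorts chronologically by name.
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    run_dir = root / source / stamp
    inputs, outputs = run_dir / "inputs", run_dir / "outputs"
    inputs.mkdir(parents=True, exist_ok=True)
    outputs.mkdir(parents=True, exist_ok=True)
    return inputs, outputs


if __name__ == "__main__":
    # "youtube" is a hypothetical source label used only for this demo.
    inputs_dir, outputs_dir = make_run_dirs(Path("artifacts"), "youtube")
    print(inputs_dir, outputs_dir)
```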
MIT License.