
Lark-Alfen/ATLAS


A T L A S

Agent-driven Transcription, Learning, Analysis and Summarization

ATLAS is a Streamlit app that transcribes and analyzes lecture audio, video, or YouTube content using Groq Whisper and a multi-agent LLM pipeline. It produces a summary, a structured overview, key points, and a quiz, then exports the results, together with their validation scores, to PDF, Word, or JSON.

Features

  • Live microphone recording with start/stop and preview
  • File uploads for audio/video (wav, mp3, m4a, mp4, avi, mov)
  • YouTube URL transcription via yt-dlp
  • Groq Whisper transcription with chunking for large files
  • Multi-agent pipeline with validator feedback and scoring
  • Summary, overview, key points, and quiz generation
  • Export to PDF, Word, or JSON
  • Session history and validation panels
  • Artifacts stored per run (inputs and outputs)

Agent Pipeline (8 Agents)

ATLAS runs 8 agents in total: 4 worker agents that generate content and 4 validator agents that check quality.

Worker agents:

  • Summarizer: writes the overall summary
  • Overview: creates a structured, hierarchical outline
  • Key Points: extracts definitions, facts, methods, and tips
  • Quiz: generates MCQ and short-answer questions
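The Quiz agent produces two question types. As a hedged illustration, the sketch below models a quiz item and runs a few cheap structural checks (the answer is one of the options, no duplicate options) that could run before an LLM validator sees the quiz. The `QuizQuestion` class and its fields are hypothetical; the README does not document ATLAS's actual quiz schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class QuizQuestion:
    # Hypothetical schema for illustration only.
    question: str
    answer: str
    # Empty for short-answer questions; two or more choices for MCQ.
    options: List[str] = field(default_factory=list)

    @property
    def kind(self) -> str:
        return "mcq" if self.options else "short_answer"

    def problems(self) -> List[str]:
        """Return structural issues that a quiz validator could flag."""
        issues = []
        if not self.question.strip():
            issues.append("empty question")
        if self.options:
            if len(self.options) < 2:
                issues.append("MCQ needs at least two options")
            if self.answer not in self.options:
                issues.append("answer is not one of the options")
            if len(set(self.options)) != len(self.options):
                issues.append("duplicate options")
        elif not self.answer.strip():
            issues.append("short-answer question has no reference answer")
        return issues
```

Checks like these are deterministic and free, so they complement, rather than replace, an LLM judge that scores clarity and correctness.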

Validator agents (one per worker):

  • Summary Validator
  • Overview Validator
  • Key Points Validator
  • Quiz Validator
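Each validator returns a 1-10 score plus feedback. A minimal parsing sketch, assuming (hypothetically) that validators are prompted to answer in JSON such as `{"score": 7, "feedback": "..."}`; `parse_validation` and `Validation` are illustrative names, not the repo's API.

```python
import json
import re
from dataclasses import dataclass


@dataclass
class Validation:
    score: int      # clamped to the 1-10 scale used by the validators
    feedback: str   # free-text critique passed back to the worker


def parse_validation(reply: str) -> Validation:
    """Parse a validator reply into a score and feedback.

    Assumes a JSON reply; if the reply is not usable JSON, falls back to the
    first integer found in the text and treats the whole reply as feedback.
    """
    try:
        data = json.loads(reply)
        score = int(data["score"])
        feedback = str(data.get("feedback", ""))
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        match = re.search(r"\d+", reply)
        score = int(match.group()) if match else 1
        feedback = reply.strip()
    # Keep the score inside the documented 1-10 range.
    score = max(1, min(10, score))
    return Validation(score=score, feedback=feedback)
```

The fallback matters because LLMs do not always follow a requested output format; clamping keeps a malformed score from skewing the retry decision.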

How it works:

  1. The transcript is sent to a worker agent for a task.
  2. The paired validator scores the output (1-10) and provides feedback.
  3. If the score is below the threshold, the worker retries using that feedback.
  4. This loop repeats up to the configured retry limit, and the highest-scoring output is kept.
  5. Final results include validation scores shown in the UI and exports.
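The generate-validate-retry loop described in these steps can be sketched as follows. The threshold of 7, the default of 2 retries, and the callable signatures are assumptions for illustration; the README only says that the threshold and retry limit are configurable.

```python
from typing import Callable, Optional, Tuple

# Hypothetical signatures: a worker turns (transcript, feedback) into text,
# and a validator turns (transcript, output) into (score 1-10, feedback).
Worker = Callable[[str, Optional[str]], str]
Validator = Callable[[str, str], Tuple[int, str]]


def run_with_validation(
    transcript: str,
    worker: Worker,
    validator: Validator,
    threshold: int = 7,    # assumed default
    max_retries: int = 2,  # assumed default
) -> Tuple[str, int]:
    """Generate, validate, and retry; return the best output and its score."""
    feedback: Optional[str] = None
    best_output, best_score = "", 0
    # One initial attempt plus up to max_retries retries.
    for _ in range(1 + max_retries):
        output = worker(transcript, feedback)
        score, feedback = validator(transcript, output)
        if score > best_score:
            best_output, best_score = output, score
        if score >= threshold:
            break  # good enough, stop retrying
    return best_output, best_score
```

Keeping the best-scoring attempt, rather than the last one, means a retry that scores worse never replaces an earlier, better draft.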

Architecture

🎤 Input Layer
├── Microphone (live start/stop)
├── Audio/Video Upload
└── YouTube URL
  ↓
🔊 Transcription (Groq Whisper)
- Chunking for large files
- Full transcript
  ↓
┌──────────────────────────────┐
│ 🤖 Agent Manager (8 agents)  │
└──────────────────────────────┘
  ↓
Summarizer → Summary Validator → retry if below threshold
  ↓
Overview → Overview Validator → retry if below threshold
  ↓
Key Points → Key Points Validator → retry if below threshold
  ↓
Quiz → Quiz Validator → retry if below threshold
  ↓
📄 Export Layer
├── PDF
├── Word
└── JSON
  ↓
Artifacts stored in artifacts/<source>/<timestamp>/{inputs,outputs}
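The "chunking for large files" step can be illustrated with a pure function that plans overlapping time windows. The 10-minute window and 5-second overlap are hypothetical defaults, not values taken from ATLAS; a real chunk size would depend on the transcription API's upload limits.

```python
from typing import List, Tuple


def chunk_spans(duration_ms: int,
                chunk_ms: int = 10 * 60 * 1000,  # assumed 10-minute window
                overlap_ms: int = 5_000          # assumed 5-second overlap
                ) -> List[Tuple[int, int]]:
    """Split [0, duration_ms) into overlapping (start, end) spans in ms."""
    if chunk_ms <= overlap_ms:
        raise ValueError("chunk_ms must be larger than overlap_ms")
    spans = []
    start = 0
    while start < duration_ms:
        end = min(start + chunk_ms, duration_ms)
        spans.append((start, end))
        if end == duration_ms:
            break
        # Step back by the overlap so words at a boundary are not cut off.
        start = end - overlap_ms
    return spans
```

With pydub, `len(audio)` gives an `AudioSegment`'s duration in milliseconds and `audio[start:end]` slices it, so each span could be exported and transcribed separately and the texts joined. The overlap trades a little duplicated text for fewer words split across a boundary.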

Tech Stack

Component       Technology
UI              Streamlit + custom CSS
Transcription   Groq Whisper (OpenAI-compatible)
LLM             Groq Llama 3.3 70B (OpenAI-compatible)
Audio           sounddevice, scipy, pydub
YouTube         yt-dlp
Export          fpdf, python-docx, json

Project Structure

main.py
pipeline.py
config.py
core/
  audio.py
  transcription.py
  llm.py
  export.py
agents/
  agent_base.py
  pipeline.py
  summarizer.py
  overview.py
  keypoints.py
  quiz.py
  validators.py
style.css
artifacts/

Setup

  1. Clone the repo.

  2. Install dependencies (Python 3.9+):

    pip install -r requirements.txt
  3. Create a local .env file from the template:

    Windows:

    copy example.env .env

    macOS/Linux:

    cp example.env .env
  4. Edit .env and set GROQ_API_KEY if you are using Groq. The pipeline is model-agnostic, so you can swap in another provider or even a local LLM.

  5. Run the app:

    streamlit run main.py

Note: Do not commit .env. Commit only example.env, without real keys, to GitHub.

Environment Variables

Set GROQ_API_KEY in .env to use Groq, the default provider in this repo. The app is model-agnostic: you can plug in any other provider or a local LLM by adapting the client in core/llm.py.
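One way to keep the client model-agnostic is to resolve the endpoint from the environment. This stdlib-only sketch assumes hypothetical variables `LLM_PROVIDER`, `LLM_BASE_URL`, `LLM_API_KEY`, and `LLM_MODEL`; only `GROQ_API_KEY` appears in this README. The Groq and Ollama URLs are those providers' OpenAI-compatible endpoints, and the default model id is an assumption, not necessarily the one core/llm.py uses.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# OpenAI-compatible base URLs for a hosted and a local provider.
DEFAULT_BASE_URLS = {
    "groq": "https://api.groq.com/openai/v1",
    "ollama": "http://localhost:11434/v1",
}


@dataclass
class LLMConfig:
    provider: str
    base_url: str
    api_key: Optional[str]
    model: str


def load_llm_config(env: Mapping[str, str] = os.environ) -> LLMConfig:
    # LLM_PROVIDER, LLM_BASE_URL, LLM_API_KEY and LLM_MODEL are hypothetical
    # variable names; GROQ_API_KEY is the one documented in this README.
    provider = env.get("LLM_PROVIDER", "groq").lower()
    base_url = env.get("LLM_BASE_URL") or DEFAULT_BASE_URLS.get(provider)
    if base_url is None:
        raise ValueError(f"No base URL known for {provider!r}; set LLM_BASE_URL")
    api_key = env.get("GROQ_API_KEY") if provider == "groq" else env.get("LLM_API_KEY")
    if provider == "groq" and not api_key:
        raise ValueError("GROQ_API_KEY is not set; add it to .env")
    # Assumed default model id for Llama 3.3 70B on Groq.
    model = env.get("LLM_MODEL", "llama-3.3-70b-versatile")
    return LLMConfig(provider, base_url, api_key, model)
```

The resulting `base_url` and `api_key` could be passed to an OpenAI-compatible client, for example `OpenAI(base_url=cfg.base_url, api_key=cfg.api_key)` from the `openai` package (a local server may still need a placeholder key string).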

Usage

  1. Choose input method: YouTube URL, file upload, or microphone recording.
  2. Process the input (transcription, then the agent pipeline).
  3. Review summary, overview, key points, and quiz.
  4. Download results in the desired format.
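For the JSON download, here is a minimal sketch of an export that bundles each section with its validation score. The payload layout and the `content`/`score` keys are hypothetical; ATLAS's real JSON schema is not documented here.

```python
import json
from typing import Dict


def build_json_export(source: str, transcript: str, results: Dict[str, dict]) -> str:
    """Serialize pipeline results and their validation scores to JSON.

    `results` maps a section name ("summary", "overview", ...) to a dict with
    the hypothetical keys "content" and "score".
    """
    payload = {
        "source": source,
        "transcript": transcript,
        "sections": {
            name: {"content": r["content"], "validation_score": r["score"]}
            for name, r in results.items()
        },
    }
    # ensure_ascii=False keeps non-English text readable instead of escaped.
    return json.dumps(payload, ensure_ascii=False, indent=2)
```

Storing the score next to each section lets a reader judge how much to trust that part of the output without reopening the app.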

Output Location

Each run is stored under:

artifacts/<source>/<timestamp>/{inputs,outputs}
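A minimal sketch of creating that per-run layout with `pathlib`. The timestamp format and the example `source` value are assumptions; the README does not specify them.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional, Tuple


def make_run_dirs(root: Path, source: str,
                  now: Optional[datetime] = None) -> Tuple[Path, Path]:
    """Create <root>/<source>/<timestamp>/{inputs,outputs} and return both."""
    now = now or datetime.now()
    # Assumed timestamp format: sortable and safe for Windows file names.
    run_dir = root / source / now.strftime("%Y%m%d_%H%M%S")
    inputs, outputs = run_dir / "inputs", run_dir / "outputs"
    for d in (inputs, outputs):
        d.mkdir(parents=True, exist_ok=True)
    return inputs, outputs
```

With this format, two runs started within the same second would share a folder; adding microseconds (`%f`) to the format string would avoid that.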

License

MIT License.

About

ATLAS (Agent‑driven Transcription, Learning, Analysis and Summarization) is a multi‑agent lecture assistant that transcribes audio/video/YouTube with Groq Whisper and generates validated summaries, notes, and quizzes with exportable outputs.
