Transform passive browsing into active collaboration. Turn conversations into action.
Modern professionals don't struggle with a lack of informationโthey struggle with what happens after information. Meetings end, videos finish, podcasts pause, and suddenly there's a familiar burden: What did we decide? What do I need to do? Where do I put this?
Cue flips that model. Instead of building another tool that records what happened, we asked a different question: What if the system did the follow-up thinking for you?
Cue is an intelligent Chrome extension that turns your browser from a passive viewing surface into an active collaborator. Whether you're in a Google Meet, watching a technical YouTube video, or listening to a podcast, Cue listens alongside you and then handles the work that normally comes after.
Cue doesn't just capture contentโit understands it, reasons through it, and turns it into action.
- ๐๏ธ Intelligent Audio Capture - Records audio from any tab (Google Meet, YouTube, podcasts, any web content)
- ๐ Advanced Transcription - Leverages Gemini's Native Audio API for human-level understanding of tone, urgency, and speaker emotion
- ๐ง Reasoning, Not Summaries - Uses chain-of-thought prompting to distinguish casual discussion from concrete decisions
- โ Automated Task Extraction - Generates structured action items, decisions, and key points without manual input
- ๐ Sentiment Analysis - Detects emotional context and urgency to prioritize appropriately
- ๐ง Google Workspace Integration - Automatically drafts emails, creates documents, and adds calendar events
- ๐ฌ AI-Generated Explainer Videos - Creates short visual summaries using Veo 3 for quick context review
- ๐ Searchable Library - Web dashboard to browse, search, and replay all recorded sessions
- Start a session - The Halo Strip toolbar appears on any web page
- Capture live - Tab audio is streamed and processed in real-time
- Stop and process - Backend transcribes, analyzes, and extracts actionable insights
- Take action - Results appear in your dashboard with direct integrations to Google apps
- Never repeat work - All sessions are stored and searchable in your personal library
Cue employs a three-tier architecture designed for high-concurrency and web-scale context processing:
โโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Chrome Extension โ
โ (TypeScript/React) โ
โ - Halo Strip UI โ
โ - Audio Capture โ
โ - Service Worker โ
โโโโโโโโโโโโโฌโโโโโโโโโโโโโโ
โ
โ HTTP/WebSocket
โ
โโโโโโโโโโโโโผโโโโโโโโโโโโโโ
โ FastAPI Backend โ
โ (Python - Port 8000) โ
โ - Audio Processing โ
โ - Gemini Integration โ
โ - Task Extraction โ
โ - Google Apps API โ
โโโโโโโโโโโโโฌโโโโโโโโโโโโโโ
โ
โโโโโโโโโดโโโโโโโโโ
โ โ
โโโโโผโโโโโ โโโโโโโผโโโโโโโ
โ Gemini โ โ MongoDB โ
โ API โ โ Storage โ
โโโโโโโโโโ โโโโโโโโโโโโโโ
โโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ React Dashboard โ
โ (Port 3001) โ
โ - Session Library โ
โ - Reels Feed โ
โ - Search & Replay โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโ
- Extension Layer - Captures tab audio via Chrome's tabCapture API and streams to backend
- Processing Layer - FastAPI backend manages concurrent streams, handles Gemini API calls, and coordinates task extraction
- Storage Layer - MongoDB stores sessions, transcripts, summaries, tasks, and generated media
- Presentation Layer - React dashboard provides searchable interface with WebSocket progress updates
- Chrome Extension: TypeScript, React, Vite
- Dashboard: React (port 3001)
- UI Components: Custom Halo Strip toolbar, session recorder
- API Framework: FastAPI (Python, port 8000)
- AI Models:
- Gemini Native Audio API for transcription
- Chain-of-thought prompting for reasoning
- Veo 3 for video generation
- Storage: MongoDB (Atlas compatible)
- Real-time: WebSocket for progress updates
By feeding raw audio directly into Gemini's Native Audio API, we achieve perception of tone, urgency, and speaker emotionโleading to "human-level" understanding of intent that normal summaries miss.
Successfully leveraging Gemini's native audio capabilities provides significantly higher accuracy in technical jargon detection compared to standard Whisper-based implementations.
The system preserves enough context for the model to reason accurately, distinguishing casual commentary from concrete decisions through structured context feeding.
Cue doesn't stop at understanding conversationsโit acts on them through integrations that:
- Draft emails in Gmail
- Create documents in Google Docs
- Add events to Google Calendar
- Turn insights directly into execution
Building a Chrome extension that respects the browser's permission model while reliably accessing audio and context from active tabs required careful handling to avoid interruptions, blocked access, or repeated permission prompts.
Feeding the model raw audio transcription alone wasn't enough. Supplying structured context from the session and surrounding discussion was essential to help Gemini differentiate casual commentary from actual decisions and action items.
We successfully built a system where users can finish a session and immediately have a structured task list without clicking a single buttonโthe highest form of automation.
Once Cue analyzes sessions and generates structured tasks, we plan to leverage Nano Banana and Veo 3 to create short clips or key images that capture the most important moments of a meeting or video. This will allow users to:
- See exactly what happened at critical moments
- Understand decisions at a glance
- Quickly grasp context without rereading anything
By combining task extraction with visual highlights, Cue will make follow-up actions faster, clearer, and more intuitive.
- Node.js (v18+)
- Python (3.9+)
- MongoDB (local or Atlas)
- Chrome browser
- Gemini API key
Clone the repository
git clone https://github.com/Siriapps/Cue.git
cd CueInstall dependencies
npm install
cd server && pip install -r requirements.txtConfigure environment
cp .env.example .env
# Add your Gemini API key and MongoDB connection stringBuild the extension
npm run buildLoad extension in Chrome
- Navigate to chrome://extensions/
- Enable "Developer mode"
- Click "Load unpacked"
- Select the extension/build directory
Start the backend
cd server
python -m uvicorn main:app --reload --port 8000Start the dashboard
cd cue
npm startFor detailed setup instructions, see SETUP.md.
For running instructions, see RUN.md.
- SETUP.md - Detailed installation and configuration guide
- RUN.md - Instructions for running all components
- CLAUDE.md - Development notes and AI assistance context
- Automatic meeting minutes with action items
- Task extraction from product demos
- Decision documentation from strategy sessions
- Email drafts from client calls
- Note generation from lecture videos
- Key concept extraction from tutorials
- Study guides from educational podcasts
- Visual summaries for quick review
- Interview transcription and analysis
- Insight extraction from presentations
- Automated literature review notes
- Sentiment tracking across discussions
We're proud of:
- โ Fully Automated Workflow - Users finish sessions with structured task lists without clicking a button
- โ Higher Audio Accuracy - Native Gemini audio processing beats standard Whisper implementations
- โ Actionable Integration - Direct execution inside Google apps turns insights into action
- โ Browser Permissions Mastery - Reliable audio access while respecting Chrome's security model
- โ Context-Aware Reasoning - Accurate task prioritization and decision detection through improved context handling
We welcome contributions! Please see our contributing guidelines for more information.
This project is licensed under the MIT License - see the LICENSE file for details.
Built with:
- Gemini API for advanced AI capabilities
- Veo 3 for video generation
- FastAPI for high-performance backend
- React for modern UI
- MongoDB for flexible data storage
Made with โค๏ธ by the Cue Team