Skip to content

Siriapps/Cue

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 

History

24 Commits
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 

Repository files navigation

Cue - Intelligent Chrome Extension for Predictive Work Automation

Transform passive browsing into active collaboration. Turn conversations into action.

๐ŸŽฏ Overview

Modern professionals don't struggle with a lack of informationโ€”they struggle with what happens after information. Meetings end, videos finish, podcasts pause, and suddenly there's a familiar burden: What did we decide? What do I need to do? Where do I put this?

Cue flips that model. Instead of building another tool that records what happened, we asked a different question: What if the system did the follow-up thinking for you?

Cue is an intelligent Chrome extension that turns your browser from a passive viewing surface into an active collaborator. Whether you're in a Google Meet, watching a technical YouTube video, or listening to a podcast, Cue listens alongside you and then handles the work that normally comes after.

โœจ What It Does

Cue doesn't just capture contentโ€”it understands it, reasons through it, and turns it into action.

Core Capabilities

  • ๐ŸŽ™๏ธ Intelligent Audio Capture - Records audio from any tab (Google Meet, YouTube, podcasts, any web content)
  • ๐Ÿ“ Advanced Transcription - Leverages Gemini's Native Audio API for human-level understanding of tone, urgency, and speaker emotion
  • ๐Ÿง  Reasoning, Not Summaries - Uses chain-of-thought prompting to distinguish casual discussion from concrete decisions
  • โœ… Automated Task Extraction - Generates structured action items, decisions, and key points without manual input
  • ๐Ÿ“Š Sentiment Analysis - Detects emotional context and urgency to prioritize appropriately
  • ๐Ÿ“ง Google Workspace Integration - Automatically drafts emails, creates documents, and adds calendar events
  • ๐ŸŽฌ AI-Generated Explainer Videos - Creates short visual summaries using Veo 3 for quick context review
  • ๐Ÿ“š Searchable Library - Web dashboard to browse, search, and replay all recorded sessions

The Workflow

  • Start a session - The Halo Strip toolbar appears on any web page
  • Capture live - Tab audio is streamed and processed in real-time
  • Stop and process - Backend transcribes, analyzes, and extracts actionable insights
  • Take action - Results appear in your dashboard with direct integrations to Google apps
  • Never repeat work - All sessions are stored and searchable in your personal library

๐Ÿ—๏ธ Architecture

Cue employs a three-tier architecture designed for high-concurrency and web-scale context processing:

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚   Chrome Extension      โ”‚
โ”‚  (TypeScript/React)     โ”‚
โ”‚  - Halo Strip UI        โ”‚
โ”‚  - Audio Capture        โ”‚
โ”‚  - Service Worker       โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
            โ”‚
            โ”‚ HTTP/WebSocket
            โ”‚
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚   FastAPI Backend       โ”‚
โ”‚   (Python - Port 8000)  โ”‚
โ”‚  - Audio Processing     โ”‚
โ”‚  - Gemini Integration   โ”‚
โ”‚  - Task Extraction      โ”‚
โ”‚  - Google Apps API      โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
            โ”‚
    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
    โ”‚                โ”‚
โ”Œโ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”    โ”Œโ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ Gemini โ”‚    โ”‚  MongoDB   โ”‚
โ”‚  API   โ”‚    โ”‚  Storage   โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜    โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚   React Dashboard       โ”‚
โ”‚   (Port 3001)           โ”‚
โ”‚  - Session Library      โ”‚
โ”‚  - Reels Feed           โ”‚
โ”‚  - Search & Replay      โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Data Flow

  • Extension Layer - Captures tab audio via Chrome's tabCapture API and streams to backend
  • Processing Layer - FastAPI backend manages concurrent streams, handles Gemini API calls, and coordinates task extraction
  • Storage Layer - MongoDB stores sessions, transcripts, summaries, tasks, and generated media
  • Presentation Layer - React dashboard provides searchable interface with WebSocket progress updates

๐Ÿš€ Technology Stack

Frontend

  • Chrome Extension: TypeScript, React, Vite
  • Dashboard: React (port 3001)
  • UI Components: Custom Halo Strip toolbar, session recorder

Backend

  • API Framework: FastAPI (Python, port 8000)
  • AI Models:
    • Gemini Native Audio API for transcription
    • Chain-of-thought prompting for reasoning
    • Veo 3 for video generation
  • Storage: MongoDB (Atlas compatible)
  • Real-time: WebSocket for progress updates

Key Features

Native Multimodal Processing

By feeding raw audio directly into Gemini's Native Audio API, we achieve perception of tone, urgency, and speaker emotionโ€”leading to "human-level" understanding of intent that normal summaries miss.

Higher Audio Accuracy

Successfully leveraging Gemini's native audio capabilities provides significantly higher accuracy in technical jargon detection compared to standard Whisper-based implementations.

Context Preservation for Correct Reasoning

The system preserves enough context for the model to reason accurately, distinguishing casual commentary from concrete decisions through structured context feeding.

Actionable Across Google Applications

Cue doesn't stop at understanding conversationsโ€”it acts on them through integrations that:

  • Draft emails in Gmail
  • Create documents in Google Docs
  • Add events to Google Calendar
  • Turn insights directly into execution

๐ŸŽ“ What We Learned

Chrome Extension & Permissions

Building a Chrome extension that respects the browser's permission model while reliably accessing audio and context from active tabs required careful handling to avoid interruptions, blocked access, or repeated permission prompts.

Context is Critical for Reasoning

Feeding the model raw audio transcription alone wasn't enough. Supplying structured context from the session and surrounding discussion was essential to help Gemini differentiate casual commentary from actual decisions and action items.

Fully Automated Workflow

We successfully built a system where users can finish a session and immediately have a structured task list without clicking a single buttonโ€”the highest form of automation.

๐Ÿ”ฎ What's Next

Visual Context Enhancement

Once Cue analyzes sessions and generates structured tasks, we plan to leverage Nano Banana and Veo 3 to create short clips or key images that capture the most important moments of a meeting or video. This will allow users to:

  • See exactly what happened at critical moments
  • Understand decisions at a glance
  • Quickly grasp context without rereading anything

By combining task extraction with visual highlights, Cue will make follow-up actions faster, clearer, and more intuitive.

๐Ÿ“ฆ Installation & Setup

Prerequisites

  • Node.js (v18+)
  • Python (3.9+)
  • MongoDB (local or Atlas)
  • Chrome browser
  • Gemini API key

Quick Start

Clone the repository

git clone https://github.com/Siriapps/Cue.git
cd Cue

Install dependencies

npm install
cd server && pip install -r requirements.txt

Configure environment

cp .env.example .env
# Add your Gemini API key and MongoDB connection string

Build the extension

npm run build

Load extension in Chrome

  1. Navigate to chrome://extensions/
  2. Enable "Developer mode"
  3. Click "Load unpacked"
  4. Select the extension/build directory

Start the backend

cd server
python -m uvicorn main:app --reload --port 8000

Start the dashboard

cd cue
npm start

For detailed setup instructions, see SETUP.md.
For running instructions, see RUN.md.

๐Ÿ“š Documentation

  • SETUP.md - Detailed installation and configuration guide
  • RUN.md - Instructions for running all components
  • CLAUDE.md - Development notes and AI assistance context

๐ŸŽฏ Use Cases

For Professionals

  • Automatic meeting minutes with action items
  • Task extraction from product demos
  • Decision documentation from strategy sessions
  • Email drafts from client calls

For Learners

  • Note generation from lecture videos
  • Key concept extraction from tutorials
  • Study guides from educational podcasts
  • Visual summaries for quick review

For Researchers

  • Interview transcription and analysis
  • Insight extraction from presentations
  • Automated literature review notes
  • Sentiment tracking across discussions

๐Ÿ† Accomplishments

We're proud of:

  • โœ… Fully Automated Workflow - Users finish sessions with structured task lists without clicking a button
  • โœ… Higher Audio Accuracy - Native Gemini audio processing beats standard Whisper implementations
  • โœ… Actionable Integration - Direct execution inside Google apps turns insights into action
  • โœ… Browser Permissions Mastery - Reliable audio access while respecting Chrome's security model
  • โœ… Context-Aware Reasoning - Accurate task prioritization and decision detection through improved context handling

๐Ÿค Contributing

We welcome contributions! Please see our contributing guidelines for more information.

๐Ÿ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

๐Ÿ™ Acknowledgments

Built with:

  • Gemini API for advanced AI capabilities
  • Veo 3 for video generation
  • FastAPI for high-performance backend
  • React for modern UI
  • MongoDB for flexible data storage

Made with โค๏ธ by the Cue Team

About

An intelligent chrome extension agent for predictive work automation

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors