LocalVoxScribe

An asynchronous, microservice-based application for private, offline multimedia analysis.

Quick Start · System Architecture · Low-Resource Optimization · Data Contracts

Turn your local machine into a self-contained multimedia analysis hub. Drop long-form video, audio recordings, or text-heavy transcripts into your workspace, and the local pipeline handles demultiplexing, speech-to-text, diarization, structural aggregation, and abstractive summarization entirely on your host machine.

The crucial distinction: this is not a wrapper around external cloud APIs, and it does not need high-end enterprise hardware. The system orchestrates open-source models under a tight resource budget (CPU only, at most 6 GB of RAM for the whole container stack) and keeps media, transcripts, and summaries on the host.

🛠️ Ecosystem & Token Reference

LocalVoxScribe is assembled from open-source components. It depends on the following runtimes and services:

| Component / Service | Resource Gateway | Setup & Credentials |
| --- | --- | --- |
| Python 3.10+ | Python.org Documentation | Execution Runtime |
| RabbitMQ | RabbitMQ Tutorials | Local Broker Container |
| Ollama | Ollama Model Library | Pulls `qwen2.5:1.5b` |
| Docker | Docker Specification Engine | Containerization Runtime |
| Hugging Face | Hugging Face Portal | Requires User Access Token |
| Telegram Bot API | Telegram BotFather Core | Requires HTTP Bot Token |

🔑 Token Retrieval Guide

Before starting the containers, obtain the following two tokens and add them to your `.env` configuration file:

1. PyAnnote Speaker Diarization Token (`HUGGINGFACE_TOKEN`)

The PyAnnote Audio 3.1 models are gated on Hugging Face: you must accept their user conditions with a Hugging Face account before the weights can be downloaded.

1. Create an account or log in to the Hugging Face Portal.
2. Accept the user conditions on both model repositories: `pyannote/segmentation-3.0` and `pyannote/speaker-diarization-3.1`.
3. Navigate to Settings > Access Tokens, or go straight to huggingface.co/settings/tokens.
4. Click New Token, set the permission level to Read, and paste the generated string into your `.env` file under `HUGGINGFACE_TOKEN`.

2. Telegram Bot Token (`TELEGRAM_BOT_TOKEN`)

The Telegram gateway, built on Aiogram 3, needs a bot token:

1. Open your Telegram client and search for the verified account @BotFather.
2. Send the command `/newbot`.
3. Follow the prompts to choose a display name and a username for your bot.
4. Copy the HTTP API token that BotFather returns into your `.env` file under `TELEGRAM_BOT_TOKEN`.
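A quick way to confirm that both tokens made it into `.env` before starting the stack is a small pre-flight check. The sketch below is not part of the repository; it assumes a plain `KEY=VALUE` file and only checks that the two variables named above are present and non-empty.

```python
from pathlib import Path

# The two variables named in the token guide above.
REQUIRED_KEYS = ("HUGGINGFACE_TOKEN", "TELEGRAM_BOT_TOKEN")


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_tokens(env: dict) -> list:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    env = parse_env(env_path.read_text()) if env_path.exists() else {}
    missing = missing_tokens(env)
    print("Missing:", ", ".join(missing) if missing else "none")
```

The check does not validate the tokens against Hugging Face or Telegram; it only catches the common case of a forgotten or blank entry.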

🚀 Key Features

- **Air-Gapped Processing:** Once the model weights are cached, processing makes no third-party web requests. All source assets, database indexes, and model weights reside on your local host. The optional Telegram bot is the one component that talks to an external service.
- **Decoupled Asynchronous Processing:** A RabbitMQ messaging layer separates UI event threads from long-running, compute-heavy deep learning workloads.
- **Unified Speech-to-Text Pipeline:** Combines Faster-Whisper transcription with PyAnnote Audio 3.1 speaker diarization to attribute each part of the transcript to a distinct speaker.
- **Tightly Budgeted Inference:** Configured to run on CPU only, within a ≤ 6 GB RAM footprint across the entire container stack.
- **Custom Prompt Templates:** An SQLite3 backend stores user-defined instructions, so operators can produce summaries in their own formats.
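To make the transcription-plus-diarization idea concrete, one common way to combine the two outputs is to give each transcript segment the speaker whose diarization turns overlap it the most. The sketch below illustrates that general technique only; the source does not confirm how the Speech-to-Text service merges the two, and the `Segment` and `Turn` types are hypothetical stand-ins for the Faster-Whisper and PyAnnote outputs.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A transcribed span (hypothetical stand-in for an ASR segment)."""
    start: float  # seconds
    end: float
    text: str


@dataclass
class Turn:
    """A diarization turn (hypothetical stand-in for a speaker turn)."""
    start: float
    end: float
    speaker: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments, turns, unknown="UNKNOWN"):
    """Label each segment with the speaker that overlaps it the longest."""
    labelled = []
    for seg in segments:
        totals = {}
        for turn in turns:
            shared = overlap(seg.start, seg.end, turn.start, turn.end)
            if shared > 0:
                totals[turn.speaker] = totals.get(turn.speaker, 0.0) + shared
        speaker = max(totals, key=totals.get) if totals else unknown
        labelled.append((speaker, seg.text))
    return labelled
```

Segments that fall entirely outside every turn are labelled `UNKNOWN` rather than guessed.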

🏗️ System Architecture

The system splits media processing into container-isolated, decoupled microservices. Rather than blocking the interface layers during long operations, components exchange lightweight JSON messages over dedicated queues on a single RabbitMQ broker.

  ┌────────────────┐      ┌─────────────┐
  │   Desktop UI   │      │ Telegram Bot│
  └───────┬────────┘      └──────┬──────┘
          │                      │
          └───────────┬──────────┘
                      ▼
             ┌─────────────────┐
             │    RabbitMQ     │ (Message Broker)
             └────────┬────────┘
                      │
     ┌────────────────┼────────────────┐
     ▼                ▼                ▼
┌───────────┐  ┌────────────┐   ┌────────────┐   ┌────────────┐
│   Media   │  │ Speech-to- │   │ Summarizer │   │ Local LLM  │
│ Processor │  │ Text (STT) │   │  Service   │   │  (Ollama)  │
└─────┬─────┘  └─────┬──────┘   └─────┬──────┘   └─────┬──────┘
      │              │                │                │
      └──────────────┴───────┬────────┴────────────────┘
                             ▼
                    ┌─────────────────┐
                    │  Shared Volume  │ (SQLite3 DB & Media Storage)
                    └─────────────────┘

Microservice Queue Mapping

| Queue Name | Source Component | Target Worker Component | Responsibility |
| --- | --- | --- | --- |
| `media_tasks` | Desktop UI / Telegram Bot | Media Processor | Extracts incoming streams and resamples them to 16 kHz mono PCM WAV via FFmpeg. |
| `transcription_tasks` | Media Processor | Speech-to-Text | Runs speech recognition and speaker diarization. |
| `summarization_tasks` | Speech-to-Text | Summarizer | Combines the structured transcript with the user's instruction prompt and generates the summary. |
| `ui_results` / `bot_results` | Summarizer | Desktop UI / Telegram Bot | Returns the completed summary to the interface that submitted the task. |
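The queue chain in the table can be sketched as a small routing helper plus a JSON message envelope. The queue names come from the table; everything else is an assumption for illustration: the `TaskMessage` fields, the `origin` values, and the helper names are hypothetical and are not taken from `shared/schema.py`.

```python
import json
from dataclasses import asdict, dataclass

# Stage order taken from the queue table above.
PIPELINE = ["media_tasks", "transcription_tasks", "summarization_tasks"]
# The final hop depends on which interface submitted the task.
RESULT_QUEUES = {"desktop": "ui_results", "telegram": "bot_results"}


@dataclass
class TaskMessage:
    """Hypothetical message envelope; the real contract may differ."""
    task_id: str
    origin: str      # "desktop" or "telegram" (assumed values)
    file_path: str   # path inside the shared volume (assumed)


def next_queue(current: str, origin: str) -> str:
    """Return the queue a worker should publish to after finishing `current`."""
    index = PIPELINE.index(current)
    if index + 1 < len(PIPELINE):
        return PIPELINE[index + 1]
    return RESULT_QUEUES[origin]


def encode(message: TaskMessage) -> bytes:
    """Serialize a message to the JSON bytes sent over the broker."""
    return json.dumps(asdict(message)).encode("utf-8")


def decode(body: bytes) -> TaskMessage:
    """Rebuild a message from a JSON body received from the broker."""
    return TaskMessage(**json.loads(body))
```

Carrying the origin inside the message is one way to let the Summarizer pick between `ui_results` and `bot_results` without knowing anything else about the caller.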

📂 Repository Structure

.
├── .env.example                 # Environment parameters deployment template
├── docker-compose.yml           # Core multi-container system deployment layout
├── storage/                     # Persistent local storage directories
│   ├── db/                      # Houses central app_data.db (SQLite3)
│   ├── model_cache/             # Shared local cache directory for model checkpoints
│   └── raw_media/               # Shared transactional processing volume context
└── services/
    ├── desktop-ui/              # CustomTkinter client environment (app.py)
    ├── telegram-bot/            # Fully independent Aiogram 3 gateway application
    ├── media-processor/         # FFmpeg automated data ingest workflows
    ├── speech-to-text/          # Faster-Whisper and PyAnnote model layers
    ├── summarizer/              # LangChain execution logic and runtime routing
    └── shared/                  # Common internal library modules
        ├── db_manager.py        # Centralized database management interfaces
        ├── rabbitmq_utils.py    # Resilient message broker link wrappers
        └── schema.py            # Data contract structures (MediaTask, etc.)

🛠️ Tech Stack

- Core Runtime: Python 3.10+
- Message Orchestration: RabbitMQ (pika)
- Neural Networks & Core ML: LangChain, Pydantic, Faster-Whisper, PyAnnote Audio, PyTorch
- Data Storage: SQLite3
- Interfaces: CustomTkinter (Desktop UI), Aiogram 3 (Telegram API Interface)
- Infrastructure: Docker, Docker Compose, FFmpeg

⚡ Quick Start

Prerequisites

- Operating System: Linux (Ubuntu/Debian recommended) or Windows 10/11 running Docker Desktop.
- Hardware: at least a 4-core CPU with AVX2 support and 8 GB of system RAM, of which 6 GB is reserved for the container cluster.
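On an x86 Linux host, the CPU and memory requirements can be checked before installing anything. The following is a rough sketch that reads the standard `/proc/cpuinfo` and `/proc/meminfo` files; it is not part of the project, and it does not apply to Windows hosts or to non-x86 CPUs, where the flags line is named differently.

```python
from pathlib import Path


def has_avx2(cpuinfo_text: str) -> bool:
    """Return True if any 'flags' line in /proc/cpuinfo lists avx2."""
    for line in cpuinfo_text.splitlines():
        if line.lower().startswith("flags"):
            _, _, flags = line.partition(":")
            if "avx2" in flags.split():
                return True
    return False


def total_ram_gb(meminfo_text: str) -> float:
    """Return MemTotal from /proc/meminfo in GiB, or 0.0 if it is absent."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            kib = int(line.split()[1])  # the kernel reports this value in kB
            return kib / (1024 ** 2)
    return 0.0


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    meminfo = Path("/proc/meminfo")
    if cpuinfo.exists() and meminfo.exists():
        print("AVX2:", has_avx2(cpuinfo.read_text()))
        print("RAM (GiB): %.1f" % total_ram_gb(meminfo.read_text()))
```

A machine sold as 8 GB typically reports slightly less than 8 GiB here, so treat the number as a guide rather than a strict gate.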


Installation & Run

1. **Clone the repository:**
   ```bash
   git clone https://github.com/PhillMckinnon/LocalVoxScribe.git
   cd LocalVoxScribe
   ```

2. **Create your environment configuration:**
   ```bash
   cp .env.example .env
   ```
   Open the newly created `.env` file and set your credentials, including your Hugging Face token and Telegram bot token.

3. **Initialize Ollama and pull the language model:**
   Before starting the full microservice cluster, bring up the Ollama container on its own and pull the required 4-bit quantized model:
   ```bash
   # Start the Ollama service in the background
   docker compose up -d ollama

   # Download the model into the Ollama container
   docker compose exec ollama ollama pull qwen2.5:1.5b-instruct-q4_K_M
   ```

4. **Deploy the remaining services:**
   Once the model download completes, bring up the rest of the stack (RabbitMQ, Speech-to-Text, UI/Bot gateways, and Media Processor):
   ```bash
   docker compose up --build -d
   ```

Note: On the first run, allow a few extra minutes for Docker to download the base image layers and fetch the PyAnnote speaker diarization weights into your local model cache directory.
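To confirm that the model pull succeeded, you can ask Ollama which models it has stored. The sketch below uses Ollama's `/api/tags` endpoint, which lists local models. It assumes the Ollama port 11434 is reachable from where the script runs (for example, published to the host in `docker-compose.yml`), which the source does not confirm.

```python
import json
from urllib.request import urlopen

MODEL = "qwen2.5:1.5b-instruct-q4_K_M"


def model_names(tags_payload: dict) -> list:
    """Extract model names from an Ollama /api/tags response body."""
    return [entry["name"] for entry in tags_payload.get("models", [])]


def is_model_available(tags_payload: dict, model: str = MODEL) -> bool:
    """Return True if `model` appears in the list of local models."""
    return model in model_names(tags_payload)


def fetch_tags(base_url: str = "http://localhost:11434") -> dict:
    """Query a running Ollama instance; the base URL is an assumption."""
    with urlopen(f"{base_url}/api/tags", timeout=5) as response:
        return json.load(response)
```

Calling `is_model_available(fetch_tags())` against a running instance returns `False` when the pull was skipped or interrupted, which is a cheaper signal than waiting for the first summarization task to fail.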

🧠 Swapping to a Different Language Model (Optional)

If your machine has more than 8 GB of RAM and you want a more capable model (such as `llama3.2:3b` or a larger `qwen2.5` variant), change the target model in two steps:

**Step 1: Pull your preferred model.** Tell the Ollama container to fetch it from the Ollama registry:

```bash
docker compose exec ollama ollama pull <your-desired-model-string>
```

**Step 2: Update the Summarizer configuration.** Open the worker file at `services/summarizer/src/processor.py` (or the equivalent path in your checkout) and change the `model` parameter to the exact string you just pulled:

```python
# services/summarizer/src/processor.py

# Locate the Ollama initialization block and update the model parameter:
self.llm = Ollama(
    base_url=f"http://{OLLAMA_HOST}:11434",
    model="your-desired-model-string"  # Change this from qwen2.5:1.5b-instruct-q4_K_M
)
```

Once edited, rebuild and restart the summarizer worker to apply the change: `docker compose up -d --build summarizer-service`.
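If you expect to switch models often, one option is to read the model name from the environment instead of editing the source each time. This is only a suggestion: `OLLAMA_MODEL` is a hypothetical variable name that the project, as far as the source shows, does not read today, so using it would require wiring it into `processor.py` yourself.

```python
import os

# Default taken from the Quick Start section.
DEFAULT_MODEL = "qwen2.5:1.5b-instruct-q4_K_M"


def resolve_model(env=None) -> str:
    """Return the model name from OLLAMA_MODEL (hypothetical), else the default."""
    env = os.environ if env is None else env
    value = env.get("OLLAMA_MODEL", "").strip()
    return value or DEFAULT_MODEL
```

The result of `resolve_model()` would then be passed as the `model` argument in the initialization block shown above. The model still has to be pulled into the Ollama container first.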

🌐 Handling Large Files via Local Telegram API Server (Optional)

By default, the cloud-hosted Telegram Bot API lets a bot download files of up to 20 MB and send files of up to 50 MB. Because this system is designed to process multimedia recordings of up to 150 MB, running a local Telegram Bot API server container alongside the application cluster is highly recommended.

Once a self-hosted API server is running on your network (e.g., listening on port 8081), point the bot at it in your `.env` file:

```bash
TELEGRAM_API_URL=http://telegram-api-server:8081
```

A local server removes the download size limit, raises the upload limit to 2000 MB, and keeps traffic between the API server and the bot inside your private network.
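The effect of the setting on incoming files can be summarized in a few lines. In the sketch below, the 150 MB project ceiling comes from the text above, and the 20 MB figure is the cloud Bot API's cap on files a bot can download; the function names are hypothetical and the real bot may enforce its limits differently.

```python
from typing import Optional

MB = 1024 * 1024
CLOUD_DOWNLOAD_LIMIT = 20 * MB   # cloud Bot API cap on files a bot can download
PROJECT_MAX_FILE = 150 * MB      # largest recording the system is designed for


def download_limit(env: dict) -> Optional[int]:
    """Return the download cap in bytes, or None when a local server is set."""
    return None if env.get("TELEGRAM_API_URL") else CLOUD_DOWNLOAD_LIMIT


def can_fetch(file_size: int, env: dict) -> bool:
    """Check a file against both the project ceiling and the API cap."""
    limit = download_limit(env)
    within_api = limit is None or file_size <= limit
    return file_size <= PROJECT_MAX_FILE and within_api
```

With an empty configuration a 100 MB recording is rejected, while the same file passes once `TELEGRAM_API_URL` points at a local server.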

📖 Operational Guidance

Custom Prompt Configurations

On first launch, the SQLite database is initialized with a set of default summary prompts. You can add your own templates alongside them; the following examples show the general shape:

- Executive Summary: "Analyze the transcribed meeting dialogue text. Extract a high-level summary overview detailing core operational goals..."
- Action Items Mapping: "Isolate all explicit action steps specified across the timeline. Group individual assignments by speaker profile..."
- Minutes of Meeting (MoM): "Generate a structured, formal timeline tracking core talking points, key design blockers, and organizational decisions..."
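As a rough illustration of how named prompts can live in SQLite and be combined with a transcript, consider the sketch below. The table name `prompt_templates`, its columns, and the helper functions are hypothetical; the actual schema managed by `db_manager.py` is not shown in this README.

```python
import sqlite3

# Hypothetical schema: one row per named prompt template.
SCHEMA = """
CREATE TABLE IF NOT EXISTS prompt_templates (
    name   TEXT PRIMARY KEY,
    prompt TEXT NOT NULL
)
"""


def add_template(conn: sqlite3.Connection, name: str, prompt: str) -> None:
    """Insert a template, replacing any existing one with the same name."""
    conn.execute(
        "INSERT OR REPLACE INTO prompt_templates (name, prompt) VALUES (?, ?)",
        (name, prompt),
    )
    conn.commit()


def build_prompt(conn: sqlite3.Connection, name: str, transcript: str) -> str:
    """Look up a template by name and append the transcript to it."""
    row = conn.execute(
        "SELECT prompt FROM prompt_templates WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        raise KeyError(name)
    return f"{row[0]}\n\nTranscript:\n{transcript}"


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # in-memory database for the demo
    conn.execute(SCHEMA)
    add_template(conn, "Action Items", "Isolate all explicit action steps.")
    print(build_prompt(conn, "Action Items", "SPEAKER_00: Ship it on Friday."))
```

Parameterized queries keep user-written prompt text from being interpreted as SQL.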

⚙️ Low-Resource Optimization

This workspace operates under strict memory limits to run on standard home computers without an external GPU.

Zero-Key Offline Baseline Setup

Out of the box, the build produces a self-contained local stack:

| Capability | Local Tool / Mechanism | Operational Parameter |
| --- | --- | --- |
| Inference Framework | Ollama | Runs exclusively on CPU threads. |
| Language Model Weights | `qwen2.5:1.5b-instruct-q4_K_M` | 4-bit quantized model with a low RAM footprint. |
| Context Window | Native Context Optimization | Token limits managed so inference runs smoothly on standard CPUs. |
| Audio Processing | FFmpeg | Converts files into lightweight mono streams before processing. |
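The audio conversion step can be sketched as an FFmpeg argument list built in Python. The flags are standard FFmpeg options that produce 16 kHz mono 16-bit PCM WAV, the format named in the queue table; the exact command the Media Processor runs is not shown in this README, so treat this as an approximation.

```python
import shlex


def ffmpeg_args(source: str, target: str) -> list:
    """Build an FFmpeg command that extracts 16 kHz mono PCM WAV audio."""
    return [
        "ffmpeg", "-y",        # overwrite the output file if it exists
        "-i", source,          # input video or audio file
        "-vn",                 # drop any video stream
        "-ac", "1",            # downmix to one channel (mono)
        "-ar", "16000",        # resample to 16 kHz
        "-c:a", "pcm_s16le",   # 16-bit little-endian PCM
        target,
    ]


if __name__ == "__main__":
    # Print the command in a shell-safe form instead of running it.
    print(shlex.join(ffmpeg_args("meeting.mp4", "meeting.wav")))
```

Passing the list to `subprocess.run(..., check=True)` would execute it without invoking a shell, which avoids quoting problems with unusual file names.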
