Pariatorn/textaudio

TextAudio - Open Source TTS Platform

License: MIT | Python 3.12+ | Code Coverage

TextAudio is a comprehensive, production-ready text-to-speech platform that converts documents into high-quality audiobooks using advanced AI voice synthesis. Built with a modern microservices architecture, it supports 23 languages, voice cloning, and real-time progress tracking.

✨ Key Features

  • 🌍 23 Language Support - Multi-language TTS with intelligent routing
  • 🎙️ Voice Cloning - Clone voices for personalized audio
  • ⚡ Real-time Progress - Server-Sent Events (SSE) for live updates
  • 🔄 Smart Retry System - 3-tier retry logic with automatic credit bonuses
  • 🎯 Pitch-Preserving Speed Control - Adjust playback speed (0.75x-1.5x)
  • 📱 Mobile-Responsive - Modern SvelteKit frontend (Svelte 5)
  • 🧪 Production-Ready - 97%+ test coverage (192 backend + 107 frontend tests)
  • 🐳 Docker-First - Complete containerized deployment
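
The speed range listed above (0.75x-1.5x) can be checked before any audio work starts. The following is a minimal sketch, not the project's actual implementation: it assumes ffmpeg's `atempo` filter as the pitch-preserving time-stretch, which this README does not confirm, and `build_speed_command` is a hypothetical helper name.

```python
MIN_SPEED = 0.75  # lower bound from the feature list
MAX_SPEED = 1.5   # upper bound from the feature list


def build_speed_command(src: str, dst: str, speed: float) -> list[str]:
    """Return an ffmpeg argument list that changes tempo without changing pitch.

    Assumption: ffmpeg is installed and its `atempo` audio filter is used;
    the project may rely on a different time-stretch implementation.
    """
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(
            f"speed must be between {MIN_SPEED} and {MAX_SPEED}, got {speed}"
        )
    # -y overwrites an existing output file; atempo rescales playback tempo.
    return ["ffmpeg", "-y", "-i", src, "-filter:a", f"atempo={speed}", dst]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Rejecting out-of-range values up front keeps invalid requests from reaching the audio pipeline.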

🎯 Why Open Source?

This project was developed as a commercial product and is now being released to the open source community. While feature-complete and production-ready, we believe it will be more valuable as a community project. We welcome contributions and hope this serves as a reference implementation for modern TTS platforms.

What's Included:

  • ✅ Complete microservices architecture
  • ✅ Comprehensive test suite
  • ✅ Production-grade code quality
  • ✅ Real-world authentication & session management
  • ✅ Credit & retry system implementation

What's Missing (Contributions Welcome!):

  • ⏳ Payment integration (Stripe/PayPal planned)
  • ⏳ Storage encryption (AES-256-GCM planned)

🤝 Project Status & Maintainership

Community-Driven Development: As of December 2025, we (the original creators) are transitioning this project to community maintenance. We will not be actively implementing new features, but the project is fully functional and production-ready.

Our Continued Role:

  • 📚 Documentation Support - We'll help improve and clarify documentation
  • 🐛 Issue Guidance - We can provide context and guidance on reported issues
  • 💡 Architecture Questions - Happy to explain design decisions and codebase structure
  • 👀 Code Review - We may review significant PRs when time permits

We Encourage You To:

  • 🔨 Fork and extend the project for your needs
  • 🌟 Submit pull requests for new features
  • 📝 Improve documentation
  • 🐛 Report and fix bugs
  • 💬 Help other community members in discussions

As the "parents" of this project, we're excited to see where the community takes it! Feel free to open issues for guidance or questions about the codebase.

🚀 Quick Start

Prerequisites

  • Docker & Docker Compose
  • Python 3.12+ (for local development)
  • Node.js 18+ (for frontend development)
  • GPU (optional): NVIDIA or AMD for faster TTS processing

Installation

# Clone the repository
git clone https://github.com/yourusername/textaudio-platform.git
cd textaudio-platform

# Copy environment template
cp env.template .env

# Edit .env with your configuration
nano .env

# Start all services
docker compose up

Access Points

🏗️ Architecture

TextAudio uses a microservices architecture with 6 main services:

Backend Services

  1. API Orchestrator (Port 8000)

    • FastAPI-based request routing
    • Job orchestration and session management
    • PostgreSQL for data persistence
    • Redis for job queuing and caching
  2. TTS Chatterbox (Port 8001)

    • 23 language text-to-speech
    • Voice cloning capabilities
    • GPU acceleration (CUDA/ROCm/CPU fallback)
    • PyTorch-based ML models
  3. Text Processor (Port 8002)

    • Language detection with confidence scoring
    • Multi-format extraction (PDF, EPUB, Markdown, TXT)
    • Token estimation
  4. Job Worker (Background)

    • Redis queue consumer
    • Asynchronous TTS processing
    • Real-time progress via SSE
  5. Preview Worker (Background)

    • High-priority queue for previews
    • 2-sentence preview generation
    • 5-minute caching

Frontend

  • SvelteKit (Svelte 5) with Tailwind CSS 4
  • Modern reactive components
  • TypeScript for type safety
  • Real-time SSE updates
  • Mobile-responsive design

Infrastructure

  • PostgreSQL 15 - Jobs, sessions, users, credits
  • Redis 7 - Job queue, SSE pub/sub, caching
  • Filesystem Storage - Date-organized file storage
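
As an illustration of date-organized file storage, the sketch below builds a path of the form `<root>/YYYY/MM/DD/<job_id>/<filename>`. The exact directory layout is an assumption (the README only states that storage is organized by date), and `storage_path` is a hypothetical helper.

```python
from datetime import date
from pathlib import PurePosixPath


def storage_path(root: str, job_id: str, filename: str, day: date) -> PurePosixPath:
    """Build <root>/YYYY/MM/DD/<job_id>/<filename> (assumed layout)."""
    return (
        PurePosixPath(root)
        / f"{day:%Y}"
        / f"{day:%m}"
        / f"{day:%d}"
        / job_id
        / filename
    )
```

Grouping files by day keeps directories small and makes age-based cleanup a matter of deleting whole date folders.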

📚 Documentation

Comprehensive documentation is available in the docs/ directory.

🛠️ Development

Development Mode

# Start with hot reload
docker compose -f docker-compose.yml -f docker-compose.dev.yml up

# Run backend tests (192 tests, 97% coverage)
cd services/textaudio/backend/api
pytest

# Run frontend tests (107 tests)
cd services/textaudio/frontend/textaudio
npm test

# Run E2E tests
cd services/textaudio/e2e
npm test

Code Quality

# Format and fix linting (before commits)
make fix-all-full

# Full validation (before releases)
make validate-all

# Code analysis
make analyze

# Security audit
make audit

🤝 Contributing

We welcome contributions! Please see our Contributing Guidelines for details on:

  • Setting up your development environment
  • Code quality standards
  • Submitting pull requests
  • Reporting issues

📊 Project Status

Code Quality Metrics

  • Backend Tests: 192 tests passing, 97%+ coverage
  • Frontend Tests: 107 tests passing
  • Linting Errors: 0 (ruff, ESLint)
  • Code Complexity: Average 7.70 (Grade B)
  • Security Issues: 0 HIGH severity

Supported Languages

Arabic (ar), Danish (da), German (de), Greek (el), English (en), Spanish (es), Finnish (fi), French (fr), Hebrew (he), Hindi (hi), Italian (it), Japanese (ja), Korean (ko), Malay (ms), Dutch (nl), Norwegian (no), Polish (pl), Portuguese (pt), Russian (ru), Swedish (sv), Swahili (sw), Turkish (tr), Chinese (zh)

Supported File Formats

Input: PDF, TXT, EPUB, Markdown

Output: MP3, WAV, FLAC
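
The supported languages and file formats listed above lend themselves to a simple up-front validation step. The sketch below is illustrative only: the language codes come from this README, while the file-extension mapping (for example `.md` for Markdown) and the `validate_request` helper are assumptions.

```python
from pathlib import PurePath

# The 23 language codes listed under "Supported Languages".
SUPPORTED_LANGUAGES = frozenset({
    "ar", "da", "de", "el", "en", "es", "fi", "fr", "he", "hi", "it", "ja",
    "ko", "ms", "nl", "no", "pl", "pt", "ru", "sv", "sw", "tr", "zh",
})
# Assumption: the README names formats, not extensions.
INPUT_EXTENSIONS = frozenset({".pdf", ".txt", ".epub", ".md"})
OUTPUT_FORMATS = frozenset({"mp3", "wav", "flac"})


def validate_request(filename: str, language: str, output_format: str) -> None:
    """Raise ValueError if any part of the request is unsupported."""
    if PurePath(filename).suffix.lower() not in INPUT_EXTENSIONS:
        raise ValueError(f"unsupported input file: {filename}")
    if language.lower() not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {language}")
    if output_format.lower() not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported output format: {output_format}")
```

Checking the extension is only a first filter; a real service would also inspect the file contents before extraction.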

🔒 Security

  • Magic link authentication (no password storage)
  • SQL injection protection (SQLAlchemy ORM)
  • XSS protection (Pydantic validation)
  • CORS configuration
  • Rate limiting
  • Environment-based secrets management

Please report security vulnerabilities via GitHub Security Advisories.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • Chatterbox TTS - High-quality TTS engine
  • FastAPI - Modern Python web framework
  • SvelteKit - Reactive frontend framework
  • All contributors and the open source community

💬 Community & Support

🗺️ Roadmap

See CHANGELOG.md for version history.

Upcoming Features (contributions welcome):

  • Payment integration (Stripe/PayPal)
  • Storage encryption (AES-256-GCM)
  • Email notifications
  • Invoice system
  • Multi-speaker audiobooks
  • Custom voice training
  • API access for developers

Made with ❤️ by the open source community

This project was originally developed as a commercial product and is now open source. We hope it serves the community well!
