BatchLion/AudioDrama

AudioDrama: AI-Powered Audio Drama Generator

AudioDrama is a full-stack application that transforms novel text into an immersive audio drama experience. It leverages a Large Language Model (LLM) to analyze the text, identify characters, and assign unique voices to each character, supporting multiple languages.

Features

  • AI-Powered Text Analysis: Uses the ZhipuAI (GLM-4) API to parse novel text, separating dialogue from narration and identifying the speaking character.
  • Intelligent Voice Assignment: Automatically detects the language of each character's dialogue and assigns a matching voice from the TTS voices installed on the system, giving each character a distinct voice where enough voices are available.
  • Multi-Language Support: Dynamically assigns voices based on the detected language of the text (e.g., English, Chinese).
  • Robust TTS Generation: Generates audio files in a separate, isolated process to ensure stability and prevent server hangs.
  • Modern Web Interface: A clean and simple frontend built with React and Vite to input text and play the generated audio drama.
  • Automatic Cleanup: Automatically clears old audio files before generating new ones.
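The last two backend features, isolated TTS generation and automatic cleanup, can be illustrated with a short sketch. It is not taken from the project's code: the helper names clear_audio_dir and run_isolated, the timeout value, and the worker arguments shown in the trailing comment are all assumptions made for illustration.

```python
import subprocess
import sys
from pathlib import Path


def clear_audio_dir(audio_dir: Path) -> int:
    """Delete files left over from a previous run; return how many were removed."""
    audio_dir.mkdir(parents=True, exist_ok=True)
    removed = 0
    for path in audio_dir.iterdir():
        if path.is_file():
            path.unlink()
            removed += 1
    return removed


def run_isolated(cmd: list[str], timeout_s: float = 60.0) -> bool:
    """Run a command in a child process and report failure instead of hanging."""
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing lingers.
        return False
    return result.returncode == 0


# Hypothetical call; the real arguments of tts_worker.py are not documented here:
# ok = run_isolated([sys.executable, "tts_worker.py", text, voice, out_path])
```

Because the TTS engine runs in a child process, a crash or a stalled synthesis call surfaces as a False return value that the server can handle, rather than blocking the request indefinitely.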

Architecture

The project is a monorepo composed of two main parts:

  • audio-drama-backend: A Python server built with FastAPI that handles text processing, LLM interaction, and TTS audio generation.
  • audio-drama-frontend: A modern web application built with React (using Vite) that provides the user interface.

How It Works

  1. Text Submission: The user pastes novel text into the frontend and submits it to the backend.
  2. Clear Audio Cache: The backend first deletes all previously generated audio files.
  3. LLM Analysis: The FastAPI server sends the text to the GLM-4 model with a detailed prompt, tuned to distinguish narration from dialogue. The model returns the text structured as a list of segments, each containing a character and that character's dialogue.
  4. Voice Pre-assignment: The backend concatenates all dialogue for each unique character and runs language detection once on the combined text, which is more reliable than detecting the language of each short line separately.
  5. Voice Selection: Based on the detected language for each character, the system assigns a suitable voice from the appropriate language-specific voice pool, trying to ensure each character gets a unique voice.
  6. Audio Generation: The server processes each text segment individually, calling a dedicated Python script (tts_worker.py) to generate an .aiff audio file with the pre-assigned voice. Because the script runs as a separate process, a crash or hang in the TTS engine does not take the server down with it.
  7. Playback: The frontend receives the list of segments with their corresponding audio URLs and plays them back in sequence, creating the audio drama experience.
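Steps 4 and 5 above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the segment keys ("character", "text"), the voice names in VOICE_POOLS, and the CJK-ratio heuristic standing in for a real language detector are all hypothetical.

```python
from collections import defaultdict

# Hypothetical voice pools keyed by language code; real voice names depend on
# the TTS voices installed on the host.
VOICE_POOLS = {
    "en": ["en-voice-1", "en-voice-2"],
    "zh": ["zh-voice-1", "zh-voice-2"],
}


def detect_language(text: str) -> str:
    """Crude stand-in detector: 'zh' if CJK ideographs dominate, else 'en'."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return "en"
    cjk = sum(1 for ch in letters if "\u4e00" <= ch <= "\u9fff")
    return "zh" if cjk / len(letters) > 0.5 else "en"


def assign_voices(segments: list[dict]) -> dict[str, str]:
    """Map each character to a voice, detecting language once per character."""
    # Step 4: pool every line spoken by a character into one block of text.
    combined = defaultdict(list)
    for seg in segments:
        combined[seg["character"]].append(seg["text"])

    # Step 5: pick the next voice from the pool for the detected language.
    voices = {}
    used = defaultdict(int)  # how many voices each pool has handed out
    for character, lines in combined.items():
        lang = detect_language(" ".join(lines))
        pool = VOICE_POOLS[lang]
        # Voices stay unique while the pool lasts, then wrap around and repeat.
        voices[character] = pool[used[lang] % len(pool)]
        used[lang] += 1
    return voices
```

Detecting the language on the pooled text rather than per line means a short interjection does not flip a character to the wrong voice pool. The wrap-around reflects the "tries to" in step 5: once a pool runs out, voices have to be reused.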

Setup and Installation

Prerequisites

  • Python 3.9+
  • Node.js and npm
  • An API key for ZhipuAI (GLM-4)

Backend Setup

  1. Navigate to the backend directory:

    cd audio-drama-backend
  2. Create and activate a virtual environment:

    python3 -m venv venv
    source venv/bin/activate
  3. Install Python dependencies:

    pip install -r requirements.txt
  4. Configure your API Key:

    • Create a file named .env in the audio-drama-backend directory.
    • Add your ZhipuAI API key to it:
      ZHIPUAI_API_KEY=your_zhipuai_api_key_here
      
  5. (Optional) List Available Voices:

    • To see a list of all TTS voices available on your system, you can run the utility script:
    python3 list_voices.py
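To check that the key from step 4 is picked up, a small standard-library sketch like the one below can help. How the backend actually loads .env is not stated here (it may well use a dedicated library), so load_env_file and get_api_key are hypothetical helpers; only the variable name ZHIPUAI_API_KEY comes from the instructions above.

```python
import os
from pathlib import Path


def load_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    if not path.exists():
        return values
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_api_key(env_path: Path = Path(".env")) -> str:
    """Prefer the process environment, then fall back to the .env file."""
    key = os.environ.get("ZHIPUAI_API_KEY") or load_env_file(env_path).get(
        "ZHIPUAI_API_KEY", ""
    )
    if not key:
        raise RuntimeError("ZHIPUAI_API_KEY is not set in the environment or .env")
    return key
```

Failing early with a clear error is usually easier to debug than an authentication failure on the first request to the LLM.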

Frontend Setup

  1. Navigate to the frontend directory:

    cd ../audio-drama-frontend
  2. Install Node.js dependencies:

    npm install

Running the Application

  1. Start the Backend Server:

    • In a terminal, from the audio-drama-backend directory (with the virtual environment activated):
    uvicorn main:app --host 0.0.0.0 --port 8000
  2. Start the Frontend Development Server:

    • In a separate terminal, from the audio-drama-frontend directory:
    npm run dev
  3. Access the Application:

    • Open your web browser and navigate to the URL provided by the Vite development server (usually http://localhost:5173).
