AudioDrama: AI-Powered Audio Drama Generator

AudioDrama is a full-stack application that transforms novel text into an immersive audio drama experience. It leverages a Large Language Model (LLM) to analyze the text, identify characters, and assign unique voices to each character, supporting multiple languages.

Features

AI-Powered Text Analysis: Uses the ZhipuAI (GLM-4) API to parse novel text, separating dialogue from narration and identifying the speaking character.
Intelligent Voice Assignment: Detects the language of each character's dialogue and assigns a matching voice from the TTS voices installed on the system, giving each character a distinct voice where possible.
Multi-Language Support: Dynamically assigns voices based on the detected language of the text (e.g., English, Chinese).
Robust TTS Generation: Generates audio files in a separate, isolated process to ensure stability and prevent server hangs.
Modern Web Interface: A clean and simple frontend built with React and Vite to input text and play the generated audio drama.
Automatic Cleanup: Clears previously generated audio files before each new generation run.
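
The last two features can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the project's actual code: the function names `clear_audio_dir` and `run_isolated`, the 60-second timeout, and the command-line arguments in the closing comment are all assumptions made for the example.

```python
import subprocess
import sys
from pathlib import Path


def clear_audio_dir(audio_dir: Path, suffix: str = ".aiff") -> int:
    """Delete previously generated audio files and return how many were removed."""
    audio_dir.mkdir(parents=True, exist_ok=True)
    removed = 0
    for path in audio_dir.glob(f"*{suffix}"):
        path.unlink()
        removed += 1
    return removed


def run_isolated(cmd: list[str], timeout_s: float = 60.0) -> bool:
    """Run one job in a child process so a crash or hang cannot stall the caller."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising on timeout.
        return False
    return result.returncode == 0


# Hypothetical call; the real arguments accepted by tts_worker.py may differ:
# run_isolated([sys.executable, "tts_worker.py", "Hello", "SomeVoice", "out.aiff"])
```

Because the worker runs as its own process, a failure shows up as a `False` return value instead of an exception or a blocked request inside the server.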

Architecture

The project is a monorepo composed of two main parts:

audio-drama-backend: A Python server built with FastAPI that handles text processing, LLM interaction, and TTS audio generation.
audio-drama-frontend: A modern web application built with React (using Vite) that provides the user interface.

How It Works

Text Submission: The user pastes novel text into the frontend and submits it to the backend.
Clear Audio Cache: The backend first deletes all previously generated audio files.
LLM Analysis: The FastAPI server sends the text to the GLM-4 model with a detailed prompt asking it to structure the text into a list of segments, each containing a character and that character's dialogue. The prompt is written to distinguish narration from dialogue.
Voice Pre-assignment: The backend aggregates all dialogue for each unique character and runs language detection once on the combined text, which is more reliable than detecting the language of short individual lines.
Voice Selection: Based on the detected language for each character, the system assigns a suitable voice from the appropriate language-specific voice pool, trying to ensure each character gets a unique voice.
Audio Generation: The server processes each text segment individually, calling a dedicated Python script (tts_worker.py) to generate an .aiff audio file using the pre-assigned voice. Running TTS in a separate process keeps a failed or hung synthesis call from stalling the server.
Playback: The frontend receives the list of segments with their corresponding audio URLs and plays them back in sequence, creating the audio drama experience.
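
The voice pre-assignment and selection steps above can be sketched as follows. This is an illustration under stated assumptions, not the backend's implementation: the segment keys `character` and `text`, the `voice_pools` mapping, and the voice names are placeholders, and `detect_language` is a deliberately crude stand-in (it only counts CJK ideographs) for whatever detector the project actually uses.

```python
from collections import defaultdict


def detect_language(text: str) -> str:
    """Stand-in detector: 'zh' if CJK ideographs dominate the letters, else 'en'."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return "en"
    cjk = sum(1 for ch in letters if "\u4e00" <= ch <= "\u9fff")
    return "zh" if cjk / len(letters) > 0.5 else "en"


def assign_voices(segments, voice_pools):
    """Map each character to a voice, preferring voices not yet used."""
    # Aggregate all text per character so detection sees one large sample.
    combined = defaultdict(list)
    for seg in segments:
        combined[seg["character"]].append(seg["text"])

    used = set()
    assignment = {}
    for character, texts in combined.items():
        pool = voice_pools[detect_language(" ".join(texts))]
        free = [voice for voice in pool if voice not in used]
        # Reuse a voice only when every voice in the pool is already taken.
        voice = free[0] if free else pool[len(assignment) % len(pool)]
        used.add(voice)
        assignment[character] = voice
    return assignment


segments = [
    {"character": "Narrator", "text": "The rain fell all night."},
    {"character": "Alice", "text": "Hello there."},
    {"character": "李明", "text": "你好，很高兴认识你。"},
    {"character": "Alice", "text": "Nice to meet you."},
]
voice_pools = {"en": ["VoiceA", "VoiceB"], "zh": ["VoiceC"]}  # placeholder names
print(assign_voices(segments, voice_pools))
```

Assigning voices once per character, before any audio is generated, keeps a character's voice stable across all of their segments.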

Setup and Installation

Prerequisites

Python 3.9+
Node.js and npm
An API key for ZhipuAI (GLM-4)

Backend Setup

Navigate to the backend directory:
```
cd audio-drama-backend
```

Create and activate a virtual environment:
```
python3 -m venv venv
source venv/bin/activate
```

Install Python dependencies:
```
pip install -r requirements.txt
```
Configure your API Key:
- Create a file named .env in the audio-drama-backend directory.
- Add your ZhipuAI API key to it:
```
ZHIPUAI_API_KEY=your_zhipuai_api_key_here
```
(Optional) List Available Voices:
- To see a list of all TTS voices available on your system, you can run the utility script:
```
python3 list_voices.py
```
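
For reference, the `ZHIPUAI_API_KEY` entry in the .env file could be read on the backend roughly as shown here. How the project actually loads the file is not stated in this README, so treat this as a standard-library sketch; the helper names `load_env_file` and `get_api_key` are hypothetical.

```python
import os
from pathlib import Path


def load_env_file(path: Path) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    if not path.exists():
        return values
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_api_key(env_path: Path = Path(".env")) -> str:
    """Prefer a real environment variable, then fall back to the .env file."""
    key = os.environ.get("ZHIPUAI_API_KEY") or load_env_file(env_path).get("ZHIPUAI_API_KEY")
    if not key:
        raise RuntimeError("ZHIPUAI_API_KEY is not set; add it to .env")
    return key
```

Failing early with a clear error when the key is missing is usually easier to debug than a rejected API request later in the pipeline.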

Frontend Setup

Navigate to the frontend directory:
```
cd ../audio-drama-frontend
```
Install Node.js dependencies:
```
npm install
```

Running the Application

Start the Backend Server:
- In a terminal, from the audio-drama-backend directory (with the virtual environment activated):
```
uvicorn main:app --host 0.0.0.0 --port 8000
```
Start the Frontend Development Server:
- In a separate terminal, from the audio-drama-frontend directory:
```
npm run dev
```
Access the Application:
- Open your web browser and navigate to the URL provided by the Vite development server (usually http://localhost:5173).
