fatimaazfar/2D-Talking-Avatar

AI Avatar - Interactive 2D Talking Avatar

Demo

A classic layered sprite avatar that listens to your voice, processes your speech with OpenAI's GPT, and responds with high-quality text-to-speech using OpenAI's premium voices.

Features

Realistic 2D Avatar

  • Layered sprite system with hair, face, eyes, nose, mouth, and cheeks
  • Natural animations including blinking, head tilting, and eyebrow movements
  • Dynamic facial expressions that respond to speech intensity
  • Smooth lip-sync with phoneme-based mouth shapes

Advanced Speech Processing

  • Speech Recognition using Google's speech-to-text API
  • OpenAI GPT Integration for intelligent responses
  • Premium TTS with OpenAI's "nova" voice (natural female voice)
  • Real-time conversation with minimal latency

Interactive Visual Feedback

  • State-based backgrounds: Different colors for listening, thinking, and speaking
  • Animated status indicators with emoji and dynamic text
  • Thinking animation with pulsing effects and animated dots
  • Speech-responsive animations that sync with audio output

Cost-Effective

  • Extremely affordable: ~$0.00008 per word for TTS
  • Efficient API usage: Optimized for minimal costs
  • Transparent pricing: See exact costs in real-time

Quick Start

Prerequisites

  • Python 3.8 or higher
  • OpenAI API key
  • Microphone for speech input
  • Speakers/headphones for audio output

Installation

  1. Clone the repository
git clone https://github.com/fatimaazfar/2D-Talking-Avatar.git
cd 2D-Talking-Avatar
  2. Install dependencies
pip install pygame numpy SpeechRecognition openai pyaudio
  3. Set up your OpenAI API key
# Create a .env file in the project root containing your OpenAI API key:
API_KEY=sk........123
  4. Run the avatar
python avatar.py
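
How the script reads the key is not shown here, so the following is only a minimal sketch of one way to do it. It assumes the .env file holds a line of the form API_KEY=<key>; the helper names load_env_file and get_api_key are hypothetical, and the project may well use a library such as python-dotenv instead.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse KEY=VALUE lines from a .env-style file into a dict.

    Hypothetical helper for illustration. Blank lines, comment lines
    starting with '#', and lines without '=' are skipped; quotes
    surrounding a value are removed.
    """
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")
    return values


def get_api_key(path=".env"):
    """Return API_KEY from the environment, falling back to the .env file."""
    key = os.environ.get("API_KEY") or load_env_file(path).get("API_KEY")
    if not key:
        raise RuntimeError("API_KEY not found in environment or .env file")
    return key
```

Keep the .env file out of version control so the key is never committed.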

Usage

Controls

  • SPACE - Start/stop listening for voice input
  • ESC - Exit the application

Conversation Flow

  1. 🎤 Listening - Press SPACE and speak your message
  2. 🧠 Thinking - Avatar processes your speech and generates response
  3. 🔊 Speaking - Avatar responds with natural voice and lip-sync
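
The flow above can be read as a small state machine. The sketch below is an illustration, not the project's actual code: the IDLE state, the event names, and the next_state helper are all assumptions introduced for this example.

```python
from enum import Enum, auto


class AvatarState(Enum):
    IDLE = auto()        # assumed resting state between conversations
    LISTENING = auto()
    THINKING = auto()
    SPEAKING = auto()


# Assumed transitions: SPACE starts or stops listening, and the avatar
# returns to IDLE once playback of its reply has finished.
TRANSITIONS = {
    (AvatarState.IDLE, "space"): AvatarState.LISTENING,
    (AvatarState.LISTENING, "space"): AvatarState.IDLE,
    (AvatarState.LISTENING, "speech_captured"): AvatarState.THINKING,
    (AvatarState.THINKING, "response_ready"): AvatarState.SPEAKING,
    (AvatarState.SPEAKING, "playback_finished"): AvatarState.IDLE,
}


def next_state(state, event):
    """Return the state after `event`; events with no transition are ignored."""
    return TRANSITIONS.get((state, event), state)
```

Keeping the transitions in one table means a stray key press, such as SPACE while the avatar is speaking, leaves the state unchanged rather than needing special handling.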

Configuration

Voice Selection

Choose from OpenAI's premium voices:

voice="nova"      # Natural female voice (recommended)
voice="shimmer"   # Softer female voice
voice="alloy"     # Neutral voice

Quality Settings

model="tts-1"     # Standard quality ($0.015/1K chars)
model="tts-1-hd"  # HD quality ($0.030/1K chars)
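
As a rough sketch of where these two settings end up, the snippet below builds a text-to-speech request and sends it with the official openai Python package (version 1 or later). The function names are hypothetical, and the voice and model sets contain only the options listed above; the API may accept others.

```python
LISTED_VOICES = {"nova", "shimmer", "alloy"}  # voices named in this README
LISTED_MODELS = {"tts-1", "tts-1-hd"}         # models named in this README


def build_tts_request(text, voice="nova", model="tts-1"):
    """Validate the settings and return keyword arguments for the TTS call."""
    if voice not in LISTED_VOICES:
        raise ValueError(f"unknown voice: {voice!r}")
    if model not in LISTED_MODELS:
        raise ValueError(f"unknown model: {model!r}")
    if not text.strip():
        raise ValueError("text must not be empty")
    return {"model": model, "voice": voice, "input": text}


def synthesize_to_file(text, out_path, api_key, voice="nova", model="tts-1"):
    """Request speech audio and save it to `out_path` (MP3 by default)."""
    from openai import OpenAI  # imported lazily; requires `pip install openai`

    client = OpenAI(api_key=api_key)
    response = client.audio.speech.create(**build_tts_request(text, voice, model))
    with open(out_path, "wb") as f:
        f.write(response.read())
```

Validating before the network call means a mistyped voice name fails immediately instead of costing a round trip.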

Avatar Customization

Modify colors in the LayeredSpriteAvatar class:

self.colors = {
    'skin': (235, 210, 185),
    'hair': (60, 40, 30),
    'eye_iris': (100, 150, 180),
    # ... customize as needed
}

Technical Details

Architecture

  • Modular Design: Separate classes for avatar rendering and conversation handling
  • Multi-threading: Non-blocking audio processing and speech recognition
  • Event-driven: Pygame-based event system for smooth user interaction
  • State Management: Clean state transitions between listening, thinking, and speaking
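
To illustrate the non-blocking idea, here is a minimal sketch of handing slow work (recognition, a GPT request, TTS) to a background thread and polling for the result once per frame. The helper names run_in_background and poll are assumptions for this example, not the project's API.

```python
import queue
import threading


def run_in_background(task, *args):
    """Run `task(*args)` on a daemon thread; return a queue that receives the result."""
    results = queue.Queue(maxsize=1)

    def worker():
        try:
            results.put(("ok", task(*args)))
        except Exception as exc:  # hand the failure to the caller instead of losing it
            results.put(("error", exc))

    threading.Thread(target=worker, daemon=True).start()
    return results


def poll(results):
    """Non-blocking check, cheap enough to call once per frame of the render loop."""
    try:
        return results.get_nowait()
    except queue.Empty:
        return None
```

The render loop keeps drawing while poll returns None, and switches state as soon as it returns a ("ok", value) or ("error", exception) pair.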

Speech Recognition

  • Uses Google's speech-to-text API via speech_recognition library
  • Automatic noise adjustment for better accuracy
  • Configurable timeout and phrase limits
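
The three points above map onto a handful of calls in the speech_recognition library. The sketch below shows one plausible arrangement; the function name listen_once, the default values, and the optional sr parameter (there only so a stand-in module can be supplied for testing) are assumptions.

```python
def listen_once(timeout=5.0, phrase_time_limit=10.0, sr=None):
    """Capture one utterance from the default microphone and return its text.

    Returns None if nothing was said before `timeout` or the audio could
    not be understood.
    """
    if sr is None:
        import speech_recognition as sr  # pip install SpeechRecognition pyaudio

    recognizer = sr.Recognizer()
    try:
        with sr.Microphone() as source:
            # Sample background noise briefly to calibrate the energy threshold.
            recognizer.adjust_for_ambient_noise(source, duration=0.5)
            audio = recognizer.listen(
                source, timeout=timeout, phrase_time_limit=phrase_time_limit
            )
        return recognizer.recognize_google(audio)
    except sr.WaitTimeoutError:
        return None  # no speech started within `timeout` seconds
    except sr.UnknownValueError:
        return None  # speech was captured but could not be transcribed
    except sr.RequestError as exc:
        raise RuntimeError("speech recognition service unavailable") from exc
```

Here timeout bounds how long to wait for speech to begin, while phrase_time_limit caps the length of a single utterance.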

Text-to-Speech

  • OpenAI's latest TTS models with natural prosody
  • MP3 format for high audio quality
  • Temporary file handling for efficient memory usage
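
One way to handle the temporary MP3 file is a small context manager that deletes it when playback is done. This is a sketch under assumptions: the name temporary_audio_file is hypothetical, and the project may manage its files differently.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def temporary_audio_file(audio_bytes, suffix=".mp3"):
    """Write `audio_bytes` to a temp file, yield its path, then delete the file."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(audio_bytes)
        yield path
    finally:
        # Clean up even if playback raised an exception.
        if os.path.exists(path):
            os.remove(path)
```

Inside the with block the path could be handed to a player such as pygame.mixer.music.load(path). On Windows the player usually has to release the file before the block exits, or the deletion will fail.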

Lip Sync Algorithm

  • Phoneme analysis of text for realistic mouth movements
  • Multiple mouth shapes (closed, O-shape, A-shape, wide open)
  • Dynamic timing based on speech intensity
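
The sketch below gives a feel for text-driven mouth shapes. It is a deliberately crude letter-based approximation rather than true phoneme analysis, and it is not the project's algorithm: the letter groupings, the 0.7 intensity threshold, and the function names are all assumptions.

```python
ROUNDED = frozenset("ouw")   # letters drawn with rounded lips
OPEN = frozenset("aei")      # letters drawn with an open mouth


def mouth_shape(char, intensity=0.5):
    """Pick one of the four mouth shapes for a single character."""
    c = char.lower()
    if c in ROUNDED:
        return "o_shape"
    if c in OPEN:
        # Louder speech opens the mouth further (threshold is an assumption).
        return "wide_open" if intensity >= 0.7 else "a_shape"
    return "closed"  # consonants, spaces, and punctuation


def mouth_sequence(text, intensity=0.5):
    """Map text to mouth shapes, collapsing consecutive repeats."""
    shapes = []
    for ch in text:
        shape = mouth_shape(ch, intensity)
        if not shapes or shapes[-1] != shape:
            shapes.append(shape)
    return shapes
```

For example, mouth_sequence("hello") yields closed, a_shape, closed, o_shape. Each entry would then be held for a duration derived from the audio length.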

Cost Breakdown

OpenAI API Costs

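| Service | Model         | Cost per Unit    | Example Usage            | Cost   |
|---------|---------------|------------------|--------------------------|--------|
| TTS     | tts-1         | $0.015/1K chars  | 100 responses (5K chars) | $0.075 |
| GPT     | gpt-3.5-turbo | $0.001/1K tokens | 100 conversations        | $0.20  |
| Total   | -             | -                | 100 full conversations   | ~$0.28 |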
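
The arithmetic behind these figures can be reproduced with a few lines. The rates are the ones quoted in this README and may not match current OpenAI pricing; the 200K-token figure is not stated in the source and is inferred from the $0.20 GPT cost at $0.001 per 1K tokens.

```python
TTS_PRICE_PER_1K_CHARS = {"tts-1": 0.015, "tts-1-hd": 0.030}
GPT_PRICE_PER_1K_TOKENS = 0.001  # gpt-3.5-turbo rate quoted in this README


def tts_cost(num_chars, model="tts-1"):
    """Cost in USD of synthesizing `num_chars` characters."""
    return num_chars / 1000 * TTS_PRICE_PER_1K_CHARS[model]


def gpt_cost(num_tokens):
    """Cost in USD of `num_tokens` GPT tokens at the quoted rate."""
    return num_tokens / 1000 * GPT_PRICE_PER_1K_TOKENS


# 100 responses totalling 5K characters, plus an inferred 200K tokens overall.
total = tts_cost(5_000) + gpt_cost(200_000)
print(f"${total:.3f}")
```

This comes to roughly $0.275, which the Total row rounds to ~$0.28.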

Speech Recognition

  • Google Speech-to-Text: Free tier (60 minutes/month)
  • Alternative: Can be configured for other providers

Troubleshooting

Common Issues

Audio Issues

# Install PyAudio for microphone access
pip install pyaudio

# On macOS, you might need:
brew install portaudio
pip install pyaudio

OpenAI API Errors

  • Verify your API key is correct and has credits
  • Check network connection
  • Ensure API key has TTS permissions

Speech Recognition Problems

  • Check microphone permissions
  • Adjust microphone sensitivity
  • Try speaking closer to the microphone

Performance Issues

  • Reduce avatar window size for better performance
  • Use standard TTS quality instead of HD
  • Close other applications using audio

Roadmap

Planned Features

  • Multiple avatar appearances (different hair, skin tones, styles)
  • Emotion detection and corresponding facial expressions
  • Background environments (office, home, outdoor)
  • Voice cloning support for custom voices
  • Multi-language support for international users
  • Chat history and conversation memory
  • Avatar customization UI for easy personalization

Technical Improvements

  • Real-time audio processing for faster response times
  • Advanced lip-sync using AI phoneme detection
  • 3D avatar option using Three.js integration
  • Mobile app version for smartphones
  • Web browser version for universal access

Contributing

We welcome contributions! Here's how you can help:

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Commit your changes: git commit -m 'Add amazing feature'
  4. Push to branch: git push origin feature/amazing-feature
  5. Open a Pull Request

Development Guidelines

  • Follow PEP 8 style guidelines
  • Add comments for complex algorithms
  • Test new features thoroughly
  • Update documentation as needed

License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.

Acknowledgments

  • OpenAI for providing excellent TTS and GPT APIs
  • Pygame community for the graphics framework
  • SpeechRecognition library maintainers
  • Contributors who help improve this project

Made with ❤️ by Fatima Azfar

Star ⭐ this repo if you find it helpful!
