AI Avatar - Interactive 2D Talking Avatar

A classic layered-sprite avatar that listens to your voice, transcribes it with Google's speech-to-text API, generates a reply with OpenAI's GPT, and speaks that reply using OpenAI's text-to-speech voices.

Features

Realistic 2D Avatar

Layered sprite system with hair, face, eyes, nose, mouth, and cheeks
Natural animations including blinking, head tilting, and eyebrow movements
Dynamic facial expressions that respond to speech intensity
Smooth lip-sync with phoneme-based mouth shapes

Advanced Speech Processing

Speech Recognition using Google's speech-to-text API
OpenAI GPT Integration for intelligent responses
Premium TTS with OpenAI's "nova" voice (natural female voice)
Real-time conversation with minimal latency

Interactive Visual Feedback

State-based backgrounds: Different colors for listening, thinking, and speaking
Animated status indicators with emoji and dynamic text
Thinking animation with pulsing effects and animated dots
Speech-responsive animations that sync with audio output
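
The state-driven feedback listed above could be computed along these lines. This is a minimal sketch rather than the project's actual code: the color values, the `STATE_COLORS` name, and both helper functions are hypothetical, chosen only to illustrate state-based backgrounds, a pulsing effect, and animated dots.

```python
import math

# Hypothetical RGB background colors, one per conversation state.
STATE_COLORS = {
    "listening": (40, 90, 60),
    "thinking": (60, 60, 110),
    "speaking": (110, 70, 40),
}


def pulse_scale(elapsed_seconds, period=1.5, amplitude=0.1):
    """Return a scale factor that oscillates smoothly around 1.0."""
    phase = 2 * math.pi * elapsed_seconds / period
    return 1.0 + amplitude * math.sin(phase)


def thinking_label(elapsed_seconds, dots_per_second=2):
    """Return 'Thinking' followed by 0 to 3 dots, cycling over time."""
    dot_count = int(elapsed_seconds * dots_per_second) % 4
    return "Thinking" + "." * dot_count
```

Both helpers take only the elapsed time, so a render loop can call them every frame without storing any extra animation state.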

Cost-Effective

Extremely affordable: ~$0.00008 per word for TTS
Efficient API usage: Optimized for minimal costs
Transparent pricing: See exact costs in real-time

Quick Start

Prerequisites

Python 3.8 or higher
OpenAI API key
Microphone for speech input
Speakers/headphones for audio output

Installation

Clone the repository

git clone https://github.com/fatimaazfar/2D-Talking-Avatar.git
cd 2D-Talking-Avatar

Install dependencies

pip install pygame numpy SpeechRecognition openai pyaudio

Set up your OpenAI API key

# Create a .env file containing your OpenAI API key:
API_KEY=sk........123

Run the avatar

python avatar.py
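
How avatar.py reads the key from the .env file is not shown in this README. As a rough illustration, assuming the file holds plain KEY=VALUE lines and the variable is called `API_KEY` as in the step above, a standard-library-only loader might look like this. The function names `read_env_file` and `load_api_key` are hypothetical.

```python
import os


def read_env_file(path):
    """Parse KEY=VALUE lines from a .env-style file into a dict."""
    values = {}
    with open(path, encoding="utf-8") as handle:
        for raw_line in handle:
            line = raw_line.strip()
            # Skip blank lines, comments, and lines without an assignment.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_api_key(path=".env", name="API_KEY"):
    """Prefer a real environment variable, then fall back to the file."""
    key = os.environ.get(name)
    if not key and os.path.exists(path):
        key = read_env_file(path).get(name)
    if not key:
        raise RuntimeError(f"{name} is not set; add it to {path}")
    return key
```

Checking the environment first lets you override the file for a single run without editing it.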

Usage

Controls

SPACE - Start/stop listening for voice input
ESC - Exit the application

Conversation Flow

🎤 Listening - Press SPACE and speak your message
🧠 Thinking - Avatar processes your speech and generates response
🔊 Speaking - Avatar responds with natural voice and lip-sync
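
The flow above behaves like a small state machine. The sketch below is one way to model it, not the project's implementation: the `IDLE` state and all event names are assumptions introduced for illustration.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"              # assumed resting state, not named in the README
    LISTENING = "listening"
    THINKING = "thinking"
    SPEAKING = "speaking"


# Allowed transitions: (current state, event) -> next state.
TRANSITIONS = {
    (State.IDLE, "space_pressed"): State.LISTENING,
    (State.LISTENING, "space_pressed"): State.IDLE,        # stop listening
    (State.LISTENING, "speech_captured"): State.THINKING,
    (State.THINKING, "reply_ready"): State.SPEAKING,
    (State.SPEAKING, "playback_finished"): State.IDLE,
}


def next_state(current, event):
    """Return the next state, or stay put if the event does not apply."""
    return TRANSITIONS.get((current, event), current)


# Walk through one full conversation turn.
state = State.IDLE
for event in ["space_pressed", "speech_captured", "reply_ready", "playback_finished"]:
    state = next_state(state, event)
```

Keeping the transitions in one table means an unexpected event, such as pressing SPACE while the avatar is thinking, is simply ignored.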

🛠 Configuration

Voice Selection

Choose from OpenAI's premium voices:

voice="nova"      # Natural female voice (recommended)
voice="shimmer"   # Softer female voice
voice="alloy"     # Neutral voice

Quality Settings

model="tts-1"     # Standard quality ($0.015/1K chars)
model="tts-1-hd"  # HD quality ($0.030/1K chars)
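
Where these two settings end up in avatar.py is not shown here. The sketch below assumes they are passed to the `audio.speech.create` method of the OpenAI Python SDK (version 1.x); `TTSConfig` and `synthesize` are hypothetical names. Note that `OpenAI()` reads `OPENAI_API_KEY` by default, so a key stored under another name would need to be passed explicitly via `api_key=`.

```python
from dataclasses import dataclass

# Options listed in this README; the API may offer more.
KNOWN_VOICES = ("nova", "shimmer", "alloy")
KNOWN_MODELS = ("tts-1", "tts-1-hd")


@dataclass(frozen=True)
class TTSConfig:
    voice: str = "nova"
    model: str = "tts-1"

    def __post_init__(self):
        # Fail early on typos instead of waiting for an API error.
        if self.voice not in KNOWN_VOICES:
            raise ValueError(f"unknown voice: {self.voice!r}")
        if self.model not in KNOWN_MODELS:
            raise ValueError(f"unknown model: {self.model!r}")

    def request_kwargs(self, text):
        """Build keyword arguments for a speech-synthesis request."""
        return {"model": self.model, "voice": self.voice, "input": text}


def synthesize(text, config=TTSConfig()):
    """Return audio bytes for `text`. Needs the openai package and a valid key."""
    from openai import OpenAI  # imported lazily so the config works offline

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.audio.speech.create(**config.request_kwargs(text))
    return response.read()
```

For example, `synthesize("Hello!", TTSConfig(voice="shimmer", model="tts-1-hd"))` would request the softer voice at HD quality.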

Avatar Customization

Modify colors in the LayeredSpriteAvatar class:

self.colors = {
    'skin': (235, 210, 185),
    'hair': (60, 40, 30),
    'eye_iris': (100, 150, 180),
    # ... customize as needed
}
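
If you would rather not edit the class directly, one option is to merge your own values over the defaults. This is only a sketch: `DEFAULT_COLORS` copies the three keys shown above (the real dictionary may contain more), and `customized_colors` is a hypothetical helper.

```python
DEFAULT_COLORS = {
    "skin": (235, 210, 185),
    "hair": (60, 40, 30),
    "eye_iris": (100, 150, 180),
}


def customized_colors(overrides, defaults=DEFAULT_COLORS):
    """Return a new palette with validated overrides applied."""
    palette = dict(defaults)  # copy, so the defaults stay untouched
    for name, rgb in overrides.items():
        if name not in defaults:
            raise KeyError(f"unknown color key: {name!r}")
        valid = len(rgb) == 3 and all(
            isinstance(channel, int) and 0 <= channel <= 255 for channel in rgb
        )
        if not valid:
            raise ValueError(f"{name} must be three integers in 0..255")
        palette[name] = tuple(rgb)
    return palette


# Example: a blonde variant that keeps the default skin and eye colors.
blonde = customized_colors({"hair": (200, 180, 90)})
```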

Technical Details

Architecture

Modular Design: Separate classes for avatar rendering and conversation handling
Multi-threading: Non-blocking audio processing and speech recognition
Event-driven: Pygame-based event system for smooth user interaction
State Management: Clean state transitions between listening, thinking, and speaking
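
To illustrate the non-blocking pattern described above, the sketch below runs a slow task on a worker thread and hands the result back through a queue that the main loop polls once per frame. It is an assumed design, not the project's code; `run_in_background` and `slow_reply` are hypothetical, and the sleeps stand in for a network call and for frame rendering.

```python
import queue
import threading
import time


def run_in_background(task, results, label):
    """Run `task` on a daemon thread and post (label, result) to `results`."""
    def worker():
        results.put((label, task()))

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


def slow_reply():
    # Stand-in for a network call such as a chat completion request.
    time.sleep(0.05)
    return "Hello!"


results = queue.Queue()
worker_thread = run_in_background(slow_reply, results, "reply_ready")

frames_rendered = 0
message = None
while message is None:
    frames_rendered += 1            # the main loop keeps drawing frames
    try:
        message = results.get_nowait()
    except queue.Empty:
        time.sleep(0.01)            # stand-in for one frame of rendering
```

Because the main loop never waits on the worker, animations such as blinking can continue while a reply is being generated.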

Speech Recognition

Uses Google's speech-to-text API via the speech_recognition library
Automatic noise adjustment for better accuracy
Configurable timeout and phrase limits
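
The three points above map onto the SpeechRecognition calls `adjust_for_ambient_noise`, `listen`, and `recognize_google`. The sketch below shows how they might fit together; the function names and the default values are assumptions, not the settings used in avatar.py.

```python
def transcribe_once(recognizer, source, timeout=5.0, phrase_time_limit=10.0,
                    calibration_seconds=0.5):
    """Capture one phrase from `source` and return the recognized text."""
    # Sample background noise so the energy threshold fits the room.
    recognizer.adjust_for_ambient_noise(source, duration=calibration_seconds)
    # timeout: seconds to wait for speech to start.
    # phrase_time_limit: maximum seconds recorded once speech has started.
    audio = recognizer.listen(source, timeout=timeout,
                              phrase_time_limit=phrase_time_limit)
    return recognizer.recognize_google(audio)


def listen_from_microphone():
    """Wire the helper to a real microphone (needs SpeechRecognition and PyAudio)."""
    import speech_recognition as sr  # imported lazily; requires a microphone

    recognizer = sr.Recognizer()
    try:
        with sr.Microphone() as source:
            return transcribe_once(recognizer, source)
    except sr.WaitTimeoutError:
        return None   # nobody spoke before the timeout
    except sr.UnknownValueError:
        return None   # audio was captured but could not be understood
```

Passing the recognizer and source in as arguments keeps `transcribe_once` testable without audio hardware.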

Text-to-Speech

OpenAI's latest TTS models with natural prosody
MP3 format for high audio quality
Temporary file handling for efficient memory usage
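
One way to handle the temporary MP3 file is a context manager that guarantees cleanup even if playback fails. This is a sketch under that assumption; `temporary_mp3` is a hypothetical helper, and the playback calls in the trailing comment assume pygame's mixer is used.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def temporary_mp3(audio_bytes):
    """Write `audio_bytes` to a temporary .mp3 file and delete it afterwards."""
    handle = tempfile.NamedTemporaryFile(suffix=".mp3", delete=False)
    try:
        handle.write(audio_bytes)
        handle.close()              # close first so a player can open the file
        yield handle.name
    finally:
        handle.close()              # harmless if already closed
        if os.path.exists(handle.name):
            os.remove(handle.name)


# Possible usage with pygame (not executed here):
#     with temporary_mp3(mp3_bytes) as path:
#         pygame.mixer.music.load(path)
#         pygame.mixer.music.play()
#         ...wait for playback, then pygame.mixer.music.unload()
```

On Windows the file usually cannot be deleted while the mixer still has it open, which is why the usage comment unloads the track before leaving the block.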

Lip Sync Algorithm

Phoneme analysis of text for realistic mouth movements
Multiple mouth shapes (closed, O-shape, A-shape, wide open)
Dynamic timing based on speech intensity
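
The project's exact phoneme analysis is not documented here. As a rough approximation, the sketch below maps letters (not true phonemes) to the four mouth shapes named above and picks a shape by playback position. The letter groupings and function names are assumptions made for illustration.

```python
# Hypothetical letter-to-shape table; real phoneme analysis would be finer.
SHAPE_BY_LETTER = {}
for letters, shape in (
    ("bmp", "closed"),       # lips pressed together
    ("ouwq", "o_shape"),     # rounded lips
    ("a", "a_shape"),        # open jaw
    ("ei", "wide_open"),     # stretched, open mouth
):
    for letter in letters:
        SHAPE_BY_LETTER[letter] = shape


def mouth_shapes(text, default="closed"):
    """Return one mouth shape per character of `text`."""
    return [SHAPE_BY_LETTER.get(ch, default) for ch in text.lower()]


def shape_at(text, elapsed, duration):
    """Pick the shape to draw `elapsed` seconds into a clip of `duration` seconds."""
    shapes = mouth_shapes(text)
    if not shapes or duration <= 0:
        return "closed"
    index = int(elapsed / duration * len(shapes))
    return shapes[max(0, min(index, len(shapes) - 1))]
```

Spreading the characters evenly over the clip duration is the simplest timing model; weighting by speech intensity, as described above, would replace the linear index calculation.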

Cost Breakdown

OpenAI API Costs

Service	Model	Cost per Unit	Example Usage	Example Cost
TTS	tts-1	$0.015/1K chars	100 responses (5K chars)	$0.075
GPT	gpt-3.5-turbo	$0.001/1K tokens	100 conversations	$0.20
Total	-	-	100 full conversations	~$0.28
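
The figures in the table can be reproduced with a few lines of arithmetic. The prices come from the table; the 200,000-token figure is an assumption back-calculated from the $0.20 GPT estimate, since the table does not state a token count, and the per-word estimate assumes roughly 5 characters per word.

```python
TTS_PRICE_PER_1K_CHARS = 0.015    # tts-1, from the table
GPT_PRICE_PER_1K_TOKENS = 0.001   # gpt-3.5-turbo, from the table


def tts_cost(characters):
    """Cost in USD of synthesizing `characters` characters with tts-1."""
    return characters / 1000 * TTS_PRICE_PER_1K_CHARS


def gpt_cost(tokens):
    """Cost in USD of processing `tokens` tokens with gpt-3.5-turbo."""
    return tokens / 1000 * GPT_PRICE_PER_1K_TOKENS


tts_total = tts_cost(5_000)       # 100 responses, 5K characters in total
gpt_total = gpt_cost(200_000)     # assumed: about 2K tokens per conversation
total = tts_total + gpt_total     # about 0.275, shown rounded as ~$0.28

per_word = tts_cost(5)            # assumed average of 5 characters per word
print(f"TTS ${tts_total:.3f} + GPT ${gpt_total:.2f} = ${total:.3f}")
print(f"TTS per word: ${per_word:.6f}")
```

Swapping in `0.030` for the TTS price gives the corresponding estimate for tts-1-hd.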

Speech Recognition

Google Speech-to-Text: Free tier (60 minutes/month)
Alternative: Can be configured for other providers

Troubleshooting

Common Issues

Audio Issues

# Install PyAudio for microphone access
pip install pyaudio

# On macOS, you might need:
brew install portaudio
pip install pyaudio

OpenAI API Errors

Verify your API key is correct and has credits
Check network connection
Ensure API key has TTS permissions

Speech Recognition Problems

Check microphone permissions
Adjust microphone sensitivity
Try speaking closer to the microphone

Performance Issues

Reduce avatar window size for better performance
Use standard TTS quality instead of HD
Close other applications using audio

Roadmap

Planned Features

Multiple avatar appearances (different hair, skin tones, styles)
Emotion detection and corresponding facial expressions
Background environments (office, home, outdoor)
Voice cloning support for custom voices
Multi-language support for international users
Chat history and conversation memory
Avatar customization UI for easy personalization

Technical Improvements

Real-time audio processing for faster response times
Advanced lip-sync using AI phoneme detection
3D avatar option using Three.js integration
Mobile app version for smartphones
Web browser version for universal access

Contributing

We welcome contributions! Here's how you can help:

Fork the repository
Create a feature branch: git checkout -b feature/amazing-feature
Commit your changes: git commit -m 'Add amazing feature'
Push to the branch: git push origin feature/amazing-feature
Open a Pull Request

Development Guidelines

Follow PEP 8 style guidelines
Add comments for complex algorithms
Test new features thoroughly
Update documentation as needed

License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.

Acknowledgments

OpenAI for providing excellent TTS and GPT APIs
Pygame community for the graphics framework
SpeechRecognition library maintainers
Contributors who help improve this project

Made with ❤️ by Fatima Azfar

Star ⭐ this repo if you find it helpful!
