A classic layered sprite avatar that listens to your voice, processes your speech with OpenAI's GPT, and responds with high-quality text-to-speech using OpenAI's premium voices.
- Layered sprite system with hair, face, eyes, nose, mouth, and cheeks
- Natural animations including blinking, head tilting, and eyebrow movements
- Dynamic facial expressions that respond to speech intensity
- Smooth lip-sync with phoneme-based mouth shapes
- Speech Recognition using Google's speech-to-text API
- OpenAI GPT Integration for intelligent responses
- Premium TTS with OpenAI's "nova" voice (natural female voice)
- Real-time conversation with minimal latency
- State-based backgrounds: Different colors for listening, thinking, and speaking
- Animated status indicators with emoji and dynamic text
- Thinking animation with pulsing effects and animated dots
- Speech-responsive animations that sync with audio output
- Extremely affordable: ~$0.00008 per word for TTS
- Efficient API usage: Optimized for minimal costs
- Transparent pricing: See exact costs in real-time
- Python 3.8 or higher
- OpenAI API key
- Microphone for speech input
- Speakers/headphones for audio output
- Clone the repository
git clone https://github.com/fatimaazfar/2D-Talking-Avatar.git
cd 2D-Talking-Avatar
- Install dependencies
pip install pygame numpy SpeechRecognition openai pyaudio
- Set up your OpenAI API key
# Create a .env file containing your OpenAI API key:
API_KEY=sk........123
- Run the avatar
python avatar.py
- SPACE - Start/stop listening for voice input
- ESC - Exit the application
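For the API key step in the setup above, here is a minimal sketch of reading the key at startup. It assumes the `.env` file uses plain `KEY=value` lines and that the variable is named `API_KEY` as shown; `load_env_file` and `get_api_key` are hypothetical helper names, not functions from this repository.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=value lines and return them as a dict."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blanks, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_api_key(path=".env", name="API_KEY"):
    """Prefer a real environment variable, then fall back to the .env file."""
    return os.environ.get(name) or load_env_file(path).get(name)
```

Keeping the key in `.env` (and out of version control) avoids hard-coding it in `avatar.py`.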
- 🎤 Listening - Press SPACE and speak your message
- 🧠 Thinking - Avatar processes your speech and generates response
- 🔊 Speaking - Avatar responds with natural voice and lip-sync
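The three states above, plus the SPACE toggle, can be sketched as a small transition table. This is an illustration only: the `IDLE` state, the event names, and `next_state` are assumptions made for the example and may not match how `avatar.py` tracks state.

```python
from enum import Enum


class AvatarState(Enum):
    IDLE = "idle"            # assumed resting state between conversations
    LISTENING = "listening"
    THINKING = "thinking"
    SPEAKING = "speaking"


# Allowed transitions: event name -> {current state: next state}
TRANSITIONS = {
    "space_pressed": {
        AvatarState.IDLE: AvatarState.LISTENING,
        AvatarState.LISTENING: AvatarState.IDLE,  # SPACE again stops listening
    },
    "speech_captured": {AvatarState.LISTENING: AvatarState.THINKING},
    "reply_ready": {AvatarState.THINKING: AvatarState.SPEAKING},
    "playback_done": {AvatarState.SPEAKING: AvatarState.IDLE},
}

STATUS_TEXT = {
    AvatarState.IDLE: "Press SPACE to talk",
    AvatarState.LISTENING: "🎤 Listening",
    AvatarState.THINKING: "🧠 Thinking",
    AvatarState.SPEAKING: "🔊 Speaking",
}


def next_state(state, event):
    """Return the new state, or the unchanged state if the event does not apply."""
    return TRANSITIONS.get(event, {}).get(state, state)
```

Ignoring events that do not apply to the current state means, for example, that pressing SPACE while the avatar is thinking has no effect.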
Choose from OpenAI's premium voices:
voice="nova" # Natural female voice (recommended)
voice="shimmer" # Softer female voice
voice="alloy" # Neutral voice
model="tts-1" # Standard quality ($0.015/1K chars)
model="tts-1-hd" # HD quality ($0.030/1K chars)
Modify colors in the LayeredSpriteAvatar class:
self.colors = {
'skin': (235, 210, 185),
'hair': (60, 40, 30),
'eye_iris': (100, 150, 180),
# ... customize as needed
}
- Modular Design: Separate classes for avatar rendering and conversation handling
- Multi-threading: Non-blocking audio processing and speech recognition
- Event-driven: Pygame-based event system for smooth user interaction
- State Management: Clean state transitions between listening, thinking, and speaking
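As a rough illustration of the non-blocking pattern listed above, the sketch below runs a slow call in a worker thread and lets the main loop poll for the result once per frame. The helper names (`start_background_task`, `poll`, `slow_reply`) are made up for this example; the project's own threading code may be organized differently.

```python
import queue
import threading
import time


def start_background_task(func, *args):
    """Run func(*args) in a daemon thread; return a queue that receives the outcome."""
    results = queue.Queue()

    def worker():
        try:
            results.put(("ok", func(*args)))
        except Exception as exc:  # report failures instead of losing them in the thread
            results.put(("error", exc))

    threading.Thread(target=worker, daemon=True).start()
    return results


def poll(results):
    """Non-blocking check, meant to be called once per frame."""
    try:
        return results.get_nowait()
    except queue.Empty:
        return None


def slow_reply(prompt):
    time.sleep(0.2)  # stand-in for a network call such as speech recognition or GPT
    return f"echo: {prompt}"


if __name__ == "__main__":
    pending = start_background_task(slow_reply, "hello")
    frames = 0
    while (outcome := poll(pending)) is None:
        frames += 1        # the render loop would keep drawing the avatar here
        time.sleep(0.016)  # roughly one frame at 60 FPS
    print(outcome, "after", frames, "frames")
```

Passing results back through a `queue.Queue` keeps all drawing on the main thread, which is generally the safer arrangement with Pygame.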
- Uses Google's speech-to-text API via the speech_recognition library
- Automatic noise adjustment for better accuracy
- Configurable timeout and phrase limits
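The points above map onto a handful of calls in the `speech_recognition` library. The sketch below is one way to combine them; `ListenConfig`, `listen_once`, and the default values are assumptions for illustration, not settings taken from this project.

```python
from dataclasses import dataclass


@dataclass
class ListenConfig:
    ambient_seconds: float = 1.0      # how long to sample background noise
    timeout: float = 5.0              # max seconds to wait for speech to start
    phrase_time_limit: float = 10.0   # max seconds for a single utterance


def listen_once(config=None):
    """Capture one utterance and return its transcript, or None on failure."""
    # Imported lazily; requires the SpeechRecognition and PyAudio packages.
    import speech_recognition as sr

    config = config or ListenConfig()
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        # Calibrate the energy threshold against the current room noise.
        recognizer.adjust_for_ambient_noise(source, duration=config.ambient_seconds)
        try:
            audio = recognizer.listen(
                source,
                timeout=config.timeout,
                phrase_time_limit=config.phrase_time_limit,
            )
        except sr.WaitTimeoutError:
            return None  # nobody spoke within the timeout
    try:
        return recognizer.recognize_google(audio)
    except sr.UnknownValueError:
        return None  # speech was unintelligible
    except sr.RequestError:
        return None  # the recognition service could not be reached
```

Calling `listen_once(ListenConfig(timeout=3))` would give up sooner when the user stays silent.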
- OpenAI's latest TTS models with natural prosody
- MP3 format for high audio quality
- Temporary file handling for efficient memory usage
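A hedged sketch of the temporary-file approach described above: request MP3 bytes, write them to a temporary file, play the file, then remove it. `synthesize_speech` and `with_temp_mp3` are illustrative names; the first assumes version 1.x of the `openai` package and a valid API key, and the playback callback is left to the caller.

```python
import os
import tempfile


def synthesize_speech(text, voice="nova", model="tts-1", api_key=None):
    """Return MP3 bytes from OpenAI's speech endpoint (needs network access)."""
    from openai import OpenAI  # lazy import; assumes openai>=1.0 is installed

    # With api_key=None the client falls back to the OPENAI_API_KEY variable.
    client = OpenAI(api_key=api_key)
    response = client.audio.speech.create(model=model, voice=voice, input=text)
    return response.content


def with_temp_mp3(audio_bytes, play):
    """Write audio to a temporary .mp3, pass its path to play(), then delete it."""
    handle = tempfile.NamedTemporaryFile(suffix=".mp3", delete=False)
    try:
        handle.write(audio_bytes)
        handle.close()  # close before playback so another reader can open the file
        return play(handle.name)
    finally:
        handle.close()
        if os.path.exists(handle.name):
            os.remove(handle.name)
```

In a Pygame application, `play` could load the path with `pygame.mixer.music.load`, start playback, and wait until `pygame.mixer.music.get_busy()` returns false before returning.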
- Phoneme analysis of text for realistic mouth movements
- Multiple mouth shapes (closed, O-shape, A-shape, wide open)
- Dynamic timing based on speech intensity
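To make the idea concrete, here is a deliberately simplified sketch that maps letters (not true phonemes) to the four mouth shapes listed above and scales timing by an intensity value. The letter groupings, thresholds, and function names are assumptions for illustration and are not taken from the project's code.

```python
MOUTH_SHAPES = ("closed", "o_shape", "a_shape", "wide_open")


def mouth_shape_for(char, intensity=0.5):
    """Pick a mouth shape for one character; intensity is expected in [0, 1]."""
    char = char.lower()
    if not char.isalpha() or char in "bmp":
        return "closed"      # pauses, punctuation, and lip-closing consonants
    if char in "ouw":
        return "o_shape"     # rounded lips
    if char in "aei":
        # Loud open vowels get the widest shape (threshold chosen arbitrarily).
        return "wide_open" if intensity > 0.8 else "a_shape"
    return "a_shape"         # remaining consonants: mouth slightly open


def lip_sync_frames(text, intensity=0.5, base_duration=0.08):
    """Return (shape, seconds) pairs, merging consecutive identical shapes."""
    frames = []
    # Louder speech holds each shape a little longer (an assumption of this sketch).
    duration = base_duration * (1.0 + intensity)
    for char in text:
        shape = mouth_shape_for(char, intensity)
        if frames and frames[-1][0] == shape:
            frames[-1] = (shape, frames[-1][1] + duration)
        else:
            frames.append((shape, duration))
    return frames
```

Merging repeated shapes keeps the mouth from flickering when several consecutive letters map to the same sprite.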
| Service | Model | Cost per Unit | Example Usage | Cost |
|---|---|---|---|---|
| TTS | tts-1 | $0.015/1K chars | 100 responses (5K chars) | $0.075 |
| GPT | gpt-3.5-turbo | $0.001/1K tokens | 100 conversations | $0.20 |
| Total | - | - | 100 full conversations | ~$0.28 |
- Google Speech-to-Text: Free tier (60 minutes/month)
- Alternative: Can be configured for other providers
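The arithmetic behind the table can be reproduced with a few lines. The prices are the ones quoted in this README and may since have changed; the 200,000-token figure is only what the table's $0.20 GPT cost implies at $0.001 per 1K tokens, and the function names are illustrative.

```python
TTS_PRICE_PER_1K_CHARS = {"tts-1": 0.015, "tts-1-hd": 0.030}  # USD, as quoted above


def tts_cost(num_chars, model="tts-1"):
    """Estimated TTS cost in USD for a number of input characters."""
    return num_chars / 1000 * TTS_PRICE_PER_1K_CHARS[model]


def gpt_cost(num_tokens, price_per_1k_tokens=0.001):
    """Estimated GPT cost in USD for a number of tokens."""
    return num_tokens / 1000 * price_per_1k_tokens


if __name__ == "__main__":
    tts = tts_cost(5_000)      # 100 responses totalling 5K characters
    gpt = gpt_cost(200_000)    # token count implied by the table's GPT row
    print(f"TTS ${tts:.3f} + GPT ${gpt:.2f} = ${tts + gpt:.3f}")
```

Switching the same 5K characters to `tts-1-hd` doubles the TTS portion of the estimate.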
Audio Issues
# Install PyAudio for microphone access
pip install pyaudio
# On macOS, you might need:
brew install portaudio
pip install pyaudio
OpenAI API Errors
- Verify your API key is correct and has credits
- Check network connection
- Ensure API key has TTS permissions
Speech Recognition Problems
- Check microphone permissions
- Adjust microphone sensitivity
- Try speaking closer to the microphone
Performance Issues
- Reduce avatar window size for better performance
- Use standard TTS quality instead of HD
- Close other applications using audio
- Multiple avatar appearances (different hair, skin tones, styles)
- Emotion detection and corresponding facial expressions
- Background environments (office, home, outdoor)
- Voice cloning support for custom voices
- Multi-language support for international users
- Chat history and conversation memory
- Avatar customization UI for easy personalization
- Real-time audio processing for faster response times
- Advanced lip-sync using AI phoneme detection
- 3D avatar option using Three.js integration
- Mobile app version for smartphones
- Web browser version for universal access
We welcome contributions! Here's how you can help:
- Fork the repository
- Create a feature branch:
git checkout -b feature/amazing-feature
- Commit your changes:
git commit -m 'Add amazing feature'
- Push to branch:
git push origin feature/amazing-feature
- Open a Pull Request
- Follow PEP 8 style guidelines
- Add comments for complex algorithms
- Test new features thoroughly
- Update documentation as needed
This project is licensed under the Apache License 2.0; see the LICENSE file for details.
- OpenAI for providing excellent TTS and GPT APIs
- Pygame community for the graphics framework
- SpeechRecognition library maintainers
- Contributors who help improve this project
Made with ❤️ by Fatima Azfar
Star ⭐ this repo if you find it helpful!