VisionPal is an AI-powered assistive application designed to empower visually impaired individuals by providing auditory descriptions of their surroundings.
The application uses a powerful vision-language model, meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo via Together AI, to analyze images from the user's camera. It provides rich, contextual descriptions of the scene, identifies objects, and can answer follow-up questions to help with navigation and environmental awareness.
This core AI capability, paired with Text-to-Speech (Google TTS), offers a comprehensive and interactive way for users to understand their environment.
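Under the hood, vision-language requests to Together AI pair an image with a text prompt inside a single chat message. The sketch below shows one way to assemble that payload; the helper name `build_vision_messages` is illustrative, not from the repository, and the actual API call (which needs a `TOGETHER_API_KEY`) is shown commented out:

```python
import base64

MODEL = "meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo"

def build_vision_messages(image_bytes: bytes, question: str) -> list:
    """Build a chat message combining the user's question with a base64 image."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return [{
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ],
    }]

# Sending the request (requires the Together SDK and an API key):
# from together import Together
# client = Together()
# resp = client.chat.completions.create(
#     model=MODEL,
#     messages=build_vision_messages(image_bytes, "What is in front of me?"),
# )
# print(resp.choices[0].message.content)
```

Follow-up questions can be asked by appending the model's reply and the new question to the same `messages` list before the next call.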
- AI-Powered Scene Understanding: Leverages the Llama vision model to generate rich descriptions of scenes and answer user questions about the image.

- Auditory Feedback: Converts AI-generated descriptions and answers into speech using gTTS.
- Multiple Modes:
  - Voice-Activated Mode: fully hands-free operation.
  - Button-Based Mode: tactile control when preferred.
- Web Interface: Accessible via browser using Streamlit for easy deployment and testing.
- Multi-language Support: Offers language selection (Arabic/English).
- Flexible Input: Supports both live camera feed and gallery image uploads.
- Voice Interaction: Includes speech-to-text and text-to-speech capabilities.
- Noise Reduction: Features noise-reduced audio input for clearer commands.
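As a rough sketch of how the auditory-feedback and multi-language features fit together: the UI language selection maps to a gTTS language code, which drives speech synthesis. The function and variable names below are hypothetical, not the repository's own, and the gTTS call requires network access:

```python
# Hypothetical mapping from the app's language selector to gTTS codes.
LANG_CODES = {"English": "en", "Arabic": "ar"}

def lang_code(language: str) -> str:
    """Map the UI language selection to a gTTS language code (default: English)."""
    return LANG_CODES.get(language, "en")

def speak(text: str, language: str = "English", out_path: str = "speech.mp3") -> str:
    """Synthesize `text` with Google TTS and save it as an MP3 (network required)."""
    from gtts import gTTS  # pip install gTTS
    gTTS(text=text, lang=lang_code(language)).save(out_path)
    return out_path
```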
- Core Logic: Python.
- Computer Vision & AI: meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo via Together AI, OpenCV.
- Audio Processing: Google Text-to-Speech (gTTS), PyAudio.
- Frontend: Streamlit.
Follow these steps to get VisionPal running on your local machine.
1. Clone the repository:

   ```bash
   git clone https://github.com/mariamashraf731/VisionPal-Assistive-AI.git
   cd VisionPal-Assistive-AI
   ```

2. Install dependencies (this project's dependencies are listed in `requirements.txt`):

   ```bash
   pip install -r requirements.txt
   ```

3. Configure the API key: create a `.env` file in the project's root directory and add your Together AI API key. It is required for the AI-powered description feature.

   ```
   TOGETHER_API_KEY="your_together_api_key_here"
   ```
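The application presumably reads this file with a dotenv-style loader. If you need to load it by hand, a minimal stdlib-only parser (an illustration, not the repository's code) looks like:

```python
import os

def load_env_file(path: str = ".env") -> dict:
    """Minimal .env parser: reads KEY="value" lines, skips blanks and '#' comments,
    and merges the result into os.environ."""
    values = {}
    if os.path.exists(path):
        with open(path) as fh:
            for line in fh:
                line = line.strip()
                if not line or line.startswith("#") or "=" not in line:
                    continue
                key, _, val = line.partition("=")
                values[key.strip()] = val.strip().strip('"').strip("'")
    os.environ.update(values)
    return values
```

Note this sketch does not handle `export KEY=...` prefixes or multi-line values; the `python-dotenv` package covers those cases.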
You can run the application in two different modes:
1. Button-Based Mode (Desktop):

   ```bash
   python app_button.py
   ```

2. Streamlit Web App:

   ```bash
   streamlit run src/app_streamlit.py
   ```
- Adjust language settings
- Customize vision model
- Fine-tune noise reduction parameters
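The repository's actual noise-reduction code and parameters are not shown here. As a crude illustration of the kind of tunable parameter involved, a simple amplitude gate zeroes out samples whose magnitude falls below a threshold (a hypothetical helper, not VisionPal's implementation):

```python
def noise_gate(samples, threshold=0.02):
    """Zero out low-amplitude samples (normalized to [-1.0, 1.0]).
    Raising `threshold` suppresses more background noise but may clip quiet speech."""
    return [s if abs(s) >= threshold else 0.0 for s in samples]
```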
- Ensure all dependencies are installed
- Check microphone permissions
- Verify API keys
For detailed system architecture and user flow, refer to the Project Report.