AI-Powered Communication and Presentation Coach
SamvaadSetu is an innovative, AI-powered, web-based system designed to transform speakers into confident, self-aware, and impactful communicators.
By analyzing both verbal and non-verbal communication in real time, SamvaadSetu provides instant, holistic, and adaptive feedback to help users elevate their speaking skills.
- Multimodal Integration: Combines audio, video, and text to deliver a complete analysis of communication style.
- Accessibility & Inclusivity: Tailored for marginalized communities, individuals with disabilities, non-native speakers, and anyone seeking affordable, personalized coaching.
- Scenario-Based Training: Practice in real-world contexts like interviews, presentations, and team discussions.
- Comprehensive Feedback: Integrates posture tracking, speech metrics, grammar correction, emotional and sentiment analysis, and facial expression recognition.
- 🤖 MediaPipe: Gesture, posture, and face tracking
- 🗣️ Deepgram STT API: Speech-to-text transcription
- ✍️ LanguageTool: Grammar checks
- 🧠 Google Gemini (LLM): Emotional analysis, sentiment summarization, and enhanced feedback
- 🐍 Python & Flask: Backend logic and web server
- 🎞️ ffmpeg: Video/audio processing and conversion
- 🏷️ NLTK: Text analysis and sentence tokenization
- 📄 Markdown: Displaying formatted feedback
- Speech Transcription: Converts spoken words into text
- Grammar Correction: Identifies and suggests improvements for grammatical errors
- Facial Expression Recognition: Analyzes emotions conveyed through facial cues
- Hand Gesture & Posture Analysis: Evaluates body language for effectiveness
- Sentiment Summarization: Provides an overview of emotional tone
- Real-Time Feedback: Covers both verbal and non-verbal communication
- Filler Word & Pause Detection: Flags filler words and long pauses so you can reduce them
- Scenario-Based Training: Simulates real-life communication challenges
- Comprehensive Reporting: Unified feedback integrating all analysis aspects
- Playback & Review: Watch recordings with synchronized feedback overlays
- Inclusivity: Adaptive interfaces and feedback for diverse user needs
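To make the filler word and pause detection feature more concrete, here is a minimal sketch of how such checks could work on a transcript. It is not the project's actual implementation: the filler list, the function names `count_fillers` and `find_pauses`, the one-second threshold, and the word-timestamp format (dicts with `word`, `start`, and `end` in seconds) are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical filler list; the project's real list is not documented.
FILLER_WORDS = ("um", "uh", "like", "you know", "basically", "actually")


def count_fillers(transcript, fillers=FILLER_WORDS):
    """Count whole-word, case-insensitive occurrences of each filler."""
    text = transcript.lower()
    counts = Counter()
    for filler in fillers:
        # \b keeps "um" from matching inside words such as "umbrella".
        pattern = r"\b" + re.escape(filler) + r"\b"
        hits = len(re.findall(pattern, text))
        if hits:
            counts[filler] = hits
    return counts


def find_pauses(words, min_gap=1.0):
    """Return (previous_word, next_word, gap_seconds) for each long silence.

    `words` is assumed to be a list of dicts with "word", "start", and "end"
    keys, where the times are in seconds.
    """
    pauses = []
    for prev, nxt in zip(words, words[1:]):
        gap = nxt["start"] - prev["end"]
        if gap >= min_gap:
            pauses.append((prev["word"], nxt["word"], round(gap, 2)))
    return pauses
```

Calling `count_fillers` on a transcript string returns a `Counter` keyed by filler, and `find_pauses` lists every gap between consecutive words that meets the threshold. Both outputs could feed a report, but how SamvaadSetu actually scores them is not specified here.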
- Start the Application
  `python app.py`
- Access the Web Interface
  - Open your browser at http://localhost:5000.
- Begin Your Session
  - 🎤 Speak or present as you normally would.
  - 👀 The system tracks gestures, posture, and facial expressions.
  - 📝 Receive instant, actionable feedback on your delivery.
  - 📊 Review your performance with comprehensive, AI-generated reports.
SamvaadSetu/
├── app.py
├── requirements.txt
├── static/
│ └── (HTML/CSS/JS files)
├── templates/
│ └── (Web templates)
├── videos/
├── posture/
├── transcripts/
├── analysis/
├── reports/
├── LLM/
└── README.md
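The layout above includes several folders that appear to hold generated output. As a small sketch, the helper below creates any that are missing before the app runs. The folder names come from the tree, but the function name `ensure_output_dirs` is hypothetical, and whether `app.py` already does something similar is not stated in this README.

```python
from pathlib import Path

# Folder names taken from the project tree; what each one stores is
# inferred from its name, not confirmed by the source.
OUTPUT_DIRS = ("videos", "posture", "transcripts", "analysis", "reports", "LLM")


def ensure_output_dirs(base_dir="."):
    """Create any missing output folders under base_dir and return their paths."""
    base = Path(base_dir)
    created = []
    for name in OUTPUT_DIRS:
        path = base / name
        # exist_ok=True makes the call safe to repeat on every startup.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Running `ensure_output_dirs()` from the repository root is idempotent, so it could be called at startup without disturbing existing recordings or reports.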
| Endpoint | Method | Description |
|---|---|---|
| `/` | GET | Serves main recording interface |
| `/save_video` | POST | Receives and processes recordings |
| `/video` | GET | Serves latest processed video |
| `/report` | GET | Returns latest posture report |
| `/transcript` | GET | Returns latest speech transcript |
| `/analysis` | GET | Returns latest speech analysis |
| `/combined_report` | GET | Returns all analysis components |
| `/playback` | GET | Serves playback interface |
| `/analyze_report` | POST | Triggers AI enhancement of analysis |
| `/llm_response` | GET | Provides latest enhanced AI feedback |
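The GET endpoints listed above can be queried from a script once the server is running. The following is a minimal sketch using only the Python standard library. The helper names `endpoint_url` and `fetch_latest` are hypothetical, the base URL assumes the default local address, and the response format (JSON, Markdown, or plain text) is not documented here, so the body is returned as raw text.

```python
from urllib.parse import urljoin
from urllib.request import urlopen

# Assumes the server is running locally on the default port.
BASE_URL = "http://localhost:5000"

# GET endpoints from the endpoint table that return the latest results.
RESULT_ENDPOINTS = (
    "/report",
    "/transcript",
    "/analysis",
    "/combined_report",
    "/llm_response",
)


def endpoint_url(path, base_url=BASE_URL):
    """Join the base URL and an endpoint path into a full URL."""
    return urljoin(base_url.rstrip("/") + "/", path.lstrip("/"))


def fetch_latest(path, base_url=BASE_URL, timeout=10):
    """GET one endpoint and return the response body as text.

    No response format is assumed; the caller decides how to parse it.
    """
    with urlopen(endpoint_url(path, base_url), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

For example, looping over `RESULT_ENDPOINTS` and calling `fetch_latest(path)` after a recording session would retrieve each piece of the latest analysis. The request raises an error if the server is not reachable.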
- Clone the repository
  `git clone https://github.com/abhishekmallav/SamvaadSetu.git`
  `cd SamvaadSetu`
- Create a virtual environment
  `python -m venv venv`
- Activate your virtual environment
  - On Windows: `venv\Scripts\activate`
  - On Mac/Linux: `source venv/bin/activate`
- Install required packages
  `pip install -r requirements.txt`
- Advanced AI Feedback: Deeper, context-aware analysis and recommendations
- Multilingual Support: Feedback in multiple languages
- VR/AR Integration: Immersive practice environments
- Industry Modules: Legal, medical, technical, and more
- Mobile App: Practice and feedback on the go
- API Development: Integration with other platforms
- Collaborative Features: Group practice and peer feedback
- Flask Documentation
- MediaPipe
- Deepgram
- Google Gemini
- NLTK
- LanguageTool
This project is licensed under the MIT License.
Created by abhishekmallav, 09Namratakhade, PrashantTakale369, anushree612