A powerful, voice-first agentic AI designed to assist users specifically with Indian Government Welfare Schemes in their native language. It goes beyond a simple chatbot by acting as an intelligent agent that plans, executes tools, and manages conversation memory to guide users from inquiry to registration.
- 🗣️ Voice-First Interaction: Seamless Speech-to-Text (STT) and Text-to-Speech (TTS) pipeline.
- 🌏 Multi-Language Support: Native support for Hindi, Bengali, Marathi, Telugu, Tamil, and Odia.
- 🧠 Agentic Workflow: Uses a ReAct (Reason+Act) loop to "think" before speaking ([VICHAAR] logic).
- 🛠️ Actionable Tools:
  - check_all_schemes: Auto-matches users to schemes based on Age, Income, and Occupation.
  - register_user: Generates application IDs and manages a mock user database.
- 💾 Conversation Memory: Remembers user details across multiple turns (e.g., "I told you I am a farmer").
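To make the two tools concrete, here is a minimal sketch of what eligibility matching and mock registration could look like. It assumes a hypothetical scheme record (occupation list plus age and income limits) standing in for data/schemes.json, and an in-memory dict standing in for data/users.json. The scheme names and thresholds are placeholders, not official eligibility criteria, and the function parameters are illustrative rather than the project's actual signatures.

```python
import itertools
from dataclasses import dataclass
from typing import Optional


@dataclass
class Scheme:
    """Hypothetical shape of one entry in data/schemes.json."""
    name: str
    occupations: set[str]             # empty set means "open to any occupation"
    min_age: int = 0
    max_age: Optional[int] = None     # None means no upper age limit
    max_income: Optional[int] = None  # annual income in rupees; None means no cap


# Placeholder records for illustration only, not real scheme rules.
SCHEMES = [
    Scheme("Example Farmer Support", {"farmer"}, min_age=18),
    Scheme("Example Student Scholarship", {"student"}, max_age=25, max_income=250_000),
    Scheme("Example Senior Pension", set(), min_age=60, max_income=100_000),
]


def check_all_schemes(age: int, income: int, occupation: str) -> list[str]:
    """Return the names of schemes whose age, income, and occupation rules all pass."""
    occupation = occupation.strip().lower()
    matches = []
    for s in SCHEMES:
        if age < s.min_age:
            continue
        if s.max_age is not None and age > s.max_age:
            continue
        if s.max_income is not None and income > s.max_income:
            continue
        if s.occupations and occupation not in s.occupations:
            continue
        matches.append(s.name)
    return matches


_app_counter = itertools.count(1)
USERS: dict[str, dict] = {}  # in-memory stand-in for data/users.json


def register_user(name: str, scheme: str) -> str:
    """Record a mock application and return a sequential ID such as APP-0001."""
    app_id = f"APP-{next(_app_counter):04d}"
    USERS[app_id] = {"name": name, "scheme": scheme}
    return app_id
```

For the farmer in the sample conversation, check_all_schemes(45, 200_000, "farmer") returns ["Example Farmer Support"]. Keeping every rule as an optional bound means a scheme with no income cap or age limit simply leaves that field as None. The sequential ID format here is an assumption; the real tool may generate IDs differently (the sample shows APP-1234).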
├── app/
│ ├── agent.py # Core Logic: LLM integration, ReAct loop, Memory
│ ├── tools.py # Tool definitions (Scheme Search, Eligibility, Registration)
│ └── voice.py # Speech Recognition & TTS wrappers
├── data/
│ ├── schemes.json # Database of govt schemes (PM-KISAN, etc.)
│ └── users.json # (Generated) registered users database
├── static/
│ └── index.html # Frontend for Web Demo
├── main.py # CLI Entry Point
├── server.py # FastAPI Backend for Web Demo
├── architecture.md # Detailed System Architecture & Diagrams
├── evaluation_transcript.md # Logs of testing scenarios
└── requirements.txt # Python Dependencies
- Python 3.9 or higher
- Mac Users: PortAudio is required for microphone access. Install it with:
brew install portaudio
- Clone the Repository:
git clone <repository-url>
cd voice_scheme_agent
- Install Dependencies:
pip install -r requirements.txt
- Environment Setup:
  - Create a .env file in the root directory.
  - Add your Google Gemini API key (this command creates the file; note that > overwrites any existing .env):
echo "GEMINI_API_KEY=your_api_key_here" > .env
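If the agent fails to start, it helps to confirm the key is actually visible. The following is a minimal, stdlib-only sketch of that check; the project itself may load .env through a library such as python-dotenv, and the helper names parse_env_text, load_env_file, and get_gemini_key are hypothetical. It prefers an exported GEMINI_API_KEY, falls back to the .env file, and rejects the placeholder value from the command above.

```python
import os
from pathlib import Path


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and surrounding quote characters.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Read a .env file if it exists; return an empty dict otherwise."""
    env_path = Path(path)
    if not env_path.is_file():
        return {}
    return parse_env_text(env_path.read_text(encoding="utf-8"))


def get_gemini_key(path: str = ".env") -> str:
    """Prefer an exported GEMINI_API_KEY, then fall back to the .env file."""
    key = os.environ.get("GEMINI_API_KEY") or load_env_file(path).get("GEMINI_API_KEY", "")
    if not key or key == "your_api_key_here":
        raise RuntimeError("GEMINI_API_KEY is missing or still set to the placeholder value")
    return key
```

Calling get_gemini_key() once at startup makes a missing or unedited key fail immediately with a clear message, instead of surfacing later as an authentication error from the LLM call.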
Web Demo (recommended): The web interface provides the best experience, with visual feedback and browser-based speech handling.
- Start the Server:
python server.py
- Open Browser: Navigate to http://localhost:8000
- Interact:
  - Click "Start Agent".
  - Tap the Microphone button and speak in your preferred language (e.g., "Namaste" in Hindi or "Kemon acho" in Bengali).
CLI Mode: For quick testing without a browser.
- Run the Script:
python main.py
- Speak: The terminal logs "Listening..." while it waits for your speech, then speaks the agent's response aloud.
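The CLI follows a simple cycle: listen (STT), let the agent respond, then speak (TTS). The sketch below shows that loop with the three stages injected as plain functions, so it runs without a microphone. run_voice_loop, EXIT_WORDS, and the stage signatures are hypothetical; in the actual project the listen and speak steps would come from the wrappers in app/voice.py and the respond step from app/agent.py.

```python
from typing import Callable, Optional

EXIT_WORDS = {"exit", "quit", "bye"}  # hypothetical stop words


def run_voice_loop(
    listen: Callable[[], Optional[str]],
    respond: Callable[[str], str],
    speak: Callable[[str], None],
    max_turns: int = 10,
    log: Callable[[str], None] = print,
) -> int:
    """Run listen -> respond -> speak cycles and return how many replies were spoken.

    listen: returns recognized text, or None/"" when nothing was understood (STT step).
    respond: the agent, mapping user text to reply text.
    speak: plays the reply aloud (TTS step).
    """
    spoken_replies = 0
    for _ in range(max_turns):  # bounds attempts so a silent mic can't loop forever
        log("Listening...")
        heard = listen()
        if not heard:
            log("Sorry, I didn't catch that.")
            continue
        if heard.strip().lower() in EXIT_WORDS:
            break
        speak(respond(heard))
        spoken_replies += 1
    return spoken_replies
```

Injecting the stages keeps the loop testable: a list of scripted utterances can replace the microphone, and a list's append method can replace the speaker.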
Scenario: A 45-year-old farmer asking for help.
- User: "नमस्ते, मुझे सरकारी योजनाओं के बारे में जानना है।" (Namaste, I want to know about schemes.)
- Agent: "नमस्ते! आप किस प्रकार की योजना ढूंढ रहे हैं? क्या आप किसान हैं या छात्र?" (Namaste! What kind of scheme are you looking for? Are you a farmer or a student?)
- User: "मैं एक किसान हूँ।" (I am a farmer.)
- Agent: "जी, आपकी आयु और वार्षिक आय क्या है?" (Sure, what are your age and annual income?)
- User: "मेरी उम्र 45 साल है और आय 2 लाख है।" (I am 45 years old and my income is 2 lakh.)
- Agent (Thinks): User provided Age=45, Income=200k, Occ=Farmer. Calling check_all_schemes.
- Agent (Acts): Checks database... Found PM-KISAN.
- Agent: "आप PM-KISAN Samman Nidhi के लिए पात्र हैं। क्या आप आवेदन करना चाहते हैं?" (You are eligible for PM-KISAN Samman Nidhi. Would you like to apply?)
- User: "हाँ, कृपया आवेदन करें।" (Yes, please apply.)
- Agent: Calls register_user... "आपका आवेदन सफल रहा! Application ID: APP-1234. नजदीकी केंद्र पर जाएं।" (Your application was successful! Application ID: APP-1234. Please visit the nearest center.)
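The scenario above hinges on memory: the agent keeps asking until it knows occupation, age, and income, then acts. Below is a minimal sketch of that decision, assuming the facts have already been extracted from each utterance (in the real agent, the LLM does both the extraction and the reasoning). ConversationMemory, next_action, and split_vichaar are hypothetical names, and treating [VICHAAR] as a line prefix for internal reasoning that must not be spoken is an assumption about the format.

```python
from dataclasses import dataclass, field

REQUIRED_FACTS = ("occupation", "age", "income")  # inputs check_all_schemes needs


@dataclass
class ConversationMemory:
    """Facts collected across turns, so the user never has to repeat them."""
    facts: dict = field(default_factory=dict)

    def remember(self, **new_facts) -> None:
        # Ignore None so a later turn that omits a field doesn't erase it.
        for key, value in new_facts.items():
            if value is not None:
                self.facts[key] = value

    def missing(self) -> list:
        return [k for k in REQUIRED_FACTS if k not in self.facts]


def next_action(memory: ConversationMemory) -> tuple:
    """Reason step: ("ask", missing_fields) or (tool_name, tool_args) once all facts are known."""
    missing = memory.missing()
    if missing:
        return ("ask", missing)
    return ("check_all_schemes", {k: memory.facts[k] for k in REQUIRED_FACTS})


def split_vichaar(llm_output: str) -> tuple:
    """Separate [VICHAAR]-tagged reasoning lines from the text that should be spoken."""
    thoughts, spoken = [], []
    for line in llm_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("[VICHAAR]"):
            thoughts.append(stripped[len("[VICHAAR]"):].strip())
        elif stripped:
            spoken.append(stripped)
    return thoughts, " ".join(spoken)
```

Replaying the sample conversation: an empty memory asks for all three facts, remember(occupation="farmer") leaves age and income outstanding, and after remember(age=45, income=200_000) the next action is a check_all_schemes call with all three values.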
- OSError: [Errno -9996] Invalid input device: Check your default microphone settings in System Preferences.
- Browser STT not working: Use Google Chrome or Edge, and make sure the site's Microphone permission is set to "Allow".
- "Language not supported": The Web Demo tries to use the specific regional code (e.g., bn-IN). If your OS doesn't support it, it attempts a fallback.
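The README does not say what the fallback is, so the following is only one plausible strategy, written in Python for illustration even though the Web Demo does this in the browser. It assumes a hypothetical mapping from the six supported languages to BCP 47 tags, prefers the exact regional tag, then any available tag for the same base language, then a default (en-IN here, which is an assumption).

```python
# Hypothetical mapping from the supported languages to BCP 47 tags.
LANGUAGE_TAGS = {
    "hindi": "hi-IN",
    "bengali": "bn-IN",
    "marathi": "mr-IN",
    "telugu": "te-IN",
    "tamil": "ta-IN",
    "odia": "or-IN",
}


def pick_voice_locale(language: str, available: list[str], default: str = "en-IN") -> str:
    """Choose the regional tag if available, else any tag for the same base language, else default."""
    wanted = LANGUAGE_TAGS.get(language.strip().lower())
    if wanted is None:
        return default
    # Tags are case-insensitive, so compare lowercased but return the original spelling.
    by_lower = {tag.lower(): tag for tag in available}
    if wanted.lower() in by_lower:
        return by_lower[wanted.lower()]
    base = wanted.split("-")[0].lower()
    for tag in available:
        if tag.lower().split("-")[0] == base:
            return tag
    return default
```

For example, pick_voice_locale("Bengali", ["bn-BD", "en-IN"]) falls back to "bn-BD" because bn-IN is unavailable but another Bengali tag is.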
- Demo Video: (See attached file)
- Architecture Doc: architecture.md
- Runnable Code: Full repo committed.
- Evaluation Transcript: evaluation_transcript.md