Voice-Based Native Language Service Agent 🇮🇳

A powerful, voice-first agentic AI designed to assist users specifically with Indian Government Welfare Schemes in their native language. It goes beyond a simple chatbot by acting as an intelligent agent that plans, executes tools, and manages conversation memory to guide users from inquiry to registration.

🚀 Capabilities

🗣️ Voice-First Interaction: Seamless Speech-to-Text (STT) and Text-to-Speech (TTS) pipeline.
🌏 Multi-Language Support: Native support for Hindi, Bengali, Marathi, Telugu, Tamil, and Odia.
🧠 Agentic Workflow: Uses a ReAct (Reason+Act) loop to "think" before speaking ([VICHAAR] logic).
🛠️ Actionable Tools:
- check_all_schemes: Auto-matches users based on Age, Income, and Occupation.
- register_user: Generates application IDs and manages a mock user database.
💾 Conversation Memory: Remembers user details across multiple turns (e.g., "I told you I am a farmer").

📂 Project Structure

├── app/
│   ├── agent.py       # Core Logic: LLM integration, ReAct loop, Memory
│   ├── tools.py       # Tool definitions (Scheme Search, Eligibility, Registration)
│   └── voice.py       # Speech Recognition & TTS wrappers
├── data/
│   ├── schemes.json   # Database of govt schemes (PM-KISAN, etc.)
│   └── users.json     # (Generated) registered users database
├── static/
│   └── index.html     # Frontend for Web Demo
├── main.py            # CLI Entry Point
├── server.py          # FastAPI Backend for Web Demo
├── architecture.md    # Detailed System Architecture & Diagrams
├── evaluation_transcript.md # Logs of testing scenarios
└── requirements.txt   # Python Dependencies

🛠️ Setup Instructions

Prerequisites

Python 3.9 or higher
Mac Users: Need portaudio for microphone access.
```
brew install portaudio
```

Installation

Clone the Repository

git clone <repository-url>
cd voice_scheme_agent

Install Dependencies
```
pip install -r requirements.txt
```
Environment Setup
- Create a .env file in the root directory.
- Add your Google Gemini API Key:
```
echo "GEMINI_API_KEY=your_api_key_here" > .env
```

💻 Usage

Option 1: Web Interface (Recommended) ✨

The web interface provides the best experience with visual feedback and browser-based speech handling.

Start the Server:
```
python server.py
```
Open Browser: Navigate to http://localhost:8000
Interact:
- Click "Start Agent".
- Tap the Microphone button and speak in your preferred language (e.g., "Namaste", "Kemon acho").

Option 2: CLI Mode (Terminal)

For quick testing without a browser.

Run the Script:
```
python main.py
```
Speak: The terminal will log "Listening..." and output the agent's voice response.

🧪 Example Workflow (Walkthrough)

Scenario: A 45-year-old farmer asking for help.

User: "नमस्ते, मुझे सरकारी योजनाओं के बारे में जानना है।" (Namaste, I want to know about schemes.)
Agent: "नमस्ते! आप किस प्रकार की योजना ढूंढ रहे हैं? क्या आप किसान हैं या छात्र?" (Namaste! ... Are you a farmer or student?)
User: "मैं एक किसान हूँ।" (I am a farmer.)
Agent: "जी, आपकी आयु और वार्षिक आय क्या है?" (Right, what is your age and income?)
User: "मेरी उम्र 45 साल है और आय 2 लाख है।"
Agent (Thinks): User provided Age=45, Income=200k, Occ=Farmer. Calling check_all_schemes.
Agent (Acts): Checks database... Found PM-KISAN.
Agent: "आप PM-KISAN Samman Nidhi के लिए पात्र हैं। क्या आप आवेदन करना चाहते हैं?"
User: "हाँ, कृपया आवेदन करें।"
Agent: Calls register_user... "आपका आवेदन सफल रहा! Application ID: APP-1234. नजदीकी केंद्र पर जाएं।"

🔧 Troubleshooting

OSError: [Errno -9996] Invalid input device: Check your default microphone settings in system preferences.
Browser STT Not working: Use Google Chrome or Edge. Ensure site permissions for Microphone are "Allow".
"Language not supported": The Web Demo tries to use the specific regional code (e.g., bn-IN). If your OS doesn't support it, it attempts a fallback.

📜 Deliverables Checklist

Demo Video: (See attached file)
Architecture Doc: architecture.md
Runnable Code: Full repo committed.
Evaluation Transcript: evaluation_transcript.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Voice-Based Native Language Service Agent 🇮🇳

🚀 Capabilities

📂 Project Structure

🛠️ Setup Instructions

Prerequisites

Installation

💻 Usage

Option 1: Web Interface (Recommended) ✨

Option 2: CLI Mode (Terminal)

🧪 Example Workflow (Walkthrough)

🔧 Troubleshooting

📜 Deliverables Checklist

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
app		app
data		data
static		static
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
architecture.md		architecture.md
evaluation_transcript.md		evaluation_transcript.md
main.py		main.py
requirements.txt		requirements.txt
server.py		server.py
setup.sh		setup.sh

Folders and files

Latest commit

History

Repository files navigation

Voice-Based Native Language Service Agent 🇮🇳

🚀 Capabilities

📂 Project Structure

🛠️ Setup Instructions

Prerequisites

Installation

💻 Usage

Option 1: Web Interface (Recommended) ✨

Option 2: CLI Mode (Terminal)

🧪 Example Workflow (Walkthrough)

🔧 Troubleshooting

📜 Deliverables Checklist

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages