PiyushInt/voice-native-scheme-agent

Voice-Based Native Language Service Agent 🇮🇳

A voice-first agentic AI that helps users find and apply for Indian Government Welfare Schemes in their native language. Rather than behaving like a simple chatbot, it acts as an agent that plans, calls tools, and keeps conversation memory to guide users from inquiry to registration.

🚀 Capabilities

  • 🗣️ Voice-First Interaction: Seamless Speech-to-Text (STT) and Text-to-Speech (TTS) pipeline.
  • 🌏 Multi-Language Support: Native support for Hindi, Bengali, Marathi, Telugu, Tamil, and Odia.
  • 🧠 Agentic Workflow: Uses a ReAct (Reason+Act) loop to "think" before speaking ([VICHAAR] logic).
  • 🛠️ Actionable Tools:
    • check_all_schemes: Matches users to schemes based on age, income, and occupation.
    • register_user: Generates application IDs and manages a mock user database.
  • 💾 Conversation Memory: Remembers user details across multiple turns (e.g., "I told you I am a farmer").
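As a rough illustration of how an eligibility matcher in the spirit of check_all_schemes might work, the sketch below filters scheme records by age, income, and occupation. The function name `match_schemes`, the record fields (`min_age`, `max_age`, `max_income`, `occupations`), and the sample values are all assumptions made for illustration. They are not the real eligibility rules, and the actual schema in data/schemes.json may differ.

```python
from typing import Optional

# Hypothetical records with made-up criteria; not the real scheme rules.
SAMPLE_SCHEMES = [
    {"name": "PM-KISAN", "min_age": 18, "max_age": None,
     "max_income": None, "occupations": ["farmer"]},
    {"name": "Example Student Scholarship", "min_age": 16, "max_age": 25,
     "max_income": 250_000, "occupations": ["student"]},
]


def within(value: int, low: Optional[int], high: Optional[int]) -> bool:
    """Return True if value lies in [low, high]; None means no bound."""
    if low is not None and value < low:
        return False
    if high is not None and value > high:
        return False
    return True


def match_schemes(age: int, income: int, occupation: str,
                  schemes: list = SAMPLE_SCHEMES) -> list:
    """Return the names of schemes whose (assumed) criteria the user meets."""
    matches = []
    for scheme in schemes:
        if not within(age, scheme["min_age"], scheme["max_age"]):
            continue
        if not within(income, None, scheme["max_income"]):
            continue
        if occupation.lower() not in scheme["occupations"]:
            continue
        matches.append(scheme["name"])
    return matches
```

Treating `None` as "no limit" keeps each record compact, since most schemes constrain only some of the three attributes.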

📂 Project Structure

├── app/
│   ├── agent.py       # Core Logic: LLM integration, ReAct loop, Memory
│   ├── tools.py       # Tool definitions (Scheme Search, Eligibility, Registration)
│   └── voice.py       # Speech Recognition & TTS wrappers
├── data/
│   ├── schemes.json   # Database of govt schemes (PM-KISAN, etc.)
│   └── users.json     # (Generated) registered users database
├── static/
│   └── index.html     # Frontend for Web Demo
├── main.py            # CLI Entry Point
├── server.py          # FastAPI Backend for Web Demo
├── architecture.md    # Detailed System Architecture & Diagrams
├── evaluation_transcript.md # Logs of testing scenarios
└── requirements.txt   # Python Dependencies

🛠️ Setup Instructions

Prerequisites

  • Python 3.9 or higher
  • macOS users: Install portaudio for microphone access.
    brew install portaudio

Installation

  1. Clone the Repository

    git clone <repository-url>
    cd voice-native-scheme-agent
  2. Install Dependencies

    pip install -r requirements.txt
  3. Environment Setup

    • Create a .env file in the root directory.
    • Add your Google Gemini API Key:
    echo "GEMINI_API_KEY=your_api_key_here" > .env
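The application presumably reads this key at startup. Below is a minimal sketch of one way to do that using only the standard library. The helper names `load_env_file` and `get_gemini_api_key` are hypothetical, and the project itself may rely on a package such as python-dotenv instead.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict:
    """Parse simple KEY=VALUE lines from a .env file into a dict."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything without an '=' sign.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_gemini_api_key(path: str = ".env") -> str:
    """Prefer the process environment, then fall back to the .env file."""
    key = os.environ.get("GEMINI_API_KEY") or load_env_file(path).get("GEMINI_API_KEY")
    if not key:
        raise RuntimeError("GEMINI_API_KEY is not set; add it to .env")
    return key
```

Failing early with a clear error is usually friendlier than letting the first API call fail with an authentication error.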

💻 Usage

Option 1: Web Interface (Recommended) ✨

The web interface provides the best experience with visual feedback and browser-based speech handling.

  1. Start the Server:
    python server.py
  2. Open Browser: Navigate to http://localhost:8000
  3. Interact:
    • Click "Start Agent".
    • Tap the Microphone button and speak in your preferred language (e.g., "Namaste", "Kemon acho").

Option 2: CLI Mode (Terminal)

For quick testing without a browser.

  1. Run the Script:
    python main.py
  2. Speak: The terminal will log "Listening..." and output the agent's voice response.

🧪 Example Workflow (Walkthrough)

Scenario: A 45-year-old farmer asking for help.

  1. User: "नमस्ते, मुझे सरकारी योजनाओं के बारे में जानना है।" (Namaste, I want to know about government schemes.)
  2. Agent: "नमस्ते! आप किस प्रकार की योजना ढूंढ रहे हैं? क्या आप किसान हैं या छात्र?" (Namaste! What kind of scheme are you looking for? Are you a farmer or a student?)
  3. User: "मैं एक किसान हूँ।" (I am a farmer.)
  4. Agent: "जी, आपकी आयु और वार्षिक आय क्या है?" (Okay, what are your age and annual income?)
  5. User: "मेरी उम्र 45 साल है और आय 2 लाख है।" (I am 45 years old and my income is 2 lakh.)
  6. Agent (Thinks): User provided Age=45, Income=200k, Occ=Farmer. Calling check_all_schemes.
  7. Agent (Acts): Checks database... Found PM-KISAN.
  8. Agent: "आप PM-KISAN Samman Nidhi के लिए पात्र हैं। क्या आप आवेदन करना चाहते हैं?" (You are eligible for PM-KISAN Samman Nidhi. Would you like to apply?)
  9. User: "हाँ, कृपया आवेदन करें।" (Yes, please apply.)
  10. Agent: Calls register_user... "आपका आवेदन सफल रहा! Application ID: APP-1234. नजदीकी केंद्र पर जाएं।" (Your application was successful! Application ID: APP-1234. Please visit your nearest center.)
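The walkthrough above follows a reason-then-act pattern: the agent gathers missing details, calls a tool once it has enough, and then replies. The sketch below mimics that control flow with a rule-based stand-in for the LLM. Everything here is an assumption for illustration: `decide_next_step`, `run_turn`, the memory keys, and the toy tool bodies are hypothetical and do not reflect the actual code in app/agent.py or app/tools.py.

```python
import random

REQUIRED_FIELDS = ("occupation", "age", "income")


def decide_next_step(memory: dict) -> tuple:
    """Stand-in for the LLM's reasoning step: choose the next action from memory."""
    for field in REQUIRED_FIELDS:
        if field not in memory:
            return ("ask", field)  # A detail is still missing: ask the user.
    if "eligible" not in memory:
        return ("call", "check_all_schemes")
    if memory.get("wants_to_apply") and "application_id" not in memory:
        return ("call", "register_user")
    return ("respond", None)


def check_all_schemes(memory: dict) -> list:
    """Toy eligibility tool that knows a single hard-coded rule."""
    return ["PM-KISAN"] if memory["occupation"] == "farmer" else []


def register_user(memory: dict) -> str:
    """Toy registration tool returning an ID in the APP-1234 style."""
    return f"APP-{random.randint(1000, 9999)}"


def run_turn(memory: dict, new_facts: dict) -> list:
    """Merge the user's new facts into memory, then act until a reply is due."""
    memory.update(new_facts)
    trace = []
    while True:
        action, target = decide_next_step(memory)
        trace.append((action, target))
        if action == "call" and target == "check_all_schemes":
            memory["eligible"] = check_all_schemes(memory)
        elif action == "call" and target == "register_user":
            memory["application_id"] = register_user(memory)
        else:
            return trace  # "ask" or "respond" ends the turn.
```

Because `memory` persists between calls to `run_turn`, a fact stated in one turn (such as the occupation) does not need to be repeated in later turns.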

🔧 Troubleshooting

  • OSError: [Errno -9996] Invalid input device: Check your default microphone settings in system preferences.
  • Browser STT not working: Use Google Chrome or Edge, and make sure the site's Microphone permission is set to "Allow".
  • "Language not supported": The Web Demo tries to use the specific regional code (e.g., bn-IN). If your OS doesn't support it, it attempts a fallback.
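The fallback behaviour mentioned in the last item could look roughly like the following. This is only a sketch: the function name `pick_language` and the fallback order (the requested regional code, then hi-IN, then en-IN) are assumptions, and the Web Demo's actual logic may differ.

```python
def pick_language(requested: str, supported: set,
                  fallbacks: tuple = ("hi-IN", "en-IN")) -> str:
    """Return the first usable language tag: the requested one, else a fallback."""
    for candidate in (requested, *fallbacks):
        if candidate in supported:
            return candidate
    raise ValueError(f"No supported language found for {requested!r}")
```

For example, `pick_language("bn-IN", {"hi-IN", "en-IN"})` would select `hi-IN` under this assumed ordering.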

📜 Deliverables Checklist

  • Demo Video: (See attached file)
  • Architecture Doc: architecture.md
  • Runnable Code: Full repo committed.
  • Evaluation Transcript: evaluation_transcript.md
