DocChat — Chat with Your Documents 🤖📄

Upload any document. Ask anything. Get answers powered by 3 AI models simultaneously.

🔗 Live Demo: https://huggingface.co/spaces/loukikreddy22/docchat

🚀 What is DocChat?

DocChat is a full-stack RAG (Retrieval-Augmented Generation) application that lets you upload documents and ask questions about their contents. It sends each question to three LLMs in parallel, then has Llama merge the three answers into one final response.

✨ Features

📁 Upload PDF, DOCX, PPTX, TXT files
🧠 Multi-LLM fusion: Llama 3.3 + Gemini 2.0 Flash + Cohere Command-R
🌐 Web search augmentation via DuckDuckGo
🔍 Semantic search using FAISS vector store
💬 Session-based multi-file chat
🐳 Dockerized and deployed on HuggingFace Spaces

🏗️ System Architecture

User Uploads Document/File
(PDF / DOCX / PPTX / TXT)
                │
                ▼
      File Parsing Layer
      (Text Extraction)
                │
                ▼
      Text Chunking
      (500 Tokens, 50-Token Overlap)
                │
                ▼
      Embedding Model
      (Vector Generation)
                │
                ▼
      FAISS Vector Store
      (Semantic Indexing)
                │
                ▼
      User Query Input
                │
                ▼
      Semantic Retrieval
      (Top 4 Relevant Chunks)
                │
        ┌───────┴────────┐
        │                │
        ▼                ▼
Context Retrieval    Web Search
From Documents       External Sources
        │                │
        └───────┬────────┘
                │
                ▼
      Multi-LLM Layer
      • Llama
      • Gemini
      • Cohere
      (Parallel Inference)
                │
                ▼
      Response Fusion
      (Llama Aggregates Final Answer)
                │
                ▼
      Final AI Response
      (Returned to User)
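The chunking stage in the diagram is a sliding window: each chunk repeats the tail of the previous one so that a sentence cut at a boundary still appears whole in at least one chunk. A minimal sketch, assuming whitespace-separated words as a stand-in for tokens (DocChat's splitter may count characters or model tokens instead); `chunk_text` is a hypothetical name, not a function from `rag_pipeline.py`:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into windows of `chunk_size` words; each window repeats the
    last `overlap` words of the previous one. Words approximate tokens here."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("need 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # distance the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


doc = " ".join(f"w{i}" for i in range(1000))  # a 1,000-word toy document
chunks = chunk_text(doc)
print(len(chunks))                                        # 3
print(chunks[0].split()[-50:] == chunks[1].split()[:50])  # True
```

A production splitter usually also prefers to break at paragraph or sentence boundaries. LangChain's `RecursiveCharacterTextSplitter`, for example, takes `chunk_size` and `chunk_overlap` and measures length with `len()` (characters) unless given a different length function.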

⚙️ Workflow Summary

1. The user uploads a document (PDF, DOCX, PPTX, or TXT).
2. The system extracts and preprocesses the document text.
3. The text is split into overlapping chunks so that context spanning a chunk boundary is not lost.
4. Embeddings are generated for each chunk and stored in a FAISS vector index.
5. The user submits a question.
6. The most relevant chunks are retrieved by semantic similarity search.
7. The prompt context is assembled from two sources:
   - the retrieved chunks from the uploaded documents
   - DuckDuckGo web search results from external sources
8. Multiple LLMs (Llama, Gemini, and Cohere) generate responses in parallel.
9. Llama combines and refines the three outputs into a single answer.
10. The final response is returned to the user.
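The last two workflow steps are a fan-out followed by a merge. The sketch below shows that shape with Python's `asyncio`. It is an illustration under stated assumptions, not DocChat's code: the model callables are hypothetical async stubs standing in for the Groq, Gemini, and Cohere clients, and the prompt wording is made up. One design point worth noting: `return_exceptions=True` lets a failed or timed-out provider drop out instead of failing the whole request.

```python
import asyncio
from typing import Awaitable, Callable

# A "model" here is any async function that maps a prompt to an answer string.
Model = Callable[[str], Awaitable[str]]


def build_prompt(question: str, context: list[str]) -> str:
    """Put retrieved document chunks and web snippets ahead of the question."""
    joined = "\n\n".join(context)
    return f"Context:\n{joined}\n\nQuestion: {question}\nAnswer using only the context above."


async def fused_answer(question: str, context: list[str],
                       models: dict[str, Model], aggregator: Model) -> str:
    prompt = build_prompt(question, context)
    names = list(models)
    # Fan-out: run every model call concurrently; exceptions come back as results.
    results = await asyncio.gather(*(models[name](prompt) for name in names),
                                   return_exceptions=True)
    drafts = [f"[{name}]\n{result}" for name, result in zip(names, results)
              if not isinstance(result, BaseException)]
    if not drafts:
        raise RuntimeError("every model call failed")
    # Fusion: one more call asks the aggregator (Llama, per the diagram) to merge the drafts.
    fusion_prompt = (f"Question: {question}\n\nCandidate answers:\n\n"
                     + "\n\n".join(drafts)
                     + "\n\nMerge these into one accurate, concise answer.")
    return await aggregator(fusion_prompt)


# --- Demo with hypothetical stub models (no network calls) ---
async def stub_llama(prompt: str) -> str:
    return "Revenue grew 12% in Q3."

async def stub_gemini(prompt: str) -> str:
    return "Q3 revenue rose by about 12 percent."

async def stub_cohere(prompt: str) -> str:
    raise TimeoutError("simulated API timeout")

async def echo(prompt: str) -> str:
    return prompt  # stands in for Llama: returns the fusion prompt it would receive

# The Cohere draft is left out of the fusion prompt because its call failed.
print(asyncio.run(fused_answer(
    "How did revenue change in Q3?",
    ["Q3 revenue: $1.12M (Q2: $1.00M)."],
    {"llama": stub_llama, "gemini": stub_gemini, "cohere": stub_cohere},
    echo,
)))
```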

🛠️ Tech Stack

| Layer | Technology |
|---|---|
| Backend | FastAPI, Python |
| RAG Framework | LangChain |
| Vector Store | FAISS |
| Embeddings | sentence-transformers (all-MiniLM-L6-v2) |
| LLMs | Groq/Llama 3.3, Gemini 2.0 Flash, Cohere Command-R |
| Web Search | DuckDuckGo Search |
| Frontend | HTML, CSS, Vanilla JS |
| Containerization | Docker |
| Deployment | HuggingFace Spaces |
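The FAISS + all-MiniLM-L6-v2 pairing boils down to nearest-neighbour search over embedding vectors. As a dependency-free illustration (not DocChat's code), the sketch below ranks stored chunk vectors by cosine similarity and keeps the top 4, matching the retrieval step in the architecture diagram. With FAISS itself, roughly the same search is an `IndexFlatIP` over L2-normalized vectors queried with `index.search(query, 4)`, which returns distances and ids. The helper names and toy vectors are hypothetical.

```python
import math


def normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def top_k(query: list[float], index: list[list[float]], k: int = 4) -> list[tuple[int, float]]:
    """Brute-force search: return (chunk_id, cosine_similarity) pairs for the
    k stored vectors most similar to the query, best match first."""
    q = normalize(query)
    scored = [(i, sum(a * b for a, b in zip(q, normalize(v)))) for i, v in enumerate(index)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings"; real all-MiniLM-L6-v2 vectors have 384 dimensions.
chunk_vectors = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 0.0, 1.0],
    [0.7, 0.7, 0.0],
]
print([i for i, _ in top_k([1.0, 0.0, 0.0], chunk_vectors)])  # [0, 2, 4, 1]
```

Brute force is exact but scans every vector per query; a flat FAISS index does the same scan in optimized native code, which is usually fast enough at the scale of a few uploaded documents.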

🔧 Run Locally

# Clone repo
git clone https://github.com/mloukikreddy/docchat
cd docchat

# Create and activate a virtual environment
python -m venv venv
venv\Scripts\activate        # Windows
source venv/bin/activate     # macOS / Linux

# Install dependencies
pip install -r requirements.txt

# Create a .env file in the project root with the following contents:
#   GROQ_API_KEY=your_key
#   GEMINI_API_KEY=your_key
#   COHERE_API_KEY=your_key

# Run
uvicorn main:app --host 0.0.0.0 --port 8000

📁 Project Structure

docchat/
├── main.py           # FastAPI app, routes
├── rag_pipeline.py   # RAG logic, LLMs, FAISS
├── file_loader.py    # PDF/DOCX/PPTX/TXT parser
├── static/
│   └── index.html    # Frontend UI
├── Dockerfile        # Container config
├── requirements.txt
└── .env              # API keys (not committed)
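`file_loader.py` is described only as a PDF/DOCX/PPTX/TXT parser. One common way to structure such a module is a dispatch table keyed on file extension. The sketch below assumes the `pypdf`, `python-docx`, and `python-pptx` libraries, which may not be what DocChat actually uses; `load_text` and `LOADERS` are hypothetical names. Imports are deferred into each parser so that plain-text files work even when the other libraries are not installed.

```python
from pathlib import Path


def _load_txt(path: Path) -> str:
    return path.read_text(encoding="utf-8", errors="replace")


def _load_pdf(path: Path) -> str:
    from pypdf import PdfReader  # assumed dependency: pip install pypdf
    return "\n".join(page.extract_text() or "" for page in PdfReader(path).pages)


def _load_docx(path: Path) -> str:
    import docx  # assumed dependency: pip install python-docx
    return "\n".join(p.text for p in docx.Document(str(path)).paragraphs)


def _load_pptx(path: Path) -> str:
    from pptx import Presentation  # assumed dependency: pip install python-pptx
    texts = []
    for slide in Presentation(str(path)).slides:
        for shape in slide.shapes:
            if shape.has_text_frame:  # skip pictures, charts, and other non-text shapes
                texts.append(shape.text_frame.text)
    return "\n".join(texts)


# Route each supported extension to its parser.
LOADERS = {".txt": _load_txt, ".pdf": _load_pdf, ".docx": _load_docx, ".pptx": _load_pptx}


def load_text(filename: str) -> str:
    """Extract plain text from a supported file, choosing the parser by extension."""
    path = Path(filename)
    loader = LOADERS.get(path.suffix.lower())
    if loader is None:
        raise ValueError(f"Unsupported file type: {path.suffix or '(none)'}")
    return loader(path)
```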

🔑 Environment Variables

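| Variable | Description |
|---|---|
| `GROQ_API_KEY` | Groq API key (Llama 3.3); get one from console.groq.com |
| `GEMINI_API_KEY` | Google Gemini API key; get one from aistudio.google.com |
| `COHERE_API_KEY` | Cohere API key (Command-R); get one from cohere.com |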
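All three keys must be set before the app can call the model APIs, so a fail-fast check at startup gives a clearer error than a failed request later. The sketch below is a stdlib-only illustration with hypothetical helper names (`parse_env`, `load_env`, `missing_keys`); DocChat may well load its `.env` with python-dotenv instead.

```python
import os
from collections.abc import Mapping
from pathlib import Path

REQUIRED_KEYS = ("GROQ_API_KEY", "GEMINI_API_KEY", "COHERE_API_KEY")


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=value lines, skipping blanks, # comments, and malformed lines."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")  # drop surrounding quotes
    return values


def load_env(path: str = ".env") -> None:
    """Copy values from a .env file into os.environ without overriding existing ones."""
    env_file = Path(path)
    if env_file.is_file():
        for key, value in parse_env(env_file.read_text(encoding="utf-8")).items():
            os.environ.setdefault(key, value)


def missing_keys(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]


# Hypothetical usage at app startup:
#   load_env()
#   if missing := missing_keys():
#       raise SystemExit(f"Missing API keys: {', '.join(missing)}")
```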

👤 Author

Mekala Loukik Reddy — AI & Data Science Student, Anurag University
LinkedIn | GitHub
