GenAI Chat is an intelligent question-answering chatbot designed to help users interact with their data. Built on the Retrieval Augmented Generation (RAG) technique, it leverages the power of OpenAIβs large language models (LLMs) and Redis vector databases to provide accurate and context-aware answers to complex user queries.
Key Features:
- Advanced RAG pipeline: Standalone-query rewriting + intent classification β hybrid retrieval (vector + BM25) β Reciprocal Rank Fusion β cross-encoder reranking (
BAAI/bge-reranker-v2-m3). - Intent classification: Skips retrieval for chit-chat turns (greetings, thanks, meta) to save latency and tokens.
- Structure-aware chunking: Per-format splitters that preserve headings (PDF/Markdown/DOCX), schema (CSV/XLSX), and JSON structure.
- Dynamic knowledge retrieval from Redis Stack with HNSW or FLAT indexes.
- Multi-format support:
.txt,.md,.pdf,.docx,.json,.csv,.xlsx. - Multi-source indexing:
Local,Azure Blob,AWS S3.
Technology Used:
For more detailed explanation of this project, including its design and implementation, check out the accompanying Medium blog post.
- [23-05-2026]:
v1.3β Advanced RAG release:- Hybrid search (vector KNN + BM25) fused with Reciprocal Rank Fusion (k=60).
- Cross-encoder reranker
BAAI/bge-reranker-v2-m3. - Standalone-query rewriter + intent classifier (skips retrieval for chit-chat).
- Structure-aware chunking.
- [11-05-2025]: Added support for HNSW (Hierarchical Navigable Small World) Redis vector indexing and new improved user interface.
- [11-03-2025]: Added support for
.json,.csvand.xlsxfiles. - [19-01-2025]: Initial release of GenAI-Chat
v1.0.
The chatbot consists of these core components:
- Frontend: Takes user queries and sends them to the backend. It's built with HTML + JavaScript and is running in a Docker container with Nginx.
- Backend: Takes user queries, runs the advanced RAG pipeline (query rewrite + intent classify β hybrid retrieve β RRF β rerank), builds prompts, and sends them to the LLM. Built with Flask and runs in a Docker container.
- Redis Stack: Stores document text, embedding vectors and session data. Provides both vector KNN (HNSW/FLAT) and full-text BM25 search. Runs in a Docker container.
- OpenAI LLM:
gpt-5.5for answer generation,gpt-5.4-minifor cheap query rewriting + intent classification, andtext-embedding-3-large(3072-d) for embeddings. Configurable inbackend/config.py. - HuggingFace Inference API: Hosts the cross-encoder reranker
BAAI/bge-reranker-v2-m3.
flowchart TD
Q["π User question"] --> RW{{"βοΈ rewrite_query + classify intent<br/>(gpt-5.4-mini)"}}
RW -- "chit-chat (greeting / smalltalk / meta)" --> LLM
RW -- "first turn / QUERY_REWRITE_ENABLED=false" --> SQ["π Standalone query"]
RW -- "knowledge_query / followup" --> SQ
SQ --> VEC["π§ Vector KNN<br/>(HNSW or FLAT, top 20)"]
SQ --> BM["π€ BM25 full-text<br/>(RediSearch @content, top 20)"]
VEC --> FUSE{"Both sources?"}
BM -. "only if HYBRID_SEARCH_ENABLED" .-> FUSE
FUSE -- "yes (vector + BM25)" --> RRF["π Reciprocal Rank Fusion<br/>(k=60, top 20)"]
FUSE -- "no (vector only)" --> SINGLE["π₯ Preserve vector order<br/>(top 20)"]
RRF --> RR{{"π― Cross-encoder rerank<br/>(BAAI/bge-reranker-v2-m3, top 5)"}}
SINGLE --> RR
RR -- "RERANKER_ENABLED=true" --> TOP["β Top 5 docs"]
RR -- "RERANKER_ENABLED=false" --> TOP
TOP --> NB{{"β prev/next neighbor expansion"}}
NB -- "INCLUDE_NEIGHBORS=true" --> CTX["π Final context docs"]
NB -- "INCLUDE_NEIGHBORS=false" --> CTX
CTX --> LLM["π€ LLM answer (gpt-5.5)<br/>with [filename] citations"]
Follow below steps in either on Mac and Linux (Ubuntu) machine.
Step 1: Install Docker
Step 2: Clone the repository
$ git clone https://github.com/atinesh/GenAI-Chat.git
Step 3: Configure environment
$ cp .env.example .env
# then edit .env and fill in real values
Required:
OPENAI_API_KEYβ from OpenAI β Your profile β API keys.HF_TOKENβ from HuggingFace Settings β Access Tokens
Optional (only if you index from cloud storage):
AZURE_STORAGE_CONNECTION_STRINGβ for indexing Azure Blob containers.AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY/AWS_DEFAULT_REGIONβ for indexing S3 buckets.
Step 4: Choose your vector index type
Set INDEX_TYPE in .env (defaults to HNSW):
FLATβ best for small datasets (< 1M vectors) when accuracy matters more than latency.HNSWβ best for larger datasets when speed/scalability matters more than exact recall.
Note: For
HNSWyou can tuneEF_RUNTIMEin.env. Higher values increase recall but also latency. The default64is sized forVECTOR_TOP_K=20; raise it if you raise top-k.
Step 5: Build Images and Run Containers
$ cd GenAI-Chat
$ ./deploy.sh
Note: Redis persistence is handled by a named Docker volume (
redis-data), created automatically on first run.
Step 6: Index the data into Redis by following the instructions provided in the README.md file.
Step 7: Once indexing is complete, you can interact with the frontend by visiting http://localhost:8080/.
Note: RedisInsight can be accessed at http://localhost:8001/
In backend/config.py:
| Setting | Default | What it does |
|---|---|---|
INDEX_TYPE (env) |
HNSW |
Vector index type β HNSW or FLAT. Set in .env so backend & indexer stay in sync. |
EF_RUNTIME (env) |
64 |
HNSW query-time candidate pool. Must be >= VECTOR_TOP_K. Set in .env. |
VECTOR_TOP_K |
20 |
Candidates pulled from vector KNN. |
BM25_TOP_K |
20 |
Candidates pulled from BM25 full-text search. |
RRF_K |
60 |
RRF constant (canonical default). |
RRF_TOP_K |
20 |
Top-K after fusion (fed to reranker). |
RERANK_TOP_N |
5 |
Final docs sent to the LLM. |
INCLUDE_NEIGHBORS |
False |
If True, also fetch each top doc's prev_id/next_id chunks. |
HYBRID_SEARCH_ENABLED |
True |
Toggle BM25 leg of hybrid search. |
RERANKER_ENABLED |
True |
Toggle cross-encoder reranker. |
QUERY_REWRITE_ENABLED |
True |
Toggle standalone-query rewriter. |
INTENT_CLASSIFICATION_ENABLED |
True |
When True, the rewriter also classifies chit-chat (greetings / thanks / meta) and skips retrieval for those turns. |
MODEL |
gpt-5.5 |
Answering LLM. |
REASONING_EFFORT |
low |
Reasoning effort for the answering and rewriter LLMs (none, minimal, low, medium, high, xhigh) β affects latency and cost. |
QUERY_REWRITE_MODEL |
gpt-5.4-mini |
Cheap LLM used for query rewriting + intent classification. |
EMBEDDING_MODEL |
text-embedding-3-large |
Embedding model (3072-d). |
RERANKER_MODEL |
BAAI/bge-reranker-v2-m3 |
HF cross-encoder. |
SESSION_EXPIRATION |
900 |
Redis TTL (seconds) for stored conversation history. |
TOKEN_LIMIT |
5000 |
Max tokens kept in persisted conversation history. |
MAX_OUTPUT_TOKENS |
1500 |
LLM completion cap. |
If you found this repository helpful, please consider giving it a star β to show your support! It helps others discover the project and keeps me motivated to improve it further. If you'd like to support my work even more, consider buying me a coffee.
If you encounter any issues, please open an issue with detailed steps to reproduce the problem. Iβll look into it as soon as possible.
Iβm always looking to improve this project! If you have suggestions for new features or enhancements, feel free to submit a feature request.
Thank you for your support and contributions! π
This project is licensed under the GNU General Public License v3.0. See the LICENSE file for more details.

