LocalChat - Professional RAG Application

A production-ready Retrieval-Augmented Generation (RAG) application built with Flask, Ollama, PostgreSQL (pgvector), and Redis. Features comprehensive document processing, PDF table extraction, intelligent chunking, streaming responses, and accurate context-based answers.

Project Status

Current State: Production Ready | Last Updated: March 2026

See the Architecture and Project Structure sections below for a full overview.

Features

Core Capabilities

Document Processing: PDF, DOCX, TXT, Markdown with advanced table extraction
RAG Pipeline: Intelligent retrieval with hybrid search (semantic + BM25)
Chat Interface: Real-time streaming responses with document context
Enhanced Web Search: Optional live DuckDuckGo integration for up-to-date answers
Persistent Memory: Conversation history stored in PostgreSQL
Vector Search: Lightning-fast similarity search using pgvector HNSW
Table Extraction: Advanced PDF table detection and preservation
Duplicate Prevention: Smart document fingerprinting
Input Validation: Pydantic models with comprehensive sanitization
Caching Layer: Redis/Memory cache for embeddings and queries
Streaming Responses: Server-Sent Events for real-time feedback
Security: Rate limiting, CORS support, JWT authentication, XSS-safe frontend
GPU Acceleration: Automatic NVIDIA/AMD GPU detection; configurable multi-GPU layer offload via OLLAMA_NUM_GPU
Observability: Prometheus metrics endpoint, request timing middleware, detailed health checks, admin dashboard

Quality Assurance

1191 Tests: Unit, integration, and comprehensive test suites
Type Safety: Full type hints across codebase
Modular Architecture: Clean separation of concerns
CI/CD Ready: GitHub Actions configuration
Error Handling: Professional exception system with context preservation

Security

XSS Prevention: DOM-based conversation rendering (no innerHTML with user data)
Rate Limiting: Configurable per-endpoint via Flask-Limiter
CORS Support: Configurable allowed origins
JWT Authentication: Token-based auth for admin endpoints
Input Sanitization: Pydantic validation + server-side sanitization on all inputs
Secret Scanning: No credentials in source; placeholder examples only

Performance Features

Hybrid Search: Combines semantic similarity with BM25 keyword matching
Multi-level Caching:
- Embedding cache (5000 capacity)
- Query cache (1000 capacity)
- Configurable TTL
Efficient Indexing: HNSW for fast approximate nearest neighbor search
Smart Chunking: Context-aware with table preservation
Reranking: Multi-signal fusion for improved relevance
GPU Acceleration: Multi-GPU support via OLLAMA_NUM_GPU; NVIDIA/AMD auto-detection
Request Timing: X-Request-Duration header + Prometheus histogram on every response
TTL-Cached Subprocess Calls: nvidia-smi/rocm-smi results cached 30 s; Ollama /api/ps cached 5 s

Quick Start

# 1. Clone repository
git clone https://github.com/jwvanderstam/LocalChat
cd LocalChat

# 2. Install dependencies
pip install -r requirements.txt

# 3. Set up PostgreSQL with pgvector
# See Configuration section below for details

# 4. (Optional) Start Redis for caching
redis-server
# Or use memory cache (default)

# 5. Start Ollama
ollama serve

# 6. Run application
python app.py

# 7. Open browser
# http://localhost:5000

Usage

Once running, open your browser at http://localhost:5000.

Chat tab — ask questions; toggle RAG Mode to ground answers in uploaded documents, Enhanced to additionally query the web via DuckDuckGo.
Documents tab — upload PDF, DOCX, TXT, or Markdown files and test retrieval.
Models tab — select the active Ollama model.
API — all endpoints are documented in the interactive Swagger UI at /api/docs/.

Architecture

System Components

+---------------------------------------------------------------+
|                     LocalChat RAG System                      |
+---------------------------------------------------------------+
|                                                               |
|  +------------+    +------------+    +------------+           |
|  |  Web UI    |--->| Flask API  |--->|  Services  |           |
|  | (Browser)  |--->|  (Routes)  |--->|   Layer    |           |
|  +------------+    +------------+    +------------+           |
|                          |                |                   |
|                          |                |                   |
|  +----------------------------------------------------+      |
|  |              Application Core                      |      |
|  +----------------------------------------------------+      |
|  |                                                    |      |
|  |  +------------+  +------------+  +------------+    |      |
|  |  | RAG Engine |  |   Cache    |  |  Security  |    |      |
|  |  |  - Hybrid  |  | - Redis    |  | - Rate     |    |      |
|  |  |    Search  |  | - Memory   |  |   Limit    |    |      |
|  |  |  - Rerank  |  | - TTL      |  | - CORS     |    |      |
|  |  +------------+  +------------+  +------------+    |      |
|  |                                                    |      |
|  |  +------------+  +------------+  +------------+    |      |
|  |  | Document   |  |   Ollama   |  | Monitoring |    |      |
|  |  | Processor  |  |   Client   |  | - Metrics  |    |      |
|  |  | - Extract  |  | - LLM      |  | - Health   |    |      |
|  |  | - Chunk    |  | - Embed    |  | - Logs     |    |      |
|  |  +------------+  +------------+  +------------+    |      |
|  |                                                    |      |
|  +----------------------------------------------------+      |
|                          |                |                   |
|                          |                |                   |
|  +------------+    +------------+    +------------+           |
|  | PostgreSQL |    |   Ollama   |    |   Redis    |           |
|  | + pgvector |    |  (LLM API) |    | (Optional) |           |
|  | - Documents|    | - Embeddings|   | - Caching  |           |
|  | - Chunks   |    | - Generation|   | - Sessions |           |
|  | - Vectors  |    +------------+    +------------+           |
|  +------------+                                               |
|                                                               |
+---------------------------------------------------------------+

Data Flow

Document Upload:
  Upload -> Validate -> Extract Text -> Detect Tables ->
  Smart Chunk -> Generate Embeddings -> Store in DB ->
  Update Cache

RAG Query:
  Query -> Cache Check -> Generate Query Embedding ->
  Hybrid Search (Semantic + BM25) -> Retrieve Chunks ->
  Rerank Results -> Format Context -> LLM Generation ->
  Stream Response -> Cache Result

Cache Strategy:
  - Embedding Cache: 7 days TTL, 5000 capacity
  - Query Cache: 1 hour TTL, 1000 capacity
  - LRU eviction for memory cache
  - Redis fallback to memory cache

Technology Stack

Layer	Technology	Purpose
Frontend	HTML, CSS, JavaScript	Web interface
Backend	Flask 3.1	Web framework
Database	PostgreSQL 15+	Document storage
Vector DB	pgvector	Similarity search
Cache	Redis / Memory	Performance optimization
LLM	Ollama	Local inference
Embeddings	nomic-embed-text	Vector generation
GPU	NVIDIA (nvidia-smi) / AMD (rocm-smi)	Hardware acceleration
Metrics	Prometheus text format v0.0.4	Observability
Validation	Pydantic 2.12	Input validation
Testing	pytest	Test framework

Documentation

All documentation lives in-code with comprehensive docstrings and type hints.

Key Entry Points

app.py — Application entry point
src/app_factory.py — Flask app factory with blueprint registration
src/monitoring.py — Prometheus metrics, request timing, health checks (/api/metrics, /api/health)
src/ollama_client.py — Ollama LLM/embedding client with GPU detection and TTL caching
src/routes/admin_routes.py — Admin dashboard with GPU stats and loaded-model breakdown
src/rag/web_search.py — DuckDuckGo web search provider (Enhanced mode)
src/security.py — Rate limiting, CORS, JWT authentication
src/config.py — All configuration (env vars, RAG tuning, GPU settings, cache settings)
config/.env.example — Environment variable template

API Documentation

Interactive Swagger UI available at /api/docs/ when the app is running
Configured in src/api_docs.py

Project Structure

LocalChat/
├── app.py                      # Entry point
├── requirements.txt            # Python dependencies
├── config/
│   └── .env.example            # Environment variable template
├── src/                        # Application source code
│   ├── app_factory.py          # Flask app factory (entry: create_app)
│   ├── config.py               # Configuration (env vars, RAG settings)
│   ├── db/                     # PostgreSQL + pgvector database layer
│   │   ├── __init__.py         # Package: Database class + db singleton
│   │   ├── connection.py       # Connection pool, pgvector adapters, schema
│   │   ├── documents.py        # Document & chunk CRUD + vector search
│   │   └── conversations.py    # Conversation & message persistence
│   ├── exceptions.py           # Custom exception hierarchy
│   ├── models.py               # Pydantic request/response models
│   ├── monitoring.py           # Metrics, health checks, decorators
│   ├── ollama_client.py        # Ollama LLM/embedding client
│   ├── security.py             # Rate limiting, CORS, JWT
│   ├── api_docs.py             # Swagger/OpenAPI configuration
│   ├── types.py                # Type definitions
│   ├── cache/                  # Caching layer
│   │   ├── __init__.py         # Factory + re-exports
│   │   ├── managers.py         # Cache manager (embedding, query)
│   │   └── backends/
│   │       ├── base.py         # CacheStats + CacheBackend ABC
│   │       ├── memory.py       # In-memory LRU cache (OrderedDict)
│   │       ├── redis_cache.py  # Redis-backed distributed cache
│   │       └── database_cache.py # PostgreSQL-backed cache
│   ├── performance/
│   │   └── batch_processor.py  # Batch embedding processor
│   ├── rag/                    # RAG pipeline
│   │   ├── cache.py            # Embedding/query cache
│   │   ├── chunking.py         # Smart text chunking
│   │   ├── loaders.py          # PDF/DOCX/TXT file loaders
│   │   ├── processor.py        # Document ingestion orchestrator
│   │   ├── retrieval.py        # Hybrid search (semantic + BM25)
│   │   ├── scoring.py          # Result reranking & fusion
│   │   └── web_search.py       # DuckDuckGo web search provider
│   ├── routes/                 # API endpoints
│   │   ├── admin_routes.py     # Admin dashboard (/admin, /api/admin/stats)
│   │   ├── api_routes.py       # Chat API (/api/chat)
│   │   ├── document_routes.py  # Document management (/api/documents)
│   │   ├── error_handlers.py   # Global error handlers
│   │   ├── memory_routes.py    # Memory/conversation routes
│   │   ├── model_routes.py     # Ollama model management
│   │   └── web_routes.py       # HTML page routes
│   ├── tools/                  # Tool/function calling
│   │   ├── builtin.py          # Built-in tools
│   │   ├── executor.py         # Tool execution engine
│   │   ├── plugin_loader.py    # Plugin discovery & dynamic loading
│   │   └── registry.py         # Tool registration
│   └── utils/
│       ├── logging_config.py   # Structured logging setup
│       └── sanitization.py     # Input sanitization & validation
├── static/                     # Frontend assets
│   ├── css/
│   │   ├── style.css               # Application styles
│   │   ├── bootstrap.min.css       # Bootstrap 5.3.0 (self-hosted)
│   │   ├── bootstrap-icons.css     # Bootstrap Icons 1.10.0 (self-hosted)
│   │   └── fonts/                  # Bootstrap icon fonts (woff, woff2)
│   └── js/
│       ├── bootstrap.bundle.min.js # Bootstrap 5.3.0 JS (self-hosted)
│       ├── chat.js                 # Chat interface logic
│       └── ingestion.js            # Document upload logic
├── templates/                  # Jinja2 HTML templates
│   ├── admin.html              # Operator dashboard (GPU stats, metrics, cache)
│   ├── base.html
│   ├── chat.html
│   ├── documents.html
│   ├── models.html
│   └── overview.html
├── tests/                      # Test suite
│   ├── conftest.py             # Shared fixtures
│   ├── integration/            # Integration tests
│   ├── unit/                   # Unit tests
│   └── utils/                  # Test helpers & mocks
└── scripts/
    └── check_dependencies.py   # Dependency checker and auto-installer
plugins/                        # Drop-in tool plugins (auto-loaded at startup)
    ├── README.md               # Plugin authoring guide
    └── example_plugin.py       # Annotated starter template

Testing

Running Tests

# Run all tests
pytest

# Run with coverage
pytest --cov=src --cov-report=html

# Run specific category
pytest tests/unit/
pytest tests/integration/

# Run specific test file
pytest tests/unit/test_rag.py

# Run with verbose output
pytest -v

# Run tests in parallel (if pytest-xdist installed)
pytest -n auto

Test Coverage

# Generate coverage report
pytest --cov=src --cov-report=html

# View report
open htmlcov/index.html

# Or view in terminal
pytest --cov=src --cov-report=term

Current Test Stats

Unit Tests: tests/unit/ — 48 test modules covering all core components
Integration Tests: tests/integration/ — 4 modules covering all API route blueprints
Total: 1191 passing tests (9 integration failures require a live PostgreSQL instance)

Configuration

Environment Variables

Create a .env file in the root directory (copy from config/.env.example):

# Database Configuration
export PG_HOST=localhost
export PG_PORT=5432
export PG_USER=postgres
export PG_PASSWORD=your_password
export PG_DB=rag_db

# Ollama Configuration
export OLLAMA_BASE_URL=http://localhost:11434
export OLLAMA_DEFAULT_MODEL=llama3.2
export OLLAMA_EMBEDDING_MODEL=nomic-embed-text:latest
# GPU layer offload: -1 = all layers on GPU (default), 0 = CPU only
export OLLAMA_NUM_GPU=-1

# Redis Configuration (Optional)
export REDIS_ENABLED=False          # Set to True to enable Redis
export REDIS_HOST=localhost
export REDIS_PORT=6379
export REDIS_DB=0
export REDIS_PASSWORD=                # Leave empty if no password

# Flask Configuration
export SECRET_KEY=your_secret_key_here
export JWT_SECRET_KEY=your_jwt_secret_here
export ADMIN_PASSWORD=your_admin_password_here  # Required for /api/auth/login
export FLASK_ENV=production
export DEBUG=False

# Security Configuration
export RATELIMIT_ENABLED=True
export RATELIMIT_CHAT=10 per minute
export RATELIMIT_UPLOAD=5 per hour
export CORS_ENABLED=False
export CORS_ORIGINS=http://localhost:3000

# Observability
# Leave METRICS_TOKEN empty to allow unauthenticated Prometheus scraping
# (acceptable on a private network). Set a strong token in production.
export METRICS_TOKEN=

Cache Configuration

LocalChat supports two caching backends:

Memory Cache (Default)

Pros: No external dependencies, fast, simple setup
Cons: Lost on restart, limited capacity, single-process only
Best for: Development, testing, light loads

# Enable memory cache (default)
export REDIS_ENABLED=False

Redis Cache (Production)

Pros: Persistent, distributed, large capacity
Cons: Requires Redis server
Best for: Production, high load, multi-process deployments

# Enable Redis cache
export REDIS_ENABLED=True
export REDIS_HOST=localhost
export REDIS_PORT=6379
export REDIS_PASSWORD=your_password  # Optional

# Start Redis
redis-server

# Or with Docker
docker run -d -p 6379:6379 redis:alpine

RAG Configuration

Edit src/config.py to customize RAG behavior:

# Chunking Configuration
CHUNK_SIZE = 1024              # Characters per chunk (increased for better context)
CHUNK_OVERLAP = 200            # Overlap between chunks (20%)
TABLE_CHUNK_SIZE = 3000        # Larger chunks for tables

# Retrieval Configuration
TOP_K_RESULTS = 40             # Initial candidates
MIN_SIMILARITY_THRESHOLD = 0.28  # Minimum similarity score
RERANK_TOP_K = 12              # Final results after reranking

# Hybrid Search
HYBRID_SEARCH_ENABLED = True   # Combine semantic + keyword search
SEMANTIC_WEIGHT = 0.70         # Weight for semantic similarity
BM25_ENABLED = True            # Enable BM25 keyword matching

# LLM Configuration
DEFAULT_TEMPERATURE = 0.0      # Zero temperature for factual responses
MAX_CONTEXT_LENGTH = 20000     # Increased context window
STREAM_RESPONSES = True        # Enable streaming

# Cache Configuration
EMBEDDING_CACHE_SIZE = 5000    # Max cached embeddings
EMBEDDING_CACHE_ENABLED = True # Enable embedding cache
EMBEDDING_TTL = 604800         # 7 days
QUERY_TTL = 3600              # 1 hour

Performance Tuning

Database Optimization

# Connection Pool
DB_POOL_MIN_CONN = 2
DB_POOL_MAX_CONN = 10
DB_POOL_TIMEOUT = 5

# HNSW Index Parameters
# ef_search is computed dynamically as max(TOP_K_RESULTS * 2, 40)
DB_INDEX_TYPE = 'hnsw'        # Use HNSW for fast ANN search

Processing Configuration

# Parallel Processing
MAX_WORKERS = 8               # Concurrent threads
BATCH_SIZE = 32              # Embeddings batch size

# Table Extraction
KEEP_TABLES_INTACT = True     # Don't split tables across chunks
MIN_TABLE_ROWS = 3           # Minimum rows to detect as table

See src/config.py for all configuration options.

Monitoring & Observability

Prometheus Metrics

The application exposes a Prometheus-compatible scrape endpoint:

GET /api/metrics        — Prometheus text format v0.0.4
GET /api/metrics.json   — JSON metrics snapshot (used by admin dashboard)
GET /api/health         — Detailed component health check

Sample output from /api/metrics:

# TYPE http_requests_total counter
http_requests_total{method="GET",endpoint="health_check",status="200"} 42
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_count 42
http_request_duration_seconds_sum 1.234
http_request_duration_seconds_bucket{le="0.1"} 38
http_request_duration_seconds_bucket{le="+Inf"} 42
# TYPE app_uptime_seconds gauge
app_uptime_seconds 3600.5

Every response also carries an X-Request-Duration header (e.g. 0.042s).

Securing the Scrape Endpoint

Set METRICS_TOKEN in .env to require a Bearer token:

export METRICS_TOKEN=your_strong_token_here

Prometheus scrape config:

scrape_configs:
  - job_name: localchat
    static_configs:
      - targets: ['localhost:5000']
    bearer_token: your_strong_token_here

Leave METRICS_TOKEN empty for unauthenticated access (safe on a private network).

Health Check

GET /api/health

Returns 200 healthy, 200 degraded (Ollama down), or 503 unhealthy (database down):

{
  "status": "healthy",
  "timestamp": "2026-03-19T10:00:00.000000",
  "checks": {
    "database": { "status": "up", "healthy": true },
    "ollama":   { "status": "up", "healthy": true },
    "cache":    { "status": "up", "healthy": true, "stats": { "hits": 120, "misses": 5 } }
  }
}

Admin Dashboard

Navigate to /admin (JWT required in production; open in demo mode).

The dashboard surfaces:

GPU Hardware — per-physical-GPU cards: VRAM usage bar, utilisation %, temperature (refreshed every 30 s)
Loaded Models — per-model VRAM breakdown with GPU offload % (refreshed every 5 s)
Cache Stats — embedding cache and query cache hit rates
System Info — app version, active model, uptime, request count

GPU Acceleration

LocalChat automatically detects available GPUs:

Vendor	Tool	Detection
NVIDIA	`nvidia-smi`	Auto-detected if on `PATH`
AMD	`rocm-smi`	Auto-detected if on `PATH`

Control GPU layer offload in .env:

# -1 = all transformer layers on GPU (recommended when VRAM is sufficient)
#  0 = CPU-only inference
#  N = offload N layers to GPU
export OLLAMA_NUM_GPU=-1

The value is forwarded in options.num_gpu on every /api/chat and /api/embed request, so Ollama distributes work across all detected GPUs automatically when multiple GPUs are present.

Development

Setting Up Development Environment

# Install dependencies
pip install -r requirements.txt

# Install pre-commit hooks
pre-commit install

# Run code formatters
black src/ tests/
isort src/ tests/

# Run linters
pylint src/
mypy src/

Code Quality Standards

Type Hints: 100% (required)
Docstrings: Google-style (required)
Test Coverage: >=80% for new code
Linting: Pass pylint, mypy, black
Static Analysis: SonarCloud Quality Gate must pass
Documentation: Update relevant docs

CI/CD & Code Quality

Two GitHub Actions workflows run on every push and pull request to main:

Workflow	File	Purpose
Tests	`.github/workflows/tests.yml`	Runs all unit tests on Python 3.11
SonarCloud	`.github/workflows/sonarcloud.yml`	Runs unit tests with coverage, then uploads results to SonarCloud

SonarCloud

Static analysis and coverage tracking are handled by SonarCloud.

Project key: jwvanderstam_LocalChat
Organisation: jwvanderstam
Configuration: sonar-project.properties
Coverage source: coverage.xml produced by pytest --cov=src --cov-report=xml

Vendored third-party assets (static/css/bootstrap*.css, static/js/bootstrap*.js, static/css/fonts/) are excluded from analysis so they don't skew metrics.

To run the same coverage report locally that the SonarCloud workflow uses:

pytest tests/unit/ -v --tb=short --cov=src --cov-report=xml --cov-report=term-missing

The coverage.xml file is produced in the project root and is picked up automatically by the sonarcloud-github-action.

Development Workflow

Create feature branch
```
git checkout -b feature/your-feature
```

Write code and tests

# Add tests first (TDD)
pytest tests/unit/test_your_feature.py

Check code quality

black src/ tests/
pylint src/
pytest --cov

Commit and push

git add .
git commit -m "feat: your feature"
git push origin feature/your-feature

Create pull request

Changelog

May 2026

Security (CodeQL): Replaced str(e) in all API error responses with generic messages to prevent information exposure (CWE-209) — affects document_routes, api_routes, model_routes
Security (CodeQL): Fixed incomplete URL substring sanitization in web-search test assertion (CWE-20)
Security (CodeQL): Removed max_size from MemoryCache init log to cut CodeQL taint chain from Redis password kwargs (CWE-312)
Supply chain: Pinned python:3.12-slim Docker base image to SHA-256 digest in Dockerfile; updated k8s/deployment.yaml comment to use versioned image tags
CI hardening: Added explicit permissions: contents: read to both GitHub Actions workflows (tests.yml, sonarcloud.yml)
SonarCloud (S1172): Prefixed all unused monitoring-stub parameters (metric_name, labels) with _ across rag/__init__.py, chunking.py, loaders.py, processor.py, retrieval.py
Types: Added plugin_loader: Any to LocalChatApp type definition in src/types.py

March 2026

GPU support: Automatic NVIDIA/AMD GPU detection via nvidia-smi/rocm-smi; per-GPU VRAM, utilisation and temperature surfaced in the admin dashboard
Multi-GPU: OLLAMA_NUM_GPU config constant (default -1) forwarded in options.num_gpu on all /api/chat and /api/embed requests so Ollama distributes layers across all available GPUs
Performance: TTL caching on get_gpu_info() (30 s) and get_running_models() (5 s) — eliminates 2 s nvidia-smi cold-start penalty on every admin dashboard load; stale-on-error fallback keeps the models table visible during Ollama hiccups
Monitoring: Prometheus-compatible /api/metrics endpoint (text format v0.0.4); /api/metrics.json JSON snapshot; /api/health detailed component check; optional METRICS_TOKEN Bearer auth for the scrape endpoints
Observability: RequestTimingMiddleware adds X-Request-Duration header to every response; @timed decorator logs slow operations (>1 s) and records histogram; @counted decorator increments request counters
Admin dashboard: New GPU hardware cards row (per-GPU VRAM bar + utilisation bar + temperature); loaded models table with VRAM and GPU offload percentage
Embedding endpoint: Upgraded from legacy /api/embeddings to /api/embed (Ollama ≥ 0.1.32) with automatic fallback; embedding model name cached after first lookup
Connection pooling: All Ollama HTTP requests share a single requests.Session for TCP connection reuse
Tests: Suite expanded from 995 → 1191 passing tests; added TestGetGpuInfo (12 tests), TestMetricsEndpoints (8 tests), TestMetricsAuth (5 tests), TestComputeHealthStatus (5 tests), TestExportPrometheusMetrics (7 tests)

January 2026

Security: Fixed XSS vulnerability in conversation sidebar — replaced innerHTML template-literal interpolation with DOM API (createElement + addEventListener), eliminating injection surface for user-controlled conv.id and conv.title data
Security: Removed eyJ… JWT example placeholder from security.py docstring to suppress secret-scanner false positives
Bug fix: DatabaseUnavailableError now propagates as HTTP 503 from all document routes (/stats, /list, /search-text, /clear) instead of being swallowed as 500
Bug fix: Resolved "Working outside of application context" error on model-pull SSE stream — replaced cast(current_app) with current_app._get_current_object()
Bug fix: Ollama non-200 responses (e.g. 404 model-not-found) now raise RuntimeError instead of being forwarded as chat content; frontend handles data.error SSE events and shows them in the assistant bubble
Refactor: Reduced cognitive complexity in api_chat (43→~~8) and api_upload_documents (19→~~11) by extracting focused helper functions
Tests: Expanded unit suite from 970 → 995 passing tests (48 modules); fixed order-dependent fixture pollution caused by duplicate app() fixture calling create_app(testing=False)

Contributing

We welcome contributions!

Quick Contribution Guide

Fork the repository
Create a feature branch
Make your changes
Add tests
Update documentation
Submit a pull request

Code of Conduct

Be respectful and inclusive
Follow coding standards
Write clear commit messages
Add tests for new features
Update documentation

Project Status (Detailed)

Current Version: 0.5.0

Phase: Production Ready — Clean Architecture

Highlights:

src/db/ package split into connection, documents, conversations modules
src/cache/backends/ subpackage with base, memory, redis_cache, database_cache
Modular src/ package with clean separation of concerns
RAG pipeline split into dedicated modules (rag/cache, chunking, loaders, processor, retrieval, scoring)
Tool/function calling system (tools/builtin, executor, registry)
Flask blueprint architecture with typed routes
Comprehensive test suite: 628 passing tests (unit, integration, comprehensive)
Professional error handling with custom exception hierarchy
Swagger/OpenAPI docs at /api/docs/
Bootstrap 5 and Bootstrap Icons self-hosted (no CDN dependency)
SonarCloud static analysis integrated via GitHub Actions
Database connection pool resource-leak fixes (connection.py)

Troubleshooting

Common Issues

Issue: RAG not retrieving documents

# Check if documents are uploaded
curl http://localhost:5000/api/documents/stats

# Test retrieval
curl -X POST http://localhost:5000/api/documents/test \
  -H "Content-Type: application/json" \
  -d '{"query": "test"}'

Issue: Ollama connection failed

# Check Ollama is running
curl http://localhost:11434/api/tags

# Restart Ollama
ollama serve

Issue: Database connection error

# Check PostgreSQL is running
pg_isready

# Check pgvector extension
psql rag_db -c "SELECT * FROM pg_extension WHERE extname='vector';"

See src/config.py for database and connection pool settings.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Ollama for local LLM inference
pgvector for vector similarity search
Flask for web framework
Pydantic for data validation
pytest for testing framework

Support

Source Code: src/
Configuration: src/config.py
Issues: GitHub Issues

Roadmap

Star History

If you find this project useful, please consider giving it a star!

Made with care by the LocalChat Team

Professional RAG application for document-based question answering

Name		Name	Last commit message	Last commit date
Latest commit History 247 Commits
.github/workflows		.github/workflows
config		config
k8s		k8s
plugins		plugins
scripts		scripts
src		src
static		static
templates		templates
tests		tests
uploads		uploads
.coveragerc		.coveragerc
.gitignore		.gitignore
Dockerfile		Dockerfile
IaCplan.md		IaCplan.md
LICENSE		LICENSE
LocalChat.pyproj		LocalChat.pyproj
LocalChat.slnx		LocalChat.slnx
PrivateInfraPlan.md		PrivateInfraPlan.md
README.md		README.md
app.py		app.py
docker-compose.yml		docker-compose.yml
install.py		install.py
pyrightconfig.json		pyrightconfig.json
pytest.ini		pytest.ini
requirements.txt		requirements.txt
sonar-project.properties		sonar-project.properties
sonar_issues.json		sonar_issues.json

Folders and files

Latest commit

History

Repository files navigation

LocalChat - Professional RAG Application

Project Status

Features

Core Capabilities

Quality Assurance

Security

Performance Features

Table of Contents

Quick Start

Usage

Architecture

System Components

Data Flow

Technology Stack

Documentation

Key Entry Points

API Documentation

Project Structure

Testing

Running Tests

Test Coverage

Current Test Stats

Configuration

Environment Variables

Cache Configuration

Memory Cache (Default)

Redis Cache (Production)

RAG Configuration

Performance Tuning

Database Optimization

Processing Configuration

Monitoring & Observability

Prometheus Metrics

Securing the Scrape Endpoint

Health Check

Admin Dashboard

GPU Acceleration

Development

Setting Up Development Environment

Code Quality Standards

CI/CD & Code Quality

SonarCloud

Development Workflow

Changelog

May 2026

March 2026

January 2026

Contributing

Quick Contribution Guide

Code of Conduct

Project Status (Detailed)

Current Version: 0.5.0

Troubleshooting

Common Issues

License

Acknowledgments

Support

Roadmap

Star History

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages