Multimodal RAG Pipeline

Text–Image Retrieval-Augmented Generation System

A production-oriented multimodal RAG pipeline enabling semantic search and question answering over documents and images by combining vector retrieval, distributed backend services, and multimodal LLMs.

Built with FastAPI, PostgreSQL + pgvector, Amazon S3, Dockerized services, and Angular + React micro-frontends.

Key Capabilities

Micro-frontend UI: Angular + React composed via single-spa, routed through JWT-secured gateways
Backend services: FastAPI REST APIs for ingestion, retrieval, and query orchestration
Vector retrieval:
- CLIP (HuggingFace) embeddings for text and images
- pgvector-backed similarity search with ANN indexes (IVF / HNSW)
- OCR-based enrichment for screenshots and diagrams
Object storage: Images stored in Amazon S3, referenced via metadata and embeddings
RAG orchestration: Retrieved image URLs, OCR text, and metadata composed into grounded LLM prompts

End-to-End Flow

Ingest – Upload documents/images → store binaries in S3, metadata in PostgreSQL
Embed – Generate CLIP embeddings; optionally enrich with OCR/captions
Index – Persist vectors in pgvector with ANN indexing
Retrieve – Embed query → ANN search → hybrid retrieval over images + text
Compose – Assemble multimodal or text-only context
Generate – LLM responds with citations grounded in retrieved data

Tech Stack

Frontend: Angular, React, micro-frontends (single-spa) Backend: FastAPI, REST APIs, JWT auth Data: PostgreSQL, pgvector, Amazon S3 Search: ANN (IVF / HNSW) ML: CLIP embeddings, OCR pipelines, multimodal LLMs (GPT-4o / GPT-4o-mini) Infra: Docker, docker-compose, Nginx

Running Locally

docker compose -f docker-compose.dev.yml up --build

Frontend root: http://localhost:9000
Backend API: http://localhost:8000

Secure User Provisioning

cd python-backend
CREATE_DEMO_USER=false DATABASE_URL=... JWT_SECRET=... \
python -m app.scripts.create_admin_user --username <admin> --password <password>

Passwords are hashed server-side and never logged.

Design Goals

Scalable retrieval over mixed media
Clean separation of storage, retrieval, and reasoning
Extensible foundation for future RAG and multimodal systems

Name		Name	Last commit message	Last commit date
Latest commit History 99 Commits
.husky		.husky
angular-app		angular-app
dashboard		dashboard
db-init		db-init
python-backend		python-backend
react-app		react-app
src		src
.dockerignore		.dockerignore
.eslintrc		.eslintrc
.gitignore		.gitignore
.prettierignore		.prettierignore
Dockerfile		Dockerfile
README.md		README.md
babel.config.json		babel.config.json
docker-compose.dev.yml		docker-compose.dev.yml
docker-compose.yml		docker-compose.yml
importmap.json		importmap.json
jest.config.js		jest.config.js
package-lock.json		package-lock.json
package.json		package.json
start-servers.sh		start-servers.sh
tsconfig.json		tsconfig.json
webpack.config.js		webpack.config.js

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Multimodal RAG Pipeline

Key Capabilities

End-to-End Flow

Tech Stack

Running Locally

Secure User Provisioning

Design Goals

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Multimodal RAG Pipeline

Key Capabilities

End-to-End Flow

Tech Stack

Running Locally

Secure User Provisioning

Design Goals

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages