ragged

A locally-hosted Retrieval Augmented Generation (RAG) system for processing and querying PDF documents.

Overview

This project creates a web-based interface for document analysis using:

Local LLM inference with quantized GGUF models
Document chunking and embedding using sentence transformers
Semantic search via k-NN retrieval
Interactive UI with Gradio

Usage

Start the application: python ui.py
Upload PDF documents and click "Process Documents"
Ask questions in the chat interface
View responses based on document content

Project Roadmap

Implementation Details

Tech Stack

LLM: DeepSeek-R1-Distill-Qwen-7B quantized via GGUF
Embedding Model: NovaSearch/stella_en_400M_v5 via SentenceTransformers
Document Processing: PyPDF for text extraction
Vector Search: Scikit-learn NearestNeighbors
Interface: Gradio web UI to upload documents and chat through chatbot interface
GPU Acceleration: Optional GPU offloading for LLM inference

Document Processing Pipeline

Parsing: Extract text from PDFs using PyPDF
Chunking: Split text into manageable chunks (currently 256 words)
Embedding: Generate vector representations using SentenceTransformer
Storage: Save embeddings and text chunks to numpy arrays

Retrieval and Generation

Query Embedding: Convert user query to vector representation
k-NN Search: Find most similar document chunks
Context Construction: Combine relevant chunks into prompt context
LLM Generation: Generate response with context-enhanced prompt

Development Log

Week - 2/24/2025

Built GUI using Gradio
- Implemented file upload and chat interface
- Created processing workflow for documents
- Updated embedding pipeline
  - 256-word chunks with SentenceTransformer (number of tokens probably higher)
  - Numpy storage for embeddings and text chunks

Week - 2/17/2025

Implemented PDF parsing + embedding + LLM prompting
Testing observations:
- Specific queries perform better than general ones
- R1 distilled models' thinking steps seem to improve responses
- 7B model with Q4/Q5 quantization offers good performance
- n_gpu_layers parameter controls GPU offloading
- Successfully tested on AMD with Vulkan

Technical Notes

RAG Implementation

Query-based search of precomputed document embeddings
Context chunks concatenated with query for LLM generation
Using STELLA model for high-quality embeddings

Text Processing Challenges

Current chunking can split sentences at hard boundaries
Potential improvements:
- Respecting paragraph/sentence boundaries
- Using sliding windows for better coverage
- Adding summarization of larger contexts

LLM Configuration

Using DeepSeek-R1-Distill-Qwen quantized models
Q4/Q5 quantization offers good performance/quality balance
Loaded via llama-cpp-python for efficient inference
Working with 65K context window

Embedding Strategy

Sentence-transformers for vector generation
k-NN for similarity search
Embedding stored in numpy arrays for simplicity
Future work: persistent vector storage

Requirements

Python 3.8+
llama-cpp-python (optionally compiled with GPU support)
SentenceTransformers
NumPy
Gradio
Torch

Future Directions

Support for more document types (DOCX, HTML, TXT)
Persistent vector database integration
Improved chunking with semantic boundaries
Reranking for better retrieval precision
User-selectable LLM models
Metadata-aware retrieval

Name		Name	Last commit message	Last commit date
Latest commit History 27 Commits
assets		assets
ragged		ragged
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
config.py		config.py
pyproject.toml		pyproject.toml
run.py		run.py
ui.py		ui.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ragged

Overview

Usage

Project Roadmap

Implementation Details

Tech Stack

Document Processing Pipeline

Retrieval and Generation

Development Log

Week - 2/24/2025

Week - 2/17/2025

Technical Notes

RAG Implementation

Text Processing Challenges

LLM Configuration

Embedding Strategy

Requirements

Future Directions

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

ragged

Overview

Usage

Project Roadmap

Implementation Details

Tech Stack

Document Processing Pipeline

Retrieval and Generation

Development Log

Week - 2/24/2025

Week - 2/17/2025

Technical Notes

RAG Implementation

Text Processing Challenges

LLM Configuration

Embedding Strategy

Requirements

Future Directions

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages