bhavya-x/LegalLens

⚖️ LegalLens

AI-powered legal document analysis — upload a contract, get summaries, and ask questions in your language.

Python Flask Pinecone Haystack MongoDB


Overview

LegalLens makes legal documents understandable. Upload a PDF, scanned image, or text file and the system extracts text, translates if needed, summarises long passages, and indexes the content for retrieval. You can then chat with the document in your preferred language and get structured, source-backed answers.

Pipeline

Upload → OCR / pdfplumber → Language detect & translate
       → BigBird-Pegasus summarisation
       → Embeddings → Pinecone (retrieval)
       → User question → m2m100 translate
       → Haystack EmbeddingRetriever → Pinecone
       → Seq2SeqGenerator (BART LFQA) → Answer
       → MongoDB chat history → translate back to user language
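The question-answering half of the flow above can be sketched as plain function composition. This is a minimal illustration, not the code in server.py: the names `answer_question`, `translate`, `retrieve`, and `generate` are hypothetical, English is assumed as the working language, and the stages are passed in as callables so the toy stand-ins can later be swapped for real m2m100, Pinecone, and BART LFQA calls.

```python
from typing import Callable, Dict, List


def answer_question(
    question: str,
    user_lang: str,
    translate: Callable[[str, str, str], str],   # (text, src_lang, tgt_lang) -> text
    retrieve: Callable[[str], List[str]],        # working-language query -> passages
    generate: Callable[[str, List[str]], str],   # (query, passages) -> answer
    history: List[Dict[str, str]],               # stand-in for the MongoDB chat log
) -> str:
    """Run one chat turn: translate in, retrieve, generate, log, translate out."""
    query_en = translate(question, user_lang, "en")   # assumption: English pivot
    passages = retrieve(query_en)
    answer_en = generate(query_en, passages)
    history.append({"question": query_en, "answer": answer_en})
    return translate(answer_en, "en", user_lang)


# Toy stand-ins (hypothetical) so the sketch runs without models or services.
def fake_translate(text: str, src: str, tgt: str) -> str:
    return text if src == tgt else f"[{src}->{tgt}] {text}"


def fake_retrieve(query: str) -> List[str]:
    return ["Clause 4: either party may terminate with 30 days notice."]


def fake_generate(query: str, passages: List[str]) -> str:
    return f"Based on {len(passages)} passage(s): {passages[0]}"


chat_log: List[Dict[str, str]] = []
reply = answer_question(
    "¿Cómo se termina el contrato?", "es",
    fake_translate, fake_retrieve, fake_generate, chat_log,
)
```

Keeping each stage behind a callable makes it possible to test the ordering of the steps without loading any model weights.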

Features

| Capability | Implementation |
| --- | --- |
| Document ingestion | PDFs, scanned images, plain text |
| OCR | Tesseract.js for scanned legal docs |
| Translation | Facebook m2m100 (multi-direction) |
| Summarisation | BigBird-Pegasus with chunking |
| Vector search | Pinecone (us-west4-gcp-free, cosine, 768d) |
| QA generation | Haystack Seq2SeqGenerator + vblagoje/bart_lfqa |
| Context retention | MongoDB conversation history |
| UI | React/Vue web frontend served via Flask |
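
Long contracts can exceed what a summarisation model accepts in one pass, which is why chunking is listed next to BigBird-Pegasus. A minimal sketch of the idea, assuming word-based windows with a small overlap; the function name `chunk_words` and the default sizes are illustrative, and the repository may split on tokens or sentences instead.

```python
from typing import List


def chunk_words(text: str, max_words: int = 512, overlap: int = 32) -> List[str]:
    """Split text into windows of at most max_words words.

    Consecutive windows share `overlap` words so that some context
    around each boundary is present in both neighbouring chunks.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in [0, max_words)")

    words = text.split()
    step = max_words - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reaches the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
parts = chunk_words(sample, max_words=4, overlap=1)
# parts == ["w0 w1 w2 w3", "w3 w4 w5 w6", "w6 w7 w8 w9"]
```

Each chunk would then be summarised on its own and the partial summaries joined, which is one plausible reading of "with chunking" rather than a confirmed description of the project's code.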

Tech Stack

  • Backend: Flask + Haystack + HuggingFace Transformers
  • NLP models: BERT tokeniser (length checks), BART LFQA (generation), m2m100 (translation), BigBird-Pegasus (summarisation), flax-sentence-embeddings/all_datasets_v3_mpnet-base (embeddings)
  • Storage: Pinecone (vectors), MongoDB (chat history)
  • Frontend: Web app under webapp/
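
Retrieval ranks stored passage vectors by cosine similarity to the question vector, the metric listed for the Pinecone index under Features. The sketch below shows that ranking in plain Python on tiny 3-dimensional vectors instead of 768-dimensional embeddings. The names `cosine` and `top_k` and the sample data are illustrative; in the running system Pinecone performs this search.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    index: Dict[str, Sequence[float]],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Return the k (id, score) pairs most similar to the query, best first."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical passage ids with toy embeddings.
index = {
    "termination": [1.0, 0.0, 0.0],
    "payment": [0.0, 1.0, 0.0],
    "liability": [0.7, 0.7, 0.0],
}
hits = top_k([0.9, 0.1, 0.0], index, k=2)
```

Here the query points mostly along the first axis, so "termination" ranks first and "liability" second.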

Getting Started

git clone https://github.com/bhavya-x/LegalLens.git
cd LegalLens
pip install -r requirements.txt   # if present
# Configure Pinecone + MongoDB credentials in includes/dependencies.py
python server.py
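
The steps above place Pinecone and MongoDB credentials in includes/dependencies.py. One common alternative, sketched here as an assumption rather than the project's actual setup, is to read them from environment variables so secrets stay out of version control. The variable names (`PINECONE_API_KEY`, `PINECONE_ENVIRONMENT`, `MONGODB_URI`), the `Settings` class, and `load_settings` are hypothetical; the default environment is the one listed under Features.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    pinecone_api_key: str
    pinecone_environment: str
    mongodb_uri: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from a mapping, failing early if a secret is missing."""
    required = ("PINECONE_API_KEY", "MONGODB_URI")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))
    return Settings(
        pinecone_api_key=env["PINECONE_API_KEY"],
        # Assumption: fall back to the environment named in the Features table.
        pinecone_environment=env.get("PINECONE_ENVIRONMENT", "us-west4-gcp-free"),
        mongodb_uri=env["MONGODB_URI"],
    )


# Demo with an explicit mapping so the example does not depend on the shell.
demo = load_settings({
    "PINECONE_API_KEY": "demo-key",
    "MONGODB_URI": "mongodb://localhost:27017",
})
```

Calling `load_settings()` with no argument would read the real process environment instead.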

Repository Layout

LegalLens/
├── server.py              # Flask app + Haystack pipeline
├── includes/              # Shared dependencies and helpers
├── data/                  # Sample legal datasets
├── models/                # Model artefacts / configs
├── webapp/                # Frontend
├── plots/                 # Architecture diagrams
└── readme.md              # Original project notes

Screenshots

[Screenshots: Landing, Summary, Chat]
