AI-powered legal document analysis: upload a contract, get summaries, and ask questions in your language.
LegalLens makes legal documents understandable. Upload a PDF, scanned image, or text file and the system extracts text, translates if needed, summarises long passages, and indexes the content for retrieval. You can then chat with the document in your preferred language and get structured, source-backed answers.
Upload → OCR / pdfplumber → Language detect & translate
→ BigBird-Pegasus summarisation
→ Embeddings → Pinecone (retrieval)
→ User question → m2m100 translate
→ Haystack EmbeddingRetriever → Pinecone
→ Seq2SeqGenerator (BART LFQA) → Answer
→ MongoDB chat history → translate back to user language
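
The question-answering half of the flow above can be sketched as a single function. This is a minimal illustration, not the project's actual code: `answer_question`, the three callable types, and the default `doc_lang="en"` are assumptions made for the example, and the real translation, retrieval, and generation steps (m2m100, Haystack, BART LFQA) are represented by plain callables so the ordering is easy to follow.

```python
from typing import Callable, Dict, List

# Hypothetical signatures standing in for the real components.
Translate = Callable[[str, str, str], str]  # (text, src_lang, tgt_lang) -> text
Retrieve = Callable[[str, int], List[str]]  # (query, top_k) -> passages
Generate = Callable[[str, List[str]], str]  # (query, passages) -> answer


def answer_question(
    question: str,
    user_lang: str,
    translate: Translate,
    retrieve: Retrieve,
    generate: Generate,
    history: List[Dict[str, str]],
    doc_lang: str = "en",  # assumption: the index is built in English
    top_k: int = 3,
) -> str:
    """Run one chat turn: translate in, retrieve, generate, log, translate out."""
    # 1. Bring the question into the language the index was built in.
    if user_lang == doc_lang:
        query = question
    else:
        query = translate(question, user_lang, doc_lang)
    # 2. Fetch the most relevant passages from the vector store.
    passages = retrieve(query, top_k)
    # 3. Generate a long-form answer grounded in those passages.
    answer = generate(query, passages)
    # 4. Record the turn so later questions can reuse the context.
    history.append({"question": query, "answer": answer})
    # 5. Hand the answer back in the user's language.
    if user_lang == doc_lang:
        return answer
    return translate(answer, doc_lang, user_lang)
```

Passing the components in as callables keeps the sketch independent of any particular model or database client; in the real pipeline the `history` list would be a MongoDB collection.
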
| Capability | Implementation |
|---|---|
| Document ingestion | PDFs, scanned images, plain text |
| OCR | Tesseract.js for scanned legal docs |
| Translation | Facebook m2m100 (multi-direction) |
| Summarisation | BigBird-Pegasus with chunking |
| Vector search | Pinecone (us-west4-gcp-free, cosine, 768d) |
| QA generation | Haystack Seq2SeqGenerator + vblagoje/bart_lfqa |
| Context retention | MongoDB conversation history |
| UI | React/Vue web frontend served via Flask |
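
The "BigBird-Pegasus with chunking" row implies that long documents are split before summarisation. A minimal sketch of that idea follows, assuming overlapping word windows; the function names, the window sizes, and the use of whitespace word counts (rather than the BERT tokeniser the project uses for length) are illustrative assumptions, and `summarise` is a placeholder for the real model call.

```python
from typing import Callable, List


def chunk_words(text: str, max_words: int = 400, overlap: int = 50) -> List[str]:
    """Split text into overlapping windows of at most max_words words."""
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once this window has reached the end of the text.
        if start + max_words >= len(words):
            break
    return chunks


def summarise_long(
    text: str,
    summarise: Callable[[str], str],  # placeholder for the summarisation model
    max_words: int = 400,
    overlap: int = 50,
) -> str:
    """Summarise each chunk independently and join the partial summaries."""
    return " ".join(summarise(chunk) for chunk in chunk_words(text, max_words, overlap))
```

The overlap keeps a clause that straddles a chunk boundary visible to both neighbouring chunks, at the cost of summarising a few words twice.
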
- Backend: Flask + Haystack + HuggingFace Transformers
- NLP models: BERT (length tokeniser), BART LFQA (generation), m2m100 (translation), BigBird-Pegasus (summarisation), `flax-sentence-embeddings/all_datasets_v3_mpnet-base` (embeddings)
- Storage: Pinecone (vectors), MongoDB (chat history)
- Frontend: Web app under `webapp/`
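
The embeddings and vector storage listed above come together at query time as a cosine-similarity search. The sketch below shows what that ranking computes, in pure Python and over an in-memory dictionary; it is not how Pinecone is called, and `cosine`, `top_k`, and the toy index are names invented for the example.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query: Sequence[float],
    index: Dict[str, Sequence[float]],  # passage id -> embedding
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k passage ids most similar to the query, best first."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy 2-dimensional index; the project's embeddings are 768-dimensional.
    toy_index = {"clause-a": [1.0, 0.0], "clause-b": [0.0, 1.0], "clause-c": [1.0, 1.0]}
    for doc_id, score in top_k([1.0, 0.0], toy_index, k=2):
        print(doc_id, round(score, 3))
```

A managed vector store does the same ranking with an approximate index so it scales beyond a linear scan.
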
git clone https://github.com/bhavya-x/LegalLens.git
cd LegalLens
pip install -r requirements.txt # if present
# Configure Pinecone + MongoDB credentials in includes/dependencies.py
python server.py

LegalLens/
├── server.py # Flask app + Haystack pipeline
├── includes/ # Shared dependencies and helpers
├── data/ # Sample legal datasets
├── models/ # Model artefacts / configs
├── webapp/ # Frontend
├── plots/ # Architecture diagrams
└── readme.md # Original project notes
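
The setup step asks for Pinecone and MongoDB credentials to be configured in `includes/dependencies.py` but does not say how. One option, sketched here purely as an assumption, is to read them from environment variables so that secrets stay out of the repository. The variable names (`PINECONE_API_KEY`, `PINECONE_ENVIRONMENT`, `MONGODB_URI`), the `Settings` class, and `load_settings` are hypothetical; only the default environment `us-west4-gcp-free` comes from the capability table.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Connection settings for the two external services."""
    pinecone_api_key: str
    pinecone_environment: str
    mongodb_uri: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, failing fast on missing secrets."""
    required = ("PINECONE_API_KEY", "MONGODB_URI")  # hypothetical variable names
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return Settings(
        pinecone_api_key=env["PINECONE_API_KEY"],
        # Default taken from the capability table; override per deployment.
        pinecone_environment=env.get("PINECONE_ENVIRONMENT", "us-west4-gcp-free"),
        mongodb_uri=env["MONGODB_URI"],
    )
```

Calling `load_settings()` once at start-up and passing the result to the code that opens the Pinecone and MongoDB connections would surface a missing credential immediately rather than on the first request.
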
Screenshots: Landing, Summary, Chat.