A sophisticated AI-powered invoice tracking system with OCR capabilities, intelligent categorization, and advanced RAG (Retrieval-Augmented Generation) analysis.
- 📄 OCR Processing: Extract data from invoice images using Google Vision API
- 🤖 AI Categorization: Automatic invoice categorization using Gemini AI
- 🔍 Advanced RAG: Chain-of-thought reasoning for intelligent spending analysis
- 📊 Vector Search: ChromaDB integration for semantic invoice search
- 🎨 Modern UI: Clean Gradio interface with multiple tabs
- 🔄 Duplicate Detection: Smart duplicate removal with Hebrew/multilingual support
- 📈 Analytics: Comprehensive spending insights and trends
- 💾 Database: Persistent storage with ChromaDB vector database
- Backend: Python 3.8+
- AI/ML: Google Gemini, Google Vision API
- Database: ChromaDB (Vector Database)
- Frontend: Gradio
- OCR: Google Cloud Vision API
- Environment: python-dotenv
- Clone the repository
  git clone https://github.com/yourusername/invoice-tracker.git
  cd invoice-tracker
- Install dependencies
  pip install -r requirements.txt
- Set up environment variables
  Create a .env file in the src folder:
  GEMINI_API_KEY=your_gemini_api_key_here
  GOOGLE_CLOUD_PROJECT=your_google_cloud_project_id
- Get API keys
  - Gemini API: Get from Google AI Studio
  - Google Cloud Vision: Set up at Google Cloud Console
- Start the application
  cd src
  python main.py
- Open your browser to the displayed URL (usually http://localhost:7860)
- Upload invoices via the "📄 Upload Invoices" tab
- Ask questions using the "🤖 AI Insights" tab with natural language:
  - "What's my total spending this month?"
  - "Which vendor do I spend the most on?"
  - "Show me my grocery expenses"
invoice-tracker/
├── src/
│ ├── main.py # Application entry point
│ ├── services/
│ │ ├── ocr_service.py # OCR processing
│ │ ├── categorization_service.py # AI categorization
│ │ ├── rag_service.py # RAG with chain-of-thought
│ │ └── chroma_service.py # ChromaDB operations
│ ├── ui/
│ │ └── gradio_interface.py # Web interface
│ ├── utils/
│ │ ├── invoice_processor.py # Invoice processing
│ │ └── cleanup_duplicates.py # Duplicate detection
│ ├── data/
│ │ ├── invoices.json # Invoice storage
│ │ └── chroma_db/ # Vector database
│ └── .env # Environment variables (not in repo)
├── requirements.txt # Dependencies
├── .gitignore # Git ignore rules
└── README.md # This file
- Extracts text from invoice images
- Identifies vendors, amounts, dates, invoice numbers
- Supports multiple file formats (PNG, JPG, PDF)
- Automatic spending category assignment
- Smart vendor recognition
- Context-aware categorization
- Chain-of-Thought Reasoning: Structured 4-step analysis
- Vector Similarity Search: Find relevant invoices using ChromaDB
- Enhanced Context: Combines statistics with query-specific data
- Natural Language Queries: Ask questions in plain English
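To make the vector similarity idea above concrete, here is a minimal, standard-library sketch of ranking invoices by cosine similarity to a query. It is not the project's ChromaDB code: the 3-dimensional embeddings, the invoice ids, and the `top_k` helper are all hypothetical, and a real setup would use embeddings produced by a model and stored in ChromaDB.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, invoice_vecs, k=2):
    """Return the ids of the k invoices whose vectors are most similar to the query."""
    scored = [
        (cosine_similarity(query_vec, vec), inv_id)
        for inv_id, vec in invoice_vecs.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [inv_id for _, inv_id in scored[:k]]


# Hypothetical toy embeddings; real ones would have hundreds of dimensions.
invoice_vecs = {
    "inv-001": [0.9, 0.1, 0.0],
    "inv-002": [0.1, 0.9, 0.0],
    "inv-003": [0.8, 0.2, 0.1],
}
query_vec = [1.0, 0.0, 0.0]

result = top_k(query_vec, invoice_vecs, k=2)
print(result)
```

The invoices returned this way would then be passed to the language model as query-specific context alongside the overall statistics.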
- Total spending analysis
- Category breakdowns with percentages
- Monthly/yearly trends
- Top vendor identification
- Spending pattern insights
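As an illustration of the analytics listed above, the following sketch computes a category breakdown with percentages and the top vendor from a list of invoice records. The record fields (`vendor`, `category`, `amount`) and the function names are assumptions made for this example; the actual schema in invoices.json may differ.

```python
from collections import defaultdict


def category_breakdown(invoices):
    """Map each category to (total amount, percentage of overall spending)."""
    totals = defaultdict(float)
    for inv in invoices:
        totals[inv["category"]] += inv["amount"]
    grand_total = sum(totals.values())
    return {
        cat: (amt, 100 * amt / grand_total if grand_total else 0.0)
        for cat, amt in totals.items()
    }


def top_vendor(invoices):
    """Return the vendor with the highest total spending, or None if there are no invoices."""
    totals = defaultdict(float)
    for inv in invoices:
        totals[inv["vendor"]] += inv["amount"]
    return max(totals, key=totals.get) if totals else None


# Hypothetical sample data for illustration only.
invoices = [
    {"vendor": "SuperMart", "category": "groceries", "amount": 50.0},
    {"vendor": "SuperMart", "category": "groceries", "amount": 25.0},
    {"vendor": "Cafe Roma", "category": "restaurants", "amount": 25.0},
]

breakdown = category_breakdown(invoices)
best_vendor = top_vendor(invoices)
print(breakdown, best_vendor)
```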
- Multi-criteria duplicate detection
- Hebrew/multilingual support
- Semantic vendor matching
- Date format normalization
- Invoice number matching
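The duplicate detection criteria above can be sketched roughly as follows. This is a simplified illustration, not the logic in cleanup_duplicates.py: it compares normalized fields exactly rather than semantically, and the field names, the accepted date formats, and the helper names are assumptions.

```python
import unicodedata
from datetime import datetime

# Assumed input date formats; extend as needed.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y")


def normalize_vendor(name):
    """Unicode-normalize, casefold, and collapse whitespace (works for Hebrew and Latin text)."""
    text = unicodedata.normalize("NFKC", name).casefold()
    return " ".join(text.split())


def normalize_date(raw):
    """Convert a date string in any assumed format to ISO 8601; leave it unchanged if unparseable."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return raw.strip()


def duplicate_key(inv):
    """Build a comparison key from vendor, date, invoice number, and amount."""
    return (
        normalize_vendor(inv["vendor"]),
        normalize_date(inv["date"]),
        str(inv["invoice_number"]).strip().lstrip("0"),
        round(float(inv["amount"]), 2),
    )


def remove_duplicates(invoices):
    """Keep the first invoice for each distinct key, preserving order."""
    seen = set()
    unique = []
    for inv in invoices:
        key = duplicate_key(inv)
        if key not in seen:
            seen.add(key)
            unique.append(inv)
    return unique


# Hypothetical sample: the first two records describe the same invoice.
invoices = [
    {"vendor": "שופרסל ", "date": "05/03/2024", "invoice_number": "00123", "amount": "120.50"},
    {"vendor": "שופרסל", "date": "2024-03-05", "invoice_number": "123", "amount": 120.5},
    {"vendor": "Cafe Roma", "date": "2024-03-06", "invoice_number": "77", "amount": 42.0},
]

unique = remove_duplicates(invoices)
print(len(unique))
```

Semantic vendor matching (for example, treating a Hebrew name and its transliteration as the same vendor) would need an embedding or fuzzy comparison on top of this exact-key approach.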
The RAG system follows a structured reasoning approach:
- Understanding the Question
  - Analyze what information is needed
  - Identify relevant time periods and categories
- Data Analysis Approach
  - Select relevant data points
  - Plan necessary calculations
- Calculations & Insights
  - Perform step-by-step analysis
  - Identify patterns and anomalies
- Final Answer
  - Provide clear, actionable insights
  - Include specific numbers and recommendations
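One way the four steps above could be turned into a prompt is sketched below. The wording of the prompt, the `build_prompt` function, and the sample context are hypothetical; the prompt actually used in rag_service.py is not shown in this README.

```python
# The four reasoning steps described above, each with short guidance for the model.
REASONING_STEPS = [
    ("Understanding the Question",
     "State what information is needed and which time periods and categories are relevant."),
    ("Data Analysis Approach",
     "Select the relevant data points and plan the necessary calculations."),
    ("Calculations & Insights",
     "Work through the analysis step by step and note patterns and anomalies."),
    ("Final Answer",
     "Give clear, actionable insights with specific numbers and recommendations."),
]


def build_prompt(question, context):
    """Assemble a chain-of-thought prompt from the user's question and retrieved context."""
    lines = [
        "You are analyzing a user's invoice data.",
        "",
        "Context:",
        context,
        "",
        f"Question: {question}",
        "",
        "Answer by working through these steps in order:",
    ]
    for number, (title, guidance) in enumerate(REASONING_STEPS, start=1):
        lines.append(f"{number}. {title}: {guidance}")
    return "\n".join(lines)


# Hypothetical context; in practice this would combine statistics and retrieved invoices.
prompt = build_prompt(
    "Which vendor do I spend the most on?",
    "SuperMart: 75.00 total (2 invoices)\nCafe Roma: 25.00 total (1 invoice)",
)
print(prompt)
```

The resulting string would be sent to the Gemini model; that call is omitted here because it depends on the client library version and API key setup.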
- Go to Google AI Studio
- Create a new API key
- Add to .env file as GEMINI_API_KEY
- Create project at Google Cloud Console
- Enable Vision API
- Create service account and download JSON key
- Set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of the downloaded JSON key
- Total Spending: "What's my total spending this year?"
- Category Analysis: "How much do I spend on groceries vs restaurants?"
- Vendor Analysis: "Which vendor do I spend the most money on?"
- Trends: "Show me my spending trends over the last 6 months"
- Specific Items: "Find all invoices for technology purchases"
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Google AI for Gemini API
- ChromaDB for vector database capabilities
- Gradio for the web interface
- Google Cloud Vision for OCR capabilities
- RAG not working: Check GEMINI_API_KEY in .env file
- OCR failing: Verify Google Cloud credentials
- ChromaDB errors: Ensure chromadb package is installed
- Import errors: Check Python path and package installations
If you encounter issues:
- Check the console output for error messages
- Verify all environment variables are set correctly
- Ensure all dependencies are installed
- Check API key permissions and quotas
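The first few checks above can be automated with a small preflight script. The sketch below uses only the standard library. The list of required variables and packages is an assumption based on the setup and tech stack described in this README, so adjust it to match your requirements.txt. Note that it reads the current process environment and does not load the .env file itself.

```python
import importlib.util
import os

# Assumed from this README; adjust to your setup.
REQUIRED_ENV_VARS = ("GEMINI_API_KEY", "GOOGLE_APPLICATION_CREDENTIALS")
# Import names: python-dotenv is imported as "dotenv".
REQUIRED_PACKAGES = ("chromadb", "gradio", "dotenv")


def missing_env_vars(env=None, names=REQUIRED_ENV_VARS):
    """Return the names of required environment variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


def missing_packages(names=REQUIRED_PACKAGES):
    """Return the top-level packages that cannot be found by the import system."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    for name in missing_env_vars():
        print(f"Missing environment variable: {name}")
    for name in missing_packages():
        print(f"Missing package: {name} (try: pip install -r requirements.txt)")
```

Running it before starting the application separates configuration problems from errors raised by the APIs themselves; it does not check API key permissions or quotas.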