This repository contains both the Flutter frontend and the FastAPI backend for a Clustering-Based Mathematical Information Retrieval (MathIR) system.
It supports LaTeX/MathML search, cross-platform rendering, and fast retrieval of mathematical documents.
- @ak0586
- @chirag0521
- @er-vishalgour
- @rajanks
- Search LaTeX or math expressions
- Render HTML with MathML on Android and Web
- WebView for Android, iframe + MathJax for Web
- Show response time and result count
- Clean UI with animation, loading indicators, and error handling
- Cluster-based approximate nearest neighbor (ANN) search
- MiniBatchKMeans with Hamming distance for binary bit-vectors
- Preprocessing of HTML to extract MathML & LaTeX
- Scalable and optimized for large datasets
We have recently optimized and refactored the system to improve performance, cross-platform compatibility, and deployment reliability:
- B2 Private Streaming: To prevent exposing public URLs and avoid CDN/WAF blocks, the backend now fetches HTML documents privately from Backblaze B2 or local storage and base64-encodes them.
- Multiplexed POST Route: Configured the client to retrieve documents via a `POST` request to `/search` using the `__VIEW__:{session_id}:{file_id}` query format. This bypasses Hugging Face Space proxy blocks on non-root paths and Backblaze B2's `400 POST not allowed` CORS restrictions on redirect requests.
- Shadowing Compiler Error Fixed: Resolved a critical compilation error in `mobile_html_viewer.dart` where the local variable shadowed the state field. The frontend now compiles with 0 errors.
- MathJax Integration: Enhanced `web_html_viewer.dart` and `mobile_html_viewer.dart` with MathJax embedding inside dynamic iframes/webviews to guarantee crisp mathematical rendering of equations.
- Auto-URL Switching: Modified `getBaseUrl()` to automatically detect local debugging (`kDebugMode`) and point to `http://localhost:8000` (or `http://10.0.2.2:8000` for Android Emulator) during testing, falling back automatically to the Hugging Face production Space in release builds.
- Combined Database Indexing: Removed all legacy pickle files (`.pkl`) and consolidated model centroids, hyperparameters, preprocess tracking, and cluster assignments into a single SQLite database (`math_index.db`).
- Vectorized Hamming Distance: Vectorized loop calculations inside the K-Means Hamming distance logic using NumPy broadcasting, yielding a 65.5x performance gain.
- Parallel Preprocessing: Implemented multi-process preprocessing with a fast regex scanner to skip HTML files lacking mathematical tags, speeding up initial indexing.
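
The vectorized Hamming distance item can be illustrated with a minimal sketch. It assumes bit-vectors are stored as 0/1 `uint8` NumPy arrays; the names `hamming_loop` and `hamming_broadcast` are hypothetical and are not taken from `hamming_mini_batch_kmeans.py`. It only shows the broadcasting idea and makes no attempt to reproduce the measured speedup.

```python
import numpy as np

def hamming_loop(points, centroids):
    """Reference version: one Python-level iteration per (point, centroid) pair."""
    dist = np.zeros((len(points), len(centroids)), dtype=np.int64)
    for i, p in enumerate(points):
        for j, c in enumerate(centroids):
            dist[i, j] = int(np.sum(p != c))
    return dist

def hamming_broadcast(points, centroids):
    """Broadcast version: compare every point with every centroid at once."""
    # points[:, None, :] has shape (n, 1, d); centroids[None, :, :] has shape (1, k, d).
    # The comparison broadcasts to (n, k, d); summing over d counts the differing bits.
    return (points[:, None, :] != centroids[None, :, :]).sum(axis=2)

# Random toy data (assumed sizes): 200 points, 8 centroids, 64 bits each.
rng = np.random.default_rng(0)
points = rng.integers(0, 2, size=(200, 64), dtype=np.uint8)
centroids = rng.integers(0, 2, size=(8, 64), dtype=np.uint8)

# Both versions must agree; the broadcast one avoids the nested Python loops.
assert np.array_equal(hamming_loop(points, centroids), hamming_broadcast(points, centroids))
labels = hamming_broadcast(points, centroids).argmin(axis=1)  # nearest centroid per point
```

The broadcast version materializes an `(n, k, d)` boolean array, so for very large batches it may be worth processing the points in chunks.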
math-ir-system/
┣ frontend/
┃ ┣ main.dart                       # Main search UI and routing logic
┃ ┣ mobile_html_viewer.dart         # WebView-based HTML renderer for Android
┃ ┣ web_html_viewer.dart            # iframe-based HTML renderer for Web
┃ ┗ pubspec.yaml                    # Flutter dependencies
┃
┣ backend/
┃ ┣ MIR_model/
┃ ┃ ┣ cluster_index.py              # Handles cluster index loading and searching
┃ ┃ ┣ clustering_phase.py           # Performs clustering on bit-vector data
┃ ┃ ┣ driver_clustering.py          # Triggers clustering and index creation
┃ ┃ ┣ driver_preprocessing.py       # Preprocesses HTML documents
┃ ┃ ┣ hamming_mini_batch_kmeans.py  # MiniBatchKMeans adapted for Hamming distance
┃ ┃ ┣ preprocessing.py              # Extracts MathML & LaTeX, generates bit-vectors
┃ ┃ ┣ query_processing.py           # Identifies the query type and processes it
┃ ┃ ┣ query_to_bitvector.py         # Converts LaTeX → MathML → bit-vector
┃ ┃ ┗ search_query.py               # Main search execution logic
┃ ┣ main.py                         # FastAPI entry point
┃ ┣ requirements.txt                # Lists the Python libraries and modules to install
┃ ┗ math_index_storage/             # Stores models & clustering indices
┗ README.md
If you copy LaTeX directly from the JSON file, replace each `\\` with `\` and each `\\\\` with `\\` before searching in the UI. The queries listed here are already UI-compatible and need no changes.
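
As a small illustration of where the doubled characters come from: JSON stores each LaTeX backslash in escaped form, so reading the file with a JSON parser returns the UI-ready string without manual replacement. The record below is hypothetical and only mirrors the shape of a query entry.

```python
import json

# Hypothetical record, written exactly as it would appear inside a JSON file.
raw_json = r'{"query": "\\frac{a}{b} + c^2"}'

record = json.loads(raw_json)   # the parser undoes the JSON escaping
ui_query = record["query"]

print(ui_query)                 # \frac{a}{b} + c^2
```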
Mathematical Queries that are provided by NTCIR-12: http://ntcir-math.nii.ac.jp/
- Raw LaTeX → shows the exact LaTeX code.
- Rendered → shows how it will appear on GitHub.
Raw LaTeX:
-0.026838601\ldots
Rendered:
Raw LaTeX:
\mathfrak{P}
Rendered:
Raw LaTeX:
N = \left\lfloor 0.5 - \log_{2} \left(\frac{\text{Frequency of this item}}{\text{Frequency of most common item}}\right) \right\rfloor
Rendered:
Raw LaTeX:
\nabla \times \mathbf{B} = \mu_{0} \mathbf{J} +\underbrace{\mu_{0}\epsilon_{0} \frac{\partial}{\partial t}\mathbf{E}}_{\text{Maxwell's term}}
Rendered:
Raw LaTeX:
1 + \cfrac{1}{2 + \cfrac{1}{5 + \cfrac{1}{5 + \cfrac{1}{4 + \ddots}}}}
Rendered:
Raw LaTeX:
^{238}_{92}\mathrm{U} + ^{64}_{28}\mathrm{Ni} \;\rightarrow\; ^{302}_{120}\mathrm{Ubn}^{*} \;\rightarrow\; \textit{fission only}
Rendered:
Raw LaTeX:
0 \rightarrow G^\wedge \xrightarrow{\pi^\wedge} X^\wedge \xrightarrow{i^\wedge} H^\wedge \rightarrow 0
Rendered:
Raw LaTeX:
w = \begin{cases} w^* & \text{if } w^* > \frac{1}{2}, \\ \frac{1}{2} & \text{if } w^* \le \frac{1}{2}. \end{cases}
Rendered:
Raw LaTeX:
\begin{bmatrix} V_1 \\ I_2 \end{bmatrix} = \begin{bmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \end{bmatrix} \begin{bmatrix} I_1 \\ V_2 \end{bmatrix}
Rendered:
Raw LaTeX:
L(\lambda, \alpha, s) = \sum_{n=0}^{\infty} \frac{\exp(2\pi i\lambda n)}{(n+\alpha)^s}.
Rendered:
Raw LaTeX:
ax^{2} + bx + c = 0
Rendered:
Raw LaTeX:
O(mn \log m)
Rendered:
Raw LaTeX:
A \oplus B = (A^c \ominus B^s)^c
Rendered:
Raw LaTeX:
\cos \alpha = -\cos \beta \cos \gamma + \sin \beta \sin \gamma \cosh \frac{a}{k},
Rendered:
Raw LaTeX:
\forall x, y \in A \ [x \neq y \rightarrow \neg \exists z \in X \ [z \leq x \wedge z \leq y]]
Rendered:
Raw LaTeX:
\tau_{\text{rms}} = \sqrt{\frac{\int_{0}^{\infty} (\tau - \bar{\tau})^{2} A_c(\tau)\, d\tau}{\int_{0}^{\infty} A_c(\tau)\, d\tau}}
Rendered:
Raw LaTeX:
x - 1 - \frac{1}{2} - \frac{1}{4} - \frac{1}{5} - \frac{1}{6} - \frac{1}{9} - \cdots = 1
Rendered:
Raw LaTeX:
P(x_{i}) = \frac{N!}{n_{x}!(N-n_{x})!} p^{n_{x}}_{x} (1-p_{x})^{N-n_{x}}
Rendered:
Raw LaTeX:
H_{ij} = \begin{bmatrix} \frac{\partial^{2}V_{ij}}{\partial x_{i}\partial x_{j}} & \frac{\partial^{2}V_{ij}}{\partial x_{i}\partial y_{j}} & \frac{\partial^{2}V_{ij}}{\partial x_{i}\partial z_{j}} \\ \frac{\partial^{2}V_{ij}}{\partial y_{i}\partial x_{j}} & \frac{\partial^{2}V_{ij}}{\partial y_{i}\partial y_{j}} & \frac{\partial^{2}V_{ij}}{\partial y_{i}\partial z_{j}} \\ \frac{\partial^{2}V_{ij}}{\partial z_{i}\partial x_{j}} & \frac{\partial^{2}V_{ij}}{\partial z_{i}\partial y_{j}} & \frac{\partial^{2}V_{ij}}{\partial z_{i}\partial z_{j}} \end{bmatrix}
Rendered:
Raw LaTeX:
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n-1) s_x s_y} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}},
Rendered:
- Extract MathML/LaTeX from HTML documents.
- Convert to binary bit-vector representation.
- Apply MiniBatchKMeans clustering with Hamming distance.
- Store cluster indices for fast lookup.
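
The first two indexing steps can be sketched as follows. This is only an illustration under stated assumptions: the regex-based extraction, the hashing of MathML element names onto bit positions, the 64-bit width, and the names `extract_mathml` and `to_bitvector` are all hypothetical, and the encoding used in `preprocessing.py` may differ.

```python
import re
import zlib

MATH_BLOCK = re.compile(r"<math\b.*?</math>", re.IGNORECASE | re.DOTALL)
TAG_NAME = re.compile(r"<([a-zA-Z]+)")  # opening tags only; "</mi>" does not match

def extract_mathml(html):
    """Return every <math>...</math> fragment found in an HTML string."""
    return MATH_BLOCK.findall(html)

def to_bitvector(mathml, n_bits=64):
    """Hash each MathML element name onto one of n_bits positions (assumed scheme)."""
    bits = 0
    for tag in TAG_NAME.findall(mathml):
        # crc32 is deterministic across runs, unlike Python's built-in hash().
        position = zlib.crc32(tag.lower().encode("utf-8")) % n_bits
        bits |= 1 << position
    return bits

html = "<p>Quadratic:</p><math><mrow><mi>a</mi><msup><mi>x</mi><mn>2</mn></msup></mrow></math>"
fragments = extract_mathml(html)
vectors = [to_bitvector(fragment) for fragment in fragments]
print(len(fragments), format(vectors[0], "064b"))
```

Storing each bit-vector as a plain integer keeps the example free of dependencies; a real index would more likely pack the bits into NumPy arrays before clustering.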
- User submits a query (`LaTeX` or plain text).
- Query is processed into a bit-vector.
- Nearest clusters are found.
- Retrieve and rank top results.
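
The query steps above can be sketched on a toy index. The sketch assumes integer-encoded bit-vectors, binary centroids, and a hypothetical `search` helper with `n_probe` and `top_k` parameters; it is not the logic of `search_query.py`.

```python
def hamming(a, b):
    """Number of differing bits between two integer-encoded bit-vectors."""
    return bin(a ^ b).count("1")

def search(query, centroids, clusters, n_probe=1, top_k=3):
    """Probe the n_probe nearest clusters, then rank their documents by distance."""
    order = sorted(range(len(centroids)), key=lambda c: hamming(query, centroids[c]))
    candidates = [doc for c in order[:n_probe] for doc in clusters[c]]
    ranked = sorted(candidates, key=lambda doc: hamming(query, doc[1]))
    return [(doc_id, hamming(query, bits)) for doc_id, bits in ranked[:top_k]]

# Toy 8-bit index: two centroids, each with (doc_id, bit-vector) members.
centroids = [0b00001111, 0b11110000]
clusters = {
    0: [("doc1.html", 0b00001111), ("doc2.html", 0b00000111)],
    1: [("doc3.html", 0b11110000), ("doc4.html", 0b11100000)],
}

results = search(0b00001110, centroids, clusters)
print(results)  # [('doc1.html', 1), ('doc2.html', 2)]
```

Only the members of the probed clusters are compared with the query, which is what makes the search approximate: a close document assigned to an unprobed cluster would be missed unless `n_probe` is raised.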
cd frontend
flutter pub get # Install dependencies
flutter run -d chrome # For Web
flutter run -d emulator-5554 # For Android

cd backend
pip install -r requirements.txt
uvicorn main:app --reload

Runs at http://127.0.0.1:8000
{
  "query": "\\frac{a}{b} + c^2"
}

Response:

{
  "session_id": "6bcf8ceb-0a51-416b-b52c-78eac5c955c9",
  "time_taken_in_second": 0.25,
  "results": [
    { "id": "1", "filename": "doc1.html" },
    { "id": "2", "filename": "doc2.html" }
  ]
}

Returns HTML content with MathML.
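
A client-side sketch of the two request shapes, kept free of network calls so it runs anywhere. It assumes that the `__VIEW__:{session_id}:{file_id}` string travels in the same `query` field as a normal search and that the returned document is a base64 string of UTF-8 HTML; the helper names are hypothetical, and the field that carries the encoded document is not specified here.

```python
import base64
import json

VIEW_PREFIX = "__VIEW__"

def build_search_payload(latex):
    """JSON body for a normal search request."""
    return json.dumps({"query": latex})

def build_view_payload(session_id, file_id):
    """JSON body that asks for one document through the same /search route."""
    return json.dumps({"query": f"{VIEW_PREFIX}:{session_id}:{file_id}"})

def parse_view_query(query):
    """Return (session_id, file_id) for a view query, or None for a normal search."""
    parts = query.split(":", 2)
    if len(parts) == 3 and parts[0] == VIEW_PREFIX:
        return parts[1], parts[2]
    return None

def decode_document(encoded_html):
    """Decode a base64-encoded HTML document into text (UTF-8 assumed)."""
    return base64.b64decode(encoded_html).decode("utf-8")

payload = build_view_payload("6bcf8ceb-0a51-416b-b52c-78eac5c955c9", "1")
print(payload)
print(parse_view_query(json.loads(payload)["query"]))
```

Checking the prefix before anything else means an ordinary LaTeX query that happens to contain colons is still treated as a search.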
Authors: Ankit Kumar, Chirag Sarda, Vishal Gaur, Rajan Kumar Singh
This repository is for demonstration purposes only.
You may:
- View the code
- Access the demo
You may not:
- Copy, clone, or reuse the code
- Modify or distribute any part of this project
Violations will be prosecuted under copyright law.
Designed for cross-platform simplicity and full MathML compatibility.
