v2 roadmap: BibTeX, Beamer, direct OMML, run-merger, style map, S3/R2 storage, worker autoscale#2
…-merger
Implements four shipped-as-planned v2 roadmap items:
- BibTeX: extract Word's managed Sources part into references.bib, resolve each
in-text CITATION field to a \cite{key}, emit \bibliography, and bundle
references.bib in the download zip (new app/converter/bibliography.py).
- Beamer: new 'beamer' template + frame-segmenting renderer (one frame per
H1/H2, allowframebreaks, references frame); skips article geometry.
- Direct OMML->LaTeX converter (app/converter/handlers/omml_direct.py): a
Pandoc-free fallback covering fractions, scripts, radicals, delimiters,
n-ary operators, accents, matrices, Greek + symbol maps. Used when Pandoc is
absent or drops an equation, so equations no longer degrade to placeholders.
- Run-merger (app/converter/run_merger.py): collapse Word's adjacent same-format
run fragmentation so the renderer emits one wrapper per span.
Frontend: Beamer added to the template selector + TemplateChoice union.
Tests: +20 (90 backend pass, ruff clean, tsc/eslint/vitest green).
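The run-merger item above can be sketched roughly as follows. This is a minimal illustration, not the code in app/converter/run_merger.py: the `Run` dataclass, its `bold`/`italic` flags, and the `render` helper are hypothetical stand-ins for the project's IR types.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Run:
    """Hypothetical stand-in for an IR text run; not the project's real class."""
    text: str
    bold: bool = False
    italic: bool = False


def merge_adjacent_runs(runs: List[Run]) -> List[Run]:
    """Collapse neighbouring runs whose formatting flags are identical."""
    merged: List[Run] = []
    for run in runs:
        if merged and (merged[-1].bold, merged[-1].italic) == (run.bold, run.italic):
            # Same formatting as the previous run: extend it instead of opening a new span.
            merged[-1] = replace(merged[-1], text=merged[-1].text + run.text)
        else:
            merged.append(run)
    return merged


def render(runs: List[Run]) -> str:
    """Emit one LaTeX wrapper per merged span."""
    parts = []
    for run in merge_adjacent_runs(runs):
        text = run.text
        if run.italic:
            text = rf"\textit{{{text}}}"
        if run.bold:
            text = rf"\textbf{{{text}}}"
        parts.append(text)
    return "".join(parts)
```

Without the merge step, a word that Word split into two bold runs would render as two adjacent `\textbf{...}` wrappers; with it, the renderer sees a single span.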
Completes the v2 roadmap (all 7 items now shipped):
- Style mapping (app/converter/style_map.py): optional JSON maps custom Word style names to IR roles (headings/code/lists), loaded via CORETEX_STYLE_MAP or app/style_map.json. Parser consults it for heading/code/list detection; empty config = exact pre-v2 behaviour. Ships app/style_map.example.json.
- Figure storage (app/storage.py): FigureStore abstraction. Redis backend is byte-identical to v1 (manifest + raw-byte keys); S3FigureStore offloads to S3/Cloudflare R2/MinIO when FIGURE_STORAGE=s3. Misconfig degrades to Redis rather than failing jobs. Worker + download routed through the store.
- Worker autoscale: railway.worker.toml numReplicas=2; independent RQ workers on the shared queue remove single-worker head-of-line blocking.
- Docs: README roadmap marked v2 shipped + citations limitation updated; DEPLOY.md scaling section reflects the implemented S3/R2 + replica paths.
- Tests: +18 (102 backend pass, ruff/tsc/eslint green).
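As a rough illustration of the style-mapping behaviour described in this commit message, the sketch below loads an optional JSON map and falls back to the built-in role when no entry matches. The flat `{"Word style name": "role"}` schema, the lowercase key normalisation, and the helper names `load_style_map` and `role_for` are assumptions made for the example; the real app/converter/style_map.py may differ.

```python
import json
import os
from pathlib import Path
from typing import Dict, Optional


def load_style_map(default_path: str = "app/style_map.json") -> Dict[str, str]:
    """Load a Word-style-name -> IR-role map; return {} when nothing usable is configured."""
    path = Path(os.environ.get("CORETEX_STYLE_MAP", default_path))
    if not path.is_file():
        return {}
    try:
        raw = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        # A broken config should not break parsing; behave as if no map exists.
        return {}
    if not isinstance(raw, dict):
        return {}
    # Assumed normalisation: match style names case-insensitively, ignoring outer whitespace.
    return {str(name).strip().lower(): str(role) for name, role in raw.items()}


def role_for(style_name: str, style_map: Dict[str, str], builtin_role: Optional[str]) -> Optional[str]:
    """Prefer a configured role; otherwise keep whatever built-in detection decided."""
    return style_map.get(style_name.strip().lower(), builtin_role)
```

With an empty map, `role_for` always returns the built-in role, which is one way to keep an empty config equivalent to pre-v2 behaviour.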
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f6142e68ae
Pull request overview
This PR implements the project’s v2 roadmap items across the conversion pipeline and deployment setup, adding BibTeX/citation support, a Beamer output mode, a Pandoc-free OMML math fallback, run-merging/style-mapping improvements, and optional S3/R2-backed figure storage—while preserving v1 defaults via graceful fallbacks.
Changes:
- Add BibTeX extraction from Word managed sources, resolve in-text citations to \cite{...}, emit \bibliography{references}, and ship references.bib in downloads.
- Add a Beamer template and renderer mode that segments headings into frames, plus frontend template selection support.
- Add infra/scaling improvements: figure-store abstraction (Redis default, optional S3/R2) and Railway worker replica configuration.
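The BibTeX extraction step in the list above could look roughly like this. It is a sketch under assumptions, not the contents of app/converter/bibliography.py: it reads only the Tag, SourceType, Title, Year and Person names from Word's Sources XML, treats every Person as an author, maps unknown source types to @misc, and does no BibTeX escaping. The names `sources_to_bibtex`, `cite_for` and `TYPE_MAP` are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Set

# Namespace used by Word's bibliography (Sources) part.
B = "{http://schemas.openxmlformats.org/officeDocument/2006/bibliography}"

# Assumed, deliberately tiny mapping from Word source types to BibTeX entry types.
TYPE_MAP = {"Book": "book", "JournalArticle": "article"}


def sources_to_bibtex(sources_xml: str) -> str:
    """Turn a Sources XML document into BibTeX text, one entry per tagged source."""
    root = ET.fromstring(sources_xml)
    entries = []
    for src in root.iter(f"{B}Source"):
        key = (src.findtext(f"{B}Tag") or "").strip()
        if not key:
            continue  # without a tag there is no cite key to resolve against
        entry_type = TYPE_MAP.get(src.findtext(f"{B}SourceType") or "", "misc")
        # Simplification: every Person under the source is treated as an author.
        authors = [
            ", ".join(filter(None, [p.findtext(f"{B}Last"), p.findtext(f"{B}First")]))
            for p in src.iter(f"{B}Person")
        ]
        fields = {
            "author": " and ".join(a for a in authors if a),
            "title": src.findtext(f"{B}Title") or "",
            "year": src.findtext(f"{B}Year") or "",
        }
        body = "".join(f"  {name} = {{{value}}},\n" for name, value in fields.items() if value)
        entries.append(f"@{entry_type}{{{key},\n{body}}}\n")
    return "\n".join(entries)


def cite_for(tag: str, known_keys: Set[str]) -> Optional[str]:
    """Return a cite command for a managed source, or None to signal plain-text fallback."""
    return rf"\cite{{{tag}}}" if tag in known_keys else None
```

Returning `None` from `cite_for` lets the caller keep the citation's visible text when the tag has no matching source.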
Reviewed changes
Copilot reviewed 26 out of 26 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tests/test_v2_features.py | Adds tests for run-merging, BibTeX extraction, and citation rendering behavior. |
| tests/test_style_map.py | Adds tests validating style-map normalization, clamping, and file-loading behavior. |
| tests/test_storage.py | Adds tests for Redis figure-store behavior and S3 misconfig fallback. |
| tests/test_renderer.py | Expands renderer tests to include Beamer and bibliography frame behavior. |
| tests/test_omml_direct.py | Adds tests for direct OMML→LaTeX conversion coverage. |
| README.md | Updates limitations/roadmap documentation to mark v2 items as shipped. |
| railway.worker.toml | Sets worker replica count and documents scaling intent. |
| frontend/src/types.ts | Extends template union type to include beamer. |
| frontend/src/components/TemplateSelector.tsx | Adds “Beamer Slides” option to the template picker. |
| frontend/src/tests/TemplateSelector.test.tsx | Updates UI tests to expect the new template option. |
| DEPLOY.md | Updates scaling guidance to reflect worker replicas and S3 figure offload support. |
| app/templates/beamer.tex.j2 | Introduces a Beamer LaTeX template. |
| app/style_map.example.json | Adds an example style-map configuration file for deployments. |
| app/storage.py | Introduces FigureStore abstraction with Redis + S3 implementations and fallback selection. |
| app/queue/worker.py | Switches figure persistence to FigureStore and threads extracted BibTeX into results/compile-check. |
| app/converter/style_map.py | Implements configurable style mapping and global STYLE_MAP singleton loader. |
| app/converter/run_merger.py | Adds adjacent-run merging to reduce LaTeX verbosity and wrapper fragmentation. |
| app/converter/renderer.py | Adds Beamer template support, frame segmentation rendering, and resolved citation rendering. |
| app/converter/parser.py | Integrates BibTeX extraction, style-map hooks, citation resolution, and run merging during parse. |
| app/converter/ir_schema.py | Extends IR/result schema to carry extracted BibTeX content. |
| app/converter/handlers/omml_direct.py | Adds dependency-free OMML→LaTeX converter implementation. |
| app/converter/handlers/equation_handler.py | Adds direct OMML fallback when Pandoc is unavailable or drops equations. |
| app/converter/compile_check.py | Writes references.bib for compile-check when BibTeX exists. |
| app/converter/bibliography.py | Implements Word Sources-part parsing and BibTeX emission. |
| app/config.py | Adds settings for figure storage backend selection and S3/R2 configuration. |
| app/api/routes.py | Updates template parameter, includes references.bib in zip output, and loads figures via FigureStore. |
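The "FigureStore abstraction with Redis + S3 implementations and fallback selection" row for app/storage.py can be pictured with the sketch below. It is an assumption-laden illustration: `InMemoryFigureStore` stands in for the Redis backend so the example runs without a server, `StubS3FigureStore` only validates its configuration instead of talking to S3, and the `S3_BUCKET` variable name and `select_store` helper are hypothetical. Only `FIGURE_STORAGE=s3` comes from the PR text.

```python
from typing import Dict, Mapping, Optional, Protocol, Tuple


class FigureStore(Protocol):
    """Minimal interface the worker and download route would share (assumed shape)."""

    def put(self, job_id: str, name: str, data: bytes) -> None: ...
    def get(self, job_id: str, name: str) -> Optional[bytes]: ...


class InMemoryFigureStore:
    """Stand-in for the Redis-backed default store."""

    def __init__(self) -> None:
        self._blobs: Dict[Tuple[str, str], bytes] = {}

    def put(self, job_id: str, name: str, data: bytes) -> None:
        self._blobs[(job_id, name)] = data

    def get(self, job_id: str, name: str) -> Optional[bytes]:
        return self._blobs.get((job_id, name))


class StubS3FigureStore(InMemoryFigureStore):
    """Placeholder for an S3/R2/MinIO store; a real one would wrap an S3 client."""

    def __init__(self, bucket: str) -> None:
        if not bucket:
            raise ValueError("S3 figure storage requires a bucket name")
        super().__init__()
        self.bucket = bucket


def select_store(env: Mapping[str, str]) -> FigureStore:
    """Pick the S3-style store only when requested and configured; otherwise use the default."""
    if env.get("FIGURE_STORAGE", "redis").lower() == "s3":
        try:
            return StubS3FigureStore(bucket=env.get("S3_BUCKET", ""))
        except ValueError:
            # Misconfiguration degrades to the default store instead of failing the job.
            pass
    return InMemoryFigureStore()
```

Keeping selection in one function means callers depend only on `put`/`get`, so the worker and the download route do not need to know which backend is active.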
Implements all seven v2 roadmap items (see README roadmap), each test-covered and behind graceful fallbacks so default behaviour is unchanged.
Converter features
- app/converter/bibliography.py parses Word's managed Sources part into references.bib, resolves each in-text CITATION field to a \cite{key}, emits \bibliography{references}, and bundles references.bib in the download zip. Un-managed citations still fall back to plain text.
- beamer template + frame-segmenting renderer (one frame per H1/H2, allowframebreaks, dedicated References frame); omits article geometry.
- app/converter/handlers/omml_direct.py, a Pandoc-free fallback (fractions, scripts, radicals, delimiters, n-ary ops, accents, matrices, Greek/symbol maps). Used when Pandoc is absent or drops an equation, so equations no longer degrade to placeholders.
- app/converter/run_merger.py collapses Word's adjacent same-format run fragmentation; one wrapper per span.
- app/converter/style_map.py + CORETEX_STYLE_MAP / app/style_map.json map custom Word style names to headings/code/lists. Empty config = pre-v2 behaviour.
Infra / scaling
- app/storage.py FigureStore abstraction. Redis backend is byte-identical to v1; S3FigureStore offloads to S3/Cloudflare R2/MinIO when FIGURE_STORAGE=s3, degrading back to Redis on misconfig.
- railway.worker.toml numReplicas=2; independent RQ workers on the shared queue remove single-worker head-of-line blocking.
Verification
- ruff clean.
- tsc --noEmit, eslint, vitest all green.
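For readers unfamiliar with OMML, the Pandoc-free fallback idea can be sketched as a small recursive walk over the math XML. This covers only text runs, fractions and superscripts, far less than the PR's list, and the function names are hypothetical; it is not the code in app/converter/handlers/omml_direct.py.

```python
import xml.etree.ElementTree as ET

# Office Math Markup Language namespace.
M = "{http://schemas.openxmlformats.org/officeDocument/2006/math}"


def _child(node: ET.Element, name: str) -> str:
    """Convert a named child element, or return '' when it is missing."""
    child = node.find(f"{M}{name}")
    return omml_to_latex(child) if child is not None else ""


def omml_to_latex(node: ET.Element) -> str:
    """Recursively translate a tiny subset of OMML into LaTeX."""
    if node.tag == f"{M}t":
        # Literal math text.
        return node.text or ""
    if node.tag == f"{M}f":
        # Fraction: numerator and denominator live in m:num and m:den.
        return rf"\frac{{{_child(node, 'num')}}}{{{_child(node, 'den')}}}"
    if node.tag == f"{M}sSup":
        # Superscript: base in m:e, exponent in m:sup.
        return f"{{{_child(node, 'e')}}}^{{{_child(node, 'sup')}}}"
    # Containers (m:oMath, m:r, m:num, ...) and unhandled elements: concatenate children.
    return "".join(omml_to_latex(child) for child in node)
```

A fraction `a / x²` encoded as `m:f` with an `m:sSup` denominator comes out as `\frac{a}{{x}^{2}}`. Unhandled constructs fall through to the concatenation branch, which is where a fuller converter would add radicals, delimiters, n-ary operators and the symbol maps.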