
v2 roadmap: BibTeX, Beamer, direct OMML, run-merger, style map, S3/R2 storage, worker autoscale #2

Merged
TheClazer merged 2 commits into main from feature/v2-roadmap on Jun 10, 2026


@TheClazer (Owner)

Implements all seven v2 roadmap items (see README roadmap). Each is test-covered and sits behind a graceful fallback, so default behaviour is unchanged.

Converter features

  • Full BibTeX extraction: app/converter/bibliography.py parses Word's managed Sources part into references.bib, resolves each in-text CITATION field to a \cite{key}, emits \bibliography{references}, and bundles references.bib in the download zip. Un-managed citations still fall back to plain text.
  • Beamer slides template: a new beamer template plus a frame-segmenting renderer (one frame per H1/H2, allowframebreaks, dedicated References frame); omits the article geometry.
  • Direct OMML→LaTeX parser: app/converter/handlers/omml_direct.py, a Pandoc-free fallback (fractions, scripts, radicals, delimiters, n-ary ops, accents, matrices, Greek/symbol maps). Used when Pandoc is absent or drops an equation, so equations no longer degrade to placeholders.
  • Run-merger: app/converter/run_merger.py collapses Word's fragmentation of adjacent same-format runs, so the renderer emits one wrapper per span.
  • Style mapping config: app/converter/style_map.py plus CORETEX_STYLE_MAP / app/style_map.json map custom Word style names to headings/code/lists. Empty config = pre-v2 behaviour.
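The BibTeX item can be pictured with a minimal standard-library sketch. It assumes Word's bibliography namespace (b:Sources / b:Source with Tag, SourceType, Title, JournalName, Year and authors nested as Author/Author/NameList/Person) and a CITATION field instruction of the form ` CITATION Smi20 \l 1033 `. The helper names sources_to_bibtex and citation_to_cite and the SourceType-to-entry-type table are hypothetical; this is not the code in app/converter/bibliography.py.

```python
from __future__ import annotations

import re
import xml.etree.ElementTree as ET

# Namespace of Word's managed bibliography (Sources) part.
B = "{http://schemas.openxmlformats.org/officeDocument/2006/bibliography}"

# Hypothetical SourceType -> BibTeX entry type table; anything else becomes @misc.
ENTRY_TYPES = {"JournalArticle": "article", "Book": "book",
               "ConferenceProceedings": "inproceedings"}


def _text(el: ET.Element, tag: str) -> str | None:
    child = el.find(B + tag)
    return child.text.strip() if child is not None and child.text else None


def sources_to_bibtex(xml_text: str) -> str:
    """Convert a <b:Sources> document into BibTeX entries (simplified field set)."""
    root = ET.fromstring(xml_text)
    entries = []
    for src in root.iter(B + "Source"):
        key = _text(src, "Tag")
        if not key:
            continue  # without a tag the source cannot be cited
        kind = ENTRY_TYPES.get(_text(src, "SourceType") or "", "misc")
        # Authors are nested as b:Author/b:Author/b:NameList/b:Person.
        people = src.findall(f"{B}Author/{B}Author/{B}NameList/{B}Person")
        authors = " and ".join(
            ", ".join(filter(None, (_text(p, "Last"), _text(p, "First"))))
            for p in people
        )
        fields = {"author": authors, "title": _text(src, "Title"),
                  "journal": _text(src, "JournalName"), "year": _text(src, "Year")}
        body = ",\n".join(f"  {k} = {{{v}}}" for k, v in fields.items() if v)
        entries.append(f"@{kind}{{{key},\n{body}\n}}")
    return "\n\n".join(entries) + "\n"


# A CITATION field instruction looks like ' CITATION Smi20 \l 1033 '.
_CITATION = re.compile(r"^\s*CITATION\s+(\S+)")


def citation_to_cite(instr: str, known: set[str]) -> str | None:
    """Return a \\cite{key} for a managed source, or None to keep plain text."""
    m = _CITATION.match(instr)
    if m and m.group(1) in known:
        return f"\\cite{{{m.group(1)}}}"
    return None
```

Returning None for an unknown key mirrors the described fallback: an un-managed citation keeps its plain-text rendering. Only the first key of a multi-source field (one using \m for extra sources) is handled here.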

Infra / scaling

  • S3 / R2 figure storage: app/storage.py adds a FigureStore abstraction. The Redis backend is byte-identical to v1; S3FigureStore offloads figures to S3/Cloudflare R2/MinIO when FIGURE_STORAGE=s3, degrading back to Redis on misconfiguration.
  • Worker autoscale: railway.worker.toml sets numReplicas=2; independent RQ workers on the shared queue remove single-worker head-of-line blocking.
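A sketch of the storage fallback, assuming a put/get interface and an S3_BUCKET / S3_ENDPOINT_URL environment pair (both hypothetical names; only FIGURE_STORAGE, FigureStore and S3FigureStore come from this PR). An in-memory dict stands in for the Redis backend so the example runs without a server, and boto3 is imported lazily so a missing dependency also degrades to the default store.

```python
from __future__ import annotations

import logging
import os
from typing import Protocol

log = logging.getLogger(__name__)


class FigureStore(Protocol):
    def put(self, job_id: str, name: str, data: bytes) -> None: ...
    def get(self, job_id: str, name: str) -> bytes | None: ...


class MemoryFigureStore:
    """Stand-in for the Redis backend: '<job_id>:<name>' -> raw bytes."""

    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}

    def put(self, job_id: str, name: str, data: bytes) -> None:
        self._data[f"{job_id}:{name}"] = data

    def get(self, job_id: str, name: str) -> bytes | None:
        return self._data.get(f"{job_id}:{name}")


class S3FigureStore:
    """Offloads figures to any S3-compatible endpoint (AWS S3, Cloudflare R2, MinIO)."""

    def __init__(self, bucket: str, endpoint_url: str | None = None) -> None:
        import boto3  # lazy import: Redis-only deployments need no boto3

        self._bucket = bucket
        self._s3 = boto3.client("s3", endpoint_url=endpoint_url)

    def put(self, job_id: str, name: str, data: bytes) -> None:
        self._s3.put_object(Bucket=self._bucket, Key=f"{job_id}/{name}", Body=data)

    def get(self, job_id: str, name: str) -> bytes | None:
        try:
            obj = self._s3.get_object(Bucket=self._bucket, Key=f"{job_id}/{name}")
        except self._s3.exceptions.NoSuchKey:
            return None
        return obj["Body"].read()


def make_figure_store(env: dict[str, str] | None = None) -> FigureStore:
    """Pick the backend once; any S3 misconfiguration falls back to the default."""
    env = dict(os.environ) if env is None else env
    if env.get("FIGURE_STORAGE", "redis").lower() == "s3":
        bucket = env.get("S3_BUCKET")  # hypothetical variable name
        if bucket:
            try:
                return S3FigureStore(bucket, env.get("S3_ENDPOINT_URL"))
            except Exception as exc:  # e.g. boto3 not installed
                log.warning("S3 figure store unavailable (%s); using default", exc)
        else:
            log.warning("FIGURE_STORAGE=s3 but S3_BUCKET is unset; using default")
    return MemoryFigureStore()
```

Logging a warning instead of raising matches the described behaviour of degrading to Redis rather than failing jobs; the worker and download route would both obtain their store from the same factory.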

Verification

  • Backend: 102 passed (was 70), ruff clean.
  • Frontend: tsc --noEmit, eslint, vitest all green.
  • Docs: README roadmap marked v2 shipped; DEPLOY.md scaling section updated.

…-merger

Implements four shipped-as-planned v2 roadmap items:

- BibTeX: extract Word's managed Sources part into references.bib, resolve each
  in-text CITATION field to a \cite{key}, emit \bibliography, and bundle
  references.bib in the download zip (new app/converter/bibliography.py).
- Beamer: new 'beamer' template + frame-segmenting renderer (one frame per
  H1/H2, allowframebreaks, references frame); skips article geometry.
- Direct OMML->LaTeX converter (app/converter/handlers/omml_direct.py): a
  Pandoc-free fallback covering fractions, scripts, radicals, delimiters,
  n-ary operators, accents, matrices, Greek + symbol maps. Used when Pandoc is
  absent or drops an equation, so equations no longer degrade to placeholders.
- Run-merger (app/converter/run_merger.py): collapse Word's adjacent same-format
  run fragmentation so the renderer emits one wrapper per span.
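The run-merger idea in miniature: adjacent runs with identical formatting are concatenated before rendering, so a bold word that Word split into several runs yields one \textbf{...} instead of several. The Run fields and the render helper are hypothetical, and LaTeX escaping is omitted.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """A text fragment with its character formatting (hypothetical IR shape)."""
    text: str
    bold: bool = False
    italic: bool = False
    code: bool = False

    def fmt(self) -> tuple[bool, bool, bool]:
        return (self.bold, self.italic, self.code)


def merge_runs(runs: list[Run]) -> list[Run]:
    """Concatenate adjacent runs that share identical formatting."""
    merged: list[Run] = []
    for run in runs:
        if not run.text:
            continue  # drop empty runs so they do not break up a span
        if merged and merged[-1].fmt() == run.fmt():
            prev = merged[-1]
            merged[-1] = Run(prev.text + run.text, *prev.fmt())
        else:
            merged.append(run)
    return merged


def render(runs: list[Run]) -> str:
    """Wrap each run in LaTeX commands (escaping omitted for brevity)."""
    out = []
    for r in runs:
        s = r.text
        if r.code:
            s = f"\\texttt{{{s}}}"
        if r.italic:
            s = f"\\textit{{{s}}}"
        if r.bold:
            s = f"\\textbf{{{s}}}"
        out.append(s)
    return "".join(out)
```

For example, [Run("Hel", bold=True), Run("lo", bold=True), Run(" world")] renders as \textbf{Hel}\textbf{lo} world before merging and \textbf{Hello} world after.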

Frontend: Beamer added to the template selector + TemplateChoice union.
Tests: +20 (90 backend pass, ruff clean, tsc/eslint/vitest green).
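A reduced sketch of a direct OMML→LaTeX walk using only the standard library. It covers fractions (m:f), superscripts and subscripts (m:sSup, m:sSub) and radicals (m:rad) in the OMML math namespace; delimiters, n-ary operators, accents, matrices and the Greek/symbol maps listed above are left out, and the function names are hypothetical rather than taken from omml_direct.py.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Office Math Markup Language namespace (m: prefix in document.xml).
M = "{http://schemas.openxmlformats.org/officeDocument/2006/math}"

# Containers whose LaTeX is simply the concatenation of their children.
_PASS_THROUGH = {"oMathPara", "oMath", "e", "num", "den", "sup", "sub", "deg"}


def conv(el: ET.Element | None) -> str:
    """Translate all child elements of an argument such as m:e or m:num."""
    if el is None:
        return ""
    return "".join(omml_to_latex(child) for child in el)


def omml_to_latex(el: ET.Element) -> str:
    tag = el.tag.removeprefix(M)
    if tag == "r":  # math run: its text lives in m:t
        return "".join(t.text or "" for t in el.iter(M + "t"))
    if tag == "f":  # fraction
        return rf"\frac{{{conv(el.find(M + 'num'))}}}{{{conv(el.find(M + 'den'))}}}"
    if tag == "sSup":  # superscript
        return f"{{{conv(el.find(M + 'e'))}}}^{{{conv(el.find(M + 'sup'))}}}"
    if tag == "sSub":  # subscript
        return f"{{{conv(el.find(M + 'e'))}}}_{{{conv(el.find(M + 'sub'))}}}"
    if tag == "rad":  # radical; an empty m:deg means a square root
        deg, body = conv(el.find(M + "deg")), conv(el.find(M + "e"))
        return rf"\sqrt[{deg}]{{{body}}}" if deg else rf"\sqrt{{{body}}}"
    if tag in _PASS_THROUGH:
        return conv(el)
    return ""  # property elements (m:fPr, m:rPr, ...) and unsupported constructs
```

Because property elements return an empty string, an unsupported construct degrades to missing output rather than an exception; a fuller converter would log it so the equation handler can decide whether to emit a placeholder.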

Completes the v2 roadmap (all 7 items now shipped):

- Style mapping (app/converter/style_map.py): optional JSON maps custom Word
  style names to IR roles (headings/code/lists), loaded via CORETEX_STYLE_MAP
  or app/style_map.json. Parser consults it for heading/code/list detection;
  empty config = exact pre-v2 behaviour. Ships app/style_map.example.json.
- Figure storage (app/storage.py): FigureStore abstraction. Redis backend is
  byte-identical to v1 (manifest + raw-byte keys); S3FigureStore offloads to
  S3/Cloudflare R2/MinIO when FIGURE_STORAGE=s3. Misconfig degrades to Redis
  rather than failing jobs. Worker + download routed through the store.
- Worker autoscale: railway.worker.toml numReplicas=2 — independent RQ workers
  on the shared queue remove single-worker head-of-line blocking.
- Docs: README roadmap marked v2 shipped + citations limitation updated;
  DEPLOY.md scaling section reflects the implemented S3/R2 + replica paths.
- Tests: +18 (102 backend pass, ruff/tsc/eslint green).
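A possible shape for the style-map loader, assuming a JSON file with "headings" (style name → level), "code" and "lists" keys. That schema, the 1..5 clamp and the StyleMap fields are assumptions, not the format of app/style_map.example.json. Names are normalized (whitespace collapsed, case-folded) so "Code  Block" and "code block" match, and a missing file yields an empty map, i.e. the pre-v2 behaviour.

```python
from __future__ import annotations

import json
import os
from dataclasses import dataclass, field
from pathlib import Path

# Assumed clamp: level 1 = \section ... level 5 = \subparagraph.
MIN_LEVEL, MAX_LEVEL = 1, 5


def normalize(name: str) -> str:
    """Collapse whitespace and case-fold so 'Code  Block' matches 'code block'."""
    return " ".join(name.split()).casefold()


@dataclass
class StyleMap:
    headings: dict[str, int] = field(default_factory=dict)
    code: set[str] = field(default_factory=set)
    lists: set[str] = field(default_factory=set)

    @classmethod
    def from_dict(cls, raw: dict) -> StyleMap:
        return cls(
            headings={normalize(k): min(max(int(v), MIN_LEVEL), MAX_LEVEL)
                      for k, v in raw.get("headings", {}).items()},
            code={normalize(s) for s in raw.get("code", [])},
            lists={normalize(s) for s in raw.get("lists", [])},
        )

    def heading_level(self, style_name: str) -> int | None:
        return self.headings.get(normalize(style_name))


def load_style_map(default: str = "app/style_map.json") -> StyleMap:
    """Read CORETEX_STYLE_MAP (or the default path); no file means an empty map."""
    path = Path(os.environ.get("CORETEX_STYLE_MAP", default))
    if not path.is_file():
        return StyleMap()
    return StyleMap.from_dict(json.loads(path.read_text(encoding="utf-8")))
```

The parser would consult heading_level (and the code/lists sets) before its built-in detection, so an empty map changes nothing.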

Copilot AI review requested due to automatic review settings on June 9, 2026, 22:59
@vercel

vercel Bot commented Jun 9, 2026


The latest updates on your projects. Learn more about Vercel for GitHub.

Project | Deployment | Actions | Updated (UTC)
core-tex | Ready | Preview, Comment | Jun 9, 2026 10:59pm

chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f6142e68ae

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread app/queue/worker.py
Comment thread app/storage.py
Comment thread app/converter/bibliography.py

Copilot AI left a comment


Pull request overview

This PR implements the project’s v2 roadmap items across the conversion pipeline and deployment setup, adding BibTeX/citation support, a Beamer output mode, a Pandoc-free OMML math fallback, run-merging/style-mapping improvements, and optional S3/R2-backed figure storage—while preserving v1 defaults via graceful fallbacks.

Changes:

  • Add BibTeX extraction from Word managed sources, resolve in-text citations to \cite{...}, emit \bibliography{references}, and ship references.bib in downloads.
  • Add a Beamer template and renderer mode that segments headings into frames, plus frontend template selection support.
  • Add infra/scaling improvements: figure-store abstraction (Redis default, optional S3/R2) and Railway worker replica configuration.
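The Beamer frame segmentation can be sketched over a toy block list: every H1/H2 opens a new [allowframebreaks] frame, deeper headings stay inside the current frame, and a References frame is appended when a bibliography exists. The Block type is hypothetical and \bibliographystyle{plain} is an assumed style; the real renderer works on the project's IR and the beamer.tex.j2 template.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Block:
    kind: str       # "heading" or "para" (hypothetical IR kinds)
    text: str
    level: int = 0  # heading level; 0 for non-headings


def render_beamer_body(blocks: list[Block], has_bib: bool = False) -> str:
    """Group blocks into frames, one frame per H1/H2 heading."""
    frames: list[tuple[str, list[str]]] = []
    for b in blocks:
        if b.kind == "heading" and b.level in (1, 2):
            frames.append((b.text, []))  # H1/H2 starts a new frame
            continue
        if not frames:
            frames.append(("", []))  # content before the first heading
        # Deeper headings stay in the current frame as bold lines.
        line = f"\\textbf{{{b.text}}}" if b.kind == "heading" else b.text
        frames[-1][1].append(line)

    out: list[str] = []
    for title, body in frames:
        out.append(f"\\begin{{frame}}[allowframebreaks]{{{title}}}")
        out.extend(body)
        out.append("\\end{frame}")
    if has_bib:
        out += ["\\begin{frame}[allowframebreaks]{References}",
                "\\bibliographystyle{plain}",  # assumed style
                "\\bibliography{references}",
                "\\end{frame}"]
    return "\n".join(out)
```

allowframebreaks lets Beamer split an overlong section across several slides, which is why segmenting only at H1/H2 is workable without measuring content height.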

Reviewed changes

Copilot reviewed 26 out of 26 changed files in this pull request and generated 2 comments.

File | Description
tests/test_v2_features.py | Adds tests for run-merging, BibTeX extraction, and citation rendering behavior.
tests/test_style_map.py | Adds tests validating style-map normalization, clamping, and file-loading behavior.
tests/test_storage.py | Adds tests for Redis figure-store behavior and S3 misconfig fallback.
tests/test_renderer.py | Expands renderer tests to include Beamer and bibliography frame behavior.
tests/test_omml_direct.py | Adds tests for direct OMML→LaTeX conversion coverage.
README.md | Updates limitations/roadmap documentation to mark v2 items as shipped.
railway.worker.toml | Sets worker replica count and documents scaling intent.
frontend/src/types.ts | Extends template union type to include beamer.
frontend/src/components/TemplateSelector.tsx | Adds “Beamer Slides” option to the template picker.
frontend/src/tests/TemplateSelector.test.tsx | Updates UI tests to expect the new template option.
DEPLOY.md | Updates scaling guidance to reflect worker replicas and S3 figure offload support.
app/templates/beamer.tex.j2 | Introduces a Beamer LaTeX template.
app/style_map.example.json | Adds an example style-map configuration file for deployments.
app/storage.py | Introduces FigureStore abstraction with Redis + S3 implementations and fallback selection.
app/queue/worker.py | Switches figure persistence to FigureStore and threads extracted BibTeX into results/compile-check.
app/converter/style_map.py | Implements configurable style mapping and global STYLE_MAP singleton loader.
app/converter/run_merger.py | Adds adjacent-run merging to reduce LaTeX verbosity and wrapper fragmentation.
app/converter/renderer.py | Adds Beamer template support, frame segmentation rendering, and resolved citation rendering.
app/converter/parser.py | Integrates BibTeX extraction, style-map hooks, citation resolution, and run merging during parse.
app/converter/ir_schema.py | Extends IR/result schema to carry extracted BibTeX content.
app/converter/handlers/omml_direct.py | Adds dependency-free OMML→LaTeX converter implementation.
app/converter/handlers/equation_handler.py | Adds direct OMML fallback when Pandoc is unavailable or drops equations.
app/converter/compile_check.py | Writes references.bib for compile-check when BibTeX exists.
app/converter/bibliography.py | Implements Word Sources-part parsing and BibTeX emission.
app/config.py | Adds settings for figure storage backend selection and S3/R2 configuration.
app/api/routes.py | Updates template parameter, includes references.bib in zip output, and loads figures via FigureStore.


Comment thread app/storage.py
Comment thread app/converter/style_map.py
TheClazer merged commit f6142e6 into main on Jun 10, 2026
5 checks passed

2 participants