feat: add Google OAuth login UI and fix auth proxy #1
Open
Dia-Arora wants to merge 8 commits into
Performance
- equation_handler: batch every equation in a document through ONE Pandoc
subprocess instead of N. A 150-equation paper goes from ~60 s (150
process boots) to ~0.6 s. Sentinel paragraphs around each OMML block
let us split the batched LaTeX output back into per-equation chunks.
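The sentinel-splitting step described above can be sketched roughly as follows. This is a minimal illustration, not the code in equation_handler: the sentinel text (EQSENTINEL plus an index) and the function name are hypothetical, and the Pandoc call itself is left out.

```python
import re

# Hypothetical sentinel format: a paragraph such as "EQSENTINEL0" is assumed
# to be emitted before each equation in the single batched document.
SENTINEL_RE = re.compile(r"EQSENTINEL(\d+)")


def split_batched_latex(batched: str) -> dict[int, str]:
    """Split one batched LaTeX string into {equation_index: latex_chunk}."""
    chunks: dict[int, str] = {}
    matches = list(SENTINEL_RE.finditer(batched))
    for pos, match in enumerate(matches):
        start = match.end()
        # A chunk runs up to the next sentinel, or to the end of the output.
        if pos + 1 < len(matches):
            end = matches[pos + 1].start()
        else:
            end = len(batched)
        chunks[int(match.group(1))] = batched[start:end].strip()
    return chunks
```

Keying the chunks by the index carried in the sentinel, rather than by position, means a missing or empty equation does not shift every later one.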
Security: image decompression bomb
- Set Image.MAX_IMAGE_PIXELS = 40_000_000 (~40 MP) and upgrade Pillow's
DecompressionBombWarning to an error. A 2 MB zip-compressed
50000x50000 PNG bomb now raises before allocating ~7.5 GB of RAM.
- Add img.verify() pass before resampling so malformed headers fail
early rather than partway through processing.
Security: pdflatex hardening
- Explicit -no-shell-escape flag (default but cheap to be explicit).
- Set openin_any=p and openout_any=p env vars so pdflatex can only read
and write files in the current working dir, even if the .tex source
attempts \input{/etc/passwd}.
- Drop timeout from 60 s to 30 s — tight enough to bound a TeX macro
loop, loose enough for a real paper compile.
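A sketch of how the three hardening measures above might be combined in one call. The function names are hypothetical, and the -interaction=nonstopmode and -halt-on-error flags are assumptions added for a non-interactive run; only -no-shell-escape, the two env vars, and the 30 s timeout come from this commit.

```python
import os
import subprocess
from pathlib import Path

PDFLATEX_TIMEOUT_S = 30  # value stated in the commit message


def build_pdflatex_call(tex_name: str) -> tuple[list[str], dict[str, str]]:
    """Return (argv, env) for a hardened pdflatex run."""
    argv = [
        "pdflatex",
        "-no-shell-escape",          # explicit, even though it is the default
        "-interaction=nonstopmode",  # assumption: never wait for terminal input
        "-halt-on-error",            # assumption: stop at the first error
        tex_name,
    ]
    env = dict(os.environ)
    env["openin_any"] = "p"   # paranoid mode for file reads
    env["openout_any"] = "p"  # paranoid mode for file writes
    return argv, env


def compile_tex(workdir: Path, tex_name: str) -> subprocess.CompletedProcess:
    """Run pdflatex inside workdir; raises subprocess.TimeoutExpired after 30 s."""
    argv, env = build_pdflatex_call(tex_name)
    return subprocess.run(
        argv,
        cwd=workdir,
        env=env,
        capture_output=True,
        timeout=PDFLATEX_TIMEOUT_S,
        check=False,
    )
```

Splitting argv/env construction from the subprocess call keeps the flags checkable in a unit test without a TeX installation.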
Security: drop pickle from Redis
- Figures were stored as pickle.dumps({filename: bytes}) under one key.
Replaced with per-file raw-byte keys (figures:{job_id}:f:{name}) plus
a newline-delimited manifest. Eliminates Python-specific
deserialisation risk and avoids the single-blob bottleneck.
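The per-file layout could look something like the sketch below. The figure key format is the one given above; the manifest key name, the helper names, and the DictStore stand-in are hypothetical. Any client exposing set(name, value) and get(name) returning bytes should fit, which is assumed to be how the Redis client is configured here.

```python
from typing import Optional, Protocol


class ByteStore(Protocol):
    """Minimal interface assumed of the Redis client (set/get on raw bytes)."""

    def set(self, name: str, value: bytes) -> object: ...
    def get(self, name: str) -> Optional[bytes]: ...


def figure_key(job_id: str, name: str) -> str:
    return f"figures:{job_id}:f:{name}"


def manifest_key(job_id: str) -> str:
    # Hypothetical key name; the commit only says a manifest exists.
    return f"figures:{job_id}:manifest"


def store_figures(store: ByteStore, job_id: str, figures: dict[str, bytes]) -> None:
    """Write each figure under its own key, then a newline-delimited manifest."""
    for name, data in figures.items():
        if "\n" in name:
            raise ValueError("figure names must not contain newlines")
        store.set(figure_key(job_id, name), data)
    store.set(manifest_key(job_id), "\n".join(figures).encode("utf-8"))


def load_figures(store: ByteStore, job_id: str) -> dict[str, bytes]:
    """Read the manifest, then fetch every listed figure as raw bytes."""
    manifest = store.get(manifest_key(job_id))
    if not manifest:
        return {}
    result: dict[str, bytes] = {}
    for name in manifest.decode("utf-8").split("\n"):
        data = store.get(figure_key(job_id, name))
        if data is not None:  # the key may have expired independently
            result[name] = data
    return result


class DictStore:
    """In-memory stand-in for a Redis client, for demonstration only."""

    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}

    def set(self, name: str, value: bytes) -> None:
        self._data[name] = value

    def get(self, name: str) -> Optional[bytes]:
        return self._data.get(name)
```

Because values are plain bytes and the manifest is plain text, nothing on the read path executes a deserialiser.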
Correctness: image fidelity
- Preserve EXIF orientation and ICC colour profile through Pillow
recompression. A photo with a landscape orientation tag no longer
renders rotated; a CMYK medical scan keeps its colour intent.
Correctness: i18n list detection
- Parser's style-name fallback now covers English, French, German,
Spanish, Italian, Portuguese list-style names. As a final probe,
walks the style chain via the styles part to find an inherited
w:numPr — locale-independent, catches custom-named list styles.
Docs
- DEPLOY.md: new 'Scaling constraints' section spells out the v1
trade-offs (single worker, Redis-staged figures, upload memory
duplication, TeX Live image size) plus the upgrade paths.
…ecurity
- Bump test count badge: 48 → 60
- Comparison table: add Unicode math row, i18n list row, bomb-hardening
row; update equation entry to (batched); 20-page paper conversion
time: ~8 s → ~3 s
- Features grid: 140 unicode math glyphs, batched Pandoc, EXIF/ICC
preserved, i18n lists. Production-grade card: streaming upload,
Pillow bomb cap, pdflatex sandbox flags, no-pickle Redis, explicit
CORS allowlist
- Smart preamble: mention amssymb auto-injection for unicode math
- Sequence diagram: updated to reflect batched Pandoc + sandbox flags
+ per-file figure keys (replacing pickled dict)
- API table: /temp/{id}.tex/.zip aliases now documented
- Architectural decisions: 2 new collapsible sections covering the
unicode-math-at-escape strategy, single-Pandoc batching rationale,
and the no-pickle Redis design
- Known limitations: split into Content (unchanged) + Scaling
(deliberate v1 trade-offs with upgrade paths to S3 / Pro / etc.)
- Roadmap: new 'v1.1 (shipped — hardening)' section with each piece
of recent work; v2 timeline shifted to reflect the work already done
10 forensic-audit findings addressed.
Security
- parser: enforce uncompressed-size caps before lxml touches document.xml.
Refuses zips whose declared payload exceeds 200 MB total / 100 MB per
entry / 5000 entries. A 20 MB upload could otherwise expand to 20 GB
(1026x ratio observed) and OOM-kill the worker.
- rate_limit: new shared Limiter module that wires SlowAPI to Redis when
REDIS_URL is present. Previously two independent in-memory Limiters
existed in main.py and routes.py, and the active one didn't survive
process restarts or scale across uvicorn workers.
- routes: switch UUID4 to secrets.token_urlsafe(16) for the job-ID
generator. UUID4 was already CSPRNG-backed; this is intent-revealing.
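The uncompressed-size caps from the parser bullet above can be sketched with the standard zipfile module. The names below are hypothetical, and reading "MB" as 1024 * 1024 bytes is an assumption; the three limits themselves are the ones stated in this commit.

```python
import io
import zipfile

# Limits from the commit message; MB is assumed to mean 1024 * 1024 bytes.
MAX_TOTAL_BYTES = 200 * 1024 * 1024
MAX_ENTRY_BYTES = 100 * 1024 * 1024
MAX_ENTRIES = 5000


class ZipBombError(ValueError):
    """Raised when the declared uncompressed payload exceeds a cap."""


def check_zip_limits(
    data: bytes,
    *,
    max_total: int = MAX_TOTAL_BYTES,
    max_entry: int = MAX_ENTRY_BYTES,
    max_entries: int = MAX_ENTRIES,
) -> int:
    """Validate declared sizes without decompressing; return the declared total."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        infos = zf.infolist()
        if len(infos) > max_entries:
            raise ZipBombError(f"too many entries: {len(infos)}")
        total = 0
        for info in infos:
            # file_size is the uncompressed size declared in the archive.
            if info.file_size > max_entry:
                raise ZipBombError(f"entry too large: {info.filename}")
            total += info.file_size
            if total > max_total:
                raise ZipBombError("total uncompressed size too large")
    return total
```

This only inspects the archive's directory metadata, so it is cheap to run before any XML parsing starts.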
Correctness — escape layer
- Combining-diacritic regex now matches any non-control base character,
not just [A-Za-z0-9]. Greek + tilde (α̃) now produces
\tilde{\alpha}, not \alpha + bare U+0303 (which crashed pdflatex).
- Apply ALL combining marks on a base char, nesting in source order.
X with tilde and circumflex yields \hat{\tilde{X}} (was: tilde wrapped,
circumflex dropped).
- When the base is itself a math glyph (Greek, blackboard), expand inline
before wrapping so the subsequent math-glyph pass doesn't double-wrap.
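The three escape-layer rules above can be illustrated with a small sketch. The lookup tables are tiny hypothetical subsets, the function name is made up, and math-mode delimiters are omitted; it shows only the nesting and inline-expansion behaviour, not the project's actual escape pass.

```python
import unicodedata

# Illustrative subsets only; the real tables are assumed to be much larger.
COMBINING_TO_MACRO = {
    "\u0303": "\\tilde",  # combining tilde
    "\u0302": "\\hat",    # combining circumflex
    "\u0304": "\\bar",    # combining macron
    "\u0307": "\\dot",    # combining dot above
}
MATH_GLYPHS = {"\u03b1": "\\alpha", "\u03b2": "\\beta", "\u211d": "\\mathbb{R}"}


def wrap_combining_marks(text: str) -> str:
    """Rewrite base+combining-mark runs as nested LaTeX accent macros."""
    out: list[str] = []
    i = 0
    while i < len(text):
        base = text[i]
        j = i + 1
        marks: list[str] = []
        # Collect every known combining mark that follows the base character.
        while j < len(text) and text[j] in COMBINING_TO_MACRO:
            marks.append(text[j])
            j += 1
        is_control = unicodedata.category(base).startswith("C")
        if marks and not is_control and not unicodedata.combining(base):
            # Expand a math-glyph base inline so a later pass does not re-wrap it.
            body = MATH_GLYPHS.get(base, base)
            for mark in marks:  # source order: first mark ends up innermost
                body = f"{COMBINING_TO_MACRO[mark]}{{{body}}}"
            out.append(body)
        else:
            out.append(text[i:j])
        i = j
    return "".join(out)
```

For instance, wrap_combining_marks("X\u0303\u0302") gives \hat{\tilde{X}}, matching the multi-diacritic case described above.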
Correctness — parser
- vMerge row_span: count consecutive <w:vMerge/> continuation cells in
the same column position (OOXML §17.4.84) instead of hardcoding 2.
- Detect <w:pageBreakBefore/> in paragraph properties as a leading
\newpage. Catches deliberate chapter breaks in dissertations.
- Detect body-level <w:sectPr> with nextPage/oddPage/evenPage and emit
PageBreakNode (was silently ignored).
- r:link external images: emit a warning naming the broken URL so the
user can re-insert as embedded, instead of silently dropping the figure.
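The vMerge counting rule from the first parser bullet can be sketched on a simplified grid. This assumes each row has already been reduced to a list of per-column vMerge states ("restart", "continue", or None) with gridSpan already resolved; that representation and the function name are hypothetical.

```python
from typing import Optional

# Each row is a list of per-column vMerge states:
#   "restart"  -> <w:vMerge w:val="restart"/> (top cell of a merged block)
#   "continue" -> <w:vMerge/> continuation cell
#   None       -> cell is not vertically merged
Grid = list[list[Optional[str]]]


def row_span(grid: Grid, row: int, col: int) -> int:
    """Count how many rows the merged block starting at (row, col) covers."""
    if grid[row][col] != "restart":
        return 1
    span = 1
    for below in grid[row + 1:]:
        # Stop at the first row whose cell in this column is not a continuation.
        if col < len(below) and below[col] == "continue":
            span += 1
        else:
            break
    return span
```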
Correctness — renderer
- table_handler: switch columns with > 40-char cell text to p{width}
blocks so long-text cells wrap inside the page width instead of running
off the right edge.
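One way the column-type decision might look, as a rough sketch. The 40-character threshold is from this commit; the function name, the equal split of 0.9\textwidth across columns, and the default "l" alignment are assumptions made for illustration.

```python
def column_spec(
    rows: list[list[str]],
    *,
    wrap_threshold: int = 40,      # threshold stated in the commit message
    usable_fraction: float = 0.9,  # assumption: share of \textwidth to divide up
) -> str:
    """Build a tabular column spec, using p{width} for long-text columns."""
    n_cols = max((len(r) for r in rows), default=0)
    if n_cols == 0:
        return ""
    width = usable_fraction / n_cols
    spec: list[str] = []
    for col in range(n_cols):
        longest = max((len(r[col]) for r in rows if col < len(r)), default=0)
        if longest > wrap_threshold:
            spec.append(f"p{{{width:.2f}\\textwidth}}")  # wrapping column
        else:
            spec.append("l")  # short cells keep a plain left-aligned column
    return "".join(spec)
```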
Correctness — API
- /temp race condition fixed by folding the type tag into the value as a
4-byte prefix (ZIP: / TEX:). A single Redis GET now returns both content
and type atomically — previously two sequential GETs created a 1-tick
window where TTL expiry between them caused a UnicodeDecodeError 500.
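The prefix scheme above amounts to a small encode/decode pair. The two 4-byte tags are the ones named in this commit; the function names and the "zip"/"tex" kind strings are hypothetical.

```python
PREFIX_LEN = 4
PREFIXES = {"zip": b"ZIP:", "tex": b"TEX:"}
KINDS = {tag: kind for kind, tag in PREFIXES.items()}


def encode_payload(kind: str, content: bytes) -> bytes:
    """Prepend the 4-byte type tag so type and content share one Redis value."""
    return PREFIXES[kind] + content


def decode_payload(value: bytes) -> tuple[str, bytes]:
    """Split a stored value back into (kind, content)."""
    tag, content = value[:PREFIX_LEN], value[PREFIX_LEN:]
    if tag not in KINDS:
        raise ValueError(f"unknown payload tag: {tag!r}")
    return KINDS[tag], content
```

With the tag inside the value, a key either exists with both pieces or has expired entirely, so the handler can return a 404 instead of decoding half a result.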
Tests
- Greek + diacritic test (alpha + combining tilde)
- Multi-diacritic test (X + tilde + circumflex)
- Zip-bomb rejection test (300 MB uncompressed payload)
- 63 tests pass, ruff clean
Adds optional user accounts to CoreTex. When DATABASE_URL is configured,
users can sign up via email+password or OAuth (Google/GitHub), and every
conversion they run is persisted to a per-user history table. Re-uploading
an identical .docx returns the cached result in <100 ms instead of
re-running the full Pandoc + pdflatex pipeline.
Anonymous use still works exactly as before — auth is purely additive.
Backend
- New Postgres schema: users, oauth_identities, conversions, figures
- SQLAlchemy 2.0 sync engine, lazy-initialised so the app still boots
with DATABASE_URL unset
- bcrypt-hashed passwords (bcrypt<4.1 + passlib bcrypt_sha256 path)
- HS256 JWT bearer tokens with 7-day default expiry
- POST /auth/signup, POST /auth/login, GET /auth/me
- GET /auth/{google|github}/start + /callback — full authorization-code
flow with CSRF state cookie, server-side secret, fragment-bounce to
frontend
- GET /auth/providers — feature-detection endpoint for the SPA
- GET /history list (paginated), GET /history/{id} detail,
GET /history/{id}/download (.tex or .zip), DELETE /history/{id}
- /convert: optional bearer auth; if authenticated, hash the upload and
short-circuit when (user, sha256, template) already exists in DB
- worker: persists ConversionResult + figures to DB when user_id present
- main.py: includes new routers; creates tables on startup when DB enabled;
/ endpoint now reports configured features
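The /convert short-circuit above keys on (user, sha256, template). A minimal sketch of that lookup, with an in-memory dict standing in for the conversions table; the class and method names are hypothetical and no database code is shown.

```python
import hashlib
from typing import Optional


def upload_digest(data: bytes) -> str:
    """SHA-256 hex digest of the uploaded .docx bytes."""
    return hashlib.sha256(data).hexdigest()


class ConversionCache:
    """Stand-in for the conversions table, keyed by (user, sha256, template)."""

    def __init__(self) -> None:
        self._rows: dict[tuple[str, str, str], str] = {}

    def lookup(self, user_id: str, data: bytes, template: str) -> Optional[str]:
        """Return the stored conversion id for an identical upload, if any."""
        return self._rows.get((user_id, upload_digest(data), template))

    def remember(self, user_id: str, data: bytes, template: str, conversion_id: str) -> None:
        """Record a finished conversion so a re-upload can skip the pipeline."""
        self._rows[(user_id, upload_digest(data), template)] = conversion_id
```

Including the template in the key means the same document converted with a different template still runs the full pipeline.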
Frontend
- react-router-dom 6 with /, /login, /signup, /auth/callback, /history
- AuthContext + useAuth hook with localStorage token storage
- AuthPage (login + signup) with Google + GitHub OAuth buttons
(auto-hidden when not configured server-side)
- OAuthCallback page captures token from URL fragment
- HistoryPage: paginated timeline with badges (template, images, citations,
warnings, compile status), instant re-download, delete confirmation
- Header: account chip with dropdown (history + sign out), sign-in/up
CTAs when logged out
- ConverterPage extracted from App.tsx; broadcasts phase to header via
CustomEvent so the persistent topbar survives navigation
- useConversion: handles the cached/dedup-hit branch by going straight
from upload to history-download, skipping polling entirely
- New CSS: account chip + dropdown, auth shell + card, OAuth buttons,
history list rows, badges, dot-separated foot links
Tests
- 7 new auth tests (providers endpoint, signup/login/me round-trip,
duplicate rejection, bad password, auth-required guards, empty history)
- In-memory SQLite test DB so the suite runs without Postgres
- 70 tests total, ruff clean, frontend tsc + vitest + vite build all green
Docs
- DEPLOY.md: new Part 5 walking through Postgres add-on, JWT secret,
Google OAuth app creation, GitHub OAuth app creation
- README: new 'Accounts + history' feature card; tests badge 60 → 70;
API table extended with /auth and /history endpoints
Previously, a KeyError on 'access_token' bubbled up as a generic 'provider error' on the frontend, with no indication of what GitHub or Google actually objected to. We now inspect the JSON response, log the error fields (error, error_description) when access_token is missing, and bounce to the frontend with a clean 302. This lets us read the real provider complaint (e.g. incorrect_client_credentials, bad_verification_code, redirect_uri_mismatch) directly from the Railway logs.
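The token-response handling described above might look roughly like this. It is a sketch under assumptions: the function names, the logger name, and the error-in-fragment redirect format are hypothetical, and the HTTP exchange with the provider is not shown.

```python
import json
import logging
from typing import Optional
from urllib.parse import urlencode

logger = logging.getLogger("oauth")  # hypothetical logger name


def parse_token_response(body: str) -> tuple[Optional[str], dict[str, str]]:
    """Return (access_token, {}) on success or (None, error_fields) on failure."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return None, {"error": "invalid_json", "error_description": ""}
    if not isinstance(payload, dict):
        return None, {"error": "invalid_json", "error_description": ""}
    token = payload.get("access_token")
    if token:
        return str(token), {}
    errors = {
        "error": str(payload.get("error", "unknown_error")),
        "error_description": str(payload.get("error_description", "")),
    }
    # Log the provider's own complaint instead of letting a KeyError escape.
    logger.warning("OAuth token exchange failed: %s", errors)
    return None, errors


def failure_redirect(frontend_url: str, error: str) -> str:
    """Build the Location for the 302 back to the frontend (format assumed)."""
    return f"{frontend_url}#{urlencode({'error': error})}"
```

Using payload.get() rather than payload["access_token"] is what turns the old KeyError into a loggable error dict.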
…esh rule
ESLint's react-refresh/only-export-components rule was firing on AuthContext.tsx because the file exported both the AuthProvider component and the useAuth hook. Mixed exports prevent React Fast Refresh from reliably hot-reloading the file.
The clean architectural fix: extract the context object, AuthContextValue type, and useAuth hook into a plain .ts file (no JSX, not subject to the rule). AuthContext.tsx now only exports the AuthProvider component.
CI was failing on `npm run lint` with --max-warnings 0; this resolves it.
@Dia-Arora is attempting to deploy a commit to the therayyn16-7825's projects Team on Vercel. A member of the Team first needs to authorize it.
No description provided.