feat: add Google OAuth login UI and fix auth proxy #1

Open
Dia-Arora wants to merge 8 commits into TheClazer:main from Dia-Arora:feature/google-oauth-login

Conversation

@Dia-Arora

Collaborator

No description provided.

TheClazer and others added 8 commits May 15, 2026 04:35
Performance
- equation_handler: batch every equation in a document through ONE Pandoc
  subprocess instead of N. A 150-equation paper goes from ~60 s (150
  process boots) to ~0.6 s. Sentinel paragraphs around each OMML block
  let us split the batched LaTeX output back into per-equation chunks.
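
The split step described above can be sketched as follows. This is a minimal illustration, not the handler's actual code: the sentinel text `EQSENTINEL<n>END` and the function name are assumptions, since the commit does not name the real marker, and the sketch assumes one sentinel paragraph precedes each equation's output.

```python
from __future__ import annotations

import re

# Hypothetical sentinel format: one marker paragraph before each equation.
_SENTINEL_RE = re.compile(r"EQSENTINEL(\d+)END")


def split_batched_latex(batched: str) -> dict[int, str]:
    """Split one batched Pandoc output into per-equation LaTeX chunks."""
    # With a single capture group, re.split yields:
    # [text_before_first_marker, index, chunk, index, chunk, ...]
    parts = _SENTINEL_RE.split(batched)
    chunks: dict[int, str] = {}
    for i in range(1, len(parts) - 1, 2):
        chunks[int(parts[i])] = parts[i + 1].strip()
    return chunks
```

Keying the chunks by the index embedded in the sentinel, rather than by position, means a missing or empty equation does not shift every later chunk.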

Security: image decompression bomb
- Set Image.MAX_IMAGE_PIXELS = 40_000_000 (~40 MP) and upgrade Pillow's
  DecompressionBombWarning to an error. A 2 MB zip-compressed
  50000x50000 PNG bomb now raises before allocating ~7.5 GB of RAM.
- Add img.verify() pass before resampling so malformed headers fail
  early rather than partway through processing.

Security: pdflatex hardening
- Explicit -no-shell-escape flag (default but cheap to be explicit).
- Set openin_any=p and openout_any=p env vars so pdflatex can only read
  and write files in the current working dir, even if the .tex source
  attempts \input{/etc/passwd}.
- Drop timeout from 60 s to 30 s — tight enough to bound a TeX macro
  loop, loose enough for a real paper compile.
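
A rough sketch of how these three settings might combine in one subprocess call. The function names, the default `main.tex` file name, and the extra `-interaction=nonstopmode` / `-halt-on-error` flags are assumptions for illustration; only `-no-shell-escape`, the two `*_any=p` variables, and the 30 s timeout come from the commit.

```python
from __future__ import annotations

import os
import subprocess
from pathlib import Path


def build_pdflatex_call(
    tex_name: str, base_env: dict[str, str] | None = None
) -> tuple[list[str], dict[str, str]]:
    """Return (argv, env) for a hardened pdflatex run."""
    env = dict(os.environ if base_env is None else base_env)
    # kpathsea "paranoid" mode for both reads and writes.
    env["openin_any"] = "p"
    env["openout_any"] = "p"
    argv = [
        "pdflatex",
        "-no-shell-escape",          # explicit, although it is the default
        "-interaction=nonstopmode",  # assumption: never wait for user input
        "-halt-on-error",            # assumption: stop at the first error
        tex_name,
    ]
    return argv, env


def compile_tex(workdir: Path, tex_name: str = "main.tex", timeout_s: int = 30):
    """Run pdflatex inside workdir; raises subprocess.TimeoutExpired on overrun."""
    argv, env = build_pdflatex_call(tex_name)
    return subprocess.run(
        argv, cwd=workdir, env=env, capture_output=True,
        timeout=timeout_s, check=False,
    )
```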

Security: drop pickle from Redis
- Figures were stored as pickle.dumps({filename: bytes}) under one key.
  Replaced with per-file raw-byte keys (figures:{job_id}:f:{name}) plus
  a newline-delimited manifest. Eliminates Python-specific
  deserialisation risk and avoids the single-blob bottleneck.
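
The key layout could look roughly like this. It is a sketch under assumptions: `DictStore` is an in-memory stand-in exposing only the `set`/`get` subset of a Redis client, the manifest key name is hypothetical (the commit only gives the per-file pattern), and file names are assumed not to contain newlines.

```python
from __future__ import annotations


class DictStore:
    """In-memory stand-in for the set/get subset of a Redis client."""

    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}

    def set(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> bytes | None:
        return self._data.get(key)


def figure_key(job_id: str, name: str) -> str:
    return f"figures:{job_id}:f:{name}"


def manifest_key(job_id: str) -> str:
    return f"figures:{job_id}:manifest"  # hypothetical key name


def save_figures(store, job_id: str, figures: dict[str, bytes]) -> None:
    """Write each figure as raw bytes, then a newline-delimited manifest."""
    for name, data in figures.items():
        store.set(figure_key(job_id, name), data)
    store.set(manifest_key(job_id), "\n".join(figures).encode("utf-8"))


def load_figures(store, job_id: str) -> dict[str, bytes]:
    """Read the manifest, then fetch each listed figure; no unpickling."""
    raw = store.get(manifest_key(job_id))
    if not raw:
        return {}
    figures: dict[str, bytes] = {}
    for name in raw.decode("utf-8").split("\n"):
        data = store.get(figure_key(job_id, name))
        if data is not None:
            figures[name] = data
    return figures
```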

Correctness: image fidelity
- Preserve EXIF orientation and ICC colour profile through Pillow
  recompression. A photo with a landscape orientation tag no longer
  renders rotated; a CMYK medical scan keeps its colour intent.

Correctness: i18n list detection
- Parser's style-name fallback now covers English, French, German,
  Spanish, Italian, Portuguese list-style names. As a final probe,
  walks the style chain via the styles part to find an inherited
  w:numPr — locale-independent, catches custom-named list styles.

Docs
- DEPLOY.md: new 'Scaling constraints' section spells out the v1
  trade-offs (single worker, Redis-staged figures, upload memory
  duplication, TeX Live image size) plus the upgrade paths.
…ecurity

- Bump test count badge: 48 → 60
- Comparison table: add Unicode math row, i18n list row, bomb-hardening
  row; update equation entry to (batched); 20-page paper conversion
  time: ~8 s → ~3 s
- Features grid: 140 unicode math glyphs, batched Pandoc, EXIF/ICC
  preserved, i18n lists. Production-grade card: streaming upload,
  Pillow bomb cap, pdflatex sandbox flags, no-pickle Redis, explicit
  CORS allowlist
- Smart preamble: mention amssymb auto-injection for unicode math
- Sequence diagram: updated to reflect batched Pandoc + sandbox flags
  + per-file figure keys (replacing pickled dict)
- API table: /temp/{id}.tex/.zip aliases now documented
- Architectural decisions: 2 new collapsible sections covering the
  unicode-math-at-escape strategy, single-Pandoc batching rationale,
  and the no-pickle Redis design
- Known limitations: split into Content (unchanged) + Scaling
  (deliberate v1 trade-offs with upgrade paths to S3 / Pro / etc.)
- Roadmap: new 'v1.1 (shipped — hardening)' section with each piece
  of recent work; v2 timeline shifted to reflect the work already done
10 forensic-audit findings addressed.

Security
- parser: enforce uncompressed-size caps before lxml touches document.xml.
  Refuses zips whose declared payload exceeds 200 MB total / 100 MB per
  entry / 5000 entries. A 20 MB upload could otherwise expand to 20 GB
  (1026x ratio observed) and OOM-kill the worker.
- rate_limit: new shared Limiter module that wires SlowAPI to Redis when
  REDIS_URL is present. Previously two independent in-memory Limiters
  existed in main.py and routes.py, and the active one didn't survive
  process restarts or scale across uvicorn workers.
- routes: switch the job-ID generator from UUID4 to
  secrets.token_urlsafe(16). UUID4 was already CSPRNG-backed; the change
  only makes the security intent explicit.
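
The size-cap check in the first bullet can be approximated with the standard library alone. This is a sketch, not the parser's code: the function and exception names are made up, and "MB" is assumed to mean 1024 * 1024 bytes. It reads only the sizes declared in the zip's central directory, so nothing is decompressed.

```python
from __future__ import annotations

import io
import zipfile

MAX_TOTAL_BYTES = 200 * 1024 * 1024  # 200 MB total (assumed binary MB)
MAX_ENTRY_BYTES = 100 * 1024 * 1024  # 100 MB per entry
MAX_ENTRIES = 5000


class ZipTooLargeError(ValueError):
    """Raised when a zip's declared uncompressed payload exceeds a cap."""


def check_zip_caps(
    data: bytes,
    max_total: int = MAX_TOTAL_BYTES,
    max_entry: int = MAX_ENTRY_BYTES,
    max_entries: int = MAX_ENTRIES,
) -> int:
    """Return the declared uncompressed total, or raise ZipTooLargeError."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        infos = zf.infolist()
        if len(infos) > max_entries:
            raise ZipTooLargeError(f"{len(infos)} entries > {max_entries}")
        total = 0
        for info in infos:
            # file_size is the declared uncompressed size of the entry.
            if info.file_size > max_entry:
                raise ZipTooLargeError(f"{info.filename} exceeds per-entry cap")
            total += info.file_size
            if total > max_total:
                raise ZipTooLargeError("declared total exceeds cap")
    return total
```

Declared sizes come from the archive itself, so a defensive reader would probably also bound the bytes actually read per entry; that part is not shown here.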

Correctness — escape layer
- Combining-diacritic regex now matches any non-control base character,
  not just [A-Za-z0-9]. Greek + tilde (α̃) now produces
  \tilde{\alpha}, not \alpha + bare U+0303 (which crashed pdflatex).
- Apply ALL combining marks on a base char, nesting in source order.
  X with tilde and circumflex yields \hat{\tilde{X}} (was: tilde wrapped,
  circumflex dropped).
- When the base is itself a math glyph (Greek, blackboard), expand inline
  before wrapping so the subsequent math-glyph pass doesn't double-wrap.
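
The three rules above can be illustrated with a small sketch. The accent and glyph tables are deliberately tiny and hypothetical, and the function name is an assumption; the real escape layer presumably covers far more marks and glyphs.

```python
from __future__ import annotations

import unicodedata

# Hypothetical, deliberately small tables for illustration only.
ACCENTS = {"\u0303": "tilde", "\u0302": "hat", "\u0304": "bar", "\u0307": "dot"}
MATH_GLYPHS = {"α": r"\alpha", "β": r"\beta"}


def wrap_combining(text: str) -> str:
    """Rewrite base+combining-mark runs as nested LaTeX accent commands."""
    out: list[str] = []
    i = 0
    while i < len(text):
        base = text[i]
        j = i + 1
        marks: list[str] = []
        # Collect every combining mark attached to this base character.
        while j < len(text) and text[j] in ACCENTS:
            marks.append(text[j])
            j += 1
        is_control = unicodedata.category(base).startswith("C")
        if marks and not is_control and base not in ACCENTS:
            # Expand a math-glyph base first, then nest in source order.
            latex = MATH_GLYPHS.get(base, base)
            for mark in marks:
                latex = f"\\{ACCENTS[mark]}{{{latex}}}"
            out.append(latex)
        else:
            out.append(text[i:j])
        i = j
    return "".join(out)
```

The output is math-mode content; wrapping it in math delimiters is left to the caller in this sketch.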

Correctness — parser
- vMerge row_span: count consecutive <w:vMerge/> continuation cells in
  the same column position (OOXML §17.4.84) instead of hardcoding 2.
- Detect <w:pageBreakBefore/> in paragraph properties as a leading
  \newpage. Catches deliberate chapter breaks in dissertations.
- Detect body-level <w:sectPr> with nextPage/oddPage/evenPage and emit
  PageBreakNode (was silently ignored).
- r:link external images: emit a warning naming the broken URL so the
  user can re-insert as embedded, instead of silently dropping the figure.
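
For the vMerge fix, the counting rule reduces to a short loop once each row's cell state in a given column is known. The sketch below assumes the caller has already resolved column positions and passes one state per row: "restart", "continue", or None for an unmerged cell. The function name and this list representation are illustrative, not the parser's data model.

```python
from __future__ import annotations


def row_span(column_states: list[str | None], start_row: int) -> int:
    """Count rows covered by a vertical merge that starts at start_row.

    column_states holds, for a single column position, each row's vMerge
    state: "restart", "continue", or None when the cell is not merged.
    """
    if column_states[start_row] != "restart":
        return 1
    span = 1
    # Extend the span over consecutive continuation cells only.
    for state in column_states[start_row + 1:]:
        if state != "continue":
            break
        span += 1
    return span
```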

Correctness — renderer
- table_handler: switch columns with > 40-char cell text to p{width}
  blocks so long-text cells wrap inside the page width instead of running
  off the right edge.

Correctness — API
- /temp race condition fixed by folding the type tag into the value as a
  4-byte prefix (ZIP: / TEX:). A single Redis GET now returns both content
  and type atomically — previously two sequential GETs created a 1-tick
  window where TTL expiry between them caused a UnicodeDecodeError 500.
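
A minimal sketch of the prefix scheme, assuming the two tags are exactly the 4-byte strings named above. The function names are hypothetical.

```python
from __future__ import annotations

TAG_LEN = 4
TAGS = {"zip": b"ZIP:", "tex": b"TEX:"}


def pack(kind: str, payload: bytes) -> bytes:
    """Prefix the payload with its 4-byte type tag before storing it."""
    return TAGS[kind] + payload


def unpack(value: bytes) -> tuple[str, bytes]:
    """Recover (kind, payload) from one stored value, i.e. one GET."""
    tag, payload = value[:TAG_LEN], value[TAG_LEN:]
    for kind, known in TAGS.items():
        if tag == known:
            return kind, payload
    raise ValueError(f"unknown type tag: {tag!r}")
```

Because type and content live in one value, they expire together; a reader either sees both or neither.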

Tests
- Greek + diacritic test (alpha + combining tilde)
- Multi-diacritic test (X + tilde + circumflex)
- Zip-bomb rejection test (300 MB uncompressed payload)
- 63 tests pass, ruff clean
Adds optional user accounts to CoreTex. When DATABASE_URL is configured,
users can sign up via email+password or OAuth (Google/GitHub), and every
conversion they run is persisted to a per-user history table. Re-uploading
an identical .docx returns the cached result in <100 ms instead of
re-running the full Pandoc + pdflatex pipeline.

Anonymous use still works exactly as before — auth is purely additive.

Backend
- New Postgres schema: users, oauth_identities, conversions, figures
- SQLAlchemy 2.0 sync engine, lazy-initialised so the app still boots
  with DATABASE_URL unset
- bcrypt-hashed passwords (bcrypt<4.1 + passlib bcrypt_sha256 path)
- HS256 JWT bearer tokens with 7-day default expiry
- POST /auth/signup, POST /auth/login, GET /auth/me
- GET /auth/{google|github}/start + /callback — full authorization-code
  flow with CSRF state cookie, server-side secret, fragment-bounce to
  frontend
- GET /auth/providers — feature-detection endpoint for the SPA
- GET /history list (paginated), GET /history/{id} detail,
  GET /history/{id}/download (.tex or .zip), DELETE /history/{id}
- /convert: optional bearer auth; if authenticated, hash the upload and
  short-circuit when (user, sha256, template) already exists in DB
- worker: persists ConversionResult + figures to DB when user_id present
- main.py: includes new routers; creates tables on startup when DB enabled;
  / endpoint now reports configured features
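
The /convert short-circuit can be sketched as below. A plain dict stands in for the conversions table, and `run_pipeline` is a hypothetical callable representing the Pandoc + pdflatex work; neither is taken from the actual code.

```python
from __future__ import annotations

import hashlib
from typing import Callable, Optional


def convert_with_dedup(
    cache: dict[tuple[int, str, str], str],
    user_id: Optional[int],
    upload: bytes,
    template: str,
    run_pipeline: Callable[[bytes, str], str],
) -> tuple[str, bool]:
    """Return (result, was_cached). Anonymous uploads always run the pipeline."""
    if user_id is None:
        return run_pipeline(upload, template), False
    # Lookup key mirrors the (user, sha256, template) triple.
    key = (user_id, hashlib.sha256(upload).hexdigest(), template)
    if key in cache:
        return cache[key], True
    result = run_pipeline(upload, template)
    cache[key] = result
    return result, False
```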

Frontend
- react-router-dom 6 with /, /login, /signup, /auth/callback, /history
- AuthContext + useAuth hook with localStorage token storage
- AuthPage (login + signup) with Google + GitHub OAuth buttons
  (auto-hidden when not configured server-side)
- OAuthCallback page captures token from URL fragment
- HistoryPage: paginated timeline with badges (template, images, citations,
  warnings, compile status), instant re-download, delete confirmation
- Header: account chip with dropdown (history + sign out), sign-in/up
  CTAs when logged out
- ConverterPage extracted from App.tsx; broadcasts phase to header via
  CustomEvent so the persistent topbar survives navigation
- useConversion: handles the cached/dedup-hit branch by going straight
  from upload to history-download, skipping polling entirely
- New CSS: account chip + dropdown, auth shell + card, OAuth buttons,
  history list rows, badges, dot-separated foot links

Tests
- 7 new auth tests (providers endpoint, signup/login/me round-trip,
  duplicate rejection, bad password, auth-required guards, empty history)
- In-memory SQLite test DB so the suite runs without Postgres
- 70 tests total, ruff clean, frontend tsc + vitest + vite build all green

Docs
- DEPLOY.md: new Part 5 walking through Postgres add-on, JWT secret,
  Google OAuth app creation, GitHub OAuth app creation
- README: new 'Accounts + history' feature card; tests badge 60 → 70;
  API table extended with /auth and /history endpoints
Previously a KeyError on 'access_token' bubbled up as a generic
'provider error' on the frontend with no clue what GitHub or Google
actually objected to. We now inspect the JSON, log the error fields
(error, error_description) when access_token is missing, and bounce
to the frontend with a clean 302. Lets us read the real provider
complaint (e.g. incorrect_client_credentials, bad_verification_code,
redirect_uri_mismatch) directly from Railway logs.
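
In outline, the new handling might look like this. The function names, the logger name, and the `error=provider_error` fragment parameter are assumptions; the `/auth/callback` route and the `error` / `error_description` fields are the ones mentioned in this PR.

```python
from __future__ import annotations

import logging
from urllib.parse import urlencode

logger = logging.getLogger("oauth")  # hypothetical logger name


def extract_access_token(payload: dict) -> str | None:
    """Return the access token, or log the provider's complaint and return None."""
    token = payload.get("access_token")
    if token:
        return token
    logger.warning(
        "OAuth token exchange failed: error=%s error_description=%s",
        payload.get("error"),
        payload.get("error_description"),
    )
    return None


def failure_redirect(frontend_url: str, reason: str = "provider_error") -> str:
    """Build the Location value for the 302 bounce back to the frontend."""
    fragment = urlencode({"error": reason})  # hypothetical parameter name
    return f"{frontend_url.rstrip('/')}/auth/callback#{fragment}"
```

Using `payload.get` instead of indexing is what removes the KeyError: a missing token becomes an ordinary branch rather than an exception.
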
…esh rule

ESLint's react-refresh/only-export-components rule was firing on AuthContext.tsx
because the file exported both the AuthProvider component and the useAuth hook.
Mixed exports prevent React Fast Refresh from reliably hot-reloading the file.

The clean architectural fix: extract the context object, AuthContextValue type,
and useAuth hook into a plain .ts file (no JSX, not subject to the rule).
AuthContext.tsx now only exports the AuthProvider component.

CI was failing on `npm run lint` with --max-warnings 0; this resolves it.
@vercel

vercel Bot commented May 27, 2026


@Dia-Arora is attempting to deploy a commit to the therayyn16-7825's projects Team on Vercel.

A member of the Team first needs to authorize it.

2 participants