Self-hosted study companion: turn lecture PDFs into audiobooks, flashcards and exam sessions — entirely on your own hardware. No cloud, no API credits, no subscription.
Note: Personal hobby project for data-obsessed self-hosters. Not intended for commercial use, no warranties, no support guarantees.
Built alongside a job with a lot of time behind the wheel and an open-university degree on the side. Reality of that combination:
- Lecture scripts pile up faster than evenings to read them. A ~100-page chapter eats a Sunday; a commute eats nothing.
- The cloud-AI study tools out there want to host your scripts. Some of mine come from a private university, some from internal training — those PDFs aren't going on a SaaS server.
- And honestly: while every "AI study buddy" still routes everything through someone else's GPU 🙃 I'd rather burn my own electricity.
So I built PageMentor: drop a PDF in, and the same evening you get back an M4B audiobook for the car, an AI-generated flashcard deck for the train and a timed exam trainer for the weekend before the actual exam. All three driven by a local Ollama + local TTS stack, with cloud LLMs as optional fallback.
If you're in the same bucket — long drives, open-university study, sensitive material, limited quiet desk time — the tool was designed for exactly that.
PageMentor is a fresh build and does not fork any existing project.
Three things, one drag-and-drop:
Drop in a PDF, get back an M4B audiobook with chapter markers — narrated by local Qwen3-TTS-Flash (Apache 2.0) or Microsoft Edge Neural as fallback. One-click upload to Audiobookshelf, or subscribe as a private podcast feed.
AI-generated flashcards + SM-2 spaced repetition. Exercises are pulled automatically from the chapter text, cards are generated by a local Ollama model, and the review loop tracks your progress with keyboard shortcuts.
Timed exam trainer with score history. Picks questions from your card deck, times you, gives you side-by-side review of your answers versus the reference — and plots the trend as a sparkline.
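For readers curious what SM-2 does with a review grade, here is a minimal sketch of the commonly published form of the algorithm. It is not PageMentor's actual code: the `CardState` shape, the `reviewCard` name and the choice to leave the ease factor untouched on a failed recall are assumptions made for illustration.

```typescript
// Hypothetical card state; field names are illustrative, not the real schema.
interface CardState {
  repetitions: number;  // consecutive successful reviews
  intervalDays: number; // days until the next review
  easeFactor: number;   // SM-2 E-Factor, never below 1.3
}

const INITIAL_STATE: CardState = { repetitions: 0, intervalDays: 0, easeFactor: 2.5 };

// quality: 0 (blackout) to 5 (perfect recall), as in the published SM-2 description.
function reviewCard(state: CardState, quality: number): CardState {
  if (!Number.isInteger(quality) || quality < 0 || quality > 5) {
    throw new RangeError("quality must be an integer from 0 to 5");
  }
  if (quality < 3) {
    // Failed recall: restart the repetition sequence, keep the ease factor.
    return { repetitions: 0, intervalDays: 1, easeFactor: state.easeFactor };
  }
  // Ease factor update from the SM-2 formula, clamped at 1.3.
  const delta = 0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02);
  const easeFactor = Math.max(1.3, state.easeFactor + delta);
  const repetitions = state.repetitions + 1;

  let intervalDays: number;
  if (repetitions === 1) intervalDays = 1;
  else if (repetitions === 2) intervalDays = 6;
  else intervalDays = Math.round(state.intervalDays * easeFactor);

  return { repetitions, intervalDays, easeFactor };
}
```

Three perfect answers in a row push a new card out to intervals of 1, 6 and then 17 days; a grade below 3 sends it back to a one-day interval.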
- 🔒 Your PDFs never leave your network — ideal for university scripts, internal training material, anything you don't want in a SaaS log.
- 🆓 No API keys required — every feature runs against a local Ollama + local TTS. Cloud LLMs (Claude / OpenAI / Gemini) are optional fallbacks, configured from the UI.
- 🏠 One Docker stack — runs on a Synology NAS (bind-mounted data), scales to a GPU workstation when you want speed.
- Server know-how. This is Docker-Compose + Portainer territory. If you've never bind-mounted a NAS volume or opened a stack file, expect a learning curve.
- A decent GPU on the host that runs Ollama and Qwen3-TTS. A used RTX 3090 (24 GB) handles the full stack comfortably; smaller cards work if you scale the models down.
Every heavy component is swappable from the Settings modal, so the stack scales from a laptop to a 24 GB workstation:
- Ollama model — swap `llama3.2:3b` for `qwen2.5:7b`, `gemma3:27b` or whatever your VRAM budget allows; models pulled on the Ollama box show up automatically.
- Qwen3-TTS model variant — pick between Qwen-Base (voice-clone), CustomVoice (preset slots) and VoiceDesign (text-described). Lower `max_new_tokens` if your GPU runs out of memory.
- TTS chunk size — max characters per synthesis request. Smaller chunks (1500–2500) are safer on a 12 GB card, larger chunks (3500–4000) give better prosody continuity on a 24 GB card.
- Embedding model — auto-detected, but you can pin any model the Ollama server has pulled.
None of this requires rebuilding the container — all of it persists
in data/llm-config.json and survives restarts.
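To make the chunk-size setting concrete, here is a rough sketch of how text could be packed into synthesis requests that stay under a character limit. The `chunkForTts` function and its splitting rule (sentence punctuation followed by whitespace) are assumptions for illustration; the real segmenter may split differently.

```typescript
// Hypothetical helper: packs whole sentences into chunks of at most maxChars.
function chunkForTts(text: string, maxChars: number): string[] {
  if (maxChars <= 0) throw new RangeError("maxChars must be positive");

  // A "sentence" ends at ., ! or ? followed by whitespace, or at end of input.
  const sentences = (text.match(/[\s\S]+?(?:[.!?]+(?=\s)|$)/g) ?? [])
    .map((s) => s.trim())
    .filter((s) => s.length > 0);

  const chunks: string[] = [];
  let current = "";
  for (const sentence of sentences) {
    let rest = sentence;
    // A single sentence longer than the limit is hard-split.
    while (rest.length > maxChars) {
      if (current) {
        chunks.push(current);
        current = "";
      }
      chunks.push(rest.slice(0, maxChars));
      rest = rest.slice(maxChars);
    }
    if (rest.length === 0) continue;

    const candidate = current ? current + " " + rest : rest;
    if (candidate.length <= maxChars) {
      current = candidate;
    } else {
      chunks.push(current);
      current = rest;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Keeping sentences whole is what makes larger chunks sound better: fewer requests means fewer places where the narrator's intonation restarts.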
Screenshots: Learning menu (sub-tabs per document), Modules overview (series view), Modules management.
Source: `docs/diagrams/architecture.puml`.
Render with `plantuml -tpng docs/diagrams/architecture.puml -o ../screenshots`.
Deeper diagrams — pipeline + GPU arbitration
Source: docs/diagrams/pipeline.puml.
Source: docs/diagrams/gpu-arbiter.puml.
- Library — card grid with type/tag/module filters, pipeline-dot dashboard per document (text → chapters → audio → cards → summary), live SSE status, cover rendered from any page you choose.
- Modules & tags — organise by module (series) + free-form tags, instead of a folder tree.
- PDF viewer — continuous scroll, zoom, page jump, text-selection → flashcard.
- Chapter detection — heading heuristic + page-slice fallback, long chapters auto-split, inline editor (rename, merge, delete).
- Audio cleanup — two-stage. Extractor strips watermarks, running heads, figure/table captions and inline references. Then a local LLM (Ollama) rewrites for narration: expands abbreviations, verbalises formulas and Greek letters, strips citations, removes bullet markers. Cached per chapter (`chapters.audio_text`) — regeneration is free.
- Audiobook (Qwen3-TTS) — default engine is Qwen3-TTS-Flash served via vLLM behind `pm-gpu-arbiter` (Apache 2.0, fully local, university-/company-safe). Four curated German presets (Anna / Emma / Klaus / Max) with fixed per-voice seeds — same voice always sounds identical across runs. Additional preset-voice slots (CustomVoice) and text-described voices (VoiceDesign) are editable from the Settings modal.
- Voice cloning (optional) — upload a 3–10 s reference clip and the Qwen-Base path clones the voice; cached in the sidecar for stable narrator quality across chapters.
- Audiobook (Edge fallback) — Microsoft Edge Neural TTS via `msedge-tts`, no GPU required, works immediately for quick tests.
- Audio output — EBU R128 loudness normalisation, MP3 per chapter and a complete M4B with chapter markers, cover, ID3 tags (title / author / album = series / grouping = short code).
- Audiobookshelf integration — one-click upload of the finished M4B (with cover) to a configured Audiobookshelf library; backend resolves the freshly scanned item via `/search` and `PATCH`es subtitle / series / description cleanly — no reliance on MP4 iTunes-atom mapping.
- Send to Apple — M4B download for Apple Books, private RSS feed for Apple Podcasts / Overcast / Pocket Casts (token derived from `PM_PASSWORD`).
- Flashcards — task extraction from chapter text, AI generator via local Ollama, SM-2 spaced repetition with hints, CSV export/import (Anki-compatible).
- Exam trainer — timed sessions with self-assessment, side-by-side review and a sparkline-chart score history.
- Summaries — Ollama-powered, three detail levels per chapter, streaming output, document-language aware.
- Full-text search — Meilisearch over documents and chapters with in-place highlights.
- RAG (experimental) — embedding auto-detection on the Ollama server (prefers `jina/jina-embeddings-v2-base-de`), chunk-based per-document retrieval, feeds Q&A and AI cards.
- GPU manager — a `GPU reset` button flushes all Ollama models and restarts the TTS container to reclaim VRAM between long runs.
- i18n — English + German UI, auto-detects browser language, override persists in settings.
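The M4B chapter markers mentioned in the audio-output item can be produced by handing FFmpeg a metadata file. As a sketch only, assuming the per-chapter durations are already known, such a file could be built like this; `ChapterAudio`, `escapeMeta` and `buildChapterMetadata` are illustrative names, not functions from this repo.

```typescript
// Hypothetical input: one entry per rendered chapter, in playback order.
interface ChapterAudio {
  title: string;
  durationMs: number;
}

// FFmpeg's metadata format requires '=', ';', '#', '\' and newlines
// to be escaped with a backslash.
function escapeMeta(value: string): string {
  return value.replace(/([=;#\\\n])/g, "\\$1");
}

// Builds an FFMETADATA1 document with one [CHAPTER] section per chapter.
function buildChapterMetadata(bookTitle: string, chapters: ChapterAudio[]): string {
  const lines: string[] = [";FFMETADATA1", `title=${escapeMeta(bookTitle)}`];
  let startMs = 0;
  for (const chapter of chapters) {
    const endMs = startMs + Math.round(chapter.durationMs);
    lines.push(
      "[CHAPTER]",
      "TIMEBASE=1/1000", // START/END are expressed in milliseconds
      `START=${startMs}`,
      `END=${endMs}`,
      `title=${escapeMeta(chapter.title)}`,
    );
    startMs = endMs; // next chapter begins where this one ends
  }
  return lines.join("\n") + "\n";
}
```

The resulting text is typically written to a file and passed to FFmpeg as a second input with `-map_metadata 1`; how PageMentor wires this into its own FFmpeg call is not shown here.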
cp .env.example .env # set PM_PASSWORD
docker compose up -d # backend + Meilisearch
# optional, for summaries and AI cards:
docker compose --profile ai up -d # adds Ollama
open http://localhost:3100

The repo ships with a second compose file that uses bind mounts
under /volume1/docker/pagementor/… instead of Docker-managed named
volumes. That way your uploads, SQLite database and generated audio
survive container rebuilds and are visible through File Station.
SSH into the NAS or use File Station and create:
mkdir -p /volume1/docker/pagementor/data
mkdir -p /volume1/docker/pagementor/meili
mkdir -p /volume1/docker/pagementor/ollama   # optional, only if you enable the AI profile

- Stacks → Add stack → Web editor
- Name: `pagementor`
- Paste the contents of `docker-compose.yml` — it ships pre-configured with `/volume1/docker/pagementor/{data,meili,ollama}` bind mounts for Synology.
- Environment variables → Advanced mode — paste and tweak:

      PM_PORT=3100
      PM_PASSWORD=change-me-to-something-long
      MEILI_MASTER_KEY=change-me-to-a-long-random-key
      PM_OLLAMA_URL=http://ollama:11434
      PM_OLLAMA_MODEL=llama3.1:8b

- Deploy the stack.
The image is pulled from ghcr.io/theingof/pagementor:latest. The
package is public, so no token is required for docker pull. Skip the
PAT setup unless you switch to the private -dev channel:
- Go to GitHub → Settings → Developer settings → Personal access tokens (classic)
- Generate a token with scope `read:packages`
- In Portainer: Registries → Add registry → Custom → `ghcr.io`, username = your GitHub handle, password = the token
After the first successful deploy, open http://NAS-IP:3100, log in
with the PM_PASSWORD you set, and upload a PDF.
When a new tag or push lands on main, GitHub Actions builds a new
image. In Portainer, open the stack and click Update (with "Pull
the latest images" enabled). All your data lives on the bind-mounted
folders and stays intact.
Ollama needs some horsepower; a Synology is borderline for small models
(llama3.2:3b or qwen2.5:3b are realistic). To enable it, keep the
ollama service in the stack (it's already defined but tied to the
ai profile). In Portainer, the profile gate is ignored — the service
will start as long as it's in the YAML. Pull a model once it's up:
docker exec -it pm-ollama ollama pull llama3.2:3b

Then set PM_OLLAMA_MODEL=llama3.2:3b in the stack env and redeploy.
For real throughput (Gemma 26B, Qwen3-TTS voice-cloning, GPU Docling), run the heavy sidecars on a separate NVIDIA box. A complete example compose lives at docker-compose.gpu-host.example.yaml:
# on the GPU host:
curl -fLO https://raw.githubusercontent.com/TheInGoF/PageMentor/main/docker-compose.gpu-host.example.yaml
mv docker-compose.gpu-host.example.yaml docker-compose.yaml
# edit DOCLING_SERVE_API_KEY to match your NAS .env
docker compose pull
docker compose up -d

That brings up ollama, docling, qwen-tts, and pm-gpu-arbiter.
PageMentor (on the NAS) then only talks to the arbiter at
http://<gpu-host-ip>:8790 — the arbiter serializes GPU access
between the LLM and the TTS sidecar. Set that URL once in the
Settings modal (LLM → Ollama URL, and Local TTS → Arbiter URL) and
it persists in llm-config.json.
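The idea of serializing GPU access can be pictured as a simple queue: a job only starts once the previous one has finished. The sketch below shows that pattern in-process with a promise chain. The real arbiter is a separate service reached over HTTP, so the `GpuLock` class is an assumption used purely to illustrate the ordering guarantee.

```typescript
// Hypothetical in-process lock: jobs run strictly one after another.
class GpuLock {
  private tail: Promise<void> = Promise.resolve();

  run<T>(job: () => Promise<T>): Promise<T> {
    // Start the job only after everything queued before it has settled.
    const result = this.tail.then(job);
    // Swallow the outcome on the queue side so one failed job
    // does not block the jobs behind it.
    this.tail = result.then(
      () => undefined,
      () => undefined,
    );
    return result;
  }
}
```

With this pattern an LLM request and a TTS request submitted at the same moment never overlap on the GPU; the second simply waits its turn, and a failure in the first does not stall the second.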
Images ghcr.io/theingof/pm-gpu-arbiter and
ghcr.io/theingof/pm-qwen-tts are built and pushed by the GitHub
Actions workflows in this repo (publish-arbiter.yml,
publish-qwen-tts.yml) — you only docker compose pull, never build
locally.
# prerequisites
brew install bun ffmpeg
npm run install:all # backend (bun) + frontend (npm)
PM_PASSWORD=devpass npm run dev     # backend :3000, frontend :5173

Vite proxies /api to :3000, so the app opens at
http://localhost:5173.
| Variable | Default | Purpose |
|---|---|---|
| `PM_PASSWORD` | `changeme` | Login password (required) |
| `PM_PORT` | `3100` | Host port for the container |
| `PM_DATA_DIR` | `./data` | Uploads + SQLite + audio files |
| `MEILI_URL` | `http://meilisearch:7700` | Meilisearch endpoint |
| `MEILI_MASTER_KEY` | `pm-dev-key` | Meilisearch API key |
Meilisearch is optional (full-text search disables itself if missing).
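As a rough illustration of how these variables and their defaults fit together, a loader along the following lines would do. The variable names and defaults are taken from the table; the `AppConfig` interface, the `loadConfig` function and the port validation are assumptions, not the backend's actual code.

```typescript
// Hypothetical config shape mirroring the documented variables.
interface AppConfig {
  password: string;
  port: number;
  dataDir: string;
  meiliUrl: string;
  meiliMasterKey: string;
}

// Takes the environment as a plain object so it can be tested without a runtime.
function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const port = Number(env.PM_PORT ?? "3100");
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`PM_PORT must be a port number, got "${env.PM_PORT}"`);
  }
  return {
    password: env.PM_PASSWORD ?? "changeme",
    port,
    dataDir: env.PM_DATA_DIR ?? "./data",
    meiliUrl: env.MEILI_URL ?? "http://meilisearch:7700",
    meiliMasterKey: env.MEILI_MASTER_KEY ?? "pm-dev-key",
  };
}
```

In a Bun or Node process this would be called as `loadConfig(process.env)`. Since the default password is a placeholder, set `PM_PASSWORD` explicitly on anything reachable from outside your LAN.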
LLM chain — configured in the app, not in ENV. Open the Settings modal and fill in whichever endpoints you have:
- Ollama URLs for up to three boxes (PC, Mac, generic) — model name can stay blank and PageMentor picks whatever the server has pulled automatically
- API keys for Claude, OpenAI and/or Gemini as paid/free-tier fallbacks
The config persists to $PM_DATA_DIR/llm-config.json and survives
container rebuilds. The chain tries PC-Ollama → Mac-Ollama → generic
Ollama → Claude → OpenAI → Gemini → none; the first reachable wins.
If nothing responds, the Summary and AI-Card tabs show a "not
configured" hint and the rest of the app keeps working.
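The "first reachable wins" behaviour can be sketched as a loop over the chain in its documented order. The provider names below are made up for illustration, and `isReachable` stands in for whatever health check the backend really performs; only the ordering is taken from the description above.

```typescript
// Hypothetical provider identifiers, listed in the documented fallback order.
type ProviderName =
  | "pc-ollama"
  | "mac-ollama"
  | "generic-ollama"
  | "claude"
  | "openai"
  | "gemini";

interface Provider {
  name: ProviderName;
  isReachable: () => Promise<boolean>; // stand-in for a real health check
}

const CHAIN_ORDER: ProviderName[] = [
  "pc-ollama",
  "mac-ollama",
  "generic-ollama",
  "claude",
  "openai",
  "gemini",
];

// Returns the first configured provider that answers, or "none".
async function pickProvider(configured: Provider[]): Promise<ProviderName | "none"> {
  for (const name of CHAIN_ORDER) {
    const provider = configured.find((p) => p.name === name);
    if (!provider) continue; // endpoint left blank in Settings
    try {
      if (await provider.isReachable()) return name;
    } catch {
      // A failing probe counts as unreachable; try the next one.
    }
  }
  return "none";
}
```

Probing sequentially keeps local boxes ahead of paid APIs even when a cloud endpoint would answer faster.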
- Backend: Bun + Hono + better-sqlite3
- Frontend: Vite + React 19 + Tailwind (night theme, red accents)
- TTS (default): Qwen3-TTS-Flash (Apache 2.0) served via vLLM's OpenAI-compatible server on a GPU host, proxied through pm-gpu-arbiter which serializes GPU access with Ollama. Four curated German voice slots (Anna / Emma / Klaus / Max) map to Qwen built-in speakers with fixed per-voice seeds — same voice always produces identical audio.
- TTS (fallback): Microsoft Edge Neural voices via msedge-tts
- Text extraction: Docling-Serve sidecar
- Search: Meilisearch v1.12
- LLM: Ollama (optional, self-hosted) — also drives the audio text preprocessor before TTS
- Embeddings: auto-detected against the Ollama server, prefers `jina/jina-embeddings-v2-base-de` (Apache 2.0, bilingual DE+EN, trained on German corpora — best pick for German study material), falls back to `bge-m3` (multilingual) or `nomic-embed-text` (English-focused last resort)
- Audio: FFmpeg (loudnorm + M4B chapters, invoked as subprocess)
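The embedding preference order can be read as a small lookup over whatever the Ollama server has pulled. The sketch below assumes model names may carry a tag suffix such as `:latest`; `pickEmbeddingModel` and its `pinned` parameter are hypothetical names, and the real auto-detection may work differently.

```typescript
// Preference order as documented: German-tuned first, English-focused last.
const PREFERRED_EMBEDDING_MODELS = [
  "jina/jina-embeddings-v2-base-de",
  "bge-m3",
  "nomic-embed-text",
];

// pulled: model names reported by the server; pinned: optional manual override.
function pickEmbeddingModel(pulled: string[], pinned?: string): string | undefined {
  // Strip a tag like ":latest" before comparing names.
  const baseName = (model: string) => model.split(":")[0];

  if (pinned && pulled.includes(pinned)) return pinned;
  for (const preferred of PREFERRED_EMBEDDING_MODELS) {
    const hit = pulled.find((model) => baseName(model) === preferred);
    if (hit) return hit;
  }
  return undefined; // no known embedding model available
}
```

Returning `undefined` rather than guessing leaves it to the caller to switch the experimental RAG features off when no embedding model is present.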
backend/src/
index.ts # Hono app, routing, Meilisearch bootstrap
auth.ts # Cookie-based password auth
db.ts # SQLite schema
lib/i18n.ts # Accept-Language → error dictionaries
routes/ # documents, chapters, audio, cards, exam,
# search, summaries, feed
services/ # pipeline, cleanup, audiobook, events,
# cards-extract, SM-2, search, ollama
tts/ # voices, segmenter, SSML, synth, ffmpeg
frontend/src/
App.tsx # router + auth gate
components/ # Library, LearningMenu, PdfViewer,
# Audiobook, Cards, CardReview, ExamTrainer,
# Summaries, SearchOverlay, SettingsModal,
# PodcastShareModal, ShortcutsOverlay,
# ErrorBoundary
lib/api.ts # typed API client
lib/i18n.ts # useLocale hook + EN/DE dictionaries
lib/settings.ts # localStorage settings store
Huge thanks to the authors of the open-source tools this project stands on top of — in particular Meilisearch, Ollama, react-pdf, pdfjs-dist and FFmpeg. Microsoft Neural Voices are of course a Microsoft product; PageMentor only calls the same public Edge Read-Aloud endpoint a browser would, and routes the audio through local ffmpeg.
Built in a human-in-the-loop workflow: architecture, design decisions and scope were made by the author; an LLM coding assistant was used as a pair-programmer for implementation. Every change was reviewed before merging.
GNU Affero General Public License v3.0 — see LICENSE.
All third-party dependencies are either MIT, Apache 2.0 or LGPL-2.1-linked, all AGPLv3-compatible. See NOTICE for the full attribution list and the Microsoft Neural Voice usage note.
Donations are voluntary and solely support the project. They do not influence the prioritisation of bugs, feature requests or support enquiries.









