Codex: a study manual for the encorpora system #270
Draft
Umanistan wants to merge 41 commits into
Adds corpan/docs/codex/ with the full table of contents stubbed out: README, 36 numbered sections across 10 parts, and 5 appendices. Every file follows the standard section template (What it is / How it fits / Files and entry points / How it works / Common operations / Why we built it this way / To go deeper) with TODO content, plus a one-line pointer at the top describing what the finished section will cover. This is the agreed first pass per the briefing. Subsequent sessions fill in one section at a time, starting with §01 (Overview). Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Per feedback: the codex moves out of corpan/ (we are not touching corpan or the app itself) and lives at the repo root as codex/, matching the repo's own name (corpora-codex). Em dashes are removed from every header and TOC entry; numbered sections now read "# 01. Overview" with a period separator. The one § in the README prose is replaced with plain text. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
First voice-set pass. Names what Corpán, Corpora, and encorpora are, locates this repo against the sibling corpora platform, and plants the pack system as the central architectural choice (with sections 10-15 as the manual's center of gravity). The "How it works" subsection diagrams the end-to-end stack from Django authoring through embedded SQLite through Tauri IPC into runtime-loaded packs, with Spark/S3 audio rendering and Pages publishing as parallel tracks. Grounded in the root README, DEVELOPMENT.md, corpan/CLAUDE.md, the root package.json compose script, and a directory survey of corpan/ and the top-level content tree. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Frames the repo around two ideas: the shippable-units pattern (every
versioned thing keeps its own CHANGELOG.md next to its manifest, per
corpan/CHANGELOGS.md) and the composable Pages architecture (web/io/
+ web/pages/ + corpan/packs/*/dist/ -> web/io/out/ -> encorpora.io,
per GITHUB_PAGES_SETUP.md). Surveys the root, the corpan/ subtree, the
web/ subtree, and the typology of top-level content directories.
Grounded in CHANGELOGS.md, GITHUB_PAGES_SETUP.md, GAME_INSTALL_SUMMARY.md,
GIT_LFS.md, PIPELINE_STATE.md, the root package.json scripts, and a
directory survey of corpan/{scripts,tools,plugins,infra} and the
content tree.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Git from first principles (blobs / trees / commits / refs) braided with the encorpora-specific arrangement: fork/upstream split with upstream push disabled, three concurrent worktrees on disk, LFS for sqlite/png/epub/pdf, path-filtered CI (tsc + build for the app, build for web/io, terraform fmt+validate). Documents the squash-merge PR style and the discipline of changelog-in-same-PR. Grounded in `git remote -v`, `git worktree list`, .gitattributes, .gitignore, the three .github/workflows/*.yml, GIT_LFS.md, and `git log -20 upstream/main`. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
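The "blobs / trees / commits / refs" framing can be made concrete in a few lines. The following is a minimal sketch, not taken from the codex itself, of how Git derives a blob's object id in a SHA-1 repository: it hashes a `blob <size>\0` header followed by the file bytes. The function name `git_blob_id` is hypothetical.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the object id Git assigns to a blob holding these bytes.

    Applies to SHA-1 repositories; the id is the SHA-1 of a short header
    ("blob", a space, the byte length, a NUL) followed by the content.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # The empty blob has a well-known id.
    print(git_blob_id(b""))
    print(git_blob_id(b"hello\n"))
```

The same ids come back from `git hash-object` on files with those contents, which is one way to check the sketch against a real repository.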
Frames Tauri 2 as the host of Corpán's React UI, the home of every privileged operation (SQLite, HTTPS, TTS/STT, IAP, pack install), and the platform boundary that lets one source tree ship to macOS, Windows, Linux, iOS, and Android. Worked example: get_random_entry_with_translations showing the AppHandle/State injection model and the camelCase/snake_case seam. War story: the Android prevent_exit fix at lib.rs:1314, the vendored ndk-context fork, and what production-incident-driven code looks like. Grounded in src-tauri/Cargo.toml, tauri.conf.json, main.rs, lib.rs (the builder + IPC handlers + Android event handler), capabilities/default.json, and build.rs. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Rust from the apprentice's angle, with the STT plugin (corpan/plugins/tauri-plugin-stt/, 637 lines across six files) as the running worked example. Covers ownership and borrowing, structs and derives, the serde rename war story (availableMemoryMB), enums and pattern matching, ? and Result, traits and generics (SttExt as extension trait), modules and conditional compilation (#[cfg(desktop)] vs #[cfg(mobile)]), plugin Builder, Cargo and path deps, and macros. Grounded in the plugin's lib.rs, commands.rs, models.rs, error.rs, desktop.rs, mobile.rs, and Cargo.toml. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
React from the apprentice's angle. Components-as-functions and the hook-as-memory-slot model, with MainExperience.tsx (648 lines) as the running worked example. Covers useState/useRef/useEffect/useLayoutEffect/useCallback/useMemo, the fetchSeqRef anti-stale-write pattern, the rendering model, Zustand stores with selector subscriptions, and React.StrictMode's dev double-render. Grounded in main.tsx, App.tsx, MainExperience.tsx, the store/ tree, and the conventions used at the IPC seam. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
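The anti-stale-write idea named above (fetchSeqRef) is not specific to React. As a language-neutral illustration, here is a small asyncio sketch in Python, not the code in MainExperience.tsx: each load takes a sequence number, and a response is written only if no newer load has started in the meantime. The class name `LatestOnly` and its fields are hypothetical.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional


class LatestOnly:
    """Keep only the result of the most recently started load."""

    def __init__(self) -> None:
        self._seq = 0                      # plays the role of the ref-held counter
        self.value: Optional[Any] = None   # the state a stale response must not overwrite

    async def load(self, fetch: Callable[[], Awaitable[Any]]) -> bool:
        self._seq += 1
        my_seq = self._seq                 # remember which request this is
        result = await fetch()
        if my_seq != self._seq:
            return False                   # a newer load started; drop this response
        self.value = result
        return True


async def _demo() -> None:
    state = LatestOnly()

    async def slow() -> str:
        await asyncio.sleep(0.05)
        return "old"

    async def fast() -> str:
        await asyncio.sleep(0.01)
        return "new"

    # The slow request starts first but finishes last; its write is discarded.
    await asyncio.gather(state.load(slow), state.load(fast))
    print(state.value)


if __name__ == "__main__":
    asyncio.run(_demo())
```

In a React component the counter would live in a ref so that bumping it does not trigger a re-render; the comparison after the await is the same.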
TypeScript from the apprentice's angle, with corpan/packs/sdk/index.d.ts (223 lines, ambient declarations only) as the worked example. Covers structural typing, type aliases, string literal unions (SttErrorCode), optional fields, function types (the HostApi record), generics at the invoke<T>() seam, utility types (Partial<HostApi>), .d.ts ambient files as the SDK shipping shape, strict-mode tsconfig settings, paths aliases, and the noEmit + Vite build split. Grounded in the SDK's index.d.ts, the app's tsconfig.json, and the patterns used at the Rust/TypeScript seam. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Vite as the build tool driving Corpán's React frontend. Covers the native-ESM dev model (esbuild per-file transform) vs the production Rollup bundle, the plugin system with the custom servePacks middleware as a worked example, the HMR / TAURI_DEV_HOST dance, dual path aliases (tsconfig + vite), manualChunks for cache stability, and define-time __APP_VERSION__ substitution. Grounded in corpan-app/vite.config.ts (135 lines), tauri.conf.json's devUrl, the package.json scripts, and the dev/build loop. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Documents the actual stack: Tailwind v4 via @tailwindcss/vite, shadcn/ui new-york style vendored into src/components/ui/, Radix UI primitives underneath, OKLCH design tokens on :root and .dark with :root/.dark mirror sets, class-variance-authority for component variants, and the cn() / tailwind-merge override pattern. Worked example: the Button component's cva config and the responsive size prefixes that explain the Apple HIG tap-target call. Notes the breathe keyframe and the safe-area-inset Tauri plugin. Opens with a small correction: the original briefing said "no Tailwind, no framework," which is out of date - per corpan/CLAUDE.md and vite.config.ts the app uses Tailwind. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Frames packs as the unit of velocity: small self-contained apps that
run inside Corpán at runtime, communicating with the host through a
deliberately small HostApi (the seam introduced in section 07).
Covers the SDK runtime (141 lines: registerGame / createMockHostApi /
mountStandalone), the manifest's three groups of fields (identity,
load, localization), the two install modes (manifest URL vs. zip),
and the end-to-end install + mount loop.
Grounded in packs/sdk/{index.js,index.d.ts,README.md},
packs/README.md, earthgate-reader/manifest.json (53 localized names),
and the host-side content_packs.rs.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Walks a pack as a project on disk, using Earthgate Reader as the
running example. Documents the fixed shape (manifest.json + package.json
+ vite.config.ts + tsconfig + index.html + src/ + scripts/ + dist/),
the heavy import set from corpan/packs/shared/, the entry-script pattern
(registerGame writing to window.CorpanGames, reading baseUrl from the
host-injected data attributes, calling createAppShell + createReader),
the deliberate non-import of manifest.json (and why), and the line
between code-only reader packs (Earthgate) and data-bundled packs
(Hanzipan).
Grounded in earthgate-reader/{manifest.json, package.json,
src/main.ts, src/game.ts head, scripts/pack.mjs} and the @shared/*
import surface.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
The HostApi as the only seam between pack and host. Walks the eleven-
method contract in the SDK's index.d.ts (speech, stack/corpus,
per-pack data, optional stt), the narrower shared/sdk/types.ts that
catalog packs consume, the host's 459-line hostApi.ts implementation
(read-from-store + translate-at-the-seam + structured-error patterns),
the two mock implementations, the mount/unmount lifecycle in
ContentPackHost.tsx, and the deliberate "no backdoor" position that
makes the contract worth maintaining.
Grounded in packs/sdk/index.d.ts, packs/shared/sdk/{types.ts,
mockHostApi.ts}, corpan-app/src/contentPacks/{hostApi.ts head,
types.ts, ContentPackHost.tsx existence}, and the read-only SQL
gate at lib.rs:90.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Frames the catalog as a shared *library* (not a pack) under
corpan/packs/shared/catalog/ that wraps reading-style packs in a
consistent command-drawer chrome with library/browse/book-detail/
narrator-detail/install/exit. Covers appShell.ts as the orchestrator
of dispose-remount-on-book-switch, the pure searchFilter.ts as the
data-side hygiene that earns the library its modular reputation,
the reader/shell handshake, the CSS-custom-property theming that
lets Earthgate and Stargate share chrome with distinct palettes,
and the window.__corpanI18n bridge as the one acknowledged dent
in the no-backdoor principle.
Grounded in shared/catalog/{index.ts, src/types.ts, src/appShell.ts
head + size, src/searchFilter.ts, src/narratorDetail.ts existence,
src/catalog.css principle}.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Documents the two shapes in corpan/packs/shared/state/ (261 lines
total): per-pack factory stores (bookMetaStore, bookmarkStore,
prefsStore) that namespace localStorage by a pack-chosen prefix,
and cross-pack singleton stores (narrationHistoryStore, drawerStore)
built on zustand/vanilla with the persist middleware. Worked
example: bookMetaStore's hasChapters cache and the layout-shift
rationale documented in its docstring. Maps the three persistence
surfaces (host Zustand stores, @shared/state, @shared/catalog
libraryStore) and why no fourth "host state visible from packs"
pattern exists.
Grounded in shared/state/{bookMetaStore.ts, bookmarkStore.ts,
prefsStore.ts, narrationHistoryStore.ts, drawerStore.ts}.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Documents transportBar.ts (303 lines, plain TypeScript / imperative
DOM) as the bottom-of-screen playback surface that catalog packs
share. Covers the eight-setter / nine-event contract, the layout
comment that documents the visual design better than any class
diagram, the setHasChapters + bookMetaStore handshake that prevents
first-frame layout shift on returning reads, the classPrefix theming
that lets Earthgate and Stargate share the bar with distinct palettes,
the bidirectional engine seam (events in, setters out, engine is
the source of truth), and the native-keepalive integration that
unifies on-screen, lock-screen, AirPods, and Bluetooth controls
into the same reader callbacks.
Grounded in shared/ui/transportBar.ts header + contract, shared/audio/
{audioEngine.ts, mediaSessionAnchor.ts, nativeKeepAlive.ts} sizes,
shared/state/bookMetaStore.ts, and the reader's wiring pattern.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
SQLite from the apprentice's angle, with the Corpán content database as the running example. Walks the data model (tables, foreign keys, indexes), the seven-table schema from corpan/dja/cor/models.py (Language, Domain, Entry, Translation, Narrator, Pack, PackEntry), the embed-write-mmap pattern in db.rs that replaced the sqlite3_deserialize path after ANRs/SIGABRTs on lower-end Android, the PRAGMA setup, the single-mutex single-connection lifecycle, and the per-pack queryPackDb story with the four-statement allowlist in ensure_readonly_sql. Grounded in src-tauri/src/db.rs (57 lines), dja/cor/models.py (161 lines), the ensure_readonly_sql gate at lib.rs:90, and the get_random_entry_with_translations command at lib.rs:497. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
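The read-only gate described above can be sketched as follows. This is a Python illustration of the idea, not the Rust in lib.rs. The commit message says the allowlist has four statement kinds but does not name them, so the four keywords below are assumptions, as are the function bodies. The sketch also opens the database with SQLite's `mode=ro` URI parameter as a second line of defence, which the source does not claim the app does.

```python
import sqlite3
from pathlib import Path

# ASSUMPTION: the source says "four-statement allowlist" without listing the
# statements; these four keywords are illustrative placeholders.
ALLOWED_LEADING_KEYWORDS = ("SELECT", "WITH", "EXPLAIN", "PRAGMA")


def ensure_readonly_sql(sql: str) -> str:
    """Return the statement if it passes the allowlist, else raise ValueError."""
    statement = sql.strip().rstrip(";").strip()
    if not statement:
        raise ValueError("empty SQL")
    # Conservative: also rejects a semicolon inside a string literal.
    if ";" in statement:
        raise ValueError("only a single statement is allowed")
    keyword = statement.split(None, 1)[0].upper()
    if keyword not in ALLOWED_LEADING_KEYWORDS:
        raise ValueError(f"statement kind not allowed: {keyword}")
    return statement


def query_pack_db(db_path: str, sql: str, params=()):
    """Run one allowlisted statement against a database opened read-only."""
    statement = ensure_readonly_sql(sql)
    # mode=ro makes SQLite itself refuse writes, whatever the text check missed.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        return conn.execute(statement, params).fetchall()
    finally:
        conn.close()
```

A keyword check alone is a weak guarantee, since a statement can begin with an allowed keyword and still write. That is why the sketch pairs it with a read-only connection.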
Documents the three JSON files per audiobook: manifest.json (book identity + cataloging metadata), segments.json v2.0.0 (authored text in chapter/segment shape with text vs. tts.text discipline and the dialog/text/image block_types), audio_manifest_<lang>.json (per-language render with file path, duration, pause_after_ms, and word-level start_ms/end_ms from whisper-cpp forced alignment). Walks the corresponding TypeScript types in shared/core/types.ts, the buildTimeline reconciliation that produces TimelineWord, and the explicit boundary cases where JSON gives way to SQLite (the phrase corpus) or M4A/AAC (the audio). Grounded in shared/core/types.ts, shared/data/segmentLoader.ts head, a real segments.json from ai-this-week, an audio_manifest_en.json from fascinating-curiosities, and a book manifest from fascinating-spies. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
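One plausible reading of the timeline step, sketched in Python rather than the TypeScript in shared/core/types.ts: word timings that are relative to each segment's audio are shifted onto a single clock by accumulating each segment's duration and trailing pause. The field names `start_ms`, `end_ms`, and `pause_after_ms` come from the commit message; `duration_ms`, `words`, `text`, and the shape of `TimelineWord` are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class TimelineWord:
    text: str            # display spelling of the word
    segment_index: int   # which segment the word belongs to
    start_ms: int        # absolute start on the shared clock
    end_ms: int          # absolute end on the shared clock


def build_timeline(segments: List[Dict]) -> List[TimelineWord]:
    """Flatten per-segment word timings into one absolute timeline."""
    timeline: List[TimelineWord] = []
    offset_ms = 0
    for index, segment in enumerate(segments):
        for word in segment.get("words", []):
            timeline.append(
                TimelineWord(
                    text=word["text"],
                    segment_index=index,
                    start_ms=offset_ms + word["start_ms"],
                    end_ms=offset_ms + word["end_ms"],
                )
            )
        # The next segment starts after this one's audio plus its pause.
        offset_ms += segment["duration_ms"] + segment.get("pause_after_ms", 0)
    return timeline


if __name__ == "__main__":
    demo = [
        {"duration_ms": 1000, "pause_after_ms": 200,
         "words": [{"text": "Hello", "start_ms": 0, "end_ms": 400},
                   {"text": "world", "start_ms": 450, "end_ms": 900}]},
        {"duration_ms": 800,
         "words": [{"text": "Again", "start_ms": 100, "end_ms": 600}]},
    ]
    for item in build_timeline(demo):
        print(item)
```

A segment with no `words` array (an image block, for example) still advances the clock here, which keeps later words aligned.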
Documents the three classes of audio assets: rendered narrations (AAC 64 kbps M4A mono 24 kHz at -22 LUFS / -3 dBTP, sitting on S3 + CloudFront), voice clone references (15s WAV, working copies on Jeff's disk + durable copies in S3), and in-zip vocal samples (16-bit PCM WAV at 24 kHz). Walks the fixed ffmpeg mastering chain (gain norm -> HPF 80Hz -> declicker -> FFT denoise -> noise gate -> compressor 2:1 -> limiter -> AAC encode), the iOS WebKit Opus-in-OGG silent-fail story that drove the WAV choice for in-zip samples, the Fascinating Curiosities pipeline at full scale (12 books x 23 langs ~= 7.5GB), and the runtime split where pack zips ship only manifests and audio streams from CloudFront. Grounded in corpan/NARRATION_SYSTEM.md, the auto-memory note on iOS WebKit audio codecs, voices/scripts/sample_clone_premaster_*.py, and the infra hydrate scripts. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Maps where Python lives: corpan/dja (Django CMS), corpan/infra scripts (catalog/narrator generation, captures pipeline), the tools/ subtrees, the smaller Django sub-projects (arb/djarb, panko/djpanko, total-history/djistory), the voices/scripts experiments, and the out-of-repo ttsctl narration pipeline. Maps where Python is NOT: never on the user's device, never in the Tauri binary, never in a pack zip. Walks the producer-consumer boundary (Python produces files; the runtime reads them; nothing crosses) and the two cases where Python gives way to shell or native (pure file-system pipelines, hot inner loops). Grounded in corpan/dja/cor/models.py, corpan/NARRATION_SYSTEM.md on the pipeline shape, the infra/captures tree, and the per-subsystem requirements convention. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Documents ChatterboxMultilingualTTS as the zero-shot voice-cloning TTS that renders 23-language audiobook narrations. Walks one Chatterbox call shape, the tts.text vs. text discipline (no raw digits, no dashes in phonetic nudges), per-language voice mapping in narration.yaml, the six generation parameters, and the convergence loop (generate -> align -> validate -> trim -> master, with 40-attempt retries with 25% jittered TTS params before a Claude-subagent tts.text rewrite). Notes the current shipped scale (7 books, 41 packs, 35k segments, 10 languages). Grounded in corpan/NARRATION_SYSTEM.md (the canonical authoring doc) and the auto-memory's TTS notes (no dashes in nudges). Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
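The retry half of the convergence loop can be sketched as below. This is an illustration, not the ttsctl implementation. The 40-attempt cap and the 25% jitter come from the commit message; the assumptions are that the jitter is uniform and multiplicative, that the first attempt uses the unjittered parameters, and that the whole generate, align, validate, trim, and master chain is collapsed into one caller-supplied callback. The parameter names in the demo are placeholders, since the source does not list the six generation parameters.

```python
import random
from typing import Callable, Dict, Optional, Tuple

MAX_ATTEMPTS = 40        # from the commit message
JITTER_FRACTION = 0.25   # from the commit message


def jitter_params(base: Dict[str, float], rng: random.Random,
                  fraction: float = JITTER_FRACTION) -> Dict[str, float]:
    """Scale each parameter by a random factor in [1 - fraction, 1 + fraction]."""
    return {name: value * (1.0 + rng.uniform(-fraction, fraction))
            for name, value in base.items()}


def converge_segment(
    base: Dict[str, float],
    attempt_once: Callable[[Dict[str, float]], Optional[bytes]],
    rng: Optional[random.Random] = None,
) -> Tuple[Optional[bytes], int]:
    """Try up to MAX_ATTEMPTS renders; return (audio, attempts_used).

    attempt_once stands in for the full render-and-validate chain and
    returns None when validation fails. A (None, MAX_ATTEMPTS) result means
    the caller should escalate, for example to a tts.text rewrite.
    """
    rng = rng or random.Random()
    for attempt in range(1, MAX_ATTEMPTS + 1):
        params = dict(base) if attempt == 1 else jitter_params(base, rng)
        audio = attempt_once(params)
        if audio is not None:
            return audio, attempt
    return None, MAX_ATTEMPTS


if __name__ == "__main__":
    # Placeholder parameter names; a fake renderer that passes on the third try.
    tries = []

    def fake_render(params: Dict[str, float]) -> Optional[bytes]:
        tries.append(params)
        return b"audio" if len(tries) == 3 else None

    print(converge_segment({"temperature": 0.8, "cfg_weight": 0.5}, fake_render))
```

Jittering around the base values, rather than around the previous attempt, keeps every retry within a bounded distance of the authored settings.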
Whisper as it appears in two deployments: (1) offline forced alignment with stable-ts + Whisper medium on the Spark, producing the words[] arrays in audio_manifest_<lang>.json with the display-text-in-manifest rule that preserves user-facing spellings even when tts.text used phonetic nudges; (2) on-device speech-to-text with whisper.cpp through the tauri-plugin-stt (Android CPU NEON, iOS Metal XCFramework) driving the pronunciation coach. Walks the WhisperParams pass-through, the initial_prompt bias for low-resource non-Latin scripts, and the availableMemoryMB gate for runtime model upgrades. Grounded in corpan/NARRATION_SYSTEM.md (Whisper Alignment section), the tauri-plugin-stt models/commands modules already documented in section 05, PIPELINE_STATE.md (the large-v3 calibration decision), and RUNBOOK_QUANTIZE_LARGE_Q8.md. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Documents the DGX Spark GB10 (Blackwell sm_121, 128 GB unified memory, CUDA 13.0, PyTorch cu130) as the project's GPU workstation and the only place Chatterbox + Whisper + ffmpeg run at production scale. Covers Tailscale-fronted access (no public SSH), the "develop locally / run on Spark" workflow as a six-step loop (author -> push -> kick off -> Spark renders -> publish -> hydrate), and what lives Spark-side and only Spark-side: the ttsctl tool itself at ~/projects/ttsctl/, the per-decision changelog under ~/projects/ttsctl/changelog/decisions/, the model caches, and the intermediate WAVs. Names the unified-memory benefit (no PCIe shuffle between Chatterbox GPU output and ffmpeg CPU mastering) as the GB10's specific virtue for this workload. Grounded in corpan/NARRATION_SYSTEM.md "Hardware" section, PIPELINE_STATE.md teammate-context snapshot, and the auto-memory. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Documents the two AWS S3 buckets in us-east-2 (corpan-prod as the production data plane fronted by CloudFront d38iwc9748jekz.cloudfront.net; corpan-assets for marketing / developer-facing assets), the catalog.json runtime read path, the producer-consumer choreography of "audio first, zip second, catalog third, invalidate last", and the hydration scripts (hydrate-audio.sh, hydrate-voices.sh, hydrate-marketing.sh, hydrate-captures.sh) that make the producer-consumer split hospitable for local dev. Names the AWS auth profiles and the Cache-Control: max-age=60, stale-while-revalidate=300 catalog caching behavior with the reader's ?_t= cache-buster. Grounded in infra/sync-voices-to-s3.sh, infra/hydrate-audio.sh, infra/sync-marketing-to-s3.sh, infra/captures/build-and-upload.sh, and corpan/NARRATION_SYSTEM.md's "Publishing" section. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
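Two details above lend themselves to a short sketch: the fixed publish order and the `?_t=` cache-buster. The Python below is illustrative only and is not the infra scripts or the reader code. The step names mirror the quoted order; treating `_t` as a millisecond timestamp, the function names, and the example URL are assumptions.

```python
import time
from typing import Callable, Dict, List, Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# "audio first, zip second, catalog third, invalidate last"
PUBLISH_ORDER = ("audio", "zip", "catalog", "invalidate")


def run_publish(steps: Dict[str, Callable[[], None]]) -> List[str]:
    """Run the four publish steps in the fixed order, whatever the dict order."""
    missing = [name for name in PUBLISH_ORDER if name not in steps]
    if missing:
        raise ValueError(f"missing publish steps: {missing}")
    completed: List[str] = []
    for name in PUBLISH_ORDER:
        steps[name]()  # an exception here stops the run before later steps
        completed.append(name)
    return completed


def cache_busted(url: str, now_ms: Optional[int] = None) -> str:
    """Return url with its _t query parameter set to a fresh value."""
    stamp = now_ms if now_ms is not None else int(time.time() * 1000)
    parts = urlsplit(url)
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key != "_t"]
    query.append(("_t", str(stamp)))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    log: List[str] = []
    run_publish({
        "catalog": lambda: log.append("upload catalog.json"),
        "invalidate": lambda: log.append("invalidate CDN paths"),
        "audio": lambda: log.append("upload audio"),
        "zip": lambda: log.append("upload pack zip"),
    })
    print(log)
    # Hypothetical URL; the real catalog path is not given in the source.
    print(cache_busted("https://example.invalid/catalog.json", now_ms=123))
```

The ordering presumably exists so that each published file only ever points at files that are already in place: a catalog entry never references a zip that has not landed, and a zip's manifest never references audio that is missing.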
The synthesis section the briefing flagged as critical. Maps the four state locations (repo, Spark, S3+CloudFront, user's device) plus a fifth (GitHub for the canonical remote and the binary release path) and the named seams that synchronize them: git push/pull and rsync over Tailscale for repo <-> Spark, ttsctl publish and infra/sync-*-to-s3.sh for Spark/repo -> S3, CloudFront plus the running app for S3 -> device, and the deliberate "nothing" for device -> anywhere (no cloud sync, no accounts). Walks the publish ordering invariants, what gets touched on a code change vs. on a narration publish, and what does NOT synchronize and would have to be reconstructed if lost. Grounded in sections 22 (Spark) and 24 (S3) just written, the infra script families, PIPELINE_STATE.md, and CHANGELOGS.md. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Babylon.js + @babylonjs/loaders in Hover Runner and Juice Squeeze;
Phaser 3.80 in Quest-Ear; Tone.js in Melopan (off-branch). Walks the
SVG-to-GLB Blender build pipeline in hover-runner/scripts/svg_to_3d_v2.py,
the construct-on-mount / dispose-on-unmount discipline that applies
across every engine, and why each pack picks its own engine instead
of standardizing.
Grounded in hover-runner/{package.json, scripts/svg_to_3d_v2.py},
juice-squeeze/package.json, quest-ear/package.json, and the
melopan-2026-05 auto-memory note.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
The infra/captures pipeline: iPad screen recording -> sidecar -> four ffmpeg-derived variants (long 1200x1600, shorts 1080x1920 with blur-pad, square 1080x1080, thumb 1280x720) -> S3 mirror at corpan-assets/captures/ -> YouTube upload via the corpan-yt Python click CLI. Covers the yuvj420p->yuv420p color-space discipline, the mandatory-sidecar workflow, the slug naming convention, the YouTube videos.insert daily quota handling, and the branding/ channel-level assets. Grounded in infra/captures/CAPTURES.md, build-capture.sh, build-and-upload.sh, and youtube/pyproject.toml. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Bundle id com.corpora.corpan, team F9AV5HKF6N, iOS 16.0 minimum. The XcodeGen-driven regen pattern (ios-gen.sh produces gen/apple/ from src-tauri/ios/project.yml; gen/ is not hand-edited). Covers the Swift plugin side of tauri-plugin-stt and friends, capabilities required (IAP, Background Audio, Microphone, Speech Recognition), PrivacyInfo.xcprivacy declarations, the StoreKit test config in Corpan.storekit, the Apple Feedback Assistant try-many-URLs fallback, and the App Store submission flow. Grounded in src-tauri/ios/project.yml, scripts/ios-gen.sh references in APP_RELEASE_0_11_3.md, infra/IAP_SETUP_RUNBOOK.md, RELEASE_NOTES_0.13.1.md, and the iOS feedback command in lib.rs:1232. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
The Tauri Android target with its specific rough edges: patch-android.sh pins compileSdk=36, targetSdk=36, ndkVersion=28.2.13676358, Java/Kotlin 17; gen/android/ is generated and not hand-edited; tauri-plugin-iap contributes com.android.vending.BILLING; the vendored ndk-context fork and prevent_exit discipline are the response to the Activity-re-init crash chain (section 04 walks). Covers the Kotlin plugin side, whisper.cpp on CPU+NEON for pronunciation coach, release signing via upload-keystore.jks (not in git), and Play Console metadata mirroring iOS. Grounded in APP_RELEASE_0_11_3.md, scripts/patch-android.sh references, corpan/CLAUDE.md Android section, Cargo.toml's [patch.crates-io] block, and RELEASE_NOTES_0.12.7_ANDROID.md. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
macOS / Windows / Linux as the third target family. Same Tauri binary, OS-provided WebView (WKWebView / WebView2 / WebKitGTK), 1200x1000 window default per tauri.conf.json. The deliberate asymmetry: full reader + catalog + marketing-site embed, but STT stubs out (the pronunciation coach is not desktop-shipping today) and IAP / audio-keepalive behave differently. Mac App Store signing identity committed in tauri.conf.json; notarization and Windows EV code signing deferred. Grounded in tauri.conf.json's app.windows and bundle.macOS sections, plugins' src/desktop.rs stubs (the STT one explicitly walked), the iOS WebKit codec story from section 18. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
The directory of the five general-purpose languages (TypeScript, Rust, Python, Kotlin, Swift) plus the supporting languages (HTML, CSS, SQL, YAML, JSON, Markdown, LaTeX, Lua). Maps the one-language-per-concern table, the canonical entry point per language, and the decision tree for picking which language a new piece of work belongs in. Cross-references the deep-dive sections that already cover each language: TypeScript (07), Rust (05), Python (19), Kotlin (28), Swift (27). Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
bash scripts as the glue layer with the set -euo pipefail discipline, the comment-block-as-docs convention, and the deliberate "shell as glue, Python as logic" split. Walks the directory map (infra/, infra/captures/, corpan-app/scripts/, web/scripts/, voices/scripts/), the collaboration with jq/aws/ffmpeg/curl/unzip, the bootstrap-vs-working flavors, and the .env-or-environment credentials dance. Grounded in the existing scripts I read for sections 22, 24, 25, 27, and 28: sync-voices-to-s3.sh, hydrate-audio.sh, build-capture.sh, build-and-upload.sh, sync-marketing-to-s3.sh, ios-gen.sh references, patch-android.sh references. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Four managers, one per language family: npm + package-lock.json (committed), Cargo + Cargo.lock (gitignored - the documented rusqlite/sqlx exception), pip / uv with per-subtree requirements.txt or pyproject.toml, and Homebrew for the system-binary layer the shell scripts call. Per-subsystem manifests over a monorepo-wide hoist (Vite/zustand can vary between packs without coordination), npm ci in CI vs. npm install locally, --legacy-peer-deps for the older packs. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
The practitioner's view: CLAUDE.md/AGENTS.md as agent-facing docs at subtree roots, plan mode vs action mode, three concurrent worktrees as the routine parallel primitive, the auto-memory contract at ~/.claude/projects/.../memory/, the pr-agent loop on every PR (Codium PR-Agent, not a gate), and the discipline of reading the diff before approving the agent's claim. Grounded in corpan/CLAUDE.md existence, the various AGENTS.md files, .github/workflows/pr-agent.yml (already documented in section 03), and PIPELINE_STATE.md's role as a maintained snapshot. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
The complement to section 33: the inventory of judgments humans still hold (architectural, product, taste), the trap of deferring judgment to the agent, the shift in skill emphasis toward reading code and writing prose, and the mitigations the codebase encodes (CLAUDE.md/AGENTS.md as defaults, reading-the-diff discipline, auto-memory feedback entries, PR review by Skylar). Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Dated speculation (2026-05-29 snapshot) about where the project is pointed: near-term (more languages and books, mature pronunciation coach, web codex), medium-near (device-to-cloud user state, Brewfile, deterministic gen/ rebuilds, catalog-versioned pack delivery), and larger bets (on-device TTS catching up, the corpora platform absorbing graduated components, the durability of the agent-era patterns). Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Dated snapshot (2026-05-29) of the 90-day arc on upstream/main: the IAP rewrite for App Review, the catalog v2 narrator-first rewrite, analytics hardening for WKWebView, the 10k phrase corpus slim with nine new languages, World Radio's native streams via tauri-plugin-radio-stream, pronunciation coach on Android CPU+whisper.cpp, the DGX-driven catalog publisher, Earthgate/Hanzipan polish, and the Parlometron / phrase-pack architecture push through 0.13.x and 0.15.x. Closes with three architectural shifts: catalog became narrator-first, audio runtime became fully native, on-device pronunciation matured. Grounded in 'git log --since=90.days' against upstream/main. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
… Where to Look) A: proper-noun glossary clustered by people/products/packs/pipelines/ hardware/build-terms/files-and-paths. B: file/directory/commit/PR/changelog/release-notes conventions plus the Codex section template rules. C: ~30 most-run commands grouped by setup / run / build / publish / inspect / recover. D: reading list cross-referencing every external book/paper/talk the Codex points at. E: reverse index from "I want to understand X" to specific file + section, plus the 20-minute starter set. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Single-file edition of the full Codex: README front matter, all 36 numbered sections, all 5 appendices, separated by horizontal rules. 12,439 lines of plain markdown for offline reading in any markdown viewer or in a terminal with 'less codex/CODEX.md'. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
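The concatenation step can be sketched in a few lines. This is an assumption-laden illustration, not the script that produced codex/CODEX.md: the function name is hypothetical, and the caller is expected to pass the README, section, and appendix texts already in reading order.

```python
from typing import Iterable


def concatenate_edition(documents: Iterable[str]) -> str:
    """Join markdown documents with horizontal rules between them.

    Blank lines surround each rule so that a markdown parser reads "---"
    as a thematic break rather than as a setext heading underline for the
    preceding paragraph.
    """
    cleaned = [doc.strip("\n") for doc in documents if doc.strip()]
    return "\n\n---\n\n".join(cleaned) + "\n"


if __name__ == "__main__":
    print(concatenate_edition(["# README\n", "# 01. Overview\n\nText.\n"]))
```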
- Remove the two intentional em dashes (in section 33 and Appendix B that referenced the character literally). Rephrase to describe the rule without quoting the character.
- Replace Korean sample in section 16 (SQLite) with romanized form; the point about translation rows in a non-Latin script is the same with Latin characters and the example reads cleanly.
- Replace Punjabi initial_prompt sample in section 21 (Whisper) with a descriptive placeholder; the technique is what matters, not the literal script.
- Replace media-control glyphs in section 15 (Pack Transport) ASCII diagram with plain-text labels (|< [-30] [play/pause] [+30] >|).
- Regenerate codex/CODEX.md and codex/CODEX.pdf. PDF now renders with zero missing-glyph warnings.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Summary
Adds `codex/` at the repo root: a 36-section study manual for the encorpora codebase, plus five appendices, plus a concatenated single-file edition at `codex/CODEX.md`.

The manual braids two jobs into one document:
technology, every script, every convention, every place state
lives. Open it, read two sections, know exactly where to work.
explained on its own terms, using real encorpora code as the
example. The reader learns React, Rust, Tauri, Kotlin, SQLite,
Python, TypeScript, Tone.js, Whisper, Chatterbox, Babylon.js,
monorepo discipline, version control, and the philosophy of
building systems that don't break.
Total: 75,075 words / ~214 reading-pages. Dense, no fluff, no em
dashes in prose. Every claim grounded in files read during the
authoring session.
Structure
Styling.
Catalog, Shared State, Transport.
Assets.
Spark, 3D and Creative.
State Locations.
Management.
Still Do, the Near Future.
snapshot).
Where to Look.
Reading the book
The Codex lives at `codex/`. For a single-file read:

- `less codex/CODEX.md` in a terminal
- `codex/CODEX.md` in any markdown viewer
- `pandoc codex/CODEX.md -o codex.pdf` for a typeset PDF

The README at `codex/README.md` is the table of contents and the intended entrypoint.
Notes for review
relocation out of `corpan/`; one for the concatenated edition. ~40 commits total.
the rationale lives. The intent is that revisiting any section
six months later, the why is still legible.
speculation (section 35) are explicitly dated. Other sections
are written to age well.
original brief said "no Tailwind, no framework" but the app
uses Tailwind v4 + shadcn/ui + Radix. Documented what's
actually there.
Test plan
`codex/README.md` and confirm the TOC links resolve.
match current code.
`codex/CODEX.md` to PDF; confirm it reads well as a book.
know well; confirm it captures the actual rationale.
Draft until Jeff signs off on the framing and Skylar weighs in on
the pipeline-side accuracy.
🤖 Generated with Claude Code