
Codex: a study manual for the encorpora system#270

Draft
Umanistan wants to merge 41 commits into corpora-inc:main from Umanistan:codex


@Umanistan

Contributor

Summary

Adds codex/ at the repo root: a 36-section study manual for the
encorpora codebase, plus five appendices, plus a concatenated
single-file edition at codex/CODEX.md.

The manual braids two jobs into one document:

  1. A reference manual for this specific system. Every
    technology, every script, every convention, every place state
    lives. Open it, read two sections, know exactly where to work.
  2. A general programming education. Each technology is
    explained on its own terms, using real encorpora code as the
    example. The reader learns React, Rust, Tauri, Kotlin, SQLite,
    Python, TypeScript, Tone.js, Whisper, Chatterbox, Babylon.js,
    monorepo discipline, version control, and the philosophy of
    building systems that don't break.

Total: 75,075 words / ~214 reading-pages. Dense, no fluff, no em
dashes in prose. Every claim grounded in files read during the
authoring session.

Structure

  • Part I, The System: Overview, Monorepo, Version Control.
  • Part II, The App: Tauri, Rust, React, TypeScript, Vite,
    Styling.
  • Part III, The Pack System: Overview, Anatomy, Host API,
    Catalog, Shared State, Transport.
  • Part IV, Data and Content: SQLite, Content Formats, Audio
    Assets.
  • Part V, The Pipeline: Python, Chatterbox, Whisper, the
    Spark, 3D and Creative.
  • Part VI, Storage and Delivery: S3, Captures and YouTube,
    State Locations.
  • Part VII, Platforms: iOS, Android, Desktop.
  • Part VIII, The Toolchain: Languages, the Shell, Package
    Management.
  • Part IX, The Agent Era: Working with Agents, What Humans
    Still Do, the Near Future.
  • Part X, Recent Evolutions: Changelog of the System (90-day
    snapshot).
  • Appendices: Glossary, Conventions, Commands, Reading List,
    Where to Look.

Reading the book

The Codex lives at codex/. For a single-file read:

  • less codex/CODEX.md in a terminal
  • Open codex/CODEX.md in any markdown viewer
  • pandoc codex/CODEX.md -o codex.pdf for a typeset PDF

The README at codex/README.md is the table of contents and the
intended entrypoint.

Notes for review

  • One commit per section; one for the skeleton; one for the
    relocation out of corpan/; one for the five appendices; one
    for the concatenated edition; one for a final glyph cleanup.
    41 commits total.
  • The "Why we built it this way" coda on each section is where
    the rationale lives. The intent is that revisiting any section
    six months later, the why is still legible.
  • The 90-day changelog (section 36) and the near-future
    speculation (section 35) are explicitly dated. Other sections
    are written to age well.
  • Section 09 (Styling) opens with a brief correction: the
    original brief said "no Tailwind, no framework" but the app
    uses Tailwind v4 + shadcn/ui + Radix. Documented what's
    actually there.

Test plan

  • Open codex/README.md and confirm the TOC links resolve.
  • Skim two arbitrary sections; confirm the technical claims
    match current code.
  • Render codex/CODEX.md to PDF; confirm it reads well as a
    book.
  • Read the "Why we built it this way" coda on a section you
    know well; confirm it captures the actual rationale.

Draft until Jeff signs off on the framing and Skylar weighs in on
the pipeline-side accuracy.

🤖 Generated with Claude Code

Umanistan and others added 30 commits May 29, 2026 14:09
Adds corpan/docs/codex/ with the full table of contents stubbed out:
README, 36 numbered sections across 10 parts, and 5 appendices. Every
file follows the standard section template (What it is / How it fits /
Files and entry points / How it works / Common operations / Why we built
it this way / To go deeper) with TODO content, plus a one-line pointer
at the top describing what the finished section will cover.

This is the agreed first pass per the briefing. Subsequent sessions
fill in one section at a time, starting with §01 (Overview).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per feedback: the codex moves out of corpan/ (we are not touching corpan
or the app itself) and lives at the repo root as codex/, matching the
repo's own name (corpora-codex). Em dashes are removed from every header
and TOC entry; numbered sections now read "# 01. Overview" with a period
separator. The one § in the README prose is replaced with plain text.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
First voice-set pass. Names what Corpán, Corpora, and encorpora are,
locates this repo against the sibling corpora platform, and plants the
pack system as the central architectural choice (with sections 10-15
as the manual's center of gravity). The "How it works" subsection
diagrams the end-to-end stack from Django authoring through embedded
SQLite through Tauri IPC into runtime-loaded packs, with Spark/S3
audio rendering and Pages publishing as parallel tracks.

Grounded in the root README, DEVELOPMENT.md, corpan/CLAUDE.md, the
root package.json compose script, and a directory survey of corpan/
and the top-level content tree.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Frames the repo around two ideas: the shippable-units pattern (every
versioned thing keeps its own CHANGELOG.md next to its manifest, per
corpan/CHANGELOGS.md) and the composable Pages architecture (web/io/
+ web/pages/ + corpan/packs/*/dist/ -> web/io/out/ -> encorpora.io,
per GITHUB_PAGES_SETUP.md). Surveys the root, the corpan/ subtree, the
web/ subtree, and the typology of top-level content directories.

Grounded in CHANGELOGS.md, GITHUB_PAGES_SETUP.md, GAME_INSTALL_SUMMARY.md,
GIT_LFS.md, PIPELINE_STATE.md, the root package.json scripts, and a
directory survey of corpan/{scripts,tools,plugins,infra} and the
content tree.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Git from first principles (blobs / trees / commits / refs) braided
with the encorpora-specific arrangement: fork/upstream split with
upstream push disabled, three concurrent worktrees on disk, LFS for
sqlite/png/epub/pdf, path-filtered CI (tsc + build for the app,
build for web/io, terraform fmt+validate). Documents the squash-merge
PR style and the discipline of changelog-in-same-PR.
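
Git names every object by a hash of its content. For a blob, it hashes a `blob <byte length>\0` header followed by the file's bytes, so identical content always gets the same ID whatever its path. The sketch below assumes a Node runtime (for `node:crypto` and `Buffer`). It also assumes a repository on the default SHA-1 object format, not SHA-256.

```typescript
import { createHash } from "node:crypto";

// Compute the ID Git assigns to a blob in a SHA-1 repository.
// Git hashes the header "blob <byte length>\0" followed by the raw content,
// so the ID depends only on the bytes, never on the file name or path.
export function gitBlobId(content: string | Uint8Array): string {
  const body =
    typeof content === "string" ? Buffer.from(content, "utf8") : Buffer.from(content);
  const header = Buffer.from(`blob ${body.length}\0`, "utf8");
  return createHash("sha1").update(Buffer.concat([header, body])).digest("hex");
}
```

As a quick check, `gitBlobId("test content\n")` should match the output of `echo 'test content' | git hash-object --stdin`. Git hashes trees and commits the same way, using `tree` and `commit` headers over their own serialized bodies.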

Grounded in `git remote -v`, `git worktree list`, .gitattributes,
.gitignore, the three .github/workflows/*.yml, GIT_LFS.md, and
`git log -20 upstream/main`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Frames Tauri 2 as the host of Corpán's React UI, the home of every
privileged operation (SQLite, HTTPS, TTS/STT, IAP, pack install), and
the platform boundary that lets one source tree ship to macOS, Windows,
Linux, iOS, and Android. Worked example: get_random_entry_with_translations
showing the AppHandle/State injection model and the camelCase/snake_case
seam. War story: the Android prevent_exit fix at lib.rs:1314, the
vendored ndk-context fork, and what production-incident-driven code
looks like.

Grounded in src-tauri/Cargo.toml, tauri.conf.json, main.rs, lib.rs
(the builder + IPC handlers + Android event handler), capabilities/
default.json, and build.rs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Rust from the apprentice's angle, with the STT plugin
(corpan/plugins/tauri-plugin-stt/, 637 lines across six files) as
the running worked example. Covers ownership and borrowing, structs
and derives, the serde rename war story (availableMemoryMB), enums
and pattern matching, ? and Result, traits and generics (SttExt as
extension trait), modules and conditional compilation (#[cfg(desktop)]
vs #[cfg(mobile)]), plugin Builder, Cargo and path deps, and macros.
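
The commit does not say what the availableMemoryMB war story was, so treat what follows as one plausible reading, not the actual incident. serde's `rename_all = "camelCase"` drops each underscore and uppercases the next letter; it does nothing else, and acronyms get no special handling. A Rust field `available_memory_mb` therefore serializes as `availableMemoryMb`, and a TypeScript reader expecting `availableMemoryMB` silently gets `undefined`. The sketch re-implements that rule in TypeScript for illustration. The name `serdeCamelCase` is hypothetical, and the code is not serde's. On the Rust side, the usual fix is an explicit `#[serde(rename = "availableMemoryMB")]` on the field.

```typescript
// Mimics serde's rename_all = "camelCase" rule for lowercase snake_case field
// names: drop each underscore and uppercase the character that follows it.
// Every other character keeps its case; acronyms are not special-cased.
export function serdeCamelCase(field: string): string {
  let out = "";
  let upperNext = false;
  for (const ch of field) {
    if (ch === "_") {
      upperNext = true;
    } else if (upperNext) {
      out += ch.toUpperCase();
      upperNext = false;
    } else {
      out += ch;
    }
  }
  return out;
}

// What a TypeScript consumer might expect versus what the rule produces.
const expectedKey = "availableMemoryMB";
const actualKey = serdeCamelCase("available_memory_mb");
console.log(actualKey === expectedKey ? "keys match" : `mismatch: got ${actualKey}`);
```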

Grounded in the plugin's lib.rs, commands.rs, models.rs, error.rs,
desktop.rs, mobile.rs, and Cargo.toml.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
React from the apprentice's angle. Components-as-functions and the
hook-as-memory-slot model, with MainExperience.tsx (648 lines) as the
running worked example. Covers useState/useRef/useEffect/useLayoutEffect/
useCallback/useMemo, the fetchSeqRef anti-stale-write pattern, the
rendering model, Zustand stores with selector subscriptions, and
React.StrictMode's dev double-render.
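
The anti-stale-write idea can be shown without React, and this sketch assumes nothing about MainExperience.tsx's actual code. Each request takes a ticket from a counter; inside a component, that counter would be `useRef(0)`. A response commits only if its ticket is still the latest when it resolves. `SeqRef` and `commitIfLatest` are hypothetical names.

```typescript
// Stand-in for a React ref object; in a component this would be useRef(0).
export interface SeqRef {
  current: number;
}

// Claims a ticket synchronously (before the first await), then commits the
// response only if no newer request has claimed a ticket in the meantime.
// Resolves true when the value was committed, false when it was dropped as stale.
export async function commitIfLatest<T>(
  seqRef: SeqRef,
  request: Promise<T>,
  commit: (value: T) => void,
): Promise<boolean> {
  const ticket = ++seqRef.current;
  const value = await request;
  if (ticket !== seqRef.current) return false; // a later request superseded this one
  commit(value);
  return true;
}
```

In a component, `commit` would be a state setter. Suppose a slow response for entry A lands after a fast response for entry B. Without the ticket check, A's response would overwrite B on screen.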

Grounded in main.tsx, App.tsx, MainExperience.tsx, the store/ tree,
and the conventions used at the IPC seam.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
TypeScript from the apprentice's angle, with corpan/packs/sdk/index.d.ts
(223 lines, ambient declarations only) as the worked example. Covers
structural typing, type aliases, string literal unions (SttErrorCode),
optional fields, function types (the HostApi record), generics at the
invoke<T>() seam, utility types (Partial<HostApi>), .d.ts ambient files
as the SDK shipping shape, strict-mode tsconfig settings, paths aliases,
and the noEmit + Vite build split.
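
This sketch shows three of the listed features together. A string literal union is checked exhaustively with `never`. An interface describes a record of functions. `Partial<...>` lets a mock override only the methods a test cares about. The union members, the method names, and `createMock` are hypothetical stand-ins. They do not reproduce the contents of corpan/packs/sdk/index.d.ts.

```typescript
// Hypothetical stand-ins for illustration; the real contract is in
// corpan/packs/sdk/index.d.ts and is not reproduced here.
type SttErrorCode = "model_missing" | "permission_denied" | "busy";

interface HostApi {
  speak(text: string, lang: string): Promise<void>;
  getPackData<T>(key: string): Promise<T | null>;
}

// A switch over a string literal union; the `never` assignment turns a newly
// added, unhandled member into a compile-time error.
function describeSttError(code: SttErrorCode): string {
  switch (code) {
    case "model_missing":
      return "Download a speech model first.";
    case "permission_denied":
      return "Microphone access was denied.";
    case "busy":
      return "Recognition is already running.";
    default: {
      const unhandled: never = code;
      return unhandled;
    }
  }
}

// Partial<HostApi> lets a caller override only the methods it cares about;
// everything else falls back to inert defaults backed by an in-memory map.
function createMock(overrides: Partial<HostApi> = {}): HostApi {
  const data = new Map<string, unknown>();
  const defaults: HostApi = {
    speak: async () => {},
    getPackData: async <T>(key: string): Promise<T | null> =>
      data.has(key) ? (data.get(key) as T) : null,
  };
  return { ...defaults, ...overrides };
}
```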

Grounded in the SDK's index.d.ts, the app's tsconfig.json, and the
patterns used at the Rust/TypeScript seam.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Vite as the build tool driving Corpán's React frontend. Covers the
native-ESM dev model (esbuild per-file transform) vs the production
Rollup bundle, the plugin system with the custom servePacks middleware
as a worked example, the HMR / TAURI_DEV_HOST dance, dual path aliases
(tsconfig + vite), manualChunks for cache stability, and define-time
__APP_VERSION__ substitution.

Grounded in corpan-app/vite.config.ts (135 lines), tauri.conf.json's
devUrl, the package.json scripts, and the dev/build loop.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Documents the actual stack: Tailwind v4 via @tailwindcss/vite,
shadcn/ui new-york style vendored into src/components/ui/, Radix UI
primitives underneath, OKLCH design tokens defined as mirrored sets on
:root and .dark, class-variance-authority for component
variants, and the cn() / tailwind-merge override pattern. Worked
example: the Button component's cva config and the responsive size
prefixes that explain the Apple HIG tap-target call. Notes the
breathe keyframe and the safe-area-inset Tauri plugin.

Opens with a small correction: the original briefing said
"no Tailwind, no framework," which is out of date - per
corpan/CLAUDE.md and vite.config.ts the app uses Tailwind.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Frames packs as the unit of velocity: small self-contained apps that
run inside Corpán at runtime, communicating with the host through a
deliberately small HostApi (the seam introduced in section 07).
Covers the SDK runtime (141 lines: registerGame / createMockHostApi /
mountStandalone), the manifest's three groups of fields (identity,
load, localization), the two install modes (manifest URL vs. zip),
and the end-to-end install + mount loop.

Grounded in packs/sdk/{index.js,index.d.ts,README.md},
packs/README.md, earthgate-reader/manifest.json (53 localized names),
and the host-side content_packs.rs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Walks a pack as a project on disk, using Earthgate Reader as the
running example. Documents the fixed shape (manifest.json + package.json
+ vite.config.ts + tsconfig + index.html + src/ + scripts/ + dist/),
the heavy import set from corpan/packs/shared/, the entry-script pattern
(registerGame writing to window.CorpanGames, reading baseUrl from the
host-injected data attributes, calling createAppShell + createReader),
the deliberate non-import of manifest.json (and why), and the line
between code-only reader packs (Earthgate) and data-bundled packs
(Hanzipan).
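
The entry-script pattern can be sketched without a DOM. The pack's entry registers a module under its id in a shared global registry. The host later looks up the id, mounts the module into a container, and keeps the returned disposer for unmount. `CorpanGames` is the global named above. Several things are assumptions for illustration: the `PackModule` shape, the `register`/`mountPack` helpers, and passing `baseUrl` as an argument. The real pack reads `baseUrl` from host-injected data attributes.

```typescript
// Hypothetical module shape a pack registers; the real SDK types may differ.
interface MountOptions {
  baseUrl: string; // the real pack reads this from host-injected data attributes
}
interface PackModule {
  mount(container: object, opts: MountOptions): () => void; // returns a disposer
}
type Registry = Record<string, PackModule>;
type GlobalLike = { CorpanGames?: Registry };

// Pack side: registerGame-style helper that writes into a registry on a global object.
function register(globalObj: GlobalLike, id: string, mod: PackModule): void {
  const registry: Registry = globalObj.CorpanGames ?? (globalObj.CorpanGames = {});
  registry[id] = mod;
}

// Host side: after the pack script has loaded, look it up, mount it, keep the disposer.
function mountPack(
  globalObj: GlobalLike,
  id: string,
  container: object,
  opts: MountOptions,
): () => void {
  const mod = globalObj.CorpanGames?.[id];
  if (!mod) throw new Error(`pack ${id} did not register`);
  return mod.mount(container, opts);
}
```

Because the host holds the disposer, unmounting never depends on the pack remembering to clean up after itself.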

Grounded in earthgate-reader/{manifest.json, package.json,
src/main.ts, src/game.ts head, scripts/pack.mjs} and the @shared/*
import surface.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The HostApi as the only seam between pack and host. Walks the eleven-
method contract in the SDK's index.d.ts (speech, stack/corpus,
per-pack data, optional stt), the narrower shared/sdk/types.ts that
catalog packs consume, the host's 459-line hostApi.ts implementation
(read-from-store + translate-at-the-seam + structured-error patterns),
the two mock implementations, the mount/unmount lifecycle in
ContentPackHost.tsx, and the deliberate "no backdoor" position that
makes the contract worth maintaining.
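
This is a sketch of the "translate at the seam, return structured errors" idea. The host catches whatever the native layer throws. It then hands the pack a small, stable result shape instead of a raw exception string. `HostResult`, `HostErrorCode`, `callAtSeam`, and the substring matching are all hypothetical. The commit describes the pattern in hostApi.ts but does not show its code.

```typescript
// Hypothetical result shape a pack sees; packs never see raw native error text as a contract.
type HostErrorCode = "not_found" | "permission_denied" | "internal";
type HostResult<T> =
  | { ok: true; value: T }
  | { ok: false; code: HostErrorCode; message: string };

// Wraps a native call (a Tauri invoke in the real host) and maps failures onto
// the small code set above, so the pack-facing contract stays stable even if
// the native error wording changes.
async function callAtSeam<T>(native: () => Promise<T>): Promise<HostResult<T>> {
  try {
    return { ok: true, value: await native() };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    const lower = message.toLowerCase();
    const code: HostErrorCode = lower.includes("not found")
      ? "not_found"
      : lower.includes("permission")
        ? "permission_denied"
        : "internal";
    return { ok: false, code, message };
  }
}
```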

Grounded in packs/sdk/index.d.ts, packs/shared/sdk/{types.ts,
mockHostApi.ts}, corpan-app/src/contentPacks/{hostApi.ts head,
types.ts, ContentPackHost.tsx existence}, and the read-only SQL
gate at lib.rs:90.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Frames the catalog as a shared *library* (not a pack) under
corpan/packs/shared/catalog/ that wraps reading-style packs in a
consistent command-drawer chrome with library/browse/book-detail/
narrator-detail/install/exit. Covers appShell.ts as the orchestrator
of dispose-remount-on-book-switch, the pure searchFilter.ts as the
data-side hygiene that earns the library its modular reputation,
the reader/shell handshake, the CSS-custom-property theming that
lets Earthgate and Stargate share chrome with distinct palettes,
and the window.__corpanI18n bridge as the one acknowledged dent
in the no-backdoor principle.

Grounded in shared/catalog/{index.ts, src/types.ts, src/appShell.ts
head + size, src/searchFilter.ts, src/narratorDetail.ts existence,
src/catalog.css principle}.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Documents the two shapes in corpan/packs/shared/state/ (261 lines
total): per-pack factory stores (bookMetaStore, bookmarkStore,
prefsStore) that namespace localStorage by a pack-chosen prefix,
and cross-pack singleton stores (narrationHistoryStore, drawerStore)
built on zustand/vanilla with the persist middleware. Worked
example: bookMetaStore's hasChapters cache and the layout-shift
rationale documented in its docstring. Maps the three persistence
surfaces (host Zustand stores, @shared/state, @shared/catalog
libraryStore) and why no fourth "host state visible from packs"
pattern exists.
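
The per-pack factory shape can be sketched without zustand. A factory takes the pack-chosen prefix and a storage backend, and namespaces every key it writes with that prefix. Two packs using the same factory therefore cannot clobber each other. `StorageLike` mirrors the subset of the Web Storage API (`getItem`/`setItem`) that the sketch needs. `createBookmarkStoreSketch`, its key layout, and the prefixes in the usage are hypothetical, not the code in bookmarkStore.ts.

```typescript
// The subset of the Web Storage API (localStorage) this sketch relies on.
interface StorageLike {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical bookmark record: where the reader left off in a book.
interface Bookmark {
  chapter: number;
  segment: number;
}

// Factory: one store instance per pack, every key namespaced by the pack's prefix.
function createBookmarkStoreSketch(prefix: string, storage: StorageLike) {
  const keyFor = (bookId: string) => `${prefix}:bookmark:${bookId}`;
  return {
    get(bookId: string): Bookmark | null {
      const raw = storage.getItem(keyFor(bookId));
      return raw === null ? null : (JSON.parse(raw) as Bookmark);
    },
    set(bookId: string, mark: Bookmark): void {
      storage.setItem(keyFor(bookId), JSON.stringify(mark));
    },
  };
}

// In-memory stand-in for localStorage so the sketch runs outside a browser.
function memoryStorage(): StorageLike & { keys(): string[] } {
  const data = new Map<string, string>();
  return {
    getItem: (k) => data.get(k) ?? null,
    setItem: (k, v) => { data.set(k, v); },
    keys: () => [...data.keys()],
  };
}
```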

Grounded in shared/state/{bookMetaStore.ts, bookmarkStore.ts,
prefsStore.ts, narrationHistoryStore.ts, drawerStore.ts}.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Documents transportBar.ts (303 lines, plain TypeScript / imperative
DOM) as the bottom-of-screen playback surface that catalog packs
share. Covers the eight-setter / nine-event contract, the layout
comment that documents the visual design better than any class
diagram, the setHasChapters + bookMetaStore handshake that prevents
first-frame layout shift on returning reads, the classPrefix theming
that lets Earthgate and Stargate share the bar with distinct palettes,
the bidirectional engine seam (events in, setters out, engine is
the source of truth), and the native-keepalive integration that
unifies on-screen, lock-screen, AirPods, and Bluetooth controls
into the same reader callbacks.

Grounded in shared/ui/transportBar.ts header + contract, shared/audio/
{audioEngine.ts, mediaSessionAnchor.ts, nativeKeepAlive.ts} sizes,
shared/state/bookMetaStore.ts, and the reader's wiring pattern.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SQLite from the apprentice's angle, with the Corpán content database
as the running example. Walks the data model (tables, foreign keys,
indexes), the seven-table schema from corpan/dja/cor/models.py
(Language, Domain, Entry, Translation, Narrator, Pack, PackEntry),
the embed-write-mmap pattern in db.rs that replaced the
sqlite3_deserialize path after ANRs/SIGABRTs on lower-end Android,
the PRAGMA setup, the single-mutex single-connection lifecycle,
and the per-pack queryPackDb story with the four-statement allowlist
in ensure_readonly_sql.
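
This is a TypeScript sketch of a statement-type allowlist gate in the spirit of ensure_readonly_sql. The real gate is Rust, in lib.rs, and its four allowed statement types are not listed here, so the allowlist below is a hypothetical placeholder. Any prefix check like this has two limits:

- SQLite accepts `WITH ... DELETE`, so a leading keyword alone does not prove a query is read-only.
- The single-statement check is coarse: it also rejects a `;` inside a string literal.

Opening the database read-only is the stronger guarantee.

```typescript
// Hypothetical allowlist; NOT the app's actual four statement types.
const ALLOWED_EXAMPLE = ["SELECT", "WITH", "EXPLAIN", "VALUES"] as const;

// Throws unless `sql` is a single statement whose first keyword is allowlisted.
function ensureReadonlySqlSketch(sql: string, allowed: readonly string[]): void {
  // Skip leading whitespace, -- line comments, and /* block */ comments.
  const stripped = sql.replace(/^(?:\s|--[^\n]*(?:\n|$)|\/\*[\s\S]*?\*\/)*/, "");
  const keyword = (stripped.match(/^[A-Za-z]+/)?.[0] ?? "").toUpperCase();
  if (!allowed.includes(keyword)) {
    throw new Error(`statement type not allowed: ${keyword || "(empty)"}`);
  }
  // Allow one optional trailing semicolon; any other ';' means a second statement.
  const body = stripped.replace(/;\s*$/, "");
  if (body.includes(";")) {
    throw new Error("multiple statements are not allowed");
  }
}
```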

Grounded in src-tauri/src/db.rs (57 lines), dja/cor/models.py (161
lines), the ensure_readonly_sql gate at lib.rs:90, and the
get_random_entry_with_translations command at lib.rs:497.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Documents the three JSON files per audiobook: manifest.json (book
identity + cataloging metadata), segments.json v2.0.0 (authored text
in chapter/segment shape with text vs. tts.text discipline and the
dialog/text/image block_types), audio_manifest_<lang>.json (per-
language render with file path, duration, pause_after_ms, and
word-level start_ms/end_ms from whisper-cpp forced alignment).
Walks the corresponding TypeScript types in shared/core/types.ts,
the buildTimeline reconciliation that produces TimelineWord, and
the explicit boundary cases where JSON gives way to SQLite (the
phrase corpus) or M4A/AAC (the audio).
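
This is a hedged sketch of the kind of reconciliation buildTimeline performs. It rests on two assumptions. First, each segment's word times are relative to the start of that segment's own audio file. Second, playback inserts `pause_after_ms` of silence after each segment. Under those assumptions, a word's absolute time is the running sum of earlier segments' durations and pauses, plus the word's local offset. The interface shapes, including the `duration_ms` and `word` field names, are also assumptions; the real types live in shared/core/types.ts.

```typescript
// Assumed shapes, loosely following the audio_manifest_<lang>.json fields named above.
interface ManifestWord {
  word: string;      // assumed field name for the display text
  start_ms: number;  // relative to the start of this segment's file
  end_ms: number;
}
interface ManifestSegment {
  file: string;
  duration_ms: number;    // assumed field: length of this segment's audio file
  pause_after_ms: number; // silence inserted after the segment during playback
  words: ManifestWord[];
}
interface TimelineWord {
  text: string;
  startMs: number; // absolute position in the whole book
  endMs: number;
  segmentIndex: number;
}

// Converts per-file word offsets into one absolute timeline across the book.
function buildTimelineSketch(segments: ManifestSegment[]): TimelineWord[] {
  const timeline: TimelineWord[] = [];
  let offsetMs = 0; // absolute start of the current segment
  segments.forEach((seg, segmentIndex) => {
    for (const w of seg.words) {
      timeline.push({
        text: w.word,
        startMs: offsetMs + w.start_ms,
        endMs: offsetMs + w.end_ms,
        segmentIndex,
      });
    }
    offsetMs += seg.duration_ms + seg.pause_after_ms;
  });
  return timeline;
}
```

Take a first segment of 1,000 ms followed by a 200 ms pause. A word at 100 ms into the second segment lands at 1,300 ms on the book timeline.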

Grounded in shared/core/types.ts, shared/data/segmentLoader.ts head,
a real segments.json from ai-this-week, an audio_manifest_en.json
from fascinating-curiosities, and a book manifest from fascinating-spies.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Documents the three classes of audio assets: rendered narrations (AAC
64 kbps M4A mono 24 kHz at -22 LUFS / -3 dBTP, sitting on S3 +
CloudFront), voice clone references (15s WAV, working copies on Jeff's
disk + durable copies in S3), and in-zip vocal samples (16-bit PCM
WAV at 24 kHz). Walks the fixed ffmpeg mastering chain (gain norm ->
HPF 80Hz -> declicker -> FFT denoise -> noise gate -> compressor 2:1
-> limiter -> AAC encode), the iOS WebKit Opus-in-OGG silent-fail
story that drove the WAV choice for in-zip samples, the Fascinating
Curiosities pipeline at full scale (12 books x 23 langs ~= 7.5GB),
and the runtime split where pack zips ship only manifests and audio
streams from CloudFront.
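
This sketch expresses the fixed chain as ffmpeg arguments. The stage order comes from the commit. The values that come from the text are:

- 80 Hz high-pass
- 2:1 compression
- -22 LUFS / -3 dBTP
- 64 kbps, mono, 24 kHz AAC

Everything else is an assumption: the mapping of each stage to a specific ffmpeg filter (`loudnorm`, `highpass`, `adeclick`, `afftdn`, `agate`, `acompressor`, `alimiter`) and all other parameters. Note that `alimiter` takes a linear sample-peak ceiling, so 0.708 (about -3 dBFS) only approximates a true-peak target.

```typescript
// Stage order from the commit; filter choices and unstated parameters are assumptions.
const MASTERING_FILTERS: readonly string[] = [
  "loudnorm=I=-22:TP=-3",  // gain normalization toward -22 LUFS / -3 dBTP
  "highpass=f=80",         // HPF at 80 Hz
  "adeclick",              // declicker
  "afftdn",                // FFT-based denoise
  "agate",                 // noise gate
  "acompressor=ratio=2",   // 2:1 compressor
  "alimiter=limit=0.708",  // limiter, linear ceiling of about -3 dBFS
];

// Builds an ffmpeg argument list: filter chain, then AAC 64 kbps mono 24 kHz into M4A.
function masteringArgs(input: string, output: string): string[] {
  return [
    "-i", input,
    "-af", MASTERING_FILTERS.join(","),
    "-c:a", "aac",
    "-b:a", "64k",
    "-ac", "1",
    "-ar", "24000",
    output,
  ];
}

// Rough size estimate: bytes = seconds * kbps * 1000 / 8 (ignores container overhead).
function estimateBytes(seconds: number, kbps = 64): number {
  return (seconds * kbps * 1000) / 8;
}
```

Passing these arguments to Node's `child_process.spawn("ffmpeg", masteringArgs(...))` would run the chain in one pass. The size helper also sanity-checks the scale figure. One hour at 64 kbps is 28.8 MB. 7.5 GB spread over 12 x 23 = 276 renders is roughly 27 MB each, a bit under an hour of audio per book per language.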

Grounded in corpan/NARRATION_SYSTEM.md, the auto-memory note on iOS
WebKit audio codecs, voices/scripts/sample_clone_premaster_*.py, and
the infra hydrate scripts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Maps where Python lives: corpan/dja (Django CMS), corpan/infra
scripts (catalog/narrator generation, captures pipeline), the
tools/ subtrees, the smaller Django sub-projects (arb/djarb,
panko/djpanko, total-history/djistory), the voices/scripts
experiments, and the out-of-repo ttsctl narration pipeline. Maps
where Python is NOT: never on the user's device, never in the
Tauri binary, never in a pack zip. Walks the producer-consumer
boundary (Python produces files; the runtime reads them; nothing
crosses) and the two cases where Python gives way to shell or
native (pure file-system pipelines, hot inner loops).

Grounded in corpan/dja/cor/models.py, corpan/NARRATION_SYSTEM.md
on the pipeline shape, the infra/captures tree, and the per-
subsystem requirements convention.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Documents ChatterboxMultilingualTTS as the zero-shot voice-cloning
TTS that renders 23-language audiobook narrations. Walks one
Chatterbox call shape, the tts.text vs. text discipline (no raw
digits, no dashes in phonetic nudges), per-language voice mapping
in narration.yaml, the six generation parameters, and the
convergence loop (generate -> align -> validate -> trim -> master,
retrying for up to 40 attempts with TTS params jittered by 25%
before falling back to a Claude-subagent tts.text rewrite). Notes the current shipped scale
(7 books, 41 packs, 35k segments, 10 languages).
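
This sketch covers the retry half of that loop under three assumptions:

- The first attempt uses the base parameters.
- Later attempts multiply each numeric parameter by a uniform factor in [0.75, 1.25), which is one reading of "25% jittered".
- A run that still fails after 40 attempts hands off to the text-rewrite fallback.

`generate` and `validate` stand in for the Chatterbox render and the align/validate steps. The parameter names and values in the usage are illustrative, not narration.yaml's.

```typescript
type TtsParams = Record<string, number>;

// Multiplies every parameter by an independent factor in [1 - fraction, 1 + fraction)
// when rng returns values in [0, 1), as Math.random does.
function jitter(base: TtsParams, fraction: number, rng: () => number): TtsParams {
  const out: TtsParams = {};
  for (const [name, value] of Object.entries(base)) {
    out[name] = value * (1 + (rng() * 2 - 1) * fraction);
  }
  return out;
}

interface ConvergeResult<A> {
  audio: A | null; // null means every attempt failed validation
  attempts: number;
  needsTextRewrite: boolean;
}

// Attempt 1 uses the base params; later attempts jitter them around the base.
async function converge<A>(
  base: TtsParams,
  generate: (params: TtsParams) => Promise<A>,
  validate: (audio: A) => Promise<boolean>,
  rng: () => number = Math.random,
  maxAttempts = 40,
  fraction = 0.25,
): Promise<ConvergeResult<A>> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const params = attempt === 1 ? base : jitter(base, fraction, rng);
    const audio = await generate(params);
    if (await validate(audio)) return { audio, attempts: attempt, needsTextRewrite: false };
  }
  return { audio: null, attempts: maxAttempts, needsTextRewrite: true };
}
```

Jittering around the base, rather than drifting cumulatively from the previous attempt, keeps every retry within 25% of the tuned values.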

Grounded in corpan/NARRATION_SYSTEM.md (the canonical authoring
doc) and the auto-memory's TTS notes (no dashes in nudges).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Whisper as it appears in two deployments: (1) offline forced
alignment with stable-ts + Whisper medium on the Spark, producing
the words[] arrays in audio_manifest_<lang>.json with the
display-text-in-manifest rule that preserves user-facing spellings
even when tts.text used phonetic nudges; (2) on-device speech-to-
text with whisper.cpp through the tauri-plugin-stt (Android CPU
NEON, iOS Metal XCFramework) driving the pronunciation coach.
Walks the WhisperParams pass-through, the initial_prompt bias for
low-resource non-Latin scripts, and the availableMemoryMB gate
for runtime model upgrades.
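
The availableMemoryMB gate reduces to a threshold table. The thresholds below are hypothetical, and so is the choice of models; only the Whisper size names themselves are real. The point is the shape: pick the largest model whose memory floor the device clears, and otherwise fall back to the smallest.

```typescript
// Hypothetical floors in MB, largest model first; the app's real thresholds are not in this PR.
const MODEL_FLOORS: ReadonlyArray<{ model: string; minAvailableMB: number }> = [
  { model: "small", minAvailableMB: 1500 },
  { model: "base", minAvailableMB: 600 },
  { model: "tiny", minAvailableMB: 0 },
];

// Returns the largest model whose floor the reported memory clears.
// A missing, non-finite, or negative reading falls back to the smallest model.
function pickWhisperModel(availableMemoryMB: number | undefined): string {
  const smallest = MODEL_FLOORS[MODEL_FLOORS.length - 1].model;
  if (availableMemoryMB === undefined || !Number.isFinite(availableMemoryMB)) {
    return smallest;
  }
  const mb: number = availableMemoryMB;
  const hit = MODEL_FLOORS.find((f) => mb >= f.minAvailableMB);
  return hit ? hit.model : smallest;
}
```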

Grounded in corpan/NARRATION_SYSTEM.md (Whisper Alignment section),
the tauri-plugin-stt models/commands modules already documented in
section 05, PIPELINE_STATE.md (the large-v3 calibration decision),
and RUNBOOK_QUANTIZE_LARGE_Q8.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Documents the DGX Spark GB10 (Blackwell sm_121, 128 GB unified
memory, CUDA 13.0, PyTorch cu130) as the project's GPU workstation
and the only place Chatterbox + Whisper + ffmpeg run at production
scale. Covers Tailscale-fronted access (no public SSH), the
"develop locally / run on Spark" workflow as a six-step loop
(author -> push -> kick off -> Spark renders -> publish -> hydrate),
and what lives Spark-side and only Spark-side: the ttsctl tool
itself at ~/projects/ttsctl/, the per-decision changelog under
~/projects/ttsctl/changelog/decisions/, the model caches, and the
intermediate WAVs. Names the unified-memory benefit (no PCIe
shuffle between Chatterbox GPU output and ffmpeg CPU mastering)
as the GB10's specific virtue for this workload.

Grounded in corpan/NARRATION_SYSTEM.md "Hardware" section,
PIPELINE_STATE.md teammate-context snapshot, and the auto-memory.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Documents the two AWS S3 buckets in us-east-2 (corpan-prod as
the production data plane fronted by CloudFront
d38iwc9748jekz.cloudfront.net; corpan-assets for marketing /
developer-facing assets), the catalog.json runtime read path,
the producer-consumer choreography of "audio first, zip second,
catalog third, invalidate last", and the hydration scripts
(hydrate-audio.sh, hydrate-voices.sh, hydrate-marketing.sh,
hydrate-captures.sh) that make the producer-consumer split
hospitable for local dev. Names the AWS auth profiles and the
Cache-Control: max-age=60, stale-while-revalidate=300 catalog
caching behavior with the reader's ?_t= cache-buster.
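
Two small pieces of that choreography are sketched here. The first appends a `?_t=` cache-buster with the standard `URL` API. The second is a publisher that runs the four steps strictly in sequence, so a catalog never points at an object that has not been uploaded yet. The step functions and the example CloudFront host are placeholders; the real steps are the shell scripts and ttsctl publish.

```typescript
// Sets (or replaces) a _t query parameter so the request bypasses cached copies.
function withCacheBuster(url: string, now: number = Date.now()): string {
  const u = new URL(url);
  u.searchParams.set("_t", String(now));
  return u.toString();
}

// Publish order: audio first, zip second, catalog third, invalidate last.
// Each step awaits the previous one, so a failure in any step stops every later
// step and the catalog is never updated before its audio and zip exist.
type PublishStep = () => Promise<void>;
async function publishInOrder(steps: {
  audio: PublishStep;
  zip: PublishStep;
  catalog: PublishStep;
  invalidate: PublishStep;
}): Promise<string[]> {
  const done: string[] = [];
  for (const name of ["audio", "zip", "catalog", "invalidate"] as const) {
    await steps[name]();
    done.push(name);
  }
  return done;
}
```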

Grounded in infra/sync-voices-to-s3.sh, infra/hydrate-audio.sh,
infra/sync-marketing-to-s3.sh, infra/captures/build-and-upload.sh,
and corpan/NARRATION_SYSTEM.md's "Publishing" section.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The synthesis section the briefing flagged as critical. Maps the
four state locations (repo, Spark, S3+CloudFront, user's device)
plus a fifth (GitHub for the canonical remote and the binary
release path) and the named seams that synchronize them: git
push/pull and rsync over Tailscale for repo <-> Spark, ttsctl
publish and infra/sync-*-to-s3.sh for Spark/repo -> S3, CloudFront
plus the running app for S3 -> device, and the deliberate
"nothing" for device -> anywhere (no cloud sync, no accounts).
Walks the publish ordering invariants, what gets touched on a
code change vs. on a narration publish, and what does NOT
synchronize and would have to be reconstructed if lost.

Grounded in sections 22 (Spark) and 24 (S3) just written, the
infra script families, PIPELINE_STATE.md, and CHANGELOGS.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Babylon.js + @babylonjs/loaders in Hover Runner and Juice Squeeze;
Phaser 3.80 in Quest-Ear; Tone.js in Melopan (off-branch). Walks the
SVG-to-GLB Blender build pipeline in hover-runner/scripts/svg_to_3d_v2.py,
the construct-on-mount / dispose-on-unmount discipline that applies
across every engine, and why each pack picks its own engine instead
of standardizing.

Grounded in hover-runner/{package.json, scripts/svg_to_3d_v2.py},
juice-squeeze/package.json, quest-ear/package.json, and the
melopan-2026-05 auto-memory note.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The infra/captures pipeline: iPad screen recording -> sidecar -> four
ffmpeg-derived variants (long 1200x1600, shorts 1080x1920 with blur-pad,
square 1080x1080, thumb 1280x720) -> S3 mirror at corpan-assets/captures/
-> YouTube upload via the corpan-yt Python click CLI. Covers the
yuvj420p->yuv420p pixel-format (full-to-limited range) discipline, the mandatory-sidecar
workflow, the slug naming convention, the YouTube videos.insert
daily quota handling, and the branding/ channel-level assets.
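
The blur-pad shorts variant works in three steps:

1. Scale the source frame to fit inside 1080x1920.
2. Scale a copy of the same frame to cover 1080x1920, center-crop it, and blur it.
3. Overlay the fitted frame, centered, on the blurred background.

The sketch builds that ffmpeg filter graph and lists the four target sizes from the commit. The filter graph, the blur radius, the audio mapping, and the helper names are assumptions, not what build-capture.sh actually runs.

```typescript
// Target sizes from the captures pipeline.
const VARIANTS = {
  long: { w: 1200, h: 1600 },
  shorts: { w: 1080, h: 1920 },
  square: { w: 1080, h: 1080 },
  thumb: { w: 1280, h: 720 },
} as const;

// Blur-pad: the foreground is scaled to fit inside w x h; the background is the
// same frame scaled to cover w x h, center-cropped, and blurred; the foreground is
// then overlaid centered. format=yuv420p forces the broadly compatible pixel format.
function blurPadFilter(w: number, h: number, blur = 20): string {
  return [
    `[0:v]split=2[bg][fg]`,
    `[bg]scale=${w}:${h}:force_original_aspect_ratio=increase,crop=${w}:${h},boxblur=${blur}[bgb]`,
    `[fg]scale=${w}:${h}:force_original_aspect_ratio=decrease[fgs]`,
    `[bgb][fgs]overlay=(W-w)/2:(H-h)/2,format=yuv420p[out]`,
  ].join(";");
}

// ffmpeg arguments for the shorts variant; "0:a?" maps the source audio if present.
function shortsArgs(input: string, output: string): string[] {
  const { w, h } = VARIANTS.shorts;
  return [
    "-i", input,
    "-filter_complex", blurPadFilter(w, h),
    "-map", "[out]",
    "-map", "0:a?",
    "-c:a", "copy",
    output,
  ];
}
```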

Grounded in infra/captures/CAPTURES.md, build-capture.sh,
build-and-upload.sh, and youtube/pyproject.toml.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bundle id com.corpora.corpan, team F9AV5HKF6N, iOS 16.0 minimum.
The XcodeGen-driven regen pattern (ios-gen.sh produces gen/apple/
from src-tauri/ios/project.yml; gen/ is not hand-edited). Covers
the Swift plugin side of tauri-plugin-stt and friends, capabilities
required (IAP, Background Audio, Microphone, Speech Recognition),
PrivacyInfo.xcprivacy declarations, the StoreKit test config in
Corpan.storekit, the Apple Feedback Assistant try-many-URLs fallback,
and the App Store submission flow.

Grounded in src-tauri/ios/project.yml, scripts/ios-gen.sh references
in APP_RELEASE_0_11_3.md, infra/IAP_SETUP_RUNBOOK.md,
RELEASE_NOTES_0.13.1.md, and the iOS feedback command in lib.rs:1232.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Tauri Android target with its specific rough edges: patch-android.sh
pins compileSdk=36, targetSdk=36, ndkVersion=28.2.13676358, Java/Kotlin 17;
gen/android/ is generated and not hand-edited; tauri-plugin-iap
contributes com.android.vending.BILLING; the vendored ndk-context fork
and prevent_exit discipline are the response to the Activity-re-init
crash chain (walked in section 04). Covers the Kotlin plugin side, whisper.cpp
on CPU+NEON for pronunciation coach, release signing via upload-keystore.jks
(not in git), and Play Console metadata mirroring iOS.

Grounded in APP_RELEASE_0_11_3.md, scripts/patch-android.sh references,
corpan/CLAUDE.md Android section, Cargo.toml's [patch.crates-io] block,
and RELEASE_NOTES_0.12.7_ANDROID.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Umanistan and others added 11 commits May 29, 2026 16:30
macOS / Windows / Linux as the third target family. Same Tauri
binary, OS-provided WebView (WKWebView / WebView2 / WebKitGTK),
1200x1000 window default per tauri.conf.json. The deliberate
asymmetry: full reader + catalog + marketing-site embed, but STT
stubs out (the pronunciation coach is not desktop-shipping today)
and IAP / audio-keepalive behave differently. Mac App Store signing
identity committed in tauri.conf.json; notarization and Windows
EV code signing deferred.

Grounded in tauri.conf.json's app.windows and bundle.macOS sections,
plugins' src/desktop.rs stubs (the STT one explicitly walked),
and the iOS WebKit codec story from section 18.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The directory of the five general-purpose languages (TypeScript,
Rust, Python, Kotlin, Swift) plus the supporting languages (HTML,
CSS, SQL, YAML, JSON, Markdown, LaTeX, Lua). Maps the
one-language-per-concern table, the canonical entry point per
language, and the decision tree for picking which language a new
piece of work belongs in.

Cross-references the deep-dive sections that already cover each
language: TypeScript (07), Rust (05), Python (19), Kotlin (28),
Swift (27).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
bash scripts as the glue layer with the set -euo pipefail discipline,
the comment-block-as-docs convention, and the deliberate "shell as
glue, Python as logic" split. Walks the directory map (infra/,
infra/captures/, corpan-app/scripts/, web/scripts/, voices/scripts/),
the collaboration with jq/aws/ffmpeg/curl/unzip, the bootstrap-vs-
working flavors, and the .env-or-environment credentials dance.

Grounded in the existing scripts I read for sections 22, 24, 25, 27,
and 28: sync-voices-to-s3.sh, hydrate-audio.sh, build-capture.sh,
build-and-upload.sh, sync-marketing-to-s3.sh, ios-gen.sh references,
patch-android.sh references.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Four managers, one per language family: npm + package-lock.json
(committed), Cargo + Cargo.lock (gitignored - the documented
rusqlite/sqlx exception), pip / uv with per-subtree
requirements.txt or pyproject.toml, and Homebrew for the
system-binary layer the shell scripts call. Per-subsystem
manifests over a monorepo-wide hoist (Vite/zustand can vary
between packs without coordination), npm ci in CI vs. npm install
locally, --legacy-peer-deps for the older packs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The practitioner's view: CLAUDE.md/AGENTS.md as agent-facing docs
at subtree roots, plan mode vs action mode, three concurrent
worktrees as the routine parallel primitive, the auto-memory
contract at ~/.claude/projects/.../memory/, the pr-agent loop on
every PR (Codium PR-Agent, not a gate), and the discipline of
reading the diff before approving the agent's claim.

Grounded in corpan/CLAUDE.md existence, the various AGENTS.md
files, .github/workflows/pr-agent.yml (already documented in
section 03), and PIPELINE_STATE.md's role as a maintained snapshot.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The complement to section 33: the inventory of judgments humans
still hold (architectural, product, taste), the trap of deferring
judgment to the agent, the shift in skill emphasis toward reading
code and writing prose, and the mitigations the codebase encodes
(CLAUDE.md/AGENTS.md as defaults, reading-the-diff discipline,
auto-memory feedback entries, PR review by Skylar).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Dated speculation (2026-05-29 snapshot) about where the project is
pointed: near-term (more languages and books, mature pronunciation
coach, web codex), medium-near (device-to-cloud user state, Brewfile,
deterministic gen/ rebuilds, catalog-versioned pack delivery), and
larger bets (on-device TTS catching up, the corpora platform
absorbing graduated components, the durability of the agent-era
patterns).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Dated snapshot (2026-05-29) of the 90-day arc on upstream/main:
the IAP rewrite for App Review, the catalog v2 narrator-first
rewrite, analytics hardening for WKWebView, the 10k phrase
corpus slim with nine new languages, World Radio's native
streams via tauri-plugin-radio-stream, pronunciation coach on
Android CPU+whisper.cpp, the DGX-driven catalog publisher,
Earthgate/Hanzipan polish, and the Parlometron / phrase-pack
architecture push through 0.13.x and 0.15.x. Closes with three
architectural shifts: catalog became narrator-first, audio
runtime became fully native, on-device pronunciation matured.

Grounded in 'git log --since=90.days' against upstream/main.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… Where to Look)

A: proper-noun glossary clustered by people/products/packs/pipelines/
hardware/build-terms/files-and-paths.
B: file/directory/commit/PR/changelog/release-notes conventions plus
the Codex section template rules.
C: ~30 most-run commands grouped by setup / run / build / publish /
inspect / recover.
D: reading list cross-referencing every external book/paper/talk the
Codex points at.
E: reverse index from "I want to understand X" to specific file +
section, plus the 20-minute starter set.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Single-file edition of the full Codex: README front matter, all 36
numbered sections, all 5 appendices, separated by horizontal rules.
12,439 lines of plain markdown for offline reading in any markdown
viewer or in a terminal with 'less codex/CODEX.md'.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Remove the two intentional em dashes (in section 33 and Appendix B,
  where the text referenced the character literally). Rephrase to
  describe the rule without quoting the character.
- Replace Korean sample in section 16 (SQLite) with romanized form;
  the point about translation rows in a non-Latin script is the same
  with Latin characters and the example reads cleanly.
- Replace Punjabi initial_prompt sample in section 21 (Whisper) with
  a descriptive placeholder; the technique is what matters, not the
  literal script.
- Replace media-control glyphs in section 15 (Pack Transport) ASCII
  diagram with plain-text labels (|< [-30] [play/pause] [+30] >|).
- Regenerate codex/CODEX.md and codex/CODEX.pdf.

PDF now renders with zero missing-glyph warnings.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>