feat(plugins): scheduled sync, metadata enrichment, and detailed progress by AshDevFr · Pull Request #33 · AshDevFr/codex

AshDevFr · 2026-06-07T05:42:58Z

Summary

Expands the plugin integration system with four capabilities: admin-scheduled automatic sync, opt-in metadata enrichment of sync and recommendation payloads, accurate per-book reading progress in the sync protocol, and a set of "echo" debug plugins that record host→plugin traffic to disk. Together these let integration authors build richer, rules-driven plugins, fix a long-standing progress-accuracy bug for libraries with volume gaps, and give operators a credential-free way to inspect what the server actually sends to plugins.

Motivation

Previously a plugin sync only ran when a user clicked "sync", and the sync payload carried reading progress as a relative count of books read, which misreports the true position for any library that doesn't start at volume 1. Plugin authors also had no way to build behavior driven by series metadata (tags, genres, summaries) because that data was never sent, and no way to inspect sync or recommendation payloads without wiring up a real external account and reading interleaved server logs. This work addresses all of those: keeping integrations current automatically on an admin-controlled cadence, shipping richer data under user control, sending correct progress numbers, and making the wire protocol inspectable.

Changes

- Operators / admin: Admins can set a per-plugin cron cadence for automatic sync (capability-gated to sync-capable plugins, validated, applied without a server restart). When the schedule fires, the server enqueues a sync only for connections that are enabled, authenticated, and opted into automatic sync, with no duplicate stacking across ticks.
- Web UI / integrations: Users get an "Automatic sync" toggle on connected sync plugins (manual by default, so nothing changes until opted in), plus per-field metadata toggles (send tags, send genres, send bibliographic metadata, send custom metadata) shown only for plugins that consume enriched data. Each toggle is independent so a rules plugin can ship just tags without heavy summaries; custom metadata is called out as a privacy-sensitive export.
- API: New user endpoints to set sync mode and metadata-sharing preferences; plugin responses now expose the new wantsFullMetadata and wantsDetailedProgress capabilities and the stored sync schedule.
- Sync protocol: Sync entries can now carry series tags, genres, a bibliographic block, and custom metadata (all opt-in). Progress now includes the highest read volume and chapter derived from detected per-book numbers, fixing inaccurate reporting for non-contiguous libraries, plus an optional per-book breakdown for plugins that request it. The existing relative count is unchanged for backward compatibility.
- Official AniList plugin: Now reports the accurate highest volume/chapter when available, falling back to the previous behavior against older hosts.
- Plugin SDK & debug tooling: SDK types gain the new capability and data-directory fields, and new sync-echo and recommendations-echo plugins (alongside the existing metadata echo) record each request and response as paired JSON files in their data directory, with credentials redacted, for protocol inspection.
- Docs: Plugin and integration docs describe scheduled sync, the enrichment toggles and their privacy/payload trade-offs, the new progress fields, and the echo plugins' payload recording.

Notes

- Adds one additive, nullable column to the plugins table for the sync schedule; existing installs are unaffected until an admin sets a cadence. No other schema changes.
- All new behavior is inert by default: scheduled sync requires both an admin cadence and a user opt-in, and every enrichment/progress-detail field is gated by a plugin capability plus (where applicable) a default-off user toggle, so existing plugins and payloads are unchanged.
- The heavier metadata block and per-book breakdown scale with library size; they are opt-in and capability-gated for that reason, while the small highest-volume/chapter accuracy fix flows to every sync plugin without opt-in.

Lay the groundwork for admin-scheduled, per-user-opt-in plugin syncs: an admin sets a cron cadence per plugin, and each user chooses whether their connection syncs automatically on that cadence.

- Add nullable `plugins.sync_cron_schedule` column (migration + entity). NULL means no scheduled sync, so existing installs are unchanged.
- Add `PluginsRepository::list_sync_scheduled()` returning enabled plugins with a cron set, for the scheduler to load at boot/reload.
- Add `user_plugins::Model::auto_sync_enabled()` reading the host-side `config._codex.autoSync` opt-in (default false, manual-only). The `_codex` namespace constant now lives in codex-db and is re-used by codex-tasks so the parser and accessor share one source.

No behavior change yet: the column is inert until the scheduler and API surface are wired up. Includes unit tests for the repo query and the auto-sync accessor.
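The opt-in accessor can be sketched in TypeScript as below. This is an illustration only: the real accessor is the Rust `user_plugins::Model::auto_sync_enabled()`, and the `autoSyncEnabled` function and the two interfaces are hypothetical names. Only the `config._codex.autoSync` key and its default of false come from the commit; treating anything other than a literal `true` as "manual" is an assumption.

```typescript
// Hypothetical TypeScript mirror of the host-side accessor (the real one is Rust).
interface CodexNamespace {
  autoSync?: unknown;
  [key: string]: unknown;
}

interface ConnectionConfig {
  _codex?: CodexNamespace;
  [key: string]: unknown;
}

// Assumption: only a literal `true` opts in; a missing or malformed value
// falls back to the default (false, manual-only).
function autoSyncEnabled(config: ConnectionConfig | null | undefined): boolean {
  return config?._codex?.autoSync === true;
}
```

Under these assumptions, a connection with no `_codex` block, or with no stored config at all, stays manual.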

When an enabled, sync-capable plugin has an admin-configured cron, the scheduler now registers one cron entry per plugin and, on each firing, enqueues a UserPluginSync task for every connected user whose connection is enabled, authenticated, and opted into auto sync (config._codex.autoSync).

- Add Scheduler::load_plugin_sync_schedules() + add_plugin_sync_schedule(), wired into start() so reload_schedules() applies cron changes without a restart. Plugins lacking the user_read_sync capability are skipped.
- Add fan_out_plugin_sync(): selects eligible connections, skips any with a pending/processing sync (reusing the existing per-user dedup), and logs a one-line summary of the outcome.

Also fix a dedup bug this surfaced: enqueue coalesced UserPluginSync by task_type alone (the type has no FK columns and no dedup_params), so every user's fan-out task would have collapsed into one (and the manual path was latently affected too). Add TaskType::plugin_user_dedup() and dedup UserPluginSync on (plugin_id, user_id) in find_existing_task, so different users keep independent tasks while a repeat for the same user coalesces. Includes scheduler and task-queue tests.
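The eligibility filter and the (plugin_id, user_id) dedup can be pictured with the following minimal sketch. It is not the Rust `fan_out_plugin_sync()` implementation: the `Connection` shape, `dedupKey`, `fanOut`, and the in-memory `Set` standing in for pending/processing tasks are all hypothetical.

```typescript
// Hypothetical connection record; field names are illustrative only.
interface Connection {
  pluginId: string;
  userId: string;
  enabled: boolean;
  authenticated: boolean;
  autoSync: boolean;
}

// One in-flight sync per (plugin, user) rather than per task type.
function dedupKey(pluginId: string, userId: string): string {
  return JSON.stringify([pluginId, userId]);
}

// `inFlight` stands in for the pending/processing tasks already in the queue.
function fanOut(
  connections: Connection[],
  inFlight: Set<string>,
): { enqueued: string[]; skipped: number } {
  const enqueued: string[] = [];
  let skipped = 0;
  for (const c of connections) {
    const eligible = c.enabled && c.authenticated && c.autoSync;
    const key = dedupKey(c.pluginId, c.userId);
    if (!eligible || inFlight.has(key)) {
      skipped += 1; // not opted in, or a sync for this user is already queued
      continue;
    }
    inFlight.add(key);
    enqueued.push(key);
  }
  return { enqueued, skipped };
}
```

Keying on both ids is what keeps two users of the same plugin from collapsing into one task, while a second firing for the same user is coalesced.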

Expose the scheduled-sync controls over the API:

- Admin: PATCH /admin/plugins/{id} accepts syncCronSchedule (omit = no change, null = clear, string = set). The value is validated and normalized, and rejected unless the plugin's manifest declares the user_read_sync capability. PluginDto now returns the stored schedule.
- User: new PATCH /user/plugins/{id}/sync-mode { auto } toggles a connection between automatic and manual sync. It is capability-gated and does a read-modify-write of the host-only config._codex.autoSync flag, preserving all other config keys. UserPluginDto exposes a derived auto_sync field.

Reload the cron scheduler when a schedule changes so it takes effect without a server restart: on plugin update when the schedule field is present, and on enable/disable for plugins that carry a schedule (since registration is gated on the enabled state). Includes API tests for both endpoints (validation, capability gating, clearing, and config-merge preservation).
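The two update semantics above (the tri-state `syncCronSchedule` field and the read-modify-write of `config._codex.autoSync`) can be sketched as follows. The function and type names are hypothetical, cron validation and normalization are deliberately left out, and the server-side handlers are Rust rather than TypeScript.

```typescript
// Admin PATCH body: the field may be absent, null, or a cron string.
interface PluginPatch {
  syncCronSchedule?: string | null;
}

// omit = no change, null = clear, string = set.
// Validation and normalization of the cron string are omitted in this sketch.
function nextSchedule(current: string | null, patch: PluginPatch): string | null {
  if (patch.syncCronSchedule === undefined) return current;
  return patch.syncCronSchedule;
}

type JsonObject = Record<string, unknown>;

// Read-modify-write of the host-only flag, preserving every other key,
// both at the top level and inside the `_codex` namespace.
function withAutoSync(config: JsonObject, auto: boolean): JsonObject {
  const existing = config["_codex"];
  const codex: JsonObject =
    typeof existing === "object" && existing !== null && !Array.isArray(existing)
      ? (existing as JsonObject)
      : {};
  return { ...config, _codex: { ...codex, autoSync: auto } };
}
```

Distinguishing "absent" from "null" is what lets a client clear a schedule without every unrelated PATCH accidentally clearing it too.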

Surface the scheduled-sync controls in the web app:

- Admin: a "Sync Schedule (cron)" input on the plugin edit form, shown only for plugins whose manifest declares the user_read_sync capability. It initializes from the plugin and is sent only for sync-capable plugins (a string sets the cadence, empty clears it).
- User: an "Automatic sync" switch on the connection card, shown only for connected, sync-capable integrations. Toggling it calls the sync-mode endpoint and reflects the connection's autoSync state.

Adds userPluginsApi.setSyncMode and wires it through a mutation with cache invalidation and toasts. Includes component tests covering the switch's visibility (capability gating), checked state, and toggle callback.

Document the admin-managed sync cadence and the per-user auto/manual opt-in:

- Plugins overview gains a "Scheduled (Automatic) Sync" section covering how an admin sets a per-plugin cron on the Execution tab, how users opt their connection in (off by default), and what happens when the schedule fires (enabled + connected + opted-in connections only, deduped, expired credentials skipped). Adds autoSync to the _codex settings reference.
- AniList Sync page gains a "Manual vs Automatic Sync" subsection that points at the overview.

The "fetch library data + aggregate per-series reading progress" logic was copy-pasted across build_user_library (recommendations) and build_push_entries/build_unmatched_entries (sync), each re-fetching books, progress, metadata, and ratings and folding them the same way. Collapse it into a single batched build_series_engagements in codex-services: one pass fetches series metadata, books, the user's reading progress and ratings, library names, and optionally taxonomy, then folds book-level progress into a per-series SeriesEngagement. Each caller projects that aggregate into its own wire DTO; the sync path adds a shared project_sync_entry and drops the now-dead series_library_info. An EngagementOptions::include_taxonomy flag lets the sync path skip the genres/tags/alternate-title/external-id queries it doesn't need, keeping its query count unchanged. No behaviour change; existing sync push and recommendation tests pass unmodified, with new builder unit tests added.

Add a SeriesMetadata bibliographic block (summary, publisher, role-carrying authors, age rating, language, reading direction) plus two optional, independently gated fields — metadata and customMetadata — to the UserLibraryEntry and SyncEntry wire types. Both are omitted when unset, so existing plugins are unaffected. SeriesEngagement gains projection helpers that build the block and parse the user-defined custom_metadata JSON. Author parsing is refactored into a shared helper that is resilient per-element: an entry with an unrecognized role keeps its name rather than dropping the whole list. Covers and a standalone artists field are deliberately omitted (no external cover URL; role on the author subsumes artists). No caller enables the new fields yet; this only establishes the contract. Tests added for serialization, omission, author parsing, and the projections.

Wire the previously-defined enrichment fields to plugins, controlled by a manifest capability plus four independent per-user toggles so a connection sends exactly as much as it opts into.

- Add a `wantsFullMetadata` plugin capability; the host only does the extra work, and the UI only shows the toggles, when a plugin declares it.
- Add four host-only `_codex` toggles (sendTags, sendGenres, sendMetadata, sendCustomMetadata) with entity accessors; the effective flag per field is capability AND the toggle. All default off, so existing connections push the same minimal payload as before.
- Add top-level genres/tags to sync entries (they carried no taxonomy), so a rules-based sync plugin can filter on tags without receiving heavier bibliographic fields.
- Attach each opted-in field on sync push (matched and search-fallback entries) and on recommendation seeds; taxonomy is only fetched when a toggle needs it. Recommendations keep sending genres/tags as before.
- Add PATCH /api/v1/user/plugins/{id}/metadata-settings (partial update, preserves sibling _codex keys, rejected when the plugin lacks the capability), and expose the capability and current toggle values on the user-plugin DTO.

Tests added across the entity accessors, settings parsing, per-flag sync push, and the new endpoint.

…ings UI Surface the four per-field metadata opt-ins on a plugin connection. The settings modal gains a "Metadata Enrichment" section, shown only when the plugin declares the wantsFullMetadata capability, with switches for sending tags, genres, the bibliographic metadata block, and custom metadata. Tags and genres appear only for sync plugins (recommendation entries already carry them); descriptions call out that metadata (summaries) is the heavy option and that custom metadata can expose private fields. Toggles persist through the existing config-update path, preserving other _codex settings. Mock fixtures advertise the capability so the mock UI shows the section, and the plugin sync docs describe the capability and trade-offs. Adds component tests for the capability gating and initial state.

Extend SyncProgress with maxVolume/maxChapter — the highest read volume and chapter derived from Codex's per-book number detection — alongside a readBooks per-book breakdown, backed by a new SyncBookProgress struct carrying detected volume/chapter and page position. Unlike the existing `volumes` count (relative books-read), maxVolume and maxChapter stay accurate for libraries that don't start at volume 1 or have gaps, giving sync plugins the data to report absolute progress. readBooks exposes the raw per-book detail so authors of custom sync targets can map progress however their service expects. All new fields are additive and optional, omitted from the wire when unset, so existing plugins and the `volumes` field are unaffected. This lands the protocol contract only; computing and attaching the values in the push path is wired up separately. Includes serialization, round-trip, and backward-compatibility tests.
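To make the difference between the relative `volumes` count and `maxVolume`/`maxChapter` concrete, here is a small sketch. The `BookProgress` shape, its `completed` flag, and the `summarize` function are hypothetical (the real `SyncBookProgress` field names may differ), and "read" is assumed to mean "completed".

```typescript
// Hypothetical per-book record; the real SyncBookProgress fields may differ.
interface BookProgress {
  volume?: number;
  chapter?: number;
  completed: boolean;
}

interface ProgressSummary {
  volumes: number; // existing relative count of books read
  maxVolume?: number; // highest read volume number, omitted when unknown
  maxChapter?: number; // highest read chapter number, omitted when unknown
}

function summarize(books: BookProgress[]): ProgressSummary {
  const read = books.filter((b) => b.completed);
  const isNumber = (n: number | undefined): n is number => typeof n === "number";
  const volumes = read.map((b) => b.volume).filter(isNumber);
  const chapters = read.map((b) => b.chapter).filter(isNumber);
  const summary: ProgressSummary = { volumes: read.length };
  if (volumes.length > 0) summary.maxVolume = Math.max(...volumes);
  if (chapters.length > 0) summary.maxChapter = Math.max(...chapters);
  return summary;
}
```

For a library that holds only volumes 5 to 8, with 5, 6, and 7 read, this yields a relative count of 3 but a `maxVolume` of 7, which is the number an external tracker actually expects.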

Declare a wantsDetailedProgress plugin capability that gates the per-book reading-progress breakdown (readBooks) on sync entries. The accurate maxVolume/maxChapter fields are always sent and stay ungated; only the heavier per-book detail is opt-in, so plugins that don't consume it pay no extra fetch or payload cost. Surface the capability on the user-plugin capabilities DTO so tooling can detect support, populated from the cached manifest alongside the existing capability flags, and regenerate the OpenAPI spec and TypeScript types to match. The flag is inert until the push path attaches the breakdown. Includes capability parse tests and a DTO serialization assertion.

Wire the detailed-progress fields end to end. The shared engagement builder gains an opt-in per-book detail pass: when enabled it batch-fetches book metadata and folds each book's detected volume/chapter and page position into a per-book breakdown, leaving the recommendations path untouched. The sync push projection then derives the highest read volume and chapter from that breakdown — accurate for libraries that don't start at volume 1 or have gaps, unlike the existing relative count — and, for plugins that declare wantsDetailedProgress, attaches the full per-book breakdown. The matched and search-fallback paths share one projection so both carry the same data, and the extra metadata query runs only when a detail-consuming plugin is connected. Includes builder and push projection tests.

…ss SDK types Make the highest read volume/chapter always flow on the sync push path so every sync plugin gets accurate progress for libraries with gaps, with no opt-in; only the heavier per-book breakdown stays gated behind the wantsDetailedProgress capability. Previously the accurate numbers rode the same gate as the breakdown, which would have forced the AniList plugin to receive (and ignore) the full breakdown just to read two numbers. Mirror maxVolume/maxChapter/readBooks and SyncBookProgress into the TypeScript SDK, and update the AniList plugin to prefer maxVolume/maxChapter (flooring fractional chapters) over the relative books-read count, falling back to the count for older hosts. Document the accuracy behaviour, the gapped-library example, and the capability/payload trade-off. Includes plugin mapping tests.
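The plugin-side preference order can be sketched like this. The `SyncProgress` fields `volumes`, `maxVolume`, and `maxChapter` are named in the commits; the `RemoteProgress` shape and `toRemoteProgress` are hypothetical, and leaving the chapter count unset when `maxChapter` is absent is an assumption rather than the AniList plugin's confirmed behaviour.

```typescript
interface SyncProgress {
  volumes?: number; // relative books-read count (older hosts send only this)
  maxVolume?: number; // highest read volume (newer hosts)
  maxChapter?: number; // highest read chapter, possibly fractional
}

// Hypothetical shape of what gets pushed to the external tracker.
interface RemoteProgress {
  volumesRead?: number;
  chaptersRead?: number;
}

function toRemoteProgress(p: SyncProgress): RemoteProgress {
  const out: RemoteProgress = {};
  // Prefer the accurate number; fall back to the relative count for older hosts.
  const volume = p.maxVolume ?? p.volumes;
  if (volume !== undefined) out.volumesRead = volume;
  // Fractional chapters (e.g. 42.5) are floored to a whole chapter.
  if (p.maxChapter !== undefined) out.chaptersRead = Math.floor(p.maxChapter);
  return out;
}
```

Because the fallback is expressed with `??`, a plugin built this way keeps working unchanged against a host that predates the new fields.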

cloudflare-workers-and-pages · 2026-06-07T05:47:38Z

Deploying codex with Cloudflare Pages

Latest commit:	`2934468`
Status:	✅ Deploy successful!
Preview URL:	https://f0a56d38.codex-asm.pages.dev
Branch Preview URL:	https://plugin-sync-cron.codex-asm.pages.dev


The UserPluginCapabilitiesDto gained a required wantsDetailedProgress field, but the MSW mock handlers for user plugins were not updated, breaking the frontend type-check and build. Add the field to all three capability objects in the mock handlers (anilist enabled plugin, mangabaka available plugin, and the dynamically created enable handler) so the mocks satisfy the generated type and the build passes.

Add a file-based payload recorder to the echo (debug) plugins, starting with metadata-echo. On each call it writes the request and its matching response to paired JSON files under the plugin's host-provided data directory, so the host->plugin protocol traffic can be inspected without trawling server logs.

- Files share a sortable basename (yyyy-MM-dd-HH-mm-ss-{id}-{method}, UTC) and differ only by a -request.json / -response.json suffix.
- Each file is a JSON envelope holding the payload plus a snapshot of the active config. Credentials are never written; secret-like config keys are redacted.
- Recording is bounded (oldest files pruned), falls back to a temp dir when no data directory is provided, and is best-effort so a disk error never breaks an RPC response. Toggled via recordPayloads / maxPayloadFiles admin config.

Surface two fields the host already supports but the TypeScript SDK omitted: InitializeParams.dataDir (the scoped writable file-storage directory) and PluginCapabilities.wantsDetailedProgress (opt in to per-book sync progress). Add unit tests for the recorder.
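The naming and redaction rules can be sketched as below. This is not the recorder's actual code: `payloadBasename`, `redactConfig`, the `SECRET_LIKE` pattern, and the `[REDACTED]` placeholder are all hypothetical, and the sketch assumes the method name is already safe to use in a file name.

```typescript
function pad2(n: number): string {
  return String(n).padStart(2, "0");
}

// Builds yyyy-MM-dd-HH-mm-ss-{id}-{method} from a UTC timestamp.
function payloadBasename(when: Date, id: string | number, method: string): string {
  const stamp = [
    String(when.getUTCFullYear()),
    pad2(when.getUTCMonth() + 1), // getUTCMonth() is zero-based
    pad2(when.getUTCDate()),
    pad2(when.getUTCHours()),
    pad2(when.getUTCMinutes()),
    pad2(when.getUTCSeconds()),
  ].join("-");
  return `${stamp}-${id}-${method}`;
}

// Assumption: which keys count as "secret-like" is decided by this pattern.
const SECRET_LIKE = /(secret|token|password|api[-_]?key)/i;

// Returns a copy of the config snapshot with secret-like values masked.
function redactConfig(config: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(config)) {
    out[key] = SECRET_LIKE.test(key) ? "[REDACTED]" : value;
  }
  return out;
}
```

The request and response files would then be `${basename}-request.json` and `${basename}-response.json`, so a plain directory listing sorts each pair together in chronological order.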

A test/debug sync plugin that talks to no external service. It accepts any push (echoing every entry back as alternating created/updated, never failing), returns deterministic, fully-populated entries on pull (respecting the request limit), and reports canned status. Declares wantsDetailedProgress so it receives the per-book detailed progress payload, and records every request and response to files via the shared payload recorder, making it easy to inspect what the host sends over the sync protocol without trawling server logs. Includes unit tests for the provider and the recording integration.

Add the recommendations-echo debug plugin: it echoes the user's library seeds back as fully-populated recommendations (respecting limit and excludeIds), implements updateProfile/dismiss/clear, and records every request/response to files via the shared payload recorder. Includes tests.

Put the echo plugins on equal footing in the build and packaging setup:

- CI: add sync-echo and recommendations-echo to the plugin test matrices in both ci.yml and build.yml, and to the publish matrix in build.yml.
- docker-compose: mount, build, and watch both new plugins in the dev stack.
- .dockerignore: exclude all plugins' build artifacts from the build context via globs.
- docs: list the echo sync/recommendations plugins and document payload recording.

Treat the echo plugins as debug-only rather than store offerings: remove the echo entry from the user-facing Official Plugins gallery and instead provision the echoes via the seed config (sample updated), where test/dev plugins belong. The Makefile needs no change; its plugin glob already discovers the new plugins.

…tity

Two related changes that make no-auth user plugins (e.g. a debug echo or a NocoDB-backed sync) first-class.

Per-user identity: the host now sends userId and userPluginId to every user plugin in the initialize params (absent for system plugins). A plugin that has no per-user credential (or that authenticates with an admin-configured shared key) can't derive the user's identity from its credentials, so it needs this to scope data per user in its own backend. The identifier is sent regardless of whether the plugin requires auth.

No-auth / shared-key support: a plugin is now "connected" when it has credentials OR requires no per-user authentication. requires_authentication() on the manifest (declares OAuth or required credentials) drives a new UserPlugin::is_connected(requires_auth), applied consistently to the user-plugin DTO (which also exposes requires_auth), the manual-sync guard, the sync-status DTO, and the auto-sync scheduler eligibility. Previously such plugins could never connect, so they could be neither synced manually nor scheduled.

Frontend: the integrations card shows sync controls (Sync Now, Automatic sync) and an "Enabled" badge for a no-auth plugin, and hides Disconnect since there is no external account to unlink; auth plugins are unchanged.

OpenAPI types regenerated. Tests added across the manifest, entity, scheduler, and card.
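The connection rule reduces to a single boolean expression, sketched here in TypeScript. The `Manifest` shape is hypothetical; only the rule itself (a plugin requires authentication when it declares OAuth or required credentials, and is connected when it has credentials or requires none) is taken from the commit.

```typescript
// Hypothetical manifest shape, for illustration only.
interface Manifest {
  oauth?: object;
  credentials?: { key: string; required: boolean }[];
}

// True when the manifest declares OAuth or at least one required credential.
function requiresAuthentication(manifest: Manifest): boolean {
  const credentials = manifest.credentials ?? [];
  return manifest.oauth !== undefined || credentials.some((c) => c.required);
}

// Connected = has credentials OR needs no per-user authentication.
function isConnected(hasCredentials: boolean, requiresAuth: boolean): boolean {
  return hasCredentials || !requiresAuth;
}
```

With this rule a no-auth echo plugin counts as connected as soon as it is enabled, while an OAuth plugin without stored credentials does not.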

…the sync cron

Whether a plugin receives enriched series data is a property of the plugin, not a per-user preference, so move that control to the admin side and keep only a genuine privacy choice for the user.

- Tags, genres, and the bibliographic metadata block are now admin policy on the plugin (config._codex.send*, default on when the manifest declares wantsFullMetadata), exposed via plugins::Model accessors and configured in the plugin's Configure dialog. The per-user tags/genres/metadata toggles are removed.
- custom_metadata stays a per-user privacy opt-out (default off), shown on the connection settings; the metadata-settings endpoint is narrowed to that field.
- Effective send = capability AND admin policy (tags/genres/metadata) or AND user opt-out (custom). Recommendations keep genres/tags as baseline taste signal.
- Per-user rules use the existing userConfigSchema, which now supports a multi-line JSON/textarea field type. The echo plugins declare wantsFullMetadata and a sample rules field, and the plugin SDK gains the capability.
- Relocate the automatic-sync cron from the admin Edit modal to the Configure dialog, alongside the library filter (sync plugins only).

Also makes the sync-plugin test helper declare OAuth so sync plugins require authentication. Tests added for the admin policy, the user opt-out, and the relocated cron.

…policy to recommendations

Round out the admin metadata-enrichment policy in the plugin Configure dialog:

- Add an "Allow custom metadata" admin gate (config._codex.allowCustomMetadata, default on) layered over the user's per-connection opt-in. Custom metadata is sent only when the plugin declares the capability, the admin allows it, and the user opts in; this is wired into both the sync and recommendation handlers.
- Show the tags/genres policy for recommendation plugins too, not just sync. genres/tags remain baseline taste signal for recommendations (default on), and the recommendation handler strips them only when an admin explicitly turns the policy off, so there's no change unless an admin opts to trim payload.

The Configure dialog now offers send tags/genres/metadata plus allow-custom for any plugin that declares wantsFullMetadata, sync or recommendation. Docs and tests updated.
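The layered gating for the sync path can be sketched as a pure function. The `AdminPolicy` keys mirror the `config._codex` names mentioned in these commits (`sendTags`, `sendGenres`, `sendMetadata`, `allowCustomMetadata`); `effectiveSyncFlags` and its return shape are hypothetical. The recommendation path is intentionally not modelled, since genres and tags are a baseline signal there.

```typescript
// Admin policy stored under config._codex; every key defaults to "on".
interface AdminPolicy {
  sendTags?: boolean;
  sendGenres?: boolean;
  sendMetadata?: boolean;
  allowCustomMetadata?: boolean;
}

interface EffectiveFlags {
  tags: boolean;
  genres: boolean;
  metadata: boolean;
  customMetadata: boolean;
}

// Sync path only: capability AND admin policy, and for custom metadata
// additionally AND the user's per-connection opt-in (default off).
function effectiveSyncFlags(
  wantsFullMetadata: boolean,
  admin: AdminPolicy,
  userSendsCustomMetadata: boolean,
): EffectiveFlags {
  return {
    tags: wantsFullMetadata && (admin.sendTags ?? true),
    genres: wantsFullMetadata && (admin.sendGenres ?? true),
    metadata: wantsFullMetadata && (admin.sendMetadata ?? true),
    customMetadata:
      wantsFullMetadata && (admin.allowCustomMetadata ?? true) && userSendsCustomMetadata,
  };
}
```

Because every gate is an AND, a plugin that never declares the capability receives the same minimal payload regardless of what the admin or user has configured.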

The worker built its PluginManager without `.with_plugin_file_storage(...)`, unlike the serve path. As a result `resolve_plugin_data_dir` returned None for every plugin spawned in the worker, and the init message carried `dataDir: null`. Plugins that write files (the sync cron and recommendations refresh, plus the echo plugins' recorded request/response payloads) then fell back to a container-local temp dir, so their output never reached the configured plugins directory. Metadata plugins were unaffected because enrichment runs in the serve process, which already wires file storage.

Construct PluginFileStorage from config.files.plugins_dir in the worker command and pass it to the manager, mirroring serve. Also fix the compose wiring so the written files are actually visible:

- mount ./docker/data/plugins into the production codex service
- mount the sync-echo and recommendations-echo dist dirs into the screenshots service

The task priority scheme was reworked so user-facing and integration work (plugin sync/recommendations, metadata fetch) is claimed ahead of bulk background work like scanning and analysis. Two integration assertions in task_queue_integration still expected the old defaults and were failing:

- AnalyzeBook is now 500 (was 800)
- ScanLibrary is now 600 (was 1000)

Update both assertions and their messages to match default_priority().

Follow-up to the priority rework: the task_priority_ordering and task_queue integration tests still asserted the old default priorities and were failing. Plugin and metadata work is now claimed ahead of scanning and analysis.

- Rewrite the expected claim order and category comments so user plugin and metadata tasks precede scanning, analysis, and thumbnails.
- Update per-type priority values: scan_library 1000->600, analyze_book 800->500, find_duplicates 400->860.
- Refresh stale ordering comments in the PostgreSQL ordering test.

Ordering-only assertions were left intact since scan-before-analysis still holds under the new values.

Three per-variant extraction tests still asserted pre-rework priorities that the central test_default_priority_values block had already been updated past:

- refresh_library_metadata 385 -> 890
- poll_release_source 170 -> 850
- bulk_track_for_releases 155 -> 840

These match the values in default_priority(). The ordering-invariants test is relative-only and needed no change.

… YAML

The seed config's library `title_preprocessing_rules` and `auto_match_conditions` fields were typed as raw JSON strings, forcing JSON-in-YAML with double-escaped regex backslashes and deferring any validation until scan time, where a malformed value was silently stored and only failed later.

Deserialize both as native types (Vec<PreprocessingRule> and AutoMatchConditions) and serialize them to the JSON the libraries table stores in the seed loop, with error context naming the offending library. Patterns can now be written as single-quoted YAML scalars (e.g. '\s*\(Digital\)$') with no escaping, and a bad shape fails fast at seed time. This also brings these two fields in line with how allowed_formats and excluded_patterns are already handled.

Document both fields with a commented example in the sample config and add seed-parsing tests.

The scheduler runs in every serve process, but the API's handle to it (AppState.scheduler) was gated on CODEX_DISABLE_WORKERS. In the standard split web/worker topology the web process has workers disabled, so the handle was None and reload_schedules() silently no-op'd. An admin-set plugin sync cron was written to the DB but never registered in the live scheduler, so it never fired until the process restarted. The same gap affected every live schedule edit (library scans, release sources). Decouple the handle from disable_workers: the scheduler is always started in serve, so the API always gets a handle and runtime schedule edits take effect without a restart. Also surface the admin-set cadence to users: UserPluginDto now carries a read-only syncCronSchedule, and the connection card shows a human-readable cadence (via a shared describeCron helper) or "not set up yet" with the auto-sync toggle disabled when no schedule is configured. Tests added for the DTO field, the cron description helper, and the card's configured/not-configured states.

Every serve replica runs its own in-process scheduler, so in a horizontally-scaled deployment each plugin sync cron fires once per replica and fans out duplicate per-user syncs. The only guard was a check-then-insert pending check, which races under concurrency. Add a per-firing claim: a scheduled_firing_claims table whose composite primary key (job_key, fire_slot) lets exactly one replica win a given firing. The plugin-sync closure claims "plugin_sync:<id>" at the firing's minute slot before fanning out and skips if it lost; the slot is truncated to the minute so replicas firing for the same occurrence agree despite clock skew. Claiming fails open so a transient claim-table error keeps syncs flowing. Back that with a partial unique index enforcing at most one pending/processing user_plugin_sync task per (plugin_id, user_id). The keys live in the params JSON, so the index is on the extracted values (params->>'...' on Postgres, json_extract on SQLite). enqueue already retries on a unique violation, so this turns its racy dedup into an atomic one, closing the minute-boundary and scheduled-vs-manual overlap windows. Scope: only the plugin-sync fan-out is gated on the claim, since that is the firing that does redundant external work. Entity-keyed jobs are already deduped by existing unique indexes; non-entity-keyed jobs still fire per replica and can reuse the generic claim helper in a follow-up. Tests added for claim election under concurrency, slot truncation, and the unique index rejecting duplicate pending rows.
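The per-firing claim can be illustrated with an in-memory stand-in for the `scheduled_firing_claims` table. Everything here is a sketch: `fireSlot`, `ClaimTable`, and `tryClaim` are hypothetical names, and a `Set` keyed on (job_key, fire_slot) merely imitates the composite primary key that the database enforces atomically.

```typescript
// Truncate a firing time to its minute so replicas that fire a few seconds
// apart for the same occurrence compute the same slot.
function fireSlot(firedAt: Date): string {
  const minuteMs = 60_000;
  const truncated = Math.floor(firedAt.getTime() / minuteMs) * minuteMs;
  return new Date(truncated).toISOString();
}

// In-memory stand-in for a table whose primary key is (job_key, fire_slot).
class ClaimTable {
  private claims = new Set<string>();

  // Returns true for the first claimant of a (jobKey, slot) pair, false after.
  tryClaim(jobKey: string, slot: string): boolean {
    const key = JSON.stringify([jobKey, slot]);
    if (this.claims.has(key)) return false;
    this.claims.add(key);
    return true;
  }
}
```

For example, two replicas firing `plugin_sync:<id>` at 05:00:03 and 05:00:07 UTC compute the same slot, so only the first `tryClaim` succeeds and only that replica fans out. In the real system the uniqueness check and insert happen in one database statement, which is what makes the election safe under concurrency; the `Set` above is only safe because this sketch is single-threaded.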

AshDevFr added 13 commits June 6, 2026 14:11

AshDevFr added 4 commits June 7, 2026 09:42

AshDevFr force-pushed the plugin-sync-cron branch from 61684ba to 1f178c7 Compare June 7, 2026 18:20

AshDevFr added 10 commits June 7, 2026 12:43

AshDevFr merged commit d1e1f75 into main Jun 8, 2026
21 checks passed

AshDevFr deleted the plugin-sync-cron branch June 8, 2026 01:48

AshDevFr merged 27 commits into main from plugin-sync-cron
