
feat: add post_niche mapping table#3912

Merged: idoshamun merged 3 commits into main from feat/post-niche-mapping on May 27, 2026
Conversation

@idoshamun
Member

Summary

Adds the materialized post → niche mapping in postgres so the feed diversifier can look up a post's niches in O(1), without joining post.tagsStr to keyword_niche on every feed request.

Follow-up to #3911 which introduced the niche and keyword_niche tables.

New table

post_niche
  postId   text       FK -> post.id    (CASCADE)
  nicheId  uuid       FK -> niche.id   (RESTRICT)
  rank     smallint   1=primary, 2=secondary  (CHECK 1..2)
  score    real       nullable — derivation score, for debug/audit
  computedAt, updatedAt

  PK (postId, nicheId)        — prevents the same niche twice per post
  UQ (postId, rank)           — enforces 1 primary + at most 1 secondary
  IDX (nicheId, rank)         — for 'all posts in niche X' analytics + future per-niche listings
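
To make the three constraints concrete, here is a minimal TypeScript sketch that models a row and reports which constraint a set of rows would trip. `PostNicheRow` and `findViolations` are hypothetical names for illustration only; they are not the entity or any helper added in this PR. One nuance worth noting: the unique constraint caps each rank at one row per post, but it does not by itself require that a primary row exists.

```typescript
// Row shape mirroring the post_niche columns listed above.
interface PostNicheRow {
  postId: string;
  nicheId: string; // uuid in postgres
  rank: number; // 1 = primary, 2 = secondary
  score: number | null; // derivation score, debug/audit only
}

// Hypothetical helper: lists which table constraints a set of rows would violate.
function findViolations(rows: PostNicheRow[]): string[] {
  const violations: string[] = [];
  const seenPk = new Set<string>();
  const seenRank = new Set<string>();

  for (const row of rows) {
    // CHECK 1..2
    if (row.rank !== 1 && row.rank !== 2) {
      violations.push(`CHECK rank: post ${row.postId} has rank ${row.rank}`);
    }

    // PK (postId, nicheId): the same niche cannot appear twice for a post.
    const pk = JSON.stringify([row.postId, row.nicheId]);
    if (seenPk.has(pk)) {
      violations.push(`PK (postId, nicheId): duplicate ${pk}`);
    }
    seenPk.add(pk);

    // UQ (postId, rank): at most one row per rank for a post.
    const uq = JSON.stringify([row.postId, row.rank]);
    if (seenRank.has(uq)) {
      violations.push(`UQ (postId, rank): duplicate ${uq}`);
    }
    seenRank.add(uq);
  }

  return violations;
}
```

For example, two rows for the same post that both carry rank 1 would be reported under the UQ constraint even though their niches differ.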

Why a mapping table (not columns on post)

  • Symmetric: primary and secondary use the same row shape, no special-casing.
  • Cheap reverse lookups: WHERE nicheId = ? is a single index scan.
  • Extensible: bumping to 3+ niches per post later only needs a wider CHECK constraint, not new columns on post.
  • Cascade-on-post-delete keeps it self-cleaning.

How it'll be populated

Out of scope for this PR. Future work:

  1. A TypeScript derivation helper that takes post.tagsStr + keyword_niche and returns {primary, secondary} using the rules in docs/feed-niche-taxonomy.md (IDF × weightMultiplier + ecosystem boost, top-1 + top-1 secondary with 0.35 threshold, fallback to other).
  2. A hook on post create / tagsStr change to recompute.
  3. A backfill job over existing posts.

How the diversifier will use it

At rerank time, each candidate joins to post_niche (1-2 rows), and the MMR penalty is applied when any niche overlaps with higher-ranked posts in the same response. Ecosystem niches get a sharper penalty multiplier than theme niches.
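
A rough sketch of how that penalty could work, written as a greedy MMR-style rerank in TypeScript. Everything here is an assumption for illustration: the `Candidate` shape, the multiplicative form of the penalty, and the multiplier values (0.5 for ecosystem, 0.8 for theme) are made up, and the real diversifier may look quite different.

```typescript
type NicheKind = "ecosystem" | "theme";

// Hypothetical candidate shape: a relevance score plus the 1-2 niches from post_niche.
interface Candidate {
  postId: string;
  relevance: number;
  niches: { id: string; kind: NicheKind }[];
}

// Assumed multipliers: a lower value is a sharper penalty.
const OVERLAP_PENALTY: Record<NicheKind, number> = { ecosystem: 0.5, theme: 0.8 };

// Relevance discounted once per niche already used by a higher-ranked post.
function adjustedScore(candidate: Candidate, seenNiches: Set<string>): number {
  let score = candidate.relevance;
  for (const niche of candidate.niches) {
    if (seenNiches.has(niche.id)) {
      score *= OVERLAP_PENALTY[niche.kind];
    }
  }
  return score;
}

// Greedy rerank: repeatedly pick the candidate with the best adjusted score.
function rerank(candidates: Candidate[]): Candidate[] {
  const remaining = [...candidates];
  const picked: Candidate[] = [];
  const seenNiches = new Set<string>();

  while (remaining.length > 0) {
    let bestIndex = 0;
    let bestScore = -Infinity;
    for (let i = 0; i < remaining.length; i++) {
      const score = adjustedScore(remaining[i], seenNiches);
      if (score > bestScore) {
        bestScore = score;
        bestIndex = i;
      }
    }
    const chosen = remaining.splice(bestIndex, 1)[0];
    if (!chosen) break;
    picked.push(chosen);
    for (const niche of chosen.niches) {
      seenNiches.add(niche.id);
    }
  }

  return picked;
}
```

With these assumed values, a second post from an already-shown ecosystem niche loses half its score, so a somewhat less relevant post from a fresh niche can move ahead of it.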

Materializes the post -> niche relationship in postgres so the feed
diversifier can look up a post's niches in O(1) (vs. joining
tags -> keyword_niche per request).

Schema only — population logic (derivation from post.tagsStr +
keyword_niche via weighted vote) lands separately.

  post_niche
    postId   text   FK -> post.id    (CASCADE)
    nicheId  uuid   FK -> niche.id   (RESTRICT)
    rank     smallint   1=primary, 2=secondary (CHECK 1..2)
    score    real   nullable — derivation score, for debug/audit
    PK (postId, nicheId)        — prevents the same niche twice per post
    UQ (postId, rank)           — enforces 1 primary + at most 1 secondary
    IDX (nicheId, rank)         — for 'all posts in niche X' analytics
@pulumi

pulumi Bot commented May 27, 2026

🍹 The Update (preview) for dailydotdev/api/prod (at 681e2a5) was successful.

✨ Neo Explanation

Routine image rollout deploying the new `post_niche` taxonomy feature, which creates the `post_niche` table and installs PostgreSQL triggers on the `post` table to derive niche assignments automatically on insert/update. ✅ Low Risk — no stateful resource replacements; the main consideration is the per-row trigger overhead on post inserts.

This PR introduces the post_niche feature: a new PostNiche entity, two database migrations (PostNiche1779882869443 creating the post_niche table and PostNicheTrigger1779883630763 installing the post_niche_recompute function plus two PostgreSQL triggers on the post table), and accompanying test fixtures.

The deployment rolls the new image (668a464a) across all deployments and cron jobs, replaces the migration Jobs with new ones stamped with the new commit hash (standard pattern), and runs the DB migration job that will execute both new migrations against production.

🔵 Info — The new post_niche_insert_trigger fires AFTER INSERT ON post FOR EACH ROW. Every post insert in production will now execute the post_niche_recompute PL/pgSQL function synchronously. On a high-insert-rate table this adds per-row work; the function does several CTEs and subqueries joining keyword_niche and niche. Reviewers should be satisfied that this overhead is acceptable at production insert volume, or that inserts are already low-frequency enough for this to be a non-issue.

🔵 Info — The post_niche table has FK_post_niche_niche with ON DELETE RESTRICT, meaning a niche record cannot be deleted while any post_niche row references it. This is intentional but worth knowing operationally.

Resource Changes

    Name                                                       Type                           Operation
-   vpc-native-api-db-migration-f96bf451                       kubernetes:batch/v1:Job        delete
~   vpc-native-temporal-deployment                             kubernetes:apps/v1:Deployment  update
~   vpc-native-update-current-streak-cron                      kubernetes:batch/v1:CronJob    update
~   vpc-native-clean-channel-highlights-cron                   kubernetes:batch/v1:CronJob    update
~   vpc-native-clean-gifted-plus-cron                          kubernetes:batch/v1:CronJob    update
+   vpc-native-api-clickhouse-migration-668a464a               kubernetes:batch/v1:Job        create
~   vpc-native-expire-super-agent-trial-cron                   kubernetes:batch/v1:CronJob    update
~   vpc-native-generic-referral-reminder-cron                  kubernetes:batch/v1:CronJob    update
~   vpc-native-update-tags-str-cron                            kubernetes:batch/v1:CronJob    update
~   vpc-native-channel-digests-cron                            kubernetes:batch/v1:CronJob    update
~   vpc-native-clean-expired-better-auth-sessions-cron         kubernetes:batch/v1:CronJob    update
~   vpc-native-update-source-public-threshold-cron             kubernetes:batch/v1:CronJob    update
~   vpc-native-update-tag-materialized-views-cron              kubernetes:batch/v1:CronJob    update
~   vpc-native-clean-old-notifications-cron                    kubernetes:batch/v1:CronJob    update
-   vpc-native-api-clickhouse-migration-f96bf451               kubernetes:batch/v1:Job        delete
~   vpc-native-clean-zombie-users-cron                         kubernetes:batch/v1:CronJob    update
~   vpc-native-user-profile-analytics-history-clickhouse-cron  kubernetes:batch/v1:CronJob    update
~   vpc-native-rotate-daily-quests-cron                        kubernetes:batch/v1:CronJob    update
~   vpc-native-update-trending-cron                            kubernetes:batch/v1:CronJob    update
~   vpc-native-update-highlighted-views-cron                   kubernetes:batch/v1:CronJob    update
~   vpc-native-squad-posts-analytics-refresh-cron              kubernetes:batch/v1:CronJob    update
~   vpc-native-private-deployment                              kubernetes:apps/v1:Deployment  update
~   vpc-native-sync-subscription-with-cio-cron                 kubernetes:batch/v1:CronJob    update
~   vpc-native-materialize-monthly-best-post-archives-cron     kubernetes:batch/v1:CronJob    update
~   vpc-native-post-analytics-history-day-clickhouse-cron      kubernetes:batch/v1:CronJob    update
~   vpc-native-user-profile-analytics-clickhouse-cron          kubernetes:batch/v1:CronJob    update
~   vpc-native-user-posts-analytics-refresh-cron               kubernetes:batch/v1:CronJob    update
~   vpc-native-generate-search-invites-cron                    kubernetes:batch/v1:CronJob    update
~   vpc-native-personalized-digest-cron                        kubernetes:batch/v1:CronJob    update
+   vpc-native-api-db-migration-668a464a                       kubernetes:batch/v1:Job        create
~   vpc-native-clean-zombie-user-companies-cron                kubernetes:batch/v1:CronJob    update
~   vpc-native-deployment                                      kubernetes:apps/v1:Deployment  update
~   vpc-native-daily-digest-cron                               kubernetes:batch/v1:CronJob    update
~   vpc-native-channel-highlights-cron                         kubernetes:batch/v1:CronJob    update
~   vpc-native-update-achievement-rarity-cron                  kubernetes:batch/v1:CronJob    update
~   vpc-native-clean-zombie-opportunities-cron                 kubernetes:batch/v1:CronJob    update
~   vpc-native-calculate-top-readers-cron                      kubernetes:batch/v1:CronJob    update
~   vpc-native-update-views-cron                               kubernetes:batch/v1:CronJob    update
~   vpc-native-clean-zombie-images-cron                        kubernetes:batch/v1:CronJob    update
~   vpc-native-materialize-yearly-best-post-archives-cron      kubernetes:batch/v1:CronJob    update
~   vpc-native-check-analytics-report-cron                     kubernetes:batch/v1:CronJob    update
... and 12 other changes

idoshamun added 2 commits May 27, 2026 12:08
Adds the postgres-side derivation pipeline for post_niche:

* `post_niche_recompute(post_id, tags_str)` — PL/pgSQL function that
  recomputes post_niche rows for a single post from its tagsStr +
  keyword_niche. Implements the v1 derivation rules:
  - score = weightMultiplier × ecosystem_boost(1.4) × {1.0 primary | 0.5 secondary}
  - primary = argmax niche
  - secondary = next-highest niche where score >= 0.35 × primary AND post has >=2 labeled tags
  - fallback (no labeled tags) = niche with slug 'other'

  Callable directly for backfill:
    SELECT post_niche_recompute(id, "tagsStr") FROM post WHERE ...

* `post_niche_insert_trigger` AFTER INSERT ON post — fires for every new post.
* `post_niche_update_trigger` AFTER UPDATE OF "tagsStr" ON post — fires
  only when tagsStr actually changes (IS DISTINCT FROM guard).
  Both invoke post_niche_recompute via post_niche_trigger_function().
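
For readers who prefer to see the v1 rules outside PL/pgSQL, here is a minimal TypeScript sketch of the same derivation. It is an illustration, not a port of the function in this PR. Assumptions: `tagsStr` is a comma-separated list of tags, each keyword_niche row carries the niche slug, whether the niche is an ecosystem or a theme, and whether the keyword maps to it as primary or secondary, and votes are summed per niche. The `KeywordNiche` and `Derived` shapes and the `deriveNiches` name are hypothetical.

```typescript
// Hypothetical flattened view of a keyword_niche row joined to its niche.
interface KeywordNiche {
  keyword: string;
  nicheSlug: string;
  nicheKind: "ecosystem" | "theme";
  role: "primary" | "secondary"; // assumed meaning of {1.0 primary | 0.5 secondary}
  weightMultiplier: number;
}

interface Derived {
  primary: string;
  secondary: string | null;
}

const ECOSYSTEM_BOOST = 1.4;
const SECONDARY_THRESHOLD = 0.35;
const FALLBACK: Derived = { primary: "other", secondary: null };

function deriveNiches(tagsStr: string | null, labels: KeywordNiche[]): Derived {
  // Assumption: tagsStr is comma-separated.
  const tags = new Set(
    (tagsStr ?? "").split(",").map((t) => t.trim()).filter((t) => t.length > 0),
  );

  const scores = new Map<string, number>();
  const labeledTags = new Set<string>();
  for (const label of labels) {
    if (!tags.has(label.keyword)) continue;
    labeledTags.add(label.keyword);
    const boost = label.nicheKind === "ecosystem" ? ECOSYSTEM_BOOST : 1;
    const roleWeight = label.role === "primary" ? 1.0 : 0.5;
    const vote = label.weightMultiplier * boost * roleWeight;
    scores.set(label.nicheSlug, (scores.get(label.nicheSlug) ?? 0) + vote);
  }

  // Highest score first; the top entry is the primary niche.
  const ranked = [...scores.entries()].sort((a, b) => b[1] - a[1]);
  const top = ranked[0];
  if (!top) return FALLBACK; // no labeled tags

  const runnerUp = ranked[1];
  const hasSecondary =
    runnerUp !== undefined &&
    labeledTags.size >= 2 &&
    runnerUp[1] >= SECONDARY_THRESHOLD * top[1];

  return { primary: top[0], secondary: hasSecondary && runnerUp ? runnerUp[0] : null };
}
```

Under these assumptions, a post tagged with one ecosystem keyword and one theme keyword of equal weight gets the ecosystem niche as primary (1.4 vs 1.0) and the theme niche as secondary, since 1.0 clears the 0.35 × 1.4 threshold.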

Tests cover:
  - empty / null tagsStr -> 'other' fallback
  - tagsStr containing only unlabeled tags -> 'other' fallback
  - single labeled tag -> primary only (no secondary, min-2-tags rule)
  - multiple labeled tags into same niche -> sums correctly
  - ecosystem boost beats theme niches on equal vote
  - secondary niche emitted when above 0.35 threshold and post has >=2 labeled tags
  - secondary niche suppressed when only one labeled tag is present
  - weightMultiplier dampens generic tags (programming -> software_craft)
  - tagsStr UPDATE triggers recomputation
  - tagsStr -> empty triggers fallback to 'other'
  - unrelated UPDATE does NOT fire the trigger (computedAt unchanged)
  - post DELETE cascades post_niche rows away
  - function is callable directly for backfill scenarios

Notes:
  - IDF is intentionally dropped in this v1 derivation (vs. our Python
    prototype) because keeping it in pl/pgsql requires per-call lookups
    of total keyword counts. Curated weightMultiplier + ecosystem boost
    carry the same intent for the cases that matter; revisit if audits
    show drift.
  - Re-labeling keyword_niche does NOT auto-propagate to affected posts;
    that's a backfill job (`SELECT post_niche_recompute(id, "tagsStr") ...`).
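
One possible shape for such a backfill, sketched in TypeScript: split the affected post ids into batches and build one parameterized statement per batch, in the `{ text, values }` form that clients such as node-postgres accept. `buildBackfillBatches`, `BackfillQuery`, and the default batch size of 1000 are hypothetical; the statement itself is the direct call described above, with `"tagsStr"` double-quoted because postgres folds unquoted identifiers to lowercase.

```typescript
// Hypothetical query descriptor: SQL text plus a single array parameter.
interface BackfillQuery {
  text: string;
  values: [string[]];
}

// Splits post ids into batches and builds one recompute statement per batch.
function buildBackfillBatches(postIds: string[], batchSize = 1000): BackfillQuery[] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new Error("batchSize must be a positive integer");
  }
  const queries: BackfillQuery[] = [];
  for (let i = 0; i < postIds.length; i += batchSize) {
    queries.push({
      text: 'SELECT post_niche_recompute(id, "tagsStr") FROM post WHERE id = ANY($1)',
      values: [postIds.slice(i, i + batchSize)],
    });
  }
  return queries;
}
```

Batching keeps each statement short-lived, which may matter if the recompute runs alongside normal post writes; the right batch size would need to be measured.
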
idoshamun merged commit 55be342 into main on May 27, 2026
9 checks passed
idoshamun deleted the feat/post-niche-mapping branch on May 27, 2026 at 12:44