feat: add post_niche mapping table #3912
Materializes the post -> niche relationship in postgres so the feed
diversifier can look up a post's niches in O(1) (vs. joining
tags -> keyword_niche per request).
Schema only — population logic (derivation from post.tagsStr +
keyword_niche via weighted vote) lands separately.
post_niche
  postId    text      FK -> post.id (CASCADE)
  nicheId   uuid      FK -> niche.id (RESTRICT)
  rank      smallint  1=primary, 2=secondary (CHECK 1..2)
  score     real      nullable; derivation score, for debug/audit
  PK  (postId, nicheId): prevents the same niche twice per post
  UQ  (postId, rank): enforces 1 primary + at most 1 secondary
  IDX (nicheId, rank): for 'all posts in niche X' analytics
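To make the constraints above concrete, here is a minimal sketch of equivalent DDL, run against SQLite through Python's standard `sqlite3` module so it can be exercised without a postgres instance. It is not the actual migration: SQLite approximates `uuid`/`smallint`/`real`, the `post` and `niche` tables are hypothetical stubs, and the index name is made up.

```python
import sqlite3

# SQLite stand-in for the postgres migration; column types are approximated.
DDL = """
CREATE TABLE post  (id TEXT PRIMARY KEY);              -- hypothetical stub
CREATE TABLE niche (id TEXT PRIMARY KEY, slug TEXT);   -- hypothetical stub (uuid in postgres)

CREATE TABLE post_niche (
  "postId"  TEXT     NOT NULL REFERENCES post(id)  ON DELETE CASCADE,
  "nicheId" TEXT     NOT NULL REFERENCES niche(id) ON DELETE RESTRICT,
  rank      SMALLINT NOT NULL CHECK (rank BETWEEN 1 AND 2),  -- 1=primary, 2=secondary
  score     REAL,                                            -- nullable, debug/audit
  PRIMARY KEY ("postId", "nicheId"),  -- same niche at most once per post
  UNIQUE ("postId", rank)             -- 1 primary + at most 1 secondary
);

-- 'all posts in niche X' analytics
CREATE INDEX post_niche_niche_rank_idx ON post_niche ("nicheId", rank);
"""


def connect() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(DDL)
    return conn


def violates(conn: sqlite3.Connection, sql: str) -> bool:
    """True if the statement is rejected by a constraint (CHECK, PK, UNIQUE, or FK)."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True
```

With RESTRICT on `nicheId`, deleting a niche that still has posts mapped to it fails, so posts must be remapped before a niche can be removed.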
🍹 The Update (preview) for dailydotdev/api/prod (at 681e2a5) was successful.
✨ Neo Explanation
Routine image rollout deploying the new `post_niche` taxonomy feature, which creates the `post_niche` table and installs PostgreSQL triggers on the `post` table to derive niche assignments automatically on insert/update.
✅ Low Risk: no stateful resource replacements; the main consideration is the per-row trigger overhead on post inserts.
Resource Changes
Name Type Operation
- vpc-native-api-db-migration-f96bf451 kubernetes:batch/v1:Job delete
~ vpc-native-temporal-deployment kubernetes:apps/v1:Deployment update
~ vpc-native-update-current-streak-cron kubernetes:batch/v1:CronJob update
~ vpc-native-clean-channel-highlights-cron kubernetes:batch/v1:CronJob update
~ vpc-native-clean-gifted-plus-cron kubernetes:batch/v1:CronJob update
+ vpc-native-api-clickhouse-migration-668a464a kubernetes:batch/v1:Job create
~ vpc-native-expire-super-agent-trial-cron kubernetes:batch/v1:CronJob update
~ vpc-native-generic-referral-reminder-cron kubernetes:batch/v1:CronJob update
~ vpc-native-update-tags-str-cron kubernetes:batch/v1:CronJob update
~ vpc-native-channel-digests-cron kubernetes:batch/v1:CronJob update
~ vpc-native-clean-expired-better-auth-sessions-cron kubernetes:batch/v1:CronJob update
~ vpc-native-update-source-public-threshold-cron kubernetes:batch/v1:CronJob update
~ vpc-native-update-tag-materialized-views-cron kubernetes:batch/v1:CronJob update
~ vpc-native-clean-old-notifications-cron kubernetes:batch/v1:CronJob update
- vpc-native-api-clickhouse-migration-f96bf451 kubernetes:batch/v1:Job delete
~ vpc-native-clean-zombie-users-cron kubernetes:batch/v1:CronJob update
~ vpc-native-user-profile-analytics-history-clickhouse-cron kubernetes:batch/v1:CronJob update
~ vpc-native-rotate-daily-quests-cron kubernetes:batch/v1:CronJob update
~ vpc-native-update-trending-cron kubernetes:batch/v1:CronJob update
~ vpc-native-update-highlighted-views-cron kubernetes:batch/v1:CronJob update
~ vpc-native-squad-posts-analytics-refresh-cron kubernetes:batch/v1:CronJob update
~ vpc-native-private-deployment kubernetes:apps/v1:Deployment update
~ vpc-native-sync-subscription-with-cio-cron kubernetes:batch/v1:CronJob update
~ vpc-native-materialize-monthly-best-post-archives-cron kubernetes:batch/v1:CronJob update
~ vpc-native-post-analytics-history-day-clickhouse-cron kubernetes:batch/v1:CronJob update
~ vpc-native-user-profile-analytics-clickhouse-cron kubernetes:batch/v1:CronJob update
~ vpc-native-user-posts-analytics-refresh-cron kubernetes:batch/v1:CronJob update
~ vpc-native-generate-search-invites-cron kubernetes:batch/v1:CronJob update
~ vpc-native-personalized-digest-cron kubernetes:batch/v1:CronJob update
+ vpc-native-api-db-migration-668a464a kubernetes:batch/v1:Job create
~ vpc-native-clean-zombie-user-companies-cron kubernetes:batch/v1:CronJob update
~ vpc-native-deployment kubernetes:apps/v1:Deployment update
~ vpc-native-daily-digest-cron kubernetes:batch/v1:CronJob update
~ vpc-native-channel-highlights-cron kubernetes:batch/v1:CronJob update
~ vpc-native-update-achievement-rarity-cron kubernetes:batch/v1:CronJob update
~ vpc-native-clean-zombie-opportunities-cron kubernetes:batch/v1:CronJob update
~ vpc-native-calculate-top-readers-cron kubernetes:batch/v1:CronJob update
~ vpc-native-update-views-cron kubernetes:batch/v1:CronJob update
~ vpc-native-clean-zombie-images-cron kubernetes:batch/v1:CronJob update
~ vpc-native-materialize-yearly-best-post-archives-cron kubernetes:batch/v1:CronJob update
~ vpc-native-check-analytics-report-cron kubernetes:batch/v1:CronJob update
... and 12 other changes
Adds the postgres-side derivation pipeline for post_niche:
* `post_niche_recompute(post_id, tags_str)` — PL/pgSQL function that
recomputes post_niche rows for a single post from its tagsStr +
keyword_niche. Implements the v1 derivation rules:
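- score(niche) = sum over the post's labeled tags of weightMultiplier × ecosystem_boost (1.4 for ecosystem niches, 1.0 otherwise) × {1.0 if the tag maps to the niche as primary | 0.5 if as secondary}
- primary = highest-scoring niche (argmax)
- secondary = next-highest niche, kept only if its score >= 0.35 × the primary's score AND the post has >= 2 labeled tags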
- fallback (no labeled tags) = niche with slug 'other'
Callable directly for backfill:
SELECT post_niche_recompute(id, "tagsStr") FROM post WHERE ...
* `post_niche_insert_trigger` AFTER INSERT ON post — fires for every new post.
* `post_niche_update_trigger` AFTER UPDATE OF "tagsStr" ON post — fires
only when tagsStr actually changes (IS DISTINCT FROM guard).
Both invoke post_niche_recompute via post_niche_trigger_function().
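The v1 rules above can be mirrored in a few lines of Python, which helps when reasoning about expected outputs in tests. This is a minimal sketch, not the PL/pgSQL function itself. It assumes `tagsStr` is comma-separated, models `keyword_niche` as a hypothetical in-memory mapping from keyword to `Label` entries, leaves the fallback score as NULL, and breaks score ties by slug; `derive`, `Label`, and the sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

ECOSYSTEM_BOOST = 1.4                             # applied to ecosystem niches only
SECONDARY_RATIO = 0.35                            # secondary needs >= 0.35 x primary score
ROLE_WEIGHT = {"primary": 1.0, "secondary": 0.5}  # how the tag maps to the niche


@dataclass(frozen=True)
class Label:
    """Hypothetical in-memory stand-in for one keyword_niche row."""
    niche: str                # niche slug
    role: str                 # "primary" or "secondary"
    weight_multiplier: float  # curated dampener, < 1.0 for generic tags


def derive(
    tags_str: Optional[str],
    labels: Dict[str, List[Label]],
    ecosystem_niches: Set[str],
) -> List[Tuple[str, int, Optional[float]]]:
    """Return [(niche_slug, rank, score)] for one post (rank 1 = primary, 2 = secondary)."""
    # Assumes tagsStr is comma-separated; dedupe while keeping order.
    tags = list(dict.fromkeys(t.strip() for t in (tags_str or "").split(",") if t.strip()))
    labeled = [t for t in tags if t in labels]
    if not labeled:
        return [("other", 1, None)]  # fallback; score left NULL (assumption)

    # Sum each labeled tag's contribution into its niches.
    scores: Dict[str, float] = {}
    for tag in labeled:
        for lb in labels[tag]:
            boost = ECOSYSTEM_BOOST if lb.niche in ecosystem_niches else 1.0
            contribution = lb.weight_multiplier * boost * ROLE_WEIGHT[lb.role]
            scores[lb.niche] = scores.get(lb.niche, 0.0) + contribution

    # Highest score first; ties broken by slug (assumption).
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    primary_slug, primary_score = ranked[0]
    result: List[Tuple[str, int, Optional[float]]] = [(primary_slug, 1, primary_score)]

    # Secondary: next-highest niche, only above threshold and with >= 2 labeled tags.
    if len(labeled) >= 2 and len(ranked) > 1:
        second_slug, second_score = ranked[1]
        if second_score >= SECONDARY_RATIO * primary_score:
            result.append((second_slug, 2, second_score))
    return result
```

For example, if `node` maps to an ecosystem niche and `html` to a theme niche (each as primary, weight 1.0), the scores are 1.4 vs 1.0: the ecosystem niche becomes primary and the theme niche clears the 0.35 threshold as secondary.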
Tests cover:
- empty / null tagsStr -> 'other' fallback
- tagsStr containing only unlabeled tags -> 'other' fallback
- single labeled tag -> primary only (no secondary, min-2-tags rule)
- multiple labeled tags into same niche -> sums correctly
- ecosystem boost beats theme niches on equal vote
- secondary niche emitted when above 0.35 threshold and post has >=2 labeled tags
- secondary niche suppressed when only one labeled tag is present
- weightMultiplier dampens generic tags (programming -> software_craft)
- tagsStr UPDATE triggers recomputation
- tagsStr -> empty triggers fallback to 'other'
- unrelated UPDATE does NOT fire the trigger (computedAt unchanged)
- post DELETE cascades post_niche rows away
- function is callable directly for backfill scenarios
Notes:
- IDF is intentionally dropped in this v1 derivation (vs. our Python
prototype) because keeping it in PL/pgSQL requires per-call lookups
of total keyword counts. Curated weightMultiplier + ecosystem boost
carry the same intent for the cases that matter; revisit if audits
show drift.
- Re-labeling keyword_niche does NOT auto-propagate to affected posts;
that's a backfill job (`SELECT post_niche_recompute(id, "tagsStr") ...`).
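Since re-labeling `keyword_niche` does not propagate on its own, a backfill needs to (1) pick the posts whose tags include a re-labeled keyword and (2) call `post_niche_recompute` for them in bounded batches. A minimal Python sketch, assuming comma-separated `tagsStr` and a psycopg-style driver that adapts a list parameter to a postgres array; `affected_posts`, `backfill_batches`, and the batch size are hypothetical.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple

# Hypothetical batch size; tune to keep each statement short.
DEFAULT_BATCH_SIZE = 500

# Placeholder style assumes a psycopg-like driver that adapts a Python list to a
# postgres array. "tagsStr" must stay quoted: unquoted identifiers fold to lowercase.
RECOMPUTE_SQL = 'SELECT post_niche_recompute(id, "tagsStr") FROM post WHERE id = ANY(%s)'


def affected_posts(posts: Dict[str, Optional[str]], relabeled: Set[str]) -> List[str]:
    """Ids of posts whose tagsStr mentions at least one relabeled keyword."""
    hits = []
    for post_id, tags_str in posts.items():
        tags = {t.strip() for t in (tags_str or "").split(",")}
        if tags & relabeled:
            hits.append(post_id)
    return sorted(hits)


def backfill_batches(
    post_ids: List[str], batch_size: int = DEFAULT_BATCH_SIZE
) -> Iterator[Tuple[str, List[List[str]]]]:
    """Yield (sql, params) pairs, one recompute statement per batch of ids."""
    for start in range(0, len(post_ids), batch_size):
        yield RECOMPUTE_SQL, [post_ids[start:start + batch_size]]
```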
Summary
Adds the materialized post → niche mapping in postgres so the feed diversifier can look up a post's niches in O(1), without joining `post.tagsStr` → `keyword_niche` on every feed request.
Follow-up to #3911, which introduced the `niche` and `keyword_niche` tables.
New table
Why a mapping table (not columns on `post`)
- `WHERE nicheId = ?` is a single index scan.
How it'll be populated
Out of scope for this PR. Future work:
- … `post.tagsStr` + `keyword_niche` and returns `{primary, secondary}` using the rules in `docs/feed-niche-taxonomy.md` (IDF × weightMultiplier + ecosystem boost, top-1 + top-1 secondary with 0.35 threshold, fallback to `other`).
- … `tagsStr` change to recompute.
How the diversifier will use it
At rerank time, each candidate joins to `post_niche` (1-2 rows), and the MMR penalty is applied when any niche overlaps with higher-ranked posts in the same response. Ecosystem niches get a sharper penalty multiplier than theme niches.
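A minimal sketch of the rerank step described above, as a greedy MMR-style selection with a multiplicative niche-overlap penalty (a simplification: classic MMR subtracts a weighted similarity term instead). The `Candidate` shape, `rerank`, and the 0.5/0.8 multipliers are hypothetical; the only properties taken from the description are that the penalty triggers on any niche shared with a higher-ranked post and is sharper for ecosystem niches.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence, Set

# Hypothetical multipliers: smaller = sharper penalty.
ECOSYSTEM_MULTIPLIER = 0.5
THEME_MULTIPLIER = 0.8


@dataclass(frozen=True)
class Candidate:
    post_id: str
    relevance: float
    niches: FrozenSet[str]  # the 1-2 post_niche rows for this post


def rerank(candidates: Sequence[Candidate], ecosystem_niches: Set[str]) -> List[str]:
    """Greedy rerank: each slot takes the best relevance after a niche-overlap penalty."""
    remaining = list(candidates)
    picked: List[str] = []
    seen: Set[str] = set()  # niches already used by higher-ranked posts

    def adjusted(c: Candidate) -> float:
        mult = 1.0
        for niche in c.niches & seen:  # penalize only overlapping niches
            factor = ECOSYSTEM_MULTIPLIER if niche in ecosystem_niches else THEME_MULTIPLIER
            mult = min(mult, factor)
        return c.relevance * mult

    while remaining:
        best = max(remaining, key=adjusted)  # ties keep input order
        remaining.remove(best)
        picked.append(best.post_id)
        seen |= best.niches
    return picked
```

With these factors, a strong candidate repeating an ecosystem niche (0.9 → 0.45) drops below an unrelated 0.6 candidate, while one repeating a theme niche (0.9 → 0.72) still beats an unrelated 0.7 candidate.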