[API-359] Adds user score indexer by schottra · Pull Request #488 · AudiusProject/api

schottra · 2025-10-21T20:35:58Z

This is an attempt to do a less disruptive update to aggregate_user.score. The legacy DN implementation ran an update query that would compute all user scores at once and then update them in the aggregate_user table.
This is bad for at least the following reasons:

Locks the aggregate_user table for the duration of the query (which takes on the order of 5-10 minutes to complete)
Reads from the write db to compute the scores, so it is subject to blocking by all sorts of other queries

get_user_scores was also doing some stuff per-user that made the query a little inefficient.

New plan is thus:

Run an indexer job that computes user scores in batches from the read replica, using a modified query that is a little more efficient against batches of a few thousand users. The read query sorts by created_at, user_id and returns a cursor we use to start the next batch.
The indexer fetches ids and scores, dedupes them (we have some duplicate user records even with the grouping, and DISTINCT is really slow here), then pushes the updated ids/scores into the write db.
The job finishes a cycle when the fetched size is < batch size (hit the beginning), then loops back around immediately.
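
A minimal sketch of the batch loop described above, assuming an in-memory slice stands in for the read replica and a callback stands in for the write. The names `userScore`, `cursor`, `fetchBatch`, `dedupe`, `runCycle`, and `sampleRows` are hypothetical and not taken from this PR, and the ascending sort direction is an assumption since the description does not state it.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// userScore is a hypothetical row shape: one computed score per user record.
type userScore struct {
	UserID    int
	CreatedAt time.Time
	Score     int
}

// cursor marks the last (created_at, user_id) pair returned by the previous batch.
type cursor struct {
	CreatedAt time.Time
	UserID    int
}

// precedes reports whether row r sorts strictly after the cursor position.
func (c cursor) precedes(r userScore) bool {
	if !r.CreatedAt.Equal(c.CreatedAt) {
		return r.CreatedAt.After(c.CreatedAt)
	}
	return r.UserID > c.UserID
}

// fetchBatch stands in for the read query: rows are already ordered by
// (created_at, user_id); it returns up to batchSize rows after the cursor.
func fetchBatch(rows []userScore, cur *cursor, batchSize int) []userScore {
	batch := make([]userScore, 0, batchSize)
	for _, r := range rows {
		if cur != nil && !cur.precedes(r) {
			continue
		}
		batch = append(batch, r)
		if len(batch) == batchSize {
			break
		}
	}
	return batch
}

// dedupe collapses duplicate user records in application code, keyed by user id.
func dedupe(batch []userScore) map[int]int {
	scores := make(map[int]int, len(batch))
	for _, r := range batch {
		scores[r.UserID] = r.Score
	}
	return scores
}

// runCycle walks every batch once and returns the number of batches fetched.
// The cycle ends when a batch comes back smaller than batchSize.
func runCycle(rows []userScore, batchSize int, write func(map[int]int)) int {
	var cur *cursor
	batches := 0
	for {
		batch := fetchBatch(rows, cur, batchSize)
		batches++
		if len(batch) > 0 {
			write(dedupe(batch))
			last := batch[len(batch)-1]
			cur = &cursor{CreatedAt: last.CreatedAt, UserID: last.UserID}
		}
		if len(batch) < batchSize {
			return batches
		}
	}
}

// sampleRows builds a small sorted data set containing one duplicate record.
func sampleRows() []userScore {
	t0 := time.Date(2025, 1, 1, 0, 0, 0, 0, time.UTC)
	rows := []userScore{
		{UserID: 3, CreatedAt: t0.Add(2 * time.Hour), Score: 30},
		{UserID: 1, CreatedAt: t0, Score: 10},
		{UserID: 2, CreatedAt: t0.Add(time.Hour), Score: 20},
		{UserID: 2, CreatedAt: t0.Add(time.Hour), Score: 20}, // duplicate record
		{UserID: 4, CreatedAt: t0.Add(3 * time.Hour), Score: 40},
	}
	sort.Slice(rows, func(i, j int) bool {
		if !rows[i].CreatedAt.Equal(rows[j].CreatedAt) {
			return rows[i].CreatedAt.Before(rows[j].CreatedAt)
		}
		return rows[i].UserID < rows[j].UserID
	})
	return rows
}

func main() {
	written := map[int]int{}
	batches := runCycle(sampleRows(), 3, func(scores map[int]int) {
		for id, s := range scores {
			written[id] = s
		}
	})
	fmt.Println("batches:", batches, "distinct users written:", len(written))
}
```

Because the cursor only admits rows strictly after the last (created_at, user_id) pair, an exact duplicate that straddles a batch boundary is skipped rather than re-fetched.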

Some additional features added after PR feedback:

Added tables to hold precomputed values for distinct hours and tracks played, as well as a score features table that currently holds the number of fast challenges (these have been backfilled manually).
I updated the triggers for plays and challenges to update the new precomputed aggregate tables. We can also truncate and recompute these whenever we want to change the logic.
Added an index to aggregates that will help with a hot path in our score computation (the index needs to include both followers and follow count, since we use those in this query).

Testing on a local machine with a prod replica as the data source, the job takes ~3 minutes, but that's without the new index on aggregate_user.

This is a halfway step between our existing slow query and something that can update scores on a streaming basis.

raymondjacobson

i dont love this honestly, but i'm down to release it and see how it performs.

it's a lot of manual work to track if we end up with lots of jobs like these.

wondering if instead when we re-do indexing, we do something more "streaming"-like where we watch for certain things and then run one off queries against individual users. but would have to put more thought into that.

@rickyrombo may have a lot of thoughts here, but I am generally in favor of moving quickly, trying things, and then course correcting if they don't pan out

schottra · 2025-10-22T14:58:35Z

> i dont love this honestly, but i'm down to release it and see how it performs.
>
> it's a lot of manual work to track if we end up with lots of jobs like these.
>
> wondering if instead when we re-do indexing, we do something more "streaming"-like where we watch for certain things and then run one off queries against individual users. but would have to put more thought into that.
>
> @rickyrombo may have a lot of thoughts here, but I am generally in favor of moving quickly, trying things, and then course correcting if they don't pan out

I don't love it either. There's a lot of downside to not updating scores immediately when the conditions affecting them change. But adding triggers in all the right places also feels fraught. Adding a new feature to the score calculation means making sure we find all the places where that gets changed and triggering a score calculation. We could probably get close enough by leaning on triggers on a bunch of tables. But triggers are also causing us more headaches the more we use them 🤷.

I can spend a little more time and see if I can get the update query to work efficiently with small batches of users so we can just loop on that until it's done and throw out all the multi-replica logic. Then it's just the same thing we're doing today, only a little slower to finish the cycle so that it doesn't block indexing writes.

rickyrombo

Finally got around to looking into this a bit...

tl;dr - I think I agree let's go forward with this and keep things moving and iterate on it later.

Truth be told I don't have much experience w/ this scoring query so it's hard to give good advice. I think I agree that some utility/intermediate tables might help. I also think I might agree with the endlessly cycling approach vs timer-based. Generally agree we want to minimize job count creep, and realtime "streaming" updates sound nice, but there's a point in queries like these where the cost of the updates on each request and the complexity of ensuring the proper triggers exceed the benefit (similar to the Solana indexer things, where I'm also wary).

Might be a really dumb q: what does a read/calculation of a score look like for a single user? If it's fast enough (or can be made to be fast), maybe we do the calculation and update the score on read, like a cache (or even better yet, no score caching and just compute on demand). That would save a ton of wasted cycles on recalculating scores of inactive users...

Maybe something like:

If the score hasn't expired, return it
If the score has soft-expired, return it, and update the score after returning
If the score is very expired, calculate first, return new result

I'm not generally aware of how the score gets used though or if that's reasonable. If it's being used at the query level for things, that kinda falls apart - would need some separate app code probably... but yeah forget all that I say ship as-is. This is probably one of the more gnarly of aggregates as it doesn't have a lane and sort of reaches into all sorts of tables for signals, so of all the ones to be a job I think this one is worthy anyway.....

schottra · 2025-10-24T15:13:58Z

> Might be a really dumb q: what does a read/calculation of a score look like for a single user? If it's fast enough (or can be made to be fast), maybe we do the calculation and update the score on read, like a cache (or even better yet, no score caching and just compute on demand). That would save a ton of wasted cycles on recalculating scores of inactive users...
>
> Maybe something like:
>
> If the score hasn't expired, return it
> If the score has soft-expired, return it, and update the score after returning
> If the score is very expired, calculate first, return new result
>
> I'm not generally aware of how the score gets used though or if that's reasonable. If it's being used at the query level for things that kinda falls apart - would need some separate app code probably... but yeah forget all that I say ship as-is. This is probably one of the more gnarly of aggregates as it doesn't have a lane and sort of reaches into all sorts of tables for signals, so of all the ones to be a job I think this one is worthy anyway.....

There are some expensive bits in this query that aren't consistent across users. Play history, reposts, and followers can all be huge or small. So maybe the score query returns immediately or maybe it takes a few seconds. And it doesn't help that some of the conditions that invalidate your score are actions taken by other users. We could always set a minimum interval for a score update (only update if updated_at > 5 mins ago or something like that). I think that's an avenue worth exploring for the second pass at this if streaming updates prove to be too complicated to implement in a maintainable/reliable way.

* main:
  Add logo_uri to user coin (#487)
  [PE-7200] Add wallet coins endpoint (#486)
  Solana Indexer DBC pool improvements (#485)
  Remove log line (#484)
  Don't update DBC pool address from job to prevent spam to slack (#483)
  Refactor Solana Indexer, Support DAMM V2 (#473)

gitguardian · 2025-10-24T22:58:38Z

⚠️ GitGuardian has uncovered 4 secrets following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secrets in your pull request

GitGuardian id	GitGuardian status	Secret	Commit	Filename
21650187	Triggered	Generic High Entropy Secret	`e4e2f43`	solana/indexer/damm_v2/indexer_test.go
21650188	Triggered	Generic High Entropy Secret	`e4e2f43`	solana/indexer/damm_v2/indexer_test.go
1606950	Triggered	Generic High Entropy Secret	`e4e2f43`	solana/indexer/damm_v2/indexer_test.go
21650189	Triggered	Generic High Entropy Secret	`e4e2f43`	solana/indexer/damm_v2/indexer_test.go

🛠 Guidelines to remediate hardcoded secrets

Understand the implications of revoking this secret by investigating where it is used in your code.
Replace and store your secrets safely. Learn here the best practices.
Revoke and rotate these secrets.
If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.

To avoid such incidents in the future consider

following these best practices for managing and storing secrets including API keys and other credentials
install secret detection on pre-commit to catch secret before it leaves your machine and ease remediation.

🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

schottra · 2025-10-24T23:06:14Z

@raymondjacobson @rickyrombo Latest changes clean it up a bit and move some of the features we use in the query to be precomputed and updated in a streaming fashion. Not quite ready for on-the-fly usage, but it should be faster!

### Description

Being replaced by: AudiusProject/api#488

### How Has This Been Tested?

Lots of manual testing on the new indexer against prod data.

schottra added 2 commits October 21, 2025 11:55

add aggregates indexer and user score updater

b1a0f17

update batch size

1dab594

schottra requested review from raymondjacobson and rickyrombo October 21, 2025 20:35

raymondjacobson approved these changes Oct 21, 2025

Comment thread main.go Outdated

raymondjacobson reviewed Oct 21, 2025

Comment thread jobs/job_runner.go Outdated

raymondjacobson reviewed Oct 21, 2025

Comment thread main.go Outdated

rickyrombo approved these changes Oct 24, 2025

Comment thread jobs/job_runner.go Outdated

Comment thread main.go Outdated

schottra added 4 commits October 24, 2025 12:07

reorganize to make aggregates part of core indexer

511d13a

add db index to help with scores calculations

33baae4

more optimizations

ce3d93f

rickyrombo reviewed Oct 24, 2025

Comment thread ddl/migrations/0175_add_user_score_tables.sql Outdated

rickyrombo reviewed Oct 24, 2025

Comment thread sql/01_schema.sql

comments on indexes

021b74d

rickyrombo reviewed Oct 24, 2025

Comment thread indexer/aggregates_indexer.go Outdated

rickyrombo approved these changes Oct 24, 2025

schottra added 3 commits October 24, 2025 19:11

update schema

abc15bb

rename

9915392

update models

7097c62

schottra mentioned this pull request Oct 27, 2025

Remove user score computation from DN indexer AudiusProject/apps#13318

Merged

less logs, recover from panic

3de5f79

schottra merged commit 8ab08c9 into main Oct 27, 2025
5 checks passed

schottra deleted the user-score-indexing branch October 27, 2025 18:26

schottra merged 11 commits into main from user-score-indexing

3 participants
