perf(core): Add sharded map to speed up post query aggregations · Pull Request #9426 · dgraph-io/dgraph

ghost · 2025-05-30T09:57:32Z

No description provided.

trunk-io · 2025-05-30T09:58:20Z

Running Code Quality on PRs by uploading data to Trunk will soon be removed. You can still run checks on your PRs using trunk-action - see the migration guide for more information.

trunk-io · 2025-05-30T10:51:27Z

| Failed Test | Failure Summary | Logs |
| --- | --- | --- |
| `TestVectorGraphQlEuclideanIndexMutationAndQuery` | | Logs ↗︎ |
| `TestACLSuite/TestAddNewPredicate` | The ACL group count is not 1, which is unexpected. | Logs ↗︎ |
| `TestUniqueUpsertSingleMutationTwoBlankNode` | | Logs ↗︎ |
| `TestACLSuite` | | Logs ↗︎ |


rahst12 · 2025-05-31T06:07:44Z

I'm always interested in the performance updates. Any early thoughts on what the community can expect with these improvements?

ghost · 2025-06-04T14:23:27Z

@rahst12 I am going to close this in favor of smaller diffs. There are 3 different optimizations here (most of them improve the latency of a query; throughput might not be affected much):

1. We have a function called MergeSorted(). It takes multiple sorted lists of UIDs and merges them, essentially a heap-based k-way merge. It is used wherever we read multiple keys and then merge their data, so almost all complicated queries use it. I have parallelized it now, because we often end up with many lists that are each small. That makes the heap large and the merge inefficient.
2. Currently, when caching, we cache the posting list object. This object stores data in a format that is easy to consume for various tasks. One of those tasks is getting the list of UIDs for that key (which then goes into MergeSorted()). I have now made the UID list itself get cached, instead of deriving it again and again after fetching the posting list from the cache.
3. We have something called variable propagation. Variables in a query are transformed, processed, and then sent to other parts of the query. Until now this was done in a single thread. I have changed it so that it can be done in multiple threads.
The query I am working to optimize shows a significant improvement. I will run a full suite of queries to see the performance across the board.

darkcoderrises added 3 commits May 29, 2025 16:36

added stuf

83808f3

added sharded map

78a2754

added changes

5497000

ghost self-requested a review May 30, 2025 09:57

github-actions Bot added the area/testing, area/querylang, area/core, and go labels May 30, 2025

added changes

31a85c4

darkcoderrises added 5 commits May 30, 2025 16:27

added changes

6ee9b2e

something

f4e8acc

removed logs

382cb40

added changes

dd4334b

added uids cache

11f67e0

ghost closed this Jun 4, 2025

This pull request was closed.

ghost wants to merge 9 commits into main from harshil-goel/perf
