perf(core): Add sharded map to speed up post query aggregations #9426

Closed
ghost wants to merge 9 commits into main from harshil-goel/perf



@ghost ghost commented May 30, 2025

No description provided.

@ghost ghost self-requested a review May 30, 2025 09:57
@github-actions github-actions Bot added labels area/testing (Testing related issues), area/querylang (Issues related to the query language specification and implementation.), area/core (internal mechanisms), go (Pull requests that update Go code) May 30, 2025

trunk-io Bot commented May 30, 2025

Running Code Quality on PRs by uploading data to Trunk will soon be removed. You can still run checks on your PRs using trunk-action - see the migration guide for more information.


trunk-io Bot commented May 30, 2025


| Failed Test | Failure Summary | Logs |
| --- | --- | --- |
| TestVectorGraphQlEuclideanIndexMutationAndQuery | | Logs ↗︎ |
| TestACLSuite/TestAddNewPredicate | The ACL group count is not 1, which is unexpected. | Logs ↗︎ |
| TestUniqueUpsertSingleMutationTwoBlankNode | | Logs ↗︎ |
| TestACLSuite | | Logs ↗︎ |



rahst12 commented May 31, 2025

I'm always interested in the performance updates. Any early thoughts on what the community can expect with these improvements?


ghost commented Jun 4, 2025

@rahst12 I am going to close this in favor of other, smaller diffs. There are 3 different optimizations here. (Most of them improve the latency of a query; throughput might not be affected as much.)

  1. We have a function called MergeSorted(). It takes multiple sorted lists of uids and merges them; it is essentially a heap-based merge. It is used wherever we read multiple keys and then merge their data, so almost all complicated queries use it. I have parallelized it now, because we might have a lot of different lists that are each small in size. This increases the size of the heap, making it ineffective.
  2. Currently, when caching, we cache the posting list object. This object stores data in a format that is easy to consume for various tasks. One of those tasks is getting the list of uids for that key (which then goes into the MergeSorted function). I have now made the uid list itself get cached, instead of deriving it again and again after fetching the object from the cache.
  3. We have something called variable propagation. Variables in a query are transformed, processed, and then sent to other parts of the query. Until now this was done in a single thread. I have changed it so that it can be done in multiple threads.

The query I am working to optimize shows a significant improvement. I will run a full suite of queries to see the performance across the board.
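
For readers who want a concrete picture of the first optimization, here is a minimal sketch of a heap-based merge of sorted uid lists and a parallel variant that merges chunks of lists in separate goroutines before combining the partial results. This is not the code from this PR: the names `mergeSorted`, `parallelMerge`, and the `chunks` parameter are hypothetical, and de-duplicating equal uids is an assumption about the desired output.

```go
package main

import (
	"container/heap"
	"fmt"
	"sync"
)

// cursor points at the next unread element of one input list.
type cursor struct {
	list []uint64
	pos  int
}

// minHeap orders cursors by the value they currently point at.
type minHeap []cursor

func (h minHeap) Len() int           { return len(h) }
func (h minHeap) Less(i, j int) bool { return h[i].list[h[i].pos] < h[j].list[h[j].pos] }
func (h minHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x any)        { *h = append(*h, x.(cursor)) }
func (h *minHeap) Pop() any {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

// mergeSorted merges sorted lists into one sorted, de-duplicated list.
// The heap holds one cursor per non-empty list, so each step costs
// O(log k) for k lists.
func mergeSorted(lists [][]uint64) []uint64 {
	h := make(minHeap, 0, len(lists))
	for _, l := range lists {
		if len(l) > 0 {
			h = append(h, cursor{list: l})
		}
	}
	heap.Init(&h)
	out := []uint64{}
	for h.Len() > 0 {
		top := &h[0]
		v := top.list[top.pos]
		// Output is sorted, so a duplicate can only equal the last element.
		if len(out) == 0 || out[len(out)-1] != v {
			out = append(out, v)
		}
		top.pos++
		if top.pos == len(top.list) {
			heap.Pop(&h) // this list is exhausted
		} else {
			heap.Fix(&h, 0) // restore heap order after advancing
		}
	}
	return out
}

// parallelMerge (hypothetical) splits the lists into at most `chunks`
// groups, merges each group in its own goroutine, then merges the
// partial results. Each goroutine works on a much smaller heap.
func parallelMerge(lists [][]uint64, chunks int) []uint64 {
	if chunks < 1 {
		chunks = 1
	}
	size := (len(lists) + chunks - 1) / chunks
	if size == 0 {
		return []uint64{}
	}
	n := (len(lists) + size - 1) / size
	partial := make([][]uint64, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		lo, hi := i*size, (i+1)*size
		if hi > len(lists) {
			hi = len(lists)
		}
		wg.Add(1)
		go func(i, lo, hi int) {
			defer wg.Done()
			partial[i] = mergeSorted(lists[lo:hi]) // each goroutine writes its own slot
		}(i, lo, hi)
	}
	wg.Wait()
	return mergeSorted(partial)
}

func main() {
	lists := [][]uint64{{1, 3, 5}, {2, 3, 6}, {}, {4}}
	fmt.Println(parallelMerge(lists, 2)) // prints [1 2 3 4 5 6]
}
```

Whether splitting helps in practice depends on the number and size of the lists; with only a few long lists, the extra final merge may cost more than it saves.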

@ghost ghost closed this Jun 4, 2025
This pull request was closed.

2 participants