fix(rpc): keep trace_id bare in cross-item subqueries so the bloom filter index is used#8130
Open
phacops wants to merge 2 commits into
Open
fix(rpc): keep trace_id bare in cross-item subqueries so the bloom filter index is used#8130phacops wants to merge 2 commits into
phacops wants to merge 2 commits into
Conversation
…lter index is used
The cross-item-query path filters the main query by a subquery of matching
trace_ids: `trace_id IN (<subquery>)`. This was built as
`in_cond(column("trace_id"), DangerousRawSQL(subquery))`.
UUIDColumnProcessor only keeps a trace_id column bare (so the `bf_trace_id`
bloom-filter skip index can prune granules) for `=`/`IN` comparisons against
literal values. It does not recognize an `IN (subquery)` term, so it fell back
to wrapping the column in `replaceAll(toString(trace_id), '-', '')`, which
hides it from the index. The subquery itself also projected trace_id through
that same dash-stripping rewrite, so both sides were function-wrapped strings.
Build the whole predicate as raw SQL (`trace_id IN (<subquery>)`) via the new
`trace_id_in_subquery_condition` helper so the outer column reads bare, and
project/group the subquery's trace_id bare so it emits a real UUID matching the
column type. This mirrors the existing SubqueryFilterOptimizer pattern, whose
purpose is exactly to let the bloom-filter index prune granules.
Applied to the three cross-item call sites:
- endpoint_get_traces (_get_metadata_for_traces_with_subquery)
- resolver_trace_item_table (build_query)
- resolver_time_series (build_query)
The simple `EQ`/`IN`-literal-array trace_id filters elsewhere are unaffected:
UUIDColumnProcessor already keeps those index-friendly.
Co-Authored-By: Claude Opus 4.8 <[email protected]>
Claude-Session: https://claude.ai/code/session_01UJsCjzicTRwE39Vc4TEvu9
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Cross-item RPC queries filter the main query by a subquery of matching trace IDs:
trace_id IN (<subquery>). Thebf_trace_idbloom-filter skip index on thetrace_idUUID column only prunes granules when the column reads bare — these call sites were unintentionally hiding it from the index.Root cause
UUIDColumnProcessorkeeps atrace_idcolumn bare (index-friendly) only for=/INcomparisons against literal values. It does not recognize anIN (subquery)term, so it fell back to wrapping the column inreplaceAll(toString(trace_id), '-', ''), which the index cannot use. The cross-item subquery itself also projected/groupedtrace_idthrough that same dash-stripping rewrite, so both sides of the comparison were function-wrapped strings rather than UUIDs.Affected call sites (all in the cross-item-query path):
endpoint_get_traces._get_metadata_for_traces_with_subqueryresolver_trace_item_table.build_queryresolver_time_series.build_querySimple
EQ/IN-literal-arraytrace_idfilters elsewhere are already index-friendly and are unchanged.Fix
trace_id_in_subquery_condition(trace_ids_sql)which builds the wholetrace_id IN (<subquery>)predicate asDangerousRawSQL, so the outer column stays bare and is eligible for the bloom-filter index. This mirrors the existingSubqueryFilterOptimizerpattern, whose documented purpose is exactly to let the bloom-filter index prune granules viakey_column IN (subquery).trace_idbare (viaDangerousRawSQL) inget_trace_ids_sql_for_cross_item_query, so it emits a realUUIDthat matches the outer column's type (instead of dash-stripped hex, whichtoUUID/CASTwould reject).Verification
New unit tests in
test_uuid_column_processor.pyassert that:DangerousRawSQLcolumn IN (<subquery>)predicate survivesUUIDColumnProcessoruntouched (column stays bare), while the previousin(col, DangerousRawSQL(subquery))form gets wrapped inreplaceAll(toString(...))— a regression guard.DangerousRawSQLprojection is not rewritten by the processor.Rendered the actual cross-item SQL in
dry_runmode and confirmed both the subquery projection and the outer predicate now use baretrace_id:Previously this was
replaceAll(toString(trace_id), '-', '') IN (SELECT replaceAll(toString(trace_id), '-', '') ...).The existing
clickhouse_dbcross-item integration tests exercise these paths end-to-end.Legal Boilerplate
Look, I get it. The entity doing business as "Sentry" was incorporated in the State of Delaware in 2015 as Functional Software, Inc. and is gonna need some rights from me in order to utilize my contributions in this here PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Sentry can use, modify, copy, and redistribute my contributions, under Sentry's choice of terms.
🤖 Generated with Claude Code
https://claude.ai/code/session_01UJsCjzicTRwE39Vc4TEvu9
Generated by Claude Code