fix(rpc): keep trace_id bare in cross-item subqueries so the bloom filter index is used #8130

Open
phacops wants to merge 2 commits into master from claude/rpc-trace-id-uuid-u46ar3

phacops (Contributor) commented Jun 28, 2026

Cross-item RPC queries filter the main query by a subquery of matching trace IDs: trace_id IN (<subquery>). The bf_trace_id bloom-filter skip index on the trace_id UUID column prunes granules only when the column appears bare in the predicate. These call sites were unintentionally wrapping the column in a function, which hid it from the index.

Root cause

UUIDColumnProcessor keeps a trace_id column bare (index-friendly) only for =/IN comparisons against literal values. It does not recognize an IN (subquery) term, so it fell back to wrapping the column in replaceAll(toString(trace_id), '-', ''), which the index cannot use. The cross-item subquery itself also projected/grouped trace_id through that same dash-stripping rewrite, so both sides of the comparison were function-wrapped strings rather than UUIDs.
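The fallback described above can be pictured with a small toy model. This is a sketch only, not Snuba's UUIDColumnProcessor: the names `Literal`, `RawSubquery`, `UUID_COLUMNS`, and `render_in_lhs` are hypothetical and exist purely to illustrate why a literal list keeps the column bare while an opaque subquery triggers the string rewrite.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple, Union


@dataclass(frozen=True)
class Literal:
    """Hypothetical stand-in for a literal value in the query AST."""
    value: str


@dataclass(frozen=True)
class RawSubquery:
    """Hypothetical stand-in for an opaque, pre-rendered subquery."""
    sql: str


# Assumption for this sketch: the set of UUID-typed columns is known up front.
UUID_COLUMNS: FrozenSet[str] = frozenset({"trace_id"})


def render_in_lhs(column: str, rhs: Union[Tuple[Literal, ...], RawSubquery]) -> str:
    """Return the SQL for the left-hand side of `column IN rhs` in the toy model."""
    is_literal_list = isinstance(rhs, tuple) and all(
        isinstance(value, Literal) for value in rhs
    )
    if column in UUID_COLUMNS and not is_literal_list:
        # Unrecognized right-hand side: fall back to the dash-stripped string
        # form, which a skip index on the bare column cannot use.
        return f"replaceAll(toString({column}), '-', '')"
    # Literal comparisons (and non-UUID columns) keep the column bare.
    return column
```

In this model, `render_in_lhs("trace_id", (Literal("..."),))` leaves the column bare, while passing a `RawSubquery` produces the wrapped form that this PR avoids.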

Affected call sites (all in the cross-item-query path):

  • endpoint_get_traces._get_metadata_for_traces_with_subquery
  • resolver_trace_item_table.build_query
  • resolver_time_series.build_query

Simple EQ / IN-literal-array trace_id filters elsewhere are already index-friendly and are unchanged.

Fix

  • Added trace_id_in_subquery_condition(trace_ids_sql) which builds the whole trace_id IN (<subquery>) predicate as DangerousRawSQL, so the outer column stays bare and is eligible for the bloom-filter index. This mirrors the existing SubqueryFilterOptimizer pattern, whose documented purpose is exactly to let the bloom-filter index prune granules via key_column IN (subquery).
  • Projected and grouped the subquery's trace_id bare (via DangerousRawSQL) in get_trace_ids_sql_for_cross_item_query, so it emits a real UUID that matches the outer column's type (instead of dash-stripped hex, which toUUID/CAST would reject).
  • Updated the three call sites to use the helper and removed now-unused imports.
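As a rough illustration of the helper's shape, here is a minimal, self-contained sketch. The `DangerousRawSQL` class below is a local stand-in, not Snuba's class, and the function body is an assumption inferred from the description above rather than the code in this PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DangerousRawSQL:
    """Local stand-in (assumption): an expression emitted verbatim,
    so column processors have nothing to rewrite inside it."""
    sql: str


def trace_id_in_subquery_condition(trace_ids_sql: str) -> DangerousRawSQL:
    """Build the whole `trace_id IN (<subquery>)` predicate as raw SQL.

    Because the column reference lives inside the raw string rather than
    as a column node, the outer trace_id stays bare in the rendered query.
    """
    return DangerousRawSQL(f"trace_id IN ({trace_ids_sql})")
```

For example, `trace_id_in_subquery_condition("SELECT trace_id FROM t").sql` evaluates to `trace_id IN (SELECT trace_id FROM t)`. The trade-off of this design is that the predicate is opaque to every query processor, which is why the subquery text must already be in its final, correctly typed form.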

Verification

  • New unit tests in test_uuid_column_processor.py assert that:

    • a DangerousRawSQL column IN (<subquery>) predicate survives UUIDColumnProcessor untouched (column stays bare), while the previous in(col, DangerousRawSQL(subquery)) form gets wrapped in replaceAll(toString(...)). The second assertion serves as a regression guard.
    • a DangerousRawSQL projection is not rewritten by the processor.
  • Rendered the actual cross-item SQL in dry_run mode and confirmed both the subquery projection and the outer predicate now use bare trace_id:

    trace_id IN (SELECT trace_id FROM eap_items_1_local WHERE ... GROUP BY organization_id, project_id, trace_id ...)
    

    Previously this was replaceAll(toString(trace_id), '-', '') IN (SELECT replaceAll(toString(trace_id), '-', '') ...).

  • The existing clickhouse_db cross-item integration tests exercise these paths end-to-end.
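The manual dry-run comparison above could also be expressed as an automated check on the rendered SQL. The following is a hypothetical sketch: `trace_id_is_bare` is not part of this PR or of Snuba, and the sample strings are the abbreviated before/after forms quoted above, including their elisions.

```python
import re

# Matches the dash-stripping rewrite applied to trace_id.
WRAPPED_TRACE_ID = re.compile(r"replaceAll\(\s*toString\(\s*trace_id\s*\)")


def trace_id_is_bare(rendered_sql: str) -> bool:
    """Return True if trace_id appears and is never wrapped in
    replaceAll(toString(...)) anywhere in the rendered SQL."""
    return (
        "trace_id" in rendered_sql
        and WRAPPED_TRACE_ID.search(rendered_sql) is None
    )


after = (
    "trace_id IN (SELECT trace_id FROM eap_items_1_local WHERE ... "
    "GROUP BY organization_id, project_id, trace_id ...)"
)
before = (
    "replaceAll(toString(trace_id), '-', '') IN "
    "(SELECT replaceAll(toString(trace_id), '-', '') ...)"
)
```

With these inputs, `trace_id_is_bare(after)` is true and `trace_id_is_bare(before)` is false. A string check like this only confirms the shape of the SQL; whether the index actually prunes granules still has to be confirmed against ClickHouse.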

Legal Boilerplate

Look, I get it. The entity doing business as "Sentry" was incorporated in the State of Delaware in 2015 as Functional Software, Inc. and is gonna need some rights from me in order to utilize my contributions in this here PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Sentry can use, modify, copy, and redistribute my contributions, under Sentry's choice of terms.

🤖 Generated with Claude Code

https://claude.ai/code/session_01UJsCjzicTRwE39Vc4TEvu9


…lter index is used

The cross-item-query path filters the main query by a subquery of matching
trace_ids: `trace_id IN (<subquery>)`. This was built as
`in_cond(column("trace_id"), DangerousRawSQL(subquery))`.

UUIDColumnProcessor only keeps a trace_id column bare (so the `bf_trace_id`
bloom-filter skip index can prune granules) for `=`/`IN` comparisons against
literal values. It does not recognize an `IN (subquery)` term, so it fell back
to wrapping the column in `replaceAll(toString(trace_id), '-', '')`, which
hides it from the index. The subquery itself also projected trace_id through
that same dash-stripping rewrite, so both sides were function-wrapped strings.

Build the whole predicate as raw SQL (`trace_id IN (<subquery>)`) via the new
`trace_id_in_subquery_condition` helper so the outer column reads bare, and
project/group the subquery's trace_id bare so it emits a real UUID matching the
column type. This mirrors the existing SubqueryFilterOptimizer pattern, whose
purpose is exactly to let the bloom-filter index prune granules.

Applied to the three cross-item call sites:
- endpoint_get_traces (_get_metadata_for_traces_with_subquery)
- resolver_trace_item_table (build_query)
- resolver_time_series (build_query)

The simple `EQ`/`IN`-literal-array trace_id filters elsewhere are unaffected:
UUIDColumnProcessor already keeps those index-friendly.

Co-Authored-By: Claude Opus 4.8 <[email protected]>
Claude-Session: https://claude.ai/code/session_01UJsCjzicTRwE39Vc4TEvu9

phacops requested review from a team as code owners June 28, 2026 18:56