Skip to content

fix(admin): use default HTTP port for by-host connections; don't crash tracing on empty trace#8121

Open
phacops wants to merge 3 commits into
masterfrom
claude/snuba-admin-host-query-19xu5r
Open

fix(admin): use default HTTP port for by-host connections; don't crash tracing on empty trace#8121
phacops wants to merge 3 commits into
masterfrom
claude/snuba-admin-host-query-19xu5r

Conversation

@phacops

@phacops phacops commented Jun 27, 2026

Copy link
Copy Markdown
Contributor

Two snuba-admin bugs that surface once the clickhouse-connect (HTTP) driver is enabled. The driver is chosen at runtime (use_clickhouse_connect_driver) inside the shared connection cache, so admin paths get the HTTP pool transparently — and these two assumptions, which were fine for the native driver, break.

1. By-host connections must use the node's default HTTP port, not the cluster's http_port

_build_validated_pool built the node with http_port=cluster.get_http_port(). But admin always targets a specific node by hostname (e.g. a replica discovered from system.clusters). The cluster's configured http_port belongs to the cluster's query endpoint — which may sit behind a proxy/load balancer — not to an arbitrary individual node.

  • Native driver: talks to the node directly on its native port → unaffected.
  • HTTP driver: has to reach the node's own ClickHouse HTTP listener, which is the well-known default port.

The node is now built with DEFAULT_CLICKHOUSE_HTTP_PORT. This mirrors the documented "by-host helper" pattern already used by the CLI helpers and the connection cache's own default.

2. The tracing tool 500'd with list index out of range on the HTTP driver

clickhouse-connect doesn't surface the server's send_logs_level output (a known, documented limitation of that path), so trace_output is always empty. summarize_trace_output indexed parsed[0] unconditionally, raising IndexError — which the trace view's catch-all turned into:

An error occurred: {"message":"list index out of range","type":"unknown"}

It now returns an empty TracingSummary when there are no parseable trace lines, so the tracing tool returns its (empty) trace instead of erroring.

Changes

  • snuba/admin/clickhouse/common.py_build_validated_pool builds the by-host node with the default ClickHouse HTTP port instead of cluster.get_http_port().
  • snuba/admin/clickhouse/trace_log_parsing.pysummarize_trace_output returns an empty summary on empty/unparseable input.

Tests

  • test_by_host_connection_uses_default_http_port (parametrized over all four by-host helpers) asserts the node passed to the connection cache carries the default HTTP port and not the cluster's configured port (verified with a sentinel port so the guard holds even when the test cluster uses the default).
  • test_tracing_summary_empty_trace_output asserts summarize_trace_output("") (and whitespace) returns an empty summary rather than raising.

The trace-parsing change was verified locally; the existing test_tracing_summary still passes. ruff check and ruff format --check pass on all changed files.

Legal Boilerplate

Look, I get it. The entity doing business as "Sentry" was incorporated in the State of Delaware in 2015 as Functional Software, Inc. and is gonna need some rights from me in order to utilize my contributions in this here PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Sentry can use, modify, copy, and redistribute my contributions, under Sentry's choice of terms.

🤖 Generated with Claude Code

https://claude.ai/code/session_01Lzu1UmpvvA1S2Ypq2uqzEX


Generated by Claude Code

…n't crash tracing on empty trace output

Two snuba-admin bugs surface when the clickhouse-connect (HTTP) driver is
enabled (it is selected at runtime, behind the abstract ClickhousePool type):

1. By-host connections used cluster.get_http_port(). Admin always targets a
   *specific* node by hostname, but the cluster's configured http_port belongs
   to the cluster's query endpoint (which may sit behind a proxy/load
   balancer), not to an arbitrary individual node. The native driver talks to
   the node directly on its native port, so it was unaffected; the HTTP driver
   needs the node's own ClickHouse HTTP listener, which is the well-known
   default port. _build_validated_pool now builds the node with
   DEFAULT_CLICKHOUSE_HTTP_PORT.

2. The tracing tool 500'd with {"message":"list index out of range"} on the
   HTTP driver. clickhouse-connect does not surface the server's
   send_logs_level output, so trace_output is always empty, and
   summarize_trace_output indexed parsed[0] unconditionally. It now returns an
   empty summary when there are no parseable trace lines.

Co-Authored-By: Claude Opus 4.8 <[email protected]>
Claude-Session: https://claude.ai/code/session_01Lzu1UmpvvA1S2Ypq2uqzEX
@phacops phacops requested a review from a team as a code owner June 27, 2026 00:06
Comment thread tests/admin/test_system_queries.py
…le prefix match

The by-host port test cleared NODE_CONNECTIONS by matching keys starting with
"errors-", but the real keys are built from str(StorageKey), which falls back
to its repr ("StorageKey.ERRORS-..."), so the clear was a no-op and could leak
a mocked connection into other tests. Snapshot the cache, clear it, and restore
it afterward — independent of the key format, and guarantees the helper builds
a fresh node so the connection-cache call is actually exercised.

Co-Authored-By: Claude Opus 4.8 <[email protected]>
Claude-Session: https://claude.ai/code/session_01Lzu1UmpvvA1S2Ypq2uqzEX

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit a3bfc0e. Configure here.

Comment thread snuba/admin/clickhouse/common.py Outdated
The previous commit forced DEFAULT_CLICKHOUSE_HTTP_PORT for every connection
built by _build_validated_pool, but get_ro_query_node_connection (used by the
tracing, querylog and cardinality tools) reaches the cluster's configured query
endpoint — the same host the normal read path reaches on cluster.get_http_port().
That endpoint may be a load balancer on a non-default HTTP port, so forcing 8123
there would break those HTTP admin flows.

Only override to the default port when the target is NOT the query node (i.e. a
specific individual node selected by host in the admin tools). The query-node
path keeps cluster.get_http_port(), matching the read path.

Adds test_query_node_connection_uses_cluster_http_port as the counterpart to
test_by_host_connection_uses_default_http_port.

Co-Authored-By: Claude Opus 4.8 <[email protected]>
Claude-Session: https://claude.ai/code/session_01Lzu1UmpvvA1S2Ypq2uqzEX
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants