Replace RowBinary serializer with clickhouse-crate inserter sink [EAP-584] #8063
phacops wants to merge 10 commits into
Replace the hand-rolled RowBinary write path for eap_items (a vendored RowBinary serializer + reqwest/native-LZ4 HTTP client) with the official `clickhouse` crate's `Inserter`, picking up RowBinaryWithNamesAndTypes schema validation, compression, and connection pooling.

The Rust consumer is a sentry_arroyo push-based pipeline, so the crate's typed `Inserter` is wired into a new `EapItemsInserterSink` strategy:

- The eap_items processor now emits a typed `EAPItemRow` (`process_message_eap_row` → `BytesInsertBatch<Option<EAPItemRow>>`) instead of pre-encoded bytes.
- `EapItemsInserterSink` owns a long-lived `Inserter<EAPItemRow>` on a dedicated actor task. It writes each row on arrival (the wide struct is serialized into the inserter buffer and dropped immediately, so peak memory stays bounded by the buffer, not row count) and lets the inserter own the flush boundary via `with_max_rows`/`with_max_bytes` + `with_period`. On each `commit()` flush it pushes exactly the flushed rows' Kafka offsets downstream: the flush is the durability barrier, so a failed flush never advances offsets (batch replays).
- `EAPItemRow` derives `clickhouse::Row`; the hand-maintained `COLUMN_NAMES` and the vendored `rowbinary` serializer are removed (the crate generates the column list and RowBinaryWithNamesAndTypes tolerates field ordering).

The path is gated by the existing `use_row_binary` flag (EAPItemsProcessor only); JSONEachRow remains the default and the rollback. The JSONEachRow HTTP writer (`writer_v2`'s client + native-LZ4) stays for all other storages and is simplified to JSON-only.

The crate's validation requires Rust 1.89+, so bump the pinned toolchain to 1.94.1 (Dockerfile/CI install from rust-toolchain.toml); this surfaced two newer clippy lints on existing code, fixed in passing.

Co-Authored-By: Claude Opus 4.8 <[email protected]>
Claude-Session: https://claude.ai/code/session_011Bz3aya6Udg2sd9Sc9uuHf
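The "flush is the durability barrier" rule above can be sketched with the standard library alone. This is a minimal model, not the code in `inserter_sink.rs`: `OffsetGate`, its fields, and the `flush` closure are hypothetical stand-ins for the sink, the inserter buffer, and the ClickHouse INSERT.

```rust
use std::collections::BTreeMap;

/// Hypothetical model of a sink that buffers rows and only advances
/// Kafka offsets after the buffered window has been flushed durably.
pub struct OffsetGate {
    max_rows: usize,
    rows: Vec<String>,
    /// Highest offset per partition for rows that are buffered but unflushed.
    pending: BTreeMap<u16, u64>,
    /// Highest offset per partition that is safe to commit.
    committed: BTreeMap<u16, u64>,
}

impl OffsetGate {
    pub fn new(max_rows: usize) -> Self {
        Self {
            max_rows,
            rows: Vec::new(),
            pending: BTreeMap::new(),
            committed: BTreeMap::new(),
        }
    }

    /// Buffers one row. When the window reaches `max_rows`, calls `flush`
    /// (a stand-in for the INSERT). Returns Ok(true) if a flush happened.
    pub fn write<F>(
        &mut self,
        partition: u16,
        offset: u64,
        row: String,
        flush: F,
    ) -> Result<bool, String>
    where
        F: FnOnce(&[String]) -> Result<(), String>,
    {
        self.rows.push(row);
        let highest = self.pending.entry(partition).or_insert(offset);
        *highest = (*highest).max(offset);

        if self.rows.len() < self.max_rows {
            return Ok(false);
        }

        // Durability barrier: if the flush fails we return here and the
        // committed offsets are left untouched, so the window replays.
        flush(&self.rows)?;

        self.rows.clear();
        self.committed.append(&mut self.pending);
        Ok(true)
    }

    pub fn committed(&self) -> &BTreeMap<u16, u64> {
        &self.committed
    }
}
```

In this sketch a failed flush leaves the window buffered so a later call can retry it; the PR instead fail-stops the consumer and relies on Kafka replay, but in both cases no offset moves before a successful flush.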
Address PR review findings:

- Actor write-ahead past failure (Bugbot, high): on a write/commit error the actor previously rebuilt the inserter and kept consuming queued rows. If a later window flushed durably while the strategy had already failed the consumer on the first error, those rows landed in ClickHouse with their offsets uncommitted → duplicates on replay. The actor now fail-stops (surfaces the error and returns), matching the old writer's fail-stop + replay model; the in-flight window never flushed, so it replays cleanly.
- join() timeout doubling (Seer, medium): the shutdown wait loop could consume the full timeout and then pass the original timeout to next_step.join(), blocking up to 2x the caller's budget. Pass the remaining time instead.

Co-Authored-By: Claude Opus 4.8 <[email protected]>
Claude-Session: https://claude.ai/code/session_011Bz3aya6Udg2sd9Sc9uuHf
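The join() timeout fix comes down to computing one deadline up front and handing each later phase only what is left of it. A minimal sketch, assuming two closures in place of the real actor wait loop and `next_step.join()` (the function name and closure parameters are hypothetical):

```rust
use std::time::{Duration, Instant};

/// Runs two shutdown phases under a single time budget.
/// `wait_for_actor` stands in for the shutdown wait loop and
/// `join_next_step` for the downstream join; both are hypothetical.
pub fn join_within_budget<A, B>(timeout: Duration, wait_for_actor: A, join_next_step: B)
where
    A: FnOnce(Duration),
    B: FnOnce(Duration),
{
    // Fix the deadline once, before any waiting happens.
    let deadline = Instant::now() + timeout;

    wait_for_actor(timeout);

    // Pass only the time that is left, clamped at zero, rather than the
    // original timeout (which would allow up to 2x the caller's budget).
    let remaining = deadline.saturating_duration_since(Instant::now());
    join_next_step(remaining);
}
```

`saturating_duration_since` returns a zero duration when the deadline has already passed, so the second phase never receives a negative or wrapped value.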
Re-add the resilience the old JSON writer had (RetryConfig: jittered exponential backoff, 4 retries) to the clickhouse-crate inserter path, without an unbounded memory cost:

- The actor retains the current unflushed window's rows (bounded by max_batch_size) until that window's flush succeeds.
- On a write/commit error the inserter is poisoned; the actor replays the retained window through a fresh inserter via flush_window_with_retry (write all rows + end()), with backoff, before failing. On success it resets the main inserter, emits the window's offsets, and continues.
- If retries are exhausted, it fail-stops (offsets never pushed → replay), matching the old fail-stop-after-retries model.

Peak memory is now one batch of typed rows plus the inserter's byte buffer (the accepted tradeoff for retry), still bounded by max_batch_size, not by total row count.

Also: clarify in submit() why an unreserved/closed channel is safe (try_reserve rejects a closed channel; a dropped row's offset is never committed, so it replays). The skip-only offset-advance guard is preserved as `acc.rows.is_empty()`: it must not emit while real rows are buffered but unflushed, since their offsets are lower than later skips and committing past them would lose data.

Co-Authored-By: Claude Opus 4.8 <[email protected]>
Claude-Session: https://claude.ai/code/session_011Bz3aya6Udg2sd9Sc9uuHf
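The retry-then-fail-stop behaviour described above can be illustrated roughly as follows. This is a sketch under stated assumptions, not the PR's `flush_window_with_retry`: the `RetryConfig` fields, the `flush` and `sleep` closures, and the string error type are hypothetical, and jitter is left out to keep the example deterministic.

```rust
use std::time::Duration;

/// Hypothetical retry settings; the field names are assumptions.
pub struct RetryConfig {
    pub max_retries: u32,
    pub initial_backoff: Duration,
    pub max_backoff: Duration,
}

/// Exponential backoff without jitter: initial * 2^attempt, capped.
pub fn backoff_for(cfg: &RetryConfig, attempt: u32) -> Duration {
    let factor = 2u32.saturating_pow(attempt);
    cfg.initial_backoff.saturating_mul(factor).min(cfg.max_backoff)
}

/// Replays the retained window through `flush` until it succeeds or the
/// retries are exhausted. Returns the number of retries that were needed.
pub fn flush_with_retry<R, F, S>(
    rows: &[R],
    cfg: &RetryConfig,
    mut flush: F,
    mut sleep: S,
) -> Result<u32, String>
where
    F: FnMut(&[R]) -> Result<(), String>,
    S: FnMut(Duration),
{
    let mut attempt = 0;
    loop {
        match flush(rows) {
            Ok(()) => return Ok(attempt),
            // Retries exhausted: surface the error so the caller can
            // fail-stop without pushing any offsets.
            Err(e) if attempt >= cfg.max_retries => return Err(e),
            Err(_) => {
                sleep(backoff_for(cfg, attempt));
                attempt += 1;
            }
        }
    }
}
```

Because the same `rows` slice is replayed on every attempt, memory stays bounded by the retained window, which mirrors the trade-off the commit message accepts.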
Address a PR review finding (Bugbot, high) about potentially committing offsets past not-yet-durable rows. In our usage this can't happen: we commit() after every single write(), and the crate flushes the entire current INSERT (never a strict subset), so a flush always covers exactly the retained window and emit_ready only advances offsets for durable rows.

Make that invariant explicit and enforced with a debug_assert that the flushed row count equals acc.rows.len(), so a future change (e.g. batching multiple writes before a commit) can't silently regress into committing ahead.

Co-Authored-By: Claude Opus 4.8 <[email protected]>
Claude-Session: https://claude.ai/code/session_011Bz3aya6Udg2sd9Sc9uuHf
Address a PR review finding (Bugbot, medium): if join() hit its deadline while the actor was still finalizing, a late FlushOutcome::Ready was never drained, so rows could be durable in ClickHouse with offsets uncommitted → duplicate on replay.

On timeout, abort the actor (cancelling an in-flight INSERT that hasn't landed, so that window simply replays rather than committing rows we'll never ack), then drain outcomes one final time so any Ready produced right around the deadline still has its offsets pushed. A flush that already landed server-side before the abort replays as a duplicate: the same at-least-once shutdown exposure the old writer had, to be closed by the (deferred) insert_deduplication_token.

Co-Authored-By: Claude Opus 4.8 <[email protected]>
Claude-Session: https://claude.ai/code/session_011Bz3aya6Udg2sd9Sc9uuHf
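The "drain one final time" step can be sketched with a standard-library channel. The `FlushOutcome` variant names come from the PR, but their payloads here (a list of offsets, a string error) and the use of `std::sync::mpsc` are assumptions made for illustration.

```rust
use std::sync::mpsc::{Receiver, TryRecvError};

/// Simplified outcome type; the payload shapes are assumptions.
#[derive(Debug, PartialEq)]
pub enum FlushOutcome {
    Ready(Vec<u64>),
    Err(String),
}

/// Drains every outcome that is already queued, without blocking.
/// Returns the offsets that are safe to push downstream, or the first error.
pub fn drain_outcomes(rx: &Receiver<FlushOutcome>) -> Result<Vec<u64>, String> {
    let mut offsets = Vec::new();
    loop {
        match rx.try_recv() {
            Ok(FlushOutcome::Ready(mut flushed)) => offsets.append(&mut flushed),
            Ok(FlushOutcome::Err(e)) => return Err(e),
            // Nothing queued, or the sender (the aborted actor) is gone:
            // either way there is nothing more to collect.
            Err(TryRecvError::Empty) | Err(TryRecvError::Disconnected) => {
                return Ok(offsets)
            }
        }
    }
}
```

Messages sent before the sender was dropped are still delivered by `try_recv`, so calling this after aborting the actor picks up a Ready that was produced right around the deadline.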
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit ab7d324.
…y column

CI's live-ClickHouse test_rust job failed:

    Code: 117 ... Type of 'attributes_array' must be JSON(max_dynamic_paths=128), not JSON ... While executing BinaryRowInputFormat. (INCORRECT_DATA)

`eap_items.attributes_array` is a native JSON(max_dynamic_paths=N) column. Under RowBinaryWithNamesAndTypes the server validates the client's names+types header, and our `String` field can't satisfy a parametrized JSON column (no Rust type maps to it). The old hand-rolled writer avoided this by sending plain RowBinary + input_format_binary_read_json_as_string=1 (write the JSON as a string, let ClickHouse parse it into the JSON column).

Set Client::with_validation(false) so the crate uses plain RowBinary, matching the old wire behavior. The crate still emits the column list (from the Row derive) in the INSERT, so column mapping stays correct without a hand-rolled list.

Trade-off: this drops RowBinaryWithNamesAndTypes schema validation (it is incompatible with the JSON column, not merely an optimization).

Co-Authored-By: Claude Opus 4.8 <[email protected]>
Claude-Session: https://claude.ai/code/session_011Bz3aya6Udg2sd9Sc9uuHf
Address a PR review finding (Bugbot, medium): the clickhouse crate was built with default-features = false and no TLS backend, so build_client's https URL (used when ClickhouseConfig.secure is true) had no TLS stack. Row-binary inserts to secure clusters would fail while the JSON path kept working.

Add the crate's native-tls feature. It matches the existing reqwest-based writer's TLS (OpenSSL + system roots) and reuses openssl-sys, which reqwest already pulls into the build, so no new system dependency is introduced. Builds clean.

Co-Authored-By: Claude Opus 4.8 <[email protected]>
Claude-Session: https://claude.ai/code/session_011Bz3aya6Udg2sd9Sc9uuHf
…king-0ju4nd

Resolve conflict in rust_snuba/src/processors/eap_items.rs:

- Drop master's manual COLUMN_NAMES const; the clickhouse::Row derive now generates the column list from struct field order (covers the new session_id, ai_conversation_id, and attributes_array_* columns).
- Point session_id's serde adapter at clickhouse::serde::uuid (the vendored rowbinary module was removed in this branch).
- Update test_column_names_match_struct_layout to read the derived COLUMN_NAMES and assert the new 102-column layout.

Co-Authored-By: Claude Opus 4.8 <[email protected]>
Claude-Session: https://claude.ai/code/session_011Bz3aya6Udg2sd9Sc9uuHf
    Ok(FlushOutcome::Err(e)) => {
        counter!(
            "rust_consumer.clickhouse_insert_error",
            1,
            "status" => "insert_error",
            "retried" => "false"
        );
        tracing::error!("ClickHouse inserter flush failed: {}", e);
        return Err(StrategyError::Other(e.into()));
    }
Bug: An unrecoverable insert error causes the rust_consumer.clickhouse_insert_error metric to be emitted twice, once in flush_window_with_retry and again in drain_outcomes, inflating the error count.
Severity: MEDIUM
Suggested Fix
Remove one of the two metric emissions. The most logical place to remove it is from flush_window_with_retry. The drain_outcomes function should be the single source of truth for emitting metrics based on final flush outcomes, whether success or failure. This centralizes metric emission and prevents double-counting.
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.
Location: rust_snuba/src/strategies/clickhouse/inserter_sink.rs#L559-L568
Potential issue: When an insert to ClickHouse fails and all retries are exhausted, the
error metric `rust_consumer.clickhouse_insert_error` is emitted twice. The first
emission occurs in `flush_window_with_retry` before it returns an error. The caller then
sends this error to the `drain_outcomes` function, which processes the error and emits
the exact same metric a second time. This double-counting inflates the error rate by a
factor of two, which can trigger false alerts and distort monitoring dashboards, making
it difficult to assess the true health of the insert pipeline.
Also affects:
rust_snuba/src/strategies/clickhouse/inserter_sink.rs:266~272
…l failure

flush_window_with_retry already emits clickhouse_insert_error (retried=false) when all retries are exhausted, then returns Err. The actor forwards that as FlushOutcome::Err, and drain_outcomes was emitting the same metric with identical tags a second time, inflating the error count 2x.

Drop the emission in drain_outcomes and keep it at the source in flush_window_with_retry, where the per-attempt retried=true counter also lives. This also makes the metric robust to the best-effort outcome send (let _ = out_tx.send(...)): the count is recorded even if the receiver is already gone during shutdown. drain_outcomes still logs and fail-stops.

Co-Authored-By: Claude Opus 4.8 <[email protected]>
Claude-Session: https://claude.ai/code/session_011Bz3aya6Udg2sd9Sc9uuHf
    debug_assert_eq!(
        rows as usize,
        acc.rows.len(),
        "flush count must equal the retained window; committing more offsets \
         than were made durable would lose data",
    );
Bug: A debug_assert_eq! in emit_ready is compiled away in release builds, risking silent data loss if the number of rows flushed to ClickHouse doesn't match the buffered row count.
Severity: HIGH
Suggested Fix
Replace the debug_assert_eq! with a standard assert_eq! or a runtime check that returns an error. This ensures the invariant is enforced in all builds, preventing the system from committing offsets for rows that were not successfully written.
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.
Location: rust_snuba/src/strategies/clickhouse/inserter_sink.rs#L137-L142
Potential issue: The `emit_ready` function uses a `debug_assert_eq!` to verify that the
number of rows flushed to ClickHouse matches the number of rows buffered locally. This
assertion is compiled out in release builds. If ClickHouse reports a different number of
rows written (e.g., due to server-side deduplication or filtering) than what was sent,
the check will not trigger in production. This can occur in both the normal processing
loop and the error recovery path. This causes the system to commit offsets for rows that
were not durably stored, leading to silent data loss.
Also affects:
rust_snuba/src/strategies/clickhouse/inserter_sink.rs:440~456
emit_ready guarded "flushed rows == retained window length" with a debug_assert!, which compiles out of release builds. The invariant protects against committing Kafka offsets for rows that weren't made durable (silent data loss), so gate it in all builds.

The crate's Quantities.rows is a client-side counter (pending.rows += 1 per write()), not a server ack, so the invariant holds by construction today and can't be tripped by ClickHouse-side dedup/filtering. The runtime check guards against a future refactor (e.g. batching writes before a commit) silently breaking it.

On mismatch, emit_ready now returns false after sending FlushOutcome::Err (rather than panicking): a panic in the spawned actor would only disconnect the channel and stall the consumer, whereas FlushOutcome::Err propagates as a StrategyError out of poll(), the same loud fail-stop used for exhausted insert retries. Callers stop the actor on false; the batch replays from the last committed offset.

Co-Authored-By: Claude Opus 4.8 <[email protected]>
Claude-Session: https://claude.ai/code/session_011Bz3aya6Udg2sd9Sc9uuHf
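A check that survives release builds and reports through the outcome channel instead of panicking might look like the sketch below. It is an approximation only: the real `emit_ready` signature is not shown in this conversation, so the parameters, the `FlushOutcome` payloads, and the use of `std::sync::mpsc` are assumptions.

```rust
use std::sync::mpsc::Sender;

/// Simplified outcome type; the payload shapes are assumptions.
#[derive(Debug, PartialEq)]
pub enum FlushOutcome {
    Ready(Vec<u64>),
    Err(String),
}

/// Emits the retained window's offsets only if the flushed row count
/// matches the retained window. Returns false on mismatch so the caller
/// can stop the actor; nothing is emitted as Ready in that case.
pub fn emit_ready(
    flushed_rows: u64,
    retained_offsets: &mut Vec<u64>,
    out_tx: &Sender<FlushOutcome>,
) -> bool {
    // Plain runtime comparison: unlike debug_assert_eq!, this is not
    // compiled out of release builds.
    if flushed_rows != retained_offsets.len() as u64 {
        let msg = format!(
            "flushed {} rows but retained window has {}",
            flushed_rows,
            retained_offsets.len()
        );
        // Best-effort send: the receiver may already be gone at shutdown.
        let _ = out_tx.send(FlushOutcome::Err(msg));
        return false;
    }

    let offsets = std::mem::take(retained_offsets);
    let _ = out_tx.send(FlushOutcome::Ready(offsets));
    true
}
```

Returning a bool keeps the failure on the same path as an exhausted retry: the error travels through the channel and the consumer fail-stops, instead of the actor task dying silently.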

Overview
This PR replaces the custom RowBinary serialization pipeline for `eap_items` with a new `EapItemsInserterSink` that uses the official `clickhouse` crate's `Inserter` API. This eliminates the need for custom serialization code and improves memory efficiency by streaming rows directly to ClickHouse.

Key Changes

New Inserter Sink (`inserter_sink.rs`)
- … `clickhouse::Inserter<EAPItemRow>` instance
- … `max_rows`, `max_bytes`, `max_period`)

Processor Changes (`eap_items.rs`)
- … `process_message_row_binary` to `process_message_eap_row` for clarity
- … `TypedInsertBatch<EAPItemRow>` instead of pre-encoded bytes

Type System (`types.rs`)
- … `TypedInsertBatch<R>` to represent rows that will be serialized downstream
- … `row == None`) for messages that don't produce inserts

Pipeline Wiring (`factory_v2.rs`, `processor.rs`)
- … `make_rust_processor_typed` to build pipelines for typed-row processors
- … `use_row_binary` is enabled, eap_items now uses the inserter sink path instead of the `Reduce` + `ClickhouseWriterStep` pair
- … `EAPItemsProcessor`

Removed Code
- … `rowbinary/ser.rs` serializer module (replaced by `clickhouse` crate)
- … `InsertFormat` enum and related writer step configuration
- … `clickhouse::serde::uuid` instead)

Dependencies
- … `clickhouse` crate (v0.15) with default features disabled

Benefits
- … `clickhouse` crate's battle-tested inserter instead of custom serialization
- … `clickhouse` crate validates rows against the table schema at serialization time

Testing
Existing unit tests for the RowBinary serializer are removed as that functionality is now provided by the upstream crate. The inserter sink's behavior is covered by integration with the processor pipeline and the existing Arroyo strategy test infrastructure.
https://claude.ai/code/session_011Bz3aya6Udg2sd9Sc9uuHf