feat(consumer): set client.rack placement from ZONE env var by phacops · Pull Request #546 · getsentry/arroyo

phacops · 2026-06-28T18:49:59Z

Summary

Allows the consumer placement (client.rack) to be set in the Kafka consumer configuration based on a ZONE environment variable, so a rack-aware fetch strategy (fetch-from-follower) can be enabled later on. Implemented in both the Python and Rust runtimes.

When the consumer configuration is built and the ZONE environment variable is set, its value is propagated to librdkafka's client.rack config. This lets the consumer advertise its availability zone to the broker.

An explicit client.rack provided in the default config or via override params always takes precedence over the env var.

This is opt-in and off by default (no client.rack unless ZONE is set), and rack-aware fetch only takes effect once the broker is configured with replica.selector.class=org.apache.kafka.common.replica.RackAwareReplicaSelector. Per the discussion below, this is intended as a foundational / emergency-only capability and should not be enabled fleet-wide until workloads are spread evenly across zones.

Changes

arroyo/backends/kafka/configuration.py: read the ZONE env var (ZONE_ENV_VAR) in build_kafka_consumer_configuration and set client.rack when present and not already configured.
rust-arroyo/src/backends/kafka/config.rs: read the ZONE env var in KafkaConfig::new_consumer_config and set client.rack before applying override params (so overrides win).
Unit tests in both runtimes covering the env-var case, the absent case, and explicit-override precedence.
CHANGELOG.md: note the new feature.
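
The Python-side behaviour listed above could look roughly like the following. This is a minimal sketch under stated assumptions, not the code in this PR: `apply_zone_placement` and its `environ` parameter are hypothetical names introduced for illustration, and treating an empty `ZONE` as unset is an assumption. Only the `ZONE` variable, the `ZONE_ENV_VAR` name, and the `client.rack` key come from the description.

```python
import os
from typing import Any, Dict, Mapping, Optional

ZONE_ENV_VAR = "ZONE"  # variable name taken from the PR description


def apply_zone_placement(
    config: Mapping[str, Any],
    override_params: Optional[Mapping[str, Any]] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: copy `config`, filling client.rack from ZONE."""
    env = os.environ if environ is None else environ
    result = dict(config)

    zone = env.get(ZONE_ENV_VAR)
    # Assumption: an empty ZONE is treated the same as an unset one.
    # Only fill in client.rack when the default config does not set it.
    if zone and "client.rack" not in result:
        result["client.rack"] = zone

    # Overrides are applied last, so an explicit client.rack always wins.
    if override_params:
        result.update(override_params)
    return result
```

Passing the environment in as a mapping keeps the helper easy to test without mutating `os.environ`; the real functions may well read the environment directly.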

Test plan

Python: pytest tests/backends/test_kafka.py -k "client_rack or zone" — all 3 tests pass.
Rust: cargo test --lib backends::kafka::config — passes; cargo fmt --check and cargo check clean.

🤖 Generated with Claude Code

https://claude.ai/code/session_01YS4onNgraFjT9gffNP6Jzo

Propagate the ZONE environment variable to librdkafka's client.rack config when building the consumer configuration, so consumers advertise their availability zone to the broker. This is a prerequisite for enabling a rack-aware fetch strategy (fetch-from-follower) later on. An explicit client.rack in the provided config still takes precedence.

Co-Authored-By: Claude Opus 4.8 <[email protected]>
Claude-Session: https://claude.ai/code/session_01YS4onNgraFjT9gffNP6Jzo

…untime

Mirror the Python behavior in the rust-arroyo consumer config: propagate the ZONE environment variable to librdkafka's client.rack when building the consumer configuration. An explicit client.rack override still takes precedence.

Co-Authored-By: Claude Opus 4.8 <[email protected]>
Claude-Session: https://claude.ai/code/session_01YS4onNgraFjT9gffNP6Jzo

untitaker · 2026-06-28T19:07:46Z

arroyo already allows you to pass in arbitrary rdkafka consumer options, so you could patch this into snuba directly.

is the idea that we can override this setting for arbitrary arroyo consumers in our infra in emergency situations?

if so, i wonder if we should instead support arbitrary overrides like this:

export ARROYO_RDKAFKA_CONFIG='{"client.rack": ...}'

while easy to do and very powerful, i think this overlaps a bit with other topicctl stuff we want to do. would have to sync with @enochtangg but i think we already may have a plan for setting arbitrary consumer options?
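
For illustration, the generic-override idea in this comment could be sketched as below. This is only a sketch of the suggestion, not an existing arroyo feature: `ARROYO_RDKAFKA_CONFIG` is the name proposed in the comment, while `apply_env_overrides` and its `environ` parameter are hypothetical, as is the choice to let the environment win over the supplied config.

```python
import json
import os
from typing import Any, Dict, Mapping, Optional

# Name proposed in the comment above; not an existing arroyo setting.
OVERRIDES_ENV_VAR = "ARROYO_RDKAFKA_CONFIG"


def apply_env_overrides(
    config: Mapping[str, Any],
    environ: Optional[Mapping[str, str]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: merge a JSON object from the environment into config."""
    env = os.environ if environ is None else environ
    result = dict(config)

    raw = env.get(OVERRIDES_ENV_VAR)
    if not raw:
        return result

    overrides = json.loads(raw)  # raises ValueError on malformed JSON
    if not isinstance(overrides, dict):
        raise ValueError(f"{OVERRIDES_ENV_VAR} must contain a JSON object")

    # Assumption: values from the environment take precedence.
    result.update(overrides)
    return result
```

A generic hook like this would cover `client.rack` as one case among many, at the cost of letting the environment change any librdkafka option.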

phacops · 2026-06-28T19:12:42Z

I think that’s something you’d want to do at the platform level, for every consumer, which is why I opened the PR here. As a user of arroyo/streaming platform, I don’t want to have to know about placement or where my consumer consumes from.

Happy to contain it to Snuba too for now and wait for whatever plans you have for this.

fpacifici · 2026-06-28T20:33:57Z

As we discussed in Slack last week, please avoid making the consumers zone aware unless we can spread consumers out across zones evenly.

Right now there is a significant imbalance between the nodes in us-central1-a and the other two nodes, which will put more load on the Kafka brokers in that zone. Our Kafka infrastructure is sized on the assumption that load is spread more or less evenly.

If you need to do an experiment to troubleshoot the incident, this can be alright on the spans cluster with the streaming oncall aware of the experiment. There are so many distinct workloads on the cluster that an imbalance on one consumer will probably be acceptable. On transactions things are different, the utilization of the system is considerably higher.

However, we cannot make this a platform feature in the general case with today's infra, which is basically guaranteed to be unevenly distributed between zones. I would consider this an emergency-only feature, to be used only for experiments and incidents.
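
The sizing concern can be made concrete with a back-of-the-envelope calculation. The sketch below rests on simplifying assumptions that are not from the thread: every consumer fetches the same volume, every partition has a replica in every zone, and with rack-aware fetch each consumer reads only from its own zone. The consumer counts and the zone names other than us-central1-a are made up for illustration.

```python
from typing import Dict


def fetch_share_by_zone(consumers_per_zone: Dict[str, int]) -> Dict[str, float]:
    """Fraction of total fetch traffic served by brokers in each zone.

    Assumes equal per-consumer volume and strictly zone-local fetching.
    """
    total = sum(consumers_per_zone.values())
    if total == 0:
        raise ValueError("no consumers")
    return {zone: count / total for zone, count in consumers_per_zone.items()}


# Made-up placement: 6 of 10 consumers sit in one zone.
shares = fetch_share_by_zone(
    {"us-central1-a": 6, "us-central1-b": 2, "us-central1-c": 2}
)
even_share = 1 / len(shares)
for zone, share in shares.items():
    print(f"{zone}: {share:.0%} of fetch traffic (even split: {even_share:.0%})")
```

Under these assumptions, the brokers in the crowded zone would serve nearly twice the traffic they are sized for, which is the kind of imbalance described above.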

phacops · 2026-06-28T20:41:37Z

Fair enough, we don't have to merge this as a platform feature. That said, it still has to be added before we can make use of it, regardless of whether the workload is balanced. Setting it does not mean it will be in use right away.

> On transactions things are different, the utilization of the system is considerably higher.

A bit confused by this. Are you suggesting the transactions consumer has a higher utilization of the system overall compared to eap-items?

fpacifici · 2026-06-28T20:45:40Z

> A bit confused by this. Are you suggesting the transactions consumer has a higher utilization of the system overall compared to eap-items?

It is a smaller cluster. It has higher utilization per node (we are about to scale it up) and fewer different workloads.
This means an imbalanced workload will have a larger impact on transactions than on spans.
So caution is needed before running an experiment there.

phacops · 2026-06-28T20:46:12Z

Ah, I understand.

phacops · 2026-06-28T20:47:25Z

By the way, we still need to run the broker with this selector so it's not like this would be enabled by default. It's just laying the foundation for this.

replica.selector.class=org.apache.kafka.common.replica.RackAwareReplicaSelector

fpacifici · 2026-06-28T21:08:19Z

> Ah, I understand.

It should be possible to do this test safely on that cluster as well. We just need to ensure the oncall is aware and rollback is ready.

phacops · 2026-06-29T03:35:56Z

There you go for properly spreading workloads across zones: getsentry/ops#21545

phacops requested review from a team as code owners June 28, 2026 18:50

phacops wants to merge 2 commits into main from claude/consumer-placement-zone-config-rcm3b6


4 participants
