feat(consumer): set client.rack placement from ZONE env var#546
feat(consumer): set client.rack placement from ZONE env var#546phacops wants to merge 2 commits into
Conversation
Propagate the ZONE environment variable to librdkafka's client.rack config when building the consumer configuration, so consumers advertise their availability zone to the broker. This is a prerequisite for enabling a rack-aware fetch strategy (fetch-from-follower) later on. An explicit client.rack in the provided config still takes precedence. Co-Authored-By: Claude Opus 4.8 <[email protected]> Claude-Session: https://claude.ai/code/session_01YS4onNgraFjT9gffNP6Jzo
…untime Mirror the Python behavior in the rust-arroyo consumer config: propagate the ZONE environment variable to librdkafka's client.rack when building the consumer configuration. An explicit client.rack override still takes precedence. Co-Authored-By: Claude Opus 4.8 <[email protected]> Claude-Session: https://claude.ai/code/session_01YS4onNgraFjT9gffNP6Jzo
|
arroyo already allows you to pass in arbitrary rdkafka consumer options, so you could patch this into snuba directly. is the idea that we can override this setting for arbitrary arroyo consumers in our infra in emergency situations? if so, i wonder if we should instead support arbitrary overrides like this: while easy to do and very powerful, i think this overlaps a bit with other topicctl stuff we want to do. would have to sync with @enochtangg but i think we already may have a plan for setting arbitrary consumer options? |
|
I think that’s something you’d want to do at the platform level, for every consumer, that’s why I made a PR here. As a user of arroyo/streaming platform, I don’t want to know about placement and where my consumer consumes from. Happy to contain it to Snuba too for now and wait for whatever plans you have for this. |
|
Fair enough, we don't have to merge this as a platform feature. Though, I will say, you still have to add it before we make use of it, regardless if the workload is balanced or not. It's not because we set it that it'll be in use right away.
A bit confused by this. Are you suggesting the transactions consumer has a higher utilization of the system overall compared to |
It is a smaller cluster. It has higher utilization per node (we are about to scale it up) and fewer different workloads. |
|
Ah, I understand. |
|
By the way, we still need to run the broker with this selector so it's not like this would be enabled by default. It's just laying the foundation for this. |
It should be possible to do this test safely on that cluster as well. We just need to ensure the oncall is aware and rollback is ready. |
|
There you go for properly spreading workloads across zones: getsentry/ops#21545 |

Summary
Allows the consumer placement (
client.rack) to be set in the Kafka consumer configuration based on aZONEenvironment variable, so a rack-aware fetch strategy (fetch-from-follower) can be enabled later on. Implemented in both the Python and Rust runtimes.When the consumer configuration is built and the
ZONEenvironment variable is set, its value is propagated to librdkafka'sclient.rackconfig. This lets the consumer advertise its availability zone to the broker.An explicit
client.rackprovided in the default config or via override params always takes precedence over the env var.This is opt-in and off by default (no
client.rackunlessZONEis set), and rack-aware fetch only takes effect once the broker is configured withreplica.selector.class=org.apache.kafka.common.replica.RackAwareReplicaSelector. Per the discussion below, this is intended as a foundational / emergency-only capability and should not be enabled fleet-wide until workloads are spread evenly across zones.Related
Changes
arroyo/backends/kafka/configuration.py: read theZONEenv var (ZONE_ENV_VAR) inbuild_kafka_consumer_configurationand setclient.rackwhen present and not already configured.rust-arroyo/src/backends/kafka/config.rs: read theZONEenv var inKafkaConfig::new_consumer_configand setclient.rackbefore applying override params (so overrides win).CHANGELOG.md: note the new feature.Test plan
pytest tests/backends/test_kafka.py -k "client_rack or zone"— all 3 tests pass.cargo test --lib backends::kafka::config— passes;cargo fmt --checkandcargo checkclean.🤖 Generated with Claude Code
https://claude.ai/code/session_01YS4onNgraFjT9gffNP6Jzo