
Transactions remain stuck on broker change #1145

@nicolaferraro


Describe the bug

We've seen the client get stuck on a pending transaction after a broker was removed from the cluster.

The client kept sending an AddPartitionsToTxnRequest to the wrong broker, and every attempt failed because that broker was no longer responding.

I think the root cause is that _coordinator_dead is only called when the broker replies with a NOT_COORDINATOR error. When the broker is simply no longer available, no such reply ever arrives, so the cached coordinator never expires.
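The following is a minimal, simplified model of the failure mode described above, not the client's actual code. CoordinatorCache, send_add_partitions, live_brokers and the "CONNECTION_ERROR" label are hypothetical names chosen for illustration. It shows that if only a NOT_COORDINATOR reply invalidates the cached entry, a broker that has disappeared (and therefore never replies) keeps being selected forever.

```python
from typing import Dict, Optional, Set

NOT_COORDINATOR = "NOT_COORDINATOR"


class CoordinatorCache:
    """Hypothetical model of a sender cache: coordinator key -> broker node id."""

    def __init__(self) -> None:
        self._coordinators: Dict[str, int] = {}

    def set(self, key: str, node_id: int) -> None:
        self._coordinators[key] = node_id

    def get(self, key: str) -> Optional[int]:
        return self._coordinators.get(key)

    def handle_error(self, key: str, error: str) -> None:
        # Only an explicit NOT_COORDINATOR reply invalidates the entry,
        # mirroring the behaviour described in this issue.
        if error == NOT_COORDINATOR:
            self._coordinators.pop(key, None)


def send_add_partitions(cache: CoordinatorCache, key: str, live_brokers: Set[int]) -> str:
    """Simulate sending an AddPartitionsToTxnRequest to the cached coordinator."""
    node_id = cache.get(key)
    if node_id is None:
        return "lookup coordinator"
    if node_id not in live_brokers:
        # The broker is gone: there is no NOT_COORDINATOR reply, only a
        # connection failure, so the stale entry is never removed.
        cache.handle_error(key, "CONNECTION_ERROR")
        return f"failed: node {node_id} unreachable"
    return f"sent to node {node_id}"


cache = CoordinatorCache()
cache.set("txn", 3)
for _ in range(3):
    # Broker 3 was decommissioned; only brokers 1 and 2 remain.
    print(send_add_partitions(cache, "txn", {1, 2}))
```

Each iteration prints "failed: node 3 unreachable", and the loop would continue indefinitely in a real retry cycle because nothing ever clears the entry for node 3.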

The problem also seems to affect other requests that rely on coordinators.

Expected behaviour
The client should expire the sender's cached coordinators whenever a MetadataResponse shows that a coordinator broker is no longer part of the cluster.
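One possible shape for this fix is sketched below, assuming the sender keeps a mapping from coordinator key to broker node id. The class name, the on_metadata method, and passing a plain iterable of broker ids in place of a real MetadataResponse are all hypothetical simplifications. Any cached coordinator whose node id is missing from the broker list is dropped, and the expired keys are returned so the caller can issue a fresh FindCoordinator lookup for them.

```python
from typing import Dict, Iterable, List, Optional


class CoordinatorCache:
    """Hypothetical sender-side cache: coordinator key -> broker node id."""

    def __init__(self) -> None:
        self._coordinators: Dict[str, int] = {}

    def set(self, key: str, node_id: int) -> None:
        self._coordinators[key] = node_id

    def get(self, key: str) -> Optional[int]:
        return self._coordinators.get(key)

    def on_metadata(self, broker_ids: Iterable[int]) -> List[str]:
        """Expire every cached coordinator whose broker is absent from metadata.

        `broker_ids` stands in for the node ids listed in a MetadataResponse.
        Returns the expired keys so the caller can look them up again.
        """
        live = set(broker_ids)
        stale = [key for key, node in self._coordinators.items() if node not in live]
        for key in stale:
            del self._coordinators[key]
        return stale


cache = CoordinatorCache()
cache.set("txn", 3)    # transaction coordinator on broker 3
cache.set("group", 1)  # group coordinator on broker 1
expired = cache.on_metadata([1, 2])  # broker 3 no longer listed
print(expired, cache.get("txn"), cache.get("group"))
```

Hooking invalidation to metadata refreshes means recovery no longer depends on a reply from a broker that may never answer again.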

As a fallback, a cached coordinator could also expire after a timeout when requests to it keep failing.
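A minimal sketch of the time-based fallback follows, under the assumption that "persistent errors" means an unbroken streak of failed requests lasting at least some threshold. ExpiringCoordinatorCache, max_failure_s, record_failure and record_success are hypothetical names, and the 30-second threshold in the usage is an arbitrary example value. The clock is injectable so the behaviour can be demonstrated without real waiting.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class ExpiringCoordinatorCache:
    """Hypothetical cache that forgets a coordinator after failing for too long.

    An entry is dropped once requests to its broker have failed continuously
    for at least `max_failure_s` seconds; any success resets the streak.
    """

    def __init__(self, max_failure_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_failure_s = max_failure_s
        self._clock = clock
        # key -> (node_id, start time of the current failure streak, or None)
        self._entries: Dict[str, Tuple[int, Optional[float]]] = {}

    def set(self, key: str, node_id: int) -> None:
        self._entries[key] = (node_id, None)

    def get(self, key: str) -> Optional[int]:
        entry = self._entries.get(key)
        return entry[0] if entry is not None else None

    def record_success(self, key: str) -> None:
        if key in self._entries:
            node_id, _ = self._entries[key]
            self._entries[key] = (node_id, None)

    def record_failure(self, key: str) -> bool:
        """Record a failed request; return True if the entry was expired."""
        if key not in self._entries:
            return False
        node_id, first_failure = self._entries[key]
        now = self._clock()
        if first_failure is None:
            # First failure of a new streak: start the timer.
            self._entries[key] = (node_id, now)
            return False
        if now - first_failure >= self._max_failure_s:
            del self._entries[key]
            return True
        return False


now = [0.0]  # fake clock for the demonstration
cache = ExpiringCoordinatorCache(max_failure_s=30.0, clock=lambda: now[0])
cache.set("txn", 3)
cache.record_failure("txn")           # streak starts at t=0
now[0] = 31.0
print(cache.record_failure("txn"))    # streak has lasted 31s, entry expires
print(cache.get("txn"))
```

Measuring the length of the failure streak, rather than counting failures, keeps the expiry independent of the retry rate.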

Environment

I can't give precise details: we observed this issue from the server side and don't control the client.

Reproducible example

This is not easy to reproduce. One would need to open some transactions and keep them open while a broker is decommissioned.
