Transactions remain stuck on broker change

**Describe the bug**

We've seen the client get stuck on a pending transaction after a broker was removed from the cluster.

The client kept sending an `AddPartitionsToTxnRequest` to the removed broker, and every attempt failed because that broker was no longer responding.

I think the root cause is that [_coordinator_dead](https://github.com/aio-libs/aiokafka/blob/3b7ccd0fff5c92a9cf12c12e361370082ad12b0c/aiokafka/producer/sender.py#L214) is only called upon receiving a NOT_COORDINATOR error from the broker, so the cached coordinator never expires if the broker is simply no longer available and cannot send any error back.

The problem seems to also affect other requests that make use of coordinators.

**Expected behaviour**
The client should expire the sender's cached coordinators whenever a `MetadataResponse` shows that a coordinator broker is no longer present in the cluster.
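
A minimal sketch of this idea, not aiokafka's actual code: `CoordinatorCache` and `on_metadata_update` are hypothetical names, and the list of live broker ids is assumed to be taken from the latest `MetadataResponse`.

```python
from typing import Dict, Iterable, Optional, Set


class CoordinatorCache:
    """Hypothetical cache mapping a coordination key to a broker node id."""

    def __init__(self) -> None:
        self._coordinators: Dict[str, int] = {}

    def set(self, key: str, node_id: int) -> None:
        self._coordinators[key] = node_id

    def get(self, key: str) -> Optional[int]:
        return self._coordinators.get(key)

    def on_metadata_update(self, live_broker_ids: Iterable[int]) -> Set[str]:
        """Drop every cached coordinator whose broker is not in the metadata.

        Returns the keys that were evicted, so the caller can look up
        their coordinators again.
        """
        live = set(live_broker_ids)
        stale = {k for k, n in self._coordinators.items() if n not in live}
        for key in stale:
            del self._coordinators[key]
        return stale


# Usage: broker 3 was decommissioned and is missing from the new metadata.
cache = CoordinatorCache()
cache.set("my-transactional-id", 3)
cache.set("my-group", 1)
evicted = cache.on_metadata_update([1, 2])
```

After the update, only the entry that pointed at the removed broker is evicted, so the next transactional request would trigger a fresh coordinator lookup instead of retrying against a dead node.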

The cached coordinator could also expire after a timeout when errors with the coordinator persist.
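
A time-based expiry could look roughly like the sketch below. `ExpiringCoordinator`, `max_failure_s`, and the injectable `clock` are all hypothetical names introduced for illustration; nothing here reflects the current aiokafka implementation.

```python
import time
from typing import Callable, Optional


class ExpiringCoordinator:
    """Hypothetical holder for one cached coordinator node id."""

    def __init__(
        self,
        max_failure_s: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max_failure_s = max_failure_s
        self._clock = clock
        self._node_id: Optional[int] = None
        self._first_failure_at: Optional[float] = None

    @property
    def node_id(self) -> Optional[int]:
        return self._node_id

    def set(self, node_id: int) -> None:
        self._node_id = node_id
        self._first_failure_at = None

    def record_success(self) -> None:
        # Any successful response resets the failure window.
        self._first_failure_at = None

    def record_failure(self) -> None:
        # Forget the coordinator once failures have persisted long enough.
        now = self._clock()
        if self._first_failure_at is None:
            self._first_failure_at = now
        elif now - self._first_failure_at >= self._max_failure_s:
            self._node_id = None
            self._first_failure_at = None


# Usage with a fake clock so the behaviour is deterministic.
now = [0.0]
coord = ExpiringCoordinator(max_failure_s=30.0, clock=lambda: now[0])
coord.set(3)
coord.record_failure()        # first failure at t=0
now[0] = 10.0
coord.record_failure()        # still inside the 30 s window
still_cached = coord.node_id
now[0] = 31.0
coord.record_failure()        # failures have persisted past the window
after_expiry = coord.node_id
```

Tracking the time of the first failure, rather than counting failures, keeps the expiry independent of the retry rate; a single success resets the window.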

**Environment (please complete the following information):**

We can't provide precise details, since we observed this issue on the server side and do not control the client.

**Reproducible example**

Not easy to reproduce. One would need to open some transactions and keep them open while a broker is decommissioned.

Transactions remain stuck on broker change #1145

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

Transactions remain stuck on broker change #1145

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions