docs: Dekaf consumer behaviors + Spark Structured Streaming guidance by jwhartley · Pull Request #3094 · estuary/flow

jwhartley · 2026-06-30T02:09:26Z

What

Extends using-dekaf.md with consumer guidance that has come up repeatedly in support, in two layers:

Consumer behaviors to know (any client): offsets are journal byte positions; the advertised latest offset can transiently move backward during a broker hand-off and is not data loss (with how to confirm via flowctl collections read); Avro logicalType decoding (e.g. uuid -> UUID object); parallelism via journal splits.
Reading from Apache Spark Structured Streaming: avoid maxOffsetsPerTrigger (byte-budget cap drops partial records), handle failOnDataLoss (it aborts on the transient backward-offset case), and set spark.sql.avro.datetimeRebaseModeInRead explicitly (PERMISSIVE silently nulls pre-Gregorian dates, SPARK-31404). Plus an example reader config.
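To make the Spark settings above concrete, here is a minimal sketch in Python. It is not the example reader config from the changed page. The helper name `dekaf_spark_options`, the placeholder broker address and topic, the `startingOffsets` value, and the choice of `failOnDataLoss=false` and `CORRECTED` are all assumptions about one reasonable way to apply the guidance. Authentication settings are passed through untouched rather than guessed. The helper only builds option maps, so it can be inspected without a Spark installation.

```python
# Hypothetical helper: builds options for spark.readStream.format("kafka").
# All names and default values here are illustrative assumptions.

def dekaf_spark_options(bootstrap_servers, topic, extra_kafka_options=None):
    """Return Kafka source options that follow the guidance in this PR."""
    extra = dict(extra_kafka_options or {})

    # The PR advises against maxOffsetsPerTrigger: offsets are byte positions,
    # so a per-trigger offset cap acts as a byte budget, not a record count.
    if "maxOffsetsPerTrigger" in extra:
        raise ValueError("maxOffsetsPerTrigger should not be set for Dekaf topics")

    options = {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": "earliest",  # assumption: read from the beginning
        # Assumption: tolerate the transient backward latest-offset case
        # instead of aborting the query.
        "failOnDataLoss": "false",
    }
    options.update(extra)  # e.g. security and SASL settings supplied by the caller
    return options


# Session-level setting, chosen explicitly rather than left at its default.
# CORRECTED is an assumption; pick the mode that matches how the data was written.
SPARK_SESSION_CONF = {
    "spark.sql.avro.datetimeRebaseModeInRead": "CORRECTED",
}
```

With a live session, the maps would be applied roughly as `spark.conf.set(key, value)` for each entry in `SPARK_SESSION_CONF`, followed by `spark.readStream.format("kafka").options(**opts).load()`.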

Why

These are recurring, non-obvious Dekaf consumer issues. The byte-offset model, the transient latest-offset regression, and the Avro decoding traps each surfaced as "missing data" reports that turned out to be consumer-side or transient. The transient-latest behavior is tracked in #3092.
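The byte-offset model and the transient latest-offset regression can be illustrated with a small client-side sketch. This is a hypothetical monitoring helper, not part of Dekaf or any Kafka client. It assumes the consumer can observe the advertised latest offset per partition, and it treats a backward move as a transient condition to flag rather than as data loss. Because offsets are byte positions, the lag it reports is in bytes, not records.

```python
class LatestOffsetTracker:
    """Remembers the highest advertised latest offset seen per partition.

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        self._max_seen = {}

    def observe(self, partition, advertised_latest):
        """Return (effective_latest, regressed).

        regressed is True when the advertised value is below one seen earlier,
        which this sketch treats as a transient hand-off artifact.
        """
        previous = self._max_seen.get(partition)
        if previous is None or advertised_latest >= previous:
            self._max_seen[partition] = advertised_latest
            return advertised_latest, False
        # Keep the earlier, higher value instead of concluding data was lost.
        return previous, True


def bytes_behind(committed_offset, effective_latest):
    """Lag in bytes; offsets are journal byte positions, so this is not a record count."""
    return max(0, effective_latest - committed_offset)
```

A flagged regression is only a hint. The PR's suggested confirmation is to read the collection directly with flowctl collections read.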

Part 1 lives with the general consumer guidance so non-Spark consumers (Flink, librdkafka, kcat) benefit too; Part 2 is the Spark-specific config that builds on it.

Notes

Single file changed, no code. Generic content, no customer specifics.

Add a 'Consumer behaviors to know' section to using-dekaf.md (offsets are journal byte positions; the advertised latest offset can transiently move backward on a broker hand-off and is not data loss; Avro logicalType decoding; parallelism via journal splits) and a 'Reading from Apache Spark Structured Streaming' section (avoid maxOffsetsPerTrigger, handle failOnDataLoss, set the Avro datetime rebase mode explicitly).

github-actions · 2026-06-30T02:14:24Z

🚀 Preview deployed to https://docs.estuary.dev/pr-preview/pr-3094/

📄 Changed pages:

/guides/dekaf_reading_collections_from_kafka/

jwhartley mentioned this pull request Jun 30, 2026

dekaf: advertised high-water-mark regresses below served reads during journal primary hand-off (aborts failOnDataLoss consumers) #3092

Open

jwhartley requested review from aeluce and jshearer June 30, 2026 02:50

jwhartley wants to merge 1 commit into master from docs/dekaf-consumer-spark
