Ignore stale executor success when TI is queued after defer #68741

Open
goingforstudying-ctrl wants to merge 1 commit into apache:main from goingforstudying-ctrl:fix/stale-success-queued-after-defer

@goingforstudying-ctrl

Fixes a race where the trigger resumes a deferred task into queued (rather than scheduled) before the scheduler processes the executor SUCCESS event emitted when the worker exits on defer. The scheduler then treated the combination of a queued TI and an executor success as a state mismatch and failed the TI (#67287).

The fix for #66374 (#66431) added handling for the scheduled variant. Under load the trigger may reschedule into either queued or scheduled depending on scheduler loop timing, so both states must be treated as stale defer-exit events when next_method is set.

Changes:

  • Extend ti_requeued condition in process_executor_events to include TaskInstanceState.QUEUED
  • Add test_process_executor_events_stale_success_when_queued_after_defer (positive + negative without next_method)
  • Add newsfragment 67287.bugfix.rst
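
The extended condition can be sketched roughly as below. This is a simplified, self-contained illustration and not the actual scheduler code: the `State` enum, `REQUEUED_STATES`, and `is_stale_defer_exit` are hypothetical stand-ins for `TaskInstanceState` and the `ti_requeued` check, and treating "next_method is set" as `next_method is not None` is an assumption.

```python
from enum import Enum
from typing import Optional


class State(str, Enum):
    """Hypothetical stand-in for the relevant TaskInstanceState values."""

    SCHEDULED = "scheduled"
    QUEUED = "queued"
    RUNNING = "running"
    SUCCESS = "success"


# States a resumed task may already be in by the time the scheduler sees the
# executor event from the worker's defer exit. QUEUED is the state this PR adds.
REQUEUED_STATES = {State.SCHEDULED, State.QUEUED}


def is_stale_defer_exit(
    ti_state: State, executor_state: State, next_method: Optional[str]
) -> bool:
    """Return True when an executor SUCCESS should be ignored as stale.

    The event is considered stale only if the task instance has been requeued
    (scheduled or queued) and next_method is set, i.e. it is resuming from a
    deferral. Without next_method the mismatch is still treated as an error.
    """
    return (
        executor_state == State.SUCCESS
        and ti_state in REQUEUED_STATES
        and next_method is not None
    )
```

With this shape, a queued task instance that has `next_method` set skips the mismatch handling, while a queued task instance without `next_method` still goes through it, which mirrors the positive and negative cases of the new test.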

Testing:

pytest airflow-core/tests/unit/jobs/test_scheduler_job.py::TestSchedulerJob::test_process_executor_events_stale_success_when_queued_after_defer -q

@boring-cyborg boring-cyborg Bot added the area:Scheduler including HA (high availability) scheduler label Jun 19, 2026
@goingforstudying-ctrl goingforstudying-ctrl force-pushed the fix/stale-success-queued-after-defer branch from eed2ad5 to 2e53123 Compare June 19, 2026 10:17

@shahar1 shahar1 left a comment

Contributor

The PR title shouldn't use conventional commit style; please fix it as well.

Comment thread airflow-core/newsfragments/67287.bugfix.rst Outdated
@goingforstudying-ctrl goingforstudying-ctrl force-pushed the fix/stale-success-queued-after-defer branch 4 times, most recently from 75126b1 to b290d48 Compare June 19, 2026 18:22
@goingforstudying-ctrl

Author

Thanks for the review. I updated the commit message to remove the conventional commit style from the PR title as requested. Let me know if anything else needs adjustment.

@goingforstudying-ctrl goingforstudying-ctrl force-pushed the fix/stale-success-queued-after-defer branch 3 times, most recently from 26f4b79 to 021cf9c Compare June 19, 2026 22:08
@goingforstudying-ctrl goingforstudying-ctrl changed the title fix(scheduler): ignore stale executor success when TI is queued after defer Ignore stale executor success when TI is queued after defer Jun 19, 2026
@goingforstudying-ctrl goingforstudying-ctrl force-pushed the fix/stale-success-queued-after-defer branch from 021cf9c to 2f2f410 Compare June 20, 2026 00:16
@goingforstudying-ctrl

Author

Fixed the PR title to remove conventional commit style. Let me know if anything else needs adjustment.

@goingforstudying-ctrl goingforstudying-ctrl force-pushed the fix/stale-success-queued-after-defer branch from 2f2f410 to 8b08e8a Compare June 20, 2026 05:20
@shahar1 shahar1 dismissed their stale review June 20, 2026 06:44

Blocking issues were addressed

@goingforstudying-ctrl goingforstudying-ctrl force-pushed the fix/stale-success-queued-after-defer branch 2 times, most recently from ac01a8f to 3cec1ac Compare June 21, 2026 14:29
When a deferred task is resumed into the QUEUED state before the scheduler
processes the stale SUCCESS event from the executor, the scheduler should
ignore that event rather than treat it as a state mismatch and fail the
task instance.

- Add guard in _process_executor_events to check current TI state
- Add test for stale success after defer -> queued transition

Fixes apache#67287
@goingforstudying-ctrl goingforstudying-ctrl force-pushed the fix/stale-success-queued-after-defer branch from 3cec1ac to 33dff37 Compare June 21, 2026 15:35
Labels

area:Scheduler including HA (high availability) scheduler

Development

Successfully merging this pull request may close these issues.

Race condition between scheduler processing events and trigger completion — queued-state

2 participants