Feature/add max new dagruns to schedule#64294
Pull request overview
This PR introduces a scheduler tuning knob to limit how many new (never-before-examined) running DagRuns are considered per scheduling loop, to reduce starvation/slowdown when large batches of DagRuns are created at once.
Changes:
- Add `scheduler.max_new_dagruns_per_loop_to_schedule` config (default `0`) and plumb it into DagRun selection.
- Update `DagRun.get_running_dag_runs_to_examine()` to optionally split selection into “previously examined” vs “new” DagRuns.
- Add/adjust unit tests to cover the new selection behavior.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| airflow-core/src/airflow/models/dagrun.py | Adds config-backed limit and changes running DagRun selection logic to optionally fetch “old” and “new” runs separately. |
| airflow-core/src/airflow/config_templates/config.yml | Documents the new scheduler configuration option. |
| airflow-core/tests/unit/models/test_dagrun.py | Adds tests for the new DagRun selection behavior and updates an existing test to handle the new return type. |
…/add-max-new-dagruns-to-schedule
Co-authored-by: Copilot <[email protected]>
@Nataneljpwd — Removing the The label's contract is that the PR is ready for maintainer review — a regression like this means the PR temporarily isn't. Rebase your branch onto the latest
No rush. Note: This comment was drafted by an AI-assisted triage tool and may contain mistakes. Once you have addressed the points above, an Apache Airflow maintainer — a real person — will take the next look at your PR. We use this two-stage triage process so that our maintainers' limited time is spent where it matters most: the conversation with you. |
…/add-max-new-dagruns-to-schedule
@Nataneljpwd: There is 1 unresolved review thread on this PR from If yes, please mark the thread as resolved and ping the reviewer ( If you are still working on the thread, please reply with what is outstanding so the thread stays unresolved on purpose.
cc @kaxil waiting for 2nd review |
This one also needs a review from @ashb . Also cc @BIS7 @ephraimbuddy -- who might be interested in reviewing it |
…/add-max-new-dagruns-to-schedule
ashb
left a comment
I'm not convinced that this is the right fix. Tuning and configuring the scheduler is already nigh on impossible, and I am wary of adding more knobs.
Additionally, couldn't the already existing max active runs controls be used here? That would keep most of the dagruns in the Queued state, meaning the scheduler only looks at no more than 16 (the default, I think) newly created runs. That massively reduces the impact of "cause the scheduler to stall, as it has to both examine a lot of dagruns, and create new tasks for those dagruns." because it doesn't do that work for queued runs. That is why DagRuns can exist in the queued state.
Did you try this existing tunable first?
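To make the suggestion above concrete, here is a toy model of how a per-Dag cap on active runs keeps a large batch mostly queued. It is only an illustration, not scheduler code: `promote_queued_runs` and the run names are hypothetical, and the cap of 16 is the default value mentioned in this thread.

```python
from collections import deque


def promote_queued_runs(queued, running, max_active_runs=16):
    """Move runs from queued to running until the cap on active runs is reached."""
    while queued and len(running) < max_active_runs:
        running.append(queued.popleft())
    return running


# 500 runs of a single Dag created in one batch (hypothetical numbers).
queued = deque(f"run_{i}" for i in range(500))
running = []
promote_queued_runs(queued, running)

print(len(running), len(queued))  # 16 484
```

In this model only the 16 running runs would need task instances created for them; the other 484 stay queued until a slot frees up.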
We have tried tuning both the scheduler count and the max dagruns per loop to schedule, and each time we hit a different issue. I understand the concern, but we run this change locally and it fixed our problem. The main problem is that Dags are created in batches on our clusters, sometimes very large batches, and a new dagrun is heavier to process than a running one: its tasks have to be created when it starts, whereas a running dagrun only occasionally (once tasks finish) creates new tasks. At the same time, dagruns are not moved to running while large batches of new dagruns are being processed. We tried increasing the scheduler count quite a bit, in addition to increasing the max dagruns per loop to schedule. That caused scheduler heartbeat timeouts, as we had a lot of runs with mapped tasks, so we also had to raise that configuration to a very large value (10 minutes). That is in addition to dagruns timing out because they were not being examined. We even saw in the Gantt view that there were large pauses between tasks where no task existed, so dagruns could cause other dagruns to miss their SLA. Even if I create a medium backfill alongside my regular Dags, which include mapped tasks, I get one of the issues stated above when I increase the number of examined dagruns.
As stated above, we did try to tune it: we changed it to around 300 and even tripled the scheduler count, yet for both batch-triggered runs and large backfills we still experienced the issue. We even tried splitting the batch into several smaller ones (spread more evenly). The scheduler either got a lot of queued dagruns and would never finish the batch, or, when it was able to finish the batch, we saw frequent resets (no heartbeat emitted and a failed readiness probe), OOMs, or other dagruns timing out (which is why we didn't increase the number beyond 300).
As stated above, yes, we have tried. I am pretty sure we tried all related configurations, as I have gone over all of the scheduler configurations.
No, not those. I mean the |
That as well, yet when our clients changed this we were unable to stay within the Dag's SLA, and more runs were created in a day than finished. It also happens when we have a lot of Dags (over 1000) in one Airflow instance where we limit the max active runs to 40 with a cluster policy, though most clients use the default of 16.
ephraimbuddy
left a comment
This looks like a deployment-specific mitigation. Do you have a simple repro/benchmark showing max_active_runs and other existing scheduler knobs cannot solve this?
Also, as I read it, the starvation comes from the nulls_first(last_scheduling_decision) ordering: never examined runs are always pulled to the front. Have you considered fixing the ordering itself instead? I think that would address the starvation without adding a new knob. Something like the below:

    .order_by(
        nulls_first(cast("ColumnElement[Any]", BackfillDagRun.sort_ordinal), session=session),
        coalesce(cls.last_scheduling_decision, cls.run_after),
        cls.run_after,
    )

Fair aging: never-examined runs are ordered by when they became eligible (run_after), not pulled ahead of everything. A run examined long ago still outranks one examined a second ago.
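The difference between the two orderings can be checked with a small, self-contained sketch using the standard-library `sqlite3` module. This is not Airflow code: the table is a stripped-down stand-in for `dag_run`, timestamps are plain integers, the rows are made up, and the `BackfillDagRun.sort_ordinal` term is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dag_run ("
    "run_id TEXT, run_after INTEGER, last_scheduling_decision INTEGER)"
)
conn.executemany(
    "INSERT INTO dag_run VALUES (?, ?, ?)",
    [
        ("old_a", 10, 50),    # examined long ago
        ("old_b", 20, 95),    # examined very recently
        ("new_1", 90, None),  # never examined
        ("new_2", 91, None),  # never examined
    ],
)

# Nulls-first ordering: never-examined runs always come first.
nulls_first_order = [
    row[0]
    for row in conn.execute(
        "SELECT run_id FROM dag_run "
        "ORDER BY last_scheduling_decision IS NOT NULL, "
        "last_scheduling_decision, run_after"
    )
]

# Coalesce ordering: never-examined runs are ranked by run_after instead.
coalesce_order = [
    row[0]
    for row in conn.execute(
        "SELECT run_id FROM dag_run "
        "ORDER BY COALESCE(last_scheduling_decision, run_after), run_after"
    )
]

print(nulls_first_order)  # ['new_1', 'new_2', 'old_a', 'old_b']
print(coalesce_order)     # ['old_a', 'new_1', 'new_2', 'old_b']
```

With the coalesce ordering, `old_a` (examined long ago) is picked before the new runs, while `old_b` (examined a moment ago) still waits behind them.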
@Nataneljpwd A few things need addressing before review — see our Pull Request quality criteria.
No rush.
When new dagruns are created in bulk (e.g. with TriggerDagRunOperator), the scheduler might struggle with the amount created and cause other dagruns to starve.
This is due to the sort order in get_running_dag_runs_to_examine, which orders by last scheduling decision with nulls first. If a lot of new dagruns are created, the scheduler examines them first, and in situations where the Dags have a lot of tasks (hundreds to tens of thousands) it can cause the scheduler to stall, as it has to both examine a lot of dagruns, and create new tasks for those dagruns.
When we tried to tune max_dagruns_per_loop_to_schedule, we either got starvation of other dagruns or the scheduler being reset because it did not return a heartbeat for a long time and failed the readiness probe.
To fix this, a new configuration, max_new_dagruns_per_loop_to_schedule, is added. It helps when a lot of new dagruns are created in large batches at the same time, allowing the scheduler to both look at existing dagruns (not starving them and causing them to time out with no running / scheduled tasks) and create and manage the new dagruns.
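The idea can be sketched in plain Python as below. This is a simplified model and not the code in this PR: `Run`, `select_runs_to_examine`, the per-loop limit of 20, and the way the per-loop budget is divided between old and new runs are all assumptions made for illustration, as is treating `0` as "keep the current behaviour".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Run:
    run_id: str
    run_after: int
    last_scheduling_decision: Optional[int] = None  # None means never examined


def select_runs_to_examine(runs, max_dagruns=20, max_new_dagruns=0):
    """Pick the runs to examine in one scheduler loop (toy model)."""
    new = sorted(
        (r for r in runs if r.last_scheduling_decision is None),
        key=lambda r: r.run_after,
    )
    old = sorted(
        (r for r in runs if r.last_scheduling_decision is not None),
        key=lambda r: (r.last_scheduling_decision, r.run_after),
    )
    if max_new_dagruns <= 0:
        # Assumed "disabled" behaviour: nulls first, so new runs fill the loop.
        return (new + old)[:max_dagruns]
    # Cap the new runs, then spend the rest of the budget on examined runs.
    picked_new = new[: min(max_new_dagruns, max_dagruns)]
    picked_old = old[: max_dagruns - len(picked_new)]
    return picked_old + picked_new


# 100 runs created in one batch plus 10 runs that are already in progress.
runs = [Run(f"new_{i}", run_after=100 + i) for i in range(100)]
runs += [Run(f"old_{i}", run_after=i, last_scheduling_decision=50 + i) for i in range(10)]

default_pick = select_runs_to_examine(runs)
capped_pick = select_runs_to_examine(runs, max_new_dagruns=5)

print(sum(r.last_scheduling_decision is not None for r in default_pick))  # 0
print(sum(r.last_scheduling_decision is not None for r in capped_pick))   # 10
```

Without the cap, the batch of new runs takes every slot in the loop; with a cap of 5, all 10 in-progress runs are examined alongside 5 new ones.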
Was generative AI tooling used to co-author this PR?
Important
🛠️ Maintainer triage note for @Nataneljpwd · by @potiuk · 2026-06-17 14:51 UTC
Helpful heads-up from the maintainers. Please address before this PR can be reviewed:
- Low dep tests: core / All-core:LowestDeps:14:3.10:Core...Serialization
- MySQL tests: core / DB-core:MySQL:8.0:3.10:Core...Serialization
- Postgres tests: core / DB-core:Postgres:14:3.10:Core...Serialization
- Sqlite tests: core / DB-core:Sqlite:3.10:Core...Serialization

Reproduce and fix locally, then push.
The ball is in your court: you've been assigned to this PR. Fix the above, then mark it Ready for review.
Automated triage — may be imperfect; a maintainer takes the next look.