Add a standard toggle for resumability to ResumableJobMixin #68623
amoghrajesh wants to merge 8 commits into
Converting to draft as I am unsure if this is the best solution.
ashb left a comment
It's probably worth mentioning `:param durable:` etc. explicitly in the docstring of `SparkSubmitOperator`?
LGTM I think, but let's get @vikramkoka's and @kaxil's views on this.
  * The Airflow worker must be able to reach the Kubernetes API server and have permission to
  read and delete pods in the driver's namespace; otherwise pod tracking and cleanup will fail.
- * Set ``reconnect_on_retry=True`` (the default) to enable crash recovery: the driver pod name is
+ * Set ``resume_on_retry=True`` (the default) to enable crash recovery: the driver pod name is
Needs updating to reflect the new name.
The import is `from airflow.sdk import ResumableJobMixin`, but the mixin itself lives in `task-sdk/src/airflow/sdk/bases/resumablejobmixin.py`. Is that import right (i.e. does it use the canonical location)?
`from airflow.sdk import ResumableJobMixin` is the right one to use; it's exported and documented.
Yes, it is; I mentioned it while handling the comments from ash.
(cherry picked from commit 546469d70ec3efc373bfa1d73c2f8d8d79b5cd03)
Why
`ResumableJobMixin` had no standard on/off switch for crash recovery. When `SparkSubmitOperator` was ported to resumability, I added a `reconnect_on_retry` parameter, but any future resumable operator would have to invent its own name, making the flag inconsistent across operators and impossible to measure uniformly.
What changed
`ResumableJobMixin` now owns `durable: bool = True` in its own `__init__`, decorated with `@BaseOperatorMeta._apply_defaults`. Operators that inherit `(ResumableJobMixin, BaseOperator)` get the flag for free: no redeclaration is needed, and `default_args` injection and `.partial()` work automatically.
When `durable=False`, `execute_resumable()` skips all `task_state_store` interaction and runs a plain submit/poll/result cycle.
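A minimal, Airflow-free sketch of how a mixin-owned `durable` flag can gate `execute_resumable()`: with `durable=True`, a job id persisted by an earlier attempt is reused instead of resubmitting; with `durable=False`, the state store is never read or written. `ResumableJobSketch`, `FakeJob`, the `submit`/`poll`/`result` hook methods, and modelling `task_state_store` as a plain dict keyed by task id are all hypothetical, chosen for illustration rather than taken from the real mixin.

```python
from typing import Any, Dict, Optional


class ResumableJobSketch:
    """Hypothetical stand-in for a mixin that owns a ``durable`` flag.

    ``task_state_store`` is modelled as a plain dict keyed by task id;
    the real store and hook names in Airflow may differ.
    """

    def __init__(
        self,
        *,
        task_id: str = "job",
        durable: bool = True,
        task_state_store: Optional[Dict[str, str]] = None,
    ) -> None:
        self.task_id = task_id
        self.durable = durable
        self.task_state_store = task_state_store if task_state_store is not None else {}

    # Hooks a concrete operator would implement (illustrative names).
    def submit(self) -> str:
        raise NotImplementedError

    def poll(self, job_id: str) -> None:
        raise NotImplementedError

    def result(self, job_id: str) -> Any:
        raise NotImplementedError

    def execute_resumable(self) -> Any:
        if not self.durable:
            # Opted out: plain submit/poll/result, no state-store reads or writes.
            job_id = self.submit()
        else:
            # Durable: reuse a job id persisted by an earlier (crashed) attempt,
            # otherwise submit and persist the new id before polling.
            job_id = self.task_state_store.get(self.task_id)
            if job_id is None:
                job_id = self.submit()
                self.task_state_store[self.task_id] = job_id
        self.poll(job_id)
        return self.result(job_id)


class FakeJob(ResumableJobSketch):
    """Toy job that counts submissions and returns its job id as the result."""

    def __init__(self, **kwargs: Any) -> None:
        super().__init__(**kwargs)
        self.submissions = 0

    def submit(self) -> str:
        self.submissions += 1
        return f"job-{self.submissions}"

    def poll(self, job_id: str) -> None:
        pass  # pretend the job finished immediately

    def result(self, job_id: str) -> str:
        return job_id


# A retry after a crash: the store already holds the id from the first try.
store = {"job": "job-from-first-try"}
resumed = FakeJob(task_state_store=store).execute_resumable()  # reuses stored id, no resubmit
fresh = FakeJob(durable=False, task_state_store=store).execute_resumable()  # submits again
```

The design choice being illustrated is that the opt-out lives in one place (the mixin) so every operator that uses it gets the same switch, rather than each operator branching on its own flag.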
`SparkSubmitOperator.reconnect_on_retry` is renamed to `durable`. The per-mode if/else branching in `execute()` is removed, and all three tracking paths (standalone, K8s, YARN) now call `execute_resumable(context)` directly. `reconnect_on_retry` is kept as a deprecated alias for backward compatibility with 6.1.0: passing it raises an `AirflowProviderDeprecationWarning` and maps to `durable`.

Impact on operators using resumability
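Callers that still pass `reconnect_on_retry` keep working through the deprecated alias. A minimal sketch of that mapping, assuming a hypothetical `resolve_durable` helper and a local `ProviderDeprecationWarning` class standing in for `AirflowProviderDeprecationWarning`. Letting the old name win when both arguments are passed is an assumption of this sketch, not documented behavior.

```python
import warnings
from typing import Optional


class ProviderDeprecationWarning(DeprecationWarning):
    """Hypothetical stand-in for AirflowProviderDeprecationWarning."""


def resolve_durable(durable: bool = True, reconnect_on_retry: Optional[bool] = None) -> bool:
    """Return the effective ``durable`` value, honouring the deprecated alias."""
    if reconnect_on_retry is not None:
        warnings.warn(
            "reconnect_on_retry is deprecated; use durable instead.",
            ProviderDeprecationWarning,
            stacklevel=2,
        )
        # Assumption: the old name wins if both are passed.
        return reconnect_on_retry
    return durable
```

For example, `resolve_durable(reconnect_on_retry=False)` returns `False` and emits the deprecation warning, while `resolve_durable(durable=False)` returns `False` silently.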
No behavior change.
`durable` defaults to `True`, so crash recovery is on by default, exactly as before.

How to opt out
Set `durable=False` on the task, or set it through `default_args` to disable crash recovery for all tasks in a DAG.

Implementation note
`ResumableJobMixin.__init__` is decorated directly with `@BaseOperatorMeta._apply_defaults`. When `super().__init__(**kwargs)` reaches the mixin, `apply_defaults` fires, sees `durable` in the signature, and injects it from `default_args` if it was not set explicitly. Operators just need the `(ResumableJobMixin, BaseOperator)` base-class ordering and a `super().__init__(**kwargs)` call.
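A simplified, Airflow-free sketch of the same idea: a decorator inspects the wrapped `__init__` signature and fills any keyword-only parameter the caller did not pass from `default_args`, so explicit arguments always win. It also exercises both opt-out routes (per task and via `default_args`). `apply_defaults_sketch`, `BaseOperatorSketch`, `ResumableMixinSketch` and `MyOperator` are hypothetical stand-ins; Airflow's real `_apply_defaults` does more than this (for example, it also takes the DAG's `default_args` into account), none of which is modelled here.

```python
import functools
import inspect
from typing import Any, Callable, Dict, Optional


def apply_defaults_sketch(init: Callable[..., None]) -> Callable[..., None]:
    """Fill keyword-only params the caller omitted from ``default_args``."""
    keyword_only = [
        name
        for name, param in inspect.signature(init).parameters.items()
        if param.kind == inspect.Parameter.KEYWORD_ONLY
    ]

    @functools.wraps(init)
    def wrapper(self: Any, *args: Any, **kwargs: Any) -> None:
        default_args = kwargs.get("default_args") or {}
        for name in keyword_only:
            # Explicit arguments always win; only missing ones are injected.
            if name not in kwargs and name in default_args:
                kwargs[name] = default_args[name]
        init(self, *args, **kwargs)

    return wrapper


class BaseOperatorSketch:
    def __init__(self, *, task_id: str, default_args: Optional[Dict[str, Any]] = None) -> None:
        self.task_id = task_id
        self.default_args = default_args or {}


class ResumableMixinSketch:
    @apply_defaults_sketch
    def __init__(self, *, durable: bool = True, **kwargs: Any) -> None:
        self.durable = durable
        super().__init__(**kwargs)  # continue along the MRO to the base operator


class MyOperator(ResumableMixinSketch, BaseOperatorSketch):
    def __init__(self, **kwargs: Any) -> None:
        super().__init__(**kwargs)


default_on = MyOperator(task_id="a")                                    # durable=True
dag_opt_out = MyOperator(task_id="b", default_args={"durable": False})  # durable=False
task_opt_out = MyOperator(task_id="c", durable=False)                   # durable=False
override = MyOperator(task_id="d", durable=True, default_args={"durable": False})  # durable=True
```

The last four lines cover the default, opting out through `default_args`, opting out on a single task, and an explicit value overriding `default_args`. Because the mixin sits before the base class in the MRO, its decorated `__init__` sees `durable` first and then forwards the remaining kwargs.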