
fix(spark): ANSI-aware custom nullability for make_interval#22057

Open
EshwarCVS wants to merge 3 commits into apache:main from EshwarCVS:main

Conversation

@EshwarCVS

Which issue does this PR close?

Closes #19155.
Rationale for this change

Spark's make_interval nullability follows this rule (from the Spark source):

    nullable = if (failOnError) children.exists(_.nullable) else true

where failOnError maps to DataFusion's enable_ansi_mode. The previous
default implementation always reports nullable = true, which is correct
for non-ANSI mode (overflow silently returns NULL) but too pessimistic for
ANSI mode (overflow throws, so the output is only nullable when the inputs are).

Previous attempts (#19248, #19606) tried to derive nullability from input
fields inside return_field_from_args, but ReturnFieldArgs carries no
ConfigOptions, so ANSI mode was inaccessible.

What changes are included in this PR?

- Add an `ansi_mode: bool` field to SparkMakeInterval, populated via a new
  `new_with_config(config)` constructor.
- Implement `with_updated_config` so the planner keeps the UDF in sync with
  the session config (the same pattern as SparkCast).
- Implement `return_field_from_args` with Spark's exact rule:
  - nullary call → nullable = false (always returns a zero interval)
  - ANSI mode on → nullable = any input field is nullable
  - ANSI mode off (default) → nullable = true
- In ANSI mode, make_interval_kernel now returns `exec_err!` on arithmetic
  overflow instead of silently appending NULL.
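The rule above can be sketched as a stand-alone function (a simplified model: the real implementation works on `Field` values from `ReturnFieldArgs`, and `make_interval_nullable` is a hypothetical name used only for illustration):

```rust
// Hypothetical stand-alone sketch of the nullability rule described above.
// `ansi_mode` plays the role of Spark's `failOnError` / DataFusion's
// `enable_ansi_mode`; `input_nullable` holds the nullability of each argument.
fn make_interval_nullable(ansi_mode: bool, input_nullable: &[bool]) -> bool {
    if input_nullable.is_empty() {
        // Nullary call: make_interval() always returns a zero interval.
        false
    } else if ansi_mode {
        // ANSI mode: overflow errors out instead of producing NULL, so the
        // result can only be NULL when some input can be NULL.
        input_nullable.iter().any(|&n| n)
    } else {
        // Non-ANSI mode: overflow silently produces NULL, so the result is
        // always nullable.
        true
    }
}

fn main() {
    assert!(!make_interval_nullable(true, &[]));             // nullary → not nullable
    assert!(!make_interval_nullable(true, &[false, false])); // ANSI, non-null inputs
    assert!(make_interval_nullable(true, &[false, true]));   // ANSI, a nullable input
    assert!(make_interval_nullable(false, &[false]));        // non-ANSI → always nullable
    println!("ok");
}
```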

Are these changes tested?

Yes. Six new unit tests cover all branches:

- return_field_nullary_is_not_nullable — a zero-arg call is never nullable
- return_field_non_ansi_always_nullable — non-ANSI: always nullable, even with non-null inputs
- return_field_ansi_mode_not_nullable_when_inputs_not_null — ANSI + non-null inputs → not nullable
- return_field_ansi_mode_nullable_when_any_input_nullable — ANSI + nullable input → nullable
- ansi_mode_overflow_returns_error — ANSI-mode overflow is an error, not NULL
- non_ansi_overflow_returns_null — non-ANSI overflow is still NULL (regression guard)

All 14 tests (8 existing + 6 new) pass.

Are there any user-facing changes?

- Non-ANSI mode (default): no behavior change.
- ANSI mode: make_interval with non-nullable inputs now correctly
  reports a non-nullable output field, and arithmetic overflow raises an
  error instead of silently returning NULL.
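The kernel-level side of this change can be modeled with checked arithmetic (a simplified sketch: `exec_err!` is replaced here by a plain `Result`, `total_months` is a hypothetical helper, and the real kernel's month/day/microsecond packing is omitted):

```rust
// Simplified model of the overflow handling described above: in ANSI mode an
// overflow becomes an error; in non-ANSI mode it becomes a NULL (None).
fn total_months(years: i32, months: i32, ansi_mode: bool) -> Result<Option<i32>, String> {
    match years.checked_mul(12).and_then(|y| y.checked_add(months)) {
        Some(m) => Ok(Some(m)),
        None if ansi_mode => Err("arithmetic overflow in make_interval".to_string()),
        None => Ok(None), // non-ANSI: silently yield NULL
    }
}

fn main() {
    assert_eq!(total_months(1, 2, true), Ok(Some(14)));
    assert!(total_months(i32::MAX, 0, true).is_err());      // ANSI: error
    assert_eq!(total_months(i32::MAX, 0, false), Ok(None)); // non-ANSI: NULL
    println!("ok");
}
```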

claude and others added 2 commits May 7, 2026 04:49
Spark's make_interval nullability rule mirrors `failOnError`:
- nullary call → not nullable (always returns zero interval)
- ANSI mode on (failOnError=true) → nullable only when any input is nullable
- ANSI mode off (default) → always nullable (overflow silently returns NULL)

Previous PRs (apache#19248, apache#19606) could not implement this correctly because
ReturnFieldArgs carries no ConfigOptions. The fix follows the SparkCast
pattern: store `ansi_mode` as a struct field, populate it via
`with_updated_config`, and use `self.ansi_mode` in `return_field_from_args`.

In ANSI mode, arithmetic overflow in make_interval_kernel now returns an
error instead of silently producing NULL, matching Spark's behavior.

Closes apache#19155
fix(spark): implement ANSI-aware custom nullability for make_interval
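The SparkCast-style pattern this commit describes can be sketched as follows (a simplified stand-in: `ConfigOptions` is a minimal mock, and `with_updated_config` here returns a new value rather than matching DataFusion's actual trait signature):

```rust
// Minimal mock; the real flag lives at
// ConfigOptions::execution.enable_ansi_mode.
struct ConfigOptions { enable_ansi_mode: bool }

#[derive(Clone)]
struct SparkMakeInterval { ansi_mode: bool }

impl SparkMakeInterval {
    // Constructor that captures ANSI mode from the session config.
    fn new_with_config(config: &ConfigOptions) -> Self {
        Self { ansi_mode: config.enable_ansi_mode }
    }

    // Called by the planner whenever the session config changes, keeping the
    // cached flag in sync (the SparkCast pattern).
    fn with_updated_config(&self, config: &ConfigOptions) -> Self {
        Self { ansi_mode: config.enable_ansi_mode }
    }
}

fn main() {
    let udf = SparkMakeInterval::new_with_config(&ConfigOptions { enable_ansi_mode: false });
    assert!(!udf.ansi_mode);
    let udf = udf.with_updated_config(&ConfigOptions { enable_ansi_mode: true });
    assert!(udf.ansi_mode);
    println!("ok");
}
```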
@github-actions github-actions Bot added the spark label May 7, 2026
@kumarUjjawal
Contributor

@EshwarCVS can you take a look at this #19155 (comment)

@github-actions github-actions Bot added the labels logical-expr (Logical plan and expressions), physical-expr (Changes to the physical-expr crates), catalog (Related to the catalog crate), functions (Changes to functions implementation), and ffi (Changes to the ffi crate) on May 7, 2026
…onfig access

Resolves the limitation described in apache#19155 where
`ReturnFieldArgs` (used by `return_field_from_args` at planning time)
had no access to `ConfigOptions`, forcing UDFs to store config-derived
state as struct fields via `with_updated_config`.

Changes:
- Add `config_options: &'a ConfigOptions` field to `ReturnFieldArgs`
- Physical planning path (`ScalarFunctionExpr::try_new`) passes the real
  session config options, giving UDFs correct config at physical plan time
- Logical planning path (`expr_schema.rs`) passes `ConfigOptions::default()`
  as before (UDFs that need config-correct logical plans still use
  `with_updated_config`)
- FFI conversion stores a default `ConfigOptions` in `ForeignReturnFieldArgsOwned`
  since config cannot cross the FFI boundary
- Update `SparkMakeInterval::return_field_from_args` to read ANSI mode
  directly from `args.config_options.execution.enable_ansi_mode` instead
  of `self.ansi_mode`, demonstrating the intended usage pattern
- Update all ~40 call sites with `config_options: &ConfigOptions::default()`
  (test sites) or the actual session config (production sites)

Closes apache#19155
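The API change described in this commit can be sketched with simplified stand-ins for the real DataFusion types (`ConfigOptions`, `ExecutionOptions`, and `Field` below are minimal hypothetical mocks; only the `config_options` field and the ANSI lookup mirror the actual change):

```rust
// Minimal mocks standing in for datafusion_common::config::ConfigOptions
// and arrow_schema::Field; only the shape of the change is modeled.
#[derive(Default)]
struct ExecutionOptions { enable_ansi_mode: bool }
#[derive(Default)]
struct ConfigOptions { execution: ExecutionOptions }

struct Field { nullable: bool }

// `ReturnFieldArgs` now borrows the session config, so a UDF can read ANSI
// mode at planning time instead of caching it via with_updated_config.
struct ReturnFieldArgs<'a> {
    arg_fields: &'a [Field],
    config_options: &'a ConfigOptions,
}

fn return_field_from_args(args: &ReturnFieldArgs<'_>) -> Field {
    let ansi = args.config_options.execution.enable_ansi_mode;
    let nullable = if args.arg_fields.is_empty() {
        false // nullary call always returns a zero interval
    } else if ansi {
        args.arg_fields.iter().any(|f| f.nullable)
    } else {
        true
    };
    Field { nullable }
}

fn main() {
    let config = ConfigOptions { execution: ExecutionOptions { enable_ansi_mode: true } };
    let fields = [Field { nullable: false }];
    let out = return_field_from_args(&ReturnFieldArgs {
        arg_fields: &fields,
        config_options: &config,
    });
    assert!(!out.nullable); // ANSI + non-null input → not nullable
    println!("ok");
}
```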
@EshwarCVS
Author

EshwarCVS commented May 7, 2026

@EshwarCVS can you take a look at this #19155 (comment)

Updated. Let me know if it needs any further changes.

@kumarUjjawal
Contributor

I am not in favor of this PR. For the default/non-ANSI mode, the current behavior already matches Spark: make_interval is nullable and overflow returns NULL.

The ANSI-mode part needs a broader design discussion because it changes the core UDF return-field API and still does not prove that the full SQL/planning path works correctly.

Given that, I think we should close this PR for now instead of making this change.


Development

Successfully merging this pull request may close these issues.

spark make_interval need to have custom nullability
