
[SPARK-56742][PYTHON][TESTS] Skip string-to-decimal failure assertion on pandas 3 in test_type_coercion_string_to_numeric #55698

Open
zhengruifeng wants to merge 1 commit into apache:master from zhengruifeng:fix-arrow-legacy-type-coercion-test

@zhengruifeng zhengruifeng commented May 6, 2026

What changes were proposed in this pull request?

Gate one assertRaises(PythonException) block in ArrowPythonUDFTestsMixin.test_type_coercion_string_to_numeric on LooseVersion(pd.__version__) < "3.0.0". Specifically, the string("1","2") -> decimal failure assertion is skipped on pandas 3+. The other failure assertions ("1.1" -> int, "1.1" -> decimal) and all success cases are unchanged.
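A minimal sketch of the gating pattern described above, using only the standard library. The helper `pandas_is_pre_3`, the function `check_string_to_decimal_fails`, and the stand-in converter `strict_convert` are hypothetical names for illustration; the actual test compares `LooseVersion(pd.__version__) < "3.0.0"` and expects `PythonException`, for which `TypeError` stands in here.

```python
import unittest


def pandas_is_pre_3(version: str) -> bool:
    # Hypothetical helper: the PR itself uses LooseVersion(pd.__version__) < "3.0.0".
    return int(version.split(".")[0]) < 3


def check_string_to_decimal_fails(version: str, convert) -> str:
    """Run the failure assertion only on pandas < 3 and report what happened."""
    if not pandas_is_pre_3(version):
        return "skipped"
    case = unittest.TestCase()
    with case.assertRaises(TypeError):  # TypeError stands in for PythonException
        convert("1")
    return "asserted"


def strict_convert(value):
    # Stand-in for the pandas 2 behaviour, where the string is rejected.
    raise TypeError("int or Decimal object expected, got str")


print(check_string_to_decimal_fails("2.2.3", strict_convert))
print(check_string_to_decimal_fails("3.0.2", strict_convert))
```

Only the one assertion is wrapped in the version check, so the remaining failure and success cases keep running on every pandas version.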

Why are the changes needed?

ArrowPythonUDFLegacyTests.test_type_coercion_string_to_numeric is failing on the scheduled Build / Python-only (master, Python 3.12, Pandas 3) job, e.g. https://github.com/apache/spark/actions/runs/25402959034/job/74508177526.

Root cause: pandas 3's StringDtype implements __arrow_array__. In PandasToArrowConversion.convert (python/pyspark/sql/conversion.py), the path is

mask = None if hasattr(series.array, "__arrow_array__") else series.isnull()
...
pa.Array.from_pandas(series, mask=mask, type=arrow_type, safe=safecheck)

On pandas 2 the result series of strings has object dtype, no __arrow_array__, and from_pandas with type=decimal128(...) raises ArrowTypeError ("int or Decimal object expected, got str") which surfaces as PythonException. On pandas 3 the series has StringDtype, mask is None, and the __arrow_array__ protocol cleanly casts "1" to Decimal("1") — the conversion silently succeeds, so assertRaises(PythonException) fails.
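The mask selection quoted above can be modelled without pandas or pyarrow. In this sketch, `ObjectArray` and `ArrowAwareArray` are hypothetical stand-ins for the backing arrays of an object-dtype series and a `StringDtype` series respectively, and `choose_mask` mirrors only the `hasattr` condition, not the rest of `PandasToArrowConversion.convert`.

```python
class ObjectArray:
    """Stand-in for an object-dtype backing array (no Arrow protocol)."""

    def __init__(self, values):
        self.values = values


class ArrowAwareArray(ObjectArray):
    """Stand-in for a backing array that implements __arrow_array__."""

    def __arrow_array__(self, type=None):
        # Placeholder body: real implementations return a pyarrow array.
        return list(self.values)


def choose_mask(array):
    # Mirrors the quoted condition: no explicit null mask is built when
    # the array can convert itself through the Arrow protocol.
    if hasattr(array, "__arrow_array__"):
        return None
    return [v is None for v in array.values]


print(choose_mask(ObjectArray(["1", None])))
print(choose_mask(ArrowAwareArray(["1", None])))
```

The branch taken depends only on whether the attribute exists, which is why a dtype change between pandas versions is enough to alter the conversion result.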

The non-legacy ArrowPythonUDF path is unaffected because it converts a Python list directly via pa.array(list, type=...), where pyarrow's per-element type check still rejects str for Decimal.
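To illustrate the difference between a per-element type check and a cast, here is a toy model built on the standard `decimal` module. Both functions are hypothetical and are not pyarrow's implementation; they only show why the same input `["1", "2"]` can be rejected on one path and accepted on the other.

```python
from decimal import Decimal


def strict_decimal_list(values):
    """Per-element check: accept only int or Decimal values, reject str."""
    out = []
    for v in values:
        if not isinstance(v, (int, Decimal)):
            raise TypeError(
                f"int or Decimal object expected, got {type(v).__name__}"
            )
        out.append(Decimal(v))
    return out


def casting_decimal_list(values):
    """Cast-style conversion: parse each string as a decimal literal."""
    return [Decimal(v) for v in values]


print(casting_decimal_list(["1", "2"]))
try:
    strict_decimal_list(["1", "2"])
except TypeError as exc:
    print("rejected:", exc)
```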

Does this PR introduce any user-facing change?

No. Test-only.

How was this patch tested?

Verified locally in a Python 3.13 + pandas 3.0.2 + pyarrow 23.0.1 conda env. All three suites pass:

$ python/run-tests --testnames \
    "pyspark.sql.tests.arrow.test_arrow_python_udf ArrowPythonUDFLegacyTests.test_type_coercion_string_to_numeric, \
     pyspark.sql.tests.arrow.test_arrow_python_udf ArrowPythonUDFTests.test_type_coercion_string_to_numeric, \
     pyspark.sql.tests.arrow.test_arrow_python_udf ArrowPythonUDFNonLegacyTests.test_type_coercion_string_to_numeric"
...
Tests passed in 11 seconds

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.7)

…as 3 in test_type_coercion_string_to_numeric

`ArrowPythonUDFLegacyTests.test_type_coercion_string_to_numeric` started failing on the
"Build / Python-only (master, Python 3.12, Pandas 3)" job. pandas 3's `StringDtype`
implements `__arrow_array__`, which lets pyarrow coerce integer-like strings (e.g. "1")
to decimal. On older pandas the same call raised `ArrowTypeError` from
`pa.Array.from_pandas` because the series was object dtype, so the existing
`assertRaises(PythonException)` for `string -> decimal` passed. Gate that single
assertion on pandas < 3.

Generated-by: Claude Code (Opus 4.7)
@zhengruifeng zhengruifeng changed the title [WIP][PYTHON][TESTS] Skip string-to-decimal failure assertion on pandas 3 in test_type_coercion_string_to_numeric [SPARK-56742][PYTHON][TESTS] Skip string-to-decimal failure assertion on pandas 3 in test_type_coercion_string_to_numeric May 6, 2026
@zhengruifeng zhengruifeng marked this pull request as ready for review May 6, 2026 10:16
@zhengruifeng zhengruifeng requested a review from HyukjinKwon May 6, 2026 10:16