[PYTHON][TESTS] Skip string-to-decimal assertions in test_type_coercion_string_to_numeric on Pandas 3#55701
Draft
zhengruifeng wants to merge 1 commit into apache:master from
### What changes were proposed in this pull request?
Skip the two `assertRaises(PythonException)` blocks for `string -> decimal` casts in `ArrowPythonUDFTestsMixin.test_type_coercion_string_to_numeric` when the active pandas defaults to a non-`object` (Arrow-backed) string dtype. Detect via `pd.Series(["x"]).dtype == object` so the assertions still run on Pandas 2 with default settings.

### Why are the changes needed?
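The skip condition keys off the dtype pandas infers for a Series of Python strings. Below is a minimal sketch of that probe and of where the guard sits in the test body, assuming only pandas is installed. The helper names (`has_object_string_dtype`, `string_to_decimal_assertions_apply`, `planned_assertions`) are hypothetical, and the string labels stand in for the real assertions in Spark's test:

```python
from typing import List

import pandas as pd


def has_object_string_dtype(series: pd.Series) -> bool:
    # NumPy object dtype (the Pandas 2 default for Python str) compares equal
    # to `object`; Arrow-backed or other extension string dtypes do not.
    return bool(series.dtype == object)


def string_to_decimal_assertions_apply() -> bool:
    # Probe the active pandas settings with a one-element Series of Python
    # strings, mirroring `pd.Series(["x"]).dtype == object`.
    return has_object_string_dtype(pd.Series(["x"]))


def planned_assertions() -> List[str]:
    # Hypothetical outline of the test body. The string '1.1' -> int check
    # always runs; the two string -> decimal assertRaises(PythonException)
    # blocks run only when strings are inferred as object dtype.
    checks = ["string '1.1' -> int"]
    if string_to_decimal_assertions_apply():
        checks += ["string -> decimal (1)", "string -> decimal (2)"]
    return checks


if __name__ == "__main__":
    print(pd.__version__, pd.Series(["x"]).dtype, planned_assertions())
```

On Pandas 2 with default settings the probe returns `True` and all three checks are kept. When the default string dtype is Arrow-backed, it returns `False` and only the `string '1.1' -> int` check remains.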
In Pandas 3, `pd.Series(['1', '2'])` is backed by `ArrowStringArrayNumpySemantics`. `pa.Array.from_pandas(series, type=pa.decimal128(...))` then silently casts the strings to decimal instead of raising `ArrowTypeError`. The legacy `SQL_ARROW_BATCHED_UDF` path goes through `PandasToArrowConversion.convert(...)` and depends on that exception to surface a `PythonException`, so the existing assertions stop holding under Pandas 3. The `string '1.1' -> int` assertion is unaffected because the fallback cast also fails for that input, so the exception is still raised.

This was originally observed in [the Pandas-3 build for `master` at SHA ca4d88d](https://github.com/apache/spark/actions/runs/25402959034/job/74508177559): `ArrowPythonUDFLegacyTests::test_type_coercion_string_to_numeric` failed with `AssertionError: PythonException not raised`.

### Does this PR introduce _any_ user-facing change?
No. Test-only change.
### How was this patch tested?
Tested locally in a Pandas 3 environment (`pandas==3.0.2`, `pyarrow==23.0.1`, Python 3.13.12, `future.infer_string=True` by default, matching the failing CI image):

* `ArrowPythonUDFTests/LegacyTests/NonLegacyTests::test_type_coercion_string_to_numeric`: 3 passed.
* `ArrowPythonUDFParityTests/ParityLegacyTests/ParityNonLegacyTests::test_type_coercion_string_to_numeric` (connect): 3 passed.

### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code (claude-opus-4-7)