fix(rest): skip Hadoop-only vended storage credentials during resolution by plusplusjiajia · Pull Request #3241 · apache/iceberg-python

plusplusjiajia · 2026-04-15T07:42:44Z

Rationale for this change

REST catalogs can return multiple StorageCredential entries per table to serve different client runtimes. A common pattern is one entry with Hadoop-style fs.* keys alongside a second entry with canonical s3.* / gs.* keys (consumed by the cloud-native SDKs).
Java's FileIO implementations each filter vended credentials down to their own key namespace. S3FileIO.clientForStoragePath() only consumes entries with an s3-prefixed label (S3FileIO.java:413-414) and, when no URI prefix matches the storage path, falls back to the client keyed at the root "s3" prefix.
pyiceberg has no HadoopFileIO, so Hadoop-style credential bundles have no consumer on the Python side. However, _resolve_storage_credentials did a blind longest-prefix URI match across the full credential list. When a Hadoop-style entry happened to be the longest URI-prefix match for a given location, the Python FileIO ended up with fs.* keys it cannot use and silently fell through to unauthenticated access.
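
A minimal sketch of the behaviour described above, not the code in this PR. The `StorageCredential` dataclass and the unprefixed function names are hypothetical stand-ins, and it assumes that "Hadoop-only" means every key in an entry's config starts with `fs.`:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class StorageCredential:
    """Hypothetical stand-in for a vended credential: a URI prefix plus a flat config map."""

    prefix: str
    config: dict[str, str]


def is_hadoop_only_config(config: dict[str, str]) -> bool:
    # Assumption: an entry is Hadoop-only when it is non-empty and every key is in the fs.* namespace.
    return bool(config) and all(key.startswith("fs.") for key in config)


def resolve_storage_credentials(credentials: list[StorageCredential], location: str) -> dict[str, str]:
    """Longest-prefix match over the entries a Python FileIO could actually consume."""
    best: StorageCredential | None = None
    for cred in credentials:
        if is_hadoop_only_config(cred.config):
            continue  # no HadoopFileIO on the Python side, so these keys have no consumer
        if location.startswith(cred.prefix) and (best is None or len(cred.prefix) > len(best.prefix)):
            best = cred
    return dict(best.config) if best is not None else {}
```

With this filter, a Hadoop-style entry scoped to the table path no longer shadows a shorter-prefix entry that carries `s3.*` keys.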

github-actions · 2026-05-16T00:42:29Z

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that's incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.

plusplusjiajia · 2026-05-17T06:40:59Z

Hi @kevinjqliu — would you have a chance to take a look at this when you get cycles? Small fix to _resolve_storage_credentials (the function from #3042).

rambleraptor · 2026-05-18T18:02:56Z

 _PLANNING_RESPONSE_ADAPTER = TypeAdapter(PlanningResponse)


+def _is_hadoop_only_config(config: Properties) -> bool:


Config is a dict[str, str] according to the OpenAPI doc

rambleraptor · 2026-05-18T18:11:11Z

+        # Java S3FileIO falls back to the "s3" ROOT_PREFIX credential; scope it to
+        # schemes pyarrow's S3FileSystem handles so non-S3 schemes (gs://, abfs://,
+        # etc.) don't get handed s3.* keys.
+        if best_match is None and location.startswith(("s3://", "s3a://", "s3n://", "oss://")):


Wouldn't s3:// get caught by line 477 already?

rambleraptor · 2026-05-18T18:11:43Z

+        # Java S3FileIO falls back to the "s3" ROOT_PREFIX credential; scope it to
+        # schemes pyarrow's S3FileSystem handles so non-S3 schemes (gs://, abfs://,
+        # etc.) don't get handed s3.* keys.
+        if best_match is None and location.startswith(("s3://", "s3a://", "s3n://", "oss://")):


I understand that we want s3 prefixed credentials to get mapped to s3a + s3n. What's oss here?
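
As context for the two questions above about the scheme list, here is a minimal sketch of which listed schemes a bare `"s3"` prefix would already match. It assumes the earlier match is a plain `str.startswith` test against the credential's prefix, which the hunk shown here does not confirm. The helper names are hypothetical:

```python
ROOT_PREFIX = "s3"  # the root prefix named in the code comment above
FALLBACK_SCHEMES = ("s3://", "s3a://", "s3n://", "oss://")  # tuple from the diff hunk


def matches_root_prefix(location: str) -> bool:
    # Assumption: prefix matching is a plain string-prefix test.
    return location.startswith(ROOT_PREFIX)


def needs_explicit_fallback(location: str) -> bool:
    """True only for listed schemes that the bare "s3" prefix does not already cover."""
    return location.startswith(FALLBACK_SCHEMES) and not matches_root_prefix(location)
```

Under that assumption, `s3://`, `s3a://` and `s3n://` locations already start with `"s3"`, so `oss://` is the only listed scheme for which the explicit check changes the outcome.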

github-actions Bot added the stale label May 16, 2026

plusplusjiajia force-pushed the fix/skip-hadoop-only-vended-credentials branch from d68ade8 to a1315d1 Compare May 17, 2026 01:05

fix(rest): skip Hadoop-only vended storage credentials during resolution

4a4f18c

plusplusjiajia force-pushed the fix/skip-hadoop-only-vended-credentials branch from a1315d1 to 4a4f18c Compare May 17, 2026 06:38

github-actions Bot removed the stale label May 18, 2026

rambleraptor reviewed May 18, 2026

plusplusjiajia wants to merge 1 commit into apache:main from plusplusjiajia:fix/skip-hadoop-only-vended-credentials
