Skip to content

fix(rest): skip Hadoop-only vended storage credentials during resolution#3241

Open
plusplusjiajia wants to merge 1 commit into
apache:mainfrom
plusplusjiajia:fix/skip-hadoop-only-vended-credentials
Open

fix(rest): skip Hadoop-only vended storage credentials during resolution#3241
plusplusjiajia wants to merge 1 commit into
apache:mainfrom
plusplusjiajia:fix/skip-hadoop-only-vended-credentials

Conversation

@plusplusjiajia
Copy link
Copy Markdown
Member

Rationale for this change

REST catalogs can return multiple StorageCredential entries per table to serve different client runtimes. A common pattern is one entry with Hadoop-style fs.* keys alongside a second entry with canonical s3.* / gs.* keys consumed by the cloud-native SDKs).
Java's FileIO implementations each filter vended credentials down to their own key namespace. S3FileIO.clientForStoragePath() only consumes entries with an s3-prefixed label (S3FileIO.java:413-414) and, when no URI prefix matches the storage path, falls back to the client keyed at the root "s3" prefix. pyiceberg has no HadoopFileIO, so Hadoop-style credential bundles have no consumer on the Python side; but _resolve_storage_credentials did a blind longest-prefix URI match across the full credential list, so when a Hadoop-style entry happened to be the longest URI-prefix match for a given location the Python FileIO ended up with fs.* keys it cannot use, and silently fell through to unauthenticated access.

@github-actions
Copy link
Copy Markdown

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that's incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.

@github-actions github-actions Bot added the stale label May 16, 2026
@plusplusjiajia plusplusjiajia force-pushed the fix/skip-hadoop-only-vended-credentials branch from d68ade8 to a1315d1 Compare May 17, 2026 01:05
@plusplusjiajia plusplusjiajia force-pushed the fix/skip-hadoop-only-vended-credentials branch from a1315d1 to 4a4f18c Compare May 17, 2026 06:38
@plusplusjiajia
Copy link
Copy Markdown
Member Author

Hi @kevinjqliu — would you have a chance to take a look at this when you get cycles? Small fix to _resolve_storage_credentials (the function from #3042).

@github-actions github-actions Bot removed the stale label May 18, 2026
_PLANNING_RESPONSE_ADAPTER = TypeAdapter(PlanningResponse)


def _is_hadoop_only_config(config: Properties) -> bool:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Config is a dict[str, str] according to the OpenAPI doc

# Java S3FileIO falls back to the "s3" ROOT_PREFIX credential; scope it to
# schemes pyarrow's S3FileSystem handles so non-S3 schemes (gs://, abfs://,
# etc.) don't get handed s3.* keys.
if best_match is None and location.startswith(("s3://", "s3a://", "s3n://", "oss://")):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't s3:// get caught by line 477 already?

# Java S3FileIO falls back to the "s3" ROOT_PREFIX credential; scope it to
# schemes pyarrow's S3FileSystem handles so non-S3 schemes (gs://, abfs://,
# etc.) don't get handed s3.* keys.
if best_match is None and location.startswith(("s3://", "s3a://", "s3n://", "oss://")):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that we want s3 prefixed credentials to get mapped to s3a + s3n. What's oss here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants