Skip to content

Raise errors on view-table collisions#3380

Closed
smaheshwar-pltr wants to merge 2 commits into
apache:mainfrom
smaheshwar-pltr:sm/view-check-create-rename-register
Closed

Raise errors on view-table collisions#3380
smaheshwar-pltr wants to merge 2 commits into
apache:mainfrom
smaheshwar-pltr:sm/view-check-create-rename-register

Conversation

@smaheshwar-pltr
Copy link
Copy Markdown
Contributor

Rationale for this change

Are these changes tested?

Are there any user-facing changes?

- Add `_raise_if_view_exists` helper that maps `view_exists(id) == True` at
  the target identifier to `TableAlreadyExistsError`. Catalogs without view
  support (most non-REST impls) raise `NotImplementedError` from
  `view_exists`; treat that as "no view at this identifier".
- Wire the check into `create_table`, `register_table`, and `rename_table`
  on SqlCatalog, HiveCatalog, GlueCatalog, DynamoDbCatalog, and
  BigQueryMetastoreCatalog (skipping methods that already raise
  `NotImplementedError`). Also call it from
  `MetastoreCatalog.create_table_transaction`.
- For `rename_table`, the check is on the destination identifier.
- RestCatalog intentionally unchanged: the server's 409 response is the
  authority and an extra client-side HEAD per call would be wasted.
- Re-uses `TableAlreadyExistsError` so existing `create_table_if_not_exists`
  callers see the same exception type they already catch.
- Tests: unit tests for the helper (returns True / False / raises
  `NotImplementedError`) in `tests/catalog/test_base.py`, plus
  parametrized behavior tests over InMemoryCatalog + SqlCatalog in
  `tests/catalog/test_catalog_behaviors.py` that monkeypatch `view_exists`
  to assert each call site invokes the helper.
- Drop parametrization over (memory, sql, sql_without_rowcount): all three
  share the same SqlCatalog code path, and the helper itself is covered by
  unit tests in `tests/catalog/test_base.py`. Use a single InMemoryCatalog
  via a new `single_catalog` fixture.
- Replace `lambda` patches with a recording fake so the tests assert *which*
  identifier was checked — matters for `rename_table`, where the
  destination (not the source) must be passed.
- Add post-condition assertions: after the raise, the target table must
  not exist (proves the check ran before any commit-side effect), and for
  rename, the source must still be present at its original identifier.
@smaheshwar-pltr smaheshwar-pltr changed the title Sm/view check create rename register Raise errors on view-table collisions May 18, 2026
view_collision = catalog.view_exists(identifier)
except NotImplementedError:
view_collision = False
if view_collision:
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Helper mirrors the pattern from Java HiveCatalog and Java REST's RESTSessionCatalog.replaceTransaction. Re-uses TableAlreadyExistsError so existing create_table_if_not_exists callers keep working. Catalogs without view support raise NotImplementedError from view_exists — treat that as 'no view'.

sort_order: SortOrder = UNSORTED_SORT_ORDER,
properties: Properties = EMPTY_DICT,
) -> CreateTableTransaction:
_raise_if_view_exists(self, identifier)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deliberate divergence from Java: Java HiveCatalog.ViewAwareTableBuilder overrides create() with the view check but does not override createTransaction(), leaving the eager-create path stricter than the txn path. Checking here fixes that asymmetry — a create_table_transaction that would collide with a view fails fast, before any metadata files are written.

Comment thread pyiceberg/catalog/sql.py
table_name = Catalog.table_name_from(identifier)
if not self.namespace_exists(namespace_identifier):
raise NoSuchNamespaceError(f"Namespace does not exist: {namespace_identifier}")
_raise_if_view_exists(self, identifier)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mirrors Java HiveCatalog.ViewAwareTableBuilder.create(). Java JdbcCatalog only checks views on replaceTransaction, not on create — runtime no-op here since SqlCatalog.view_exists raises NotImplementedError, but the helper keeps the call site uniform across catalogs.

Comment thread pyiceberg/catalog/sql.py
table_name = Catalog.table_name_from(identifier)
if not self.namespace_exists(namespace):
raise NoSuchNamespaceError(f"Namespace does not exist: {namespace}")
_raise_if_view_exists(self, identifier)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mirrors Java HiveCatalog.registerTable. Java JdbcCatalog inherits registerTable from BaseMetastoreCatalog without a view check — runtime no-op here for the same reason as create_table.

Comment thread pyiceberg/catalog/sql.py
to_table_name = Catalog.table_name_from(to_identifier)
if not self.namespace_exists(to_namespace):
raise NoSuchNamespaceError(f"Namespace does not exist: {to_namespace}")
_raise_if_view_exists(self, to_identifier)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mirrors Java HiveCatalog.renameTableOrView. Java JdbcCatalog.renameTable does the check too but gates it on schemaVersion == V1. PyIceberg's SqlCatalog has no equivalent V1/V2 split — view_exists raises NotImplementedError, so this is a no-op at runtime.

Comment thread pyiceberg/catalog/hive.py
ValueError: If the identifier is invalid.
"""
properties = {**DEFAULT_PROPERTIES, **properties}
_raise_if_view_exists(self, identifier)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1:1 with Java HiveCatalog.ViewAwareTableBuilder.create(). Hive is the only PyIceberg catalog that actually implements view_exists, so this is the call site where the check fires for real.

Comment thread pyiceberg/catalog/hive.py
if overwrite:
raise NotImplementedError("`overwrite` isn't supported")

_raise_if_view_exists(self, identifier)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1:1 with Java HiveCatalog.registerTable.

Comment thread pyiceberg/catalog/hive.py

if self.table_exists(to_identifier):
raise TableAlreadyExistsError(f"Table already exists: {to_table_name}")
_raise_if_view_exists(self, to_identifier)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1:1 with Java HiveCatalog.renameTableOrView. The check is on the destination, matching Java.

Comment thread pyiceberg/catalog/glue.py

"""
database_name, table_name = self.identifier_to_database_and_table(identifier)
_raise_if_view_exists(self, identifier)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Java GlueCatalog has no view support and no check here; closest reference is HiveCatalog.ViewAwareTableBuilder.create(). Runtime no-op until Glue gains view support — view_exists raises NotImplementedError.

Comment thread pyiceberg/catalog/glue.py
if overwrite:
raise NotImplementedError("`overwrite` isn't supported")

_raise_if_view_exists(self, identifier)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same shape as the other Glue call sites — no Java analog (Glue Java has no view support), Hive reference HiveCatalog.registerTable. Runtime no-op.

Comment thread pyiceberg/catalog/glue.py
"""
from_database_name, from_table_name = self.identifier_to_database_and_table(from_identifier, NoSuchTableError)
to_database_name, to_table_name = self.identifier_to_database_and_table(to_identifier)
_raise_if_view_exists(self, to_identifier)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No Java Glue analog. Hive reference: HiveCatalog.renameTableOrView. Runtime no-op.

)

database_name, table_name = self.identifier_to_database_and_table(identifier)
_raise_if_view_exists(self, identifier)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Java DynamoDbCatalog has no view support. Closest reference is HiveCatalog.ViewAwareTableBuilder.create(). Runtime no-op.

"""
from_database_name, from_table_name = self.identifier_to_database_and_table(from_identifier, NoSuchTableError)
to_database_name, to_table_name = self.identifier_to_database_and_table(to_identifier)
_raise_if_view_exists(self, to_identifier)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No Java DynamoDb analog. Hive reference: HiveCatalog.renameTableOrView. Runtime no-op.

schema: Schema = self._convert_schema_if_needed(schema) # type: ignore

dataset_name, table_name = self.identifier_to_database_and_table(identifier)
_raise_if_view_exists(self, identifier)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No iceberg-java BigQuery analog. Hive reference for the semantics: HiveCatalog.ViewAwareTableBuilder.create(). Runtime no-op until BQ gains view support.

if overwrite:
raise NotImplementedError("`overwrite` isn't supported")

_raise_if_view_exists(self, identifier)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No iceberg-java BigQuery analog. Hive reference: HiveCatalog.registerTable. Runtime no-op.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant