fix: tables schema command replace manual _delta_log JSON parsing with DeltaTable.schema() from the deltalake library#229

Open
pkontek wants to merge 17 commits into microsoft:main from SoletPL:tables_schema_fix

@pkontek pkontek commented Apr 30, 2026

📥 Pull Request

✨ Description of new changes

Summary: The fab tables schema command failed with [InvalidDeltaTable] Failed to extract the table schema for Delta tables where pre-checkpoint JSON commit log files had been cleaned up. The previous implementation manually scanned _delta_log/*.json files via the OneLake API looking for a metaData entry — an approach that breaks once Delta Lake compacts logs into .checkpoint.parquet files and removes the preceding JSON entries.

Context: Delta Lake creates a checkpoint every 10 transactions by default and retains log files according to the configured retention policy. Once older JSON commit files are removed, the metaData entry (which carries the schema) is no longer available in the remaining JSON logs, causing the command to fail even for healthy, accessible tables.
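To make the failure mode concrete, here is a minimal sketch of a JSON-only scan over a `_delta_log` listing. The helper name `find_metadata_in_json_commits` and the in-memory file map are illustrative assumptions, not the original fabric-cli code; the file names follow the 20-digit zero-padded convention of the Delta log.

```python
import json
from typing import Dict, Optional


def find_metadata_in_json_commits(log_files: Dict[str, str]) -> Optional[dict]:
    """Scan only *.json commit files for a metaData action (the old approach).

    `log_files` maps file names in _delta_log to their text content; it is a
    stand-in for what a OneLake directory listing would return.
    """
    for name in sorted(log_files):
        if not name.endswith(".json"):
            continue  # checkpoint parquet files are invisible to a JSON-only scan
        for line in log_files[name].splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "metaData" in action:
                return action["metaData"]
    return None


schema_string = json.dumps({"type": "struct", "fields": []})
commit_0 = json.dumps({"metaData": {"id": "t1", "schemaString": schema_string}})
commit_11 = json.dumps({"add": {"path": "part-0001.parquet"}})

# Fresh table: version 0 still carries the metaData action.
fresh_log = {"00000000000000000000.json": commit_0}

# After checkpointing and log cleanup: only the checkpoint and later commits remain.
cleaned_log = {
    "00000000000000000010.checkpoint.parquet": "<binary parquet>",
    "00000000000000000011.json": commit_11,
}

found_fresh = find_metadata_in_json_commits(fresh_log)
found_cleaned = find_metadata_in_json_commits(cleaned_log)
```

In the second case the schema now lives only inside the checkpoint file, so the scan returns `None` even though the table is healthy.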

Dependencies: Adds deltalake>=0.18.0 as a new dependency. The deltalake library correctly resolves the table schema from both JSON logs and Parquet checkpoints using the standard Delta protocol, making the implementation robust and protocol-compliant.

Changes:

  • Replace manual _delta_log JSON parsing with DeltaTable.schema() from the deltalake library
  • Add deltalake>=0.18.0 to project dependencies in pyproject.toml
  • Simplify fab_tables_schema.py by removing ~50 lines of fragile log-parsing logic
  • Correct typo in error constant for invalid Delta table
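For illustration, the replacement described in the list above might look roughly like the sketch below. The function names, the `abfss://` layout, and the `storage_options` keys (`bearer_token`, `use_fabric_endpoint`) are assumptions based on how delta-rs is commonly pointed at OneLake; they are not taken from the PR diff.

```python
import json
from typing import Dict, List


def build_onelake_table_uri(workspace_id: str, item_id: str, local_path: str) -> str:
    """Assemble an abfss:// URI for a table inside a OneLake item (assumed layout)."""
    return (
        f"abfss://{workspace_id}@onelake.dfs.fabric.microsoft.com/"
        f"{item_id}/{local_path.strip('/')}"
    )


def read_schema_fields(table_uri: str, token: str) -> List[dict]:
    """Resolve schema fields via deltalake, which reads checkpoints as well as JSON commits."""
    # Imported lazily so the URI helper above has no hard dependency on deltalake.
    from deltalake import DeltaTable

    # Assumption: these option keys are what the storage backend expects for OneLake.
    storage_options: Dict[str, str] = {
        "bearer_token": token,
        "use_fabric_endpoint": "true",
    }
    table = DeltaTable(table_uri, storage_options=storage_options)
    return json.loads(table.schema().to_json())["fields"]


uri = build_onelake_table_uri("<workspace-id>", "<lakehouse-id>", "Tables/my_table")
```

Only `build_onelake_table_uri` runs without network access; `read_schema_fields` needs the `deltalake` package and a valid token.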

Copilot AI review requested due to automatic review settings April 30, 2026 13:17
@pkontek pkontek requested a review from a team as a code owner April 30, 2026 13:17
Contributor

Copilot AI left a comment

Pull request overview

Fixes fab tables schema failing on Delta tables whose pre-checkpoint JSON logs were removed by log cleanup, by switching schema extraction to the Delta protocol via the deltalake library.

Changes:

  • Replaced manual _delta_log/*.json scanning with deltalake.DeltaTable(...).schema() for checkpoint-aware schema extraction.
  • Added deltalake>=0.18.0 to project dependencies.
  • Renamed typo’d error constant ERROR_INVALID_DETLA_TABLE to ERROR_INVALID_DELTA_TABLE and wired it into the tables schema command.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
src/fabric_cli/core/fab_constant.py Fixes typo in error constant name; minor formatting cleanup.
src/fabric_cli/commands/tables/fab_tables_schema.py Refactors schema retrieval to use deltalake over ABFSS + token, removing fragile log parsing.
pyproject.toml Adds deltalake dependency required for robust schema extraction.
.changes/unreleased/fixed-20260430-130558.yaml Adds release note entry for the schema extraction fix.

Comment thread src/fabric_cli/commands/tables/fab_tables_schema.py
Comment thread src/fabric_cli/commands/tables/fab_tables_schema.py
Co-authored-by: Copilot <[email protected]>
Copilot AI review requested due to automatic review settings April 30, 2026 13:26
Contributor

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@pkontek pkontek changed the title from "Tables schema fix" to "fix: tables schema command replace manual _delta_log JSON parsing with DeltaTable.schema() from the deltalake library" Apr 30, 2026
@pkontek pkontek requested a review from Copilot May 4, 2026 19:19
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Comment thread src/fabric_cli/commands/tables/fab_tables_schema.py Outdated
Comment thread src/fabric_cli/commands/tables/fab_tables_schema.py
Copilot AI review requested due to automatic review settings May 7, 2026 08:24
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

Comment thread src/fabric_cli/commands/tables/fab_tables_schema.py Outdated
Co-authored-by: Copilot Autofix powered by AI <[email protected]>
Copilot AI review requested due to automatic review settings May 7, 2026 08:31
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Comment thread pyproject.toml
Comment thread src/fabric_cli/commands/tables/fab_tables_schema.py
Copilot AI review requested due to automatic review settings May 7, 2026 09:12
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.

Contributor

@shirasassoon shirasassoon left a comment

@pkontek thanks for submitting this PR to address a legitimate bug in Fabric CLI! I have started reviewing the PR and added some comments. Please take a look at your earliest convenience.

Comment thread tests/test_commands/test_tables_schema.py
Comment thread tests/test_commands/test_tables_schema.py Outdated
Comment thread tests/test_commands/test_tables_schema.py Outdated
Comment thread tests/test_commands/test_tables_schema.py
Copilot AI review requested due to automatic review settings May 15, 2026 21:51
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

Comment thread .vscode/settings.json Outdated
Comment thread src/fabric_cli/commands/tables/fab_tables_schema.py Outdated
Comment thread src/fabric_cli/commands/tables/fab_tables_schema.py
Co-authored-by: Copilot Autofix powered by AI <[email protected]>
Copilot AI review requested due to automatic review settings May 15, 2026 22:01
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.

Copilot AI review requested due to automatic review settings May 15, 2026 22:12
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated no new comments.

Author

pkontek commented Jun 2, 2026

@shirasassoon is there anything else I can do to close this pull request?

Collaborator

@ayeshurun ayeshurun left a comment

Thanks for tackling this, @pkontek 🙏 Appreciate the thorough test scaffolding and the changie entry.

Before we can land it, could you take a look at CI? The "Test on Python 3.12" job is red and the other Python versions got cancelled by fail-fast.

    return schema_fields
except (DeltaError, json.JSONDecodeError, ValueError) as exc:
    raise FabricCLIError(
        f"Failed to extract the table schema. Please ensure the path points to a valid Delta table: {exc}",
Collaborator

I believe exception strings can include the full abfss://… URI (containing workspace & item GUIDs), internal Rust diagnostics, and storage-backend details. That violates the supportability/security guidance in commands.instructions.md ("Never log tokens, cookies, correlation IDs, or PII").
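One possible way to address this concern is to keep the user-facing message fixed and redact any exception text before it reaches debug logs. The sketch below is only an illustration: the `redact` helper and its placeholder strings are hypothetical, and the patterns cover just `abfss://` URIs and canonical GUIDs.

```python
import re

# Matches abfss:// URIs up to the next whitespace or quote character.
_ABFSS_URI = re.compile(r"abfss://[^\s'\"]+")
# Matches canonical 8-4-4-4-12 hexadecimal GUIDs.
_GUID = re.compile(
    r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"
)


def redact(message: str) -> str:
    """Replace storage URIs and GUIDs with placeholders before the text is shown or logged."""
    message = _ABFSS_URI.sub("<table-uri>", message)
    return _GUID.sub("<guid>", message)


raw = (
    "Generic error: failed to list abfss://11111111-2222-3333-4444-555555555555"
    "@onelake.dfs.fabric.microsoft.com/aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee/Tables/t "
    "for item aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"
)
safe = redact(raw)
```

URIs are replaced first so that the GUIDs embedded in them are not rewritten twice. Backend-specific diagnostics would still need their own review.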

Collaborator

The previous implementation went through fab_api_onelake (centralized retries, timeouts, telemetry, request-id surfacing, and respect for FAB_API_ENDPOINT_ONELAKE testing hooks via fab_api_client.py). The new code constructs an abfss:// URI and hands the bearer token directly to delta-rs's Rust object_store.

That conflicts with commands.instructions.md - "All API calls must be invoked via centralized request utilities", "Use the common API client and common retry/backoff policy - don't hand-roll session logic".

Concretely we lose:

  • Bounded --timeout semantics
  • Exponential-backoff retries on transient 5xx/429
  • Request-ID propagation surfaced via OnelakeAPIError
  • Correlated debug logging

Please wrap the DeltaTable(...) construction in a thin helper under fabric_cli/client/ (e.g., fab_delta_client.py) that owns storage_options assembly and the exception → FabricCLIError mapping, and provides a single seam for future retry/timeout/telemetry.
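A rough sketch of what such a helper could look like follows. Everything here is hypothetical: `FabricCLIError` is a local stand-in for the CLI's real error type, `get_table_schema_fields` and the injectable `loader` parameter are invented for illustration, and the `storage_options` keys are assumptions about what delta-rs expects for OneLake.

```python
import json
from typing import Any, Callable, Dict, List


class FabricCLIError(Exception):
    """Stand-in for fabric_cli's error type so the sketch is self-contained."""

    def __init__(self, message: str, status_code: str):
        super().__init__(message)
        self.message = message
        self.status_code = status_code


ERROR_INVALID_DELTA_TABLE = "InvalidDeltaTable"


def _default_loader(table_uri: str, storage_options: Dict[str, str]) -> Any:
    from deltalake import DeltaTable  # lazy import: only needed on the real path

    return DeltaTable(table_uri, storage_options=storage_options)


def get_table_schema_fields(
    table_uri: str,
    token: str,
    loader: Callable[[str, Dict[str, str]], Any] = _default_loader,
) -> List[dict]:
    """Single seam for Delta access: builds storage options and maps failures."""
    storage_options = {"bearer_token": token, "use_fabric_endpoint": "true"}
    try:
        table = loader(table_uri, storage_options)
        return json.loads(table.schema().to_json())["fields"]
    except Exception as exc:  # broad on purpose in this sketch; narrow it in real code
        # Keep the user-facing message generic; `exc` is chained, not interpolated.
        raise FabricCLIError(
            "Failed to extract the table schema. "
            "Please ensure the path points to a valid Delta table",
            ERROR_INVALID_DELTA_TABLE,
        ) from exc


class _FakeSchema:
    def to_json(self) -> str:
        return json.dumps({"type": "struct", "fields": [{"name": "id", "type": "long"}]})


class _FakeTable:
    def schema(self) -> _FakeSchema:
        return _FakeSchema()


# Usage with a fake loader, so no network or deltalake install is required.
fields = get_table_schema_fields(
    "abfss://example", "token", loader=lambda uri, opts: _FakeTable()
)
```

Injecting the loader keeps the command testable without mocking module internals, and retries, timeouts, or telemetry could later be added inside this one function.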

Comment on lines +29 to +32
if args.schema:
    local_path = f"Tables/{args.schema}/{args.table_name}"
else:
    local_path = f"Tables/{args.table_name}"
Collaborator

Please pass the resolved context.local_path from fab_tables.py::schema_command (either directly or via an args.table_local_path populated in add_table_props_to_args). Drop the manual f"Tables/..." reconstruction.

Comment on lines +29 to +32
if args.schema:
    local_path = f"Tables/{args.schema}/{args.table_name}"
else:
    local_path = f"Tables/{args.table_name}"
Collaborator

The old code defensively called utils.remove_dot_suffix(local_path) (default suffix .Shortcut). add_table_props_to_args sets args.table_name = table_path[-1] without stripping .Shortcut. If a user runs fab table schema /ws.Workspace/lh.Lakehouse/Tables/my_table.Shortcut, the new URI becomes …/Tables/my_table.Shortcut, which won't exist in storage.

Please add a regression test for the .Shortcut path and either (a) call remove_dot_suffix(args.table_name) defensively here, or (b) confirm in add_table_props_to_args that the suffix is normalized upstream.
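Option (a) together with the requested regression test could be sketched as follows. Note that `remove_dot_suffix` below is an illustrative re-implementation, not the actual `utils.remove_dot_suffix`, and `table_local_path` is a hypothetical helper mirroring the path construction quoted above.

```python
from typing import Optional


def remove_dot_suffix(name: str, suffix: str = ".Shortcut") -> str:
    """Illustrative re-implementation: drop a trailing suffix such as `.Shortcut`."""
    if suffix and name.endswith(suffix):
        return name[: -len(suffix)]
    return name


def table_local_path(table_name: str, schema: Optional[str] = None) -> str:
    """Build the item-relative table path with the suffix stripped defensively."""
    clean_name = remove_dot_suffix(table_name)
    if schema:
        return f"Tables/{schema}/{clean_name}"
    return f"Tables/{clean_name}"


def test_shortcut_suffix_is_stripped() -> None:
    assert table_local_path("my_table.Shortcut") == "Tables/my_table"
    assert table_local_path("my_table.Shortcut", schema="dbo") == "Tables/dbo/my_table"
    # Names without the suffix must pass through unchanged.
    assert table_local_path("my_table") == "Tables/my_table"


test_shortcut_suffix_is_stripped()
```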

Collaborator

command_support.yaml lists table schema as supported for lakehouse, warehouse, kql_database, mirrored_database, semantic_model, sql_database. The new hardcoded Tables/… URI assumes the lakehouse/warehouse OneLake layout. Several of those item types do not expose Delta tables under Tables/ the same way (notably semantic_model).

Please prove (via test) that the new URI works for all listed item types, or add validation that fails in code for item types the DeltaTable flow does not support.
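The validation alternative might be as small as the sketch below. The allow-list is a placeholder assumption (the set of item types that really work has to come from testing each entry in command_support.yaml), and `UnsupportedItemTypeError` stands in for whatever error type the CLI would actually raise.

```python
from typing import FrozenSet


class UnsupportedItemTypeError(Exception):
    """Hypothetical stand-in for the CLI's own error type."""


# Assumption: only these item types expose Delta tables under Tables/ in OneLake.
DELTA_SCHEMA_ITEM_TYPES: FrozenSet[str] = frozenset({"lakehouse", "warehouse"})


def ensure_delta_schema_supported(item_type: str) -> None:
    """Fail fast, before any storage call, for item types outside the allow-list."""
    normalized = item_type.strip().lower()
    if normalized not in DELTA_SCHEMA_ITEM_TYPES:
        supported = ", ".join(sorted(DELTA_SCHEMA_ITEM_TYPES))
        raise UnsupportedItemTypeError(
            f"'table schema' is not supported for item type '{normalized}'. "
            f"Supported types: {supported}"
        )


ensure_delta_schema_supported("Lakehouse")  # passes silently
```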

commit_data = json.loads(obj)
if "metaData" in commit_data:
    metadata = commit_data["metaData"]
    schema = metadata["schemaString"]
Collaborator

The old code returned json.loads(metaData.schemaString)["fields"], the Delta protocol's field schema. The new code returns json.loads(DeltaTable.schema().to_json())["fields"], delta-rs's serialization of the Arrow schema. For complex types (nested struct, decimal, timestamp, map), the JSON shapes can differ ("type": "long" vs "type": "integer", nested vs string representation, etc.).

Could you please add a unit test with at least one nested/decimal/timestamp field and assert the exact output structure to lock the contract? Otherwise this is a silent breaking change for users who pipe --output_format json into scripts.
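A contract test along these lines could compare the command's output against a golden structure and report exactly where the two diverge. The sketch below assumes the Delta protocol's schema JSON layout for the golden data; `diff_paths` is a hypothetical helper, and the "drifted" value merely simulates the kind of difference the comment warns about rather than real delta-rs output.

```python
import copy
from typing import Any, List

# Golden fields in Delta protocol form: primitives as strings, nested types as objects.
GOLDEN_FIELDS = [
    {"name": "id", "type": "long", "nullable": False, "metadata": {}},
    {"name": "amount", "type": "decimal(10,2)", "nullable": True, "metadata": {}},
    {"name": "created_at", "type": "timestamp", "nullable": True, "metadata": {}},
    {
        "name": "address",
        "type": {
            "type": "struct",
            "fields": [
                {"name": "city", "type": "string", "nullable": True, "metadata": {}}
            ],
        },
        "nullable": True,
        "metadata": {},
    },
]


def diff_paths(expected: Any, actual: Any, path: str = "$") -> List[str]:
    """Return the JSON paths at which two decoded JSON documents differ."""
    if isinstance(expected, dict) and isinstance(actual, dict):
        diffs: List[str] = []
        for key in sorted(set(expected) | set(actual)):
            if key not in expected or key not in actual:
                diffs.append(f"{path}.{key}")
            else:
                diffs.extend(diff_paths(expected[key], actual[key], f"{path}.{key}"))
        return diffs
    if isinstance(expected, list) and isinstance(actual, list):
        if len(expected) != len(actual):
            return [path]
        diffs = []
        for index, (exp_item, act_item) in enumerate(zip(expected, actual)):
            diffs.extend(diff_paths(exp_item, act_item, f"{path}[{index}]"))
        return diffs
    return [] if expected == actual else [path]


# Simulated drift: a serializer that reports "integer" where the golden data says "long".
drifted = copy.deepcopy(GOLDEN_FIELDS)
drifted[0]["type"] = "integer"
drift = diff_paths(GOLDEN_FIELDS, drifted)
```

In a real test, the `actual` argument would be the fields returned by the command for a fixture table, and the assertion would be that `diff_paths` returns an empty list.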

raise FabricCLIError(
"Failed to extract the table schema. Please ensure the path points to a valid Delta table",
fab_constant.ERROR_INVALID_DETLA_TABLE,
"Failed to obtain access token.",
Collaborator

"Failed to obtain access token." already exists as AuthErrors.access_token_failed(). Per commands.instructions.md, reuse messages from src/fabric_cli/errors/.py:

from fabric_cli.errors import ErrorMessages
raise FabricCLIError(
    ErrorMessages.Auth.access_token_failed(),
    fab_constant.ERROR_AUTHENTICATION_FAILED,
)

Collaborator

The tests don't actually prove the fix.

Every test mocks DeltaTable, so none of them exercises the compacted-log scenario the PR is fixing. The tests prove the dispatch/mapping logic, not the bug fix.

Please add at least one test that uses a real local Delta table fixture (delta-rs can write a tiny one in a tmp dir) with a checkpoint and no pre-checkpoint JSON logs. That is the actual regression test for #228.

from tests.test_commands.commands_parser import CLIExecutor


class TestTablesSchemaUnit:
Collaborator

test.instructions.md classifies pure unit tests (no network, no VCR) under tests/test_core/** or tests/test_utils/**. The class TestTablesSchemaUnit is a pure unit test class, so please move it to tests/test_core/ (or split the file), leaving only TestTablesSchemaIntegration under tests/test_commands/.

Collaborator

Because DeltaTable and FabAuth are both mocked in TestTablesSchemaIntegration, the 258-line test_table_schema_success.yaml cassette only exercises workspace+lakehouse CRUD that already has coverage elsewhere. Consider dropping the cassette and converting the integration test into a pure dispatch test (parser → command function call) without VCR — much less maintenance burden.
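As a rough illustration of a cassette-free dispatch test, the sketch below wires a mock handler into a toy `argparse` tree and checks that parsing routes to it. The parser layout and `build_parser` are hypothetical and much simpler than the real fabric-cli parser; only the pattern (parse, dispatch, assert on the mock) is the point.

```python
import argparse
from unittest import mock


def build_parser(handler) -> argparse.ArgumentParser:
    """Toy stand-in for the CLI parser: `fab table schema <path>` dispatches to `handler`."""
    parser = argparse.ArgumentParser(prog="fab")
    commands = parser.add_subparsers(dest="command")
    table = commands.add_parser("table")
    table_commands = table.add_subparsers(dest="subcommand")
    schema = table_commands.add_parser("schema")
    schema.add_argument("path")
    schema.set_defaults(func=handler)
    return parser


def test_schema_dispatch() -> None:
    handler = mock.Mock()
    parser = build_parser(handler)
    args = parser.parse_args(["table", "schema", "/ws.Workspace/lh.Lakehouse/Tables/t"])
    args.func(args)  # dispatch exactly as a CLI entry point would
    handler.assert_called_once_with(args)
    assert args.path == "/ws.Workspace/lh.Lakehouse/Tables/t"


test_schema_dispatch()
```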

Development

Successfully merging this pull request may close these issues.

[BUG] tables schema command fails to extract schema from Delta tables with checkpoint files

4 participants