
[BUG] Databricks lineage is not generated for external tables referenced by path-based queries #27561

@matheu-spereira

Description

Affected module
Ingestion Framework

Describe the bug
External tables in Databricks are not generating data lineage correctly in OpenMetadata.

According to the Databricks documentation, when external tables are referenced by their cloud storage path (e.g., delta.`s3://...`), the lineage is recorded in the source_path and target_path fields instead of source_table_full_name or target_table_full_name.

However, OpenMetadata does not seem to properly interpret or map this lineage information, resulting in missing lineage for external tables.
Databricks documentation: https://docs.databricks.com/aws/en/admin/system-tables/lineage
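What appears to be missing can be shown in code. When a lineage row carries only source_path or target_path, the path must be matched against the storage locations of known external tables before an edge can be drawn. The Python sketch below uses the column names from the Databricks lineage system table and assumes a hypothetical `locations` index (storage location → catalog.schema.table). It is an illustration, not OpenMetadata's actual ingestion code.

```python
from typing import Mapping, Optional, Tuple


def _normalize(path: str) -> str:
    # Drop trailing slashes so ".../external_table/" and ".../external_table" compare equal.
    return path.rstrip("/")


def resolve_endpoint(
    table_full_name: Optional[str],
    path: Optional[str],
    locations: Mapping[str, str],
) -> Optional[str]:
    """Resolve one side (source or target) of a lineage row to a table full name.

    `locations` is a HYPOTHETICAL index from each external table's storage
    location to its catalog.schema.table name; building it is out of scope here.
    """
    if table_full_name:
        # Name-based reference: Databricks already recorded the table name.
        return table_full_name
    if not path:
        return None
    index = {_normalize(loc): name for loc, name in locations.items()}
    candidate = _normalize(path)
    # Walk up the path so a file or subfolder inside a table location still matches it.
    while candidate:
        if candidate in index:
            return index[candidate]
        if "/" not in candidate:
            break
        candidate = candidate.rsplit("/", 1)[0]
    return None


def lineage_edge(
    row: Mapping[str, Optional[str]],
    locations: Mapping[str, str],
) -> Optional[Tuple[str, str]]:
    """Return (upstream, downstream) for one lineage row, or None if a side is unresolved."""
    source = resolve_endpoint(row.get("source_table_full_name"), row.get("source_path"), locations)
    target = resolve_endpoint(row.get("target_table_full_name"), row.get("target_path"), locations)
    if source and target:
        return source, target
    return None


# Example modeled on the reproduction steps; the storage account and the
# external table's name ("main.raw.external_table") are placeholders.
locations = {
    "abfss://raw@example.dfs.core.windows.net/external_table/": "main.raw.external_table",
}
row = {
    "source_table_full_name": None,
    "source_path": "abfss://raw@example.dfs.core.windows.net/external_table",
    "target_table_full_name": "bronze_ns.deltalake_ns.managed_table_ns",
    "target_path": None,
}
print(lineage_edge(row, locations))
# ('main.raw.external_table', 'bronze_ns.deltalake_ns.managed_table_ns')
```

Walking up the path means a row that points at a file or partition folder inside the table's location still resolves to that table. Rows whose path matches no known location are skipped rather than guessed.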

To Reproduce
1 - Create or use an external table in Databricks backed by cloud storage (e.g., ADLS).
2 - Create a table that reads from the external table through its storage path. For example: CREATE TABLE bronze_ns.deltalake_ns.managed_table_ns AS SELECT * FROM delta.`abfss://raw@[storage].dfs.core.windows.net/external_table`
3 - Verify the lineage data in Databricks.
4 - Run OpenMetadata ingestion for Databricks lineage.
5 - Check the lineage graph in OpenMetadata.
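For step 3, the lineage rows can be read from the `system.access.table_lineage` system table described in the linked Databricks documentation. The Python sketch below shows a query for the target table from step 2 and a helper that flags the sides recorded only by storage path. It assumes the query is run separately (for example in a notebook or through a SQL client) and that its rows are passed in as dicts. The sample row is hypothetical.

```python
from typing import List, Mapping, Optional

# Query against the lineage system table from the linked Databricks docs,
# filtered to the target table created in step 2.
LINEAGE_QUERY = """
SELECT source_table_full_name, source_path,
       target_table_full_name, target_path, event_time
FROM system.access.table_lineage
WHERE target_table_full_name = 'bronze_ns.deltalake_ns.managed_table_ns'
ORDER BY event_time DESC
"""


def path_only_sides(row: Mapping[str, Optional[str]]) -> List[str]:
    """Return the sides ("source"/"target") recorded only by storage path, with no table name."""
    return [
        side
        for side in ("source", "target")
        if row.get(f"{side}_path") and not row.get(f"{side}_table_full_name")
    ]


# Hypothetical row shaped like a result of LINEAGE_QUERY for the path-based read.
row = {
    "source_table_full_name": None,
    "source_path": "abfss://raw@example.dfs.core.windows.net/external_table",
    "target_table_full_name": "bronze_ns.deltalake_ns.managed_table_ns",
    "target_path": None,
}
print(path_only_sides(row))  # ['source']
```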

Screenshots or steps to reproduce
Databricks: [screenshot not preserved]

OpenMetadata: [screenshot not preserved]

Expected behavior

  • Correctly associate the source_path and target_path values with the corresponding external table entities.
  • Display full lineage (upstream/downstream) for external tables in the UI.

Version:

  • OS: Ubuntu
  • Python version: 3.10.19
  • OpenMetadata version: 1.12.5
  • OpenMetadata Ingestion package version: 1.12.5

Additional context
When an external table is referenced by its cloud storage path, Databricks records the lineage with the path instead of the table name, so no lineage is generated in OpenMetadata.
The permissions required for lineage extraction were granted.
