Traditional analytics architectures often force you to choose between two approaches. Data lakes offer flexibility and scalability but lack the structure and performance needed for business analytics. Data warehouses provide strong analytical capabilities but struggle with diverse data formats and can be costly to scale. **Lakehouses** bridge this gap by bringing database-like capabilities directly to your data lake, eliminating the need to maintain separate systems for different workloads.

## Understand lakehouse design

A lakehouse organizes data into two main areas: **Tables** and **Files**. Understanding this separation helps you design effective data workflows.

**Tables folder**: This folder contains Delta Lake tables that provide structured, queryable data. Tables in this folder:

- Support SQL queries through the SQL analytics endpoint
- Enforce schemas and support ACID transactions
- Can be accessed in Power BI for reporting
- Benefit from automatic optimization and maintenance

**Files folder**: This folder stores raw or semi-structured data files in their native format. Files in this folder:

- Support any file format (CSV, JSON, Parquet, images, documents)
- Provide flexibility for data exploration and processing
- Can be staged before transformation into tables
- Don't enforce schema or support direct SQL queries

This separation lets you maintain both raw data (for compliance or reprocessing) and structured tables (for analytics) within the same lakehouse. You can process files using Spark notebooks or Dataflows Gen2, then load the results into tables for querying and reporting.
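
The files-then-tables flow described above can be sketched in pure Python. This is a conceptual stand-in, not Fabric code: a real workload would use a Spark notebook or Dataflows Gen2, and the column names and sample rows here are invented for illustration. The point is the contract a table adds that a raw file lacks: rows that fail the declared schema never land in the table.

```python
import csv
import io

# Hypothetical raw file as it might sit in the Files area: one row has a
# value that violates the intended schema.
RAW_CSV = """order_id,amount,region
1001,250.00,West
1002,not-a-number,East
1003,75.50,West
"""

# Declared schema for the target table: column name -> expected type.
TABLE_SCHEMA = {"order_id": int, "amount": float, "region": str}

def load_to_table(raw_text, schema):
    """Return (loaded_rows, rejected_rows), enforcing column types on load."""
    loaded, rejected = [], []
    for row in csv.DictReader(io.StringIO(raw_text)):
        try:
            loaded.append({col: cast(row[col]) for col, cast in schema.items()})
        except (ValueError, KeyError):
            rejected.append(row)  # bad rows are quarantined, not loaded
    return loaded, rejected

table, quarantine = load_to_table(RAW_CSV, TABLE_SCHEMA)
print(len(table), len(quarantine))  # 2 1
```

Keeping the rejected rows around mirrors the lakehouse pattern: the raw file stays in Files for reprocessing, while only validated, typed rows reach the table.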

## Understand Delta Lake tables

At the heart of a lakehouse are **Delta Lake tables**. Delta Lake is an open-source storage layer that brings reliability to data lakes. When you create a table in a lakehouse, the data is stored in Delta format in the underlying OneLake storage.

Delta Lake tables provide several key advantages:

- **ACID transactions**: Delta Lake ensures data consistency even when multiple users read and write data simultaneously.
- **Schema enforcement**: Delta Lake validates that the data you write matches the table schema, preventing corrupt data.
- **Time travel**: Delta Lake maintains a transaction log that lets you query previous versions of your data or roll back changes.
- **Efficient updates and deletes**: Unlike traditional data lake files, Delta tables support efficient update and delete operations.

Each Delta table consists of Parquet data files plus a transaction log that tracks all changes. This architecture enables both batch and streaming workloads to work reliably with the same data.
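
To make the "Parquet files plus a transaction log" idea concrete, here's a toy, stdlib-only model of such a log. This is not the real Delta Lake format (real commits are JSON files in a `_delta_log` folder carrying far more metadata); it only shows how replaying an append-only log up to a given version yields that version's set of live data files, which is the mechanism behind time travel.

```python
import json

class ToyDeltaLog:
    """Toy model of a Delta-style transaction log (illustrative only)."""

    def __init__(self):
        self.commits = []  # ordered list of JSON-serialized commit entries

    def commit(self, add=(), remove=()):
        """Record a new table version by adding/removing data files."""
        entry = {"version": len(self.commits),
                 "add": list(add), "remove": list(remove)}
        self.commits.append(json.dumps(entry))
        return entry["version"]

    def files_at_version(self, version):
        """Time travel: replay the log up to `version` to list live files."""
        live = set()
        for raw in self.commits[: version + 1]:
            entry = json.loads(raw)
            live |= set(entry["add"])
            live -= set(entry["remove"])
        return sorted(live)

log = ToyDeltaLog()
log.commit(add=["part-000.parquet"])  # version 0: initial load
log.commit(add=["part-001.parquet"])  # version 1: append
# version 2: an update/delete rewrites a file rather than editing it in place
log.commit(add=["part-002.parquet"], remove=["part-000.parquet"])

print(log.files_at_version(0))  # ['part-000.parquet']
print(log.files_at_version(2))  # ['part-001.parquet', 'part-002.parquet']
```

Because old data files are removed from the log rather than physically deleted right away, querying version 0 still works after the rewrite, and concurrent readers always see a consistent snapshot.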

## Manage lakehouse access

When you centralize data in your lakehouse, protecting that data becomes critical. Fabric provides layered access controls to secure lakehouse data at multiple levels.

Use **workspace roles** for collaborators who need access to all items in the workspace. Use **item-level sharing** to grant read-only access for specific needs, such as analytics or Power BI report development.

For granular control, the SQL analytics endpoint supports **row-level** and **column-level security**, so you can restrict what specific users see when they query through SQL. If you organize tables into schemas, you can also apply **schema-level permissions** to control access by business domain.
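
The row-level security idea can be illustrated with a small Python sketch. In Fabric you define this declaratively with T-SQL security policies on the SQL analytics endpoint rather than in application code; the users, regions, and table below are hypothetical. The sketch only shows the core mechanic: a per-user predicate is applied before any rows are returned.

```python
# Hypothetical sales table and user-to-region entitlements.
SALES = [
    {"order_id": 1, "region": "West", "amount": 100},
    {"order_id": 2, "region": "East", "amount": 200},
    {"order_id": 3, "region": "West", "amount": 300},
]

USER_REGIONS = {"alice": {"West"}, "bob": {"East"}, "auditor": {"West", "East"}}

def query_sales(user):
    """Apply the row-level predicate before returning any rows."""
    allowed = USER_REGIONS.get(user, set())
    return [row for row in SALES if row["region"] in allowed]

print([r["order_id"] for r in query_sales("alice")])  # [1, 3]
print([r["order_id"] for r in query_sales("bob")])    # [2]
```

The benefit of enforcing this in the SQL endpoint rather than in each report is that every query path, including ad hoc SQL and Power BI, sees the same filtered view.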

Fabric lakehouses also support data governance features, including sensitivity labels, and can be extended by using Microsoft Purview with your Fabric tenant.

> [!NOTE]
> For more information, see the [Security in Microsoft Fabric](/fabric/security/security-overview) documentation.

## Build a foundation for intelligent analytics

The data you structure in a lakehouse doesn't just serve traditional reports and dashboards. Well-organized lakehouse data becomes the foundation that intelligent experiences across Microsoft Fabric depend on.

When you create tables with clear schemas, consistent naming conventions, and descriptive column names, you make that data accessible to both human analysts and AI-powered tools. Fabric IQ data agents can query your lakehouse tables through the SQL analytics endpoint, translating natural language questions into SQL queries that return accurate answers. The quality of those answers depends directly on how well you structure and document your data.

Copilot capabilities in Fabric also benefit from well-structured lakehouse data. Copilot in Power BI can generate reports and answer business questions when it can reason over clearly defined tables and relationships. The same lakehouse data can feed semantic models that support natural language exploration across Microsoft 365 experiences.

This means the investment you make in organizing, naming, and structuring lakehouse data pays dividends beyond your immediate analytics needs. Good data engineering practices in the lakehouse create a reusable foundation for intelligent experiences across the platform.