Commit 3a30961

Update 4-solution-azure-data-brick.md
1 parent ac3bed0 commit 3a30961

1 file changed

Lines changed: 30 additions & 14 deletions

learn-pr/wwl-azure/design-data-integration/includes/4-solution-azure-data-brick.md

@@ -6,15 +6,18 @@
 
 Azure Databricks is entirely based on Apache Spark, and it's a great tool for users who are already familiar with the open-source cluster-computing framework. Databricks is designed specifically for big data processing. Data scientists can take advantage of the built-in core API for core languages like SQL, Java, Python, R, and Scala.
 
-Azure Databricks has a Control plane and a Data plane:
+Azure Databricks has a Control plane and a Compute plane:
 
-- **Control Plane**: Hosts Databricks jobs, notebooks with query results, and the cluster manager. The Control plane also has the web application, hive metastore, and security access control lists (ACLs), and user sessions. Microsoft manages these components in collaboration with Azure Databricks.
-- **Data Plane**: Contains all the Azure Databricks runtime clusters that are hosted within the workspace. All data processing and storage exists within the client subscription. No data processing ever takes place within the Microsoft/Databricks-managed subscription.
+- **Control Plane**: Hosts Databricks jobs, notebooks with query results, and the cluster manager. The Control plane also has the web application, security access control lists (ACLs), and user sessions. Microsoft manages these components in collaboration with Azure Databricks.
+
+- **Compute Plane**: Contains all the Azure Databricks runtime clusters that are hosted within the workspace. All data processing and storage exists within the client subscription.
 
 Azure Databricks offers three environments for developing data intensive applications.
 
 - **Databricks SQL**: Azure Databricks SQL provides an easy-to-use platform for analysts who want to run SQL queries on their data lake. You can create multiple visualization types to explore query results from different perspectives, and build and share dashboards.
-- **Databricks Data Science & Engineering**: Azure Databricks Data Science & Engineering is an interactive *workspace* that enables collaboration between data engineers, data scientists, and machine learning engineers. For a big data pipeline, the data (raw or structured) is ingested into Azure through Azure Data Factory in batches, or streamed near real-time by using Apache Kafka, Azure Event Hubs, or Azure IoT Hub. The data lands in a data lake for long term persisted storage within Azure Blob Storage or Azure Data Lake Storage. As part of your analytics workflow, use Azure Databricks to read data from multiple data sources and turn it into breakthrough insights by using Spark.
+
+- **Databricks Data Science & Engineering**: Azure Databricks Data Science & Engineering lets data teams work together in an interactive workspace. Data is brought into Azure through batch or real-time tools like Azure Data Factory, Kafka, Event Hubs, or IoT Hub. Data is stored in Azure Blob Storage or Data Lake Storage. Databricks reads data from these sources and uses Spark to generate insights.
+
 - **Databricks Machine Learning**: Azure Databricks Machine Learning is an integrated end-to-end machine learning environment. It incorporates managed services for experiment tracking, model training, feature development and management, and feature and model serving.
 
 #### Business scenario
@@ -23,21 +26,34 @@ Let's analyze a scenario for Tailwind Traders in the heavy machinery manufacturi
 
 Let's review why Azure Databricks can be the right choice to meet these requirements.
 
-- Azure Databricks provides an integrated Analytics *workspace* based on Apache Spark that allows collaboration between different users.
-- By using Spark components like Spark SQL and Dataframes, Azure Databricks can handle structured data. It integrates with real-time data ingestion tools like Kafka and Flume for processing streaming data.
-- Secure data integration capabilities built on top of Spark enable you to unify your data without centralization. Data scientists can visualize data in a few steps, and use familiar tools like Matplotlib, ggplot, or d3.
-- The Azure Databricks runtime abstracts out the infrastructure complexity and the need for specialized expertise to set up and configure your data infrastructure. Users can use existing languages skills for Python, Scala, and R, and explore the data.
-- Azure Databricks integrates deeply with Azure databases and stores like Azure Synapse Analytics, Azure Cosmos DB, Azure Data Lake Storage, and Azure Blob Storage. It supports diverse data store platforms, which satisfies the Tailwind Traders big data storage needs.
-- Integration with Power BI allows for quick and meaningful insights, which is a requirement for Tailwind Traders.
-- Azure Databricks SQL isn't the right choice because it can't handle unstructured data.
-- Azure Databricks Machine Learning is also not the right environment choice because machine learning isn't a requirement in this scenario.
+- Azure Databricks is an analytics workspace built on Apache Spark.
+
+- Supports collaboration and handles both structured and streaming data.
+
+- Integrates with real-time tools like Kafka and Flume.
+
+- Lets users work with Python, Scala, or R.
+
+- Connects to Azure databases and storage solutions, meeting big data needs.
+
+- Works with Power BI for fast insights.
+
+- Databricks SQL and Machine Learning aren't suitable here, as unstructured data and machine learning aren't required.
+
 
 ### Things to consider when using Azure Databricks
 
 You can use Azure Databricks as a solution for multiple scenarios. Consider how the service can benefit your data integration solution for Tailwind Traders.
 
 - **Consider data science preparation of data**. Create, clone, and edit clusters of complex, unstructured data. Turn the data clusters into specific jobs. Deliver the results to data scientists and data analysts for review.
+
 - **Consider insights in the data**. Implement Azure Databricks to build recommendation engines, churn analysis, and intrusion detection.
+
 - **Consider productivity across data and analytics teams**. Create a collaborative environment and shared workspaces for data engineers, analysts, and scientists. Teams can work together across the data science lifecycle with shared workspaces, which helps to save valuable time and resources.
-- **Consider big data workloads**. Exercise Azure Data Lake and the engine to get the best performance and reliability for your big data workloads. Create no-fuss multi-step data pipelines.
-- **Consider machine learning programs**. Take advantage of the integrated end-to-end machine learning environment. It incorporates managed services for experiment tracking, model training, feature development and management, and feature and model serving.
+
+- **Consider big data workloads**. Use Azure Data Lake and the engine to get the best performance and reliability for your big data workloads. Create no-fuss multi-step data pipelines.
+
+- **Consider machine learning programs**. Take advantage of the integrated end-to-end machine learning environment. It incorporates managed services for experiment tracking, model training, feature development and management, and feature and model serving.
+
+> [!Tip]
+> Learn more with self-paced training, [Explore Azure Databricks](/training/wwl-databricks/explore-azure-databricks/).
