learn-pr/wwl-azure/design-data-integration/includes/4-solution-azure-data-brick.md
Lines changed: 30 additions & 14 deletions
@@ -6,15 +6,18 @@

 Azure Databricks is entirely based on Apache Spark, and it's a great tool for users who are already familiar with the open-source cluster-computing framework. Databricks is designed specifically for big data processing. Data scientists can take advantage of the built-in core API for core languages like SQL, Java, Python, R, and Scala.

-Azure Databricks has a Control plane and a Data plane:
+Azure Databricks has a Control plane and a Compute plane:

-- **Control Plane**: Hosts Databricks jobs, notebooks with query results, and the cluster manager. The Control plane also has the web application, hive metastore, and security access control lists (ACLs), and user sessions. Microsoft manages these components in collaboration with Azure Databricks.
-- **Data Plane**: Contains all the Azure Databricks runtime clusters that are hosted within the workspace. All data processing and storage exists within the client subscription. No data processing ever takes place within the Microsoft/Databricks-managed subscription.
+- **Control Plane**: Hosts Databricks jobs, notebooks with query results, and the cluster manager. The Control plane also has the web application, security access control lists (ACLs), and user sessions. Microsoft manages these components in collaboration with Azure Databricks.
+
+- **Compute Plane**: Contains all the Azure Databricks runtime clusters that are hosted within the workspace. All data processing and storage exists within the client subscription.

 Azure Databricks offers three environments for developing data intensive applications.

 - **Databricks SQL**: Azure Databricks SQL provides an easy-to-use platform for analysts who want to run SQL queries on their data lake. You can create multiple visualization types to explore query results from different perspectives, and build and share dashboards.
-- **Databricks Data Science & Engineering**: Azure Databricks Data Science & Engineering is an interactive *workspace* that enables collaboration between data engineers, data scientists, and machine learning engineers. For a big data pipeline, the data (raw or structured) is ingested into Azure through Azure Data Factory in batches, or streamed near real-time by using Apache Kafka, Azure Event Hubs, or Azure IoT Hub. The data lands in a data lake for long term persisted storage within Azure Blob Storage or Azure Data Lake Storage. As part of your analytics workflow, use Azure Databricks to read data from multiple data sources and turn it into breakthrough insights by using Spark.
+
+- **Databricks Data Science & Engineering**: Azure Databricks Data Science & Engineering lets data teams work together in an interactive workspace. Data is brought into Azure through batch or real-time tools like Azure Data Factory, Kafka, Event Hubs, or IoT Hub. Data is stored in Azure Blob Storage or Data Lake Storage. Databricks reads data from these sources and uses Spark to generate insights.
+
 - **Databricks Machine Learning**: Azure Databricks Machine Learning is an integrated end-to-end machine learning environment. It incorporates managed services for experiment tracking, model training, feature development and management, and feature and model serving.

 #### Business scenario
@@ -23,21 +26,34 @@ Let's analyze a scenario for Tailwind Traders in the heavy machinery manufacturi

 Let's review why Azure Databricks can be the right choice to meet these requirements.

-- Azure Databricks provides an integrated Analytics *workspace* based on Apache Spark that allows collaboration between different users.
-- By using Spark components like Spark SQL and Dataframes, Azure Databricks can handle structured data. It integrates with real-time data ingestion tools like Kafka and Flume for processing streaming data.
-- Secure data integration capabilities built on top of Spark enable you to unify your data without centralization. Data scientists can visualize data in a few steps, and use familiar tools like Matplotlib, ggplot, or d3.
-- The Azure Databricks runtime abstracts out the infrastructure complexity and the need for specialized expertise to set up and configure your data infrastructure. Users can use existing languages skills for Python, Scala, and R, and explore the data.
-- Azure Databricks integrates deeply with Azure databases and stores like Azure Synapse Analytics, Azure Cosmos DB, Azure Data Lake Storage, and Azure Blob Storage. It supports diverse data store platforms, which satisfies the Tailwind Traders big data storage needs.
-- Integration with Power BI allows for quick and meaningful insights, which is a requirement for Tailwind Traders.
-- Azure Databricks SQL isn't the right choice because it can't handle unstructured data.
-- Azure Databricks Machine Learning is also not the right environment choice because machine learning isn't a requirement in this scenario.
+- Azure Databricks is an analytics workspace built on Apache Spark.
+
+- Supports collaboration and handles both structured and streaming data.
+
+- Integrates with real-time tools like Kafka and Flume.
+
+- Lets users work with Python, Scala, or R.
+
+- Connects to Azure databases and storage solutions, meeting big data needs.
+
+- Works with Power BI for fast insights.
+
+- Databricks SQL and Machine Learning aren't suitable here, as unstructured data and machine learning aren't required.
+

 ### Things to consider when using Azure Databricks

 You can use Azure Databricks as a solution for multiple scenarios. Consider how the service can benefit your data integration solution for Tailwind Traders.

 - **Consider data science preparation of data**. Create, clone, and edit clusters of complex, unstructured data. Turn the data clusters into specific jobs. Deliver the results to data scientists and data analysts for review.
+
 - **Consider insights in the data**. Implement Azure Databricks to build recommendation engines, churn analysis, and intrusion detection.
+
 - **Consider productivity across data and analytics teams**. Create a collaborative environment and shared workspaces for data engineers, analysts, and scientists. Teams can work together across the data science lifecycle with shared workspaces, which helps to save valuable time and resources.
-- **Consider big data workloads**. Exercise Azure Data Lake and the engine to get the best performance and reliability for your big data workloads. Create no-fuss multi-step data pipelines.
-- **Consider machine learning programs**. Take advantage of the integrated end-to-end machine learning environment. It incorporates managed services for experiment tracking, model training, feature development and management, and feature and model serving.
+
+- **Consider big data workloads**. Use Azure Data Lake and the engine to get the best performance and reliability for your big data workloads. Create no-fuss multi-step data pipelines.
+
+- **Consider machine learning programs**. Take advantage of the integrated end-to-end machine learning environment. It incorporates managed services for experiment tracking, model training, feature development and management, and feature and model serving.
+
+> [!Tip]
+> Learn more with self-paced training, [Explore Azure Databricks](/training/wwl-databricks/explore-azure-databricks/).