---
title: Capture Event Hubs Data to Parquet in Azure Data Lake Storage Gen2
description: Shows you how to use the Stream Analytics no code editor to create a job that captures Event Hubs data in to Azure Data Lake Storage Gen2 in the parquet format.
author: xujxu
ms.author: xujiang1
ms.reviewer: spelluru
ms.service: azure-stream-analytics
ms.topic: tutorial
ms.date: 03/25/2026
ms.custom: sfi-image-nochange
---
# Tutorial: Capture Event Hubs data in Parquet format and analyze with Azure Synapse Analytics
Azure Event Hubs generates large volumes of streaming data that you often need to store for analysis. This tutorial shows you how to capture that data in Parquet format - a columnar storage format optimized for analytics workloads - by using Azure Stream Analytics without writing any code.
Use the Stream Analytics no code editor to build a job that streams data from Event Hubs directly to Azure Data Lake Storage Gen2. Then, query the captured Parquet files by using Azure Synapse Analytics with both Spark and serverless SQL.
In this tutorial, you learn how to:
> [!div class="checklist"]
> * Deploy an event generator that sends sample events to an event hub
> * Create a Stream Analytics job by using the no code editor
> * Review input data and schema
> * Configure Azure Data Lake Storage Gen2 to capture event hub data
> * Run the Stream Analytics job
> * Use Azure Synapse Analytics to query the Parquet files
## Prerequisites
Before you start, make sure you complete the following steps:
* If you don't have an Azure subscription, create a [free account](https://azure.microsoft.com/pricing/purchase-options/azure-account?cid=msft_learn).
* [Deploy the TollApp event generator app to Azure](https://portal.azure.com/#create/Microsoft.Template/uri/https%3A%2F%2Fraw.githubusercontent.com%2FAzure%2Fazure-stream-analytics%2Fmaster%2FSamples%2FTollApp%2FVSProjects%2FTollAppDeployment%2Fazuredeploy.json). Set the `interval` parameter to 1, and use a new resource group for this step.
* Create an [Azure Synapse Analytics workspace](../synapse-analytics/get-started-create-workspace.md) with a Data Lake Storage Gen2 account.
## Use no code editor to create a Stream Analytics job
1. Locate the resource group where you deployed the TollApp event generator.
1. Select the Azure Event Hubs **namespace**. You might want to open it in a separate tab or window.
:::image type="content" source="./media/stream-analytics-no-code/resource-group.png" alt-text="Screenshot showing the selection of the Event Hubs namespace in the resource group." lightbox="./media/stream-analytics-no-code/resource-group.png":::
1. On the **Event Hubs namespace** page, select **Event Hubs** under **Entities** in the left menu.
1. Select the `entrystream` instance.
:::image type="content" source="./media/stream-analytics-no-code/select-event-hub.png" alt-text="Screenshot showing the selection of the event hub." lightbox="./media/stream-analytics-no-code/select-event-hub.png":::
1. On the **Event Hubs instance** page, select **Process data** in the **Features** section of the left menu.
1. Select **Start** on the **Capture data to ADLS Gen2 in Parquet format** tile.
:::image type="content" source="./media/stream-analytics-no-code/parquet-capture-start.png" alt-text="Screenshot showing the selection of the **Capture data to ADLS Gen2 in Parquet format** tile." lightbox="./media/stream-analytics-no-code/parquet-capture-start.png":::
1. Enter a name for the Stream Analytics job, and then select **Create**.
:::image type="content" source="./media/stream-analytics-no-code/new-stream-analytics-job.png" alt-text="Screenshot of the New Stream Analytics job page." lightbox="./media/stream-analytics-no-code/new-stream-analytics-job.png":::
1. On the **event hub** configuration page, follow these steps:
1. For **Consumer group**, select **Use existing**.
1. Confirm that the `$Default` consumer group is selected.
1. Confirm that **Serialization** is set to JSON.
    1. Confirm that **Authentication method** is set to **Connection String**. To keep this tutorial simple, it uses connection string authentication. In production scenarios, we recommend using [Azure Managed Identity](stream-analytics-user-assigned-managed-identity-overview.md) for better security and easier management. For more information, see [Use managed identities to access Event Hubs from an Azure Stream Analytics job](event-hubs-managed-identity.md).
1. Confirm that **Event hub shared access key name** is set to **RootManageSharedAccessKey**.
1. Select **Connect** at the bottom of the window.
:::image type="content" source="./media/event-hubs-parquet-capture-tutorial/event-hub-configuration.png" alt-text="Screenshot of the configuration page for your event hub." lightbox="./media/event-hubs-parquet-capture-tutorial/event-hub-configuration.png":::
1. Within a few seconds, you see sample input data and the schema. You can choose to drop fields, rename fields, or change data types.
:::image type="content" source="./media/event-hubs-parquet-capture-tutorial/data-preview.png" alt-text="Screenshot showing the fields and preview of data." lightbox="./media/event-hubs-parquet-capture-tutorial/data-preview.png":::
1. Select the **Azure Data Lake Storage Gen2** tile on your canvas and configure it by specifying:
    * Subscription where your Azure Data Lake Storage Gen2 account is located
    * Storage account name, which should be the same Azure Data Lake Storage Gen2 account that you used with your Azure Synapse Analytics workspace in the Prerequisites section.
* Container where the Parquet files are created.
* For **Delta table path**, specify a name for the table.
    * Date and time patterns. Keep the defaults, `yyyy-MM-dd` and `HH`.
    * Select **Connect**.
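The date and time patterns determine the folder layout of the captured files. As a rough sketch (the exact layout is an assumption here, and `mytable` is a hypothetical stand-in for the Delta table path you chose), the patterns map an event's timestamp to a path like this:

```python
# Sketch of how the default date (yyyy-MM-dd) and time (HH) patterns
# could translate into a folder path. 'mytable' is a hypothetical
# Delta table path; the layout shown is an assumption, not the
# documented output contract.
from datetime import datetime, timezone

event_time = datetime(2026, 3, 25, 9, 30, tzinfo=timezone.utc)
folder = "mytable/{}/{}".format(
    event_time.strftime("%Y-%m-%d"),  # yyyy-MM-dd -> 2026-03-25
    event_time.strftime("%H"),        # HH         -> 09
)
print(folder)  # mytable/2026-03-25/09
```

Partitioning by date and hour like this lets downstream queries skip folders outside the time range they care about.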
:::image type="content" source="./media/event-hubs-parquet-capture-tutorial/data-lake-storage-settings.png" alt-text="Screenshot showing the configuration settings for the Data Lake Storage." lightbox="./media/event-hubs-parquet-capture-tutorial/data-lake-storage-settings.png":::
1. Select **Save** in the top ribbon to save your job, and then select **Start** to run your job. Once the job starts, select X in the right corner to close the **Stream Analytics job** page.
1. You see a list of all Stream Analytics jobs created by using the no code editor. Within two minutes, your job goes to a **Running** state. Select the **Refresh** button on the page to see the status change from **Created** to **Starting** to **Running**.
:::image type="content" source="./media/event-hubs-parquet-capture-tutorial/job-list.png" alt-text="Screenshot showing the list of Stream Analytics jobs." lightbox="./media/event-hubs-parquet-capture-tutorial/job-list.png":::
## View output in your Azure Data Lake Storage Gen2 account
1. Locate the Azure Data Lake Storage Gen2 account you used in the previous step.
1. Select **Containers** under the **Data storage** section in the left menu.
:::image type="content" source="./media/stream-analytics-no-code/select-container.png" alt-text="Screenshot showing the selection of the container in Azure Data Lake Storage Gen 2." lightbox="./media/stream-analytics-no-code/select-container.png":::
1. Select the container you used in the previous step. You see Parquet files created in the folder you specified earlier.
:::image type="content" source="./media/stream-analytics-no-code/capture-parquet-files.png" alt-text="Screenshot showing the captured parquet files in Azure Data Lake Storage Gen 2." lightbox="./media/stream-analytics-no-code/capture-parquet-files.png":::
## Query captured data in Parquet format with Azure Synapse Analytics
### Query using Azure Synapse Spark
1. Locate your Azure Synapse Analytics workspace.
1. [Create a serverless Apache Spark pool](../synapse-analytics/get-started-analyze-spark.md#create-a-serverless-apache-spark-pool) in your workspace if one doesn't already exist.
1. Select the **Open Synapse Studio** tile in the **Getting started** section to launch Synapse Studio in a new tab or window.
1. In the Synapse Studio, go to the **Develop** hub and create a new **Notebook**.
:::image type="content" source="./media/stream-analytics-no-code/synapse-studio-develop-notebook.png" alt-text="Screenshot showing the Synapse Studio." :::
1. Create a new code cell and paste the following code in that cell. Replace *container* and *adlsname* with the name of the container and Azure Data Lake Storage Gen2 account used in the previous step.
    ```py
    df = spark.read.load('abfss://container@adlsname.dfs.core.windows.net/*/*.parquet', format='parquet')
    display(df.limit(10))
df.printSchema()
```
1. For **Attach to** on the toolbar, select your Spark pool from the dropdown list.
1. Select **Run All** to see the results.
:::image type="content" source="./media/event-hubs-parquet-capture-tutorial/spark-run-all.png" alt-text="Screenshot of spark run results in Azure Synapse Analytics." lightbox="./media/event-hubs-parquet-capture-tutorial/spark-run-all.png":::
### Query using Azure Synapse Serverless SQL
1. In the **Develop** hub, create a new **SQL script**.
:::image type="content" source="./media/event-hubs-parquet-capture-tutorial/develop-sql-script.png" alt-text="Screenshot showing the Develop page with new SQL script menu selected.":::
1. Paste the following script and **Run** it by using the **Built-in** serverless SQL endpoint. Replace *container* and *adlsname* with the name of the container and Azure Data Lake Storage Gen2 account used in the previous step.
```SQL
SELECT
        TOP 100 *
    FROM
        OPENROWSET(
            BULK 'https://adlsname.dfs.core.windows.net/container/*/*.parquet',
            FORMAT = 'PARQUET'
        ) AS [result]
    ```
:::image type="content" source="./media/event-hubs-parquet-capture-tutorial/sql-results.png" alt-text="Screenshot of SQL script results in Azure Synapse Analytics." lightbox="./media/event-hubs-parquet-capture-tutorial/sql-results.png":::
## Clean up resources
1. Locate your Event Hubs instance and see the list of Stream Analytics jobs under the **Process Data** section. Stop any jobs that are running.
1. Go to the resource group you used while deploying the TollApp event generator.
1. Select **Delete resource group**. To confirm deletion, type the name of the resource group.
## Next steps
In this tutorial, you learned how to create a Stream Analytics job by using the no code editor to capture Event Hubs data streams in Parquet format. You then used Azure Synapse Analytics to query the Parquet files by using both Synapse Spark and Synapse SQL.
> [!div class="nextstepaction"]
> [No code stream processing with Azure Stream Analytics](https://aka.ms/asanocodeux)