
Commit 27d5ee9

Update Azure Stream Analytics module content for clarity and accuracy
1 parent 9a6f09e commit 27d5ee9

9 files changed: 104 additions & 87 deletions

File tree

learn-pr/wwl-data-ai/ingest-streaming-data-use-azure-stream-analytics-synapse/7-knowledge-check.yml

Lines changed: 24 additions & 13 deletions
@@ -12,25 +12,36 @@ metadata:
 durationInMinutes: 3
 quiz:
   questions:
-  - content: "Which type of output should you use to ingest the results of an Azure Stream Analytics job into a dedicated SQL pool table in Azure Synapse Analytics?"
+  - content: "Which Azure Stream Analytics window type groups events based on periods of inactivity between consecutive events?"
     choices:
-    - content: "Azure Synapse Analytics"
-      isCorrect: true
-      explanation: "Correct. An Azure Synapse Analytics output writes data to a table in an Azure Synapse Analytics dedicated SQL pool."
-    - content: "Blob storage/ADLS Gen2"
+    - content: "Tumbling"
       isCorrect: false
-      explanation: "Incorrect. A Blob storage/ADLS Gen2 output does not write data to a relational table in a dedicated SQL pool."
-    - content: "Azure Event Hubs"
+      explanation: "Incorrect. A tumbling window groups events into fixed-size, nonoverlapping intervals regardless of gaps between events."
+    - content: "Session"
+      isCorrect: true
+      explanation: "Correct. A session window groups events that arrive within a configurable timeout of each other, creating variable-length windows bounded by inactivity gaps."
+    - content: "Snapshot"
       isCorrect: false
-      explanation: "Incorrect. An Azure Event Hubs output does not write data to a relational table in a dedicated SQL pool."
-  - content: "Which type of output should be used to ingest the results of an Azure Stream Analytics job into files in a data lake for analysis in Azure Synapse Analytics?"
+      explanation: "Incorrect. A snapshot window groups events that share the same timestamp using System.Timestamp()."
+  - content: "You need to continuously write processed stream events to files in a data lake for later batch analytics. Which output type should you configure?"
     choices:
-    - content: "Azure Synapse Analytics"
+    - content: "Azure SQL Database"
       isCorrect: false
-      explanation: "Incorrect. An Azure Synapse Analytics output does not write data to files in a data lake."
+      explanation: "Incorrect. An Azure SQL Database output writes to a relational table, not to files in a data lake."
     - content: "Blob storage/ADLS Gen2"
       isCorrect: true
-      explanation: "Correct. A Blob storage/ADLS Gen2 output writes data to files in a data lake."
+      explanation: "Correct. A Blob storage/ADLS Gen2 output writes data to files in Azure Data Lake Storage Gen2, which is suitable for batch analytics workloads."
+    - content: "Power BI"
+      isCorrect: false
+      explanation: "Incorrect. A Power BI output writes to a streaming dataset for near real-time visualization, not to file-based storage."
+  - content: "You want to forward enriched events from a Stream Analytics job to a downstream application via a message hub. Which output type should you use?"
+    choices:
     - content: "Azure Event Hubs"
+      isCorrect: true
+      explanation: "Correct. An Azure Event Hubs output forwards events to an event hub, enabling downstream consumers such as other jobs, functions, or applications to receive the enriched stream."
+    - content: "Azure SQL Database"
+      isCorrect: false
+      explanation: "Incorrect. An Azure SQL Database output writes to a relational table, not to a message hub."
+    - content: "Blob storage/ADLS Gen2"
       isCorrect: false
-      explanation: "Incorrect. An Azure Event Hubs output does not write data to files in a data lake."
+      explanation: "Incorrect. A Blob storage/ADLS Gen2 output writes to files in a data lake, not to a message hub for real-time event forwarding."
Lines changed: 7 additions & 7 deletions
@@ -1,14 +1,14 @@
 
-Suppose a retail company captures real-time sales transaction data from an e-commerce website, and wants to analyze this data along with more static data related to products, customers, and employees. A common way to approach this problem is to ingest the stream of real-time data into a data lake or data warehouse, where it can be queried together with data that is loaded using batch processing techniques.
+Suppose a manufacturing company captures real-time telemetry data from factory floor sensors, and wants to monitor equipment performance, detect anomalies, and archive event data for long-term analysis. A common approach is to use a stream processing engine to continuously filter and aggregate the flow of sensor events, and route the results to one or more destinations—such as a data lake for storage, a relational database for operational reporting, or a message hub for downstream alerting systems.
 
-Microsoft Azure Synapse Analytics provides a comprehensive enterprise data analytics platform, into which real-time data captured in Azure Event Hubs or Azure IoT Hub, and processed by Azure Stream Analytics can be loaded.
+Azure Stream Analytics is a fully managed, cloud-based stream processing service that enables you to build real-time analytics pipelines. It connects to streaming data sources such as Azure Event Hubs, Azure IoT Hub, and Azure Data Lake Storage, processes data using a SQL-based query language, and writes results to a wide range of output destinations.
 
-![A diagram of a data stream in Azure Event Hubs being queried by Azure Stream Analytics and loaded into Azure Synapse Analytics.](../media/stream-ingestion.png)
+![A diagram of a data stream in Azure Event Hubs being queried by Azure Stream Analytics and loaded into multiple output destinations.](../media/stream-ingestion.png)
 
-A typical pattern for real-time data ingestion in Azure consists of the following sequence of service integrations:
+A typical pattern for real-time data processing in Azure consists of the following sequence:
 
 1. A real-time source of data is captured in an event ingestor, such as Azure Event Hubs or Azure IoT Hub.
-2. The captured data is perpetually filtered and aggregated by an Azure Stream Analytics query.
-3. The results of the query are loaded into a data lake or data warehouse in Azure Synapse Analytics for subsequent analysis.
+2. The captured data is perpetually filtered, aggregated, or enriched by an Azure Stream Analytics query.
+3. The results of the query are written to one or more output destinations—such as a data lake, a relational database, another event hub, or a real-time dashboard.
 
-In this module, you'll explore multiple ways in which you can use Azure Stream Analytics to ingest real-time data into Azure Synapse Analytics.
+In this module, you'll learn how to configure Azure Stream Analytics jobs to process streaming data and route the results to a variety of output destinations.
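To make step 2 of this pattern concrete, the following minimal sketch shows the kind of Stream Analytics query this module builds toward. It aggregates sensor readings over one-minute intervals and routes the results to a configured output; the `[sensor-input]` and `[lake-output]` aliases are hypothetical placeholders, not names defined in the module's exercises:

```sql
-- Minimal sketch (hypothetical aliases): compute a per-sensor average
-- over one-minute tumbling windows and route it to a configured output.
SELECT
    SensorID,
    AVG(ReadingValue) AS AvgReading,
    System.Timestamp() AS WindowEnd
INTO
    [lake-output]
FROM
    [sensor-input] TIMESTAMP BY EventEnqueuedUtcTime
GROUP BY SensorID, TumblingWindow(minute, 1)
```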
Lines changed: 12 additions & 11 deletions
@@ -1,21 +1,22 @@
 
-Azure Synapse Analytics provides multiple ways to analyze large volumes of data. Two of the most common approaches to large-scale data analytics are:
+Azure Stream Analytics can route the results of stream processing to multiple types of output destinations, depending on whether you need to store, analyze, forward, or visualize the data.
 
-- **Data warehouses** - relational databases, optimized for distributed storage and query processing. Data is stored in tables and queried using SQL.
-- **Data lakes** - distributed file storage in which data is stored as files that can be processed and queried using multiple runtimes, including Apache Spark and SQL.
+## Data lake storage
 
-## Data warehouses in Azure Synapse Analytics
+A common use case is to write stream processing results to a data lake hosted in Azure Data Lake Storage Gen2. Data stored in a data lake can later be processed and queried using batch analytics tools such as Apache Spark or serverless SQL engines. This approach is well suited to scenarios where you want to retain raw or lightly processed event data for historical analysis, compliance, or machine learning workloads.
 
-Azure Synapse Analytics provides dedicated SQL pools that you can use to implement enterprise-scale relational data warehouses. Dedicated SQL pools are based on a *massively parallel processing* (MPP) instance of the Microsoft SQL Server relational database engine in which data is stored and queried in tables.
+![A diagram of a stream of data being ingested into an Azure Storage data lake.](../media/data-lake.png)
 
-To ingest real-time data into a relational data warehouse, your Azure Stream Analytics query must write its results to an output that references the table into which you want to load the data.
+## Relational database storage
 
-![A diagram of a stream of data being ingested into a dedicated SQL pool in Azure Synapse Analytics.](../media/data-warehouse.png)
+When streaming results need to be available to applications or reporting tools that rely on relational data, you can write the output of a Stream Analytics job to a table in Azure SQL Database or an Azure Synapse Analytics dedicated SQL pool. This approach enables dashboards and reports to query the most recently ingested data using standard SQL.
 
-## Data lakes in Azure Synapse Analytics
+![A diagram of a stream of data being ingested into a relational database.](../media/data-warehouse.png)
 
-An Azure Synapse Analytics workspace typically includes at least one storage service that is used as a data lake. Most commonly, the data lake is hosted in an Azure Storage account using a container configured to support Azure Data Lake Storage Gen2. Files in the data lake are organized hierarchically in directories (folders), and can be stored in multiple file formats, including delimited text (such as comma-separated values, or CSV), Parquet, and JSON.
+## Real-time dashboards
 
-When ingesting real-time data into a data lake, your Azure Stream Analytics query must write its results to an output that references the location in the Azure Data Lake Gen2 storage container where you want to save the data files. Data analysts, engineers, and scientists can then process and query the files in the data lake by running code in an Apache Spark pool, or by running SQL queries using a serverless SQL pool.
+For scenarios that require live visualization of streaming metrics—such as monitoring sensor readings or tracking website activity in real time—Azure Stream Analytics can write output directly to a Power BI streaming dataset. Power BI then renders the data in near real time without requiring a scheduled data refresh.
 
-![A diagram of a stream of data being ingested into an Azure Storage data lake and queried in Azure Synapse Analytics.](../media/data-lake.png)
+## Event forwarding
+
+Azure Stream Analytics can also write filtered or enriched events to another Azure Event Hubs instance. This pattern is used to build multi-stage streaming pipelines, where one Stream Analytics job performs initial filtering or enrichment and forwards the results to a downstream consumer such as another job, an Azure Function, or a custom application.
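A single Stream Analytics job can serve more than one of these destinations at once, because a job's query can contain multiple SELECT...INTO statements. The following sketch illustrates the idea; the `[raw-events]`, `[lake-archive]`, and `[alert-hub]` aliases are hypothetical placeholders:

```sql
-- Sketch with hypothetical aliases: archive every event to a data lake
-- output, and send 30-second peak readings to an event hub output.
SELECT
    *
INTO
    [lake-archive]
FROM
    [raw-events] TIMESTAMP BY EventEnqueuedUtcTime

SELECT
    SensorID,
    MAX(ReadingValue) AS PeakReading
INTO
    [alert-hub]
FROM
    [raw-events] TIMESTAMP BY EventEnqueuedUtcTime
GROUP BY SensorID, TumblingWindow(second, 30)
```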

learn-pr/wwl-data-ai/ingest-streaming-data-use-azure-stream-analytics-synapse/includes/3-configure-inputs-outputs.md

Lines changed: 18 additions & 6 deletions
@@ -2,7 +2,7 @@
 All Azure Stream Analytics jobs include at least one input and output. In most cases, inputs reference sources of streaming data (though you can also define inputs for static reference data to augment the streamed event data). Outputs determine where the results of the stream processing query will be sent. In the case of data ingestion into Azure Synapse Analytics, the output usually references an Azure Data Lake Storage Gen2 container or a table in a dedicated SQL pool database.
 
 > [!NOTE]
-> Azure Stream Analytics offers two authoring experiences: the traditional SQL query editor covered in this module, and a no-code drag-and-drop editor. The no-code editor lets you build complete jobs—including inputs, transformations, and Synapse outputs—visually without writing SQL. You can access it from the **Overview** page of a Stream Analytics job in the Azure portal, or from Azure Event Hubs via **Process Data**. For more information, see [No-code stream processing in Azure Stream Analytics](/azure/stream-analytics/no-code-stream-processing).
+> Azure Stream Analytics offers two authoring experiences: the traditional SQL query editor covered in this module, and a no-code drag-and-drop editor. The no-code editor lets you build complete jobs—including inputs, transformations, and outputs—visually without writing SQL. You can access it from the **Overview** page of a Stream Analytics job in the Azure portal, or from Azure Event Hubs via **Process Data**. For more information, see [No-code stream processing in Azure Stream Analytics](/azure/stream-analytics/no-code-stream-processing).
 
 ## Streaming data inputs
 
@@ -18,20 +18,32 @@ Depending on the specific input type, the data for each streamed event includes
 > [!NOTE]
 > For more information about streaming inputs, see [Stream data as input into Stream Analytics](/azure/stream-analytics/stream-analytics-define-inputs?azure-portal=true) in the Azure Stream Analytics documentation.
 
-## Azure Synapse Analytics outputs
+## Azure SQL Database outputs
 
-If you need to load the results of your stream processing into a table in a dedicated SQL pool, use an **Azure Synapse Analytics** output. The output configuration includes the identity of the dedicated SQL pool in an Azure Synapse Analytics workspace, details of how the Azure Stream Analytics job should establish an authenticated connection to it, and the existing table into which the data should be loaded.
+If you need to load the results of your stream processing into a relational table, use an **Azure SQL Database** output. The output configuration specifies the server name, database name, and the existing table into which data should be written. The table must already exist, and its schema must exactly match the fields and data types produced by your query.
 
-The recommended authentication method is **managed identity**, which eliminates password management overhead and avoids the 90-day token expiration that affects user-based authentication methods. Using managed identity also enables fully automated Stream Analytics deployments without embedded credentials. Alternatively, you can use SQL Server authentication with a username and password. When using an Azure Synapse Analytics output, your Azure Stream Analytics job configuration must include an Azure Storage account in which authentication metadata for the job is stored securely.
+The recommended authentication method is **managed identity**, which eliminates password management overhead and avoids the 90-day token expiration that affects user-based authentication methods. Using managed identity also enables fully automated Stream Analytics deployments without embedded credentials. Alternatively, you can use SQL Server authentication with a username and password.
 
 > [!NOTE]
-> For more information about using an Azure Synapse Analytics output, see [Azure Synapse Analytics output from Azure Stream Analytics](/azure/stream-analytics/azure-synapse-analytics-output?azure-portal=true) in the Azure Stream Analytics documentation.
+> For more information about using an Azure SQL Database output, see [Azure SQL Database output from Azure Stream Analytics](/azure/stream-analytics/sql-database-output?azure-portal=true) in the Azure Stream Analytics documentation.
 
 ## Azure Data Lake Storage Gen2 outputs
 
-If you need to write the results of stream processing to an Azure Data Lake Storage Gen2 container that hosts a data lake in an Azure Synapse Analytics workspace, use a **Blob storage/ADLS Gen2** output. The output configuration includes details of the storage account in which the container is defined, authentication settings to connect to it, and details of the files to be created. You can specify the file format, including CSV, JSON, Parquet, and Delta formats. You can also specify custom patterns to define the folder hierarchy in which the files are saved - for example using a pattern such as *YYYY/MM/DD* to generate a folder hierarchy based on the current year, month, and day.
+If you need to write the results of stream processing to files in a data lake, use a **Blob storage/ADLS Gen2** output. The output configuration includes details of the storage account in which the container is defined, authentication settings to connect to it, and details of the files to be created. You can specify the file format, including CSV, JSON, Parquet, and Delta formats. You can also specify custom patterns to define the folder hierarchy in which the files are saved - for example, using a pattern such as *YYYY/MM/DD* to generate a folder hierarchy based on the current year, month, and day.
 
 You can specify minimum and maximum row counts for each batch, which determines the number of output files generated (each batch creates a new file). You can also configure the *write mode* to control when the data is written for a time window - appending each row as it arrives or writing all rows once (which ensures "exactly once" delivery).
 
 > [!NOTE]
 > For more information about using a Blob storage/ADLS Gen2 output, see [Blob storage and Azure Data Lake Gen2 output from Azure Stream Analytics](/azure/stream-analytics/blob-storage-azure-data-lake-gen2-output?azure-portal=true) in the Azure Stream Analytics documentation.
+
+## Additional output types
+
+Azure Stream Analytics supports a wide range of output destinations beyond data lakes and relational databases:
+
+- **Azure Event Hubs** — forward filtered or enriched events to another event hub for downstream consumers or multi-stage pipelines.
+- **Power BI** — write aggregated streaming metrics directly to a Power BI streaming dataset for near real-time visualization without a scheduled refresh.
+- **Azure Cosmos DB** — write results to a globally distributed NoSQL database.
+- **Azure Functions** — trigger serverless functions in response to stream events.
+
+> [!NOTE]
+> For the full list of supported output types, see [Understand outputs from Azure Stream Analytics](/azure/stream-analytics/stream-analytics-define-outputs?azure-portal=true) in the Azure Stream Analytics documentation.
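As an illustration of the path-pattern behavior described above (a sketch; the container, pattern, and file names are hypothetical), a blob output configured with the *{date}* token could map results to dated folders like this:

```
Path pattern:  telemetry/{date}
Date format:   YYYY/MM/DD

Result files written on 15 July 2025 land under:
  telemetry/2025/07/15/<generated-file-name>.parquet
```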

learn-pr/wwl-data-ai/ingest-streaming-data-use-azure-stream-analytics-synapse/includes/4-define-query.md

Lines changed: 5 additions & 5 deletions
@@ -3,21 +3,21 @@ After defining the input(s) and output(s) for your Azure Stream Analytics job, y
 
 ## Selecting input fields
 
-The simplest approach to ingesting streaming data into Azure Synapse Analytics is to capture the required field values for every event using a **SELECT...INTO** query, as shown here:
+The simplest approach to capturing event data from an input stream is to select the required field values for every event using a **SELECT...INTO** query, as shown here:
 
 ```sql
 SELECT
     EventEnqueuedUtcTime AS ReadingTime,
     SensorID,
     ReadingValue
 INTO
-    [synapse-output]
+    [output]
 FROM
     [streaming-input] TIMESTAMP BY EventEnqueuedUtcTime
 ```
 
 > [!TIP]
-> When using an **Azure Synapse Analytics** output to write the results to a table in a dedicated SQL pool, the schema of the results produced by the query must match the table into which the data is to be loaded. You can use **AS** clauses to rename fields, and cast them to alternative (compatible) data types as necessary.
+> When using an **Azure SQL Database** output to write the results to a relational table, the schema of the results produced by the query must match the table into which the data is to be loaded. You can use **AS** clauses to rename fields, and cast them to alternative (compatible) data types as necessary.
 
 ## Filtering event data
 
@@ -29,7 +29,7 @@
     SensorID,
     ReadingValue
 INTO
-    [synapse-output]
+    [output]
 FROM
     [streaming-input] TIMESTAMP BY EventEnqueuedUtcTime
 WHERE ReadingValue < 0
@@ -57,7 +57,7 @@
     SensorID,
     MAX(ReadingValue) AS MaxReading
 INTO
-    [synapse-output]
+    [output]
 FROM
     [streaming-input] TIMESTAMP BY EventEnqueuedUtcTime
 GROUP BY SensorID, TumblingWindow(second, 60)
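The tumbling window in the last hunk is only one of the windowing functions the query language offers. As a point of comparison with the session window covered in this module's knowledge check, a minimal sketch (reusing the same placeholder aliases) might group events by gaps in activity like this:

```sql
-- Sketch: close each window after 5 minutes of inactivity per sensor,
-- with a 60-minute cap on total session length.
SELECT
    SensorID,
    COUNT(*) AS EventsInSession,
    System.Timestamp() AS SessionEnd
INTO
    [output]
FROM
    [streaming-input] TIMESTAMP BY EventEnqueuedUtcTime
GROUP BY SensorID, SessionWindow(minute, 5, 60)
```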
