Commit 4bfff9b

Merge pull request #313749 from jlian/fix/source-doc-overhaul
Data flow docs: restructure source/destination/expressions docs for clarity and consolidation
2 parents 7aca269 + 130f840 commit 4bfff9b

6 files changed

Lines changed: 303 additions & 826 deletions

File tree

articles/iot-operations/connect-to-cloud/concept-dataflow-graphs-expressions.md

Lines changed: 217 additions & 7 deletions
@@ -6,15 +6,13 @@ ms.author: sethm
ms.service: azure-iot-operations
ms.subservice: azure-data-flows
ms.topic: reference
-ms.date: 03/19/2026
+ms.date: 03/26/2026
ai-usage: ai-assisted

---

# Expressions reference for data flows

-[!INCLUDE [kubernetes-management-preview-note](../includes/kubernetes-management-preview-note.md)]
-
This reference applies to both [data flows](overview-dataflow.md) and [data flow graphs](concept-dataflow-graphs.md). Both use the same expression language for map, filter, and enrichment transforms. Data flow graphs also support branch and window (accumulate) transforms, which are noted where applicable.

## Positional variables
@@ -170,11 +168,19 @@ Use `()` (the empty value) in comparisons to detect missing fields.

Read from and write to message metadata by using the `$metadata.` prefix in the `inputs` or `output` fields of a rule. Metadata references go in the field path, not in the expression itself.

### Metadata properties

* **Topic**: Works for both MQTT and Kafka. It contains the topic string that the message was published to. Example: `$metadata.topic`.
* **User property**: In MQTT, this refers to the free-form key/value pairs an MQTT message can carry. For example, if the MQTT message was published with a user property with key "priority" and value "high", then the `$metadata.user_property.priority` reference holds the value "high". User property keys can be arbitrary strings and might require escaping: `$metadata.user_property."weird key"` uses the key "weird key" (with a space).
* **System property**: This term covers every property that isn't a user property. Currently, only a single system property is supported: `$metadata.system_property.content_type`, which reads the content type property of the MQTT message (if set).
* **Header**: This is the Kafka equivalent of the MQTT user property. Kafka can use any binary value for a key, but data flows support only UTF-8 string keys. Example: `$metadata.header.priority`.

| Field | Description |
|-------|-------------|
| `$metadata.topic` | The MQTT topic of the message |
| `$metadata.user_property.<key>` | A user property on the message, identified by key |
| `$metadata.system_property.content_type` | The content type system property |
| `$metadata.header.<key>` | A Kafka header value, identified by key |

### Read from metadata

@@ -187,12 +193,42 @@ To reference the source topic and a user property in an expression, list them as

Expression: `$1 + "/" + $2`

In the following example, the MQTT `topic` property is mapped to the `origin_topic` field in the output:

| Input | Output |
|-------|--------|
| `$metadata.topic` | `origin_topic` |

If the user property `priority` is present in the MQTT message, the following example shows how to map it to an output field:

| Input | Output |
|-------|--------|
| `$metadata.user_property.priority` | `priority` |
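To make the two mappings concrete, assume a hypothetical message published to the MQTT topic `factory/line3/temperature` with the user property `priority` set to `high`. The topic name and property values here are illustrative, not part of any fixed schema. The mapped output fields would look like:

```json
{
  "origin_topic": "factory/line3/temperature",
  "priority": "high"
}
```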
### Write to metadata

To set a user property on the output message, use `$metadata.user_property.<key>` as the output field.

Setting a metadata field to an empty value (`()`) removes it. For user properties, duplicate keys are allowed.

You can also map metadata properties to an output header or user property. In the following example, the MQTT `topic` is mapped to the `origin_topic` user property on the output message:

| Input | Output |
|-------|--------|
| `$metadata.topic` | `$metadata.user_property.origin_topic` |

If the incoming payload contains a `priority` field, the following example shows how to map it to an MQTT user property:

| Input | Output |
|-------|--------|
| `priority` | `$metadata.user_property.priority` |

The same example for Kafka:

| Input | Output |
|-------|--------|
| `priority` | `$metadata.header.priority` |

Metadata fields are supported in map, filter, and branch rules. They aren't available in window (accumulate) rules.
## Last known value

@@ -212,7 +248,10 @@ Last known value is supported in map, filter, and branch rules. It isn't availab

## Default values

-Use the `?? <default>` suffix on an input to provide a fallback value when the field is missing and no last known value is available. Supported default types: integer, float, boolean, string, and null.
+Use the `?? <default>` suffix on an input to provide a fallback value when the field is missing. Supported default types: integer, float, boolean, string, and null.

> [!NOTE]
> The `?? <default>` syntax is available in data flow graphs only. It isn't supported in data flow `builtInTransformation` inputs.

| Input | Fallback |
|-------|----------|
@@ -223,11 +262,12 @@ Use the `?? <default>` suffix on an input to provide a fallback value when the f

### Combine last known value and default

-You can combine `? $last` and `?? <default>`. The runtime checks the current message first, then the last known value, then the default.
+You can combine `? $last` and `?? <default>`. The runtime checks the current message first, then the last known value, then the default. If you use `?? <default>` without `? $last`, the runtime checks the current message and then the default directly.

| Input | Evaluation order |
|-------|-----------------|
-| `temperature ? $last ?? 0` | Current value, then last known, then 0 |
+| `temperature ?? 0` | Current value, then default (0) |
+| `temperature ? $last ?? 0` | Current value, then last known, then default (0) |

Default values are supported in map, filter, and branch rules. They aren't available in window (accumulate) rules.
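As an illustration, assume a hypothetical map rule with input `temperature ?? 0` and output `temperature`, applied to this incoming payload that omits the field:

```json
{ "pressure": 14.7 }
```

Because `temperature` is missing and no default from the message is available, the rule writes the fallback:

```json
{ "temperature": 0 }
```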

@@ -256,9 +296,179 @@ JSON objects and arrays are preserved as-is when fields are copied without an ex

| `$metadata` access | Yes | Yes | Yes | No |
| `$context` enrichment | Yes | Yes | Yes | No |
| `? $last` | Yes | Yes | Yes | No |
-| `?? <default>` | Yes | Yes | Yes | No |
+| `?? <default>` ¹ | Yes | Yes | Yes | No |
+| `str::regex_matches` / `str::regex_replace` ¹ | Yes | Yes | Yes | No |
| Wildcards | Yes | No | No | No |

¹ Available in data flow graphs only. Not supported in data flow `builtInTransformation` inputs.
## Dot notation and escaping

Dot notation is widely used to reference nested fields. A standard dot-notation path looks like `Person.Address.Street.Number`.

In a data flow, a path described by dot notation can include strings and some special characters without needing escaping, such as `Person.Date of Birth`.

In other cases, escaping is necessary. For example, the path `nsu=http://opcfoundation.org/UA/Plc/Applications;s=RandomSignedInt32` contains, among other special characters, dots within the field name. Without escaping, those dots would act as separators in the dot notation itself.

While a data flow parses a path, it treats only two characters as special:

* Dots (`.`) act as field separators.
* Double quotation marks (`"`), when placed at the beginning or the end of a segment, start an escaped section where dots aren't treated as field separators.

Any other characters are treated as part of the field name. This flexibility is useful in formats like JSON, where field names can be arbitrary strings.

The path definition must also adhere to the rules of the configuration format. When a character with special meaning is included in the path, proper quoting is required. For example, field names that start with a colon (like `:Person:.:name:`) or that begin with a number followed by text (like `100 celsius.hot`) need quoting in the configuration to be interpreted correctly as strings.
### Escaping

The primary function of escaping in a dot-notated path is to accommodate dots that are part of field names rather than separators. For example, the path `Payload."Tag.10".Value` consists of three segments: `Payload`, `Tag.10`, and `Value`. The double quotation marks around `Tag.10` prevent the dot from acting as a separator.

### Escaping rules in dot notation

* **Escape each segment separately:** If multiple segments contain dots, each of those segments must be enclosed in double quotation marks. Other segments can also be quoted, but quoting them doesn't affect the path interpretation. For example: `Payload."Tag.10".Measurements."Vibration.$12".Value`.
* **Proper use of double quotation marks:** Double quotation marks must open and close an escaped segment. Any quotation marks in the middle of a segment are considered part of the field name. For example, the path `Payload.He said: "Hello", and waved` defines two fields: `Payload` and `He said: "Hello", and waved`. When a dot appears under these circumstances, it continues to serve as a separator. For example, the path `Payload.He said: "No. It is done"` is split into the segments `Payload`, `He said: "No`, and `It is done"` (starting with a space).

### Segmentation algorithm

* If the first character of a segment is a quotation mark, the parser searches for the next quotation mark. The string enclosed between the quotation marks is considered a single segment.
* If the segment doesn't start with a quotation mark, the parser identifies segments by searching for the next dot or the end of the path.
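As an illustrative summary of the algorithm, the following JSON maps a few hypothetical paths to the segments the parser produces for them:

```json
{
  "Person.Date of Birth": ["Person", "Date of Birth"],
  "Payload.\"Tag.10\".Value": ["Payload", "Tag.10", "Value"],
  "Payload.He said: \"No. It is done\"": ["Payload", "He said: \"No", " It is done\""]
}
```

In the last path, the segment doesn't start with a quotation mark, so the embedded dot still acts as a separator.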
## Wildcards

Use a wildcard (`*`) in input and output paths to match multiple fields at once. This is useful when the output closely resembles the input, or when you need to apply the same transformation across many fields without listing each one.

### Copy all fields

To pass every field through unchanged:

| Input | Output |
|-------|--------|
| `*` | `*` |

The `*` matches each field path in the input and places it at the same path in the output. The portion of the path that `*` matches is called the **captured segment**. In the output, the captured segment replaces the `*`.
### Flatten nested fields

To move fields out of a nested object to the root level, put the prefix in the input and `*` in the output:

| Input | Output |
|-------|--------|
| `Sensors.*` | `*` |
| `Metadata.*` | `*` |

Given this input:

```json
{
  "Sensors": { "Temperature": 72.5, "Pressure": 14.7 },
  "Metadata": { "LineId": "Line-3", "Shift": "A" }
}
```

The output flattens both objects:

```json
{
  "Temperature": 72.5,
  "Pressure": 14.7,
  "LineId": "Line-3",
  "Shift": "A"
}
```
### Restructure fields

To move fields under a new parent, put `*` in the input and add a prefix in the output:

| Input | Output |
|-------|--------|
| `*` | `Telemetry.*` |

This wraps all top-level fields inside a `Telemetry` object.
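For example, given this hypothetical flat input:

```json
{ "Temperature": 72.5, "Pressure": 14.7 }
```

the rule produces:

```json
{ "Telemetry": { "Temperature": 72.5, "Pressure": 14.7 } }
```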
### Wildcard placement rules

* Only **one** `*` is allowed per input or output path.
* The `*` must match a **complete segment**, not a partial segment like `Sensor*`.
* The `*` can appear at the beginning (`*.Value`), middle (`Sensors.*.Reading`), or end (`Sensors.*`) of a path.
### Multi-input wildcards

When a rule has multiple inputs with wildcards, the `*` must capture the **same segment** across all inputs. The runtime resolves the `*` from the first input, then looks for matching paths in the other inputs.

For example, to average the max and min readings for each sensor:

| Input | Output | Expression |
|-------|--------|------------|
| `*.Max` ($1)<br>`*.Min` ($2) | `Averaged.*` | `($1 + $2) / 2` |

Given this input:

```json
{
  "Temperature": { "Max": 85.3, "Min": 62.1 },
  "Pressure": { "Max": 15.2, "Min": 14.1 }
}
```

The `*` captures `Temperature` first, so the rule looks for both `Temperature.Max` and `Temperature.Min`. Then it captures `Pressure` and looks for `Pressure.Max` and `Pressure.Min`. The output is:

```json
{
  "Averaged": { "Temperature": 73.7, "Pressure": 14.65 }
}
```

If any input can't resolve for a captured segment (for example, `*.Mid.Avg` when the field is nested differently), that segment is skipped. Make sure the paths in all inputs reflect the actual structure of the data.
### Override a wildcard for specific fields

You can combine a wildcard rule with specific rules. Specific rules take precedence when they have a **lower coverage** (fewer segments matched by `*`). This is called **specialization**.

| Input | Output | Expression |
|-------|--------|------------|
| `*.Max` ($1)<br>`*.Min` ($2) | `Averaged.*` | `($1 + $2) / 2` |
| `Pressure.Max` ($1)<br>`Pressure.Min` ($2) | `Averaged.PressureAdj` | `($1 + $2 + 1.0) / 2` |

The first rule applies to all fields. The second rule overrides it for `Pressure` only, because `Pressure.Max` is more specific than `*.Max` (coverage 0 vs. coverage 1).

To exclude a field entirely, use an empty output:

| Input | Output |
|-------|--------|
| `Pressure.Max`, `Pressure.Min` | *(empty)* |

An empty output drops the field from the result. This overrides any wildcard rule that would otherwise include it.
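Applied to the sample input from the multi-input wildcards example (`Temperature` and `Pressure`, each with `Max` and `Min`), these two rules would produce output along these lines:

```json
{
  "Averaged": {
    "Temperature": 73.7,
    "PressureAdj": 15.15
  }
}
```

The specialized rule computes `(15.2 + 14.1 + 1.0) / 2 = 15.15` for `Pressure`, and no plain `Averaged.Pressure` field is emitted because the specialization takes precedence.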
### Multiple rules on the same inputs

If two rules have the same or higher coverage, both apply. This lets you compute multiple derived values from the same inputs:

| Input | Output | Expression |
|-------|--------|------------|
| `*.Max` ($1)<br>`*.Min` ($2) | `Stats.*.Avg` | `($1 + $2) / 2` |
| `*.Max` ($1)<br>`*.Min` ($2) | `Stats.*.Range` | `$1 - $2` |

Both rules execute for each captured segment, producing two output fields per sensor.
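With the same sample input as the multi-input wildcards example (`Temperature`: Max 85.3, Min 62.1; `Pressure`: Max 15.2, Min 14.1), the combined output would look like:

```json
{
  "Stats": {
    "Temperature": { "Avg": 73.7, "Range": 23.2 },
    "Pressure": { "Avg": 14.65, "Range": 1.1 }
  }
}
```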
### Wildcards in contextualization datasets

You can use wildcards with `$context` references to copy all fields from a dataset:

| Input | Output |
|-------|--------|
| `$context(assetMeta).*` | `Asset.*` |

This copies every field from the `assetMeta` dataset into the `Asset` section of the output.
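For example, if a hypothetical `assetMeta` dataset record looks like this (the field names are illustrative):

```json
{ "Site": "Plant-1", "InstallDate": "2024-06-01" }
```

the rule places every dataset field under `Asset` in the output:

```json
{ "Asset": { "Site": "Plant-1", "InstallDate": "2024-06-01" } }
```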
## Contextualization datasets

Contextualization datasets let mappings integrate extra data from external databases. Use the `$context(datasetName)` prefix to reference fields from a dataset. For example, `$context(position).BaseSalary` reads the `BaseSalary` field from a dataset named `position`.

For details on configuring contextualization datasets, see [Enrich data by using data flows](concept-dataflow-enrich.md) and [Enrich with external data in data flow graphs](howto-dataflow-graphs-enrich.md).
## Related content

- [Map data by using data flows](concept-dataflow-mapping.md)
