| title | Surrogate key transformation in mapping data flow |
|---|---|
| titleSuffix | Azure Data Factory & Azure Synapse |
| description | Learn how to use the mapping data flow Surrogate Key Transformation to generate sequential key values in Azure Data Factory and Synapse Analytics. |
| author | kromerm |
| ms.author | makromer |
| ms.reviewer | daperlov |
| ms.subservice | data-flows |
| ms.topic | concept-article |
| ms.custom | synapse |
| ms.date | 05/15/2024 |
[!INCLUDEappliesto-adf-asa-md]
[!INCLUDEdata-flow-preamble]
Use the surrogate key transformation to add an incrementing key value to each row of data. This is useful when designing dimension tables in a star schema analytical data model. In a star schema, each member in your dimension tables requires a unique key that is a non-business key.
:::image type="content" source="media/data-flow/surrogate.png" alt-text="Surrogate Key Transform":::
Key column: The name of the generated surrogate key column.
Start value: The lowest key value that will be generated.
To start your sequence from a value that exists in a source, we recommend to use a cache sink to save that value and use a derived column transformation to add the two values together. Use a cached lookup to get the output and append it to the generated key. For more information, learn about cache sinks and cached lookups.
:::image type="content" source="media/data-flow/cached-lookup-example.png" alt-text="Surrogate Key lookup":::
To seed the key value with the previous max, there are two techniques that you can use based on where your source data is.
Use a SQL query option to select MAX() from your source. For example, Select MAX(<surrogateKeyName>) as maxval from <sourceTable>.
:::image type="content" source="media/data-flow/surrogate-key-max-database.png" alt-text="Surrogate Key Query":::
If your previous max value is in a file, use the max() function in the aggregate transformation to get the previous max value:
:::image type="content" source="media/data-flow/surrogate-key-max-file.png" alt-text="Surrogate Key File":::
In both cases, you will need to write to a cache sink and lookup the value.
<incomingStream>
keyGenerate(
output(<surrogateColumnName> as long),
startAt: <number>L
) ~> <surrogateKeyTransformationName>
:::image type="content" source="media/data-flow/surrogate.png" alt-text="Surrogate Key Transform":::
The data flow script for the above surrogate key configuration is in the code snippet below.
AggregateDayStats
keyGenerate(
output(key as long),
startAt: 1L
) ~> SurrogateKey1
These examples use the Join and Derived Column transformations.