| title | Data persistence and serialization in Durable Functions - Azure |
|---|---|
| description | Learn how the Durable Functions extension for Azure Functions persists data |
| author | cgillum |
| ms.topic | concept-article |
| ms.service | azure-functions |
| ms.date | 07/18/2022 |
| ms.author | azfuncdf |
| ms.devlang | csharp |
| ms.custom | devx-track-dotnet |
The Durable Functions runtime automatically persists function parameters, return values, and other state to the task hub in order to provide reliable execution. However, the amount and frequency of data persisted to durable storage can impact application performance and storage transaction costs. Depending on the type of data your application stores, data retention and privacy policies may also need to be considered.
Task hubs store the current state of instances, and any pending messages:
- Instance states store the current status and history of an instance. For orchestration instances, this state includes the runtime state, the orchestration history, inputs, outputs, and custom status. For entity instances, it includes the entity state.
- Messages store function inputs or outputs, event payloads, and metadata that is used for internal purposes, like routing and end-to-end correlation.
Messages are deleted after being processed, but instance states persist unless they're explicitly deleted by the application or an operator. In particular, an orchestration history remains in storage even after the orchestration completes.
For an example of how states and messages represent the progress of an orchestration, see the task hub execution example.
Where and how states and messages are represented in storage depends on the storage provider. Durable Functions' default provider is Azure Storage, which persists data to queues, tables, and blobs in an Azure Storage account that you specify.
The following list shows the different types of data that will be serialized and persisted when using features of Durable Functions:
- All inputs and outputs of orchestrator, activity, and entity functions, including any IDs and unhandled exceptions
- Orchestrator, activity, and entity function names
- External event names and payloads
- Custom orchestration status payloads
- Orchestration termination messages
- Durable timer payloads
- Durable HTTP request and response URLs, headers, and payloads
- Entity call and signal payloads
- Entity state payloads
You can run into memory issues if you provide large inputs and outputs to and from Durable Functions APIs. Inputs and outputs are serialized into the orchestration history, which means that large payloads can, over time, greatly contribute to unbounded history growth. This growth risks causing memory exceptions during replay.
To mitigate the impact of large inputs and outputs, you can:
- Delegate work to sub-orchestrators to load balance the history memory burden across multiple orchestrators, keeping the memory footprint of individual histories small.
- Store large data in external storage (such as Azure Blob Storage) and pass lightweight identifiers that allow you to retrieve that data inside activity functions when needed.
If you use Durable Task Scheduler, you can also use large payload support to offload larger payloads to Azure Blob Storage.
Tip
The best practice for dealing with large data is to keep it in external storage and materialize that data only inside activities, when needed.
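As a sketch of this pattern using the in-process C# model (the `payloads` container name and the use of the `AzureWebJobsStorage` connection string are illustrative assumptions, not requirements):

```csharp
// Sketch: pass a blob name through the orchestration instead of the payload itself,
// so only the lightweight identifier is persisted in the orchestration history.
using System;
using System.Threading.Tasks;
using Azure.Storage.Blobs;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;

public static class LargePayloadExample
{
    [FunctionName("RunOrchestrator")]
    public static async Task RunOrchestrator(
        [OrchestrationTrigger] IDurableOrchestrationContext context)
    {
        // The orchestration input and activity input are just a blob name.
        string blobName = context.GetInput<string>();
        await context.CallActivityAsync("ProcessLargePayload", blobName);
    }

    [FunctionName("ProcessLargePayload")]
    public static async Task ProcessLargePayload([ActivityTrigger] string blobName)
    {
        // The large payload is materialized only inside the activity, when needed.
        var container = new BlobContainerClient(
            Environment.GetEnvironmentVariable("AzureWebJobsStorage"), "payloads");
        var payload = await container.GetBlobClient(blobName).DownloadContentAsync();
        // ... process payload.Value.Content ...
    }
}
```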
Inputs and outputs (including exceptions) to and from Durable Functions APIs are durably persisted in your storage provider of choice. If those inputs, outputs, or exceptions contain sensitive data (such as secrets, connection strings, or personally identifiable information), anyone with read access to your storage provider's resources could obtain them.
To safely handle sensitive data, fetch that data within activity functions from either Azure Key Vault or environment variables, and never communicate that data directly to or from orchestrators or entities. This approach helps prevent sensitive data from leaking into your storage resources.
Tip
This guidance also applies to the CallHttp orchestrator API, which persists its request and response payloads in storage. If your target HTTP endpoints require authentication, implement the HTTP call inside an activity, or use the built-in managed identity support offered by CallHttp, which doesn't persist credentials to storage.
Note
Avoid logging data containing secrets as anyone with read access to your logs (for example in Application Insights) could obtain those secrets.
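A minimal in-process C# sketch of fetching a secret inside an activity rather than passing it through an orchestrator (the `DownstreamApiKey` app setting name is hypothetical; in production it could be a Key Vault reference):

```csharp
// Sketch: the secret is resolved at execution time inside the activity, so it never
// appears in orchestration inputs or outputs and is never persisted to the task hub.
using System;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;

public static class SecretHandling
{
    [FunctionName("CallDownstreamService")]
    public static async Task CallDownstreamService([ActivityTrigger] string resourceId)
    {
        // Resolved from app settings, which can be backed by Azure Key Vault references.
        string apiKey = Environment.GetEnvironmentVariable("DownstreamApiKey");
        // ... use apiKey to authenticate the downstream call for resourceId ...
        await Task.CompletedTask;
    }
}
```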
When using the Azure Storage provider, all data is automatically encrypted at rest. However, anyone with access to the storage account can read the data in its unencrypted form. If you need stronger protection for sensitive data, consider first encrypting the data using your own encryption keys so that the data is persisted in its pre-encrypted form.
Alternatively, .NET users have the option of implementing custom serialization providers that provide automatic encryption. An example of custom serialization with encryption can be found in this GitHub sample.
Note
If you decide to implement application-level encryption, be aware that orchestrations and entities can exist for indefinite amounts of time. This matters when it comes time to rotate your encryption keys, because an orchestration or entity may run longer than your key rotation policy allows. If a key rotation happens, the key used to encrypt your data may no longer be available to decrypt it the next time your orchestration or entity executes. Application-level encryption is therefore recommended only when orchestrations and entities are expected to run for relatively short periods of time.
Durable Functions for .NET in-process internally uses Json.NET to serialize orchestration and entity data to JSON. The default Json.NET settings are:

Inputs, outputs, and state:

```csharp
JsonSerializerSettings
{
    TypeNameHandling = TypeNameHandling.None,
    DateParseHandling = DateParseHandling.None,
}
```

Exceptions:

```csharp
JsonSerializerSettings
{
    ContractResolver = new ExceptionResolver(),
    TypeNameHandling = TypeNameHandling.Objects,
    ReferenceLoopHandling = ReferenceLoopHandling.Ignore,
}
```

For more detailed information about these settings, see the JsonSerializerSettings documentation.
During serialization, Json.NET looks for various attributes on classes and properties that control how the data is serialized and deserialized from JSON. If you own the source code for data type passed to Durable Functions APIs, consider adding these attributes to the type to customize serialization and deserialization.
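For example, a hypothetical payload type might use Json.NET attributes like this (the type and property names are illustrative):

```csharp
using Newtonsoft.Json;

public class OrderPayload
{
    // Persist this property under a stable, shorter name in the orchestration history.
    [JsonProperty("id")]
    public string OrderId { get; set; }

    // Exclude transient data from serialization entirely.
    [JsonIgnore]
    public string CachedDisplayName { get; set; }

    // Populate a default value when the property is absent from the stored JSON.
    [JsonProperty(DefaultValueHandling = DefaultValueHandling.Populate)]
    [System.ComponentModel.DefaultValue(1)]
    public int Quantity { get; set; }
}
```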
Function apps that target .NET and run on the Functions V3 runtime can use Dependency Injection (DI) to customize how data and exceptions are serialized. The following sample code demonstrates how to use DI to override the default Json.NET serialization settings using custom implementations of the IMessageSerializerSettingsFactory and IErrorSerializerSettingsFactory service interfaces.
```csharp
using Microsoft.Azure.Functions.Extensions.DependencyInjection;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;
using Microsoft.Extensions.DependencyInjection;
using Newtonsoft.Json;

[assembly: FunctionsStartup(typeof(MyApplication.Startup))]

namespace MyApplication
{
    public class Startup : FunctionsStartup
    {
        public override void Configure(IFunctionsHostBuilder builder)
        {
            builder.Services.AddSingleton<IMessageSerializerSettingsFactory, CustomMessageSerializerSettingsFactory>();
            builder.Services.AddSingleton<IErrorSerializerSettingsFactory, CustomErrorSerializerSettingsFactory>();
        }

        /// <summary>
        /// A factory that provides the serialization for all inputs and outputs for activities and
        /// orchestrations, as well as entity state.
        /// </summary>
        internal class CustomMessageSerializerSettingsFactory : IMessageSerializerSettingsFactory
        {
            public JsonSerializerSettings CreateJsonSerializerSettings()
            {
                // Return your custom JsonSerializerSettings here.
                return new JsonSerializerSettings();
            }
        }

        /// <summary>
        /// A factory that provides the serialization for all exceptions thrown by activities
        /// and orchestrations.
        /// </summary>
        internal class CustomErrorSerializerSettingsFactory : IErrorSerializerSettingsFactory
        {
            public JsonSerializerSettings CreateJsonSerializerSettings()
            {
                // Return your custom JsonSerializerSettings here.
                return new JsonSerializerSettings();
            }
        }
    }
}
```

Durable Functions running in the .NET isolated worker process uses the same object serializer configured globally for your Azure Functions app (see WorkerOptions). By default, this serializer is System.Text.Json rather than Newtonsoft.Json. Any changes to WorkerOptions.Serializer apply transitively to Durable Functions.
For more information on the built-in support for JSON serialization in .NET, see the JSON serialization and deserialization in .NET overview documentation.
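As a sketch, assuming a typical isolated-worker Program.cs, the global serializer could be swapped like this (the camel-case naming policy is just an illustrative choice):

```csharp
// Sketch: configure WorkerOptions.Serializer for a .NET isolated worker app.
// Durable Functions picks up this serializer transitively.
using Azure.Core.Serialization;
using Microsoft.Azure.Functions.Worker;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using System.Text.Json;

var host = new HostBuilder()
    .ConfigureFunctionsWorkerDefaults(builder =>
    {
        builder.Services.Configure<WorkerOptions>(options =>
        {
            // The default is System.Text.Json; supply custom JsonSerializerOptions here.
            options.Serializer = new JsonObjectSerializer(new JsonSerializerOptions
            {
                PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
            });
        });
    })
    .Build();

host.Run();
```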
Azure Functions Node.js applications use JSON.stringify() for serialization and JSON.parse() for deserialization. Most types serialize and deserialize seamlessly. In cases where the default logic is insufficient, you can define a toJSON() method on the object to override its serialization logic. However, no analog exists for object deserialization.
For full customization of the serialization/deserialization pipeline, consider handling the serialization and deserialization with your own code and passing around data as strings.
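For example, a hypothetical type can control its serialized shape with toJSON() and pair it with an explicit factory method for rehydration, since no deserialization hook exists:

```javascript
// Hypothetical type whose serialized shape is controlled via toJSON().
class Order {
    constructor(id, placedAt) {
        this.id = id;
        this.placedAt = placedAt; // a Date instance
    }

    // JSON.stringify() calls this instead of serializing the raw object.
    toJSON() {
        return { id: this.id, placedAt: this.placedAt.toISOString() };
    }

    // No deserialization analog exists, so rehydrate explicitly.
    static fromJSON(json) {
        const data = JSON.parse(json);
        return new Order(data.id, new Date(data.placedAt));
    }
}

const serialized = JSON.stringify(new Order("A-1", new Date("2022-07-18T00:00:00Z")));
const roundTripped = Order.fromJSON(serialized);
```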
It's recommended to use type annotations to ensure Durable Functions serializes and deserializes your data correctly. While many built-in types are handled automatically, some built-in data types require type annotations to preserve the type during deserialization.
For custom data types, you make JSON serialization and deserialization possible by defining class methods to_json and from_json on your data type class. Note that these methods are not called on the return value from the orchestrator function, meaning the return value has to be natively JSON-serializable. For more information, see Bindings.
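A minimal sketch of such a custom data type (the Order class and its fields are hypothetical):

```python
import json


class Order:
    """Hypothetical custom type made serializable for Durable Functions."""

    def __init__(self, order_id: str, quantity: int):
        self.order_id = order_id
        self.quantity = quantity

    @staticmethod
    def to_json(obj: "Order") -> str:
        # Called when the object crosses a serialization boundary.
        return json.dumps({"order_id": obj.order_id, "quantity": obj.quantity})

    @staticmethod
    def from_json(json_str: str) -> "Order":
        # Rebuilds the typed object from its persisted JSON form.
        data = json.loads(json_str)
        return Order(data["order_id"], data["quantity"])


round_tripped = Order.from_json(Order.to_json(Order("A-1", 3)))
```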
Java uses the Jackson v2.x libraries for serialization and deserialization of data payloads. You can use Jackson annotations on your POJO types to customize the serialization behavior.