| title | Production deployment guidelines |
|---|---|
| description | Learn about the recommendations and guidelines for preparing Azure IoT Operations for a production deployment. |
| author | dominicbetts |
| ms.author | dobett |
| ms.topic | concept-article |
| ms.date | 03/13/2025 |
| ms.service | azure-iot-operations |
Security and scalability are a priority for deploying Azure IoT Operations. This article outlines guidelines that you should take into consideration when setting up Azure IoT Operations for production.
Decide whether you're deploying Azure IoT Operations to a single-node or multi-node cluster before considering the appropriate configuration. Many of the guidelines in this article apply regardless of the cluster type, but when there's a difference it's called out specifically.
Use a supported environment for deploying Azure IoT Operations in production.
Ensure that your hardware setup is sufficient for your scenario and that you begin with a secure environment.
Create an Arc-enabled cluster that meets the system requirements.
- Use a supported environment for Azure IoT Operations.
- Configure the cluster according to documentation.
- If you expect intermittent connectivity for your cluster, ensure that you allocate enough disk space to the cluster to cache data and messages while the cluster is offline. Azure IoT Operations can operate offline for a maximum of 72 hours.
- If possible, have a second cluster as a staging area for testing new changes before deploying to the primary production cluster.
- Turn off autoupgrade for Azure Arc to have complete control over when new updates are applied to your cluster. Instead, manually upgrade agents as needed.
- For multi-node clusters: Configure clusters with Edge Volumes to prepare for enabling fault tolerance during deployment.
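The disk-space guideline for intermittent connectivity comes down to simple arithmetic. The following Python sketch estimates how much data could accumulate during an offline window. Only the 72-hour limit comes from this article; the function name, the example message rate and size, and the 1.5 headroom factor are illustrative assumptions that you should replace with measurements from your own workload.

```python
import math

# Maximum offline duration stated in this article.
MAX_OFFLINE_HOURS = 72


def offline_buffer_bytes(
    msgs_per_second: float,
    avg_msg_bytes: int,
    offline_hours: float = MAX_OFFLINE_HOURS,
    headroom: float = 1.5,
) -> int:
    """Estimate the disk space needed to cache messages while offline.

    headroom is an assumed safety multiplier for bursts and metadata;
    it is not a value published for Azure IoT Operations.
    """
    if not 0 < offline_hours <= MAX_OFFLINE_HOURS:
        raise ValueError(f"offline_hours must be in (0, {MAX_OFFLINE_HOURS}]")
    offline_seconds = offline_hours * 3600
    return math.ceil(msgs_per_second * avg_msg_bytes * offline_seconds * headroom)


if __name__ == "__main__":
    # Hypothetical workload: 500 messages per second, 1 KiB each.
    needed = offline_buffer_bytes(msgs_per_second=500, avg_msg_bytes=1024)
    print(f"Plan for roughly {needed / 1024**3:.0f} GiB of cache space")
```

Treat the result as a lower bound for planning. It doesn't account for disk space used by the operating system, container images, or logs.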
Consider the following measures to ensure your cluster setup is secure before deployment.
- Validate images to ensure they're signed by Microsoft.
- When you use TLS encryption, bring your own issuer and integrate with an enterprise PKI.
- Use secrets for on-premises authentication.
- Use user-assigned managed identities for cloud connections.
- Keep your cluster and Azure IoT Operations deployment up to date with the latest patches and minor releases to get all available security and bug fixes.
[!INCLUDE aks-imds-restriction]
If you use enterprise firewalls or proxies, add the Azure IoT Operations endpoints to your allow list.
For production deployments, deploy observability resources on your cluster before deploying Azure IoT Operations. We also recommend setting up Prometheus alerts in Azure Monitor.
For a production-ready deployment, include the following configurations during the Azure IoT Operations deployment.
In the Azure portal deployment wizard, the broker resource is set up in the Configuration tab.
- Configure cardinality settings based on memory profile and needs for handling connections and messages. For example, the following settings could support a single-node or multi-node cluster:

  | Setting | Single node | Multi node |
  |---|---|---|
  | frontendReplicas | 1 | 5 |
  | frontendWorkers | 4 | 8 |
  | backendRedundancyFactor | 2 | 2 |
  | backendWorkers | 1 | 4 |
  | backendPartitions | 1 | 5 |
  | Memory profile | Low | High |

  > [!NOTE]
  > The backend redundancy factor must be 2 or greater. The broker requires at least two backend replicas per partition for high availability and rolling upgrade support.

- Set disk-backed message buffer with a max size that prevents RAM overflow.
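A quick check of planned cardinality values before deployment can catch a redundancy factor that's too low. The following Python sketch encodes the two example configurations from this section and validates them against the rule that the backend redundancy factor must be 2 or greater. The `Cardinality` class and its methods are hypothetical helpers, not part of any Azure SDK, and the replica count assumes that the redundancy factor is the number of backend replicas per partition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cardinality:
    """Hypothetical container for the broker cardinality settings."""

    frontend_replicas: int
    frontend_workers: int
    backend_redundancy_factor: int
    backend_workers: int
    backend_partitions: int

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the values pass."""
        problems = []
        for name, value in vars(self).items():
            if value < 1:
                problems.append(f"{name} must be at least 1, got {value}")
        if self.backend_redundancy_factor < 2:
            problems.append("backend_redundancy_factor must be 2 or greater")
        return problems

    def backend_replica_count(self) -> int:
        # Assumption: redundancy factor = backend replicas per partition.
        return self.backend_partitions * self.backend_redundancy_factor


# Example values from the table in this section.
SINGLE_NODE = Cardinality(1, 4, 2, 1, 1)
MULTI_NODE = Cardinality(5, 8, 2, 4, 5)

if __name__ == "__main__":
    for label, cfg in (("single node", SINGLE_NODE), ("multi node", MULTI_NODE)):
        issues = cfg.validate()
        status = "ok" if not issues else "; ".join(issues)
        print(f"{label}: {cfg.backend_replica_count()} backend replicas, {status}")
```

Running a check like this in a deployment pipeline is one way to reject an invalid configuration before it reaches the cluster.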
In the Azure portal deployment wizard, the schema registry and its required storage account are set up in the Dependency management tab.
- The storage account must have hierarchical namespace enabled.
- The schema registry's managed identity must have contributor permissions for the storage account.
- For production deployments, scope the storage account's public network access to allow traffic only from trusted Azure services. For example:
  - In the Azure portal, navigate to the storage account that your schema registry uses.
  - Select Security + networking > Networking from the navigation menu.
  - For the public network access setting, select Enabled from selected virtual networks and IP addresses.
  - In the Exceptions section of the networking page, ensure that the Allow trusted Microsoft services to access this resource option is selected.
  - Select Save to apply the changes.
For more information, see Configure Azure Storage firewalls and virtual networks > Grant access to trusted Azure services.
Multi-node clusters: Fault tolerance can be enabled in the Dependency management tab of the Azure portal deployment wizard. It's only supported on multi-node clusters, and is recommended for production deployment.
During deployment, you have the option to use test settings or secure settings. For production deployments, choose secure settings. If you're upgrading an existing test settings deployment for production, follow the steps in Enable secure settings.
After deploying Azure IoT Operations, have the following configurations in place for a production scenario.
After deployment, you can edit BrokerListener resources:
- Configure TLS with automatic certificate management for listeners.
You can also edit BrokerAuthentication resources.
- Use X.509 certificates or Kubernetes service account tokens for authentication.
- Don't use no-auth.
When you create a new resource, manage its authorization:
- Create a BrokerAuthorization resource and provide the least privilege needed for the topic asset.
To connect to assets in production, configure OPC UA authentication:
- Don't use no-auth. Connectivity to OPC UA servers isn't supported without authentication.
- Set up a secure connection to the OPC UA server. Use a production PKI and configure application certificates and the trust list.
When using data flows in production:
- Use service account token (SAT) authentication with the MQTT broker (default).
- Always use managed identity authentication. When possible, use a user-assigned managed identity in data flow endpoints for flexibility and auditability.
- Scale data flow profiles to improve throughput and have high availability.
- Group multiple data flows into data flow profiles and customize scaling for each profile accordingly.