Commit ec60ae0

Merge pull request #2607 from msmbaldwin/hsm-psu-issues
Add PSU redundancy guidance for Dedicated HSM and Payment HSM
2 parents 04937e3 + e803e20 commit ec60ae0

3 files changed

Lines changed: 64 additions & 24 deletions

articles/dedicated-hsm/monitoring.md

Lines changed: 18 additions & 0 deletions
@@ -25,6 +25,24 @@ The monitor function itself is set up to poll the device every 10 minutes to get
2525

2626
Depending on the nature of the issue, the appropriate course of action is taken to reduce impact and ensure low-risk remediation. For example, replacing a failed power supply is a hot-swap procedure that causes no tamper event, so it can be performed with low impact and minimal risk to operation. Other procedures might require a device to be zeroized and deprovisioned to minimize any security risk to the customer. In that situation, the customer provisions an alternate device and rejoins it to a high availability pairing, which triggers device synchronization. Normal operation then resumes quickly, with minimal disruption and the lowest security risk.
2727

28+
### Power supply redundancy
29+
30+
The Thales Luna 7 HSM device uses a dual power supply unit (PSU) design for redundancy. Each PSU connects to an independent power feed, allowing the device to operate normally if one PSU experiences a brief outage.
31+
32+
During scheduled datacenter power maintenance, power feeds are serviced one at a time while the other feed remains active, ensuring continuous operation through redundant power. You may see transient single-PSU messages in your HSM logs such as:
33+
34+
```text
35+
Power supply 1 AC outage
36+
Power supply 1 AC restored
37+
```
38+
39+
These messages are expected behavior and don't indicate a hardware fault—the device continues operating normally on the redundant PSU.
40+
41+
> [!IMPORTANT]
42+
> Don't open support tickets or request physical hardware investigation based on single-PSU log messages. Microsoft monitors PSU health and proactively addresses any actual hardware failures. Unnecessary physical intervention can introduce risk to your device's operation.
43+
44+
If our monitoring detects a genuine PSU or fan issue, Microsoft replaces the component without requiring customer action or notification.
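
As an illustration of the guidance above, the following is a minimal Python sketch of how your own log tooling might recognize matched outage and restored messages so that transient single-PSU events don't raise alerts. It assumes the log lines contain the two message strings shown earlier; real log lines might carry timestamps or other fields, and the function name `summarize_psu_events` is hypothetical.

```python
import re

# Matches the two message forms shown above; any prefix such as a
# timestamp is ignored because search() scans the whole line.
PSU_EVENT = re.compile(r"Power supply (\d+) AC (outage|restored)")


def summarize_psu_events(lines):
    """Pair each PSU outage message with a later restored message.

    Returns a tuple (open_outages, recovered):
      open_outages - set of PSU numbers with an outage not yet restored
      recovered    - count of outage/restored pairs seen
    """
    open_outages = set()
    recovered = 0
    for line in lines:
        match = PSU_EVENT.search(line)
        if not match:
            continue  # Not a PSU message
        psu = int(match.group(1))
        state = match.group(2)
        if state == "outage":
            open_outages.add(psu)
        elif psu in open_outages:
            # A restored message that closes an earlier outage
            open_outages.discard(psu)
            recovered += 1
    return open_outages, recovered
```

An outage that appears without a later restored message isn't by itself evidence of a hardware fault, because the device keeps running on the redundant PSU and Microsoft monitors PSU health.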
45+
2846
## Customer monitoring
2947

3048
A value proposition of the Dedicated HSM service is the control the customer gets of the device, especially considering it is a cloud delivered device. A consequence of this control is the responsibility to monitor and manage the health of the device.

articles/dedicated-hsm/supportability.md

Lines changed: 29 additions & 24 deletions
@@ -9,79 +9,84 @@ ms.date: 02/20/2024
99
ms.author: mbaldwin
1010
---
1111

12-
# Azure Dedicated HSM Supportability
12+
# Azure Dedicated HSM supportability
1313

14-
The Azure Dedicated HSM Service provides a physical device for sole customer use with complete administrative control and management responsibility. The device made available is a [Thales Luna 7 HSM model A790](https://cpl.thalesgroup.com/encryption/hardware-security-modules/network-hsms). Microsoft will have no administrative access once provisioned by a customer, beyond physical serial port attachment as a monitoring role. Without access, Microsoft can have no ongoing software level maintenance or system administration responsibilities. As a result, customers are responsible for typical operational activities.
15-
Customers are fully responsible for applications that use the HSMs and should work with Thales for support or consulting-based assistance. Due to the extent of customer ownership of operational hygiene, it is not possible for Microsoft to offer any kind of high availability guarantee for this service. It is the customer’s responsibility to ensure their applications are correctly configured to achieve high-availability. Microsoft will monitor and maintain device health and network connectivity.
14+
The Azure Dedicated HSM service provides a physical device for sole customer use with complete administrative control and management responsibility. The device is a [Thales Luna 7 HSM model A790](https://cpl.thalesgroup.com/encryption/hardware-security-modules/network-hsms). Microsoft has no administrative access once you provision the device, beyond physical serial port attachment as a monitoring role. Without access, Microsoft has no ongoing software-level maintenance or system administration responsibilities. As a result, customers are responsible for typical operational activities.
15+
Customers are fully responsible for applications that use the HSMs and should work with Thales for support or consulting-based assistance. Due to the extent of customer ownership of operational hygiene, Microsoft can't offer any kind of high availability guarantee for this service. It's the customer’s responsibility to ensure their applications are correctly configured to achieve high availability. Microsoft monitors and maintains device health and network connectivity.
1616

1717
## Getting support
1818

19-
Customer support for Dedicated HSM is a joint effort between Microsoft and Thales. Any hardware issues or network path issues will be addressed by Microsoft, and anything to do with the actual HSM, such as configuration, software, firmware and application development, will be addressed by Thales. This support model ensures the quickest route to the most effective support. If in doubt with a particular issue, raise a support request with Microsoft and we will ensure you are directed appropriately. Microsoft will stay engaged in all support scenarios and strive for the best support experience for our customers.
19+
Customer support for Dedicated HSM is a joint effort between Microsoft and Thales. Microsoft addresses any hardware issues or network path issues. Thales addresses anything to do with the actual HSM, such as configuration, software, firmware, and application development. This support model ensures the quickest route to the most effective support. If you're in doubt about a particular issue, raise a support request with Microsoft and we direct you appropriately. Microsoft stays engaged in all support scenarios and strives for the best support experience for you.
2020

2121
## Thales support
2222

23-
Customers using the Dedicated HSM service qualify for support from Thales as per their Plus Support Plan. This just requires a registration process using the Thales support portal. A Customer ID and instructions will be provided for this as part of the initial engagement with Microsoft to gain access to the Dedicated HSM service. The mechanism to get support from Thales is via their [customer support portal](https://supportportal.thalesgroup.com/csm).
24-
A key point of note is that Thales will provide all software and documentation required to use the HSM (for example, client access software and SDKs) via download on the customer support portal.
23+
Customers using the Dedicated HSM service qualify for support from Thales as per their Plus Support Plan. This support plan just requires a registration process by using the Thales support portal. A customer ID and instructions are provided for this as part of the initial engagement with Microsoft to gain access to the Dedicated HSM service. You get support from Thales through their [customer support portal](https://supportportal.thalesgroup.com/csm).
24+
A key point of note is that Thales provides all software and documentation required to use the HSM (for example, client access software and SDKs) via download on the customer support portal.
2525

2626
### Software components
2727

28-
Various software components are used in the configuration of HSM devices:
28+
You use various software components in the configuration of HSM devices:
2929

3030
* Client software
3131
* SDK
3232
* Tools
3333

3434
### Guidance
3535

36-
Thales makes available administration and configuration guidance via the [Thales customer support portal](https://supportportal.thalesgroup.com/csm). Once signed in using a valid customer ID, these documents are available for download. Thales also provides a series of integration guides to help customers with different scenarios and software integrations. For more information, see the [Thales partner site for Microsoft](https://cpl.thalesgroup.com/partners/overview).
36+
Thales makes administration and configuration guidance available via the [Thales customer support portal](https://supportportal.thalesgroup.com/csm). Once signed in by using a valid customer ID, you can download these documents. Thales also provides a series of integration guides to help customers with different scenarios and software integrations. For more information, see the [Thales partner site for Microsoft](https://cpl.thalesgroup.com/partners/overview).
3737

3838
### Support
3939

40-
Any software level issue or question in relation to using the HSMs as part of the Dedicated HSM service, should be addressed to Thales support directly. All software components listed above, and any custom HSM configuration that is post-provisioning, will be addressed by Thales. For more information, see the [Thales customer support portal](https://supportportal.thalesgroup.com/csm).
40+
Address any software-level issue or question in relation to using the HSMs as part of the Dedicated HSM service to Thales support directly. Thales addresses all software components listed earlier, and any custom HSM configuration that is post-provisioning. For more information, see the [Thales customer support portal](https://supportportal.thalesgroup.com/csm).
4141

4242
### Consulting services
4343

44-
For any assistance in the design, development and deployment of custom applications that use the HSM, contact your Thales account representative.
44+
For assistance with the design, development, and deployment of custom applications that use the HSM, contact your Thales account representative.
4545

4646
## Microsoft support
4747

48-
Microsoft will ensure physical HSM devices are network accessible and in an operational state for the exclusive use of a single customer. Customers are responsible for configuration, administration, and management of the device.
48+
Microsoft ensures physical HSM devices are network accessible and in an operational state for the exclusive use of a single customer. Customers are responsible for the configuration, administration, and management of the device.
4949
Microsoft responsibilities include:
5050

5151
* Making sure that the device has power and cooling
5252
* Maintaining an operational state of the HSM (for example, break/fix scenarios)
53-
* The device is accessible over the network.
53+
* Ensuring that the device is accessible over the network
5454

55-
Issues such as the following should be reported to Microsoft:
55+
Report the following issues to Microsoft:
5656

5757
* Component failures
5858
* Full device failure
59-
* Network access issues
59+
* Network access problems
6060
* Problems provisioning and deprovisioning
6161

62-
Microsoft has physical serial port access to the device via a monitoring role (that is a non-administrative role) that enables basic health telemetry. This will allow Microsoft to provide proactive notification of issues to the customer unless the customer chooses to restrict this permission.
62+
Microsoft has physical serial port access to the device through a monitoring role (that is, a non-administrative role) that enables basic health telemetry. This access allows Microsoft to proactively notify the customer of issues unless the customer chooses to restrict this permission.
6363

6464
### Provisioning and decommissioning
6565

66-
After a customer has an approved registration for the Dedicated HSM service, they will be able to create HSM resources (currently via PowerShell or command-line interface and not the Azure portal). The resource goes through an allocation process that maps a physical device in a specified region, to a customer’s pre-defined virtual network (VNet). Once visible on a VNet, the customer can access the device and configure it further as per requirements. Customers access their dedicated HSMs using Thales client software and tools. The resource creation process is supported by Microsoft. Custom configuration process and beyond are supported by Thales. (see Thales support above). When a customer has finished using an HSM, it must be reset (or zeroized) to ensure no persistence of data. The process of resetting the device removes all custom configuration and data. Microsoft deallocates the device and returns it to the pool in a pristine state. This means that when the device is returned to the pool there is no evidence of previous customer activity.
66+
After you register and are approved for the Dedicated HSM service, you can create HSM resources (currently via PowerShell or command-line interface and not the Azure portal). The resource goes through an allocation process that maps a physical device in a specified region to a customer’s pre-defined virtual network (VNet). Once visible on a VNet, you can access the device and configure it further as per requirements. You access your dedicated HSMs by using Thales client software and tools. Microsoft supports the resource creation process. Thales supports the custom configuration process and beyond (see Thales support above). When you finish using an HSM, you must reset (or zeroize) the HSM to ensure no persistence of data. The process of resetting the device removes all custom configuration and data. Microsoft deallocates the device and returns it to the pool in a pristine state. This process ensures that when the device is returned to the pool, there's no evidence of previous customer activity.
6767

68-
### Hardware issues
68+
### Hardware problems
6969

70-
The HSM device has redundant and replaceable power supplies and fan units. However, fan unit removal will still cause a tamper event. When a component failure occurs, Microsoft will use the most appropriate process to address the component level issue in a way that causes minimal interruption and lowest risk to our customers service availability.
71-
Any more serious failure of the device will result in that device being replaced by a new device from the free pool. The customer simply includes the new device in the existing HA pair for it to synchronize and return to full operational state. The failed device will have its data bearing devices removed and shredded on site at the data center.
70+
The HSM device has redundant and replaceable power supplies and fan units. However, fan unit removal still causes a tamper event. When a component failure occurs, Microsoft uses the most appropriate process to address the component-level problem in a way that causes minimal interruption and lowest risk to your service availability.
7271

73-
### Networking issues
72+
#### Power supply events
7473

75-
If customers experience networking access problems to the HSM device, they should contact Microsoft support. A simple test for networking access is to use SSH to connect to the HSM device. If this fails, contact Microsoft support.
74+
The HSM uses dual-PSU redundancy. Transient single-PSU log messages during datacenter maintenance are expected and don't require action or support tickets. For details, see [Power supply redundancy](monitoring.md#power-supply-redundancy).
75+
76+
Any more serious failure of the device results in replacing that device with a new device from the free pool. To synchronize and return to full operational state, include the new device in the existing HA pair. The failed device has its data-bearing devices removed and shredded on site at the datacenter.
77+
78+
### Networking problems
79+
80+
If you experience networking access problems to the HSM device, contact Microsoft support. A simple test for networking access is to use SSH to connect to the HSM device. If this test fails, contact Microsoft support.
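
The SSH test described above can be approximated in a script. The following Python sketch is illustrative only: it opens a TCP connection and checks for an SSH identification banner, which shows network reachability but doesn't authenticate or start a session. The function name `ssh_port_reachable`, the default port 22, and the timeout value are assumptions, not part of the service documentation.

```python
import socket


def ssh_port_reachable(host, port=22, timeout=5.0):
    """Return True if an SSH server answers on host:port.

    Only the first bytes of the server's identification string are
    read; no credentials are sent and no session is established.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.settimeout(timeout)
            banner = conn.recv(64)
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts
        return False
    # SSH servers normally identify themselves with a line such as "SSH-2.0-..."
    return banner.startswith(b"SSH-")
```

A `False` result from a machine inside the virtual network corresponds to the failed SSH test described above. A `True` result only shows that the network path is open and that an SSH service answered.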
7681

7782
## Service level expectations for support
7883

79-
For Microsoft support service levels, refer to the [Azure support plan](https://azure.microsoft.com/support/plans/).
80-
For Thales support service levels, refer to the [Thales Support Essentials](https://azure.microsoft.com/support/plans/).
84+
For Microsoft support service levels, see the [Azure support plan](https://azure.microsoft.com/support/plans/).
85+
For Thales support service levels, see the [Thales Support Essentials](https://azure.microsoft.com/support/plans/).
8186

8287
## Next steps
8388

84-
It is recommended that key concepts such as high availability and security are well understood before device provisioning and application design or deployment.
89+
Before device provisioning and application design or deployment, make sure you understand key concepts such as high availability and security.
8590

8691
* [Deployment Architecture](deployment-architecture.md)
8792
* [High Availability](high-availability.md)

articles/payment-hsm/lifecycle-management.md

Lines changed: 17 additions & 0 deletions
@@ -42,6 +42,23 @@ Microsoft allocates Payment HSMs with a base image by default that includes appr
4242
Microsoft monitors HSM physical health and network connectivity, which includes each HSM’s power, temperature and fan status, out-of-band (OOB) connectivity, tamper status, HOST1/HOST2/MGMT link status, upstream networking, and equipment.
4343

4444
Customers are responsible for monitoring the operational health of their allocated HSMs, which includes HSM error logs and audit logs. Customers can use all payShield monitoring solutions.
45+
### Power supply redundancy
46+
47+
The payShield 10K uses a dual power supply unit (PSU) design for redundancy. Each PSU connects to an independent power feed, allowing the device to operate normally if one PSU experiences a brief outage.
48+
49+
During scheduled datacenter power maintenance, power feeds are serviced one at a time while the other feed remains active, ensuring continuous operation through redundant power. You may see transient single-PSU messages in your HSM logs such as:
50+
51+
```text
52+
Power supply 1 AC outage
53+
Power supply 1 AC restored
54+
```
55+
56+
These messages are expected behavior and don't indicate a hardware fault—the device continues operating normally on the redundant PSU.
57+
58+
> [!IMPORTANT]
59+
> Don't open support tickets or request physical hardware investigation based on single-PSU log messages. Microsoft monitors PSU health and proactively addresses any actual hardware failures. Unnecessary physical intervention can introduce risk to your device's operation.
60+
61+
If our monitoring detects a genuine PSU or fan issue, Microsoft replaces the component without requiring customer action or notification.
4562

4663
## Managing unresponsive HSM devices
4764
