Emergency maintenance iSCSI storage gateway servers

23-04-2021 12:30:00 - 23-04-2021 18:00:00

Urgency: emergency
Affected services:
- iSCSI LUNs
- MS SQL instances
- MS SQL databases
- Websites on the load-balanced shared Windows hosting platform
Expected impact:
- iSCSI LUNs might briefly stop responding
- MS SQL instances might briefly stop responding
- MS SQL databases might briefly stop responding
- Websites on the load-balanced shared Windows hosting platform might briefly stop responding
Customer intervention required: no
Reference number: 173274


Summary:

The disruption on the storage gateway servers on April 22nd was caused by a kernel bug. During this maintenance we will restart the storage gateway servers with the previous kernel version, on which we did not trigger this bug. We are carrying this out as emergency maintenance because the bug can be triggered at any time. Before a server is restarted, its iSCSI targets will be switched over to another gateway server. During a switch-over, the iSCSI targets can be unavailable for several seconds, causing a short stall for iSCSI LUNs. MS SQL instances, MS SQL databases and websites on the shared Windows hosting platform use iSCSI backend storage, so these services could experience a short disruption due to these stalls.

Details:

The disruption on the storage gateway servers was caused by a kernel bug that we hit on one of the storage gateway servers. As a result, a scenario unfolded in which all gateway servers were unreachable for a short period of time. The kernel in which we hit this bug was installed during the maintenance on April 8th. During this emergency maintenance we will downgrade to the kernel that was installed before that maintenance.

Before we restart a storage gateway, the iSCSI targets on that server will be moved to a different server. During these switch-overs, an iSCSI target can be briefly unavailable to an iSCSI initiator. This could lead to a short IO stall on iSCSI LUNs until the target becomes available again. Customers with an iSCSI LUN may experience this as short stalls. Because we use iSCSI storage as backend storage for our MS SQL clusters, customers with an MS SQL instance or an MS SQL database may also experience a short stall on these services. The load-balanced shared Windows hosting platform also uses iSCSI storage in its cluster, so websites on this platform may experience a short stall too.