Emergency maintenance on shared filesystem (cephfs)

24-11-2019 15:32:09 - 24-11-2019 23:59:59

Urgency: Emergency maintenance
Affected services:
- Shared Linux Hosting
- VNC (console) access to virtual machines in the BIT portal
- Customers using the shared filesystem on cephfs
Expected impact:
- Websites on the Shared Linux Hosting platform may be unreachable for a short period of time.
- The upload server of the Shared Linux Hosting platform will be unreachable for a short period of time.
- In the BIT portal it will not be possible to obtain (VNC) console access to virtual machines.
- Customers using the shared filesystem cephfs will experience short interruptions while the filesystem fails over.
Customer intervention required: No
Reference number: 165777
Contact: +31 318 648 688, support@bit.nl

Summary:
The standby MDS server (part of the cephfs cluster) will be re-installed and provisioned with a modified version of the cephfs (MDS) software, so that debug information can be collected if the MDS server crashes. While the standby MDS server is under maintenance, the cephfs cluster is not redundant.

Details:
This emergency maintenance is not being carried out during the normal maintenance window (00:00 - 07:00 hrs), because that window overlaps with the period in which the MDS server(s) are used most intensively (the backup period). We have therefore decided to carry out this maintenance during low-traffic hours.

During the maintenance, the shared filesystem cephfs will fail over at least once, and possibly several times. During each failover the filesystem will be briefly unavailable.

The standby MDS server will be re-installed and provisioned with a modified version of the cephfs (MDS) software, so that debug information can be collected should a crash like the one some time ago (https://www.bit.nl/news/2509/91/Outage-storage-systems) happen again.

As soon as the standby server has been fully re-installed and provisioned, the amount of cache in use on the currently active MDS server will be significantly reduced. This will be done in small steps to minimize the impact, and it should also result in a faster failover to the standby MDS server. While the standby MDS server is under maintenance, the cephfs cluster is temporarily not redundant.