Maintenance Ceph cluster

19-12-2019 00:00:00 - 19-12-2019 07:00:00

Urgency: Planned

Affected services:
- Shared Linux Hosting;
- VNC (console) access to virtual machines in the BIT portal;
- Customers with shared filesystem on CephFS.

Expected impact:
- Websites on the Shared Linux Hosting platform might be unreachable for a short period of time;
- The upload server of the Shared Linux Hosting platform will be unreachable for a short period of time;
- In the BIT portal it will not be possible to obtain (VNC) console access to virtual machines for a short period of time;
- Customers who use a shared filesystem on CephFS will experience brief interruptions.

Customer intervention required: No

Reference number: 166004

Summary:
The Ceph cluster will be updated. We will change CRUSH rules to move all CephFS metadata to NVMe storage, making CephFS as fast as possible. For the same reason, three storage nodes will have their management network patched to a different network device.

Details:
The cluster will be upgraded to Ceph version 13.2.8. The latest OS updates will also be installed during this maintenance. To make CephFS as fast as possible, a CRUSH rule change will be applied to move the CephFS metadata pool to NVMe storage. This change is applied online and should not have any impact. Three storage nodes will have their management network interface patched to an onboard NIC instead of an add-in card. These servers will therefore be rebooted one by one. During the maintenance, the shared filesystem, CephFS, will be briefly unavailable. Changes have been backported that should improve the stability of the MDS (and of CephFS) during incidents. The impact of having a large cache should be greatly reduced. Future maintenance on CephFS (MDS) should have less impact.
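
For readers curious what a CRUSH rule change of this kind typically looks like, the following is a minimal sketch, not the exact procedure used during this maintenance. It only builds the two standard Ceph CLI invocations: one that creates a replicated CRUSH rule limited to a device class, and one that assigns that rule to a pool. The pool name "cephfs_metadata", the rule name "replicated_nvme", the CRUSH root "default", the failure domain "host" and the device class "nvme" are all assumptions for illustration and are not taken from this notice.

```python
def crush_move_commands(pool, rule, root="default",
                        failure_domain="host", device_class="nvme"):
    """Build the ceph CLI commands (as argv lists) that move a pool to a device class.

    All argument values are illustrative assumptions; nothing is executed here.
    """
    return [
        # Create a replicated CRUSH rule that only selects OSDs of the given device class.
        ["ceph", "osd", "crush", "rule", "create-replicated",
         rule, root, failure_domain, device_class],
        # Assign the rule to the pool; Ceph then rebalances the pool's data while online.
        ["ceph", "osd", "pool", "set", pool, "crush_rule", rule],
    ]


if __name__ == "__main__":
    # Hypothetical pool and rule names, printed for review rather than run.
    for argv in crush_move_commands("cephfs_metadata", "replicated_nvme"):
        print(" ".join(argv))
```

The sketch prints the commands instead of running them, so it can be reviewed without access to a cluster. It assumes the NVMe OSDs already carry the "nvme" device class.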