Ceph’s Software Defined Storage for all possible storage flavours
It might have passed you by, but BIT has traded in all its storage appliances for something really hip. That hip thing is called Ceph, and it is now the basis for every flavour of storage we serve. Earlier this year we closed the era of proprietary storage and entered the age of Software Defined Storage (SDS).
With the rise of virtual machines we wanted to build a base of stable storage, and we chose a NetApp metro cluster. Despite this proven technical implementation, we had to invest heavily to keep supporting the solution: we encountered numerous bugs and had our hands full keeping the whole thing stable. The insurmountable problem, however, was performance. Despite adding disk capacity (more spindles), performance did not increase. The filer could not handle the growing IO demand, which led to longer queues for IO operations, better known as latency. This is a known problem with storage systems that can only scale vertically: the controller becomes the bottleneck. The ‘solution’ is to replace the controller with a more powerful model, with correspondingly higher licence and support costs. That is not scaling, neither economically nor performance-wise.
Future of storage
We think the future of storage lies in distributed software storage solutions built on standard hardware, where every added server increases both capacity and performance. This is ‘horizontal’ scaling: such solutions can grow and shrink with demand, even if shrinking hardly ever happens in practice.
One such solution is Ceph, which is also open source software. In contrast with many other solutions, Ceph has no “Single Point Of Failure” (SPOF), which is of course very desirable for systems built for high availability and reliability. It uses ingenious mathematical algorithms to read and write data in the cluster, so there is no need for an omniscient controller that can become a bottleneck. The data is stored as objects. This does not mean Ceph cannot fail: it is still software and can still have bugs, and those bugs can make certain objects unavailable in every data center (failure domain), which has regrettably happened to us before.
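The key idea behind those algorithms (CRUSH, in Ceph’s case) is that any client can *compute* where an object lives from the object name and the cluster map alone, instead of asking a central controller. The sketch below is not the real CRUSH algorithm, just a toy rendezvous-hash illustration of that principle; the OSD names are made up.

```python
import hashlib

def place_object(obj_name: str, osds: list[str], replicas: int = 3) -> list[str]:
    """Toy sketch of computed placement: rank each OSD by a hash of
    (object name, OSD name) and take the top `replicas` entries.
    Every client that runs this gets the same answer, so no central
    lookup table is needed -- the core idea behind CRUSH."""
    ranked = sorted(
        osds,
        key=lambda osd: hashlib.sha256(f"{obj_name}:{osd}".encode()).hexdigest(),
    )
    return ranked[:replicas]

# Hypothetical cluster of six OSDs (object storage daemons):
osds = ["osd.0", "osd.1", "osd.2", "osd.3", "osd.4", "osd.5"]

# Any client computes the same three placement targets independently:
print(place_object("my-object", osds))
```

Because placement is a pure function of the inputs, adding or removing an OSD only changes the map; there is no state to synchronise through a controller.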
Another advantage of Ceph is that you know where your data is: on which disk, server, rack and data center each object is located. By means of placement rules you can instruct Ceph to make multiple copies of an object and tell it where to place them. BIT uses this to make at least two replicas of each object and place them in different data centers, so the data is stored in three different locations.
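Placement rules like this are expressed as rules in the CRUSH map. A rule along the following lines (the name and id are illustrative, not our actual configuration) tells Ceph to pick each replica under a different data center:

```
rule replicate-across-datacenters {
    id 1
    type replicated
    step take default
    step chooseleaf firstn 0 type datacenter
    step emit
}
```

With the pool’s replica count set to 3, the `chooseleaf … type datacenter` step makes the data center the failure domain, so no two copies of an object ever share one.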
Correlated failures in one data center (think of cooling or power outages) will not lead to downtime, because the cluster in the other two data centers (the majority) keeps functioning. This has happened before: during our half-yearly black building test, the power for ⅓ of the Ceph cluster failed. The cluster kept running on the majority of the nodes and, once power was restored, recovered automatically without the help of engineers.
When it comes to the possibilities it provides, Ceph can be compared to a Swiss army knife. Block, file and object based interfaces are available, so you can use it as storage for your virtual machines, as a shared file system and as an S3 alternative. If you want full control, you can program against the cluster directly with librados (bindings are available for multiple languages). We do not offer this last option ourselves, however, because it can introduce risks in multi-tenant systems.
In future blog posts we will dive deeper into how the cluster works and what the possibilities are.
By: Stefan Kooman