BIT's new access network
BIT’s network has grown rapidly over the last few years, as have the number of clients and their bandwidth usage. High time, we say, to implement a number of major changes in the network to accommodate further growth. By applying new techniques, the network will become more robust and easier for our clients to use (for example, with multiple uplinks).
We can distinguish between two parts of the network: the core network, which carries traffic between the different data centers and connects BIT to the internet, and the access network, which connects BIT to its clients and on which BIT provides various services.
The core network must be fast and reliable, with little complexity in the configuration. The access network, apart from being fast and reliable, needs to be able to meet the demands and wishes of a wide range of clients. It needs to be able to support a wide variety of configurations, protocols and brands of hardware.
Goals for a new access network
We started a project in 2016 to renew the access network. The goals we defined for it are as follows:
- The capacity of the network needs to be increased in order to continue to meet the increase in the need for bandwidth;
- The availability of the network needs to be increased, also during DDoS attacks;
- We continue to offer support for all possible ways in which clients use our network now;
- We want to be able to apply new technologies to provide future services;
- We want to be able to use automation more often to manage the configuration of network equipment.
Extensive research was performed in 2016 and 2017 to find out what the various switch manufacturers could offer to reach these goals, and five brands were tested. In the end we opted for a solution based on Arista switches. Their data center switches allow us to reach the goals listed above. These switches also have more ports, with a higher capacity, than the old switches.
Design of a new network
With the arrival of the new switches, we had to make a new network design that applies a number of new technologies such as VXLAN, L3VPN and streaming telemetry. You can find more information about these in a blog post that will be published later. The design of the new network results in a number of big changes. Instead of using a few big switches that house blades connecting several hundred client ports, we will be using a large number of smaller switches. This has a few advantages: we can create a redundant connection for clients more easily without needing connections between different data centers, and the impact of a failure in one of the access switches is a lot smaller. It also makes it easier to distribute switches across the server room, which simplifies the wiring.
But of course a large number of switches also comes with some challenges: more switches need to be managed, which, if done manually, would take more time and be more prone to mistakes. That is why we have invested a lot of time in further automating the switch configuration. This involves intensive use of Ansible, a tool for automating the management of systems. The big advantage of Ansible is that it is not only available for Arista switches, but can also be applied to Linux and Windows servers and to network equipment of several other brands. This simplifies integration, since we already use Ansible for other systems and equipment.
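The idea behind this kind of automation can be illustrated with a small Python sketch: one inventory describes all switches, and the configuration text for each switch is rendered from it instead of being typed by hand. This is not Ansible and not BIT's actual tooling; the hostnames, port names and the configuration syntax are invented for illustration only.

```python
from string import Template

# Hypothetical inventory: hostnames, sites and port names are made up.
INVENTORY = {
    "leaf-1a": {"site": "BIT-1", "uplinks": ["Ethernet49", "Ethernet50"]},
    "leaf-1b": {"site": "BIT-1", "uplinks": ["Ethernet49", "Ethernet50"]},
    "leaf-2a": {"site": "BIT-2A", "uplinks": ["Ethernet49", "Ethernet50"]},
}

# Illustrative, generic-looking config text; not verified device syntax.
INTERFACE_TEMPLATE = Template(
    "interface $port\n"
    "   description uplink-$index $site"
)


def render_config(hostname, facts):
    """Render the configuration text for a single switch."""
    lines = [f"hostname {hostname}"]
    for index, port in enumerate(facts["uplinks"], start=1):
        lines.append(
            INTERFACE_TEMPLATE.substitute(
                port=port, index=index, site=facts["site"]
            )
        )
    return "\n".join(lines) + "\n"


def render_all(inventory):
    """Render every switch from the same inventory, in a stable order."""
    return {
        host: render_config(host, facts)
        for host, facts in sorted(inventory.items())
    }


if __name__ == "__main__":
    for host, config in render_all(INVENTORY).items():
        print(config)
```

Because every configuration is derived from the same data, adding a switch means adding one inventory entry, and rendering twice from the same inventory gives the same result.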
After an extensive period of development and testing, the new access network was put into use in January 2018, using a spine-leaf topology. A spine switch has been placed in BIT-1 and in BIT-2A, and a number of leaf switches in every data center. The leaf switches are used to connect clients and BIT servers and to connect to BIT’s core routers. The spine switches connect all leaf switches to each other, so that a leaf switch does not need one or more connections to every other leaf switch.
The applied topology always works with pairs of leaf switches. The diagram depicts this topology, leaving out a large number of the leaf switches (including the leaf switches in BIT-2B and BIT-2C) for simplicity.
The two switches in a pair are connected to each other with two connections, and each switch in the pair has a connection to both spine switches. This way, each leaf switch has four connections to the rest of BIT’s network. Each of these connections has a capacity of 40 Gb/s and all four connections can carry traffic simultaneously. This greatly increases the capacity of the access network. Failure of a connection is detected very quickly, so a connection can go down without a noticeable interruption for clients.
To demonstrate how quickly the network handles a failed connection, we have made this video:
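The benefit of the spine switches can be made concrete with a little arithmetic. The sketch below compares the number of links needed when every leaf switch connects directly to every other leaf switch with the number needed when every leaf switch connects only to the two spine switches. The leaf counts used are hypothetical and do not reflect the actual size of BIT's network.

```python
def full_mesh_links(leaves):
    """Links needed if every leaf connects directly to every other leaf."""
    return leaves * (leaves - 1) // 2


def spine_leaf_links(leaves, spines=2):
    """Links needed if every leaf connects once to each spine switch."""
    return leaves * spines


if __name__ == "__main__":
    # Hypothetical leaf counts, chosen only to show how the numbers grow.
    for leaves in (4, 10, 20):
        print(
            f"{leaves} leaves: "
            f"full mesh {full_mesh_links(leaves)} links, "
            f"spine-leaf {spine_leaf_links(leaves)} links"
        )
```

For small networks a full mesh needs fewer links, but its link count grows quadratically with the number of leaf switches, while the spine-leaf count grows linearly.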
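As a rough check of these numbers, the following Python sketch models a single pair of leaf switches and adds up the link capacity per switch. The switch names are hypothetical, and the model only counts raw link capacity; it says nothing about how traffic is actually balanced over the links.

```python
from collections import defaultdict

LINK_GBPS = 40                    # capacity per connection, as stated above
SPINES = ("spine-1", "spine-2")   # hypothetical names for the two spines


def build_pair(leaf_a, leaf_b):
    """Return the links for one leaf pair: two between the leaves,
    plus one from each leaf to each spine switch."""
    links = [(leaf_a, leaf_b), (leaf_a, leaf_b)]
    for leaf in (leaf_a, leaf_b):
        for spine in SPINES:
            links.append((leaf, spine))
    return links


def capacity_per_switch(links):
    """Sum the capacity (in Gb/s) of all links attached to each switch."""
    totals = defaultdict(int)
    for end_a, end_b in links:
        totals[end_a] += LINK_GBPS
        totals[end_b] += LINK_GBPS
    return dict(totals)


links = build_pair("leaf-a", "leaf-b")
capacity = capacity_per_switch(links)

# Simulate the failure of one leaf-to-spine connection.
degraded = list(links)
degraded.remove(("leaf-a", "spine-1"))
degraded_capacity = capacity_per_switch(degraded)

if __name__ == "__main__":
    print("leaf-a, all links up:", capacity["leaf-a"], "Gb/s")
    print("leaf-a, one link down:", degraded_capacity["leaf-a"], "Gb/s")
```

In this model each leaf switch has four links, so 160 Gb/s in total, and still has three working links (120 Gb/s) after a single connection fails.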
Moving to the new network
After deploying the new network, we first moved all of BIT’s own servers and services to it. After that we started relocating the large number of client connections. Every move requires proper preparation and has to be carried out in the nightly maintenance window. Because of all the checks before and after the physical move of a client's connections, the number of clients that can be moved each night is limited, but we have been able to move a large number of the connections by now. It will still take some time to phase out the entire old network.
We will discuss the technical details of the new setup and the possibilities it offers our clients in a later blog post.
By: Teun Vink