We've all been there: it's supposed to be a relatively simple change and then BOOM! A spanning tree topology change blows up the network :( There is a movement in the data centre space to push the layer 2 boundary down into the host. This avoids both the bandwidth wasted by spanning tree link blocking and the nightmare failure scenarios of technologies such as MLAG and VSS that are meant to "prevent" spanning tree issues.
Current data centre design utilizing MLAG.
Pete Lumbis of Cumulus Networks has written an excellent blog post describing the evolution of data centre design from legacy models to the routed host model. It's a quick read that covers the topic very well.
This post will cover enabling routing on the host by installing FRR on an Ubuntu 16.04 host and configuring BGP peering with Cumulus Linux switches.
Free Range Routing (FRR) is an open source IP routing suite for Linux. FRR is a fork of the Quagga project with over 130 contributors and support from vendors such as Big Switch and Cumulus Networks. FRR aims for seamless integration with the native Linux network stack and currently supports the BGP, OSPF, IS-IS and RIP routing protocols, with support for EIGRP on the way.
For reference the following software will be used in this post.
First up, I will configure the leaf switches. For this task I will use the network command line utility (NCLU).
# Configure interfaces
sudo net add interface swp1 ipv6 nd ra-interval 5
sudo net del interface swp1 ipv6 nd suppress-ra
sudo net add loopback lo ip address 10.2.0.1/32

# Configure BGP
sudo net add bgp autonomous-system 65201
sudo net add bgp router-id 10.2.0.1
sudo net add bgp bestpath as-path multipath-relax
sudo net add bgp bestpath compare-routerid
sudo net add bgp neighbor fabric peer-group
sudo net add bgp neighbor fabric remote-as external
sudo net add bgp neighbor fabric description Internal Fabric Network
sudo net add bgp neighbor fabric capability extended-nexthop
sudo net add bgp neighbor swp1 interface peer-group fabric
sudo net add bgp ipv4 unicast network 10.2.0.1/32
sudo net add bgp ipv6 unicast neighbor fabric activate

# Save configuration
sudo net commit
# Configure interfaces
sudo net add interface swp1 ipv6 nd ra-interval 5
sudo net del interface swp1 ipv6 nd suppress-ra
sudo net add loopback lo ip address 10.2.0.2/32

# Configure BGP
sudo net add bgp autonomous-system 65202
sudo net add bgp router-id 10.2.0.2
sudo net add bgp bestpath as-path multipath-relax
sudo net add bgp bestpath compare-routerid
sudo net add bgp neighbor fabric peer-group
sudo net add bgp neighbor fabric remote-as external
sudo net add bgp neighbor fabric description Internal Fabric Network
sudo net add bgp neighbor fabric capability extended-nexthop
sudo net add bgp neighbor swp1 interface peer-group fabric
sudo net add bgp ipv4 unicast network 10.2.0.2/32
sudo net add bgp ipv6 unicast neighbor fabric activate

# Save configuration
sudo net commit
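The two leaf configurations above differ only in the AS number and the loopback address, so they lend themselves to a small template. The following is a minimal sketch, not part of the original setup: `leaf_nclu_commands` is a hypothetical helper name, and it only prints the NCLU commands (without `sudo`) rather than running them.

```shell
#!/bin/sh
# Hypothetical helper: print the NCLU commands for one leaf switch.
# Arguments: <asn> <loopback-ip> [host-facing port, defaults to swp1]
leaf_nclu_commands() {
    asn="$1"
    loopback="$2"
    port="${3:-swp1}"
    cat <<EOF
net add interface ${port} ipv6 nd ra-interval 5
net del interface ${port} ipv6 nd suppress-ra
net add loopback lo ip address ${loopback}/32
net add bgp autonomous-system ${asn}
net add bgp router-id ${loopback}
net add bgp bestpath as-path multipath-relax
net add bgp bestpath compare-routerid
net add bgp neighbor fabric peer-group
net add bgp neighbor fabric remote-as external
net add bgp neighbor fabric description Internal Fabric Network
net add bgp neighbor fabric capability extended-nexthop
net add bgp neighbor ${port} interface peer-group fabric
net add bgp ipv4 unicast network ${loopback}/32
net add bgp ipv6 unicast neighbor fabric activate
net commit
EOF
}

# Print (do not run) the commands for leaf01 and leaf02.
leaf_nclu_commands 65201 10.2.0.1
leaf_nclu_commands 65202 10.2.0.2
```

After reviewing the output, the commands for a given leaf could be pasted into that switch or piped into `sudo sh` there.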
Alright, let's move on to the host machine. I had to do an apt update first, otherwise I would get an error when installing the required packages.
sudo apt update -y
Install the dependencies.
sudo apt install -y iproute libc-ares2
Download and install the FRR package.
# download
wget https://github.com/FRRouting/frr/releases/download/frr-3.0.2/frr_3.0.2-1-ubuntu16.04.1_amd64.deb

# install
sudo dpkg -i frr_3.0.2-1-ubuntu16.04.1_amd64.deb
Add the uplink interfaces to the /etc/network/interfaces config file.
auto eth1
iface eth1 inet manual

auto eth2
iface eth2 inet manual
Enable the routing daemons in the /etc/frr/daemons config file.
# Change no to yes
zebra=yes
bgpd=yes
Create a file called /etc/frr/frr.conf for the BGP configuration with the following contents.
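If you would rather script the daemons file change than edit it by hand, something like the following sketch could work. It assumes the file contains plain `zebra=no` and `bgpd=no` lines; `enable_frr_daemons` is a hypothetical helper name, and the demo runs against a temporary file rather than the real /etc/frr/daemons.

```shell
#!/bin/sh
# Hypothetical helper: set zebra and bgpd to "yes" in an FRR daemons file.
# Keeps a copy of the original next to it as <file>.bak.
enable_frr_daemons() {
    daemons_file="$1"
    cp "$daemons_file" "$daemons_file.bak" || return 1
    sed -e 's/^zebra=no$/zebra=yes/' -e 's/^bgpd=no$/bgpd=yes/' \
        "$daemons_file.bak" > "$daemons_file"
}

# Demo on a throwaway file so nothing under /etc is touched.
demo="$(mktemp)"
printf 'zebra=no\nbgpd=no\nospfd=no\n' > "$demo"
enable_frr_daemons "$demo"
cat "$demo"
rm -f "$demo" "$demo.bak"
```

Pointing the function at /etc/frr/daemons needs root privileges, and other daemons such as ospfd are left untouched.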
!
service integrated-vtysh-config
!
int lo
 ip address 10.3.0.1/32
 ip address 188.8.131.52/32
!
interface eth1
 ipv6 nd ra-interval 5
 no ipv6 nd suppress-ra
!
interface eth2
 ipv6 nd ra-interval 5
 no ipv6 nd suppress-ra
!
router bgp 65301
 bgp router-id 10.3.0.1
 bgp bestpath as-path multipath-relax
 bgp bestpath compare-routerid
 neighbor fabric peer-group
 neighbor fabric remote-as external
 neighbor fabric description Internal Fabric Network
 neighbor fabric capability extended-nexthop
 neighbor eth1 interface peer-group fabric
 neighbor eth2 interface peer-group fabric
 !
 address-family ipv4 unicast
  network 10.3.0.1/32
  network 184.108.40.206/32
  neighbor fabric prefix-list host-routes-out out
 exit-address-family
 !
 address-family ipv6 unicast
  neighbor fabric activate
 exit-address-family
!
ip prefix-list host-routes-out seq 100 permit 10.3.0.1/32
ip prefix-list host-routes-out seq 200 permit 220.127.116.11/32
ip prefix-list host-routes-out seq 300 deny 0.0.0.0/0 le 32
!
line vty
!
end
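The outbound prefix list in the configuration above permits only the host's own loopback addresses and denies everything else, so the host never re-advertises routes it learned from one leaf to the other. As a rough sketch of how those lines could be generated for any number of host addresses, here is a hypothetical `host_prefix_list` helper; the name and the step of 100 between sequence numbers are my own choices, following the pattern in the config.

```shell
#!/bin/sh
# Hypothetical helper: print an "only my own prefixes" outbound prefix list.
# Arguments: <list-name> <prefix> [<prefix> ...]
host_prefix_list() {
    name="$1"
    shift
    seq=100
    for prefix in "$@"; do
        echo "ip prefix-list ${name} seq ${seq} permit ${prefix}"
        seq=$((seq + 100))
    done
    # Final catch-all: deny every other IPv4 prefix of any length.
    echo "ip prefix-list ${name} seq ${seq} deny 0.0.0.0/0 le 32"
}

# Example: a list that only permits the first loopback from this post.
host_prefix_list host-routes-out 10.3.0.1/32
```

Additional loopback addresses are passed as extra arguments, and the printed lines can be pasted into /etc/frr/frr.conf.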
Restart the networking and frr services.
sudo systemctl restart networking.service
sudo systemctl restart frr.service
Did you notice that no IP addresses were configured for the BGP peering? This is possible with the use of BGP unnumbered. BGP unnumbered builds on RFC 5549, which allows an IPv4 route to be advertised with an IPv6 next hop. The above configuration uses IPv6 link-local addresses, learned dynamically through neighbor discovery, as the BGP peering addresses. More info can be found here and here.
Alright, that's all the config out of the way. Let's move on to verifying the operation.
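IPv6 link-local addresses all fall inside fe80::/10, which is why the next hops in an unnumbered setup start with fe80. As a small illustration, the sketch below checks whether an address is link-local; `is_link_local` is a hypothetical helper of my own, and it only inspects the textual prefix of the address.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the IPv6 address is in fe80::/10.
# fe80::/10 covers first hextets fe80 through febf.
is_link_local() {
    case "$(printf '%s' "$1" | tr 'A-F' 'a-f')" in
        fe[89ab][0-9a-f]:*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example with a link-local next hop and a documentation-range address.
for addr in fe80::2ab7:adff:fe23:355e 2001:db8::1; do
    if is_link_local "$addr"; then
        echo "$addr is link-local"
    else
        echo "$addr is not link-local"
    fi
done
```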
Verify BGP peering is up and prefixes are learned.
host01# show bgp ipv4 unicast summary
# output
BGP router identifier 10.3.0.1, local AS number 65301 vrf-id 0
BGP table version 4
RIB entries 7, using 952 bytes of memory
Peers 2, using 42 KiB of memory
Peer groups 1, using 72 bytes of memory

Neighbor  V     AS  MsgRcvd  MsgSent  TblVer  InQ  OutQ  Up/Down   State/PfxRcd
eth1      4  65201       62       62       0    0     0  00:02:51             1
eth2      4  65202       61       61       0    0     0  00:02:50             1

Total number of neighbors 2
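In the summary output, a number in the State/PfxRcd column means the session is established and shows how many prefixes were received; a state name such as Active or Idle means the session is not up. A check along these lines can be scripted, as in the sketch below. `bgp_peers_established` is a hypothetical helper that reads the summary text on stdin, and the demo feeds it the neighbor table from the output above.

```shell
#!/bin/sh
# Hypothetical helper: read "show bgp ipv4 unicast summary" output on stdin,
# report each neighbor, and fail if any neighbor is not established
# (or if no neighbors were found at all).
bgp_peers_established() {
    awk '
        $1 == "Neighbor"          { in_table = 1; next }
        in_table && NF == 0       { in_table = 0 }
        in_table && $1 == "Total" { in_table = 0 }
        in_table {
            seen++
            state = $NF
            if (state ~ /^[0-9]+$/) {
                printf "%s established, %s prefixes received\n", $1, state
            } else {
                printf "%s NOT established (%s)\n", $1, state
                bad = 1
            }
        }
        END { if (bad || seen == 0) exit 1 }
    '
}

# Demo using the neighbor table shown in this post.
bgp_peers_established <<'EOF'
Neighbor  V     AS  MsgRcvd  MsgSent  TblVer  InQ  OutQ  Up/Down   State/PfxRcd
eth1      4  65201       62       62       0    0     0  00:02:51             1
eth2      4  65202       61       61       0    0     0  00:02:50             1

Total number of neighbors 2
EOF
```

On a live host the input would come from something like `sudo vtysh -c 'show bgp ipv4 unicast summary' | bgp_peers_established`.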
Confirm the prefixes being learned via BGP.
host01# show ip route
# output
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, P - PIM, N - NHRP, T - Table,
       v - VNC, V - VNC-Direct,
       > - selected route, * - FIB route

K>* 0.0.0.0/0 via 192.168.121.1, eth0
B>* 10.2.0.1/32 [20/0] via fe80::2ab7:adff:fe23:355e, eth1, 00:06:24
B>* 10.2.0.2/32 [20/0] via fe80::2ab7:adff:fe72:e8e1, eth2, 00:06:23
C>* 10.3.0.1/32 is directly connected, lo
C>* 18.104.22.168/32 is directly connected, lo
C>* 192.168.121.0/24 is directly connected, eth0
Good, we are learning the loopback IP addresses of leaf01/2 which is what we expect.
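To pull just the BGP-learned prefixes out of the route table, a one-line awk filter is enough. The sketch below is an illustration only: `bgp_routes` is a hypothetical helper that reads `show ip route` output on stdin and prints each selected BGP prefix with its outgoing interface. It assumes single-path routes like the ones above and does not handle the continuation lines used for multipath routes.

```shell
#!/bin/sh
# Hypothetical helper: read "show ip route" output on stdin and print
# "<prefix> <interface>" for every selected BGP route (lines starting "B>*").
bgp_routes() {
    awk '$1 == "B>*" { iface = $6; sub(/,$/, "", iface); print $2, iface }'
}

# Demo using routes from the host01 output in this post.
bgp_routes <<'EOF'
K>* 0.0.0.0/0 via 192.168.121.1, eth0
B>* 10.2.0.1/32 [20/0] via fe80::2ab7:adff:fe23:355e, eth1, 00:06:24
B>* 10.2.0.2/32 [20/0] via fe80::2ab7:adff:fe72:e8e1, eth2, 00:06:23
C>* 10.3.0.1/32 is directly connected, lo
EOF
```

On the host itself the input could come from `sudo vtysh -c 'show ip route' | bgp_routes`.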
As a final test, let's check that we are learning the loopback addresses from host01 on leaf01.
vagrant@leaf01:~$ sudo net show route

show ip route
=============
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, P - PIM, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel,
       > - selected route, * - FIB route

K>* 0.0.0.0/0 via 192.168.121.1, eth0
C>* 10.2.0.1/32 is directly connected, lo
B>* 10.3.0.1/32 [20/0] via fe80::2ab7:adff:fe30:c4, swp1, 03:13:24
B>* 22.214.171.124/32 [20/0] via fe80::2ab7:adff:fe30:c4, swp1, 03:13:24
C>* 192.168.121.0/24 is directly connected, eth0
leaf01 is learning the loopback addresses advertised by host01. That looks like success to me!
Routing on the host is a nice solution to those pesky layer 2 problems we get in the networking world, and with the rise of micro-service architectures it makes a lot of sense as a data centre design choice.