published: 5th of January 2018
We've all been there: it's supposed to be a relatively simple change and then BOOM! A spanning tree topology change blows up the network :( There is a movement in the data centre space to push the layer 2 boundary down into the host. This avoids both the bandwidth wasted by spanning tree link blocking and the nightmare failure scenarios of technologies such as MLAG and VSS that are meant to "prevent" spanning tree issues.
Current data centre design utilizing MLAG.
Pete Lumbis of Cumulus Networks has written an excellent blog post describing the evolution of data centre design from legacy models to the routed host model. It's a quick read that covers the topic very well.
This post will cover enabling routing on the host by installing FRR on an Ubuntu 16.04 host and configuring BGP peering with Cumulus Linux switches.
Free Range Routing (FRR) is an open source IP routing suite for Linux. FRR is a fork of the Quagga project with over 130 contributors and support from vendors such as Big Switch and Cumulus Networks. FRR aims for seamless integration with the native Linux network stack and currently supports the BGP, OSPF, IS-IS and RIP routing protocols, with support for EIGRP on the way.
For reference the following software will be used in this post.
First up I will configure the leaf switches. For this task I will use the Network Command Line Utility (NCLU).
# leaf01: configure interfaces
sudo net add interface swp1 ipv6 nd ra-interval 5
sudo net del interface swp1 ipv6 nd suppress-ra
sudo net add loopback lo ip address 10.2.0.1/32
# Configure BGP
sudo net add bgp autonomous-system 65201
sudo net add bgp router-id 10.2.0.1
sudo net add bgp bestpath as-path multipath-relax
sudo net add bgp bestpath compare-routerid
sudo net add bgp neighbor fabric peer-group
sudo net add bgp neighbor fabric remote-as external
sudo net add bgp neighbor fabric description Internal Fabric Network
sudo net add bgp neighbor fabric capability extended-nexthop
sudo net add bgp neighbor swp1 interface peer-group fabric
sudo net add bgp ipv4 unicast network 10.2.0.1/32
sudo net add bgp ipv6 unicast neighbor fabric activate
# Save configuration
sudo net commit
# leaf02: configure interfaces
sudo net add interface swp1 ipv6 nd ra-interval 5
sudo net del interface swp1 ipv6 nd suppress-ra
sudo net add loopback lo ip address 10.2.0.2/32
# Configure BGP
sudo net add bgp autonomous-system 65202
sudo net add bgp router-id 10.2.0.2
sudo net add bgp bestpath as-path multipath-relax
sudo net add bgp bestpath compare-routerid
sudo net add bgp neighbor fabric peer-group
sudo net add bgp neighbor fabric remote-as external
sudo net add bgp neighbor fabric description Internal Fabric Network
sudo net add bgp neighbor fabric capability extended-nexthop
sudo net add bgp neighbor swp1 interface peer-group fabric
sudo net add bgp ipv4 unicast network 10.2.0.2/32
sudo net add bgp ipv6 unicast neighbor fabric activate
# Save configuration
sudo net commit
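The two leaf configurations differ only in the loopback address and the AS number, so they can be generated rather than typed twice. Here is a rough sketch, assuming the numbering used in this lab (loopback 10.2.0.N/32 and AS 6520N for leaf N). The function name `leaf_nclu_commands` is made up for this example and is not part of NCLU.

```shell
# Print the NCLU commands for leaf number N (1 = leaf01, 2 = leaf02).
# Assumption: loopback 10.2.0.N/32 and AS 6520N, as in this lab.
leaf_nclu_commands() {
    n=$1
    loopback="10.2.0.${n}"
    asn=$((65200 + n))
    cat <<EOF
sudo net add interface swp1 ipv6 nd ra-interval 5
sudo net del interface swp1 ipv6 nd suppress-ra
sudo net add loopback lo ip address ${loopback}/32
sudo net add bgp autonomous-system ${asn}
sudo net add bgp router-id ${loopback}
sudo net add bgp bestpath as-path multipath-relax
sudo net add bgp bestpath compare-routerid
sudo net add bgp neighbor fabric peer-group
sudo net add bgp neighbor fabric remote-as external
sudo net add bgp neighbor fabric description Internal Fabric Network
sudo net add bgp neighbor fabric capability extended-nexthop
sudo net add bgp neighbor swp1 interface peer-group fabric
sudo net add bgp ipv4 unicast network ${loopback}/32
sudo net add bgp ipv6 unicast neighbor fabric activate
sudo net commit
EOF
}

# Print the commands for leaf01 so they can be reviewed before use.
leaf_nclu_commands 1
```

The function only prints the commands. Pasting them into the switch (or piping them to a shell on it) is left as a deliberate manual step.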
Alright, let's move on to the host machine. I had to do an apt update first, otherwise I would get an error when installing the required packages.
sudo apt update -y
Install the dependencies.
sudo apt install -y iproute libc-ares2
Download and install the FRR package.
# download
wget https://github.com/FRRouting/frr/releases/download/frr-3.0.2/frr_3.0.2-1-ubuntu16.04.1_amd64.deb
# install
sudo dpkg -i frr_3.0.2-1-ubuntu16.04.1_amd64.deb
Add the uplink interfaces to the /etc/network/interfaces config file.
auto eth1
iface eth1 inet manual
auto eth2
iface eth2 inet manual
Enable the routing daemons in the /etc/frr/daemons config file.
# Change no to yes
zebra=yes
bgpd=yes
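If you would rather script this edit than open an editor, something like the following should do it. It is a minimal sketch that assumes the file contains plain `zebra=no` and `bgpd=no` lines. The function name `enable_frr_daemons` is hypothetical, and the function needs to run as root when pointed at the real file.

```shell
# Flip zebra and bgpd from "no" to "yes" in an FRR daemons file.
# Assumption: the file has lines starting with "zebra=no" and "bgpd=no".
enable_frr_daemons() {
    file=${1:-/etc/frr/daemons}
    tmp=$(mktemp)
    # Edit a copy, then write it back in place so the original file's
    # ownership and permissions are left alone.
    sed -e 's/^zebra=no/zebra=yes/' -e 's/^bgpd=no/bgpd=yes/' "$file" > "$tmp" &&
        cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

All other daemons in the file are left untouched, so only zebra and bgpd get started.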
Create a file called /etc/frr/frr.conf for the BGP configuration with the following contents.
!
service integrated-vtysh-config
!
int lo
ip address 10.3.0.1/32
ip address 30.30.30.30/32
!
interface eth1
ipv6 nd ra-interval 5
no ipv6 nd suppress-ra
!
interface eth2
ipv6 nd ra-interval 5
no ipv6 nd suppress-ra
!
router bgp 65301
bgp router-id 10.3.0.1
bgp bestpath as-path multipath-relax
bgp bestpath compare-routerid
neighbor fabric peer-group
neighbor fabric remote-as external
neighbor fabric description Internal Fabric Network
neighbor fabric capability extended-nexthop
neighbor eth1 interface peer-group fabric
neighbor eth2 interface peer-group fabric
!
address-family ipv4 unicast
network 10.3.0.1/32
network 30.30.30.30/32
neighbor fabric prefix-list host-routes-out out
exit-address-family
!
address-family ipv6 unicast
neighbor fabric activate
exit-address-family
!
ip prefix-list host-routes-out seq 100 permit 10.3.0.1/32
ip prefix-list host-routes-out seq 200 permit 30.30.30.30/32
ip prefix-list host-routes-out seq 300 deny 0.0.0.0/0 le 32
!
line vty
!
end
Restart the networking and frr services.
sudo systemctl restart networking.service
sudo systemctl restart frr.service
Did you notice that no IP addresses were configured for the BGP peering? This is possible with BGP unnumbered, which builds on RFC 5549 to advertise IPv4 routes with an IPv6 next hop. The configuration above uses IPv6 router advertisements on each link to discover the neighbor's link-local address, which is then used as the BGP peering address. More info can be found here and here.
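To make those link-local peering addresses a little less mysterious: the next hops in the outputs further down all contain the `ff:fe` pattern of the modified EUI-64 format, where the interface MAC address is stretched into the lower 64 bits of the address. Below is a small sketch of that derivation. It assumes EUI-64 address generation is in use (Linux can be configured to build link-local addresses in other ways), and the MAC address in the example is hypothetical, picked so the result matches one of the next hops seen on host01.

```shell
# Derive an IPv6 link-local address from a MAC address (modified EUI-64).
# Assumption: the interface uses EUI-64 address generation.
mac_to_link_local() {
    # Split "aa:bb:cc:dd:ee:ff" into six positional parameters.
    old_ifs=$IFS
    IFS=:
    set -- $1
    IFS=$old_ifs

    # Flip the universal/local bit (0x02) in the first octet.
    first=$(( 0x$1 ^ 0x02 ))

    # Insert ff:fe between the two halves of the MAC. printf %x drops
    # leading zeros in each 16-bit group, as usual IPv6 notation does.
    printf 'fe80::%x:%x:%x:%x\n' \
        "$(( (first << 8) | 0x$2 ))" \
        "$(( (0x$3 << 8) | 0xff ))" \
        "$(( 0xfe00 | 0x$4 ))" \
        "$(( (0x$5 << 8) | 0x$6 ))"
}

# Hypothetical MAC address, not taken from the lab.
mac_to_link_local 28:b7:ad:23:35:5e   # prints fe80::2ab7:adff:fe23:355e
```

Because the address comes from the MAC, nothing has to be configured by hand on either end of the link, which is what makes the unnumbered peering possible.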
Alright, that's all the config out of the way. Let's move on to verifying the operation.
Verify that BGP peering is up and prefixes are being learned.
host01# show bgp ipv4 unicast summary
# output
BGP router identifier 10.3.0.1, local AS number 65301 vrf-id 0
BGP table version 4
RIB entries 7, using 952 bytes of memory
Peers 2, using 42 KiB of memory
Peer groups 1, using 72 bytes of memory
Neighbor        V         AS MsgRcvd MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd
eth1            4      65201      62      62        0    0    0 00:02:51            1
eth2            4      65202      61      61        0    0    0 00:02:50            1
Total number of neighbors 2
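The same check can be scripted. The sketch below assumes that the last column of each neighbor row is a prefix count once the session is established and a state name (such as Idle or Active) otherwise. The function name `bgp_peers_down` is made up for this example, and the canned input is illustrative rather than captured from the lab.

```shell
# Read "show bgp ipv4 unicast summary" output on stdin and print every
# neighbor whose last column is not a prefix count.
bgp_peers_down() {
    awk '
        $1 == "Neighbor"         { in_table = 1; next }  # header row
        NF == 0 || $1 == "Total" { in_table = 0 }        # end of table
        in_table && $NF !~ /^[0-9]+$/ { print $1 }
    '
}

# Example with canned output: eth2 is stuck in Active, so "eth2" is printed.
bgp_peers_down <<'EOF'
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
eth1 4 65201 62 62 0 0 0 00:02:51 1
eth2 4 65202 0 0 0 0 0 never Active
EOF
```

On the host this could be fed straight from FRR with `sudo vtysh -c 'show bgp ipv4 unicast summary' | bgp_peers_down`. No output means every peer is up.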
Confirm the prefixes being learned via BGP.
host01# show ip route
# output
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, P - PIM, N - NHRP, T - Table,
v - VNC, V - VNC-Direct,
> - selected route, * - FIB route
K>* 0.0.0.0/0 via 192.168.121.1, eth0
B>* 10.2.0.1/32 [20/0] via fe80::2ab7:adff:fe23:355e, eth1, 00:06:24
B>* 10.2.0.2/32 [20/0] via fe80::2ab7:adff:fe72:e8e1, eth2, 00:06:23
C>* 10.3.0.1/32 is directly connected, lo
C>* 30.30.30.30/32 is directly connected, lo
C>* 192.168.121.0/24 is directly connected, eth0
Good, we are learning the loopback IP addresses of leaf01/2 which is what we expect.
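For a repeatable version of this check, the routing table can be searched for the prefixes we expect. This is a minimal sketch that assumes a selected, installed BGP route is shown with a leading `B>*`, as in the output above. The function name `missing_bgp_routes` is hypothetical.

```shell
# Read "show ip route" output on stdin and print each expected prefix
# (given as arguments) that is not present as a selected BGP route.
missing_bgp_routes() {
    routes=$(cat)   # slurp the routing table from stdin
    for prefix in "$@"; do
        printf '%s\n' "$routes" | grep -qF "B>* $prefix " ||
            printf '%s\n' "$prefix"
    done
}

# Example with canned output: the leaf02 loopback is missing, so
# "10.2.0.2/32" is printed.
missing_bgp_routes 10.2.0.1/32 10.2.0.2/32 <<'EOF'
B>* 10.2.0.1/32 [20/0] via fe80::2ab7:adff:fe23:355e, eth1, 00:06:24
C>* 10.3.0.1/32 is directly connected, lo
EOF
```

Against a live host the input would come from `sudo vtysh -c 'show ip route'`.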
As a final test, let's check that we are learning the loopback addresses from host01 on leaf01.
vagrant@leaf01:~$ sudo net show route
show ip route
=============
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, P - PIM, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel,
> - selected route, * - FIB route
K>* 0.0.0.0/0 via 192.168.121.1, eth0
C>* 10.2.0.1/32 is directly connected, lo
B>* 10.3.0.1/32 [20/0] via fe80::2ab7:adff:fe30:c4, swp1, 03:13:24
B>* 30.30.30.30/32 [20/0] via fe80::2ab7:adff:fe30:c4, swp1, 03:13:24
C>* 192.168.121.0/24 is directly connected, eth0
leaf01 is learning the loopback addresses advertised by host01. That looks like success to me!
Routing on the host is a nice solution to those pesky layer 2 problems we get in the networking world, and with the rise of microservice architectures it makes a lot of sense as a data centre design choice.
https://docs.cumulusnetworks.com/display/ROH/Routing+on+the+Host