Intro

We've all been there: it's supposed to be a relatively simple change and then BOOM! A spanning tree topology change blows up the network :( There is a movement in the data centre space to push the layer 3 boundary down to the host. This avoids the bandwidth wasted when spanning tree blocks links, as well as the nightmare failure scenarios of technologies such as MLAG and VSS that are meant to "prevent" spanning tree issues.

Current data centre design utilizing MLAG.

blog/linux-routing-on-the-host-with-frr/linux-mlag-topology.svg

Future data centre design utilizing routing on the host.

blog/linux-routing-on-the-host-with-frr/linux-routed-host-topology.svg

Pete Lumbis of Cumulus Networks has written an excellent blog post describing the evolution of data centre design from legacy models to the routed host model. It's a quick read that covers the topic very well.

This post will cover enabling routing on the host by installing FRR on an Ubuntu 16.04 host and configuring BGP peering with Cumulus Linux switches.

Free Range Routing (FRR) is an open source IP routing suite for Linux. FRR is a fork of the Quagga project with over 130 contributors and support from vendors such as Big Switch and Cumulus Networks. FRR aims for seamless integration with the native Linux network stack and currently supports the BGP, OSPF, IS-IS and RIP routing protocols, with support for EIGRP on the way.

Topology

blog/linux-routing-on-the-host-with-frr/linux-roh-lab-topology.svg

For reference, the following software versions are used in this post.

  • Ubuntu Minimal - 16.04
  • Free Range Routing - 3.0.2
  • Cumulus Linux - 3.4.3

Switch Configuration

First up, I will configure the leaf switches using the Network Command Line Utility (NCLU).

leaf01

cmd
# Configure interfaces

sudo net add interface swp1 ipv6 nd ra-interval 5
sudo net del interface swp1 ipv6 nd suppress-ra
sudo net add loopback lo ip address 10.2.0.1/32

# Configure BGP

sudo net add bgp autonomous-system 65201
sudo net add bgp router-id 10.2.0.1
sudo net add bgp bestpath as-path multipath-relax
sudo net add bgp bestpath compare-routerid
sudo net add bgp neighbor fabric peer-group
sudo net add bgp neighbor fabric remote-as external
sudo net add bgp neighbor fabric description Internal Fabric Network
sudo net add bgp neighbor fabric capability extended-nexthop
sudo net add bgp neighbor swp1 interface peer-group fabric
sudo net add bgp ipv4 unicast network 10.2.0.1/32
sudo net add bgp ipv6 unicast neighbor fabric activate

# Save configuration

sudo net commit

leaf02

cmd
# Configure interfaces

sudo net add interface swp1 ipv6 nd ra-interval 5
sudo net del interface swp1 ipv6 nd suppress-ra
sudo net add loopback lo ip address 10.2.0.2/32

# Configure BGP

sudo net add bgp autonomous-system 65202
sudo net add bgp router-id 10.2.0.2
sudo net add bgp bestpath as-path multipath-relax
sudo net add bgp bestpath compare-routerid
sudo net add bgp neighbor fabric peer-group
sudo net add bgp neighbor fabric remote-as external
sudo net add bgp neighbor fabric description Internal Fabric Network
sudo net add bgp neighbor fabric capability extended-nexthop
sudo net add bgp neighbor swp1 interface peer-group fabric
sudo net add bgp ipv4 unicast network 10.2.0.2/32
sudo net add bgp ipv6 unicast neighbor fabric activate

# Save configuration

sudo net commit
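The two leaf configurations differ only in the loopback address and the AS number, so they lend themselves to a small template. Here is a minimal sketch of that idea. leaf_cmds is a hypothetical helper of mine, and the numbering scheme (AS 6520N, loopback 10.2.0.N) is simply inferred from the two configs above, so adjust it to your own addressing.

```shell
# Print the NCLU commands for leaf number N (1 for leaf01, 2 for leaf02).
# Assumption: AS 6520N and loopback 10.2.0.N, a pattern inferred from the
# two configurations above; it only holds for single-digit N.
leaf_cmds() {
    n="$1"
    loopback="10.2.0.${n}"
    cat <<EOF
sudo net add interface swp1 ipv6 nd ra-interval 5
sudo net del interface swp1 ipv6 nd suppress-ra
sudo net add loopback lo ip address ${loopback}/32
sudo net add bgp autonomous-system 6520${n}
sudo net add bgp router-id ${loopback}
sudo net add bgp bestpath as-path multipath-relax
sudo net add bgp bestpath compare-routerid
sudo net add bgp neighbor fabric peer-group
sudo net add bgp neighbor fabric remote-as external
sudo net add bgp neighbor fabric description Internal Fabric Network
sudo net add bgp neighbor fabric capability extended-nexthop
sudo net add bgp neighbor swp1 interface peer-group fabric
sudo net add bgp ipv4 unicast network ${loopback}/32
sudo net add bgp ipv6 unicast neighbor fabric activate
sudo net commit
EOF
}

# Print the commands for leaf02 so they can be reviewed before being run.
leaf_cmds 2
```

Printing the commands rather than executing them keeps the sketch harmless: you can eyeball the output and paste it into the switch once you are happy with it.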

FRR Installation

Alright, let's move on to the host machine. I had to run an apt update first; otherwise installing the required packages failed with an error.

cmd
sudo apt update -y

Install the dependencies.

cmd
sudo apt install -y iproute libc-ares2

Download and install the FRR package.

cmd
# download

wget https://github.com/FRRouting/frr/releases/download/frr-3.0.2/frr_3.0.2-1-ubuntu16.04.1_amd64.deb

# install

sudo dpkg -i frr_3.0.2-1-ubuntu16.04.1_amd64.deb

FRR Configuration

Add the uplink interfaces to the /etc/network/interfaces config file.

file
auto eth1
iface eth1 inet manual

auto eth2
iface eth2 inet manual

Enable the routing daemons in the /etc/frr/daemons config file.

file
# Change no to yes

zebra=yes
bgpd=yes
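If you'd rather not edit the file by hand, the same change can be scripted. A rough sketch with sed follows, assuming both daemons are currently set to no. enable_daemons is a hypothetical helper, and the demo runs against a scratch file with made-up sample content rather than the real /etc/frr/daemons.

```shell
# Set zebra and bgpd to "yes" in an FRR daemons file, keeping a .bak copy
# of the original next to it.
enable_daemons() {
    sed -i.bak -e 's/^zebra=no$/zebra=yes/' -e 's/^bgpd=no$/bgpd=yes/' "$1"
}

# Demo on a scratch file. The three lines written here are a hypothetical
# excerpt, not the full daemons file shipped with the package.
tmp=$(mktemp)
printf 'zebra=no\nbgpd=no\nospfd=no\n' > "$tmp"
enable_daemons "$tmp"
cat "$tmp"
rm -f "$tmp" "$tmp.bak"
```

On the host itself the equivalent would be to run enable_daemons against /etc/frr/daemons as root. Daemons that are not named in the sed expressions are left untouched.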

Create a file called /etc/frr/frr.conf for the BGP configuration with the following contents.

file
!
service integrated-vtysh-config
!
int lo
 ip address 10.3.0.1/32
 ip address 30.30.30.30/32
!
interface eth1
 ipv6 nd ra-interval 5
 no ipv6 nd suppress-ra
!
interface eth2
 ipv6 nd ra-interval 5
 no ipv6 nd suppress-ra
!
router bgp 65301
 bgp router-id 10.3.0.1
 bgp bestpath as-path multipath-relax
 bgp bestpath compare-routerid
 neighbor fabric peer-group
 neighbor fabric remote-as external
 neighbor fabric description Internal Fabric Network
 neighbor fabric capability extended-nexthop
 neighbor eth1 interface peer-group fabric
 neighbor eth2 interface peer-group fabric
 !
 address-family ipv4 unicast
  network 10.3.0.1/32
  network 30.30.30.30/32
  neighbor fabric prefix-list host-routes-out out
 exit-address-family
 !
 address-family ipv6 unicast
  neighbor fabric activate
 exit-address-family
!
ip prefix-list host-routes-out seq 100 permit 10.3.0.1/32
ip prefix-list host-routes-out seq 200 permit 30.30.30.30/32
ip prefix-list host-routes-out seq 300 deny 0.0.0.0/0 le 32
!
line vty
!
end

Restart the networking and frr services.

cmd
sudo systemctl restart networking.service
sudo systemctl restart frr.service

Did you notice that no IP addresses were configured for the BGP peering? This is possible with the use of BGP unnumbered. BGP unnumbered relies on RFC 5549, which allows an IPv4 route to be advertised with an IPv6 next hop. The above configuration uses IPv6 link-local addresses, discovered dynamically through router advertisements, as the BGP peering addresses. More info can be found here and here.

Alright, that's all the config out of the way, so let's move on to verifying the operation.
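So where do those peering addresses come from? The link-local next hops that show up in the route output in the Verification section (fe80::2ab7:adff:fe23:355e and friends) have the ff:fe pattern of EUI-64 addresses, which are built from the interface MAC address. The sketch below shows that derivation. mac_to_link_local is a hypothetical helper, and the MAC in the example is back-computed from the leaf01 next hop on the assumption that EUI-64 addressing is in use; it does not appear anywhere in the lab output.

```shell
# Derive an EUI-64 style IPv6 link-local address from a MAC address.
# Note: the groups are printed as-is, so leading zeros are not compressed
# the way "show ip route" displays them.
mac_to_link_local() {
    # Split the MAC into its six octets ($1 .. $6).
    old_ifs=$IFS
    IFS=:
    set -- $1
    IFS=$old_ifs
    # Flip the universal/local bit (0x02) of the first octet.
    first=$(printf '%02x' $(( 0x$1 ^ 0x02 )))
    # Insert ff:fe between the two halves of the MAC.
    printf 'fe80::%s%s:%sff:fe%s:%s%s\n' "$first" "$2" "$3" "$4" "$5" "$6"
}

# MAC assumed for leaf01 swp1 (inferred, see above).
mac_to_link_local 28:b7:ad:23:35:5e   # prints fe80::2ab7:adff:fe23:355e
```

Because each side can work out its own link-local address and learn its neighbor's from the router advertisements, nothing has to be assigned by hand on the point-to-point links.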

Verification

Verify that BGP peering is up and prefixes are being received. The host01# commands below are run from the FRR shell (sudo vtysh).

cmd
host01# show bgp ipv4 unicast summary

# output

BGP router identifier 10.3.0.1, local AS number 65301 vrf-id 0
BGP table version 4
RIB entries 7, using 952 bytes of memory
Peers 2, using 42 KiB of memory
Peer groups 1, using 72 bytes of memory

Neighbor        V         AS MsgRcvd MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd
eth1            4      65201      62      62        0    0    0 00:02:51            1
eth2            4      65202      61      61        0    0    0 00:02:50            1

Total number of neighbors 2
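When a session is established the last column shows the number of prefixes received; otherwise it shows a state name such as Idle or Active. That makes the check easy to script. Below is a rough sketch using awk. check_bgp_summary is a hypothetical helper, and the sample fed to it is the neighbor table from the summary output above with each neighbor on a single line.

```shell
# Read "show bgp ipv4 unicast summary" output on stdin. Exit 0 only if
# every neighbor row ends in a number (a prefix count); otherwise print
# the offending neighbors and exit 1.
check_bgp_summary() {
    awk '
        /^Neighbor/     { in_table = 1; next }   # header row opens the table
        /^Total number/ { in_table = 0 }         # footer row closes it
        in_table && NF >= 10 {
            if ($NF !~ /^[0-9]+$/) { print $1 " is " $NF; bad = 1 }
        }
        END { exit bad + 0 }
    '
}

check_bgp_summary <<'EOF' && echo "all peers established"
Neighbor        V         AS MsgRcvd MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd
eth1            4      65201      62      62        0    0    0 00:02:51            1
eth2            4      65202      61      61        0    0    0 00:02:50            1

Total number of neighbors 2
EOF
```

On the host you would pipe the live output in instead, for example sudo vtysh -c 'show bgp ipv4 unicast summary' | check_bgp_summary, since vtysh's -c option runs a single command without entering the interactive shell.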

Confirm the prefixes being learned via BGP.

cmd
host01# show ip route

# output

Codes: K - kernel route, C - connected, S - static, R - RIP,
        O - OSPF, I - IS-IS, B - BGP, P - PIM, N - NHRP, T - Table,
        v - VNC, V - VNC-Direct,
        > - selected route, * - FIB route

K>* 0.0.0.0/0 via 192.168.121.1, eth0
B>* 10.2.0.1/32 [20/0] via fe80::2ab7:adff:fe23:355e, eth1, 00:06:24
B>* 10.2.0.2/32 [20/0] via fe80::2ab7:adff:fe72:e8e1, eth2, 00:06:23

C>* 10.3.0.1/32 is directly connected, lo
C>* 30.30.30.30/32 is directly connected, lo
C>* 192.168.121.0/24 is directly connected, eth0

Good, we are learning the loopback IP addresses of leaf01/2 which is what we expect.

As a final test, let's check that we are learning the loopback addresses from host01 on leaf01.

cmd
vagrant@leaf01:~$ sudo net show route
show ip route
=============
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, P - PIM, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel,
       > - selected route, * - FIB route

K>* 0.0.0.0/0 via 192.168.121.1, eth0
C>* 10.2.0.1/32 is directly connected, lo
B>* 10.3.0.1/32 [20/0] via fe80::2ab7:adff:fe30:c4, swp1, 03:13:24
B>* 30.30.30.30/32 [20/0] via fe80::2ab7:adff:fe30:c4, swp1, 03:13:24

C>* 192.168.121.0/24 is directly connected, eth0

leaf01 is learning the loopback addresses advertised by host01. That looks like success to me!

Outro

Routing on the host is a nice solution to those pesky layer 2 problems we get in the networking world, and with the rise of micro-service architectures it makes a lot of sense as a data centre design choice.