EVPN behind the curtains



Is EVPN magic? Well, as Arthur C. Clarke said, any sufficiently advanced technology is indistinguishable from magic. On that premise, moving from a traditional layer 2 environment to VXLAN driven by EVPN has much of that same hocus-pocus feeling. To help demystify the sorcery, this post aims to give users new to EVPN a step-by-step understanding of how EVPN works and how the control plane converges. We’ll focus on basic layer 2 (L2) building blocks, then work our way up to layer 3 (L3) connectivity and the control plane.

We’ll use the “reference topology” as our cable plan and as the foundation for understanding the traffic flow. The environment we examine is a symmetric mode EVPN deployment using distributed gateways. All the configurations are defined in this GitHub repo.

If you’d like to follow along as we go, feel free to launch your own Cumulus in the Cloud (CITC) blank slate and deploy the above playbook.

EVPN message types

Like any good protocol, EVPN has a robust process for exchanging information with its peers. In EVPN, this process uses message types. If you already know OSPF and its LSA types, you can think of EVPN message types in much the same way. Each EVPN message type carries a different kind of information about the EVPN traffic flow.

In total there are about five different message types, but we’re going to focus on the two most popular types for now. In this blog post we’ll cover Type 2, which carries MAC and MAC/IP information, and in a later post, I will discuss Type 5, which carries IP prefix route information.
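As a quick reference, the route types can be written down as a small lookup table. This is only an illustrative sketch in Python: the names follow the EVPN RFCs (RFC 7432 for types 1 to 4, RFC 9136 for type 5), and both `EVPN_ROUTE_TYPES` and the `describe` helper are hypothetical names, not part of any Cumulus tooling.

```python
# EVPN route types and their names as defined in RFC 7432 and RFC 9136.
EVPN_ROUTE_TYPES = {
    1: "Ethernet Auto-Discovery route",
    2: "MAC/IP Advertisement route",
    3: "Inclusive Multicast Ethernet Tag route",
    4: "Ethernet Segment route",
    5: "IP Prefix route",
}


def describe(route_type: int) -> str:
    """Return a readable label for an EVPN route type number."""
    name = EVPN_ROUTE_TYPES.get(route_type, "unknown route type")
    return f"Type {route_type}: {name}"


print(describe(2))
print(describe(5))
```

Types 2 and 5 are the two entries this series concentrates on.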

Digging into EVPN message types: Type 2

The easiest EVPN messages to understand are type 2. As mentioned before, type 2 routes contain MAC and MAC/IP mappings. To start off, let’s inspect a type 2 entry at work. To do that, we first verify basic connectivity from leaf01 to server01.

First we look at the bridge table to make sure the server’s MAC address is mapped to the correct port on the switch.

Let’s get Server01’s MAC address:

[email protected]:~$ ip address show
…
7: uplink: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP group default qlen 1000
 link/ether 44:38:39:00:08:01 brd ff:ff:ff:ff:ff:ff
 inet 10.1.1.101/24 brd 10.1.1.255 scope global uplink
   valid_lft forever preferred_lft forever
 inet6 fe80::4638:39ff:fe00:801/64 scope link
    valid_lft forever preferred_lft forever

Look at Leaf01’s bridge table to make sure the MAC address is mapped to the port we expect. We can cross reference it with LLDP:

[email protected]:~$ net show bridge mac

VLAN  Master  Interface  MAC                TunnelDest  State  Flags  LastSeen
----  ------  ---------  -----------------  ----------  -----  -----  --------
10    bridge  SERVER01   44:38:39:00:08:01                            00:01:15
...
[email protected]:~$ net show lldp

LocalPort  Speed  Mode           RemoteHost  RemotePort
---------  -----  -------------  ----------  ----------
swp1       1G     BondMember     server01    eth1
swp49      1G     BondMember     leaf02      swp49
swp50      1G     BondMember     leaf02      swp50
swp51      1G     NotConfigured  spine01     swp1
swp52      1G     NotConfigured  spine02     swp1

Checking the ARP table we can validate the MAC and IP addresses are mapped correctly.

[email protected]:~$ ip neighbor show
10.1.1.101 dev vlan10 lladdr 44:38:39:00:08:01 REACHABLE
...

Now that we’ve checked the basics, let’s start looking at how this gets pulled into EVPN. To begin, we validate the local VNIs that are configured:

[email protected]:~$ net show evpn vni
VNI        Type VxLAN IF       # MACs   # ARPs   # Remote VTEPs  Tenant VRF
10010      L2   VXLAN10        4        8        1               RED
10020      L2   VXLAN20        2        6        1               BLUE
104001     L3   L3VNI_RED      1        1        n/a             RED
104002     L3   L3VNI_BLUE     0        0        n/a             BLUE

Since the bridge MAC table confirmed that server01 is mapped to vlan10, we’ll now check whether the IP neighbor entries are being pulled into the EVPN ARP cache. This cache describes the information that is exchanged with the other EVPN speakers in the environment.

[email protected]:~$ net show evpn arp-cache vni 10010
Number of ARPs (local and remote) known for this VNI: 8
IP                        Type   MAC               Remote VTEP
fe80::4638:39ff:fe00:205  local  44:38:39:00:02:05
fe80::4638:39ff:fe00:801  local  44:38:39:00:08:01
10.1.1.2                  local  44:38:39:00:02:05
10.1.1.103                remote 44:38:39:00:0a:01 192.168.1.34
10.1.1.1                  local  00:00:00:00:00:1a
fe80::200:ff:fe00:1a      local  00:00:00:00:00:1a
10.1.1.101                local  44:38:39:00:08:01
fe80::4638:39ff:fe00:a01  remote 44:38:39:00:0a:01 192.168.1.34
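Half of the entries in this cache are fe80:: addresses. Those are IPv6 link-local neighbors, and in this output each one lines up with the MAC address beside it. A minimal Python sketch of the modified EUI-64 derivation shows why, assuming the hosts autoconfigure their link-local addresses this way (the output is consistent with that, but it is an assumption). The helper name `mac_to_link_local` is ours.

```python
import ipaddress


def mac_to_link_local(mac: str) -> str:
    """Derive the modified EUI-64 IPv6 link-local address for a MAC."""
    octets = [int(part, 16) for part in mac.split(":")]
    octets[0] ^= 0x02  # flip the universal/local bit of the first octet
    # Insert ff:fe between the two halves of the MAC to build 64 bits.
    interface_id = octets[:3] + [0xFF, 0xFE] + octets[3:]
    # fe80::/64 prefix (8 bytes) followed by the 8-byte interface ID.
    packed = bytes([0xFE, 0x80] + [0] * 6 + interface_id)
    return str(ipaddress.IPv6Address(packed))


for mac in ("44:38:39:00:08:01", "00:00:00:00:00:1a"):
    print(mac, "->", mac_to_link_local(mac))
```

The two results match the fe80::4638:39ff:fe00:801 and fe80::200:ff:fe00:1a rows in the cache.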

Let’s review what we know so far. L2 connectivity works correctly: the L2 bridge table and the L3 neighbor table are both populated locally on leaf01. We also verified that the MAC and IP information is being pulled into EVPN via the EVPN ARP cache.
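The chain of checks so far (bridge table, neighbor table, EVPN ARP cache) can be sketched in a few lines of Python. This is a hand-rolled illustration, not a Cumulus tool: the values are copied by hand from the leaf01 outputs, and `parse_neighbor` is a hypothetical helper that only understands the simple line format shown in this post.

```python
def parse_neighbor(line: str) -> dict:
    """Parse one 'ip neighbor show' line into its fields."""
    fields = line.split()
    return {
        "ip": fields[0],
        "dev": fields[fields.index("dev") + 1],
        "mac": fields[fields.index("lladdr") + 1],
        "state": fields[-1],
    }


# Values copied by hand from the leaf01 outputs in this post.
bridge_macs = {"44:38:39:00:08:01": "SERVER01"}
neighbor = parse_neighbor(
    "10.1.1.101 dev vlan10 lladdr 44:38:39:00:08:01 REACHABLE"
)
evpn_arp_cache = {"10.1.1.101": ("local", "44:38:39:00:08:01")}

# The same MAC should appear at every stage on its way into EVPN.
in_bridge = neighbor["mac"] in bridge_macs
in_evpn = evpn_arp_cache.get(neighbor["ip"]) == ("local", neighbor["mac"])
print(f"bridge table: {in_bridge}, EVPN ARP cache: {in_evpn}")
```

If either check came back False, that would be the stage to troubleshoot first.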

Using this information, we check the RD and RT mappings so we can learn more about the full VNI advertisement.

An RD is a route distinguisher and is used to disambiguate EVPN routes in different VNIs (as they may have the same MAC and/or IP address).

The RTs are route targets. They are used to describe the VPN membership for the route, specifically which VRFs are exporting and importing the different routes in the infrastructure.

[email protected]:~$ net show bgp l2vpn evpn vni
Advertise Gateway Macip: Disabled
Advertise All VNI flag: Enabled
Number of L2 VNIs: 2
Number of L3 VNIs: 2
Flags: * - Kernel
  VNI        Type RD                    Import RT                 Export RT                 Tenant VRF
* 10010      L2   10.255.255.11:2       65101:10010               65101:10010               RED
* 10020      L2   10.255.255.11:3       65101:10020               65101:10020               BLUE
* 104001     L3   10.1.1.2:4            65101:104001              65101:104001              RED
* 104002     L3   10.2.2.2:5            65101:104002              65101:104002              BLUE
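In this output, each RD has the form address:number, and each RT ends in the VNI it belongs to. The following Python sketch takes those two patterns apart. It assumes only what the table shows (we have not confirmed how the values were derived), and `VniEntry`, `split_rd` and `rt_matches_vni` are hypothetical names.

```python
from typing import NamedTuple


class VniEntry(NamedTuple):
    vni: int
    rd: str
    import_rt: str
    export_rt: str


def split_rd(rd: str) -> tuple:
    """Split an RD of the form <IPv4 address>:<number> into its two parts."""
    admin, _, assigned = rd.rpartition(":")
    return admin, int(assigned)


def rt_matches_vni(entry: VniEntry) -> bool:
    """True when both RTs end in the entry's own VNI (the pattern seen here)."""
    return all(
        int(rt.rpartition(":")[2]) == entry.vni
        for rt in (entry.import_rt, entry.export_rt)
    )


# First row of the table, copied by hand.
entry = VniEntry(10010, "10.255.255.11:2", "65101:10010", "65101:10010")
print(split_rd(entry.rd))
print(rt_matches_vni(entry))
```

Because import and export RTs are identical per VNI here, every switch carrying VNI 10010 with the same RT will import the routes that the others export.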

The local L2 VNI 10010 has RD 10.255.255.11:2, so that RD effectively identifies every route leaf01 originates for this VNI. We can use it to list all the routes that leaf01 advertises for the VNI.

[email protected]:~$ net show bgp l2vpn evpn route rd 10.255.255.11:2
EVPN type-2 prefix: [2]:[ESI]:[EthTag]:[MAClen]:[MAC]
EVPN type-3 prefix: [3]:[EthTag]:[IPlen]:[OrigIP]
EVPN type-5 prefix: [5]:[ESI]:[EthTag]:[IPlen]:[IP]
BGP routing table entry for 10.255.255.11:2:[2]:[0]:[0]:[48]:[44:38:39:00:08:01]
Paths: (1 available, best #1)
  Advertised to non peer-group peers:
  spine01(swp51) spine02(swp52)
  Route [2]:[0]:[0]:[48]:[44:38:39:00:08:01] VNI 10010/104001
  Local
     192.168.1.12 from 0.0.0.0 (10.255.255.11)
     Origin IGP, localpref 100, weight 32768, valid, sourced, local, bestpath-from-AS Local, best
     Extended Community: ET:8 RT:65101:10010 RT:65101:104001 Rmac:44:38:39:00:02:05
     AddPath ID: RX 0, TX 51
     Last update: Thu Sep  6 18:20:00 2018

BGP routing table entry for 10.255.255.11:2:[2]:[0]:[0]:[48]:[44:38:39:00:08:01]:[32]:[10.1.1.101]
Paths: (1 available, best #1)
  Advertised to non peer-group peers:
  spine01(swp51) spine02(swp52)
  Route [2]:[0]:[0]:[48]:[44:38:39:00:08:01]:[32]:[10.1.1.101] VNI 10010/104001
  Local
    192.168.1.12 from 0.0.0.0 (10.255.255.11)
    Origin IGP, localpref 100, weight 32768, valid, sourced, local, bestpath-from-AS Local, best
    Extended Community: ET:8 RT:65101:10010 RT:65101:104001 Rmac:44:38:39:00:02:05
    AddPath ID: RX 0, TX 83
    Last update: Thu Sep  6 18:20:06 2018

....

Displayed 6 prefixes (6 paths) with this RD
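The Extended Community line is worth a second look. ET:8 is the encapsulation type (8 is the tunnel type assigned to VXLAN), the two RTs belong to the L2 VNI and the L3 VNI, and Rmac is the router MAC used for routed traffic. As a rough sketch, the line can be split into its parts in Python; `parse_ext_communities` is a hypothetical helper that assumes the simple KEY:VALUE token layout shown in this output.

```python
def parse_ext_communities(line: str) -> dict:
    """Group 'KEY:VALUE' tokens from an Extended Community line by key."""
    parsed = {}
    for token in line.split():
        # Split on the first colon only, so MACs and RTs stay intact.
        key, _, value = token.partition(":")
        parsed.setdefault(key, []).append(value)
    return parsed


line = "ET:8 RT:65101:10010 RT:65101:104001 Rmac:44:38:39:00:02:05"
communities = parse_ext_communities(line)
print(communities["RT"])
print(communities["Rmac"][0])
```

Seeing two route targets on a single type 2 route is the hint that this fabric runs symmetric routing: one RT for the L2 VNI and one for the tenant's L3 VNI.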

This output carries an important piece of information, so let’s spend some time dissecting the EVPN type 2 route. A type 2 route can take two different forms, and in this case we’re sending both.

  • Type 2 MAC Route
    • The first one is an EVPN type 2 MAC route. It includes only a 48-bit MAC entry (the [48] field is the MAC length in bits). This entry is pulled in directly from the bridge table, and hence has only L2 information in it. Any time a MAC address is learned in the bridge table, that MAC address is pulled into EVPN as a type 2 MAC route.
  • Type 2 MAC/IP Route
    • The second EVPN type 2 entry is a MAC/IP route. These entries are pulled into EVPN from the ARP table. Reading this entry, the first section contains the MAC address and the second contains the IP address and its mask length. Notice that the mask for the IP address is a /32: because these entries come from the ARP table, every MAC/IP route is pulled into EVPN as a host route.

BGP routing table entry for 10.255.255.11:2:[2]:[0]:[0]:[48]:[44:38:39:00:08:01]
...
  Route [2]:[0]:[0]:[48]:[44:38:39:00:08:01] VNI 10010/104001
...

BGP routing table entry for 10.255.255.11:2:[2]:[0]:[0]:[48]:[44:38:39:00:08:01]:[32]:[10.1.1.101]
...
  Route [2]:[0]:[0]:[48]:[44:38:39:00:08:01]:[32]:[10.1.1.101] VNI 10010/104001

....
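To make the two forms concrete, here is a small Python sketch that splits the bracketed prefix into named fields and reports which form it is. The field names follow the legend printed at the top of the route output ([2]:[ESI]:[EthTag]:[MAClen]:[MAC]), with the trailing IP length and IP taken from the MAC/IP entry above. `parse_type2` is a hypothetical helper for illustration only.

```python
import re


def parse_type2(prefix: str) -> dict:
    """Split a bracketed type 2 prefix into named fields."""
    values = re.findall(r"\[([^\]]*)\]", prefix)
    names = ["type", "esi", "eth_tag", "mac_len", "mac", "ip_len", "ip"]
    # zip stops at the shorter list, so a MAC-only route has no IP fields.
    route = dict(zip(names, values))
    route["kind"] = "MAC/IP" if "ip" in route else "MAC"
    return route


mac_only = parse_type2("[2]:[0]:[0]:[48]:[44:38:39:00:08:01]")
mac_ip = parse_type2("[2]:[0]:[0]:[48]:[44:38:39:00:08:01]:[32]:[10.1.1.101]")
print(mac_only["kind"], mac_only["mac"])
print(mac_ip["kind"], mac_ip["ip"] + "/" + mac_ip["ip_len"])
```

The only difference between the two prefixes is the optional IP length and IP pair at the end.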

Using this information, we should be able to validate that this /32 host route for server01 is in the routing table of leaf03 as a pure L3 route, pointing out via the L3 VNI.

[email protected]:~$ net show route vrf RED

show ip route vrf RED
======================
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
       F - PBR,
       > - selected route, * - FIB route

VRF RED:

K * 0.0.0.0/0 [255/8192] unreachable (ICMP unreachable), 00:34:43
C * 10.1.1.0/24 is directly connected, vlan10-v0, 00:33:28
C>* 10.1.1.0/24 is directly connected, vlan10, 00:33:29
B>* 10.1.1.101/32 [20/0] via 192.168.1.12, vlan4001 onlink, 00:31:18

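Let’s spend some time dissecting this output. The neighbor entry on leaf01 for server01 has made it all the way to leaf03 as a /32 host route. Its next hop, 192.168.1.12, is the VTEP address that leaf01 advertises, and the route points out vlan4001, the SVI of the L3 VNI, rather than out the L2 VNI.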
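Leaf03 also has a connected route for 10.1.1.0/24, so why does traffic to server01 follow the BGP route? Longest prefix match: the /32 is more specific than the /24. A short Python sketch of that lookup, using routes copied by hand from the table above (the `lookup` helper is ours, and real forwarding happens in the kernel and hardware, not like this):

```python
import ipaddress

# Routes copied from the VRF RED table on leaf03.
routes = {
    "0.0.0.0/0": "unreachable",
    "10.1.1.0/24": "directly connected, vlan10",
    "10.1.1.101/32": "via 192.168.1.12, vlan4001",
}


def lookup(destination: str) -> str:
    """Return the most specific route that contains the destination."""
    dest = ipaddress.ip_address(destination)
    matches = [
        ipaddress.ip_network(prefix)
        for prefix in routes
        if dest in ipaddress.ip_network(prefix)
    ]
    # The default route always matches, so max() never sees an empty list.
    best = max(matches, key=lambda net: net.prefixlen)
    return f"{best} {routes[str(best)]}"


print(lookup("10.1.1.101"))
print(lookup("10.1.1.50"))
```

Traffic for server01 takes the host route over the L3 VNI, while any other address in the subnet falls back to the connected /24.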

To validate that the L2 VNI and the L3 VNI are correctly tied together, we can examine the L3 VNI:

[email protected]:~$ net show evpn vni 104001
VNI: 104001
  Type: L3
  Tenant VRF: RED
  Local Vtep Ip: 192.168.1.12
  Vxlan-Intf: L3VNI_RED
  SVI-If: vlan4001
  State: Up
  VNI Filter: none
  Router MAC: 44:38:39:00:02:05
  L2 VNIs: 10010

Notice in this output that L3 VNI 104001 is mapped to VRF RED, which matches the tenant VRF shown for VNI 10010 in the output of net show evpn vni. We can also see that L2 VNI 10010 is tied to L3 VNI 104001, whose SVI is vlan4001. All the outputs line up to indicate that we have a fully working EVPN type 2 VXLAN infrastructure.
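The relationship between L2 and L3 VNIs boils down to a shared tenant VRF. As a minimal sketch, with rows copied by hand from the net show evpn vni output on leaf01 and a hypothetical `l3_vni_for` helper, the pairing can be recovered like this in Python:

```python
# Rows copied from 'net show evpn vni' on leaf01: (VNI, type, tenant VRF).
vnis = [
    (10010, "L2", "RED"),
    (10020, "L2", "BLUE"),
    (104001, "L3", "RED"),
    (104002, "L3", "BLUE"),
]


def l3_vni_for(l2_vni: int) -> int:
    """Find the L3 VNI that shares a tenant VRF with the given L2 VNI."""
    vrf = next(v for vni, kind, v in vnis if vni == l2_vni and kind == "L2")
    return next(vni for vni, kind, v in vnis if kind == "L3" and v == vrf)


print(l3_vni_for(10010))
print(l3_vni_for(10020))
```

This agrees with the L2 VNIs line in the L3 VNI output, which lists 10010 under VNI 104001.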

There you have it. From start to finish, we looked at how EVPN works for type 2 routes. Specifically, we focused on the different EVPN message types and how the control plane converges in an L2 extension environment. It’s not witchcraft, just good technology. Tune in for our next post, where we extend the EVPN control plane demystification and tackle the traffic flows around Type 5 messages and VXLAN routing. If you haven’t already, I highly recommend trying this out for yourself with Cumulus in the Cloud. And if you’d like to take a deeper dive, we’ve put together a hub of EVPN content, from whitepapers to videos, so you can expand your expertise (or skills in the black arts).

The post EVPN behind the curtains appeared first on Cumulus Networks engineering blog.