VXLAN
Revision as of 09:04, 1 November 2020
The Virtual eXtensible Local Area Network (VXLAN) protocol encapsulates virtual Layer 2 networks over Layer 3+4 (UDP over IP), and is designed for use among multitenant hypervisors in (potentially multi-DC) cloud networks. It was formalized in 2014's RFC 7348. It avoids use of 802.1D's Spanning Tree Protocol while facilitating a full broadcast domain, superseding 802.1Q VLANs and their 12-bit VLAN IDs: VXLAN uses a 24-bit ID, the VXLAN Network Identifier (VNI), allowing about 16 million segments rather than 4096 VLAN IDs. The virtual Layer 2 network thus created is known as a "VXLAN segment" or "VXLAN overlay network". The agents adding or removing VXLAN encapsulation (commonly hypervisors or switches) are referred to as "VXLAN Tunnel Endpoints" or VTEPs, and play roles similar to bridges, learning MACs and selectively forwarding frames.
The clients within a VXLAN segment needn't know that VXLAN is in use, and use standard unicast/broadcast traffic to talk to other hosts within the segment. Upon receipt of a frame from a client, the local VTEP looks up the remote VTEP with which the destination MAC is associated, and sends the encapsulated frame to it. If the client does not know the destination's L2 address, ARP is performed via normal broadcast. Broadcasts within a VXLAN segment are carried over an IP multicast group associated with that segment (multicast traffic within the segment uses this same group).
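VTEPs are not allowed to fragment VXLAN packets. It is thus important to choose an overlay MTU small enough that the full encapsulation (outer IP, UDP, and VXLAN headers, plus the inner Ethernet header) can be added without exceeding the physical MTU of any link between the VTEPs.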
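The MTU arithmetic can be sketched as follows. This is a minimal illustration, assuming standard header sizes (no IP options or IPv6 extension headers) and an untagged inner frame; the function names are hypothetical.

```python
# Header sizes in bytes. Assumptions: no IP options or IPv6 extension
# headers, and no 802.1Q tag on the inner frame.
OUTER_IP = {4: 20, 6: 40}
UDP_HEADER = 8
VXLAN_HEADER = 8
INNER_ETHERNET = 14  # inner Ethernet header; its FCS is not carried


def vxlan_overhead(ip_version: int = 4) -> int:
    """Bytes consumed within the underlay's IP MTU by encapsulation."""
    return OUTER_IP[ip_version] + UDP_HEADER + VXLAN_HEADER + INNER_ETHERNET


def overlay_mtu(underlay_mtu: int, ip_version: int = 4) -> int:
    """Largest inner IP MTU that fits without fragmenting the outer packet."""
    return underlay_mtu - vxlan_overhead(ip_version)


if __name__ == "__main__":
    print(overlay_mtu(1500, 4))  # 1450
    print(overlay_mtu(1500, 6))  # 1430
    print(overlay_mtu(9000, 4))  # 8950
```

Equivalently, an underlay carrying a 1500-byte overlay MTU needs an IP MTU of at least 1550 (IPv4) or 1570 (IPv6).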
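VXLAN runs over udp/4789 by default, the port assigned by IANA. (The Linux kernel device historically defaults to udp/8472 unless a destination port is specified.) VXLANs can be stacked.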
VXLAN encapsulation
The outermost header is a standard Ethernet header. The source hardware address is initially set to the originating VTEP's MAC address. The destination hardware address is the hardware address of the destination VTEP, or of the router by which said VTEP is reached. 802.1Q tags can be used here as they would in any other case. Where routing hops exist between the two VTEPs, the outer source and destination hardware addresses will change with each hop.
The next header is an IPv4 or IPv6 header with the VTEPs' L3 addresses used as source and destination. These addresses persist across hops. Inside the IP header is a UDP header, followed by the VXLAN payload. The datagram's destination port is the VTEP's VXLAN port, by default 4789. The source port is arbitrary, though RFC 7348 recommends that it be computed as a hash over the encapsulated frame's headers, mapped into the range 49152 to 65535, so that each inner flow keeps a stable port while different flows spread across ports.
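A source port chosen this way might be computed as in the sketch below. The use of CRC32 and the choice of which header bytes to feed it are assumptions made for illustration; RFC 7348 does not mandate a particular hash, and real implementations use their own flow hashes.

```python
import zlib

PORT_MIN = 49152
PORT_MAX = 65535


def vxlan_source_port(inner_headers: bytes) -> int:
    """Map a hash of the inner frame's headers into 49152..65535.

    CRC32 is an illustrative stand-in for whatever flow hash a real
    VTEP uses. The same input always yields the same port.
    """
    span = PORT_MAX - PORT_MIN + 1  # 16384 candidate ports
    return PORT_MIN + (zlib.crc32(inner_headers) % span)


if __name__ == "__main__":
    # Hypothetical inner Ethernet header: dst MAC, src MAC, EtherType.
    inner = bytes.fromhex("001742 8ab405 001742 8ab406 0800".replace(" ", ""))
    print(vxlan_source_port(inner))
```

Because the port depends only on the inner headers, every packet of a given inner flow carries the same outer 5-tuple.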
The VXLAN payload consists of the VXLAN header, plus the original frame minus its Ethernet FCS. The VXLAN header is 8 bytes:
- 8 bits of flags, RRRRIRRR. All R bits must be 0. The I bit must be 1 for a valid VNI.
- 24 reserved bits. All must be 0.
- 24-bit VNI.
- 8 reserved bits. All must be 0.
The encapsulating VTEP removes the original frame's FCS; the outer Ethernet frame carries a newly computed FCS of its own.
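The 8-byte header layout described above can be expressed with Python's struct module. This is a minimal sketch with hypothetical function names; it checks only the I flag on parsing and does not validate the reserved bits.

```python
import struct

VXLAN_FLAG_I = 0x08  # flags byte RRRRIRRR with only the I bit set


def pack_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for a 24-bit VNI."""
    if not 0 <= vni < (1 << 24):
        raise ValueError("VNI must fit in 24 bits")
    # Word 0: flags in the top byte, then 24 reserved (zero) bits.
    # Word 1: VNI in the top 24 bits, then 8 reserved (zero) bits.
    return struct.pack("!II", VXLAN_FLAG_I << 24, vni << 8)


def unpack_vxlan_header(header: bytes) -> int:
    """Return the VNI from a VXLAN header, requiring the I flag."""
    if len(header) < 8:
        raise ValueError("VXLAN header is 8 bytes")
    word0, word1 = struct.unpack("!II", header[:8])
    if not (word0 >> 24) & VXLAN_FLAG_I:
        raise ValueError("I flag not set; VNI is not valid")
    return word1 >> 8  # discard the trailing reserved byte


if __name__ == "__main__":
    hdr = pack_vxlan_header(42)
    print(hdr.hex())                 # 0800000000002a00
    print(unpack_vxlan_header(hdr))  # 42
```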
Use on Linux
Note that Open vSwitch has its own, distinct VXLAN implementation. This section describes the Linux kernel tunneling device.
- Create the interface using ip: ip l a VXLANDEV type vxlan id VNI dstport DSTPORT
- Append group MCASTIP dev MCASTDEV to the creation command for a multicast-based VXLAN (see below)
- Destroy the interface with ip l d VXLANDEV
- Get information with ip -d l s VXLANDEV
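When scripting these steps, the commands above can be assembled as argument lists. The sketch below only builds the lists and does not run anything; the function names and the example device, VNI, and addresses are hypothetical, and the long forms link, add, del, and show are used in place of the abbreviations l, a, d, and s.

```python
from typing import List, Optional


def vxlan_add_command(dev: str, vni: int, dstport: int,
                      group: Optional[str] = None,
                      mcast_dev: Optional[str] = None) -> List[str]:
    """Build the argv for creating a VXLAN device with ip(8)."""
    if not 0 <= vni < (1 << 24):
        raise ValueError("VNI must fit in 24 bits")
    argv = ["ip", "link", "add", dev, "type", "vxlan",
            "id", str(vni), "dstport", str(dstport)]
    if group is not None:
        # A multicast-based VXLAN also names the underlay device.
        if mcast_dev is None:
            raise ValueError("group requires the multicast device")
        argv += ["group", group, "dev", mcast_dev]
    return argv


def vxlan_del_command(dev: str) -> List[str]:
    return ["ip", "link", "del", dev]


def vxlan_show_command(dev: str) -> List[str]:
    return ["ip", "-d", "link", "show", dev]


if __name__ == "__main__":
    cmd = vxlan_add_command("vxlan0", 42, 4789,
                            group="239.1.1.1", mcast_dev="eth1")
    print(" ".join(cmd))
```

The resulting lists could be passed to subprocess.run by a process with sufficient privileges.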
Forwarding tables
On Linux, the forwarding tables can be built up in two ways:
- Automatically using multicast: supply the multicast IP as the group argument when creating the VXLAN interface
- Manually using bridge: add the entries using bridge fdb add to L2ADDR dst L3ADDR dev VXLANDEV
- Delete an entry: bridge fdb delete L2ADDR dev VXLANDEV
- Dump the forwarding table: bridge fdb show dev VXLANDEV
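For a manually maintained table, the bridge invocations can be generated from a mapping of client MAC address to remote VTEP address. This is a sketch only: the function name and the example addresses are hypothetical, and the commands are built but not executed.

```python
from typing import Dict, List


def static_fdb_commands(dev: str, entries: Dict[str, str]) -> List[List[str]]:
    """Build one 'bridge fdb add' argv per MAC -> VTEP IP entry."""
    return [
        ["bridge", "fdb", "add", "to", mac, "dst", vtep_ip, "dev", dev]
        for mac, vtep_ip in entries.items()
    ]


if __name__ == "__main__":
    # Hypothetical clients and the VTEPs behind which they sit.
    table = {
        "00:17:42:8a:b4:05": "192.0.2.2",
        "00:17:42:8a:b4:06": "192.0.2.3",
    }
    for argv in static_fdb_commands("vxlan0", table):
        print(" ".join(argv))
```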
Offloading
Several NICs provide hardware offloading functionality, typically configured with ethtool.
External links
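- Linux kernel vxlan documentation: https://www.kernel.org/doc/Documentation/networking/vxlan.txt
- Vincent Bernat's "VXLAN & Linux", 2017-05-03: https://vincent.bernat.ch/en/blog/2017-vxlan-linux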