Of Overlays and Underlays

By Andrew Lerner | February 18, 2014 | 1 Comment


We used to just call the network a “network”… or if we were being fancy, we’d call it the “network infrastructure”. Ahhh, simpler times… Now, with the onset of SDN overlays, the en vogue terminology is “underlay”. I actually like the term, but find it funny that we’ve “belittled” the network to the point where it is “just” an underlay. That said, we are starting to get questions from Gartner clients about overlay/underlay integration. We are considering writing some research on it (if you are a Gartner client, please leave a comment if you think it would be valuable), but in the interim here’s my rough shot at classifying overlay/underlay integration. I look forward to your thoughts/feedback on this…


Level 1 – No awareness

This is when your network infrastructure (underlay) supports IP but has no awareness of the overlay. So there is no awareness of VMs, and none of the networking devices in the physical switching underlay support the overlay’s tunneling protocol. The “usual suspect” tunneling protocols include VXLAN and NVGRE, although there are others in the works, including proposals in the IETF’s NVO3 working group and, as of Friday, Geneve. At this basic level, the overlay and the underlay are essentially “ships in the night”. If you have non-virtual devices in your network (e.g., bare-metal servers or network appliances), they need a gateway to terminate the tunnel. This gateway is really just a piece of software and can come in the form of a VM, a physical appliance, or software baked into the bare-metal devices themselves (e.g., a VTEP).
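
To make the “ships in the night” point a bit more concrete, here is a minimal sketch of what a VXLAN tunnel endpoint adds to a frame. The 8-byte header layout (a flags byte, a 24-bit VNI, reserved bits) and UDP port 4789 come from the VXLAN specification; the function names are my own, and the outer UDP/IP/Ethernet headers are assumed to be added by the host's network stack. The underlay only ever routes that outer IP packet, which is why it needs no knowledge of the overlay.

```python
import struct
from typing import Tuple

VXLAN_UDP_PORT = 4789         # IANA-assigned destination UDP port for VXLAN
VXLAN_FLAG_VNI_VALID = 0x08   # "I" flag: the VNI field carries a valid value


def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for a 24-bit VXLAN Network Identifier."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags (8 bits) + 24 reserved bits.
    # Word 2: VNI (24 bits) + 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def encapsulate(inner_frame: bytes, vni: int) -> bytes:
    """Prepend the VXLAN header to an inner Ethernet frame (the UDP payload)."""
    return vxlan_header(vni) + inner_frame


def decapsulate(udp_payload: bytes) -> Tuple[int, bytes]:
    """Split a VXLAN UDP payload back into (VNI, inner frame)."""
    flags_word, vni_word = struct.unpack("!II", udp_payload[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI flag not set")
    return vni_word >> 8, udp_payload[8:]
```

A tunnel endpoint on the far side would call `decapsulate` on the UDP payload and hand the inner frame to the VM (or bare-metal device) that belongs to that VNI.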


Level 1.5 – VM awareness

This is similar to level 1 but with some additional capability, in that the underlay is “VM-aware”. What this means is that when a VM move/add/change occurs, the network configuration (e.g., VLAN, QoS, ACL) is changed automatically. The reason I don’t make this its own distinct level is that this capability has been around for several years, prior to the emergence of overlay solutions (on a side note, my colleague Simon Richard {@SimonSDN} would argue that this should be level 0.5 for that reason). So it isn’t really an overlay/underlay thing, but a server virtualization thing. However, it provides more functionality, so I consider it a higher level and feel it is worth pointing out. Some of the key benefits here are that you don’t have to a) make manual network changes as VMs are created and moved around the network or b) trunk all VLANs to all physical hosts. Examples of this capability are VM Tracer from Arista and VM-FEX/VN-Link from Cisco, and most of the other big network vendors (Dell, Brocade, Juniper, etc.) have this capability as well.
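
As a rough illustration of what “VM-aware” means in practice, here is a toy model in which the network reacts to a VM placement event by moving that VM's VLAN/QoS/ACL profile to the switch port facing its new host. Every name here (`PortProfile`, `VmAwareUnderlay`, `on_vm_placed`) is hypothetical; this is not how VM Tracer or VM-FEX is implemented, just a sketch of the behavior described above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PortProfile:
    vlan: int
    qos_class: str
    acl: str


@dataclass
class SwitchPort:
    name: str
    vlans: set = field(default_factory=set)        # VLANs currently allowed on the port
    policies: dict = field(default_factory=dict)   # VM name -> PortProfile


class VmAwareUnderlay:
    def __init__(self, profiles, host_to_port):
        self.profiles = profiles          # VM name -> PortProfile
        self.host_to_port = host_to_port  # hypervisor host -> SwitchPort
        self.vm_location = {}             # VM name -> current host

    def on_vm_placed(self, vm, host):
        """Handle a VM create/move event (assumed to come from the hypervisor manager)."""
        old_host = self.vm_location.get(vm)
        if old_host is not None:
            self._detach(vm, self.host_to_port[old_host])
        port = self.host_to_port[host]
        profile = self.profiles[vm]
        port.policies[vm] = profile
        port.vlans.add(profile.vlan)      # VLAN is only present where a VM needs it
        self.vm_location[vm] = host

    def _detach(self, vm, port):
        profile = port.policies.pop(vm)
        # Prune the VLAN only if no other VM behind this port still uses it.
        if all(p.vlan != profile.vlan for p in port.policies.values()):
            port.vlans.discard(profile.vlan)
```

The point of the sketch is benefit b) above: the VLAN follows the VM, so it never has to be trunked to every host up front.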


Level 2 – Basic Traffic Integration

In this level, the physical devices in the underlay can directly terminate the tunnels used to create the overlay. The switches in the underlay have software and hardware support for VXLAN (or NVGRE) tunnel termination (the terminating device is referred to as a tunnel endpoint; VTEP for VXLAN). This matters for bare-metal devices attached to the underlay because you no longer need a separate/dedicated gateway for them (you use the gateway or VTEP that is embedded in the network device instead), which simplifies the environment. This is why you see VXLAN capability offered on so many new switches today (aided by native support in Broadcom’s Trident II chip). An example of a switch that fits into this category is Cisco’s Nexus 3100.
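
Here is a simplified sketch of the bookkeeping a switch-embedded VTEP does for bare-metal devices: it maps an access (port, VLAN) pair to an overlay segment (VNI) and back. The class and method names are made up for illustration, and a real switch does this in the forwarding ASIC rather than in software.

```python
class SwitchVtep:
    """Toy model of a VXLAN gateway embedded in a top-of-rack switch."""

    def __init__(self, vtep_ip):
        self.vtep_ip = vtep_ip
        self.access_to_vni = {}   # (port, vlan) -> VNI
        self.vni_to_access = {}   # VNI -> set of (port, vlan)

    def bind(self, port, vlan, vni):
        """Attach a bare-metal-facing port/VLAN to an overlay segment."""
        self.access_to_vni[(port, vlan)] = vni
        self.vni_to_access.setdefault(vni, set()).add((port, vlan))

    def to_overlay(self, port, vlan, frame):
        """Frame arriving from a bare-metal device: choose the VNI to encapsulate with."""
        vni = self.access_to_vni.get((port, vlan))
        if vni is None:
            return None  # not bound to an overlay segment; treat as ordinary VLAN traffic
        return {"outer_src": self.vtep_ip, "vni": vni, "inner": frame}

    def local_ports(self, vni):
        """After decapsulating a tunnel packet: the local (port, vlan) pairs in that segment."""
        return sorted(self.vni_to_access.get(vni, set()))
```

For example, after `bind("eth1", 100, 5000)`, traffic from the server on `eth1`/VLAN 100 enters VNI 5000 directly at the switch, with no separate gateway VM or appliance in the path.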


Level 3 – Extended Traffic Integration

In this level you get the above benefits in underlay switches (VXLAN termination) plus additional control plane functionality. For example, when VMware announced NSX, there were several NSX switching “integration” partners including Arista, Brocade, Dell, and HP. That means that these devices support a) tunnel termination (VXLAN) in hardware/software and b) additional control plane capability such as a distributed MAC database and the OVSDB protocol. The additional control plane capability does things like reduce the need for MAC flooding and improve the efficiency of ARP learning across the fabric. These benefits increase with the scale and size of the network. There are other benefits at this level, including further integration such as cross-controller federation in mixed-vendor SDN environments (which is what HP and VMware have done).
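
To show why a distributed MAC database cuts down on flooding, here is a small sketch. It is not the OVSDB schema or NSX's actual implementation; `OverlayMacDirectory` and its methods are hypothetical names. When the controller has published where a MAC lives, a VTEP sends to exactly one remote tunnel endpoint; when it hasn't, the VTEP has to replicate the frame to every other endpoint in the segment.

```python
class OverlayMacDirectory:
    """Toy MAC-to-VTEP directory shared by all tunnel endpoints."""

    def __init__(self):
        self.members = {}      # VNI -> set of VTEP IPs participating in the segment
        self.mac_to_vtep = {}  # (VNI, MAC) -> VTEP IP behind which that MAC lives

    def join(self, vni, vtep_ip):
        self.members.setdefault(vni, set()).add(vtep_ip)

    def publish(self, vni, mac, vtep_ip):
        """Controller (or a VTEP) announces where a MAC address is attached."""
        self.join(vni, vtep_ip)
        self.mac_to_vtep[(vni, mac)] = vtep_ip

    def destinations(self, vni, dst_mac, local_vtep):
        """Return the VTEPs a frame must be tunneled to."""
        known = self.mac_to_vtep.get((vni, dst_mac))
        if known is not None:
            return [known]  # known MAC: a single unicast tunnel
        # Unknown MAC: flood to every other VTEP in the segment.
        return sorted(self.members.get(vni, set()) - {local_vtep})
```

The flood list grows with the number of endpoints in the segment while the known-MAC case stays at one destination, which is the scaling benefit mentioned above.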

Level 4 – Deep Integration

In terms of commercial product availability, this level doesn’t exist in the real world yet. It is mostly a theoretical level of integration at this point, but I have faith that it will emerge. The key differentiator here is enhanced visibility and unified management between overlay and underlay. For example, if there is an issue in the underlay, a management system pinpoints it and signals the overlay, and things shift dynamically (and vice versa). You could argue that Cisco has this level of integration in their Data Center ACI solution by delivering the overlay/underlay coupled together. However, by coupling them together, they deliver the solution as more of a programmable fabric versus a true separated underlay/overlay. To date, I haven’t seen a truly integrated overlay/underlay solution that provides this capability, which creates opportunity in the market for networking, virtualization, and network management vendors…
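
Since this level is theoretical, the following is purely a thought experiment and not a description of any product. It sketches the “underlay signals the overlay and things shift” idea: an overlay controller keeps candidate underlay paths per tunnel, and a link event from a (hypothetical) underlay management system causes affected tunnels to move. All names, including the link labels in the usage note, are invented for illustration.

```python
class OverlayController:
    """Toy overlay controller that reacts to underlay link events."""

    def __init__(self, tunnel_paths):
        # tunnel name -> candidate underlay paths in preference order,
        # where each path is a list of underlay link names
        self.tunnel_paths = tunnel_paths
        self.failed_links = set()

    def active_path(self, tunnel):
        """First candidate path that avoids every failed link, or None."""
        for path in self.tunnel_paths[tunnel]:
            if not self.failed_links.intersection(path):
                return path
        return None

    def on_underlay_event(self, link, up):
        """Apply a link up/down notification; return the tunnels whose path changed."""
        before = {t: self.active_path(t) for t in self.tunnel_paths}
        if up:
            self.failed_links.discard(link)
        else:
            self.failed_links.add(link)
        after = {t: self.active_path(t) for t in self.tunnel_paths}
        return {t: after[t] for t in after if after[t] != before[t]}
```

With two candidate paths for a tunnel, a call like `on_underlay_event("leaf1-spine1", up=False)` would report that the tunnel has shifted to the path through the other spine, and the matching `up=True` event would shift it back.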


Curious for your thoughts and opinions…



1 Comment

  • Vaibhav Katkade says:

    True that deep integration will be the way forward for most controllers, leveraging the strength of their core infrastructure product offering. However, the battle will be won not by the vendor which can best integrate the “overlay” with the “underlay”, but by the vendor which can do this best across network, security, application, collab, etc.