Routing vMotion Traffic in vSphere 6

One of the new features that is officially supported and exposed in the UI in vSphere 6 is routed vMotion traffic.  This blog post will cover the use cases, why this was difficult prior to vSphere 6, and how vSphere 6 addresses it.

Use Cases

So why would you want to route vMotion traffic, anyway?  Truthfully, in the overwhelming majority of cases, you wouldn’t and shouldn’t.

Why?  Remember a few facts about vMotion traffic:

vMotion is very sensitive to latency and throughput.  When a vMotion operation gets underway, the running contents of a VM’s memory are copied from one host to another while the VM’s working set of memory is monitored for changes.  Iterative copies then transfer those changes until the remaining delta is small enough to be copied very quickly in a final pass.  Therefore, the longer a vMotion takes, the more changes accumulate in the working set, and the more changes accumulate, the longer the operation takes, which invites even more changes to occur during the operation.  Adding an unnecessary hop in the network can only hurt vMotion performance.

So if you are within the same datacenter, routing vMotion traffic is ill advised at best.  About the only situation where I could see this being a good idea is a very large datacenter with hundreds of hosts, where too many broadcasts within a single LAN segment degrade performance, but you only infrequently need to vMotion VMs between hosts in different clusters.  You would need A LOT of ESXi hosts that may need to vMotion between each other before that would make sense.

So when would routed vMotion traffic make sense?  vMotioning VMs between datacenters!  Sure, you could stretch the vMotion Layer 2 network between the datacenters with OTV instead, but at that point you are choosing the lesser of two evils: vMotioning with a router between hosts in different datacenters, or the inherent perils of stretching an L2 network across sites.  The WAN link will take a far bigger toll than an extra hop in the network, so there’s no question the better choice is to route the vMotion traffic instead of stretching the vMotion network between sites.

This matters more than ever because cross-vCenter vMotion is now possible, too, and VMware has enabled additional network portability via technologies such as NSX.  The need to route vMotion traffic is far greater than in the past, when about the only scenario where it made sense was a stretched metro storage cluster and the like.

Why was this a problem in the past?

If you’ve never done stretched metro storage clusters, this may never have occurred to you, because there was pretty much never a need to route any VMkernel port group traffic other than host management traffic.  The fundamental problem was that ESXi had a single TCP/IP stack, with one default gateway.  If you followed best practices, you would create multiple VMkernel port groups to segregate iSCSI, NFS, vMotion, Fault Tolerance, and host management traffic, each in its own VLAN.  You would configure the host’s default gateway as an IP in the host’s management subnet, because you probably shouldn’t route any of that other traffic.  Well, now we need to.  Your only option was to add static routes via the command line on every single host, as shown below.  As workload mobility increases with vSphere 6 cross-vCenter vMotion, NSX, and vCloud Air, that just isn’t a practical solution.
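To give a sense of the old workaround, here is a minimal sketch of adding a static route from the ESXi shell.  The subnets and gateway here are hypothetical placeholders, and you would have to repeat something like this on every host.

    # Hypothetical example: reach a remote vMotion subnet (192.168.20.0/24)
    # via a router on the local vMotion subnet (192.168.10.1)
    esxcli network ip route ipv4 add --network 192.168.20.0/24 --gateway 192.168.10.1

    # Verify the host's route table
    esxcli network ip route ipv4 list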

How does VMware accomplish this in vSphere 6?

Very simple, at least conceptually.  ESXi 6 supports multiple independent TCP/IP stacks.  By default, there already exist separate TCP/IP stacks for vMotion and for other traffic, and each stack can be assigned its own default gateway.
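If you prefer the command line over the UI, something like the following should list the stacks and set a default gateway on the vMotion stack.  This is only a sketch: the gateway address is a hypothetical placeholder, and the --netstack option is assumed to be available on your ESXi 6 build.

    # List the TCP/IP stacks on the host (default stack, vmotion, etc.)
    esxcli network ip netstack list

    # Assumed syntax: add a default route to the vmotion stack
    esxcli network ip route ipv4 add --netstack vmotion --network 0.0.0.0/0 --gateway 192.168.10.1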

Simple to manage and configure!  Just configure the stacks appropriately, and ensure your VMkernel adapters use the appropriate stack: vMotion adapters should use the vMotion stack, while pretty much everything else should use the default stack.  A command-line sketch follows below.
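Here is a rough sketch of creating a VMkernel adapter on the vMotion stack from the command line.  The vmk number, port group name, and IP details are hypothetical placeholders; note that the stack is chosen when the VMkernel adapter is created, so an existing adapter would need to be removed and recreated to move it onto the vMotion stack.

    # Create a new VMkernel adapter on the vMotion TCP/IP stack
    esxcli network ip interface add --interface-name vmk2 --portgroup-name vMotion-PG --netstack vmotion

    # Give it a static IP on the local vMotion subnet
    esxcli network ip interface ipv4 set --interface-name vmk2 --type static --ipv4 192.168.10.11 --netmask 255.255.255.0

    # Confirm which stack each adapter is using
    esxcli network ip interface list

In the Web Client, the same result is just a TCP/IP stack dropdown when you add the VMkernel adapter, plus a per-stack default gateway under the host’s TCP/IP configuration settings.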

How cool is that?