
Nutanix administration do’s and don’ts

As a virtualization consultant, I know there’s a wide variety of technologies at every level – hypervisor, storage, networking, and even server hardware – and each is getting more complex in terms of what you need to know to manage it effectively. Nobody can be an expert in every single storage technology, and with more and more options that are radically different in their architecture, I wanted to make my own little contribution for consultants and admins alike: the basic things you should and shouldn’t do with one storage solution – Nutanix. We consultants often find ourselves in environments with something we’re not totally familiar with, so some helpful, concise guidance can go a long way. Admins, too, may have depended on a consultant or previous colleagues (who no longer work there) for implementation and support, but now it’s on them, so I thought this would be helpful.

There are quite a few things everyone should know if they’re ever working on an environment with Nutanix that aren’t necessarily obvious. I can see it being pretty darn easy to blow up a Nutanix environment if you’re not aware of some of these things.

Common stuff

  • Contact Nutanix Support before downgrading licensing or destroying a cluster to reclaim licenses (unnecessary if you’re using Starter licensing, though). This was repeated many times, so I’m guessing that if it isn’t done, you’ll be hating life trying to get licensing straightened out.
  • Do NOT delete the Nutanix Controller VM on any Nutanix host (CVM names look like: NTNX-<blockid>-<position>-CVM)
  • Do NOT modify any settings of a Controller VM, all the way down to even the name of the VM.
  • Shutdown/Startup gotchas:
    • It’s probably best to never shut down/reboot/etc. more than one Nutanix node in a cluster at a time. If you do more, you may cause all hosts in the Nutanix cluster to lose storage connectivity.
    • When shutting down a single host, or any number of hosts below the redundancy factor (the number of host failures the Nutanix cluster is configured to tolerate), migrate or shut down all VMs on the host EXCEPT the Controller VM, THEN shut down the Controller VM.
    • If you are shutting down a number of hosts that exceeds the redundancy factor, you need to shut down the Nutanix cluster. There’s also a specialized procedure to start the Nutanix cluster back up in this situation; that’s beyond the scope of this post.
    • When booting up a host, do the following (see the sketch after this list):
      • Start the Controller VM that resides on it first, and verify its services are working by SSHing to it and running:
        • ncli cluster status | grep -A 15 <controllerVmIP>
      • Then have the host rescan its datastores.
      • Then verify the Nutanix cluster state from the same SSH session, to ensure all cluster services are up:
        • cluster status
  • Hypervisor Patching
    • Patch one hypervisor node at a time in a Nutanix cluster (see above), and make sure its Controller VM comes back up with all services healthy before proceeding to the next node.
    • Follow the host shutdown procedure above.
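
For reference, here’s a minimal sketch of those startup checks as run over SSH to the Controller VM. The IP address is a placeholder, and the nutanix username is the typical CVM default; verify the exact syntax against the Nutanix documentation for your NOS version.

    # SSH to the Controller VM on the host you just booted (10.0.0.30 is a placeholder)
    ssh nutanix@10.0.0.30

    # Show this CVM's entry plus the 15 lines that follow it, to verify its services
    ncli cluster status | grep -A 15 10.0.0.30

    # After the host rescans its datastores, confirm all cluster services are up
    cluster status

    # The full cluster shutdown/startup procedure is out of scope here, but the
    # commands involved are "cluster stop" and "cluster start", run from a CVM.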


vSphere/ESXi

  • NEVER use ESXi’s “Reset System Configuration” command on a Nutanix node.
  • If resource pools are created, the Controller VM (CVM) must have the highest share.
  • Do NOT modify NFS settings.
  • VM swapfile location should be the same folder as the VM. Do NOT place it on a dedicated datastore.
  • Do NOT modify the Controller VM startup/shutdown order.
  • Do NOT modify iSCSI software adapter settings.
  • Do NOT modify the vSwitchNutanix standard vSwitch.
  • Do NOT modify the vmk0 interface in the “Management Network” port group.
  • Do NOT disable SSH on the ESXi hosts. (A read-only way to verify these settings from the ESXi shell appears at the end of this list.)
  • HA configuration recommended settings:
    • Enable admission control and use the percentage-based policy, with the value based on the number of nodes in the cluster (e.g., on a 4-node cluster, reserving 25% covers the failure of one node).
    • Set VM Restart Priority for CVMs to Disabled.
    • Set Host Isolation Response of cluster to Power Off
    • Set Host Isolation Response of CVMs to Leave Powered ON.
    • Disable VM Monitoring for all CVMs
    • Enable Datastore Heartbeating by clicking “Select only from my preferred datastores” and choosing the Nutanix datastores. If the cluster has only one datastore (which is potentially common in Nutanix deployments), add the advanced option das.ignoreInsufficientHbDatastore=true to avoid warnings about not having at least two heartbeat datastores.
  • DRS stuff:
    • Disable automation of all CVMs
    • Leave power management disabled (DPM)
  • Enable EVC for the lowest processor class in the cluster.
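
Since most of the items above boil down to “look, don’t touch,” here’s a hedged, read-only sanity check you can run from the ESXi shell to confirm the Nutanix-managed settings without modifying anything (output details vary by ESXi version):

    # List standard vSwitches; vSwitchNutanix should be present and unmodified
    esxcli network vswitch standard list

    # List VMkernel interfaces; vmk0 should still sit in the Management Network port group
    esxcli network ip interface list

    # View (not change) the software iSCSI adapter
    esxcli iscsi adapter list

    # Show the NFS mounts backing the Nutanix datastores
    esxcli storage nfs list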


Hyper-V

  • Do NOT use Validate Cluster within Failover Clustering or SCVMM, as it is not supported. I’m not sure what would happen if you did, but I’m guessing it would be pretty awesome, so you should probably make sure you’ve got popcorn ready if you’re gonna do that.
  • Do NOT modify the Nutanix or Hyper-V cluster name
  • Do NOT modify the external network adapter name
  • Do NOT modify the Nutanix-specific virtual switch settings

KVM (the hypervisor… I’m assuming this also applies if you’re using the Acropolis Hypervisor from Nutanix, since it’s KVM based)

  • Do NOT modify the Hypervisor configuration, including installed packages
  • Do NOT modify iSCSI settings
  • Do NOT modify the Open vSwitch settings (a read-only way to inspect them follows below)
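
If you want to eyeball the Open vSwitch configuration on a KVM/AHV host without touching it, something like this read-only check works (br0 as the bridge name is an assumption; yours may differ):

    # Print the full Open vSwitch configuration (read-only)
    ovs-vsctl show

    # List the ports attached to the br0 bridge
    ovs-vsctl list-ports br0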

I hope this proves helpful to people who unexpectedly find themselves working on Nutanix and need a quick primer to ensure they don’t break something!

Evolution of storage – traditional to hyperconverged

These days, there’s been an explosion in the diversity of storage options, which often bleed into compute and/or networking when it comes to virtualized architecture. It used to be that storage was storage, networking was networking, and compute was compute. And when it came to storage, while architectures differed, whatever you stored your virtual machines on did storage and storage only. EMC CLARiiON/VNX, NetApp filers, iSCSI targets like LeftHand, Compellent, EqualLogic, etc. – these were all storage and storage only.

Some of these added SSDs, as permanent storage and/or as an expanded caching tier. We also saw the emergence of all-flash storage arrays that attempted to make the most of SSD, using technologies like compression and deduplication to overcome SSD’s inherent weakness: high cost per unit of storage. These arrays are often architected from the ground up to work best with SSD, taking into account the garbage collection needed to reuse space on SSDs.

But these are all still storage-only devices.

Over time, that’s changed. We now have converged infrastructure, such as VCE and FlexPod, but those typically still use devices dedicated to storage. VCE Vblock and VxRack use EMC arrays; FlexPod uses NetApp filers. These are prepackaged, validated designs built in the factory, but they still use traditional storage arrays.

Keep in mind, I don’t think there’s anything inherently wrong with this or any of these architectures. I’m just laying down a framework to describe the different options available.

Now we also have options, called hyperconverged, that truly move away from the concept of buying a dedicated storage array. They’re still shared storage in the sense that your VMs can be automatically restarted on a different host should the host they’re running on go down. There’s still (when architected and configured properly) no single point of failure. But this category doesn’t use a dedicated storage device. Instead, it utilizes what is effectively local/DAS storage attached to multiple compute nodes, pooled together with special sauce to turn that storage into highly available, scalable storage, usually for use with virtualization. In fact, many of these products only work with virtualization. They tend to use commodity hardware in terms of x86 processors, RAM, and disk types, although many companies sell their own hardware built from these components and/or work with server hardware partners to build it for them.

The common thread between them, though, is that you’re not buying a storage array. You’re buying compute + storage + special-sauce software, which together comprise the total solution.

Examples of these options are Nutanix, VMware VSAN (or EVO:RAIL, which utilizes it), SimpliVity, and ScaleIO, and you will see more emerging, plus plenty I didn’t mention simply because I don’t intend this to be a definitive list.

While there are good solutions in each of these categories of storage, none of the categories is perfect. None of them works best for everyone, despite what any technical marketing will try to tell you.

So while there are more good storage choices than there have ever been, it’s also harder to choose a storage product than it has ever been. My goal in these posts is to lay a foundation for understanding these different options, which might help people sort through them better.