Tag Archives: Hyper-V

Nutanix administration do’s and don’ts

As a virtualization consultant, I know there’s a wide variety of technologies at every level: hypervisor, storage, networking, and even server hardware are all getting more complex in terms of what you need to know to manage them effectively.  Not everyone can be an expert in every storage technology, for example, and more and more of the options are radically different in their architecture.  So I wanted to make my own little contribution for consultants and admins alike: the basic things you should and shouldn’t do with one storage solution, Nutanix.  We consultants often find ourselves in environments with something we’re not totally familiar with, so some concise guidance can go a long way.  Admins, too, may have depended on a consultant or former colleagues for implementation and support, but now it’s on them, so I thought this would be helpful.

There are quite a few things that aren’t necessarily obvious but that everyone should know if they ever work on an environment with Nutanix.  I can see it being pretty darn easy to blow up a Nutanix environment if you’re not aware of some of these things.

Common stuff

  • Contact Nutanix Support before downgrading licensing or destroying a cluster to reclaim licenses (unnecessary if you’re using Starter licensing though). This was repeated many times, so I’m guessing that if this isn’t done, you’ll be hating life getting licensing straight.
  • Do NOT delete the Nutanix Controller VM on any Nutanix host (CVM names look like: NTNX-<blockid>-<position>-CVM)
  • Do NOT modify any settings of a Controller VM, all the way down to even the name of the VM.
  • Shutdown/Startup gotchas:
    • It’s probably best never to shut down, reboot, etc. more than one Nutanix node in a cluster at a time. If you do more, you may cause all hosts in the Nutanix cluster to lose storage connectivity.
    • When shutting down a single host, or fewer hosts than the redundancy factor (the number of host failures the Nutanix cluster is configured to tolerate), migrate or shut down all VMs on the host EXCEPT the Controller VM, THEN shut down the Controller VM.
    • If you are shutting down a number of hosts that exceeds the redundancy factor, you need to shut down the Nutanix cluster itself. There’s also a specialized procedure to start the Nutanix cluster back up in this situation.  That’s beyond the scope of this post.
    • When booting up a host, do the following:
      • Start the Controller VM that resides on it first, then SSH to the Controller VM and verify its services are running with:
        • ncli cluster status | grep -A 15 <controllerVmIP>
      • Then have the host rescan its datastores.
      • Then, from the same SSH session, verify the Nutanix cluster state with the following to ensure cluster services are all up:
        • cluster status
  • Hypervisor Patching
    • Patch one hypervisor node at a time in a Nutanix cluster (see above), and make sure its Controller VM comes back up with all services healthy before proceeding to the next one.
    • Follow shutdown host procedure above.
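
The shutdown rules above boil down to one decision (per-host shutdown or full cluster stop) plus an ordering constraint (guest VMs first, Controller VM last). Here is a minimal Python sketch of that logic. The function names are made up for illustration and are not part of any Nutanix tooling; the code only builds a checklist and never touches a cluster.

```python
def is_cvm(vm_name):
    """Controller VMs follow the NTNX-<blockid>-<position>-CVM naming pattern."""
    return vm_name.startswith("NTNX-") and vm_name.endswith("-CVM")


def host_shutdown_steps(host, vm_names):
    """Order the steps for one host: guest VMs first, the CVM last, then the host."""
    guests = [vm for vm in vm_names if not is_cvm(vm)]
    cvms = [vm for vm in vm_names if is_cvm(vm)]
    steps = [f"migrate or shut down {vm}" for vm in guests]
    steps += [f"shut down {vm}" for vm in cvms]
    steps.append(f"shut down host {host}")
    return steps


def shutdown_plan(hosts_to_stop, redundancy_factor):
    """Pick between per-host shutdowns and a full cluster stop.

    redundancy_factor means the number of host failures the cluster is
    configured to tolerate, matching how the term is used in the list above.
    """
    if hosts_to_stop > redundancy_factor:
        return "stop the Nutanix cluster first (separate procedure)"
    return "shut hosts down one at a time, CVM last on each"
```

Calling host_shutdown_steps("esx1", ["NTNX-1-A-CVM", "web01"]) lists web01 before the Controller VM, which is the whole point of the ordering rule.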

vSphere

  • NEVER use the “Reset System Configuration” option on an ESXi host in a Nutanix cluster.
  • If resource pools are created, Controller VM (CVM) must have the highest share.
  • Do NOT modify NFS settings.
  • VM swapfile location should be the same folder as the VM. Do NOT place it on a dedicated datastore.
  • Do NOT modify the Controller VM startup/shutdown order.
  • Do NOT modify iSCSI software adapter settings.
  • Do NOT modify vSwitchNutanix standard vSwitch.
  • Do NOT modify the vmk0 interface in port group “Management Network”.
  • Do NOT disable ESXi host SSH.
  • HA configuration recommended settings:
    • Enable admission control and use percentage based policy with value based on number of nodes in cluster
    • Set VM Restart Priority for CVMs to Disabled.
    • Set Host Isolation Response of cluster to Power Off
    • Set Host Isolation Response of CVMs to Leave Powered On.
    • Disable VM Monitoring for all CVMs
    • Enable Datastore Heartbeating by clicking Select only from my preferred datastores and choosing the Nutanix datastores. If the cluster has only one datastore (which is likely to be common in Nutanix deployments), add the advanced option das.ignoreInsufficientHbDatastore=true to avoid warnings about not having at least two heartbeat datastores.
  • DRS stuff:
    • Disable automation of all CVMs
    • Leave power management disabled (DPM)
  • Enable EVC for lowest processor class in cluster.
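
The admission control bullet above says to base the percentage on the number of nodes but doesn’t give a formula. A common rule of thumb, and purely an assumption in this sketch, is to reserve the share of the cluster you want to be able to lose, rounded up. The function name and the rounding choice are illustrative; check Nutanix’s and VMware’s current guidance for the values they actually recommend.

```python
def ha_reserved_percent(nodes, failures_to_tolerate=1):
    """Percentage of cluster resources to reserve for HA failover.

    Assumption: reserve failures_to_tolerate / nodes of the cluster,
    rounded up so the reservation never falls short of a whole node.
    """
    if nodes < 2 or not 0 < failures_to_tolerate < nodes:
        raise ValueError("need at least 2 nodes and fewer failures than nodes")
    # Integer ceiling division avoids floating-point rounding surprises.
    return (100 * failures_to_tolerate + nodes - 1) // nodes


for n in (3, 4, 8):
    print(n, "nodes:", ha_reserved_percent(n), "% reserved")
```

With this rule a four-node cluster tolerating one host failure reserves 25%, and the percentage shrinks as the cluster grows.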

Hyper-V

  • Do NOT use Validate Cluster within Failover Clustering or SCVMM, as it is not supported. Not sure what would happen if you did, but I’m guessing it would be pretty awesome, and you probably should make sure you got popcorn ready if you’re gonna do that.
  • Do NOT modify the Nutanix or Hyper-V cluster name
  • Do NOT modify the external network adapter name
  • Do NOT modify the Nutanix specific virtual switch settings

KVM (the Hypervisor… also assuming this means if you’re using Acropolis Hypervisor from Nutanix since it’s KVM based…)

  • Do NOT modify the Hypervisor configuration, including installed packages
  • Do NOT modify iSCSI settings
  • Do NOT modify the Open vSwitch settings

I hope this proves helpful to people who unexpectedly find themselves working on Nutanix and need a quick primer to ensure they don’t break something!

Hyper-V 2012 R2 not able to form cluster

Ran into an interesting problem with a colleague.  He was trying to form a basic Hyper-V cluster on Windows Server 2012 R2, but kept getting the following error:

Event ID: 1570
Source: Microsoft-Windows-FailoverClustering
Event Details:
Node 'Host1' failed to establish a communication session while joining the cluster.  This is due to an authentication failure.  Please verify that the nodes are running compatible versions of the cluster service software.

We verified DNS settings, disjoined and rejoined Active Directory, confirmed the host’s computer account was valid, confirmed time sync with the domain was good, checked that his account had sufficient rights to form the cluster, validated the nodes for clustering, and more.

At that point, we began looking at Group Policy settings like “Access this computer from the network”, and noticed that Authenticated Users was not in there.  Simply adding Authenticated Users and refreshing Group Policy on the cluster nodes resolved the issue.

Be careful making changes to these types of settings.  While Authenticated Users may seem like a group you would want to remove from a policy like that, it’ll often cause problems down the road.
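
If you want to check this user right without clicking through the policy editor, one option is to export the user rights with secedit /export /cfg <file> /areas USER_RIGHTS and look at the SeNetworkLogonRight line, which is the internal name for “Access this computer from the network”. The Python sketch below parses that kind of export. The function names and the sample text are illustrative assumptions, and the export file is typically UTF-16, so decode it accordingly before passing the text in.

```python
AUTHENTICATED_USERS_SID = "S-1-5-11"  # well-known SID for Authenticated Users


def network_logon_principals(inf_text):
    """Return the entries granted SeNetworkLogonRight in a secedit export."""
    for line in inf_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "SeNetworkLogonRight":
            # SIDs are written with a leading '*'; strip it for comparison.
            return [e.strip().lstrip("*") for e in value.split(",") if e.strip()]
    return []


def authenticated_users_allowed(inf_text):
    """True if Authenticated Users is explicitly listed for the right."""
    return AUTHENTICATED_USERS_SID in network_logon_principals(inf_text)


# Illustrative sample only: Administrators and Backup Operators, no Authenticated Users.
sample = "[Privilege Rights]\nSeNetworkLogonRight = *S-1-5-32-544,*S-1-5-32-551\n"
print(authenticated_users_allowed(sample))
```

Note this only checks for the explicit Authenticated Users entry; it does not try to work out whether a broader group such as Everyone covers the same accounts.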

Getting Cisco UCS drivers right with Windows

I’ve already mentioned one pain point with Cisco UCS – drivers – in my previous post concerning vSphere, but the same thing applies to other environments, including Windows servers. You better have the EXACT version Cisco wants for your specific environment. But how do you know which drivers to get, how do you get them, how do you know when you need to upgrade them, and how do you know what drivers you have installed? This post applies to Windows Server, which by extension, includes Hyper-V.

Why is getting the drivers so important?

I want to emphasize that getting the exact right version of Cisco UCS drivers is a big deal! I’ve personally now seen two vSphere environments that had issues because the drivers were not exactly correct. The best part is the issues never turned up during testing of the environment. Just weird intermittent issues like bad performance, or VMs needing consolidation after backups, or a VM hanging out of nowhere a week or two down the road. Make sure you get the drivers exactly right!  I don’t work with Windows Servers on bare metal nearly as much as VMware, but I’m sure getting the drivers right is equally, if not more, crucial.

How do I install Windows Server 2012 on Cisco UCS?

You have two choices.  One is to create a Windows Server installation ISO with the drivers slipstreamed into it. The other is to mount the driver image, which is available from Cisco for download, during setup and load the proper storage driver so the installer can see the storage you’ll install Windows to. Also, at least currently, remember that Cisco UCS does not support Windows in a boot from SAN configuration using FCoE.

Remember, however, that you’re still not done.  You’ll still need to update the network card drivers.

How do I know which drivers should be installed?

This is relatively simple. First, collect some info about your Cisco UCS environment. You need to know these (don’t worry, if you don’t know what info you need, Cisco’s Interoperability page will walk you through it):
1. Standalone C-Series not managed by UCSM or UCSM managed B and/or C-Series? For those of you who don’t know, if you got blades, you got UCSM.
2. If UCSM is present, which version is it running? Ex. 2.2(3c)
3. Which server model(s) are present? Ex. B200-M3. Also note the processor type (ex. Xeon E5-2600-v2). They can get picky about that.
4. What OS and major version? Note there is a difference between support for Server 2012 and 2012 R2.  Cisco at least at the time of this blog post does not change support depending upon installed Service Packs.
5. What type and model of I/O cards do you have in your servers? Example – CNA, model VIC-1240

Then head on over to the Interoperability Matrix site.  Fill in your info, and you get a clear version of the driver and firmware.

[Screenshot: UCS Windows driver lookup]

It’s very straightforward to know which drivers are needed from that.

How do I figure out which drivers are installed?

You can do this one of two ways.  You can manually check them via Device Manager, or you can use PowerShell.  I’m assuming everyone knows how to check these with Device Manager.  To use PowerShell, use the following:

Get-WmiObject Win32_PnPSignedDriver | Select-Object DeviceName, DriverVersion

Note that you can use the -ComputerName parameter to check a remote system, or pass it a PowerShell array of remote systems to check them all easily.
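
Once you have that driver list, comparing it against the interoperability matrix by eye gets old fast on more than a couple of servers. Here is a rough Python sketch that does the comparison, assuming you piped the command above to Export-Csv -NoTypeInformation first. The function name, the device names, and the version numbers in the sample are illustrative placeholders, not values from Cisco’s matrix.

```python
import csv
import io


def find_mismatches(csv_text, expected):
    """Return (device, installed, wanted) for drivers that differ from the matrix.

    expected maps a device-name substring to the driver version you looked up.
    """
    mismatches = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = row["DeviceName"] or ""
        for needle, wanted in expected.items():
            if needle.lower() in name.lower() and row["DriverVersion"] != wanted:
                mismatches.append((name, row["DriverVersion"], wanted))
    return mismatches


# Illustrative export; real device names and versions will differ.
sample_csv = '''"DeviceName","DriverVersion"
"Cisco VIC Ethernet Interface","2.4.0.8"
"Cisco VIC Ethernet Interface #2","2.4.0.19"
"Intel(R) C600 Chipset","9.2.0.1"
'''
expected = {"Cisco VIC Ethernet": "2.4.0.19"}
print(find_mismatches(sample_csv, expected))
```

Matching on a substring means every VIC instance gets checked, which matters because each instance has to be updated individually.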

How do I update Cisco UCS drivers?

You need to go to Cisco’s support page for UCS downloads and download the driver ISO that has your driver, which is typically just the driver ISO with the same version as UCS Manager.  For example, if you’re running 2.2(5a), you need the 2.2.5 driver ISO.
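
The “same version as UCS Manager” rule of thumb is easy to get wrong because of the parentheses in UCSM version strings like 2.2(3c). A small Python sketch of the mapping follows. The function name is made up, and the mapping itself is only the “typically” case described above, so confirm the result against the interoperability matrix.

```python
import re


def driver_iso_version(ucsm_version):
    """Turn a UCSM version such as '2.2(3c)' into the dotted form '2.2.3'.

    Assumption: the driver ISO tracks major.minor(maintenance) and ignores
    the trailing patch letter.
    """
    match = re.fullmatch(r"(\d+)\.(\d+)\((\d+)[a-z]*\)", ucsm_version.strip())
    if not match:
        raise ValueError(f"unrecognized UCSM version: {ucsm_version!r}")
    return ".".join(match.groups())


print(driver_iso_version("2.2(3c)"))
```

Anything that does not look like a UCSM version raises an error rather than guessing, since a wrong guess here means the wrong drivers.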

Next, use the virtual media option within the Cisco KVM to mount the ISO so it’s ready to go.  You could also extract the contents somewhere on the server; it doesn’t matter.

Next, log in to the server, pull up Device Manager, and locate the adapter instances you need to update.  If you’re using a VIC, which pretty much everyone is, you need to find the relevant storage and network adapters.  In this example, the customer wasn’t using the FC functionality, so we’re just doing the ENIC devices.

[Screenshot: updating Windows UCS drivers]

When the dialogue comes up, browse to select the zip file that was *contained* in the original zip file. If you select just the zip file you downloaded itself, it will fail. Repeat for the fnic and enic drivers.

Double click one of the VIC instances, click the Driver tab, note the currently installed driver version, and then Update Driver…

Next, click Browse my computer for driver software.

Next, click “Let me pick from a list of device drivers on my computer”.  Don’t even bother trying any other options, it will continue to want to install the old driver because… REASONS!!!

Next, click Have Disk… and browse on the CD to the right folder for the hardware and OS you’re running.  This is a Server 2012 non-R2 server, so as an example, it’s under Windows\Network\Cisco\VIC\W2K12\x64.

After the driver installs, verify the new driver version is now showing in Device Manager.  Unfortunately, you’re not done yet.  You gotta update the drivers on every other instance, but it’s a little easier for the rest.

For the other instances, double click each one, click Update Driver…, “Browse my computer for driver software”, “Let me pick from a list of device drivers on my computer”, and you will see both the old and new driver versions.  Pick the new one, and click Next.

[Screenshot: updating Windows UCS drivers, part 2]

Rinse and repeat for all instances, and ensure the driver tab reflects the proper new version.  Remember that FNIC HBA drivers will need to be updated on those instances separately under Storage Controllers.

I took a quick look to see if Microsoft made some device driver PowerShell cmdlets, but unfortunately I don’t see any at this time.

When should I check my drivers?

You should do this during any of the following:

• During initial deployments
• When UCS Manager or a standalone C-Series BIOS is updated
• If you update to a new major OS, although I’d check when planning to install Service Packs to Windows as well.

Also, remember, newer drivers aren’t necessarily the right drivers. Check the matrix for what the customer is or will be running to see which drivers should go along with it!