VMware network test commands

I recently ran into an issue with vSphere Replication that involved network connectivity (probably a future post), and I quickly realized that VMware’s network test commands are not consistent across all their products, which can be confusing.  I’ll update this post as I gather the commands for other products, but in the meantime this may help someone looking for how to do VMware network testing and troubleshooting.

ESXi

ESXi has two helpful commands.  For basic connectivity tests, vmkping is awesome because it’s simple to use and lets you specify which VMkernel interface you want to test from.  Sure, you could use plain ping, but you can’t specify a vmk interface with it.

To ping 192.168.1.1 through your Management port group, assuming the default configuration where it uses vmk0, it’s simply:

vmkping 192.168.1.1 -I vmk0

Another good use is validating jumbo frames, since you can also specify the packet size and disable packet fragmentation.  To conduct the same test with a 9000-byte packet and ensure it doesn’t get fragmented:

vmkping 192.168.1.1 -I vmk0 -s 9000 -d

Keep in mind that the IP and ICMP headers add 28 bytes of overhead, so if your end-to-end MTU is exactly 9000, you may need to drop the size to -s 8972 for the test to succeed.

For testing connectivity to a specific port, ESXi does support netcat, aka the nc command.  To test TCP port 80 on destination 192.168.1.1:

nc -z 192.168.1.1 80

You can specify UDP mode using -u as well.  Note that, at least in my experience, -s <source IP> does NOT work, so I don’t believe it’s possible to direct netcat out of a specific VMkernel port.  For example, when I forced it through a source IP that shouldn’t have had connectivity, the connection still succeeded when it shouldn’t have.
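
For reference, a UDP check would look something like the sketch below; port 514 (syslog) is just a placeholder here, so swap in whatever UDP service you actually need to verify:

nc -z -u 192.168.1.1 514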

Any VMware Product Running on Windows Server 2012 or Higher (vCenter, SRM)

Everybody knows ping.  I’m not gonna go over that.  But did you know that PowerShell has a ping cmdlet?  It’s useful for documenting results (via Export-Csv) and for scripting lots of ping tests.

To ping 192.168.1.1:

Test-Connection 192.168.1.1
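
As a rough sketch of the Export-Csv idea mentioned above (the targets, count, and output path are just placeholders), you could record the results like this:

Test-Connection 192.168.1.1, 192.168.1.2 -Count 4 | Export-Csv C:\temp\ping-results.csv -NoTypeInformation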

Another handy trick is that you can remotely have multiple Windows machines ping the same computer, and/or specify multiple targets.  For example, if I want Server1 and Server2 to each ping 192.168.1.1 and 192.168.1.2:

Test-Connection -Source Server1,Server2 -ComputerName 192.168.1.1,192.168.1.2

PowerShell also has a cmdlet to test network port connectivity.  To test whether the local machine can connect to 192.168.1.1 on TCP port 80:

Test-NetConnection -ComputerName 192.168.1.1 -InformationLevel Detailed -Port 80

Unfortunately, there isn’t a handy -Source parameter, but you could use PowerShell remoting to run this command on multiple remote computers, too.
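
As a rough sketch of that remoting approach (Server1, Server2, and the target IP are placeholders, and PowerShell remoting/WinRM must already be enabled on the remote machines):

Invoke-Command -ComputerName Server1,Server2 -ScriptBlock { Test-NetConnection -ComputerName 192.168.1.1 -Port 80 }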

VMware vCenter Server Appliance

For pinging, there’s the ping command.  That’s easy enough.

If you try to use netcat for port testing, it isn’t there by default.  On version 6, you have to run the following to temporarily install it:

/etc/vmware/gss-support/install.sh

Rebooting the VCSA removes it.

If you’d rather not install netcat, you can also use curl:

curl -v telnet://192.168.1.1:80

vSphere Replication Appliance

For pinging, there’s the ping command.  No surprises.

For network port testing, again, netcat isn’t installed, nor is there a supported way to install it to my knowledge.  Instead, use the curl command:

curl -v telnet://192.168.1.1:80

Keep checking back as I add commands for more products.

HP NC375T NICs are drunk, should go home

This past week, I ran into one of the most bizarre issues I’ve encountered in my decade of experience with VMware.

I was conducting a health check of a customer’s vSphere 5.5 environment and found that the servers were deployed with 8 NICs, but only 4 were wired up.  While the customer was running FC for storage, 4 NICs aren’t enough to redundantly segregate vMotion, VM, and management traffic, and the customer was complaining about VM performance issues when vMotioning VMs around.  The plan was to wire up the extra add-on NIC ports and take a port each from the quad-port onboard NIC and the add-on HP NC375T.

So first, I checked whether I had the right driver and firmware installed for this NIC according to VMware’s Compatibility Guide.  The driver was good, but the commands to determine the firmware version wouldn’t return any info.  Also curious was the fact that this NIC was showing up as additional ports on the onboard Broadcom NIC.  FYI, this server is an HP DL380 Gen7, a bit older but still a supported server for VMware vSphere 5.5.

At this point, I wanted to see if the onboard NIC would function, so I went to add the NICs to a new vSwitch.  Interestingly enough, the NICs did not show up as available to add.  However, if I plugged the NICs in and just looked at the Network Adapters info, they showed up there and even reported their connection state accurately.  I tried rebooting the server; same result.  One other server was identical, so I tried the same thing on that one and got the same exact behavior: the ports reported as part of the onboard NIC, the commands to list the firmware version did not work, and they could not be added to any vSwitch, yet the connection status info reported accurately under the Network Adapters section of the vSphere console.

At this point, I was equal parts intrigued and enraged, because accomplishing this network reconfiguration shouldn’t have been a big deal.  I put the original host I was working on in maintenance mode, evacuated all the VMs, and powered it off.  I reseated the card, powered it back on, and got the same exact results.  I powered it off, removed the add-on NIC, and powered it back on, expecting to see the NIC ports gone, and they were, along with the first two onboard NIC ports!

This was, and still is, utterly baffling to me.  I did some more research, thinking this HP NC375T must be a Broadcom NIC since it was messing with the onboard Broadcom adapter in mysterious ways, but nope!  It’s a rebadged QLogic!  I rebooted it, same result.  Cold booted it, same result.  I put the NIC back in, and the add-on NIC ports AND the two onboard NIC ports came back, all listed as part of the onboard Broadcom NIC!

I had researched the NC375T for probably over an hour at this point, finding people with other weird problems, some of them fixed by firmware upgrades.  It took 45 minutes to actually find a spot on HP’s site to download drivers and firmware, and the firmware version that VMware and everyone else who had issues with this card swore you’d better be running to have any prayer of stability was not available.  I tried HP’s FTP site, I tried QLogic’s site, no dice.  I recommended to the customer that we replace these cards, since they’re poorly supported, people were having so many problems with them, AND we were seeing the most bizarre behavior I’ve ever seen from a NIC.  The customer agreed, but we needed to get this host working again with the four NICs until we could get the replacement cards.

At this point, a purely instinctual voice out of nowhere came into my head and said, “You should pull the NIC out and reset the BIOS to defaults.”  To which I replied, “Thanks, weird, oddly technically knowledgeable voice.”

And sure enough, it worked.  All the onboard NIC ports were visible again.  Weird!  Just for fun, I stuck the NC375T back in.  What do you know, it was now listed as its own separate NIC, not part of the onboard Broadcom adapter, AND I could add it to a vSwitch if I wanted, AND I could run the commands to get the firmware version, which confirmed it was nowhere near the supported version for vSphere 5.5.

In the end, the customer still wanted these NICs replaced, which I was totally on board with at this point, too, for many obvious reasons.

So, in conclusion, HP NC375T adapters are drunk, and should go home!