
Resolving VM MAC Conflict alarm with Veeam Replicas

It's been a while since I've deployed Veeam replication with vSphere 6.0.  I recently implemented it for a customer who was replicating VMs to a secondary storage appliance in addition to backing the VMs up to a Data Domain.  Upon running the initial replication for the VM, a "VM MAC Conflict" alarm triggered on the replica VM.

[Screenshot: VM MAC Conflict alarm triggered]

Here’s a description of what’s going on and how to prevent the VM MAC Conflict alarm from triggering.

VM MAC Conflict Alarm

The VM MAC Conflict alarm is new to vCenter 6.0 Update 1a.  The intent of the alarm is to warn you if two vNICs on VMs within a vCenter instance have the same MAC address.  This can happen for a variety of reasons:

  1. vCenter malfunctioned and dynamically provided the same MAC address to two or more vNICs.
  2. Either intentionally or mistakenly, an admin or a third-party product might have statically assigned a MAC address already in use within the environment.  In this case, Veeam created a copy of the VMX file with identical MAC addresses for the source and replica VMs' vNICs.

It's a good alarm to have, just in case.  But how do you keep the alarm while stopping it from triggering on replica VMs?

Stopping VM MAC Conflict Alarms from triggering for Veeam Replicas

The solution for preserving the VM MAC Conflict alarm while stopping it from triggering on Veeam replicas is quite simple.  You can modify the alarm itself by setting an exception to exclude VMs.  In the case of Veeam replicas, they have a “_replica” suffix within the VM name by default.  If you changed that suffix in the replica job, just adjust accordingly.

Go to the VM MAC Conflict alarm definition.  It’s in the vCenter inventory object under Manage > Alarm Definitions.  Click the alarm and on the right, click Edit.

Under the bottom box that reads, “The following conditions must be satisfied for the trigger to fire”, add a condition that says the VM name does not end with “_replica”.  Once applied, the alarm disappears for your replica VMs.
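
If you'd rather check the alarm from PowerCLI before or after making the change, here's a minimal sketch, assuming the alarm keeps its default name of "VM MAC Conflict".  It only reads the definition so you can eyeball the trigger expressions; the exclusion itself is easiest to add in the Web Client as described above.

$alarm = Get-AlarmDefinition -Name "VM MAC Conflict"
$alarm | Select-Object Name, Enabled, Entity
# Drill into the raw alarm spec to review the trigger expressions and any exclusions
$alarm.ExtensionData.Info.Expression | Format-List *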

[Screenshot: VM MAC Conflict alarm modified with the exclusion condition]

That’s it!

Compromised vSphere 6.0 Certificates – Part 3

This is the final part of my series on how to deal with compromised vSphere 6.0 certificates.  If you are coming here first, I highly recommend reading:

Compromised vSphere 6.0 Certificates – Part 1

Compromised vSphere 6.0 Certificates – Part 2

We pick up where we left off in Part 2.  The scenario: you suspect compromised vSphere 6.0 certificates in your environment, and those certificates were issued to vCenter by the Platform Services Controller (PSC), either from its own root certificate or from an intermediate certificate installed from an external Public Key Infrastructure (PKI).  Again, VMware does not support certificate revocation when the PSC generated the certificate, whether it used its own root or the external PKI's issued intermediate.  You must regenerate all certificates instead.

At this point, I assume certificates were regenerated for the PSC's root or intermediate certificate, along with all vCenter server certificates.  I outlined that process in Part 2.  Now, the question is what to do about everything else when dealing with compromised vSphere 6.0 certificates.  What about ESXi servers?  What about external products that plug into vCenter like VMware Update Manager?  NSX Manager?  vRealize Operations Manager?

Let’s get to it!

Compromised vSphere 6.0 Certificates – ESXi servers

After resetting all the certificates within the PSC and vCenter, I have good news when it comes to your ESXi servers: they won't have any problems.  Resetting the VCSA certificates doesn't affect ESXi servers because they do not obtain certificates from the VCSA, nor do they trust any of the certificates that were reset.

Yay!

Compromised vSphere 6.0 Certificates – Most external vCenter dependent solutions from VMware

In most cases, when it comes to external products that establish relationships with vCenter, these products trust one of the certificates that were reset, but they do not obtain certificates from a PSC themselves.  Therefore, you fix these products by simply establishing trust with the new certificate now installed on vCenter.  And in most cases, this is as easy as it was when you registered the product with vCenter in the first place.

I'm not going to show step-by-step instructions for every product, as I don't have the time.  I will, however, come back and update this post if/when I need to do this with various products.  I'm going to use vRealize Operations Manager as an example of a typical product that is fixed the same basic way.

vRealize Operations Manager

Here is what vCOPS looks like following the PSC/vCenter certificate resets:

[Screenshot: vCOPS solution status after the certificate reset]

Note we are checking the same place you go to register the product with vCenter in the first place (Appliance portal > Solutions > VMware vSphere).  If we were talking about NSX, you would log into the NSX Manager's direct portal and navigate to Manage Appliance Settings > NSX Management Service > Configure.  It's the same basic concept, even though where you navigate might differ by product.

Go into settings and re-establish the connection.  In vCOPS, that means clicking the gear on that page and, on both the vCenter Adapter and vCenter Python Actions Adapter, clicking "Test Connection".  Lo and behold, a pop-up asks if you wish to trust a new certificate it doesn't yet trust.

[Screenshot: prompt to trust the new certificate]

If you click OK to trust, vCOPS adds the new certificate to its trusted store.  However, you get an error that you effectively can’t trust two certificates for the same object.

[Screenshot: vCOPS error when trusting the new certificate]

I show this just in case other products share similar behavior.  Delete the old trusted certificate from the appliance.

In this case, navigate to Certificates, and delete the old trusted certificate.

[Screenshot: deleting the old trusted certificate in vCOPS]

Hovering over a column gives more specific info for that cert, which can help identify which certificate to delete.

Then, go back and issue the Test Connection command.

Stop and start the collections on the Solutions page as needed.

[Screenshot: vCOPS collection status showing Data Receiving]

Click refresh and wait to ensure  “Data Receiving” is shown for the collection status.  Otherwise, vCOPS is not functioning.

Other products will have their idiosyncrasies, but the same basic concept applies: establish trust for the new vCenter certificate.  You perform this process pretty much where you registered the product with vCenter in the first place.
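
One generic trick that can help with any of these products is to pull the thumbprint of the certificate vCenter is presenting right now, so you can confirm the product ends up trusting the correct cert and that you delete the right old one.  Here is a hedged PowerShell sketch; $vcFqdn is a placeholder, and the relaxed validation callback is there only because the machine you run it from may not trust the new root yet.

$vcFqdn = "vcenter.domain.local"
$tcp = New-Object System.Net.Sockets.TcpClient($vcFqdn, 443)
# Accept any certificate for this read-only check so an untrusted root doesn't block it
$cb = [System.Net.Security.RemoteCertificateValidationCallback]{ $true }
$ssl = New-Object System.Net.Security.SslStream($tcp.GetStream(), $false, $cb)
$ssl.AuthenticateAsClient($vcFqdn)
$cert = New-Object System.Security.Cryptography.X509Certificates.X509Certificate2($ssl.RemoteCertificate)
"Thumbprint: {0}  (valid until {1})" -f $cert.Thumbprint, $cert.NotAfter
$ssl.Dispose()
$tcp.Close()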

Compromised vSphere 6.0 Certificates – Abnormal external vCenter dependent solutions from VMware

Some products need specialized procedures to trust the new certificates installed in your vCenter/PSC servers.  Here are all of the ones so far I’ve run into, and how to fix those:

vCenter Update Manager (VUM)

I'm hardly shocked VUM would need a specialized procedure.  VUM runs only on a Windows OS, and it remains 32-bit, unlike almost every other VMware product.  Plus, it has long had esoteric procedures when it comes to certificates.

Navigating to an impacted VUM server within the vSphere Client nets you a pretty immediate error that clearly shows a problem with the SSL certificate.

[Screenshot: VUM SSL certificate error in the vSphere Client]

“sysimage.fault.SSLCertificateError”

Time to fix the trust of the new certificate!

First, remote into the VUM server.

Next, run the VMware vSphere Update Manager Utility under the installation directory for VUM (X:\Program Files (x86)\VMware\Infrastructure\Update Manager\VMwareUpdateManagerUtility.exe, where X is the drive on which you installed the VUM binaries).  Log in, and select to re-register with vCenter.

[Screenshot: re-registering VUM with vCenter]

Of course, restart the VMware vSphere Update Manager service to complete the process.
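
If you prefer to do that from an elevated PowerShell prompt on the VUM server, a hedged one-liner like the following works; the display-name wildcard is intentionally loose because the exact service name varies slightly between builds.

Get-Service -DisplayName "*Update Manager*" | Restart-Service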

The error will go away, and VUM will function again.

Summary

Hopefully, this gives everyone enough info to complete the process or point them in the right direction.  If you have any insights to other products I didn’t cover, please post in the comments!  As I try more products, I will also update this article.

Thanks for reading!

Compromised vSphere 6.0 Certificates – Part 2

In this second blog article, I discuss what to do with compromised vSphere 6.0 certificates issued by a PSC to vSphere components.  As mentioned in the previous article, you cannot revoke certificates issued by a PSC, whether it used an installed intermediate certificate from an external CA or its own root.  You must regenerate all certificates instead.

FYI, this post assumes you’re using the VCSA.  Windows installable vCenter is nearly identical, aside from the path to Certificate Manager.

Compromised vSphere 6.0 Certificates – Embedded PSC With Own Root Certificate

If you have compromised vSphere 6.0 certificates automatically generated by an embedded PSC, you must regenerate all certificates.  Yes, even certificates you don't suspect were compromised.

To do this:

  1. Log in as root to your embedded vCenter server via console, SSH, etc.
  2. Enter the shell.  If you didn't enable shell access via the console, you can run "shell.set --enabled True" and then run "shell".
  3. Run the certificate manager utility.  For the VCSA, you simply run /usr/lib/vmware-vmca/bin/certificate-manager
  4. Select option 4 – Regenerate a new VMCA Root Certificate and replace all certificates.
  5. Certificate Manager asks for various pieces of information for each certificate regeneration such as the country, organization, OrgUnit, State, Locality, email, etc. These are cosmetic values mostly, and are only visible if someone really examines the certificate.  Functionally, they make no difference.  However, I wanted to call your attention to a couple of things that are very important. It is VERY CRITICAL you do the following for each certificate, or else the process will fail!
    1. There is a bug in the certificate tool: if you answer every question identically for every certificate, the same certificate gets generated for each one.  You'll notice multiple certs end up being regenerated.  You can tell which one is currently being regenerated by lines such as: "Please configure root.cfg with proper values before proceeding to next step."  That means the root certificate is being regenerated.  You'll see various other certs as well, like "machine", "machine-ssl", "vpxd.cfg", etc.  Each of these certs must actually be unique, so make sure you give a different value for at least one of the questions asked for every cert regenerated on a server.  By far the easiest way to do this is to answer the following question uniquely for every cert: "Enter proper value for 'Name' [Default value : CA]".  Simply use an abbreviation of the server name plus the certificate name.  In this case, you could call it "VC-ROOTCFG".  Answering every other question identically won't hurt.
    2. One question that is more than cosmetic that you must answer correctly is: “Enter proper value for ‘Hostname’ [Enter valid Fully Qualified Domain Name(FQDN), For Example : example.domain.com]”.  Make sure this is the actual DNS name for the vCenter server.
  6. When prompted after regenerating all certificates, stop and start all services using:
    1. service-control --stop --all
    2. service-control --start --all
  7. I would recommend rebooting your vCenter server now.
  8. Download your root certificate again and re-import it into GPO, or however you established trust for the root on your clients originally (a quick sketch follows this list).
  9. Fix all trust issues with external products.  (See part 3 of this series!)
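
For step 8, here is a minimal sketch of trusting the freshly downloaded root on a single Windows client, assuming it was saved to the hypothetical path C:\certs\vmca-root.cer and that the machine has the PKI module (Windows 8/Server 2012 or later).  GPO is still the better way to push it everywhere.

# Hypothetical path; point this at wherever you saved the new VMCA root
Import-Certificate -FilePath "C:\certs\vmca-root.cer" -CertStoreLocation Cert:\LocalMachine\Root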

This is probably the one time you might actually want an embedded PSC for vCenter.  This is far simpler than if you have an external PSC.  (I still recommend external PSC’s in all cases for the record!!!)

Compromised vSphere 6.0 Certificates – External PSC(s) With Own Root Certificate

This is somewhat similar.  However, keep in mind each PSC is a CA.  Therefore, you probably should do this on every PSC that’s a part of the same environment if you suspect certificate(s) have been compromised.

To do this:

  1. Log in as root to your external PSC server via console or SSH.
  2. Enter the shell.  If you didn't enable shell access via the console, you can run "shell.set --enabled True" and then run "shell".
  3. Run the certificate manager utility.  For the VCSA, you simply run /usr/lib/vmware-vmca/bin/certificate-manager
  4. Select option 4 – Regenerate a new VMCA Root Certificate and replace all certificates.
  5. Certificate Manager asks for various pieces of information for each certificate regeneration such as the country, organization, OrgUnit, State, Locality, email, etc. These are cosmetic values mostly, and are only visible if someone really examines the certificate.  Functionally, they make no difference.  However, I wanted to call your attention to a couple of things that are very important. It is VERY CRITICAL you do the following for each certificate, or else the process will fail!
    1. There is a bug in the certificate tool: if you answer every question identically for every certificate, the same certificate gets generated for each one.  You'll notice multiple certs end up being regenerated.  You can tell which one is currently being regenerated by lines such as: "Please configure root.cfg with proper values before proceeding to next step."  That means the root certificate is being regenerated.  You'll see various other certs as well, like "machine", "machine-ssl", "vpxd.cfg", etc.  Each of these certs must actually be unique, so make sure you give a different value for at least one of the questions asked for every cert regenerated on a server.  By far the easiest way to do this is to answer the following question uniquely for every cert: "Enter proper value for 'Name' [Default value : CA]".  Simply use an abbreviation of the server name plus the certificate name.  In this case, you could call it "PSC1-ROOTCFG".  Answering every other question identically won't hurt.
    2. One question that is more than cosmetic that you must answer correctly is: “Enter proper value for ‘Hostname’ [Enter valid Fully Qualified Domain Name(FQDN), For Example : example.domain.com]”.  Make sure this is the actual DNS name for this server.  Even if it asks for a cert for the web client, do NOT put in the name of the vCenter server.  It will also ask for an optional IP address.  Obviously, if you input one, make sure it’s the correct one.
  6. When prompted after regenerating all certificates, stop and start all services using:
    1. service-control --stop --all
    2. service-control --start --all
  7. I recommend rebooting the machine when you’ve completed this.
  8. To verify the PSC cert reset worked, browse to https://FQDNofPSC.domain.com/psc and make sure you get a login prompt.  If you don't, the certificate reset failed; stop and redo this portion.  You likely didn't provide a different answer to one of the questions for each certificate to make them unique.  (A quick check is sketched after this list.)
  9. Run Certificate Manager on your vCenter server(s).  Here's where it gets weird.  VMware says you should run Option 3 – Replace Machine SSL certificate with VMCA Certificate and answer the questions, then run Option 6 – Replace Solution user certificates with VMCA certificates.  That didn't work for me.  The only way I could get it to work was to run Option 8 – Reset all certificates.  I also found another oddity.  During this process, you are asked: "Performing operation on distributed setup, Please provide valid Infrastructure Server IP."  If I entered an IP address and did the rest correctly (remember to answer the questions but provide a different value for Name for each certificate!), the process would kick off, get stuck here for a long time, and eventually fail:

     Status : 85% Completed [starting services...]
     Error while starting services, please see log for more details
     Status : 0% Completed [Operation failed, performing automatic rollback]

     Error while replacing Machine SSL Cert, please see /var/log/vmware/vmcad/certificate-manager.log for more information.

     Then the certificates would roll back.  Enter the FQDN of one of your PSC servers instead!  That allows it to continue.

  10. Download your root certificate again and re-import into GPO or however you established trust on the clients for the root originally.
  11. Fix all trust issues with external products.  (See part 3 of this series!)
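
Here is a quick, hedged sketch of the step 8 check from a Windows box with PowerShell 3.0 or later.  $pscFqdn is a placeholder, and certificate validation is relaxed only because the client may not trust the brand-new root yet.

$pscFqdn = "psc1.domain.local"
# Relax certificate validation for this check only; the new root may not be trusted yet
[System.Net.ServicePointManager]::ServerCertificateValidationCallback = { $true }
try {
    $resp = Invoke-WebRequest -Uri "https://$pscFqdn/psc" -UseBasicParsing
    "PSC responded with HTTP $($resp.StatusCode) - certificate reset looks good"
} catch {
    "PSC check failed: $($_.Exception.Message)"
}
[System.Net.ServicePointManager]::ServerCertificateValidationCallback = $null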

This is far more complicated than the first one, but it’s probably the one you’re more likely to need to do.

Compromised vSphere 6.0 Certificates – Intermediate CA

If the intermediate CA certificate you installed is now compromised, revoke it within the external PKI.  You should then request and install a new intermediate certificate within the PSC, and then proceed with regenerating certificates for all other components.  (See above...)

And that’s how you deal with compromised vSphere 6.0 Certificates.  In part 3, I’ll delve into how to fix trust issues with various products that might arise from regenerating these certificates.

Compromised vSphere 6.0 Certificates – Part 1

As I alluded to in a previous post, I've been needing to do some more in-depth testing in relation to vSphere 6.0, which I run in VMware Workstation.  Now the cat is out of the bag!  I'm running through scenarios for what to do with compromised vSphere 6.0 certificates.

After scouring the internet, there are plenty of blog articles about the various certificate management models.  There's not a lot of information about what to do if you suspect a compromised vSphere 6.0 certificate.

I wanted to cover the basics here and discuss more specifics in later articles.  I'm going to start with a short introduction to vSphere 6.0 certificate management, and then cover the implications of each approach when it comes to a compromised certificate.

vSphere 6.0 Certificate Management Basics

I’ve posted about this in the past.  Here’s a VERY quick recap.

  1. You can have the Platform Services Controller (PSC) act as a root Certificate Authority (CA) and hand out certificates automatically to other vSphere components, which is the easiest to implement and manage.
  2. You can have the PSC act as an intermediate CA and automatically issue certificates using an intermediate certificate you install.  This is arguably the second easiest to implement and manage.
  3. You can manually generate Certificate Signing Requests to an external CA for the various vSphere components, and install those certificates manually.  This is arguably the hardest to implement and manage.

One other note here is that you can mix and match these options.  This is usually implemented by having non-client internal vSphere component certificates issued by the PSC, and client-facing certs, such as the cert for the vSphere Web Client, issued by an external CA.

Again, the above information is not intended to be a primer for certificate management in vSphere 6.0.  It’s only to facilitate discussion about what to do if a certificate has been compromised.

Dealing with Compromised vSphere 6.0 Certificates Issued by an External CA

One advantage of using an external certificate authority to issue the certificates for vSphere is support for certificate revocation.  If any certificate issued by an external CA is compromised, you can simply revoke that certificate within the PKI environment.  Replacing compromised vSphere 6.0 certificates is done the same way the certs were acquired in the first place.

The basic steps would be:

  1. Revoke the suspected compromised certificate within the PKI.
  2. Go through the process of obtaining a new certificate, and install it.
  3. Fix any trust issues that may occur with the new certificate.  For example, you must  manually fix VUM when you change a vCenter certificate.

It’s not so straightforward if the PSC generated the certificate you suspect is compromised.

Dealing with Compromised vSphere 6.0 Certificates Issued by a PSC

For all intents and purposes, it doesn't really matter whether compromised vSphere 6.0 certificates were issued by a PSC using its own root certificate or using an installed intermediate certificate obtained from an external CA.  If the PSC ultimately generated the certificate used by any vSphere component, certificate revocation is not supported.

If you suspect a certificate has been compromised, you have no choice but to regenerate all certificates, even certificates that you don’t expect to be compromised.  This should certainly be considered prior to deciding upon which model to use for vSphere 6.0 Certificate Management.

The basic steps would be:

  1. Run the Certificate Management Utility on the PSC in question to regenerate all certificates.  If in any doubt, do this on all PSCs.
  2. Run the Certificate Management Utility on any vCenter server that obtained its certificates from that PSC.  If you’re running embedded PSC with vCenter, you already did this in step one.
  3. Fix any trust issues that may occur with the new certificate.  For example, you must  manually fix VUM when you change a vCenter certificate.

If the above seems just as easy, it isn't.  For one, VMware's documentation on how to do this with external PSCs is vague and confusing.  Secondly, it gets more complicated the more PSC and vCenter nodes you have.

I'll go into more depth on how to address compromised vSphere 6.0 certificates issued by a PSC in Part 2.  In Part 3, I'll address how to fix trust issues with certificates in various products.

vCenter 6 – Windows vs Linux Appliance?

One of the first questions for a vSphere 6 design is which version should be used – the linux based VCSA appliance, or the traditional Windows installable version?

The debate on whether to go with the Linux based vSphere Appliance vs. the Windows installable version began when the first version of the appliance was introduced.  Through vSphere 5.5, I generally recommended the Windows version for numerous reasons:

  • It’s more mature
  • VCSA didn’t support linked mode
  • It uses a SQL database
  • You need Windows anyway for VMware Update Manager
  • It scaled better with backend databases that were more common (MS SQL)

Many blog articles have compared the two, and I don’t want to rehash a lot of that information here.  A case could be made for either.

vSphere 6 has been out for a while, and I've deployed it for numerous customers, both the appliance and Windows versions.  I feel like now I can offer something more to the debate, based on practical experience.

Which works best generally speaking?

I’m going to be honest, I’m coming at this from a Windows centered background.  I’ve worked with linux a bit, don’t get me wrong.  But at the end of the day, I am far more comfortable with Windows.  So, I’ve generally been partial to Windows based vCenter servers for my customers partly because I can support Windows based OS’s easier than linux based, and most of the customers I’ve dealt with are also more familiar with Windows.

With all that said, after deploying vSphere 6 for a while now, it's time for the vCenter Appliance.  That doesn't mean for everyone, but it does mean I now start with the assumption of the appliance first.  If the customer has reasons why the Windows version makes more sense for them, I'll recommend the Windows version.  But my default recommendation otherwise is to go with the VCSA.  This is the first time in all my workings with vSphere that I've recommended the VCSA over the Windows version generally speaking.

Why is the VCSA better?

Some reasons for the VCSA have been consistent since its introduction:

  • No need for licensing of Windows or SQL
  • It’s more secure (honestly this is debatable)
  • If you’re a linux shop, you don’t need to introduce Windows for vCenter
  • It’s faster to deploy

But this version is different for numerous reasons.

VCSA is faster

After you work on both, you start noticing the VCSA is faster.  As a consultant, I jump around between environments with drastically different hardware, so I started to wonder if I truly remembered correctly which environments were faster.  Maybe the faster environments had faster storage arrays or servers?

One customer I did work for had hardware issues that caused me to rebuild their vCenter environment.  Since it had to be rebuilt from scratch anyway, we elected to deploy it as a VCSA the second time to save time.  This provided a rare opportunity to compare the two on the same hardware.  I don't have numbers or benchmarks to provide.  I can only say that the customer commented it was noticeably faster within the Web Client, and I noticed it as well.

It’s faster to deploy

I know, I said this version was different for numerous reasons.  Why bring that up again?  Because vCenter 6 works best by deploying the Platform Services Controller into a separate OS from the vCenter server.  That’s two VMs to build.  It’s far faster to deploy two VCSA’s than two Windows servers.  There’s no contest there.

It scales better without a licensed database

You can scale vCenter to its highest limits of VMs and hosts with the database included in the VCSA.  If you deploy the Windows version and use the vPostgres database, it only scales to 10 hosts and 200 VMs.

Who doesn’t have full VM backup capabilities now?

Back when the VCSA first came out, I dealt with numerous customers who used traditional in-guest agent backup products such as Backup Exec, without the ability to do whole-VM backups.  To back up the vCenter database, they needed to use a SQL database, which locked them into the Windows vCenter version.  Now, most environments have whole-VM backup products, whether it be Veeam, add-ons for backup products they've been using for a long time such as Backup Exec, or the VMware Data Protection appliance included with most licensed versions of vSphere.  Backing up the VCSA just isn't a challenge anymore.

No feature limitations

There used to be functional limits with the VCSA compared to the Windows version of vCenter.  Almost always, this centered around Linked Mode.  Why deploy any version of vCenter that might stop you from using included features, even if you aren't using those today?  VMware has since rewritten Linked Mode, and it works with both versions.  There isn't any other native VMware feature you can't use in conjunction with the VCSA.

It’s the future of vCenter

This is speculation on my part, but I think it’s clear VMware wants vCenter to become the VCSA.  Best to go that direction now than later.

Why might the Windows version still be better for some environments?

There can still be some compelling reasons to go with the Windows version.

Windows in place upgrades and SQL based databases

vCenter 6 is supported on Windows 2008 R2 and above.  If your old vCenter runs on a supported Windows OS instance, this potentially allows you to do in place upgrades.  The same can be said for the SQL backend database.  With that said, if VCSA becomes the only version down the road, it might be better to bite the bullet now instead of later to migrate to it.

No whole VM backup products

If a customer doesn’t have any whole VM backup products and doesn’t wish to deploy anything, including VDP, then they may need a SQL backend database that can be backed up with their backup product.

Better operational skills with Windows

Sometimes environments have IT personnel better skilled with Windows than Linux.  With that said, Linux-based appliances, whether for vCenter or some other service or application, are far more common than in the past.  It's increasingly likely the customer has or will have one or more in their environment, whether it be a Cisco wifi controller or perhaps a security appliance.  Perhaps the VCSA is a good starting point before some Linux learning is forced on the staff unexpectedly.

vCenter High Availability

With vCenter Heartbeat gone, Microsoft clustering is the only way to provide application-layer high availability for vCenter, and that can only come through the Windows version.  However, most customers elect for vSphere HA as sufficient protection of vCenter.  Windows Failover Clustering often causes more service loss than it avoids, especially when there is insufficient knowledge and/or experience managing it within the environment.

VUM still needs Windows

Unfortunately, VUM must be installed within a Windows OS.  You can use VUM in conjunction with the VCSA; it just has to run in its own Windows OS.  I don't think that's a good justification to avoid the VCSA, as I prefer to run VUM within its own OS instance anyway.  However, some customers would rather not mix them, or prefer to deploy VUM in the same VM as vCenter.

vCenter 6 – The appliance rocks!

I highly encourage using the appliance in most cases.  The one piece of advice I can give is don’t dismiss the appliance because you’ve never used it.  Also, unfamiliarity with linux may not be a good reason either.  That one is tricky.  On the one hand, you don’t want to introduce risk due to the lack of linux skills.  On the other hand, you’ll rarely need to be in the linux parts of the appliance anyway.

Either way, the VCSA for vCenter 6 is a solid option, and should be heavily considered.

VMware ESXi 6.0 Express Patch 6 causing CBT issues

The always useful Veeam support digest is reporting that, at the very least, Veeam is seeing issues with Changed Block Tracking (CBT) caused by vSphere 6.0 Express Patch 6.  This build was released on May 12th of this year, and it is the current build according to VMware's build number KB article.

Veeam is reporting they're seeing the issue if you're not using application-aware processing and are using VMware Tools quiescence on your Veeam jobs.

Other blog articles are mentioning other backup products also impacted, including VMware Data Protection and IBM TSM.  It’s safe to assume this will broadly impact all VMware centered backup products.

If you’re using Express Patch 6, you currently have two options:

  1. Roll back to ESXi 6.0 Update 2.
  2. Don’t use quiesced snapshots.

Heads up!

EMC VSI RecoverPoint/SRM Integration

I've recently set a customer up with new VNX storage arrays and RecoverPoint, all to be integrated with VMware Site Recovery Manager.  Previously, the customer used SRM in conjunction with MirrorView/A.  Why RecoverPoint?

The really cool thing about RecoverPoint is you can easily roll back to specific points in time, what they like to call DVR functionality for disaster recovery.  MirrorView/A only allows you to roll back to specific snapshots taken at specific points in time.

EMC also provides their VSI (Virtual Storage Integrator) for VMware environments.  It integrates with many of their storage products, including VNX and RecoverPoint, and it provides the DVR selection ability within SRM if you integrate it as well!

Setup is pretty straightforward:

  1. Deploy the OVA for the VSI in each site.
  2. Log in to the VSI's web portal by hitting https://<ip>:8443/vsi_vum with user name admin and password ChangeMe.  Change the password as prompted.
  3. Install the VSI's plugin with vCenter by going to VSI Setup and providing the required info.  If you don't get "The Operation is successful.", do it again unless you're given an error to troubleshoot.  For me, that happened on one of the two vCenter servers I was deploying this on.  Also, be patient, as this can take quite some time.  For me, the plugin took about 10-15 minutes to complete the installation.
  4. Log in to the vCenter Web Client, and go to vCenter Inventory Lists.  At the end, you should see an EMC VSI section.  [Screenshot: EMC VSI section in vCenter Inventory Lists]
  5. Click on Storage Integration Service.  Under Actions, click Register Solutions Integration Service, and enter the VSI’s info for that vCenter.  Click Test to ensure there’s connectivity to the VSI, and click OK.
  6. Under Storage Systems, add the storage array for that site.  Again, click Test to ensure there’s connectivity to the storage array, and click OK.  VSI supports VMAX, VNX, VNXe, ViPR, and XtremIO, so this isn’t just limited to the VNX on this project.
  7. Under Data Protection Systems, add the RecoverPoint cluster info for that site using the RPA cluster IP address, and be sure to select RecoverPoint as the Protection System Type.  [Screenshot: selecting the RecoverPoint Protection System Type]  Click Test to ensure communication will work.  If successful, OK will no longer be grayed out.  Click OK.
  8. Repeat step 7, but select SRM this time for the Data Protection System type.  Here's where I ran into a gotcha.  The FQDN/IP address and port fields were grayed out.  I went ahead and clicked Test, and got an error: "Could not communicate with the data protection system SRM at <IP of vCenter server>. Details: Cannot reach the target SRM server at <IP of vCenter server>:1"  [Screenshot: VSI SRM registration error]  Google didn't yield any results for a solution, so I began troubleshooting.  Thankfully, I knew my ports, and decided to click the check box for the FQDN or IP/Port line and enter the FQDN of the SRM server and the port.  Be aware that SRM 6.x uses 9086 (a quick port check is sketched below).  I provided that, clicked Test, got my green "OK to go" text, and clicked OK.

Note that this needs to be done for each vCenter/RPA cluster/storage array/SRM server in the environment.  Note also only one VSI instance can be registered per vCenter server, so you’ll need to deploy one VSI per vCenter.
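
If you hit the same grayed-out fields or registration error, it's worth confirming the SRM service is actually listening on 9086 before digging further.  Here is a hedged sketch with Test-NetConnection (built into Windows 8.1/Server 2012 R2 and later); the server name is a placeholder.

Test-NetConnection -ComputerName "srm-site1.domain.local" -Port 9086 |
    Select-Object ComputerName, RemotePort, TcpTestSucceeded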

After setting up each site, go to a VM, click it, go to Manage, view the snapshots for its Consistency Group, click the one you want and apply, and launch your Failover or Test action from SRM.

[Screenshot: selecting a point-in-time snapshot with the VSI]

And there you have it!

Configure Dump Collector with PowerCLI in vSphere 6

I had a script to configure Dump Collector settings that I used in previous versions of vSphere.  If you look around the web, you’ll find similar PowerCLI snippets to configure Dump Collector.

If you use that snippet in vSphere 6, it doesn’t work.  You’ll get the following error:

Message: Cannot set 2 server ip parameters.;
InnerText: Cannot set 2 server ip parameters.EsxCLI.CLIFault.summary
At line:4 char:1

This is because ESXCLI now has a parameter for whether to use IPv6, so when using Get-EsxCli, invoking the set method requires an additional value.  Remember, esxcli is not intuitive in that "enabled" properties are either true or null, so don't use $false.

The revised code should now be:

$vcenterip = '192.168.1.10'
foreach($vmhost in Get-VMHost){
	$esxcli = Get-EsxCli -VMHost $vmhost.Name
	# The extra $null accounts for the additional IPv6-related parameter in vSphere 6
	$esxcli.system.coredump.network.set($null,"vmk0",$null,$vcenterip,6500)
	# Enable the network dump configuration (true or null only; never $false)
	$esxcli.system.coredump.network.set($true)
	# Display the resulting configuration for verification
	$esxcli.system.coredump.network.get()
}

Also not something commonly found on the internet – can you test the ESXi netdump configuration?  Yep!

foreach($vmhost in Get-VMHost){
	$esxcli = Get-EsxCli -VMHost $vmhost.Name
	# Use $(...) so the host name expands correctly inside the string
	Write-Host "Checking dump collector on host $($vmhost.Name)"
	$esxcli.system.coredump.network.check()
}

And there you have it!

In-place upgrading Windows OS on vCenter 6?

I recently had a customer with two vCenter VMs running on Windows 2008 R2.  They were vCenter servers upgraded from 5.1 to vCenter 6.0 about six months ago.  They’re both using embedded PSCs, and have vSphere Replication and SRM plugged into them.  To simplify administration, they have embarked on a project to get all servers running Windows Server 2012 R2.

After researching, there really isn't a great, documented way to transplant a vCenter 6 server from one OS instance to another.  Normally, I'm not a big fan of in-place upgrading server operating systems, but this was a special case to meet the customer's objective.  Redeploying two vCenter servers, then likely having to redeploy/reconfigure SRM, plus any pitfalls with vSphere Replication, wasn't something I wanted to do.  But the question is: will vCenter 6, especially with an embedded Platform Services Controller and lots of things plugged into it, work after an in-place OS upgrade?

I definitely had my doubts.  The answer, though, in my lab is surprisingly yes!  I tried it both with an embedded PSC, and then again with a once-embedded PSC reconfigured to use an external PSC.  I didn't encounter any problems whatsoever, although I should point out this was a lab environment with a clean, fresh setup prior to the in-place OS upgrades.

So I went ahead and did it for the customer’s environment (they aren’t  big enough to have a lab environment), and it worked like a champ as well!

Here are some things I would make sure of before proceeding:

  • You may want to back up the vCenter database.  Warning: the vPostgres Windows backup script said it ran successfully for me but generated an empty 0KB backup file.  (This was one of the reasons I didn't attempt a transplant of vCenter to a new OS instance!)  Check to ensure the database backup file is valid before counting on it as a fallback if there's a problem.  This may become a future blog article once I get some answers for why this happened.
  • Verify what version of Windows is running, and ensure you have the required media and license keys.  In particular, if vCenter is running Windows Server 2008 R2 Datacenter, you can’t upgrade to Windows Server 2012 R2 Standard.
  • Verify what database vCenter and VUM are using if they're on the same box.  vPostgres is fine.  But if it's Microsoft SQL running on the vCenter server itself, make sure SQL is running a version that is supported on Windows Server 2012 R2.  Of specific note on some of these older vCenter VMs, SQL 2008 R2 needs to be SP2 or later.
  • I would recommend stopping all vCenter related services, VUM related services (if on the same OS instance), the database service (if it's on the same OS instance), and AV active protection prior to the OS upgrade (see the sketch after this list).
  • Make sure the C drive has at least about 15 GB of free space.
  • Reboot the OS prior to starting the upgrade to clear out any cobwebs.
  • Take a snapshot and/or backup of vCenter before proceeding.  (Kinda duh…)  What isn't so duh: before you take that snapshot, launch the upgrade first and verify it doesn't require you to do anything before installing, such as rebooting the OS.  If all you see is the warning to check that your applications are compatible, cancel the upgrade, take your snapshot, and proceed with the other precautionary steps in this list.  If there are other things it asks you to do, do those first, THEN snapshot your VM.
  • Don’t forget to kill your snapshot once everything is done, and you’ve confirmed everything is working.
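
Here is a hedged sketch for the service-stopping precaution above.  The display-name wildcard will match whatever VMware components are installed on the box, so run it with -WhatIf first, review the list, then remove -WhatIf to actually stop the services (the SQL and AV pieces still need to be handled separately).

Get-Service | Where-Object { $_.DisplayName -like "VMware*" -and $_.Status -eq "Running" } |
    Stop-Service -Force -WhatIf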

It worked flawlessly using these precautions for both production vCenter servers!

vSphere Replication 6 – Stopping replication impacts

I unfortunately didn’t get a chance to post Thursday, as I came down with a bit of a stomach bug, but I’m back at it!

I found this little interesting tidbit during preparation for VCAP6-DCV Deployment…

Did you know that stopping replication on a VM in vSphere Replication 6.x behaves differently depending on whether you used a replication seed?

Just to make sure we're all clear, a replication seed in vSphere Replication speak is when you copy a VMDK down from the source side, upload it to the target site, and then configure replication for the VM and select that datastore/folder for the VMDK.  When vSphere Replication sees the matching VMDK, it uses the data there and replicates only the changes made since the download.

In vSphere Replication 6.x, if this was done and you stop replication for a VM, the target VMDKs are left in place.  If it wasn't done, and you just let vSphere Replication replicate the initial copy of the VMDK, stopping replication DELETES the VMDK at the target site!  If that's a large data set, that could be a lot of data that has to be replicated again, more than likely over a WAN link!

This in particular impacts a somewhat common task when it comes to growing a VMDK for a VM being replicated by vSphere Replication.  To do this, replication must be disabled at some point.

If you used a replication seed, it’s actually easier.  You simply stop replication, grow the VMDK on both sides, and reconfigure replication.  Pretty easy.  The target VMDK would obviously not have been deleted, making this possible.

If the VMDK wasn’t seeded, you need to do a planned failover, stop replication, resize the VMDK on both sides, and reconfigure replication.  This also obviously requires downtime.
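
For the resize itself in either procedure, here is a minimal PowerCLI sketch for the source side once replication is stopped; the VM name, disk name, and new size are placeholders.  The target-side VMDK still has to be grown to the same size by other means (for example, vmkfstools on the target host) before you reconfigure replication.

# Placeholders: adjust the VM name, disk name, and new size for your environment
Get-VM -Name "FileServer01" | Get-HardDisk |
    Where-Object { $_.Name -eq "Hard disk 1" } |
    Set-HardDisk -CapacityGB 500 -Confirm:$false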

I’m still investigating to see if there’s a way to determine if the VMDK was seeded or not, so you would know which way to go.  If you’re unsure though, use the non-seeded method as a precaution unless it’s okay to have to re-replicate the VMDK/whole VM.