vCenter is down - impact on VMware infrastructure.

By | December 20, 2015

Over the past few months I have seen posts about, or been asked, what happens when vCenter is down. Most of you probably know that vCenter is the heart of a VMware infrastructure, deployed either as an appliance (VCSA) or on a Windows machine. Its main features are:

  • Central management of VMware environments: lets you manage VMs across multiple ESXi hosts at once. For example, you can create custom roles and permissions.
  • Depending on your vSphere license, lets you use DRS/DPM, Storage DRS, vMotion, Fault Tolerance (FT), High Availability (HA) or vSphere Distributed Switch (vDS).

There are other functions as well, such as easier patching of your VMware infrastructure through integration with VMware Update Manager, and backing up VMs.

Impact on VMware environment when vCenter is down

OK, so what happens to the features above when vCenter is unavailable? The most important point: all workloads (VMs) should keep running. The figure below shows the features that stop working (red) or work only partially (blue) while vCenter is down.

[Figure: impact of a vCenter outage on VMware infrastructure features; red = not working, blue = limited]

Central management

Of course, when vCenter is unavailable you lose central management of your VMware environment. All your VMs keep running, but to administer them you have to log in directly to each ESXi host separately. So if you have 100 hosts and need to do something with 10 VMs, you have to log in at least once (if all 10 VMs happen to be on the same host) and up to 100 times in the worst case, because you may first have to find those VMs 😉
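To make the arithmetic concrete, here is a toy Python model (a sketch, not a real tool). It assumes a hypothetical inventory dictionary mapping each ESXi host name to the VMs it runs, visits the hosts one by one as you would without vCenter, and counts the logins until every target VM has been found. All host and VM names are made up for illustration.

```python
from typing import Dict, List, Set


def logins_needed(inventory: Dict[str, List[str]], targets: Set[str]) -> int:
    """Count direct ESXi host logins needed to locate every target VM.

    Hosts are visited in inventory order (there is no vCenter search to use),
    and the search stops as soon as all target VMs have been found.
    """
    remaining = set(targets)
    logins = 0
    for vms in inventory.values():
        if not remaining:
            break  # every target VM already located
        logins += 1  # one separate login per ESXi host
        remaining -= set(vms)
    if remaining:
        raise ValueError(f"VMs not found on any host: {sorted(remaining)}")
    return logins


# Hypothetical environment: 100 hosts, 10 VMs we need to work on.
targets = {f"vm-{n}" for n in range(10)}

# Best case: all 10 VMs happen to run on the first host we log in to.
best = {f"esxi-{i:03d}": [] for i in range(100)}
best["esxi-000"] = sorted(targets)

# Worst case: the last VM we need is on the last host we try.
worst = {f"esxi-{i:03d}": [] for i in range(100)}
worst["esxi-000"] = [f"vm-{n}" for n in range(9)]
worst["esxi-099"] = ["vm-9"]

print(logins_needed(best, targets))   # 1
print(logins_needed(worst, targets))  # 100
```

In practice you do not have such an inventory at hand during the outage, which is exactly why the worst case is one login per host.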

Patching environment

If vCenter is down, you can't use VMware Update Manager to patch your ESXi hosts. You can still patch hosts with esxcli, but this causes downtime: without vMotion you cannot migrate VMs to another host, so they have to be shut down first.
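As a rough sketch of what manual patching of a single host involves, the Python function below only assembles the command sequence; it does not execute anything. The depot path and VM IDs are hypothetical placeholders (on a real host, `vim-cmd vmsvc/getallvms` lists the VM IDs), and the exact esxcli invocation for a given patch should be checked against that patch's documentation.

```python
from typing import List


def standalone_patch_plan(vm_ids: List[int], depot_zip: str) -> List[str]:
    """Return, in order, the ESXi shell commands to patch one host without vCenter.

    Without vMotion the running VMs cannot be evacuated, so they are shut down
    first. This only builds strings; nothing is executed.
    """
    # Guest OS shutdown needs VMware Tools; wait until every VM is powered off
    # before moving on, since maintenance mode will not complete with VMs running.
    commands = [f"vim-cmd vmsvc/power.shutdown {vm_id}" for vm_id in vm_ids]
    commands += [
        "esxcli system maintenanceMode set --enable true",
        f"esxcli software vib update -d {depot_zip}",
        # A reason is mandatory, and the host must be in maintenance mode.
        'esxcli system shutdown reboot --reason "ESXi patching"',
    ]
    return commands


# Hypothetical VM IDs and depot path, for illustration only.
for cmd in standalone_patch_plan([1, 2], "/vmfs/volumes/datastore1/esxi-patch.zip"):
    print(cmd)
```

Once the host is back up, exit maintenance mode with `esxcli system maintenanceMode set --enable false` and power the VMs on again (for example with `vim-cmd vmsvc/power.on <vmid>`). Without vCenter, you repeat this host by host.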

vMotion and Storage vMotion

vMotion moves a running virtual machine from one physical server (ESXi host) to another with no downtime. Storage vMotion live-migrates a running virtual machine's VMDKs from one storage system to another, also with no downtime for the VM. Both features are available only while vCenter is up.

Distributed Resource Scheduler (DRS) and Storage DRS

VMware DRS dynamically balances computing capacity across ESXi hosts (DRS) or between datastores (Storage DRS). Because DRS relies on vMotion/Storage vMotion and the balancing is calculated by vCenter, DRS does not work when vCenter is down.

Fault Tolerance (FT)

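All VMs protected by FT before the vCenter outage remain protected. If the primary VM fails, FT still fails over to the secondary, but it does not create a new secondary VM, so that VM runs without FT protection until vCenter is available again.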

High Availability (HA)

If an ESXi host fails, VMware HA restarts the VMs that were powered on before vCenter became unavailable. Because vCenter is responsible for protecting and unprotecting virtual machines, VMs powered on during the outage are not protected by VMware HA until vCenter is back online. You also cannot change the HA configuration (e.g. restart priority), and Admission Control does not work.
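The protection rule described above can be expressed as a small Python model. This is a simplification for illustration, not how vSphere HA is implemented: it assumes HA is enabled on the cluster, ignores per-VM restart settings, and uses hypothetical minute timestamps for power-on and outage times.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VM:
    name: str
    powered_on_at: Optional[int]  # minute the VM was powered on; None = powered off


def ha_protected(vm: VM, vcenter_down_since: Optional[int]) -> bool:
    """Simplified rule: would HA restart this VM after a host failure?"""
    if vm.powered_on_at is None:
        return False  # powered-off VMs are not restarted
    if vcenter_down_since is None:
        return True  # vCenter is up, so (in this model) every running VM is protected
    # During an outage only VMs powered on before vCenter went down are protected.
    return vm.powered_on_at < vcenter_down_since


def restart_candidates(vms: List[VM], vcenter_down_since: Optional[int]) -> List[str]:
    return [vm.name for vm in vms if ha_protected(vm, vcenter_down_since)]


vms = [
    VM("db01", powered_on_at=10),    # running before the outage
    VM("web02", powered_on_at=130),  # powered on while vCenter was down
    VM("test03", powered_on_at=None),
]
print(restart_candidates(vms, vcenter_down_since=120))   # ['db01']
print(restart_candidates(vms, vcenter_down_since=None))  # ['db01', 'web02']
```

The second call models the situation after vCenter comes back: the VM powered on during the outage becomes protected again.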

vSphere Distributed Switch (vDS)

All VMs connected to a vDS should keep their network access despite the vCenter failure. However, if you need to change a VM's network settings, you can only move it to a port group on a vSphere Standard Switch (vSS); no changes to the vDS are possible.

Other products

vCenter can also be crucial for third-party applications, other VMware products (e.g. vROps, vRA) or simply your backup software.
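The behavior described in the sections above can be condensed into a quick lookup table. The Python sketch below encodes only what this post states; it is not an official or exhaustive list, and it uses text labels rather than the colors from the figure.

```python
from enum import Enum
from typing import Dict, List, Tuple


class Status(Enum):
    NOT_WORKING = "not working"
    LIMITED = "limited"


# Condensed from this post only; not an official or exhaustive list.
FEATURES_WHEN_VCENTER_DOWN: Dict[str, Tuple[Status, str]] = {
    "Central management": (Status.NOT_WORKING, "log in to each ESXi host directly"),
    "Update Manager patching": (Status.NOT_WORKING, "patch with esxcli; VMs must be shut down"),
    "vMotion / Storage vMotion": (Status.NOT_WORKING, "no live migration"),
    "DRS / Storage DRS": (Status.NOT_WORKING, "no balancing"),
    "Fault Tolerance": (Status.LIMITED, "failover works, but no new secondary"),
    "High Availability": (Status.LIMITED, "only VMs powered on before the outage; no config changes"),
    "Distributed Switch": (Status.LIMITED, "existing connectivity works; no vDS changes"),
}


def features_with(status: Status) -> List[str]:
    """List the feature names that have the given status during an outage."""
    return [name for name, (s, _) in FEATURES_WHEN_VCENTER_DOWN.items() if s is status]


print(features_with(Status.LIMITED))
# ['Fault Tolerance', 'High Availability', 'Distributed Switch']
```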

vCenter downtime avoidance

We should minimize vCenter downtime (RTO) or avoid it entirely. Fortunately, there are several availability options:

For more information about vCenter high availability options, please see the post here.

Conclusion

vCenter is a crucial component of VMware infrastructure that we should protect. Unfortunately, vCenter currently has no native features to protect itself, so we have to use the availability options mentioned above. I hope VMware adds more options in the near future.

Update 09.2016: During VMworld 2016 US, VMware announced new features in vSphere 6.5, including native replication and monitoring for the VCSA:

  1. Active/Passive HA functionality for vCenter (VCSA only).
  2. Built-in monitoring web interface for the VCSA.
  3. Built-in backup/restore of the entire VCSA configuration.
  4. VMware Update Manager integrated into the VCSA (no Windows machine required).

9 thoughts on “vCenter is down - impact on VMware infrastructure.”

  1. Magnus

    As far as I know, vSphere HA will not protect VMs that were powered on while vCenter Server is down, but all VMs that were powered on before vCenter Server failed are protected, as you said.

    1. Mariusz (post author)

      Hi Magnus,

      Yes, that's true; at least it worked as you describe in vSphere 5.x. I haven't tested it in 6.0. I've updated the post. Thanks.

  2. Larry

    Thank you for this write-up.

    Availability has never really been an issue for us, since our VCSA runs in a vMSC (vSphere Metro Storage Cluster). However, we've had a lot of problems with things happening inside the VCSA itself: bugs, disks filling up (5.5), services not starting, etc. That kind of stuff is much harder to mitigate. VM image backups not running is the biggest issue for us in those cases.

    I hope VMware makes the VCSA more stable and more resilient, with better support and more/better KB articles.

  3. Kalpesh Patel

    This is an excellent article on what functionality may become unavailable in case vCenter is not available.

    These days VMware largely recommends deploying the VCSA, now that version 6.0 and later is on par with its Windows-based counterpart in core capabilities. Has there been any discussion of how to future-proof the VCSA against a corrupt OS and/or database? I'd like to see more discussion of it. I believe this has historically been the least discussed topic, yet it also dictates how strongly the infrastructure is protected.

    1. Mariusz (post author)

      Hi,

      I haven't seen such a discussion, but VMware is certainly already considering such features. As you said, VMware is moving in the "appliance direction" with native load balancing/replication methods (as we can observe in vROps and PSC), so the VCSA will probably be next.

  4. Jason

    We run Juniper's Firefly/vGW product, which is now thankfully discontinued. However, a fun feature we found is that if the SD (security designer) is unable to contact the vCenter server, it takes ALL the servers out of their policy groups and moves them into the default group.

    Now, we had set up our default group as deny any/any, since we didn't want any servers to magically work; rather, their naming conventions define their roles in the network and what they can access (it's so nice to move away from the old days of separate networks à la VLANs). But boy, were we surprised when we first had a vCenter outage, followed by a full-on site outage.

    It's really bothersome, but our default group now permits any/any inbound and outbound, as Juniper has offered no solution for this issue. Worse, they never reply as to why this is even a feature.
