High Availability for virtual machines

A highly available virtual machine is one that Red Hat Virtualization Manager (RHVM) restarts automatically if the virtual machine crashes or if its host becomes non-responsive. The restart happens either on the original host or on another host in the cluster.

Red Hat Virtualization Manager constantly monitors hosts and storage to detect hardware failures. With high availability, interruption to service is kept to a minimum because RHVM restarts highly available virtual machines within seconds, without user intervention.

Configuring high availability is a recommended practice for virtual machines running critical workloads.

Virtual machines can be configured to restart automatically if the host becomes non-responsive or the virtual machine crashes unexpectedly. To use this feature, all hosts in the cluster must support power management; that is, each host must have an out-of-band power management system such as iLO, DRAC, or RSA, or a network-attached remote power switch, configured to act as a fencing device.

RHVM can also automatically restart high-priority virtual machines first. Multiple levels of priority give the highest restart priority to the most important virtual machines.
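The effect of restart priority can be illustrated with a minimal sketch. This is not RHVM code: the `HaVm` class, the `restart_order` function, and the numeric scale (a larger number means a more important virtual machine) are assumptions made only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HaVm:
    """Hypothetical record for a highly available virtual machine."""
    name: str
    priority: int  # assumed scale: larger number = more important


def restart_order(vms):
    """Return the VMs sorted so that the highest priority restarts first."""
    # sorted() is stable, so VMs with equal priority keep their input order.
    return sorted(vms, key=lambda vm: vm.priority, reverse=True)


vms = [HaVm("web", 1), HaVm("db", 100), HaVm("cache", 50)]
print([vm.name for vm in restart_order(vms)])  # ['db', 'cache', 'web']
```

In this sketch the database virtual machine is restarted before the cache and web virtual machines because it carries the highest priority value.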

Fencing hosts for VM integrity

A virtual machine must never run on two hosts at the same time, because its disk image is likely to become corrupted and other problems can occur. To prevent this, Red Hat Virtualization uses an out-of-band management agent to fence a non-responsive host; that is, to cut its power and ensure that the host and its virtual machines are down. Only then does RHVM restart the virtual machine on a new host.

A host is non-responsive when RHVM cannot communicate with it. RHVM uses fencing to ensure that highly available virtual machines running on a non-responsive host are really stopped and then restarts them on a different host in the cluster.

Red Hat Virtualization 4 and later also support using a special storage volume as a lease to control whether a virtual machine starts on another host when its original host goes down unexpectedly. The lease also prevents two instances of the same virtual machine from running on different hosts at the same time.

NOTE: There is an important distinction between a non-operational host and a non-responsive host. A non-operational host has a problem, but RHVM can still communicate with it. RHVM works with the host to migrate any virtual machines running on that host to operational hosts in the cluster. Likewise, a host that is moved to Maintenance mode automatically migrates all its virtual machines to other operational hosts in the cluster.

A non-responsive host is one that is not communicating with RHVM. After about 30 seconds, RHVM fences that host and restarts any highly available virtual machines on operational hosts in the cluster.
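The difference between these host states can be summarized in a small decision sketch. It models only the behavior described above; the `HostState` enumeration, the `plan_actions` function, and the action names are hypothetical and are not part of any RHVM API.

```python
from enum import Enum


class HostState(Enum):
    """Hypothetical host states, named after the ones described in the text."""
    UP = "up"
    NON_OPERATIONAL = "non_operational"
    MAINTENANCE = "maintenance"
    NON_RESPONSIVE = "non_responsive"


def plan_actions(state, vms):
    """Return an ordered list of (action, target) pairs for one host.

    `vms` maps each VM name on the host to True if the VM is highly available.
    """
    if state is HostState.UP:
        return []
    if state in (HostState.NON_OPERATIONAL, HostState.MAINTENANCE):
        # The manager can still talk to the host, so every VM is migrated.
        return [("migrate", name) for name in vms]
    # Non-responsive: fence the host first, then restart only the HA VMs.
    actions = [("fence", "host")]
    actions += [("restart", name) for name, is_ha in vms.items() if is_ha]
    return actions


print(plan_actions(HostState.NON_RESPONSIVE, {"db": True, "test": False}))
```

Note that in the non-responsive case the fence action always comes before any restart, and virtual machines that are not highly available are not restarted.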

Configuring a fence agent in a host

RHVM uses a fence agent to fence non-responsive hosts. RHVM does not contact the fence agent directly. Instead, it uses VDSM to send power management requests to a fencing proxy, which is one of the other hosts in the same cluster or data center as the non-responsive host. The proxy host communicates with the fence agent to execute the power management request.

The Power Management tab in the Edit Host and New Host windows includes the power management configuration options for a host.

Configuration for host high availability

The Power Management tab includes the following configuration options:

  • The Enable Power Management check box enables power management for the host.
  • The Kdump integration check box prevents the host from being fenced while it is performing a kernel crash dump.
  • The Disable policy control of power management check box exempts the host from power management actions driven by its cluster's scheduling policy.
  • The plus (+) button opens a new window, titled Edit fence agent, to configure a new fence agent for a host. This configuration includes parameters like the IP address of the Remote Access Card (RAC), and the username and password to log in to it.
  • The Advanced Parameters section specifies the search order for a proxy in the host’s cluster and data center.
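The proxy search order can be pictured with a short sketch. It is an illustration under assumptions, not the RHVM implementation: the `Host` class, the `pick_fence_proxy` function, and the scope names `"cluster"` and `"dc"` are hypothetical, as is the choice to try the cluster before the data center by default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    """Hypothetical host record used only for this illustration."""
    name: str
    cluster: str
    data_center: str
    up: bool = True


def pick_fence_proxy(target, hosts, search_order=("cluster", "dc")):
    """Return the first running host that can act as fencing proxy, or None."""
    for scope in search_order:
        if scope not in ("cluster", "dc"):
            raise ValueError(f"unknown search scope: {scope!r}")
        for host in hosts:
            # The host being fenced cannot be its own proxy.
            if host.name == target.name or not host.up:
                continue
            if scope == "cluster" and host.cluster == target.cluster:
                return host
            if scope == "dc" and host.data_center == target.data_center:
                return host
    return None


target = Host("host-a", "cluster1", "dc1", up=False)
others = [target, Host("host-b", "cluster2", "dc1"), Host("host-c", "cluster1", "dc1")]
print(pick_fence_proxy(target, others).name)  # host-c
```

Reversing the order to `("dc", "cluster")` makes the sketch return `host-b`, the first running host found anywhere in the same data center.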

Configuring a highly available virtual machine

Virtual machines are configured to be highly available on an individual basis. You can enable high availability when you create a virtual machine, or edit an existing virtual machine to enable it later.

The High Availability tab in the Edit Virtual Machine window includes the high availability configuration options for a virtual machine. To open the Edit Virtual Machine window, right-click the virtual machine in the list and click Edit.

Configuration for virtual machine high availability

The High Availability tab includes the following configuration options:

  • The Highly Available check box enables high availability for the virtual machine.
  • The Target Storage Domain for VM Lease drop-down menu specifies whether to use a storage lease, and on which storage domain, to control whether the virtual machine starts on another host when the original host goes down unexpectedly. To use a storage lease, you must configure a storage domain for the lease.
  • The Priority drop-down menu sets the priority of the virtual machine in the queue for migration or restart on another host.

Highly available virtual machines successfully restart when their host becomes non-responsive if the following conditions are met:

  1. Power management is available for the hosts running the highly available virtual machines.
  2. The host running the highly available virtual machine must be part of a cluster that has other available hosts.
  3. The destination host must be running.
  4. The source and destination hosts must have access to the data domain on which the virtual machine resides.
  5. The source and destination hosts must have access to the same virtual networks and VLANs.
  6. There must be enough CPUs on the destination host that are not in use to support the virtual machine’s requirements.
  7. There must be enough RAM on the destination host that is not in use to support the virtual machine’s requirements.
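The destination-side conditions (3 through 7) can be expressed as a simple eligibility check. This is a sketch with hypothetical names (`Vm`, `Host`, `unmet_conditions`) and simplified resource accounting. It does not cover conditions 1 and 2, which concern the source host and the cluster, and it does not reflect how the RHVM scheduler is actually implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vm:
    """Hypothetical description of what a VM needs from a host."""
    name: str
    cpus: int
    ram_mb: int
    storage_domain: str
    networks: frozenset


@dataclass(frozen=True)
class Host:
    """Hypothetical description of a candidate destination host."""
    name: str
    running: bool
    free_cpus: int
    free_ram_mb: int
    storage_domains: frozenset
    networks: frozenset


def unmet_conditions(vm, host):
    """Return the reasons why `host` cannot restart `vm` (empty if it can)."""
    reasons = []
    if not host.running:
        reasons.append("destination host is not running")
    if vm.storage_domain not in host.storage_domains:
        reasons.append("no access to the VM's data domain")
    if not vm.networks <= host.networks:
        reasons.append("missing virtual networks or VLANs")
    if host.free_cpus < vm.cpus:
        reasons.append("not enough free CPUs")
    if host.free_ram_mb < vm.ram_mb:
        reasons.append("not enough free RAM")
    return reasons


vm = Vm("db", cpus=4, ram_mb=8192, storage_domain="data1",
        networks=frozenset({"ovirtmgmt", "vlan10"}))
host = Host("host-b", running=True, free_cpus=8, free_ram_mb=4096,
            storage_domains=frozenset({"data1"}),
            networks=frozenset({"ovirtmgmt", "vlan10"}))
print(unmet_conditions(vm, host))  # ['not enough free RAM']
```

Returning the list of unmet conditions, rather than a single boolean, makes it easier to see why a particular host was rejected as a restart target.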