Default HA and DR scenario
--------------------------

In its default deployment, |VOSS Automate| uses off-the-shelf VMware tools to provide High Availability (HA) and Disaster Recovery (DR).

HA is implemented using VMware HA clusters, with data stored on a central
storage area network (SAN). VMware monitors the primary server and,
should it fail, automatically starts another instance of the VM on a
different host. Because the data resides on the shared SAN, the new HA
instance has access to the full dataset.

DR is implemented by streaming data updates to a separate DR instance
that remains powered on. If the primary server fails, the DR instance
can take over operation. The switch-over to the DR instance is scripted,
but must be invoked manually.

During an HA failover, the HA instance assumes the primary IP address,
so no reconfiguration of other UC elements is required. However, because
the DR instance runs with its own IP address, interaction with other UC
elements should be considered in the case of a DR failover:

*  DNS can be used to abstract hostnames from the underlying IP
   addresses. In this case, updating the DNS record allows existing UC
   elements to interact seamlessly with the new DR instance.

*  If DNS is not available and the UC elements cannot be reconfigured
   with the IP address of the DR instance, the DR instance must assume
   the primary IP address. In this case, the DR and primary IP addresses
   can be swapped using the CLI. Standard networking practices should be
   followed to ensure that the IP address is correctly routed (for
   example, by using a stretched layer-2 VLAN) and that the primary and
   DR instances are never operated with the same IP address.

The following failure points should be considered:

*  Because the HA instance is only started once the primary instance
   has failed, a brief interruption in service is expected. The
   interruption includes the VMware polling latency in detecting that
   the primary server has failed and the startup time of the HA
   instance, and lasts around 3 minutes.

*  If data on the SAN is corrupted, the HA instance will start with the
   same corrupted code and data.

*  Because VMware checks only for VM liveness, it cannot detect whether
   the primary instance is functionally active. A VM that is running but
   not serving requests will therefore not trigger an HA failover.

*  Data updates are shipped from the primary instance to the DR
   instance every 3 minutes and/or every 16MB of updates. If the primary
   instance cannot ship data updates, SNMP traps are generated to notify
   administrators of the problem. However, if the problem is not fixed
   in a timely manner, the DR instance can fall out of sync with the
   primary instance, and this delay could result in data loss when
   failing over to the DR instance.

*  Certain manual steps are required to bring the DR instance online.
   These steps are documented in the |Platform Guide|.
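
The following Python sketch illustrates how the shipping thresholds
described above bound the window of updates that exist only on the
primary instance, and therefore the data that could be lost if the
primary instance fails before the next shipment. It is not the
|VOSS Automate| implementation: ``ShippingState``, ``should_ship`` and
``max_unshipped_window`` are hypothetical names, and the sketch assumes
that "every 3 minutes and/or 16MB" means updates are shipped when
whichever threshold is reached first, that 16MB means 16 MiB, and that
shipping succeeds.

.. code-block:: python

   from dataclasses import dataclass

   # Thresholds from the text: "every 3 minutes and/or 16MB".
   # Assumption: shipping happens when whichever threshold is reached first,
   # and 16MB is interpreted as 16 MiB.
   SHIP_INTERVAL_SECONDS = 3 * 60
   SHIP_SIZE_BYTES = 16 * 1024 * 1024


   @dataclass
   class ShippingState:
       """Hypothetical view of the primary instance's unshipped updates."""
       last_ship_time: float  # seconds, e.g. from time.monotonic()
       pending_bytes: int     # size of updates accumulated since the last shipment


   def should_ship(state: ShippingState, now: float) -> bool:
       """Return True once either the time or the size threshold is reached."""
       elapsed = now - state.last_ship_time
       return elapsed >= SHIP_INTERVAL_SECONDS or state.pending_bytes >= SHIP_SIZE_BYTES


   def max_unshipped_window(write_rate_bytes_per_s: float) -> float:
       """Worst-case seconds of updates held only on the primary instance.

       At a steady write rate the size threshold is reached after
       SHIP_SIZE_BYTES / rate seconds; the time threshold caps this at
       SHIP_INTERVAL_SECONDS. Transfer time and shipping failures are ignored.
       """
       if write_rate_bytes_per_s <= 0:
           return float(SHIP_INTERVAL_SECONDS)
       return min(float(SHIP_INTERVAL_SECONDS), SHIP_SIZE_BYTES / write_rate_bytes_per_s)


   # Example: at 1 MiB/s the 16 MiB threshold is reached after 16 seconds.
   print(max_unshipped_window(1024 * 1024))  # 16.0

Under these assumptions, a busy system is bounded by the size threshold
(about 16 seconds of updates at 1 MiB/s), while a quiet system is bounded
by the 3-minute interval. If shipping fails, as described above, the
window keeps growing until the problem is fixed.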


.. |VOSS Automate| replace:: VOSS Automate
.. |Installation Guide| replace:: Installation Guide
.. |Platform Guide| replace:: Platform Guide
