DR Failover and Recovery Scenarios

This section describes a number of failover scenarios and the recovery steps for each. Every scenario assumes a 6-node or 8-node cluster split across two sites: a primary site and a Disaster Recovery (DR) site. For each scenario, the node failure is described and a set of recovery steps is provided.

The following scenarios are covered:

Power off of a node
Loss of a non-primary node in the Primary site
Loss of a non-primary server in the DR site
Loss of the Primary Database Server
Loss of a Primary Site
Loss of a DR Site

For the scenarios below, the following procedures and definitions apply:

In the event of a network failure or a temporary network outage affecting a single node, the node is inaccessible and the cluster responds as if the node had failed. If network connectivity is then restored, no action is required: the node starts communicating with the other nodes in the cluster again, provided no changes were made to that node during the outage window.
In a clustered deployment, the datacentres would typically be two different datacentres, for example “Virginia” and “Seattle”. These can be thought of as a primary site and a DR (Disaster Recovery) site that is used in case of a failure in the primary site. The two datacentres can also exist on the same physical hardware, in which case the datacentre definition is a logical separation of the cluster, for example into two sets of three nodes in a 6-node cluster.
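The datacentre split and the network-outage rule described above can be modelled in a few lines. This is a minimal Python sketch, not a VOSS Automate API: the node names are hypothetical, the datacentre names “Virginia” and “Seattle” are the examples from the text, and the 3 + 3 split assumes a 6-node cluster.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    name: str        # hypothetical node name, for illustration only
    datacentre: str  # datacentre the node was assigned to at installation


@dataclass
class Cluster:
    nodes: list[Node] = field(default_factory=list)

    def nodes_in(self, datacentre: str) -> list[str]:
        """Return the names of the nodes assigned to one datacentre."""
        return [n.name for n in self.nodes if n.datacentre == datacentre]


def no_action_required(changed_during_outage: bool) -> bool:
    """Outage rule from the text: once connectivity returns, no action is
    needed provided the node was not changed during the outage window.
    The text does not say what to do otherwise, so this only reports
    whether the no-action condition holds."""
    return not changed_during_outage


# Hypothetical 6-node cluster split logically into two sets of three nodes.
cluster = Cluster([Node(f"node{i}", "Virginia") for i in range(1, 4)]
                  + [Node(f"node{i}", "Seattle") for i in range(4, 7)])

print(cluster.nodes_in("Virginia"))  # ['node1', 'node2', 'node3']
print(cluster.nodes_in("Seattle"))   # ['node4', 'node5', 'node6']
print(no_action_required(changed_during_outage=False))  # True
```

The datacentre is stored as a plain attribute on each node because, as the text notes, it is a logical grouping that need not match physical location.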

When datacentres are defined during installation, the nodes of a cluster may or may not be in the same physical location. The cluster is designed to communicate across all nodes, regardless of their physical location.
During recovery, the command cluster provision must be run every time a node is deleted from or added to a cluster, even if it is a replacement node. It is recommended that this step be run in a terminal opened with the screen command. See: Using the screen command.
During recovery and installation, the command cluster prepnode must be run on every node.
During recovery of 8-node clusters, the database weights should be deleted and then added again.
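The recovery rules listed above can be collected into a checklist. The following is a minimal Python sketch using a hypothetical helper, recovery_steps, and hypothetical node names. It only gathers which steps apply; the order in the returned list is an assumption for illustration, and the scenario-specific steps remain the authoritative sequence. It deliberately does not spell out the commands for managing database weights.

```python
def recovery_steps(node_names: list[str], membership_changed: bool) -> list[str]:
    """Hypothetical helper that collects the recovery rules listed above.

    node_names: every node in the cluster (hypothetical names).
    membership_changed: True if a node was deleted from or added to the
    cluster, including a replacement node.
    """
    size = len(node_names)
    if size not in (6, 8):
        raise ValueError("these scenarios assume a 6- or 8-node cluster")

    # cluster prepnode must be run on every node.
    steps = [f"{name}: cluster prepnode" for name in node_names]

    # cluster provision is required whenever cluster membership changes;
    # running it inside a screen session is recommended.
    if membership_changed:
        steps.append("in a screen session: cluster provision")

    # 8-node clusters: database weights are deleted and added again.
    if size == 8:
        steps.append("delete the database weights, then add them again")

    return steps


# Usage: replacing a failed node in a hypothetical 8-node cluster.
for step in recovery_steps([f"node{i}" for i in range(1, 9)], membership_changed=True):
    print(step)
```

Passing the full list of node names, rather than just the failed node, reflects the rule that cluster prepnode must be run on every node, not only on the replacement.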

VOSS Automate 21.3