DR Failover and Recovery Scenarios in a Modular Cluster

Several failover scenarios and their recovery steps are described below. Each scenario assumes the same topology: an 8-node cluster spread across two sites, a primary site and a Disaster Recovery (DR) site.

The example is a typical cluster deployment of 8 nodes: 3 database servers, 3 application nodes, and 2 proxy servers.
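The example topology can be written down as a small data model, which makes it easy to see how much capacity each role keeps after a failure. This is a minimal sketch, assuming hypothetical node names (db1, app1, proxy1, and so on); it encodes only the role counts stated above, does not assign nodes to sites, and is not a platform configuration format.

```python
from collections import Counter

# Role counts from the example deployment: 3 database, 3 application, 2 proxy.
# Node names are hypothetical placeholders, not real host names.
TOPOLOGY: dict[str, str] = {
    "db1": "database", "db2": "database", "db3": "database",
    "app1": "application", "app2": "application", "app3": "application",
    "proxy1": "proxy", "proxy2": "proxy",
}


def remaining_by_role(topology: dict[str, str], lost: set[str]) -> dict[str, int]:
    """Count the surviving nodes per role after the nodes in `lost` fail."""
    unknown = lost - set(topology)
    if unknown:
        raise ValueError(f"unknown nodes: {sorted(unknown)}")
    counts = Counter(role for node, role in topology.items() if node not in lost)
    # Counter returns 0 for missing keys, so roles with no survivors show as 0.
    return {role: counts[role] for role in sorted(set(topology.values()))}


# Example: losing one database node and one application node.
# remaining_by_role(TOPOLOGY, {"db1", "app1"})
# → {'application': 2, 'database': 2, 'proxy': 2}
```

A site-level scenario can be modeled the same way by passing every node of the failed site in `lost`.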

For each scenario, the failure is described and a set of recovery steps is provided.

The following scenarios are covered:

Power Off and On of a Node in a Modular Cluster
Loss of an app node: Modular Cluster
Loss of the Primary Database Server in a Modular Cluster
Loss of a non-primary database: Modular Cluster
Loss of a Primary Site in a Modular Cluster
Loss of a DR Site in a Modular Cluster
Loss of Full Cluster in a Modular Cluster

Background

For the scenarios below, the following procedures and definitions apply:

In the event of a network failure or a temporary network outage affecting a single node, the node will be inaccessible and the cluster will respond as if the node had failed. If network connectivity is then restored, no action is required: the node resumes communicating with the other nodes in the cluster, provided no changes were made to that node during the outage window.
In a clustered deployment, the nodes are typically spread across two different data centers, for example “Virginia” and “Seattle”. These can be thought of as a primary site and a DR (Disaster Recovery) site that is used if the primary site fails. The two data centers can also exist on the same physical hardware, in which case the separation is a logical split of the cluster into two sets of nodes.

When data centers are defined during installation, the nodes of a cluster may or may not be in the same physical location. The cluster is designed to communicate across all nodes, regardless of their physical location.
During recovery, the command cluster provision must be run every time a node is deleted from or added to a cluster, even if it is a replacement node. It is recommended that this command be run in a terminal session opened with the screen command, so that provisioning continues if the session is disconnected. See: Using the screen command.
During recovery and installation, the command cluster prepnode must be run on every node.
During recovery of 8-node clusters, the database weights should be deleted and then added again.
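As noted above, a node that cannot be reached over the network looks to the cluster exactly like a failed node, so a useful first diagnostic step is to check which nodes accept connections at all. The following is a minimal sketch using a plain TCP connect; the port (22, assuming SSH access to the nodes) and any host names you pass in are assumptions, and a successful connect shows only network reachability, not that the node's cluster services are healthy.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts and name-resolution failures all land
        # here; from the cluster's point of view they all look like a failed node.
        return False


def unreachable_nodes(hosts: list[str], port: int = 22) -> list[str]:
    """List the hosts that do not accept a TCP connection on `port`.

    Port 22 (SSH) is an assumption; use whichever port the nodes expose.
    """
    return [host for host in hosts if not is_reachable(host, port)]
```

For example, calling unreachable_nodes with the host names of all eight nodes (hypothetical here) lists the nodes the cluster would currently treat as failed. If a node reappears in a later check and nothing was changed on it during the outage, no recovery action is needed.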
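The recovery rules in this section (prepnode on every node, provision after any membership change, database weights re-created on 8-node clusters) can be condensed into a checklist. The sketch below is a hypothetical helper, not part of the platform: it only returns reminder text and runs no commands. The step order it produces is an assumption for illustration; always follow the order given in the procedure for the specific scenario. The exact syntax of the database weight commands is deliberately not reproduced.

```python
def recovery_checklist(nodes: list[str], membership_changed: bool) -> list[str]:
    """Build an ordered list of recovery reminders from the background rules.

    Hypothetical helper: it only returns text and does not run any commands.
    The step order is an illustrative assumption.
    """
    # Rule: 'cluster prepnode' must be run on every node.
    steps = [f"run 'cluster prepnode' on {node}" for node in nodes]
    if len(nodes) == 8:
        # Rule: on 8-node clusters, database weights are deleted and added again.
        steps.append("delete the database weights, then add them again")
    if membership_changed:
        # Rule: provision whenever a node is added or deleted, even a replacement,
        # preferably inside a screen session.
        steps.append("run 'cluster provision' inside a screen session")
    return steps


# Example: recovery after replacing a node in the 8-node example cluster.
# Node names are hypothetical placeholders.
for step in recovery_checklist([f"node{i}" for i in range(1, 9)], membership_changed=True):
    print(step)
```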