Modular Cluster: Election of a New Primary and Failover

When nodes fail, the system follows a failover procedure. For details on the failover and disaster recovery (DR) process, refer to the topics in the Platform Guide.

If the primary database is lost, the remaining database nodes elect a new primary database. Each node in a cluster is allocated a number of votes that are used in this failover election. The election selects the running node with the highest database weight.

The database weight of a node is shown as the priority value in the output of the database config command.

Note

The database weight of a node does not necessarily match its number of votes.
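The selection rule described above can be sketched as follows. This is a minimal illustration, not the product's implementation: the `NODES` table and the `elect_primary` helper are hypothetical names, the weights are taken from the example values in this topic, and the sketch deliberately ignores votes, which are handled separately from weights.

```python
from typing import Optional

# Hypothetical snapshot of the database nodes: name -> (database weight, running?).
# The weights correspond to the priority values reported by database config.
NODES = {
    "N1": (40, True),
    "N2": (30, True),
    "N3": (10, True),
}


def elect_primary(nodes: dict[str, tuple[int, bool]]) -> Optional[str]:
    """Return the running node with the highest weight, or None if none is running."""
    running = {name: weight for name, (weight, up) in nodes.items() if up}
    if not running:
        return None
    return max(running, key=running.get)


# With N1 down, N2 has the highest weight among the running nodes.
after_failure = {**NODES, "N1": (40, False)}
print(elect_primary(after_failure))  # N2
```

The sketch only models the weight comparison. Whether an election can take place at all depends on the votes still available in the cluster.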

For voting on a modular system, arbiters are added to the database nodes on the primary site (N1 and N2). The arbiter of the database node on the secondary site (N3) is unused.

members:
    192.168.100.4:27020:
        priority: 30.0
        stateStr: SECONDARY
        storageEngine: WiredTiger
    192.168.100.4:27030:
        priority: 30.0
        stateStr: ARBITER
        storageEngine: WiredTiger
    192.168.100.6:27020:
        priority: 40.0
        stateStr: PRIMARY
        storageEngine: WiredTiger
    192.168.100.6:27030:
        priority: 40.0
        stateStr: ARBITER
        storageEngine: WiredTiger
    192.168.100.8:27020:
        priority: 10.0
        stateStr: SECONDARY
        storageEngine: WiredTiger
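As an illustration of how to read this listing, the sketch below parses the members text and reports the current primary and the secondary with the highest priority. It assumes the output has exactly the indented layout shown above. `parse_members` is a hypothetical helper written for this example and is not part of the platform.

```python
# Sample text copied from the members listing above.
MEMBERS_OUTPUT = """\
members:
    192.168.100.4:27020:
        priority: 30.0
        stateStr: SECONDARY
        storageEngine: WiredTiger
    192.168.100.4:27030:
        priority: 30.0
        stateStr: ARBITER
        storageEngine: WiredTiger
    192.168.100.6:27020:
        priority: 40.0
        stateStr: PRIMARY
        storageEngine: WiredTiger
    192.168.100.6:27030:
        priority: 40.0
        stateStr: ARBITER
        storageEngine: WiredTiger
    192.168.100.8:27020:
        priority: 10.0
        stateStr: SECONDARY
        storageEngine: WiredTiger
"""


def parse_members(text: str) -> dict[str, dict[str, str]]:
    """Parse the indented listing into {address: {field: value}}."""
    members: dict[str, dict[str, str]] = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped == "members:":
            continue
        if stripped.endswith(":"):
            # A line such as "192.168.100.4:27020:" starts a new member entry.
            current = stripped[:-1]
            members[current] = {}
        elif current is not None:
            key, _, value = stripped.partition(": ")
            members[current][key] = value
    return members


members = parse_members(MEMBERS_OUTPUT)
primary = next(addr for addr, m in members.items() if m["stateStr"] == "PRIMARY")
secondaries = {
    addr: float(m["priority"])
    for addr, m in members.items()
    if m["stateStr"] == "SECONDARY"
}
next_in_line = max(secondaries, key=secondaries.get)

print(primary)       # 192.168.100.6:27020
print(next_in_line)  # 192.168.100.4:27020
```

In this sample, the node at 192.168.100.6 is the primary, and the secondary at 192.168.100.4 has the highest priority among the remaining database members.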

The total number of votes in a cluster should not exceed 5. Arbiter votes are added to nodes to bring the total to 5 votes.

The tables below show the system status and failover behavior for a selection of scenarios in a 6-node cluster. Also refer to the topics on the specific DR scenarios. The abbreviations used are as follows:

  • Pri : Primary site

  • DR : DR site

  • N : node. Primary node is N1, secondary node is N2.

  • w : database weight

  • v : vote

  • a : arbiter vote

For example, for a 6-node cluster with 3 database nodes and 2 sites, the initial votes per node are as follows:

  • Primary site database nodes (N1, N2): 2 votes each (1 + 1 arbiter vote)

  • Secondary site database node (N3): 1 vote (no arbiter vote)

Pri N1 w:40 v:1 a:1  Pri N2 w:30 v:1 a:1  DR N3 w:10 v:1  Votes  System Status under scenario
-------------------  -------------------  --------------  -----  ----------------------------
Up                   Up                   Up              5      System is functioning normally.
Up                   Up                   Down            4      Scenario: Loss of a Non-primary Server in the DR Site. System continues functioning normally.
Up                   Up                   Up              5      Scenario: Loss of an Application Server. System continues functioning normally. Some transactions may hang.
Up                   Down                 Up              3      Scenario: Loss of a Non-primary Node in the Primary Site. System continues functioning normally.

Pri N1 w:40 v:1 a:1  Pri N2 w:30 v:1 a:1  DR N3 w:10 v:1  Votes  System Status under scenario
-------------------  -------------------  --------------  -----  ----------------------------
Down                 Up                   Up              3      Scenario: Loss of the Primary Database Server. Some downtime occurs. System automatically fails over to N2.
Down                 Down                 Up              1      Scenario: Loss of a Primary Site. Manual recovery required.
Up                   Down                 Down            2      Scenario: Loss of all secondary nodes. Manual recovery required.
Down                 Up                   Down            2      Scenario: Loss of all Primary and DR nodes. Manual recovery required.
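The vote totals in the tables above can be reproduced with a short calculation. The sketch below is illustrative only: `VOTES`, `remaining_votes`, and `has_majority` are hypothetical names, and the majority check is an assumption that a strict majority of the 5 votes is needed for the cluster to keep or elect a primary automatically. That assumption matches the tables, where scenarios with 3 or more votes continue or fail over, and scenarios with 2 or fewer votes require manual recovery.

```python
# Votes per database node: the node's own vote plus its arbiter vote, if any.
# N1 and N2 (primary site) each have an arbiter vote; N3 (DR site) does not.
VOTES = {"N1": 1 + 1, "N2": 1 + 1, "N3": 1}
TOTAL_VOTES = sum(VOTES.values())  # 5


def remaining_votes(up: set[str]) -> int:
    """Sum the votes of the database nodes that are still running."""
    return sum(votes for node, votes in VOTES.items() if node in up)


def has_majority(up: set[str]) -> bool:
    """Assumption: a strict majority of all votes is needed for automatic failover."""
    return remaining_votes(up) > TOTAL_VOTES // 2


# One entry per distinct Up/Down combination in the tables.
scenarios = [
    {"N1", "N2", "N3"},  # all nodes up
    {"N1", "N2"},        # N3 down
    {"N1", "N3"},        # N2 down
    {"N2", "N3"},        # N1 down
    {"N3"},              # N1 and N2 down
    {"N1"},              # N2 and N3 down
    {"N2"},              # N1 and N3 down
]

for up in scenarios:
    status = "automatic" if has_majority(up) else "manual recovery"
    print(sorted(up), remaining_votes(up), status)
```

Because N3 holds a single vote, the loss of both primary site nodes leaves only 1 of 5 votes, which is why the loss of the primary site requires manual recovery.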