Election of a New Primary and Failover

When unified nodes fail, the system follows a failover procedure. For details on the failover and DR process, refer to the topics in the Platform Guide.

If the primary database is lost, the remaining database nodes elect a new primary database. Each node in a cluster is allocated a number of votes that are used in this failover election. The node elected is the running node with the highest database weight.

The database weight of a node is shown as the priority value in the output of the database config command. Note that the database weight of a node does not necessarily match its number of votes.

$ database config
    date: 2016-04-25T09:50:34Z
    members:
        172.29.21.101:27020:
            priority: 16
            stateStr: PRIMARY
        172.29.21.101:27030:
            stateStr: ARBITER
        172.29.21.102:27020:
            priority: 8
            stateStr: SECONDARY
        172.29.21.102:27030:
            stateStr: ARBITER
        172.29.21.103:27020:
            priority: 4
            stateStr: SECONDARY
        172.29.21.103:27030:
            stateStr: ARBITER
        172.29.21.104:27020:
            priority: 2
            stateStr: SECONDARY
    myState: 1
    ok: 1
    set: DEVICEAPI
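
As a minimal sketch of how the priority values in this output relate to the election, the Python below reads the members block and picks the running data-bearing member with the highest priority. The names parse_members and election_candidate are hypothetical and not part of the platform CLI, and the sketch considers weights only: it ignores the vote count that the election also depends on.

```python
SAMPLE = """$ database config
    date: 2016-04-25T09:50:34Z
    members:
        172.29.21.101:27020:
            priority: 16
            stateStr: PRIMARY
        172.29.21.101:27030:
            stateStr: ARBITER
        172.29.21.102:27020:
            priority: 8
            stateStr: SECONDARY
        172.29.21.102:27030:
            stateStr: ARBITER
        172.29.21.103:27020:
            priority: 4
            stateStr: SECONDARY
        172.29.21.103:27030:
            stateStr: ARBITER
        172.29.21.104:27020:
            priority: 2
            stateStr: SECONDARY
    myState: 1
    ok: 1
    set: DEVICEAPI
"""


def parse_members(output):
    """Return {member: {field: value}} from the indented 'members:' block."""
    members = {}
    current = None
    in_members = False
    members_indent = 0
    for line in output.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        indent = len(line) - len(line.lstrip())
        if stripped == "members:":
            in_members = True
            members_indent = indent
            continue
        if not in_members:
            continue
        if indent <= members_indent:
            # Back at the outer level (myState, ok, set): the block has ended.
            in_members = False
            continue
        if stripped.endswith(":"):
            # A member line such as "172.29.21.101:27020:".
            current = stripped[:-1]
            members[current] = {}
        elif current is not None:
            key, _, value = stripped.partition(": ")
            members[current][key] = value
    return members


def election_candidate(members, down=()):
    """Highest-priority member that is not down; arbiters have no priority."""
    running = {
        name: int(info["priority"])
        for name, info in members.items()
        if "priority" in info and name not in down
    }
    return max(running, key=running.get) if running else None


members = parse_members(SAMPLE)
print(election_candidate(members))
print(election_candidate(members, down={"172.29.21.101:27020"}))
```

With the sample output, the first call selects 172.29.21.101:27020 (priority 16) and the second, with that member treated as down, selects 172.29.21.102:27020 (priority 8).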

The number of votes in a cluster should not exceed 7. Arbiter votes are added to nodes to bring the total to 7 votes.

The tables below show the system status and failover for a selection of scenarios for 6 node and 8 node clusters. Also refer to the topics on the specific DR scenarios. The abbreviations used are as follows:

  • Pri : Primary site
  • DR : DR site
  • N : node. Primary node is N1, secondary node is N2.
  • w : database weight
  • v : vote
  • a : arbiter vote

Not all scenarios are listed for 8 node clusters, and the weights shown are examples.

  • For a 6 node cluster with 4 database nodes and 2 sites, initial votes are as follows:

    Primary database node and secondary database nodes 2-3: 2 votes each (1 + 1 arbiter)
    Secondary database node 4: 1 vote (no arbiter)

Pri N1        Pri N2        DR N3         DR N4
w:40 v:1 a:1  w:30 v:1 a:1  w:20 v:1 a:1  w:10 v:1  Votes  System Status under scenario
------------  ------------  ------------  --------  -----  ----------------------------
Up            Up            Up            Up        7      System is functioning normally.
Up            Up            Up            Down      6      Scenario: Loss of a Non-primary Server in the DR Site. System continues functioning normally.
Up            Up            Down          Up        6      Scenario: Loss of a Non-primary Server in the DR Site. System continues functioning normally.
Up            Down          Up            Up        6      Scenario: Loss of a Non-primary Node in the Primary Site. System continues functioning normally.
Down          Up            Up            Up        5      Scenario: Loss of the Primary Database Server. Some downtime occurs. System automatically fails over to N2.
Down          Down          Up            Up        3      Scenario: Loss of a Primary Site. Manual recovery required
Up            Up            Down          Down      4      System continues functioning normally.
Up            Down          Down          Up        3      Manual recovery required
Up            Down          Up            Down      4      System continues functioning normally.
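
The relationship between surviving votes and system status for the 6 node cluster can be sketched as follows. The rule used here, that more than half of the 7 votes must survive for the system to continue or fail over automatically, is an assumption inferred from the scenarios above (4 votes continue, 3 votes need manual recovery) rather than a documented rule. VOTES, WEIGHTS and assess are hypothetical names.

```python
# Initial votes per node (vote + arbiter vote) and the example weights
# for the 6 node cluster described above.
VOTES = {"N1": 2, "N2": 2, "N3": 2, "N4": 1}
WEIGHTS = {"N1": 40, "N2": 30, "N3": 20, "N4": 10}


def assess(down):
    """Return (surviving votes, status) when the nodes in `down` have failed."""
    up = [node for node in VOTES if node not in down]
    votes = sum(VOTES[node] for node in up)
    total = sum(VOTES.values())
    if votes * 2 <= total:
        # Assumed rule: without a strict majority no election can succeed.
        return votes, "manual recovery required"
    if "N1" in down:
        # The running node with the highest database weight takes over.
        new_primary = max(up, key=WEIGHTS.get)
        return votes, "automatic failover to " + new_primary
    return votes, "continues functioning normally"


print(assess({"N1"}))
print(assess({"N1", "N2"}))
print(assess({"N3", "N4"}))
```

Under these assumptions, losing N1 leaves 5 votes and fails over to N2, losing the whole primary site (N1 and N2) leaves 3 votes and needs manual recovery, and losing the DR site (N3 and N4) leaves 4 votes and the system continues.
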
  • For an 8 node cluster with 6 database nodes and 2 sites, initial votes are as follows:

    Primary database node: 2 (1 + 1 arbiter voting member)
    Secondary database nodes total: 5 (no arbiter votes)

    The table here shows a representative selection of scenarios.

Pri N1        Pri N2    Pri N3    Pri N4    DR N5     DR N6
w:60 v:1 a:1  w:50 v:1  w:40 v:1  w:30 v:1  w:20 v:1  w:10 v:1  Votes  System Status under scenario
------------  --------  --------  --------  --------  --------  -----  ----------------------------
Up            Up        Up        Up        Up        Up        7      System is functioning normally.
Up            Up        Up        Down      Down      Down      4      Scenarios: Loss of a Non-primary Node in the Primary and Secondary Site. System continues functioning normally.
Up            Up        Up        Up        Down      Up        6      Scenario: Loss of a Non-primary Server in the DR Site. System continues functioning normally.
Up            Down      Up        Up        Up        Up        6      Scenario: Loss of a Non-primary Node in the Primary Site. System continues functioning normally.
Up            Down      Down      Up        Up        Up        6      Scenario: Loss of a Non-primary Node in the Primary Site. System continues functioning normally.
Down          Up        Up        Up        Up        Up        6      Scenario: Loss of the Primary Database Server. Some downtime occurs. System automatically fails over to N2.
Down          Down      Up        Up        Up        Up        4      Some downtime occurs. System automatically fails over to N3.
Down          Down      Down      Up        Up        Up        3      Manual recovery required
Down          Down      Down      Down      Up        Up        2      Scenario: Loss of a Primary Site. Manual recovery required
Up            Up        Down      Up        Up        Up        6      Scenario: Loss of a Non-primary Node in the Primary Site. System continues functioning normally.
Up            Up        Down      Down      Up        Up        5      Scenario: Loss of a Non-primary Node in the Primary Site. System continues functioning normally.
Up            Up        Down      Down      Down      Up        4      Scenarios: Loss of a Non-primary Node in the Primary and Secondary Site. System continues functioning normally.
Up            Up        Down      Down      Down      Down      3      Manual recovery required
Up            Down      Up        Down      Down      Down      3      Manual recovery required

As the representative table above shows, the 8 node status and scenarios are the same for a number of permutations of nodes. For example, the failure of a single node N2, N3 or N4 results in the same scenario:

  • Scenario: Loss of a Non-primary Node in the Primary Site. System continues functioning normally.

Upon recovery, transaction processing typically resumes after a delay of 10-20 minutes.

The list below shows individual nodes (N1 to N6) and groups of nodes whose failure results in the same failover scenario. Each bullet is one such group.

  • N2, N3, N4
  • N5, N6
  • N2+N3, N2+N4, N3+N4
  • N1+N2+N3, N1+N2+N4, N1+N3+N4
  • N1+N5, N1+N6
  • N2+N5, N2+N6, N3+N5, N3+N6, N4+N5, N4+N6
  • N2+N3+N4
  • N2+N3+N5, N2+N3+N6, N2+N4+N5, N2+N4+N6, N3+N4+N5, N3+N4+N6
  • N5+N6

A failure in other groupings requires a manual recovery, for example in groups such as:

  • N1+N2+N3, N1+N2+N4, N1+N2+N5, N1+N2+N6, N1+N3+N4, N1+N3+N5, N1+N3+N6, N1+N4+N5, N1+N4+N6, N1+N5+N6
  • N2+N3+N4+N5, N2+N3+N4+N6, N3+N4+N5+N6
  • N1+N2+N3+N4, N1+N2+N3+N5, N1+N2+N3+N6, N1+N3+N4+N5, N1+N3+N4+N6, N1+N4+N5+N6
  • N1+N2+N3+N4+N5, N1+N2+N3+N4+N6
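
The manual recovery groupings for the 8 node cluster can be enumerated with a short sketch. It assumes, as an inference from the scenario table and not as a documented rule, that manual recovery is needed whenever the surviving nodes hold no more than half of the 7 votes. VOTES, needs_manual_recovery and manual_groups are hypothetical names.

```python
from itertools import combinations

# Initial votes for the 8 node cluster: N1 has 1 vote + 1 arbiter vote,
# each of the other five database nodes has 1 vote.
VOTES = {"N1": 2, "N2": 1, "N3": 1, "N4": 1, "N5": 1, "N6": 1}


def needs_manual_recovery(down):
    """Assumed rule: the surviving votes are not a strict majority."""
    remaining = sum(votes for node, votes in VOTES.items() if node not in down)
    return remaining * 2 <= sum(VOTES.values())


def manual_groups(size):
    """All failure groups of `size` nodes that would need manual recovery."""
    return [
        "+".join(group)
        for group in combinations(VOTES, size)
        if needs_manual_recovery(group)
    ]


for size in range(1, 5):
    print(size, manual_groups(size))
```

Under this assumption no single node or pair of nodes requires manual recovery, the three-node groups that do are the ten that include N1, and every group of four nodes does.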