.. _failover-elect-new-primary:

Election of a New Primary and Failover
--------------------------------------

.. index:: database;database config

If unified nodes fail, the system follows a failover procedure.
For details on the failover and DR process, refer to the topics in the Platform Guide.

If the primary database is lost, the remaining database nodes elect a new
primary database. Each node in a cluster is allocated a number of votes that
are used in this failover election. The node elected as the new primary is
the running node with the highest database weight.

The database weight of a node is shown as the ``priority`` value in the output
of the **database config** command. Note that the database weight of a node
does not necessarily match its number of votes.

 ::

      $ database config
          date: 2016-04-25T09:50:34Z
          members:
              172.29.21.101:27020:
                  priority: 16
                  stateStr: PRIMARY
              172.29.21.101:27030:
                  stateStr: ARBITER
              172.29.21.102:27020:
                  priority: 8
                  stateStr: SECONDARY
              172.29.21.102:27030:
                  stateStr: ARBITER
              172.29.21.103:27020:
                  priority: 4
                  stateStr: SECONDARY
              172.29.21.103:27030:
                  stateStr: ARBITER
              172.29.21.104:27020:
                  priority: 2
                  stateStr: SECONDARY
          myState: 1
          ok: 1
          set: DEVICEAPI
      
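The election rule described above (the running node with the highest database
weight is elected) can be illustrated with a minimal Python sketch. The
``members`` dictionary is a hand-built, hypothetical representation of the
sample output, and ``expected_new_primary`` is an illustrative helper, not a
platform command or API. The sketch only ranks running data members by
``priority``; it does not check votes.

.. code-block:: python

   # Hypothetical, hand-built copy of the sample "database config" output.
   # Arbiter members have no priority and cannot become primary.
   members = {
       "172.29.21.101:27020": {"priority": 16, "stateStr": "PRIMARY"},
       "172.29.21.101:27030": {"stateStr": "ARBITER"},
       "172.29.21.102:27020": {"priority": 8, "stateStr": "SECONDARY"},
       "172.29.21.102:27030": {"stateStr": "ARBITER"},
       "172.29.21.103:27020": {"priority": 4, "stateStr": "SECONDARY"},
       "172.29.21.103:27030": {"stateStr": "ARBITER"},
       "172.29.21.104:27020": {"priority": 2, "stateStr": "SECONDARY"},
   }


   def expected_new_primary(members, failed):
       """Return the running data member with the highest priority, or None."""
       candidates = {
           address: info["priority"]
           for address, info in members.items()
           if "priority" in info and address not in failed
       }
       if not candidates:
           return None
       return max(candidates, key=candidates.get)


   # Simulate the loss of the current primary.
   print(expected_new_primary(members, failed={"172.29.21.101:27020"}))
   # prints 172.29.21.102:27020

With the current primary removed, the member with priority 8 is the
highest-weighted running node, so it is the expected new primary.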

The total number of votes in a cluster should not exceed 7. Arbiter
votes are added to nodes to bring the total to 7 votes.

The tables below show the system status and failover for a selection of scenarios
for 6 node and 8 node clusters. Also refer to the topics on the specific DR scenarios.
The abbreviations used are as follows:

* Pri : Primary site
* DR : DR site
* N : node. Primary node is N1, secondary node is N2.
* w : database weight
* v : vote
* a : arbiter vote

Not all scenarios are listed for 8 node clusters, and
the weights shown are examples.

* For a 6 node cluster with 4 database nodes and 2 sites, initial votes are as follows:

  * Primary database node and nodes 2-3: 2 each (1 + 1 arbiter)
  * Secondary database node 4: 1 (no arbiter)



.. tabularcolumns:: |p{1cm}|p{1cm}|p{1cm}|p{1cm}|p{1cm}|p{10cm}|


+---------------------+---------------------+--------------------+----------------+-------+-------------------------------------------------------------------------------------------------------------+
| Pri N1 w:40 v:1 a:1 | Pri N2 w:30 v:1 a:1 | DR N3 w:20 v:1 a:1 | DR N4 w:10 v:1 | Votes | System Status under scenario                                                                                |
+=====================+=====================+====================+================+=======+=============================================================================================================+
| Up                  | Up                  | Up                 | Up             | 7     | System is functioning normally.                                                                             |
+---------------------+---------------------+--------------------+----------------+-------+-------------------------------------------------------------------------------------------------------------+
| Up                  | Up                  | Up                 | Down           | 6     | Scenario: Loss of a Non-primary Server in the DR Site. System continues functioning normally.               |
+---------------------+---------------------+--------------------+----------------+-------+-------------------------------------------------------------------------------------------------------------+
| Up                  | Up                  | Down               | Up             | 5     | Scenario: Loss of a Non-primary Server in the DR Site. System continues functioning normally.               |
+---------------------+---------------------+--------------------+----------------+-------+-------------------------------------------------------------------------------------------------------------+
| Up                  | Down                | Up                 | Up             | 5     | Scenario: Loss of a Non-primary Node in the Primary Site. System continues functioning normally.            |
+---------------------+---------------------+--------------------+----------------+-------+-------------------------------------------------------------------------------------------------------------+
| Down                | Up                  | Up                 | Up             | 5     | Scenario: Loss of the Primary Database Server. Some downtime occurs. System automatically fails over to N2. |
+---------------------+---------------------+--------------------+----------------+-------+-------------------------------------------------------------------------------------------------------------+
| Down                | Down                | Up                 | Up             | 3     | Scenario: Loss of a Primary Site. Manual recovery required                                                  |
+---------------------+---------------------+--------------------+----------------+-------+-------------------------------------------------------------------------------------------------------------+
| Up                  | Up                  | Down               | Down           | 4     | System continues functioning normally.                                                                      |
+---------------------+---------------------+--------------------+----------------+-------+-------------------------------------------------------------------------------------------------------------+
| Up                  | Down                | Down               | Up             | 3     | Manual recovery required                                                                                    |
+---------------------+---------------------+--------------------+----------------+-------+-------------------------------------------------------------------------------------------------------------+
| Up                  | Down                | Up                 | Down           | 4     | System continues functioning normally.                                                                      |
+---------------------+---------------------+--------------------+----------------+-------+-------------------------------------------------------------------------------------------------------------+

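The vote arithmetic for the 6 node example can be sketched in Python as
follows. The sketch assumes, as the scenarios suggest but the text does not
state outright, that the system only recovers without manual intervention
while a strict majority of the 7 votes (4 or more) remains. The names
``VOTES``, ``WEIGHTS`` and ``status`` are illustrative only.

.. code-block:: python

   # Votes per node in the 6 node example: own vote plus arbiter vote, if any.
   VOTES = {"N1": 2, "N2": 2, "N3": 2, "N4": 1}
   # Example database weights from the table header.
   WEIGHTS = {"N1": 40, "N2": 30, "N3": 20, "N4": 10}


   def status(down):
       """Return (remaining votes, expected status) for a set of failed nodes."""
       total = sum(VOTES.values())
       votes = sum(v for node, v in VOTES.items() if node not in down)
       # Assumption: a strict majority of all votes is needed to carry on.
       if 2 * votes <= total:
           return votes, "manual recovery required"
       if "N1" in down:
           # The running node with the highest weight becomes the new primary.
           running = [node for node in WEIGHTS if node not in down]
           new_primary = max(running, key=WEIGHTS.get)
           return votes, f"automatic failover to {new_primary}"
       return votes, "continues functioning normally"


   for down in [set(), {"N1"}, {"N1", "N2"}, {"N3", "N4"}]:
       print(sorted(down), status(down))

For example, losing N1 leaves 5 votes and the highest-weighted running node
is N2, while losing both N1 and N2 leaves only 3 votes, which is below the
assumed majority.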

* For an 8 node cluster with 6 database nodes and 2 sites, initial votes are as follows:

  * Primary database node: 2 (1 + 1 arbiter vote)
  * Secondary database nodes: 5 in total (1 each, no arbiter votes)

  The table below shows a representative selection of scenarios.

.. tabularcolumns:: |p{1cm}|p{1cm}|p{1cm}|p{1cm}|p{1cm}|p{1cm}|p{1cm}|p{7cm}|


+---------------------+-----------------+-----------------+-----------------+----------------+----------------+-------+-----------------------------------------------------------------------------------------------------------------+
| Pri N1 w:60 v:1 a:1 | Pri N2 w:50 v:1 | Pri N3 w:40 v:1 | Pri N4 w:30 v:1 | DR N5 w:20 v:1 | DR N6 w:10 v:1 | Votes | System Status under scenario                                                                                    |
+=====================+=================+=================+=================+================+================+=======+=================================================================================================================+
| Up                  | Up              | Up              | Up              | Up             | Up             | 7     | System is functioning normally.                                                                                 |
+---------------------+-----------------+-----------------+-----------------+----------------+----------------+-------+-----------------------------------------------------------------------------------------------------------------+
| Up                  | Up              | Up              | Down            | Down           | Down           | 4     | Scenarios: Loss of a Non-primary Node in the Primary and Secondary Site. System continues functioning normally. |
+---------------------+-----------------+-----------------+-----------------+----------------+----------------+-------+-----------------------------------------------------------------------------------------------------------------+
| Up                  | Up              | Up              | Up              | Down           | Up             | 6     | Scenario: Loss of a Non-primary Server in the DR Site. System continues functioning normally.                   |
+---------------------+-----------------+-----------------+-----------------+----------------+----------------+-------+-----------------------------------------------------------------------------------------------------------------+
| Up                  | Down            | Up              | Up              | Up             | Up             | 6     | Scenario: Loss of a Non-primary Node in the Primary Site. System continues functioning normally.                |
+---------------------+-----------------+-----------------+-----------------+----------------+----------------+-------+-----------------------------------------------------------------------------------------------------------------+
| Up                  | Down            | Down            | Up              | Up             | Up             | 5     | Scenario: Loss of a Non-primary Node in the Primary Site. System continues functioning normally.                |
+---------------------+-----------------+-----------------+-----------------+----------------+----------------+-------+-----------------------------------------------------------------------------------------------------------------+
| Down                | Up              | Up              | Up              | Up             | Up             | 5     | Scenario: Loss of the Primary Database Server. Some downtime occurs. System automatically fails over to N2.     |
+---------------------+-----------------+-----------------+-----------------+----------------+----------------+-------+-----------------------------------------------------------------------------------------------------------------+
| Down                | Down            | Up              | Up              | Up             | Up             | 4     | Some downtime occurs. System automatically fails over to N3.                                                    |
+---------------------+-----------------+-----------------+-----------------+----------------+----------------+-------+-----------------------------------------------------------------------------------------------------------------+
| Down                | Down            | Down            | Up              | Up             | Up             | 3     | Manual recovery required                                                                                        |
+---------------------+-----------------+-----------------+-----------------+----------------+----------------+-------+-----------------------------------------------------------------------------------------------------------------+
| Down                | Down            | Down            | Down            | Up             | Up             | 2     | Scenario: Loss of a Primary Site. Manual recovery required                                                      |
+---------------------+-----------------+-----------------+-----------------+----------------+----------------+-------+-----------------------------------------------------------------------------------------------------------------+
| Up                  | Up              | Down            | Up              | Up             | Up             | 6     | Scenario: Loss of a Non-primary Node in the Primary Site. System continues functioning normally.                |
+---------------------+-----------------+-----------------+-----------------+----------------+----------------+-------+-----------------------------------------------------------------------------------------------------------------+
| Up                  | Up              | Down            | Down            | Up             | Up             | 5     | Scenario: Loss of a Non-primary Node in the Primary Site. System continues functioning normally.                |
+---------------------+-----------------+-----------------+-----------------+----------------+----------------+-------+-----------------------------------------------------------------------------------------------------------------+
| Up                  | Up              | Down            | Down            | Down           | Up             | 4     | Scenarios: Loss of a Non-primary Node in the Primary and Secondary Site. System continues functioning normally. |
+---------------------+-----------------+-----------------+-----------------+----------------+----------------+-------+-----------------------------------------------------------------------------------------------------------------+
| Up                  | Up              | Down            | Down            | Down           | Down           | 3     | Manual recovery required                                                                                        |
+---------------------+-----------------+-----------------+-----------------+----------------+----------------+-------+-----------------------------------------------------------------------------------------------------------------+
| Up                  | Down            | Up              | Down            | Down           | Down           | 3     | Manual recovery required                                                                                        |
+---------------------+-----------------+-----------------+-----------------+----------------+----------------+-------+-----------------------------------------------------------------------------------------------------------------+

As the representative table above shows, the 8 node status and scenarios are
the same for a number of permutations of failed nodes. For example, the failure
of a single node N2, N3 or N4 results in the same scenario:

* Scenario: Loss of a Non-primary Node in the Primary Site.  
  System continues functioning normally.

Upon recovery, there is typically a delay of 10-20 minutes before transaction
processing resumes.

The list below shows individual nodes (N1 to N6) and groups of nodes whose
failure results in the same failover scenario:

* N2, N3, N4
* N5, N6
* N2+N3, N2+N4, N3+N4
* N1+N5, N1+N6
* N2+N5, N2+N6, N3+N5, N3+N6, N4+N5, N4+N6
* N2+N3+N4
* N2+N3+N5, N2+N3+N6, N2+N4+N5, N2+N4+N6, N3+N4+N5, N3+N4+N6 
* N5+N6

Failures of other groups of nodes require manual recovery, for example:

* N1+N2+N3, N1+N2+N4, N1+N2+N5, N1+N2+N6, N1+N3+N4, N1+N3+N5, N1+N3+N6, N1+N4+N5, N1+N4+N6, N1+N5+N6
* N2+N3+N4+N5, N2+N3+N4+N6, N3+N4+N5+N6
* N1+N2+N3+N4, N1+N2+N3+N5, N1+N2+N3+N6, N1+N3+N4+N5, N1+N3+N4+N6, N1+N4+N5+N6
* N1+N2+N3+N4+N5, N1+N2+N3+N4+N6


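The groupings for the 8 node example can be enumerated with a short Python
sketch. As an assumption inferred from the scenarios, a group of failed nodes
is taken to require manual recovery when the remaining votes are no longer a
strict majority of the 7 votes. The names ``VOTES``, ``needs_manual_recovery``
and ``failure_groups`` are illustrative only.

.. code-block:: python

   from itertools import combinations

   # Votes per node in the 8 node example: N1 also carries the arbiter vote.
   VOTES = {"N1": 2, "N2": 1, "N3": 1, "N4": 1, "N5": 1, "N6": 1}
   TOTAL = sum(VOTES.values())


   def needs_manual_recovery(failed):
       """True if the remaining votes are not a strict majority (assumption)."""
       remaining = TOTAL - sum(VOTES[node] for node in failed)
       return 2 * remaining <= TOTAL


   def failure_groups(size):
       """Split all failure groups of a given size into (automatic, manual).

       "automatic" here means no manual recovery is needed under the
       majority assumption.
       """
       automatic, manual = [], []
       for group in combinations(VOTES, size):
           name = "+".join(group)
           if needs_manual_recovery(group):
               manual.append(name)
           else:
               automatic.append(name)
       return automatic, manual


   for size in range(1, 5):
       automatic, manual = failure_groups(size)
       print(size, "automatic:", len(automatic), "manual:", len(manual))

Under this assumption, every three-node group that includes N1 leaves only
3 votes and needs manual recovery, whereas three-node groups without N1
leave 4 votes.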

