DR Failover and Recovery in a 2 Node Cluster
Important
A 2 node cluster will not fail over automatically.
With only two Unified nodes, with or without Web proxies, there is no High Availability. The database on the primary node is read/write, while the database on the secondary node is read-only.
Only redundancy is available.
If the primary node fails, the failed node must be manually deleted from the cluster on the secondary node, followed by a cluster provision (a minimal command sketch follows this note).
If the secondary node fails, it needs to be replaced.
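For orientation, a minimal sketch of that manual failover, using only the platform commands covered in the recovery steps below and assuming the lost primary's example IP address of 172.29.248.100; run on the surviving secondary node:

    cluster del 172.29.248.100    # remove the lost primary from the cluster
    cluster provision             # reprovision the remaining node
    cluster status                # confirm only the secondary node is listed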
Scenario: Loss of Primary Node
The administrator deployed the 2-node cluster.
$ cluster status

Data Centre: jhb
    application : AS01[172.29.42.100]
                  AS02[172.29.42.101]

    webproxy : AS01[172.29.42.100]
               AS02[172.29.42.101]

    database : AS01[172.29.42.100]
               AS02[172.29.42.101]
Example database weights:
$ database weight list
    172.29.42.100:
        weight: 20
    172.29.42.101:
        weight: 10
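The weight determines which node's database is preferred as primary: the higher-weighted node (20 here) holds the read/write copy and the lower-weighted node (10) the read-only copy. As a hedged sketch, weights of this kind are typically assigned with a command of the following form; confirm the exact syntax against the platform CLI reference:

    database weight add 172.29.42.100 20    # preferred primary (read/write) database
    database weight add 172.29.42.101 10    # secondary (read-only) database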
Node failure: the primary node is lost on the primary site:
$ cluster status

Data Centre: unknown
    application : unknown_172.29.248.100[172.29.248.100] (not responding)

    webproxy : unknown_172.29.248.100[172.29.248.100] (not responding)

    database : unknown_172.29.248.100[172.29.248.100] (not responding)

Data Centre: jhb
    application : AS02[172.29.248.101]

    webproxy : AS02[172.29.248.101]

    database : AS02[172.29.248.101]
Recovery Steps
The primary node server is lost.
It is decided to fail over to the secondary node:
On the secondary node, remove the lost server from the cluster:
cluster del 172.29.248.100
On the secondary node, run cluster provision (it is recommended that this step is run in a terminal opened with the screen command; a brief example follows these steps). See: Using the screen command.
On the secondary node, check:
$ cluster status

Data Centre: jhb
    application : AS02[172.29.248.101]

    webproxy : AS02[172.29.248.101]

    database : AS02[172.29.248.101]
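Since cluster provision can run for a long time and should survive a dropped SSH session, steps that call it are best run inside a screen session. A minimal sketch using standard GNU screen (the exact invocation on the platform may differ; see the referenced Using the screen command topic):

    screen -S provision      # open a named screen session
    cluster provision        # run the provision inside the session
    # detach with Ctrl-a d; reattach later with: screen -r provision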
It is decided to recover the primary node by installing a replacement server:
On the secondary node, remove the lost server from the cluster:
cluster del 172.29.248.100
On the secondary node, run cluster provision (it is recommended that this step is run in a terminal opened with the screen command).
On the secondary node, check:
$ cluster status

Data Centre: jhb
    application : AS02[172.29.248.101]

    webproxy : AS02[172.29.248.101]

    database : AS02[172.29.248.101]
Switch on the newly installed server.
On the secondary node, add the server. Run cluster add 172.29.42.100.
On either node, check:
$ cluster status

Data Centre: jhb
    application : AS01[172.29.42.100]
                  AS02[172.29.42.101]

    webproxy : AS01[172.29.42.100]
               AS02[172.29.42.101]

    database : AS01[172.29.42.100]
               AS02[172.29.42.101]
Configure the primary database. On the newly installed server, run cluster provision primary 172.29.42.100 (it is recommended that this step is run in a terminal opened with the screen command).
Check database configuration on both nodes, for example:
$ database config
    date:
        $date: 1549450382862
    heartbeatIntervalMillis: 2000
    members:
        172.29.42.100:27020:
            priority: 20.0
            stateStr: PRIMARY
            storageEngine: WiredTiger
        172.29.42.100:27030:
            priority: 1.0
            stateStr: ARBITER
            storageEngine: Unknown
        172.29.42.101:27020:
            priority: 10.0
            stateStr: SECONDARY
            storageEngine: WiredTiger
    myState: 1
    ok: 1.0
    set: DEVICEAPI
    term: 8
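As an interpretation aid for the output above (a reading of this example, not additional required commands), the values to confirm on both nodes are:

    database config
    # Expect, as in the example output:
    #   172.29.42.100:27020 - priority 20.0, stateStr PRIMARY   (recovered primary, read/write)
    #   172.29.42.100:27030 - stateStr ARBITER                  (arbiter on the primary node)
    #   172.29.42.101:27020 - priority 10.0, stateStr SECONDARY (secondary, read-only)
    #   ok: 1.0 and exactly one PRIMARY member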