DR Failover and Recovery in a 2-Node Cluster


A 2-node cluster does not fail over automatically.

With only two Unified nodes, with or without Web Proxy nodes, there is no High Availability. The database on the primary node is read/write, while the database on the secondary node is read-only.

Only redundancy is available:

  • If the primary node fails, the primary node must be deleted manually from the cluster on the secondary node, and a cluster provision must then be run.

  • If the secondary node fails, it must be replaced.

Scenario: Loss of Primary Node

  • The administrator has deployed the 2-node cluster:

    $ cluster status
    Data Centre: jhb
                 application : AS01[]
                 webproxy :    AS01[]
                 database :    AS01[]

    Example database weights:

    $ database weight list
             weight: 20
             weight: 10
  • Node failure: if the primary node on the primary site is lost, cluster status shows:

    $ cluster status
    Data Centre: unknown
                 application : unknown_172.29.248.100[] (not responding)
                 webproxy : unknown_172.29.248.100[] (not responding)
                 database : unknown_172.29.248.100[] (not responding)
    Data Centre: jhb
                 application : AS02[]
                 webproxy : AS02[]
                 database : AS02[]
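A lost node shows up in cluster status with the "(not responding)" flag, as in the output above. The Python sketch below scans that text and lists the unresponsive nodes. It assumes the output layout matches the examples in this section (a "Data Centre:" line followed by "role : NODE[]" lines); the real output may differ between releases. The helpers parse_cluster_status and unresponsive_nodes are hypothetical names for illustration, not platform commands.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Matches role lines such as "application : AS02[]" or
# "database : unknown_172.29.248.100[] (not responding)".
# Assumption: the layout follows the sample cluster status output.
ROLE_LINE = re.compile(r"^\s*(\w+)\s*:\s*(\S+?)\[\]\s*(\(not responding\))?\s*$")


@dataclass
class NodeRole:
    data_centre: str
    role: str          # application, webproxy or database
    node: str          # host name, or unknown_<IP> for a lost node
    responding: bool


def parse_cluster_status(text: str) -> list[NodeRole]:
    """Turn cluster status text into one NodeRole per role line."""
    roles: list[NodeRole] = []
    data_centre = ""
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Data Centre:"):
            # Remember the data centre for the role lines that follow it.
            data_centre = stripped.split(":", 1)[1].strip()
            continue
        match = ROLE_LINE.match(line)
        if match:
            role, node, flag = match.groups()
            roles.append(NodeRole(data_centre, role, node, flag is None))
    return roles


def unresponsive_nodes(text: str) -> list[str]:
    """Return the sorted names of nodes flagged '(not responding)'."""
    return sorted({r.node for r in parse_cluster_status(text) if not r.responding})


if __name__ == "__main__":
    sample = """\
Data Centre: unknown
             application : unknown_172.29.248.100[] (not responding)
             webproxy : unknown_172.29.248.100[] (not responding)
             database : unknown_172.29.248.100[] (not responding)
Data Centre: jhb
             application : AS02[]
             webproxy : AS02[]
             database : AS02[]
"""
    print(unresponsive_nodes(sample))  # ['unknown_172.29.248.100']
```

A text-parsing approach is used because cluster status is read by an administrator at the console; if the output format changes, only ROLE_LINE needs adjusting.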

Recovery Steps

The primary node server is lost. There are two recovery options.

  1. If it is decided to fail over to the secondary node:

    1. On the secondary node, remove the lost primary server from the cluster:

      cluster del <primary node IP>

    2. On the secondary node, run cluster provision. It is recommended that this step is run in a terminal opened with the screen command (see: Using the screen command).

      On the secondary node, check:

      $ cluster status
      Data Centre: jhb
                   application : AS02[]
                   webproxy : AS02[]
                   database : AS02[]
  2. If it is decided to recover the primary node:

    1. On the secondary node, remove the lost primary server from the cluster:

      cluster del <primary node IP>

    2. On the secondary node, run cluster provision. It is recommended that this step is run in a terminal opened with the screen command.

      On the secondary node, check:

      $ cluster status
      Data Centre: jhb
                   application : AS02[]
                   webproxy : AS02[]
                   database : AS02[]
    3. Switch on the newly installed server.

      On the secondary node, add the new server by running cluster add with the IP address of the new server.

      On either node, check:

      $ cluster status
      Data Centre: jhb
                   application : AS01[]
                   webproxy :    AS01[]
                   database :    AS01[]
    4. Configure the primary database: on the newly installed server, run cluster provision primary. It is recommended that this step is run in a terminal opened with the screen command.

      Check the database configuration on both nodes, for example:

      $ database config
              $date: 1549450382862
          heartbeatIntervalMillis: 2000
                  priority: 20.0
                  stateStr: PRIMARY
                  storageEngine: WiredTiger
                  priority: 1.0
                  stateStr: ARBITER
                  storageEngine: Unknown
                  priority: 10.0
                  stateStr: SECONDARY
                  storageEngine: WiredTiger
          myState: 1
          ok: 1.0
          set: DEVICEAPI
          term: 8
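After recovery, the database config output should show one PRIMARY, one SECONDARY and one ARBITER, with the PRIMARY holding the highest priority (20.0 in the example, matching the database weight of 20). The Python sketch below checks that layout. It assumes each member block in the output starts with a "priority:" line, as in the example, and it assumes that the highest-priority data-bearing member is the one expected to be PRIMARY. The helper names parse_members and check_replica_set are hypothetical.

```python
from __future__ import annotations


def parse_members(text: str) -> list[dict[str, str]]:
    """Group priority/stateStr/storageEngine lines into one dict per member.

    Assumes each member's block starts with a 'priority:' line, as in the
    example output; all other keys ($date, myState, term, ...) are ignored.
    """
    members: list[dict[str, str]] = []
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if not sep:
            continue  # not a "key: value" line
        key, value = key.strip(), value.strip()
        if key == "priority":
            members.append({"priority": value})
        elif key in ("stateStr", "storageEngine") and members:
            members[-1][key] = value
    return members


def check_replica_set(text: str) -> list[str]:
    """Return a list of problems; an empty list means the layout looks as expected."""
    members = parse_members(text)
    states = [m.get("stateStr") for m in members]
    problems: list[str] = []
    if states.count("PRIMARY") != 1:
        problems.append(f"expected 1 PRIMARY, found {states.count('PRIMARY')}")
    if "SECONDARY" not in states:
        problems.append("no SECONDARY member found")
    # The arbiter holds no data, so leave it out of the priority comparison.
    data_members = [m for m in members if m.get("stateStr") != "ARBITER"]
    if data_members:
        top = max(data_members, key=lambda m: float(m["priority"]))
        if top.get("stateStr") != "PRIMARY":
            problems.append("highest-priority member is not PRIMARY")
    return problems
```

Fed the example output above, check_replica_set returns an empty list; if the node with priority 10.0 were reported as PRIMARY instead, it would return ["highest-priority member is not PRIMARY"].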

Scenario: Loss of Secondary Node - Replace

  1. On the primary node, remove the secondary node:

    cluster del <secondary node IP>
  2. Re-provision the cluster without the removed node:

    cluster provision
  3. Create a new secondary node. See: Create a New VM Using the Platform-Install OVA.

  4. On the newly created node, run:

    cluster prepnode
  5. From the primary unified node, run the following command with the IP address of the new unified node to add it to the existing cluster:

    cluster add <secondary node IP>
  6. Re-provision the cluster:

    cluster provision primary <IP of current primary>
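The replace procedure above can be kept as an ordered checklist. The Python sketch below only builds the list of commands from the steps in this section; it does not connect to or change any node. Two points are assumptions: the node on which steps 1, 2 and 6 are run is not stated above and is assumed here to be the primary, and the new secondary may have a different IP address from the lost one, so the two are passed separately. The function name replace_secondary_runbook and the 192.0.2.x addresses (a documentation range) are illustrative only.

```python
from __future__ import annotations

from typing import NamedTuple


class Step(NamedTuple):
    node: str      # where the command is run (partly assumed, see lead-in)
    command: str   # platform CLI command copied from the steps above


def replace_secondary_runbook(old_secondary_ip: str, new_secondary_ip: str,
                              primary_ip: str) -> list[Step]:
    """Return the ordered commands for replacing a lost secondary node.

    This only builds a checklist; it does not connect to or change any node.
    Creating the new VM from the Platform-Install OVA (step 3) is a manual
    task and is not listed.
    """
    return [
        Step("primary", f"cluster del {old_secondary_ip}"),
        Step("primary", "cluster provision"),
        Step("new secondary", "cluster prepnode"),
        Step("primary", f"cluster add {new_secondary_ip}"),
        Step("primary", f"cluster provision primary {primary_ip}"),
    ]


if __name__ == "__main__":
    steps = replace_secondary_runbook("192.0.2.11", "192.0.2.12", "192.0.2.10")
    for number, step in enumerate(steps, start=1):
        print(f"{number}. on {step.node}: {step.command}")
```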