DR Failover and Recovery in a 2-Node Cluster

Important

A 2-node cluster will not fail over automatically.

With only two Unified nodes, with or without Web proxies, there is no High Availability. The database on the primary node is read/write, while the database on the secondary node is read-only.

Only redundancy is available.

  • If the primary node fails, the failed node must be deleted manually from the cluster on the secondary node, followed by a cluster provision.

  • If the secondary node fails, it needs to be replaced.
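
To see which node currently holds the read/write database, check the database configuration on either node: the member reporting stateStr: PRIMARY is read/write, and the member reporting stateStr: SECONDARY is read-only. An abridged example, using the addresses from the scenario below (the full output appears in the recovery steps):

    $ database config
        members:
            172.29.42.100:27020:
                stateStr: PRIMARY
            172.29.42.101:27020:
                stateStr: SECONDARY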

Scenario: Loss of Primary Node

  • The administrator deployed the 2-node cluster.

    $ cluster status
    
    Data Centre: jhb
                 application : AS01[172.29.42.100]
                               AS02[172.29.42.101]
    
                 webproxy :    AS01[172.29.42.100]
                               AS02[172.29.42.101]
    
                 database :    AS01[172.29.42.100]
                               AS02[172.29.42.101]
    

    Example database weights:

    $ database weight list
         172.29.42.100:
             weight: 20
         172.29.42.101:
             weight: 10
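
    If the weights still have to be assigned (for example after adding a node), they are set per node before the cluster is provisioned. The syntax below is an assumption based on the listing above; verify it against the CLI help for database weight on your release:

    $ database weight add 172.29.42.100 20
    $ database weight add 172.29.42.101 10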
    
  • Node failure: the primary node at the primary site is lost:

    $ cluster status
    
    Data Centre: unknown
                 application : unknown_172.29.42.100[172.29.42.100] (not responding)

                 webproxy : unknown_172.29.42.100[172.29.42.100] (not responding)

                 database : unknown_172.29.42.100[172.29.42.100] (not responding)


    Data Centre: jhb
                 application : AS02[172.29.42.101]

                 webproxy : AS02[172.29.42.101]

                 database : AS02[172.29.42.101]
    

Recovery Steps

The server hosting the primary node is lost.

  1. The administrator decides to fail over to the secondary node:

    1. On the secondary node, remove the lost server from the cluster:

      cluster del 172.29.42.100

    2. On the secondary node, run cluster provision (it is recommended that this step is run in a terminal opened with the screen command). See: Using the screen command.
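
      A minimal sketch of running this step inside a screen session, assuming standard GNU screen behaviour (see the referenced section for details):

      $ screen
      $ cluster provision
      # detach with Ctrl-a d if the terminal must be left; reattach later with: screen -r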

      On the secondary node, check:

      $ cluster status
      
      Data Centre: jhb
                   application : AS02[172.29.42.101]

                   webproxy : AS02[172.29.42.101]

                   database : AS02[172.29.42.101]
      
  2. The administrator decides to recover the primary node:

    1. On the secondary node, remove the lost server from the cluster:

      cluster del 172.29.42.100

    2. On the secondary node, run cluster provision (it is recommended that this step is run in a terminal opened with the screen command).

      On the secondary node, check:

      $ cluster status
      
      Data Centre: jhb
                   application : AS02[172.29.42.101]

                   webproxy : AS02[172.29.42.101]

                   database : AS02[172.29.42.101]
      
    3. Switch on the newly installed replacement server for the lost primary node.

      On the secondary node, add the server back to the cluster: run cluster add 172.29.42.100.

      On either node, check:

      $ cluster status
      
      Data Centre: jhb
                   application : AS01[172.29.42.100]
                                 AS02[172.29.42.101]
      
                   webproxy :    AS01[172.29.42.100]
                                 AS02[172.29.42.101]
      
                   database :    AS01[172.29.42.100]
                                 AS02[172.29.42.101]
      
    4. Configure the primary database. On the newly installed server, run cluster provision primary 172.29.42.100 (it is recommended that this step is run in a terminal opened with the screen command).

      Check the database configuration on both nodes, for example:

      $ database config
          date:
              $date: 1549450382862
          heartbeatIntervalMillis: 2000
          members:
              172.29.42.100:27020:
                  priority: 20.0
                  stateStr: PRIMARY
                  storageEngine: WiredTiger
              172.29.42.100:27030:
                  priority: 1.0
                  stateStr: ARBITER
                  storageEngine: Unknown
              172.29.42.101:27020:
                  priority: 10.0
                  stateStr: SECONDARY
                  storageEngine: WiredTiger
          myState: 1
          ok: 1.0
          set: DEVICEAPI
          term: 8
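
For reference, the complete command sequence used in this procedure to recover a lost primary node, with the example addresses above:

    # On the surviving secondary node:
    cluster del 172.29.42.100
    cluster provision

    # On the secondary node, once the replacement primary server is installed and switched on:
    cluster add 172.29.42.100

    # On the newly installed server:
    cluster provision primary 172.29.42.100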
      

Scenario: Loss of Secondary Node - Replace

  1. Remove the secondary node:

    cluster del <secondary node IP>
    
  2. Re-provision the cluster without the removed node:

    cluster provision
    
  3. Create a new secondary node: see Create a New VM Using the Platform-Install OVA

  4. On the newly added node, run:

    cluster prepnode
    
  5. From the primary Unified node, run the command below with the IP address of the new Unified server to add it to the existing cluster:

    cluster add <secondary node IP>
    
  6. Re-provision the cluster:

    cluster provision primary <IP of current primary>
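
For reference, and assuming the replacement secondary reuses the example address 172.29.42.101 while the primary remains 172.29.42.100, the full replacement sequence is:

    # On the remaining (primary) node:
    cluster del 172.29.42.101
    cluster provision

    # On the new secondary node, created from the platform-install OVA:
    cluster prepnode

    # On the primary node:
    cluster add 172.29.42.101
    cluster provision primary 172.29.42.100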