DR Failover and Recovery in a 2-Node Cluster

Important

A 2-node cluster will not fail over automatically.

With only two Unified nodes, with or without Web proxies, there is no High Availability. The database on the primary node is read/write, while the database on the secondary node is read-only.

Only redundancy is available.

  • If the primary node fails, the primary node must be manually deleted from the cluster on the secondary node and a cluster provision must then be run (see the command outline after this list).
  • If the secondary node fails, it needs to be replaced.
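
In outline, the manual failover and recovery reduce to the following commands, which are described in detail in the scenario below (the IP address is that of the example primary node used there). First, on the secondary node:

    $ cluster del 172.29.42.100
    $ cluster provision

Then, once a replacement primary server has been installed, on the secondary node:

    $ cluster add 172.29.42.100

and finally, on the newly installed server:

    $ cluster provision primary 172.29.42.100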

Scenario: Loss of Primary Node

  • The administrator deployed the 2-node cluster.

    $ cluster status
    
    Data Centre: jhb
                 application : AS01[172.29.42.100]
                               AS02[172.29.42.101]
    
                 webproxy :    AS01[172.29.42.100]
                               AS02[172.29.42.101]
    
                 database :    AS01[172.29.42.100]
                               AS02[172.29.42.101]
    

    Example database weights:

    $ database weight list
         172.29.42.100:
             weight: 20
         172.29.42.101:
             weight: 10
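
    For reference, a weight of this kind is typically assigned per node with the database weight add command; the exact syntax below is shown as an assumption and should be verified against the platform CLI reference for your release:

    $ database weight add 172.29.42.100 20
    $ database weight add 172.29.42.101 10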
    
  • Node Failure: if the primary node on the primary site is lost, the cluster status shows:

    $ cluster status
    
    Data Centre: unknown
                 application : unknown_172.29.42.100[172.29.42.100] (not responding)

                 webproxy : unknown_172.29.42.100[172.29.42.100] (not responding)

                 database : unknown_172.29.42.100[172.29.42.100] (not responding)
    
    
    Data Centre: jhb
                 application : AS02[172.29.42.101]

                 webproxy : AS02[172.29.42.101]

                 database : AS02[172.29.42.101]
    

Recovery Steps

The primary node server is lost.

  1. It is decided to fail over to the secondary node:

    1. On the secondary node, remove the lost server from the cluster:

      cluster del 172.29.42.100

    2. On the secondary node, run cluster provision (it is recommended that this step is run in a terminal opened with the screen command). An example is shown after the status check below.

      On the secondary node, check:

      $ cluster status
      
      Data Centre: jhb
                   application : AS02[172.29.42.101]

                   webproxy : AS02[172.29.42.101]

                   database : AS02[172.29.42.101]
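
      For example, the provision step run inside a screen session might look as follows (a minimal sketch; screen is assumed to be available on the node, and its use allows the provision to continue if the SSH connection drops):

      $ screen
      $ cluster provision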
      
  2. It is decided to recover the primary node:

    1. On the secondary node, remove the lost server from the cluster:

      cluster del 172.29.42.100

    2. On the secondary node, run cluster provision (it is recommended that this step is run in a terminal opened with the screen command).

      On the secondary node, check:

      $ cluster status
      
      Data Centre: jhb
                   application : AS02[172.29.42.101]

                   webproxy : AS02[172.29.42.101]

                   database : AS02[172.29.42.101]
      
    3. Switch on the newly installed server.

      On the secondary node, add the server. Run cluster add 172.29.42.100.

      On either node, check:

      $ cluster status
      
      Data Centre: jhb
                   application : AS01[172.29.42.100]
                                 AS02[172.29.42.101]
      
                   webproxy :    AS01[172.29.42.100]
                                 AS02[172.29.42.101]
      
                   database :    AS01[172.29.42.100]
                                 AS02[172.29.42.101]
      
    4. Configure the primary database. On the newly installed server, run cluster provision primary 172.29.42.100 (it is recommended that this step is run in a terminal opened with the screen command).

      Check database configuration on both nodes, for example:

      $ database config
          date:
              $date: 1549450382862
          heartbeatIntervalMillis: 2000
          members:
              172.29.42.100:27020:
                  priority: 20.0
                  stateStr: PRIMARY
                  storageEngine: WiredTiger
              172.29.42.100:27030:
                  priority: 1.0
                  stateStr: ARBITER
                  storageEngine: Unknown
              172.29.42.101:27020:
                  priority: 10.0
                  stateStr: SECONDARY
                  storageEngine: WiredTiger
          myState: 1
          ok: 1.0
          set: DEVICEAPI
          term: 8
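
      The member priorities shown above (20 and 10) correspond to the database weights assigned to the nodes; they can be cross-checked on either node, for example:

      $ database weight list
           172.29.42.100:
               weight: 20
           172.29.42.101:
               weight: 10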
      
    5. If an OVA file was not available for your current release, and the new Unified node was therefore created from the most recent release OVA that has an upgrade path to your release, re-apply the Delta Bundle upgrade to the cluster.

      Note that the version mismatch between the new node and the rest of the cluster can be ignored, since this upgrade step aligns the versions.

      See: Upgrade