.. _dr-scenario-loss-of-full-cluster-modular:

Scenario: Loss of Full Cluster in a Modular Cluster
------------------------------------------------------------

.. _21.1|VOSS-837:

.. index:: voss;voss finalize_transaction
.. index:: voss;voss queues
.. index:: database;database weight
.. index:: database;database config
.. index:: cluster;cluster run
.. index:: cluster;cluster provision

Background
...........

* The administrator deployed the cluster into a primary and a DR site.
* The cluster is deployed following the *Installation Guide*.
* The example is a typical cluster deployment: 8 nodes, where 3 nodes are database
  servers, 3 nodes are application nodes and 2 nodes are proxy servers. The design
  is preferably split over 2 physical data centers.
* The cluster might also span two geographically dispersed areas. In that case, the
  cluster has to be installed using two different site names or data center names.

Full cluster failure
....................

* In this scenario, *all* nodes failed while transactions were running.
* At this point, *all* transactions that were in flight are lost and will not recover.
* The lost transactions have to be rerun.
* The cluster will not be operational and manual intervention is needed to recover it.
* To recover the cluster, carry out the Recovery Steps.

Recovery Steps
..............

.. important::

   * Prerequisite: a system backup exported to a remote backup location. The backup
     file on the remote location typically has a *.tar.gz* format. This recovery
     procedure will *only* succeed if you have a valid, recent backup to restore.
   * For details, considerations and the specific commands at each step below, refer
     to the "Modular Cluster Multinode Installation" topic in the *Installation
     Guide*. An illustrative command sequence covering steps 3 to 7 is also shown
     after this procedure.

1. Ensure all traces of the previous nodes have been removed from the VMware
   environment.

#. Deploy fresh nodes as per the original topology.

   * Check topologies and hardware requirements in the *Installation Guide*:

     * "Multinode Modular Cluster with Application and Database Nodes"
     * "Multinode Modular Cluster Hardware Specification"

   * For new *node type* deployment at the *required data center*, see:
     :ref:`create_a_new_VM_using_the_platform-install_OVA`.

   * For the steps below, follow the "Modular Cluster Multinode Installation" topics
     in the *Installation Guide*.

3. Add each node to the cluster by running **cluster prepnode** on the node.

#. From the primary database node, add each node to the cluster using the
   **cluster add <IP address>** command.

#. On the primary database node, set the database weights for each database node
   using the **database weight add <IP address> <weight>** command.

#. Restore a backup made from the highest weighted secondary database node in the
   original cluster. Follow the Import steps here:
   :ref:`backup-import-to-new-environment`.

   .. note::

      It is not necessary to run **cluster provision** again on the primary node.
      This action is included in the backup restore process.

#. On the new application nodes, check the number of queues using **voss queues**.
   If the number is *less than 2*, set the queues to 2 with **voss queues 2**.

   .. note::

      Applications are reconfigured and the ``voss-queue`` process is restarted.

#. Ensure all services are up and running: run **cluster run all app status** to
   check that all the services are up and running after the restore completes.
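The console sketch below summarizes steps 3 to 7 for the example 8-node topology.
It is illustrative only: the hostnames, prompts, IP addresses and weight values are
hypothetical placeholders, and the *Installation Guide* remains the authoritative
reference for each command.

.. code-block:: console

   # On every new node (database, application and proxy): prepare it for clustering
   platform@node:~$ cluster prepnode

   # On the primary database node: add each of the other nodes to the cluster
   # (192.168.10.12, 192.168.10.13, ... are example addresses only)
   platform@db01:~$ cluster add 192.168.10.12
   platform@db01:~$ cluster add 192.168.10.13
   platform@db01:~$ cluster add 192.168.10.21
   # ...repeat for the remaining application and proxy nodes

   # On the primary database node: set a weight for each database node
   # (the weight values below are examples only)
   platform@db01:~$ database weight add 192.168.10.11 40
   platform@db01:~$ database weight add 192.168.10.12 30
   platform@db01:~$ database weight add 192.168.10.13 20

   # Restore the backup from the remote location (see the Import steps referenced
   # above); cluster provision runs as part of the restore.

   # On each new application node: ensure at least 2 queues are configured
   platform@app01:~$ voss queues
   platform@app01:~$ voss queues 2

   # Verify that all services are running after the restore completes
   platform@db01:~$ cluster run all app status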
.. note::

   * If **cluster provision** fails at any of the proxy nodes during provisioning,
     the following steps complete the cluster provisioning:

     1. Run **database config** and check that the nodes are in the STARTUP2,
        SECONDARY or PRIMARY state, with correct arbiter placement.
     2. Log in to the web proxy on both the primary and the secondary site and run
        **web weight add <IP address>:443 1** for each node that should receive a
        web weight of 1 on the respective proxy.
     3. Run **cluster provision** to mitigate the failure.
     4. Run **cluster run all app status** to check that all the services are up
        and running after cluster provisioning completes.

   * If the existing nodes in the cluster do not see the new incoming node after
     **cluster add**, try the following steps (an example command sequence is shown
     after this note):

     1. Run **cluster del <IP address>** from the primary node, where <IP address>
        is the IP address of the new incoming node.
     2. Delete all database weights: run **database weight del <IP address>** from
        the primary node for each of the nodes, including the new incoming node.
     3. Log in to any secondary node (non-primary unified node) and run
        **cluster add <IP address>**, where <IP address> is the IP address of the
        new incoming node.
     4. Re-add all database weights: run **database weight add <IP address> <weight>**
        from the same session for each of the nodes, including the new incoming node.
     5. Run **cluster run database cluster list** to check that all nodes see the
        new incoming node inside the cluster.
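The sketch below illustrates the re-add procedure from the second bullet above,
assuming a hypothetical incoming database node at 192.168.10.13 joining existing
database nodes at 192.168.10.11 and 192.168.10.12. The addresses, prompts and
weight values are placeholders only; substitute the values for your deployment.

.. code-block:: console

   # On the primary node: remove the node that was not picked up, then delete all
   # database weights (example addresses only)
   platform@db01:~$ cluster del 192.168.10.13
   platform@db01:~$ database weight del 192.168.10.11
   platform@db01:~$ database weight del 192.168.10.12
   platform@db01:~$ database weight del 192.168.10.13

   # On a secondary (non-primary) node: re-add the incoming node, then re-add the
   # database weights from the same session (weights are examples only)
   platform@db02:~$ cluster add 192.168.10.13
   platform@db02:~$ database weight add 192.168.10.11 40
   platform@db02:~$ database weight add 192.168.10.12 30
   platform@db02:~$ database weight add 192.168.10.13 20

   # Verify that every node now sees the incoming node in the cluster
   platform@db02:~$ cluster run database cluster list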