.. _dr-power-off-on-node-modular:

Scenario: Power Off and On of a Node in a Modular Cluster
-----------------------------------------------------------------

.. _21.1|VOSS-837:

.. index:: voss;voss finalize_transaction

.. index:: database;database weight

.. index:: database;database config

.. index:: cluster;cluster run

.. index:: cluster;cluster del

.. index:: cluster;cluster provision

.. index:: web;web weight

The scenario and recovery steps apply to database, application and proxy
nodes.

Node powered off:

* The secondary database node assumes the primary role.

* There is no cluster downtime. Normal operations continue: the cluster
  processes requests and transactions are committed successfully up to the
  point where a node is powered off.

* At this point, *all* transactions that are in flight on that node are lost
  and will not recover.

* The lost transactions have to be replayed or rerun. Bulk load transactions
  cannot be replayed and have to be rerun.

  Before resubmitting a failed bulk load job, run the following command on an
  application node CLI to manually clear each failed transaction that still
  has a Processing status *after a service restart*:

  **voss finalize_transaction <transaction ID>**

  The failed transaction status then changes from Processing to Fail. With
  the node still powered off, replaying the failed transactions is
  successful.

Recovery steps if the node is powered off:

1. Power up the node. The node resyncs.

   For a database node, run the **database config** command to verify the
   state of the database members. A typical output of the command would be:

   ::

       $ database config
         date: 2017-04-25T09:50:34Z
         heartbeatIntervalMillis: 2000
         members:
             172.29.21.41:27020:
                 priority: 60.0
                 stateStr: PRIMARY
                 storageEngine: WiredTiger
             172.29.21.41:27030:
                 priority: 1.0
                 stateStr: ARBITER
                 storageEngine: WiredTiger
             172.29.21.42:27020:
                 priority: 50.0
                 stateStr: SECONDARY
                 storageEngine: WiredTiger
             172.29.21.43:27020:
                 priority: 40.0
                 stateStr: SECONDARY
                 storageEngine: WiredTiger
             172.29.21.44:27020:
                 priority: 30.0
                 stateStr: SECONDARY
                 storageEngine: WiredTiger
             172.29.21.45:27020:
                 priority: 20.0
                 stateStr: SECONDARY
                 storageEngine: WiredTiger
             172.29.21.46:27020:
                 priority: 10.0
                 stateStr: SECONDARY
                 storageEngine: WiredTiger
         myState: 1
         ok: 1.0
         set: DEVICEAPI
         term: 38

   Note that ``storageEngine`` shows as ``WiredTiger`` after the database
   engine upgrade to WiredTiger when upgrading to VOSS Automate 17.4.
   Otherwise, the value is ``MMAPv1``.

   In other words, no database member should, for example, be in the
   ``STARTUP``, ``STARTUP2`` or ``RECOVERING`` state. Note however that it is
   sometimes expected that nodes are recovering or in startup, but they
   should then change to a normal state after a period of time (depending on
   how far out of sync those members are).

   A file system check may take place.

#. If a replacement node is not on standby, rebuild steps such as boot up,
   adding to the cluster, setting the database weight and reprovisioning may
   take 200-300 minutes, depending on hardware specifications. It is
   recommended that standby nodes are available for faster recovery. An
   example of the rebuild command sequence is sketched below.
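The sketch below is a minimal, illustrative rebuild sequence for a
replacement database node, run from the primary database node CLI. The IP
address ``172.29.21.47`` and the weight value ``10`` are placeholders, and
the two-argument ``database weight add <ip> <weight>`` form is assumed;
substitute the values used in your deployment.

::

    # Illustrative sketch only: add the replacement node, set an example
    # database weight, then reprovision the cluster.
    # 172.29.21.47 and the weight 10 are placeholder values.
    $ cluster add 172.29.21.47
    $ database weight add 172.29.21.47 10

    # Run the long-running provisioning step in a screen session.
    $ screen
    $ cluster provision

    # When provisioning completes, confirm that all services are running.
    $ cluster run all app status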
.. note::

   If cluster provisioning fails on any of the proxy nodes, the following
   steps illustrate how to complete the cluster provisioning:

   1. Run **database config** and check that the nodes are in either the
      STARTUP2, SECONDARY or PRIMARY state, with correct arbiter placement.

   2. Log in to the web proxy on both the primary and secondary site and run
      **web weight add <ip>:443 1** for each node that should have a web
      weight of 1 on the respective proxies.

   3. Run **cluster provision** to mitigate the failure (it is recommended
      that this step is run in a terminal opened with the **screen**
      command). See: :ref:`screen-command`.

   4. Run **cluster run all app status** to check that all the services are
      up and running after cluster provisioning completes.

.. note::

   If the existing nodes in the cluster do not see the new incoming node
   after **cluster add**, try the following steps (an example command
   sequence is sketched after this list):

   1. Run **cluster del <ip>** from the primary database node, where <ip> is
      the IP address of the new incoming node.

   2. For database nodes, run **database weight del <ip>** from the primary
      database node, where <ip> is the IP address of the new incoming node.

   3. Log in to the primary database node and run **cluster add <ip>**, where
      <ip> is the IP address of the new incoming node.

   4. For database nodes, run **database weight add <ip> <weight>** from the
      same session, where <ip> is the IP address of the new incoming node.

   5. Use **cluster run database cluster list** to check that all nodes see
      the new incoming node in the cluster.
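As a rough illustration of the note above, the following minimal sketch
removes and then re-adds an incoming database node from the primary database
node CLI. The IP address ``172.29.21.47`` and the weight value ``10`` are
placeholders, and the ``database weight add <ip> <weight>`` form is assumed.

::

    # Illustrative sketch only: remove the node that the cluster does not
    # see, then re-add it. 172.29.21.47 and the weight 10 are placeholders.
    $ cluster del 172.29.21.47
    $ database weight del 172.29.21.47

    $ cluster add 172.29.21.47
    $ database weight add 172.29.21.47 10

    # Confirm that all nodes now see the new incoming node.
    $ cluster run database cluster list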