.. _dr-power-off-on-node-modular:

Scenario: Power Off and On of a Node in a Modular Cluster
-----------------------------------------------------------------

.. _21.1|VOSS-837:

.. index:: voss;voss finalize_transaction

.. index:: database;database weight

.. index:: database;database config

.. index:: cluster;cluster run

.. index:: cluster;cluster del

.. index:: cluster;cluster provision

.. index:: web;web weight

The scenario and recovery steps apply to database, application and proxy
nodes.

Node powered off:

* The secondary database node assumes the primary role.

* There is no cluster downtime. Normal operations continue: the cluster
  processes requests and transactions are committed successfully up to the
  point where a node is powered off.

* At this point, *all* transactions that are in flight on that node are lost
  and will not recover.

* The lost transactions have to be replayed or rerun. Bulk load transactions
  cannot be replayed and have to be rerun.

  Before resubmitting a failed bulk load job, run the following command on an
  application node CLI to manually clear each failed transaction that still
  has a Processing status *after a service restart*:

  **voss finalize_transaction <transaction ID>**

  The failed transaction status then changes from Processing to Fail. With
  the node still powered off, replaying the failed transactions is
  successful.

Recovery steps if the node is powered off:

1. Power up the node. The node resyncs.

   For a database node, run the **database config** command to verify the
   state of the database members. A typical output of the command would be:

   ::

       $ database config
         date: 2017-04-25T09:50:34Z
         heartbeatIntervalMillis: 2000
         members:
             172.29.21.41:27020:
                 priority: 60.0
                 stateStr: PRIMARY
                 storageEngine: WiredTiger
             172.29.21.41:27030:
                 priority: 1.0
                 stateStr: ARBITER
                 storageEngine: WiredTiger
             172.29.21.42:27020:
                 priority: 50.0
                 stateStr: SECONDARY
                 storageEngine: WiredTiger
             172.29.21.43:27020:
                 priority: 40.0
                 stateStr: SECONDARY
                 storageEngine: WiredTiger
             172.29.21.44:27020:
                 priority: 30.0
                 stateStr: SECONDARY
                 storageEngine: WiredTiger
             172.29.21.45:27020:
                 priority: 20.0
                 stateStr: SECONDARY
                 storageEngine: WiredTiger
             172.29.21.46:27020:
                 priority: 10.0
                 stateStr: SECONDARY
                 storageEngine: WiredTiger
         myState: 1
         ok: 1.0
         set: DEVICEAPI
         term: 38

   Note that ``storageEngine`` shows as ``WiredTiger`` after the database
   engine upgrade to WiredTiger when upgrading to VOSS Automate 17.4.
   Otherwise, the value is ``MMAPv1``.

   In other words, no database member should, for example, be in the
   ``STARTUP``, ``STARTUP2`` or ``RECOVERING`` state. Note however that it is
   sometimes expected that nodes are recovering or in startup, but they
   should then change to a normal state after a period of time (depending on
   how far out of sync those members are).

   A file system check may take place.

#. If a replacement node is not on standby, rebuild steps such as boot up,
   adding to the cluster, setting the database weight and reprovisioning may
   take 200-300 minutes, depending on hardware specifications. It is
   recommended that standby nodes are available for faster recovery. An
   example of the rebuild command sequence is sketched below.
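The sketch below is a minimal, illustrative rebuild sequence for a
replacement database node, run from the primary database node CLI. The IP
address ``172.29.21.47`` and the weight value ``10`` are placeholders, and
the two-argument ``database weight add <ip> <weight>`` form is assumed;
substitute the values used in your deployment.

::

    # Illustrative sketch only: add the replacement node, set an example
    # database weight, then reprovision the cluster.
    # 172.29.21.47 and the weight 10 are placeholder values.
    $ cluster add 172.29.21.47
    $ database weight add 172.29.21.47 10

    # Run the long-running provisioning step in a screen session.
    $ screen
    $ cluster provision

    # When provisioning completes, confirm that all services are running.
    $ cluster run all app status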
.. note::

   If cluster provisioning fails on any of the proxy nodes, the following
   steps illustrate how to complete the cluster provisioning:

   1. Run **database config** and check that the nodes are in either the
      STARTUP2, SECONDARY or PRIMARY state, with correct arbiter placement.

   2. Log in to the web proxy on both the primary and secondary site and run
      **web weight add <ip>:443 1** for each node that should have a web
      weight of 1 on the respective proxies.

   3. Run **cluster provision** to mitigate the failure (it is recommended
      that this step is run in a terminal opened with the **screen**
      command). See: :ref:`screen-command`.

   4. Run **cluster run all app status** to check that all the services are
      up and running after cluster provisioning completes.

.. note::

   If the existing nodes in the cluster do not see the new incoming node
   after **cluster add**, try the following steps (an example command
   sequence is sketched after this list):

   1. Run **cluster del <ip>** from the primary database node, where <ip> is
      the IP address of the new incoming node.

   2. For database nodes, run **database weight del <ip>** from the primary
      database node, where <ip> is the IP address of the new incoming node.

   3. Log in to the primary database node and run **cluster add <ip>**, where
      <ip> is the IP address of the new incoming node.

   4. For database nodes, run **database weight add <ip> <weight>** from the
      same session, where <ip> is the IP address of the new incoming node.

   5. Use **cluster run database cluster list** to check that all nodes see
      the new incoming node in the cluster.
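As a rough illustration of the note above, the following minimal sketch
removes and then re-adds an incoming database node from the primary database
node CLI. The IP address ``172.29.21.47`` and the weight value ``10`` are
placeholders, and the ``database weight add <ip> <weight>`` form is assumed.

::

    # Illustrative sketch only: remove the node that the cluster does not
    # see, then re-add it. 172.29.21.47 and the weight 10 are placeholders.
    $ cluster del 172.29.21.47
    $ database weight del 172.29.21.47

    $ cluster add 172.29.21.47
    $ database weight add 172.29.21.47 10

    # Confirm that all nodes now see the new incoming node.
    $ cluster run database cluster list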