Scenario: Loss of Full Cluster in a Modular Cluster

Background

  • The administrator deployed the cluster across a primary site and a disaster recovery (DR) site.

  • The cluster is deployed following the Installation Guide.

  • The example is a typical cluster deployment: 8 nodes, where 3 nodes are database servers, 3 nodes are application nodes and 2 nodes are proxy servers.

    The design is preferably split over 2 physical data centers.

  • The cluster might also be in two geographically dispersed areas. The cluster must be installed under two different site names or data center names.
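The example topology above can be written down as a small data structure, which makes it easier to check a rebuild plan against the original layout. The following is a minimal sketch only: the `Node` class, the IP addresses (taken from documentation address ranges), the site names, and the way the nodes are divided between the two sites are all hypothetical and are not prescribed by this guide.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    ip: str    # hypothetical address, for illustration only
    role: str  # "database", "application", or "proxy"
    site: str  # site name or data center name


# Hypothetical 8-node layout: 3 database, 3 application, 2 proxy nodes.
# The per-site split shown here is an assumption, not a requirement.
TOPOLOGY = [
    Node("192.0.2.11", "database", "primary"),
    Node("192.0.2.12", "database", "primary"),
    Node("198.51.100.11", "database", "dr"),
    Node("192.0.2.21", "application", "primary"),
    Node("192.0.2.22", "application", "primary"),
    Node("198.51.100.21", "application", "dr"),
    Node("192.0.2.31", "proxy", "primary"),
    Node("198.51.100.31", "proxy", "dr"),
]


def role_counts(nodes):
    """Count how many nodes carry each role."""
    return Counter(node.role for node in nodes)


def site_names(nodes):
    """Return the distinct site names used by the nodes, sorted."""
    return sorted({node.site for node in nodes})


def matches_example(nodes):
    """Check the node list against the 3/3/2 example spread over two sites."""
    expected = {"database": 3, "application": 3, "proxy": 2}
    return dict(role_counts(nodes)) == expected and len(site_names(nodes)) == 2
```

Keeping a record like this alongside the backup gives you the roles and site names needed when fresh nodes are deployed "as per the original topology".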

Full cluster failure

  • In this scenario, all nodes failed while transactions were running.

  • At this point, all transactions that were in flight are lost and cannot be recovered.

  • The lost transactions have to be rerun.

  • The cluster is not operational, and manual intervention is needed to recover it.

  • To recover the cluster, carry out the Recovery Steps.

Recovery Steps

Important

  • Prerequisite: a system backup exported to a remote backup location. The backup file at the remote location typically has a name of the form <timestamp>.tar.gz. This recovery procedure only succeeds if you have a valid, recent backup to restore.

  • For details, considerations and specific commands at each step below, refer to the “Modular Cluster Multinode Installation” topic in the Installation Guide.

  1. Ensure all traces of the previous nodes have been removed from the VMware environment.

  2. Deploy fresh nodes as per the original topology.

  3. Prepare each node for addition to the cluster by running cluster prepnode on it.

  4. From the primary database node, add each node to the cluster using the cluster add <IP address of node> command.

  5. On the primary database node, set the database weight for each database node using the database weight add <IP address of node> <weight> command.

  6. Restore a backup made from the highest-weighted secondary database node in the original cluster.

    Follow the Import steps in: Backup and Import to a New Environment.

    Note: It is not necessary to run cluster provision again on the primary node. This action is included in the backup restore process.

  7. On the new application nodes, check the number of queues using voss queues. If the number is less than 2, set it to 2 with voss queues 2.

    Note

    Applications are reconfigured and the voss-queue process is restarted.

  8. Ensure that all services are up and running:

    After the restore completes, run cluster run all app status to check that all services are up and running.
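The ordering of the cluster prepnode, cluster add, and database weight add steps can be sketched as a small planning helper that prints the commands to type, in order, for a given set of nodes. This is an illustration under stated assumptions, not a supported tool: the function names, the IP addresses, and the weight values are hypothetical, and the sketch assumes that the primary database node is not added to itself with cluster add. It only builds command strings; it does not connect to any node.

```python
def build_join_plan(primary_db_ip, db_weights, other_node_ips):
    """Return (node_to_run_on, command) pairs for the join steps.

    db_weights maps each database node IP to its weight (hypothetical values).
    other_node_ips lists the application and proxy node IPs.
    """
    all_nodes = list(db_weights) + list(other_node_ips)
    plan = []

    # Prepare every node by running cluster prepnode on the node itself.
    for ip in all_nodes:
        plan.append((ip, "cluster prepnode"))

    # From the primary database node, add every other node to the cluster.
    # Assumption: the primary database node is not added to itself.
    for ip in all_nodes:
        if ip != primary_db_ip:
            plan.append((primary_db_ip, f"cluster add {ip}"))

    # On the primary database node, set a weight for each database node.
    for ip, weight in db_weights.items():
        plan.append((primary_db_ip, f"database weight add {ip} {weight}"))

    return plan


def queue_fix_command(current_queues, minimum=2):
    """Return the command to raise the queue count, or None if none is needed.

    current_queues is the number the operator reads from voss queues.
    """
    if current_queues < minimum:
        return f"voss queues {minimum}"
    return None


if __name__ == "__main__":
    # Hypothetical addresses and weights, for illustration only.
    weights = {"192.0.2.11": 40, "192.0.2.12": 30, "198.51.100.11": 20}
    others = ["192.0.2.21", "192.0.2.31"]
    for node, command in build_join_plan("192.0.2.11", weights, others):
        print(f"[{node}] {command}")
```

The backup restore and the final cluster run all app status check are left out of the plan on purpose, because their details are covered in the referenced Installation Guide topics.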

Note

  • If cluster provision fails at any of the proxy nodes, carry out the following steps:

    1. Run database config and check that each node is in the STARTUP2, SECONDARY, or PRIMARY state, with correct arbiter placement.

    2. Log in to the web proxy on both the primary and the secondary site and add a web weight using web weight add <ip>:443 1 for each node that should have a web weight of 1 on the respective proxy.

    3. Run cluster provision to mitigate the failure.

    4. After cluster provisioning completes, run cluster run all app status to check that all services are up and running.

  • If the existing nodes in the cluster do not see the new incoming node after cluster add, try the following steps:

    1. Run cluster del <ip> from the primary node, <ip> being the IP of the new incoming node.

    2. Delete all database weights. Run database weight del <ip> from the primary node, <ip> being the IP of the nodes, including the new incoming node.

    3. Log in to any secondary node (a non-primary unified node) and run cluster add <ip>, <ip> being the IP of the new incoming node.

    4. Re-add all database weights. Run database weight add <ip> <weight> from the same session, <ip> being the IP of the nodes, including the new incoming node.

    5. Use cluster run database cluster list to check if all nodes see the new incoming nodes inside the cluster.
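The two troubleshooting procedures in this note can be sketched as helpers that assemble the commands for a given set of addresses. This is a hedged illustration: the function names, IP addresses, and weights are hypothetical, and the node states are assumed to be read by the operator from the database config output rather than parsed, since the output format is not described here.

```python
ALLOWED_DB_STATES = {"STARTUP2", "SECONDARY", "PRIMARY"}


def nodes_in_unexpected_state(states):
    """Return the IPs whose state is not STARTUP2, SECONDARY, or PRIMARY.

    states maps node IP -> state string, as read by the operator from
    the database config output (arbiter placement is not checked here).
    """
    return sorted(ip for ip, state in states.items()
                  if state not in ALLOWED_DB_STATES)


def web_weight_commands(node_ips, weight=1, port=443):
    """Build web weight add commands to run on a web proxy."""
    return [f"web weight add {ip}:{port} {weight}" for ip in node_ips]


def readd_plan(new_node_ip, db_weights):
    """Build the delete and re-add sequence for a node the cluster cannot see.

    db_weights maps every database node IP, including the new incoming
    node where applicable, to its weight (hypothetical values).
    """
    # Run from the primary node: remove the node, then all database weights.
    on_primary = [f"cluster del {new_node_ip}"]
    on_primary += [f"database weight del {ip}" for ip in db_weights]

    # Run from one session on a secondary node: re-add node and weights.
    on_secondary = [f"cluster add {new_node_ip}"]
    on_secondary += [f"database weight add {ip} {weight}"
                     for ip, weight in db_weights.items()]

    # Final check that all nodes see the new node.
    verify = ["cluster run database cluster list"]
    return {"primary": on_primary, "secondary": on_secondary, "verify": verify}
```

For example, `readd_plan("198.51.100.11", {"192.0.2.11": 40, "198.51.100.11": 20})` returns the cluster del and database weight del commands for the primary node first, followed by the cluster add and database weight add commands for the secondary node session.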