Check general cluster health

Services

A service that is not running on the cluster may indicate a problem in the system.

To check:

  1. Log in on any unified node (multinode unified topology) / application node (modular cluster topology).

  2. Run the following commands:

    cluster status

    and

    cluster run all app status

  3. Check for any anomalous output, for example, stopped services, unknown nodes, or mismatched service versions.

  4. Resolve issues:

    • Start stopped services.

    • Resolve issues on non-responsive nodes.

    • Escalate unresolvable issues to VOSS L2 helpdesk.

Nodes in cluster

If any node in the cluster is not known to every other node, provisioning may fail.

  1. Log in on any unified node (multinode unified topology) / application node (modular cluster topology).

  2. Run the following command:

    cluster run database cluster list

  3. Ensure that the output from each node lists the correct number of nodes.

  4. Resolve issues, if any:

    • If one or more nodes do not list all nodes, those nodes may need to be deleted and re-added, possibly from a different unified node. Add or delete nodes until every node shows the same output for the cluster list command.

    • Escalate unresolvable issues to VOSS L2 helpdesk.
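The comparison in step 3 can be sketched in Python. This is only an illustration: it assumes you have already collected the list of members that each node reports (by hand or with your own tooling), and it does not parse the real output of the cluster list command. The function name and the sample addresses are hypothetical.

```python
def find_inconsistent_nodes(views):
    """Return nodes whose reported membership is missing one or more nodes.

    views maps each node to the collection of members that node reports.
    The expected membership is assumed to be the union of every node seen,
    either as a reporter or as a reported member.
    """
    expected = set(views)
    for members in views.values():
        expected.update(members)

    problems = {}
    for node in sorted(expected):
        reported = set(views.get(node, ()))
        missing = expected - reported
        if missing:
            problems[node] = sorted(missing)
    return problems


# Hypothetical sample data: 10.0.0.3 does not know about 10.0.0.2.
views = {
    "10.0.0.1": ["10.0.0.1", "10.0.0.2", "10.0.0.3"],
    "10.0.0.2": ["10.0.0.1", "10.0.0.2", "10.0.0.3"],
    "10.0.0.3": ["10.0.0.1", "10.0.0.3"],
}
print(find_inconsistent_nodes(views))
```

An empty result means every node reports the same membership. Any node that appears in the result is a candidate for the delete and re-add procedure described above.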

Node communication

Ensure the nodes in the cluster can freely communicate.

  1. Log in on any unified node (multinode unified topology) / application node (modular cluster topology).

  2. Run a cluster command across all nodes, for example:

    cluster run all network list

  3. Verify that all nodes respond with the expected output.

  4. To resolve issues, check the general health of the cluster.

NTP connectivity

Ensure that NTP is reachable to prevent failures such as unexpected session timeouts.

For each node:

  1. Log in as root.

  2. Run the following command:

    ntpq -p

  3. Check the reach column in the output. The value is displayed in octal and records the outcome of the last eight polls. A value of 377 indicates that there has been no packet loss, while a nonzero value less than 377 indicates some packet loss. A value of zero must be resolved.

  4. Resolve issues:

    • If the reach value is zero (0), restart the time service using the following command:

      app start services:time --force

    • Repeat the procedure. If the problem persists, contact VOSS L2 helpdesk.
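The reach check can also be sketched as a small Python script that reads captured ntpq -p output. This is a minimal illustration, assuming the standard ten-column peer table that ntpq prints; the sample text and host names below are made up, and the function names are hypothetical. The peer name is kept exactly as printed, including any leading tally character such as * or +.

```python
def parse_reach(ntpq_output):
    """Map each peer (as printed by ntpq -p) to its reach register value."""
    reach = {}
    for line in ntpq_output.splitlines():
        fields = line.split()
        # Peer rows have 10 columns; skip the header and separator rows.
        if len(fields) != 10 or fields[0] == "remote":
            continue
        try:
            # The reach column is the seventh field and is shown in octal.
            reach[fields[0]] = int(fields[6], 8)
        except ValueError:
            continue
    return reach


def classify(reach):
    """Apply the rule from step 3 to a single reach value."""
    if reach == 0:
        return "unreachable"
    if reach == 0o377:
        return "ok"
    return "partial loss"


# Illustrative sample only, not captured from a real system.
SAMPLE = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*ntp1.example.com 192.0.2.10     2 u   33   64  377    0.421    0.118   0.052
+ntp2.example.com 192.0.2.11     2 u   12   64  357    0.630   -0.204   0.091
 ntp3.example.com .INIT.        16 u    -   64    0    0.000    0.000   0.000
"""

for peer, value in parse_reach(SAMPLE).items():
    # Each set bit in the register is one successful poll out of the last eight.
    successes = bin(value).count("1")
    print(f"{peer}: reach={value:o} ({successes}/8 polls) -> {classify(value)}")
```

Only a peer classified as unreachable calls for the restart described in step 4. A register that is still filling after a recent restart can also show a value below 377, so repeat the check after several poll intervals before drawing conclusions.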