Check General Cluster Health
Services
Purpose
If any services on the cluster are not running, it could indicate a problem in the system.
Procedure
Log in to any unified node (multinode unified topology) or application node (modular cluster topology).
Run the following commands:
cluster status
and
cluster run all app status
Check the output for anomalies, for example stopped services, unknown nodes, or mismatched service versions.
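The three kinds of anomaly listed above can be expressed as simple checks. The following is a minimal sketch that assumes the command output has already been collected into a hypothetical structure mapping each node to its services' state and version (or None for a node that did not answer). It does not parse the real output of cluster status or cluster run all app status; that format, and the node, service and version names in the example, are assumptions for illustration only.

```python
from collections import defaultdict
from typing import Optional

# Hypothetical, already-parsed view of the cluster: node name -> {service: (state, version)},
# or None when the node did not answer. This is an assumed structure, NOT the actual
# output format of "cluster status" or "cluster run all app status".
ClusterStatus = dict[str, Optional[dict[str, tuple[str, str]]]]


def find_anomalies(status: ClusterStatus) -> list[str]:
    """List stopped services, unknown nodes and mismatched service versions."""
    anomalies: list[str] = []
    versions: dict[str, set[str]] = defaultdict(set)

    for node, services in sorted(status.items()):
        if services is None:
            # No status returned: treat the node as unknown or non-responsive.
            anomalies.append(f"{node}: unknown or non-responsive node")
            continue
        for service, (state, version) in sorted(services.items()):
            if state != "running":
                anomalies.append(f"{node}: {service} is {state}")
            versions[service].add(version)

    # A service that reports more than one version across the nodes is mismatched.
    for service, seen in sorted(versions.items()):
        if len(seen) > 1:
            anomalies.append(f"{service}: mismatched versions {sorted(seen)}")
    return anomalies


if __name__ == "__main__":
    # Hypothetical node names, service names and versions, for illustration only.
    example = {
        "node1": {"time": ("running", "21.4"), "web": ("running", "21.4")},
        "node2": {"time": ("stopped", "21.4"), "web": ("running", "21.3")},
        "node3": None,
    }
    for line in find_anomalies(example):
        print(line)
```

With the example data, the sketch reports the stopped time service on node2, the silent node3, and the version mismatch of the web service. An empty result corresponds to the healthy case.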
Step to resolve
Start any stopped services and resolve issues on non-responsive nodes. Escalate unresolvable issues to the VOSS L2 helpdesk.
Nodes in Cluster
Purpose
If any node in the cluster is not known to every other node, provisioning may fail.
Procedure
Log in to any unified node (multinode unified topology) or application node (modular cluster topology).
Run the following command:
cluster run database cluster list
Ensure that every node lists the correct number of nodes, and that all nodes return the same node list.
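The comparison above can be sketched as a set check. This minimal example assumes a hypothetical input in which each node's database cluster list output has already been reduced to a set of node names (assumed to include the node itself); parsing the real command output is not shown, and the node names are illustrative only.

```python
# Hypothetical input: for each node, the set of node names that node reported
# when "database cluster list" ran on it (assumed to include the node itself).
# Parsing the real command output is not shown and is left as an assumption.
def membership_problems(reported: dict[str, set[str]], expected_count: int) -> list[str]:
    """Describe nodes whose view of the cluster is incomplete or inconsistent."""
    # Every node named anywhere, by any node, is taken as a cluster member.
    all_nodes = set(reported).union(*reported.values())
    problems: list[str] = []
    for node, view in sorted(reported.items()):
        if len(view) != expected_count:
            problems.append(f"{node}: lists {len(view)} nodes, expected {expected_count}")
        missing = all_nodes - view
        if missing:
            problems.append(f"{node}: missing {sorted(missing)}")
    return problems


if __name__ == "__main__":
    # Illustrative three-node cluster in which node3 does not know node2.
    reported = {
        "node1": {"node1", "node2", "node3"},
        "node2": {"node1", "node2", "node3"},
        "node3": {"node1", "node3"},
    }
    for line in membership_problems(reported, expected_count=3):
        print(line)
```

Checking both the count and the set difference catches two distinct symptoms: a node that lists too few or too many entries, and a node whose list differs from what the other nodes report.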
Step to resolve
If one or more nodes do not list all the nodes in the cluster, the affected nodes may need to be deleted and re-added, possibly from a different unified node. Deleting and re-adding nodes does no harm and can be repeated until all nodes show the same output for the database cluster list command.
Escalate unresolvable issues to the VOSS L2 helpdesk.
Node Communication
Purpose
Ensure that the nodes in the cluster can communicate freely with each other.
Procedure
Log in to any unified node (multinode unified topology) or application node (modular cluster topology).
Run a cluster command across all nodes, for example:
cluster run all network list
Verify that all nodes respond with the expected output.
Step to resolve
If any node does not respond, return to the general cluster health checks above to identify the cause.
NTP Connectivity
Purpose
Ensure that the configured NTP servers are reachable, to prevent failures such as unexpected session timeouts.
Procedure
For each node:
Log in as root.
Run the following command:
ntpq -p
The output shows a reach value for each configured NTP server. Reach is an octal value that records the success of the last eight polls: 377 means all eight polls succeeded (no packet loss), a lower value means some polls failed, and 0 means none of the last eight polls succeeded, which is a cause for concern.
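Because reach is an 8-bit shift register printed in octal, counting its set bits gives the number of successful polls out of the last eight. The sketch below applies this to the standard ntpq -p peer table (columns remote, refid, st, t, when, poll, reach, delay, offset, jitter, with a one-character tally code before remote). The function names and the sample output, which uses documentation-range IP addresses, are hypothetical; lines that do not have the ten expected columns (for example wrapped long host names) are simply skipped.

```python
def polls_succeeded(reach: str) -> int:
    """Count successful polls among the last eight, from an octal reach value."""
    return bin(int(reach, 8)).count("1")


def peer_reach(ntpq_output: str) -> dict[str, str]:
    """Map each peer in `ntpq -p` output to its raw (octal) reach value."""
    peers: dict[str, str] = {}
    in_table = False
    for line in ntpq_output.splitlines():
        if line.startswith("="):
            in_table = True  # the "=====" separator follows the column header
            continue
        if not in_table or not line.strip():
            continue
        # Column 0 holds the tally code (*, +, -, space, ...); the rest is the row.
        fields = line[1:].split()
        if len(fields) != 10:
            continue  # skip wrapped or malformed rows
        peers[fields[0]] = fields[6]  # remote name -> reach
    return peers


def unreachable_peers(ntpq_output: str) -> list[str]:
    """Peers with reach 0, i.e. none of the last eight polls succeeded."""
    return [peer for peer, reach in peer_reach(ntpq_output).items() if int(reach, 8) == 0]


if __name__ == "__main__":
    # Hypothetical sample output, for illustration only.
    sample = "\n".join([
        "     remote           refid      st t when poll reach   delay   offset  jitter",
        "==============================================================================",
        "*192.0.2.10      .GPS.            1 u   33   64  377    1.234    0.123   0.045",
        " 198.51.100.20   .INIT.          16 u    -   64    0    0.000    0.000   0.000",
    ])
    for peer, reach in peer_reach(sample).items():
        print(peer, reach, f"{polls_succeeded(reach)}/8 polls succeeded")
    print("reach 0:", unreachable_peers(sample))
```

A peer listed by unreachable_peers corresponds to the reach value of 0 that triggers the restart step below; a value such as 376 (7 of 8 polls) indicates occasional loss rather than a complete outage.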
Step to resolve
If the reach value is 0, restart the time service by running the following command:
app start services:time --force
Repeat the procedure above. If the problem persists, contact the VOSS L2 helpdesk.