.. _cluster_check:

Cluster Check
-------------

.. index:: cluster;cluster check

.. _19.1.1|EKB-1249:
.. _19.2.1|EKB-2618: 
.. _20.1.1|VOSS-724|EKB-4942:
.. _20.1.1|VOSS-724|EKB-5579:
.. _20.1.1|VOSS-724|EKB-5603:
.. _20.1.1|VOSS-724|EKB-5298:
.. _25.3|EKB-25384:



On a cluster, the **cluster check [verbose]** command is available to check:

* ``network``: test and validate connectivity from each node to every other node, for each port required,
  as well as the time taken to connect to each node. 

  * Checks for access to port ``27020`` on database hosts are not required from web proxy nodes.
  * Checks for access to port ``443`` are only required from web proxy nodes to unified nodes.

* ``database``: carry out a check of database configuration
 
  * ``info``: displays database weights and whether the node state is primary, secondary or arbiter
  * ``error``:

    * if there is no connection to the database IP on a port
    * if the current database weight does not match the configured weight
    * if a node is marked as an arbiter but is not in the list of arbiters

  * ``warn``: if the primary database node does not have the highest weight

* ``disk``: carry out a drive space percentage check
* ``ntp``: check that NTP is functioning
* ``nrs``: check whether NRS (root shell using the nrs script) is running on a host.
  Status:

  * ``info``: not running
  * ``error``: running, or an error occurred while checking the status

* ``packages``: check the status of packages installed by the system package manager.
  If an error occurs for a package, a message next to the package name shows: ``package in an undesired state``.
* ``security``: check for security updates. Status:

  * ``info``: zero or one security update missed
  * ``error``: more than one security update missed

* ``cluster status``: check the cluster status. Status:

  * ``info``: show status as ``OK``
  * ``error``: display a message to run **cluster status** for details
  * ``warn``: it is advisable to resolve warnings before upgrading where
    possible. Some warnings may be resolved by upgrading.

  .. note::

     If *only* node versions mismatch or some nodes are missing components,
     a ``warning`` status is displayed. This status allows a node to be
     upgraded during failover recovery.

     This caters for scenarios during repair or recovery of nodes:
     **cluster check** warns about version mismatches but does not
     prevent upgrade commands. The cluster check cannot distinguish
     between an ongoing recovery process and a general fault. When no
     node recovery process is ongoing, treat the warning as an error
     and resolve it before the upgrade commences.


This command should also be run *before* carrying out a system upgrade.
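
As an illustration of the ``database`` rules listed above, the following
Python sketch classifies a set of nodes into ``error`` and ``warn`` findings.
It is not the platform's implementation: the ``DbNode`` class, its fields and
the ``classify_database`` function are hypothetical names introduced for this
example, and the ``warn`` message text is invented. Only the rules themselves
come from the list above.

```python
from dataclasses import dataclass


@dataclass
class DbNode:
    """Hypothetical view of one database node (not a platform data structure)."""
    host: str
    state: str              # "primary", "secondary" or "arbiter"
    configured_weight: int
    current_weight: int
    reachable: bool = True  # False if the database IP/port cannot be reached


def classify_database(nodes, arbiter_hosts):
    """Return (severity, host, message) tuples following the documented rules."""
    findings = []
    for node in nodes:
        # error: no connection to the database IP on a port
        if not node.reachable:
            findings.append(("error", node.host, "no connection to database port"))
        # error: current weight differs from the configured weight
        if node.current_weight != node.configured_weight:
            findings.append(("error", node.host, "weight: mismatched"))
        # error: node is an arbiter but is not in the list of arbiters
        if node.state == "arbiter" and node.host not in arbiter_hosts:
            findings.append(("error", node.host, "arbiter: not configured"))

    # warn: the primary does not have the highest weight
    highest = max((node.current_weight for node in nodes), default=0)
    for node in nodes:
        if node.state == "primary" and node.current_weight < highest:
            findings.append(
                ("warn", node.host, "primary does not have the highest weight")
            )
    return findings


nodes = [
    DbNode("192.168.100.3", "primary", 30, 30),
    DbNode("192.168.100.4", "secondary", 40, 40),
    DbNode("192.168.100.6", "arbiter", 0, 0),
]
for severity, host, message in classify_database(nodes, arbiter_hosts=set()):
    print(severity, host, message)
```

Errors are collected before warnings so that blocking findings appear first;
the real command may order or word its findings differently.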


.. note::

   Without the ``verbose`` parameter, the **cluster check** command
   *only shows warnings and errors*. If there are none, it only shows the message
   ``No issues found with host checks``.
  
   Use the ``verbose`` parameter to see detailed output for supported commands.

Example output (abbreviated):

::

   $ cluster check
   warn
      192.168.322.3:
          drives
             /: 47 % utilised
      192.168.322.5:
          drives
             /: 47 % utilised
      192.168.322.6:
          drives
             /: 47 % utilised

   error       
      192.168.322.3
          network
             => 192.168.322.4:27020: Failed
      192.168.322.4: Failed to connect to host       
      192.168.322.5
          network
             => 192.168.322.4:27020: Failed
      192.168.322.6
             database
                 arbiter: not configured
                 weight: mismatched
      192.168.100.3
             nrs
                 running
     
   [...]
      cluster
          status
              Error, please run `cluster status` for more information
   
   



Use the ``verbose`` parameter to see detailed output.
Any warnings and errors are then shown at the end of the verbose output.
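
When the output is captured as text, for example as part of a pre-upgrade
checklist, the severity headings (``info``, ``warn``, ``error``) can be used
to group the findings. The following Python sketch assumes that each heading
appears on a line of its own, as in the examples in this section; the function
names are hypothetical, and the real output layout may differ between releases.

```python
SEVERITIES = ("info", "warn", "error")


def split_by_severity(output):
    """Group the lines of captured output under the severity heading they follow."""
    sections = {}
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        if not line or line == "[...]":
            continue  # skip blank lines and elision markers
        if line in SEVERITIES:
            current = line
            sections.setdefault(current, [])
        elif current is not None:
            sections[current].append(line)
    return sections


def has_errors(output):
    """True if the captured output contains a non-empty ``error`` section."""
    return bool(split_by_severity(output).get("error"))


def has_warnings(output):
    """True if the captured output contains a non-empty ``warn`` section."""
    return bool(split_by_severity(output).get("warn"))


sample = """\
warn
   192.168.100.3:
       drives
          /: 47 % utilised

error
   192.168.100.3
       network
          => 192.168.100.4:27020: Failed
"""
print(has_errors(sample), has_warnings(sample))
```

Lines that appear before the first heading, such as the prompt line or the
``No issues found with host checks`` message, are ignored, so that message
alone yields neither errors nor warnings.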

Abbreviated example, ``info`` only; no issues:

::

   $ cluster check verbose
   info
       192.168.100.3
           database
               arbiter: ok
               state: ok
               weight: ok
           disk
               /: 28%
               /opt/platform: 27%
               /opt/platform/apps/mongodb/dbroot: 1%
               /tmp: 1%
               /var/log: 3%
           network
               => 192.168.100.4:8443: 0.223ms
               => 192.168.100.4:27020: 0.205ms
               => 192.168.100.5:8443: 0.246ms
               => 192.168.100.5:27020: 0.405ms
               => 192.168.100.6:8443: 0.169ms
               => 192.168.100.6:27020: 0.218ms
               => 192.168.100.7:8443: 0.225ms
               => 192.168.100.8:8443: 0.208ms
           ntp
               172.29.88.56: 18.313ms
           packages
               package database: ok
           security
               updates: 0 missed
       192.168.100.4
           database
               arbiter: ok
               state: ok
               weight: ok
           disk
               /: 28%
               /opt/platform: 27%
               /opt/platform/apps/mongodb/dbroot: 1%
               /tmp: 1%
               /var/log: 2%
           avx    
             enabled
           network

       [...]
       cluster
           status
               OK
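
The ``network`` lines in the verbose output carry the connection time for each
target host and port. As a sketch only, the following Python snippet extracts
those timings from captured text and lists the slowest connections. It assumes
the ``=> host:port: time`` layout shown in the example above; the function name
is hypothetical, and the source node of each connection is not tracked.

```python
import re

# Matches lines such as "=> 192.168.100.4:8443: 0.223ms".
# Lines ending in "Failed" and the ntp timings (no "=>") do not match.
LATENCY = re.compile(r"=>\s*(?P<host>[\d.]+):(?P<port>\d+):\s*(?P<ms>[\d.]+)ms")


def slowest_connections(output, limit=3):
    """Return up to ``limit`` (milliseconds, host, port) tuples, slowest first."""
    found = [
        (float(m["ms"]), m["host"], int(m["port"]))
        for m in LATENCY.finditer(output)
    ]
    return sorted(found, reverse=True)[:limit]


verbose_output = """\
network
    => 192.168.100.4:8443: 0.223ms
    => 192.168.100.4:27020: 0.205ms
    => 192.168.100.5:8443: 0.246ms
    => 192.168.100.5:27020: 0.405ms
ntp
    172.29.88.56: 18.313ms
"""
for ms, host, port in slowest_connections(verbose_output, limit=2):
    print(f"{host}:{port} {ms}ms")
```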
   
