Diagnostic Troubleshooting
--------------------------

.. index:: diag;diag health
.. index:: diag;diag disk
.. index:: diag;diag free
.. index:: diag;diag top
.. index:: app;app status
.. index:: log;log view

The health summary displayed on login normally includes enough information
to determine whether the system is working or experiencing a fault.
More detailed health reports can be displayed with **diag health**.

The Notifications section describes a rich set of SNMP and SMTP traps
that can be used to automate fault discovery.

Use **app status** to determine whether all processes are running. If a
process is not running, investigate its log file with:

**log view process/<application>.<process>**

For example, check the status of all processes:

:: 
 
   platform@development:~$ app status
   development v0.8.0 (2013-08-12 12:41)
   voss-deviceapi v0.6.0 (2013-11-19 07:37)
      |-voss-celerycam             running
      |-voss-queue_high_priority   running

      ...
   core_services v0.8.0 (2013-08-27 10:46)
      |-wsgi                       running
      |-logsizemon                 running
      |-firewall                   running
      |-mountall                   running
      |-syslog                     running (completed)
      |-timesync                   stopped (failed with error 1)
   nginx v0.8.0 (2013-08-27 10:53)
      |-nginx                      running
   security v0.8.0 (2013-08-27 11:02)

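When many applications are installed, it can help to filter the status
listing automatically. The following is a minimal sketch, assuming the
output of **app status** has been captured as text in the layout shown
above. The helper name ``find_not_running`` and the regular expressions are
illustrative assumptions and are not part of the platform.

.. code-block:: python

   import re

   # Abridged copy of the ``app status`` example output shown above.
   SAMPLE = """\
   development v0.8.0 (2013-08-12 12:41)
   voss-deviceapi v0.6.0 (2013-11-19 07:37)
      |-voss-celerycam             running
      |-voss-queue_high_priority   running
   core_services v0.8.0 (2013-08-27 10:46)
      |-wsgi                       running
      |-syslog                     running (completed)
      |-timesync                   stopped (failed with error 1)
   nginx v0.8.0 (2013-08-27 10:53)
      |-nginx                      running
   """

   # Assumed layout: "<application> v<version> (<date>)" at column 0,
   # followed by indented "|-<process>   <state>" lines.
   APP_LINE = re.compile(r"^(\S+) v\S+ \(")
   PROC_LINE = re.compile(r"^\s*\|-(\S+)\s+(.+?)\s*$")


   def find_not_running(status_text):
       """Return (log_target, state) pairs for processes not reported as running."""
       problems = []
       app = None
       for line in status_text.splitlines():
           match = APP_LINE.match(line)
           if match:
               app = match.group(1)
               continue
           match = PROC_LINE.match(line)
           if match and app is not None:
               process, state = match.groups()
               # "running" and "running (completed)" are both treated as healthy.
               if not state.startswith("running"):
                   problems.append((f"process/{app}.{process}", state))
       return problems


   if __name__ == "__main__":
       for target, state in find_not_running(SAMPLE):
           print(f"log view {target}    # {state}")

Each returned target follows the ``process/<application>.<process>`` form
described above, so it can be passed directly to **log view**.
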
The log of a stopped process can then be inspected:

:: 
 
   platform@development:~$ log view process/core_services.timesync
   2013-08-15 10:55:20.234932 is stopping from basic_stop
   2013-08-15 10:55:20:    core_services:timesync killed 
     successfully
   2013-08-15 10:55:20: Apps.StatusGenerator core_services:timesync 
     returned 1 after 1 loops
   App core_services:timesync is not running with status stopped

   ...

   + /usr/sbin/ntpdate 172.29.1.15
   2014-02-04 09:27:31: Apps.StatusGenerator core_services:timesync 
     returned 0 after 1 loops
   2014-02-04 09:27:31: WaitRunning core_services:timesync is reporting 
     return code 0
   core_services:timesync:/opt/platform/apps/core_services/timesync 
     started
   4 Feb 09:27:38 ntpdate[2766]: no server suitable for 
     synchronization found
   + echo 'Failed to contact server: 172.29.1.15 - retrying'
   Failed to contact server: 172.29.1.15 - retrying
   + COUNTER=2
   + sleep 1
   + test 2 -lt 3
   + /usr/sbin/ntpdate 172.29.1.15
   4 Feb 09:27:48 ntpdate[3197]: no server suitable for 
     synchronization found
   + echo 'Failed to contact server: 172.29.1.15 - retrying'
   Failed to contact server: 172.29.1.15 - retrying
   + COUNTER=3
   + sleep 1
   + test 3 -lt 3
   + test 3 -eq 3
   + echo 'Timesync  - could not contact server 172.29.1.15 after 
       three tries. Giving up'
   Timesync  - could not contact server 172.29.1.15 after 
      three tries. Giving up
   + exit 1

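In a long log, the lines of most interest are usually the final exit code
and the failure messages that precede it. The following is a minimal sketch,
assuming the log text has been captured and that shell trace lines start
with ``+``, as in the example above. The helper name ``summarize`` and the
choice of the keyword ``failed`` are illustrative assumptions, and the sample
is abridged with wrapped lines rejoined.

.. code-block:: python

   import re

   # Abridged excerpt of the timesync log shown above (wrapped lines rejoined).
   SAMPLE_LOG = """\
   + /usr/sbin/ntpdate 172.29.1.15
   4 Feb 09:27:48 ntpdate[3197]: no server suitable for synchronization found
   + echo 'Failed to contact server: 172.29.1.15 - retrying'
   Failed to contact server: 172.29.1.15 - retrying
   + COUNTER=3
   + sleep 1
   + test 3 -lt 3
   + test 3 -eq 3
   + exit 1
   """

   # A trace line of the form "+ exit <code>".
   EXIT_TRACE = re.compile(r"^\+ exit (\d+)$")


   def summarize(log_text):
       """Return (last_exit_code, failure_messages) found in the log text."""
       exit_code = None
       failures = []
       for raw in log_text.splitlines():
           line = raw.strip()
           match = EXIT_TRACE.match(line)
           if match:
               exit_code = int(match.group(1))
           elif not line.startswith("+") and "failed" in line.lower():
               # Skip trace lines so an echoed message is not counted twice.
               failures.append(line)
       return exit_code, failures


   if __name__ == "__main__":
       code, messages = summarize(SAMPLE_LOG)
       print("exit code:", code)
       for message in messages:
           print("failure:", message)

For the excerpt above, the summary reports exit code 1, which matches the
``failed with error 1`` state shown by **app status**.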

The error message and return code displayed in the browser are also
invaluable in determining the cause of the problem.

The system resources can be inspected as follows:

* **diag disk** will display the disk status
* **diag free** and **diag mem** will display the memory status
* **diag top** will display the CPU status


.. |VOSS-4-UC| replace:: VOSS-4-UC
.. |Unified CM| replace:: Unified CM