.. _snmp_system_monitoring:

SNMP Traps: System Monitoring
-----------------------------

.. _19.1.1|VOSS-489:
.. _21.2|EKB-9363:



.. index:: voss;voss session-limits
.. index:: voss;voss throttle-rates

Administrators at ``sysadmin`` level can configure additional SNMP trap alerts
from the **System Monitoring > Configuration** menu in the GUI
(menu model: ``data/SystemMonitoringConfig``). Some traps are not
configurable; see the **Configurable** column in the table below.

For details, refer to the *System Monitoring Configuration* topic
in the Advanced Configuration Guide.

The following SNMP trap alerts are available:

===============================  =========  =============  ==============================
Notification                     Interval   Level          Configurable
===============================  =========  =============  ==============================
Txn Queue Size                   Hourly     warn           Yes
Failed Txn                       Immediate  warn           Yes
Stuck in Queued                  Hourly     error          Yes
Stuck in Processing              Hourly     error          Yes
data/Alerts (CNF)                Immediate  Alert defined  No
Session Exceeded                 Immediate  warn           Yes (via platform CLI command)
API Request Throttled            Immediate  warn           Yes (via platform CLI command)
Total DB Index Size              Daily      warn           Yes
Total DB Size                    Daily      warn           Yes
Device Comms. Concurrency Limit  Immediate  warn           No
===============================  =========  =============  ==============================

For the platform CLI commands that configure session limits and throttle rates,
see :ref:`voss-performance-commands`.

Transaction Queue Size
......................

This trap is sent when the number of transactions in the queue exceeds the configurable threshold (default: 500).

Identifying strings and example context:

::

   DISMAN-EVENT-MIB::mteHotTrigger.0 = STRING: Transaction Queue Size Exceeded Threshold
   DISMAN-EVENT-MIB::mteHotContextName.0 = STRING: Current Size: 520 Threshold: 500

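As a rough sketch, a trap receiver could extract the current size and threshold
from the ``mteHotContextName`` string shown above. The regular expression is
inferred only from that example string, and ``parse_queue_context`` and
``exceeds_threshold`` are hypothetical helpers, not part of VOSS.

.. code-block:: python

   import re

   # Hypothetical pattern, inferred from the example context string
   # "Current Size: 520 Threshold: 500"; not a documented VOSS format.
   QUEUE_CONTEXT = re.compile(r"Current Size:\s*(\d+)\s+Threshold:\s*(\d+)")


   def parse_queue_context(context: str) -> tuple[int, int]:
       """Return (current_size, threshold) from a queue-size trap context."""
       match = QUEUE_CONTEXT.search(context)
       if match is None:
           raise ValueError(f"unrecognized context: {context!r}")
       return int(match.group(1)), int(match.group(2))


   def exceeds_threshold(context: str) -> bool:
       """True when the reported queue size is above the reported threshold."""
       size, threshold = parse_queue_context(context)
       return size > threshold


   print(parse_queue_context("Current Size: 520 Threshold: 500"))  # (520, 500)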

Transactions: maximum time in Queued and Processing state
.........................................................

These traps alert on transactions that remain in the *Queued* or *Processing* state
for longer than the configured maximum time.

Identifying strings:

::

   DISMAN-EVENT-MIB::mteHotTrigger.0 = STRING:

   Stuck in Queued: <n> transaction(s) 'Queued' for too long.
   Stuck in Processing: <n> transaction(s) 'Processing' for too long.

Example: Queued

::

   2021-11-19 15:31:38 <UNKNOWN> 
   [UDP: [192.168.100.3]:13177->[192.168.100.25]:162]:
   #012iso.3.6.1.2.1.1.3.0 = Timeticks: (17398363) 2 days, 0:19:43.63
   #011iso.3.6.1.6.3.1.1.4.1.0 = OID: iso.3.6.1.2.1.88.2.0.1
   #011iso.3.6.1.2.1.88.2.1.1.0 = STRING: "Stuck in Queued"
   #011iso.3.6.1.2.1.88.2.1.3.0 = STRING: 
    "ID: Transactions, 
    Code: 72054, 
    Occurences: 44, 
    Latest Occurence: 2021-11-19T13:31:36.948Z"
   #011iso.3.6.1.2.1.88.2.1.5.0 = INTEGER: 1
   #011iso.3.6.1.2.1.1.5.0 = STRING: "VOSS"

Example: Processing

::

    2021-11-19 18:31:40 <UNKNOWN> 
    [UDP: [192.168.100.3]:47295->[192.168.100.25]:162]:
    #012iso.3.6.1.2.1.1.3.0 = Timeticks: (18478492) 2 days, 3:19:44.92
    #011iso.3.6.1.6.3.1.1.4.1.0 = OID: iso.3.6.1.2.1.88.2.0.1
    #011iso.3.6.1.2.1.88.2.1.1.0 = STRING: "Stuck in Processing"
    #011iso.3.6.1.2.1.88.2.1.3.0 = STRING: 
     "ID: Transactions, 
     Code: 72055, 
     Occurences: 39, 
     Latest Occurence: 2021-11-19T16:31:38.655Z"
    #011iso.3.6.1.2.1.88.2.1.5.0 = INTEGER: 1
    #011iso.3.6.1.2.1.1.5.0 = STRING: "VOSS"

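The condition behind the *Stuck in Queued* and *Stuck in Processing* traps can be
pictured as counting transactions whose time in a state exceeds a configured
maximum. This is a minimal illustration only: the ``Txn`` record, its fields, and
the ``count_stuck`` and ``stuck_message`` helpers are hypothetical and do not
reflect how VOSS stores or checks transactions.

.. code-block:: python

   from dataclasses import dataclass
   from datetime import datetime, timedelta, timezone


   @dataclass
   class Txn:
       """Hypothetical transaction record, for illustration only."""
       txn_id: int
       state: str               # e.g. "Queued" or "Processing"
       entered_state: datetime  # when the transaction entered its current state


   def count_stuck(txns: list[Txn], state: str, max_time: timedelta,
                   now: datetime) -> int:
       """Count transactions that have been in `state` longer than max_time."""
       return sum(1 for t in txns
                  if t.state == state and now - t.entered_state > max_time)


   def stuck_message(n: int, state: str) -> str:
       """Format a message in the style of the 72054/72055 alert details."""
       return f"Stuck in {state}: {n} transaction(s) '{state}' for too long."


   now = datetime(2021, 11, 19, 13, 31, tzinfo=timezone.utc)
   txns = [
       Txn(1, "Queued", now - timedelta(hours=3)),
       Txn(2, "Queued", now - timedelta(minutes=5)),
       Txn(3, "Processing", now - timedelta(hours=2)),
   ]
   print(stuck_message(count_stuck(txns, "Queued", timedelta(hours=1), now), "Queued"))
   # Stuck in Queued: 1 transaction(s) 'Queued' for too long.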

Transactions: Model Operations Alerts
.....................................

This trap alerts on failed transactions, filtered by:

* model (wildcards are allowed; the default is ``data/*``)
* model operation (the default is **Import**)

Identifying string:

::

   DISMAN-EVENT-MIB::mteHotTrigger.0 = STRING: Transaction Completed with Fail

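The model and operation filter described above can be pictured as shell-style
pattern matching. The sketch below uses Python's ``fnmatch`` module and assumes
that VOSS wildcards behave like glob patterns, which this guide does not confirm.
The ``should_alert`` function and the model names in the example are hypothetical.

.. code-block:: python

   from fnmatch import fnmatchcase


   def should_alert(model: str, operation: str, failed: bool,
                    model_pattern: str = "data/*",
                    operations: tuple[str, ...] = ("Import",)) -> bool:
       """Return True if a failed transaction matches the alert filter.

       Defaults mirror the documented defaults (model data/*, operation
       Import). Treating the wildcard as a glob is an assumption; note that
       fnmatch's "*" also matches "/".
       """
       return (failed
               and fnmatchcase(model, model_pattern)
               and operation in operations)


   # Hypothetical model names, for illustration only.
   print(should_alert("data/ExampleModel", "Import", failed=True))     # True
   print(should_alert("device/example/Model", "Import", failed=True))  # False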

The trap context information (limited to 200 characters) contains:

* ``ID``: the transaction ID, as shown in the GUI, where further transaction details are available
* ``Action``: the transaction message, as shown in the GUI
* ``Detail``: the source of the resource (for an import, the source host)
* ``Hierarchy``: the friendly path of the resource if available; otherwise, the execution hierarchy of the transaction


Example: Import Fail

::

   2019-03-28 10:54:46 <UNKNOWN> 
   [UDP: [192.168.100.3]:31384->[192.168.100.25]:162]:
   DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (170158257) 19 days, 16:39:42.57
   SNMPv2-MIB::snmpTrapOID.0 = OID: DISMAN-EVENT-MIB::mteTriggerFired
   DISMAN-EVENT-MIB::mteHotTrigger.0 = STRING:
    Transaction Completed with Fail
   DISMAN-EVENT-MIB::mteHotContextName.0 = STRING:
    ID: 44967,
    Action: Import Call Manager,
    Detail: 192.168.100.15,
    Hierarchy: sys
   DISMAN-EVENT-MIB::mteHotValue.0 = INTEGER: 1
   SNMPv2-MIB::sysName.0 = STRING: VOSS

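A trap receiver may want these context fields as key/value pairs. The following
sketch assumes the context is a comma-separated list of ``Key: Value`` pairs, as in
the examples in this section, and that values contain no commas;
``parse_context`` is a hypothetical helper, not a VOSS API.

.. code-block:: python

   def parse_context(context: str) -> dict[str, str]:
       """Split a 'Key: Value, Key: Value' trap context into a dict.

       Each pair is split at its first ': ', so values that contain colons
       without a following space (such as times) are kept intact.
       """
       fields: dict[str, str] = {}
       for part in context.split(","):
           part = part.strip()  # also removes line breaks between fields
           if not part:
               continue
           key, sep, value = part.partition(": ")
           if not sep:
               raise ValueError(f"malformed field: {part!r}")
           fields[key] = value.strip()
       return fields


   fields = parse_context("ID: 44967, Action: Import Call Manager, "
                          "Detail: 192.168.100.15, Hierarchy: sys")
   print(fields["Action"])  # Import Call Manager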

Change Notification Feature (CNF)
.................................

CNF traps are sent when Change Notification Sync transactions
add or update instances
of the ``data/Alerts`` model.

The identifying alert string is:

::

  DISMAN-EVENT-MIB::mteHotTrigger.0 = STRING: Device Change Notification



The trap details provide the values of the following
``data/Alerts`` model attributes: ::

  alert_severity
  alert_category
  alert_timestamp
  alert_count
  alert_id
  alert_message
  alert_code


The trap context information (limited to 200 characters) contains:

* ``ID``: device host business key (``alert_id``)
* ``Code``: CNF alert code (``alert_code``)
* ``Occurrences``: number of occurrences
* ``Latest Occurrence``: timestamp of the latest occurrence (``alert_timestamp``)

Warning and Error Alert Codes
...............................

The following table lists the alert codes with their details and resolutions:

.. tabularcolumns:: |p{1.5cm}|p{6.5cm}|p{7cm}|

+-------+----------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Code  | Details                                                                                                                          | Resolution                                                                                                                                                                                 |
+=======+==================================================================================================================================+============================================================================================================================================================================================+
| 72051 | ERROR. Device connectivity failure                                                                                               | For connectivity checks, see *UC Apps Reachability* in the Advanced Configuration Guide and also the Platform Guide and Health Checks for Cluster Installations Guide.                     |
+-------+----------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 72052 | WARNING. Slow device connection: the roundtrip time (RTT) is greater than 400ms.                                                 | For latency checks, see *UC Apps Reachability* in the Advanced Configuration Guide and the Health Checks for Cluster Installations Guide.                                                  |
+-------+----------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 72053 | ERROR. Utilization approaching limit: if the maximum number of sessions during the interval exceeds 80% of the configured limit. | The threshold can be reached if many users are using the system at the same time, or by not logging out. For **Utilization %**, see *Login Sessions* in the  Advanced Configuration Guide. |
+-------+----------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 72054 | ERROR. Stuck in Queued: <n> transaction(s) 'Queued' for too long.                                                                | The error is raised if transactions are in *queued* state longer than configured maximum time. See *System Monitoring Configuration* in the  Advanced Configuration Guide.                 |
+-------+----------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 72055 | ERROR. Stuck in Processing: <n> transaction(s) 'Processing' for too long.                                                        | The error is raised if transactions are in *processing* state longer than configured maximum time. See *System Monitoring Configuration* in the  Advanced Configuration Guide.             |
+-------+----------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

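As an illustration of the thresholds in this table, the sketch below maps each
code to its severity and applies the 400 ms RTT rule (72052) and the 80%
utilization rule (72053). This is not VOSS code; the function names are
hypothetical, and integer arithmetic is used for the percentage check to avoid
floating-point edge cases.

.. code-block:: python

   # Severity and short description per alert code, taken from the table above.
   ALERT_CODES = {
       72051: ("ERROR", "Device connectivity failure"),
       72052: ("WARNING", "Slow device connection"),
       72053: ("ERROR", "Utilization approaching limit"),
       72054: ("ERROR", "Stuck in Queued"),
       72055: ("ERROR", "Stuck in Processing"),
   }

   RTT_LIMIT_MS = 400        # 72052: RTT greater than 400 ms
   UTILIZATION_PERCENT = 80  # 72053: more than 80% of the configured limit


   def is_slow_connection(rtt_ms: float) -> bool:
       """True when the round-trip time would qualify for code 72052."""
       return rtt_ms > RTT_LIMIT_MS


   def is_approaching_limit(max_sessions: int, session_limit: int) -> bool:
       """True when peak sessions exceed 80% of the configured limit (72053)."""
       return max_sessions * 100 > UTILIZATION_PERCENT * session_limit


   def describe(code: int) -> str:
       """Return a one-line description of an alert code."""
       severity, text = ALERT_CODES.get(code, ("UNKNOWN", "Unrecognized code"))
       return f"{code} {severity}: {text}"


   print(describe(72052))  # 72052 WARNING: Slow device connection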

Example: CNF alert

::

   2019-03-28 10:54:46 <UNKNOWN> 
   [UDP: [192.168.100.3]:31384->[192.168.100.25]:162]:
   DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (170158257) 19 days, 16:39:42.57
   SNMPv2-MIB::snmpTrapOID.0 = OID: DISMAN-EVENT-MIB::mteTriggerFired
   DISMAN-EVENT-MIB::mteHotTrigger.0 = STRING: Device Change Notification
   DISMAN-EVENT-MIB::mteHotContextName.0 = STRING:
    ID: 44967,
    Code: 100034,
    Occurrences: 1,
    Latest Occurrence: 2019-03-28 10:54:44Z
   DISMAN-EVENT-MIB::mteHotValue.0 = INTEGER: 1
   SNMPv2-MIB::sysName.0 = STRING: VOSS

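The mapping from ``data/Alerts`` attributes to the CNF trap context can be sketched
as a small formatter. Two details are assumptions, not documented behavior:
``Occurrences`` is taken from ``alert_count``, and a context longer than the
200-character limit is simply cut off. ``build_cnf_context`` is hypothetical.

.. code-block:: python

   MAX_CONTEXT_CHARS = 200  # documented trap context limit


   def build_cnf_context(alert: dict) -> str:
       """Build a CNF trap context string from a data/Alerts-style dict.

       Assumption: the Occurrences field comes from alert_count.
       """
       context = (f"ID: {alert['alert_id']}, "
                  f"Code: {alert['alert_code']}, "
                  f"Occurrences: {alert['alert_count']}, "
                  f"Latest Occurrence: {alert['alert_timestamp']}")
       # Assumption: longer strings are truncated at the limit.
       return context[:MAX_CONTEXT_CHARS]


   alert = {
       "alert_id": "44967",
       "alert_code": 100034,
       "alert_count": 1,
       "alert_timestamp": "2019-03-28 10:54:44Z",
   }
   print(build_cnf_context(alert))
   # ID: 44967, Code: 100034, Occurrences: 1, Latest Occurrence: 2019-03-28 10:54:44Z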

Session Limits
..............


SNMP traps are sent when session limits are exceeded.


For example, the default session limit for customer administrators is 10,
and a trap is sent if this limit is exceeded.
The default can be changed with the **voss session-limits** command.

.. note::

   Global session limits do not show a ``Hierarchy`` value in the message string.


::

   2019-03-28 10:54:46 <UNKNOWN> 
   [UDP: [192.168.100.3]:31384->[192.168.100.25]:162]:
   DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (170158257) 19 days, 16:39:42.57
   SNMPv2-MIB::snmpTrapOID.0 = OID: DISMAN-EVENT-MIB::mteTriggerFired
   DISMAN-EVENT-MIB::mteHotTrigger.0 = STRING: Customer Administration Session Limit Exceeded
   DISMAN-EVENT-MIB::mteHotContextName.0 = STRING:
    Limit: 10,
    Hierarchy: sys.hcs.Varidion.GSCorp
   DISMAN-EVENT-MIB::mteHotValue.0 = INTEGER: 1
   SNMPv2-MIB::sysName.0 = STRING: VOSS

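The context format above, including the note that global session limits carry no
``Hierarchy`` value, can be sketched as a small formatter. ``session_limit_context``
and ``limit_exceeded`` are hypothetical helpers; the rule that a trap fires only
when the session count goes above (not merely reaches) the limit follows the
wording "if it is exceeded" and is otherwise an assumption.

.. code-block:: python

   from typing import Optional


   def limit_exceeded(active_sessions: int, limit: int) -> bool:
       """Assumption: a trap fires once active sessions go above the limit."""
       return active_sessions > limit


   def session_limit_context(limit: int, hierarchy: Optional[str] = None) -> str:
       """Format a session-limit trap context.

       Global session limits have no hierarchy, so the Hierarchy field is
       omitted when `hierarchy` is None.
       """
       parts = [f"Limit: {limit}"]
       if hierarchy is not None:
           parts.append(f"Hierarchy: {hierarchy}")
       return ", ".join(parts)


   print(session_limit_context(10, "sys.hcs.Varidion.GSCorp"))
   # Limit: 10, Hierarchy: sys.hcs.Varidion.GSCorp
   print(session_limit_context(10))  # Limit: 10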


API Request Throttle
....................


SNMP traps are sent when API request throttle rates are exceeded.

Throttle rates are configured with:

  **voss throttle-rates type <administration|selfservice|user> requests <number of requests> unit <min|sec>**

A trap can therefore be triggered by the request limit of any of these types:

* Administration
* Self-service
* User-specific

Identifying strings, with Self-service as an example:

::

   DISMAN-EVENT-MIB::mteHotTrigger.0 = STRING: Selfservice Api Request Limit Exceeded
   DISMAN-EVENT-MIB::mteHotContextName.0 = STRING: Rate 20/min

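To illustrate what a limit such as ``Rate 20/min`` means, the sketch below
implements a simple fixed-window request counter. This is an assumption for
illustration only: this guide does not describe the algorithm VOSS uses to
throttle requests, and ``FixedWindowThrottle`` is a hypothetical class. The
``unit`` values mirror the ``min|sec`` options of **voss throttle-rates**.

.. code-block:: python

   import time
   from typing import Callable, Optional

   UNIT_SECONDS = {"sec": 1, "min": 60}


   class FixedWindowThrottle:
       """Allow at most `requests` calls per `unit` window (fixed windows)."""

       def __init__(self, requests: int, unit: str,
                    clock: Callable[[], float] = time.monotonic) -> None:
           self.limit = requests
           self.window = UNIT_SECONDS[unit]
           self.clock = clock
           self.window_start: Optional[float] = None
           self.count = 0

       def allow(self) -> bool:
           """Return True if the request is allowed, False if throttled."""
           now = self.clock()
           if self.window_start is None or now - self.window_start >= self.window:
               # Start a new window and reset the counter.
               self.window_start = now
               self.count = 0
           if self.count < self.limit:
               self.count += 1
               return True
           return False  # over the limit: the condition the trap reports


   fake_now = [0.0]  # controllable clock for the example
   throttle = FixedWindowThrottle(20, "min", clock=lambda: fake_now[0])
   results = [throttle.allow() for _ in range(21)]
   print(results.count(True), results.count(False))  # 20 1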

Total DB Index Size
...................


This trap is sent when the total database index size exceeds the configurable threshold (default: 50 GB).

Identifying strings and example:

::

   DISMAN-EVENT-MIB::mteHotTrigger.0 = STRING: DB Index Size Exceeded Threshold
   DISMAN-EVENT-MIB::mteHotContextName.0 = STRING: DB Index Size (60.00GB) exceeded threshold (50GB)

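The check behind this trap can be sketched as a comparison of the measured size
against the threshold, producing a context string in the format shown above (the
Total DB Size trap uses the same format). ``db_size_context`` is hypothetical, and
the sketch takes sizes already expressed in GB because this guide does not state
whether GB means 10^9 or 2^30 bytes.

.. code-block:: python

   from typing import Optional


   def db_size_context(label: str, size_gb: float,
                       threshold_gb: int) -> Optional[str]:
       """Return a trap context string if size_gb exceeds threshold_gb, else None."""
       if size_gb <= threshold_gb:
           return None
       return f"{label} ({size_gb:.2f}GB) exceeded threshold ({threshold_gb}GB)"


   print(db_size_context("DB Index Size", 60.0, 50))
   # DB Index Size (60.00GB) exceeded threshold (50GB)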


Total DB Size
.............


This trap is sent when the total database size exceeds the configurable threshold (default: 200 GB).

Identifying strings and example:

::

   DISMAN-EVENT-MIB::mteHotTrigger.0 = STRING: DB Size Exceeded Threshold
   DISMAN-EVENT-MIB::mteHotContextName.0 = STRING: DB Size (210.30GB) exceeded threshold (200GB)

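A trap receiver can recover the size and threshold from either database trap
context. The regular expression below is inferred from the two example context
strings in this guide and is an assumption; ``parse_db_context`` is hypothetical.

.. code-block:: python

   import re

   # Inferred from e.g. "DB Size (210.30GB) exceeded threshold (200GB)".
   DB_CONTEXT = re.compile(
       r"^(?P<label>DB (?:Index )?Size) \((?P<size>[\d.]+)GB\) "
       r"exceeded threshold \((?P<threshold>[\d.]+)GB\)$")


   def parse_db_context(context: str) -> tuple[str, float, float]:
       """Return (label, size_gb, threshold_gb) from a DB size trap context."""
       match = DB_CONTEXT.match(context.strip())
       if match is None:
           raise ValueError(f"unrecognized context: {context!r}")
       return match["label"], float(match["size"]), float(match["threshold"])


   print(parse_db_context("DB Size (210.30GB) exceeded threshold (200GB)"))
   # ('DB Size', 210.3, 200.0)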


Device Communications Concurrency Limit
.......................................

SNMP traps are sent when a connection to a device times out
while waiting on the device concurrency limit.

The current concurrency limits are:

* 8 concurrent requests to Unified CM
* 8 concurrent requests to Unity Connection

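The concurrency limit can be pictured as a counting semaphore with eight slots per
device type, where a request that cannot get a slot within a timeout fails (the
condition this trap reports). This is a minimal sketch using Python's
``threading.BoundedSemaphore``; the ``DeviceGate`` class, the
``ConcurrencyTimeout`` exception, and the timeout value are hypothetical, not
VOSS internals.

.. code-block:: python

   import threading
   from contextlib import contextmanager


   class ConcurrencyTimeout(Exception):
       """Raised when no slot frees up before the timeout (trap condition)."""


   class DeviceGate:
       """Limit concurrent requests to one device type."""

       def __init__(self, max_concurrent: int = 8, timeout_s: float = 30.0):
           self._slots = threading.BoundedSemaphore(max_concurrent)
           self._timeout_s = timeout_s  # hypothetical timeout value

       @contextmanager
       def request(self):
           # Wait for a free slot; give up after the timeout.
           if not self._slots.acquire(timeout=self._timeout_s):
               raise ConcurrencyTimeout("timed out waiting for concurrency limit")
           try:
               yield
           finally:
               self._slots.release()


   gate = DeviceGate(max_concurrent=8, timeout_s=0.1)
   with gate.request():
       print("request sent")  # placeholder for the real device call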

