SNMP Trap: Excessive Load

Identification

  • The originating IP address or hostname identifies the system generating the traps

  • The NMS is responsible for associating traps with each managed system, clearing alarms, and escalating to the relevant system operator

  • The trap OID is generic: the same OID is used for the various SNMP events monitored by the system

  • The SNMP system name is included as part of the variable binding to assist identification:

    .iso.org.dod.internet.mgmt.mib-2.system.sysName.0 = standalone
    
  • The following variable binding indicates that the load average threshold has been exceeded:

    .iso.org.dod.internet.mgmt.mib-2.dismanEventMIB.dismanEventMIBNotificationPrefix.dismanEventMIBNotificationObjects.mteHotTrigger.0 = ERROR: Excessive load.
    
  • The following variable bindings identify which time interval threshold has been exceeded:

    .iso.org.dod.internet.private.enterprises.ucdavis.laTable.laEntry.laNames.<LoadIdx> = <LoadError>
    .iso.org.dod.internet.private.enterprises.ucdavis.laTable.laEntry.laErrMessage.<LoadIdx> = <LoadMessage>
    
Load average interval   <LoadIdx>   <LoadError>   <LoadMessage>
1 minute                1           Load-1        1 min Load Average too high (= 2.52)
5 minute                2           Load-5        5 min Load Average too high (= 1.27)
15 minute               3           Load-15       15 min Load Average too high (= 1.27)
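
As a rough illustration of how an NMS-side script might use the table above, the following Python sketch takes one laErrMessage binding and returns the interval and the reported load average. The function name describe_load_binding and the returned dictionary keys are hypothetical, and the message pattern is assumed from the sample values shown in this document rather than from a formal specification.

```python
import re
from typing import Dict, Union

# Index-to-interval mapping taken from the table above.
LOAD_INTERVALS = {1: "1 minute", 2: "5 minute", 3: "15 minute"}

# Assumed message shape, e.g. "5 min Load Average too high (= 1.27)".
_LA_MESSAGE = re.compile(r"^(\d+) min Load Average too high \(= ([0-9.]+)\)$")


def describe_load_binding(oid: str, message: str) -> Dict[str, Union[int, str, float]]:
    """Return the interval and reported load for one laErrMessage binding.

    Only the final sub-identifier of `oid` (<LoadIdx>) is used, so the OID
    may be given in symbolic or numeric form.
    """
    load_idx = int(oid.rstrip(".").rsplit(".", 1)[1])
    match = _LA_MESSAGE.match(message.strip())
    if load_idx not in LOAD_INTERVALS or match is None:
        raise ValueError(f"unrecognised load binding: {oid} = {message!r}")
    return {
        "load_idx": load_idx,
        "interval": LOAD_INTERVALS[load_idx],
        "load_average": float(match.group(2)),
    }
```

For example, passing the OID ending in laErrMessage.2 with the message "5 min Load Average too high (= 1.27)" would yield the 5 minute interval and a load average of 1.27.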

Trap OID

.iso.org.dod.internet.mgmt.mib-2.dismanEventMIB.dismanEventMIBNotificationPrefix.dismanEventMIBNotifications.mteTriggerFired

Variable Bindings

  • .iso.org.dod.internet.mgmt.mib-2.system.sysUpTime.0 = 2 minutes (12065)
  • snmpTrapOID = mteTriggerFired
  • .iso.org.dod.internet.mgmt.mib-2.dismanEventMIB.dismanEventMIBNotificationPrefix.dismanEventMIBNotificationObjects.mteHotTrigger.0 = ERROR: Excessive load.
  • .iso.org.dod.internet.mgmt.mib-2.dismanEventMIB.dismanEventMIBNotificationPrefix.dismanEventMIBNotificationObjects.mteHotTargetName.0 =
  • .iso.org.dod.internet.mgmt.mib-2.dismanEventMIB.dismanEventMIBNotificationPrefix.dismanEventMIBNotificationObjects.mteHotContextName.0 =
  • .iso.org.dod.internet.mgmt.mib-2.dismanEventMIB.dismanEventMIBNotificationPrefix.dismanEventMIBNotificationObjects.mteHotOID.0 = laErrorFlag.1
  • .iso.org.dod.internet.mgmt.mib-2.dismanEventMIB.dismanEventMIBNotificationPrefix.dismanEventMIBNotificationObjects.mteHotValue.0 = 1
  • .iso.org.dod.internet.mgmt.mib-2.system.sysName.0 = standalone
  • .iso.org.dod.internet.private.enterprises.ucdavis.laTable.laEntry.laNames.1 = Load-1
  • .iso.org.dod.internet.private.enterprises.ucdavis.laTable.laEntry.laErrMessage.1 = 1 min Load Average too high (= 1.36)
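
Trap receivers that do not have the MIBs loaded log these bindings with numeric OIDs. The sketch below shows one way to translate the numeric forms back to the object names listed above. The helper name symbolic_name and the OID_NAMES table are hypothetical, and the table covers only the objects that appear in this trap.

```python
# Numeric prefixes for the objects carried in this trap
# (DISMAN-EVENT-MIB, SNMPv2-MIB and UCD-SNMP-MIB).
OID_NAMES = {
    "1.3.6.1.2.1.1.3": "sysUpTime",
    "1.3.6.1.2.1.1.5": "sysName",
    "1.3.6.1.6.3.1.1.4.1": "snmpTrapOID",
    "1.3.6.1.2.1.88.2.0.1": "mteTriggerFired",
    "1.3.6.1.2.1.88.2.1.1": "mteHotTrigger",
    "1.3.6.1.2.1.88.2.1.2": "mteHotTargetName",
    "1.3.6.1.2.1.88.2.1.3": "mteHotContextName",
    "1.3.6.1.2.1.88.2.1.4": "mteHotOID",
    "1.3.6.1.2.1.88.2.1.5": "mteHotValue",
    "1.3.6.1.4.1.2021.10.1.2": "laNames",
    "1.3.6.1.4.1.2021.10.1.100": "laErrorFlag",
    "1.3.6.1.4.1.2021.10.1.101": "laErrMessage",
}


def symbolic_name(oid: str) -> str:
    """Translate a numeric OID such as 'iso.3.6.1.2.1.1.5.0' to 'sysName.0'.

    OIDs that are not in OID_NAMES are returned unchanged.
    """
    numeric = oid.lstrip(".")
    if numeric.startswith("iso."):
        numeric = "1." + numeric[len("iso."):]
    # Try longer prefixes first so the most specific name wins.
    for prefix in sorted(OID_NAMES, key=len, reverse=True):
        if numeric == prefix or numeric.startswith(prefix + "."):
            return OID_NAMES[prefix] + numeric[len(prefix):]
    return oid
```

With this table, the mteHotOID value iso.3.6.1.4.1.2021.10.1.100.1 translates to laErrorFlag.1, matching the symbolic form shown in the list above.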

Severity

  • Critical:
    • ERROR: Excessive load
    • ERROR: Extremely high CPU usage
  • Urgent: WARNING: High CPU usage
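
Because the trap OID is the same for all of these events, an NMS rule would typically key on the mteHotTrigger string. A minimal sketch of that lookup, using only the three trigger strings listed above, might look as follows. The names SEVERITY_BY_TRIGGER and severity_for are hypothetical and do not correspond to any particular NMS product.

```python
from typing import Optional

# Severity per mteHotTrigger value, as listed above.
SEVERITY_BY_TRIGGER = {
    "ERROR: Excessive load": "Critical",
    "ERROR: Extremely high CPU usage": "Critical",
    "WARNING: High CPU usage": "Urgent",
}


def severity_for(hot_trigger: str) -> Optional[str]:
    """Return the severity for an mteHotTrigger value, or None if unknown.

    Surrounding whitespace and a trailing period are ignored, since the
    trigger text appears both with and without one in this document.
    """
    return SEVERITY_BY_TRIGGER.get(hot_trigger.strip().rstrip("."))
```

Returning None for unrecognised triggers leaves the decision about other events that share the same trap OID to the caller.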

Example: Critical

Mar 19 08:08:34 robot-sl snmptrapd[1234]:
2019-03-19 08:08:34 <UNKNOWN>
[UDP: [192.168.100.3]:20997->[192.168.100.25]:162]:
#012iso.3.6.1.2.1.1.3.0 = Timeticks: (6797884) 18:52:58.84
#011iso.3.6.1.6.3.1.1.4.1.0 = OID: iso.3.6.1.2.1.88.2.0.1
#011iso.3.6.1.2.1.88.2.1.1.0 = STRING: "ERROR: Excessive load"
#011iso.3.6.1.2.1.88.2.1.2.0 = ""
#011iso.3.6.1.2.1.88.2.1.3.0 = ""
#011iso.3.6.1.2.1.88.2.1.4.0 = OID: iso.3.6.1.4.1.2021.10.1.100.1
#011iso.3.6.1.2.1.88.2.1.5.0 = INTEGER: 1
#011iso.3.6.1.2.1.1.5.0 = STRING: "VOSS"
#011iso.3.6.1.4.1.2021.10.1.2.1 = STRING: "Load-1"
#011iso.3.6.1.4.1.2021.10.1.101.1 = STRING: "1 min Load Average too high (= 3.45)"

Mar 19 08:10:34 robot-sl snmptrapd[1234]:
2019-03-19 08:10:34 <UNKNOWN>
[UDP: [192.168.100.3]:49080->[192.168.100.25]:162]:
#012iso.3.6.1.2.1.1.3.0 = Timeticks: (6809885) 18:54:58.85
#011iso.3.6.1.6.3.1.1.4.1.0 = OID: iso.3.6.1.2.1.88.2.0.1
#011iso.3.6.1.2.1.88.2.1.1.0 = STRING: "ERROR: Excessive load"
#011iso.3.6.1.2.1.88.2.1.2.0 = ""
#011iso.3.6.1.2.1.88.2.1.3.0 = ""
#011iso.3.6.1.2.1.88.2.1.4.0 = OID: iso.3.6.1.4.1.2021.10.1.100.2
#011iso.3.6.1.2.1.88.2.1.5.0 = INTEGER: 1
#011iso.3.6.1.2.1.1.5.0 = STRING: "VOSS"
#011iso.3.6.1.4.1.2021.10.1.2.2 = STRING: "Load-5"
#011iso.3.6.1.4.1.2021.10.1.101.2 = STRING: "5 min Load Average too high (= 2.24)"

Mar 19 08:11:34 robot-sl snmptrapd[1234]:
2019-03-19 08:11:34 <UNKNOWN>
[UDP: [192.168.100.3]:47676->[192.168.100.25]:162]:
#012iso.3.6.1.2.1.1.3.0 = Timeticks: (6815886) 18:55:58.86
#011iso.3.6.1.6.3.1.1.4.1.0 = OID: iso.3.6.1.2.1.88.2.0.1
#011iso.3.6.1.2.1.88.2.1.1.0 = STRING: "ERROR: Excessive load"
#011iso.3.6.1.2.1.88.2.1.2.0 = ""
#011iso.3.6.1.2.1.88.2.1.3.0 = ""
#011iso.3.6.1.2.1.88.2.1.4.0 = OID: iso.3.6.1.4.1.2021.10.1.100.3
#011iso.3.6.1.2.1.88.2.1.5.0 = INTEGER: 1
#011iso.3.6.1.2.1.1.5.0 = STRING: "VOSS"
#011iso.3.6.1.4.1.2021.10.1.2.3 = STRING: "Load-15"
#011iso.3.6.1.4.1.2021.10.1.101.3 = STRING: "15 min Load Average too high (= 1.16)"

Example: Critical

Mar 19 08:12:14 robot-sl snmptrapd[1234]:
2019-03-19 08:12:14 <UNKNOWN>
[UDP: [192.168.100.3]:21137->[192.168.100.25]:162]:
#012iso.3.6.1.2.1.1.3.0 = Timeticks: (6819828) 18:56:38.28
#011iso.3.6.1.6.3.1.1.4.1.0 = OID: iso.3.6.1.2.1.88.2.0.1
#011iso.3.6.1.2.1.88.2.1.1.0 = STRING: "ERROR: Extremely high CPU usage"
#011iso.3.6.1.2.1.88.2.1.3.0 = STRING: "CPU activity:  4.14, 2.78, 1.29"
#011iso.3.6.1.2.1.88.2.1.5.0 = INTEGER: 1
#011iso.3.6.1.2.1.1.5.0 = STRING: "VOSS"

Example: Urgent

Mar 20 12:46:04 robot-sl snmptrapd[1214]:
2019-03-20 12:46:04 <UNKNOWN>
[UDP: [192.168.100.3]:48439->[192.168.100.25]:162]:
#012iso.3.6.1.2.1.1.3.0 = Timeticks: (114032) 0:19:00.32
#011iso.3.6.1.6.3.1.1.4.1.0 = OID: iso.3.6.1.2.1.88.2.0.1
#011iso.3.6.1.2.1.88.2.1.1.0 = STRING: "WARNING: High CPU usage"
#011iso.3.6.1.2.1.88.2.1.3.0 = STRING: "CPU activity:  3.41, 2.56, 1.28"
#011iso.3.6.1.2.1.88.2.1.5.0 = INTEGER: 1
#011iso.3.6.1.2.1.1.5.0 = STRING: "VOSS"
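
In the log excerpts above, #012 and #011 appear to be the syslog daemon's octal escapes for the newline and tab characters that snmptrapd writes between variable bindings. Under that assumption, the following sketch splits one logged trap into a dictionary of OID to value. The function name parse_varbinds is hypothetical, and the parsing rules are inferred only from the sample entries in this document, so other snmptrapd output formats may need adjustments.

```python
import re
from typing import Dict

# "#012" (newline) and "#011" (tab) separate the header and each binding.
_SEPARATOR = re.compile(r"#01[12]")

# One binding: "<oid> = [<TYPE>: ]<value>"; the type tag is optional
# because empty strings are logged as a bare "".
_BINDING = re.compile(r"^(?P<oid>\S+) = (?:(?P<type>[A-Za-z]+): )?(?P<value>.*)$")


def parse_varbinds(entry: str) -> Dict[str, str]:
    """Map each OID in one logged trap entry to its value as a string."""
    bindings = {}
    # The first chunk is the syslog/transport header, so it is skipped.
    for chunk in _SEPARATOR.split(entry)[1:]:
        match = _BINDING.match(chunk.strip())
        if match:
            bindings[match.group("oid")] = match.group("value").strip('"')
    return bindings
```

The value stored under iso.3.6.1.2.1.88.2.1.1.0 (mteHotTrigger) can then be compared against the trigger strings listed under Severity.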