SNMP Trap: Excessive Load
Identification
The originating IP address or hostname identifies the system that generated the trap.
The NMS is responsible for associating traps with each managed system, clearing alarms, and escalating to the relevant system operator.
The trap OID is generic: the same OID is used for the various SNMP events that the system monitors.
The SNMP system name is included as a variable binding to assist identification:
.iso.org.dod.internet.mgmt.mib-2.system.sysName.0 = standalone
The following variable binding can be used to determine that the load average threshold has been exceeded.
.iso.org.dod.internet.mgmt.mib-2.dismanEventMIB.dismanEventMIBNotificationPrefix.dismanEventMIBNotificationObjects.mteHotTrigger.0 = ERROR: Excessive load.
The following variable bindings can be used to further diagnose which time interval threshold has been exceeded:
.iso.org.dod.internet.private.enterprises.ucdavis.laTable.laEntry.laNames.<LoadIdx> = <LoadError>
.iso.org.dod.internet.private.enterprises.ucdavis.laTable.laEntry.laErrMessage.<LoadIdx> = <LoadMessage>
Load average interval | <LoadIdx> | <LoadError> | <LoadMessage> |
---|---|---|---|
1 minute | 1 | Load-1 | 1 min Load Average too high (= 2.52) |
5 minute | 2 | Load-5 | 5 min Load Average too high (= 1.27) |
15 minute | 3 | Load-15 | 15 min Load Average too high (= 1.27) |
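As an illustration of how an NMS-side script might use the table above, the following minimal Python sketch extracts the interval and the reported load value from a `laErrMessage` string and checks it against the `<LoadIdx>` suffix. The function name `parse_load_message` and the exact message pattern are assumptions inferred from the sample values in the table, not something the system guarantees.

```python
import re

# <LoadIdx> suffix -> load average interval in minutes, as listed in the table.
LOAD_INTERVAL_MINUTES = {1: 1, 2: 5, 3: 15}

# Assumed message shape, e.g. "1 min Load Average too high (= 2.52)".
_LA_MESSAGE = re.compile(r"^(\d+) min Load Average too high \(= ([0-9.]+)\)$")


def parse_load_message(load_idx, message):
    """Return (interval_minutes, load_value) for one laErrMessage string.

    Raises ValueError if the message does not match the assumed pattern or
    if its interval disagrees with the <LoadIdx> suffix of the OID.
    """
    match = _LA_MESSAGE.match(message.strip())
    if match is None:
        raise ValueError(f"unrecognised laErrMessage: {message!r}")
    minutes = int(match.group(1))
    if LOAD_INTERVAL_MINUTES.get(load_idx) != minutes:
        raise ValueError(f"index {load_idx} does not match a {minutes} min message")
    return minutes, float(match.group(2))
```

For example, `parse_load_message(1, "1 min Load Average too high (= 2.52)")` returns `(1, 2.52)`.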
Trap OID
.iso.org.dod.internet.mgmt.mib-2.dismanEventMIB.dismanEventMIBNotificationPrefix.dismanEventMIBNotifications.mteTriggerFired
Variable Bindings
- .iso.org.dod.internet.mgmt.mib-2.system.sysUpTime.0 = 2 minutes (12065)
- snmpTrapOID = mteTriggerFired
- .iso.org.dod.internet.mgmt.mib-2.dismanEventMIB.dismanEventMIBNotificationPrefix.dismanEventMIBNotificationObjects.mteHotTrigger.0 = ERROR: Excessive load.
- .iso.org.dod.internet.mgmt.mib-2.dismanEventMIB.dismanEventMIBNotificationPrefix.dismanEventMIBNotificationObjects.mteHotTargetName.0 =
- .iso.org.dod.internet.mgmt.mib-2.dismanEventMIB.dismanEventMIBNotificationPrefix.dismanEventMIBNotificationObjects.mteHotContextName.0 =
- .iso.org.dod.internet.mgmt.mib-2.dismanEventMIB.dismanEventMIBNotificationPrefix.dismanEventMIBNotificationObjects.mteHotOID.0 = laErrorFlag.1
- .iso.org.dod.internet.mgmt.mib-2.dismanEventMIB.dismanEventMIBNotificationPrefix.dismanEventMIBNotificationObjects.mteHotValue.0 = 1
- .iso.org.dod.internet.mgmt.mib-2.system.sysName.0 = standalone
- .iso.org.dod.internet.private.enterprises.ucdavis.laTable.laEntry.laNames.1 = Load-1
- .iso.org.dod.internet.private.enterprises.ucdavis.laTable.laEntry.laErrMessage.1 = 1 min Load Average too high (= 1.36)
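A trap receiver that has not loaded the relevant MIBs prints these bindings with numeric OIDs rather than the symbolic names listed above. The sketch below shows one way to map between the two. The numeric prefixes are the standard assignments from SNMPv2-MIB, DISMAN-EVENT-MIB, and UCD-SNMP-MIB; the `OID_NAMES` table and the `symbolic_name` helper are illustrative names, not part of any product API.

```python
# Numeric OID prefixes (in the "iso.3.6.1..." form printed without MIBs)
# mapped to the short symbolic names used in the variable binding list.
OID_NAMES = {
    "iso.3.6.1.2.1.1.3": "sysUpTime",
    "iso.3.6.1.2.1.1.5": "sysName",
    "iso.3.6.1.6.3.1.1.4.1": "snmpTrapOID",
    "iso.3.6.1.2.1.88.2.0.1": "mteTriggerFired",
    "iso.3.6.1.2.1.88.2.1.1": "mteHotTrigger",
    "iso.3.6.1.2.1.88.2.1.2": "mteHotTargetName",
    "iso.3.6.1.2.1.88.2.1.3": "mteHotContextName",
    "iso.3.6.1.2.1.88.2.1.4": "mteHotOID",
    "iso.3.6.1.2.1.88.2.1.5": "mteHotValue",
    "iso.3.6.1.4.1.2021.10.1.2": "laNames",
    "iso.3.6.1.4.1.2021.10.1.100": "laErrorFlag",
    "iso.3.6.1.4.1.2021.10.1.101": "laErrMessage",
}


def symbolic_name(numeric_oid):
    """Replace the longest known numeric prefix with its symbolic name.

    The instance suffix (for example ".0" or ".1") is kept. OIDs with no
    known prefix are returned unchanged.
    """
    best = None
    for prefix in OID_NAMES:
        # Match whole sub-identifiers only, so "...100" never matches "...1000".
        if numeric_oid == prefix or numeric_oid.startswith(prefix + "."):
            if best is None or len(prefix) > len(best):
                best = prefix
    if best is None:
        return numeric_oid
    return OID_NAMES[best] + numeric_oid[len(best):]
```

With this table, `symbolic_name("iso.3.6.1.4.1.2021.10.1.100.1")` yields `laErrorFlag.1`, which corresponds to the `mteHotOID.0` value shown in the list.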
Severity:
- Critical:
  - ERROR: Excessive load
  - ERROR: Extremely high CPU usage
- Urgent:
  - WARNING: High CPU usage
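Because the trap OID is the same for all of these events, an NMS rule has to derive the severity from the `mteHotTrigger.0` string. A minimal Python sketch of that lookup follows. The names `SEVERITY_BY_TRIGGER` and `classify_trigger` are hypothetical, and the `"Unknown"` fallback for unlisted strings is an assumption rather than documented behaviour.

```python
# mteHotTrigger.0 value -> severity, taken from the severity list above.
SEVERITY_BY_TRIGGER = {
    "ERROR: Excessive load": "Critical",
    "ERROR: Extremely high CPU usage": "Critical",
    "WARNING: High CPU usage": "Urgent",
}


def classify_trigger(mte_hot_trigger):
    """Return the severity for an mteHotTrigger.0 string.

    Surrounding whitespace and a trailing period are ignored, since the
    trigger text appears both with and without a final "." in this section.
    Strings that are not listed map to "Unknown" (an assumed fallback).
    """
    key = mte_hot_trigger.strip().rstrip(".")
    return SEVERITY_BY_TRIGGER.get(key, "Unknown")
```

For instance, `classify_trigger("ERROR: Excessive load.")` returns `"Critical"`, and `classify_trigger("WARNING: High CPU usage")` returns `"Urgent"`.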
Example: Critical
Mar 19 08:08:34 robot-sl snmptrapd[1234]:
2019-03-19 08:08:34 <UNKNOWN>
[UDP: [192.168.100.3]:20997->[192.168.100.25]:162]:
#012iso.3.6.1.2.1.1.3.0 = Timeticks: (6797884) 18:52:58.84
#011iso.3.6.1.6.3.1.1.4.1.0 = OID: iso.3.6.1.2.1.88.2.0.1
#011iso.3.6.1.2.1.88.2.1.1.0 = STRING: "ERROR: Excessive load"
#011iso.3.6.1.2.1.88.2.1.2.0 = ""
#011iso.3.6.1.2.1.88.2.1.3.0 = ""
#011iso.3.6.1.2.1.88.2.1.4.0 = OID: iso.3.6.1.4.1.2021.10.1.100.1
#011iso.3.6.1.2.1.88.2.1.5.0 = INTEGER: 1
#011iso.3.6.1.2.1.1.5.0 = STRING: "VOSS"
#011iso.3.6.1.4.1.2021.10.1.2.1 = STRING: "Load-1"
#011iso.3.6.1.4.1.2021.10.1.101.1 = STRING: "1 min Load Average too high (= 3.45)"
Mar 19 08:10:34 robot-sl snmptrapd[1234]:
2019-03-19 08:10:34 <UNKNOWN>
[UDP: [192.168.100.3]:49080->[192.168.100.25]:162]:
#012iso.3.6.1.2.1.1.3.0 = Timeticks: (6809885) 18:54:58.85
#011iso.3.6.1.6.3.1.1.4.1.0 = OID: iso.3.6.1.2.1.88.2.0.1
#011iso.3.6.1.2.1.88.2.1.1.0 = STRING: "ERROR: Excessive load"
#011iso.3.6.1.2.1.88.2.1.2.0 = ""
#011iso.3.6.1.2.1.88.2.1.3.0 = ""
#011iso.3.6.1.2.1.88.2.1.4.0 = OID: iso.3.6.1.4.1.2021.10.1.100.2
#011iso.3.6.1.2.1.88.2.1.5.0 = INTEGER: 1
#011iso.3.6.1.2.1.1.5.0 = STRING: "VOSS"
#011iso.3.6.1.4.1.2021.10.1.2.2 = STRING: "Load-5"
#011iso.3.6.1.4.1.2021.10.1.101.2 = STRING: "5 min Load Average too high (= 2.24)"
Mar 19 08:11:34 robot-sl snmptrapd[1234]:
2019-03-19 08:11:34 <UNKNOWN>
[UDP: [192.168.100.3]:47676->[192.168.100.25]:162]:
#012iso.3.6.1.2.1.1.3.0 = Timeticks: (6815886) 18:55:58.86
#011iso.3.6.1.6.3.1.1.4.1.0 = OID: iso.3.6.1.2.1.88.2.0.1
#011iso.3.6.1.2.1.88.2.1.1.0 = STRING: "ERROR: Excessive load"
#011iso.3.6.1.2.1.88.2.1.2.0 = ""
#011iso.3.6.1.2.1.88.2.1.3.0 = ""
#011iso.3.6.1.2.1.88.2.1.4.0 = OID: iso.3.6.1.4.1.2021.10.1.100.3
#011iso.3.6.1.2.1.88.2.1.5.0 = INTEGER: 1
#011iso.3.6.1.2.1.1.5.0 = STRING: "VOSS"
#011iso.3.6.1.4.1.2021.10.1.2.3 = STRING: "Load-15"
#011iso.3.6.1.4.1.2021.10.1.101.3 = STRING: "15 min Load Average too high (= 1.16)"
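In entries like these, `#012` and `#011` are the syslog daemon's octal escapes for a newline and a tab, which separate the header from the variable bindings and the bindings from each other. The following Python sketch splits one entry into an OID-to-value dictionary. It assumes each entry reaches the script as a single string in the format shown above; `parse_varbinds` is an illustrative helper, not part of snmptrapd.

```python
import re

# Syslog escapes control characters as '#' plus three octal digits:
# "#012" is a newline and "#011" is a tab.
_SEPARATOR = re.compile(r"#01[12]")


def parse_varbinds(entry):
    """Return {oid: value} for the variable bindings in one logged trap."""
    varbinds = {}
    # The text before the first separator is the timestamp/transport header.
    for chunk in _SEPARATOR.split(entry)[1:]:
        oid, sep, value = chunk.strip().partition(" = ")
        if not sep:
            continue  # not an "OID = value" pair
        # Drop a leading type tag such as "STRING: " or "INTEGER: ".
        type_tag, colon, rest = value.partition(": ")
        if colon and type_tag.isalpha():
            value = rest
        # Remove the double quotes that surround string values.
        varbinds[oid] = value.strip().strip('"')
    return varbinds
```

Applied to the first entry above, the result maps `iso.3.6.1.2.1.88.2.1.1.0` to `ERROR: Excessive load` and `iso.3.6.1.4.1.2021.10.1.2.1` to `Load-1`. Empty bindings such as `iso.3.6.1.2.1.88.2.1.2.0` map to an empty string.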
Example: Critical
Mar 19 08:12:14 robot-sl snmptrapd[1234]:
2019-03-19 08:12:14 <UNKNOWN>
[UDP: [192.168.100.3]:21137->[192.168.100.25]:162]:
#012iso.3.6.1.2.1.1.3.0 = Timeticks: (6819828) 18:56:38.28
#011iso.3.6.1.6.3.1.1.4.1.0 = OID: iso.3.6.1.2.1.88.2.0.1
#011iso.3.6.1.2.1.88.2.1.1.0 = STRING: "ERROR: Extremely high CPU usage"
#011iso.3.6.1.2.1.88.2.1.3.0 = STRING: "CPU activity: 4.14, 2.78, 1.29"
#011iso.3.6.1.2.1.88.2.1.5.0 = INTEGER: 1
#011iso.3.6.1.2.1.1.5.0 = STRING: "VOSS"
Example: Urgent
Mar 20 12:46:04 robot-sl snmptrapd[1214]:
2019-03-20 12:46:04 <UNKNOWN>
[UDP: [192.168.100.3]:48439->[192.168.100.25]:162]:
#012iso.3.6.1.2.1.1.3.0 = Timeticks: (114032) 0:19:00.32
#011iso.3.6.1.6.3.1.1.4.1.0 = OID: iso.3.6.1.2.1.88.2.0.1
#011iso.3.6.1.2.1.88.2.1.1.0 = STRING: "WARNING: High CPU usage"
#011iso.3.6.1.2.1.88.2.1.3.0 = STRING: "CPU activity: 3.41, 2.56, 1.28"
#011iso.3.6.1.2.1.88.2.1.5.0 = INTEGER: 1
#011iso.3.6.1.2.1.1.5.0 = STRING: "VOSS"