The initial step is monitoring device and link availability. Tools such as CiscoWorks and HP OpenView discover the network topology and report connectivity problems. SNMP polling of router interfaces and the Cisco Discovery Protocol (CDP) Management Information Base (MIB), plus syslog messages and SNMP notifications for linkUp/linkDown, are useful techniques for detecting connectivity issues. Because availability is part of fault management, it is excluded in this case.
The next question relates to the monitoring of device and link performance. The NMS can monitor one aspect of device performance by polling device CPU utilization from the CISCO-PROCESS-MIB. However, instead of regular SNMP polling to discover a potential CPU spike, the router can monitor the local CPU with RMON event and alarm mechanisms and send an SNMP notification in case of a threshold violation. A typical example is the cpmCPUTotal5sec.1 CPU MIB variable (for the first CPU instance on the router), monitored every 20 seconds, for a rising threshold of 90% and a falling threshold of 60%:
Router(config)#rmon alarm 1 1.3.6.1.4.1.9.9.109.1.1.1.1.3.1 20 absolute rising-threshold 90 1 falling-threshold 60 2 owner me
Router(config)#rmon event 1 log trap public description "cpu busy" owner sysadmin
Router(config)#rmon event 2 log description "cpu not too busy" owner sysadmin
Different Cisco IOS trains and platforms interpret the MIB name to a different depth. For example, a 7200 router running 12.4(11)T would not understand cpmCPUTotal5sec.1 or cpmCPUTotalEntry.3.1. You would have to enter cpmCPUTotalTable.1.3.1. To make sure that the command always works, you should always use the full numeric OID.
The cpmCPUTotal5sec CPU MIB variable has been deprecated in favor of cpmCPUTotal5secRev, which adds the percentage as units. cpmCPUTotal5secRev, in turn, is deprecated in favor of cpmCPUTotalMonIntervalValue, which is the overall CPU busy percentage in the last cpmCPUMonInterval period. Before choosing one of these MIB variables, check which of them are supported on your platforms. All of them are part of the CISCO-PROCESS-MIB.
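When the rising threshold is crossed, event 1 fires, and the router sends both an SNMP notification and a syslog message. When the falling threshold is crossed, event 2 fires, and the router sends only a syslog message, because event 2 is configured with the log keyword but without the trap keyword.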
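The interplay of the rising and falling thresholds can be illustrated with a short simulation. The following Python sketch is not the router's implementation; it only models the hysteresis described for RMON alarms, where a rising event is not repeated until the value has dropped to the falling threshold. The class name, the sample values, and the simplification that the alarm starts armed for the rising direction only are assumptions made for illustration.

```python
class RmonAlarmSketch:
    """Simplified model of RMON rising/falling threshold hysteresis.

    Assumption: the start-up alarm behavior is ignored, and the CPU is
    taken to start below the rising threshold.
    """

    def __init__(self, rising_threshold, rising_event,
                 falling_threshold, falling_event):
        self.rising_threshold = rising_threshold
        self.rising_event = rising_event
        self.falling_threshold = falling_threshold
        self.falling_event = falling_event
        self.armed_rising = True    # next rising crossing fires an event
        self.armed_falling = False  # armed only after a rising event

    def sample(self, value):
        """Return the event number fired by this sample, or None."""
        if self.armed_rising and value >= self.rising_threshold:
            self.armed_rising = False
            self.armed_falling = True
            return self.rising_event
        if self.armed_falling and value <= self.falling_threshold:
            self.armed_falling = False
            self.armed_rising = True
            return self.falling_event
        return None


# Thresholds and event numbers taken from the rmon alarm command above.
alarm = RmonAlarmSketch(rising_threshold=90, rising_event=1,
                        falling_threshold=60, falling_event=2)

# Hypothetical CPU samples (percent), one per 20-second interval.
cpu_samples = [30, 95, 97, 70, 55, 40, 91]
events = [alarm.sample(value) for value in cpu_samples]
print(events)
```

In this run, the second sample (95) fires event 1, the sample of 97 fires nothing because the alarm has not yet been rearmed, and the drop to 55 fires event 2. The gap between the two thresholds is what keeps a CPU hovering around 90 percent from generating a notification at every sample.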
Next to the example of CPU utilization, good practice leads to monitoring other MIB variables, either by direct polling or using RMON event/alarm monitoring. For example, leverage the CISCO-MEMORY-POOL-MIB or the CISCO-ENHANCED-MEMPOOL-MIB for memory monitoring and the CISCO-ENVMON-MIB for environment monitoring.
For link performance, an NMS station can poll the interface statistics, such as ifInOctets and ifOutOctets, for relevant links in the network. These typically are the interfaces toward the Internet, between different locations, or toward the data center, as shown in Figure 13-1. If high-capacity counters such as ifHCInOctets and ifHCOutOctets are available, they should be used, because the 64-bit counters avoid the problem of Counter32 values wrapping too quickly on fast links. Figure 13-2 shows the CiscoView application, which graphs the interface statistics, such as utilization, traffic types, and errors.
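To see why counter wrapping matters, consider how long an octet counter takes to wrap on a fully loaded link. The following Python sketch computes that time. The function name is hypothetical, and the calculation assumes the worst case of sustained traffic at full line rate in one direction.

```python
def counter_wrap_seconds(link_bps, counter_bits=32):
    """Seconds until an octet counter wraps at sustained line rate."""
    if link_bps <= 0:
        raise ValueError("link speed must be positive")
    max_octets = 2 ** counter_bits
    return max_octets * 8 / link_bps


# Counter32 (ifInOctets/ifOutOctets) at common link speeds.
for name, bps in [("T1", 1_544_000),
                  ("Fast Ethernet", 100_000_000),
                  ("Gigabit Ethernet", 1_000_000_000)]:
    print(f"{name}: Counter32 wraps after "
          f"{counter_wrap_seconds(bps):.1f} seconds")

# Counter64 (ifHCInOctets/ifHCOutOctets) on a 10-Gbps link, in years.
years = counter_wrap_seconds(10_000_000_000, counter_bits=64) / (365 * 86400)
print(f"10 Gbps: Counter64 wraps after about {years:.0f} years")
```

If the polling interval is longer than the wrap time, the NMS cannot tell how many times the counter wrapped between two polls, so the computed traffic volume is too low.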
Next is the question "When should a link be upgraded?" Because no MIB variable reports link utilization directly, it must be calculated by relating the interface statistics to the interface bandwidth. The interface bandwidth is available from the ifSpeed or ifHighSpeed variables in the IF-MIB, which contain the bandwidth value for each interface.
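The calculation can be sketched in a few lines of Python. This is a minimal example, not a complete NMS function: the function name and the sample counter values are hypothetical, the formula is the common one that relates the octet delta between two polls to the interface speed, and only a single counter wrap between polls is handled.

```python
def link_utilization_percent(octets_start, octets_end, interval_seconds,
                             if_speed_bps, counter_bits=32):
    """Utilization of one direction of a link between two counter polls.

    octets_start, octets_end: two readings of ifInOctets or ifOutOctets
    interval_seconds: time between the two readings
    if_speed_bps: ifSpeed in bits per second (ifHighSpeed is expressed in
                  units of 1,000,000 bits per second and must be scaled)
    """
    if interval_seconds <= 0 or if_speed_bps <= 0:
        raise ValueError("interval and speed must be positive")
    # The modulo handles one counter wrap between the two readings.
    delta_octets = (octets_end - octets_start) % (2 ** counter_bits)
    return delta_octets * 8 * 100 / (interval_seconds * if_speed_bps)


# Hypothetical readings on a T1 serial link, polled 60 seconds apart.
utilization = link_utilization_percent(
    octets_start=1_000, octets_end=5_791_000,
    interval_seconds=60, if_speed_bps=1_544_000)
print(f"{utilization:.1f}%")
```

The result is only as accurate as the bandwidth value: if ifSpeed does not reflect the real circuit speed, the utilization percentage is wrong by the same ratio.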
For each interface on the router and switch, the bandwidth is set by default. However, the network administrator should ensure that the default value corresponds to the actual physical speed and configure the bandwidth if appropriate. A typical example is the serial interface bandwidth, which is set by default to 1.544 Mbps, the value of a T1 circuit.
Although the interface statistics and link utilization graphs are useful, they suffer from one drawback: the NMS application needs to poll them regularly to extract information. From a graph, either a continuous trending or a threshold violation would lead to a link upgrade.
An alternative to constant polling of performance parameters is to use the EXPRESSION-MIB and EVENT-MIB or RMON-MIB. A new "link utilization" MIB variable is created with the EXPRESSION-MIB, and whenever this value breaches a threshold, the EVENT-MIB or RMON-MIB reports an alarm. The full example is provided in Chapter 4, "SNMP and MIBs."
Relying only on SNMP notifications risks missing important events. UDP is unreliable, and failed network elements might not be able to send a notification, especially during power outages. Although SNMP notifications sent as informs improve reliability through an acknowledgment handshake, they are still transported over UDP, and fully reliable informs would require infinite message queues at the network elements. Therefore, current best practice combines polling of the router's reachability with local device monitoring that sends SNMP notifications.
Another alternative to constant SNMP polling of performance parameters from the NMS application every couple of minutes is leveraging the CISCO-DATA-COLLECTION-MIB mechanism. This MIB can store multiple MIB values locally, enabling the NMS server to retrieve data covering longer intervals in a single bulk upload. Chapter 4 explains the CISCO-DATA-COLLECTION-MIB in detail.
The following example collects the ifInOctets and ifOutOctets variables for the Serial0/0 and FastEthernet1/0 interfaces every minute. The results are stored at the router in a file that is transferred every 5 minutes to the ftpserver host via FTP. The username is ftpuser, with a password of ftppassword:
Router(config)# snmp mib bulkstat object-list ifmib
Router(config-bulk-objects)# add ifInOctets
Router(config-bulk-objects)# add ifOutOctets
Router(config-bulk-objects)# exit
Router(config)# snmp mib bulkstat schema interface-stats-fasteth0
Router(config-bulk-sc)# object-list ifmib
Router(config-bulk-sc)# poll-interval 1
Router(config-bulk-sc)# instance exact interface FastEthernet1/0
Router(config-bulk-sc)# exit
Router(config)# snmp mib bulkstat schema interface-stats-serial0
Router(config-bulk-sc)# object-list ifmib
Router(config-bulk-sc)# poll-interval 1
Router(config-bulk-sc)# instance exact interface Serial0/0
Router(config-bulk-sc)# exit
Router(config)# snmp mib bulkstat transfer interface-stats-transfer
Router(config-bulk-tr)# schema interface-stats-fasteth0
Router(config-bulk-tr)# schema interface-stats-serial0
Router(config-bulk-tr)# url primary ftp://ftpuser:ftppassword@ftpserver/tmp
Router(config-bulk-tr)# format schemaASCII
Router(config-bulk-tr)# transfer-interval 5
Router(config-bulk-tr)# retain 30
Router(config-bulk-tr)# retry 5
Router(config-bulk-tr)# buffer-size 1024
Router(config-bulk-tr)# enable
The following output is the display from the exported file:
Schema-def interface-stats-fasteth0 "%u, %s ,%s ,%u, %u" epochtime ifDescr instanceOID ifInOctets ifOutOctets
Schema-def interface-stats-serial0 "%u, %s ,%s ,%u, %u" epochtime ifDescr instanceOID ifInOctets ifOutOctets
Schema-def GLOBAL "%s, %s, %u, %u, %u, %u, %u" hostname date timeofday sysuptime cpu5min cpu1min cpu5sec
interface-stats-fasteth0: 980626825, FastEthernet1/0, .9, 167878375, 681136794
interface-stats-serial0: 980626825, Serial0/0, .1, 45228030, 93252495
interface-stats-fasteth0: 980626885, FastEthernet1/0, .9, 167923745, 681142145
interface-stats-serial0: 980626885, Serial0/0, .1, 45228500, 93253512
interface-stats-fasteth0: 980626945, FastEthernet1/0, .9, 167972581, 681143070
interface-stats-serial0: 980626945, Serial0/0, .1, 45228954, 93254513
interface-stats-fasteth0: 980627005, FastEthernet1/0, .9, 168024079, 681149575
interface-stats-serial0: 980627005, Serial0/0, .1, 45229424, 93255530
Global: <Router>, 20010127, 202425, 5866517, 1%, 0%, 0%
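On the NMS side, the exported file must be parsed before the counters can be turned into rates. The following Python sketch shows one way to do this, assuming the file keeps the layout displayed above (one record per line, with a schema name, a colon, and five comma-separated fields). The function names are hypothetical, and only the first two polling cycles of the output are embedded as sample data.

```python
from collections import defaultdict

SAMPLE = """\
interface-stats-fasteth0: 980626825, FastEthernet1/0, .9, 167878375, 681136794
interface-stats-serial0: 980626825, Serial0/0, .1, 45228030, 93252495
interface-stats-fasteth0: 980626885, FastEthernet1/0, .9, 167923745, 681142145
interface-stats-serial0: 980626885, Serial0/0, .1, 45228500, 93253512
Global: <Router>, 20010127, 202425, 5866517, 1%, 0%, 0%
"""


def parse_records(text):
    """Map ifDescr to a list of (epochtime, ifInOctets, ifOutOctets)."""
    records = defaultdict(list)
    for line in text.splitlines():
        _schema, sep, payload = line.partition(":")
        fields = [field.strip() for field in payload.split(",")]
        # Skip Schema-def lines, the Global line, and anything malformed.
        if not sep or len(fields) != 5 or not fields[0].isdigit():
            continue
        epoch, descr, _instance, in_octets, out_octets = fields
        records[descr].append((int(epoch), int(in_octets), int(out_octets)))
    return records


def input_rates_bps(records):
    """Average input rate in bits per second between consecutive samples."""
    rates = {}
    for descr, samples in records.items():
        ordered = sorted(samples)
        rates[descr] = [
            (t1, (in1 - in0) * 8 / (t1 - t0))
            for (t0, in0, _), (t1, in1, _) in zip(ordered, ordered[1:])
            if t1 > t0
        ]
    return rates


rates = input_rates_bps(parse_records(SAMPLE))
for descr, series in rates.items():
    for epoch, bps in series:
        print(f"{descr} at {epoch}: {bps:.1f} bps inbound")
```

For brevity, the sketch ignores counter wraps and the output direction. Both can be added in the same way as in a polling-based calculation.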
Finally, the following two CLI commands complete the solution by sending SNMP notifications when a transfer attempt occurs or when the data collection could not be carried out successfully (due to insufficient memory, for example):
Router(config)# snmp-server enable traps bulkstat collection
Router(config)# snmp-server enable traps bulkstat transfer