A good illustration for security management is a jigsaw puzzle. It consists of many different pieces that fit together perfectly when combined correctly, and the overall result depends on every single piece. If one piece is missing, the picture looks incomplete. Security solutions are also a combination of multiple features, functions, and applications. This chapter identifies the accounting and performance management pieces of the security puzzle and puts the magnifying glass on them. Especially in large networks, the potential for attacks exists at multiple points, so it is suggested that you develop a multifaceted defense approach. This led Cisco to develop a six-stage security operations model, as shown in Figure 16-2. This framework also provides the structure for this chapter. It positions accounting and performance management for both device instrumentation and management applications in relationship to security.
Chapter 1, "Understanding the Need for Accounting and Performance Management," briefly introduces the concept of baselining and security analysis. For consistency purposes, the same concept is applied in this chapter. On top of the basic concept, this chapter applies technologies and products to the six stages and therefore is an extension to Chapter 1.
The ability to identify an attack quickly is critical to minimizing the damage a virus or worm can ultimately cause. Aside from waiting for users to complain or the network management alarms to begin flashing in the NOC, how can you detect attacks?
Preparation for attacks includes a set of policies that define who is allowed to do what in the network. From an accounting and performance perspective, preparation means having the right device instrumentation and management applications in place. One technique to spot the possibility that an attack is occurring is to compare the current network performance figures with the expected ones. This requires a performance monitoring system, a performance trending application, and a metering process to identify deviation from normal. This relates to the concept of baselining, which was described in previous chapters (such as the "Baselining" section in Chapter 1). Taking the preparation one step further, the performance monitoring system needs sufficient device instrumentation functions at the network elements to meter the required performance details. SNMP MIBs and NetFlow are examples of metering functions at the network element. Note that baselining is a statistical indicator only. There might be legitimate reasons for traffic patterns to deviate.
In summary, the following steps are a good preparation for a network administrator to protect the network in advance:
Define a policy that describes which protocols, applications, transactions, and connections are permitted and denied in the network.
Activate accounting and performance management features at the network elements, such as the following:
- Full NetFlow should be enabled at the edge of the network. It can be combined with Sampled NetFlow at the core network.
- SNMP read access should be configured at all network elements. Trap destinations are required as well. From a security perspective, leveraging SNMP version 3 is the best approach. If that is not applicable, secure SNMP device access with access control lists (ACL) or a dedicated network for management purposes (data communications network [DCN]).
- Enable Syslog at the devices. Syslog message severity can be customized from critical to debugging, resulting in few or many messages. Critical devices should get a closer monitoring level (severity level "info" or "notice"), and less-critical devices can be configured with Syslog level "critical" or "alert."
Analyze flow records for abnormalities to identify reconnaissance, especially ping sweeps and port scans. Advanced attacks do not use sequential scanning; they take a randomized approach or work from a list of ports. By tracking flow records and correlating them, you can identify these scanning activities; however, this is not as simple as detecting a massive DoS attack. For this task, security management applications with flow-monitoring capabilities are required.
NBAR is the tool of choice for network protocol discovery up to the application level. However, because of the CPU impact, it should be used with care and only if alternatives such as SNMP, NetFlow, and BGP PA do not provide sufficient details. Hardware-based NBAR implementations, such as the sup32-PISA in the Catalyst 6500, address these performance implications effectively.
IP SLA is the final function to verify network, server, and application response time. Using IP SLA to assess the network before, during, and after an attack is a great way to measure the impact of an attack and document how long the remedy took.
Deploying security management applications that collect, store, aggregate, correlate, and present potential threats is the final step during the Preparation stage.
Chapter 13, "Monitoring Scenarios," describes performance management, which is one of the building blocks of security management.
In a perfect world, preparation would result in blocking all attacks against the network. However, we live in a real world, so attacks will happen eventually! The next logical step is to detect and identify an attack. Some attacks are easy to detect, especially if their goal is to paralyze the users completely. Attacks that try to snoop for passwords or business knowledge are much harder to detect, because they do not leave massive footprints.
Therefore, a critical task is to identify a potential attack quickly. The detection or identification process is a balancing act between two extremes: too many "false positives," which are false attack warnings, and "false negatives," which are missed alerts. If operators get too many false alerts, they soon ignore the messages, whereas missed warnings undermine the operator's trust in the security management application. The challenge is to find the right balance!
A successful approach for increasing the accuracy of alerts is to combine multiple sources of information, such as NetFlow and MIB variables. NetFlow at the routers and switches provides valuable information about the current traffic flows. Constantly analyzing the flows is the first step toward identifying an abnormal situation. For example, a sudden increase in the total number of flows, or a large number of flows with the same destination IP address and port number combined with little or no payload, can be an attack signal. MIB counters, such as interface traffic and CPU utilization, offer an at-a-glance view of the network and devices. An increase in the number of ACL violations is a useful hint to drill deeper.
Each of these symptoms alone can have a normal reason, such as burstiness of traffic. However, chances are low that multiple indicators of abnormal situations will occur at the same time without a good reason.
Various applications help an operator identify attacks. Examples are the Cisco Security Monitoring, Analysis, and Response System (CS-MARS) and Arbor's Peakflow SP anomaly-detection system. Both applications take the fully automated approach of combining multiple sources of information and displaying the result in a graphical user interface. An alternative approach is writing scripts. You can collect NetFlow records with an open-source application such as flow-tools (see http://www.splintered.net/sw/flow-tools/ for more details) and monitor the total number of flows, either continuously or every couple of minutes. As soon as you have an average value of flows per time interval, you can define a flow threshold value. If the number of current flows exceeds this value, the script should initiate a notification. This can trigger another script that checks device CPU utilization and interface traffic. Note that any correlation is possible only if continuous monitoring of all components is applied! By defining thresholds for these values, you have multiple indicators daisy-chained, allowing you to interact at each step. The collected NetFlow records are a good starting point for further analysis in the Identification stage, which leads to the classification of the attack.
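The daisy-chained threshold logic just described can be sketched in a few lines of Python. This is an illustrative outline only: the baseline values, the 50 percent deviation rule, and the input numbers are assumptions, and in practice the inputs would come from a flow collector (such as flow-tools) and SNMP polls.

```python
# Sketch of a daisy-chained threshold check. All thresholds and
# input values here are illustrative assumptions.

def exceeds(value, baseline, tolerance=0.5):
    """True if value deviates more than `tolerance` (50%) above baseline."""
    return value > baseline * (1 + tolerance)

def correlate(flow_count, flow_baseline, cpu_pct, cpu_baseline,
              if_util_pct, if_baseline):
    """Daisy-chain the indicators: escalate only when several fire together."""
    indicators = []
    if exceeds(flow_count, flow_baseline):
        indicators.append("flows")
        # Only if flows are abnormal do we check device health as well.
        if exceeds(cpu_pct, cpu_baseline):
            indicators.append("cpu")
        if exceeds(if_util_pct, if_baseline):
            indicators.append("interface")
    # Multiple simultaneous anomalies are unlikely without a good reason.
    return indicators if len(indicators) >= 2 else []

alerts = correlate(flow_count=120000, flow_baseline=40000,
                   cpu_pct=92, cpu_baseline=35,
                   if_util_pct=40, if_baseline=38)
print(alerts)  # flows and CPU both deviate -> ['flows', 'cpu']
```

Requiring at least two indicators before notifying is one simple way to suppress false positives caused by ordinary traffic burstiness.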
An alternative approach to collecting accounting data for security purposes is to set up a "honeypot." This is a router or computer that is installed by the security operator merely for surveillance and security analysis, but to the outside world it appears to be part of the network. The idea is to provide a limited "open door" for hackers while carefully monitoring their activities. Note that honeypots can be a risk to the network if they are not properly shielded from the production network!
Due to the performance impact of collecting NetFlow records, the question arises of whether NetFlow is applicable for the Identification part of the process. There are two valid answers to this question. First, NetFlow is a crucial building block for the classification step. Because you will need the NetFlow records later anyway, why not use NetFlow for identification as well? Second, the NetFlow MIB offers an at-a-glance view of NetFlow statistics, which can point out a potential attack. Because the NetFlow MIB is a relatively new feature, it has limited support from security management applications. Notably, it was developed in response to security requirements. Chapter 7, "NetFlow," covers the NetFlow MIB as well.
For intrusion detection, monitoring logging messages is a key to success. The following list of event logs can serve as a starting point:
Failed login attempts at the network element result in Syslog messages and SNMP traps.
High number of ACL violations.
High number of uRPF drops. Unicast Reverse Path Forwarding (uRPF) is a feature that drops packets with malformed or spoofed IP source addresses. The router examines all received packets to verify that the source address appears in the routing table and that the return route points to the interface on which the packet was received.
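The uRPF strict-mode check can be illustrated with a short sketch: a packet passes only if the route back to its source address points out the interface the packet arrived on. The routing table below is a hypothetical example, not an IOS implementation.

```python
# Illustrative sketch of the uRPF strict-mode check.
# The routing table entries are hypothetical examples.
import ipaddress

ROUTES = [  # (prefix, outgoing interface)
    (ipaddress.ip_network("10.1.0.0/16"), "Serial1"),
    (ipaddress.ip_network("10.44.4.0/24"), "Ethernet0"),
]

def urpf_pass(src_ip, in_interface):
    src = ipaddress.ip_address(src_ip)
    # Longest-prefix match on the source address.
    matches = [(net, ifc) for net, ifc in ROUTES if src in net]
    if not matches:
        return False  # no return route: spoofed or unroutable source
    best = max(matches, key=lambda m: m[0].prefixlen)
    # Pass only if the return route uses the arrival interface.
    return best[1] == in_interface

print(urpf_pass("10.1.1.69", "Serial1"))    # True: route matches arrival
print(urpf_pass("10.1.1.69", "Ethernet0"))  # False: likely spoofed
```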
Related to traffic baselining, threat identification also includes monitoring traffic that deviates from normal. Fault management features such as RMON alarms and events or a combination of the Expression MIB and Event MIB can help identify unusual traffic patterns quickly. The challenge for a security architect is combining the number of events to provide information and root-cause analysis to an operator.
In summary, the following steps are suggested as preparation for detecting attacks:
Baseline the network and define thresholds for significant events. Examples could be NetFlow (number of flows per hour and day) and MIB counters (CPU utilization, minimum free memory, interface utilization). Thresholds can be monitored at the central NMS or locally at the network element by using the Expression MIB and Event MIB.
Enable logging at the network elements (SNMP traps, Syslog) and at the network servers (Syslog) and also define thresholds, such as for the total number of messages per hour and day, as well as filtering for severe log messages.
Deploy applications that monitor and analyze network activities and utilization. These tools calculate a utilization trend and deviation from normal, which can be quite handy to quickly identify abnormal situations.
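The baselining and threshold steps above can be sketched as a small statistical rule. The "mean plus three standard deviations" heuristic and the sample values are illustrative assumptions; a production system would use a larger history and possibly time-of-day profiles.

```python
# Minimal baselining sketch: derive a rising threshold from historical
# samples (e.g., flows per hour over several days). The sample values
# and the mean + 3*sigma rule are illustrative assumptions.
import statistics

def rising_threshold(samples, k=3):
    mean = statistics.mean(samples)
    stdev = statistics.pstdev(samples)
    return mean + k * stdev

flows_per_hour = [40000, 42000, 39000, 41000, 43000, 40500]
threshold = rising_threshold(flows_per_hour)

current = 120000  # hypothetical measurement during an incident
if current > threshold:
    print("rising threshold exceeded -> notify operator")
```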
After identifying an abnormal situation in the network, classifying the kind of attack comes next. It is tricky to block an attack without a clear understanding of the attack's characteristics. You need answers to questions such as these:
Is this an attack against the network, or the servers, or the services?
Is it a security threat or a denial of service, such as one caused by a worm?
Is the attack distributed, or is a single source causing it?
Intelligent device instrumentation features can reduce the time needed to classify a threat's kind and scope.
NetFlow is also instrumental in the Classification stage. Two approaches apply when leveraging NetFlow for categorizing an attack: traffic baselining and in-depth flow inspection. Baselining for security comes in two flavors. The first is used in the Identification stage, where unusual traffic behavior is observed. The second relates to the Classification stage, where the attack's characteristics are categorized.
Through NetFlow baselining, you can identify the Top-N traffic volume or number of sessions per host. This can help you identify hosts that suddenly generate much more traffic than they normally do. This can indicate that an attacker broke into that system and used it for a DoS attack or that a user initiated a lot of peer-to-peer traffic by sharing multimedia files. This can be detected as follows: First, you gather a list of known application servers in your network—e-mail servers, application servers, web servers, and so on. Then you retrieve the NetFlow Top-N hosts and compare the two lists. Under normal circumstances, the application servers should be the top talkers in the network. Unless major changes occur in the network, this list of top talkers should be relatively stable. If suddenly a client IP address generates similar volume or more volume than network servers, a flag should be raised.
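The top-talker comparison just described can be sketched as follows. The server list, addresses, and byte counts are hypothetical; in practice, the Top-N list would come from a NetFlow collector.

```python
# Sketch of the top-talker comparison: flag any host that appears among
# the Top-N traffic sources but is not a known application server.
# All addresses and byte counts are hypothetical examples.

KNOWN_SERVERS = {"10.0.0.10", "10.0.0.11", "10.0.0.12"}  # mail, web, app

def unexpected_top_talkers(top_n):
    """top_n: list of (ip, bytes) sorted by volume, e.g., NetFlow Top-N."""
    server_volumes = [b for ip, b in top_n if ip in KNOWN_SERVERS]
    floor = min(server_volumes) if server_volumes else 0
    # A client generating volume comparable to the servers deserves a flag.
    return [ip for ip, b in top_n
            if ip not in KNOWN_SERVERS and b >= floor]

top_n = [("10.0.0.10", 9_000_000),   # known mail server
         ("10.44.4.7", 8_500_000),   # client suddenly rivaling the servers
         ("10.0.0.11", 7_000_000)]   # known web server
print(unexpected_top_talkers(top_n))  # -> ['10.44.4.7']
```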
A similar analysis can be applied to the number of active sessions per host. It is normal for a network server to have thousands of active sessions, but for a client PC, this would be an exceptional situation. A likely reason would be a network scan, a worm, or an attack.
NetFlow for security analysis can be used in two phases. The NetFlow collection gathers traffic statistics, and the NetFlow export sends the accounting records to a NetFlow collector. If only the summary information is required, the NetFlow MIB (CISCO-NETFLOW-MIB) is the right tool, because it displays traffic summary information, such as the top flows (cnfTopFlows) and packet statistics (cnfProtocolStatistics) such as packet size distribution and number of active or inactive flows.
In-depth flow inspection relates to, for example, TCP flags and TCP or UDP port numbers. Even though NetFlow records do not include payload data, TCP session characteristics gathered by NetFlow are valuable for identifying attacks. A three-step handshake process establishes a TCP session. The initial client request contains the SYN (synchronize) TCP flag, to which the server replies with a SYN/ACK (acknowledge) packet. After the client acknowledges the server's acknowledgment, the session is established. A TCP session is closed by sending a packet with the FIN (finished) flag set. NetFlow records contain the session's TCP flags, aggregated by a cumulative OR operation. Normal flows have a combination of flags, but a large number of SYN flags without an ACK can indicate a DoS attack. Certain combinations of flags should never occur normally and are a clear sign of an anomaly. A security management application should closely monitor the NetFlow flag field.
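The flag inspection above can be sketched with the standard TCP flag bit values. The classification rules are illustrative heuristics, not a complete anomaly catalog.

```python
# Sketch of TCP flag inspection on NetFlow records. The tcp_flags field
# is a cumulative OR of all flags seen in the flow; the bit values follow
# the TCP header definition. The rules below are illustrative heuristics.
FIN, SYN, RST, PSH, ACK, URG = 0x01, 0x02, 0x04, 0x08, 0x10, 0x20

def suspicious(tcp_flags):
    # SYN and FIN without ACK should never occur in a legitimate session.
    if tcp_flags & SYN and tcp_flags & FIN and not tcp_flags & ACK:
        return "syn-fin"
    # SYN without ACK: the three-way handshake never completed.
    if tcp_flags & SYN and not tcp_flags & ACK:
        return "syn-only"
    return None

print(suspicious(SYN))              # half-open flow -> 'syn-only'
print(suspicious(SYN | FIN))        # illegal combination -> 'syn-fin'
print(suspicious(SYN | ACK | FIN))  # normal complete session -> None
```

Counting how many flows per destination return "syn-only" in a given interval turns this per-flow check into a SYN flood indicator.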
For application server monitoring, it can be useful to analyze the relationship between servers and port numbers. If you know which UDP or TCP ports are used per application, you can identify if someone installed a new application that uses different port numbers. Port monitoring creates a lookup table for active UDP and TCP port numbers per server and compares traffic patterns against this table. If traffic is sent to or from ports that are not listed, generate a notification.
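The port lookup table can be sketched as follows. The server addresses, registered ports, and observed flows are hypothetical examples.

```python
# Sketch of the per-server port lookup table: alert when a server sends
# or receives traffic on a port not registered for it. The table and the
# flow tuples are hypothetical examples.
ALLOWED_PORTS = {
    "10.0.0.10": {25, 110},   # mail server: SMTP, POP3
    "10.0.0.11": {80, 443},   # web server: HTTP, HTTPS
}

def port_violations(flows):
    """flows: iterable of (server_ip, port) observed by the collector."""
    return [(ip, port) for ip, port in flows
            if ip in ALLOWED_PORTS and port not in ALLOWED_PORTS[ip]]

flows = [("10.0.0.11", 443),   # expected HTTPS
         ("10.0.0.11", 6667),  # unexpected port on the web server
         ("10.0.0.10", 25)]    # expected SMTP
print(port_violations(flows))  # -> [('10.0.0.11', 6667)]
```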
Packet size is another relevant NetFlow parameter. The average packet size on the Internet is greater than 300 bytes. If a large number of flows are significantly below or above that size, raise a flag. Table 16-1 illustrates a DoS attack in which flows from multiple source IP addresses target a single destination IP address with random port numbers and very small packets.
| Source IP Address | Destination IP Address | Input I/F | Output I/F | Source Port | Destination Port | Packets | Bytes | Protocol |
|---|---|---|---|---|---|---|---|---|
To display only certain entries from the potentially large NetFlow cache at the router, use the include or exclude function:
router# show ip cache flow | include 10.44.4.245
You can draw two conclusions from Table 16-1:
Every flow consists of only one packet, and they all have the same packet size (40 bytes). Because the length of the IP and TCP headers is 40 bytes in total, these packets have no payload!
The source and destination port numbers look like a random mix.
As you can see, none of these symptoms considered alone would be an issue. However, if they all come together, and this traffic pattern is different from normal, a notification should be generated. Again, this points to the need for baselining! In this specific case, a SYN flood attack might be taking place. You could confirm this by inspecting the TCP flags in the NetFlow record.
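The combined symptoms above can be sketched as a simple heuristic over the flow records of Table 16-1. The record fields and the minimum-flow count are illustrative assumptions.

```python
# Sketch combining the symptoms above: many single-packet flows of exactly
# 40 bytes (IP + TCP headers, no payload) converging on one destination.
# The records and the min_flows threshold are illustrative examples.

def looks_like_syn_flood(flows, min_flows=3):
    """flows: list of dicts with src, dst, packets, bytes per flow.
    Returns the suspected victim address, or None."""
    suspect = [f for f in flows if f["packets"] == 1 and f["bytes"] == 40]
    if len(suspect) < min_flows:
        return None
    dsts = {f["dst"] for f in suspect}
    # Many payload-less single-packet flows aimed at a single victim.
    return dsts.pop() if len(dsts) == 1 else None

flows = [
    {"src": "10.1.1.69", "dst": "10.44.4.245", "packets": 1, "bytes": 40},
    {"src": "10.2.7.12", "dst": "10.44.4.245", "packets": 1, "bytes": 40},
    {"src": "10.9.3.44", "dst": "10.44.4.245", "packets": 1, "bytes": 40},
]
print(looks_like_syn_flood(flows))  # -> 10.44.4.245
```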
NetFlow records contain traffic and session characteristics, but no payload data. A complementary feature to NetFlow is the Cisco Network-Based Application Recognition (NBAR) utility. NBAR is a classification engine that applies stateful packet inspection to identify applications based on characteristics such as static and dynamic port numbers and payload details, such as application URLs. This enables NBAR to detect attacks, such as worms, from their payload signatures. As soon as specifics about new worms or attacks exist, NBAR can identify these packets and apply traffic policies, such as blocking only the specific packets. NBAR can be used inbound and outbound to mitigate the effects of attacks immediately. NBAR's advantage over ACLs is the flexibility of blocking only some applications, not all traffic.
During the initial phase of the classification, when the attack's fingerprints are not identified, NBAR can help keep network services operational by prioritizing business-critical applications. Because NBAR understands the details of e-mail, VoIP, SAP, and Citrix applications, these can be prioritized in the network, and all "unknown" traffic can be rate-limited. Even though rate-limiting traffic that is unknown to NBAR might also limit user traffic, it guarantees the availability of the most important services, which is a major achievement when a network is under attack.
Even though Cisco did not design the NAM primarily as a security management tool, it can provide useful at-a-glance views of network traffic and help classify traffic. Because the NAM supports RMON, ART, NetFlow, VoIP, and QoS monitoring, it is a valid source for traffic analysis and classification, and it can help an operator to identify abnormal traffic patterns.
Based on the analysis from the Identification and Classification stages, you can define rising and falling thresholds for the following conditions, where actions include logging and trap generation:
Network layer hosts
Network layer conversations
MAC layer hosts
MAC layer conversations
Server response time
Server-client response time
DiffServ traffic statistics
DiffServ host statistics
DiffServ application statistics
Monitoring these characteristics is related to baselining, so it is important to ensure that generated traps are also sent to the security management application.
NetFlow, NBAR, and the NAM offer significant value for traffic classification, but other tools need to be mentioned as well:
With a "sinkhole," a portion of the network is designed to accept and analyze attack traffic. It can be used to divert attack traffic, monitor for worm propagation, and monitor other potentially malicious traffic.
"Honeypots" emulate a server or service in the network and are designed to be probed, attacked, or compromised by hackers. This gives the security operator insight into the hacker's plans by observing and analyzing his or her activities. Typical intruder interactions with a honeypot are reconnaissance and intrusions.
Intrusion Detection Systems (IDS) can be implemented as host-based and network-based solutions. Both methods are based on the assumption that an intruder's activity is different from the characteristics of a regular user or traffic. For host-based IDS, a software agent runs on the server and analyzes log files, access to applications, configuration files, e-mail activities, and more. An example is the Cisco Security Agent, which can be installed on client and server PCs. Network-based IDSs are usually implemented on dedicated hardware boxes that monitor and analyze all layers of network communications. In addition, the operator can write rules to describe common attack techniques and apply traffic patterns that describe attack "signatures."
Packet sniffers are external devices for packet inspection.
After identifying and classifying the attack, you need to trace back the source(s) of the attack and isolate them. For this task, you need answers to questions such as these:
Did the attack come from inside or outside the network?
Did a special segment of the network initiate the attack, or did it come from all over the network?
Did servers or client PCs distribute the attack, such as notebooks that were infected outside of the network?
Are the source IP addresses real or spoofed?
Depending on the answers, different actions need to be taken.
An IP spoofing attack occurs when an attacker outside your network pretends to be a trusted computer. This can be done by using an IP address that is within the range of your network's IP addresses, or by using an authorized external IP address that you trust and to which you grant access to specified resources on your network. Detecting whether a specific IP address is spoofed is not an easy task.
Different mechanisms are available to locate the source IP address:
Search the routing table(s).
Check the Internet Routing Registry (IRR).
Use NetFlow to trace the packet flow through the network.
Identify the upstream ISP if the source is outside of your network. From there, the upstream ISP needs to continue tracing to block the source.
The first classification separates valid IP source addresses from spoofed source addresses. If you use a private IP address space (RFC 1918), the existence of a local IP address can be identified with network utilities such as ping, traceroute, and DNS lookup. Public addresses can be validated with tools such as WHOIS via the IRR. This information is distributed globally:
Asia Pacific: whois.apnic.net
Europe and the Middle East: whois.ripe.net
Latin America: whois.lacnic.net
U.S. and everywhere else: whois.arin.net
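The first classification step, separating private (RFC 1918) source addresses, which can be checked locally, from public addresses, which are candidates for an IRR/WHOIS lookup, can be sketched with the Python `ipaddress` module. The sample addresses are examples only.

```python
# Sketch of the first classification step: private (RFC 1918) addresses
# are verified locally; public addresses go to a registry lookup.
# The sample addresses are illustrative examples.
import ipaddress

def classify_source(ip):
    addr = ipaddress.ip_address(ip)
    if addr.is_private:
        return "local-check"   # verify with ping, traceroute, DNS lookup
    return "whois-lookup"      # query the appropriate regional registry

print(classify_source("10.44.4.245"))  # RFC 1918 -> local-check
print(classify_source("8.8.8.8"))      # public -> whois-lookup
```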
Unfortunately, the IRR does not work in case of spoofed IP addresses. By modifying the source IP address, an attacker can make an IP packet appear to come from a completely different source than it actually does. Because the spoofed packet does not return to the attacker's IP address, it is hard to trace it back.
Fortunately, NetFlow can also help in case of spoofed addresses. Because the source IP address field in the NetFlow record is not useful in this case, the ifIndex field becomes relevant, because it identifies the actual router interface on which the flow arrived. If NetFlow is already enabled on all routers' ingress interfaces, you can identify the attacker's path immediately. Otherwise, you can enable NetFlow ad hoc and take the hop-by-hop approach, as explained next.
The following CLI output relates to the data from Table 16-1, where the victim's address is 10.44.4.245:
router1# sh ip cache flow | include 10.44.4.245
Se1    10.1.1.69    Et0    10.44.4.245    ...
Se1 identifies Serial Interface 1 as the source interface. The following CLI command displays the flow's upstream router:
router1# sh ip cef se1
Prefix           Next Hop      Interface
0.0.0.0/0        10.10.10.2    Serial1
10.10.10.0/30    attached      Serial1
Therefore, the upstream router is 10.10.10.2, and the trace back process needs to be continued there. These steps are repeated up to the LAN segment or service provider interface where the traffic originated. A MAC-based ACL can be put in place at that interface to block all traffic from this source.
In case of spoofed IP addresses, the client needs to be identified by the MAC address in the LAN segment. NetFlow enhancements for Layer 2 can provide these details:
Layer 2 IP header fields.
Source MAC address field from frames that are received by the NetFlow router.
Destination MAC address field from frames that are transmitted by the NetFlow router. The MAC address is also needed when tracing back over Internet Exchange Points (IXP), because they connect multiple ISPs with Layer 2 switches.
Received VLAN ID field (802.1q and Cisco ISL).
Transmitted VLAN ID field (802.1q and Cisco ISL).
Cisco CS-MARS and the Arbor Peakflow Traffic and Peakflow DoS anomaly-detection systems leverage NetFlow to detect an attack's path through the network and propose ways to block it.
Because the NetFlow features for Layer 2 are not yet deployed consistently across all Cisco platforms, an alternative is the IP Accounting MAC Address feature. It is considered an interim solution because it collects only a subset of the NetFlow details.
As soon as the attack's characteristics and sources have been identified, a remedy can be initiated. Possible solutions range from configuring an ACL up to leveraging sophisticated security management applications.
After you identify the source of an attack, a simple way to block all traffic from the attack toward the victim is to define an ACL at a router, next to either the attacker or the victim. Deploying ACLs through a configuration management tool such as CiscoWorks can be a quick way to block an attack. For example, during the outbreak of the SQL Slammer virus, the Cisco internal IT department immediately blocked it within the Cisco intranet by configuring ACLs at all Internet access points worldwide.
If only a limited number of hosts are targets of the attack, an alternative is to propagate BGP drop information to all routers in the network. If you set the next hop of the target IP addresses to the Null interface, all packets destined for the victims are dropped into the so-called "black hole."
An alternative to dropping traffic from a source or toward a destination is to block only traffic from nonexistent sources. The uRPF feature in Cisco IOS discards IP packets whose IP source addresses cannot be verified, such as spoofed addresses. Note that uRPF adds processing load on the router, which can be counterproductive during an attack, when the network is already under heavy load.
The "IP source guard" feature prevents IP spoofing by allowing only the IP addresses that are obtained through DHCP snooping on a particular port.
Instead of blocking or verifying all traffic, you can use QoS features such as rate-limiting packets. The committed access rate (CAR) feature can define a limit for certain traffic types. For instance, if a high rate of FTP traffic overutilizes the network capacity, a rate limit can be applied.
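The rate-limiting idea behind CAR can be illustrated with a minimal token bucket: traffic conforming to the committed rate passes, and excess traffic is dropped. This is a conceptual sketch under assumed rates and packet sizes, not an implementation of IOS CAR.

```python
# Minimal token-bucket sketch of the rate-limiting idea behind CAR.
# Rates and packet sizes are illustrative assumptions.
class TokenBucket:
    def __init__(self, rate_bps, burst_bytes):
        self.rate = rate_bps / 8.0    # refill rate in bytes per second
        self.capacity = burst_bytes   # maximum burst size
        self.tokens = burst_bytes     # start with a full bucket

    def refill(self, elapsed_s):
        # Tokens accumulate over time, up to the burst capacity.
        self.tokens = min(self.capacity, self.tokens + self.rate * elapsed_s)

    def conform(self, packet_bytes):
        if self.tokens >= packet_bytes:
            self.tokens -= packet_bytes
            return True   # conform action: transmit
        return False      # exceed action: drop

bucket = TokenBucket(rate_bps=8000, burst_bytes=1500)  # 8 kbps, 1500-byte burst
print(bucket.conform(1500))  # within burst -> True
print(bucket.conform(1500))  # bucket empty -> False
bucket.refill(1.5)           # 1.5 s at 1000 B/s restores 1500 bytes
print(bucket.conform(1500))  # True again
```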
NBAR plays an important role in threat mitigation because it works in conjunction with QoS features to block or rate-limit undesirable traffic. When a match value unique to the attack is identified, deploying NBAR can be an effective, tactical first step to block malicious worms while preparing a strategy against the attack. For example, with the Code Red worm, NBAR matched "*.ida" URLs in the HTTP GET request. With Slammer, NBAR looked for SQL packets of a specific length. The following example shows how NBAR can be used with a QoS policy to mitigate the effects of the Code Red and Slammer worms at the network edge. For completeness, the example contains two parts: classification (Steps 1 and 2) and the reaction (Step 3):
Create a custom protocol definition to identify all SQL traffic:
Router(config)# ip nbar port-map custom-01 udp 1434
Create a QoS class map to identify SQL packets that are 404 bytes long and Code Red virus packets:
Router(config)# class-map match-all slammer-worm
Router(config-cmap)# match protocol custom-01
Router(config-cmap)# match packet length min 404 max 404
Router(config-cmap)# exit
Router(config)# class-map match-all code-red
Router(config-cmap)# match protocol http url "*.ida"
Router(config-cmap)# exit
Use the QoS drop action to discard the matching packets at the ingress interface:
Router(config)# policy-map mitigate-worms
Router(config-pmap)# class slammer-worm
Router(config-pmap-c)# drop
Router(config-pmap-c)# exit
Router(config-pmap)# class code-red
Router(config-pmap-c)# drop
Router(config-pmap-c)# exit
Because NBAR has an impact on the router's CPU performance, it is recommended that you use NBAR on the access links rather than on the core links. Alternatively, deploy the Sup32-PISA supervisor card in the Catalyst 6500.
Assuming that the steps and actions described in the previous sections helped identify and block an attack against your network, the next step is crucial (but still is neglected by some operators). The postmortem examines the situation to identify why the network was vulnerable and what lessons can be learned from the situation. However, the postmortem sometimes requires a decision from upper management to assign time and people to analyze the past situation. Most people are good at firefighting, but only a few like digging into the past.
Postmortems are conducted internally first. If possible, they should include information from other enterprises or service providers. Especially after a major attack against multiple corporations and organizations, teamwork between the affected groups can increase a postmortem's value and results.
Start the process by reconstructing the attack as precisely as possible to establish a "footprint" for the attack. The different steps are related to the different stages of the Cisco six-stage security operations model:
The first place to inspect is fault management reporting. If you configured SNMP traps and performance thresholds properly, several conclusions can be drawn from the reported notifications. The baselining information that was initiated during the Preparation stage is useful, as well as the data gathered during the Identification stage.
Check log files, such as Syslog messages from the network elements and application logs from the servers, and relate them to the fault and performance reports. They are gathered during the Identification stage.
Performance monitoring and accounting records should provide an idea about the attack's path through the network. If you use security management applications such as CS-MARS or Arbor Networks, these reports can be created easily. This is related to the Trace Back stage.
Analyze configuration changes at network elements before, during, and after the attack. Applications such as CiscoWorks Resource Manager Essentials (RME) or other configuration applications offer easy ways of comparing configuration files. Distinguish between configuration changes resulting from the Reaction stage and other configuration changes.
With the footprint in place, answer the following questions:
Did you monitor and collect the right level of details? This is linked to changes in the Preparation stage.
Were all network parameters, such as flow records, CPU, and interface load, easily accessible during the event? Which other parameters could have helped in the incident? With the answers in mind, modify the Identification stage.
Was the network security level as strong as possible?
Were the correct patches applied at both the network element and application server level?
Did you get enough details during the attack to classify the attack? If not, extend the Classification stage.
Was the trace back successful? Maybe additional methods are required in the Trace Back stage.
How long did it take to initiate a reaction?
Did you gather enough information to complete the Postmortem stage successfully?
Remember, as soon as the postmortem is complete, the Preparation stage for the next attack should start. The following questions can serve as a link between the analysis of the previous security threat and the Preparation stage for the future:
What worked well during and after the attack?
What needs to be improved? How can it be improved?
Who discovered the attack(s)—a user, the operator, or a management tool?
How long did it take to identify, classify, and trace back the attack?
How long did it take to block the attack?
Did the existing tools help support the operators?
Were the tools used at all? In which of the six stages of the process were tools used? What was their contribution to solving the issue?
How well did the different teams, such as network operators, data center administrators, and the security group, cooperate? How can the cooperation be improved?
What additional resources, processes, training, or tools are required?
Did you find a single point of failure in the network?
What is the most important lesson learned?
What will you do differently next time?
With answers to all these questions, you are ready for the Preparation stage, where you can prioritize action items and implement lessons learned to ensure future success.