Now that you understand collection concepts, this section covers data processing strategies and mediation device features. As the name indicates, the mediation device is located between two instances: either between the network element (meter) and the application server, or between the collection server(s) and the application server(s). Mediation functionality does not necessarily require a separate physical entity; it can be a (software) subfunction of the collection or application server. Alternatively, multiple collection servers can be consolidated at one mediation device, which then provides a central interface for all application servers to retrieve accounting and performance records. As discussed in the preceding section, selecting the right location for the collection server is an important task. In the case of the mediation device, the functionality is more relevant than the location. As shown earlier in Figure 2-24, the main functions of the mediation device are as follows:
Estimation from sampling (if sampling applies)
Threshold monitoring (optional)
Data aggregation:
- Based on common criteria (key fields)
- Aggregation over time
Data record correlation (from different sources) and enrichment
Data record formatting and storage
Two main areas of filtering exist at the mediation device:
Filtering to reduce the volume of the data collection
Filtering for application purposes
Complex filtering for volume reduction is a mediation device task, because the implementation of process-intensive filters at the network element has a performance impact. Ideally, the collection granularity and filtering functions at the device would allow for configuring exactly the data set that is required. Although some accounting and performance technologies, such as NetFlow, support filters at the network element, others, such as SNMP interface counters and RMON, do not offer a filter concept at the meter. To reduce the performance impact of metering to a minimum, simple filters are implemented at the device, ideally in hardware instead of software operations. Complex and CPU-intensive operations are realized at the mediation device.
Filtering for application purposes is based on the "divide and conquer" paradigm: collect data records once, but use them as input for multiple applications, such as capacity planning, billing, and security monitoring. In this case, different aggregations with different filters are required. Table 2-26 lists examples of filters provided by the Cisco NetFlow Collector (NFC).
|srcaddr||Source IP address||Filters the input data based on the source IP address|
|dstaddr||Destination IP address||Filters the input data based on the destination IP address|
|srcport||Source port number||Filters the input data based on the source port number|
|dstport||Destination port number||Filters the input data based on the destination port number|
|srcinterface||Source interface number||Filters the input data based on the source interface number|
|dstinterface||Destination interface number||Filters the input data based on the destination interface number|
|nexthop||Next-hop IP address||Filters the input data based on the next-hop IP address|
|Protocol||Protocol name||Filters the input data based on the protocol definitions|
|prot||Protocol number||Filters the input data based on the protocol number in the flow record|
|ToS||Type of service||Filters the input data based on the type of service (ToS)|
|srcas||Source AS||Filters the input data based on the autonomous system number of the source, either origin or peer|
|dstas||Destination AS||Filters the input data based on the autonomous system number of the destination, either origin or peer|
Table 2-26 is sourced from http://www.cisco.com/univercd/cc/td/doc/product/rtrmgmt/nfc/nfc_3_6/iug/tuning.htm#1035085
Filtering can also be required because of a lack of configuration granularity at the meter. For example, in the past, NetFlow could be configured only on a physical interface level, not on a logical (or subinterface) level. Fortunately, the subinterface details were reported in the accounting record, so it was possible to filter everything except for the required subinterface information. Because this is not a very efficient way to collect accounting records, Cisco introduced a new feature to configure data collection on a subinterface level. Even though the feature is in place now, it might still be useful to collect data from all subinterfaces and filter some afterwards.
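The two filtering purposes can be illustrated with a minimal sketch. It assumes flow records held as Python dictionaries whose keys borrow the filter names from Table 2-26 (`srcaddr`, `dstport`, `srcinterface`); the records, the `make_filter` helper, and the `bytes` field are hypothetical and are not part of the NetFlow Collector.

```python
from ipaddress import ip_address, ip_network

# Hypothetical flow records; key names borrow the filter keys from Table 2-26.
flows = [
    {"srcaddr": "10.1.1.5", "dstaddr": "192.0.2.10", "dstport": 80, "srcinterface": 3, "bytes": 1200},
    {"srcaddr": "10.1.2.9", "dstaddr": "192.0.2.10", "dstport": 25, "srcinterface": 4, "bytes": 800},
    {"srcaddr": "10.2.0.7", "dstaddr": "198.51.100.4", "dstport": 80, "srcinterface": 3, "bytes": 300},
]

def make_filter(src_net=None, dstport=None, srcinterface=None):
    """Build a predicate; a criterion left as None matches every record."""
    net = ip_network(src_net) if src_net else None

    def matches(flow):
        if net is not None and ip_address(flow["srcaddr"]) not in net:
            return False
        if dstport is not None and flow["dstport"] != dstport:
            return False
        if srcinterface is not None and flow["srcinterface"] != srcinterface:
            return False
        return True

    return matches

# The same collected records feed two consumers with different filters.
is_site_web = make_filter(src_net="10.1.0.0/16", dstport=80)
is_subinterface_3 = make_filter(srcinterface=3)

web_from_site = [f for f in flows if is_site_web(f)]
subinterface_3 = [f for f in flows if is_subinterface_3(f)]
```

The second filter mirrors the subinterface case: everything is collected, and only the records of the required (sub)interface are kept afterwards.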
Estimation from sampling is specific to NetFlow and applies only if NetFlow sampling is configured at the device. However, it is an important task, because the child population gathered by sampling must be adjusted for the estimation of the parent population and to deduce an approximation of the volume based on the sampling rate.
A NetFlow mediation device estimates the absolute traffic volumes by renormalizing the volume of sampled traffic, that is, by multiplying it by the inverse of the meter's sampling rate. If sampling is applied with a sampling rate of 1:100, the packet and byte counts in the data records need to be multiplied by a factor of 100. Now you see why metering accuracy is so important. If the child population, which is gathered by sampling techniques, is biased or inaccurate, the multiplication amplifies the error. Estimation from sampling must be performed per metering instance, because multiple meters with different sampling rates can be configured per device and even per interface if filtering and sampling apply in conjunction.
The example in the section "Filtering at the Network Device" defined three traffic classes: priority, business, and best effort. Priority traffic was fully collected and needs no adjustment. Business traffic was sampled with a rate of 1:100 and therefore needs to be multiplied by a factor of 100. Best-effort traffic had a sampling rate of 1:1000 and needs to be multiplied by 1000. This scenario makes it clear why the sampling factor must be included in the exported data record, because otherwise the estimation from sampling is incorrect.
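The renormalization for the three traffic classes can be sketched as follows. The record layout, the `sampling_interval` field name (the N of a 1:N sampler), and the counter values are assumptions made for illustration; only the multiplication factors come from the text.

```python
# Hypothetical exported records; sampling_interval is the N of a 1:N sampler.
records = [
    {"traffic_class": "priority",    "sampling_interval": 1,    "packets": 40, "bytes": 20_000},
    {"traffic_class": "business",    "sampling_interval": 100,  "packets": 12, "bytes": 9_000},
    {"traffic_class": "best-effort", "sampling_interval": 1000, "packets": 3,  "bytes": 4_500},
]

def estimate(record):
    """Scale the sampled counters back up to an estimate of the real volume."""
    n = record["sampling_interval"]
    return {
        **record,
        "est_packets": record["packets"] * n,
        "est_bytes": record["bytes"] * n,
    }

estimated = [estimate(r) for r in records]
est_bytes_per_class = {r["traffic_class"]: r["est_bytes"] for r in estimated}
```

Because the factor travels inside each record, records from meters with different sampling rates can be mixed in one collection and still be scaled correctly.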
Threshold monitoring is an optional task for the mediation device; it can be implemented at the mediation device, at the application server, or at both. Where the function is located matters less than the purpose it serves. A mediation device for a traffic planning application might leave threshold monitoring to the application server, because thresholds are not critical for planning purposes. On the other hand, if metering is applied for security monitoring, a relevant feature is to set thresholds for the received traffic and monitor them in real time, because a reaction to an attack must occur quickly. Exceeded thresholds can identify security issues, such as a denial-of-service (DoS) attack, in which a huge number of very small datagrams flood a network and eventually stop the services in the network. In the case of a NetFlow collection during a DoS attack, the mediation device observes a sudden increase in the number of flows; at the same time, the average flow size decreases. This does not have to be an attack per se, but it certainly indicates a change in the traffic patterns, which should be brought to the administrator's attention immediately. Again, a prerequisite for monitoring thresholds is a baseline of the network performance under normal circumstances.
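A minimal sketch of such a check compares one collection interval against the baseline. The function name, its parameters, and the default factors (5 times more flows, average flow size below 20 percent of normal) are arbitrary assumptions; real thresholds depend on the network's own baseline.

```python
def traffic_pattern_alarm(flow_count, total_bytes, baseline_flows, baseline_avg_size,
                          flow_factor=5.0, size_factor=0.2):
    """Return True when the flow count surges while the average flow size collapses.

    flow_factor and size_factor are illustrative defaults, not recommended values.
    """
    if flow_count == 0:
        return False
    avg_size = total_bytes / flow_count
    surge = flow_count > flow_factor * baseline_flows
    shrink = avg_size < size_factor * baseline_avg_size
    return surge and shrink

# Baseline: about 1000 flows per interval with an average of 12,000 bytes per flow.
normal_interval = traffic_pattern_alarm(1100, 1100 * 12_000, 1000, 12_000)
suspicious_interval = traffic_pattern_alarm(50_000, 50_000 * 80, 1000, 12_000)
```

An alarm from this check only signals a change in the traffic pattern; deciding whether it is an attack remains the operator's task.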
Although filtering reduces the volume of collected data, a further reduction is required to keep the total data volume manageable. The concept of aggregation describes the task of reducing the granularity by identifying common criteria (key fields) and combining information from multiple records into a single record. Two different aggregation concepts exist:
Aggregation of key fields
Aggregation over time
Aggregation of common criteria is related specifically to accounting records, whereas aggregation over time can be applied to both accounting and performance records. For example, a service provider might be interested in NetFlow records per BGP autonomous system (AS) instead of a full NetFlow collection. In this case, all records with the same source or destination AS are consolidated into a single record. An enterprise IT department might not want detailed records for each user, but rather aggregate all records belonging to the same department. Note that you have a smaller number of records after aggregation, but also reduce the granularity. Another method of aggregation is to define buckets and maintain only the summary per bucket instead of single transactions. The Cisco IP SLA feature and the ART MIB implement this concept of aggregation at the device level; a summary is displayed in Table 2-27. The first column represents the metered values, such as the delay of a round-trip time operation. The meter calculates the square sum and adds both values as a matrix in the defined buckets. At the end of the aggregation interval, only the bottom three rows are maintained per bucket: the total number of entries, the sum of the values, and the sum of the squares of each value.
|Probe Value||Square Value||Buckets|
|[ms]||[ms²]||< 10 ms||10 to 30 ms||30 to 100 ms||> 100 ms|
|Number of entries||1||4||4||1|
|Sum of squares||49||2326||27270||27889|
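The bucket summary can be sketched in a few lines. The sample values below are hypothetical (only 7 ms and 167 ms are chosen so that their squares match the 49 and 27889 in the first and last buckets), and the treatment of values that fall exactly on a bucket edge is an assumption. The sketch also shows why the sum of squares is kept: together with the count and the sum, it is enough to derive the mean and standard deviation later without the individual values.

```python
import math
from bisect import bisect_right

BOUNDS_MS = [10, 30, 100]  # bucket edges; a value equal to an edge goes to the upper bucket (assumption)
LABELS = ["< 10 ms", "10 to 30 ms", "30 to 100 ms", "> 100 ms"]

def summarize(samples_ms):
    """Keep only count, sum, and sum of squares per bucket."""
    buckets = [{"count": 0, "sum": 0, "sum_sq": 0} for _ in LABELS]
    for value in samples_ms:
        bucket = buckets[bisect_right(BOUNDS_MS, value)]
        bucket["count"] += 1
        bucket["sum"] += value
        bucket["sum_sq"] += value * value
    return dict(zip(LABELS, buckets))

def mean_and_stddev(bucket):
    """Derive mean and population standard deviation from the three summary values."""
    mean = bucket["sum"] / bucket["count"]
    variance = bucket["sum_sq"] / bucket["count"] - mean ** 2
    return mean, math.sqrt(max(variance, 0.0))

# Hypothetical round-trip times in milliseconds.
summary = summarize([7, 12, 25, 18, 29, 45, 167])
mean_10_30, stddev_10_30 = mean_and_stddev(summary["10 to 30 ms"])
```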
Aggregation of key fields can be combined with aggregation over time. One approach is for the mediation device to store all received flow records immediately to disk. Alternatively, the server can aggregate the records in memory first and write the aggregated records to disk when the aggregation interval expires. Table 2-28 shows accounting records before aggregation.
|Entry Number||Source IP Address||Destination IP Address||Interface Number (Ingress)||Destination Port Number (Application)||BGP Autonomous System (AS) Number (Source)||Number of Bytes|
The records in Table 2-28 allow several aggregations (the entry numbers where aggregation applies are in parentheses):
Aggregate per ingress interface (1 and 6, 2 and 4, 3 and 5)
Application aggregation (1 and 6, 2 and 5, 3 and 4)
BGP AS (1, 2, 4 and 6, 3 and 5)
IP source address range, depending on the net mask (1 and 6, 2 and 4, 3 and 5)
IP destination address range, depending on the net mask (1, 2, and 5, 3 and 4, 6)
As you can see, there is no single "best" aggregation scheme; the application's requirements determine how granular the data set needs to be and therefore which aggregation scheme is appropriate. If the objective of metering is to collect data sets for multiple aggregations, individual records need to be stored and aggregated separately per application. In the case of three applications (capacity planning, billing, and security), you aggregate three times from the same set of records.
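Aggregating the same stored records several times, once per application, can be sketched as follows. The records, field names, and the `aggregate` helper are hypothetical and do not reproduce the entries of Table 2-28.

```python
from collections import defaultdict

# Hypothetical accounting records (not the entries of Table 2-28).
records = [
    {"src": "10.1.1.1", "dst": "192.0.2.1",    "in_if": 1, "dst_port": 80, "src_as": 65001, "bytes": 500},
    {"src": "10.1.2.1", "dst": "192.0.2.2",    "in_if": 2, "dst_port": 25, "src_as": 65001, "bytes": 300},
    {"src": "10.1.1.2", "dst": "198.51.100.1", "in_if": 1, "dst_port": 80, "src_as": 65002, "bytes": 200},
]

def aggregate(records, key_fields, value_field="bytes"):
    """Merge all records that share the same key fields into one total."""
    totals = defaultdict(int)
    for record in records:
        key = tuple(record[field] for field in key_fields)
        totals[key] += record[value_field]
    return dict(totals)

# One stored record set, three aggregations for three consumers.
per_interface = aggregate(records, ["in_if"])
per_application = aggregate(records, ["dst_port"])
per_source_as = aggregate(records, ["src_as"])
```

Each result has fewer entries than the input, but the detail that was not part of the key (for example, the individual source addresses) is no longer recoverable from it.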
The following are some generic examples of the Cisco NetFlow Collector's aggregation schemes, separating key fields and value fields:
Description: A HostMatrix aggregation scheme creates one record for each unique source and destination IP address pair during the collection period.
Key fields: IP source address, IP destination address, potentially the QoS (ToS or DSCP)
Value fields: packet count, byte count, flow count
Description: The output of a CallRecord aggregation scheme consists of one record for each unique combination of source IP address, destination IP address, source port, destination port, protocol type, and type of service during the collection period.
Key fields: IP source address, IP destination address, source port number, destination port number, protocol byte, the QoS (ToS or DSCP)
Value fields: packet count, byte count, flow count, first time stamp, last time stamp, total active time
Description: The output of a DetailASMatrix aggregation scheme consists of one record for each unique combination of source IP address, destination IP address, source port, destination port, protocol, input interface, output interface, source autonomous system number, and destination autonomous system number during the collection period.
Key fields: IP source address, IP destination address, source port number, destination port number, protocol, input interface, output interface, source AS number, destination AS number
Value fields: packet count, byte count, flow count
Aggregating records based on key fields is very useful for the mediation device, because it merges multiple data sets into one, which reduces the required storage capacity for the records. If data should be stored for longer periods, such as months or years, an additional data reduction mechanism is required to condense the total volume of stored records. Performance management applications analyze and identify traffic trends in the network, which requires data records from the last several months or, ideally, years. Keeping all the details from a daily record set for years is neither necessary nor economical; therefore, time-based aggregation is applied to older data sets. By aggregating hourly collection records into a daily set, 24 detailed records are aggregated into one summary record for the day. Aggregation over time should be applied to all data collections that are stored for trend reporting and other applications. As a rule of thumb, keep detailed records for the maximum time that is useful for troubleshooting (for example, one month), and aggregate to longer periods afterwards.
Time-based aggregation can be applied to all collected data sets, independent of the technology, because the collection or aggregation occurs during a fixed time interval, usually between 5 and 30 minutes. These intervals can be increased to reduce the total number of records. RFC 1857 suggests the following aggregation periods:
Over a 24-hour period, aggregate to 15 minutes
Over a 1-month period, aggregate to 1 hour
Over a 1-year period, aggregate to 1 day
Tables 2-29 and 2-30 exemplify aggregation over time. Table 2-29 consists of 12 entries; each represents a 5-minute interval of an SNMP interface counter. These records should be aggregated from the 5-minute interval into a 1-hour interval; therefore, 12 records are consolidated into one, as shown in Table 2-30.
|Entry Number||Interface Number (Ingress)||Number of Bytes||Interface Number (Egress)||Number of Bytes|
|Entry Number||Interface Number (Ingress)||Number of Bytes||Interface Number (Egress)||Number of Bytes|
An advanced function of time-based aggregation is the definition of aggregation thresholds, which means that time-based aggregation applies only to data sets in which the data values are within a defined normal range. Records that exceed the thresholds are not aggregated, which provides the operator with additional information about abnormal situations even after the data sets have been aggregated.
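Time-based aggregation with an aggregation threshold can be sketched as follows. The twelve 5-minute records, their byte counts, and the threshold of 1000 bytes are hypothetical, and the sketch only treats values above an upper limit as abnormal.

```python
def aggregate_hour(five_minute_records, max_normal_bytes):
    """Sum in-range 5-minute records into one hourly record; keep outliers as they are."""
    normal = [r for r in five_minute_records if r["bytes"] <= max_normal_bytes]
    outliers = [r for r in five_minute_records if r["bytes"] > max_normal_bytes]
    hourly = {"intervals": len(normal), "bytes": sum(r["bytes"] for r in normal)}
    return hourly, outliers

# Twelve hypothetical 5-minute counter deltas for one interface; one is abnormal.
five_minute = [{"start_minute": 5 * i, "bytes": 100} for i in range(12)]
five_minute[7]["bytes"] = 5000

hourly, outliers = aggregate_hour(five_minute, max_normal_bytes=1000)
```

The hourly record replaces eleven detailed records, while the single abnormal interval stays available at its original 5-minute resolution.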
Another task at the mediation layer is correlating information from different metering sources to enrich the data records. Here are some examples:
Augmenting the NetFlow records with logical VPN information, to assign the records to a customer
Gathering the call data record from multiple sources (call leg correlation)
Modifying a data record by correlating the record details with DNS information to replace an IP address with a username
Grouping information from different sources into a common data record is a clear benefit for upper-layer applications, such as billing, which can retrieve enriched data sets instead of very basic sets that need correlation afterwards.
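The first example, assigning flow records to a customer through VPN information, can be sketched as a lookup against a second data source. The lookup tables, the `exporter` and `in_if` field names, and the VPN and customer names are all hypothetical; in practice this information would come from provisioning or inventory systems.

```python
# Hypothetical lookup data from a provisioning source.
vpn_by_interface = {("pe1", 3): "vpn-blue", ("pe1", 4): "vpn-red"}
customer_by_vpn = {"vpn-blue": "Customer A", "vpn-red": "Customer B"}

def enrich(flow):
    """Return a copy of the flow record with VPN and customer fields added."""
    vpn = vpn_by_interface.get((flow["exporter"], flow["in_if"]))
    return {**flow, "vpn": vpn, "customer": customer_by_vpn.get(vpn)}

flows = [
    {"exporter": "pe1", "in_if": 3, "bytes": 1500},
    {"exporter": "pe1", "in_if": 9, "bytes": 700},   # no VPN known for this interface
]
enriched = [enrich(f) for f in flows]
```

Records that cannot be correlated keep `None` in the added fields instead of being dropped, so the billing application can decide how to handle them.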
The next mediation task is flow de-duplication, which is specific to flow-based technologies such as NetFlow. If NetFlow records are collected at several locations in the network, chances are great that one flow will be collected multiple times at different metering locations. Duplicate records lead to inaccurate results at the application level; therefore, these duplications need to be eliminated. The de-duplication algorithm identifies a flow's constant parameters. Parameters that change per hop, such as ingress and egress interface and next-hop address, are not considered. The time stamp per accounting record is very relevant, because duplicate flows occur within a short time interval. If multiple flows have the same constant parameters and were received from multiple devices within a defined time window, they can be considered duplicates, and all flows except one are eliminated. The following steps are performed:
Identify common flow parameters, such as source and destination address, port numbers, AS number, ToS/DSCP fields, and others
Check the time stamps
Associate the information and eliminate duplicate flows
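The three steps above can be sketched as follows. The choice of constant fields, the `exporter` and `timestamp` field names, and the 5-second window are assumptions for illustration; a production algorithm would need to tune the window and handle clock differences between devices.

```python
# Fields assumed constant along the path; per-hop fields (interfaces, next hop) are ignored.
CONSTANT_FIELDS = ("srcaddr", "dstaddr", "srcport", "dstport", "prot", "tos")

def deduplicate(flows, window_seconds=5):
    """Keep one record per flow when several meters report it within the time window."""
    kept = []
    kept_by_key = {}  # constant-field key -> most recent record kept for that key
    for flow in sorted(flows, key=lambda f: f["timestamp"]):
        key = tuple(flow[field] for field in CONSTANT_FIELDS)
        previous = kept_by_key.get(key)
        if (previous is not None
                and flow["exporter"] != previous["exporter"]
                and flow["timestamp"] - previous["timestamp"] <= window_seconds):
            continue  # same flow already reported by another meter
        kept_by_key[key] = flow
        kept.append(flow)
    return kept

web = {"srcaddr": "10.1.1.5", "dstaddr": "192.0.2.10", "srcport": 40000,
       "dstport": 80, "prot": 6, "tos": 0}
mail = {**web, "dstport": 25}

flows = [
    {**web, "exporter": "r1", "in_if": 1, "timestamp": 100},
    {**web, "exporter": "r2", "in_if": 7, "timestamp": 101},   # duplicate seen one hop later
    {**mail, "exporter": "r2", "in_if": 7, "timestamp": 100},  # different flow
    {**web, "exporter": "r1", "in_if": 1, "timestamp": 400},   # same parameters, much later
]
unique = deduplicate(flows)
```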
Finally, yet importantly, the processed data records are stored in a database and made available to other applications. Records have to describe usage type details, such as keys and values, where a key links to an index in a database table. A common data format definition protects the NMS and OSS applications from the variety of accounting formats that are implemented at the device level. Because these device formats change regularly, applications would need to adapt to all of these changes; instead, the mediation device shields them by providing consistent formats. The location where the records are stored can be a simple flat file system, where a new folder is created for each device and subfolder per interface, with separate text files for each aggregation scheme and interval. Alternatively, the data store can be a complex relational database system that provides sophisticated operations for local and remote access. In both cases, the format in which data records are stored should be consistent among various vendors. Consistency requires either a standard defined by a standards organization or a "de facto" standard, which is a commonly used method across the industry. Today, several different standards describe in detail the format in which the data sets are saved, as discussed in Chapter 3, "Accounting and Performance Standards and Definitions." A summary of the record specification "Network Data Management Usage (NDM-U)" from the IPDR organization serves as a useful example.
The records are implemented as XML schemas with self-defining field attributes, including five major attributes: who, what, where, when, and why. Here's a brief summary of NDM-U:
Who?— Responsible user ID
When?— Time when the usage took place
What?— Service description and consumed resources
Where?— Source and destination ID
Why?— Reason for reporting the event
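To make the five attributes concrete, the following sketch serializes one usage record as XML. The element names (`UsageRecord`, `Who`, and so on) and the sample values are hypothetical and are not taken from the NDM-U schema; the sketch only illustrates the idea of a self-describing record.

```python
import xml.etree.ElementTree as ET

def usage_record_xml(who, when, what, where, why):
    """Serialize one record; element names are illustrative, not the NDM-U schema."""
    root = ET.Element("UsageRecord")
    for tag, value in (("Who", who), ("When", when), ("What", what),
                       ("Where", where), ("Why", why)):
        ET.SubElement(root, tag).text = value
    return ET.tostring(root, encoding="unicode")

xml_text = usage_record_xml(
    who="user42",
    when="2007-06-01T10:00:00Z",
    what="video-stream, 1500000 bytes",
    where="10.1.1.5 -> 192.0.2.10",
    why="session-end",
)
parsed = ET.fromstring(xml_text)
```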
Figure 2-27 summarizes all mediation device functions in a flow chart. As mentioned, some tasks, such as threshold monitoring and aggregation over time, can be applied at the upper-layer application level instead of the mediation device.