Chapter 1. Understanding the Need for Accounting and Performance Management
This chapter defines the foundation for this book and answers the following general questions:
What is accounting management?
What is performance management?
What is the relationship between accounting and performance management?
Why do networks require accounting and performance management?
Why is accounting almost a stealth area within network management?
Which problems do accounting and performance management solutions solve?
How can the business use this information for network planning, redesign, and billing?
What aspects make up accounting and performance monitoring (data collection, data analysis, reporting, billing, and so on)?
By the end of this chapter, you will be able to grasp the basic concepts of accounting and performance management, distinguish the two areas, and apply the relevant part of both technologies to network design and applications.
During the last decade, the Internet has changed our ways of communicating more than anything else. The Internet is almost ubiquitous today, and we take connectivity for granted until for some reason we cannot connect. At that point, we suddenly feel isolated. These days we expect Internet connectivity to be available anytime, anywhere. Most of us realize that this is impossible without intelligent systems managing the network. This leads us to technologies, processes, and applications in the area of Network Management and Network Management Systems and Operations Support Systems (NMS-OSS). NMS was a set of niche applications for quite some time, until businesses realized that their performance depended on the network. Then, suddenly, network downtime became a business issue instead of just a minor problem. Therefore, notions such as service level agreements (SLA) are imposed on the network to support specific business application requirements.
Nobody questions the need for fault and security management these days, and there is obviously a need for performance statistics, but still some questions are left open: "Do I really need accounting?" "Is accounting the same as billing?" "What can accounting do for me?" In this chapter, you will find answers to these questions and understand how accounting relates to performance management.
In a nutshell, accounting describes the process of gathering usage data records at network devices and exporting those records to a collection server, where processing takes place. Then the records are presented to the user or provided to another application, such as performance management, security management, or billing.
An example is collecting usage records to identify security attacks based on specific traffic patterns or measuring which applications consume the most bandwidth in the network.
This book focuses on accounting, but because accounting is so closely related to performance, this chapter also discusses performance aspects in detail and identifies how accounting and performance can be used together to support each other. Because many more networks have deployed performance management than accounting solutions, this chapter starts with a deeper inspection of accounting before addressing the performance area, where you will learn the relationship between performance management and service level agreements. The objective is to enable you to distinguish between accounting, performance management, service level monitoring, and fault management. This chapter briefly introduces management standards and concepts to help you understand common areas and demarcations between accounting and performance management. Chapter 3, "Accounting and Performance Standards and Definitions," describes these concepts in more detail and also describes the roles of the various standards bodies, along with their main objectives and directions.
Most network administrators ask themselves whether they need accounting when looking at the Fault, Configuration, Accounting, Performance, and Security (FCAPS) management model. The FCAPS model is an international standard defined by the International Telecommunication Union (ITU) that describes the various network management areas.
The FCAPS model was chosen as a structure even though other models exist, such as FAB and eTOM (which are introduced in Chapter 3). The advantage of the FCAPS model is that it clearly distinguishes between accounting and performance.
Table 1-1 describes the main objectives of each functional area in the FCAPS model.
To identify similarities and differences between accounting and performance management, we will first define both terms. Unfortunately, there is no exclusive definition of accounting, which sometimes results in different groups meaning different things. The following definitions are the most common ones used to describe accounting. Note that this section is purely theoretical; however, the objective of this book is to provide a comprehensive perspective on accounting and performance management, which makes this part relevant!
Defining Accounting Management
When searching for a common definition of accounting management, you will discover that several definitions exist. No single description is used for all applications. This might sound surprising, but because accounting received limited attention in the past, it is understandable. The authors decided to first present the various diverging definitions to help you understand the root of the problem. Then, the authors present their own definition of accounting management, which is used consistently throughout this book. This section investigates the definitions of accounting management from the ITU (http://www.itu.int/home/index.html), the TeleManagement Forum (TMF, http://www.tmforum.org), and the Internet Engineering Task Force (IETF, http://www.ietf.org).
ITU-T definition (M.3400 and X.700, Definitions of the OSI Network Management Responsibilities):
"Accounting management enables charges to be established for the use of resources in the OSIE [Open Systems Interconnect Environment], and for costs to be identified for the use of those resources. Accounting management includes functions to:
"inform users of costs incurred or resources consumed;
"enable accounting limits to be set and tariff schedules to be associated with the use of resources;
"enable costs to be combined where multiple resources are invoked to achieve a given communication objective."
The TMF refers to the ITU-T accounting definition (M.3400) and provides additional details for billing in the enhanced Telecom Operations Map (eTOM), The Business Process Framework, Document GB921.
The Fulfillment, Assurance, and Billing (FAB) model of TMF's eTOM positions the "Network Data Management" building block between assurance and billing. "Network Data Management: this process encompasses the collection of usage data and network and information technology events and data for the purpose of network performance and traffic analysis. This data may also be an input to Billing (Rating and Discounting) processes at the Service Management Layer, depending on the service and its architecture." Chapter 3 explains the FAB model in more detail.
Most TMF documents are available free of charge to TMF members. Nonmembers can purchase them on the TMF's website at http://www.tmforum.org.
The informational Request For Comments (RFC) 2975, Introduction to Accounting Management, gives the following definition of accounting: "The collection of resource consumption data for the purposes of capacity and trend analysis, cost allocation, auditing, and billing. Accounting management requires that resource consumption be measured, rated, assigned, and communicated between appropriate parties."
As you can see from the preceding definitions, there is no universal definition of accounting management. The ITU-T addresses primarily the charging of resource usage, which is a major requirement for every service provider, but we think that this definition is too limited. Even though the IETF definition is not a standard but only an informational RFC, it comes closest to our view of "accounting management." To overcome the numerous definitions and the limitations of some of them, we decided to develop our own definition.
In this book, we use the term accounting management to describe the following processes:
Collecting usage data records at network devices
Optionally preprocessing data produced by the device (for example, filter, sample, aggregate)
Exporting the data from the device toward a collection server
Processing the data at the collection server (for example, filter, sample, aggregate, de-duplicate)
Converting usage records into a common format to be used by higher-layer applications (for example, performance, SLA, fault, security, billing, planning, and so on): the mediation procedure
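The five steps above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the `UsageRecord` fields, the function names, and the sample addresses are hypothetical and do not correspond to any particular export protocol or product. De-duplication is simplified to "same flow with the same byte count reported by more than one device."

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    # All fields are hypothetical; real export formats carry many more.
    exporter: str  # device that generated the record
    src: str
    dst: str
    app: str
    octets: int


def preprocess(records, min_octets=0):
    """Device-side preprocessing: filter out records below a size limit."""
    return [r for r in records if r.octets >= min_octets]


def deduplicate(records):
    """Collector-side processing: keep one copy of a flow seen by several devices."""
    seen, unique = set(), []
    for r in records:
        key = (r.src, r.dst, r.app, r.octets)  # simplified flow identity
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


def aggregate_by_app(records):
    """Collector-side processing: sum the octets per application."""
    totals = defaultdict(int)
    for r in records:
        totals[r.app] += r.octets
    return dict(totals)


def mediate(totals):
    """Mediation: convert totals into one common format for higher-layer applications."""
    return [{"application": app, "octets": octets}
            for app, octets in sorted(totals.items())]


records = [
    UsageRecord("rtr1", "10.0.0.1", "10.0.1.1", "http", 5000),
    UsageRecord("rtr2", "10.0.0.1", "10.0.1.1", "http", 5000),  # same flow, second device
    UsageRecord("rtr1", "10.0.0.2", "10.0.1.1", "dns", 200),
    UsageRecord("rtr1", "10.0.0.3", "10.0.1.9", "ftp", 40),     # removed by the filter
]

report = mediate(aggregate_by_app(deduplicate(preprocess(records, min_octets=100))))
print(report)
```

The same `report` structure could be handed to a billing, planning, or security application, which is the point of separating record generation and processing from the applications that consume the records.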
Figure 1-3 illustrates the use of accounting management for multiple applications. Notice the functions of the different layers and the distinction between record generation and processing (such as data collection, exporting, and aggregation) and the applications that will use the records.
Figure 1-3. Accounting Management Architecture
The temptation is great to relate accounting exclusively to billing, because the dictionary (Merriam-Webster's) characterizes accounting as follows:
"The system of recording and summarizing business and financial transactions and analyzing, verifying, and reporting the results."
This definition does not even mention the term "billing," but it indirectly relates accounting to billing. Nevertheless, in our mind, accounting is different from billing, because billing is just one of the applications that leverage accounting.
Defining Performance Management
Although there is no unique definition of accounting management, as just described, performance management at least has more parts that are common across the various definitions. The authors decided to take the same approach as for accounting and investigate the definitions from the ITU-T and TMF. Note that none of the IETF RFCs cover the definition of performance management. Finally, the authors decided to create their own definition, which is used throughout this book.
ITU-T definition (M.3400 and X.700, Definitions of the OSI Network Management Responsibilities):
"Performance Management provides functions to evaluate and report upon the behavior of telecommunication equipment and the effectiveness of the network or network element. Its role is to gather and analyze statistical data for the purpose of monitoring and correcting the behavior and effectiveness of the network, network elements, or other equipment and to aid in planning, provisioning, maintenance and the measurement of quality.
"Performance management includes functions to:
"a) gather statistical information
"b) maintain and examine logs of system state histories
"c) determine system performance under natural and artificial conditions
"d) alter system modes of operation for conducting performance management activities"
TMF definition: The TMF defines performance and SLA management in the context of assurance. The assurance process is responsible for the execution of proactive and reactive maintenance activities to ensure that services provided to customers are continuously available and perform to SLA or quality of service (QoS) performance levels. It performs continuous resource status and performance monitoring to detect possible failures proactively, and it collects performance data and analyzes it to identify potential problems and resolve them without affecting the customer. This process manages the SLAs and reports service performance to the customer. Related documents are TMF 701, Performance Reporting Concepts & Definitions; TMF GB917, SLA Management Handbook, which also refers to ITU M.3010; and the FAB model of the eTOM. Performance collection is part of the network data management process, according to the TMF:
"This process encompasses the collection of usage data and network and information technology events and data for the purpose of network performance and traffic analysis."
As you see from these definitions, performance management does not have a unique classification. ITU-T's definition addresses primarily the behavior and effectiveness of the telecommunication equipment and network, but we feel that this definition is too limited, because it does not address the supervision of the services running on top of the network. The TMF definition describes the notion of service-level management in detail but only briefly mentions the need for network element monitoring. To overcome the numerous slightly different definitions, we decided to develop our own definition.
At this point, we can distinguish between performance monitoring and performance management.
Performance monitoring collects device-, network-, and service-related parameters and reports them via a graphical user interface, log files, and so on. Performance management builds on these data collections but goes one step further by actively notifying the administrator and reconfiguring the devices if necessary. An example is the data collection for SLAs. Performance monitoring and accounting would only collect the data and store it at a collection point. Performance management would analyze the data and compare it against predefined thresholds and service definitions. In the case of a service level violation, it would then send a trouble ticket to a fault application or reconfigure the device, for example, by filtering best-effort traffic or increasing the committed access rate.
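The distinction can be made concrete with a minimal sketch. The service names, the response-time thresholds, and the `open_trouble_ticket` action label are assumptions chosen for illustration; a real deployment would take them from its own service definitions.

```python
def monitor(samples):
    """Performance monitoring: collect and store, nothing more."""
    return list(samples)


def manage(stored, sla_ms):
    """Performance management: compare stored data with thresholds and act."""
    actions = []
    for service, response_ms in stored:
        limit = sla_ms.get(service)
        if limit is not None and response_ms > limit:
            # Hypothetical action; it could equally be a device reconfiguration.
            actions.append(("open_trouble_ticket", service))
    return actions


# Hypothetical measurements: (service, response time in milliseconds)
stored = monitor([("voice", 120), ("web", 480)])

# Hypothetical service level definitions, in milliseconds
actions = manage(stored, sla_ms={"voice": 150, "web": 400})
print(actions)
```

Only the second function makes a decision; the first would behave the same way whether or not an SLA exists.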
This book uses the term performance management to describe the following processes:
Performance monitoring— Collecting network activities at the device level for the sake of
- Device-related performance monitoring
- Network performance monitoring
- Service performance monitoring
Monitoring subtasks include
- Availability monitoring
- Response time reporting
- Monitoring utilization (link, device, CPU, network, service, and so on)
- Ensuring accuracy of the collected data
- Verification of quality-of-service parameters
- Data aggregation
Data analysis— Baselining and reporting
Data analysis subtasks include
- Network and device traffic characterization and analysis functions
- Capacity analysis
- Traffic forecasting
Performance management— Whereas monitoring only observes activities in the network, management modifies device configurations. In the case of performance management, this means adjusting configurations to improve the network's performance and traffic handling (threshold definitions, capacity planning, and so on).
Management subtasks include
- Ensuring compliance of SLAs and class-of-service (CoS) policies and guarantees
- Defining thresholds
- Sending notifications to higher-level applications
- Adjusting configurations
- Quality assurance
Figure 1-4 shows the performance management architecture. When comparing it to the accounting architecture, you can see several similarities, but also differences. The next section discusses both. Whereas accounting focuses on passive collection methods only, performance management can also apply active measurements. In this case, we inject synthetic traffic into the network and monitor how the network treats it.
Figure 1-4. Performance Management Architecture
The definitions of performance management just described partly mingled the management of devices and services. A further refinement of performance management identifies three subcategories:
Device-specific performance management
Network-centric performance management
Service management
Device-specific performance management considers the device in an isolated mode. The device status is almost binary: it operates correctly, or a fault occurred. Performance monitoring at the network element level can also be considered binary once thresholds are defined. For example, CPU utilization in the range of 5 to 80 percent is considered normal, link utilization should be below 90 percent, and interface errors should not exceed 1 percent. Depending on whether an established threshold is exceeded, the situation is either normal or abnormal.
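A minimal sketch of this binary classification follows, using the example thresholds from the paragraph above. The function name and the shape of the return value are illustrative assumptions.

```python
def classify_device(cpu_pct, link_util_pct, if_error_pct):
    """Return ('normal' | 'abnormal', list of violated checks)."""
    checks = {
        "cpu": 5 <= cpu_pct <= 80,     # CPU utilization between 5 and 80 percent
        "link": link_util_pct < 90,    # link utilization below 90 percent
        "errors": if_error_pct <= 1,   # interface errors not above 1 percent
    }
    violated = [name for name, ok in checks.items() if not ok]
    return ("normal" if not violated else "abnormal", violated)


print(classify_device(30, 50, 0.2))
print(classify_device(92, 95, 0.2))
```

Note that a CPU utilization below 5 percent is also flagged, because the example treats the range as normal rather than only its upper bound.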
Network-centric performance management extends the focus to a network edge-to-edge perspective. Even though all devices might appear OK from a device perspective, the overall network performance might be affected by duplex mismatches, spanning-tree errors, routing loops, and so on.
Service management addresses the level above network connectivity. A service can be relatively simple, such as the Domain Name System (DNS), or complex, such as a transactional database system. In both cases, the user expects the service to be available and have a predictable response time. Performance monitoring at the service level needs to include service monitoring as well as management functions to modify components of the service in case of failures.
The Relationship Between Accounting and Performance
This section covers both the similarities and differences between accounting management and performance management. We see a strong relationship between performance and accounting, which is reflected by some of the standard definitions. Even though a few have different descriptions (for example, the FCAPS model), the concepts are closely related. As an interesting observation, the TMF combines these two areas.
Both parts collect usage information, which can be applied to similar applications afterward, such as monitoring, baselining, security analysis, and so on. Some technologies, such as Simple Network Management Protocol (SNMP) counters, can be assigned to both performance and accounting and could lead to long theoretical discussions concerning which area they belong to.
Accounting and performance monitoring are important sources not only for performance management, but also for security management—this is another common area. A security management application can import the collected traffic information and analyze the different types of protocols, traffic patterns between source and destination, and so on. A comparison of current data sets versus a defined baseline can identify abnormal situations or new traffic patterns. In addition, the same application can collect device performance monitoring details, such as high CPU utilization. The combination of the two areas can be a strong instrument to identify security attacks almost in real time. The security example is perfectly suited to explaining the benefits of performance monitoring and accounting. Each symptom by itself (abnormal traffic or high CPU utilization) might not be critical per se, but the amalgamation of them could indicate a critical situation in the network.
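The idea that two individually harmless symptoms can add up to a critical one can be sketched as follows. The factor of 2 over the traffic baseline and the 80 percent CPU limit are arbitrary assumptions for illustration, not recommended values.

```python
def assess(traffic_pps, baseline_pps, cpu_pct, traffic_factor=2.0, cpu_limit=80):
    """Combine an accounting symptom (traffic) with a performance symptom (CPU)."""
    abnormal_traffic = traffic_pps > traffic_factor * baseline_pps
    high_cpu = cpu_pct > cpu_limit
    if abnormal_traffic and high_cpu:
        return "critical"   # both symptoms together
    if abnormal_traffic or high_cpu:
        return "warning"    # one symptom alone
    return "normal"


print(assess(traffic_pps=1000, baseline_pps=900, cpu_pct=40))
print(assess(traffic_pps=5000, baseline_pps=900, cpu_pct=40))
print(assess(traffic_pps=5000, baseline_pps=900, cpu_pct=95))
```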
From a network perspective, performance takes into account details such as network load, device load, throughput, link capacity, different traffic classes, dropped packets, and congestion, whereas accounting addresses usage data collection.
The collection interval can be considered a separation factor between accounting and performance monitoring. A data collection process for performance analysis should notify the administrator immediately if thresholds are exceeded; therefore, we need (almost) real-time collection in this case. Accounting data collection for billing does not have real-time collection requirements, except for scenarios such as prepaid billing.
History is certainly a differentiator. An accounting collection for billing purposes does not need to keep historical data sets, because the billing application does this. Performance management, on the other hand, needs history data to analyze deviation from normal as well as trending functions.
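As a rough sketch of why performance management needs history, the following compares a new sample with a baseline built from earlier samples. The three-standard-deviation rule and the sample values are assumptions for illustration; real baselining usually also accounts for time of day and day of week.

```python
from statistics import mean, pstdev


def deviates(history, current, max_sigma=3.0):
    """True if `current` lies more than `max_sigma` standard deviations from the mean."""
    avg = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        return current != avg  # flat history: any change is a deviation
    return abs(current - avg) / sigma > max_sigma


# Hypothetical link utilization history, in percent
history = [40, 42, 38, 41, 39]

print(deviates(history, 41))
print(deviates(history, 70))
```

Without the stored `history`, the value 70 could not be judged at all, which is the difference from an accounting collection that hands its data to a billing application and forgets it.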
Monitoring device utilization is another difference between the two areas. Device health monitoring is a crucial component of performance monitoring, whereas accounting management is interested in usage records. We will examine this in the following example. Imagine a normal network situation with average traffic load. Now think of a user installing "interesting" software without notifying the administrator. For example, suppose someone installs a monitoring tool and starts discovering devices in the network. Even though it is strongly recommended to secure device access and use cryptic values for SNMP communities instead of the default values "public" and "private," sometimes these suggestions are not followed. If at the same time security restrictions (such as access control lists [ACL]) are not in place, the user discovers network- and device-related details. The situation becomes critical when the user's monitoring tool collects the routing table of an Internet edge router. For example, retrieving the complete routing table of a Cisco 2600 router with 64 MB of RAM and 4000 routes takes about 25 minutes and utilizes about 30 percent of the CPU. An accounting application would not be able to identify this scenario, because the issue is not the amount of traffic transferred, but device utilization. A performance-monitoring application would identify this situation immediately and could report it to a fault application. Even this simple example illustrates the close relationship between performance and fault management as well as the fact that neither accounting nor performance management alone is a sufficient solution for network management.
Another situation would be a misconfigured link. Imagine that a logical connection between two routers was configured as a trunk of three parallel links. For troubleshooting, the administrator shut down two of the links and then solved the issue. However, imagine that he brought only one link back into operation, so that the trunk provides only two-thirds of the required bandwidth. Traffic would still go through, and accounting records would be generated, but the increased utilization of the two active links could be identified only by a performance monitoring tool.
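The effect on link utilization can be worked out with simple arithmetic. The sketch below assumes that traffic is balanced evenly across the active links, and the 120 Mbps load and 100 Mbps link capacity are made-up numbers.

```python
def per_link_utilization(offered_mbps, link_capacity_mbps, active_links):
    """Utilization in percent of each active link, assuming even load balancing."""
    return 100.0 * offered_mbps / (link_capacity_mbps * active_links)


three_links = per_link_utilization(120, 100, active_links=3)
two_links = per_link_utilization(120, 100, active_links=2)

print(three_links, two_links)  # 40.0 60.0
```

The offered load, and therefore the accounting records, are identical in both cases. Only the per-link utilization changes, rising by a factor of 3/2.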
This leads to the observation that "fault" is another differentiator between accounting and performance management. In the first example, a performance management application could send a notification to a fault application and configure an ACL at the device to stop the unauthorized SNMP information gathering. In the second example, the performance management tool could automatically activate the third link and notify the administrator. Accounting management applications do not collect device health information, such as CPU utilization, and therefore would not have identified these issues. The close relationship between performance management and fault management is the subject of other publications. For more details, refer to Performance and Fault Management (Cisco Press, 2000).
A fundamental difference between monitoring approaches is active and passive monitoring. Accounting management is always passive, and performance monitoring can be passive or active.
Passive monitoring gathers performance data by implementing meters. Examples range from simple interface counters to dedicated appliances such as a Remote Monitoring (RMON) probe. Passive measurement needs to monitor some or all packets that are destined for a device. If only a subset of packets is inspected, as opposed to a full collection, this is called sampling. In some scenarios, such as measuring response time for bidirectional communications, implementing passive measurement can become complex, because the request and response packets need to be correlated. An example is the Application Response Time (ART) Management Information Base (MIB), which extends the RMON 2 standard. ART measures delays between request/response sequences in application flows, such as HTTP and FTP, but it can monitor only applications that use well-known TCP ports. To provide end-to-end measurement, an ART probe is needed at both the client and the server end. Cisco implements the ART MIB in the Network Analysis Module (NAM).
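The correlation step that makes passive response-time measurement complex can be sketched as follows. This is not how the ART MIB is implemented; the packet tuples and flow identifiers are hypothetical, and a real meter would derive the flow identity from addresses, ports, and protocol state.

```python
def response_times(packets):
    """Match each request with the next response of the same flow.

    packets: iterable of (timestamp_ms, kind, flow_id),
             where kind is 'request' or 'response'.
    Returns {flow_id: delay_ms} for the flows that could be correlated.
    """
    pending = {}  # flow_id -> timestamp of the outstanding request
    delays = {}
    for ts, kind, flow_id in sorted(packets):
        if kind == "request":
            pending.setdefault(flow_id, ts)
        elif kind == "response" and flow_id in pending:
            delays[flow_id] = ts - pending.pop(flow_id)
    return delays


# Hypothetical observations at a single metering point
observed = [
    (0, "request", "a"),
    (10, "request", "b"),
    (45, "response", "a"),
    (130, "response", "b"),
    (200, "response", "c"),  # no matching request was seen, so it is ignored
]

print(response_times(observed))  # {'a': 45, 'b': 120}
```

The unmatched response for flow `c` shows the limitation: if the meter misses one direction of the conversation, no response time can be derived.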
The advantage of passive monitoring is that it does not interfere with the traffic in the network, so the measurement does not bias the results. This benefit can also be a limitation, because network activity is the prerequisite for passive measurement. For example, observed traffic can indicate that a phone is operational, but how do you distinguish between an operational phone that is not in use and a faulty one if neither one generates any traffic? In this case the better approach would be to send some test traffic to the phone and monitor the results or alternatively have the device send keepalives regularly.
The active monitoring approach injects synthetic traffic into the network to measure performance metrics such as availability, response time, network round-trip time, latency, jitter, reordering, packet loss, and so on. The simplicity of active measurement increases scalability because only the generated traffic needs to be analyzed. The Cisco IP SLA is an example of active monitoring. See Chapter 11, "IP SLA," for details about IP SLA.
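As a sketch of how metrics are derived from synthetic traffic, the following summarizes the results of a series of probes. It does not send any packets and does not reflect how Cisco IP SLA computes its statistics. The input is a hypothetical list of round-trip times, with `None` for an unanswered probe, and jitter is simplified to the mean absolute difference between consecutive answered probes.

```python
def probe_summary(rtts_ms):
    """Summarize probe results; rtts_ms holds one RTT per probe, None if lost."""
    if not rtts_ms:
        raise ValueError("at least one probe result is required")
    answered = [r for r in rtts_ms if r is not None]
    loss_pct = 100.0 * (len(rtts_ms) - len(answered)) / len(rtts_ms)
    avg_rtt = sum(answered) / len(answered) if answered else None
    diffs = [abs(b - a) for a, b in zip(answered, answered[1:])]
    jitter = sum(diffs) / len(diffs) if diffs else None
    return {
        "available": bool(answered),  # at least one probe was answered
        "loss_pct": loss_pct,
        "avg_rtt_ms": avg_rtt,
        "jitter_ms": jitter,
    }


idle_phone = probe_summary([20, 22, None, 26, 20])
faulty_phone = probe_summary([None, None, None])

print(idle_phone)
print(faulty_phone)
```

Because the probes are sent regardless of user activity, the idle phone and the faulty phone produce different results, which passive observation alone could not provide.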
Best current practices suggest combining active and passive measurements because they complement each other.
As you can see from the preceding sections, there are common areas and differences, but in most cases, the combination of accounting and performance management provides benefits. Let us reflect on the following situation: a network operator has performance management and accounting management in place, both of which collect usage information from the devices and store it at different collection servers. So far, the data sets have not been linked to each other, but doing so can be useful. If you want to reduce the measurement's error rate, you can collect detailed usage records (accounting management) and basic SNMP counters (performance monitoring) and compare the results. This increases the confidence of the measurement.
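Such a comparison can be sketched in a few lines. The 2 percent tolerance and all values are assumptions for illustration, and the sketch ignores counter wrap, differences in what each source counts (for example, Layer 2 headers), and misaligned collection intervals, all of which matter in practice.

```python
def cross_check(record_octets, counter_start, counter_end, tolerance_pct=2.0):
    """Compare the octets summed from usage records with an interface counter delta.

    Returns (deviation in percent, True if within tolerance).
    """
    accounted = sum(record_octets)
    counted = counter_end - counter_start
    if counted <= 0:
        raise ValueError("counter did not increase (wrap or reset not handled)")
    deviation = 100.0 * abs(accounted - counted) / counted
    return deviation, deviation <= tolerance_pct


# Hypothetical values for one interface over the same collection interval
deviation, consistent = cross_check(
    record_octets=[400_000, 350_000, 240_000],
    counter_start=5_000_000,
    counter_end=6_000_000,
)
print(deviation, consistent)  # 1.0 True
```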
Figure 1-5 illustrates the different network management building blocks.
Figure 1-5. Network Management Building Blocks
In summary, we suggest combining performance management and accounting, as well as fault and security management, to build a complete network management solution.
A Complementary Solution
These examples clearly point out that the right question is not "Why do I need accounting or performance management?" but instead "In which cases do I apply which method?" The first and most important question should always be "What is the purpose of collecting data?" When the administrator differentiates between security management, service-level management, billing, and so on by taking this top-down approach, it is straightforward to position accounting and performance management in the overall architecture.
To summarize, we want to emphasize that both performance monitoring and accounting management gather usage data used as input for various management applications. Performance management is one example of a management area that benefits from performance monitoring and accounting, but also actively modifies the network and its behavior. They are related, because without performance monitoring you operate the network blindfolded. Without accounting, you can hardly identify the cause of bottlenecks and outages identified by performance management. The intersection between the two areas is typically the network monitoring part (see the section "Network Monitoring"). This is a generic term for any data collection tasks that are common between accounting management and performance management. Figure 1-6 illustrates the overlap between the two areas.
Figure 1-6. Complementary Solution
Now that we have clearly defined accounting management, performance monitoring, and performance management, we can take a closer look at the benefits network planners and administrators can achieve from each area.