Service Notions

Performance management deals with the notion of services, among other things, as discussed in Chapter 1, "Understanding the Need for Accounting and Performance Management." The term service implies the concept of a Service Level Agreement (SLA). This is an agreement in which a service provider commits to specific SLA parameters for a particular service: uptime, guaranteed bandwidth, maximum response time, maximum jitter, maximum packet loss, and so on.

SLAs offer service providers the ability to provide additional services, implement competitive pricing schemes, and gain a competitive edge. For example, a premium price can be charged for a faster VPN connection (faster than a competitor's offering, or simply fast in absolute terms) so that users can take advantage of real-time applications. After defining the SLA parameters, the service provider is expected to verify the delivered level of service. Note that customers should also monitor the delivered service quality.

We distinguish between two categories of SLA parameters, also called metrics: intrinsic and operational. Intrinsic parameters are inherent to the service itself, whereas operational parameters depend on the procedures used to deliver and maintain the service:

  • Intrinsic— These metrics are related to the service quality and can consist of some or all of the following components:

    - Latency or network delay—The time it takes a packet to traverse from one endpoint to another.

    Network delay = propagation delay + serialization delay + queuing delay (at the device)

    - Jitter or latency variance—The variation in delay caused by queuing, load balancing, and so on. Jitter is an issue especially for voice and other real-time applications.

    - Availability of the network or a specific service

    Availability = (uptime) / (total time)

    - Throughput/available bandwidth

    - CPU utilization

    - Buffer and memory allocation (size, misses, hit ratio)

    - Packet loss—This feature requires the counting of packets between the outbound interface of the source device and the inbound interface of the destination device. Reasons for packet loss can be CRC errors, queue dropping, route changes, device outages, misconfigured access control lists, and others.

    - Out of sequence/packet reordering—This can be caused by queuing or load balancing.

    - Errors—Include failed components, devices, links, and others.

  • Operational— These metrics relate to a service's functional factors:

    - Mean Time to Provision

    - Mean Time to Restore

    - Mean Time to Repair

    - Mean Time Between Failures
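The network delay and availability formulas above can be illustrated with a short calculation. The following is a minimal sketch in Python; all sample values (a 30-day month with four minutes of outage, a 1500-byte packet on a 10-Mbps link over 1000 km of fiber) are hypothetical, not measurements:

```python
def availability(uptime_seconds: float, total_seconds: float) -> float:
    """Availability = uptime / total time, as a fraction."""
    return uptime_seconds / total_seconds

def network_delay(propagation_s: float, serialization_s: float, queuing_s: float) -> float:
    """Network delay = propagation delay + serialization delay + queuing delay."""
    return propagation_s + serialization_s + queuing_s

# A 30-day month with 4 minutes of total outage:
month = 30 * 24 * 3600
print(f"Availability: {availability(month - 240, month):.5%}")

# 1500-byte packet, 10-Mbps link, 1000 km of fiber, 2 ms of queuing delay.
ser = (1500 * 8) / 10e6    # serialization delay: bits divided by link rate
prop = 1_000_000 / 2e8     # propagation at ~200,000 km/s (light in fiber)
print(f"Network delay: {network_delay(prop, ser, 0.002) * 1000:.2f} ms")
```

Note how the serialization term dominates on slow links, whereas the propagation term dominates on long-haul paths; the queuing term is the only one that varies with load.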

Figure 3-10 summarizes the service level parameters.

Figure 3-10. Service-Level Parameters


Even though the TMN and eTOM frameworks describe the building blocks of service level management, and the TMF's GB917 specifies SLA management, service providers often offer similar services with different SLA parameters.

Most of the time, the SLA implies monetary penalties under certain circumstances—for example, no connectivity for a period of time, a too-high percentage of packet loss, or exceeding the agreed delay or jitter thresholds results in a refund to the customer. The financial implications are therefore significant. Yet TMN and eTOM leave some details open. How should you measure the intrinsic SLA parameters? Should you use active probing or passive measurement to determine delay? If active probing is the solution, which interval, frequency, and packet size should you use? In the case of passive measurement, which packet-sampling parameters should be applied? How should you deduce the statistics of the real traffic?
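As an illustration of these measurement questions, delay and jitter statistics are commonly derived from a series of one-way delay samples collected by active probing. The sketch below uses hypothetical delay values, and it pairs consecutive packets, which is the selection function most often used with the RFC 3393 delay variation metric:

```python
# Hypothetical one-way delay samples (milliseconds) from an active probe.
delays_ms = [20.1, 19.8, 25.3, 20.0, 19.9, 31.2, 20.4]

# RFC 3393-style delay variation: difference between the one-way delays
# of selected packet pairs (here: each packet and its predecessor).
variations = [b - a for a, b in zip(delays_ms, delays_ms[1:])]

mean_delay = sum(delays_ms) / len(delays_ms)
peak_jitter = max(abs(v) for v in variations)

print(f"Mean one-way delay:   {mean_delay:.2f} ms")
print(f"Peak delay variation: {peak_jitter:.1f} ms")
```

Even this toy series shows why the open questions matter: a longer probing interval could easily miss the two delay spikes entirely, and a different pair-selection function would report different jitter values.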

The ITU-T defined a set of objectives for performance service parameters in specification Y.1541, Network Performance Objectives for IP-Based Services:

"The objectives apply to public IP Networks. The objectives are believed to be achievable on common IP network implementations. The network providers' commitment to the user is to attempt to deliver packets in a way that achieves each of the applicable objectives. The vast majority of IP paths advertising conformance with Recommendation Y.1541 should meet those objectives. For some parameters, performance on shorter and/or less complex paths may be significantly better."

In summary, Y.1541 defines objectives for these parameters, and they can be considered the current best-practice definitions.

The IETF IP Performance Metric (IPPM) working group developed a set of standard metrics that can be applied to the quality, performance, and reliability of Internet data delivery services. The metrics, as defined by IPPM, do not represent a value judgment (they don't define "good" and "bad") but rather provide guidelines for unbiased quantitative measures of performance.

For voice quality, there is a specific parameter: the Mean Opinion Score (MOS). The MOS is subjective. It ranges from 5 (excellent) down to 1 (unacceptable). A typical desirable range for a voice over IP network is from 3.5 to 4.2. ITU-T Recommendation G.107 (the E-Model) provides a computational method for estimating the MOS. Chapter 15, "Voice Scenarios," describes voice management in detail.
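The E-Model produces a transmission rating factor R, which is then mapped to an estimated MOS. The following sketch implements the commonly published G.107 mapping; the sample R value of 80 is hypothetical and is often cited as the boundary of "satisfied users" territory:

```python
def r_to_mos(r: float) -> float:
    """Map a G.107 transmission rating factor R to an estimated MOS.

    Uses the published E-Model conversion:
    MOS = 1 + 0.035*R + R*(R - 60)*(100 - R)*7e-6, clamped to [1.0, 4.5].
    """
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6

print(f"R=80 -> estimated MOS {r_to_mos(80):.2f}")
```

Note that even a perfect R of 100 maps to a MOS of 4.5, not 5: the estimate never reaches "excellent," which matches the typical 3.5-to-4.2 target range quoted above.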

Most of the time, you have to rely on best practices, and these rely on some basic statistics.

Measuring SLAs requires a basic understanding of statistics to analyze performance data and predict future network performance:

  • Average or mean— The sum of the collected values divided by the number of values.

  • Mode— The value that occurs most frequently in a distribution.

  • Median— The middle value of the distribution when the values are sorted.

  • Variance— A measure of how spread out a distribution is.

  • Standard deviation— The square root of the variance. It measures the spread of the data around the mean value; in the context of SLA metrics, a smaller standard deviation indicates more consistent performance.

  • nth percentile— A value at the nth percentile is greater than or equal to n percent of the values in a given set. For example, a response-time SLA defined as a 90th percentile of 100 ms implies that 90 percent of the network traffic has a latency of 100 ms or less. Using a percentile excludes "extreme" values caused by abnormal situations or collection errors.
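These statistics can be computed directly with Python's standard library; only the percentile needs a small helper. The latency samples below are hypothetical, and the helper uses the nearest-rank convention (an assumption on our part—many tools interpolate between ranks and report slightly different values):

```python
import statistics

# Hypothetical latency samples (ms); the last two are outliers.
latencies_ms = [12, 15, 15, 18, 20, 22, 25, 30, 95, 110]

print("mean:  ", statistics.mean(latencies_ms))
print("median:", statistics.median(latencies_ms))
print("mode:  ", statistics.mode(latencies_ms))
print("stdev: ", round(statistics.stdev(latencies_ms), 1))

def percentile(samples, n):
    """Nearest-rank nth percentile: smallest value >= n% of the samples."""
    ordered = sorted(samples)
    rank = max(1, -(-n * len(ordered) // 100))  # ceiling of n% of the count
    return ordered[rank - 1]

# The outliers (95 and 110 ms) inflate the mean but not the 80th percentile:
print("80th percentile:", percentile(latencies_ms, 80))
```

This example also shows why percentile-based SLAs are popular: the two outliers pull the mean to 36.2 ms, well above the 21-ms median, while the 80th percentile (30 ms) still reflects typical behavior.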

Table 3-11 lists some IPPM standards-track RFCs.

Table 3-11. IPPM References

  • RFC 2678 (Standard)—IPPM Metrics for Measuring Connectivity: Defines a series of metrics for connectivity between a pair of Internet hosts.

  • RFC 2679 (Standard)—A One-way Delay Metric for IPPM: Defines a metric for one-way delay of packets across Internet paths.

  • RFC 2680 (Standard)—A One-way Packet Loss Metric for IPPM: Defines a metric for one-way packet loss across Internet paths.

  • RFC 2681 (Standard)—A Round-trip Delay Metric for IPPM: Defines a metric for round-trip delay of packets across Internet paths.

  • RFC 3393 (Standard)—IP Packet Delay Variation Metric for IPPM: Refers to a metric for variation in delay of packets across Internet paths.

  • RFC 3432 (Standard)—Network Performance Measurement with Periodic Streams: Describes a periodic sampling method and relevant metrics for assessing the performance of IP networks.


For more information on statistics, refer to Probability and Statistics by Spiegel, Schiller, and Srinivasan (McGraw-Hill Professional, 2001) or other statistical books.



Part II: Implementations on the Cisco Devices