Measuring and Improving Performance

No matter what the application, architects and developers must always consider the performance implications of their solutions. The Compact Framework is no exception, so the rest of this chapter looks at the developer's challenge of measuring and improving performance in Compact Framework applications.

Specifically, this discussion covers how to measure performance accurately and how to enable the performance statistics built into the Compact Framework. It then looks at some common performance issues and finally offers some general tips for improving performance.

Measuring Performance

In order to assess the design decisions made when implementing an application, an accurate baseline must first be created. The time-honored way of doing this involves implementing a timing mechanism and then instrumenting the code in the application to use the timer.

The most accurate technique available to developers is a high-resolution timer. The Compact Framework does not provide one, so using it requires calling two unmanaged API functions, QueryPerformanceFrequency and QueryPerformanceCounter. QueryPerformanceCounter simply returns a count, and the rate at which that count increases depends on the processor. For that reason, QueryPerformanceFrequency returns the number of counts per second, known as the frequency. A developer calls QueryPerformanceCounter at the beginning and end of the code block or method and subtracts the first count from the second to get the processor-dependent duration. Dividing that duration by the frequency then yields a value in seconds. Because many measurements are short, the code should also allow for conversion to milliseconds.
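The arithmetic described above can be checked in isolation as a pure function of two raw counter readings and the frequency. The CounterMath class and its method names are hypothetical, and any readings passed to it in this sketch are invented values, not measurements. Scaling the difference by 1000 before dividing keeps integer precision. Dividing the frequency by 1000 first would truncate it, and a frequency below 1000 would truncate to zero.

```csharp
using System;

// Hypothetical helper (not part of any framework): converts two raw
// performance-counter readings into elapsed time.
static class CounterMath
{
    // Multiply by 1000 before dividing so sub-millisecond precision in the
    // frequency is not thrown away by integer division.
    public static long ElapsedMilliseconds(long startCount, long stopCount, long frequency)
    {
        if (frequency <= 0)
            throw new ArgumentOutOfRangeException("frequency");
        return (stopCount - startCount) * 1000 / frequency;
    }

    // Elapsed seconds as a floating-point value: counts divided by counts per second.
    public static double ElapsedSeconds(long startCount, long stopCount, long frequency)
    {
        if (frequency <= 0)
            throw new ArgumentOutOfRangeException("frequency");
        return (double)(stopCount - startCount) / frequency;
    }
}
```

For example, with a hypothetical frequency of 1,500 counts per second, a difference of 3 counts is 3 * 1000 / 1500 = 2 ms. Dividing the frequency first (1500 / 1000 = 1 in integer arithmetic) would instead report 3 ms.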

To encapsulate the timing mechanism, a class like PerfTimer, shown in Listing 11-3, can be created. This class declares the two API functions exported from Coredll.dll. In this implementation, a static constructor retrieves the frequency by calling QueryPerformanceFrequency, and two methods, Start and Stop, bracket the time measurement. At the beginning of the code block to be measured, the developer calls the Start method, which stores the count returned from QueryPerformanceCounter. At the end of the code block, the developer calls the Stop method, which returns a 64-bit integer holding the number of milliseconds that have elapsed since Start was called.

Listing 11-3 A PerfTimer Class. This class wraps the timer functionality needed to instrument a Compact Framework application.

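using System;
using System.Runtime.InteropServices;

namespace Atomic.CeUtils
{
  // Wraps the Windows CE high-resolution timer exported by coredll.dll.
  public class PerfTimer
  {
    [DllImport("coredll.dll")]
    extern static int QueryPerformanceCounter(ref long perfCounter);

    [DllImport("coredll.dll")]
    extern static int QueryPerformanceFrequency(ref long frequency);

    // Counts per second; the same for every instance, so it is static.
    static private Int64 m_frequency;
    // Counter value captured by the most recent call to Start.
    private Int64 m_start;

    // Static constructor to initialize the frequency once.
    static PerfTimer()
    {
      // A zero return (or zero frequency) means no usable counter.
      if (QueryPerformanceFrequency(ref m_frequency) == 0 || m_frequency == 0)
      {
        throw new ApplicationException("High-resolution timer not available.");
      }
    }

    public void Start()
    {
      if (QueryPerformanceCounter(ref m_start) == 0)
      {
        throw new ApplicationException("QueryPerformanceCounter failed.");
      }
    }

    // Returns the number of milliseconds elapsed since Start was called.
    public Int64 Stop()
    {
      Int64 stop = 0;
      if (QueryPerformanceCounter(ref stop) == 0)
      {
        throw new ApplicationException("QueryPerformanceCounter failed.");
      }
      // Scale to milliseconds before dividing; dividing the frequency
      // by 1000 first would truncate it and lose precision.
      return (stop - m_start) * 1000 / m_frequency;
    }
  }
}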

Once this class has been added to a project, it is simple to use in an application. There are three steps: (1) declare and instantiate a PerfTimer, (2) call the Start method at the beginning of the test, and (3) call the Stop method at the end of the test, storing the returned long value.

Atomic.CeUtils.PerfTimer timer = new Atomic.CeUtils.PerfTimer();
timer.Start();
DoSomething();
long lDur = timer.Stop();
MessageBox.Show("DoSomething executed in " + lDur + "ms");

Enabling Performance Statistics

Although using the PerfTimer class shown in Listing 11-3 allows an application to profile its own code, it would also be nice to see performance statistics related to the Compact Framework itself, analogous to the Performance Counter statistics available on the desktop through the Performance Monitor utility. Fortunately, this can be accomplished by enabling the Compact Framework performance statistics.

To enable the statistics, a developer need only create the HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\.NETCompactFramework\PerfMonitor registry key on the device. Under this key, a DWORD value named Counters must be created. To toggle statistics gathering, set the value to 1 (on) or 0 (off).
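As a compact reference, the key and value described above can be written in the standard Windows registry (.reg) file notation. This is a sketch of the setting only. Whether such a file can be imported onto a particular device depends on the registry-editing tool available, which is an assumption here rather than something the Compact Framework provides.

```ini
REGEDIT4

; Turns Compact Framework performance statistics on (use 00000000 to turn them off).
[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\.NETCompactFramework\PerfMonitor]
"Counters"=dword:00000001
```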

After enabling the statistics, the developer can run a managed program and then terminate its process. (Note that on the Pocket PC, tapping the X in the upper right-hand corner does not terminate the process.) When the process exits, the report is generated in a file named Mscoree.stat in the root directory of the device.

TIP

It's important not to start any other managed programs while capturing statistics, or the statistics may be corrupted.

Table 11-1. Compact Framework Performance Statistics
EE start-up time	Bytes in use after full collection
Total program runtime	Time in full collection
Peak bytes allocated	GC number of application-induced collections
Number of objects allocated	GC latency time
Bytes allocated	Bytes JITed
Number of simple collections	Native bytes JITed
Bytes collected by simple collection	Number of methods JITed
Bytes in use after simple collection	Bytes pitched
Time in simple collect	Number of methods pitched
Number of compact collections	Number of exceptions
Bytes collected by compact collections	Number of calls
Bytes in use after compact collection	Number of virtual calls
Time in compact collect	Number of virtual-call cache hits
Number of full collections	Number of PInvoke calls
Bytes collected by full collection	Total bytes in use after collection

The Mscoree.stat file captures the statistics listed in Table 11-1.

Working with the performance counters can be automated by creating an application that can toggle the registry key and display the statistics. Such a utility is discussed in our white paper (along with the meanings of some of the counters listed in Table 11-1) on Compact Framework performance referenced in the "Related Reading" section at the end of this chapter.

Performance Issues

Using the PerfTimer and the performance statistics allows developers to test the performance of their specific code and its impact on the Compact Framework. In addition, we have run a number of general performance tests and documented them in our white paper. The most important of these tests and their general conclusions are reproduced here.

Data Binding

In one particular test, both large and small objects in a 100-element array were used to test the performance of automatic and manual data binding to a ListBox control. In our tests, manual data binding of the objects in a loop was approximately three times faster than using the DataSource property of the ListBox. This result should serve to warn developers to test their applications using a variety of techniques, instead of defaulting to the one that is simplest to code.

Handling XML

In a second test, we looked at the differences between using an XmlTextReader and an XmlDocument object to parse both small and large XML documents on the device. Not surprisingly, our tests confirmed that for simply loading and enumerating a document of any size, the XmlTextReader is faster, and as the size of the document increases, the advantage increases as well (from 1.5 times faster to almost 4 times faster). This indicates that unless developers need to hold an XML document in memory, the XmlTextReader is the better choice.
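The two approaches can be contrasted with a minimal sketch that counts the elements in a document both ways. XmlTextReader makes a single forward-only pass and builds no tree, whereas XmlDocument loads the entire document into memory before it can be walked. The XmlCounting class and its method names are hypothetical. The sketch illustrates the programming models only and does not reproduce the timings reported above.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper contrasting streaming and DOM-based XML parsing.
static class XmlCounting
{
    // Forward-only, read-once scan: nodes are visited as they are parsed,
    // and no in-memory tree is built.
    public static int CountElementsWithReader(string xml)
    {
        int count = 0;
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        try
        {
            while (reader.Read())
            {
                // Empty elements such as <item/> are reported once as Element.
                if (reader.NodeType == XmlNodeType.Element)
                    count++;
            }
        }
        finally
        {
            reader.Close();
        }
        return count;
    }

    // Loads the whole document into a DOM tree first, then walks the tree.
    public static int CountElementsWithDocument(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        return CountElements(doc.DocumentElement);
    }

    // Counts this element plus all descendant elements.
    static int CountElements(XmlNode node)
    {
        int count = 1;
        foreach (XmlNode child in node.ChildNodes)
        {
            if (child.NodeType == XmlNodeType.Element)
                count += CountElements(child);
        }
        return count;
    }
}
```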

XML Web Services and SqlClient

In a third test, we looked at the relative performance of retrieving data with the SqlClient .NET data provider versus an XML Web Service. The Web Service involves creating SOAP messages and sending them over HTTP, whereas SqlClient communicates with the server directly over TCP/IP. As a result, SqlClient outperformed the Web Service by a factor of eight when retrieving data. This indicates that when performance is crucial for applications that operate in an always-connected mode, SqlClient is the better choice, provided that the database is SQL Server (SqlClient works only with SQL Server).

Loading Controls and Forms

As controls are placed on the designer, VS.NET generates code in the InitializeComponent method to instantiate and initialize them. However, this code is often not optimized. It can be improved by creating the form in a top-down manner, for example by setting the Parent property of a container before adding controls to that container. It can also be improved by reducing the number of method and property calls on controls during start-up, for example by setting Control.Bounds instead of Control.Location and Control.Size. In one test, the form contained a nested hierarchy of panels and a large number of controls. Applying these simple rules reduced the form's load time by 50%.

Concatenating Strings

Strings (System.String) in managed code are immutable: once a string object is created, its value cannot change. As a result, concatenating to a string does not modify it. Instead, a new string object is created, and the old one is discarded and left for the garbage collector. This has performance consequences when strings are concatenated in loops and during other code-intensive processing, so a more efficient option is to use the StringBuilder class (System.Text.StringBuilder). In our test, a simple concatenation was performed 100 times, and StringBuilder outperformed string concatenation by a factor of 23.
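The difference can be sketched with two loops that build the same comma-separated string. The ConcatDemo class and its method names are hypothetical, and the sketch shows the allocation pattern rather than reproducing the factor of 23 measured above.

```csharp
using System;
using System.Text;

// Hypothetical helper comparing string concatenation with StringBuilder.
static class ConcatDemo
{
    // Each += allocates a new string and copies everything built so far,
    // so the total copying grows roughly with the square of the iterations.
    public static string WithConcatenation(int iterations)
    {
        string s = "";
        for (int i = 0; i < iterations; i++)
        {
            s += i.ToString();
            s += ",";
        }
        return s;
    }

    // StringBuilder appends into a growable buffer and creates a single
    // string only when ToString is called at the end.
    public static string WithStringBuilder(int iterations)
    {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < iterations; i++)
        {
            sb.Append(i);
            sb.Append(',');
        }
        return sb.ToString();
    }
}
```

Both methods return the same text. Only the number of intermediate string objects created along the way differs.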

Improving Performance

Finally, the following is a list of more general performance recommendations that developers should keep in mind as they design their applications:

Reduce the number of function calls and function size. Minimize the number of function calls and allow functions to receive more parameters (make the methods "chunkier" and less "chatty").
Maximize object reuse. Creating objects frequently degrades performance because of the resulting garbage collections and fragmentation of the managed heap. This is especially important on resource-constrained devices. Keeping objects in memory and reusing them reduces the overhead of memory management.
Avoid manual garbage collection. Although the Compact Framework allows applications to invoke the GC by calling the GC.Collect method, developers should avoid doing so. In most cases, the Compact Framework is better placed than application code to decide when a collection should run.
Delay initialization. To make an application appear more responsive at startup, the application can defer part of its initialization to a background thread. This can be done with the Thread class and the Invoker class shown in Chapter 3.
Use exceptions sparingly. Throwing exceptions in managed code is a relatively expensive process. As a result, developers should reserve exceptions for truly exceptional cases and not throw them as a normal event in the application.
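The last recommendation can be illustrated by parsing user input two ways. The first version relies on a thrown exception to detect bad input. The second checks the input first, so the common "bad input" path throws nothing. The InputParsing class and its method names are hypothetical. The sketch assumes a framework version that provides int.TryParse, which was added in version 2.0 of the .NET Framework and is not available in earlier versions.

```csharp
using System;

// Hypothetical helper contrasting exception-driven and check-first parsing.
static class InputParsing
{
    // Exception-driven: every invalid value pays for a throw and a catch.
    public static int ParseOrDefaultWithException(string text, int fallback)
    {
        if (text == null)
            return fallback;
        try
        {
            return int.Parse(text);
        }
        catch (FormatException)
        {
            return fallback;
        }
        catch (OverflowException)
        {
            return fallback;
        }
    }

    // Check-first: int.TryParse reports failure through its return value,
    // so no exception is thrown for malformed or out-of-range input.
    public static int ParseOrDefault(string text, int fallback)
    {
        int value;
        return int.TryParse(text, out value) ? value : fallback;
    }
}
```

Both methods return the same results. The check-first version reserves exceptions for genuinely unexpected failures instead of using them for routine validation.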