6.2 Processor-independent simulation speed

What value should we use for an animation's dt? We'd like to make it somewhat independent of the speed at which our program is running. The run speed can be influenced not only by your processor speed, but also by the size of your game window, and whether or not you have multiple views or documents open in your game. As far as possible, we'd like the apparent speed of our moving creatures to stay the same.

What exactly does this mean? As we discuss in Chapter 7: Simulating Physics, we give each simulation object a vector _position and a vector _velocity. For each update cycle, we compute an appropriate time step dt, we update our _velocity, and then we use the standard rule:

_position = _position + dt * _velocity

The issue at hand is this: what should dt be? One might imagine setting dt to some 'reasonable' fixed value like 0.1 that happens to look good on your own machine. But if your processor is running at 400 MHz and your user's machine is running at 200 MHz, your critters are going to move half as fast on the user's machine. If the user has a slow video card then your program is going to run even slower. And, on the other hand, when you get a 1.2 gigahertz machine your critters are going to go three times as fast, and if they're part of a game this game is now going to be unplayable. (Gigahertz, or GHz, is of course a billion cycles per second, that is, a thousand MHz. It took desktop machines something like 20 years to make it from MHz to GHz speeds. One of these days you'll see personal computers running at terahertz, or THz, speeds, where a terahertz is a trillion cycles per second.)

No, the trick is to let dt be real time. That is, we will measure the time length dt of each update cycle, and use that in our simulation. If the machine is slow, then the dt will be big, and the critter will move in a bigger step during each update cycle. If the machine is fast, the dt will be small, and the critter will move in smaller steps during each update cycle.

The way we implement this is to give our application a cPerformanceTimer object that has a tick() method which will return the elapsed time dt since the last time that tick() was called. And then we make our OnIdle method look like the following.

BOOL CPopApp::OnIdle(LONG lCount) 
{ 
    CWinApp::OnIdle(lCount); //Do the base class WinApp processing. 
    double dt = _timer.tick(); 
    animateAllDocs(dt); 
        //Step through all the docs and feed this timestep. 
    return TRUE; //Keep doing it over and over. 
}

We'll say more about the cPerformanceTimer class in the next subsection. For now, let's analyze the effect of using a 'real time' dt. Suppose that we have a simulation running on two machines, at 25 updates per second on the slower machine and at 50 updates per second on the faster machine. If each machine computes a dt as the elapsed time between updates and updates a critter's position as pos += dt * vel, we'll get the figures shown in Table 6.1.

Table 6.1. The effect of basing `dt` on the updates per second.
Updates per second	Time between updates (seconds)	Action during 0.04 second
25	0.04	`pos += 0.04 * vel;`
50	0.02	`pos += 0.02 * vel;`
		`pos += 0.02 * vel;`

Compare the net action during 0.04 second on the machines. If the velocity is constant, the net observed motion is the same. It is possible to imagine a simulation in which the value of vel might change between the first and second updates; this would simply mean that the simulation on the faster machine would be more accurate, which is no surprise. But letting dt be real time elapsed makes the best of things.
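To make the comparison in Table 6.1 concrete, here is a minimal sketch of the position rule reduced to one dimension. The function names stepPosition and advance are hypothetical, chosen only for this illustration, and are not part of the Pop Framework.

```cpp
#include <cassert>
#include <cmath>

// One update of the rule _position = _position + dt * _velocity,
// reduced to a single coordinate for brevity.
// (Hypothetical helper, not a Pop Framework function.)
double stepPosition(double pos, double vel, double dt) {
    return pos + dt * vel;
}

// Advance a position through `updates` equal steps of length dt,
// holding the velocity constant throughout.
double advance(double pos, double vel, double dt, int updates) {
    assert(updates >= 0);
    for (int i = 0; i < updates; ++i) {
        pos = stepPosition(pos, vel, dt);
    }
    return pos;
}
```

With a constant velocity, advance(0.0, 2.0, 0.04, 1) on the slower machine and advance(0.0, 2.0, 0.02, 2) on the faster machine should land on the same position, up to floating-point rounding.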

Since we measure the dt in seconds, this means that the speed is in units-per-second. Another way of looking at this is to realize that the speed is the magnitude of the _velocity, and the velocity is (new_position - _position)/dt, which clearly has a units/sec magnitude.

In the Pop Framework we often give our critters a default speed of something like 2.0. What does this speed mean? The meaning emerges when you look at the size of the window world you are moving in. If you specify that the world is, say, ten units across, a speed of 2.0 means that a critter takes about five seconds to move across the window.

No matter what kind of computer you're using, and no matter how many or how few critters are running, no matter how big or how small the window is, the time for a critter to cross the screen should always be the same.
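As a rough check on the claim that crossing time does not depend on the update rate, the following sketch steps a critter across a world of a given width and adds up the real time consumed. The function secondsToCross is a hypothetical name used only here, and the fixed dt per run is a simplifying assumption; in a real program dt varies from update to update.

```cpp
#include <cassert>
#include <cmath>

// Simulated seconds for a critter to cross a world `width` units wide
// at `speed` units per second, using the same dt for every update.
// (Hypothetical helper for illustration only.)
double secondsToCross(double width, double speed, double dt) {
    assert(width > 0.0 && speed > 0.0 && dt > 0.0);
    double pos = 0.0;
    double elapsed = 0.0;
    while (pos < width) {
        pos += dt * speed;  // the standard position rule
        elapsed += dt;      // real time consumed by this update
    }
    return elapsed;
}
```

For a world ten units across and a speed of 2.0, the result should come out at about five seconds whether dt is 0.04 (25 updates per second) or 0.02 (50 updates per second), differing by at most about one timestep.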

Measuring a timestep

We implement the timing of dt with a cPerformanceTimer class. The basic way that a timer works is to use a private double _currenttime member, a private double getsystemtime() method and a public double tick() method. The tick() call gets the system time, computes dt as the difference between the system time and the _currenttime, resets the _currenttime to match the system time, and returns dt. This is shown in Figure 6.2.

Figure 6.2. A `cPerformanceTimer` class

(Seasoned Windows programmers will be familiar with a special Windows object called a 'timer' that is created with a CWnd::SetTimer call. These timers are something like coarse beepers that can be set to send a window an OnTimer message at regular intervals, so long as the intervals aren't very short; a hundredth of a second, for instance, is a shorter interval than a Windows timer can handle. Instead of being a coarse beeper, our cPerformanceTimer is a highly accurate clock. It has no relationship whatsoever to the standard Windows timers.)
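As a rough illustration of the tick() logic described above, here is a minimal portable sketch. It keeps the member names _currenttime, getsystemtime() and tick() from the text, but the class name TimerSketch and the use of std::chrono::steady_clock as the time source are assumptions made for this example; the Pop Framework's own cPerformanceTimer may well differ.

```cpp
#include <cassert>
#include <chrono>

// A stand-in for the timer described in the text.
// Assumption: std::chrono::steady_clock is used as the system time source.
class TimerSketch {
public:
    TimerSketch() : _currenttime(getsystemtime()) {}

    // Return the seconds elapsed since the previous tick() call
    // (or since construction), and restart the measurement.
    double tick() {
        double now = getsystemtime();
        double dt = now - _currenttime;
        _currenttime = now;
        return dt;
    }

private:
    // Current reading of a monotonic clock, in seconds.
    double getsystemtime() const {
        using namespace std::chrono;
        return duration<double>(steady_clock::now().time_since_epoch()).count();
    }

    double _currenttime;
};
```

A monotonic clock is used in this sketch so that dt can never come out negative if the wall-clock time is adjusted while the program runs.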

The Pop Framework implements the cPerformanceTimer. On newer machines the cPerformanceTimer computes the system time by using a so-called 'high-resolution performance counter.' On the latest machines this counter seems to run at the same clock rate as the machine, that is, if a machine's processor runs at 400 MHz, the high-resolution performance counter measures 400,000,000 ticks per second. We then figure out a time interval in seconds by taking the number of elapsed ticks divided by the number of ticks per second. On slightly older machines, the high-resolution performance counter runs at about 1 MHz, or one million ticks per second. So the counter frequency is not necessarily the same as the chip's MHz rating.
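The ticks-to-seconds conversion can be sketched as a small pure function. The name ticksToSeconds is hypothetical. On Windows the tick counts and the frequency would typically come from the QueryPerformanceCounter and QueryPerformanceFrequency calls, but exactly how the Pop Framework reads them is not shown here, so the sketch takes plain integers instead.

```cpp
#include <cassert>
#include <cstdint>

// Convert a span of counter ticks into seconds:
// elapsed ticks divided by ticks per second.
// (Hypothetical helper; the inputs are assumed to come from a
// high-resolution counter and its reported frequency.)
double ticksToSeconds(std::int64_t startTicks,
                      std::int64_t endTicks,
                      std::int64_t ticksPerSecond) {
    assert(ticksPerSecond > 0);
    assert(endTicks >= startTicks);
    return static_cast<double>(endTicks - startTicks) /
           static_cast<double>(ticksPerSecond);
}
```

For example, 20,000 elapsed ticks on a counter running at one million ticks per second corresponds to a dt of 0.02 second, and 400,000,000 ticks on a 400 MHz counter corresponds to one full second.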

On very old machines, the cPerformanceTimer code has to use the old clock() function, which runs at about 50 ticks per second. The multimedia timeGetTime function seems to be essentially the same function as clock(), by the way.

A minor point. When you pause, for instance by opening a modal dialog, reading a help file, or letting your mainframe lose the focus, a lot of time will elapse before you go back to the OnIdle. Before restarting the process, call _timer.tick(), otherwise the next _timer.tick() will return a dt that's too large, as it's been running while you were out of the AppUpdate. A good place to put this extra update call is inside the CPopView::SetFocus method, because that gets called by at least one view whenever your program gets its focus back and starts back up. We do put upper and lower bounds on the dt values that our cPerformanceTimer::tick is allowed to return.

First let's talk about very high dt values. A machine may run a program dreadfully slowly, maybe only at five updates per second, taking something like 0.2 second per update. If the dt step size gets too big, the motion starts to look jerky. The objects move too far with each step, and you lose the illusion of continuous motion. The critters look like they're hopping about instead of smoothly sliding. We have a brute force correction for this. If the dt turns out to be larger than some maximum size of a _maxdt value of, let's say, 0.1 second, we'll just 'lie' to the program and have tick() return the _maxdt.

Now let's talk about very small dt values. With a really fast processor it's possible for dt to get so small that the machine begins to act weird, with odd jumps in the motion. This is because the dt is now shorter than the refresh interval of your video card. If you ask your video card to refresh itself, say, 120 times per second and the card hardware is only refreshing itself at 60 Hz, then you're going to be asking for invisible and useless graphics updates. They are worse than useless, actually, as the refresh requests can pile up and cause an odd-looking glitch when the message queue tries to process several of them in a row.

To avoid choking up the graphics pipeline, we set a _mindt, and make the tick() process spin in a while loop until at least _mindt seconds have passed. To compute an appropriate _mindt we find the graphics refresh rate by making a call to the global Windows function ::GetDeviceCaps(hdc, VREFRESH), and then we take the reciprocal of the refresh rate. The code looks roughly like the following.

int refreshrate = ::GetDeviceCaps(hdc, VREFRESH); 
_timer.setMinDt(1.0/double(refreshrate)); 
    //Don't run faster than the card.

More details of this code can be viewed in the Pop Framework mainfrm.cpp file.
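The two bounds described above, the _maxdt cap and the _mindt spin, can be sketched together as follows. This is only an illustration under stated assumptions: the class name BoundedTimerSketch is hypothetical, and the clock is passed in as a function so that the behavior can be exercised with a fake time source; the real cPerformanceTimer reads the system time directly.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// A timer whose tick() never returns less than _mindt or more than _maxdt.
// Assumption: `clock` returns the current time in seconds and never runs backward.
class BoundedTimerSketch {
public:
    BoundedTimerSketch(std::function<double()> clock, double mindt, double maxdt)
        : _clock(std::move(clock)), _mindt(mindt), _maxdt(maxdt),
          _currenttime(_clock()) {
        assert(mindt >= 0.0 && mindt <= maxdt);
    }

    double tick() {
        double now = _clock();
        // Lower bound: spin until at least _mindt seconds have passed.
        while (now - _currenttime < _mindt) {
            now = _clock();
        }
        double dt = now - _currenttime;
        _currenttime = now;
        // Upper bound: 'lie' to the caller if the real dt is too large.
        return dt > _maxdt ? _maxdt : dt;
    }

private:
    std::function<double()> _clock;
    double _mindt;
    double _maxdt;
    double _currenttime;
};
```

Note that capping dt at _maxdt trades accuracy for smoothness: on a very slow machine the critters will cover less ground per real second than their nominal speed suggests.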

It's useful for the designer (and eventually the user) to be able to see how fast the simulation is running. A good place to show this information is in the status bar that appears at the bottom of your View window. Rather than displaying the timestep dt, it's more useful to show the reciprocal 1.0/dt. The quantity dt is the seconds per update, and 1.0/dt is the updates per second. Because Windows is always doing little tasks in the background, the actual value of the dt is going to vary somewhat from cycle to cycle. To keep our updates per second from jumping around a lot, and being hard to read in the status bar, we actually compute this number as a rolling average of the last 60 1.0/dt values. This means that when you make a change to your program, it takes a few seconds for the updates per second value to settle down.
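One way to compute such a rolling average is sketched below. The class name UpdateRateAverager is hypothetical, and keeping a running sum alongside a queue of recent values is an implementation choice made for this example, not necessarily what the Pop Framework does.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Rolling average of the most recent 1.0/dt values.
// (Hypothetical class; the window defaults to the 60 values mentioned in the text.)
class UpdateRateAverager {
public:
    explicit UpdateRateAverager(std::size_t window = 60)
        : _window(window), _sum(0.0) {
        assert(window > 0);
    }

    // Record one timestep and return the current average updates per second.
    double add(double dt) {
        assert(dt > 0.0);
        double rate = 1.0 / dt;
        _rates.push_back(rate);
        _sum += rate;
        if (_rates.size() > _window) {
            _sum -= _rates.front();  // drop the oldest value from the sum
            _rates.pop_front();
        }
        return _sum / static_cast<double>(_rates.size());
    }

private:
    std::size_t _window;
    double _sum;
    std::deque<double> _rates;
};
```

At 60 updates per second a 60-value window spans about one second, which is why the displayed figure takes a moment to settle after a change.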

Improving the animation speed

The speed at which a program like Pop runs depends on two factors: the amount of computation and the graphics overhead of putting images on the screen. If you have a large number of critters with complex update methods the computation will dominate. Remember that when you have N objects, the number of pairs of objects is proportional to N². If you are checking for collisions among each pair of critters, or using forces which involve evaluating all the critter-to-critter distances, your computational overhead will go up as the square of the number of critters.
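To see the quadratic growth directly, this small sketch counts how many checks a naive all-pairs pass performs. The function countPairChecks is a hypothetical name; a real collision pass would test critter i against critter j where the counter is incremented.

```cpp
#include <cassert>
#include <cstddef>

// Count the critter-to-critter checks made by a naive all-pairs loop.
// The total is n * (n - 1) / 2, which grows as the square of n.
std::size_t countPairChecks(std::size_t n) {
    std::size_t checks = 0;
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = i + 1; j < n; ++j) {
            ++checks;  // a real program would compare critter i with critter j here
        }
    }
    return checks;
}
```

Going from 10 critters to 20 raises the count from 45 to 190, so doubling the population roughly quadruples the work.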

More often it is the graphics overhead that dominates. The exact costs of the graphics depend on the kind of cGraphics that your program uses, that is, cGraphicsMFC or cGraphicsOpenGL.

Whatever kind of graphics you use, there is one basic cost that we may as well call the pixel overhead. For every frame of the animation that you show, you are doing some sequence of actions in order to set the color of each visible pixel in your program's onscreen window. There are three factors that affect this pixel overhead.

pixel overhead ∝ area of rectangle * colors per pixel * bus overhead

The area of the rectangle is the number of pixels you are moving. Keep in mind that area grows as the square of the edge dimension. A 1600 x 1200 rectangle has four times as many pixels as an 800 x 600 rectangle. This means that if you develop your program while looking at a display with an 800 x 600 resolution, but some of your users run at a 1600 x 1200 resolution, then a full-screen animation program on their machine will run about four times as slow!

So one thing we do to help our animation programs run well on more machines is to start the main window out at a moderate size of 800 x 600 rather than a full-screen size, because we have no control over how big 'full-screen' might be. Exercise 6.4 shows how to control the window size.

Table 6.2. The number of different bits per pixel in different color modes.
Number of colors	Bits per pixel
256	8
32,768	15
65,536	16
16,777,216	24
True Color	24 or 32

The importance of the number of colors per pixel is a little less obvious. Right-click on your desktop and select Properties... to bring up the Display Properties dialog. Go to the Settings sheet. The Color Palette control group has a dropdown select box with the options for the total number of colors. Some common options are listed in Table 6.2.

Many users tend to set the number of colors to a maximal value, although for many applications 256 colors are enough. The 256 limit is not as bad as it sounds, because a window is able to pick which particular 256 colors it uses. But programming for 256 color mode is a hassle, so our preferred choice is one of the next two higher selections, 32,768 or 65,536 colors.

The number of colors being used affects the speed of the pixel overhead because the more bits per pixel that you have, the more information your graphics implementation needs to move around. But this is not something that we can very easily change from within our program, nor should we, as it would be very poor Windows etiquette for your app to do something that affects all of the other apps on display. This said, it's actually quite common for commercial computer games to do this. In order to squeeze the most out of a system, commercial games usually bail out from Windows to a full-screen, single-task mode and adjust the graphics settings at will. But, in order to make our code as generally applicable as possible, we don't take that route in Software Engineering and Computer Games.
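To put rough numbers on the first two pixel overhead factors, the sketch below computes how many pixels and how many bytes one frame contains. The function names are hypothetical, and the sketch assumes each pixel is stored in a whole number of bytes (so a 15-bit mode would be passed as 16 bits); it says nothing about the bus overhead factor.

```cpp
#include <cassert>
#include <cstdint>

// Number of pixels in a width x height rectangle.
std::int64_t pixelCount(int width, int height) {
    assert(width >= 0 && height >= 0);
    return static_cast<std::int64_t>(width) * height;
}

// Bytes needed to hold one frame.
// Assumption: bitsPerPixel is the storage size and a multiple of 8.
std::int64_t bytesPerFrame(int width, int height, int bitsPerPixel) {
    assert(bitsPerPixel > 0 && bitsPerPixel % 8 == 0);
    return pixelCount(width, height) * (bitsPerPixel / 8);
}
```

Under these assumptions an 800 x 600 window at 16 bits per pixel holds 960,000 bytes per frame, while a 1600 x 1200 window at 32 bits per pixel holds 7,680,000 bytes, eight times as much data to move for every update.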

If you are using OpenGL graphics, then you sometimes must use the 16 bits per pixel mode, or 65,536 colors, as some graphics cards only provide hardware OpenGL acceleration for the 16-bit color mode. You can tell if you have hardware acceleration in the Pop Framework by consulting the Help | Your System's OpenGL Graphics Support dialog.

While we are on the topic of the Display Properties dialog, you may also be able to set the refresh rate of your graphics card on this dialog. A typical default speed is 60 Hz or 75 Hz, where, once again, 'Hz' means 'Hertz,' or 'updates per second.' Set this update speed as high as your card will allow for the pixel resolution you've chosen. You really shouldn't use a display running at only 60 Hz, as it will tire out your eyes. These days 90 Hz or higher is not uncommon. Upping this value gives your animation program the possibility of running faster; as was mentioned earlier in this chapter, you can't animate faster than your card's refresh rate.

A final thing to think about when you look at the Display Properties dialog is your pixel resolution. If your resolution is something like 1600 x 1400 and you try to run your game in a maximized window, the game is going to run slowly, simply because of the enormous number of pixels in the window. If you want to run your game at a reasonable speed in a full-screen window, you need to reduce the pixel resolution. Alternatively, you can keep a high pixel resolution, but be aware that you shouldn't make your game window so large as to slow the update speed down too much.

We called the third pixel overhead factor 'bus overhead.' This is the time cost of moving the pixel information from one memory location to another. The reason we speak of 'moving the pixel information' is because normally one builds up a graphics image in some temporarily invisible offscreen memory location called a 'memory buffer', and then, when it's all ready, you move the image into a location called the 'frame buffer,' which is the information that the graphics card uses for painting the current image onscreen.

The bus overhead factor is very much dependent on the kind of graphics card you have, and whether you are running 2D MFC graphics or 3D OpenGL graphics. In the worst case, your graphics image is being stored in your system RAM and then being transferred to the frame buffer on the graphics card for each update. In a better kind of scenario, the memory image is on the graphics card 'near' the frame buffer. In the best possible situation, we don't actually have to move the memory image to the frame buffer; instead we use a trick called page-flipping to simply change the address that the graphics card uses as the location of the frame buffer.
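The difference between copying an image and page-flipping can be illustrated in ordinary memory. This is only an analogy under stated assumptions: the class DoubleBufferSketch is hypothetical, its two std::vector buffers stand in for the memory buffer and the frame buffer, and no real graphics card is involved.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Two pixel buffers: one being displayed (front), one being drawn into (back).
class DoubleBufferSketch {
public:
    explicit DoubleBufferSketch(std::size_t pixels)
        : _front(pixels, 0), _back(pixels, 0) {}

    std::vector<std::uint32_t>& back() { return _back; }
    const std::vector<std::uint32_t>& front() const { return _front; }

    // Copy approach: every pixel is moved, so the cost grows with the pixel count.
    void copyToFront() { _front = _back; }

    // Page-flip approach: only the buffers' internal pointers are exchanged,
    // so the cost does not depend on the number of pixels.
    void flip() { _front.swap(_back); }

private:
    std::vector<std::uint32_t> _front;
    std::vector<std::uint32_t> _back;
};
```

After flip(), the old front buffer becomes the new back buffer and still holds the previous frame, so a program that page-flips normally redraws the whole image each update.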

In the past, graphics cards could only support page-flipping for an entire screen's worth of display. As mentioned above, you'll notice that most commercial computer game products do not in fact run in a windowed mode. They take over your whole screen. This is because (a) they want to use page-flipping for fast animation, sometimes (b) they want to set your screen resolution and colors per pixel down to lower values so that there are fewer pixels to set per image, and sometimes (c) they've written their code using brute pixel-count numbers and the code is not resolution-independent.

Perhaps wrong-headedly, we insist on having all our programs run inside windows on your desktop; it seems more modern and user-friendly, and it makes our code more usable for other kinds of applications. As it happens, OpenGL graphics will in fact page-flip for windowed apps. In MFC graphics, by contrast, we have to move a screen-sized block of pixels for each update, using the CDC::BitBlt method. Fortunately, this method is so fast on modern graphics cards that our animations can in fact run as fast as we want. For many years, you couldn't write such a computer game with the normal Windows API, but those days are truly over.

We haven't said anything yet about the graphics costs besides the pixel overhead. Generally bitmaps are more expensive to draw than are triangles and geometric objects. In OpenGL graphics a number of more specialized considerations arise. Smoothing objects costs computation, textures are expensive, lighting has its costs, and so on. We'll say more about these details in Chapter 26: OpenGL Graphics.