Now that we've walked on the dark side, looking at all kinds of things that can go wrong with our software, let's turn our attention back to root causes: why do software flaws occur? Why do good people write bad code?
A great many people believe that vulnerabilities are the spawn of stupid (and probably slothful) programmers. Some adherents to this credo have been customers of ours. Although we have listened respectfully to the arguments for many hours, we disagree.
We believe that, by and large, programmers want to write good software. They surely don't set out with the intention of putting security flaws in their code. Furthermore, because it's possible for a program to satisfy a stringent functional specification and nevertheless bring a vulnerability to life, many (if not most) such flaws have been coded up by people who do their best and are satisfied with (even rewarded for) the result.
What's so hard about writing secure code? Why do vulnerabilities exist at all, let alone persist for decades? Why can't the vendors get it right?
We believe there are three sets of factors that work against secure coding:
The underlying complexity of the task itself
The "mental models" and other psychological factors that make it hard for human beings to design and implement secure software
Economic and other social factors that work against security quality
This is a hard problem. After a close look at our examples, we think you will come to agree that wiping out security vulnerabilities by just doing a better job of coding is a monumental, perhaps hopeless, task. Improved coding is critical to progress, of course. But some vulnerabilities seem to arise without any direct human help at all. We engineers will have to adapt our tools and systems, our methods, and our ways of thinking. Beyond this, our companies, our institutions, and our networked society itself will need to face up to the danger before this scourge can pass away.
Truly secure software is intrinsically difficult to produce. A true story may help show why.
While Mark worked at Sun back in 1993, he received one of those middle-of-the-night phone calls from CERT he used to dread so much. Jim Ellis told him they had received and verified a report that every tarball produced under Solaris 2.0 contained fragments of the /etc/passwd file. If this were true, Mark thought, Sun and its customers were in terrible trouble: the password file was a fundamental part of every system's security, the target of an attacker's "capture the flag" fantasy. Was Sun giving it away? Was their software actually shipping out the password file to be deposited on archival backup tapes, FTP and web sites, and countless CD-ROMs?
 A "tarball" is an archive file produced by the Unix tar (Tape Archive) utility. Originally designed to copy blocks of disk storage onto magnetic tape, it remains in worldwide use today as the predominant method of transferring files between Unix systems.
Jim had passed along a program he had put together to examine tar archive files for /etc/passwd fragments (see Figure 1-4), so it didn't take long for Mark to confirm his report. Soon he was pulling vice presidents out of meetings and mobilizing the troops, pulling the metaphorical red fire alarm handle for all he was worth. What worried him was the possibility that some devious, forward-looking mole might have inserted the vulnerability into the Sun code tree, several years earlier, with the intent of reaping the customer's password files much later, after the buggy code had distributed thousands of them around the Internet.
The story has a happy ending. Mark was able to track down the change that introduced the bug and satisfy himself that it was inadvertent. Coincidentally, beginning with this release, the password file was no longer critical to system security: Solaris 2 introduced the shadow password file into Sun's product line, so the /etc/passwd file no longer contained user passwords. He fixed the bug, built a patch, issued a security advisory (Sun Security Bulletin 122, issued 21 October 1993), and breathed a sigh of relief. But Mark has never shaken the concern that such a longitudinal attack may in fact have been launched against some vendor many years ago and is silently doing its work still today.
Let's take a step back and look at some of the technical details of this particular bug. They may help illuminate the more general problems of writing unsecure code.
Material was relayed in 512-byte blocks from a disk source to the archive file. A read-a-block/write-a-block cycle was repeated over and over, until the entire source file was saved. However, the programmer did not zero the buffer into which each disk block was read. So the part of the block that extended past the end of the file on the last read did not come from the file, but rather from whatever was in the memory buffer before the disk read.
The usual result from such an error would be random junk at the end of the archive file. So why were fragments of the password file being written? It turned out that the buffer to which the disk block was read happened to already contain a part of the user password file, every time, without fail. Why did this happen?
The buffer always held leftover information from the password file because, as part of the read/write cycle, the tar program looked up some information about the user running the program. The system call used to look up the user information worked by reading parts of the /etc/passwd file into memory. The tar program obtained memory for this purpose from the system "heap" and released it back to the heap when the check was done. Because the heap manager also did not zero out blocks of memory when it allocated them, any process requesting storage from the heap immediately after that system call was executed would receive a block with parts of the /etc/passwd file in it. It was a coincidence that tar made the system call just before allocating the "read-a-block" buffer.
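The mechanism can be simulated deterministically. The following sketch is our own simplification, not Sun's code: a fixed block stands in for the reused heap buffer, a lookup fills it with password-file bytes, and a short "read" overwrites only the beginning, leaving the leftover secret in the tail that tar would then write to the archive.

```c
#include <assert.h>
#include <string.h>

enum { BLK = 16 };          /* stands in for tar's 512-byte block */
static char blk[BLK];       /* stands in for the reused heap buffer */

/* Simulates the system call that read password-file data into the
   same heap region the buffer was later carved from. */
void lookup_user(void) {
    memcpy(blk, "root:x:0:0:SECRT", BLK);
}

/* Simulates tar's final, short disk read: only n bytes of real file
   data arrive, and the rest of the block is never overwritten. */
void read_last_block(const char *data, int n) {
    memcpy(blk, data, n);
}
```

After a five-byte "file" is read, bytes 5 through 15 of the block still hold password-file residue; that tail is exactly what ended up in the archives.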
Why didn't Sun notice this problem years before? In previous versions of the software, the system call relating to the check of usernames happened much earlier. Other allocations and deallocations of the buffer intervened. But when a programmer removed extraneous code while fixing a different bug, the security vulnerability was introduced. That program modification moved the system call and the disk read closer together so that the buffer reuse now compromised system security.
Once all this analysis was done, the fix was simple; the code changed from something like this:
char *buf = (char *) malloc(BUFSIZ);
to something like this:
char *buf = (char *) calloc(BUFSIZ, 1);
Editing just a few characters (making the code now invoke the "clear allocate" routine, which allocates a buffer and then zeroes it) "fixed" the problem and closed the vulnerability.
 While this code "works," it is probably not the best way to fix this problem. In Chapter 3, we'll display some alternatives in the discussion of security in Section 3.3.2.
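One such alternative, sketched here under our own assumptions rather than taken from Chapter 3, is to leave the allocation alone and instead zero only the bytes past the data actually read, so that nothing stale can ever reach the archive regardless of where the buffer came from:

```c
#include <assert.h>
#include <string.h>

enum { BLK = 512 };   /* tar's block size */

/* After a short read that returned nread bytes, clear the unwritten
   tail of the block before it is handed to the archive writer. */
void pad_short_read(char *buf, int nread) {
    if (nread < BLK)
        memset(buf + nread, 0, BLK - nread);
}
```

This addresses the leak at the point where the uninitialized data would escape, rather than relying on every allocation site to remember to use calloc.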
The reason we tell this story in so much detail is to illustrate that critical security vulnerabilities can often result not from coding or design mistakes, but merely from unanticipated interactions between system elements that by themselves are neither unsecure nor badly engineered.
In the next chapter, we'll discuss architectural principles that (if followed) could have rendered this particular bug harmless. Please note, however, that a program with "harmless" bugs is not really secure. It's more like a person who has a deadly disease under control. We'll discuss this issue in more detail a little later on, when we talk about the effects of system complexity.
Here is a related effect: application systems are often composed from multiple separate components, each of which may be perfectly secure by itself. However, when components are taken together, they may create a hole that can be exploited. A famous example of this class of problem was the Unix "rlogin -l -froot" bug. It was caused by the composition of an rlogin server from one source and a login program from another. The problem was that the login program accepted preauthenticated logins if passed an argument -f <username>, assuming that the invoking program had done the authentication. The rlogin server program, however, did not know about the -f argument, and passed a username of -froot on to the login program, expecting it to do the authentication.
Neither program was wrong, exactly; but taken together they allowed any remote attacker to log in as system administrator without authentication. In other fields, the whole may be greater than the sum of the parts; in computer security, the sum of the parts is often a hole.
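The composition flaw can be modeled in a few lines. The function names below are ours for illustration, not the actual BSD sources; the point is only that the two programs interpret the very same string under different metaphors: one sees a username, the other sees a flag.

```c
#include <assert.h>
#include <string.h>

/* login's view: an argument beginning with -f means "this user is
   already authenticated; skip the password prompt." */
int login_is_preauth(const char *arg) {
    return strncmp(arg, "-f", 2) == 0;
}

/* rlogind's view: the remote username is opaque data, forwarded
   verbatim to login with no idea that -f has special meaning. */
const char *rlogind_forward(const char *remote_user) {
    return remote_user;
}
```

A remote "username" of -froot thus sails through rlogind as data and arrives at login as a preauthentication flag for root.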
As a bridge-playing expert that we know observed after a disastrous tournament result, "No one made any mistakes. Only the result was ridiculous."
In addition, spontaneous security failures seem to occur from time to time. Why does this happen? Consider the following explanation, from James Reason's masterful Human Error. He draws a surprising analogy:
There appear to be similarities between latent failures in complex technological systems and resident pathogens in the human body.
The resident pathogen metaphor emphasizes the significance of causal factors present in the system before an accident sequence actually begins. All man-made systems contain potentially destructive agencies, like the pathogens within the human body. At any one time, each complex system will have within it a certain number of latent failures, whose effects are not immediately apparent but that can serve both to promote unsafe acts and to weaken its defense mechanisms. For the most part, they are tolerated, detected and corrected, or kept in check by protective measures (the auto-immune system). But every now and again, a set of external circumstances (called here local triggers) arises that combines with these resident pathogens in subtle and often unlikely ways to thwart the system's defenses and bring about its catastrophic breakdown.
We believe that it's in the very complexity of the computer systems we engineers work with that the seeds of security failure are sown. It's not just that an algorithm too complex for the skill of the eventual coder will engender bugs. Perfect reliability (in this context, a complex system with no security vulnerabilities) may not in fact be achievable. (We'll leave that to the academics.) We certainly have never seen one; and between the two of us, we have studied hundreds of complex software systems.
Ah, but the situation gets worse. Do you know any mistake-proof engineers? We'll look at the human side of failure in the next section.
Programmers are people, a fact that many security analysts seem to overlook when examining the causes of vulnerabilities. Oh, everybody agrees that "to err is human," and it's common to lament the fallibility of software engineers. But we've seen little in the way of careful thinking about the influence human psychology has on the frequency and nature of security vulnerabilities.
 If this subject interests you, we recommend that you follow up with the best text we know, The Psychology of Computer Programming by Gerald Weinberg. It's a remarkable book, which has just been reprinted for its 25th anniversary. There are a few other authors who have made a good start on the study of human error as well. See Appendix A for details.
Programming is a difficult and frustrating activity. When we or our colleagues perform a security analysis on software, we've noticed that (unless we take special precautions to the contrary) the kinds of errors we find are the ones we're looking for, the ones we understand, and the ones we understand how to fix. This factor, illustrated by the tarball vulnerability we described earlier, is one of the best arguments we know for automated security tests that require one to run and respond to a whole range of errors, both familiar and unfamiliar.
Here's another factor. When we ourselves do design work, we find that we are uncomfortable thinking about some of our colleagues/coworkers/customers/fellow human beings as crooks. Yet, that is exactly the mindset you as a developer need to adopt. Never trust anyone until his trustworthiness has been verified by an acceptably trustworthy source: that's the rule. Most of us find that to be an uncomfortable mental posture; and that's a real complication.
Another difficulty is that human beings tend to be bad at particular kinds of risk assessment, for example, determining how hard you need to try to protect passwords against snooping on your network. Your judgments are going to be made using a brain design that seems to have been optimized for cracking skulls together on an African savannah. However we got here, our brains certainly haven't been reengineered for Internet times. Your trust decisions are going to be influenced by your own personal experiences with various kinds of bad guys. The evaluations you make about the relative likelihood of possible attacks will be influenced by physical proximity to the attack sources. The impact of these outdated natural tendencies will be felt in every design you produce.
During the design stage of a project, another of our most interesting human foibles is most evident: the concept of psychological "set," which is the adoption of mental models or metaphors. It's an abstract topic, for sure, and most developers probably never consider it. But we think it bears a little examination here.
All of us use mental models every day as an aid in executing complex tasks. For example, when you're driving a car, you are probably not conscious of the roadway itself, of the asphalt and the paint and the little plastic bumps you might find to help guide your way. Instead, you accept the painted lines and the roadway contours, berms, and culverts as mental channels, constraining your actions and simplifying your choices. You can manage to keep your car between two painted lines (that is, stay in your "lane") more easily than you could calculate the necessary angles and minute real-time adjustments without them. Painted driving lanes are, in fact, an engineering achievement that takes into account this exact human trait.
Designing a piece of software (putting a mental conception into terms the computer can execute) is a complex mental activity as well. All the software engineers we know make extensive use of mental models and metaphors to simplify the task.
In fact, one of the characteristics of an excellent engineer may be that very ability to accept for the moment such a metaphor, to put oneself in the frame of mind in which, for example, a "stream of input characters is what the user is saying to us about what actions the program should take." If you take a second look at that last phrase, we think you will agree with us that extensive metaphorical imagery comes very naturally when people are talking about programs.
Enter the bad guy. Attackers can often succeed by purposely looking only at the asphalt, without seeing the lanes. To find security holes, think like an alien: look at everything fresh, raw, and without socialized context. (See the later sidebar The Case of the Mouse Driver for an example of this in action.) Similarly, to avoid security vulnerabilities in your code, you must develop the habit of suspending, from time to time, your voluntary immersion in the program's metaphors. You must train yourself (or be goaded by checklists and aided by specialized tests) to examine the ones and zeroes for what they are, surrendering their interpretation as identification numbers, or inventory elements, or coordinates on a screen.
In order for your applications to stand up against a determined attack, you will need to build in several layers of defense. You don't want an exploitable weakness at any level. To weed those out, you will need a thorough understanding of what a program is, of the worlds in which your software lives.
Many of us have spent our whole working lives dealing with software. We design, write, adapt, fix, and use the stuff. When we do, what are we manipulating? You have probably gestured at a printout or a display of letters on a screen, for example, and referred to that as a program. But what is a computer program, really?
Here is a list of ways that you might think about the nature of software. We invite you to try to imagine how you as an attacker might try to exploit a program in each "plane of existence" we list. You can think of software as:
An arrangement of abstract algorithms
Lines of text on a sheet of paper or screen
A series of instructions for a particular computer processor
A stream of ones and zeros in computer memory, or stored on magnetic or optical media
A series of interlinked library routines, third-party code, and original application software
A stream of electronic and optical signals along electromechanical and other kinds of pathways
Running or residing on a host as an element of a hardware network
All of the above are fairly straightforward. But here are a few other ways that may not be so straightforward. You'll want to consider your application as:
A set of "vertical" layers, such as transport, protocol, and presentation. (These are elements that, in a way, can be thought of as being built on top of one another.)
A set of "horizontal" stages, such as firewall, GUI (Graphical User Interface), business logic, and database server. (These are "peer" elements that operate at the same level and communicate with each other.)
A series of events that takes place in designated time slices and in a controlled order.
Executing at a disparate set of locations. Think about it: when an application is running, where are the user, the code itself, the host, the server, the database, the firewall, and the ISP located? They can all be in different locations, spread around the world.
It's remarkable to us, but true, that we have seen successful attacks based on each of the points of view listed in this section! It is mind-bending considerations like these that make effective application security such a tremendous challenge.
Here are a couple of examples of how some of these unusual considerations can affect security. On the "good guy" side, one of the most intriguing security patents of recent years uses the physical location of a person (as indicated by a global positioning system device) to help decide whether that person should be allowed to log into a system. This approach uses a characteristic that is seldom considered (precise physical location) to enhance the accuracy of authentication and authorization decisions. On the other hand, some of the most difficult software vulnerabilities we've ever had to fix had to do with subtle timing effects involving events, just a few milliseconds apart, that could occur in two slightly different orders.
For an illustration of how "mental" aspects of software can lead to vulnerabilities, see the following sidebar.
The Case of the Mouse Driver
One of our favorite security bugs helps illustrate how attackers think outside the programming metaphors. In this case, an attacker found that he was able to take control of a Unix workstation by manipulating a piece of system software known as a mouse driver. The designer of this program certainly never intended it to be invoked by a real user. It was called as part of a chain of execution by another program. Still, probably because convenient library routines were available for the purpose, or perhaps because it made it easy to debug the program during development, input to the driver was supplied in the form of parameters on a command line. The job of the mouse driver was to position the cursor on the screen in a spot corresponding to movements of the mouse. The X and Y coordinates at which the cursor was to be positioned were supplied as integer values from, say, 0 to 1023. In normal use, the command line provided by the invoking screen-control software would look something like "driver 100 100".
The program, because it needed to manipulate the screen cursor, was installed with high system privileges. And this design worked perfectly well for years, until one day someone with malevolent intent found a way to subvert it. By invoking the program directly and by supplying X and Y coordinates that were so large as to be meaningless, the manipulator was able to deliberately overflow the buffers allocated for the coordinates and use the program's privileges to take control of the system.
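The missing check can be sketched as follows. This is hypothetical code, since the real driver's source isn't shown here; the point is that a legitimate coordinate can never need more than four digits, yet nothing in the program enforced that before copying the argument into a fixed-size buffer.

```c
#include <assert.h>
#include <string.h>

enum { COORD_MAX = 8 };   /* generous room for "0".."1023" plus NUL */

/* The length check the driver never made: reject any "coordinate"
   argument too long to be a legitimate screen position, instead of
   copying it blindly into a fixed-size buffer. */
int coord_fits(const char *arg) {
    return strlen(arg) < COORD_MAX;
}
```

With no such check, an attacker's absurdly long "coordinate" simply overruns the buffer, and the driver's elevated privileges do the rest.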
This vulnerability came into existence precisely because the engineer successfully "channelized" his thinking. The attacker succeeded by ignoring the purpose for which the program was designed, rejecting the metaphor underlying the design and instead looking straight at the bits. This is a skill to be cultivated by those who want to understand how software can be subverted; as we mentioned, though, it's perhaps antithetical to the skills that facilitate software design itself.
Enough theory. Let's come back to the real world now, and consider for a moment how software is actually produced. We'll start with a few points that are sure to offend some of our colleagues.
Do you know who wrote most of the software the Internet runs on? Amateurs originally wrote many of the systems programs that have the worst vulnerabilities. (Don't worry, we'll excoriate the professionals soon enough.) One reason for this is that Berkeley undergraduates first developed much of Unix, in particular, the TCP/IP networking subsystem. Thus, we owe many of the Internet's design and architectural decisions, and a surprising amount of code, to a collection of students of widely varying abilities using techniques that were current in the mid-1970s!
 Professor Eugene H. Spafford describes the history well in "UNIX and Security: The Influences of History," Information Systems Security, Auerbach Publications; 4(3), pp. 52-60, Fall 1995.
The problem of amateurs writing code is not simply a historic one. Much of today's new software is being written by folks with no training at all in software engineering. A good example is the fact that many CGI scripts used extensively on the Net (on some of which other folks have built entire businesses) have been clearly thrown together by people with no background at all in software. (That is, in fact, one of the design goals of HTML.) Don't get us wrong. We think it's terrific that practically anybody with the will to learn the basics can put together an online service, or a library, or a form-based database. But there is a cost.
Of course, we don't really believe that most of the security problems on the Net arise because gross amateurs are writing the programs. We professionals deserve most of the blame. So we're going to shift gears again and look at a few reasons why, even with the best training and the best intentions, doing software engineering securely in the real world remains a very challenging undertaking.
Almost all software is produced under some schedule pressure. Software engineers don't work in a vacuum, even if they care passionately about secure coding and work not for profit-seeking software houses, but as part of an open source effort. Testing time is limited. The chance to research how someone else has approached a problem may not come before it's time to freeze and ship. The real world impinges, sometimes in unpredictable ways.
The plight of the software engineer who wants to produce secure code is never easy. Sometimes we have to give up on the best possible result, and settle for the best result possible. And sometimes that best result (from the point of view of the individual engineer, or his or her management) has or may have security weaknesses.
It is often hard for people who understand technical security issues, but have not worked as full-time software engineers, to understand how companies composed of their colleagues can produce deeply flawed and insecure products. One of the hopes we have for this book is that it will provide some insight here, not by way of making excuses for anyone, but rather by helping to foster a level of understanding that can help remove the root causes of these problems.
 We have in mind comments such as one by Karl Strickland, a convicted computer attacker and member of the "8LGM" group, which posted exploit scripts on the Internet in the early 1990s. "I don't see the problem. One bug fix, one person. Two bugfixes [sic], two people. Three bugfixes [sic], three people, working simultaneously on different bugs. How hard can that be?" (Usenet comp.security.unix discussion thread, May 1994.)
Suppose that you are a software vendor in a competitive marketplace. Your profit margins are tight, and your marketing team believes that security is not a deciding factor for customers in your product space. In this kind of environment, wouldn't you be likely to produce software that is "just secure enough"? Secure enough, we mean, not to alienate the majority of your customer base.
A friend of ours was "security coordinator" for one of the major Internet software producers. Often buttonholed by customers at security conferences and asked questions like, "When are you guys going to stop shipping this crap?" he claims the answer he is proudest of was, "Sometime soon after you folks stop buying it." It's a point to consider.
Let's assume that the vendor's goal is to expend minimal resources to forestall show-stopping vulnerabilities, prevent loss of sales, and keep the company's name out of the news. What are some other factors that keep corporations from investing heavily in security quality?
The main reason, we think, is that whatever time and effort are spent on finding, verifying, and fixing security bugs mean that fewer engineers are available for adding new features.
A second reason may be that some companies act as if downplaying, denying, or delaying acknowledgment of security vulnerabilities will give them an edge over the competition. Think about it. If you were the CEO and no one was forcing you to face up to the security flaws in your products, wouldn't you be focusing on positive angles, on new features and services that bring in the revenue? You would overlook flaws in your product if you could get away with it, wouldn't you? Most of us would at least be tempted (and we're not battered about by stockholders and litigation-wary attorneys).
We'd like to think that, even if marketing factors (and common decency) don't suffice, considerations of citizenship and business ethics might compel corporate software producers to clean up their act in security matters. Unfortunately, it doesn't seem to work that way. This might be explained by the so-called "tragedy of the commons," an idea first brought to wide attention in a seminal article by Garrett Hardin in 1968:
The tragedy of the commons develops in this way. Picture a pasture open to all. It is to be expected that each herdsman will try to keep as many cattle as possible on the commons.
As a rational being, each herdsman seeks to maximize his gain... The rational herdsman concludes that the only sensible course for him to pursue is to add another animal to his herd. And another... But this is the conclusion reached by each and every rational herdsman sharing a commons. Therein is the tragedy. Each man is locked into a system that compels him to increase his herd without limit, in a world that is limited.
 See Garrett Hardin, "The Tragedy of the Commons," Science, 162(1968):1243-1248.
In our context, the Internet is the common resource. Each vulnerability is a kind of pollution. Adding one more bug to the world's security burden is in the shortsighted economic interest of each company. So long as fixing bugs will divert resources that can be used to individual advantage elsewhere, profit-seeking companies will not invest in wholesale secure coding practices. As Hardin observed, "The inherent logic of the commons remorselessly generates tragedy."
The Lesson of Y2K
Many security experts, including your authors, have lobbied for years for "blanket code sweeps" for security vulnerabilities at some of the big software houses. A careful one-time effort would be no substitute for the revolution in secure coding that seems to be called for, but it would be a giant step forward. Why do you think such pleas have always failed, when a similar effort for the remediation of Y2K bugs succeeded notably?
We can think of three reasons:
 Again, see Dr. Eugene H. Spafford's article, "UNIX and Security: The Influences of History," as previously cited.