In this section, we provide lists of recommended practices in a number of categories.
George Santayana said, "Those who do not remember history are doomed to repeat it." This is certainly applicable to software implementation flaws. The lesson that we should take from this oft-repeated statement is that we can prevent at least the most common of implementation flaws by studying them and learning from them. We believe that everyone who writes software should take some time to study and understand the mistakes that others have made.
And Edna St. Vincent Millay is supposed to have said, somewhat more colorfully, "It is not true that life is one damn thing after another. It's the same damn thing over and over." Maybe she was thinking of buffer overflows.
Some specific things that you can do include the following:
The Internet is home to a myriad of public forums where software vulnerability issues are frequently discussed. Quite often, particularly in so-called full disclosure groups, software source code examples of vulnerabilities and their solutions are provided. Seek out these groups and examples; study them and learn from them.
In addition to this book, there have been dozens of excellent papers and books written on secure coding practices, as well as analyses of software flaws. Appendix A provides a good starting point for reading about mistakes and solutions.
One of the side effects of the Open Source Software movement is the vast amount of software source code that is now available to programmers. As a result, there is no shortage of examples of how to perform various actions in pretty much any programming language. (Just beware, though, that you'll also find copious examples of how not to do things.)
Most programs accept input of some kind. The topic of taking data input in a program is a rather broad one. Data can be acquired from a surprising variety of input sources, from the software's users to other computer systems on a network. With regard to security issues, though, the one thing they should all have in common is that the programmer should verify every piece of data input to the program. Take into account the architectural principles we discussed in Chapter 2 and make sure that you're heeding their warnings in the way you implement your software.
In particular, follow these practices:
Cleansing data is the process of examining the proposed input data for indications of malicious intent. Attackers often attempt to introduce data content to a program that is beyond what the programmer anticipated for that particular data input. Examples include altering character sets (e.g., Unicode), using disallowed characters (e.g., non-ASCII), and performing buffer-overflow insertion of data. Make sure to exhaustively scrub any data input to the program, regardless of its source. In this way, the code that cleanses data input will act much like a network firewall protecting a network segment from external attack.
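As an illustration of this kind of cleansing, the following sketch validates a username against an allowlist of expected characters and rejects alternate character encodings outright. The pattern and length limit here are invented for the example; your program's notion of "expected" will differ.

```python
import re

# Hypothetical cleansing routine: accept only what we expect (an allowlist)
# rather than trying to enumerate everything an attacker might send.
USERNAME_RE = re.compile(r"^[A-Za-z0-9_]{1,32}$")

def cleanse_username(raw: str) -> str:
    """Return the username if it is well formed; raise ValueError otherwise."""
    # Reject anything outside plain ASCII before pattern matching, so that
    # alternate character sets cannot slip past the check.
    try:
        raw.encode("ascii")
    except UnicodeEncodeError:
        raise ValueError("non-ASCII characters not permitted")
    if not USERNAME_RE.fullmatch(raw):
        raise ValueError("username contains disallowed characters or is too long")
    return raw
```

Note the firewall-like stance: the function decides what is permitted and refuses everything else, rather than trying to recognize every possible attack.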
Although bounds checking is technically an aspect of cleansing data content, it is so important that we believe that it bears repeating specifically here, because herein lies the birthplace of buffer overflows. Whenever you take input into a program, be sure to verify that the data provided can fit into the space that you allocated for it. Check array indexes to ensure that they stay within their bounds.
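A minimal sketch of the habit, assuming a fixed-size buffer whose length is invented for illustration: verify that the input fits before a single byte is copied.

```python
# Explicit bounds checking before copying input into a fixed-size buffer.
BUF_SIZE = 64  # illustrative size; use whatever your program actually allocates

def store_input(buffer: bytearray, data: bytes) -> None:
    # Verify the data fits *before* writing; never assume it will.
    if len(data) > len(buffer):
        raise ValueError(
            f"input of {len(data)} bytes exceeds {len(buffer)}-byte buffer")
    buffer[:len(data)] = data

buf = bytearray(BUF_SIZE)
```

In languages like C, where no runtime enforces the buffer's bounds for you, this check is the only thing standing between your program and a buffer overflow.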
Configuration files are used by many popular operating systems. Some experts feel that they are an inherent security architecture flaw in and of themselves. Without doubt, from the perspective of a person intent on attacking a program, configuration files can be a ripe target. For example, quite often, subtle human errors are made in the file access controls that are intended to protect the configuration files of programs or processes that execute with system-level privileges. In such cases, the attacker may be able to alter a configuration file and cause the privileged process to facilitate a system compromise.
You must validate and cleanse the data coming from a configuration file just as you would if it were user input being typed in on a keyboard by an (untrusted) user. Always assume that the configuration file data has potentially been tampered with by an attacker.
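The following sketch applies that assumption to a configuration file. The section name, option names, and limits are invented for illustration; the point is that every value read from the file is checked against what the program considers sane before it is used.

```python
import configparser

# Treat a configuration file exactly like untrusted user input: the values
# below are validated as if an attacker had written them.
ALLOWED_LOG_LEVELS = {"debug", "info", "warning", "error"}

def load_config(path: str) -> dict:
    parser = configparser.ConfigParser()
    parser.read(path)
    level = parser.get("server", "log_level", fallback="info")
    if level not in ALLOWED_LOG_LEVELS:
        raise ValueError(f"rejecting suspect log_level: {level!r}")
    port = parser.getint("server", "port", fallback=8080)
    if not (1024 <= port <= 65535):  # refuse privileged or impossible ports
        raise ValueError(f"rejecting out-of-range port: {port}")
    return {"log_level": level, "port": port}
```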
Command-line parameters are even easier to tamper with than configuration files. That's because command lines are usually entered directly by the program's user, enabling a malicious user to try to fool the program into doing something that it was not intended to do.
Depending on how they're used, web URLs can be conceptually very similar to command-line parameters. In particular, many web application designers use URLs to embed variables and their values, so that they can be passed along to other programs and/or web pages. Although this is a popular technique, the web application programmer must take care that the receiving program does not blindly trust the contents of the URL. This is because the user can alter the URL directly within his browser by setting variables and/or their values to whatever settings that he chooses. If the web application isn't properly checking the data or is trusting it without verification, the web application can be successfully attacked.
Another popular web application programming technique is to embed variables in hidden HTML fields, similar to the way they can be embedded in web URLs. Such fields can also be modified by the user in a browser session, resulting in the same kinds of problems as with web URLs.
A third popular means of storing web variables is within browser cookies. As with web URLs and hidden HTML fields, cookie values can be altered by the end user and should not be simply trusted.
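Whether a value arrives in a URL, a hidden HTML field, or a cookie, the defense is the same: the server re-derives or re-checks it against authoritative data it controls. The sketch below, with an invented product catalog, shows a checkout routine that deliberately ignores a client-supplied price.

```python
# Server-side price table (in cents); the client never gets to set prices.
CATALOG = {"sku-1001": 1999, "sku-1002": 4999}

def checkout_total(form_fields: dict) -> int:
    """Compute the charge from the server's own catalog, ignoring any
    client-supplied 'price' field that may have been tampered with."""
    sku = form_fields.get("sku")
    if sku not in CATALOG:
        raise ValueError("unknown product")
    # Deliberately ignore form_fields.get("price"): the client's copy of
    # the price is advisory at best and hostile at worst.
    return CATALOG[sku]
```

Even if an attacker edits the hidden field or URL variable to read "price=1", the charge is computed from data the attacker cannot reach.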
Most modern operating systems have some form of user environment variables that enable users to tailor their working environments to suit their interests and tastes. One common use for environment variables is to pass configuration preferences to programs. Attackers have long tried ways of tricking programs into misbehaving by providing them with unanticipated (by the programmer) environment variables.
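One way a privileged Unix program can defend itself is to reset dangerous inherited variables to known-safe values before doing anything else. A sketch, using real variable names (PATH, IFS, and the LD_* loader variables) but an invented policy:

```python
import os

# Treat inherited environment variables as untrusted input: pin the
# dangerous ones to safe values and drop loader-hijacking variables.
SAFE_ENV = {
    "PATH": "/usr/bin:/bin",  # fixed search path, no attacker-writable dirs
    "IFS": " \t\n",           # reset the classic shell field-separator trick
}

def sanitized_environment() -> dict:
    env = dict(SAFE_ENV)
    for name, value in os.environ.items():
        # Drop LD_PRELOAD and friends entirely rather than trying to vet them.
        if name.startswith("LD_"):
            continue
        env.setdefault(name, value)  # never override the pinned values
    return env
```

A child process spawned with this environment (for example, via `subprocess.run(..., env=sanitized_environment())`) cannot be steered through a booby-trapped PATH or preloaded library.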
Because the list of data input sources here can't possibly be comprehensive, be particularly cautious about sources of information that are not listed here. For example, be careful with inherited environment variables, system signals, system resources, and so on. The point is that your code should have an inherent mistrust of any and all data that it receives, and should therefore go to great pains to ensure that the information it receives is safe.
Although some modern operating systems are vigilant about clearing memory that is allocated by programs and their variables, not all of them are. In particular, most common operating systems don't provide this type of protection; writing software for such an operating system is always going to require additional effort and vigilance on the part of the programmer. It is therefore important not to assume that your memory and storage are being initialized properly. They may well be given the more-or-less random default values of the physical memory segments where they are allocated. Get into the habit of initializing your variables to some safe value whenever they are allocated. Adopting this practice will save untold amounts of grief.
Apart from the security concerns of not adequately initializing variables, these mistakes can cause programs to behave unreliably if a variable gets a different initial value each time the program is run. Programmers can spend countless hours debugging code that contains this kind of simple mistake. It can be extraordinarily difficult to spot.
By filename references, we're referring to the practice of accessing file and directory pathnames within programs. While this may seem like a rather trivial topic, many subtle implementation flaws can occur when filenames are referred to in unsafe ways.
Most modern filesystems are organized hierarchically. While this organization is a boon for keeping our systems organized, it also leads to some security issues. Hierarchical naming makes it possible for a file to be referred to directly as well as indirectly. For example, /etc/passwd and /bin/../etc/passwd refer to the same file on most Unix and Unix-like systems. If you aren't careful in implementing a program, especially one that makes use of system privileges, it's possible that a malicious user can trick you into accessing a file that he may not have been able to access without your unwitting "assistance."
Likewise, some modern filesystems include the construct of a file link, whereby a filename actually "points" to another path/file elsewhere on a system. Here, too, a malicious user can sometimes trick a program into reading or writing a file that the programmer never intended him to access, and that the system would not otherwise allow.
Most operating systems include the notion of an execution path or a data path, whereby an ambiguously specified program can be searched for (by the operating system) through a search path. This feature is generally meant to make life easier for the system's users. For example, rather than typing /bin/ls to list the contents of a directory, the user simply has to type ls and the operating system finds the utility ls in /bin/ls by traversing through the execution search path. For the programmer, however, danger lies in this ambiguity. Imagine, if you will, a system attacker who writes a piece of malicious software, gives it a name that's identical to that of a legitimate system utility, and is able to get this name into a user's search path ahead of the legitimate utility (perhaps by manipulating an improperly protected shell startup script). The attacker could thus dupe the user into running an arbitrary program of his choosing.
So, the lesson for the programmer should be clear: when interpreting a filename provided to your program, take great care in verifying that you are getting the file you intended to get.
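A common way to do this verification is to canonicalize the supplied name (resolving both ".." components and symbolic links) and then confirm that the result still lies where you expect. The base directory below is invented for illustration.

```python
import os

BASE_DIR = "/var/myapp/data"  # illustrative; the directory this app may serve

def safe_open_path(user_name: str) -> str:
    """Resolve a user-supplied filename and refuse anything that escapes
    BASE_DIR via '..' components, absolute paths, or symbolic links."""
    base = os.path.realpath(BASE_DIR)
    candidate = os.path.realpath(os.path.join(base, user_name))
    # After resolution, the path must still live under the base directory.
    if os.path.commonpath([candidate, base]) != base:
        raise PermissionError(f"path escapes {BASE_DIR}: {user_name!r}")
    return candidate
```

Checking the resolved path, rather than the raw string, is what defeats both the /bin/../etc/passwd trick and a planted symlink.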
From time to time, you will need to store, from a program, information deemed to be sensitive, such as a user's password or a credit card account number. Depending on the purpose of the data, it's likely to be vital that, at the very least, you protect its confidentiality and integrity. Not surprisingly, there are good ways and bad ways of doing this. A rule of thumb is to heed the advice provided in Chapter 2 and use multiple layers of security. For example, as a first layer, ensure that the file access permissions are configured in such a way that only the authorized user(s) can get to the file. As a second layer, encrypt the contents of the file so the information will be protected even if an attacker succeeds in breaking through the file access controls.
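A sketch of the first layer, with an invented filename: create the file with owner-only permissions and refuse to follow anything an attacker may have planted at that name. Encrypting the contents with a well-vetted library would then form the second layer on top of this.

```python
import os

def write_secret(path: str, data: bytes) -> None:
    """Create a file readable and writable only by its owner."""
    # O_EXCL refuses to reuse a pre-existing file or a symlink planted by
    # an attacker; mode 0o600 limits access to the owning user.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        os.write(fd, data)
    finally:
        os.close(fd)
```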
One particular example of this is in the tracking of an application's state information, especially for a web-based application (see the sidebar "State" on the Web). The bottom line on this example is that if you store the state-tracking information in such a way that a user can alter it, you can bet that a maliciously inclined user will alter it.
"State" on the Web
The World Wide Web, for all of its utility and popularity, has no shortage of security difficulties. For a programmer charged with writing an e-commerce application, one of the major shortcomings of the Web is the fact that it is a stateless medium. Stateless means that many of the things that we users of web applications take for granted (for example, tracking a session through multiple screens on a web site) have to be written from scratch by each application developer. The seemingly simple process of adding a product to a web shopping cart, proceeding to a checkout counter, and paying for it is as unnatural to the Web as a fish riding a bicycle. To perform these functions, software developers either use add-on tools that were designed for this type of function or write their own code from scratch. All too often, smaller, low-budget web sites attempt the latter to save the cost of purchase, a choice that can result in catastrophic security flaws.
So how do web application writers keep track of state in their applications? There are several ways to track the state of a web session. The most common methods involve carrying a customer and session identification number along in the browser's URL, or carrying the same type of information in a browser cookie.
Both of these processes involve storing sensitive data in an area that can be altered by a user. If the application developer didn't implement some form of data integrity protection on the storage and retrieval of these numbers, then a malicious user might be able to change his customer identification number, for example, and compromise the privacy of another customer or, even worse, charge someone else for his fraudulent purchases.
One way of ensuring the integrity of these identification numbers is to encrypt them prior to storage and decrypt them upon retrieval. Doing this requires a fair amount of additional coding and development, however, and is often overlooked by the naïve programmer (though typically, only once!).
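Where only integrity (not confidentiality) of the identifier matters, a keyed message authentication code is a lighter-weight alternative to full encryption. A sketch using Python's standard hmac module follows; the key shown is a placeholder and must in practice be a randomly generated secret kept only on the server.

```python
import hashlib
import hmac

SERVER_KEY = b"replace-with-a-randomly-generated-key"  # placeholder only

def seal(customer_id: str) -> str:
    """Append a keyed MAC so tampering with the ID is detectable."""
    tag = hmac.new(SERVER_KEY, customer_id.encode(), hashlib.sha256).hexdigest()
    return f"{customer_id}:{tag}"

def unseal(cookie_value: str) -> str:
    """Verify the MAC and return the ID, or raise on any tampering."""
    customer_id, _, tag = cookie_value.rpartition(":")
    expected = hmac.new(SERVER_KEY, customer_id.encode(),
                        hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through timing differences.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("cookie failed integrity check; possible tampering")
    return customer_id
```

A customer who edits the ID in his cookie cannot forge the matching tag without the server's key, so the altered value is rejected on the next request.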
In the world of law, it has been (jokingly) said that no original legal text has been written since the Magna Carta. Similarly, every programmer has "borrowed" or "liberated" source code from other programs. Whether you're making use of open source code, as we discussed previously, or making use of your own archives of past code, it makes good sense to reuse software that has been thoroughly reviewed and tested, and has withstood the tests of time and users. After all, why reinvent the wheel? Why write your own shopping cart application code when it has already been written a thousand times?
Because even the best programmer makes mistakes, it's always advisable to follow a practice of reviewing source code for security (and insecurity) flaws. Depending on how formal a development process you follow, such reviews can be either informal or highly formal. A good rule of thumb, though, is that if a program is going to be relied on by multiple people, then multiple people should be involved in reviewing its security.
Here are a few commonly used practices:
For relatively informal development environments, a process of peer review of code can be sufficient. Particularly if the review process is a new one for you and your peers, developing a checklist of things to look for is a good thing to do. Note, though, that the checklist needs to be maintained and updated as new programming flaws are discussed or otherwise documented. (This is similar to the way that conventional anti-virus products need to be kept up to date.)
Some programming projects, such as those that can impact human safety, justifiably deserve a far more formal review process than the one we just described. For those, there is the process known as independent validation and verification (IV&V). An IV&V is a highly formal process that involves reviewing a program's source code, one line at a time, to ensure that it conforms to its design, as well as to certain other criteria (e.g., safety conditions).
To many of us, reviewing source code for flaws is roughly as appealing as watching paint dry. Don't worry: there are a number of software tools available to assist in the process. Just understand that tools are useful but only to a point. They are particularly good at catching known, common mistakes, and they are particularly bad at spotting anything else. Nonetheless, they can be an excellent starting point to reduce the required level of effort.
In Chapter 6, we discuss tools and provide numerous practical examples of their appropriate usage. One vital point to remember, though, is that while automating the review process is useful, you must not blindly rely upon the tools you use.
Security checklists can be very helpful in making sure you've covered all the bases during implementation. Here is an excerpt from one such checklist, reproduced with permission (but without attribution) from a Fortune 100 company of our acquaintance. This checklist has in fact been automated. We'll show the full version, complete with a rudimentary scoring system, in Chapter 5.
This application system requires a password for users to gain access
All user ID logins are unique (i.e., no group logins exist)
This application system uses role-based access control
This application system uses other techniques in addition to Unix system password/application logon for authentication/authorization
With this application system, passwords are never transmitted across the network (WAN) in cleartext
Encryption is used to protect data when it is transferred between servers and clients
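To give a feel for how such a checklist can be automated, here is a hypothetical sketch with a rudimentary scoring system. The questions paraphrase the excerpt above, but the weights are invented for illustration and are not the company's actual values.

```python
# Illustrative checklist items paired with invented point weights.
CHECKLIST = [
    ("Application requires a password for access", 10),
    ("All user ID logins are unique (no group logins)", 10),
    ("Role-based access control is used", 5),
    ("Authentication beyond the system password logon", 5),
    ("Passwords never cross the WAN in cleartext", 10),
    ("Client/server data transfers are encrypted", 10),
]

def score(answers: list) -> float:
    """Return the percentage of available points earned, given one
    yes/no answer per checklist item."""
    if len(answers) != len(CHECKLIST):
        raise ValueError("one answer required per checklist item")
    earned = sum(weight for (_, weight), yes in zip(CHECKLIST, answers) if yes)
    total = sum(weight for _, weight in CHECKLIST)
    return 100.0 * earned / total
```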
Code maintenance may be vitally important to the security of software over the course of its lifetime. By code maintenance, we're not just referring to the seemingly ubiquitous practice of patching vulnerabilities in software. Such maintenance extends far beyond that, and the choices that are made early on can potentially have a great impact on the people who will be maintaining the code later.
Be sure to follow these code maintenance practices:
It's likely that your organization has a standard level of practice with regard to things like inline documentation of source code. It may also have standards for things like selecting names for variables that are self-explanatory. (It should, anyway!) But, even if these things are true, have you considered the security ramifications of how you write your code, particularly with regard to maintaining it later on? Code that is well-documented, modular, and easy to follow is easier to maintain. Because such code is easier to maintain, we believe that it is easier to secure, or keep secure (or, perhaps more accurately, that it is harder to make security mistakes).
Apart from following good practices (like the ones already listed) that make life easier for those who will subsequently maintain your code, pay particularly careful attention to removing any obsolete code. Even if such code isn't being directly referenced elsewhere within the code, if it isn't necessary, we recommend that you remove it if you are sure it is safe to do so.
Make sure to thoroughly test your code changes before they go into production. Changes should be tested at least as rigorously as the software was tested in the first place. Consider, for example, the changes made to the TCP stack in response to the SYN flood attacks. Although we can make the leap of faith that the changes were successful at hardening the operating system network code against these attacks, what other issues might have been introduced in the process of changing the code? Could it have caused some network applications to fail?