There is much more to security design than planning out a modularized application with minimal privilege. So, how do we get where we want to go? Although software development processes differ widely, security design efforts typically include most of the following steps and ask the following questions:
Assess the risks and threats:
What bad thing can happen?
What are the legal requirements?
Adopt a risk mitigation strategy:
What is our plan?
Construct one or more mental models (e.g., a "bathtub," a "safe," a "jail," a "train") to facilitate development:[2]
[2] Interestingly, this model need not resemble in any way the model presented to the eventual users of the software as a guide to their use.
What does it do?
Settle high-level technical issues such as stateful versus stateless operation, use of privileges, or special access by the software:
How does it work?
Select a set of security techniques and technologies (good practices) to satisfy each requirement:
What specific technical measures should we take?
Resolve any operational issues such as user account administration and database backup (and the protection of the resulting media):
How will we keep this application system operating securely?
The following sections cover the first five of these steps; Chapter 4 and Chapter 5 describe the final step from different perspectives.
Here we present an informal way of working that has sufficed for our purposes for many years. There are plans available for integrating security into development that are much more structured. We summarize the best in Chapter 6 and provide additional pointers in Appendix A.
Suppose that a stranger accosts you with a shoebox containing $100,000 in plain, unmarked currency. He proposes to parachute you into an undisclosed location, where you are to guard the contents for 24 hours. You agree (for good and sufficient reasons we won't speculate about). How would you guard the money?
What's that you say? You want to know where you are going before you make your plan? That makes sense. If you land on a desert island with high surf, your best strategy might be to hollow out a few coconuts to replace the shoebox. Water and wind would be your special concern. On the other hand, if the stranger places you on a busy downtown intersection, you might, as night falls, feel a trifle insecure clutching those coconuts, if you could find any.
In this simplistic example, the risk is that the money could be lost or stolen. The threat, on the other hand, is introduced by whichever environment you end up in: wind and water, or a mugger.
Now let's put this in the context of software design. In our discussions of the SYN flood attacks, what should the designers of the kernel TCP code have done to better prevent the attacks? It seems clear that they underestimated the threat. As we've mentioned, they seem to have assumed that no one would deliberately violate the behavior laid down in the RFCs that defined the protocol. The specific risks from such a threat vary widely, depending upon the business use of the system in question. (TCP is "infrastructure code," used by a vast number of applications.) For this reason, the designers should have planned for the presence of great risk.
We think that the design of the TCP/IP protocol and its implementing code would have been significantly different if these designers had followed the mental processes we recommend here and, especially, asked some of our favorite questions at the right time.
As the shoebox example shows, no one can design a secure application without understanding what is to be protected and from whom.
We've found two methods especially useful in this regard. One was produced by NIST, the National Institute of Standards and Technology. It's described in "Risk Management Guide for Information Technology Systems," NIST special publication 800-30. Another, developed under the auspices of the Software Engineering Institute, is known as Operationally Critical Threat, Asset, and Vulnerability Evaluations (OCTAVE). (See http://www.cert.org/octave/ for an introduction.)
NIST argues, and we agree, that the best way to start a risk assessment is by asking many detailed questions. One of OCTAVE's principal strengths, on the other hand, is that it makes the value of the business process the system supports an integral part of the risk assessment.
The short list that follows is adapted primarily from NIST's publication 800-30. For more examples of questions to ask, as well as checklists and web links to other relevant sources, please refer to our book's web site companion at www.securecoding.org.
We're not going to deluge you with volumes of sample questions here, because in Chapter 6 we'll discuss how you may be able to automate this whole process by using software that comes with many more questions than we could print or discuss here.
Questions about the organization:
What is the mission of the organization?
Who are the valid users?
Does the organization have a business continuity/disaster recovery plan in place?
Questions about the application:
What is the purpose of the application in relation to the mission?
How important is the application to the user organization's mission?
How much application downtime can the organization tolerate?
How does the tolerable downtime compare with the mean repair/recovery time?
Could an application malfunction or unavailability result in injury or death?
What is the effect on the organization's mission if the application or information is not reliable?
Is the application or data accessible from the Internet, and thus subject to a more hostile environment?
Is the application newly developed, or is it "legacy" software?
Is the application composed of third-party software and/or hardware?
Questions about the information the application manipulates:
What information (both incoming and outgoing) is required by the organization?
What information is generated by, consumed by, processed on, stored in, and retrieved by the application?
How important is the information to the user organization's mission?
What are the paths of information flow?
What types of information are processed by the application (e.g., financial, personnel, research and development, medical, command and control)?
What is the sensitivity (or classification) level of the information?
Is the information of special fiscal value?
What is the potential impact on the organization if the information is disclosed to unauthorized personnel?
What information handled by or about the system should not be disclosed, and to whom?
What are the requirements for information availability and integrity?
Where specifically is the information processed and stored?
What are the types of information storage?
We can't leave the topic of questions without posing one more: whom should you ask?
We have a strong opinion about this question: ask those who both have a real stake in the outcome of your risk assessment and are likely to know the correct answers. For example, when asking questions related to the organization or the information flow, ask the owner of the business process being handled by the application.
That policy may seem simple-minded, but you'd be surprised how hard it can be to get it accepted by upper management. It seems that everyone likes to believe that they possess useful knowledge. You may often find that, as the old saying goes, "Those who know won't tell. Those who tell don't know." We won't dwell on this point, except to say this: if you can't determine who has the knowledge you need, can't motivate them to give you accurate answers, or can't sufficiently verify that the answers are true, you had better stop the risk assessment until you can.
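When the answers do start coming in, it helps to record them in a form you can sort and revisit. Here is a minimal Python sketch of the kind of risk register that NIST's likelihood-times-impact ratings suggest; the scales loosely follow SP 800-30's qualitative ratings, but the example entries and all the names are our own invention, not something prescribed by SP 800-30 or OCTAVE.

```python
from dataclasses import dataclass

# Rough qualitative scales, loosely following NIST SP 800-30's
# likelihood ratings (0.1/0.5/1.0) and impact magnitudes (10/50/100).
LIKELIHOOD = {"low": 0.1, "medium": 0.5, "high": 1.0}
IMPACT = {"low": 10, "medium": 50, "high": 100}

@dataclass
class Risk:
    asset: str        # what is at stake (from the "information" questions)
    threat: str       # who or what could exercise the vulnerability
    likelihood: str   # answered by people who actually know
    impact: str       # ideally answered by the business-process owner

    def score(self) -> float:
        """Risk level as likelihood times impact, per SP 800-30's matrix."""
        return LIKELIHOOD[self.likelihood] * IMPACT[self.impact]

register = [
    Risk("online order records", "SYN flood against web server", "high", "medium"),
    Risk("customer credit cards", "database theft via app flaw", "medium", "high"),
]

# Triage: attend to the highest-rated risks first.
for risk in sorted(register, key=Risk.score, reverse=True):
    print(f"{risk.score():5.1f}  {risk.threat} -> {risk.asset}")
```

Even a toy like this makes two things explicit: whose answers you relied on, and which risks deserve attention first.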
Once you have a good understanding of what may go wrong (and what the impact would be), you can begin to examine your choices of what to do. At this point, you need to consider your risk mitigation options.
What do we mean by risk mitigation? Let's look at an example. Suppose that you've identified a risk relating to the server operating your main e-commerce web site. It turns out that the server is vulnerable to a just-discovered attack that, if executed successfully by bad guys, could cause you to lose track of up to a day's worth of online purchases. What are your options?
The following list is adapted again from NIST's SP800-30 (the NIST comments are in quotation marks below). It includes some alternatives you might not have realized were available to you.
"To accept the potential risk and continue operating the IT system or to implement controls to lower the risk to an acceptable level."
In other words, grin and bear it. Don't try to close the hole. After all, you might not be attacked at all!
"To avoid the risk by eliminating the risk cause and/or consequence (e.g., forgo certain functions of the system or shut down the system when risks are identified)."
In our example, this would mean taking the server off the network until you could be sure the risk was gone. That is, temporarily (and quickly) shut down the site, apply the (presumably) available security software patch, verify that it works, and then reconnect to the network. If your reputation for prompt service means everything to you, this may well be your best approach. Although there is a short-term availability outage, you're avoiding a potentially much longer (and, from a business perspective, catastrophic) outage.
"To limit the risk by implementing controls that minimize the adverse impact of a threat's exercising a vulnerability (e.g., use of supporting, preventive, detective controls)."
For example, you might institute special checks to see if transactions are being dropped and special procedures to direct your representatives to contact (preemptively) the ordering customer with an explanation.
"To manage risk by developing a risk mitigation plan that prioritizes, implements, and maintains controls."
In our example, this might mean a series of consultations between business unit leaders and technologists, leading to a series of agreements about what to do in various loss scenarios. You could also activate (or prepare to activate) your recovery plan: you know, the one that substitutes paper orders and acknowledgment letters for email. You do have a recovery plan, don't you?
"To lower the risk of loss by acknowledging the vulnerability or flaw and researching controls to correct the vulnerability."
You might warn visitors to your web site about the hazard, at the same time taking technical measures to remove the vulnerability by upgrading software or shifting services to an unaffected machine.
"To transfer the risk by using other options to compensate for the loss, such as purchasing insurance."
You might well be able to buy Internet attack insurance. Alternatively, you might decide to transfer the risk to your customers by raising your prices!
We'll close this discussion of options by repeating something we have often told our consulting clients: Risk is Good. Having risks is a sure sign that you're still in business. Managing risks is a good way to stay in business.
Let's look now at some of the actions you might select to implement the options we presented in the previous section.
Table 3-1 lists various supporting, preventive, and detection and recovery services and actions, broken down into the categories of technical, managerial, and operational options. This table, which is adapted from the NIST SP800-30 document, follows its nomenclature and structure. (It is not intended to be a complete list; it's merely illustrative.)
An important part of security design involves selecting which of these actions to use in addressing an identified risk or threat. We will discuss that process in the next section.
Table 3-1. Supporting, preventive, and detection and recovery security measures

| | Supporting | Preventive | Detection and recovery |
|---|---|---|---|
| Technical | Identify users; manage cryptographic keys; administer security of application and OS; make use of access control features of application and OS | Authenticate users; authorize activity; control access; ensure nonrepudiation; protect communications | Audit actions; contain damage; verify integrity of application and OS; restore secure state |
| Management | Assign security responsibility; develop and maintain system security; implement personnel security controls; conduct security awareness and technical training | Conduct periodic review of security controls; perform periodic system audits; conduct ongoing risk management; authorize acceptance of residual risk | Provide continuity of support; develop, test, and maintain the continuity of operations plan; establish an incident response capability |
| Operational | | Control data media access; limit external data distribution; secure wiring closets that house hubs and cables; provide backup capability; establish off-site storage procedures and security | Provide physical security; ensure environmental security |
Have you decided whether to try to mitigate and manage your risks? Hard choices remain. The crucial step now is to construct an effective security model of your application. Here, we're talking about adopting the correct metaphor to act as a skeleton for design.
Although the headline-writers may disagree, our experience tells us that more serious security battles are lost at this stage than at any other. Sure, buffer overflows and other kinds of coding errors (see Chapter 4) can open terrible holes; but implementation holes can be patched. On the other hand, if you have designed your application as a bathtub, when the occasion called for, let's say, a basting bulb or a syringe, recovery may be impossible without a complete redesign.
There is no doubt that, if security design really is (as we've heard it described) a kind of magic, the metaphor-selection step is where the wizard's wand is waved. But how can you pick the right model? Our advice is to start by grabbing on mentally to the first picture that occurs to you. Take it for a ride and see where it wants to go. If inspiration starts to run dry, go back to the well and fetch out another.
Ouch, did you notice how mangled the metaphors in that last paragraph were? You can't "ride" a "picture"! Yet in a way, that's just the sort of linguistic and cognitive liberty you need to grant yourself to be successful at this stage. Here's what works for us:
Immerse ourselves in the overall goal and requirements of the project.
Find a playful and forgiving brainstorming partner.
White-board many possible metaphors (and metaphor mixes). Here's a somewhat simplistic example: if you're writing a database to track the contents of your wine cellar, visualize the database as a shed full of receipts (from the bottle/case purchases) and copies of the bottle labels. Who might walk into the shed to look around? What would they be looking for? Now step inside that shed yourself and think about how you'd organize the stacks of paper. Would you staple each receipt to its corresponding label, or would you create a pile marked "receipts" and a pile marked "labels"?
Seriously analyze the few most promising metaphors for possible weaknesses, referring back to the risk and threat assessments already completed. In the above example, ask yourself the following question: what would happen if a pile of receipts fell over and you no longer knew which wines they referred to?
Now try experimenting with variations on the best metaphor. For example, it's not a shed full of receipts, but a box, and it's not receipts, but hand-written notes. The notes are stuck to the front of each wine label. You have to remove the note to read the label under it properly.
If you find your work is not finished, go around again.
If all else fails, fall back on explaining the problem to an empty chair! It is an odd little technique, to be sure, but one that has worked faithfully for us over the years.
We have one last piece of advice. Try to ignore the mental handles that someone who actually uses the software might employ. (They may think, for example, that they are "making a vacation request" or "buying a book.") Drop, too, that algorithmic framework you'll need later in the project. Focus instead on what is actually "in front" of you. As we advocated in Chapter 1, you need to think about the pavement, the paint, and the concrete curbside, not the road.
If you are designing a database application, now is the time to think (at least for a little while) not of "transactions" and "updates," but rather of pieces of paper and manila folders. Even better: try to go one level of abstraction lower than that. Think about the streams of bits and the magnetic impressions on the surface of the disk platters. Think about the numbers and characters, the aggregated levels of symbols that make up your program and its data.
In doing this type of analysis, you also need to occasionally step out of the abstract frame of mind and translate the abstract into the real. This may seem like contradictory advice, but we've found that this balanced approach is best. Considering the real allows you to consider practical issues, such as modularity. Is the user authentication part sufficiently modular to be replaced in its entirety in case a vulnerability is found, or in case something better and cheaper comes along?
If you can think about your problem in this way, even for a little while, your program will have a much better chance of withstanding a determined attack. Why? Because this may well be the way that attackers are thinking. They probably don't know the ins and outs of your application. They may not even know what it "does," from the point of view of your enterprise and your users. So you need to set aside this understanding for a while to guard against an unconventional attack.
How might all this cogitation have helped the TCP designers prevent or reduce the impact of the SYN flood attacks? Well, let's think specifically about the static array containing the pool of available sockets in their implementation. From a mental model standpoint, how might the designers have considered this array? Maybe as a supermarket delicatessen, with red paper tabs on a big spool. Each customer takes one to get a number. The deli clerks behind the counter call out numbers in sequence, and everybody gets served in order. (Simple? Sure, but fairly accurate for TCP socket administration.)
If we accept that model as usable, then what happens when an attacker tries to break it? Can an attacker forge fake requests? It's true that the paper tickets have no strong anti-counterfeiting defenses, but what would the impact of such an attack be? If the attacker could predict an upcoming number, he could print a counterfeit ticket and possibly get an earlier slot in line, but it's likely that the legitimate holder of that number would object. In any event, that's not likely to help the attacker much. Now consider an attacker who doesn't want to play by the rules. What would happen if the attacker grabbed several numbers and then left the supermarket?
You get the point. We want you to pick a mental model and then play with it a bit to see how it might respond to a real-world event.
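In fact, you can play out that last scenario in a few lines of code. The following Python sketch is a toy model of the deli counter, with a fixed spool of tickets standing in for a TCP implementation's static table of half-open connections; the pool size, the timeout, and the mechanics are invented for illustration and don't mirror any real kernel.

```python
BACKLOG = 8      # size of the static pool of half-open slots (the ticket spool)
TIMEOUT = 75     # ticks before an abandoned slot is reclaimed

half_open = []   # tick at which each outstanding slot was handed out

def request_slot(now: int) -> bool:
    """Hand out a slot (a deli ticket) if the pool isn't exhausted."""
    # Reclaim slots whose holders never came back (the timeout).
    half_open[:] = [t for t in half_open if now - t <= TIMEOUT]
    if len(half_open) >= BACKLOG:
        return False          # spool empty: the customer is turned away
    half_open.append(now)
    return True

# The attacker takes every ticket and walks out of the supermarket...
for tick in range(BACKLOG):
    assert request_slot(tick)

# ...and a legitimate customer arriving a moment later is refused,
# even though nobody is actually being served.
print(request_slot(BACKLOG))   # False
```

An attacker who takes every ticket and walks out denies service to everyone else until the timeouts expire, which is the SYN flood in miniature.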
Once you have selected a general security model, the more technical work can begin. Much of the work we call "security design" is a process of making security-related decisions about how an application system under development will operate. In this section, we list some technical issues you may need to consider. In the next section, we suggest methods you can use to decide between the possibilities we raise here.
What kinds of technical issues are we talking about? There are several categories: those concerning your own isolated application; those relating to your application's interaction with other software or with the network or enterprise environment under which it operates; and those relating specifically to your application's defenses against an actual attack.
Your own isolated application:
Which language, which platform, and which database packages will facilitate the security of your application?
Will users be differentiated and authenticated? If so, how? Will you rely on passwords, smart cards, or perhaps some biometric device?
Must your application operate with special privileges or access rights? If so, can these exceptional rights be suspended until needed, or discarded after use? (A sketch of this idea appears at the end of this list.)
If the application is to be invoked by a user, what safeguards will control the user interface? For example, how will commands be authenticated? How will they be evaluated for appropriateness or safety?
How will the different functions of the application be allocated among program sections? How will the sections communicate with each other, especially when operating asynchronously? Will the program operate statelessly?
How will the application discern and react to anomalous events, such as disk space exhaustion?
What safeguards can be put in place to prevent the application from exhausting vital system resources, such as disk space, memory, or processor time?
What kinds of records will the application keep of requests and transactions (of denied requests, specifically)?
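To make the question about special privileges concrete, here is one common shape the answer takes on Unix-like systems: acquire the privileged resource early, then discard root permanently before doing anything else. This is a minimal Python sketch under assumed conditions (the process starts as root; the unprivileged uid/gid of 65534, "nobody," is a placeholder for a dedicated service account), not a hardened implementation.

```python
import os
import socket

def bind_privileged_then_drop(port: int = 443, uid: int = 65534, gid: int = 65534):
    """Use root only for the one call that needs it, then discard it."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(("", port))          # the only step that requires root
    sock.listen(5)

    # Drop supplementary groups, then the group, then the user; once
    # setuid() succeeds there is no way back to root, which is the point.
    os.setgroups([])
    os.setgid(gid)
    os.setuid(uid)

    if os.getuid() == 0:
        raise RuntimeError("privileges were not dropped")
    return sock
```

The ordering matters: dropping the user ID first would leave the process without the right to drop its groups afterward.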
Your application as a "network citizen":
What mechanisms will the program use to mediate access to the resources (such as files and records) it manipulates? For example, will it rely on access control lists maintained and enforced by the operating system, enforce access itself, or use authorization vectors calculated upon request by an authorization server?
On what basis will the program allow or refuse access requests? For example, will access decisions be made on the basis of a user identifier, a job classification, the time of day, or the originating location of the request, or on the basis of a combination of these? (A sketch of such a combined check follows this list.)
What other software entities will the application communicate with and at what level of trust?
If your application communicates with a database or web server, will you need to implement a "De-Militarized Zone" (DMZ[3])? If one exists on your network already, will it be invisible to your application or will you need to negotiate a pathway to the server?
[3] The DMZ nomenclature is commonly used in the security community to refer to a network segment on which externally reachable computers and services, such as a web server, are connected. DMZ network segments are typically hardened and carefully monitored against external attack.
What libraries, what software repositories, and what third-party elements will you reuse, and what security implications arise as a result?
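As an illustration of combining those decision factors, here is a hedged Python sketch of an access check that weighs job role, time of day, and originating network together. The roles, the network range, and the business hours are all hypothetical, and a real system would likely load such a policy from configuration rather than hardcoding it.

```python
from datetime import time
from ipaddress import ip_address, ip_network

# Hypothetical policy inputs: who is asking, when, and from where.
CLERK_HOURS = (time(8, 0), time(18, 0))
INTERNAL_NET = ip_network("10.0.0.0/8")

def may_access(role: str, when: time, origin: str) -> bool:
    """Allow managers anytime from inside; clerks only during business hours."""
    inside = ip_address(origin) in INTERNAL_NET
    if role == "manager":
        return inside
    if role == "clerk":
        return inside and CLERK_HOURS[0] <= when <= CLERK_HOURS[1]
    return False   # default deny: unknown roles get nothing

print(may_access("clerk", time(9, 30), "10.1.2.3"))        # True
print(may_access("clerk", time(23, 0), "10.1.2.3"))        # False: after hours
print(may_access("manager", time(23, 0), "198.51.100.7"))  # False: outside net
```

Note the default-deny stance at the bottom: a request that matches no rule is refused, not granted.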
Your application's defenses against attack:
To what degree can your application conceal, while it is running, what it is doing? Will command lines, filenames, or even passwords be visible to people or programs using the system (or network) simultaneously?
How should your application react when confronted with hundreds or thousands of requests that violate access rules, or commands that contain disallowed characters, verbs, or resource references? (Say, did you decide to operate with or without state information?) What about requests that follow each other in a cascade, coming in too fast to be completed in sequence?
At design time, can you foresee the possibility that the program may engage in a sequence of actions that, if interrupted, might open a security compromise? (This is the basis of "race condition" attacks; in some cases, once you have identified the problem, you can design "atomic," nonseparable actions, as the sketch at the end of this list illustrates.)
Once your application has granted access to a resource, under what circumstances should the authorization expire or be revoked?
Should your application "context check" user requests, comparing them for reasonableness with other requests made by the same or similar users?
Can a single user properly run more than one instance of this application at a time? In each case, should the application check the temporal and spatial distances between the invocations for reasonableness?
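Before we leave these questions, the race-condition item deserves a concrete example, because it is one place where a design decision (asking the operating system for an atomic action) closes the hole completely. The sketch below uses Python on a POSIX system; the filename is, of course, hypothetical.

```python
import os

def create_report(path: str) -> int:
    """Create a file that must not already exist, without a race window.

    A check-then-create sequence (os.path.exists() followed by open())
    can be interrupted between the two steps; an attacker who plants a
    symbolic link in that window can hijack the write. Asking the kernel
    to perform the check and the creation as one atomic action closes it.
    """
    # O_EXCL makes creation fail if the path already exists, and the
    # kernel evaluates that condition atomically with the open itself.
    return os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)

fd = create_report("/tmp/report.out")
os.write(fd, b"ok\n")
os.close(fd)
```

The insecure alternative leaves a window between the check and the use; the atomic version has no window for an attacker to win.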
When you got to the last few questions, did you start wondering if we're going overboard? We may be, but the fact is that we have no way of knowing how much security is enough in your own specific circumstances. How can you know what to do? That is the topic of the next section.
It turns out that it's not enough to uncover and address security threats, design errors, and vulnerabilities. Choosing the wrong implementation measures can be just as disruptive as those attacks you have been worrying about. The following sections summarize various good practices to follow in determining the most appropriate implementation measures to execute later.
You need to start by doing your homework. Acquaint yourself with:
The user community, including its "culture" and traditions
The functions required of your software and the detailed sequence of operations it will carry out
The technical environment in which your program will run (network, operating system, other supporting software, etc.)
The general nature of attacks, threats, and compromises, as described in Chapter 1
The specific history of known attacks against your enterprise (or similar ones), especially "successful" attacks
Resources or attributes of your environment that would make especially inviting targets
Security measures should be consistent with the corporate culture: not too expensive, not too disruptive, and, of course, effective. Fortunately, many common-sense actions you could take satisfy those requirements.
Suppose that you have reached the point where you have to decide how to authenticate users of the application. How will you know if the user is who he or she claims to be? Let's say the choice comes down to using passwords, smart cards, or biometric means such as a retinal scan. How would you decide?
Later, we'll discuss some fairly complex methods of evaluating the possibilities. For now, let's just consider five factors that many businesses find important (the relative ranking will vary from business to business):
Maintenance: Who will maintain the biometric database and keep the scanners in repair? If the choice is a smart card, how will you pay for the replacement of lost, stolen, or damaged cards? Who will make sure that the inevitable driver updates are installed on the proper machines?
Cost: How much will those cards or scanners cost? Don't forget the cost of training system administrators and users about the new devices.
Fallback procedures: Sure, reusable passwords are remarkably ineffective. But if you go with the alternatives, don't forget that you will need procedures in place to issue temporary cards (or grant access some other way) when someone forgets or loses a card. Further, suppose that half the employees in the company get queasy when it comes to plunking an eyeball down on an eyepiece. Won't enough smart people find an excuse ("Sorry, conjunctivitis is running through my kid's school this week") that you'll have to actively support an alternative procedure?
Corporate culture: "We've always done it this way" may sound like a weak argument to you, but to the vice president in charge of accounting it may be the very essence of reassurance (and professionalism). Training, propaganda, documentation, software and server upgrades, and just a modicum of distraction from everyday business concerns: is it really worth it?
Reliability: Nothing undermines the usefulness and popularity of a security measure so much as unreliability. Would that scanner let in somebody it shouldn't? Really? OK, then how many times is it going to drive people crazy by denying them access unjustly? You'd better know.
If all this seems too simple, consider the following: why are reusable passwords still being used for authentication all over the world? Their limitations have been manifest for decades. We think it's because passwords are simple to understand and easy to use, and because they effectively discourage casual efforts at abuse. Coming up with technical measures that surpass that accomplishment is surprisingly difficult.
As with so many things in business and in life, deciding on security measures is a trade-off between their costs and their benefits.
First, let's look at costs (and bear in mind that the costs you consider should include a "big picture" outlook and not just the price of purchase). Pay attention to such things as training costs, maintenance costs, development costs, and so on. You may even want to consider the cost of software development, both in actual outlay and in lost opportunities (e.g., to implement new features).
Now, what about the benefits? If you have performed a detailed risk and threat assessment, you should have a good grasp of what this particular measure is worth. If not, we suggest you go back and ask someone who knows: probably, the executive in charge of the business asset that needs to be protected.
Once you've balanced costs against benefits for all of the services and resources used and relied upon by your application, you'll be in a position to make a well-informed security decision.
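If you want to see what such a balance looks like in numbers, one conventional formalization (our choice here, not something mandated by any of the methods we've cited) is annualized loss expectancy: expected incidents per year times loss per incident. The figures in this Python sketch are invented for the e-commerce example.

```python
def annualized_loss(incidents_per_year: float, loss_per_incident: float) -> float:
    """Annualized loss expectancy (ALE): frequency times impact."""
    return incidents_per_year * loss_per_incident

# Invented numbers: a day of lost orders is worth $80,000, and we
# expect roughly one such attack per year if we do nothing.
ale_without = annualized_loss(1.0, 80_000)

# A proposed control (say, the smart-card rollout) cuts the expected
# frequency to one incident in ten years, but costs $20,000 a year
# in cards, training, and administration.
ale_with = annualized_loss(0.1, 80_000)
control_cost = 20_000

net_benefit = ale_without - ale_with - control_cost
print(f"net annual benefit of control: ${net_benefit:,.0f}")  # $52,000
```

The arithmetic is trivial; the value lies in forcing the frequency and impact estimates out into the open where they can be challenged.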
It may occur to you that this can be a fairly arduous process to keep track of, with all of the many lists, evaluations, weights, and costs to keep straight. We know many experts who manage all of these details using simple spreadsheets, but we'd like something better for our readers.
One of the best thought-out methods we have encountered is called the Security Attribute Evaluation Method (SAEM). It was developed at Carnegie Mellon University, and the principal architect was Shawn A. Butler. It's a good example of the rigor that was already in use at the time of its publication in some large enterprises. SAEM is a four-step process that helps you organize and assess the direct benefits of a security technology, its effectiveness, and its costs.
We've seen the kind of approach used by SAEM before. What distinguishes SAEM is that it emphasizes a methodical process for gathering and evaluating risks and potential mitigations. Why is this so valuable?
There are several general advantages of a structured cost-benefit analysis:
Security managers make their assumptions explicit and capture decision rationale
Sensitivity analysis shows how assumptions affect design decisions
Design decisions are reevaluated consistently when assumptions change
IT managers see whether investment is consistent with risk expectations
A final advantage introduces a factor we'll see again as the sophistication of the tools we present increases. Here's what Butler says (emphasis added) about SAEM:
Comparing costs among alternative security architectures is significantly easier than comparing benefits. [P]roven financial analysis tools can more precisely estimate costs. In contrast, benefits are based on uncertain events and imperfect knowledge. Although no one can accurately predict how often an attack will occur and how effectively the security will mitigate the damage, experienced security managers intuitively, and implicitly, estimate the risk and the effectiveness of their risk mitigation strategies. The key to security cost-benefit analyses is to make these intuitions explicit.
We agree heartily with that last point. How SAEM tries to achieve this goal is beyond our current scope to explain well. We'll only note that the key is a careful evaluation of the potential effectiveness of various security measures against the risk being addressed.
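We can, however, give a flavor of what "making intuitions explicit" buys you. The Python sketch below is emphatically not SAEM itself; it simply writes down two invented estimates (attack frequency and control effectiveness) as variables, then runs the kind of sensitivity analysis the list above mentions, to see whether a design decision survives when an assumption moves.

```python
def net_benefit(attack_rate: float, damage: float,
                effectiveness: float, annual_cost: float) -> float:
    """Damage prevented per year, minus what the control costs to run."""
    return attack_rate * damage * effectiveness - annual_cost

# Two hypothetical alternatives for the same identified risk.
alternatives = {
    "cheap filter":   dict(effectiveness=0.4, annual_cost=2_000),
    "strong gateway": dict(effectiveness=0.9, annual_cost=25_000),
}

DAMAGE = 60_000   # invented estimate of loss per successful attack

# Sensitivity analysis: does the choice survive if the security
# manager's intuitive frequency estimate is off by a factor of four?
for rate in (0.25, 0.5, 1.0):
    best = max(alternatives,
               key=lambda name: net_benefit(rate, DAMAGE, **alternatives[name]))
    print(f"assumed attacks/year = {rate}: choose {best}")
```

In this made-up case the expensive gateway wins only under the original estimate of one attack per year; halve or quarter that estimate and the cheap filter comes out ahead. That is exactly the kind of assumption worth surfacing before committing to a design.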
Now that we've walked you through the five principal steps involved in designing secure code, we have one last piece of advice. A process for selecting security design decisions must yield repeatable results.
Tim Townsend taught us this lesson. He illustrated the problem with a simple test. He once approached a team of internal security consultants with a sample business application and asked for a set of recommendations he could enforce. A few days later, he received a complex and hefty list of technical security measures. A month later, he took the same application back to the same group (but a different expert) and asked the same question. The answer he received a few days later was another lengthy list of technical measures. Unfortunately, the two sets of recommendations were wildly different. We feel that this result could be reproduced at most companies.