3.3 Cost-Benefit Analysis and Best Practices

Time and money are finite. After you complete your risk assessment, you will have a long list of risks, far more than you can possibly address or defend against. You now need a way of ranking these risks to decide which you need to mitigate through technical means, which you will insure against, and which you will simply accept. Traditionally, the decision of which risks to address and which to accept was made using a cost-benefit analysis: a process of assigning a cost to each possible loss, determining the cost of defending against it, determining the probability that the loss will occur, and then determining whether the cost of defending against the risk outweighs the benefit. (See the "Cost-Benefit Examples" sidebar for some examples.)

Risk assessment and cost-benefit analyses generate a lot of numbers, making the process seem quite scientific and mathematical. In practice, however, assembling these numbers can be time-consuming and expensive, and the resulting figures are frequently soft or inaccurate. That's why the approach of defining best practices has become increasingly popular, as we'll discuss later in this section.

3.3.1 The Cost of Loss

Determining the cost of loss can be very difficult. A simple cost calculation considers the cost of repairing or replacing a particular item. A more sophisticated cost calculation can consider the cost of out-of-service equipment, the cost of added training, the cost of additional procedures resulting from a loss, the cost to a company's reputation, and even the cost to a company's clients. Generally speaking, including more factors in your cost calculation will increase your effort, but will also increase the accuracy of your calculations.

For most purposes, you do not need to assign an exact value to each possible risk. Normally, assigning a cost range to each item is sufficient. For instance, the loss of a dozen blank diskettes may be classed as "under $500," while a destructive fire in your computer room might be classed as "over $1,000,000." Some items may actually fall into the category "irreparable/irreplaceable"; these could include loss of your entire accounts-due database or the death of a key employee.

You may want to assign these costs based on a finer scale of loss than simply "lost/not lost." For instance, you might want to assign separate costs for each of the following categories (these are not in any order):

  • Non-availability over a short term (less than 1 week)

  • Non-availability over a medium term (1-2 weeks)

  • Non-availability over a long term (more than 2 weeks)

  • Permanent loss or destruction

  • Accidental partial loss or damage

  • Deliberate partial loss or damage

  • Unauthorized disclosure within the organization

  • Unauthorized disclosure to some outsiders

  • Unauthorized full disclosure to outsiders, competitors, and the press

  • Replacement or recovery cost
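One way to record these finer-grained estimates is as a small table keyed by loss category, with each estimate reported as a coarse cost range rather than an exact figure. The Python sketch below mirrors the categories in the list above. Only the "under $500" and "over $1,000,000" boundaries come from the text; the intermediate boundaries, the `cost_range` helper, and the sample figures for a customer database are hypothetical placeholders for illustration.

```python
from enum import Enum


class LossCategory(Enum):
    """Loss categories from the list above (not in any particular order)."""
    SHORT_TERM_OUTAGE = "non-availability, short term"
    MEDIUM_TERM_OUTAGE = "non-availability, medium term"
    LONG_TERM_OUTAGE = "non-availability, long term"
    PERMANENT_LOSS = "permanent loss or destruction"
    ACCIDENTAL_PARTIAL = "accidental partial loss or damage"
    DELIBERATE_PARTIAL = "deliberate partial loss or damage"
    INTERNAL_DISCLOSURE = "unauthorized disclosure within the organization"
    LIMITED_OUTSIDE_DISCLOSURE = "unauthorized disclosure to some outsiders"
    FULL_DISCLOSURE = "unauthorized full disclosure to outsiders, competitors, and the press"
    REPLACEMENT_COST = "replacement or recovery cost"


# Hypothetical range boundaries (upper bound in dollars, label).
# Only the first and last buckets correspond to examples in the text.
COST_RANGES = [
    (500, "under $500"),
    (10_000, "$500 to $10,000"),
    (100_000, "$10,000 to $100,000"),
    (1_000_000, "$100,000 to $1,000,000"),
]


def cost_range(estimate_dollars):
    """Map a rough dollar estimate (or None for irreplaceable) to a range label."""
    if estimate_dollars is None:
        return "irreparable/irreplaceable"
    for upper, label in COST_RANGES:
        if estimate_dollars < upper:
            return label
    return "$1,000,000 or more"


# Hypothetical estimates for one asset, e.g., a customer database.
customer_db_losses = {
    LossCategory.SHORT_TERM_OUTAGE: 8_000,
    LossCategory.PERMANENT_LOSS: None,  # treated as irreplaceable
    LossCategory.FULL_DISCLOSURE: 2_500_000,
}

for category, estimate in customer_db_losses.items():
    print(f"{category.value}: {cost_range(estimate)}")
```

Coarse labels keep the exercise honest about how soft the underlying numbers are; the same table can be filled in once per asset.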

3.3.2 The Probability of a Loss

After you have identified the threats, you need to estimate the likelihood of each occurring. These threats may be easiest to estimate on a year-by-year basis.

Quantifying the likelihood of a threat is hard work. You can obtain some estimates from third parties, such as insurance companies. If the event happens on a regular basis, you can estimate it based on your records. Industry organizations may have collected statistics or published reports. You can also base your estimates on educated guesses extrapolated from past experience. For instance:

  • Your power company can provide an official estimate of the likelihood that your building would suffer a power outage during the next year. They may also be able to quantify the risk of an outage lasting a few seconds versus the risk of an outage lasting minutes or hours.

  • Your insurance carrier can provide you with actuarial data on the probability of death of key personnel based on age, health, smoker/nonsmoker status, weight, height, and other issues.

  • Your personnel records can be used to estimate the probability of key computing employees quitting.

  • Past experience and best guess can be used to estimate the probability of a serious bug being discovered in your software during the next year (100% for some software platforms).

If you expect something to happen more than once per year, then record the number of times that you expect it to happen. Thus, you may expect a serious earthquake only once every 100 years (for a per-year probability of 1% in your list), but you may expect three serious bugs in Microsoft's Internet Information Server (IIS) to be discovered during the next month (for an adjusted probability of 3,600%).
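The conversion is simple arithmetic: divide the expected number of events by the length of the period in years. What the text calls an "adjusted probability" above 100% is really an expected yearly frequency, not a probability in the strict sense. The sketch below assumes a hypothetical helper name, `annual_rate_percent`, and reproduces the two examples from the paragraph.

```python
def annual_rate_percent(expected_events, period_years):
    """Convert "N events expected per period" into an expected yearly frequency, in percent.

    Results above 100% mean the event is expected more than once a year;
    they are rates, not probabilities in the strict sense.
    """
    if period_years <= 0:
        raise ValueError("period_years must be positive")
    return expected_events / period_years * 100.0


# One serious earthquake every 100 years.
earthquake = annual_rate_percent(1, 100)
# Three serious bugs discovered in the next month (1/12 of a year).
iis_bugs = annual_rate_percent(3, 1 / 12)

print(f"earthquake: {earthquake:,.0f}% per year")
print(f"IIS bugs:   {iis_bugs:,.0f}% per year")
```

Multiplying a per-occurrence cost by this rate, expressed as a fraction (36.0 rather than 3,600%), gives the expected yearly loss used in the comparisons later in this section.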

3.3.3 The Cost of Prevention

Finally, you need to calculate the cost of preventing each kind of loss.

For instance, the cost to recover from a momentary power failure is probably only that of personnel "downtime" and the time necessary to reboot. However, the cost of prevention may be that of buying and installing an uninterruptible power supply (UPS) system.

Costs need to be amortized over the expected lifetime of your approaches, as appropriate. Deriving these costs may reveal secondary costs and credits that should also be factored in. For instance, installing a better fire-suppression system may result in a yearly decrease in your fire insurance premiums and give you a tax benefit for capital depreciation. But spending money on a fire-suppression system means that the money is not available for other purposes, such as increased employee training or even investments.
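A minimal way to put a defense on a yearly footing is straight-line amortization: spread the purchase cost evenly over its expected lifetime, add any recurring operating cost, and subtract recurring credits such as an insurance premium reduction. The sketch below assumes this simple model; the function name and the fire-suppression figures are hypothetical. It ignores tax effects, opportunity cost, and the time value of money, which a fuller analysis would discount or add explicitly.

```python
def annual_prevention_cost(purchase_cost, lifetime_years,
                           yearly_operating_cost=0.0, yearly_credits=0.0):
    """Yearly cost of a defense under straight-line amortization.

    purchase_cost is spread evenly over lifetime_years; recurring costs are
    added and recurring credits (e.g., lower insurance premiums) subtracted.
    """
    if lifetime_years <= 0:
        raise ValueError("lifetime_years must be positive")
    return purchase_cost / lifetime_years + yearly_operating_cost - yearly_credits


# Hypothetical fire-suppression system: $60,000 installed, 15-year life,
# $1,000/year maintenance, $2,500/year saved on fire insurance premiums.
fire_system = annual_prevention_cost(60_000, 15,
                                     yearly_operating_cost=1_000,
                                     yearly_credits=2_500)
print(f"fire suppression: ${fire_system:,.0f} per year")
```

The credit term is why a defense that looks expensive up front can turn out to be cheap, or even profitable, once its secondary effects are counted.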

Cost-Benefit Examples

Suppose you have a 0.5% chance of a single power outage lasting more than a few seconds in any given year. The expected loss as a result of personnel not being able to work is $25,000, and the cost of recovery (handling reboots and disk checks) is expected to be another $10,000 in downtime and personnel costs. Thus, the expected loss and recovery cost per year is (25,000 + 10,000) x .005 = $175. If the cost of a UPS system that can handle all your needs is $150,000, and it has an expected lifetime of 10 years, then the cost of avoidance is $15,000 per year. Clearly, investing in a UPS system at this location is not cost-effective. On the other hand, reducing the time required for disk checking by switching to a journaling filesystem might well be worth the time required to make the change.

As another example, suppose that the compromise of a password by any employee could result in an outsider gaining access to trade secret information worth $1,000,000. There is no recovery possible, because the trade secret status would be compromised, and once lost, it cannot be regained. You have 50 employees who access your network while traveling, and the probability of any one of them accidentally disclosing the password (for example, having it "sniffed" over the Internet; see Chapter 11) is 2%. Thus, the probability of at least one password being disclosed during the year is 63.6%.[2] The expected loss is (1,000,000 + 0) x .636 = $636,000. If the cost of avoidance is buying a $75 one-time password card for each user (see Chapter 8), plus a $20,000 software cost, and the system is good for five years, then the avoidance cost is (50 x 75 + 20,000) / 5 = $4,750 per year. Buying such a system would clearly be cost-effective.

[2] That is, $1 - (1.0 - 0.02)^{50}$.
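The arithmetic in both sidebar examples can be checked with a few lines of Python. The sketch below assumes, as the second example does, that the 50 employees disclose passwords independently of one another. The sidebar rounds the disclosure probability to 63.6% before multiplying, so it reports $636,000; carrying the unrounded probability gives a figure slightly below that, which does not change the conclusion.

```python
def expected_annual_loss(loss, recovery_cost, annual_probability):
    """(loss + recovery cost) x probability of occurrence, as in the sidebar."""
    return (loss + recovery_cost) * annual_probability


def prob_at_least_one(p_each, n):
    """Probability that at least one of n independent events occurs."""
    return 1.0 - (1.0 - p_each) ** n


# Example 1: power outage versus a UPS.
outage_risk = expected_annual_loss(25_000, 10_000, 0.005)
ups_cost = 150_000 / 10                      # $150,000 over 10 years

# Example 2: password disclosure versus one-time password cards.
p_disclosure = prob_at_least_one(0.02, 50)
password_risk = expected_annual_loss(1_000_000, 0, p_disclosure)
otp_cost = (50 * 75 + 20_000) / 5            # cards plus software, over 5 years

for name, risk, cost in [("UPS", outage_risk, ups_cost),
                         ("one-time passwords", password_risk, otp_cost)]:
    verdict = "cost-effective" if cost < risk else "not cost-effective"
    print(f"{name}: expected loss ${risk:,.0f}/yr, "
          f"defense ${cost:,.0f}/yr -> {verdict}")
```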

3.3.4 Adding Up the Numbers

At the conclusion of this exercise, you should have a multidimensional matrix consisting of assets, risks, and possible losses. For each loss, you should know its probability, the predicted loss, and the amount of money required to defend against the loss. If you are very precise, you will also have a probability that your defense will prove inadequate.

The process of determining if each defense should or should not be employed is now straightforward. You do this by multiplying each expected loss by the probability of its occurring as a result of each threat. Sort these in descending order, and compare each cost of occurrence to its cost of defense.

This comparison results in a prioritized list of things you should address. The list may be surprising. Your goal should be to avoid expensive, probable losses before worrying about less likely, low-damage threats. In many environments, fire and loss of key personnel are much more likely to occur, and more damaging, than a break-in over the network. Surprisingly, however, it is break-ins that seem to occupy the attention and budget of most managers. This practice is simply not cost-effective, nor does it provide the highest levels of trust in your overall system.

To figure out what you should do, take the figures that you have gathered for avoidance and recovery to determine how best to address your high-priority items. The way to do this is to add the cost of recovery to the expected average loss, and multiply that by the probability of occurrence. Then, compare the final product with the yearly cost of avoidance. If the cost of avoidance is lower than the risk you are defending against, you would be advised to invest in the avoidance strategy if you have sufficient financial resources. If the cost of avoidance is higher than the risk that you are defending against, then consider doing nothing until after other threats have been dealt with.[3]

[3] Alternatively, you may wish to reconsider your costs.
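The procedure described above (compute each risk's yearly exposure as recovery cost plus expected loss, times probability; sort in descending order; compare against the yearly cost of avoidance) maps directly onto a small risk register. The Python sketch below is one way to express it. The `Risk` record and `prioritize` function are hypothetical names, and every figure except the power-outage row (taken from the sidebar) is invented purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    annual_probability: float     # may exceed 1.0 for events expected more than once a year
    expected_loss: float          # dollars per occurrence
    recovery_cost: float          # dollars per occurrence
    annual_avoidance_cost: float  # amortized yearly cost of the defense

    @property
    def annual_exposure(self):
        """(expected loss + recovery cost) x probability of occurrence."""
        return (self.expected_loss + self.recovery_cost) * self.annual_probability

    @property
    def worth_defending(self):
        """True when avoidance costs less per year than the risk it addresses."""
        return self.annual_avoidance_cost < self.annual_exposure


def prioritize(risks):
    """Return risks sorted by yearly exposure, largest first."""
    return sorted(risks, key=lambda r: r.annual_exposure, reverse=True)


# Hypothetical register; only the power-outage row comes from the sidebar.
register = [
    Risk("network break-in", 0.05, 100_000, 50_000, 30_000),
    Risk("power outage", 0.005, 25_000, 10_000, 15_000),
    Risk("computer-room fire", 0.01, 2_000_000, 300_000, 8_000),
    Risk("loss of key administrator", 0.15, 80_000, 40_000, 10_000),
]

for r in prioritize(register):
    action = "defend" if r.worth_defending else "defer"
    print(f"{r.name:28} exposure ${r.annual_exposure:>9,.0f}/yr  "
          f"defense ${r.annual_avoidance_cost:>7,.0f}/yr  -> {action}")
```

Because the inputs are soft, the ordering matters more than the exact dollar figures; rerunning the sort with pessimistic and optimistic estimates is a cheap way to see which priorities are robust.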

Risk Cannot Be Eliminated

You can identify and reduce risks, but you can never eliminate risk entirely.

For example, you may purchase a UPS to reduce the risk of a power failure damaging your data. But the UPS may fail when you need it. The power interruption may outlast your battery capacity. The cleaning crew may have unplugged it last week to use the outlet for their floor polisher.

A careful risk assessment will identify these secondary risks and help you plan for them as well. You might, for instance, purchase a second UPS. But, of course, both units could fail at the same time. There might even be an interaction between the two units that you did not foresee when you installed them. The likelihood that a power failure will disrupt your systems gets smaller and smaller as you buy more backup power supplies and test the system, but it never becomes zero.
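A rough model shows why the chance never reaches zero. If each backup unit fails independently with probability p, all n units fail together with probability p^n, which shrinks quickly but stays positive. Real units are rarely fully independent, so the sketch also adds a hypothetical common-mode term (a single cause, like the cleaning crew unplugging the outlet, that defeats every unit at once). That term sets a floor that no number of extra units can push below. The function names and probabilities are illustrative assumptions.

```python
def prob_all_fail(p_fail_each, n_units):
    """Chance that all n units fail together, assuming independent failures."""
    return p_fail_each ** n_units


def prob_all_fail_common_mode(p_fail_each, n_units, p_common_mode):
    """Chance of total failure when a shared cause defeats every unit at once.

    Either the common-mode event happens (probability p_common_mode), or it
    does not and all units still fail independently.
    """
    return p_common_mode + (1.0 - p_common_mode) * p_fail_each ** n_units


for n in (1, 2, 3, 5):
    print(f"{n} unit(s): independent {prob_all_fail(0.1, n):.5f}, "
          f"with common mode {prob_all_fail_common_mode(0.1, n, 0.001):.5f}")
```

Adding units drives the independent term toward zero, while the common-mode term stays put; that residual is the part that testing, procedures, and physical controls have to address.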

Risk assessment can help you protect yourself and your organization against human risks as well as natural ones. For example, you can use risk assessment to help protect yourself against computer break-ins, by identifying the risks and planning accordingly. But, as with power failures, you cannot completely eliminate the chance of someone breaking in to your computer.

This fact is fundamental to computer security: no matter how secure you make a computer, it can always be broken into given sufficient resources, time, motivation, and money, especially when coupled with random chance.

Even systems that are certified according to the Common Criteria (successor to the Department of Defense's "Orange Book," the Trusted Computer System Evaluation Criteria) are vulnerable to break-ins. One reason is that these systems are sometimes not administered correctly. Another reason is that some people using them may be willing to take bribes to violate security. Computer access controls do no good if they're not administered properly, just as the lock on a building will do no good if it is the night watchman who is stealing office equipment at 2:00 a.m.

People are often the weakest link in a security system. The most secure computer system in the world is wide open if the system administrator cooperates with those who wish to break into the machine. People can be compromised with money, threats, or ideological appeals. People can also make mistakes, such as accidentally sending email containing account passwords to the wrong person.

Indeed, people are usually cheaper and easier to compromise than advanced technological safeguards.

3.3.5 Best Practices

Risk analysis has a long and successful history in the fields of public safety and civil engineering. Consider the construction of a suspension bridge. It's a relatively straightforward matter to determine how much stress cars, trucks, and severe weather will place on the bridge's cables. Knowing the anticipated stress, an engineer can compute the chance that the bridge will collapse over the course of its life given certain design and construction choices. Given the bridge's width, length, height, anticipated traffic, and other factors, an engineer can compute the projected destruction to life, property, and commuting patterns that would result from the bridge's failure. All of this information can be used to calculate cost-effective design decisions and a reasonable maintenance schedule for the bridge's owners to follow.

The application of risk analysis to the field of computer security has been less successful. Risk analysis depends on the ability to gauge the expected use of an asset, assess the likelihood of each risk to the asset, identify the factors that enable those risks, and calculate the potential impact of various choices. These figures are devilishly hard to pin down. How do you calculate the risk that an attacker will be able to obtain system administrator privileges on your web server? Does this risk increase over time, as new security vulnerabilities are discovered, or does it decrease over time, as the vulnerabilities are publicized and corrected? Does a well-maintained system become less secure or more secure over time? And how do you calculate the likely damages of a successful penetration? Few statistical, scientific studies have been performed on these questions. Many people think they know the answers to these questions, but research has shown that most people estimate risk poorly when they rely on personal experience.

Because of the difficulty inherent in risk analysis, another approach for securing computers, called best practices or due care, has emerged in recent years. This approach consists of a series of recommendations, procedures, and policies that are generally accepted within the community of security practitioners to give organizations a reasonable level of overall security and risk mitigation at a reasonable cost. Best practices can be thought of as "rules of thumb" for implementing sound security measures.

The best practices approach is not without its problems. The biggest problem is that there really is no one set of "best practices" that is applicable to all sites and users. The best practices for a site that manages financial information might have similarities to the best practices for a site that publishes a community newsletter, but the financial site would likely have additional security measures.

Following best practices does not guarantee that your system will not suffer a security-related incident. Most best practices require that an organization's security office monitor the Internet for news of new attacks and download patches from vendors when they are made available.[4] But even if you follow this regimen, an attacker might still be able to use a novel, unpublished attack to compromise your computer system. And if the person monitoring security announcements goes on vacation, attackers will gain a head start on your process of installing needed patches.

[4] We are appalled at the number of patches issued for some systems, especially patches for problem classes that have long been known. You should strongly consider risk abatement strategies based on use of software that does not require frequent patches to fix security flaws.

The very idea that tens of thousands of organizations could or even should implement the "best" techniques available to secure their computers is problematical. The "best" techniques available are simply not appropriate or cost-effective for all organizations. Many organizations that claim to be following best practices are actually adopting the minimum standards commonly used for securing systems. In practice, most best practices really aren't.

We recommend a combination of risk analysis and best practices. Starting from a body of best practices, an educated designer should evaluate risks and trade-offs, and pick reasonable solutions for a particular configuration and management environment. For instance, servers should be hosted on isolated machines, and configured with an operating system and software providing the minimally required functionality. The operators should be vigilant for changes, keep up to date on patches, and prepare for the unexpected. Doing this well takes a solid understanding of how the system works, and what happens when it doesn't work. This is the approach that we will explain in the chapters that follow.

3.3.6 Convincing Management

Security is not free. The more elaborate your security measures become, the more expensive they become. Systems that are more secure may also be more difficult to use, although this need not always be the case.[5] Security can also get in the way of "power users" who wish to exercise many difficult and sometimes dangerous operations without authentication or accountability. Some of these power users can be politically powerful within your organization.

[5] The converse is not true either: difficulty of use does not imply security. PC operating systems are not secure, even though some are difficult to use.

After you have completed your risk assessment and cost-benefit analysis, you will need to convince your organization's management of the need to act upon the information. Normally, you would formulate a policy that is then officially adopted. Frequently, this process is an uphill battle. Fortunately, it does not have to be.

The goal of risk assessment and cost-benefit analysis is to prioritize your actions and spending on security. If your business plan is such that you should not have an uninsured risk of more than $10,000 per year, you can use your risk analysis to determine what needs to be spent to achieve this goal. Your analysis can also be a guide as to what to do first, then second, and can identify which things you should relegate to later years.
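A target such as "no more than $10,000 per year of uninsured risk" can be turned into a simple spending plan. The sketch below is one possible heuristic, not a method given in the text: it funds defenses in order of largest yearly exposure until the remaining exposure falls to the target. It assumes, optimistically, that each funded defense removes its risk entirely (as "Risk Cannot Be Eliminated" notes, that is never quite true). The `plan_spending` name and all figures are hypothetical.

```python
def plan_spending(risks, max_residual_exposure):
    """Fund defenses, largest exposure first, until residual exposure meets the target.

    risks: list of (name, annual_exposure, annual_defense_cost) tuples.
    Returns (funded names in order, total yearly spend, residual exposure).
    Assumes each funded defense eliminates its risk completely.
    """
    residual = sum(exposure for _, exposure, _ in risks)
    funded, spend = [], 0.0
    for name, exposure, cost in sorted(risks, key=lambda r: r[1], reverse=True):
        if residual <= max_residual_exposure:
            break  # target already met; remaining items can wait for later years
        funded.append(name)
        spend += cost
        residual -= exposure
    return funded, spend, residual


# Hypothetical yearly exposures and defense costs, in dollars.
risks = [
    ("fire", 23_000, 8_000),
    ("key staff", 18_000, 10_000),
    ("break-in", 7_500, 30_000),
    ("power", 175, 15_000),
]

funded, spend, residual = plan_spending(risks, max_residual_exposure=10_000)
print(f"fund {funded}: ${spend:,.0f}/yr, residual exposure ${residual:,.0f}/yr")
```

The order in which items are funded doubles as the "first, then second" schedule. A fuller version would also skip, or insure against, any risk whose defense costs more than the exposure it removes.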

Another benefit of risk assessment is that it helps to justify to management that you need additional resources for security. Most managers and directors know little about computers, but they do understand risk and cost-benefit analysis.[6] If you can show that your organization is currently facing an exposure to risk that could total $20,000,000 per year (add up all the expected losses plus recovery costs for what is currently in place), then this estimate might help convince management to fund some additional personnel and resources.

[6] In like manner, few computer security personnel seem to understand risk analysis techniques.

On the other hand, going to management with a vague "We're really likely to see several break-ins on the Internet after the next CERT/CC announcement" is unlikely to produce anything other than mild concern (if that).
