To size your server and install the software, you need to understand the variables involved in the decision and then match up the options to create three possible choices: standard, enhanced, and optimal. To do that, you need to find out what the minimal possible configurations are and what to get if money is not an option. You can use the information in this section to develop your baseline, and then expand that with metrics to determine the second two options. If you make your baseline realistic and support it with the information that follows, you should be able to make a strong case for it. The bad news is that no magic formulas tell you what to buy and how to configure it. The best way to make the hardware decisions is to detail the information in some meaningful way and examine it to come up with a solution. An additional benefit is that after you have written everything down, you can talk with vendors and other professionals intelligently about your needs. I detail several processes here using simple spreadsheets, tables, and scripts. You can take these processes and create a programmatic system from them; remember, however, that at some level there is always an element of judgment that you will use from your own experiences and corporate environment. Every situation will differ, which is why no computer program can build your landscape for you. Some vendor programs perform this task, but they are for specific application servers, not for a generic use such as a database server. Unless your application maps to one of these calculators directly, you still need to do the work yourself. VariablesFirst, build the variables from the information in this chapter. Use a separate table or spreadsheet for each and dedicate a separate grouping of these tables for each function the servers will perform. ApplicationBegin with the application. The following spreadsheet is a representation of the kind of information you want to gather:
LandscapeNext, describe the type of environment the software will use. Record details regarding the number of sites, physical locations, and the number of users at each site:
User InformationCreate a list of the number of users expected for the application. For each location, categorize the users into types. For instance, in an ERP application, you might have 100 users that enter orders requests, 10 buyers, and so forth. Find out the hours the users will access the system, using Greenwich mean time if you are in multiple time zones. This information is important to know to develop your disaster recovery and maintenance windows. If you discover that you have no discernable maintenance windows, you must add a duplicate system to "fail over" to so that you can apply patches and so forth. Develop a "load profile" that gives a metric to the amount of resources that type of user will take. In the case of a user entering purchase orders, for instance, you might evaluate the system used to determine that they take 1 percent of the CPU, 14K of bandwidth, and 2K of storage per transaction. You can use those numbers to classify the users into high, medium, and low, or you can use a numeric scheme such as 1 to 10. Finally, record the functions within the application that this classification of user accesses, especially as they relate to reporting.
Decision MatrixThe process to arrive at a good decision is to record what you know and use that information to answer what you do not know. What you know is explained in the variables section, and the decision matrix helps you develop answers to what you do not know. What you know is captured in the spreadsheets you just created, and what you do not know is which edition(s) of SQL Server to install and what to install them on. With that information filled out, you can put the variables into a matrix to determine the factors that will help you set up your systems. Assign a numeric set of values to each factor, using whatever scale is appropriate. In this example, I have set the numbers to range from 1 to 10, with 1 being a "lighter" value than 10. To further enhance the reliability of the decision, weight the factors that make sense in your environment. For instance, if speed is more important than growth, weight the user and I/O calculations higher than the projected growth values. After you put the measurements together in a matrix, you must figure what meets the need. You can create a single "atomic" value of a transaction and multiply that out over the variables or you can generalize the load across peak and base values. If you are unsure of creating the atomic transaction number, it is safer to estimate a peak load and put the base at half that value. Those numbers enable you to select a system that runs at a minimal level and one that can handle the peak load. In the middle of those systems is the "average" choice. I use the peak and base method in this example. In this chart, I take the variables I know and place them on the left. Each variable is given a score from 0 (not applicable/advisable) to 100 (perfectly suited), indicating how well that particular variable works on the choices of hardware and software. The numeric figure I use is not scientific, because I will weight the score for the environment that I have. The application I have is a Web-based architecture with a separate reporting instance, and the landscape at the organization is a "Business" type. The storage for the application is 100GB for the first year, with an estimated 10 percent growth pattern. My testing scenario against the application involved a Fibre Channel SAN with six LUNs presenting. I also used a 32-bit operating system and SQL Server edition. For this example, I have already filled out the user profile and used that information to get the following breakdown:
Each transaction (averaged) in my testing showed the following loads on these objects, per minute:
In some systems, each transaction generates a different load on these resources. To keep the example succinct, I averaged the transactions from the various user types based on testing. This is a valid method if your application will (over time) show an average load distribution. Your application might differ. To find out how your real-world application loads the system, measure the various objects such as CPU and RAM over a day of average use (or test to simulate that) and get the averages. Also calculate the standard deviations for those same periods. If the standard deviations are low, you can use the averages. If the standard deviations are high and you use the average method, you will end up with an underpowered system at critical times and an idle system at others, and a lot of unhappy users. In that situation, it is better to determine just how often the high values occur. If they occur often, optimize the system to handle the high values. I will consider the reporting system separate from the application server, so the information I gather in this exercise is for the application server only. I will follow the same process to develop the reporting server later. Using the information above, I developed the following matrix for my example:
With the raw numbers on this chart, I can see that I need at least three and a half CPUs, and that is just for the application. That brings me to four full processors for the application and the operating system. I have to make a decision here, however, regarding growth. After checking with the developers, I discover that the application can federate (that is, break functions onto other servers). I decide that four processors in a single machine will be enough. If the system did not federate, I would recheck my numbers, and if they still held true, I would recommend more processors. Each decision affects others, so I might repeat the load testing with 64-bit servers and see whether I experience any gains there. Adding more than four processors kicks the SQL Server edition up a notch. The memory for the application is not a high value, but I want to keep as much cache as I can in RAM. I decide that 4GB of RAM will provide enough room for caching, locks, the operating system, and other utilities. Evaluating the network bandwidth, I see that I have plenty of network capability on the network interface card (NIC) that I will dedicate to the clients. I always include a separate NIC in a server for backups and replication of data, so I do not need to factor that traffic on the dedicated client network interface. My matrix also shows lots of room for growth with what is already in use over on the testing system. It is also fast, so I recommend a duplicate of that storage system for my production landscape. Based on all this information, I see that I need a SQL Server edition that can support four processors and has the capability for Reporting Services. Because the servers can federate and I do not need more than four processors on any one server, I can get by with Standard Edition. For the mobile users, I query the developers again and find that the Express Edition will work for the laptops they use. I now have enough information to begin researching the vendors for the best deal, all from fairly well-researched information. Although this is a simple representation, it does demonstrate the process. You can extrapolate from the information that you see here to create a system for your production applications. Remember that if the system will house multiple applications, you need to go through these exercises for each. MonitoringUsing this process, you can now make the decisions for your environment and develop the metrics you will use to monitor the solution. Make sure that you instrument the system so that you can track the values over an appropriate interval, such as weekly or monthly. Each metric will have its own periodicity, so make sure your collection method takes that into account. I explain the mechanics of monitoring SQL Server further along in the book, but make sure you bring up this topic with the vendor or in-house developers. You need to make sure you can monitor their software, too. In Chapter 5, you will find that SQL Server 2005 provides a wealth of functions to monitor and track your SQL Server. You can use those features to monitor not only for growth, but also for performance. With your system installed and configured, let's explore a little more about the tools you have to control SQL Server. |