I am focusing in this book more on maintenance and optimization of SQL Server 2005 features, but I do spend a little time in this chapter on designing BI systems. Before you implement a BI solution, you should learn how to complete a thorough design process that always begins with finding out what questions need to be answered. What I give you here is a brief overview of the process so that you can understand how all the pieces fit together in Analysis Services. I do that by explaining a little theory and then working through a complete scenario in the "Take Away" section at the end of the chapter. All along, I have been using the words business intelligence system to describe the tools that provide the ability to make strategic decisions. Many people equate a particular function or tool with BI, but a proper BI system is made up of several tools, an effective data strategy, and a good design. The first part of that design is a proper data structure on the OLTP systems. You must have a stable, secure landscape that stores the day-to-day operational data, called a source system, before you begin. If you try to build strategic information from a faulty origin, you will end up enabling bad decisions. The next layer in the BI system is a robust reporting system. This is either a software package that accesses the source system directly or an entire subsystem that extracts data from the source system into a reporting database. In the latter case, the database is physically structured differently than the original data to facilitate reporting. In some cases, this is called an operational data store (ODS).
The definitions I use in this chapter might differ from the ones you will see in other books or vendor documentation. In the many years I have used BI systems, everyone seems to describe things a little differently. The concepts I present here follow the best practices of BI systems as I have seen them implemented. Many smaller firms use a combination of the source system and a reporting system as their BI system. Spreadsheets or specialized reporting tools provide a good basis for manual analysis. At the next layer are the data marts. These systems collect data from several source systems and combine the data into a single format. To do that, data must often be homogenized so that it reports the same meaning. For instance, in one source system, the product called Buckster's Coffee uses the product code 12345, and in another source system, the product number is identified as A13GFF. To show how many of these things are sold, you need a way to put them together. One strategy to homogenize the two part numbers is to convert them to an entirely different part number with an Integration Services process that is coded to replace the part number based on a rule. Another method for homogenizing the data is to create a third table to act as a joining entity. Both approaches are valid based on your reporting needs from the data mart. As the organization grows, it requires more formal analytical tools. These tools are essentially "blank slates" on the screen that allows data selection and presentation. The reason a tool is needed rather than just a report is that the analysis is on more than one axis, or dimension. To understand how dimensions work, it is helpful to think of a standard spreadsheet. A spreadsheet (as well as a paper report) shows data on two axes at a time. Imagine a typical spreadsheet with rows and columns. In the columns, you can visualize store locations. Along the rows, you can visualize product names. In this case, the store locations and product names are dimensions of what you want to see. The intersection of the rows and columns (the cells) shows a number indicating the amount of a product sold at a store location. In other words, where the dimensions intersect each other, you have a fact about that dimension. In this example, you can easily imagine the number 300 in a cell where Tampa and Buckster's Coffee intersect. That particular fact means that 300 cases of Buckster's Coffee sold at the Tampa location. Now imagine that you also want to see a further breakdown of the data, involving when those products were sold, perhaps by month or quarter. That is a little harder to represent with a spreadsheet, although you could add a set of rows that shows each month, and perhaps even another for the quarters. The spreadsheet is still useable, but is becoming more cluttered and harder to interpret. But now, imagine you want to know the name of the products and the location they sold at, by month and quarter and by salesperson. A paper report with this information becomes cumbersome to navigate, and a spreadsheet would need several tabs to show all the permutations. You need something that enables you to "flip" through these all dimensions, showing calculated facts about a view of the dimensions dynamically. This is also called "slice and dice." This component of a BI system is called online analytical processing (OLAP). OLAP systems store dimensions and facts in a structure called a cube, because a cube shape represents data with more than one axis. OLAP systems collect data from the ODS, data mart, or source system layer and aggregate the various facts and dimensions into possible combinations. This is all done online, in an investigative manner. Another part of the BI system is a facility for data mining. Data mining involves interactively searching the data for statistically relevant patterns. A data mining system uses sophisticated algorithms that remove false correlations and show the user things such as products that sell well together, how effective a marketing campaign is, and more. At a higher level, many mid- to lower-level managers want to know how they are doing against an established norm, such as growth over last year's sales and so forth. The system that provides this information is called key performance indicators (KPIs). All of these disparate parts make up a BI system. When I use the term system here, I am describing the conceptual parts of a group of data and software that provide strategic, consolidated, analytic data to management users. The physical parts of a BI system are called the BI landscape. The landscape includes the source system software and databases, reporting servers, processing engines, and display mechanisms. |