Operational Data Stores Don't Count

Frequently people don't understand why an ODS is not a data warehouse. Since many ODS projects are referred to as data warehousing initiatives, people often mistakenly assume that an ODS is therefore a data warehouse. That assumption is false as an ODS is merely a stepping-stone to a true data warehouse. An ODS is simply a means to an end, and not the end itself. Let's see where the ODS fits into the data warehousing equation.

Companies generally have numerous legacy application systems that were developed with varying technologies over a long period of time. For example, an insurance company may have different policy and commission applications across its different business units (e.g., life, health, property, casualty, and investments). It also would not be uncommon to have several such applications for the various product families within each different business unit (e.g., for investments, IRAs vs. annuities vs. 401Ks vs. 403Bs). Moreover, there could even be different applications by product nature (e.g., individual vs. group policies). So, an insurance company could have dozens of policy and commission applications across many different hardware and software platforms. Furthermore, these applications were very likely developed in total seclusion from the others. Thus, each application is really like an island unto itself (often referred to as stovepipe applications).

Now, imagine that you need to generate reports for a specific customer or agent, John Smith. Since John Smith the customer or agent might exist in one or more of those different applications, the insurance company needs a common staging area to merge this eclectic data into one centralized source. Such a centralized collection of disparate but interrelated data sources is known as an ODS. Figure 1-1 demonstrates a typical OD.

Figure 1-1. Typical ODS Source and Target Architecture


An ODS contains the centralized, single-source location for OLTP data. It is very often referred to as the system of record. Moreover, an ODS typically keeps a window of history on that data (usually by merely adding date and timestamp columns to the OLTP data). So, an ODS can be quite large, often into the 400+ GB range. But, ODS data is in its most raw form, sometimes nothing more than a copy of OLTP data with dates and timestamps. No useful transformations or aggregations have been performed to translate that transactional data into the tactical or strategic format necessary for executive management reporting needs. Therefore, to repetitively report off that ODS data in its unprocessed form would be very expensive. Thus, ODS data needs to be transformed into a format suitable for effective and efficient reporting. This pathway for loading a data warehouse via an ODS is shown in the highlighted portion of Figure 1-2.

Figure 1-2. Typical Data Warehouse Data Loading Options


Also note that Figure 1-2 shows that you can just as easily bypass the ODS and directly transform legacy database data into the data warehouse. The point is that an ODS is not mandatory. For example, let's assume that we have a number of legacy application databases that were all developed in Oracle. Furthermore, let's assume that we have an accurate data dictionary for all business attributes such that all like tables and columns across those different Oracle databases have exactly the same type and size. In this case, building an ODS would merely serve to remove duplicate rows. In such a case, we might reasonably forgo building an ODS.

Figure 1-2 also shows that the data warehouse resides separately from the data marts. The point is that a data warehouse and a data mart are not quite the same thing. The primary difference between a data mart and a data warehouse is simply a question of scope. A data warehouse is a single, large store for the transformation of all legacy databases or ODS data. So, everyone would report off an enterprise data warehouse. A data mart is a smaller, specialized store for the transformation of all related legacy databases and ODS data, generally referred to as a subject area. For example, a consumer retail company might keep a data mart of cash register or POS data. A typical company might then have several to several dozen such data marts.