The Nature of the Beast

So just how do you decide if you're working on a true data warehouse? First, examine the intended nature of your database and the application it supports. For each subject area in your data warehouse, simply ask your sponsoring business user to provide the following eight items:

  • Mission statement

  • Number of ad-hoc query users

  • Number ad-hoc queries per day per ad-hoc user

  • Number of pre-canned report users

  • Number of pre-canned reports per day per pre-canned user

  • Number of pre-canned reports

  • Amount of history to keep in months, quarters, or years

  • Typical daily, weekly, or monthly volume of data to record

These answers should help you categorize your database application into one of the following choices:

  • Online transaction processing (OLTP)

  • Operational data store (ODS)

  • Online analytical processing (OLAP)

  • Data mart/data warehouse (DM/DW)

Use the criteria outlined in Table 1-1 to make your distinction.

Table 1-1. General Database Application Categorizations
 

OLTP

ODS

OLAP

DM / DW

Business Focus

Operational

Operational / Tactical

Tactical

Tactical / Strategic

End User Tools

Client/Server or Web

Client/Server or Web

Client/Server

Client/Server or Web

DB Technology

Relational

Relational

Cubic

Relational

Transaction Count

Large

Medium

Small

Small

Transaction Size

Small

Medium

Medium

Large

Transaction Time

Short

Medium

Medium

Long

DB Size in GB

10?400

100?800

100?800

800?80,000

Data Modeling

Traditional ERD

Traditional ERD

N/A

Dimensional

Normalization

3?5 NF[1]

3 NF

N/A

0 NF

[1] Normal Form

For example, suppose your answers are as follows:

  • "The point of sale (POS) subject area of the data warehouse should enable executives and senior sales managers to perform predictive, "what-if" sales analysis and historical analysis of:

    • A sales campaign's effectiveness

    • Geographic sales patterns

    • Calendar sales patterns

    • The effects of weather on sales

  • 20 ad-hoc query users

  • 10?20 ad-hoc queries a day per ad-hoc user

  • 40 pre-canned report users

  • 1?4 pre-canned reports a day per pre-canned user

  • 60 months of history

  • 40 million sales transactions per day

From this example, we can discern that we genuinely have a candidate for a data mart or data warehouse. First, the mission statement clearly indicates that our users' requirements are of a more tactical or strategic nature. Second, the majority of our report executions will clearly be ad-hoc (200?400 ad-hoc versus a maximum of 160 pre-canned). Third, we have significant historical data requirements and large amounts of raw data?and thus a potentially very large database (especially once we consider aggregates as well).

While it may seem like I've painted an example tailored to the conclusion, I've actually found the process to be this straightforward and easy in most cases. Unfortunately, these days, people tend to call any reporting database a data warehouse. It's okay for people to call their projects whatever they like, but as I pointed out, the techniques in this book only apply to the DM/DW column of Table 1-1.