Transformation Requirements

Before we look at the implementation methods, it's key to understand the transformation process requirements as they directly affect the options available. In general, I have seen only three data transformation scenarios:

  • Source data is already transformed (i.e., no transformation required).

  • Source data requires very simple transformation operations.

  • Source data requires very complex transformation operations.

The first and second scenarios almost never occur. If they do, then either you're really building an ODS?and hence this book's techniques generally do not apply?or somehow you've tricked the upstream application teams into producing your source data files with all your data transformations already complete.

Almost always, I find the third scenario to be the case. By more complex transformations, I mean data transformation operations more complex than simply:

  • Substituting constants.

  • Converting data types.

  • Applying format masks.

  • Assigning fields to a sequence number.

  • Setting to NULL based on a condition.

  • Setting a DEFAULT based on a condition.

  • Using simple SQL operators (e.g., UPPER, TRUNC, SUBSTR, etc.).

You may recognize these data transformation operations as those available from within Oracle's SQL Loader. So another way to state the third scenario would be that your data transformation process requires logic more complex than that provided by SQL Loader.

For some data warehouses, another key data transformation requirement is the ability to update previously inserted data. I've always referred to this as doing an "UPSERT" (i.e., the combination of an UPDATE and an INSERT). Beginning with Oracle 9i, the new MERGE command directly supports this key data transformation need. However, the MERGE is only supported in SQL DML commands, not in Oracle's SQL Loader.