8.6 Closing Remarks

Checking the content of individual columns is where you start data profiling. It is the easiest place to find inaccurate data. The values uncovered are clearly inaccurate because they violate the basic definition of the column.

You find not only inaccurate values but also bad practices. These findings can lead to better data encoding methods, standardized domains, standardized representations of the NULL condition, and even reduced storage requirements for some data. By examining more data from more columns, your analysts will become expert at identifying best practices for column design.

Another large benefit is that you determine and document the correct metadata for each column. This alone is of great value to subsequent projects and to the people who intend to use this data.

Column property analysis is also the easiest step to apply when qualifying data feeds that come into your corporation from the outside. It tells you up front the general quality level of the data.

Data profiling can be very time-consuming and expensive without software designed specifically for that purpose. Having to fabricate queries for each of the properties that apply to each column can be a laborious job. It may take three or more hours to examine each column in full detail using ad hoc methods. For a database with hundreds or thousands of columns, this adds up quickly: at three hours per column, 1,000 columns would take about 3,000 hours. This time can be reduced significantly with software built for the task.
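To give a sense of what even a small amount of automation buys, the following is a minimal sketch of a reusable column summary, written in Python. It is an illustration only, not the method of any particular profiling product. The function name profile_column, the NULL_TOKENS set, and the sample values are all hypothetical; a real profiler would check many more properties against the documented definition of each column.

```python
from collections import Counter

# Hypothetical strings treated as "no value"; adjust for each data source.
NULL_TOKENS = {"", "N/A", "NULL", "?", "NONE"}


def is_null(value, null_tokens=NULL_TOKENS):
    """Return True if the value represents the NULL condition."""
    return value is None or str(value).strip().upper() in null_tokens


def profile_column(values, null_tokens=NULL_TOKENS):
    """Summarize basic properties of one column given as a list of raw values."""
    present = [str(v).strip() for v in values if not is_null(v, null_tokens)]
    lengths = [len(v) for v in present]
    return {
        "rows": len(values),
        "nulls": len(values) - len(present),
        "distinct": len(set(present)),
        "min_length": min(lengths, default=0),
        "max_length": max(lengths, default=0),
        "top_values": Counter(present).most_common(3),
    }


# Assumed sample: a state-code column with mixed encodings and NULL markers.
sample = ["CA", "TX", "ca", "N/A", None, "Texas", "TX", ""]
report = profile_column(sample)
for name, value in report.items():
    print(f"{name}: {value}")
```

In this sample, a maximum length of 5 in a column that should hold two-character codes, together with three different ways of representing a missing value, points to the kind of encoding and NULL-representation problems described above. Running the same function over every column replaces a set of hand-written queries per column with a single pass.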