Chapter 8. Integration Services

It seems strange that the minute you get data into a database, you are deluged with requests to take it out and put it into another one. It is also common to import data from other databases, files, and systems into your databases. Microsoft's SQL Server Integration Services (SSIS) serves as the primary engine and design environment for everything from simple imports and exports to a full-fledged extract, transform, and load (ETL) platform for the most demanding business intelligence (BI) landscape.

However, it is not just about moving data back and forth. SSIS has an intuitive graphical design environment that you can use to create powerful programs that perform maintenance on your databases, talk to other systems such as FTP servers or Web services, read and write XML files, or even all of that in the same package.

In this chapter, I explain how you navigate this powerful new feature in SQL Server 2005. I explain the architecture and the tools to create and manage solutions and then take you down the path of the ETL features that you have in the product. I use a lot of tables and lists for these explanations because I want you to see the breadth of what you have available as a toolset. I encourage you to do further research on the items I show you in the information you will find in this chapter. SSIS is one of those areas that have a lot of "corners" to explore.

Because this tool replaces Data Transformation Services (DTS) in previous versions of SQL Server, I explain what you need to know to make the transition in the "Take Away" section at the end of the chapter. I cover a little more information about upgrading a DTS package in Appendix , "Upgrading to SQL Server 2005."

One of the primary tasks of a database is to move data into and out of the system. Other than the data-entry screens of an application, you have several tools in SQL Server 2005 available to do that, such as the bulk-copy program (bcp) or by running a query from the sqlcmd command-line utility with an output flag. But these tools are not really suited for complex or fine-grained control when you import and export data, and they do not allow the data to be changed in any meaningful way during their operation. These tools also are not well suited to data migrations when you need to import or export a table from the database between another database or nondatabase format.

SSIS is a tool that has a simple wizard-driven interface for quick imports and exports and a powerful design environment and development model capable of serving as the ETL function of a full BI implementation. That is a big requirement, and SSIS handles it by providing various objects that you can connect to make powerful workflows for your system.

Integration Services is an engine, a service, a development environment, and a programming application interface designed to move data through a workflow that you design. It is the data movement and manipulation engine of SQL Server 2005 that you can use in simple and complex applications. In general, it is an ETL system.

Most ETL systems focus only on moving data, usually through a staging area. Using database tables, the data is stored "raw" prior to its transformation. Staging tables have open fields that allow any ASCII characters of almost any length. This allows the data to come over from the source system quickly, and the transform process deals with any problems once it reaches the ETL system. The source systems and the BI landscape are considered separate entities.

Microsoft takes a different view of this process.

Rather than bringing the data into a set of open tables, using SSIS you can transform the data into its final form directly within the extract process. This is called a pipelined procedure. Integration Services is set up to perform transforms with components built right in to the toolset and contains a complete coding model for any custom transforms along the way. That is not to say you cannot create a staging area, just that you should give serious thought to transforming as early in the process as you can. My experience has always been to transform as late in the process as possible, but the in-line transformation enhancements in Integration Services makes the pipelining process convenient. The real benefit is that you can do things either way.