SQL Server Replication Types

Microsoft has narrowed the field to three major types of data replication approaches within SQL Server: snapshot, transactional, and merge. Each replication type applies only to a single publication. However, it is possible to have multiple types of replication per database.

Snapshot Replication

Snapshot replication makes an image of all the tables in a publication at a single moment in time, and then moves that entire image to the subscribers. Little overhead on the server is incurred because snapshot replication does not track data modifications like the other forms of replication do. It is possible, however, for snapshot replication to require large amounts of network bandwidth, especially if the articles being replicated are large. Snapshot replication is the easiest form of replication to set up and is used primarily with smaller tables for which subscribers do not have to perform updates. An example of this might be a phone list that is to be replicated to many subscribers. This phone list is not considered to be critical data, and the frequency of it being refreshed is more than enough to satisfy all its users.

AGENTS USED: Snapshot Agent and the Distribution Agent primarily.

The snapshot agent creates files that contain the schema of the publication and the data. The files are temporarily stored in the snapshot folder of the distribution server, and then the distribution jobs are recorded in the distribution database.

The distribution agent is responsible for moving the schema and data from the distributor to the subscribers.

A few other agents also are used that deal with other needed tasks for replication, such as cleanup of files and history. In snapshot replication, after the snapshot has been delivered to all the subscribers, these agents will delete the associated .BCP and .SCH files from the distributor's working directory.

Transactional Replication

Transactional replication is the process of capturing transactions from the transaction log of the published database and applying them to the subscription databases. With SQL Server transactional replication, you can publish all or part of a table, views, or one or more stored procedures as an article. All data updates are then stored in the distribution database and sent and applied to any number of subscribing servers. Obtaining these updates from the publishing database's transaction log is extremely efficient. No direct reading of tables is required except during initial snapshot, and only the minimal amount of traffic is generated over the network. This has made transactional replication the most often used method.

As data changes are made, they are propagated to the other sites at nearly real time?you determine the frequency of this propagation. Because changes are usually made only at the publishing server, data conflicts are avoided for the most part. As an example, Push sub-scribers usually receive updates from the publisher in a minute or less, depending on the speed and availability of the network. Subscribers also can be set up for pull subscriptions. This is useful for disconnected users who are not connected to the network at all times.

AGENTS USED: Snapshot Agent, Log Agent, and the Distribution Agent primarily.

The snapshot agent creates files that contain the schema of the publication and the data. The files are stored in the snapshot folder of the distribution server, and then the distribution jobs are recorded in the distribution database.

The log reader agent monitors the transaction log of the database that it is set up to service. Each database published has its own log reader agent set up for replication and it will copy the transactions from the transaction log of that published database into the distribution database.

The distribution agent is responsible for moving the schema and data from the distributor to the subscribers for the initial synchronization and then moving all of the subsequent transactions from the published database to each subscriber as they come in. These transactions are stored in the distribution database for a certain length of time and eventually purged.

As always, a few other agents are used that deal with the other housekeeping issues surrounding data replication, such as schema files cleanup, history cleanup, and transaction cleanup.

Merge Replication

Merge replication involves getting the publisher and all subscribers initialized and then allowing data to be changed at all sites involved in the merge replication at the publisher and at all subscribers. All these changes to the data are subsequently merged at certain intervals so that again, all copies of the database have identical data.

Occasionally, data conflicts will have to be resolved. The publisher will not always win in a conflict resolution. Instead, the winner is determined by whatever criteria you establish.

AGENTS USED: Snapshot Agent and the Merge Agent primarily.

The snapshot agent creates files that contain the schema of the publication and the data. The files are stored in the snapshot folder of the distribution server, and then, the distribution jobs are recorded in the distribution database. This is essentially the same behavior as all other types of replication methods.

The merge agent takes the initial snapshot and applies it to all of the subscribers. It then reconciles all changes made on all the servers based on the rules that you configure.

Preparing for Merge Replication

When you set up a table for merge replication, SQL Server performs three schema changes to your database. First, SQL Server must either identify or create a unique column for every row that is going to be replicated. This column is used to identify the different rows across all of the different copies of the table. If the table already contains a column with the ROWGUIDCOL property, SQL Server will automatically use that column for the row identifier. If not, SQL Server will add a column called rowguid to the table. SQL Server also will place an index on this rowguid column.

Next, SQL Server adds triggers to the table to track changes that occur to the data in the table and record them in the merge system tables. The triggers can track changes at either the row or the column level, depending on how you set it up. SQL Server will support multiple triggers of the same type on a table, so merge triggers will not interfere with user-defined triggers on the table.

Last, SQL Server adds new system tables to the database that contains the replicated tables. The MSMerge_contents and MSMerge_tombstone tables track the updates, inserts, and deletes. These tables rely on the rowguid to track which rows have actually been changed.

The merge agent is responsible for moving changed data from the site where it was changed to all other sites in the replication scenario. When a row is updated, the triggers that were added by SQL Server fire off and update the new system tables, setting the generation column equal to 0 for the corresponding rowguid. When the merge agent runs, it collects the data from the rows where the generation column is 0, and then resets the generation values to values higher than the previous generation numbers. This allows the merge agent to look for data that has already been shared with other sites without having to look through all the data. The merge agent then sends the changed data to the other sites.

When the data reaches the other sites, the data is merged with existing data according to rules that you have defined. These rules are flexible and highly extensible. The merge agent evaluates existing and new data and resolves conflicts based on priorities or which data was changed first. Another available option is that you can create custom resolution strategies using the Component Object Model (COM) and custom stored procedures. After conflicts have been handled, synchronization occurs to ensure that all sites have the same data.

The merge agent identifies conflicts using the MSMerge_contents table. In this table, a column called lineage is used to track the history of changes to a row. The agent updates the lineage value whenever a user makes changes to the data in a row. The entry into this column is a combination of a site identifier and the last version of the row created at the site. As the merge agent is merging all the changes that have occurred, it examines each site's information to see whether a conflict has occurred. If a conflict has occurred, the agent initiates conflict resolution based on the criteria mentioned earlier.