Planning for SQL Server Data Replication

You must consider many factors when choosing a method to distribute data. Your business requirements will determine which is the right method for you. In general, you will need to understand the timing and latency of your data, its independence at each site, and your specific need to filter or partition the data.

Autonomy, Timing, and Latency of Data

Distributed data implementations can be built using several facilities in Microsoft SQL Server: Data Transformation Services (DTS), the Distributed Transaction Coordinator (DTC), and data replication. The trick is to match the right facility to the type of data distribution you need.

In some applications, such as online transaction processing and inventory control systems, data must be synchronized at all times. This requirement, called immediate transactional consistency, was known as tight consistency in previous versions of SQL Server.

SQL Server implements immediate transactional consistency in the form of two-phase commit processing. A two-phase commit, sometimes known as 2PC, ensures that a transaction is either committed on all servers or rolled back on all servers, so that the data on every server stays 100 percent in sync at all times. One of the main drawbacks of immediate transactional consistency is that it requires a high-speed LAN to work. This type of solution might not be feasible for large environments with many servers, where occasional network outages are inevitable. These types of implementations can be built with DTC and DTS.
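As a concrete illustration, DTC can coordinate a two-phase commit from within Transact-SQL. This is a minimal sketch, not a production implementation; the linked server name `RemoteServer` and the `Inventory` tables are hypothetical.

```sql
-- Transfer stock between two servers with immediate transactional
-- consistency. MS DTC coordinates the two-phase commit.
BEGIN DISTRIBUTED TRANSACTION;

UPDATE Inventory
SET QuantityOnHand = QuantityOnHand - 10
WHERE ProductID = 42;

-- Four-part name addresses the linked server (hypothetical).
UPDATE RemoteServer.Warehouse.dbo.Inventory
SET QuantityOnHand = QuantityOnHand + 10
WHERE ProductID = 42;

-- COMMIT succeeds only if both servers vote "prepared" in phase one;
-- otherwise DTC rolls the transaction back on both servers.
COMMIT TRANSACTION;
```

Note that if either server is unreachable, the entire transaction fails, which is exactly the loss of site autonomy discussed below.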

In other applications, such as decision support and report generation systems, 100 percent data synchronization all of the time is not as important. This requirement, called latent transactional consistency, was known as loose consistency in previous versions of SQL Server.

Latent transactional consistency is implemented in SQL Server via data replication. Replication allows data to be updated on all servers, but not simultaneously. The result is "real-enough time" data: a lag exists between an update on the main server and its appearance in the replicated copies. In this scenario, if you could stop all data modifications on all servers, every server would eventually converge on the same data. Unlike the two-phase commit model, replication works over both LANs and WANs, and over slow or fast links.

When planning a distributed application, you must consider the effect of one site's operation on another. This is known as site autonomy. A site with complete autonomy can continue to function without being connected to any other site. A site with no autonomy cannot function without being connected to all other sites. For example, applications that use two-phase commits rely on every site being able to immediately accept the changes sent to it. If any one site is unavailable, no transactions on any server can be committed. In contrast, sites using merge replication can be completely disconnected from all other sites and continue to work effectively, though without a guarantee of immediate data consistency. Luckily, some solutions combine both high data consistency and site autonomy.
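The high-autonomy end of this spectrum can be sketched with the merge replication stored procedures. This is a hedged outline only; the publication name `SalesMerge`, database `SalesDB`, and subscriber `BranchServer` are invented for illustration.

```sql
-- Hypothetical sketch: a merge publication lets each site update its
-- local copy while disconnected and synchronize later.
USE SalesDB;

EXEC sp_addmergepublication
    @publication = 'SalesMerge';

EXEC sp_addmergearticle
    @publication   = 'SalesMerge',
    @article       = 'Customers',
    @source_object = 'Customers';   -- table to merge-replicate

EXEC sp_addmergesubscription
    @publication   = 'SalesMerge',
    @subscriber    = 'BranchServer',
    @subscriber_db = 'SalesDB';
```

Each subscriber can then run for extended periods with no connection to the publisher, at the cost of possible update conflicts when the sites eventually synchronize.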

Methods of Data Distribution

After you have determined the amount of transactional latency and site autonomy your business requirements allow, select the data distribution method that matches. Each method provides a different degree of site autonomy and latency. You can choose from several methods:

  • Distributed transactions: Distributed transactions ensure that all sites have the same data at all times, at the cost of the overhead required to maintain this consistency. Because it is not a replication method, it is not discussed further here.

  • Transactional replication with updating subscribers: Users can change data at the local location, and those changes are applied to the source database at the same time. The changes are then eventually replicated to the other sites. This type of data distribution combines replication and distributed transactions, because data is changed at both the local site and the source database.

  • Transactional replication: With transactional replication, data is changed only at the source location and is sent out to the subscribers. Because data is changed at only a single location, conflicts cannot occur.

  • Snapshot replication with updating subscribers: This method is much like transactional replication with updating subscribers; users can change data at the local location, and those changes are applied to the source database at the same time. The entire changed publication is then replicated to all subscribers. This type of replication provides higher autonomy than transactional replication.

  • Snapshot replication: A complete copy of the publication is sent out to all subscribers. This includes both changed and unchanged data.

  • Merge replication: All sites make changes to local data independently and then update the publisher. Conflicts can occur, but they are resolved according to rules you define.
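For comparison with the merge model, plain transactional replication can be outlined with the standard replication stored procedures. Again, this is a sketch under assumed names: the publication `SalesPub`, database `SalesDB`, table `Orders`, and subscriber `ReportServer` are all hypothetical.

```sql
-- Hypothetical sketch: transactional replication pushes committed
-- changes from the single source to read-mostly subscribers.
USE SalesDB;

EXEC sp_addpublication
    @publication = 'SalesPub',
    @repl_freq   = 'continuous',   -- forward transactions as they occur
    @status      = 'active';

EXEC sp_addarticle
    @publication   = 'SalesPub',
    @article       = 'Orders',
    @source_object = 'Orders';     -- table to replicate

EXEC sp_addsubscription
    @publication    = 'SalesPub',
    @subscriber     = 'ReportServer',
    @destination_db = 'SalesDB';
```

Because all changes originate at the publisher, no conflict detection is needed, which is why this method suits decision support and reporting subscribers.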
