State Synchronization's Role in High Availability

An application is said to be highly available when you have taken steps to minimize single points of failure. In the mid-1990s, several companies introduced High-Availability (HA) software that provided an infrastructure allowing applications to be monitored on the primary system and "failed over" to a secondary system when a failure was detected. The applications were stored on a shared medium (usually a mirrored disk connected to both systems), and state for the applications had to be stored on the shared disk drive. This allowed the secondary system to pick up where the primary system left off. This setup is historically referred to as a warm standby.

Although this was a boon for many companies, it had some drawbacks.

Only one system at a time was active. The secondary system was usually idle.
Some state was always lost. Placing state on disk had its limits and was not possible with many applications, including FireWall-1.

A better way to handle High Availability is to have both systems active at the same time, actively sharing information. In FireWall-1, this can be done with a feature called State Synchronization.

State Synchronization is a mechanism in FireWall-1 that allows two firewalls to share information contained within their respective state tables. This allows a firewall to more readily take over in the event of a failure. This condition is referred to as a hot standby.

Although State Synchronization preserves most connections, any connections involving the Security Servers will not fail over properly. This is because all Security Server-related connections actually terminate at a single firewall. It is difficult to fail over connections that terminate at a specific firewall.

A Word about Licensing

There is no special license for High Availability, though each gateway must have its own unique firewall license, even in a hot-standby configuration. Check Point typically offers discounts to customers who purchase gateways that will be used in an HA configuration. Node-limited gateway licenses can be used in an HA configuration, though single-gateway products (i.e., firewall plus local management module licenses) cannot.

The notable exception to the "no extra licenses" rule is ClusterXL, which requires a separate license. This feature is discussed later in this chapter.

The State Synchronization Protocol

State Synchronization occurs via UDP port 8116, or at least that's the way sync appears to work. In reality, State Synchronization is a layer-2 multicast-type protocol that, in many protocol analyzers, looks an awful lot like UDP port 8116 traffic. Packets from the master host use one MAC address; packets from a slave host use a different MAC.

Within State Synchronization, there are two modes of operation: full sync and incremental sync. When firewalls begin synchronizing with each other for the first time (e.g., after a reboot or restart of FireWall-1) or after certain events (e.g., a policy install), a full sync is done. This means all tables marked as "sync" on these platforms are synchronized. Depending on how loaded the firewalls are, this can take a lot of communication, and thus time, to accomplish.

Once a full sync has been performed, the firewalls simply exchange, every 100 milliseconds or so, the "changes" that have occurred since the last sync. This is an incremental sync. In most cases, FireWall-1 operates in this mode.
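The difference between the two modes can be illustrated with a toy model. The following is a minimal sketch only, not FireWall-1's actual implementation: the StateTable class, its method names, and the connection tuples are all hypothetical, chosen to show that a full sync copies every entry while an incremental sync sends only what changed since the previous sync.

```python
class StateTable:
    """Toy connection table that remembers changes not yet sent to a peer."""

    def __init__(self):
        self.entries = {}   # connection key -> state
        self.pending = {}   # unsent changes; a value of None marks a deletion

    def set(self, key, state):
        self.entries[key] = state
        self.pending[key] = state

    def delete(self, key):
        self.entries.pop(key, None)
        self.pending[key] = None

    def full_sync(self):
        """Return a snapshot of every entry and forget pending changes."""
        self.pending.clear()
        return dict(self.entries)

    def incremental_sync(self):
        """Return only the changes since the last sync, then reset them."""
        delta, self.pending = self.pending, {}
        return delta


def apply_full(peer, snapshot):
    """Replace the peer's table with the snapshot."""
    peer.entries = dict(snapshot)


def apply_delta(peer, delta):
    """Apply additions, updates, and deletions to the peer's table."""
    for key, state in delta.items():
        if state is None:
            peer.entries.pop(key, None)
        else:
            peer.entries[key] = state


# Hypothetical connections: (source IP, source port, destination IP, destination port)
conn1 = ("10.0.0.5", 1025, "192.0.2.10", 80)
conn2 = ("10.0.0.6", 1026, "192.0.2.10", 443)

primary, standby = StateTable(), StateTable()
primary.set(conn1, "ESTABLISHED")
apply_full(standby, primary.full_sync())      # first contact: full sync

primary.set(conn2, "SYN_SENT")
primary.delete(conn1)
delta = primary.incremental_sync()            # later: only the two changes
apply_delta(standby, delta)
```

In this model the cost of a full sync grows with the size of the table, while the cost of an incremental sync grows with the rate of change, which is why a loaded firewall takes longer to complete a full sync.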

Configuring State Synchronization

In FireWall-1 4.1 and earlier releases, synchronization was configured in $FWDIR/conf/sync.conf. This meant you could tie any two firewalls together, whether or not they ran the same security policy. In FireWall-1 NG, synchronization is now configured in a gateway cluster object. This makes sense because a gateway cluster is designed to treat two or more firewalls as if they were one. Ensuring all the cluster members have the same state information allows the cluster members to act as a single gateway.

You can create a gateway cluster object from the objects tree by clicking on the Network Objects icon, right-clicking New Check Point, and selecting Gateway Cluster. Alternatively, you can select Network Objects from the Manage menu, click on New, select Check Point, and then select Gateway Cluster. You will see a screen similar to Figure 13.1.

Figure 13.1. Gateway Cluster Properties, General Properties frame

Configuring a gateway cluster object is similar to configuring a standard gateway object, so I cover only the differences in the procedures here. Options you might configure in this object apply to all members of the cluster.

For your gateway cluster object, you should use an IP address that addresses the "virtual" firewall. All of the supported HA schemes, including Check Point HA, Virtual Router Redundancy Protocol (VRRP), Nokia IP Clustering, Rainfinity, and Stonebeat, have some concept of a virtual IP address. The active firewall will respond to packets sent to this virtual IP address. In a load-balanced configuration, multiple firewalls may be active at the same time, but only the appropriate firewall will respond. When a failover occurs, a different firewall will respond.
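The virtual IP idea for an active-standby cluster can be sketched as follows. This is an illustration under stated assumptions, not how any particular HA product elects its active member: the Cluster class, the priority-ordered member list, and the addresses are all hypothetical.

```python
class Cluster:
    """Toy active-standby cluster that answers for one virtual IP address."""

    def __init__(self, virtual_ip, members):
        self.virtual_ip = virtual_ip
        self.members = list(members)      # assumed to be ordered by priority
        self.healthy = set(self.members)

    def fail(self, member):
        self.healthy.discard(member)

    def recover(self, member):
        if member in self.members:
            self.healthy.add(member)

    def active_member(self):
        """Highest-priority healthy member, or None if all have failed."""
        for member in self.members:
            if member in self.healthy:
                return member
        return None

    def responder(self, dst_ip):
        """Which member answers a packet sent to dst_ip (None if not ours)."""
        if dst_ip != self.virtual_ip:
            return None
        return self.active_member()


cluster = Cluster("192.0.2.1", ["fw-a", "fw-b"])
before = cluster.responder("192.0.2.1")   # fw-a is active
cluster.fail("fw-a")
after = cluster.responder("192.0.2.1")    # fw-b takes over the same address
```

Clients keep sending to the same address before and after the failover; only the member that answers changes.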

Figure 13.2 shows the Cluster Members frame, where you configure which gateways are part of your gateway cluster.

Figure 13.2. Gateway Cluster Properties, Cluster Members frame

WARNING!

In FireWall-1 NG FP3, once you add an individual gateway to a gateway cluster, you will not be able to see it in SmartDashboard/Policy Editor. The only way to edit the properties of an individual gateway is to edit them from within the gateway cluster object.

The 3rd Party Configuration frame, shown in Figure 13.3, allows you to configure whether this cluster is an HA cluster (i.e., active-standby) or a Load Sharing (active-active) cluster. On a Nokia platform, regardless of whether you use VRRP or IP Clustering, the cluster type is High Availability. If you use ClusterXL, it is a Load Sharing configuration.

Figure 13.3. Gateway Cluster Properties, 3rd Party Configuration frame

Other options on this screen include those listed below.

3rd Party Solution: Choose which "solution" will be providing High Availability. Unless High Availability is being provided by VRRP, you should choose OPSEC.
Support non-sticky connections: This might be better described as "Does your HA mechanism support asymmetric connections?" The State Synchronization mechanism can account for connections that leave one firewall and return through another. The checkbox controls whether or not FireWall-1 will do this. For Nokia IP Clustering, this option should be checked. For Nokia VRRP, uncheck this box. For other OPSEC-certified products, consult the appropriate documentation.
Hide Cluster Members' outgoing traffic behind the Cluster's IP Address: With this checkbox enabled, any traffic originating from any member of the cluster will be automatically translated to appear to come from the cluster IP address. Any NAT rules on the cluster or cluster members override this setting. I generally do not recommend enabling this option because it's quite likely to make troubleshooting packets originating from a specific gateway more difficult.
Forward Cluster's incoming traffic to Cluster Members' IP addresses: With this checkbox enabled, when a client establishes an incoming connection to the cluster IP address, it will be automatically translated to the physical IP address of one of the cluster members.
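The effect of the last two checkboxes on a packet's addresses can be sketched roughly as shown below. This is only a model of the behavior described above: the Packet type, the function names, and the addresses are hypothetical, and how FireWall-1 picks the cluster member for an incoming connection is not covered here, so the member address is simply passed in.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str


def hide_outgoing(pkt, cluster_ip, member_ips, enabled=True, explicit_nat=False):
    """Rewrite a member's source address to the cluster IP.

    explicit_nat models the case where a NAT rule already matches the packet;
    such rules override the checkbox, so the packet is left alone.
    """
    if enabled and not explicit_nat and pkt.src in member_ips:
        return replace(pkt, src=cluster_ip)
    return pkt


def forward_incoming(pkt, cluster_ip, member_ip, enabled=True):
    """Rewrite a destination of the cluster IP to one member's physical IP."""
    if enabled and pkt.dst == cluster_ip:
        return replace(pkt, dst=member_ip)
    return pkt


CLUSTER_IP = "192.0.2.1"                  # hypothetical addresses
MEMBERS = {"192.0.2.2", "192.0.2.3"}

outgoing = hide_outgoing(Packet("192.0.2.2", "198.51.100.7"), CLUSTER_IP, MEMBERS)
incoming = forward_incoming(Packet("198.51.100.7", CLUSTER_IP), CLUSTER_IP, "192.0.2.3")
```

The sketch also shows why hiding outgoing traffic complicates troubleshooting: once the source is rewritten, packets from either member look identical on the wire.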

Figure 13.4 shows the Synchronization frame, where you configure which network will be used for State Synchronization. Add a network name (choose a name that doesn't match a network object name), an IP address, and a netmask. The chosen network should be a dedicated network segment, that is, one not used for other kinds of traffic (including traffic used by Nokia IP Clustering, which should be on its own dedicated segment).
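Before entering the values, the two rules above (a unique name and a dedicated segment) can be sanity-checked. The helper below is a hypothetical sketch, not part of any Check Point tool: it treats "dedicated" as "does not overlap any other network you list," which is only an approximation, and the object names and networks are made up.

```python
import ipaddress


def check_sync_network(name, address, netmask, object_names, other_networks):
    """Return a list of problems found with a proposed sync network."""
    problems = []
    # strict=False accepts a host address and reduces it to its network
    sync_net = ipaddress.ip_network(f"{address}/{netmask}", strict=False)

    if name in object_names:
        problems.append(f"name '{name}' matches an existing network object")

    for other in other_networks:
        if sync_net.overlaps(ipaddress.ip_network(other)):
            problems.append(f"{sync_net} overlaps {other}")

    return problems


existing_names = {"internal-net", "dmz-net"}
existing_nets = ["10.1.0.0/16", "172.16.5.0/24"]

ok = check_sync_network("fw-sync", "192.168.254.0", "255.255.255.0",
                        existing_names, existing_nets)
bad = check_sync_network("dmz-net", "172.16.5.0", "255.255.255.0",
                         existing_names, existing_nets)
```

An empty result means neither rule was violated for the networks and names supplied; it says nothing about traffic that is not represented in those lists.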

Figure 13.4. Gateway Cluster Properties, Synchronization frame

WARNING!

Under no circumstances should you provide a virtual IP address with VRRP, IP Clustering, or other HA mechanisms on the synchronization interface. This configuration has been shown to cause problems with State Synchronization.

What Are the Limitations of State Synchronization?

You should be aware that State Synchronization happens approximately every 100 milliseconds. All changes in the state table since the last sync interval are sent to the peer firewalls. It also takes roughly 55 milliseconds for these changes to be incorporated into the state tables. This means it takes a minimum of 155 milliseconds for the peer firewall to be updated. The actual amount of time varies based on system load.

What does this delay in synchronization mean? The long and the short of it is this: If one firewall receives a TCP SYN packet and the other firewall receives a corresponding TCP SYN/ACK packet before the synchronization actually occurs, you will end up dropping or severely delaying the establishment of the TCP connection. This is a condition known as asymmetric routing, which is discussed in its own subsection below.
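The race can be made concrete with a little arithmetic. The sketch below takes the figures given above (a 100-millisecond sync interval plus roughly 55 milliseconds to incorporate changes) as a fixed delay and asks whether the peer already knows about a connection when the SYN/ACK reaches it. The function name and the extra_load_delay_ms parameter are hypothetical, and real delays vary with system load.

```python
SYNC_INTERVAL_MS = 100   # approximate interval between incremental syncs
APPLY_DELAY_MS = 55      # approximate time to incorporate received changes


def peer_knows_connection(syn_seen_ms, syn_ack_seen_ms, extra_load_delay_ms=0):
    """True if the peer has the connection's state when the SYN/ACK arrives.

    syn_seen_ms:     when the first firewall saw the SYN
    syn_ack_seen_ms: when the peer firewall saw the matching SYN/ACK
    """
    ready_ms = syn_seen_ms + SYNC_INTERVAL_MS + APPLY_DELAY_MS + extra_load_delay_ms
    return syn_ack_seen_ms >= ready_ms


fast_server = peer_knows_connection(0, 20)     # SYN/ACK returns after 20 ms
slow_server = peer_knows_connection(0, 200)    # SYN/ACK returns after 200 ms
```

Under this model, a server that answers within a few tens of milliseconds over an asymmetric path will always beat the sync, so the SYN/ACK reaches a firewall that has no record of the SYN.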

Some other important restrictions concerning State Synchronization include those listed below.

The firewalls must be running on the same type of platform. This means that two Nokia, two Windows, or two Solaris platforms may synchronize with one another, but a Nokia and a Windows platform cannot. This is due to differences in how each platform internally stores the tables.
The firewalls must be running the same version of the software at the same service pack level.
The firewalls must have the same security policy. By using a gateway cluster, this is enforced.
With respect to NAT, careful consideration must be given to routing. Where routing is symmetric, it is usually only necessary to make the needed ARP and routing changes on both firewalls. Where routing is asymmetric, additional configuration of routers on either side may need to be done.
No connections or information relating to Security Servers are synchronized because they rely on individual processes on the platforms in question and cannot be easily synchronized.
If accounting logging is used, neither firewall will be able to provide accurate data on how much traffic was transferred because accounting data is not synchronized.
State Synchronization starts becoming a huge burden when the rate of change in the connections table becomes too great. It is a difficult task to keep more than two firewalls synced, especially under these conditions. In some applications (particularly firewalls protecting HTTP farms), State Synchronization can be either eliminated or greatly reduced by disabling the Synchronize on Cluster option in some services, such as HTTP and DNS.
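The last point, reducing sync load by excluding certain services, can be sketched as a simple filter over the changes queued for the next incremental sync. This is an illustration only: the dictionary layout, the service names as strings, and the function name are hypothetical and do not reflect FireWall-1's internal data structures.

```python
# Services assumed to have the Synchronize on Cluster option disabled
NO_SYNC_SERVICES = frozenset({"http", "dns"})


def entries_to_sync(changes, no_sync=NO_SYNC_SERVICES):
    """Keep only the connection changes whose service is still synchronized."""
    return [change for change in changes if change["service"] not in no_sync]


# Hypothetical changes accumulated during one sync interval
changes = [
    {"service": "http", "src": "10.0.0.5", "dst": "192.0.2.10"},
    {"service": "http", "src": "10.0.0.6", "dst": "192.0.2.10"},
    {"service": "http", "src": "10.0.0.7", "dst": "192.0.2.11"},
    {"service": "dns",  "src": "10.0.0.5", "dst": "192.0.2.53"},
    {"service": "dns",  "src": "10.0.0.8", "dst": "192.0.2.53"},
    {"service": "ssh",  "src": "10.0.0.9", "dst": "192.0.2.20"},
]

to_send = entries_to_sync(changes)
```

In this example five of the six changes are short-lived HTTP and DNS connections, so only the SSH entry is sent to the peer. The trade-off is that the excluded connections will not survive a failover.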