Error Messages That Occur with ClusterXL or State Synchronization

A number of error messages show up when using ClusterXL or State Synchronization. The following subsections document some of the more common ones and what, if anything, you need to do about them.

13.5 Various Error Messages Occur during a Full Sync

During a full sync between firewalls running State Synchronization, a number of error messages might show up, particularly if the gateways are under load. One or more of the following messages might appear.

"h_rename: entry not found"
"h_slink: an attempt to link to a link"
"kbuf id not found"
"fw_conn_post_inspect: fwconn_init_links failed"

During full sync, an existing member sends its firewall tables to a new member while at the same time updating the tables with information from incoming packets. Updates are necessary when receiving packets because the firewall should not stop handling packets while a full sync is under way. Unfortunately, a full sync can be somewhat lengthy, and these errors will show up during that time. You can safely ignore these errors.

13.6 Error Changing Local Mode from <mode1> to <mode2> because of ID <machine_id>

You may see this error message when using ClusterXL if the working mode of the cluster members isn't in sync. For example, one member might be in HA mode, and another might be in Load Sharing mode. ClusterXL resolves this by reducing the working mode to the lowest common mode. This error is safe to ignore.

13.7 Inconsistencies Exist between Policies Installed on Cluster Members on My Console

This error may appear if the State Synchronization mechanism detects that cluster members have different policies. Such a condition may result from a fw fetch command executed on a cluster member after a policy on the management module was pushed but not successfully installed on any cluster member. This results in the two cluster members actually enforcing different policies, which is an unstable situation. To resolve it, reinstall the policy to the entire cluster.

13.8 CPHA: Received Confirmations from More Machines Than the Cluster Size

This message occurs during a policy installation on the cluster. It means either that your cluster configuration is inconsistent or that another cluster is using the same network for State Synchronization and you haven't configured the other cluster to use different MAC addresses, as documented in FAQ 13.2.

13.9 FwHaTimeWorker: Wait Failed (Status N)

This error message occurs on a multiprocessor Windows platform. When this occurs, ClusterXL will not function at all. Check Point reports this is a failure within the operating system. Check Point does not provide a resolution to this problem.

13.10 fwha_reset_timer: Failed to Allocate Timer DPC or Timer Object

This error message, which occurs only on Windows, indicates FireWall-1 was unable to allocate either a timer DPC or a timer object. Either way, ClusterXL will not be able to function due to a failure in the underlying operating system. Check Point does not provide a resolution to this problem.

13.11 There Are More Than 4 IPs on Interface <interface name> Notifying Only the First Ones

This message means that some cluster members have more than three virtual IP addresses (in addition to the one real one) defined on the same interface. This is unsupported and will break ClusterXL functionality. However, this error message also shows up on IPSO when using State Synchronization with VRRP and not ClusterXL. In this instance, the error message is addressed in NG AI as well as in HFA-310 and above for NG FP3. Contact Check Point support to obtain HFA-310.

13.12 fwha_create_icmp_echo_request: Failed to Create Packet

This error occurs when system resources are very low, which means that ClusterXL is probably failing. Troubleshoot your system to ensure it has adequate resources.

13.13 fwha_receive_fwhap_msg: Received Incomplete HAP Packet (Read <number> Bytes)

Check Point claims this is a rare log message. You should contact Check Point Support if this error occurs because it likely points to ClusterXL not functioning correctly.

13.14 Inconsistencies Exist between Policies Installed on the Cluster Members

If the policy is different between two cluster members, you will see this error message. It will also occur if you install two cluster members on different operating systems (e.g., IPSO and Windows). This message does not appear in NG AI. In FP3 you can ignore it if policy installation succeeded on all members.

13.15 Sync Could Not Start Because There Is No Sync License

If you have a basic firewall license, even a node-limited license, you should be able to use sync. Notable exceptions are small-office licenses and single-gateway licenses (e.g., management and firewall on the same platform). Ensure your firewalls have the correct license.

13.16 fwldbcast_timer: Peer X Probably Stopped

This error shows up when the member that prints this message stops hearing particular messages from member X. Use the command cphaprob state to validate the state of all members. The command fw ctl pstat should also report that sync is configured correctly and working correctly on all members. Perhaps there was a temporary connectivity problem that was resolved (meanwhile, a few connections through the cluster might experience problems), or perhaps peer X is really down.

13.17 fwlddist_adjust_buf: Record Too Big for Sync

This error message indicates that the amount of information that needs to be synced between cluster members is larger than the buffers designed to hold that information. Things that can affect this buffer include the following.

Active connections view in SmartView Tracker/Log Viewer: This uses the same infrastructure as the State Synchronization mechanism.
High rate of change of connections: If a number of connections are being created, torn down, and then created again, likely due to heavy load, a lot of activity is going on in the state tables. This causes State Synchronization to work extra hard to keep the state tables in sync.

You can attempt to reduce the number of services being synchronized (see FAQ 13.4), or you can increase the size of the sync buffer, which can be anywhere from 8K (0x2000) to 64K (0x10000). You can do this in FireWall-1 NG FP3 HFA-308 or later. (You can obtain HFA-308 from Check Point Support.) The following examples set the sync buffer size to 64K.

On Solaris machines, add the following line to the bottom of the /etc/system file, and then reboot:

set fw:fwlddist_buf_size = 0x10000

On Linux machines, edit $FWDIR/boot/modules/fwkern.conf and add the following line, rebooting afterward:

fwlddist_buf_size=0x10000

Check Point states that this variable cannot be changed on Nokia or Windows. However, the IPSO version of the FireWall-1 loadable kernel module does contain the appropriate values, and using the modzap utility from Nokia Resolution 1261 on these values appears to work. Here is the command you must enter before rebooting:

nokia# modzap $FWDIR/boot/modules/fwmod.o _fwlddist_buf_size 0x10000
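Before editing any of these files, it can help to confirm that the value you intend to use falls within the supported range. The following is a minimal Python sketch, not a Check Point tool: the helper name fwkern_line is hypothetical, and it does nothing more than check the 8K to 64K bounds given above and format the Linux fwkern.conf entry.

```python
# Hypothetical helper, not part of FireWall-1: validates a sync buffer
# size and formats the fwkern.conf entry described in the text.
MIN_BUF = 0x2000   # 8K, lower bound given in the text
MAX_BUF = 0x10000  # 64K, upper bound given in the text


def fwkern_line(size):
    """Return the fwkern.conf line for a sync buffer of `size` bytes."""
    if not MIN_BUF <= size <= MAX_BUF:
        raise ValueError(
            f"size {size:#x} is outside {MIN_BUF:#x}..{MAX_BUF:#x}"
        )
    return f"fwlddist_buf_size={size:#x}"


if __name__ == "__main__":
    print(fwkern_line(64 * 1024))
```

Passing a value outside the range raises ValueError instead of producing a line that the kernel module might reject or ignore.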

13.18 fwha_pnote_register: Too Many Registering Members, Cannot Register

The Pnote mechanism can store only up to 16 different devices. Attempts to configure a seventeenth device, either by editing $FWDIR/conf/cphaprob.conf or by using the cphaprob -d ... register command, fail and generate this error message.

13.19 fwha_pnote_register: foo Already Registered (#5)

This message may occur when registering a new Pnote device. The example message used for this subheading means that the device foo is already registered as Pnote number 5. Each Pnote device must have a unique name.
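The two Pnote registration rules (at most 16 devices, each with a unique name) can be modeled in a few lines of Python. This is an illustrative sketch only: the real table lives inside the FireWall-1 kernel module, the class name PnoteTable is hypothetical, and both the 0-based numbering and the order in which the two checks run are assumptions.

```python
# Illustrative model only; not Check Point's implementation.
MAX_PNOTES = 16  # limit stated in the text


class PnoteTable:
    def __init__(self):
        self._devices = []  # device names, indexed by Pnote number

    def register(self, name):
        """Register a device and return its Pnote number (assumed 0-based)."""
        if name in self._devices:
            number = self._devices.index(name)
            raise ValueError(f"{name} already registered (#{number})")
        if len(self._devices) >= MAX_PNOTES:
            raise OverflowError("too many registering members, cannot register")
        self._devices.append(name)
        return len(self._devices) - 1
```

In this model a duplicate name fails with ValueError and a seventeenth device fails with OverflowError, mirroring the two error messages described in FAQs 13.18 and 13.19.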

13.20 fwha_pnote_reg_query: Pnotes Not Relevant in Service Mode

This error shows up when third-party HA/Load Sharing solutions attempt to use a Pnote. These are supported only in ClusterXL.

13.21 fwldbcast_update_block_new_conns: Sync in Risk: Did Not Receive ack for the Last 410 Packets

Synchronization load is considered heavy when the synchronization transmit queue of a firewall starts to fill beyond the fw_sync_buffer_threshold, which is set by default to 80%. Increasing the sync buffer size as described in FAQ 13.17 will improve synchronization performance substantially and allow a larger fw_sync_buffer_threshold value.
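To see why a larger buffer leaves more headroom, consider the following Python sketch of the threshold test. It is a simplified model, not FireWall-1 code: the function name is hypothetical, and measuring the queue in bytes is an assumption made for illustration.

```python
def sync_load_is_heavy(queued, buffer_size, threshold_pct=80):
    """Return True when the transmit queue has filled beyond the threshold.

    `queued` and `buffer_size` use the same unit (assumed to be bytes);
    `threshold_pct` mirrors the default fw_sync_buffer_threshold of 80%.
    """
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return queued * 100 > buffer_size * threshold_pct


if __name__ == "__main__":
    for size in (0x2000, 0x10000):
        print(hex(size), sync_load_is_heavy(7000, size))
```

With an 8K buffer the 80% mark sits at roughly 6.5K, so a queue of 7,000 counts as heavy load; with a 64K buffer the mark is roughly 52K and the same queue does not.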

When the synchronization transmit queue fills beyond the fw_sync_buffer_threshold (by default, 80% of the sync buffer size), you might also see the following error messages on your console:

Jun 5 10:21:25 nokiafw2 [LOG_CRIT] kernel: FW-1: It is recommended to set the global parameter fw_sync_block_new_conns to 0.
FW-1: State synchronization is in risk. Please examine your synchronization network to avoid further problems!

The kernel variable fw_sync_block_new_conns can be modified to allow FireWall-1 to detect heavy loads and start blocking new connections. The default behavior in NG AI, which is also how previous releases worked, is to allow new connections into the connections table even when they are not fully synchronized. This keeps new connections flowing at the expense of possibly being unable to recover them if a failover occurs. By default fw_sync_block_new_conns is set to -1 (load detection is disabled).

To enable load detection on sync and block new connections when the sync transmit queue exceeds the fw_sync_buffer_threshold (80% by default), you must first update your firewall cluster members to NG FP3 HFA-315 or higher, which includes NG AI. Contact Check Point or your support provider for the latest HFA for NG FP3.

On Solaris machines, add the following line to the bottom of the /etc/system file:

set fw:fw_sync_block_new_conns = 0

On Linux machines, edit $FWDIR/boot/modules/fwkern.conf and add the following line, rebooting afterward:

fw_sync_block_new_conns=0

On Nokia platforms, use the modzap utility:

nokia# modzap $FWDIR/boot/modules/fwmod.o _fw_sync_block_new_conns 0

By default, if more than 410 consecutive packets are sent without getting an ACK on any one of them, new connections are dropped. When blocking starts, fw_sync_block_new_conns is automatically set to 1. When the situation stabilizes, the variable is set back to 0.
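The blocking behavior just described can be pictured as a small state machine. The Python sketch below is a toy model under stated assumptions, not the kernel's actual logic: the class and method names are hypothetical, and treating a single ACK as the point where "the situation stabilizes" is a simplification.

```python
ACK_LIMIT = 410  # consecutive unacknowledged sync packets, per the text


class SyncBlocker:
    """Toy model of fw_sync_block_new_conns: 0 = detecting, 1 = blocking."""

    def __init__(self):
        self.unacked = 0
        self.block_new_conns = 0  # load detection enabled, not yet blocking

    def packet_sent(self):
        self.unacked += 1
        if self.unacked > ACK_LIMIT:
            self.block_new_conns = 1  # set automatically, never by hand

    def ack_received(self):
        # Assumption: one ACK is enough to count as "stabilized".
        self.unacked = 0
        self.block_new_conns = 0

    def accepts_new_connection(self):
        return self.block_new_conns == 0
```

In this model the 410th unacknowledged packet still leaves new connections allowed; blocking begins with the 411th and ends as soon as an ACK arrives.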

WARNING!

Under no circumstances should you manually set fw_sync_block_new_conns to 1. FireWall-1 sets that value itself while it is blocking new connections.

Apply the above changes to all members of the cluster. On Nokia platforms, you need to make some additional changes to your IPSO configuration. For VRRP configurations, you need to set the Coldstart Delay to an appropriate value to give the firewalls an adequate amount of time to synchronize prior to accepting any traffic. Here are some guidelines based on platform type.

IP650: 180 seconds
IP440: 240 seconds
IP120/330: 300 seconds
Other platforms: 120 seconds

Note that the most up-to-date recommendations regarding the settings for this situation are in Resolution 17111 in Nokia's Knowledge Base.
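If you manage several Nokia platforms, the guideline values above are easy to keep in a small lookup table. The Python sketch below simply encodes the list from this section; the function name is hypothetical, and the values should be checked against the current Nokia resolution before use.

```python
# Coldstart Delay guidelines from this section, in seconds.
COLDSTART_DELAY = {
    "IP650": 180,
    "IP440": 240,
    "IP120": 300,
    "IP330": 300,
}
DEFAULT_COLDSTART_DELAY = 120  # "Other platforms"


def coldstart_delay(platform):
    """Return the suggested VRRP Coldstart Delay for a platform name."""
    return COLDSTART_DELAY.get(platform.strip().upper(), DEFAULT_COLDSTART_DELAY)


if __name__ == "__main__":
    for name in ("IP650", "ip330", "IP530"):
        print(name, coldstart_delay(name))
```

Any platform not named in the list falls through to the 120-second default.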

Once the modzap and VRRP Coldstart Delay changes are made, reboot each of the cluster members.

WARNING!

Do not perform a reboot on any of the cluster members until they all have their fw_sync_block_new_conns values changed. Failure to do so will result in lost synchronization upon a reboot of any cluster member!

13.22 fwhandle_get: Table kbufs - Invalid Handle - Bad Entry in Pool 0

This error condition appears when the kbuf pointer stored in a connections table entry is incorrect. This may indicate that the kbuf was already freed or that the pointer was overwritten. Some issues in the base NG FP3 release caused this problem; they can be fixed by a hotfix obtainable from Check Point Support. However, the most likely cause is a software mismatch between the members of the gateway cluster.

All members of the same cluster need to have the same Check Point packages loaded on them, even if those packages are deactivated. For example, if the FloodGate package is installed on only one member of the cluster, State Synchronization can become corrupted even if FloodGate was never started. The reason for this is that every Check Point package installation notifies the FireWall-1 product of its existence through the Check Point registry so that it can integrate with FireWall-1 when enabled. This registry is read at boot time.

Mismatched packages on cluster members can lead to several problems with firewall functionality, including inaccurate sync of address translation information. This later leads to the inability to correctly free the NAT information, which is partially stored in the fwx_alloc table. If you look at the fwx_alloc table using the command fw tab -t fwx_alloc, and either the #VALS or #PEAK is close to 25,000 entries (the default limit for this table), it is likely the result of mismatched package installation in your cluster.
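Once you have read the #VALS and #PEAK values for the table, the comparison against the limit can be expressed as a one-line check. The Python sketch below is only an illustration: the function name is hypothetical, and the 90% margin used to define "close to" is an assumption, since the text does not give an exact figure.

```python
FWX_ALLOC_LIMIT = 25_000  # default limit for fwx_alloc, per the text


def fwx_alloc_near_limit(vals, peak, margin=0.9):
    """Return True if #VALS or #PEAK is within `margin` of the table limit.

    `margin` is an assumed cutoff (90% of the limit), not a Check Point value.
    """
    return max(vals, peak) >= FWX_ALLOC_LIMIT * margin


if __name__ == "__main__":
    print(fwx_alloc_near_limit(vals=24_000, peak=24_900))
    print(fwx_alloc_near_limit(vals=1_000, peak=5_000))
```

A True result is a hint to compare the installed Check Point packages on each cluster member, not proof of a mismatch.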

To resolve this issue, uninstall the Check Point packages not present on all cluster members, and reboot. Once the machine is rebooted, it will contain the same configuration as its cluster peer, the Check Point registry will be reread, and the firewall module will be updated. A full sync will be performed, synchronization should work correctly with the other cluster members, and the cluster should be intact after that.