10.2 Why (Not) Use a Hierarchy?

A neighbor cache improves performance by providing some extra fraction of requests as cache hits. In other words, some of the requests that are misses in your cache may be hits in the neighbor cache. If your cache can download these neighbor hits faster than from the origin server, the hierarchy should improve performance overall. The downside is that neighbor caches usually provide only a small percentage of requests as hits. About 5%, or maybe 10% if you're lucky, of your requests that are cache misses will be hits in a neighbor. In some cases, this small benefit doesn't justify the hassle of joining a hierarchy. In other cases, such as networks with poor or overutilized connectivity, hierarchies definitely improve performance for end users.

If you use Squid inside a firewalled network, you may need to configure the firewall proxy as a parent. In this case, Squid forwards every request to the firewall because it can't connect directly to outside origin servers. If you have some origin servers inside the firewall, you can instruct Squid to connect to them directly.
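
A minimal sketch of such a setup might look like the following. The peer name fw-proxy.example.com, port 3128, and the domain .internal.example.com are placeholders invented for illustration, and the sketch assumes the all ACL is defined as in the default configuration file:

# Hypothetical firewall proxy; substitute your own hostname and port.
cache_peer fw-proxy.example.com parent 3128 0 no-query default
# Hypothetical internal domain whose origin servers Squid can reach directly.
acl InternalServers dstdomain .internal.example.com
always_direct allow InternalServers
# Everything else must go through the parent.
never_direct allow all

Here, always_direct covers the servers inside the firewall, and never_direct keeps Squid from attempting direct connections for anything else.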

You can also use a hierarchy to send web traffic in different directions. This is sometimes called application-layer routing, or more recently, content routing. Consider, for example, a large organization with two Internet connections. Perhaps the second connection costs less, or has higher latency, than the other. This organization may want to use the second connection for low-priority traffic, such as downloading binaries, audio and video files, or other kinds of large transfers. Or, perhaps they want to send all HTTP traffic over one link, and non-HTTP traffic over the other. Or, perhaps certain users' traffic should go through the low-priority connection, while premium customers get to use the more expensive link. You can accomplish any of these scenarios with a hierarchy of caching proxies.
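
As a rough illustration of the first scenario, the sketch below sends requests for large files to a proxy attached to the cheaper link. The peer name, port, and list of filename extensions are assumptions made up for this example:

# Hypothetical proxy on the low-cost connection.
cache_peer cheap-link.example.com parent 3128 0 no-query
# Illustrative pattern for large downloads; adjust the extensions as needed.
acl LargeFiles urlpath_regex -i \.(iso|zip|mp3|avi|mpg)$
cache_peer_access cheap-link.example.com allow LargeFiles
cache_peer_access cheap-link.example.com deny all
# Keep Squid from fetching these requests directly over the other link.
never_direct allow LargeFiles

Requests that don't match LargeFiles are never sent to this peer, so Squid handles them as it otherwise would.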

Trust is one of the most important issues for the members of a cache hierarchy. You must trust your neighbors to serve correct, unmodified responses. You must trust them with sensitive information, such as the URIs requested by your users. You must trust that they maintain secure and up-to-date systems to minimize the chances of unauthorized access and denials of service.

Another problem with hierarchies is the way that they normally propagate errors. When a neighbor cache experiences an error, such as an unreachable server, it generates an HTML page that explains the error and its origin. Your users may become confused if they get errors from neighbor caches outside the immediate organization. If the problem persists, they'll have a hard time finding an administrator who can help them.

Sibling relationships are subject to a special problem, known as false hits. A false hit occurs when Squid sends a request to a sibling, believing it will be a cache hit, but the sibling is unable to satisfy the request without contacting the origin server. False hits happen in a number of circumstances, but usually with a low probability. Furthermore, Squid and other HTTP proxies have features for automatically retrying such requests so that the user isn't even aware of the problem.

A forwarding loop is another problem sometimes seen in cache hierarchies. It occurs when Squid forwards a request somewhere, but that request comes back to Squid again, as shown in Figure 10-1.

Figure 10-1. A forwarding loop

Forwarding loops typically happen when two caches consider each other parents. If you have such an arrangement, make sure that you use the cache_peer_access directive to prevent loops. For example, if the neighbor's IP address is 192.168.1.1, the following lines ensure Squid won't cause a forwarding loop:

acl FromNeighbor src 192.168.1.1
cache_peer_access the.neighbor.name deny FromNeighbor

Forwarding loops can also occur with HTTP interception, especially if the interception device is on the path between Squid and an origin server.

Squid detects forwarding loops by looking for its own hostname in the Via header. You may actually get false forwarding loops if two cooperating caches have the same hostname. The unique_hostname directive is useful in this situation. Note that if the Via header is filtered out (e.g., with header_access), Squid can't detect forwarding loops.
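
For instance, two caches that share a public name might be configured as sketched below. All of the hostnames are hypothetical:

# On the first cache (hypothetical names):
visible_hostname cache.example.com
unique_hostname cache1.example.com

# On the second cache:
visible_hostname cache.example.com
unique_hostname cache2.example.com

The intent is that each cache has a distinct name for loop detection, while both present the same visible name to users.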


