10.4 Restricting Requests to Neighbors

Many people who use hierarchical caching need to control or limit requests that Squid sends to its neighbors. Squid has seven different directives that affect request routing: cache_peer_access, cache_peer_domain, never_direct, always_direct, hierarchy_stoplist, nonhierarchical_direct, and prefer_direct.

10.4.1 cache_peer_access

The cache_peer_access directive defines an access list for a neighbor cache. That is, it determines which requests may, or may not, be sent to the neighbor.

You can use this, for example, to split the flow of FTP and HTTP requests. You can send all FTP URIs to one parent and all HTTP URIs to another:

cache_peer A-parent.my.org parent 3128 3130
cache_peer B-parent.my.org parent 3128 3130
acl FTP proto FTP
acl HTTP proto HTTP
cache_peer_access A-parent.my.org allow FTP
cache_peer_access B-parent.my.org allow HTTP

This configuration ensures that A-parent receives only requests for FTP URIs, while B-parent receives only requests for HTTP URIs. The restriction covers ICP/HTCP queries as well: Squid queries each parent only about requests it would be allowed to forward there.
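When none of the rules in a cache_peer_access list matches a request, Squid applies the opposite of the last rule, so each single allow line above implicitly denies everything else. The following is a sketch that spells that default out, to be added to the configuration above. It assumes an ACL named all that matches every client, defined the way the default Squid 2.x squid.conf defines it:

acl all src 0.0.0.0/0.0.0.0
# Explicit form of the implicit default: A-parent gets FTP and nothing else
cache_peer_access A-parent.my.org allow FTP
cache_peer_access A-parent.my.org deny all
# B-parent gets HTTP and nothing else
cache_peer_access B-parent.my.org allow HTTP
cache_peer_access B-parent.my.org deny all

The behavior is the same as before. The explicit final deny all can simply make the list easier to read when you add rules later.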

You might also use cache_peer_access to enable or disable a neighbor cache during certain times of the day:

cache_peer A-parent.my.org parent 3128 3130
acl DayTime time 07:00-18:00
cache_peer_access A-parent.my.org deny DayTime
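Because the only rule is a deny, requests that arrive outside the 07:00 to 18:00 window fall through to the opposite default and are allowed. A-parent is therefore used only at night. The time ACL also accepts day-of-week letters (S, M, T, W, H, F, A for Sunday through Saturday). The following sketch assumes you want A-parent out of service only during weekday business hours. The ACL name WeekdayHours is hypothetical:

cache_peer A-parent.my.org parent 3128 3130
# Monday through Friday, 07:00 to 18:00
acl WeekdayHours time MTWHF 07:00-18:00
cache_peer_access A-parent.my.org deny WeekdayHours

On weekends, the deny rule never matches, so A-parent stays available all day.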

10.4.2 cache_peer_domain

The cache_peer_domain directive is an earlier form of cache_peer_access. Rather than using the full access control feature set, it only uses domain names in URIs. It is often used to partition a group of parent caches by domain name. For example, if you have a global intranet, you may want to send requests to caches located on each continent:

cache_peer europe-cache.my.org parent 3128 3130
cache_peer asia-cache.my.org   parent 3128 3130
cache_peer aust-cache.my.org   parent 3128 3130
cache_peer africa-cache.my.org parent 3128 3130
cache_peer na-cache.my.org     parent 3128 3130
cache_peer sa-cache.my.org     parent 3128 3130
cache_peer_domain europe-cache.my.org .ch .dk .fr .uk .nl .de .fi ...
cache_peer_domain asia-cache.my.org   .jp .kr .cn .sg .tw .vn .hk ...
cache_peer_domain aust-cache.my.org   .nz .au .aq ...
cache_peer_domain africa-cache.my.org .dz .ly .ke .mz .ma .mg ...
cache_peer_domain na-cache.my.org     .mx .ca .us ...
cache_peer_domain sa-cache.my.org     .br .cl .ar .co .ve ...

Of course, this scheme doesn't address the popular global top-level domains, such as .com.
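Since cache_peer_domain is a restricted form of cache_peer_access, you can express the same partitioning with dstdomain ACLs. Written that way, the domain lists can later be combined with other ACL types. The following is a sketch for two of the six parents, using hypothetical ACL names and only the domains spelled out above:

acl EuropeDomains dstdomain .ch .dk .fr .uk .nl .de .fi
acl AsiaDomains dstdomain .jp .kr .cn .sg .tw .vn .hk
cache_peer_access europe-cache.my.org allow EuropeDomains
cache_peer_access asia-cache.my.org allow AsiaDomains

Each peer has only an allow rule, so anything outside its list is implicitly denied. As with cache_peer_domain, a .com request matches none of these parents.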

10.4.3 never_direct

The never_direct directive is an access list for requests that must never be sent directly to an origin server. When a request matches this access list, it must be sent to a neighbor (usually parent) cache.

For example, if Squid is behind a firewall, it may be able to talk to your "internal" servers directly but must send all requests for external servers via the firewall proxy (a parent). You can tell Squid "never connect directly to sites outside the firewall." To do so, tell Squid what is inside the firewall:

acl InternalSites dstdomain .my.org
never_direct allow !InternalSites

The syntax is a little strange: never_direct allow foo means that Squid will not go directly to the origin server for requests that match foo. Since the set of internal sites is easy to specify, I used the negation operator (!) to match external sites, which Squid must never contact directly.

Note that this example doesn't force Squid to connect directly to sites that match the InternalSites ACL. The never_direct access rule can only force Squid not to contact certain origin servers. You must use the always_direct rule to force direct connections to origin servers.
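The following sketch combines both rules for the firewall scenario. It assumes a hypothetical firewall proxy named fw-proxy.my.org that listens on port 3128 and does not answer ICP, which is why the ICP port is 0 and the no-query option is set:

cache_peer fw-proxy.my.org parent 3128 0 no-query
acl InternalSites dstdomain .my.org
# Internal sites: always connect directly
always_direct allow InternalSites
# Everything else: always go through the firewall proxy
never_direct allow !InternalSites

The two rules test complementary conditions, so every request is either forced direct or forced through the parent, and the two rules never conflict.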

You must take care when using never_direct in combination with the other directives that control request routing. You can easily create an impossible situation. Here's an example:

cache_peer A-parent.my.org parent 3128 3130
acl COM dstdomain .com
cache_peer_access A-parent.my.org deny COM
never_direct allow COM

This configuration is contradictory. Any request whose domain name ends with .com must go through a neighbor cache, but I defined only one neighbor cache and don't allow .com requests to go there. When this happens, Squid emits the "cannot forward" error message mentioned earlier in this chapter.
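One way out of the contradiction is to give .com requests somewhere legal to go. The following sketch assumes that a second parent, here B-parent.my.org, is willing to handle them:

cache_peer A-parent.my.org parent 3128 3130
cache_peer B-parent.my.org parent 3128 3130
acl COM dstdomain .com
cache_peer_access A-parent.my.org deny COM
# B-parent takes the .com traffic that A-parent refuses
cache_peer_access B-parent.my.org allow COM
never_direct allow COM

Alternatively, drop the never_direct line so that .com requests may go directly to origin servers.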

10.4.4 always_direct

As you can probably guess, the always_direct rules tell Squid that some requests must be forwarded directly to the origin server. For example, many organizations want to keep their local traffic local. An easy way to do this is to define an IP address-based ACL and put it in the always_direct rule list:

acl OurNetwork dst 172.16.3.0/24
always_direct allow OurNetwork
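Your local servers might span more than one network, or they might be easier to describe by name. In either case you can write several always_direct rules. Squid uses the first rule that matches, so separate allow lines behave like a logical OR. The following sketch uses hypothetical address ranges and ACL names. A dst ACL matches the origin server's IP address, so Squid must resolve the hostname before it can check the rule. A dstdomain ACL simply compares the hostname in the URI:

acl LocalNets dst 172.16.3.0/24 192.168.10.0/24
acl LocalNames dstdomain .my.org
always_direct allow LocalNets
always_direct allow LocalNames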

10.4.5 hierarchy_stoplist

Internally, Squid flags each client request as either hierarchical or nonhierarchical. A nonhierarchical request is one that is unlikely to result in a cache hit. For example, responses to POST requests are almost never cachable. Forwarding requests for uncachable objects to neighbors is a waste of resources when Squid can simply connect to the origin server.

Some of the rules for differentiating hierarchical and nonhierarchical requests are hardcoded in Squid. For example, the POST and PUT methods are always nonhierarchical. However, the hierarchy_stoplist directive allows you to customize the algorithm. It contains a list of strings that, when found in a URI, make the request nonhierarchical. The default list is:

hierarchy_stoplist ? cgi-bin

Thus, any request whose URI contains a question mark or the string cgi-bin matches the stoplist and becomes nonhierarchical.
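The match is a plain substring test anywhere in the URI, so a short string can catch more than you intend. The following sketch keeps the two default entries and adds a hypothetical third one for a local script directory. Note that it would make any URI containing /scripts/ nonhierarchical, on any server:

# Default strings plus one hypothetical addition
hierarchy_stoplist ? cgi-bin /scripts/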

By default, Squid prefers to send nonhierarchical requests directly to origin servers. Because they are unlikely to result in cache hits, they are generally an extra burden on neighbor caches. However, the never_direct access control rules override hierarchy_stoplist. In particular, Squid:

Never sends ICP/HTCP queries for nonhierarchical requests unless the request matches a never_direct rule
Never sends ICP/HTCP queries to sibling caches for nonhierarchical requests
Never looks in neighbor cache digests for nonhierarchical requests
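To see how these rules interact, consider a sketch with one parent, one sibling, and a never_direct rule that matches everything. The hostnames are hypothetical, and the example assumes the all ACL defined in the default Squid 2.x squid.conf. A request whose URI contains a question mark is nonhierarchical. Because it also matches never_direct, Squid may still send an ICP/HTCP query about it to the parent, but it never queries the sibling about it:

cache_peer parent.my.org  parent  3128 3130
cache_peer sibling.my.org sibling 3128 3130
# Every request must go through a neighbor, including nonhierarchical ones
never_direct allow all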

10.4.6 nonhierarchical_direct

This directive controls the way that Squid forwards nonhierarchical (i.e., probably uncachable) requests. By default, Squid prefers to send nonhierarchical requests directly to origin servers. This is because such requests are unlikely to result in cache hits. I feel it is always better to get them directly from the origin server, rather than waste time looking for them in neighbor caches. If, for some reason, you want to route such requests through the hierarchy, disable this directive:

nonhierarchical_direct off

10.4.7 prefer_direct

This directive controls the way that Squid forwards hierarchical (i.e., probably cachable) requests. By default, Squid prefers to send such requests to a neighbor cache first and then directly to the origin server. You can reverse this behavior by enabling the directive: