I started the Squid project eight years ago while working at the National Laboratory for Applied Network Research and the University of California. Back then I certainly enjoyed writing code and fixing bugs but always felt bad about the lack of decent documentation. This book is my attempt to rectify that situation. It's been a long time coming and almost didn't happen. Like they say, "better late than never!"
This book is written for those who are tasked with setting up and maintaining one or more Squid caches. If you're new to Squid, I'll show you how to download, compile, and install the code. Those of you who have been using Squid for a while will be more interested in the later chapters, where I talk about disk cache performance, modifying requests, surrogate mode, caching hierarchies, monitoring Squid, and more.
In order to use this book, you should have a basic knowledge of Unix systems. Many of the book's examples are based on free operating systems, such as Linux, FreeBSD, NetBSD, and OpenBSD. I also have some tips for Solaris users. If you're more comfortable with Windows systems, you can use Squid under a Unix emulator or give the native NT port a try.
Here's an overview of the book's contents:
This chapter introduces you to Squid and web caching. I give a brief history of the project, and a few notes on our future work. I explain how you can find additional support and information, including a FAQ, on the Squid web site.
In this chapter, I explain how and why you should download Squid's source code. You may prefer to install a precompiled binary or use a preconfigured package. I also talk about staying up to date with Squid using the anonymous CVS server.
Assuming you've downloaded the source code, this chapter explains how to configure and compile Squid. In some cases you may need to tune your system before compiling Squid. For example, your kernel may have relatively low file-descriptor limits that affect Squid's performance.
Here, I give a brief introduction to Squid's configuration file. If you are the impatient type and can't wait to start using Squid, this chapter will leave you with a minimal configuration file you can start playing with.
In this chapter, I explain how to run Squid for the first time and how to test Squid in a terminal window. Following that, I suggest a number of ways to configure your system so that Squid starts each time it boots. I also explain how to reconfigure Squid while it is running and how to safely shut it down.
I talk extensively about access controls in this chapter. Squid has a powerful collection of access control features and a number of different rule sets that determine how requests and responses are treated. This is an important chapter because a mistake in your access controls may leave your cache, or even internal systems, vulnerable to abuse from outsiders.
This chapter is about Squid's primary function: storing cached responses on disk. I explain how to configure the disk cache, including replacement policies and freshness controls. I also show you how to manually remove unwanted objects from the cache.
In this chapter, I explain how to improve the performance of Squid's disk cache. I'll talk about Squid's different storage schemes and a number of filesystem tuning options that may help. If your Squid cache handles a relatively light load, you probably don't need to worry about disk performance.
Here, I explain how to configure Squid for HTTP interception, sometimes also called transparent caching. Actually, configuring Squid is the easy part. The difficulty comes from setting up a router or switch on your network and the host on which Squid is running. I explain how to configure networking equipment from Cisco, Alteon, Foundry, and Extreme. I'll also show you how to configure your operating system (Linux, FreeBSD, NetBSD, OpenBSD, and Solaris) for HTTP interception. Finally, I talk about WCCP.
In this chapter, I cover the ins and outs of cache cooperation, including meshes, arrays, and hierarchies. You may also find it useful if you simply need to forward requests from Squid to another proxy or intermediary. I'll talk about the various intercache protocols supported by Squid (ICP, HTCP, Cache Digests, and CARP) and how Squid chooses the next-hop location for a given cache miss.
Redirectors are the best way to make Squid rewrite HTTP requests before forwarding them. I describe the interface between Squid and a redirector program so that you can write your own. I also present a few of the more popular third-party redirectors available.
In this chapter, I explain how Squid interfaces with external authentication databases such as LDAP, NT domain controllers, and password files. Squid comes with a number of authentication helpers and understands Basic, Digest, and NTLM authentication credentials. I also document the API for each, in case you want to develop your own helper.
I cover Squid's various log files in this chapter, including access.log, store.log, cache.log, and others. I explain what each log file contains and how you should periodically maintain it.
This chapter provides a lot of information on monitoring Squid's operation. I cover both SNMP and Squid's own cache manager interface. You'll find it useful for both long-term monitoring and short-term problem diagnosis.
Squid's server accelerator mode is useful in a number of situations. You can use it to improve your origin server's poor performance, as a firewall to protect the server, or even to build your own content delivery network. I show how to set up Squid and make sure that outsiders can't abuse your service.
The book's final chapter explains how to debug and troubleshoot problems with Squid. You may find that some sites, or some user agents, don't work properly with Squid. I show how to isolate and reproduce the problem and how to present the information to Squid developers for assistance.
This appendix is a reference guide for each of Squid's 200 configuration file directives. Each has a description, syntax, defaults, and examples.
This brief appendix explains a little about Squid's memory cache.
You can use Squid's delay pools feature to limit bandwidth consumed by web surfers. I explain how the delay pools work and provide a number of example configurations.
In this appendix, I present the results of numerous filesystem benchmarks. These may help you make informed decisions regarding particular operating systems, filesystem features, and Squid's storage techniques.
Have a look at this appendix if you'd like to run Squid on your Windows box. I talk about using Cygwin and about a native port of Squid, called SquidNT.
This appendix contains information on how to configure various user agents to use Squid. I talk about manual configuration, environment variables, Proxy Auto-Configuration functions, and the Web Proxy Auto Discovery protocol.
As I'm finishing up this book, the latest stable version is Squid-2.5.STABLE4, and the development version is Squid-3.0. Perhaps the most important difference between the two is that Squid-3 is being rewritten in C++. You should find that most things are backward-compatible, although a few new configuration directives have been created. Please read the release notes carefully if you use Squid-3.0 or later.
I have created a web site for the book, located at http://squidbook.org/. There, you will find errata, supplemental information, and links to online resources.
Due to a lack of time and space, there are some topics I was unable to cover in this book; they include:
You'll find that I mostly talk about HTTP, even though Squid also supports FTP, Gopher, and some other relatively obscure protocols.
Squid's error messages can be customized, and the source distribution includes versions of the error messages in a number of different languages. You can probably figure out how to customize the error messages by modifying the default pages or by reading Squid's source code.
Load balancing is a popular way to increase the capacity of a caching service. Refer to one of the load balancing books mentioned in the following section if necessary.
HTTP has a number of somewhat complicated rules for determining what may, or may not, be cached, and for how long. Refer to Web Caching, or HTTP: The Definitive Guide (for more information, see the next section).
A number of nontechnical issues surround web caching. These include copyrights and privacy.
I don't go into detail about Squid's source code in this book. The Squid project hosts a programmers' guide, which is generally incomplete and out of date. If you have questions about the source code, please join the squid-dev mailing list.
Squid doesn't support the SOCKS protocol at this time.