7.1 The cache_dir Directive

The cache_dir directive is one of the most important directives in squid.conf. It tells Squid where and how to store cache files on disk. The cache_dir directive takes the following arguments:

cache_dir scheme directory size L1 L2 [options]

7.1.1 Scheme

Squid supports a number of different storage schemes. The default (and original) is ufs. Depending on your operating system, you may be able to select other schemes. To compile the optional code for other storage schemes, you must use the --enable-storeio=LIST option with ./configure. I'll discuss aufs, diskd, coss, and null in Section 8.7. For now, I'll talk only about the ufs scheme, which uses the same on-disk format as aufs and diskd.

7.1.2 Directory

The directory argument is a filesystem directory, under which Squid stores cached objects. Normally, a cache_dir corresponds to a whole filesystem or disk partition. It usually doesn't make sense to put more than one cache directory on a single filesystem partition. I also recommend putting only one cache directory on each physical disk drive. For example, if you have two unused hard drives, you might do something like this:

# newfs /dev/da1d
# newfs /dev/da2d
# mount /dev/da1d /cache0
# mount /dev/da2d /cache1

And then add these lines to squid.conf:

cache_dir ufs /cache0 7000 16 256
cache_dir ufs /cache1 7000 16 256

If you don't have any spare hard drives, you can, of course, use an existing filesystem partition. Select one with plenty of free space, perhaps /usr or /var, and create a new directory there. For example:

# mkdir /var/squidcache

Then add a line like this to squid.conf:

cache_dir ufs /var/squidcache 7000 16 256

7.1.3 Size

The third cache_dir argument specifies the size of the cache directory. This is an upper limit on the amount of disk space that Squid can use for the cache_dir. Calculating an appropriate value can be tricky. You lose some space to filesystem overheads, and you must leave enough free space for temporary files and swap.state logs (see Section 13.6). I recommend mounting the empty filesystem and running df:

% df -k
Filesystem  1K-blocks     Used    Avail Capacity  Mounted on
/dev/da1d     3037766        8  2794737     0%    /cache0
/dev/da2d     3037766        8  2794737     0%    /cache1

Here you can see that each filesystem has about 2790 MB of available space. Remember that UFS reserves some "minfree" space, 8% in this case, which is why Squid can't use the full 3040 MB in the filesystem.

You might be tempted just to put 2790 on the cache_dir line. You might even get away with it if your cache isn't very busy and if you rotate the log files often. To be safe, however, I recommend taking off another 10% or so. Squid uses this extra space for its swap.state file and temporary files.
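To make the arithmetic concrete, here is a minimal Python sketch using the numbers from the df output above. The function names are hypothetical, and the sketch assumes that Squid interprets the cache_dir size in units of 1024 KB; dividing by 1024 rather than 1000 gives a slightly smaller, and therefore safer, figure than the round 2790 MB quoted above.

```python
def expected_avail_kb(total_kb: int, minfree: float) -> int:
    """Approximate space df reports as available on an empty UFS filesystem."""
    return int(total_kb * (1.0 - minfree))


def suggested_cache_dir_mb(avail_kb: int, headroom: float = 0.10) -> int:
    """Convert df's available KB to MB (assumed to be 1024 KB each), then
    hold back `headroom` for swap.state and temporary files."""
    avail_mb = avail_kb / 1024
    return int(avail_mb * (1.0 - headroom))


if __name__ == "__main__":
    # Figures taken from the df -k output above.
    print(expected_avail_kb(3037766, 0.08))  # close to df's Avail column
    print(suggested_cache_dir_mb(2794737))   # → 2456
```

With the 8% minfree shown above, the first function lands within about 10 KB of df's Avail column, and the second suggests a size of about 2456 for the cache_dir line.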

Note that the cache_swap_low directive also affects how much space Squid uses. I'll talk about the low and high watermarks in Section 7.2.

The bottom line is that you should initially be conservative about the size of your cache_dir. Start off with a low estimate and allow the cache to fill up. After Squid runs for a week or so with full cache directories, you'll be in a good position to re-evaluate the size settings. If you have plenty of free space, feel free to increase the cache directory size in increments of a few percent.

7.1.3.1 Inodes

Inodes are fundamental building blocks of Unix filesystems. They contain information about disk files, such as permissions, ownership, size, and timestamps. If your filesystem runs out of inodes, you can't create new files, even if it has space available. Because Squid stores each cached object in a separate file, running out of inodes means Squid can no longer cache new objects, so make sure you have enough before running Squid.

The programs that build new filesystems (e.g., newfs or mkfs) allocate a fixed number of inodes based on the filesystem's total size. These programs usually allow you to set the ratio of disk space to inodes. For example, see the -i option in the newfs and mkfs manpages. The ratio of disk space to inodes determines the mean file size the filesystem can support. Most Unix systems create one inode for each 4 KB, which is usually sufficient for Squid. Research shows that, for most caching proxies, the mean file size is about 10 KB. You may be able to get away with 8 KB per inode, but it is risky.
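The following Python sketch compares the inode count for a given density with the number of files you might expect in a 7000 MB cache_dir. It is only an approximation under stated assumptions: the helper names are hypothetical, the inode count ignores the space newfs sets aside for metadata, and the 10 KB mean file size comes from the research mentioned above.

```python
KB_PER_MB = 1024  # assumption: the MB figures here mean 1024 KB


def inode_count(fs_mb: int, bytes_per_inode: int) -> int:
    """Approximate inodes created for a density such as -i 4096."""
    return fs_mb * KB_PER_MB * 1024 // bytes_per_inode


def expected_files(cache_mb: int, mean_file_kb: int = 10) -> int:
    """Rough number of cached files for a given mean file size."""
    return cache_mb * KB_PER_MB // mean_file_kb


if __name__ == "__main__":
    need = expected_files(7000)  # assumes the ~10 KB mean file size
    for density in (4096, 8192):
        have = inode_count(7000, density)
        print(f"{density // 1024} KB/inode: {have} inodes, "
              f"{have / need:.2f}x the expected {need} files")
```

At 4 KB per inode, there are 2.5 times as many inodes as expected files; at 8 KB, the margin shrinks to 1.25 times. If the mean file size dropped to 6 KB, expected_files(7000, 6) would exceed the 896,000 inodes available at 8 KB per inode, which is why that setting is risky.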

You can monitor your system's inode usage with df -i. For example:

% df -ik
Filesystem  1K-blocks     Used    Avail Capacity iused   ifree  %iused  Mounted on
/dev/ad0s1a    197951    57114   125001    31%    1413   52345     3%   /
/dev/ad0s1f   5004533  2352120  2252051    51%  129175 1084263    11%   /usr
/dev/ad0s1e    396895     6786   358358     2%     205   99633     0%   /var
/dev/da0d     8533292  7222148   628481    92%  430894  539184    44%   /cache1
/dev/da1d     8533292  7181645   668984    91%  430272  539806    44%   /cache2
/dev/da2d     8533292  7198600   652029    92%  434726  535352    45%   /cache3
/dev/da3d     8533292  7208948   641681    92%  427866  542212    44%   /cache4

As long as the inode usage (%iused) is less than the space usage (Capacity), you're in good shape. Unfortunately, you can't add more inodes to an existing filesystem. If you find that you are running out of inodes, you need to stop Squid and recreate your filesystems. If you're not willing to do that, decrease the cache_dir size instead.
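If you want to automate this check, a small Python sketch like the following can scan df -ik output and flag any filesystem whose %iused has caught up with its Capacity. It assumes the nine-column BSD layout shown above, with mount points that contain no spaces; the function name is hypothetical, and other systems format df -i output differently.

```python
SAMPLE = """\
Filesystem  1K-blocks     Used    Avail Capacity iused   ifree  %iused  Mounted on
/dev/ad0s1a    197951    57114   125001    31%    1413   52345     3%   /
/dev/ad0s1f   5004533  2352120  2252051    51%  129175 1084263    11%   /usr
/dev/ad0s1e    396895     6786   358358     2%     205   99633     0%   /var
/dev/da0d     8533292  7222148   628481    92%  430894  539184    44%   /cache1
/dev/da1d     8533292  7181645   668984    91%  430272  539806    44%   /cache2
/dev/da2d     8533292  7198600   652029    92%  434726  535352    45%   /cache3
/dev/da3d     8533292  7208948   641681    92%  427866  542212    44%   /cache4
"""


def inode_pressure(df_output: str) -> list[tuple[str, int, int]]:
    """Return (mount point, Capacity %, %iused) for each filesystem whose
    inode usage is not below its space usage."""
    flagged = []
    for line in df_output.splitlines():
        fields = line.split()
        # Skip blank lines, the header, and rows not in the 9-column layout.
        if len(fields) != 9 or fields[0] == "Filesystem":
            continue
        capacity = int(fields[4].rstrip("%"))
        iused_pct = int(fields[7].rstrip("%"))
        if iused_pct >= capacity:
            flagged.append((fields[8], capacity, iused_pct))
    return flagged


if __name__ == "__main__":
    print(inode_pressure(SAMPLE) or "inode usage is below space usage everywhere")
```

For the sample output above, nothing is flagged, which matches the conclusion that these filesystems are in good shape.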

7.1.3.2 The relationship between disk space and process size

Squid's disk space usage directly affects its memory usage as well. Every object that exists on disk requires a small amount of memory. Squid uses the memory as an index to the on-disk data. If you add a new cache directory or otherwise increase the disk cache size, make sure that you also have enough free memory. Squid's performance degrades very quickly if its process size reaches or exceeds your system's physical memory capacity.

Every object in Squid's cache directories takes either 76 or 112 bytes of memory, depending on your system. The memory is allocated as StoreEntry, MD5 Digest, and LRU policy node structures. Small-pointer (i.e., 32-bit) systems, like those based on the Intel Pentium, take 76 bytes. On systems with CPUs that support 64-bit pointers, each object takes 112 bytes. You can find out how much memory these structures use on your system by viewing the Memory Utilization page of the cache manager (see Section 14.2.1.2).
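For a rough estimate of the index memory alone, multiply the expected number of objects by the per-object cost. This Python sketch uses the 76- and 112-byte figures given above, an assumed 10 KB mean object size, and a hypothetical helper name; it deliberately ignores Squid's other data structures.

```python
def index_memory_bytes(cache_mb: int, mean_object_kb: float,
                       bytes_per_object: int) -> int:
    """Memory for the per-object index only (StoreEntry, MD5, LRU node)."""
    objects = int(cache_mb * 1024 / mean_object_kb)  # assumes 1 MB = 1024 KB
    return objects * bytes_per_object


if __name__ == "__main__":
    for label, per_object in (("32-bit", 76), ("64-bit", 112)):
        total = index_memory_bytes(7000, 10, per_object)
        print(f"{label}: {total} bytes (~{total / 2**20:.0f} MB)")
```

For a 7000 MB cache_dir, this prints about 52 MB on a 32-bit system and about 77 MB on a 64-bit system. Treat these as lower bounds on the extra memory, not as a prediction of Squid's total process size.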

Unfortunately, it is difficult to predict precisely how much additional memory is required for a given amount of disk space. It depends on the mean reply size, which typically fluctuates over time. Additionally, Squid uses memory for many other data structures and purposes. Don't assume that your estimates are, or will remain, correct. You should constantly monitor Squid's process size and consider shrinking the cache size if necessary.

7.1.4 L1 and L2

For the ufs, aufs, and diskd schemes, Squid creates a two-level directory tree underneath the cache directory. The L1 and L2 arguments specify the number of first- and second-level directories. The defaults are 16 and 256, respectively. Figure 7-1 shows the filesystem structure.

Figure 7-1. The cache directory structure for ufs-based storage schemes

Some people think that Squid performs better, or worse, depending on the particular values for L1 and L2. It seems to make sense, intuitively, that small directories can be searched faster than large ones. Thus, L1 and L2 should probably be large enough so that each L2 directory has no more than a few hundred files.

For example, let's say you have a cache directory that stores about 7000 MB. Given a mean file size of 10 KB, you can store about 700,000 files in this cache_dir. With 16 L1 and 256 L2 directories, there are 4096 second-level directories in total. Dividing 700,000 by 4096 gives about 170 files in each second-level directory.
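The same calculation in Python, following the text's decimal arithmetic (1 MB = 1000 KB). The helper name is hypothetical.

```python
def files_per_l2_dir(cache_mb: float, mean_file_kb: float,
                     l1: int = 16, l2: int = 256) -> float:
    """Average number of files in each second-level directory."""
    files = cache_mb * 1000 / mean_file_kb  # decimal units, as in the text
    return files / (l1 * l2)                # l1 * l2 second-level directories


if __name__ == "__main__":
    print(f"{files_per_l2_dir(7000, 10):.1f}")  # about 170 files per directory
```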

Creating swap directories with squid -z goes faster for smaller values of L1 and L2. Thus, if your cache is really small, you may want to reduce the number of L1 and L2 directories.

Squid assigns each cached object a file number, a 32-bit integer that uniquely identifies its file on disk. Squid uses a relatively simple algorithm, with L1 and L2 as parameters, for turning file numbers into pathnames. Thus, if you change L1 and L2, you change the mapping from file number to pathname. Changing these parameters for a nonempty cache_dir makes the existing files inaccessible, so you should never change L1 and L2 after the cache directory has become active.
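The following Python sketch shows why changing L1 or L2 strands existing files. The formula is modeled on my reading of storeUfsDirFullPath() in the Squid 2.x sources (first-level index ((filn / L2) / L2) % L1, second-level index (filn / L2) % L2, and the file name as eight hexadecimal digits); treat it as an assumption and check it against the source of your own Squid version. ufs_full_path is a hypothetical name.

```python
def ufs_full_path(cache_dir: str, filn: int, l1: int = 16, l2: int = 256) -> str:
    """Map a file number to a pathname (assumed storeUfsDirFullPath() logic)."""
    first = ((filn // l2) // l2) % l1  # first-level directory index
    second = (filn // l2) % l2         # second-level directory index
    return f"{cache_dir}/{first:02X}/{second:02X}/{filn:08X}"


if __name__ == "__main__":
    print(ufs_full_path("/cache0", 1000))          # with the default L2 = 256
    print(ufs_full_path("/cache0", 1000, l2=128))  # same file number, L2 = 128
```

File number 1000 maps to /cache0/00/03/000003E8 with the default L2 of 256, but to /cache0/00/07/000003E8 if L2 is changed to 128, so Squid would look for the existing file in the wrong directory.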

Squid allocates file numbers within a cache directory sequentially. The file number-to-pathname algorithm (e.g., storeUfsDirFullPath()) is written so that each group of L2 consecutive file numbers goes into the same second-level directory. Squid does this to take advantage of locality of reference: it increases the probability that an HTML file and its embedded images are stored in the same second-level directory. Some people expect Squid to spread cache files evenly among the second-level directories. However, when the cache is initially filling, you'll find that only the first few directories contain any files. For example:

% cd /cache0; du -k
2164    ./00/00
2146    ./00/01
2689    ./00/02
1974    ./00/03
2201    ./00/04
2463    ./00/05
2724    ./00/06
3174    ./00/07
1144    ./00/08
1       ./00/09
1       ./00/0A
1       ./00/0B
...

This is perfectly normal and nothing to worry about.
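You can reproduce this fill pattern with a short Python simulation. It assumes that file numbers are handed out sequentially from zero and grouped L2 at a time, as described above; fill_pattern is a hypothetical name, and 2300 is an arbitrary file count chosen for illustration.

```python
from collections import Counter


def fill_pattern(num_files: int, l1: int = 16, l2: int = 256) -> Counter:
    """Count files per second-level directory for file numbers 0..num_files-1."""
    counts = Counter()
    for filn in range(num_files):
        first = ((filn // l2) // l2) % l1  # first-level directory
        second = (filn // l2) % l2         # second-level directory
        counts[f"{first:02X}/{second:02X}"] += 1
    return counts


if __name__ == "__main__":
    for directory, n in sorted(fill_pattern(2300).items()):
        print(directory, n)
```

With 2300 files, directories 00/00 through 00/07 hold 256 files each, 00/08 holds the remaining 252, and every later directory is empty, much like the du listing above.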

7.1.5 Options

Squid has two scheme-independent cache_dir options: a read-only flag and a max-size value.

7.1.5.1 read-only

The read-only option instructs Squid to continue reading from the cache_dir, but to stop storing new objects there. It looks like this in squid.conf:

cache_dir ufs /cache0 7000 16 256 read-only

You might use this option if you want to migrate your cache storage from one disk to another. If you simply add one cache_dir and remove another, Squid's hit ratio decreases sharply. If you mark the old cache_dir read-only instead, you can still get cache hits from it while the new one fills. After some time, you can remove the read-only cache directory from the configuration.

7.1.5.2 max-size

With this option, you can specify the maximum object size to be stored in the cache directory. For example:

cache_dir ufs /cache0 7000 16 256 max-size=1048576

Note that the value is in bytes. In most situations, you shouldn't need to add this option. If you do, try to put the cache_dir lines in order of increasing max-size.
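If you do use max-size, a quick script can confirm the ordering. This Python sketch is hypothetical: it looks only at lines that begin with cache_dir, reads the max-size=BYTES option with a regular expression, and treats a cache_dir without max-size as unlimited, which is an assumption about where you would want such a line to go. The max-size values in the example are illustrative.

```python
import math
import re

MAX_SIZE_RE = re.compile(r"\bmax-size=(\d+)\b")


def max_sizes(conf_text: str) -> list[float]:
    """max-size (bytes) of each cache_dir line, in file order; a line
    without the option counts as unlimited (math.inf)."""
    sizes = []
    for line in conf_text.splitlines():
        fields = line.split()
        if not fields or fields[0] != "cache_dir":
            continue  # skips comments, blank lines, and other directives
        match = MAX_SIZE_RE.search(line)
        sizes.append(int(match.group(1)) if match else math.inf)
    return sizes


def in_increasing_max_size_order(conf_text: str) -> bool:
    sizes = max_sizes(conf_text)
    return all(a <= b for a, b in zip(sizes, sizes[1:]))


if __name__ == "__main__":
    conf = """\
cache_dir ufs /cache0 7000 16 256 max-size=16384
cache_dir ufs /cache1 7000 16 256 max-size=1048576
cache_dir ufs /cache2 7000 16 256
"""
    print(in_increasing_max_size_order(conf))
```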