8.5 The diskd Storage Scheme

diskd (short for disk daemons) is similar to aufs in that disk I/Os are executed by external processes. Unlike aufs, however, diskd doesn't use threads. Instead, inter-process communication occurs via message queues and shared memory.

Message queues are a standard feature of modern Unix operating systems. They were invented many years ago in AT&T's Unix System V, Release 1. The messages passed between processes on these queues are relatively small: 32-40 bytes. Each diskd process uses one queue for receiving requests from Squid and another queue for transmitting results back.

8.5.1 How diskd Works

Squid creates one diskd process for each cache_dir. This is different from aufs, which uses a large pool of threads for all cache_dirs. Squid sends a message to the corresponding diskd process for each I/O operation. When that operation is complete, the diskd process sends a status message back to Squid. Squid and the diskd processes preserve the order of messages in the queues. Thus, there is no concern that I/Os might be executed out of sequence.

For reads and writes, Squid and the diskd processes use a shared memory area. Both processes can read from, and write to, this area of memory. For example, when Squid issues a read request, it tells the diskd process where to place the data in memory. diskd passes this memory location to the read( ) system call and notifies Squid that the read is complete by sending a message on the return queue. Squid then accesses the recently read data from the shared memory area.

diskd (as with aufs) essentially gives Squid nonblocking disk I/Os. While the diskd processes are blocked on I/O operations, Squid is free to work on other tasks. This works really well as long as the diskd processes can keep up with the load. Because the main Squid process is now able to do more work, it's possible that it may overload the diskd helpers. The diskd implementation has two features to help out in this situation.

First, Squid waits for the diskd processes to catch up if one of the queues exceeds a certain threshold. The default value is 64 outstanding messages. If a diskd process gets this far behind, Squid "sleeps" a small amount of time and waits for it to complete some of the pending operations. This essentially puts Squid into a blocking I/O mode. It also makes more CPU time available to the diskd processes. You can configure this threshold by specifying a value for the Q2 parameter on a cache_dir line:

cache_dir diskd /cache0 7000 16 256 Q2=50

Second, Squid stops asking the diskd process to open files if the number of outstanding operations reaches another threshold. Here, the default value is 72 messages. If Squid would like to open a disk file for reading or writing, but the selected cache_dir has too many pending operations, the open request fails internally. When trying to open a file for reading, this causes a cache miss instead of a cache hit. When opening files for writing, it prevents Squid from storing a cachable response. In both cases the user still receives a valid response. The only real effect is that Squid's hit ratio decreases. This threshold is configurable with the Q1 parameter:

cache_dir diskd /cache0 7000 16 256 Q1=60 Q2=50

Note that in some versions of Squid, the Q1 and Q2 parameters are mixed-up in the default configuration file. For optimal performance, Q1 should be greater than Q2.

8.5.2 Compiling and Configuring diskd

To use diskd, you must add it to the enable-storeio list when running ./configure:

% ./configure --enable-storeio=ufs,diskd

diskd seems to be portable since shared memory and message queues are widely supported on modern Unix systems. However, you'll probably need to adjust a few kernel limits relating to both. Kernels typically have the following variables or parameters:

MSGMNB: This is the maximum characters (octets) per message queue. With diskd, the practical limit is about 100 outstanding messages per queue. The messages that Squid passes are 32-40 octets, depending on your CPU architecture. Thus, MSGMNB should be 4000 or more. To be safe, I recommend setting this to 8192.
MSGMNI: This is the maximum number of message queues for the whole system. Squid uses two queues for each diskd cache_dir. If you have 10 disks, that's 20 queues. You should probably add even more in case other applications also use message queues. I recommend a value of 40.
MSGSSZ: This is the size of a message segment, in octets. Messages larger than this size are split into multiple segments. I usually set this to 64 so that the diskd message isn't split into multiple segments.
MSGSEG: This is the maximum number of message segments that can exist in a single queue. Squid normally limits the queues to 100 outstanding messages. Remember that if you don't increase MSGSSZ to 64 on 64-bit architectures, each message requires more than one segment. To be safe, I recommend setting this to 512.
MSGTQL: This is the maximum number of messages that can exist in the whole system. It should be at least 100 multiplied by the number of cache_dirs. I recommend setting it to 2048, which should be more than enough for as many as 10 cache directories.
MSGMAX: This is the maximum size of a single message. For Squid, 64 bytes should be sufficient. However, your system may have other applications that use larger messages. On some operating systems such as BSD, you don't need to set this. BSD automatically sets it to MSGSSZ x MSGSEG. On other systems you may need to increase the value from its default. In this case, you can set it to the same as MSGMNB.
SHMSEG: This is the maximum number of shared memory segments allowed per process. Squid uses one shared memory identifier for each cache_dir. I recommend a setting of 16 or higher.
SHMMNI: This is the systemwide limit on the number of shared memory segments. A value of 40 is probably enough in most cases.
SHMMAX: This is the maximum size of a single shared memory segment. By default, Squid uses about 409,600 bytes for each segment. Just to be safe, I recommend setting this to 2 MB, or 2,097,152.
SHMALL: This is the systemwide limit on the amount of shared memory that can be allocated. On some systems, SHMALL may be expressed as a number of pages, rather than bytes. Setting this to 16 MB (4096 pages) is enough for 10 cache_dirs with plenty remaining for other applications.

To configure message queues on BSD, add these options to your kernel configuration file:^[2]

^[2] OpenBSD is a little different. Use option instead of options, and specify the SHMMAX value in pages, rather than bytes.

# System V message queues and tunable parameters

options         SYSVMSG         # include support for message queues

options         MSGMNB=8192     # max characters per message queue

options         MSGMNI=40       # max number of message queue identifiers

options         MSGSEG=512      # max number of message segments per queue

options         MSGSSZ=64       # size of a message segment MUST be power of 2

options         MSGTQL=2048     # max number of messages in the system

options         SYSVSHM

options         SHMSEG=16       # max shared mem segments per process

options         SHMMNI=32       # max shared mem segments in the system

options         SHMMAX=2097152  # max size of a shared mem segment

options         SHMALL=4096     # max size of all shared memory (pages)

To configure message queues on Linux, add these lines to /etc/sysctl.conf:

kernel.msgmnb=8192

kernel.msgmni=40

kernel.msgmax=8192

kernel.shmall=2097152

kernel.shmmni=32

kernel.shmmax=16777216

Alternatively, or if you find that you need more control, you can manually edit include/linux/msg.h and include/linux/shm.h in your kernel sources.

For Solaris, add these lines to /etc/system and then reboot:

set msgsys:msginfo_msgmax=8192

set msgsys:msginfo_msgmnb=8192

set msgsys:msginfo_msgmni=40

set msgsys:msginfo_msgssz=64

set msgsys:msginfo_msgtql=2048

set shmsys:shminfo_shmmax=2097152

set shmsys:shminfo_shmmni=32

set shmsys:shminfo_shmseg=16

For Digital Unix (TRU64), you can probably add lines to the kernel configuration in the style of BSD, seen previously. Alternatively, you can use the sysconfig command. First, create a file called ipc.stanza like this:

ipc:

          msg-max = 2048

          msg-mni = 40

          msg-tql = 2048

          msg-mnb = 8192

          shm-seg = 16

          shm-mni = 32

          shm-max = 2097152

          shm-max = 4096

Now, run this command and reboot:

# sysconfigdb -a -f ipc.stanza

After you have message queues and shared memory configured in your operating system, you can add the cache_dir lines to squid.conf:

cache_dir diskd /cache0 7000 16 256 Q1=72 Q2=64

cache_dir diskd /cache1 7000 16 256 Q1=72 Q2=64

...

If you forget to increase the message queue limits, or if you don't set them high enough, you'll see messages like this in cache.log:

2003/09/29 01:30:11| storeDiskdSend: msgsnd: (35) Resource temporarily unavailable

8.5.3 Monitoring diskd

The best way to monitor diskd performance is with the cache manager. Request the diskd page; for example:

% squidclient mgr:diskd

...

sent_count: 755627

recv_count: 755627

max_away: 14

max_shmuse: 14

open_fail_queue_len: 0

block_queue_len: 0



             OPS SUCCESS    FAIL

   open   51534   51530       4

 create   67232   67232       0

  close  118762  118762       0

 unlink   56527   56526       1

   read   98157   98153       0

  write  363415  363415       0

See Section 14.2.1.6 for a description of this output.