Squid's delay pools are often useful, but not perfect. You need to be aware of a few drawbacks and limitations before you use them.
One of the most important things to realize about the current delay pools implementation is that it does nothing to guarantee fairness among all users of a single bucket. This is especially important for aggregate buckets (where sharing is high), but less so for individual buckets (where sharing is low).
Squid generally services requests in order of increasing file descriptors. Thus, a request whose server-side TCP connection has a lower file descriptor may receive more bandwidth from a shared bucket than it should.
Bandwidth shaping and rate limiting usually operate at the network transport layer. There, the flow of packets can be controlled very precisely. Delay pools, however, are implemented in the application layer. Because Squid doesn't actually send and receive TCP packets (the kernel does), it has less control over the flow of individual packets. Rather than controlling the transmission and receipt of packets on the wire, Squid controls only how many bytes to read from the kernel.
This means, for example, that incoming response data is queued up in the kernel. The TCP/IP stack can buffer some number of bytes that haven't yet been read by Squid. On most systems, the default TCP receive buffer size is usually between 32 KB and 64 KB. In other words, this much data can arrive over the network very quickly, regardless of anything Squid can do. On the one hand, it seems silly to read this data slowly even though it is already on your system. On the other hand, because the client doesn't receive the whole response right away, it is likely to postpone any future requests until the delayed responses are complete.
If you are concerned that the kernel buffers too much server-side data, you can decrease the TCP receive buffer size with the tcp_recv_bufsize directive. Even better, your operating system probably has a way to set this parameter for the whole system. On NetBSD/FreeBSD/OpenBSD, you can use the sysctl variable named net.inet.tcp.recvspace. For Linux, read about /proc/sys/net/ipv4/tcp_rmem in Documentation/networking/ip-sysctl.txt.
The current delay pools implementation assumes that your LAN uses /24 (class C) subnets, and that all users are in the same /16 (class B) subnet. This might not be so bad, depending on how your network is configured. However, it would be nice if the delay pools subnetting scheme were fully customizable.
If your address space is larger than a /24 and smaller than a 16/, you can always create a class 3 pool and treat it as a class 2 pool (that is one of the examples given earlier).
If you use just one class 2 pool with more than 256 users, some users will share the individual buckets. That might not be so bad, unless you happen to have a bunch of heavy users fighting over one measly bucket.
You might also create multiple class 2 pools and use delay_access rules to divide them up among all users. The problem with this approach is that you can't have all users share a single aggregate bucket. Instead, each subgroup has their own aggregate bucket. You can't make a single client go through more than one delay pool.