Squid saves key information about HTTP transactions in access.log. This file is line-based, such that each line corresponds to one client request. Squid records the client IP address (or hostname), requested URI, response size, and other information.
Squid records all HTTP accesses in access.log, except for those that disconnect before sending any data. Squid also records all ICP (but not HTCP) transactions unless you disable them with the log_icp_queries directive. Section 13.2.4 describes the other squid.conf directives that affect the access log.
The default access.log format contains 10 fields. Here are some examples, with long lines split and indented:
1066037222.011 126389 9.121.105.207 TCP_MISS/503 1055 GET http://home.gigigaga.com/n8342133/Miho.DAT.019 - DIRECT/203.187.1.180 - 1066037222.011 19120 12.83.179.11 TCP_MISS/200 359 GET http://ads.x10.com/720x300/Z2FtZ3JlZXRpbmcxLmRhd/7/AMG - DIRECT/63.211.210.20 text/html 1066037222.011 34173 166.181.33.71 TCP_MISS/200 559 GET http://coursesites.blackboard.com:8081/service/collab/../1010706448190/ - DIRECT/216.200.107.101 application/octet-stream 1066037222.011 19287 41.51.105.27 TCP_REFRESH_MISS/200 500 GET http://fn.yam.com/include/tsemark/show.js - DIRECT/210.59.224.59 application/x-javascript 1066037222.011 19395 41.51.105.27 TCP_MISS/304 274 GET http://fnasp.yam.com/image/coin3.gif - DIRECT/211.72.254.133 - 1066037222.011 19074 30.208.85.76 TCP_CLIENT_REFRESH_MISS/304 197 GET http://ads.icq.com/content/B0/0/..bC6GygEYNeHGjBUin5Azfe68m5hD1jLk$/aol - DIRECT/64.12.184.121 - 1066037222.011 19048 12.83.179.11 TCP_MISS/200 261 GET http://ads.adsag.com/js.ng/...ne&cat=friendship&subcat=girltalk - DIRECT/209.225.54.119 application/x-javascript 1066037222.118 106 41.51.105.27 TCP_HIT/200 536 GET http://rcm-images.amazon.com/images/G/01/rcm/privacy.gif - NONE/- image/gif 1066037222.352 19475 27.34.49.248 TCP_MISS/200 12387 GET http://espanol.geocities.com/lebastias/divulgacion/budismo-tarot.html - DIRECT/209.1.225.139 text/html 1066037222.352 132 144.157.100.17 TCP_MISS/504 1293 GET http://ar.atwola.com/image/93101912/aol - NONE/- -
Here are the definitions for all fields:
The completion time of the request, expressed as the number of seconds since the Unix epoch (Thu Jan 1 00:00:00 UTC 1970), with millisecond resolution. Squid uses this format, instead of something more human-friendly, to simplify the work of various log file processing programs.
You can use a simple Perl command to convert the Unix timestamps into local time. For example:
perl -pe 's/^\d+\.\d+/localtime($&)/e;' access.log
For HTTP transactions, this field indicates how much time it took to process the request. The timer starts when Squid receives the HTTP request and stops when the response has been fully delivered. The response time is given in milliseconds.
The response time is usually 0 for ICP queries. This is because Squid answers ICP queries very quickly. Furthermore, Squid doesn't update the process clock between receiving an ICP query and sending the reply.
While time values are reported with millisecond resolution, the precision of those entries is probably about 10 milliseconds. Timing becomes even less precise when Squid is heavily loaded.
This field contains the client's IP address, or hostname if you enable log_fqdn. For security or privacy reasons, you may want to mask a part of client's address out using the client_netmask directive. However, that also makes it impossible to group requests coming from the same client.
This field consists of two tokens separated by a slash. The first token, result code, classifies the protocol and the result of a transaction (e.g., TCP_HIT or UDP_DENIED). These are Squid-specific codes, defined in Section 13.2.1. The codes that begin with TCP_ refer to HTTP requests, while UDP_ refers to ICP queries.
The second token is the HTTP response status code (e.g, 200, 304, 404, etc.). The status code normally comes from the origin server. In some cases, however, Squid may be responsible for selecting the status code. These codes, defined by the HTTP RFC, are summarized later in Table 13-1.
This field indicates the number of bytes transferred to the client. Strictly speaking, it is the number of bytes that Squid told the TCP/IP stack to send to the client. Thus, it doesn't include overheads from TCP/IP headers. Also note that the transfer size is normally larger than the response's Content-Length. This value includes the HTTP response headers, while Content-Length does not.
These properties make the transfer size field useful for approximate bandwidth usage analysis but not for exact HTTP entity size calculations. If you need to know a response's Content-Length, you can find it in the store.log file.
This field contains the request method. Because Squid clients may use ICP or HTTP, the request method is either HTTP- or ICP-specific. The most common HTTP request method is GET. ICP queries are always logged with ICP_QUERY. See Section 6.1.2.8 for a list of HTTP methods Squid knows about.
This field contains the URI from the client's request. The vast majority of logged URIs are actually URLs (i.e., they have hostnames).
Squid uses a special format for certain failures. These are cases when Squid can't parse the HTTP request or otherwise determine the URI. Instead of a URI/URL, you'll see a string such as "error:invalid-request." For example:
1066036250.603 310 192.0.34.70 NONE/400 1203 GET error:invalid-request - NONE/- -
Also in this field look out for whitespace characters in the URI. Depending on your uri_whitespace setting, Squid may print the URI in the log file with whitespace characters. When this happens, the tools that read access.log files may become confused by the extra fields.
When logging, Squid strips all URI characters after the first question mark unless the strip_query_terms directive is disabled.
Squid can determine a user's identity in two different ways. One is with the RFC 1413 ident protocol; the other is from HTTP authentication headers.
Squid attempts ident lookups based on the ident_lookup_access rules, if any (see Section 6.2). Alternatively, if you use proxy authentication (or regular server authentication in surrogate mode), Squid places the given username in this field. If both methods provide Squid with a username, and you're using the native access.log format, the HTTP authentication name is logged, and the RFC 1413 name is ignored. The common log file format has separate fields for both names.
The peering information consists of two tokens, separated by a slash. It is relevant only for requests that are cache misses. The first token indicates how the next hop was chosen. The second token is the address of that next hop. The peering codes are listed in Section 13.2.3.
When Squid sends a request to a neighbor cache, the peerhost address is the neighbor's hostname. If the request is sent directly to the origin server, however, Squid writes the origin server's IP address or its hostname if log_ip_on_direct is disabled. The value NONE/- indicates that Squid didn't forward this request to any other servers.
The final field of the default, native access.log is the content type of the HTTP response. Squid obtains the content type value from the response's Content-Type header. If that header is missing, Squid uses a hyphen (-).
If you enable the log_mime_headers directive, Squid appends two additional fields to each line:
Squid encodes the HTTP request headers and prints them between a pair of square brackets. The brackets are necessary because Squid doesn't encode space characters. The encoding scheme is a little strange. Carriage return (ASCII 13) and newline (ASCII 10) are printed as \r and \n, respectively. Other non-printable characters are encoded with the RFC 1738 style, such that Tab (ASCII 9) becomes %09.
Squid encodes the HTTP response headers and prints them between a pair of square brackets. Note that these are the headers sent to the client, which may be different from headers received from the origin server.
Squid writes to access.log only after the entire response has been sent to the client. This allows Squid to include both request and response information in the log file. However, transactions that take minutes, or even hours, to complete aren't visible in access.log at the time of the request. When these types of transactions present a performance or policy concern, the access.log may be unable help you. Instead, use the cache manager to view a list of pending transactions (see Section 14.2.1.37).
The following labels may appear in the fourth field of the access.log file in response to HTTP requests:
Squid found a likely fresh copy of the requested resource and sent it immediately to the client.
Squid didn't have a cached copy of the requested resource.
Squid found a likely stale copy of the requested resource and sent a validation request to the origin server. The origin server sent a 304 (Not Modified) response, indicating that Squid's copy is still fresh.
Squid found a likely stale copy of the requested resource and sent a validation request to the origin server. However, the origin server failed to respond or sent a response that Squid didn't understand. In any case, Squid sent the cached (and likely stale) copy to the client.
Squid found a likely stale copy of the requested resource and sent a validation request to the origin server. The server responded with new content, indicating the cached response was indeed stale.
Squid found a copy of the requested resource, but the client's request included a Cache-Control: no-cache directive. Squid forwarded the client's request to the origin server, forcing a cache validation.
The client sent a validation request, and Squid found a more recent, and likely fresh, copy of the requested resource. Squid sent the newer content to the client, without contacting the origin server.
Squid found a valid copy of the requested resource but failed to load it from disk. Squid then sent the request to the origin server as though it were a cache miss.
When a request to an origin server results in an HTTP error, Squid may cache the response anyway. Repeated requests for these resources, within a short amount of time, result in negative hits. The negative_ttl directive controls the amount of time these errors may be cached. Also note that errors are cached only in memory and never written to disk. The following HTTP status codes may be negatively cached, subject to additional constraints: 204, 305, 400, 403, 404, 405, 414, 500, 501, 502, 503, 504.
Squid found a valid copy of the requested resource in the memory cache and sent it immediately to the client. Note that this doesn't accurately represent all responses served from memory. For example, responses that are cached in memory, but require validation, are logged with TCP_REFRESH_HIT, TCP_REFRESH_MISS, etc.
The client's request was denied, due to either the http_access or http_reply_access rules. Note that requests denied by http_access have NONE/- in the ninth field, whereas those denied by http_reply_access have a valid entry.
When offline_mode is enabled, Squid returns cache hits for almost any cached response, without considering its freshness.
A redirector program told Squid to generate an HTTP redirect to a new URI (see Section 11.1). Normally, Squid doesn't log these redirects. To do so, you must manually define the LOG_TCP_REDIRECTS preprocessor directive before compiling Squid.
Unclassified result used for certain errors, such as invalid hostnames.
The following labels may appear in the fourth field of the access.log file in response to ICP queries:
Squid found a likely fresh copy of the requested resource in the cache.
Squid didn't find a likely fresh copy of the requested resource in the cache. If the same object is requested via HTTP, it would probably be a cache miss. Compare with UDP_MISS_NOFETCH.
Like UDP_MISS, except that this also indicates Squid's reluctance to handle the corresponding HTTP request. If you use the -Y command-line option, Squid returns this, instead of UDP_MISS, while rebuilding its in-memory indexes at startup.
The ICP query is denied due to the icp_access rules. If more than 95% of the ICP replies to a client are UDP_DENIED, and the client database is enabled (see Appendix A), Squid stops sending any ICP replies to the client for an hour. When this happens you'll also see a warning in cache.log.
Squid received an invalid query (e.g., truncated message, invalid protocol version, whitespace in the URI, etc.). Squid sent an ICP_INVALID reply back to the client.
Table 13-1 lists the numerical HTTP response codes and reason phrases. Note that Squid and other HTTP agents care only about the numeric value. The reason phrase is purely informational and doesn't affect the meaning of the response. For each status code, I also provide a reference to the particular section in RFC 2616 that describes it. Note that status codes 0 and 600 are nonstandard values used by Squid, and aren't mentioned in the RFC.
Code |
Reason phrase |
RFC 2616 section |
---|---|---|
0 |
No Response Received (Squid-specific) |
N/A |
1xx |
Informational |
10.1 |
100 |
Continue |
10.1.1 |
101 |
Switching Protocols |
10.1.2 |
2xx |
Successful |
10.2 |
200 |
OK |
10.2.1 |
201 |
Created |
10.2.2 |
202 |
Accepted |
10.2.3 |
203 |
Non-Authoritative Information |
10.2.4 |
204 |
No Content |
10.2.5 |
205 |
Reset Content |
10.2.6 |
206 |
Partial Content |
10.2.7 |
3xx |
Redirection |
10.3 |
300 |
Multiple Choices |
10.3.1 |
301 |
Moved Permanently |
10.3.2 |
302 |
Found |
10.3.3 |
303 |
See Other |
10.3.4 |
304 |
Not Modified |
10.3.5 |
305 |
Use Proxy |
10.3.6 |
306 |
(Unused) |
10.3.7 |
307 |
Temporary Redirect |
10.3.8 |
4xx |
Client Error |
10.4 |
400 |
Bad Request |
10.4.1 |
401 |
Unauthorized |
10.4.2 |
402 |
Payment Required |
10.4.3 |
403 |
Forbidden |
10.4.4 |
404 |
Not Found |
10.4.5 |
405 |
Method Not Allowed |
10.4.6 |
406 |
Not Acceptable |
10.4.7 |
407 |
Proxy Authentication Required |
10.4.8 |
408 |
Request Timeout |
10.4.9 |
409 |
Conflict |
10.4.10 |
410 |
Gone |
10.4.11 |
411 |
Length Required |
10.4.12 |
412 |
Precondition Failed |
10.4.13 |
413 |
Request Entity Too Large |
10.4.14 |
414 |
Request-URI Too Long |
10.4.15 |
415 |
Unsupported Media Type |
10.4.16 |
416 |
Requested Range Not Satisfiable |
10.4.17 |
417 |
Expectation Failed |
10.4.18 |
5xx |
Server Error |
10.5 |
500 |
Internal Server Error |
10.5.1 |
501 |
Not Implemented |
10.5.2 |
502 |
Bad Gateway |
10.5.3 |
503 |
Service Unavailable |
10.5.4 |
504 |
Gateway Timeout |
10.5.5 |
505 |
HTTP Version Not Supported |
10.5.6 |
6xx |
Proxy Error |
N/A |
600 |
Unparseable Response Headers (Squid-specific) |
N/A |
You'll see status code 0 in the access.log if Squid doesn't receive any response from the origin server. You'll see status code 600 if Squid received a response but couldn't find any HTTP headers. In a small fraction of cases, certain origin servers send only the response body and omit any headers.
The following codes may appear in the ninth field of the access.log. Refer to Section 10.10 for a description of how Squid selects the next-hop for cache misses.
This indicates that Squid didn't communicate with any other servers (neighbors, origin) for this request. You'll see it in association with various types of cache hits, denied requests, cache manager requests, errors, and all ICP queries.
Squid forwarded the request directly to the origin server. The second half of the field shows the origin server's IP address, or hostname if you've disabled log_ip_on_direct.
Squid sent the request to this sibling cache after the sibling returned an ICP or HTCP hit.
Squid sent the request to this parent cache after the parent returned an ICP or HTCP hit.
Squid selected this parent because it was marked as default on the cache_peer line in squid.conf.
Squid forwarded the request to this parent because it is the first parent in the list known to be alive.
Squid forwarded the request to the parent cache that was first to respond with an ICP/HTCP miss message. In other words, for this particular ICP/HTCP query, at this particular time, the selected parent had the best round-trip time. Note that measured RTTs may be artificially adjusted by the weight option to the cache_peer directive.
Squid selected this parent because it reports the lowest RTT to the origin server. This occurs only if both caches have netdb enabled (see Section 10.5), and the origin server (or other servers on its subnet) returns ICMP pings.
This is similar to CLOSEST_PARENT_MISS, except that the RTT measurements don't come from the ICP/HTCP reply messages. Instead, they come from older measurements saved by Squid, such as the netdb exchange feature.
Squid forwarded the request to the origin server based on netdb measurements. This happens if any of these conditions occur:
The RTT between Squid and the origin server is less than the configured minimum_direct_rtt value.
The measured number of router hops between Squid and the origin server is less than the configured minimum_direct_hops value.
The RTT values returned in ICP/HTCP replies indicate that Squid is closer to the origin server than any of its neighbors.
Squid forwarded the request to this parent because the round-robin option was set, and it had the lowest usage counter.
Squid forwarded the request to this parent based on the Cache Digest algorithm (see Section 10.7).
Squid forwarded the request to this sibling based on the Cache Digest algorithm.
Squid selected this parent based on the Cache Array Routing Protocol algorithm (see Section 10.9).
Squid selected this parent as a last resort because none of the other methods resulted in a viable next-hop.
Note that most of these codes may be preceded by TIMEOUT_ to indicate that a timeout occurred while waiting for ICP/HTCP replies. For example:
1066038165.382 345 193.233.46.21 TCP_MISS/200 2836 GET http://www.caida.org/home/images/home.jpg TIMEOUT_CLOSEST_DIRECT/213.219.122.19 image/jpeg
You can adjust the timeout with the icp_query_timeout directive.
Following are the configuration file directives that affect the access.log in one way or another.
This directive, enabled by default, causes Squid to log all ICP queries. If you're running a busy parent cache, this may make your access.log files huge. To save space, disable this directive:
log_icp_queries off
If you disable ICP query logging, I suggest that you monitor the number of queries, either through the cache manager or with SNMP.
The access.log file has two formats: common and native. The common format is the same as most HTTP servers (e.g., Apache) use. It contains less information than Squid's native format. However, you might want to use the common log-file format if you use Squid as a surrogate (see Chapter 15). The common format may also be useful if you have log-file analysis tools that know how to parse it. Use this directive to enable the common format:
emulate_httpd_log on
See the site http://www.w3.org/Daemon/User/Config/Logging.html#common-logfile-format, for a description of this format.
Use the log_mime_hdrs directive to make Squid log the HTTP request and response headers:
log_mime_headers on
When enabled, Squid appends the request and response headers to access.log. This adds two fields to each line. Each field is surrounded by square brackets to make parsing easier. Certain characters are encoded to keep the log file readable. Table 13-2 shows the encoding scheme.
Character |
Encoding |
---|---|
Newline |
\n |
Carriage return |
\r |
Backslash |
\\ |
[ |
%5b |
] |
%5d |
% |
%25 |
ASCII 0-31 |
%xx (hexadecimal value) |
ASCII 127-255 |
%xx (hexadecimal value) |
By default, Squid puts client IP addresses in the access.log. You can record hostnames, when available, by enabling this directive:
log_fqdn on
This causes Squid to make reverse DNS lookups for the client's address when it receives a request. If an answer is available by the time the request is complete, Squid places it in the third field.
This access rule list determines whether or not Squid makes an RFC 1413 ident query for the client's TCP connection. By default, Squid doesn't issue ident queries. To enable this feature, simply add one or more rules:
acl All src 0/0 ident_lookup_access allow All
If an answer is available by the time the request is complete, Squid places it in the eighth field. If you are also using HTTP authentication, that username is written instead of the ident answer.
When Squid forwards a cache miss to an origin server, it records the origin server's IP address in the ninth field. You can disable this directive so that Squid writes the hostname instead:
log_ip_on_direct off
In this case, the hostname comes from the URI. If the URI contains an IP address, Squid doesn't convert it to a hostname.
This directive exists to provide some level of privacy for your users. Rather than logging the entire client IP address, you can mask off some bits. For example:
client_netmask 255.255.255.0
With this setting, all client IP addresses in access.log have 0 as the last octet:
1066036246.918 35 163.11.255.0 TCP_IMS_HIT/304 266 GET http://... 1066036246.932 16 163.11.255.0 TCP_IMS_HIT/304 266 GET http://... 1066036247.616 313 140.132.252.0 TCP_MISS/200 1079 GET http://... 1066036248.598 44459 140.132.252.0 TCP_MISS/500 1531 GET http://... 1066036249.230 17 170.210.173.0 TCP_IMS_HIT/304 265 GET http://... 1066036249.752 2135 140.132.252.0 TCP_MISS/200 50230 GET http://... 1066036250.467 4 170.210.173.0 TCP_IMS_HIT/304 265 GET http://... 1066036250.762 102 163.11.255.0 TCP_IMS_HIT/304 265 GET http://... 1066036250.832 20 163.11.255.0 TCP_IMS_HIT/304 266 GET http://... 1066036251.026 74 203.91.150.0 TCP_CLIENT_REFRESH_MISS/304 267 GET http://...
This directive is another privacy feature. It removes query terms from URIs before logging them. If your log files somehow fall into the wrong hands, they won't be able to find any usernames and passwords. When this directive is enabled, all characters after a question mark (?) are removed. For example, a URI like this:
http://auto.search.msn.com/response.asp?MT=www.kimo.com.yw&srch=3 &prov=&utf8
is logged like this:
http://auto.search.msn.com/response.asp?
Earlier, I mentioned the problem with whitespace appearing in some URIs. The RFCs state that URIs must not contain whitespace, but in reality it happens all too often. The uri_whitespace directive dictates how Squid should handle such cases. The allowed settings are: strip (default), deny, allow, encode, and chop. Of these, strip, encode, and chop ensure that the URI field doesn't contain any whitespace (thus adding more fields to access.log).
The allow setting allows the request to pass through Squid unmodified. It is likely to cause trouble for redirectors and log file parsers. The deny setting, on the other hand, causes Squid to deny the request. The user receives an error message, but the request is still written to access.log with the whitespace characters.
If you set it to encode, Squid changes the whitespace characters to their RFC 1738 equivalents. This is probably what the user-agent should have done in the first place. The chop setting causes Squid to cut off the URI at the first whitespace character.
The default setting is strip, which makes Squid remove the whitespace characters from the URI. It ensures that your log-file parsers and redirectors will be happy, but it might break certain things, such as improperly encoded search engine queries.
By default, Squid disables buffering for the cache.log file, which allows you to run tail -f and watch log file entries appear in real time. If you think this will cause an unnecessary overhead, you can disable buffering:
buffered_logs off
However, it probably doesn't matter unless you are running Squid with full debugging. Note that this option affects only cache.log. The others always use unbuffered writes.
The access.log file contains a wealth of information, much more than you can see by just browsing through it. In order to get the big picture view, you'll need to use a third-party log-file analysis package. You can find a long list of them linked from the Squid web page, or by going directly to http://www.squid-cache.org/Scripts/.
One of the most popular tools is Calamarisa Perl script that parses the log file and generates either text or HTML-based reports. It provides a breakdown of traffic by request method, client IP address, origin server domain name, content types, filename extensions, reply size, and more. Calamaris also reports on ICP query traffic and even understands log files from other caching products. Check it out by visiting http://calamaris.cord.de/.
Squeezer, and its derivative, Squeezer2, are Squid-specific analysis tools. They provide many statistics that can help you understand Squid's performance, especially when you have neighbors. Both generate HTML pages as output. Visit the Logfile Analysis page on the squid-cache.org site for links to these programs.
Webalyzer is another good utility. It is designed to be fast and produces HTML pages with tables and bar charts. It was originally designed for origin server access logs. Although it can parse Squid's logs, it doesn't report on such things as hit ratios and response times. It also uses some terms differently than I do. For example, Webalyzer calls any request a "hit," which isn't the same as a cache hit. It also makes a distinction between "pages" and "files." For more information, visit the Webalyzer home page at http://www.mrunix.net/webalyzer/.