You have a LAN set up, and your Red Hat Linux computer has both a connection to the LAN and a connection to the Internet. One way to provide Web-browsing services to the computers on the LAN without setting up routing is to configure Red Hat Linux as a proxy server.
The Squid proxy caching server software package comes with Red Hat Linux. In a basic configuration, you can get the software going very quickly. However, the package is full of features that let you adapt it to your needs. You can control which hosts have access to proxy services, how memory is used to cache data, how logging is done, and a variety of other features. Here are the basic proxy services available with Squid:
HTTP — Allowing HTTP proxy services is the primary reason to use Squid. This is what lets client computers access Web pages on the Internet from their browsers (through your Red Hat Linux computer). In other words, HTTP proxy services will find and return the content to you for addresses that look similar to this: www.ab.com.
FTP — This represents File Transfer Protocol (FTP) proxy services. When you enable HTTP for a client, you enable FTP automatically (for example, ftp://ftp.ab.com).
Gopher — The gopher protocol was one of the early mechanisms for organizing and searching for documents on the Internet (it appeared in the early 1990s, at about the same time as the Web). It isn't used much anymore. However, if you need it, gopher is supported when you enable HTTP for a client.
Besides allowing proxy services, Squid can also be part of an Internet cache hierarchy. Internet caching occurs when Internet content is taken from the original server and copied to a caching server that is closer to you. When you, or someone else in the caching hierarchy, requests that content again, it can be taken from the caching server instead of from the original server.
You don't have to cache Internet content for other computers to participate in caching with Squid. If you know of a parent caching-computer that will allow you access, you can identify that computer in Squid and potentially speed your Web browsing significantly.
Cache-to-cache queries in Squid use the Internet Cache Protocol (ICP) and are handled through your Linux system's ICP port. Besides ICP services, you can also enable Simple Network Management Protocol (SNMP) services. SNMP lets your computer make statistics and status information about itself available to SNMP management stations on the network. SNMP is a feature for monitoring and maintaining computer resources on a network.
Caution: SNMP poses a potential security risk if it is not configured properly. Use caution when configuring SNMP with Squid.
The squid daemon process (/usr/sbin/squid) can be started automatically at system boot time. After it is set up, most of the configuration for Squid is done in the /etc/squid/squid.conf file. The squid.conf file contains lots of information about how to configure Squid (the file contains more than 3200 lines of comments and examples, although there are only 32 lines of active settings).
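Because the active settings are buried among thousands of comment lines, it can help to filter them out. The following is a minimal sketch, not something shipped with Squid: the function name active_settings is made up for illustration, and it simply drops blank lines and lines that begin with #.

```shell
# Print only the active (non-comment, non-blank) lines of a Squid config file.
# $1 = path to the file (normally /etc/squid/squid.conf).
# active_settings is a hypothetical helper name, not part of Squid.
active_settings() {
    grep -Ev '^[[:space:]]*(#|$)' "$1"
}

# Usage:
#   active_settings /etc/squid/squid.conf          # show the active lines
#   active_settings /etc/squid/squid.conf | wc -l  # count them
```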
For further information about the Squid proxy server, refer to the Squid Web Proxy Cache home page (www.squid-cache.org).
When you install Red Hat Linux, you have an opportunity to install Squid (squid package). If you are not sure whether or not Squid was set up, there are a couple of ways to check. First, type the following as root user:
# ps x | grep squid
If the squid daemon is running, you should see an entry that looks similar to the following:
774 ? S 0:00 squid -D
If you don't see a Squid process running, the daemon process may not be set up to start automatically. To set up the daemon to start at boot time, type the following:
# chkconfig squid on
At this point, the squid daemon should start automatically when your system boots. By default, the squid daemon will run with the -D option. The -D option enables Squid to start without having an active Internet connection. If you want to add other options to the squid daemon, you can edit the /etc/sysconfig/squid configuration file. Look for the line that looks similar to the following:
SQUID_OPTS="-D"
You can add any options, along with the -D option, between the quotes. Most of these options are useful for debugging Squid:
-a port# — Substitute for port# a port number that will be used instead of the default port number (3128) for servicing HTTP proxy requests. This is useful for temporarily trying out an alternative port.
-f squidfile — Use this option to specify an alternative squid.conf file (other than /etc/squid/squid.conf). Replace squidfile with the name of the alternative squid.conf file. This is a good way to try out a new squid.conf file before you replace the old one.
-d level — Change the debugging level to a number indicated by level. This also causes debugging messages to be sent to stderr.
-X — Use this option to check that the values are set properly in your squid.conf file. It turns on full debugging while the squid.conf file is being interpreted.
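If you prefer not to edit the SQUID_OPTS line by hand, a small helper can produce a modified copy of the file for you to review first. This is only a sketch: set_squid_opts is a hypothetical name, and it assumes the option string contains no | or & characters (none of the options listed above do).

```shell
# Print a copy of a sysconfig-style file with the SQUID_OPTS line replaced.
# $1 = file to read (for example /etc/sysconfig/squid)
# $2 = new option string (for example "-D -X")
# The original file is not modified; redirect the output to keep the result.
set_squid_opts() {
    sed "s|^SQUID_OPTS=.*|SQUID_OPTS=\"$2\"|" "$1"
}

# Usage:
#   set_squid_opts /etc/sysconfig/squid "-D -X" > /tmp/squid.sysconfig.new
```

Writing to a separate file first lets you compare the result with the original before copying it into place.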
You can restart the Squid service by typing /etc/init.d/squid restart. While the squid daemon is running, there are several ways you can run the squid command to change how the daemon works, using these options:
squid -k reconfigure — Causes Squid to again read its configuration file.
squid -k shutdown — Causes Squid to exit after waiting briefly for current connections to exit.
squid -k interrupt — Shuts down Squid immediately, without waiting for connections to close.
squid -k kill — Kills Squid immediately, without closing connections or log files. (Use this option only if other methods don't work.)
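The four control actions above can be wrapped in a small function that rejects anything else, which guards against typing mistakes on a production proxy. Treat this as a sketch under stated assumptions: squid_ctl and the SQUID_BIN override are illustrative names, not part of the Squid package.

```shell
# Path to the squid binary. SQUID_BIN is a hypothetical override so that the
# wrapper can be tried safely (for example, SQUID_BIN=echo for a dry run).
SQUID_BIN=${SQUID_BIN:-/usr/sbin/squid}

# Send one of the control actions described above to the running daemon.
squid_ctl() {
    case "$1" in
        reconfigure|shutdown|interrupt|kill)
            "$SQUID_BIN" -k "$1"
            ;;
        *)
            echo "usage: squid_ctl reconfigure|shutdown|interrupt|kill" >&2
            return 1
            ;;
    esac
}

# Usage (as root):
#   squid_ctl reconfigure
```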
With the squid daemon ready to run, you need to set up the squid.conf configuration file.
You can use the /etc/squid/squid.conf file that comes with Squid to get started. Though the file contains lots of comments, the actual settings in that file are quite manageable. The following paragraphs describe the contents of the default squid.conf file:
hierarchy_stoplist cgi-bin ?
The hierarchy_stoplist tag indicates that when a certain string of characters appear in a URL, the content should be obtained from the original server and not from a cache peer. In this example, requests for the string cgi-bin and the question mark character (?) are all forwarded to the originating server.
acl QUERY urlpath_regex cgi-bin \?
no_cache deny QUERY
The preceding two lines can be used to cause URLs containing certain characters to never be cached. These go along with the previous line by not caching URLs containing the same strings (cgi-bin and ?) that are always sought from the original server.
acl all src 0.0.0.0/0.0.0.0
acl manager proto cache_object
acl localhost src 127.0.0.1/255.255.255.255
The acl tags are used to create access control lists. The first line above creates an access control list called "all" that includes all IP addresses. The next acl line assigns the manager acl to handle the cache_object protocol. The localhost source is assigned to the IP address of 127.0.0.1.
The next several entries define how particular ports are handled and how access is assigned to HTTP and ICP services.
acl SSL_ports port 443 563
acl Safe_ports port 80 # http
acl Safe_ports port 21 # ftp
acl Safe_ports port 443 563 # https, snews
acl Safe_ports port 70 # gopher
acl Safe_ports port 210 # wais
acl Safe_ports port 1025-65535 # unregistered ports
acl Safe_ports port 280 # http-mgmt
acl Safe_ports port 488 # gss-http
acl Safe_ports port 591 # filemaker
acl Safe_ports port 777 # multiling http
acl CONNECT method CONNECT
http_access allow manager localhost
http_access deny manager
http_access deny !Safe_ports
http_access deny CONNECT !SSL_ports
http_access allow localhost
http_access deny all
http_reply_access allow all
icp_access allow all
The following sections describe these settings in more detail, as well as other tags you may want to set in your squid.conf file. By default, no clients can use the squid proxy server, so you at least want to define which computers can access proxy services.
To make sure that this simple Squid configuration is working, follow the procedure below:
On the Squid server, restart the squid daemon. To do this, either reboot or type /etc/init.d/squid restart. (If Squid isn't running, use start instead of restart.)
On the Squid server, start your connection to the Internet (if it is not already up).
On a client computer on your network, set up Mozilla (or another Web browser) to use the Squid server as a proxy server (described later in this chapter). (In Mozilla, choose Edit → Preferences → Advanced → Proxies, then choose Manual proxy configuration and add the Squid server's computer name and, by default, port 3128 to each protocol.)
On the client computer, try to open any Web page on the Internet with the browser you just configured.
If the Web page doesn't appear, see the Squid debugging section for how to fix the problem.
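You can also test the proxy from a client's command line, without configuring a browser. The sketch below assumes the curl utility is installed on the client; check_proxy and the CURL override are hypothetical names used only for illustration, and squidserver stands in for your proxy's host name.

```shell
# CURL is a hypothetical override, useful for a dry run (CURL=echo).
CURL=${CURL:-curl}

# Request only the response headers for a URL through the proxy.
# $1 = proxy host, $2 = proxy port (3128 if empty), $3 = URL to request.
check_proxy() {
    "$CURL" -s -I -x "http://$1:${2:-3128}" "$3"
}

# Usage:
#   check_proxy squidserver 3128 http://www.squid-cache.org/
```

If headers come back from the remote site, the proxy is relaying requests. If Squid refuses the client, you should instead see an error response generated by the proxy itself, along with a TCP_DENIED entry in the Squid access log.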
If you want to set up a more complex set of access permissions for Squid, you should start with the default squid.conf configuration file (described earlier).
To begin, open the /etc/squid/squid.conf file (as the root user). You will see a lot of information describing the values that you can set in this file. Most of the tags that you need to configure Squid are used to set up cache and provide host access to your proxy server.
Tip: Don't change the squid.conf.default file! If you really mess up your squid.conf file, you can start again by making another copy of this file to squid.conf. If you want to recall exactly what changes you have made so far, type the following from the /etc/squid directory:
# diff squid.conf squid.conf.default | less
This will show you the differences between your actual squid.conf and the version you started with.
To protect your computing resources from being used by anyone, Squid requires that you define which host computers have access to your HTTP (Web) services. By default, all hosts are denied access to Squid HTTP services except for the local host. With the acl tag, you can create access lists. Then, with the http_access tag, you can authorize access to HTTP (Web) services for the access lists you create.
The form of the access control list tag (acl) is:
acl name type string
acl name type file
The name is any name you want to assign to the list. A string is a string of text, and file is a file of information that applies to the particular type of acl. Valid acl types include dst, src, dstdomain, srcdomain, urlpath_regex, url_regex, time, port, proto, method, browser, and ident.
Several access control lists are set up by default. You can use these assigned acl names to assign permissions to HTTP or ICP services. You can also create your own acl names to assign to those services. Here are the default acl names from the /etc/squid/squid.conf file that you can use or change:
acl all src 0.0.0.0/0.0.0.0
acl manager proto cache_object
acl localhost src 127.0.0.1/255.255.255.255
acl SSL_ports port 443 563
acl Safe_ports port 80 # http
acl Safe_ports port 21 # ftp
acl Safe_ports port 443 563 # https, snews
acl Safe_ports port 70 # gopher
acl Safe_ports port 210 # wais
acl Safe_ports port 1025-65535 # unregistered ports
acl Safe_ports port 280 # http-mgmt
acl Safe_ports port 488 # gss-http
acl Safe_ports port 591 # filemaker
acl Safe_ports port 777 # multiling http
acl CONNECT method CONNECT
When Squid tries to determine which class a particular computer falls in, it goes from top to bottom. In the first line, all host computers (address/netmask are all zeros) are added to the acl group all. In the second line, you create a group called manager that matches requests using the cache_object protocol (the protocol used to query the cache manager for information about the cache). The group localhost is assigned to your loopback address. Secure socket layer (SSL) ports are assigned to the numbers 443 and 563, whereas Safe_ports are assigned to the numbers shown above. The last line defines a group called CONNECT, which matches requests that use the CONNECT method (you can use it to control access to SSL ports).
To deny or enable access to HTTP services on the Squid computer, the following definitions are set up:
http_access allow manager localhost
http_access deny manager
http_access deny !Safe_ports
http_access deny CONNECT !SSL_ports
http_access allow localhost
http_access deny all
These definitions are quite restrictive. The first line allows cache manager requests (manager) that come from the local host, and the second line denies such requests from anyone else. The third line denies requests to any port that is not listed as a safe port (the ! means "not"). The fourth line denies secure socket connections through the proxy (the CONNECT method) on all ports except the SSL ports (!SSL_ports). Finally, HTTP access is permitted from the local host and is denied to all other hosts.
To allow the client computers on your network access to your HTTP service, you need to create your own http_access entries. You probably want to do something more restrictive than simply saying http_access allow all. Here is an example of a more restrictive acl group and how to assign that group to HTTP access:
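acl ourlan src 10.0.0.1-10.0.0.100/255.255.255.255
http_access allow ourlan
In the previous example, all computers at IP addresses 10.0.0.1 through 10.0.0.100 are assigned to the ourlan group. The netmask is applied to both ends of the range, so 255.255.255.255 is used to keep the range exact; a netmask of 255.255.255.0 would widen the match to every address on the 10.0.0.0 network. Access is then allowed for ourlan with the http_access line. Because http_access lines are checked in order and the first match decides the outcome, these lines must appear before the http_access deny all line.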
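Where you put the new lines matters, because Squid reads its http_access rules from top to bottom. One way to get the placement right is to generate a modified copy of squid.conf with the new pair inserted just ahead of the final deny rule. The following is a sketch only: allow_lan is a hypothetical helper name, and the output path in the usage line is arbitrary.

```shell
# Print a copy of a squid.conf with an acl/http_access pair inserted just
# before the first "http_access deny all" line, so that the allow rule is
# reached before the catch-all deny.
# $1 = config file, $2 = acl name, $3 = source address specification.
allow_lan() {
    awk -v name="$2" -v src="$3" '
        !done && $1 == "http_access" && $2 == "deny" && $3 == "all" {
            print "acl " name " src " src
            print "http_access allow " name
            done = 1
        }
        { print }
    ' "$1"
}

# Usage (writes a new file rather than editing squid.conf in place):
#   allow_lan /etc/squid/squid.conf ourlan 10.0.0.0/255.255.255.0 > /tmp/squid.conf.new
```

You could then try the new file with the -f option described earlier before replacing the original.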
Caching, as it relates to a proxy server, is the process of storing data on an intermediate system between the Web server that sent the data and the client that received it. The assumption is that later requests for the same data can be serviced more quickly by not having to go all the way back to the original server. Instead, the proxy server can simply send you the content from its copy in cache. Another benefit of caching is that it reduces demands on network resources and on the information servers.
You can arrange caching with other caching proxy servers to form a cache hierarchy. The idea is to have a parent cache exist close to an entry to the Internet backbone. When a child cache requests an object, if the parent doesn't have it, the parent goes out and gets the object, sends a copy to the child, and keeps a copy itself. That way, if another request for the data comes to the parent, it can probably service that request without making another request to the original server. This hierarchy also supports sibling caches, which can, in effect, create a pool of caching servers on the same level.
Caution: Caching can consume a lot of your hard disk space if you let it. If you have separate partitions on your system, make sure that you have enough space in /var to handle the added load.
Here are some cache-related tags that you should consider setting:
cache_peer — If there is a cache parent whose resources you can use, you can add the parent cache using this tag. You would need to obtain the parent cache's host name, the type of cache (parent), proxy port (probably 3128), and ICP port (probably 3130) from the administrator of the parent cache. (If you have no parent cache, you don't have to set this value.) Here's an example of a cache_peer entry:
cache_peer parent.handsonhistory.com parent 3128 3130
You can also add options to the end of the line, such as proxy-only (so that what you get from the parent isn't stored locally) and weight=n (where n is replaced by a number above 1 to indicate that the parent should be used above other parents). Add default if the parent is used as a last resort (when all other parents don't have the requested data).
cache_mem — Specifies the amount of cache memory (RAM) used to store in-transit objects (ones that are currently being used), hot objects (ones that are used often), and negative-cached objects (recent failed requests). The default is 8MB, though you can raise that value. To set cache_mem to 16MB, enter the following:
cache_mem 16 MB
Note: Because Squid will probably use a total of three times the amount of space you give it for all its processing, Squid documentation recommends that you use a cache_mem size one-third the size of the space that you actually have available for Squid.
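The one-third guideline is simple arithmetic, sketched here as a shell function. The name suggest_cache_mem is made up for illustration; the function takes the amount of memory (in megabytes) you are willing to give Squid and prints a matching cache_mem line, rounding down.

```shell
# Suggest a cache_mem line equal to one third of the memory set aside
# for Squid. $1 = memory available to Squid, in whole megabytes.
suggest_cache_mem() {
    echo "cache_mem $(( $1 / 3 )) MB"
}

# Usage:
#   suggest_cache_mem 48    # prints: cache_mem 16 MB
```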
cache_dir — Specifies the directory (or directories if you want to distribute cache across multiple disks or partitions) in which cache swap files are stored. The default is the /var/spool/squid directory. You can also specify how much disk space to use for cache in megabytes (100 is the default), the number of first-level directories to create (16 is the default), and the number of second-level directories (256 is the default). Here is an example:
cache_dir /var/spool/squid 100 16 256
Note: The cache directory must exist. Squid won't create it for you. It will, however, create the first- and second-level directories.
cache_mgr — Add the e-mail address of the user who should receive e-mail if the cache daemon dies. By default, e-mail is sent to the local Webmaster. To change that value to the root user, use the following:
cache_mgr root
cache_effective_user — After the squid daemon process is started as root, subsequent processes are run as squid user and group (by default). To change that subsequent user to a different name (for example, to nobody) set the cache_effective_user as follows:
cache_effective_user nobody
Note: When I changed the cache_effective_user name so that a user other than squid ran the squid daemon, the messages file logged several failed attempts to initialize the Squid cache before the process exited. When I changed the user name back to squid, the process started properly. To use the cache_effective_user feature effectively, you must identify which files are not allowing access.
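One way to track down the files that block a different effective user is to list everything in Squid's directories that the new user does not own. The sketch below uses the standard find command; not_owned_by is a hypothetical helper name, and the directories in the usage line are the default cache and log locations mentioned in this chapter.

```shell
# List files under the given directories that are NOT owned by a user.
# $1 = user name; the remaining arguments are directories to search.
not_owned_by() {
    user=$1
    shift
    find "$@" ! -user "$user" -print
}

# Usage (as root):
#   not_owned_by nobody /var/spool/squid /var/log/squid
```

Any path printed is a candidate for a chown to the new user, although ownership is only one possible cause of the failures.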
When you configure client computers to use your Squid proxy services, the clients need to know your computer's name (or IP address) and the port numbers associated with the services. For a client wanting to use your proxy to access the Web, the HTTP port is the needed number. Here are the tags that you use to set port values in Squid for different services, along with their default values:
http_port 3128 — The http_port is set to 3128 by default. Client workstations need to know this number (or the number you change this value to) to access your proxy server for HTTP services (that is, Web browsing).
icp_port 3130 — ICP requests are sent to and from neighboring caches through port 3130 by default.
htcp_port 4827 — Hypertext Caching Protocol (HTCP) requests are sent to and from neighboring caches on port 4827 by default.
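When you need to tell clients which port to use, you can read the value straight from squid.conf and fall back to the default if the tag is not set. This is a minimal sketch: squid_port is a hypothetical name, and if a tag appears more than once the last value wins.

```shell
# Print the value of a port tag from a squid.conf, or a default if the tag
# is not set. Commented-out lines are ignored because their first field
# starts with "#".
# $1 = config file, $2 = tag name, $3 = default value.
squid_port() {
    awk -v tag="$2" -v def="$3" '
        $1 == tag { val = $2 }
        END { print (val != "" ? val : def) }
    ' "$1"
}

# Usage:
#   squid_port /etc/squid/squid.conf http_port 3128
#   squid_port /etc/squid/squid.conf icp_port 3130
```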
If Squid isn't working properly when you set it up, or if you just want to monitor Squid activities, there are several tools and log files to help you. These are discussed below.
By running the squid daemon with the -X option (described earlier), you can check what is being set from the squid.conf file. You can add an -X option to the SQUID_OPTS line in the /etc/sysconfig/squid file. Then run /etc/init.d/squid restart. A large amount of information is output, detailing what is being set from squid.conf. If there are syntax errors in the file, they appear here.
Squid log files (in Red Hat Linux) are stored in the /var/log/squid directory by default. The following are the log files created there, descriptions of what they contain, and descriptions of how they may help you debug potential problems:
access.log — Contains entries that describe each time the cache has been hit or missed when a client requests HTTP content. Along with that information is the identity of the host making the request (IP address) and the content they are requesting. Use this information to find out when content is being used from cache and when the remote server must be accessed to obtain the content. Here is what some of the access result codes mean:
TCP_DENIED — Squid denied access for the request.
TCP_HIT — Cache contained a valid copy of the object.
TCP_IMS_HIT — A fresh version of the requested object was still in cache when the client asked if the content had changed.
TCP_IMS_MISS — An If-Modified-Since request was issued by the client for a stale object.
TCP_MEM_HIT — Memory contained a valid copy of the object.
TCP_MISS — Cache did not contain the object.
TCP_NEGATIVE_HIT — The object was negatively cached, meaning that an error was returned (such as the file not being found) when the object was requested.
TCP_REF_FAIL_HIT — A stale object was returned from cache because of a failed request to validate the object.
TCP_REFRESH_HIT — A stale copy of the object was in cache, but a request to the server returned information that the object had not been modified.
TCP_REFRESH_MISS — A stale cache object was replaced by new, updated content.
TCP_SWAPFAIL — An object could not be accessed from cache, despite the belief that the object should have been there.
cache.log — Contains valuable information about your Squid configuration when the squid daemon starts up. You can see how much memory is available (Max Mem), how much swap space (Max Swap), the location of the cache directory (/var/spool/squid), the types of connections being accepted (HTTP, ICP, and SNMP), and the port on which connections are being accepted. You can also see a lot of information about cached objects (such as how many are loaded, expired, or canceled).
store.log — Contains entries that show when content is being swapped out from memory to the cache (SWAPOUT), swapped back into memory from cache (SWAPIN), or released from cache (RELEASE). You can see where the content comes from originally and where it is being placed in the cache. Time is logged in this file as raw UNIX time (seconds since January 1, 1970, with millisecond resolution).
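To get a quick picture of how often the cache is being hit, you can tally the result codes in access.log. The sketch below assumes Squid's native log format, in which the fourth whitespace-separated field holds the result code and HTTP status together (for example, TCP_MISS/200); count_result_codes is a hypothetical name.

```shell
# Count how many times each result code appears in an access.log.
# Assumes the native log format, where field 4 looks like TCP_MISS/200.
# $1 = path to the log file.
count_result_codes() {
    awk '{ split($4, part, "/"); count[part[1]]++ }
         END { for (code in count) print count[code], code }' "$1" |
        sort -k1,1nr -k2
}

# Usage (as root):
#   count_result_codes /var/log/squid/access.log
```

The output lists the most frequent codes first. A high share of TCP_MISS entries compared with the various HIT codes suggests that little content is being served from cache.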
Another log file may interest you: /var/log/messages. This file contains entries describing the startup and exit status of the squid daemon.
Run the top command to see information about running processes, including the Squid process. If you are concerned about performance hits from too much Squid activity, type M from within the top window. The M option displays information about running processes, sorted by the percent of memory each process is using. If you find that Squid is consuming too large a percentage of your system memory, you can reduce the memory usage by resetting the cache_mem value in your squid.conf file.