Preventing Abuse

Denial of service (DoS) attacks work by swamping your server with a great number of simultaneous requests, slowing down the server or preventing access altogether to legitimate clients. DoS attacks are difficult to prevent in general, and usually the most effective way to address them is at the network or operating system level. One example is blocking specific addresses from making requests to the server; although you can block those addresses at the Web server level, it is more efficient to block them at the network firewall/router or with the operating system network filters.

Other kinds of abuse include posting extremely big requests or opening a great number of simultaneous connections. You can limit the size of requests and timeouts to minimize the effect of attacks. The default request timeout is 300 seconds, but you can change it with the TimeOut directive. A number of directives enable you to control the size of the request body and headers: LimitRequestBody, LimitRequestFields, LimitRequestFieldSize, LimitRequestLine, and LimitXMLRequestBody.

To prevent abuse, the mod_bwshare module enables you to limit the number of files or bytes that a given client can download from the server. You can learn more about mod_bwshare at http://www.topology.org/src/bwshare/README.html.

Robots

Robots, Web spiders, and Web crawlers are names that define a category of programs that download pages from your Web site, recursively following your site's links. Web search engines use these programs to scan the Internet for Web servers, download their content, and index it. Normal users use them to download an entire Web site or portion of a Web site for later offline browsing. Normally, these programs are well behaved, but sometimes they can be very aggressive and swamp your Web site with too many simultaneous connections or become caught in cyclic loops.

Well-behaved spiders will request a special file, called robots.txt, that contains instructions about how to access your Web site and which parts of the Web site won't be available to them. The syntax for the file can be found at http://www.robotstxt.org/. You can stop the requests at the router or operating system levels.

Part II: Basic Language Elements

Part III: Getting Involved with the Code