Choosing How Articles Are Stored

Traditionally, news servers have stored newsgroup articles in a very simple format. In the news spool directory (such as /var/spool/news), each article was stored under a subdirectory named after the newsgroup. For example, articles for the comp.os.linux.x newsgroup would be stored in comp/os/linux/x in the news spool directory. Each article was named by its article number (unique within that newsgroup) and placed in that directory.
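As a rough illustration of that layout, the following shell function builds the path where the traditional scheme would place an article. It is a sketch, not part of INN: the function name trad_path is hypothetical, and the spool directory is assumed to be /var/spool/news unless you pass another one.

```shell
# trad_path GROUP ARTNUM [SPOOL]
# Hypothetical helper (not part of INN): turn each dot in the newsgroup
# name into a slash and append the article number, as in the traditional
# spool layout. SPOOL defaults to /var/spool/news (an assumption).
trad_path() {
    group=$1
    artnum=$2
    spool=${3:-/var/spool/news}
    printf '%s/%s/%s\n' "$spool" "$(printf '%s' "$group" | tr '.' '/')" "$artnum"
}

trad_path comp.os.linux.x 1234    # prints /var/spool/news/comp/os/linux/x/1234
```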

Unfortunately, the traditional way of storing news articles has become quite inefficient, given the huge volume of newsgroup articles these days. The INN news server lets you choose among the following storage methods, including the traditional one (tradspool):

  • timehash: Articles are stored in directories based on when they arrive. This method makes it easier to control how long articles are kept and prevents any directory from containing too many files.

    In the default news directory (/var/spool/news), the timehash method of storage creates directories based on the time articles are received. Each article is stored under a path of the form time-xx/bb/cc/yyyy-aadd. Here, xx is the hexadecimal value of the storage class, and yyyy is a hexadecimal sequence number. The remaining values (aa, bb, cc, and dd) together form the arrival time, written in hexadecimal as aabbccdd.

  • cnfs: Articles are stored in buffer files that are configured before articles arrive. In this arrangement, when a new article arrives and the buffer is full, the new article replaces the oldest article. This is referred to as cyclical storage.

    When buffers are used instead of the file system, articles can be stored and served much faster. The downside to this method is that, because articles are overwritten automatically after the buffer limit is reached, it is harder to enforce a policy that retains articles for a set period of time. This method also requires more configuration.

  • timecaf: This method stores many articles in a single file. It can be about four times faster than the timehash method, though it gives you less control over the article spool. Because this method is relatively new, it has not been as well tested as other methods. Like timehash, it uses the arrival time to name the files where articles are stored.

  • tradspool: This is the original storage method for INN, where each article is stored as a separate file in a directory structure named after its newsgroup. While this method makes it easy to access articles on the news server, it has become inefficient for the volume of news that today's servers must handle.

  • trash: This method is used only for testing and for discarding the articles that match its storage.conf entry. You cannot retrieve articles that have been assigned to the trash storage method.
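The timehash naming scheme (time-xx/bb/cc/yyyy-aadd) can be reproduced with shell arithmetic. The sketch below assumes that aabbccdd is the 32-bit arrival time in seconds since the epoch, written in hexadecimal, and that yyyy is a four-digit hexadecimal sequence number. The function name timehash_path is hypothetical, and the result is relative to the spool directory.

```shell
# timehash_path CLASS SEQ ARRIVAL
# Hypothetical helper: build a time-xx/bb/cc/yyyy-aadd name, assuming
# ARRIVAL is the arrival time in seconds since the epoch and aa, bb, cc, dd
# are the four bytes of its hexadecimal form, from high to low.
timehash_path() {
    class=$1
    seq=$2
    t=$3
    aa=$(( (t >> 24) & 255 ))
    bb=$(( (t >> 16) & 255 ))
    cc=$(( (t >> 8) & 255 ))
    dd=$(( t & 255 ))
    printf 'time-%02x/%02x/%02x/%04x-%02x%02x\n' \
        "$class" "$bb" "$cc" "$seq" "$aa" "$dd"
}

# Class 1, sequence number 42, arrival time 1000000000 (0x3b9aca00)
timehash_path 1 42 1000000000    # prints time-01/9a/ca/002a-3b00
```

Under this reading, cc changes about every four minutes (256 seconds) and bb about every 18 hours (65,536 seconds), which is how the method spreads articles across many small directories.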

Activating different storage methods

Storage methods used for your INN server are set in the /etc/news/storage.conf file. You can activate the timehash, cnfs, timecaf, tradspool, or trash storage methods by creating method entries in the storage.conf file. You can also assign different newsgroups and other attributes to different methods. (After this file is configured, no additional configuration file setup is needed for the timehash method; however, the cnfs method requires that you set up a cycbuff.conf file.)

The format of a storage.conf file entry is as follows:

method <methodname> {
     newsgroups: <wildmat>
     class: <storage class number>
     size: <minsize>[,<maxsize>]
     expires: <mintime>[,<maxtime>]
     options: <options>
}

For each method name (timehash, cnfs, timecaf, tradspool, or trash), define the newsgroups that the method applies to. The wildcard characters (*, ?, and so on) that you can use are described in the "Understanding Wildmat Characters" sidebar, earlier in this chapter. The optional class value is a number (0, 1, and so on) that matches an entry in the expire.ctl file, where article expiration times are set. The optional size value limits the entry to articles whose size, in bytes, falls between minsize and maxsize. (A maxsize of 0 places no upper limit on article size.) The optional expires value selects articles based on the expiration time in their Expires: headers. The options value (which is also optional) sets options that are specific to a method.
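The size rule can be expressed as a small shell test. This is a hypothetical helper (size_ok is not an INN command) that succeeds when an article of a given size in bytes falls inside a storage.conf-style size range. It assumes, as the timehash examples that follow suggest, that an omitted maxsize behaves like 0, meaning no upper limit.

```shell
# size_ok SIZE MINSIZE [MAXSIZE]
# Hypothetical helper: succeed if an article of SIZE bytes falls inside a
# storage.conf-style size range. An omitted or 0 MAXSIZE means no upper limit.
size_ok() {
    size=$1
    min=$2
    max=${3:-0}
    [ "$size" -ge "$min" ] || return 1
    [ "$max" -eq 0 ] || [ "$size" -le "$max" ]
}

size_ok 5000 2 32000 && echo "fits 2,32000"            # prints fits 2,32000
size_ok 40000 2 32000 || echo "too big for 2,32000"    # prints too big for 2,32000
```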

Using the timehash storage method

The timehash storage method stores newsgroup articles based on when your news server receives them. The following timehash method entry examples are contained in the storage.conf file itself. You can uncomment and modify these entries to create your own entries:

method timehash {
       newsgroups: *
       class: 0
}
method timehash {
       newsgroups: alt.binaries.*
       class: 1
       size: 2,32000
}
method timehash {
       newsgroups: alt.*
       class: 2
       size: 1
}

The first timehash entry matches all newsgroups that come in (*). The class number identifies a class that matches expiration time settings for newsgroups that are stored with the entry. (See the description of expire.ctl in "Setting Up Expiration Times," later in the chapter, for information on how each class is defined.) The second timehash entry assigns a class (1) and a size range (2 bytes to 32,000 bytes) to newsgroups in the alt.binaries hierarchy. The third timehash entry assigns a class (2) and a size (1 byte to an unlimited number of bytes) to all groups in the alt newsgroup hierarchy.
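To see how the three sample entries divide up incoming articles, here is a shell sketch that uses case patterns as a stand-in for wildmat (the two agree for simple patterns built from * and ?). It assumes, as INN's storage.conf documentation describes, that the first entry matching an article is the one used, so the sketch tests the most specific pattern first. If the entries are left in the order shown above, the newsgroups: * entry would be tried first and would catch every article. The function name pick_class is hypothetical.

```shell
# pick_class NEWSGROUP SIZE
# Hypothetical helper: print the timehash class the sample entries would
# assign, testing the most specific pattern first (first match wins).
pick_class() {
    group=$1
    size=$2
    case $group in
        alt.binaries.*)
            # class 1: alt.binaries.* articles of 2 to 32,000 bytes
            if [ "$size" -ge 2 ] && [ "$size" -le 32000 ]; then
                echo 1
                return
            fi
            ;;
    esac
    case $group in
        alt.*)
            # class 2: any alt.* article of at least 1 byte
            if [ "$size" -ge 1 ]; then
                echo 2
                return
            fi
            ;;
    esac
    # class 0: newsgroups: * catches everything else
    echo 0
}

pick_class alt.binaries.pictures 5000     # prints 1
pick_class alt.binaries.pictures 50000    # prints 2 (too big for class 1)
pick_class comp.os.linux.x 800            # prints 0
```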

Using the cnfs storage method

The cnfs newsgroup storage method is an efficient way to rotate out newsgroup articles based on how much buffer space incoming articles have used (rather than just when they were received). Although this method is more complicated to configure, it is a good way to manage the size of your incoming news article database.
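The cyclical replacement that cnfs performs can be pictured with a toy model. The shell sketch below is a simplification and not how INN lays out a buffer: it pretends every article takes exactly one slot in a buffer with a fixed number of slots (real cnfs buffers are sized in kilobytes), and the function name cnfs_demo is hypothetical. Once the buffer is full, each new article lands in the slot holding the oldest article.

```shell
# cnfs_demo SLOTS ARTICLE...
# Toy model (hypothetical, one article per slot): write articles into a
# fixed number of slots in order, wrapping around to slot 0 at the end,
# so each new article replaces the oldest one once the buffer is full.
cnfs_demo() {
    slots=$1
    shift
    # Start with every slot empty
    i=0
    while [ "$i" -lt "$slots" ]; do
        unset "slot_$i"
        i=$(( i + 1 ))
    done
    n=0
    for art in "$@"; do
        slot=$(( n % slots ))
        # Look up what is currently stored in this slot (empty if unused)
        eval "old=\${slot_$slot:-}"
        if [ -n "$old" ]; then
            echo "slot $slot: $art (replaces $old)"
        else
            echo "slot $slot: $art"
        fi
        eval "slot_$slot=\$art"
        n=$(( n + 1 ))
    done
}

cnfs_demo 3 a1 a2 a3 a4 a5
# slot 0: a1
# slot 1: a2
# slot 2: a3
# slot 0: a4 (replaces a1)
# slot 1: a5 (replaces a2)
```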

Tip

The INN installation instructions recommend the cnfs method of storing articles if you have a full news feed. This method is much more efficient than the timehash storage method for managing the volume of news that must be handled nowadays.

Here are some examples of cnfs method entries from the storage.conf file. You can uncomment and modify these entries to suit your configuration:

method cnfs {
       newsgroups: *
       class: 1
       size: 0,3999
       expires: 4d1s
       options: FAQS
}
method cnfs {
       newsgroups: *
       class: 2
       size: 0,3999
       expires: 0s,4d
       options: SMALLAREA
}
method cnfs {
       newsgroups: *
       class: 3
       size: 4000,1000000
       options: BIGAREA
}

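Notice that each of the cnfs entries in these examples applies to all newsgroups. Articles are stored in different buffers based on their size and, for the first two entries, the expiration time in their Expires: headers. The value in each options field must match the name of a metacyclic buffer defined in the cycbuff.conf file, as shown in the following section. (The sample cycbuff.conf entries define only BIGAREA and SMALLAREA, so the FAQS buffer used by the first entry would need a metacycbuff entry of its own.)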
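The expires values in these cnfs entries are time intervals such as 4d1s (four days and one second). As an illustration, the shell function below converts such a value to seconds, which makes it easy to see that the first two entries split articles at the four-day mark. It is a hypothetical helper, not INN's own parser, and it handles only the w, d, h, m, and s suffixes, assumed here to mean weeks, days, hours, minutes, and seconds.

```shell
# to_seconds INTERVAL
# Hypothetical helper: convert an interval such as 4d1s or 0s into seconds.
# sed rewrites each NUMBER+suffix pair as NUMBER*factor+, and a trailing 0
# closes the sum so shell arithmetic can evaluate it.
to_seconds() {
    sum=$(printf '%s\n' "$1" | sed \
        -e 's/\([0-9][0-9]*\)w/\1*604800+/g' \
        -e 's/\([0-9][0-9]*\)d/\1*86400+/g' \
        -e 's/\([0-9][0-9]*\)h/\1*3600+/g' \
        -e 's/\([0-9][0-9]*\)m/\1*60+/g' \
        -e 's/\([0-9][0-9]*\)s/\1+/g')
    echo $(( ${sum}0 ))
}

to_seconds 4d1s    # prints 345601
to_seconds 4d      # prints 345600
to_seconds 0s      # prints 0
```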

Assigning buffers for cnfs storage

For the cnfs storage method, the buffers that articles cycle through are defined in the /etc/news/cycbuff.conf file. Here are some entries from the cycbuff.conf file that define the buffers used by the methods previously described:

# The order of lines in this file is not important among the same item.
# But all cycbuff items should be presented before any metacycbuff item.
   
# 1. Cyclic buffers
   
cycbuff:ONE:/export/cycbuffs/one:512000
cycbuff:TWO:/export/cycbuffs/two:512000
cycbuff:THREE:/export/cycbuffs/three:512000
   
# 2. Meta-cyclic buffers
   
metacycbuff:BIGAREA:ONE,TWO
metacycbuff:SMALLAREA:THREE

In the cycbuff.conf file, all cyclic buffer (cycbuff) entries should appear before any metacyclic buffer (metacycbuff) entries. The second field of a cycbuff entry is the buffer's name. In this example, the three buffers are named ONE, TWO, and THREE. (Each buffer name is later assigned to a metacyclic buffer.) The third field in each cycbuff entry is the path to the buffer file. The last field is the size of the buffer in kilobytes (1K equals 1024 bytes).

In the metacycbuff entries, the second field contains the symbolic name of the metacyclic buffer (the name used in the options entries of the storage.conf file). The third field is a comma-separated list of the cycbuff names that make up that metacyclic buffer. In this example, BIGAREA spans buffers ONE and TWO, and SMALLAREA uses buffer THREE.
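To see how the two kinds of entries fit together, the shell function below reads a file in the format shown above and reports the total size of each metacyclic buffer by adding up the sizes of its member cyclic buffers. It is a hypothetical helper, not an INN tool, and it assumes the simple four-field cycbuff lines used in this example, listed before the metacycbuff lines as the file requires.

```shell
# meta_sizes FILE
# Hypothetical helper: for each metacycbuff line, add up the sizes (in KB)
# of the cycbuff entries it lists. Comment and blank lines are ignored
# because they match neither pattern.
meta_sizes() {
    awk -F: '
        /^cycbuff:/     { size[$2] = $4 }
        /^metacycbuff:/ {
            n = split($3, members, ",")
            total = 0
            for (i = 1; i <= n; i++) total += size[members[i]]
            print $2, total "K"
        }
    ' "$1"
}

# For the sample entries above, this prints:
#   BIGAREA 1024000K
#   SMALLAREA 512000K
# meta_sizes /etc/news/cycbuff.conf
```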

You can also add optional entries to this file, such as the following, to affect buffering:

  • cycbuffupdate: Sets how many articles are stored between header updates. The default value is 25.

  • refreshinterval: Sets the number of seconds between the time a cycbuff header is read and the time it is reread. The default value is 30.

Creating buffers for cnfs storage

You can use the dd command to create a large file on your regular file system to serve as a buffer. Here is an example of the dd command for creating a buffer file:

$ dd if=/dev/zero of=/var/spool/news/articles/cycbuff bs=32k count=N

In this example, replace N with the size of the buffer you want, in kilobytes, divided by 32 (because bs=32k writes the file in 32K blocks).
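Because the size field in cycbuff.conf is in kilobytes and the dd example writes 32K blocks, the count is simply the kilobyte size divided by 32. The hypothetical helper below does that arithmetic and refuses sizes that are not a multiple of 32K. It assumes you want the file to be exactly as large as the size given on its cycbuff line; for the 512000K buffers in the earlier example, the count is 16000.

```shell
# cycbuff_count SIZE_KB
# Hypothetical helper: number of 32K blocks dd must write to create a
# buffer of SIZE_KB kilobytes (the unit used in cycbuff.conf).
cycbuff_count() {
    size_kb=$1
    if [ $(( size_kb % 32 )) -ne 0 ]; then
        echo "cycbuff_count: $size_kb is not a multiple of 32" >&2
        return 1
    fi
    echo $(( size_kb / 32 ))
}

cycbuff_count 512000    # prints 16000
# Example use (run as a user allowed to write the target directory):
# dd if=/dev/zero of=/export/cycbuffs/one bs=32k count=$(cycbuff_count 512000)
```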

The buffer file you create must be owned by the news user and the news group, and its permission mode should be 0664 or 0660. For example:

$ chown news /var/spool/news/articles/cycbuff
$ chgrp news /var/spool/news/articles/cycbuff
$ chmod 0664 /var/spool/news/articles/cycbuff



Part IV: Red Hat Linux Network and Server Setup