10.2 WU-BLAST Installation

Obtaining WU-BLAST is slightly more complicated than obtaining NCBI-BLAST because it requires a license from Washington University in St. Louis. If you are affiliated with an academic institution or a nonprofit organization, the license is free. If you are part of a for-profit enterprise, you must pay a licensing fee. The fee is high by shrink-wrapped software standards but is similar to that of other bioinformatics software packages available from universities. If you find the cost prohibitive, an earlier version of WU-BLAST is available for free. The free version has fewer features and is available for a limited number of operating systems, but it works just fine for most people. If your operating system isn't supported and your specific use doesn't require gapped alignment, a free version of the classic, ungapped BLAST with public domain source code also exists. This older version, 1.4.9, is nearly identical to NCBI-BLAST Version 1.4, which is no longer available from the NCBI.

To license WU-BLAST or download the free versions, visit the official WU-BLAST site at http://blast.wustl.edu. The free versions can be downloaded with a couple of clicks, but the licensed version requires more patience. After the license is issued, you will be sent a user-specific URL from which to download the software. It's a good idea to save this information because you will use it again to download the free updates. Licensed users are notified by email as new features are added (usually a few times per year).

WU-BLAST is available only for Unix operating systems. If you don't have access to a Unix computer, you can run Linux or FreeBSD under a virtual machine with products such as VMware (http://www.vmware.com) or Virtual PC (http://www.connectix.com).

10.2.1 Expanding the tarball

The software comes as a compressed Unix archive, or tarball. First, create a directory such as /usr/local/pkg/wu-blast; if you don't have root access, create a wu-blast directory inside your home directory. Next, download the tarball to that directory. If you do this from a browser, the files may be extracted automatically. If not, use the following command, where your_platform_name will be something like linuxi686.tar.Z:

tar -xzf blast2.your_platform_name

Not all versions of tar support the -z option. If yours doesn't, use the following command line instead:

zcat blast2.your_platform_name | tar -xf -
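
The two extraction commands can be folded into one small helper that tries tar's built-in decompression first and falls back to a pipe. This is only a sketch: the function name unpack_tarball is made up for illustration, and it uses gzip -dc in place of zcat on the assumption that gzip is installed (it reads both .gz and .Z files).

```shell
#!/bin/sh
# unpack_tarball ARCHIVE DEST_DIR   (hypothetical helper, not part of WU-BLAST)
# Extract a compressed tarball into DEST_DIR, creating the directory if needed.
unpack_tarball() {
    archive=$1
    dest=$2
    mkdir -p "$dest" || return 1

    # First try tar's own decompression (-z); not every tar supports it.
    if (cd "$dest" && tar -xzf -) < "$archive" 2>/dev/null; then
        return 0
    fi

    # Fall back to decompressing on the fly and piping the result into tar.
    gzip -dc < "$archive" | (cd "$dest" && tar -xf -)
}
```

A call would look like unpack_tarball blast2.your_platform_name "$HOME/wu-blast", with the archive name replaced by the file you actually downloaded.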

Before you continue with the rest of the installation procedures, look at what's inside the tarball.

10.2.2 Files and Directories

There are a number of files and two subdirectories. The most important items are described very briefly in Table 10-2 in logical, rather than alphabetical, order. See the WU-BLAST reference in Chapter 14 for more information.

Table 10-2. WU-BLAST files and directories

blasta

The WU-BLAST executable. Unlike the free version, which comes with five different BLAST executables, the licensed version has only one.

blastn, blastp, blastx, tblastn, tblastx

Symbolic links (aliases) to blasta. blasta figures out what kind of program to run based on the name of the symbolic link.

xdformat

Executable for formatting both nucleotide and protein databases.

xdget

Executable that allows you to retrieve sequences by accession number from a WU-BLAST database.

nrdb, patdb

Programs used to create nonredundant databases. nrdb keeps only unique sequences and concatenates the descriptions of identical sequences. patdb goes a little further and removes sequences that are perfect substrings of other sequences.

gb2fasta, gt2fasta, pir2fasta, sp2fasta

Programs to convert GenBank, SwissProt, and PIR files to FASTA files. gb2fasta extracts the nucleotides, and gt2fasta extracts the proteins.

filter

Directory containing the complexity filtering programs used by WU-BLAST (seg, dust, and xnu).

matrix

Directory containing two subdirectories, aa and nt, which contain, respectively, the amino acid and nucleotide scoring matrices. Amino acid matrices such as BLOSUM 62 are single files, but each nucleotide matrix exists in two forms, with the extension 4.2 or 4.4, corresponding to 4- and 16-symbol matrices, respectively.

setdb, pressdb

Executables used to format protein and nucleotide databases. The xdformat executable replaces these programs, but they are included for those who prefer the old interface or require compatibility with older executables.

wu-blastall, wu-formatdb

Perl scripts that mimic the NCBI-BLAST command-line interface while executing the WU-BLAST counterparts.

sysblast.sample

Configuration file that allows administrators to enforce system-level resource limitations on BLAST jobs.

10.2.3 Executables

Let's assume the tarball has been unpacked in /usr/local/pkg/wu-blast, and you normally keep your executables in /usr/local/bin. Issue the following commands to put the executables in your path:

ln -s /usr/local/pkg/wu-blast/blasta /usr/local/bin/blastn
ln -s /usr/local/pkg/wu-blast/blasta /usr/local/bin/blastp
ln -s /usr/local/pkg/wu-blast/blasta /usr/local/bin/blastx
ln -s /usr/local/pkg/wu-blast/blasta /usr/local/bin/tblastn
ln -s /usr/local/pkg/wu-blast/blasta /usr/local/bin/tblastx
ln -s /usr/local/pkg/wu-blast/xdformat /usr/local/bin
ln -s /usr/local/pkg/wu-blast/xdget /usr/local/bin

Note that, unlike the NCBI program blastall, blasta cannot be run under its own name. It must be invoked through one of its aliases.
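
To see how a single executable can behave differently depending on the name of the link used to start it, consider the following demonstration. It does not use blasta at all: the dispatcher script is a hypothetical stand-in written for this example, and the real program's internals may differ. Everything happens in a temporary directory, so nothing is installed.

```shell
#!/bin/sh
# Demonstration only: "dispatcher" is a stand-in script, not blasta.
demo=$(mktemp -d)

cat > "$demo/dispatcher" <<'EOF'
#!/bin/sh
# Decide what to do from the name this script was invoked under.
case $(basename "$0") in
    blastn|blastp|blastx|tblastn|tblastx)
        echo "running as $(basename "$0")" ;;
    *)
        echo "error: invoke through a program-name link" >&2
        exit 1 ;;
esac
EOF
chmod +x "$demo/dispatcher"

# Create one symbolic link per program name, all pointing at the same file.
for prog in blastn blastp blastx tblastn tblastx; do
    ln -s "$demo/dispatcher" "$demo/$prog"
done

"$demo/blastp"                                   # prints: running as blastp
"$demo/dispatcher" || echo "direct invocation refused"
```

The same loop, pointed at the real blasta and your own bin directory, is a compact alternative to typing the five ln commands by hand.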

10.2.4 Environment Variables

You'll need to set three environment variables: BLASTDB, BLASTMAT, and BLASTFILTER. These variables correspond to the locations of the databases, scoring matrices, and complexity filters, respectively. Like the PATH variable for executables, each one holds a colon-delimited list of locations. This is especially useful for database files, which can be placed in several locations in the filesystem and then accessed by name rather than by explicit path. It also lets computers read databases from a networked server or from their local disks without the user noticing the difference. If BLASTDB isn't set, blasta looks in the current directory and in /usr/ncbi/blast/db. In that case, FASTA databases of the same name (or symbolic links to such databases) must be present in one of those locations. It's generally a better idea to use the BLASTDB variable because this strategy uses less disk space and is much less confusing.
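
The lookup described above can be pictured with a short shell function. This is a sketch of the idea, not WU-BLAST's actual code: the name find_db is invented for illustration, it assumes the first matching location wins (as with PATH), and it simply tests for a file with the requested name.

```shell
#!/bin/sh
# find_db NAME   (hypothetical helper, not part of WU-BLAST)
# Print the first location in the search list that contains NAME.
find_db() {
    name=$1
    # Fall back to the two default locations when BLASTDB is unset or empty.
    search=${BLASTDB:-.:/usr/ncbi/blast/db}

    old_ifs=$IFS
    IFS=:                      # split the list on colons, like PATH
    for dir in $search; do
        if [ -e "$dir/$name" ]; then
            IFS=$old_ifs
            printf '%s\n' "$dir/$name"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1                   # not found in any location
}
```

With BLASTDB set to something like /data/local:/net/server/blastdb (both paths are placeholders), find_db nr would report the local copy if one exists and the networked copy otherwise.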

Two environment variables, BLASTMAT and BLASTFILTER, must be set so blasta can find the scoring matrices and complexity filters. These variables also use colon-delimited lists, but there's little reason to have them in more than one location.

Now set the BLASTMAT and BLASTFILTER environment variables to the explicit paths of the matrix and filter directories (we'll assume that the software was unpacked in /usr/local/pkg/wu-blast). Here's how to do so in csh and its derivatives:

setenv BLASTMAT /usr/local/pkg/wu-blast/matrix
setenv BLASTFILTER /usr/local/pkg/wu-blast/filter

And in sh and its derivatives:

BLASTMAT=/usr/local/pkg/wu-blast/matrix; export BLASTMAT
BLASTFILTER=/usr/local/pkg/wu-blast/filter; export BLASTFILTER
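
Once the variables are set, a quick check can confirm that they point at real directories before you run your first search. The function below is a minimal sketch for sh-style shells. Its name, check_blast_env, is invented for this example, and it assumes each variable holds a single directory rather than a colon-delimited list.

```shell
#!/bin/sh
# check_blast_env   (hypothetical helper, not part of WU-BLAST)
# Return 0 only if BLASTMAT and BLASTFILTER each name an existing directory.
check_blast_env() {
    status=0
    for var in BLASTMAT BLASTFILTER; do
        # Look up the value of the variable whose name is stored in $var.
        eval "value=\${$var:-}"
        if [ -z "$value" ]; then
            echo "$var is not set" >&2
            status=1
        elif [ ! -d "$value" ]; then
            echo "$var points to a missing directory: $value" >&2
            status=1
        fi
    done
    return $status
}
```

Calling check_blast_env from a login script or a job wrapper gives an early, readable warning instead of a failure partway through a BLAST run.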


10.2.5 Setting Resource Limits with /etc/sysblast

WU-BLAST has a special file called /etc/sysblast that sets systemwide resource limitations for each machine running BLAST jobs. The /etc/sysblast file currently supports three commands: nice, cpus, and cpusmax. The nice value gives BLAST processes a lower priority (nice values are generally in the range of 1 to 20, with 20 being the least demanding). If the computer is also used for other jobs, such as a workstation, setting nice to 5 keeps the machine responsive while still letting BLAST jobs take over at idle times. The cpus value is the default number of CPUs to use, and cpusmax defines the maximum number of CPUs allowed. These two should be set on any large, multiprocessor machine. Here is a sample /etc/sysblast file:

nice = 5
cpus = 1
cpusmax = 4
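
Before copying a file like this into place, it can be worth checking that the numbers make sense together. The functions below are a sketch under two assumptions: that every setting is written as name = number on its own line, as in the sample, and that a default above the maximum is a mistake. The names sysblast_value and check_sysblast are invented for illustration.

```shell
#!/bin/sh
# sysblast_value FILE KEY   (hypothetical helper)
# Print the numeric value assigned to KEY; the last assignment wins.
sysblast_value() {
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*\([0-9][0-9]*\)[[:space:]]*\$/\1/p" "$1" | tail -n 1
}

# check_sysblast FILE   (hypothetical helper)
# Succeed only if cpus and cpusmax are both present and cpus <= cpusmax.
check_sysblast() {
    cpus=$(sysblast_value "$1" cpus)
    cpusmax=$(sysblast_value "$1" cpusmax)
    [ -n "$cpus" ] && [ -n "$cpusmax" ] && [ "$cpus" -le "$cpusmax" ]
}
```

Run against the sample file, sysblast_value reports 5, 1, and 4 for nice, cpus, and cpusmax, and check_sysblast succeeds.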

The behavior of WU-BLAST on multiprocessor systems is worth discussing, and if you're one of the lucky people who have access to a computer with 16 processors or more, /etc/sysblast will definitely help you. WU-BLAST lets users control the number of CPUs with the -cpus command-line parameter. If this parameter isn't given an explicit value, the programs use all the processors in the computer (except for BLASTN, which reins itself in at four processors). While this may be good for BLAST users, your other users may not be so happy. This is where the /etc/sysblast file is critical because it allows you to modify the default behavior and set limits for CPU usage.
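
The interplay between the command line and /etc/sysblast can be summarized as a small decision procedure. The sketch below is a reading of the description above, not WU-BLAST's own logic. In particular, it assumes that a request above cpusmax is silently capped, which the text does not spell out. The function name and argument order are invented for illustration, and an empty string means "not specified".

```shell
#!/bin/sh
# effective_cpus PROGRAM NCPU REQUESTED SYS_DEFAULT SYS_MAX   (hypothetical)
# Print the number of CPUs a job would use under the assumptions stated above.
effective_cpus() {
    program=$1 ncpu=$2 requested=$3 sys_default=$4 sys_max=$5

    if [ -n "$requested" ]; then
        n=$requested                      # explicit -cpus value
    elif [ -n "$sys_default" ]; then
        n=$sys_default                    # cpus line in /etc/sysblast
    elif [ "$program" = blastn ] && [ "$ncpu" -gt 4 ]; then
        n=4                               # BLASTN limits itself to four
    else
        n=$ncpu                           # everything the machine has
    fi

    # Assumed behavior: cpusmax caps whatever was chosen above.
    if [ -n "$sys_max" ] && [ "$n" -gt "$sys_max" ]; then
        n=$sys_max
    fi
    echo "$n"
}
```

On a 16-processor machine with the sample /etc/sysblast file, effective_cpus blastp 16 "" 1 4 prints 1, and a user asking for eight CPUs with effective_cpus blastp 16 8 1 4 gets 4.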