14.3 WU-BLAST Parameters

WU-BLAST has many control parameters, some of which are esoteric and rarely useful. The most important parameters are listed here.

altscore=[string]

Default: Off

Defines an alternate scoring system for any pair of letters. For example, altscore="M M -3" changes the score of M-M pairs to -3, and altscore="A C 4" gives a score of 4 if the query is A and the subject is C. Letters may be designated as any to change an entire row or column. The score can be given as min or max for the minimum and maximum scores in the matrix or na to make the score infinitely low. To set the score of all rows and columns containing stop codons to negative infinity, set altscore="* any na" and altscore="any * na". If you change the scoring parameters, you may also want to adjust gapL, gapH, and gapK.

See also

nogap, gapL, gapH, gapK
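
The effect of an altscore setting on a scoring matrix can be sketched in a few lines of Python. This is only an illustration of the rules described above, not WU-BLAST's own code: the function name apply_altscore, the dictionary representation of the matrix, and the toy three-letter matrix are all assumptions made for the example.

```python
import math

NA = -math.inf  # "na" makes a score infinitely low


def apply_altscore(matrix, spec):
    """Return a copy of `matrix` with one altscore-style override applied.

    `matrix` maps (query_letter, subject_letter) -> score.
    `spec` is a string such as "M M -3", "* any na", or "A C max".
    """
    query, subject, value = spec.split()
    if value == "na":
        score = NA
    elif value in ("min", "max"):
        finite = [s for s in matrix.values() if s != NA]
        score = min(finite) if value == "min" else max(finite)
    else:
        score = int(value)
    updated = dict(matrix)
    for (q, s) in matrix:
        # "any" selects a whole row (query side) or column (subject side).
        if query in ("any", q) and subject in ("any", s):
            updated[(q, s)] = score
    return updated


# Toy 3-letter matrix with hypothetical values (not BLOSUM62).
letters = "AM*"
toy = {(q, s): (4 if q == s else -1) for q in letters for s in letters}
toy = apply_altscore(toy, "M M -3")
toy = apply_altscore(toy, "* any na")
toy = apply_altscore(toy, "any * na")
```

After the three overrides, the M-M pair scores -3 and every pair involving a stop codon scores negative infinity, while all other entries keep their original values.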

B=[integer]

Default: 250

Sets the number of database hits to report. A warning is issued if this number is exceeded. It is typical to set this parameter to a very high value, such as B=100000, to ensure that no alignments are missed.

bottom

Default: Off

Programs: blastn, tblastx, blastx

Search only the bottom strand of the query.

See also

top

cpus=[integer]

Default: 4 for blastn; all for blastp, blastx, tblastn, and tblastx

Sets the number of processors to use. If not set, blastn limits itself to 4 processors, and the other programs may use all processors on the system. See Chapter 10 for information on the /etc/sysblast file used for setting systemwide resource limitations.

dbrecmax=[integer]

Default: Last database record

Last database record number to search.

See also

dbrecmin, qrecmin, qrecmax

dbrecmin=[integer]

Default: 1

First database record number to search. For example, by setting dbrecmin=1 dbrecmax=10, only the first 10 database sequences are searched.

See also

dbrecmax, qrecmin, qrecmax

E=[number]

Default: 10

This is the E from the Karlin-Altschul equation. Database hits whose E-value is greater than this threshold will not be reported. If both E and S are set, the more restrictive parameter is used.

See also

S

E2=[number]

Default: Variable; calculated from scoring parameters

Sets the alignment threshold for ungapped alignments. When E2 and S2 are set, the more restrictive parameter is used.

See also

S2, gapE2, gapS2

echofilter

Default: Off

Prints out the query sequence after all filtering is performed. This is useful for troubleshooting when there are no database hits, and you suspect the filtering is too aggressive.

See also

filter, wordmask, maskextra

errors

Default: Off

Suppress nonfatal error messages. It is generally a good idea to pay attention to the error messages, but at times it is useful to block them.

See also

nonnegok, novalidctxok

filter=[string]

Default: Off

Processes the query sequence with the specified filtering method. Letters are replaced with X and N for proteins and nucleotides, respectively.

seg

Identifies low-complexity regions in both nucleotide and amino acid sequences.

dust

The standard low-complexity filter for nucleotide sequences. Generally less sensitive than seg.

xnu

Finds short repeats in protein sequences.

seg+xnu

Combines both seg and xnu.

ccp

Coiled-coil filter for proteins.

Multiple filtering methods may be specified on the same command line; for example:

blastp nr query filter=seg filter=ccp filter=xnu

See also

echofilter, maskextra, wordmask

gapE2=[number]

Default: Variable; calculated from scoring parameters

Expectation threshold for saving individual gapped alignments. When gapE2 and gapS2 are set, the more restrictive parameter is used.

See also

gapS2, E2, S2

gapH=[number]

Default: Variable; depends on scoring parameters

Sets the value of H (information per aligned letter) for gapped alignments. If a particular combination of scoring matrix (or match/mismatch scores) and gap values doesn't already have precomputed values for gapH, gapK, and gapL, WU-BLAST uses ungapped statistics. In this case, the resulting E-values may be much too low. A warning is issued when this is the case. Computing proper values for gapped Karlin-Altschul parameters requires simulations with random sequences that determine what ungapped scoring scheme is most similar to the gapped scoring scheme.

See also

H, K, gapK, L, gapL, warnings

gapK=[number]

Default: Variable; depends on scoring parameters

Sets the value of the Karlin-Altschul K parameter for gapped alignments. See the description for gapH.

See also

H, gapH, K, L, gapL

gapL=[number]

Default: Variable; depends on scoring parameters

Sets the value of the Karlin-Altschul parameter lambda (information per unit score) used for gapped alignments. See the description for gapH.

See also

H, gapH, K, gapK, L

gapS2=[integer]

Default: Variable; calculated from scoring parameters

Score threshold for saving individual gapped alignments. Alignments below the threshold aren't reported.

See also

gapE2

gapsepqmax=[int]

Default: Unlimited

Maximum separation allowed between gapped alignments along the query.

See also

gapsepsmax, hspsepqmax, hspsepsmax

gapsepsmax=[int]

Default: Unlimited

Maximum separation allowed between gapped alignments along the subject.

See also

gapsepqmax, hspsepqmax, hspsepsmax

gapX=[integer]

Default: Variable; depends on scoring parameters

Sets the alignment extension cutoff for gapped alignment.

See also

X

gi

Default: Off

Displays the GenInfo identifiers of database hits, if present.

golf=[number]

Default: 0.1

Maximum fractional length overlap for gapped alignment consistency. See the description for olf.

golmax=[integer]

Default: Unlimited

Maximum absolute length of overlap for gapped alignment consistency. See the description for olf.

gspmax=[integer]

Default: 1000

Sets the maximum number of gapped alignments per subject sequence. gspmax is bounded by hspmax. A value of 0 implies no limit.

See also

hspmax

H=[number]

Default: Variable; depends on scoring parameters

Sets the value of the Karlin-Altschul parameter H.

See also

gapH, K, gapK, L, gapL

hspmax=[integer]

Default: 1000

Sets the maximum number of ungapped alignments per subject sequence. A warning is issued if this limit is exceeded. A value of 0 implies no limit.

See also

gspmax

hitdist=[integer]

Default: 0, off

Maximum distance between word hits for the two-hit seeding algorithm. WU-BLAST uses one-hit seeding by default.

hspsepqmax=[int]

Default: Unlimited

Maximum separation allowed between alignments along the query.

hspsepsmax=[int]

Default: Unlimited

Maximum separation allowed between alignments along the subject.

K=[number]

Default: Variable; depends on scoring parameters

Sets the value for K from the Karlin-Altschul equation.

See also

gapK, H, gapH, L, gapL

kap

Default: Off

Assesses individual alignment scores with Karlin-Altschul statistics rather than using sum statistics on groups of alignments.

L=[number]

Default: Variable; depends on scoring parameters

Sets lambda (nats per unit score) from the Karlin-Altschul equation.

See also

gapL, H, gapH, K, gapK

lcfilter

Default: Off

Filters lowercase letters in the query sequence. The lowercase letters are treated as if they had been filtered out by one of the filtering programs.

See also

echofilter, filter, wordmask, lcmask

lcmask

Default: Off

Masks lowercase letters in the query sequence for seeding only. Lowercase letters in the query sequence aren't used in the initial word search but are available for alignment during the extension stage. This is known as soft masking.

See also

echofilter, filter, wordmask, lcfilter
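
The difference between lcfilter (hard masking) and lcmask (soft masking) can be illustrated with a small Python sketch. The function names are hypothetical, and the rule that a word is skipped for seeding if it contains any lowercase letter is an assumption of this example rather than a documented detail of WU-BLAST.

```python
def hard_mask_lowercase(query, mask_char="N"):
    """lcfilter-style masking: lowercase letters are replaced outright.

    Use mask_char="X" for protein queries and "N" for nucleotide queries.
    """
    return "".join(mask_char if c.islower() else c for c in query)


def soft_mask_seed_starts(query, w):
    """lcmask-style masking: return 0-based start positions of words that
    may be used for seeding. The sequence itself is left untouched, so
    extension can still run through the lowercase region.
    """
    return [
        i
        for i in range(len(query) - w + 1)
        if not any(c.islower() for c in query[i:i + w])
    ]


query = "ACGTacgtACGT"
hard = hard_mask_lowercase(query)         # letters are lost
seeds = soft_mask_seed_starts(query, 4)   # letters are kept, seeds restricted
```

With hard masking the lowercase letters are no longer available to any stage of the search. With soft masking only the choice of seed words changes.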

links

Default: Off

Displays group information. Parentheses indicate the placement of the reported alignment in the group. The following example shows a group of three alignments. The alignment being reported, with a score of 159, is the second alignment of the group and the last one in the chain.

Score = 159 (61.0 bits), Sum P(3) = 2.7e-38
Identities = 26/39 (66%), Positives = 32/39 (82%)
Links = 1-3-(2)

See also

topcomboN
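
When post-processing reports, the Links line can be split into the chain order and the alignment currently being reported. The following Python sketch assumes the line has exactly the layout shown in the example above; parse_links is a hypothetical helper, not part of any BLAST toolkit.

```python
def parse_links(line):
    """Parse a line such as 'Links = 1-3-(2)'.

    Returns (chain, current): the alignment numbers in chain order, and
    the number shown in parentheses (None if there is none).
    """
    _, _, value = line.partition("=")
    chain, current = [], None
    for token in value.strip().split("-"):
        if token.startswith("(") and token.endswith(")"):
            current = int(token[1:-1])
            chain.append(current)
        else:
            chain.append(int(token))
    return chain, current


chain, current = parse_links("Links = 1-3-(2)")
position_in_chain = chain.index(current) + 1  # 1-based position
```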

M=[integer]

Default: +5 blastn

Sets the match score. This parameter is usually used for blastn only but may be used for other programs.

See also

N

maskextra=[integer]

Default: Off

Extends masking an extra distance of [integer] letters.

See also

echofilter, filter, wordmask, lcfilter, lcmask
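
As a rough illustration of what extending a mask means, the sketch below grows each masked interval by a fixed number of letters and merges intervals that come to overlap. The half-open (start, end) interval representation, the extension on both sides, and the name extend_mask are assumptions made for this example.

```python
def extend_mask(intervals, extra, seq_len):
    """Grow each masked (start, end) interval by `extra` letters per side.

    Intervals are 0-based and half-open. The result is clipped to the
    sequence and overlapping intervals are merged.
    """
    grown = sorted(
        (max(0, start - extra), min(seq_len, end + extra))
        for start, end in intervals
    )
    merged = []
    for start, end in grown:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


masked = extend_mask([(10, 20), (24, 30)], extra=3, seq_len=32)
```

In this example the two masked regions are only four letters apart, so extending each by three letters joins them into a single region.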

matrix=[file]

Default: BLOSUM62

Programs: blastp, blastx, tblastn, tblastx

Specifies a scoring matrix file. The default is BLOSUM62. A large number of scoring matrices are distributed with WU-BLAST in the matrix/aa directory. Nucleotide matrices for use with blastn are in matrix/nt.

N=[integer]

Default: -4 blastn

Sets the mismatch score. This parameter is usually used for blastn only but may be used for other programs.

See also

M

nogap

Default: Off

Turns off gapped alignment. This parameter is useful in conjunction with altscore to prevent stop codons from appearing in alignments.

See also

altscore

nonnegok

Default: Off

Under Karlin-Altschul statistics, the expected score must be negative. WU-BLAST normally exits with a fatal error if this isn't the case. Sometimes scoring schemes with positive expected scores are useful, and setting nonnegok silences the error condition.

See also

novalidctxok, errors
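
Whether a scoring scheme has a negative expected score can be checked by weighting every letter pair by its probability. The Python sketch below assumes independent letters with the same frequencies in query and subject and uses uniform nucleotide frequencies, which is a simplification. The helper names are hypothetical.

```python
def expected_score(freqs, score):
    """Expected score per aligned pair: sum of p(a) * p(b) * score(a, b)."""
    return sum(
        pa * pb * score(a, b)
        for a, pa in freqs.items()
        for b, pb in freqs.items()
    )


def match_mismatch(m, n):
    """Build a blastn-style scoring function from M and N values."""
    return lambda a, b: m if a == b else n


uniform = {base: 0.25 for base in "ACGT"}  # assumed letter frequencies
blastn_like = expected_score(uniform, match_mismatch(5, -4))
too_generous = expected_score(uniform, match_mismatch(5, -1))
```

With M=5 and N=-4 the expected score is negative, as the statistics require. With M=5 and N=-1 it becomes positive, which is the situation in which nonnegok would be needed.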

nosegs

Default: Off

WU-BLAST doesn't allow alignments to cross hyphen characters that act as query segment boundaries (e.g., for draft sequence). nosegs effectively converts hyphens to Ns.

notes

Default: Off

Suppresses informational messages. For example, if you are intentionally searching for a low-complexity sequence, you may wish to disable the message that suggests that a low-complexity filter would help remove meaningless alignments.

See also

errors, warnings

novalidctxok

Default: Off

If a sequence can't generate any significant HSPs, WU-BLAST normally exits with an error that says there are no valid contexts. You may encounter such an error when searching a collection of sequencing reads, some of which are mostly (or completely) Ns. Setting novalidctxok allows you to continue without error.

See also

nonnegok, errors

nwlen=[integer]

Default: End of sequence

Sets the length of region for seeding.

See also

nwstart

nwstart=[integer]

Default: 1

Sets the starting position for seeding alignments. nwstart and nwlen indicate that a specific region of the query should be seeded. Alignments may extend outside of this region. For example, nwstart=500 nwlen=200 seeds positions 500 to 700 of the query sequence.

See also

nwlen

o=[file]

Default: stdout

Write results to this file instead of to stdout (the screen).

olf=[number]

Default: 0.125

Maximum fractional length of overlap for alignment consistency.

Consistent alignments must be ordered and have minimal overlap (see Chapter 5). The amount of permitted overlap is expressed as both a relative fraction and an absolute number. A setting of 0.1, for example, prevents alignments whose overlap length is more than 10 percent of the length of either alignment from being placed in the same group. The golf parameter plays the same role for gapped alignments. The olmax and golmax parameters control the absolute length of the overlap.
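
The overlap test can be sketched as follows. This is a simplified reading of the description above that looks at one sequence only (query or subject) and uses inclusive coordinates. The function name and argument layout are assumptions of the example.

```python
def overlap_allowed(a, b, olf, olmax=None):
    """Decide whether two alignments overlap little enough to be grouped.

    `a` and `b` are (start, end) pairs with inclusive coordinates on the
    same sequence. `olf` is the maximum fractional overlap and `olmax`
    the maximum absolute overlap (None means unlimited).
    """
    overlap = min(a[1], b[1]) - max(a[0], b[0]) + 1
    if overlap <= 0:
        return True  # no overlap at all
    if olmax is not None and overlap > olmax:
        return False
    # The fraction must hold for either alignment, so the shorter one
    # is the binding case.
    shorter = min(a[1] - a[0] + 1, b[1] - b[0] + 1)
    return overlap <= olf * shorter


small = overlap_allowed((1, 100), (96, 200), olf=0.1)            # 5 letters
large = overlap_allowed((1, 100), (81, 200), olf=0.1)            # 20 letters
capped = overlap_allowed((1, 100), (96, 200), olf=0.1, olmax=3)  # over olmax
```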

olmax=[integer]

Default: Unlimited

Maximum absolute length of overlap for alignment consistency. See the description for olf.

postsw

Default: Off

Programs: blastp

Performs Smith-Waterman alignment after initial BLAST alignment to return the single maximum-scoring pair rather than several high-scoring pairs.

Q=[integer]

Default: 10 blastn, 9 blastp, blastx, tblastn, tblastx

Sets the cost for the first gap character.

See also

R

qoffset=[integer]

Default: 0

Adjusts the query numbering by this amount. For example, if you search with a sequence that is known to have vector sequence in the first 25 bases, setting this parameter to 25 bases your numbering on the insert sequence.

qrecmax=[integer]

Default: 1

Last query sequence to search. See the description for qrecmin.

qrecmin=[integer]

Default: 1

By default, WU-BLAST produces one BLAST report for each query sequence in a FASTA file with multiple sequences. Setting qrecmin and qrecmax allows you to select a subset of query sequences in much the same way as dbrecmin and dbrecmax.

See also

qrecmax, dbrecmin, dbrecmax

R=[integer]

Default: 10 blastn, 2 blastp, blastx, tblastn, tblastx

Sets the cost for the second and remaining gap characters.

See also

Q
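
Taken together, Q and R define an affine gap cost: the first gap character costs Q and each further character costs R. A minimal Python sketch, using the default values listed above as example inputs (gap_cost is a hypothetical helper name):

```python
def gap_cost(length, q, r):
    """Total cost of a gap of `length` characters.

    The first gap character costs q; every remaining character costs r.
    """
    if length < 1:
        return 0
    return q + r * (length - 1)


protein_gap = gap_cost(5, q=9, r=2)        # 9 + 4 * 2
nucleotide_gap = gap_cost(3, q=10, r=10)   # 10 + 2 * 10
```

With the protein defaults, a long gap is much cheaper per character than several short ones. With the blastn defaults, every gap character costs the same.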

restest

Default: Off

The blastp and blastx statistical tests are based on the number of residues (letters) in the database. If Z is set in conjunction with restest, the blastn, tblastn, and tblastx tests are also based on the number of letters.

See also

seqtest, Z

S=[integer]

Default: Variable; calculated from E

Sets the final score threshold. Since S and E are interconvertible through the Karlin-Altschul equation, setting S effectively sets E, and vice versa. When both are set, the more restrictive one is used.

See also

E
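
The relationship between S and E can be made concrete with the Karlin-Altschul equation, E = K m n e^(-lambda S), where m and n are the query and database sizes. The Python sketch below is for illustration only: the values of K, lambda, m, and n are hypothetical, and the function names are not part of WU-BLAST.

```python
import math


def expect_from_score(score, k, lam, m, n):
    """E = K * m * n * exp(-lambda * S)."""
    return k * m * n * math.exp(-lam * score)


def score_from_expect(expect, k, lam, m, n):
    """Invert the equation: S = ln(K * m * n / E) / lambda."""
    return math.log(k * m * n / expect) / lam


def passes(score, e_cutoff, s_cutoff, k, lam, m, n):
    """Apply both thresholds; the more restrictive one decides."""
    return (
        score >= s_cutoff
        and expect_from_score(score, k, lam, m, n) <= e_cutoff
    )


# Hypothetical parameter values chosen only for illustration.
K, LAM, M_LEN, N_LEN = 0.13, 0.32, 300, 1_000_000
s_for_e10 = score_from_expect(10.0, K, LAM, M_LEN, N_LEN)
```

Because the score threshold is an integer, a real search would round the converted value up to the next whole score.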

S2=[integer]

Default: Variable; depends on scoring parameters

Score threshold for individual ungapped alignments. If both S2 and E2 are set, the more restrictive one is used.

See also

E2, gapS2, gapE2

seqtest

The blastn, tblastn, and tblastx statistical tests are based on the number of sequences in the database. If Z is set in conjunction with seqtest, the blastp and blastx tests are also based on the number of sequences.

See also

restest, Z

span, span1, span2

Default: span2

WU-BLAST normally discards HSPs that are contained completely within a larger, higher-scoring HSP. This behavior is called span2. If span1 is set, alignments are discarded if they are contained within another alignment on either the query or the subject (unlike span2, both conditions aren't required). This is useful if the sequences contain many repeats. To keep all alignments, set span, but be aware that the output may become very large.
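
The three modes can be compared with a small Python sketch. It follows the description above literally and is not WU-BLAST's implementation. The HSP tuple, the filter_hsps function, and the use of a strictly higher score to decide which HSP survives are assumptions of the example.

```python
from collections import namedtuple

HSP = namedtuple("HSP", "q_start q_end s_start s_end score")


def _inside(start, end, outer_start, outer_end):
    return outer_start <= start and end <= outer_end


def filter_hsps(hsps, mode="span2"):
    """Drop HSPs spanned by a higher-scoring HSP.

    mode="span2": contained on both query and subject.
    mode="span1": contained on either query or subject.
    mode="span":  keep everything.
    """
    if mode == "span":
        return list(hsps)
    kept = []
    for h in hsps:
        spanned = False
        for other in hsps:
            if other is h or other.score <= h.score:
                continue
            in_q = _inside(h.q_start, h.q_end, other.q_start, other.q_end)
            in_s = _inside(h.s_start, h.s_end, other.s_start, other.s_end)
            contained = (in_q and in_s) if mode == "span2" else (in_q or in_s)
            if contained:
                spanned = True
                break
        if not spanned:
            kept.append(h)
    return kept


big = HSP(1, 100, 1, 100, 200)
inner = HSP(10, 50, 10, 50, 80)       # inside `big` on both sequences
repeat = HSP(10, 50, 300, 340, 80)    # inside `big` on the query only
hsps = [big, inner, repeat]
```

Under span2 only the fully contained HSP is removed. Under span1 the repeat copy is removed as well, and under span all three remain.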

T=[integer]

Default: 11 blastp, 12 blastx, 13 tblastn, 13 tblastx

Sets the neighborhood word threshold score. Setting this value extremely high removes neighborhood words and makes seeding require matching words. T, W, and hitdist are the most effective parameters for controlling the sensitivity and speed of BLAST searches.

See also

W, hitdist
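
The role of T can be seen by enumerating the neighborhood of a single query word. The Python sketch below uses a four-letter toy alphabet and a hypothetical scoring function rather than a real matrix. It also assumes that the query word itself always remains a valid seed, which matches the statement above that a very high T leaves only matching words.

```python
from itertools import product


def neighborhood(word, alphabet, score, t):
    """Return all words of the same length scoring at least t against `word`.

    The query word itself is always kept (an assumption of this sketch),
    so a very large t leaves only exact-match seeding.
    """
    found = []
    for letters in product(alphabet, repeat=len(word)):
        candidate = "".join(letters)
        total = sum(score(a, b) for a, b in zip(word, candidate))
        if total >= t or candidate == word:
            found.append(candidate)
    return found


def toy_score(a, b):
    # Hypothetical scores: +5 for identity, -1 otherwise (not BLOSUM62).
    return 5 if a == b else -1


loose = neighborhood("ACD", "ACDE", toy_score, t=9)
strict = neighborhood("ACD", "ACDE", toy_score, t=1000)
```

Lowering T enlarges the neighborhood, which increases sensitivity at the cost of more word hits to process.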

top

Default: Off

Programs: blastn, tblastx, blastx

Searches only the top strand of the query.

See also

bottom

topcomboN=[integer]

Default: Off

Reports the number of consistent, or collinear, HSP combinations.

V=[integer]

Default: 500

Controls the number of one-line summaries.

See also

B

warnings

Default: Off

WU-BLAST reports various warning conditions. This parameter turns them off.

See also

notes, errors

wink=[integer]

Default: 1

Words are created by sliding a window of width W by wink letters at a time. If W equals wink, words don't overlap.

See also

W, T, hitdist
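
The interaction between W and wink is easy to see in code. In this minimal Python sketch, words is a hypothetical helper that lists the words produced by sliding a window of width W in steps of wink.

```python
def words(seq, w, wink=1):
    """List the words of width w taken every `wink` letters along seq."""
    return [seq[i:i + w] for i in range(0, len(seq) - w + 1, wink)]


overlapping = words("ACGTACGT", w=4, wink=1)
disjoint = words("ACGTACGT", w=4, wink=4)
```

A larger wink yields fewer words, which speeds up seeding but can miss hits that fall between window positions.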

wordmask=[method]

Default: Off

Filters the query sequence for seeding only. Low-complexity regions in the query sequence aren't used in the initial word search but are available for alignment during the extension stage. This is called soft masking.

See also

filter, lcfilter, lcmask, echofilter, maskextra

W=[integer]

Default: 11

Sets the word size for seeding alignments.

See also

T, hitdist, wink

X=[integer]

Default: Variable; depends on scoring parameters

Controls the alignment extension cutoff for ungapped alignments.

See also

gapX
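
An extension cutoff of this kind is often called an X-drop rule: extension stops once the running score has fallen too far below the best score seen so far. The Python sketch below extends an ungapped alignment to the right only and uses match/mismatch scoring. Whether WU-BLAST stops at a drop of exactly X or only beyond it is not stated here, so the strict comparison is an assumption, as is the function name.

```python
def extend_right(query, subject, q_pos, s_pos, match, mismatch, x):
    """Extend an ungapped alignment to the right from (q_pos, s_pos).

    Stops when the running score drops more than x below the best score.
    Returns (best_score, best_length), trimmed back to the best point.
    """
    score = best = 0
    best_len = 0
    i = 0
    while q_pos + i < len(query) and s_pos + i < len(subject):
        score += match if query[q_pos + i] == subject[s_pos + i] else mismatch
        i += 1
        if score > best:
            best, best_len = score, i
        elif best - score > x:
            break
    return best, best_len


# One mismatch in the middle of otherwise identical sequences.
tolerant = extend_right("ACGTTACGT", "ACGTAACGT", 0, 0, 5, -4, x=20)
strict = extend_right("ACGTTACGT", "ACGTAACGT", 0, 0, 5, -4, x=3)
```

With a generous cutoff the extension passes through the mismatch and covers all nine letters. With a small cutoff it stops at the mismatch and reports only the first four.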

Y=[number]

Default: Variable; depends on scoring parameters

Sets the size of the query sequence.

See also

Z

Z=[number]

Default: Variable; depends on scoring parameters

Sets the size of the database in letters (restest is assumed), but Z may also be used to mean the number of sequences if seqtest is set.

See also

Y, seqtest, restest