13.6 megablast Parameters

megablast is similar to blastn but is optimized to find nearly identical sequences very quickly. It's much faster than standard blastn, partly because it uses query packing. Its extension algorithm differs from that of standard blastn and isn't designed for cross-species searches. Many parameters are identical between megablast and blastall, but some are unique to one program or the other, and some parameters with the same symbol do different things.

Here are a few example command lines:

megablast -d my_db -i my_query -F "m D"
megablast -d my_db -i my_query -D 2 -t 18 -W 11

-a [integer]

Default: 1

The number of processors; same as blastall.

-A [integer]

Default: 40

The two-hit algorithm window size; same as blastall.

-b [integer]

Default: 250

The number of database sequences to show; same as blastall, if -D 2 is set.

-d [string]

Default: nr

The database; same as blastall.

-D [0..3]

Default: 0

The type of output. The -m option applies only if -D 2 is set here.

Options

0

One-line output for each alignment in the form of:

'subject-id'=='[+-]query-id' (s_beg q_beg s_end q_end) Score

For example:

'AF071362'=='+AF071357' (1 715 200 920) 8

With non-affine gapping parameters (the default), Score is the total number of differences (mismatches + gaps); with affine gapping, it's the actual raw score.
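If you need to post-process -D 0 output, a small parser can be enough. The following is a minimal sketch that assumes every line follows exactly the layout shown above; the function name and the dictionary keys are illustrative choices, not part of megablast.

```python
import re

# Layout assumed from the example above:
# 'subject-id'=='[+-]query-id' (s_beg q_beg s_end q_end) Score
D0_LINE = re.compile(
    r"^'(?P<subject>[^']+)'=='(?P<strand>[+-])(?P<query>[^']+)'\s+"
    r"\((?P<s_beg>\d+) (?P<q_beg>\d+) (?P<s_end>\d+) (?P<q_end>\d+)\)\s+"
    r"(?P<score>-?\d+)$"
)

def parse_d0_line(line):
    """Split one -D 0 line into named fields (hypothetical helper)."""
    m = D0_LINE.match(line.strip())
    if m is None:
        raise ValueError("not a -D 0 alignment line: %r" % line)
    record = m.groupdict()
    for key in ("s_beg", "q_beg", "s_end", "q_end", "score"):
        record[key] = int(record[key])
    return record

hit = parse_d0_line("'AF071362'=='+AF071357' (1 715 200 920) 8")
print(hit["subject"], hit["strand"], hit["query"], hit["score"])
```

Remember that the meaning of the score field depends on whether affine gapping is in effect, so compare scores only between runs that use the same gapping parameters.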

1

Same as the output of -D 0, but additionally shows the endpoints and percent identity for each ungapped segment in the alignment.

#'>AF071362'=='+AF071357' (1 715 200 920) 8
a {
  s 8
  b 1 715
  e 200 920
  l 1 715 26 740 (96)
  l 27 742 27 742 (100)
  l 28 744 47 763 (100)
  l 48 765 50 767 (100)
  l 51 769 60 778 (100)
  l 61 780 133 852 (100)
  l 134 854 200 920 (99)
}

s: Score.
b: Begin coordinates for the subject and query, respectively.
e: End coordinates for subject and query, respectively.
l: Coordinates for each ungapped segment with the percent identity in parentheses at the end.
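The segment lines can be used to see where the gaps in an alignment fall. The sketch below parses the l lines of one block and counts the unaligned bases between consecutive segments. It assumes the field order on each l line is subject start, query start, subject end, query end (inferred from the example, not from a format specification) and that the hit is on the plus strand; the helper names are hypothetical.

```python
import re

# One "l" line: s_beg q_beg s_end q_end (percent identity).
SEGMENT = re.compile(r"^\s*l\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+\((\d+)\)\s*$")

def parse_segments(block):
    """Return the ungapped segments listed in one -D 1 alignment block."""
    segments = []
    for line in block.splitlines():
        m = SEGMENT.match(line)
        if m:
            s_beg, q_beg, s_end, q_end, pct = map(int, m.groups())
            segments.append({"s_beg": s_beg, "q_beg": q_beg,
                             "s_end": s_end, "q_end": q_end, "pct": pct})
    return segments

def gap_bases(segments):
    """Count bases skipped between consecutive segments (plus strand only)."""
    total = 0
    for prev, nxt in zip(segments, segments[1:]):
        total += nxt["s_beg"] - prev["s_end"] - 1  # bases skipped in subject
        total += nxt["q_beg"] - prev["q_end"] - 1  # bases skipped in query
    return total

example = """a {
  s 8
  b 1 715
  e 200 920
  l 1 715 26 740 (96)
  l 27 742 27 742 (100)
  l 28 744 47 763 (100)
  l 48 765 50 767 (100)
  l 51 769 60 778 (100)
  l 61 780 133 852 (100)
  l 134 854 200 920 (99)
}"""

segments = parse_segments(example)
subject_bases = sum(s["s_end"] - s["s_beg"] + 1 for s in segments)
print(len(segments), subject_bases, gap_bases(segments))
```

For the example block, this finds seven segments that together cover all 200 subject bases, with six single-base gaps between them, all on the query side.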

2

A traditional BLAST output.

3

A tab-delimited, one-line format. The 12 reported tab-delimited fields are as follows:

Query

Subject

Percent identity

Alignment length

Mismatches

Gap openings

Query start

Query end

Subject start

Subject end

E value

Bit score
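The 12 fields map naturally onto a record. The sketch below splits one -D 3 line on tabs and converts each field to a suitable type. The key names are illustrative, and the sample line is made up purely to exercise the parser; it is not real megablast output.

```python
# Field order follows the list above; the key names are hypothetical.
FIELDS = [
    ("query", str), ("subject", str), ("pct_identity", float),
    ("align_len", int), ("mismatches", int), ("gap_openings", int),
    ("q_start", int), ("q_end", int), ("s_start", int), ("s_end", int),
    ("evalue", float), ("bit_score", float),
]

def parse_d3_line(line):
    """Convert one tab-delimited -D 3 line into a dictionary."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != len(FIELDS):
        raise ValueError("expected %d fields, got %d" % (len(FIELDS), len(parts)))
    return {name: cast(value) for (name, cast), value in zip(FIELDS, parts)}

# Made-up sample values, for illustration only.
sample = "AF071357\tAF071362\t96.12\t206\t2\t6\t715\t920\t1\t200\t1e-95\t347"
row = parse_d3_line(sample)
print(row["query"], row["subject"], row["pct_identity"], row["align_len"])
```

Keeping the field list in one place makes it easy to adjust if your output carries comment lines or extra columns, which you would need to filter out before calling the parser.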

-e [real number]

Default: 1,000,000

The expectation value; same as blastall. However, it's set to a very large number, so there is effectively no cutoff.

-E [integer]

Default: 0

Setting -E and -G turns on affine gapping (same as standard blastall). This causes megablast to use more memory and isn't necessary when the sequences are expected to be nearly identical. When -E and -G aren't set, the gap extension penalty is calculated from the match score (-r) and mismatch penalty (-q) as E = r/2 - q, rounded down to the nearest integer. So, for the default +1/-3 matrix, E = 1/2 + 3 = 3.5, which rounds down to a gap extension penalty of 3.
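The non-affine gap extension penalty is easy to check by hand. Here is a minimal sketch of the calculation described above; the function and argument names are illustrative and are not megablast identifiers.

```python
import math

def default_gap_extension(match=1, mismatch=-3):
    """Gap extension penalty used when -E and -G aren't set: floor(r/2 - q)."""
    return math.floor(match / 2 - mismatch)

# Default +1/-3 scoring: 1/2 - (-3) = 3.5, rounded down.
default_penalty = default_gap_extension()
# A second, arbitrary scoring scheme: 2/2 - (-3) = 4.
other_penalty = default_gap_extension(match=2, mismatch=-3)
print(default_penalty, other_penalty)
```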

-f [T/F]

Default: F

Shows full IDs of the database sequences in the output. The default is only the accession, or just the GI if no accession is given. Applies to -D 0, -D 1, and -D 3.

-F [T/F] [string]

Default: T

Filters the query sequence; same as blastall.

-G [integer]

Default: 0

Setting -E and -G turns on affine gapping (same as standard blastall). This causes megablast to use more memory and isn't necessary when the sequences are expected to be nearly identical.

-H [integer]

Default: 0

The maximum number of HSPs to save per database sequence. The default of 0 means "unlimited."

-i [file]

Default: stdin

The query file; same as blastall.

-I [T/F]

Default: F

Shows GI numbers in database deflines; same as blastall.

Can be used only with -D 2.

-l [file]

Default: Optional

Restricts search to a list of GI numbers; same as blastall.

-L [string]

Default: Optional

The location on query sequence; same as blastall.

-m [0..11]

Default: 0

Alignment view options; same as blastall, but applies only if -D 2 is set.

-M [integer]

Default: 20000000 (20 million)

The maximum total length of queries for a single search. Reducing this number reduces the amount of memory required by megablast.

-n [T/F]

Default: F

Uses dynamic programming extension for affine gap scores. The default is to use a greedy algorithm for extension.

-N [0,1,2]

Default: 0

The type of discontiguous template. To use discontiguous seeding, -t must be set to 16, 18, or 21, and -W must be 11 or 12.

Discontiguous templates don't require the exact word match used by the other BLAST programs; instead, a template pattern must be matched to seed an alignment. If a template is written as 1s and 0s, with 1 representing positions that must match and 0 representing positions that need not match, then a template of size 16 with a word size of 11 can be written as:

1110010110110111
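To make the idea concrete, the sketch below tests whether two equal-length windows would seed under a template: only positions marked 1 are compared. It uses the 16-position template with 11 required matches, written as a plain string of 1s and 0s. This illustrates the matching rule only and says nothing about how megablast implements seeding internally; the function name and the test sequences are made up.

```python
def template_match(template, a, b):
    """True if a and b agree at every position where the template has a 1."""
    if not (len(template) == len(a) == len(b)):
        raise ValueError("template and both windows must have the same length")
    return all(x == y for t, x, y in zip(template, a, b) if t == "1")

TEMPLATE_11_OF_16 = "1110010110110111"

window_a = "ACGTACGTACGTACGT"
# Differs from window_a only at offsets 3 and 4, where the template has 0s.
window_b = "ACGAGCGTACGTACGT"
# Differs from window_a at offset 0, where the template requires a match.
window_c = "TCGTACGTACGTACGT"

print(TEMPLATE_11_OF_16.count("1"))                             # required positions
print(template_match(TEMPLATE_11_OF_16, window_a, window_b))    # mismatches tolerated
print(template_match(TEMPLATE_11_OF_16, window_a, window_c))    # mismatch not tolerated
```

A contiguous word of size 11 would reject window_b if the word spanned offsets 3 or 4, which is why a discontiguous template can be more sensitive for diverged sequences.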

Options

0

Coding template. This discontiguous template uses a pattern of 110 to match coding sequence where the third codon position is variable (and therefore set to 0 and not required to match). Here are all coding template combinations:

1101101101101101       [11 of 16]
1111101101101101       [12 of 16]
101101100101101101     [11 of 18]
101101101101101101     [12 of 18]
100101100101100101101  [11 of 21]
100101101101100101101  [12 of 21]

1

Optimal. This template pattern tries to minimize the correlation between successive words. Here are all optimal template combinations:

1110010110110111       [11 of 16]
1110110110110111       [12 of 16]
111010010110010111     [11 of 18]
111010110010110111     [12 of 18]
111010010100010010111  [11 of 21]
111010010110010010111  [12 of 21]

2

Simultaneous optimal and coding. This option increases sensitivity by allowing seeding from a match to either template at a given position.

-o [file]

Default: Optional

Output file; same as blastall.

-p [real number]

Default: 0

Percent identity cutoff. Alignments with a percent identity below [real number] aren't reported. If using -D 0, all alignments are kept regardless of percent identity (no trace-back is performed, so percent identity can't be calculated).

-P [integer]

Default: 0

The maximum number of positions for a hash value. If set to a nonzero value, redundant subsequences are masked in the word seeding phase. This allows a simple type of filtering by masking out subsequences that occur in the query sequences more than [integer] times. When the word size (-W) is set to 16 or higher, -P applies to subsequences of length 12; when -W is set to less than 16, it applies to subsequences of length 8.
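The effect of -P can be pictured as a frequency filter on short subsequences. The sketch below is only an approximation of that idea: it counts overlapping subsequences in a single query and reports those occurring more than a given number of times. The helper names are hypothetical, and megablast's actual hashing and masking will differ in detail.

```python
from collections import Counter

def hash_word_length(word_size):
    """Subsequence length that -P applies to, per the rule above."""
    return 12 if word_size >= 16 else 8

def overrepresented_words(sequence, max_positions, length=8):
    """Return subsequences of the given length seen more than max_positions times."""
    counts = Counter(
        sequence[i:i + length] for i in range(len(sequence) - length + 1)
    )
    return {word for word, n in counts.items() if n > max_positions}

# A toy repetitive query: ACGTACGT occurs 5 times, its rotations 4 times each.
query = "ACGTACGT" * 3
print(hash_word_length(28), hash_word_length(11))
print(overrepresented_words(query, max_positions=4))
```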

-q [negative integer]

Default: -3

Mismatch penalty; same as blastall.

-Q [file]

Default: Optional

Masked query output. Each query sequence is reported to [file], but with any region hit turned to Ns. This works only in conjunction with -D 2.
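The masking itself is simple to reproduce if you want to apply it to hits from another output format. This is a minimal sketch, assuming hit regions are given as 1-based, inclusive (start, end) pairs on the query; the function name is illustrative and the sequence is a toy example.

```python
def mask_regions(sequence, regions):
    """Replace every base inside any (start, end) region with N."""
    masked = list(sequence)
    for start, end in regions:
        if not 1 <= start <= end <= len(sequence):
            raise ValueError("region out of range: %r" % ((start, end),))
        # Convert 1-based inclusive coordinates to a Python slice.
        masked[start - 1:end] = "N" * (end - start + 1)
    return "".join(masked)

single = mask_regions("ACGTACGTAC", [(3, 5)])
overlapping = mask_regions("ACGTACGTAC", [(1, 2), (2, 4)])
print(single, overlapping)
```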

-r [integer]

Default: 1

Match score; same as blastall.

-R [T/F]

Default: Optional

Reports a short log message at the end of the run.

-s [integer]

Default: Optional

The minimum hit score to report. Alignments scoring less than [integer] aren't reported. By default, this is set to the word size, which results in all hits being reported.

-S [0..3]

Default: 3

The strands to search; same as blastall.

-t [16,18,21]

Default: Optional

Sets discontiguous template size. This, combined with the word size (-W) of either 11 or 12 and the template type (-N), sets discontiguous megablast.
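Because discontiguous megablast depends on three options agreeing, it can help to validate them before launching a run. The sketch below builds an argument list using only the options documented in this section and rejects combinations outside the allowed values. The function name is hypothetical, the database and query names are placeholders, and the code only constructs the command; it doesn't run megablast.

```python
def discontiguous_args(db, query, template_size, word_size=11, template_type=0):
    """Build a megablast argument list for a discontiguous search."""
    if template_size not in (16, 18, 21):
        raise ValueError("-t must be 16, 18, or 21")
    if word_size not in (11, 12):
        raise ValueError("-W must be 11 or 12 for discontiguous searches")
    if template_type not in (0, 1, 2):
        raise ValueError("-N must be 0, 1, or 2")
    return [
        "megablast", "-d", db, "-i", query,
        "-D", "2",                    # traditional BLAST output
        "-t", str(template_size),
        "-W", str(word_size),
        "-N", str(template_type),
    ]

args = discontiguous_args("my_db", "my_query", template_size=18, word_size=11)
print(" ".join(args))
```

The resulting list can be passed to a process launcher such as subprocess.run if megablast is installed and on your path.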

-T [T/F]

Default: F

HTML output; same as blastall, but active only if -D 2 is set.

-U [T/F]

Default: F

Lowercase filtering; same as blastall.

-v [integer]

Default: 500

The number of one-line descriptions. Same as blastall if -D 2 is set.

-W [integer]

Default: 28

Word size. The default word size is very high because sequences aligned by megablast are expected to be nearly identical. For discontiguous searches (-t), word size can be only 11 or 12. megablast generates words every four bases (similar to the WU-BLAST wink parameter), so using a word size divisible by four ensures that all words of that size will be found.

-X [integer]

Default: 20

The X dropoff value for a gapped alignment; same as blastall.

-y [integer]

Default: 10

The X dropoff value for an ungapped extension; same as blastall.

-z [real number]

Default: 0

The effective length of a database; same as blastall.

-Z [integer]

Default: 50

The X dropoff value for a dynamic programming gapped extension.