14.4 xdformat Parameters

xdformat formats BLAST databases from FASTA files. It also reports descriptive information about the database and dumps the entire content to FASTA format.

Here are some examples:

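# Format a nucleotide database (-n) from FASTA files
xdformat -n files
# Format a protein database (-p) from FASTA files
xdformat -p files
# Read compressed FASTA from stdin (-) and name the database my_db (-o)
zcat fasta.*.gz | xdformat -o my_db -n -- -
# Report descriptive information about a nucleotide database
xdformat -n -i database
# Dump a nucleotide database to a FASTA file
xdformat -n -r database > fasta_file
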
-A [0..2]

Default: 2

When indexing accession.version identifiers, you have three indexing options:

0

Accession only; version isn't stored

1

Stored as accession.version

2

Stored as both accession only and accession.version
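
The three modes can be pictured with a small shell sketch. This is not how xdformat is implemented; the function name index_keys and the accession AF123456.1 are made up for illustration, and the function only prints the lookup keys that each mode implies for one identifier.

```shell
# Hypothetical helper: print the lookup keys implied by each -A mode
# for a single accession.version identifier (illustration only).
index_keys() {
    mode=$1
    id=$2
    acc=${id%%.*}    # text before the first dot, e.g. AF123456
    case $mode in
        0) printf '%s\n' "$acc" ;;               # accession only
        1) printf '%s\n' "$id" ;;                # accession.version
        2) printf '%s\n%s\n' "$acc" "$id" ;;     # both forms
        *) echo "mode must be 0, 1, or 2" >&2; return 1 ;;
    esac
}

# Show the keys for the default mode (2)
index_keys 2 AF123456.1
```

With mode 2, a sequence can be retrieved by either form of the identifier, at the cost of storing two keys per record.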

-a [database]

Appends sequences to the named database. If the database is indexed, the appended sequences will also be indexed.

-c [character]

Default: Off

Replaces each invalid letter with the given character. By default, xdformat terminates and reports an error message when it encounters an invalid letter. If this occurs, check the sequence file for errors. After checking, you may either skip illegal characters with -k or change them to a legal character with -c. Typical settings are -c N for nucleotides and -c X for proteins.

See also

-k

-D [integer]

Default: Unlimited

Sets the maximum length for definition lines.

-d [string]

Default: None

Sets a user-defined release date for the database. The date may be at most 63 characters long.

See also

-v

-e [file]

Default: stderr

Appends information and errors to the named file.

-G

Default: Off

Prefaces each sequence with the database record number in the format of gnl|xdf|#.

-i

Default: Off

Reports descriptive information about a BLAST database. This is useful for determining when a database was created, how many sequences it contains, and whether it is indexed.

-K [integer]

Default: Unlimited

Sets the maximum number of Control-A-separated identifiers kept for each sequence. This is useful for trimming the definition lines of highly redundant sequences created with nrdb or another redundancy-removal tool that uses Control-A separators.
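
To preview the effect of such a limit on a FASTA file before formatting it, a rough approximation can be written in awk. The helper name trim_ids is hypothetical, and the sketch assumes the limit simply keeps the first K identifiers on each definition line; xdformat's exact behavior may differ.

```shell
# Hypothetical helper: keep at most K (K >= 1) Control-A-separated
# identifiers on each FASTA definition line; sequence lines pass
# through unchanged.
trim_ids() {
    awk -v k="$1" '
        BEGIN { FS = "\001" }
        /^>/ {
            n = (NF < k) ? NF : k
            line = $1
            for (i = 2; i <= n; i++) line = line FS $i
            print line
            next
        }
        { print }
    '
}

# Demonstration; Control-A is shown as "|" so the result is readable
printf '>id1 first\001id2 second\001id3 third\nMKV\n' | trim_ids 2 | tr '\001' '|'
```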

-k

Default: Off

Skips invalid letters. By default, xdformat terminates when it encounters an invalid letter. If this occurs, you can either skip illegal characters with -k or change them to a legal letter with -c. Check the errors to ensure the input file is formatted properly.

See also

-c
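
The difference between replacing and skipping can be tried on a small file before running xdformat. The sketch below is only an approximation: the helper names are hypothetical, and the legal alphabet is assumed to be the IUPAC nucleotide codes, which may not match the set xdformat accepts.

```shell
# Assumption: the legal nucleotide alphabet is the IUPAC code set below;
# the alphabet xdformat itself accepts may differ.
LEGAL='ACGTUNRYKMSWBDHVacgtunrykmswbdhv'

# Approximates -c CHAR: replace every letter outside LEGAL with CHAR.
replace_invalid() {
    awk -v legal="$LEGAL" -v c="$1" '
        /^>/ { print; next }                    # leave definition lines alone
        { gsub("[^" legal "]", c); print }
    '
}

# Approximates -k: drop every letter outside LEGAL.
skip_invalid() {
    awk -v legal="$LEGAL" '
        /^>/ { print; next }
        { gsub("[^" legal "]", ""); print }
    '
}

printf '>s1\nACGTXJ\n' | replace_invalid N
printf '>s1\nACGTXJ\n' | skip_invalid
```

Replacing keeps sequence coordinates intact, whereas skipping shortens the sequence, which matters if positions are later compared against the original file.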

-L [number]

Default: 100000000 (100 million letters)

Sets the maximum sequence length. For optimal performance, break up large sequences into smaller fragments no larger than 1 million letters.
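
One way to break large sequences into fragments before formatting is sketched below. The helper name split_fasta and the fragment naming scheme (id_1, id_2, and so on) are invented for illustration; the fragments do not overlap, and only the first word of each definition line is kept.

```shell
# Hypothetical helper: split each FASTA record read from stdin into
# fragments of at most MAX letters.
split_fasta() {
    awk -v max="$1" '
        # Print the buffered record as numbered fragments
        function emit(   i, n, part) {
            if (id == "") return
            n = length(seq)
            part = 0
            for (i = 1; i <= n; i += max) {
                part++
                print ">" id "_" part
                print substr(seq, i, max)
            }
            seq = ""
        }
        /^>/ { emit(); id = substr($1, 2); next }   # new record starts
        { seq = seq $0 }                            # collect sequence lines
        END { emit() }
    '
}

# Demonstration with a tiny fragment size
printf '>chr1 test\nACGTAC\nGT\n' | split_fasta 3
```

For real data the argument would be 1000000, and the output could be piped to xdformat in the same way as the zcat example at the start of this section.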

-l [number]

Default: 0

Sets the minimum sequence length.

-M [number]

Default: 96m

Sets the cache size for indexing. For faster indexing, the size may be increased (for example, -M 512m).

-O [4..8]

Default: 4

Sets the number of bytes of precision. The default value allows databases of up to 4 billion amino acids or 16 billion nucleotides. If you expect a database to contain more than this limit, increasing precision by one level multiplies the limit by 256. Setting -O is necessary only if you append to the database because the precision automatically increases appropriately when databases are created.
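
The limits quoted above are consistent with a simple rule, sketched here as shell arithmetic. The rule is an assumption inferred from the figures in the text (N bytes address 256^N units, with one amino acid or four nucleotides per unit), and the function name precision_limit is hypothetical.

```shell
# Assumption: N bytes of precision address 256^N = 2^(8N) units, with
# one amino acid or four nucleotides per unit, matching the figures above.
# Limited to 4..7 here because 2^64 overflows 64-bit shell arithmetic.
precision_limit() {
    bytes=$1
    kind=$2    # "aa" (amino acids) or "nt" (nucleotides)
    if [ "$bytes" -lt 4 ] || [ "$bytes" -gt 7 ]; then
        echo "bytes must be between 4 and 7 in this sketch" >&2
        return 1
    fi
    units=$(( 1 << (8 * bytes) ))
    case $kind in
        aa) echo "$units" ;;
        nt) echo $(( units * 4 )) ;;
        *)  echo "kind must be aa or nt" >&2; return 1 ;;
    esac
}

precision_limit 4 aa    # about 4 billion
precision_limit 4 nt    # about 16 billion
precision_limit 5 aa    # 256 times the 4-byte limit
```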

-P [integer]

Default: 60

This option applies only when dumping the entire content of a database with -r. -P controls the length of the sequence lines; -P 0 puts the whole sequence on one line.

See also

-r
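
The effect of the line length setting can be imitated with standard tools, which is handy for re-wrapping a sequence that has already been dumped. This is only a sketch of the described behavior, not xdformat's code; the helper name wrap_seq is hypothetical, and it expects a single sequence without its definition line on stdin.

```shell
# Hypothetical helper: wrap one sequence (read from stdin) at WIDTH
# letters per line; WIDTH 0 leaves the whole sequence on a single line.
wrap_seq() {
    width=${1:-60}
    if [ "$width" -eq 0 ]; then
        tr -d '\n'                      # join everything into one line
    else
        tr -d '\n' | fold -w "$width"   # join, then re-wrap
    fi
    echo                                # terminate the last line
}

printf 'ACGTACGTAC\n' | wrap_seq 4
printf 'ACGT\nAC\n' | wrap_seq 0
```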

-q [0..3]

Default: 0

Certain files may contain numerous nonfatal errors in their identifier format. -q quiets these errors.

0

No silencing

1

Silences field 1 errors

2

Silences field 2 errors

3

Silences all fields

-r

Default: Off

Reports (dumps) the entire database content to stdout in FASTA format.

-T [string]

Default: Off

This option lets you restrict indexing of identifiers to a particular database name or tag. The [string] has two parts: part 1 is the name of the database (e.g., gb for GenBank or emb for EMBL; see Chapter 10), and part 2 is either blank or a number.

blank

Index all identifiers.

0

Don't index.

1

Index only field 1.

2

Index only field 2.

Here are some examples:

-T emb0 doesn't index EMBL records.
-T gb1 indexes GenBank accession but not locus.
-T gb2 indexes GenBank locus but not accession.
-T gb indexes both the accession and locus of GenBank records.

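
The two-part structure of the -T argument can be made explicit with a small parser. The function name describe_tag is hypothetical, and the sketch assumes that database tags never end in the digits 0, 1, or 2.

```shell
# Hypothetical helper: split a -T argument into its database tag and
# field selector (blank, 0, 1, or 2) and describe the result.
# Assumption: database tags themselves never end in 0, 1, or 2.
describe_tag() {
    arg=$1
    case $arg in
        *0) echo "${arg%0}: not indexed" ;;
        *1) echo "${arg%1}: index field 1 only" ;;
        *2) echo "${arg%2}: index field 2 only" ;;
        *)  echo "$arg: index all identifiers" ;;
    esac
}

describe_tag emb0
describe_tag gb1
describe_tag gb
```
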
-v

Default: Off

Sets a user-defined version string for the database (a maximum of 63 characters).

See also

-d

-X

Default: Off

Databases that are formatted but not indexed may be indexed or re-indexed (e.g., with a different indexing scheme) with -X. In the following examples, the two commands on Line 1 are equivalent to the one on Line 2.

xdformat -n nt_db ; xdformat -n -X nt_db
xdformat -n -I nt_db