13.5 fastacmd Parameters

fastacmd retrieves sequences, individually or in batches, from BLAST databases. Because the sequences can be recovered from the database, you don't have to keep the original FASTA files on your file system after you've formatted the BLAST database. Sequences are stored in a case-insensitive format, however, so if you use lower- and uppercase for semantic purposes, that information is lost.

Here are a few sample command lines using fastacmd:

fastacmd -d nr -s P02042
fastacmd -d nr -s 12837002,P02042
fastacmd -d nr -D
fastacmd -d est -i file_of_gi
cat file_of_gi | fastacmd -d est -i stdin

The following reference lists the default value for each fastacmd parameter.

-a [T/F]

Default: F

Retrieves all accessions, including duplicates, when using -s or -i to retrieve sequences. If -a isn't set, only the first of any duplicate accessions is retrieved.

-c [T/F]

Default: F

Uses Control-A as a nonredundant definition line separator. This parameter applies only to nonredundant databases with concatenated definition lines. By default, a normal space is used as the separator. Using Control-A unambiguously separates sequence definitions.
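
To see why an unambiguous separator helps, here is a minimal sketch that splits a concatenated definition line on Control-A (octal 001) so that each definition lands on its own line. The definition line and its identifiers are made up for illustration; they aren't real fastacmd output.

```shell
# Hypothetical nonredundant definition line: two entries joined by
# Control-A (octal 001), the separator that -c T asks fastacmd to use.
defline=$(printf '>gi|111|ref|XX_000001| protein one\001gi|222|ref|XX_000002| protein two')

# Turn each Control-A into a newline so every definition is on its own line.
entries=$(printf '%s\n' "$defline" | tr '\001' '\n')
printf '%s\n' "$entries"

# Count the separated entries.
entry_count=$(printf '%s\n' "$entries" | wc -l | tr -d ' ')
echo "entries: $entry_count"
```

Splitting on a space would not work here, because the descriptions themselves contain spaces.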

-d [string]

Default: nr

The database from which to retrieve sequences.

-D [T/F]

Default: F

Dumps the entire database in FASTA format.
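
A dump is often followed by a quick check that the expected number of sequences came out. The sketch below assumes a database named my_db and an output file named my_db.fasta, both placeholders. The helper function count_fasta_records is not part of fastacmd; it simply counts header lines.

```shell
# Count FASTA records in a file by counting header lines (those starting with ">").
count_fasta_records() {
    grep -c '^>' "$1"
}

# Attempt the dump only if fastacmd is installed; my_db is a placeholder name.
if command -v fastacmd >/dev/null 2>&1; then
    fastacmd -d my_db -D -o my_db.fasta && count_fasta_records my_db.fasta
fi
```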

-i [file]

Default: Optional

A batch retrieval. The format of the text file is one GI or accession per line. stdin is a valid file.

cat file_of_gi | fastacmd -d est -i stdin

-I

Default: Optional

Prints information about a formatted database. Overrides all other retrieval options. Needs to be used with -d.

fastacmd -d my_db -I

-l [integer]

Default: 80

The line length for sequence output. The most common values are 50 (a nice round number), 60 (evenly divisible by 3), and 80 (a traditional terminal width).

-L [integer],[integer]

Default: 0,0

Extracts a region of the sequence. Using 0 as the start coordinate indicates the actual beginning of the sequence. Using 0 as the end coordinate indicates the end of the sequence. A colon and the sequence range are appended to the identifier to signify the region extracted.
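
Judging by the -L 10,50 example in this entry, both coordinates are inclusive, so a range returns end - start + 1 residues. The following sketch checks that arithmetic against the sequence from that example. The variable names are illustrative only.

```shell
# Coordinates and sequence taken from the -L 10,50 example in this entry.
start=10
end=50
region=SGPDFSSLPSFLPQTPSSRPMTYSYSSNLPQVQPVREVTFR

# An inclusive range start..end covers end - start + 1 residues.
expected_length=$((end - start + 1))
actual_length=${#region}

echo "expected: $expected_length, actual: $actual_length"
```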

fastacmd -d nr -s AAG39070 -L 10,50

>gi|11611819:10-50 (AF287139) Hoxa-11 [Latimeria chalumnae]
SGPDFSSLPSFLPQTPSSRPMTYSYSSNLPQVQPVREVTFR

-o [file]

Default: stdout

Sends the output to the named file, or to stdout if no file is named.

-p [T/F/G]

Default: G

The type of sequence to look for. Options:

G

Guess. Look for a protein first, and then a nucleotide.

T

Protein.

F

Nucleotide.

-P [integer]

Default: Optional

Retrieves sequences with this PIG.

-s [string]

Default: Optional

An identifier of the sequence to retrieve. The identifier may be a GI or accession. To retrieve multiple sequences, the identifiers must be separated by commas as follows:

fastacmd -d nr -s AAG39070,11611819

To retrieve a large number of sequences, using the -i parameter is more convenient, especially since there may be limits on the length of command-line strings.
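
One way to prepare such a batch is sketched below: write the identifiers to a file, one per line, and remove duplicates before handing the file to -i. The identifiers are the ones used in the examples in this section, and the output name batch.fasta is a placeholder.

```shell
# Temporary file to hold the identifiers.
id_file=$(mktemp)

# One GI or accession per line, with duplicates removed.
printf '%s\n' AAG39070 11611819 P02042 AAG39070 | sort -u > "$id_file"
id_count=$(wc -l < "$id_file" | tr -d ' ')
echo "identifiers to fetch: $id_count"

# Run the batch retrieval only if fastacmd is installed.
if command -v fastacmd >/dev/null 2>&1; then
    fastacmd -d nr -i "$id_file" -o batch.fasta
fi
```

Because the identifiers live in a file, the list can grow without running into limits on the length of the command line.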

-S [1..2]

Default: 1

The strand to report when extracting a subsequence. Used only with nucleotide sequences.

1

Top strand

2

Bottom strand
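
The bottom strand of a region is conventionally its reverse complement. The sketch below shows that relationship on a made-up sequence using standard text tools. It assumes the rev utility is available and is not a description of how fastacmd works internally.

```shell
# Reverse-complement a nucleotide string: reverse it, then swap A<->T and C<->G.
reverse_complement() {
    printf '%s\n' "$1" | rev | tr 'ACGTacgt' 'TGCAtgca'
}

top=ACGTTG    # made-up top-strand subsequence
bottom=$(reverse_complement "$top")
echo "top:    $top"
echo "bottom: $bottom"
```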

-t [T/F]

Default: F

Restricts the definition line to the target GI. This parameter applies only to nonredundant databases. When set, only the definition line corresponding to the requested GI is reported, not the redundant definition lines. No such mechanism exists for accession numbers; their redundant definition lines are always reported.

-T [T/F]

Default: F

Gets taxonomy information from an NCBI-formatted BLAST database. Databases you format yourself from the downloadable FASTA files don't support this feature; only the preformatted databases work. The preformatted databases can be found at ftp://ftp.ncbi.nlm.nih.gov/blast/db/FormattedDatabases/.