fastacmd retrieves sequences, individually or in batches, from BLAST
databases. Because the sequences can be recovered from the database, you
don't have to keep the original FASTA files on your file system after
you've formatted the BLAST database. Sequences are stored in a
case-insensitive format, however, so if you use lowercase and uppercase
letters to carry meaning, that information will be lost.
Here are a few sample command lines using
fastacmd:
fastacmd -d nr -s P02042
fastacmd -d nr -s 12837002,P02042
fastacmd -d nr -D
fastacmd -d est -i file_of_gi
cat file_of_gi | fastacmd -d est -i stdin
The following reference lists the default value for each
fastacmd parameter.
-a
Retrieves all accessions, even duplicates, when using -s or -i
to retrieve sequences. If -a isn't set, only the first accession
among duplicates is retrieved.
-c
Uses Control-A as the separator between definition lines in a
nonredundant entry. This parameter applies only to nonredundant
databases with concatenated definition lines. By default, a normal
space is used as the separator. Using Control-A unambiguously
separates sequence definitions.
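As a rough illustration of why the Control-A separator is convenient, the sketch below splits a concatenated definition line into one definition per line. The function name split_deflines is hypothetical and the sample input is invented for demonstration; real input would be fastacmd output written with the Control-A separator enabled.

```shell
# Hypothetical helper: put each Control-A-separated definition on its own line.
# Control-A is the single byte with octal value 001.
split_deflines() {
    tr '\001' '\n'
}

# Illustrative input only (not real database content).
printf '>gi|1 first protein\001gi|2 second protein\n' | split_deflines
```

With a space as the separator, the same split would be ambiguous, because definition text itself contains spaces.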
-d
The database from which to retrieve sequences.
-D
Dumps the entire database in FASTA format.
-i
Batch retrieval from a text file containing one GI or accession
per line. stdin is a valid filename.
cat file_of_gi | fastacmd -d est -i stdin
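Before running a large batch, it can help to sanity-check the identifier file. The following is a minimal sketch under the assumption that each line should hold exactly one identifier with no whitespace or commas; bad_id_lines is a hypothetical helper, not part of fastacmd.

```shell
# Hypothetical helper: print (with line numbers) every line that is empty
# or contains whitespace or a comma, i.e. anything that does not look like
# a single GI or accession on its own line.
bad_id_lines() {
    grep -n -E '^$|[[:space:],]' "$1"
}

# Possible usage, assuming a file named file_of_gi exists:
#   bad_id_lines file_of_gi
```

No output means every line passed the check.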
-I
Prints information about a formatted database. Overrides all other
retrieval options. Must be used with -d.
fastacmd -d my_db -I
-l
Sequence line length. The most common values are 50 (a nice round
number), 60 (evenly divisible by 3), and 80 (a traditional terminal
width).
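To show what the line length controls, the sketch below rewraps FASTA text that has already been retrieved. It is not how fastacmd itself formats output; rewrap_fasta is a hypothetical helper that takes the desired width (a positive integer) as its only argument.

```shell
# Hypothetical helper: rewrap the sequence lines of FASTA input to a fixed
# width, leaving definition lines (those starting with '>') untouched.
rewrap_fasta() {
    awk -v w="$1" '
        # Print the accumulated sequence in chunks of w characters.
        function emit(   i) {
            for (i = 1; i <= length(seq); i += w) print substr(seq, i, w)
            seq = ""
        }
        /^>/ { emit(); print; next }   # new record: flush the previous one
        { seq = seq $0 }               # accumulate sequence lines
        END { emit() }'
}

# Possible usage: rewrap_fasta 60 < some_file.fasta
```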
-L
Extracts a region of the sequence. Using 0 as the start coordinate
indicates the beginning of the sequence. Using 0 as the end
coordinate indicates the end of the sequence. A colon and the
sequence range are appended to the identifier to signify the region
extracted.
fastacmd -d nr -s AAG39070 -L 10,50
>gi|11611819:10-50 (AF287139) Hoxa-11 [Latimeria chalumnae]
SGPDFSSLPSFLPQTPSSRPMTYSYSSNLPQVQPVREVTFR
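The coordinate convention can be mimicked on a plain sequence string. This is a minimal sketch, not fastacmd's implementation: subseq is a hypothetical helper, and it assumes 1-based, inclusive coordinates, which is consistent with the 41 residues returned for 10,50 in the example above.

```shell
# Hypothetical helper: read a sequence on stdin and print the region
# start..end (1-based, inclusive). A start of 0 means the beginning of the
# sequence and an end of 0 means the end of the sequence.
subseq() {
    awk -v s="$1" -v e="$2" '
        { seq = seq $0 }               # join wrapped sequence lines
        END {
            if (s == 0) s = 1
            if (e == 0) e = length(seq)
            print substr(seq, s, e - s + 1)
        }'
}

printf 'ABCDEFGHIJ\n' | subseq 3 5    # prints CDE
printf 'ABCDEFGHIJ\n' | subseq 8 0    # prints HIJ
```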
-o
Sends the output to the named file, or to stdout if none is named.
-p
Options:
G: Guess. Look for a protein first, and then a nucleotide.
T: Protein.
F: Nucleotide.
-P
Retrieves sequences with this PIG.
-s
An identifier of the sequence to retrieve. The identifier may be a
GI or accession. To retrieve multiple sequences, separate the
identifiers with commas, as follows:
fastacmd -d nr -s AAG39070,11611819
To retrieve a large number of sequences, using the
-i parameter is more convenient, especially since
there may be limits on the length of command-line strings.
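One way to move from a long comma-separated list to batch retrieval is to convert the list into the one-identifier-per-line form. The sketch below is only an illustration; ids_to_lines is a hypothetical helper, and the duplicate removal is a convenience added here, not something fastacmd requires.

```shell
# Hypothetical helper: turn a comma-separated identifier list into one
# identifier per line, dropping empty entries and repeated identifiers
# while keeping the first-seen order.
ids_to_lines() {
    printf '%s\n' "$1" | tr ',' '\n' | awk 'NF && !seen[$0]++'
}

ids_to_lines 'AAG39070,11611819,AAG39070'

# Possible usage, relying on stdin being accepted as the batch file:
#   ids_to_lines "$ids" | fastacmd -d nr -i stdin
```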
-S
The strand of the subsequence to retrieve. Only used with nucleotide
sequences.
1: Top strand
2: Bottom strand
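Assuming the bottom strand means the reverse complement of the stored (top) strand, the relationship between the two can be sketched as follows. revcomp is a hypothetical helper that handles only the unambiguous bases A, C, G, and T; it treats each input line as a complete sequence.

```shell
# Hypothetical helper: reverse each input line with awk, then complement
# the bases with tr. Ambiguity codes are passed through unchanged.
revcomp() {
    awk '{
        out = ""
        for (i = length($0); i > 0; i--) out = out substr($0, i, 1)
        print out
    }' | tr 'ACGTacgt' 'TGCAtgca'
}

printf 'AACG\n' | revcomp    # prints CGTT
```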
-t
Limits the definition line to the target GI only. This parameter
applies only to nonredundant databases. When set, only the definition
line corresponding to the requested GI is reported, not the redundant
definition lines. No such mechanism exists for accession numbers;
redundancies are always reported.
-T
Gets taxonomy information from an NCBI-formatted BLAST database. The
downloadable FASTA files don't allow this feature; only the
preformatted databases will work. The preformatted databases can be
found at ftp://ftp.ncbi.nlm.nih.gov/blast/db/FormattedDatabases/.