xdget
retrieves sequences in FASTA format from databases formatted with
xdformat (not formatdb,
pressdb, or setdb). The
database must have been indexed before
xdget is used (see -I and
-X in the previous section, Section 14.4).
Here are a few example command lines. If identifiers contain vertical
bars, as in the second example, enclose the string in
quotes to prevent the shell from interpreting them as pipes. This
isn't required for identifiers read from files.
xdget -n db 12345
xdget -p nr 'gi|11611819|gb|AAG39070.1|'
xdget -n -f db files_of_ids
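When xdget is called from a script rather than typed at a shell, the quoting problem can be avoided entirely by passing the arguments as a list, so no shell ever parses the vertical bars. The sketch below is a hypothetical Python helper: `xdget_argv` and `run_xdget` are illustrative names, not part of xdget. It uses only the -n, -p, and -f options from the examples above and assumes that several identifiers may follow the database name, although the examples show only one.

```python
import shlex
import subprocess


def xdget_argv(db, identifiers, nucleotide=False, id_files=False):
    """Build an xdget argument list (hypothetical helper, not part of xdget).

    Because the list is handed to subprocess without a shell, identifiers
    such as 'gi|11611819|gb|AAG39070.1|' need no quoting.
    """
    argv = ["xdget", "-n" if nucleotide else "-p"]
    if id_files:
        argv.append("-f")  # the remaining arguments are files of identifiers
    argv.append(db)
    argv.extend(identifiers)
    return argv


def run_xdget(argv):
    """Run xdget without a shell and return its FASTA output as text."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    argv = xdget_argv("nr", ["gi|11611819|gb|AAG39070.1|"])
    # Print the equivalent, correctly quoted shell command line.
    print(shlex.join(argv))
```

Running the script prints `xdget -p nr 'gi|11611819|gb|AAG39070.1|'`, which is the second example command above; `run_xdget(argv)` would execute it without involving a shell.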
Given an accession number without a
version, xdget retrieves the latest version.
This default can be set explicitly with -A
n. With -A 0,
the earliest version is retrieved instead.
See also
-d, -N
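The version-selection rule can be modelled in a few lines. This is a hypothetical sketch, not xdget's implementation: `resolve_accession` and the `versions` mapping are illustrative, and the assumption that an explicit version must match exactly is ours (the text only says that versioned accessions follow different rules).

```python
def resolve_accession(query, versions, earliest=False):
    """Pick a versioned accession, modelling the documented -A behaviour.

    versions maps an accession without its version (e.g. 'AAG39070') to the
    version numbers present in the database. Hypothetical model only.
    """
    accession, dot, version = query.partition(".")
    available = versions.get(accession)
    if not available:
        return None
    if dot:
        # Assumption: an explicitly requested version must match exactly.
        wanted = int(version)
        return f"{accession}.{wanted}" if wanted in available else None
    # No version given: latest by default, earliest when earliest=True (-A 0).
    chosen = min(available) if earliest else max(available)
    return f"{accession}.{chosen}"
```

For example, with made-up versions 1 to 3 stored for `AAG39070`, a bare `AAG39070` resolves to `AAG39070.3`, or to `AAG39070.1` when `earliest=True`.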
The -a and -b parameters
retrieve a subsequence. For example, to retrieve just
nucleotides 1 to 100, include -a
1 -b 100.
For nucleotide sequences, if -a is greater than
-b, the subsequence is returned as its
reverse complement.
See also
-b,
-r, -t
Default for -b: 0 (end of sequence).
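The subsequence rules can be sketched as a small Python model of 1-based, inclusive coordinates. It follows the reading consistent with the `-a 1 -b 100` example: ascending positions give the forward strand, and giving the larger position first yields the reverse complement. `subsequence` and `reverse_complement` are hypothetical names; a default of 1 for -a is an assumption, and protein sequences are not modelled.

```python
# Complement table for the bases A, C, G, T and N (upper and lower case).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Complement every base, then reverse the order (as with -r)."""
    return seq.translate(COMPLEMENT)[::-1]


def subsequence(seq, a=1, b=0):
    """Model of -a/-b on a nucleotide sequence (1-based, inclusive).

    b == 0 means the end of the sequence. If a > b, the stretch between
    the two positions is returned as its reverse complement.
    """
    end = len(seq) if b == 0 else b
    if a <= end:
        return seq[a - 1:end]
    # a > b: take positions b..a and report them on the other strand.
    return reverse_complement(seq[end - 1:a])
```

For `AACCGGTTA`, positions 1 to 4 give `AACC`, while `a=4, b=1` gives `GGTT`, the reverse complement of the same stretch.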
Ordinarily, when duplicate identifiers are present, only one is
retrieved. With -d, all duplicates are reported.
Having duplicate identifiers is generally not a good idea.
See also
-A, -N
Sets the maximum definition line length. Definition lines are
commonly used to store arbitrary data about a sequence, so they can be
very long. This option is useful when you don't need the whole
definition line.
Appends messages and errors to the named log file.
Flushes the output stream after each request. This is useful for
preventing I/O deadlocks between communicating processes.
Indicates that files of identifiers are given on the command line.
The file format is one identifier per line.
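Because an identifier file is never parsed by the shell, identifiers with vertical bars can be written to it as they are. A minimal sketch with hypothetical helper names; skipping blank lines and surrounding whitespace when reading is our own convenience, not a documented xdget rule.

```python
from pathlib import Path


def format_id_file(identifiers):
    """Render identifiers in the documented format: one per line."""
    return "".join(f"{ident}\n" for ident in identifiers)


def parse_id_file(text):
    """Read identifiers back, ignoring surrounding whitespace and blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def write_id_file(path, identifiers):
    # No shell reads this file, so the vertical bars need no quoting.
    Path(path).write_text(format_id_file(identifiers))
```

A file written with `write_id_file("files_of_ids", ["gi|11611819|gb|AAG39070.1|"])` can then be passed as in the third example, `xdget -n -f db files_of_ids`.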
Prefaces each definition line with
its record number in the gnl namespace. The
format is gnl|xdf|#, where # is the record number.
Writes the FASTA output to the named file rather than to stdout.
For sequences with duplicate identifiers, the first one is retrieved
by default; this can be set explicitly with -N
0. Setting -N
n retrieves the last one. Accession numbers that include a
version number follow different rules.
See also
-A, -d
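The duplicate-handling options can be summarised in one small model. This is a hypothetical sketch: `lookup`, its keyword arguments, and the record layout are illustrative; `last=True` stands in for -N n and `all_duplicates=True` for -d.

```python
def lookup(records, ident, all_duplicates=False, last=False):
    """Model of duplicate handling in xdget (hypothetical, simplified).

    records is a list of (identifier, sequence) pairs in database order.
    Default: first match only. last=True: last match only (like -N n).
    all_duplicates=True: every match (like -d).
    """
    hits = [rec for rec in records if rec[0] == ident]
    if all_duplicates or not hits:
        return hits
    return [hits[-1] if last else hits[0]]
```

With two records both named `x`, the default returns the first, `last=True` returns the second, and `all_duplicates=True` returns both.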
Sets the maximum line length for sequence data. Setting
-P 0 puts the entire sequence
on one line.
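The two length limits, for the definition line and for sequence lines, can be illustrated with a small formatter. This is a sketch under stated assumptions, not xdget's output code: `format_fasta` is a hypothetical name, the default line length of 60, the use of 0 to mean "no definition line limit", and not counting `>` toward the limit are all assumptions.

```python
def format_fasta(defline, seq, max_defline=0, line_length=60):
    """Render one FASTA record (hypothetical model).

    max_defline > 0 truncates the definition line (0 keeps it whole);
    line_length == 0 puts the whole sequence on one line, as with -P 0.
    """
    if max_defline > 0:
        defline = defline[:max_defline]
    if line_length == 0:
        lines = [seq]
    else:
        lines = [seq[i:i + line_length] for i in range(0, len(seq), line_length)]
    return "\n".join([">" + defline, *lines]) + "\n"
```

For example, `format_fasta("seq1 test", "ACGTACGTAC", line_length=4)` wraps the sequence into lines of four bases, while `line_length=0` keeps it on one line.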
Returns the reverse complement for nucleotide sequences.
This option lets you restrict the lookup of identifiers to a
particular database name or tag. For example, to look only in GenBank
identifiers, use -T gb. To look only in
local identifiers, use -T lcl. For tags
with multiple identifiers, a numeric suffix selects which one to
use. For example, -T gb1
selects accessions and -T gb2
selects loci. To exclude a database name from lookups, append 0. For
example, -T gb0 omits GenBank
records.
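The tag rules can be made concrete with a simplified model of identifiers such as `gi|11611819|gb|AAG39070.1|`. This is a hypothetical sketch, not xdget's parser: `parse_seqid`, `matches`, and especially `FIELD_COUNTS` (an assumed, incomplete table of how many fields follow each tag) are illustrative. The suffix handling follows the text: `gb1` is the accession, `gb2` the locus, and `gb0` excludes GenBank fields.

```python
import re

# Hypothetical, incomplete table: number of fields that follow each tag.
FIELD_COUNTS = {"gi": 1, "lcl": 1, "gb": 2, "emb": 2, "dbj": 2, "ref": 2, "sp": 2, "gnl": 2}


def parse_seqid(seqid):
    """Split 'gi|11611819|gb|AAG39070.1|' into [(tag, fields), ...]."""
    parts = seqid.split("|")
    result = []
    i = 0
    while i < len(parts) and parts[i]:
        tag = parts[i]
        count = FIELD_COUNTS.get(tag, 1)  # assume one field for unknown tags
        result.append((tag, parts[i + 1:i + 1 + count]))
        i += 1 + count
    return result


def matches(seqid, query, tag=None):
    """Model of -T: restrict (or, with suffix 0, exclude) the tags searched."""
    name, index = None, None
    if tag is not None:
        m = re.fullmatch(r"([A-Za-z]+)(\d*)", tag)
        if m is None:
            raise ValueError(f"malformed tag: {tag!r}")
        name = m.group(1)
        index = int(m.group(2)) if m.group(2) else None
    for t, fields in parse_seqid(seqid):
        if name is not None:
            if index == 0 and t == name:
                continue  # e.g. gb0: skip GenBank fields
            if index != 0 and t != name:
                continue  # e.g. gb, gb1: look only at GenBank fields
        # gb1 -> first field (accession), gb2 -> second field (locus)
        candidates = fields[index - 1:index] if index else fields
        if query in (f for f in candidates if f):
            return True
    return False
```

With this model, `AAG39070.1` matches under `gb` and `gb1` but not under `gb2` (the locus field is empty) or `gb0`, while the gi number `11611819` still matches under `gb0`.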