14.5 xdget Parameters

xdget retrieves sequences in FASTA format from databases formatted with xdformat (not formatdb, pressdb, or setdb). The database must have been indexed before you use xdget (see -I and -X in Section 14.4).

Here are a few example command lines. If an identifier contains vertical bars, as in the second example, you must enclose the string in quotes to prevent the shell from interpreting the bars as pipes. Quoting isn't required inside identifier files.

xdget -n db 12345
xdget -p nr 'gi|11611819|gb|AAG39070.1|'
xdget -n -f db files_of_ids

-A [n, 0]

Default: n

Given an accession number without a version, xdget retrieves the latest version. This behavior is set explicitly with -A n. If -A 0 is set, the earliest version is retrieved.

See also

-d, -N

-a [integer]

Default: 1

The -a and -b parameters set the start and end coordinates of a subsequence to retrieve. For example, if you want to retrieve just nucleotides 1 to 100, include -a 1 -b 100. For nucleotide sequences, if -a is greater than -b, the sequence is returned as its reverse-complement.

See also

-b, -r, -t

-b [integer]

Default: 0 (end of sequence)

See -a above.
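The coordinate handling described for -a and -b can be sketched in plain shell. This illustrates the semantics only and is not how xdget is implemented: subseq and revcomp are hypothetical helper names, and the sketch assumes 1-based, inclusive coordinates (consistent with -a 1 -b 100 selecting nucleotides 1 to 100) and treats an end coordinate of 0 as "end of sequence".

```shell
# subseq START END SEQUENCE
# Hypothetical helper: 1-based, inclusive coordinates (an assumption);
# END=0 means "to the end of the sequence", like the -b default.
subseq() {
    start=$1
    end=$2
    seq=$3
    if [ "$end" -eq 0 ]; then
        end=${#seq}
    fi
    printf '%s\n' "$seq" | cut -c "${start}-${end}"
}

# revcomp SEQUENCE
# Hypothetical helper: reverse the sequence, then complement each base,
# which is the kind of output -r describes for nucleotide sequences.
revcomp() {
    printf '%s\n' "$1" | rev | tr 'ACGTacgt' 'TGCAtgca'
}

subseq 1 4 ACGTTGCA    # prints ACGT
subseq 3 0 ACGTTGCA    # prints GTTGCA
revcomp AACG           # prints CGTT
```

With a real database, the equivalent request would be along the lines of xdget -n -a 1 -b 4 db 12345, where db and 12345 are the placeholder names used in the examples at the top of this section.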

-d

Default: Off

Ordinarily, when duplicate identifiers are present, only one is retrieved. With -d, all duplicates are reported. Having duplicate identifiers is generally not a good idea.

See also

-A, -N

-D [integer]

Default: Unlimited

Sets the maximum definition line length. It is common to store arbitrary data about a sequence in its definition line, so definition lines can become very long. This option is useful when you don't need the whole definition line.

-e [file]

Default: stderr

Appends messages and errors to the named log file rather than writing them to stderr.

-F

Default: Off

Flushes the output stream after each request. This is useful for preventing I/O deadlocks between communicating processes.

-f

Default: Off

Indicates that files of identifiers are given on the command line. The file format is one identifier per line.
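As a minimal sketch of preparing such a file from the shell: the quotes below protect the vertical bars only while the shell is writing the file, and the file itself holds the bare identifiers, one per line. The second identifier and the use of a temporary file are assumptions made for illustration, and the xdget call is left as a comment because it needs an indexed database.

```shell
# Create a temporary identifier file (mktemp picks the file name).
ids=$(mktemp)

# One identifier per line. The first identifier comes from the example
# command lines; the second is a hypothetical placeholder.
printf '%s\n' 'gi|11611819|gb|AAG39070.1|' 'lcl|example_seq' > "$ids"

# Show the file: the vertical bars are stored literally, with no quotes.
cat "$ids"

# With xdget installed and an indexed protein database named nr, the file
# would be passed like this:
#   xdget -p -f nr "$ids"
# Remove the temporary file when finished:
#   rm -f "$ids"
```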

-G

Default: Off

Prefaces each definition line with its record number using the gnl namespace. The format is gnl|xdf|#.

-o [file]

Default: stdout

Writes the FASTA output to the named file rather than stdout.

-N [0, n]

Default: 0

For sequences with duplicate identifiers, the first one is retrieved by default. This behavior is set explicitly with -N 0. Setting -N n retrieves the last one. Accession numbers with version numbers follow different rules (see -A).

See also

-A, -d

-P [integer]

Default: 60

Sets the maximum line length for sequence data. Setting -P 0 puts the entire sequence on one line.
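The effect of the line-length setting can be imitated with the standard fold utility. This is a rough sketch of the output layout only, not of xdget itself; wrap_seq is a hypothetical helper name, and treating a width of 0 as "no wrapping" mirrors the -P 0 behavior described above.

```shell
# wrap_seq WIDTH SEQUENCE
# Hypothetical helper: break SEQUENCE into lines of at most WIDTH
# characters; a WIDTH of 0 leaves the whole sequence on one line.
wrap_seq() {
    if [ "$1" -eq 0 ]; then
        printf '%s\n' "$2"
    else
        printf '%s\n' "$2" | fold -w "$1"
    fi
}

wrap_seq 4 ACGTACGTAC    # three lines: ACGT, ACGT, AC
wrap_seq 0 ACGTACGTAC    # one line: ACGTACGTAC
```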

-r

Default: Off

Returns the reverse complement for nucleotide sequences.

-T [string]

Default: Off

This option lets you restrict the lookup of identifiers to a particular database name or tag. For example, to look only in GenBank sequences, use -T gb. To look only in local sequences, use -T lcl. For tags with multiple identifiers, a numeric suffix identifies which one to select. For example, -T gb1 selects accessions and -T gb2 selects loci. To prevent lookups in a database name, use a suffix of zero. For example, -T gb0 omits GenBank records.
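The tag syntax can be read as a database name followed by an optional numeric suffix. The sketch below splits a tag string that way and prints how the rules above would interpret it. describe_tag is a hypothetical helper written for illustration and does not reflect xdget's own parsing.

```shell
# describe_tag TAG
# Hypothetical helper: split TAG into a name and an optional trailing
# number, then report the meaning the -T description assigns to it.
describe_tag() {
    name=$(printf '%s\n' "$1" | sed 's/[0-9]*$//')
    suffix=${1#"$name"}
    case $suffix in
        '') echo "$name: look only in this tag" ;;
        0)  echo "$name: omit this tag from lookups" ;;
        *)  echo "$name: select identifier $suffix of this tag" ;;
    esac
}

describe_tag gb     # gb: look only in this tag
describe_tag gb2    # gb: select identifier 2 of this tag
describe_tag gb0    # gb: omit this tag from lookups
```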

-t

Default: Off

Translates nucleotide sequences into protein.