blastclust clusters
a database of protein or nucleotide sequences. It outputs rows of
sequence identifiers from the database with clustered sequences
occurring on the same row and clusters sorted from largest to
smallest. The program can generate a list of clusters for input into
another program (e.g., an assembly program such as PHRAP); however,
it should be used only on a relatively small number of sequences
(10-1000) because it runs only on a single computer, and the RAM
requirements quickly exceed most capacities.
Here are a few sample command lines:
blastclust -i my_nucdb -p F -o my_nucdb.clusters
blastclust -i my_pepdb -o my_pepdb.clusters -L 0.7 -S 90
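The cluster files written by commands like these contain one cluster per row, as described above. A minimal Python sketch of reading such a file follows. It assumes the identifiers on a row are separated by whitespace, and the function names and sample IDs are hypothetical, used only for illustration.

```python
from typing import Dict, List


def parse_clusters(text: str) -> List[List[str]]:
    """Split blastclust-style output into one list of IDs per cluster."""
    clusters = []
    for line in text.splitlines():
        ids = line.split()  # assumption: IDs are separated by whitespace
        if ids:  # skip blank lines
            clusters.append(ids)
    return clusters


def summarize(clusters: List[List[str]]) -> Dict[str, int]:
    """Report the cluster count, singleton count, and largest cluster size."""
    sizes = [len(c) for c in clusters]
    return {
        "clusters": len(sizes),
        "singletons": sum(1 for s in sizes if s == 1),
        "largest": max(sizes, default=0),
    }


# Hypothetical file contents with made-up sequence IDs.
sample = "seqA seqB seqC\nseqD seqE\nseqF\n"
clusters = parse_clusters(sample)
print(summarize(clusters))
```

For a real run, the text would come from reading the file named with -o, for example my_pepdb.clusters.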
The following reference describes parameters used with
blastclust.
-a  Specifies the number of CPUs to use on a multiprocessor machine.
-b  Requires coverage on both sequences. If set to T, the program
    requires both sequences to pass the coverage criteria set with -L
    before they are called neighbors and clustered together.
-c  Specifies a configuration file with advanced options. The
    configuration file is simply a list of the options that you
    commonly use.
-C  The crash recovery option. Set it to T to complete unfinished
    clustering, together with the -r option and the file used to
    restore the clustering. Use the same command line as the crashed
    run, including the same -s, and add only -C T and -r. This restarts
    the run using the hit list file specified by -r and then appends
    new hits to it (as specified by -s).
-d  The input file is a BLAST database, not a FASTA file.
-e  Enables ID parsing during database formatting.
-i  Specifies the FASTA input file for clustering.
-l  Restricts the reclustering to the IDs specified in [file]. It can
    be useful when you have a very large FASTA database and wish to
    cluster a subset of sequences.
-L  Specifies the length coverage threshold.
-p  Input sequences are proteins. Set to F for nucleotides.
-r  Specifies the file used to restore neighbors for reclustering. Set
    -C to T. This file is created by the -s option of a previous run.
    Use it if the program crashes during a run.
-s  Specifies the file in which to save the hit list. This file can
    restore a crashed run and is the input file specified by -r.
-v  Prints progress messages to the specified file. Progress is
    reported to standard output if no file is specified.
-W  The word size; same as blastall. Default: 3 for protein, 32 for
    nucleotide.
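
The crash recovery procedure described for -C and -r amounts to rerunning the original command with two options added. The following Python sketch shows one way to derive the restart command from the crashed one. The function name restart_args and the hit list file name my_pepdb.hits are hypothetical, and the sketch only builds the argument list; it does not run blastclust.

```python
from typing import List


def restart_args(crashed_args: List[str]) -> List[str]:
    """Return the argument list for resuming a crashed blastclust run."""
    if "-s" not in crashed_args:
        raise ValueError("the crashed run did not save a hit list with -s")
    idx = crashed_args.index("-s")
    if idx + 1 >= len(crashed_args):
        raise ValueError("-s was given without a file name")
    hit_file = crashed_args[idx + 1]
    # Keep the original command line, including the same -s, and add
    # only -C T and -r pointing at the saved hit list.
    return crashed_args + ["-C", "T", "-r", hit_file]


# Hypothetical crashed run that was saving hits to my_pepdb.hits.
crashed = ["blastclust", "-i", "my_pepdb", "-o", "my_pepdb.clusters",
           "-s", "my_pepdb.hits"]
print(" ".join(restart_args(crashed)))
```

Because the same file is passed to both -r and -s, the restarted run reads the hits found before the crash and continues appending to the same file.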