It's often useful to have both the standard report and tabular data for the same search. This program converts standard WU-BLAST or NCBI-BLAST output to the NCBI tabular format (-m 8) described in Appendix A. It supports concatenated BLAST reports and is more efficient than most full-featured BLAST parsers. blast2table.pl can be used either on a file or in a pipe:
blast2table.pl my_blast_output > my_table_output blastp nr query | blast2table.pl > table_from_wu-blast
The Unix tee program can create both the standard output and a table at the same time:
blastall -p blastp -d nr -i query | tee standard | blast2table.pl > table
blast2table.pl also has a few useful filtering options (see Table E-1). The following command displays only those with an alignment of over 50 percent identity and with a bit score greater than 20:
blast2table.pl -p 50 -b 20 blast_output
Option |
Description |
---|---|
-b |
Minimum number of bits for each alignment |
-e |
Maximum E-value for each alignment or alignment group |
-m |
Minimum coordinate of the query sequence |
-n |
Maximum coordinate of the query sequence |
-p |
Minimum percent identity for each alignment |
The complete program is listed next and may be downloaded as blast2table.pl from this book's web site.
#!/usr/bin/perl -w use strict; use Getopt::Std; use vars qw($opt_p $opt_b $opt_e $opt_m $opt_n); getopts('p:b:e:m:n:'); my $PERCENT = $opt_p ? $opt_p : 0; my $BITS = $opt_b ? $opt_b : 0; my $EXPECT = $opt_e ? $opt_e : 1e30; my $START = $opt_m ? $opt_m : 0; my $END = $opt_n ? $opt_n : 1e30; my ($Query, $Sbjct); my $HSP = ""; while (<>) { if (/^Query=\s+(\S+)/) {outputHSP( ); $Query = $1} elsif (/^>(\S+)/) {outputHSP( ); $Sbjct = $1} elsif (/^ Score = /) { outputHSP( ); my @stat = ($_); while (<>) { last unless /\S/; push @stat, $_ } my $stats = join("", @stat); my ($bits) = $stats =~ /(\d\S+) bits/; my ($expect) = $stats =~ /Expect\S* = ([\d\-\.e]+)/; $expect = "1$expect" if $expect = ~/^e/; my ($match, $total, $percent) = $stats =~ /Identities = (\d+)\/(\d+) \((\d+)%\)/; my $mismatch = $total - $match; $HSP = {bits => $bits, expect => $expect, mismatch => $mismatch, percent => $percent, q_begin => 0, q_end => 0, q_align => "", s_begin => 0, s_end => 0, s_align => ""}; } elsif (/^Query:\s+(\d+)\s+(\S+)\s+(\d+)/) { $HSP->{q_begin} = $1 unless $HSP->{q_begin}; $HSP->{q_end} = $3; $HSP->{q_align} .= $2; } elsif (/^Sbjct:\s+(\d+)\s+(\S+)\s+(\d+)/) { $HSP->{s_begin} = $1 unless $HSP->{s_begin}; $HSP->{s_end} = $3; $HSP->{s_align} .= $2; } } outputHSP( ); sub outputHSP { return unless $HSP; return if $HSP->{percent} < $PERCENT; return if $HSP->{bits} < $BITS; return if $HSP->{expect} > $EXPECT; return if ($HSP->{q_begin} < $START or $HSP->{q_end} < $START); return if ($HSP->{q_begin} > $END or $HSP->{q_end} > $END); print join("\t", $Query, $Sbjct, $HSP->{percent}, length($HSP->{q_align}), $HSP->{mismatch}, countGaps($HSP->{q_align}) + countGaps($HSP->{s_align}), $HSP->{q_begin}, $HSP->{q_end}, $HSP->{s_begin}, $HSP->{s_end}, $HSP->{expect}, $HSP->{bits}), "\n"; $HSP = ""; } sub countGaps { my ($string) = @_; my $count = 0; while ($string =~ /\-+/g) {$count++} return $count; }