Appendix E. blast2table.pl

Appendix E. blast2table.pl

It's often useful to have both the standard report and tabular data for the same search. This program converts standard WU-BLAST or NCBI-BLAST output to the NCBI tabular format (-m 8) described in Appendix A. It supports concatenated BLAST reports and is more efficient than most full-featured BLAST parsers. blast2table.pl can be used either on a file or in a pipe:

blast2table.pl my_blast_output > my_table_output
blastp nr query | blast2table.pl > table_from_wu-blast

The Unix tee program can create both the standard output and a table at the same time:

blastall -p blastp -d nr -i query | tee standard | blast2table.pl > table

blast2table.pl also has a few useful filtering options (see Table E-1). The following command displays only those with an alignment of over 50 percent identity and with a bit score greater than 20:

blast2table.pl -p 50 -b 20 blast_output

Table E-1. blast2table.pl options

Option

Description

-b

Minimum number of bits for each alignment

-e

Maximum E-value for each alignment or alignment group

-m

Minimum coordinate of the query sequence

-n

Maximum coordinate of the query sequence

-p

Minimum percent identity for each alignment

The complete program is listed next and may be downloaded as blast2table.pl from this book's web site.

#!/usr/bin/perl -w
use strict;
use Getopt::Std;
use vars qw($opt_p $opt_b $opt_e $opt_m $opt_n);
getopts('p:b:e:m:n:');
my $PERCENT = $opt_p ? $opt_p : 0;
my $BITS    = $opt_b ? $opt_b : 0;
my $EXPECT  = $opt_e ? $opt_e : 1e30;
my $START   = $opt_m ? $opt_m : 0;
my $END     = $opt_n ? $opt_n : 1e30;

my ($Query, $Sbjct);
my $HSP = "";
while (<>) {
    if    (/^Query=\s+(\S+)/) {outputHSP(  ); $Query = $1}
    elsif (/^>(\S+)/)         {outputHSP(  ); $Sbjct = $1}
    elsif (/^ Score = /) {
        outputHSP(  );
        my @stat = ($_);
        while (<>) {
            last unless /\S/;
            push @stat, $_
        }
        my $stats = join("", @stat);
        my ($bits) = $stats =~ /(\d\S+) bits/;
        my ($expect) = $stats =~ /Expect\S* = ([\d\-\.e]+)/;
        $expect = "1$expect" if $expect = ~/^e/;
        my ($match, $total, $percent)
            = $stats =~ /Identities = (\d+)\/(\d+) \((\d+)%\)/;
        my $mismatch = $total - $match;
        
        $HSP = {bits => $bits, expect => $expect, mismatch => $mismatch,
            percent => $percent, q_begin => 0, q_end => 0, q_align => "",
            s_begin => 0, s_end => 0, s_align => ""};
    }
    elsif (/^Query:\s+(\d+)\s+(\S+)\s+(\d+)/) {
        $HSP->{q_begin}  = $1 unless $HSP->{q_begin};
        $HSP->{q_end}    = $3;
        $HSP->{q_align} .= $2;
    }
    elsif (/^Sbjct:\s+(\d+)\s+(\S+)\s+(\d+)/) {
        $HSP->{s_begin}  = $1 unless $HSP->{s_begin};
        $HSP->{s_end}    = $3;
        $HSP->{s_align} .= $2;
    }
}
outputHSP(  );

sub outputHSP {
    return unless $HSP;
    return if $HSP->{percent}  < $PERCENT;
    return if $HSP->{bits}     < $BITS;
    return if $HSP->{expect}   > $EXPECT;
    return if ($HSP->{q_begin} < $START or $HSP->{q_end} < $START);
    return if ($HSP->{q_begin} > $END   or $HSP->{q_end} > $END);
    print join("\t", $Query, $Sbjct, $HSP->{percent},
        length($HSP->{q_align}), $HSP->{mismatch},
        countGaps($HSP->{q_align}) + countGaps($HSP->{s_align}),
        $HSP->{q_begin}, $HSP->{q_end}, $HSP->{s_begin}, $HSP->{s_end},
        $HSP->{expect}, $HSP->{bits}), "\n";
    $HSP = "";
}

sub countGaps {
    my ($string) = @_;
    my $count = 0;
    while ($string =~ /\-+/g) {$count++}
    return $count;
}