8.16 Parse BLAST Reports with Bioperl

The traditional BLAST output format is meant to be human readable, but when your BLAST report is 1,000 pages long, it isn't much fun to read. Sometimes all you want is the names of all sequences that have alignments above 90 percent identity. Such tasks require a BLAST parser that lets you select only the information you want. Many freely available BLAST parsers can be downloaded from the Internet, but the ones in most common use come from the Bioperl project. Bioperl is an open-source community of bioinformatics professionals that develops and maintains code libraries and applications written in the Perl programming language. If your daily routine finds you running BLAST or other sequence analysis applications, learning to use the Bioperl system can save you many hours of work and frustration.

Let's see how Bioperl can help solve the problem posed earlier: to report the names of all sequences that are more than 90 percent identical to your query.

#!/usr/bin/perl -w
use strict;
use Bio::SearchIO;

my $blast = Bio::SearchIO->new(
    -format => 'blast',
    -file   => $ARGV[0]);

my %Name;
my $result = $blast->next_result;
while(my $sbjct = $result->next_hit) {
    while(my $hsp = $sbjct->next_hsp) {
        # keep the hit if any of its HSPs is more than 90 percent identical
        $Name{$sbjct->name} = 1 if $hsp->frac_identical > 0.9;
    }
}

print join("\n", sort keys %Name), "\n";

Pretty simple, huh? With BLAST and Bioperl, it's possible to create all kinds of useful applications.
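
The script above reads only the first result in the report. A BLAST report can hold results for several queries, so here is a minimal sketch of one way to extend the same idea, assuming the report is a multi-query file in the traditional text format. The usage message and the indented output layout are illustrative choices, not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SearchIO;

# Assumption: the report file is the first command-line argument
# and may contain results for more than one query.
my $file = shift @ARGV or die "usage: $0 <blast_report>\n";

my $blast = Bio::SearchIO->new(
    -format => 'blast',
    -file   => $file);

# Loop over every result (one per query), not just the first.
while (my $result = $blast->next_result) {
    my %Name;
    while (my $sbjct = $result->next_hit) {
        while (my $hsp = $sbjct->next_hsp) {
            # keep the hit if any HSP is more than 90 percent identical
            $Name{$sbjct->name} = 1 if $hsp->frac_identical > 0.9;
        }
    }

    # Print the query name, then its qualifying hits indented beneath it.
    print $result->query_name, "\n";
    print "    $_\n" for sort keys %Name;
}
```

Resetting %Name inside the outer loop keeps each query's hits separate. A query with no qualifying hits still prints its name, which makes it easy to spot queries that found nothing.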