This section describes each format in detail and follows each description with an example. To create the examples, the authors ran a BLASTP search of the coelacanth HoxA11 protein sequence (AAG39070) against the HoxDB.pep database, which is included in the online supplement.
Option 0 is the default alignment and the classic BLAST format. The definition line of the subject is given at the top of each entry, marked with the greater-than sign (>) and followed by the subject's total length. For each HSP of a subject, the score, expect value, identities, positives, and gaps are reported, followed by a pairwise alignment. For the pairwise alignment in Figure A-1, the query sequence is shown on the first row and the subject on the third row. Gaps in either sequence are represented by a dash (-). Between the query and subject lies the alignment row, which shows the residue for identities, a plus (+) for positive-scoring mismatches, and a space for all other mismatches. In BLASTN alignments, the middle row has vertical bars (|) for identities and nothing for mismatches.
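The BLASTN rule for the middle row (a vertical bar for an identity, nothing for a mismatch) can be sketched in a few lines of Python. The function name `blastn_midline` and the sample sequences are invented for illustration. This is only a sketch of the display rule, not the way BLAST itself builds the row.

```python
def blastn_midline(query_row: str, subject_row: str) -> str:
    """Build a BLASTN-style middle row for two already-aligned rows.

    Assumes both rows have the same length and use '-' for gaps.
    """
    if len(query_row) != len(subject_row):
        raise ValueError("aligned rows must have the same length")
    return "".join(
        # A bar marks an identity; gaps and mismatches get a space.
        "|" if q == s and q != "-" else " "
        for q, s in zip(query_row.upper(), subject_row.upper())
    )


if __name__ == "__main__":
    query = "ACGT-ACGT"      # made-up query row
    subject = "ACGTTAGGT"    # made-up subject row
    print(query)
    print(blastn_midline(query, subject))
    print(subject)
```

A protein version would need a scoring matrix to decide which mismatches earn a plus sign, so it is left out here.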
All query-anchored formats (1-6) are multiple-sequence alignments. They share the same general form, with the query at the top of each block and all matching subjects aligned on subsequent lines. The difference between showing identities and not showing them is counterintuitive. For the options that show identities (1 and 3), identical residues are symbolized with a dot (.), similar amino acids are in uppercase, and mismatches are in lowercase. For the options without identities (2, 4, 5, and 6), every residue is shown, with identities and similar residues in uppercase and mismatches in lowercase.
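Because the two display conventions are easy to confuse, a small Python sketch may help. It assumes the query and subject rows are already aligned column by column and that the query row contains no gaps. The `SIMILAR` set is a hypothetical, deliberately tiny stand-in for a real scoring matrix, and `anchored_row` is an illustrative name, not part of BLAST.

```python
# Hypothetical, deliberately tiny set of "similar" residue pairs. Real BLASTP
# decides similarity from a scoring matrix such as BLOSUM62.
SIMILAR = {frozenset("IL"), frozenset("KR"), frozenset("DE"), frozenset("ST")}


def anchored_row(query: str, subject: str, show_identities: bool) -> str:
    """Render one subject row of a query-anchored alignment.

    show_identities=True  -> identities become dots (as in Options 1 and 3)
    show_identities=False -> identities stay as uppercase residues
    """
    out = []
    for q, s in zip(query.upper(), subject.upper()):
        if s == "-":
            out.append("-")                      # gap in the subject
        elif q == s:
            out.append("." if show_identities else s)
        elif frozenset((q, s)) in SIMILAR:
            out.append(s)                        # similar: uppercase
        else:
            out.append(s.lower())                # mismatch: lowercase
    return "".join(out)


if __name__ == "__main__":
    query = "MKRDSL"     # made-up sequences
    subject = "MRRESA"
    print(query)
    print(anchored_row(query, subject, show_identities=True))
    print(anchored_row(query, subject, show_identities=False))
```

The sketch ignores subject insertions, which the nonflat and flat options lay out differently.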
In Option 1 (Figure A-2), identical residues are represented by a dot (.), and insertions and deletions are represented in the subject sequences, but not in the query.
Option 2 (Figure A-3) is the same as Option 1 (Figure A-2), but all residues are shown, with identities and positives in uppercase and mismatches in lowercase. As with Option 1, insertions and deletions are represented in the subject sequences, but not in the query.
Option 3 (Figure A-4) is the same as Option 1 (Figure A-2), but insertions or deletions are padded in the query rather than shown in the subjects. This flat format is more compact than the nonflat one, in which residues dangle down to represent insertions within the subject sequences.
Option 4 (Figure A-5) is the same as Option 2 (Figure A-3), but insertions or deletions are padded in the query rather than shown in the subjects. Thus, the entire multiple-sequence alignment is flat, without subject insertions dangling down.
The blunt-end options (5 and 6) extend the HSPs out to the beginning and end of the entire query sequence so that each HSP is shown in all lines of the alignment. In Figure A-6, which shows Option 5, the HoxD11_chick and HoxD11_human entries have additional HSPs that appear later in the alignment (not shown). The dashes (-) at the beginning of the second HSP of each entry are what make the entry blunt.
Option 6 (Figure A-7) is the same as Option 5 (Figure A-6), but the insertion and deletion characters are placed in the query, which makes the alignment flat, without any dangling insertions in the subject alignment lines.
The BLAST eXtensible Markup Language (XML) output of Option 7 is specified by the Document Type Definition (DTD) file, NCBI_BlastOutput.dtd, which is located at http://www.ncbi.nlm.nih.gov/dtd/. This format isn't meant to be human-readable, but Figure A-8 shows the first few lines from the BLASTP search so you can get a feel for how XML looks.
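Since the XML format is meant for programs rather than people, here is a minimal sketch of reading it with Python's standard `xml.etree.ElementTree` module. The element names (`Hit`, `Hit_def`, `Hsp`, `Hsp_bit-score`, `Hsp_evalue`) are the ones NCBI BLAST XML is generally known to use, but check them against NCBI_BlastOutput.dtd before relying on them. The sample document is invented and heavily abbreviated, and `summarize_hits` is an illustrative name.

```python
import xml.etree.ElementTree as ET

# Invented, heavily abbreviated sample; real reports carry many more elements.
SAMPLE_XML = """<?xml version="1.0"?>
<BlastOutput>
  <BlastOutput_program>blastp</BlastOutput_program>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_hits>
        <Hit>
          <Hit_def>hypothetical_subject_1</Hit_def>
          <Hit_len>300</Hit_len>
          <Hit_hsps>
            <Hsp>
              <Hsp_bit-score>100.5</Hsp_bit-score>
              <Hsp_evalue>1e-20</Hsp_evalue>
            </Hsp>
          </Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>
"""


def summarize_hits(xml_text: str) -> list:
    """Return (subject definition, bit score, e-value) for every HSP."""
    root = ET.fromstring(xml_text)
    rows = []
    for hit in root.iter("Hit"):
        definition = hit.findtext("Hit_def")
        for hsp in hit.iter("Hsp"):
            bit_score = float(hsp.findtext("Hsp_bit-score"))
            e_value = float(hsp.findtext("Hsp_evalue"))
            rows.append((definition, bit_score, e_value))
    return rows


if __name__ == "__main__":
    for definition, bit_score, e_value in summarize_hits(SAMPLE_XML):
        print(definition, bit_score, e_value)
```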
The tabular format of Option 8 is easy to parse. All fields in Figure A-9 are tab-delimited. The fields are query id, subject id, percent identity, alignment length, mismatches, gap openings, query start, query end, subject start, subject end, e-value, and bit score.
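To illustrate how little code the tabular format needs, the following Python sketch turns each tab-delimited line into a typed record using the twelve fields listed above. The names `TabularHit` and `parse_tabular` and the sample line are made up for this example. The sketch assumes every e-value is written in a form Python's `float()` accepts.

```python
from typing import Iterable, Iterator, NamedTuple


class TabularHit(NamedTuple):
    query_id: str
    subject_id: str
    percent_identity: float
    alignment_length: int
    mismatches: int
    gap_openings: int
    query_start: int
    query_end: int
    subject_start: int
    subject_end: int
    e_value: float
    bit_score: float


def parse_tabular(lines: Iterable[str]) -> Iterator[TabularHit]:
    """Yield one TabularHit per non-empty, tab-delimited line."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        f = line.split("\t")
        if len(f) != 12:
            raise ValueError(f"expected 12 fields, got {len(f)}: {line!r}")
        # Fields 3 through 9 (zero-based) are the seven integer columns.
        yield TabularHit(f[0], f[1], float(f[2]), *map(int, f[3:10]),
                         float(f[10]), float(f[11]))


if __name__ == "__main__":
    # Invented line, not taken from Figure A-9.
    sample = ["query1\tsubject1\t45.50\t200\t100\t3\t1\t198\t5\t204\t1e-40\t150.2\n"]
    for hit in parse_tabular(sample):
        print(hit.subject_id, hit.percent_identity, hit.e_value)
```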
Option 9 (Figure A-10) is the same as Option 8 (Figure A-9), except that it includes four header lines at the top of each query's results. These lines describe the BLAST program, the query, the database, and the fields of the alignment.
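A parser for this variant only has to deal with the header lines. The Python sketch below assumes that header lines start with `#` and that the line describing the fields starts with `# Fields:` and separates the names with commas. Verify both assumptions against your own output. The function name and all sample lines are invented for illustration.

```python
from typing import Dict, Iterable, Iterator


def parse_commented_tabular(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per data line, keyed by the names in the fields header."""
    fields = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            # Assumed header layout: "# Fields: name, name, ..."
            if line.startswith("# Fields:"):
                names = line[len("# Fields:"):].split(",")
                fields = [name.strip() for name in names]
            continue
        if not line:
            continue
        values = line.split("\t")
        if fields is None or len(fields) != len(values):
            raise ValueError(f"data line does not match header: {line!r}")
        yield dict(zip(fields, values))


if __name__ == "__main__":
    # Invented sample; the header wording is an assumption.
    sample = [
        "# BLASTP (hypothetical version line)",
        "# Query: query1",
        "# Database: HoxDB.pep",
        "# Fields: Query id, Subject id, % identity, alignment length, "
        "mismatches, gap openings, q. start, q. end, s. start, s. end, "
        "e-value, bit score",
        "query1\tsubject1\t45.50\t200\t100\t3\t1\t198\t5\t204\t1e-40\t150.2",
    ]
    for row in parse_commented_tabular(sample):
        print(row["Subject id"], row["bit score"])
```

Values are left as strings here so that the sketch doesn't have to guess a type for each column name.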
Option 10 produces output in Abstract Syntax Notation One (ASN.1), a data format standardized by the International Organization for Standardization (ISO). ASN.1 is used to mark up data for reliable, robust exchange. Like XML, it isn't meant to be human-readable, but Example A-1 shows the first few lines of a BLASTP search so you can get a feel for the syntax. For more information on the NCBI use of ASN.1, see http://www.ncbi.nlm.nih.gov/Sitemap/Summary/asn1.html.
Seq-annot ::= {
  desc {
    user {
      type str "Hist Seqalign" ,
      data {
        {
          label str "Hist Seqalign" ,
          data bool TRUE } } } ,
    user {
      type str "Blast Type" ,
      data {
        {
          label str "BLASTP" ,
          data int 2 } } } } ,
  data align {
    {
      type partial ,
      dim 2 ,
      score {
        {
          id str "score" ,
          value int 699 } ,
        {
          id str "e_value" ,
          value real { 139321249, 10, -85 } } ,
        {
          id str "bit_score" ,
          value real { 273862735, 10, -6 } } ,
        {
          id str "num_ident" ,
          value int 140 } } ,
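The `real` values in the example are not as opaque as they look. Assuming the three integers are mantissa, base, and exponent, which is how ASN.1 text notation for real numbers is normally read, each one decodes as mantissa × base^exponent. The Python sketch below applies that reading to the two values in the example. The function name `asn1_real` is made up for illustration.

```python
from fractions import Fraction


def asn1_real(mantissa: int, base: int, exponent: int) -> float:
    """Decode an ASN.1 text-form real { mantissa, base, exponent }.

    Exact rational arithmetic avoids rounding until the final conversion.
    """
    return float(Fraction(mantissa) * Fraction(base) ** exponent)


if __name__ == "__main__":
    # The two real values from the example above.
    print(asn1_real(139321249, 10, -85))   # e_value
    print(asn1_real(273862735, 10, -6))    # bit_score
```

Under that assumption, the e-value is about 1.4e-77 and the bit score is about 273.9.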
Option 11 produces the same ASN.1 output as Option 10, but in binary format. It isn't human-readable and therefore isn't shown.