H-U

homologous

In sequence analysis, homologous means derived from a common ancestor. Sequences are either homologous or they aren't. It is incorrect to say that sequences are 80 percent homologous unless you mean that there is an 80 percent chance of common ancestry. Use percent identity to describe the similarity of alignments.
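
As an illustration of reporting percent identity rather than "percent homology," here is a minimal Python sketch. The function name percent_identity and the choice to divide by the full alignment length (gap columns included) are assumptions made for this example; other conventions divide by the shorter sequence or by ungapped columns only.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns that hold the same letter in both rows.

    Both arguments are rows of one alignment, with '-' marking gaps.
    Assumption: the denominator is the alignment length, gaps included.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned rows must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    # A column counts as identical only if both rows carry the same residue.
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)
```

Under these assumptions, percent_identity("ACGT", "ACGA") gives 75.0. The number describes the alignment only; it says nothing by itself about common ancestry.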



hydrophilic

Literally, "likes water." Water is a polar molecule that mixes well with other polar molecules. The charged amino acids K, R, D, and E are examples of hydrophilic amino acids.



hydrophobic

Literally, "fears water." Nonpolar molecules (like those in oils) don't mix well with water. The amino acids L, I, V, and F are particularly hydrophobic.



Karlin-Altschul

The standard local alignment theory is often called Karlin-Altschul statistics after its founding authors.



lambda, λ

The Karlin-Altschul statistical parameter that converts a raw score to a normalized score.
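
The conversion can be sketched in Python as follows. Multiplying the raw score by lambda gives a score in nats. The second function uses the commonly quoted bit-score form, which also involves the companion Karlin-Altschul parameter K; K is not defined in this entry, so treat it, and both function names, as assumptions of this sketch.

```python
import math


def normalized_score_nats(raw_score: float, lam: float) -> float:
    """Scale a raw alignment score by lambda, giving a score in nats."""
    return lam * raw_score


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Bit score in the form (lambda * S - ln K) / ln 2.

    Assumption: k is the Karlin-Altschul K parameter for the same
    scoring scheme that produced lam.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return (lam * raw_score - math.log(k)) / math.log(2)
```

Both lambda and K depend on the scoring matrix and gap costs in use, so values from one scoring scheme should not be applied to raw scores from another.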



local alignment

An alignment algorithm that finds the optimal subsequence alignment. The alignment may include all letters of each sequence, but it isn't required to do so.
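
A minimal Python sketch of local alignment scoring by dynamic programming follows. It returns only the best score, not the alignment itself. The function name and the match, mismatch, and linear gap values are illustrative assumptions, not parameters taken from any particular program.

```python
def local_alignment_score(a: str, b: str,
                          match: int = 1,
                          mismatch: int = -1,
                          gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty).

    Cell values are floored at zero, so an alignment may start anywhere;
    taking the maximum over all cells lets it end anywhere.
    """
    prev = [0] * (len(b) + 1)   # scores for the previous row
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                  # start a new alignment here
                prev[j - 1] + pair, # align a[i-1] with b[j-1]
                prev[j] + gap,      # gap in b
                curr[j - 1] + gap,  # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best
```

With these assumed scores, two sequences that share only the substring ACG score 3, no matter how much unrelated sequence surrounds it.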



low-complexity sequence

Regions of sequences that are highly predictable; for example, a region that is 90 percent A or T.
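
To make the 90 percent A or T example concrete, here is a simple sliding-window sketch in Python. It is not the filter used by any particular program; the function name, window size, and threshold are assumptions chosen for illustration.

```python
def at_rich_windows(seq: str, window: int = 20,
                    threshold: float = 0.9) -> list:
    """Start positions of windows whose A+T fraction is >= threshold."""
    if window <= 0:
        raise ValueError("window must be positive")
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - window + 1):
        chunk = seq[start:start + window]
        at_fraction = (chunk.count("A") + chunk.count("T")) / window
        if at_fraction >= threshold:
            hits.append(start)
    return hits
```

Real low-complexity filters measure compositional bias more generally, so this sketch only flags the single kind of region described in the example.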



methionine

One of the 20 common amino acids. Methionine is abbreviated as M or Met, and is especially important because translation of proteins begins with a methionine. There is only one codon for this amino acid: ATG.



mutation

Any change in the sequence of a DNA molecule.



N-terminus

The start of a protein. In text form, a protein's N-terminus is always at the left.



nat

Contraction for natural log digits. The base e logarithm of a number is in units of nats.
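
The relationship between nats and bits is a change of logarithm base, which can be sketched in Python as follows. The function names are made up for this example.

```python
import math


def nats_to_bits(nats: float) -> float:
    """Convert a quantity measured in nats (base e) to bits (base 2)."""
    return nats / math.log(2)


def bits_to_nats(bits: float) -> float:
    """Convert a quantity measured in bits (base 2) to nats (base e)."""
    return bits * math.log(2)
```

For example, the natural log of 8 is about 2.08 nats, which corresponds to 3 bits.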



natural selection

A theory founded by Charles Darwin that explains how organisms change over time to better fit their environment. It is based on the principles of variation, heritability, and differential reproduction.



ncRNA

The abbreviation for noncoding RNA. Some RNAs, like tRNAs or rRNAs, don't contain information for protein sequences.



Needleman-Wunsch

Global alignment is often called Needleman-Wunsch after the authors who first described the algorithm.
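
A minimal Python sketch of global alignment scoring follows. Unlike local alignment, every letter of both sequences must be accounted for, so leading and trailing gaps are penalized. The function name and the match, mismatch, and linear gap values are illustrative assumptions.

```python
def global_alignment_score(a: str, b: str,
                           match: int = 1,
                           mismatch: int = -1,
                           gap: int = -2) -> int:
    """Best end-to-end alignment score of a and b (linear gap penalty)."""
    # Row 0: aligning the empty prefix of a against j letters of b costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning i letters of a against nothing costs i gaps.
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                prev[j - 1] + pair,  # align a[i-1] with b[j-1]
                prev[j] + gap,       # gap in b
                curr[j - 1] + gap,   # gap in a
            )
        prev = curr
    # The bottom-right cell covers all of a and all of b.
    return prev[-1]
```

With these assumed scores, aligning ACGT with ACT gives three matches and one gap, for a total of 1.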



nucleotide

The basic building block of nucleic acid sequences (DNA and RNA). DNA is made from A, C, G, or T, while RNA contains A, C, G, or U.



nt

The abbreviation for nucleotide.



O(n)

The computational complexity of an algorithm is often described by its asymptotic behavior. O(n) problems grow linearly with the size of the input. O(log2 n) problems grow much more slowly, and O(n^2) problems grow much more quickly.



ORF

Abbreviation for open reading frame. Each strand of DNA has three frames. Any subsequence that doesn't contain stop codons in a particular frame is an open reading frame.
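
The definition above can be sketched in Python by scanning each of the three frames of one strand and reporting the maximal stretches between stop codons. The function name, the min_codons parameter, and the (frame, start, end) output format are assumptions of this sketch; note that it does not require a start codon, in keeping with the definition given here.

```python
STOP_CODONS = {"TAA", "TGA", "TAG"}


def open_reading_frames(dna: str, min_codons: int = 1) -> list:
    """Maximal stop-free stretches in the three frames of one strand.

    Returns (frame, start, end) tuples with 0-based, end-exclusive
    coordinates. The stop codon itself is not part of the stretch.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        run_start = frame
        pos = frame
        while pos + 3 <= len(dna):
            if dna[pos:pos + 3] in STOP_CODONS:
                # Close the current stretch if it is long enough.
                if (pos - run_start) // 3 >= min_codons:
                    orfs.append((frame, run_start, pos))
                run_start = pos + 3
            pos += 3
        # Close the stretch that runs to the end of the sequence.
        if (pos - run_start) // 3 >= min_codons:
            orfs.append((frame, run_start, pos))
    return orfs
```

To cover the other strand, the same function can be applied to the reverse complement of the sequence.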



ortholog

Genes that are separated by speciation (i.e., the same gene in different species). This is often approximated as the best reciprocal match between two complete genomes or proteomes.
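
The best reciprocal match approximation can be sketched in Python as follows. The sketch assumes the best match of every gene has already been determined in both directions (for example, from two all-against-all searches) and stored in dictionaries; the function name and the gene names in the usage note are hypothetical.

```python
def reciprocal_best_matches(best_a_to_b: dict, best_b_to_a: dict) -> list:
    """Pairs (gene_a, gene_b) that are each other's best match.

    best_a_to_b maps each gene of genome A to its best match in genome B;
    best_b_to_a maps each gene of genome B to its best match in genome A.
    """
    pairs = []
    for gene_a, gene_b in best_a_to_b.items():
        # Keep the pair only if the match holds in the other direction too.
        if best_b_to_a.get(gene_b) == gene_a:
            pairs.append((gene_a, gene_b))
    return sorted(pairs)
```

For example, if a1 matches b1 and b1 matches a1, the pair is kept, but if a2 matches b1 while b1 matches a1, a2 has no reciprocal partner and is dropped.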



palindrome

A palindrome in DNA is a sequence that is read the same on the plus and minus strands. For example, the sequence GAATTC is a palindrome. Palindromes and near-palindromes are often sites for DNA-protein interaction. Proteins scanning along DNA "see" a palindrome as the same sequence regardless of which direction they are moving.
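
A short Python sketch of this test: a DNA sequence is a palindrome when it equals its own reverse complement. The function names are chosen for this example, and the sketch handles only the letters A, C, G, and T.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Sequence of the opposite strand, read in its own 5' to 3' direction."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def is_palindrome(dna: str) -> bool:
    """True if the sequence reads the same on the plus and minus strands."""
    return dna.upper() == reverse_complement(dna)
```

Here is_palindrome("GAATTC") is true, because the complement CTTAAG read in reverse is again GAATTC.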



PAM

An acronym for Percent or Point Accepted Mutation. PAM scoring matrix names are usually followed by a number (e.g., PAM200), which indicates how many iterations of multiplication were used starting with the PAM1 matrix. The higher number indicates a more distant similarity.



paralogs

Genes that are duplicated within a single genome. Duplication sometimes allows one of the genes to take on a specialized function.



phylogenetics

The study of evolutionary relationships among organisms.



prokaryotes

Organisms that don't contain membrane-bound intracellular organelles, such as a nucleus. All bacteria are prokaryotes.



proteome

The complete set of all proteins produced by a particular organism. Many proteins undergo post-translational modifications that add or subtract features from a protein. Therefore, a particular mRNA might have many different protein isoforms.



pseudogene

A sequence that looks like a gene but isn't. Most pseudogenes are derived from mRNAs that have been reverse-transcribed back to DNA and inserted into the genome. They have the hallmarks of RNA processing, notably a poly-A tail and no introns.



relative entropy

The average number of bits (or nats) per aligned letter for a given scoring scheme.



repeat

Any class of sequence that appears multiple times in a genome. Usually, gene families aren't called repeats and the term is used for junk DNA. Some of the most common repeats in the human genome include the Alu and LINE families.



reverse transcriptase

A protein that creates DNA from an RNA template.



RNA

Ribonucleic acid. RNA is chemically similar to DNA but not used strictly for storage. Many RNA molecules have important functions in the cell and may even have enzymatic properties. Some of the most common functional RNA molecules include rRNAs and tRNAs.



RNA polymerase

A protein or multiprotein complex that creates RNA from a DNA template.



ribosome

A complex macromolecule made up of proteins and rRNAs. Ribosomes are responsible for translating mRNAs into proteins.



rRNA

Ribosomal RNA. The ribosome is composed of many specific RNA molecules, and these components are called rRNAs. rRNAs are some of the most abundant RNAs in a cell.



Smith-Waterman

Local alignment is often referred to as Smith-Waterman, after the authors who first described the algorithm.



start codon

ATG. Codes for the amino acid methionine. Many proteins have N-terminal post-translational modifications, and the first amino acid of the mature protein may therefore not be methionine.



stop codon

TAA, TGA, and TAG are the three codons that terminate translation.



sum statistics

A method that determines the aggregate statistical significance of multiple local alignments.



target frequency

The expected frequencies of individual letter pairings. For nucleotide scoring matrices, the target frequency is often summarized by the expected percent identity in sequences with unbiased composition.
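
For the nucleotide case, the link between expected percent identity, target frequencies, and scores can be sketched in Python. The sketch assumes unbiased composition (each letter at frequency 0.25), spreads the identity evenly over the 4 matching pairings and the remainder over the 12 mismatching pairings, and uses log-odds scores in bits. The function name is made up for this example.

```python
import math


def nucleotide_scoring(percent_identity: float) -> tuple:
    """Match score, mismatch score (bits), and relative entropy (bits).

    Assumptions: uniform background (0.25 per letter) and target
    frequencies spread evenly across matching and mismatching pairings.
    """
    if not 25.0 < percent_identity < 100.0:
        raise ValueError("percent identity must be between 25 and 100")
    identity = percent_identity / 100.0
    q_match = identity / 4            # target frequency of each of 4 matches
    q_mismatch = (1 - identity) / 12  # target frequency of each of 12 mismatches
    expected = 0.25 * 0.25            # chance frequency of any letter pairing
    match_bits = math.log2(q_match / expected)
    mismatch_bits = math.log2(q_mismatch / expected)
    # Relative entropy: average score per aligned pair under the target frequencies.
    relative_entropy = 4 * q_match * match_bits + 12 * q_mismatch * mismatch_bits
    return match_bits, mismatch_bits, relative_entropy
```

Under these assumptions, a target of 75 percent identity gives match and mismatch scores of equal magnitude and opposite sign, which is why a simple +1/-1 scheme corresponds to that level of identity.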



transcriptome

The complete set of transcripts for a particular genome. This term is often used to mean the mRNAs of protein coding genes and their alternatively spliced variants.



tRNA

The abbreviation for transfer RNA. tRNAs transfer individual amino acids to the ribosome. Each tRNA molecule has an anticodon that matches the reverse-complement of the codon for the amino acid it carries.
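
The codon-to-anticodon relationship can be sketched in Python as a reverse complement in the RNA alphabet. The function name is an assumption of this sketch, and it ignores wobble pairing, so it gives only the exact Watson-Crick partner of a codon.

```python
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def anticodon(codon: str) -> str:
    """Exact-match anticodon for a codon, written 5' to 3'.

    Accepts DNA or RNA letters; T is treated as U.
    """
    rna = codon.upper().replace("T", "U")
    if len(rna) != 3 or set(rna) - set("ACGU"):
        raise ValueError("codon must be three letters from A, C, G, T/U")
    return rna.translate(RNA_COMPLEMENT)[::-1]
```

For the methionine codon ATG (AUG in RNA), this gives the anticodon CAU.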



UTR

The abbreviation for an untranslated region. The 5´ and 3´ ends of an mRNA have untranslated regions. These regions sometimes play regulatory roles that change the mRNA's stability, translatability, or localization.