In common usage, the word information conveys many things. Set aside the everyday meanings, because here the word has a precise, technical definition: information is a decrease in uncertainty. You can also think of information as a degree of surprise.

Suppose you're taking care of a child and the response to every question you ask is "no." The child is very predictable, and you are quite certain of the answer the next time you ask a question. There's no surprise, no information, and no communication. If another child answers "yes" or "no" to some questions, you can communicate a little, but you could communicate more if her vocabulary were greater. If you ask "do you like ice cream," which most children do, you would be informed by either answer, but more surprised if the answer were "no." Qualitatively, you expect more information to be conveyed by a greater vocabulary and by surprising answers. Thus, the information or surprise of an answer is inversely related to its probability. Quantitatively, information is represented by either of the two equivalent formulations shown in Figure 4-1.

The information, *H*, associated with some probability *p* is, by convention, the base 2 logarithm of the inverse of *p*. Values computed with base 2 logarithms are given in units of bits, a contraction of the words binary and digit (it is also common to use base *e*, and the corresponding unit is the nat). For example, if the probability that a child doesn't like ice cream is 0.25, this answer carries 2 bits of information (liking ice cream carries 0.415 bits).
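The formulation in Figure 4-1 is easy to check numerically. Here is a minimal Perl sketch (the subroutine name `surprisal` is my own label for *H*, not a name from the text):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# surprisal: the information, in bits, of an event with probability $p
# H = log2(1/p) = -log2(p)
sub surprisal {
    my ($p) = @_;
    return -log($p) / log(2);  # Perl's log() is base e; divide by log(2) to convert
}

printf "p = 0.25 -> %.3f bits\n", surprisal(0.25);  # doesn't like ice cream: 2.000
printf "p = 0.75 -> %.3f bits\n", surprisal(0.75);  # likes ice cream: 0.415
```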

It is typical to describe information as a message of symbols emitted from a source. For example, tossing a coin is a source of head and tail symbols, and a message of such symbols might be:

tththttt

Similarly, the numbers 1, 2, 3, 4, 5, and 6 are symbols emitted from a six-sided die source, and the letters A, C, G, and T are emitted from a DNA source. The symbols emitted by a source have a frequency distribution. If there are *n* symbols and the frequency distribution is flat, as it is for a fair coin or die, the probability of any particular symbol is simply 1/*n*. It follows that the information of any symbol is log2(*n*), and this value is also the average. The formal name for the average information per symbol is entropy.
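For a flat distribution, then, the entropy is just log2(*n*). A quick Perl sketch of this special case for the three sources mentioned above:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# For a flat distribution over $n symbols, each symbol has probability 1/$n,
# so every symbol carries log2($n) bits -- which is also the average (entropy)
sub uniform_entropy {
    my ($n) = @_;
    return log($n) / log(2);  # convert natural log to base 2
}

printf "coin (n=2): %.3f bits\n", uniform_entropy(2);  # 1.000
printf "DNA  (n=4): %.3f bits\n", uniform_entropy(4);  # 2.000
printf "die  (n=6): %.3f bits\n", uniform_entropy(6);  # 2.585
```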

But what if all symbols aren't equally probable? To compute the entropy, you need to weigh the information of each symbol by its probability of occurring. This formulation, known as Shannon's Entropy (named after Claude Shannon), is shown in Figure 4-2.

Entropy (*H*) is the negative sum, over all *n* symbols, of the probability of a symbol (p<sub>i</sub>) multiplied by the log base 2 of that probability (log2 p<sub>i</sub>). Let's work through a couple of examples to make this clear. Start with the flip of a coin and assume that h and t each have a probability of 0.5 and therefore a log2 probability of -1. The entropy of a coin is therefore:

- ( (0.5)(-1) + (0.5)(-1) ) = 1 bit

Suppose you have a trick coin that comes up heads 3/4 of the time. Since you're a little more certain of the outcome, you expect the entropy to decrease. The entropy of your trick coin is:

- ( (0.75)(-0.415) + (0.25)(-2) ) = 0.81 bits

A random DNA source has an entropy of:

- ( (0.25)(-2) + (0.25)(-2) + (0.25)(-2) + (0.25)(-2) ) = 2 bits

However, a DNA source that emits 90 percent A or T and 10 percent G or C has an entropy of:

- ( 2(0.45)(-1.15) + 2(0.05)(-4.32) ) = 1.47 bits
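The worked examples above can be reproduced with a small Perl subroutine that applies Shannon's formula to any list of probabilities (the `entropy` subroutine here is a sketch of the formula in Figure 4-2, not code from the text):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Shannon entropy of a probability distribution, given as a list of probabilities
sub entropy {
    my @probs = @_;
    my $H = 0;
    foreach my $p (@probs) {
        $H += $p * log($p);   # accumulate p * ln(p) for each symbol
    }
    return -$H / log(2);      # negate the sum and convert base e to base 2
}

printf "fair coin:  %.2f bits\n", entropy(0.5, 0.5);                # 1.00
printf "trick coin: %.2f bits\n", entropy(0.75, 0.25);              # 0.81
printf "random DNA: %.2f bits\n", entropy(0.25, 0.25, 0.25, 0.25);  # 2.00
printf "AT-rich:    %.2f bits\n", entropy(0.45, 0.45, 0.05, 0.05);  # 1.47
```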

In these examples, you've been given the frequency distribution as some kind of truth. But it's rare to know such things a priori, and the parameters must be estimated from actual data. You may find the following Perl program informative and entertaining. It calculates the entropy of any file.

```perl
#!/usr/bin/perl
# Shannon Entropy Calculator
use strict;
use warnings;

my %Count;                  # stores the counts of each symbol
my $total = 0;              # total symbols counted
while (<>) {                # read lines of input
    foreach my $char (split(//, $_)) {  # split the line into characters
        $Count{$char}++;    # add one to this character count
        $total++;           # add one to total counts
    }
}
my $H = 0;                  # H is the entropy
foreach my $char (keys %Count) {       # iterate through characters
    my $p = $Count{$char} / $total;    # probability of character
    $H += $p * log($p);                # p * log(p)
}
$H = -$H / log(2);          # negate sum, convert base e to base 2
print "H = $H bits\n";      # output
```