Algorithms, Design, Code and more: Shanon Entropy and Information Gain

Monday, November 11, 2013

Shanon Entropy and Information Gain

Shanon's Information Gain/ Entropy theory gets applied a lot in areas such as data encoding, compression and networking. Entropy, as defined by Shanon, is a measure of the unpredictability of a given message. The higher the entropy the more unpredictable the content of the message is to a receiver.

Correspondingly, a high Entropy message is also high on Information Content. On receiving a high Entropy/ high Information Content laden message, the receiver has a high Information Gain.

On the other hand, when the receiver already knows the contents (or of a certain bias) of the message, the Information Content of the message is low. On receiving such a message the receiver has less Information Gain. Effectively once the uncertainty about the content of the message has reduced, the Entropy of the message has also dropped and the Information Gain from receiving such a message has gone down. The reasoning this far is quite intuitive.

The formula for Entropy (H) calculation:
H(X) = -Summation[ p(x) * log( p(x) )] over all possible values/ outcomes of x, i.e. {x1, ..., xn}
where p(x) = probability of each of the values/ outcomes of x {x1, ..., xn}.
The log is in a certain base b.

(All calculations in base 2)
Eg 1.a: In the case of a single fair coin toss:
x = {H, T}
& p(x) = {1/2, 1/2}

H(X) = -[1/2 * log(1/2) + 1/2 * log(1/2) ] = 1

Eg 1.b: For a biased coin toss, with three times higher likelihood of a Head:
x = {H, T}
& p(x) = {3/4, 1/4}

H(X) = -[3/4 * log(3/4) + 1/4 * log (1/4) ] = 0.811 (Entropy is lower than 1.a, Information Gain is lower).

Eg 1.c: For a completely biased coin toss, which two Heads:
x = {H, T}
& p(x) = {1, 0}

H(X) = -1[ 1*log(1) + 0*log(0) ] = 0 (Entropy is zero)

The Entropy (& unpredictability) is the highest for a fair coin (example 1.a) and decreases for a biased coin (examples 1.b & 1.c). Due to the bias the receiver is able to predict the outcome (favouring the known bias) in the later case resulting in a lower Entropy.

The observation from the (2-outcomes) coin toss case generalizes to the N-outcomes case, and the Entropy is found to be highest when all N-outcomes are equally likely (fair).

Algorithms, Design, Code and more

Monday, November 11, 2013

Shanon Entropy and Information Gain

No comments:

Post a Comment