Fano inequality

From Scholarpedia

Robert Mario Fano (2008), Scholarpedia, 3(10):6648. revision #50378 [link to/cite this article]

Curator: Dr. Robert Mario Fano, Ford Professor of Engineering, Emeritus, EECS Department, MIT, Cambridge, MA

The inequality that became known as the Fano inequality pertains to a model of communications system in which a message selected from a set of N possible messages is encoded into an input signal for transmission through a noisy channel and the resulting output signal is decoded into one of the same set of possible messages. Successive input messages are assumed to be statistically independent.

Contents

Origin

Let X be the set of input messages, Y the corresponding set of output messages and P(x_k , y_j) = P(y_j) P(x_k \mid y_j ) be the joint probability of message x_k being transmitted and message y_j being received. An error occurs when the output message differs from the input message, that is when j \ne k. Let P(e) be the probability of this event and H(E)= -P(e) \log P(e) - (1- P(e)) \log (1- P(e))be the associated binary entropy.

The conditional entropy H(X \mid Y) = \sum_{j} P(y_j)  H(X \mid y_j ) represents the average amount of information lost because of the channel noise and is referred to as the “equivocation”. It is the average value of

(1)
H(X \mid y_j ) =  - \sum_{k} P(x_k \mid y_j ) \log P(x_k \mid y_j )

which is the average amount of information still needed to specify the input message when a particular output message y_j is specified.

The Fano inequality is

(2)
H(X \mid Y) \le H(E) + P(e) \log (N-1)

It first appeared as Eq. 4.35 in the 1953 edition of the lecture notes distributed to M.I.T. students in the graduate subject Statistical Theory of Information, and, later on, as Eq, 6.16 in the final textbook published in 1961 (Fano, 1961).

The inequality resulted from an early attempt to relate the equivocation, the information theory measure of the effect of channel noise, to the probability of error, the traditional practical measure. It was originally based on the following heuristic argument.

The equivocation, because of its very definition, cannot exceed the information provided about the input messages by any procedure capable of correcting output errors. The first step in correcting output errors is to specify for each input message whether the corresponding output message is correct or incorrect. The amount of information provided by this specification is given by the binary entropy H(E), the first term on the right hand side of (2). Then, for each incorrect output message the correct input message must be identified. The number of possible input messages is N-1 because the output message is known to be incorrect. Thus the amount of information necessary to identify the input message cannot exceed \log(N-1), and the corresponding average amount of information to be provided cannot exceed P(e) \log(N-1), the second term on the right hand side of (2).

Derivation

The inequality may be formally derived as follows. Let

P(e | y_j ) = \sum_{k \ne j} P(x_k \mid y_j )

be the probability of error when y_j is the output message, that is, when the input message is any x_k with k \ne j. Then,

1 - P(e \mid y_j) = P(x_j \mid y_j)

is the probability of the output message being correct, that is, of the input message being x_j, the one corresponding to the given output message y_j. The corresponding binary entropy is

H(E \mid y_j) = - P(e \mid y_j) \log P(e \mid y_j) - (1- P(e \mid y_j)) \log (1- P(e \mid y_j))

The probability of message x_k being the correct input message when it is known that the output message y_j is incorrect is

P_c (x_k \mid y_j) =  P_{k \ne j} (x_k \mid y_j) / P(e \mid y_j)

Now, (1) can be rewritten in the form

\begin{align} H(X \mid y_j) & = H(E \mid y_j) + P(e \mid y_j) \log P(e \mid y_j)  - P(e \mid y_j) \sum_{k \ne j} P_c (x_k \mid y_j) \log P(x_k \mid y_j)\\ & = H(E \mid y_j) - P(e \mid y_j) \sum_{k \ne j} P_c (x_k \mid y_j) \log P_c(x_k \mid y_j)\\ & = H(E \mid y_j) + P(e \mid y_j) H(X_{k \ne j} \mid y_j) \end{align}

where

H(X_{k \ne j} \mid y_j) = -\sum_{k \ne j} P_c(x_k \mid y_j) \log P_c(x_k \mid y_j)

is the entropy of an ensemble of N-1 elements whose value cannot exceed \log (N-1). Thus,

H \left( X \mid y_j \right) \le H(E \mid y_j) + P(e \mid y_j) \log (N-1)

which, when averaged over the Y ensemble, yields

H(X \mid Y) \le H(E \mid Y) + P(e) \log (N-1)

which, in turn, yields (2), since H(E) \ge H(E \mid Y).

References

  • Fano, RM. “Transmission of Information”, the M.I.T. Press and John Wiley and Sons, New York & London, 1961.

Internal references

  • Tomasz Downarowicz (2007) Entropy. Scholarpedia, 2(11):3901.


Recommended Reading

  • Cover, T.M. and Thomas, J.A. (1991). Elements of information theory. John Wiley & Sons, New York, NY.

External Links

See Also

Information Theory, Entropy


Robert Mario Fano (2008) Fano inequality. Scholarpedia, 3(10):6648, (go to the first approved version)
Created: 20 February 2008, reviewed: 18 October 2008, accepted: 23 October 2008
Suggested by: Mr. Nicolau Leal Werneck, USP, São Paulo, Brazil
Invited by: Dr. Eugene M. Izhikevich, Editor-in-Chief of Scholarpedia, the peer-reviewed open-access encyclopedia
Action editor: Dr. Francesco Vatalaro, University of Rome Tor Vergata, Italy
Assistant editor: Mr. Ian Stevenson, PhD Student, Northwestern University, Chicago, IL, USA
For authors