
Source Coding Theorem

Let $R=\sum_x l(x)p(x)=E(l)$ be the average codeword length (in bits per source output), where $l(x)$ is the length of the codeword assigned to source output $x$.
A source with entropy rate $H$ can be coded with arbitrarily small probability of coding failure at any rate $R$ (bits/source output) if $R>H$. However, if $R<H$, the failure probability is bounded away from zero (no matter how hard you try).
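As a quick sanity check (numbers chosen here purely for illustration), consider a binary source with $P(1)=0.1$ and $P(0)=0.9$:

\begin{displaymath}H = -0.1\log_2 0.1 - 0.9\log_2 0.9 \approx 0.469 \mbox{ bits/source output}\end{displaymath}

The theorem promises reliable coding at, say, $R=0.5$ bits/source output, and rules it out at $R=0.4$.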

Aside
This makes the code for (III) from before OPTIMAL! You can't do any better.

Why, you ask?
Look at a long sequence of $n$ source letters and only code the "typical" ones.
If $\underline{x}$ is typical, then by the LLN (law of large numbers)

\begin{displaymath}P(\underline{X}=\underline{x})\approx \prod_{j=1}^J p_j ^{n p_j} \mbox{\hspace{2cm} (J source letters)}\end{displaymath}

because
typical sequences will have (on average)
$\approx n p_j$ occurrences of letter $j$.
Well

\begin{displaymath}p_j^{n p_j}= 2^{\log p_j^{n p_j}} = 2^{n p_j \log p_j} \end{displaymath}

So

\begin{displaymath}P(\underline{X}=\underline{x})\approx \prod_{j=1}^J 2^{np_j \log p_j}=
2^{n\sum_j p_j\log p_j}= 2^{-n H(X)}\end{displaymath}
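For the illustrative binary source above (letter probabilities $0.9$ and $0.1$) with $n=100$, a typical sequence has about 90 of one letter and 10 of the other, so

\begin{displaymath}P(\underline{X}=\underline{x})\approx 0.9^{90}\, 0.1^{10} = 2^{90\log_2 0.9 + 10\log_2 0.1}\approx 2^{-46.9}=2^{-n H(X)}\end{displaymath}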

HAND WAVE $\Rightarrow$ the probability that a sequence is not typical is negligible for large $n$.
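To count the typical sequences (a step the notes compress; $N$ is notation introduced here for that count): the typical set carries essentially all of the probability, and each typical sequence has probability $\approx 2^{-n H(X)}$, so

\begin{displaymath}1 \approx N\cdot 2^{-n H(X)} \quad\Longrightarrow\quad N\approx 2^{n H(X)}\end{displaymath}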
Therefore, the number of typical sequences is $\approx 2^{n H(X)}$.
So code each of the typical sequences with $nH(X)$ bits AND ignore the others!

\begin{displaymath}\frac{\mbox{code digits}}{\mbox{source digit}}=H(X)\end{displaymath}
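Closing out the illustrative binary example: with $n=100$ and $H(X)\approx 0.469$, there are only $\approx 2^{46.9}$ typical sequences, so $\lceil nH(X)\rceil = 47$ code bits suffice to index them (versus $100$ bits uncoded):

\begin{displaymath}\frac{47\mbox{ code digits}}{100\mbox{ source digits}}\approx 0.47\approx H(X)\end{displaymath}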



Christopher Rose
1999-02-24