
Data Compression

Take source letter sequences and encode them so that you use as few bits on average as you need to, i.e. video/voice/audio compression $\rightarrow$ more playing time on a CD because you can pack more signal into it.
Data compression $\Rightarrow$ compress (Unix), zip (PC), etc.

HOW!!!??? FORMALITIES
We need a map $a_i\rightarrow c_i$ assigning each source letter a codeword $c_i$ of length $l(c_i)=l_i$. So a source sequence is encoded as $x_1x_2x_3\ldots x_n \Rightarrow c(x_1)c(x_2)\ldots c(x_n)$
$\Rightarrow$ average length per source letter is $\frac{1}{n}\sum_{i=1}^n l(c(x_i))$
This means

\begin{displaymath}E(L)=\lim_{n\rightarrow\infty}\frac{1}{n}\sum_{i=1}^n l(c(x_i))=\sum_x p(x)l(c(x))\end{displaymath}

Our job: find the mapping $a_j\rightarrow c_j$ which minimizes $E(L)$ (also called $\overline{R}$).
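As a quick numerical sketch of the $E(L)=\sum_x p(x)l(c(x))$ formula, here is a small Python snippet; the source alphabet, probabilities, and codewords are made-up illustrative values, not from the notes.

```python
# Hypothetical binary code: each source letter gets a codeword,
# and E(L) is the probability-weighted average codeword length.
p = {'a': 0.5, 'b': 0.25, 'c': 0.125, 'd': 0.125}
code = {'a': '0', 'b': '10', 'c': '110', 'd': '111'}

def expected_length(p, code):
    """E(L) = sum over x of p(x) * l(c(x)), in bits per source letter."""
    return sum(px * len(code[x]) for x, px in p.items())

print(expected_length(p, code))  # 1.75 bits per source letter
```

For this dyadic distribution the code happens to be optimal, so $E(L)$ exactly equals the entropy $H(X)=1.75$ bits.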
It can be shown that $\sum_i D^{-l_i} \le 1$ for any uniquely decodable code (the Kraft inequality), where $D$ is the code alphabet size. This can be used to show that the optimum code satisfies

\begin{displaymath}H(X)\le E(L)\le H(X)+1\end{displaymath}
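Both the Kraft inequality and the entropy bounds above can be checked numerically. A minimal sketch, assuming a hypothetical binary ($D=2$) code with illustrative probabilities and codeword lengths chosen here, not taken from the notes:

```python
import math

# Hypothetical source probabilities and codeword lengths l_i
# (e.g. codewords 0, 10, 110, 111 in a binary code, so D = 2).
p = [0.4, 0.3, 0.2, 0.1]
lengths = [1, 2, 3, 3]
D = 2

# Kraft inequality: sum_i D^{-l_i} <= 1 for any uniquely decodable code.
kraft = sum(D ** -l for l in lengths)

# Entropy H(X) and expected length E(L).
H = -sum(px * math.log2(px) for px in p)
EL = sum(px * l for px, l in zip(p, lengths))

print(kraft <= 1)        # True: the code respects the Kraft inequality
print(H <= EL <= H + 1)  # True: this code lands inside the entropy bounds
```

Here $H(X)\approx 1.85$ bits and $E(L)=1.9$ bits, so this code sits within one bit of the entropy, as the bound for the optimum code promises.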



Christopher Rose
1999-02-24