ITU-T Recommendation G.723.1 and MASC 3500

G.723.1 defines a dual-rate speech coder for multimedia communications that transmits at 5.3 and 6.3 kbit/s. The algorithm is based on the CELP (code-excited linear prediction) principle. The task of the project was to implement the speech coder and decoder, operating in full-duplex data transmission mode, on the MASC 3500.

The ITU-T C-language reference fully defines the algorithm. Interoperability between the codec implementation on MASC 3500 and the ITU-T C-language codec was mandatory. There are two ITU-T G.723.1 reference definitions: an integer one and a floating point one. The floating point definition is more suitable for MAS 3500 than the integer one. Simulations using the C-code implementation were necessary to determine the dynamic range of the variables and to prevent overflows and underflows.

Here is a list of the blocks that I was in charge of and that I implemented on the processor. I emphasize the fundamental problems that I encountered during the implementation.

High Pass Filter is a second order IIR filter that removes the DC component. The implementation was straightforward.

LPC Analysis was realized as a conventional Levinson-Durbin algorithm. A 10th order linear predictive analysis is used. This procedure is not time consuming, but its results are used in the following blocks as IIR filter coefficients. Therefore the accuracy of the implementation was of prime importance. The main operations of the Levinson-Durbin algorithm were implemented using simulated floating point arithmetic.
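As an illustration of the recursion named above, here is a minimal sketch of a 10th order Levinson-Durbin solver in C. It uses native double arithmetic rather than the simulated floating point of the MASC 3500 implementation, and the function name, argument layout and sign convention are assumptions made for this example only.

```c
#include <assert.h>

#define LPC_ORDER 10 /* 10th order analysis, as stated in the text */

/*
 * Levinson-Durbin recursion (illustrative sketch, not the DSP code).
 * r[0..LPC_ORDER]   : autocorrelation values of the analysis window
 * a[0..LPC_ORDER-1] : predictor coefficients, so that the prediction is
 *                     x_hat[n] = sum over k = 1..LPC_ORDER of a[k-1] * x[n-k]
 * Returns the final prediction error energy, or -1.0 if r[0] <= 0.
 */
double levinson_durbin(const double *r, double *a)
{
    double tmp[LPC_ORDER];
    double err;
    int i, j;

    assert(r != 0 && a != 0);

    for (i = 0; i < LPC_ORDER; i++)
        a[i] = 0.0;

    err = r[0];
    if (err <= 0.0)
        return -1.0;

    for (i = 0; i < LPC_ORDER; i++) {
        /* Reflection coefficient of stage i + 1 */
        double acc = r[i + 1];
        double k;

        for (j = 0; j < i; j++)
            acc -= a[j] * r[i - j];
        k = acc / err;

        /* Update the lower order coefficients, then append the new one */
        for (j = 0; j < i; j++)
            tmp[j] = a[j] - k * a[i - 1 - j];
        for (j = 0; j < i; j++)
            a[j] = tmp[j];
        a[i] = k;

        /* The error energy shrinks by (1 - k^2) at every stage */
        err *= (1.0 - k * k);
        if (err <= 0.0)
            break; /* numerically singular input: stop early */
    }
    return err;
}
```

The division by the error energy and the accumulation of products are the operations most sensitive to rounding, which is presumably why they are the ones that benefit from extra precision on a fixed point processor.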

Formant Perceptual Filtering employs a 10th order IIR filter, which was implemented straightforwardly.
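The following sketch shows one common way such a 10th order weighting filter is built in CELP coders: both the numerator and the denominator are bandwidth-expanded copies of the LPC polynomial, W(z) = A(z/g1) / A(z/g2). The structure, the type and function names, and the example weights 0.9 and 0.5 are assumptions for illustration; the exact definition should be taken from the ITU-T reference code.

```c
#include <assert.h>

#define WF_ORDER 10 /* 10th order filter, as stated in the text */

/* State of the pole-zero weighting filter (hypothetical layout) */
typedef struct {
    double num[WF_ORDER]; /* a[k] * g1^(k+1): zeros, from A(z/g1) */
    double den[WF_ORDER]; /* a[k] * g2^(k+1): poles, from A(z/g2) */
    double xh[WF_ORDER];  /* past inputs,  xh[0] is the most recent */
    double yh[WF_ORDER];  /* past outputs, yh[0] is the most recent */
} weight_filter_t;

/* Derive the weighted coefficients from predictor coefficients a[],
 * assuming A(z) = 1 - sum of a[k-1] * z^-k, and clear the history. */
void weight_filter_init(weight_filter_t *f, const double *a,
                        double g1, double g2)
{
    double p1 = 1.0, p2 = 1.0;
    int k;

    assert(f != 0 && a != 0);
    for (k = 0; k < WF_ORDER; k++) {
        p1 *= g1;
        p2 *= g2;
        f->num[k] = a[k] * p1;
        f->den[k] = a[k] * p2;
        f->xh[k] = 0.0;
        f->yh[k] = 0.0;
    }
}

/* Filter one sample:
 * y[n] = x[n] - sum num[k] * x[n-1-k] + sum den[k] * y[n-1-k] */
double weight_filter_step(weight_filter_t *f, double x)
{
    double y = x;
    int k;

    for (k = 0; k < WF_ORDER; k++)
        y += f->den[k] * f->yh[k] - f->num[k] * f->xh[k];

    /* Shift the delay lines by one sample */
    for (k = WF_ORDER - 1; k > 0; k--) {
        f->xh[k] = f->xh[k - 1];
        f->yh[k] = f->yh[k - 1];
    }
    f->xh[0] = x;
    f->yh[0] = y;
    return y;
}
```

A call such as weight_filter_init(&f, a, 0.9, 0.5) followed by one weight_filter_step per sample would process a sub-frame; when g1 equals g2 the filter reduces to the identity, which is a convenient sanity check.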

The Pitch Estimation procedure determines the open loop pitch estimate, which represents the periodic component of the speech signal. The basic operation of this procedure is auto-correlation. To improve the calculation accuracy I applied the block floating point approach, which does not add significantly to the processing time. In my experience, block floating point arithmetic is a very good trade-off between processing requirements and accuracy for auto-correlation and cross-correlation operations.
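The block floating point idea can be sketched as follows: the whole block of samples is scaled by one shared power of two so that its peak uses the full 16-bit range, the correlations are accumulated on the scaled integers, and all results share a single exponent. This is a minimal sketch in portable C with a 64-bit accumulator standing in for the DSP accumulator; the function names and the output format (32-bit mantissas plus one returned exponent) are assumptions, not the MASC 3500 code.

```c
#include <assert.h>
#include <stdint.h>

/* Largest left shift that keeps every sample of the block within 16 bits */
static int block_shift(const int16_t *x, int n)
{
    int32_t peak = 0;
    int shift = 0;
    int i;

    for (i = 0; i < n; i++) {
        int32_t v = x[i] < 0 ? -(int32_t)x[i] : (int32_t)x[i];
        if (v > peak)
            peak = v;
    }
    if (peak == 0)
        return 0; /* silent block: nothing to normalize */
    while ((peak << (shift + 1)) <= 0x7FFF)
        shift++;
    return shift;
}

/* Correlation at one lag, computed on the block-scaled samples */
static int64_t scaled_corr(const int16_t *x, int n, int lag, int shift)
{
    const int32_t scale = (int32_t)1 << shift;
    int64_t acc = 0;
    int i;

    for (i = 0; i + lag < n; i++)
        acc += (int64_t)(x[i] * scale) * (int64_t)(x[i + lag] * scale);
    return acc;
}

/*
 * Block floating point auto-correlation.
 * Writes mant[0..max_lag] and returns the shared exponent e, so that the
 * true correlation at lag k is approximately mant[k] * 2^e.
 */
int bfp_autocorr(const int16_t *x, int n, int max_lag, int32_t *mant)
{
    int shift, rs = 0, k;
    int64_t r0, div;

    assert(n > 0 && max_lag >= 0 && max_lag < n);

    shift = block_shift(x, n);
    r0 = scaled_corr(x, n, 0, shift);

    /* The lag-0 value bounds all others, so it alone fixes the output scale */
    while ((r0 >> rs) > INT32_MAX)
        rs++;
    div = (int64_t)1 << rs;

    mant[0] = (int32_t)(r0 / div);
    for (k = 1; k <= max_lag; k++)
        mant[k] = (int32_t)(scaled_corr(x, n, k, shift) / div);

    return rs - 2 * shift; /* undo the input scaling, add the output scaling */
}
```

The extra cost over a plain fixed point correlation is one pass to find the peak and one shift per sample, which is consistent with the observation above that the approach adds little processing time while keeping low-level blocks from losing precision.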

Harmonic Noise Shaping is comb filtering in which the filter order, delay and coefficient depend on the Pitch Estimation described above. This was a straightforward implementation.

Pitch Predictor is a fifth order pitch predictor. It produces the closed loop pitch lag and the quantized pitch predictor gain. Two code-books are used to quantize the pitch gain. Cross-correlation and auto-correlation are the basic operations of the procedures in this block. The block is executed on a sub-frame basis (60 samples) and accounts for a significant part of the time requirements of the coder implementation. Therefore, block floating point arithmetic is applied only in the few most critical procedures of the block. Pitch Decoder is part of this module (CELP algorithms always implement a coder with an internal decoder).

Impulse Response Calculator, Memory Update and Zero Input Response are implemented straightforwardly.

Transfer of audio samples between the analog interface, the coder (speech to be coded) and the decoder (speech that has been decoded) is realized via SIO using a DMA scheme. These DMA procedures require that the processing time for coding or decoding one frame (240 speech samples) be shorter than the period of reception or transmission of the frame via SIO.

Transfer of coded speech is realized on a frame basis in burst transmission mode via PIO.
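The timing constraint and the size of each coded burst follow from simple arithmetic, sketched below. The 8 kHz sampling rate is the standard G.723.1 input rate and is not stated in the text above; the function names, and the assumption that coder and decoder share one processor in full duplex so that their times add up, are illustrative.

```c
#include <assert.h>

enum {
    SAMPLE_RATE_HZ = 8000, /* assumed: standard G.723.1 sampling rate */
    FRAME_SAMPLES  = 240   /* frame length given in the text */
};

/* Time taken by SIO to receive or transmit one frame, in microseconds */
long frame_period_us(void)
{
    return 1000000L * FRAME_SAMPLES / SAMPLE_RATE_HZ;
}

/* Number of coded bits produced per frame at a given rate in bit/s */
long bits_per_frame(long bit_rate)
{
    assert(bit_rate > 0);
    return bit_rate * FRAME_SAMPLES / SAMPLE_RATE_HZ;
}

/* Bytes needed to carry one coded frame, rounded up to whole bytes */
long bytes_per_frame(long bit_rate)
{
    return (bits_per_frame(bit_rate) + 7) / 8;
}

/* Full duplex deadline check: coding plus decoding of one frame must
 * finish before the DMA buffers for the next frame are due. */
int meets_deadline(long encode_us, long decode_us)
{
    return encode_us + decode_us < frame_period_us();
}
```

Under these assumptions a frame lasts 30 ms, and a coded frame occupies 189 bits (24 bytes) at 6.3 kbit/s or 159 bits (20 bytes) at 5.3 kbit/s, which is the amount sent in each burst.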

The codec required approximately:

38 MIPS

10000 words of data memory and constants

9000 words of program memory.
