MASC 3500 - Concept and Programmer's Experiences
This chapter describes the MASC 3500 digital signal processor, designed and produced by the German company Intermetall-Micronas, previously part of ITT.
Here I present my personal opinions, based on two years of DSP programming. For the most part, I compare the MASC 3500 with a well-known DSP, the Motorola 56001/56002.
The processor is based on the Harvard architecture.
There are five memory and register zones:
1. data 0 - D0
2. data 1 - D1
3. program - P
4. register 0 - R0
5. register 1 - R1
The register zones contain all pointers, control registers and status registers (interfaces, address generators, hardware loop counter...). There are also 64 general-purpose 20-bit registers (GPs), which proved very useful for passing parameters between subroutines and for storing variables (e.g., counters, mode settings...). The processor is not stack-oriented, but the GP zone is frequently used to simulate a stack.
It is a pipelined RISC machine: one instruction is fetched every clock cycle, but execution takes several clock cycles. The multiply and accumulate instruction, for example, takes six cycles, which is the longest execution time in the processor. To keep the pipeline busy, the programmer has to implement parallel processes. This parallelism is strongly supported by the large number of accumulators (eight, whereas the Motorola 56001/56002 has only two), which lets a programmer run several parallel processes. The simplest example of parallelism, which I used frequently, is executing one process while fetching and pre-computing the values required by the core procedure of the following process. A more complex case arises when the processor is "split": four accumulators are permanently used to extract control information from the incoming bit-stream, and the remaining four accumulators are dedicated to speech decoding.
Both the instruction word and the data word are 20 bits long. This short word length was chosen mainly to reduce the IC size. To enrich the instruction set and make programming easier, Intermetall planned to introduce a 24-bit word length in the following generation of the MASC 3500.
Three data and address buses
2. A/ D0-data
Two address generators are provided. They are used to access the data memory space. I have noticed two disadvantages. The first is the lack of direct access to the data memory: the only way to read or write memory is to set the required pointer, modulo and configuration registers before any memory access. The second is that, once these registers are set, the pointers can only be incremented or decremented. In my experience, table look-up would be simpler if these features were supported, as the Motorola 56001/56002 shows.
Loops are supported by a special hardware solution: hardware loops. It is very simple to set up a loop, but only one hardware loop can be active at a time. This limitation reduces the efficiency of the program code, which becomes obvious in comparison with the Motorola 56001/56002, which supports nested hardware loops.
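The access pattern described above can be illustrated with a small model. This is a sketch in Python, not MASC 3500 assembler; the class `AddressGenerator` and its fields `base`, `modulo` and `pointer` are hypothetical stand-ins for the pointer, modulo and configuration registers, whose real names and exact behaviour are not given here. It only shows that a configured pointer can be stepped up or down (with wrap-around), so a table look-up at an arbitrary index costs a fresh pointer setup.

```python
class AddressGenerator:
    """Toy model of a pointer register with modulo (circular) addressing."""

    def __init__(self, memory):
        self.memory = memory
        self.base = 0
        self.modulo = len(memory)
        self.pointer = 0

    def setup(self, base, modulo, offset=0):
        # All registers have to be written before any memory access.
        self.base = base
        self.modulo = modulo
        self.pointer = offset % modulo

    def read_inc(self):
        # Read, then post-increment the pointer with wrap-around.
        value = self.memory[self.base + self.pointer]
        self.pointer = (self.pointer + 1) % self.modulo
        return value

    def read_dec(self):
        # Read, then post-decrement the pointer with wrap-around.
        value = self.memory[self.base + self.pointer]
        self.pointer = (self.pointer - 1) % self.modulo
        return value


def table_lookup(ag, base, size, index):
    # No direct addressing: one look-up = one pointer setup + one read.
    ag.setup(base, size, index)
    return ag.read_inc()


d0 = list(range(100, 116))           # stand-in for a piece of D0 memory
ag = AddressGenerator(d0)
ag.setup(base=4, modulo=4)           # circular buffer over d0[4..7]
window = [ag.read_inc() for _ in range(6)]
entry = table_lookup(ag, base=0, size=16, index=9)
```

Sequential and circular accesses are cheap in this model, while every random access goes through `setup`, which is the overhead the text complains about.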
The interfaces of the processor are:
Serial interfaces (SIO). Two pairs of input-output digital serial interfaces connected to both data memory zones.
Parallel 20-bit I/O interface (PIO).
I2C slave interface. It is usually used to receive control information coming from a central controlling unit.
The task scheduler is an event-handling concept (events are, e.g., serial transmit register empty, serial input register full...) that substitutes for an interrupt system. When an event occurs, it sets a "task request signal" that is stored in a register system called the scheduler. The tasks are ordered in a list of priorities. When a "jump via scheduler" instruction is executed, the procedure corresponding to the requesting task with the highest priority is executed. The program is not interrupted by the event; it is only informed about the request. The program serves all requests when it is ready to, usually at some specific point in the code. That convenient point is usually where resources (buffers, registers and time) have been freed and prepared to meet the requirements of the task procedures. In my experience, this concept is very useful when events are periodic, i.e., regular in time. Data exchange between the processor and an analog interface via SIO in speech and audio processing applications is an example where the concept performs well. When irregular events are present in the system, i.e., non-periodic or burst events, the concept may fail to handle the task requests conveniently. An example is data exchange between the central controlling unit (controller) and the DSP via PIO, where the transmission rate is not constant but depends on the bit-stream contents extracted by the controller. In that case a special data exchange protocol was needed, and it was cumbersome.
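The request-and-poll mechanism described above can be sketched as a small model. It is written in Python for illustration only; the class `Scheduler`, the method names and the task names (`sio_rx_full`, `sio_tx_empty`, `pio_data`) are hypothetical and do not correspond to actual MASC 3500 registers or instructions. The assumption is simply that an event sets a request flag and that "jump via scheduler" serves the pending request with the highest priority.

```python
class Scheduler:
    """Toy model of a request register plus a fixed priority list."""

    def __init__(self, priority_order):
        # priority_order: task names, highest priority first.
        self.priority_order = list(priority_order)
        self.requests = set()
        self.handlers = {}

    def register(self, task, handler):
        self.handlers[task] = handler

    def raise_event(self, task):
        # An event only sets a request flag; the program is not interrupted.
        self.requests.add(task)

    def jump_via_scheduler(self):
        # Serve the pending request with the highest priority, if any.
        for task in self.priority_order:
            if task in self.requests:
                self.requests.discard(task)
                self.handlers[task]()
                return task
        return None


log = []
sched = Scheduler(["sio_rx_full", "sio_tx_empty", "pio_data"])
for name in sched.priority_order:
    sched.register(name, lambda name=name: log.append(name))

# Events arrive while the main program is busy with a block of samples.
sched.raise_event("pio_data")
sched.raise_event("sio_rx_full")
log.append("main: block finished")

# At a convenient point the program serves everything that is pending.
while sched.jump_via_scheduler() is not None:
    pass
```

In this model the requests are served in priority order, not arrival order, and only after the main program reaches its polling point. That suits periodic events; a burst of irregular requests simply waits, which is the weakness noted in the text.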
The 32-bit ALU supports the standard DSP ALU instruction set. A useful instruction that plays a very important role in implementing floating point CELP speech coding algorithms is EXP. It gives the binary exponent of an operand. Combined with the arithmetic shift, this instruction supports very fast simulation of floating point arithmetic.
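How an exponent instruction and an arithmetic shift combine into simulated floating point can be sketched as follows. This is a Python illustration under stated assumptions, not MASC 3500 code: the exact semantics of EXP are not given here, so `exp_of` assumes it returns the number of left shifts that normalize a 20-bit two's-complement word, and the helper names `to_float`, `fmul` and `value` are hypothetical.

```python
WORD_BITS = 20
MIN_WORD = -(1 << (WORD_BITS - 1))
MAX_WORD = (1 << (WORD_BITS - 1)) - 1


def exp_of(x):
    """Left shifts needed to normalize x without overflow (assumed EXP)."""
    if x == 0:
        return 0                      # assumption: zero needs no shift
    shift = 0
    while MIN_WORD <= (x << (shift + 1)) <= MAX_WORD:
        shift += 1
    return shift


def to_float(x):
    """Split a Q19 fixed-point word into (normalized mantissa, exponent)."""
    e = exp_of(x)
    return x << e, -e


def fmul(a, b):
    """Multiply two (mantissa, exponent) pairs and renormalize."""
    (ma, ea), (mb, eb) = a, b
    product = (ma * mb) >> (WORD_BITS - 1)   # back to Q19 scaling
    if product > MAX_WORD:                   # only for -1.0 * -1.0
        product >>= 1
        ea += 1
    m, e = to_float(product)
    return m, ea + eb + e


def value(f):
    """Real value represented by a (mantissa, exponent) pair."""
    m, e = f
    return m / (1 << (WORD_BITS - 1)) * 2.0 ** e


small_a, small_b = 3, 5                      # tiny Q19 values
fixed_product = (small_a * small_b) >> (WORD_BITS - 1)   # underflows to 0
float_product = fmul(to_float(small_a), to_float(small_b))
```

In plain fixed point the product of the two small values underflows to zero, while the mantissa/exponent pair keeps it exactly. This is the kind of dynamic range gain that makes the extra normalization steps worthwhile.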
The Multiply and Accumulate (MAC) building block supports the operation A = A + X*Y, where:
A: 32-bit accumulator with the format
+/- 2^7 ... 2^0 . 2^-1 ... 2^-19 ... 2^-23
X: 20-bit memory or accumulator operand with the format
+/- . 2^-1 ... 2^-19
Y: 20-bit memory or accumulator operand with the format
+/- . 2^-1 ... 2^-19
In my experience, the main disadvantage of this block is the loss of the lower 15 bits of the product (20 bits * 20 bits => 39 bits, but only 24 bits are stored in the accumulator). When implementing recursive structures (e.g., backward prediction in CELP algorithms, or IIR structures), this bit loss can sometimes be unaffordable. In those cases, simulation of floating point was required. Unlike the MASC 3500, the Motorola 56001/56002 stores all the bits of the multiplication result (24 bits * 24 bits => 47 bits).
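The arithmetic behind the 15 lost bits can be checked with a short model. This is a Python sketch, not a bit-exact description of the hardware: it assumes the low product bits are simply truncated (rounding behaviour is not stated here), and the names `mac_truncated` and `mac_full` are hypothetical. Two operands with 19 fractional bits give a product with 38 fractional bits, while the accumulator keeps 23, so 38 - 23 = 15 bits are dropped.

```python
FRAC_IN = 19                      # fractional bits of a 20-bit operand
FRAC_ACC = 23                     # fractional bits kept in the accumulator
LOST = 2 * FRAC_IN - FRAC_ACC     # 38 - 23 = 15 bits dropped per product


def mac_truncated(acc, x, y):
    """acc (23 fractional bits) + x*y with the low product bits discarded."""
    return acc + ((x * y) >> LOST)


def mac_full(acc, x, y):
    """acc (38 fractional bits) + x*y, keeping every bit, for comparison."""
    return acc + x * y


x = y = 1 << 7                    # 2^-12 as a value with 19 fractional bits
acc_trunc = 0
acc_full = 0
for _ in range(1000):
    acc_trunc = mac_truncated(acc_trunc, x, y)
    acc_full = mac_full(acc_full, x, y)

# Bring the exact sum to accumulator scaling to compare the two results.
exact_in_acc_units = acc_full >> LOST
error_lsb = exact_in_acc_units - acc_trunc
```

Each product here is 2^-24, just below the accumulator resolution, so every term is lost and the error grows with the number of accumulations. In a recursive structure the same loss is fed back on every iteration, which is why it can become unaffordable.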
DMA supports data transfer between the digital interfaces and data memory. Here is the sequence of operations that I frequently used to exploit the DMA system:
DMA data transfer between the digital interfaces and data memory proves to be very efficient for algorithms that work on blocks of samples (e.g., CELP speech coding, FFT-based audio coding such as AC-3). The program is executed in parallel with the reception and/or transmission of data handled by DMA.
Changeability of program and data RAM. There are 512 (D0) + 512 (D1) words of data memory that can be configured as program memory (1024 words). I used this option in applications that need self-modifying program code. For example, program code is treated as a data stream that is fetched via some digital interface (e.g., from an incoming bit-stream). It is placed in the changeable data zone and, when the transfer is over, the zone is switched to program memory and the code is executed. I applied this scenario in an application where a number of different speech and audio coders are stored in external memory and, depending on the control information, a specific coder is downloaded into the internal program memory. This option strongly supports multistandard applications and the downloading of program code from an external source.
MASC 3500 is NOT supported by a C compiler. Only assembler programming is available.
The tools that support development of applications are:
PC based simulator