# New High Speed Flexible Turbo-Block-Decoding

Kalyan Koora, Sven Zeisberg and Adolf Finger

Dresden University of Technology, Communications Laboratory, 01062 Dresden, Germany e-mail: koora@ifn.et.tu-dresden.de

Abstract: Recently a new class of codes, turbo codes have been introduced for channel coding in communication systems. Due to the iterative decoding scheme of these codes, it is possible to achieve results near to the Shannon limit if decoding is done continuously.

In this paper a turbo block decoding scheme is presented which allows variable block lengths and improves the correction capacity of these codes. The first prototype of the hardware implementation is introduced and its high flexibility and variable speed grades are enlightened. Simulated and measured results of turbo block codes in MEDIAN surroundings using OFDM are presented.

# **1. Introduction**

The MEDIAN<sup>1</sup> projects main goal is to implement a high speed wireless customer premises local area network (WCP-LAN) for multimedia applications. The present pilot project relies on a multi-carrier modulation scheme, Orthogonal Frequency Division Multiplexing (OFDM) of 286 sub-carriers. Wireless ATM networking at a net data transmission rate of 150 Mbit/s over the 60 GHz indoor radio channel is supported. In MEDIAN digital data is transmitted within extended ATM frames. One OFDM symbol carries one ATM cell plus related MEDIAN MAC overhead and redundancy information to protect the symbol from channel distortion. The modulation of the sub-carriers is DQPSK, differential to the adjacent sub-carrier. Thus, following the ATM philosophy, each ATM cell is transmitted independently from each other.

This block-wise data transmission implies a block oriented channel coding scheme as the most natural choice. Examples of such coding schemes are Reed-Solomon and BCH. But applying such coding techniques is not optimal. It has been shown during the early stage of MEDIAN project [1] that the choice of a convolutional coding scheme (either as an inner coding in a concatenated scheme or as turbo coding) outperforms a pure block coding scheme. This helds especially for lower signal-to-noise ratios, which is the critical case in a radio system with limited transmission power. The convolutional coding schemes are naturally not block oriented and the respective decoders work optimal on continuous data streams only. But introducing tail bits after a certain number of information bits allows resetting of the decoder and thus allows block oriented decoding while decreasing the loss of coding gain due to block-wise decoding appreciable.

Since introduction of turbo codes in 1993 by Berrou et. al., many results for block-wise decoding using these codes were presented. All of them could not solve the discrepancy between block decoding and natural continuos behaviour of the convolutional code applied. Due to the application of the 'Recursive Systematic Codes (RSC)' the difficulties in terminating the trellis of both decoders (resetting) were pointed out. The main problem is 'the resetting of both the decoder stages used within one iteration'. First, at the symposium on turbo codes in Brest, Sept.97, a possible solution was proposed by the authors [2]. This paper deals with a perfect scheme of Turbo-Block-Decoding were the termination of all the trellis is shown. The present paper is divided in following three main parts introducing the new modified turbo coding scheme, describing the CODEC system implementation in software and in hardware and showing the advantages of the new resetting scheme compared to normal block decoding by simulations and hardware measurement results for MEDIAN system environment.

<sup>&</sup>lt;sup>1</sup> European Research Project AC006 in the framework of the ACTS program of the 4th Research program of the European Commission

#### 2. Modified Turbo-Encoder for Block Transmission

In contrast to the Turbo-Block-Codes described in [3], this new modified coder not only makes use of the polynomial division property along with the linearity character of the convolutional codes but also utilises the tail bits as described in [2]. This ensures that the coder starts and ends in the zero state for original and interleaved block of information bits. Figure 1 illustrates this process.



Figure 1: Modified Turbo Encoder.

The information block ' $\overline{d}$ ' of 'N'-bits, where N represents the size of a block, is first passed through the RSC encoder and at the same time fed into an interleaver of size equal to  $(N + N_t)$  with ' $N_t$ ' denoting the number of tail bits, i.e. size of ' $\overline{t}$ '. The size of vector  $\overline{t}$  depends on the coder memory. The interleaver should assure that each bit d(t) at the input should be equal to the output bit d(t+x), with  $x = (N + N_t + n * l)$ , 'l' represents the grade of the reset polynomial [2] and 'n' is a natural number greater than or equal to 0. Note, this reduces the number of possible interleaver patterns.

A logic circuit inspects the present state of the RSC encoder and generates the respective bit, with which the encoder is driven to the zero state in short time. After the last information bit has been passed through the coder, the ' $S_I$ ' switch selects  $N_t$  tail bits. If the sum of the data bits and the tail bits is not a multiple of the grade of reset polynomial, then ' $N_0$ ' number of zero bits ' $\overline{0}$ ' are introduced into the data input of the RSC encoder by switching ' $S_3$ '.

$$N_0 = i * l - (N + N_t)$$
  

$$\geq 0 \tag{1}$$

'*i*' is the smallest integer which satisfies the required condition. During the insertion of zero bits the clocking of the interleaver is stopped. At the same time the output of the encoder is ignored, as these bits are simply used to satisfy the polynomial division property. Once the last zero bit has been passed through the encoder the reading of the interleaver data is done by selecting the ' $S_2$ ' switch. Through this modified method the number N is not fixed to a value which is a multiple of the length of the reset polynomial. This gives more freedom combining different data block lengths with certain turbo codes. Since the blocks are to be independent of each other, a continuous encoding is not possible.

For instance, a RSC encoder with the octal representation of the polynomials  $\{13,15\}$  is considered. The recursion polynomial is  $G_1(D) = 1 + D^2 + D^3$ .  $G_1(D)$  divides  $1 + D^7$  without any remainder, therefore the grade *l* of the reset polynomial is 7 and requires  $N_t=3$  tail bits. If *N* is 440, which is the size of an extended ATM as considered in MEDIAN, then the total size of the interleaver would be 443 with 22 rows and 21 columns where the last row is filled with only two elements. The interleaver is written with the data in rows and the reading is done column by column. This ensures that each input bit of the interleaver is written out after (443+7\*n). If a search for an optimal interleaver is performed [5], then the shuffling of the order of reading is allowed only within a column. In the considered example there is a need of 5 zero bits ( $N_0$ ) so that the size of input frames of the encoder is a multiple of its reset polynomial grade 7. Now the systematic part 'X' consists of the data bits and the tail bits. The redundancy part 'Y' is punctured to obtain the required code rate.

This new kind of Turbo-Block-Coding conveys an extra redundancy information to the soft-in-softout decoders and conduces to better correction capacity.

# **3. CODEC System Implementation**

Simulation tool aided algorithm test was the first step taken to implement the above described turbo decoder. For this purpose a C program realising a Soft-Output-Viterbi-Decoder (SOVA) was written for the simulation tools SPW and COSSAP. After successful implementation and testing of the floating point software code, a bittrue model of the decoder was programmed. There all calculations were restricted to integer numbers and the domain of numbers was kept variable. This made it easy to test the influence of the input, output and internal word lengths on the correction capacity of the decoder. An extra care has been taken as soon as the internal results crossed upper and lower bounds of the set domain. Figure 2 shows the difference between floating point simulation and hardware measurements.



Figure 2: Implementation loss.

Figure 3: Hardware circuit of Turbo-Block-Decoder.

A SNR loss of ca. 0.5 dB is achieved by the considered algorithm for the same coding conditions due to the quantisation. After the selection of required word lengths a VHDL code has been prepared to realise the SOVA-Decoder. In a parallel approach the code was fed into Cadence and Altera software tools. Concentrating on the flexibility of the codec for further tests, e.g. use of different random interleavers, influence of the word lengths in real systems etc., and taking the cost point of view into consideration it was finally decided to test the decoder using field programmable gate arrays (FPGA) from Altera. Figure 3 depicts the hardware circuit which realises the flexible Turbo-Block-Decoder.

Since this is the first realisation of the hardware the focus point was on functionality of the decoder. Using the slowest speed grade of the FPGA, a maximum throughput of 14 MBit/s channel data was measured. Controlling of hardware, coding, distortion and evaluation of results was taken up by a PC which is the bottle neck for fast tests. The system design was further developed to increase the throughput up to 30 MBit/s. It is possible to achieve a throughput which is about twice the present rate by using faster speed grades of FPGAs.

#### 4. Simulation and Measurement Results

To test efficiency of the new modified block decoding of turbo codes, at first simulations were carried out using COSSAP simulation tool with real number representation of algorithms internal results. The block length N was fixed to 440 bits which is equal to an extended ATM cell in MEDIAN. Differential QPSK modulation scheme was selected to be independent of carrier phase estimation. Figure 4a and 4b depicts the comparison between turbo block codes with two decoding iterations and Reed-Solomon Coding schemes. The code rates are close to '5/7'. TC1[440;619] and TC2[440;616] represent the AWGN channel with termination of all decoders (new method) and termination of only the second decoder (old method) respectively. In the same way TC3[440;619] and TC4[440;616] represent the multipath channel with LOS environment [4]. Since the new scheme requires more subcarriers, the number of used sub-carriers by the OFDM was increased from 310 to 312. Synchronisation of the receiver was done on a reference symbol inserted into each TDD frame.



Figure 4: Comparison of Turbo Codes (new and old termination scheme) with Reed-Solomon codes for 310/312 used sub-carrier DQPSK-OFDM system in (a) AWGN and (b) LOS environment.

This new kind of block decoding leads to a gain of 0.2dB SNR in AWGN channel and 1.5dB SNR for LOS-MEDIAN modelled channel at a BER of 2e-5. Thus, for multipath channel there is an appreciable increase in coding gain especially for small BER (TC3 compared to TC4). Another effect of Turbo-Block-Decoder, which arises by breaking the decoding scheme at the end of a block without terminating perfectly, is the flattening of the achievable bit error rate even by increasing iteration numbers. Influence of this inconvenience is also decreased using the new method of termination of the block oriented soft output Viterbi decoders. The RS1[440;616] shows the BER for AWGN channel whereas RS2[440;616] denotes for LOS channel model, both are outperformed by the turbo coding scheme (new and old). At BER of 2e-5, the new turbo block code shows a gain of 2.8 dB on SNR for AWGN channel compared with Reed-Solomon codes and 2.5 dB for LOS channel model. The improved coding gain and the substantial decreasing of the flattening level resulting from the new reset method is representative for all code polynomials. Since this algorithm brings a small change in the encoder structure, it can be applied for all types of soft-input-soft-output turbo decoding processors. An other main advantage of this new coding scheme is its ihnerent possibility to process variable input block lengths for the same coding polynomial. This property can be utilised for systems where the block lengths vary for different applications. But the maximum possible block length is a fixed number restricted by the interleaver size.

Figures 5a and 5b represent measured results using the hardware circuit. Modulation of sub-carriers was set to DQPSK, differential to adjacent sub-carriers, and code rate to ½. Selection of the random interleaver was done by a systematic search depending on the weight distribution table of the code words at the output of the encoder after puncturing [5]. The MEDAIN system synchronisation and phase noise was set to ideal. In contrast to the above mentioned software simulation results the truncation path length was 28.



In both channels it has been observed that gain on SNR between iterations 1 and 2 is greater than that between iteration 2 and 3. This natural behaviour is additionally forced due to absence of the optimal weighting of the extrinsic values between decoding iterations and limiting the internal word lengths. At BER of 1e-6, which is interesting for ATM transmission, a total coding gain of 4.5dB has been measured in AWGN channel whereas 11dB coding gain has been achieved in LOS channel.

Theoretical simulation results have been verified developing a hardware turbo decoder. In addition, the possibility to investigate bit error rates less than 1e-7 was created, which was not possible using software running on modern UNIX workstation due to the very long simulation times required. In order to keep certain decoder system parameters like internal word length, truncation path length, weighting factors and selection of optimal interleaver pattern as flexible as possible the hardware had to include some kind of re-configurability (recently called as software radio). Thus, the decoder was realised on logic level using a Field Programmable Gate Array (FPGA). This decoder is not able to reach the required MEDIAN rate of 150 Mbit/s net data, but is only one order of magnitude off the goal. Using current state-of-the-art ASIC technology the development of a turbo decoder based on this FPGA decoder design and investigation experiences is considered to be possible at our facilities.

# Conclusions

The first 30 Mbit/s flexible hardware realisation of this turbo decoder for block oriented transmission using FPGA is presented. The focus is on the flexibility of the decoder configuration. Achieved simulation and measured results using AWGN and time invariant modelled 60 GHz indoor wireless LAN channel are presented. An appreciable coding gain is achieved in both cases using the introduced modified turbo encoder for block transmission.

## Acknowledgements

We like to thank Mr. F. Poegel and Mr. F. Krieger for useful discussions and active help during hardware development. Further, we thank the AC006-MEDIAN group for their support in the early stages. Financial aid provided by "Deutsche Telekom AG" is gratefully acknowledged.

# References

- [1] S. Zeisberg, B. Kull, F. Poegel, P. Pikkarainen and A. Finger, "Channel Coding for Wireless ATM using OFDM", Proceedings of ACTS-Mobile Telecommunications Summit, Granada, Spain, pp. 652 - 658, November 27 - 29, 1996
- [2] K. Koora and A. Finger, "A New Scheme to Terminate all Trellis of TURBO-Decoder for Variable Block Length", Proceedings of the International Symposium on Turbo Codes and Related Topics, Ecole Nationale Supérieure des Télécommunications de Bretagne, France, 3-5 Sep. 1997, pp. 174 - 17
- [3] Claude Berrou, Stéphane Evano, Gérard Battail, "Turbo-block-codes", Proceedings of TURBO CODING Seminar at Lund University, Sweden, pp. 1 - 8, 28-29 August 1996
- [4] J. Hübner, S. Zeisberg, K. Koora, J. Borowski and A. Finger, "Simple Channel Model for 60GHz Indoor Wireless LAN Design Based on Complex Wiedeband Measurements", Proceedings of 47th Vehicular Technology Conference (VTC), Phoenix, Arizona, USA, 4-7 May 1997, pp. 1004 - 100
- [5] K. Koora and H. Betzinger, "Interleaver design for turbo codes with selected inputs", Electronics Letters Vol. 34, No. 4, 2nd April 1998, pp. 651 652