## Abstract

Hamming code [1] is one of the most efficient error-correcting codes and is widely used in wireless data communication. Existing decoders are mostly implemented in hardware, so they suffer from critical limitations in scalability, programmability and flexibility. Software defined radio (SDR) [2] has recently emerged as a promising technology that implements a range of communication protocols in software running on central processing units (CPUs) and graphics processing units (GPUs). To meet the growing demand for photorealistic graphics and real-time communication services, modern smartphones, tablets and similar devices use heterogeneous multi-core system-on-chip (SoC) processors that integrate a CPU with application-specific accelerators such as a GPU and a DSP [3, 4]. In this paper, we present a computationally efficient implementation of a Hamming code decoder on a mobile GPU for high-speed wireless communication. The decoder supports a variable *error tolerance* of *t* bits (the number of bits that can be corrected) and a variable *packet size* of *M* bytes. A GPU offers an extremely high-throughput parallel computing platform by running hundreds of processors concurrently. However, the Hamming algorithm is challenging to parallelize effectively on a GPU because it operates on sparsely located data items and contains several conditional statements, which lead to non-coalesced, long-latency global memory accesses and severe thread divergence. Despite these obstacles, the proposed approach provides insights into how to produce a high-performance GPU implementation. On a 336-core GPU, it achieves a speedup of $99\times$ over an equivalent CPU implementation. Moreover, it reduces the computational complexity from O(n) for the sequential algorithm to O(1) for the GPU-based approach. Finally, the GPU-based decoder substantially outperforms the CPU-based approach in energy efficiency.
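The abstract does not spell out the decoding step itself. As a hedged illustration only, the sketch below shows textbook single-error-correcting Hamming(7,4) syndrome decoding, not the paper's variable-*t* decoder; the functions `hamming74_encode`, `hamming74_decode` and `decode_all` are hypothetical names introduced here. Each codeword is decoded without reference to any other, which is the property a GPU implementation can exploit by assigning one thread per codeword.

```python
from typing import List


def hamming74_encode(nibble: int) -> List[int]:
    """Encode 4 data bits (0..15) as a 7-bit Hamming codeword.

    Position p (1..7) is stored at list index p - 1. Parity bits sit at
    positions 1, 2 and 4; data bits d1..d4 sit at positions 3, 5, 6 and 7.
    """
    d1, d2, d3, d4 = [(nibble >> k) & 1 for k in (3, 2, 1, 0)]
    p1 = d1 ^ d2 ^ d4  # checks positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # checks positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # checks positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(codeword: List[int]) -> int:
    """Correct at most one flipped bit and return the 4 data bits as an int."""
    bits = list(codeword)  # copy so the caller's codeword is not modified
    syndrome = 0
    for pos, bit in enumerate(bits, start=1):
        if bit:
            syndrome ^= pos  # XOR of the positions of all set bits
    # A valid codeword has syndrome 0; otherwise the syndrome is the
    # 1-based position of the single flipped bit. On a GPU, this branch is
    # taken only by threads whose codeword is corrupted.
    if syndrome:
        bits[syndrome - 1] ^= 1
    return (bits[2] << 3) | (bits[4] << 2) | (bits[5] << 1) | bits[6]


def decode_all(codewords: List[List[int]]) -> List[int]:
    # Each iteration is independent of the others; a GPU kernel would run
    # them concurrently with one thread per codeword instead of this loop.
    return [hamming74_decode(cw) for cw in codewords]


if __name__ == "__main__":
    received = hamming74_encode(0b1011)
    received[4] ^= 1  # simulate a channel error at position 5
    print(decode_all([received, hamming74_encode(0b0110)]))  # prints [11, 6]
```

The `if syndrome:` branch is one concrete source of the thread divergence mentioned above: threads in the same warp take different paths depending on whether their codeword was corrupted.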










Table 1. Computational complexity

| Parameter varied | Platform | Splitter        | Decoder         | Merger          |
|------------------|----------|-----------------|-----------------|-----------------|
| M                | CPU      | $O(n)$          | $O(n)$          | $O(n)$          |
| M                | GPU      | $O(1)$          | $O(1)$          | $O(1)$          |
| t                | CPU      | $O(\log_2 n)$   | $O(\log_2 n)$   | $O(\log_2 n)$   |
| t                | GPU      | $O(1)$          | $O(1)$          | $O(1)$          |
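Table 1 names three function units (splitter, decoder, merger) without describing them. Purely as an assumption for illustration, the sketch below supposes that the splitter cuts a packet into 4-bit nibbles (the input a Hamming(7,4) code would need) and the merger reassembles them into bytes; `split_element`, `merge_element`, `split` and `merge` are hypothetical names. Writing each unit as the work for a single output index makes the complexity entries concrete: every element costs constant work regardless of the packet length n, so with one GPU thread per element (and enough cores) the per-thread cost is O(1), whereas the sequential CPU loop over all elements is O(n).

```python
from typing import List


def split_element(packet: bytes, j: int) -> int:
    """Work of hypothetical splitter thread j: extract the j-th 4-bit nibble.

    Nibble 2*i is the high half of byte i and nibble 2*i + 1 is the low half.
    The cost is constant for any j, independent of the packet length.
    """
    byte = packet[j // 2]
    return (byte >> 4) & 0xF if j % 2 == 0 else byte & 0xF


def merge_element(nibbles: List[int], i: int) -> int:
    """Work of hypothetical merger thread i: rebuild byte i from two nibbles."""
    return (nibbles[2 * i] << 4) | nibbles[2 * i + 1]


def split(packet: bytes) -> List[int]:
    # Sequentially this loop runs 2n iterations (O(n)); on a GPU each j
    # would be handled by its own thread doing O(1) work.
    return [split_element(packet, j) for j in range(2 * len(packet))]


def merge(nibbles: List[int]) -> bytes:
    # Same structure as split(): one independent O(1) task per output byte.
    return bytes(merge_element(nibbles, i) for i in range(len(nibbles) // 2))


if __name__ == "__main__":
    print(split(b"\xab"))  # prints [10, 11]
    print(merge(split(b"Hamming")) == b"Hamming")  # prints True
```

In this sketch the decoder stage would sit between `split` and `merge`, receiving codewords built from the nibbles; because no task reads another task's output, all three stages map onto independent GPU threads.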

Table 2. Energy consumption reduction by GPU over CPU

| Packet size, M (bytes) | t = 2 bits | t = 3 bits | t = 4 bits | t = 5 bits | t = 6 bits |
|------------------------|------------|------------|------------|------------|------------|
| M = 100                | 87%        | 88%        | 89%        | 90%        | 92%        |
| M = 800                | 94%        | 94%        | 95%        | 96%        | 96%        |
| M = 1200               | 96%        | 97%        | 97%        | 97%        | 98%        |
| M = 1600               | 98%        | 98%        | 98%        | 98%        | 99%        |
| M = 2000               | 98%        | 98%        | 99%        | 99%        | 99%        |
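Table 2 does not state how the percentages are defined. Assuming the conventional definition of a relative reduction, where $E_{\mathrm{CPU}}$ and $E_{\mathrm{GPU}}$ (symbols introduced here) denote the energy each platform consumes to decode the same packet, each entry converts to an energy ratio as follows:

$$
\text{Reduction} = \frac{E_{\mathrm{CPU}} - E_{\mathrm{GPU}}}{E_{\mathrm{CPU}}} \times 100\%
\quad\Longleftrightarrow\quad
\frac{E_{\mathrm{CPU}}}{E_{\mathrm{GPU}}} = \frac{100\%}{100\% - \text{Reduction}}
$$

Under this assumption, the smallest entry (87% at M = 100, t = 2) means the GPU uses about $100/13 \approx 7.7$ times less energy than the CPU, and the 99% entries correspond to roughly a 100-fold reduction.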

Figure 4. Speedup by GPU over CPU

- [1] R. Ma and S. Cheng, "The Universality of Generalized Hamming Code for Multiple Sources," *IEEE Transactions on Communications*, vol. 59, no. 10, pp. 2641-2647, Oct. 2011.
- [2] P. Solic, J. Radić, and N. Rozic, "Software defined radio based implementation of RFID tag in next generation mobiles," *IEEE Transactions on Consumer Electronics*, vol. 58, no. 3, pp. 1051-1055, Aug. 2012.
- [3] W.-J. Lee, Y. Shin, J. Lee, J.-W. Kim, J.-H. Nah, H.-S. Park, S. Jung, and S. Lee, "A novel mobile GPU architecture based on ray tracing," in *Proc. 2013 IEEE International Conference on Consumer Electronics (ICCE)*, pp. 21-22, 11-14 Jan. 2013.
- [4] Y.-C. Wang and K.-T. Cheng, "Energy and Performance Characterization of Mobile Heterogeneous Computing," in *Proc. 2012 IEEE Workshop on Signal Processing Systems (SiPS)*, pp. 312-317, 17-19 Oct. 2012.