10.1109-ACSSC.2009.5469833

Reduced-Complexity LLL Algorithm for Lattice-Reduction-Aided MIMO Detection

Chun-Fu Liao and Yuan-Hao Huang

Institute of Communications Engineering and Department of Electrical Engineering

National Tsing-Hua University, Hsinchu, Taiwan, R.O.C. 30013.

Email: [email protected], [email protected]

Abstract: In this paper, we propose a low-complexity constant-throughput LLL algorithm for lattice-reduction-aided (LRA) multi-input multi-output (MIMO) detection. The traditional LLL algorithm for lattice reduction has the drawback of varying throughput because the number of iteration loops for the size-reduction and LLL-reduction checks is variable. To address this problem, we propose a constant-throughput LLL (CT-LLL) algorithm that is well suited for real-time implementation. We further propose techniques that remove redundant operations in the CT-LLL algorithm so that its computational complexity is reduced. Simulation and analysis results show that the proposed low-complexity CT-LLL algorithm reduces the complexity of the CT-LLL algorithm for 4×4 and 8×8 MIMO systems to 80% and 72.94%, respectively, with negligible performance degradation.

I. INTRODUCTION

With the evolution of wireless communication systems, traditional single-input single-output (SISO) transmission cannot satisfy the high data rate and spectral efficiency requirements of next-generation wireless communication systems. To increase the transmission capacity, the multiple-input multiple-output (MIMO) system has been proposed, but the need for a high-performance and low-complexity MIMO detector has become an important issue. The maximum likelihood (ML) detector is known to be an optimal detector; however, it is impractical to implement owing to its great computational complexity. To address this problem, researchers have proposed tree-based search algorithms, such as sphere decoding [1] and K-best decoding [2], to reduce the complexity with near-optimal performance, but their computational complexities are still very high. On the other hand, linear methods, such as zero-forcing (ZF) and minimum mean square error (MMSE) detectors, and non-linear methods, like ordered successive interference cancellation (OSIC) detectors, have lower complexities, but they fail to achieve full diversity gain. The lattice-reduction-aided (LRA) detection technique [3] has been proposed as a solution featuring full diversity gain and acceptable complexity. The lattice reduction (LR) transforms the channel matrix into a more orthogonal one by finding a better basis for the same lattice so as to improve the diversity gain of the MIMO detector.

The Lenstra-Lenstra-Lovasz (LLL) algorithm is a well-known LR algorithm, valued for its polynomial execution time. However, its variable execution time is a significant problem for real-time implementation in the MIMO Rayleigh fading channel [4]. A fixed-complexity LLL algorithm has already been proposed in the literature [5]; however, it is still not practical enough for chip implementation because it lacks parallelism. We therefore propose a constant-throughput LLL (CT-LLL) algorithm that combines the parallel LLL algorithm [6] with the effective LLL algorithm [7] to achieve constant throughput. Furthermore, we remove the redundant operations in this CT-LLL algorithm to close the complexity gap between the structural LLL algorithm and the iterative LLL algorithm.

The remainder of this paper is organized as follows. Section II briefly describes the notation and system model. In Section III, we introduce lattice-reduction-aided MIMO detection, the LLL algorithm, and the simple CT-LLL algorithm. In Section IV, we present the computational complexity reduction scheme for the CT-LLL algorithm, and in Section V, we present the simulation and complexity analysis results. Finally, we summarize our conclusions in Section VI.

II. SYSTEM MODEL

A narrow-band $n_r \times n_t$ MIMO system consisting of $n_t$ transmitters and $n_r$ receivers can be modeled by

$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n}, \tag{1}$$

where $\mathbf{x} \in \mathcal{A}^{n_t}$ is the transmitted signal vector; $\mathbf{y} \in \mathbb{C}^{n_r}$ is the received signal vector; $\mathbf{H} = [\mathbf{h}_1, \mathbf{h}_2, \ldots, \mathbf{h}_{n_t}]$ represents a flat-fading channel matrix; and $\mathbf{n} \in \mathbb{C}^{n_r}$ is white Gaussian noise with variance $\sigma_n^2$. All the vectors $\mathbf{h}_i$ are independent and identically distributed complex Gaussian random vectors with zero mean and unit variance. The set $\mathcal{A}$ consists of the constellation points of the QAM modulation.

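To reduce the cost of complex-valued operations, we can reformulate the model with an equivalent real channel matrix as follows:

$$\mathbf{y}_r = \begin{bmatrix} \Re(\mathbf{y}) \\ \Im(\mathbf{y}) \end{bmatrix} = \mathbf{H}_r\mathbf{x}_r + \mathbf{n}_r = \begin{bmatrix} \Re(\mathbf{H}) & -\Im(\mathbf{H}) \\ \Im(\mathbf{H}) & \Re(\mathbf{H}) \end{bmatrix} \begin{bmatrix} \Re(\mathbf{x}) \\ \Im(\mathbf{x}) \end{bmatrix} + \begin{bmatrix} \Re(\mathbf{n}) \\ \Im(\mathbf{n}) \end{bmatrix}. \tag{2}$$

Then, the dimension of $\mathbf{H}_r$ becomes $n \times m$, where $m = 2n_t$ and $n = 2n_r$. The vectors $\mathbf{y}_r$ and $\mathbf{n}_r$ belong to $\mathbb{R}^n$, and $\mathbf{x}_r \in \mathcal{A}^m$, where $\mathcal{A} = \left\{\pm\tfrac{1}{2}a, \ldots, \pm\tfrac{\sqrt{M}-1}{2}a\right\}$ denotes the real constellation points for the $M$-QAM modulation. We use the parameter $a = \sqrt{6/(M-1)}$ for power normalization.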
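The real-valued reformulation in (2) can be checked numerically. The following is a minimal sketch, assuming NumPy; the helper names `pam_levels` and `to_real_model`, the seed, and the noise scale are illustrative choices and do not come from the paper.

```python
import numpy as np

def pam_levels(M):
    """Real constellation {±a/2, ..., ±(sqrt(M)-1)a/2} with a = sqrt(6/(M-1))."""
    L = int(round(np.sqrt(M)))            # levels per real dimension
    a = np.sqrt(6.0 / (M - 1))            # power-normalization factor
    return a * (np.arange(L) - (L - 1) / 2.0)

def to_real_model(H, x, noise):
    """Stack real and imaginary parts as in Eq. (2)."""
    Hr = np.block([[H.real, -H.imag],
                   [H.imag,  H.real]])
    xr = np.concatenate([x.real, x.imag])
    noise_r = np.concatenate([noise.real, noise.imag])
    return Hr, xr, noise_r

rng = np.random.default_rng(0)            # arbitrary seed, for reproducibility only
n_t = n_r = 4
levels = pam_levels(16)                   # 16-QAM: 4 levels per real dimension
x = rng.choice(levels, n_t) + 1j * rng.choice(levels, n_t)
H = (rng.standard_normal((n_r, n_t))
     + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)
noise = 0.1 * (rng.standard_normal(n_r) + 1j * rng.standard_normal(n_r))

y = H @ x + noise                         # complex model, Eq. (1)
Hr, xr, noise_r = to_real_model(H, x, noise)
yr = Hr @ xr + noise_r                    # real model, Eq. (2)
```

Here `yr` equals the stacked real and imaginary parts of `y`, and with this choice of $a$ the average energy of a complex symbol is one.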

The QR decomposition is often applied in the preprocessing of MIMO detection because it improves decoding efficiency. Then, the channel matrix $\mathbf{H}_r$ can be expressed

978-1-4244-5827-1/09/$26.00 ©2009 IEEE, Asilomar 2009

Fig. 1: LLL algorithm [10].

by

$$\mathbf{H}_r = \mathbf{Q}_r\mathbf{R}_r, \tag{3}$$

where $\mathbf{Q}_r \in \mathbb{R}^{n \times m}$ is a matrix with orthonormal columns, and $\mathbf{R}_r \in \mathbb{R}^{m \times m}$ is an upper triangular matrix. By multiplying both sides of (2) by $\mathbf{Q}_r^H$, we obtain

$$\tilde{\mathbf{y}}_r = \mathbf{Q}_r^H\mathbf{y}_r = \mathbf{R}_r\mathbf{x}_r + \mathbf{Q}_r^H\mathbf{n}_r, \tag{4}$$

where $\mathbf{Q}_r^H\mathbf{n}_r$ remains white Gaussian noise, since it is only rotated by a matrix with orthonormal columns. This formulation is applied in many MIMO detection algorithms, e.g., QR-based successive interference cancellation (QR-SIC) and K-best algorithms. In addition, a column-norm-based sorted QR decomposition (SQRD) [8] is often employed because it not only enhances detection performance but also reduces the computational complexity of the lattice reduction [9].
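As an illustration of how (3) and (4) are used, the sketch below performs plain QR-SIC detection with back-substitution and per-layer slicing. It assumes NumPy, uses an unsorted QR decomposition rather than SQRD, and replaces the structured real channel matrix with a generic real Gaussian matrix; the function name `qr_sic_detect` is hypothetical.

```python
import numpy as np

def qr_sic_detect(Hr, yr, levels):
    """Detect xr from yr = Hr @ xr + noise, layer by layer from the last row up."""
    Q, R = np.linalg.qr(Hr)                # reduced QR: Hr = Q @ R, R upper triangular
    y_t = Q.T @ yr                         # Eq. (4): rotate the received vector
    m = R.shape[1]
    x_hat = np.zeros(m)
    for i in range(m - 1, -1, -1):
        # Cancel the interference of the already detected layers i+1, ..., m-1.
        est = (y_t[i] - R[i, i + 1:] @ x_hat[i + 1:]) / R[i, i]
        # Slice to the nearest constellation point.
        x_hat[i] = levels[np.argmin(np.abs(levels - est))]
    return x_hat

rng = np.random.default_rng(1)             # arbitrary seed
levels = np.sqrt(6.0 / 15.0) * np.array([-1.5, -0.5, 0.5, 1.5])   # 16-QAM, per dimension
Hr = rng.standard_normal((8, 8))           # stand-in for a real 8x8 channel matrix
x_true = rng.choice(levels, 8)
x_hat = qr_sic_detect(Hr, Hr @ x_true, levels)   # noise-free sanity check
```

In the noise-free case the detector returns the transmitted vector; with noise, an early slicing error propagates to the remaining layers, which is one reason sorting and lattice reduction help.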

III. LATTICE REDUCTION

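A lattice $L$ is defined as $\{t_1\mathbf{h}_{r1} + t_2\mathbf{h}_{r2} + \cdots + t_N\mathbf{h}_{rN} \mid t_1, \ldots, t_N \in \mathbb{Z}\}$, where $\mathbf{h}_{r1}, \ldots, \mathbf{h}_{rN} \in \mathbb{R}^n$ are the basis vectors and $N$ equals $m$. The LR algorithm aims to find a unimodular matrix $\mathbf{T}$ ($|\det\mathbf{T}| = 1$ and all elements of $\mathbf{T}$ are integers) such that a more orthogonal $\tilde{\mathbf{H}}_r = \mathbf{H}_r\mathbf{T}$ generates the same lattice as $\mathbf{H}_r$. Then, the signal model becomes

$$\mathbf{y}_r = \mathbf{H}_r\mathbf{x}_r + \mathbf{n}_r = \tilde{\mathbf{H}}_r\mathbf{T}^{-1}\mathbf{x}_r + \mathbf{n}_r = \tilde{\mathbf{H}}_r\mathbf{s} + \mathbf{n}_r. \tag{5}$$

If $\mathbf{x}_r \in \mathbb{Z}^m$, then $\mathbf{T}^{-1}\mathbf{x}_r = \mathbf{s} \in \mathbb{Z}^m$, because the inverse of a unimodular matrix is also an integer matrix. In practice, the transmitted signals do not belong to an integer set; however, we can still transform the signals $\mathbf{x}_r \in \mathcal{A}^m$ into an integer set by linear operations such as scaling and shifting.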
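One way to make the scaling and shifting explicit is sketched below for the real constellation of Section II. The integer vector $\mathbf{z}$ and the all-ones vector $\mathbf{1}$ are notation introduced here for illustration; they are not taken from the paper.

```latex
\begin{aligned}
\mathbf{x}_r &= a\left(\mathbf{z} - \tfrac{\sqrt{M}-1}{2}\,\mathbf{1}\right),
  \qquad \mathbf{z} \in \{0, 1, \ldots, \sqrt{M}-1\}^m, \\
\frac{1}{a}\,\mathbf{y}_r + \tfrac{\sqrt{M}-1}{2}\,\mathbf{H}_r\mathbf{1}
  &= \mathbf{H}_r\mathbf{z} + \frac{1}{a}\,\mathbf{n}_r
   = (\mathbf{H}_r\mathbf{T})(\mathbf{T}^{-1}\mathbf{z}) + \frac{1}{a}\,\mathbf{n}_r .
\end{aligned}
```

The left-hand side depends only on known quantities, so the detector can estimate the integer vector $\mathbf{T}^{-1}\mathbf{z}$ in the reduced basis $\mathbf{H}_r\mathbf{T}$, multiply the estimate by $\mathbf{T}$ to recover $\mathbf{z}$, and undo the shift and scale to obtain $\mathbf{x}_r$.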

Several lattice-reduction algorithms are described in the literature, and the LLL algorithm, shown in Fig. 1, is the most popular approach. In the literature, Lines 4 to 19 are often defined as an iteration loop that can be decomposed into two parts: 1) Lines 4 to 10 deal with the size-reduction operations; and 2) Lines 11 to 19 handle the LLL-reduction operations. The number of iterations performed in the size reduction depends on the index k, and the LLL-reduction operation may increase or decrease the index k. Thus, both reduction operations result in variable throughput. This makes hardware implementation impractical, because a large memory buffer is required to sustain real-time operation if the decoding time varies for each received signal vector.
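For readers without access to Fig. 1, the following is a minimal sketch of a textbook real-valued LLL reduction operating on the QR factors, in the spirit of [10]. It assumes NumPy and is not a transcription of the listing in Fig. 1: indices are 0-based, the line numbers cited above do not map onto it, and the name `lll_reduce` is illustrative.

```python
import numpy as np

def lll_reduce(H, delta=0.75):
    """Return (Q, R, T) with H @ T = Q @ R, R upper triangular and LLL-reduced."""
    Q, R = np.linalg.qr(H)
    m = H.shape[1]
    T = np.eye(m, dtype=int)
    k = 1                                   # 0-based index of the current column
    while k < m:
        # Size reduction of column k against columns k-1, ..., 0.
        for l in range(k - 1, -1, -1):
            mu = int(np.rint(R[l, k] / R[l, l]))
            if mu != 0:
                R[:l + 1, k] -= mu * R[:l + 1, l]
                T[:, k] -= mu * T[:, l]
        # LLL-reduction check for the column pair (k-1, k).
        if delta * R[k - 1, k - 1] ** 2 > R[k, k] ** 2 + R[k - 1, k] ** 2:
            # Violation: swap the two columns ...
            R[:, [k - 1, k]] = R[:, [k, k - 1]]
            T[:, [k - 1, k]] = T[:, [k, k - 1]]
            # ... and restore the triangular form with a Givens rotation.
            a, b = R[k - 1, k - 1], R[k, k - 1]
            G = np.array([[a, b], [-b, a]]) / np.hypot(a, b)
            R[k - 1:k + 1, k - 1:] = G @ R[k - 1:k + 1, k - 1:]
            R[k, k - 1] = 0.0
            Q[:, k - 1:k + 1] = Q[:, k - 1:k + 1] @ G.T
            k = max(k - 1, 1)               # step back: data-dependent control flow
        else:
            k += 1
    return Q, R, T

rng = np.random.default_rng(2)              # arbitrary seed
H = rng.standard_normal((8, 8))
Q, R, T = lll_reduce(H)
```

The `while` loop makes the source of the variable throughput visible: the inner size-reduction loop runs `k` times, and each swap moves `k` backward, so the total number of iterations depends on the channel realization.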

In [5], a fixed-complexity LLL algorithm is proposed; however, its structure is not suited for real-time implementation. The characteristics of the varying channel matrix determine how the index k evolves and thus lead to variable numbers of size-reduction loops and LLL-reduction loops, as shown in Fig. 3(a). The effective LLL algorithm [7] performs the size-reduction check only on the element $R_{k-1,k}$ needed for the corresponding LLL-reduction check, as shown in Fig. 3(b). Because an SIC-based detector is often employed, we can perform this weak size reduction as suggested in [7]. Next, we prevent the irregular change of the index k during the LLL reduction by introducing a parallel processing loop n, as shown in Fig. 2. This concept was first proposed in [6]. It allows us to execute the size-reduction and LLL-reduction checks on all even column vector pairs or all odd column vector pairs in parallel. Thus, we define a stage as one parallel processing operation on all even or all odd pairs in loop n, as shown in Fig. 3(b). Even-pair stages and odd-pair stages are processed alternately. As a result, this algorithm fixes the execution time by specifying the number of stages and achieves constant throughput for real-time operation. Note that the number of stages has a great impact on both performance and computational complexity, so it forms a trade-off in this algorithm. Moreover, there are still several redundant operations that could be eliminated to further reduce the computational complexity.
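The stage structure described above can be sketched as follows, assuming NumPy. This is an illustrative reading of the text, not the listing of Fig. 2: the function name `ct_lll` is hypothetical, the choice of which pair parity is processed first is an assumption, and the pairs of one stage are visited sequentially here although they are disjoint and would run in parallel in hardware.

```python
import numpy as np

def ct_lll(H, num_stages, delta=0.75):
    """Fixed-stage LLL sketch; returns (Q, R, T) with H @ T = Q @ R."""
    Q, R = np.linalg.qr(H)
    m = H.shape[1]
    T = np.eye(m, dtype=int)
    for stage in range(num_stages):
        # Assumed schedule: stages alternate between the column pairs
        # (k-1, k) with k = 1, 3, 5, ... and with k = 2, 4, 6, ... (0-based).
        for k in range(1 + stage % 2, m, 2):
            # Effective (weak) size reduction: only R[k-1, k] is reduced.
            mu = int(np.rint(R[k - 1, k] / R[k - 1, k - 1]))
            if mu != 0:
                R[:k, k] -= mu * R[:k, k - 1]
                T[:, k] -= mu * T[:, k - 1]
            # LLL-reduction check; on violation swap and re-triangularize.
            if delta * R[k - 1, k - 1] ** 2 > R[k, k] ** 2 + R[k - 1, k] ** 2:
                R[:, [k - 1, k]] = R[:, [k, k - 1]]
                T[:, [k - 1, k]] = T[:, [k, k - 1]]
                a, b = R[k - 1, k - 1], R[k, k - 1]
                G = np.array([[a, b], [-b, a]]) / np.hypot(a, b)   # Givens rotation
                R[k - 1:k + 1, k - 1:] = G @ R[k - 1:k + 1, k - 1:]
                R[k, k - 1] = 0.0
                Q[:, k - 1:k + 1] = Q[:, k - 1:k + 1] @ G.T
    return Q, R, T

rng = np.random.default_rng(3)               # arbitrary seed
H = rng.standard_normal((8, 8))              # stand-in for a real-valued 4x4 channel
Q, R, T = ct_lll(H, num_stages=11)           # 11 stages, the 4x4 setting of Section V
```

The loop bounds depend only on `num_stages` and the matrix size, never on the data, which is what gives the constant throughput. The price is that the basis is only guaranteed to be fully reduced if enough stages are run.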

IV. COMPLEXITY REDUCTION FOR CT-LLL ALGORITHM

In fixed-structure LLL-based lattice reduction algorithms, such as the fixed-complexity LLL algorithm and the CT-LLL algorithm, the computational complexity tends to be higher than that of the original iterative LLL algorithm because many redundant operations are performed. In this section, we propose two simple techniques to reduce the computational complexity of the CT-LLL algorithm.


Fig. 2: The proposed low-complexity CT-LLL algorithm.

First, we find that the LLL reduction inequality check ($\delta|R(n-1,n-1)|^2 > |R(n,n)|^2 + |R(n-1,n)|^2$) is computationally expensive. Therefore, we propose an LLL-reduction violation check technique that determines whether the LLL reduction inequality holds from the results of the LLL-reduction checks in the previous stage, without computing the inequality. We use $(N-1)$-bit reduction-violation-check (RVC) registers to store the results of the previous stage's LLL-reduction checks, as shown in Fig. 3(c). If the LLL-reduction check identified no violation in the $(n-1)$-th loop and the $(n+1)$-th loop in the previous stage, and if no size reduction is performed in the $n$-th loop in the current stage, the LLL-reduction check cannot report a violation in the $n$-th loop in the current stage. Thus, no LLL-reduction check is required in the $n$-th loop. Since there are no previous LLL-reduction checks in the first two stages, we do not apply this prediction technique in these two stages. By identifying unnecessary LLL-reduction checks in the current stage, we can omit a large number of LLL-reduction check computations without any performance degradation.

Fig. 4: BER versus SNR of the 4×4 QR-SIC MIMO detectors using different lattice-reduction algorithms.

Fig. 5: BER versus SNR of the 8×8 QR-SIC MIMO detectors using different lattice-reduction algorithms.
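The violation-check idea can be sketched as below, assuming NumPy. The entries $R(n-1,n-1)$, $R(n,n)$, and $R(n-1,n)$ of a pair change only when that pair is size-reduced or when a neighboring pair is swapped, and right after a swap the pair itself satisfies the inequality, so an unchanged pair cannot newly violate it. The function name `ct_lll_rvc`, the `use_rvc` switch, the check counter, and the stage schedule are illustrative assumptions; the index register p described in the paper is not modeled.

```python
import numpy as np

def ct_lll_rvc(H, num_stages, delta=0.75, use_rvc=True):
    """Return (R, T, checks); checks counts the evaluated LLL-reduction inequalities."""
    R = np.linalg.qr(H)[1]
    m = H.shape[1]
    T = np.eye(m, dtype=int)
    # Violation flags of the previous stage for pairs k = 1, ..., m-1;
    # entries 0 and m are padding for the missing neighbors at the borders.
    prev_viol = np.zeros(m + 1, dtype=bool)
    checks = 0
    for stage in range(num_stages):
        cur_viol = np.zeros(m + 1, dtype=bool)
        for k in range(1 + stage % 2, m, 2):
            mu = int(np.rint(R[k - 1, k] / R[k - 1, k - 1]))
            if mu != 0:                      # effective size reduction
                R[:k, k] -= mu * R[:k, k - 1]
                T[:, k] -= mu * T[:, k - 1]
            # Skip the check when nothing that feeds the inequality has changed:
            # no size reduction here, no swap in either neighbor one stage ago.
            if (use_rvc and stage >= 2 and mu == 0
                    and not prev_viol[k - 1] and not prev_viol[k + 1]):
                continue
            checks += 1
            if delta * R[k - 1, k - 1] ** 2 > R[k, k] ** 2 + R[k - 1, k] ** 2:
                R[:, [k - 1, k]] = R[:, [k, k - 1]]
                T[:, [k - 1, k]] = T[:, [k, k - 1]]
                a, b = R[k - 1, k - 1], R[k, k - 1]
                G = np.array([[a, b], [-b, a]]) / np.hypot(a, b)
                R[k - 1:k + 1, k - 1:] = G @ R[k - 1:k + 1, k - 1:]
                R[k, k - 1] = 0.0
                cur_viol[k] = True           # record the violation for the next stage
        prev_viol = cur_viol
    return R, T, checks

rng = np.random.default_rng(4)               # arbitrary seed
H = rng.standard_normal((8, 8))
R_a, T_a, checks_rvc = ct_lll_rvc(H, num_stages=11, use_rvc=True)
R_b, T_b, checks_all = ct_lll_rvc(H, num_stages=11, use_rvc=False)
```

Running both variants on the same matrix gives the same transformation matrix, while the variant with the flags evaluates no more inequalities than the one without.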

Second, we find that the parallel LLL-reduction operations cause some redundant check operations because, in the later stages, the loop-n operation is seldom performed on the part of the R matrix with a small index n. We eliminate these operations by introducing an index register p that indicates the first column to be executed in each stage. The rule for changing this index is similar to that for the index k in the original LLL algorithm, except that we always increase p by one once p equals 2. The complete low-complexity CT-LLL algorithm for SIC-based detectors is shown in Fig. 2.


Fig. 3: (a) The LLL algorithm, (b) the constant-throughput LLL algorithm, and (c) the low-complexity constant-throughput LLL algorithm.

TABLE I: Average Computational Complexity of Lattice Reduction Algorithms for 4×4 MIMO Detection. Percentages are relative to CT-LLL, Stage=11.

| Algorithm | Addition | Multiplication | Division | Square Root | Total |
|---|---|---|---|---|---|
| LLL (average) | 346.12 | 521.34 | 75.20/188207 | 5.23 | 947.89 |
| LLL, fixed 40 loops | 335.56 | 431.25 | 72.31 | 5.04 | 844.16 |
| CT-LLL, Stage=11 | 297.15 | 511.59 | 49.15 | 5.08 | 863 |
| Low-Complexity CT-LLL, Stage=11 | 245.34 (82.5%) | 408.41 (79.8%) | 32.34 (65.8%) | 5.08 (100%) | 691.17 (80%) |

TABLE II: Average Computational Complexity of Lattice Reduction Algorithms for 8×8 MIMO Detection. Percentages are relative to CT-LLL, Stage=25.

| Algorithm | Addition | Multiplication | Division | Square Root | Total |
|---|---|---|---|---|---|
| LLL | 2046.6 | 2709.7 | 364.1 | 12.17 | 5132.6 |
| LLL, fixed 190 loops | 2044.7 | 2705.5 | 363.7 | 12.16 | 5126.1 |
| CT-LLL, Stage=25 | 1341.7 | 2289.3 | 211.85 | 11.92 | 3854.8 |
| Low-Complexity CT-LLL, Stage=25 | 1029.50 (76.73%) | 1665.44 (72.75%) | 104.73 (49.44%) | 11.92 (100%) | 2811.6 (72.94%) |
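The percentages in Tables I and II can be reproduced from the listed totals. The short sketch below uses only numbers copied from the tables; the dictionary layout and the name `percent` are illustrative. It also computes the ratio against the original LLL algorithm, which the tables do not list.

```python
# Totals copied from Tables I and II.
totals = {
    "4x4": {"LLL": 947.89, "CT-LLL": 863.0, "LC-CT-LLL": 691.17},
    "8x8": {"LLL": 5132.6, "CT-LLL": 3854.8, "LC-CT-LLL": 2811.6},
}

# Per-operation counts of the low-complexity rows (add, mul, div, sqrt).
lc_ops = {
    "4x4": [245.34, 408.41, 32.34, 5.08],
    "8x8": [1029.50, 1665.44, 104.73, 11.92],
}

def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole

ratios = {}
for cfg, t in totals.items():
    ratios[cfg] = {
        "vs_ct_lll": percent(t["LC-CT-LLL"], t["CT-LLL"]),   # figure quoted in the tables
        "vs_lll": percent(t["LC-CT-LLL"], t["LLL"]),         # not listed in the tables
    }

for cfg, r in ratios.items():
    print(f"{cfg}: {r['vs_ct_lll']:.2f}% of CT-LLL, {r['vs_lll']:.2f}% of LLL")
```

The per-operation counts of each low-complexity row add up to the listed total, and the ratios to CT-LLL come out at about 80% for 4×4 and 72.94% for 8×8, which confirms that the quoted percentages are relative to the CT-LLL algorithm.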

V. SIMULATION RESULTS

In this section, we compare the conventional LLL algorithm, the CT-LLL algorithm, and the proposed low-complexity CT-LLL algorithm in terms of computational complexity and BER performance. We simulate LRA-MIMO detection based on the MIMO system described in Section II, and we employ sorted QR decomposition for the preprocessing in all MIMO detectors. The LLL-reduction parameter δ equals 0.75, as suggested in [11]. We set the stage number of the CT-LLL algorithm by selecting the minimal stage number that yields almost the same performance as the LLL algorithm. Thus, we employ 11 stages and 25 stages for the 4×4 and 8×8 MIMO detectors, respectively. To demonstrate the advantage of the CT-LLL algorithm, we take the intuitive fixed-loop LLL algorithm as the constant-throughput benchmark, whose iteration-loop number corresponds to the stage number in the CT-LLL algorithm with the same number of LLL-reduction checks. Thus, we choose 40 and 190 loops for the 4×4 and 8×8 MIMO detectors, respectively, in the fixed-loop LLL algorithm. The CT-LLL algorithm is described in Section III, and the low-complexity CT-LLL algorithm employs the cost-reduction techniques in Section IV.

Fig. 4 and Fig. 5 show the BER performances of the LRA-QR-SIC detectors for the 4×4 and 8×8 MIMO systems, respectively. Clearly, the CT-LLL algorithm outperforms the fixed-loop LLL algorithm, and it achieves the same BER performance as the original LLL algorithm. Moreover, the low-complexity CT-LLL algorithm reduces the complexity of the CT-LLL algorithm to 80% and 72.94% for the 4×4 and 8×8 MIMO systems, respectively, with almost no performance degradation.

The BER performances of the lattice-reduction algorithms for different MIMO detectors are depicted in Fig. 6 and Fig. 7 for the 4×4 and 8×8 MIMO systems, respectively. The proposed low-complexity CT-LLL algorithm causes negligible performance degradation in SIC-based MIMO detection, such as the QR-SIC and K-best algorithms. However, for a linear detector like the ZF detector, the full size reduction must be preserved to maintain the lattice-reduction effect.

Fig. 6: BER versus SNR of different 4×4 MIMO detectors using different lattice-reduction algorithms.

Fig. 7: BER versus SNR of different 8×8 MIMO detectors using different lattice-reduction algorithms.
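To make the simulated detection chain concrete, the sketch below strings together lattice reduction, the shift-and-scale mapping to integers, and SIC with integer rounding for a single received vector. It assumes NumPy, uses the conventional LLL algorithm with unsorted QR instead of the CT-LLL algorithm with SQRD, and the names `lll_T` and `lra_sic_detect` are hypothetical; it is not the authors' simulation code and produces no BER curves.

```python
import numpy as np

def lll_T(H, delta=0.75):
    """Conventional LLL on the R factor of H; returns the unimodular matrix T."""
    R = np.linalg.qr(H)[1]
    m = H.shape[1]
    T = np.eye(m, dtype=int)
    k = 1
    while k < m:
        for l in range(k - 1, -1, -1):           # full size reduction
            mu = int(np.rint(R[l, k] / R[l, l]))
            if mu != 0:
                R[:l + 1, k] -= mu * R[:l + 1, l]
                T[:, k] -= mu * T[:, l]
        if delta * R[k - 1, k - 1] ** 2 > R[k, k] ** 2 + R[k - 1, k] ** 2:
            R[:, [k - 1, k]] = R[:, [k, k - 1]]
            T[:, [k - 1, k]] = T[:, [k, k - 1]]
            a, b = R[k - 1, k - 1], R[k, k - 1]
            G = np.array([[a, b], [-b, a]]) / np.hypot(a, b)
            R[k - 1:k + 1, k - 1:] = G @ R[k - 1:k + 1, k - 1:]
            k = max(k - 1, 1)
        else:
            k += 1
    return T

def lra_sic_detect(Hr, yr, M, delta=0.75):
    """LRA QR-SIC for sqrt(M)-level real symbols; returns the estimate of xr."""
    L = int(round(np.sqrt(M)))
    a = np.sqrt(6.0 / (M - 1))
    shift = (L - 1) / 2.0
    # Shift and scale so that the symbols become integers z in {0, ..., L-1}.
    y_int = yr / a + Hr @ np.full(Hr.shape[1], shift)
    T = lll_T(Hr, delta)
    Q, R = np.linalg.qr(Hr @ T)                  # QR of the reduced basis
    v = Q.T @ y_int
    m = R.shape[1]
    s = np.zeros(m)
    for i in range(m - 1, -1, -1):               # SIC with rounding to integers
        s[i] = np.rint((v[i] - R[i, i + 1:] @ s[i + 1:]) / R[i, i])
    z = np.clip(T @ s, 0, L - 1)                 # back to the original basis
    return a * (z - shift)

rng = np.random.default_rng(5)                   # arbitrary seed
Hr = rng.standard_normal((8, 8))
z_true = rng.integers(0, 4, 8)
x_true = np.sqrt(6.0 / 15.0) * (z_true - 1.5)    # 16-QAM, per real dimension
x_hat = lra_sic_detect(Hr, Hr @ x_true, M=16)    # noise-free sanity check
```

A BER experiment would wrap this in a loop over channel realizations, noise draws, and SNR points, and would swap `lll_T` for the reduction algorithm under test.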

VI. CONCLUSION

In this paper, we propose a low-complexity, constant-throughput LLL algorithm for real-time LRA-MIMO detection. Together, effective size reduction and parallel LLL reduction eliminate the variable iteration time at approximately the same complexity as the original LLL algorithm. Since the CT-LLL algorithm performs many redundant LLL-reduction check operations, the LLL-reduction violation check and the scarce LLL-reduction indication further reduce the complexity to 80% and 72.94% of that of the CT-LLL algorithm for 4×4 and 8×8 MIMO systems, respectively. Therefore, we believe that the proposed low-complexity CT-LLL algorithm offers a solid basis for the implementation of a real-time LRA-MIMO detector. We will investigate this aspect in our future work.

REFERENCES

[1] E. Agrell, T. Eriksson, A. Vardy, and K. Zeger, “Closest point search in lattices,” IEEE Transactions on Information Theory, vol. 48, no. 8, pp. 2201–2214, Aug. 2002.

[2] M. Shabany and P. G. Gulak, “The application of lattice-reduction to the K-best algorithm for near-optimal MIMO detection,” in IEEE International Symposium on Circuits and Systems, May 2008, pp. 316–319.

[3] H. Yao and G. Wornell, “Lattice-reduction-aided detectors for MIMO communication systems,” in IEEE Global Telecommunications Conference, vol. 1, Nov. 2002, pp. 424–428.

[4] J. Jalden, D. Seethaler, and G. Matz, “Worst-case and average-case complexity of LLL lattice reduction in MIMO wireless systems,” in IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 1, Mar. 2008, pp. 2685–2688.

[5] H. Vetter, V. Ponnampalam, M. Sandell, and P. A. Hoeher, “Fixed complexity LLL algorithm,” IEEE Transactions on Signal Processing, vol. 57, no. 4, pp. 1634–1637, Apr. 2009.

[6] G. Villard, “Parallel lattice basis reduction,” in International Conference on Symbolic and Algebraic Computation, 1992, pp. 269–277.

[7] C. Ling and N. Howgrave-Graham, “Effective LLL reduction for lattice decoding,” in IEEE International Symposium on Information Theory, Jun. 2007, pp. 196–200.

[8] P. Luethi, A. Burg, S. Haene, D. Perels, N. Felber, and W. Fichtner, “VLSI implementation of a high-speed iterative sorted MMSE QR decomposition,” in IEEE International Symposium on Circuits and Systems, May 2007, pp. 1421–1424.

[9] Y. H. Gan and W. H. Mow, “Novel joint sorting and reduction technique for delay-constrained LLL-aided MIMO detection,” IEEE Signal Processing Letters, vol. 15, pp. 194–197, 2008.

[10] D. Wubben, R. Bohnke, V. Kuhn, and K.-D. Kammeyer, “MMSE-based lattice-reduction for near-ML detection of MIMO systems,” in ITG Workshop on Smart Antennas, May 2004, pp. 106–113.

[11] A. K. Lenstra, H. W. Lenstra, and L. Lovasz, “Factoring polynomials with rational coefficients,” Math. Annalen, vol. 261, pp. 515–534, 1982.
