[6] v-BLAST - An Architecture for Realizing Very High Data Rates Over the Rich-scattering Wireless...

download [6] v-BLAST - An Architecture for Realizing Very High Data Rates Over the Rich-scattering Wireless Channel

of 6

Transcript of [6] v-BLAST - An Architecture for Realizing Very High Data Rates Over the Rich-scattering Wireless...

  • 8/13/2019 [6] v-BLAST - An Architecture for Realizing Very High Data Rates Over the Rich-scattering Wireless Channel

    1/6

    V-BLA ST: An Architecture for Realizing Very High Data RatesOver the Rich-Scattering Wireless ChannelP. W. Wolniansky G. J. Foschini G.D. olden R. A. ValenzuelaBell Laboratories, Lucent Technologies, Crawford Hill Laboratory791 H olmdel-Keyport Rd., H olmdel, N J 07733

    ABSTRACTRecent information theory research has shown that therich-scattering wireless channel is capable oje n o m u s theoretical capacities if the multipath isproperly exploited. n this paper, we describe awireless communication architecture known 11svertical BLAST (Bell Laboratories Layered Space-Time) or V-BLAST, which has been implemented inrealtime in the laboratory. Using our laborato,yprototype, we have demonstrated spectral efic iencie sof 20 - 40 bps/Hz in an indoor propagationenvironment at realistic SNRs and error rates. To thebest of our knowledge, w ireless spectral eficienc ies ofthis magnitude are unprecedented, and aref u r t h e m r e unattainable using traditional techniques.1. INTRODUCTIONIn the past few years, theoretical investigations haverevealed that the multipath wireless channel i s capableof enormous capacities, provided that the multipathscattering is sufficiently rich and is properly exploitedthrough the use of an appropriate processingarchitecture [l-41. Th e diagonally-layered space-timearchitecture proposed by Foschini [l],now known asdiagonal BLAST (Bell Laboratories Layered Space-Time) or D-BLAST, is one such approach. D-BLASTutilizes multi-element antenna arrays at bothtransmitter and receiver and an elegant diagonally-layered coding structure in which code blocks aredispersed across diagonals in space-time. In anindependent Rayleigh scattering environment, thisprocessing structure leads to theoretical rates whichgrow linearly with the number of antennas (assumingequal numbers of transmit and receive antennas) withthese rates a pproaching 90%of Shannon capacity.However, the diagonal approach suffers fromcertain implementation complexities which make itinappropriate for initial implementation. In this paper,we describe a simplified version of BLAST known asvertical BLAST or V-BLAST, which has beenimplemented in realtime in the laboratory. Using ourlaboratory prototype, w e have demonstrated spectralefficiencies of 20 - 40 bps/Hz at average SNRsranging from 24 to 34 dB. Although these results wereobtained in a relatively benign indoor environment,we believe that spectral efficiencies of this magnitudeare unprecedented, regardless of propagationenvironment or SNR, and are simply unattainableusing traditional techniques.

    0-7803-4900-8/98/$10.00 998 IEEE 295

    2. SYSTEM OVERVIEWA high-level block ditsgram of a BL AST system isshown in Fig. 1. A single data stream isdemultiplexed into M substreams, and each substreamis then encoded into symbols and fed to its respectivetransmitter. ( m e enm ding process is discussed inmore detail below.) Transmitters 1 M operate co-channel at symbol rate 1/T symbols/sec, withsynchronized symbol tiiming. Each transmitter is itselfan ordinary QAM transmitter. Th e collection oftransmitters comprise:;, in effect, a vector-valuedtransmitter, where coinponents of each transmittedM-vector are symbols drawn from a QAMconstellation. W e assume that the sam e constellationis used for each substream, and that transmissions areorganized into bursts of L symbols. The powerlaunched by each transmitter is proportional to 1 / M sothat the total radiai.ed power is constant andindependent of M .

    Vec to rencoder

    Figure 1:V-BLAST high level system diagramThe essential difference between D-BLAST and V-BLAST lies in the vector encoding process. In D -BLAST, redundancy between the substreams isintroduced through tlne use of specialized inter-substream block coding. The D-BLAST code blocksare organized along diagonals in space-time. It is this

    coding that leads to D-BLAST'S higher spectralefficiencies for a given number of transmitters andreceivers. In V-BLA ST, however, the vector encodingprocess is simply a dernultiplex operation followed byindependent bit-to-symbol mapping of eachsubstream. No inter-substream coding, or coding ofany kind, is requir ed, though conventional coding ofthe individual substreams may certainly be applied.For the remainder of lhis paper, we will assume for

  • 8/13/2019 [6] v-BLAST - An Architecture for Realizing Very High Data Rates Over the Rich-scattering Wireless Channel

    2/6

  • 8/13/2019 [6] v-BLAST - An Architecture for Realizing Very High Data Rates Over the Rich-scattering Wireless Channel

    3/6

    where Q( . ) denotes the quantization (slicing)operation appropria te to the constellation in use.Step 3: Assuming that r i k = uk , , cancel uk, from thereceived vector r l , resulting in modified receivedvector r2:

    r 2 = rl , ( H ) k , (6)where denotes the k l -th column of E. Steps 1 -3 are then performed for components k 2 , . . . kM IVoperating in turn 011 the progression of modifiedreceived vectors r 2 , r 3 , . . . rM.The specifics of the detection process depend on thecriterion chosen to compute the nulling vectors w ~the most common of these being MMSE or ZF. Thedetection process is described here with respect to theZF criterion since it is somewhat simpler to state. Thek,-th ZF-nulling vector is defined as the uniqueminimum norm vector satisfying

    (7)Thus, wk s orthogonal to the subspace spanned bythe contributions to r i due to those symbols not yetestimated and cancelled. It is not difficult to show thatthe unique vector satisfying (7) is just the k ,-th row ofHh where the notation Hk, denotes the matrixobtained by zeroing columns k l k 2 , . . . k , of Hand denotes the Moore-Penrose pseudoinverse [SI.The post-detection SNR for the k,-th detectedcomponent of a is easily obtained by substituting (1)and (7) into (4), and taking expec ted values, i.e.

    where the expectation in the numerator is taken overthe constellation set.3.1 OPTIMAL DETECTION ORDERINGAs mentioned earlier, when symbol cancellation isused, the system performance is affected by the orderin which the components of a are detected, whereas itdoes not matter when pure nulling is used. In order toappreciate this, first consider why i t is that nullingwith cancellation performs better than pure nulling,regardless of ordering.When nulling alone is used, each nulling vector isrequired, according to (2), to be orthogonal to M - 1rows of H. However, when symbol cancellation isemployed in addition to nulling, wk, is required to heorthogonal only to the M undetected componentsSchwartz inequality is that the more rows of H that ait8 p e ~7), h mplc onse en e o

    the Cauchy-

    particular wk, is constrained to he orthogonal to, thelarger its norm, and thus, according to (8), the smallerits post-detection SNR. When using cancellation then,the P k , are lower bounded (with equality only for P k , )by their corresponding nulling-only P k ,

    The importance of o:rdering is simpl y that it permits,during the detection of the i-th component, a choice asconstrained by; different choices lead to different p k .For example, in an M = 3 system, detectingcomponent 1 first (in tihe presence of 2 and 3) will, ingeneral, result in a different p 1 han if component 2was detected first (in the presence of 1 and 3) . Withpure nulling, each component is always detected in thepresence of all the others, so ordering does not matter.Now recall that all components of are assumed toutilize the same constellation. Under this assumption,the component with the smallest pk , will dominate theerror performance of the system. Thus, n obviousfigure of merit for this system - though not the onlyone possible - is the maximization of the worst, i.e.the minimum, of the p k , over all possible detectionorderings. A surprising result - and one which webelieve has not been previously appreciated - is thatsimply choosing the hest p k , ut each stuge in thedetection process leads to the globally optimumordering, SOpr, n this maximin sense. The proof isgiven in the appendix.We remark that thi:s optimdity result may havewider applicability to multi-user cancellation-baseddetection as well. Although the best firstcancellation approach is widely known within themulti-user community [7-81, essentially being thedefacto approach, we are not aware of any previousproof of its optimality in the s ense given here.

    The full ZF V-BLAST detection algorithm can nowbe described compacily as a recursive procedure,including determination of the optimal ordering, asfollows:

    to which subset of kf rows that Wk, Should be

    initiulizution:i t 1 (9a)GI = H ' (9b)k l = a r g m i n \ \ ( G l ) , ) ) 2 ( 9 c )

    J

    297

  • 8/13/2019 [6] v-BLAST - An Architecture for Realizing Very High Data Rates Over the Rich-scattering Wireless Channel

    4/6

    where G i ) ; is the j-th row of Gi. hus, (9c,i)determine the elements of So f t , he optimal ordering;(9d-f) compute respectively the ZF-nulling vector, thedecision statistic, and the estimated component of a(9g) performs cancellation of the detected componentfrom the received vector, and (9h) computes the newpseudoinverse for the next iteration. No te that thisnew pseud oinve rse is based on a deflated version ofH, in which columns k l 2 , . . k i have beenzeroed. This is because these columns correspond tocomponents of a which have already been estimatedand cancelled, and thus the system becomes equivalentto a deflated version of Fig. 1 in which transmittersk l 2 , . . i have been rem oved, or equivalently, asystem in which ak = . = a k = 0.

    4. LABORATORY RESULTSA laboratory prototype of a V-BLAST system hasbeen constructed for the purpose of dem onstrating thefeasibility of the BLAST approach. The prototypeoperates at a carrier frequency of 1.9 GHz, and asymbol rate of 24.3 ksymboldsec, in a bandwidth of30 kHz. The receiver proces sing is similar to thatshown in (9).The system was operated and characterized in theactual laboratory/office environment, not a test range,with transmitter and receiver separations up to about12 meters. This environment is relatively benign inthat the delay spread is negligible, the fading rates arelow, and there is significant near-field scattering fromnearby equipment and office furniture. Nevertheless, itis a representative indoor lab/office situation, and noattempt was made to tune the system to theenvironment, or to modify the environment in anyway.The antenna arrays consisted of h/2 wire dipolesmounted in various arrangements. For the resultsshown below, the receive dipoles we re mounted on thesurface of a metallic hem isphere approximately 20 cmin diameter, and the transmit dipoles were mounted ona flat metal sheet, in a roughly rectangular array withabout h/2 inter-element spacing. In general, thesystem performance was found to be nearlyindependent of small details of the array geometry.Fig. 2 shows results obtained with the prototypesystem, using M = 8 transmitters and N = 12receivers. In this experiment, the transmit and receivearrays were each placed at a single representativeDosition within the environment, and the performancecharacterized. Th e horizontal axis is spatiallyaveraged received SNR, i.e.,is the the ratio of received signal power (from all Mtransmitters) to noise power at the i-th receiver. Thevertical axis is the block error rate, wh ere a block isdefined as a single transmission burst. In this case, theburst length L is 100 symbol durations, 20 of which

    l NN = l SNR , where SNR

    are used for training. In this experiment, each of theeight substreams utilized uncoded 16LQAM, i.e. 4bits/symboYtransmitter, so that the payload block sizeis 8 x 4 ~ 8 0 2560 bits. The raw spectral efficiencyof this configuration is thus(8xmtrs )x (4b/ sym/xmtr )x( 24.3 ksym/ s)

    3 0 k HzE , == 25.9 bps/Hz

    and the payload efficiency is 80% of the above, or20.7 bps/Hz, corresponding to a payload data rate of621 kbps in 30 kHz bandwidth.

    LTYm

    1oo

    lo-

    1 0 - 2

    I I 1 I I II I 1 1 I I I I I10 3do 4 6 8

    SNR idEl)Figure 2: Single-position performanceThe upper curve in Fig. 2 shows performanceobtained when conventional nulling is used. The lowercurve shows performance using nulling andoptimally-ordered cancellation. Th e average differenceis about 4 dB, which corresponds to a raw spectralefficiency differential (for this configuration) ofaround 10bps/Hz.

    BLER and BER a t 24 d B SNR M . pwrtioni o - . * .

    1 0 - 7 1 I I I I 1 I I0 1 2 3 4 5 6 7 8 9 1 ,Posit ion Number

    Figure 3: Multiple-position performanceFigure 3 shows performa nce results obtained usingthe same BLAST system configuration ( M = 8,N = 12, 16-QAM ) when the rece ive array was leftfixed and the transmit array was located at differentpositions throughout the environment. In each case,

    298

  • 8/13/2019 [6] v-BLAST - An Architecture for Realizing Very High Data Rates Over the Rich-scattering Wireless Channel

    5/6

    the transmit power was adjusted so that the averagereceived SNR w as 24k0.5 dB. Nulling with optimizedcancellation was used.It can be seen that operation at this spectralefficiency is reasonably robust with respect to antennaposition. In all positions, the system had at least 2orders of mag nitude margin relative to BEIR.For a completely uncoded system, these are entirelyreasonable error rates, and application of ordinaryerror correcting codes would significantly reduce this.At 34 dB SNR, spectral efficiencies as high as 40bps/Hz have been demonstrated at similar error rates,though with less robust performance.We believe these spectral efficiencies to heunprecedented for the wireless channel. It isworthwhile to point out that spectral efficiencies ofthese magnitudes are essentially impossible to obtainusing traditional approaches in which a singletransmitter is used, simply because the requiredconstellatioii loadings would be immen se. Forexample, to obtain the equivalent of the 32 bits pervector symbol in the experiments above, but using asingle transmitter, would require a constellation wilh232 or more than a billion (lo9)points, which seemswell outside of the realm of practicality, regardless ofSNR.SUMMARY AND CONCLUSIONSW e have described V -BLAST, a wireless architecturecapable of realizing extraordinary spectral efficienciesover the rich-scattering wireless channel. The generalBLAST approach and the V-BLAST detection schem ewere motivated and described in detail, and aninteresting new optimality result regardingcancellation-based detection (which may have widerapplicability to multi-user detection as well) wasreported. Early results with our V-BLA ST realtimelaboratory prototype have proven the feasibility of theconcept, and we have demonstrated spectralefficiencies of 20 - 40 bps/Hz under real-world indoorconditions, exceeding any results that w e are aw are ofusing traditional modulation techniques. Althoughthese results were obtained in a relatively benignenvironment, we are nevertheless stronglyencouraged, and believe that the BLAST approachmay eventually lead to significaIltly im proved spectralefficiencies in wireless systems.APPENDIX: PROOF OF THE OPTIMALITY OFORDERING IMPOSED BY EQ. (9)Definitions and notation:For a given detection ordering

    = { S I , I , * . , S M define the constraint set o fSi o be the set {S i l , . . . , S M ) , or the nullset if = M . The constraint set i s just thosecancelled.ciomponents of a whidi have nut yet heell detected and

    Let e a detection ordering. Then define ps, to bethe post-detection S I at the i-th stage of thedetection process when using this ordering, i.e. ps, isthe post-detection SNF: when detecting as accordingto (9e).Let L = { L l , L 2 , * . . , L M ] be the locally-optimum ordering obtained using (9).

    The following trivial lemmas are used in whatfollows and are stated here without proof:

    Lemma I : Let A and B be two distinct orderings. Ifconsist of identical elements (regardless of theirorder), then P A , = p E .Ak = Bk, and the Constraint Sets O f Ak and Bk

    Lemma 2: Let A and B be two distinct orderings. IfA , = Bk , and the constraint set of A t is a subset ofthe constraint set of 13k , then p A 2 pB,

    Proof:Let Q = { Q , , Q 2 , . . . Q M ) be an arbitraryordering distinct from 1; Let d be the index of the first

    (leftmost) element for ,which L and Qdiffer. Let I bethe index for which Q , = Ld . (Note that r > d , sinceL and Q have common elements up to index d 1.)By Lemma 1,

    P L , = P a 1 I i l d - 1 . (A. l )Now define Q to be a perturbation of Qobtained bymoving Q , from index r to index d , and squeezingthe rest of Qso that the elements of Q are

    Q Q I , Q 7 . * , Q d - i 7 Q T Q d , . t Q M 1where i t is understood that the sequence above Qd isactually missing the repositioned Q, eleme nt. Notethat Q matches L in the first d positions, whereas Qmatches L only in the first 1 positions.Now consider the post-detection SNRs that wouldresult from using Q inwead of Q;

    By Lemma 1, P Q = PQ,,, PQ, = PQ,,, . . .ped_, pa,,-,, since these elements have the sameconstraint sets.By either Lemma 1 or Lemma 2, ped , Q,,,,,p e d , 5 pe',+, . . . pe, 5 pet,, since these elementseither have the same constraint sets, or the constraintset of the Q elements are subsets of the constraint setsof the corresponding Q dem ent s.Finally, pa, I pQj,, since pp, = pL 6 and pL, is,by virtue of the local maximization procedure (9), atleast as large as any other choice in that position.Thus,

    m i n p Q I ml n p Q, l (A.2)I I

    An oovious inuuctive argument allows that bysuccessive similar perturbations, Q can be transformed

    299

  • 8/13/2019 [6] v-BLAST - An Architecture for Realizing Very High Data Rates Over the Rich-scattering Wireless Channel

    6/6