MIMO OFDM Transceiver for a Many-Core Computing · PDF fileMany-Core Computing Fabric –...
Transcript of MIMO OFDM Transceiver for a Many-Core Computing · PDF fileMany-Core Computing Fabric –...
MIMO OFDM Transceiver for a Many-Core Computing Fabric
–A Nucleus based Implementation
Institute for Communication Technologies and Embedded Systems
T. Kempf, D. Günther, A. Ishaque, G. Ascheid
ISS (Chair of Integrated Signal Processing Systems)
Outline
Introduction
Nucleus Methodology
MIMO OFDM Transceiver Implementation
Application Analysis - Nuclei Identification Efficient Nuclei Implementations on HW Platform (Flavor) Algorithmic Performance Evaluation Application-to-Architecture Mapping
Summary & Outlook
2
FlexibleSDR
Software Defined Radio Vision
e.g UMTS
Future SDR Mobile Phone
Sou
rce:
Infin
eon
Tec
hnol
ogie
s
Free Area:Cost Savings
ornew
Functionality
Today‘s Mobile Phone
e.g.GSM
Sou
rce:
Infin
eon
Tec
hnol
ogie
s
FlexibleSDR
Software Defined Radio Vision
e.g Bluetooth
The three key properties: Portability
Software is portable onto different platformsStandard.exe → Device_1, ..., Device_n
Interoperability Different devices configured for the same standard interoperate
Standard_1/Device_1 ↔ Standard_1/Device_2
Future SDR Mobile Phone
Sou
rce:
Infin
eon
Tec
hnol
ogie
s
Free Area:Cost Savings
ornew
Functionality
Today‘s Mobile Phone
e.g.GSM
Sou
rce:
Infin
eon
Tec
hnol
ogie
s
Standard_1/Device_1 ↔ Standard_1/Device_2
Loadability Platform is capable of running different standards
Device ← Standard_1.exe, ..., Standard_n.exe
But we must not forget: Efficiency
Power consumption of flexible SDR must be close to power consumption of dedicated device (battery driven!)
FlexibleSDR
Software Defined Radios Vision
e.g Bluetooth
GSM.exeUMTS.exe LTE.exe
On-the-fly Configuration
The three key properties: Portability
Software is portable onto different platformsStandard.exe → Device_1, ..., Device_n
Interoperability Different devices configured for the same standard interoperate
Standard_1/Device_1 Standard_1/Device_2
Contradicting Requirements !
Flexibility (programmability) vs.
Future SDR Mobile Phone
Sou
rce:
Infin
eon
Tec
hnol
ogie
s
Free Area:Cost Savings
ornew
Functionality
Today‘s Mobile Phone
e.g.GSM
Sou
rce:
Infin
eon
Tec
hnol
ogie
s
Standard_1/Device_1 Standard_1/Device_2
Loadability Platform is capable of running different standards
Device ← Standard_1.exe, ..., Standard_n.exe
But we must not forget: Efficiency
Power consumption of flexible SDR must be close to power consumption of dedicated device (battery driven!)
Flexibility (programmability) vs.Energy Efficiency
Outline
Introduction
Nucleus Methodology
MIMO OFDM Transceiver Implementation
Application Analysis - Nuclei Identification Efficient Nuclei Implementations on HW Platform (Flavor) Algorithmic Performance Evaluation Application-to-Architecture Mapping
Summary & Outlook
6
Nucleus Methodology
Nuclei N 1 N 2 N 7 NNN 5
Transceiver Description
Transceiver DescriptionN 1
N 7 N 5N 2Non N Tasks
NucleusLibrary
Nucleus
7
PE 2(rASIP)
PE 3(DSP)
MEM
Comm. Arch.
PE 5(FPGA)
PE1(ASIP)
PE 4(GPP)
HWPlatform
Nucleus• Critical, demanding, algorithmic kernel • Kernel is common among different waveforms• Not waveform nor hardware specific
Nuclei N 1 N 2 N 7 NNN 5
Transceiver Description
Transceiver DescriptionN 1
N 7 N 5N 2Non N Tasks
NucleusLibrary
Nucleus Methodology
8
NI
PE 2(rASIP)
PE 3(DSP)
MEM
Comm. Arch.
PE 5(FPGA)
PE1(ASIP)
PE 4(GPP)
HWPlatform
PEs PE 2 PE 3 PE 4 PE 5PE 1
NI
Flavor
Nuclei N 1 N 2 N 7 NNN 5
Transceiver Description
Transceiver DescriptionN 1
N 7 N 5N 2Non N Tasks
Mapping&
EvaluationCompile
NucleusLibrary
Nucleus Methodology
9
PE 2(rASIP)
PE 3(DSP)
MEM
Comm. Arch.
PE 5(FPGA)
PE1(ASIP)
PE 4(GPP)
HWPlatform
PEs
BoardSupportPackage
PE 2PE 1 PE 3 PE 4 PE 5
NINI
NI
NI NNIFlavors
NINI
Outline
Introduction
Nucleus Methodology
MIMO OFDM Transceiver Implementation
Application Analysis - Nuclei Identification Efficient Nuclei Implementations on HW Platform (Flavor) Algorithmic Performance Evaluation Application-to-Architecture Mapping
Summary & Outlook
10
Nuclei Identification: Transceiver Structure
Outer Modem Channel (De-)coding (De-)Interleaving
IEEE 802.11n
11
(De-)Interleaving Inner Modem (RX)
RX OFDM Processing Channel Estimation Spatial Equalizing: Mitigate channel impact on payload Soft Demapping: Calculate soft bits (LLRs)
BPSK, 4QAM, 16QAM
OFDM Slot
Nuclei Identification: Kernel Identification
12
Analyze different algorithmic choices within RX bloc ks Identify computational kernels
Recurring tasks Operate on data with certain alignment
Build application as composition of kernels
Nuclei Identification: Kernel Identification (Example)
LMMSE MIMO Equalizer with QRD Basic transmission equation
Linear MMSE equalization
Regularized QRD
RQ
QI
HH a
=
= σ
ˆ
nHxy +=
( ) HH HIHHGyGx ˆˆˆ,ˆ12 −
+==s
n
Eσ
13
Rewrite G using Qa and Qb
RQI
Hb
=
=s
n
E
σ
G = E s
σ nQ b Q a
H
Computational Kernels Regularized QR decomposition Matrix-matrix multiplication Matrix-vector multiplication
Nuclei Identification: Kernel Overview
14
Application variants consist of a few kernels only!
Outline
Introduction
Nucleus Methodology
MIMO OFDM Transceiver Implementation
Application Analysis - Nuclei Identification Efficient Nuclei Implementations on HW Platform (Flavor) Algorithmic Performance Evaluation Application-to-Architecture Mapping
Summary & Outlook
15
Application Implementation: P2012 Platform (ST Microelectronics)
SoC platform with maximum of 32 clusters One cluster provides
Max. 16 RISC cores (STxP70) @ 600MHz VECx vector extension (SIMD)
128 bit vector registers 8x16 bit or 4x32 bit operations
Hardware synchronizer for inter-core signaling Interface for hardware accelerators (ASICs)
16
Application Implementation: Kernel Overview
For 2x2 and 4x4 MIMO use case Cycles for execution on
single STxP70 processor core including VECX unit
Corresponding time for 600MHz clock frequency
17
In the range of … Competing solutions IEEE 802.11n real time
(4µs per OFDM slot)
Outline
Introduction
Nucleus Methodology
MIMO OFDM Transceiver Implementation
Application Analysis - Nuclei Identification Efficient Nuclei Implementations on HW Platform (Flavor) Algorithmic Performance Evaluation Application-to-Architecture Mapping
Summary & Outlook
18
Algorithm Performance Evaluation: Investigated Algorithmic Choices
Wide variety of algorithms is implemented Channel Estimation, Spatial Equalizer, Channel Coding
19
Channel Estimation, Spatial Equalizer, Channel Coding Determine superior choice by error correction perfo rmance Channel simulation
Fading: i.i.d. Rayleigh Fading Power delay profile: Exponential 20dB drop along 150ns Noise: AWGN 4x4 MIMO system
Algorithm Performance Evaluation: ZF vs. MMSE MIMO Equalization
MMSE equalizerBetter performance at
little computational cost
4x4 MIMO, r=1/2, g1=(133)8, g2=(171)8, nconv=6144 bit, nldpc = 1944 bit)
= H
(C)–
H(C
|Ω)
20
4QAM region 16QAM region
I(C
,Ω)
= H
(C)
Frame Error Rate of 4x4 MIMO System (Short Frames)
21
Fix-point issues at low FERs when using MMSE-QRD, SIC-MMSE
Frame Error Rate of 4x4 MIMO System (Short Frames)
For the investigated algorithms MMSE-DS-QRD is a viable trade -off between
22
is a viable trade -off between algorithmic performance and implementation complexi ty
Frame Error Rate of 4x4 MIMO System for different Frame Sizes
Algorithmic performance comparable to results found in literature
23
to results found in literature
Outline
Introduction
Nucleus Methodology
MIMO OFDM Transceiver Implementation
Application Analysis - Nuclei Identification Efficient Nuclei Implementations on HW Platform (Flavor) Algorithmic Performance Evaluation Application-to-Architecture Mapping
Summary & Outlook
24
Application-to-Platform Mapping: Identify Parallelism
Parallelizable dimensions of OFDM receiver applicat ion Space (RX antennas) Frequency (subcarriers) Time (OFDM slots)
P = PreambleD= Data payload
25
D= Data payload
Application-to-Platform Mapping: Assign Cores to PGs
Given :Single core timing requirements Goal : Assign cores to match real time constraints (4µs per slot)
Task time (us) #cores
Preprocessing (per OFDM frame)
LS Channel Estimation 17.47
Equalizer Preprocessing 215.31
Actual Processing (per OFDM slot)
44
26
Actual Processing (per OFDM slot)
OFDM Demodulation (mem. realign) 6.83
Equalizer (Actual Detection) 6.08
Soft Demapping (16 QAM) 2.84
242
Application-to-Platform Mapping: Assign Cores
Final mapping Partitioning of components into processing groups Number of cores per group
8 cores enable real time
PG 2PP&EQ4 PEs
PG 3
27
PG 1Modulation
2 PEs
PG 3Demapping
2 PEs
2PARMA: Occupation Graph
Implementation on P2012 platform using 8 cores Minimum latency for 27 or more OFDM slots of data payload Latency = 8.2µs IEEE 802.11n allows 16µs (including MAC layer)
28