
Deep Learning for Wireless Physical Layer: Opportunities and Challenges

arXiv:1710.05312v2 [cs.IT] 27 Oct 2017

Tianqi Wang*, Chao-Kai Wen, Hanqing Wang, Feifei Gao, Tao Jiang and Shi Jin

*Corresponding author, email: [email protected]

Abstract—Machine learning (ML) has been widely applied to the upper layers of wireless communication systems for various purposes, such as deployment of cognitive radio and communication networks. However, its application to the physical layer is hampered by sophisticated channel environments and the limited learning ability of conventional ML algorithms. Deep learning (DL) has recently been applied in many fields, such as computer vision and natural language processing, given its expressive capacity and convenient optimization capability. The potential application of DL to the physical layer has also been increasingly recognized because of the new features of future communications, such as complex scenarios with unknown channel models and high-speed, accurate processing requirements; these features challenge conventional communication theories. This paper presents a comprehensive overview of the emerging studies on DL-based physical layer processing, including leveraging DL to redesign a module of the conventional communication system (for modulation recognition, channel decoding, and detection) and to replace the communication system with a radically new architecture based on an autoencoder. These DL-based methods show promising performance improvements but have certain limitations, such as the lack of solid analytical tools and the use of architectures that are specifically designed for communication and implementation research, thereby motivating future research in this field.

Index Terms—Wireless communications, deep learning, physical layer.

I. INTRODUCTION

WIRELESS communication technologies have experienced extensive development to satisfy the applications and services in wireless networks. The explosion of advanced wireless applications, such as diverse intelligent terminal access, virtual reality, augmented reality, and the Internet of Things, has propelled the development of wireless communication into the fifth generation to achieve thousandfold capacity, millisecond latency, and massive connectivity, thereby making system design an extraordinarily challenging task. Several promising technologies, such as massive multi-input multi-output (MIMO), millimeter wave (mmWave), and ultra-densification networks (UDN), have been proposed to satisfy the abovementioned demands. These technologies share the same characteristic (i.e., the capability to handle large wireless data). However, extant conventional communication theories exhibit several inherent limitations in fulfilling the large-data and ultra-high-rate communication requirements in complex scenarios, listed as follows.

T. Wang, H. Wang, and S. Jin are with the National Mobile Communications Research Laboratory, Southeast University, Nanjing 210096, China (e-mail: [email protected]; [email protected]; [email protected]).

C.-K. Wen is with the Institute of Communications Engineering, National Sun Yat-sen University, Kaohsiung 80424, Taiwan (e-mail: [email protected]).

F. Gao is with the State Key Laboratory of Intelligent Technology and Systems, Tsinghua National Laboratory for Information Science and Technology, Department of Automation, Tsinghua University, Beijing 100084, China (e-mail: [email protected]).

T. Jiang is with the School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan 430074, P. R. China (e-mail: [email protected]).

1) Difficult channel modeling in complex scenarios: The design of communication systems significantly depends on practical channel conditions or is based on channel models that characterize real environments implicitly for mathematical convenience. These models struggle in complex scenarios with many imperfections and nonlinearities [1], although they may capture some features of conventional channels. For example, the increased number of antennas in massive MIMO systems has changed channel properties [2], and the corresponding channel models remain unknown. The use of out-of-band signals or sensors as sources of side information at mmWave is promising [3]. However, a method for combining out-of-band and sensor information to obtain the channel state information at mmWave remains unknown. In scenarios such as underwater or molecular communications [4], the channels cannot be characterized by rigid mathematical models. Thus, systems or algorithms that can complete communication tasks without defined channel models are essential.

2) Demand for effective and fast signal processing: The use of low-cost hardware, such as low-resolution analog-to-digital converters with low energy consumption [5], [6], introduces additional nonlinear imperfections that require highly robust receiver processing algorithms (e.g., algorithms for channel estimation and detection). However, using these algorithms may increase computational complexity. Traditional algorithms, such as those for MIMO data detection, are iterative reconstruction approaches [7] that form a computational bottleneck in real time, whereas real-time large-data processing capabilities are necessary for the advanced systems (e.g., massive MIMO, mmWave, and UDN). Therefore, the corresponding algorithms require parallel signal processing architectures [2] to achieve efficiency and accuracy.

3) Limited block-structure communication systems: Conventional communication systems, constructed in a divide-and-conquer manner, consist of a series of artificially defined signal processing blocks, such as coding, modulation, and detection; these systems solve the communication problems in imperfect channels by optimizing each block independently. Optimal performance in the entire communication task cannot be guaranteed, although researchers have attempted to optimize the algorithms of each processing
module for many years and achieved success in practice, because the fundamental problem of communication is the reliable recovery at the receiver side of a message that is sent by a transmitter and traverses a channel [8]. This process does not require a block structure. Therefore, further improvement holds promise if the suboptimization of each module is replaced by optimization of end-to-end performance.

Machine learning (ML) has recently regained attention because of the successful applications of deep learning (DL) in computer vision (CV), automatic speech recognition (ASR), and natural language processing (NLP). The book [9] covers the recent results. Researchers are actively attempting to extend these technologies to other domains, including wireless communication. Embedding ML theories in a wide range of communication systems has an extensive history and has achieved several successes, especially in the upper layers, such as in cognitive radio, resource management [10], link adaptation [11], [12], and positioning [13]. In contrast to the abovementioned straightforward applications, ML faces several challenges when applied to the physical layer.

Researchers have applied ML to the physical layer for modulation recognition [14], [15], channel modeling and identification [16], [17], encoding and decoding [18], [19], channel estimation [20], and equalization [21], [22] (see further details in [23] and [24]); however, ML has not been used commercially in the physical layer because handling physical channels is a complex process, and conventional ML algorithms have limited learning capacity. Researchers believe that further performance improvements can be achieved by introducing DL to the physical layer. Compared with conventional ML algorithms, DL possesses essential characteristics, such as deep modularization, which significantly enhances feature extraction and structural flexibility. In particular, DL-based systems can learn features from raw data automatically instead of relying on manual feature extraction, and can adjust their model structures flexibly via parameter tuning to optimize end-to-end performance. The DL-based communication system has promising applications in complex scenarios for several reasons.

First, the deep network has been proven to be a universal function approximator [25] with superior algorithmic learning ability despite complex channel conditions [1]. The "learned" algorithms in DL-based communication systems are represented by learned weights in DL models that optimize end-to-end performance through convenient training methods instead of requiring well-defined mathematical models or expert algorithms that are solidly based on information theories.

Second, handling large data is an essential feature of DL because of its intrinsically distributed and parallel computing architectures, which ensure computation speed and processing capacity. The DL system demonstrates a remarkable potential for impressive computational throughput through fast-developing parallelized processing architectures such as graphical processing units.

Third, DL-based communication systems can break the artificial block structure to achieve global performance improvement because they are trained to optimize end-to-end performance without implicitly requiring a block structure. Besides, various libraries and frameworks, such as TensorFlow, Theano, Caffe, and MXNet, have been established to accelerate experiments and deploy DL architectures, given the wide application of DL technologies.

Recent studies on DL for wireless communication systems have proposed alternative approaches to enhance certain parts of the conventional communication system (e.g., modulation recognition [1], channel encoding and decoding [26], [27], [28], [29], [30], [31], and channel estimation and detection [7], [4], [32], [33]) and to replace the total system with a novel architecture based on an autoencoder [1], [34]. This paper aims to provide an overview of these recent studies that focus on the physical layer. We also aim to highlight the potentials and challenges of DL-based communication systems and offer a guideline for future investigations by describing the motivation, proposed methods, performance, and limitations of these studies. Table I lists the abbreviations that appear in this paper.

TABLE I
LIST OF ABBREVIATIONS

Abbreviation  Stands for
ML            machine learning
DL            deep learning
MIMO          multi-input multi-output
mmWave        millimeter wave
UDN           ultra-densification network
CV            computer vision
ASR           automatic speech recognition
NLP           natural language processing
SVM           support vector machine
NN            neural network
GD            gradient descent
DNN           deep neural network
ReLU          rectified linear units
SGD           stochastic gradient descent
CNN           convolutional neural network
RNN           recurrent neural network
LSTM          long short-term memory
SNR           signal-to-noise ratio
LLR           log-likelihood ratio
BP            belief propagation
HDPC          high-density parity check
BCH           Bose-Chaudhuri-Hocquenghem
AWGN          additive white Gaussian noise
BER           bit-error rate
mRRD          modified random redundant iterative decoding
NND           neural network decoder
MAP           maximum a posteriori
PNN           partitioned neural network
FC            fixed channel
VC            varying channel
AMP           approximate message passing
SDR           semidefinite relaxation
ISI           inter-symbol interference
CP            cyclic prefix
MMSE          minimum mean square error
RTN           radio transformer network
BLER          block error rate
CSI           channel state information

The rest of this paper is organized as follows. Section II provides a brief overview of the basic concepts of DL. Section III presents several application examples of using DL as an alternative in communication systems, such as modulation recognition, channel decoding, and detection. Section IV introduces a novel communication architecture based on an autoencoder. Section V discusses areas for future research. Section VI concludes the paper.

Fig. 1. A mathematical model of a neuron, where r features are connected by weighted edges and fed into a sigmoid activation function to generate an output y.

Fig. 2. A fully connected feedforward NN architecture where all neurons between adjacent layers are fully connected. The DNN uses multiple hidden layers between the input and output layers to extract meaningful features.

II. BASIC IDEA OF DL

Langley (1996) defined ML as a branch of artificial intelligence that aims to improve performance through experience. After long-standing research since the 20th century, various algorithms have been proposed, such as logistic regression, decision trees, support vector machines (SVM), and neural networks (NN). DL, an emerging and outstanding class of NN algorithms, is originally derived from the neuron model that simulates biological neural systems, as shown in Fig. 1. The weighted sum of several inputs plus a bias is fed into an activation function σ(·), usually a sigmoid function, to obtain an output y. An NN is then established by connecting several neuron elements into a layered architecture.
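The neuron model of Fig. 1 can be sketched in a few lines; the NumPy implementation and the chosen input size are illustrative and not part of the original description:

```python
import numpy as np

def neuron(x, w, b):
    """One neuron (Fig. 1): weighted sum of r inputs plus a bias,
    passed through a sigmoid activation to produce the output y."""
    z = np.dot(w, x) + b              # weighted sum with bias
    return 1.0 / (1.0 + np.exp(-z))   # sigmoid activation

# With zero weights and zero bias, the sigmoid returns its midpoint, 0.5.
```

Connecting many such neurons layer by layer, with the outputs of one layer serving as the inputs of the next, yields the layered NN architecture described above.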

The simplest NN is called the perceptron, which comprises one input layer and one output layer. A loss function, such as the square error or cross entropy, must be established so that the perceptron produces a value as close to the expected one as possible. Gradient descent (GD) is commonly used to train the best parameters (i.e., weights and biases) that minimize the loss function. A single perceptron can solve only linearly separable problems; however, adding hidden layers and neurons between the input and output layers introduces nonlinear properties and makes the network a universal function approximator. The evolved architecture is called the multi-layer perceptron, which does not significantly differ from current deep NNs (DNNs), that is, NNs with multiple hidden layers, which have achieved success in CV, ASR, and NLP.

TABLE II
ACTIVATION FUNCTIONS

Name     Activation function σ(x)
sigmoid  1 / (1 + e^(-x))
tanh     tanh(x)
softmax  e^(x_i) / Σ_j e^(x_j)
ReLU     max(0, x)
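The activation functions of Table II translate directly into code; the max-shift inside the softmax is a standard numerical-stability detail rather than part of the table:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))      # 1 / (1 + e^(-x))

def relu(x):
    return np.maximum(0.0, x)            # max(0, x)

def softmax(x):
    e = np.exp(x - np.max(x))            # shift by max(x) for numerical stability
    return e / e.sum()                   # e^(x_i) / sum_j e^(x_j)

# tanh is available directly as np.tanh.
```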

A basic DL model is the fully connected feedforward NN (Fig. 2), where each neuron is connected to the adjacent layers and no connections exist within the same layer. The back-propagation algorithm has been proposed as an efficient method for training such networks, with GD used for optimization. However, increasing the number of hidden layers and neurons implies many more parameters to be determined, thereby making the network difficult to implement. Numerous problems may be encountered during training, such as vanishing gradients, slow convergence, and falling into local minima.

To solve the vanishing gradient problem, new activation functions (e.g., rectified linear units (ReLU), a special case of Maxout) have been introduced to replace the classic sigmoid function. Table II lists activation functions that can adapt to different situations. To achieve faster convergence and decrease computational complexity, classic GD is adjusted into stochastic GD (SGD), which randomly selects one sample at a time to compute the loss and gradient. Stochasticity causes severe fluctuations in the training process; thus, mini-batch SGD (a batch of samples computed simultaneously) is adopted as a tradeoff between classic GD and SGD. However, such algorithms still converge to locally optimal solutions. To solve this problem and further increase the training speed, several adaptive learning rate algorithms (e.g., Adagrad, RMSProp, Momentum, and Adam) have been proposed. Although the trained network performs well on the training data, it may perform poorly during testing because of overfitting. In this case, early stopping, regularization, and dropout schemes have been proposed to achieve favorable results on both training and testing data.
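The tradeoff between classic GD and SGD can be illustrated with a mini-batch SGD loop on a toy least-squares problem; the task, learning rate, and batch size below are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression task: recover w_true from noisy linear observations.
w_true = np.array([2.0, -3.0])
X = rng.normal(size=(256, 2))
y = X @ w_true + 0.01 * rng.normal(size=256)

w = np.zeros(2)                 # parameters to be trained
lr, batch = 0.1, 16             # learning rate and mini-batch size
for epoch in range(50):
    perm = rng.permutation(len(X))           # reshuffle samples each epoch
    for k in range(0, len(X), batch):
        idx = perm[k:k + batch]              # one mini-batch
        err = X[idx] @ w - y[idx]            # residual on the batch
        w -= lr * X[idx].T @ err / len(idx)  # gradient of the mean squared error
```

Averaging the gradient over a batch of 16 samples smooths the severe fluctuation of single-sample SGD while remaining far cheaper per step than full-batch GD.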

Convolutional NN (CNN) is another emerging DNN architecture that is developed from the fully connected feedforward network to prevent a rapid growth in the number of parameters when applying the latter to image recognition. CNN introduces the idea of designing particular DNN architectures according to the requirements of specific scenarios. The basic concept of CNN is to add convolutional and pooling layers before feeding into a fully connected network (Fig. 3). In the convolutional layer, each neuron connects only to part of the neurons in the former adjacent layer. These neurons, organized in matrix form, comprise several feature maps, and the neurons share the same weights within each map. In the pooling layer, the neurons in the feature maps are grouped to compute the mean value (average pooling) or maximum value (max pooling). Thus, the number of parameters is substantially decreased before using the fully connected network.

Fig. 3. A CNN architecture that adds convolution layers and pooling layers before dense layers. Each output in the convolution layers is obtained by the dot product between a certain filter matrix and an input matrix comprising several neurons in the upper layer. Each output in the pooling layers is obtained by averaging or taking the maximum over a group of neurons in the convolution layer.

Fig. 4. An RNN architecture that feeds the features extracted in the former state back as part of the current input. The current outputs depend on the current and former inputs so that the network achieves memory.
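The convolution and pooling operations described above reduce to a few lines; this single-channel NumPy sketch ('valid' convolution, non-overlapping max pooling) is illustrative rather than an optimized implementation:

```python
import numpy as np

def conv2d_valid(img, kernel):
    """'Valid' 2-D convolution: each output neuron is the dot product of the
    shared kernel (filter matrix) with one patch of the input."""
    kh, kw = kernel.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel)
    return out

def max_pool(fm, size=2):
    """Non-overlapping max pooling over size x size groups of neurons."""
    h, w = fm.shape[0] // size, fm.shape[1] // size
    return fm[:h*size, :w*size].reshape(h, size, w, size).max(axis=(1, 3))
```

Weight sharing appears here as the single kernel reused at every position, which is what keeps the parameter count from growing with the input size.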

Recurrent NN (RNN) aims to provide NNs with memory because, in cases such as NLP, the outputs depend not only on the current inputs but also on formerly available information. In contrast to the aforementioned memoryless NNs without connections within the same hidden layer, the neurons are connected such that the hidden layers consider their former outputs as current inputs to acquire memory (Fig. 4). Commonly used RNNs include the Elman network, the Jordan network, the bidirectional RNN, and long short-term memory (LSTM).
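A minimal Elman-style recurrence makes the memory mechanism explicit; the tanh activation and matrix shapes are conventional choices, not specific to any cited work:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_x, W_h, b):
    """One Elman-style update: the previous hidden state is fed back
    alongside the current input, giving the network memory (Fig. 4)."""
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

def rnn_forward(xs, W_x, W_h, b):
    """Run the recurrence over a sequence of inputs."""
    h = np.zeros(W_h.shape[0])
    for x_t in xs:          # outputs depend on current AND past inputs
        h = rnn_step(x_t, h, W_x, W_h, b)
    return h
```

Each output depends on the entire input history through h, which is precisely the memory property discussed above.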

III. DL AS AN ALTERNATIVE

The general classic communication system is constructed as a block structure, as shown in Fig. 5, and multiple algorithms solidly founded on expert knowledge have been developed through long-term research to optimize each processing block therein. Previous studies have attempted to leverage conventional ML approaches, such as SVM and small feedforward NNs, as alternative algorithms for individual tasks. DL architectures have recently been introduced into several processing blocks to adapt to emerging complex communication scenarios or to outperform conventional communication algorithms. This section presents some examples of DL applications covering modulation recognition, channel decoding, and detection.

A. Modulation Recognition

Modulation recognition aims to distinguish the modulation schemes of received noisy signals, which is important for facilitating communication among different communication systems or for interfering with and monitoring adversaries in military use. Studies on modulation recognition have been conducted for many years using conventional algorithms that fall into two categories, namely, decision-theoretic and pattern recognition approaches [35]. These approaches share several common procedures, such as preprocessing, feature extraction, and classification. Previous studies have been keen to leverage ML algorithms (usually SVM and NN) because of their robustness, self-adaptation, and nonlinear processing ability.

An NN architecture (Fig. 6) is proposed in [35] as a powerful modulation classifier to discriminate noise-corrupted, band-limited modulated signals among 13 types of digital modulation schemes (e.g., MPSK, MASK, and MFSK) and analog modulation schemes (e.g., AM, DSB, and FM). Similar to conventional expert feature analysis, this approach manually extracts features that characterize digital [36] or analog modulations [37] from the original signals, based on the three fundamental parameters of instantaneous amplitude, phase, and frequency. A four-layer NN is then fed with these features to discriminate the modulation schemes, except for the levels of MASK and MPSK, which are identified by another two two-layer NNs. The abovementioned problem-solving procedure, that is, manual feature extraction followed by NN classification, has been commonly applied in previous research (e.g., [14]). Its performance strongly depends on the extracted features because of the limited learning ability of conventional NNs.
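The three fundamental parameters can be computed from complex baseband samples as follows; the sample rate and the 50 Hz test tone are hypothetical values chosen purely for illustration:

```python
import numpy as np

fs = 1000.0                            # assumed sample rate (Hz)
t = np.arange(256) / fs
x = np.exp(1j * 2 * np.pi * 50.0 * t)  # complex baseband tone at 50 Hz

amp = np.abs(x)                            # instantaneous amplitude
phase = np.unwrap(np.angle(x))             # instantaneous phase (unwrapped)
freq = np.diff(phase) * fs / (2 * np.pi)   # instantaneous frequency (Hz)
```

Statistics of these three sequences (e.g., their means and standard deviations) are the kind of hand-crafted features that are fed to the NN classifiers in [35].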

DL is known for its impressive learning capacity. Its introduction highlights the possibility of replacing artificially extracted features with features automatically learned from raw data to optimize end-to-end performance. For example, a CNN-based approach proposed in [1] learns single-carrier modulation schemes from sampled raw time-series data in the radio domain. The CNN classifier is trained on 1.2M sequences of 128 complex-valued baseband IQ samples covering 10 different digital and analog modulation schemes that pass through a wireless channel with the effects of multipath fading, sample rate offset, and center frequency offset. The CNN-based modulation classifier outperforms two other approaches, namely, extreme gradient boosting with 1,000 estimators and a single scikit-learn tree, both working on extracted expert features. The performance of the classifier improves with increasing signal-to-noise ratio (SNR). However, the performance cannot be improved further at high SNR because the short-term nature of the training samples causes the CNN classifier to confuse AM with FM when the underlying signal carries limited information, and QAM16 with QAM64, which share constellation points. More samples must be used to eliminate such confusion and realize the potential of the CNN architecture for further improvement.

Fig. 5. A typical communication system diagram with blocks including source encoding/decoding, channel encoding/decoding, modulation/demodulation, channel estimation and detection, and RF transceiving. These signal processing blocks are optimized individually to achieve reliable communication between the source and the destination.

Fig. 6. An NN architecture for modulation recognition that consists of a four-layer NN and two two-layer NNs [35]. The former NN distinguishes most modulation schemes except for ASK and FSK. The latter NNs classify ASK2/ASK4 and FSK2/FSK4 with additional manually extracted key features.

B. Channel Decoding

ML-based decoders emerged in the 1990s [28] because of the straightforward application of NNs to channel decoding. First, channel decoding algorithms focus on bit-level processing; therefore, bits, or the log-likelihood ratios (LLRs) of codewords, are conveniently treated as the inputs and expected outputs of NNs. In previous studies, the input and output nodes mainly either directly represent the bits in codewords [38] or use a one-hot representation (i.e., each node represents one of all possible codewords) [39], such that the corresponding vector has only one element equal to 1 and all other elements equal to 0. Second, unlike the difficulty of obtaining datasets in other scenarios, man-made codewords can generate sufficient training samples and labeled outputs simultaneously. Furthermore, an NN can learn from noisy versions of codewords and avoid overfitting because the channel noise randomizes the codewords into different samples in each training epoch.
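Generating labeled decoder training data along these lines is straightforward; in this sketch, the small (4,2) linear code, the BPSK mapping, and the SNR value are hypothetical choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical (4,2) linear code used purely for illustration.
G = np.array([[1, 0, 1, 1],
              [0, 1, 0, 1]])           # generator matrix

def make_batch(batch, snr_db=2.0):
    """Labeled training pairs: channel LLRs as NN input, code bits as target.
    Fresh noise each batch randomizes the same codewords into new samples."""
    msgs = rng.integers(0, 2, size=(batch, 2))
    codes = msgs @ G % 2                     # encode messages to codewords
    tx = 1.0 - 2.0 * codes                   # BPSK: bit 0 -> +1, bit 1 -> -1
    sigma2 = 10 ** (-snr_db / 10)            # noise variance from SNR
    rx = tx + np.sqrt(sigma2) * rng.normal(size=tx.shape)  # AWGN channel
    llr = 2.0 * rx / sigma2                  # channel LLRs fed to the decoder NN
    return llr, codes
```

Because fresh noise is drawn for every batch, the same small codebook yields effectively unlimited distinct training samples, which is the anti-overfitting property noted above.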

Compared with conventional decoders, which are designed strictly based on information theory and often follow an iterative process that leads to high latency, the NN architecture does not require expert knowledge. Once a decoder is trained, the decoding process becomes simple, with low latency. Furthermore, the well-developed conventional decoding algorithms can serve as benchmarks for evaluating the performance of newly proposed DL-based methods.

Despite these advantages, the NN-based decoder is fundamentally restricted by dimensionality (i.e., the training complexity increases exponentially with the block length) [40] in fully learning and classifying codewords, thereby limiting its scalability. Fortunately, DL algorithms offer a potential solution to this problem. Aside from its inherent parallel implementation of complex computations, a multi-layer architecture with realizable training methods gives DL an excellent learning capacity. Recent studies have leveraged these advantages to address the dimensionality issue by introducing DL into well-developed iterative algorithms (i.e., unfolding the iterative structure into a layered structure) and by generalizing from limited codewords (i.e., inferring unseen codewords from seen ones).

The fully connected DNN-based decoder proposed in [26], which falls under the first category, aims to improve the performance of the belief propagation (BP) algorithm in decoding high-density parity check (HDPC) codes. The BP algorithm can approach the Shannon capacity when decoding low-density parity check codes but struggles to decode HDPC codes, such as the Bose-Chaudhuri-Hocquenghem (BCH) codes that are commonly used today. Conventional BP decoders can be constructed using a Tanner graph, in which each variable node is connected to some check nodes. In each iteration, a variable (check) node transmits a message to one of its connected check (variable) nodes based on all messages received from the other connected check (variable) nodes. Thus, the BP algorithm with L iterations can be unfolded into 2L hidden layers in a fully connected DNN architecture, where each hidden layer contains the same number of neurons, each representing an edge in the Tanner graph. These neurons output the messages transmitted on the corresponding edges. In other words, the odd (even) hidden layers output the messages transmitted from the variable (check) nodes to the check (variable) nodes associated with the neurons or edges in the Tanner graph.

The input and output are vectors of size N that represent the N-dimensional LLRs received from the channel and the N-bit decoded codewords, respectively. The equations for calculating the messages, based on expert knowledge, are then applied to the corresponding layers, and the final marginalization of the BP


Fig. 7. A fully connected DNN-based BP decoder that unfolds conventional BP algorithms with L iterations into 2L hidden layers [26].

Fig. 8. A BP-RNN decoder architecture with a variable layer and a parity layer [27]. The output of the parity layer is fed into the variable layer in the next time step, so that L time steps represent L iterations of the conventional BP algorithm.

algorithm is achieved after the last layer (i.e., after the last iteration), which is involved in the loss function. The only difference between the DNN-based BP algorithm and the original algorithm is that weights are added to these equations, or equivalently assigned to the edges in the Tanner graph. Thus, the DNN-based decoder shares the same decoding structure as the Tanner graph; however, the messages are propagated on weighted edges.

The DNN-based BP decoder proposed in [26] is shown in Fig. 7. The first two hidden layers are merged into one layer because the check nodes do not contain any information in the first iteration. LLR messages (i.e., inputs) are needed by the variable nodes to calculate the outgoing messages according to the corresponding BP formulas. These messages are represented as red arrows at the odd layers in Fig. 7. The DNN-based BP decoder preserves the properties of BP with an essentially similar structure, and its performance is independent of the transmitted codewords. Thus, the network can be trained on noisy versions of a single codeword (i.e., the zero codeword), which belongs to all linear codes.
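Generating such a zero-codeword training set is straightforward. In the sketch below, the SNR-to-noise-variance mapping for unit-energy BPSK is one common convention and an assumption here, not a detail taken from [26]:

```python
import numpy as np

rng = np.random.default_rng(0)

def zero_codeword_llrs(n, snr_db, batch):
    """LLRs of the BPSK-modulated all-zero codeword over AWGN.

    Training on noisy copies of the zero codeword suffices because the
    weighted-BP decoder's performance is codeword-independent [26].
    The SNR-to-sigma mapping below assumes unit-energy BPSK.
    """
    snr = 10.0 ** (snr_db / 10.0)
    sigma = np.sqrt(1.0 / (2.0 * snr))          # assumed noise convention
    x = np.ones((batch, n))                     # all-zero bits -> BPSK +1
    y = x + sigma * rng.normal(size=(batch, n))
    return 2.0 * y / sigma ** 2                 # AWGN channel LLRs

llrs = zero_codeword_llrs(15, snr_db=4.0, batch=8)  # BCH(15,11) block length
```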

After training on codewords that pass through an additive white Gaussian noise (AWGN) channel with SNRs ranging from 1 dB to 6 dB, a DNN-based BP decoder for BCH (15,11) with 10 hidden layers (i.e., 5 full iterations) achieves close to


Fig. 9. A plain DNN architecture for channel decoding to decode K-bit messages from N-bit noisy codewords [28].

maximum likelihood results. The performance of the DNN-based decoder degrades on larger BCH codes; nevertheless, the decoder consistently outperforms the conventional BP algorithm. One interpretation given in [26] is that properly weighting the transmitted messages compensates for the small-cycle effect in the original Tanner graph. Furthermore, the final marginalization information is obtainable after each odd hidden layer. Therefore, a modified architecture called multiloss, which adds such information to the loss function, is proposed to strengthen the gradient updates and allow the lower layers to be trained. The multiloss architecture achieves a better bit-error rate (BER) performance than the previous DNN-based BP decoder.

In [27], the aforementioned fully connected DNN-based BP decoder is transformed into an RNN architecture, named the BP-RNN decoder, by unifying the weights across iterations and feeding the outputs of the parity layers back into the inputs of the variable layers, as shown in Fig. 8. The number of time steps equals the number of iterations. This process significantly reduces the number of parameters and yields performance comparable with that of the former decoder. The multiloss concept is also adopted in this architecture. These proposed methods outperform the plain BP algorithm on both the regular and sparser Tanner graph representations of the codes. A modified random redundant iterative decoding (mRRD)-RNN decoder, which combines the BP-RNN decoder with mRRD algorithms [41] by replacing their BP blocks, is proposed and outperforms the plain mRRD decoder with only a slight increase in complexity.

A plain DNN architecture, called the NN decoder (NND), is proposed in [28] to decode codewords of length N with K information bits, as shown in Fig. 9. An encoded codeword of length N passes through a modulation layer and an AWGN channel layer to represent the communication channel effect. The LLR information of this noisy codeword is then generated as an N-dimensional input to a network with three hidden layers that is trained to output the K estimated information bits. This NND works for 16-bit-length random and structured codes (e.g., polar codes) and achieves maximum a posteriori (MAP) performance; however, the performance degrades as the number of information bits increases. Fortunately, the NND can generalize from a subset of codewords (used for training) to unseen codewords when decoding structured codes, so it is a promising way to address the dimensionality issue by learning a form of decoding algorithm.
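A forward pass of such a decoder can be sketched as below; the hidden-layer widths (128, 64, 32), the random untrained weights, and the sigmoid output are illustrative assumptions, not the exact configuration reported in [28]:

```python
import numpy as np

rng = np.random.default_rng(1)

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

N, K = 16, 8                      # codeword length and information bits
sizes = [N, 128, 64, 32, K]       # three hidden layers (widths assumed)
params = [(rng.normal(scale=0.1, size=(a, b)), np.zeros(b))
          for a, b in zip(sizes[:-1], sizes[1:])]

def nnd_forward(llr):
    """Three hidden ReLU layers; sigmoid output gives K soft bit estimates."""
    h = llr
    for W, b in params[:-1]:
        h = relu(h @ W + b)
    W, b = params[-1]
    return sigmoid(h @ W + b)

# Hard-decide the soft outputs for a batch of 4 random LLR vectors.
bits = (nnd_forward(rng.normal(size=(4, N))) > 0.5).astype(int)
```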


Fig. 10. A partitioned NN decoding architecture for polar codes, with each NND decoding a sub-codeword [29].

Other advantages, such as its parallel architecture and one-shot decoding (i.e., no iterations) with low latency, make the NND a promising alternative channel-decoding algorithm. The authors in [28] suggest the existence of an optimal SNR for training to classify the codewords over arbitrary SNRs, and argue that more training epochs can lead to better performance. Training with direct channel values or LLRs, and using mean squared error or binary cross-entropy as the loss function, has no significant effect on the final results.

To further scale DL-based decoding to large codewords, several of the former NNDs, each decoding one sub-codeword, are combined in [29]. These NNDs are first trained individually to meet the MAP performance and then combined to replace the sub-blocks of a conventional decoder for polar codes, so that a large codeword is decoded concurrently. Specifically, the encoding graph of polar codes, defined as partitionable codes in [29], can be partitioned into sub-blocks that can be decoded independently. Thus, the corresponding BP decoder is partitioned into sub-blocks that are replaced by NN architectures connected by the remaining BP decoding stages.

A partitioned neural network (PNN) architecture is proposed as shown in Fig. 10. The LLR values received from the channel are propagated step by step according to the BP update algorithm and arrive at the first NND, which decodes the first sub-codeword; this sub-codeword is then propagated through the remaining BP decoding steps as known bits. The decoding process continues until all sub-codewords are sequentially decoded. A pipelined implementation can also be applied to decode multiple codewords. The proposed PNN architecture competes well with conventional successive cancellation and BP decoding algorithms; however, its performance deteriorates as the number of partitioned sub-blocks increases, thereby limiting its application to large codes. Nevertheless, the PNN still offers a promising solution to dimensionality problems.

C. Detection

Along with the increasing deployment of advanced communication systems with promising performance, capacity, and resources (e.g., massive MIMO and mmWave), the available communication scenarios or communication channels are becoming increasingly complex, thereby increasing the


Fig. 11. A single-layer structure of the DetNet architecture [7]. The L layers of the DetNet represent L iterations of the projected gradient descent algorithm.

computational complexity of channel models and the corresponding detection algorithms. Conventional (iterative) detection algorithms form a computational bottleneck in real-time implementation. In comparison, given their expressive capacity and tractable parameter optimization, DL methods can be used for detection by unfolding specific iterative detection algorithms (similar to channel decoding) and for trading off accuracy against complexity by leveraging flexible layer structures. Data detection then becomes a simple forward pass through the network and can be performed in real time.

In [7], a DL-based detector called DetNet, which aims to reconstruct the transmitted x by treating the received y and the channel matrix H (assumed to be perfectly known) as inputs, is introduced by unfolding a projected gradient descent algorithm for maximum likelihood optimization, in which the iteration given in [7] is computed as follows:

x^(k+1) = Π( x^(k) − δ_k · ∂‖y − Hx‖²/∂x |_{x = x^(k)} )
        = Π( x^(k) − δ_k H^T y + δ_k H^T H x^(k) ),        (1)

where δ_k is the step size, x^(k) is the estimate in the kth iteration, and Π(·) is a nonlinear projection operating on a linear combination of factors formed by the data. These data are lifted into a higher dimension and operated on by the ReLU activation function (represented by ρ in (2), given by [7]) to leverage the DNN architecture. As shown in Fig. 11, each original iteration is unfolded into the following layer:

z^(k)   = ρ( W_{1k} [H^T y; x^(k); H^T H x^(k); v^(k)] + b_{1k} ),
x^(k+1) = ψ_{t_k}( W_{2k} z^(k) + b_{2k} ),
v^(k+1) = W_{3k} z^(k) + b_{3k},
x^(1)   = 0,                                               (2)


where {W_{1k}, W_{2k}, W_{3k}} and {b_{1k}, b_{2k}, b_{3k}} are the weights and biases, respectively, and ψ_t(·) is a defined piecewise linear soft sign operator. L layers (i.e., L iterations) are applied in DetNet. A loss function covering the outputs of all layers is adopted to prevent the gradients from vanishing, and the Adam optimization algorithm is used for training.
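One unfolded layer of Eq. (2) can be sketched numerically as follows; the dimensions (n = 4 transmit symbols, lifted size 32, auxiliary size 8), the soft-sign threshold, and the random untrained weights are assumptions for illustration, not values from [7]:

```python
import numpy as np

rng = np.random.default_rng(2)

def relu(z):
    return np.maximum(z, 0.0)

def soft_sign(z, t=0.5):
    """Piecewise linear soft sign psi_t: linear near 0, saturating at +/-1."""
    return -1.0 + relu(z + t) / abs(t) - relu(z - t) / abs(t)

def detnet_layer(x, v, H, y, W1, b1, W2, b2, W3, b3):
    """One unfolded projected-gradient iteration, following Eq. (2):
    lift [H^T y; x; H^T H x; v] with W1/ReLU, then produce the next
    estimate x and auxiliary state v."""
    feat = np.concatenate([H.T @ y, x, H.T @ H @ x, v])
    z = relu(W1 @ feat + b1)
    return soft_sign(W2 @ z + b2), W3 @ z + b3

n, m, zdim, vdim = 4, 6, 32, 8          # assumed illustrative sizes
H = rng.normal(size=(m, n))
y = rng.normal(size=m)
W1, b1 = rng.normal(scale=0.1, size=(zdim, 3 * n + vdim)), np.zeros(zdim)
W2, b2 = rng.normal(scale=0.1, size=(n, zdim)), np.zeros(n)
W3, b3 = rng.normal(scale=0.1, size=(vdim, zdim)), np.zeros(vdim)

x, v = np.zeros(n), np.zeros(vdim)      # x^(1) = 0, as in Eq. (2)
for _ in range(3):                      # three unfolded layers (L = 3)
    x, v = detnet_layer(x, v, H, y, W1, b1, W2, b2, W3, b3)
```

The soft-sign output keeps every symbol estimate in [-1, 1], matching a BPSK-like constellation.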

To test the robustness of DetNet in complex channels and its generalization, two channel scenarios are considered, namely, the fixed channel (FC) model, which is deterministic yet ill-conditioned, and the varying channel (VC) model, in which H is randomly generated from a known distribution. Compared with the conventional approximate message passing (AMP) and semidefinite relaxation (SDR) algorithms, which provide near-optimal detection accuracy, the simulation results in the FC scenario indicate that DetNet outperforms AMP and achieves accuracy similar to that of SDR while running 30 times faster, whether trained on the FC or VC channel. Therefore, DetNet remains robust in ill-conditioned channels and generalizes during training to detect over arbitrary channels. Similarly, DetNet runs 30 times faster than SDR in the VC scenario with comparable performance. DetNet runs even faster with shallower layers but with less accuracy, illustrating a tradeoff between accuracy and complexity. Accordingly, DetNet presents a promising solution to computationally challenging detection tasks in advanced complex scenarios.

Beyond complex channel models, no mathematically tractable channel models are available to accurately characterize the physical propagation process in highly complex cases, such as molecular and underwater communications. Therefore, new detection approaches that do not require channel information must be devised for these novel systems. DL offers a promising solution to this problem because of its expressive capacity, its data-driven characteristic, its lack of a requirement for a defined channel model, and its ability to optimize end-to-end performance. In [4], a fully connected DNN, a CNN, and an RNN are applied for detection in a molecular communication system. A molecular communication experimental platform is established to generate an adequate dataset by repeatedly transmitting a consecutive sequence of N symbols from M possible types. Chemical signals, acids (representing bit-0) and bases (representing bit-1), are used to encode pH level information. After transmission in water, the pH values in each symbol interval are treated as the received signals. In a simple baseline detection, bit-0 or bit-1 is detected in each symbol interval according to a decrease or increase in pH values, respectively, which is decided by the difference between the subintervals into which the original symbol interval is divided (i.e., positive for bit-1 and negative for bit-0).

When designing the DL-based detector, a simple memoryless system is first considered, in which the nth received signal is determined by the nth transmitted signal x_n. Symbol-to-symbol detection can be implemented using Dense-Net, a basic fully connected DNN architecture (Fig. 12) that takes the former pH difference values and some absolute values, represented as the received feature vector y_n, as input, and outputs an M-dimensional probability vector. A CNN-based detector is also introduced to adapt to the effect of random shifts. A system with inter-symbol interference (ISI) and memory


Fig. 12. A Dense-Net for symbol-to-symbol detection that detects an estimated x_n in one-hot representation [4].


Fig. 13. An LSTM-based detector for sequence detection with L LSTM layers followed by a dense output layer with softmax activation function [4].

presents a highly sophisticated yet realistic scenario that requires sequence detection and can be implemented with an LSTM network, a typical RNN algorithm for sequence processing. As shown in Fig. 13, the nth estimate of x_n is detected from the previously and currently received signals. The simulation results demonstrate that all these DL-based detectors outperform the baseline, and the LSTM-based detector shows outstanding performance in the molecular communication system with ISI. This result validates the potential application of DL in future novel systems and highlights the importance of selecting suitable DL architectures that adequately reflect the characteristics of the physical channel.

Aside from complex scenarios, DL can be applied to well-researched channel conditions to further enhance performance. For example, in [32], a five-layer fully connected DNN is embedded into an OFDM system for channel estimation and detection by treating the channel as a black box. During offline training, the original data and pilots are formed into frames, which pass through a statistical channel model with distortion after inverse discrete Fourier transform processing and the addition of a cyclic prefix (CP), so that the received time-domain signal is generated. The frequency-domain complex signal is obtained by removing the CP from the time-domain signal and performing the discrete Fourier transform. The frequency-domain signal, comprising data and pilot information, is then fed into the DNN detector to reconstruct the transmitted data in the frequency domain. In comparison with the conventional


Fig. 14. A simple form of communication system that reconstructs the transmitted message at the receiver side [8].

minimum mean square error (MMSE) method, the DNN detector achieves comparable performance in online testing but performs better when fewer pilots are used, when no CP is used, or when clipping distortion is introduced to reduce the peak-to-average power ratio. The DNN detector also shows stable performance when tested on channel models with different delays and path numbers, thereby highlighting its robustness and its ability to further improve conventional communication systems.
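The frame processing that produces the frequency-domain input of such a DNN detector can be sketched as below; the channel is taken as ideal here purely to show the IFFT/CP/FFT round trip, not the distortion model used in [32]:

```python
import numpy as np

rng = np.random.default_rng(3)

def ofdm_roundtrip(freq_symbols, cp_len=16):
    """Transmitter: IFFT then cyclic-prefix insertion; receiver: CP removal
    then FFT. The channel is ideal here, purely to show the frame processing
    whose output feeds the DNN detector."""
    time_sig = np.fft.ifft(freq_symbols)
    with_cp = np.concatenate([time_sig[-cp_len:], time_sig])  # prepend CP
    received = with_cp                                        # ideal channel
    return np.fft.fft(received[cp_len:])                      # strip CP, FFT

# A frame of 64 QPSK subcarrier symbols survives the round trip intact.
qpsk = (rng.choice([-1.0, 1.0], 64) + 1j * rng.choice([-1.0, 1.0], 64)) / np.sqrt(2)
recovered = ofdm_roundtrip(qpsk)
```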

IV. DL AS A NOVEL COMMUNICATION ARCHITECTURE

Despite their promising performance, DL-based algorithms are often proposed as alternatives for one or two processing blocks of the classic block-structured communication system (Fig. 5). However, the basic communication task is to propagate signals from one point to another through a physical communication channel, using a transmitter and receiver to manage the practical channel effects and ensure reliability; no rigid block structure is required. Thus, optimizing each block individually cannot guarantee global optimization of the communication problem, because further performance improvements can be achieved if two or more blocks are jointly optimized. O'Shea et al. recast communication as an end-to-end reconstruction optimization task and propose a novel concept based on DL theories that represents the simplified system as an autoencoder. They first introduce the autoencoder to the field of communication [1] and then propose the radio transformer network (RTN) to combine the DL architecture with the knowledge of communication experts. This autoencoder system is also extended to multi-user and MIMO scenarios in [1] and [34], respectively.

A. Autoencoder-based End-to-end System

In [1], communication is considered an end-to-end reconstruction problem in which the transmitted messages are reconstructed at the receiver side after passing over a physical channel (Fig. 14). Therefore, an autoencoder can represent the entire communication system and jointly optimize the transmitter and receiver over an AWGN channel. An original autoencoder is an unsupervised DL algorithm that learns a compressed representation of its inputs from which the inputs can be reconstructed at the output layer. In the proposed approach, the transmitter and receiver are represented as fully connected DNNs, whereas the AWGN channel between them is represented as a simple noise layer with a certain variance. Thus, the communication


Fig. 15. A simple autoencoder for an end-to-end communication system [1] that encodes s, in one-hot representation, into an N-dimensional transmitted signal x. The noisy received signal y is then decoded into an M-dimensional probability vector p, from which s is determined.

system can be regarded as a large autoencoder that learns, for each message s (one of the M possible messages to be propagated), to generate a representation of the transmitted signal x that is robust against the imperfect channel. At the receiver side, the original message can then be reconstructed as an estimate of s with a low error rate by learning from the received y. The entire autoencoder-based communication system is trained to optimize end-to-end performance, such as the BER or block error rate (BLER). However, this process may add redundancy to the representation x, which differs from the original DL autoencoder that learns a compressed representation of its inputs.

In an implementation example (Fig. 15) [1], s is represented as an M-dimensional one-hot vector; therefore, K = log2(M) bits are transmitted simultaneously. After s is fed into the DNN transmitter, with multiple dense layers followed by a normalization layer, an N-dimensional vector x with an energy constraint is generated. The communication rate of such a system is therefore R = K/N. The received N-dimensional signal y, corrupted by a channel represented as a conditional probability density function p(y|x), is subsequently learned by the DNN receiver with multiple dense layers. The last layer of the receiver is a softmax activation layer that outputs an M-dimensional probability vector p whose elements are non-negative and sum to 1. The index of the largest element, i.e., the one with the highest probability, determines which of the M possible messages is the decoded s. The autoencoder is trained by SGD at a fixed SNR with categorical cross-entropy as the loss function to optimize the BLER performance. The autoencoder-based communication system achieves comparable or better performance than conventional BPSK with a Hamming code, thereby indicating that the system has learned a joint coding and modulation scheme.
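A forward pass of the end-to-end system can be sketched with a single untrained dense layer per side; collapsing the multiple dense layers of [1] into one, and the SNR-to-noise-variance convention, are simplifying assumptions of this sketch:

```python
import numpy as np

rng = np.random.default_rng(4)

M, N = 16, 7          # M messages (K = log2 M = 4 bits) over N channel uses

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# One untrained dense layer per side (the multi-layer setup of [1] is
# collapsed here for brevity).
W_tx = rng.normal(scale=0.1, size=(M, N))
W_rx = rng.normal(scale=0.1, size=(N, M))

def autoencoder_forward(s_index, snr_db=7.0):
    s = np.eye(M)[s_index]                         # one-hot message
    x = s @ W_tx
    x = np.sqrt(N) * x / np.linalg.norm(x)         # energy normalization layer
    sigma = np.sqrt(1.0 / (2.0 * 10 ** (snr_db / 10.0)))  # assumed convention
    y = x + sigma * rng.normal(size=N)             # fixed-SNR noise layer
    return softmax(y @ W_rx)                       # M-dim probability vector

p = autoencoder_forward(3)
```

Training would adjust W_tx and W_rx end-to-end by SGD on the categorical cross-entropy between p and the one-hot s.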

Such an autoencoder architecture can solve other physical-layer communication problems, such as pulse shaping and offset compensation, by operating on IQ samples, and it can be applied to complex scenarios where the communication channels are unknown.

B. Extended Architecture with Expert Knowledge

Aside from interpreting the communication system as a plain DL model, it is reasonable to introduce communication expert knowledge or to adjust the DL architecture to accommodate certain communication scenarios


Fig. 16. A receiver implemented as an RTN [1]. In the receiver, a new block, consisting of dense layers with a linear activation function and a deterministic transformation layer, is added to the original autoencoder-based communication system.


Fig. 17. Two-user scenario with an interfering channel [1]. Each receiver has to detect its own messages based on the received signals from both transmitters.

or accelerate the training phase. As shown in [1], certain parametric transformations correspond to the channel effect. The inverse forms of these transformations can compensate for the channel distortion, and the estimation of their parameters is highly related to the received signals; this enables the integration of communication knowledge into a DL system by generating the parameters of such deterministic transformations from the received signals.

An RTN that extends the former DNN receiver by adding a parameter estimation module before the discrimination stage is proposed in [1] (Fig. 16). Specifically, a parameter vector ω is learned from the received y by a fully connected DNN with linear activation in the last layer and is then fed into a deterministic transformation layer that is parameterized by ω and corresponds to specific communication properties. The transformation is performed on y to generate a canonicalized ȳ. The previously described DNN in the receiver, with softmax activation, completes the subsequent discrimination task to output an estimated s. In addition to its application in the receiver, the RTN architecture can be used in any scenario where deterministic transformations with estimated parameters exist. The simulation results indicate that the autoencoder with an RTN outperforms and converges faster than the plain autoencoder architecture, thereby validating the effectiveness of combining the DL model with prior expert knowledge.
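As a concrete instance of such a deterministic transformation layer, assume (purely for illustration) that the channel effect is a simple phase rotation, one of the parametric impairments an RTN can invert; the layer then multiplies by exp(−jω):

```python
import numpy as np

def rtn_phase_compensation(y, omega):
    """Deterministic transformation layer parameterized by the learned omega;
    for an assumed phase-rotation channel, the inverse transform is
    multiplication by exp(-1j * omega)."""
    return y * np.exp(-1j * omega)

# If the estimated omega matches the true offset, y is canonicalized exactly.
x = np.array([1.0 + 0j, -1.0 + 0j])
y = x * np.exp(1j * 0.3)                 # channel applies a 0.3 rad rotation
y_canon = rtn_phase_compensation(y, 0.3)
```

In the full RTN, ω would come from the linear-activation estimator network rather than being given, and the discriminative DNN would operate on the canonicalized signal.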

C. Autoencoder for Multi-user

The autoencoder system must be realizable in highly complex scenarios, such as multi-user communication over interference channels, to become a universal communication architecture. The application of autoencoders to a simple two-user scenario is explored in [1], where two autoencoder-based


Fig. 18. A general MIMO channel autoencoder architecture [34]. This system deals with the open-loop case where no CSI feedback exists.

transmitter-receiver pairs attempt to communicate simultaneously over the same interfering channel. The only difference between this scenario and the single-user case is that the entire system is trained to achieve conflicting goals with interference at the receiver side (Fig. 17); that is, each transmitter-receiver pair aims to optimize the system to propagate its own messages accurately.

One proposed method is to optimize the weighted sum of the cross-entropy loss functions, J = αJ1 + (1 − α)J2, where J1 and J2 represent the losses of the first and second transmitter-receiver pairs, respectively, and α is a dynamic weighting factor ranging from 0 to 1 that is related to the mini-batch parameters. The autoencoder system achieves BLER performance that is the same as, or even better than, that of conventional uncoded QAM schemes at the same communication rate, thereby validating its potential application in multi-user cases.
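The weighted two-user objective is straightforward to compute; the helper names below are hypothetical:

```python
import numpy as np

def cross_entropy(p, s_index):
    """Categorical cross-entropy for a one-hot target at index s_index."""
    return -np.log(p[s_index])

def two_user_loss(p1, s1, p2, s2, alpha):
    """J = alpha * J1 + (1 - alpha) * J2, with alpha in [0, 1] trading off
    the two transmitter-receiver pairs [1]."""
    return alpha * cross_entropy(p1, s1) + (1 - alpha) * cross_entropy(p2, s2)

# Toy check with uniform receiver outputs over M = 4 messages.
p = np.full(4, 0.25)
J = two_user_loss(p, 0, p, 1, alpha=0.5)
```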

D. Autoencoder for MIMO

In [34], the autoencoder communication system is extended to MIMO channels. Two types of conventional MIMO communication schemes are considered, namely, an open-loop system without channel state information (CSI) feedback and a closed-loop system with CSI feedback. Unlike the AWGN channel model used previously, an r × t MIMO channel response H is randomly generated before noise is added in the channel network block.

In the first, open-loop, case, the transmitter in the primary autoencoder system is modified to encode message s into the transmitted signal x as t parallel streams of N time samples. The signal is then multiplied by a pre-defined H before passing through the noise layer to generate the received signal y as r streams of N time samples (Fig. 18). The estimated s is then learned from y by the subsequent procedures. In the simulation of a 2 × 1 MIMO system, the adjusted autoencoder architecture outperforms conventional open-loop MIMO schemes, such as Alamouti STBC with an MMSE receiver, when SNR ≥ 15 dB.

In the closed-loop case, an idealized situation is considered in which the transmitter can obtain perfect CSI from the receiver. Compared with the open-loop case, a feedback path is added to the general MIMO autoencoder architecture, as


Fig. 19. A MIMO channel autoencoder for closed-loop scenarios with perfect CSI or quantized CSI feedback [34]. In the perfect CSI case, a concatenation layer is added to the general MIMO autoencoder to concatenate H with s as inputs. In the quantized CSI case, a block (yellow dashed box) that compacts H into Hb before concatenation is further added to the former architecture.

shown in Fig. 19. More specifically, the channel response H generated in the channel module is propagated to the transmitter as an input, where it is concatenated with s before being encoded into x. The MIMO autoencoder with perfect CSI outperforms the conventional singular value decomposition-based precoding scheme at most SNRs.

However, CSI errors exist in practice, arising from inaccurate estimation, compaction, and unreliable feedback propagation. Therefore, a quantized-CSI scheme is introduced into the autoencoder to represent this practical situation. The only difference between this scheme and the former MIMO autoencoder with perfect CSI is that, before it is concatenated with the message s at the transmitter side, the real-valued H is compacted into a b-bit vector Hb by another NN to represent 2^b modes, shown as the yellow dashed box in Fig. 19. The simulation of a 2 × 2 system demonstrates that, in contrast to the performance decrease of the conventional Lloyd algorithm with fewer bits of CSI, the quantized-CSI autoencoder outperforms the perfect-CSI one in some cases. Thus, such a quantization method helps the system achieve better convergence for each channel mode.
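The CSI compaction step amounts to mapping H to one of 2^b modes. A nearest-neighbor sketch with a hand-picked stand-in codebook (the actual quantizer in [34] is a learned NN, and this codebook is an assumption):

```python
import numpy as np

def quantize_csi(h, codebook):
    """Map the real-valued channel response to the index of its nearest
    codebook entry, i.e., one of 2^b feedback modes."""
    return int(np.argmin(np.linalg.norm(codebook - h, axis=1)))

b = 2                                         # b feedback bits -> 4 modes
codebook = np.array([[1.0, 0.0], [0.0, 1.0],  # hand-picked stand-in entries
                     [-1.0, 0.0], [0.0, -1.0]])
mode = quantize_csi(np.array([0.9, 0.1]), codebook)
```

The transmitter would then condition its encoding on the fed-back mode index rather than on the full real-valued H.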

V. DISCUSSION AND FUTURE WORKS

The application of DL to the physical layer of wireless communication systems presents a new research field that is still in its early stage. Although previous studies have shown promising results, several substantial challenges are worth exploring in future investigations.

1) Further extensions of existing research: Emerging research has attempted to introduce DL as an alternative for certain modules of the conventional communication system. The performance achieved in these studies inspires the extension of the proposed approaches to further applications or their evolution.

For example, the success of DL-based modulation recognition methods verifies the capability of DL in feature extraction and recognition. It implies the potential to apply DL for efficient recognition of other system parameters, such as recognizing source coding or channel coding schemes, extracting CSI, and learning characteristics from signals, which would make the physical layer of wireless communication systems more "knowledgeable" or "intelligent". By straightforwardly combining the intelligence in the physical layer and the upper layers, communication systems could automatically adapt to the features of the transmitted signals and the propagation environment and achieve flexible deployment.

In [8], the learned architecture has difficulties with VCs. Inspired by the competitive performance achieved by CSI feedback in the MIMO autoencoder system in [34], this condition can be improved if the channel information is sent to the receiver, as suggested by expert knowledge from the communication domain. Similarly, the performance of the DL approaches presented in Figs. 12, 13, and 15 can be further improved by introducing channel information.

2) Specialized DL architectures for communication: The DL architecture used for learning has an important influence on the final performance of the network, whereas the current design schemes for communication networks are simple. For certain iterative algorithms (e.g., data detection [7], decoding [26], and compressive sensing [42], [43], [44]), a straightforward method is to unfold the iterations into layer structures, and weights can be added to the original iterative formulas for training. The sub-blocks of a partitionable algorithm can also be replaced, as shown in [29]. Despite their low complexity and comparable performance, these approaches only modify conventional algorithms and do not create any significant differences or learn any new algorithms. Some studies [28], [1] have applied plain DL methods to communication cases. However, these methods suffer from dimensionality problems, because the complexity of the networks and training phases grows exponentially with the number of messages or codewords, thereby limiting their application in practical scenarios.

Therefore, expert knowledge from the communication domain must be introduced into DL architecture design. As presented in [4], the performance of the system can be improved by using suitable DL networks that better characterize the channel conditions. The RTN proposed in [1] represents the first attempt to add a priori knowledge to plain DL architectures and illustrates the benefits of reduced complexity and accelerated convergence. Despite its limited scalability, the RTN has inspired the design of specialized DL systems for the communication field based on basic DL architectures. However, specialized design is not limited to the proposed RTN architecture and is worth extensive exploration. Novel and advanced DL-based systems that incorporate expert knowledge of propagation must be proposed to devise effective algorithms for future communication systems and to address the limitations in their scalability.

3) Learning strategies and performance analysis: Although the recently proposed DL-based communication algorithms demonstrate competitive performance, they lack solid foundations for theoretical derivation and analysis. The performance bounds and the minimum dataset required for training also lack any certification. Moreover, given that research on the application of DL to the physical layer of wireless communication systems is still in its early stage, the rules of learning strategies remain unknown and warrant further

12

exploration. Unlike CV where the dataset is often representedas pixel values, the system design in the communicationdomain relies on practical channel conditions, and the signalsare considered man-made representations for reliable propa-gation. Thus, the optimal input and output representations forDL communication systems remain unknown. The inputs arerepresented as binary or one-hot vectors in previous studiesbut are not limited to the two schemes without verificationof optimality. The selection of loss functions and trainingstrategies, as well as the training of the DL system whetheron a fixed SNR or a range, present other topics worthy ofinvestigation in future research.
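The two input schemes mentioned above trade off very differently in dimension: for M = 2^k messages, a binary representation needs only k inputs, whereas a one-hot representation needs M. A short NumPy sketch of both encodings (names are illustrative):

```python
import numpy as np

def binary_repr(messages, k):
    # Each of 2**k messages as a k-bit vector: input size grows as log2(M).
    return ((messages[:, None] >> np.arange(k)[::-1]) & 1).astype(float)

def one_hot_repr(messages, M):
    # Each message as an M-dimensional indicator: input size grows as M.
    out = np.zeros((len(messages), M))
    out[np.arange(len(messages)), messages] = 1.0
    return out

k = 4
M = 2 ** k
msgs = np.arange(M)
b = binary_repr(msgs, k)    # shape (16, 4)
h = one_hot_repr(msgs, M)   # shape (16, 16)
```

The one-hot scheme turns decoding into an M-way classification with a natural cross-entropy loss, but its input size is exactly the exponential growth that limits plain DL decoders; the binary scheme stays compact but forces the network to learn bit-level structure. Which representation is optimal remains, as stated above, an open question.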

4) From simulation to implementation: Most DL-based algorithms designed for the physical layer of wireless communication systems are still in their simulation stages. To the best of our knowledge, only [8] attempts to investigate the implementation of these algorithms. Therefore, researchers still have to improve these algorithms considerably before they can be implemented. First, an authentic set of data from real communication systems or prototype platforms in actual physical environments must be made available to all researchers to help them train their DL architectures on common measured data and compare the performance of different algorithms objectively. Second, the communication channels in simulations are often generated by certain models; DL systems therefore achieve comparable performance because of their impressive expressive capacity. However, the diverse physical channel scenarios are considerably more complex in reality and change over time. Given that current DL systems are mainly trained offline, their generalization capability must be guaranteed. Designing specialized systems for specific scenarios, or general systems that dynamically adapt to varying channel conditions, is also imperative. Finally, DL tools for hardware, such as field-programmable gate arrays, must be developed to deploy DL methods on hardware and achieve fast realization.

VI. CONCLUSION

This paper reviews the literature on the application of DL methods to the physical layer of wireless communication systems, either to replace parts of the conventional communication system or to create a new DL-based architecture (i.e., an autoencoder system). Given their excellent expressive capacity and convenient optimization, DL-based algorithms show competitive performance with less complexity or latency and have potential application in future communication systems where conventional theories are challenged. The application of DL to the physical layer of wireless communication systems presents a promising research area that is far from maturity. Further studies, including solid theoretical analyses, must be conducted, and new DL-based architectures must be proposed to implement DL-based ideas in actual communication scenarios.

REFERENCES

[1] T. J. O’Shea and J. Hoydis. (2017) An introduction to deep learning for the physical layer. [Online]. Available: https://arxiv.org/abs/1702.00832, preprint.

[2] E. G. Larsson, O. Edfors, F. Tufvesson, and T. L. Marzetta, “Massive MIMO for next generation wireless systems,” IEEE Commun. Mag., vol. 52, no. 2, pp. 186–195, Feb. 2014.

[3] N. Gonzalez-Prelcic, A. Ali, V. Va, and R. W. Heath Jr. (2017) Millimeter wave communication with out-of-band information. [Online]. Available: https://arxiv.org/abs/1703.10638, preprint.

[4] N. Farsad and A. Goldsmith. (2017) Detection algorithms for communication systems using deep learning. [Online]. Available: https://arxiv.org/abs/1705.08044, preprint.

[5] Y.-S. Jeon, S.-N. Hong, and N. Lee. (2016) Supervised-learning-aided communication framework for massive MIMO systems with low-resolution ADCs. [Online]. Available: https://arxiv.org/abs/1610.07693, preprint.

[6] W. Tan, S. Jin, C. K. Wen, and Y. Jing, “Spectral efficiency of mixed-ADC receivers for massive MIMO systems,” IEEE Access, vol. 4, pp. 7841–7846, Sep. 2016.

[7] N. Samuel, T. Diskin, and A. Wiesel. (2017) Deep MIMO detection. [Online]. Available: https://arxiv.org/abs/1706.01151, preprint.

[8] S. Dorner, S. Cammerer, J. Hoydis, and S. ten Brink. (2017) Deep learning-based communication over the air. [Online]. Available: https://arxiv.org/abs/1707.03384, preprint.

[9] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, 2016, http://www.deeplearningbook.org.

[10] U. Challita, L. Dong, and W. Saad. (2017) Proactive resource management in LTE-U systems: A deep learning perspective. [Online]. Available: https://arxiv.org/abs/1702.07031, preprint.

[11] R. C. Daniels, C. M. Caramanis, and R. W. Heath, “Adaptation in convolutionally coded MIMO-OFDM wireless systems through supervised learning and SNR ordering,” IEEE Trans. Veh. Technol., vol. 59, no. 1, pp. 114–126, Jan. 2010.

[12] S. K. Pulliyakode and S. Kalyani. (2017) Reinforcement learning techniques for outer loop link adaptation in 4G/5G systems. [Online]. Available: https://arxiv.org/abs/1708.00994, preprint.

[13] J. Vieira, E. Leitinger, M. Sarajlic, X. Li, and F. Tufvesson. (2017) Deep convolutional neural networks for massive MIMO fingerprint-based positioning. [Online]. Available: https://arxiv.org/abs/1708.06235, preprint.

[14] A. Fehske, J. Gaeddert, and J. H. Reed, “A new approach to signal classification using spectral correlation and neural networks,” in Proc. IEEE Int. Symp. New Frontiers in Dynamic Spectrum Access Networks (DySPAN), 2005, pp. 144–150.

[15] E. E. Azzouz and A. K. Nandi, “Modulation recognition using artificial neural networks,” in Proc. Automatic Modulation Recognition of Communication Signals, 1996, pp. 132–176.

[16] M. Ibnkahla, J. Sombria, F. Castanie, and N. J. Bershad, “Neural networks for modeling nonlinear memoryless communication channels,” IEEE Trans. Commun., vol. 45, no. 7, pp. 768–771, Jul. 1997.

[17] J. Sjoberg, Q. Zhang, L. Ljung, A. Benveniste, B. Delyon, P.-Y. Glorennec, H. Hjalmarsson, and A. Juditsky, “Nonlinear black-box modeling in system identification: a unified overview,” Automatica, vol. 31, no. 12, pp. 1691–1724, Oct. 1995.

[18] J. Bruck and M. Blaum, “Neural networks, error-correcting codes, and polynomials over the binary n-cube,” IEEE Trans. Inf. Theory, vol. 35, no. 5, pp. 976–987, Sep. 1989.

[19] I. Ortuno, M. Ortuno, and J. Delgado, “Error correcting neural networks for channels with Gaussian noise,” in Proc. IJCNN International Joint Conference on Neural Networks, vol. 4, 1992, pp. 295–300.

[20] C. K. Wen, S. Jin, K. K. Wong, J. C. Chen, and P. Ting, “Channel estimation for massive MIMO using Gaussian-mixture Bayesian learning,” IEEE Trans. Wireless Commun., vol. 14, no. 3, pp. 1356–1368, Mar. 2015.

[21] S. Chen, G. Gibson, C. Cowan, and P. Grant, “Adaptive equalization of finite non-linear channels using multilayer perceptrons,” Elsevier Signal Processing, vol. 20, no. 2, pp. 107–119, Jun. 1990.

[22] J. Cid-Sueiro and A. R. Figueiras-Vidal, “Digital equalization using modular neural networks: an overview,” in Proc. Signal Processing in Telecommunications. Springer, 1996, pp. 337–345.

[23] M. Ibnkahla, “Applications of neural networks to digital communications–a survey,” Elsevier Signal Processing, vol. 80, no. 7, pp. 1185–1215, Jul. 2000.

[24] C. Jiang, H. Zhang, Y. Ren, Z. Han, K.-C. Chen, and L. Hanzo, “Machine learning paradigms for next-generation wireless networks,” IEEE Wireless Commun., vol. 24, no. 2, pp. 98–105, Apr. 2017.

[25] K. Hornik, M. Stinchcombe, and H. White, “Multilayer feedforward networks are universal approximators,” Neural Networks, vol. 2, no. 5, pp. 359–366, 1989.

[26] E. Nachmani, Y. Be’ery, and D. Burshtein, “Learning to decode linear codes using deep learning,” in Proc. Communication, Control, and Computing (Allerton), 2016, pp. 341–346.

[27] E. Nachmani, E. Marciano, D. Burshtein, and Y. Be’ery. (2017) RNN decoding of linear block codes. [Online]. Available: https://arxiv.org/abs/1702.07560, preprint.

[28] T. Gruber, S. Cammerer, J. Hoydis, and S. ten Brink, “On deep learning-based channel decoding,” in Proc. of CISS, 2017, pp. 1–6.

[29] T. Gruber, S. Cammerer, J. Hoydis, and S. ten Brink. (2017) Scaling deep learning-based decoding of polar codes via partitioning. [Online]. Available: https://arxiv.org/abs/1702.06901, preprint.

[30] E. Nachmani, E. Marciano, L. Lugosch, W. J. Gross, D. Burshtein, and Y. Be’ery. (2017) Deep learning methods for improved decoding of linear codes. [Online]. Available: https://arxiv.org/abs/1706.07043, preprint.

[31] F. Liang, C. Shen, and F. Wu. (2017) An iterative BP-CNN architecture for channel decoding. [Online]. Available: https://arxiv.org/abs/1707.05697, preprint.

[32] H. Ye, G. Y. Li, and B.-H. F. Juang. (2017) Power of deep learning for channel estimation and signal detection in OFDM systems. [Online]. Available: https://arxiv.org/abs/1708.08514, preprint.

[33] D. Neumann, T. Wiese, and W. Utschick. (2017) Learning the MMSE channel estimator. [Online]. Available: https://arxiv.org/abs/1707.05674, preprint.

[34] T. J. O’Shea, T. Erpek, and T. C. Clancy. (2017) Deep learning based MIMO communications. [Online]. Available: https://arxiv.org/abs/1707.07980, preprint.

[35] A. K. Nandi and E. E. Azzouz, “Algorithms for automatic modulation recognition of communication signals,” IEEE Trans. Commun., vol. 46, no. 4, pp. 431–436, Apr. 1998.

[36] X.-Z. Lv, P. Wei, and X.-C. Xiao, “Automatic identification of digital modulation signals using high order cumulants,” Electronic Warfare, vol. 6, p. 001, Jun. 2004.

[37] A. K. Nandi and E. E. Azzouz, “Automatic analogue modulation recognition,” Elsevier Signal Processing, vol. 46, no. 2, pp. 211–222, Oct. 1995.

[38] W. R. Caid and R. W. Means, “Neural network error correcting decoders for block and convolutional codes,” in Proc. IEEE Global Telecommun. Conf. (GLOBECOM), 1990, pp. 1028–1031.

[39] A. Di Stefano, O. Mirabella, G. Di Cataldo, and G. Palumbo, “On the use of neural networks for Hamming coding,” in Proc. IEEE Int. Symposium on Circuits and Systems, 1991, pp. 1601–1604.

[40] X.-A. Wang and S. B. Wicker, “An artificial neural net Viterbi decoder,” IEEE Trans. Commun., vol. 44, no. 2, pp. 165–171, Feb. 1996.

[41] I. Dimnik and Y. Be’ery, “Improved random redundant iterative HDPC decoding,” IEEE Trans. Commun., vol. 57, no. 7, Jul. 2009.

[42] A. Mousavi, A. B. Patel, and R. G. Baraniuk, “A deep learning approach to structured signal recovery,” in Proc. Communication, Control, and Computing (Allerton), 2015, pp. 1336–1343.

[43] A. Mousavi and R. G. Baraniuk, “Learning to invert: Signal recovery via deep convolutional networks,” in Proc. Int. Conf. Acoustics, Speech and Signal Processing, 2017, pp. 2272–2276.

[44] S. Lohit, K. Kulkarni, R. Kerviche, P. Turaga, and A. Ashok. (2017) Convolutional neural networks for non-iterative reconstruction of compressively sensed images. [Online]. Available: https://arxiv.org/abs/1708.04669, preprint.

Tianqi Wang was born in Jiangsu, China, in 1993. She received the B.S. degree from Nanjing University of Science and Technology, Nanjing, China, in 2016. She is currently working toward the M.S. degree with the School of Information Science and Engineering, Southeast University. Her main research interests include deep learning applications in communication and massive MIMO systems.

Chao-Kai Wen (S’00-M’04) received the Ph.D. degree from the Institute of Communications Engineering, National Tsing Hua University, Taiwan, in 2004. He was with the Industrial Technology Research Institute, Hsinchu, Taiwan, and MediaTek Inc., Hsinchu, Taiwan, from 2004 to 2009. He is currently an Associate Professor with the Institute of Communications Engineering, National Sun Yat-sen University, Kaohsiung, Taiwan. His research interests center around optimization in wireless multimedia networks.

Hanqing Wang (S’16) received the B.S. degree in communications engineering from Anhui University, Hefei, China, in 2013, and the M.S. degree in information and communications engineering from Southeast University, Nanjing, China, in 2016, where he is currently pursuing the Ph.D. degree in information and communications engineering with the School of Information Science and Engineering. His current research interests include detection and estimation theory, compressive sensing, and their application to communication systems with hardware imperfection and nonlinear distortion.

Feifei Gao received the B.Eng. degree from Xi’an Jiaotong University, Xi’an, China, in 2002, the M.Sc. degree from McMaster University, Hamilton, ON, Canada, in 2004, and the Ph.D. degree from the National University of Singapore, Singapore, in 2007. He was a Research Fellow with the Institute for Infocomm Research (I2R), A*STAR, Singapore, in 2008 and an Assistant Professor with the School of Engineering and Science, Jacobs University, Bremen, Germany, from 2009 to 2010. In 2011, he joined the Department of Automation, Tsinghua University, Beijing, China, where he is currently an Associate Professor. Prof. Gao’s research areas include communication theory, signal processing for communications, array signal processing, and convex optimization, with particular interests in MIMO techniques, multi-carrier communications, cooperative communication, and cognitive radio networks. He has authored/coauthored more than 100 refereed IEEE journal papers and more than 100 IEEE conference proceeding papers, which have been cited more than 4000 times on Google Scholar. Prof. Gao has served as an Editor of IEEE Transactions on Wireless Communications, IEEE Communications Letters, IEEE Signal Processing Letters, IEEE Wireless Communications Letters, International Journal on Antennas and Propagation, and China Communications. He has also served as a symposium co-chair for the 2015 IEEE International Conference on Communications (ICC), 2014 IEEE Global Communications Conference (GLOBECOM), 2014 IEEE Vehicular Technology Conference Fall (VTC), and 2018 IEEE Vehicular Technology Conference Spring, as well as a Technical Committee Member for many other IEEE conferences.

Tao Jiang is currently a Distinguished Professor with the School of Electronics Information and Communications, Huazhong University of Science and Technology, Wuhan, P. R. China. He received the Ph.D. degree in information and communication engineering from Huazhong University of Science and Technology, Wuhan, P. R. China, in April 2004. From Aug. 2004 to Dec. 2007, he worked at universities including Brunel University and the University of Michigan-Dearborn. He has authored or co-authored about 300 technical papers in major journals and conferences and 9 books/chapters in the areas of communications and networks. He has served or is serving as a technical program committee member of major IEEE conferences, including INFOCOM, GLOBECOM, and ICC. He was invited to serve as TPC Symposium Chair for IEEE GLOBECOM 2013, IEEE WCNC 2013, and ICCC 2013. He has served or is serving as an associate editor of technical journals in communications, including IEEE Transactions on Signal Processing, IEEE Communications Surveys and Tutorials, IEEE Transactions on Vehicular Technology, and IEEE Internet of Things Journal, and he is the associate editor-in-chief of China Communications. He is a recipient of the NSFC Distinguished Young Scholars Award in 2013. He was listed among the Most Cited Chinese Researchers announced by Elsevier in 2014, 2015, and 2016.


Shi Jin (S’06–M’07) received the B.S. degree in communications engineering from Guilin University of Electronic Technology, Guilin, China, in 1996; the M.S. degree from Nanjing University of Posts and Telecommunications, Nanjing, China, in 2003; and the Ph.D. degree in communications and information systems from Southeast University, Nanjing, in 2007. From June 2007 to October 2009, he was a Research Fellow with the Adastral Park Research Campus, University College London, London, U.K. He is currently with the faculty of the National Mobile Communications Research Laboratory, Southeast University. His research interests include space-time wireless communications, random matrix theory, and information theory. Dr. Jin serves as an Associate Editor for the IEEE Transactions on Wireless Communications, the IEEE Communications Letters, and IET Communications. He and his coauthors received the 2010 Young Author Best Paper Award from the IEEE Signal Processing Society and the 2011 IEEE Communications Society Stephen O. Rice Prize Paper Award in the field of communication theory.