
IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, VOL. 10, NO. 8, AUGUST 2011 2497

Opportunistic Access to Spectrum Holes Between Packet Bursts: A Learning-Based Approach

Kae Won Choi, Member, IEEE, and Ekram Hossain, Senior Member, IEEE

Abstract: We present a cognitive radio (CR) mechanism for opportunistic access to the frequency bands licensed to a data-centric primary user (PU) network. Secondary users (SUs) aim to exploit the short-lived spectrum holes (or opportunities) created between packet bursts in the PU network. The PU traffic pattern changes over both time and frequency according to upper layer events in the PU network, and fast variation in PU activity may cause high sensing error probability and low spectrum utilization in dynamic spectrum access. The proposed mechanism learns a PU traffic pattern in real time and uses the acquired information to access the frequency channel efficiently while keeping the probability of collision with the PUs below a target limit. To design the channel learning algorithm, we model the CR system as a hidden Markov model (HMM) and present a gradient method to find the underlying PU traffic pattern. We also analyze the identifiability of the proposed HMM to provide a condition for the convergence of the proposed learning algorithm. Simulation results show that the proposed algorithm greatly outperforms the traditional listen-before-talk algorithm, which does not possess any learning functionality.

Index Terms: Cognitive radio, opportunistic spectrum access, energy detection, hidden Markov model (HMM), partially observable Markov decision process (POMDP).

I. INTRODUCTION

The concept of opportunistic spectrum access (OSA) is motivated by low spectrum utilization of traditional fixed spectrum allocation strategies. In order to make efficient use of precious spectrum resources, OSA allows a secondary user (SU) to exploit the spectrum bands that a primary user (PU) has priority to access, under the condition that the SU does not cause harmful interference to the PU. Without explicit negotiation with the PU, the SU autonomously senses spectrum bands, finds spectrum holes (i.e., spectrum temporarily unused by the PUs), and accesses them by tuning its operating parameters. This process requires an intelligent cognition cycle, and therefore, an SU network is considered as a cognitive radio (CR) network.

In this paper, we propose a CR mechanism for an SU network which shares spectrum bands with a data-centric PU network. In particular, we are interested in exploiting short-lived spectrum opportunities created between packet bursts of a PU network. Experimental studies of potential PU networks (e.g., GSM networks) [1]–[6] have shown that there exist abundant spectrum opportunities between packet bursts. In [1], [2], it was revealed that there are plenty of gaps between consecutive packets in an 802.11b-based WLAN, even when a WLAN continuously uses a channel for packet transmissions. However, exploiting these spectrum opportunities poses significant challenges due to the following two characteristics of a data-centric PU network.

Manuscript received February 2, 2010; revised October 30, 2010 and February 7, 2011; accepted May 21, 2011. The associate editor coordinating the review of this paper and approving it for publication was Q. Zhang.

This work was supported by Natural Sciences and Engineering Research Council (NSERC), Canada.

K. W. Choi is with the Department of Computer Science and Engineering, Seoul National University of Science and Technology, Gongneung 2-dong, Nowon-gu, Seoul, Korea.

E. Hossain is with the Dept. of Electrical and Computer Engineering, University of Manitoba, Canada (e-mail: [email protected]).

Digital Object Identifier 10.1109/TWC.2011.060711.100154

First, the channel usage pattern of PUs changes over time and frequency according to upper layer events and traffic loads. Therefore, it is very difficult for an SU to have proper knowledge of the channel usage pattern. Accessing a spectrum without knowing the channel usage pattern potentially leads to harmful interference to PUs and also to performance degradation of the SU. In the literature (e.g., in [7]–[13]), the channel usage pattern of PUs was modeled either as a two-state Markov chain or as a semi-Markov chain, and the distributions of the lengths of a spectrum opportunity and a packet burst were assumed to be stationary and known to the SU. However, in a data-centric PU network, an SU may not know the channel usage pattern in advance. Therefore, an SU should estimate the channel usage pattern by using an online learning algorithm.

The second characteristic of a data-centric PU network is that the lengths of spectrum opportunities and packet bursts are very short (e.g., of the order of milliseconds to seconds). This means that an SU has to perform channel sensing very frequently to keep up with the fast variations of PU activity. Since an SU (with a single radio) has to stop data transmission during channel sensing, frequent channel sensing leads to low spectrum utilization [14]. Moreover, the channel sensing time should be much shorter than the average length of a spectrum opportunity. Due to the short channel sensing time, the sensing error probability (i.e., false alarm and misdetection probabilities) tends to be high. Most of the related work in the literature (e.g., in [7]–[13], [15], [16]) assumed perfect sensing (i.e., sensing error probability is zero) and that the channel sensing time is short enough to be neglected. In a practical CR network, an SU needs to be resilient to high sensing error probability while reducing the channel sensing time in an intelligent way.

The above-mentioned problems related to spectrum sharing with a data-centric PU network have not been addressed well in the previous studies in the literature. This motivates us to design a channel sensing and channel access scheme considering the characteristics of a data-centric PU network. The proposed scheme operates on a learning and access cycle where it learns the channel usage pattern and then accesses the channel based on the learned channel usage pattern. These two functionalities are carried out by a channel learning algorithm and a channel access algorithm, respectively. Note that the functionality of channel selection in a multi-channel scenario (i.e., determining the order in which the channels need to be sensed and/or accessed) is out of the scope of the proposed scheme. The optimal frequency channel selection problem was addressed in [7], [8], [16]–[18].

1536-1276/11$25.00 © 2011 IEEE

Taking the sensing results obtained by a channel sensing method as inputs, the channel learning algorithm estimates the channel usage pattern in the PU network. To deal with erroneous sensing results, we design this algorithm by using a hidden Markov model (HMM) [19]. Based on a sequence of sensing results, which act as observations in the HMM, the channel usage pattern is calculated iteratively by using the gradient method [20]. This algorithm estimates not only the traffic pattern of PUs but also the signal-to-noise ratio (SNR) corresponding to a PU signal. To show under what condition the channel usage pattern can be estimated, we provide an analysis of the equivalence and the identifiability of the proposed HMM. The channel usage pattern is used by the channel access algorithm for efficient data transmission in the SU network. Although there have been a few algorithms in the literature for estimating the PU traffic pattern (e.g., in [15], [21]), they are neither robust to high sensing error probabilities nor able to estimate the SNR of a PU signal. A few works (e.g., [22] and [23]) have modeled a CR system as an HMM. However, these works did not address the problem of parameter estimation from erroneous sensing results.

Using the channel access algorithm, which is developed based on a partially observable Markov decision process (POMDP) framework [24], an SU transmits data packets while avoiding interference to the PU network. The algorithm adaptively decides whether to perform channel sensing or transmit user data in each time slot to prevent unnecessary sensing.

The main contributions of the paper can be summarized as follows:

• We present an optimized OSA scheme for cognitive radios coexisting with a data-centric PU network. With this scheme, an SU can effectively use spectrum opportunities between packet bursts, maximize spectrum utilization, and maintain its data connection even when a spectrum is densely occupied by PUs. The proposed scheme not only detects instantaneous PU activity but also learns the channel usage pattern in the PU network. Based on the estimated channel usage information, the proposed scheme adjusts the parameters for accessing a frequency channel. This learning and access cycle makes it possible for an SU to adapt itself to a time-varying channel usage pattern in the PU network. Also, the proposed scheme is favorable to practical implementation, since it needs very little prior knowledge about the PU network.

• The channel learning algorithm is developed by solving the parameter estimation problem in the HMM. This algorithm is resilient to sensing errors and can estimate the SNR of a PU signal, which the existing parameter estimation algorithms for CR systems are not capable of. We also analyze the identifiability of the proposed HMM and show that the proposed channel learning algorithm can estimate the channel usage pattern under some mild conditions. To our knowledge, the problem of the identifiability of an HMM was not addressed in the existing works on CR systems.

TABLE I
TABLE OF SYMBOLS

Symbol | Definition
$M$ | Number of frequency channels
$W$ | Bandwidth of a frequency channel
$\lambda$ | Transition rate from state 0 to state 1 in PU traffic model
$\mu$ | Transition rate from state 1 to state 0 in PU traffic model
$\gamma$ | SNR of a PU signal
$\mathbf{u}$ | Channel usage pattern, i.e., $\mathbf{u} := (\lambda, \mu, \gamma)$
$\mathcal{U}$ | Set of possible channel usage patterns
$\delta$ | Threshold for energy detection
$D(\rho)$ | Probability that an SU detects PU to be active during a slot when the average SNR of PU signal is $\rho$
$N_L$ | Number of slots in a channel learning subframe
$N_A$ | Number of slots in a channel access subframe
$T$ | Length of a slot
$\mathbf{u}_k$ | Channel usage pattern in frame $k$
$\hat{\mathbf{u}}_k$ | Estimate of the channel usage pattern in frame $k$
$\zeta^{L}_{k,n}$ | Sensing result generated in slot $n$ in the channel learning subframe of frame $k$
$\zeta^{A}_{k,n}$ | Sensing result generated in slot $n$ in the channel access subframe of frame $k$
$\alpha_n$ | PU activity at time $t = (n-1)T$, when $t = 0$ at the start of a subframe
$\mathbf{s}_n$ | State of slot $n$, i.e., $\mathbf{s}_n := (\alpha_n, \alpha_{n+1})$
$\mathcal{S}$ | State space, i.e., $\mathcal{S} := \{(0,0), (0,1), (1,0), (1,1)\}$
$o_n$ | Observation in slot $n$
$\mathcal{O}$ | Observation space
$p^{i,j}_{l,m}$ | State transition probability from $\mathbf{s}_n = (l,m)$ to $\mathbf{s}_{n+1} = (i,j)$
$r_{i,j}$ | State transition probability from $\alpha_n = i$ to $\alpha_{n+1} = j$
$q^{m}_{i,j}$ | Observation probability that the observation $o_n$ is $m$ given that the state $\mathbf{s}_n$ is $(i,j)$
$a_n$ | Action in slot $n$
$\mathcal{A}$ | Action space, i.e., $\mathcal{A} := \{0, 1\}$
$C$ | Collision probability
$C_{\mathrm{lim}}$ | Collision probability limit
$R(\mathbf{s}, a)$ | Reward for given state $\mathbf{s}$ and action $a$
$\boldsymbol{\pi}_n$ | Belief vector for slot $n$, i.e., $\boldsymbol{\pi}_n := (\pi^{n}_{0,0}, \pi^{n}_{0,1}, \pi^{n}_{1,0}, \pi^{n}_{1,1})$
$\Pi$ | Domain of a belief vector
$V^{*}_{n}$ | Optimal value function for slot $n$
$\boldsymbol{\beta}^{*}$ | Optimal policy, i.e., $\boldsymbol{\beta}^{*} := (\beta^{*}_{1}, \ldots, \beta^{*}_{N_A})$
$\boldsymbol{\beta}^{\mathrm{sub}}$ | Suboptimal policy, i.e., $\boldsymbol{\beta}^{\mathrm{sub}} := (\beta^{\mathrm{sub}}_{1}, \ldots, \beta^{\mathrm{sub}}_{N_A})$

The rest of the paper is organized as follows. Section II describes the system model and assumptions and proposes the OSA scheme for exploiting short spectrum opportunities between packet bursts. The channel learning algorithm is described in Section III. In Section IV, we introduce the channel access algorithm. In Section V, we present representative numerical results. Section VI concludes the paper. A list of the key mathematical symbols used in this paper is shown in Table I.

II. SYSTEM MODEL AND PROPOSED SPECTRUM ACCESS PROTOCOL

A. Network Model

The PU network has a license to use $M$ frequency channels, each of which has a bandwidth of $W$. In Section II-B, we will describe the channel usage model of the PU network. The SU network could be either an ad hoc or an infrastructure-based network. We focus on the operation of a single SU in the SU network. The SU can communicate with other SUs (or the secondary network controller) via one radio transceiver that can be tuned to one of the $M$ frequency channels at a time. The SU can access a frequency channel only when there is no PU activity in that channel. We assume that the SU performs spectrum sensing by means of energy detection. The spectrum sensing model will be described in Section II-C. We will explain the details of the OSA scheme for an SU in Section II-D.

Fig. 1. Two-state Markov model and an example of channel usage pattern: (a) two-state Markov chain; (b) time-domain example of channel usage pattern.

B. Primary User Channel Usage Model

We adopt a two-state continuous-time Markov chain (CTMC) to model PU traffic in a channel [7]–[11], [13], [25].¹ Fig. 1 shows the two-state CTMC model in which the states represent PU activity in a channel. The PU activity on a frequency channel alternates between state 1 (i.e., active) and state 0 (i.e., inactive). The lengths of an active period and an inactive period in a channel are exponentially distributed with average lengths of $1/\mu$ and $1/\lambda$, respectively, where $\lambda$ and $\mu$ denote PU state transition rates. We also incorporate the SNR of a PU signal, $\gamma$, into the PU channel usage model, since it significantly affects the channel sensing performance. Now, the PU channel usage is completely determined by three parameters $\lambda$, $\mu$, and $\gamma$. We define the "channel usage pattern", denoted by $\mathbf{u}$, as the vector of these parameters, i.e., $\mathbf{u} := (\lambda, \mu, \gamma)$.
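As an illustration of the two-state CTMC described above, the following minimal Python sketch draws alternating, exponentially distributed inactive and active periods. The function names `simulate_pu_activity` and `active_fraction`, the seed, and the example rates are assumptions made for illustration and are not taken from the paper. For a long horizon, the fraction of time spent in state 1 should approach $\lambda/(\lambda+\mu)$, which is the standard steady-state result for this chain.

```python
import random


def simulate_pu_activity(lam, mu, horizon, seed=0, start_state=0):
    """Simulate PU activity as a two-state CTMC on [0, horizon).

    Returns a list of (start_time, state) segments, where state 0 means
    inactive and state 1 means active. The names and defaults here are
    illustrative assumptions.
    """
    rng = random.Random(seed)
    t, state = 0.0, start_state
    segments = []
    while t < horizon:
        segments.append((t, state))
        # An inactive period ends at rate lam (mean 1/lam);
        # an active period ends at rate mu (mean 1/mu).
        rate = lam if state == 0 else mu
        t += rng.expovariate(rate)
        state = 1 - state
    return segments


def active_fraction(segments, horizon):
    """Fraction of [0, horizon) during which the PU is active."""
    active = 0.0
    for idx, (start, state) in enumerate(segments):
        end = segments[idx + 1][0] if idx + 1 < len(segments) else horizon
        if state == 1:
            active += end - start
    return active / horizon


if __name__ == "__main__":
    segs = simulate_pu_activity(lam=2.0, mu=3.0, horizon=20000.0)
    print("empirical active fraction:", active_fraction(segs, 20000.0))
    print("lambda / (lambda + mu):   ", 2.0 / (2.0 + 3.0))
```

Fixing the seed keeps the run reproducible; the agreement with $\lambda/(\lambda+\mu)$ is only approximate for a finite horizon.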

Many experimental studies on potential PU networks have shown that traffic characteristics vary over time [1], [2], [4], [6] and frequencies [3], [5]. There can be several reasons for this PU behavior. First, the channel usage pattern can vary according to the configurations of the upper layer protocols. For example, the channel usage pattern is affected by the type of PU application (e.g., voice call, video streaming, file transfer, and web browsing) and its parameter settings (e.g., source rate of video streaming).² PU applications determine the traffic properties such as the packet length and the packet arrival rate, which, in turn, affect the channel usage pattern. Second, the channel usage pattern depends on the traffic load in the PU network, which may vary over time. In [4], [6], it was shown that traffic load in voice-centric cellular networks varies according to the time of the day.

¹ In some works (e.g., [2], [12], [15], [21]), PU traffic was modeled by a two-state semi-Markov process, which is a generalization of the two-state Markov process. In the semi-Markov process, the sojourn time on each state follows an arbitrary distribution (e.g., hyper-Erlang distribution [2]). Although the semi-Markov process provides a more accurate fit for empirical data, the Markov process is a good approximation with mathematical tractability [11].

² For example, in [2], the authors presented the distribution of idle periods experimentally estimated from an IEEE 802.11b-based WLAN with the user datagram protocol (UDP) traffic. It was shown that the distribution of idle periods differs for two different packet arrival rates of 25 packets/s and 100 packets/s.

An SU should track the variation of the channel usage pattern in order to access the channel in an optimal way. We assume that the channel usage pattern is restricted to a certain region $\mathcal{U}$, i.e., $\mathbf{u} \in \mathcal{U}$. Also, it is assumed that the channel usage pattern varies slowly so that an SU can estimate the channel usage pattern by gathering statistical information from a number of packet bursts and spectrum opportunities.

C. Secondary User Energy Detection Model

An SU performs energy detection on a frequency channel for a time duration of $T$. Recall that $W$ denotes the bandwidth of a frequency channel. The energy detector takes $WT$ baseband complex signal samples during an energy detection period. Let $y_i$ denote the $i$th signal sample. Then, we have $y_i = x_i + n_i$, where $x_i$ is a PU signal and $n_i$ is the thermal noise with a noise spectral density of $N_o$. To generate a test statistic, denoted by $\xi$, the energy detector estimates the normalized energy in the signal samples as

$$\xi = \frac{1}{W T N_o} \sum_{i=1}^{WT} |y_i|^2.$$

Let $\zeta$ denote the sensing result. To conclude whether the channel is in use or not, the energy detector compares $\xi$ with a given threshold $\delta$. If $\xi > \delta$, the detector concludes that the channel is in use (i.e., $\zeta = 1$). Otherwise, $\zeta = 0$.
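The energy detection rule above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `energy_detect` and its arguments are hypothetical, and the number of samples passed in stands in for $WT$.

```python
def energy_detect(samples, noise_density, threshold):
    """Energy detection over one slot of complex baseband samples.

    samples       -- sequence of complex samples y_i (its length plays the role of W*T)
    noise_density -- noise spectral density N_o
    threshold     -- detection threshold delta
    Returns (xi, zeta): the normalized energy and the binary sensing result.
    """
    num_samples = len(samples)
    # |y_i|^2 is computed from the real and imaginary parts directly.
    energy = sum(y.real ** 2 + y.imag ** 2 for y in samples)
    xi = energy / (num_samples * noise_density)
    # The channel is declared in use only if xi strictly exceeds the threshold.
    zeta = 1 if xi > threshold else 0
    return xi, zeta


if __name__ == "__main__":
    slot = [1 + 0j, 0 + 1j, 1 + 1j, 0j]
    print(energy_detect(slot, noise_density=1.0, threshold=0.9))
```

In practice the samples would come from the radio front end; the short list here only makes the arithmetic easy to follow.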

We need to find the distribution of the test statistic and calculate the detection probability. Let $\rho$ denote the average SNR of a PU signal during an energy detection period, i.e., $\rho := \frac{1}{W T N_o} \sum_{i=1}^{WT} \mathrm{E}[|x_i|^2]$. If the number of signal samples (i.e., $WT$) is sufficiently large, the test statistic $\xi$ follows a normal distribution with mean $(1 + \rho)$ and variance $(1 + 2\rho)/(WT)$ [26]. From the distribution of the test statistic, we can calculate the probability that an SU senses the channel to be active (i.e., $\zeta = 1$) as a function of the average SNR, $\rho$. From [26], we have

$$D(\rho) := \Pr[\xi \ge \delta] = Q\left(\frac{\delta - (1 + \rho)}{\sqrt{(1 + 2\rho)/(WT)}}\right) \tag{1}$$

where $Q$ denotes the Q-function defined as $Q(x) := \frac{1}{\sqrt{2\pi}} \int_{x}^{\infty} \exp\left(-\frac{u^2}{2}\right) du$.
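The detection probability in (1) can be evaluated numerically with the standard identity $Q(x) = \frac{1}{2}\,\mathrm{erfc}(x/\sqrt{2})$. The sketch below is illustrative only: the function names and the example values of $\delta$ and $WT$ are assumptions, not parameters from the paper. Note that $D(0)$ corresponds to the case without a PU signal, i.e., a false alarm.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def detection_probability(rho, threshold, num_samples):
    """Evaluate D(rho) from (1) under the Gaussian approximation.

    rho         -- average SNR of the PU signal during the slot
    threshold   -- energy detection threshold delta
    num_samples -- number of samples W*T (assumed large)
    """
    mean = 1.0 + rho
    std = math.sqrt((1.0 + 2.0 * rho) / num_samples)
    return q_function((threshold - mean) / std)


if __name__ == "__main__":
    # Example values chosen only for illustration.
    for rho in (0.0, 0.2, 0.5, 1.0):
        print(rho, detection_probability(rho, threshold=1.2, num_samples=200))
```

Sweeping `rho` in this way shows how strongly the sensing result depends on the SNR of the PU signal, which is why $\gamma$ is part of the channel usage pattern.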

D. Channel Sensing and Access to Exploit Short-Lived Spectrum Opportunities

For the proposed scheme, time is divided into frames (Fig. 2), which are indexed by $k$. It is assumed that frame synchronization is maintained in the SU network. The length of a frame is short enough that the channel usage pattern remains unchanged during a frame. A frame is further divided into a channel learning subframe and a channel access subframe.³ An SU estimates the channel usage pattern on the current channel during a channel learning subframe, and based on the estimated channel usage pattern, it exchanges user data with other SUs during a channel access subframe. A channel learning subframe and a channel access subframe consist of $N_L$ slots and $N_A$ slots, respectively. The length of a slot is $T$. We have to set the length of a slot short enough to prevent PU activity from changing multiple times during a slot. An SU senses the channel and produces a sensing result in each slot during the channel learning subframe. On the other hand, an SU either performs sensing or transmits user data in each slot during the channel access subframe.

³ We will explain the rationale behind this frame structure in Section III-E.

Fig. 2. Frame structure of the proposed scheme.

Fig. 3. Overall operation of the proposed scheme.

The overall operation of the proposed scheme for an SU is summarized in Fig. 3. From the sensing results obtained during a channel learning subframe of frame $k$, the SU calculates the estimate of the channel usage pattern in frame $k$, denoted by $\hat{\mathbf{u}}_k = (\hat{\lambda}_k, \hat{\mu}_k, \hat{\gamma}_k)$. Then, based on the estimated channel usage pattern, $\hat{\mathbf{u}}_k$, it decides whether to change the channel or not. If the SU judges that there are sufficient spectrum opportunities to support its quality-of-service (QoS) requirements,⁴ it stays on the current channel and exchanges data packets during the following channel access subframe. Otherwise, it switches to another frequency channel in the next frame. The SU can simply switch to the next available frequency channel, or it can use more sophisticated algorithms proposed for the frequency channel selection problem in the literature (e.g., in [7], [8], [16]–[18]).

⁴ For example, the SU can decide that the QoS is supported if the duty cycle, computed from $\hat{\lambda}_k$ and $\hat{\mu}_k$, and the SNR, $\hat{\gamma}_k$, exceed their respective thresholds.

During the channel learning subframe in frame $k$, the SU estimates the current channel usage pattern, denoted by $\mathbf{u}_k = (\lambda_k, \mu_k, \gamma_k)$. Each of the $N_L$ slots in the channel learning subframe is indexed by $n = 1, \ldots, N_L$. In each slot, the SU performs energy detection and generates a binary sensing result. Let $\zeta^{L}_{k,n}$ denote the sensing result generated in slot $n$ in the channel learning subframe of frame $k$. From the sequence of the sensing results, $\boldsymbol{\zeta}^{L}_{k} := \{\zeta^{L}_{k,1}, \ldots, \zeta^{L}_{k,N_L}\}$, the "channel learning algorithm" in the SU calculates the estimate of the channel usage pattern, $\hat{\mathbf{u}}_k$. In Section III, we will explain the channel learning algorithm in detail.

Let us explain the operation of an SU when it decides to access the current channel during a channel access subframe. Each of the $N_A$ slots in the channel access subframe is indexed by $n = 1, \ldots, N_A$. During a slot of the channel access subframe, the SU can either perform sensing or transmit user data. If it chooses to perform sensing in slot $n$, it obtains $\zeta^{A}_{k,n}$, which denotes the sensing result generated in slot $n$ in the channel access subframe of frame $k$. Otherwise, it transmits data packet(s) in slot $n$. For each slot $n$ in the channel access subframe, the "channel access algorithm" residing in the SU decides whether to perform sensing or data transmission, based on the sensing results from slot 1 to slot $(n - 1)$. The channel access algorithm also utilizes the channel usage pattern estimated in the preceding channel learning subframe. From this information, the channel access algorithm adjusts its parameters so that it can maximize the channel utilization while limiting the interference caused to the PU network to a tolerable level. We will explain the channel access algorithm in Section IV.

III. LEARNING CHANNEL USAGE PATTERN DURING CHANNEL LEARNING SUBFRAME

A. Hidden Markov Model for Channel Learning Subframe

We model a channel learning subframe as an HMM [19]. An HMM is described by state space, state transition probability, observation space, and observation probability. Consider regularly spaced discrete time instants (e.g., beginning of time slots). At any time instant, the system is in one of the states in the countable state space. The evolution of states over time follows a Markov process in accordance with the state transition probability. The state is hidden to the agent and can only be inferred from noisy observations. At each time, the agent receives an observation from the observation space according to the observation probability. For an HMM, the standard gradient method can be used to find the model parameters that are most likely given the received observation sequence [27]. We will use this technique to estimate the channel usage pattern. For more information on HMM, please refer to [19] and [27].

In our system model, the SU (i.e., the agent) obtains noisy sensing results about underlying PU activities. Therefore, PU activities in a channel can be modeled as hidden states, while sensing results are modeled as observations. Then, the state transition probabilities depend on the state transition rates in PU activity (i.e., $\lambda$ and $\mu$), and the observation probabilities are related to the detection probabilities, which in turn are determined mainly by the SNR of a PU signal (i.e., $\gamma$). This means that the state transition and observation probabilities are functions of the channel usage pattern. From the HMM, we can calculate the log-likelihood of the received sensing results, $\boldsymbol{\zeta}^{L}_{k}$, given the channel usage pattern, $\mathbf{u}$, that is, $\ln(\Pr[\boldsymbol{\zeta}^{L}_{k} \mid \mathbf{u}])$. To find the most likely channel usage pattern for the received sensing results, the SU updates the estimate of the channel usage pattern toward the gradient direction so that $\ln(\Pr[\boldsymbol{\zeta}^{L}_{k} \mid \mathbf{u}])$ increases in each iteration. We will explain the details of the algorithm later in this section.

To set up an HMM, we first define states and observations. As seen in Fig. 4, a state is defined for each slot to reflect the PU activities at the start and the end of the slot. Let $t = 0$ at the start of the channel learning subframe. Then, $\alpha_n$ denotes the PU activity at time $t = (n-1)T$ (i.e., at the start of slot $n$ or at the end of slot $(n-1)$). We have $\alpha_n = 1$ if the PU is active at $t = (n-1)T$, and $\alpha_n = 0$ otherwise. The state of slot $n$, which is denoted by $s_n$, is defined as the vector of the PU activities at the start and the end of slot $n$, i.e., $s_n := (\alpha_n, \alpha_{n+1})$. Then, $s_n$ is one of four possible states in the state space $\mathcal{S} := \{(0,0), (0,1), (1,0), (1,1)\}$. If we consider an HMM of length $N$, the sequence of states is given by $\mathbf{s} := \{s_1, \ldots, s_N\}$. We assume that a slot is short enough that the PU activity does not change more than once within a slot. Then, if the state is $(0,0)$ or $(1,1)$, the PU stays inactive or active, respectively, throughout the slot. On the other hand, if the state is $(0,1)$ or $(1,0)$, the PU activity changes once during the slot. The observation in slot $n$, which is from the observation space $\mathcal{O} := \{0, 1\}$, is denoted by $o_n$. The observation $o_n$ is equal to the sensing result from slot $n$. That is, if the current frame is $k$, we have $o_n = \zeta_{k,n}$. Let $\mathbf{o} := \{o_1, \ldots, o_N\}$ be the sequence of observations.

Now, we define the state transition and observation probabilities. Let $p^{i,j}_{l,m}$ denote the state transition probability from state $(l,m)$ to state $(i,j)$. That is, $p^{i,j}_{l,m} := \Pr[s_{n+1} = (i,j) \mid s_n = (l,m)]$. Since the PU activity at the end of a slot is the same as that at the start of the next slot, we have $p^{i,j}_{l,m} = 0$ for $m \neq i$. If $m = i$, then $p^{i,j}_{l,m}$ is equal to the probability that $\alpha_{n+1} = j$ given $\alpha_n = i$, i.e., $\Pr[\alpha_{n+1} = j \mid \alpha_n = i]$. Let $r_{i,j}$ denote $\Pr[\alpha_{n+1} = j \mid \alpha_n = i]$. If $\mathbf{u} = (\lambda, \mu, \gamma)$ is the channel usage pattern in the frame of interest, we can calculate $r_{0,0} = e^{-\lambda T}$, $r_{0,1} = 1 - e^{-\lambda T}$, $r_{1,0} = 1 - e^{-\mu T}$, and $r_{1,1} = e^{-\mu T}$. Therefore, we can calculate the state transition probability matrix as

\[
\mathbf{p} :=
\begin{bmatrix}
p^{0,0}_{0,0} & p^{0,0}_{0,1} & p^{0,0}_{1,0} & p^{0,0}_{1,1} \\
p^{0,1}_{0,0} & p^{0,1}_{0,1} & p^{0,1}_{1,0} & p^{0,1}_{1,1} \\
p^{1,0}_{0,0} & p^{1,0}_{0,1} & p^{1,0}_{1,0} & p^{1,0}_{1,1} \\
p^{1,1}_{0,0} & p^{1,1}_{0,1} & p^{1,1}_{1,0} & p^{1,1}_{1,1}
\end{bmatrix}
=
\begin{bmatrix}
e^{-\lambda T} & 0 & e^{-\lambda T} & 0 \\
1 - e^{-\lambda T} & 0 & 1 - e^{-\lambda T} & 0 \\
0 & 1 - e^{-\mu T} & 0 & 1 - e^{-\mu T} \\
0 & e^{-\mu T} & 0 & e^{-\mu T}
\end{bmatrix}. \tag{2}
\]
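As an illustration, the matrix in (2) can be assembled directly from $\lambda$, $\mu$, and $T$. The following Python sketch is not part of the original algorithm description: the function names and the numerical values are assumptions chosen only for illustration. Rows index the destination state $(i,j)$ and columns index the source state $(l,m)$, so each column should sum to one.

```python
import math

# State order used for both rows and columns: (0,0), (0,1), (1,0), (1,1).
STATES = [(0, 0), (0, 1), (1, 0), (1, 1)]

def slot_transition_probs(lam, mu, T):
    """Return r[i][j] = Pr[alpha_{n+1} = j | alpha_n = i] for a slot of length T."""
    return [[math.exp(-lam * T), 1.0 - math.exp(-lam * T)],
            [1.0 - math.exp(-mu * T), math.exp(-mu * T)]]

def transition_matrix(lam, mu, T):
    """Build p with p[row][col] = Pr[s_{n+1} = STATES[row] | s_n = STATES[col]]."""
    r = slot_transition_probs(lam, mu, T)
    p = [[0.0] * 4 for _ in range(4)]
    for col, (l, m) in enumerate(STATES):
        for row, (i, j) in enumerate(STATES):
            if m == i:  # the end of slot n must match the start of slot n+1
                p[row][col] = r[i][j]
    return p

# Illustrative values only (rates in 1/s, slot length in s).
p = transition_matrix(lam=200.0, mu=100.0, T=20e-6)
column_sums = [sum(p[row][col] for row in range(4)) for col in range(4)]
```

The zero pattern produced by the `m == i` test is the same as the one in (2).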

The initial state distribution is denoted by $\boldsymbol{\pi} := (\pi_{0,0}, \pi_{0,1}, \pi_{1,0}, \pi_{1,1})^T$, where $\pi_{i,j} := \Pr[s_1 = (i,j)]$. It is assumed that the initial state distribution is equal to the stationary state distribution. Therefore, we have

\[
\boldsymbol{\pi} = \left( \frac{r_{0,0}\, r_{1,0}}{r_{0,1} + r_{1,0}},\ \frac{r_{0,1}\, r_{1,0}}{r_{0,1} + r_{1,0}},\ \frac{r_{1,0}\, r_{0,1}}{r_{0,1} + r_{1,0}},\ \frac{r_{1,1}\, r_{0,1}}{r_{0,1} + r_{1,0}} \right)^T .
\]

Fig. 4. State transition in a subframe.

We define $q^m_{i,j}$ as the probability that the observation $o_n$ is $m$ given that the state $s_n$ is $(i,j)$. That is, $q^m_{i,j} := \Pr[o_n = m \mid s_n = (i,j)]$. Recall that $D(\rho)$ is the probability of detecting PU activity during a slot when the average SNR corresponding to a PU signal is $\rho$. If the state is $(0,0)$, the average SNR of a PU signal during the slot is 0, and therefore $q^1_{0,0} = D(0)$. In the case that the state is $(1,1)$, the average SNR during the slot is $\gamma$, since the SU receives a PU signal throughout the slot. Thus, we have $q^1_{1,1} = D(\gamma)$. On the other hand, when the state is $(1,0)$, the PU activity changes from active to inactive at some time point during the slot. If the channel becomes inactive after time $t$ from the start of the slot, the average SNR during the slot is $\gamma t / T$. Also, the probability density function (pdf) of the elapsed time until the PU activity changes is given as $\frac{\mu e^{-\mu t}}{1 - e^{-\mu T}}$. Therefore, we have

\[
q^1_{1,0} = \int_0^T \frac{\mu e^{-\mu t}}{1 - e^{-\mu T}}\, D(\gamma t / T)\, dt .
\]

We can also calculate

\[
q^1_{0,1} = \int_0^T \frac{\lambda e^{-\lambda t}}{1 - e^{-\lambda T}}\, D(\gamma - \gamma t / T)\, dt
\]

in a similar way. To simplify the HMM, we introduce

\[
\Upsilon(\gamma) := \frac{1}{T} \int_0^T D(\gamma t / T)\, dt .
\]

When $\lambda$ and $\mu$ are sufficiently small, both pdfs above approach the uniform density $1/T$ on $[0, T]$, so $q^1_{1,0}$ and $q^1_{0,1}$ can be well approximated by $\Upsilon(\gamma)$. From this approximation, the observation probability matrix is given as

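\[
\mathbf{q} :=
\begin{bmatrix}
q^0_{0,0} & q^0_{0,1} & q^0_{1,0} & q^0_{1,1} \\
q^1_{0,0} & q^1_{0,1} & q^1_{1,0} & q^1_{1,1}
\end{bmatrix}
=
\begin{bmatrix}
1 - D(0) & 1 - \Upsilon(\gamma) & 1 - \Upsilon(\gamma) & 1 - D(\gamma) \\
D(0) & \Upsilon(\gamma) & \Upsilon(\gamma) & D(\gamma)
\end{bmatrix}. \tag{3}
\]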
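The matrix in (3) can be evaluated numerically once a detection function $D(\cdot)$ is available. The sketch below is only an illustration under stated assumptions: `toy_detection_prob` is a hypothetical stand-in for $D(\rho)$ and is not the energy-detector expression used in this paper, and the midpoint rule is merely one convenient way to approximate $\Upsilon(\gamma)$. The substitution $x = t/T$ removes $T$ from the integral.

```python
import math

def toy_detection_prob(rho):
    """Hypothetical stand-in for D(rho), NOT the paper's energy-detector formula.
    It only has the right qualitative shape: increasing in the average SNR rho."""
    return 1.0 - 0.9 * math.exp(-2.0 * rho)

def upsilon(gamma, detect, steps=1000):
    """Midpoint-rule estimate of (1/T) * int_0^T D(gamma*t/T) dt = int_0^1 D(gamma*x) dx."""
    h = 1.0 / steps
    return sum(detect(gamma * (k + 0.5) * h) for k in range(steps)) * h

def observation_matrix(gamma, detect):
    """Rows: observation m = 0, 1. Columns: states (0,0), (0,1), (1,0), (1,1)."""
    ups = upsilon(gamma, detect)
    detect_probs = [detect(0.0), ups, ups, detect(gamma)]
    return [[1.0 - d for d in detect_probs], detect_probs]

# Illustrative SNR value on a linear scale.
q = observation_matrix(gamma=0.5, detect=toy_detection_prob)
```

For any increasing $D(\cdot)$, the detection probability for the two mixed states lies between $D(0)$ and $D(\gamma)$, which the second row of `q` reflects.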

Given the HMM defined by the state transition and observation probabilities, the problem at hand is a parameter estimation problem in which the true channel usage pattern is estimated from the received sensing results (i.e., the observations $\mathbf{o} = \{o_1, \ldots, o_N\}$). The true channel usage pattern is denoted by $\mathbf{u}^* = (\lambda^*, \mu^*, \gamma^*)$.

B. Equivalence, Identifiability, and Consistency of Proposed Hidden Markov Model

The problem of parameter estimation in the proposed HMM is not trivial, since the SU can only see the observations, not the underlying states. For example, when the observation changes, the SU does not know whether the change is caused by a PU state transition or a channel sensing error. Thus, one may suspect that a high sensing error rate can be misinterpreted as a high PU transition rate, leading to incorrect estimation of the true channel usage pattern. Fortunately, the PU state transition and the channel sensing error induce different statistical characteristics of the observation sequence, and the true channel usage pattern is identifiable from the standpoint of the SU under some mild conditions.

Let us explain the equivalence and the identifiability of HMMs. Two HMMs with different parameters, $\mathbf{u}$ and $\bar{\mathbf{u}}$, are said to be equivalent if and only if they generate the same stochastic observation sequence, i.e.,

\[
\Pr[\mathbf{o} = \mathbf{x} \mid \mathbf{u}] = \Pr[\mathbf{o} = \mathbf{x} \mid \bar{\mathbf{u}}], \quad \forall N = 1, 2, \ldots, \ \text{and}\ \forall x_n \in \{0, 1\}\ \text{for}\ n = 1, \ldots, N \tag{4}
\]

where $\mathbf{x} := \{x_1, \ldots, x_N\}$. With a slight abuse of notation, let $\pi_{i,j}(\mathbf{u}) := \Pr[s_1 = (i,j) \mid \mathbf{u}]$, $r_{i,j}(\mathbf{u}) := \Pr[\alpha_{n+1} = j \mid \alpha_n = i, \mathbf{u}]$, and $q^m_{i,j}(\mathbf{u}) := \Pr[o_n = m \mid s_n = (i,j), \mathbf{u}]$ denote the initial, transition, and observation probabilities given the channel usage pattern $\mathbf{u}$. We can calculate

\[
\Pr[\mathbf{o} = \mathbf{x} \mid \mathbf{u}] = \sum_{y_1, \ldots, y_{N+1}} \pi_{y_1, y_2}(\mathbf{u}) \prod_{n=2}^{N} r_{y_n, y_{n+1}}(\mathbf{u}) \prod_{n=1}^{N} q^{x_n}_{y_n, y_{n+1}}(\mathbf{u}) \tag{5}
\]

where $y_n \in \{0, 1\}$ for all $n$. If two HMMs are equivalent, it is impossible to distinguish them based on the observations.
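To make (5) concrete, the sketch below evaluates it in two ways: literally, by enumerating all $2^{N+1}$ PU activity paths, and with a forward recursion that needs only $O(N)$ operations. This is an illustrative example rather than the implementation used in this paper; the numerical values of $r_{i,j}$ and of the detection probabilities are assumptions, and the array layouts `pi[i][j]`, `r[i][j]`, and `q[m][i][j]` are conventions of this sketch.

```python
import itertools

def likelihood_brute_force(x, pi, r, q):
    """Evaluate (5) literally by summing over all paths y_1, ..., y_{N+1}."""
    N = len(x)
    total = 0.0
    for y in itertools.product((0, 1), repeat=N + 1):
        term = pi[y[0]][y[1]]                 # pi_{y_1, y_2}
        for n in range(1, N):                 # product over n = 2..N (1-based)
            term *= r[y[n]][y[n + 1]]
        for n in range(N):                    # product over n = 1..N (1-based)
            term *= q[x[n]][y[n]][y[n + 1]]
        total += term
    return total

def likelihood_forward(x, pi, r, q):
    """Same quantity via f[j] = Pr[x_1..x_n, alpha_{n+1} = j]."""
    f = [sum(pi[i][j] * q[x[0]][i][j] for i in (0, 1)) for j in (0, 1)]
    for m in x[1:]:
        f = [sum(f[i] * r[i][j] * q[m][i][j] for i in (0, 1)) for j in (0, 1)]
    return sum(f)

# Illustrative parameters (assumed, not taken from the paper).
r = [[0.9, 0.1], [0.2, 0.8]]
p_idle = r[1][0] / (r[0][1] + r[1][0])        # stationary Pr[alpha = 0]
pi = [[p_idle * r[0][0], p_idle * r[0][1]],
      [(1 - p_idle) * r[1][0], (1 - p_idle) * r[1][1]]]
detect = [[0.1, 0.4], [0.4, 0.7]]             # q^1_{i,j}
q = [[[1 - d for d in row] for row in detect], detect]
x = [0, 1, 1, 0, 1]
```

Both functions return the same value for any observation sequence, and the likelihoods of all sequences of a fixed length sum to one.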

To test the equivalence of two HMMs, we can apply the algorithm proposed in [28] for the aggregated Markov process (AMP). The AMP is a class of HMM in which an observation is a deterministic function of a state. Our HMM can be converted to an AMP. Unlike the state of the HMM, the state of the corresponding AMP is a vector composed of a sensing result and a PU state, that is, $s_n = (o_n, \alpha_{n+1})$. The transition probability matrix of the AMP is a 4-by-4 matrix such that

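\[
\mathbf{h} :=
\begin{bmatrix}
\mathbf{h}_0 & \mathbf{h}_0 \\
\mathbf{h}_1 & \mathbf{h}_1
\end{bmatrix},
\quad \text{where} \quad
\mathbf{h}_m :=
\begin{bmatrix}
r_{0,0}\, q^m_{0,0} & r_{1,0}\, q^m_{1,0} \\
r_{0,1}\, q^m_{0,1} & r_{1,1}\, q^m_{1,1}
\end{bmatrix},
\quad \text{for } m = 0, 1. \tag{6}
\]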

The initial state distribution is equal to the stationary state distribution. Let $f$ denote the deterministic function mapping the state to the observation. We have $f((0,0)) = 0$, $f((0,1)) = 0$, $f((1,0)) = 1$, and $f((1,1)) = 1$. We can easily verify that this AMP is exactly the same as the original HMM. The following theorem states the condition for two AMPs to be equivalent.
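The construction in (6) can be sketched in a few lines. The example below is illustrative only: the numerical values are assumptions, and the helper `identifiability_indicator`, which evaluates the scalar $\mathbf{1}^T \mathbf{h}_0 \boldsymbol{\tau}$ with $\boldsymbol{\tau} = (1, -1)^T$ that appears in the theorems of this subsection, is a name introduced here for convenience.

```python
def amp_blocks(r, q):
    """Return [h_0, h_1] with h_m[j][i] = r[i][j] * q[m][i][j], as in (6)."""
    return [[[r[i][j] * q[m][i][j] for i in (0, 1)] for j in (0, 1)]
            for m in (0, 1)]

def amp_matrix(r, q):
    """Assemble the 4-by-4 matrix h = [[h_0, h_0], [h_1, h_1]]."""
    h0, h1 = amp_blocks(r, q)
    return [h0[0] * 2, h0[1] * 2, h1[0] * 2, h1[1] * 2]

def identifiability_indicator(r, q):
    """Return 1^T h_0 tau with tau = (1, -1)^T (difference of column sums of h_0)."""
    h0 = amp_blocks(r, q)[0]
    return (h0[0][0] + h0[1][0]) - (h0[0][1] + h0[1][1])

# Illustrative parameters (assumed, not taken from the paper).
r = [[0.9, 0.1], [0.2, 0.8]]
detect = [[0.1, 0.4], [0.4, 0.7]]             # q^1_{i,j}
q = [[[1 - d for d in row] for row in detect], detect]

h = amp_matrix(r, q)
indicator = identifiability_indicator(r, q)
```

The two column sums of $\mathbf{h}_0$ are the probabilities of observing 0 given an inactive or active PU at the start of the slot, so the indicator vanishes exactly when the observation carries no information about that PU activity.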

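Theorem 1 (Equivalence of two AMPs). The AMP with the transition probability matrix $\mathbf{h}$ is equivalent to the AMP with the transition probability matrix $\bar{\mathbf{h}}$ if and only if the following conditions are met.

• If $\mathbf{1}^T \mathbf{h}_0 \boldsymbol{\tau} = 0$ and $\mathbf{1}^T \bar{\mathbf{h}}_0 \boldsymbol{\tau} = 0$, the following equality holds: $\mathbf{1}^T \mathbf{h}_0 = \mathbf{1}^T \bar{\mathbf{h}}_0$.

• Otherwise, there exists a 2-by-2 matrix $\mathbf{X}$ such that $\mathbf{1}^T \mathbf{X} = \mathbf{1}^T$, $\mathbf{X} \mathbf{h}_0 = \bar{\mathbf{h}}_0 \mathbf{X}$, and $\mathbf{X} \mathbf{h}_1 = \bar{\mathbf{h}}_1 \mathbf{X}$,

where $\boldsymbol{\tau} = (1, -1)^T$ and $\mathbf{1}$ is a column vector of all ones.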

Proof: See Appendix A for the proof.

An HMM with the true parameter $\mathbf{u}^* \in \mathcal{U}$ is said to be identifiable if and only if, for all $\mathbf{u} \in \mathcal{U}$ such that $\mathbf{u} \neq \mathbf{u}^*$, the HMM with the parameter $\mathbf{u}$ is not equivalent to the HMM with the true parameter $\mathbf{u}^*$. We can estimate the true parameter of an HMM from the observations only if the HMM is identifiable. In the following theorem, we provide a condition for the AMP corresponding to an HMM to be identifiable.

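Theorem 2 (Identifiability of an AMP). The AMP with the transition probability matrix $\mathbf{h}$ is identifiable if $\mathbf{1}^T \mathbf{h}_0 \boldsymbol{\tau} \neq 0$ and there does not exist any 2-by-2 matrix $\mathbf{X} \neq \mathbf{I}$ and $\gamma \geq 0$ that satisfies

\[
\mathbf{1}^T \mathbf{X} = \mathbf{1}^T \quad \text{and} \quad \mathbf{F}(\gamma) \circ (\mathbf{X} \mathbf{h}_0 \mathbf{X}^{-1}) = \mathbf{G}(\gamma) \circ (\mathbf{X} \mathbf{h}_1 \mathbf{X}^{-1}) \tag{7}
\]

where $\mathbf{I}$ is the identity matrix, the notation $\circ$ is the entrywise (Hadamard) product, and $\mathbf{F}(\gamma)$ and $\mathbf{G}(\gamma)$ are 2-by-2 matrices such that

\[
\mathbf{F}(\gamma) :=
\begin{bmatrix}
D(0) & \Upsilon(\gamma) \\
\Upsilon(\gamma) & D(\gamma)
\end{bmatrix}
\quad \text{and} \quad
\mathbf{G}(\gamma) :=
\begin{bmatrix}
1 - D(0) & 1 - \Upsilon(\gamma) \\
1 - \Upsilon(\gamma) & 1 - D(\gamma)
\end{bmatrix}. \tag{8}
\]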

Proof: See Appendix B for the proof.

Roughly speaking, $\mathbf{X}$ and $\gamma$ satisfying the condition in (7) do not exist in general, since the condition involves five variables (i.e., $\gamma$ and four entries in $\mathbf{X}$) while there are six equations. Although it is hard to make a more precise statement, we can say that the proposed HMM is identifiable in most cases if $\mathbf{1}^T \mathbf{h}_0 \boldsymbol{\tau} \neq 0$ is satisfied.

As long as an HMM is identifiable, maximum likelihood (ML) estimation can find the true channel usage pattern. Let us define $\Xi(\mathbf{o}; \mathbf{u}) := \ln(\Pr[\mathbf{o} \mid \mathbf{u}])$ as the log-likelihood of the observation $\mathbf{o}$ given the channel usage pattern $\mathbf{u}$. The ML estimator of the true channel usage pattern $\mathbf{u}^*$ is obtained from

\[
\hat{\mathbf{u}} = \arg\max_{\mathbf{u} \in \mathcal{U}} \Xi(\mathbf{o}; \mathbf{u}). \tag{9}
\]

The ML estimator $\hat{\mathbf{u}}$ of $\mathbf{u}^*$ is said to be strongly consistent when $\hat{\mathbf{u}}$ converges almost surely to $\mathbf{u}^*$ as the length of the observations, $N$, goes to infinity. In [29], it was proven that strong consistency holds if an HMM with the true parameter $\mathbf{u}^*$ is identifiable. In our problem, strong consistency means that the ML estimator in (9) can estimate the true channel usage pattern $\mathbf{u}^*$ in $\mathcal{U}$ if the channel learning subframe is long enough.

C. Gradient Method for Maximum Likelihood Estimation of Channel Usage Pattern

For the given observation, the ML estimator in (9) can be found by using either the expectation-maximization (EM) algorithm or the standard gradient method [19]. In this paper, we adopt the gradient method since the EM algorithm can only be used in the case of the usual parametrization, whereas the gradient method can easily be modified so that it recursively updates the parameter. Unfortunately, the gradient method as well as the EM algorithm can only find a local optimum since $\Xi(\mathbf{o}; \mathbf{u})$ is not a convex function. Algorithms that globally maximize the log-likelihood function of a general HMM are not yet known [27].

In each iteration, the gradient method updates the estimate of the channel usage pattern in the gradient direction of the log-likelihood function $\Xi(\mathbf{o}; \mathbf{u})$. Let $\hat{\mathbf{u}}^{(j)}$ denote the estimate of the channel usage pattern at the $j$th iteration. The initial estimate $\hat{\mathbf{u}}^{(0)}$ can be set to an arbitrary channel usage pattern in $\mathcal{U}$. At the $j$th iteration, the gradient method updates the estimate as follows:

\[
\hat{\mathbf{u}}^{(j)} = \Theta_{\mathcal{U}}\big[\hat{\mathbf{u}}^{(j-1)} + \sigma^{(j)} \cdot \nabla \Xi(\mathbf{o}; \hat{\mathbf{u}}^{(j-1)})\big] \tag{10}
\]

where $\sigma^{(j)}$ is a step size, $\Theta_{\mathcal{U}}[\cdot]$ is the projection onto the set $\mathcal{U}$, and $\nabla \Xi(\mathbf{o}; \mathbf{u})$ is the gradient of $\Xi(\mathbf{o}; \mathbf{u})$ such that

\[
\nabla \Xi(\mathbf{o}; \mathbf{u}) := \left( \frac{\partial \Xi}{\partial \lambda}(\mathbf{o}; \mathbf{u}),\ \frac{\partial \Xi}{\partial \mu}(\mathbf{o}; \mathbf{u}),\ \frac{\partial \Xi}{\partial \gamma}(\mathbf{o}; \mathbf{u}) \right). \tag{11}
\]

The iteration stops when $\hat{\mathbf{u}}^{(j)}$ has converged sufficiently to a certain channel usage pattern.
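The structure of the update in (10) can be illustrated with a generic projected gradient ascent routine. The sketch below makes several assumptions that are not part of this paper: the set $\mathcal{U}$ is taken to be a box so that the projection reduces to clipping, the gradient is approximated by central differences rather than by analytic partial derivatives, and `toy_objective` is a hypothetical concave function standing in for the log-likelihood.

```python
def project_onto_box(u, lower, upper):
    """Euclidean projection onto a box; one simple choice for the set U."""
    return [min(max(v, lo), hi) for v, lo, hi in zip(u, lower, upper)]

def numerical_gradient(f, u, eps=1e-6):
    """Central-difference gradient; a stand-in for analytic derivatives."""
    grad = []
    for k in range(len(u)):
        up = list(u)
        dn = list(u)
        up[k] += eps
        dn[k] -= eps
        grad.append((f(up) - f(dn)) / (2.0 * eps))
    return grad

def projected_gradient_ascent(f, u0, lower, upper, step, tol=1e-9, max_iter=10000):
    """Iterate the update in (10) until the estimate moves by less than tol."""
    u = project_onto_box(u0, lower, upper)
    for _ in range(max_iter):
        g = numerical_gradient(f, u)
        u_next = project_onto_box([v + step * gk for v, gk in zip(u, g)],
                                  lower, upper)
        if max(abs(a - b) for a, b in zip(u_next, u)) < tol:
            return u_next
        u = u_next
    return u

def toy_objective(u):
    """Hypothetical concave objective; its unconstrained maximizer is (0.3, 2.0, -1.0)."""
    return -((u[0] - 0.3) ** 2 + (u[1] - 2.0) ** 2 + (u[2] + 1.0) ** 2)

u_hat = projected_gradient_ascent(toy_objective, [0.5, 0.5, 0.5],
                                  lower=[0.0, 0.0, 0.0],
                                  upper=[1.0, 1.0, 1.0], step=0.1)
```

In this toy run the second and third coordinates end on the boundary of the box, which shows the role of the projection $\Theta_{\mathcal{U}}[\cdot]$ when the unconstrained maximizer lies outside the feasible set.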

The gradient in (11) can be derived by calculating the partial derivatives of $\Xi(\mathbf{o}; \mathbf{u})$ with respect to $\lambda$, $\mu$, and $\gamma$. In Appendix C, we calculate the partial derivatives. We can calculate $\phi(\mathbf{o}; \mathbf{u})$, $\omega_i(\mathbf{o}; \mathbf{u})$, $\chi_{i,j}(\mathbf{o}; \mathbf{u})$, and $\psi^m_{i,j}(\mathbf{o}; \mathbf{u})$ by using the forward-backward method in [19].

D. Recursive Algorithm for Maximum Likelihood Estimation

The above-mentioned gradient method has to update the channel usage pattern multiple times within a frame, which can be computationally complex. To reduce the complexity, we can alternatively adopt the recursive algorithm [20]. The recursive algorithm updates the estimate of the channel usage pattern only once in each frame $k$ on the basis of its sensing result $\boldsymbol{\zeta}^L_k$. Over multiple frames, the estimate gradually converges to the true channel usage pattern. If $\hat{\mathbf{u}}_k$ denotes the estimate of the channel usage pattern in frame $k$, the recursive algorithm updates the estimate as

\[
\hat{\mathbf{u}}_k = \Theta_{\mathcal{U}}\big[\hat{\mathbf{u}}_{k-1} + \sigma_k \cdot \nabla \Xi(\boldsymbol{\zeta}^L_k; \hat{\mathbf{u}}_{k-1})\big] \tag{12}
\]

where $\sigma_k$ is the step size for frame $k$.

The recursive algorithm minimizes the following Kullback-Leibler divergence [20]:

\[
K(\mathbf{u}) = E_{\mathbf{u}^*}\left[ \ln \frac{\Pr[\mathbf{o} \mid \mathbf{u}^*]}{\Pr[\mathbf{o} \mid \mathbf{u}]} \right]. \tag{13}
\]

If the HMM with the true parameter $\mathbf{u}^*$ is identifiable, the Kullback-Leibler divergence has a unique minimizer at $\mathbf{u}^*$. In addition, $-\nabla \Xi(\boldsymbol{\zeta}^L_k; \hat{\mathbf{u}}_{k-1})$ in (12) is the stochastic gradient of the Kullback-Leibler divergence. Therefore, the recursive algorithm in (12) can estimate the true channel usage pattern by minimizing the Kullback-Leibler divergence. Similar to the gradient method in (10) for the ML estimator, the recursive algorithm can only find a local minimum since the Kullback-Leibler divergence is generally not a convex function. However, if the initial estimate is close enough to $\mathbf{u}^*$, we can say that $\hat{\mathbf{u}}_k$ converges to $\mathbf{u}^*$ with high probability.

E. Rationale Behind the Proposed Frame Structure

In the proposed frame structure, we have assigned the channel learning subframe dedicated to the estimation of the channel usage pattern, instead of just embedding the estimation algorithm in the traditional listen-before-talk policy and making use of the sensing results generated for data transmission. In this section, we will explain the advantages of the proposed structure over the latter strategy.

We can easily adapt the proposed HMM (AMP) so that it can also be applied to the listen-before-talk policy. The listen-before-talk policy senses the channel every $J$ slots and uses the rest of the slots for data transmission. Without loss of generality, sensing slot $n$ starts at time $t = (n-1)JT$ and ends at time $t = (n-1)JT + T$. Let $\alpha_{n+1}$ denote the PU activity at time $t = (n-1)JT + T$ and let $o_n$ denote the sensing result from sensing slot $n$. Then, we can define the transition probability $r_{i,j}$ and the observation probability $q^m_{i,j}$ in the same way as in the original HMM.

We will show that the estimation of the channel usage pattern becomes more difficult as $J$ increases. As $J$ increases, the PU activity $\alpha_{n+1}$ becomes less dependent upon the previous PU activity $\alpha_n$. Therefore, the transition probability $r_{i,j}$ converges to the stationary probability as $J$ goes to infinity. That is, $r_{i,0} \to \mu/(\lambda + \mu)$ and $r_{i,1} \to \lambda/(\lambda + \mu)$ for $i = 0, 1$ as $J \to \infty$. Similarly, the observation probability also converges as $q^m_{1,j} - q^m_{0,j} \to 0$ for $j = 0, 1$ and $m = 0, 1$ as $J \to \infty$. From (6), we can see that $\mathbf{1}^T \mathbf{h}_0 \boldsymbol{\tau} \to 0$ as $J \to \infty$. Recall that, according to Theorem 2, an AMP is unidentifiable if $\mathbf{1}^T \mathbf{h}_0 \boldsymbol{\tau} = 0$. Therefore, we can say that an AMP becomes less identifiable as $J$ increases. Roughly speaking, this is because, when $J$ is large, the transition in PU activity looks similar to the sensing error due to statistical independence between the PU activities at consecutive sensing slots.
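The loss of dependence between consecutive sensing slots can be illustrated numerically. The sketch below uses the exact transition probabilities of a two-state continuous-time Markov chain with rates $\lambda$ and $\mu$ over an interval of length $JT$; this is an assumption made for the illustration, since the per-slot expressions given earlier for $r_{i,j}$ rely on at most one change per slot and therefore do not apply to long intervals. The numerical values are also illustrative.

```python
import math

def activity_transition(lam, mu, t):
    """Return r[i][j] = Pr[alpha(t) = j | alpha(0) = i] for a two-state
    continuous-time Markov chain with rates lam (0 -> 1) and mu (1 -> 0)."""
    decay = math.exp(-(lam + mu) * t)
    p_active = lam / (lam + mu)          # stationary probability of an active PU
    r01 = p_active * (1.0 - decay)
    r11 = p_active + (1.0 - p_active) * decay
    return [[1.0 - r01, r01], [1.0 - r11, r11]]

lam, mu, T = 200.0, 100.0, 20e-6         # illustrative: rates in 1/s, T in s
for J in (1, 10, 100, 1000, 10000):
    r = activity_transition(lam, mu, J * T)
    gap = r[1][1] - r[0][1]              # dependence of alpha_{n+1} on alpha_n
    print(J, round(r[0][1], 4), round(r[1][1], 4), round(gap, 4))
```

The printed gap equals $e^{-(\lambda + \mu) J T}$ and shrinks toward zero as $J$ grows, while both $r_{0,1}$ and $r_{1,1}$ approach $\lambda/(\lambda + \mu)$.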

From this observation, we can conclude that the proposed channel learning subframe (i.e., $J = 1$) performs better than the estimation algorithm used in the listen-before-talk policy (i.e., $J > 1$) and is capable of estimating channel usage patterns with high transition rates.

IV. DATA TRANSMISSION DURING CHANNEL ACCESS SUBFRAME

A. Partially Observable Markov Decision Process Model for Channel Access Subframe

During a channel access subframe, the SU exploits spectrum opportunities to transmit its own data. The channel access algorithm is responsible for transmitting user data while limiting the probability of collision with a PU. This algorithm should be able to cope with sensing errors. At the same time, it should reduce the time wasted on channel sensing as much as possible to maximize channel utilization. The proposed algorithm adopts a strategy different from the traditional listen-before-talk policy. First, the algorithm combines the most recent sensing result with previous sensing results to extract reliable information from erroneous sensing results. Second, the algorithm adaptively decides whether to perform sensing or transmit user data in each time slot to prevent unnecessary sensing [30]. We devise an algorithm that accomplishes these tasks by using a POMDP framework [24]. In addition, the channel access algorithm should have correct knowledge of the current channel usage pattern of the PU so that it can properly configure the parameters for channel access. Therefore, the algorithm makes use of the channel usage pattern estimated in the preceding channel learning subframe.

To design the channel access algorithm, we model the channel access subframe as a POMDP [24], [31]. In a POMDP model, similar to an HMM, the agent only receives probabilistic observations, while the states are hidden from the agent. However, unlike in an HMM, the agent does not only receive observations passively, but also takes actions to exert influence on the system. The action taken by the agent affects the state transition and observation probabilities. Moreover, the agent acquires a reward according to the action. At each time point, the agent takes into account the observations received until then to choose the action that is expected to return the maximum reward. In our model, the agent (i.e., the SU) chooses between two actions: sensing and data transmission. The reward depends on whether data transmission is successful or results in a collision with PU traffic.

We need to define the states, the actions, and the observations for our model. The definition of a state is the same as that in the HMM. Thus, $s_n$ denotes the state of slot $n$ during the channel access subframe, which represents the PU activity at the start and the end of slot $n$. Let $a_n$ denote the action in slot $n$. If the SU opts to transmit data in slot $n$, we have $a_n = 1$; if it chooses to sense during slot $n$, we have $a_n = 0$. We define $\mathcal{A} := \{0, 1\}$ as the action space. The observation is also similar to that in the HMM, except for the case in which the SU does not perform sensing because it is transmitting data. If the SU performs sensing during slot $n$, i.e., if $a_n = 0$, the observation $o_n$ is equal to the sensing result, $\zeta^A_{k,n}$. For slot $n$ with $a_n = 1$, the observation $o_n$ is a null observation, $\emptyset$. Hence, the observation space for a channel access subframe is $\mathcal{O} := \{\emptyset, 0, 1\}$.

The state transition and observation probabilities are calculated from the channel usage pattern estimated in the channel learning subframe. In our model, an action does not affect the state transition probabilities. The state transition probabilities in the POMDP are the same as those in the HMM. That is, we use $p^{i,j}_{l,m}$ to denote the state transition probability from state $(l,m)$ to state $(i,j)$, and calculate it from the state transition probability matrix (2) by substituting $\lambda$ and $\mu$ with $\hat{\lambda}_k$ and $\hat{\mu}_k$, respectively. Unlike in the HMM, the observation probabilities in the POMDP depend on the action, since the SU receives a null observation when it chooses to transmit data. Let $q^m_{i,j}(a)$ denote the observation probability such that $o_n = m$ given $s_n = (i,j)$ and $a_n = a$, i.e., $q^m_{i,j}(a) := \Pr[o_n = m \mid s_n = (i,j), a_n = a]$. If the action is sensing, i.e., if $a = 0$, the observation probability $q^m_{i,j}(a)$ is equal to $q^m_{i,j}$ of the HMM for $(i,j) \in \mathcal{S}$ and $m = 0, 1$. Therefore, these observation probabilities can be derived from the observation probability matrix (3) by using the estimate of the channel usage pattern, $\hat{\mathbf{u}}_k$. In addition, we have $q^{\emptyset}_{i,j}(0) = 0$, $q^{\emptyset}_{i,j}(1) = 1$, $q^0_{i,j}(1) = 0$, and $q^1_{i,j}(1) = 0$.

Let us explain the reward model. First, we define two performance measures: channel utilization and collision probability. The channel utilization is defined as the probability of successful data transmission. Data transmission is successful when the SU transmits data (i.e., $a_n = 1$) in a slot during which there is no PU activity (i.e., $s_n = (0,0)$). Then, the channel utilization is $\sum_{n=1}^{N_A} \Pr[s_n = (0,0), a_n = 1] / N_A$. We define the collision probability as the probability that the PU is active (i.e., $s_n \neq (0,0)$) when the SU attempts to transmit data (i.e., $a_n = 1$). Formally, the collision probability is defined as $C := \big(\sum_{n=1}^{N_A} \Pr[s_n \neq (0,0), a_n = 1]\big) / \big(\sum_{n=1}^{N_A} \Pr[a_n = 1]\big)$. We maximize the channel utilization while limiting the collision probability as follows:

\[
\begin{aligned}
\max \quad & \frac{\sum_{n=1}^{N_A} \Pr[s_n = (0,0), a_n = 1]}{N_A} \\
\text{s.t.} \quad & C = \frac{\sum_{n=1}^{N_A} \Pr[s_n \neq (0,0), a_n = 1]}{\sum_{n=1}^{N_A} \Pr[a_n = 1]} \leq C_{\lim}
\end{aligned} \tag{14}
\]

where $C_{\lim}$ denotes the collision probability limit. We relax the constraint by applying the Lagrange multiplier $\nu$ to it. Then, the optimization problem reduces to

\[
\max \sum_{n=1}^{N_A} E[R(s_n, a_n)] \tag{15}
\]

where $R(s, a)$ is the reward for a given state $s$ and action $a$, such that

\[
R(s, a) =
\begin{cases}
\nu \cdot C_{\lim} + 1/N_A, & \text{if } s = (0,0) \text{ and } a = 1 \\
\nu \cdot C_{\lim} - \nu, & \text{if } s \neq (0,0) \text{ and } a = 1 \\
0, & \text{otherwise.}
\end{cases} \tag{16}
\]
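The step from (14) to (16) can be made explicit. The derivation below is a sketch that uses only the definitions in (14): the constraint is rewritten as $C_{\lim} \sum_n \Pr[a_n = 1] - \sum_n \Pr[s_n \neq (0,0), a_n = 1] \geq 0$, weighted by $\nu$, and added to the objective, after which $\Pr[a_n = 1]$ is split according to whether $s_n = (0,0)$.

```latex
\begin{align*}
&\frac{1}{N_A} \sum_{n=1}^{N_A} \Pr[s_n = (0,0), a_n = 1]
  + \nu \Big( C_{\lim} \sum_{n=1}^{N_A} \Pr[a_n = 1]
  - \sum_{n=1}^{N_A} \Pr[s_n \neq (0,0), a_n = 1] \Big) \\
&\quad = \sum_{n=1}^{N_A} \Big\{ \big( \nu C_{\lim} + 1/N_A \big) \Pr[s_n = (0,0), a_n = 1]
  + \big( \nu C_{\lim} - \nu \big) \Pr[s_n \neq (0,0), a_n = 1] \Big\} \\
&\quad = \sum_{n=1}^{N_A} E[R(s_n, a_n)],
\end{align*}
% using Pr[a_n = 1] = Pr[s_n = (0,0), a_n = 1] + Pr[s_n != (0,0), a_n = 1].
```

The two coefficients in the middle line are the two nonzero cases of $R(s, a)$ in (16), and slots with $a_n = 0$ contribute nothing.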

B. Channel Access Algorithm

We now design the channel access algorithm that selects an action in each slot in order to maximize the objective function in (15). To decide an action for slot $n$, the algorithm considers the observations obtained until slot $n$, i.e., $o_1, \ldots, o_{n-1}$. Instead of directly using the observations, the algorithm calculates the belief vector and uses it to decide an action. It is known that the belief vector summarizes all the information required to make an optimal decision [31]. Let $\boldsymbol{\pi}^n := (\pi^n_{0,0}, \pi^n_{0,1}, \pi^n_{1,0}, \pi^n_{1,1})$ denote the belief vector for slot $n$. In the belief vector, $\pi^n_{i,j}$ represents the belief that the state in slot $n$ is $(i,j)$ given $a_1, \ldots, a_{n-1}$ and $o_1, \ldots, o_{n-1}$. That is, $\pi^n_{i,j} := \Pr[s_n = (i,j) \mid \boldsymbol{\pi}^1, a_1, \ldots, a_{n-1}, o_1, \ldots, o_{n-1}]$. Let $\Pi$ denote the domain of a belief vector, i.e., $\Pi := \{(\pi_{i,j})_{(i,j) \in \mathcal{S}} \mid \sum_{(i,j) \in \mathcal{S}} \pi_{i,j} \leq 1 \text{ and } \pi_{i,j} \geq 0 \text{ for } (i,j) \in \mathcal{S}\}$. The initial belief vector $\boldsymbol{\pi}^1$ is the stationary distribution of the hidden process. The belief vector in slot $n$ is updated from the belief vector in slot $(n-1)$ as follows:

\[
\pi^n_{i,j} = \eta_{i,j}(\boldsymbol{\pi}^{n-1}; a_{n-1}, o_{n-1}), \quad \text{for } (i,j) \in \mathcal{S} \tag{17}
\]

where

\[
\eta_{i,j}(\boldsymbol{\pi}; a, o) = \frac{\sum_{(l,m) \in \mathcal{S}} p^{i,j}_{l,m} \cdot q^o_{l,m}(a) \cdot \pi_{l,m}}{\theta(\boldsymbol{\pi}; a, o)} \tag{18}
\]

and

\[
\theta(\boldsymbol{\pi}; a, o) = \sum_{(i,j) \in \mathcal{S}} \sum_{(l,m) \in \mathcal{S}} p^{i,j}_{l,m} \cdot q^o_{l,m}(a) \cdot \pi_{l,m}. \tag{19}
\]

Note that the update of the belief vector is slightly different from the one in [31], since only the observations up to the previous slot are available.
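The belief update in (17)-(19) can be sketched as follows. The code is an illustration under stated assumptions rather than a reference implementation: the transition matrix `p`, the sensing matrix `q`, and the initial belief are illustrative values, `NULL` is a name introduced here for the null observation, and states are indexed in the order $(0,0), (0,1), (1,0), (1,1)$.

```python
NULL = None  # stands for the null observation received when transmitting

def obs_prob(q, state_index, action, obs):
    """q^o_{l,m}(a): sensing uses the matrix q[m][k]; transmitting yields NULL."""
    if action == 1:
        return 1.0 if obs is NULL else 0.0
    return 0.0 if obs is NULL else q[obs][state_index]

def belief_update(belief, action, obs, p, q):
    """One step of (17)-(19): weight by the observation, propagate, normalize.
    Assumes (action, obs) is a consistent pair, so the normalizer is positive."""
    weighted = [obs_prob(q, k, action, obs) * belief[k] for k in range(4)]
    unnormalized = [sum(p[row][col] * weighted[col] for col in range(4))
                    for row in range(4)]
    theta = sum(unnormalized)            # normalizing constant in (19)
    return [v / theta for v in unnormalized]

# Illustrative model (assumed values): r = [[0.9, 0.1], [0.2, 0.8]].
p = [[0.9, 0.0, 0.9, 0.0],
     [0.1, 0.0, 0.1, 0.0],
     [0.0, 0.2, 0.0, 0.2],
     [0.0, 0.8, 0.0, 0.8]]
q = [[0.9, 0.6, 0.6, 0.3],               # Pr[o = 0 | state]
     [0.1, 0.4, 0.4, 0.7]]               # Pr[o = 1 | state]
stationary = [0.6, 1.0 / 15.0, 1.0 / 15.0, 4.0 / 15.0]

after_transmit = belief_update(stationary, 1, NULL, p, q)
after_idle_sense = belief_update(stationary, 0, 0, p, q)
```

Starting from the stationary belief, a transmission slot leaves the belief unchanged because no information is gained, whereas sensing an idle channel raises the belief in state $(0,0)$.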

The channel access algorithm selects an action according to a policy. Let $\boldsymbol{\beta} := (\beta_1, \ldots, \beta_{N_A})$ denote a policy. A policy in slot $n$, i.e., $\beta_n : \Pi \to \mathcal{A}$, is a mapping from a belief vector $\boldsymbol{\pi}^n$ to an action $a_n$. In slot $n$, the channel access algorithm chooses $\beta_n(\boldsymbol{\pi}^n)$ as the action. Among the policies, we define the optimal policy $\boldsymbol{\beta}^* := (\beta^*_1, \ldots, \beta^*_{N_A})$ as the one that maximizes the objective function in (15). To derive the optimal policy, we define the optimal value function $V^*_n : \Pi \to \Re$ as the maximum expected reward that will be earned from slot $n$ onward for the current belief vector. The optimal value function can be found by the following dynamic programming recursion [31]:

\[
V^*_{N_A}(\boldsymbol{\pi}) = \max_{a \in \mathcal{A}} \Big\{ \sum_{(i,j) \in \mathcal{S}} \pi_{i,j} R((i,j), a) \Big\} \tag{20}
\]

\[
V^*_n(\boldsymbol{\pi}) = \max_{a \in \mathcal{A}} \Big\{ \sum_{(i,j) \in \mathcal{S}} \pi_{i,j} R((i,j), a) + \sum_{o \in \mathcal{O}} \theta(\boldsymbol{\pi}; a, o) \cdot V^*_{n+1}(\boldsymbol{\eta}(\boldsymbol{\pi}; a, o)) \Big\} \tag{21}
\]

where $\boldsymbol{\eta}(\boldsymbol{\pi}; a, o) := (\eta_{i,j}(\boldsymbol{\pi}; a, o))_{(i,j) \in \mathcal{S}}$. The optimal policy $\boldsymbol{\beta}^*$ is a policy such that $\beta^*_n$ for each $n$ maps a belief vector to a maximizing argument in (20) and (21).

Although we can calculate the optimal policy from (20) and (21), the complexity of dynamic programming over an uncountable set can be prohibitive [31]. Moreover, we should also find the Lagrange multiplier $\nu$ that satisfies the collision probability constraint in (14), which requires a high-complexity iterative algorithm such as the subgradient method. To overcome this difficulty, we suggest a simple stationary suboptimal policy that exhibits near-optimal performance in terms of channel utilization while keeping the collision probability within the collision probability limit $C_{\lim}$. The suboptimal policy is

\[
\beta^{\mathrm{sub}}_n(\boldsymbol{\pi}) =
\begin{cases}
1, & 1 - \pi_{0,0} \leq C_{\lim} \\
0, & \text{otherwise}
\end{cases}
\qquad \forall n = 1, \ldots, N_A. \tag{22}
\]

In Appendix D, we prove that this suboptimal policy satisfies the collision probability constraint. Also, in Section V, it is shown by simulation that the suboptimal policy achieves near-optimal performance. In Fig. 5, we summarize the operation of the channel access algorithm when the suboptimal policy is applied.
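The operation summarized in Fig. 5 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: `sense` is a hypothetical callback that returns the energy-detection result (0 or 1) for a slot, the matrices `p` and `q` and the threshold are illustrative values, and states are indexed in the order $(0,0), (0,1), (1,0), (1,1)$.

```python
def run_access_subframe(belief, p, q, c_lim, num_slots, sense):
    """Apply the suboptimal policy (22) slot by slot and track the belief.
    Returns the list of actions (1 = transmit, 0 = sense) and the final belief."""
    actions = []
    for n in range(num_slots):
        if 1.0 - belief[0] <= c_lim:
            action = 1
            weights = [1.0] * 4          # null observation: no Bayes correction
        else:
            action = 0
            weights = q[sense(n)]        # likelihood of the sensing result per state
        weighted = [w * b for w, b in zip(weights, belief)]
        nxt = [sum(p[row][col] * weighted[col] for col in range(4))
               for row in range(4)]
        total = sum(nxt)
        belief = [v / total for v in nxt]
        actions.append(action)
    return actions, belief

# Illustrative model (assumed values), with a channel that is always sensed idle.
p = [[0.9, 0.0, 0.9, 0.0],
     [0.1, 0.0, 0.1, 0.0],
     [0.0, 0.2, 0.0, 0.2],
     [0.0, 0.8, 0.0, 0.8]]
q = [[0.9, 0.6, 0.6, 0.3],               # Pr[o = 0 | state]
     [0.1, 0.4, 0.4, 0.7]]               # Pr[o = 1 | state]
belief0 = [0.6, 1.0 / 15.0, 1.0 / 15.0, 4.0 / 15.0]

actions, final_belief = run_access_subframe(
    belief0, p, q, c_lim=0.3, num_slots=4, sense=lambda n: 0)
```

In this toy run the SU senses first, then transmits until its confidence in an idle channel decays below the level required by $C_{\lim}$, at which point it senses again. The number of consecutive transmission slots is therefore not fixed in advance but follows from the belief.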

V. NUMERICAL RESULTS

We first evaluate the performance of the channel learning and the channel access algorithms separately, and then study the benefit of the combined use of both algorithms. The simulation parameters are as follows: the bandwidth of a frequency channel ($W$) is 10 MHz; the length of a frame is 200 ms; the length of a slot ($T$) is 20 $\mu$s. There are 1000 and 9000 slots in a channel learning subframe and in a channel access subframe, respectively. The threshold for energy detection ($\delta$) is set to 1.16. The set of possible channel usage patterns is given as $\mathcal{U} = \{(\lambda, \mu, \gamma) \mid \lambda \leq 1\ \text{kHz},\ \mu \leq 1\ \text{kHz},\ \gamma \geq -10\ \text{dB}\}$. We use the recursive algorithm for estimating the channel usage pattern, with a constant step size, $\sigma_k = 10^{-5}$. We assume that the SU does not switch the frequency channel during the simulation time.

Fig. 6 demonstrates how well the channel learning algorithm estimates the time-varying channel usage pattern. The channel usage pattern changes in frames 1000, 2000, and 3000. In this figure, we can see that the estimate fluctuates around the real channel usage pattern due to the constant step size. Nonetheless, the channel learning algorithm tracks the variations of the channel usage pattern well. Note that the speed and the accuracy of convergence can be controlled by adjusting the step size $\sigma_k$.

Calculate the state transition and observation probabilities from $\hat{\mathbf{u}}_k$
Calculate the initial belief vector, $\boldsymbol{\pi}^1$
for $n = 1$ to $N_A$ do
    if $1 - \pi^n_{0,0} \leq C_{\lim}$ then
        SU exchanges user data in slot $n$
        $a_n \leftarrow 1$
        $o_n \leftarrow \emptyset$
    else
        SU performs energy detection in slot $n$ and calculates the sensing result $\zeta^A_n$
        $a_n \leftarrow 0$
        $o_n \leftarrow \zeta^A_n$
    end if
    $\pi^{n+1}_{i,j} \leftarrow \eta_{i,j}(\boldsymbol{\pi}^n; a_n, o_n)$ for $(i,j) \in \mathcal{S}$
end for

Fig. 5. The channel access algorithm in the channel access subframe of frame $k$.

[Plot: state transition rates (kHz) and SNR (dB) versus frame.]

Fig. 6. Estimates of the channel usage pattern over frames.

We evaluate the performance of the channel access algo-rithm in Figs. 7 and 8. For these figures, we assume thatthe channel usage pattern remains the same over time and isknown to the SU so that we can focus on the performance ofthe channel access algorithm. Fig. 7 shows the utilization andthe collision probability for the proposed channel access algo-rithm with the suboptimal policy as function of the collisionprobability limit. We can see in the figure that the utilizationconverges to the probability that a slot is not occupied by thePU as the collision probability increases. This figure showsthat the collision probability does not exceed the collisionprobability limit, regardless of the channel usage pattern. By

Authorized licensed use limited to: Sungkyunkwan University. Downloaded on August 25,2020 at 08:00:01 UTC from IEEE Xplore. Restrictions apply.

Page 10: Opportunistic Access to Spectrum Holes Between Packet ...

2506 IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, VOL. 10, NO. 8, AUGUST 2011

[Plot omitted. Horizontal axis: collision probability limit (0.01 to 0.1); vertical axis: utilization and collision probability (1E-3 to 1). Curves: utilization and collision probability for two channel usage patterns, one with both state transition rates equal to 0.2 kHz at SNR = -3 dB, and one with state transition rates of 0.2 kHz and 0.1 kHz at SNR = -5 dB.]

Fig. 7. Variations in utilization and collision probability with collision probability limit for the proposed channel access algorithm.

[Plot omitted. Horizontal axis: collision probability (0.01 to 0.5); vertical axis: utilization (0.0 to 0.5). Curves: proposed algorithm with the suboptimal policy, proposed algorithm with the optimal policy, and heuristic algorithm, each for two channel usage patterns (both state transition rates equal to 0.2 kHz; state transition rates of 0.2 kHz and 0.1 kHz).]

Fig. 8. Performance comparison of the proposed channel access algorithms with suboptimal and optimal policies, and the heuristic channel access algorithm in terms of utilization and collision probability. The SNR of a PU signal is set to -4 dB.

lowering the collision probability limit, we can decrease the collision probability at the cost of the utilization.

Fig. 8 compares the performance of the proposed channel access algorithm (with the suboptimal and optimal policies) with that of a simple listen-before-talk heuristic channel access algorithm. If the sensing result in slot $(n - 1)$ indicates that the channel is inactive, the heuristic algorithm transmits data for $\tau$ consecutive slots from slot $n$ until it performs another energy detection. Thus, $\tau$ balances the tradeoff between the utilization and the collision probability for the heuristic algorithm. The graphs are plotted by varying $C_{\mathrm{lim}}$ for the proposed algorithm with the suboptimal policy, $\nu$ and $C_{\mathrm{lim}}$ for the proposed algorithm with the optimal policy, and $\tau$ for the heuristic algorithm. In this figure, we can see that the proposed algorithm with the suboptimal policy exhibits performance very close to the optimal one. Therefore, we can say that the suboptimal policy is a very useful low-complexity alternative to the optimal policy, accomplishing

[Plot omitted. Horizontal axis: frame (100 to 4000); vertical axis: utilization and collision probability (1E-3 to 1). Curves: proposed with learning, utilization; proposed with learning, collision prob.; proposed w/o learning, utilization; proposed w/o learning, collision prob.]

Fig. 9. Time variation of utilization and collision probability for the proposed schemes with and without the channel learning algorithm. The utilization and collision probability are time-averaged over every 100 frames.

[Plot omitted. Horizontal axis: utilization and collision probability (0.003 to 1); vertical axis: cdf (0.0 to 1.0). Curve groups: collision probability and utilization. Legend: proposed with learning; proposed w/o learning; heuristic.]

Fig. 10. Cumulative distribution functions of utilization and collision probability when the proposed schemes with and without the channel learning algorithm and the heuristic channel access algorithm are used.

a near-optimal performance as well as effectively limiting the collision probability. We also observe that the proposed algorithm outperforms the heuristic algorithm. The proposed algorithm can achieve a very low collision probability owing to its resilience to sensing errors, whereas the heuristic algorithm cannot.
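The listen-before-talk baseline described above can be sketched as follows. The sketch makes two assumptions that the text does not state explicitly: a sensing result of 0 means "inactive", and after a "busy" result the SU simply senses again in the next slot. The callable `sense` is a hypothetical stand-in for the energy detector.

```python
from typing import Callable, List


def listen_before_talk(num_slots: int, tau: int, sense: Callable[[int], int]) -> List[int]:
    """Return the per-slot actions of the heuristic (1 = transmit, 0 = sense).

    Assumptions of this sketch: sense(n) == 0 means the channel was found
    inactive in slot n, and a busy result triggers sensing in the next slot.
    """
    actions: List[int] = []
    remaining = 0  # transmit slots left before the next energy detection
    for n in range(1, num_slots + 1):
        if remaining > 0:
            actions.append(1)
            remaining -= 1
        else:
            actions.append(0)
            if sense(n) == 0:
                # Inactive channel: transmit for the next tau slots.
                remaining = tau
    return actions
```

A larger `tau` spends fewer slots on sensing, which raises utilization, but it also lets the SU transmit for longer on stale information, which is the tradeoff the parameter controls.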

In Figs. 9 and 10, we consider the channel learning algorithm as well as the channel access algorithm to investigate the impact of channel learning on the system performance. Fig. 9 shows the time variation of the utilization and the collision probability of the proposed schemes with and without the channel learning algorithm. Since the proposed scheme with learning consumes an additional $N_L$ slots for channel learning, for fairness in comparison, we multiply the utilization of the proposed scheme with learning by $N_A/(N_L + N_A)$. For both schemes, the collision probability limit, $C_{\mathrm{lim}}$, is set to 0.03. While the proposed scheme with learning utilizes


the channel usage pattern estimated by the channel learning algorithm to adjust the parameters of the channel access algorithm, the proposed scheme without learning just assumes that $\lambda = \mu = 0.3$ kHz and $\gamma = -3$ dB. The channel usage pattern $\mathbf{u}_k$ varies over time as follows: (0.4 kHz, 0.4 kHz, -3 dB) for $k = 1, \ldots, 1000$, (0.6 kHz, 0.2 kHz, -5 dB) for $k = 1001, \ldots, 2000$, (0.1 kHz, 0.6 kHz, -2 dB) for $k = 2001, \ldots, 3000$, and (0.4 kHz, 0.2 kHz, -6 dB) for $k = 3001, \ldots, 4000$. From Fig. 9, we observe that the proposed scheme without learning violates the collision probability limit and imposes excessive interference on PU traffic when the channel usage pattern is unfavorable. On the other hand, for the proposed scheme with learning, the collision probability remains below the collision probability limit, irrespective of how the channel usage pattern varies. This is because the scheme with learning is able to adapt its parameters to the varying channel usage pattern.

In Fig. 10, we compare the cumulative distribution functions (cdfs) of the utilization and the collision probability when the proposed schemes with and without learning and the heuristic channel access algorithm are used. We estimate the utilization and the collision probability in each frame and calculate the corresponding cumulative distribution functions. The channel usage pattern randomly changes over frames. The duration between consecutive changes in the channel usage pattern follows a geometric distribution with an average of 1000 frames. The state transition rates $\lambda$ and $\mu$ are selected from a uniform distribution over [0.1 kHz, 1 kHz], and the SNR of PU signals is uniformly distributed over [-6 dB, -3 dB]. The collision probability limit is set to 0.03. The proposed scheme without learning assumes that $\lambda = \mu = 0.8$ kHz and $\gamma = -6$ dB. For the heuristic algorithm, we set $\tau = 1$ to reduce the collision probability of the heuristic algorithm as much as possible. From Fig. 10, we observe that the collision probability limit is frequently violated by the proposed scheme without learning and the heuristic algorithm, while the proposed scheme with learning keeps the collision probability below the limit in most frames. The proportions of the frames in which the collision probability exceeds the limit are 0.07, 0.08, and 0.61 for the proposed schemes with and without learning, and the heuristic algorithm, respectively. From this figure, we can conclude that the proposed scheme with learning can effectively maintain the collision probability under the target limit. While keeping the collision probability below the limit, the proposed scheme with learning also achieves an average utilization (i.e., 0.31) considerably higher than that of the proposed scheme without learning (i.e., 0.18) and the heuristic algorithm (i.e., 0.24).
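The randomly varying channel usage pattern used for Fig. 10 can be generated along the following lines. This is only a sketch of the stated distributions, not the authors' simulator: the function names, the seed handling, and the choice to realize the geometric holding time by redrawing the pattern with probability 1/1000 after each frame are assumptions made here for illustration.

```python
import random
from typing import List, Tuple

Pattern = Tuple[float, float, float]  # (lambda in kHz, mu in kHz, SNR in dB)


def draw_pattern(rng: random.Random) -> Pattern:
    """Draw one channel usage pattern from the uniform ranges in the text."""
    return (rng.uniform(0.1, 1.0), rng.uniform(0.1, 1.0), rng.uniform(-6.0, -3.0))


def generate_patterns(num_frames: int, mean_duration: float = 1000.0,
                      seed: int = 0) -> List[Pattern]:
    """Return one pattern per frame.

    After each frame the pattern is redrawn with probability 1/mean_duration,
    so the number of frames between changes is geometric with that mean.
    """
    rng = random.Random(seed)
    change_prob = 1.0 / mean_duration
    current = draw_pattern(rng)
    patterns: List[Pattern] = []
    for _ in range(num_frames):
        patterns.append(current)
        if rng.random() < change_prob:
            current = draw_pattern(rng)
    return patterns
```

Fixing the seed makes a run reproducible, which is convenient when the same pattern sequence has to be replayed for each of the compared schemes.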

VI. CONCLUSION

We have proposed a channel sensing and channel access scheme that opportunistically exploits frequency channels occupied by a data-centric primary user network. The proposed scheme repeats a learning and access cycle, driven by the channel learning and the channel access algorithms. To make the scheme robust to high sensing error probability, we have applied the hidden Markov model (HMM) and partially observable Markov decision process (POMDP) frameworks to the channel learning and the channel access algorithms, respectively. The simulation results have shown that, by adapting to a varying channel usage pattern, the proposed scheme provides efficient access to spectrum opportunities while constraining the interference to the primary users below the target limit. The proposed scheme outperforms a heuristic algorithm without any learning functionality. Extension of the scheme to a distributed multiuser scenario will be considered in our future work.

APPENDIX

A. Proof of the Condition for Equivalence of Two AMPs

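Let $\mathbf{u}$ and $\tilde{\mathbf{u}}$ be the channel usage patterns corresponding to the AMPs with $\mathbf{h}$ and $\tilde{\mathbf{h}}$, respectively. The probability of an observation sequence $\mathbf{x} = \{x_1, \ldots, x_N\}$ given the channel usage pattern $\mathbf{u}$ can be rewritten as

$$
\Pr[\mathbf{o} = \mathbf{x} \mid \mathbf{u}] = \mathbf{1}^T \cdot \mathbf{I}_{x_N}\mathbf{h} \cdot \mathbf{I}_{x_{N-1}}\mathbf{h} \cdots \mathbf{I}_{x_2}\mathbf{h} \cdot \mathbf{I}_{x_1}\boldsymbol{\pi}
= \mathbf{1}^T \mathbf{h}_{x_N}\mathbf{h}_{x_{N-1}} \cdots \mathbf{h}_{x_2}\boldsymbol{\pi}_{x_1} \tag{23}
$$

where $\boldsymbol{\pi} := (\boldsymbol{\pi}_0, \boldsymbol{\pi}_1)^T$ is a column vector of the initial state distribution in which $\boldsymbol{\pi}_0$ and $\boldsymbol{\pi}_1$ are 2-by-1 column vectors, $\mathbf{I}_0 := \mathrm{diag}(1, 1, 0, 0)$, and $\mathbf{I}_1 := \mathrm{diag}(0, 0, 1, 1)$.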

We first consider the case that $\mathbf{1}^T\mathbf{h}_0\boldsymbol{\tau} = 0$ and $\mathbf{1}^T\tilde{\mathbf{h}}_0\boldsymbol{\tau} = 0$. In this case, we have $\mathbf{1}^T\mathbf{h}_x = \mathbf{1}^T y_x$ and $\mathbf{1}^T\tilde{\mathbf{h}}_x = \mathbf{1}^T \tilde{y}_x$ for $x = 0, 1$ and some real values $y_0$, $y_1$, $\tilde{y}_0$, and $\tilde{y}_1$. Then, we have $\Pr[\mathbf{o} = \mathbf{x} \mid \mathbf{u}] = y_{x_N} y_{x_{N-1}} \cdots y_{x_2} y_{x_1}$ and $\Pr[\mathbf{o} = \mathbf{x} \mid \tilde{\mathbf{u}}] = \tilde{y}_{x_N} \tilde{y}_{x_{N-1}} \cdots \tilde{y}_{x_2} \tilde{y}_{x_1}$. The AMPs with $\mathbf{h}$ and $\tilde{\mathbf{h}}$ are equivalent if and only if $y_{x_N} y_{x_{N-1}} \cdots y_{x_2} y_{x_1}$ and $\tilde{y}_{x_N} \tilde{y}_{x_{N-1}} \cdots \tilde{y}_{x_2} \tilde{y}_{x_1}$ are the same for all observation sequences $\mathbf{x}$. This condition is satisfied only when $y_x = \tilde{y}_x$ for $x = 0, 1$. Therefore, we can conclude that $\mathbf{1}^T\mathbf{h}_0 = \mathbf{1}^T\tilde{\mathbf{h}}_0$ should be satisfied for the equivalence of two AMPs.

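We now consider the case that $\mathbf{1}^T\mathbf{h}_0\boldsymbol{\tau} \neq 0$ or $\mathbf{1}^T\tilde{\mathbf{h}}_0\boldsymbol{\tau} \neq 0$. The proof for this case is based on the result in [28]. Let $\mathcal{V}$ denote the null space defined by

$$
\mathcal{V} := \{\boldsymbol{\pi} \mid \mathbf{1}^T \cdot \mathbf{I}_{x_N}\mathbf{h} \cdot \mathbf{I}_{x_{N-1}}\mathbf{h} \cdots \mathbf{I}_{x_2}\mathbf{h} \cdot \mathbf{I}_{x_1}\boldsymbol{\pi} = 0 \;\; \forall \mathbf{x}\}. \tag{24}
$$

The vector in the null space should satisfy $\mathbf{1}^T\boldsymbol{\pi}_0 = 0$, $\mathbf{1}^T\mathbf{h}_0\boldsymbol{\pi}_0 = 0$, $\mathbf{1}^T\boldsymbol{\pi}_1 = 0$, and $\mathbf{1}^T\mathbf{h}_0\boldsymbol{\pi}_1 = 0$. If $\mathbf{1}^T\mathbf{h}_0\boldsymbol{\tau} \neq 0$, the only vector satisfying the condition is the zero vector. In [28], it is shown that the AMPs with $\mathbf{h}$ and $\tilde{\mathbf{h}}$ are equivalent if and only if $\mathbf{h}$ and $\tilde{\mathbf{h}}$ are similar via some block diagonal matrix preserving the probability, on the quotient space where the null space is factored out. Since the null space has zero dimension in this case, the AMPs are equivalent if and only if there exists a 2-by-2 matrix $\mathbf{X}$ such that

$$
\mathbf{1}^T\mathbf{X} = \mathbf{1}^T, \quad \mathbf{X}\mathbf{h}_0 = \tilde{\mathbf{h}}_0\mathbf{X}, \quad \text{and} \quad \mathbf{X}\mathbf{h}_1 = \tilde{\mathbf{h}}_1\mathbf{X}. \tag{25}
$$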

B. Proof of the Condition for Identifiability of an AMP

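If $\mathbf{1}^T\mathbf{h}_0\boldsymbol{\tau} = 0$, there can be an infinite number of AMPs with a transition probability matrix $\tilde{\mathbf{h}} \neq \mathbf{h}$ that satisfies $\mathbf{1}^T\tilde{\mathbf{h}}_0 = \mathbf{1}^T\mathbf{h}_0$. Since these AMPs are equivalent to the AMP with $\mathbf{h}$ from Theorem 1, it should be satisfied that $\mathbf{1}^T\mathbf{h}_0\boldsymbol{\tau} \neq 0$ for the AMP to be identifiable.

Suppose that there exists an AMP with $\tilde{\mathbf{h}}$ that is equivalent to the AMP with $\mathbf{h}$ when $\mathbf{1}^T\mathbf{h}_0\boldsymbol{\tau} \neq 0$. Then, from Theorem 1, there exists $\mathbf{X} \neq \mathbf{I}$ such that $\mathbf{1}^T\mathbf{X} = \mathbf{1}^T$, $\mathbf{X}\mathbf{h}_0 = \tilde{\mathbf{h}}_0\mathbf{X}$, and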


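$\mathbf{X}\mathbf{h}_1 = \tilde{\mathbf{h}}_1\mathbf{X}$. We can calculate $\tilde{\mathbf{h}}_0 = \mathbf{X}\mathbf{h}_0\mathbf{X}^{-1}$ and $\tilde{\mathbf{h}}_1 = \mathbf{X}\mathbf{h}_1\mathbf{X}^{-1}$. These matrices should satisfy, for some $\tilde{r}_{i,j}$ and $\tilde{\gamma}$,

$$
\tilde{\mathbf{h}}_0 =
\begin{bmatrix}
\tilde{r}_{0,0}(1 - D(0)) & \tilde{r}_{1,0}(1 - \Upsilon(\tilde{\gamma})) \\
\tilde{r}_{0,1}(1 - \Upsilon(\tilde{\gamma})) & \tilde{r}_{1,1}(1 - D(\tilde{\gamma}))
\end{bmatrix} \tag{26}
$$

and

$$
\tilde{\mathbf{h}}_1 =
\begin{bmatrix}
\tilde{r}_{0,0}D(0) & \tilde{r}_{1,0}\Upsilon(\tilde{\gamma}) \\
\tilde{r}_{0,1}\Upsilon(\tilde{\gamma}) & \tilde{r}_{1,1}D(\tilde{\gamma})
\end{bmatrix}. \tag{27}
$$

Therefore, we have

$$
\mathbf{1}^T\mathbf{X} = \mathbf{1}^T \quad \text{and} \quad \mathbf{F}(\tilde{\gamma}) \circ (\mathbf{X}\mathbf{h}_0\mathbf{X}^{-1}) = \mathbf{G}(\tilde{\gamma}) \circ (\mathbf{X}\mathbf{h}_1\mathbf{X}^{-1}). \tag{28}
$$

If there is no $\mathbf{X} \neq \mathbf{I}$ and $\tilde{\gamma} \geq 0$ satisfying the above condition, we can say that there is no AMP equivalent to the AMP with $\mathbf{h}$.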
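To make the structure of (26) and (27) concrete, the following sketch builds a pair of 2-by-2 matrices of that form from given transition probabilities and detection probabilities. The function name and argument names are hypothetical, and the values of $D(0)$, $D(\gamma)$, and $\Upsilon(\gamma)$ are passed in as plain numbers because those functions are defined elsewhere in the paper.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def amp_matrices(r: Dict[Tuple[int, int], float], d_idle: float,
                 d_busy: float, upsilon: float) -> Tuple[Matrix, Matrix]:
    """Build matrices with the entry layout of (26) and (27).

    d_idle, d_busy, and upsilon stand for D(0), D(gamma), and Upsilon(gamma);
    supplying them as numbers is an assumption of this sketch.
    """
    # Probability of a positive sensing result attached to each matrix entry.
    detect = [[d_idle, upsilon], [upsilon, d_busy]]
    # Transition probabilities in the same layout: column = current PU state.
    trans = [[r[(0, 0)], r[(1, 0)]], [r[(0, 1)], r[(1, 1)]]]
    h0 = [[trans[a][b] * (1.0 - detect[a][b]) for b in range(2)] for a in range(2)]
    h1 = [[trans[a][b] * detect[a][b] for b in range(2)] for a in range(2)]
    return h0, h1
```

With this layout, each column of `h0` plus `h1` sums to one whenever $r_{i,0} + r_{i,1} = 1$, which is a quick sanity check on the construction.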

C. Calculation of the Gradient of $\Xi(\mathbf{o};\mathbf{u})$

We calculate the partial derivatives of $\Xi(\mathbf{o};\mathbf{u})$ with respect to $\lambda$, $\mu$, and $\gamma$. To do this, we first define $\phi(\mathbf{o};\mathbf{u}) := \Pr[\mathbf{o} \mid \mathbf{u}]$. Recall that $\alpha_n$ is the PU activity at time $t = (n - 1)T$ when $t = 0$ at the start of the channel learning subframe. Let us define $\boldsymbol{\alpha} := (\alpha_1, \ldots, \alpha_{N_L+1})$. Then, $\phi(\mathbf{o};\mathbf{u})$ can be rewritten as the sum of the probabilities $\Pr[\mathbf{o},\boldsymbol{\alpha} \mid \mathbf{u}]$ over all possible $\boldsymbol{\alpha}$, that is, $\phi(\mathbf{o};\mathbf{u}) = \sum_{\boldsymbol{\alpha}} \kappa(\mathbf{o},\boldsymbol{\alpha};\mathbf{u})$, where

$$
\kappa(\mathbf{o},\boldsymbol{\alpha};\mathbf{u}) = \Pr[\mathbf{o},\boldsymbol{\alpha} \mid \mathbf{u}] = p_{\alpha_1} \prod_{n=1}^{N_L} r_{\alpha_n,\alpha_{n+1}} \cdot q^{o_n}_{\alpha_n,\alpha_{n+1}}. \tag{29}
$$

In the above equation, we define $p_i := \Pr[\alpha_1 = i]$ and $r_{i,j} := \Pr[\alpha_{n+1} = j \mid \alpha_n = i]$. Then, we have $p_0 = \mu/(\lambda + \mu)$, $p_1 = \lambda/(\lambda + \mu)$, $r_{0,0} = e^{-\lambda T}$, $r_{0,1} = 1 - e^{-\lambda T}$, $r_{1,0} = 1 - e^{-\mu T}$, and $r_{1,1} = e^{-\mu T}$. In addition, using the definition of $\Upsilon(\gamma)$, we have $q^0_{0,0} = 1 - D(0)$, $q^1_{0,0} = D(0)$, $q^0_{0,1} = 1 - \Upsilon(\gamma)$, $q^1_{0,1} = \Upsilon(\gamma)$, $q^0_{1,0} = 1 - \Upsilon(\gamma)$, $q^1_{1,0} = \Upsilon(\gamma)$, $q^0_{1,1} = 1 - D(\gamma)$, and $q^1_{1,1} = D(\gamma)$.
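As a concrete reading of (29), the following sketch evaluates $\phi(\mathbf{o};\mathbf{u})$ by summing $\kappa(\mathbf{o},\boldsymbol{\alpha};\mathbf{u})$ over every activity sequence $\boldsymbol{\alpha}$. It is a brute-force illustration whose cost grows exponentially with $N_L$, so it is only meant for very short sequences and is not a substitute for a forward-backward recursion. The function names are hypothetical, and $D(0)$, $D(\gamma)$, and $\Upsilon(\gamma)$ are supplied as plain numbers because their definitions lie outside this appendix.

```python
import math
from itertools import product
from typing import Dict, Sequence, Tuple

State = Tuple[int, int]


def model_parameters(lam: float, mu: float, slot: float,
                     d_idle: float, d_busy: float, upsilon: float):
    """Return (p, r, q) as listed after (29).

    d_idle, d_busy, and upsilon stand for D(0), D(gamma), and Upsilon(gamma);
    passing them as numbers is an assumption of this sketch. lam * slot and
    mu * slot must be dimensionless (for example kHz times ms).
    """
    p = {0: mu / (lam + mu), 1: lam / (lam + mu)}
    r = {(0, 0): math.exp(-lam * slot), (0, 1): 1.0 - math.exp(-lam * slot),
         (1, 0): 1.0 - math.exp(-mu * slot), (1, 1): math.exp(-mu * slot)}
    detect = {(0, 0): d_idle, (0, 1): upsilon, (1, 0): upsilon, (1, 1): d_busy}
    # q[(m, s)] is the probability of sensing result m in state s.
    q = {(m, s): (prob if m == 1 else 1.0 - prob)
         for s, prob in detect.items() for m in (0, 1)}
    return p, r, q


def likelihood(obs: Sequence[int], p: Dict[int, float],
               r: Dict[State, float], q: Dict[Tuple[int, State], float]) -> float:
    """phi(o; u): sum of kappa(o, alpha; u) over all alpha in {0,1}^(N_L + 1)."""
    total = 0.0
    for alpha in product((0, 1), repeat=len(obs) + 1):
        kappa = p[alpha[0]]
        for n, o_n in enumerate(obs):
            s_n = (alpha[n], alpha[n + 1])
            kappa *= r[s_n] * q[(o_n, s_n)]
        total += kappa
    return total
```

One consistency check is that the values of `likelihood` over all observation sequences of a given length add up to one.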

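First, we calculate the derivative of $\kappa(\mathbf{o},\boldsymbol{\alpha};\mathbf{u})$ with respect to an arbitrary variable $x$. That is,

$$
\begin{aligned}
\frac{\partial\kappa}{\partial x}(\mathbf{o},\boldsymbol{\alpha};\mathbf{u}) ={}& \sum_{i\in\{0,1\}} \frac{\partial p_i}{\partial x} \cdot \frac{1}{p_i} \cdot 1_{\alpha_1=i} \Pr[\mathbf{o},\boldsymbol{\alpha} \mid \mathbf{u}] \\
&+ \sum_{(i,j)\in\mathcal{S}} \frac{\partial r_{i,j}}{\partial x} \cdot \frac{1}{r_{i,j}} \cdot \sum_{n=1}^{N_L} 1_{\mathbf{s}_n=(i,j)} \Pr[\mathbf{o},\boldsymbol{\alpha} \mid \mathbf{u}] \\
&+ \sum_{(i,j)\in\mathcal{S}} \sum_{m\in\mathcal{O}} \frac{\partial q^m_{i,j}}{\partial x} \cdot \frac{1}{q^m_{i,j}} \cdot \sum_{n=1}^{N_L} 1_{\mathbf{s}_n=(i,j),\,o_n=m} \Pr[\mathbf{o},\boldsymbol{\alpha} \mid \mathbf{u}]
\end{aligned} \tag{30}
$$

where $\mathcal{S}$ is the state space, $\mathcal{O}$ is the observation space, and $1_X$ is a function that is 1 if $X$ is true and 0 otherwise. Now,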

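we calculate $\partial\Xi/\partial x$ as

$$
\begin{aligned}
\frac{\partial\Xi}{\partial x}(\mathbf{o};\mathbf{u}) &= \frac{1}{\phi(\mathbf{o};\mathbf{u})} \cdot \frac{\partial\phi(\mathbf{o};\mathbf{u})}{\partial x} = \frac{1}{\phi(\mathbf{o};\mathbf{u})} \cdot \sum_{\boldsymbol{\alpha}} \frac{\partial\kappa(\mathbf{o},\boldsymbol{\alpha};\mathbf{u})}{\partial x} \\
&= \frac{1}{\phi(\mathbf{o};\mathbf{u})} \cdot \Bigg( \sum_{i\in\{0,1\}} \frac{\partial p_i}{\partial x} \cdot \frac{1}{p_i} \cdot \omega_i(\mathbf{o};\mathbf{u})
+ \sum_{(i,j)\in\mathcal{S}} \frac{\partial r_{i,j}}{\partial x} \cdot \frac{1}{r_{i,j}} \cdot \chi_{i,j}(\mathbf{o};\mathbf{u}) \\
&\qquad\qquad\qquad + \sum_{(i,j)\in\mathcal{S}} \sum_{m\in\mathcal{O}} \frac{\partial q^m_{i,j}}{\partial x} \cdot \frac{1}{q^m_{i,j}} \cdot \psi^m_{i,j}(\mathbf{o};\mathbf{u}) \Bigg)
\end{aligned} \tag{31}
$$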

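where we define $\omega_i(\mathbf{o};\mathbf{u}) := \Pr[\alpha_1 = i, \mathbf{o} \mid \mathbf{u}]$, $\chi_{i,j}(\mathbf{o};\mathbf{u}) := \sum_{n=1}^{N_L} \Pr[\mathbf{s}_n = (i, j), \mathbf{o} \mid \mathbf{u}]$, and $\psi^m_{i,j}(\mathbf{o};\mathbf{u}) := \sum_{n \mid o_n = m} \Pr[\mathbf{s}_n = (i, j), \mathbf{o} \mid \mathbf{u}]$. From this equation, we can calculate $\partial\Xi/\partial\lambda$, $\partial\Xi/\partial\mu$, and $\partial\Xi/\partial\gamma$. For example, we can derive $\partial\Xi/\partial\lambda$ as

$$
\begin{aligned}
\frac{\partial\Xi}{\partial\lambda}(\mathbf{o};\mathbf{u}) &= \frac{1}{\phi(\mathbf{o};\mathbf{u})} \cdot \Bigg( \frac{\partial p_0}{\partial\lambda} \cdot \frac{1}{p_0} \cdot \omega_0(\mathbf{o};\mathbf{u}) + \frac{\partial p_1}{\partial\lambda} \cdot \frac{1}{p_1} \cdot \omega_1(\mathbf{o};\mathbf{u}) \\
&\qquad\qquad\qquad + \frac{\partial r_{0,0}}{\partial\lambda} \cdot \frac{1}{r_{0,0}} \cdot \chi_{0,0}(\mathbf{o};\mathbf{u}) + \frac{\partial r_{0,1}}{\partial\lambda} \cdot \frac{1}{r_{0,1}} \cdot \chi_{0,1}(\mathbf{o};\mathbf{u}) \Bigg) \\
&= \frac{1}{\phi(\mathbf{o};\mathbf{u})} \cdot \Bigg( \frac{\mu \cdot \omega_1(\mathbf{o};\mathbf{u})}{\lambda(\lambda + \mu)} - \frac{\omega_0(\mathbf{o};\mathbf{u})}{\lambda + \mu} + \frac{T \cdot \chi_{0,1}(\mathbf{o};\mathbf{u})}{e^{\lambda T} - 1} - T \cdot \chi_{0,0}(\mathbf{o};\mathbf{u}) \Bigg).
\end{aligned} \tag{32}
$$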

We can also calculate $\partial\Xi/\partial\mu$ and $\partial\Xi/\partial\gamma$ in a similar way.
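As an illustration of the same steps for $\mu$, only $p_0$, $p_1$, $r_{1,0}$, and $r_{1,1}$ depend on $\mu$, so substituting their expressions from the definitions given after (29) into (31) should give the following. This is a worked example derived here from those definitions and is not an equation of the original text.

```latex
\begin{aligned}
\frac{\partial p_0}{\partial\mu}\cdot\frac{1}{p_0} &= \frac{\lambda}{\mu(\lambda+\mu)}, &
\frac{\partial p_1}{\partial\mu}\cdot\frac{1}{p_1} &= -\frac{1}{\lambda+\mu}, \\
\frac{\partial r_{1,0}}{\partial\mu}\cdot\frac{1}{r_{1,0}} &= \frac{T}{e^{\mu T}-1}, &
\frac{\partial r_{1,1}}{\partial\mu}\cdot\frac{1}{r_{1,1}} &= -T,
\end{aligned}
\qquad\Longrightarrow\qquad
\frac{\partial\Xi}{\partial\mu}(\mathbf{o};\mathbf{u})
= \frac{1}{\phi(\mathbf{o};\mathbf{u})}\cdot\left(
\frac{\lambda\cdot\omega_0(\mathbf{o};\mathbf{u})}{\mu(\lambda+\mu)}
- \frac{\omega_1(\mathbf{o};\mathbf{u})}{\lambda+\mu}
+ \frac{T\cdot\chi_{1,0}(\mathbf{o};\mathbf{u})}{e^{\mu T}-1}
- T\cdot\chi_{1,1}(\mathbf{o};\mathbf{u})
\right)
```

The result mirrors (32) with the roles of $\lambda$ and $\mu$, and of the states 0 and 1, exchanged.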

D. The Suboptimal Policy Satisfies the Collision Probability Constraint

Proof: We prove that the collision probability does not exceed the collision probability limit, i.e., $C \le C_{\mathrm{lim}}$, when the suboptimal policy $\boldsymbol{\beta}^{\mathrm{sub}} = (\beta^{\mathrm{sub}}_1, \ldots, \beta^{\mathrm{sub}}_{N_A})$ is applied. Provided that $\boldsymbol{\beta}^{\mathrm{sub}}$ is used, we can rewrite the collision probability as

$$
\begin{aligned}
C &= \frac{\sum_{n=1}^{N_A} \Pr[\mathbf{s}_n \neq (0, 0), a_n = 1]}{\sum_{n=1}^{N_A} \Pr[a_n = 1]} \\
&= \frac{\sum_{n=1}^{N_A} \sum_{\Gamma_n} \Pr[\mathbf{s}_n \neq (0, 0), 1 - \pi_{0,0} \le C_{\mathrm{lim}} \mid \Gamma_n] \cdot \Pr[\Gamma_n]}{\sum_{n=1}^{N_A} \sum_{\Gamma_n} \Pr[1 - \pi_{0,0} \le C_{\mathrm{lim}} \mid \Gamma_n] \cdot \Pr[\Gamma_n]}
\end{aligned} \tag{33}
$$

where $\Gamma_n := \{\boldsymbol{\pi}^1, a_1, \ldots, a_{n-1}, o_1, \ldots, o_{n-1}\}$.

Since $\pi_{0,0}$ only depends on $\Gamma_n$, the value of $\Pr[1 - \pi_{0,0} \le C_{\mathrm{lim}} \mid \Gamma_n]$ in the denominator in (33) is one if $1 - \pi_{0,0} \le C_{\mathrm{lim}}$, and zero otherwise. Also, $\Pr[\mathbf{s}_n \neq (0, 0), 1 - \pi_{0,0} \le C_{\mathrm{lim}} \mid \Gamma_n]$ in the numerator in (33) is calculated as

$$
\Pr[\mathbf{s}_n \neq (0, 0), 1 - \pi_{0,0} \le C_{\mathrm{lim}} \mid \Gamma_n] =
\begin{cases}
1 - \pi_{0,0}, & \text{if } 1 - \pi_{0,0} \le C_{\mathrm{lim}} \\
0, & \text{otherwise.}
\end{cases} \tag{34}
$$

Therefore, the inequality $\Pr[\mathbf{s}_n \neq (0, 0), 1 - \pi_{0,0} \le C_{\mathrm{lim}} \mid \Gamma_n] \le C_{\mathrm{lim}} \cdot \Pr[1 - \pi_{0,0} \le C_{\mathrm{lim}} \mid \Gamma_n]$ is satisfied. Applying this inequality to (33), we can conclude that

$$
C \le \frac{\sum_{n=1}^{N_A} \sum_{\Gamma_n} C_{\mathrm{lim}} \cdot \Pr[1 - \pi_{0,0} \le C_{\mathrm{lim}} \mid \Gamma_n] \cdot \Pr[\Gamma_n]}{\sum_{n=1}^{N_A} \sum_{\Gamma_n} \Pr[1 - \pi_{0,0} \le C_{\mathrm{lim}} \mid \Gamma_n] \cdot \Pr[\Gamma_n]} = C_{\mathrm{lim}}.
$$
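The bound can be checked numerically on a toy example. The sketch below evaluates the ratio in (33) for a finite list of histories, each given as a pair of $\Pr[\Gamma_n]$ and the corresponding belief $\pi_{0,0}$; the function name, the input format, and the convention of returning zero when the SU never transmits are assumptions of this illustration. Like the proof, it treats $1 - \pi_{0,0}$ as the true conditional collision probability, as in (34).

```python
from typing import Sequence, Tuple


def collision_probability(histories: Sequence[Tuple[float, float]],
                          c_lim: float) -> float:
    """Evaluate the ratio in (33) under the threshold (suboptimal) policy.

    histories: (Pr[Gamma_n], pi_00 given Gamma_n) pairs pooled over all slots.
    Returns 0.0 if the SU never transmits (a convention of this sketch).
    """
    numerator = 0.0
    denominator = 0.0
    for weight, pi_00 in histories:
        if 1.0 - pi_00 <= c_lim:
            # The SU transmits for this history; by (34) the conditional
            # collision probability is 1 - pi_00, which is at most c_lim.
            numerator += (1.0 - pi_00) * weight
            denominator += weight
    return numerator / denominator if denominator > 0.0 else 0.0
```

For example, with histories `[(0.5, 0.99), (0.3, 0.98), (0.2, 0.5)]` and a limit of 0.03, only the first two histories lead to transmission, and the resulting ratio is a weighted average of 0.01 and 0.02, which cannot exceed the limit.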


REFERENCES

[1] S. Geirhofer, L. Tong, and B. M. Sadler, “A measurement-based model for dynamic spectrum access in WLAN channels,” in Proc. IEEE MILCOM, Oct. 2006.

[2] S. Geirhofer, L. Tong, and B. M. Sadler, “Dynamic spectrum access in the time domain: modeling and exploiting white space,” IEEE Commun. Mag., vol. 45, no. 5, pp. 66–72, May 2007.

[3] S. D. Jones, E. Jung, X. Liu, N. Merheb, and I. J. Wang, “Characterization of spectrum activities in the U.S. public safety band for opportunistic spectrum access,” in Proc. IEEE DySPAN, Apr. 2007.

[4] M. Wellens, J. Riihijarvi, and P. Mahonen, “Empirical time and frequency domain models of spectrum use,” Physical Commun. (Elsevier), vol. 2, no. 1–2, pp. 10–32, Mar. 2009.

[5] M. Wellens and P. Mahonen, “Lessons learned from an extensive spectrum occupancy measurement campaign and a stochastic duty cycle model,” in Proc. TridentCom, Apr. 2009.

[6] D. Willkomm, S. Machiraju, J. Bolot, and A. Wolisz, “Primary user behavior in cellular networks and implications for dynamic spectrum access,” IEEE Commun. Mag., vol. 47, no. 3, pp. 88–95, Mar. 2009.

[7] Q. Zhao, L. Tong, A. Swami, and Y. Chen, “Decentralized cognitive MAC for opportunistic spectrum access in ad hoc networks: a POMDP framework,” IEEE J. Sel. Areas Commun., vol. 25, no. 3, pp. 589–600, Apr. 2007.

[8] Q. Zhao, B. Krishnamachari, and K. Liu, “On myopic sensing for multi-channel opportunistic access: structure, optimality, and performance,” IEEE Trans. Wireless Commun., vol. 7, no. 12, pp. 5431–5440, Dec. 2008.

[9] Q. Zhao, S. Geirhofer, L. Tong, and B. M. Sadler, “Opportunistic spectrum access via periodic channel sensing,” IEEE Trans. Signal Process., vol. 56, no. 2, pp. 785–796, Feb. 2008.

[10] H. Su and X. Zhang, “Cross-layer based opportunistic MAC protocols for QoS provisionings over cognitive radio wireless networks,” IEEE J. Sel. Areas Commun., vol. 26, no. 1, pp. 118–129, Jan. 2008.

[11] S. Geirhofer, L. Tong, and B. M. Sadler, “Cognitive medium access: constraining interference based on experimental models,” IEEE J. Sel. Areas Commun., vol. 26, no. 1, pp. 95–105, Jan. 2008.

[12] S. Huang, X. Liu, and Z. Ding, “Opportunistic spectrum access in cognitive radio networks,” in Proc. IEEE INFOCOM, Apr. 2008.

[13] R. Urgaonkar and M. J. Neely, “Opportunistic scheduling with reliability guarantees in cognitive radio networks,” IEEE Trans. Mobile Comput., vol. 8, no. 6, pp. 766–777, June 2009.

[14] Y.-C. Liang, Y. Zeng, E. C. Y. Peh, and A. T. Hoang, “Sensing-throughput tradeoff for cognitive radio networks,” IEEE Trans. Wireless Commun., vol. 7, no. 4, pp. 1326–1337, Apr. 2008.

[15] H. Kim and K. G. Shin, “Efficient discovery of spectrum opportunities with MAC-layer sensing in cognitive radio networks,” IEEE Trans. Mobile Comput., vol. 7, no. 5, pp. 533–545, May 2008.

[16] L. Lai, H. El Gamal, H. Jiang, and H. V. Poor, “Cognitive medium access: exploration, exploitation and competition,” IEEE/ACM Trans. Netw., submitted for publication. Available: http://www.ece.osu.edu/~helgamal/

[17] H. Jiang, L. Lai, R. Fan, and H. V. Poor, “Optimal selection of channel sensing order in cognitive radio,” IEEE Trans. Wireless Commun., vol. 8, no. 1, pp. 297–307, Jan. 2009.

[18] R. Fan and H. Jiang, “Channel sensing-order setting in cognitive radio networks: a two-user case,” IEEE Trans. Veh. Technol., vol. 58, no. 9, pp. 4997–5008, Nov. 2009.

[19] L. R. Rabiner, “A tutorial on hidden Markov models and selected applications in speech recognition,” Proc. IEEE, vol. 77, no. 2, pp. 257–286, Feb. 1989.

[20] T. Ryden, “On recursive estimation for hidden Markov models,” Stochastic Processes and their Applications, vol. 66, no. 1, pp. 79–96, Feb. 1997.

[21] S. Huang, X. Liu, and Z. Ding, “Optimal transmission strategies for dynamic spectrum access in cognitive radio networks,” IEEE Trans. Mobile Comput., vol. 8, no. 12, pp. 1636–1648, Dec. 2009.

[22] T. Clancy and B. Walker, “Predictive dynamic spectrum access,” in Proc. SDR Forum Technical Conference, Nov. 2006.

[23] I. A. Akbar and W. H. Tranter, “Dynamic spectrum allocation in cognitive radio using hidden Markov models: Poisson distributed case,” in Proc. SoutheastCon, Mar. 2007.

[24] G. E. Monahan, “A survey of partially observable Markov decision processes: theory, models, and algorithms,” Management Science, vol. 28, no. 1, pp. 1–16, Jan. 1982.

[25] J. Jia, Q. Zhang, and X. Shen, “HC-MAC: a hardware-constrained cognitive MAC for efficient spectrum management,” IEEE J. Sel. Areas Commun., vol. 26, no. 1, pp. 106–117, Jan. 2008.

[26] H. Urkowitz, “Energy detection of unknown deterministic signals,” Proc. IEEE, vol. 55, no. 4, pp. 523–531, Apr. 1967.

[27] Y. Ephraim and N. Merhav, “Hidden Markov processes,” IEEE Trans. Inf. Theory, vol. 48, no. 6, pp. 1518–1569, June 2002.

[28] H. Ito, S.-I. Amari, and K. Kobayashi, “Identifiability of hidden Markov information sources and their minimum degrees of freedom,” IEEE Trans. Inf. Theory, vol. 38, no. 2, pp. 324–333, Mar. 1992.

[29] L. E. Baum and T. Petrie, “Statistical inference for probabilistic functions of finite state Markov chains,” The Annals of Mathematical Statistics, vol. 37, no. 6, pp. 1554–1563, Dec. 1966.

[30] K. W. Choi, “Adaptive sensing technique to maximize spectrum utilization in cognitive radio,” IEEE Trans. Veh. Technol., vol. 59, no. 2, pp. 992–998, Feb. 2010.

[31] W. S. Lovejoy, “A survey of algorithmic methods for partially observable Markov decision processes,” Annals of Operations Research, vol. 28, no. 1, pp. 47–66, Dec. 1991.

Kae Won Choi received the B.S. degree in civil, urban, and geosystem engineering in 2001, and the M.S. and Ph.D. degrees in electrical engineering and computer science in 2003 and 2007, respectively, all from Seoul National University, Seoul, Korea. From 2008 to 2009, he was with Telecommunication Business of Samsung Electronics Co., Ltd., Korea. From 2009 to 2010, he was a postdoctoral researcher in the Department of Electrical and Computer Engineering, University of Manitoba, Winnipeg, MB, Canada. In 2010, he joined the faculty at Seoul National University of Science and Technology, Korea, where he is currently an assistant professor in the Department of Computer Science. His research interests include cognitive radio, wireless network optimization, radio resource management, and mobile cloud computing.

Ekram Hossain (S'98-M'01-SM'06) is a full Professor in the Department of Electrical and Computer Engineering at University of Manitoba, Winnipeg, Canada. He received his Ph.D. in Electrical Engineering from University of Victoria, Canada, in 2001. Dr. Hossain's research interests include design, analysis, and optimization of wireless/mobile communications networks and cognitive radio systems (http://www.ee.umanitoba.ca/~ekram). He serves as the Area Editor for the IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS in the area of “Resource Management and Multiple Access,” an Editor for the IEEE TRANSACTIONS ON MOBILE COMPUTING, the IEEE COMMUNICATIONS SURVEYS AND TUTORIALS, and IEEE Wireless Communications. Dr. Hossain has several research awards to his credit which include the University of Manitoba Merit Award in 2010 (for Research and Scholarly Activities) and the 2011 IEEE Communications Society Fred Ellersick Prize Paper Award. He is a registered Professional Engineer in the province of Manitoba, Canada.
