IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, VOL. 10, NO. 8, AUGUST 2011 2497
Opportunistic Access to Spectrum Holes Between Packet Bursts: A Learning-Based Approach
Kae Won Choi, Member, IEEE, and Ekram Hossain, Senior Member, IEEE
Abstract: We present a cognitive radio (CR) mechanism for opportunistic access to the frequency bands licensed to a data-centric primary user (PU) network. Secondary users (SUs) aim to exploit the short-lived spectrum holes (or opportunities) created between packet bursts in the PU network. The PU traffic pattern changes over both time and frequency according to upper layer events in the PU network, and fast variation in PU activity may cause high sensing error probability and low spectrum utilization in dynamic spectrum access. The proposed mechanism learns a PU traffic pattern in real time and uses the acquired information to access the frequency channel in an efficient way while limiting the probability of collision with the PUs below a target limit. To design the channel learning algorithm, we model the CR system as a hidden Markov model (HMM) and present a gradient method to find the underlying PU traffic pattern. We also analyze the identifiability of the proposed HMM to provide a condition for the convergence of the proposed learning algorithm. Simulation results show that the proposed algorithm greatly outperforms the traditional listen-before-talk algorithm, which does not possess any learning functionality.
Index Terms: Cognitive radio, opportunistic spectrum access, energy detection, hidden Markov model (HMM), partially observable Markov decision process (POMDP).
I. INTRODUCTION
The concept of opportunistic spectrum access (OSA) is motivated by low spectrum utilization of traditional fixed spectrum allocation strategies. In order to make efficient use of precious spectrum resources, OSA allows a secondary user (SU) to exploit the spectrum bands that a primary user (PU) has priority to access, under the condition that the SU does not cause harmful interference to the PU. Without explicit negotiation with the PU, the SU autonomously senses spectrum bands, finds spectrum holes (i.e., spectrum temporarily unused by the PUs), and accesses them by tuning its operating parameters. This process requires an intelligent cognition cycle, and therefore, an SU network is considered as a cognitive radio (CR) network.
In this paper, we propose a CR mechanism for an SU network which shares spectrum bands with a data-centric PU network. In particular, we are interested in exploiting short-lived spectrum opportunities created between packet bursts
Manuscript received February 2, 2010; revised October 30, 2010 and February 7, 2011; accepted May 21, 2011. The associate editor coordinating the review of this paper and approving it for publication was Q. Zhang.
This work was supported by Natural Sciences and Engineering Research Council (NSERC), Canada.
K. W. Choi is with the Department of Computer Science and Engineering, Seoul National University of Science and Technology, Gongneung 2-dong, Nowon-gu, Seoul, Korea.
E. Hossain is with the Dept. of Electrical and Computer Engineering, University of Manitoba, Canada (e-mail: [email protected]).
Digital Object Identifier 10.1109/TWC.2011.060711.100154
of a PU network. Experimental studies on potential PU networks (e.g., GSM networks) [1]–[6] have shown that there exist abundant spectrum opportunities between packet bursts. In [1], [2], it was revealed that there are plenty of gaps between consecutive packets in an 802.11b-based WLAN, even when the WLAN continuously uses a channel for packet transmissions. However, exploiting these spectrum opportunities poses significant challenges due to the following two characteristics of a data-centric PU network.
First, the channel usage pattern of PUs changes over time and frequencies according to upper layer events and traffic loads. Therefore, it is very difficult for an SU to have proper knowledge of the channel usage pattern. Accessing a spectrum band without knowing the channel usage pattern potentially leads to harmful interference to PUs and also performance degradation of the SU. In the literature (e.g., in [7]–[13]), the channel usage pattern of PUs was modeled either as a two-state Markov or a semi-Markov chain, and the distributions of the lengths of a spectrum opportunity and a packet burst were assumed to be stationary and known to the SU. However, in a data-centric PU network, an SU may not know the channel usage pattern in advance. Therefore, an SU should estimate the channel usage pattern by using an online learning algorithm.
The second characteristic of a data-centric PU network is that the lengths of spectrum opportunities and packet bursts are very short (e.g., of the order of milliseconds to seconds). This means that an SU has to perform channel sensing very frequently to catch up with the fast variations of PU activity. Since an SU (with a single radio) has to stop data transmission during channel sensing, frequent channel sensing leads to low spectrum utilization [14]. Moreover, the channel sensing time should be much shorter than the average length of a spectrum opportunity. Due to the short channel sensing time, the sensing error probability (i.e., false alarm and misdetection probabilities) tends to be high. Most of the related work in the literature (e.g., in [7]–[13], [15], [16]) assumed perfect sensing (i.e., sensing error probability is zero) and that the channel sensing time is short enough to be neglected. In a practical CR network, an SU needs to be resilient to high sensing error probability while reducing the channel sensing time in an intelligent way.
The above-mentioned problems related to spectrum sharing with a data-centric PU network have not been addressed well in the previous studies in the literature. This motivates us to design a channel sensing and channel access scheme considering the characteristics of a data-centric PU network. The proposed scheme operates on a learning and access cycle where it learns the channel usage pattern and then accesses the
1536-1276/11$25.00 © 2011 IEEE
Authorized licensed use limited to: Sungkyunkwan University. Downloaded on August 25,2020 at 08:00:01 UTC from IEEE Xplore. Restrictions apply.
channel based on the learned channel usage pattern. These two functionalities are carried out by a channel learning algorithm and a channel access algorithm, respectively. Note that the functionality of channel selection in a multi-channel scenario (i.e., determining the order in which the channels need to be sensed and/or accessed) is out of the scope of the proposed scheme. The optimal frequency channel selection problem was addressed in [7], [8], [16]–[18].
Taking the sensing results obtained by a channel sensing method as inputs, the channel learning algorithm estimates the channel usage pattern in the PU network. To deal with erroneous sensing results, we design this algorithm by using a hidden Markov model (HMM) [19]. Based on a sequence of sensing results, which act as observations in the HMM, the channel usage pattern is calculated iteratively by using the gradient method [20]. This algorithm estimates not only the traffic pattern of PUs but also the signal-to-noise ratio (SNR) corresponding to a PU signal. To show under what condition the channel usage pattern can be estimated, we provide an analysis of the equivalence and the identifiability of the proposed HMM. The channel usage pattern is used by the channel access algorithm for efficient data transmission in the SU network. Although a few algorithms for estimating the PU traffic pattern have been proposed in the literature (e.g., in [15], [21]), they are neither robust to high sensing error probabilities nor able to estimate the SNR of a PU signal. A few works (e.g., [22] and [23]) have modeled a CR system as an HMM. However, these works did not address the problem of parameter estimation from erroneous sensing results.
Using the channel access algorithm, which is developed based on a partially observable Markov decision process (POMDP) framework [24], an SU transmits data packets while avoiding interference to the PU network. The algorithm adaptively decides whether to perform channel sensing or transmit user data in each time slot to prevent unnecessary sensing.
The main contributions of the paper can be summarized as follows:
• We present an optimized OSA scheme for cognitive radios coexisting with a data-centric PU network. With this scheme, an SU can effectively use spectrum opportunities between packet bursts, maximize spectrum utilization, and maintain its data connection even when a spectrum band is densely occupied by PUs. The proposed scheme not only detects instantaneous PU activity but also learns the channel usage pattern in the PU network. Based on the estimated channel usage information, the proposed scheme adjusts the parameters for accessing a frequency channel. This learning and access cycle makes it possible for an SU to adapt itself to a time-varying channel usage pattern in the PU network. Also, the proposed scheme is favorable to practical implementation, since it needs very little prior knowledge about the PU network.
• The channel learning algorithm is developed by solving the parameter estimation problem in the HMM. This algorithm is resilient to sensing errors and can estimate the SNR of a PU signal, which the existing parameter estimation algorithms for CR systems are not capable of. We also analyze the identifiability of the proposed
TABLE I
TABLE OF SYMBOLS

| Symbol | Definition |
|---|---|
| $M$ | Number of frequency channels |
| $W$ | Bandwidth of a frequency channel |
| $\lambda$ | Transition rate from state 0 to state 1 in PU traffic model |
| $\mu$ | Transition rate from state 1 to state 0 in PU traffic model |
| $\gamma$ | SNR of a PU signal |
| $\mathbf{u}$ | Channel usage pattern, i.e., $\mathbf{u} := (\lambda, \mu, \gamma)$ |
| $\mathcal{U}$ | Set of possible channel usage patterns |
| $\delta$ | Threshold for energy detection |
| $D(\rho)$ | Probability that an SU detects PU to be active during a slot when the average SNR of PU signal is $\rho$ |
| $N_L$ | Number of slots in a channel learning subframe |
| $N_A$ | Number of slots in a channel access subframe |
| $T$ | Length of a slot |
| $\mathbf{u}_k$ | Channel usage pattern in frame $k$ |
| $\hat{\mathbf{u}}_k$ | Estimate of the channel usage pattern in frame $k$ |
| $\theta^L_{k,n}$ | Sensing result generated in slot $n$ in the channel learning subframe of frame $k$ |
| $\theta^A_{k,n}$ | Sensing result generated in slot $n$ in the channel access subframe of frame $k$ |
| $\alpha_n$ | PU activity at time $t = (n-1)T$, when $t = 0$ at the start of a subframe |
| $s_n$ | State of slot $n$, i.e., $s_n := (\alpha_n, \alpha_{n+1})$ |
| $\mathcal{S}$ | State space, i.e., $\mathcal{S} := \{(0,0), (0,1), (1,0), (1,1)\}$ |
| $o_n$ | Observation in slot $n$ |
| $\mathcal{O}$ | Observation space |
| $p^{l,m}_{i,j}$ | State transition probability from $s_n = (i,j)$ to $s_{n+1} = (l,m)$ |
| $p_{i,j}$ | State transition probability from $\alpha_n = i$ to $\alpha_{n+1} = j$ |
| $q^{\theta}_{i,j}$ | Observation probability that the observation $o_n$ is $\theta$ given that the state $s_n$ is $(i,j)$ |
| $a_n$ | Action in slot $n$ |
| $\mathcal{A}$ | Action space, i.e., $\mathcal{A} := \{0, 1\}$ |
| $C$ | Collision probability |
| $C_{\lim}$ | Collision probability limit |
| $R(s, a)$ | Reward for given state $s$ and action $a$ |
| $\boldsymbol{\pi}^n$ | Belief vector for slot $n$, i.e., $\boldsymbol{\pi}^n := (\pi^n_{0,0}, \pi^n_{0,1}, \pi^n_{1,0}, \pi^n_{1,1})$ |
| $\Delta$ | Domain of a belief vector |
| $V^*_n$ | Optimal value function for slot $n$ |
| $\boldsymbol{\Pi}^*$ | Optimal policy, i.e., $\boldsymbol{\Pi}^* := (\beta^*_1, \ldots, \beta^*_{N_A})$ |
| $\boldsymbol{\Pi}^{\mathrm{sub}}$ | Suboptimal policy, i.e., $\boldsymbol{\Pi}^{\mathrm{sub}} := (\beta^{\mathrm{sub}}_1, \ldots, \beta^{\mathrm{sub}}_{N_A})$ |
HMM and show that the proposed channel learning algorithm can estimate the channel usage pattern under some mild conditions. To our knowledge, the problem of the identifiability of an HMM was not addressed in the existing works on CR systems.
The rest of the paper is organized as follows. Section II describes the system model and assumptions and proposes the OSA scheme for exploiting short spectrum opportunities between packet bursts. The channel learning algorithm is described in Section III. In Section IV, we introduce the channel access algorithm. In Section V, we present representative numerical results. Section VI concludes the paper. A list of the key mathematical symbols used in this paper is shown in Table I.
II. SYSTEM MODEL AND PROPOSED SPECTRUM ACCESS PROTOCOL
A. Network Model
The PU network has a license to use $M$ frequency channels, each of which has a bandwidth of $W$. In Section II-B, we will describe the channel usage model of the PU network. The SU network could be either an ad hoc or an infrastructure-based
Fig. 1. Two-state Markov model and an example of channel usage pattern. (a) Two-state Markov chain. (b) Time-domain example of channel usage pattern.
network. We focus on the operation of a single SU in the SU network. The SU can communicate with other SUs (or the secondary network controller) via one radio transceiver that can be tuned to one of the $M$ frequency channels at a time. The SU can access a frequency channel only when there is no PU activity in that channel. We assume that the SU performs spectrum sensing by means of energy detection. The spectrum sensing model will be described in Section II-C. We will explain the details of the OSA scheme for an SU in Section II-D.
B. Primary User Channel Usage Model
We adopt a two-state continuous-time Markov chain (CTMC) to model PU traffic in a channel [7]–[11], [13], [25].¹ Fig. 1 shows the two-state CTMC model in which the states represent PU activity in a channel. The PU activity on a frequency channel alternates between state 1 (i.e., active) and state 0 (i.e., inactive). The lengths of an active period and an inactive period in a channel are exponentially distributed with the average length of $1/\mu$ and $1/\lambda$, respectively, where $\lambda$ and $\mu$ denote PU state transition rates. We also incorporate the SNR of a PU signal, $\gamma$, into the PU channel usage model, since it significantly affects the channel sensing performance. Now, the PU channel usage is completely determined by three parameters $\lambda$, $\mu$, and $\gamma$. We define the "channel usage pattern", denoted by $\mathbf{u}$, as the vector of these parameters, i.e., $\mathbf{u} := (\lambda, \mu, \gamma)$.
Many experimental studies on potential PU networks have shown that traffic characteristics vary over time [1], [2], [4], [6] and frequencies [3], [5]. There can be several reasons for this PU behavior. First, the channel usage pattern can vary according to the configurations of the upper layer protocols. For example, the channel usage pattern is affected by the type of PU application (e.g., voice call, video streaming, file transfer, and web browsing) and its parameter settings (e.g., source rate of video streaming).² PU applications determine the traffic properties such as the packet length and the packet arrival rate, which, in turn, affect the channel usage pattern.
¹In some works (e.g., [2], [12], [15], [21]), PU traffic was modeled by a two-state semi-Markov process, which is a generalization of the two-state Markov process. In the semi-Markov process, the sojourn time on each state follows an arbitrary distribution (e.g., hyper-Erlang distribution [2]). Although the semi-Markov process provides a more accurate fit for empirical data, the Markov process is a good approximation with mathematical tractability [11].
²For example, in [2], the authors presented the distribution of idle periods experimentally estimated from an IEEE 802.11b-based WLAN with the user datagram protocol (UDP) traffic. It was shown that the distribution of idle periods differs for two different packet arrival rates of 25 packets/s and 100 packets/s.
Second, the channel usage pattern depends on the traffic load in the PU network, which may vary over time. In [4], [6], it was shown that traffic load in voice-centric cellular networks varies according to the time of the day.
An SU should track the variation of the channel usage pattern in order to access the channel in an optimal way. We assume that the channel usage pattern is restricted to a certain region $\mathcal{U}$, i.e., $\mathbf{u} \in \mathcal{U}$. Also, it is assumed that the channel usage pattern varies slowly so that an SU can estimate the channel usage pattern by gathering statistical information from a number of packet bursts and spectrum opportunities.
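The two-state CTMC of this subsection can be illustrated with a short simulation. The following is a minimal Python sketch, not the authors' code: the function name `simulate_pu_activity`, the rates, and the simulated duration are illustrative assumptions. It draws exponentially distributed sojourn times (mean $1/\lambda$ in the inactive state, mean $1/\mu$ in the active state) and reports the fraction of time the PU is active, which for such a chain should approach $\lambda/(\lambda+\mu)$ over a long run.

```python
import random


def simulate_pu_activity(lam, mu, duration, seed=0):
    """Simulate a two-state CTMC: state 0 (inactive) and state 1 (active).

    lam: transition rate from state 0 to state 1 (hypothetical value)
    mu:  transition rate from state 1 to state 0 (hypothetical value)
    Returns (segments, busy_fraction), where each segment is
    (start_time, end_time, state).
    """
    rng = random.Random(seed)
    t, state = 0.0, 0          # assumption: the chain starts in the inactive state
    busy_time = 0.0
    segments = []
    while t < duration:
        rate = lam if state == 0 else mu
        sojourn = rng.expovariate(rate)     # exponential holding time
        end = min(t + sojourn, duration)    # truncate the last segment
        segments.append((t, end, state))
        if state == 1:
            busy_time += end - t
        t = end
        state = 1 - state                   # alternate between 0 and 1
    return segments, busy_time / duration


segments, busy_fraction = simulate_pu_activity(lam=2.0, mu=3.0, duration=20000.0)
print("empirical busy fraction:", busy_fraction)
print("lambda / (lambda + mu):  ", 2.0 / (2.0 + 3.0))
```

Changing `lam` and `mu` between runs mimics the slowly varying channel usage pattern that the SU is assumed to track.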
C. Secondary User Energy Detection Model
An SU performs energy detection on a frequency channel for a time duration of $T$. Recall that $W$ denotes the bandwidth of a frequency channel. The energy detector takes $WT$ baseband complex signal samples during an energy detection period. Let $y_i$ denote the $i$th signal sample. Then, we have $y_i = x_i + n_i$, where $x_i$ is a PU signal and $n_i$ is the thermal noise with the noise spectral density of $N_0$. To generate a test statistic, denoted by $Y$, the energy detector estimates the normalized energy in the signal samples as $Y = \frac{1}{N_0 WT} \sum_{i=1}^{WT} |y_i|^2$. Let $\theta$ denote the sensing result. To conclude whether the channel is in use or not, the energy detector compares $Y$ with a given threshold $\delta$. If $Y > \delta$, the detector concludes that the channel is in use (i.e., $\theta = 1$). Otherwise, $\theta = 0$.
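The thresholding rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `energy_detect` and `complex_gaussian` are hypothetical, the per-sample noise power plays the role of the normalizing constant, and the PU signal is drawn as complex Gaussian purely for convenience (the text does not specify the PU waveform).

```python
import random


def energy_detect(samples, noise_power, threshold):
    """Return (Y, theta): normalized energy and the binary sensing result."""
    num_samples = len(samples)                       # plays the role of WT
    y_stat = sum(abs(s) ** 2 for s in samples) / (noise_power * num_samples)
    theta = 1 if y_stat > threshold else 0           # 1 means "channel in use"
    return y_stat, theta


def complex_gaussian(rng, power):
    """Draw one circularly symmetric complex Gaussian sample with E|z|^2 = power."""
    sigma = (power / 2.0) ** 0.5
    return complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))


rng = random.Random(1)
num_samples = 1000        # assumed value of WT
noise_power = 1.0         # assumed per-sample noise power
snr = 0.5                 # assumed SNR of the PU signal
threshold = 1.25          # assumed detection threshold (delta)

# Noise-only samples (PU inactive) and signal-plus-noise samples (PU active).
idle = [complex_gaussian(rng, noise_power) for _ in range(num_samples)]
busy = [complex_gaussian(rng, noise_power) + complex_gaussian(rng, snr * noise_power)
        for _ in range(num_samples)]

y_idle, theta_idle = energy_detect(idle, noise_power, threshold)
y_busy, theta_busy = energy_detect(busy, noise_power, threshold)
print("idle:", y_idle, theta_idle)
print("busy:", y_busy, theta_busy)
```

With these settings the statistic concentrates near 1 when the PU is inactive and near 1 + SNR when it is active, so a threshold between the two separates the cases; shorter sensing periods (smaller `num_samples`) widen the spread and raise the error probability.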
We need to find the distribution of the test statistic and calculate the detection probability. Let $\rho$ denote the average SNR of a PU signal during an energy detection period, i.e., $\rho := \frac{1}{N_0 WT} \sum_{i=1}^{WT} \mathrm{E}[|x_i|^2]$. If the number of signal samples (i.e., $WT$) is sufficiently large, the test statistic $Y$ follows a normal distribution with mean $(1 + \rho)$ and variance $(1 + 2\rho)/(WT)$ [26]. From the distribution of the test statistic, we can calculate the probability that an SU senses the channel to be active (i.e., $\theta = 1$) as a function of the average SNR, $\rho$. From [26], we have

$$D(\rho) := \Pr[Y \geq \delta] = Q\left(\frac{\delta - (1 + \rho)}{\sqrt{(1 + 2\rho)/(WT)}}\right) \tag{1}$$

where $Q$ denotes the Q-function defined as $Q(x) := \frac{1}{\sqrt{2\pi}} \int_x^{\infty} \exp\left(-\frac{u^2}{2}\right) du$.
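Equation (1) is straightforward to evaluate numerically. The sketch below is an illustration rather than the authors' implementation; the function names and the numeric settings are assumptions. It uses the identity $Q(x) = \frac{1}{2}\,\mathrm{erfc}(x/\sqrt{2})$ so that only the Python standard library is needed.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x), via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def detection_probability(rho, threshold, num_samples):
    """D(rho) from (1): probability that the test statistic reaches the threshold.

    rho:         average SNR during the detection period
    threshold:   energy detection threshold (delta)
    num_samples: number of complex samples (WT)
    """
    mean = 1.0 + rho
    std = math.sqrt((1.0 + 2.0 * rho) / num_samples)
    return q_function((threshold - mean) / std)


# Assumed settings: delta = 1.2 and WT = 200.
print("false alarm probability D(0):  ", detection_probability(0.0, 1.2, 200))
print("detection probability D(0.5):  ", detection_probability(0.5, 1.2, 200))
```

Evaluating $D(0)$ gives the false alarm probability, and $1 - D(\gamma)$ gives the misdetection probability for a PU signal of SNR $\gamma$ that is present throughout the detection period.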
D. Channel Sensing and Access to Exploit Short-Lived Spectrum Opportunities
For the proposed scheme, time is divided into frames (Fig. 2) which are indexed by $k$. It is assumed that frame synchronization is maintained in the SU network. The length of a frame is short enough so that the channel usage pattern remains unchanged during a frame. A frame is further divided into a channel learning subframe and a channel access subframe.³ An SU estimates the channel usage pattern on the current channel during a channel learning subframe, and based on the estimated channel usage pattern, it exchanges user data with other SUs during a channel access subframe. A channel learning subframe and a channel access subframe consist of
³We will explain the rationale behind this frame structure in Section III-E.
Fig. 2. Frame structure of the proposed scheme. (Frames $k-2$ to $k+2$ are shown on channels $m$ and $m+1$; each frame consists of a channel learning subframe of $N_L$ slots and a channel access subframe of $N_A$ slots, each slot of length $T$; the legend marks sensing, data transmission, and PU activity.)
Fig. 3. Overall operation of the proposed scheme.
$N_L$ slots and $N_A$ slots, respectively. The length of a slot is $T$. We have to set the length of a slot short enough to prevent PU activity from changing multiple times during a slot. An SU senses the channel and produces a sensing result in each slot during the channel learning subframe. On the other hand, an SU either performs sensing or transmits user data during the channel access subframe.
The overall operation of the proposed scheme for an SU is summarized in Fig. 3. From the sensing results obtained during a channel learning subframe of frame $k$, the SU calculates the estimate of the channel usage pattern in frame $k$, denoted by $\hat{\mathbf{u}}_k = (\hat{\lambda}_k, \hat{\mu}_k, \hat{\gamma}_k)$. Then, based on the estimated channel usage pattern, $\hat{\mathbf{u}}_k$, it decides whether to change the channel or not. If the SU judges that there are sufficient spectrum opportunities to support its quality-of-service (QoS) requirements,⁴ it stays on the current channel and exchanges data packets during the following channel access subframe. Otherwise, it switches to another frequency channel in the next frame. The SU can simply switch to the next available frequency channel, or it can use more sophisticated algorithms
⁴For example, the SU can decide that the QoS is supported if the estimated duty cycle, computed from $\hat{\lambda}_k$ and $\hat{\mu}_k$, and the estimated SNR, $\hat{\gamma}_k$, exceed their respective thresholds.
proposed for the frequency channel selection problem in the literature (e.g., in [7], [8], [16]–[18]).
During the channel learning subframe in frame $k$, the SU estimates the current channel usage pattern, denoted by $\mathbf{u}_k = (\lambda_k, \mu_k, \gamma_k)$. Each of the $N_L$ slots in the channel learning subframe is indexed by $n = 1, \ldots, N_L$. In each slot, the SU performs energy detection and generates a binary sensing result. Let $\theta^L_{k,n}$ denote the sensing result generated in slot $n$ in the channel learning subframe of frame $k$. From the sequence of the sensing results, $H^L_k := \{\theta^L_{k,1}, \ldots, \theta^L_{k,N_L}\}$, the "channel learning algorithm" in the SU calculates the estimate of the channel usage pattern, $\hat{\mathbf{u}}_k$. In Section III, we will explain the channel learning algorithm in detail.
Let us explain the operation of an SU when it decides to access the current channel during a channel access subframe. Each of the $N_A$ slots in the channel access subframe is indexed by $n = 1, \ldots, N_A$. During a slot of the channel access subframe, the SU can either perform sensing or transmit user data. If it chooses to perform sensing in slot $n$, it obtains $\theta^A_{k,n}$, which denotes the sensing result generated in slot $n$ in the channel access subframe of frame $k$. Otherwise, it transmits data packet(s) in slot $n$. For each slot $n$ in the channel access subframe, the "channel access algorithm" residing in the SU decides whether to perform sensing or data transmission, based on the sensing results from slot 1 to slot $(n-1)$. The channel access algorithm also utilizes the channel usage pattern estimated in the preceding channel learning subframe. From this information, the channel access algorithm adjusts its parameters so that it can maximize the channel utilization while limiting the interference caused to the PU network to a tolerable level. We will explain the channel access algorithm in Section IV.
III. LEARNING CHANNEL USAGE PATTERN DURING CHANNEL LEARNING SUBFRAME
A. Hidden Markov Model for Channel Learning Subframe
We model a channel learning subframe as an HMM [19]. An HMM is described by state space, state transition probability, observation space, and observation probability. Consider regularly spaced discrete time instants (e.g., beginning of time slots). At any time instant, the system is in one of the states in the countable state space. The evolution of states over time follows a Markov process in accordance with the state transition probability. The state is hidden to the agent and can only be inferred from noisy observations. At each time, the agent receives an observation from the observation space according to the observation probability. For an HMM, the standard gradient method can be used to find the model parameters that are most likely given the received observation sequence [27]. We will use this technique to estimate the channel usage pattern. For more information on HMM, please refer to [19] and [27].
In our system model, the SU (i.e., the agent) obtains noisy sensing results about underlying PU activities. Therefore, PU activities in a channel can be modeled as hidden states, while sensing results are modeled as observations. Then, the state transition probabilities depend on the state transition rates in PU activity (i.e., $\lambda$ and $\mu$), and the observation
probabilities are related to the detection probabilities, which in turn are determined mainly by the SNR of a PU signal (i.e., $\gamma$). This means that the state transition and observation probabilities are functions of the channel usage pattern. From the HMM, we can calculate the log-likelihood of the received sensing results, $H^L_k$, given the channel usage pattern, $\mathbf{u}$, that is, $\ln(\Pr[H^L_k \mid \mathbf{u}])$. To find the most likely channel usage pattern for the received sensing results, the SU updates the estimate of the channel usage pattern toward the gradient direction so that $\ln(\Pr[H^L_k \mid \mathbf{u}])$ increases in each iteration. We will explain the details of the algorithm later in this section.
To set up an HMM, we first define states and observations. As seen in Fig. 4, a state is defined for each slot to reflect the PU activities at the start and the end of the slot. Let $t = 0$ at the start of the channel learning subframe. Then, $\alpha_n$ denotes the PU activity at time $t = (n-1)T$ (i.e., at the start of slot $n$ or at the end of slot $(n-1)$). We have $\alpha_n = 1$ if the PU is active at $t = (n-1)T$, and $\alpha_n = 0$ otherwise. The state of slot $n$, which is denoted by $s_n$, is defined as the vector of the PU activities at the start and the end of slot $n$, i.e., $s_n := (\alpha_n, \alpha_{n+1})$. Then, $s_n$ is one of four possible states in the state space $\mathcal{S} := \{(0,0), (0,1), (1,0), (1,1)\}$. If we consider an HMM of length $N$, a sequence of the states is given by $\mathbf{s} := \{s_1, \ldots, s_N\}$. We assume that a slot is short enough so that the PU activity does not change more than once within a slot. Then, if the state is $(0,0)$ or $(1,1)$, the PU stays inactive or active all along a slot. On the other hand, if the state is $(0,1)$ or $(1,0)$, the PU activity changes once during a slot. The observation in slot $n$, which is from the observation space $\mathcal{O} := \{0, 1\}$, is denoted by $o_n$. The observation $o_n$ is equal to the sensing result from slot $n$. That is, if the current frame is $k$, we have $o_n = \theta^L_{k,n}$. Let $\mathbf{o} := \{o_1, \ldots, o_N\}$ be a sequence of the observations.
Now, we define the state transition and observation probabilities. Let $p^{l,m}_{i,j}$ denote the state transition probability from state $(i,j)$ to state $(l,m)$. That is, $p^{l,m}_{i,j} := \Pr[s_{n+1} = (l,m) \mid s_n = (i,j)]$. Since the PU activity at the end of a slot is the same as that at the start of the next slot, we have $p^{l,m}_{i,j} = 0$ for $j \neq l$. If $j = l$, then $p^{l,m}_{i,j}$ is equal to the probability that $\alpha_{n+1} = m$ given $\alpha_n = l$, i.e., $\Pr[\alpha_{n+1} = m \mid \alpha_n = l]$. Let $p_{i,j}$ denote $\Pr[\alpha_{n+1} = j \mid \alpha_n = i]$. If $\mathbf{u} = (\lambda, \mu, \gamma)$ is the channel usage pattern in the frame of interest, we can calculate $p_{0,0} = e^{-\lambda T}$, $p_{0,1} = 1 - e^{-\lambda T}$, $p_{1,0} = 1 - e^{-\mu T}$, and $p_{1,1} = e^{-\mu T}$. Therefore, we can calculate the state transition probability matrix as
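$$\mathbf{p} := \begin{bmatrix} p^{0,0}_{0,0} & p^{0,0}_{0,1} & p^{0,0}_{1,0} & p^{0,0}_{1,1} \\ p^{0,1}_{0,0} & p^{0,1}_{0,1} & p^{0,1}_{1,0} & p^{0,1}_{1,1} \\ p^{1,0}_{0,0} & p^{1,0}_{0,1} & p^{1,0}_{1,0} & p^{1,0}_{1,1} \\ p^{1,1}_{0,0} & p^{1,1}_{0,1} & p^{1,1}_{1,0} & p^{1,1}_{1,1} \end{bmatrix} = \begin{bmatrix} e^{-\lambda T} & 0 & e^{-\lambda T} & 0 \\ 1 - e^{-\lambda T} & 0 & 1 - e^{-\lambda T} & 0 \\ 0 & 1 - e^{-\mu T} & 0 & 1 - e^{-\mu T} \\ 0 & e^{-\mu T} & 0 & e^{-\mu T} \end{bmatrix}. \tag{2}$$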
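The structure of the matrix in (2) can be checked with a small script. The following Python sketch is illustrative only: the function names and the numeric values of the rates and the slot length are assumptions. Rows correspond to the next state $s_{n+1}$ and columns to the current state $s_n$, both ordered as $(0,0), (0,1), (1,0), (1,1)$, so every column should sum to one.

```python
import math

STATES = [(0, 0), (0, 1), (1, 0), (1, 1)]   # ordering of slot states


def pu_transition_probs(lam, mu, slot_len):
    """Per-slot PU transition probabilities p_{i,j} for rates lam and mu."""
    p00 = math.exp(-lam * slot_len)
    p11 = math.exp(-mu * slot_len)
    return {(0, 0): p00, (0, 1): 1.0 - p00, (1, 0): 1.0 - p11, (1, 1): p11}


def state_transition_matrix(lam, mu, slot_len):
    """4-by-4 matrix of (2): entry [row][col] is Pr[s_{n+1} = row | s_n = col]."""
    p = pu_transition_probs(lam, mu, slot_len)
    matrix = []
    for (l, m) in STATES:                    # next state (row)
        row = []
        for (i, j) in STATES:                # current state (column)
            # A transition is possible only if the end of the current slot
            # matches the start of the next slot (j == l).
            row.append(p[(l, m)] if j == l else 0.0)
        matrix.append(row)
    return matrix


# Assumed values: lam = 50 /s, mu = 80 /s, slot length T = 1 ms.
P = state_transition_matrix(50.0, 80.0, 1e-3)
for row in P:
    print(["%.4f" % v for v in row])
print("column sums:", [sum(P[r][c] for r in range(4)) for c in range(4)])
```

Printing the matrix makes the zero pattern visible: half of the entries vanish because consecutive slot states must agree on the PU activity at the shared slot boundary.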
The initial state distribution is denoted by $\boldsymbol{\pi} := (\pi_{0,0}, \pi_{0,1}, \pi_{1,0}, \pi_{1,1})^{\mathsf{T}}$, where $\pi_{i,j} := \Pr[s_1 = (i,j)]$. It is assumed that the initial state distribution is equal to the stationary state distribution. Therefore, we have

$$\boldsymbol{\pi} = \left( \frac{p_{0,0}\, p_{1,0}}{p_{0,1} + p_{1,0}},\ \frac{p_{0,1}\, p_{1,0}}{p_{0,1} + p_{1,0}},\ \frac{p_{1,0}\, p_{0,1}}{p_{0,1} + p_{1,0}},\ \frac{p_{1,1}\, p_{0,1}}{p_{0,1} + p_{1,0}} \right)^{\mathsf{T}}.$$

Fig. 4. State transition in a subframe.
We define $q^{\theta}_{i,j}$ as the probability that the observation $o_n$ is $\theta$ given that the state $s_n$ is $(i,j)$. That is, $q^{\theta}_{i,j} := \Pr[o_n = \theta \mid s_n = (i,j)]$. Recall that $D(\rho)$ is the probability of detecting PU activity during a slot when the average SNR corresponding to a PU signal is $\rho$. If the state is $(0,0)$, the average SNR of a PU signal during the slot is 0, and therefore $q^{1}_{0,0} = D(0)$. In the case that the state is $(1,1)$, the average SNR during the slot is $\gamma$, since the SU receives a PU signal all along the slot. Thus, we have $q^{1}_{1,1} = D(\gamma)$. On the other hand, when the state is $(1,0)$, the PU activity changes from active to inactive at a time point during the slot. If the channel becomes inactive after time $t$ from the start of the slot, the average SNR during the slot is $\gamma t/T$. Also, the probability density function (pdf) of the elapsed time until the PU activity changes is given as $\frac{\mu e^{-\mu t}}{1 - e^{-\mu T}}$. Therefore, we have $q^{1}_{1,0} = \int_0^T \frac{\mu e^{-\mu t}}{1 - e^{-\mu T}} D(\gamma t/T)\, dt$. We can also calculate $q^{1}_{0,1} = \int_0^T \frac{\lambda e^{-\lambda t}}{1 - e^{-\lambda T}} D(\gamma - \gamma t/T)\, dt$ in a similar way. To simplify the HMM, we introduce $\Upsilon(\gamma) := \frac{1}{T} \int_0^T D(\gamma t/T)\, dt$. When $\lambda$ and $\mu$ are sufficiently small, the pdf of the transition instant is nearly uniform over the slot (i.e., approximately $1/T$), so both $q^{1}_{1,0}$ and $q^{1}_{0,1}$ can be well approximated by $\Upsilon(\gamma)$. From this approximation, the observation probability matrix is given as
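$$\mathbf{q} := \begin{bmatrix} q^{0}_{0,0} & q^{0}_{0,1} & q^{0}_{1,0} & q^{0}_{1,1} \\ q^{1}_{0,0} & q^{1}_{0,1} & q^{1}_{1,0} & q^{1}_{1,1} \end{bmatrix} = \begin{bmatrix} 1 - D(0) & 1 - \Upsilon(\gamma) & 1 - \Upsilon(\gamma) & 1 - D(\gamma) \\ D(0) & \Upsilon(\gamma) & \Upsilon(\gamma) & D(\gamma) \end{bmatrix}. \tag{3}$$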
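The observation probabilities in (3) have no closed form for the mixed states, but they are easy to evaluate numerically. The sketch below is a hedged illustration: the function names, the midpoint-rule integration, and all numeric settings are assumptions, not part of the paper. It computes $\Upsilon(\gamma)$, builds the matrix in (3), and compares $\Upsilon(\gamma)$ with the exact integral for the active-to-inactive state to show how close the approximation is when $\mu T$ is small.

```python
import math


def q_function(x):
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def detect_prob(rho, delta, wt):
    """D(rho) from (1) with threshold delta and WT samples."""
    return q_function((delta - (1.0 + rho)) / math.sqrt((1.0 + 2.0 * rho) / wt))


def upsilon(gamma, delta, wt, steps=2000):
    """Midpoint rule for (1/T) * integral of D(gamma*t/T) over [0, T]."""
    return sum(detect_prob(gamma * (k + 0.5) / steps, delta, wt)
               for k in range(steps)) / steps


def q_exact_active_to_idle(mu, slot, gamma, delta, wt, steps=2000):
    """Exact q^1_{1,0}: D(gamma*t/T) weighted by the pdf of the switching instant."""
    h = slot / steps
    norm = 1.0 - math.exp(-mu * slot)
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        total += mu * math.exp(-mu * t) / norm * detect_prob(gamma * t / slot, delta, wt) * h
    return total


def observation_matrix(gamma, delta, wt):
    """2-by-4 matrix of (3); columns ordered (0,0), (0,1), (1,0), (1,1)."""
    ups = upsilon(gamma, delta, wt)
    row1 = [detect_prob(0.0, delta, wt), ups, ups, detect_prob(gamma, delta, wt)]
    row0 = [1.0 - v for v in row1]
    return [row0, row1]


# Assumed settings: gamma = 0.5, delta = 1.2, WT = 200, T = 1 ms, mu = 80 /s.
q = observation_matrix(0.5, 1.2, 200)
print("Pr[o = 1 | state]:", ["%.4f" % v for v in q[1]])
print("Upsilon(gamma):   ", upsilon(0.5, 1.2, 200))
print("exact q^1_{1,0}:  ", q_exact_active_to_idle(80.0, 1e-3, 0.5, 1.2, 200))
```

Using a single value $\Upsilon(\gamma)$ for both mixed states removes the dependence of the observation probabilities on $\lambda$ and $\mu$, which keeps the model simpler at the cost of a small approximation error.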
Given the HMM defined by the state transition and observation probabilities, the problem at hand is the parameter estimation problem in which the true channel usage pattern is estimated from the received sensing results (i.e., the observation, $\mathbf{o} = \{o_1, \ldots, o_N\}$). The true channel usage pattern is denoted by $\mathbf{u}^* = (\lambda^*, \mu^*, \gamma^*)$.
B. Equivalence, Identifiability, and Consistency of Proposed Hidden Markov Model
The problem of parameter estimation in the proposed HMM is not a trivial problem since the SU can only see the observations, not the underlying states. For example, when the observation changes, the SU does not know whether it is caused by a PU state transition or a channel sensing error. Thus, one can suspect that a high sensing error rate can be misinterpreted as a high PU transition rate, leading to incorrect estimation of the true channel usage pattern. Fortunately, the PU state transition and the channel sensing error
induce different statistical characteristics of the observation sequence, and the true channel usage pattern is identifiable from the standpoint of the SU under some mild conditions.
Let us explain the equivalence and the identifiability of HMMs. Two HMMs with different parameters, $\mathbf{u}$ and $\tilde{\mathbf{u}}$, are said to be equivalent if and only if they generate the same stochastic observation sequence, i.e.,

$$\Pr[\mathbf{o} = \mathbf{x} \mid \mathbf{u}] = \Pr[\mathbf{o} = \mathbf{x} \mid \tilde{\mathbf{u}}], \quad \forall N = 1, 2, \ldots, \text{ and } \forall x_n \in \{0, 1\} \text{ for } n = 1, \ldots, N \tag{4}$$
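where $\mathbf{x} := \{x_1, \ldots, x_N\}$. With a slight abuse of notation, let $\pi_{i,j}(\mathbf{u}) := \Pr[s_1 = (i,j) \mid \mathbf{u}]$, $p_{i,j}(\mathbf{u}) := \Pr[\alpha_{n+1} = j \mid \alpha_n = i, \mathbf{u}]$, and $q^{\theta}_{i,j}(\mathbf{u}) := \Pr[o_n = \theta \mid s_n = (i,j), \mathbf{u}]$ denote the initial, transition, and observation probabilities given the channel usage pattern $\mathbf{u}$. We can calculate

$$\Pr[\mathbf{o} = \mathbf{x} \mid \mathbf{u}] = \sum_{y_1, \ldots, y_{N+1}} \pi_{y_1, y_2}(\mathbf{u}) \prod_{n=2}^{N} p_{y_n, y_{n+1}}(\mathbf{u}) \prod_{n=1}^{N} q^{x_n}_{y_n, y_{n+1}}(\mathbf{u}) \tag{5}$$

where $y_n \in \{0, 1\}$ for all $n$. If two HMMs are equivalent, it is impossible to distinguish these HMMs based on the observations.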
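The sum in (5) runs over $2^{N+1}$ PU activity sequences, so evaluating it directly is practical only for short observation sequences. As a sanity check, the Python sketch below evaluates (5) by brute force and compares it with the standard forward recursion for HMMs, which gives the same value in time linear in $N$. The function names and the toy probabilities are assumptions chosen for illustration; they are not taken from the paper.

```python
import itertools


def stationary_initial(p):
    """Initial distribution pi_{i,j} = Pr[alpha_1 = i] * p_{i,j} (stationary start)."""
    denom = p[(0, 1)] + p[(1, 0)]
    marginal = {0: p[(1, 0)] / denom, 1: p[(0, 1)] / denom}
    return {(i, j): marginal[i] * p[(i, j)] for i in (0, 1) for j in (0, 1)}


def obs_prob(q1, x, i, j):
    """q^x_{i,j}: probability of observing x in a slot whose state is (i, j)."""
    return q1[(i, j)] if x == 1 else 1.0 - q1[(i, j)]


def likelihood_bruteforce(x, p, q1):
    """Direct evaluation of (5): sum over all PU activity sequences y_1..y_{N+1}."""
    n = len(x)
    pi = stationary_initial(p)
    total = 0.0
    for y in itertools.product((0, 1), repeat=n + 1):
        term = pi[(y[0], y[1])]
        for k in range(1, n):                       # transitions for slots 2..N
            term *= p[(y[k], y[k + 1])]
        for k in range(n):                          # observations for slots 1..N
            term *= obs_prob(q1, x[k], y[k], y[k + 1])
        total += term
    return total


def likelihood_forward(x, p, q1):
    """Forward recursion; fwd[j] = Pr[o_1..o_n = x_1..x_n, alpha_{n+1} = j]."""
    pi = stationary_initial(p)
    fwd = [sum(pi[(i, j)] * obs_prob(q1, x[0], i, j) for i in (0, 1)) for j in (0, 1)]
    for xn in x[1:]:
        fwd = [sum(fwd[i] * p[(i, j)] * obs_prob(q1, xn, i, j) for i in (0, 1))
               for j in (0, 1)]
    return sum(fwd)


# Toy values (assumed): per-slot transition and detection probabilities.
p = {(0, 0): 0.95, (0, 1): 0.05, (1, 0): 0.08, (1, 1): 0.92}
q1 = {(0, 0): 0.05, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 0.9}
x = (1, 0, 0, 1, 1)
print(likelihood_bruteforce(x, p, q1), likelihood_forward(x, p, q1))
```

Two parameter sets are equivalent in the sense of (4) exactly when a function like `likelihood_forward` returns the same value for every observation sequence of every length.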
To test the equivalence of two HMMs, we can apply the algorithm proposed in [28] for the aggregated Markov process (AMP). The AMP is a class of the HMM where an observation is a deterministic function of a state. Our HMM can be converted to an AMP. Different from the state of an HMM, the state of the corresponding AMP is a vector composed of a sensing result and a PU state, that is, $s_n = (o_n, \alpha_{n+1})$. The transition probability matrix of an AMP is a 4-by-4 matrix such that

$$\mathbf{h} := \begin{bmatrix} \mathbf{h}_0 & \mathbf{h}_0 \\ \mathbf{h}_1 & \mathbf{h}_1 \end{bmatrix}, \quad \text{where } \mathbf{h}_{\theta} := \begin{bmatrix} p_{0,0}\, q^{\theta}_{0,0} & p_{1,0}\, q^{\theta}_{1,0} \\ p_{0,1}\, q^{\theta}_{0,1} & p_{1,1}\, q^{\theta}_{1,1} \end{bmatrix}, \quad \text{for } \theta = 0, 1. \tag{6}$$

The initial state distribution is equal to the stationary state distribution. Let $f$ denote the deterministic function mapping the state to the observation. We have $f((0,0)) = 0$, $f((0,1)) = 0$, $f((1,0)) = 1$, and $f((1,1)) = 1$. We can easily verify that this AMP is exactly the same as the original HMM. The following theorem states the condition for two AMPs to be equivalent.
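Theorem 1 (Equivalence of two AMPs). The AMP with the transition probability matrix $\mathbf{h}$ is equivalent to the AMP with the transition probability matrix $\tilde{\mathbf{h}}$ if and only if the following conditions are met.
• If $\mathbf{1}^{\mathsf{T}} \mathbf{h}_0 \mathbf{e} = 0$ and $\mathbf{1}^{\mathsf{T}} \tilde{\mathbf{h}}_0 \mathbf{e} = 0$, the following equality holds: $\mathbf{1}^{\mathsf{T}} \mathbf{h}_0 = \mathbf{1}^{\mathsf{T}} \tilde{\mathbf{h}}_0$.
• Otherwise, there exists a 2-by-2 matrix $\mathbf{X}$ such that $\mathbf{1}^{\mathsf{T}} \mathbf{X} = \mathbf{1}^{\mathsf{T}}$, $\mathbf{X} \mathbf{h}_0 = \tilde{\mathbf{h}}_0 \mathbf{X}$, and $\mathbf{X} \mathbf{h}_1 = \tilde{\mathbf{h}}_1 \mathbf{X}$,
where $\mathbf{e} = (1, -1)^{\mathsf{T}}$ and $\mathbf{1}$ is a column vector of all ones.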
Proof: See Appendix A for the proof.

An HMM with the true parameter $\mathbf{u}^* \in \mathcal{U}$ is said to be identifiable if and only if, for all $\mathbf{u} \in \mathcal{U}$ such that $\mathbf{u} \neq \mathbf{u}^*$, the HMM with the parameter $\mathbf{u}$ is not equivalent to the HMM with the true parameter $\mathbf{u}^*$. We can estimate the true parameter of an HMM from the observations only if the HMM is identifiable. In the following theorem, we provide a condition for the AMP corresponding to an HMM to be identifiable.

Theorem 2 (Identifiability of an AMP). The AMP with the transition probability matrix $\mathbf{h}$ is identifiable if $\mathbf{1}^T \mathbf{h}_0 \mathbf{e} \neq 0$ and there does not exist any 2-by-2 matrix $\mathbf{X} \neq \mathbf{I}$ and $\gamma \geq 0$ that satisfies

$$\mathbf{1}^T \mathbf{X} = \mathbf{1}^T \quad \text{and} \quad \mathbf{F}(\gamma) \circ (\mathbf{X} \mathbf{h}_0 \mathbf{X}^{-1}) = \mathbf{G}(\gamma) \circ (\mathbf{X} \mathbf{h}_1 \mathbf{X}^{-1}) \tag{7}$$

where $\mathbf{I}$ is the identity matrix, the notation $\circ$ is the entrywise (Hadamard) product, and $\mathbf{F}(\gamma)$ and $\mathbf{G}(\gamma)$ are 2-by-2 matrices such that

$$\mathbf{F}(\gamma) := \begin{bmatrix} D(0) & \Upsilon(\gamma) \\ \Upsilon(\gamma) & D(\gamma) \end{bmatrix} \quad \text{and} \quad \mathbf{G}(\gamma) := \begin{bmatrix} 1 - D(0) & 1 - \Upsilon(\gamma) \\ 1 - \Upsilon(\gamma) & 1 - D(\gamma) \end{bmatrix}. \tag{8}$$

Proof: See Appendix B for the proof.

Roughly speaking, $\mathbf{X}$ and $\gamma$ satisfying the condition in (7) do not exist in general, since the condition involves five variables (i.e., $\gamma$ and four entries in $\mathbf{X}$) while there are six equations. Although it is hard to make a more precise statement, we can say that the proposed HMM is identifiable in most cases if $\mathbf{1}^T \mathbf{h}_0 \mathbf{e} \neq 0$ is satisfied.

As long as an HMM is identifiable, the maximum likelihood (ML) estimation can find the true channel usage pattern. Let us define $\Lambda(\mathbf{o}; \mathbf{u}) := \ln(\Pr[\mathbf{o} \mid \mathbf{u}])$ as the log-likelihood of the observation $\mathbf{o}$ given the channel usage pattern $\mathbf{u}$. The ML estimator of the true channel usage pattern $\mathbf{u}^*$ is obtained from

$$\hat{\mathbf{u}} = \arg\max_{\mathbf{u} \in \mathcal{U}} \Lambda(\mathbf{o}; \mathbf{u}). \tag{9}$$

The ML estimator $\hat{\mathbf{u}}$ of $\mathbf{u}^*$ is said to be strongly consistent when $\hat{\mathbf{u}}$ almost surely converges to $\mathbf{u}^*$ as the length of observations, $N$, goes to infinity. In [29], it was proven that the strong consistency holds if an HMM with the true parameter $\mathbf{u}^*$ is identifiable. In our problem, the strong consistency means that the ML estimator in (9) can estimate the true channel usage pattern $\mathbf{u}^*$ in $\mathcal{U}$ if the length of the channel learning subframe is long enough.
C. Gradient Method for Maximum Likelihood Estimation of Channel Usage Pattern

For the given observation, the ML estimator in (9) can be found by using either the expectation-maximization (EM) algorithm or the standard gradient method [19]. In this paper, we adopt the gradient method since the EM algorithm can only be used in the case of the usual parametrization, whereas the gradient method can easily be modified so that it recursively updates the parameter. Unfortunately, the gradient method as well as the EM algorithm can only find a local optimum since $\Lambda(\mathbf{o}; \mathbf{u})$ is not a concave function. Algorithms that globally maximize the log-likelihood function of a general HMM are not known yet [27].

In each iteration, the gradient method moves the estimate of the channel usage pattern in the gradient direction of the log-likelihood function $\Lambda(\mathbf{o}; \mathbf{u})$. Let $\hat{\mathbf{u}}^{(m)}$ denote the estimate of the channel usage pattern at the $m$th iteration. The initial estimate $\hat{\mathbf{u}}^{(0)}$ can be set to an arbitrary channel usage pattern in $\mathcal{U}$. At the $m$th iteration, the gradient method updates the estimate as follows:

$$\hat{\mathbf{u}}^{(m)} = \Pi_{\mathcal{U}}\left[\hat{\mathbf{u}}^{(m-1)} + \epsilon^{(m)} \cdot \nabla\Lambda(\mathbf{o}; \hat{\mathbf{u}}^{(m-1)})\right] \tag{10}$$

where $\epsilon^{(m)}$ is a step size, $\Pi_{\mathcal{U}}[\cdot]$ is the projection onto the set $\mathcal{U}$, and $\nabla\Lambda(\mathbf{o}; \mathbf{u})$ is the gradient of $\Lambda(\mathbf{o}; \mathbf{u})$ such that

$$\nabla\Lambda(\mathbf{o}; \mathbf{u}) := \left(\frac{\partial\Lambda}{\partial\lambda}(\mathbf{o}; \mathbf{u}),\ \frac{\partial\Lambda}{\partial\mu}(\mathbf{o}; \mathbf{u}),\ \frac{\partial\Lambda}{\partial\gamma}(\mathbf{o}; \mathbf{u})\right). \tag{11}$$

The iteration stops when $\hat{\mathbf{u}}^{(m)}$ sufficiently converges to a certain channel usage pattern.
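The projected update in (10) can be sketched generically as follows. This is a minimal illustration, not the paper's implementation: the feasible set is assumed to be a box so that the projection is a coordinate-wise clip, the gradient is supplied by the caller, and the toy objective with its `peak` value is a hypothetical stand-in for the log-likelihood.

```python
def project_box(u, lower, upper):
    """Project a point onto a box by clipping each coordinate."""
    return [min(max(v, lo), hi) for v, lo, hi in zip(u, lower, upper)]


def projected_gradient_ascent(grad, u0, lower, upper, step=0.1,
                              max_iter=1000, tol=1e-10):
    """Repeat u <- Proj[u + step * grad(u)] until the estimate stops moving."""
    u = project_box(u0, lower, upper)
    for _ in range(max_iter):
        g = grad(u)
        u_next = project_box([v + step * gv for v, gv in zip(u, g)], lower, upper)
        if max(abs(a - b) for a, b in zip(u_next, u)) < tol:
            return u_next
        u = u_next
    return u


# Toy concave objective -sum((u - peak)^2); its unconstrained maximizer lies
# outside the box in the second coordinate, so the projection becomes active.
peak = [0.4, 1.5, -3.0]


def toy_grad(u):
    return [-2.0 * (v - p) for v, p in zip(u, peak)]


u_hat = projected_gradient_ascent(toy_grad, [0.5, 0.5, 0.0],
                                  lower=[0.0, 0.0, -10.0], upper=[1.0, 1.0, 10.0])
```

In this toy run the second coordinate ends on the boundary of the box while the other two reach the unconstrained maximizer. With a non-concave objective such as the HMM log-likelihood, the same loop would only be expected to reach a local optimum that depends on the starting point.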
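The gradient in (11) can be derived by calculating the partial derivatives of $\Lambda(\mathbf{o}; \mathbf{u})$ with respect to $\lambda$, $\mu$, and $\gamma$. In Appendix C, we calculate the partial derivatives. We can calculate the quantities $f(\mathbf{o}; \mathbf{u})$, $\xi_i(\mathbf{o}; \mathbf{u})$, $\xi_{i,j}(\mathbf{o}; \mathbf{u})$, and $\xi^z_{i,j}(\mathbf{o}; \mathbf{u})$ that appear in these derivatives by using the forward-backward method in [19].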
D. Recursive Algorithm for Maximum Likelihood Estimation
The above-mentioned gradient method has to update the channel usage pattern multiple times within a frame, which can be computationally complex. To reduce the complexity, we can alternatively adopt the recursive algorithm [20]. The recursive algorithm updates the estimate of the channel usage pattern only once in each frame $n$ on the basis of its sensing result $\boldsymbol{\zeta}^L_n$. Over multiple frames, the estimate gradually converges to the true channel usage pattern. If $\hat{\mathbf{u}}_n$ denotes the estimate of the channel usage pattern in frame $n$, the recursive algorithm updates the estimate as

$$\hat{\mathbf{u}}_n = \Pi_{\mathcal{U}}\left[\hat{\mathbf{u}}_{n-1} + \epsilon_n \cdot \nabla\Lambda(\boldsymbol{\zeta}^L_n; \hat{\mathbf{u}}_{n-1})\right] \tag{12}$$

where $\epsilon_n$ is the step size for frame $n$.

The recursive algorithm minimizes the following Kullback-Leibler divergence [20]:

$$K(\mathbf{u}) = \mathrm{E}_{\mathbf{u}^*}\left[\ln \frac{\Pr[\mathbf{o} \mid \mathbf{u}^*]}{\Pr[\mathbf{o} \mid \mathbf{u}]}\right]. \tag{13}$$

If the HMM with the true parameter $\mathbf{u}^*$ is identifiable, the Kullback-Leibler divergence has a unique minimizer at $\mathbf{u}^*$. In addition, $-\nabla\Lambda(\boldsymbol{\zeta}^L_n; \hat{\mathbf{u}}_{n-1})$ in (12) is the stochastic gradient of the Kullback-Leibler divergence. Therefore, the recursive algorithm in (12) can estimate the true channel usage pattern by minimizing the Kullback-Leibler divergence. Similar to the gradient method in (10) for the ML estimator, the recursive algorithm can only find a local minimum since the Kullback-Leibler divergence is generally not a convex function. However, if the initial estimate is close enough to $\mathbf{u}^*$, we can say that $\hat{\mathbf{u}}_n$ converges to $\mathbf{u}^*$ with high probability.
E. Rationale Behind the Proposed Frame Structure
In the proposed frame structure, we have assigned the channel learning subframe dedicated to the estimation of the channel usage pattern, instead of just embedding the estimation algorithm in the traditional listen-before-talk policy and making use of the sensing results generated for data transmission. In this section, we explain the advantages of the proposed structure over the latter strategy.

We can easily adapt the proposed HMM (AMP) so that it can also be applied to the listen-before-talk policy. The listen-before-talk policy senses the channel every $J$ slots and uses the rest of the slots for data transmission. Without loss of generality, sensing slot $k$ starts at time $t = (k - 1)JT$ and ends at time $t = (k - 1)JT + T$. Let $\alpha_{k+1}$ denote the PU activity at time $t = (k - 1)JT + T$ and let $o_k$ denote the sensing result from sensing slot $k$. Then, we can define the transition probability $q_{i,j}$ and the observation probability $r^z_{i,j}$ in the same way as in the original HMM.

We will show that the estimation of the channel usage pattern becomes more difficult as $J$ increases. As $J$ increases, the PU activity $\alpha_{k+1}$ becomes less dependent upon the previous PU activity $\alpha_k$. Therefore, the transition probability $q_{i,j}$ converges to the stationary probability as $J$ goes to infinity. That is, $q_{i,0} \to \mu/(\lambda + \mu)$ and $q_{i,1} \to \lambda/(\lambda + \mu)$ for $i = 0, 1$ as $J \to \infty$. Similarly, the observation probability also converges as $r^z_{1,j} - r^z_{0,j} \to 0$ for $z = 0, 1$ and $j = 0, 1$ as $J \to \infty$. From (6), we can see that $\mathbf{1}^T \mathbf{h}_0 \mathbf{e} \to 0$ as $J \to \infty$. Recall that, according to Theorem 2, an AMP is unidentifiable if $\mathbf{1}^T \mathbf{h}_0 \mathbf{e} = 0$. Therefore, we can say that an AMP becomes less identifiable as $J$ increases. Roughly speaking, this is because, when $J$ is large, a transition in PU activity looks similar to a sensing error due to the statistical independence between the PU activities at consecutive sensing slots.

From this observation, we can conclude that the proposed channel learning subframe (i.e., $J = 1$) performs better than the estimation algorithm used in the listen-before-talk policy (i.e., $J > 1$) and is capable of estimating the channel usage pattern with high transition rates.
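The loss of dependence between consecutive sensing slots can be checked numerically. The sketch below is an illustration under stated assumptions: it uses the textbook transition probabilities of a two-state continuous-time Markov chain, which may differ from the per-slot expressions used in the paper, and the rates, slot length, and function names are hypothetical choices.

```python
import math


def activity_transition(lam, mu, duration):
    """Exact transition matrix of a two-state continuous-time Markov chain.

    lam is the rate of leaving state 0 and mu the rate of leaving state 1.
    """
    total = lam + mu
    decay = math.exp(-total * duration)
    q01 = lam / total * (1.0 - decay)
    q10 = mu / total * (1.0 - decay)
    return [[1.0 - q01, q01], [q10, 1.0 - q10]]


def memory(lam, mu, slot, spacing):
    """q_{1,1} - q_{0,1}: how much the next PU activity depends on the previous one."""
    q = activity_transition(lam, mu, spacing * slot)
    return q[1][1] - q[0][1]


LAM = MU = 200.0   # hypothetical rates: 0.2 kHz expressed in events per second
SLOT = 20e-6       # slot length in seconds
memories = {J: memory(LAM, MU, SLOT, J) for J in (1, 10, 100, 1000)}
```

Under this model the difference equals `exp(-(lam + mu) * J * T)`, so it stays close to one when every slot is sensed and decays toward zero as the spacing between sensing slots grows, which is the regime in which both rows of the transition matrix approach the stationary distribution.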
IV. DATA TRANSMISSION DURING CHANNEL ACCESS SUBFRAME
A. Partially Observable Markov Decision Process Model for Channel Access Subframe
During a channel access subframe, the SU exploits spectrum opportunities to transmit its own data. The channel access algorithm is responsible for transmitting user data while limiting the probability of collision with a PU. This algorithm should be able to cope with sensing errors. At the same time, it should reduce the time wasted on channel sensing as much as possible to maximize channel utilization. The proposed algorithm adopts a strategy different from the traditional listen-before-talk policy. First, the algorithm combines the most recent sensing result with previous sensing results to extract reliable information from erroneous sensing results. Second, the algorithm adaptively decides whether to perform sensing or transmit user data in each time slot to prevent unnecessary sensing [30]. We devise an algorithm that accomplishes these tasks by using a POMDP framework [24]. In addition, the channel access algorithm should have correct knowledge of the current channel usage pattern of the PU so that it can properly configure the parameters for channel access. Therefore, the algorithm makes use of the channel usage pattern estimated in the preceding channel learning subframe.

To design the channel access algorithm, we model the channel access subframe as a POMDP [24], [31]. In a POMDP model, similar to an HMM, the agent only receives probabilistic observations, while the states are hidden from the agent. However, unlike in an HMM, the agent does not only receive observations in a passive manner, but also takes actions to exert influence on the system. The action taken by the agent affects state transition and observation probabilities. Moreover, the agent acquires a reward according to the action. At each time point, the agent takes into account the observations received until then to choose the action that is expected to return the maximum reward. In our model, the agent (i.e., the SU) chooses an action between sensing and data transmission. The reward value depends on whether data transmission is successful or results in collision with PU traffic.

We need to define the states, the actions, and the observations for our model. The definition of a state is the same as that in the HMM. Thus, $\mathbf{s}_k$ denotes the state of slot $k$ during the channel access subframe, which represents the PU activity at the start and the end of slot $k$. Let $a_k$ denote the action in slot $k$. If the SU opts to transmit data in slot $k$, we have $a_k = 1$; if it chooses to sense during slot $k$, we have $a_k = 0$. We define $\mathcal{A} := \{0, 1\}$ as the action space. The observation is also similar to that in the HMM, except for the case in which the SU does not perform sensing because it transmits data. If the SU performs sensing during slot $k$, i.e., if $a_k = 0$, the observation (i.e., $o_k$) is equal to the sensing result, $\zeta^A_{n,k}$. For slot $k$ with $a_k = 1$, the observation $o_k$ is a null observation, $\emptyset$. Hence, the observation space for a channel access subframe is $\mathcal{O} := \{\emptyset, 0, 1\}$.

The state transition and observation probabilities are calculated from the channel usage pattern estimated in the channel learning subframe. In our model, an action does not affect the state transition probabilities. The state transition probabilities in the POMDP are the same as those in the HMM. That is, we use $p^{l,m}_{i,j}$ to denote the state transition probability from state $(l, m)$ to state $(i, j)$, and calculate it from the state transition probability matrix (2) by substituting $\lambda$ and $\mu$ with $\hat{\lambda}_n$ and $\hat{\mu}_n$, respectively. Different from the HMM, the observation probabilities in the POMDP depend on the action, since the SU receives a null observation when it chooses to transmit data. Let $r^z_{i,j}(a)$ denote the observation probability such that $o_k = z$ given $\mathbf{s}_k = (i, j)$ and $a_k = a$, i.e., $r^z_{i,j}(a) := \Pr[o_k = z \mid \mathbf{s}_k = (i, j), a_k = a]$. If the action is sensing, i.e., if $a = 0$, the observation probability $r^z_{i,j}(a)$ is equal to $r^z_{i,j}$ of the HMM for $(i, j) \in \mathcal{S}$ and $z = 0, 1$. Therefore, these observation probabilities can be derived from the observation probability matrix (3) by using the estimate of the channel usage pattern, $\hat{\mathbf{u}}_n$. In addition, we have $r^{\emptyset}_{i,j}(0) = 0$, $r^{\emptyset}_{i,j}(1) = 1$, $r^0_{i,j}(1) = 0$, and $r^1_{i,j}(1) = 0$.
Let us explain the reward model. First, we define two performance measures: channel utilization and collision probability. The channel utilization is defined as the probability of successful data transmission. Data transmission is successful in the case that the SU transmits data (i.e., $a_k = 1$) in a slot during which there is no PU activity (i.e., $\mathbf{s}_k = (0, 0)$). Then, the channel utilization is $\sum_{k=1}^{N_A} \Pr[\mathbf{s}_k = (0, 0), a_k = 1] / N_A$. We define the collision probability as the probability that the PU is active (i.e., $\mathbf{s}_k \neq (0, 0)$) when the SU attempts to transmit data (i.e., $a_k = 1$). Formally, the collision probability is defined as $C := \left(\sum_{k=1}^{N_A} \Pr[\mathbf{s}_k \neq (0, 0), a_k = 1]\right) / \left(\sum_{k=1}^{N_A} \Pr[a_k = 1]\right)$.

We maximize the channel utilization while limiting the collision probability as follows:

$$\max \ \frac{\sum_{k=1}^{N_A} \Pr[\mathbf{s}_k = (0, 0), a_k = 1]}{N_A} \quad \text{s.t.} \quad C = \frac{\sum_{k=1}^{N_A} \Pr[\mathbf{s}_k \neq (0, 0), a_k = 1]}{\sum_{k=1}^{N_A} \Pr[a_k = 1]} \leq C_{\lim} \tag{14}$$

where $C_{\lim}$ denotes the collision probability limit. We relax the constraint by applying the Lagrange multiplier $\nu$ to it. Then, the optimization problem reduces to

$$\max \ \sum_{k=1}^{N_A} \mathrm{E}[R(\mathbf{s}_k, a_k)] \tag{15}$$

where $R(\mathbf{s}, a)$ is the reward for given state $\mathbf{s}$ and action $a$, such that

$$R(\mathbf{s}, a) = \begin{cases} \nu \cdot C_{\lim} + 1/N_A, & \text{if } \mathbf{s} = (0, 0) \text{ and } a = 1 \\ \nu \cdot C_{\lim} - \nu, & \text{if } \mathbf{s} \neq (0, 0) \text{ and } a = 1 \\ 0, & \text{otherwise.} \end{cases} \tag{16}$$
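The reward in (16) can be written down directly. The sketch below is only an illustration: the argument names `nu` (Lagrange multiplier), `c_lim`, and `n_access` are naming assumptions, and `expected_reward` is a hypothetical helper that averages the reward over a belief about the current state.

```python
STATES = [(0, 0), (0, 1), (1, 0), (1, 1)]


def reward(state, action, nu, c_lim, n_access):
    """Reward of (16): only transmission (action 1) earns or loses anything."""
    if action != 1:
        return 0.0
    if state == (0, 0):
        return nu * c_lim + 1.0 / n_access   # successful transmission
    return nu * c_lim - nu                   # collision with PU traffic


def expected_reward(belief, action, nu, c_lim, n_access):
    """Average the reward over a belief, i.e. a dict mapping state to probability."""
    return sum(belief[s] * reward(s, action, nu, c_lim, n_access) for s in STATES)


confident = {(0, 0): 0.98, (0, 1): 0.01, (1, 0): 0.005, (1, 1): 0.005}
doubtful = {(0, 0): 0.90, (0, 1): 0.04, (1, 0): 0.03, (1, 1): 0.03}
```

With these hypothetical numbers and `nu = 2.0`, `c_lim = 0.03`, transmitting has a positive expected reward under `confident` and a negative one under `doubtful`, which illustrates how the multiplier penalizes transmission when the believed probability of PU activity exceeds the limit.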
B. Channel Access Algorithm
We now design the channel access algorithm that selects an action in each slot in order to maximize the objective function in (15). To decide an action for slot $k$, the algorithm considers the observations obtained until slot $k$, i.e., $o_1, \ldots, o_{k-1}$. Instead of directly using the observations, the algorithm calculates the belief vector and uses it to decide an action. It is known that the belief vector summarizes all the information required to make an optimal decision [31]. Let $\boldsymbol{\pi}_k := (\pi^k_{0,0}, \pi^k_{0,1}, \pi^k_{1,0}, \pi^k_{1,1})$ denote the belief vector for slot $k$. In the belief vector, $\pi^k_{i,j}$ represents the belief that the state in slot $k$ is $(i, j)$ given $a_1, \ldots, a_{k-1}$ and $o_1, \ldots, o_{k-1}$. That is, $\pi^k_{i,j} := \Pr[\mathbf{s}_k = (i, j) \mid \boldsymbol{\pi}_1, a_1, \ldots, a_{k-1}, o_1, \ldots, o_{k-1}]$. Let $\Delta$ denote the domain of a belief vector, i.e., $\Delta := \{(\pi_{i,j})_{(i,j) \in \mathcal{S}} \mid \sum_{(i,j) \in \mathcal{S}} \pi_{i,j} \leq 1 \text{ and } \pi_{i,j} \geq 0 \text{ for } (i, j) \in \mathcal{S}\}$.

The initial belief vector $\boldsymbol{\pi}_1$ is the stationary distribution of the hidden process. The belief vector in slot $k$ is updated from the belief vector in slot $(k - 1)$ as follows:

$$\pi^k_{i,j} = \eta_{i,j}(\boldsymbol{\pi}_{k-1}; a_{k-1}, o_{k-1}), \quad \text{for } (i, j) \in \mathcal{S} \tag{17}$$

where

$$\eta_{i,j}(\boldsymbol{\pi}; a, z) = \frac{\sum_{(l,m) \in \mathcal{S}} p^{l,m}_{i,j} \cdot r^z_{l,m}(a) \cdot \pi_{l,m}}{\sigma(\boldsymbol{\pi}; a, z)} \tag{18}$$

and

$$\sigma(\boldsymbol{\pi}; a, z) = \sum_{(i,j) \in \mathcal{S}} \sum_{(l,m) \in \mathcal{S}} p^{l,m}_{i,j} \cdot r^z_{l,m}(a) \cdot \pi_{l,m}. \tag{19}$$

Note that the update of the belief vector is slightly different from the one in [31], since only the observations up to the previous slot are available.
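A minimal sketch of the belief update in (17)-(19) follows. It rests on assumptions that are not spelled out in this section: a state is a pair (PU activity at slot start, PU activity at slot end), so a transition from `(l, m)` to `(i, j)` is taken to be possible only when `m == i` and then has the PU activity transition probability `q[i][j]`; the null observation is represented by `None`; all numbers are hypothetical.

```python
STATES = [(0, 0), (0, 1), (1, 0), (1, 1)]


def belief_update(belief, action, obs, q, r_sense):
    """Belief for slot k from the belief, action and observation of slot k-1."""
    def obs_prob(state):
        if action == 1:  # transmit: only the null observation (None) can occur
            return 1.0 if obs is None else 0.0
        return 0.0 if obs is None else r_sense[state][obs]

    unnormalized = {}
    for (i, j) in STATES:
        # Assumed structure: (l, m) -> (i, j) requires m == i, with probability q[i][j].
        unnormalized[(i, j)] = sum(q[i][j] * obs_prob((l, m)) * belief[(l, m)]
                                   for (l, m) in STATES if m == i)
    sigma = sum(unnormalized.values())  # normalizing constant, as in (19)
    if sigma == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return {s: v / sigma for s, v in unnormalized.items()}


# Hypothetical transition and sensing probabilities.
q = [[0.9, 0.1], [0.2, 0.8]]
r_sense = {(0, 0): [0.9, 0.1], (0, 1): [0.5, 0.5],
           (1, 0): [0.5, 0.5], (1, 1): [0.1, 0.9]}
b0 = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.0}
b1 = belief_update(b0, 1, None, q, r_sense)   # transmit: prediction step only
b2 = belief_update(b1, 0, 0, q, r_sense)      # sense and observe "idle"
```

After a transmission the update reduces to a pure prediction, so the belief drifts toward the stationary distribution until the next sensing result arrives.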
The channel access algorithm selects an action according to a policy. Let $\boldsymbol{\beta} := (\beta_1, \ldots, \beta_{N_A})$ denote a policy. A policy in slot $k$, i.e., $\beta_k : \Delta \to \mathcal{A}$, is a mapping of a belief vector $\boldsymbol{\pi}_k$ to an action $a_k$. In slot $k$, the channel access algorithm chooses $\beta_k(\boldsymbol{\pi}_k)$ as an action. Among the policies, we define the optimal policy $\boldsymbol{\beta}^* := (\beta^*_1, \ldots, \beta^*_{N_A})$ as the one that maximizes the objective function in (15). To derive the optimal policy, we define the optimal value function $V^*_k : \Delta \to \mathbb{R}$ as the maximum expected reward that will be earned from slot $k$ for the current belief vector. The optimal value function can be found by the following dynamic programming recursion [31]:

$$V^*_{N_A}(\boldsymbol{\pi}) = \max_{a \in \mathcal{A}} \left\{ \sum_{(i,j) \in \mathcal{S}} \pi_{i,j} R((i, j), a) \right\} \tag{20}$$

$$V^*_k(\boldsymbol{\pi}) = \max_{a \in \mathcal{A}} \left\{ \sum_{(i,j) \in \mathcal{S}} \pi_{i,j} R((i, j), a) + \sum_{z \in \mathcal{O}} \sigma(\boldsymbol{\pi}; a, z) \cdot V^*_{k+1}(\boldsymbol{\eta}(\boldsymbol{\pi}; a, z)) \right\} \tag{21}$$

where $\boldsymbol{\eta}(\boldsymbol{\pi}; a, z) := (\eta_{i,j}(\boldsymbol{\pi}; a, z))_{(i,j) \in \mathcal{S}}$. The optimal policy $\boldsymbol{\beta}^*$ is a policy such that $\beta^*_k$ for each $k$ maps a belief vector to a maximizing argument in (20) and (21).

Although we can calculate the optimal policy from (20) and (21), the complexity of dynamic programming over an uncountable set can be prohibitive [31]. Moreover, we should also find the Lagrange multiplier $\nu$ that satisfies the collision probability constraint in (14), which requires a high-complexity iterative algorithm such as the subgradient method. To overcome this difficulty, we suggest a simple stationary suboptimal policy that exhibits near-optimal performance in terms of channel utilization while restricting the collision probability within the collision probability limit $C_{\lim}$. The suboptimal policy is

$$\beta^{\mathrm{sub}}_k(\boldsymbol{\pi}) = \begin{cases} 1, & 1 - \pi_{0,0} \leq C_{\lim} \\ 0, & \text{otherwise} \end{cases} \quad \forall k = 1, \ldots, N_A. \tag{22}$$

In Appendix D, we prove that this suboptimal policy satisfies the collision probability constraint. Also, in Section V, it is shown by using simulations that the suboptimal policy achieves near-optimal performance. In Fig. 5, we summarize the operation of the channel access algorithm when the suboptimal policy is applied.
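The threshold rule in (22), combined with a belief update, gives the following loop. This is a hedged sketch rather than the authors' code: the belief update assumes that a state is a pair of PU activities at the slot boundaries, `sense` is a hypothetical callback standing in for energy detection, and the probabilities are made-up values.

```python
STATES = [(0, 0), (0, 1), (1, 0), (1, 1)]


def suboptimal_action(belief, c_lim):
    """Transmit (1) only if the believed probability of PU activity is within the limit."""
    return 1 if 1.0 - belief[(0, 0)] <= c_lim else 0


def next_belief(belief, action, obs, q, r_sense):
    """Condition on the sensing result (if any), then advance the state by one slot."""
    weight = {s: belief[s] * (1.0 if action == 1 else r_sense[s][obs]) for s in STATES}
    total = sum(weight.values())
    return {(i, j): q[i][j] * sum(weight[(l, m)] for (l, m) in STATES if m == i) / total
            for (i, j) in STATES}


def run_access_subframe(n_slots, c_lim, q, r_sense, pi, sense):
    """Return the action taken in each slot; sense(k) supplies the sensing result."""
    belief = {(i, j): pi[i] * q[i][j] for (i, j) in STATES}  # stationary initial belief
    actions = []
    for k in range(n_slots):
        action = suboptimal_action(belief, c_lim)
        obs = None if action == 1 else sense(k)
        actions.append(action)
        belief = next_belief(belief, action, obs, q, r_sense)
    return actions


# Hypothetical probabilities; the sensing callback always reports "idle".
q = [[0.99, 0.01], [0.02, 0.98]]
pi = [2.0 / 3.0, 1.0 / 3.0]
r_sense = {(0, 0): [0.9, 0.1], (0, 1): [0.5, 0.5],
           (1, 0): [0.5, 0.5], (1, 1): [0.1, 0.9]}
actions = run_access_subframe(6, 0.03, q, r_sense, pi, sense=lambda k: 0)
```

With these numbers the SU senses twice before its belief is strong enough to transmit, and it returns to sensing once the prediction-only updates let the belief decay past the limit. This shows how the policy interleaves sensing and transmission without a fixed sensing period.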
V. NUMERICAL RESULTS
We first evaluate the performances of the channel learning and the channel access algorithms separately, and then study the benefit of the combined use of both algorithms. The simulation parameters are as follows: bandwidth of a frequency channel ($W$) is 10 MHz; length of a frame is 200 ms; length of a slot ($T$) is 20 $\mu$s. There are 1000 and 9000 slots in a channel learning subframe and in a channel access subframe, respectively. The threshold for energy detection ($\delta$) is set to 1.16. The set of possible channel usage patterns is given as $\mathcal{U} = \{(\lambda, \mu, \gamma) \mid \lambda \leq 1 \text{ kHz}, \mu \leq 1 \text{ kHz}, \gamma \geq -10 \text{ dB}\}$. We use the recursive algorithm for estimating the channel usage pattern. We use a constant step size, $\epsilon_n = 10^{-5}$, for the recursive algorithm. We assume that the SU does not switch the frequency channel during simulation time.

Fig. 6 demonstrates how well the channel learning algorithm estimates the time-varying channel usage pattern. The channel usage pattern changes in frames 1000, 2000, and 3000. In this figure, we can see that the estimate fluctuates around the real channel usage pattern due to the constant step size. Nonetheless, the channel learning algorithm tracks the variations of the channel usage pattern well. Note that the speed and the accuracy of convergence can be controlled by adjusting the step size $\epsilon_n$.

    Calculate the state transition and observation probabilities from $\hat{\mathbf{u}}_n$
    Calculate the initial belief vector, $\boldsymbol{\pi}_1$
    for $k = 1$ to $N_A$ do
        if $1 - \pi^k_{0,0} \leq C_{\lim}$ then
            SU exchanges user data in slot $k$
            $a_k \leftarrow 1$
            $o_k \leftarrow \emptyset$
        else
            SU performs energy detection in slot $k$ and calculates the sensing result $\zeta^A_{n,k}$
            $a_k \leftarrow 0$
            $o_k \leftarrow \zeta^A_{n,k}$
        end if
        $\pi^{k+1}_{i,j} \leftarrow \eta_{i,j}(\boldsymbol{\pi}_k; a_k, o_k)$ for $(i, j) \in \mathcal{S}$
    end for

Fig. 5. The channel access algorithm in the channel access subframe of frame $n$.

[Figure: state transition rates (kHz) and SNR (dB) versus frame index.]

Fig. 6. Estimates of the channel usage pattern over frames.
We evaluate the performance of the channel access algorithm in Figs. 7 and 8. For these figures, we assume that the channel usage pattern remains the same over time and is known to the SU so that we can focus on the performance of the channel access algorithm. Fig. 7 shows the utilization and the collision probability for the proposed channel access algorithm with the suboptimal policy as a function of the collision probability limit. We can see in the figure that the utilization converges to the probability that a slot is not occupied by the PU as the collision probability increases. This figure shows that the collision probability does not exceed the collision probability limit, regardless of the channel usage pattern. By lowering the collision probability limit, we can decrease the collision probability at the cost of the utilization.

Fig. 8 compares the performances of the proposed channel access algorithm (with suboptimal and optimal policies) and the performance of the heuristic channel access algorithm. We compare the proposed algorithm with a simple listen-before-talk heuristic algorithm. If the sensing result in slot $(k - 1)$ indicates that the channel is inactive, the heuristic algorithm transmits data for $M$ consecutive slots from slot $k$ until it performs another energy detection. Thus, $M$ balances the tradeoff between the utilization and the collision probability for the heuristic algorithm. The graphs are plotted by varying $C_{\lim}$ for the proposed algorithm with the suboptimal policy, $\nu$ and $C_{\lim}$ for the proposed algorithm with the optimal policy, and $M$ for the heuristic algorithm. In this figure, we can see that the proposed algorithm with the suboptimal policy exhibits performance very close to the optimal one. Therefore, we can say that the suboptimal policy is a very useful low-complexity alternative to the optimal policy, accomplishing near-optimal performance as well as effectively limiting the collision probability. We also observe that the proposed algorithm outperforms the heuristic algorithm. The proposed algorithm can achieve very low collision probability owing to its resilience to sensing errors, whereas the heuristic algorithm cannot.

[Figure: utilization and collision probability versus collision probability limit, for two channel usage patterns.]

Fig. 7. Variations in utilization and collision probability with collision probability limit for the proposed channel access algorithm.

[Figure: utilization versus collision probability for the proposed (suboptimal and optimal) and heuristic algorithms, for two channel usage patterns.]

Fig. 8. Performance comparison of the proposed channel access algorithms with suboptimal and optimal policies, and the heuristic channel access algorithm in terms of utilization and collision probability. The SNR of a PU signal is set to -4 dB.

[Figure: utilization and collision probability versus frame index, for the proposed scheme with and without learning.]

Fig. 9. Time variation of utilization and collision probability for the proposed schemes with and without the channel learning algorithm. The utilization and collision probability are time-averaged over every 100 frames.

[Figure: cdf versus utilization and collision probability, for the proposed scheme with learning, the proposed scheme without learning, and the heuristic algorithm.]

Fig. 10. Cumulative distribution functions of utilization and collision probability when the proposed schemes with and without the channel learning algorithm and the heuristic channel access algorithm are used.
In Figs. 9-10, we consider the channel learning algorithm as well as the channel access algorithm to investigate the impact of channel learning on the system performance. Fig. 9 shows the time variation of the utilization and the collision probability of the proposed schemes with and without the channel learning algorithm. Since the proposed scheme with learning consumes additional $N_L$ slots for channel learning, for fairness in comparison, we multiply the utilization of the proposed scheme with learning by $N_A/(N_L + N_A)$. For both schemes, the collision probability limit, $C_{\lim}$, is set to 0.03. While the proposed scheme with learning utilizes the channel usage pattern estimated by the channel learning algorithm to adjust the parameters of the channel access algorithm, the proposed scheme without learning just assumes that $\lambda = \mu = 0.3$ kHz and $\gamma = -3$ dB. The channel usage pattern $\mathbf{u}_n$ varies over time as follows: (0.4 kHz, 0.4 kHz, -3 dB) for $n = 1, \ldots, 1000$, (0.6 kHz, 0.2 kHz, -5 dB) for $n = 1001, \ldots, 2000$, (0.1 kHz, 0.6 kHz, -2 dB) for $n = 2001, \ldots, 3000$, and (0.4 kHz, 0.2 kHz, -6 dB) for $n = 3001, \ldots, 4000$. From Fig. 9, we observe that the proposed scheme without learning violates the collision probability limit and imposes excessive interference on PU traffic when the channel usage pattern is unfavorable. On the other hand, for the proposed scheme with learning, the collision probability remains below the collision probability limit, irrespective of how the channel usage pattern varies. This is due to the fact that the scheme with learning is able to adapt its parameters to the varying channel usage pattern.

In Fig. 10, we compare the cumulative distribution functions (cdf's) of the utilization and the collision probability when the proposed schemes with and without learning and the heuristic channel access algorithm are used. We estimate the utilization and the collision probability in each frame and calculate the corresponding cumulative distribution functions. The channel usage pattern randomly changes over frames. The duration between consecutive changes in the channel usage pattern follows a geometric distribution with an average of 1000 frames. The state transition rates $\lambda$ and $\mu$ are selected from a uniform distribution over [0.1 kHz, 1 kHz], and the SNR of PU signals is uniformly distributed over [-6 dB, -3 dB]. The collision probability limit is set to 0.03. The proposed scheme without learning assumes that $\lambda = \mu = 0.8$ kHz and $\gamma = -6$ dB. For the heuristic algorithm, we set $M = 1$ to reduce the collision probability of the heuristic algorithm as much as possible. From Fig. 10, we observe that the collision probability limit is frequently violated by the proposed scheme without learning and the heuristic algorithm, while the proposed scheme with learning keeps the collision probability below the limit well. The proportions of the frames in which the collision probability exceeds the limit are 0.07, 0.08, and 0.61 for the proposed schemes with and without learning, and the heuristic algorithm, respectively. From this figure, we can conclude that the proposed scheme with learning can effectively maintain the collision probability under the target limit. While keeping the collision probability below the limit, the proposed scheme with learning also has an average utilization (i.e., 0.31) considerably higher than that of the proposed scheme without learning (i.e., 0.18) and the heuristic algorithm (i.e., 0.24).
VI. CONCLUSION
We have proposed a channel sensing and channel access scheme that opportunistically exploits frequency channels occupied by a data-centric primary user network. The proposed scheme repeats a learning and access cycle, driven by the channel learning and the channel access algorithms. To make the scheme robust to high sensing error probability, we have applied the hidden Markov model (HMM) and partially observable Markov decision process (POMDP) frameworks to the channel learning and the channel access algorithms, respectively. The simulation results have shown that, by adapting to a varying channel usage pattern, the proposed scheme provides efficient access to spectrum opportunities while constraining the interference to the primary users below the target limit. The proposed scheme outperforms a heuristic algorithm without any learning functionality. Extension of the scheme to a distributed multiuser scenario will be considered in our future work.
APPENDIX
A. Proof of the Condition for Equivalence of Two AMPs
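Let $\mathbf{u}$ and $\tilde{\mathbf{u}}$ be the channel usage patterns corresponding to the AMPs with $\mathbf{h}$ and $\tilde{\mathbf{h}}$, respectively. The probability of an observation sequence $\mathbf{x} = \{x_1, \ldots, x_N\}$ given the channel usage pattern $\mathbf{u}$ can be rewritten as

$$\Pr[\mathbf{o} = \mathbf{x} \mid \mathbf{u}] = \mathbf{1}^T \cdot \mathbf{I}_{x_N} \mathbf{h} \cdot \mathbf{I}_{x_{N-1}} \mathbf{h} \cdots \mathbf{I}_{x_2} \mathbf{h} \cdot \mathbf{I}_{x_1} \boldsymbol{\pi} = \mathbf{1}^T \mathbf{h}_{x_N} \mathbf{h}_{x_{N-1}} \cdots \mathbf{h}_{x_2} \boldsymbol{\pi}_{x_1} \tag{23}$$

where $\boldsymbol{\pi} := (\boldsymbol{\pi}_0, \boldsymbol{\pi}_1)^T$ is a column vector of the initial state distribution in which $\boldsymbol{\pi}_0$ and $\boldsymbol{\pi}_1$ are 2-by-1 column vectors, $\mathbf{I}_0 := \mathrm{diag}(1, 1, 0, 0)$, and $\mathbf{I}_1 := \mathrm{diag}(0, 0, 1, 1)$.

We first consider the case that $\mathbf{1}^T \mathbf{h}_0 \mathbf{e} = 0$ and $\mathbf{1}^T \tilde{\mathbf{h}}_0 \mathbf{e} = 0$. In this case, we have $\mathbf{1}^T \mathbf{h}_x = \mathbf{1}^T y_x$ and $\mathbf{1}^T \tilde{\mathbf{h}}_x = \mathbf{1}^T \tilde{y}_x$ for $x = 0, 1$ and some real values $y_0$, $y_1$, $\tilde{y}_0$, and $\tilde{y}_1$. Then, we have $\Pr[\mathbf{o} = \mathbf{x} \mid \mathbf{u}] = y_{x_N} y_{x_{N-1}} \cdots y_{x_2} y_{x_1}$ and $\Pr[\mathbf{o} = \mathbf{x} \mid \tilde{\mathbf{u}}] = \tilde{y}_{x_N} \tilde{y}_{x_{N-1}} \cdots \tilde{y}_{x_2} \tilde{y}_{x_1}$. The AMPs with $\mathbf{h}$ and $\tilde{\mathbf{h}}$ are equivalent if and only if $y_{x_N} y_{x_{N-1}} \cdots y_{x_2} y_{x_1}$ and $\tilde{y}_{x_N} \tilde{y}_{x_{N-1}} \cdots \tilde{y}_{x_2} \tilde{y}_{x_1}$ are the same for all observation sequences $\mathbf{x}$. This condition is satisfied only when $y_x = \tilde{y}_x$ for $x = 0, 1$. Therefore, we can conclude that $\mathbf{1}^T \mathbf{h}_0 = \mathbf{1}^T \tilde{\mathbf{h}}_0$ should be satisfied for the equivalence of two AMPs.

We now consider the case that $\mathbf{1}^T \mathbf{h}_0 \mathbf{e} \neq 0$ or $\mathbf{1}^T \tilde{\mathbf{h}}_0 \mathbf{e} \neq 0$. The proof for this case is based on the result in [28]. Let $\mathcal{V}$ denote the null space defined by

$$\mathcal{V} := \{\boldsymbol{\pi} \mid \mathbf{1}^T \cdot \mathbf{I}_{x_N} \mathbf{h} \cdot \mathbf{I}_{x_{N-1}} \mathbf{h} \cdots \mathbf{I}_{x_2} \mathbf{h} \cdot \mathbf{I}_{x_1} \boldsymbol{\pi} = 0 \ \ \forall \mathbf{x}\}. \tag{24}$$

The vector in the null space should satisfy $\mathbf{1}^T \boldsymbol{\pi}_0 = 0$, $\mathbf{1}^T \mathbf{h}_0 \boldsymbol{\pi}_0 = 0$, $\mathbf{1}^T \boldsymbol{\pi}_1 = 0$, and $\mathbf{1}^T \mathbf{h}_0 \boldsymbol{\pi}_1 = 0$. If $\mathbf{1}^T \mathbf{h}_0 \mathbf{e} \neq 0$, the only vector satisfying the condition is the zero vector. In [28], it is shown that the AMPs with $\mathbf{h}$ and $\tilde{\mathbf{h}}$ are equivalent if and only if $\mathbf{h}$ and $\tilde{\mathbf{h}}$ are similar via some block diagonal matrix preserving the probability, on the quotient space where the null space is factored out. Since the null space has zero dimension in this case, the AMPs are equivalent if and only if there exists a 2-by-2 matrix $\mathbf{X}$ such that

$$\mathbf{1}^T \mathbf{X} = \mathbf{1}^T, \quad \mathbf{X} \mathbf{h}_0 = \tilde{\mathbf{h}}_0 \mathbf{X}, \quad \text{and} \quad \mathbf{X} \mathbf{h}_1 = \tilde{\mathbf{h}}_1 \mathbf{X}. \tag{25}$$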
B. Proof of the Condition for Identifiability of an AMP
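If $\mathbf{1}^T \mathbf{h}_0 \mathbf{e} = 0$, there can be an infinite number of AMPs with the transition probability matrix $\tilde{\mathbf{h}} \neq \mathbf{h}$ that satisfies $\mathbf{1}^T \tilde{\mathbf{h}}_0 = \mathbf{1}^T \mathbf{h}_0$. Since these AMPs are equivalent to the AMP with $\mathbf{h}$ from Theorem 1, it should be satisfied that $\mathbf{1}^T \mathbf{h}_0 \mathbf{e} \neq 0$ for the AMP to be identifiable.

Suppose that there exists an AMP with $\tilde{\mathbf{h}}$ that is equivalent to the AMP with $\mathbf{h}$ when $\mathbf{1}^T \mathbf{h}_0 \mathbf{e} \neq 0$. Then, from Theorem 1, there exists $\mathbf{X} \neq \mathbf{I}$ such that $\mathbf{1}^T \mathbf{X} = \mathbf{1}^T$, $\mathbf{X} \mathbf{h}_0 = \tilde{\mathbf{h}}_0 \mathbf{X}$, and $\mathbf{X} \mathbf{h}_1 = \tilde{\mathbf{h}}_1 \mathbf{X}$. We can calculate $\tilde{\mathbf{h}}_0 = \mathbf{X} \mathbf{h}_0 \mathbf{X}^{-1}$ and $\tilde{\mathbf{h}}_1 = \mathbf{X} \mathbf{h}_1 \mathbf{X}^{-1}$. These matrices should satisfy, for some $\tilde{q}_{i,j}$ and $\gamma$,

$$\tilde{\mathbf{h}}_0 = \begin{bmatrix} \tilde{q}_{0,0}(1 - D(0)) & \tilde{q}_{1,0}(1 - \Upsilon(\gamma)) \\ \tilde{q}_{0,1}(1 - \Upsilon(\gamma)) & \tilde{q}_{1,1}(1 - D(\gamma)) \end{bmatrix} \tag{26}$$

and

$$\tilde{\mathbf{h}}_1 = \begin{bmatrix} \tilde{q}_{0,0} D(0) & \tilde{q}_{1,0} \Upsilon(\gamma) \\ \tilde{q}_{0,1} \Upsilon(\gamma) & \tilde{q}_{1,1} D(\gamma) \end{bmatrix}. \tag{27}$$

Therefore, we have

$$\mathbf{1}^T \mathbf{X} = \mathbf{1}^T \quad \text{and} \quad \mathbf{F}(\gamma) \circ (\mathbf{X} \mathbf{h}_0 \mathbf{X}^{-1}) = \mathbf{G}(\gamma) \circ (\mathbf{X} \mathbf{h}_1 \mathbf{X}^{-1}). \tag{28}$$

If there is no $\mathbf{X} \neq \mathbf{I}$ and $\gamma \geq 0$ satisfying the above condition, we can say that there is no AMP equivalent to the AMP with $\mathbf{h}$.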
C. Calculation of the Gradient of ฮ(o;u)
We calculate the partial derivatives of ฮ(o;u) with respectto ๐, ๐, and ๐พ. To do this, we first define ๐(o;u) := Pr[oโฃu].Recall that ๐ผ๐ is the PU activity at time ๐ก = (๐โ 1)๐ when๐ก = 0 at the start of the channel learning subframe. Let usdefine ๐ถ := (๐ผ1, . . . , ๐ผ๐๐ฟ+1). Then, ๐(o;u) can be rewrittenas the sum of the probabilities Pr[o,๐ถโฃu]โs for all possible๐ถโs, that is, ๐(o;u) =
โ๐ถ ๐ (o,๐ถ;u), where
๐ (o,๐ถ;u) = Pr[o,๐ถโฃu] = ๐๐ผ1
๐๐ฟโ๐=1
๐๐ผ๐,๐ผ๐+1 โ ๐๐๐๐ผ๐,๐ผ๐+1.
(29)
In the above equation, we define ๐๐ := Pr[๐ผ1 = ๐] and ๐๐,๐ :=Pr[๐ผ๐+1 = ๐โฃ๐ผ๐ = ๐]. Then, we have ๐0 = ๐/(๐ + ๐), ๐1 =๐/(๐ + ๐), ๐0,0 = ๐โ๐๐ , ๐0,1 = 1 โ ๐โ๐๐ , ๐1,0 = 1โ ๐โ๐๐ ,and ๐1,1 = ๐โ๐๐ . In addition, using the definition of ฮฅ(๐พ),we have ๐00,0 = 1 โ ๐ท(0), ๐10,0 = ๐ท(0), ๐00,1 = 1 โ ฮฅ(๐พ),๐10,1 = ฮฅ(๐พ), ๐01,0 = 1โฮฅ(๐พ), ๐11,0 = ฮฅ(๐พ), ๐01,1 = 1โ๐ท(๐พ),and ๐11,1 = ๐ท(๐พ).
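First, we calculate the derivative of $\varphi(\mathbf{o},\boldsymbol{\alpha};\mathbf{u})$ with respect to an arbitrary variable $x$. That is,

$$\begin{aligned} \frac{\partial \varphi}{\partial x}(\mathbf{o},\boldsymbol{\alpha};\mathbf{u}) ={} & \sum_{j \in \{0,1\}} \frac{\partial \pi_j}{\partial x} \cdot \frac{1}{\pi_j} \cdot 1_{\alpha_1 = j} \Pr[\mathbf{o},\boldsymbol{\alpha} \mid \mathbf{u}] \\ & + \sum_{(j,k) \in \mathcal{S}} \frac{\partial p_{j,k}}{\partial x} \cdot \frac{1}{p_{j,k}} \cdot \sum_{i=1}^{N_L} 1_{\mathbf{s}_i = (j,k)} \Pr[\mathbf{o},\boldsymbol{\alpha} \mid \mathbf{u}] \\ & + \sum_{(j,k) \in \mathcal{S}} \sum_{o \in \mathcal{O}} \frac{\partial q^{o}_{j,k}}{\partial x} \cdot \frac{1}{q^{o}_{j,k}} \cdot \sum_{i=1}^{N_L} 1_{\mathbf{s}_i = (j,k),\, o_i = o} \Pr[\mathbf{o},\boldsymbol{\alpha} \mid \mathbf{u}] \end{aligned} \tag{30}$$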
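where $\mathcal{S}$ is the state space, $\mathcal{O}$ is the observation space, and $1_{c}$ is a function that is 1 if $c$ is true, and 0 otherwise. Now, we calculate $\partial\Lambda/\partial x$ as

$$\begin{aligned} \frac{\partial \Lambda}{\partial x}(\mathbf{o};\mathbf{u}) &= \frac{1}{\zeta(\mathbf{o};\mathbf{u})} \cdot \frac{\partial \zeta(\mathbf{o};\mathbf{u})}{\partial x} = \frac{1}{\zeta(\mathbf{o};\mathbf{u})} \cdot \sum_{\boldsymbol{\alpha}} \frac{\partial \varphi(\mathbf{o},\boldsymbol{\alpha};\mathbf{u})}{\partial x} \\ &= \frac{1}{\zeta(\mathbf{o};\mathbf{u})} \cdot \Bigg( \sum_{j \in \{0,1\}} \frac{\partial \pi_j}{\partial x} \cdot \frac{1}{\pi_j} \cdot \xi_j(\mathbf{o};\mathbf{u}) + \sum_{(j,k) \in \mathcal{S}} \frac{\partial p_{j,k}}{\partial x} \cdot \frac{1}{p_{j,k}} \cdot \xi_{j,k}(\mathbf{o};\mathbf{u}) \\ &\qquad\qquad + \sum_{(j,k) \in \mathcal{S}} \sum_{o \in \mathcal{O}} \frac{\partial q^{o}_{j,k}}{\partial x} \cdot \frac{1}{q^{o}_{j,k}} \cdot \xi^{o}_{j,k}(\mathbf{o};\mathbf{u}) \Bigg) \end{aligned} \tag{31}$$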
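where we define $\xi_j(\mathbf{o};\mathbf{u}) := \Pr[\alpha_1 = j, \mathbf{o} \mid \mathbf{u}]$, $\xi_{j,k}(\mathbf{o};\mathbf{u}) := \sum_{i=1}^{N_L} \Pr[\mathbf{s}_i = (j,k), \mathbf{o} \mid \mathbf{u}]$, and $\xi^{o}_{j,k}(\mathbf{o};\mathbf{u}) := \sum_{i \mid o_i = o} \Pr[\mathbf{s}_i = (j,k), \mathbf{o} \mid \mathbf{u}]$. From this equation, we can calculate $\partial\Lambda/\partial\lambda$, $\partial\Lambda/\partial\mu$, and $\partial\Lambda/\partial\gamma$. For example, we can derive $\partial\Lambda/\partial\lambda$ as

$$\begin{aligned} \frac{\partial \Lambda}{\partial \lambda}(\mathbf{o};\mathbf{u}) &= \frac{1}{\zeta(\mathbf{o};\mathbf{u})} \cdot \Bigg( \frac{\partial \pi_0}{\partial \lambda} \cdot \frac{1}{\pi_0} \cdot \xi_0(\mathbf{o};\mathbf{u}) + \frac{\partial \pi_1}{\partial \lambda} \cdot \frac{1}{\pi_1} \cdot \xi_1(\mathbf{o};\mathbf{u}) \\ &\qquad\qquad + \frac{\partial p_{0,0}}{\partial \lambda} \cdot \frac{1}{p_{0,0}} \cdot \xi_{0,0}(\mathbf{o};\mathbf{u}) + \frac{\partial p_{0,1}}{\partial \lambda} \cdot \frac{1}{p_{0,1}} \cdot \xi_{0,1}(\mathbf{o};\mathbf{u}) \Bigg) \\ &= \frac{1}{\zeta(\mathbf{o};\mathbf{u})} \cdot \Bigg( \frac{\mu \cdot \xi_1(\mathbf{o};\mathbf{u})}{\lambda(\lambda + \mu)} - \frac{\xi_0(\mathbf{o};\mathbf{u})}{\lambda + \mu} + \frac{T \cdot \xi_{0,1}(\mathbf{o};\mathbf{u})}{e^{\lambda T} - 1} - T \cdot \xi_{0,0}(\mathbf{o};\mathbf{u}) \Bigg). \end{aligned} \tag{32}$$

We can also calculate $\partial\Lambda/\partial\mu$ and $\partial\Lambda/\partial\gamma$ in a similar way.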
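The closed form in (32) can be checked numerically against a finite-difference derivative. The Python sketch below does this for a short observation vector, evaluating the path sums in (32) by direct enumeration rather than by any recursive procedure. It assumes, in line with the first equality of (31), that the objective is the natural logarithm of Pr[o | u]. The names `r01` (the rate of leaving PU state 0, which is the parameter differentiated in (32)), `r10`, `slot`, `d0`, `dg`, and `ups` are hypothetical, as are all numerical values.

```python
import itertools
import math


def tables(r01, r10, slot, d0, dg, ups):
    """Initial, transition and detection probabilities of the two-state model."""
    init = [r10 / (r01 + r10), r01 / (r01 + r10)]
    trans = {(0, 0): math.exp(-r01 * slot), (0, 1): 1 - math.exp(-r01 * slot),
             (1, 0): 1 - math.exp(-r10 * slot), (1, 1): math.exp(-r10 * slot)}
    det = {(0, 0): d0, (0, 1): ups, (1, 0): ups, (1, 1): dg}  # Pr[o_i = 1 | s_i]
    return init, trans, det


def path_prob(obs, path, init, trans, det):
    """Pr[o, path | u], the product form of (29)."""
    prob = init[path[0]]
    for i, o in enumerate(obs):
        s = (path[i], path[i + 1])
        prob *= trans[s] * (det[s] if o == 1 else 1 - det[s])
    return prob


def log_likelihood(obs, r01, r10, slot, d0, dg, ups):
    """Natural log of Pr[o | u], by enumeration over activity paths."""
    init, trans, det = tables(r01, r10, slot, d0, dg, ups)
    paths = itertools.product((0, 1), repeat=len(obs) + 1)
    return math.log(sum(path_prob(obs, a, init, trans, det) for a in paths))


def grad_r01(obs, r01, r10, slot, d0, dg, ups):
    """Right-hand side of (32), with its four path sums found by enumeration."""
    init, trans, det = tables(r01, r10, slot, d0, dg, ups)
    like = 0.0
    first = [0.0, 0.0]                # Pr[first activity = j, o | u]
    visits = {(0, 0): 0.0, (0, 1): 0.0}  # sum over slots of Pr[s_i = (0, k), o | u]
    for a in itertools.product((0, 1), repeat=len(obs) + 1):
        w = path_prob(obs, a, init, trans, det)
        like += w
        first[a[0]] += w
        for i in range(len(obs)):
            s = (a[i], a[i + 1])
            if s in visits:
                visits[s] += w
    bracket = (r10 * first[1] / (r01 * (r01 + r10))
               - first[0] / (r01 + r10)
               + slot * visits[(0, 1)] / (math.exp(r01 * slot) - 1)
               - slot * visits[(0, 0)])
    return bracket / like


params = dict(r01=0.7, r10=1.3, slot=0.25, d0=0.1, dg=0.9, ups=0.5)
obs = (0, 1, 1, 0, 1)
step = 1e-6
up = dict(params, r01=params["r01"] + step)
down = dict(params, r01=params["r01"] - step)
finite_diff = (log_likelihood(obs, **up) - log_likelihood(obs, **down)) / (2 * step)
print(grad_r01(obs, **params), finite_diff)
```

The two printed values should agree to within the accuracy of the central difference.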
D. The Suboptimal Policy Satisfies the Collision Probability Constraint
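Proof: We prove that the collision probability does not exceed the collision probability limit, i.e., $C \leq C_{\lim}$, when the suboptimal policy $\boldsymbol{\beta}^{\mathrm{sub}} = (\beta^{\mathrm{sub}}_1, \ldots, \beta^{\mathrm{sub}}_{N_A})$ is applied. Provided that $\boldsymbol{\beta}^{\mathrm{sub}}$ is used, we can rewrite the collision probability as

$$C = \frac{\sum_{i=1}^{N_A} \Pr[\mathbf{s}_i \neq (0,0),\, a_i = 1]}{\sum_{i=1}^{N_A} \Pr[a_i = 1]} = \frac{\sum_{i=1}^{N_A} \sum_{\Theta_i} \Pr[\mathbf{s}_i \neq (0,0),\, 1 - \omega_{0,0} \leq C_{\lim} \mid \Theta_i] \cdot \Pr[\Theta_i]}{\sum_{i=1}^{N_A} \sum_{\Theta_i} \Pr[1 - \omega_{0,0} \leq C_{\lim} \mid \Theta_i] \cdot \Pr[\Theta_i]} \tag{33}$$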
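where $\Theta_i := \{\boldsymbol{\omega}_1, a_1, \ldots, a_{i-1}, o_1, \ldots, o_{i-1}\}$.
Since $\omega_{0,0}$ only depends on $\Theta_i$, the value of $\Pr[1 - \omega_{0,0} \leq C_{\lim} \mid \Theta_i]$ in the denominator in (33) is one if $1 - \omega_{0,0} \leq C_{\lim}$, and zero otherwise. Also, $\Pr[\mathbf{s}_i \neq (0,0),\, 1 - \omega_{0,0} \leq C_{\lim} \mid \Theta_i]$ in the numerator in (33) is calculated as

$$\Pr[\mathbf{s}_i \neq (0,0),\, 1 - \omega_{0,0} \leq C_{\lim} \mid \Theta_i] = \begin{cases} 1 - \omega_{0,0}, & \text{if } 1 - \omega_{0,0} \leq C_{\lim} \\ 0, & \text{otherwise.} \end{cases} \tag{34}$$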
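Therefore, the inequality $\Pr[\mathbf{s}_i \neq (0,0),\, 1 - \omega_{0,0} \leq C_{\lim} \mid \Theta_i] \leq C_{\lim} \cdot \Pr[1 - \omega_{0,0} \leq C_{\lim} \mid \Theta_i]$ is satisfied. Applying this inequality to (33), we can conclude that

$$C \leq \frac{\sum_{i=1}^{N_A} \sum_{\Theta_i} C_{\lim} \cdot \Pr[1 - \omega_{0,0} \leq C_{\lim} \mid \Theta_i] \cdot \Pr[\Theta_i]}{\sum_{i=1}^{N_A} \sum_{\Theta_i} \Pr[1 - \omega_{0,0} \leq C_{\lim} \mid \Theta_i] \cdot \Pr[\Theta_i]} = C_{\lim}.$$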
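The bound can be illustrated with a small Python sketch of the ratio in (33) under the threshold rule. Each entry pairs a weight, playing the role of the probability of one history, with a belief value, assumed here to be the posterior probability that the slot state is (0, 0) given that history. The function name `collision_probability`, this reading of the belief, and the numbers used are all assumptions of the sketch.

```python
def collision_probability(histories, c_lim):
    """Ratio in (33) for a policy that transmits iff 1 - belief00 <= c_lim.

    histories: iterable of (weight, belief00) pairs, one per (slot, history).
    Returns None when the policy never transmits, since the ratio is undefined.
    """
    collision_mass = 0.0
    transmit_mass = 0.0
    for weight, belief00 in histories:
        if 1.0 - belief00 <= c_lim:  # threshold rule: the SU transmits
            collision_mass += weight * (1.0 - belief00)  # numerator term, as in (34)
            transmit_mass += weight                      # denominator term
    if transmit_mass == 0.0:
        return None
    return collision_mass / transmit_mass


# Hypothetical histories: beliefs spread over [0, 1] with unequal weights.
histories = [(1 + (i % 7), i / 50.0) for i in range(51)]
print(collision_probability(histories, c_lim=0.1))
```

Every term added to the numerator is at most `c_lim` times the matching term in the denominator, so the returned ratio cannot exceed `c_lim`, mirroring the last step of the proof.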
REFERENCES
[1] S. Geirhofer, L. Tong, and B. M. Sadler, "A measurement-based model for dynamic spectrum access in WLAN channels," in Proc. IEEE MILCOM, Oct. 2006.
[2] S. Geirhofer, L. Tong, and B. M. Sadler, "Dynamic spectrum access in the time domain: modeling and exploiting white space," IEEE Commun. Mag., vol. 45, no. 5, pp. 66–72, May 2007.
[3] S. D. Jones, E. Jung, X. Liu, N. Merheb, and I. J. Wang, "Characterization of spectrum activities in the U.S. public safety band for opportunistic spectrum access," in Proc. IEEE DySPAN, Apr. 2007.
[4] M. Wellens, J. Riihijarvi, and P. Mahonen, "Empirical time and frequency domain models of spectrum use," Physical Commun. (Elsevier), vol. 2, no. 1–2, pp. 10–32, Mar. 2009.
[5] M. Wellens and P. Mahonen, "Lessons learned from an extensive spectrum occupancy measurement campaign and a stochastic duty cycle model," in Proc. TridentCom, Apr. 2009.
[6] D. Willkomm, S. Machiraju, J. Bolot, and A. Wolisz, "Primary user behavior in cellular networks and implications for dynamic spectrum access," IEEE Commun. Mag., vol. 47, no. 3, pp. 88–95, Mar. 2009.
[7] Q. Zhao, L. Tong, A. Swami, and Y. Chen, "Decentralized cognitive MAC for opportunistic spectrum access in ad hoc networks: a POMDP framework," IEEE J. Sel. Areas Commun., vol. 25, no. 3, pp. 589–600, Apr. 2007.
[8] Q. Zhao, B. Krishnamachari, and K. Liu, "On myopic sensing for multi-channel opportunistic access: structure, optimality, and performance," IEEE Trans. Wireless Commun., vol. 7, no. 12, pp. 5431–5440, Dec. 2008.
[9] Q. Zhao, S. Geirhofer, L. Tong, and B. M. Sadler, "Opportunistic spectrum access via periodic channel sensing," IEEE Trans. Signal Process., vol. 56, no. 2, pp. 785–796, Feb. 2008.
[10] H. Su and X. Zhang, "Cross-layer based opportunistic MAC protocols for QoS provisionings over cognitive radio wireless networks," IEEE J. Sel. Areas Commun., vol. 26, no. 1, pp. 118–129, Jan. 2008.
[11] S. Geirhofer, L. Tong, and B. M. Sadler, "Cognitive medium access: constraining interference based on experimental models," IEEE J. Sel. Areas Commun., vol. 26, no. 1, pp. 95–105, Jan. 2008.
[12] S. Huang, X. Liu, and Z. Ding, "Opportunistic spectrum access in cognitive radio networks," in Proc. IEEE INFOCOM, Apr. 2008.
[13] R. Urgaonkar and M. J. Neely, "Opportunistic scheduling with reliability guarantees in cognitive radio networks," IEEE Trans. Mobile Comput., vol. 8, no. 6, pp. 766–777, June 2009.
[14] Y.-C. Liang, Y. Zeng, E. C. Y. Peh, and A. T. Hoang, "Sensing-throughput tradeoff for cognitive radio networks," IEEE Trans. Wireless Commun., vol. 7, no. 4, pp. 1326–1337, Apr. 2008.
[15] H. Kim and K. G. Shin, "Efficient discovery of spectrum opportunities with MAC-layer sensing in cognitive radio networks," IEEE Trans. Mobile Comput., vol. 7, no. 5, pp. 533–545, May 2008.
[16] L. Lai, H. El Gamal, H. Jiang, and H. V. Poor, "Cognitive medium access: exploration, exploitation and competition," IEEE/ACM Trans. Netw., submitted for publication. Available: http://www.ece.osu.edu/~helgamal/
[17] H. Jiang, L. Lai, R. Fan, and H. V. Poor, "Optimal selection of channel sensing order in cognitive radio," IEEE Trans. Wireless Commun., vol. 8, no. 1, pp. 297–307, Jan. 2009.
[18] R. Fan and H. Jiang, "Channel sensing-order setting in cognitive radio networks: a two-user case," IEEE Trans. Veh. Technol., vol. 58, no. 9, pp. 4997–5008, Nov. 2009.
[19] L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proc. IEEE, vol. 77, no. 2, pp. 257–286, Feb. 1989.
[20] T. Ryden, "On recursive estimation for hidden Markov models," Stochastic Processes and their Applications, vol. 66, no. 1, pp. 79–96, Feb. 1997.
[21] S. Huang, X. Liu, and Z. Ding, "Optimal transmission strategies for dynamic spectrum access in cognitive radio networks," IEEE Trans. Mobile Comput., vol. 8, no. 12, pp. 1636–1648, Dec. 2009.
[22] T. Clancy and B. Walker, "Predictive dynamic spectrum access," in Proc. SDR Forum Technical Conference, Nov. 2006.
[23] I. A. Akbar and W. H. Tranter, "Dynamic spectrum allocation in cognitive radio using hidden Markov models: Poisson distributed case," in Proc. SoutheastCon, Mar. 2007.
[24] G. E. Monahan, "A survey of partially observable Markov decision processes: theory, models, and algorithms," Management Science, vol. 28, no. 1, pp. 1–16, Jan. 1982.
[25] J. Jia, Q. Zhang, and X. Shen, "HC-MAC: a hardware-constrained cognitive MAC for efficient spectrum management," IEEE J. Sel. Areas Commun., vol. 26, no. 1, pp. 106–117, Jan. 2008.
[26] H. Urkowitz, "Energy detection of unknown deterministic signals," Proc. IEEE, vol. 55, no. 4, pp. 523–531, Apr. 1967.
[27] Y. Ephraim and N. Merhav, "Hidden Markov processes," IEEE Trans. Inf. Theory, vol. 48, no. 6, pp. 1518–1569, June 2002.
[28] H. Ito, S.-I. Amari, and K. Kobayashi, "Identifiability of hidden Markov information sources and their minimum degrees of freedom," IEEE Trans. Inf. Theory, vol. 38, no. 2, pp. 324–333, Mar. 1992.
[29] L. E. Baum and T. Petrie, "Statistical inference for probabilistic functions of finite state Markov chains," The Annals of Mathematical Statistics, vol. 37, no. 6, pp. 1554–1563, Dec. 1966.
[30] K. W. Choi, "Adaptive sensing technique to maximize spectrum utilization in cognitive radio," IEEE Trans. Veh. Technol., vol. 59, no. 2, pp. 992–998, Feb. 2010.
[31] W. S. Lovejoy, "A survey of algorithmic methods for partially observable Markov decision processes," Annals of Operations Research, vol. 28, no. 1, pp. 47–66, Dec. 1991.
Kae Won Choi received the B.S. degree in civil, urban, and geosystem engineering in 2001, and the M.S. and Ph.D. degrees in electrical engineering and computer science in 2003 and 2007, respectively, all from Seoul National University, Seoul, Korea. From 2008 to 2009, he was with Telecommunication Business of Samsung Electronics Co., Ltd., Korea. From 2009 to 2010, he was a postdoctoral researcher in the Department of Electrical and Computer Engineering, University of Manitoba, Winnipeg, MB, Canada. In 2010, he joined the faculty at Seoul National University of Science and Technology, Korea, where he is currently an assistant professor in the Department of Computer Science. His research interests include cognitive radio, wireless network optimization, radio resource management, and mobile cloud computing.

Ekram Hossain (S'98-M'01-SM'06) is a full Professor in the Department of Electrical and Computer Engineering at University of Manitoba, Winnipeg, Canada. He received his Ph.D. in Electrical Engineering from University of Victoria, Canada, in 2001. Dr. Hossain's research interests include design, analysis, and optimization of wireless/mobile communications networks and cognitive radio systems (http://www.ee.umanitoba.ca/~ekram). He serves as the Area Editor for the IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS in the area of "Resource Management and Multiple Access," an Editor for the IEEE TRANSACTIONS ON MOBILE COMPUTING, the IEEE COMMUNICATIONS SURVEYS AND TUTORIALS, and IEEE Wireless Communications. Dr. Hossain has several research awards to his credit which include the University of Manitoba Merit Award in 2010 (for Research and Scholarly Activities) and the 2011 IEEE Communications Society Fred Ellersick Prize Paper Award. He is a registered Professional Engineer in the province of Manitoba, Canada.