1 SpecMonitor: Towards Efficient Passive Traffic Monitoring...

12
1 SpecMonitor: Towards Efficient Passive Traffic Monitoring for Cognitive Radio Networks Qiben Yan * Ming Li Feng Chen Tingting Jiang * Wenjing Lou * Y. Thomas Hou * Chang-Tien Lu * * Virginia Polytechnic Institute and State University, Blacksburg, VA. Email: {qbyan,virjtt03,wjlou,thou,ctlu}@vt.edu Utah State University, Logan, Utah. Email: [email protected] University at Albany - SUNY, Albany, NY. Email: [email protected] Abstract—Passive monitoring by distributed wireless sniffers has been used to strategically capture the network traffic, as the basis of automatic network diagnosis. However, the traditional monitoring techniques fall short in cognitive radio networks (CRNs) due to the much larger number of channels to be moni- tored, and the secondary users’ channel availability uncertainty imposed by primary user activities. To better serve CRNs, we propose a systematic passive monitoring framework, SpecMon- itor, for traffic collection using a limited number of sniffers in Wi-Fi like CRNs. We jointly consider primary user activity and secondary user channel access pattern to optimize the traffic capturing strategy. In particular, we exploit a non-parametric density estimation method to learn and predict secondary users’ access pattern in an online fashion, which rapidly adapts to the users’ dynamic behaviors and supports accurate estimation of merged access patterns from multiple users. We also design near-optimal monitoring algorithms that maximize two levels of quality-of-monitoring goals respectively, based on the predicted channel access patterns. The simulations and experiments show that SpecMonitor outperforms the existing schemes significantly. KeywordsCognitive radio network, passive monitoring, non- parametric density estimation, optimization algorithm. I. I NTRODUCTION Cognitive Radio (CR) has been envisioned as a new paradig- m to better utilize the spectrum resources, by allowing unli- censed or secondary users (SUs) to opportunistically access the licensed bands, as long as they do not cause any inter- ference to licensed or primary users (PUs). While most of the prior research in CRNs focused on the problem of establishing a single link between SUs [1], recent research has gone beyond a single link to identify the challenges of implementing a Wi- Fi like CR network [2] consisting of secondary Access Points (APs) associated with multiple secondary clients. Passive monitoring has been used to measure Wi-Fi net- works [3]–[5] using a dedicated set of hardware devices, called sniffers. It has been shown to complement the wire side monitoring by gathering detailed PHY/MAC information. Passive monitoring serves as the basis of numerous appli- cations ranging from network forensics, fault diagnosis to resource management. As the quality of those applications mainly depends on that of traffic monitoring, it is non-trivial to build a traffic monitoring framework with excellent monitoring performance. Passive monitoring is particularly important to CRNs, because: (1) cognitive radios are programmable and difficult to manage; (2) the interference requirement in CRNs is mandatory and extremely high. In this paper, we consider the construction of a passive monitoring framework for Wi-Fi like CR networks, or “WhiteFi” networks for short. However, passive monitoring becomes a challenging task in WhiteFi networks. First, WhiteFi networks have a much wider spectrum (50MHz-698MHz) than traditional wireless networks, which makes it infeasible to deploy one sniffer for each channel. As a result, the sniffers have to decide which subsets of channels they will operate on, referred to as sniffer channel assignment problem. Second, SUs have to va- cate the channels immediately once PUs start transmissions on the corresponding channels. Such inevitable channel switching behavior potentially complicates the sniffers’ traffic monitoring strategies. Last but not the least, network traffic on each channel typically comes from multiple SUs, who share the spectrum by following a certain medium access control (MAC) mechanism. Thus, traffic patterns observed by the sniffers are highly dynamic, further complicating the sniffers’ monitoring strategies. To meet these challenges, we propose a monitoring frame- work, SpecMonitor, which utilizes a non-parametric density estimation method to model SUs’ channel usage pattern. This method makes no assumptions on the unknown distribution of channel access pattern, thus offers accurate and flexible models which can be updated in an online fashion with acceptable complexity. Moreover, we design a sliding window method to perform online learning of data dynamics, and an accumulative combination method to further improve modeling accuracy. Then, SpecMonitor takes inputs from SUs’ channel usage model to construct monitoring strategies. In this paper, we consider two levels of monitoring ob- jectives: frame-level and user-level, to diagnose different net- work issues. The frame-level objective can be interpreted as maximizing the frame-level quality-of-monitoring (FL-QoM), defined as the amount of captured MAC frames of inter- est, due to their significance for the subsequent aggregated traffic analysis [5]. The user-level objective is to maximize the user-level quality-of-monitoring (UL-QoM), defined as the expected number of active users monitored, which can fa- cilitate user behavior analysis [6]. We cast the monitoring optimization problem as a sniffer channel assignment problem with objective of maximizing the corresponding QoMs. In this paper, we make the following contributions:

Transcript of 1 SpecMonitor: Towards Efficient Passive Traffic Monitoring...

Page 1: 1 SpecMonitor: Towards Efficient Passive Traffic Monitoring ...cse.unl.edu/~qyan/paper/TWC14_Yan.pdf · SpecMonitor: Towards Efficient Passive Traffic Monitoring for Cognitive

1

SpecMonitor: Towards Efficient Passive TrafficMonitoring for Cognitive Radio Networks

Qiben Yan∗ Ming Li† Feng Chen‡ Tingting Jiang∗ Wenjing Lou∗ Y. Thomas Hou∗ Chang-Tien Lu∗∗ Virginia Polytechnic Institute and State University, Blacksburg, VA. Email:

{qbyan,virjtt03,wjlou,thou,ctlu}@vt.edu† Utah State University, Logan, Utah. Email: [email protected]

‡ University at Albany - SUNY, Albany, NY. Email: [email protected]

Abstract—Passive monitoring by distributed wireless sniffershas been used to strategically capture the network traffic, as thebasis of automatic network diagnosis. However, the traditionalmonitoring techniques fall short in cognitive radio networks(CRNs) due to the much larger number of channels to be moni-tored, and the secondary users’ channel availability uncertaintyimposed by primary user activities. To better serve CRNs, wepropose a systematic passive monitoring framework, SpecMon-itor, for traffic collection using a limited number of sniffers inWi-Fi like CRNs. We jointly consider primary user activity andsecondary user channel access pattern to optimize the trafficcapturing strategy. In particular, we exploit a non-parametricdensity estimation method to learn and predict secondary users’access pattern in an online fashion, which rapidly adapts tothe users’ dynamic behaviors and supports accurate estimationof merged access patterns from multiple users. We also designnear-optimal monitoring algorithms that maximize two levels ofquality-of-monitoring goals respectively, based on the predictedchannel access patterns. The simulations and experiments showthat SpecMonitor outperforms the existing schemes significantly.

Keywords—Cognitive radio network, passive monitoring, non-parametric density estimation, optimization algorithm.

I. INTRODUCTION

Cognitive Radio (CR) has been envisioned as a new paradig-m to better utilize the spectrum resources, by allowing unli-censed or secondary users (SUs) to opportunistically accessthe licensed bands, as long as they do not cause any inter-ference to licensed or primary users (PUs). While most of theprior research in CRNs focused on the problem of establishinga single link between SUs [1], recent research has gone beyonda single link to identify the challenges of implementing a Wi-Fi like CR network [2] consisting of secondary Access Points(APs) associated with multiple secondary clients.

Passive monitoring has been used to measure Wi-Fi net-works [3]–[5] using a dedicated set of hardware devices,called sniffers. It has been shown to complement the wireside monitoring by gathering detailed PHY/MAC information.Passive monitoring serves as the basis of numerous appli-cations ranging from network forensics, fault diagnosis toresource management. As the quality of those applicationsmainly depends on that of traffic monitoring, it is non-trivial tobuild a traffic monitoring framework with excellent monitoringperformance. Passive monitoring is particularly important toCRNs, because: (1) cognitive radios are programmable and

difficult to manage; (2) the interference requirement in CRNsis mandatory and extremely high. In this paper, we considerthe construction of a passive monitoring framework for Wi-Filike CR networks, or “WhiteFi” networks for short.

However, passive monitoring becomes a challenging taskin WhiteFi networks. First, WhiteFi networks have a muchwider spectrum (50MHz-698MHz) than traditional wirelessnetworks, which makes it infeasible to deploy one snifferfor each channel. As a result, the sniffers have to decidewhich subsets of channels they will operate on, referred to assniffer channel assignment problem. Second, SUs have to va-cate the channels immediately once PUs start transmissions onthe corresponding channels. Such inevitable channel switchingbehavior potentially complicates the sniffers’ traffic monitoringstrategies. Last but not the least, network traffic on eachchannel typically comes from multiple SUs, who share thespectrum by following a certain medium access control (MAC)mechanism. Thus, traffic patterns observed by the sniffers arehighly dynamic, further complicating the sniffers’ monitoringstrategies.

To meet these challenges, we propose a monitoring frame-work, SpecMonitor, which utilizes a non-parametric densityestimation method to model SUs’ channel usage pattern. Thismethod makes no assumptions on the unknown distribution ofchannel access pattern, thus offers accurate and flexible modelswhich can be updated in an online fashion with acceptablecomplexity. Moreover, we design a sliding window method toperform online learning of data dynamics, and an accumulativecombination method to further improve modeling accuracy.Then, SpecMonitor takes inputs from SUs’ channel usagemodel to construct monitoring strategies.

In this paper, we consider two levels of monitoring ob-jectives: frame-level and user-level, to diagnose different net-work issues. The frame-level objective can be interpreted asmaximizing the frame-level quality-of-monitoring (FL-QoM),defined as the amount of captured MAC frames of inter-est, due to their significance for the subsequent aggregatedtraffic analysis [5]. The user-level objective is to maximizethe user-level quality-of-monitoring (UL-QoM), defined as theexpected number of active users monitored, which can fa-cilitate user behavior analysis [6]. We cast the monitoringoptimization problem as a sniffer channel assignment problemwith objective of maximizing the corresponding QoMs.

In this paper, we make the following contributions:

Page 2: 1 SpecMonitor: Towards Efficient Passive Traffic Monitoring ...cse.unl.edu/~qyan/paper/TWC14_Yan.pdf · SpecMonitor: Towards Efficient Passive Traffic Monitoring for Cognitive

2

(1) We design a general framework to monitor the WhiteFinetworks, which jointly considers the channel availability andsecondary user access pattern. In particular, we design anonline non-parametric density estimation mechanism to mod-el the secondary user channel activity, which is able to supportdynamic and complex access patterns.

(2) We formulate the sniffer channel assignment problemsas integer programming (IP) problems by incorporating thechannel switching costs with the QoM objective, for which weprovide algorithms to optimize two different levels of QoMsrespectively.

(3) We present approximation algorithms to provide near-optimal solutions. Numerical analysis shows the solutionscan offer objective values that are very close to the optimalvalue, thus confirming their near-optimality. Furthermore, weconduct extensive simulations and experiments to validate theefficacy and efficiency of our statistical model and monitoringframework.

The remainder of this paper is organized as follows. Weintroduce the related work in section II. In section III, wedescribe the monitoring system model. The secondary userchannel access model is depicted in section IV, followedby section V, which formulates the monitoring optimizationproblems and provides near-optimal solutions. Section VIpresents the evaluation results using both synthetic data fromsimulations and real data from experiments. Finally, sectionVII concludes the paper.

II. RELATED WORK

Passive Monitoring in Traditional Wireless Networks:Passive monitoring in wireless networks has been an activeresearch area. Yeo et al. were the first to use dedicatedsniffers to passively measure a Wi-Fi network, successfullyidentifying protocol anomalies and malicious WLAN usages[3]. Cheng et al. presented Jigsaw, which is a large-scalepassive monitoring infrastructure to collect and dissect wirelesstraffic for cross-layer network diagnosis in a large enterpriseWi-Fi network [4], [5]. While the above works focused ondeveloping the monitoring infrastructure, some recent worksinvestigated the problem of optimal sniffer channel assignmentto maximize the amount of monitored information. Shin et al.[7] formulated the sniffer channel assignment problem in thewireless mesh network as a maximal coverage problem, anddesigned approximation algorithms to solve this problem. In[8], Chhetri et al. further extended the preceding work bytaking into account the users’ access patterns. They proposedtwo monitoring models: user-centric model and sniffer-centricmodel. However, they assume the statistics for different users’activities are known. Recently, Arora et al. [9] proposed touse multi-armed bandit to perform sequential learning of theunknown channel statistics, which can be used to facilitateoptimal channel assignments. However, multi-armed banditis too complex to be used for online and efficient channelassignments. In this paper, we present an efficient onlinechannel assignment mechanism without any prior knowledgeof channel access statistics, which is the first mechanism in theliterature to provide optimized channel assignments in real-time. All the above works only considered maximizing the

number of active users covered by the sniffers, while we furtheraddress the problem of maximizing the number of capturedframes.Spectrum Monitoring in Cognitive Radio Networks: Chenet al. studied frame capturing problem for network forensicsin CRNs [10], in which support vector regression (SVR)method is employed to predict the frame arrival time to guidechannel assignments. They have similar objectives as ours,however, our method has the following advantages: 1) SVRmethod requires a time consuming training phase, while weutilize density estimation to produce new estimates in anonline fashion avoiding of the expensive training and retrainingphases; 2) SVR method falls short of dealing with interleavedtraffic from multiple users, which corresponds to dynamictraffic statistics, while our scheme can adapt promptly tothe traffic dynamics; 3) their monitoring framework has poorperformance when the monitored channels carry high datarate traffic, because of frequent channel switching behaviorinduced by the heuristic channel assignments. In contrast,as we jointly consider channel switching costs and framecapturing gains to optimize channel assignments, our methodcan achieve better performance with fast traffic flows. Recently,Yi et al. formulated the secondary user data capturing problemas multi-armed bandit problem [11], which takes a long periodof learning process before it is able to produce an accurateestimation of user access pattern. Thus, their method is notefficient enough to capture adaptive and interleaved trafficpatterns either.

III. SYSTEM MODEL

A. Monitoring System ModelIn this section, we describe the monitoring system model for

CR networks. We consider CR networks with coexisting PUsand SUs. The most common PUs are TV towers and wirelessmicrophones (WMs). Our monitoring system is interested inthe network traffic from SUs including APs and clients whoform a WhiteFi network, as illustrated in Fig. 1. Curiousreaders please refer to [2] for the design and implementationdetails of WhiteFi system.

For a multi-hop network, we segment the whole networkregion into small regions, called monitoring areas, and assigna certain number of sniffers to monitor the traffic for eachmonitoring area. Each sniffer may be equipped with multipleantennas, which allow him/her to sense/capture traffic overmultiple channels at one time. We assume different AP-clientpairs in the monitoring area pick different working channelsto avoid interference, and each sniffer can overhear all theinbound and outbound traffic from any secondary deviceinside its monitoring area if they tune into the same channel.Similar to [10], some sniffers are used as dedicated inspectionsniffers that periodically sense channels to gain channel usagestatistics, while other sniffers called operation sniffers areresponsible for capturing information. All the sniffers areconnected to a sniffer center for centralized decision making,as shown in Fig. 1. Each inspection sniffer is assigned multiplechannels to scan. A sensing slot is a period during which theinspection sniffer scans through all the assigned channels. In

Page 3: 1 SpecMonitor: Towards Efficient Passive Traffic Monitoring ...cse.unl.edu/~qyan/paper/TWC14_Yan.pdf · SpecMonitor: Towards Efficient Passive Traffic Monitoring for Cognitive

3

Fig. 1: Monitoring systemarchitecture for WhiteFi network

inside a monitoring areaFig. 2: The percentage of frames in

active slots (20 ms slot length)

Fig. 3: Frame/Active slot interarrivaltime distribution (20 ms sensing slot,

2 ms sensing period)

the following, a slot stands for the sensing slot unless otherwisenoted.

B. Channel Access ModelA sensing slot is composed of channel sensing and channel

switching time, whose length depends on the number ofchannels to be scanned. Typically, a channel sensing period isapproximately 1ms per channel using energy detection [12],while channel switching for a commodity 802.11b/g networkcard takes about 1− 5ms [13].

During each slot, the inspection sniffer scans several chan-nels to reveal their channel states (active/idle). Here, an activeslot indicates a slot during which the sniffer spots SUs’ trafficafter channel sensing, while idle slot represents the opposite.As channel sensing period is less than a full slot length, thediscovered active/idle state may not reflect the genuine stateof a slot. Let Xi

k be the state of channel i at k-th sensingslot, which takes only binary values “1/0”, corresponding toactive/idle state (in the following, we omit the subscript ifor i-th channel). Then, the sequential data Xk (k = 1, 2, . . .)are used to calculate the active slot interarrival time for eachchannel, which is defined as the time interval between twoconsecutive active slots. In our design, inspection sniffersproduce active slot interarrival time as their sensing outcomes,which will be used as the inputs to build channel access modelas explained in Section IV.

Note that one straightforward way of meeting frame cap-turing objectives is to predict SUs’ frame arrival time bymodeling frame arrival pattern. However, it is infeasible toderive optimized channel assignments with dynamic framelength and unslotted frame transmission [10]. Instead of di-rectly modeling frame arrival pattern, we model the active slotinterarrival pattern as the basis of our monitoring framework,in which monitoring an active slot implies capturing all theframes in the slot. To motivate/justify the adoption of activeslot interarrival time, we performed a real-world experimentusing the traffic from different types of applications (e.g. Webbrowsing, Bittorrent, FTP) in operational 802.11g WLAN. Theexperiment settings are illustrated in Section VI-B. Fig. 2shows the percentage of frames in active slots corresponding

K Number of time slotsN Number of channelsN Channel index setM Number of antennas of all operation sniffersSop Operation sniffer antenna index setXk Sensing results of inspection sniffers at slot kZ(k) Current data set at slot kZin Input data set for density estimationTint Active slot interarrival timeW Sliding window size for online estimationnw Number of previous windows considered for combinationtw Number of different samples in the previous window for

combinationf Probability density estimateF Cumulative density estimateSCAPi Slotted channel access probability of channel i4 Sensing slot lengthIdleCount Number of idle slots countedyi Vector of binary random variables for channel izs,i Vector of binary random variables indicating whether sniffer

s is assigned to channel iα Switching cost weightUi Number of distinct users in channel i

TABLE I: Summary of Symbols and Notations

to different sensing periods, from which we notice most of theframes reside in the identified active slots, especially when thesensing period is longer than 2ms. In other words, by capturingframes in active slots, we are able to collect most of the frames.Fig. 3 plots the histograms of frame interarrival time versus ac-tive slot interarrival time, with each bar showing the percentageof frames or active slots whose interarrival time is indicatedby the x-axis. These two distributions appear very similar toeach other with most of the frames concentrated within smallinterarrival time region, indicating active slot interarrival timewell characterizes channel usage pattern. For ease of reference,the commonly used notations are summarized in Table I.

IV. USER CHANNEL ACCESS PREDICTION

In this section, we propose a unified model to estimatesecondary user channel access pattern, as the front-end ofSpecMonitor. In order to build the unified model, we firststudy the primary user detection issue, and then we design anonline non-parametric density estimation mechanism to predictSUs’ slotted channel access probability (SCAP ) pertained toeach sensing slot. As its name suggests, slotted channel access

Page 4: 1 SpecMonitor: Towards Efficient Passive Traffic Monitoring ...cse.unl.edu/~qyan/paper/TWC14_Yan.pdf · SpecMonitor: Towards Efficient Passive Traffic Monitoring for Cognitive

4

probability is defined as the probability of SUs’ channel accessduring each slot.

A. Primary User DetectionTo enable CR communications, SUs need real time knowl-

edge of PUs’ activity to identify available spectrum. Sim-ilarly, the sniffers are also required to detect PUs’ activi-ty in order not to waste time and energy listening on theprimary-occupied channels. Primary user detection can beachieved by using either spectrum sensing or by querying ageo-location white space database over the internet. Spectrumsensing is expensive in cost, energy consumption and com-plexity of hardware. On the other hand, the database approachis easier to implement, which allows devices to report theirlocations to a web server that returns a list of available channelsat that location. However, database approach suffers fromutilization inefficiency, since it uses propagation models todecide the available spectrum, and hence, is conservative inthe channels it returns for a given location. Either of thesetwo approaches can be applied to our monitoring framework.

Feature detection is one popular spectrum sensing methodfor the sniffers to detect PUs’ appearance. The feature detec-tion algorithms described in [14] can be used to sample theUHF spectrum to detect the presence of TV broadcasts andwireless microphone signals, which can effectively differenti-ate between the SUs’ and PUs’ signals. Then, the sniffers candirectly perform feature detection in the beginning of everyslot to sense the availability of monitored channels.

The database approach allows both sniffers and SUs to querythe database for spectrum availability at a certain location.After querying the database, the SUs begin operating on aset of available channels, while inspection sniffers tune ontothese available channels to monitor SUs’ traffic patterns, andoperation sniffers are assigned to the SU-occupied channelscorrespondingly. In SpecMonitor, we adopt the database ap-proach for simplicity.

B. Secondary User Channel Access ModelIn this section, we propose a framework to estimate the

secondary users’ SCAP at each slot by modeling the activeslot interarrival time distribution. The SUs’ channel accesspattern in WhiteFi networks is complicated, mainly due to thedynamics brought by time-evolving mixed traffic from multipleSUs with channel switching behavior.

1) Non-parametric Density Estimation Model: Instead ofassuming a specific active slot interarrival time distributionfor quantifying SUs’ traffic pattern, we propose a SU chan-nel usage model using the non-parametric density estimationmethod to better capture SUs’ traffic dynamics. Current-ly, one of the most popular non-parametric density estima-tion approaches is Kernel Density Estimator (KDE) witha Gaussian kernel function [15]. Given n independent re-alizations Xi (i = 1, 2, . . . , n) drawn from an unknownprobability density function (pdf) f(x), the Gaussian KDEwith bandwidth σ is defined as:

f̂(x;σ) :=1

n

n∑i=1

KG(x,Xi, σ), x ∈ R, (1)

where

KG(x,Xi, σ) =1√2πσ

e−(x−Xi)2/(2σ2), (2)

from which we can see that Gaussian KDE is essentially theoverall sum of Gaussian kernels centered at location Xi withan equal bandwidth σ.

In fact, the setting of σ is of utmost importance for the den-sity estimation performance. A classic measure to determinethe optimal σ is Mean Integrated Squared Error (MISE):

MISE{f̂}(σ) := E[f̂(x;σ)− f(x)]2, (3)

where f(x) is the underlying genuine distribution. Assuming alarge sample set, we can obtain an asymptotic approximationto MISE, denoted as asymptotic MISE (AMISE), written as[15]:

AMISE{f̂}(σ) =1

4σ4‖f ′′(x)‖2 +

1

2n√πσ

, (4)

where f ′′(x) is the second derivative of f(x), and ‖·‖ denotesthe Euclidean norm on R. Thus, the asymptotic optimal valueof σ∗ is obtained by minimizing AMISE:

σ∗ = (1

2n√π‖f ′′(x)‖2

)1/5. (5)

In order to compute σ∗ from Eq. 5, we need to approx-imate ‖f ′′(x)‖2 by estimating the general form ‖f (j)(x)‖2for arbitrary j. The corresponding optimal solution σ∗j =

( 12n√π‖f(j)(x)‖2 )1/5 with a generalized term of ‖f (j)(x)‖2

can be solved in a recursive form, namely σ∗j = γj(σ∗j+1),

where γj is a complicated formula given in [15]. Then, a fixedpoint iteration method is employed to compute σ∗2 , which isequivalent to the target value σ∗. This KDE algorithm providesa viable means of automatically selecting optimal bandwidthswith superior density estimation performance.

2) Modeling Active Slot Interarrival Time Distribution:The KDE collects the data set of active slot interarrivaltime measured by inspection sniffers to generate the densityestimates. Since the distribution of collected data sets may varyover time, the modeling accuracy of the KDE will be affectedby taking into account outdated historic data. Thereby, only themost recent data should be imported into the modeling process.On the other hand, the modeling accuracy also largely dependson the size of the input data sets. If we only consider themost recent observations by discarding all the historical ones,the modeling accuracy will be brought down significantly.Furthermore, the amount of inputs to KDE has great impactson its computational efficiency. Generally speaking, KDE witha small data set runs more efficiently than that with a largedata set. Therefore, the major issue of this model is to decidehow much historical data should be incorporated for densityestimation, in order to produce an accurate and efficient model.

Now we present our proposed online non-parametric densityestimation protocol. The basic idea is to use sliding windowmethod to perform online updating of the density estimates,and to incorporate additional historic data sets for improvingthe estimation accuracy. The whole protocol is presented in

Page 5: 1 SpecMonitor: Towards Efficient Passive Traffic Monitoring ...cse.unl.edu/~qyan/paper/TWC14_Yan.pdf · SpecMonitor: Towards Efficient Passive Traffic Monitoring for Cognitive

5

Algorithm 1, which is repeated for each channel. Whenevera new observation arrives, the online estimation model onlytakes the data in a sliding window of size W , i.e., the datasets exporting to the KDE only hold the most recent Wobservations. The setting of window size W is pertained to thedata dynamics, thus is empirical. A simple guideline wouldbe: first, we set an initial value for the sliding window sizeand run KDE; second, we move the sliding window forwardto see whether the estimated distribution changes over time;third, if the change is significant, we decrease the windowsize, otherwise, increase it, until we reach a satisfactorywindow size. Specifically in the WhiteFi network scenario,we set a relatively small W as 50 data samples, since thedata distribution will change more dynamically than that in atraditional wireless network.

One of the most favorable features of sliding windowmethod is attributed to its support for online learning ofdensity estimates. As time advances, our density estimator willtake newest sets of data falling inside the sliding window tocompute the latest estimate. Therefore, our model enables theeffective characterization of the time-evolving active slot inter-arrival distribution, and allows us to update density estimateswith every newly arrival observation.

However, the major drawback of the sliding window methodresides in the following fact: the sliding window to specifyinput data also deteriorates the accuracy of KDE, because thesize of sliding window restricts the number of observations(only W ). Hence, we need to improve the estimates byexpanding the input data size.

As depicted in Algorithm 1, we propose to combine the datasets from multiple sliding windows according to some well-defined criteria, in order to enlarge the sample space. Howto define such criteria for merging sample space is crucialto the ultimate estimation performance. At first glance, morerecent windows of data sets should have higher relevance tocurrent window. Therefore, one intuitive method to achievemore accurate estimation is to combine the most recent densityestimates from latest windows to capture the data freshness[16]. However, because of the uncertain channel availabilityand underlying MAC protocol, multiple clients may generateinterleaved traffic due to alternate channel accesses. Therefore,the most recent windows may not necessarily reflect theunderlying density of current window best, while some earlierhistorical data originating from the same clients pertainingto the current window might do. Accordingly, we proposean accumulative combination method to make the decisionof merging historical data based on statistical correlationamong the samples. As shown in Algorithm 1, we simpli-fy the computation of statistical correlations by employingKolmogorov-Smirnov test (KS test). KS test is character-ized as a non-parametric inferential statistical method, sinceit makes no assumption about the distributions of samples, thusis completely data-driven. The Kolmogorov-Smirnov statisticis defined as follows:

Definition 1: Consider two sets of observations Z1 and Z2,with n1 = |Z1| and n2 = |Z2| samples. The Kolmogorov-

Algorithm 1 Online non-parametric density estimation proto-col

1: Input: W , nw , tw , current sensing result Xk at k-th sensing slot.2: If Xk! = 03: Calculate the new observed active slot interarrival time Tint(k);4: Update the current data set Z(k) = {Tint(k), . . . , Tint(k−W+1)};5: Update the input data set Zin = Z(k);

Update the current density estimate [F̃ (k), f̃(k)] = KDE(Zin);6: for i← 1 to nw

7: Perform KS test: KStest(Z(k),Z(k − i · tw));8: If pass KS test9: Update the input data set Zin = {Z(k) ∪ Z(k − i · tw)};

10: end11: Update the current density estimate [F̃ (k), f̃(k)] = KDE(Zin);12: else return.

Smirnov statistic is defined as:

Dz1,z2 = supx|F1(x)− F2(x)|,

where F1 and F2 represent the empirical cumulative distribu-tion functions (cdfs) of the samples in Z1 and Z2, respectively.

Then, given Dz1,z2 , we can confirm two sample sets arefrom the same distribution with a certain significance levelβ, if

√n1n2

n1+n2Dz1,z2 ≤ Kβ , where Kβ can be set according

to a well-defined table [17]. Note that cdf is a byproduct ofthe KDE, denoted as F̃ (k) in Algorithm 1. After KS test,we combine all the data sets passing the tests into one singledata set, which is provided for the KDE to update densityestimates f̃(k) for the current slot. To tradeoff the performanceimprovement and computational overhead, we limit the numberof KS tests by only preserving the previous nw windowsof data sets for each channel. Meanwhile, two consecutivewindows only differ with one data point, thus it becomes morebeneficial to test windows with interval of tw samples. In thisway, every previous window passing the KS test can exporttw more samples into the merged data set (see line 10 ofAlgorithm 1).

Consequently, we derive an accurate density estimate foractive slot interarrival time distribution at each channel, byonline learning of data dynamics and cumulative combinationof historic data.

3) Computing Slotted Channel Access Probability: Theproblem we are going to address in this section is how toestimate the SCAP based on the predicted distribution of ac-tive slot interarrival time. As mentioned before, SCAP (k+1)represents the probability that (k+1)-th slot is active. In theory,the predicted secondary user SCAP at (k+1)-th slot should berepresented as SCAP (k + 1) = Pr(Xk+1 = 1|X1, . . . , Xk),whose computation appears intractable since it takes all thehistoric channel states into consideration. However, we cantake advantage of the updated active slot interarrival timedistribution to simplify the computation. We note that if thecurrent slot is active, the current active slot interarrival timewill be the time period between the current slot and themost recent active slot. Consequently, the probability thatcurrent slot is active SCAP (k + 1) can be interpreted asthe probability that the active slot interarrival time is equalto the time period between the current slot and the preceding

Page 6: 1 SpecMonitor: Towards Efficient Passive Traffic Monitoring ...cse.unl.edu/~qyan/paper/TWC14_Yan.pdf · SpecMonitor: Towards Efficient Passive Traffic Monitoring for Cognitive

6

Algorithm 2 The Computation of Slotted Channel AccessProbability

1: Input: current density estimate f̃(k), current sensing result Xk , thesensing slot length ∆.

2: Initialization: IdleCount = 13: If Xk! = 04: Compute SCAP (k + 1) =

∫ ∆0 f̃(k)dt;

5: Reset IdleCount = 1;6: else7: Update IdleCount = IdleCount+ 1;8: Compute SCAP (k + 1) =

∫ (IdleCount·∆)(IdleCount−1)·∆ f̃(k)dt;

9: end

active slot. If we assume the preceding active slot is k,SCAP (k + 1) = Pr(Xk+1 = 1|Xk = 1) with active slotinterarrival time becoming ∆. If we assume the precedingactive slot is j, SCAP (k + 1) = Pr(Xk+1 = 1|Xk =0, . . . , Xj+1 = 0, Xj = 1) with active alot interarrival timebecoming (k+1−j)·∆. Therefore, the predicted SCAP (k+1)can be written as follows:

SCAP (k + 1) =

Pr(Xk+1 = 1|Xk = 1), if Xk = 1,

P r(Xk+1 = 1|Xk = 0, . . . , Xj+1 = 0, Xj = 1),

if Xk = 0.

=

∫ ∆0 f̃(k)dt, if Xk = 1∫ (k+1−j)·∆

(k−j)·∆ f̃(k)dt, if Xk = 0,(6)

where ∆ is defined as the sensing slot length. The algorithmto compute the SCAP for each slot is given in Algorithm 2.SCAP provides an appropriate measure for quantifying thesecondary user channel access pattern, which takes into ac-count the channel availability, SUs’ current activity, and SUs’traffic pattern learnt from their past activities. The major goalof the inspection sniffers is to predict SCAP (k+1) that guidesthe operation sniffers’ channel assignment strategies, which isthe main focus of the following section.

V. NEAR-OPTIMAL MONITORING MECHANISM

The monitoring mechanism of SpecMonitor addresses theproblem of sniffer channel assignment to maximize two d-ifferent levels of QoMs, which is carried out by the sniffercenter. In particular, at k-th slot, the sniffer center collectsall the channel usage information gathered by the inspectionsniffers to produce a prediction set of SCAP (k + 1) for allthe channels simultaneously. This set of predicted SCAP isthen leveraged to provide optimized channel assignments forthe forthcoming slot.

Although channel switching enables the sniffers to capturechannel dynamics adaptively, its negative effects should not beneglected in computing QoMs, especially in the CRNs withchannel availability issue. We claim that channel switchingindeed produces non-negligible overhead in terms of framelosses in practice. In the following, we show our formulation ofsniffer channel assignment problem with two levels of QoMs,respectively.

A. Frame-level Quality-of-Monitoring OptimizationThe goal of FL-QoM optimization is to maximize the num-

ber of captured frames, given a set of channels and operationsniffers inside one monitoring area. In section III, we showthat active slot interarrival pattern is closely associated withframe arrival pattern, so that the number of captured framesduring K slots from a certain channel can be written as:Nf =

∑Kk=0(Ik · n(k)

f ), where Ik is an indicator indicatingwhether the k-th slot is active, n(k)

f denotes the number offrames inside the k-th slot. Therefore, instead of directlymaximizing the number of captured frames, we transform FL-QoM into an objective of maximizing the number of activeslots captured. For notation convenience, let us define indexsets i ∈ N = {1, . . . , N}, s ∈ Sop = {1, . . . ,M} forindexing channels and operation sniffer antennas respectively.The optimization problem can be formulated as the followinginteger programming (IP) problem:

maximizeN∑i=1

SCAPi(k + 1) · yi(k + 1)

− αM∑s=1

N∑i=1

1

2[zs,i(k + 1)− zs,i(k)]2

(7)

subject toN∑i=1

zs,i(k) ≤ 1,∀s ∈ Sop,∀k (8)

M∑s=1

zs,i(k) ≤ 1,∀i ∈ N ,∀k (9)

yi(k) =

M∑s=1

zs,i(k),∀i ∈ N ,∀k (10)

yi(k), zs,i(k) ∈ {0, 1},∀s ∈ Sop, i ∈ N ,∀k.(11)

Each operation sniffer antenna in the set Sop is associatedwith a binary decision vector zs,i(k) ∈ {0, 1}, i ∈ N , whichis called sniffer channel assignment indicator, with zs,i(k) = 1if the sniffer is assigned to channel i at slot k; 0 otherwise.yi(k + 1) is the binary variable indicating whether or not thechannel i is monitored by some sniffer in (k+1)-th slot. TheIP formulation is supposed to run iteratively: at k-th slot, afterobtaining zs,i(k) and predicted SCAPi(k+1), we can acquireyi(k+1) and zs,i(k+1) by solving the IP problem. Clearly, thesniffer channel assignment is updated once every slot, whichallows our mechanism to quickly adapt to the traffic dynamics.

Note that the objective function Eq. (7) is comprised oftwo parts: the positive part represents the average numberof captured active slots, while the negative part indicates thechannel switching costs. For simplicity, we use the numberof channel switches between every two subsequent slots toapproximate the channel switching costs. In addition, we set aswitching cost weight α to represent the relative significance ofchannel switching costs w.r.t. the gains obtained from capturedslots, which is a constant value residing within [0, 1]. Here,we define α as the ratio of channel switching duration to theslot duration. If channel switching takes 5ms and one slot

Page 7: 1 SpecMonitor: Towards Efficient Passive Traffic Monitoring ...cse.unl.edu/~qyan/paper/TWC14_Yan.pdf · SpecMonitor: Towards Efficient Passive Traffic Monitoring for Cognitive

7

has 20ms, we have α = 1/4. However, the definition ofα can be further extended to incorporate more sophisticatedmetrics for channel switching costs. For instance, we canfurther incorporate the probability that the current channel willbe idle in the next slot, because sniffer’s channel switchingdoes not incur frame loss overhead if the sniffer listens on anexpected idle channel.

The constraints (8), (10) arise due to the facts that onesniffer antenna can only monitor one channel, and one channelis better to be covered by one sniffer antenna inside themonitoring area. In particular, we put forward the secondconstraint, because if we allow multiple antennas to listen overthe same channel in the same area, their captured frames willprovide duplicate information. This IP problem can be viewedas a NP-hard problem following the proof in [8], thus we needto find an approximation algorithm to solve the IP problem.

LP rounding algorithm has been adopted to solve the IPproblem [7], [8]. This algorithm solves the LP-relaxationof the IP formulation, and then rounds the fraction-al results into integral solutions using for example theprobabilistic rounding algorithm (PRA) [18]. However, thisalgorithm is only applicable to linear program problem, whilein our formulation, the objective function contains somequadratic terms. We then reformulate the objective functionto remove the nonlinear terms. As zs,i(k)2 = zs,i(k) whenzs,i(k) ∈ {0, 1}, the objective function Eq. (7) can be rewritteninto a linear form as follows:

N∑i=1

SCAPi(k + 1) · yi(k + 1)−

α

M∑s=1

N∑i=1

1

2[zs,i(k + 1) + zs,i(k)− 2zs,i(k) · zs,i(k + 1)]

Note that zs,i(k) is already known before solving optimizationproblem. The PRA algorithm has been proven [18] to produce(1 − 1/e)-optimal sniffer channel assignment in linear time.However, the execution of PRA disregards the constrain-t (10) completely. Hence, the resulted channel assignmentobtained from PRA cannot prevent multiple antennas fromlistening on the same channel. We define this problem aschannel conflict problem, and the sniffer antennas assigned tothe same channel as conflict sniffer set.

In response, we propose a heuristic sniffer fixing strategyto address the channel conflict problem, which takes thefollowing steps:

(1) Find all the conflict sniffer sets in the solution obtainedfrom the PRA algorithm;

(2) Pick one sniffer antenna in each conflict sniffer setrandomly, and fix it to the conflicted channel;

(3) Run LP rounding algorithm again to get a new solution;(4) Test whether the new solution contains any conflict

sniffer set: if yes, go to step (1); otherwise return the solution.The above heuristic channel assignment strategy fixes one

sniffer antenna to one channel every round by adding con-straints, thus it guarantees to provide a feasible solution ofchannel assignments for all the sniffers within linear time,which turns out to be a near-optimal solution for the sniffer

channel assignment problem, as shown in Section V-C. In theend, all the confliction will be addressed after running througha sequence of LP rounding, which guarantees the convergenceof the algorithm.

We call the channels to be assigned as potential channels.The resulted channel assignment strategy can provide thesniffers with the assignments of potential channels for the nextslot. Then, the sniffer center checks every potential channel todetermine whether it has already been monitored: if yes, itskips assigning this channel; if no, it selects a sniffer antennawhich is not listening on any other potential channels tomonitor this channel. In this way, the channel switching costsare further alleviated.

B. User-level Quality-of-Monitoring OptimizationThe objective of UL-QoM optimization is to maximize the

expected number of active users monitored. In order to capturethe user-level information, it is indispensable to identify thesource of each frame, even encrypted frames. Let Ui(k) for i ∈N denote the number of active users operating in channel i atthe k-th slot. We assume once a sniffer is tuned into a channel,it covers all the active users operating in this channel. We donot consider the channel switching costs in this case, becausethere are typically multiple frames from a single user so thata small number of frame loss due to channel switching doesnot have a big impact on the number of users measured. TheUL-QoM optimization problem can be casted as the followingIP problem:

maximizeN∑i=1

Ui(k) · SCAPi(k + 1) · yi(k + 1) (12)

subject to (8)− (11).

The above optimization problem can be solved using exactlythe same approximation algorithm illustrated in the previoussection, thus is omitted here. Note that Ui(k) can be measuredby counting the number of different MAC addresses fromframes passing through the AP running in channel i withintime slot k. In practice, Ui(k) may not be available at thebeginning of the k-th slot, so it can be approximated by themeasurement of Ui(k − 1), assuming users remain operatingin the same channel for the next time slot. Small errors inestimating Ui(k) would not affect the performance much. Inthe extreme case when false MAC addresses are inserted bythe attackers, more sophisticated approach is required. Forinstance, machine learning methods to perform Internet trafficclassification [19] can be used to differentiate different usersbased on their identified traffic types. This is out of the scopeof this paper.

C. Numerical AnalysisIn this section, we present numerical results for our approxi-

mation algorithm and compare them to the upper bound of theproblem. Without loss of generality, we focus on the FL-QoMoptimization problem.Deriving an Upper Bound: The complexity of the op-timization problem formulated in Section V-A stems from

Page 8: 1 SpecMonitor: Towards Efficient Passive Traffic Monitoring ...cse.unl.edu/~qyan/paper/TWC14_Yan.pdf · SpecMonitor: Towards Efficient Passive Traffic Monitoring for Cognitive

8

(a) Normalized objective (with respectto the computed upper bound) for 20channels and 10 sniffers with α=0.3

(b) Normalized objectives for differ-ent α

Fig. 4: Numerical analysis

the binary yi(k) and zs,i(k) variables, for ∀ k. To derive anupper bound for the problem, we relax the integer (binary)requirement on yi(k) and zs,i(k) with 0 ≤ yi(k) ≤ 1 and0 ≤ zs,i(k) ≤ 1. The relaxed problem is a standard LPproblem, the solution of which can be obtained in polynomialtime. Since the relaxation enlarges the optimization space, thesolution to the relaxed LP problem yields an upper bound forthe original optimization problem.Numerical Results: We consider N = 20 or 30 channels,M = 10 or 20 sniffer antennas. SCAP values are randomlygenerated for every channel over 1000 slots. We first presentthe simulation results for 20 channels and 10 sniffer antennas.We used the PRA approximation and sniffer fixing algorithmsto determine a feasible solution which serves as a low bound,and compared the corresponding objective value with the upperbound. Fig. 4(a) shows the normalized objective values with re-spect to the computed upper bound (i.e. feasible solution/upperbound) for 20 channels and 10 sniffer antennas. The averagenormalized objective value obtained among 1000 slots is 0.95and the standard deviation is 0.03. We further adjust switchingcost weight α to examine the variations of the normalizedobjective values. Fig. 4(b) shows the tiny gap between theachieved solution and the upper bound for different α.

Since the actual optimal value lies between the feasiblesolution value and the upper bound, the solution value ofour approximation algorithm must be even closer to theoptimal value than the foregoing normalized ratio (normalizedobjective value). Thus, the derived solution value of ourapproximation algorithm is close to optimality, confirming itsnear-optimality. Finally, we run an experiment to compare theaverage number of captured active slot using our algorithmwith the upper bound in Fig. 5. We notice that the differencebetween the monitoring solution and upper bound is keptsmall over the time, which proves the near-optimality of ourapproximation algorithm from the experimental perspective.

D. Complexity AnalysisIn this section, we analyze the complexity of the above

approximation algorithms. We first analyze the complexity ofLP rounding algorithm. The LP rounding algorithm involvestwo major steps: (1) solving LP relaxation, and (2) executingPRA algorithm. We notice that the above two IP problemscontain (N + MN) unknown variables. Therefore, the com-plexity of solving the LP relaxation of IP formulation is given

Fig. 5: Number of captured active slots using our algorithmvs. Upper bound for 20 channels and 10 sniffers with α=0.3

as O((N +MN)3/log(N +MN)) [7], which is determinedby the complexity of LP solver. On the other hand, the PRAalgorithm has a linear complexity O(M ∗ N), governed bythe input vector size (M ∗ N) [18]. Thus, the LP roundingalgorithm can be solved with polynomial time complexityO((N +MN)3/log(N +MN)).

Second, the heuristic sniffer fixing strategy will solvechannel conflict problem by running through a series of LProunding algorithm. In the worst case scenario, it will invokeLP rounding M times. Hence, the sniffer channel assignmentproblem can be solved with an overall worst case complexity ofO(M ∗(N+MN)3/log(N+MN)). The efficiency evaluationof the algorithm implementation is presented in Section VI-A.

VI. EVALUATION

In this section, we conduct extensive simulations and exper-iments to evaluate the performance of SpecMonitor for CRNs.The simulations leverage synthetic traces, which allow us tovary the number of channels and sniffers, as well as the trafficpatterns of different users. We also carry out experiments andtest the performance of SpecMonitor on real traces collectedfrom the experiments. Aside from implementing the proposedSpecMonitor framework, we also implemented the followingalgorithms for comparison.• Random channel assignment: the sniffer channels are

randomly assigned.• Greedy channel assignment: the sniffers are always

assigned to the predicted busiest channels based on SCAP atevery sensing slot, i.e. the channels with the largest SCAP .• Support Vector Regression (SVR) channel assignment:

the sniffers are assigned to the channels in which the nextframe is predicted to arrive within a short period based on theframe interarrival time predication using SVR method [10].

We assume the PUs’ presence can be detected promptly byboth inspection and operation sniffers, as illustrated in sectionIV-A. The default systematic parameters used in the evaluationare shown in the Table II.

A. Real-Time Monitoring PerformanceAs illustrated in Section IV,V, our monitoring framework

has a very stringent real-time requirement. Basically, we arerequired to complete the channel assignments before a slotends, i.e., within 20 ms according to our setting. In thissection, we evaluate the running time of SpecMonitor using

Page 9: 1 SpecMonitor: Towards Efficient Passive Traffic Monitoring ...cse.unl.edu/~qyan/paper/TWC14_Yan.pdf · SpecMonitor: Towards Efficient Passive Traffic Monitoring for Cognitive

9

Parameters ValuesW 50nw 10tw 25Sensing slot length ∆ 20msSensing period 2msMulti-slot updating 5 slotsGaussian mean values of synthetictraces

[3, 42]

Gaussian standard deviation of synthetictraces

2

TABLE II: Parameters

experiments, and propose to relax the stringent requirementwithout compromising the monitoring performance. We im-plement SpecMonitor framework using MATLAB R2011b ona Windows machine with 3.2 GHz Intel Xeon W3565 CPU and18 GB memory, including the channel access model and near-optimal channel assignment algorithm. In our original design,whenever there is a new observation of active slot interarrivaltime, SCAP values are updated and channels are reassigned.

We carry out experiments to count the running time ofthe monitoring framework and breakdown the running timeinto different sections to identify the bottleneck as shown inTable. III. Note that the recorded running time is the averagevalue after running 1000 slots. The overall running time is91.2 ms, with 97.5% of time spent on KDE operation andoptimization algorithm. By delving into the KDE functionand optimization algorithm, we find the bottleneck of KDEoperation is on the fixed point iteration algorithm [15] con-suming 95% of overall time for each KDE invocation, whilethe bottleneck of optimization algorithm is on the linprogMATLAB function (52%) and LP rounding algorithm withsniffer fixing strategy (40%). In our experiment, the snifferfixing strategy runs through LP rounding algorithm two timeson average until a valid channel assignment is generated.Consequently, the overall running time far exceeds a slotduration. To address this issue, we can convert the code usingmore computationally efficient programming language such asC.

Another alternative and more viable approach is to relax thestringent real-time requirement. Instead of updating SCAP andassigning channels every slot, we relax the per-slot updatingrequirement into T -slot updating requirement, which allowsSCAP to be renewed every T slots. To satisfy the relaxedrequirement, the sniffer center only needs to check for newobservations and incorporate them in the channel usage modelevery T slots. In other words, SCAP gets updated and channelsare reassigned, only if there is at least one new observationduring T slots. In our implementation, we can set T = 5, sothat the implemented model update and channel assignmentcomplete before the next channel assignment can be carriedout, i.e., within 100ms or 5 slots (as 91.2ms < 100ms). Weshow in the following sections that per 5-slot updating does notsacrifice the monitoring performance much compared with per-slot updating, thus it satisfies the real-time processing whileretaining an excellent performance. Note that we neglect thewire side communication costs between inspection sniffers andsniffer center, which are in the order of microsecond, regardedas negligible.

KDE Op-eration

KStest Op-eration

SCAP Com-putation

OptimizationAlgorithm

TotalTime

58.1 0.6 0.6 30.8 91.2

TABLE III: Average running time with 20 channels and 10sniffer antennas (in ms)

B. Frame Capturing PerformanceIn this section, frame capturing performances of different

channel assignment algorithms are evaluated. First, we gen-erate synthetic traces to evaluate the frame capture rate andchannel switching cost. Frame capture rate is defined as theratio of the number of captured frames versus the overallnumber of frames passing through all the channels up to thecurrent time, while channel switching cost is represented by thenegative part of Eq. 7. Then, we collect real-world traces usingAirPcap Nx [20] with Wireshark. The traffic traces are capturedfrom multiple channels of operative WLANs (802.11g mode)to emulate the scenarios in WhiteFi networks. We evaluateSpecMonitor by comparing its frame capturing performancewith other algorithms. For all the following evaluations, wemeasure the performance of algorithms running through 1000slots for 100 rounds.

1) Synthetic Traces: First, we generate synthetic time seriestraces to represent frame interarrival time, using Gaussiandistribution with an exponential correlation function. Eachtrace corresponds to the traffic generated in one channel withdifferent mean values to simulate different traffic loads. Weevaluate the capturing performance w.r.t. different switchingcost weights α. Generally speaking, one channel switchingcauses a penalty of losing α slot, α ∈ [0, 1].

We assume the training process of SVR scheme has al-ready been done, which takes about 35 training samples [10].Fig. 6(a) shows the frame capture rates of different methods.With the increase of α, frame capture rates of all threecompared schemes fall down steadily because of the increasingpenalty for channel switching. However, with excessive α,SpecMonitor will force the sniffers to switch channel onlywhen the reward from channel switching is higher than thepenalty; otherwise, it keeps the sniffers staying in the currentchannels. In this way, SpecMonitor retains an excellent framecapture performance. Regarding the SVR scheme, it performsbest when the channel switching costs are neglected (α = 0or 0.1), which means SVR achieves accurate estimations offrame interarrival time. However, when α grows larger than0.2, frame capture rate of SVR scheme drops steadily becauseof the aggravating switching penalty caused by frame loss.Note that the capturing performance of SpecMonitor also dropsa bit due to higher switching costs, but then reverts back tosurpass the performance curves of any other schemes.

Fig. 6(b) shows channel switching costs w.r.t. α, from whichwe can see SVR methods induce highest switching costsamong all methods, because of its unslotted and heuristicswitching strategy. In contrast, the switching cost of Spec-Monitor remains the lowest.

Finally, Fig. 7(a) shows the different capturing capabilitiesw.r.t. the number of sniffer antennas, the frame capturing per-formance of all the methods keeps growing with the increasingnumber of antennas. SpecMonitor achieves the highest frame

Page 10: 1 SpecMonitor: Towards Efficient Passive Traffic Monitoring ...cse.unl.edu/~qyan/paper/TWC14_Yan.pdf · SpecMonitor: Towards Efficient Passive Traffic Monitoring for Cognitive

10

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 110

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Alpha

Fram

e C

aptu

re R

ate

SpecMonitorSVRRandomGreedy

(a) Frame Capture Rates

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 110

500

1000

1500

2000

2500

3000

3500

4000

4500

5000

Alpha

Cha

nnel

Sw

itchi

ng C

ost

SpecMonitorSVRRandomGreedy

(b) Channel Switching Costs

Fig. 6: Performance with different methods using synthetictime series data (5 channels, 3 sniffer antennas)

6 7 8 9 10 11 12 13 14 15 160

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

11

Number of Sniffers

Fra

me

Cap

ture

Rat

e

Our SchemeSVR SchemeRandom SchemeGreedy Scheme

(a) Frame Capture Rates

6 8 10 12 14 160

1000

2000

3000

4000

5000

6000

Number of Sniffers

Cha

nnel

Sw

itchi

ng C

ost

SpecMonitorSVRRandomGreedy

(b) Channel Switching Costs

Fig. 7: Performance with varied number of sniffer antennasusing different methods (α=0.25, 20 channels)

capture rate. We also compare channel switching costs inFig. 7(b). With more sniffer antennas, the channel switchingcosts of SVR method decreases significantly, because the in-creased traffic capturing capability refrains SVR method fromaggressive channel switching behavior. Meanwhile, the othermethods have much less, yet more stable channel switchingcosts.

2) Real Traces: We collect the real traces from 802.11gWLAN network, captured by a sniffer listening on the channelestablished by one AP and client pair running various applica-tions. The captured traces include both the uplink traffic to APand the downlink traffic from AP. We consider five differenttypes of trace data (FTP, BT, Web Browsing, Skype Voice andSkype Video) with one trace per channel. FTP and BT tracesare obtained by running an automated script on the client todownload/upload several files from/to a server continuously,and we write another automated script to browse severalwebsites to collect Web Browsing trace. Skype voice trace andvideo trace are collected by connecting to a client using Skypevoice call or Skype video call. We evaluate the performancewith seven channels of real-world traffic, while the additionaltwo channels contain mixed traffic pattern. Namely, one isthe traffic combined from two clients using Skype Voice andBT, and the other one is generated from two clients usingSkype Voice and Web Browsing. The performance is shownin Fig. 8(a) and Fig. 8(b), from which we can see SVR methodperforms even worse than random scheme. The reason is thatthe SVR method takes a long time for retraining, when thepredicted value has a large deviation from the genuine onebecause of the real-world traffic dynamics. Frequent retrainingand channel switching operations significantly deteriorate the

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 110

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Alpha

Fra

me

Cap

ture

Rat

e

SpecMonitorSVRRandomGreedy

(a) Frame Capture Rates

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 110

200

400

600

800

1000

1200

1400

1600

1800

2000

Alpha

Cha

nnel

Sw

itchi

ng C

ost

SpecMonitorSVRRandomGreedy

(b) Channel Switching Costs

Fig. 8: Performance using real-world traffic (7 channels, 4sniffer antennas)

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 110

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

11

Alpha

Fra

me

Cap

ture

Rat

e

Per−slot updating5−slot updating10−slot updating

Fig. 9: Average frame capture rate comparison of multi-slotupdating and per-slot updating with 20 channels and 10

sniffer antennas

capturing capability of SVR method. However, SpecMonitorretains the best performance, except in the case of small α, thegreedy method performs better when channel switching onlyincurs a small penalty. This comparison result also indicatesthat our model can accurately capture the traffic statisticsregardless of whether the traffic is interleaved or not.

3) Comparison of multi-slot updating and per-slot updating:As mentioned in Section VI-A, multi-slot updating relaxesthe real-time requirement. In this section, we compare theperformance of multi-slot updating with per-slot updating.Fig. 9 presents the frame capturing performance comparisonfor multi-slot updating and per-slot updating, which showsa slight performance degradation using multi-slot updatingmethod. Interestingly, 5-slot updating achieves a better framecapture performance when α = 0.7, because it incurs lessswitching costs by switching at least every 5 slots. However,in most cases, per-slot updating captures more frames, due toits more rapid adaptation to the traffic dynamics. From thisperformance comparison, we conclude that 5-slot updating re-tains an excellent frame capturing performance while fulfillingthe real-time requirements as presented in Section VI-A.

The intuition behind the fact that T-slot updating achievessimilar results is that the data distribution presented in ourexperiments does not change rapidly over the course of T slots.However, this fact does not apply to all the traffic scenarios.For example, for traffic scenarios when the traffic statistics arerapidly changing, T should be assigned a small value. Theexact value of T can be picked using the above performancecomparison method, via evaluating the performance of variousT values and selecting a larger one with acceptable perfor-

Page 11: 1 SpecMonitor: Towards Efficient Passive Traffic Monitoring ...cse.unl.edu/~qyan/paper/TWC14_Yan.pdf · SpecMonitor: Towards Efficient Passive Traffic Monitoring for Cognitive

11

SpecMonitor Greedy Random0

10

20

30

40

50

60

70E

xpect

ed N

um

ber

of

Captu

red U

sers

per

Slo

t

5 Sniffers10 Sniffers15 Sniffers

(a) User-Level QoM Objective

6 7 8 9 10 11 12 13 14 15 160

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.90.9

Number of Sniffers

Act

ive

Use

r C

aptu

re R

ate

SpecMonitorRandomGreedy

(b) Active User Capture Rate

Fig. 10: User capture performance with 20 channels

mance degradation.

C. User Capturing PerformanceFinally, we evaluate the performance for maximizing UL-

QoM using synthetic data. We assume different channel-s contain different numbers of SUs, and the numbers aredynamically changing within range [0, 10] (assume uniformdistribution); also the frame interarrival time is exponentiallydistributed with mean values residing in [1, 40], specifyingthe traffic pattern. First, we compare the expected numberof captured users per slot using three different monitoringschemes in Fig. 10(a). The result indicates that SpecMonitoris able to capture more users per slot, because the optimizedmonitoring strategy keeps the sniffers watching the channelswith more users.

Then, we define Active User Capture Rate as the ratio ofnumber of active users captured versus the overall number ofthe active users appeared in all the channels. The performanceof active user capture rate w.r.t. different number of sniffers isshown in Fig. 10(b), from which we notice that SpecMonitorcan select best sets of channels to maximize the number ofactive users captured during the monitoring period. The resultimplies SpecMonitor significantly outperforms two baselineschemes, in terms of user capturing performance.

VII. CONCLUSION

In this paper, we have introduced a systematic passivemonitoring framework, SpecMonitor, for Wi-Fi like CRNs tomaximize two levels of QoMs incorporating switching costs.Both the primary user and secondary user channel usagepatterns are considered to optimize the monitoring strategy.Specifically, we proposed an online non-parametric densityestimation scheme to learn and predict the time-evolving mixedtraffic pattern from SUs. Based on the predicted traffic pattern,the optimization problems of sniffer channel assignment areformulated, for which we designed near-optimal monitoringalgorithms. One major limitation of the SpecMonitor systemis that SpecMonitor requires a substantial amount of traffic ofinterest on the channel in order to produce a reasonable channelaccess model. If the traffic amount over a channel is small, theproduced model may be unreliable. In future research, we willconsider the impact of traffic amount to the channel accessmodel. We plan to evaluate the modeling accuracy in real time.The model built with a small traffic amount will be deemed as

unreliable, which will not be used for future predictions andchannel assignments.

ACKNOWLEDGEMENT

We thank Shaxun Chen, Kai Zeng, Fan Zhang for theiruseful inputs and discussions. We thank Huy Nguyen fromUniversity of Houston for providing the source code forprobabilistic rounding algorithm. We appreciate anonymousreviewers for their helpful comments. This work was supportedin part by NSF Grants 1405747, 1343222, 1247830, 1217889and ONR Grant N000141310080. The work of T. Jiang wassupported by an NSF Graduate Research Fellowship.

REFERENCES

[1] Q. Zhao, L. Tong, A. Swami, and Y. Chen, “Decentralized cognitiveMAC for opportunistic spectrum access in ad hoc networks: A POMDPframework,” IEEE Journal on Selected Areas in Communications,vol. 25, April 2007.

[2] P. Bahl, R. Chandra, T. Moscibroda, R. Murty, and M. Welsh, “Whitespace networking with wi-fi like connectivity,” in SIGCOMM ’09,August 2009, pp. 27–38.

[3] J. Yeo, M. Youssef, and A. Agrawala, “A framework for wireless LANmonitoring and its applications,” in Wise 2004, October 2004, pp. 70–79.

[4] Y.-C. Cheng, J. Bellardo, P. Benko, A. C. Snoeren, G. M. Voelker, andS. Savage, “Jigsaw: Solving the puzzle of enterprise 802.11 analysis,”in SIGCOMM ’06, Sep. 2006, pp. 39–50.

[5] Y.-C. Cheng, M. Afanasyev, P. Verkail, P. Benko, J. Chiang, A. C. Sno-eren, S. Savage, and G. M. Voelker, “Automating cross-layer diagnosisof enterprise wireless networks,” in SIGCOMM ’07, Aug. 2007, pp.25–36.

[6] A. Balachandran, G. M. Voelker, P. Bahl, and P. V. Rangan, “Chrac-terizing user behavior and network performance in a public wirelessLAN,” in SIGMETRICS 2002, ACM, June 2002, pp. 195–205.

[7] D.-H. Shin and S. Bagchi, “Optimal monitoring in multi-channel multi-radio wireless mesh networks,” in MobiHoc’09, May 2010.

[8] A. Chhetri, H. Nguyen, G. Scalosub, and R. Zheng, “On qualityof monitoring for multi-channel wireless infrastructure networks,” inMobiHoc’10, Sep. 2010.

[9] P. Arora, C. Szepesvari, and R. Zheng, “Sequantial learning for optimalmonitoring of multi-channel wireless networks,” in INFOCOM 2011,IEEE, 2011.

[10] S. Chen, Z. Kai, and P. Mohapatra, “Efficient data capturing for networkforensics in cognitive radio networks,” in Network Protocols, 2011.(ICNP ’2011) 19th International Conference on, 2011, pp. 176–185.

[11] S. Yi, K. Zeng, and J. Xu, “Secondary user monitoring in unslotted cog-nitive radio networks with unknown models,” in Wireless Algorithms,Systems, and Applications, ser. Lecture Notes in Computer Science, vol.7405, 2012, pp. 648–659.

[12] D. Cabric, A. Tkachenko, and R. W. Brodersen, “Experimental study ofspectrum sensing based on energy detection and network cooperation,”in Proceedings of the first international workshop on Technology andpolicy for accessing spectrum, ser. TAPAS ’06, 2006.

[13] D. Murray, M. Dixon, and T. Koziniec, “Scanning delays in 802.11networks,” in The 2007 International Conference on Next GenerationMobile Applications, Services and Technologies, Sept. 2007, pp. 255–260.

[14] S. Narlanka, R. Chandra, P. Bahl, and J. I. Ferrell, “A hardware platformfor utilizing tv bands with a wi-fi radio,” in IEEE LANMAN, June 2007.

[15] Z. I. Botev, J. F. Grotowski, and D. P. Kroese, “Kernel density estimationvia diffusion,” Annals of Statistics, vol. 38, no. 5, pp. 2916–2957, Nov.2010.

Page 12: 1 SpecMonitor: Towards Efficient Passive Traffic Monitoring ...cse.unl.edu/~qyan/paper/TWC14_Yan.pdf · SpecMonitor: Towards Efficient Passive Traffic Monitoring for Cognitive

12

[16] C. Heinz and B. Seeger, “Towards kernel density estimation overstreaming data,” in International Conference on Management of Data(COMAD), Dec. 2006.

[17] G. W. Corder and D. I. Foreman, Nonparametric Statistics for Non-Statisticians: A Step-by-Step Approach. New York, USA: Wiley, 2009.

[18] A. Srinivasan, “Distributions on level-sets with applications to approx-imation algorithms,” in FOCS, 2001.

[19] F. Zhang, W. He, X. Liu, and P. G. Bridges, “Inferring users’ onlineactivities through traffic analysis,” in WiSec 2011, June 2011, pp. 59–69.

[20] “Airpcap Adapter,” http://www.riverbed.com/products-solutions/products/network-performance-management/wireshark-enhancement-products/.

Qiben Yan (S’11) received his B.E. and M.E. inElectrical Engineering at Fudan University, China, in2007 and 2010, respectively. He is currently a Ph.Dstudent in Computer Science department at VirginiaTech. His current research interests include wirelessnetwork security and privacy, network monitoringand forensics.

Ming Li (S’08 - M’11) received his Ph.D. in Electri-cal and Computer Engineering from Worcester Poly-technic Institute. He joined the Computer ScienceDepartment at Utah State University as an assistantprofessor in 2011. His research interests are in thegeneral areas of wireless network and cyber securi-ty, with current emphases on network coexistence,wireless physical layer security, data security andprivacy, and cyber-physical system security. He isa recipient of the NSF CAREER Award in 2014. Heis a member of both IEEE and ACM.

Feng Chen is an assistant professor at Universityat Albany, SUNY. He received his B.S. from HunanUniversity, China, in 2001, M.S. degree from Bei-hang University, China, in 2004, and Ph.D. degreefrom Virginia Tech in 2012, all in Computer Science.He has published 30 referred articles in major datamining venues. His research focuses on the detectionof emerging events and other relevant patterns (e.g.,disease outbreaks, crime events, road traffic conges-tion) in the mobile context and/or data mining ofspatial temporal, textual, or social media data.

Tingting Jiang (S’11) received her B.S. degree(Summa Cum Laude) in Computer Science from Vir-ginia Tech, Blacksburg, VA, in 2007. During 2007-2009, she was a Software Engineer at Intrexon Corp.,Blacksburg, VA. Currently, she is a Ph.D. studentin Computer Science at Virginia Tech, Blacksburg,VA. She is a recipient of an NSF Graduate ResearchFellowship (2011 C 2014) and a Microsoft ResearchGraduate Women’s Scholarship (2011). Her researcharea is in Wireless Security.

Wenjing Lou (S’01-M’03- SM’08) is a professor inthe Computer Science department at Virginia Poly-technic Institute and State University. She receivedher Ph.D. in Electrical and Computer Engineeringfrom University of Florida. Her research interestsare in the broad area of wireless networks, withspecial emphases on wireless security and cross-layernetwork optimization.

Y. Thomas Hou (F’14) is a Professor in the BradleyDepartment of Electrical and Computer Engineer-ing, Virginia Tech, Blacksburg, VA. He receivedhis Ph.D. degree from NYU Polytechnic Schoolof Engineering (formerly Polytechnic Univ.). Hiscurrent research focuses on developing innovative so-lutions to complex cross-layer optimization problemsin wireless and mobile networks. He has publishedtwo graduate textbooks: Cognitive Radio Commu-nications and Networks: Principles and Practices(Academic Press/Elsevier, 2009) and Applied Opti-

mization Methods for Wireless Networks (Cambridge University Press, 2014).The first book has been selected as one of the Best Readings on CognitiveRadio by IEEE Communications Society. For his research accomplishments,Prof. Hou was awarded Virginia Tech College of Engineering Deans Award forExcellence in Research (2013) and Dean’s Faculty Fellow Award (2008). Heis currently an Area Editor of IEEE Transactions on Wireless Communications(overseeing a team of 10 editors in wireless networks area), and Editorfor IEEE Transactions on Mobile Computing, IEEE Journal on SelectedAreas in Communications (Cognitive Radio Series), and IEEE WirelessCommunications. He is the Steering Committee Chair of IEEE INFOCOMconference.

Chang-Tien Lu received the MS degree in computerscience from the Georgia Institute of Technologyin 1996 and the PhD degree in computer sciencefrom the University of Minnesota in 2001. He is anassociate professor in the Department of Comput-er Science, Virginia Polytechnic Institute and StateUniversity and is founding director of the SpatialLab. He served as Program Co-Chair of the 18thIEEE International Conference on Tools with Ar-tificial Intelligence in 2006, and General Co-Chairof the 20th IEEE International Conference on Tools

with Artificial Intelligence in 2008 and 17th ACM International Conferenceon Advances in Geographic Information Systems in 2009. He also servedas Vice Chair of the ACM Special Interest Group on Spatial Information(ACM SIGSPATIAL) from 2011 to 2014. His research interests includespatial databases, data mining, geographic information systems, and intelligenttransportation systems.