Security Enhanced Communications in Cognitive Networks
Qiben Yan
Dissertation submitted to the Faculty of the
Virginia Polytechnic Institute and State University
in partial fulfillment of the requirements for the degree of
Doctor of Philosophy
in
Computer Science and Applications
Wenjing Lou (Chair)
Y. Thomas Hou
Ing-Ray Chen
Danfeng Yao
Sushil Jajodia
June 23, 2014
Falls Church, Virginia
Keywords: cognitive network security, cognitive radio network, reactive jamming attack, network monitoring, botnet detection.
© Copyright 2014, Qiben Yan
Security Enhanced Communications in Cognitive Networks
Qiben Yan
ABSTRACT
With the advent of ubiquitous computing and the Internet of Things (IoT), potentially billions of devices will create a broad range of data services and applications, which will require communication networks to efficiently manage the increasing complexity. The cognitive network has been envisioned as a new paradigm to address this challenge. It has the capability of reasoning, planning and learning by incorporating cutting-edge technologies including knowledge representation, context awareness, network optimization and machine learning. A cognitive network spans the entire communication system, including the core network and wireless links, across the entire protocol stack. The Cognitive Radio Network (CRN) is the part of the cognitive network that operates over wireless links, and it endeavors to better utilize spectrum resources. The core network provides a reliable backend infrastructure to the entire communication system. However, CR communication and the core network infrastructure have attracted various security threats, which become increasingly severe in pace with the growing complexity and adversity of the modern Internet.
The focus of this dissertation is to exploit the security vulnerabilities of state-of-the-art cognitive communication systems, and to provide detection, mitigation and protection mechanisms that allow security enhanced cognitive communications, including wireless communications in CRNs and wired communications in core networks. To provide secure and reliable communications in CRNs, we proceed as follows. First, we incorporate security mechanisms into fundamental CRN functions, such as secure spectrum sensing techniques that ensure trustworthy reporting of spectrum readings. Second, as no security mechanism can completely prevent all potential threats from entering CRNs, we design a systematic passive monitoring framework, SpecMonitor, based on unsupervised machine learning methods, to strategically monitor the network traffic and operations in order to detect abnormal and malicious behaviors. Third, highly capable cognitive radios allow more sophisticated reactive jamming attacks, which impose a serious threat to CR communications. By exploiting MIMO interference cancellation techniques, we propose jamming resilient CR communication mechanisms that survive in the presence of reactive jammers. Finally, we focus on protecting the core network from botnet threats by applying cognitive technologies to detect network-wide Peer-to-Peer (P2P) botnets, which leads to the design of a data-driven botnet detection system called PeerClean. In all four research thrusts, we present thorough security analysis, extensive simulations and testbed evaluations based on real-world implementations. Our results demonstrate that the proposed defense mechanisms can effectively and efficiently counteract sophisticated yet powerful attacks.
To my beloved wife, Luna Le Lu, and my parents Yaqin Chen and Shifu Yan
Acknowledgments
To have reached this point in my life, I have been offered so much guidance and support
from people who have changed my life. I owe a debt of gratitude to them all, and I would
like to express my deepest appreciation here.
First and foremost, I would like to express my sincere gratitude to my advisor Dr. Wenjing
Lou. Her research attitude, sense of responsibility and pursuit of excellence have inspired me
over the past years. Dr. Lou was instrumental in helping me develop my research skills and
presentation skills, providing me with invaluable perspective and encouragement along the
way. I can’t thank her enough for giving me the opportunity to learn from and work with
her, for patiently meeting with me, talking about my ideas, answering my questions and
proofreading my papers. Her logical way of thinking and her keen sense of future technology
have been of great value to me. Dr. Lou not only guided my research in the past few years,
but also cared about my life and personal growth with great thoughtfulness. I feel fortunate
to have found an advisor who has always shown respect for my own interests, and done everything
she could to help make me successful. I am grateful for what she has done for me.
I am also extremely grateful to Dr. Thomas Hou, who gave me invaluable guidance and
advice on my life, research and career. Dr. Hou worked closely with me throughout
my Ph.D. studies. He has been an inspiration to me with his great interest in and passion for
research. His detail-oriented, determined and hardworking style encourages me to follow
my heart, never lose hope and keep pursuing my dream.
I am also thankful to Dr. Ing-Ray Chen, Dr. Danfeng Yao and Dr. Sushil Jajodia for serving
on my dissertation committee. Their insightful questions and comments about my research
greatly contributed to improving this dissertation.
I would like to thank Dr. Ming Li. As a collaborator, he always provided helpful comments
and advice on my research. Dr. Li worked closely with me in the past few years,
discussed research ideas with me and inspired me with his broad knowledge and
keen insights. I am grateful to Dr. Zhenyu Yang and Dr. Ning Cao, two former colleagues,
who discussed ideas with me at various stages of my research and offered generous help with
my research and life.
I also want to thank Dr. Liang Xiao for her helpful discussions on wireless security and
privacy. I am thankful to Dr. Feng Chen for helping me embrace the world of machine learning
and artificial intelligence. Feng inspired me to apply and develop machine
learning mechanisms for security purposes.
I wish to thank my former colleague, Hanfei Zhao, and my labmates in the Complex Networks
and Security Research (CNSR) lab at Northern Virginia Center: Yao Zheng, Ning Zhang,
Bing Wang, Changlai Du, Wenhai Sun for creating an intellectual and enjoyable atmosphere,
and making my Ph.D. a memorable journey. I benefited tremendously from the discussions
and interactions with my labmates. I also want to thank my former and current labmates in
the CNSR lab at Blacksburg: Dr. Canming Jiang, Dr. Liguang Xie, Huacheng Zeng, Xu Yuan,
who shared their knowledge of wireless networking and operations research with me. I am
especially indebted to Yao Zheng and Huacheng Zeng, who put great effort into improving
and implementing our research ideas.
I am greatly indebted to my father Shifu Yan, and my mother Yaqin Chen. They always
understand me and support my choices with endless love. They have done so much for
me professionally and personally. They have sacrificed so much to support me along this
wonderful journey. I truly will never be able to repay them for what they have done for
me.
More importantly, I am especially indebted to my wife Luna Lu. In my mind, her life and
her dream are what I am living for. I truly would not be pursuing a Ph.D. degree without her
support and understanding. My wife has been, and always will be, my best friend. In my
best times, and more importantly, in my worst times, I know that she will always stand by my
side, celebrating my achievements, and giving me love and hope during the frustrating times.
I am fortunate to have a wife who cares for me more than herself. Her inspiring words can
always cheer me up, and her beautiful smile can always calm me down. Although we were
separated during these years by the whole North American continent, she always believed in
me, and encouraged me to work hard and play harder. I would surely not have been able to come
this far without her support and encouragement. I cannot even imagine where I would be
today were it not for the people I love. Thank you so much for always believing in me.
Contents
Abstract
Dedication
Acknowledgments
List of Figures
List of Tables
1 Introduction
1.1 Cognitive Networks
1.2 Security Challenges in Cognitive Networks
1.2.1 Spectrum Sensing Security in CRNs
1.2.2 Network Security in CRNs
1.2.3 Reliable CR Communications with Adversarial Software Radios
1.2.4 Botnet Threats to Cognitive Communications in Core Networks
1.3 Research Contributions
1.4 Organization
2 Secure Distributed Consensus-based Spectrum Sensing in Cognitive Radio Networks
2.1 Related Work
2.2 System Model
2.2.1 Network Model
2.2.2 Distributed Consensus-based Spectrum Sensing
2.3 Vulnerability Analysis of Distributed Consensus-based Spectrum Sensing
2.3.1 Disruption of Sensing Operation
2.3.2 Stealthy Manipulation of Sensing Results
2.4 Protection of Distributed Consensus-based Spectrum Sensing
2.4.1 Robust Distributed Outlier Detection with Adaptive Local Threshold
2.4.2 Hash-based Computation Verification of Neighbor State Update
2.5 Evaluation
2.5.1 Impact of Covert Adaptive Data Injection Attacker
2.5.2 Effectiveness of Robust Distributed Outlier Detection with Adaptive Local Threshold
2.5.3 Security Analysis of Hash-based Computation Verification Approach
2.5.4 Cost Evaluation of Hash-based Computation Verification Approach
2.6 Summary
3 Non-Parametric Passive Traffic Monitoring in Cognitive Radio Networks
3.1 Related Work
3.2 System Model
3.2.1 Monitoring System Model
3.2.2 Channel Access Model
3.3 User Channel Access Prediction
3.3.1 Primary User Detection
3.3.2 Secondary User Channel Access Model
3.4 Near-Optimal Monitoring Mechanism
3.4.1 Frame-level Quality-of-Monitoring Optimization
3.4.2 User-level Quality-of-Monitoring Optimization
3.4.3 Numerical Analysis
3.4.4 Complexity Analysis
3.5 Evaluation
3.5.1 Performance of Secondary User Channel Access Model
3.5.2 Real-Time Monitoring Performance
3.5.3 Frame Capturing Performance
3.5.4 User Capturing Performance
3.6 Summary
4 MIMO-based Jamming Resilient OFDM Communications in Wireless Networks
4.1 Related Work
4.2 Problem Formulation
4.2.1 System Model
4.2.2 Attack Model
4.2.3 MIMO Interference Cancellation and OFDM Basics
4.3 Impact of Reactive Jamming Attack to MIMO-OFDM Communications
4.4 Defense Mechanisms of Reactive Jamming Attack
4.4.1 Defense Mechanism Overview
4.4.2 Decoding the Signal of Interest
4.4.3 Detecting the Jamming Signal
4.4.4 Enhanced Defense Mechanism
4.4.5 Defending Against Reactive Jamming In a Multi-hop Network
4.4.6 Dealing with Other Types of Jammers
4.4.7 Discussion
4.5 Implementation
4.6 Evaluation
4.6.1 Impact of Received Signal Direction
4.6.2 Impact of Channel Coherence Time
4.6.3 Jamming Attack and Defense Performance
4.6.4 Overhead Analysis
4.7 Summary
5 Unveiling Peer-to-Peer Botnets through Dynamic Group Behavior Analysis
5.1 Related Work
5.2 Overview of PeerClean
5.3 System Design
5.3.1 Flow Statistical Features
5.3.2 P2P Host Clustering
5.3.3 Dynamic Group Behavior Analysis
5.3.4 Training and Classification
5.3.5 Refined Bot Identification
5.3.6 Putting Them All Together
5.3.7 Evasion Mechanisms and Limitations
5.4 Discussion
5.5 Evaluation
5.5.1 Data Collection
5.5.2 Clustering P2P Hosts
5.5.3 Identifying Bot-included Clusters via Classification
5.5.4 Refined Bot Identification Performance
5.5.5 Comparisons with Other Detection Approaches
5.5.6 System Scalability
5.6 Summary
6 Conclusion and Future Research
6.1 Research Summary
6.2 Future Research Directions
Bibliography
List of Figures
1.1 Cognitive network concept
2.1 The distributed hash-based verification of neighbor state update
2.2 Performance of covert adaptive data injection attack to distributed consensus-based spectrum sensing protected by traditional detection scheme with detection threshold -56 dB
2.3 Performance of robust distributed outlier detection with adaptive local threshold
2.4 Different collusion styles (solid points represent attackers, hollow points represent honest SUs)
3.1 An overview of SpecMonitor system
3.2 Monitoring system architecture for WhiteFi network inside a monitoring area
3.3 The percentage of frames in active slots (20 ms slot length)
3.4 Frame/Active slot interarrival time distribution (20 ms sensing slot, 2 ms sensing period)
3.5 Sliding window method (X axis denotes the interarrival time data, Y axis denotes the channels)
3.6 Normalized objective (with respect to the computed upper bound) for 20 channels and 10 sniffers with α=0.3
3.7 Normalized objective (with respect to the computed upper bound) for 30 channels and 20 sniffers with α=0.3
3.8 Normalized objectives for different α
3.9 Number of captured active slots using our algorithm vs. upper bound for 20 channels and 10 sniffers with α=0.3
3.10 Performance of secondary user channel access model
3.11 Performance with different methods using synthetic time series data (5 channels, 3 sniffer antennas)
3.12 Performance with varied number of sniffer antennas using different methods (α=0.25, 20 channels)
3.13 Performance using real-world traffic (7 channels, 4 sniffer antennas)
3.14 Average frame capture rate comparison of multi-slot updating and per-slot updating with 20 channels and 10 sniffer antennas
3.15 User capture performance with 20 channels
4.1 Reactive jammer starts jamming after certain reaction time
4.2 1×2 MIMO-OFDM link attacked by a jammer
4.3 Different two-dimensional received signal spaces
4.4 Jamming attack performance by approaching the sender's location (in this experiment, the device works on 2.45 GHz central frequency with a half wavelength λ/2 = c/(2f) ≈ 6.12 cm)
4.5 A flow chart of proposed defense mechanisms (solid box: modules of the defense mechanism, dashed box: modules of enhanced defense mechanism)
4.6 Extended frame structure
4.7 Soft error based jamming detection
4.8 Burst of packets
4.9 Packet delivery rate performance with different angles between two received signals
4.10 Autocorrelation of the channel phase in an indoor environment (tested using 500 kHz bandwidth communications)
4.11 Testbed. The receiver is placed at A, while the sender and jammer are placed at the selected locations 1 to 9.
4.12 Packet delivery rate with and without jammer in 1×2 link
4.13 Jamming attack and defense performance
4.14 PDR performance with different jamming powers
4.15 PDR performance with different types of jamming signals
4.16 Throughput performance using our defense mechanisms
5.1 PeerClean system flow
5.2 Cluster connectivity feature
5.3 (a): The shared neighbor ratio of one Emule host pair compared with that of one ZeroAccess host pair (b): Group shared neighbor ratio
5.4 (a): Significant connection feature (b): Significant connection feature of Kelihos and ZeroAccess bots in 24 hours
5.5 Significant connection volatility
5.6 Box plot of Bot Compactness Ratio
5.7 Classification performance with different percentages of training data
5.8 Running time of AP clustering
List of Tables
2.1 Simulation parameters
2.2 Costs of hash-based computation verification scheme at one node for each iteration (N is the number of neighbors, P is the length of state in bytes, h is the length of hash in bytes)
3.1 Summary of symbols and notations
3.2 Parameters
3.3 Average running time with 20 channels and 10 sniffer antennas (in ms)
4.1 Default system setup
4.2 Testbed setup
5.1 Flow size statistical features
5.2 Host access pattern features
5.3 Group-level feature summary
5.4 Traffic summary ('P2P in campus' denotes the traffic flows of the campus network after P2P host identification)
5.5 Clustering result using UDP traffic (BSR1, BSR2 denote the BSRs of Sality and ZeroAccess bots respectively)
5.6 Clustering result using TCP traffic (BSR3 denotes the BSR of Kelihos bots)
5.7 Classification accuracy when trained on one type of feature. Shared neighbor feature and significant connection feature present the best classification accuracy. The classifier achieves the best performance when combining all the features. Accuracy=(TP+TN)/all; Precision=TP/(TP+FP); Recall=TP/(TP+FN).
5.8 Refined bot identification performance (the percentages in parentheses denote the bot detection rate and false alarm rate respectively)
5.9 Performance comparison with method A and method B (threshold: 0.2)
Chapter 1
Introduction
Data communication technology is one of the fastest changing areas, with numerous
services and applications having enormous impact on different aspects of modern
society, including economic growth, interpersonal relationships, scientific development,
education and entertainment. Therefore, the development of a reliable and robust, yet flexible and
extensible communication infrastructure is of utmost importance to facilitate these
human-to-human as well as human-to-machine communications, providing services such as e-health,
e-commerce, e-banking, e-payment and e-learning. Future networks will be more complex and
diversified, extending towards ubiquitous computing by involving various types of connected
devices including mobile devices, wearable devices, and smart home appliances. To effectively
manage the complexity faced by future networks and to provide optimized and secure
end-to-end data communications, a new networking paradigm, called the cognitive network [1–3],
has recently been proposed. In this chapter, we introduce the definition of cognitive
networks, expose the security challenges in cognitive communications, and present the major
research contributions of this dissertation.
1.1 Cognitive Networks
According to the dictionary [4], the term cognitive means involving conscious intellectual
activity such as thinking, reasoning or remembering, and being based on or capable of being
reduced to empirical factual knowledge. A cognitive network is a network that has a knowledge
representation of the devices, systems, networks and events, and uses a cognitive process (or
cycle) that can perceive current network conditions, and then plan, decide and act on those
conditions. The network can learn from the consequences of its actions to make future decisions,
all while following end-to-end goals. Knowledge representation and the cognitive cycle are the
two main elements of cognitive networks. Knowledge representation is a prerequisite for
achieving self-awareness. A knowledge-empowered network builds up a model, based on which the
network can perform reasoning and updating to configure itself, explain itself and repair
itself. The cognitive cycle, also called the cognition cycle, allows cognitive systems to
adjust their functions by perceiving the environment. The cognition cycle as described by
Mitola [5] contains the following states: observe, orient, plan, decide, act and learn.
Similarly, the cognitive cycle defined in [3] consists of six states: sense, plan, decide,
act, learn and policy. In general, the cognitive network uses sensors to sense the environment
(sense), and plans the strategies (plan) based on observations and policies (policy). Then, a
learning module is employed to learn from the observations and possible strategies (learn) to
aid the decision making (decide). Finally, the actuators execute the appropriate changes to
the cognitive network (act). Empowered with a knowledge representation mechanism and the
cognitive cycle, a cognitive network becomes self-aware, analytical and adaptive.
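The sense, plan, learn, decide and act states described above can be illustrated with a minimal Python sketch. It is not the design from [3] or [5]: the class name `CognitiveNode`, the channel-quality dictionary, the moving-average learning rule and the policy callback are all hypothetical choices made only to show how the states might feed one another.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class CognitiveNode:
    """Toy cognitive cycle (hypothetical): sense -> plan -> learn -> decide -> act."""
    policy: Callable[[str], bool]                  # policy: is this channel allowed?
    knowledge: Dict[str, float] = field(default_factory=dict)  # learned quality per channel
    learning_rate: float = 0.5                     # assumed smoothing factor

    def sense(self, environment: Dict[str, float]) -> Dict[str, float]:
        # Observe the current quality of every channel.
        return dict(environment)

    def plan(self, observation: Dict[str, float]) -> List[str]:
        # Candidate strategies are the channels the policy permits.
        return [ch for ch in observation if self.policy(ch)]

    def learn(self, observation: Dict[str, float], strategies: List[str]) -> None:
        # Blend each new observation into the stored knowledge (moving average).
        for ch in strategies:
            old = self.knowledge.get(ch, observation[ch])
            self.knowledge[ch] = (1 - self.learning_rate) * old + self.learning_rate * observation[ch]

    def decide(self, strategies: List[str]) -> Optional[str]:
        # Pick the permitted channel with the best learned quality.
        if not strategies:
            return None
        return max(strategies, key=lambda ch: self.knowledge[ch])

    def act(self, choice: Optional[str]) -> str:
        return "stay idle" if choice is None else f"tune to {choice}"

    def run_cycle(self, environment: Dict[str, float]) -> str:
        observation = self.sense(environment)
        strategies = self.plan(observation)
        self.learn(observation, strategies)
        return self.act(self.decide(strategies))


if __name__ == "__main__":
    node = CognitiveNode(policy=lambda ch: ch != "ch3")   # the policy forbids ch3
    print(node.run_cycle({"ch1": 0.2, "ch2": 0.9, "ch3": 1.0}))
    print(node.run_cycle({"ch1": 1.0, "ch2": 0.1, "ch3": 1.0}))
```

Because the decision uses learned knowledge rather than the raw observation, a single good or bad reading shifts the choice only gradually, which is one simple way to read the "learn from the consequences" step.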
Cognitive networks span both CRNs and core networks, as illustrated in Fig. 1.1. CRNs
are formed by high-performance software defined radios (SDRs) [6, 7], which are inherently
programmable and flexible. CRNs take advantage of the flexible capabilities of SDRs to
engage in dynamic spectrum access: they can learn the spectrum availability and radio
propagation environment, and access the spectrum in a correct and cooperative manner. The
cognitive core network utilizes advanced tools from data analytics and machine learning to
ensure the reliability and robustness of data delivery.
Figure 1.1: Cognitive network concept
1.2 Security Challenges in Cognitive Networks
In this section, we unfold the alarming security challenges in cognitive networks, including
security in CRNs and security in core networks. In cognitive networks, not only does the
cognitive functionality of the network have vulnerabilities that can be exploited by adversaries,
but the network itself also suffers from highly capable and adaptive adversaries who may learn
from the environment to adjust their strategies.
The security goals we want to achieve in cognitive networks include confidentiality,
integrity, availability, authenticity and trustworthiness [8]. According to [9], cognitive
network security can be defined as the provisions and policies adopted by a network
administrator to prevent and monitor unauthorized access, misuse, modification, or denial of
computer network and network-accessible resources, in order to satisfy the security
requirements. In this dissertation, we attempt to enhance the security of cognitive networks in
terms of achieving four security goals: first, we focus on enforcing sensing report integrity
in CR communications; second, we aim at monitoring unauthorized or malicious network
access and spectrum misuse to enforce the trustworthiness of CR communications; third, we try
to prevent denial of service caused by advanced jamming attacks to enforce the availability of
CR communications; fourth, we strive to achieve confidentiality, authenticity, trustworthiness
and availability of cognitive communications in core networks by detecting and eliminating
infected machines or bots.
Although this dissertation does not address all the security issues in cognitive networks (e.g.,
we do not deal with data privacy issues to achieve confidentiality in CRNs), it contributes to
enhancing the security of current cognitive networks in various aspects. In the following
sections, we describe the security challenges that this dissertation addresses.
1.2.1 Spectrum Sensing Security in CRNs
Cognitive radio (CR) [10,11] has emerged as a key technology for enabling the use of licensed
spectrum bands from incumbents, also known as primary users (PUs), when they are idle.
An important challenge in CR technology is reliable spectrum sensing [12], by which cognitive
radio devices, also known as secondary users (SUs), detect and exploit a spectrum band when
it is unused, but vacate the channel immediately upon detecting the presence of primary
users. Cooperative spectrum sensing, which exploits the cooperation of multiple SUs and
leverages the spatial diversity among those location-dispersed SUs, has shown significant
advantages in achieving reliable spectrum sensing results [13].
Cooperation in spectrum sensing can be achieved in two models: centralized or distributed.
The former uses a common receiver (i.e., a fusion center) to collect sensing results from all
SUs and to make the final spectrum sensing decision. It relies on a centralized infrastructure,
which may be unavailable in ad hoc CR networks. In contrast, a distributed approach allows
SUs to share individual sensing results with their neighbors, and to make their own sensing
decisions. Therefore, the distributed spectrum sensing model is more suitable for CR ad hoc
networks [14].
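The centralized model can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a scheme taken from the cited work: each SU is assumed to make a hard local decision by comparing a measured energy level with a threshold, and the fusion center is assumed to apply a k-out-of-N rule. The function names, the readings and the -56 dB threshold are hypothetical values chosen for the example.

```python
from typing import Sequence


def local_decision(energy_db: float, threshold_db: float) -> bool:
    """Hard decision at one SU: True means 'primary user present' (assumed energy detector)."""
    return energy_db > threshold_db


def fusion_center(reports: Sequence[bool], k: int) -> bool:
    """Assumed k-out-of-N rule: declare the PU present if at least k SUs report it."""
    return sum(reports) >= k


if __name__ == "__main__":
    readings_db = [-50.0, -60.0, -52.0, -70.0]   # hypothetical energy readings at four SUs
    threshold_db = -56.0                          # illustrative threshold only
    reports = [local_decision(e, threshold_db) for e in readings_db]
    print("local reports:", reports)
    print("majority (k=2):", fusion_center(reports, k=2))
    print("strict   (k=3):", fusion_center(reports, k=3))
```

The single fusion point is what the distributed model removes: there, each SU would apply a similar combining step to the reports of its own neighbors only.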
Despite its many benefits, cooperative spectrum sensing is vulnerable to many
potential attacks. Attackers may generate a false primary user signal to launch a primary
user emulation (PUE) attack [15] in order to gain an unfair share of the spectrum, or they
can manipulate SUs' sensing reports by various means in order to invert the detection results
(i.e., presence or absence of PUs), which is often termed a data falsification attack [16].
Current research on securing cooperative spectrum sensing has focused on addressing
these attacks under the centralized model [17, 18]. Similar security threats exist in the
distributed schemes but have been left underaddressed thus far. In fact, a distributed scheme is
even more vulnerable to such attacks due to its distributed and cooperative nature. As
an example, a bio-inspired consensus-based distributed spectrum sensing algorithm
has recently been proposed [19,20]. It is based merely on localized measurement and exchange
of sensing states, and is thus very efficient and scalable. However, due to the distributed
and cooperative nature of the protocol, the impact of an attacker's malicious behavior, if not
defended against properly, would propagate through the whole network [21], causing long-term
widespread impacts. More involved attacks that would undermine the distributed spectrum sensing
mechanisms without being detected are also possible. Thus, we focus on the protection of
distributed spectrum sensing in CR ad hoc networks in Chapter 2.
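How a single misbehaving node can affect every other node is easy to see in a small simulation. The sketch below assumes the standard average-consensus update, in which each node moves its state towards its neighbors' states with a step size below the inverse of the maximum node degree. Whether the algorithm of [19,20] uses exactly this rule is an assumption here, and the constant-value attacker is a deliberately naive stand-in rather than the covert attack studied in Chapter 2. The topology and energy values are hypothetical.

```python
from typing import Dict, List, Optional


def consensus_step(states: List[float], neighbors: Dict[int, List[int]], epsilon: float) -> List[float]:
    """One round: x_i <- x_i + epsilon * sum over neighbors j of (x_j - x_i)."""
    return [
        x_i + epsilon * sum(states[j] - x_i for j in neighbors[i])
        for i, x_i in enumerate(states)
    ]


def run_consensus(initial: List[float], neighbors: Dict[int, List[int]], epsilon: float,
                  iterations: int, attacker: Optional[int] = None,
                  injected_value: float = 0.0) -> List[float]:
    """Iterate the update; an optional attacker keeps announcing a fixed false state."""
    states = list(initial)
    for _ in range(iterations):
        if attacker is not None:
            states[attacker] = injected_value   # attacker ignores the protocol
        states = consensus_step(states, neighbors, epsilon)
    return states


if __name__ == "__main__":
    # Hypothetical 4-node line topology and local energy measurements (in dB).
    neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    initial = [-60.0, -58.0, -62.0, -56.0]
    epsilon = 0.3                                # below 1 / (maximum degree) = 0.5

    honest = run_consensus(initial, neighbors, epsilon, iterations=200)
    attacked = run_consensus(initial, neighbors, epsilon, iterations=500,
                             attacker=0, injected_value=-40.0)
    print("honest run  :", [round(x, 3) for x in honest])
    print("attacked run:", [round(x, 3) for x in attacked[1:]])
```

In the honest run all nodes agree on the average of the initial measurements. In the attacked run the honest nodes are pulled to the injected value even though only one of them is a direct neighbor of the attacker, which is the network-wide propagation the paragraph above describes.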
1.2.2 Network Security in CRNs
Malicious network activities, such as false data injection attacks [22], Denial of Service
(DoS) attacks [23–26], and spectrum misuse [27, 28], pose serious threats to CRNs and can
significantly degrade the performance of CR communications. As no security
mechanism can completely prevent all potential threats from entering CRNs, it is of utmost
importance to detect and then prevent these potential threats. However, the detection of
these activities in CRNs remains largely unexplored in the literature. Passive monitoring has
been used to measure Wi-Fi networks [29–31] using a dedicated set of hardware devices, called
sniffers. It has been shown to complement the wire side monitoring by gathering detailed
PHY/MAC information. Passive monitoring serves as the basis of numerous applications
ranging from network forensics and fault diagnosis to resource management. As the quality of
those applications mainly depends on that of traffic monitoring, it is important, yet non-trivial, to build
a traffic monitoring framework with excellent monitoring performance. Passive monitoring
is particularly important to CRNs, because: (1) cognitive radios are programmable and
difficult to manage; (2) the interference requirement in CRNs is mandatory and extremely
stringent. In this dissertation, we consider the construction of a passive monitoring framework
for Wi-Fi like CR networks, or “WhiteFi” networks for short.
However, passive monitoring becomes a challenging task in WhiteFi networks. First, WhiteFi
networks have a much wider spectrum (50MHz-698MHz) than traditional wireless networks,
which makes it infeasible to deploy one sniffer for each channel. As a result, the sniffers
have to decide which subsets of channels they will operate on, referred to as sniffer channel
assignment problem. Second, SUs have to vacate the channels immediately once PUs start
transmissions on the corresponding channels. Such inevitable channel switching behavior
potentially complicates the sniffers’ traffic monitoring strategies. Last but not least,
network traffic on each channel typically comes from multiple SUs, who share the spectrum by
following a certain medium access control (MAC) mechanism. Thus, traffic patterns observed
by the sniffers are highly dynamic, further complicating the sniffers’ monitoring strategies.
To meet these challenges, Chapter 3 elaborates the design of a monitoring framework for
CRNs, SpecMonitor, which utilizes a non-parametric density estimation method to model
SUs’ channel usage pattern.
1.2.3 Reliable CR Communications with Adversarial Software Radios
Jamming has been a major denial-of-service attack on wireless communications [23, 32]. By
intentionally emitting jamming signals, adversaries can disturb network communications,
resulting in throughput degradation, network partition, or a complete loss of
connectivity. A reactive jammer
continuously listens for activity on the channel, emits jamming signals whenever
it detects activity, and stays quiet while the sender is idle. Reactive jamming is
regarded as one of the most effective, stealthy, and energy-efficient jamming strategies [25,33].
Recent advances in highly programmable software defined radio (SDR) have made
such sophisticated and powerful jamming attacks very realistic: [34,35] demonstrated that a
reactive jammer is readily implementable and that the jamming results are devastating. Furthermore,
reactive jamming can be triggered rapidly on any field of the packet, making it a realistic
threat to wireless communications.
On the other hand, orthogonal frequency-division multiplexing (OFDM) has developed into
a popular scheme for broadband wireless communications. Modern wireless communica-
tion systems, such as WLAN, digital TV systems and cellular communication systems, all
adopt OFDM as one of the primary technologies. While OFDM systems are robust against
multipath fading and can cope with severe interference and noise, they are not ideal for
environments where adversaries try to intentionally jam communications. The increasingly
severe hostile environments with advanced jamming threats prompt the development of secu-
rity extensions to the OFDM systems. Multi-input multi-output (MIMO) has also emerged
as a key technology for wireless networks. New wireless devices are equipped with a growing
number of antennas. MIMO can be exploited to obtain diversity and spatial multiplexing
gains, leading to an increase in network capacity. More importantly, recent advances
in MIMO interference cancellation (IC) techniques [36–38] have greatly enhanced MIMO
communication capability under multiple concurrent transmissions. This raises the question of
whether it is possible to exploit IC techniques in MIMO to mitigate jamming attacks
targeting OFDM systems, in particular SDR-based reactive jamming attacks. In Chapter 4,
we try to answer this question by examining the jammer’s capability in disrupting OFDM-
MIMO communications, and devising MIMO-based defense mechanisms by utilizing MIMO
technology coupled with IC and transmit precoding techniques.
1.2.4 Botnet Threats to Cognitive Communications in Core Networks
Botnets have become a major threat to the health of modern core networks. Through large-scale
compromise of end hosts, botmasters can commit organized cyber-crimes, such as
launching distributed denial-of-service (DDoS) attacks that make services unavailable,
sending spam and performing click fraud that violate the authenticity and trustworthiness
of communications, or stealing sensitive information that violates the confidentiality of the
entities involved.
The C&C channel is one of the most essential components of a botnet, through which a
botmaster manages the bot army of compromised end hosts. One common type of C&C
infrastructure relies on a central C&C server, which has recently drawn a great deal of
attention from security researchers and law enforcement. From the attackers’ point of
view, such a centralized architecture suffers from a single point of failure: if
the C&C server is identified and taken down, the botmaster loses control over the whole
botnet. In response, sophisticated botnet developers have built more advanced and
resilient P2P C&C infrastructures [39]. P2P C&C allows the bots to exchange C&C
messages via their connected peers in a P2P manner. As a result, despite numerous takedown
attempts [40], P2P botnets have kept reviving. Some notable examples of active P2P botnets
include Sality [41], ZeroAccess [42], and Kelihos [43], which have survived in the wild for a
long time and will likely remain alive in the near future [44].
To date, a few solutions have been proposed to detect P2P botnets [45–48]. Host-level mal-
ware detection techniques such as traditional signature-based approaches and more recently
behavior-based approaches [49] have been designed. However, these approaches are subject
to advanced malware obfuscation or polymorphism and require host-side installation, and
thus appear unappealing to the network administrators aiming at uncovering a network-wide
botnet. Alternatively, network-level techniques have been proposed to correlate the traffic
patterns of suspicious bots [46,47,50–53] or collect network communication graphs to identify
P2P bots [45,54]. Some of them apply deep packet inspection (DPI), which is not only com-
putationally expensive, but is also evadable through message encryption. Other approaches
are based upon network flow traffic analysis. For instance, Yen et al. [46] described an
algorithm to differentiate P2P file sharing applications from P2P bots based on network traffic
features such as traffic volume, peer churn rate and interstitial time distribution. Recently,
Zhang et al. [47] developed a botnet detection system to extract statistical fingerprints for
every host, and identify the bots based on a set of traffic features such as communication
persistency, fingerprint similarity, and the number of shared contacts. However, the traffic features
used in these approaches are not robust enough to identify bots in a dynamic network, as
observed from our experiments. On the other hand, the communication graph-based ap-
proach [45] seems more reliable, but it can only identify structural P2P subgraphs regardless
of whether the subgraphs contain bots. Also, it requires a list of honeypot hosts to bootstrap
its detection algorithm, limiting its practicality.
In Chapter 5, we jointly consider two sets of features: flow traffic statistics and network
connection behaviors, to reveal the presence of P2P bots within a monitored network, such
as a campus network or an ISP network. We introduce PeerClean to utilize the best of
these feature sets via a novel combination of clustering and classification. The bot detection
performance of PeerClean relies on a dynamic group behavior analysis (DGBA) method
which investigates the group-level connection behaviors of the infected machines.
1.3 Research Contributions
This dissertation research uncovers emerging security issues in the main functions of CRNs,
including the potential security vulnerabilities in the spectrum sensing and spectrum oppor-
tunity exploitation [55]. We provide a variety of defense mechanisms to enhance the security
of CRNs. Furthermore, we design jamming resilient communication mechanisms to enable
reliable CR communications in the presence of SDR-based reactive jammers. Finally, as
core networks are increasingly sabotaged by sophisticated malware and botnets, we
propose to enhance the self-awareness of cognitive core networks by unveiling P2P botnets
through intelligent traffic analysis empowered by dynamic feature analysis and machine
learning. This research makes the following contributions.
1. Distributed Outlier Detection based Spectrum Sensing: we analyzed the vul-
nerabilities of distributed consensus-based spectrum sensing by proposing several novel
attacks. They include naive ones that aim at causing disruption to sensing operations,
and more sophisticated covert adaptive data injection attack, which is capable of ad-
justing attack strategies via learning through perceived environments and stealthily
manipulating the sensing results without being caught by traditional detection schemes.
The latter is the first adaptive attack with learning capability in the area of secure spec-
trum sensing. We also discussed advanced collusion attacks that are hard to defend
against. We presented several protection mechanisms corresponding to the various at-
tacks we have identified. In particular, we proposed a novel robust distributed outlier
detection algorithm with adaptive local threshold to defend against covert adaptive
data injection attack, and a hash-based computation verification scheme to defend
against colluding attackers. Through extensive simulation and analysis, we showed
the severe impact of covert adaptive attacks on distributed spectrum sensing. We
also presented the effectiveness of our detection mechanisms under various detection
parameters, network topologies and sensing data variances. Moreover, the costs of
proposed protection mechanisms are shown to be low. This part of the dissertation is
published in part in [56].
2. Non-Parametric Prediction based Traffic Monitoring: we designed a gener-
al framework to monitor the WhiteFi networks, which jointly considers the channel
availability and secondary user access pattern. In particular, we designed an online
non-parametric density estimation mechanism to model the secondary user channel ac-
tivity, which is able to support dynamic and complex access patterns. We formulated
the sniffer channel assignment problems as integer programming problems by incor-
porating the channel switching costs with the QoM objective, for which we provided
algorithms to maximize two different levels of QoMs respectively. We conducted ex-
tensive simulations and experiments to validate our statistical model and monitoring
framework. This part of the dissertation is published in part in [57].
3. Interference Cancellation based Jamming Resilient Communication: we ex-
ploited the MIMO IC and transmit precoding techniques to counteract reactive jam-
ming attacks for securing OFDM wireless communications. We proposed two novel
mechanisms: iterative channel tracking and sender signal enhancement (SSE) to effec-
tively sustain acceptable throughput under reactive jamming attack. We implemented
the jamming attack and defense mechanisms using USRP radios, and conducted ex-
tensive experiments to evaluate the performance in terms of packet delivery rate and
throughput. The experimental results showed that in the presence of various types of
reactive jammers with different power levels, the packet delivery rate improves signif-
icantly using our defense mechanisms with IC and SSE. This part of the dissertation
is published in part in [58].
4. Group Behavior Analysis based Botnet Detection: we proposed a novel botnet
detection framework, PeerClean, using the high-level features extracted from network
flow data based on the flow-level traffic statistics and dynamic network connection
patterns. It explores the best of these different features with a novel combination
of unsupervised (clustering) and supervised (classification) machine learning methods.
We designed a dynamic group behavior analysis method to automatically extract the
collective connection features from P2P host clusters. We showed through experiments
that the extracted group features are robust, reliable and effective in identifying various
types of P2P botnets. Moreover, a prototype system is designed to evaluate the system
using network traces from various real-world botnet families (including Sality, Kelihos
and ZeroAccess), as well as background traces from a large campus network. We
demonstrated through experiments that PeerClean can identify different types of bots
with up to 95.8% accuracy and a negligible false positive rate.
1.4 Organization
This dissertation is organized as follows. Chapter 2 presents the vulnerability analysis and
protection mechanisms for securing distributed consensus-based spectrum sensing in CRNs.
In Chapter 3, we describe the design and performance evaluation of a passive monitoring
framework for CRNs, with the objective of maximizing quality of monitoring. In Chapter 4,
we investigate the impact of reactive jammers on wireless OFDM communications and explore
the use of MIMO technology for enabling jamming-resilient communications. Chapter 5
presents a system design based on machine learning methods to detect and track down P2P
botnets, which have recently posed serious threats to core networks but are extremely hard
to track down. Finally, Chapter 6 summarizes the research achievements, concludes this
dissertation and points out several future research directions.
Chapter 2
Secure Distributed Consensus-based Spectrum Sensing in Cognitive Radio Networks
Cooperative spectrum sensing is key to the success of CRNs. Recently, fully distributed co-
operative spectrum sensing has been proposed for its high performance benefits particularly
in cognitive radio ad hoc networks. However, the cooperative and fully distributed nature
of such a protocol makes it highly vulnerable to malicious attacks and makes defense very
difficult. In this chapter, we analyze the vulnerabilities of the distributed sensing architecture
based on a representative distributed consensus-based spectrum sensing algorithm. We find
that such a distributed algorithm is particularly vulnerable to a novel form of attack called the
covert adaptive data injection attack. The vulnerabilities are even magnified under multiple
colluding attackers. We further propose effective protection mechanisms, which include a ro-
bust distributed outlier detection scheme with adaptive local threshold to thwart the covert
adaptive data injection attack, and a hash-based computation verification approach to cope
with collusion attacks. Through simulation and analysis, we demonstrate the destructive
power of the attacks, and validate the efficacy and efficiency of our proposed protection
mechanisms.
2.1 Related Work
Existing work on secure cooperative spectrum sensing has mainly focused on the centralized
secondary network model. The vulnerabilities in the centralized model lead to two types of
attacks. The spectrum sensing data falsification attack is considered in [15–17,59–70]. Li et
al. [59] proposed a dependent attack to deviate the outcome of the OR rule at the fusion center, in which the
attackers know the reports from other secondary users. In [60], Fatemieh et al. presented
an omniscient attack to stealthily manipulate the average power, in which coordinated attackers
have the measurements of the whole network. However, these attackers are incapable of
adapting their attack strategies for each sensing period. In contrast, our proposed attacks
differ in the following ways: 1) our attacks employ adaptive attack strategies with learning
capability, by which the attackers can adjust their strategies according to their perceived
local environments and the sensing algorithm; 2) we consider the distributed model, with only local
information available; 3) our attacks target the consensus-based spectrum sensing algorithm
[19] to disrupt the consensus operation or covertly deviate the sensing result. Another
existing attack, termed the primary user emulation attack, is proposed in [15, 59, 71–74], in
which an attacker mimics a primary user’s signal to evict secondary users; it is
complementary to our attacks.
For defense mechanisms, outlier detection is widely used, based on either statistical
methods [16] or signal propagation-based methods [17, 60]. Our outlier detection mechanism
relies on a signal propagation model to classify the original measurements from different
SUs. However, our approach further introduces an adaptive detection algorithm with
varying parameters, which differs from all existing work based on fixed defense strategies.
Furthermore, as opposed to current detection mechanisms, ours ensures the correctness of
consensus operations by integrating hash-based computation verification at each node, which
can verify the integrity of the data involved in its neighbors’ computations. The verification
process is different from existing hash-based scheme [75] that only attests the data from the
node itself.
2.2 System Model
In this section, we describe the network model considered in this work, and we briefly review
the distributed consensus-based spectrum sensing algorithm.
2.2.1 Network Model
We consider a CR network where PUs and SUs coexist. PUs are located far away from
the secondary network, but usually have high transmission power, so we assume the whole
secondary network is within the PUs’ transmission range. Different PUs operating in
the same spectrum are separated far enough apart to reduce interference. As a result, in
one secondary network, the entire spectrum can be regarded as multiple disjoint primary
channels, and each SU needs to sense all of them. Without loss of generality, we consider the
scenario in which all the SUs in a secondary network are detecting the presence of an incumbent in
one primary channel.
SUs form a CRAHN. We assume the collective coverage range of the SUs that form the
CRAHN is small compared with the coverage range of a PU, while the distance between two
SUs is long enough for spatial diversity exploitation. We assume the primary and secondary
network topologies remain unchanged during one sensing period, which starts from each SU
measuring its local received PU power level and ends upon achieving a unanimous decision
by all SUs. We also assume the communication links between SUs employ some reliable
communication protocol so the communications are error-free.
We adopt the energy detection spectrum sensing method. The sensing output of each SU $n_i$ is
the received power of the PU, $P_i$, which can be expressed by following the signal propagation
model [76] as follows:
$$P_i = P_0 - \left(10\alpha\log_{10}(d_i/d_0) + S_i + M_i\right)\ \text{(dB)}, \tag{2.1}$$
where $P_0$ is the transmit power of the PU, $\alpha$ is the path-loss exponent, $d_0$ is the reference
distance (in this work, $d_0 = 1$ m), $d_i$ denotes the distance from SU $n_i$ to the measured PU,
$S_i$ represents the power loss due to shadow fading, and $M_i$ represents the multi-path fading
effect. We adopt the widely used log-normal shadowing model [76], by which $S_i$ is modeled as
a Gaussian random variable with $S_i \sim N(0, \sigma^2)$. We consider $M_i$ negligible. Therefore, $P_i$
can be modeled as a Gaussian random variable $P_i \sim N(\mu_i, \sigma^2)$, in which $\mu_i = P_0 - 10\alpha\log_{10}(d_i)$.
For simplicity, we assume $\sigma$ is independent of the distance to the PU, so that each SU experiences i.i.d.
Gaussian shadowing and fading.
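The measurement model of Eq. (2.1) can be illustrated with a minimal sketch. The function names, the numeric values (transmit power, distance, path-loss exponent, shadowing deviation) and the use of Python's standard library are assumptions made for illustration only; they are not taken from our simulation setup.

```python
import math
import random

def mean_received_power_db(p0_db, distance_m, alpha, d0_m=1.0):
    """mu_i = P0 - 10*alpha*log10(d_i/d0): the mean of Eq. (2.1) with M_i neglected."""
    return p0_db - 10.0 * alpha * math.log10(distance_m / d0_m)

def sample_received_power_db(p0_db, distance_m, alpha, sigma_db, rng, d0_m=1.0):
    """Draw one P_i by subtracting shadowing S_i ~ N(0, sigma^2) from the mean."""
    shadowing_db = rng.gauss(0.0, sigma_db)
    return mean_received_power_db(p0_db, distance_m, alpha, d0_m) - shadowing_db

# Hypothetical values: one SU located 1 km from the PU, five sensing periods.
rng = random.Random(0)
samples = [sample_received_power_db(50.0, 1000.0, 3.0, 4.0, rng) for _ in range(5)]
```

With these illustrative numbers the mean received power is 50 − 10·3·log10(1000) = −40 dB, and each sample scatters around it with standard deviation σ.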
2.2.2 Distributed Consensus-based Spectrum Sensing
In a nutshell, a distributed consensus-based spectrum sensing algorithm works as follows.
It starts a sensing period by each SU taking the local measurement of the received PU’s
signal using an energy detection mechanism. The SUs then exchange their local sensing
measurements with their direct neighbors. Each SU, upon receiving the updates from all
its neighbors, updates its sensing state following a state update algorithm; if the differences
among these new sensing states are above a certain threshold, an update is necessary and the
SU will send a state update message to all its neighbors. This process continues iteratively
till no more update is triggered and the sensing state at every node in the network has
reached the consensus. In what follows, we briefly review a distributed consensus-based
spectrum sensing algorithm focusing on the state update algorithm and distributed state
update protocol.
The secondary network can be modeled as an undirected graph, denoted by a pair $(\mathcal{N}, \mathcal{E})$,
where $\mathcal{N} = \{n_1, n_2, \ldots, n_m\}$ denotes the set of secondary nodes and $\mathcal{E} \subseteq \mathcal{N}^2$ denotes the set of
undirected edges. We will use secondary nodes and SUs interchangeably in the following
sections. The performance of the consensus algorithm depends on the connectivity of the
secondary network, which can be represented by an adjacency matrix of the network graph.
The consensus-based spectrum sensing algorithm can be expressed using a discrete-time state
equation:
$$x_i(k+1) = x_i(k) + \epsilon \sum_{j \in \mathcal{N}_i} \left(x_j(k) - x_i(k)\right), \tag{2.2}$$
where the initial state $x_i(0)$ is the original sensing measurement of node $n_i$, $x_i(k)$ is the
updated state at time step $k$, $\mathcal{N}_i = \{j \mid (j, i) \in \mathcal{E}\} \subseteq \mathcal{N}$ denotes the neighbor set of node
$n_i$, and $\epsilon$ is a consensus parameter. State update occurs at discrete time $k = 0, 1, 2, \ldots$ for
each node locally. With some constraints on network connectivity and the parameter $\epsilon$ [77], the
final average consensus result $\alpha = \left[\sum_{i=1}^{m} x_i(0)\right]/m$ is asymptotically reached for all nodes.
The final sensing decision at each node is made by comparing the consensus result with a
primary detection threshold $\gamma$ as follows:
$$\alpha = \left[\sum_{i=1}^{m} x_i(0)\right]/m \;\underset{H_0}{\overset{H_1}{\gtrless}}\; \gamma, \tag{2.3}$$
where $H_1$ and $H_0$ denote the hypotheses corresponding to the presence and the absence of the
PU, respectively. The primary detection threshold $\gamma$ is determined by performance requirements. For
instance, if we want to keep the primary miss detection rate $P_{MD} = P(\alpha < \gamma \mid H_1)$ below a
threshold, while minimizing the primary false alarm rate $P_{FA} = P(\alpha > \gamma \mid H_0)$, the threshold
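$\gamma$ is given as follows [78]:
$$\gamma = \frac{\sigma}{\sqrt{m}}\, Q^{-1}(1 - P_{MD}) + \mu_\alpha, \tag{2.4}$$
where $\alpha \sim N(\mu_\alpha, \sigma^2/m)$, in which $\mu_\alpha = \left(\sum_{i=1}^{m} \mu_i\right)/m$, and $Q^{-1}(\cdot)$ is the inverse of the well-known
Q-function.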
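As a numerical illustration of Eq. (2.4), the following sketch computes the primary detection threshold and then checks that the resulting miss detection rate matches the target. It assumes Q(x) = 1 − Φ(x), with Φ the standard normal CDF, and uses `statistics.NormalDist` from the Python standard library; the function names and numeric values are hypothetical.

```python
import math
from statistics import NormalDist

def q_inverse(p):
    """Inverse of the Q-function Q(x) = 1 - Phi(x), for 0 < p < 1."""
    return NormalDist().inv_cdf(1.0 - p)

def primary_detection_threshold(sigma, m, p_md, mu_alpha):
    """Eq. (2.4): gamma = (sigma / sqrt(m)) * Q^{-1}(1 - P_MD) + mu_alpha."""
    return sigma / math.sqrt(m) * q_inverse(1.0 - p_md) + mu_alpha

def miss_detection_rate(gamma, sigma, m, mu_alpha):
    """P(alpha < gamma | H1) for alpha ~ N(mu_alpha, sigma^2 / m)."""
    return NormalDist(mu_alpha, sigma / math.sqrt(m)).cdf(gamma)

# Hypothetical setting: 16 SUs, sigma = 4 dB, mean consensus value -40 dB.
gamma = primary_detection_threshold(sigma=4.0, m=16, p_md=0.01, mu_alpha=-40.0)
achieved_p_md = miss_detection_rate(gamma, sigma=4.0, m=16, mu_alpha=-40.0)
```

For a target $P_{MD}$ below 0.5 the threshold falls below $\mu_\alpha$, and it moves closer to $\mu_\alpha$ as the number of cooperating SUs $m$ grows.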
Compared to the centralized cooperative spectrum sensing schemes, distributed consensus
protocols are fully distributed, scalable, and have an exponential convergence rate. They are well suited to
the CRAHN.
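The state update of Eq. (2.2) can be sketched in a few lines. This is an illustrative simulation under stated assumptions, not our implementation: the four-node ring topology, the initial measurements, and the value ε = 0.3 are hypothetical, and the global stopping test (spread of all states below a tolerance) is a simulation convenience that a real distributed node could not evaluate directly. Choosing ε below the reciprocal of the maximum node degree is a commonly used sufficient condition for convergence on a connected undirected graph.

```python
def consensus_step(states, neighbors, eps):
    """One synchronous application of Eq. (2.2) at every node."""
    return [
        x_i + eps * sum(states[j] - x_i for j in neighbors[i])
        for i, x_i in enumerate(states)
    ]

def run_consensus(initial_states, neighbors, eps, tol=1e-9, max_iters=10_000):
    """Iterate Eq. (2.2) until all states agree to within tol (global check)."""
    states = list(initial_states)
    for k in range(max_iters):
        if max(states) - min(states) < tol:
            return states, k
        states = consensus_step(states, neighbors, eps)
    return states, max_iters

# Hypothetical 4-node ring; every node has degree 2, so eps = 0.3 < 1/2.
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
x0 = [-38.0, -41.0, -40.5, -42.5]          # local measurements x_i(0) in dB
final_states, iterations = run_consensus(x0, neighbors, eps=0.3)
```

Because the update preserves the sum of the states on an undirected graph, all nodes approach the average of the initial measurements, here −40.5 dB.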
2.3 Vulnerability Analysis of Distributed Consensus-
based Spectrum Sensing
Although the distributed nature of consensus-based protocols gives them
significant performance benefits, it also exposes them to a number of security threats.
In this section, we identify and evaluate the vulnerabilities of consensus-based protocols
under various potential attacks. We consider both passive and active attackers. We also
consider both insider and outsider attackers. An outsider attacker is one who may intercept
other nodes’ states, inject false states, perform replay attacks, impersonate honest nodes
using their captured identities, etc., but who does not possess valid security keying material.
An insider attacker is a compromised SU that has knowledge of all the keying material
stored in the SU node, if any, and is capable of manipulating its sensing measurements/states and
then disseminating the spoofed information. Note that another type of attacker is a
faulty node, which may output measurements/states with a large deviation due to hardware
or software failure. We do not differentiate intentional attackers from faulty nodes, as the
consequences are the same.
To facilitate our analysis, we classify the attacks into two categories based on the intended
objectives and consequences: disruption of sensing operation and stealthy manipulation of
sensing results. Attackers in the former category have limited information and capabilities,
and typically launch arbitrary attacks with the objective to disrupt the sensing operation.
In contrast, attackers in the latter category are more capable. To avoid being detected, they
can adapt their attack strategies to the perceived environment, and can collude with each
other.
Note that we do not consider the Sybil attack, in which a node fabricates multiple identities, or
the radio-jamming attack. Both are considered outside the scope of this work.
2.3.1 Disruption of Sensing Operation
We identify two types of attacks that can lead to disruption of sensing operation. The attacks
in this category may come from insider attackers, outsider attackers or faulty nodes. We
also analyze their harmful impacts on consensus-based spectrum sensing algorithm.
Blocking attack
Blocking attack refers to an unexpected cessation of information transmission from an SU. This is
the weakest attack in the sense that the only induced damage is the isolation of the SU
and a possible partition of the network graph. Intuitively, if the network graph is divided into
several subgraphs, consensus can only be reached within each isolated subgraph. The impact
is stated in the following theorem:
Theorem 1. Let $A \in M_{n \times n}$ be the adjacency matrix of a secondary network. After the attackers block
several secondary users, if the adjacency matrix of the remaining network,
$\tilde{A} \in M_{n \times n}$, satisfies
$$(I + \tilde{A})^{n-1} > 0,$$
the attackers achieve no more benefit than defeating several secondary users. Otherwise, the
whole secondary network is partitioned so that a global decision can never be attained.
Proof. The well-known Perron-Frobenius theory [79] states that if a non-negative matrix
$A \in M_{n \times n}$ is irreducible, the graph based on the adjacency matrix $A$ has full connectivity.
Furthermore, based on the theorem about irreducible matrices [79], we learn that if
$(I + A)^{n-1} > 0$, the matrix $A$ is irreducible. Thus, if the condition in Theorem 1 holds, the remaining graph is still
connected after eliminating the abnormal SUs, in which case the attackers gain no advantage
except stalling these SUs. Otherwise, the remaining graph is disconnected.
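The connectivity condition of Theorem 1 can be checked mechanically. The sketch below is illustrative and rests on two assumptions: blocked SUs are removed from the graph, so the test is applied to the adjacency submatrix of the remaining nodes, and only the positivity pattern of the matrix power is tracked, using Boolean arithmetic. The function names and the four-node path topology are hypothetical.

```python
def bool_mat_mul(a, b):
    """Boolean matrix product: entry (i, j) is True if some k links i to j."""
    n = len(a)
    return [[any(a[i][k] and b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def connected_after_blocking(adj, blocked):
    """Test (I + A)^(n-1) > 0 on the graph that remains after blocking."""
    keep = [i for i in range(len(adj)) if i not in blocked]
    n = len(keep)
    # Positivity pattern of I + A restricted to the remaining nodes.
    base = [[r == c or bool(adj[keep[r]][keep[c]]) for c in range(n)]
            for r in range(n)]
    power = [[r == c for c in range(n)] for r in range(n)]   # (I + A)^0
    for _ in range(n - 1):
        power = bool_mat_mul(power, base)
    return all(all(row) for row in power)

# Hypothetical path topology 0 - 1 - 2 - 3.
adj = [[0, 1, 0, 0],
       [1, 0, 1, 0],
       [0, 1, 0, 1],
       [0, 0, 1, 0]]
end_node_blocked = connected_after_blocking(adj, blocked={3})
middle_node_blocked = connected_after_blocking(adj, blocked={1})
```

Blocking the end node 3 leaves a connected path, so the attack only silences that node, whereas blocking node 1 isolates node 0 and partitions the network.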
Arbitrary False Data Injection Attack
Here, the attacker injects forged data during each consensus iteration. Two forms of such
attack, differing in their ways of data injection, are presented as follows.
• Constant data injection: the attacker ignores the state update algorithm and keeps
transmitting a constant value in each state update message. The following theorems
describe the impact of the single-attacker and multiple-attacker scenarios,
respectively:
Theorem 2. In a connected undirected graph with one arbitrary false data injection attacker
sending constant data, for any set of initial states, the network asymptotically reaches a consensus
equal to the constant value injected by the attacker.
Proof. Assume node $n_t$ is the attacker. We can write the consensus algorithm at this malicious
node as follows:
$$x_t(k+1) = x_t(k). \tag{2.5}$$
Thus, the graph around the attacker becomes a directed graph in which the attacker has zero in-degree and
multiple out-degree. With multiple out-degree, the attacker injects the constant into its
neighbors. Therefore, the manipulated graph has a spanning tree¹ rooted
at the attacker. Notice that the attacker converts the Perron matrix $P$ into $\tilde{P}$, with the
$t$-th row becoming an all-zero vector except for the only ‘1’ at the $t$-th entry. However, $\tilde{P}$ is
still a stochastic matrix with positive diagonal entries. As proven in [80], if
the graph associated with the matrix has a spanning tree and the matrix has positive diagonal entries, the discrete-time consensus
algorithm with Eq. (2.2) and Eq. (2.5) achieves consensus asymptotically. Unfortunately,
the whole network will converge to a static value, which is the constant value injected by
the attacker.
Theorem 3. In a connected undirected graph with more than one arbitrary false data injection
attacker sending constant data at different values, a consensus cannot be reached for
the whole network.
Proof. The graph with more than one attacker has no spanning tree, because more
than one node with zero in-degree is disseminating different information. As proven
in [80], if the directed graph has no spanning tree, the discrete-time consensus algorithm will
not converge.
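Theorems 2 and 3 can be illustrated with a small simulation in which honest nodes follow Eq. (2.2) and attackers follow Eq. (2.5). The ring topology, the initial measurements, the injected constants and the fixed iteration count are hypothetical values chosen for illustration; this is a sketch, not our evaluation code.

```python
def attacked_step(states, neighbors, eps, attackers):
    """Eq. (2.2) at honest nodes; Eq. (2.5), a frozen state, at attackers."""
    return [
        x_i if i in attackers
        else x_i + eps * sum(states[j] - x_i for j in neighbors[i])
        for i, x_i in enumerate(states)
    ]

def simulate(initial_states, neighbors, eps, attackers, iters=2000):
    """attackers maps a node index to the constant value that node injects."""
    states = list(initial_states)
    for node, value in attackers.items():
        states[node] = value
    for _ in range(iters):
        states = attacked_step(states, neighbors, eps, attackers)
    return states

neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}   # 4-node ring
x0 = [-38.0, -41.0, -40.5, -42.5]

one_attacker = simulate(x0, neighbors, 0.3, {0: -60.0})
two_attackers = simulate(x0, neighbors, 0.3, {0: -60.0, 2: -20.0})
```

With a single attacker, every honest node is dragged to the injected value of −60 dB, as Theorem 2 states. With two attackers injecting −60 dB and −20 dB, the honest nodes in this example settle between the two constants, so the network never agrees on a common value, in line with Theorem 3.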
Intuitively, the consensus algorithm with a single attacker will take a much longer time to
converge than in the normal case, because the information flow is sourced from a single node,
while in the normal case information is more evenly distributed across the whole network
¹ A spanning tree of a directed graph is a directed tree formed by graph edges that connect all the nodes of the graph.
and all the nodes cooperatively propagate the information flow. Therefore, such an attack also
delays the time to reach consensus.
• Random data injection: the attacker injects random values into its neighborhood at
each iteration. The impact of a random data injection attack is hard to analyze, but we
can conclude that sustained random value injection will disrupt the consensus algorithm,
causing the network to diverge in most cases or to converge to an arbitrary value.
2.3.2 Stealthy Manipulation of Sensing Results
The goal of stealthy manipulation of sensing results is to reverse the consensus result of a
PU’s status, either from absence to presence, i.e. exploitation objective, or from presence
to absence, i.e. vandalism objective [60]. With the exploitation objective, attackers can
evacuate other SUs to obtain exclusive usage of the available spectrum, while with vandalism
objective, attackers will cause severe interference to the primary network. We propose two
novel attacks: (1) covert adaptive data injection attack that achieves stealthy manipulation
of sensing results with independent attacker(s), and (2) covert adaptive data injection attack
with node collusion.
Covert Adaptive Data Injection Attack
This attack has two major features: “adaptive” means the attacker can adapt its strategy
based on neighbors’ state update information, with prior knowledge about the detection
algorithm; “covert” reflects the attacker’s goal of covertly manipulating the sensing results,
without being detected by the detection mechanism. Outsider attackers can be effectively
expelled from the network with an authentication mechanism. In this work, we focus on
insider attackers that reside in legitimate nodes.
Outlier detection algorithms are commonly used to defend against insider attackers
performing data injection attacks. Generally, current outlier detection algorithms all rely on a
static attack detection threshold² $\lambda$ to distinguish honest SUs from attackers. Suppose we
directly apply these detection algorithms to the distributed system, i.e., at each iteration, a
detection algorithm automatically expels the abnormal SUs. According to Kerckhoffs’s
principle, the attacker should be assumed to know the detection algorithm and the detection threshold $\lambda$; in that case, a
covert adaptive data injection attacker, defined below, is capable of bypassing the traditional
detection algorithms.
Attack strategy. (1) At each iteration, the attacker first collects all its neighbors’ states
normally.
(2) The attacker then computes a maximal acceptable deviated state based on all its neigh-
bors’ submitted states and the threshold in the detection algorithm. The difference between
the deviated state and genuine state indicates the attack strength a(k), where k ∈ [0, kstop]
is the attack time.
(3) Finally, it injects the forged state into its neighborhood.
To illustrate the attacker strategy of computing a maximal acceptable deviated state, we give
an example of a simple detection scheme with a threshold λd, by which one node na is flagged
as attacker whenever any of its neighbors detect its abnormality. Assume the attacker has
vandalism objective with its neighbors’ states as $\{s^t_1, \ldots, s^t_{|N_a|}\}$, then the maximal acceptable
deviated state can be $\max_{i=1,\ldots,|N_a|}(s^t_i) - \lambda_d$ to avoid being detected.
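The computation in step (2) can be sketched as follows. This is a minimal illustration, assuming a neighbor flags the attacker when the absolute difference between their states exceeds λd; the function names are hypothetical, and the exploitation branch is a symmetric assumption that the text does not spell out.

```python
def max_acceptable_deviated_state(neighbor_states, lambda_d, objective="vandalism"):
    """Most extreme state the attacker can report without exceeding lambda_d.

    Assumption: neighbor i flags the attacker when |s_a - s_i| > lambda_d.
    """
    if objective == "vandalism":
        # Push the state down, staying within lambda_d of the largest neighbor state.
        return max(neighbor_states) - lambda_d
    if objective == "exploitation":
        # Symmetric assumption: push the state up from the smallest neighbor state.
        return min(neighbor_states) + lambda_d
    raise ValueError("objective must be 'vandalism' or 'exploitation'")


def is_flagged(state, neighbor_states, lambda_d):
    """True if any neighbor would detect the reported state as abnormal."""
    return any(abs(state - s) > lambda_d for s in neighbor_states)


# Hypothetical neighbor states in dB and a hypothetical threshold.
neighbors = [-48.0, -46.5, -47.2]
forged = max_acceptable_deviated_state(neighbors, lambda_d=3.0)
```

With these illustrative values the forged state sits exactly on the detection boundary, so reporting anything lower would be flagged by the neighbor holding the largest state.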
In practice, if the attacker is unable to collect all its neighbors’ states before sending its own
states, at current iteration, it replaces its previous state with a maximal acceptable deviated
2 This threshold is for the purpose of detecting the presence of attacks, which is different from primary detection threshold γ mentioned before. In the later section, unless otherwise noted, the detection threshold denotes attack detection threshold.
state calculated by a collection of neighbors’ previous states. And then, it updates its current
state with the spoofed previous state and sends it to the neighbors. In this case, the attack
strength enforced at the current state will directly affect the next state of the network.
For an attacker, the knowledge of attack stop time kstop is crucial for launching a successful
attack. If kstop → ∞, the consensus protocol will not converge. The following inequalities
show the basic principle in terms of the amount of changes an attacker has to inject in order
to fulfill exploitation objective and vandalism objective respectively:
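$$x + \frac{\sum_{i=0}^{k_{stop}} a(i)}{m} > \gamma, \quad a(i) \ge 0, \quad \text{exploitation objective}, \qquad (2.6)$$
$$x + \frac{\sum_{i=0}^{k_{stop}} a(i)}{m} < \gamma, \quad a(i) \le 0, \quad \text{vandalism objective}, \qquad (2.7)$$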
where x is the average value of the original measurements of the whole network, which is also
the legitimate consensus result, and m is the number of nodes in the network. Because the
consensus algorithm has invariant average quantity [77], the impact of each attack strength
a(i) can be quantified as augmenting the final consensus result by a(i)/m. However, the
attackers have no way of knowing x without the global knowledge. Therefore, we propose
an iterative stop strategy. Let xmin(k) and xmax(k) be the minimal and maximal state
from neighbors of the attacker at k-th iteration, the attacker injects forged states only when
xmin(k) < γ or xmax(k) > γ for exploitation or vandalism objectives respectively. Otherwise,
the attacker follows the consensus protocol. The proposed stop strategy guarantees the
attacker’s neighborhood to achieve the objective first, whose deviated states will then spread
through the whole network for reversing the consensus result.
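The iterative stop strategy and the per-injection effect quantified by Eqs. (2.6) and (2.7) can be sketched as below. This is an illustrative sketch only; the function names and the numeric values in the usage lines are hypothetical.

```python
def should_inject(neighbor_states, gamma, objective):
    """Iterative stop rule: inject only while the neighborhood has not yet
    crossed the primary detection threshold gamma in the desired direction."""
    x_min, x_max = min(neighbor_states), max(neighbor_states)
    if objective == "exploitation":
        return x_min < gamma   # some neighbor still reports "PU absent"
    if objective == "vandalism":
        return x_max > gamma   # some neighbor still reports "PU present"
    raise ValueError("objective must be 'exploitation' or 'vandalism'")


def consensus_shift(attack_strengths, m):
    """Shift of the final consensus value: each a(i) contributes a(i)/m."""
    return sum(attack_strengths) / m


# Hypothetical values: 50 nodes, 100 injections of -3 dB each.
shift = consensus_shift([-3.0] * 100, m=50)
keep_attacking = should_inject([-57.0, -55.0], gamma=-56.0, objective="vandalism")
```

The rule uses only the attacker's one-hop view, which is why it needs no knowledge of the network-wide average x.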
Multiple attackers with the covert adaptive attack strategy can jointly set their attack
strengths, so that the protocol converges faster to their desired consensus objectives. Covert
adaptive data injection attack is effective in evading traditional outlier detection. In section
2.4.1, we will present a novel detection mechanism to invalidate such attack.
Covert Adaptive Data Injection Attack with Node Collusion
The covert adaptive data injection attack becomes even more powerful and harder to defend
against when nodes start to collude. Not only can such attack obtain a faster convergence
rate to their desired objectives, but it can also evade the computation verification scheme
proposed in section V-B. Both the insider attacker and outsider attacker can perform such
collusion attack.
When we involve the protection mechanism with computation verification to check the legit-
imacy of consensus operation, collusion attackers will avoid being caught by sending forged
verification to cover up each other’s false data. Moreover, stronger collusion attackers are
capable of employing more malicious neighbors for false validations. In section 2.4.2, we
will present a hash-based computation verification mechanism to defend against collusion
attacks.
2.4 Protection of Distributed Consensus-based Spectrum Sensing
The vulnerabilities of the distributed consensus-based spectrum sensing algorithm demand
effective protection mechanisms. In this section, we first present a robust distributed outlier
detection mechanism with adaptive local threshold to counter covert adaptive data injection
attack. Then, we put forward a hash-based computation verification mechanism that ensures
the correctness of a neighbor’s state update process to thwart collusion attacks by common
neighbor cross-validation.
2.4.1 Robust Distributed Outlier Detection with Adaptive Local Threshold
The goal of this protection mechanism is to detect and eliminate the abnormal states injected
by attackers. As described in section 2.3.2, the traditional outlier detection mechanisms
used in the existing spectrum sensing rely on a fixed and known global detection threshold,
which enables an attacker to have strengthened attacking capabilities throughout the whole
consensus process.
According to the consensus algorithm, the maximum state of the network is monotonically
decreasing, while the minimum state of the network is monotonically increasing until reaching
consensus [81]. The updated states of each node are bounded by the maximum and minimum
states, which gradually converge on the same value, while the differences among updated
states of various SUs are bounded by the differences between the maximum and minimum
states, which gradually diminish until reaching zero.3 So the main idea of our detection
algorithm is to use localized detection threshold at each node, and adapt the threshold
with the diminishing behavior of state differences. The benefits of adaptive local threshold
are twofold: (1) it becomes more difficult for attackers to guess all the instant detection
thresholds of its neighbors without two-hop information; (2) the detection thresholds drop
with the shrinking of variances among different states. Especially at the final consensus
state, the detection thresholds reach zero which gives zero-tolerance to the attackers.4
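The convergence properties used above can be checked numerically with a small sketch. It assumes the standard linear consensus update x_i(k+1) = x_i(k) + ε Σ_j (x_j(k) − x_i(k)) over the neighbors j of node i, with ε = 0.05; the 6-node ring topology and the initial measurements are hypothetical.

```python
def consensus_step(states, neighbors, eps=0.05):
    """One iteration of the assumed update x_i += eps * sum_j (x_j - x_i)."""
    return [x + eps * sum(states[j] - x for j in neighbors[i])
            for i, x in enumerate(states)]


def run_consensus(states, neighbors, eps=0.05, steps=200):
    """Return the list of state vectors over all iterations."""
    history = [list(states)]
    for _ in range(steps):
        states = consensus_step(states, neighbors, eps)
        history.append(states)
    return history


# Hypothetical 6-node ring and initial measurements in dB.
n = 6
ring = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
initial = [-41.0, -47.5, -44.2, -49.0, -43.1, -46.3]
history = run_consensus(initial, ring)

# Spread between the largest and smallest state at each iteration.
spreads = [max(s) - min(s) for s in history]
```

In this run the maximum never increases, the minimum never decreases, and the average is preserved, so the spread that bounds all pairwise differences shrinks toward zero. That shrinking spread is the quantity an adaptive threshold can track.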
3 Although the differences between the updated states of any two SUs are not monotonically decreasing, the diminishing trends are guaranteed.
4 Even if the attacker has global knowledge to guess all the instant detection thresholds, and compute a deviated state to bypass the detection at each iteration, the influence to the final results will be limited due
To illustrate the detection mechanism, we assume a common communication range $d_{cr}$ for each
SU. To compute the threshold at an honest SU $n_h$, consider
two of its honest neighbors $n_i$ and $n_j$ at distances $d_i$ and $d_j$ from the PU, with $d_j = d_i + \Delta d_{ij}$,
$0 < \Delta d_{ij} \le 2d_{cr}$. We use the method in [17] to find a detection threshold $\lambda_{h0}$ for SU $n_h$
at starting time, such that $x_i(0) - x_j(0) \le \lambda_{h0}$ holds with high probability (e.g. 0.99).
According to Eq. (2.1), the difference is distributed as follows:
$$x_i(0) - x_j(0) \sim N\left(10\alpha \log_{10}\frac{d_i + \Delta d_{ij}}{d_i},\ 2\sigma^2\right). \qquad (2.8)$$
For a fixed di, ∆dij = 2dcr will maximize the mean of the distribution. We assume a
large attenuation factor α to reduce the influence from uncertainty of α. Then, di is
estimated through a robust statistic estimation. Median estimate is used in [17], while
biweight estimate [82] is another good candidate, which has a higher efficiency in terms of the
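variance of estimation. To trade off the overhead and performance, we use median and biweight
estimation comparatively to determine the estimation of $x_i(0)$: $x_{est} = \mathrm{median}(x_k(0))$
or $\mathrm{biweight}(x_k(0))$ for $k \in N_h$. Thus, $d_i \approx d_{est} = 10^{\frac{P_0 - x_{est}}{10\alpha}}$. Now we have the following
distribution of difference:
$$x_i(0) - x_j(0) \sim N\left(10\alpha \log_{10}\frac{d_{est} + 2d_{cr}}{d_{est}},\ 2\sigma^2\right). \qquad (2.9)$$
Assume $\Pr(x_i(0) - x_j(0) < \lambda_{h0}) > 1 - \mu$, where µ is the detection parameter (typically µ = 0.01).
Since the difference has standard deviation $\sqrt{2}\sigma$, we can calculate $\lambda_{h0}$ using Eq. (2.9) as
$\lambda_{h0} = \sqrt{2}\,\sigma\,Q^{-1}(\mu) + 10\alpha \log_{10}\left(\frac{d_{est} + 2d_{cr}}{d_{est}}\right)$.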
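The initial threshold computation can be sketched numerically as follows. This is a minimal sketch using the median estimate only, with the standard deviation of the difference taken as √2·σ from the variance 2σ² in Eq. (2.9). The default parameter values follow Table 2.1, while the function name and the sample measurements are hypothetical. Q⁻¹(µ) is evaluated as the standard normal quantile at 1 − µ.

```python
import math
from statistics import NormalDist, median


def initial_threshold(neighbor_measurements, p0=66.0, alpha=3.0,
                      sigma=3.0, d_cr=300.0, mu=0.01):
    """Return (lambda_h0, d_est) for one honest node.

    neighbor_measurements: initial received powers x_k(0) in the neighborhood.
    """
    x_est = median(neighbor_measurements)           # robust estimate of x_i(0)
    d_est = 10 ** ((p0 - x_est) / (10 * alpha))     # invert the path-loss model
    q_inv = NormalDist().inv_cdf(1 - mu)            # Q^{-1}(mu)
    mean_diff = 10 * alpha * math.log10((d_est + 2 * d_cr) / d_est)
    return math.sqrt(2) * sigma * q_inv + mean_diff, d_est


# Hypothetical measurements in dBm from three neighbors.
lam0, d_est = initial_threshold([-44.0, -45.0, -46.5])
```

For these inputs the estimated distance is roughly 5 km and the threshold is about 11.3 dB, most of which comes from the shadowing term rather than from the path-loss difference.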
Up to now, we obtain the detection threshold at starting time. Then the updating thresh-
old of node nh at k-th iteration is denoted as λhk. From the above deduction, we notice
to the shrinking thresholds.
the implicit meaning of detection threshold is the maximal acceptable difference between
two honest SUs in the neighborhood. Then in order to adapt the detection threshold ac-
cording to the shrinking difference, we calculate the robust statistic estimate for differences,
$estdif_{hk} = \mathrm{median}(|x_{j1}(k) - x_{j2}(k)|)$ or $\mathrm{biweight}(|x_{j1}(k) - x_{j2}(k)|)$, where $n_{j1}, n_{j2} \in N_h$.
Therefore, we propose the following updating algorithm:
$$\lambda_{h(k+1)} = \frac{estdif_{h(k+1)}}{estdif_{hk}}\,\lambda_{hk}. \qquad (2.10)$$
As $estdif_{hk} \xrightarrow{k \to k_{final}} 0$, we can guarantee $\lambda_{hk} \xrightarrow{k \to k_{final}} 0$, finally revealing zero-tolerance to
attackers. To prevent attackers from forging an alarm to exclude legitimate nodes, we adopt
majority rule to dispute any suspicious attacker. The whole protocol is described as follows:
• Every node computes its detection threshold at each iteration according to Eqs. (2.9) and
(2.10), and then identifies suspicious attackers in its neighborhood.
• Once a node discovers a suspicious attacker, it broadcasts a primitive alarm to its neighbors
which will not be forwarded.
• Assume the number of common neighbors between a node and the suspicious attacker is
B. If the node collects no less than $\lceil B/2 \rceil$ primitive alarms from the common neighbors, it
will broadcast a confirmed alarm and forward it to the remaining network.
• Finally, once the presence of covert adaptive data injection attackers is disclosed, it is
straightforward to handle them or eliminate their impacts.
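The threshold update of Eq. (2.10) and the majority rule in the protocol above can be sketched as below. This is an illustrative sketch using the median variant of the robust estimate; the function names and sample values are hypothetical, and the guard against a zero denominator is an added assumption.

```python
import math
from itertools import combinations
from statistics import median


def est_dif(neighbor_states):
    """Median of pairwise absolute differences among neighborhood states."""
    return median(abs(a - b) for a, b in combinations(neighbor_states, 2))


def update_threshold(lam_k, est_dif_k, est_dif_next):
    """Eq. (2.10): scale the threshold by the ratio of successive estimates."""
    if est_dif_k == 0:
        return 0.0  # assumption: states already agree, so tolerate no deviation
    return lam_k * est_dif_next / est_dif_k


def alarm_confirmed(primitive_alarms, common_neighbors):
    """Majority rule: confirm once at least ceil(B/2) common neighbors agree."""
    return primitive_alarms >= math.ceil(common_neighbors / 2)


# Hypothetical states (dB) seen by one node at iterations k and k+1.
e_k = est_dif([-44.0, -46.0, -47.0])
e_next = est_dif([-45.0, -46.0, -46.5])
lam_next = update_threshold(11.0, e_k, e_next)
```

Because the ratio depends on states that only the node's own neighborhood observes, an attacker without two-hop information cannot easily predict the next threshold.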
Assume: every SU in the network owns a unique ID and shares a unique key with every neighbor; every pair of neighbor nodes has at least one common neighbor; there exists a secure neighborhood discovery mechanism, with which each node can obtain two-hop neighborhood information. In the k-th iteration (k > 0) (when k = 0, every node first submits its measurement to its neighbors using authenticated broadcast.)
• Update Commit: (First round) after one node collects all its neighbors’ submissions, it computes and disseminates an updated submission using authenticated broadcast containing a hash commitment of its inputs together with its own data. Therefore, node $n_v$ receives a collection of updated submissions from its neighbors.
• Distributed Verification: (Second round) every node disseminates all its neighbors’ data collected at the beginning of first round using authenticated broadcast, so node $n_v$ receives a collection of data from the two-hop neighbors. Then node $n_v$ performs the following verification: (1) it checks the IDs in the collection are consistent with its stored neighbor IDs; (2) it checks whether its own data and the common neighbors’ data are incorporated in each updated state by recomputing each updated state and hash commitment; (3) whenever one of the verifications for node $n_p$ fails, multiple $MAC_{K_{vi}}(ERR, ID_p)$ will spread through the whole network to stop the state update process, where ERR is a unique message identifier, $K_{vi}$ is the shared key between $n_v$ and its neighbor $n_i$.
Figure 2.1: The distributed hash-based verification of neighbor state update.
2.4.2 Hash-based Computation Verification of Neighbor State Update
The above outlier detection mechanism detects abnormal node measurements/states in the
statistical sense. It is only effective when the majority of nodes in a neighborhood are honest.
When malicious colluding nodes exist, the statistical outlier detection methods become less
effective. In order to defend against collusion attacks, each node must ensure: (1) the
authenticity and integrity of the updated states sent by neighbor nodes; (2) the state update
algorithm has been followed truthfully at a neighbor node nv, i.e. it has correctly executed
the update algorithm using all nv’s neighbors’ data.
To realize the above additional goals, we propose a hash-based approach with common
neighbor cross-validation to counter collusion attacks with honest common neighbors and
provide computation verification. To provide sensing data legitimacy check, data authen-
ticity/integrity and computation verification simultaneously, we can combine our outlier
detection with the hash-based verification mechanism. Next we will focus on the hash-based
verification mechanism.
We assume each node has a unique identifier and shares a unique secret symmetric key with
every neighbor. In addition, every node uses authenticated broadcast such as µTESLA [83]
to send messages to its neighbors to enforce message authenticity and integrity. In the fully
distributed CRAHN, every node should keep a unique one-way key chain and send the key
chain commitments to every neighbor.
The main goal of this approach is to ensure each neighbor node performs trustworthy state
updates. We adapt the idea of common neighbor cross-validation in traditional secure data
aggregation techniques [75] to counter collusion attacks. In our scheme, each SU is responsi-
ble for checking not only its own contributions, but also the common neighbors’ contributions
incorporated into the updated states. The details of our proposed scheme are illustrated in
Fig. 2.1.
We assume there exists a secure neighbor discovery mechanism [84], by which each node can
securely discover its two hop neighbors during the network initialization process. Each iter-
ation contains two phases: update commit and distributed verification. We give an example
to illustrate how the scheme works. In the k-th iteration, the initial submission of node nh
has the format: $\langle k, ID_h, value \rangle$, where value is the power measurement. In the update commit
phase, node $n_h$ collects the following data from its neighbors: $d^{(k-1)}_1, d^{(k-1)}_2, \ldots, d^{(k-1)}_q$,
and its updated submission $s^{(k)}_h$ has the format:
$$\langle k,\ ID_h,\ state^{(k)},\ H(k \| ID_h \| state^{(k)} \| ID_1 \| d^{(k-1)}_1 \| ID_2 \| d^{(k-1)}_2 \| \ldots \| ID_q \| d^{(k-1)}_q) \rangle, \quad k > 0.$$
Table 2.1: Simulation parameters
Parameter   Value    Description
N           50       Number of secondary users
Rs          1km      Length and width of secondary network
dsp         5km      Distance between primary user and center of secondary network
P0          66dBm    Transmission power in dBm
dcr         300m     Communication range of secondary user
α           3        Path-loss exponent
σ           3dB      Standard variance for fading and shadowing
N0          -80dB    Noise power
ε           0.05     Consensus parameter
The hash digest in each submission is called hash commitment used for computation in-
tegrity check. In the distributed verification phase, nh disseminates its neighbors’ data
〈ID1, d(k−1)1 , ID2, d
(k−1)2 , . . . , IDq, d
(k−1)q 〉 using authenticated broadcast. Each node in its
neighborhood can then recompute the updated states based on Eq. (2) and regenerate the
hash commitments, and compare the updated states and hash commitments with the received
ones to verify the computation. This approach enables each honest node to check whether
its neighbor nodes have performed the consensus-based state update algorithm correctly. In
addition, as long as each pair of colluding attackers shares one honest common neighbor,
this scheme can also detect colluding attacks. We provide a detailed security analysis in
Section 2.5.3.
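The commit and verification steps can be sketched as follows. This is a simplified illustration under several assumptions that are not from the text: SHA-256 as the hash function H, a fixed-width byte encoding with one-byte iteration numbers and node IDs, and the standard linear consensus update with ε = 0.05. Authenticated broadcast is omitted, and all function names are hypothetical.

```python
import hashlib
import struct


def consensus_update(own_prev, neighbor_data, eps=0.05):
    """Assumed state update: x + eps * sum_j (d_j - x), in sorted ID order."""
    return own_prev + eps * sum(neighbor_data[i] - own_prev
                                for i in sorted(neighbor_data))


def commitment(k, node_id, state, neighbor_data):
    """H(k || ID_h || state || ID_1 || d_1 || ... || ID_q || d_q)."""
    msg = struct.pack(">BBd", k, node_id, state)
    for nid in sorted(neighbor_data):
        msg += struct.pack(">Bd", nid, neighbor_data[nid])
    return hashlib.sha256(msg).digest()


def make_submission(k, node_id, own_prev, neighbor_data):
    """Update commit phase: new state plus hash commitment of the inputs."""
    state = consensus_update(own_prev, neighbor_data)
    return (k, node_id, state, commitment(k, node_id, state, neighbor_data))


def verify_submission(submission, own_prev, disclosed_data):
    """Distributed verification phase: recompute state and commitment."""
    k, node_id, state, digest = submission
    expected = make_submission(k, node_id, own_prev, disclosed_data)
    return state == expected[2] and digest == expected[3]


# Hypothetical neighbor inputs {ID: previous state} collected by node 7.
data = {1: -45.0, 2: -47.0, 3: -44.5}
honest = make_submission(5, 7, -46.0, data)
forged_inputs = dict(data)
forged_inputs[2] = -60.0
cheating = make_submission(5, 7, -46.0, forged_inputs)
```

In this sketch a verifier that holds the true value for neighbor 2 rejects the cheating submission, because both the recomputed state and the recomputed commitment differ from the ones received.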
[Figure 2.2 panels: (a) No attacker case; (b) One covert adaptive attacker case; (c) Ten covert adaptive attackers case; (d) Impact of attacker population. Panels (a)-(c) plot node states against iteration step; panel (d) plots attack stop time and convergence time against the number of attackers.]
Figure 2.2: Performance of covert adaptive data injection attack to distributed consensus-based spectrum sensing protected by traditional detection scheme with detection threshold -56dB.
2.5 Evaluation
In this section, we first evaluate the vulnerabilities of distributed consensus-based spectrum
sensing. We then demonstrate the effectiveness and present the security analysis of our
proposed protection mechanisms, followed by a numerical analysis on their efficiency. Table
2.1 lists the system parameters used in our simulation with MATLAB. The performance
results are averaged over 1000 simulation runs.
2.5.1 Impact of Covert Adaptive Data Injection Attacker
Fig. 2.2(a)-Fig. 2.2(d) show how vulnerable the consensus-based spectrum sensing with tradi-
tional outlier detection scheme [17] is under covert adaptive data injection attacks. Fig. 2.2(a)
depicts the normal behavior in terms of protocol convergence without attackers. In less than
10 iterations, the difference among all the nodes becomes less than 1dB, which means a
consensus has been reached. Fig. 2.2(b) shows the effect of a single attacker launching the
attack. It stealthily deviates its states in order to subvert the consensus results for vandalism
objective. In around 60 iterations, the nodes’ states in the attacker’s neighborhood have been
dragged below the threshold, so the attacker temporarily stops injecting false states and
starts to follow the algorithm properly. After that, the attacker repeatedly applies attack
strength for several iterations whenever it finds that the maximal state in its neighborhood stays
higher than γ, until the whole network reaches consensus. At around the 100th iteration,
the network reaches a consensus but it is a wrong one (i.e., the opposite one). When there
are multiple attackers working for the same objectives, the consensus can be reached much
faster as shown in Fig. 2.2(c). Finally, Fig. 2.2(d) demonstrates the connection between the
attack stop time and the convergence time5 with respect to the attacker population. We
observe that with the growing of attacker population, both times will decrease gradually,
indicating an increasing attack power. In general, we observe that the proposed attack can
be very effective in bypassing the traditional outlier detection mechanism and manipulating
the final spectrum sensing result.
5 The convergence time is defined as the number of iterations before the network-wide consensus is reached. Reaching a consensus means the difference among all the node states falls below 0.5dB.
2.5.2 Effectiveness of Robust Distributed Outlier Detection with Adaptive Local Threshold
We evaluate our proposed outlier detection scheme by comparing with existing outlier de-
tection scheme [17]. We study the impacts of detection parameter, network topology and
sensing data variance to two primary detection performance criteria, primary miss detection
rate PMD and primary false alarm rate PFA, determined by attack capabilities under the pro-
tection mechanisms. Fig. 2.3(a)-Fig. 2.3(c) show the effectiveness of our detection scheme,
which can successfully eliminate the impacts of attackers. We observe from Fig. 2.3(a), the
detection performance is insensitive to µ, because the attackers know the values of detection
threshold and µ, and employ them to bypass the detection scheme. Fig. 2.3(b) shows that
when the SU’s communication range increases, and the density of the neighborhood
increases correspondingly, both PMD and PFA of the existing scheme increase. However, with our detection
scheme, PMD and PFA both decrease, owing to the increasing attacker detection rate with
more neighbors. We notice that when communication range is extremely small, existing
scheme is shown to outperform our scheme. This is because when neighborhood has a small
population, cross-validation is less effective. Some honest nodes may be mistaken as attack-
ers, which potentially amplifies the impacts of uncovered attackers to the remaining network.
Fig. 2.3(c) indicates with the increase of data variance, the attackers have a much larger in-
fluence to the existing detection method, but our method effectively impedes the influence.
To further evaluate attacker detection performance, we involve two more criteria: $P^a_D$ as the
attacker detection rate6 and $P^a_{MI}$ as the attacker miss identification rate7. Fig. 2.3(d) shows that
both $P^a_D$ and $P^a_{MI}$ grow steadily with our scheme as the data variance increases. The increase
of $P^a_{MI}$ is a side effect of our scheme caused by the adaptive threshold, but
it will not degrade the primary detection performance.
6 This rate is defined as the probability of detecting one attacker.
7 This rate is defined as the probability of mistakenly identifying a legitimate node as an attacker.
[Figure 2.3 panels: (a) Impact of detection parameter µ; (b) Impact of communication range; (c) Impact of sensing data variance; (d) Attacker detection performance. Panels (a)-(c) plot PMD and PFA for the existing scheme and our scheme against µ, the communication range of secondary users, and the variance of channel fading and shadowing, respectively; panel (d) plots $P^a_D$ and $P^a_{MI}$ for both schemes against the variance of channel fading and shadowing.]
Figure 2.3: Performance of robust distributed outlier detection with adaptive local threshold
2.5.3 Security Analysis of Hash-based Computation Verification Approach
We first consider the case where there is a single attacker A in the network. The security
of the hash-based computation verification scheme is based on the following. (1) The
secure neighborhood discovery scheme utilized in the network initialization process ensures
each node learns two-hop neighborhood information securely, so that the attacker can neither
discard the state value of a legitimate neighbor node nor include a forged state value
while updating its state. (2) The message authenticity and integrity are guaranteed by the
broadcast authentication. (3) Whether the attacker A uses a state value $d'^{(k-1)}_{B}$ different
from what was submitted by a neighbor node B for state update computation, and includes
$d'^{(k-1)}_{B}$ in the message sent to B in the verification phase; or computes a wrong $state'^{(k)}$
using the correct neighbor states, the inconsistency will be detected by B. (4) Otherwise,
based on the collision resistant property of hash function, it is computationally infeasible
to generate a valid hash commitment
$$H(k \| A \| state'^{(k)} \| ID_1 \| d'^{(k-1)}_{1} \| \cdots \| ID_n \| d'^{(k-1)}_{n}) = H(k \| A \| state^{(k)} \| ID_1 \| d^{(k-1)}_{1} \| \cdots \| ID_n \| d^{(k-1)}_{n}),$$
where at least one of the primed state values does not equal the authentic ones.
[Figure 2.4 panels: (a) Pairwise collusion attack; (b) Collusion attack with honest common neighbors; (c) Collusion attack without honest common neighbor; (d) Neighborhood collusion attack.]
Figure 2.4: Different collusion styles (solid points represent attackers, hollow points represent honest SUs)
Next, we discuss the security of state update algorithm with colluding attackers. In Fig. 2.4,
we identify four types of collusion attacks based on their increasing colluding capabilities:
• Pairwise collusion: this collusion (in Fig. 2.4(a)) emphasizes the collusion by two neigh-
boring attackers.
• Collusion attack with honest common neighbors: this collusion (in Fig. 2.4(b))
involves more neighbors of two pairwise attackers as their collusion companions, but every
pair of nodes has at least one honest common neighbor.
• Collusion attack without honest common neighbor: this collusion (in Fig. 2.4(c))
involves all the common neighbors of two pairwise attackers as their collusion companions.
• Neighborhood collusion: if all the neighbors of an attacker are also malicious attackers,
they form a neighborhood collusion (in Fig. 2.4(d)).
We show that the hash-based computation verification scheme is able to deal with the former
two types of collusion attacks, but not the latter two. In cases that pairwise attackers exist
(including Fig. 2.4(a) and Fig. 2.4(b)), a malicious neighbor node can cover up the forgery
of the central node. However, inconsistency will still be discovered by an honest common
neighbor of these colluding nodes, because the honest neighbor overhears colluding attacker’s
input state value.
However, when there exist collusion attackers without an honest common neighbor (Fig. 2.4(c)),
one node can arbitrarily deviate its malicious neighbor’s states without being detected. Note
that neighborhood collusion in Fig. 2.4(d) cannot be addressed by distributed computation
verification schemes. The central attacker in the colluded neighborhood is regarded as a
hidden attacker, whose malicious behavior is most difficult to detect, because none of its
neighbors is honest to follow the verification mechanism. Thus, the misbehavior of a hidden
node will continue and eventually, the entire network will be inevitably controlled by the
hidden attacker.
2.5.4 Cost Evaluation of Hash-based Computation Verification Approach
Finally, we evaluate the efficiency of our proposed hash-based computation verification
scheme through its computational and communication costs. We skip the discussion of
the overhead for authenticated broadcast. We measure the computational cost by the num-
ber of hash operations at one node in each iteration, while assessing the communication cost
in terms of number of transmitted/received bytes. Another important metric for compu-
tational cost is key storage, which is defined as number of keys stored by each node. The
costs are listed in Table 2.2, where we estimate both the iteration number and node ID by
one byte. The computational costs of the hash operation depend on the number of inputs,
which relies on the number of neighbors. Therefore, the computational cost of this approach
is determined by the number of neighbors. The table shows that the computational and
communication costs are both acceptable.
Table 2.2: Costs of hash-based computation verification scheme at one node for each iteration (N is the number of neighbors, P is the length of state in bytes, h is the length of hash in bytes.)
Metric          Cost
Computation     O(N + 1)
Communication   N(NP + N + h + 1)
Key Storage     N
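To give a sense of scale, the communication cost expression in Table 2.2 can be evaluated for sample values. The numbers below (10 neighbors, a 4-byte state, a 32-byte hash such as SHA-256) are hypothetical choices for illustration, not values reported in this chapter.

```python
def communication_cost_bytes(n_neighbors, state_len, hash_len):
    """Per-node, per-iteration communication cost N(NP + N + h + 1) in bytes."""
    n, p, h = n_neighbors, state_len, hash_len
    return n * (n * p + n + h + 1)


# Hypothetical setting: N = 10 neighbors, P = 4 bytes, h = 32 bytes.
cost = communication_cost_bytes(10, 4, 32)
```

Under these assumed values the cost is 830 bytes per iteration. The NP term grows quadratically with the neighborhood size, so dense neighborhoods dominate the overhead.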
2.6 Summary
In this chapter, for the first time, we investigated the vulnerability and protection of consensus-
based spectrum sensing. We proposed various attacks that can disrupt the consensus algo-
rithm or stealthily subvert the sensing results, especially the covert adaptive attacks with
learning capability. We also developed a robust distributed outlier detection scheme with
adaptive local threshold to counter covert adaptive attacks by exploiting the state conver-
gence property. In addition, a hash-based computation verification scheme is presented to
effectively defend against colluding attackers. Our simulation results demonstrated the se-
vere vulnerabilities of distributed spectrum sensing, and also showed that our protection
mechanisms are secure, robust, and efficient.
Chapter 3
Non-Parametric Passive Traffic Monitoring in Cognitive Radio Networks
Passive monitoring by distributed wireless sniffers has been used to strategically capture
the network traffic, as the basis of automatic network diagnosis. However, the traditional
monitoring techniques fall short in CRNs due to the much larger number of channels to be
monitored, and the secondary users’ channel availability uncertainty imposed by primary
user activities. To better serve CRNs, in this chapter, we propose a systematic passive
monitoring framework, SpecMonitor, for traffic collection using a limited number of sniffers
in Wi-Fi like CRNs. We utilize a non-parametric density estimation method to model SUs’
channel usage pattern. This method makes no assumptions on the unknown distribution
of channel access pattern, thus offers accurate and flexible models which can be updated
in an online fashion with little complexity. Moreover, we design a sliding window method
to perform online learning of data dynamics, and an accumulative combination method to
further improve modeling accuracy. Then, SpecMonitor takes inputs from SUs’ channel
usage model to construct near-optimal monitoring strategies.
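As an illustration of non-parametric density estimation with a sliding window, the sketch below uses a Gaussian kernel density estimator over the most recent samples. The kernel choice, the bandwidth, the window size and the class name are all assumptions made for illustration; this overview does not specify them, and the samples could be any observed channel usage quantity.

```python
import math
from collections import deque


class SlidingWindowKDE:
    """Gaussian kernel density estimate over the most recent samples."""

    def __init__(self, window_size=100, bandwidth=1.0):
        # deque with maxlen discards the oldest sample automatically
        self.samples = deque(maxlen=window_size)
        self.bandwidth = bandwidth

    def update(self, sample):
        """Add a new observation; the oldest one leaves when the window is full."""
        self.samples.append(sample)

    def density(self, x):
        """Estimated probability density at x (0.0 before any sample arrives)."""
        if not self.samples:
            return 0.0
        h = self.bandwidth
        norm = len(self.samples) * h * math.sqrt(2 * math.pi)
        return sum(math.exp(-0.5 * ((x - s) / h) ** 2)
                   for s in self.samples) / norm


# Hypothetical stream whose behavior changes from values near 0 to values near 10.
kde = SlidingWindowKDE(window_size=3, bandwidth=1.0)
for value in [0.0, 0.0, 0.0, 10.0, 10.0, 10.0]:
    kde.update(value)
```

After the window slides past the old samples, the estimate concentrates around the new values without any retraining step, which is the property that makes online updating cheap.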
We consider two levels of monitoring objectives: frame-level and user-level, to serve different
network diagnostic problems. The frame-level objective can be interpreted as maximizing
the frame-level quality-of-monitoring (FL-QoM), defined as the amount of captured MAC
frames of interest, due to their significance for the subsequent aggregated traffic analysis
[31]. The user-level objective is to maximize the user-level quality-of-monitoring (UL-QoM),
defined as the expected number of active users monitored, which can facilitate user behavior
analysis [85]. We cast the monitoring optimization problem as a sniffer channel assignment
problem with objective of maximizing the corresponding QoMs.
3.1 Related Work
Passive Monitoring in Traditional Wireless Networks: Passive monitoring in wireless
networks has been an active research area. Yeo et al. were the first to use dedicated sniffers to
passively measure a Wi-Fi network, successfully identifying protocol anomalies and malicious
WLAN usages [29]. Cheng et al. presented Jigsaw, which is a large-scale passive monitoring
infrastructure to collect and dissect wireless traffic for cross-layer network diagnosis in a
large enterprise Wi-Fi network [30, 31]. While the above works focused on developing the
monitoring infrastructure, some recent works investigated the problem of optimal sniffer
channel assignment to maximize the amount of monitored information. Shin et al. [86]
sought optimal strategies for selecting a limited number of sniffers to monitor
multiple channels in wireless mesh networks; they formulated the sniffer channel
assignment problem as a maximal coverage problem and designed approximation algorithms
to solve this problem. In [87], Chhetri et al. further extended the preceding work by taking
into account the users’ access patterns. They proposed two monitoring models: user-centric
model and sniffer-centric model. However, they assume the statistics for different users’
activities are known. Recently, Arora et al. [88] proposed to use multi-armed bandit to
perform sequential learning of the unknown channel statistics, which can be used to facilitate
optimal channel assignments. However, multi-armed bandit is too complex to be used for
online and efficient channel assignments. In this chapter, we present an efficient online
channel assignment mechanism without any prior knowledge of channel access statistics,
which can provide optimized channel assignments in real-time. Note that all the above
works only considered maximizing the number of active users covered by the sniffers, while
we further address the problem of maximizing the number of captured frames.
Spectrum Monitoring in Cognitive Radio Networks: Chen et al. studied frame
capturing problem for network forensics in CRNs [89], in which support vector regression
(SVR) method is employed to predict the frame arrival time to guide channel assignments.
Their objectives are similar to ours; however, our method has the following advantages: 1)
the SVR method requires a time-consuming training phase, while we utilize density estimation to
produce new estimates in an online fashion, avoiding the expensive training and retraining
phases; 2) the SVR method falls short of dealing with interleaved traffic from multiple users,
which corresponds to dynamic traffic statistics, while our scheme can adapt promptly to the
traffic dynamics; 3) their monitoring framework has poor performance when the monitored
channels carry high data rate traffic, because of frequent channel switching behavior induced
by the heuristic channel assignments. In contrast, as we jointly consider channel switching
costs and frame capturing gains to optimize channel assignments, our method can achieve
better performance with fast traffic flows. Recently, Yi et al. formulated the secondary user
data capturing problem as a multi-armed bandit problem [90], which requires a long learning
period before yielding an accurate estimate of the user access pattern. Thus, their
method is not efficient enough to capture adaptive and interleaved traffic patterns either.
Figure 3.1: An overview of SpecMonitor system
3.2 System Model
3.2.1 Monitoring System Model
In this section, we describe the monitoring system model for CR networks. We consider CR
networks with coexisting PUs and SUs. The most common PUs are TV towers and wireless
microphones (WMs). As PUs’ networks are regulated by service providers or specific WM
users, they are outside the scope of our monitoring system. Instead, our monitoring system
is interested in the network traffic from SUs, including the APs and clients that form a WhiteFi
network, as illustrated in Fig. 3.2. In WhiteFi networks, multiple clients share their working
channels, decided by the APs, using the widely adopted CSMA/CA mechanism. Interested readers
may refer to [91] for the design and implementation details of the WhiteFi system.
Intuitively, one may consider the APs in the WhiteFi system as appropriate monitoring
devices, because all the traffic of SUs goes directly through the APs. However, passive
monitoring using sniffers has multiple advantages over AP-side monitoring: first, APs are
secondary users, who may be compromised by the adversaries; second, the sniffers are able
to monitor several WhiteFi networks operating on different channels concurrently; third, the
monitoring system can reveal the detailed PHY/MAC information, such as the physical layer
header information including the signal strength, noise level for every individual packet, etc.,
which is important for security monitoring [29], yet unattainable at AP side.
For a multi-hop network, we segment the whole network region into small regions, called
Figure 3.2: Monitoring system architecture for WhiteFi network inside a monitoring area
Figure 3.3: The percentage of frames in active slots (20 ms slot length)
monitoring areas, and assign a certain number of sniffers to monitor the traffic for each
monitoring area. Each sniffer may be equipped with multiple antennas, which allow it
to sense and capture traffic over multiple channels at the same time. We assume different AP-client
pairs in the monitoring area pick different working channels to avoid interference, and each
sniffer can overhear all the inbound and outbound traffic from any secondary device inside its
monitoring area if they tune into the same channel. Similar to [89], some sniffers are used as
dedicated inspection sniffers that periodically sense channels to gain channel usage statistics,
while other sniffers called operation sniffers are responsible for capturing information. All
the sniffers are connected to a sniffer center for centralized decision making. The monitoring
system architecture is demonstrated in Fig. 3.2. Each inspection sniffer is assigned multiple
channels to scan. A sensing slot is a period during which the inspection sniffer scans through
all the assigned channels. In the following, a slot stands for the sensing slot unless otherwise
noted.
An overview of the SpecMonitor system model is shown in Fig. 3.1. The sniffers first detect
the PUs’ activities in order to identify available channels. Then, the inspection sniffer scans
the available channels to build SUs’ channel access model. The channel access probability for
Figure 3.4: Frame/Active slot interarrival time distribution (20 ms sensing slot, 2 ms sensing period)
each channel can be derived based on the model. Finally, utilizing the channel access prob-
ability, FL-QoM optimization or UL-QoM optimization problems can be solved to provide
optimized channel assignments for the operation sniffers in the forthcoming slots.
3.2.2 Channel Access Model
The channel access/usage model capturing the patterns of secondary user activities is con-
structed using the sensing outcomes of the inspection sniffers, thus is closely related to the
duration of a sensing slot. A sensing slot is composed of channel sensing and channel switch-
ing time, whose length depends on the number of channels to be scanned. Typically, a
channel sensing period is approximately 1ms per channel using energy detection [92], while
channel switching for a commodity 802.11b/g network card takes about 1− 5ms [93]. Since
a longer sensing period benefits the frame capturing as shown below, we assume each inspec-
tion sniffer spends 2ms for sensing one channel and additional 2ms for switching to another
channel. If each antenna of an inspection sniffer is assigned 5 channels to scan, one sensing
slot will be 20ms long.
During each slot, the inspection sniffer scans 5 channels to reveal their channel states (ac-
tive/idle). Here, an active slot indicates a slot during which the sniffer spots SUs’ traffic
after channel sensing, while idle slot represents the opposite. As channel sensing period is
less than a full slot length, the discovered active/idle state may not reflect the genuine state
of a slot. Let $X_k^i$ be the state of channel $i$ at the $k$-th sensing slot, which takes only binary values
“1/0”, corresponding to the active/idle state (in the following, we omit the channel index $i$ for the $i$-th
channel). Then, the sequential data $X_k$ ($k = 1, 2, \dots$) are used to calculate the active slot
interarrival time for each channel, which is defined as the time interval between two consec-
utive active slots. In our design, inspection sniffers produce active slot interarrival time as
their sensing outcomes, which will be used as the inputs to build channel access model as
explained in Section 3.3.
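As a minimal sketch of how an inspection sniffer could turn the binary sequence $X_k$ of one channel into active slot interarrival times, consider the following. The function name and the default 20 ms slot length are assumptions made for illustration; this is not the dissertation's implementation.

```python
from typing import List


def active_slot_interarrival_times(states: List[int], slot_ms: float = 20.0) -> List[float]:
    """Return the time gaps (in ms) between consecutive active slots.

    states[k] is the sensed state X_k of one channel: 1 = active, 0 = idle.
    slot_ms is the sensing slot length (assumed 20 ms, as in the example above).
    """
    gaps: List[float] = []
    last_active = None  # index of the most recent active slot seen so far
    for k, x in enumerate(states):
        if x != 0:
            if last_active is not None:
                gaps.append((k - last_active) * slot_ms)
            last_active = k
    return gaps


# Active slots at indices 0, 3, 4 and 6.
print(active_slot_interarrival_times([1, 0, 0, 1, 1, 0, 1]))  # [60.0, 20.0, 40.0]
```

Each returned value is one observation of the active slot interarrival time, i.e., one input sample for the channel access model.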
Note that one straightforward way of meeting frame capturing objectives is to predict SUs’
frame arrival time by modeling frame arrival pattern. However, it is infeasible to derive opti-
mized channel assignments with dynamic frame length and unslotted frame transmission [89].
Instead of directly modeling frame arrival pattern, we model the active slot interarrival pat-
tern as the basis of our monitoring framework, in which monitoring an active slot implies
capturing all the frames in the slot. To motivate and justify the adoption of active slot interarrival
time, we performed a real-world experiment using the traffic from different types
of applications (e.g., Web browsing, BitTorrent, FTP) in an operational 802.11g WLAN. The
experiment settings are described in Section 3.5.3. Fig. 3.3 shows the percentage of frames
in active slots corresponding to different sensing periods, from which we notice most of the
frames reside in the identified active slots, especially when the sensing period is longer than
2ms. In other words, by capturing frames in active slots, we are able to collect most of the
frames. Fig. 3.4 plots the histograms of frame interarrival time versus active slot interarrival
time, with each bar showing the percentage of frames or active slots whose interarrival time
is indicated by the x-axis. These two distributions appear very similar to each other with
most of the frames concentrated within small interarrival time region, indicating active slot
Symbol          Description
$K$             Number of time slots
$N$             Number of channels
$\mathcal{N}$   Channel index set
$M$             Number of antennas of all operation sniffers
$S_{op}$        Operation sniffer antenna index set
$X_k$           Sensing results of inspection sniffers at slot $k$
$Z(k)$          Current data set at slot $k$
$Z_{in}$        Input data set for density estimation
$T_{int}$       Active slot interarrival time
$W$             Sliding window size for online estimation
$n_w$           Number of previous windows considered for combination
$t_w$           Number of different samples in the previous window for combination
$f$             Probability density estimate
$F$             Cumulative density estimate
$SCAP_i$        Slotted channel access probability of channel $i$
$\Delta$        Sensing slot length
$IdleCount$     Number of idle slots counted
$y_i$           Vector of binary random variables for channel $i$
$z_{s,i}$       Vector of binary random variables indicating whether sniffer $s$ is assigned to channel $i$
$\alpha$        Switching cost weight
$U_i$           Number of distinct users in channel $i$
Table 3.1: Summary of symbols and notations
interarrival time well characterizes channel usage pattern.
Intuitively, if the slot length becomes shorter, active slot interarrival pattern will approximate
frame interarrival pattern more closely. However, channel scanning with a shorter sensing slot
requires more inspection sniffers/antennas or faster channel sensing and switching operation.
Additionally, we infer from Fig. 3.4 that the operational 802.11g WLAN carries a high traffic load,
since most of the frame interarrival time is rather short. For ease of reference, the commonly
used notations are summarized in Table 3.1.
3.3 User Channel Access Prediction
In this section, we propose a unified model to estimate secondary user channel access pat-
tern, as the front-end of SpecMonitor. In order to build the unified model, we first study
the primary user detection issue, and then we design an online non-parametric density estimation
mechanism to predict SUs’ slotted channel access probability (SCAP) pertaining to
each sensing slot. As its name suggests, slotted channel access probability is defined as the
probability of SUs’ channel access during each slot.
3.3.1 Primary User Detection
To enable CR communications, SUs need real time knowledge of PUs’ activity to identify
available spectrum. Similarly, the sniffers are also required to detect PUs’ activity in order
not to waste time and energy listening on the primary-occupied channels. Primary user detection
can be achieved either by spectrum sensing or by querying a geo-location white
space database over the Internet. Spectrum sensing is expensive in terms of cost, energy consumption,
and hardware complexity. On the other hand, the database approach is easier to
implement, which allows devices to report their locations to a web server that returns a list
of available channels at that location. However, database approach suffers from utilization
inefficiency, since it uses propagation models to decide the available spectrum, and hence, is
conservative in the channels it returns for a given location. Either of these two approaches
can be applied to our monitoring framework.
Feature detection is one popular spectrum sensing method for the sniffers to detect PUs’
appearance. The feature detection algorithms described in [94] can be used to sample the
UHF spectrum to detect the presence of TV broadcasts and wireless microphone signals,
which can effectively differentiate between the SUs’ and PUs’ signals. Then, the sniffers can
directly perform feature detection in the beginning of every slot to sense the availability of
monitored channels.
The database approach allows both sniffers and SUs to query the database for spectrum
availability at a certain location. After querying the database, the SUs begin operating
on a set of available channels, while inspection sniffers tune onto these available channels to
monitor SUs’ traffic patterns, and operation sniffers are assigned to the SU-occupied channels
correspondingly. In SpecMonitor, we adopt the database approach for simplicity.
3.3.2 Secondary User Channel Access Model
In this section, we propose a framework to estimate the secondary users’ SCAP at each slot
by modeling the active slot interarrival time distribution. The SUs’ channel access pattern
in WhiteFi networks is complicated, mainly due to the dynamics brought by time-evolving
mixed traffic from multiple SUs with channel switching behavior.
Non-parametric Density Estimation Model
Instead of assuming a specific active slot interarrival time distribution for quantifying SUs’
traffic pattern, we propose a SU channel usage model using the non-parametric density esti-
mation method to better capture SUs’ traffic dynamics. As mentioned previously, different
from support vector machine and neural network based methods, density estimation method
does not involve a time-consuming training phase, which makes it appropriate for online
prediction. More importantly, this non-parametric approach provides a greater flexibility
and accuracy in modeling a given data set, compared with other parametric approaches.
Currently, one of the most popular non-parametric density estimation approaches is Kernel
Density Estimator (KDE) with a Gaussian kernel function [95]. Given n independent real-
izations Xi (i = 1, 2, . . . , n) drawn from an unknown probability density function (pdf) f(x),
the Gaussian KDE with bandwidth σ is defined as:
$$\hat{f}(x;\sigma) := \frac{1}{n}\sum_{i=1}^{n} K_G(x, X_i, \sigma), \quad x \in \mathbb{R}, \tag{3.1}$$
where
$$K_G(x, X_i, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x - X_i)^2/(2\sigma^2)}, \tag{3.2}$$
from which we can see that the Gaussian KDE is essentially the average of $n$ Gaussian kernels,
each centered at a sample $X_i$, with a common bandwidth $\sigma$.
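Eqs. (3.1) and (3.2) can be sketched directly in code. The snippet below is a plain-Python illustration with a fixed, hand-picked bandwidth; the function names and sample values are hypothetical, and the automatic bandwidth selection of [95] is not implemented here.

```python
import math
from typing import Sequence


def gaussian_kernel(x: float, xi: float, sigma: float) -> float:
    """Gaussian kernel K_G(x, X_i, sigma) of Eq. (3.2)."""
    return math.exp(-((x - xi) ** 2) / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma)


def gaussian_kde(x: float, samples: Sequence[float], sigma: float) -> float:
    """Density estimate of Eq. (3.1): the average of n kernels, one per sample."""
    return sum(gaussian_kernel(x, xi, sigma) for xi in samples) / len(samples)


# Illustrative interarrival times in ms (made-up values), bandwidth chosen by hand.
samples = [20.0, 20.0, 40.0, 60.0, 20.0]
print(gaussian_kde(20.0, samples, sigma=10.0))
```

Because every kernel integrates to one, the estimate also integrates to one for any positive bandwidth, which makes it a valid density regardless of how $\sigma$ is chosen.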
In fact, the setting of σ is of utmost importance for the density estimation performance. A
classic measure to determine the optimal σ is Mean Integrated Squared Error (MISE):
$$\mathrm{MISE}\{\hat{f}\}(\sigma) := \mathbb{E}\int [\hat{f}(x;\sigma) - f(x)]^2 \, dx, \tag{3.3}$$
where f(x) is the underlying genuine distribution. Assuming a large sample set, we can ob-
tain an asymptotic approximation to MISE, denoted as asymptotic MISE (AMISE), written
as [95]:
$$\mathrm{AMISE}\{\hat{f}\}(\sigma) = \frac{1}{4}\sigma^4 \|f''(x)\|^2 + \frac{1}{2n\sqrt{\pi}\,\sigma}, \tag{3.4}$$
where $f''(x)$ is the second derivative of $f(x)$, and $\|\cdot\|$ denotes the $L^2$ norm on $\mathbb{R}$.
Thus, the asymptotically optimal value $\sigma^*$ is obtained by minimizing AMISE:
$$\sigma^* = \left(\frac{1}{2n\sqrt{\pi}\,\|f''(x)\|^2}\right)^{1/5}. \tag{3.5}$$
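For completeness, the minimizer in Eq. (3.5) follows from a one-line calculation on Eq. (3.4). Setting the derivative with respect to $\sigma$ to zero gives

$$\frac{d}{d\sigma}\,\mathrm{AMISE}(\sigma) = \sigma^3 \|f''(x)\|^2 - \frac{1}{2n\sqrt{\pi}\,\sigma^2} = 0 \quad\Longrightarrow\quad \sigma^5 = \frac{1}{2n\sqrt{\pi}\,\|f''(x)\|^2},$$

and the second derivative $3\sigma^2\|f''(x)\|^2 + \frac{1}{n\sqrt{\pi}\,\sigma^3}$ is positive for $\sigma > 0$, so this stationary point is a minimum. Note that $\sigma^* \propto n^{-1/5}$: the optimal bandwidth shrinks slowly with the sample size, and 32 times more samples are needed to halve it.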
In order to compute $\sigma^*$ from Eq. 3.5, we need to approximate $\|f''(x)\|^2$ by estimating
the general form $\|f^{(j)}(x)\|^2$ for arbitrary $j$. The corresponding optimal solution
$\sigma_j^* = \left(\frac{1}{2n\sqrt{\pi}\,\|f^{(j)}(x)\|^2}\right)^{1/5}$ with a generalized term of $\|f^{(j)}(x)\|^2$ can be solved in a recursive form,
namely $\sigma_j^* = \gamma_j(\sigma_{j+1}^*)$, where $\gamma_j$ is a complicated formula given in [95]. Then, a fixed point
iteration method is employed to compute $\sigma_2^*$, which is equivalent to the target value $\sigma^*$. This
KDE algorithm provides a viable means of automatically selecting optimal bandwidths with
superior density estimation performance.
Modeling Active Slot Interarrival Time Distribution
The KDE collects the data set of active slot interarrival time measured by inspection sniffers
to generate the density estimates. Since the distribution of collected data sets may vary over
time, the modeling accuracy of the KDE will be affected by taking into account outdated
historic data. Thereby, only the most recent data should be imported into the modeling
process. On the other hand, the modeling accuracy also largely depends on the size of
the input data sets. If we only consider the most recent observations by discarding all the
historical ones, the modeling accuracy will be brought down significantly. Furthermore,
the amount of inputs to KDE has great impacts on its computational efficiency. Generally
speaking, KDE with a small data set runs more efficiently than that with a large data set.
Therefore, the major issue of this model is to decide how much historical data should be
incorporated for density estimation, in order to produce an accurate and efficient model.
Now we present our proposed online non-parametric density estimation protocol. The basic
idea is to use sliding window method to perform online updating of the density estimates, and
to incorporate additional historic data sets for improving the estimation accuracy. The whole
protocol is presented in Algorithm 1, which is repeated for each channel. Whenever a new
observation arrives, the online estimation model only takes the data in a sliding window of
size $W$, i.e., the data set exported to the KDE only holds the most recent $W$ observations.
The setting of the window size $W$ depends on the data dynamics, and is thus empirical. A
simple guideline would be: first, we set an initial value for the sliding window size and run
KDE; second, we move the sliding window forward to see whether the estimated distribution
changes over time; third, if the change is significant, we decrease the window size, otherwise,
increase it, until we reach a satisfactory window size. Specifically in the WhiteFi network
scenario, we set a relatively small W as 50 data samples, since the data distribution will
change more dynamically than that in a traditional wireless network.
Figure 3.5: Sliding window method (X axis denotes the interarrival time data, Y axis denotes the channels)
One of the most favorable features of sliding window method is attributed to its support for
online learning of density estimates. As time advances, our density estimator will take newest
sets of data falling inside the sliding window to compute the latest estimate, as illustrated
in Fig. 3.5. Therefore, our model enables the effective characterization of the time-evolving
active slot interarrival distribution, and allows us to update density estimates with every
newly arrived observation.
However, the sliding window method has a major drawback: restricting the input data to
the sliding window also degrades the accuracy of the KDE, because the window size limits
the number of observations to only $W$. Hence, we need to
improve the estimates by expanding the input data size.
As depicted in Algorithm 1, we propose to combine the data sets from multiple sliding win-
dows according to some well-defined criteria, in order to enlarge the sample space. How
to define such criteria for merging sample space is crucial to the ultimate estimation per-
formance. At first glance, more recent windows of data sets should have higher relevance
to current window. Therefore, one intuitive method to achieve more accurate estimation
is to combine the most recent density estimates from latest windows to capture the data
freshness [96]. However, because of the uncertain channel availability and underlying MAC
protocol, multiple clients may generate interleaved traffic due to alternate channel access-
es. Therefore, the most recent windows may not necessarily reflect the underlying density
of current window best, while some earlier historical data originating from the same clients
pertaining to the current window might do. Accordingly, we propose an accumulative combi-
nation method to make the decision of merging historical data based on statistical correlation
among the samples. As shown in Algorithm 1, we simplify the computation of statistical
correlations by employing Kolmogorov-Smirnov test (KS test). KS test is characterized as
a non-parametric inferential statistical method, since it makes no assumption about the dis-
tributions of samples, thus is completely data-driven. The Kolmogorov-Smirnov statistic is
defined as follows:
Definition 1. Consider two sets of observations Z1 and Z2, with n1 = |Z1| and n2 = |Z2|
samples. The Kolmogorov-Smirnov statistic is defined as:
$$D_{z_1,z_2} = \sup_x |F_1(x) - F_2(x)|,$$
where F1 and F2 represent the empirical cumulative distribution functions (cdfs) of the sam-
ples in Z1 and Z2, respectively.
Then, given $D_{z_1,z_2}$, we accept that the two sample sets are from the same distribution at a
certain significance level $\beta$ if $\sqrt{\frac{n_1 n_2}{n_1 + n_2}}\, D_{z_1,z_2} \le K_\beta$, where $K_\beta$ can be set according to a
well-defined table [97]. Note that the cdf is a byproduct of the KDE, denoted as F(k) in Algorithm 1.
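The test in Definition 1 can be sketched as follows, using empirical cdfs as in the definition. The function names are hypothetical, and the default threshold of 1.36 is the commonly tabulated critical value for a 0.05 significance level, which is an assumption here; the dissertation takes $K_\beta$ from the table in [97].

```python
import bisect
import math
from typing import Sequence


def ks_statistic(z1: Sequence[float], z2: Sequence[float]) -> float:
    """Two-sample KS statistic: sup_x |F1(x) - F2(x)| over the empirical cdfs."""
    s1, s2 = sorted(z1), sorted(z2)
    d = 0.0
    for x in s1 + s2:  # the supremum is reached at one of the observed values
        f1 = bisect.bisect_right(s1, x) / len(s1)  # empirical cdf of Z1 at x
        f2 = bisect.bisect_right(s2, x) / len(s2)  # empirical cdf of Z2 at x
        d = max(d, abs(f1 - f2))
    return d


def same_distribution(z1: Sequence[float], z2: Sequence[float], k_beta: float = 1.36) -> bool:
    """Return True if the scaled KS statistic does not exceed K_beta.

    k_beta = 1.36 is an assumed default (significance level 0.05).
    """
    n1, n2 = len(z1), len(z2)
    return math.sqrt(n1 * n2 / (n1 + n2)) * ks_statistic(z1, z2) <= k_beta


print(same_distribution([20.0] * 50, [20.0] * 50))    # True
print(same_distribution([20.0] * 50, [400.0] * 50))   # False
```

Since both empirical cdfs are step functions that only change at sample points, evaluating the difference at the pooled samples is enough to find the supremum.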
After KS test, we combine all the data sets passing the tests into one single data set, which
is provided to the KDE to update the density estimate f(k) for the current slot. To balance
the performance improvement against the computational overhead, we limit the number of KS tests
by only preserving the previous $n_w$ windows of data sets for each channel. Meanwhile, two
consecutive windows differ by only one data point, so it is more beneficial to test
Algorithm 1 Online non-parametric density estimation protocol
 1: Input: $W$, $n_w$, $t_w$, current sensing result $X_k$ at the $k$-th sensing slot.
 2: if $X_k \neq 0$ then
 3:     Calculate the new observed active slot interarrival time $T_{int}(k)$;
 4:     Update the current data set $Z(k) = \{T_{int}(k), \dots, T_{int}(k - W + 1)\}$;
 5:     Update the input data set $Z_{in} = Z(k)$;
 6:     Update the current density estimate $[F(k), f(k)] = KDE(Z_{in})$;
 7:     for $i \leftarrow 1$ to $n_w$ do
 8:         Perform KS test: $KStest(Z(k), Z(k - i \cdot t_w))$;
 9:         if pass KS test then
10:             Update the input data set $Z_{in} = \{Z_{in} \cup Z(k - i \cdot t_w)\}$;
11:         end if
12:     end for
13:     Update the current density estimate $[F(k), f(k)] = KDE(Z_{in})$;
14: else
15:     return.
16: end if
windows with interval of tw samples. In this way, every previous window passing the KS test
can export tw more samples into the merged data set (see line 10 of Algorithm 1).
Consequently, we derive an accurate density estimate for active slot interarrival time distri-
bution at each channel, by online learning of data dynamics and cumulative combination of
historic data.
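The window selection part of this protocol (lines 4 to 12 of Algorithm 1) can be sketched as below. This is an illustration under stated assumptions rather than the dissertation's code: the function names are hypothetical, the KS threshold 1.36 is an assumed value for a 0.05 significance level, and the union of windows is taken over sample indices so that overlapping windows do not contribute the same observation twice. The merged set would then be handed to whichever KDE routine is in use.

```python
import bisect
import math
from typing import List, Sequence


def ks_pass(z1: Sequence[float], z2: Sequence[float], k_beta: float = 1.36) -> bool:
    """Two-sample KS test on empirical cdfs; k_beta is an assumed threshold."""
    s1, s2 = sorted(z1), sorted(z2)
    d = 0.0
    for x in s1 + s2:
        f1 = bisect.bisect_right(s1, x) / len(s1)
        f2 = bisect.bisect_right(s2, x) / len(s2)
        d = max(d, abs(f1 - f2))
    n1, n2 = len(s1), len(s2)
    return math.sqrt(n1 * n2 / (n1 + n2)) * d <= k_beta


def build_input_set(t_int: Sequence[float], W: int, n_w: int, t_w: int) -> List[float]:
    """Select the KDE input set Z_in for the newest observation.

    t_int holds every interarrival time seen so far on one channel, oldest first.
    """
    k = len(t_int)
    start = max(0, k - W)
    current = list(t_int[start:k])     # Z(k): the most recent W samples
    included = set(range(start, k))    # indices of samples merged into Z_in
    for i in range(1, n_w + 1):
        end = k - i * t_w
        if end - W < 0:
            break                      # not enough history for a full window
        previous = t_int[end - W:end]  # Z(k - i * t_w)
        if ks_pass(current, previous):
            included.update(range(end - W, end))
    return [t_int[j] for j in sorted(included)]


history = [20.0] * 30                  # made-up, perfectly stationary traffic
print(len(build_input_set(history, W=10, n_w=2, t_w=5)))  # 20
```

With stationary history, each of the two earlier windows passes the test and contributes $t_w = 5$ samples that were not yet in the merged set, so the input grows from 10 to 20 samples.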
Computing Slotted Channel Access Probability
The problem we are going to address in this section is how to estimate the SCAP based on
the predicted distribution of active slot interarrival time. As mentioned before, SCAP (k+1)
represents the probability that (k+1)-th slot is active. In theory, the predicted secondary user
SCAP at (k+1)-th slot should be represented as SCAP (k+1) = Pr(Xk+1 = 1|X1, . . . , Xk),
whose computation appears intractable since it takes all the historic channel states into
consideration. However, we can take advantage of the updated active slot interarrival time
distribution to simplify the computation. We note that if the current slot is active, the
current active slot interarrival time will be the time period between the current slot and the
most recent active slot. Consequently, the probability that current slot is active SCAP (k+1)
can be interpreted as the probability that the active slot interarrival time is equal to the time
period between the current slot and the preceding active slot. If the preceding
active slot is $k$, then $SCAP(k+1) = \Pr(X_{k+1} = 1 \mid X_k = 1)$, with the active slot interarrival time
being $\Delta$. If the preceding active slot is $j < k$, then $SCAP(k+1) = \Pr(X_{k+1} = 1 \mid X_k =
0, \dots, X_{j+1} = 0, X_j = 1)$, with the active slot interarrival time being $(k+1-j)\cdot\Delta$. Therefore,
the predicted $SCAP(k+1)$ can be written as follows:
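$$
SCAP(k+1) =
\begin{cases}
\Pr(X_{k+1} = 1 \mid X_k = 1), & \text{if } X_k = 1,\\
\Pr(X_{k+1} = 1 \mid X_k = 0, \dots, X_{j+1} = 0, X_j = 1), & \text{if } X_k = 0
\end{cases}
=
\begin{cases}
\int_{0}^{\Delta} f(k)\,dt, & \text{if } X_k = 1,\\
\int_{(k-j)\cdot\Delta}^{(k+1-j)\cdot\Delta} f(k)\,dt, & \text{if } X_k = 0,
\end{cases}
\tag{3.6}
$$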
where ∆ is defined as the sensing slot length. The algorithm to compute the SCAP for
each slot is given in Algorithm 2. SCAP provides an appropriate measure for quantifying
the secondary user channel access pattern, which takes into account the channel availability,
SUs’ current activity, and SUs’ traffic pattern learnt from their past activities. The major
goal of the inspection sniffers is to predict SCAP (k + 1) that guides the operation sniffers’
channel assignment strategies, which is the main focus of the following section.
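One step of the SCAP computation can be sketched as follows, assuming the density estimate is a Gaussian KDE so that the integral in Eq. (3.6) has a closed form through the error function. The function names, the fixed bandwidth and the sample values are illustrative assumptions.

```python
import math
from typing import Sequence, Tuple


def kde_cdf(t: float, samples: Sequence[float], sigma: float) -> float:
    """Cdf of a Gaussian KDE: the average of the per-sample normal cdfs."""
    return sum(0.5 * (1.0 + math.erf((t - xi) / (sigma * math.sqrt(2.0))))
               for xi in samples) / len(samples)


def scap_next(x_k: int, idle_count: int, samples: Sequence[float],
              sigma: float, delta: float) -> Tuple[float, int]:
    """Return (SCAP(k+1), updated IdleCount) for one channel.

    x_k is the current sensing result, idle_count the counter carried over
    from the previous slot (initialized to 1), delta the slot length.
    """
    if x_k != 0:
        idle_count = 1
        lo, hi = 0.0, delta
    else:
        idle_count += 1
        lo, hi = (idle_count - 1) * delta, idle_count * delta
    return kde_cdf(hi, samples, sigma) - kde_cdf(lo, samples, sigma), idle_count


# Made-up interarrival samples clustered around 30 ms, slot length 20 ms.
samples = [28.0, 30.0, 32.0, 30.0]
p_active, idle = scap_next(1, 1, samples, sigma=1.0, delta=20.0)
p_idle, idle = scap_next(0, idle, samples, sigma=1.0, delta=20.0)
print(p_active, p_idle, idle)
```

In this toy setting almost all probability mass lies between 20 ms and 40 ms, so the predicted SCAP is close to zero right after an active slot and close to one after a single idle slot.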
3.4 Near-Optimal Monitoring Mechanism
The monitoring mechanism of SpecMonitor addresses the problem of sniffer channel assign-
ment to maximize two different levels of QoMs, which is carried out by the sniffer center. In
Algorithm 2 The Computation of Slotted Channel Access Probability
1: Input: current density estimate $f(k)$, current sensing result $X_k$, the sensing slot length $\Delta$.
2: Initialization: $IdleCount = 1$
3: if $X_k \neq 0$ then
4:     Compute $SCAP(k+1) = \int_{0}^{\Delta} f(k)\,dt$;
5:     Reset $IdleCount = 1$;
6: else
7:     Update $IdleCount = IdleCount + 1$;
8:     Compute $SCAP(k+1) = \int_{(IdleCount-1)\cdot\Delta}^{IdleCount\cdot\Delta} f(k)\,dt$;
9: end if
particular, at k-th slot, the sniffer center collects all the channel usage information gathered
by the inspection sniffers to produce a prediction set of SCAP (k + 1) for all the channels
simultaneously. This set of predicted SCAP is then leveraged to provide optimized channel
assignments for the forthcoming slot.
Although channel switching enables the sniffers to capture channel dynamics adaptively, its
negative effects should not be neglected in computing QoMs, especially in the CRNs with
channel availability issue. We claim that channel switching indeed produces non-negligible
overhead in terms of frame losses in practice. For example, as mentioned previously, the
802.11b/g wireless cards take approximately 1 to 5 ms for one channel switching operation.
If one slot lasts 20 ms, up to 1/4 of the frames in the slot will be missed during the channel
switching operation, which constitutes a non-negligible fraction of frames. To be more specific,
the typical frame interarrival time in a 10 Mbps wireless network is 0.1 ms (assuming 1000-bit
frames). Then, during a 5 ms channel switch, we may lose 50 frames, i.e., 1/4 of the
roughly 200 frames in this slot. In addition, frequent channel switching also raises energy
costs. These are the reasons why we integrate channel switching costs into the optimization
objectives. In the following, we show our formulation of sniffer channel assignment problem
with two levels of QoMs, respectively.
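The arithmetic behind this example can be checked with a few lines of code. The function name is hypothetical, and the model assumes back-to-back frames at the full link rate, as in the example above.

```python
from typing import Tuple


def switching_loss(switch_ms: float, slot_ms: float,
                   rate_bps: float, frame_bits: int) -> Tuple[float, float, float]:
    """Return (frames lost while switching, frames per slot, lost fraction)."""
    # 10 Mbps with 1000-bit frames gives 10 frames per millisecond.
    frames_per_ms = rate_bps / frame_bits / 1000.0
    lost = switch_ms * frames_per_ms
    total = slot_ms * frames_per_ms
    return lost, total, lost / total


print(switching_loss(5.0, 20.0, 10e6, 1000))  # (50.0, 200.0, 0.25)
```

The lost fraction depends only on the ratio of the switching time to the slot length, so a 1 ms switch in the same 20 ms slot would cost 5% of the frames.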
3.4.1 Frame-level Quality-of-Monitoring Optimization
The goal of FL-QoM optimization is to maximize the number of captured frames, given a
set of channels and operation sniffers inside one monitoring area. In section 3.2, we show
that active slot interarrival pattern is closely associated with frame arrival pattern, so that
the number of captured frames during K slots from a certain channel can be written as:
$N_f = \sum_{k=0}^{K} (I_k \cdot n_f^{(k)})$, where $I_k$ is an indicator of whether the $k$-th slot is active, and $n_f^{(k)}$
denotes the number of frames inside the $k$-th slot. Therefore, instead of directly maximizing
the number of captured frames, we transform FL-QoM into an objective of maximizing
the number of active slots captured. For notation convenience, let us define index sets
$i \in \mathcal{N} = \{1, \dots, N\}$, $s \in S_{op} = \{1, \dots, M\}$ for indexing channels and operation sniffer
antennas respectively. The optimization problem can be formulated as the following integer
programming (IP) problem:
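$$
\begin{aligned}
\text{maximize}\quad & \sum_{i=1}^{N} SCAP_i(k+1) \cdot y_i(k+1) - \alpha \sum_{s=1}^{M}\sum_{i=1}^{N} \frac{1}{2}\,[z_{s,i}(k+1) - z_{s,i}(k)]^2 && (3.7)\\
\text{subject to}\quad & \sum_{i=1}^{N} z_{s,i}(k) \le 1, \quad \forall s \in S_{op},\ \forall k && (3.8)\\
& \sum_{s=1}^{M} z_{s,i}(k) \le 1, \quad \forall i \in \mathcal{N},\ \forall k && (3.9)\\
& y_i(k) = \sum_{s=1}^{M} z_{s,i}(k), \quad \forall i \in \mathcal{N},\ \forall k && (3.10)\\
& y_i(k),\ z_{s,i}(k) \in \{0, 1\}, \quad \forall s \in S_{op},\ i \in \mathcal{N},\ \forall k. && (3.11)
\end{aligned}
$$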
Each operation sniffer antenna in the set Sop is associated with a binary decision vector
zs,i(k) ∈ {0, 1}, i ∈ N , which is called sniffer channel assignment indicator, with zs,i(k) = 1
if the sniffer is assigned to channel i at slot k; 0 otherwise. yi(k + 1) is the binary variable
indicating whether or not the channel i is monitored by some sniffer in (k+1)-th slot. The IP
formulation is supposed to run iteratively: at k-th slot, after obtaining zs,i(k) and predicted
SCAPi(k + 1), we can acquire yi(k + 1) and zs,i(k + 1) by solving the IP problem. Clearly,
the sniffer channel assignment is updated once every slot, which allows our mechanism to
quickly adapt to the traffic dynamics.
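To make the per-slot problem concrete, the sketch below evaluates the objective of Eq. (3.7) for a candidate assignment and finds the best one by exhaustive search. Exhaustive search is only usable for tiny instances and is not the approximation algorithm used in this chapter; the function names, the tuple encoding of assignments and the SCAP values are assumptions made for illustration.

```python
from itertools import product
from typing import Optional, Sequence, Tuple

Assignment = Tuple[Optional[int], ...]  # assignment[s] = channel of antenna s, or None


def objective(scap: Sequence[float], new: Assignment, old: Assignment, alpha: float) -> float:
    """Value of Eq. (3.7) when moving from assignment `old` to `new`."""
    gain = sum(scap[i] for i in {c for c in new if c is not None})
    cost = 0.0
    for c_new, c_old in zip(new, old):
        if c_new != c_old:
            # (1/2) * sum_i [z_{s,i}(k+1) - z_{s,i}(k)]^2 for this antenna
            cost += 0.5 * ((c_new is not None) + (c_old is not None))
    return gain - alpha * cost


def best_assignment(scap: Sequence[float], old: Assignment, alpha: float):
    """Brute-force search over all conflict-free assignments (tiny cases only)."""
    options = [None] + list(range(len(scap)))
    best, best_val = None, float("-inf")
    for cand in product(options, repeat=len(old)):
        used = [c for c in cand if c is not None]
        if len(used) != len(set(used)):
            continue  # two antennas on one channel violates constraint (3.9)
        val = objective(scap, cand, old, alpha)
        if val > best_val:
            best, best_val = cand, val
    return best, best_val


scap = [0.9, 0.2, 0.6]  # made-up predictions SCAP_i(k+1) for three channels
print(best_assignment(scap, old=(1, 2), alpha=0.25))
print(best_assignment(scap, old=(1, 2), alpha=1.0))
```

With a small switching weight the first antenna moves from channel 1 to channel 0 because the gain in predicted activity outweighs one switch; with $\alpha = 1$ the same move no longer pays off and both antennas stay where they are.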
Note that the objective function Eq. (3.7) is comprised of two parts: the positive part
represents the expected number of captured active slots, while the negative part indicates
the channel switching costs. For simplicity, we use the number of channel switches between
every two subsequent slots to approximate the channel switching costs. In addition, we set a
switching cost weight α to represent the relative significance of channel switching costs w.r.t.
the gains obtained from captured slots, which is a constant value residing within [0, 1].
Here, we define α as the ratio of channel switching duration to the slot duration. In the
previous example with 5ms channel switching and 20ms slot, we have α = 1/4. However,
the definition of α can be further extended to incorporate more sophisticated metrics for
channel switching costs. For instance, we can further incorporate the probability that the
current channel will be idle in the next slot, because sniffer’s channel switching does not
incur frame loss overhead if the sniffer listens on an expected idle channel.
The constraints (3.8), (3.10) arise from the facts that one sniffer antenna can only monitor
one channel at a time, and that each channel should be covered by at most one sniffer antenna inside the
monitoring area. In particular, we put forward the second constraint because, if we allow
multiple antennas to listen on the same channel in the same area, their captured frames
will provide duplicate information. This IP problem can be shown to be NP-hard
following the proof in [87]; thus we need an approximation algorithm to solve the IP
problem.
LP rounding algorithm has been adopted to solve the IP problem [86, 87]. This algorithm
solves the LP-relaxation of the IP formulation, and then rounds the fractional results into
integral solutions using, for example, the probabilistic rounding algorithm (PRA) [98]. However,
this algorithm is only applicable to linear programs, while in our formulation
the objective function contains quadratic terms. We therefore reformulate the objective
function to remove the nonlinear terms. As $z_{s,i}(k)^2 = z_{s,i}(k)$ when $z_{s,i}(k) \in \{0, 1\}$, the
objective function Eq. (3.7) can be rewritten into a linear form as follows:
$$\sum_{i=1}^{N} SCAP_i(k+1) \cdot y_i(k+1) - \alpha \sum_{s=1}^{M}\sum_{i=1}^{N} \frac{1}{2}\,[z_{s,i}(k+1) + z_{s,i}(k) - 2 z_{s,i}(k) \cdot z_{s,i}(k+1)]$$
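The intermediate step is a direct expansion of the square, using $z^2 = z$ for binary $z$:

$$[z_{s,i}(k+1) - z_{s,i}(k)]^2 = z_{s,i}(k+1)^2 + z_{s,i}(k)^2 - 2 z_{s,i}(k)\, z_{s,i}(k+1) = z_{s,i}(k+1) + z_{s,i}(k) - 2 z_{s,i}(k)\, z_{s,i}(k+1).$$

Each term equals 1 when the two indicators differ and 0 when they agree, so the sum still counts assignment changes. Since $z_{s,i}(k)$ is a constant when the problem for slot $k+1$ is solved, the term can be written as $z_{s,i}(k) + [1 - 2 z_{s,i}(k)]\, z_{s,i}(k+1)$, which is linear in the decision variable $z_{s,i}(k+1)$.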
Note that zs,i(k) is already known before solving optimization problem. The PRA algorithm
has been proven [98] to produce (1− 1/e)-optimal sniffer channel assignment in linear time.
However, the execution of PRA disregards the constraint (3.10) completely. Hence, the
channel assignment obtained from PRA cannot prevent multiple antennas from
listening on the same channel. We refer to this as the channel conflict problem, and to the
sniffer antennas assigned to the same channel as a conflict sniffer set.
In response, we propose a heuristic sniffer fixing strategy to address the channel conflict
problem, which takes the following steps:
(1) Find all the conflict sniffer sets in the solution obtained from the PRA algorithm;
(2) Pick one sniffer antenna in each conflict sniffer set randomly, and fix it to the conflicted
channel;
(3) Run LP rounding algorithm again to get a new solution;
(4) Test whether the new solution contains any conflict sniffer set: if yes, go to step (1);
otherwise return the solution.
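The control flow of steps (1) to (4) can be sketched as below. The LP rounding solver itself is not implemented: it is passed in as a callable that is assumed to honor the pinned antenna-to-channel pairs, and `toy_solver` is a deliberately naive, hypothetical stand-in that sends every free antenna to the best remaining channel so that conflicts actually occur. All names here are illustrative.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Sequence

Assignment = Dict[int, Optional[int]]  # antenna index -> channel index (None = idle)


def find_conflict_sets(assignment: Assignment) -> Dict[int, List[int]]:
    """Step (1): map each over-subscribed channel to its conflict sniffer set."""
    by_channel: Dict[int, List[int]] = defaultdict(list)
    for antenna, channel in assignment.items():
        if channel is not None:
            by_channel[channel].append(antenna)
    return {c: sorted(a) for c, a in by_channel.items() if len(a) > 1}


def fix_conflicts(solve: Callable[[Dict[int, int]], Assignment],
                  rng: random.Random) -> Assignment:
    """Repeat solve-and-pin until no channel is shared by two antennas.

    `solve(fixed)` must keep every pinned antenna on its channel and keep
    other antennas off pinned channels; otherwise the loop may not end.
    """
    fixed: Dict[int, int] = {}
    solution = solve(fixed)
    while True:
        conflicts = find_conflict_sets(solution)      # step (1)
        if not conflicts:                             # step (4)
            return solution
        for channel, antennas in conflicts.items():   # step (2)
            fixed[rng.choice(antennas)] = channel
        solution = solve(fixed)                       # step (3)


def toy_solver(scap: Sequence[float], n_antennas: int):
    """Hypothetical stand-in for LP rounding, used only to exercise the loop."""
    def solve(fixed: Dict[int, int]) -> Assignment:
        taken = set(fixed.values())
        free = [c for c in sorted(range(len(scap)), key=lambda c: -scap[c])
                if c not in taken]
        out: Assignment = dict(fixed)
        for antenna in range(n_antennas):
            if antenna not in fixed:
                out[antenna] = free[0] if free else None
        return out
    return solve


result = fix_conflicts(toy_solver([0.9, 0.6, 0.2], 3), random.Random(0))
print(sorted(result.values()))  # [0, 1, 2]
```

Every round that finds a conflict pins at least one more antenna, so with a solver that respects the pins the loop needs at most as many rounds as there are antennas.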
The above heuristic channel assignment strategy fixes one sniffer antenna to one channel
in every round by adding constraints; thus it is guaranteed to provide a feasible channel
assignment for all the sniffers within linear time, which turns out to be a near-optimal
solution for the sniffer channel assignment problem, as shown in Section 3.4.3. In the end,
all conflicts are resolved after a sequence of LP rounding runs, which
guarantees the convergence of the algorithm.
We call the channels to be assigned potential channels. The resulting channel assignment
strategy provides the sniffers with the assignments of potential channels for the next slot.
Then, the sniffer center checks every potential channel to determine whether it has already
been monitored: if yes, it skips assigning this channel; if no, it selects a sniffer antenna which
is not listening on any other potential channels to monitor this channel. In this way, the
channel switching costs are further alleviated.
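A minimal sketch of this dispatching rule follows. The function name and data layout are illustrative assumptions; it also assumes that the current assignment has no channel conflict and that there are no more potential channels than sniffers.

```python
def dispatch_sniffers(current, potential):
    """Map potential channels onto sniffers while avoiding unnecessary switches.

    current:   {sniffer_id: channel} for the present slot (one sniffer per channel).
    potential: channels selected for the next slot.
    Returns (next_assignment, number_of_switches).
    """
    wanted = list(dict.fromkeys(potential))          # de-duplicate, keep order
    already_monitored = set(current.values())
    # Only sniffers that are not sitting on a potential channel may be moved.
    movable = [s for s in sorted(current) if current[s] not in wanted]
    next_assignment = dict(current)
    switches = 0
    for channel in wanted:
        if channel in already_monitored:
            continue                                 # keep the sniffer that is already there
        if not movable:
            raise ValueError("more potential channels than available sniffers")
        next_assignment[movable.pop(0)] = channel
        switches += 1
    return next_assignment, switches


print(dispatch_sniffers({0: 1, 1: 2, 2: 3}, [2, 3, 5]))
```

In the example, channels 2 and 3 keep their sniffers and only sniffer 0 moves, so a single switch is incurred.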
3.4.2 User-level Quality-of-Monitoring Optimization
The objective of UL-QoM optimization is to maximize the expected number of active users
monitored. In order to capture the user-level information, it is indispensable to identify
the source of each frame, even for encrypted frames. Let Ui(k) for i ∈ N denote the number
of active users operating in channel i at the k-th slot. We assume that once a sniffer is tuned
to a channel, it covers all the active users operating in this channel. We do not consider
the channel switching costs in this case, because there are typically multiple frames from a
single user, so the loss of a small number of frames due to channel switching does not have a
big impact on the number of users measured. The UL-QoM optimization problem can be
cast as the following IP problem:
$$\text{maximize} \quad \sum_{i=1}^{N} U_i(k) \cdot \mathrm{SCAP}_i(k+1) \cdot y_i(k+1) \tag{3.12}$$
subject to (3.8)−(3.11).
The above optimization problem can be solved using exactly the same approximation
algorithm illustrated in the previous section, so the details are omitted here. Note that Ui(k) can be
measured by counting the number of different MAC addresses from frames passing through
the AP running in channel i within time slot k. In practice, Ui(k) may not be available at
the beginning of the k-th slot, so it can be approximated by the measurement of Ui(k − 1),
assuming users remain operating in the same channel for the next time slot. Small errors
in estimating Ui(k) would not affect the performance much. In the extreme case when false
MAC addresses are inserted by the attackers, a more sophisticated approach is required. For
instance, machine learning methods for Internet traffic classification [99] can be used
to differentiate users based on their identified traffic types. This is out of the scope
of this chapter.
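As a small illustration of the measurement described above, the sketch below counts distinct source MAC addresses per channel and evaluates the objective of (3.12) for a given channel selection. The frame representation `(channel, source_mac)` and both function names are assumptions made for this example; spoofed addresses are not handled.

```python
from collections import defaultdict


def count_active_users(frames, num_channels):
    """Return [U_0, ..., U_{N-1}]: distinct source MACs seen per channel in one slot.

    frames: iterable of (channel_index, source_mac) pairs observed during the slot.
    """
    seen = defaultdict(set)
    for channel, mac in frames:
        seen[channel].add(mac.lower())          # normalize case before comparing
    return [len(seen[i]) for i in range(num_channels)]


def ul_qom_objective(users, scap, y):
    """Objective of (3.12): sum_i U_i(k) * SCAP_i(k+1) * y_i(k+1)."""
    return sum(u * p * sel for u, p, sel in zip(users, scap, y))


frames = [
    (0, "aa:bb:cc:00:00:01"),
    (0, "AA:BB:CC:00:00:01"),   # same station, different letter case
    (0, "aa:bb:cc:00:00:02"),
    (2, "aa:bb:cc:00:00:03"),
]
users = count_active_users(frames, num_channels=3)
print(users, ul_qom_objective(users, scap=[0.5, 0.9, 1.0], y=[1, 0, 1]))
```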
3.4.3 Numerical Analysis
In this section, we present numerical results for our approximation algorithm and compare
them to the upper bound of the problem. Without loss of generality, we focus on the FL-QoM
optimization problem.
Deriving an Upper Bound: The complexity of the optimization problem formulated in
Section 3.4.1 stems from the binary yi(k) and zs,i(k) variables, for all k. To derive an upper
bound for the problem, we relax the integer (binary) requirement on yi(k) and zs,i(k) to
0 ≤ yi(k) ≤ 1 and 0 ≤ zs,i(k) ≤ 1. The relaxed problem is a standard LP problem, the
solution of which can be obtained in polynomial time. Since the relaxation enlarges the
feasible region, the optimal value of the relaxed LP problem is an upper bound for the
original optimization problem.
Figure 3.6: Normalized objective (with respect to the computed upper bound) for 20 channels and 10 sniffers with α=0.3
Figure 3.7: Normalized objective (with respect to the computed upper bound) for 30 channels and 20 sniffers with α=0.3
Numerical Results: We consider N = 20 or 30 channels and M = 10 or 20 sniffer antennas.
SCAP values are randomly generated for every channel over 1000 slots. We first present the
simulation results for 20 channels and 10 sniffer antennas. We use the PRA approximation
and sniffer fixing algorithms to determine a feasible solution, which serves as a lower bound,
and compare the corresponding objective value with the upper bound. Fig. 3.6 shows
the normalized objective values with respect to the computed upper bound (i.e., feasible
solution/upper bound) for 20 channels and 10 sniffer antennas. The average normalized
objective value over the 1000 slots is 0.95 and the standard deviation is 0.03. Fig. 3.7
shows the normalized objective for 30 channels and 20 sniffer antennas, with an average
of 0.96 and a standard deviation of 0.01. We further adjust the switching cost weight α
to examine the variations of the normalized objective values. Fig. 3.8 shows that the gap
between the achieved solution and the upper bound remains small for different α.
Figure 3.8: Normalized objectives for different α
Since the actual optimal value lies between the feasible solution value and the upper bound,
the solution value of our approximation algorithm must be at least as close to the optimal
value as the foregoing normalized ratio (normalized objective value) indicates. Thus, the
solution produced by our approximation algorithm is near-optimal. Finally, we run an
experiment to compare the average number of captured active slots using our algorithm
with the upper bound; the result is shown in Fig. 3.9. We notice that the difference between
the monitoring solution and the upper bound stays small over time, which supports the
near-optimality of our approximation algorithm from the experimental perspective.
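The bounding argument can be written out explicitly. The symbols below are introduced only for this illustration: f_k is the objective value of the feasible solution in slot k, f_k^* the unknown optimal value, and \bar{f}_k the LP upper bound, all assumed positive.

```latex
f_k \;\le\; f_k^{*} \;\le\; \bar{f}_k
\quad\Longrightarrow\quad
\frac{f_k}{\bar{f}_k} \;\le\; \frac{f_k}{f_k^{*}} \;\le\; 1 .
```

The inequality holds slot by slot, so it also holds for the averages: an average normalized objective of 0.95 with respect to the upper bound means the feasible solutions attain, on average, at least 95% of the true optimum.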
Figure 3.9: Number of captured active slots using our algorithm vs. upper bound for 20 channels and 10 sniffers with α=0.3
3.4.4 Complexity Analysis
In this section, we analyze the complexity of the above approximation algorithms. We first
analyze the complexity of the LP rounding algorithm, which involves two
major steps: (1) solving the LP relaxation, and (2) executing the PRA algorithm. We notice that
the above two IP problems contain (N + MN) unknown variables. Therefore, the complexity
of solving the LP relaxation of the IP formulation is O((N + MN)^3 / log(N + MN))
[86], which is determined by the complexity of the LP solver. On the other hand, the PRA
algorithm has linear complexity O(M · N), governed by the input vector size (M · N)
[98]. Thus, the LP rounding algorithm runs with polynomial time complexity
O((N + MN)^3 / log(N + MN)).
Second, the heuristic sniffer fixing strategy solves the channel conflict problem by running
the LP rounding algorithm repeatedly. In the worst case, it invokes LP
rounding M times. Hence, the sniffer channel assignment problem can be solved with an
overall worst-case complexity of O(M · (N + MN)^3 / log(N + MN)). The efficiency evaluation
of the algorithm implementation is presented in Section 3.5.2.
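As a worked instance of these expressions, take the setting used in the numerical analysis, N = 20 channels and M = 10 sniffer antennas:

```latex
N + MN = 20 + 10 \cdot 20 = 220 \ \text{variables}, \qquad
\text{PRA input size } M \cdot N = 200, \qquad
\text{at most } M = 10 \ \text{LP rounding invocations}.
```

The worst-case bound therefore corresponds to ten LP solves over 220 variables each.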
3.5 Evaluation
In this section, we conduct extensive simulations and experiments to evaluate the performance
of SpecMonitor for CRNs. The simulations leverage synthetic traces, which allow
us to vary the number of channels and sniffers, as well as the traffic patterns of different
users. We also carry out experiments and test the performance of SpecMonitor on real traces
collected from the experiments. Aside from implementing the proposed SpecMonitor framework,
we also implemented two baseline algorithms and one previously proposed algorithm,
listed as follows, for performance comparison.
• Random channel assignment: the sniffer channels are randomly assigned.
• Greedy channel assignment: the sniffers are always assigned to the predicted busiest
channels based on SCAP at every sensing slot, i.e., the channels with the largest SCAP.
• Support Vector Regression (SVR) channel assignment: the sniffers are assigned
to the channels in which the next frame is predicted to arrive within a short period, based
on the frame interarrival time prediction using the SVR method [89].
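The two baselines in the list above are simple enough to sketch directly. The function names are illustrative, and the random baseline here assigns sniffers to distinct channels, which is an assumption of this sketch. The SVR baseline depends on the predictor of [89] and is not reproduced.

```python
import random


def random_assignment(num_channels, num_sniffers, rng):
    """Random baseline: pick distinct channels uniformly at random."""
    return sorted(rng.sample(range(num_channels), num_sniffers))


def greedy_assignment(scap, num_sniffers):
    """Greedy baseline: pick the channels with the largest SCAP values."""
    ranked = sorted(range(len(scap)), key=lambda i: scap[i], reverse=True)
    return sorted(ranked[:num_sniffers])


scap = [0.1, 0.9, 0.4, 0.8]          # made-up SCAP values for four channels
print(greedy_assignment(scap, 2))
print(random_assignment(len(scap), 2, random.Random(0)))
```

Neither baseline accounts for the switching cost weight α, which is why both can switch channels more often than an assignment that optimizes the full objective.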
We assume the PUs’ presence can be detected promptly by both inspection and operation
sniffers, as illustrated in Section 3.3.1. In the following sections, we first evaluate the performance
of the proposed secondary user channel access model, and then we evaluate the
real-time monitoring performance of SpecMonitor. Finally, the frame capturing performance
and user capturing performance are examined. The default system parameters used in
the evaluation are shown in Table 3.2.
Table 3.2: Parameters

Parameter                                          Value
W                                                  50
nw                                                 10
tw                                                 25
Sensing slot length ∆                              20 ms
Sensing period                                     2 ms
Multi-slot updating                                5 slots
Gaussian mean values of synthetic traces           [3, 42]
Gaussian standard deviation of synthetic traces    2
3.5.1 Performance of Secondary User Channel Access Model
The proposed secondary user channel access model uses online non-parametric density
estimation to obtain an accurate estimate of the distribution of the active slot interarrival time. As
mentioned in Section 3.3.2, our channel model uses a sliding window technique to track the
dynamic traffic patterns of SUs. We generate a data set with a changing data distribution (the pdf
of the data set changes from e^{−x} to (1/5)e^{−x/5}). The tracking performance is shown in Fig. 3.10(a)
with W=50, nw=10, tw=25 (unless otherwise noted, those parameters are set as default),
from which we can see that the estimated distribution gradually changes from the original
data distribution to the current data distribution. The transition starts when the current
window takes data from both distributions and completes when the whole current window
contains only data from the new distribution. The tracking speed is determined by the
window size, i.e., a larger window size causes more delay during the transition. We can
select the window size according to the variability of traffic patterns, as mentioned in Section
3.3.2.
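The tracking behavior can be illustrated with a small sliding-window kernel density estimator. This is a simplified sketch rather than the estimator used in this chapter: it uses a Gaussian kernel with the normal-reference rule of thumb (1.06 σ n^(−1/5)) for the bandwidth instead of the fixed point iteration of [95], and it omits the KS-test-based combination of historical samples. The class name and the default window length of 50 are assumptions of the example.

```python
import math
from collections import deque


class SlidingWindowKDE:
    """Gaussian KDE over the most recent `window_size` samples."""

    def __init__(self, window_size=50):
        self.samples = deque(maxlen=window_size)   # old samples fall out automatically

    def update(self, x):
        self.samples.append(float(x))

    def bandwidth(self):
        n = len(self.samples)
        if n < 2:
            raise ValueError("need at least two samples")
        mean = sum(self.samples) / n
        var = sum((x - mean) ** 2 for x in self.samples) / (n - 1)
        if var == 0.0:
            raise ValueError("samples are all identical")
        # Normal-reference rule of thumb; a stand-in for the bandwidth selector of [95].
        return 1.06 * math.sqrt(var) * n ** (-0.2)

    def pdf(self, x):
        n = len(self.samples)
        h = self.bandwidth()
        norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))
        return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in self.samples)


kde = SlidingWindowKDE(window_size=3)
for value in (1, 2, 3, 101, 102, 103):     # the data distribution shifts half way
    kde.update(value)
print(kde.pdf(102) > kde.pdf(2))           # the estimate follows the new data
```

Once the window holds only samples from the new distribution, the old data no longer influences the estimate, which mirrors the completed transition described above.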
However, a window that is too small degrades the modeling accuracy, which motivates the
combination of historical samples. Recall that only the data sets passing the KS tests can be
combined to improve the estimation accuracy. Fig. 3.10(b) compares the density estimates
with and without data combining.
We can see the significant improvement brought by the accumulative data combination.
Figure 3.10: Performance of secondary user channel access model. (a) The tracking performance when the traffic pattern is changing (probability density vs. data value; estimated density with original data, during the change, and after the change). (b) Performance improvement with data combination (probability density vs. data value; real density of the data set, estimated density with uncombined data, and estimated density with combined data).
3.5.2 Real-Time Monitoring Performance
As illustrated in Sections 3.3 and 3.4, our monitoring framework has a very stringent real-time
requirement. Basically, we are required to complete the channel assignments before a slot
ends, i.e., within 20 ms according to our setting. In this section, we evaluate the running time
of SpecMonitor using experiments, and propose to relax the stringent requirement without
compromising the monitoring performance. We implement the SpecMonitor framework, including
the channel access model and the near-optimal channel assignment algorithm, using
MATLAB R2011b on a Windows machine with a 3.2 GHz Intel Xeon W3565 CPU and 18 GB of
memory. In our original design, whenever there is a new observation of active slot interarrival time,
SCAP values are updated and channels are reassigned.
We carry out experiments to measure the running time of the monitoring framework and
break the running time down into different sections to identify the bottleneck, as shown in
Table 3.3. Note that the recorded running time is the average value over 1000
slots. The overall running time is 91.2 ms, with 97.5% of the time spent on the KDE operation and
the optimization algorithm. By delving into the KDE function and the optimization algorithm, we
find that the bottleneck of the KDE operation is the fixed point iteration algorithm [95], which consumes
95% of the overall time of each KDE invocation, while the bottleneck of the optimization algorithm
is the linprog MATLAB function (52%) and the LP rounding algorithm with the sniffer fixing
strategy (40%). In our experiment, the sniffer fixing strategy runs through the LP rounding
algorithm two times on average until a valid channel assignment is generated. Consequently,
the overall running time far exceeds a slot duration. One way to address this issue is to port
the code to a more computationally efficient programming language such as C.
An alternative and more viable approach is to relax the stringent real-time requirement.
Instead of updating SCAP and assigning channels every slot, we relax the per-slot updating
requirement into a T-slot updating requirement, which allows SCAP to be renewed every
T slots. To satisfy the relaxed requirement, the sniffer center only needs to check for new
observations and incorporate them in the channel usage model every T slots. In other words,
SCAP gets updated and channels are reassigned only if there is at least one new observation
during T slots. In our implementation, we can set T = 5, so that the implemented model
update and channel assignment complete before the next channel assignment is carried
out, i.e., within 100 ms or 5 slots (as 91.2 ms < 100 ms). We show in the following sections
that per 5-slot updating does not sacrifice much monitoring performance compared
with per-slot updating; thus it satisfies the real-time processing requirement while retaining
excellent performance. Note that we neglect the wired-side communication costs between inspection
sniffers and the sniffer center, which are on the order of microseconds.
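The choice T = 5 follows from a one-line calculation: the smallest number of 20 ms slots that covers the 91.2 ms running time. The sketch below spells this out together with the update rule described above; both function names are illustrative.

```python
import math


def min_update_period(processing_ms, slot_ms):
    """Smallest T (in slots) such that processing finishes within T slots."""
    return max(1, math.ceil(processing_ms / slot_ms))


def is_update_slot(slot_index, period, new_observations):
    """Update SCAP and reassign channels every `period` slots, and only
    if at least one new observation arrived since the last update."""
    return slot_index % period == 0 and new_observations > 0


T = min_update_period(processing_ms=91.2, slot_ms=20)
print(T, T * 20)    # number of slots and the resulting time budget in ms
```

A faster implementation would shrink the running time and therefore the smallest admissible T.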
Table 3.3: Average running time with 20 channels and 10 sniffer antennas (in ms)

KDE Operation    KS-test Operation    SCAP Computation    Optimization Algorithm    Total Time
58.1             0.6                  0.6                 30.8                      91.2
3.5.3 Frame Capturing Performance
In this section, the frame capturing performance of different channel assignment algorithms is
evaluated. First, we generate synthetic traces to evaluate the frame capture rate and channel
switching cost. Frame capture rate is defined as the ratio of the number of captured frames
to the overall number of frames passing through all the channels up to the current time,
while channel switching cost is represented by the negative part of Eq. 3.7. Then, we collect
real-world traces using AirPcap Nx [100] with Wireshark. The traffic traces are captured
from multiple channels of operative WLANs (802.11g mode) to emulate the scenarios in
WhiteFi networks. We evaluate SpecMonitor by comparing its frame capturing performance
with other algorithms. For all the following evaluations, we measure the performance of
algorithms running through 1000 slots for 100 rounds.
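The two metrics can be computed as sketched below. The switching cost follows the switching term of the FL-QoM objective, α Σ_s Σ_i ½[zs,i(k+1) + zs,i(k) − 2 zs,i(k) · zs,i(k+1)], with the binary matrices given as nested lists. The function names and the example values are assumptions of this sketch.

```python
def frame_capture_rate(captured_frames, total_frames):
    """Captured frames divided by all frames seen on all channels so far."""
    return captured_frames / total_frames if total_frames else 0.0


def switching_cost(z_prev, z_next, alpha):
    """alpha * sum_s sum_i 0.5 * [z_next + z_prev - 2 * z_prev * z_next].

    z_prev[s][i] and z_next[s][i] are 1 if sniffer s listens on channel i
    in slot k and slot k+1 respectively, and 0 otherwise.
    """
    total = 0.0
    for row_prev, row_next in zip(z_prev, z_next):
        for a, b in zip(row_prev, row_next):
            total += 0.5 * (b + a - 2 * a * b)
    return alpha * total


z_k = [[1, 0, 0], [0, 0, 1]]       # sniffer 0 on channel 0, sniffer 1 on channel 2
z_k1 = [[0, 1, 0], [0, 0, 1]]      # sniffer 0 moves to channel 1, sniffer 1 stays
print(switching_cost(z_k, z_k1, alpha=0.3))
print(frame_capture_rate(45, 50))
```

Each switch changes two entries of a sniffer's row (one from 1 to 0, one from 0 to 1), and the factor ½ makes it count as a single switch.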
Synthetic Traces
First, we generate synthetic time series traces to represent frame interarrival time, using a
Gaussian distribution with an exponential correlation function. Each trace corresponds to
the traffic generated in one channel, with different mean values to simulate different traffic
loads. We evaluate the capturing performance w.r.t. different switching cost weights α.
Generally speaking, one channel switch incurs a penalty of losing a fraction α of a slot, α ∈ [0, 1].
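One common way to obtain a Gaussian series with an exponentially decaying correlation is a first-order autoregressive process, whose autocorrelation at lag τ is ρ^|τ|. The sketch below uses that construction; it is an assumption of this example, as are the correlation parameter `rho`, the positivity floor, and the function name. The mean and standard deviation in the usage lines are in the range listed in Table 3.2.

```python
import math
import random


def synthetic_interarrivals(n, mean, std, rho, rng, floor=1e-3):
    """Gaussian AR(1) trace with autocorrelation rho**|lag|.

    The stationary mean and standard deviation are `mean` and `std`.
    Values are clipped at `floor` so that interarrival times stay positive.
    """
    innovation_std = std * math.sqrt(1.0 - rho * rho)
    x = rng.gauss(mean, std)                   # start in the stationary distribution
    trace = []
    for _ in range(n):
        trace.append(max(x, floor))
        x = mean + rho * (x - mean) + rng.gauss(0.0, innovation_std)
    return trace


rng = random.Random(7)
trace = synthetic_interarrivals(n=1000, mean=10.0, std=2.0, rho=0.5, rng=rng)
print(len(trace), sum(trace) / len(trace))
```

Generating one trace per channel with a different `mean` would emulate channels with different traffic loads.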
We assume the training process of the SVR scheme has already been done, which takes about 35
training samples [89]. Fig. 3.11(a) shows the frame capture rates of different methods. With
the increase of α, the frame capture rates of all three compared schemes fall steadily because
of the increasing penalty for channel switching. However, with a large α, SpecMonitor
forces the sniffers to switch channels only when the reward from channel switching is higher
than the penalty; otherwise, it keeps the sniffers in their current channels. In this
way, SpecMonitor retains an excellent frame capture performance. The SVR
scheme performs best when the channel switching costs are negligible (α = 0 or 0.1),
which means SVR achieves accurate estimates of frame interarrival time. However, when
α grows larger than 0.2, the frame capture rate of the SVR scheme drops steadily because of the
aggravating switching penalty caused by frame loss. Note that the capturing performance of
SpecMonitor also drops a bit due to higher switching costs, but then recovers to surpass
the performance of all other schemes.
Figure 3.11: Performance with different methods using synthetic time series data (5 channels, 3 sniffer antennas). (a) Frame capture rates and (b) channel switching costs versus α for SpecMonitor, SVR, Random, and Greedy.
Fig. 3.11(b) shows channel switching costs w.r.t. α, from which we can see that the SVR method
induces the highest switching costs among all methods, because of its unslotted and heuristic
switching strategy. In contrast, the switching cost of SpecMonitor remains the lowest.
Finally, Fig. 3.12(a) shows the capturing capabilities w.r.t. the number of sniffer
antennas. The frame capturing performance of all the methods keeps growing with the
increasing number of antennas, and SpecMonitor achieves the highest frame capture rate. We also
compare channel switching costs in Fig. 3.12(b). With more sniffer antennas, the channel
switching cost of the SVR method decreases significantly, because the increased traffic capturing
capability restrains the SVR method from aggressive channel switching. Meanwhile,
the other methods have much lower and more stable channel switching costs.
Figure 3.12: Performance with varied number of sniffer antennas using different methods (α=0.25, 20 channels). (a) Frame capture rates and (b) channel switching costs versus the number of sniffers for SpecMonitor, SVR, Random, and Greedy.
Real Traces
We collect the real traces from an 802.11g WLAN, captured by a sniffer listening on the
channel established by one AP and client pair running various applications. The captured
traces include both the uplink traffic to the AP and the downlink traffic from the AP. We consider
five different types of trace data (FTP, BT, Web Browsing, Skype Voice and Skype Video)
with one trace per channel. FTP and BT traces are obtained by running an automated script
on the client to download/upload several files from/to a server continuously, and we write
another automated script to browse several websites to collect the Web Browsing trace. The Skype
voice and video traces are collected by connecting to a client using a Skype voice call
or a Skype video call. We evaluate the performance with seven channels of real-world traffic:
five carry the trace types above, while the additional two channels contain mixed traffic
patterns. Namely, one is the traffic
combined from two clients using Skype Voice and BT, and the other is generated from
two clients using Skype Voice and Web Browsing. The performance is shown in Fig. 3.13(a)
and Fig. 3.13(b), from which we can see that the SVR method performs even worse than the random
scheme. The reason is that the SVR method takes a long time to retrain when the
predicted value deviates substantially from the genuine one because of real-world traffic
dynamics. Frequent retraining and channel switching operations significantly deteriorate the
capturing capability of the SVR method. SpecMonitor, in contrast, retains the best performance,
except for small α, where the greedy method performs better because channel switching
incurs only a small penalty. This comparison also indicates that our model can accurately
capture the traffic statistics regardless of whether the traffic is interleaved or not.
Figure 3.13: Performance using real-world traffic (7 channels, 4 sniffer antennas). (a) Frame capture rates and (b) channel switching costs versus α for SpecMonitor, SVR, Random, and Greedy.
Figure 3.14: Average frame capture rate comparison of multi-slot updating and per-slot updating with 20 channels and 10 sniffer antennas (frame capture rate versus α for per-slot, 5-slot, and 10-slot updating)
Comparison of multi-slot updating and per-slot updating
As mentioned in Section 3.5.2, multi-slot updating relaxes the real-time requirement. In
this section, we compare the performance of multi-slot updating with per-slot updating.
Fig. 3.14 presents the frame capturing performance comparison for multi-slot updating and
per-slot updating, which shows a slight performance degradation with multi-slot updating.
Interestingly, 5-slot updating achieves a better frame capture performance when
α = 0.7, because it incurs lower switching costs by switching at most once every 5 slots. However,
in most cases, per-slot updating captures more frames, due to its more rapid adaptation to
the traffic dynamics. From this performance comparison, we conclude that 5-slot updating
retains an excellent frame capturing performance while fulfilling the real-time requirements
presented in Section 3.5.2.
The intuition behind the fact that T-slot updating achieves similar results is that the data
distribution presented in our experiments does not change rapidly over the course of T slots.
However, this does not hold for all traffic scenarios. For example, when
the traffic statistics change rapidly, T should be assigned a small value. The exact
value of T can be picked using the above performance comparison method, by evaluating
the performance of various T values and selecting the largest one with acceptable performance
degradation.
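The selection rule in the last sentence can be sketched as follows. The function name is illustrative and the capture rates in the usage lines are made-up placeholders, not measurements from this chapter.

```python
def pick_update_period(capture_rate_by_period, max_degradation):
    """Return the largest T whose capture rate is within `max_degradation`
    of per-slot updating (T = 1).

    capture_rate_by_period: {T: measured frame capture rate}, must include T = 1.
    """
    baseline = capture_rate_by_period[1]
    acceptable = [
        period
        for period, rate in capture_rate_by_period.items()
        if baseline - rate <= max_degradation
    ]
    return max(acceptable)


# Hypothetical measurements, for illustration only.
rates = {1: 0.90, 5: 0.88, 10: 0.80}
print(pick_update_period(rates, max_degradation=0.03))
```

In practice, the admissible values of T are also bounded from below by the running time of the implementation, as discussed in Section 3.5.2.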
3.5.4 User Capturing Performance
Finally, we evaluate the performance for maximizing UL-QoM using synthetic data. We
assume different channels contain different numbers of SUs, and the numbers change
dynamically within the range [0, 10] (assuming a uniform distribution); also, the frame interarrival time is
exponentially distributed with mean values residing in [1, 40], specifying the traffic pattern.
First, we compare the expected number of captured users per slot using three different
monitoring schemes in Fig. 3.15(a). The result indicates that SpecMonitor is able to capture
more users per slot, because the optimized monitoring strategy keeps the sniffers watching
the channels with more users.
Then, we define Active User Capture Rate as the ratio of the number of active users captured
to the overall number of active users that appeared in all the channels. The performance
of active user capture rate w.r.t. different numbers of sniffers is shown in Fig. 3.15(b), from
which we notice that SpecMonitor can select the best sets of channels to maximize the number
of active users captured during the monitoring period. The result implies that SpecMonitor
significantly outperforms the two baseline schemes in terms of user capturing performance. We
plan to use real-world traffic to examine the user capturing performance of SpecMonitor coupled
with reliable user identification schemes such as radiometric signature [101] in our future
work.
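The Active User Capture Rate defined above can be computed as in the following sketch. It relies on the assumption stated in Section 3.4.2 that a sniffer tuned to a channel covers all active users in that channel; the function name and the input layout are illustrative.

```python
def active_user_capture_rate(users_per_slot, monitored_per_slot):
    """Captured active users divided by all active users over the monitoring period.

    users_per_slot:     list over slots of [U_0, ..., U_{N-1}] (active users per channel).
    monitored_per_slot: list over slots of the channel indices being monitored.
    """
    captured = 0
    total = 0
    for users, monitored in zip(users_per_slot, monitored_per_slot):
        total += sum(users)
        captured += sum(users[i] for i in set(monitored))
    return captured / total if total else 0.0


users = [[3, 0, 2], [1, 4, 0]]      # two slots, three channels (made-up counts)
monitored = [[0], [1, 2]]           # channels watched in each slot
print(active_user_capture_rate(users, monitored))
```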
Figure 3.15: User capture performance with 20 channels. (a) User-Level QoM Objective: expected number of captured users per slot for SpecMonitor, Greedy, and Random with 5, 10, and 15 sniffers. (b) Active User Capture Rate versus the number of sniffers for SpecMonitor, Random, and Greedy.
3.6 Summary
In this chapter, we have introduced a systematic passive monitoring framework, SpecMonitor,
for Wi-Fi-like CRNs to maximize two levels of QoM while incorporating switching costs. Both
the primary user and secondary user channel usage patterns are considered to optimize the
monitoring strategy. Specifically, we proposed an online non-parametric density estimation
scheme to learn and predict the time-evolving mixed traffic pattern from SUs. Based on
the predicted traffic pattern, the optimization problems of sniffer channel assignment are
formulated, for which we designed near-optimal monitoring algorithms. Our simulation and
experimental results both showed that SpecMonitor has superior capturing capability with
low channel switching overhead. One major limitation of the SpecMonitor system is that
it requires a substantial amount of traffic of interest on the channel in order to
produce a reasonable channel access model. If the traffic amount over a channel is small,
the produced model may be unreliable. In future research, we will consider the impact of
traffic amount on the channel access model. We plan to evaluate the modeling accuracy in
real time. A model built with a small traffic amount will be deemed unreliable and
will not be used for future predictions and channel assignments. In addition, we will explore
the capability of the proposed framework in identifying malicious or misbehaving network
activities. We also plan to design customized quality of monitoring objectives for different
security services, such as optimizing the attack signature generation or facilitating link loss
analysis.
Chapter 4
MIMO-based Jamming Resilient OFDM Communications in Wireless Networks
Jamming is a common but serious threat to wireless communication. In particular, reactive
jamming, which has been implemented using software defined radios, is considered the most
powerful jamming attack, as the attack efficiency is maximized while the risk of being detected
is minimized. Currently, no effective anti-jamming solutions exist to enable reliable
OFDM wireless communications in the presence of a reactive jamming attack. On the
other hand, MIMO has emerged as a technology of great research interest in recent years,
mostly due to its capacity gain. In this chapter, we explore the use of MIMO technology for
jamming resilient OFDM communication, especially its capability to communicate against
the powerful reactive jammer. We first investigate the jamming strategies and their impacts
on the MIMO-OFDM receivers. We then present a MIMO-based anti-jamming scheme that
exploits the interference cancellation and transmit precoding capabilities of MIMO technology
to turn a jammed non-connectivity scenario into an operational network. Our testbed
evaluation shows the destructive power of the reactive jamming attack, and also validates the
efficacy and efficiency of our defense mechanisms in the presence of numerous types of reactive
jammers.
4.1 Related Work
Jamming Attack and Defense Mechanisms. The mainstream jamming defense mechanisms
rely on FHSS and DSSS, either requiring the communicating parties to pre-share
secret keys [102, 103], or letting them communicate without pre-shared keys [104, 105]. Recently,
powerful reactive jamming has attracted many researchers’ interest. For instance, [34]
demonstrates the feasibility of reactive jamming using software-defined radios. [25] proposes
a detection mechanism to unveil reactive jammers in sensor networks. [106] investigates the
impacts of reactive smart jamming attacks on IEEE 802.11 rate adaptation algorithms. Recent
studies consider defending against more powerful wideband and high power jamming attacks
[107, 108]. However, both of them only support low data rate communications, and
both only work for conventional wireless communications
that are not OFDM-based. In [109], Vo-Huu et al. propose a mechanical beamforming
scheme and a digital interference cancellation algorithm to cancel jamming signals. However,
they can only deal with static adversaries and require additional hardware costs, while our
mechanism is purely digital and is capable of dealing with mobile attackers as long as the
channel estimation is accurate. Further, they only focus on non-OFDM systems.
In the context of jamming resistant MIMO/OFDM communications, Rob Miller et al. [110]
study various jamming attacks that disrupt MIMO communication by targeting its channel
estimation procedure. Specifically, the adversary interferes with the preambles or pilots
to make the sender and receiver perform false estimation. In a similar vein, some recent works
investigate and attempt to alleviate the impacts of jamming attacks on OFDM systems.
Han et al. [111] propose a jammed pilot detection and excision algorithm for OFDM systems
to counteract a narrow-band jammer that jams the pilot tones. Clancy et al. [112] further
introduce a pilot nulling attack that minimizes the received pilot energy to be more destructive,
and provide mitigation schemes that randomize the location and value of pilot tones.
However, they both specifically focus on adversaries jamming pilot tones, who need to
know the pilot locations and also demand very tight synchronization. Moreover, their
defense mechanisms will fail to recover signals when all the OFDM subcarriers including the
pilots are jammed, as in the case of a reactive jamming attack. Note that it is extremely difficult
for the adversary to synchronize his/her transmission with the legitimate sender during
the short channel sounding period, while this chapter focuses on a more practical reactive
jamming attack.
Interference Cancellation Mechanisms. Research efforts in the interference management
area have developed novel interference cancellation techniques to improve the network
throughput [36], medium access protocol [38] and robustness [37] of MIMO networks. [36]
proposes a centralized solution to combine interference cancellation and alignment for decoding
concurrent transmissions in MIMO networks, doubling the throughput of MIMO LANs.
Lin et al. [38] extend the previous work by presenting a distributed random access protocol.
Shen et al. [113] further develop a rate adaptation scheme through learning clients’
signal directions. However, all the above works consider interference caused by concurrent
transmissions from legitimate senders in the same network. The most relevant work
is [37], which enables MIMO communications under high-power cross-technology interferers.
Yet, our work differs in several significant ways: 1) we consider smart jammers, who
can adapt their attack strategy to be more destructive, while interferers are unintentional;
2) their channel estimation methods require averaging over multiple OFDM symbols,
which is not applicable for tracking the jammer’s channel due to the jammer’s fast adaptation, while
our mechanism inserts pilots into known locations to jointly track the sender’s and jammer’s
channels instantaneously.
Figure 4.1: Reactive jammer starts jamming after certain reaction time
4.2 Problem Formulation
In this section, we present the system model, define the attack model and lay out preliminary
knowledge of MIMO-OFDM communication.
4.2.1 System Model
We consider an adverse wireless environment with a jammer targeting the communication
link established by a sender and a receiver. We assume that the jammer is a common
single-antenna device, with the capability of taking any attack strategy to be most destructive.
The frames in OFDM wireless communications have the signal structure shown in Fig. 4.1.
A preamble is transmitted ahead of the data, which is used for signal acquisition, time
synchronization and initial channel estimation. We assume the sender transmits when the
jammer is not jamming, either by taking a random backoff between transmissions or by
sensing jamming activity [108]. We assume every sender and the intended receiver share a
secret key that is unknown to the jammer.
At receiver R, let PSR and PJR be the received signal powers from sender S and jammer J, respectively. The
signal-to-jamming ratio (SJR) at receiver R can be expressed as PSR/PJR, which determines
the decoding performance. We do not consider the noise and interference, since they are
negligible when compared to the jamming power.
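For concreteness, the SJR defined above can be evaluated either as a linear power ratio or in decibels. The helper names and the power values in the usage lines are assumptions of this sketch, chosen only to show the unit handling.

```python
import math


def dbm_to_mw(power_dbm):
    """Convert a power level from dBm to milliwatts."""
    return 10 ** (power_dbm / 10)


def sjr_linear(p_sr_mw, p_jr_mw):
    """SJR = P_SR / P_JR with both powers in the same linear unit."""
    return p_sr_mw / p_jr_mw


def sjr_db(p_sr_dbm, p_jr_dbm):
    """SJR in dB when both received powers are given in dBm."""
    return p_sr_dbm - p_jr_dbm


# Hypothetical received powers: the jammer arrives 10 dB stronger than the sender.
print(sjr_db(-60, -50))
print(10 * math.log10(sjr_linear(dbm_to_mw(-60), dbm_to_mw(-50))))
```

A negative SJR in dB means the jamming signal is stronger than the desired signal at the receiver.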
4.2.2 Attack Model
There are three typical jamming attack models: 1) a constant jammer continuously transmits
jamming signals to corrupt packet transmission. He/She has the capability of covering the
whole frame structure, but his/her energy consumption is extremely high, rendering
himself/herself easily discoverable; 2) a random jammer is more energy-efficient, as he/she
emits jamming signals at random times for random durations. However, his/her jamming
capability is limited due to the randomized jamming behavior; 3) a reactive jammer is more
effective, energy-efficient and stealthier [25], and is the main focus of this chapter.
The key feature of a reactive jammer is sensing-before-jamming. The jamming reaction time
denotes the time difference between the arrival of the original signal and the jamming signal
at the receiver. It takes a reactive jammer a minimum reaction time to perform channel
sensing and jamming initialization before sending out jamming signals, during which the
preamble of the frame could be transmitted without being jammed [34, 108], as shown in
Fig. 4.1.
In our experiment, a preamble takes only one OFDM symbol, which lasts 128µs with 1MHz
bandwidth. On the other hand, a jammer who is agnostic to the implementation details
of the network (e.g., the transmission protocol and preamble symbols) can only carry out
energy detection [114], which requires more than 1ms to detect the signal for a 0.6 detection
probability and −110dBm signal strength, when implemented in a fully parallel pipelined
FPGA [92]. Even an advanced software-radio-based reactive jammer that is aware of the
implementation details of the network still incurs a considerable reaction delay, including
software and hardware delays to process the incoming signal and to make a jamming decision,
during which the preamble of a frame is successfully delivered to the receiver without being
disturbed [34,35,115].
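The timing argument can be checked with a few lines of arithmetic. The sketch below assumes that one OFDM symbol spans the FFT length plus the cyclic prefix (64 + 64 samples, the defaults in Table 4.1) and that the sample rate equals the 1MHz bandwidth; the variable names are illustrative only.

```python
# Hedged sketch: preamble duration versus the energy-detection time quoted in the text.
FFT_LEN = 64            # OFDM FFT length (Table 4.1)
CP_LEN = 64             # OFDM cyclic prefix length (Table 4.1)
BANDWIDTH_HZ = 1e6      # 1 MHz bandwidth, assumed equal to the sample rate

preamble_s = (FFT_LEN + CP_LEN) / BANDWIDTH_HZ   # duration of one OFDM symbol
detection_s = 1e-3      # "more than 1ms" energy-detection time, used as a lower bound

print(f"preamble: {preamble_s * 1e6:.0f} us")
print("preamble finishes before detection:", preamble_s < detection_s)
```

Under these assumptions the preamble is over well before an energy detector could react, which is the gap the defense relies on.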
In addition, the jammer can transmit arbitrary signals with or without any signal structure.
The jammer is also capable of jamming the whole spectrum, invalidating the traditional
spread spectrum anti-jamming methods [107,108]. However, we assume the jammer cannot
perform full-duplex communications, which prevents the jammer from sensing and
jamming simultaneously.
4.2.3 MIMO Interference Cancellation and OFDM Basics
In a MIMO network, the spatial multiplexing gain can be represented by a concept called
Degrees-of-Freedom (DoF), which is defined as the dimension of received signal space over
which concurrent communications can take place [116]. DoF indicates the number of con-
currently transmitted streams that can be reliably distinguished at a MIMO receiver.
Consider a 1 × 2 MIMO communication between sender S and receiver R as shown in
Fig. 4.2. The signals $x_s$ and $x_j$ from the sender and jammer, respectively, are transmitted
concurrently through the channel H, and the received signals can be written as:
$$\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \begin{pmatrix} h_s \\ h'_s \end{pmatrix} x_s + \begin{pmatrix} h_j \\ h'_j \end{pmatrix} x_j, \tag{4.1}$$
which live in a two-dimensional vector space corresponding to the two receive antennas.
In order to decode $x_s$, the IC technique is utilized to remove the interference from $x_j$ by
projecting the received signals onto the subspace orthogonal to $x_j$ (see Fig. 4.2), i.e., $[h'_j, -h_j]$,
yielding a projected signal as:
$$y_{proj} = h'_j y_1 - h_j y_2 = (h'_j h_s - h_j h'_s) x_s. \tag{4.2}$$
After that, the projected signal can be decoded using any standard decoder. This IC
technique is also called Zero-Forcing (ZF).
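A minimal numerical sketch of the projection in Eq. (4.2) follows. The channel coefficients and symbols are made-up values for one narrowband subcarrier, and the function names `zf_project` and `zf_decode` are hypothetical helpers, not part of the implementation described in this chapter.

```python
def zf_project(y1, y2, hj, hj_p):
    """Project (y1, y2) onto [h'_j, -h_j], which nulls the jamming term (Eq. 4.2)."""
    return hj_p * y1 - hj * y2

def zf_decode(y1, y2, hs, hs_p, hj, hj_p):
    """Divide the projected signal by the effective gain (h'_j h_s - h_j h'_s)."""
    gain = hj_p * hs - hj * hs_p
    return zf_project(y1, y2, hj, hj_p) / gain

# Illustrative (made-up) channel coefficients and symbols.
hs, hs_p = 0.8 + 0.3j, -0.2 + 0.9j
hj, hj_p = 0.5 - 0.7j, 1.1 + 0.4j
xs, xj = 1 + 0j, 3.0 - 2.0j      # BPSK symbol and a stronger jamming sample

# Received signals on the two antennas, following Eq. (4.1).
y1 = hs * xs + hj * xj
y2 = hs_p * xs + hj_p * xj

xs_hat = zf_decode(y1, y2, hs, hs_p, hj, hj_p)
print(abs(xs_hat - xs) < 1e-9)
```

The jamming sample is stronger than the sender symbol here, yet it vanishes after projection because both of its antenna components share the same scalar $x_j$.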
According to Eq. (4.2), knowledge of the channel coefficients seems indispensable for decoding
$x_s$. Estimating these coefficients is referred to as channel estimation, which can be done by
transmitting a known symbol from the sender. However, as the jammer’s signals may not
have any recognizable signal structure, it becomes impossible to learn his/her corresponding
channel coefficients. Fortunately, learning the exact values of the jammer’s channel
coefficients is unnecessary, since we are not interested in decoding the jammer’s signals. Instead,
we show in Section 4.4 that it is sufficient to know the direction of the received jamming
signal. Note that estimating the jammer’s signal direction¹ is the core of the ZF decoder. Also, a
loss of original signal amplitude after projection can be observed in Fig. 4.2.
OFDM divides the spectrum into multiple narrow subbands called subcarriers. The receiver
applies an FFT to the received signal and demodulates each subcarrier separately. This
allows many narrowband signals to be multiplexed in the frequency domain, which greatly
simplifies channel estimation and equalization. In this chapter, the sender and receiver
communicate using OFDM, so the signals of interest are OFDM-modulated signals.
Note that Eq. (4.1) assumes a narrowband channel, where h (such as $h_s$, $h_j$, etc.) appears
simply as a complex number. However, for wideband channels, the signals at different
frequencies will experience different channels, bringing so-called multipath effects. As a result,
h will become a complex vector indexed by frequency. Yet, Eq. (4.1)
still holds for each OFDM subcarrier in the OFDM communications, such that MIMO IC is
carried out over each subcarrier.

¹ Signal direction is determined by the received signal vector induced on the receive antenna array by the
transmitted signal [116], which is defined in the antenna-spatial domain and not the I-Q domain.

Figure 4.2: 1 × 2 MIMO-OFDM link attacked by a jammer
4.3 Impact of Reactive Jamming Attack on MIMO-OFDM Communications
In this section, we investigate the impact of a reactive jammer on MIMO-OFDM
communications. Without loss of generality, we explain the jamming strategy in the context of a
two-antenna receiver decoding a single transmission from the sender in Fig. 4.2. The sender
and receiver form a 1 × 2 MIMO link with two DoF, one of which is consumed by the jammer.
According to Eq. (4.1), the received frequency-domain signals for each OFDM subcarrier i
are shown below:
$$y_{1i} = h_{ji} x_{ji} + h_{si} x_{si}, \tag{4.3}$$
$$y_{2i} = h'_{ji} x_{ji} + h'_{si} x_{si}, \tag{4.4}$$
where $h_{ji}$, $h'_{ji}$, $h_{si}$ and $h'_{si}$ are the frequency-domain channel responses at subcarrier i, and
$x_{ji}$ and $x_{si}$ are frequency-domain signals from the jammer and sender. Note that the jamming
signals need not be OFDM signals, and $x_{ji}$ simply represents the narrowband portion of the
jamming signals on the i-th OFDM subband. As mentioned in Section 4.2.3, the MIMO IC technique is
carried out over each subcarrier to recover the legitimate signal, which is the key
to the data recovery process. Naturally, the MIMO IC technique becomes the target of the
jammer.
We reformulate Eqs. (4.3), (4.4) as follows (in the following, we omit the subscript
i for the i-th subcarrier):
$$\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = H \begin{pmatrix} 1 \\ 0 \end{pmatrix} x_j + H \begin{pmatrix} 0 \\ 1 \end{pmatrix} x_s, \tag{4.5}$$
where $H = \begin{bmatrix} h_j & h_s \\ h'_j & h'_s \end{bmatrix} = [\mathbf{h}_j, \mathbf{h}_s]$ is the 2 × 2 channel matrix. The received signals are the sum
of two vectors $J_r = H[1\ 0]^T x_j$ and $S_r = H[0\ 1]^T x_s$, as shown in Fig. 4.2. We find that the
angle² between $J_r$ and $S_r$, determined by $\mathbf{h}_j$ and $\mathbf{h}_s$, can be exploited by the jammer to
launch an effective attack.
Attacking MIMO Interference Cancellation. In order to understand the attack strat-
egy, we inspect three special scenarios in Fig. 4.3 with different received signal spaces. Un-
doubtedly, the most severe attack is depicted in Fig. 4.3(a), in which Jr overshadows Sr in
the received signal space, preventing Sr from being recovered. On the contrary, the least
powerful attack emits a jamming signal that is orthogonal to the legitimate signal as shown
in Fig. 4.3(b), in which the projected signal is equivalent to the original signal, yielding
the highest projected signal amplitude. Fig. 4.3(c) shows a case between the above two
extreme cases, where the angle between two received signals takes a small value. Therefore,
by manipulating the jamming signal direction, the jammer has the potential of affecting the
effectiveness of MIMO IC mechanism.
Correspondingly, the jammer’s attack strategy is to shrink the angle between the jamming
signal and the intended signal by moving towards the vicinity of the sender. As a matter
of fact, the difference between $\mathbf{h}_s$ and $\mathbf{h}_j$ varies with the distance between S
and J [71]. More specifically, if the spacing between two antennas is narrower than a half
wavelength, the channels from these two antennas will become highly correlated [116], which
renders the two received signal directions similar.

² The angle between two received signal vectors is equal to the angle between the two channel vectors,
computed by $\cos\theta = \frac{|\mathbf{h}_j^H \cdot \mathbf{h}_s|}{\|\mathbf{h}_j\| \|\mathbf{h}_s\|}$. The angle’s range is $[0, \frac{\pi}{2}]$.
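The angle defined in the footnote, and its effect on the projected amplitude, can be illustrated numerically. The sketch below uses made-up channel vectors and hypothetical helper names; it relies on the standard identity that, for two-dimensional complex vectors, $|h'_j h_s - h_j h'_s| = \|\mathbf{h}_j\| \|\mathbf{h}_s\| \sin\theta$, so the gain in Eq. (4.2) shrinks to zero as the angle closes.

```python
import math

def channel_angle(hj, hs):
    """Angle in [0, pi/2] between two complex 2-vectors given as tuples."""
    inner = hj[0].conjugate() * hs[0] + hj[1].conjugate() * hs[1]
    norm_j = math.hypot(abs(hj[0]), abs(hj[1]))
    norm_s = math.hypot(abs(hs[0]), abs(hs[1]))
    cos_theta = min(1.0, abs(inner) / (norm_j * norm_s))  # clip rounding overshoot
    return math.acos(cos_theta)

def projected_gain(hj, hs):
    """|h'_j h_s - h_j h'_s|, the amplitude scaling that appears in Eq. (4.2)."""
    return abs(hj[1] * hs[0] - hj[0] * hs[1])

# Made-up examples: orthogonal channels and (almost) perfectly aligned channels.
orthogonal = channel_angle((1 + 0j, 0j), (0j, 1 + 0j))
aligned = channel_angle((1 + 1j, 2 + 0j), (2 + 2j, 4 + 0j))
print(math.degrees(orthogonal), math.degrees(aligned))
```

A jammer that drives the angle towards zero therefore removes almost all of the sender's energy from the projected signal, even though the jamming term itself is still cancelled.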
In order to demonstrate the effectiveness of this attack strategy, we perform an experiment
on the 1 × 2 MIMO link of Fig. 4.2 by varying the distance between the jammer’s and sender’s
antennas. Fig. 4.4 shows the packet delivery rate (PDR) performance, in which the sender’s
PDR drops to nearly zero when the antenna distance decreases below 6cm.
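The 6cm threshold lines up with the half wavelength quoted in the caption of Fig. 4.4. A quick check of that arithmetic, assuming the 2.45GHz carrier given there:

```python
# Hedged sketch: half wavelength at the carrier frequency quoted in the Fig. 4.4 caption.
C = 299_792_458.0       # speed of light in m/s
F_CARRIER = 2.45e9      # carrier frequency in Hz

half_wavelength_cm = C / (2 * F_CARRIER) * 100
print(f"{half_wavelength_cm:.2f} cm")
```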
Antenna-Spatial Domain vs. I-Q Domain. The jammer can vary the
phase of the jamming signal, which has the same effect as a frequency offset. The
frequency offset causes the signal vectors to rotate in the I-Q plane. One may speculate that
the jamming signal may then not keep a constant phase offset relative to the signal of interest,
as drawn in Fig. 4.3, even if the channel matrix is fixed. This reasoning, however, is incorrect, since the
received signal space of Fig. 4.3 is in the antenna-spatial domain and not in the I-Q domain.
The frequency offset determines how the signal rotates in the I-Q domain, but in the
antenna-spatial domain it only scales the signal vector by a complex number, which leaves
its direction unchanged [36].
In other words, the jamming signal direction in the received signal space is unaffected by
signal rotation in the I-Q domain, but instead is determined by the channels between the
sender, jammer and the receiver. Therefore, the jamming signal direction remains constant
during the channel coherence period.
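This invariance is easy to confirm numerically. The sketch below rotates an arbitrary jamming sample by several phases and checks that the ratio of the signals seen on the two receive antennas does not move; all values and the helper name `received_ratio` are made up for illustration.

```python
import cmath

def received_ratio(hj, hj_p, xj):
    """Ratio of the jamming signal observed on the two receive antennas."""
    return (hj * xj) / (hj_p * xj)

hj, hj_p = 0.5 - 0.7j, 1.1 + 0.4j    # made-up jammer channel coefficients
xj = 2.0 + 1.0j                       # arbitrary jamming sample

baseline = received_ratio(hj, hj_p, xj)
for phase in (0.3, 1.7, 3.0):         # arbitrary I-Q rotations in radians
    rotated = xj * cmath.exp(1j * phase)
    # The rotation multiplies both antennas by the same scalar, so the ratio is unchanged.
    assert abs(received_ratio(hj, hj_p, rotated) - baseline) < 1e-12
print("direction unchanged:", baseline)
```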
(a) Overlap signals (b) Orthogonal (c) Small angle
Figure 4.3: Different two-dimensional received signal spaces

[Plot: Packet Delivery Rate (0 to 1) versus Distance Between Two Antennas (0 to 18 cm)]
Figure 4.4: Jamming attack performance by approaching the sender’s location (in this
experiment, the device works on 2.45GHz central frequency with a half wavelength
$\frac{\lambda}{2} = \frac{c}{2f} \approx 6.12$cm)
4.4 Defense Mechanisms of Reactive Jamming Attack
In this section, we propose effective MIMO-based defense mechanisms to counteract reactive
jamming attacks based on the IC technique. We first develop an iterative channel tracking
mechanism to cancel arbitrary jamming signals by keeping track of the jamming signal direction.
Then, we build an enhanced defense mechanism by incorporating sender signal enhancement
to enable more robust OFDM communication.
In contrast to the attack strategy, which shrinks the angle between the two arriving signals, the
defense mechanism attempts to expand the angle. We address two major issues in this section: 1)
how to decode the signals of interest in the presence of arbitrary jamming signals; 2) how to
strengthen the robustness of OFDM communications against an adaptive and reactive jammer.
4.4.1 Defense Mechanism Overview
We offer an overview of the proposed defense mechanisms in this section. The defense
mechanism mainly includes angle expansion, signal decoding (Section 4.4.2), channel tracking
(Section 4.4.2) and jamming detection (Section 4.4.3) modules. The angle expansion module
aims at expanding the angle between the arriving signals to make the intended signals decodable.
As long as the jammer fails to approach the sender, the channels $\mathbf{h}_s$ and $\mathbf{h}_j$ will be uncorrelated,
resulting in a random angle between $S_r$ and $J_r$, and thus a high decoding rate. Preventing
the jammer from getting close is straightforward: the sender can move randomly inside
the receiver’s reception range to avoid being approached. Alternatively, the spatial retreat [117]
technique can be utilized to strategically move away from the jammer. Then, signal decoding
is implemented using the MIMO IC technique after channel estimation. Meanwhile, the jamming
detection module aims to promptly identify the beginning and end of a jamming attack
to trigger the defense process.
The enhanced defense mechanism (Section 4.4.4) adds a sender signal enhancement (SSE) module,
which rotates the transmitted signal to improve sender signal decodability. It also incorporates
a feedback mechanism to reliably guide the sender’s rotation process. A flow chart is given in
Fig. 4.5, which shows both the defense mechanism and its enhanced version.
4.4.2 Decoding the Signal of Interest
According to Eqs. (4.2), (4.5), the estimation of the sender’s and jammer’s channels is
the most crucial task in jamming-resistant solution based on MIMO IC technique. Initial
estimation of sender’s channel hs can be derived via analyzing the undisturbed preamble.
However, since initial channel estimation is only valid within the channel coherence time,
updating the channel estimation over time becomes a necessity.
Figure 4.5: A Flow Chart of Proposed Defense Mechanisms (Solid Box: Modules of the
Defense Mechanism, Dashed Box: Modules of Enhanced Defense Mechanism)
Inspired by the ZigZag decoding technique [118], we devise an iterative channel tracking
mechanism that jointly keeps track of both the sender’s and jammer’s channel conditions in a
timely manner. In the following, we first describe the jammer channel estimation method, and
then present the iterative mechanism for updating both channels.
Jammer Channel Estimation. Without pre-known preambles in the jamming signals, it
is difficult to carry out jammer channel estimation. Fortunately, the most recent advance [37]
shows that complete knowledge of $\mathbf{h}_j = [h_j, h'_j]^T$ is not necessary for decoding $x_s$. Due
to the scale invariance property of signal direction, i.e., the direction of $[h_j, h'_j]^T$ is
equivalent to that of $[\frac{h_j}{h'_j}, 1]^T$, the only information required about the jamming signal for IC to
work is the signal direction, i.e., the jammer’s channel ratio $\frac{h_j}{h'_j}$.
Note that the received signal is a mixed signal $J_r + S_r$. If we can extract the jammer’s signal
$J_r = \begin{pmatrix} h_j \\ h'_j \end{pmatrix} x_j$, we can derive the jammer’s channel ratio by computing the ratio of the received
jamming signals on the two receive antennas, as $\frac{h_j}{h'_j} = \frac{x_j \cdot h_j}{x_j \cdot h'_j}$. Based on this derivation, we
propose the following method to enable the extraction of the jamming signal $J_r$ so that the
channel ratio can be computed.

Figure 4.6: Extended frame structure
As shown in Fig. 4.6, the basic idea of extracting the received jamming signal $J_r$ is to insert
known symbols (i.e., pilots) in the original data frame, and then subtract them from the
received mixed signal. The locations of the inserted pilots should remain secret between the
sender and intended receiver, because if the jammer learns the locations of the pilots, he/she
can intentionally stop jamming during these pilot periods to avoid being tracked. Moreover,
the pilots should be inserted frequently to enable frequent updates of the channel estimation.
Note that the extension of the frame structure introduces limited overhead, which will be
evaluated in Section 4.6.4.
The complete jammer channel estimation scheme proceeds as follows: 1) after detecting the
beginning of jamming (refer to Section 4.4.3), the intended receiver finds the next jammed
pilots; 2) the received pilots are reconstructed using the known pilot symbol transformed
by the estimated sender’s channel (sender channel estimation is presented below); 3) the
constructed received pilots are subtracted from the jammed pilots to restore the jamming
signal; 4) the extracted jamming signal is used to compute the jammer’s channel ratio
(jamming signal direction).
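Steps 2) to 4) of the scheme can be sketched for a single jammed pilot on one subcarrier as follows. This is an illustration under ideal conditions (no noise, exact sender channel estimate), with made-up values and a hypothetical function name.

```python
def estimate_jammer_ratio(y1, y2, pilot, hs_est, hs_p_est):
    """Rebuild the received pilot, subtract it, and take the ratio across antennas."""
    jam1 = y1 - pilot * hs_est       # residual on antenna 1, ideally h_j * x_j
    jam2 = y2 - pilot * hs_p_est     # residual on antenna 2, ideally h'_j * x_j
    return jam1 / jam2

# Made-up channels for a single jammed pilot.
hs, hs_p = 0.8 + 0.3j, -0.2 + 0.9j
hj, hj_p = 0.5 - 0.7j, 1.1 + 0.4j
pilot, xj = 1 + 0j, -1.3 + 2.2j      # xj is unknown to the receiver

y1 = hs * pilot + hj * xj
y2 = hs_p * pilot + hj_p * xj
ratio = estimate_jammer_ratio(y1, y2, pilot, hs, hs_p)
print(abs(ratio - hj / hj_p) < 1e-9)
```

The receiver never learns $x_j$ itself; it cancels in the ratio, which is why an unstructured jamming signal can still be tracked.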
Iterative Channel Tracking Mechanism. For IC to work, we need estimates of
both the sender channel and the jammer channel. When the channel is being jammed,
deriving an accurate estimate of the sender channel is a difficult task. In addition, wireless
channels are time-varying due to multipath fading effects. Jammers are also motivated to
vary the channel in order to evade the defense mechanism. To keep the channel estimation
updated and accurate, we need to carry out the channel estimation frequently. However,
estimating both channels under jamming is hard: we have two channel
responses to estimate and the received signal is a mixed signal with two unknown signal
components.
We propose the following alternating and iterative method to keep track of the sender and
jammer channels. The key idea of the proposed method is that, although we are not able to
calculate the two channel estimates at once given two unknown signals, we are able
to estimate one channel if the other is known. We can make the initial sender channel
estimation after receiving the preamble. Assuming there is no jamming signal during the
preamble, the initial sender channel response can be estimated as:
$$H_s(0) = \begin{pmatrix} h_s(0) \\ h'_s(0) \end{pmatrix} = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} / \tilde{x}_s, \tag{4.6}$$
where $\tilde{x}_s$ denotes the known pilots. We then perform the sender and jammer channel
estimations alternately for every pilot received. Assume the pilots are numbered as i = 1, ..., n.
After receiving the first pilot (or any odd-numbered pilot), the receiver updates the jammer
channel ratio as:
$$\frac{h_j(i)}{h'_j(i)} = \frac{y_1 - \tilde{x}_s \cdot h_s(i-1)}{y_2 - \tilde{x}_s \cdot h'_s(i-1)}, \quad i = 1, 3, \ldots, \tag{4.7}$$
where we assume the sender channel did not change in the past time slot. Similarly, after
receiving the second pilot (or any even-numbered pilot), the receiver updates the sender
channel estimation $H_s(i) = \begin{pmatrix} h_s(i) \\ h'_s(i) \end{pmatrix}$ according to:
$$h_s(i) - \frac{h_j(i-1)}{h'_j(i-1)} h'_s(i) = \left( y_1 - \frac{h_j(i-1)}{h'_j(i-1)} y_2 \right) / \tilde{x}_s, \quad i = 2, 4, \ldots, \tag{4.8}$$
where we assume the jammer channel did not change in the past time slot. The two unknown
sender channel components $h_s(i)$ and $h'_s(i)$ in Eq. (4.8) are updated alternately after receiving
an even-numbered pilot. Specifically, $h_s(i)$ gets updated when i = 4, 8, ..., while $h'_s(i)$ gets
updated when i = 2, 6, ..., by assuming the other channel component did not change over
the past two time slots. This updating process continues in such a way that the sender and
jammer channels are updated alternately. Note that this mechanism requires very frequent
channel updates, within the channel coherence time, which can be as short as tens of OFDM
symbol times [119] in some application scenarios. On the other hand, these frequent channel
updates help us keep close track of the jammer’s potential fast adaptation.
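The alternation between Eqs. (4.6) to (4.8) can be sketched as a small loop. The example below is a noise-free illustration with made-up channels in which the jammer's channel changes before pilot 3 and one sender component drifts before pilot 4; the function names and the `truth` table are hypothetical and only serve to exercise the update rules.

```python
def init_sender(y1, y2, pilot):
    """Eq. (4.6): sender channel from the unjammed preamble."""
    return y1 / pilot, y2 / pilot

def update_jammer_ratio(y1, y2, pilot, hs, hs_p):
    """Eq. (4.7): odd-numbered pilots refresh the jammer channel ratio."""
    return (y1 - pilot * hs) / (y2 - pilot * hs_p)

def update_sender(y1, y2, pilot, ratio, hs, hs_p, i):
    """Eq. (4.8): even-numbered pilots refresh one sender component at a time."""
    combined = (y1 - ratio * y2) / pilot      # equals h_s - ratio * h'_s
    if i % 4 == 0:
        hs = combined + ratio * hs_p          # i = 4, 8, ...: refresh h_s
    else:
        hs_p = (hs - combined) / ratio        # i = 2, 6, ...: refresh h'_s
    return hs, hs_p

def receive(hs, hs_p, hj, hj_p, xs, xj):
    """Two-antenna received signal, following Eq. (4.1)."""
    return hs * xs + hj * xj, hs_p * xs + hj_p * xj

pilot = 1 + 0j
# Made-up true values per pilot index: (h_s, h'_s, h_j, h'_j, x_j).
truth = {
    1: (0.8 + 0.3j, -0.2 + 0.9j, 0.5 - 0.7j, 1.1 + 0.4j, 2.0 - 1.0j),
    2: (0.8 + 0.3j, -0.2 + 0.9j, 0.5 - 0.7j, 1.1 + 0.4j, -1.5 + 0.5j),
    3: (0.8 + 0.3j, -0.2 + 0.9j, -0.9 + 0.2j, 0.3 + 1.0j, 0.7 + 2.4j),   # jammer moved
    4: (0.7 + 0.4j, -0.2 + 0.9j, -0.9 + 0.2j, 0.3 + 1.0j, -2.2 - 0.3j),  # h_s drifted
}

# Preamble is received without jamming (jammer terms set to zero).
hs_est, hs_p_est = init_sender(*receive(0.8 + 0.3j, -0.2 + 0.9j, 0, 0, pilot, 0), pilot)
ratio_est = None
for i in sorted(truth):
    hs, hs_p, hj, hj_p, xj = truth[i]
    y1, y2 = receive(hs, hs_p, hj, hj_p, pilot, xj)
    if i % 2 == 1:
        ratio_est = update_jammer_ratio(y1, y2, pilot, hs_est, hs_p_est)
    else:
        hs_est, hs_p_est = update_sender(y1, y2, pilot, ratio_est, hs_est, hs_p_est, i)
print(ratio_est, hs_est, hs_p_est)
```

In this idealized run each estimate follows the change that occurred just before its own pilot. With noise, or with both sender components drifting at once, the estimates would only be approximate, which is why the text stresses frequent updates within the coherence time.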
Sender Signal Decoding. Based on Eq. (4.2), the signal of interest $x^*_s$ can be written as:
$$x^*_s = \frac{y_1 - \frac{h_j}{h'_j} y_2}{h_s - \frac{h_j}{h'_j} h'_s}, \tag{4.9}$$
in which $\frac{h_j}{h'_j}$ is updated at every odd-numbered pilot by Eq. (4.7), and $(h_s - \frac{h_j}{h'_j} h'_s)$ is updated at
every even-numbered pilot by Eq. (4.8). With precise and frequent updates of channel estimation,
the signal of interest can be correctly recovered using any standard decoder.
Inter-Symbol Interference Issue. Another practical issue with the wideband jamming
signal is that it suffers from multipath effects, which lead to inter-symbol interference (ISI).
ISI of the jamming signals adds noise to Eq. (4.5). To mitigate the negative effects of ISI
on channel estimation, we average the channel tracking results derived from multiple pilots
within the channel coherence time. While ISI is then not a problem
for accurate channel estimation, this additional noise would reduce the SNR of the intended
signal and hence affect the throughput. To address the ISI issue, we must directly investigate
the time-domain signal, since ISI is inherently a time-domain phenomenon. We apply the
method in [37] to deal with the ISI issue, i.e., we convolve the received time-domain signals
with a filter constructed by taking the IFFT of the jammer’s channel ratio to cancel out the
ISI and jamming signal simultaneously. The signal of interest can then be decoded using a
standard decoder.
4.4.3 Detecting the Jamming Signal
As mentioned in the previous section, the receiver needs to detect the beginning and end of
jamming to facilitate the IC mechanism. The jamming detection problem has been studied
in [108], in which constellation diagrams are employed to identify jammed symbols. We
follow the same principle. The soft error vector is used to build the detection metric; it is defined
as the distance vector between the received symbol vector and the nearest constellation
point in the I/Q diagram, as shown in Fig. 4.7(a). The soft error is further normalized by the
minimum distance of the constellation. Let $\|V_k\|$ denote the normalized soft error
for the k-th received symbol. The jamming detection metric at the k-th symbol time is then
defined as $\|V_k\|/\|V_{k-1}\|$, which we call the jumped value. A jamming attack is assumed to start when
$\|V_k\|/\|V_{k-1}\| > \gamma$, where $\gamma$ is a pre-defined threshold for jamming detection. The jamming
attack stops when the jumped value returns to normal. In our system design, we flag a
potential jammer when the soft error more than doubles between consecutive symbols,
so that $\gamma = 2$. An example is shown in Fig. 4.7(b), where we can easily
identify the beginning and end of the jamming attack.
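A minimal sketch of the onset test is given below, assuming a BPSK constellation at ±1 (minimum distance 2) and made-up received symbols. It only flags the beginning of jamming; how "returns to normal" is decided for the end of jamming is not specified in enough detail here to reproduce, so it is left out.

```python
BPSK = (-1 + 0j, 1 + 0j)
D_MIN = 2.0             # minimum distance between the two BPSK points
GAMMA = 2.0             # detection threshold used in the text

def soft_error(symbol, constellation=BPSK, d_min=D_MIN):
    """Normalized distance from a received symbol to its nearest constellation point."""
    return min(abs(symbol - p) for p in constellation) / d_min

def jamming_onsets(symbols, gamma=GAMMA):
    """Indices k where the jumped value ||V_k|| / ||V_{k-1}|| exceeds gamma."""
    errors = [soft_error(s) for s in symbols]
    return [k for k in range(1, len(errors))
            if errors[k - 1] > 0 and errors[k] / errors[k - 1] > gamma]

# Made-up received symbols: lightly noisy BPSK, then jammed from index 3 onwards.
rx = [1.02 + 0.01j, -0.97 - 0.02j, 1.01 - 0.03j, 0.2 + 0.9j, -0.4 - 0.7j]
print(jamming_onsets(rx))   # -> [3]
```

Because the metric is a ratio of consecutive errors, it reacts to the jump at the first jammed symbol rather than to the absolute error level.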
(a) Soft error vector (b) Detection example
Figure 4.7: Soft error based jamming detection
4.4.4 Enhanced Defense Mechanism
The basic idea of IC is to project the received sender signal to the direction that is orthogonal
to the received jammer signal. As shown in Fig. 4.3, the signal after projection will have a
reduced signal amplitude, depending on the angle between the two signals. The IC method
is most effective when the sender signal and the jammer signal are orthogonal [37, 113].
Therefore, another approach we can explore here is to maximize the amplitude of projected
sender signal, i.e. to improve the sender signal decodability.
The key idea is to rotate the sender’s signal so that the received sender signal is orthogonal
to the jamming signal. This mechanism works for a multi-antenna sender. Using a 2 × 2
MIMO link as an example,
$$\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \mathbf{h}_j x_j + H_s \begin{pmatrix} 1 \\ 0 \end{pmatrix} x_s, \tag{4.10}$$
where $\mathbf{h}_j$ denotes a two-dimensional channel vector from J to R, and $H_s$ is the 2 × 2 channel
matrix from S to R. We exploit the property of MIMO communications that the sender can control
the vector along which the signal is received [36]. Instead of multiplying by the vector
$[1\ 0]^T$, MIMO allows the sender to multiply by a different two-dimensional vector $\vec{r}$, which
we call the rotation vector³. After that, the sender transmits the two elements of $\vec{r} \cdot x_s$, one over
each antenna, and the receiver receives $H_s \cdot \vec{r} \cdot x_s$. In this way, the sender is
able to control the received signal vector, and thus the received signal direction.
Constraints on Rotation Vector. After signal rotation, the received signal can be
represented as:
$$\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \mathbf{h}_j x_j + H_s \vec{r} x_s,$$
with a 2 × 2 channel matrix between S, J and R as $H = \{\mathbf{h}_j, H_s \vec{r}\}$. In order to make $x_s$
decodable, H should remain a full-rank matrix. Thus, one constraint on $\vec{r}$ is that it cannot
reduce the rank of the channel matrix.
In addition, the received signal powers from the sender and jammer are $P_{SR} \propto P_s \|H_s \vec{r}\|^2$ and
$P_{JR} \propto P_j \|\mathbf{h}_j\|^2$, where $P_s$ and $P_j$ are the sender’s and jammer’s transmission powers. From
the above formulas, different $\vec{r}$ may induce different $P_{SR}$ and SJR, which will in turn affect
the decoding performance. Therefore, we set $\vec{r}$ as a unit vector, i.e., $\|\vec{r}\| = 1$, such that $P_{SR}$
is confined to a reasonable range.
Sender Signal Enhancement Mechanism. In the 2 × 2 MIMO link of Eq. (4.10), signal
rotation can be achieved by simply multiplying the sender signal by the normalized vector
$$\vec{r} = \frac{H_s^{-1} \cdot \mathbf{h}_j^{\perp}}{\|H_s^{-1} \cdot \mathbf{h}_j^{\perp}\|} = \frac{H_s^{-1} \cdot [1, -\frac{h_j}{h'_j}]^T}{\|H_s^{-1} \cdot \mathbf{h}_j^{\perp}\|},$$
so that the received legitimate signal will
be orthogonal to the jamming signal, where $\mathbf{h}_j^{\perp}$ stands for the orthogonal vector of $\mathbf{h}_j$.
However, SSE is carried out on the sender signal, while the channel estimation is conducted
at the receiver side. A feedback mechanism is therefore necessary for sending the rotation vector $\vec{r}$
calculated at the receiver back to the sender.
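The rotation vector can be computed for a 2 × 2 link with a few lines of complex arithmetic. The sketch below uses made-up channels and hypothetical helper names. One assumption should be stated loudly: orthogonality is taken in the Hermitian sense, so the orthogonal vector is built with complex conjugates, whereas the expression above writes the channel ratio without an explicit conjugate.

```python
import math

def inv2(m):
    """Inverse of a 2x2 complex matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))

def matvec(m, v):
    """Product of a 2x2 matrix and a 2-vector."""
    return (m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1])

def rotation_vector(Hs, hj):
    """Unit vector r such that Hs r is orthogonal to the jammer channel hj."""
    # Assumption: Hermitian orthogonality, hence the conjugates below.
    hj_perp = (hj[1].conjugate(), -hj[0].conjugate())
    r = matvec(inv2(Hs), hj_perp)
    norm = math.hypot(abs(r[0]), abs(r[1]))
    return (r[0] / norm, r[1] / norm)

# Made-up channels: Hs is the sender-to-receiver matrix, hj the jammer channel vector.
Hs = ((0.8 + 0.3j, 0.1 - 0.5j), (-0.2 + 0.9j, 0.6 + 0.2j))
hj = (0.5 - 0.7j, 1.1 + 0.4j)

r = rotation_vector(Hs, hj)
rx = matvec(Hs, r)                                    # received sender direction
inner = hj[0].conjugate() * rx[0] + hj[1].conjugate() * rx[1]
print(abs(inner) < 1e-9, round(math.hypot(abs(r[0]), abs(r[1])), 6))
```

The normalization enforces the unit-norm constraint on the rotation vector, and the inverse exists only when $H_s$ is full rank, which matches the rank constraint discussed earlier.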
We define a “burst of packets” as a consecutive sequence of packets during the
communications, as shown in Fig. 4.8. During each burst, after identifying jamming threats, the
sender continuously rotates the transmit signals of the subsequent frame using the
rotation vector computed for the previous frame and carried by the feedback frame. To reliably
feed back rotation vectors in the presence of a reactive jammer, we develop a feedback
mechanism as follows.

³ Note that the signal rotation is carried out in the antenna-spatial domain rather than in the I-Q domain.

Figure 4.8: Burst of packets
Feedback Mechanism. The feedback frame can be formulated using the same frame struc-
ture in Fig. 4.1 because it is short. The same IC technique can be employed to decode the
feedback information at the sender, reversing the roles of the sender and receiver in the
forward channel. However, during the transmission of packet bursts, it is highly likely that
both the feedback packets and the subsequent forwarding packets will be completely jammed
by the reactive jammer. In such a scenario, we try to find an opportunity to compute the
jammer’s channel ratio when the jammer is alone on the medium.
There are various situations in which a jammer’s isolated transmission could be captured. In
the case that the feedback packets are covered by the jamming signals, the jamming signal
is transmitted ahead of the feedback signal, leaving the opportunity of capturing the jammer’s
isolated transmission, from which the sender can compute the jammer’s channel ratio $\frac{h_{js}}{h'_{js}}$ by
taking the ratio of the two jamming signals received on his/her two antennas, $y_{s1} = h_{js} x_{js}$ and
$y_{s2} = h'_{js} x_{js}$. The receiver could also delay the transmission of the feedback packet for a
random time period so that the sender could capture the jammer’s isolated transmission right
after his/her own transmission finishes. In either case, the sender uses the jammer’s channel
ratio to eliminate the jamming signal from the received mixed signal $J_r + S_r$, and finds the
preamble to estimate the feedback channel using Eq. (4.6), which can be used for signal
decoding as usual.
Similarly, the receiver can also use the same mechanism to recover the completely jammed
forwarding packets in a packet burst. Two points are worth noting: first, the sender needs
to detect the jamming signals to decide whether he/she will apply the rotation vectors to
the subsequent packet. In particular, if the sender detects jamming signals when decoding
the feedback packet, he/she will apply rotation vectors, assuming the jammer will be active
for the subsequent transmission. Second, the feedback information should be received in a
timely fashion, because if the channel estimation expires, the rotation vector will no longer
be effective. Thus, the sender will count the feedback time to determine whether to apply
rotation vectors or not.
4.4.5 Defending Against Reactive Jamming In a Multi-hop Network
In a multi-hop network with legitimate nodes and reactive jammers, every receiver in the
network performs IC-based defense mechanism when detecting jamming signals during the
packet reception period. With a reliable communication protocol such as TCP, the receiver
turns into a sender by sending back a feedback message to inform the original sender about
the reception status, after recovering the signals of interest. Because of the “quiet/stealthy”
nature of reactive jammer, the traditional media access control (MAC) protocol for wireless
networking can still be applied to avoid concurrent transmissions from multiple legitimate
senders.
Our defense mechanisms open up new opportunities for multi-hop networking. In a
traditional multi-hop network with jammers, a link being jammed is unable to transfer
information. However, by employing the proposed IC-based jamming resilient communication
scheme, the jammed link is still capable of transmitting information, which introduces
new optimization problems associated with rate allocation, resource scheduling, and relay
selection in the presence of jammers, while under the protection of IC-based mechanisms.
We will investigate the cross-layer optimization of networking under jamming attacks in our
future work.
4.4.6 Dealing with Other Types of Jammers
We briefly discuss the impact of constant and random jammers on our defense
mechanisms in this section. A constant jammer can cover all the packets including their
preambles, which will disable the initial channel estimation of our defense mechanisms. However,
constant jamming is impractical due to its enormous energy consumption. A random
jammer randomly alternates between jamming and sleeping. We investigate the jammer’s
probability of covering preambles, and present the necessary modifications to the defense
mechanisms. First, let us assume both the jamming and sleeping periods are uniformly
distributed within [0, 20]ms with an average of 10ms, so that the random jammer is jamming
with a probability of 1/2. We further assume the preamble length is 0.1ms, and one burst
lasts for 100ms with a 400ms inter-burst idle interval. Then, the probability of covering the
preamble of the first packet in the burst can be written as
$\frac{10/0.1}{(500-10)/0.1} \cdot \frac{1}{2} \approx 1\%$. One
can further reduce the probability by introducing a longer burst or burst interval, which
makes preamble distortion a small-probability event. Second, as the jamming detector
can identify the beginning and end of jamming attacks promptly, we can modify our defense
mechanisms to perform normal processing when the jammer is sleeping and conduct
IC within his/her jamming periods.
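The preamble-coverage estimate can be reproduced directly from the stated assumptions. The sketch below simply evaluates the expression as it reads in the text (a ratio of preamble-sized slots, times the probability of 1/2); the variable names are illustrative.

```python
# Hedged sketch: probability that a random jammer covers the first preamble of a burst.
PREAMBLE_MS = 0.1         # preamble length
JAM_MEAN_MS = 10.0        # mean jamming period (uniform on [0, 20] ms)
BURST_PERIOD_MS = 500.0   # 100 ms burst plus 400 ms idle interval
P_JAMMING = 0.5           # probability that the jammer is in a jamming period

covering_slots = JAM_MEAN_MS / PREAMBLE_MS
total_slots = (BURST_PERIOD_MS - JAM_MEAN_MS) / PREAMBLE_MS
p_cover = covering_slots / total_slots * P_JAMMING
print(f"{p_cover:.2%}")
```

Lengthening the burst period increases `total_slots` and lowers the result, which is the adjustment suggested in the text.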
4.4.7 Discussion
Our defense mechanisms enable reliable OFDM communication in the presence of a powerful
single-antenna reactive jammer. Extending to a network with multiple jammers, the defense
mechanism should succeed in canceling jamming signals as long as different jammers operate
on different spectrum bands or transmit at different time slots, since the cancellation is
carried out for each OFDM subband at one time. In addition, our defense mechanism defeats
the multi-antenna jammers transmitting the same jamming signals over all the antennas,
because they can be regarded as single-antenna jammers with aggregated channel state
information. However, multi-antenna or multiple jammers sending multiple jamming streams
simultaneously are more destructive to the MIMO-OFDM communications, since they can
deplete the DoF of MIMO links. Currently, there is no available solution in the literature
to provide jamming resilient communication under multi-antenna jammers sending multiple
concurrent jamming streams. How to deal with such jammers will be considered in our future
work.
4.5 Implementation
We build a prototype using five USRP-N200 radio platforms [6] and the GNURadio software
package. Each USRP board is equipped with one XCVR2450 daughterboard operating in the
802.11 spectrum bands. The MIMO cable allows two USRP devices to share a reference clock and
achieve time synchronization by letting the slave device acquire its clock and time reference
from the master device. By connecting two USRP boards with a MIMO cable to act as one
MIMO node, we build a 2 × 2 MIMO system using four USRP boards. Each MIMO node
runs an 802.11-like PHY layer protocol using OFDM technology with 64 OFDM subcarriers.
The MIMO system works with various modulation types, while we use BPSK for legitimate
communications in our experiments. We configure each USRP to span 1MHz bandwidth
by setting both the interpolation rate and decimation rate to 100. The MIMO IC technique
is implemented at the receiver to recover the signals of interest. We also implement the
enhanced defense mechanism by incorporating SSE.
The reactive jammer is another USRP device equipped with an XCVR2450 daughterboard.
To defend against jamming attack, the receiver first estimates sender’s channel and jammer’s
channel ratio, then uses IC technique to eliminate the signals from the jammer. Meanwhile,
the receiver will compute the rotation vector and transmit it back to the sender for SSE.
After receiving the rotation vector, the sender checks whether it is still within the predefined
channel coherence time since its previous transmission. If it is, the sender will apply the
rotation vector to the newly generated symbols and send the rotated elements through two
antennas. We set the transmission power of both the sender and jammer as 100mW by
default.
Implementing an SDR-based reactive jammer is itself a non-trivial task [34, 115]. Here, we
emulate the reactive jamming attack and the jammer’s carrier sensing process by letting
the receiver broadcast a trigger signal. Both the jammer and the sender record the timestamp
of detecting the trigger, $t_{trig}$. The sender then sets its transmission start time to
$t_{send} = t_{trig} + t_{\Delta 1}$, and the jammer sets its jamming start time to $t_{jam} = t_{trig} + t_{\Delta 2}$.
The reactive jammer’s reaction time is therefore equivalent to $(t_{\Delta 2} - t_{\Delta 1})$.
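The trigger-based emulation can be illustrated with a minimal sketch. It assumes that the sender and the jammer detect the trigger at the same instant; the class and field names are hypothetical and do not come from the prototype.

```python
from dataclasses import dataclass


@dataclass
class TriggerSchedule:
    """Hypothetical container for the emulated reactive-jamming timeline."""
    t_trig: float      # time at which the trigger is detected (seconds)
    delta_send: float  # sender offset t_delta1 (seconds)
    delta_jam: float   # jammer offset t_delta2 (seconds)

    @property
    def t_send(self) -> float:
        # Sender starts transmitting t_delta1 after the trigger.
        return self.t_trig + self.delta_send

    @property
    def t_jam(self) -> float:
        # Jammer starts jamming t_delta2 after the trigger.
        return self.t_trig + self.delta_jam

    @property
    def reaction_time(self) -> float:
        # Emulated reaction time: t_jam - t_send = t_delta2 - t_delta1.
        return self.t_jam - self.t_send


schedule = TriggerSchedule(t_trig=1.0, delta_send=0.002, delta_jam=0.0025)
print(schedule.reaction_time)
```

Choosing the two offsets independently lets the experimenter sweep the emulated reaction time without implementing real carrier sensing at the jammer.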
4.6 Evaluation
In this section, we demonstrate the jammer’s ability to disable the MIMO IC mechanism,
and we evaluate the performance of our defense mechanisms in an indoor lab
environment. In our experiments, we first show how the received signal direction affects the
Parameter               Value
Carrier Frequency       2.4512GHz
Modulation Type         BPSK
Transmit Amplitude      1
Transmit Gain           30dB
Receive Gain            30dB
OFDM FFT Length         64
OFDM Occupied Tones     48
OFDM CP Length          64
Table 4.1: Default system setup
packet delivery performance. Then, we present our measured channel coherence time in the
indoor environment and discuss how it will affect the performance of our defense mecha-
nisms. The performance of jamming attack and defense mechanisms is evaluated using a
testbed under different bandwidth settings, different jamming powers and different types of
jamming signals. Finally, we provide an overhead analysis of our defense mechanisms. The
default communication system parameters are listed in Table 4.1.
4.6.1 Impact of Received Signal Direction
We argued in Section 4.3 that the angle between two received signal directions will affect the
decoding performance using IC. In this section, we will show the packet delivery performance
with respect to different angles. We set up two clients synchronized by a MIMO cable,
together with a two-antenna receiver. Then, two clients transmit different streams to the
receiver. The receiver applies IC technique to decode one of the streams by regarding the
other stream as interference from the jammer. We mentioned that the signal direction
is determined by the channels between the transmitter and the receiver in Section 4.2.3.
Although the channel evolves over time, we observe that the angle remains relatively stable
for the time being, given the fixed locations of clients and receiver. Then, we change the
locations of the clients and receiver to measure the packet delivery performance with different
angles between two received signals. We fix the distance between the clients and receiver, so
Figure 4.9: Packet delivery rate performance with different angles between two received signals (x-axis: angle between two clients in degrees, 0 to 90; y-axis: packet delivery rate, 0 to 1)
that the performance variation among different cases is mainly induced by different angles,
rather than different path losses.
We show the performance measurement in Fig. 4.9, from which we can see the angle between
two received signals indeed affects the packet delivery performance significantly. The major
observation is that the PDR falls below 20% once the angle becomes smaller than 20◦, while
the PDR rises above 90% once the angle exceeds 60◦. This result confirms our
analysis.
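To make the notion of an angle between two received signal directions concrete, the sketch below uses one common definition: the principal angle between two complex channel vectors observed at a two-antenna receiver. This definition is an assumption for illustration and may differ in detail from the one in Section 4.2.3; the function name is hypothetical.

```python
import math


def angle_between_deg(h_a, h_b):
    """Angle in degrees between two complex channel vectors.

    Uses cos(theta) = |h_a^H h_b| / (||h_a|| * ||h_b||), so the result
    lies in [0, 90] and ignores a common phase rotation or scaling.
    """
    inner = sum(a.conjugate() * b for a, b in zip(h_a, h_b))
    norm_a = math.sqrt(sum(abs(a) ** 2 for a in h_a))
    norm_b = math.sqrt(sum(abs(b) ** 2 for b in h_b))
    # Clamp to 1.0 to guard against rounding slightly above 1.
    cos_theta = min(1.0, abs(inner) / (norm_a * norm_b))
    return math.degrees(math.acos(cos_theta))


# Orthogonal directions: interference is easiest to cancel.
print(angle_between_deg([1, 0], [0, 1]))
# Nearly aligned directions: cancelling one stream removes most of the other.
print(angle_between_deg([1, 1j], [2, 2j]))
```

Under this definition, the measurements in Fig. 4.9 correspond to poor delivery when the computed angle is below roughly 20 degrees and good delivery above roughly 60 degrees.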
4.6.2 Impact of Channel Coherence Time
The channel coherence time determines how often the channel estimation should be updated
and the validity period of the rotation vector. In this section, we measure the channel
coherence time in an indoor environment.
We let a sender transmit consecutive known OFDM symbols following a preamble to track
the channel variations. The receiver uses these known OFDM symbols to estimate the chan-
nel coefficients, and examines how long the channel from the sender to the receiver remains
correlated. Each channel coefficient is a complex number with amplitude and phase values.
We investigate multiple subcarriers over several rounds. Fig. 4.10 shows the autocorrelation
of channel phase over multiple subcarriers. The channel phase correlates over multiple OFDM
symbols before it becomes uncorrelated (i.e., the autocorrelation value becomes zero [119]).
The number of correlated OFDM symbols varies with subcarriers, with the average number
of 33. On the other hand, the channel amplitude stays more stable over multiple OFDM
symbols, whose autocorrelation value shows correlation over 500 OFDM symbols. Therefore,
the channel coherence time in our experimental environment is nearly 33 OFDM symbols
or 8.5ms, which indicates that the channel estimation should be updated at least every 30
OFDM symbols, nearly 200 bytes under 500KHz bandwidth, or nearly 400 bytes under
1MHz bandwidth. Therefore, the pilots should be inserted at least once every 100 (200)
bytes of data under 500KHz (1MHz) bandwidth, because the estimation of the sender’s
and jammer’s channels is updated alternately every other pilot as shown in Section 4.4.2.
This result also tells us that the rotation vector remains valid for about 33 OFDM symbol times,
after which it expires.
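The figures in this paragraph can be reproduced from the parameters in Table 4.1 with a short calculation. The sketch assumes that the sample rate equals the configured bandwidth and that each of the 48 occupied tones carries one BPSK bit; the constant and function names are illustrative only.

```python
FFT_LEN = 64         # OFDM FFT length (Table 4.1)
CP_LEN = 64          # cyclic prefix length (Table 4.1)
OCCUPIED_TONES = 48  # occupied tones (Table 4.1)
BITS_PER_TONE = 1    # BPSK


def symbol_duration_s(bandwidth_hz):
    # Assumption: one complex sample per 1/bandwidth seconds.
    return (FFT_LEN + CP_LEN) / bandwidth_hz


def coherence_time_ms(n_symbols, bandwidth_hz):
    return n_symbols * symbol_duration_s(bandwidth_hz) * 1e3


def bytes_per_update(update_interval_s, bandwidth_hz):
    # Data bytes that fit between two channel-estimation updates.
    n_symbols = update_interval_s / symbol_duration_s(bandwidth_hz)
    return n_symbols * OCCUPIED_TONES * BITS_PER_TONE / 8


# 33 symbols at 500KHz is about 8.45 ms, matching the measured 8.5 ms.
print(coherence_time_ms(33, 500e3))

# Updating every 30 symbols (measured at 500KHz) spans the same time at 1MHz.
interval = 30 * symbol_duration_s(500e3)
print(bytes_per_update(interval, 500e3))  # about 180 bytes ("nearly 200")
print(bytes_per_update(interval, 1e6))    # about 360 bytes ("nearly 400")
```

Halving these byte counts gives roughly the 100 (200) byte pilot interval, since the sender’s and jammer’s channel estimates are refreshed on alternate pilots.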
Note that during the jammer’s channel estimation in Section 4.4.2, we assume the jammer’s channel
remains static during the channel coherence time. However, a mobile jammer can
change his/her channel conditions in real time. Referring back to Fig. 4.4, we notice that a 10cm
change in distance produces a dissimilar channel, i.e., if the jammer moves 10cm within the
channel coherence time, not only will the jammer’s channel estimation be inaccurate, but the
jammer can also vary his/her signal directions to nullify the channel tracking. However, in this
case, the jammer must move at a speed of at least $\frac{10\,\mathrm{cm}}{8\,\mathrm{ms}} = 12.5$ m/s, or equivalently 45km/h,
making it extremely difficult to target a specific MIMO link. Reducing the
pilot interval is a remedy against a high-speed jammer. We will design experiments to
evaluate the IC performance under mobile jammers in our future work.
Figure 4.10: Autocorrelation of the channel phase in an indoor environment (tested using 500KHz bandwidth communications). Curves: 1st, 15th and 35th subcarrier autocorrelation; x-axis: number of OFDM symbols; y-axis: normalized autocorrelation value.
Case Number     Location Set (Sender, Jammer)
1               (1,2)
2               (3,7)
3 (default)     (4,5)
4               (6,8)
5               (8,9)
6               (5,9)
7               (4,8)
Table 4.2: Testbed setup
4.6.3 Jamming Attack and Defense Performance
In this section, we evaluate the performance of the jamming attack and defense mechanisms
in terms of packet delivery rate. We place the receiver at location A in Fig. 4.11. In each
run, we place the sender and jammer at the selected locations in Fig. 4.11. We run the
experiments in seven different cases, as shown in Table 4.2. We repeat each case more
than 10 times, with each run transmitting 5000 packets. The jamming signals are randomly
generated OFDM-modulated signals with similar configurations as in Table 4.1, but with
512 OFDM FFT length, 200 occupied tones and 128 CP length.
First, we present the jamming attack performance by jamming the 1 × 2 MIMO link in
Fig. 4.12, from which we can see that the PDR drops to zero in almost all seven cases in
the presence of the reactive jammer. This result shows the reactive jammer succeeds in
Figure 4.11: Testbed. The receiver is placed at A, while the sender and jammer are placed at the selected locations 1 to 9.
throttling MIMO-OFDM communications completely.
Then, we run another set of experiments to jam a 2 × 2 MIMO link. Fig. 4.13 plots the
sender’s PDR performance under different bandwidth settings. This figure also shows the
reactive jammer is very effective in degrading packet delivery performance of the MIMO links,
as none of the packets is successfully delivered to the receiver using the traditional MIMO
decoding scheme. In contrast, using our defense mechanism with IC technique, the jamming
signals can be eliminated to some extent by estimating jammer channel ratio. Therefore, the
PDR under 500KHz bandwidth can stay higher than 30%, while the exact PDR value depends
on the channel estimation accuracy and the relative angles between the received signals from
the jammer and sender. We notice that the achieved performance shows great variations
across different cases.
Finally, the PDR performance can be further improved using SSE. Both Fig. 4.13(a) and
Fig. 4.13(b) reveal that the packet delivery performance using enhanced defense mechanism
after applying SSE has been significantly improved and becomes more stable. In particular,
the jamming resilient communications achieve more than 60% PDR under 500KHz bandwidth
and more than 40% PDR under 1MHz bandwidth. Thus, we conclude that SSE can help
Figure 4.12: Packet delivery rate with and without jammer in 1× 2 link
sustain more robust OFDM communications. From Fig. 4.13(a) to Fig. 4.13(b), we note a
trend that the packet delivery performance becomes worse as the transmission bandwidth
expands. That is because higher data rate transmission is more sensitive to the burst of
interference and noise in the environment [120].
Different Jamming Signal Powers. Different jamming signal powers affect the jamming
attack and defense performance significantly. High power jamming signals will decrease SJR,
making it more difficult to cancel them out. We evaluate the PDR performance of 2 × 2
MIMO link under reactive jamming attacks with different jamming powers. We change the
jamming power by adjusting the jammer’s transmit amplitude from 0 to 1, corresponding
to the range of jamming power from 0 to 100mW. The sender’s transmit amplitude is set
as 0.5, and we place the sender and jammer according to case 3. Both Fig. 4.14(a) and
Fig. 4.14(b) show the PDR drops drastically with the increase of jamming power. Although
high power jamming signals drag down the PDR performance using IC and SSE techniques,
it is noticeable that the communication system using our defense mechanisms becomes more
robust against high power jamming attacks. Even with a jamming power that is nearly
twice the sender’s power (i.e., with transmit amplitude of 1), the enhanced defense mechanism
Figure 4.13: Jamming attack and defense performance. (a) 500KHz bandwidth; (b) 1MHz bandwidth.
with IC and SSE still achieves more than 50% PDR under 500KHz bandwidth (40% PDR
under 1MHz bandwidth). In the experiment, we find that our proposed defense mechanisms
are robust against different power levels of the jammers.
Different Types of Jamming Signals. We also evaluate the PDR performance using four
types of jamming signals: constant power signal, Gaussian noise, square signal, 100KHz sine
signal. These signals are configured to have 0.5 transmit amplitude, 30dB transmit gain,
2.4512GHz RF frequency. Fig. 4.15(a) shows the PDR performance using IC technique to
defend against different types of jamming signals. We vary the jammer’s transmit power,
and the results illustrate the effectiveness of our defense mechanism under various types
of jamming signals. Comparing the different types of jamming signals, we find that
Gaussian noise and the sine signal lower the PDR performance of our defense mechanism.
This is because the constant power signals and square signals are much easier to cancel out
than Gaussian noise and sine signals. Fig. 4.15(b) plots the PDR performance
using our enhanced defense mechanism with IC and SSE, which demonstrates the benefits
brought by the SSE technique. Our enhanced mechanism achieves an improved PDR performance
Figure 4.14: PDR performance with different jamming powers. (a) Varying jammer’s transmit amplitude under 500KHz bandwidth; (b) varying jammer’s transmit amplitude under 1MHz bandwidth. Axes: transmit amplitude (0 to 1) versus packet delivery rate; curves: without defense, defend solely with IC, defend with IC and SSE.
compared with Fig. 4.15(a) with IC technique under all four types of jamming signals. This
result proves the robustness and wide applicability of our defense mechanisms to defend
against various types of jamming signals.
Throughput Performance. Finally, we evaluate the throughput performance of
the proposed jamming resilient communication mechanism under reactive jammers. Fig. 4.16(a)
and Fig. 4.16(b) show that our enhanced defense mechanism achieves 140 Kbps under
500KHz bandwidth and 220 Kbps under 1MHz bandwidth. Without jammers, the maximum
achievable throughput under 500KHz is 187.5 Kbps, while the achievable throughput
under 1MHz is 375 Kbps. It is worth mentioning that the reactive jammer causes a
non-connectivity scenario without defense mechanisms, as shown in Fig. 4.13(a) and Fig. 4.13(b).
Therefore, considering the power and effectiveness of reactive jammers, the throughput
achieved using our defense mechanisms is very promising. In conclusion, our defense
mechanisms restore communications at an acceptable data rate under powerful reactive jamming
attacks.
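The jammer-free throughput figures quoted above are consistent with the default setup in Table 4.1, as the following sketch shows. It assumes that the sample rate equals the bandwidth, that all 48 occupied tones carry BPSK data, and that protocol overhead is ignored; the function names are illustrative.

```python
def max_throughput_kbps(bandwidth_hz, fft_len=64, cp_len=64,
                        tones=48, bits_per_tone=1):
    """Upper bound on throughput: bits per OFDM symbol / symbol duration."""
    symbol_s = (fft_len + cp_len) / bandwidth_hz
    return tones * bits_per_tone / symbol_s / 1e3


def retained_fraction(achieved_kbps, bandwidth_hz):
    """Share of the jammer-free maximum that survives under jamming."""
    return achieved_kbps / max_throughput_kbps(bandwidth_hz)


print(max_throughput_kbps(500e3))     # about 187.5 Kbps
print(max_throughput_kbps(1e6))       # about 375 Kbps
print(retained_fraction(140, 500e3))  # about 0.75 with IC and SSE
print(retained_fraction(220, 1e6))    # about 0.59 with IC and SSE
```

Seen this way, the enhanced defense retains roughly three quarters of the jammer-free maximum at 500KHz and somewhat less than two thirds at 1MHz.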
Figure 4.15: PDR performance with different types of jamming signals. (a) Using IC; (b) using IC and SSE.
Figure 4.16: Throughput performance using our defense mechanisms. (a) 500KHz bandwidth; (b) 1MHz bandwidth.
4.6.4 Overhead Analysis
We analyze the overhead for both the pilots and feedback information. In Section 4.6.2,
we measured that the channel coherence time (w.r.t. channel phase) in our experimental
environment is nearly 33 OFDM symbols or 8.5ms using 500KHz bandwidth, which indicates
that the channel estimation should be updated at least every 30 OFDM symbols. Thus, one
pilot symbol should be inserted every 15 OFDM data symbols, which takes nearly 6% of the
whole packet. On the other hand, the feedback message includes 48 rotation vectors, one
for each subcarrier in our setting. To reduce the feedback size, instead of returning
all 48 vectors, it is sufficient to return 12 vectors, since the channels of consecutive
subcarriers are rather similar. In addition, as the direction of the vector $[v_1, v_2]$ is equivalent to
that of $[1, v_2/v_1]$, we can reduce each vector to a single complex number. The
overall feedback overhead adds up to 24 bytes, or 4 OFDM symbols. Therefore, the feedback
information is also very short, occupying only a few OFDM symbols.
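The overhead figures can be checked with a few lines of arithmetic. The sketch assumes that each complex number is encoded in 2 bytes (inferred from 24 bytes for 12 values, not stated in the text) and that one OFDM symbol carries 48 BPSK bits; the function names are hypothetical.

```python
import math


def pilot_overhead(data_symbols_per_pilot=15):
    # One pilot symbol for every 15 data symbols: 1 / (15 + 1).
    return 1 / (data_symbols_per_pilot + 1)


def feedback_bytes(n_vectors=12, bytes_per_complex=2):
    # Each rotation vector [1, v2/v1] is sent as a single complex number.
    # bytes_per_complex=2 is an assumption inferred from the 24-byte total.
    return n_vectors * bytes_per_complex


def feedback_symbols(n_bytes, bits_per_symbol=48):
    # Number of OFDM symbols needed to carry the feedback message.
    return math.ceil(n_bytes * 8 / bits_per_symbol)


print(pilot_overhead())                    # 0.0625, i.e. nearly 6%
print(feedback_bytes())                    # 24
print(feedback_symbols(feedback_bytes()))  # 4
```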
4.7 Summary
OFDM is one of the most widely adopted wireless communication schemes. Despite its
popularity in the wireless field, it is vulnerable to advanced jamming attacks, especially
the powerful reactive jamming attack enabled by SDR technology. While no effective anti-
jamming solutions exist to secure OFDM communications, for the first time, we exploited
MIMO technologies to defend against such jamming attacks. We showed that such attacks
can severely disrupt MIMO-OFDM communications through controlling the jamming signal
vectors in the antenna-spatial domain. Accordingly, we proposed defense mechanisms based
on interference cancellation and transmit precoding techniques to maintain OFDM commu-
nications under reactive jamming. Our prototype experimental results demonstrated that,
while the MIMO-OFDM communication can be completely throttled by jamming attacks,
our defense mechanisms can effectively turn it into an operational scenario with considerable
throughput.
Chapter 5
Unveiling Peer-to-Peer Botnets through Dynamic Group Behavior Analysis
Advanced botnets adopt peer-to-peer (P2P) infrastructure for more resilient command and
control (C&C) without relying on a central server. Traditional signature, sandboxing, and
blacklist based detection techniques become less effective in identifying bots that communi-
cate via a P2P structure. In this chapter, we present PeerClean, a novel system that detects
P2P botnets in real time using only high-level features extracted from C&C network flow
traffic. PeerClean reliably distinguishes P2P bot-infected hosts from legitimate P2P hosts
by jointly considering flow-level traffic statistics and network connection patterns. Instead
of working on individual connection or host, PeerClean clusters hosts with similar flow traf-
fic statistics into groups. It then extracts the collective and dynamic connection pattern
of each group by leveraging a novel dynamic group behavior analysis. Compared to the
individual host-level connection patterns, the collective group patterns are more robust and
differentiable. Multi-class classification models are then used to identify different types of
bots based on the established patterns. To increase the detection probability, we further
propose to train the model with average group behavior, but to explore the extreme group
behavior for the detection. We develop a prototype system, and evaluate it on real-world
flow records from a campus network with IPv4 space of size /16. Our evaluation shows that
PeerClean is able to achieve high detection rates with few false positives in identifying P2P
botnets.
5.1 Related Work
The increasing popularity of P2P botnets has led to a vast amount of research that attempts
to track and remove them. In these works, the detection mechanisms can be classified into
two categories: host-based approaches and network-based approaches. The second category
can be subdivided into network traffic-based approaches and communication graph-based
approaches.
Network traffic-based approaches: Some related work utilized attack traffic character-
istics to identify hosts with similar abnormal network behaviors, such as spamming, port
scanning, sharing the same packet contents [121], or, having common destinations, similar
payloads and common host platforms [122]. However, these approaches can be evaded by
manipulating attacking strategies, such as using social engineering as an infection vector
instead of scanning.
Several works focused on identifying C&C traffic from the botnets. Bilge et al. [53] proposed
to use NetFlow analysis to distinguish botnet C&C servers from benign servers by extract-
ing flow-level features from the data. Wurzinger et al. [123] identify C&C by automatically
extracting signatures from bot responses after receiving commands. However, this approach
is hindered by traffic encryption. Moreover, the above approaches, which use only flow-level
statistics, are not robust enough to produce accurate detection results. Instead, PeerClean
greatly enhances the detection capability by jointly considering the flow-level traffic statistics
and network connection behaviors. Recently, Zhang et al. [124] proposed to pinpoint
stealthy malicious activities using triggering relation discovery of network events. They used
the proposed mechanism to detect DNS-based botnets. However, the performance of
differentiating P2P C&C traffic from benign P2P traffic using triggering relation discovery remains
to be seen.
Communication graph-based approaches: In [125], Coskun et al. proposed to identify
the local members of P2P bots using a mutual contacts graph. However, this method requires
starting with a captured seed bot in the network, which may not be available. [126] attempted
to identify spamming bots using large-scale graph analysis by looking for tightly connected
subgraph components. Jelasity et al. [127] argued that it is difficult to detect P2P bots
using a traffic dispersion graph (TDG), especially with a limited view of the Internet traffic at a
single AS. Most recently, Li et al. [128] proposed to detect P2P communities by identifying the
densely connected subgraphs. However, this approach only focused on a backbone network,
which requires a very large communication graph. Also, by relying solely on the connection
patterns, it may falsely include many benign hosts in the discovered P2P botnets.
P2P botnet resilience: Several related works studied the resilience of P2P botnets. [129]
provided an overview of bot structures, which shows that P2P bots with random graph networks
are highly resilient to both random and targeted defense mechanisms. Most recently, [54]
investigated an important characteristic of botnets, assortativity, and showed its impact on
network resilience and recovery. [44] provided formal models to systematize the attack
strategies against P2P botnets. However, these takedown attempts require reverse
engineering of the bot binaries to crawl or sinkhole the whole botnets, which prevents the wide
applicability of these approaches.
Figure 5.1: PeerClean system flow
5.2 Overview of PeerClean
Our primary goal is to design a detection system for the network administrators to identify
P2P bots in a monitored network. Toward that goal, we present our data-driven detection
framework, PeerClean, which exploits network flow data captured at the edge of the network.
In this section, we give a brief overview of the PeerClean system.
Figure 5.1 shows the system flow of PeerClean. The upper part of the figure describes
the training process, with inputs from two labeled data sets: one is a subset of monitored
NetFlow data that is from the labeled legitimate P2P hosts, and the other one contains
the data from the labeled P2P bots (we discuss the acquisition of training data in Section
5.5.1). For each type of legitimate P2P hosts and P2P bots, PeerClean then performs
DGBA training to extract a collection of group-level connection features aggregated from
all the hosts of this specific type, and trains an SVM classification model using the extracted
group-level features. The bottom part of the figure presents the detection process with input
of monitored NetFlow data. After identifying the P2P hosts in the network, PeerClean
carries out P2P host clustering using the statistical features of their traffic flows, and applies
DGBA detection to every cluster of interest with the goal of detecting clusters containing
bots. Finally, the refined bot identification picks out the bots from the clusters for further
processing. PeerClean can be regarded as a three-layer system, with the first-layer modules
processing every host, the second-layer modules operating on the host group/cluster, while
the third-layer modules further handling the identified bot clusters.
Input Data: The input data set consists of a training data set and a testing data set in
NetFlow format. The testing NetFlow data can be one hour (or one day) of flow traffic traces
captured at the gateway router of a campus (or enterprise, ISP) network, while the training
data set is constructed by the traffic from identified P2P bots and legitimate P2P hosts.
Specifically, the training data of P2P bots can be imported from honeynets running in the
wild, and the training data of legitimate P2P hosts can be derived via identifying legitimate
P2P traffic in the captured payload-available traffic traces using a signature matching method
[130].
P2P Host Identification: The high-speed networks generate a huge amount of NetFlow
data, which would potentially overwhelm the processing capability of our detection system.
Thus, the first step of PeerClean is to reduce the traffic volume by filtering out the hosts that
are unlikely to be related to P2P communications. Our approach is based on the observation
that the hosts engaging in P2P communications exhibit high failed connection rates mainly
caused by the high peer churn rate [131]. Therefore, we compute the percentage of failed
connections inside each time epoch (e.g. 1 hour). The hosts with failed connection rate
higher than an empirical threshold are selected as candidate P2P hosts [46]. This selection
process allows us to retain hosts engaging in P2P communications, while eliminating a vast
majority of non-P2P hosts.
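The filtering step can be sketched as follows. The record layout (source host plus a flag marking a failed connection) and the function name are assumptions made for illustration; how a failed connection is recognized in NetFlow data and the value of the empirical threshold are not specified here.

```python
from collections import defaultdict


def candidate_p2p_hosts(flows, threshold):
    """Return hosts whose failed connection rate in one epoch exceeds threshold.

    flows: iterable of (host, is_failed) pairs observed in a single epoch.
    """
    total = defaultdict(int)
    failed = defaultdict(int)
    for host, is_failed in flows:
        total[host] += 1
        if is_failed:
            failed[host] += 1
    # Keep hosts whose failure rate is strictly higher than the threshold.
    return {h for h in total if failed[h] / total[h] > threshold}


epoch = [("a", True), ("a", True), ("a", False),
         ("b", False), ("b", False), ("b", True), ("b", False)]
print(candidate_p2p_hosts(epoch, threshold=0.5))  # host "a" only
```

The threshold trades recall against the volume of data passed on to the later, more expensive stages.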
Flow Statistical Feature Extraction: Having eliminated most of the traffic from non-
P2P hosts, we extract a collection of flow statistical features from the network flows of each
candidate P2P host. We propose two sets of features, including flow size statistical features
(e.g., the average number of bytes per flow transmitted from each host) and host access
pattern features (e.g., the outgoing/incoming flow interarrival time from each host). We will
explore the motivation behind these features in Section 5.3.1.
P2P Host Clustering: The objective of host clustering is to group together the hosts with
similar flow patterns. Since P2P bots from the same botnet share the same P2P network and
communication protocol, their flow features are likely to be clustered together in the feature
space. Clustering techniques aim at finding meaningful clusters that are both compact and
well separated from each other. PeerClean applies a clustering algorithm to generate the
clusters of hosts solely based on their flow statistical features. Ideally, we will gather the
bots belonging to different types of botnets into separated and compact clusters without
including any legitimate hosts, which is often not the case in reality.
Dynamic Group Behavior Analysis: DGBA investigates the aggregated connection be-
haviors of the entire cluster of hosts. The group-level connection features include cluster
connectivity feature (i.e., how the cluster connects to the outside world), shared neighbor
feature (i.e., the shared contacts of the hosts inside the cluster), significant connection feature
(i.e., the connections contributing a significant amount of network traffic), and temporal fea-
ture (i.e, how the connection behavior evolves over time), the details of which are presented
in Section 5.3.3. We divide DGBA into two phases for training and detection respectively.
DGBA training extracts average group-level connection features from the group of P2P bots
or legitimate P2P hosts of each type, while DGBA detection examines extreme group-level
connection features of every unlabeled cluster in order to identify clusters containing P2P
bots more accurately.
Training and Classification: As network traffic pattern is often distorted by noise, Peer-
Clean trains a non-linear SVM classifier due to its robustness against noisy data, using the
group-level connection features from DGBA. After the classifier is constructed with the train-
ing data set, PeerClean applies this classifier for classifying the unlabeled clusters generated
from the testing data set. Ideally, each cluster containing a certain type of P2P bots will
be assigned a label corresponding to that specific bot type.
Refined Bot Identification: Finally, for each cluster classified as a cluster with bots, we
further identify the P2P bots inside the cluster based on their individual connection behav-
iors. We leverage a simple threshold-based approach to discriminate P2P bots from falsely
included benign users in the cluster. The identified P2P bots are then labeled and confirmed
as “infected”, calling for subsequent bot cleanup actions from the network administrators.
Detection Period: Since bot memberships change dynamically, with some bots being
cleaned up and others being newly infected, we propose to perform bot detection periodically.
PeerClean supports various configurations of detection periods, as long as bots generate
enough network flows with representative flow and connection features during that period.
In this chapter, we select one hour as the detection period in response to agile bot
infections. Specifically, PeerClean produces one SVM model for each hour of the day. Then, by
examining each hour of testing traces collected from the edge routers, PeerClean identifies
the specific types of bots existing in the network within that hour. In this manner, PeerClean
enables real-time bot detection, which supports a fast response to bot infections (i.e., a
one-hour response time in this chapter).
Studied Botnets: We give a brief introduction to the three botnets studied in this dissertation:
the Sality, Kelihos and ZeroAccess botnets. All three botnets use a P2P channel as their
primary communication channel. The Sality botnet uses an unstructured P2P network, where
peers regularly contact their neighbors to exchange new URLs for downloading malware.
The Kelihos botnet is an unstructured P2P network with a tiered infrastructure, which includes
an upper layer of centralized nodes providing commands to a middle layer of router nodes.
Nodes at the router layer are responsible for relaying messages to a lower network layer
consisting of regular P2P worker nodes. The worker nodes are hosts that do not have a
direct connection to the Internet, and they are used for carrying out malicious activities such
as sending spam, collecting email addresses, and sniffing user credentials. The ZeroAccess botnet is
another unstructured P2P botnet, which is used to download new malware payloads and to
form a network mostly involved in Bitcoin mining and click fraud, while remaining hidden
in the infected hosts using rootkit techniques.
5.3 System Design
PeerClean systematically incorporates two categories of features including flow statistical
features and network connection features. The effectiveness of PeerClean largely hinges
upon the discriminative ability of the selected features to set apart various P2P bots and
legitimate P2P hosts. In this section, we explain the rationale behind the feature
selection, and look into the strengths and weaknesses of the selected features. Meanwhile, two
machine learning techniques performing clustering and classification are described, which are
used to gather, identify and subsequently label the P2P bots.
5.3.1 Flow Statistical Features
The performance of host clustering relies on a set of carefully selected network flow fea-
tures. A common criticism of early attempts using machine learning methods over network
flow data is that the selected features were often not robust, resulting in an overfit model
to some specific features of the training set, such as a particular port or IP address used
by a bot. Dedicated bots can simply adapt their used ports and IP addresses to inval-
idate such flow analysis. To avoid the overfitting issue, we select flow features that are
both robust and distinctive among the botnets, including flow size statistical features and
host access pattern features. As some P2P botnets mostly use TCP protocol for C&C (e.g.
Kelihos), while others simply carry out UDP transmissions, we divide the traces into TCP
and UDP flow segments, and examine their flow statistical features separately. To yield
credible statistical features, we only consider the hosts with 100+ TCP/UDP outgoing flows and
100+ incoming flows during the one-hour detection period E. Note that, at this stage, we
only extract the flow features of candidate P2P hosts that survived the P2P host identification
process.
Flow Size Statistical Features: Flow size statistical features capture the flow size
distribution for both outgoing flows and incoming flows at a specific host. Let
$F_i^{(out)} = \{f_j^{(out)}\}_{j=1..m}$ and $F_i^{(in)} = \{f_j^{(in)}\}_{j=1..n}$ denote the series of flows sent from or received by
host $i$ inside $E$. We consider the basic flow size related features such as the bytes-per-flow (bpf)
feature and the packets-per-flow (ppf) feature, as shown in Table 5.1. Note that each feature
records the distribution of the flow sizes among all the outgoing (incoming) flows at the
corresponding host. In particular, we extract the means $\mu_{F_i^{(out)}}$, $\mu_{F_i^{(in)}}$ and the standard deviations
$\sigma_{F_i^{(out)}}$, $\sigma_{F_i^{(in)}}$ of bpf and ppf from the outgoing and incoming flows, respectively.
This group of features characterizes the regularity of traffic flow size over time for each host.
The reason for selecting flow size features is because the flows carrying C&C information are
preferred by the botmaster to be as short as possible to remain stealthy under the surveillance
of various network monitoring tools. Moreover, due to the limited types of C&C messages,
only a small fluctuation on these features is expected. On the other hand, legitimate P2P
applications usually generate flows with large, yet highly variable flow sizes, with a few
exceptions such as skype application, which has a small flow size when used as instant
messenger or voice-over-IP client. Therefore, flow size statistical features are promising in
differentiating bots from legitimate hosts, but it alone may not be enough to create dense
bot clusters without including a large number of legitimate hosts such as the skype hosts.
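The flow size statistics described above can be sketched in a few lines. This is a minimal illustration rather than the PeerClean implementation: the record format (a list of `(bytes, packets)` tuples per direction) and the function name `flow_size_features` are assumptions, and the population standard deviation is used because the text does not say which estimator is applied.

```python
from statistics import mean, pstdev

def flow_size_features(flows):
    """Mean and standard deviation of bytes-per-flow and packets-per-flow.

    flows: list of (bytes, packets) tuples for one direction (outgoing or
    incoming) of one host inside a detection period. Hypothetical format.
    """
    bpf = [b for b, _ in flows]
    ppf = [p for _, p in flows]
    return {
        "bpf_mean": mean(bpf),
        "bpf_std": pstdev(bpf),   # population std; the estimator is an assumption
        "ppf_mean": mean(ppf),
        "ppf_std": pstdev(ppf),
    }

# Short, regular flows, as one might expect from C&C traffic.
outgoing = [(120, 2), (120, 2), (130, 2), (110, 2)]
features = flow_size_features(outgoing)
```

The same function would be applied once to the outgoing flows and once to the incoming flows of each host.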
Feature | Description
Bytes-per-flow pattern | The distribution of the number of bytes per flow sent from (received by) a host
Packets-per-flow pattern | The distribution of the number of packets per flow sent from (received by) a host
Table 5.1: Flow size statistical features
Host Access Pattern Features: We introduce host access pattern features to capture the
flow arrival patterns. Table 5.2 lists the adopted features, including flow interarrival pattern,
flow density pattern and diurnal pattern. Assume $T_i^{(out)}$ ($T_i^{(in)}$) is a time series of
the starting times of outgoing (incoming) flows from host i inside E, based on which we can
compute a sequence of flow interarrival times $I_i^{(out)}$ ($I_i^{(in)}$) by taking the difference
between the starting times of two consecutive flows. The flow interarrival pattern feature
represents the statistical features of the flow interarrival time sequences, including the
minimum, maximum, median and standard deviation.
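As a rough sketch of the flow interarrival pattern feature, the snippet below derives the interarrival sequence from flow start times and summarizes it. The function name `interarrival_features` and the use of plain numeric timestamps are assumptions made for illustration, and at least two flows are assumed (the text only considers hosts with 100+ flows).

```python
from statistics import median, pstdev

def interarrival_features(start_times):
    """Summarize gaps between consecutive flow start times of one host.

    start_times: flow start times (seconds) for one direction inside a
    detection period; must contain at least two entries.
    """
    times = sorted(start_times)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return {
        "min": min(gaps),
        "max": max(gaps),
        "median": median(gaps),
        "std": pstdev(gaps),  # population std; the estimator is an assumption
    }

feats = interarrival_features([0.0, 1.0, 3.0, 6.0, 10.0])
```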
Different from all the aforementioned features, which are extracted inside each detection
period E, the last two types of features, flow density pattern and diurnal pattern, are
determined anew every day. We define a time unit as a three-hour period, so that one whole
day is divided into 8 time units. We denote the number of flows of a certain host in each time
unit of the day as $N_j$, $j = 1, 2, \cdots, 8$. Flow density pattern records the
fraction of time units having ≥ x flows per day, i.e., $\frac{\sum_{j=1}^{8} \sigma(N_j \geq x)}{8}$,
where $\sigma(\cdot)$ is a step function yielding one when $N_j \geq x$ holds, and zero otherwise.
In our prototype, x is empirically set as 1000. In addition, to assess whether the flow arrival
displays a diurnal pattern, we take the fractions of the flows that fall in the peak period and
in the dip period (see footnote 1) as diurnal pattern features, i.e., $\frac{N_P}{\sum_{j=1}^{8} N_j}$
and $\frac{N_D}{\sum_{j=1}^{8} N_j}$, where $N_P$ and $N_D$ denote the flow
numbers at the peak period and dip period, respectively. These two types of features are
inserted as additional features at the final hour of the day, further elevating the possibility
of differentiating various P2P bots from legitimate P2P hosts.
Footnote 1: The peak time is expressed as $P = \arg\max_j N_j$ with flow amount $N_P = \max_j N_j$, and the dip time as $D = \arg\min_j N_j$ with flow amount $N_D = \min_j N_j$, $j = 1, \cdots, 8$.
Feature | Description
Flow interarrival pattern | The distribution of the incoming (outgoing) flow interarrival time at a host
Flow density pattern | The fraction of the time units with at least x flows at a host
Diurnal pattern | The percentage of the flow numbers in the peak (dip) period of the day at a host
Table 5.2: Host access pattern features
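The two daily features can be illustrated as follows. This is only a sketch under the definitions given in the text: the function name `daily_pattern_features` and the input format (a list of eight per-unit flow counts) are hypothetical.

```python
def daily_pattern_features(unit_counts, x=1000):
    """Flow density and diurnal (peak/dip) features for one host and one day.

    unit_counts: flow counts N_1..N_8 for the eight three-hour time units.
    x: density threshold (1000 in the prototype described in the text).
    """
    if len(unit_counts) != 8:
        raise ValueError("expected exactly 8 three-hour time units")
    total = sum(unit_counts)
    if total == 0:
        raise ValueError("host has no flows on this day")
    density = sum(1 for n in unit_counts if n >= x) / 8   # fraction of busy units
    peak_share = max(unit_counts) / total                  # N_P / sum_j N_j
    dip_share = min(unit_counts) / total                   # N_D / sum_j N_j
    return density, peak_share, dip_share

counts = [200, 100, 500, 1200, 3000, 2500, 1500, 1000]
density, peak_share, dip_share = daily_pattern_features(counts)
```

For the sample counts, five of the eight units reach the threshold, the peak unit carries 3000 of 10000 flows, and the dip unit carries 100 of 10000.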
5.3.2 P2P Host Clustering
Host clustering relies on the following observation: bots that belong to the same
botnet run the same P2P communication protocol and share the same C&C messages. In
the literature, there are a wide variety of clustering methods, but a well-suited algorithm for
PeerClean should be cautiously selected, because: first, the clustering algorithm for gathering
hosts with similar flow patterns not only determines the subsequent bot detection, but also
affects the system efficiency; second, P2P host clustering in a network with bots and benign
hosts is a challenging task, since the percentage of bots in the network is generally small
compared with the benign hosts. Thus, the clustering objective is to separate the small
number of P2P bots from a large number of benign P2P hosts. In this respect, partition-based
clustering methods suit our problem well [132].
Affinity Propagation (AP) is a recently proposed partition-based clustering method by Frey
and Dueck [133]. Compared with K-means, one of the most popular clustering methods
[134], the performance of AP does not rely on an initial selection of exemplars (see footnote 2)
or cluster centers. Rather than requiring the number of clusters to be specified, AP can
automatically determine it solely based on the data. Whereas K-means clustering follows a greedy
heuristic to find the optimum of a combinatorial optimization problem, which is prone to local
minima, AP considers all data points as potential exemplars and tackles the optimization problem
distributively by exchanging messages between pairs of points until the clusters gradually
emerge. Thus, AP provides a guarantee of quasi global optimization [133].
Footnote 2: An exemplar is the cluster center that best accounts for the data in the cluster [133].
The similarity $s(i, k)$ of AP indicates how well data point $x_k$ is suited to be the exemplar
of data point $x_i$. With the goal of minimizing squared error, we use the negative squared
Euclidean distance as the similarity measure, i.e., $s(i, k) = -\|x_i - x_k\|^2$ [133] (the
objective turns into maximizing $s(i, k)$). Since unsupervised learning is a notoriously difficult
task, a perfect clustering result is unlikely. Consequently, besides several
clearly separated bot clusters (i.e., clusters of bots) and benign clusters (i.e., clusters of benign
hosts), we expect some clusters to include both benign hosts and bots, which we call mixed
clusters. For ease of exposition, the bot clusters and mixed clusters are collectively called
bot-included clusters. In the following section, we show how we use supervised learning
to identify and further examine bot-included clusters, as well as the method of spotting bots
inside them.
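The similarity measure above can be written down directly. The sketch below only builds the AP input matrix and does not run the message passing itself; the function name `similarity_matrix` is hypothetical, and filling the diagonal (the AP "preference") with the median of the off-diagonal similarities is an assumption taken from a common default for AP, not something stated in the text.

```python
from statistics import median

def similarity_matrix(points):
    """Negative squared Euclidean distances, s(i, k) = -||x_i - x_k||^2.

    The diagonal holds the AP preference, set here to the median of the
    off-diagonal similarities (an assumed default).
    """
    n = len(points)
    s = [[-sum((a - b) ** 2 for a, b in zip(points[i], points[k]))
          for k in range(n)] for i in range(n)]
    off_diagonal = [s[i][k] for i in range(n) for k in range(n) if i != k]
    preference = median(off_diagonal)
    for i in range(n):
        s[i][i] = preference
    return s

s = similarity_matrix([(0, 0), (3, 4), (0, 1)])
```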
5.3.3 Dynamic Group Behavior Analysis
In this section, we introduce DGBA with the objective of identifying bot-included clusters.
DGBA is based on our intuition that the bot-included clusters have cluster-level aggregated
characteristics that are distinguishable from benign clusters. Whereas the connection activity
of an individual host is highly dynamic and unidentifiable, we believe the group connection
behavior will help us identify bots’ communications.
Figure 5.2: Cluster connectivity feature
To enhance the detection capability, we propose two modules: DGBA training and DGBA
detection, to extract features from the training set and testing set respectively. The purpose
of DGBA training is to extract the representative group behavior from a collection of labeled
P2P hosts to build SVM classifiers, whereas DGBA detection searches for the abnormal
behaviors from every unlabeled cluster to spot P2P bots. Thus, we propose to use different
statistical aspects of the collected features from a group to represent group-level training
and detection features, respectively. Namely, the training features capture the average group
behavior, while the detection features capture the extreme group behavior (i.e. the maximum
or the minimum). Note that all the features below are extracted from the collection of traces
inside each detection period if not otherwise stated.
Cluster Connectivity Feature
Cluster connectivity feature captures the aggregated connectivity of the peers inside each
cluster. A connection between two hosts can be successful or failed. We define a good
connection as a successfully established connection between two hosts, one inside the cluster
and one outside it. We consider a TCP connection as good if it completes the
SYN, SYN/ACK, ACK handshake, and a UDP connection as good if at least one
request packet is followed by a response packet. We denote the good connection set of host
$h_i$ as $C_i$, which includes all the successful connections of host $h_i$.
Training feature: Cluster connectivity feature for DGBA training is defined as the average
number of good connections among all the P2P hosts of each type, i.e., $\sum_{i=1}^{M} |C_i| / M$,
assuming M hosts of one specific type exist in the training set.
In order to see the discriminatory strength of this feature, we run an experiment using 24-
hour training data (refer to Section 5.5.1 for the data sets used in the experiment) to show the
cluster connectivity features of different P2P bots and legitimate hosts running various P2P
applications. The box-plot results (measured in 24 hours) are shown in Fig. 5.2, from which
we notice that different types of P2P hosts indeed exhibit varied cluster connectivity features. In
particular, the ZeroAccess bot stands out with a significantly larger number of good connections.
We attribute the difference to several factors including: (1) the botnet network size; (2) the
botnet peer discovery mechanisms. For instance, the bots in a populous network with a
more aggressive peer discovery mechanism are expected to have more network connections.
Among all six types of P2P hosts, the ZeroAccess bot makes many more good connections than
any other bots and benign hosts, suggesting its larger network size and aggressive peer
discovery [44].
Detection feature: Cluster connectivity feature for DGBA detection is defined as the maximal
number of good connections among all the hosts in the unlabeled cluster, i.e.,
$\max_{i=1}^{M'} |C_i|$, assuming $M'$ hosts in the cluster. Fig. 5.2 shows a notable gap between
ZeroAccess bots and other types of hosts, which means ZeroAccess bots can be detected solely
based on the cluster connectivity feature. Specifically, since the detection feature corresponds
to a maximum connection number among $M'$ hosts, the connectivity feature of a cluster containing
ZeroAccess bots will be exceptionally high, which immediately reveals the presence of ZeroAccess
bots.
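The connectivity definitions above can be sketched as follows. The event-list representation of a connection and all function names are assumptions made for illustration; a real flow collector would expose this information differently.

```python
def is_good_connection(proto, events):
    """Decide whether a connection counts as good.

    proto: "tcp" or "udp".
    events: ordered list of observed packet labels (hypothetical format),
    e.g. ["SYN", "SYN/ACK", "ACK"] or ["request", "response"].
    """
    if proto == "tcp":
        # Good if the connection starts with a complete three-way handshake.
        return events[:3] == ["SYN", "SYN/ACK", "ACK"]
    if proto == "udp":
        # Good if some request is later followed by a response.
        if "request" not in events:
            return False
        return "response" in events[events.index("request") + 1:]
    return False

def connectivity_training_feature(good_sets):
    """Average |C_i| over the M labeled hosts of one type."""
    return sum(len(c) for c in good_sets) / len(good_sets)

def connectivity_detection_feature(good_sets):
    """Maximum |C_i| over the M' hosts of one unlabeled cluster."""
    return max(len(c) for c in good_sets)

good_sets = [{"a", "b", "c"}, {"a"}, {"b", "c", "d", "e", "f"}]
avg_conn = connectivity_training_feature(good_sets)
max_conn = connectivity_detection_feature(good_sets)
```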
Figure 5.3: (a) The shared neighbor ratio of one Emule host pair compared with that of one ZeroAccess host pair; (b) Group shared neighbor ratio
Shared Neighbor Feature
Shared neighbor feature captures the number of shared connections between every pair of
hosts in each cluster. The set of shared neighbors of hosts $h_i$ and $h_j$ can be written as
$C_i \cap C_j$. We further define the pairwise shared neighbor ratio of a host pair as the ratio
between the number of shared neighbors and the number of total neighbors, i.e.,
$s_{ij} = |C_i \cap C_j| / |C_i \cup C_j|$ for a host pair $(h_i, h_j)$.
Training feature: Given the above definitions, shared neighbor feature for DGBA training
is represented by the cluster shared neighbor ratio, simply defined as the average pairwise shared
neighbor ratio among all the host pairs of one type, i.e.,
$\sum_{i,j \in [1,M],\, i \neq j} \{ s_{ij} / \frac{M(M-1)}{2} \}$, with each unordered host pair
counted once. Previous work has adopted the pairwise shared neighbor ratio $s_{ij}$ [47] to
distinguish between bots and benign hosts. However, according to our experiment, the pairwise
shared neighbor ratio seems ineffective in identifying certain pairs of P2P bots. In Fig. 5.3(a),
we compare the pairwise shared neighbor ratio of an emule host pair (downloading the same file)
with that of a ZeroAccess bot pair. We find it almost impossible to make a distinction between
these two pairs, which leads to false positives or false negatives. In contrast, the group shared
neighbor ratio clearly differentiates ZeroAccess bots from the emule hosts with a large gap
between them, via feature aggregation from multiple hosts.
In addition, Fig. 5.3(b) shows that different types of P2P bots and P2P hosts exhibit
distinguishable shared neighbor features measured in 24 hours, where we observe that P2P bots have
much higher group shared neighbor ratios compared with legitimate P2P hosts. The reason
is that the bots from the same botnet search for the same commands published
by the botmaster [47], which makes their contacted peers more likely to be shared by other
companions. Furthermore, although P2P botnets have a decentralized C&C architecture,
botmasters still strive to make their P2P network robust against peer churn and provide
end-to-end communication with a minimum delay. This inherent C&C objective translates
into a convergence of contacted peers by a group of bots to ensure the reliable delivery
of C&C messages. On the other hand, different legitimate P2P hosts generally search for
different contents from their peers, which yields a more dispersed peer list.
Detection feature: Correspondingly, shared neighbor feature for DGBA detection is defined
as the maximal pairwise shared neighbor ratio among all the host pairs in each cluster,
i.e., $\max_{i,j \in [1,M'],\, i \neq j} s_{ij}$. The shared neighbor feature of every bot-included
cluster will again be dominated by the bots, since bots have significantly higher shared neighbor
ratios, which helps differentiate between the bot-included clusters and benign clusters.
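A small sketch of the shared neighbor features follows, assuming each host's neighbors are available as a Python set. The function names are hypothetical, and returning 0.0 for two hosts with no neighbors at all is an assumption the text does not address.

```python
from itertools import combinations

def shared_neighbor_ratio(ci, cj):
    """Pairwise ratio s_ij = |C_i ∩ C_j| / |C_i ∪ C_j|."""
    union = ci | cj
    return len(ci & cj) / len(union) if union else 0.0  # empty case is an assumption

def pairwise_ratios(neighbor_sets):
    """Ratios for every unordered host pair in a group."""
    return [shared_neighbor_ratio(a, b) for a, b in combinations(neighbor_sets, 2)]

def shared_neighbor_training_feature(neighbor_sets):
    """Average pairwise ratio over the M(M-1)/2 pairs of one host type."""
    ratios = pairwise_ratios(neighbor_sets)
    return sum(ratios) / len(ratios)

def shared_neighbor_detection_feature(neighbor_sets):
    """Maximum pairwise ratio inside one unlabeled cluster."""
    return max(pairwise_ratios(neighbor_sets))

hosts = [{1, 2, 3, 4}, {3, 4, 5, 6}, {7, 8}]
avg_ratio = shared_neighbor_training_feature(hosts)
max_ratio = shared_neighbor_detection_feature(hosts)
```

In the sample, only the first two hosts share neighbors (2 of 6), so the maximum is 1/3 while the average over the three pairs is 1/9.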
Significant Connection Feature
Significant connection (SC) feature captures the amount of hot links in the network, i.e.,
the connections that contribute significantly larger amounts of network flows compared with
the other connections. The SCs extracted from the Internet traffic data have been used to
diagnose the network operation and quickly identify the anomalous events [135]. Similarly, we
try to identify SCs of bot groups for better understanding the bots’ behaviors and accurately
identifying bots’ presence.
Background of Entropy-based Approach: In information theory, entropy quantifies
the amount of uncertainty contained in the data. Given a random variable $R$ that may take
$n_R$ discrete values, suppose we sample $R$ $n$ times and observe that $R$ takes the value
$r_i$ $n_i$ times. This produces a probability distribution for $R$, i.e., $P(r_i) = n_i/n$,
$r_i \in R$. The entropy of $R$ is defined as:
$$H(R) := -\sum_{r_i \in R} P(r_i) \log P(r_i),$$
where $H(R)$ is a function of the support size $n_R$ and the sample size $n$. To eliminate the
dependency on support and sample sizes, standardized entropy is proposed to provide a robust
quantifier for data variety or uniformity [135]:
$$H_s(R) := \frac{H(R)}{\log \min\{n_R, n\}},$$
where $H_s(R) = 0$ means that all the observations of $R$ have the same value, thus no data
variety exists, while $H_s(R) = 1$ means all the observed values of $R$ are different. Note that
in this chapter, $n \ll n_R$, such that $H_s(R) = H(R)/\log n$.
Let $C$ be the set of all the observed values of $R$; then $H_s(R) = 1$ holds if and only if
$|C| = n$ and $P(r_i) = 1/n$ for $r_i \in C$, which means the observed values are uniformly
distributed over $C$. Thus, $H_s(R)$ provides a measure for the data uniformity in the observed
value set $C$. Similar to conditional entropy, conditional standardized entropy $H_s(R|C)$ is
derived by conditioning $R$ on $C$. As $H(R|C) = H(R)$, we have $H_s(R|C) = H(R)/\log|C|$. Hence,
as $H_s(R|C)$ gets close to 1, the observed values are more uniformly distributed over
the set $C$, or less distinguishable from one another. On the other hand, $H_s(R|C) \ll 1$
indicates that some values are more frequently observed. This measure is used to define
significant connections as follows.
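The entropy measures above can be checked numerically with a short sketch. The function names are hypothetical, natural logarithms are used (the base cancels in the ratio), and returning 0.0 when the denominator vanishes (a single sample or a single distinct value) is an assumption for the degenerate case.

```python
import math
from collections import Counter

def entropy(samples):
    """Empirical entropy H(R) from a list of observed values."""
    n = len(samples)
    return -sum((c / n) * math.log(c / n) for c in Counter(samples).values())

def standardized_entropy(samples, support_size):
    """H_s(R) = H(R) / log min{n_R, n}."""
    denom = math.log(min(support_size, len(samples)))
    return entropy(samples) / denom if denom > 0 else 0.0

def conditional_standardized_entropy(samples):
    """H_s(R|C) = H(R) / log |C|, with C the set of observed values."""
    distinct = len(set(samples))
    return entropy(samples) / math.log(distinct) if distinct > 1 else 0.0

all_different = standardized_entropy(["a", "b", "c", "d"], support_size=100)
all_same = standardized_entropy(["a"] * 5, support_size=100)
skewed = conditional_standardized_entropy(["a", "a", "a", "b"])
uniform = conditional_standardized_entropy(["a", "b", "a", "b"])
```

Here `all_different` evaluates to 1 and `all_same` to 0 (up to rounding), matching the two extreme cases described in the text, while the skewed sample falls strictly between them.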
Significant Connection Feature Extraction: Significant connection feature is quantified
by the conditional standardized entropy, and extracted from the traffic flows of every host.
Suppose a random variable $R$ denotes the connections made by a host in the cluster. Given a
detection period E, let $N$ be the total number of flows observed from/towards the host, and
$C = \{c_1, c_2, \ldots, c_m\}$, $m \geq 2$, be the set of distinct connections in $R$ that the
observed flows take. Then the induced probability distribution $P_C$ on $R$ is given by
$P_C(c_i) = N_i/N$, where $N_i$ is the number of flows from the connection $c_i$. Based on $P_C$,
the conditional standardized entropy, $H_s(P_C) := H_s(R|C)$, measures the degree of uniformity in
the observed features $C$. In particular, with $H_s(P_C)$ getting close to 1, the observed values
in $C$ are close to being uniformly distributed, or indistinguishable from one another. Otherwise,
some values in $C$ “jump out” from the rest.
To extract significant connections, we employ a dynamic threshold algorithm [135] to separate
$C$ into two sets: the significant connection set $C_s$ and the indistinguishable connection set
$C_i$. In this algorithm, $C_s$ becomes the most significant connection set under two conditions:
(1) $C_s$ is the smallest subset of $C$ such that the probability of any value in $C_s$ is larger
than that of the remaining values; (2) the values in $C_i$ are nearly uniformly distributed,
i.e., $H_s(P_{C_i}) > \lambda$, where $\lambda$ is the significance threshold that is close to one
(e.g., 0.9).
We first sort the connections in $C$ in decreasing order of their probabilities $P_C$ as
$C = \{c_1, c_2, \ldots, c_m\}$. Then, the dynamic threshold algorithm finds
$C_s = \{c_1, c_2, \ldots, c_k\}$ and $C_i = \{c_{k+1}, \ldots, c_m\}$, where $k$ is the smallest
integer such that $H_s(P_{C_i}) > \lambda$. Thus, $c^* = c_k$ is called the cut-off threshold, beyond
which the probability distribution of the remaining connections becomes almost uniform.
Interested readers may refer to [135] for the complete algorithm design.
Figure 5.4: (a) Significant connection feature; (b) Significant connection feature of Kelihos and ZeroAccess bots in 24 hours
Finally, the significant connection feature is simply the number of connections in Cs, i.e.
|Cs|.
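The extraction of significant connections can be sketched as a linear scan over the sorted connections. This is a simplified reading of the description above, not the dynamic threshold algorithm of [135] itself: the function names, the dictionary input, the choice to allow an empty significant set (k = 0), and treating a remainder of fewer than two connections as uniform are all assumptions.

```python
import math

def cond_std_entropy(counts):
    """Conditional standardized entropy of a list of per-connection flow counts."""
    if len(counts) < 2:
        return 1.0  # assumption: a single remaining connection counts as uniform
    total = sum(counts)
    h = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return h / math.log(len(counts))

def significant_connections(flow_counts, lam=0.9):
    """Return the connections in C_s for one host.

    flow_counts: dict mapping a connection identifier to its flow count N_i.
    lam: significance threshold (0.9 is the example value in the text).
    """
    ranked = sorted(flow_counts.items(), key=lambda kv: kv[1], reverse=True)
    for k in range(len(ranked)):
        remainder = [n for _, n in ranked[k:]]
        if cond_std_entropy(remainder) > lam:
            # Smallest k whose remainder is nearly uniform.
            return [conn for conn, _ in ranked[:k]]
    return [conn for conn, _ in ranked]

skewed_host = {"a": 900, "b": 20, "c": 20, "d": 20, "e": 20, "f": 20}
flat_host = {"a": 10, "b": 10, "c": 10}
sc_skewed = significant_connections(skewed_host)
sc_flat = significant_connections(flat_host)
```

The SC feature of each host is then the length of the returned list; a host with evenly spread flows yields no SCs under this reading.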
Training feature: We define the SC feature for DGBA training as the average number of
SCs for all the hosts of one type. Fig. 5.4(a) shows the SC features of Sality bots and three
other types of P2P hosts measured in 24 hours. Compared with Sality bots, these legitimate
P2P hosts produce a larger number of SCs.
We attribute this distinctive phenomenon to the following fact: the SCs in a
botnet indicate the existence of some active bots that are pivotal to the P2P infrastructure.
These active bots may be well connected with a high bandwidth connection, or may be close
to the botmaster. A small number of distinctive connections helps the botnets remain
stealthy under the radar of numerous intrusion detection systems. Conversely, benign P2P
hosts yield a much higher number of SCs due to their unorganized nature.
Interestingly, the traffic flows from ZeroAccess and Kelihos bots reveal unique SC patterns,
as shown in Fig. 5.4(b). ZeroAccess bots simply have no SCs, while Kelihos bots suddenly
generate a large number of SCs during a “hot” period from 7pm to 1am. This period
can be interpreted as a peak period of C&C message exchange, with many suddenly
emerging hot links. The study of this abrupt phenomenon and the exact origin of the SCs of
botnets is beyond the scope of this chapter, but may become research topics in their own right.
Figure 5.5: Significant connection volatility
Detection feature: Among all the hosts in the cluster, SC feature for DGBA detection is
defined as the minimal number of SCs, or the maximal number if it exceeds an empirical
threshold α. Thus in most cases, the SC feature of a bot-included cluster will be dominated by
the bots with fewer SCs. However, the number of SCs of Kelihos bots skyrockets during the
“hot” period, which far exceeds that of the normal hosts. Hence, by setting an appropriate
threshold α (e.g., 200), we ensure that the group-level feature of a cluster containing Kelihos
bots will be ruled by the bots’ behavior, which reveals the existence of Kelihos bots.
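The min-or-max rule for the SC detection feature is compact enough to state in code. The function name `sc_detection_feature` is hypothetical, and the default `alpha=200` simply reuses the example value from the text.

```python
def sc_detection_feature(sc_counts, alpha=200):
    """Cluster-level SC detection feature.

    sc_counts: number of SCs of each host in one cluster.
    Returns the maximum if it exceeds alpha (Kelihos-like bursts),
    otherwise the minimum (bots with few SCs dominate).
    """
    largest = max(sc_counts)
    return largest if largest > alpha else min(sc_counts)

quiet_cluster = sc_detection_feature([35, 40, 5])     # a host with very few SCs
bursty_cluster = sc_detection_feature([35, 40, 450])  # a host in a "hot" period
```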
Temporal Feature
Lastly, temporal feature captures the dynamic evolution of SC sets. Instead of performing
feature extraction per one-hour detection period, temporal features are computed at the end
of each day to combat noise and disturbance. They are represented by significant connection
volatility, which measures whether the cluster has the same set of SCs over time. We assume the
number of distinct SCs for host $h_i$ over the whole day is $U_i$, and the number of SCs during
the k-th hour is $S_{ik}$, $i \in \{1, \ldots, M\}$, $k \in \{1, \ldots, 24\}$. SC volatility of
host i is defined as: $\Phi_i = \frac{U_i}{\sum_{k=1}^{24} S_{ik}}$. If the SC sets of the 24 hours
are pairwise disjoint, we have $\Phi_i = 1$. On the contrary, when the same set of SCs appears
every hour, we have $\Phi_i = 1/24$. In general,
the less volatile the set of SCs is, the smaller $\Phi_i$ is.
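The two boundary cases of SC volatility can be verified with a short sketch. The function name `sc_volatility` and the list-of-sets input are hypothetical, and returning 0.0 for a host with no SCs during the day is an assumption, since the ratio is undefined in that case.

```python
def sc_volatility(hourly_sc_sets):
    """SC volatility: distinct SCs over the day divided by the sum of hourly SC counts.

    hourly_sc_sets: list of 24 sets, one per hour, holding that hour's SCs.
    """
    total = sum(len(s) for s in hourly_sc_sets)
    if total == 0:
        return 0.0  # assumption: no SCs at all is treated as non-volatile
    distinct = len(set().union(*hourly_sc_sets))
    return distinct / total

stable = sc_volatility([{"p1", "p2"} for _ in range(24)])   # same SCs every hour
volatile = sc_volatility([{hour} for hour in range(24)])    # new SC every hour
```

The training feature is then the mean of these values over the hosts of one type, and the detection feature is the minimum over the hosts of a cluster.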
Training feature: The temporal feature for DGBA training is represented by the average
SC volatility of all the hosts of the same type, expressed as: $\frac{1}{M} \sum_{i=1}^{M} \Phi_i$.
Fig. 5.5 shows different temporal features for various P2P bots and legitimate P2P hosts. We
notice that Sality and ZeroAccess bots have small volatility features, while emule hosts and
Kelihos bots have a moderate value of SC volatility. SC volatility is related to a number of
factors, including the total number of SCs, the size of P2P networks and how dynamic the network
connections are. Generally speaking, the SCs of bots are less volatile than those of the benign
hosts, because the pivotal points of the botnets are required to be extremely robust and
reliable to support network-wide C&C communication, and thus are less likely to be volatile over
time.
Detection feature: The temporal feature for DGBA detection captures the minimal SC
volatility of all the hosts in the cluster, i.e., $\min_i \Phi_i$. Therefore, the temporal feature
of a bot-included cluster will be determined by the bots, whose SCs appear less volatile. In the
end, a smaller value of the temporal feature reveals the presence of bots (e.g., Sality or
ZeroAccess), while a larger value represents the temporal feature of benign hosts (e.g., Skype or
Bittorrent).
5.3.4 Training and Classification
Data Preprocessing: Data preprocessing tries to cope with the issue that the collective
features extracted from the network flow data have different data ranges. To make sure every
feature in the feature sets is given equal importance, we perform feature-wise normalization
to shift and re-scale each feature value so that they lie within the range of [0, 1].
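Feature-wise normalization to [0, 1] can be sketched as plain min-max scaling. The text does not name the exact scaling used, so min-max scaling, the function name `min_max_normalize`, and mapping a constant feature to 0.0 are assumptions.

```python
def min_max_normalize(rows):
    """Rescale each feature (column) of a list of feature vectors to [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0  # constant feature -> 0.0 (assumption)
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

normalized = min_max_normalize([[0, 10], [5, 20], [10, 30]])
```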
Multi-class Classification: Support vector machine (SVM) is adopted as our main classification
method due to its robustness, efficiency and excellent non-linear classification performance.
In particular, we use multi-class SVM classification to assign each cluster one label
corresponding to a specific type of botnet or a non-bot host. We denote the multiple labels as
$\{B_1, B_2, \ldots, B_k\}$, assuming $k - 1$ classes of botnets with the last class representing
the non-bot label. The basic component of the SVM method is a binary classification mechanism,
which classifies an unlabeled cluster based on the distance of its feature to the decision
hyperplane with normal vector $\mathbf{w}$ and constant $b$:
$$f(\mathbf{x}) = \mathbf{w}^T \mathbf{x} + b = \sum_{\forall i} y_i \alpha_i K(\mathbf{x}_i, \mathbf{x}) + b, \qquad (5.1)$$
where $\mathbf{x}_i$ is the feature vector of host i from the training set, $y_i \in \{-1, 1\}$
denotes the label of the training data, and the parameter $\alpha_i$ determines whether the host i
is a support vector ($\alpha_i > 0$) or not ($\alpha_i = 0$). The feature vector $\mathbf{x}_i$ is
transformed into a higher dimensional space by a non-linear kernel function
$K(\mathbf{x}_i, \mathbf{x})$.
Two-class SVM determines $\mathbf{w}$ and $b$ by searching for the optimal hyperplane that
separates the feature space into two parts. This is also termed a maximum margin approach, since
the objective is to maximize the distance between the training data and the decision hyperplane.
The multi-class SVM model is built by combining multiple two-class SVM models. For a K-class
SVM model (K > 2), we use the “one-versus-one” approach [136], in which K(K − 1)/2
classifiers are trained on all possible pairs of classes, and then a voting strategy is used to
classify the clusters according to which class has the highest number of votes. Note that the
training of the classifiers employs DGBA training features, while the voting-based classification
relies on DGBA detection features. Finally, the clusters labeled as specific types of botnets
are deemed bot-included clusters, demanding further inspection.
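The decision function of Eq. (5.1) and the one-versus-one voting step can be illustrated with a toy sketch. Nothing here is trained: the RBF kernel, the function names, and the three hand-written binary decision functions are all assumptions standing in for classifiers that a real SVM library would fit.

```python
import math
from collections import Counter

def rbf_kernel(u, v, gamma=1.0):
    """Gaussian kernel; the kernel choice is an assumption."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def decision_value(x, support_vectors, labels, alphas, b, kernel=rbf_kernel):
    """f(x) = sum_i y_i * alpha_i * K(x_i, x) + b, as in Eq. (5.1)."""
    return sum(y * a * kernel(sv, x)
               for sv, y, a in zip(support_vectors, labels, alphas)) + b

def one_vs_one_predict(x, binary_models):
    """Vote over K(K-1)/2 pairwise classifiers.

    binary_models: dict mapping (positive_class, negative_class) to a
    decision function f; f(x) >= 0 votes for the positive class.
    Ties are broken by the first class to reach the top vote count.
    """
    votes = Counter()
    for (positive, negative), f in binary_models.items():
        votes[positive if f(x) >= 0 else negative] += 1
    return votes.most_common(1)[0][0]

# Hand-written stand-ins for trained pairwise classifiers on a 1-D feature.
models = {
    ("Sality", "ZeroAccess"): lambda x: 0.5 - x[0],
    ("Sality", "benign"): lambda x: 0.3 - x[0],
    ("ZeroAccess", "benign"): lambda x: x[0] - 0.7,
}
label_high = one_vs_one_predict([0.9], models)
label_low = one_vs_one_predict([0.1], models)
value = decision_value([1.0, 2.0], [[1.0, 2.0]], [1], [2.0], b=-0.5)
```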
5.3.5 Refined Bot Identification
After labeling bot-included clusters, the final step is to extract bots from the cluster based on
their individual connection features. Taking advantage of the distinct behaviors of bots and
benign hosts shown in Section 5.3.3, we devise a feature test to separate bots from benign
hosts who happen to be in the same cluster. The feature test exploits the differences of
various connection features between bots and benign hosts, which is shown in Algorithm 3.
A number of threshold values are defined empirically to set apart bots and benign hosts (e.g.,
λ1 = 8000, λ2 = 0.2, λ3 = 10, λ4 = 200, λ5 = 0.2). As long as any one of the conditions in
the test is satisfied, the host is identified as a bot.
Algorithm 3 Feature Test
for each bot-included cluster do
    for each host in the cluster do
        host ← benign host label
        if number of connections > λ1 or
           shared neighbor ratio with any peer > λ2 or
           number of significant connections < λ3 or > λ4 or
           significant connection volatility < λ5 then
            host ← bot label
        end if
    end for
end for
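Algorithm 3 translates almost line by line into Python. The sketch below assumes the per-host features have already been computed; the `HostFeatures` container, its field names, and the dictionary layout of clusters are hypothetical, while the threshold values are the example values quoted in the text.

```python
from dataclasses import dataclass

@dataclass
class HostFeatures:
    connections: int                  # number of good connections
    max_shared_neighbor_ratio: float  # highest ratio with any peer
    significant_connections: int      # number of SCs
    sc_volatility: float              # SC volatility

# Example thresholds from the text (lambda_1 .. lambda_5).
L1, L2, L3, L4, L5 = 8000, 0.2, 10, 200, 0.2

def is_bot(h):
    """A host is labeled a bot if any single condition holds."""
    return (h.connections > L1
            or h.max_shared_neighbor_ratio > L2
            or h.significant_connections < L3
            or h.significant_connections > L4
            or h.sc_volatility < L5)

def feature_test(bot_included_clusters):
    """Label every host of every bot-included cluster as 'bot' or 'benign'."""
    return {
        host: "bot" if is_bot(features) else "benign"
        for cluster in bot_included_clusters
        for host, features in cluster.items()
    }

cluster = {
    "10.0.0.1": HostFeatures(500, 0.05, 40, 0.6),    # no condition triggered
    "10.0.0.2": HostFeatures(15000, 0.05, 40, 0.6),  # too many connections
    "10.0.0.3": HostFeatures(500, 0.05, 3, 0.6),     # too few SCs
}
labels = feature_test([cluster])
```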
Group-level feature extraction | Training feature extracted from labeled hosts | Detection feature extracted from a cluster
Cluster connectivity feature | average | maximum
Shared neighbor feature | average | maximum
Significant connection feature | average | maximum and minimum
Temporal feature | average | minimum
Table 5.3: Group-level feature summary
5.3.6 Putting Them All Together
A summary of PeerClean detection system is presented in this section. For the labeled
hosts in the training data set, PeerClean extracts their training features corresponding to
the average of each group-level feature using DGBA training, then builds a multi-class SVM
model. For the hosts in the testing data set, PeerClean first clusters the hosts based on their
flow statistical features. Each cluster undergoes DGBA detection to extract the detection
features corresponding to the maximum or minimum of each group-level feature. Then, each
cluster is designated a label by the SVM model to indicate whether it contains a specific type
of bots. Finally, PeerClean employs a refined bot identification algorithm to pick out the
bots from each bot-included cluster. A summary of group-level features is listed in Table 5.3.
5.3.7 Evasion Mechanisms and Limitations
PeerClean detects botnets without relying on packet payloads, which already raises the bar
for botnet authors. In the following, we discuss the potential evasion mechanisms that botnet
authors might use to thwart PeerClean.
The bots may disrupt the clustering mechanism by not following the same transmission
protocol. However, that will increase the complexity of bot implementations and will also affect
the efficiency of C&C message exchange. Evading DGBA is even harder to achieve. The
possible attempts to evade the DGBA detection include lowering the connection number,
lowering the shared neighbor ratio, raising the significant connection number and raising
the significant connection volatility. For example, ZeroAccess bots would need to reduce the
number of good connections from 15000 to 2000 per hour, thereby losing nearly 86.7% of their
connections. Meanwhile, the shared neighbor ratios should drop from at least 30% to nearly 0,
which basically means constructing a C&C delivery network with few shared neighbors between
every two peers. The SC number should increase to between 20 and 60 per hour, and SC volatility
should rise above 30%, which may disrupt the botnet operations, especially if
we consider the stealthiness and reliability requirements of P2P botnets. Generally, P2P
bots endeavor to meet these requirements by avoiding a large number of SCs
to maintain a low profile, and by keeping their SC sets stable over time.
Therefore, changing one or more connection features will greatly affect the C&C delivery
operation, and may compromise the stealthiness of the botnets. Furthermore, the
collective features enlarge the gaps between the bots and benign hosts. Making the collective
features indistinguishable from those of benign hosts would require substantial work on
designing a complex botnet. We leave the design of such botnets as future work.
Since PeerClean identifies the bots based on the traffic flow statistics, a limitation of
PeerClean lies in identifying bot-infected hosts that run benign P2P applications simultaneously
and persistently. In this case, the bot traffic might be obscured by the traffic from benign
P2P applications. Since PeerClean performs detection per hour, the smart bots would have
to run benign P2P applications all the time to avoid being discovered. However,
most benign P2P nodes have a fast peer churn rate with short communication sessions [131].
Thus, it is unlikely for P2P hosts to run benign P2P applications alongside the P2P bot protocol
persistently, which will eventually reveal the bots’ presence at a certain point of time. On the
other hand, future bots might intentionally run the bot protocol together with benign
P2P applications. Nevertheless, this will affect the communication efficiency of bots, which
might lead to a high peer churn rate or a complete disruption of C&C communications.
5.4 Discussion
In this section, we discuss two important issues that PeerClean faces. Note that
PeerClean identifies a host by its IP address captured in the traffic flow data. However, in
practice, the IP address is not always a reliable source for correctly identifying a particular
host, due to dynamic address assignment enabled by DHCP. Thus, several IP addresses may
belong to the same host, which might confuse the PeerClean detection. Fortunately,
hosts tend to finish running applications before they go down and later return with a
different IP address. Also, DHCP does not reassign IP addresses very frequently.
Therefore, during a one-hour epoch, few IP addresses, if any, associated with the same host
will be clustered together. This small number of duplicate hosts will have only a negligible
impact on the PeerClean system. Thus, we simply regard different IP addresses as different
hosts.
Another issue is that some hosts may be behind NATs, such that they will appear to have
the same IP address to the outside. We do not consider this type of NATed bots in our
system design, since these bots are typically not recruited in the P2P infrastructure of a
botnet, as they cannot be contacted by the other peers in the Internet. In fact, NATed bots
will not become part of the P2P overlay of a botnet. On the other hand, legitimate hosts may also
live behind NATs, such that the traffic flow from a NATed IP will be aggregated traffic
from multiple hosts. As long as they are not clustered into the same cluster with bots, they
will not cause serious issues. The probability that these NATed IPs are clustered
together with bots is negligible, because their traffic flow patterns are distinct
from the bots’ traffic flow patterns. Therefore, we simply regard each NATed IP as one
single special host.
5.5 Evaluation
In this section, we evaluate the bot identification performance of PeerClean system. We
first describe the collected data sets (Section 5.5.1). Then, we show that PeerClean can
well separate different types of P2P bots into different clusters, but may falsely incorporate
some benign P2P hosts who have bot-like traffic patterns (Section 5.5.2). After generating
host clusters, DGBA is carried out to extract group-level connection features from each
cluster. By separating the data set into training and testing sets, a multi-class SVM model
is trained using the labeled training set. We then evaluate the classification performance and
refined bot identification performance in Section 5.5.3 and Section 5.5.4, respectively. The
performance comparison with two existing approaches based on network flow patterns and
pairwise shared neighbor ratio is detailed in Section 5.5.5. Finally, we evaluate the system
scalability in a large network.
5.5.1 Data Collection
We use the traffic trace captured from the edge routers of a large campus network, which
has two /16 subnets. The traffic, at a rate of about 5000 flows per second, was captured for
one whole day in April 2013. We focus on the TCP and UDP traffic in this traffic trace.
However, as the network flow trace does not include traffic payload, we cannot unveil the
ground truth about whether or not the active hosts are running legitimate P2P applications.
To provide the ground truth data from legitimate P2P hosts, we run three of the most
popular P2P applications on our lab machines: emule, bittorrent and skype, and collect their
network flow traces. To make the traffic traces more representative, we interact with the
P2P hosts using an AutoIt script [137] to randomly select contents to be downloaded/uploaded
(for the emule and bittorrent applications), or randomly generate texts to be transmitted (for
the skype application) at random time periods. In total, we collected one-day traces from 100
bittorrent clients, 100 skype clients and 100 emule clients in April 2013.
We also collected the network traces for three recent P2P botnets: Sality, Kelihos and
ZeroAccess in April 2013. These network traces were gathered by purposefully running
Sality, Kelihos and ZeroAccess samples in a controlled environment, in which we carefully
blocked spamming, scanning, and Denial of Service attack activities. They contain 24-hour
traces for 6 Sality bots, 4 Kelihos bots and 4 ZeroAccess bots. Since the major malicious
activities were blocked during the collection of network traces, the collected traces mainly
include C&C traffic, e.g., for peer discovery and command exchange. Note that these
traces were collected while the three botnets were fully active. The traffic summary is listed in
Table 5.4. The traffic data from 300 legitimate P2P clients and 14 P2P bots constitute our
ground truth data set.
To make the evaluation more realistic, we overlay the traffic traces from the 300 legitimate P2P
hosts and 14 P2P bots onto the campus traffic trace by assigning them to randomly selected
hosts of the campus network. In order to reduce the traffic volume, we eliminate flows from
well-known and extremely busy servers such as DNS servers, email servers and popular website
servers (e.g., google, facebook, youtube). After that, P2P host identification searches
for the hosts with a high percentage of failed connections (with a threshold of 5%). In total,
we find 1097 hosts involved in P2P applications during the day, including all the 314 P2P
hosts serving as ground truth and an additional 783 hosts in the campus network, as shown in
Table 5.4. This validates the effectiveness of P2P host identification, since all the
314 ground-truth P2P hosts are correctly identified.
Trace           Size    Dur   Pkts     TCP/UDP Flows   Clients
Campus          20.7G   24h   21.5G    401,661,350     34743
Bittorrent      6.7G    24h   854M     62,674,080      100
Skype           1.1G    24h   376M     12,615,840      100
Emule           1.6G    24h   406M     18,978,800      100
Sality          40M     24h   10.8M    565,490         6
Kelihos         224M    24h   23.5M    3,249,931       4
ZeroAccess      4.6G    24h   166.9M   69,896,829      4
P2P in campus   487M    24h   608M     7,127,054       783
Table 5.4: Traffic summary (‘P2P in campus’ denotes the traffic flows of the campus network
after P2P host identification)
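
The P2P host identification step described above can be sketched as follows. This is a minimal illustration rather than the system's implementation: the flow record layout `(src_host, failed)` and the function name `identify_p2p_hosts` are assumptions made for the example, and only the 5% threshold comes from the text.

```python
from collections import defaultdict

def identify_p2p_hosts(flows, threshold=0.05):
    """Return hosts whose fraction of failed outgoing connections exceeds threshold.

    flows: iterable of (src_host, failed) pairs (hypothetical record layout),
           where failed is True for a connection attempt that did not complete.
    """
    total = defaultdict(int)
    failed = defaultdict(int)
    for host, is_failed in flows:
        total[host] += 1
        if is_failed:
            failed[host] += 1
    return {h for h in total if failed[h] / total[h] > threshold}

# Toy trace: host 10.0.0.1 has 10% failed connections, host 10.0.0.2 has 1%.
flows = (
    [("10.0.0.1", False)] * 90 + [("10.0.0.1", True)] * 10
    + [("10.0.0.2", False)] * 99 + [("10.0.0.2", True)] * 1
)
p2p_hosts = identify_p2p_hosts(flows)
print(sorted(p2p_hosts))
```

Whether the comparison against the threshold is strict is not stated in the text; the sketch uses a strict inequality.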
5.5.2 Clustering P2P Hosts
In this section, we evaluate the P2P host clustering performance of the PeerClean system. Based
on the extracted flow features in Section 5.3.1, we perform AP clustering to group together
P2P bots of the same type. During the flow feature extraction, we find that almost all of
the traffic flows from Kelihos bots use TCP, while Sality and ZeroAccess bots
mostly generate UDP traffic. Hence, we use both the TCP and UDP traffic patterns for host
clustering.
The data set contains 24-hour flow traces from 1097 P2P hosts, which are divided into 24
sections of one hour each. For each data section, we extract the flow statistical
features of every host that has 100+ outgoing TCP flows and 100+ incoming TCP flows
for the purpose of building representative flow patterns. Then, host clustering is carried out
using the AP clustering method based on the extracted flow statistical features. Note that, since
the last two features in Table 5.2 are refreshed at the end of the day, they are only used
for clustering at the last hour.
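
The hourly sectioning and the 100-flow eligibility filter described above can be sketched as follows. The flow tuple layout `(timestamp_seconds, src, dst)` and the function name are assumptions for illustration; the sketch only selects eligible hosts per hour and does not perform feature extraction or AP clustering.

```python
from collections import defaultdict

MIN_FLOWS = 100  # per-direction minimum stated in the text

def eligible_hosts_per_hour(flows, monitored):
    """Map each hour index to the monitored hosts that have at least MIN_FLOWS
    outgoing and MIN_FLOWS incoming flows within that hour.

    flows: iterable of (timestamp_seconds, src, dst) tuples (hypothetical layout),
           with timestamps measured from the start of the 24-hour trace.
    monitored: set of P2P hosts under consideration.
    """
    out_count = defaultdict(int)
    in_count = defaultdict(int)
    for ts, src, dst in flows:
        hour = int(ts // 3600)
        if src in monitored:
            out_count[(hour, src)] += 1
        if dst in monitored:
            in_count[(hour, dst)] += 1
    result = defaultdict(set)
    for (hour, host), n_out in out_count.items():
        if n_out >= MIN_FLOWS and in_count[(hour, host)] >= MIN_FLOWS:
            result[hour].add(host)
    return dict(result)

# Toy trace: A and B exchange 120 flows each way in hour 0, but only 50 in hour 1.
flows = [(i, "A", "B") for i in range(120)] + [(i, "B", "A") for i in range(120)]
flows += [(3600 + i, "A", "B") for i in range(50)]
hosts_by_hour = eligible_hosts_per_hour(flows, {"A", "B"})
print(hosts_by_hour)
```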
Hour                       2    4    6    8    10   12      14   16   18   20   22
Cluster Num.               29   25   24   27   30   29      32   25   29   30   28
Sality Cluster Index       28   22   22   26   28   27,28   31   22   22   29   26
ZeroAccess Cluster Index   29   25   24   27   30   29      32   25   29   30   28
BSR1                       1    1    1    1    1    1       1    1    1    1    1
BSR2                       1    1    1    1    1    1       1    1    1    1    1
Table 5.5: Clustering result using UDP traffic (BSR1, BSR2 denote the BSRs of Sality and
ZeroAccess bots respectively)
Hour                    2    4    6    8    10   12   14   16   18   20   22
Cluster Num.            21   24   25   15   22   25   26   21   24   20   18
Kelihos Cluster Index   13   24   23   14   19   6    5    20   6    14   17
BSR3                    1    1    1    1    1    1    1    1    1    1    1
Table 5.6: Clustering result using TCP traffic (BSR3 denotes the BSR of Kelihos bots)
We evaluate the clustering performance in terms of the ability to produce well separated
and compact bot clusters, for which we propose two performance criteria. We define
mix-clustered bots as the bots mistakenly residing in the cluster of other types of bots, and
the complement of which are called separate-clustered bots. The two performance criteria
for evaluating the separation and compactness performance of bot clustering are: (1) Bot
Separation Ratio (BSR), which is defined for each type of bot as the ratio of the number
of separate-clustered bots to the overall number of bots of this type; (2) Bot Compactness
Ratio (BCR), which is defined as the ratio of the number of separate-clustered bots to the overall
number of hosts (whether benign or not) assigned into the same cluster.
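
The two criteria can be made concrete with a short sketch. It reflects one reading of the definitions above, in which BCR is computed per cluster; the function name `bsr_and_bcr` and the dictionary-based inputs are assumptions for illustration.

```python
from collections import defaultdict

def bsr_and_bcr(assignment, bot_type_of, target):
    """Compute (BSR, {cluster: BCR}) for bots of type `target`.

    assignment: dict host -> cluster index.
    bot_type_of: dict host -> bot type; hosts absent from it are benign.
    """
    members = defaultdict(list)
    for host, cluster in assignment.items():
        members[cluster].append(host)

    target_bots = [h for h in assignment if bot_type_of.get(h) == target]
    separate = defaultdict(int)  # cluster -> number of separate-clustered target bots
    for bot in target_bots:
        cluster = assignment[bot]
        other_types = {bot_type_of[h] for h in members[cluster]
                       if h in bot_type_of and bot_type_of[h] != target}
        if not other_types:  # no other bot type shares this cluster
            separate[cluster] += 1

    bsr = sum(separate.values()) / len(target_bots)
    bcr = {c: n / len(members[c]) for c, n in separate.items()}
    return bsr, bcr

# Toy example: two Sality bots share cluster 1 with one benign host.
assignment = {"s1": 1, "s2": 1, "b1": 1, "z1": 2, "z2": 2}
bot_type_of = {"s1": "Sality", "s2": "Sality", "z1": "ZeroAccess", "z2": "ZeroAccess"}
bsr, bcr = bsr_and_bcr(assignment, bot_type_of, "Sality")
print(bsr, bcr)
```

In the toy example the Sality bots are fully separated from ZeroAccess bots, while the benign host in their cluster lowers the compactness of that cluster below one.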
Bot Separation Performance: As observed from the per-hour clustering results in Tables
5.5 and 5.6 (which only count the clusters containing more than one node), Sality, Kelihos
and ZeroAccess bots are assigned into different clusters with the cluster indices shown in the
tables, i.e., all three types of bots are well separated from each
other, which makes their BSRs equal to one. Moreover, almost all the bots of one type are
grouped into the same cluster, with the exception of Sality bots, which are divided into two
clusters at the 12-th hour. Nevertheless, none of these clusters contains more than one type
of bot, which demonstrates the perfect separation of different types of bots.
Bot Compactness Performance: The excellent bot separation performance indicates that
each bot-included cluster completely excludes other types of bots. However, every
bot-included cluster may still incorporate some benign P2P hosts, which happen to display
a similar traffic pattern during the detection period. BCR quantifies the ability of the
clustering to keep benign P2P hosts out of the bot-included cluster. BCR achieves one if the
bot-included cluster contains no benign P2P host. Accumulating the 24-hour BCR results,
we show the BCR box plots of the three types of bots in Fig. 5.6. On average, Sality and
ZeroAccess bot clusters each falsely include 3 benign hosts, while Kelihos bot clusters
falsely include 12 benign hosts, which indicates that the clustering mechanism is not effective
in generating compact bot clusters. Further inspection of the falsely included benign hosts
shows they have traffic profiles that are highly similar to the bots’ traffic. Based on the
experimental results, we claim that network flow features alone are not sufficient to discriminate
P2P bots from benign P2P hosts.
Figure 5.6: Box plot of Bot Compactness Ratio
5.5.3 Identifying Bot-included Clusters via Classification
In this section, we evaluate the bot cluster identification using multi-class SVM classification
with a Gaussian kernel. Since we only have a limited number of labeled bots during every
hour, we enlarge the training space by incorporating a half day of labeled bots and benign
hosts into the training set. Consequently, the training set contains 36 clusters (3 clusters
per hour) of labeled bots (with labels ‘Sality’, ‘Kelihos’, ‘ZeroAccess’) and 36 clusters of
labeled legitimate P2P hosts running bittorrent, emule and skype (with label ‘Non-Bot’).
We extract the training features from all the 72 labeled clusters to build the SVM classifier.
Then, we use the next half day of bots and benign hosts as the testing set, which includes a
total of 37 bot-included clusters and 545 benign clusters.
After host clustering, the DGBA detection process extracts four different types of group-level
connection features from every cluster in the testing set. Then, the SVM model predicts the
labels of the clusters. Since the classification module relies on four different types of features,
we also train the classifier on each individual group behavior feature in order to understand their
relative importance.
Table 5.7 shows the classification performance using different types of features for training.
The classification based on either the shared neighbor feature or the significant connection
feature has high accuracy and recall, but only achieves moderate precision. Looking into the
classification results, we find that the classification produces few false negatives but many false
positives, i.e., bot-included clusters are unlikely to be regarded as benign clusters with these
two features, but many benign clusters are falsely considered as bot-included clusters. On
the other hand, the cluster connectivity feature seems unable to discriminate bots from benign
hosts, as it brings substantial false positives and false negatives. We observe that the
correctly classified bots mainly belong to the ZeroAccess botnet, which agrees with our analysis in
Section 5.3.3. Finally, the temporal feature is designed to be updated at the end of the day, and thus
is only used for bot classification in the final hour. Again, many false positives arise due to
the inseparability of bots’ features and benign hosts’ features. However, the combination of
all features provides the best result for detecting bot-included clusters. Overall, we only find
two false positives and no false negatives.
Group Behavior Feature           Accuracy   Precision   Recall
Cluster Connectivity Feature     51.8%      7.9%        34.3%
Shared Neighbor Feature          92.7%      68.8%       91.7%
Significant Connection Feature   91.8%      66.7%       90%
Temporal Feature                 71.3%      3.1%        66.7%
All Features                     98.8%      94.6%       100%
Table 5.7: Classification accuracy when trained on one type of feature. Shared neighbor
feature and significant connection feature present the best classification accuracy. The classifier
achieves the best performance when combining all the features. Accuracy=(TP+TN)/all;
Precision=TP/(TP+FP); Recall=TP/(TP+FN).
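
The three metrics defined in the caption of Table 5.7 can be computed as in the following sketch. The confusion counts used here are hypothetical and chosen only for illustration; they are not taken from the experiments. They show why accuracy can stay high while precision is only moderate when benign clusters far outnumber bot-included clusters: a few false positives barely change the accuracy but noticeably lower the precision.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision and recall as defined in the caption of Table 5.7."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall

# Hypothetical counts for illustration only (not experimental results).
acc, prec, rec = classification_metrics(tp=9, fp=3, tn=86, fn=2)
print(f"accuracy={acc:.1%} precision={prec:.1%} recall={rec:.1%}")
```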
It is worth noting that the training set constitutes 50% of the whole data set in the previous
evaluation. Here, we also evaluate the classification performance by varying the percentage
of training data, since it is often difficult to collect traffic traces from labeled bots. As
shown in Fig. 5.7, PeerClean still retains more than 70% classification accuracy when the
training set contains traces from merely 10% of the labeled hosts. This suggests that PeerClean
tolerates small training sets, which makes it applicable
to networks of different sizes.
Figure 5.7: Classification performance with different percentages of training data
Bot Type     Bot Num.   Benign Host Num.   Correctly Identified   Falsely Identified
Sality       72         36                 69 (95.8%)             0 (0%)
Kelihos      48         123                47 (97.9%)             5 (4.1%)
ZeroAccess   48         42                 48 (100%)              2 (4.8%)
Table 5.8: Refined bot identification performance (the percentages in parentheses denote
the bot detection rate and false alarm rate respectively)
5.5.4 Refined Bot Identification Performance
Bot-included clusters contain a considerable number of benign hosts as shown in Section
5.5.2, thus we use refined bot identification to extract the bots inside each bot-included
cluster. The feature test in Algorithm 3 is utilized to perform refined bot identification. We
run the feature test on all the 39 bot-included clusters identified through SVM classification,
including 13 Sality clusters, 12 Kelihos clusters, 12 ZeroAccess clusters and 2 false positives.
The bot identification performance is given in Table 5.8, which shows the bot number and
benign host number in the bot-included clusters. In summary, the refined bot identification
correctly identifies at least 95.8% of the bots of each type, with a false alarm rate of at most 4.8%.
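
The percentages in Table 5.8 follow directly from its counts, as the short check below shows. It assumes that the detection rate is taken over the bots and the false alarm rate over the benign hosts inside the bot-included clusters, which reproduces the reported values.

```python
# Counts copied from Table 5.8:
# (bots, benign hosts, correctly identified, falsely identified)
table_5_8 = {
    "Sality": (72, 36, 69, 0),
    "Kelihos": (48, 123, 47, 5),
    "ZeroAccess": (48, 42, 48, 2),
}

def rates(bots, benign, correct, false_alarms):
    """Detection rate = correct / bots; false alarm rate = false_alarms / benign."""
    return correct / bots, false_alarms / benign

results = {name: rates(*counts) for name, counts in table_5_8.items()}
for name, (detection, false_alarm) in results.items():
    print(f"{name}: detection {detection:.1%}, false alarm {false_alarm:.1%}")
```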
5.5.5 Comparisons with Other Detection Approaches
In this section, we compare the performance of the PeerClean system with two other
detection approaches: method A and method B. Method A relies on network flow statistical
features to detect bot communications [53]. As shown in Section 5.5.2, many benign hosts
have flow traffic patterns similar to those of bots. We perform host clustering based on flow
statistical features to separate different patterns of flow traffic. Then, the clusters containing
bots are regarded as bot clusters, and all the benign hosts inside the bot clusters are falsely
labeled as bots. The number of false positives using method A is listed in Table 5.9, and it
is unacceptably high.
Method B builds upon the PeerClean system, but utilizes pairwise shared neighbor ratios to
identify bots [47] instead of a set of group-level connection features. For each cluster derived
from host clustering, we compute the pairwise shared neighbor ratio for every pair of hosts
in the cluster. If the pairwise shared neighbor ratio is higher than an empirical threshold, we
identify this pair of hosts as bots. From the results in Table 5.9, we find that method B not only
has low bot identification accuracy, but also produces considerable false
alarms. Its inadequate bot identification is mainly due to the traffic dynamics,
which cause method B to miss many bots but falsely include benign hosts. On the other
hand, PeerClean is able to significantly improve the identification accuracy by extracting
and modeling group-level features instead of individual or pairwise features, as can be seen
by comparing Table 5.9 with Table 5.8.
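
The pairwise test used by method B can be sketched as follows. The exact definition of the shared neighbor ratio is not given in this section, so the sketch assumes a Jaccard-style ratio (shared contacts divided by the union of contacts); the function names and the toy cluster are likewise hypothetical. The 0.2 threshold is the one reported with Table 5.9.

```python
from itertools import combinations

def shared_neighbor_ratio(a, b):
    """Jaccard-style ratio of shared contacts (an assumed definition)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def flag_pairs(neighbors, threshold=0.2):
    """Return hosts that belong to at least one pair whose ratio exceeds threshold.

    neighbors: dict host -> set of remote peers contacted by that host.
    """
    flagged = set()
    for h1, h2 in combinations(sorted(neighbors), 2):
        if shared_neighbor_ratio(neighbors[h1], neighbors[h2]) > threshold:
            flagged.update((h1, h2))
    return flagged

# Toy cluster: two bots share most of their peers, the benign host shares one.
cluster = {
    "bot1": {"p1", "p2", "p3", "p4"},
    "bot2": {"p2", "p3", "p4", "p5"},
    "benign": {"q1", "q2", "p1"},
}
flagged = flag_pairs(cluster)
print(sorted(flagged))
```

Because the decision is made pair by pair, a bot whose peer list has churned within the hour can fall below the threshold with every other bot, which is consistent with the sensitivity to traffic dynamics noted above.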
Bot Type     Bot Num.   Falsely Identified:   Correctly Identified:   Falsely Identified:
                        Method A              Method B                Method B
Sality       72         36                    51 (70.8%)              14 (38.9%)
Kelihos      48         123                   38 (79.2%)              21 (17.1%)
ZeroAccess   48         42                    33 (68.8%)              10 (23.8%)
Table 5.9: Performance comparison with method A and method B (threshold: 0.2).
5.5.6 System Scalability
The running time of the PeerClean system depends on the clustering method, group feature
extraction and SVM classification method. The SVM classification method only needs to classify
a small number of meaningful clusters, so its running time is negligible. On the other
hand, group feature extraction collects the group behavior features over the whole
cluster. If one cluster contains a large number of P2P hosts, the group behavior feature
extraction will consume a considerable amount of time. In the case of large clusters, instead
of extracting all the group features, we can extract only one single feature to trade off
complexity against performance. Based on the results in Table 5.7, we can extract only the
shared neighbor feature or the significant connection feature for large clusters without degrading
the performance too much, which will greatly reduce the computational complexity.
Finally, we evaluate the time consumption of the AP clustering algorithm. We use a commodity
computer system with an AMD FX(tm)-8120 processor and 16GB of memory. The running times
corresponding to different numbers of P2P hosts are shown in Fig. 5.8. The time consumption
for a large number of P2P hosts (up to 8000) still remains within one hour. Since
PeerClean works within a one-hour period, the time consumption is acceptable. Alternatively,
the leveraged affinity propagation method [133] can be used to efficiently deal with large
data sets, and we leave it for future work.
Figure 5.8: Running time of AP clustering
5.6 Summary
P2P C&C infrastructure has become a popular choice for future botnets, as it is extremely
resilient to even sophisticated takedown attempts. The ability to identify botnets
inside a network is particularly important to network administrators. Toward this
direction, we present PeerClean, a new network flow-based system to identify and classify
botnets with high accuracy. PeerClean leverages a novel dynamic group behavior analysis
to extract and model a set of robust and reliable connection features at the group level. In
our experimental evaluation on real-world network traces, PeerClean shows excellent detection
accuracy on various types of botnets. An interesting future direction is to apply the
group behavior analysis to other types of applications to help identify network behaviors
that would otherwise go unnoticed. We can also further merge and correlate the group
behaviors of different applications to detect anomalous communication patterns and identify
newly emerging attacks.
Chapter 6
Conclusion and Future Research
In this dissertation, we have identified and explored key points for applying cognitive
technologies to enhance the security of current cognitive communication systems, including CRNs
and core networks. We designed effective and efficient countermeasure mechanisms to detect
and defend against the sophisticated and adaptive attackers that may exploit the security
vulnerabilities of CRNs. We also investigated IC-based jamming resilient wireless
communications under SDR-based reactive jammers. To protect communications in core networks,
we proposed P2P botnet identification mechanisms to pinpoint P2P bots.
This dissertation contributes to advancing the state-of-the-art security research in modern
cognitive network design. The security loopholes discovered in existing networking
systems are astounding, and urgently call for a system redesign with a series of thorough
security tests or a comprehensive set of effective defense mechanisms in place. The
defense mechanisms proposed in this dissertation are designed with real-world
implementation issues in mind, and are evaluated with prototype and testbed designs. We
believe these defense mechanisms make a compelling effort to guard and fortify modern
cognitive networks. In this chapter, we summarize the research work and discuss future
research directions.
6.1 Research Summary
Recent advances in information technology have led to the proliferation of mobile devices
and intelligent IoT devices, which require more and more spectrum resources to support the
growing demand for world-wide and ubiquitous Internet connections. The shortage of spectrum
resources has become a serious issue which will hinder the development of mobile
technologies and their global markets. Cognitive radio has been envisioned as a new paradigm to
improve the spectrum utilization and to provide opportunistic access to the licensed bands.
However, the communication paradigm is untethered and ad hoc; therefore the communication
channel is adversarial and the communication hosts are not trustworthy. Furthermore,
the CR technologies allow more powerful adaptive and reactive adversaries, who may
intentionally adjust their attack strategies through sensing and learning capabilities. Finally,
the communications in core networks are further disrupted by sophisticated botnets that
are organized to commit cyber crimes. The untrusted networking and cognitive empowered
adversarial environments create new challenges in communication and network security.
In the dissertation, we have explored security research in the area of cognitive communica-
tions in cognitive networks. We identified security vulnerabilities of distributed spectrum
sensing mechanisms in CR ad hoc networks. We then developed a monitoring framework
to identify the malicious network activities in CRNs. We established jamming resilient
communication mechanisms to mitigate SDR-based reactive jamming attacks by employing
MIMO IC technology. Finally, we put forward a detection system to uncover P2P botnets
by leveraging group behavior analysis and machine learning technology.
Our main findings and their implications are summarized as follows.
• We focused on the protection of distributed spectrum sensing in CR ad hoc networks.
We first identified various attacks that can disrupt the consensus algorithm or stealthily
subvert the sensing results, especially the covert adaptive attacks with learning
capability. We then developed a robust distributed outlier detection scheme with an adaptive
local threshold to counter covert adaptive attacks by exploiting the state convergence
property. In addition, a hash-based computation verification scheme was presented to
effectively defend against colluding attackers. To the best of our knowledge, this is
the first work to address the security issues in distributed spectrum sensing in CR
networks. Our simulation results demonstrated the severe vulnerabilities of distributed
spectrum sensing, and also showed that our protection mechanisms are effective
and efficient in enforcing trustworthy reporting of sensing results. In conclusion,
adaptive adversaries with cognitive learning capability must be taken into account
when designing communication protocols in cognitive networks.
• We proposed a monitoring framework, SpecMonitor, which utilizes a non-parametric
density estimation method to model SUs’ channel usage patterns. This method makes
no assumptions on the unknown distribution of the channel access pattern, and thus offers
accurate and flexible models which can be updated in an online fashion with acceptable
complexity. Moreover, we designed a sliding window method to perform online
learning of data dynamics, and an accumulative combination method to further improve
modeling accuracy. We considered two levels of monitoring objectives, frame-level and
user-level, to diagnose different network issues. Based on the predicted traffic
pattern, we cast the monitoring optimization problem as a sniffer channel assignment
problem with the objective of maximizing the corresponding QoMs, for which we designed
near-optimal monitoring algorithms. Our simulation and experimental results both
showed that SpecMonitor has superior capturing capability with low channel
switching overhead.
• We proposed a novel defense mechanism for jamming resilient OFDM communication
based on the MIMO IC technique, which tracks the jamming signal’s direction in real time
before canceling it out. We devised an iterative channel tracking mechanism using
multiple pilots to estimate the sender’s and jammer’s channels alternately and iteratively
in a timely fashion. More importantly, we introduced an enhanced defense mechanism
leveraging sender signal enhancement (SSE) and message feedback techniques, which
strategically enhances the projected sender signal strength via signal rotation, resulting
in an improved anti-jamming performance. A tactical IC scheme was designed not only
to protect the forwarding frame transmission, but also to guard the feedback messages
against jamming. Our prototype experimental results demonstrated that the proposed
defense mechanisms can effectively sustain operational OFDM communications under
reactive jamming attacks with considerable throughput.
• We proposed a dynamic group behavior analysis (DGBA) method to investigate the group-level
connection behaviors inside botnets. We applied DGBA to every host cluster so as to
extract the aggregated connection features. The proposed detection system, PeerClean,
then trains a support vector machine (SVM) classifier using the group-level training
features, and subsequently labels each cluster using the SVM classifier. To improve
the detection performance, we trained the classifier using average group behavior, but
explored the extreme group behavior for the detection. While almost all the existing
network-based detection approaches identify the bots without specifying their
corresponding botnet types, we consider the botnet type a useful piece of information for
the network administrators to evaluate the damaging impacts and decide on corresponding
bot-specific countermeasures. Furthermore, the PeerClean system is tailored to
support real-time bot detection, and to enable a quick response to bot infections.
Real-world experiments with traffic captured in a campus environment showed that
PeerClean is able to identify and classify botnets with high accuracy. Throughout
the experiments, we found that different botnets appear to have significantly different
network behaviors, including traffic patterns and connection patterns. Therefore,
inherent, robust and distinguishable features are still required to effectively detect the
botnets.
6.2 Future Research Directions
We can further investigate the following open problems for robust and reliable spectrum
sensing based on distributed outlier detection.
• We chose the distance-based approach due to the nature of localized/distributed detec-
tion. In future research, we can investigate and evaluate other types of outlier detection
techniques. For instance, a deviation-based approach identifies outliers that deviate
from the general characteristics of the set [138], i.e., the sensing reports minimizing the
local set variance after being removed will be flagged as falsified reports. Moreover,
a density-based approach compares the density around a node with that around its local
neighbors [139,140], assuming a significant distinction between the density around
an outlier and that around normal neighbors. We can design protocols that enable
detection while minimizing the message exchanges and delay induced by the protocol
running in a distributed fashion. The performance of these different outlier detection
protocols can be evaluated in terms of detection probability, false alarm rate, and com-
putation and communication costs. Theoretical analysis of performance guarantees on
bounded miss detection and false alarm probabilities can also be investigated.
• Although the robustness and fault tolerance of various outlier detection schemes have
been reported before [61, 134, 141, 142], their impacts when applied to distributed
consensus-based protocols are not known. The effectiveness of outlier detection ap-
proaches relies on an assumption that a vast majority of nodes in the network are
honest. Therefore, the number of malicious nodes has a direct impact on our outlier
detection based consensus protocol. We can initiate a thorough theoretical and ex-
perimental study on the relationship between the number of malicious nodes and the
detection performance, the result of which will provide an upper bound of malicious
nodes tolerable within the detection zone of our outlier detection approach.
• Malicious behavior modeling is an open problem for robust and reliable spectrum
sensing. In our research, we consider persistent attackers who attack all the time to
manipulate the sensing results. However, in general, such persistent attacks can be more
easily detected than sophisticated attack behaviors such as the random, opportunistic and
insidious attack behaviors considered in [143]. When attackers perform such smart and
colluding attacks, the detection rate may be greatly reduced, which calls for more
advanced countermeasures to improve the detection rate and restrict the damaging
impacts.
• We can further extend the proposed detection approach to counteract data falsification
attacks in a mobile ad hoc network (MANET), where both the SUs and attackers are
mobile. In this case, the neighborhood of every node is constantly changing, which
adds another layer of challenges to the detection approaches, since nodes may execute
computations involving different neighbors during every iteration of consensus. We can
theoretically and experimentally evaluate the performance of the consensus algorithm
in a mobile network, and propose mobility-aware protection mechanisms to enforce
robust spectrum sensing. In addition, the proposed outlier detection approach can be
generalized to incorporate different channel propagation models for both indoor and
outdoor environments. In other words, we can investigate the configuration of the detection
threshold in different networking environments with different channel propagation models.
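
As an illustration of the deviation-based idea mentioned in the first item above, the following sketch flags the sensing report whose removal minimizes the variance of the remaining local set. It is a centralized toy version with hypothetical readings, not a distributed protocol, and the function name `deviation_outlier` is an assumption for the example.

```python
from statistics import pvariance

def deviation_outlier(reports):
    """Return the index of the report whose removal minimizes the variance of
    the remaining reports (meaningful for three or more reports)."""
    best_index, best_var = None, float("inf")
    for i in range(len(reports)):
        rest = reports[:i] + reports[i + 1:]
        var = pvariance(rest)
        if var < best_var:
            best_index, best_var = i, var
    return best_index

# Hypothetical received-power readings (dBm) from a node's local neighborhood.
readings = [-91.0, -90.5, -89.8, -60.0, -90.2]
suspect = deviation_outlier(readings)
print(suspect)
```

A practical scheme would also need a rule for deciding when no report should be flagged at all, for example by requiring a minimum variance reduction.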
We can extend our research effort in the following directions for the non-parametric
prediction based SpecMonitor framework.
• Our monitoring framework can be applied to model and then capture different types
of traffic, for different monitoring objectives such as detecting ICMP flooding attacks,
TCP/UDP spoofing attacks [144], etc. The performance of monitoring various types of
attacking traffic can be evaluated by simulation as well as in a controlled experimental
environment to ensure the wide applicability of our monitoring framework.
• The ultimate objective of the monitoring service is to provide a deep understanding
of benign traffic patterns and identify abnormal network behaviors such as secondary
user spectrum abuse [28] and network intrusions. Further extension can be made to
apply non-parametric machine learning techniques such as non-parametric Bayesian
inference model [145,146] to quickly classify benign traffic and identify dishonest user
behaviors and intrusions by learning and classifying traffic patterns in CRNs.
We can also explore the following open research problems for MIMO IC based jamming
resilient communication mechanism.
• In a traditional multi-hop network with jammers, the link being jammed will be un-
able to transfer information. However, by employing the proposed IC-based jamming
resilient communication scheme, the jammed link is still capable of transmitting infor-
mation, which introduces new optimization problems associated with rate allocation,
resource scheduling, and relay selection in the presence of jammers, while under the
protection of IC-based mechanisms. We can broadly study the newly emerged problem
sets to enable cross-layer optimization of networking under jamming attacks.
• We can extend our jamming cancellation mechanism to cancel jamming signals from
mobile jammers. The channel between a mobile jammer and intended receiver will
constantly change, which makes it difficult to estimate instantaneous jamming chan-
nels. We can carry out experiments to evaluate jamming signal cancellation schemes
in the presence of reactive jammers with different mobility patterns.
We can explore the following open research problems for group behavior analysis based
PeerClean system.
• We can apply group behavior analysis to identify newly emerging botnets with novel
botnet behaviors. We may use extracted group-level features to define the “normal”
behavior of benign host groups, then the behaviors those deviate to a certain extent
from the normal behavior will be regarded as anomalous behaviors. In the first place,
we need to gather together the bots from such unseen botnets to investigate their
group-level behaviors. Then, we can compare their group-level behaviors with normal
behavior to gauge the group’s abnormality, which can be used to uncover previously
unidentified botnets.
• We can apply PeerClean system to detect more types of P2P botnets. In future re-
search, PeerClean system can utilize cognitive technology to learn the bot behaviors
and adapt the strategy to uncover bots’ presence. In the current PeerClean system,
we discovered several features based on real traffic data analysis. Future research may
focus on developing advanced machine learning mechanisms to automatically extract
reliable and robust features to tell apart bots and benign hosts.
• In the future, we will examine how botmasters adapt bot behaviors to evade the new
detection system. PeerClean can be designed to incorporate strategies that counteract
the botmasters’ adaptation. The arms race between botmasters and detectors is an
open problem worth studying. Finally, we can evaluate the real-time traffic analysis
performance of the PeerClean system in real production networks.
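The first PeerClean direction above (defining the "normal" behavior of benign host groups and flagging groups that deviate from it) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the two features, the function names, and the threshold of 3.0 are hypothetical placeholders rather than PeerClean's actual features or parameters, and the root-mean-square z-score is only one simple choice of deviation measure.

```python
from statistics import mean, pstdev


def fit_normal_profile(benign_groups):
    """Per-feature mean and standard deviation over benign group feature vectors."""
    columns = list(zip(*benign_groups))
    means = [mean(col) for col in columns]
    # Guard against zero spread so the score below stays finite.
    stds = [pstdev(col) or 1e-9 for col in columns]
    return means, stds


def deviation_score(group, profile):
    """Root-mean-square z-score of one group's features against the profile."""
    means, stds = profile
    squared = [((x - m) / s) ** 2 for x, m, s in zip(group, means, stds)]
    return (sum(squared) / len(squared)) ** 0.5


def is_anomalous(group, profile, threshold=3.0):
    """Flag a group whose deviation exceeds the (hypothetical) threshold."""
    return deviation_score(group, profile) > threshold


# Hypothetical group-level features:
# [mean flow size in bytes, fraction of contacts shared within the group]
benign = [[500.0, 0.10], [520.0, 0.12], [480.0, 0.08], [510.0, 0.11]]
profile = fit_normal_profile(benign)
```

A candidate group would be scored with `is_anomalous(features, profile)`. In practice the threshold would have to be calibrated on labeled traffic, and a more robust baseline (for example, median-based statistics) may be preferable when the benign set itself contains outliers.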
Bibliography
[1] D. D. Clark, C. Partridge, J. C. Ramming, and J. T. Wroclawski, “A knowledge plane
for the internet,” in Proceedings of the 2003 Conference on Applications, Technologies,
Architectures, and Protocols for Computer Communications, ser. SIGCOMM ’03, 2003,
pp. 3–10.
[2] R. Thomas, D. Friend, L. DaSilva, and A. MacKenzie, “Cognitive networks: adap-
tation and learning to achieve end-to-end performance objectives,” Communications
Magazine, IEEE, vol. 44, no. 12, pp. 51–57, Dec 2006.
[3] C. Fortuna and M. Mohorcic, “Trends in the development of communication networks:
Cognitive networks,” Computer Networks, pp. 1354–1376, January 2009.
[4] “Cognitive.” [Online]. Available: http://www.merriam-webster.com/dictionary/
cognitive
[5] J. Mitola and G. Q. Maguire, “Cognitive radio: making software radios more personal,”
IEEE Personal Communications, vol. 6, no. 4, pp. 13–18, August 1999.
[6] “Ettus research llc.” [Online]. Available: http://www.ettus.com
[7] “Warp: Wireless open access research platform.” [Online]. Available:
http://warpproject.org/trac
[8] “Information security.” [Online]. Available:
http://en.wikipedia.org/wiki/Information_security
[9] A. Simmonds, P. Sandilands, and L. Van Ekert, “An ontology for network security
attacks,” Applied Computing, vol. 3285, pp. 317–323, 2004.
[10] S. Haykin, “Cognitive radio: brain-empowered wireless communications,” Selected
Areas in Communications, IEEE Journal on, vol. 23, no. 2, pp. 201–220, 2005.
[11] I. F. Akyildiz, W.-Y. Lee, M. C. Vuran, and S. Mohanty, “Next generation/dynamic
spectrum access/cognitive radio wireless networks: a survey,” Computer Networks,
vol. 50, no. 13, pp. 2127–2159, Sep. 2006.
[12] S. Haykin, D. J. Thomson, and J. H. Reed, “Spectrum sensing for cognitive radio,”
Proceedings of the IEEE, vol. 97, no. 5, pp. 849–877, May 2009.
[13] I. F. Akyildiz, B. F. Lo, and R. Balakrishnan, “Cooperative spectrum sensing in cog-
nitive radio networks: a survey,” Physical Communication, vol. 4, pp. 40–62, 2011.
[14] I. F. Akyildiz, W.-Y. Lee, and K. R. Chowdhury, “CRAHN: cognitive radio ad hoc
networks,” Ad Hoc Networks, vol. 7, no. 5, pp. 810–836, July 2009.
[15] R. Chen, J.-M. Park, and J. H. Reed, “Defense against primary user emulation attacks
in cognitive radio networks,” Selected Areas in Communications, IEEE Journal on,
vol. 26, no. 1, pp. 25–37, 2008.
[16] R. Chen, J.-M. Park, and K. Bian, “Robust distributed spectrum sensing in cognitive
radio networks,” in INFOCOM 2008, IEEE, April 2008, pp. 1876–1884.
[17] O. Fatemieh, R. Chandra, and C. A. Gunter, “Secure collaborative sensing for crowd-
sourcing spectrum data in white space networks,” in New Frontiers in Dynamic Spec-
trum, 2010. (DySPAN ’2010) IEEE Symposium on, April 2010, pp. 1–12.
[18] A. W. Min, K.-H. Kim, and K. G. Shin, “Robust cooperative sensing via state esti-
mation in cognitive radio networks,” in New Frontiers in Dynamic Spectrum, 2011.
(DySPAN ’2011) IEEE Symposium on, 2011.
[19] Z. Li, F. R. Yu, and M. Huang, “A distributed consensus-based cooperative spectrum-
sensing scheme in cognitive radios,” Vehicular Technology, IEEE Transactions on,
vol. 59, no. 1, pp. 383–393, 2010.
[20] F. R. Yu, M. Huang, and H. Tang, “Biologically inspired consensus-based spectrum
sensing in mobile ad hoc networks with cognitive radios,” Network, IEEE, vol. 24,
no. 3, pp. 26–30, May 2010.
[21] J. L. Burbank, “Security in cognitive radio networks: the required evolution in ap-
proaches to wireless network security,” in Cognitive Radio Oriented Wireless Networks
and Communications, 2008. CrownCom 2008. 3rd International Conference on, May
2008, pp. 1–7.
[22] Y. Liu, P. Ning, and M. K. Reiter, “False data injection attacks against state estimation
in electric power grids,” in Proceedings of the 16th ACM Conference on Computer and
Communications Security (CCS), 2009, pp. 21–32.
[23] W. Xu, W. Trappe, Y. Zhang, and T. Wood, “The feasibility of launching and detecting
jamming attacks in wireless networks,” in Proceedings of the 6th ACM International
Symposium on Mobile Ad Hoc Networking and Computing, ser. MobiHoc ’05, 2005,
pp. 46–57.
[24] M. Li, I. Koutsopoulos, and R. Poovendran, “Optimal jamming attacks and network
defense policies in wireless sensor networks,” in Proc. of IEEE INFOCOM, May 2007.
[25] M. Strasser, B. Danev, and S. Capkun, “Detection of reactive jamming in sensor
networks,” ACM Transactions on Sensor Networks (TOSN), vol. 7, no. 16, pp. 1–29,
2010.
[26] T. X. Brown and A. Sethi, “Potential cognitive radio denial-of-service vulnerabilities
and protection countermeasures: a multi-dimensional analysis and assessment,” Jour-
nal of Mobile Networks and Applications, vol. 13, no. 5, pp. 516–532, Oct. 2008.
[27] S. Liu, Y. Chen, W. Trappe, and L. J. Greenstein, “Aldo: An anomaly detection
framework for dynamic spectrum access networks,” in INFOCOM 2009, IEEE, April
2009, pp. 675–683.
[28] L. Yang, Z. Zhang, B. Y. Zhao, C. Kruegel, and H. Zheng, “Enforcing dynamic spec-
trum access with spectrum permits,” in Proceedings of the thirteenth ACM interna-
tional symposium on Mobile Ad Hoc Networking and Computing, ser. MobiHoc ’12,
2012, pp. 195–204.
[29] J. Yeo, M. Youssef, and A. Agrawala, “A framework for wireless LAN monitoring and
its applications,” in WiSe 2004, October 2004, pp. 70–79.
[30] Y.-C. Cheng, J. Bellardo, P. Benko, A. C. Snoeren, G. M. Voelker, and S. Savage,
“Jigsaw: Solving the puzzle of enterprise 802.11 analysis,” in SIGCOMM ’06, Sep.
2006, pp. 39–50.
[31] Y.-C. Cheng, M. Afanasyev, P. Verkaik, P. Benko, J. Chiang, A. C. Snoeren, S. Savage,
and G. M. Voelker, “Automating cross-layer diagnosis of enterprise wireless networks,”
in SIGCOMM ’07, Aug. 2007, pp. 25–36.
[32] A. Wood and J. Stankovic, “Denial of service in sensor networks,” Computer, vol. 35,
no. 10, pp. 54–62, 2002.
[33] K. Pelechrinis, M. Iliofotou, and S. Krishnamurthy, “Denial of service attacks in
wireless networks: The case of jammers,” Communications Surveys Tutorials, IEEE,
vol. 13, no. 2, pp. 245–257, 2011.
[34] M. Wilhelm, I. Martinovic, J. B. Schmitt, and V. Lenders, “Reactive jamming in
wireless networks - how realistic is the threat?” in Proc. of WiSec, June 2011.
[35] A. Cassola, W. Robertson, E. Kirda, and G. Noubir, “A practical, targeted, and
stealthy attack against WPA Enterprise authentication,” in Proceedings of the 20th Annual
Network and Distributed System Security Symposium (NDSS ’13), February 2013.
[36] S. Gollakota, S. D. Perli, and D. Katabi, “Interference alignment and cancellation,” in
Proc. of SIGCOMM, August 2009.
[37] S. Gollakota, F. Adib, D. Katabi, and S. Seshan, “Clearing the RF smog: Making
802.11 robust to cross-technology interference,” in Proc. of SIGCOMM, August 2011.
[38] K. C.-J. Lin, S. Gollakota, and D. Katabi, “Random access heterogeneous MIMO
networks,” in Proc. of SIGCOMM, August 2011.
[39] Damballa, “Peer-to-peer: A growing tactic used for threat command-and-control,”
https://www.damballa.com/downloads/r_pubs/wp_peer-to-peer_a_growing_tactic.pdf, 2013.
[40] “Latest kelihos botnet shut down live at rsa conference 2013,” http://threatpost.com/,
2013.
[41] N. Falliere, “Sality: Story of a peer-to-peer viral network,”
http://www.symantec.com/connect/downloads/whitepaper-sality-story-peer-peer-viral-network, July 2011.
[42] J. Wyke, “The zeroaccess botnet - mining and fraud for massive financial gain,” Sep.
2012.
[43] T. Werner, “Kelihos.c: Same code, new botnet, 2012,” http://www.crowdstrike.com/
blog/kelihosc-same-code-new-botnet/index.html, Mar. 2012.
[44] C. Rossow, D. Andriesse, T. Werner, B. Stone-Gross, D. Plohmann, C. J. Dietrich, and
H. Bos, “P2PWNED: Modeling and evaluating the resilience of peer-to-peer botnets,”
in IEEE Symposium on Security & Privacy, May 2013.
[45] S. Nagaraja, P. Mittal, C.-Y. Hong, M. Caesar, and N. Borisov, “Botgrep: Finding
p2p bots with structured graph analysis,” in Proc. of USENIX Security’10, 2010.
[46] T.-F. Yen and M. K. Reiter, “Are your hosts trading or plotting? telling p2p file-
sharing and bots apart,” in Proc. of ICDCS, June 2010.
[47] J. Zhang, R. Perdisci, W. Lee, U. Sarfraz, and X. Luo, “Detecting stealthy p2p botnets
using statistical traffic fingerprints,” in Dependable Systems Networks (DSN), 2011
IEEE/IFIP 41st International Conference on, June 2011.
[48] Z. Xu, L. Chen, G. Gu, and C. Kruegel, “Peerpress: Utilizing enemies’ p2p strength
against them,” in Proc. of ACM CCS’12, October 2012.
[49] C. Kolbitsch, P. M. Comparetti, C. Kruegel, E. Kirda, X. Zhou, and X. Wang, “Effec-
tive and efficient malware detection at the end host,” in Proc. of USENIX Security’09,
August 2009.
[50] G. Gu, P. Porras, V. Yegneswaran, and M. Fong, “Bothunter: Detecting malware
infection through IDS-driven dialog correlation,” in Proc. of USENIX Security’07,
August 2007.
[51] G. Gu, R. Perdisci, J. Zhang, and W. Lee, “Botminer: Clustering analysis of network
traffic for protocol- and structure-independent botnet detection,” in Proc. of USENIX
Security’08, 2008.
[52] J. Zhang, X. Luo, R. Perdisci, G. Gu, W. Lee, and N. Feamster, “Boosting the scala-
bility of botnet detection using adaptive traffic sampling,” in Proc. of AsiaCCS, March
2011.
[53] L. Bilge, D. Balzarotti, W. Robertson, E. Kirda, and C. Kruegel, “DISCLOSURE:
Detecting botnet command and control servers through large-scale netflow analysis,”
in Proc. of ACSAC, Dec. 2012.
[54] T.-F. Yen and M. K. Reiter, “Revisiting botnet models and their implications for
takedown strategies,” in Proceedings of the First international conference on Principles
of Security and Trust, 2012.
[55] Q. Zhao and B. Sadler, “A survey of dynamic spectrum access,” Signal Processing
Magazine, IEEE, vol. 24, no. 3, pp. 79–89, 2007.
[56] Q. Yan, M. Li, T. Jiang, W. Lou, and T. Hou, “Vulnerability and protection for dis-
tributed consensus-based spectrum sensing in cognitive radio networks,” in INFOCOM
2012, IEEE, March 2012, pp. 900–908.
[57] Q. Yan, M. Li, F. Chen, T. Jiang, W. Lou, T. Hou, and C.-T. Lu, “Non-parametric
passive traffic monitoring in cognitive radio networks,” in INFOCOM 2013, IEEE,
April 2013, pp. 1264–1272.
[58] Q. Yan, H. Zeng, T. Jiang, M. Li, W. Lou, and T. Hou, “MIMO-based jamming resilient
communication in wireless networks,” in INFOCOM 2014, IEEE, April 2014.
[59] H. Li and Z. Han, “Catch me if you can: an abnormality detection approach for collaborative
spectrum sensing in cognitive radio networks,” Wireless Communications,
IEEE Transactions on, vol. 9, no. 11, pp. 3554–3565, 2010.
[60] O. Fatemieh, A. Farhadi, R. Chandra, and C. A. Gunter, “Using classification to pro-
tect the integrity of spectrum measurements in white space networks,” in Proceedings
of the 18th Annual Network and Distributed System Security Symposium (NDSS ’11),
February 2011.
[61] A. W. Min, K. G. Shin, and X. Hu, “Attack-tolerant distributed sensing for dynamic
spectrum access networks,” in Network Protocols, 2009. (ICNP ’2009) 17th
International Conference on, October 2009, pp. 294–303.
[62] O. Fatemieh, M. LeMay, and C. A. Gunter, “Reliable telemetry in white spaces using
remote attestation,” in Proceedings of the 27th Annual Computer Security Applications
Conference, 2011, pp. 323–332.
[63] K. Zeng, P. Pawełczak, and D. Cabric, “Reputation-based cooperative spectrum sensing
with trusted nodes assistance,” Communications Letters, IEEE, vol. 14, no. 3, pp. 226–
228, 2010.
[64] P. Kaligineedi, M. Khabbazian, and V. Bhargava, “Secure cooperative sensing tech-
niques for cognitive radio systems,” in IEEE International Conference on Communi-
cations, ICC ’08, 2008, pp. 3406–3410.
[65] P. Kaligineedi, M. Khabbazian, and V. K. Bhargava, “Malicious user detection in a
cognitive radio cooperative sensing system,” Wireless Communications, IEEE Trans-
actions on, vol. 9, no. 8, pp. 2488–2497, 2010.
[66] F. R. Yu, H. Tang, M. Huang, Z. Li, and P. C. Mason, “Defense against spectrum
sensing data falsification attacks in mobile ad hoc networks with cognitive radios,” in
Military Communications Conference, 2009. MILCOM 2009. IEEE, October 2009, pp.
1–7.
[67] R. Zhang, J. Zhang, Y. Zhang, and C. Zhang, “Secure crowdsourcing-based cooperative
spectrum sensing,” in INFOCOM 2013, IEEE, April 2013.
[68] S. Li, H. Zhu, Z. Gao, X. Guan, and K. Xing, “Yousense: Mitigating entropy selfishness
in distributed collaborative spectrum sensing,” in INFOCOM 2013, IEEE, April 2013.
[69] C. Wang, X.-Y. Li, C. Jiang, S. Tang, Y. Liu, and J. Zhao, “Scaling laws on multicast
capacity of large scale wireless networks,” in INFOCOM 2009, IEEE, April 2009, pp.
1863–1871.
[70] W. Wang, H. Li, Y. Sun, and Z. Han, “Securing collaborative spectrum sensing against
untrustworthy secondary users in cognitive radio networks,” EURASIP Journal on
Advances in Signal Processing, vol. 2010, Jan. 2010.
[71] Y. Liu, P. Ning, and H. Dai, “Authenticating primary users’ signals in cognitive radio
networks via integrated cryptographic and wireless link signatures,” in Security and
Privacy (SP), 2010 IEEE Symposium on, 2010, pp. 286–301.
[72] X. Tan, K. Borle, W. Du, and B. Chen, “Cryptographic link signatures for spectrum
usage authentication in cognitive radio,” in Proceedings of the fourth ACM conference
on Wireless network security, 2011, pp. 79–90.
[73] Z. Jin, S. Anand, and K. P. Subbalakshmi, “Mitigating primary user emulation attacks
in dynamic spectrum access networks using hypothesis testing,” SIGMOBILE Mobile
Computing and Communications Review, vol. 13, no. 2, pp. 74–85, Sep. 2009.
[74] S. Anand, Z. Jin, and K. P. Subbalakshmi, “An analytical model for primary user
emulation attacks in cognitive radio networks,” in New Frontiers in Dynamic Spectrum
Access Networks, 2008. DySPAN 2008. 3rd IEEE Symposium on, 2008, pp. 1–6.
[75] H. Chan, A. Perrig, and D. Song, “Secure hierarchical in-network aggregation in sensor
networks,” in CCS ’06 Proceedings of the 13th ACM conference on Computer and
communications security, October 2006.
[76] A. Goldsmith, Wireless Communications. Cambridge University Press, 2005.
[77] R. Olfati-Saber, J. A. Fax, and R. M. Murray, “Consensus and cooperation in net-
worked multi-agent systems,” Proceedings of the IEEE, vol. 95, no. 1, pp. 215–233,
2007.
[78] R. Tandra, S. M. Mishra, and A. Sahai, “What is a spectrum hole and what does it
take to recognize one?” Proceedings of the IEEE, vol. 97, no. 5, pp. 824–848, May
2009.
[79] R. A. Horn and C. R. Johnson, Matrix analysis. Cambridge University Press, 2005.
[80] W. Ren, R. W. Beard, and E. M. Atkins, “A survey of consensus problems in multi-
agent coordination,” in American Control Conference, 2005. Proceedings of the 2005,
vol. 3, June 2005, pp. 1859–1864.
[81] V. Yadav and M. V. Salapaka, “Distributed protocol for determining when averaging
consensus is reached,” in Proceedings of 2007 Allerton Conference on Communication,
Control, and Computing, 2007.
[82] F. Mosteller and J. W. Tukey, Data analysis and regression: A second course in statis-
tics. Addison-Wesley Publishing Company, 1977.
[83] A. Perrig, R. Szewczyk, J. Tygar, V. Wen, and D. E. Culler, “SPINS: Security protocols
for sensor networks,” Wireless Networks, vol. 8, pp. 521–534, 2002.
[84] I. Khalil, S. Bagchi, and N. B. Shroff, “LITEWORP: a lightweight countermeasure
for the wormhole attack in multihop wireless networks,” in DSN 2005, Dependable
Systems and Networks, 2005.
[85] A. Balachandran, G. M. Voelker, P. Bahl, and P. V. Rangan, “Characterizing user
behavior and network performance in a public wireless LAN,” in SIGMETRICS 2002,
ACM, June 2002, pp. 195–205.
[86] D.-H. Shin and S. Bagchi, “Optimal monitoring in multi-channel multi-radio wireless
mesh networks,” in MobiHoc’09, May 2010.
[87] A. Chhetri, H. Nguyen, G. Scalosub, and R. Zheng, “On quality of monitoring for
multi-channel wireless infrastructure networks,” in MobiHoc’10, Sep. 2010.
[88] P. Arora, C. Szepesvari, and R. Zheng, “Sequential learning for optimal monitoring of
multi-channel wireless networks,” in INFOCOM 2011, IEEE, 2011.
[89] S. Chen, K. Zeng, and P. Mohapatra, “Efficient data capturing for network forensics
in cognitive radio networks,” in Network Protocols, 2011. (ICNP ’2011) 19th Interna-
tional Conference on, 2011, pp. 176–185.
[90] S. Yi, K. Zeng, and J. Xu, “Secondary user monitoring in unslotted cognitive radio
networks with unknown models,” in Wireless Algorithms, Systems, and Applications,
ser. Lecture Notes in Computer Science, vol. 7405, 2012, pp. 648–659.
[91] P. Bahl, R. Chandra, T. Moscibroda, R. Murty, and M. Welsh, “White space network-
ing with wi-fi like connectivity,” in SIGCOMM ’09, August 2009, pp. 27–38.
[92] D. Cabric, A. Tkachenko, and R. W. Brodersen, “Experimental study of spectrum
sensing based on energy detection and network cooperation,” in Proceedings of the first
international workshop on Technology and policy for accessing spectrum, ser. TAPAS
’06, 2006.
[93] D. Murray, M. Dixon, and T. Koziniec, “Scanning delays in 802.11 networks,” in The
2007 International Conference on Next Generation Mobile Applications, Services and
Technologies, Sept. 2007, pp. 255–260.
[94] S. Narlanka, R. Chandra, P. Bahl, and J. I. Ferrell, “A hardware platform for utilizing
tv bands with a wi-fi radio,” in IEEE LANMAN, June 2007.
[95] Z. I. Botev, J. F. Grotowski, and D. P. Kroese, “Kernel density estimation via diffu-
sion,” Annals of Statistics, vol. 38, no. 5, pp. 2916–2957, Nov. 2010.
[96] C. Heinz and B. Seeger, “Towards kernel density estimation over streaming data,” in
International Conference on Management of Data (COMAD), Dec. 2006.
[97] G. W. Corder and D. I. Foreman, Nonparametric Statistics for Non-Statisticians: A
Step-by-Step Approach. New York, USA: Wiley, 2009.
[98] A. Srinivasan, “Distributions on level-sets with applications to approximation algo-
rithms,” in FOCS, 2001.
[99] F. Zhang, W. He, X. Liu, and P. G. Bridges, “Inferring users’ online activities through
traffic analysis,” in WiSec 2011, June 2011, pp. 59–69.
[100] “Airpcap Adapter,” http://www.riverbed.com/products-solutions/products/
network-performance-management/wireshark-enhancement-products/, 2013.
[101] V. Brik, S. Banerjee, M. Gruteser, and S. Oh, “Wireless device identification with
radiometric signatures,” in Proceedings of the 14th ACM international conference on
Mobile computing and networking, ser. MobiCom ’08, 2008, pp. 116–127.
[102] S. Liu, L. Lazos, and M. Krunz, “Thwarting inside jamming attacks on wireless broadcast
communications,” in Proc. of WiSec, June 2011.
[103] R. Zhang, Y. Zhang, and X. Huang, “JR-SND: jamming-resilient secure neighbor dis-
covery in mobile ad hoc networks,” in Proc. of ICDCS, June 2011, pp. 529–538.
[104] M. Strasser, C. Popper, S. Capkun, and M. Cagalj, “Jamming-resistant key establishment
using uncoordinated frequency hopping,” in Proc. of IEEE S&P, May 2008.
[105] Y. Liu, P. Ning, H. Dai, and A. Liu, “Randomized differential DSSS: jamming-resistant
wireless broadcast communication,” in Proc. of IEEE INFOCOM, March 2010.
[106] G. Noubir, R. Rajaraman, B. Sheng, and B. Thapa, “On the robustness of IEEE 802.11
rate adaptation algorithms against smart jamming,” in Proc. of WiSec, June 2011.
[107] W. Xu, W. Trappe, and Y. Zhang, “Anti-jamming timing channels for wireless net-
works,” in Proc. of WiSec, 2008.
[108] Y. Liu and P. Ning, “Bittrickle: Defending against broadband and high-power reactive
jamming attacks,” in Proc. of IEEE INFOCOM, 2012.
[109] T. D. Vo-Huu, E.-O. Blass, and G. Noubir, “Counter-jamming using mixed mechanical
and software interference cancellation,” in Proc. of WiSec, April 2013.
[110] R. Miller and W. Trappe, “Subverting MIMO wireless systems by jamming the channel
estimation procedure,” in Proc. of WiSec, March 2010.
[111] M. Han, T. Yu, J. Kim, K. Kwak, S. Lee, S. Han, and D. Hong, “OFDM channel esti-
mation with jammed pilot detector under narrow-band jamming,” IEEE Transactions
on Vehicular Technology, vol. 57, no. 3, pp. 1934–1939, 2008.
[112] T. Clancy, “Efficient OFDM denial: Pilot jamming and pilot nulling,” in Proc. of ICC,
2011.
[113] W.-L. Shen, Y.-C. Tung, K.-C. Lee, K. C.-J. Lin, S. Gollakota, D. Katabi, and M.-S.
Chen, “Rate adaptation for 802.11 multiuser MIMO networks,” in Proc. of MobiCom,
August 2012.
[114] H. Kim and K. G. Shin, “In-band spectrum sensing in cognitive radio networks: energy
detection or feature detection?” in MobiCom 2008, September 2008, pp. 14–25.
[115] D. Giustiniano, V. Lenders, J. B. Schmitt, M. Spuhler, and M. Wilhelm, “Detection
of reactive jamming in dsss-based wireless networks,” in Proceedings of the Sixth ACM
Conference on Security and Privacy in Wireless and Mobile Networks, ser. WiSec ’13,
2013, pp. 43–48.
[116] D. Tse and P. Viswanath, Fundamentals of Wireless Communication. Cambridge
University Press, 2005.
[117] W. Xu, W. Trappe, and Y. Zhang, “Channel surfing and spatial retreats: Defenses
against wireless denial of service,” in Proc. of WiSe, 2004.
[118] S. Gollakota and D. Katabi, “ZigZag decoding: Combating hidden terminals in wireless
networks,” in Proc. of SIGCOMM, August 2008, pp. 159–170.
[119] K. Miller, A. Sanne, K. Srinivasan, and S. Vishwanath, “Enabling real-time inter-
ference alignment: promises and challenges,” in Proceedings of the thirteenth ACM
international symposium on Mobile Ad Hoc Networking and Computing, 2012, pp.
55–64.
[120] W. Stallings, Data and Computer Communications (9th Edition). Prentice Hall, 2010.
[121] G. Gu, J. Zhang, and W. Lee, “Botsniffer: Detecting botnet command and control
channels in network traffic,” in Proc. of NDSS’08, 2008.
[122] T.-F. Yen and M. K. Reiter, “Traffic aggregation for malware detection,” in Proc. of
DIMVA ’08, 2008.
[123] P. Wurzinger, L. Bilge, T. Holz, J. Goebel, C. Kruegel, and E. Kirda, “Automatically
generating models for botnet detection,” in Proc. of ESORICS’09, 2009, pp. 232–249.
[124] H. Zhang, D. Yao, and N. Ramakrishnan, “Detection of stealthy malware activities
with traffic causality and scalable triggering relation discovery,” in Proceedings of the
9th ACM Symposium on Information, Computer and Communications Security (ASI-
ACCS ’14), 2014, pp. 39–50.
[125] B. Coskun, S. Dietrich, and N. Memon, “Friends of an enemy: Identifying local members
of peer-to-peer botnets using mutual contacts,” in Proc. of ACSAC, 2010.
[126] Y. Zhao, Y. Xie, F. Yu, Q. Ke, Y. Yu, Y. Chen, and E. Gillum, “Botgraph: large scale
spamming botnet detection,” in Proc. of NSDI’09, 2009.
[127] M. Jelasity and V. Bilicki, “Towards automated detection of peer-to-peer botnets: on
the limits of local approaches,” in Proc. of LEET’09, 2009.
[128] L. Li, S. Mathur, and B. Coskun, “Gangs of the internet: Towards automatic discovery
of peer-to-peer communities in the internet,” in Proc. of CNS, 2013, pp. 167–175.
[129] D. Dagon, G. Gu, C. Lee, and W. Lee, “A taxonomy of botnet structures,” in Proc.
of ACSAC’07, 2007.
[130] S. Sen, O. Spatscheck, and D. Wang, “Accurate, scalable in-network identification of
p2p traffic using application signatures,” in Proc. of WWW’04, 2004, pp. 512–521.
[131] D. Stutzbach and R. Rejaie, “Understanding churn in peer-to-peer networks,” in Pro-
ceedings of the 6th ACM SIGCOMM conference on Internet measurement, October
2006.
[132] O. Maimon and L. Rokach, Data Mining and Knowledge Discovery Handbook.
Springer-Verlag New York, Inc., 2005.
[133] B. J. Frey and D. Dueck, “Clustering by passing messages between data points,”
Science, vol. 315, no. 5814, pp. 972–976, 2007.
[134] P.-N. Tan, M. Steinbach, and V. Kumar, Introduction to Data Mining. Addison-
Wesley, 2005.
[135] K. Xu, Z.-L. Zhang, and S. Bhattacharyya, “Profiling internet backbone traffic: Be-
havior models and applications,” in Proc. of SIGCOMM, August 2005.
[136] C. M. Bishop, Pattern Recognition and Machine Learning. Springer, 2006.
[137] “Autoit script,” http://www.autoitscript.com/site/autoit/, 2013.
[138] A. Arning, R. Agrawal, and P. Raghavan, “A linear method for deviation detection
in large databases,” in KDD 1996. 2nd ACM International Conference on Knowledge
Discovery and Data Mining, 1996, pp. 164–169.
[139] M. Ester, H.-p. Kriegel, J. Sander, and X. Xu, “A density-based algorithm for discov-
ering clusters in large spatial databases with noise,” in KDD 1996. 2nd ACM Interna-
tional Conference on Knowledge Discovery and Data Mining, 1996, pp. 226–231.
[140] M. Breunig, H.-P. Kriegel, R. T. Ng, and J. Sander, “LOF: Identifying density-based
local outliers,” in The 2000 ACM SIGMOD International Conference on Management
of Data, Proceedings of, 2000, pp. 93–104.
[141] V. Hodge and J. Austin, “A survey of outlier detection methodologies,” Artificial
Intelligence Review, vol. 22, no. 2, pp. 85–126, Oct. 2004.
[142] M. Ding, D. Chen, K. Xing, and X. Cheng, “Localized fault-tolerant event boundary
detection in sensor networks,” in INFOCOM 2005. 24th Annual Joint Conference of
the IEEE Computer and Communications Societies. Proceedings IEEE, 2005.
[143] R. Mitchell and I. Chen, “Effect of intrusion detection and response on reliability of
cyber physical systems,” Reliability, IEEE Transactions on, vol. 62, no. 1, pp. 199–210,
March 2013.
[144] J. Mirkovic, G. Prier, and P. Reiher, “Attacking DDoS at the source,” in Network
Protocols, 2002. Proceedings. 10th IEEE International Conference on, 2002, pp. 312–
321.
[145] A. W. Moore and D. Zuev, “Internet traffic classification using bayesian analysis tech-
niques,” in Proceedings of the 2005 ACM SIGMETRICS international conference on
Measurement and modeling of computer systems, ser. SIGMETRICS ’05, 2005, pp.
50–60.
[146] N. L. Hjort, C. Holmes, P. Muller, and S. G. Walker, Bayesian Nonparametrics. Cam-
bridge University Press, April 2010.