
University of Toronto

Master’s Thesis

An Efficient Wi-Fi RSS Indoor Positioning System and Its

Client-Server Implementation

Author:

Yibo Yu

Supervisor:

Dr. Shahrokh Valaee

A thesis submitted in conformity with the requirements

for the degree of Master of Applied Science

in the

Communications Group

The Edward S. Rogers Sr. Department of Electrical and Computer

Engineering

© Copyright by Yibo Yu (2013)

http://www.utoronto.ca/

http://www.ecf.utoronto.ca/~yuyibo1/

http://www.comm.utoronto.ca/~valaee/


http://www.ece.utoronto.ca/


UNIVERSITY OF TORONTO

Abstract

Faculty of Applied Science and Engineering

The Edward S. Rogers Sr. Department of Electrical and Computer Engineering

Master of Applied Science

An Efficient Wi-Fi RSS Indoor Positioning System and Its

Client-Server Implementation

by Yibo Yu

The demand for indoor Location Based Services (LBS) has increased over the past
years as the smartphone market has expanded. As a result, there is growing
interest in developing efficient and reliable indoor positioning systems for
mobile devices. Wi-Fi signal strength fingerprint-based approaches are
attracting more and more attention due to the wide deployment of Wi-Fi access
points. The indoor positioning problem using Wi-Fi signal fingerprints can be
viewed as a machine learning task to be solved mathematically. This thesis
proposes an efficient and reliable real-time Wi-Fi indoor positioning system
using machine learning algorithms. The proposed positioning system, together
with a location server equipped with the same algorithms, is tested and
evaluated in several indoor scenarios. Simulation and testing results show that
the proposed system is a feasible LBS solution.



Acknowledgements

Foremost, I would like to express my sincere gratitude to my supervising professor

Dr. Shahrokh Valaee, whose knowledge, guidance and support have made this

work possible.

Besides my supervisor, I owe special thanks to Dr. Chen Feng, Dr. Vahid
Purahmadi, Shervin Shahidi, and Dr. Soheil Salari, with whom I worked on
various aspects of this project and from whom I obtained valuable input and
ideas. I would also like to thank all my colleagues in the Wireless and
Internet Research Laboratory (WIRLab) in the Department of Electrical and
Computer Engineering, University of Toronto.

I am grateful to Nantworks LLC for its generous financial support.

Last but not least, I would like to thank my parents for their invaluable
support.


Contents

Abstract
Acknowledgements
List of Tables
List of Figures
Abbreviations

1 Introduction
  1.1 Motivations
  1.2 Overview on Indoor Positioning Techniques
  1.3 Problem Statement and Objectives
  1.4 Challenges
  1.5 Contributions
  1.6 Structure of the Thesis

2 Background and Related Works
  2.1 Empirical RSS Fingerprinting Methods
    2.1.1 Supervised Positioning Methods
  2.2 Current Implementation: Label Propagation
    2.2.1 Semi-supervised Learning
    2.2.2 Non-parametric Regression
    2.2.3 The Label Propagation Classification Algorithm
    2.2.4 Extending Label Propagation to Regression
  2.3 Closed-form Solution of the Iterative Label Propagation
  2.4 Label Propagation Algorithm as a Kernel Estimator
    2.4.1 Limitations of Kernel Estimators
  2.5 Properties of Wi-Fi RSS
    2.5.1 Distribution of RSS
    2.5.2 Stationarity of the RSS Measurements
    2.5.3 Other Considerations
  2.6 Chapter Summary

3 Indoor Positioning Procedures
  3.1 RSS-based Positioning System Overview
  3.2 Off-line Training Phase
    3.2.1 Raw RSS Fingerprint Collection
    3.2.2 Processing Raw RSS Readings
    3.2.3 Client-Server Communications during the Off-line Phase
  3.3 On-line Localization Phase
    3.3.1 Coarse Localization: Cluster Generation using AP Visibilities
    3.3.2 Fine Localization
    3.3.3 Client-Server Communications During the On-line Phase
  3.4 Laplacian Matrix of a Graph
  3.5 AP Selection
  3.6 Adjacency Matrix W
    3.6.1 Dynamic Kernel Parametrization
    3.6.2 Processing Adjacency Matrix
  3.7 Chapter Summary

4 Implementation
  4.1 Development Platforms
  4.2 Client-Server Communication Model
  4.3 Android Application Design
    4.3.1 Application High Level Overview
    4.3.2 Application Functionalities
    4.3.3 Structure of External Storage
  4.4 The Design of the Location Server
  4.5 Chapter Summary

5 Evaluation
  5.1 Simulation and Testing Set-up
    5.1.1 Experimental and Data Acquisition Sites
    5.1.2 Figure of Merit
    5.1.3 Evaluation Methodology
  5.2 Simulation Results
    5.2.1 Techniques for Processing Adjacency Matrix
    5.2.2 Comparison with kNN
  5.3 On-Device Testing Results
  5.4 Chapter Summary

6 Conclusion

A CSV Data File Format
B Detailed Information on Validation Data Sets
C UML Diagram of the Android Application

Bibliography

List of Tables

2.1 Comparison of RSS Fingerprint Based Positioning Systems
4.1 Devices Used in Development and Testing
5.1 Data Sets Used in Simulations and Testing
5.2 Comparison on Different Normalization Methods. STONE stands for Sum To One algorithm. M-STONE stands for Modified Sum To One algorithm. None means that no normalization algorithm is applied. All simulation results are presented in meters accurate within two decimal numbers
5.3 Effectiveness of Dynamic Kernel Parametrization Algorithm
5.4 A Set of Configurations Used in Simulations and Testing

List of Figures

1.1 Typical environment for indoor positioning systems
1.2 Positioning System High Level Block Diagram
2.1 Comparison of localization errors among different methods in a lab environment [13]. The other two curves, kNN and kernel-based, are used as benchmarks in [13] and are out of the scope of this thesis.
2.2 Label propagation produces local constant outputs
2.3 A sample RSS distribution of a single AP at a single location. Courtesy of Small et al. [35]
2.4 RSS stationarity Results. Courtesy of K. Kaemarungsi et al. [22]
3.1 Fast Data Labelling with timestamps. The test was performed on the entire 4th floor of Bahen Centre at the University of Toronto. The left half shows a path on the map. The right half shows labelled points collected along different paths. Each point on the right corresponds to a labelled point
3.2 Off-line Client-Server Communication Diagram
3.3 On-line Client-Server Communication Diagram
3.4 The ARMSE versus the number of APs used. The testing site is on the 4th floor of BA. Courtesy of Anthea et al. [8]
3.5 The probability of correct location estimation versus the number of APs used [21]
4.1 Client-Server Communication Model as proposed in [14]. The starred modules are authentication related modules and are currently unimplemented. The solid lines show the communication message exchanges involved in the location request stage. The dotted lines show the message exchanges in the authentication stage. The dotted-dash line shows the response messages containing either user locations or positioning resources
4.2 The Implemented Client-Server Communication Model. Option 1: the location server returns resource files to the mobile device, which does the positioning calculations. Option 2: the location server estimates and returns the location information to the mobile device.
4.3 Android Application High Level Diagram. Black arrows show the dependency of libraries. E.g. Localizer.android module depends on the Android SDK.
4.4 Application Storage Structure
4.5 Client-Server Block Diagram. This image shows the modules in the application and the location server. Both of them share the same positioning algorithms and data structures. The communication is via the Internet using HTTP protocol.
5.1 Positioning Error cdf: BA1F Data Set
5.2 Positioning Error cdf: BA1F-Fast Data Set
5.5 Positioning Error cdf: EATON3 Data Set
5.6 Positioning Error Statistics: BA1F Data Set
5.7 Positioning Error Statistics: BA1F-Fast Data Set
5.8 Positioning Error Statistics: BA4F Data Set
5.9 Positioning Error Statistics: BA4F-Fast Data Set
5.10 Positioning Error Statistics: Eaton Data Set
5.12 Positioning Error cdf: BA1-Fast Data Set
5.15 Positioning Error cdf: EATON3 Data Set
5.16 Testing results on Bahen 1st floor: positioning error cdf
5.17 Testing results on Bahen 4th floor: positioning error cdf
5.18 Testing results on Eaton Centre level 3: positioning error cdf
5.19 Testing in Los Angeles World Airport. The arrows point out the device’s actual position, a red circle, and the estimated position, a blue arrow marker. The positioning error is around 3 meters in this case.
A.1 A sample CSV file depicting all the fields
B.1 Data point locations on Bahen first floor
B.2 Data point locations on Bahen fourth floor
B.3 Data point locations in BA1F-Fast Data Set
B.4 Data point locations in BA4F-Fast Data Set
B.5 Data point locations in EATON3 Data Set
C.1 Android Application Java UML Diagram

Abbreviations

GPS Global Positioning System

AP Access Point

RF Radio Frequency

UWB Ultra Wide Band

WLAN Wireless Local Area Network

LBS Location Based Service

RSS Received Signal Strength

CoO Cell of Origin

TDOA Time Difference Of Arrival

AOA Angle Of Arrival

MAC Media Access Control

CS Compressed Sensing

kNN k Nearest Neighbour

PDIP Primal Dual Interior Point

NIC Network Interface Card

CEF Conditional Expectation Function

GRF Gaussian Random Field

KDE Kernel Density Estimation

LLSR Local Linear Semi-supervised Regression

CSV Comma Separated Values

HTTP Hyper Text Transfer Protocol

ISP Internet Service Provider

cdf cumulative distribution function


pdf probability density function

pmf probability mass function

SDK Software Development Kit

IDE Integrated Development Environment

Jar Java archive

API Application Programming Interface

JSP Java Server Page

HTML Hyper Text Markup Language

EJB Enterprise Java Bean

SD Secure Digital

I/O Input Output

GUI Graphical User Interface

UML Unified Modelling Language

ARMSE Average Root Mean Square Error

MRMSE Maximum Root Mean Square Error

Chapter 1

Introduction

1.1 Motivations

The ability to navigate persons and objects in indoor and outdoor environments
has become increasingly important. Outdoor positioning accuracy and reliability
have become excellent thanks to global satellite positioning systems such as
the Global Positioning System (GPS) from the U.S. and the Galileo positioning
system from the European Union. However, many applications require seamless
positioning capabilities across all environments. Therefore, indoor positioning
has recently become a popular area of research.

Indoor positioning and navigation systems can provide location-based services
(LBS), a general class of application-level services used in a variety of
contexts such as targeted advertisement, health care, work, social networking,
and many more. For example, LBS include identifying users’ current locations,
localizing important devices inside buildings, locating the nearest banking
machines and restaurants, and sharing one’s whereabouts with friends. LBS can
be made accessible on mobile devices through mobile networks using the
geographical position of the device. LBS have also become more and more
important with the expansion of the smartphone and tablet markets. According to
[27], as of late 2012, at least 140 companies were working on indoor
localization, tracking, and navigation systems. The growth of the LBS market
has also been impressive over the last 10 years, especially since Google
entered this market with rich map content. Another striking fact about indoor
positioning systems is that 75 percent of iPhone devices use Wi-Fi geolocation
rather than GPS to locate the device [27].

Technologically speaking, no single technology provides a unique solution to
indoor positioning. In the literature, indoor positioning methods typically
include wireless signal based technologies involving Wi-Fi, Bluetooth, radio
frequency (RF), ultra-wideband (UWB), and Global System for Mobile
Communications (GSM) signals. Complementary sensor-based technologies are also
available: accelerometers, gyroscopes, magnetic sensors, and so on [30].
However, many of these technologies require dedicated local infrastructure and
customized mobile units [23] [26] [29] [30]. For example, UWB systems require
the installation of UWB signal transmitters all over the target buildings, and
users are required to carry corresponding receivers. Strictly speaking, Wi-Fi
based systems also require a dedicated infrastructure. However, the wide
deployment of wireless local area networks (WLANs) based on the IEEE 802.11
standard naturally provides such an infrastructure. Because no extra expense on
positioning infrastructure is needed, many institutes have investigated Wi-Fi
received signal strength (RSS) based systems, some of which are introduced in
Section 2.1.

1.2 Overview on Indoor Positioning Techniques

Indoor localization systems, in the literature and in industry, employ a wide
range of different technologies. Such systems could use any combination of the
following: camera, sound (ultrasound or audible sound), infra-red, Wi-Fi RSS,
radio frequency identification (RFID), ultra-wideband (UWB), pseudolites,
inertial sensors, magnetic sensors, etc. Amongst these technologies,
positioning systems using UWB signals, infra-red, radio frequency (RF),
proximity sensors, and ultrasound [23] [26] are able to localize users with
high accuracy. However, these systems require the installation of additional
transmitters and sensors, which leads to high equipment and labour costs that
prevent large-scale deployment.

The use of Wi-Fi RSS to estimate locations is a tempting approach since Wi-Fi
access points (APs) are readily available in large quantities in today’s indoor
environments and commercially available mobile devices can be used on the
users’ side. RSS-based methods generally come in two flavours: analytical
fingerprinting approaches and empirical fingerprinting approaches. Analytical
approaches include wireless signal propagation modelling, which models wireless
signal propagation patterns inside buildings with complex layouts. Empirical
fingerprinting methods, on the other hand, require a set of RSS measurements to
be collected all over the building during an off-line stage, in which RSS
readings are stored together with their corresponding ground truth locations.
During the on-line stage, these RSS measurements are used to perform location
estimation.

Empirical methods include the Cell-of-Origin (CoO) method, multi-lateration
methods, and probabilistic methods [30]. Traditional empirical fingerprinting
methods are mostly CoO methods or multi-lateration methods. For example, one
could estimate a user’s location using triangulation with signal-derived
criteria such as time-difference-of-arrival (TDOA) or angle-of-arrival (AOA).
One instance of these approaches, described in [35], uses table-based lookup
for triangulation of a user’s location in a WLAN infrastructure. TDOA and AOA
rely on time or angle estimates, whose accuracy is greatly degraded by fading
and non-line-of-sight conditions in indoor environments [9].

In contrast to the TDOA and AOA methods, probabilistic methods have also been
proposed in the indoor positioning domain. Probabilistic approaches use users’
current RSS readings, together with pre-constructed radio maps as prior
knowledge, to infer the current locations of the mobile devices. Owing to their
strong relationship to statistical inference, these probabilistic algorithms
are often machine learning algorithms. Such methods require a radio map for
each region of interest; depending on how radio maps are generated and used,
RSS fingerprint-based methods fall into two major categories: supervised
learning methods and semi-supervised learning methods. In terms of performance,
fingerprinting methods can reach metre-level accuracy, depending on the number
of APs in the area of interest and the physical layout of the indoor
environment.

This thesis presents an accurate, efficient, and reliable RSS fingerprint-based

WLAN positioning system that employs semi-supervised machine learning algo-

rithms [10] [42] [40] [43], and can be implemented on Android mobile devices. In

addition, our implementation supports cloud computing using an anonymous com-

munication model (Section 4.2). Mobile devices, limited in computational power

and battery life, now can borrow the computing power of dedicated location servers

to off-load heavy calculation burdens.

1.3 Problem Statement and Objectives

Figure 1.1 depicts a typical WLAN indoor environment in which indoor
positioning systems operate. The picture consists of three major entities: 1) a
group of APs deployed in the area of interest, emitting beacon signals
containing their Media Access Control (MAC) addresses, 2) users with mobile
devices equipped with WLAN adapters capable of sensing the APs within range,
and 3) a location server, which stores RSS fingerprint information and floor
plan image files for different buildings. The location server is usually
accessible from the mobile devices via the Internet.


Figure 1.1: Typical environment for indoor positioning systems

The proposed real-time indoor positioning system should be able to operate in the

aforementioned infrastructure, should be able to produce accurate and reliable

position estimates, and should be a feasible LBS solution for resource-limited

Android mobile devices. All these objectives require that the position estimation

algorithms be computationally feasible for mobile devices, and that they be robust

enough to cope with wireless signal variations in order to produce accurate and

reliable position estimates.

The positioning procedure is divided into two phases: the off-line data
acquisition phase and the on-line operational phase. First, during the off-line
phase, a set of RSS measurements $r_i$ is collected at various known locations
with Cartesian coordinates $(x_i, y_i)$, forming a radio map matrix denoted by
$\Psi$. Then, during the on-line phase, the device periodically collects
on-line RSS readings from nearby APs at a time interval $\Delta t$. These
on-line RSS readings are denoted as $r_o(t)$ for $t = 1, 2, \ldots$. The
proposed positioning system then uses $r_o(t)$ as observations and the radio
map as prior knowledge to compute a position estimate for the mobile user.
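The two phases above can be illustrated with a minimal Python sketch of how a radio map and an on-line reading might be represented. This is an illustration under stated assumptions, not the thesis implementation: the names `Fingerprint`, `build_radio_map`, `to_vector`, and the placeholder value `MISSING_RSS` are hypothetical, and storing one fingerprint per row of the radio map is a choice made here for readability.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Assumption: an AP that is not heard in a scan is filled with a weak
# placeholder value (in dBm). The thesis may treat missing APs differently.
MISSING_RSS = -100.0


@dataclass
class Fingerprint:
    """One off-line measurement r_i with its known location (x_i, y_i)."""
    x: float
    y: float
    rss: Dict[str, float]  # AP MAC address -> RSS in dBm


def build_radio_map(
    fingerprints: List[Fingerprint],
) -> Tuple[List[str], List[List[float]], List[Tuple[float, float]]]:
    """Return (AP list, radio map rows, locations) from off-line fingerprints.

    Row i of the radio map is the RSS vector r_i, ordered by the sorted AP list.
    """
    aps = sorted({mac for fp in fingerprints for mac in fp.rss})
    radio_map = [[fp.rss.get(mac, MISSING_RSS) for mac in aps] for fp in fingerprints]
    locations = [(fp.x, fp.y) for fp in fingerprints]
    return aps, radio_map, locations


def to_vector(scan: Dict[str, float], aps: List[str]) -> List[float]:
    """Align an on-line scan r_o(t) with the AP order used by the radio map.

    APs that do not appear in the radio map are ignored.
    """
    return [scan.get(mac, MISSING_RSS) for mac in aps]
```

With this layout, an on-line reading aligned by `to_vector` can be compared entry by entry against any row of the radio map, which is what the position estimators discussed later need.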

1.4 Challenges

Challenges from RSS Measurements

The unpredictable variations of RSS readings in indoor environments pose a
challenge to indoor positioning systems. These variations are caused by the
layout of the indoor environment, the presence of moving objects, multi-path
fading, interference from other sources, the design of the Wi-Fi antenna, and
many other factors. Moreover, the presence of human bodies also affects the
RSS, since they absorb radio waves [22]. All of these appear as noise in the
observed RSS readings. The RSS processing algorithms and positioning algorithms
should be robust enough to handle such noise.

Choosing Parameters for Algorithms

Machine learning algorithms generally involve different parameters or hyper-parameters

(the parameters used to determine other parameters) to be determined either

ahead of time during the training phase, or in real-time during the on-line phase.

Modern buildings come in different designs, sizes, and are built with different

materials. The RSS characteristics in different buildings could vary significantly.

Thus, one of the challenges is that we need to dynamically adjust the
estimator’s parameters in real-time to handle not only RSS variations within
the same building, but also different RSS characteristics across different
buildings.

Reducing Computations on Mobile Devices


Another requirement is that the proposed system should be able to run in real-time

both on the mobile device and on the server side. Mobile devices are often limited

in computational power, storage, and battery life. Thus, some special software

design considerations have to be made to accommodate limitations in resources.

1.5 Contributions

Figure 1.2: Positioning System High Level Block Diagram

In this thesis, we make the following contributions:

• We provide a new indoor positioning framework (shown in Figure 1.2) and make
improvements at different stages. Existing positioning systems in the
literature often suffer from various drawbacks: some require time-consuming
off-line RSS collection, and some have high computational complexity in the
on-line phase. Our positioning framework speeds up off-line data collection
through a new way of collecting RSS in the off-line phase.


• We employ a batch learning algorithm, proposed in the machine learning
literature, to make position estimation more efficient. The kernel estimator
(part of the on-line phase in Figure 1.2) uses semi-supervised batch learning
algorithms to make a batch of position predictions for different users in a
single run. The use of batch algorithms lowers the computation time per
position prediction.

• We implement new algorithms that make the system adaptive and robust. These
algorithms operate at different stages of the positioning process; they include
the dynamic cluster generation algorithm (block Cluster Generation in Figure
1.2), the dynamic kernel parametrization algorithm (block Kernel Estimator in
Figure 1.2), the adjacency matrix normalization algorithms (block Graph
Construction in Figure 1.2), and new AP selection algorithms (block Feature
Selection in Figure 1.2). The first two algorithms dynamically adjust
parameters so that the system can adapt to different indoor environments. The
other two algorithms are refinements of the location prediction process that
make predictions more accurate.

• The proposed system also has server support in both the off-line and on-line
phases. We have implemented the proposed client-server model on Android mobile
devices and an Apache application server running on top of Windows operating
systems. The application has also been tested in a variety of buildings to
ensure that it operates reliably in different complex environments. Testing and
simulation results are summarized and discussed in Chapter 5.

1.6 Structure of the Thesis

The rest of this thesis is organized as follows:


Chapter 2 starts off by discussing the background of RSS fingerprinting
methods, including existing positioning systems, the simple kNN algorithm for
position estimation, and a compressed sensing based (CS-based) indoor
positioning system. The major focus of Chapter 2, however, is on the location
prediction algorithm that is implemented in our proposed system: the label
propagation algorithm. We first explain the theoretical work related to
graph-based semi-supervised learning and non-parametric learning in order to
clarify what type of algorithm label propagation is. Then, we reformulate the
iterative label propagation algorithm as a convex optimization problem and
present its closed-form solution. The last part of Chapter 2 discusses the RSS
distributions inside buildings, since an understanding of RSS behaviour can
help in designing positioning algorithms.

Chapter 3 describes the proposed positioning procedures including both the off-

line data acquisition phase, and on-line phase. It explains how the system operates

mathematically and algorithmically. Flow charts and block diagrams are also used

to explain the procedures in detail. Up until Section 3.3, we assume that a good

graph can be constructed using an adjacency matrix. Starting from Section 3.4,

we discuss how to actually find a good adjacency matrix for the aforementioned

label propagation algorithm.

Chapter 4 presents the software implementation of our system. The emphasis is
on the architecture of the Android application and the client-server model.

Chapter 5 includes all the simulation and on-device testing results obtained using

five validation data sets from three different test sites. Both the simulation results

and the testing results indicate that the proposed algorithms are efficient, accurate,

and robust.

Finally, Chapter 6 concludes the thesis with remarks.

Chapter 2

Background and Related Works

Out of many possible positioning methods, we focus on analysing and
implementing a fingerprinting method using an algorithm named label
propagation. This chapter starts off by giving a brief overview of machine
learning techniques that can be used to perform indoor positioning tasks. Then,
Section 2.2.1 and Section 2.2.2 discuss semi-supervised learning and
non-parametric learning, respectively. The purpose is to establish an
understanding of basic machine learning principles so that readers can grasp
what type of algorithm label propagation is. Section 2.3 examines the
probabilistic assumptions behind label propagation and provides the convex
formulation of the label propagation algorithm. Lastly, in Section 2.5, we
switch gears to discuss some of the properties of RSS readings, in the hope of
utilizing these properties to make positioning algorithms more reliable.


2.1 Empirical RSS Fingerprinting Methods

Due to the time-varying characteristics of the wireless signal propagation
channel in in-building environments, empirical fingerprinting methods require a
set of calibration measurements (also referred to as the training data set in
the machine learning literature) to be collected for each region of interest in
the off-line training phase. Each training set contains a collection of both
labelled and unlabelled data points. Data points with both RSS readings and
Cartesian coordinates are referred to as “labelled” points in the machine
learning literature, since the Cartesian coordinates are called the labels of
the RSS readings; data points with RSS readings only are called “unlabelled”
points. All these points are used to generate the so-called “radio map”. In the
operational on-line localization phase, the training set is used, in
conjunction with on-line RSS readings submitted by mobile users, to estimate
the locations of mobile users by means of probabilistic estimation. The
requirement of a training data set is also the major drawback of these methods,
because collecting labelled points is a time-consuming and labour-intensive
task.

Depending on how the training set is obtained and used, indoor localization
methods mostly fall into two categories: supervised methods and semi-supervised
methods. Table 2.1 summarizes some existing indoor positioning systems that use
different approaches.

2.1.1 Supervised Positioning Methods

Supervised methods, in general, correlate the current on-line measurements
$r_o(t)$ received at time $t$ with the off-line RSS readings $r$ observed at
location $l$ with Cartesian coordinates $(x, y)$ using a certain distance or
similarity measure. These similarity relationships are then used to estimate
users’ locations. For example, a commonly used similarity measure is the
Euclidean distance


Table 2.1: Comparison of RSS Fingerprint Based Positioning Systems

| System Name | Company/Institute | Performance | Descriptions |
|---|---|---|---|
| Redpin [11] | ETH Zurich (2008) | 90% hit on labelled points | Supervised method (calibration-intensive) |
| Mole [25] | MIT, Nokia | 80% room level hit rate | Supervised method |
| ARIADNE [19] | Auburn University | approx. 3.3 to 3.6 m | Supervised kNN method running on laptops |
| RADAR [9] | Microsoft Research | approx. 3 m average error | Supervised kNN method running on laptops |
| Compressed Sensing kNN [13] | WIRLab, University of Toronto | 1 m average error | Supervised method; calibration and calculation intensive |
| Ekahau© Real-time Location System | Ekahau | 7 m accuracy indoors (Gallagher et al. 2009) | Operates based on RSS fingerprints and track history |
| XPS WLAN Fingerprinting system | Skyhook© | 10-20 m accuracy outdoors, 30-70 m indoors | Software-only, server based system used in dense urban areas. Employs WLAN signal, GNSS and cell tower ID. Designed to be efficient on a large scale. |
| EZ Model [12] | Microsoft Research India | Median error of 2 m and 7 m on two test-beds | Genetic algorithm, server-based system that uses Wi-Fi RSS |
| Horus [39] | University of Maryland | Median error of 0.7 m and 4 m on two test-beds | Calibration-intensive, no server support, the algorithm is a probabilistic technique |
| QuickMap | WiFiSlam, Apple Inc. | 2.5 m accuracy on average indoors | Probably semi-supervised learning using Wi-Fi RSS; server support possible |
| Google Map | Google Inc. | Unknown | Probably semi-supervised learning algorithm with time stamp-based automatic RSS labelling. Combines Wi-Fi signal, cell-phone signal, and GPS, if available |
| WIRLab Indoor Positioning System | WIRLab, University of Toronto | 2.5 m average positioning error. 99 percentile of error is at 8 m | Semi-supervised algorithm using Wi-Fi RSS, inertia sensor-based fast automatic RSS labelling is possible |

\[ d(r_o(t), r) = \sqrt{(r_o(t) - r)^2} \tag{2.1} \]

in RSS signal space, where the vector (r_o(t) − r) is the difference vector of dimension m, which is the total number of access points. Systems such as Redpin [11], Mole [25], ARIADNE [19], and RADAR [9] perform RSS-based indoor positioning using the k-nearest-neighbours (kNN) method with different k values and return the centroid of the k nearest neighbours of r_o(t) as the final location prediction. The kNN approach trades accuracy for simplicity and lacks robustness, since it is sensitive to RSS noise. Another approach, presented in [37], is probabilistic and uses RSS distribution modelling. It calculates the likelihood of all locations given on-line and off-line RSS observations; the estimated location is the location associated with the highest a posteriori probability. RSS modelling approaches often have high computational complexity, which often renders them infeasible on mobile platforms.
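The kNN scheme described above can be sketched in a few lines. This is a minimal illustration only, not the implementation used by any of the cited systems: the function names, the radio-map layout as a list of (RSS vector, coordinate) pairs, and the sample values are all hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two RSS vectors of equal length (eq. 2.1)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_locate(online_rss, radio_map, k=3):
    """Return the centroid of the k fingerprints closest to online_rss.

    radio_map is a list of (rss_vector, (x, y)) pairs collected off-line.
    """
    if not 1 <= k <= len(radio_map):
        raise ValueError("k must be between 1 and the radio map size")
    # Sort fingerprints by distance in RSS signal space and keep the k nearest.
    nearest = sorted(radio_map, key=lambda fp: euclidean(online_rss, fp[0]))[:k]
    cx = sum(pos[0] for _, pos in nearest) / k
    cy = sum(pos[1] for _, pos in nearest) / k
    return (cx, cy)


# Hypothetical radio map: RSS in dBm from m = 3 access points, positions in metres.
RADIO_MAP = [
    ([-40.0, -70.0, -80.0], (0.0, 0.0)),
    ([-45.0, -65.0, -78.0], (2.0, 0.0)),
    ([-60.0, -50.0, -70.0], (6.0, 0.0)),
    ([-75.0, -45.0, -60.0], (10.0, 0.0)),
]

estimate = knn_locate([-42.0, -68.0, -79.0], RADIO_MAP, k=2)
```

Because the estimate is a plain average of the k nearest labelled positions, a single noisy RSS reading can change which fingerprints are selected, which is the lack of robustness noted above.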

Previous Implementation: Supervised CS-based Positioning Method

The first implemented indoor positioning system is a supervised approach based on [13]. In [13], Feng et al. presented a compressed sensing-based (CS-based) location estimator, which uses compressed sensing techniques to exploit the sparse nature of the vector representation of user locations: at each time instant, a user is located at only one point in space. Thus, if we create a position indicator vector θ with zeros representing the user's absence and ones representing the user's presence, this vector contains only a single 1 and is, therefore, sparse. Compressed sensing theory dictates that such a vector can be recovered accurately with high probability by solving an l1-minimization problem. The CS-based approach showed high accuracy provided that proper training is done in the off-line training phase. However, proper training requires that a floor be sampled with high granularity. This requirement results in a labour-intensive and time-consuming training phase, which is not desirable in practice. Figure 2.1 shows the localization error of this CS-based system.

Another major disadvantage of the CS-based method is its high computational

complexity. To perform a single location estimation for one on-line measurement

ro using a training set with n data points and m access points, the CS-based

system has to solve an l1 minimization problem with the following formulation

[13] [8]


Figure 2.1: Comparison of localization errors among different methods in a lab environment [13]. The other two curves, kNN and kernel-based, are used as benchmarks in [13] and are out of the scope of this thesis.

\[ \hat{\theta} = \arg\min_{\theta} \| \theta \|_1 \quad \text{s.t.} \quad r_o = Q\theta + \varepsilon \tag{2.2} \]

where Q is an m × n matrix of RSS basis, r_o is the on-line RSS measurement vector of dimension m, ε denotes measurement noise, and θ ∈ R^n is a K-sparse position indicator vector containing the most likely locations of appearance. The CS-based method achieves high accuracy and outperforms the kNN algorithm and the probabilistic kernel method by a factor of two to three (see Figure 2.1). However, its computational complexity is also considerably high. For the CS-based method, l1-minimization solvers using the Primal-Dual Interior Point (PDIP) method require a total of O(√n) iterations, and each iteration can be executed in O(n^3) operations [38], making the total run time O(n^3.5).
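As a rough illustration of what this growth rate implies (constant factors are ignored, so the figure is indicative only), doubling the number of training points n multiplies the run time T(n) of a single location estimate by about eleven:

```latex
\frac{T(2n)}{T(n)} \approx \frac{(2n)^{3.5}}{n^{3.5}} = 2^{3.5} = 8\sqrt{2} \approx 11.3
```

A finer sampling grid, which the CS-based method needs for accuracy, therefore makes each on-line estimate disproportionately more expensive.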

To cope with the drawbacks of the CS-based positioning system, we move on to explore graph-based semi-supervised machine learning techniques. Specifically, the algorithm we use is label propagation, proposed in [42] [44] [40].

2.2 Current Implementation: Label Propagation

2.2.1 Semi-supervised Learning

This section discusses semi-supervised learning in general. In the literature, the field of machine learning has conventionally been categorised into three sub-fields: 1. supervised learning, 2. unsupervised learning, and 3. reinforcement learning. A learning task is called regression if the labels are continuous and real (i.e. Y ∈ R^d), and classification if Y takes discrete values from a set.

• For supervised learning algorithms, inputs to the learning machines are sets of labelled points, also known as training data sets. Training sets contain only labelled points, which are (feature, label) pairs conventionally denoted by (X_1, Y_1), . . . , (X_n, Y_n), where n is the size of a data set. The goal of such a system is to use the training sets to predict the label Y for any on-line observation X.

• For unsupervised learning algorithms, the inputs consist only of unlabelled points denoted by X_i, ∀i = 1 . . . n. The goal is to learn patterns from the unlabelled points. Examples of unsupervised learning tasks include data clustering, where unlabelled points are grouped according to some criteria; outlier detection, where points with unlikely values are eliminated; and dimensionality reduction, where unlabelled points are mapped into other spaces.

• Reinforcement learning repeatedly observes new on-line inputs X, performs learning calculations, and evaluates some pre-defined risk functions. The goal is to keep learning so that the risks are minimized in the future.


One of the problems with the three aforementioned sub-fields is that none of them combines labelled and unlabelled points during training. Semi-supervised learning algorithms, on the other hand, take both labelled and unlabelled points into consideration. In practice, collecting labelled points is usually costly, as it may be technologically difficult, time-consuming, or expensive. For example, in the context of RSS-based indoor positioning systems, labelled points are (RSS, Cartesian coordinate) pairs. Collecting labelled points is commonly referred to as location fingerprinting and is usually the first phase of RSS-based positioning schemes. This task requires dedicated technicians to stand still at different locations and collect RSS readings multiple times from different APs using Wi-Fi equipped mobile devices. In our experimental studies, it takes at least 20 seconds to take RSS samples for a single location using Android phones. Fully charting a single level of an office building with fine granularity for supervised algorithms such as the CS-based scheme [13] takes hours or days of work.

On the other hand, unlabelled points are usually available in large quantities and

cost virtually nothing to collect. For example, collecting unlabelled points for

RSS-based positioning systems can be done by simply turning on smart phones’

Wi-Fi network interface card (NIC) and walking around the building. With the

collaboration of other mobile users, the cost of collecting a large amount of un-

labelled data is virtually zero for each individual. Realizing that labelled points

are expensive to collect, semi-supervised learning tries to utilize a relatively small

amount of labelled points and a large amount of unlabelled points to predict labels

for the unlabelled points.

In conclusion, RSS-based indoor positioning systems generally require time-consuming

location fingerprint collection in the training phase. Semi-supervised methods

could mitigate this problem and are, therefore, of great theoretical and practical

interest to indoor positioning tasks. We employ a graph-based non-parametric re-

gression algorithm named label propagation, which is a semi-supervised method.


The following sections will cover different aspects of our method.

2.2.2 Non-parametric Regression

Label Propagation is a non-parametric algorithm. This section provides a brief introduction to non-parametric regression. Regression is a fundamental tool in statistics, economics, sociology, and machine learning analysis. Simply put, regression methods try to discover the relationships (a function or a mapping in general) between two or more random variables. In a Wi-Fi fingerprint-based positioning system, for example, the goal is to predict mobile users' positions by finding the mapping between the observed RSS vectors X and the locations Y where these RSS vectors are observed.

Given a data set (X_1, Y_1), . . . , (X_n, Y_n), where the X_i ∈ R^m are the input random RSS vectors and the Y_i are the multi-dimensional outputs, the goal of regression algorithms is to estimate the value of Y for any X in the on-line phase, whether or not it has been observed previously. Estimating the value of Y for an X can be done by simply taking the conditional expectation of Y given the training data. The estimated Y is expressed as follows:

\[ m(X) = E(Y_i \mid X_i) \tag{2.3} \]

The conditional expectation function (CEF) m(x) can be any non-linear func-

tion unless some restrictions are applied to it. In other words, if the relationship

between Xi and Yi is assumed to be of a certain function type (e.g. linear relation-

ship or quadratic relationship), then the CEF will be a function with parameters

of finite dimension; and the problem becomes learning the parameters of that par-

ticular function. On the other hand, if no assumptions are made on the type of

XY relationships, there will be no specific function parameters to learn (in this


case, the dimensionality of the parameters is indefinite and could potentially be in-

finite [17]). In this case, learning the XY relationship is a non-parametric machine

learning problem. In semi-supervised non-parametric regression problems, values

of the labels are derived directly from the data itself instead of being evaluated

from a function of a certain type. According to [18], there are mainly two classes

of non-parametric regression estimators: kernel estimators, and series estimators.

Label Propagation falls into the category of kernel estimators, as we will show in Section 2.4.

2.2.3 The Label Propagation Classification Algorithm

One instance of a non-parametric graph-based semi-supervised classification algorithm is label propagation, which was originally proposed by Zhu et al. [42] in the context of binary classification. This section introduces the label propagation algorithm as described in [10], [43] and [42]. Intuitively, the label propagation algorithm propagates labels on an undirected graph, which is not necessarily a complete graph. Each labelled node acts as a source of knowledge and propagates that knowledge across the entire graph through its neighbours. Although it was first introduced as a classification algorithm, it will be seen later in this section that the extension to regression is natural and simple.

To formulate the label propagation classification algorithm, let (X_1, Y_1) . . . (X_l, Y_l) be the labelled data points, and (X_{l+1}, −) . . . (X_{l+u}, −) be the unlabelled data. As mentioned previously, we usually have l ≪ u in practice. Let n = l + u be the total number of data points. In the literature, L usually denotes the set of labelled data and, similarly, U denotes the set of unlabelled data. The labels Y take values from a set of integers that are usually the indices of some categories or classes. This label set is denoted by \mathcal{C} and has size C = |\mathcal{C}|. The problem label propagation tries to solve is to find the labels for all data points within U; this is referred to as a transductive learning problem. (The inductive problem of finding labels for data points outside of L ∪ U is out of the scope of this thesis.)

Label propagation starts by building a graph G = (V, E) using both the labelled and unlabelled data points. Each vertex represents a data point and is marked by the label of that data point. Edge weights capture the similarities of pairs of data points. They are represented by an n × n adjacency matrix W. For the time being, let us assume that a good adjacency matrix is given. W_ij in W represents the pairwise similarity between data points i and j, ∀i, j = 1 . . . n. Edge weights are often calculated using kernel functions such as the Gaussian kernel (2.4) or the Laplacian kernel (2.5).

\[ W_{ij} = \exp\left(-\frac{\| X_i - X_j \|_2^2}{\sigma}\right) \tag{2.4} \]

\[ W_{ij} = \exp\left(-\frac{\| X_i - X_j \|_2}{\sigma}\right) \tag{2.5} \]

where σ is a kernel bandwidth hyper-parameter. Chapter 3 discusses in detail topics such as calculating edge weights and automatically selecting the bandwidth hyper-parameter σ.
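The two kernels can be sketched as follows. The sketch assumes that the bandwidth σ divides the (squared) Euclidean distance directly, which is one reading of (2.4) and (2.5); the function names and the sample RSS vectors are hypothetical, and the bandwidth value is arbitrary rather than selected by the procedure of Chapter 3.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two RSS vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def gaussian_weight(xi, xj, sigma):
    """Gaussian kernel weight: exp(-||xi - xj||_2^2 / sigma)."""
    return math.exp(-squared_distance(xi, xj) / sigma)


def laplacian_weight(xi, xj, sigma):
    """Laplacian kernel weight: exp(-||xi - xj||_2 / sigma)."""
    return math.exp(-math.sqrt(squared_distance(xi, xj)) / sigma)


def weight_matrix(points, sigma, kernel=gaussian_weight):
    """Dense n x n matrix of pairwise kernel weights (symmetric, ones on the diagonal)."""
    n = len(points)
    return [[kernel(points[i], points[j], sigma) for j in range(n)] for i in range(n)]


# Hypothetical RSS vectors (dBm) from two access points at three locations.
rss = [[-40.0, -70.0], [-43.0, -66.0], [-60.0, -50.0]]
W = weight_matrix(rss, sigma=50.0)
```

A larger σ flattens the weights towards 1 and connects distant fingerprints more strongly, while a smaller σ makes the graph effectively sparser.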

Edge weight Wij reflects how easy it is for a label to move from point i to j, and

vice versa. Larger edge weight means higher similarity between the labels of i and

j; thus, easier label propagation. To propagate labels, a probabilistic transition

matrix P is defined in terms of pairwise weights W :

\[ P_{ij} = P(i \to j) = \frac{W_{ij}}{\sum_{k=1}^{n} W_{ik}} \tag{2.6} \]


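where P_ij is the likelihood of node i's label moving to node j. Since the graph is assumed to be undirected, W_ij = W_ji holds; note, however, that P is row-normalized, so P_ij and P_ji differ in general unless nodes i and j have the same degree.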

In this subsection only, let I_l be the label indicator matrix of size l × C, where l is the number of labelled points and C is the number of label categories (C = |\mathcal{C}|). It is used to indicate the known labels Y_i, ∀i = 1 . . . l: each row of I_l is an indicator vector with only a single 1 at position j to indicate that this data point belongs to the jth category. The estimated labels for the labelled and unlabelled data are represented by another indicator matrix Î of size n × C:

\[ \hat{I} = \begin{bmatrix} I_l \\ 0 \end{bmatrix}. \]

With P, I_l, and Î defined, label propagation can be performed according to Algorithm 1.

Algorithm 1: Label Propagation Algorithm by Zhu et al. [42] for classification tasks
Input: Weight matrix W, label indicator matrix I_l
Output: Estimated label of all x_i
1 Compute the diagonal degree matrix D by D_ii ← Σ_j w_ij
2 Compute transition matrix P ← D^{-1} W
3 Initialize Î^(0) ← (I_l, 0, . . . , 0)^T
4 Iterate until Î converges
5     1. Î^(t+1) ← P Î^(t)
6     2. Î_l^(t+1) ← I_l
7 Label point x_i according to the ith row of Î
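The steps of Algorithm 1 can be sketched as below. This is an illustrative sketch rather than the thesis implementation: it assumes that labelled points come first in W, that every row of W has a positive sum, and that a point is assigned the class with the largest entry in its row of the indicator matrix. The function name, the stopping parameters `max_iter` and `tol`, and the toy graph are hypothetical.

```python
def label_propagation(W, labels, num_classes, max_iter=1000, tol=1e-9):
    """Iterative label propagation for classification (sketch of Algorithm 1).

    W: symmetric n x n weight matrix as a list of lists, labelled points first.
    labels: class indices (0-based) of the first l points.
    Returns the predicted class index of all n points.
    """
    n, l = len(W), len(labels)
    # Lines 1-2: row-normalise W, which is equivalent to P = D^-1 W.
    P = []
    for row in W:
        degree = sum(row)
        P.append([w / degree for w in row])
    # Line 3: one-hot rows for labelled points, zero rows for unlabelled points.
    I_l = [[1.0 if c == y else 0.0 for c in range(num_classes)] for y in labels]
    I = [row[:] for row in I_l] + [[0.0] * num_classes for _ in range(n - l)]
    for _ in range(max_iter):
        # Line 5: propagate the current estimates one step along the graph.
        new_I = [[sum(P[i][j] * I[j][c] for j in range(n))
                  for c in range(num_classes)] for i in range(n)]
        # Line 6: clamp the labelled rows back to their known labels.
        new_I[:l] = [row[:] for row in I_l]
        change = max(abs(new_I[i][c] - I[i][c])
                     for i in range(n) for c in range(num_classes))
        I = new_I
        if change < tol:
            break
    # Line 7: pick the class with the largest entry in each row.
    return [max(range(num_classes), key=lambda c: row[c]) for row in I]


# Toy graph: points 0 and 1 are labelled (classes 0 and 1); point 2 is most
# similar to point 0 and point 3 is most similar to point 1.
W_toy = [
    [1.0, 0.0, 0.9, 0.1],
    [0.0, 1.0, 0.1, 0.9],
    [0.9, 0.1, 1.0, 0.2],
    [0.1, 0.9, 0.2, 1.0],
]
predicted = label_propagation(W_toy, labels=[0, 1], num_classes=2)
```

In this toy graph each unlabelled point inherits the class of the labelled point it is most strongly connected to.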

A modified label propagation algorithm, inspired by the Jacobi iteration algorithm, has been proposed by Bengio et al. in [10]. Three additional rules apply to this modified label propagation algorithm:

• The estimated labels Î_l of the labelled points are allowed to differ from the true label values I_l during label propagation iterations. This means Î_l ≠ I_l in general (cf. line 6 in Algorithm 1).

• W_ii are forced to be 0, which often works better in simulations [10]. In doing so, the predicted label of each data point only contains contributions from its neighbours rather than from the point itself.

• An additional regularization term ε is used to perform Laplace smoothing, which avoids division-by-zero errors.

The modified label propagation, summarized in Algorithm 2, is what we used to perform RSS-based indoor position estimations.

Algorithm 2: Modified Label Propagation Algorithm by Bengio et al. [10] for classification tasks
Input: Weight matrix W, label indicator matrix I_l
Output: Estimated labels for x_i ∀i = 1 . . . n
1 For the weight matrix W, force w_ii = 0
2 Compute the diagonal degree matrix D by D_ii ← Σ_j w_ij
3 Choose a parameter α ∈ (0, 1) and a small smoothing value ε > 0 for Laplace smoothing
4 µ ← α / (1 − α) ∈ (0, +∞)
5 Compute the diagonal matrix A by A_ii ← I_[l](i) + µD_ii + µε, where I_[l](i) = 1 if point i is labelled and 0 otherwise
6 Initialize Î^(0) ← (I_l, 0, . . . , 0)^T
7 Iterate until Î converges
8     1. Î^(t+1) ← A^{-1}(µW Î^(t) + Î^(0))
9 Label all x_i according to the largest element in the ith row of Î

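The iteration steps of Algorithm 2 can be re-written for labelled points

\[ \hat{I}_i^{(t+1)} \leftarrow \frac{\sum_{j=1}^{n} W_{ij} \hat{I}_j^{(t)} + \frac{1}{\mu} I_i}{\sum_{j=1}^{n} W_{ij} + \frac{1}{\mu} + \varepsilon}, \quad \forall i = 1 \ldots l \tag{2.7} \]

and for unlabelled points

\[ \hat{I}_i^{(t+1)} \leftarrow \frac{\sum_{j=1}^{n} W_{ij} \hat{I}_j^{(t)}}{\sum_{j=1}^{n} W_{ij} + \varepsilon}, \quad \forall i = l + 1 \ldots l + u \tag{2.8} \]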


where ε is a smoothing factor and µ > 0 is the Lagrangian multiplier described in Section 2.3.

The label propagation algorithm will iteratively assign labels according to (2.7)

and (2.8) until convergence is reached.

2.2.4 Extending Label Propagation to Regression

The indoor positioning task is a regression problem, as the labels Y ∈ R^2 are Cartesian coordinates on maps. Several modifications to Algorithm 2 are needed in order to make it a regression algorithm.

1. The indicator matrix Î is no longer used in regression problems. Instead, the actual label matrix Ŷ of size n × 2 replaces Î. The matrix Ŷ represents the 2D Cartesian coordinates on the map for all n points.

2. As a result of item 1, there is no need to translate from indicator vectors to labels since Ŷ, once converged, contains the actual estimated labels for the x_i instead of an indicator matrix.
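A minimal sketch of this regression variant is given below. It follows the update rule of Algorithm 2 with 2D coordinates in place of indicator rows; it is not the thesis implementation. It assumes labelled points come first in W, and the function name, the default values of `mu` and `eps`, the stopping parameters, and the three-point toy graph are all hypothetical.

```python
def propagate_coordinates(W, known_xy, mu=0.1, eps=1e-6, max_iter=5000, tol=1e-10):
    """Jacobi-style label propagation for 2D regression (sketch after Algorithm 2).

    W: symmetric n x n weight matrix as a list of lists, labelled points first.
    known_xy: (x, y) coordinates of the first l points.
    Returns a list of n estimated [x, y] coordinates.
    """
    n, l = len(W), len(known_xy)
    # Force W_ii = 0 so that a point does not contribute to its own estimate.
    W = [[0.0 if i == j else W[i][j] for j in range(n)] for i in range(n)]
    D = [sum(row) for row in W]
    # A_ii = 1[i is labelled] + mu * D_ii + mu * eps
    A = [(1.0 if i < l else 0.0) + mu * D[i] + mu * eps for i in range(n)]
    # Initial labels: known coordinates for labelled points, zeros otherwise.
    Y0 = [[float(x), float(y)] for x, y in known_xy] + [[0.0, 0.0] for _ in range(n - l)]
    Y = [row[:] for row in Y0]
    for _ in range(max_iter):
        new_Y = [[(mu * sum(W[i][j] * Y[j][d] for j in range(n)) + Y0[i][d]) / A[i]
                  for d in range(2)] for i in range(n)]
        change = max(abs(new_Y[i][d] - Y[i][d]) for i in range(n) for d in range(2))
        Y = new_Y
        if change < tol:
            break
    return Y


# Toy graph: two labelled points 10 m apart and one unlabelled point that is
# equally similar to both of them.
W_line = [
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]
Y_hat = propagate_coordinates(W_line, [(0.0, 0.0), (10.0, 0.0)])
```

In this example the unlabelled point is placed essentially midway between the two labelled points, and the estimates for the labelled points move slightly away from their true coordinates, as the modified algorithm permits.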

2.3 Closed-form Solution of the Iterative Label Propagation

Label propagation was first proposed as an iterative algorithm. This section formulates it as a convex optimization problem with a closed-form solution. The idea is to define a graph using labelled and unlabelled data points, and to apply a Gaussian Random Field (GRF) on the graph, from which a target function is defined. The following discussion of the graph-based convex optimization framework follows [41] and [10].


Suppose there are l labelled data points (X_1, Y_1), . . . , (X_l, Y_l), and u unlabelled points (X_{l+1}, −), . . . , (X_{l+u}, −), where X ∈ R^m are the observed RSS vectors at different locations on a map and Y ∈ R^2 are the corresponding 2D coordinates of those locations. The unknown label values Y_{l+1} . . . Y_{l+u} are to be inferred. They are usually initialized to some arbitrary values or 0. For the indoor positioning task, and many other machine learning tasks, l ≪ u generally holds because labelled points take more time and/or effort to collect. Let n = l + u be the total number of data points.

The convex formulation begins by constructing an undirected connected graph G = (V, E) with vertices V representing all the data points (i.e. the Y's), and graph edge weights (W_ij, calculated based on the RSS reading vectors X) representing the pairwise similarities between data points. Also, let L = (1, . . . , l) represent the labelled points with labels Y_1, . . . , Y_l, and let U = (l + 1, . . . , l + u) be the set of unlabelled points. Since this chapter does not focus on weight generation, we assume for the time being that an n × n edge weight matrix W is given. With a graph G = (V, E) defined, the task becomes assigning labels to U based on a real-valued function f : R^m → R^2.

Ideally, f takes values f(X_i) = f_l(X_i) ≡ Y_i for the labelled data. Intuitively, for all points, if two points X_i and X_j are far away from each other, the pairwise similarity W_ij should be small, and the distance between the estimated labels f(X_i) = Ŷ_i and f(X_j) = Ŷ_j may be large. On the other hand, if X_i and X_j are close to each other, the similarity W_ij should be large, and the distance between their estimated labels should be small. This motivates the choice of the weighted quadratic energy function to measure the error of f:


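\[
\begin{aligned}
\mathrm{Error}(f) &= \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} W_{ij} \| \hat{Y}_i - \hat{Y}_j \|_2^2 \\
&= \frac{1}{2} \left( 2 \sum_{i=1}^{n} \| \hat{Y}_i \|_2^2 \sum_{j=1}^{n} W_{ij} - 2 \sum_{i=1}^{n} \sum_{j=1}^{n} W_{ij} \hat{Y}_i^\top \hat{Y}_j \right) \\
&= \sum_{i=1}^{n} \| \hat{Y}_i \|_2^2 D_{ii} - \sum_{i=1}^{n} \sum_{j=1}^{n} W_{ij} \hat{Y}_i^\top \hat{Y}_j \\
&= \hat{Y}^\top D \hat{Y} - \hat{Y}^\top W \hat{Y} \\
&= \hat{Y}^\top \Delta \hat{Y}
\end{aligned}
\tag{2.9}
\]

where D is a diagonal degree matrix with D_ii = Σ_{j=1}^{n} W_ij, and ∆ = D − W is the combinatorial graph Laplacian matrix.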

Applying the GRF means defining a joint conditional Gaussian random field on the labels Ŷ_i, ∀i = 1 . . . n, conditioned on the input X and on the constraint that, for labelled data, the predicted labels are equal to the true labels. It has been shown that a Gaussian random process restricted to a finite number of data points X_1 . . . X_n is simply a multivariate Gaussian distribution [28]; therefore, the Gaussian random field assumption on the labels gives:

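\[
\begin{aligned}
p(\hat{Y}) &\propto \exp\left( -\frac{\beta}{4} \sum_{i,j=1}^{n} W_{ij} \| \hat{Y}_i - \hat{Y}_j \|_2^2 \right) \\
&= \exp\left( -\frac{\beta}{2} \hat{Y}^\top \Delta \hat{Y} \right)
\end{aligned}
\tag{2.10}
\]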

The optimal mapping function f should maximize the conditional likelihood p(Ŷ) by minimizing the error function, under the constraint that the estimated labels for the labelled data are equal to the true labels:

\[
\begin{aligned}
f^* = \arg\min_{f|_L} \; & \hat{Y}^\top \Delta \hat{Y} \\
\text{s.t.} \; & \hat{Y}_i = Y_i, \quad \forall i = 1, \ldots, l
\end{aligned}
\tag{2.11}
\]

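Let S be a diagonal indicator matrix with S_ii = 1 if i is a labelled point and S_ii = 0 if i is an unlabelled point. The Lagrangian function for optimization problem (2.11) is:

\[
\begin{aligned}
L(\mu) &= \| \hat{Y}_l - Y_l \|_2^2 + \mu \hat{Y}^\top \Delta \hat{Y} \\
&= \| S\hat{Y} - SY \|_2^2 + \mu \hat{Y}^\top \Delta \hat{Y}
\end{aligned}
\tag{2.12}
\]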

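Solving the Lagrangian gives the closed-form solution to the optimal mapping function f. Setting the gradient of (2.12) with respect to Ŷ to zero gives S(Ŷ − Y) + µ∆Ŷ = 0 (using S^T S = S), and hence:

\[ f^* = \hat{Y} = (S + \mu\Delta)^{-1} S Y \tag{2.13} \]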

The label propagation algorithm is therefore also a linear smoother, since the estimated value Ŷ is a linear combination of the observations SY. It takes the general form of linear smoothers shown in the following equation, where the linear transformation matrix is denoted by L = (S + µ∆)^{-1} S.

\[ \hat{Y} = (S + \mu\Delta)^{-1} S Y = L Y \tag{2.14} \]
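The closed-form solution can be evaluated directly by solving the linear system (S + µ∆)Ŷ = SY. The sketch below does this with a small Gauss-Jordan solver so that it depends only on the Python standard library; a numerical linear algebra package would normally be used instead. The function names, the default `mu`, and the toy graph are hypothetical, labelled points are assumed to come first, and the system is singular if some connected component of the graph contains no labelled point.

```python
def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    # Augmented matrix [A | B] with float entries.
    M = [[float(v) for v in A[i]] + [float(v) for v in B[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [[M[i][n + k] / M[i][i] for k in range(len(B[0]))] for i in range(n)]


def closed_form_label_propagation(W, known_xy, mu=0.1):
    """Evaluate Y_hat = (S + mu * Delta)^-1 S Y with Delta = D - W."""
    n, l = len(W), len(known_xy)
    D = [sum(row) for row in W]
    # Left-hand side S + mu * (D - W); S has ones on the first l diagonal entries.
    lhs = [[(1.0 if i == j and i < l else 0.0)
            + mu * ((D[i] if i == j else 0.0) - W[i][j])
            for j in range(n)] for i in range(n)]
    # Right-hand side S Y: known coordinates for labelled rows, zeros otherwise.
    rhs = [[float(known_xy[i][0]), float(known_xy[i][1])] if i < l else [0.0, 0.0]
           for i in range(n)]
    return solve(lhs, rhs)


# Two labelled points 10 m apart and one unlabelled point equally similar to both.
W_line = [
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]
Y_closed = closed_form_label_propagation(W_line, [(0.0, 0.0), (10.0, 0.0)], mu=0.1)
```

For this toy graph the unlabelled point is placed at x = 5, which is also the fixed point of the iterative updates (2.7) and (2.8) when the smoothing term ε is negligible.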


2.4 Label Propagation Algorithm as a Kernel Estimator

This section examines the label propagation algorithm from another perspec-

tive and discusses its disadvantages. Kernel estimators are also known as the

Nadaraya-Watson estimator, the kernel regression estimator, or the local

constant estimator. Simply put, kernel estimators treat kernel function values as

pairwise similarities or weights, and use these values to make predictions. This

section first derives the general form of kernel estimators following [18]. Then, we

will show that the label propagation algorithm is a kernel estimator with a joint

distribution assumed to be Gaussian.

To derive the general form of kernel estimators, let the data set be (x_i, y_i) ∀ i = 1 . . . n, and assume that X and Y have an unknown joint probability density function f(Y, X), which can be estimated using Kernel Density Estimation (KDE) [33] [31]. Let f(y | x) = f(y, x)/f(x) be the conditional density of Y given X. Also assume that, in the following derivation, x ∈ R^d and y ∈ R^1. Then, to estimate the value of y given a particular x, we can use the conditional expectation, which is the minimum mean-square-error estimate:

\[ \hat{y} = g(x) = E(y \mid X = x) \tag{2.15} \]

Kernel estimators try to estimate g(x) non-parametrically with minimal assumptions about the function g, which can be written in an alternative form:

\[ g(x) = \frac{\int y f(y, x)\, dy}{f(x)} \tag{2.16} \]


Furthermore, assume that f(y, x) can be estimated using the KDE method (meaning that it can be estimated as a weighted sum of Gaussian kernels and can be normalized to a valid pdf) and can be expressed as:

\[ \hat{f}(y, x) = \frac{1}{n |H| h_y} \sum_{i=1}^{n} K\left(H^{-1}(x_i - x)\right) k\left(\frac{y_i - y}{h_y}\right) \tag{2.17} \]

where hy is the kernel bandwidth in the y direction of f(y, x), H is the bandwidth

matrix in the x direction, K is the kernel function serving as the estimate of the

marginal distribution of x, and k is the estimate of the marginal distribution of y.

Then, with f(y, x) defined, f(x) is:

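\[
\begin{aligned}
\hat{f}(x) &= \int \hat{f}(y, x)\, dy \\
&= \int \frac{1}{n |H| h_y} \sum_{i=1}^{n} K\left(H^{-1}(x_i - x)\right) k\left(\frac{y_i - y}{h_y}\right) dy \\
&= \frac{1}{n |H|} \sum_{i=1}^{n} K\left(H^{-1}(x_i - x)\right)
\end{aligned}
\tag{2.18}
\]

and ∫ y f(y, x) dy is

\[
\begin{aligned}
\int y \hat{f}(y, x)\, dy &= \frac{1}{n |H| h_y} \sum_{i=1}^{n} K\left(H^{-1}(x_i - x)\right) \int y\, k\left(\frac{y_i - y}{h_y}\right) dy \\
&= \frac{1}{n |H|} \sum_{i=1}^{n} K\left(H^{-1}(x_i - x)\right) y_i
\end{aligned}
\tag{2.19}
\]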

Taking the ratio of (2.19) to (2.18) gives

\[ \hat{g}(x) = \frac{\sum_{i=1}^{n} K\left(H^{-1}(x_i - x)\right) y_i}{\sum_{i=1}^{n} K\left(H^{-1}(x_i - x)\right)} \tag{2.20} \]


Note that the iterations (2.7) and (2.8) used in label propagation algorithm take

the same form as (2.20) with the kernel K being either the Gaussian kernel or the

Laplacian kernel. Thus, the label propagation algorithm is a kernel estimator and

shares the same disadvantages that kernel estimators possess.
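The kernel estimator (2.20) can be sketched as follows for scalar outputs. The sketch assumes a Gaussian kernel and an isotropic bandwidth matrix H = hI, which are simplifying choices made here for illustration; the function name and the sample data are hypothetical.

```python
import math


def nadaraya_watson(x, xs, ys, h=1.0):
    """Kernel regression estimate of g(x): a kernel-weighted average of the y_i.

    x: query point (list of floats); xs: training inputs; ys: scalar outputs.
    Uses a Gaussian kernel with isotropic bandwidth matrix H = h * I.
    """
    weights = [
        math.exp(-0.5 * sum(((xi_k - x_k) / h) ** 2 for xi_k, x_k in zip(xi, x)))
        for xi in xs
    ]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase the bandwidth h")
    return sum(w * y for w, y in zip(weights, ys)) / total


# Hypothetical one-dimensional data: two observations with outputs 1 and 3.
xs = [[0.0], [2.0]]
ys = [1.0, 3.0]
midpoint_estimate = nadaraya_watson([1.0], xs, ys, h=1.0)
```

Since the estimate is always a weighted average of observed outputs, it cannot leave the range of the y_i, and a query that is much closer to one observation than to the others simply returns that observation's value.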

2.4.1 Limitations of Kernel Estimators

One of the major undesired properties is that label propagation produces local

constant outputs [18] [10]. This is a property that is ideal for classification tasks

whose outputs y are discrete values, but not for regression tasks whose outputs are

continuous values. This local constant property leads to local constant predictions:

kernel estimators tend to assign the same value to the data points that are close

to each other and form the so-called flat neighbourhood. For indoor positioning

tasks, this means that all the coordinates, predicted and known, tend to form

clusters on the map, which appear as artifacts in the outputs. Figure 2.2

illustrates this problem. The circles in Figure 2.2 indicate the location of the

predicted Cartesian coordinates for different data points. These data points are

collected almost uniformly along the corridors; their predicted locations, however,

form clusters.

The source of this problem lies in the marginal distribution of the input X. As mentioned previously, f(x) is estimated using the KDE method. If the x_i in the data set are not sampled uniformly, the predicted ĝ(x) ≠ g(x), in general. Thus, one of the requirements for data acquisition is to sample input RSS measurements as uniformly as possible.

Since kernel estimators are locally constant, one possible direction for future work is to modify the label propagation algorithm, or to seek non-local constant (or local linear) algorithms. One such algorithm is the local-linear semi-supervised regressor (LLSR) proposed by M. R. Rwebangira et al. in [34]. LLSR combines the supervised local linear regressor with the local constant semi-supervised label propagation. The discussion of LLSR is out of the scope of this thesis.

Figure 2.2: Label propagation produces local constant outputs

2.5 Properties of Wi-Fi RSS

To help improve the design of positioning algorithms and lower their positioning

errors, it is worthwhile to study the underlying properties of Wi-Fi RSS readings

and make use of such properties in position estimation calculations. This section

discusses some of the statistical properties of the RSS reported by IEEE 802.11

wireless NICs.


2.5.1 Distribution of RSS

In [35], Small et al. conducted experiments to analyse the distribution of the RSS readings from an AP at a location. The results in [35] are based on a collection of stationary RSS measurements sampled over periods of 5 hours, 20 hours, and 1 month at a sampling period of 5 seconds in their development facility. A sample histogram (Figure 2.3) from [35] shows that the readings are consistent, with a standard deviation of 2.13 dB. Small et al. also observed that changes in environmental conditions, such as opening office doors, can create a change of up to 10 dB.

Figure 2.3: A sample RSS distribution of a single AP at a single location. Courtesy of Small et al. [35]

RSS distributions are studied in [35], [24], [20], and [22]. All concluded that RSS measurements follow a log-normal distribution that is slightly left-skewed. The standard deviation of RSS measurements ranges from 1 dB to 5 dB depending on the average of the RSS readings and the environmental conditions [20]. This means that RSS readings can be approximately modelled as a normal distribution (on the dBm scale) within a certain degree of error. In the context of indoor positioning, knowing the distribution of RSS readings helps in several respects, such as determining parameters like the similarity threshold (Algorithm 3) and constructing radio maps (Section 3.2.2).

2.5.2 Stationarity of the RSS Measurements

Another important question to answer is whether or not the RSS readings can be

used as reliable position indicators over a long period of time. This subsection

aims to answer this question by analysing the stationarity of RSS.

The analysis in [22] starts by assuming that the ergodic theorem applies, according to the Wiener definition of stationarity [15]. Since RSS is random owing to wireless signal propagation in complex indoor environments, it can be treated as a random process. In stochastic process theory, a random process x(t) is said to be wide-sense stationary if its first moment and covariance do not vary with respect to time or space. As a consequence, parameters such as the mean and variance stay the same over time, and the auto-covariance function depends only on the time interval rather than on time itself. Formally, the following two criteria must be met for a random process to be wide-sense stationary:

1. the first moment is constant: E[x(t)] = m_x(t) = m_x(t + τ)

2. the auto-covariance function depends only on the time difference: E[(x(t_1) − m_x(t_1))(x(t_2) − m_x(t_2))] = C_x(t_1, t_2) = C_x(t_1 + (−t_2), t_2 + (−t_2)) = C_x(t_1 − t_2, 0)

Figure 2.4, adapted from [22], indicates that the mean and standard deviation

meet criterion 1; the correlograms in the same figure below depict the same shapes,

indicating that criterion 2 is also met.
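One informal way to examine the two criteria on a recorded RSS trace is to split it into consecutive windows and compare the per-window mean, variance, and lagged auto-covariance. The sketch below illustrates the idea only and is not the analysis procedure of [22]; the function names, the window length, and the synthetic trace are hypothetical, and comparing windows by eye is not a formal statistical test.

```python
def mean(values):
    """Sample mean."""
    return sum(values) / len(values)


def autocovariance(values, lag):
    """Biased sample auto-covariance at the given lag (lag 0 is the variance)."""
    m = mean(values)
    n = len(values)
    return sum((values[t] - m) * (values[t + lag] - m) for t in range(n - lag)) / n


def window_statistics(series, window, lag=1):
    """(mean, variance, lag auto-covariance) for consecutive non-overlapping windows."""
    stats = []
    for start in range(0, len(series) - window + 1, window):
        chunk = series[start:start + window]
        stats.append((mean(chunk), autocovariance(chunk, 0), autocovariance(chunk, lag)))
    return stats


# Synthetic trace (dBm) that alternates between two values; a real trace would
# come from repeated scans of one access point at a fixed location.
trace = [-60.0, -62.0] * 50
per_window = window_statistics(trace, window=20, lag=1)
```

If the per-window means and auto-covariances stay close to one another over time, the trace is consistent with criteria 1 and 2; large drifts between windows suggest that the radio map may need to be refreshed.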


Figure 2.4: RSS stationarity results. Courtesy of K. Kaemarungsi et al. [22]

2.5.3 Other Considerations

Presence of Human Body

According to the studies in [22] and [20], the presence of a human body affects the RSS distribution by increasing its standard deviation significantly. Testing results in [22] reveal that the standard deviation increases from approximately 0.68 dB to 3.00 dB when the user is present. On the other hand, the average of the RSS distribution is not affected to any significant extent. Thus, when the positioning system is meant to serve real mobile users, it is important to have a human present while collecting off-line RSS readings and to take the effect of the human body into account.

Independence of Multiple Access Points


Intuitively, the RSS readings from different APs are independent since the APs are configured and controlled independently. This part analyses the dependency of multiple APs' RSS readings and tries to confirm the intuition stated above.

In [22], RSS measurements from three different APs were taken at a specific location. These RSS measurements were taken for approximately one hour with the user present. Analysis shows that the correlation values between each pair of RSS data are −0.02, 0.13, and −0.03, respectively. Therefore, they concluded that the RSS readings from the APs are uncorrelated.
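A pairwise correlation value of the kind reported above can be computed with the sample Pearson correlation coefficient. The sketch below assumes that this is the coefficient meant in [22]; the function name and the toy series are hypothetical and are not measured data.

```python
import math


def pearson(a, b):
    """Sample Pearson correlation coefficient between two equal-length series."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_a * var_b)


# Toy series: the second pair is constructed to have exactly zero correlation.
perfectly_correlated = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
uncorrelated = pearson([1.0, 2.0, 3.0, 4.0], [1.0, -1.0, -1.0, 1.0])
```

Values close to zero, such as those reported in [22], indicate that there is no linear relationship between the readings of two APs; zero correlation is a weaker property than statistical independence.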

2.6 Chapter Summary

This chapter gave brief overviews of different topics relevant to RSS-based indoor positioning systems. First, Section 2.1 introduced different positioning methods, with emphasis on the supervised CS-based method and on the new approach, the semi-supervised label propagation method. The chapter then explained the label propagation algorithm in detail by providing its convex formulation and discussing its pros and cons. Finally, some properties of the RSS distribution were presented to help in designing and implementing the indoor positioning algorithms.

Chapter 3

Indoor Positioning Procedures

In practice, a major technical difficulty in using RSS arises from the fact that RSS values depend, to a large extent, on the propagation environment. Although software capable of modelling wave propagation is available, it is very time-consuming to set up simulation configurations that describe the actual structure of indoor environments. Therefore, empirical fingerprinting is more favourable than channel modelling. The positioning system we have implemented uses empirical RSS fingerprinting-based methods. Using the collected RSS fingerprints, the proposed system discovers pairwise similarity information and uses it, along with the label propagation algorithm described in Chapter 2, to estimate mobile devices' locations. This chapter describes in detail how the proposed system operates.

3.1 RSS-based Positioning System Overview

Figure 1.2 gives a high-level overview of the RSS-based positioning system, which

consists of two phases: an off-line training phase and an on-line localization phase

where mobile users’ locations are estimated using on-line RSS observations. The


off-line phase itself consists of two stages. In the first stage, technicians

collect raw RSS data, both labelled and unlabelled, covering an area

of interest. In the second stage, raw RSS data is fed into a series of algorithms

to generate radio maps and other related data. The goal of the off-line phase

is to prepare the data to be used in the on-line localization phase. The on-line

phase is divided into a coarse localization stage and a fine localization stage. Coarse

localization narrows down the area of interest and reduces

the computational complexity in case the radio map is too large for mobile devices

to handle. The coarse localization stage outputs a cluster that contains the on-line

data plus a small portion of the training data. This cluster, small enough for

mobile devices to use, is then passed to the fine localization stage, which performs graph

construction and location estimation using the label propagation algorithm. The

following sections describe each of these stages in detail.

3.2 Off-line Training Phase

Two major tasks are carried out during the off-line training phase: collecting

raw RSS readings for labelled and unlabelled data, and processing these raw data

points so that location service users are able to use such data to estimate positions.

Training must cover the area in which the positioning system is deployed and must

be re-done whenever the WLAN infrastructure changes significantly in that area.

In other words, sufficient up-to-date RSS samples have to be taken over the areas

of interest. The time required to cover each floor of a building depends on:

1. the method of data collection

2. the scanning rate of mobile devices

3. the area of the environment


Of these factors, only the first can be improved, namely by choosing a more efficient data collection

method. Our proposed methods are described next.

3.2.1 Raw RSS Fingerprint Collection

Traditional Labelled Point Collection: Manual Data Labelling

Any type of RSS fingerprint-based indoor positioning technique requires a techni-

cian to walk around the building and collect RSS readings r (r stands for RSS)

from a total of m different APs at l known locations using a WLAN-ready mobile

device. Unlike traditional supervised machine learning approaches such as [13],

the Cartesian coordinates of labelled points no longer need to form a well-shaped

grid of fine granularity since the semi-supervised label propagation works well if

1. the labels of the labelled points define the range of an area of interest (labels

should reflect how large the area of interest is by covering the corners, edges and

interiors of that area)

2. RSS fingerprints are taken over the entire area of interest in the ideal scenario

(see Section 2.4.1).

As a result of item 1, the locations of labelled points are pre-defined for each map,

and their locations are picked in such a way that they are located at both ends

and at the intersections of corridors. With technicians standing at each labelled

point location, RSS readings are taken with the device pointing in any direction.

In this thesis, the raw RSS sample collected from AP i at location j and time τ

is denoted by rij(τ), τ = 1 . . . q, q ≥ 1, where q is the total number of samples taken

for each point and may vary. The raw RSS data r at location j, ∀j = 1 . . . l, is

therefore


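\[
\tilde{r}_j =
\begin{bmatrix}
r_{1j}(1) & r_{1j}(2) & \cdots & r_{1j}(q) \\
r_{2j}(1) & r_{2j}(2) & \cdots & r_{2j}(q) \\
\vdots & \vdots & \ddots & \vdots \\
r_{mj}(1) & r_{mj}(2) & \cdots & r_{mj}(q)
\end{bmatrix},
\]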

where a tilde is used to emphasize that this is the raw data at location j. In

practice, not all the entries in rj are valid RSS readings, for two reasons:

1. Beacon signals are broadcast periodically according to the AP settings, so an AP may go undetected if none of its beacons arrives while the mobile phone is sensing RSS.

2. The AP beacon signal is too weak to be detected by the mobile device.

In case AP i is undetected at location j at time τ, rij(τ) is assigned a

nominal value of −110 dBm, which is a weak power close to the noise level.
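The construction of the raw matrix at one location can be sketched as follows. This is a minimal illustration, assuming each Wi-Fi scan is available as a mapping from MAC address to RSS in dBm; the function name, the MAC strings and the sample values are hypothetical and not taken from the implemented system.

```python
from typing import Dict, List, Sequence

NOT_DETECTED_DBM = -110.0  # nominal value for an undetected AP, as stated in the text


def build_raw_rss_matrix(
    scans: Sequence[Dict[str, float]], mac_list: Sequence[str]
) -> List[List[float]]:
    """Return an m x q matrix: row i is AP mac_list[i], column tau is scan tau."""
    return [
        [scan.get(mac, NOT_DETECTED_DBM) for scan in scans]
        for mac in mac_list
    ]


# Three scans (q = 3) at one location; the MAC strings are made up.
scans = [
    {"aa:01": -48.0, "aa:02": -71.0},
    {"aa:01": -50.0},                   # aa:02 was not heard during this scan
    {"aa:01": -49.0, "aa:02": -73.0},
]
raw = build_raw_rss_matrix(scans, ["aa:01", "aa:02", "aa:03"])
```

Here the third AP never appears in any scan, so its whole row holds the nominal value.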

Fast Data Collection: Labelling Data Points using Time-stamps

The traditional method for collecting labelled points has always been one of the major

drawbacks of RSS fingerprinting-based positioning systems because it requires

technicians to stand still at each location for a relatively long period of time when

sampling RSS. To speed up this process, one possible approach is to collect RSS

readings while walking across corridors and automatically estimate RSS labels on

the fly. This is made possible with the help of timestamps.

To collect a time series of RSS readings, technicians first specify on the map their

starting point and end point. Then they walk at a constant speed along the line

they specified starting at time τ0 (timestamps are usually accurate to milliseconds

on Android phones). While they are walking, the mobile device is in iterative

scanning mode, collecting RSS reading vectors one after another, and forming a

time series denoted by r(τi)∀i = 1 . . . k, with k being the number of RSS samples

collected along the path. Unlike manually labelled data points, each data point in


RSS time series has only one RSS sample. When the technicians reach the

end point, they press the stop button to stop the iterative RSS sampling

and record the last timestamp τk+1. Given k + 2 timestamps, k RSS vectors, and

the Cartesian coordinates of the starting and end point, it is possible to interpolate

coordinates for all k data points along the path even when the RSS samples are

not uniformly spaced in time. Figure 3.1 depicts the user interface for time series

collection (left) and the interpolated coordinates of RSS readings (right). In this

test, we collected 347 labelled data points in approximately 12 minutes.

Figure 3.1: Fast data labelling with timestamps. The test was performed on the entire 4th floor of the Bahen Centre at the University of Toronto. The left half shows a path on the map. The right half shows labelled points collected along different paths. Each point on the right corresponds to a labelled point.
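The timestamp-based labelling described above amounts to linear interpolation between the start and end coordinates. The sketch below assumes constant walking speed, as the text requires; the function name and the example numbers are illustrative only.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def interpolate_labels(
    start: Point,
    end: Point,
    t_start: float,
    t_stop: float,
    sample_times: Sequence[float],
) -> List[Point]:
    """Assign a coordinate to each RSS sample from its timestamp.

    t_start and t_stop play the roles of tau_0 and tau_{k+1}; sample_times
    holds tau_1 ... tau_k and need not be uniformly spaced.
    """
    if t_stop <= t_start:
        raise ValueError("stop time must be later than start time")
    labels = []
    for t in sample_times:
        frac = (t - t_start) / (t_stop - t_start)  # fraction of the path walked
        labels.append((
            start[0] + frac * (end[0] - start[0]),
            start[1] + frac * (end[1] - start[1]),
        ))
    return labels


# A 10 m corridor walked in 10 s, with three unevenly spaced samples.
labels = interpolate_labels((0.0, 0.0), (10.0, 0.0), 0.0, 10.0, [2.5, 5.0, 9.0])
```

Any deviation from constant speed translates directly into label error, which is one reason the technician is asked to walk steadily.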

Collecting Unlabelled Points

For graph-based semi-supervised algorithms, unlabelled points provide additional information

for graph construction and facilitate the label propagation process. In the

context of RSS-based positioning systems, unlabelled points are simply RSS vectors

r with fewer samples, taken at unknown locations.


To collect unlabelled points, technicians usually take RSS samples at u unknown

locations scattered over the entire floor uniformly. Uniform sampling makes sure

that requirement 2 in the previous subsection is satisfied.

At the end of the data collection phase, the raw RSS data set should contain rj

∀j = 1 . . . n, where n = l + u and l ≪ u in practice.

3.2.2 Processing Raw RSS Readings

Populating MAC Address for All Training Points

Before constructing the off-line radio map Ψ, a list of Media Access Control Ad-

dresses (MAC addresses) is first populated and stored on the server’s permanent

storage. In this thesis, MAC lists are denoted by M with |M | = m. M will later

be used to construct the Ψ matrix whose rows are aligned according to M . The

list of MAC addresses contains all visible APs within a region, and can also be

used in the on-line phase to perform cluster generation (Algorithm 3).

Generating Off-line Radio Map Ψ

Before the positioning system can estimate locations, a location fingerprint database,

or a radio map needs to be constructed using the raw RSS readings. The radio

map could contain the average values of all RSS readings (RADAR system [9]),

or it could be probabilistic as in [37]. This thesis employs an average-based radio

map. For indoor environments, studies show that the instantaneous RSS readings

rij can be approximated as a Gaussian random variable N (rij, σij) ∀i, j. For any

Gaussian random variable, its mean and variance pair will suffice to capture its

characteristics. Thus, to characterise the behaviour and describe the spatial distribution

of all APs, the radio map of an area, built out of the raw RSS

readings of all n points, should contain tuples ψij = [xj, yj, rij, σij] ∀j = 1 . . . l and

∀i = 1...m. Unlabelled training points will have their Cartesian coordinates set to


the origin, meaning (xj, yj) = (0, 0) ∀j = l + 1 . . . n. In the end, each region will

have a radio map Ψ defined as

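\[
\Psi =
\begin{bmatrix}
\psi_{1,1} & \psi_{1,2} & \cdots & \psi_{1,n} \\
\psi_{2,1} & \psi_{2,2} & \cdots & \psi_{2,n} \\
\vdots & \vdots & \ddots & \vdots \\
\psi_{m,1} & \psi_{m,2} & \cdots & \psi_{m,n}
\end{bmatrix},
\]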

where m is the number of APs visible in the entire area, n = l + u is the total

number of training points,

\[
r_{ij} = \frac{1}{q} \sum_{\tau=1}^{q} r_{ij}(\tau)
\]

is the average RSS reading, and

\[
\sigma_{ij} = \frac{1}{q-1} \sum_{\tau=1}^{q} \left( r_{ij}(\tau) - \mu_{ij} \right)^{2}
\]

is the unbiased variance of the RSS readings from AP i at location j, with µij denoting the sample mean rij.

In case an AP is undetectable at a certain location, a nominal low

reading of −110 dBm is assigned to it as its average RSS reading.
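The per-AP statistics stored in the radio map can be computed with the Python standard library. This is a sketch under one added assumption, labelled in the code: when only one sample is available (q = 1) the unbiased variance is undefined, and the sketch returns 0.0 in that case. The function name is hypothetical.

```python
from statistics import mean, variance
from typing import List, Sequence, Tuple


def rss_statistics(raw: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """For each AP row of the raw m x q matrix, return (mean, unbiased variance)."""
    stats = []
    for samples in raw:
        avg = mean(samples)
        # statistics.variance divides by q - 1 and needs at least two samples.
        # Assumption: report 0.0 when q == 1 instead of raising an error.
        var = variance(samples) if len(samples) > 1 else 0.0
        stats.append((avg, var))
    return stats


raw_rj = [
    [-50.0, -52.0, -54.0],      # a detected AP with some fluctuation
    [-110.0, -110.0, -110.0],   # an undetected AP, filled with the nominal value
]
stats = rss_statistics(raw_rj)
```

For the first row the mean is −52 dBm and the unbiased variance is (4 + 0 + 4)/2 = 4.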

Generating RSS Distance Matrix N

As shown in Chapter 2, calculating the pair-wise distance between all data points

is an intermediate step towards graph construction. To reduce the amount of

calculations during the on-line phase, the n×n distance matrix for all the training

points is calculated on the server side in the off-line phase. With n being the

number of training points, distance matrix N is defined as:

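\[
N =
\begin{bmatrix}
N_{1,1} & N_{1,2} & \cdots & N_{1,n} \\
N_{2,1} & N_{2,2} & \cdots & N_{2,n} \\
\vdots & \vdots & \ddots & \vdots \\
N_{n,1} & N_{n,2} & \cdots & N_{n,n}
\end{bmatrix},
\]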

where Nij is the distance between points i and j in the RSS domain, defined as

\[
N_{ij} = \frac{1}{m} \left\lVert Q (r_i - r_j) \right\rVert_{2} .
\]

The Q matrix is an m × m diagonal access point selection matrix described in full

detail in Section 3.5: Qii = 1 if the ith AP is used in RSS distance calculations,

and Qii = 0 if the ith AP is excluded from the calculation.
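A minimal sketch of the distance matrix computation follows. It assumes the norm in the definition is the plain Euclidean norm (not its square) and represents the diagonal of Q as a list of 0/1 flags; both the representation and the function names are illustrative choices.

```python
import math
from typing import List, Sequence


def rss_distance(a: Sequence[float], b: Sequence[float],
                 selected: Sequence[int]) -> float:
    """(1/m) * ||Q (a - b)||_2, with `selected` holding the diagonal of Q."""
    m = len(selected)
    squared = sum(s * (x - y) ** 2 for x, y, s in zip(a, b, selected))
    return math.sqrt(squared) / m


def distance_matrix(points: Sequence[Sequence[float]],
                    selected: Sequence[int]) -> List[List[float]]:
    """Symmetric n x n matrix of pair-wise RSS distances."""
    n = len(points)
    N = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            N[i][j] = N[j][i] = rss_distance(points[i], points[j], selected)
    return N


# Three training points seen through m = 3 APs; the third AP is excluded by Q.
points = [
    [-50.0, -60.0, -110.0],
    [-53.0, -64.0, -110.0],
    [-50.0, -60.0, -70.0],
]
N = distance_matrix(points, [1, 1, 0])
```

Points 0 and 2 differ only in the excluded AP, so their distance is zero, which shows how strongly the choice of Q shapes the graph.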


At the end of off-line training phase, the location server stores the processed radio

maps Ψ and related data structures, such as MAC address list M . Appendix A

depicts the structure of a typical data file.

3.2.3 Client-Server Communications during the Off-line

Phase

Offloading data processing tasks to the server helps to avoid practical issues raised

by the CPU, memory and battery limitations of smart phones. For example, the

algorithms for radio map construction involve allocating memory for hundreds

of Java objects and may consume more memory than the Java virtual machine

can allocate to mobile phone applications. This usually results in Out-of-Memory

errors on smart phones. To avoid such problems, we propose to utilize dedicated

servers for off-line training.

Figure 3.2 depicts how the proposed positioning server and the radio map collecting

mobile device operate during the off-line training phase. First, the mobile device

iteratively collects q RSS time samples from detectable APs at a specific location j.

After having q samples, the mobile device simply appends all raw RSS readings to

a CSV data file without any further processing. Once raw RSS readings, together

with their MAC addresses and Cartesian coordinates, are transferred to the server

via the internet, the server constructs a MAC list M , a radio map Ψ, and a distance

matrix N in the same manner as described in Subsections 3.2.1 and 3.2.2.

Figure 3.2: Off-line Client-Server Communication Diagram

3.3 On-line Localization Phase

The on-line localization phase is initiated by a mobile user, carrying a mobile

phone, standing at an unknown location with the device pointing in an arbitrary

direction. Once the localization software is started, several on-line RSS samples are

collected from detectable APs near the user. The raw on-line readings, denoted by

o(τ) (’o’ for on-line), are similar to unlabelled points and contain one or more RSS

time samples. Taking the average of the time samples gives the RSS fingerprint

at the user’s current location:

\[
o = [o_1, o_2, \ldots, o_m]^{\top} \tag{3.1}
\]

On-line RSS o, together with the radio map prepared for that area during the

off-line phase, will then be used to perform location estimates either locally on the

mobile device or remotely on the server side. Let rcomplete denote the combined

radio map containing both the off-line training data r and the on-line data o; it is

formed by concatenating r and o:

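\[
r_{\text{complete}} =
\begin{bmatrix}
r_{1,1} & r_{1,2} & \cdots & r_{1,n} & o_1 \\
r_{2,1} & r_{2,2} & \cdots & r_{2,n} & o_2 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
r_{m,1} & r_{m,2} & \cdots & r_{m,n} & o_m
\end{bmatrix}
\]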

The combined radio map rcomplete first passes the coarse localization stage (shown

in Figure 1.2) that produces a smaller cluster of points denoted by C. Then, the

fine localization stage takes the cluster as input and gives a final location estimate.

The following subsections detail these stages.

Augmenting Distance Matrix N

Once an on-line RSS vector o is obtained, it is appended to make rcomplete; the

n × n RSS distance matrix N will also be augmented by one in each dimension

accordingly. The augmented distance matrix N is:

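\[
N =
\begin{bmatrix}
N_{1,1} & N_{1,2} & \cdots & N_{1,n} & N_{1,o} \\
N_{2,1} & N_{2,2} & \cdots & N_{2,n} & N_{2,o} \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
N_{n,1} & N_{n,2} & \cdots & N_{n,n} & N_{n,o} \\
N_{o,1} & N_{o,2} & \cdots & N_{o,n} & N_{o,o}
\end{bmatrix}
\]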


where Ni,o is the distance between the on-line data point and the ith off-line data point,

and is defined as:

\[
N_{i,o} = N_{o,i} = \frac{1}{m} \left\lVert Q (r_i - o) \right\rVert_{2} .
\]
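Because the off-line part of N is precomputed, only one new row and column are needed per on-line reading. The sketch below illustrates this under the same assumptions as the distance definition (Euclidean norm, diagonal of Q stored as 0/1 flags); all names are hypothetical.

```python
import math
from typing import List, Sequence


def rss_distance(a: Sequence[float], b: Sequence[float],
                 selected: Sequence[int]) -> float:
    """(1/m) * ||Q (a - b)||_2, with `selected` holding the diagonal of Q."""
    squared = sum(s * (x - y) ** 2 for x, y, s in zip(a, b, selected))
    return math.sqrt(squared) / len(selected)


def augment_distance_matrix(
    N: Sequence[Sequence[float]],
    points: Sequence[Sequence[float]],
    online: Sequence[float],
    selected: Sequence[int],
) -> List[List[float]]:
    """Append the on-line point as the last row and column of N."""
    to_online = [rss_distance(p, online, selected) for p in points]
    augmented = [list(row) + [d] for row, d in zip(N, to_online)]
    augmented.append(to_online + [0.0])  # N_{o,o} = 0
    return augmented


points = [[-50.0, -60.0], [-53.0, -64.0]]
N_offline = [[0.0, 2.5], [2.5, 0.0]]      # precomputed during the off-line phase
N_aug = augment_distance_matrix(N_offline, points, [-50.0, -60.0], [1, 1])
```

The cost of this step is n distance evaluations instead of the n² needed to rebuild the whole matrix.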

3.3.1 Coarse Localization: Cluster Generation using AP

Visibilities

As mentioned previously, the goal of the coarse localization stage is to narrow

down the area of interest from a large map to a small region, thereby reducing

the computational complexity, as fewer data points are included in the actual

positioning calculations. The reduction in computational complexity is of great

practical interest because the amount of calculations without clustering (O(n∗m))

is still overwhelming even though modern smart phones are equipped with large

memory and powerful CPUs. Testing showed that, by reducing the number of

points from n to a cluster of size c < n, smart phones are able to update location

estimates in a matter of seconds. Test results also reveal that reducing the area

of interest reduces the maximum localization error since irrelevant data points

cannot interfere with graph construction once they are excluded from the cluster.

In our implemented system, coarse localization selects a single cluster of points from

the off-line data points in rcomplete such that the data points in this cluster are similar to the

on-line RSS vector o in terms of AP visibility. For each off-line RSS reading

vector in rcomplete, the number of access points shared with the on-line RSS vector

is computed and recorded in an array. This array of shared access point counts is then

sorted in descending order; and the indices of the top elements are the selected off-

line training points. The iterative algorithm for generating clusters is summarized

as Algorithm 3.

With the cluster indices I selected using Algorithm 3, the RSS vectors corresponding

to these indices, together with the on-line vector o, form a cluster radio map

denoted by rcluster. All future discussions will use rcluster as the effective radio

map.

Algorithm 3: Dynamic Cluster Generating Algorithm
Input: Off-line radio map r, on-line reading o, maximum cluster size cmax, initial RSS similarity threshold rssth, access point invisible flag −110.0
Output: A set of indices of cluster points I, with |I| ≤ cmax
    /* compute an array containing the number of shared access points between the on-line readings and off-line readings */
    Initialize the array of common access points (of length n) CommonAP to zero
    repeat
        for i = 1 to n do
            CommonAP[i] ← 0
            rssColumn ← the ith column of r
            for apIndex = 1 to m do
                if abs(rssColumn[apIndex] − o[apIndex]) ≤ rssth AND rssColumn[apIndex] > −110.0 AND o[apIndex] > −110.0 then
                    CommonAP[i] ← CommonAP[i] + 1
        if number of non-zero elements in CommonAP > cmax then
            rssth ← rssth/2
    until number of non-zero elements in CommonAP ≤ cmax
    I ← the indices of non-zero elements in CommonAP
    Sort I in ascending order
    return I
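Algorithm 3 can be transcribed into Python roughly as follows. This is a sketch rather than the thesis implementation: indices are 0-based, each training point is stored as its own RSS vector, and the `max_halvings` guard is an added assumption so that the loop always terminates.

```python
from typing import List, Sequence

INVISIBLE = -110.0  # access point invisible flag


def shared_ap_counts(points: Sequence[Sequence[float]],
                     online: Sequence[float], rss_th: float) -> List[int]:
    """Count, per training point, the APs visible in both vectors and within rss_th."""
    return [
        sum(
            1
            for r, o in zip(column, online)
            if r > INVISIBLE and o > INVISIBLE and abs(r - o) <= rss_th
        )
        for column in points
    ]


def generate_cluster(points: Sequence[Sequence[float]], online: Sequence[float],
                     c_max: int, rss_th: float, max_halvings: int = 32) -> List[int]:
    """Halve the threshold until at most c_max training points remain."""
    for _ in range(max_halvings):  # assumption: bounded number of halvings
        counts = shared_ap_counts(points, online, rss_th)
        indices = [i for i, c in enumerate(counts) if c > 0]  # already ascending
        if len(indices) <= c_max:
            return indices
        rss_th /= 2.0
    raise RuntimeError("threshold halving did not shrink the cluster to c_max")


points = [
    [-50.0, -60.0],
    [-52.0, -61.0],
    [-80.0, -110.0],
    [-55.0, -70.0],
]
cluster = generate_cluster(points, [-51.0, -60.0], c_max=2, rss_th=8.0)
```

In this example the threshold is halved twice (8 → 4 → 2 dB) before the fourth point drops out and the cluster fits within cmax = 2.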

Constructing A Distance Matrix for the Cluster

Since the scope of positioning problem is narrowed down to a small cluster, the

effective distance matrix will no longer be the augmented distance matrix N which

includes all the off-line and on-line data points. There is a need for a smaller c× c

distance matrix Ncluster where c = |I| is the size of the cluster.

Ncluster is generated by using the corresponding entries from N .
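Extracting the cluster distance matrix is a plain row and column selection. The sketch below assumes 0-based indices and that the index of the on-line point (the last row and column of the augmented matrix) has been appended to the cluster index list; the function name is illustrative.

```python
from typing import List, Sequence


def cluster_distance_matrix(
    N_aug: Sequence[Sequence[float]], cluster: Sequence[int]
) -> List[List[float]]:
    """Keep only the rows and columns of N_aug listed in `cluster`."""
    return [[N_aug[i][j] for j in cluster] for i in cluster]


N_aug = [
    [0.0, 1.0, 4.0, 2.0],
    [1.0, 0.0, 3.0, 1.5],
    [4.0, 3.0, 0.0, 5.0],
    [2.0, 1.5, 5.0, 0.0],   # last row and column: the on-line point
]
N_cluster = cluster_distance_matrix(N_aug, [0, 1, 3])
```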


3.3.2 Fine Localization

The fine localization stage takes the selected cluster rcluster, the cluster distance

matrix Ncluster, and indices of selected points I as inputs to the position estima-

tion algorithms to produce the final position. There are two possible estimation

algorithms: semi-supervised label propagation, and supervised kNN.

Positioning using Label Propagation Algorithm

As described in Chapter 2, label propagation is a semi-supervised machine learning

algorithm that can be formulated into a convex optimization problem with the

closed-form solution (2.13), which is re-stated here:

\[
f^{*} = \hat{Y} = (S + \mu\Delta)^{-1} S Y \tag{3.2}
\]

Label propagation is one of the graph-based learning methods and depends heavily

on the graph structure, which is captured by the graph Laplacian matrix ∆. As

shown in previous chapters, the combinatorial graph Laplacian is solely determined

by the weight matrix W (i.e. ∆ = D − W, with D being the degree matrix, Dii = ∑j Wij);

it is therefore crucial to find a weight matrix that represents the

underlying graph as accurately as possible. In this section, we assume that, given

rcluster and Ncluster, it is possible to find an accurate weight matrix W for the

graph. Starting from Section 3.4, we will focus on finding an accurate adjacency

matrix W.

Let Ŷ be the estimated labels for all points in the cluster, which can be found

from (3.2). The point selection matrix S is a diagonal matrix with 1’s indicating

labelled points and 0’s indicating unlabelled points.

The initial labelling Y is a c × 2 matrix containing Cartesian coordinates for all

points. The unlabelled points and on-line points have their coordinates initialized

to zeros. That is, Yi ∈ R2 holds the known coordinates (xi, yi) for every labelled point i ≤ l, and Yi = 0 ∈ R2 otherwise.

Since indices in I are sorted in ascending order (Algorithm 3) with the off-line

training points corresponding to the elements at the lower indices, and the on-line

reading o corresponding to the last element, the estimated position for the latest

on-line reading o is therefore the last row of Ŷ.
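The closed-form solution (3.2) can be evaluated by solving the linear system (S + µ∆)Ŷ = SY rather than forming the inverse explicitly. The following is a small pure-Python sketch with a textbook Gauss-Jordan solver; the function names, the toy graph and the value of µ are assumptions made for illustration only.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def solve(A: Sequence[Sequence[float]], B: Sequence[Sequence[float]]) -> Matrix:
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    aug = [list(A[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def label_propagation(W: Sequence[Sequence[float]], Y: Sequence[Sequence[float]],
                      labelled: Sequence[bool], mu: float) -> Matrix:
    """Return Y_hat = (S + mu * Laplacian)^(-1) S Y, with Laplacian = D - W."""
    n = len(W)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        degree = sum(W[i])
        for j in range(n):
            laplacian_ij = (degree if i == j else 0.0) - W[i][j]
            s_ij = 1.0 if (i == j and labelled[i]) else 0.0
            A[i][j] = s_ij + mu * laplacian_ij
    SY = [list(row) if flag else [0.0] * len(row) for row, flag in zip(Y, labelled)]
    return solve(A, SY)


# Two labelled points at (0, 0) and (10, 0); the on-line point (last row)
# is connected to both with equal weight.
W = [[0.0, 0.0, 1.0],
     [0.0, 0.0, 1.0],
     [1.0, 1.0, 0.0]]
Y = [[0.0, 0.0], [10.0, 0.0], [0.0, 0.0]]
Y_hat = label_propagation(W, Y, [True, True, False], mu=0.1)
online_estimate = Y_hat[-1]
```

With equal weights to both labelled neighbours, the last row of the result places the on-line point midway between them, at (5, 0).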

Positioning using kNN Algorithm

Fast data point collection and labelling, described previously, opens the door to

supervised machine learning algorithms such as kNN. As the number of labelled

points grows, it becomes feasible to use supervised methods, which have lower computational

complexity than semi-supervised methods. The proposed positioning

system is also capable of estimating labels using a simple kNN algorithm. With

a sufficient number of labelled points (such as the ones shown in Figure 3.1), the kNN

algorithm finds the k nearest neighbours of o in the RSS domain

and assigns the centroid of these k neighbours’ labels as the label of o. The implemented

routine is summarized in Algorithm 4.

Algorithm 4: kNN Position Estimation Algorithm
Input: Edge weight matrix W for the cluster, number of neighbours k
Output: Yo: the estimated position for the on-line reading
    Let similarityColumn be the last column of W
    Sort similarityColumn, both elements and their indices, in descending order (largest edge weight first)
    K ← the indices of the first k elements in similarityColumn
    Initialize Yo ← 0
    for i = 1 → k do
        Yo ← Yo + YK[i]
    Yo ← Yo/k
    return Yo

Unlike the simple kNN algorithm implemented in the RADAR system, the proposed

kNN algorithm operates within a cluster, which is the same as that used in the


label propagation algorithm shown in previous sections. Additionally, the pair-

wise similarity measure is not based directly on the Euclidean distance in the RSS

space; but, instead, it is based on the graph edge weight matrix for that cluster.

Thus, stages such as the cluster generation and edge weight calculation apply to

the proposed kNN algorithm as well.
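A compact sketch of the weight-based kNN estimate is given below. It makes two assumptions that go beyond Algorithm 4 and are labelled in the code: the on-line point itself is excluded from the candidates, and only labelled points are eligible as neighbours, since unlabelled points carry no coordinates. Names and numbers are illustrative.

```python
from typing import List, Sequence, Tuple


def knn_position(W: Sequence[Sequence[float]], Y: Sequence[Sequence[float]],
                 labelled: Sequence[bool], k: int) -> Tuple[float, float]:
    """Centroid of the k labelled points with the largest edge weight to the on-line point."""
    online = len(W) - 1                       # the on-line point is the last index
    similarity = [W[i][online] for i in range(len(W))]
    # Assumption: skip the on-line point itself and all unlabelled points.
    candidates = [i for i in range(online) if labelled[i]]
    if not candidates:
        raise ValueError("no labelled points in the cluster")
    nearest = sorted(candidates, key=lambda i: similarity[i], reverse=True)[:k]
    x = sum(Y[i][0] for i in nearest) / len(nearest)
    y = sum(Y[i][1] for i in nearest) / len(nearest)
    return (x, y)


W = [[1.0, 0.2, 0.3, 0.9],
     [0.2, 1.0, 0.1, 0.1],
     [0.3, 0.1, 1.0, 0.8],
     [0.9, 0.1, 0.8, 1.0]]
Y = [[0.0, 0.0], [20.0, 20.0], [2.0, 4.0], [0.0, 0.0]]
position = knn_position(W, Y, [True, True, True, False], k=2)
```

Points 0 and 2 have the two largest weights to the on-line point (0.9 and 0.8), so the estimate is their centroid.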

3.3.3 Client-Server Communications During the On-line

Phase

Localization servers are not only used in the off-line phase to offload data process-

ing and storage burdens from mobile devices, but also used in the on-line phase

to help with positioning calculations. Figure 3.3 illustrates how the proposed

positioning server and mobile device operate during the on-line localization phase.

The mobile device first takes q RSS time samples from detectable APs iteratively.

Once the q Wi-Fi scans are finished, the raw data (i.e. [MAC address, RSS] pairs) are

stored in a temporary binary file using the Java Serialization mechanism. This

temporary binary file is then uploaded to the server. Once the localization

request and the binary file are accepted by the server, the server extracts the raw RSS

data o from the binary file, creates a combined radio map Ψcomplete, augments the

distance matrix N and finds the Ncluster. An edge weight matrix W is calculated

using Ncluster, and then the location of the mobile device is found using the kNN

or label propagation algorithm.

Figure 3.3: On-line Client-Server Communication Diagram

3.4 Laplacian Matrix of a Graph

Label propagation is a graph-based semi-supervised learning method and depends heavily

on the graph structure, which is captured by the graph Laplacian matrix

∆ (or equivalently the adjacency matrix W). As shown in the previous chapters,

the combinatorial graph Laplacian is solely determined by the adjacency matrix W

(i.e. ∆ = D − W, with D being the degree matrix defined as Dii = ∑j Wij); it

is therefore crucial to find an adjacency matrix that represents the underlying

graph as accurately as possible. Graphs, in the context of indoor localization, consist

of vertices (the Cartesian coordinates of all data points) and edge weights (the

similarity between pairs of points) measured on a scale between 0 and 1. In (3.2),

it is assumed that an adjacency matrix W, which accurately reflects the similarity between

pairs of points, is readily available. This section focuses on finding the

adjacency matrix W using the input RSS vectors from all data points.

In graph theory, the Laplacian matrix ∆ (also known as the Kirchhoff matrix or the

admittance matrix) of a graph G is a measure of graph connectivity and is defined

as ∆ = D − W. One of the interesting properties of the graph Laplacian matrix is

that the second smallest eigenvalue, called the Fiedler value, of a Laplacian matrix

∆ for a graph G is the algebraic connectivity of G [32], which is greater than 0 if and

only if G is a connected graph. Also, the magnitude of this Fiedler value reflects

how well connected the overall graph is, and is often used to measure the stability

of networks [32][16]. According to the definition of Laplacian Matrix, calculating

∆ is equivalent to finding an adjacency matrix W since ∆ solely depends on W .

Our proposed system uses label propagation algorithm, which is a graph-based

semi-supervised learning method that depends on adjacency information. Hence,

constructing a well-connected graph is of great importance because better con-

nectivity means better label propagation. For instance, if an on-line data point

is weakly connected to the rest of the graph, the label propagation algorithm is not

able to make any valid location predictions because all the w terms in (2.6) will be

close to zero. On the other hand, if an on-line point is strongly connected to the

off-line data points, the predicted label for that on-line point will simply be the

centroid of all off-line data points. Both of these two cases have been observed

in our testing. In practice, due to the randomness of wireless signals in indoor

environment, the quality of adjacency matrix W often varies from time to time.

In order to achieve stability and improve the robustness of our positioning system,

a variety of algorithms have been used in the W matrix calculations.
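As a small worked illustration of the definitions in this section, consider a hypothetical three-vertex path graph with unit edge weights (this toy graph is an assumption for illustration and is not taken from the test sites):

```latex
\[
W = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix},
\qquad
D = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix},
\qquad
\Delta = D - W = \begin{bmatrix} 1 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 1 \end{bmatrix}.
\]
% Eigenvalues of \Delta: 0, 1, 3. The Fiedler value is 1 > 0, so the graph is connected.
% Removing the edge between vertices 2 and 3 gives
\[
\Delta' = \begin{bmatrix} 1 & -1 & 0 \\ -1 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix},
\]
% with eigenvalues 0, 0, 2. The Fiedler value drops to 0 because vertex 3 is now isolated.
```

The second case mirrors an on-line point that shares no edge with the training data: the Fiedler value is zero and no label can propagate to that vertex.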

3.5 AP Selection

Modern indoor environments such as shopping malls and large office buildings are

often equipped with a large number of APs to ensure a satisfactory quality of Wi-Fi

services. Besides, a single AP can expose multiple MAC addresses in case WLAN

virtualization technology is enabled on that AP. In this thesis, AP selection and

MAC selection are used interchangeably; and such selection is often categorised as


feature selection in machine learning literature. This thesis uses validation data

sets collected from three sites with 419, 259, and 337 detectable MAC addresses,

respectively (Appendix B). The total number of detectable MAC addresses, de-

noted by m as in the m×n rcomplete matrix, in these buildings is often much greater

than that required to perform indoor positioning. These extra or redundant MACs

not only lead to excessive computations, but may also lead to performance degra-

dations and biased position predictions since they may have negative effect on

the adjacency matrix calculations. Thus, selecting a subset of MACs (also known

as features in machine learning literature) from the entire feature space not only

reduces computational complexity, but also helps to improve positioning accuracy.

Let M denote the set of APs containing the name or index of all detectable APs

found in all data points, with |M| = m. The AP selection step is to find, for each

on-line RSS vector ro, a subset of APs S ⊂ M such that |S| = S ≪ M. The AP

selection process then uses S to produce an AP selector matrix Q that is to be

applied to all RSS vectors, both off-line and on-line. Q is an m × m matrix; each

row of Q is a 1 × m vector with at most a single 1 at the ith position, indicating

that the ith AP is selected as the desired AP. For instance, the following Q matrix

is a 5-choose-2 AP selection matrix that selects the set S = {2, 4}.

\[
Q =
\begin{bmatrix}
0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]

This thesis discusses two simple AP selection schemes: the k-strongest AP selection

scheme, and the Visibility-based AP selection scheme, both of which make use of

APs’ visibility and signal strength in the spatial domain.

1) k-strongest AP Selection


Chen et al. in [8] and [13] devised the k-strongest AP selection scheme, which

selects a set S of APs with the strongest RSS readings in the on-line RSS measurement

vector. They claim that the stronger the RSS, the more reliable the

position estimation. The set S is obtained by sorting the elements in on-line RSS

reading ro in descending order and selecting the indices of the first k values that

correspond to k APs with highest power readings. The set of APs S is then used

to construct an AP selector matrix Q. Since ro varies with time, the Q matrix

has to be refreshed for each new ro reading.
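The k-strongest selection can be sketched in a few lines. The example values are hypothetical and chosen so that the result reproduces the 5-choose-2 selector matrix shown earlier (APs 2 and 4 in 1-based numbering); the function name is illustrative.

```python
from typing import List, Sequence


def k_strongest_selector(online: Sequence[float], k: int) -> List[List[int]]:
    """Build the m x m selector matrix Q for the k strongest APs in the on-line reading."""
    m = len(online)
    # Indices of the k largest RSS values (0-based).
    strongest = sorted(range(m), key=lambda i: online[i], reverse=True)[:k]
    Q = [[0] * m for _ in range(m)]
    for i in strongest:
        Q[i][i] = 1
    return Q


online_rss = [-90.0, -45.0, -110.0, -60.0, -85.0]
Q = k_strongest_selector(online_rss, k=2)
```

Since the selection depends only on the on-line reading, this function would be called again for every new measurement.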

The choice of k in the k-strongest AP selection scheme has been analysed by

Anthea et al. in [8] and [13] via computer simulations and on-device testing.

Figure 3.4 shows the positioning performances versus the number of APs. Figure

3.4 (a) shows the results obtained using computer simulations; and Figure 3.4 (b)

shows the results obtained via on-device testing. Both of them indicate that, in

order to have a satisfactory level of positioning performance, the k-strongest AP

selection scheme requires k to be at least 14.

Figure 3.4: The ARMSE versus the number of APs used. The testing site is on the 4th floor of BA. Courtesy of Anthea et al. [8]

Kamol et al. [21] reached a similar conclusion regarding the number of APs to

use for indoor positioning tasks. They analysed the kNN positioning algorithm using

different numbers of APs in a simulated environment. Their simulation result is

shown in Figure 3.5. It is evident that a larger number of APs may improve the

precision, but the probability of correct location estimation does not increase significantly when S ≥ 14.

Figure 3.5: The probability of correct location estimation versus the number of APs used [21]

2) Visibility-based AP Selection

Visibility-based AP selection uses all detectable APs in the on-line RSS measurement

ro to select as many APs as possible, arguing that the complete set of visible

(or detectable) APs of ro is able to discriminate ro from all other RSS vectors the

best. This scheme is summarized in Algorithm 5.

Algorithm 5: Visibility-Based AP Selection Algorithm
Input: On-line RSS measurement ro
Output: AP selector matrix Q
    Visibility threshold t ← −110
    Q ← m × m zero matrix
    for i = 1 → m do
        if ro(i) > t then
            Q(i, i) ← 1
    return Q
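Algorithm 5 translates almost line by line into Python. The sketch below uses 0-based indices and a hypothetical function name; the threshold default is the −110 dBm visibility flag from the algorithm.

```python
from typing import List, Sequence


def visibility_selector(online: Sequence[float],
                        threshold: float = -110.0) -> List[List[int]]:
    """Select every AP whose on-line reading is above the visibility threshold."""
    m = len(online)
    return [
        [1 if (i == j and online[i] > threshold) else 0 for j in range(m)]
        for i in range(m)
    ]


Q_visible = visibility_selector([-90.0, -110.0, -60.0])
```

Unlike the k-strongest scheme, the number of selected APs here is not fixed and follows whatever the device happens to detect.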


3.6 Adjacency Matrix W

3.6.1 Dynamic Kernel Parametrization

In the field of Bayesian statistics and machine learning, the kernel of a probability

density function (pdf) is a non-negative function that takes the shape of a pdf. The

proposed system employs two kernel functions: Gaussian kernel function (2.4) and

Laplacian kernel function (2.5), both of which involve a hyper-parameter σ acting

as the width of the kernel function. For graph-based semi-supervised learning

applications, kernel functions are used to construct exponentially-weighted fully

connected graphs. In other words, they are used to translate Euclidean distances

(denoted by N) in feature domain into edge weights (denoted by W ) that are in

the range of [0, 1].

The goal of this section is to illustrate how to find σ values that work well in

practice. In this section, we propose a dynamic kernel function parametrization

algorithm (Algorithm 6). The intuition behind this algorithm is to adjust the

width of the kernel function so that higher weights are assigned to the neighbouring

data points that are close to a data point, and lower weights are assigned to data

points that are far away from that point. For each data point i, let the ith column

of Ncomplete be ci; and let the size of the neighbourhood be α. α = 0.4 means that

40 percent of the neighbours of i are used to determine the kernel function width.

Empirical studies show that an α of 0.2 to 0.5 generally works well in practice.

The way Algorithm 6 works is that it uses α × 100 percent of the neighbours

(called near neighbours) of i and sets the σ value to cik, where k is the

farthest near neighbour of i. In doing so, we can make sure that α × 100 percent

of the neighbours of i fall into the µ ± 2σ range of the Gaussian kernel function.

Algorithm 6 summarizes the dynamic parametrization algorithm. It is inspired

by the evidence maximization hyper-parameter learning described in Chapter 7 of

[40]. This thesis will not pursue the details of evidence maximization.


Algorithm 6: Dynamic Kernel Function Parametrization Algorithm
Input: RSS Euclidean distance matrix N; the size of the neighbourhood α
Output: Kernel function hyper-parameter array σ
    n ← number of columns in N
    Initialize an array of cut-off values σ ← 1 × n vector of 0’s
    Set the neighbourhood size t ← ⌊n × α⌋
    for i = 1 → n do
        ni ← the ith column of N
        sortedColumni ← sort(ni, ascending order)
        σ(i) ← sortedColumni(t)
    return σ
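A direct Python transcription of Algorithm 6 is sketched below. The `max(1, ...)` guard is an added assumption for very small clusters, where ⌊n × α⌋ would otherwise be zero; the function name and the one-dimensional toy data are illustrative.

```python
import math
from typing import List, Sequence


def kernel_widths(N: Sequence[Sequence[float]], alpha: float) -> List[float]:
    """Per-point kernel width: the t-th smallest entry of each column of N."""
    n = len(N)
    t = max(1, math.floor(n * alpha))   # assumption: use at least one entry
    widths = []
    for i in range(n):
        column = sorted(N[j][i] for j in range(n))
        widths.append(column[t - 1])    # the pseudocode index t is 1-based
    return widths


# Five points on a line; distances are absolute coordinate differences.
xs = [0.0, 1.0, 2.0, 4.0, 8.0]
N = [[abs(a - b) for b in xs] for a in xs]
sigma = kernel_widths(N, alpha=0.4)
```

Note that each sorted column starts with the zero self-distance, so with t = 2 the width equals the distance to the nearest neighbour. Isolated points (such as the one at 8.0) therefore receive wider kernels than points in dense areas.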

3.6.2 Processing Adjacency Matrix

Adjacency Matrix Normalization

Due to the fact that σ in kernel functions is chosen dynamically, the adjacency ma-

trix may be badly scaled with one or more columns being close to zero. As a result

of a badly scaled adjacency matrix W , the Laplacian matrix ∆ can be close to sin-

gular and becomes non-invertible due to computer precision limitations. To avoid

the close-to-singular problem, two adjacency matrix normalization algorithms are

devised. The goal is to scale columns of W up or down according to

$$W(:, i) \leftarrow \frac{W(:, i)}{\sum_{j=1}^{n} W(j, i)}, \quad \forall i = 1, \dots, n \tag{3.3}$$

and make each column a probability mass function (pmf). Algorithm 7, named

Sum To One (STONE) algorithm, applies this idea.

Algorithm 7: Adjacency Matrix Normalization Algorithm: Sum To One (STONE)

Input: Un-normalized adjacency matrix W
Output: Normalized adjacency matrix W′

n ← the number of columns in W
W′ ← n × n zero matrix
for i = 1 . . . n do
    W(i, i) ← 0
for i = 1 . . . n do
    c ← W(:, i), assign the ith column of W to c
    s ← sum(c), sum up all entries in vector c
    W′(:, i) ← c/s
return W′

A modified version of Algorithm 7 is motivated by the following observations. As shown in iterations (2.7) and (2.8), the final position estimate of data point i is a weighted sum over the significant neighbours of i. Insignificant neighbours (with small W(i, j) terms) are therefore not only unnecessary but may also degrade the position estimate, because their small weights act as interference in the overall weighting scheme. Thus, one could take only the significant terms in the ith column of W and normalize only these terms into a pmf. The modified adjacency matrix normalization algorithm, named the Modified STONE algorithm, is described in Algorithm 8.

Algorithm 8: Adjacency Matrix Normalization Algorithm: Modified Sum To One (Modified STONE)

Input: Un-normalized adjacency matrix W; a percentage of neighbours to include α
Output: Normalized adjacency matrix W′

n ← the number of columns in W
Threshold t ← ⌊n × α⌋
W′ ← n × n zero matrix
for i = 1 . . . n do
    W(i, i) ← 0
for i = 1 . . . n do
    c ← W(:, i), assign the ith column of W to c
    csorted ← sorted c in ascending order
    s ← the sum of the first t entries in vector csorted
    W′(:, i) ← c/s
return W′

After normalizing each column of the adjacency matrix W , the resulting W matrix

is no longer symmetric. Symmetry can be easily restored by applying:

W ← (W + tran(W ))/2 (3.4)


where tran(W ) is the transpose of W .
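The two normalization algorithms and the symmetry restoration of Equation (3.4) can be illustrated with the following Python sketch. It is not the thesis code: the names `normalize_columns` and `symmetrize` are hypothetical, matrices are plain lists of lists, and all-zero columns are simply left at zero. In particular, the sketch assumes that the "significant" terms of a column are its t largest weights, which is one reading of the prose description of the Modified STONE algorithm.

```python
import math


def normalize_columns(W, alpha=None):
    """Column-normalize W. alpha=None uses every entry (STONE); otherwise only
    the t = floor(n * alpha) largest entries set the scale (assumed reading of
    Modified STONE)."""
    n = len(W)
    t = n if alpha is None else max(1, math.floor(n * alpha))
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Column i with the self-weight W(i, i) forced to zero.
        col = [0.0 if j == i else W[j][i] for j in range(n)]
        # Sum of the t largest weights in the column.
        s = sum(sorted(col, reverse=True)[:t])
        if s == 0:
            continue  # assumption: an all-zero column stays zero
        for j in range(n):
            out[j][i] = col[j] / s
    return out


def symmetrize(W):
    """Equation (3.4): W <- (W + transpose(W)) / 2."""
    n = len(W)
    return [[(W[i][j] + W[j][i]) / 2 for j in range(n)] for i in range(n)]
```

With `alpha=None` every column of the result sums to one, so each column is a pmf; with a small `alpha` only the largest weights in each column sum to one, and the remaining entries are scaled by the same factor.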

3.7 Chapter Summary

In this chapter, the proposed positioning system is described in detail. The system

has two phases. The off-line phase is the training period in which RSS values from

access points are collected at locations scattered all over the area of interest. The

post-processing of raw RSS readings is done in the off-line phase.

The actual localization takes place in the on-line phase, which itself is divided into

the coarse localization phase and the fine localization phase. Coarse localization

takes the off-line radio map and the on-line readings as inputs, and narrows down

the localization problem to within a smaller region of the map. In doing so, both

computational complexity and localization accuracy are improved. This process

is also referred to as AP visibility-based cluster generation. Besides generating

a cluster of relevant points, the coarse localization stage also constructs a cluster

distance matrix Ncluster, which is used by the position estimating algorithms: label

propagation and kNN.

In addition to positioning procedures, this chapter also presents the communica-

tion flow charts involving the mobile client and the location server, and briefly

explains the mode of operation during both the off-line and the on-line phases.

In the client-server implementation, the server takes different roles in each phase.

During the off-line phase, the server off-loads raw data processing and storage

burdens from mobile devices, which are limited in computing power and storage.

In the on-line phase, depending on the type of client request, the server either

returns the required resource files (such as CSV data files or map images) to the

mobile clients, or performs localization calculations for them.

Chapter 4

Implementation

This chapter describes how the indoor positioning system and the localization

server described in Chapter 3 are implemented in the Java programming language

on Android smart phones and the Apache Geronimo server.

4.1 Development Platforms

The proposed positioning application is developed for mobile devices running the

Android operating system version 2.3.3 (Software Development Kit, SDK, level 10)

and above. The Android SDK contains all Android libraries and executables for

coding and compiling Android user applications. The minimum Android SDK level

is chosen according to the relative number of active Android devices currently

on the market, so that the application targets the largest number of devices. The

most up-to-date statistics are available in [1]. Like other Android applications,

the indoor positioning application is developed in the Java programming language

under the Eclipse integrated development environment (IDE) with a series of Android

specific classes. Besides standard Java classes and Android classes, the application also makes use of one external Java library, Jama (distributed as a Java ARchive, or jar, file), which provides basic matrix operations such as matrix multiplication, addition, and pseudo-inversion. Table 4.1 summarizes the three mobile devices used to perform RSS data acquisition and on-device performance evaluations. Note that, unlike on Windows Mobile devices [4] [3] [7], the sampling rate of Wi-Fi NICs cannot be altered by user applications on Android; the Android APIs lack such support. Thus, RSS readings are acquired at different sampling rates on different devices (shown in Table 4.1).

Table 4.1: Devices Used in Development and Testing

Device Name    System Memory   OS Version      WLAN NIC                   Wi-Fi Scanning Rate
HTC Desire Z   256 MB          Android 2.3.3   802.11 b/g/n compliant     1 Hz
HTC One X      1 GB            Android 4.1.1   802.11 b/g/n compliant     0.5 Hz
Nexus 4        1 GB            Android 4.2.1   802.11 a/b/g/n compliant   1.5 Hz

Alongside the Android application, the proposed system also includes

a localization server that is capable of running the same set of data processing

and localization algorithms at a faster speed. The primary concern in choosing

a server application is its compatibility with Java language and standard HTTP

protocol. The server-side development utilizes Apache Geronimo 2.2 server, which

is an off-the-shelf open source application server developed by the Apache Software

Foundation [2] and distributed under the Apache licence. Geronimo 2.2 is currently

compatible with Java Enterprise Edition (Java EE) 5.0, which provides the standard

Java Application Programming Interface (API), JavaServer Pages (JSP), which is

a technology for creating dynamically generated web pages based on HTML, and

Enterprise JavaBeans (EJB), which is a server-side component model for modular construction

of Java applications. A Java compatible server application makes it easier to port

existing Java code from Android application to the server-side.


4.2 Client-Server Communication Model

The client-server communication model presented in this thesis is partially based

on an anonymous location service system described in [14]. In [14], Chen et al.

proposed an anonymous server-based positioning system which defines a four-way

interaction protocol amongst four separate entities: the mobile user, the localization

server, the dispatch server, and the authentication server. Figure 4.1 is a

re-interpretation of the design in [14]. The goal is to prevent breaches of user

privacy. In some positioning systems, the location is estimated at the location

server and then transmitted to the client side via the network. This scheme

is susceptible to privacy breaches because it exposes users’ physical locations. Thus, Chen et al.

take two steps to make sure that user privacy is preserved. The most important

step is that location estimation should always be performed on the mobile device

rather than on the location server, in case the users lack confidence in the location

server. Our proposed system partially adopts this privacy-preserving scheme by

allowing mobile devices to request the resources used in positioning calculations.

Such resources include floor plan images and CSV data files.

In addition to Chen’s privacy protection scheme, the proposed system can also ask

users for permissions to allow location estimation to be performed on the location

server. Once users have enough confidence in location server and consent to this,

the mobile device will request the server to locate the device.

Figure 4.1: Client-Server Communication Model as proposed in [14]. The starred modules are authentication-related modules and are currently unimplemented. The solid lines show the communication message exchanges involved in the location request stage. The dotted lines show the message exchanges in the authentication stage. The dash-dotted line shows the response messages containing either user locations or positioning resources.

Figure 4.2 depicts the proposed client-server communication scheme, which supports both of the aforementioned schemes. Note that the modules marked by a star are related to user authentication and are currently dummy modules (i.e. authentication is by-passed in the current design). The solid lines show the message exchanges involved in the location request stage; the dotted lines show the message exchanges in the user authentication stage; and the dash-dotted line shows the localization response messages containing either user locations or positioning resources. All localization processes are initiated by the client. The entire localization process is summarized as follows. First, the client sends a location request containing RSS fingerprints, the user’s ephemeral ID, etc. to the localization server. Upon receiving this request, the localization server first checks the ephemeral ID contained in the location request by cross-referencing it with the third-party authentication server operated and maintained by the client’s ISP (Internet Service Provider). If the user’s ephemeral ID is verified, the localization server starts to locate the device using the localization module; otherwise, the localization server sends an authentication request to the client and instructs it to obtain an ephemeral ID from the ISP.


Figure 4.2: The Implemented Client-Server Communication Model. Option 1: the location server returns resource files to the mobile device, which does the positioning calculations. Option 2: the location server estimates and returns the location information to the mobile device.

The following sections will describe the design of the Android application and the

location server that makes Figure 4.2 possible.
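The request handling on the location server can be pictured with the short Python sketch below. It only illustrates the control flow of the two options and of the ephemeral ID check; it is not the server code, and every name in it (`handle_location_request`, `is_id_valid`, `locate`, the `mode` and `rss` fields) is hypothetical. The message layout is likewise an assumption made for the example.

```python
def handle_location_request(request, is_id_valid, locate, resources):
    """Dispatch one client request; all names and fields are hypothetical."""
    # Check the ephemeral ID first (a dummy check in the current design).
    if not is_id_valid(request["ephemeral_id"]):
        # Tell the client to obtain an ephemeral ID before retrying.
        return {"type": "auth_required"}
    # Option 1: return resource files so that the device positions itself.
    if request["mode"] == "resources":
        return {"type": "resources", "files": resources}
    # Option 2: estimate the location on the server from the RSS fingerprint.
    return {"type": "location", "position": locate(request["rss"])}
```

Passing the ID check and the localization routine in as functions keeps the dispatch logic independent of how authentication and positioning are actually carried out.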

4.3 Android Application Design

Both the mobile application and the server-side application are designed in a modular

fashion, making extensive use of Java design principles: encapsulation, abstraction,

and polymorphism [6]. The design also employs standardized technologies

such as HTTP, JSP, and EJB. The use of modular design and standard technologies

aims to make the code base easy to maintain and upgrade, especially in the

case where multiple researchers are collaborating and making code contributions

simultaneously.

On the other hand, the design should also take into account the fact that the label

propagation algorithm is relatively computationally expensive for a smart phone,

since it involves matrix inversions and multiplications that have a complexity of

Ω(n² log n) [36]. Compared with dedicated servers (full Microsoft Windows or full Linux

machines), smart phones have far fewer resources in terms of processing

power and bus bandwidth (which limits file input and output). Due to such limitations,

the software design of the proposed indoor positioning system requires extra design

decisions to make label propagation realizable on smart phones. These

decisions include:

1. Using A Cluster of Off-line Data Points

Chapter 3 provides a discussion on creating a cluster of data points C of size |C|

in the hope of reducing the computational complexity while still maintaining the

same (or a higher) level of positioning accuracy. This cluster generation algorithm

is particularly helpful when the smart phones perform the positioning calculations

themselves. Thus, the use of rcluster and Ncluster is enforced at the client end.

2. Maintaining On-line and Off-line Data Structures Separately

As shown in Chapter 3, the on-line localization phase requires two important

matrices: the complete radio map rcomplete of size m × (n + o), and the complete

distance matrix Ncomplete of size (n+o)× (n+o) with m being the total number of

APs, n being the number of off-line data points, and o being the number of on-line

points. Since (n + o) ranges from several hundred to a thousand in our

case, constructing rcomplete and Ncomplete before positioning calculations can slow

down the smart phones significantly; in some cases, the Android OS may even prompt

users to kill the positioning application due to excessive CPU consumption.

Thus, instead of having two complete matrices rcomplete and Ncomplete, we split

each of them into 2 parts and maintain these 4 smaller matrices separately. In

doing so, only the on-line portions of rcomplete and Ncomplete are calculated for each

incoming on-line reading ro, while the off-line portions of rcomplete and Ncomplete are

pre-processed during the off-line phase on the localization server. The trade-off


here is between storage and CPU time: smart phones use more permanent storage

(such as Secure Digital Card or SD card for short) to store off-line data in order

to reduce the amount of calculations during the on-line phase.

Additionally, as mentioned previously, it is possible that some of the on-line points

may lose connection to the rest of the graph. As a result of poor graph connec-

tivity, the loosely connected points will receive no location predictions. In this

case, it is required to clean the stale on-line readings in order to re-establish a well

connected graph. Thus, separating the on-line and off-line portions of rcomplete and

Ncomplete solves the no-prediction problem caused by a poorly connected graph.
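The split between off-line and on-line data structures can be illustrated as follows. This is a sketch under simplifying assumptions, not the application code: each data point is taken to be a tuple of RSS values (one per AP, i.e. one column of the radio map), distances are plain Euclidean distances, and the function names are hypothetical. The off-line block is computed once, for example on the server, while only one new row of distances is computed for each incoming on-line reading.

```python
import math


def offline_distance_block(offline_rss):
    """Pairwise RSS distances between off-line points, computed once in advance."""
    return [[math.dist(a, b) for b in offline_rss] for a in offline_rss]


def online_distances(offline_rss, online_reading):
    """Distances from one new on-line reading to every off-line point."""
    return [math.dist(point, online_reading) for point in offline_rss]
```

For n off-line points, the per-reading cost is then n distance evaluations instead of rebuilding the full (n + o) × (n + o) matrix, at the price of storing the off-line block.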

3. Utilizing Background Thread for Long-Running Tasks

Tasks such as HTTP communications and file input and output (I/O) are considered

long-running tasks for the CPU as they may take tenths of seconds. If they

run on the main UI thread, they may block user multi-touch inputs and

may in turn make the application non-responsive. Off-loading long-running tasks

to separate threads, using the so-called Android Asynchronous Task (AsyncTask) mechanism,

helps to prevent the application from becoming non-responsive.

4.3.1 Application High Level Overview

Figure 4.3 presents the block diagram showing the high-level overview of the An-

droid application. Appendix C also provides a class-level block diagram, in the

form of UML, as well as detailed descriptions on each of the classes.

As shown in Figure 4.3, Java libraries are grouped into several Java packages:

Android Specific Package


Figure 4.3: Android Application High Level Diagram. Black arrows show the dependency of libraries. E.g. Localizer.android module depends on the Android SDK.

There are two Android-specific packages: the Localizer.android.activity

package and the Localizer.android package. The former defines all of the Graphical

User Interface (GUI) screens and the multi-touch behaviour of the application.

The latter provides all Android system-level capabilities such as HTTP upload

and download, multi-threaded file I/O, automatic settings update, etc. Since they

are Android-specific, they both depend on the Android SDK, which is provided

by Google Inc.

Non-Android Specific Package

There is a total of three non-Android specific packages, meaning that they can be

shared between the Android application and the localization server. The first pack-

age, Localizer.common, defines the data set containing both off-line and on-line

data points (including Cartesian (x, y) coordinates of the off-line labelled points,

and raw RSS readings r acquired at all locations). It also contains all necessary

data structures such as rcomplete, Ncomplete, graph Laplacian ∆, adjacency matrix


W , and so on. The second package, Localizer.common.algorithm, provides generic

algorithms such as sorting algorithms, and static functions used in positioning.

The third package, namely Jama.jar [5], is an external basic linear algebra

package for Java. It provides user-level classes for constructing and

manipulating matrices, with operations such as matrix addition, multiplication,

pseudo-inversion, and orthogonalization.

Figure 4.3 shows that the design of the Android application

is modular: a lower-level package provides functionality used by the package above it.

For example, Localizer.common.algorithm is hidden from all other packages except

for its dependent package Localizer.common. Each package depends on only the

package below it with Localizer.android.activity being the top-most package used

to handle user touch inputs.

4.3.2 Application Functionalities

As depicted in the options menu in Figure 4.3, the application has 6 major types

of operations:

1. Collect RSS, which allows users to collect off-line data points in three ap-

proaches: manually labelled data points, time-stamp based labelled data points,

and unlabelled data points.

2. Localize Me, which initiates the positioning sequence involving on-line RSS

scanning, performing label propagation calculations, and displaying the estimated

locations on the display.

3. Stop Tracking, which stops the iterative positioning (tracking) and clears

stale on-line readings.

4. Settings, which brings the settings screen forward.


5. Radio Map Options, which allows users to do the following tasks: writ-

ing/loading off-line data points to/from CSV data files, uploading CSV data files

to the server for processing, downloading processed CSV data files from the server,

and resetting the application by clearing all off-line and on-line data stored in the

system memory.

6. Information, which displays information such as locations of labelled data

points, and current application settings.

4.3.3 Structure of External Storage

The Android application uses SD cards to store all the required resources used by

the application. These resources include floor plan images (jpeg, jpg, or png files),

data set files in CSV format, and the temporary files used in HTTP communi-

cation. The folder structure and naming conventions of files are shown in Figure

4.4.

Figure 4.4: Application Storage Structure


4.4 The Design of the Location Server

This section describes the design of the location server that is capable of storing

resource files, and carrying out positioning calculations in the same way as the

mobile devices. Figure 4.5 is a high-level overview of the location server’s structure.

Figure 4.5: Client-Server Block Diagram. This image shows the modules in the application and the location server. Both share the same positioning algorithms and data structures. The communication is via the Internet using the HTTP protocol.

As mentioned earlier, the proposed system employs standardized technologies to

perform different tasks:

1. Communication Protocol: the location server uses HTTP 1.1 as the com-

munication protocol to talk to the mobile users by means of POST and GET.

2. Communication Interface: The location server uses JavaServer Pages

(dynamically created web pages) to interface with the client.

3. Core: EJB is the core of the location server as it hosts the core Java classes

and functions used to perform location predictions. Note that the classes and


functions on the server side are exactly the same as the ones used in the mobile

application.

The actual floor plan images and CSV data files are named according to the naming

convention shown in Figure 4.4, and are stored on the server’s hard drive.

4.5 Chapter Summary

This chapter describes the software developed on the Android smart phones and

the Apache server for the proposed indoor positioning system. The software is

written in Java using standard Java libraries (plus additional Android-specific libraries

on the phone side) and is depicted in several figures that show the relationships

amongst all libraries. Moreover, some of the design considerations behind an efficient

system are also discussed in this chapter: modular design,

sharing code between the client and server, the use of Java design principles, and

the use of standard technologies. The proposed system is installed on different

Android phones and tested in various indoor environments. Chapter 5 discusses

the performance of our positioning system.

Chapter 5

Evaluation

In this chapter, the proposed positioning system is evaluated using MATLAB

simulations and on-site testing using actual Android devices. Instead of generating

RSS readings according to wireless propagation models, we use actual RSS readings

collected from different types of indoor environments to run computer simulations.

In doing so, we could gain better insights into the generalization error of our

positioning algorithms. Moreover, to demonstrate the effectiveness and efficiency

of different algorithms, simulations are configured in different ways both with

and without certain algorithms applied; comparisons and discussions on these

simulations are also provided in the sections that follow.

5.1 Simulation and Testing Set-up

5.1.1 Experimental and Data Acquisition Sites

Five sites have been used to collect RSS readings for computer simulations and on-device testing: 1) the 1st floor of BA (Bahen Centre for Information Technology at the University of Toronto), 2) the 4th floor of BA, 3) Level 3 of Eaton Centre, Toronto, 4) Toronto Pearson Airport Terminal 1, and 5) Los Angeles World Airport.

Table 5.1: Data Sets Used in Simulations and Testing

Data Set ID   Area of Interest                      Device Used   Description
BA1F          BA 1st floor, University of Toronto   HTC One X     Manual Data Labelling
BA4F          BA 4th floor                          HTC One X     Manual Data Labelling
BA1F-Fast     BA 1st floor                          Nexus 4       Fast Labelling (i.e. labels are estimated using the time stamp based method, Section 3.2.1)
BA4F-Fast     BA 4th floor                          Nexus 4       Fast Labelling
EATON3        Eaton Centre Level 3                  Nexus 4       Fast Labelling

Site 1 is a floor of a typical office building with a large hall. Site 2 is an office

building floor that has relatively narrow corridors. Site 3 is a large shopping mall

with a vast open space. Lastly, sites 4 and 5 are large airport terminals. These sites

are chosen in such a way that they are representative of most indoor environments.

Note that sites 4 and 5 have only been used in quick on-device testing. From the

first three testing sites, a total of 5 data sets have been collected using different

devices. Table 5.1 provides a brief summary of these data sets. A more detailed

description on these data sets can be found in Appendix B.

5.1.2 Figure of Merit

A conventional performance evaluation metric for positioning systems is the

positioning error (also called the root mean squared error or RMSE), which is the

expectation of the Euclidean distance between the actual location and the location

estimates. In the literature, researchers mostly report the average RMSE to demonstrate

the accuracy of their proposed RSS based indoor positioning systems. In

practice, however, maximum localization error is equally important since mobile

users and LBS providers can tolerate reasonably low positioning errors but seldom

can they tolerate large errors. Thus, this thesis employs both the average RMSE

(ARMSE) and the maximum RMSE (MRMSE) as performance evaluation met-

rics. Besides these two important performance metrics, other metrics such as the

error standard deviation, and the 90th percentile are also used in the sections that

follow.

Average Root Mean Square Error (ARMSE)

The ARMSE performance metric is defined as the average of mean square errors

over all points and all RSS sample vectors:

$$\mathrm{ARMSE} \triangleq \frac{1}{n} \sum_{i=1}^{n} \frac{1}{T} \sum_{t=1}^{T} \left\| y_i - \hat{y}_i(t) \right\|_2 \tag{5.1}$$

where n is the total number of data points including the off-line training points and on-line data points, T is the total number of test samples (RSS vectors) taken, y_i is the actual location of test data point i, and \hat{y}_i(t) is the estimated location of data point i using the RSS sample measured at time instance t.

Maximum Root Mean Square Error (MRMSE)

The MRMSE performance metric is defined as the maximum of root mean square errors over all n data points:

$$\mathrm{MRMSE} \triangleq \max_{i = 1, \dots, n} \frac{1}{T} \sum_{t=1}^{T} \left\| y_i - \hat{y}_i(t) \right\|_2 \tag{5.2}$$
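As a small worked example, the two metrics can be computed as in the Python sketch below. The function names and the data layout are assumptions made for illustration (one actual location per data point and a list of T estimated locations per data point); the norm is taken to be the Euclidean distance, following the definition of the positioning error given above.

```python
import math


def per_point_errors(actual, estimates):
    """Mean Euclidean error of each data point over its T location estimates."""
    return [
        sum(math.dist(y, y_hat) for y_hat in ests) / len(ests)
        for y, ests in zip(actual, estimates)
    ]


def armse(actual, estimates):
    """Equation (5.1): average of the per-point errors."""
    errors = per_point_errors(actual, estimates)
    return sum(errors) / len(errors)


def mrmse(actual, estimates):
    """Equation (5.2): largest per-point error."""
    return max(per_point_errors(actual, estimates))
```

For two points at (0, 0) and (10, 0) with estimates [(3, 4), (0, 0)] and [(10, 2), (10, 4)], the per-point errors are 2.5 m and 3.0 m, giving an ARMSE of 2.75 m and an MRMSE of 3.0 m.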


5.1.3 Evaluation Methodology

The performance of our proposed system is evaluated in three different ways using

ARMSE and MRMSE defined in the previous section.

1) To demonstrate the effectiveness of different algorithms described in Chapter

3, MATLAB simulations are carried out both with and without these algorithms

applied. The goal is to illustrate that, when applied, these algorithms improve

positioning accuracy. These comparisons are simulation-based and are discussed

in section 5.2.1.

2) Another evaluation method is to perform benchmarking against other position-

ing algorithms. Thus, the label propagation algorithm is evaluated against the

simple supervised kNN algorithm, which is implemented in RADAR system [9].

In both simulations, k is generally 3 to 5 in kNN since it works well in practice.

The results are discussed in Section 5.2.2.

3) In addition to simulations, on-device testing was also conducted using actual

Android smart phones in the aforementioned indoor environments: in a large shop-

ping mall (Eaton Centre), in an office building (Bahen Centre at the University of

Toronto), and in two airports (the Toronto Pearson International Airport and the

Los Angeles World Airport). We have obtained rich testing data in the first two

environments to demonstrate the accuracy levels of our system. Light testing has

been carried out in the last two sites, yielding a satisfactory level of positioning

accuracy and user experience.
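For reference, the kNN baseline used in the benchmarking can be sketched as follows: the k off-line points whose RSS vectors are closest to the on-line reading are selected, and their known coordinates are averaged. This is a generic illustration and not the thesis implementation; the function name, the data layout, and the use of an unweighted average are assumptions.

```python
import math


def knn_position(radio_map, locations, reading, k=3):
    """Average the coordinates of the k off-line points nearest in RSS space."""
    # Rank off-line points by Euclidean distance between RSS vectors.
    ranked = sorted(
        range(len(radio_map)),
        key=lambda i: math.dist(radio_map[i], reading),
    )
    nearest = ranked[:k]
    # Unweighted mean of the (x, y) coordinates of the nearest points.
    x = sum(locations[i][0] for i in nearest) / len(nearest)
    y = sum(locations[i][1] for i in nearest) / len(nearest)
    return (x, y)
```

Unlike label propagation, this estimate uses only the labelled off-line points, which is why it serves as a simple supervised reference.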


5.2 Simulation Results

This section provides various simulation results to evaluate the performance of

the label propagation algorithm, together with the adjacency matrix processing techniques

discussed in Chapter 3. Section 5.2.1 provides simulation results and

discussions on adjacency matrix processing techniques to demonstrate their effectiveness.

Section 5.2.2 compares label propagation with the simple kNN scheme to

show that the proposed indoor positioning system is accurate and reliable.

5.2.1 Techniques for Processing Adjacency Matrix

Effectiveness of Adjacency Matrix Normalization Algorithms

Two different adjacency matrix normalization algorithms have been provided in

Section 3.6.2 to ensure that the edge weights of the constructed graph are compara-

ble amongst all entries in the W matrix. In simulations, both of the normalization

algorithms have been applied to the five data sets (Table 5.1). The resulting error

statistics (the ARMSE, standard deviation of RMSE, and MRMSE) are compared

in Table 5.2.

Simulation results in Table 5.2 indicate that the performance of the Modified

STONE algorithm is either on par with or exceeds that of the STONE algorithm

in all simulation cases in terms of ARMSE, positioning error standard deviation,

and MRMSE. Both normalization algorithms could provide improvements in all

three performance metrics.

In particular, the Modified STONE algorithm provides up to a 30.4% reduction in ARMSE and a 52.5% reduction in MRMSE compared with the un-normalized case. When compared with the regular STONE algorithm, the Modified STONE algorithm is able to reduce the ARMSE and MRMSE by as much as 17% and 52.6%, respectively. The corresponding positioning error cumulative distribution functions (cdf) are plotted in Figures 5.1 to 5.5.

Table 5.2: Comparison of Different Normalization Methods. STONE stands for the Sum To One algorithm. M-STONE stands for the Modified Sum To One algorithm. None means that no normalization algorithm is applied. All simulation results are presented in meters, to two decimal places.

             ARMSE [m]                Standard Deviation [m]   MRMSE [m]
Dataset ID   M-STONE  STONE  None     M-STONE  STONE  None     M-STONE  STONE  None
BA1F         2.35     2.40   2.62     1.33     1.36   2.62     6.16     6.13   6.05
BA1F-Fast    1.12     1.35   1.61     0.81     1.11   1.34     4.36     9.21   9.19
BA4F         1.40     1.60   1.74     0.92     0.87   0.94     4.34     4.10   3.96
BA4F-Fast    0.99     1.12   1.29     0.88     0.97   1.09     10.61    10.71  10.73
Eaton3       0.87     0.93   1.02     0.69     0.80   0.83     4.58     8.18   8.03

Figure 5.1: Positioning Error cdf: BA1F Data Set

Figure 5.2: Positioning Error cdf: BA1F-Fast Data Set

Figure 5.5: Positioning Error cdf: EATON3 Data Set

Effectiveness of Dynamic Kernel Parametrization Algorithm

The dynamic kernel function parametrization algorithm, described in Algorithm 6, makes the localization algorithm adaptive to the most current graph structure. It does so by generating suitable σ values for kernel functions (such as the Gaussian kernel function in Equation 2.4). Instead of seeking the optimal σ values, this algorithm is sub-optimal in the sense that it inspects the current adjacency matrix W and tries to find a σ value that makes the average positioning errors low enough. In this thesis, the optimal σ is defined as the σ value that yields the lowest ARMSE. Simulations are carried out using all five validation data sets, and the performance of the dynamic parametrization algorithm is compared against the optimal σ value.

To demonstrate the effectiveness of this algorithm, we first run the dynamic kernel parametrization algorithm to generate a sub-optimal σ value for each validation data set. Then, for each validation data set, an optimal σ is found via exhaustive search on σ in a given range. Comparisons between the sub-optimal and optimal σ values show that this dynamic parametrization algorithm can, indeed, be used in practice. Table 5.3 is a summary of the simulation results.

Table 5.3 compares the performance of the dynamic parametrization algorithm against the optimal kernel parameter σ obtained using exhaustive search. The results indicate that, in most simulation cases, the performance using dynamically generated σ values is on par with that of the optimal σ values in terms of ARMSE. In terms of the MRMSE and the standard deviation of RMSE, the dynamically generated σ values are on par with the optimal σ values in some of the simulation cases.

Table 5.3: Effectiveness of Dynamic Kernel Parametrization Algorithm

             ARMSE [m]               Standard Deviation [m]   MRMSE [m]
Dataset ID   Dynamic σ  Optimal σ    Dynamic σ  Optimal σ     Dynamic σ  Optimal σ
BA1F         2.40       2.38         1.36       1.39          6.13       6.04
BA1F-Fast    1.35       0.93         1.11       0.64          9.21       4.69
BA4F         1.60       1.48         0.87       0.96          4.10       4.76
BA4F-Fast    1.12       0.94         0.97       0.85          10.71      11.68
Eaton3       0.93       0.89         0.80       0.71          8.18       5.34

AP Selection Scheme Comparison

Section 3.5 provides two different AP selection schemes: the k-strongest AP

scheme and the visibility-based AP selection scheme. This section provides simulation

results to demonstrate the performance of each of these approaches using

all five validation data sets. Figures 5.6 to 5.10 depict the statistics

of positioning errors for all five validation data sets.

For manually labelled data sets such as BA1F and

BA4F, the performance of the different AP selection schemes does not differ to any

large extent. This is primarily due to the fact that the manually labelled data

sets are sparser in the sense that they consist of fewer data points scattered in the

same area. As a result, the data points are intrinsically separable even without AP

selection schemes. As for the fast-labelled data sets (namely BA1F-Fast, BA4F-

Fast, and Eaton3), data points tend to be very close to each other. Thus, AP

selection schemes are able to affect pairwise similarities significantly.


Figure 5.6: Positioning Error Statistics: BA1F Data Set

Figure 5.7: Positioning Error Statistics: BA1F-Fast Data Set

Based on Figures 5.6 to 5.10, the performance of the k-strongest AP selection
scheme is variable: it produces large ARMSE and MRMSE when applied to manually
labelled data sets, but it reduces ARMSE and MRMSE when applied to fast-labelled
data sets.

The visibility-based AP selection scheme, compared to the k-strongest scheme,
provides stable and consistent performance over all five validation data sets: the
error metrics are either reduced or stay at the same level as in the case without
AP selection.
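The k-strongest scheme compared above can be illustrated with a short sketch. The exact definition is given in Section 3.5 and is not reproduced here, so the version below assumes the simplest reading: keep the k APs with the highest RSS in a fingerprint, ignoring APs that sit at the −110 dBm placeholder used for undetected APs in Appendix B. The function names and the sample MAC labels are hypothetical.

```python
MISSING_DBM = -110.0  # placeholder RSS for APs that were not detected


def k_strongest_aps(fingerprint, k):
    """Return the identifiers of the k detected APs with the highest RSS.

    `fingerprint` maps an AP identifier (for example a MAC address) to its
    RSS in dBm. Ties are broken by identifier so the result is deterministic.
    """
    heard = [(ap, rss) for ap, rss in fingerprint.items() if rss > MISSING_DBM]
    heard.sort(key=lambda item: (-item[1], item[0]))
    return [ap for ap, _ in heard[:k]]


def restrict(fingerprint, selected):
    """Project a fingerprint onto the selected APs, in the given order."""
    return [fingerprint.get(ap, MISSING_DBM) for ap in selected]


if __name__ == "__main__":
    online = {"aa:01": -48.0, "bb:02": -71.5, "cc:03": -110.0, "dd:04": -63.0}
    chosen = k_strongest_aps(online, 2)
    print(chosen, restrict(online, chosen))
```

Because the selection depends only on the strongest readings, it changes which coordinates enter the pairwise similarity computation; this is one plausible reason why its effect differs between sparse and dense data sets.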


Figure 5.8: Positioning Error Statistics: BA4F Data Set

Figure 5.9: Positioning Error Statistics: BA4F-Fast Data Set

5.2.2 Comparison with kNN

Like other machine learning algorithms, label propagation and kNN require a
number of parameters to be set in order to function well in simulations and in
testing. Thus, before comparing the performance of label propagation against that
of the kNN algorithm, a set of configurations needs to be determined and fixed.
This ensures that the positioning error results are comparable across all
simulation cases and on-device testing cases. A typical set of configurations,
summarized in Table 5.4, is used in this section and the next.


Figure 5.10: Positioning Error Statistics: Eaton Data Set

Table 5.4: A Set of Configurations Used in Simulations and Testing

Parameter                                     Value
Lagrangian multiplier in label propagation    λ = 0.2
Kernel function used in label propagation     Gaussian kernel function
Dynamic kernel parametrization                Enabled
AP selection scheme                           Visibility-based AP selection
Adjacency matrix normalization                Modified Sum to One method
kNN algorithm parameter                       k = 5
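For reference, a fingerprint-based kNN estimate with k = 5 (the value in Table 5.4) can be sketched as follows. This is a generic formulation assumed here for illustration, not code taken from the thesis: it ranks the labelled fingerprints by Euclidean distance in RSS space and averages the coordinates of the k nearest ones. The names `knn_position` and `radio_map` and the sample values are hypothetical.

```python
import math


def knn_position(radio_map, online_rss, k=5):
    """Estimate a position as the centroid of the k nearest fingerprints.

    `radio_map` is a list of (rss_vector, (x, y)) pairs whose RSS vectors use
    the same AP ordering as `online_rss`. If fewer than k fingerprints are
    available, all of them are used.
    """
    ranked = sorted(radio_map, key=lambda entry: math.dist(entry[0], online_rss))
    nearest = ranked[:k]
    x = sum(position[0] for _, position in nearest) / len(nearest)
    y = sum(position[1] for _, position in nearest) / len(nearest)
    return (x, y)


if __name__ == "__main__":
    labelled = [
        ([-40.0, -70.0], (0.0, 0.0)),
        ([-45.0, -65.0], (2.0, 0.0)),
        ([-80.0, -40.0], (10.0, 10.0)),
    ]
    print(knn_position(labelled, [-42.0, -68.0], k=2))
```

An unweighted centroid is the simplest variant; weighting neighbours by inverse distance is a common alternative and would be a different design choice from the one sketched here.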

Figure 5.11 is an example cdf plot comparing the positioning performance of label
propagation with that of the kNN algorithm on the BA1F data set. In this
particular example, the label propagation algorithm, represented by the blue
curve, achieves an ARMSE that is approximately 25.78% lower and an MRMSE that is
53.02% lower. Figures 5.12 to 5.15 are the cdf plots for the other validation data
sets. Considering all five validation data sets, label propagation reduces ARMSE
by 15.75% to 36.81%, the standard deviation by 12.15% to 70.5%, and MRMSE by
12.5% to 87.3%. This indicates that the label propagation algorithm consistently
outperforms kNN in various indoor environments, regardless of how the labelled
data points are collected.
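The percentage reductions quoted above follow from per-data-set error statistics. A minimal sketch of that arithmetic is given below. It assumes that ARMSE is the mean and MRMSE the maximum of the per-point RMSE values (the formal definitions are not restated in this section), and it uses the nearest-rank rule for the 90th percentile, which is also an assumption. The sample error lists are made up for illustration and are not measurements.

```python
import math
import statistics


def error_metrics(errors):
    """Summarize per-test-point positioning errors given in metres."""
    ordered = sorted(errors)
    rank = math.ceil(0.9 * len(ordered))  # nearest-rank 90th percentile
    return {
        "mean": statistics.fmean(ordered),   # ARMSE under the stated assumption
        "std": statistics.pstdev(ordered),   # population standard deviation
        "max": ordered[-1],                  # MRMSE under the stated assumption
        "p90": ordered[rank - 1],
    }


def percent_reduction(baseline, proposed):
    """Return by how many percent `proposed` is lower than `baseline`."""
    return 100.0 * (baseline - proposed) / baseline


if __name__ == "__main__":
    knn_errors = [2.0, 4.0, 6.0, 8.0]   # illustrative values only
    lp_errors = [1.0, 3.0, 5.0, 3.0]    # illustrative values only
    knn, lp = error_metrics(knn_errors), error_metrics(lp_errors)
    for key in ("mean", "std", "max", "p90"):
        print(key, f"{percent_reduction(knn[key], lp[key]):.1f}%")
```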



Figure 5.12: Positioning Error cdf: BA1F-Fast Data Set

5.3 On-Device Testing Results

This section discusses the on-device performance of the proposed label propagation

positioning system using the configurations shown in Table 5.4.

Testing Results on Bahen 1st Floor




Figure 5.16 compares the positioning performance of label propagation and the
kNN algorithm on the 1st floor of Bahen Centre. Tests were conducted at 13
different locations. At each location, both the label propagation algorithm and
the kNN algorithm were used to produce 4 location predictions, giving a total of
52 samples for each algorithm.

Testing Results on Bahen 4th Floor

Figure 5.17 compares the positioning performance of label propagation and the
kNN algorithm on the 4th floor of Bahen Centre. Tests were conducted at 5
different locations. At each location, both the label propagation algorithm and
the kNN algorithm were used to produce 4 location predictions, giving a total of
20 samples for each algorithm.

Figure 5.15: Positioning Error cdf: EATON3 Data Set

Figure 5.16: Testing results on Bahen 1st floor: positioning error cdf

Testing Results on Eaton Centre Level 3

Figure 5.18 compares the positioning performance of label propagation and the
kNN algorithm on level 3 of Eaton Centre. Tests were conducted at 8 different
locations. At each location, both the label propagation algorithm and the kNN
algorithm were used to produce 4 location predictions, giving a total of 32
samples for each algorithm.

Figure 5.17: Testing results on Bahen 4th floor: positioning error cdf

Figure 5.18: Testing results on Eaton Centre level 3: positioning error cdf

The testing results summarized in Figures 5.16 to 5.18 are indicative of label
propagation’s superior performance over the kNN algorithm. Although not as
strongly indicative as the simulation results, the testing results still show
that the ARMSE, the MRMSE, and the standard deviation of label propagation remain
more than 1 meter lower than those of kNN in all test cases. The 90th percentile
follows the same trend except for the Bahen 4th floor test case, where the 90th
percentile obtained from the kNN algorithm is 4.5 meters lower.
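The cdf plots in Figures 5.16 to 5.18 are empirical distributions built from a small number of samples (52, 20, and 32 per algorithm), so each sample moves the curve by a visible step, for example 1/20 on the Bahen 4th floor. A minimal sketch of how such a curve can be computed is shown below; the function names and the sample errors are illustrative assumptions, not data from the tests.

```python
def empirical_cdf(errors):
    """Return (error, cumulative fraction) pairs sorted by error."""
    ordered = sorted(errors)
    n = len(ordered)
    return [(error, (index + 1) / n) for index, error in enumerate(ordered)]


def fraction_within(errors, threshold):
    """Return the fraction of samples whose error is at most `threshold`."""
    return sum(1 for error in errors if error <= threshold) / len(errors)


if __name__ == "__main__":
    sample_errors = [3.0, 1.0, 2.0, 4.0]  # illustrative values in metres
    for error, fraction in empirical_cdf(sample_errors):
        print(f"{error:.1f} m -> {fraction:.2f}")
    print(fraction_within(sample_errors, 2.5))
```

With so few samples, a single outlier can shift a high percentile considerably, which is worth keeping in mind when reading the 90th percentile of the smaller test sets.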


Testing Results in Los Angeles World Airport

A quick test using a Google Nexus 4 smart phone was carried out in Los Angeles
World Airport to demonstrate the user-friendliness and the effectiveness of our
indoor positioning system. In this test, RSS fingerprints were collected in an
area of approximately 50 m by 20 m in under 10 minutes. The positioning error was
not measured precisely; however, it was estimated to be around 5 meters on
average. A snapshot of a particular test is shown in Figure 5.19. The device’s
actual position and the estimated position are indicated by arrows, showing a
localization error of approximately 3 meters.

5.4 Chapter Summary

This chapter evaluates the performance of the proposed Wi-Fi RSS based
positioning system using a total of five data sets collected from different types
of indoor environments, ranging from office buildings with relatively tight
corridors to shopping malls where a single floor can be open to the floors above
and below it. The label propagation algorithm outperforms the kNN method in most
of the test cases. With the help of coarse localization, the label propagation
algorithm makes a light-weight location service solution that is feasible for
resource-limited mobile devices such as Android smart phones and tablets.
Furthermore, the testing results also show that the various adjacency matrix
generation algorithms presented in Chapter 3 improve the accuracy and robustness
of the label propagation algorithm.


Figure 5.19: Testing in Los Angeles World Airport. The arrows point out the
device’s actual position, a red circle, and the estimated position, a blue arrow
marker. The positioning error is around 3 meters in this case.

Chapter 6

Conclusion

In this thesis, we presented an efficient and reliable Wi-Fi RSS fingerprint-based
indoor positioning system. This system consists of a series of light-weight
algorithms designed for mobile devices that are often constrained in computational
power and battery life. Specifically, three major improvements make the system
fast and smart phone friendly: (1) fast time stamp based RSS data collection,
(2) the use of a batch semi-supervised learning algorithm for positioning, and
(3) the use of localization servers.

Fast time stamp based RSS data collection greatly reduces the time spent on the
off-line data collection phase. Also in the off-line phase, with the help of
localization servers, smart phones can off-load heavy RSS processing work in order
to save CPU time and preserve battery life. At the core of the positioning system
lies the label propagation algorithm, a semi-supervised machine learning algorithm
capable of producing position estimates for multiple mobile devices
simultaneously. We discussed in detail the theoretical work behind the label
propagation algorithm in the early chapters. In Chapter 4, software design
considerations and design solutions were provided; they aim at an implementation
that makes location estimation feasible on mobile devices.


We then conducted both computer simulations and experimental studies to test
the feasibility, the efficiency, and the accuracy of our system. The results of the
simulations are promising, showing that the semi-supervised label propagation
algorithm is capable of providing location estimates with an average accuracy
of approximately 1.5 meters. Using five different validation data sets collected
from several typical indoor environments, we simulated the label propagation
algorithm and obtained its validation errors in terms of four performance metrics:
the ARMSE, the standard deviation of positioning errors, the MRMSE, and the
90th percentile. Based on the label propagation algorithm, we further implemented
simulators to explore the use of adjacency matrix processing techniques for higher
positioning accuracy. Even in different indoor wireless environments, these
techniques performed comparatively well with only a limited computational overhead
due to their low computational complexity.

The proposed algorithms are also integrated into a client-server architecture
featuring the Apache Geronimo application server. This system has been tested in
five testing sites. The experimental results, albeit not as impressive as the
simulation results, clearly show the improvement brought about by our proposed
algorithms. Moreover, the algorithms perform consistently well with only a limited
increase in processing power consumption, thanks to the mobile software design
considerations detailed in Chapter 4.

Appendix A

CSV Data File Format


Figure A.1: A sample CSV file depicting all the fields

Appendix B

Detailed Information on

Validation Data Sets

Detailed information on the five data sets used in our simulations and testing is
presented below. Unlike most indoor positioning systems, which are developed in
controlled environments [13] [9], our proposed system was tested in typical real
indoor environments such as office buildings and shopping malls. Analysing such
data sets can provide insight into how RSS readings are distributed and assist the
design of algorithms for RSS processing and position estimation.

One important observation on these data sets is that the more APs a data set
contains, the sparser its radio map is.

• BA1F Data Set

This set of training data is collected and labelled manually over the entire 1st
floor of Bahen Centre at the University of Toronto, an area of 70 by 80 meters.
This data set contains 128 labelled data points sampled at different locations
with 10 RSS time samples at each location. Figure B.1 shows the actual locations
of each data point on the floor plan of the first floor of Bahen Centre.


The off-line radio map r is the average RSS matrix for all off-line training
points. In BA1F, r is of size 585 × 126, meaning that there are 585 MAC addresses
detectable at the locations shown in Figure B.1. However, this r matrix is sparse
because only 16.3% of the entries in r are valid RSS readings obtained through
Wi-Fi scanning, while the rest are imputed RSS readings (−110.0 dBm).
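The sparsity figure can be reproduced from raw scans as sketched below. The sketch assumes a hypothetical input layout (one dictionary per location, mapping a MAC address to its non-empty list of RSS time samples); rows of r correspond to MAC addresses and columns to locations, matching the shape described above, and undetected APs receive the imputed value of −110.0 dBm. The function names and sample values are illustrative only.

```python
MISSING_DBM = -110.0  # value imputed for APs not detected at a location


def build_radio_map(scans):
    """Build the average-RSS radio map from per-location scans.

    `scans` holds one dict per location, mapping a MAC address to a non-empty
    list of RSS samples in dBm. Returns (macs, r), where r[i][j] is the
    average RSS of AP macs[i] at location j.
    """
    macs = sorted({mac for scan in scans for mac in scan})
    r = [
        [sum(scan[mac]) / len(scan[mac]) if mac in scan else MISSING_DBM
         for scan in scans]
        for mac in macs
    ]
    return macs, r


def valid_fraction(scans, macs):
    """Return the fraction of radio map entries backed by real readings."""
    detected = sum(1 for scan in scans for mac in macs if mac in scan)
    return detected / (len(macs) * len(scans))


if __name__ == "__main__":
    scans = [
        {"aa:01": [-50.0, -52.0], "bb:02": [-70.0]},
        {"aa:01": [-60.0]},
    ]
    macs, r = build_radio_map(scans)
    print(macs, r, valid_fraction(scans, macs))
```

Counting detections from the scans, rather than testing entries of r against −110.0, avoids misclassifying a genuine reading that happens to equal the imputed value.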

Figure B.1: Data point locations on Bahen first floor

• BA4F Data Set

This set of training data is collected and labelled manually on a small area
(40 × 40 meters) of the 4th floor in Bahen Centre at the University of Toronto.
This data set contains 143 different data points sampled at different locations
with 10 RSS samples per location. Figure B.2 shows the actual locations of each
point.

The off-line radio map r is of size 215 × 143, meaning that there is a total of
215 MAC addresses detectable in the area shown in Figure B.2. This r matrix is
sparse since only 32% of the entries in r are actual RSS readings while the rest
are considered missing values.


Figure B.2: Data point locations on Bahen fourth floor

• BA1F-Fast Data Set

This set of training data is collected and labelled automatically using the fast
data collection method over the entire 1st floor of Bahen building, of size
70 × 80 meters. This data set contains 445 different data points sampled at
different locations with 1 RSS sample per location. Figure B.3 shows the actual
locations of each data point.

The off-line radio map r is of size 337 × 445, meaning that there is a total of
337 MAC addresses detectable from all the locations shown in Figure B.3. Compared
to the manually labelled data sets, this r matrix is even more sparse since only
11% of the entries in r are actual RSS readings while the rest are considered
missing values.

Figure B.3: Data point locations in BA1F-Fast Data Set

• BA4F-Fast Data Set

This set of training data is collected and labelled automatically using the fast
data collection method over the entire 4th floor of Bahen building, of size
70 × 80 meters. This data set contains 347 different data points sampled at
different locations with 1 RSS sample per location. Figure B.4 shows the actual
locations of each point.

The off-line radio map r is of size 259 × 347, meaning that there is a total of
259 MAC addresses detectable from all the locations shown in Figure B.4. This r
matrix is sparse since only 14.6% of the entries in r are actual RSS readings
while the rest are considered missing values.

Figure B.4: Data point locations in BA4F-Fast Data Set

• EATON3 Data Set

This set of training data is collected and labelled automatically using the fast
data collection method on level 3 of Eaton Centre, downtown Toronto. This site is
of size 70 × 80 meters. This data set contains 734 different data points sampled
at different locations with 1 RSS sample per location. Figure B.5 shows the actual
locations of each point.

The off-line radio map r is of size 419 × 734, meaning that there is a total of
419 MAC addresses detectable from all the locations shown in Figure B.5. This r
matrix is the sparsest among all the data sets: only 8.7% of the entries in r are
actual RSS readings while the rest are considered missing values.

Figure B.5: Data point locations in EATON3 Data Set

Appendix C

UML Diagram of the Android

Application

This is a UML diagram showing the structure of the Android application.


Figure C.1: Android Application Java UML Diagram

Bibliography

[1] Android Dashboard, http://developer.android.com/about/dashboards/index.html.

[2] Apache Software Foundation, http://www.apache.org/.

[3] HP iPAQ hx2750 Specifications,
http://reviews.cnet.com/pdas/hp-ipaq-hx2750/4507-31277-31218727.html.

[4] HP iPAQ hx4700 Specifications,
http://www.davespda.com/hardware/pda/pocketpc/devicea8ba.html?142.

[5] JAMA, A Java Matrix Package, http://math.nist.gov/javanumerics/jama/.

[6] Java OOP Concepts,
http://docs.oracle.com/javase/tutorial/java/concepts/index.html.

[7] Samsung Omnia II Specifications,
http://www.phonearena.com/htmls/Samsung-Omnia-II-phone-p_3790.html.

[8] Anthea Wain Sy Au. RSS-based WLAN Indoor Positioning and Tracking System
Using Compressive Sensing and Its Implementation on Mobile Devices. Master’s
thesis, University of Toronto, 2010.

[9] Paramvir Bahl and Venkata N. Padmanabhan. RADAR: An In-Building RF-based
User Location and Tracking System. In INFOCOM ’00, volume 2, pages 775–784, 2000.


[10] Yoshua Bengio, Olivier Delalleau, and Nicolas Le Roux. Label Propagation
and Quadratic Criterion. 1st edition, 2006.

[11] Philipp Bolliger. Redpin - adaptive, zero-configuration indoor localization
through user collaboration. In Proceedings of the first ACM international
workshop on Mobile entity localization and tracking in GPS-less environments,
MELT ’08, pages 55–60, New York, NY, USA, 2008. ACM.

[12] Krishna Chintalapudi, Anand Padmanabha Iyer, and Venkata N. Padmanabhan.
Indoor localization without the pain. In Proceedings of the sixteenth annual
international conference on Mobile computing and networking, MobiCom ’10,
pages 173–184, New York, NY, USA, 2010. ACM.

[13] Chen Feng, Wain Sy Anthea Au, Shahrokh Valaee, and Zhenhui Tan. Compressive
sensing based positioning using RSS of WLAN access points. In Proceedings of the
29th conference on Information communications, INFOCOM’10, pages 1631–1639,
Piscataway, NJ, USA, 2010. IEEE Press.

[14] Chen Feng, Shahrokh Valaee, Anthea Wain Sy Au, Sophia Reyes, Sameh Sorour,
Samuel N. Markowitz, Deborah Gold, Keith Gordon, and Moshe Eizenman. Anonymous
Indoor Navigation System on Handheld Mobile Devices for Visually Impaired.
International Journal of Wireless Information Networks, 19:352–367, December 2012.

[15] John M. Gottman. Time-Series Analysis: A Comprehensive Introduction for
Social Scientists. Cambridge University Press, January 1982.

[16] Jonathan L. Gross and Jay Yellen. Handbook of Graph Theory. CRC Press,
1st edition, December 2003.

[17] Bruce E. Hansen. Nonparametrics, Lecture Notes, 2009.

[18] Bruce E. Hansen. Econometrics. Number 18. January 2013.


[19] Yiming Ji, Saad Biaz, Santosh Pandey, and Prathima Agrawal. ARIADNE: a
dynamic indoor signal map construction and localization system. In Proceedings
of the 4th international conference on Mobile systems, applications and services,
MobiSys ’06, pages 151–164, New York, NY, USA, 2006. ACM.

[20] Kamol Kaemarungsi. Distribution of WLAN Received Signal Strength Indication
for Indoor Location Determination. Wireless Pervasive Computing, 2006 1st
International Symposium on, 2006.

[21] Kamol Kaemarungsi and Prashant Krishnamurthy. Modeling of Indoor Positioning
Systems Based on Location Fingerprinting. volume 2 of INFOCOM 2004. Twenty-third
Annual Joint Conference of the IEEE Computer and Communications Societies,
pages 1012–1022, March 2004.

[22] Kamol Kaemarungsi and Prashant Krishnamurthy. Properties of Indoor Received
Signal Strength for WLAN Location Fingerprinting. Mobile and Ubiquitous Systems:
Networking and Services, 2004. MOBIQUITOUS 2004. The First Annual International
Conference on, pages 14–23, August 2004.

[23] Krzysztof W. Kolodziej and Johan Hjelm. CRC Press, 1st edition, May 2006.

[24] Andrew M. Ladd, Kostas E. Bekris, Algis Rudys, Guillaume Marceau, Lydia E.
Kavraki, and Dan S. Wallach. Robotics-based location sensing using wireless
ethernet. In Proceedings of the 8th annual international conference on Mobile
computing and networking, MobiCom ’02, pages 227–238, New York, NY, USA, 2002.
ACM.

[25] Jonathan Ledlie, Jun-geun Park, Dorothy Curtis, Andre Cavalcante, Leonardo
Camara, Afonso Costa, and Robson Vieira. Mole: a Scalable, User-Generated WiFi
Positioning Engine. In Indoor Positioning and Indoor Navigation, Guimaraes,
Portugal, September 2011.


[26] Hui Liu, Houshang Darabi, Pat Banerjee, and Jing Liu. Survey of Wireless
Indoor Positioning Techniques and Systems. IEEE Transactions on Systems, Man,
and Cybernetics-Part C: Applications and Reviews, 37(6), November 2007.

[27] IndoorLBS LLC. Indoor Location Based Service Market Report, January 2013.

[28] David J.C. MacKay. Introduction to Gaussian Process. C. M. Bishop, editor,
Neural Networks and Machine Learning, NATO, ASI Series, page 133, 1998.

[29] R. Mautz. Positioning, Navigation and Communication, 2009. WPNC 2009 6th
Workshop on, March 2009.

[30] Rainer Mautz. Indoor Positioning Technologies: Application for Venia Legendi
in Positioning and Engineering Geodesy. PhD thesis, Institute of Geodesy and
Photogrammetry, Department of Civil, Environmental and Geomatic Engineering,
ETH Zurich, February 2012.

[31] Emanuel Parzen. On Estimation of a Probability Density Function and Mode.
The Annals of Mathematical Statistics, 33(3):1065–1076, 1962.

[32] Sukanta Pati. Laplacian Matrix of a Graph, 2011.

[33] Murray Rosenblatt. Remarks on Some Nonparametric Estimates of a Density
Function. 1956.

[34] Mugizi Robert Rwebangira and John Lafferty. Local Linear Semi-supervised
Regression. Technical report, 2009.

[35] Jason Small, Asim Smailagic, and Daniel P. Siewiorek. Determining User
Location for Context Aware Computing Through the Use of a Wireless LAN
Infrastructure. Technical report, Institute for Complex Engineered Systems.

[36] Amund Tveit. On the Complexity of Matrix Inversion.


[37] Z. Xiang, S. Song, J. Chen, H. Wang, J. Huang, and X. Gao. A wireless
LAN-based indoor positioning technology. IBM J. Res. Dev., 48(5/6):617–626,
September 2004.

[38] Allen Yang, Arvind Ganesh, Shankar Sastry, and Yi Ma. Fast L1-Minimization
Algorithms and An Application in Robust Face Recognition: A Review. IEEE
Transactions on Image Processing, February 2013.

[39] Moustafa Youssef and Ashok Agrawala. The Horus WLAN location determination
system. In Proceedings of the 3rd international conference on Mobile systems,
applications, and services, MobiSys ’05, pages 205–218, New York, NY, USA, 2005.
ACM.

[40] Xiaojin Zhu. Semi-Supervised Learning with Graphs. PhD thesis, School of
Computer Science, Carnegie Mellon University, May 2005.

[41] Xiaojin Zhu, Z. Ghahramani, and J. Lafferty. Semi-supervised learning using
Gaussian fields and harmonic functions. In The Twentieth International Conference
on Machine Learning, August 21-24, 2003, Washington, DC USA, pages 912–919, 2003.

[42] Xiaojin Zhu and Zoubin Ghahramani. Learning from Labeled and Unlabeled Data
with Label Propagation. 2002.

[43] Xiaojin Zhu and Andrew B. Goldberg. Introduction to Semi-supervised Learning.
Morgan and Claypool Publishers, June 2009.

[44] Xiaojin Zhu, John Lafferty, and Zoubin Ghahramani. Semi-Supervised Learning:
From Gaussian Fields to Gaussian Process. Technical report, School of Computer
Science, August 2003.
