
Trajectory prediction in cellular networks using Discrete-time Markov chains

Mati Karner, Department of Computer Science, University of Tartu

Tartu, [email protected]

Abstract: This paper presents one possible use case showing how first-order Markov chains can be used to predict user trajectories in a mobile network, and explains the motivation behind such research. We developed a software prototype that consumes events from a data feed provided by a network operator and produces a probability distribution characterizing the possible paths a Mobile Equipment (ME) holder can take during an upcoming time interval. Anonymized real-world user data was used to evaluate the accuracy of the predictions. We give an overview of the results and implementation details, and propose possible solutions to the problems and difficulties experienced during the study.

I. INTRODUCTION

Mobile devices have become more widespread than ever. According to global data published by the International Telecommunication Union, 90 people out of 100¹ have mobile telephone subscriptions [1]. Innovation in technology and reduced production costs have made mobile phones with data connectivity and support for positioning systems, such as GPS, widely available. The fact that people tend to carry cellphones with them wherever they go gives rise to new and exciting Location Based Services (LBS) and products. However, given the cost of cellular data services and considering that free WiFi access is still a rarity rather than a commodity, metadata resulting from various GSM/UMTS/LTE network operations, such as voice calls, messages and radio operations triggered by the ME, is still one of the cheapest data sources available for drawing meaningful conclusions about the characteristics of the network.

One particularly interesting data source, unfortunately unavailable for the period of study, is the RSSI (Received Signal Strength Indication) feed. Among other parameters, this feed includes the signal strength of the radio operation associated with a particular signal source. This in turn provides a means to approximate the position of the ME (using wireless triangulation). Numerous methods exist to model different aspects of RS (Radio Signal) propagation and user movement inside the network, and this is a growing subject of research ([2][3][4][5] to name but a few). In this paper we concentrate only on the Visitor Location Register (VLR) feed, which captures data from several types of events occurring in cellular networks, e.g. periodic location updates, CID (Cell ID) and LAC (Location Area Code) changes, UE (User Equipment) being switched on or off, etc. We are specifically interested in those events that contain the CID, allowing us to model user movement as a series of state transitions between the cells.

¹ 119 out of 100 for developed countries.

Our aim was to experiment with discrete-time Markov chains (DTMC) and to gain hands-on experience as well as a deeper understanding of modelling stochastic processes. Having been granted access to the anonymized VLR feed of one of the mobile network operators in Northern Europe, we developed an application that analyzes a specified interval of previously collected data for a given cellphone user. This data is used to calculate the necessary inputs for the Markov chain predictor and to pick out a series of events (a random walk of length N), which we traverse in ascending time order while providing predictions for the next possible state transition. For the sake of simplicity, we use one-hour offsets as discrete time steps and choose the 3 cells with the highest transition probabilities at each step. To evaluate the predictions, we observe the actual state at time step t + 1 and check whether a transition into this cell was predicted at time step t. A GUI tool for controlling the application and visualizing the path with the corresponding predictions on a map was also developed.
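The evaluation procedure described above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the prototype's Node.js code: the function name `top_k_hit_rate` and the toy cell IDs are hypothetical, and both input sequences are assumed to be already discretized into one-hour time steps. Ranking successor cells by raw transition counts gives the same order as ranking them by the normalized transition probabilities.

```python
from collections import Counter, defaultdict


def top_k_hit_rate(train, walk, k=3):
    """Share of steps in `walk` whose actual next cell is among the k
    most likely successors learned from `train` (both are lists of CIDs)."""
    # Count observed transitions src -> dst in the training sequence.
    counts = defaultdict(Counter)
    for src, dst in zip(train, train[1:]):
        counts[src][dst] += 1

    hits = 0
    steps = 0
    for current, actual in zip(walk, walk[1:]):
        # The k cells with the highest transition counts out of `current`.
        predicted = [cid for cid, _ in counts[current].most_common(k)]
        hits += actual in predicted
        steps += 1
    return hits / steps if steps else 0.0


# Hypothetical toy data: cell 4 never occurs in training, so the last step is a miss.
rate = top_k_hit_rate(train=[1, 2, 1, 2, 3, 1], walk=[1, 2, 3, 4])
```

In this toy run two of the three steps are predicted correctly. A cell that never occurs in the training sequence can never be predicted, which is one reason why accuracy depends on the size of the training set.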

This paper is organized as follows: Section II formulates the problem using first-order Markov chains. Section III covers the implementation. An overview of the results is given in Section IV. Section V summarizes the paper, highlights some problems and difficulties experienced during the study and proposes possible solutions to them.

II. PROBLEM FORMULATION

In this section we present how user mobility is formalized using first-order Markov chains. Each event in the VLR feed is interpreted as a state transition, and the CID of the event determines the state of the system at any given time. The state space is therefore the set of all CIDs.

Altogether, given

a) a set of $N$ states $S = \{s_1, s_2, \ldots, s_N\}$, where $s_i \in \mathbb{N}$,

b) a state transition probability matrix $A = \{a_{ij}\}$,

\[ a_{ij} = P(X_{t+1} = s_j \mid X_t = s_i), \quad 1 \le i, j \le N, \]

c) an initial distribution $\Pi = \{\pi(1), \pi(2), \ldots, \pi(N)\}$, where $\pi(i)$ indicates the probability of starting in state $s_i$,

estimate the probability distribution $P[X_{t+1:1}]$ at time $t + 1$.

The first-order Markov assumption states that only the knowledge of the present, not the past, is relevant for making predictions about the future states of the system. More formally,

\[ P(X_{t+1} = s \mid X_t = s_t, X_{t-1} = s_{t-1}, \ldots, X_1 = s_1) = P(X_{t+1} = s \mid X_t = s_t). \]

We also assume that the Markov process under consideration is time homogeneous, that is, the processes that govern the system do not change over time [6]. This allows us to compute the probability distribution for any subsequent time:

\[ P(X_{t+1} = j_t, X_t = j_{t-1}, \ldots, X_2 = j_1, X_1 = i) = a_{j_{t-1} j_t} \left( a_{j_{t-2} j_{t-1}} \left( \ldots \left( a_{i j_1} \pi(i) \right) \right) \right), \tag{1} \]

and more generally

\[ P[X_{t+1:1}] = \Pi A^t. \tag{2} \]
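Equation (2) can be illustrated with a short sketch. The Python code below is not the prototype's implementation: the three-state matrix, the state labels and the function names `step`, `predict` and `top_k` are hypothetical and chosen only for illustration. It propagates an initial distribution through $t$ transitions and then picks the most probable cells.

```python
def step(dist, A):
    """One transition: row vector `dist` multiplied by matrix `A`."""
    n = len(A)
    return [sum(dist[i] * A[i][j] for i in range(n)) for j in range(n)]


def predict(pi, A, t):
    """Distribution after t transitions, i.e. pi * A^t as in Eq. (2)."""
    dist = list(pi)
    for _ in range(t):
        dist = step(dist, A)
    return dist


def top_k(dist, states, k=3):
    """The k states with the highest probability, most probable first."""
    ranked = sorted(zip(dist, states), key=lambda pair: pair[0], reverse=True)
    return [state for _, state in ranked[:k]]


# Hypothetical three-cell example; each row of A sums to 1.
states = ["cell_a", "cell_b", "cell_c"]
A = [
    [0.5, 0.5, 0.0],
    [0.25, 0.5, 0.25],
    [0.0, 0.5, 0.5],
]
pi = [1.0, 0.0, 0.0]          # the walk is known to start in cell_a

dist = predict(pi, A, 2)      # [0.375, 0.5, 0.125]
best = top_k(dist, states, 2)  # ["cell_b", "cell_a"]
```

Repeated vector-matrix multiplication avoids forming $A^t$ explicitly, which is sufficient when only a single starting distribution is of interest.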

A. Initial distribution & state transition matrix

Let $\xi(i, j)$ be the number of transitions from state $i$ to state $j$ observed over some training set $T_N$. One intuitive way to express the likelihoods incorporated into $A$ is as follows:

\[ a_{ij} = \frac{\xi(i, j)}{\sum_k \xi(i, k)} \tag{3} \]

The latter is simply a normalized count. Using a similar approach for the initial distribution yields

\[ \pi(i) = \frac{\xi(i)}{\sum_k \xi(k)}. \tag{4} \]
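The normalized counts of Eqs. (3) and (4) can be computed along the following lines. This Python sketch is illustrative only: the function name `estimate_model` and the toy cell IDs are hypothetical, and it assumes that $\xi(i)$ in Eq. (4) is the number of events observed in cell $i$, which the text does not define explicitly. A state with no outgoing transitions is left with an all-zero row.

```python
from collections import Counter


def estimate_model(cids):
    """Estimate (states, pi, A) from a time-ordered list of cell IDs."""
    states = sorted(set(cids))
    index = {cid: n for n, cid in enumerate(states)}
    size = len(states)

    # xi[i][j]: number of observed transitions i -> j.
    xi = [[0] * size for _ in range(size)]
    for src, dst in zip(cids, cids[1:]):
        xi[index[src]][index[dst]] += 1

    # Eq. (3): divide every count by the total of its row.
    A = []
    for row in xi:
        total = sum(row)
        A.append([count / total if total else 0.0 for count in row])

    # Eq. (4), assuming xi(i) is the number of events seen in cell i.
    visits = Counter(cids)
    pi = [visits[cid] / len(cids) for cid in states]
    return states, pi, A


# Hypothetical event sequence for one subscriber.
states, pi, A = estimate_model([101, 102, 101, 103, 101, 102])
# Row for cell 101: transitions to 102 twice and to 103 once.
```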

III. IMPLEMENTATION

The application prototype was built on Node.js and consists of an HTTP server and a web application.

A. Web application

We used the OpenLayers library with Stamen tiles for the map. Cell areas were drawn based on coverage area maps (multipolygons) provided by the operator. The application also features a date range selector which, when submitted, queries the HTTP server for transition estimates together with the ten last events, which are then visualized on the map.

B. HTTP Server

In addition to serving static assets, the server handles transition estimate requests, picks out the events of the corresponding time interval from the VLR log and passes them to the DTMC predictor.

The majority of the functions in the DTMC implementation operate on a data structure called the state map, which maps the CIDs to their indexes in A as well as in the initial distribution vector. The state map also counts state transitions, and these counts are in turn used to calculate the initial distribution and the state transition probability matrix. We use the Numeric library for Node.js for the matrix-vector (or matrix-matrix) operations in (2).
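A data structure of this kind might look roughly like the sketch below. It is written in Python for illustration and is not the prototype's Node.js code: the class name `StateMap`, its attributes and its methods are hypothetical. It also assumes that consecutive events with the same CID are counted as self-transitions, which the text does not specify.

```python
class StateMap:
    """Maps cell IDs to matrix indexes and counts observed transitions."""

    def __init__(self):
        self.index = {}        # CID -> row/column index in the matrix
        self.visits = []       # visits[i]: number of events seen in state i
        self.transitions = {}  # (i, j) -> number of transitions i -> j
        self._last = None      # index of the previously observed state

    def _index_of(self, cid):
        # Assign the next free index the first time a CID is seen.
        if cid not in self.index:
            self.index[cid] = len(self.index)
            self.visits.append(0)
        return self.index[cid]

    def observe(self, cid):
        """Feed the next event of a time-ordered VLR log."""
        i = self._index_of(cid)
        self.visits[i] += 1
        if self._last is not None:
            key = (self._last, i)
            self.transitions[key] = self.transitions.get(key, 0) + 1
        self._last = i

    def count_matrix(self):
        """Dense matrix of transition counts, indexed like `index`."""
        n = len(self.index)
        dense = [[0] * n for _ in range(n)]
        for (i, j), count in self.transitions.items():
            dense[i][j] = count
        return dense


# Hypothetical cell IDs.
state_map = StateMap()
for cid in [5001, 5002, 5001, 5001]:
    state_map.observe(cid)
```

Storing the transitions sparsely keeps memory use proportional to the number of distinct transitions actually observed, while `count_matrix` produces the dense form needed for matrix operations.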

Fig. 1. Prediction accuracy (% of events) versus number of days in $T_N$. Calculations with 50 days (7300 events) of VLR data. Sample 1.

Fig. 2. Prediction accuracy (% of events) versus number of days in $T_N$. Calculations with 50 days (3200 events) of VLR data. Sample 2.

Fig. 3. Events from Figure 1 visualized on the map. Polygons denote cell areas.

Fig. 4. Events from Figure 1 visualized on the map (zoomed). Polygons denote cell areas.

IV. RESULTS

As seen in Figures 1 and 2, the accuracy of the predictions grows with the size of the training set. Both series seem to converge to a certain value for training sets containing 45 to 50 days of data, i.e. ≥ 2000 events. Data sets that contain events localized to a certain geographical area, without significant outliers, tend to give better predictions. This is the difference between Sample 1 and Sample 2. Sample 1 contains a large cluster of events near Tartu (see Figure 3), but also has a few smaller ones in Tallinn, near Narva and on Hiiumaa, and multiple trails between Tartu and Tallinn. Sample 2, on the other hand, contains events originating from Harjumaa county and Tallinn. Assuming sufficiently large data sets, on average 40% of the events are among the ones predicted for non-localized data sets, and 63% to 65% for localized sets.

The current implementation of the VLR feed caches around two months of events for each network subscriber. It would be interesting to run the application on a data set containing at least half a year of events. Such measurements, although possible, require better planning, since the log has to be concatenated from multiple pieces, each captured at two-month intervals.

V. CONCLUSIONS

In this paper we implemented an application that predicts user movement between the cells of a cellular network. The accuracy of the predictions was around 40% for non-localized data sets and 65% for localized data sets. For possible applications this is promising, but clearly not enough. Given the accuracy difference between localized and non-localized training sets, it might be worth experimenting with predictions that, for a given cell ID, only take into account the transition probabilities into the neighboring cells when predicting the next time step. This could be further improved by precalculating distributions over the historical data of all subscribers for each cell in the network. If the historical data for a particular user does not contain any transitions from one specific cell to another, resulting in a zero probability either in the initial distribution or in the state transition matrix, the precalculated distribution would be a helpful clue. Another approach to improving the accuracy is to use higher-order Markov chains, which build more "memory" into the states under observation, but are also more costly in terms of computation.
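One way the neighbor restriction and the precalculated fallback could be combined is sketched below. This is a speculative Python illustration of the idea, not something that was implemented or evaluated in this study: the function name `restrict_to_neighbours`, the dictionary-based representation and the toy cell IDs are all hypothetical.

```python
def restrict_to_neighbours(row, neighbours, fallback=None):
    """Keep only transitions into neighbouring cells and renormalize.

    row        -- dict CID -> probability, the user's transitions out of the current cell
    neighbours -- set of CIDs adjacent to the current cell
    fallback   -- optional dict CID -> probability precalculated over all subscribers
    """
    kept = {cid: p for cid, p in row.items() if cid in neighbours and p > 0}
    if not kept and fallback:
        # The user's own history has no usable transition: use the shared distribution.
        kept = {cid: p for cid, p in fallback.items() if cid in neighbours and p > 0}
    total = sum(kept.values())
    return {cid: p / total for cid, p in kept.items()} if total else {}


# Hypothetical example: cell 99 is not adjacent, so its probability mass is dropped.
user_row = {10: 0.5, 20: 0.25, 99: 0.25}
restricted = restrict_to_neighbours(user_row, neighbours={10, 20})
```

In the example the remaining probabilities are rescaled to two thirds and one third. Whether such a restriction actually improves accuracy would have to be measured, since it also discards genuine transitions between cells that are not adjacent on the coverage map.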

Future work on this topic involves modelling the mobility of the user as a series of state transitions, where each state also describes the Cartesian coordinates of the ME as well as its speed and velocity. A crucial means for this is the signaling feed described in Section I. This is, however, a much more difficult task to tackle because of the various components included in the mobility model, e.g. RSSI propagation. There are numerous interesting papers available on the topic that highlight novel approaches and therefore further encourage the author to continue with the project.

REFERENCES

[1] ICT. Key 2005-2015 ICT data for the world. http://www.itu.int/en/ITU-D/Statistics/Documents/statistics/2015/ITU_Key_2005-2015_ICT_data.xls, 2015. [Online; accessed 16-May-2016].

[2] Zengshan Tian, Xindi Liu, Mu Zhou, and Kunjie Xu. Mobility tracking by fingerprint-based KNN/PF approach in cellular networks. In 2013 IEEE Wireless Communications and Networking Conference (WCNC), pages 4570–4575, April 2013.

[3] Zigang Yang and Xiaodong Wang. Joint mobility tracking and hard handoff in cellular networks via sequential Monte Carlo filtering. In INFOCOM 2002. Twenty-First Annual Joint Conference of the IEEE Computer and Communications Societies. Proceedings. IEEE, volume 2, pages 968–975, 2002.

[4] Z. R. Zaidi and B. L. Mark. Mobility tracking based on autoregressive models. IEEE Transactions on Mobile Computing, 10(1):32–43, Jan 2011.

[5] Z. R. Zaidi and B. L. Mark. Real-time mobility tracking algorithms for cellular networks based on Kalman filtering. IEEE Transactions on Mobile Computing, 4(2):195–208, March 2005.

[6] S. Russell and P. Norvig. Artificial Intelligence: A Modern Approach, 3rd edition. Pearson Education, Inc, 2013.
