Hindawi, Wireless Communications and Mobile Computing, Volume 2018, Article ID 7373286, 19 pages, https://doi.org/10.1155/2018/7373286

Research Article

Efficient Processing of Moving Top-k Spatial Keyword Queries in Directed and Dynamic Road Networks

Muhammad Attique,1 Hyung-Ju Cho,2 and Tae-Sun Chung3

1 Department of Software, Sejong University, Republic of Korea
2 Department of Software, Kyungpook National University, Republic of Korea
3 Department of Software, Ajou University, Republic of Korea

Correspondence should be addressed to Tae-Sun Chung; tschung@ajou.ac.kr

Received 28 May 2018; Revised 15 August 2018; Accepted 18 September 2018; Published 1 November 2018

Academic Editor: Ke Guan

Copyright © 2018 Muhammad Attique et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

A top-k spatial keyword (TkSK) query ranks objects based on the distance to the query location and textual relevance to the query keywords. Several solutions have been proposed for top-k spatial keyword queries. However, most of the studies focus on Euclidean space or only investigate snapshot queries where both the query and the data objects are static. A few algorithms study TkSK queries in undirected road networks, where each edge is undirected and the distance between two points is the length of the shortest path connecting them. However, TkSK queries have not been thoroughly investigated in directed and dynamic spatial networks, where each edge has a particular orientation and its weight changes according to the traffic conditions. Therefore, in this study, we address this problem by presenting a new method, called COSK, for processing continuous top-k spatial keyword queries for moving queries in directed and dynamic road networks. We first propose an efficient framework to process snapshot TkSK queries. Furthermore, we propose a safe-exit-based approach to monitor the validity of the results for moving TkSK queries. Our experimental results demonstrate that COSK significantly outperforms existing techniques in terms of query processing time and communication cost.

1. Introduction

With the popularization of geo-tagged data (e.g., geo-tagged photos, videos, check-ins, and text messages), many online location-based services such as Google Maps, Yahoo Maps, and Bing Maps have started providing useful information via location-based queries [1–4]. Moreover, textual descriptions of points of interest, e.g., hotels, shopping malls, and tourist attractions, are easily accessible on the Web. These developments demand techniques that efficiently process top-k spatial keyword queries, which return a ranked list of the k best facilities based on their proximity to the query location and relevance to the query keywords. Several algorithms have been proposed for processing top-k spatial keyword queries in Euclidean space [5, 6]. Although a few algorithms study keyword queries in road networks, they all focus on undirected road networks. However, in real scenarios, urban road networks are directed and dynamic: each edge has a particular orientation, and its weight changes according to traffic conditions such as traffic congestion and reversible lanes. Therefore, in this study, we investigate moving top-k spatial keyword queries in directed and dynamic road networks.

Top-k keyword queries can be used for a wide range of applications in recommendation and decision support systems. For example, tourists may want to retrieve a sorted list of restaurants that serve Italian steak based on the shortest distance from their location and textual relevance to the query keywords. Tourists can issue a top-k spatial keyword query to the location-based services (LBS) to collect information about qualifying restaurants in their vicinity. With a moving top-k spatial keyword query, if they do not like the results, they can simply keep moving, and updated results are provided until a desired restaurant is found. Typically, the query issuer follows the underlying road network to reach the desired location. Therefore, TkSK algorithms based on Euclidean space do not work in road networks. A road network is generally


[Figure 1: Illustration of directed road network. Labels recovered from the figure: query point q; nodes n1 to n7; data objects d1 (Grand Hotel), d2 (Cafe), d3 (Italian Restaurant), d4 (Chinese Restaurant), d5 (Pub and Bar), d6 (Italian Restaurant), and d7 (Cafe and Bakery); the numbers on the edges are edge weights.]

modeled as a weighted directed graph, where each edge has some direction and its weight can vary according to the traffic conditions.

Given a set of data objects $D = \{d_1, d_2, \ldots, d_{|D|}\}$, a query location, and a set of keywords, the TkSK query returns the best k data objects from D according to their combined textual and spatial relevance to the query. We use the distance function $dist(q, d)$ to represent the shortest network distance from q to data object d. Figure 1 presents an example of a directed road network, where rectangles represent the data objects with a textual description and the triangle represents the query location. The number label on each edge indicates the weight of that edge, such as the amount of time required to travel along it, e.g., $dist(d_1, n_1) = 1$ and $dist(d_1, n_2) = 2$. Consider a scenario where a tourist is interested in finding an "Italian Restaurant". If an undirected road network is considered, the top-1 "Italian Restaurant" is $d_6$. However, in a directed road network, the shortest path from q to $d_6$ is $(q \rightarrow n_3 \rightarrow n_7 \rightarrow d_6)$. Therefore, for a directed road network, the top-1 result is $d_3$ because it is closer to the query location than $d_6$. Now consider that the tourist is looking for "Cafe Bakery". The data object $d_7$ could score higher than data object $d_2$ because $d_7$ ("Cafe and Bakery") is more textually relevant to the query keywords than $d_2$ ("Cafe") and $dist(q, d_7)$ is only marginally greater than $dist(q, d_2)$.
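The effect of edge orientation on $dist(\cdot, \cdot)$ can be illustrated with a minimal Python sketch. It runs a textbook Dijkstra search over a small directed graph; the graph `toy` and the function name `shortest_dist` are hypothetical and do not reproduce the topology of Figure 1.

```python
import heapq


def shortest_dist(graph, source, target):
    """Dijkstra over a directed graph given as {node: [(neighbor, weight), ...]}."""
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > best.get(u, float("inf")):
            continue  # stale heap entry, a shorter path to u was already found
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < best.get(v, float("inf")):
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")  # target is not reachable from source


# Hypothetical one-way ring a -> b -> c -> a (NOT the network of Figure 1).
toy = {
    "a": [("b", 1.0)],
    "b": [("c", 2.0)],
    "c": [("a", 4.0)],
}

print(shortest_dist(toy, "a", "b"))  # 1.0
print(shortest_dist(toy, "b", "a"))  # 6.0, the return trip must go around the ring
```

Because every edge in the toy graph is one-way, the distance from a to b differs from the distance from b to a, which is why a ranking computed on an undirected version of the same network can differ from the ranking on the directed one.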

Moving top-k spatial keyword queries in directed and dynamic road networks are useful for many location-based applications. However, query processing is costly because the movement of query object q may invalidate the query results. Therefore, the main challenge in moving TkSK queries is to maintain the freshness of the query results while the query objects are moving freely. A straightforward approach is to increase the update frequency of the query. However, this approach still cannot guarantee up-to-date query results, and it increases the computation and communication overhead: whenever the query object changes its location, it has to report the new location to the server, which increases the communication cost, and the server has to recompute the results, which increases the computation cost.

To address the aforementioned challenges, we first present an efficient processing technique for snapshot TkSK queries in directed road networks. Then, we present a safe-exit-based approach for processing and monitoring moving TkSK queries, where query object q moves freely in a directed spatial network. A safe exit point of query object q represents a boundary point between the safe region and the non-safe region of q. The safe region of a query object is the region within which the query result remains valid. Therefore, the query results are recomputed only when q leaves its safe region, which significantly reduces the computation and communication costs. To the best of our knowledge, this is the first attempt to study moving top-k spatial keyword queries in directed and dynamic road networks.

Below, we summarize our contributions:

(i) We study the problem of continuous monitoring of moving top-k spatial keyword queries in directed and dynamic road networks.

(ii) We present an algorithm to monitor moving TkSK queries, which efficiently computes the safe exit points for query object q in a directed road network. The algorithm significantly reduces the computation and communication costs for moving queries.

(iii) We also propose a method that monitors the validity of the query results and the safe region when the weights of road segments are updated due to traffic conditions.

(iv) Finally, we conduct extensive experiments on real road network datasets and demonstrate the superiority of the proposed algorithm over the existing approach.

The remainder of this paper is structured as follows. Section 2 reviews the existing work on the processing of TkSK queries in Euclidean space and road networks. Section 3 provides terminology definitions and describes the problem. Section 4 elaborates on the proposed query processing technique for TkSK queries in directed road networks. In Section 5, we present our safe-exit-based technique to process moving TkSK queries. Section 7 presents a performance analysis of the proposed technique. Section 8 concludes this paper.

2. Related Work

In this section, we discuss some of the promising related studies of top-k spatial keyword queries. Our related work is divided into two sections: Section 2.1 reviews snapshot TkSK queries, and Section 2.2 presents the studies proposed to address moving TkSK queries.

2.1. Snapshot Top-k Spatial Keyword Queries. In recent years, spatial keyword queries have drawn the attention of many researchers. Several approaches have been proposed for ranking spatial data objects. Initially, Zhou et al. [7] worked on combining inverted indexes [8] and R-trees [9]. They proposed three different hybrid indexing structures. Their study demonstrated that building an inverted index on top of an R-tree provides superior performance. Hariharan et al. [10] proposed the indexing structure KR*-tree by capturing the joint distribution of keywords in space. De Felipe et al. [11] proposed a data structure that combines an R-tree with text signatures. Each node of the R-tree exploits a signature to indicate the presence of keywords in the subtree of the node. However, both these approaches address only Boolean keyword queries in Euclidean space.

Top-k spatial keyword queries, where data objects are ranked according to their combined textual and spatial relevance to keyword queries, were first studied by Cong et al. [5] and Li et al. [6]. Both studies [5, 6] integrate location indexing and text indexing to generate IR-trees. These studies process top-k spatial keyword queries only in Euclidean space and are not suitable for processing top-k spatial preference queries in road networks, where the distance between objects is determined by the shortest path connecting them. Later, Rocha et al. [12] proposed the indexing technique S2I, which maps each term in the vocabulary into a separate block or aR-tree for efficient processing of top-k spatial keyword queries. Zhang et al. [13] proposed an m-closest keyword query that returns the closest object, based on distance, that matches the m query keywords.

Top-k spatial keyword queries in road networks were introduced by Rocha et al. [14]. In particular, they proposed three different indexing techniques (Basic Indexing, Enhanced Indexing, and Overlay Indexing) for processing spatial keyword queries in road networks.

2.2. Moving Top-k Spatial Keyword Queries. Recently, the research focus has shifted to the continuous processing of spatial queries where query or data objects move arbitrarily in road networks, which is the most realistic scenario. Considerable research effort has been undertaken to process moving range, k nearest neighbor (kNN), and reverse k nearest neighbor (RkNN) queries [15–18]. However, there is a lack of efficient algorithms for moving top-k spatial keyword queries. Initially, Wu et al. [19] and Huang et al. [20] proposed different methods for monitoring top-k spatial keyword queries in Euclidean space. Guo et al. [21] studied moving top-k spatial keyword queries on road networks. They presented two methods for monitoring moving queries in a continuous manner that reduce the traversal of network edges. Later, Li et al. [22] proposed TPR-tree-based indexing to monitor moving top-k spatial keyword queries. In contrast to [21, 22], in this study we consider moving top-k spatial keyword queries in directed and dynamic road networks, where each road segment has a particular orientation and its weight changes according to traffic conditions.

Table 1 compares our problem scenario with related work in terms of query type, space domain, and orientation of road networks.

Table 1: Comparisons with existing solutions.

Algorithm           Type      Space domain   Orientation
Cong et al. [5]     Snapshot  Euclidean      No orientation
Rocha et al. [14]   Snapshot  Static road    Undirected
Wu et al. [19]      Moving    Euclidean      No orientation
Huang et al. [20]   Moving    Euclidean      No orientation
Guo et al. [21]     Moving    Static road    Undirected
Li et al. [22]      Moving    Static road    Undirected
COSK                Moving    Dynamic road   Directed

3. Preliminaries

Section 3.1 defines the terms and notations used in this paper. Section 3.2 formulates the problem using an example that illustrates the general results of top-k spatial keyword queries.

3.1. Definition of Terms and Notations

3.1.1. Road Network. A road network is represented by a weighted directed graph $G = (N, E, W)$, where $N$, $E$, and $W$ denote the node set, edge set, and edge distance matrix, respectively. The network distance of an edge changes depending on the traffic conditions. Each edge is also assigned an orientation that is either undirected or directed. An undirected edge is represented by $e = (n_s, n_e)$, where $n_s$ and $n_e$ are the boundary nodes $n_\beta$ of the edge, whereas a directed edge is represented by $e = \overrightarrow{(n_s, n_e)}$ or $e = \overleftarrow{(n_e, n_s)}$. The arrow above the edge indicates the associated direction. We refer to $n_s$ as the starting node and $n_e$ as the ending node of an edge. For example, in Figure 1, $n_6$ is the starting node of edge $\overrightarrow{(n_6, n_2)}$, whereas it is the ending node of edge $\overleftarrow{(n_6, n_5)}$. The particular edge where a query object is located is called an active edge. It is important to note that the distance between two points $p_1$ and $p_2$ is not symmetric in directed road networks (i.e., in general, $dist(p_1, p_2) \neq dist(p_2, p_1)$). For example, in Figure 1, $dist(d_3, d_4) = 3$, whereas $dist(d_4, d_3) = 11$ because the shortest path from $d_4$ to $d_3$ is $(d_4 \rightarrow n_6 \rightarrow n_2 \rightarrow n_3 \rightarrow d_3)$.

3.1.2. Segment. Segment $s = (p_1, p_2)$ is the part of an edge between two points $p_1$ and $p_2$ on the edge. An edge consists of one or more segments. An edge is also considered a segment whose end points are the nodes of the edge. The weight of a segment $s = (p_1, p_2)$ is denoted by $W(s)$.

3.2. Problem Formulation. Similar to previous studies [5, 14, 23], we assume each data object $d \in D$ has a point location $d.l$ in the road network and a text description $d.t$. Given a query location $q.l$, a set of keywords $q.t$, and the number k of data objects to return, the top-k spatial keyword query is defined as $Q_k = (q.l, q.t, k)$. It returns the best k data objects from D according to a score that considers spatial proximity and text relevance. The score $\psi(d)$ of a data object d is defined by the following equation:

$$\psi(d) = \frac{\mu(d.t, q.t)}{1 + \alpha \cdot \lambda(d.l, q.l)} \qquad (1)$$

where $\lambda(d.l, q.l)$ is the spatial relevance between $d.l$ and $q.l$, $\mu(d.t, q.t)$ is the textual relevance between $d.t$ and $q.t$, and $\alpha$ is a nonnegative real number that determines the importance of one measure over the other. For example, if only textual relevance is considered, then $\alpha = 0$. If more importance is given to spatial relevance, then $\alpha > 1$.
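As a minimal sketch of Eq. (1), the following Python function evaluates the score from a precomputed textual relevance and network distance. The function name `score` and the numeric inputs are illustrative assumptions, not values taken from the experiments.

```python
def score(text_rel, net_dist, alpha=1.0):
    """psi(d) = mu / (1 + alpha * lambda), following Eq. (1).

    text_rel -- textual relevance mu(d.t, q.t), expected in [0, 1]
    net_dist -- spatial relevance lambda(d.l, q.l), a network distance >= 0
    alpha    -- nonnegative weight of distance against text
    """
    if alpha < 0 or net_dist < 0:
        raise ValueError("alpha and net_dist must be nonnegative")
    return text_rel / (1.0 + alpha * net_dist)


# Illustrative values only: a partial text match at network distance 7.
print(score(0.5, 7.0))             # 0.0625
# With alpha = 0 the distance is ignored and only the text match counts.
print(score(0.5, 7.0, alpha=0.0))  # 0.5
# A closer, weaker match can outrank a farther, perfect match.
print(score(0.5, 1.0) > score(1.0, 4.0))  # True (0.25 vs 0.2)
```

The last line shows the trade-off that $\alpha$ controls: increasing $\alpha$ penalizes distance more strongly, so nearby objects with partial matches move up the ranking.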

Spatial relevance ($\lambda$) is defined as the shortest distance between data object d and query q: $\lambda(d.l, q.l) = dist(d.l, q.l)$. Thus, $dist(d_i.l, q.l) < dist(d_j.l, q.l)$ indicates that data object $d_i$ is more spatially relevant to q than data object $d_j$. The textual relevance ($\mu$) can be computed using any popular information retrieval model, such as cosine similarity or the language model. In this study, we use the cosine similarity between $d.t$ and $q.t$. The textual relevance is defined as follows:

$$\mu(d.t, q.t) = \frac{\sum_{t \in q.t} w_t(d.t)\, w_t(q.t)}{\sqrt{\sum_{t \in d.t} [w_t(d.t)]^2 \, \sum_{t \in q.t} [w_t(q.t)]^2}} \qquad (2)$$

The weight $w_t(d.t) = 1 + \ln(f_t(d.t))$, where $f_t(d.t)$ represents the frequency of term t in $d.t$. The weight $w_t(q.t) = \ln(1 + |D| / df_t)$, where $|D|$ is the number of objects in D and $df_t$ is the document frequency of term t. A higher $\mu$ means a higher textual relevance to the query keywords. We use the variation of cosine similarity based on the significance factor $\theta_t(n)$ of term t in a document n, where n represents the description $d.t$ of a data object or the query keywords $q.t$. The significance $\theta_t(n) = w_t(n) / \sqrt{\sum_{t \in n} (w_t(n))^2}$ is the normalized weight of the term in the document, which takes into account the length of the document [24, 25]. Hence, the textual relevance $\mu(d.t, q.t)$ can be rewritten as

$$\mu(d.t, q.t) = \sum_{t \in q.t} \theta_t(d.t)\, \theta_t(q.t) \qquad (3)$$
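The weights and the significance-based form of the textual relevance in Eq. (3) can be sketched in Python as follows. This is only an illustration under stated assumptions: descriptions are already tokenized with stop words removed, query terms that occur in no object receive weight 0, and the collection statistics (`DOC_FREQ`, 7 objects) are hypothetical.

```python
import math
from collections import Counter


def doc_weights(terms):
    """w_t(d.t) = 1 + ln(f_t(d.t)) for every term t in a description."""
    return {t: 1.0 + math.log(f) for t, f in Counter(terms).items()}


def query_weights(terms, num_objects, doc_freq):
    """w_t(q.t) = ln(1 + |D| / df_t); unseen terms get weight 0 (assumption)."""
    weights = {}
    for t in set(terms):
        df = doc_freq.get(t, 0)
        weights[t] = math.log(1.0 + num_objects / df) if df > 0 else 0.0
    return weights


def significance(weights):
    """theta_t(n) = w_t(n) / sqrt(sum over t of w_t(n)^2)."""
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: (w / norm if norm > 0.0 else 0.0) for t, w in weights.items()}


def textual_relevance(doc_terms, query_terms, num_objects, doc_freq):
    """mu(d.t, q.t) = sum over t in q.t of theta_t(d.t) * theta_t(q.t), Eq. (3)."""
    theta_d = significance(doc_weights(doc_terms))
    theta_q = significance(query_weights(query_terms, num_objects, doc_freq))
    return sum(theta_d.get(t, 0.0) * th for t, th in theta_q.items())


# Hypothetical statistics: 7 objects, "cafe" occurs in 2 and "bakery" in 1.
DOC_FREQ = {"cafe": 2, "bakery": 1}
QUERY = ["cafe", "bakery"]

mu_cafe = textual_relevance(["cafe"], QUERY, 7, DOC_FREQ)
mu_cafe_bakery = textual_relevance(["cafe", "bakery"], QUERY, 7, DOC_FREQ)
print(mu_cafe < mu_cafe_bakery)  # True
```

Under these assumptions, a description that covers both query terms obtains a higher $\mu$ than one that covers only the more common term, which matches the "Cafe Bakery" intuition: the rarer term "bakery" carries the larger query weight.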

4. Query Processing System

In this section, we present the proposed query processing system, which indexes the data objects and prunes the irrelevant edges for efficient query processing. In Section 4.1, we discuss the indexing framework, and in Section 4.2, we present an efficient keyword query processing algorithm for snapshot queries.

4.1. Indexing Framework. In this study, our main work focuses on moving queries in directed and dynamic road networks. We use a method similar to the enhanced technique presented in [12] as our basic framework for processing snapshot queries in directed and dynamic road networks. The indexing framework combines a road network framework [1] for storing spatial information and an inverted file for indexing data objects. For easy traversal of the network, we store the adjacent nodes of each node together with the node id ($n_{id}$), the edge id ($e_{id}$), the direction of the edge, and the weight of the edge. The indexing framework consists of two main components: a pruning component and an inverted file component. Figure 2 illustrates the main components of the indexing framework. The pruning component first prunes the edges that contain only data objects irrelevant to the query keywords. To achieve this, we introduce the highest significance $\theta_t^+$ of a given term t in the descriptions of the objects lying on an edge. The $\theta_t^+$ of an edge is retrieved by a key composed of a pair of edge id and term id $(e_{id}, t_{id})$. The $\theta_t^+$ represents an upper bound on the significance of term t for any object lying on the edge with term t in its description. The inverted list of a term t on an edge is accessed only if the upper-bound score composed of $\theta_t^+$ and the minimum network distance between the starting node of the edge and query q may yield a candidate data object. Consequently, the edges with upper-bound scores smaller than the score of the k-th object found so far are pruned.

We implement an inverted file for indexing data objects. The inverted file contains a vocabulary and inverted lists. The vocabulary keeps general information about each term (such as the frequency of the term), which is helpful in computing the textual relevance of the data objects. An inverted list stores the data objects located on the edge $\overrightarrow{(n_s, n_e)}$ that have a term t in their description. An inverted list is identified by a key composed of $(e_{id}, t_{id})$. Each inverted file is a set of inverted lists, and a separate inverted list is used for each term in the object descriptions. An inverted list stores two attributes for each data object: first, the distance $dist(n_s, d_i)$ between the starting node and the data object; second, the significance factor $\theta(t_i, d_i)$ of the term $t_i$ in the description of the data object. Note that the network distance between two points in a directed road network is not symmetric (i.e., in general, $dist(n_s, d_i) \neq dist(d_i, n_s)$). Recall that the starting node is chosen according to the orientation of the edge such that the direction of the edge is from the node toward the data object. In Figure 1, $n_3$ is the starting node for $d_7$. For bidirectional edges, either of the adjacent nodes can act as the starting node.

The proposed indexing scheme has three main advantages. First, searching for objects relevant to the query keywords is very efficient using the $(e_{id}, t_{id})$ pair. Second, the inverted files also store the network distance between the starting node and the data object, which helps in accessing the data object in the directed road network. Finally, the pruning technique allows for faster query processing by exploring fewer edges.
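One possible in-memory layout of this index is sketched below in Python. The class and method names (`EdgeInvertedFile`, `Posting`, `add`, `postings`, `max_significance`, `doc_freq`) are hypothetical, the example entries are invented, and a disk-based implementation would look different. The sketch only shows how the $(e_{id}, t_{id})$ key gives direct access to both the inverted list and the upper bound $\theta_t^+$.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Posting:
    object_id: str
    dist_from_start: float  # dist(n_s, d_i), measured along the edge direction
    significance: float     # theta(t_i, d_i)


class EdgeInvertedFile:
    """Inverted lists and theta_t^+ values, both keyed by (edge id, term id)."""

    def __init__(self):
        self._lists = defaultdict(list)     # (eid, tid) -> [Posting, ...]
        self._max_sig = {}                  # (eid, tid) -> theta_t^+
        self._doc_freq = defaultdict(int)   # vocabulary: tid -> df_t

    def add(self, eid, tid, object_id, dist_from_start, significance):
        """Register one (object, term) pair; call once per distinct pair."""
        key = (eid, tid)
        self._lists[key].append(Posting(object_id, dist_from_start, significance))
        self._max_sig[key] = max(self._max_sig.get(key, 0.0), significance)
        self._doc_freq[tid] += 1

    def postings(self, eid, tid):
        return list(self._lists.get((eid, tid), []))

    def max_significance(self, eid, tid):
        """Upper bound theta_t^+ for term tid on edge eid (0 if absent)."""
        return self._max_sig.get((eid, tid), 0.0)

    def doc_freq(self, tid):
        return self._doc_freq.get(tid, 0)


# Invented entries for illustration only.
index = EdgeInvertedFile()
index.add("e7", "cafe", "d2", 1.0, 0.9)
index.add("e7", "cafe", "d7", 3.0, 0.6)
index.add("e7", "bakery", "d7", 3.0, 0.8)
print(index.max_significance("e7", "cafe"))  # 0.9
print(index.doc_freq("cafe"))                # 2
```

Maintaining $\theta_t^+$ at insertion time means that an edge can be tested for pruning without reading its inverted list.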

[Figure 2: Indexing framework. Components recovered from the figure: a pruning component that maps a key $\langle e_{id}, t_{id} \rangle$ to $\theta_t^+$, and an inverted file consisting of a vocabulary ($t_{id}$, $df_{t_{id}}$) and inverted lists that map a key $\langle e_{id}, t_{id} \rangle$ to entries ($d_i$, $dist(n_s, d_i)$, $\theta(d_i, t)$). Steps shown: (1) compute the upper-bound score using $dist(n, q)$ and $\theta_t^+$; (2) the inverted list of a term is accessed only if the upper-bound score is greater than the score of the k-th object.]

Table 2 presents the notations used in this study.

Table 2: Summary of notations used in this paper.

Notation              Definition
$G = (N, E, W)$       Graph model of road network
$dist(p_s, p_e)$      Length of shortest path from $p_s$ to $p_e$, where $p_s$ and $p_e$ represent start and end points, respectively
$len(p_1, p_2)$       Length of segment connecting two points $p_1$ and $p_2$
$n_i$                 Node in road network
$e = (n_s, n_e)$      Edge in edge set E, where $n_s$ and $n_e$ are start and end points of the edge
$n_\beta$             Boundary node corresponding to start ($n_s$) or end ($n_e$) point of an edge
$W(e)$                Weight of edge $(n_s, n_e)$
q                     Query point in road network
k                     Number of data objects to return
D                     Set of data objects, $D = \{d_1, d_2, \ldots, d_{|D|}\}$
$D(n_s, n_e)$         Set of data objects in an edge
$p_a$                 Anchor point that corresponds to start point of expansion
$P_{SE}$              Safe exit point where safe and non-safe regions of q intersect
$\alpha$              Query parameter
$\psi(d)$             Score of data object d
$\mu(d.t, q.t)$       Textual relevance of data object d with query keywords
$\lambda(d.l, q.l)$   Spatial relevance of data object d with query location
$D^+$                 Set of answer objects
$D^-$                 Set of non-answer objects
$d_l^+$               Lowest answer object
$d_h^-$               Highest non-answer object

4.2. Query Processing Algorithm. Our algorithm traverses the road network incrementally in a fashion similar to Dijkstra's algorithm [26]. Algorithm 1 returns the top-k data objects with the highest scores according to their joint textual and spatial relevance to the query. The algorithm begins by exploring the active edge where query object q is located and expands the network in increasing order of distance from q. Each entry in the min-heap has the form $(p_a, edge)$, where $p_a$ indicates the anchor point in the edge. For an active edge, q becomes the anchor point. Otherwise, for directed edges, the ending node $n_e$ becomes the anchor point. For bidirectional edges, either of the adjacent boundary nodes, i.e., $n_s$ or $n_e$, becomes the anchor point. Let $D_k$ be the current set of top-k data objects and $s_k$ be the score of the k-th data object in $D_k$. The $candsearch((e_{id}, t_{id}), s_k)$ function retrieves the candidate data objects $D_c$ located in an edge with a better score $\psi(d)$ than $s_k$. Next, the set $D_k$ is updated with the data objects in $D_c$, and $s_k$ is updated accordingly. The algorithm continues its expansion and inserts the adjacent edges of the boundary node until the heap is exhausted or the remaining data objects cannot have a better score than $s_k$. The upper-bound score $\psi(n)$ of node n is computed using $dist(n, q)$ and the maximum textual relevance ($\mu = 1$). Therefore, if $\psi(n) \leq s_k$, then even if there is an unexplored data object d matching all query keywords, its score cannot be better than that of the k-th object in $D_k$ because $dist(d.l, q.l) \geq dist(n, q.l)$. This is guaranteed because the algorithm always expands the node with the minimum distance to the query location.
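The incremental expansion and its stopping rule can be sketched in Python as follows. This is a simplified illustration, not the paper's implementation: the query is assumed to sit on a node rather than inside an active edge, the textual relevance `mu` of each object is assumed to be precomputed, objects on an edge are assumed reachable only from its starting node, and all names and the toy data are hypothetical.

```python
import heapq


def top_k(graph, objects_on_edge, query_node, k, alpha=1.0):
    """Top-k objects by psi = mu / (1 + alpha * dist) via Dijkstra-style expansion.

    graph           -- {node: [(neighbor, weight, edge_id), ...]}, directed
    objects_on_edge -- {edge_id: [(object_id, offset_from_start, mu), ...]}
    Returns [(object_id, score), ...] in descending score order.
    """
    top = []                          # min-heap of (score, object_id), size <= k
    settled = set()
    frontier = [(0.0, query_node)]    # min-heap of (distance from query, node)
    while frontier:
        d, u = heapq.heappop(frontier)
        if u in settled:
            continue
        # Upper bound for anything not yet explored: mu = 1 at distance d.
        if len(top) == k and 1.0 / (1.0 + alpha * d) <= top[0][0]:
            break
        settled.add(u)
        for v, w, eid in graph.get(u, []):
            for obj, offset, mu in objects_on_edge.get(eid, []):
                if mu <= 0.0:
                    continue          # object matches no query keyword
                s = mu / (1.0 + alpha * (d + offset))
                if len(top) < k:
                    heapq.heappush(top, (s, obj))
                elif s > top[0][0]:
                    heapq.heapreplace(top, (s, obj))
            if v not in settled:
                heapq.heappush(frontier, (d + w, v))
    ranked = sorted(top, key=lambda e: e[0], reverse=True)
    return [(obj, s) for s, obj in ranked]


# Hypothetical toy network, not Figure 1.
graph = {
    "q": [("a", 2.0, "e1"), ("b", 5.0, "e2")],
    "a": [("b", 1.0, "e3")],
    "b": [],
}
objects = {
    "e1": [("x", 1.0, 0.5)],   # score 0.5 / (1 + 1.0) = 0.25
    "e2": [("z", 4.0, 1.0)],   # score 1.0 / (1 + 4.0) = 0.2
    "e3": [("y", 0.5, 1.0)],   # score 1.0 / (1 + 2.5), about 0.286
}
print([obj for obj, _ in top_k(graph, objects, "q", k=2)])  # ['y', 'x']
```

In this toy run with k = 2, the expansion stops when node b is popped at distance 3, because the bound $1/(1 + 3) = 0.25$ no longer exceeds the current k-th score.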

Algorithm 2 presents the 119888119886119899119889119904119890119886119903119888ℎ((119890119894119889 119905119894119889) 119904119896) proce-dure which finds the candidate data objects This procedurehas twomain steps In the first step the upper-bound score ofthe edges is computed using a significance factor (120579119905 ) of a term

6 Wireless Communications and Mobile Computing

Input: top-k spatial keyword query $Q_N = (q_l, q_t, k)$
Output: top-k data objects with the highest scores
$D_c \leftarrow \emptyset$  /* set of candidate data objects */
max-heap $D_k \leftarrow \emptyset$  /* current top-k set */
$s_k \leftarrow 0$  /* k-th score in $D_k$ */
min-heap $\leftarrow \emptyset$
$explored \leftarrow \emptyset$
min-heap.insert($q_l$, $edge_{active}$)
$D_c \leftarrow candsearch((e_{id}, t_{id}), s_k)$
update $D_k$ and $s_k$ with $d \in D_c$
while min-heap $\ne \emptyset$ and ($1/(1 + \alpha\lambda(d_l, q_l)) > s_k$) do
    for each unexplored adjacent edge of ($p_a$, $edge$) do
        $explored \leftarrow explored \cup (p_a, edge)$
        $D_c \leftarrow candsearch((e_{id}, t_{id}), s_k)$
        update $D_k$ and $s_k$ with $d \in D_c$
    end
    min-heap.insert(adjacent node, edge)
end
return $D_k$

Algorithm 1: EvaluateSnapshotQuery(Node $n_i$, Edge $e_i$).

Input: edge ID $e_{id}$, term ID $t_{id}$, score of the k-th object $s_k$
Output: candidate list $D_c$
compute $\theta_t(e_i)$
if $\theta_t(e_i) > 0$ then
    $maxscore(e_i) \leftarrow computemaxscore(\theta_t, dist(e_i, q_l))$
end
if $maxscore(e_i) > s_k$ then
    for each data object $d$ in $e_i$ do
        compute $d_{score}$
        if $d_{score} > s_k$ then
            $D_c \leftarrow D_c \cup \{d\}$
        end
    end
end
return $D_c$

Algorithm 2: CandidateSearch($(e_{id}, t_{id})$, $s_k$).

$t \in q_t$ and the shortest distance $sdist(e_i, q_l)$ between the edge and the query location. In the next step, the inverted lists of term $t$ are fetched if their upper-bound score is greater than $s_k$. In the inverted lists, the objects with score $\psi(d)$ greater than $s_k$ are returned.

To understand the proposed algorithm, consider the road network presented in Figure 1. Assume that a query $q$ issues a top-1 keyword query with $q_t$ = "Italian Restaurant". For ease of presentation, we assume $\alpha = 1$, and the textual relevance $\mu$ is the number of occurrences of query keywords in $d_t$ divided by the number of keywords in the document (the description of the data object). For example, $\psi(d_4) = \mu(d_{4,t}, q_t)/(1 + \lambda(d_{4,l}, q_l)) = 0.5/8 \approx 0.06$. The algorithm starts the network expansion from the active edge $\overrightarrow{(n_2, n_3)}$, where $q$ is the anchor point. Note that the direction of the edge $\overrightarrow{(n_2, n_3)}$ is from $n_2$ to $n_3$. Therefore, the algorithm explores only $\overrightarrow{(q, n_3)}$. No data object is found in $\overrightarrow{(q, n_3)}$. Then, $n_3$ becomes the anchor point, and edges $(n_3, n_4)$, $(n_3, n_5)$, and $(n_3, n_7)$ are inserted in the min-heap. Next, the $candsearch$ function retrieves the candidate data objects on edges $(n_3, n_4)$, $(n_3, n_5)$, and $(n_3, n_7)$ whose score is better than $s_k$. On edge $(n_3, n_5)$, data object $d_3$ is retrieved with $\psi(d_3) = 0.2$. Data object $d_3$ is inserted in the $D_k$ set, and the value of $s_k$ is set to 0.2. For edges $(n_3, n_4)$ and $(n_3, n_7)$, no candidate object is found because $d_{2,t}$ ("Cafe") and $d_{7,t}$ ("Cafe and Bakery") do not match $q_t$. The algorithm continues expanding the edges whose upper-bound score is greater than $s_k$. The edge $\overrightarrow{(n_7, n_2)}$ is explored next. The upper-bound score of $\overrightarrow{(n_7, n_2)}$ is $1/7$, which is less than $s_k$. Similarly, for edge $\overleftarrow{(n_6, n_5)}$, the upper-bound score is $0.5/8 < s_k$. Therefore, the algorithm terminates and reports $d_3$ as the top-1 result.
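The expansion strategy walked through in this example can be approximated by the following Python sketch. It is a simplified illustration under stated assumptions, not the COSK implementation: data objects and the query are assumed to sit on nodes rather than at arbitrary positions along edges, the graph is a plain adjacency dictionary, the inverted-list pruning of Algorithm 2 is omitted, and the name `top_k_snapshot` and its parameters are hypothetical.

```python
import heapq


def top_k_snapshot(graph, objects, relevance, source, k, alpha=1.0):
    """Top-k objects by psi = mu / (1 + alpha * network distance).

    graph:     {node: [(neighbor, weight), ...]}, directed edges
    objects:   {node: [object_id, ...]} (assumption: objects sit on nodes)
    relevance: {object_id: mu in [0, 1]} for the current query keywords
    """
    top = []                       # min-heap of (score, object_id), size <= k
    settled = set()
    frontier = [(0.0, source)]     # Dijkstra-style expansion from the query
    while frontier:
        dist, node = heapq.heappop(frontier)
        if node in settled:
            continue
        settled.add(node)
        # Stop when even a perfectly relevant object at this distance
        # could not beat the current k-th score.
        if len(top) == k and 1.0 / (1.0 + alpha * dist) <= top[0][0]:
            break
        for obj in objects.get(node, []):
            mu = relevance.get(obj, 0.0)
            if mu <= 0.0:
                continue           # object matches no query keyword
            psi = mu / (1.0 + alpha * dist)
            if len(top) < k:
                heapq.heappush(top, (psi, obj))
            elif psi > top[0][0]:
                heapq.heapreplace(top, (psi, obj))
        for neighbor, weight in graph.get(node, []):
            if neighbor not in settled:
                heapq.heappush(frontier, (dist + weight, neighbor))
    return sorted(top, reverse=True)
```

Because edges are followed only in their stored direction, the same pair of nodes can yield different distances, and therefore different rankings, depending on which one hosts the query.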


Figure 3: Illustration of directed road network. (Steps shown: $q$ issues a TkSK query at $p_1$; the server returns a set of objects for $p_1$.)

Figure 4: Illustration of directed road network. (Steps shown: $q$ issues a TkSK query at $p_2$; the server returns a set of objects for $p_2$.)

5. Moving Top-$k$ Spatial Keyword Queries

In this section, we present our method to monitor moving top-k spatial keyword queries, where query objects move in a directed road network. Figure 3 provides an example of TkSK in road networks, where query point $q$ issues a TkSK query at point $p_1$. Note that the numbers on the arrows in the figure indicate the order of the steps. To obtain the top-k results at $p_1$, the server executes Algorithm 1, as mentioned in Section 4.2. Now consider that the query object moves to $p_2$, as shown in Figure 4, and needs the top-k results at point $p_2$. The simple method is to repeat the procedure executed at $p_1$. However, recomputing the results whenever query $q$ changes its location significantly increases the computation cost. Furthermore, it also increases the communication overhead because the query object must report its location whenever it moves and the server must send the result set. To address these issues, we introduce the safe exit approach.

In the proposed framework, the server computes safe exit points for a query object. The server maintains a set of moving queries, and the query result remains valid as long as the query objects remain inside their respective safe exit points. Whenever a query object leaves its safe exit points, the server recomputes the TkSK results and the safe exit points for the query object.
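To make the communication saving concrete, the following Python sketch counts how often a client would have to contact the server under this scheme. It is a toy model under loudly stated assumptions: the road is reduced to a one-dimensional offset, the safe region is a single interval, and `safe_interval_for` is a hypothetical stand-in for the server-side computation of safe exit points.

```python
def simulate_reports(positions, safe_interval_for):
    """Count client-to-server contacts under the safe-exit scheme.

    positions:         successive 1-D offsets of the query object
    safe_interval_for: callable x -> (lo, hi); hypothetical stand-in for
                       the server computing the safe region around x
    """
    reports = 0
    interval = None
    for x in positions:
        outside = interval is None or not (interval[0] <= x <= interval[1])
        if outside:
            # The client left its safe region (or has none yet), so it
            # reports its location and receives a fresh safe region.
            interval = safe_interval_for(x)
            reports += 1
    return reports
```

For example, with a single safe exit point at offset 1 on a segment of length 3, six location samples trigger only two contacts, whereas naive recomputation would need six.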

Next, we present our method to compute the safe exit points for a query object. The safe exit point represents a point in the segment where a safe region and a nonsafe region meet. We compute the safe exit point using the divide-and-conquer technique. Before presenting the detailed methodology, we define the terminology used in this section.

Definition 1 (safe region). A portion of a road segment that guarantees that, as long as the query point lies in it, its top-k results remain valid.

Definition 2 (answer objects $D^+$). A data object $d$ is called an answer object of query $q$ if $\psi(d) > \psi(d_a)$, where $d_a$ represents any other data object in the directed road network. Similarly, we can generalize this definition for TkSK: a data object $d$ is called an answer object of query $q$ if $\psi(d) > \psi(d_{k+1})$, where $d_{k+1}$ represents the $(k+1)$th data object in the directed road network. In other words, all answer objects are top-k results of query $q$.

Definition 3 (nonanswer objects $D^-$). A data object $d$ is called a nonanswer object of query $q$ if $\psi(d) < \psi(d_a)$, where $d_a$ represents any other data object in the directed road network. Similarly, we can generalize this definition for TkSK: a data object $d$ is called a nonanswer object of query $q$ if $\psi(d) < \psi(d_k)$, where $d_k$ represents the $k$th data object in the directed road network. That is, all answer objects are top-k results of query $q$, and therefore none of the nonanswer objects are in the top-k results of query $q$.

Definition 4 (lowest answer object $D_l^+$). An answer object $d^+ \in D^+$ is called the lowest answer object at a point $p \in G$ if $\psi(d_l^+)_p = \min(\psi(d_1^+)_p, \psi(d_2^+)_p, \ldots, \psi(d_{|D^+|}^+)_p)$, where $\psi(d_l^+)_p$ represents the score of the lowest answer object at point $p$. In other words, $\psi(d_l^+)_p < \psi(d_a^+)_p$ at point $p$, where $d_a^+$ is any other answer object in the $D^+$ set.

Definition 5 (highest nonanswer object $D_h^-$). A nonanswer object $d^- \in D^-$ is called the highest nonanswer object at a point $p \in G$ if $\psi(d_h^-)_p = \max(\psi(d_1^-)_p, \psi(d_2^-)_p, \ldots, \psi(d_{|D^-|}^-)_p)$, where $\psi(d_h^-)_p$ represents the score of the highest nonanswer object at point $p$. In other words, $\psi(d_h^-)_p > \psi(d_a^-)_p$ at point $p$, where $d_a^-$ is any other nonanswer object in the $D^-$ set.
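Definitions 2 to 5 amount to ranking the objects by score at a given point and splitting the ranking after position $k$. The Python sketch below illustrates this; the function name `split_answers` and the dictionary-based input are assumptions made for illustration, and the scores are taken as already computed for the point of interest.

```python
def split_answers(scores, k):
    """Split objects into answer / nonanswer sets at one query position.

    scores: {object_id: psi at the current point}
    Returns (answers, nonanswers, lowest_answer, highest_nonanswer);
    the last two are None when the corresponding set is empty.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    answers, nonanswers = ranked[:k], ranked[k:]
    lowest_answer = answers[-1][0] if answers else None        # d_l^+
    highest_nonanswer = nonanswers[0][0] if nonanswers else None  # d_h^-
    return (
        {obj for obj, _ in answers},
        {obj for obj, _ in nonanswers},
        lowest_answer,
        highest_nonanswer,
    )
```

Only the pair returned last matters for safe exit computation: the result set can change only when the highest nonanswer object overtakes the lowest answer object.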

As discussed earlier, the main challenge in the continuous processing of moving TkSK queries is to maintain the validity of the result set, because the movement of query objects can nullify the result set. To monitor the validity of the result set, we propose a safe-region-based approach.

5.1. Computation of Safe Exit Points. In this section, we present our technique to compute the safe exit points. The main goal is to find a point in the road network where the query result set will change. The result set changes when the score of the highest nonanswer object $D_h^-$ surpasses the score of $D_l^+$. Generally, the textual relevance score does not change. Therefore, the score of data objects only changes because of the spatial relevance score, which can only change through the movement of query objects. The computation of the safe exit point is based on two key observations.

Observation 1. If $D_{n_\beta}^+ = D_{p_a}^+$, there is no safe exit point in the segment.

Explanation. $D_{p_a}^+$ represents the set of answer objects at anchor point $p_a$, whereas $D_{n_\beta}^+$ represents the set of answer objects at boundary node $n_\beta$. As discussed earlier, the safe exit point is the particular point where the query results change. If the query results at the starting node are the same as those at the ending node of a segment (edge), there is no point where the query result changes. Hence, we do not search for a safe exit point in that segment.

Observation 2. If $D_{p_a}^+ \ne D_{n_\beta}^+$, there is a safe exit point in the segment.

Explanation. In contrast to Observation 1, if the query results are different at the starting and ending points, then there exists a point where the query results change. Hence, there is a safe exit point in the segment.

To find the safe region, we observe the following cases.

Case 1 (when $\alpha = 1$ and the textual relevance of the highest nonanswer object and lowest answer object is the same). In this case, both the textual and spatial relevance have the same importance (i.e., $\alpha = 1$). In addition, the top-k result depends only on the spatial relevance because the textual relevance of both objects is the same. The data object that is closer to query point $q$ becomes the answer object. For an undirected edge, the safe exit point $p_{se}$ is the center point between the lowest answer object and the highest nonanswer object, i.e., $\max(dist(p_{se}, d_1^+), dist(p_{se}, d_2^+), \ldots, dist(p_{se}, d_{|D^+|}^+)) = \min(dist(p_{se}, d_1^-), dist(p_{se}, d_2^-), \ldots, dist(p_{se}, d_{|D^-|}^-))$. However, in the case of a directed edge, where $dist(p_a, n_\beta) \ne dist(n_\beta, p_a)$, the safe exit point is either $d_l^+$ or $p_a$. If $d_l^+ \in (p_a, n_\beta)$, then the safe exit point is $d_l^+$; otherwise, the safe exit point is $p_a$.

Case 2 (when $\alpha = 1$ and the textual relevance of the highest nonanswer object and lowest answer object is different). In this case, the top-k result depends on all factors, that is, $\alpha$ and the spatial and textual relevance. Clearly, for undirected edges, the midpoint between the lowest answer object and the highest nonanswer object does not provide a valid safe exit point. Therefore, we introduce the divide-and-conquer technique, which keeps dividing the search space until we reach the point where the score of the nonanswer object becomes greater than that of the answer object. Typically, the safe exit point should be closer to the data object whose score is lower. Based on this observation, we first compute the midpoint in a similar fashion to Case 1, and then we continue dividing the search space until we find the point. For directed edges, the safe exit point can be computed in a similar fashion to Case 1.

Case 2 also covers the remaining situations in which the safe exit point is not the midpoint between the lowest answer object and the highest nonanswer object. In these situations, the safe exit point depends on two or more factors, so it can be computed using the aforementioned divide-and-conquer technique. The following scenarios are handled as in Case 2:

(a) When $\alpha \ne 1$ and the textual relevance of the nearest nonanswer object and farthest answer object is different.

(b) When $\alpha \ne 1$ and the textual relevance of the nearest nonanswer object and farthest answer object is the same.

Case 3 (when $\alpha = 0$). This means the spatial relevance has no effect on the score of data objects. Hence, no monitoring is required for this scenario.
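The divide-and-conquer search of Case 2 can be read as a bisection on the offset along an undirected segment. The following Python sketch illustrates that reading; it is not the authors' code. It assumes the two score functions are supplied as callables of the offset from $p_a$ and that the nonanswer object overtakes the answer object at most once on the segment. The name `safe_exit_by_bisection` and the tolerance parameter are hypothetical.

```python
def safe_exit_by_bisection(psi_answer, psi_nonanswer, length, tol=1e-9):
    """Offset from p_a at which the nonanswer object starts to win.

    psi_answer, psi_nonanswer: callables x -> score at offset x
    length: length of the segment (p_a, n_beta)
    Returns None when the ranking does not change on the segment.
    Assumes at most one crossing of the two score curves.
    """
    lo, hi = 0.0, length
    if psi_nonanswer(lo) > psi_answer(lo):
        return lo                      # result already invalid at p_a
    if psi_nonanswer(hi) <= psi_answer(hi):
        return None                    # same winner at both ends
    # Invariant: answer still wins at lo, nonanswer wins at hi.
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if psi_nonanswer(mid) > psi_answer(mid):
            hi = mid
        else:
            lo = mid
    return hi
```

As a usage illustration with $\alpha = 1$, take an answer object at network distance $x + 3$ and a nonanswer object at distance $5 - x$ on a segment of length 3. With equal textual relevance the search converges to the midpoint $x = 1$, which matches Case 1; if the nonanswer object has only half the textual relevance, the crossing moves to $x = 8/3$, closer to the lower-scoring object.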

Algorithm 3 retrieves the safe exit points using the observations discussed earlier. The core function in this algorithm is ComputeSafeExit($p_a$, $n_\beta$), which finds the safe exit point in a segment between $p_a$ and $n_\beta$. ComputeSafeExit($p_a$, $n_\beta$) is described in detail in Algorithm 4. First, Algorithm 4 determines $d_l^+$ and $d_h^-$ at each point $p \in [p_a, n_\beta]$. Recall that $d_l^+$ is the lowest answer object at $p$, whereas $d_h^-$ is the highest nonanswer object at $p$. Algorithm 4 computes the safe exit point based on the cases discussed earlier. There are two further scenarios for each of Cases 1 and 2. For Case 1, if $dist(p_a, n_\beta) = dist(n_\beta, p_a)$, then the safe exit point is the midpoint between $d_l^+$ and $d_h^-$. If $dist(p_a, n_\beta) \ne dist(n_\beta, p_a)$, then the edge is directed, and therefore the safe exit point is either $p_a$ or $d_l^+$. If $d_l^+$ lies on the edge $[p_a, n_\beta]$, then $d_l^+$ is the safe exit point. Otherwise, $p_a$ is the safe exit point.

Similarly, for Case 2, if $dist(p_a, n_\beta) = dist(n_\beta, p_a)$, then the safe exit point is computed by repeatedly halving the search space until we find the closest point such that $\psi(d_h^-) > \psi(d_l^+)$. If $dist(p_a, n_\beta) \ne dist(n_\beta, p_a)$, the safe exit point is computed in the same way as in the directed scenario of Case 1.

5.2. Computation of Safe Exit Points for Example. Consider the same example in Figure 1, where the query point $q$ issues a top-1 keyword query with $q_t$ = "Italian restaurant". For this example, let us consider $\alpha = 1$. The monitoring algorithm starts exploring from the active edge containing the query object $q$. Therefore, $\overrightarrow{(q, n_3)}$ is explored first. As shown in Table 3, for $\overrightarrow{(q, n_3)}$, $D_q^+ = d_3$ and $D_{n_3}^+ = d_3$. According to Observation 1, no safe exit point exists in this segment. Therefore, edges adjacent to $n_3$ are explored, and $n_3$ becomes the new $p_a$. The edge $(n_3, n_4)$ is explored next. Similarly, the answer object at $n_3$ and $n_4$ is the same: $D_{n_3}^+ = D_{n_4}^+ = d_3$. Therefore, a safe exit point does not exist in $(n_3, n_4)$. The edge $(n_3, n_7)$ is explored next. As shown in Table 3, $D_{n_3}^+ = d_3$ and $D_{n_7}^+ = d_6$. By Observation 2, there is a safe exit point in $(n_3, n_7)$. As shown in Figure 1, $d_{3,t} = d_{6,t}$ = "Italian Restaurant" and $dist(n_3, n_7) = dist(n_7, n_3)$.


Input: same as Algorithm 1
Output: $P_{SE}$, a set of safe exit points
$P_{SE} \leftarrow \emptyset$  /* set of safe exit points */
$D_{p_a}^+ \leftarrow EvaluateSnapshotQuery(p_a, (p_a, n_\beta))$  /* results calculated using Algorithm 1 */
$D_{n_\beta}^+ \leftarrow EvaluateSnapshotQuery(n_\beta, (p_a, n_\beta))$  /* results calculated using Algorithm 1 */
if $D_{p_a}^+ = D_{n_\beta}^+$ then
    no safe exit point  /* refer to Observation 1 */
end
if $D_{p_a}^+ \ne D_{n_\beta}^+$ then
    $P_{SE} \leftarrow P_{SE} \cup ComputeSafeExit(p_a, n_\beta)$  /* safe exit point exists; refer to Observation 2 */
end
return $P_{SE}$

Algorithm 3: COSK monitoring algorithm.

Input: same as Algorithm 1
Output: $p_{se}$, safe exit point in $(p_a, n_\beta)$
$D_l^+ \leftarrow \langle p, D_l^+ \rangle$ for each point $p \in [p_a, n_\beta]$, with $d_l^+$ such that $\psi(d_l^+)_p = \min(\psi(d_1^+)_p, \psi(d_2^+)_p, \ldots, \psi(d_{|D^+|}^+)_p)$
$D_h^- \leftarrow \langle p, D_h^- \rangle$ for each point $p \in [p_a, n_\beta]$, with $d_h^-$ such that $\psi(d_h^-)_p = \max(\psi(d_1^-)_p, \psi(d_2^-)_p, \ldots, \psi(d_{|D^-|}^-)_p)$
if Case 1 then
    if $dist(p_a, n_\beta) = dist(n_\beta, p_a)$ then
        $p_{se}$ = point such that $\max(dist(p_{se}, d_1^+), dist(p_{se}, d_2^+), \ldots, dist(p_{se}, d_{|D^+|}^+)) = \min(dist(p_{se}, d_1^-), dist(p_{se}, d_2^-), \ldots, dist(p_{se}, d_{|D^-|}^-))$
    end
    if $dist(p_a, n_\beta) \ne dist(n_\beta, p_a)$ then
        $p_{se} = p_a$ or $p_{se} = d_l^+$, where $d_l^+ \in (p_a, n_\beta)$
    end
end
if Case 2 then
    if $dist(p_a, n_\beta) = dist(n_\beta, p_a)$ then
        $p_{se}$ = closest point to $p_a$ such that $\psi(d_h^-) > \psi(d_l^+)$
    end
    if $dist(p_a, n_\beta) \ne dist(n_\beta, p_a)$ then
        $p_{se} = p_a$ or $p_{se} = d_l^+$, where $d_l^+ \in (p_a, n_\beta)$  /* same as the directed scenario of Case 1 */
    end
end
return $p_{se}$

Algorithm 4: ComputeSafeExit($p_a$, $n_\beta$).

Therefore, according to Case 1, the safe exit point $p_{se1}$ is the midpoint between $d_3$ and $d_6$. That is, $dist(p_{se1}, d_3) = dist(p_{se1}, d_6)$, where $dist(p_{se1}, d_3) = x + 3$ and $dist(p_{se1}, d_6) = -x + 5$ for $0 < x < 3$. Consequently, $x = 1$, which means that the distance from $n_3$ to $p_{se1}$ is 1.

Next, we determine a safe exit point in $(n_3, n_5)$. As shown in Table 3, the answer object at $n_5$ is the same as that at $n_3$. Hence, no safe exit point exists in this edge. Next, $\overleftarrow{(n_6, n_5)}$ is explored with $p_a = n_5$. According to Table 3, $D_{n_6}^+ = d_4$ and $D_{n_5}^+ = d_3$. Therefore, a safe exit point exists in this edge. This edge is directed, and for each point $p \in \overleftarrow{(n_6, n_5)}$, the shortest path from $p$ to $d_3$ is $p \rightarrow n_6 \rightarrow n_2 \rightarrow n_3 \rightarrow d_3$. Therefore, $n_5$ is the safe exit point.

The bold lines in Figure 5 indicate the safe region of $q$. The top-1 result remains $d_3$ as long as the query $q$ lies in the safe region.

Next, we analyze the time complexity for determining a set of safe exit points using a set of qualifying objects $d \in D_{p_a}^+ \cup D_{n_\beta}^+ \cup D(p_a, n_\beta)$. Note that $D_{p_a}^+$ ($D_{n_\beta}^+$) indicates


Table 3: Computation of safe exit points for example scenario.

| Edge/Segment | $p_a$ | $D_{p_a}^+$ | $D_{n_\beta}^+$ | $p_{se}$ |
|---|---|---|---|---|
| $\overrightarrow{(q, n_3)}$ | $q$ | $D_q^+ = d_3$ | $D_{n_3}^+ = d_3$ | none |
| $(n_3, n_4)$ | $n_3$ | $D_{n_3}^+ = d_3$ | $D_{n_4}^+ = d_3$ | none |
| $(n_3, n_7)$ | $n_3$ | $D_{n_3}^+ = d_3$ | $D_{n_7}^+ = d_6$ | $p_{se1}$ |
| $\overrightarrow{(n_3, n_5)}$ | $n_3$ | $D_{n_3}^+ = d_3$ | $D_{n_5}^+ = d_3$ | none |
| $\overleftarrow{(n_6, n_5)}$ | $n_5$ | $D_{n_5}^+ = d_3$ | $D_{n_6}^+ = d_4$ | $p_{se2}$ |

Figure 5: Illustration of safe region of $q$. (The figure shows nodes $n_1$ to $n_7$, safe exit points $p_{se1}$ and $p_{se2}$, and data objects $d_1$ (Grand Hotel), $d_2$ (Cafe), $d_3$ (Italian Restaurant), $d_4$ (Chinese Restaurant), $d_5$ (Pub and Bar), $d_6$ (Italian Restaurant), and $d_7$ (Cafe and Bakery).)

the set of $k$ data objects that satisfies the query condition at $p_a$ ($n_\beta$). According to Dijkstra's algorithm [26], the time complexity $O(D_q^+)$ for computing a set of answer objects at a query point $q$ is $O(D_q^+) = O(|E| + |N| \log |N|)$. This means that $O(D_{p_a}^+) = O(D_{n_\beta}^+) = O(|E| + |N| \log |N|)$ holds for endpoints $p_a$ and $n_\beta$. Thus, the time complexity $O(\Omega_{kth})$ of determining the skyline $\Omega_{kth}$ with the $k$-th highest score is $O(\Omega_{kth}) = C_{kth} \cdot O(|D_{p_a}^+ \cup D_{n_\beta}^+ \cup D(p_a, n_\beta)|)$, where $C_{kth}$ is the number of qualifying objects that participate in the constitution of the skyline with the $k$-th highest score. Therefore, the time complexity of determining a safe exit point coincides with the time complexity of determining the two skylines, i.e., the skyline $D_l^+$ with the $k$-th highest (or lowest) score for answer objects and the skyline $D_h^-$ with the highest score for nonanswer objects. This is because the safe exit point is found at the cross point between these skylines.

Figure 6 represents the skyline graph for $k = 1$ on the edge $(n_7, n_3)$. Let us draw the score functions for $d_3$ and $d_6$ on the road segment $(n_7, n_3)$, where a safe exit point exists. This is because $D_{n_3}^+ = d_3$ and $D_{n_7}^+ = d_6$ for $k = 1$. For each point $p \in (n_7, n_3)$, the distance between $d_3$ and point $p$ can be represented as $dist(d_3, p) = dist(d_3, n_3) + len(n_3, p) = 6 - len(n_7, p)$. Similarly, for each point $p \in (n_7, n_3)$, the distance between $d_6$ and point $p$ can be represented as $dist(d_6, p) = dist(d_6, n_7) + len(n_7, p) = 2 + len(n_7, p)$. Let $len(n_7, p)$ be a variable $x$ ($0 \le x \le 3$). We can write $\lambda(d_3, p) = dist(d_3, p) = 6 - x$ and $\lambda(d_6, p) = dist(d_6, p) = 2 + x$. Then, we can represent the score functions $\psi(d_3)$ and $\psi(d_6)$ as follows:

$\psi(d_3) = \mu(d_{3,t}, q_t)/(1 + \alpha \cdot \lambda(d_3, p)) = 1/(7 - x)$ for $0 \le x \le 3$,

$\psi(d_6) = \mu(d_{6,t}, q_t)/(1 + \alpha \cdot \lambda(d_6, p)) = 1/(3 + x)$ for $0 \le x \le 3$.

Figure 6: Skyline graph for $k = 1$ on the road segment $(n_7, n_3)$. (Horizontal axis: distance along the segment from $n_7$ to $n_3$; vertical axis: score; plotted curves: $\psi(d_6) = 1/(x + 3)$ and $\psi(d_3) = 1/(-x + 7)$.)

Finally, we present the lemma to prove that the safe exit points computed by COSK are correct.

Lemma 8. The COSK algorithm correctly computes a set of safe exit points.

Proof. We prove the correctness of the COSK algorithm by contradiction. Assume that $D_{p_a}^+ \ne D_{n_\beta}^+$ but there is no safe exit point in the road segment $(p_a, n_\beta)$. This means that, for each point $p$ in the road segment $(p_a, n_\beta)$, the query result at $p$ equals $D_{p_a}^+$, i.e., $D_p^+ = D_{p_a}^+$ $\forall p \in (p_a, n_\beta)$. However, this leads to the contradiction that $D_{n_\beta}^+ = D_{p_a}^+$ when $p = n_\beta$. Therefore, if $D_{p_a}^+ \ne D_{n_\beta}^+$, a safe exit point exists in $(p_a, n_\beta)$. In addition, a safe exit point is determined using the skyline $D_l^+$ for answer objects and the skyline $D_h^-$ with the highest score for nonanswer objects when $D_{p_a}^+ \ne D_{n_\beta}^+$. The first skyline is a composite polyline drawn from answer objects in $D_{p_a}^+$. The second skyline is a composite polyline drawn from nonanswer objects in $D_{n_\beta}^+ \cup D(p_a, n_\beta) - D_{p_a}^+$.

6. Monitoring Query Results and Safe Regions in Dynamic Directed Road Networks

In this section, we discuss the monitoring of spatial keyword queries in dynamic road networks, where the network distance changes depending on the traffic conditions. Updates to the weights of some edges may invalidate the query results or the safe region of $q$ even though the query object $q$ remains within its safe region. Figure 7 illustrates an example of changing the weights of edges $\overleftarrow{(n_1, n_2)}$ and $\overleftarrow{(n_1, n_6)}$. For convenience, we consider $\alpha = 1$ and $q_t$ = "Italian restaurant". In Figure 7(a), the top-1 result is $d_1$, and the bold lines show the safe region of query $q$. Now consider that, at time $t_j$, the weights of the two edges $\overleftarrow{(n_1, n_2)}$ and $\overleftarrow{(n_1, n_6)}$ change owing to heavy traffic, as shown in Figure 7(b). The update in the weights of edges may invalidate the query result or the safe region of $q$. Therefore, it is necessary to monitor the validity of the results and the safe region when such changes occur.

Next, we introduce a monitoring region to monitor the validity of the safe region effectively when the weight of an edge is changed. The monitoring region $MR$ contains all the points between query point $q$ and the lowest answer object and the highest nonanswer object. Formally, it is defined as $MR = dist(q, D_l^+) \cup dist(q, D_h^-)$, where $dist(q, D_l^+)$ covers the points between $q$ and the lowest answer object and $dist(q, D_h^-)$ covers the points between $q$ and the highest nonanswer object. In the given example, $D_l^+ = d_1$ and $D_h^- = d_2, d_3$. Therefore, the dotted lines in Figure 8(a) show the monitoring region of query object $q$.

Now, at time $t_j$, the updates to edges $\overleftarrow{(n_1, n_6)}$ and $\overleftarrow{(n_1, d_1)}$, which are not part of the monitoring region, can safely be ignored. However, the update on segment $\overrightarrow{(n_2, d_1)}$, which is associated with the monitoring region, may nullify the results. As shown in Figure 8(b), after the update, the top-1 result becomes $d_2$, and the bold lines represent the new safe region of $q$.

Algorithm 5 monitors the validity of the result set and the safe region of query object $q$ when the weight of any edge changes. Suppose that the weight of edge $(n_i, n_j)$ changes at time $t_j$. First, the algorithm checks whether edge $(n_i, n_j)$ is associated with the monitoring region. If it is not part of the monitoring region, then the algorithm simply ignores the update to edge $(n_i, n_j)$, and the query results and safe region remain valid. In contrast, if the edge is associated with the monitoring region (i.e., $MR \cap (n_i, n_j) \ne \emptyset$), then the algorithm reevaluates the query results. Consequently, the top-k results and the safe region of query $q$ are updated. Finally, the algorithm updates the monitoring region of $q$.
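The filtering step of this procedure can be illustrated with the Python sketch below. It is a simplified reading, not the authors' implementation: the monitoring region is modeled as the set of directed edges on the shortest paths from $q$ to the lowest answer object and to the highest nonanswer object, all of which are assumed to sit on nodes, and the function names are hypothetical.

```python
import heapq


def shortest_path_edges(graph, source, target):
    """Directed edges on one shortest path from source to target.

    graph: {node: [(neighbor, weight), ...]}; returns an empty set when
    the target is unreachable or equals the source.
    """
    dist = {source: 0.0}
    prev = {}
    done = set()
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == target:
            break
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in done:
        return set()
    edges = set()
    node = target
    while node != source:          # walk predecessors back to the source
        edges.add((prev[node], node))
        node = prev[node]
    return edges


def monitoring_region(graph, q, lowest_answer, highest_nonanswer):
    """Union of the two shortest paths that determine the safe region."""
    return (shortest_path_edges(graph, q, lowest_answer)
            | shortest_path_edges(graph, q, highest_nonanswer))


def needs_reevaluation(region, updated_edge):
    """True only when the updated edge belongs to the monitoring region."""
    return updated_edge in region
```

A weight update on an edge outside the returned set is ignored; any other update triggers recomputation of the top-k results, the safe exit points, and the monitoring region itself.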

7. Performance Evaluation

In this section, we evaluate the performance of COSK through simulation experiments. We describe our experimental settings in Section 7.1, and we present our experimental results for static and dynamic road networks in Sections 7.2 and 7.3, respectively.

7.1. Experimental Settings. All of our experiments were performed using real road networks, namely, Oldenburg, San Francisco, and San Joaquin. All three road networks were obtained from [27]. The original road network of San Francisco had 21047 nodes and 21692 edges. We reformatted the network, pruned approximately 30% of the nodes, and adjusted the edges and their weights accordingly. This resulted in a network with 14732 nodes and 14316 edges. Both the direction of edges and the data objects on the edges were generated randomly. The description of each data object was extracted from Twitter messages [28], and we assigned one tweet per data object. Table 4 presents the characteristics of the data sets used in the experimental evaluation. We simulated moving query objects by using a spatiotemporal data generator [29]. The input to the generator was the road network of the data set used, and the output was the set of query objects moving on the road network. Each experiment had 100 moving queries, which were continuously monitored for 100 timestamps (1 timestamp = 1 second), and the average result was reported in the experiments.

As a benchmark for COSK in static road networks, we implemented the CMTkSK+ algorithm [22], which also continuously monitors moving top-k spatial keyword queries in road networks. However, this algorithm was originally designed for undirected road networks. To make a fair comparison, we modified CMTkSK+ to process top-k spatial keyword queries in directed road networks and called it CMTkSK+. Specifically, we modified the distance computation method between two points such that, in directed road networks, $dist(p_1, p_2) \ne dist(p_2, p_1)$. Since CMTkSK+ does not handle top-k spatial keyword queries in dynamic road networks, we compared the performance of COSK with a basic algorithm that recomputes the results whenever the query object changes its location. All algorithms were implemented in Java and were executed on a desktop PC with a 2.80-GHz Intel Core i5 and


Figure 7: Updating the weight of edges in a dynamic road network, where $t_i < t_j$. (a) Safe region at time $t_i$. (b) Updating the weight of $\overleftarrow{(n_1, n_2)}$ and $\overleftarrow{(n_1, n_6)}$ at time $t_j$. (The figure shows nodes $n_1$ to $n_6$, safe exit points $p_{se1}$ to $p_{se3}$, and data objects $d_1$ (Italian Restaurant), $d_2$ (Italian Restaurant), and $d_3$ (Chinese Restaurant).)

Figure 8: Monitoring region and updated safe region at time $t_j$. (a) Monitoring region at time $t_i$. (b) New safe region at time $t_j$.

(1) Input: monitoring region MR, updated edge $(n_i, n_j)$
(2) Output: none
(3) if $MR \cap (n_i, n_j) = \emptyset$ then
(4)     /* edge $(n_i, n_j)$ is not part of the monitoring region */
(5)     ignore the change in the weight of edge $(n_i, n_j)$
(6) end
(7) $P_{SE} \leftarrow \emptyset$  /* set of safe exit points */
(8) else
(9)     $D^k_{upd} \leftarrow EvaluateSnapshotQuery(n_i, e_i)$  /* update set of top-k results */
(10)    $P_{SE}^{upd} \leftarrow ComputeSafeExit(P_a, n_\beta)$  /* update safe exit points */
(11)    $MR_{upd} \leftarrow ComputeMonitoringRegion(D^+_l, D^-_h)$  /* update monitoring region */
(12) end

Algorithm 5: MonitoringSafeRegion(MR, $(n_i, n_j)$).
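The control flow of Algorithm 5 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the monitoring region is assumed to be a set of edge identifiers, `QueryState` is a hypothetical container, and the three callbacks are hypothetical stand-ins for the routines named in the listing.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Tuple

Edge = Tuple[str, str]            # hypothetical edge identifier: (start node, end node)
SafeExit = Tuple[Edge, float]     # hypothetical: (edge, offset of the exit point on it)


@dataclass
class QueryState:
    """Per-query state kept by the server (assumed layout)."""
    top_k: List[str]
    safe_exits: List[SafeExit]
    monitoring_region: FrozenSet[Edge]


def on_edge_weight_update(
    state: QueryState,
    updated_edge: Edge,
    evaluate_snapshot_query: Callable[[], List[str]],
    compute_safe_exit: Callable[[List[str]], List[SafeExit]],
    compute_monitoring_region: Callable[[List[str]], FrozenSet[Edge]],
) -> bool:
    """Return True if the query state was refreshed, False if the update was ignored."""
    if updated_edge not in state.monitoring_region:
        # The updated edge is not part of the monitoring region: ignore the change.
        return False
    # Otherwise refresh the top-k result, the safe exit points, and the region.
    state.top_k = evaluate_snapshot_query()
    state.safe_exits = compute_safe_exit(state.top_k)
    state.monitoring_region = compute_monitoring_region(state.top_k)
    return True
```

Passing the three routines as callbacks keeps the sketch independent of how the snapshot query and safe exit computation are actually implemented.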

Table 4: Summary of datasets.

Attribute                         Oldenburg    San Francisco    San Joaquin
Total no. of nodes                6104         14732            18262
Total no. of edges                7034         14316            23876
Percentage of directed edges      30           30               30
Total no. of objects              5627         11453            19098
Average no. of objects per edge   0.8          0.8              0.8
Total no. of words                49517        103649           166153


Table 5: Experimental parameter settings.

Parameter                                Range
Number of results (k)                    5, 10, 15, 20, 25
Number of keywords (n)                   1, 2, 3, 4, 5
Query parameter ($\alpha$)               0.01, 0.1, 1, 10, 100
Dataset                                  Oldenburg, San Francisco, San Joaquin
Number of data objects ($N_D$)           10, 20, 30, 40, 50 (x1000)
Speed of query objects ($V_{qry}$)       25, 50, 75, 100, 125 (km/h)
Mobility ($M_{qry}$)                     20, 40, 60, 80, 100
Ratio of directed edges ($E_{dir}$)      10, 20, 30, 40, 50
Ratio of updated edges ($E_{upd}$)       15, 30, 60, 80, 100

8 GB of memory. In the experiments, we compared (1) query processing times, (2) edges processed, i.e., the number of edges processed for retrieving query results, and (3) index sizes. Table 5 summarizes the parameters used in the experiments. In each experiment, we varied a single parameter within the range shown in Table 5 while maintaining the other parameters at the bolded default values.

We evaluated the performance of the algorithms by using the following measures: (1) the total amount of server CPU time, which indicates the query processing time, and (2) the total communication cost, measured as the total number of points (i.e., the location updates sent by query objects and the query results and safe exit points returned by the server) transferred between clients and the server. The battery power and wireless bandwidth consumption typically increase with the amount of data transferred between objects (clients) and servers. Thus, we used the amount of transferred data as a metric to evaluate the communication cost.

7.2. Experimental Results of Top-k Spatial Keyword Queries in Static Road Networks

7.2.1. Effect of k. Figure 9 indicates the effect of the number of results on the query processing time and communication cost for both algorithms. Figure 9(a) indicates that the query processing time increases for both algorithms as the value of k increases. This is expected because, with an increase in k, more data objects must be explored and verified. Nevertheless, COSK significantly outperforms CMTkSK+ for two main reasons. First, a relevant object search is very efficient when using the highest significance factor, and second, COSK does not need to verify the set of answer objects as long as the query object lies in a safe region. On the other hand, the CMTkSK+ query processing time increases significantly because it has to monitor and verify the set of candidate objects periodically. In Figure 9(b), the communication costs for both algorithms increase as the number of requested objects increases. However, the proposed algorithm demonstrates superior performance compared to CMTkSK+ because client-server communication is not required when the query object lies within the safe exit points, whereas in CMTkSK+ the query object is required to report its location to the server whenever it moves.

7.2.2. Effect of $N_D$. This experiment was conducted on the San Joaquin dataset. This dataset includes 19098 data objects; therefore, we randomly generated approximately 30000 additional data objects on different edges. In Figure 10, we evaluate the performance of COSK and CMTkSK+ by varying the cardinality of the data objects. Note that $N_D$ = 10K corresponds to a low density of data points, while $N_D$ = 50K corresponds to a high density. In Figure 10(a), it is interesting to notice that the query processing times of both algorithms decrease as the cardinality of the data objects increases. For CMTkSK+, this is because, with high density, the monitoring range of a query decreases. For COSK, however, it is mainly because, when the data density is high, fewer edges need to be expanded, which decreases the query processing time. In Figure 10(b), we study the influence of the cardinality of the data objects on the communication costs. The experimental results indicate that CMTkSK+ incurs almost constant communication costs regardless of the data object cardinality. However, the communication costs of COSK increase in proportion to the $N_D$ value. This is expected because the safe region becomes smaller as the density of the data objects increases, which increases the communication costs.

7.2.3. Effect of Query Keywords (n). Figure 11 shows the query processing time and communication cost for COSK and CMTkSK+ as a function of the number of query keywords. Figures 11(a) and 11(b) show that the performance of both algorithms degrades when the number of keywords increases. This is mainly because, by increasing the number of query keywords, the number of relevant objects may also increase, resulting in a higher query processing time and communication cost. However, the safe-region-based algorithm COSK scales better than CMTkSK+ because of its less expensive monitoring technique.

7.2.4. Effect of $\alpha$. Figure 12 demonstrates the impact of the query parameter $\alpha$ on the query processing time and on the communication cost. A small value of $\alpha$ indicates a greater importance of textual relevance, whereas a high value of $\alpha$ gives more preference to spatial relevance. It is interesting to note that the query processing time is lower for higher values of $\alpha$, which give more importance to spatial relevance. This is mainly because, when the spatial relevance is weighted more heavily, fewer edges and objects need to be explored and processed to determine the top-k data objects. Observe that in Figure 12(b) the number of messages sent by COSK decreases sharply with an increase in $\alpha$.

Figure 9: Effect of k on query processing time and number of edges processed. (a) Query processing time. (b) Communication cost.

Figure 10: Effect of $N_D$ on query processing time and communication cost. (a) Query processing time. (b) Communication cost.

7.2.5. Effect of Speed. Figure 13(a) demonstrates the influence of the speed of the query objects on the query processing time of the COSK and CMTkSK+ algorithms. The experimental results indicate that the performance of CMTkSK+ is not significantly influenced by the speed of the query objects because the candidate objects must be monitored at regular time intervals regardless of the speed. On the other hand, for COSK, the performance gradually decreases as the speed of the query objects increases because the objects leave their respective safe regions more frequently. Figure 13(b) shows the communication costs of COSK and CMTkSK+ with respect to the speed of the query objects. CMTkSK+ incurs almost constant communication costs because a server-initiated request to verify the candidate objects does not depend on the speed. For COSK, the query objects cross safe regions more frequently when the speed is high, which increases the communication costs.

Figure 11: Effect of number of keywords on query processing time and communication cost. (a) Query processing time. (b) Communication cost.

Figure 12: Effect of $\alpha$ on query processing time and communication cost. (a) Query processing time. (b) Communication cost.

7.2.6. Effect of Mobility. Figure 14 shows the effect of mobility $M_{qry}$ (mobility refers to the percentage of query objects that are moving at any timestamp) on the performance of the COSK and CMTkSK+ algorithms. As expected, the query processing time and communication costs for both algorithms increase with $M_{qry}$. Nevertheless, COSK performs better than CMTkSK+ in terms of query processing time and communication costs.

7.2.7. Effect of Directed Edges. Figure 15 shows the impact of the percentage of directed edges $E_{dir}$ on the performance of the COSK and CMTkSK+ algorithms. The query processing time increases with $E_{dir}$ because each algorithm needs to explore more edges to answer top-k spatial keyword queries. However, the communication cost is not significantly affected by the value of $E_{dir}$ for either algorithm.

7.2.8. Effect of Datasets. Figure 16 demonstrates the index sizes of the COSK and CMTkSK+ approaches for different datasets. As shown in Figure 16, both algorithms have similar index sizes. However, COSK has a minor space overhead because it stores additional information on the highest significance factor $\theta_t$ of edges. More importantly, this space overhead is minimal compared to the gain achieved by COSK in query processing time and communication costs.

Figure 13: Effect of speed on query processing time and communication cost. (a) Query processing time. (b) Communication cost.

Figure 14: Effect of mobility on query processing time and communication cost. (a) Query processing time. (b) Communication cost.

7.3. Experimental Results of Top-k Spatial Keyword Queries in Dynamic Road Networks

In this section, we evaluate the performance of COSK and the basic algorithm for dynamic road networks. $E_{upd}$ indicates the percentage of all edges that change their weight at each timestamp. The length of an updated edge is randomly selected between 0.1 and 10 times the original length. Figure 17(a) depicts the query processing time of COSK and the basic algorithm. It is evident from the figure that the query processing time of the basic algorithm is not significantly affected by $E_{upd}$. This is mainly because the query objects issue top-k spatial queries at each timestamp. However, the query processing time of COSK increases with the value of $E_{upd}$ because the probability that an updated edge is associated with the monitoring region of query q increases with $E_{upd}$. Therefore, when $E_{upd}$ becomes large, the results need to be updated frequently, which increases the query processing time. Figure 17(b) shows the communication costs of COSK and the basic algorithm with respect to $E_{upd}$. The basic algorithm incurs almost constant communication costs regardless of the value of $E_{upd}$. In contrast, the communication cost of COSK increases with $E_{upd}$ because the query result and safe regions need to be updated frequently.
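The edge update workload described above can be sketched as follows. This is a minimal illustration, assuming that the updated edges are chosen uniformly at random and that the scaling factor is drawn uniformly from [0.1, 10]; the text states only that the new length is "randomly selected", so both choices and the function name `apply_random_updates` are assumptions.

```python
import random
from typing import Dict, Tuple

Edge = Tuple[str, str]  # hypothetical edge identifier: (start node, end node)


def apply_random_updates(
    original: Dict[Edge, float],
    e_upd: float,
    rng: random.Random,
) -> Dict[Edge, float]:
    """Return the edge weights for one timestamp.

    e_upd is the percentage (0 to 100) of edges whose weight changes.
    Each selected edge gets a weight between 0.1 and 10 times its
    original weight (uniform sampling is an assumption of this sketch).
    """
    edges = sorted(original)
    n_updated = round(len(edges) * e_upd / 100.0)
    selected = set(rng.sample(edges, n_updated))
    return {
        edge: weight * rng.uniform(0.1, 10.0) if edge in selected else weight
        for edge, weight in original.items()
    }
```

Scaling from the original weight rather than the current one keeps every weight within a fixed band of the original length across timestamps.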

Figure 15: Effect of $E_{dir}$ on query processing time and communication cost. (a) Query processing time. (b) Communication cost.

Figure 16: Effect of dataset on index size.

8. Conclusion

In this paper, we investigated moving top-k spatial keyword queries in directed and dynamic road networks. We presented an efficient indexing framework using inverted files that indexes the data objects on edges, allowing for the effective searching of data objects relevant to queries in terms of both textual and spatial relevance. We also presented a safe-exit-based algorithm called COSK to monitor moving top-k spatial keyword queries. We demonstrated that the query results remain valid as long as the query object resides within a safe region. Furthermore, COSK can effectively monitor the validity of query results and safe regions in dynamic road networks. Finally, an experimental evaluation conducted on real road networks demonstrated that COSK significantly reduced the query processing time and communication costs compared to the CMTkSK+ algorithm.

Data Availability

The real road network data used in this study are also used in many previous studies. The road network data are cited in the manuscript and are available at https://www.cs.utah.edu/~lifeifei/SpatialDataset.htm. To simulate the moving queries, the authors used the spatiotemporal data generator, which is also used in previous studies. The research article describing the generator is cited in the manuscript. The documentation and source files of the generator are available at https://iapg.jade-hs.de/personen/brinkhoff/generator/. The authors used Twitter tweets to generate the descriptions of data objects and the query keywords. The tweets used are accessible at http://followthehashtag.com/datasets/free-twitter-dataset-usa-200000-free-usa-tweets/.

Figure 17: Effect of $E_{upd}$ on query processing time and communication cost. (a) Query processing time. (b) Communication cost.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

Hyung-Ju Cho was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean Government (MSIP) (NRF-2016R1A2B4009793), and this research was partially supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2016R1D1A1B03934129).

References

[1] D. Papadias, N. Mamoulis, J. Zhang, and Y. Tao, "Query processing in spatial network databases," in Proceedings of the 29th International Conference on Very Large Data Bases (VLDB '03), pp. 802–813, September 2003.

[2] H.-J. Cho, K. Ryu, and T.-S. Chung, "An efficient algorithm for computing safe exit points of moving range queries in directed road networks," Information Systems, vol. 41, pp. 1–19, 2014.

[3] G. Tsatsanifos and A. Vlachou, "On processing Top-k spatio-textual preference queries," in Proceedings of the 18th International Conference on Extending Database Technology (EDBT '15), pp. 433–444, March 2015.

[4] R. Li, A. X. Liu, A. L. Wang, and B. Bruhadeshwar, "Fast range query processing with strong privacy protection for cloud computing," Proceedings of the VLDB Endowment, vol. 7, no. 14, pp. 1953–1964, 2014.

[5] G. Cong, C. S. Jensen, and D. Wu, "Efficient retrieval of the Top-k most relevant spatial web objects," Proceedings of the VLDB Endowment, vol. 2, no. 1, pp. 337–348, 2009.

[6] Z. Li, K. C. K. Lee, B. Zheng, W.-C. Lee, D. Lee, and X. Wang, "IR-tree: An efficient index for geographic document search," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 4, pp. 585–599, 2011.

[7] Y. Zhou, X. Xie, C. Wang, Y. Gong, and W. Ma, "Hybrid index structures for location-based web search," in Proceedings of the 14th ACM International Conference on Information and Knowledge Management, pp. 155–162, Bremen, Germany, October 2005.

[8] J. Zobel and A. Moffat, "Inverted files for text search engines," ACM Computing Surveys, vol. 38, no. 2, 2006.

[9] N. Beckmann, H. Kriegel, R. Schneider, and B. Seeger, "The R*-tree: an efficient and robust access method for points and rectangles," in Proceedings of the ACM SIGMOD International Conference on Management of Data, vol. 19, pp. 322–331, May 1990.

[10] R. Hariharan, B. Hore, C. Li, and S. Mehrotra, "Processing spatial-keyword (sk) queries in geographic information retrieval (gir) systems," in Proceedings of the 19th International Conference on Scientific and Statistical Database Management (SSDBM '07), July 2007.

[11] I. De Felipe, V. Hristidis, and N. Rishe, "Keyword search on spatial databases," in Proceedings of the 24th International Conference on Data Engineering (ICDE '08), pp. 656–665, April 2008.

[12] J. B. Rocha-Junior, O. Gkorgkas, S. Jonassen, and K. Nørvåg, "Efficient processing of top-k spatial keyword queries," in Proceedings of the International Symposium on Spatial and Temporal Databases, pp. 205–222, Springer, 2011.

[13] D. Zhang, K.-L. Tan, and A. K. Tung, "Scalable top-k spatial keyword search," in Proceedings of the 16th International Conference on Extending Database Technology, pp. 359–370, 2013.


[14] J. B. Rocha-Junior and K. Nørvåg, "Top-k spatial keyword queries on road networks," in Proceedings of the 15th International Conference on Extending Database Technology, pp. 168–179, Berlin, Germany, March 2012.

[15] H.-J. Cho, S. J. Kwon, and T.-S. Chung, "A safe exit algorithm for continuous nearest neighbor monitoring in road networks," Mobile Information Systems, vol. 9, no. 1, pp. 37–53, 2013.

[16] D. Yung, M. L. Yiu, and E. Lo, "A safe-exit approach for efficient network-based moving range queries," Data & Knowledge Engineering, vol. 72, pp. 126–147, 2012.

[17] M. Attique, H. Cho, R. Jin, and T. Chung, "Efficient Processing of Continuous Reverse k Nearest Neighbor on Moving Objects in Road Networks," ISPRS International Journal of Geo-Information, vol. 5, no. 12, p. 247, 2016.

[18] H. G. Elmongui, M. F. Mokbel, and W. G. Aref, "Continuous aggregate nearest neighbor queries," GeoInformatica, vol. 17, no. 1, pp. 63–95, 2013.

[19] D. Wu, M. L. Yiu, C. S. Jensen, and G. Cong, "Efficient continuously moving top-k spatial keyword query processing," in Proceedings of the IEEE International Conference on Data Engineering (ICDE '11), pp. 541–552, Hannover, Germany, April 2011.

[20] W. Huang, G. Li, K.-L. Tan, and J. Feng, "Efficient safe-region construction for moving top-k spatial keyword queries," in Proceedings of the 21st ACM International Conference on Information and Knowledge Management, pp. 932–941, 2012.

[21] L. Guo, J. Shao, H. H. Aung, and K.-L. Tan, "Efficient continuous top-k spatial keyword queries on road networks," GeoInformatica, vol. 19, no. 1, pp. 29–60, 2014.

[22] Y. Li, G. Li, L. Shu, Q. Huang, and H. Jiang, "Continuous monitoring of top-k spatial keyword queries in road networks," Journal of Information Science and Engineering, vol. 31, no. 6, pp. 1831–1848, 2015.

[23] M. Attique, A. Khan, and T.-S. Chung, "ESPAK: Top-k spatial keyword query processing in directed road networks," in Proceedings of the Workshops of the International Conference on Extending Database Technology and the International Conference on Database Theory (EDBT/ICDT '17), March 2017.

[24] G. Salton and C. Buckley, "Term-weighting approaches in automatic text retrieval," Information Processing & Management, vol. 24, no. 5, pp. 513–523, 1988.

[25] V. N. Anh, O. de Kretser, and A. Moffat, "Vector-space ranking with effective early termination," in Proceedings of the 24th Annual International ACM SIGIR Conference, pp. 35–42, New Orleans, LA, USA, 2001.

[26] E. W. Dijkstra, "A note on two problems in connexion with graphs," Numerische Mathematik, vol. 1, pp. 269–271, 1959.

[27] "Real datasets for spatial databases," https://www.cs.utah.edu/~lifeifei/SpatialDataset.htm.

[28] "Twitter," https://twitter.com.

[29] T. Brinkhoff, "A framework for generating network-based moving objects," GeoInformatica, vol. 6, no. 2, pp. 153–180, 2002.


Figure 1: Illustration of directed road network. Nodes $n_1$ to $n_7$; query location q; data objects $d_1$ (Grand Hotel), $d_2$ (Cafe), $d_3$ (Italian Restaurant), $d_4$ (Chinese Restaurant), $d_5$ (Pub and Bar), $d_6$ (Italian Restaurant), and $d_7$ (Cafe and Bakery).

modeled as a weighted directed graph, where each edge has some direction and its weight can vary according to the traffic conditions.

Given a set of data objects $D = \{d_1, d_2, \ldots, d_{|D|}\}$, a query location, and a set of keywords, the TkSK query returns the best k data objects from D according to their combined textual and spatial relevance to the query. We use the distance function $dist(q, d)$ to represent the shortest network distance from q to data object d. Figure 1 presents an example of a directed road network, where rectangles represent the data objects with a textual description and the triangle represents the query location. The number label on each edge indicates the weight of that edge, such as the amount of time required to travel along it, e.g., $dist(d_1, n_1) = 1$ and $dist(d_1, n_2) = 2$. Consider a scenario where a tourist is interested in finding an "Italian Restaurant". If an undirected road network is considered, the top-1 "Italian Restaurant" is $d_6$. However, in a directed road network, the shortest path from q to $d_6$ is $(q \rightarrow n_3 \rightarrow n_7 \rightarrow d_6)$. Therefore, for a directed road network, the top-1 result is $d_3$ because it is closer to the query location than $d_6$. Now consider that the tourist is looking for "Cafe Bakery". The data object $d_7$ could score higher than data object $d_2$ because $d_7$ ("Cafe and Bakery") is more textually relevant to the query keywords than $d_2$ ("Cafe") and $dist(q, d_7)$ is only marginally greater than $dist(q, d_2)$.
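To make the effect of edge direction concrete, the following sketch computes shortest network distances with Dijkstra's algorithm on a small directed graph. The graph, its node names, and its weights are hypothetical and are not taken from Figure 1; the sketch only illustrates that the distance from a to b can differ from the distance from b to a once some edges are one-way.

```python
import heapq
from typing import Dict, List, Tuple

# Adjacency list of a directed graph: node -> [(neighbor, edge weight), ...]
Graph = Dict[str, List[Tuple[str, float]]]


def network_distance(graph: Graph, source: str, target: str) -> float:
    """Length of the shortest directed path from source to target (inf if none)."""
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        dist_u, u = heapq.heappop(heap)
        if u == target:
            return dist_u
        if dist_u > best.get(u, float("inf")):
            continue  # stale heap entry, a shorter path to u was already found
        for v, weight in graph.get(u, []):
            candidate = dist_u + weight
            if candidate < best.get(v, float("inf")):
                best[v] = candidate
                heapq.heappush(heap, (candidate, v))
    return float("inf")


# Hypothetical network: the edge a -> b is one-way; a-c and b-c are two-way.
TOY_NETWORK: Graph = {
    "a": [("b", 1.0), ("c", 2.0)],
    "b": [("c", 3.0)],
    "c": [("a", 2.0), ("b", 3.0)],
}
```

In this toy network, travelling from a to b uses the one-way edge directly, whereas travelling from b to a must detour through c.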

Moving top-k spatial keyword queries in directed and dynamic road networks are useful for many location-based applications. However, query processing is costly because the movement of query object q may invalidate the query results. Therefore, the main challenge in moving TkSK queries is to maintain the freshness of the query results when the query objects are moving freely. A straightforward approach is to increase the update frequency of the query. However, this approach not only compromises the up-to-date query results but also increases the computation and communication overhead: whenever the query object changes its location, it has to report its location to the server, which increases the communication cost, and the server has to recompute the results, which increases the computation cost.

To address the aforementioned challenges, we first present an efficient processing technique for snapshot TkSK queries in directed road networks. Then, we present a safe-exit-based approach for processing and monitoring moving TkSK queries, where query object q is freely moving in a directed spatial network. The safe exit point of query object q represents a boundary point between the safe region and the non-safe region of q. A safe region of a query object indicates that the query result remains valid as long as the query object lies within its respective safe region. Therefore, the query results are recomputed only when q leaves its respective safe region, which significantly reduces the computation and communication costs. To the best of our knowledge, this is the first attempt to study moving top-k spatial keyword queries in directed and dynamic road networks.
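The saving in communication can be illustrated with a simplified client-side sketch. It assumes a query object moving along a one-dimensional route, so that a safe region is an interval bounded by two safe exit points; `request_safe_region` and `fixed_segments` are hypothetical stand-ins for the server and do not reflect how COSK actually computes safe exit points.

```python
import math
from typing import Callable, Iterable, Optional, Tuple

# Safe region on a 1-D route, bounded by two safe exit points (assumed model).
Interval = Tuple[float, float]


def count_server_contacts(
    positions: Iterable[float],
    request_safe_region: Callable[[float], Interval],
) -> int:
    """Count how often a client contacts the server while moving along a route."""
    contacts = 0
    safe: Optional[Interval] = None
    for position in positions:
        if safe is None or not (safe[0] <= position <= safe[1]):
            # First request, or the client has passed a safe exit point:
            # report the location and receive a new safe region.
            safe = request_safe_region(position)
            contacts += 1
    return contacts


def fixed_segments(position: float) -> Interval:
    """Hypothetical server: safe regions are consecutive segments of length 10."""
    start = 10.0 * math.floor(position / 10.0)
    return (start, start + 10.0)
```

With positions 0, 1, ..., 29 and `fixed_segments` as the server, the client contacts the server only when it crosses a segment boundary, whereas a client that reports at every timestamp would send 30 location updates.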

Below, we summarize our contributions.

(i) We study the problem of continuous monitoring of moving top-k spatial keyword queries in directed and dynamic road networks.

(ii) We present an algorithm to monitor moving TkSK queries, which efficiently computes the safe exit points for query object q in a directed road network. The algorithm significantly reduces the computation and communication costs for moving queries.

(iii) We also propose a method that monitors the validity of the query results and the safe region when the weights of road segments are updated due to traffic conditions.

(iv) Finally, we conduct extensive experiments on real road network datasets and demonstrate the superiority of the proposed algorithm over the existing approach.

The remainder of this paper is structured as follows. Section 2 reviews the existing work on the processing of TkSK queries in Euclidean space and road networks. Section 3 provides terminology definitions and describes the problem. Section 4 elaborates on the proposed query processing technique for TkSK queries in directed road networks. In Section 5, we present our safe-exit-based technique to process moving TkSK queries. Section 7 presents a performance analysis of the proposed technique. Section 8 concludes this paper.

2. Related Work

In this section, we discuss some of the promising related studies of top-k spatial keyword queries. Our related work is divided into two sections. Section 2.1 reviews snapshot TkSK queries, and Section 2.2 presents the studies proposed to address moving TkSK queries.

2.1. Snapshot Top-k Spatial Keyword Queries. In recent years, spatial keyword queries have drawn the attention of many researchers. Several approaches have been proposed for ranking spatial data objects. Initially, Zhou et al. [7] worked on combining inverted indexes [8] and R-trees [9]. They proposed three different hybrid indexing structures. Their study demonstrated that building an inverted index on top of an R-tree provides superior performance. Hariharan et al. [10] proposed the indexing structure KR*-tree by capturing the joint distribution of keywords in space. De Felipe et al. [11] proposed a data structure that combines an R-tree with text signatures. Each node of the R-tree exploits a signature to indicate the presence of keywords in the subtree of the node. However, both these approaches address only Boolean keyword queries in Euclidean space.

Top-k spatial keyword queries, where data objects are ranked according to their combined textual and spatial relevance to keyword queries, were first studied by Cong et al. [5] and Li et al. [6]. Both studies [5, 6] integrate location indexing and text indexing to generate IR-trees. These studies process top-k spatial keyword queries only in Euclidean space and are not suitable for processing top-k spatial preference queries in road networks, where the distance between objects is determined by the shortest path connecting them. Later, Rocha et al. [12] proposed the indexing technique S2I, which maps each term in the vocabulary into a separate block or aR-tree for efficient processing of top-k spatial keyword queries. Zhang et al. [13] proposed an m-closest keyword query that returns the closest objects based on distance that match m query keywords.

Top-k spatial keyword queries in road networks were introduced by Rocha et al. [14]. In particular, they proposed three different indexing techniques (Basic Indexing, Enhanced Indexing, and Overlay Indexing) for processing spatial keyword queries in road networks.

2.2. Moving Top-k Spatial Keyword Queries. Recently, research focus has shifted to the continuous processing of spatial queries where query or data objects are arbitrarily moving in road networks, which is the most realistic scenario. Considerable research effort has been undertaken to process moving range, k nearest neighbor (kNN), and reverse k nearest neighbor (RkNN) queries [15–18]. However, there is a lack of efficient algorithms for moving top-k spatial keyword queries. Initially, Wu et al. [19] and Huang et al. [20] proposed different methods for monitoring top-k spatial keyword queries in Euclidean space. Guo et al. [21] studied moving top-k spatial keyword queries on road networks. They presented two methods for monitoring moving queries in a continuous manner that reduce the traversal of network edges. Later, Li et al. [22] proposed TPR-tree-based indexing to monitor moving top-k spatial keyword queries. In contrast to [21, 22], in this study, we consider moving top-k spatial keyword queries in directed and dynamic road networks, where each road segment has a particular orientation and its weight changes according to traffic conditions.

Table 1 compares our problem scenario with related work in terms of query type, space domain, and orientation of road networks.

Table 1: Comparisons with existing solutions.

Algorithm            Type        Space Domain    Orientation
Cong et al. [5]      Snapshot    Euclidean       No orientation
Rocha et al. [14]    Snapshot    Static Road     Undirected
Wu et al. [19]       Moving      Euclidean       No orientation
Huang et al. [20]    Moving      Euclidean       No orientation
Guo et al. [21]      Moving      Static Road     Undirected
Li et al. [22]       Moving      Static Road     Undirected
COSK                 Moving      Dynamic Road    Directed

3. Preliminaries

Section 3.1 defines the terms and notations used in this paper. Section 3.2 formulates the problem using an example that illustrates the general results of top-k spatial keyword queries.

31 Definition of Terms and Notations

3.1.1. Road Network. A road network is represented by a weighted directed graph $G = (N, E, W)$, where $N$, $E$, and $W$ denote the node set, edge set, and edge distance matrix, respectively. The network distance of an edge changes depending on the traffic conditions. Each edge is also assigned an orientation that is either undirected or directed. An undirected edge is represented by $e = (n_s, n_e)$, where $n_s$ and $n_e$ are the boundary nodes $n_\beta$ of the edge, whereas a directed edge is represented by $e = \overrightarrow{(n_s, n_e)}$ or $e = \overleftarrow{(n_e, n_s)}$. Naturally, the arrow above the edge indicates the associated direction. We refer to $n_s$ as the starting node and $n_e$ as the ending node of an edge. For example, in Figure 1, $n_6$ is the starting node of edge $\overrightarrow{(n_6, n_2)}$, whereas it is the ending node of edge $\overleftarrow{(n_6, n_5)}$. The particular edge where a query object is located is called an active edge. It is important to note that the distance between two points $p_1$ and $p_2$ is not symmetrical in directed road networks (i.e., in general, $dist(p_1, p_2) \neq dist(p_2, p_1)$). For example, in Figure 1, $dist(d_3, d_4) = 3$, whereas $dist(d_4, d_3) = 11$ because the shortest path from $d_4$ to $d_3$ is $(d_4 \rightarrow n_6 \rightarrow n_2 \rightarrow n_3 \rightarrow d_3)$.

3.1.2. Segment. Segment $s = (p_1, p_2)$ is the part of an edge between two points $p_1$ and $p_2$ on the edge. An edge consists


of one or more segments. An edge is also considered a segment whose end points are the nodes of the edge. The weight of a segment $s = (p_1, p_2)$ is denoted by $W(s)$.

3.2. Problem Formulation. Similar to previous studies [5, 14, 23], we assume that each data object $d \in D$ has a point location $d.l$ in the road network and a text description $d.t$. Given a query location $q.l$, a set of keywords $q.t$, and the number $k$ of data objects to return, the top-k spatial keyword query is defined as $Q_k = (q.l, q.t, k)$, which takes three arguments and returns the best $k$ data objects from $D$ according to a score that considers spatial proximity and text relevance. The score $\psi(d)$ of a data object $d$ is defined by the following equation:

$$\psi(d) = \frac{\mu(d.t, q.t)}{1 + \alpha \cdot \lambda(d.l, q.l)} \qquad (1)$$

where $\lambda(d.l, q.l)$ is the spatial relevance between $d.l$ and $q.l$, $\mu(d.t, q.t)$ is the textual relevance between $d.t$ and $q.t$, and $\alpha$ is a nonnegative real number that determines the importance of one measure relative to the other. For example, if only textual relevance is considered, then $\alpha = 0$. If more importance is given to spatial relevance, then $\alpha > 1$.
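As a minimal sketch of how Equation (1) trades textual relevance against network distance, the following Python snippet ranks two hypothetical objects for two values of $\alpha$. The function name `score` and the sample values are assumptions made for illustration only; they are not taken from the paper's implementation.

```python
def score(text_rel: float, net_dist: float, alpha: float = 1.0) -> float:
    """Score of Eq. (1): textual relevance damped by the network distance."""
    if alpha < 0 or net_dist < 0:
        raise ValueError("alpha and net_dist must be non-negative")
    return text_rel / (1.0 + alpha * net_dist)


# Hypothetical objects: (textual relevance, network distance to the query).
objects = {"a": (0.9, 6.0), "b": (0.5, 1.0)}


def rank(alpha: float) -> list:
    """Object ids ordered from the highest to the lowest score."""
    return sorted(objects, key=lambda o: score(*objects[o], alpha), reverse=True)


# alpha = 0 ignores distance, so the textually stronger object "a" wins;
# alpha = 1 lets the much closer object "b" overtake it.
ranking_text_only = rank(0.0)
ranking_balanced = rank(1.0)
```

With $\alpha = 0$ the ranking is purely textual, whereas with $\alpha = 1$ the closer object is ranked first even though its textual relevance is lower.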

Spatial relevance ($\lambda$) is defined as the shortest network distance between the data object $d$ and the query $q$: $\lambda(d.l, q.l) = dist(d.l, q.l)$. Thus, $dist(d_i.l, q.l) < dist(d_j.l, q.l)$ indicates that data object $d_i$ is more spatially relevant to $q$ than data object $d_j$. The textual relevance ($\mu$) can be computed using any popular information retrieval model, such as cosine similarity or the language model. In this study, we use the cosine similarity between $d.t$ and $q.t$. The textual relevance is defined as follows:

$$\mu(d.t, q.t) = \frac{\sum_{t \in q.t} w_t(d.t)\, w_t(q.t)}{\sqrt{\sum_{t \in d.t} [w_t(d.t)]^2 \, \sum_{t \in q.t} [w_t(q.t)]^2}} \qquad (2)$$

The weight $w_t(d.t) = 1 + \ln(f_t(d.t))$, where $f_t(d.t)$ represents the frequency of term $t$ in $d.t$. The weight $w_t(q.t) = \ln(1 + |D|/df_t)$, where $|D|$ is the number of objects in $D$ and $df_t$ is the document frequency of term $t$. A higher $\mu$ means a higher textual relevance to the query keywords. We used the variation of cosine similarity based on the significance factor $\theta_t(n)$ of term $t$ in a document $n$, where $n$ represents the description $d.t$ of a data object or the query keywords $q.t$. The significance $\theta_t(n) = w_t(n) / \sqrt{\sum_{t \in n} (w_t(n))^2}$ is the weight of the term in the document, normalized by the length of the document [24, 25]. Hence, the textual relevance $\mu(d.t, q.t)$ can be rewritten as

$$\mu(d.t, q.t) = \sum_{t \in q.t} \theta_t(d.t) \cdot \theta_t(q.t) \qquad (3)$$
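The equivalence of Equations (2) and (3) can be checked numerically. The sketch below is illustrative only: the helper names, the sample description, and the document frequencies are assumptions, and it presumes that every query keyword occurs in at least one object so that $df_t > 0$.

```python
import math
from collections import Counter


def doc_weights(terms):
    """w_t(d.t) = 1 + ln(f_t(d.t)) for every term t of the description."""
    return {t: 1.0 + math.log(f) for t, f in Counter(terms).items()}


def query_weights(terms, doc_freq, n_objects):
    """w_t(q.t) = ln(1 + |D| / df_t) for every query keyword t."""
    return {t: math.log(1.0 + n_objects / doc_freq[t]) for t in set(terms)}


def significance(weights):
    """theta_t(n): each weight divided by the Euclidean norm of all weights."""
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()}


def mu_eq2(wd, wq):
    """Cosine similarity written as in Eq. (2)."""
    num = sum(wd.get(t, 0.0) * wq[t] for t in wq)
    den = math.sqrt(sum(w * w for w in wd.values()) *
                    sum(w * w for w in wq.values()))
    return num / den


def mu_eq3(wd, wq):
    """Same quantity as a sum of products of significance factors, Eq. (3)."""
    td, tq = significance(wd), significance(wq)
    return sum(td.get(t, 0.0) * tq[t] for t in tq)


# Hypothetical description, keywords, and document frequencies with |D| = 10.
wd = doc_weights(["italian", "restaurant", "pizza", "italian"])
wq = query_weights(["italian", "restaurant"],
                   {"italian": 2, "restaurant": 5}, n_objects=10)
mu2, mu3 = mu_eq2(wd, wq), mu_eq3(wd, wq)
```

Storing $\theta_t$ per object, as Equation (3) suggests, means that the normalization is paid once at indexing time rather than once per query.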

4. Query Processing System

In this section, we present the proposed query processing system, which indexes the data objects and prunes the irrelevant edges for efficient query processing. In Section 4.1, we discuss the indexing framework, and in Section 4.2, we present an efficient keyword query processing algorithm for snapshot queries.

4.1. Indexing Framework. In this study, our main work focuses on moving queries in directed and dynamic road networks. We use a method similar to the enhanced technique presented in [12] as our basic framework for processing snapshot queries in directed and dynamic road networks. The indexing framework combines a road network framework [1] for storing spatial information and an inverted file for indexing data objects. For easy traversal of the network, we store the adjacent nodes of each given node by storing the node id ($n_{id}$), the edge id ($e_{id}$), the direction of the edge, and the weight of the edge. The indexing framework consists of two main components: a pruning component and an inverted file component. Figure 2 illustrates the main components of the indexing framework. The pruning component first prunes the edges that contain only data objects irrelevant to the query keywords. To achieve this, we introduce the highest significance $\theta_t^+$ of a given term $t$ in the descriptions of the objects lying on the edge. The $\theta_t^+$ of an edge is retrieved by a key composed of a pair of edge id and term id $(e_{id}, t_{id})$. The $\theta_t^+$ represents an upper bound on the significance of any object lying on the edge with term $t$ in its description. The inverted list of a term $t$ on an edge is accessed only if the upper-bound score composed of $\theta_t^+$ and the minimum network distance between the starting node of the edge and query $q$ may return a candidate data object. Naturally, the edges with upper-bound scores smaller than the score of the k-th object found so far are pruned.

We implement an inverted file for indexing data objects. The inverted file contains a vocabulary and inverted lists. The vocabulary keeps general information about each term (such as the frequency of the term), which is helpful in computing the textual relevance of the data objects. An inverted list stores the data objects located on the edge $\overrightarrow{(n_s, n_e)}$ that have a term $t$ in their description. An inverted list is identified by a key composed of $(e_{id}, t_{id})$. Each inverted file is a set of inverted lists. A separate inverted list is used for each term in the object description. An inverted list stores two attributes for each data object: first, the distance $dist(n_s, d_i)$ between the starting node and the data object; second, the significance factor $\theta(t_i, d_i)$ of the term $t_i$ in the description of the data object. Note that the network distance between two points in a directed road network is not symmetrical (i.e., $dist(n_s, d_i) \neq dist(d_i, n_s)$). Recall that the starting node is chosen according to the orientation of the edge such that the direction of the edge is from the node toward the data object. In Figure 1, $n_3$ is the starting node for $d_7$. For bidirectional edges, either of the adjacent nodes can act as the starting node.

The proposed indexing scheme has three main advantages. First, the search for objects relevant to the query keywords is very efficient using the $(e_{id}, t_{id})$ pair. Second, the inverted files also store the network distance between the starting node and the data object, which helps in accessing the data object in the directed road network. Finally, the pruning technique allows for faster query processing by exploring fewer edges.
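The interplay between the $(e_{id}, t_{id})$ key, the stored $\theta_t^+$, and the pruning test can be sketched in a few lines of Python. This is a simplified in-memory illustration under stated assumptions, not the authors' disk-based structure: the class name `EdgeKeywordIndex`, its methods, the edge and object ids, and the numeric values are all hypothetical, and the upper bound simply combines $\theta_t^+$ with the distance from the query to the starting node of the edge.

```python
class EdgeKeywordIndex:
    """Inverted lists and upper-bound significances keyed by (edge_id, term_id)."""

    def __init__(self):
        self.lists = {}      # (eid, tid) -> [(dist(n_s, d), object_id, theta), ...]
        self.theta_max = {}  # (eid, tid) -> highest theta of the term on the edge

    def add(self, eid, tid, object_id, dist_from_start, theta):
        key = (eid, tid)
        self.lists.setdefault(key, []).append((dist_from_start, object_id, theta))
        self.theta_max[key] = max(self.theta_max.get(key, 0.0), theta)

    def upper_bound(self, eid, query_theta, dist_to_start, alpha=1.0):
        """Best score that any object on the edge could reach for this query."""
        text = sum(self.theta_max.get((eid, t), 0.0) * w
                   for t, w in query_theta.items())
        return text / (1.0 + alpha * dist_to_start)

    def candidates(self, eid, query_theta, dist_to_start, s_k, alpha=1.0):
        """Objects on the edge scoring above s_k, best first."""
        if self.upper_bound(eid, query_theta, dist_to_start, alpha) <= s_k:
            return []        # edge pruned without reading its inverted lists
        text, offset = {}, {}
        for t, w in query_theta.items():
            for off, obj, theta in self.lists.get((eid, t), []):
                text[obj] = text.get(obj, 0.0) + theta * w
                offset[obj] = off
        scored = [(text[o] / (1.0 + alpha * (dist_to_start + offset[o])), o)
                  for o in text]
        return sorted([(s, o) for s, o in scored if s > s_k], reverse=True)


# Hypothetical content: d3 matches both keywords on edge "e35", d2 is a cafe.
index = EdgeKeywordIndex()
index.add("e35", "italian", "d3", 3.0, 0.7)
index.add("e35", "restaurant", "d3", 3.0, 0.7)
index.add("e34", "cafe", "d2", 1.0, 1.0)
query_theta = {"italian": 0.7, "restaurant": 0.7}

found = index.candidates("e35", query_theta, dist_to_start=1.0, s_k=0.0)
pruned_no_match = index.candidates("e34", query_theta, dist_to_start=1.0, s_k=0.0)
pruned_by_bound = index.candidates("e35", query_theta, dist_to_start=1.0, s_k=0.5)
```

In this sketch, the edge holding only the cafe is discarded because its upper bound is zero, and the matching edge is discarded as soon as the current k-th score exceeds its upper bound of 0.49.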

Table 2 presents the notations used in this study.

4.2. Query Processing Algorithm. Our algorithm traverses the road network incrementally in a similar fashion to Dijkstra's


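Figure 2: Indexing framework. The pruning component (1) computes the upper-bound score using $dist(n, q)$ and $\theta_t^+$, retrieved by the key $\langle e_{id}, t_{id} \rangle$, and (2) accesses the inverted list of a term only if the upper-bound score is greater than the score of the k-th object. The inverted file component consists of a vocabulary ($t_{id}$, $df_{t_{id}}$) and inverted lists keyed by $\langle e_{id}, t_{id} \rangle$ with entries ($d_i$, $dist(n_s, d_i)$, $\theta(d, t)$).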

Table 2: Summary of notations used in this paper.

| Notation | Definition |
|---|---|
| $G = (N, E, W)$ | Graph model of road network |
| $dist(p_s, p_e)$ | Length of shortest path from $p_s$ to $p_e$, where $p_s$ and $p_e$ represent start and end points, respectively |
| $len(p_1, p_2)$ | Length of segment connecting two points $p_1$ and $p_2$ |
| $n_i$ | Node in road network |
| $e = (n_s, n_e)$ | Edge in edge set $E$, where $n_s$ and $n_e$ are start and end points of the edge |
| $n_\beta$ | Boundary node corresponding to start ($n_s$) or end ($n_e$) point of an edge |
| $W(e)$ | Weight of edge $(n_s, n_e)$ |
| $q$ | Query point in road network |
| $k$ | Number of data objects to return |
| $D$ | Set of data objects, $D = \{d_1, d_2, \ldots, d_{\lvert D \rvert}\}$ |
| $D(n_s, n_e)$ | Set of data objects in an edge |
| $p_a$ | Anchor point that corresponds to start point of expansion |
| $P_{SE}$ | Safe exit point where safe and non-safe regions of $q$ intersect |
| $\alpha$ | Query parameter |
| $\psi(d)$ | Score of data object $d$ |
| $\mu(d.t, q.t)$ | Textual relevance of data object $d$ with query keywords |
| $\lambda(d.l, q.l)$ | Spatial relevance of data object $d$ with query location |
| $D^+$ | Set of answer objects |
| $D^-$ | Set of non-answer objects |
| $d_l^+$ | Lowest answer object |
| $d_h^-$ | Highest non-answer object |

algorithm [26]. Algorithm 1 returns the top-k data objects with the highest scores according to their joint textual and spatial relevance to the query. The algorithm begins by exploring the active edge where the query object $q$ is located and expands the network in increasing order of distance from $q$. Each entry in the min-heap has the form $(p_a, edge)$, where $p_a$ indicates the anchor point in the edge. For an active edge, $q$ becomes the anchor point. Otherwise, for directed edges, the ending node $n_e$ becomes the anchor point. For bidirectional edges, either of the adjacent boundary nodes, i.e., $n_s$ or $n_e$, becomes the anchor point. Let $D_k$ be the current set of top-k data objects and $s_k$ be the score of the k-th data object in $D_k$. The $candsearch((e_{id}, t_{id}), s_k)$ function retrieves the candidate data objects $D_c$ located in an edge with a better score $\psi(d)$ than $s_k$. Next, the set $D_k$ is updated with the data objects in $D_c$, and $s_k$ is updated accordingly. The algorithm continues its expansion and inserts the adjacent edges of the boundary node until the heap is exhausted or the remaining data objects cannot have a better score than $s_k$. The upper-bound score $\psi(n)$ of node $n$ is computed using $dist(n, q)$ and the maximum textual relevance ($\mu = 1$). Therefore, if $\psi(n) \le s_k$, then even if there is an unexplored data object $d$ matching all query keywords, its score cannot be better than that of the k-th object in $D_k$ because $dist(d, q.l) \ge dist(n, q.l)$. This is certain because the algorithm always expands the node with the minimum distance to the query location.
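The expansion order and the termination test described above can be illustrated with a compact Python sketch. It is a simplification under explicit assumptions rather than a transcription of Algorithm 1: the query is assumed to sit on a node, every object is assumed to lie on exactly one directed edge, the textual relevance $\mu$ of each object is assumed to be precomputed, and the graph, the edge weights, and the function name `top_k_snapshot` are hypothetical.

```python
import heapq


def top_k_snapshot(graph, objects, source, k, alpha=1.0):
    """Return up to k (score, object_id) pairs, best first.

    graph:   {node: [(neighbor, weight), ...]}, one entry per directed edge
    objects: {(u, v): [(offset_from_u, object_id, mu), ...]}
    """
    if k <= 0:
        return []
    top = []                      # min-heap, so top[0] holds the k-th best score
    dist = {source: 0.0}
    frontier = [(0.0, source)]
    settled = set()
    while frontier:
        d_u, u = heapq.heappop(frontier)
        if u in settled:
            continue
        settled.add(u)
        # Every object not yet scored is at distance >= d_u and has mu <= 1.
        if len(top) == k and 1.0 / (1.0 + alpha * d_u) <= top[0][0]:
            break
        for v, w in graph.get(u, []):
            for offset, obj, mu in objects.get((u, v), []):
                if mu <= 0.0:
                    continue      # no keyword match, never a candidate
                entry = (mu / (1.0 + alpha * (d_u + offset)), obj)
                if len(top) < k:
                    heapq.heappush(top, entry)
                elif entry > top[0]:
                    heapq.heapreplace(top, entry)
            if d_u + w < dist.get(v, float("inf")):
                dist[v] = d_u + w
                heapq.heappush(frontier, (d_u + w, v))
    return sorted(top, reverse=True)


# Hypothetical directed network; the weights are not those of Figure 1.
graph = {"q": [("n3", 1.0)],
         "n3": [("n4", 3.0), ("n5", 4.0), ("n7", 5.0)],
         "n5": [("n6", 3.0)],
         "n7": [("n2", 2.0)]}
objects = {("n3", "n4"): [(1.0, "d2", 0.0)],
           ("n3", "n5"): [(3.0, "d3", 1.0)],
           ("n5", "n6"): [(1.0, "d4", 0.5)]}

top1 = top_k_snapshot(graph, objects, "q", k=1)
top2 = top_k_snapshot(graph, objects, "q", k=2)
```

For `k=1` the loop stops when it reaches a node at distance 4, because even a perfectly matching object beyond that node would score at most $1/(1 + 4) = 0.2$, which does not exceed the score of the object already found.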

Algorithm 2 presents the $candsearch((e_{id}, t_{id}), s_k)$ procedure, which finds the candidate data objects. This procedure has two main steps. In the first step, the upper-bound score of the edge is computed using the significance factor ($\theta_t$) of a term


Input: top-k spatial keyword query $Q_k = (q.l, q.t, k)$
Output: top-k data objects with the highest scores
$D_c \leftarrow \emptyset$ /* set of candidate data objects */
max-heap $D_k \leftarrow \emptyset$ /* current top-k set */
$s_k \leftarrow 0$ /* k-th score in $D_k$ */
min-heap $\leftarrow \emptyset$
$explored \leftarrow \emptyset$
min-heap.insert($q.l$, $edge_{active}$)
$D_c \leftarrow candsearch((e_{id}, t_{id}), s_k)$
update $D_k$ and $s_k$ with $d \in D_c$
while min-heap $\neq \emptyset$ and $1/(1 + \alpha \lambda(d.l, q.l)) > s_k$ do
    for each unexplored adjacent edge of $(p_a, edge)$ do
        $explored \leftarrow explored \cup (p_a, edge)$
        $D_c \leftarrow candsearch((e_{id}, t_{id}), s_k)$
        update $D_k$ and $s_k$ with $d \in D_c$
    end
    min-heap.insert(adjacent node, edge)
end
return $D_k$

Algorithm 1: EvaluateSnapshotQuery(Node $n_i$, Edge $e_i$)

Input: edge ID $e_{id}$, term ID $t_{id}$, score of k-th object $s_k$
Output: candidate list $D_c$
compute $\theta_t(e_i)$
if $\theta_t(e_i) > 0$ then
    $maxscore(e_i) \leftarrow computemaxscore(\theta_t, dist(e_i, q.l))$
    if $maxscore(e_i) > s_k$ then
        for each data object $d$ in $e_i$ do
            compute $d_{score}$
            if $d_{score} > s_k$ then
                $D_c \leftarrow D_c \cup \{d\}$
            end
        end
    end
end
return $D_c$

Algorithm 2: CandidateSearch($(e_{id}, t_{id})$, $s_k$)

$t \in q.t$ and the shortest distance $sdist(e_i, q.l)$ between the edge and the query location. In the next step, the inverted lists of term $t$ are fetched if the upper-bound score of the edge is greater than $s_k$. From the inverted lists, the objects with score $\psi(d)$ greater than $s_k$ are returned.

To understand the proposed algorithm, consider the road network presented in Figure 1. Assume that a query $q$ issues a top-1 keyword query with $q.t$ = “Italian Restaurant”. For ease of presentation, we assume $\alpha = 1$, and the textual relevance $\mu$ is the number of occurrences of query keywords in $d.t$ divided by the number of keywords in the document (the description of the data object). For example, $\psi(d_4) = \mu(d_4.t, q.t)/(1 + \lambda(d_4.l, q.l)) = 0.5/8 \approx 0.06$. The algorithm starts the network expansion from the active edge $\overrightarrow{(n_2, n_3)}$, where $q$ is the anchor point. Note that the direction of the edge $\overrightarrow{(n_2, n_3)}$ is from $n_2$ to $n_3$. Therefore, the algorithm explores only $\overrightarrow{(q, n_3)}$. No data object is found in $\overrightarrow{(q, n_3)}$. Then, $n_3$ becomes the anchor point, and edges $(n_3, n_4)$, $(n_3, n_5)$, and $(n_3, n_7)$ are inserted in the min-heap. Next, the $candsearch$ function retrieves the candidate data objects on edges $(n_3, n_4)$, $(n_3, n_5)$, and $(n_3, n_7)$ whose score is better than $s_k$. On edge $(n_3, n_5)$, data object $d_3$ is retrieved with $\psi(d_3) = 0.2$. Data object $d_3$ is inserted in the set $D_k$, and the value of $s_k$ is set to 0.2. For edges $(n_3, n_4)$ and $(n_3, n_7)$, no candidate object is found because $d_2.t$ (“Cafe”) and $d_7.t$ (“Cafe and Bakery”) do not match $q.t$. The algorithm continues expanding the edges whose upper-bound score is greater than $s_k$. The edge $\overrightarrow{(n_7, n_2)}$ is explored next. The upper-bound score of $\overrightarrow{(n_7, n_2)}$ is $1/7$, which is less than $s_k$. Similarly, for edge $\overleftarrow{(n_6, n_5)}$, the upper-bound score is $0.5/8 < s_k$. Therefore, the algorithm terminates and reports $d_3$ as the top-1 result.


Figure 3: Illustration of directed road network. Query point $q$ issues a TkSK query at $p_1$, and the server returns a set of objects for $p_1$.

Figure 4: Illustration of directed road network. Query point $q$ issues a TkSK query at $p_2$, and the server returns a set of objects for $p_2$.

5. Moving Top-k Spatial Keyword Queries

In this section, we present our method to monitor moving top-k spatial keyword queries, where query objects move in a directed road network. Figure 3 provides an example of a TkSK query in road networks, where query point $q$ issues a TkSK query at point $p_1$. Note that the numbers on the arrows in the figure indicate the order of the steps. To obtain the top-k results at $p_1$, the server executes Algorithm 1, as mentioned in Section 4.2. Now consider that the query object moves to $p_2$, as shown in Figure 4, and needs the top-k results at point $p_2$. The simple method is to repeat the procedure executed at $p_1$. However, recomputing the result whenever query $q$ changes its location significantly increases the computation cost. Furthermore, it also increases the communication overhead because the query object must report its location whenever it moves and the server must send the result set. To address these issues, we introduce the safe exit approach.

In the proposed framework, the server computes safe exit points for a query object. The server maintains a set of moving queries, and the result of a query remains valid as long as the query object remains inside the region bounded by its safe exit points. Whenever a query object passes one of its safe exit points, the server recomputes the TkSK result and the safe exit points for the query object.
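The effect of this protocol on the number of server evaluations can be sketched as follows. The snippet is a toy model under stated assumptions: a safe region is represented as a list of intervals on edges, and `toy_server`, the edge id `"e1"`, and the offset at which the answer changes are hypothetical stand-ins for the snapshot evaluation and the safe exit computation.

```python
def in_safe_region(region, position):
    """region: [(edge_id, low_offset, high_offset), ...]; position: (edge_id, offset)."""
    edge, offset = position
    return any(e == edge and lo <= offset <= hi for e, lo, hi in region)


def monitor(trajectory, evaluate):
    """Re-evaluate the query only when the client leaves its safe region."""
    result, region, recomputations, history = None, [], 0, []
    for position in trajectory:
        if not in_safe_region(region, position):
            result, region = evaluate(position)   # one server round trip
            recomputations += 1
        history.append(result)
    return history, recomputations


def toy_server(position):
    """Hypothetical server: the top-1 answer changes at offset 4 of edge 'e1'."""
    _, offset = position
    if offset < 4.0:
        return ["d3"], [("e1", 0.0, 4.0)]
    return ["d6"], [("e1", 4.0, 10.0)]


trajectory = [("e1", x) for x in (1.0, 2.0, 3.5, 5.0, 6.0)]
history, recomputations = monitor(trajectory, toy_server)
```

Five location updates trigger only two evaluations in this toy run, one for the initial query and one when the client crosses the safe exit point.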

Next, we present our method to compute the safe exit points for a query object. A safe exit point represents a point in the segment where a safe region and a nonsafe region meet. We compute the safe exit point using the divide-and-conquer technique. Before presenting the detailed methodology, we define the terminology used in this section.

Definition 1 (safe region). A portion of a road segment such that, as long as the query point lies in it, the top-k results of the query remain valid.

Definition 2 (answer objects $D^+$). A data object $d$ is called an answer object of query $q$ if $\psi(d) > \psi(d_a)$, where $d_a$ represents any other data object in the directed road network. Similarly, we can generalize this definition for TkSK: a data object $d$ is called an answer object of query $q$ if $\psi(d) > \psi(d_{k+1})$, where $d_{k+1}$ represents the $(k+1)$th data object in the directed road network. In other words, all answer objects are top-k results of query $q$.

Definition 3 (nonanswer objects $D^-$). A data object $d$ is called a nonanswer object of query $q$ if $\psi(d) < \psi(d_a)$, where $d_a$ represents some other data object in the directed road network. Similarly, we can generalize this definition for TkSK: a data object $d$ is called a nonanswer object of query $q$ if $\psi(d) < \psi(d_k)$, where $d_k$ represents the kth data object in the directed road network. In other words, none of the nonanswer objects are in the top-k results of query $q$.

Definition 4 (lowest answer object $D_l^+$). An answer object $d_l^+ \in D^+$ is called the lowest answer object at a point $p \in G$ if $\psi(d_l^+)_p = \min(\psi(d_1^+)_p, \psi(d_2^+)_p, \ldots, \psi(d^+_{|d^+|})_p)$, where $\psi(d_l^+)_p$ represents the score of the lowest answer object at point $p$. In other words, $\psi(d_l^+)_p < \psi(d_a^+)_p$ at point $p$, where $d_a^+$ is any other answer object in the set $D^+$.

Definition 5 (highest nonanswer object $D_h^-$). A nonanswer object $d_h^- \in D^-$ is called the highest nonanswer object at a point $p \in G$ if $\psi(d_h^-)_p = \max(\psi(d_1^-)_p, \psi(d_2^-)_p, \ldots, \psi(d^-_{|d^-|})_p)$, where $\psi(d_h^-)_p$ represents the score of the highest nonanswer object at point $p$. In other words, $\psi(d_h^-)_p > \psi(d_a^-)_p$ at point $p$, where $d_a^-$ is any other nonanswer object in the set $D^-$.

As discussed earlier, the main challenge in the continuous processing of moving TkSK queries is to maintain the validity of the result set because the movement of query objects can invalidate the result set. To monitor the validity of the result set, we propose a safe-region-based approach.

5.1. Computation of Safe Exit Points. In this section, we present our technique to compute the safe exit points. The main goal is to find a point in the road network where the


query result set will change. The result set will change when the score of the highest nonanswer object $D_h^-$ surpasses the score of the lowest answer object $D_l^+$. Generally, the textual relevance score does not change. Therefore, the score of data objects only changes because of the spatial relevance score, which can only change by the movement of query objects. The computation of the safe exit point is based on two key observations.

Observation 1. If $D^+_{n_\beta} = D^+_{p_a}$, there is no safe exit point in the segment.

Explanation. $D^+_{p_a}$ represents the set of answer objects at anchor point $p_a$, whereas $D^+_{n_\beta}$ represents the set of answer objects at boundary node $n_\beta$. As discussed earlier, the safe exit point is the particular point where the query results change. If the query results at the starting node are the same as those at the ending node of a segment or edge, there is no point where the query result changes. Hence, we do not search for a safe exit point in that segment.

Observation 2. If $D^+_{p_a} \neq D^+_{n_\beta}$, there is a safe exit point in the segment.

Explanation. In contrast to Observation 1, if the query results are different at the starting and ending points, then there exists a point where the query results change. Hence, there is a safe exit point in the segment.
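The two observations amount to comparing the answer sets at the two end points of each segment. The following sketch applies that test to the end-point answer sets listed for the example scenario in Table 3; the function name and the textual segment labels are assumptions made for illustration.

```python
def segments_with_safe_exit(segments):
    """Return the segments whose answer sets differ at their two end points.

    segments: iterable of (label, answers_at_anchor, answers_at_boundary).
    """
    return [label for label, at_anchor, at_boundary in segments
            if set(at_anchor) != set(at_boundary)]


# End-point answer sets of the top-1 example scenario (Table 3).
example = [
    ("q->n3",  {"d3"}, {"d3"}),
    ("n3-n4",  {"d3"}, {"d3"}),
    ("n3-n7",  {"d3"}, {"d6"}),
    ("n3->n5", {"d3"}, {"d3"}),
    ("n6<-n5", {"d3"}, {"d4"}),
]
to_search = segments_with_safe_exit(example)
```

Only the two segments returned by the test need the more expensive safe exit computation; the other three are skipped by Observation 1.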

To find the safe region, we consider the following cases.

Case 1 (when $\alpha = 1$ and the textual relevance of the highest nonanswer object and the lowest answer object is the same). In this case, both the textual and spatial relevance have the same importance (i.e., $\alpha = 1$). In addition, the top-k result depends only on the spatial relevance because the textual relevance of both objects is the same. The data object that is closer to query point $q$ becomes the answer object. For an undirected edge, the safe exit point $p_{se}$ is the center point between the lowest answer object and the highest nonanswer object, i.e., the point at which $\max(dist(p_{se}, d_1^+), dist(p_{se}, d_2^+), \ldots, dist(p_{se}, d^+_{|d^+|})) = \min(dist(p_{se}, d_1^-), dist(p_{se}, d_2^-), \ldots, dist(p_{se}, d^-_{|d^-|}))$. However, in the case of a directed edge, where $dist(p_a, n_\beta) \neq dist(n_\beta, p_a)$, the safe exit point is either $d_l^+$ or $p_a$. If $d_l^+ \in (p_a, n_\beta)$, then the safe exit point is $d_l^+$; otherwise, the safe exit point is $p_a$.

Case 2 (when $\alpha = 1$ and the textual relevance of the highest nonanswer object and the lowest answer object is different). In this case, the top-k result depends on all factors, namely, $\alpha$, the spatial relevance, and the textual relevance. Clearly, for undirected edges, the midpoint between the lowest answer object and the highest nonanswer object does not provide a valid safe exit point. Therefore, we introduce the divide-and-conquer technique, which keeps dividing the search space until we obtain the point where the score of the nonanswer object becomes greater than that of the answer object. Typically, the safe exit point should be closer to the data object whose score is lower. Based on this observation, we first compute the midpoint in a similar fashion to Case 1, and then we continue dividing the search space until we find the point. For directed edges, the safe exit point can be computed in a similar fashion to Case 1.

Case 2 also covers the other cases in which the safe exit point is not the midpoint between the lowest answer object and the highest nonanswer object. In these cases, the safe exit point depends on two or more factors. Therefore, the safe exit point can be computed using the aforementioned divide-and-conquer technique. The following are the scenarios where the safe exit point can be computed using Case 2:

(a) When $\alpha \neq 1$ and the textual relevance of the nearest nonanswer object and the farthest answer object is different

(b) When $\alpha \neq 1$ and the textual relevance of the nearest nonanswer object and the farthest answer object is the same

Case 3 (when $\alpha = 0$). This means that the spatial relevance has no effect on the score of data objects. Hence, no monitoring is required for this scenario.
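The midpoint of Case 1 and the divide-and-conquer search of Case 2 can both be expressed as one bisection over the offset $x$ from $p_a$ along an undirected segment. The sketch below is an illustration under stated assumptions, not the authors' code: the distances to the lowest answer object and to the highest nonanswer object are passed in as functions of $x$, the score gap between them is assumed to grow monotonically as the query moves from $p_a$ toward $n_\beta$, and the function names are hypothetical. The directed rule of Case 1 is included as a one-line helper.

```python
def undirected_safe_exit(mu_ans, d_ans, mu_non, d_non, length, alpha=1.0, tol=1e-9):
    """Offset from p_a at which the nonanswer object overtakes the answer object.

    d_ans(x), d_non(x): network distances from the point at offset x to the
    lowest answer object and to the highest nonanswer object, respectively.
    Returns None when the answer does not change inside the segment.
    """
    def gap(x):
        return (mu_non / (1.0 + alpha * d_non(x))
                - mu_ans / (1.0 + alpha * d_ans(x)))

    lo, hi = 0.0, length
    if gap(lo) > 0.0 or gap(hi) <= 0.0:
        return None
    while hi - lo > tol:              # halve the search space each round
        mid = (lo + hi) / 2.0
        if gap(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2.0


def directed_safe_exit(answer_offset):
    """Directed segment: exit at d_l^+ if it lies on the segment, else at p_a."""
    return answer_offset if answer_offset is not None else 0.0


# Distances of the worked example: x + 3 to the answer, 5 - x to the nonanswer.
same_text = undirected_safe_exit(1.0, lambda x: x + 3, 1.0, lambda x: 5 - x, 3.0)
# Hypothetical Case 2 variant: the nonanswer object matches only half as well.
diff_text = undirected_safe_exit(1.0, lambda x: x + 3, 0.5, lambda x: 5 - x, 3.0)
```

With equal textual relevance the search converges to the midpoint $x = 1$, whereas halving the textual relevance of the nonanswer object pushes the safe exit point toward it, to $x = 8/3$.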

Algorithm 3 retrieves the safe exit points using the observations discussed earlier. The core function in this algorithm is ComputeSafeExit($p_a$, $n_\beta$), which finds the safe exit point in a segment between $p_a$ and $n_\beta$. ComputeSafeExit($p_a$, $n_\beta$) is described in detail in Algorithm 4. First, Algorithm 4 determines $d_l^+$ and $d_h^-$ at each point $p \in [p_a, n_\beta]$. Recall that $d_l^+$ is the lowest answer object at $p$, whereas $d_h^-$ is the highest nonanswer object at $p$. Algorithm 4 computes the safe exit point based on the cases discussed earlier. There are two further scenarios for each of Cases 1 and 2. For Case 1, if $dist(p_a, n_\beta) = dist(n_\beta, p_a)$, then the safe exit point is the midpoint between $d_l^+$ and $d_h^-$. If $dist(p_a, n_\beta) \neq dist(n_\beta, p_a)$, then the edge is directed, and therefore the safe exit point is either $p_a$ or $d_l^+$. If $d_l^+$ lies on the edge $[p_a, n_\beta]$, then $d_l^+$ is the safe exit point. Otherwise, $p_a$ is the safe exit point.

Similarly, for Case 2, if $dist(p_a, n_\beta) = dist(n_\beta, p_a)$, then the safe exit point is computed by repeatedly halving the search space until we find the point closest to $p_a$ such that $\psi(d_h^-) > \psi(d_l^+)$. If $dist(p_a, n_\beta) \neq dist(n_\beta, p_a)$, the safe exit point is computed in the same way as in the directed scenario of Case 1.

5.2. Computation of Safe Exit Points for Example. Consider the same example in Figure 1, where the query point $q$ issues a top-1 keyword query with $q.t$ = “Italian restaurant”. For this example, let us consider $\alpha = 1$. The monitoring algorithm starts exploring from the active edge containing the query object $q$. Therefore, $\overrightarrow{(q, n_3)}$ is explored first. As shown in Table 3, for $\overrightarrow{(q, n_3)}$, $D_q^+ = \{d_3\}$ and $D_{n_3}^+ = \{d_3\}$. According to Observation 1, no safe exit point exists in this segment. Therefore, the edges adjacent to $n_3$ are explored, and $n_3$ becomes the new $p_a$. The edge $(n_3, n_4)$ is explored next. Similarly, the answer object at $n_3$ and $n_4$ is the same: $D_{n_3}^+ = D_{n_4}^+ = \{d_3\}$. Therefore, a safe exit point does not exist in $(n_3, n_4)$. The edge $(n_3, n_7)$ is explored next. As shown in Table 3, $D_{n_3}^+ = \{d_3\}$ and $D_{n_7}^+ = \{d_6\}$. By Observation 2, there is a safe exit point in $(n_3, n_7)$. As shown in Figure 1, $d_3.t = d_6.t$ = “Italian Restaurant” and $dist(n_3, n_7) = dist(n_7, n_3)$.


Input: same as Algorithm 1
Output: $P_{SE}$, a set of safe exit points
$P_{SE} \leftarrow \emptyset$ /* set of safe exit points */
$D^+_{p_a} \leftarrow EvaluateSnapshotQuery(p_a, (p_a, n_\beta))$ /* results calculated using Algorithm 1 */
$D^+_{n_\beta} \leftarrow EvaluateSnapshotQuery(n_\beta, (p_a, n_\beta))$ /* results calculated using Algorithm 1 */
if $D^+_{p_a} = D^+_{n_\beta}$ then
    no safe exit point /* refer to Observation 1 */
end
if $D^+_{p_a} \neq D^+_{n_\beta}$ then
    $P_{SE} \leftarrow P_{SE} \cup ComputeSafeExit(p_a, n_\beta)$ /* a safe exit point exists; refer to Observation 2 */
end
return $P_{SE}$

Algorithm 3: COSK monitoring algorithm

Input: same as Algorithm 1
Output: $p_{se}$, the safe exit point in $(p_a, n_\beta)$
$D_l^+ \leftarrow \langle p, d_l^+ \rangle$ for each point $p \in [p_a, n_\beta]$, with $d_l^+$ such that $\psi(d_l^+)_p = \min(\psi(d_1^+)_p, \psi(d_2^+)_p, \ldots, \psi(d^+_{|d^+|})_p)$
$D_h^- \leftarrow \langle p, d_h^- \rangle$ for each point $p \in [p_a, n_\beta]$, with $d_h^-$ such that $\psi(d_h^-)_p = \max(\psi(d_1^-)_p, \psi(d_2^-)_p, \ldots, \psi(d^-_{|d^-|})_p)$
if Case 1 then
    if $dist(p_a, n_\beta) = dist(n_\beta, p_a)$ then
        $p_{se} \leftarrow$ point at which $\max(dist(p_{se}, d_1^+), dist(p_{se}, d_2^+), \ldots, dist(p_{se}, d^+_{|d^+|})) = \min(dist(p_{se}, d_1^-), dist(p_{se}, d_2^-), \ldots, dist(p_{se}, d^-_{|d^-|}))$
    end
    if $dist(p_a, n_\beta) \neq dist(n_\beta, p_a)$ then
        $p_{se} \leftarrow d_l^+$ if $d_l^+ \in (p_a, n_\beta)$; otherwise $p_{se} \leftarrow p_a$
    end
end
if Case 2 then
    if $dist(p_a, n_\beta) = dist(n_\beta, p_a)$ then
        $p_{se} \leftarrow$ closest point to $p_a$ such that $\psi(d_h^-) > \psi(d_l^+)$
    end
    if $dist(p_a, n_\beta) \neq dist(n_\beta, p_a)$ then
        $p_{se} \leftarrow d_l^+$ if $d_l^+ \in (p_a, n_\beta)$; otherwise $p_{se} \leftarrow p_a$ /* same as the directed scenario of Case 1 */
    end
end
return $p_{se}$

Algorithm 4: ComputeSafeExit($p_a$, $n_\beta$)

Therefore, according to Case 1, the safe exit point $p_{se1}$ is the midpoint between $d_3$ and $d_6$. That is, $dist(p_{se1}, d_3) = dist(p_{se1}, d_6)$, where $dist(p_{se1}, d_3) = x + 3$ and $dist(p_{se1}, d_6) = -x + 5$ for $0 < x < 3$. Consequently, $x = 1$, which means that the distance from $n_3$ to $p_{se1}$ is 1.
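The arithmetic behind this midpoint can be checked with a small sketch. It assumes that both objects have the same textual relevance, so that equal distance implies equal score, and that each distance is linear in the offset x from $n_3$; the function name and its parameters are illustrative only.

```python
from typing import Optional


def crossing_offset(base_a: float, slope_a: float,
                    base_b: float, slope_b: float,
                    length: float) -> Optional[float]:
    """Offset x in (0, length) where base_a + slope_a*x == base_b + slope_b*x.

    Returns None when the two lines are parallel or cross outside the segment.
    """
    if slope_a == slope_b:
        return None
    x = (base_b - base_a) / (slope_a - slope_b)
    return x if 0 < x < length else None


# dist(p, d3) = x + 3 and dist(p, d6) = -x + 5 on a segment of length 3.
x_se1 = crossing_offset(3.0, 1.0, 5.0, -1.0, 3.0)
```

Here `x_se1` evaluates to 1.0, the offset of $p_{se1}$ from $n_3$ stated above.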

Next, we determine a safe exit point in $(n_3, n_5)$. As shown in Table 3, the answer object at $n_5$ is the same as that at $n_3$. Hence, no safe exit point exists in this edge. Next, $\overleftarrow{(n_6, n_5)}$ is explored with $p_a = n_5$. According to Table 3, $D^+_{n_6} = d_4$ and $D^+_{n_5} = d_3$. Therefore, a safe exit point exists in this edge. This edge is directed, and for each point $p \in \overleftarrow{(n_6, n_5)}$, the shortest path from p to $d_3$ is $p \rightarrow n_6 \rightarrow n_2 \rightarrow n_3 \rightarrow d_3$. Therefore, $n_5$ is the safe exit point.

The bold lines in Figure 5 indicate the safe region of q. The top-1 result remains $d_3$ as long as the query q lies in the safe region.

Next, we analyze the time complexity of determining a set of safe exit points using a set of qualifying objects $d \in D^+_{p_a} \cup D^+_{n_\beta} \cup D(p_a, n_\beta)$. Note that $D^+_{p_a}$ ($D^+_{n_\beta}$) indicates


Table 3: Computation of safe exit points for example scenario.

| Edge/Segment | $p_a$ | $D^+_{p_a}$ | $D^+_{n_\beta}$ | $p_{se}$ |
| --- | --- | --- | --- | --- |
| $\overrightarrow{(q, n_3)}$ | $q$ | $D^+_q = d_3$ | $D^+_{n_3} = d_3$ | none |
| $(n_3, n_4)$ | $q$ | $D^+_{n_3} = d_3$ | $D^+_{n_4} = d_3$ | none |
| $(n_3, n_7)$ | $n_3$ | $D^+_{n_3} = d_3$ | $D^+_{n_7} = d_6$ | $p_{se1}$ |
| $\overrightarrow{(n_3, n_5)}$ | $n_3$ | $D^+_{n_3} = d_3$ | $D^+_{n_5} = d_3$ | none |
| $\overleftarrow{(n_6, n_5)}$ | $n_5$ | $D^+_{n_5} = d_3$ | $D^+_{n_6} = d_4$ | $p_{se2}$ |

Figure 5: Illustration of safe region of q. The road network contains nodes $n_1$ to $n_7$, safe exit points $p_{se1}$ and $p_{se2}$, and data objects $d_1$ (Grand Hotel), $d_2$ (Cafe), $d_3$ (Italian Restaurant), $d_4$ (Chinese Restaurant), $d_5$ (Pub and Bar), $d_6$ (Italian Restaurant), and $d_7$ (Cafe and Bakery).

the set of k data objects that satisfies the query condition at $p_a$ ($n_\beta$). According to Dijkstra's algorithm [26], the time complexity $O(D^+_q)$ for computing a set of answer objects at a query point q is $O(D^+_q) = O(|E| + |N| \log |N|)$. This means that $O(D^+_{p_a}) = O(D^+_{n_\beta}) = O(|E| + |N| \log |N|)$ holds for endpoints $p_a$ and $n_\beta$. Thus, the time complexity $O(\Omega_{kth})$ of determining the skyline $\Omega_{kth}$ with the k-th highest score is $O(\Omega_{kth}) = C_{kth} \cdot O(|D^+_{p_a} \cup D^+_{n_\beta} \cup D(p_a, n_\beta)|)$, where $C_{kth}$ is the number of qualifying objects that participate in the constitution of the skyline with the k-th highest score. Therefore, the time complexity of determining a safe exit point coincides with the time complexity of determining the two skylines, i.e., the skyline $D^+_l$ with the k-th highest (or lowest) score for answer objects and the skyline $D^-_h$ with the highest score for nonanswer objects. This is because the safe exit point is found at the cross point between these skylines.
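For reference, a minimal sketch of Dijkstra's algorithm on a directed graph is given below. It uses Python's binary heap (`heapq`), which yields $O((|E| + |N|) \log |N|)$ rather than the $O(|E| + |N| \log |N|)$ bound quoted above (the latter corresponds to a Fibonacci-heap implementation). The adjacency list and node names are made up for illustration and are not taken from the example network.

```python
import heapq
from typing import Dict, List, Tuple

Graph = Dict[str, List[Tuple[str, float]]]


def dijkstra(adj: Graph, source: str) -> Dict[str, float]:
    """Shortest network distances from source along directed edges."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, a shorter path to u was already settled
        for v, w in adj.get(u, ()):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Hypothetical directed cycle a -> b -> c -> a.
toy_graph: Graph = {"a": [("b", 1.0)], "b": [("c", 2.0)], "c": [("a", 5.0)]}
from_a = dijkstra(toy_graph, "a")
from_c = dijkstra(toy_graph, "c")
```

In this toy graph the distance from a to c is 3 while the distance from c to a is 5, which illustrates why network distance is not symmetric in a directed road network.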

Figure 6 represents the skyline graph for $k = 1$ on the edge $(n_7, n_3)$. Let us draw the score functions of $d_3$ and $d_6$ for the road segment $(n_7, n_3)$, where a safe exit point exists because $D^+_{n_3} = d_3$ and $D^+_{n_7} = d_6$ for $k = 1$. For each point $p \in (n_7, n_3)$, the distance between $d_3$ and point p can be represented as $dist(d_3, p) = dist(d_3, n_3) + len(n_3, p) = 6 - len(n_7, p)$. Similarly, for each point $p \in (n_7, n_3)$, the distance between $d_6$ and point p can be represented as $dist(d_6, p) = dist(d_6, n_7) + len(n_7, p) = 2 + len(n_7, p)$. Let $len(n_7, p)$ be

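Figure 6: Skyline graph for $k = 1$ on the road segment $(n_7, n_3)$. The horizontal axis is the distance x from $n_7$ (0 to 3.0), the vertical axis is the score (0 to 1.0), and the plotted curves are $\psi(d_6) = 1/(x + 3)$ and $\psi(d_3) = 1/(-x + 7)$, which cross at $p_{se1}$.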

a variable x ($0 \le x \le 3$). We can write $\lambda(d_3, p) = dist(d_3, p) = 6 - x$ and $\lambda(d_6, p) = dist(d_6, p) = 2 + x$. Then, we can represent the score functions $\psi(d_3)$ and $\psi(d_6)$ as follows:

$\psi(d_3) = \frac{\mu(d_3.t, q.t)}{1 + \alpha \cdot \lambda(d_3, p)} = \frac{1}{7 - x}$ for $0 \le x \le 3$,

$\psi(d_6) = \frac{\mu(d_6.t, q.t)}{1 + \alpha \cdot \lambda(d_6, p)} = \frac{1}{3 + x}$ for $0 \le x \le 3$.

Finally, we present the lemma to prove that safe exit points computed by COSK are correct.

Lemma 8. The COSK algorithm correctly computes a set of safe exit points.

Proof. We prove the correctness of the COSK algorithm by contradiction. Assume that $D^+_{p_a} \neq D^+_{n_\beta}$ and that there is no safe exit point in the road segment $(p_a, n_\beta)$. This means that, for each point p in the road segment $(p_a, n_\beta)$, the query result at p equals $D^+_{p_a}$, i.e., $D^+_p = D^+_{p_a}$ $\forall p \in (p_a, n_\beta)$. However, this leads to the contradiction that $D^+_{n_\beta} = D^+_{p_a}$ when $p = n_\beta$. Therefore, if $D^+_{p_a} \neq D^+_{n_\beta}$, a safe exit point exists in $(p_a, n_\beta)$. In addition, when $D^+_{p_a} \neq D^+_{n_\beta}$, a safe exit point is determined using the skyline $D^+_l$ for answer objects and the skyline $D^-_h$ with the highest score for nonanswer objects. The first skyline is a composite polyline drawn from answer objects in $D^+_{p_a}$. The second skyline is a composite polyline drawn from nonanswer objects in $D^+_{n_\beta} \cup D(p_a, n_\beta) - D^+_{p_a}$.
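The crossing of the two skylines can also be located numerically. The sketch below is not the procedure of Algorithm 4; it is a simple scan-and-bisect stand-in that treats every score as a function of the offset x from $p_a$, takes the lower envelope of the answer objects and the upper envelope of the nonanswer objects, and returns the first offset at which a nonanswer object overtakes an answer object. The helper names, the step count, and the tolerance are assumptions.

```python
from typing import Callable, List, Optional

ScoreFn = Callable[[float], float]


def score(mu: float, alpha: float, distance: float) -> float:
    """Score of the form mu / (1 + alpha * distance), as in the example above."""
    return mu / (1.0 + alpha * distance)


def first_safe_exit(answers: List[ScoreFn], others: List[ScoreFn],
                    length: float, steps: int = 1000,
                    tol: float = 1e-9) -> Optional[float]:
    """First offset in (0, length] where max(others) exceeds min(answers)."""
    def gap(x: float) -> float:
        return min(f(x) for f in answers) - max(g(x) for g in others)

    lo = 0.0
    for i in range(1, steps + 1):
        hi = length * i / steps
        if gap(hi) < 0:
            # The sign change lies in (lo, hi]; narrow it down by bisection.
            while hi - lo > tol:
                mid = (lo + hi) / 2
                if gap(mid) < 0:
                    hi = mid
                else:
                    lo = mid
            return hi
        lo = hi
    return None


# Segment (n3, n7) with x measured from n3, mu = 1 and alpha = 1:
# dist(p, d3) = x + 3 (answer object) and dist(p, d6) = 5 - x (nonanswer object).
x_cross = first_safe_exit([lambda x: score(1.0, 1.0, x + 3)],
                          [lambda x: score(1.0, 1.0, 5 - x)], 3.0)
```

For the example segment the crossing lies at a distance of about 1 from $n_3$, i.e., 2 from $n_7$, which is where $1/(3 + x)$ and $1/(7 - x)$ meet in the coordinates of Figure 6.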

6. Monitoring Query Results and Safe Regions in Dynamic Directed Road Networks

In this section, we discuss the monitoring of spatial keyword queries in dynamic road networks, where the network distance changes depending on the traffic conditions. Updates to the weights of some edges may invalidate the query results or the safe region of q even though the query object q remains within its safe region. Figure 7 illustrates an example of changing the weights of the edges $\overleftarrow{(n_1, n_2)}$ and $\overleftarrow{(n_1, n_6)}$. For convenience, we consider $\alpha = 1$ and $q.t$ = “Italian restaurant”. In Figure 7(a), the top-1 result is $d_1$, and the bold lines show the safe region of query q. Now consider that, at time $t_j$, the weights of the two edges $\overleftarrow{(n_1, n_2)}$ and $\overleftarrow{(n_1, n_6)}$ change due to heavy traffic, as shown in Figure 7(b). The update in the weights of these edges may invalidate the query result or the safe region of q. Therefore, it is necessary to monitor the validity of the results and the safe region when such changes occur.

Next, we introduce a monitoring region to monitor the validity of the safe region effectively when the weight of an edge changes. The monitoring region MR contains all the points between the query point q and the lowest answer object and between q and the highest nonanswer object. Formally, it is defined as $MR = dist(q, D^+_l) \cup dist(q, D^-_h)$, where $dist(q, D^+_l)$ covers the path between q and the lowest answer object and $dist(q, D^-_h)$ covers the path between q and the highest nonanswer object. In the given example, $D^+_l = d_1$ and $D^-_h = \{d_2, d_3\}$. Therefore, the dotted lines in Figure 8(a) show the monitoring region of query object q.

Now, at time $t_j$, the updates to the segments $\overleftarrow{(n_1, n_6)}$ and $\overleftarrow{(n_1, d_1)}$, which are not part of the monitoring region, can safely be ignored. However, the update on segment $\overrightarrow{(n_2, d_1)}$, which is associated with the monitoring region, may nullify the results. As shown in Figure 8(b), after the update, the top-1 result becomes $d_2$, and the bold lines represent the new safe region of q.

Algorithm 5 monitors the validity of the result set and the safe region of query object q when the weight of any edge changes. Suppose that the weight of edge $(n_i, n_j)$ changes at time $t_j$. First, the algorithm checks whether edge $(n_i, n_j)$ is associated with the monitoring region. If it is not part of the monitoring region, the algorithm simply ignores the update to edge $(n_i, n_j)$, and the query results and safe region remain valid. In contrast, if the edge is associated with the monitoring region (i.e., $MR \cap (n_i, n_j) \neq \emptyset$), the algorithm re-evaluates the query results. Consequently, the top-k results and the safe region of query q are updated. Finally, the algorithm updates the monitoring region of q.
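The update-handling logic described above can be sketched as follows. The sketch assumes that the monitoring region is stored as a set of segments and that a hypothetical `recompute` callback re-runs the snapshot query, the safe exit computation, and the monitoring region computation; the concrete segment names and safe exit labels are illustrative placeholders.

```python
from typing import Callable, FrozenSet, NamedTuple, Tuple

Edge = Tuple[str, str]


class QueryState(NamedTuple):
    results: Tuple[str, ...]            # current top-k answer objects
    safe_exits: Tuple[str, ...]         # current safe exit points
    monitoring_region: FrozenSet[Edge]  # segments covered by MR


def handle_edge_update(state: QueryState, edge: Edge,
                       recompute: Callable[[], QueryState]) -> QueryState:
    """Ignore updates outside the monitoring region, otherwise recompute."""
    if edge not in state.monitoring_region:
        return state  # results and safe region remain valid
    return recompute()


# Illustrative state before the update (segment membership is assumed).
before = QueryState(("d1",), ("pse1", "pse2", "pse3"),
                    frozenset({("n2", "d1"), ("q", "n2")}))


def toy_recompute() -> QueryState:
    # Stand-in for re-running the snapshot query and safe exit computation.
    return QueryState(("d2",), ("pse1", "pse2", "pse3"),
                      frozenset({("q", "n2"), ("n2", "n3")}))


ignored = handle_edge_update(before, ("n1", "n6"), toy_recompute)
updated = handle_edge_update(before, ("n2", "d1"), toy_recompute)
```

An update to a segment outside the monitoring region returns the state unchanged, whereas an update to a monitored segment triggers a recomputation, after which the top-1 result in this toy scenario becomes $d_2$.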

7. Performance Evaluation

In this section, we evaluate the performance of COSK through simulation experiments. We describe our experimental settings in Section 7.1, and we present our experimental results for static and dynamic road networks in Sections 7.2 and 7.3, respectively.

7.1. Experimental Settings. All of our experiments were performed using real road networks, namely, Oldenburg, San Francisco, and San Joaquin. All three road networks were obtained from [27]. The original road network of San Francisco had 21,047 nodes and 21,692 edges. We reformatted the network, pruned approximately 30% of the nodes, and adjusted the edges and their weights accordingly. This resulted in a network with 14,732 nodes and 14,316 edges. Both the direction of the edges and the data objects on the edges were generated randomly. The description of each data object was extracted from Twitter messages [28], and we assigned one tweet per data object. Table 4 presents the characteristics of the data sets used in the experimental evaluation. We simulated moving query objects by using a spatiotemporal data generator [29]. The input to the generator was the road network of the data set used, and the output was the set of query objects moving on the road network. Each experiment had 100 moving queries, which were continuously monitored for 100 timestamps (1 timestamp = 1 second), and the average result was reported in the experiments.

As a benchmark for COSK in static road networks, we implemented the algorithm of [22], which also continuously monitors moving top-k spatial keyword queries in road networks. However, this algorithm was originally designed for undirected road networks. To make a fair comparison, we modified it to process top-k spatial keyword queries in directed road networks and call the modified version CMTkSK+. Specifically, we modified the distance computation method between two points such that, in directed road networks, $dist(p_1, p_2) \neq dist(p_2, p_1)$. Since CMTkSK+ does not handle top-k spatial keyword queries in dynamic road networks, we compared the performance of COSK in dynamic road networks with a basic algorithm that recomputes the results whenever the query object changes its location. All algorithms were implemented in Java and were executed on a desktop PC with a 2.80-GHz Intel Core i5 with


Figure 7: Updating the weight of edges in a dynamic road network, where $t_i < t_j$. (a) Safe region at time $t_i$. (b) Updating the weights of $\overleftarrow{(n_1, n_2)}$ and $\overleftarrow{(n_1, n_6)}$ at time $t_j$. The network contains nodes $n_1$ to $n_6$, the query point q, and data objects $d_1$ (Italian Restaurant), $d_2$ (Italian Restaurant), and $d_3$ (Chinese Restaurant).

Figure 8: Monitoring region and updated safe region at time $t_j$. (a) Monitoring region at time $t_i$. (b) New safe region at time $t_j$, bounded by safe exit points $p_{se1}$, $p_{se2}$, and $p_{se3}$.

(1) Input: monitoring region MR, updated edge $(n_i, n_j)$
(2) Output: none
(3) if $MR \cap (n_i, n_j) = \emptyset$ then
(4) /* edge $(n_i, n_j)$ is not part of monitoring region */
(5) ignore the change in the weight of edge $(n_i, n_j)$
(6) else
(7) $P_{SE} \leftarrow \emptyset$ /* set of safe exit points */
(8) $D^{upd}_k \leftarrow \mathit{EvaluateSnapshotQuery}(n_i, e_i)$ /* update set of top-k results */
(9) $P^{upd}_{SE} \leftarrow \mathit{ComputeSafeExit}(p_a, n_\beta)$ /* update safe exit points */
(10) $MR^{upd} \leftarrow \mathit{ComputeMonitoringRegion}(D^+_l, D^-_h)$ /* update monitoring region */
(11) end

Algorithm 5: MonitoringSafeRegion(MR, $(n_i, n_j)$).

Table 4: Summary of datasets.

| Attribute | Oldenburg | San Francisco | San Joaquin |
| --- | --- | --- | --- |
| Total no. of nodes | 6,104 | 14,732 | 18,262 |
| Total no. of edges | 7,034 | 14,316 | 23,876 |
| Percentage of directed edges | 30 | 30 | 30 |
| Total no. of objects | 5,627 | 11,453 | 19,098 |
| Average no. of objects per edge | 0.8 | 0.8 | 0.8 |
| Total no. of words | 49,517 | 103,649 | 166,153 |


Table 5: Experimental parameter settings.

| Parameter | Range |
| --- | --- |
| Number of results (k) | 5, 10, 15, 20, 25 |
| Number of keywords (n) | 1, 2, 3, 4, 5 |
| Query parameter ($\alpha$) | 0.01, 0.1, 1, 10, 100 |
| Dataset | Oldenburg, San Francisco, San Joaquin |
| Number of data objects ($N_D$) | 10, 20, 30, 40, 50 (×1000) |
| Speed of query objects ($V_{qry}$) | 25, 50, 75, 100, 125 (km/h) |
| Mobility ($M_{qry}$) | 20, 40, 60, 80, 100 (%) |
| Ratio of directed edges ($E_{dir}$) | 10, 20, 30, 40, 50 (%) |
| Ratio of updated edges ($E_{upd}$) | 15, 30, 60, 80, 100 (%) |

8 GB of memory. In the experiments, we compared (1) query processing times, (2) edges processed, i.e., the number of edges processed for retrieving query results, and (3) index sizes. Table 5 summarizes the parameters used in the experiments. In each experiment, we varied a single parameter within the range shown in Table 5 while maintaining the other parameters at the bolded default values.

We evaluated the performance of the algorithms by using the following measures: (1) the total amount of server CPU time, which indicates the query processing time, and (2) the total communication cost, measured as the total number of points (i.e., the location updates sent by query objects and the query results and safe exit points returned by the server) transferred between clients and the server. The battery power and wireless bandwidth consumption typically increase with the amount of data transferred between objects (clients) and servers. Thus, we used the amount of transferred data as a metric to evaluate the communication cost.
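To make the communication-cost metric concrete, the following sketch counts transferred points under strongly simplified assumptions: the query moves along a one-dimensional road, the safe regions are the intervals between consecutive safe exit points, a safe-region client contacts the server only when it enters a new interval, and a periodic client reports at every timestamp. None of these modeling choices comes from the experimental setup; they only illustrate how the points are tallied.

```python
import bisect
from typing import List


def safe_region_points(positions: List[float], exit_points: List[float],
                       k: int) -> int:
    """Points exchanged when the client reports only after leaving a safe region."""
    exits = sorted(exit_points)
    total = 0
    current = None
    for x in positions:
        region = bisect.bisect_right(exits, x)
        if region != current:
            # Exit points that bound this interval (one at each open end).
            bounding = (1 if region > 0 else 0) + (1 if region < len(exits) else 0)
            total += 1 + k + bounding  # location update + k results + exit points
            current = region
    return total


def periodic_points(positions: List[float], k: int) -> int:
    """Points exchanged when the client reports at every timestamp."""
    return len(positions) * (1 + k)


trajectory = [0.0, 0.5, 1.2, 1.8, 2.5, 3.1]
cost_safe = safe_region_points(trajectory, [1.0, 3.0], k=1)
cost_periodic = periodic_points(trajectory, k=1)
```

For this toy trajectory the safe-region client exchanges 10 points and the periodic client 12; the gap widens as the query spends more timestamps inside one safe region.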

7.2. Experimental Results of Top-k Spatial Keyword Queries in Static Road Networks

7.2.1. Effect of k. Figure 9 indicates the effect of the number of results on the query processing time and communication cost for both algorithms. Figure 9(a) indicates that the query processing time increases for both algorithms as the value of k increases. This is expected because, with an increase in k, more data objects are required to be explored and verified. Nevertheless, COSK significantly outperforms CMTkSK+ for two main reasons. First, a relevant object search is very efficient when using the highest significant factor, and second, COSK does not need to verify the set of answer objects as long as the query object lies in a safe region. On the other hand, the CMTkSK+ query processing time increases significantly because it has to monitor and verify the set of candidate objects periodically. In Figure 9(b), the communication costs for both algorithms increase as the number of objects increases. However, the proposed algorithm demonstrates superior performance compared to CMTkSK+ because client-server communication is not required while the query object lies within the safe region, whereas in CMTkSK+ the query object is required to report its location to the server whenever it moves.

7.2.2. Effect of $N_D$. This experiment was conducted on the San Joaquin dataset. This dataset included 19,098 data objects; therefore, we randomly generated approximately 30,000 additional data objects on different edges. In Figure 10, we evaluate the performance of COSK and CMTkSK+ by varying the cardinality of the data objects. Note that $N_D = 10K$ corresponds to a low density of data points, while $N_D = 50K$ corresponds to a high density. In Figure 10(a), it is interesting to notice that the query processing times of both algorithms decrease as the cardinality of the data objects increases. For CMTkSK+, this is because, with high density, the monitoring range of a query decreases. However, for COSK, it is mainly because, when the data density is high, fewer edges are required to be expanded, which decreases the query processing time. In Figure 10(b), we study the influence of the cardinality of the data objects on the communication costs. The experimental results indicate that CMTkSK+ incurs almost constant communication costs regardless of the data object cardinality. However, the communication costs of COSK increase in proportion to the $N_D$ value. This is expected because the safe region becomes smaller as the density of the data objects increases, which increases the communication costs.

7.2.3. Effect of Query Keywords (n). Figure 11 shows the query processing time and communication cost for COSK and CMTkSK+ as a function of the number of query keywords. Figures 11(a) and 11(b) show that the performance of both algorithms degrades when the number of keywords increases. This is mainly because, by increasing the number of query keywords, the number of relevant objects may also increase, resulting in a higher query processing time and communication cost. However, the safe-region-based algorithm COSK scales better than CMTkSK+ because of its less expensive monitoring technique.

7.2.4. Effect of $\alpha$. Figure 12 demonstrates the impact of the query parameter $\alpha$ on the query processing time and on the communication cost. A small value of $\alpha$ indicates a greater importance of textual relevance, whereas a high value of $\alpha$ gives more preference to spatial relevance. It is interesting to note that the query processing time is lower for higher


Figure 9: Effect of k on query processing time and communication cost. (a) Query processing time (s) versus k. (b) Communication cost (number of messages transferred) versus k. Curves are shown for COSK and CMTkSK+.

Figure 10: Effect of $N_D$ on query processing time and communication cost. (a) Query processing time (s) versus $N_D$. (b) Communication cost (number of transferred messages) versus $N_D$. Curves are shown for COSK and CMTkSK+.

values of $\alpha$, which give more importance to spatial relevance. This is mainly because, when the spatial relevance is higher, fewer edges and objects are required to be explored and processed to determine the top-k data objects. Observe that, in Figure 12(b), the number of messages sent by COSK decreases sharply with an increase in $\alpha$.

7.2.5. Effect of Speed. Figure 13(a) demonstrates the influence of the speed of the query objects on the query processing time of the COSK and CMTkSK+ algorithms. The experimental results indicate that the performance of CMTkSK+ is not significantly influenced by the speed of the query objects because the candidate objects must be continuously monitored at regular time intervals regardless of the speed. On the other hand, for COSK, the performance gradually degrades as the speed of the query objects increases because the objects leave their respective safe regions more frequently. Figure 13(b) shows the communication costs of COSK and CMTkSK+ with respect to the speed of the query objects. CMTkSK+ incurs almost constant communication costs because a server-initiated request to verify the candidate objects does not depend on the speed. For COSK, the query objects cross safe regions more frequently when the speed is high, which increases the communication costs.


Figure 11: Effect of number of keywords on query processing time and communication cost. (a) Query processing time (s) versus number of keywords. (b) Communication cost (number of messages transferred) versus number of keywords. Curves are shown for COSK and CMTkSK+.

Figure 12: Effect of $\alpha$ on query processing time and communication cost. (a) Query processing time (s) versus $\alpha$. (b) Communication cost (number of messages transferred) versus $\alpha$. Curves are shown for COSK and CMTkSK+.

7.2.6. Effect of Mobility. Figure 14 shows the effect of mobility $M_{qry}$ (mobility refers to the percentage of query objects that are moving at any timestamp) on the performance of the COSK and CMTkSK+ algorithms. As expected, the query processing time and communication costs for both algorithms increase with $M_{qry}$. Nevertheless, COSK performs better than CMTkSK+ in terms of query processing time and communication costs.

7.2.7. Effect of Directed Edges. Figure 15 shows the impact of the percentage of directed edges $E_{dir}$ on the performance of the COSK and CMTkSK+ algorithms. The query processing time increases with $E_{dir}$ because the algorithms need to explore more edges to retrieve the top-k results. However, the communication cost is not significantly affected by the value of $E_{dir}$ for either algorithm.

7.2.8. Effect of Datasets. Figure 16 demonstrates the index sizes of the COSK and CMTkSK+ approaches for different datasets. As shown in Figure 16, both algorithms have similar index sizes. However, COSK has a minor space overhead because it stores additional information on the highest significance factor $\theta_t$ of edges. More importantly, this space overhead is minimal compared to the gain achieved by COSK in query processing time and communication costs.


Figure 13: Effect of speed on query processing time and communication cost. (a) Query processing time (s) versus $V_{qry}$. (b) Communication cost (number of messages transferred) versus $V_{qry}$. Curves are shown for COSK and CMTkSK+.

Figure 14: Effect of mobility on query processing time and communication cost. (a) Query processing time (s) versus $M_{qry}$. (b) Communication cost (number of messages transferred) versus $M_{qry}$. Curves are shown for COSK and CMTkSK+.

7.3. Experimental Results of Top-k Spatial Keyword Queries in Dynamic Road Networks. In this section, we evaluate the performance of COSK and the basic algorithm for dynamic road networks. $E_{upd}$ indicates the percentage of all edges that change their weight at each timestamp. The length of an updated edge is randomly selected between 0.1 and 10 times the original length. Figure 17(a) depicts the query processing time of COSK and the basic algorithm. It is evident from the figure that the query processing time of the basic algorithm is not significantly affected by $E_{upd}$. This is mainly because the query objects issue top-k spatial keyword queries at each timestamp. However, the query processing time of COSK increases with the value of $E_{upd}$ because the probability that an updated edge is associated with the monitoring region of query q increases with $E_{upd}$. Therefore, when $E_{upd}$ becomes large, the results need to be updated frequently, which increases the query processing time. Figure 17(b) shows the communication costs of COSK and the basic algorithm with respect to $E_{upd}$. The basic algorithm incurs almost constant communication costs regardless of the value of $E_{upd}$. In contrast, the communication cost of COSK increases with $E_{upd}$ because the query result and safe regions need to be updated frequently.
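The edge-update workload described in this section can be sketched as follows. The text only states that the new length is chosen randomly between 0.1 and 10 times the original length; the uniform distribution, the function name, and the toy network below are assumptions made for illustration.

```python
import random
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def update_edge_weights(original: Dict[Edge, float], current: Dict[Edge, float],
                        ratio: float, rng: random.Random
                        ) -> Tuple[Dict[Edge, float], List[Edge]]:
    """Rescale a random fraction `ratio` of edges to 0.1x-10x of their original length."""
    n_upd = round(len(original) * ratio)
    chosen = rng.sample(sorted(original), n_upd)
    updated = dict(current)
    for edge in chosen:
        # Assumption: the scaling factor is drawn uniformly from [0.1, 10].
        updated[edge] = original[edge] * rng.uniform(0.1, 10.0)
    return updated, chosen


# Hypothetical network with ten edges of length 2.0 each.
base = {("n{}".format(i), "n{}".format(i + 1)): 2.0 for i in range(10)}
rng = random.Random(0)
new_weights, changed = update_edge_weights(base, base, 0.3, rng)
```

Calling the function once per timestamp with `ratio` set to $E_{upd}/100$ produces a stream of weight changes that a monitoring algorithm can then filter against its monitoring region.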


Figure 15: Effect of $E_{dir}$ on query processing time and communication cost. (a) Query processing time (s) versus $E_{dir}$. (b) Communication cost (number of messages transferred) versus $E_{dir}$. Curves are shown for COSK and CMTkSK+.

Figure 16: Effect of dataset on index size. Index size (MB) of COSK and CMTkSK+ for the Oldenburg, San Francisco, and San Joaquin datasets.

8. Conclusion

In this paper, we investigated moving top-k spatial keyword queries in directed and dynamic road networks. We presented an efficient indexing framework using inverted files that indexes the data objects on edges, allowing for the effective searching of data objects relevant to queries in terms of both textual and spatial relevance. We also presented a safe-exit-based algorithm called COSK to monitor moving top-k spatial keyword queries. We demonstrated that the query results remain valid as long as the query object resides within a safe region. Furthermore, COSK can effectively monitor the validity of query results and safe regions in dynamic road networks. Finally, an experimental evaluation conducted on real road networks demonstrated that COSK significantly reduced the query processing time and communication costs compared to the CMTkSK+ algorithm.

Data Availability

The real road network data used in this study are also used in many previous studies. The road network data are cited in the manuscript and are available at https://www.cs.utah.edu/~lifeifei/SpatialDataset.htm. To simulate the moving queries, the authors used the spatiotemporal data generator, which is also used in previous studies. The research article on the generator is cited in the manuscript. The documentation and source


Figure 17: Effect of $E_{upd}$ on query processing time and communication cost. (a) Query processing time (s) versus $E_{upd}$. (b) Communication cost (number of messages transferred) versus $E_{upd}$. Curves are shown for COSK and the basic algorithm.

files of the generator are available at https://iapg.jade-hs.de/personen/brinkhoff/generator. They used Twitter tweets for generating the descriptions of data objects and the query keywords. The tweets used are accessible at http://followthehashtag.com/datasets/free-twitter-dataset-usa-200000-free-usa-tweets.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

Hyung-Ju Cho was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean Government (MSIP) (NRF-2016R1A2B4009793), and this research was partially supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2016R1D1A1B03934129).

References

[1] D. Papadias, N. Mamoulis, J. Zhang, and Y. Tao, “Query processing in spatial network databases,” in Proceedings of the 29th International Conference on Very Large Data Bases (VLDB ’03), pp. 802–813, September 2003.

[2] H.-J. Cho, K. Ryu, and T.-S. Chung, “An efficient algorithm for computing safe exit points of moving range queries in directed road networks,” Information Systems, vol. 41, pp. 1–19, 2014.

[3] G. Tsatsanifos and A. Vlachou, “On processing Top-k spatio-textual preference queries,” in Proceedings of the 18th International Conference on Extending Database Technology (EDBT ’15), pp. 433–444, March 2015.

[4] R. Li, A. X. Liu, A. L. Wang, and B. Bruhadeshwar, “Fast range query processing with strong privacy protection for cloud computing,” Proceedings of the VLDB Endowment, vol. 7, no. 14, pp. 1953–1964, 2014.

[5] G. Cong, C. S. Jensen, and D. Wu, “Efficient retrieval of the Top-k most relevant spatial web objects,” Proceedings of the VLDB Endowment, vol. 2, no. 1, pp. 337–348, 2009.

[6] Z. Li, K. C. K. Lee, B. Zheng, W.-C. Lee, D. Lee, and X. Wang, “IR-tree: An efficient index for geographic document search,” IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 4, pp. 585–599, 2011.

[7] Y. Zhou, X. Xie, C. Wang, Y. Gong, and W. Ma, “Hybrid index structures for location-based web search,” in Proceedings of the 14th ACM International Conference on Information and Knowledge Management, pp. 155–162, Bremen, Germany, October 2005.

[8] J. Zobel and A. Moffat, “Inverted files for text search engines,” ACM Computing Surveys, vol. 38, no. 2, 2006.

[9] N. Beckmann, H. Kriegel, R. Schneider, and B. Seeger, “The R*-tree: an efficient and robust access method for points and rectangles,” in Proceedings of the ACM SIGMOD International Conference on Management of Data, vol. 19, pp. 322–331, May 1990.

[10] R. Hariharan, B. Hore, C. Li, and S. Mehrotra, “Processing spatial-keyword (sk) queries in geographic information retrieval (gir) systems,” in Proceedings of the 19th International Conference on Scientific and Statistical Database Management (SSDBM ’07), July 2007.

[11] I. De Felipe, V. Hristidis, and N. Rishe, “Keyword search on spatial databases,” in Proceedings of the 24th International Conference on Data Engineering (ICDE ’08), pp. 656–665, April 2008.

[12] J. B. Rocha-Junior, O. Gkorgkas, S. Jonassen, and K. Nørvåg, “Efficient processing of top-k spatial keyword queries,” in Proceedings of the International Symposium on Spatial and Temporal Databases, pp. 205–222, Springer, 2011.

[13] D. Zhang, K.-L. Tan, and A. K. Tung, “Scalable top-k spatial keyword search,” in Proceedings of the 16th International Conference on Extending Database Technology, pp. 359–370, 2013.


[14] J. B. Rocha-Junior and K. Nørvåg, “Top-k spatial keyword queries on road networks,” in Proceedings of the 15th International Conference on Extending Database Technology, pp. 168–179, Berlin, Germany, March 2012.

[15] H.-J. Cho, S. J. Kwon, and T.-S. Chung, “A safe exit algorithm for continuous nearest neighbor monitoring in road networks,” Mobile Information Systems, vol. 9, no. 1, pp. 37–53, 2013.

[16] D. Yung, M. L. Yiu, and E. Lo, “A safe-exit approach for efficient network-based moving range queries,” Data & Knowledge Engineering, vol. 72, pp. 126–147, 2012.

[17] M. Attique, H. Cho, R. Jin, and T. Chung, “Efficient Processing of Continuous Reverse k Nearest Neighbor on Moving Objects in Road Networks,” ISPRS International Journal of Geo-Information, vol. 5, no. 12, p. 247, 2016.

[18] H. G. Elmongui, M. F. Mokbel, and W. G. Aref, “Continuous aggregate nearest neighbor queries,” GeoInformatica, vol. 17, no. 1, pp. 63–95, 2013.

[19] D. Wu, M. L. Yiu, C. S. Jensen, and G. Cong, “Efficient continuously moving top-k spatial keyword query processing,” in Proceedings of the IEEE International Conference on Data Engineering (ICDE ’11), pp. 541–552, Hannover, Germany, April 2011.

[20] W. Huang, G. Li, K.-L. Tan, and J. Feng, “Efficient safe-region construction for moving top-k spatial keyword queries,” in Proceedings of the 21st ACM International Conference on Information and Knowledge Management, pp. 932–941, 2012.

[21] L. Guo, J. Shao, H. H. Aung, and K.-L. Tan, “Efficient continuous top-k spatial keyword queries on road networks,” GeoInformatica, vol. 19, no. 1, pp. 29–60, 2014.

[22] Y. Li, G. Li, L. Shu, Q. Huang, and H. Jiang, “Continuous monitoring of top-k spatial keyword queries in road networks,” Journal of Information Science and Engineering, vol. 31, no. 6, pp. 1831–1848, 2015.

[23] M. Attique, A. Khan, and T.-S. Chung, “ESPAK: Top-k spatial keyword query processing in directed road networks,” in Proceedings of the Workshops of the International Conference on Extending Database Technology and the International Conference on Database Theory (EDBT/ICDT ’17), March 2017.

[24] G. Salton and C. Buckley, “Term-weighting approaches in automatic text retrieval,” Information Processing & Management, vol. 24, no. 5, pp. 513–523, 1988.

[25] V. N. Anh, O. de Kretser, and A. Moffat, “Vector-space ranking with effective early termination,” in Proceedings of the 24th Annual International ACM SIGIR Conference, pp. 35–42, New Orleans, LA, USA, 2001.

[26] E. W. Dijkstra, “A note on two problems in connexion with graphs,” Numerische Mathematik, vol. 1, pp. 269–271, 1959.

[27] “Real datasets for spatial databases,” https://www.cs.utah.edu/~lifeifei/SpatialDataset.htm.

[28] “Twitter,” https://twitter.com.

[29] T. Brinkhoff, “A framework for generating network-based moving objects,” GeoInformatica, vol. 6, no. 2, pp. 153–180, 2002.



TkSK queries in directed road networks. In Section 5, we present our safe-exit-based technique to process moving TkSK queries. Section 7 presents a performance analysis of the proposed technique. Section 8 concludes this paper.

2. Related Work

In this section, we discuss some of the promising related studies of top-k spatial keyword queries. Our related work is divided into two sections: Section 2.1 reviews snapshot TkSK queries, and Section 2.2 presents the studies proposed to address moving TkSK queries.

2.1. Snapshot Top-k Spatial Keyword Queries. In recent years, spatial keyword queries have drawn the attention of many researchers, and several approaches have been proposed for ranking spatial data objects. Initially, Zhou et al. [7] worked on combining inverted indexes [8] and R-trees [9]. They proposed three different hybrid indexing structures. Their study demonstrated that building an inverted index on top of an R-tree provides superior performance. Hariharan et al. [10] proposed the indexing structure KR*-tree by capturing the joint distribution of keywords in space. De Felipe et al. [11] proposed a data structure that combines an R-tree with text signatures. Each node of the R-tree exploits a signature to indicate the presence of keywords in the subtree of the node. However, both these approaches address only Boolean keyword queries in Euclidean space.

Top-k spatial keyword queries, where data objects are ranked according to their combined textual and spatial relevance to keyword queries, were first studied by Cong et al. [5] and Li et al. [6]. Both studies [5, 6] integrate location indexing and text indexing to generate IR-trees. These studies process top-k spatial keyword queries only in Euclidean space and are not suitable for processing top-k spatial keyword queries in road networks, where the distance between objects is determined by the shortest path connecting them. Later, Rocha et al. [12] proposed the indexing technique S2I, which maps each term in the vocabulary into a separate block or aR-tree for efficient processing of top-k spatial keyword queries. Zhang et al. [13] proposed an m-closest keyword query that returns the closest object based on distance and which matches m query keywords.

Top-k spatial keyword queries in road networks wereintroduced by Rocha et al [14] In particular they pro-posed three different indexing techniques (Basic IndexingEnhanced Indexing and Overlay Indexing) for processingspatial keyword queries in road networks

2.2. Moving Top-k Spatial Keyword Queries. Recently, research focus has shifted to the continuous processing of spatial queries where query or data objects are arbitrarily moving in road networks, which is the most realistic scenario. Considerable research effort has been undertaken to process moving range, k nearest neighbor (kNN), and reverse k nearest neighbor (RkNN) queries [15–18]. However, there is a lack of efficient algorithms for moving top-k spatial keyword queries. Initially, Wu et al. [19] and Huang et al. [20] proposed different methods for monitoring top-k spatial keyword queries in Euclidean space. Guo et al. [21] studied moving top-k spatial keyword queries on road networks. They presented two methods for monitoring moving queries in a continuous manner that reduce the traversing of network edges. Later, Li et al. [22] proposed TPR-tree-based indexing to monitor moving top-k spatial keyword queries. In contrast to [21, 22], in this study, we consider moving top-k spatial keyword queries in directed and dynamic road networks, where each road segment has a particular orientation and its weight changes according to traffic conditions.

Table 1 compares our problem scenario with related work in terms of query type, space domain, and orientation of road networks.

Table 1: Comparisons with existing solutions.

| Algorithm | Type | Space Domain | Orientation |
|---|---|---|---|
| Cong et al. [5] | Snapshot | Euclidean | No orientation |
| Rocha et al. [14] | Snapshot | Static Road | Undirected |
| Wu et al. [19] | Moving | Euclidean | No orientation |
| Huang et al. [20] | Moving | Euclidean | No orientation |
| Guo et al. [21] | Moving | Static Road | Undirected |
| Li et al. [22] | Moving | Static Road | Undirected |
| COSK | Moving | Dynamic Road | Directed |

3. Preliminaries

Section 3.1 defines the terms and notations used in this paper. Section 3.2 formulates the problem using an example that illustrates the general results of top-k spatial keyword queries.

3.1. Definition of Terms and Notations

3.1.1. Road Network. A road network is represented by a weighted directed graph $G = (N, E, W)$, where $N$, $E$, and $W$ denote the node set, edge set, and edge distance matrix, respectively. The network distance of an edge changes depending on the traffic conditions. Each edge is also assigned an orientation that is either undirected or directed. The undirected edge is represented by $e = (n_s, n_e)$, where $n_s$ and $n_e$ are the boundary nodes $n_\beta$ of an edge, whereas the directed edge is represented by $e = \overrightarrow{(n_s, n_e)}$ or $e = \overleftarrow{(n_e, n_s)}$. Naturally, the arrow above the edge indicates the associated direction. We refer to $n_s$ as the starting node and $n_e$ as the ending node of an edge. For example, in Figure 1, $n_6$ is the starting node of edge $\overrightarrow{(n_6, n_2)}$, whereas it is the ending node for edge $\overleftarrow{(n_6, n_5)}$. The particular edge where a query object is located is called an active edge. It is important to note that the distance between two points $p_1$ and $p_2$ is not symmetrical in directed road networks (i.e., $dist(p_1, p_2) \neq dist(p_2, p_1)$). For example, in Figure 1, $dist(d_3, d_4) = 3$, whereas $dist(d_4, d_3) = 11$ because the shortest path from $d_4$ to $d_3$ is $(d_4 \rightarrow n_6 \rightarrow n_2 \rightarrow n_3 \rightarrow d_3)$.

3.1.2. Segment. Segment $s = (p_1, p_2)$ is the part of an edge between two points $p_1$ and $p_2$ on the edge. An edge consists of one or more segments. An edge is also considered a segment where the nodes are the end points of the edge. The weight of a segment $(p_1, p_2)$ is denoted by $W(s)$.

3.2. Problem Formulation. Similar to previous studies [5, 14, 23], we assume each data object $d \in D$ has a point location $d.l$ in the road network and a text description $d.t$. Given a query location $q.l$, a set of keywords $q.t$, and the number $k$ of data objects to return, the top-k spatial keyword query $Q_k$ is defined as $Q_k = (q.l, q.t, k)$, which takes three arguments and returns the best $k$ data objects from $D$ according to a score that considers spatial proximity and text relevance. The score $\psi(d)$ of a data object $d$ is defined by the following equation:

$$\psi(d) = \frac{\mu(d.t, q.t)}{1 + \alpha \cdot \lambda(d.l, q.l)}, \tag{1}$$

where $\lambda(d.l, q.l)$ is the spatial relevance between $d.l$ and $q.l$, $\mu(d.t, q.t)$ is the textual relevance between $d.t$ and $q.t$, and $\alpha$ is a positive real number that determines the importance of one measure over the other. For example, if only textual relevance is considered, then $\alpha = 0$. If more importance is given to spatial relevance, then $\alpha > 1$.
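The effect of the two components in Equation (1) can be checked numerically. The following is a minimal Python sketch; the function name `score` and the sample values are hypothetical and chosen only for illustration.

```python
def score(text_rel: float, net_dist: float, alpha: float = 1.0) -> float:
    """Equation (1): psi(d) = mu / (1 + alpha * lambda)."""
    if net_dist < 0 or alpha < 0:
        raise ValueError("distance and alpha must be non-negative")
    return text_rel / (1.0 + alpha * net_dist)


# Hypothetical objects: (textual relevance, network distance from the query).
near_weak = score(text_rel=0.4, net_dist=1.0)    # 0.4 / 2
far_strong = score(text_rel=0.9, net_dist=5.0)   # 0.9 / 6
```

With $\alpha = 1$, the nearby object with the weaker text match outranks the distant object with the stronger match; with $\alpha = 0$, the score reduces to the textual relevance alone.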

Spatial relevance ($\lambda$) is defined as the shortest distance between data object $d$ and query $q$: $\lambda(d.l, q.l) = dist(d.l, q.l)$. Thus, $dist(d_i.l, q.l) < dist(d_j.l, q.l)$ indicates that data object $d_i$ is more spatially relevant to $q$ than data object $d_j$. The textual relevance ($\mu$) can be computed using any popular information retrieval model, such as cosine similarity or the language model. In this study, we use the cosine similarity between $d.t$ and $q.t$. The textual relevance is defined as follows:

$$\mu(d.t, q.t) = \frac{\sum_{t \in q.t} w_t(d.t)\, w_t(q.t)}{\sqrt{\sum_{t \in d.t} [w_t(d.t)]^2 \, \sum_{t \in q.t} [w_t(q.t)]^2}}. \tag{2}$$

The weight $w_t(d.t) = 1 + \ln(f_t(d.t))$, where $f_t(d.t)$ represents the frequency of term $t$ in $d.t$. The weight $w_t(q.t) = \ln(1 + |D|/df_t)$, where $|D|$ is the number of objects in $D$ and $df_t$ is the document frequency. A higher $\mu$ means a higher textual relevance to the query keywords. We used the variation of cosine similarity based on the significance factor $\theta_t(n)$ of term $t$ in a document $n$, where $n$ represents the description of data object $d.t$ or the query keywords $q.t$. The significance $\theta_t(n) = w_t(n)/\sqrt{\sum_{t \in n} (w_t(n))^2}$ is the normalized weight of the term in the document, taking into account the length of the document [24, 25]. Hence, the textual relevance $\mu(d.t, q.t)$ can be rewritten as

$$\mu(d.t, q.t) = \sum_{t \in q.t} \theta_t(d.t)\, \theta_t(q.t). \tag{3}$$
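The weighting scheme and Equation (3) can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the whitespace-free term lists, and the sample document frequencies are assumptions, not part of the original system.

```python
import math
from collections import Counter


def doc_weights(terms):
    """w_t(d.t) = 1 + ln(f_t(d.t)) for every term in a description."""
    return {t: 1.0 + math.log(f) for t, f in Counter(terms).items()}


def query_weights(terms, doc_freq, num_objects):
    """w_t(q.t) = ln(1 + |D| / df_t); terms unseen in the collection are skipped."""
    return {t: math.log(1.0 + num_objects / doc_freq[t])
            for t in set(terms) if doc_freq.get(t, 0) > 0}


def significance(weights):
    """theta_t(n) = w_t(n) / sqrt(sum of squared weights in n)."""
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm > 0 else {}


def textual_relevance(doc_terms, query_terms, doc_freq, num_objects):
    """Equation (3): sum over query terms of theta_t(d.t) * theta_t(q.t)."""
    theta_d = significance(doc_weights(doc_terms))
    theta_q = significance(query_weights(query_terms, doc_freq, num_objects))
    return sum(theta_d.get(t, 0.0) * th for t, th in theta_q.items())


# Hypothetical collection statistics: |D| = 10 objects.
doc_freq = {"italian": 2, "restaurant": 5, "cafe": 3}
mu_match = textual_relevance(["cafe"], ["cafe"], doc_freq, 10)
mu_partial = textual_relevance(["cafe", "bakery"], ["cafe"], doc_freq, 10)
mu_none = textual_relevance(["cafe"], ["italian"], doc_freq, 10)
```

Because both significance vectors are normalized to unit length, the result always lies in $[0, 1]$, which is what allows $\mu = 1$ to serve as an upper bound later in the query processing algorithm.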

4. Query Processing System

In this section, we present the proposed query processing system that indexes the data objects and prunes the irrelevant edges for efficient query processing. In Section 4.1, we discuss the indexing framework, and in Section 4.2, we present an efficient keyword query processing algorithm for snapshot queries.

4.1. Indexing Framework. In this study, our main work focuses on moving queries in directed and dynamic road networks. We use a method similar to the enhanced technique presented in [12] as our basic framework for processing snapshot queries in directed and dynamic road networks. The indexing framework combines a road network framework [1] for storing spatial information and an inverted file for indexing data objects. For easy traversing of the network, we store the adjacent nodes of each given node by storing the node id ($nid$), the edge id ($eid$), the direction of the edge, and the weight of the edge. The indexing framework consists of two main components: a pruning component and an inverted file component. Figure 2 illustrates the main components of the indexing framework. The pruning component first prunes the edges that contain data objects irrelevant to the query keywords. To achieve this, we introduce the highest significance $\theta_t^+$ of a given term $t$ in the descriptions of objects lying on the edge. The $\theta_t^+$ on an edge is retrieved by a key composed of a pair of edge id and term id $(eid, tid)$. The $\theta_t^+$ represents an upper-bound significance of any object lying on an edge with term $t$ in its description. The inverted list of a term $t$ on an edge is accessed only if the upper-bound score composed by $\theta_t^+$ and the minimum network distance between the starting node of the edge and query $q$ may return a candidate data object. Naturally, the edges with upper-bound scores smaller than the score of the k-th object found so far are pruned.

We implement an inverted file for indexing data objects. The inverted file contains a vocabulary and inverted lists. The vocabulary keeps general information about each term (such as the frequency of the term), which is helpful in computing the textual relevance of the data objects. The inverted list stores the data objects located on the edge $\overrightarrow{(n_s, n_e)}$ that have a term $t$ in their description. An inverted list is identified by a key composed of $(eid, tid)$. Each inverted file is a set of inverted lists. A separate inverted list is used for each term in the object description. An inverted list stores two attributes for each data object: first, the distance between the data object and the starting node, $dist(n_s, d_i)$; second, the significance factor $\theta(t_i, d_i)$ of the term $t_i$ in the description of the data object. Note that the network distance between two points in a directed road network is not symmetrical (i.e., $dist(n_s, d_i) \neq dist(d_i, n_s)$). Recall that the starting node is chosen according to the orientation of the edge such that the direction of the edge is from the node toward the data object. In Figure 1, $n_3$ is the starting node for $d_7$. For bidirectional edges, any of the adjacent nodes can act as a starting node.

The proposed indexing scheme has three main advantages. First, the object search relevant to query keywords is very efficient using the $(eid, tid)$ pair. Second, inverted files also store the network distance between the starting node and the data object, which helps in accessing the data object in the directed road network. Finally, the pruning technique allows for faster query processing by exploring fewer edges.
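As a rough illustration of how the pruning component and the inverted lists can share the $(eid, tid)$ key, consider the following Python sketch. The class name `EdgeInvertedIndex`, its methods, and the in-memory dictionaries are assumptions made for this example; the paper does not prescribe a storage layout. The upper bound combines $\theta_t^+$ with the query-side significance as in Equation (3).

```python
from collections import defaultdict


class EdgeInvertedIndex:
    """Inverted lists and theta+ upper bounds, both keyed by (eid, tid)."""

    def __init__(self):
        # (eid, tid) -> [(dist(n_s, d), theta, object id)]
        self.lists = defaultdict(list)
        # (eid, tid) -> highest significance of the term on the edge
        self.theta_plus = defaultdict(float)

    def add(self, eid, tid, obj_id, dist_from_start, theta):
        key = (eid, tid)
        self.lists[key].append((dist_from_start, theta, obj_id))
        self.theta_plus[key] = max(self.theta_plus[key], theta)

    def upper_bound(self, eid, query_theta, dist_to_edge, alpha=1.0):
        """Best score any object on the edge could reach.

        query_theta maps each query term id to its significance theta_t(q.t).
        """
        best_text = sum(self.theta_plus.get((eid, t), 0.0) * th
                        for t, th in query_theta.items())
        return best_text / (1.0 + alpha * dist_to_edge)

    def postings(self, eid, tid):
        """Objects on the edge containing the term, nearest to n_s first."""
        return sorted(self.lists.get((eid, tid), []))


index = EdgeInvertedIndex()
index.add("e1", "italian", "d3", dist_from_start=2.0, theta=0.8)
index.add("e1", "italian", "d9", dist_from_start=1.0, theta=0.5)
bound = index.upper_bound("e1", {"italian": 1.0}, dist_to_edge=3.0)
```

An edge whose bound does not exceed the current k-th score can be skipped without reading its inverted lists, which is the saving the pruning component is meant to deliver.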

Table 2 presents the notations used in this study.

[Figure 2: Indexing framework. The pruning component computes an upper-bound score from $dist(n, q)$ and $\theta_t^+$, looked up by the key ⟨eid, tid⟩, and the inverted list of a term is accessed only if the upper-bound score is greater than that of the k-th object. The inverted file holds a vocabulary (tid, $Df_{tid}$) and inverted lists keyed by ⟨eid, tid⟩ with entries $d_i$, $dist(n_s, d_i)$, $\theta(d, t)$.]

Table 2: Summary of notations used in this paper.

| Notation | Definition |
|---|---|
| $G = (N, E, W)$ | Graph model of road network |
| $dist(p_s, p_e)$ | Length of shortest path from $p_s$ to $p_e$, where $p_s$ and $p_e$ represent start and end points, respectively |
| $len(p_1, p_2)$ | Length of segment connecting two points $p_1$ and $p_2$ |
| $n_i$ | Node in road network |
| $e = (n_s, n_e)$ | Edge in edge set $E$, where $n_s$ and $n_e$ are start and end points of the edge |
| $n_\beta$ | Boundary node corresponding to start ($n_s$) or end ($n_e$) point of an edge |
| $W(e)$ | Weight of edge $(n_s, n_e)$ |
| $q$ | Query point in road network |
| $k$ | Number of data objects to return |
| $D$ | Set of data objects, $D = \{d_1, d_2, \ldots, d_{|D|}\}$ |
| $D(n_s, n_e)$ | Set of data objects in an edge |
| $p_a$ | Anchor point that corresponds to start point of expansion |
| $P_{SE}$ | Safe exit point where safe and non-safe regions of $q$ intersect |
| $\alpha$ | Query parameter |
| $\psi(d)$ | Score of data object $d$ |
| $\mu(d.t, q.t)$ | Textual relevance of data object $d$ with query keywords |
| $\lambda(d.l, q.l)$ | Spatial relevance of data object $d$ with query location |
| $D^+$ | Set of answer objects |
| $D^-$ | Set of non-answer objects |
| $d_l^+$ | Lowest answer object |
| $d_h^-$ | Highest non-answer object |

4.2. Query Processing Algorithm. Our algorithm traverses the road network incrementally in a similar fashion to Dijkstra’s algorithm [26]. Algorithm 1 returns the top-k data objects with the highest scores according to their joint textual and spatial relevance to the query. The algorithm begins by exploring the active edge where query object $q$ is located and expands the network in increasing order of distance from $q$. Each entry in the min-heap has the form $(p_a, edge)$, where $p_a$ indicates the anchor point in the edge. For an active edge, $q$ becomes the anchor point. Otherwise, for directed edges, the ending node $n_e$ becomes the anchor point. For bidirectional edges, either of the adjacent boundary nodes, i.e., $n_s$ or $n_e$, becomes the anchor point. Let $D_k$ be the current set of top-k data objects and $s_k$ be the score of the k-th data object in $D_k$. The $candsearch((eid, tid), s_k)$ function retrieves the candidate data objects $D_c$ located in an edge with a better score $\psi(d)$ than $s_k$. Next, the $D_k$ set is updated with the data objects in $D_c$, and so is $s_k$. The algorithm continues its expansion and inserts the adjacent edges of the boundary node until the heap is exhausted or the remaining data objects cannot have a better score than $s_k$. The upper-bound score $\psi(n)$ of node $n$ is computed using $dist(n, q)$ and the maximum textual relevance ($\mu = 1$). Therefore, if $\psi(n) \le s_k$, then even if there is an unexplored data object $d$ matching all query keywords, its score cannot be better than that of the k-th object in $D_k$ because $dist(d, q.l) \ge dist(n, q.l)$. This is certain because the algorithm strictly expands the node with the minimum distance to the query location.

Algorithm 1: EvaluateSnapshotQuery(Node $n_i$, Edge $e_i$).
  Input: Top-k spatial keyword query $Q_k = (q.l, q.t, k)$
  Output: Top-k data objects with highest score
  $D_c \leftarrow \emptyset$   /* set of candidate data objects */
  max-heap $D_k \leftarrow \emptyset$   /* current top-k set */
  $s_k \leftarrow 0$   /* k-th score in $D_k$ */
  min-heap $\leftarrow \emptyset$
  $explored \leftarrow \emptyset$
  min-heap.insert($q.l$, $edge_{active}$)
  $D_c \leftarrow candsearch((eid, tid), s_k)$
  update $D_k$ and $s_k$ with $d \in D_c$
  while min-heap $\neq \emptyset$ and $1/(1 + \alpha\lambda(d.l, q.l)) > s_k$ do
    for each unexplored adjacent edge of $(p_a, edge)$ do
      $explored \leftarrow explored \cup (p_a, edge)$
      $D_c \leftarrow candsearch((eid, tid), s_k)$
      update $D_k$ and $s_k$ with $d \in D_c$
    end
    min-heap.insert(adjacent node, edge)
  end
  return $D_k$

Algorithm 2 presents the $candsearch((eid, tid), s_k)$ procedure, which finds the candidate data objects. This procedure has two main steps. In the first step, the upper-bound score of the edges is computed using a significance factor ($\theta_t$) of a term $t \in q.t$ and the shortest distance $sdist(e_i, q.l)$ between the edge and the query location. In the next step, the inverted lists of term $t$ are fetched if their upper-bound score is greater than $s_k$. In the inverted lists, the objects with score $\psi(d)$ greater than $s_k$ are returned.

Algorithm 2: CandidateSearch($(eid, tid)$, $s_k$).
  Input: Edge ID $eid$, Term ID $tid$, score of k-th object $s_k$
  Output: candidate list $D_c$
  compute $\theta_t(e_i)$
  if $\theta_t(e_i) > 0$ then
    $maxscore(e_i) \leftarrow computemaxscore(\theta_t, dist(e_i, q.l))$
  end
  if $maxscore(e_i) > s_k$ then
    for each data object $d$ in $e_i$ do
      compute $d.score$
      if $d.score > s_k$ then
        $D_c \leftarrow D_c \cup \{d\}$
      end
    end
  end
  return $D_c$
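The interplay of the incremental expansion, the edge-level pruning, and the termination test can be sketched as runnable Python. This is a simplified illustration rather than the authors' implementation: it assumes the query sits on a node instead of inside an active edge, takes a precomputed textual relevance per object, measures distance in the direction of travel from the query to the object, and uses hypothetical names (`snapshot_topk`, `graph`, `edge_objects`).

```python
import heapq


def snapshot_topk(graph, edge_objects, source, k, alpha=1.0):
    """graph: node -> [(next_node, weight, eid)] for directed edges.
    edge_objects: eid -> [(offset_from_start, text_rel, obj_id)].
    Returns [(score, obj_id)] sorted by descending score."""
    topk = []                    # min-heap holding the current best k
    settled = set()
    frontier = [(0.0, source)]   # (distance from query, node)

    def kth_score():
        return topk[0][0] if len(topk) == k else 0.0

    while frontier:
        dist_u, u = heapq.heappop(frontier)
        if u in settled:
            continue
        settled.add(u)
        # Node bound with mu = 1: nothing farther can beat the k-th score.
        if len(topk) == k and 1.0 / (1.0 + alpha * dist_u) <= kth_score():
            break
        for v, weight, eid in graph.get(u, []):
            objs = edge_objects.get(eid, [])
            # Edge bound: best text relevance on the edge at the closest distance.
            best_text = max((mu for _, mu, _ in objs), default=0.0)
            if best_text / (1.0 + alpha * dist_u) > kth_score():
                for offset, mu, obj_id in objs:
                    s = mu / (1.0 + alpha * (dist_u + offset))
                    if s <= kth_score():
                        continue
                    if len(topk) < k:
                        heapq.heappush(topk, (s, obj_id))
                    else:
                        heapq.heapreplace(topk, (s, obj_id))
            if v not in settled:
                heapq.heappush(frontier, (dist_u + weight, v))
    return sorted(topk, reverse=True)


# Hypothetical directed network: q -> a -> {b, c}, b -> q.
graph = {"q": [("a", 2.0, "e1")],
         "a": [("b", 3.0, "e2"), ("c", 1.0, "e3")],
         "b": [("q", 1.0, "e4")],
         "c": []}
edge_objects = {"e1": [(1.0, 0.5, "d1")], "e2": [(0.5, 1.0, "d2")],
                "e3": [(0.5, 0.2, "d3")], "e4": [(0.5, 1.0, "d4")]}
top1 = snapshot_topk(graph, edge_objects, "q", k=1)
```

In this toy network the edge carrying `d3` is pruned by its edge bound, and the search stops before reaching `d4` because the node bound no longer exceeds the best score found.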

To understand the proposed algorithm, consider the road network presented in Figure 1. Assume that a query $q$ issues a top-1 keyword query with $q.t$ = “Italian Restaurant”. For ease of presentation, we assume $\alpha = 1$, and the textual relevance $\mu$ is the number of occurrences of query keywords in $d.t$ divided by the number of keywords in the document (description of data object). For example, $\psi(d_4) = \mu(d_4.t, q.t)/(1 + \lambda(d_4.l, q.l)) = 0.5/8 = 0.06$. The algorithm starts the network expansion from the active edge $\overrightarrow{(n_2, n_3)}$, where $q$ is the anchor point. Note that the direction of the edge $\overrightarrow{(n_2, n_3)}$ is from $n_2$ to $n_3$. Therefore, the algorithm explores only $\overrightarrow{(q, n_3)}$. There is no data object found in $\overrightarrow{(q, n_3)}$. Then, $n_3$ becomes the anchor point, and edges $(n_3, n_4)$, $(n_3, n_5)$, and $(n_3, n_7)$ are inserted in the min-heap. Next, the $candsearch$ function retrieves the candidate data objects on edges $(n_3, n_4)$, $(n_3, n_5)$, and $(n_3, n_7)$ whose score is better than $s_k$. On edge $(n_3, n_5)$, data object $d_3$ is retrieved with $\psi(d_3) = 0.2$. Data object $d_3$ is inserted in the $D_k$ set, and the value of $s_k$ is set to 0.2. For edges $(n_3, n_4)$ and $(n_3, n_7)$, there is no candidate object found because $d_2.t$ (“Cafe”) and $d_7.t$ (“Cafe and Bakery”) do not match $q.t$. The algorithm continues expanding the edges whose upper-bound score is greater than $s_k$. The edge $\overrightarrow{(n_7, n_2)}$ is explored next. The upper-bound score of $\overrightarrow{(n_7, n_2)}$ is 1/7, which is less than $s_k$. Similarly, for edge $\overleftarrow{(n_6, n_5)}$, the upper-bound score is $0.5/8 < s_k$. Therefore, the algorithm terminates and reports $d_3$ as the top-1 result.

[Figure 3: Illustration of directed road network. $q$ issues a TkSK query at $p_1$; the server returns a set of objects for $p_1$.]

[Figure 4: Illustration of directed road network. $q$ issues a TkSK query at $p_2$; the server returns a set of objects for $p_2$.]

5. Moving Top-k Spatial Keyword Queries

In this section, we present our method to monitor moving top-k spatial keyword queries, where query objects are moving in a directed road network. Figure 3 provides an example of TkSK in road networks, where query point $q$ issues a TkSK query at point $p_1$. Note that the numbers on the arrows in the figure indicate the order of the steps. To obtain top-k results at $p_1$, the server executes Algorithm 1, as mentioned in Section 4.2. Now consider that the query object has moved to $p_2$, as shown in Figure 4, and must retrieve the top-k results at point $p_2$. The simple method is to repeat the procedure executed at $p_1$. However, recomputing the results whenever query $q$ changes its location significantly increases the computation cost. Furthermore, it also increases the communication overhead because the query object must report its location whenever it moves, and the server must send the result set. To address these issues, we introduce the safe exit approach.

In the proposed framework, the server computes safe exit points for a query object. The server maintains a set of moving queries, and the query result remains valid as long as the query objects remain inside their respective safe exit points. Whenever a query object leaves its safe exit points, the server recomputes the TkSK and safe exit points for the query object.

Next, we present our method to compute the safe exit points for a query object. The safe exit point represents a point in the segment where a safe region and a nonsafe region meet. We compute the safe exit point using the divide-and-conquer technique. Before presenting the detailed methodology, we define the terminologies used in this section.

Definition 1 (safe region). A safe region is a portion of a road segment that guarantees that, as long as the query point lies in it, its top-k results remain valid.

Definition 2 (answer objects $D^+$). A data object $d$ is called an answer object of query $q$ if $\psi(d) > \psi(d_a)$, where $d_a$ represents any other data object in the directed road network. Similarly, we can generalize this definition for TkSK: a data object $d$ is called an answer object of query $q$ if $\psi(d) > \psi(d_{k+1})$, where $d_{k+1}$ represents the $(k+1)$th data object in the directed road network. In other words, all answer objects are top-k results of query $q$.

Definition 3 (nonanswer objects $D^-$). A data object $d$ is called a nonanswer object of query $q$ if $\psi(d) < \psi(d_a)$, where $d_a$ represents any other data object in the directed road network. Similarly, we can generalize this definition for TkSK: a data object $d$ is called a nonanswer object of query $q$ if $\psi(d) < \psi(d_k)$, where $d_k$ represents the $k$th data object in the directed road network. Therefore, none of the nonanswer objects are in the top-k results of query $q$.

Definition 4 (lowest answer object $d_l^+$). An answer object $d^+ \in D^+$ is called the lowest answer object $d_l^+$ at a point $p \in G$ if $\psi(d_l^+)_p = \min(\psi(d_1^+)_p, \psi(d_2^+)_p, \ldots, \psi(d_{|D^+|}^+)_p)$, where $\psi(d_l^+)_p$ represents the score of the lowest answer object at point $p$. In other words, $\psi(d_l^+)_p < \psi(d_a^+)_p$ at point $p$, where $d_a^+$ is any other answer object in the $D^+$ set.

Definition 5 (highest nonanswer object $d_h^-$). A nonanswer object $d^- \in D^-$ is called the highest nonanswer object $d_h^-$ at a point $p \in G$ if $\psi(d_h^-)_p = \max(\psi(d_1^-)_p, \psi(d_2^-)_p, \ldots, \psi(d_{|D^-|}^-)_p)$, where $\psi(d_h^-)_p$ represents the score of the highest nonanswer object at point $p$. In other words, $\psi(d_h^-)_p > \psi(d_a^-)_p$ at point $p$, where $d_a^-$ is any other nonanswer object in the $D^-$ set.
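Definitions 2 to 5 amount to ranking the objects by score at the current query position and splitting the ranking at position $k$. A minimal Python sketch follows; the function name and the sample scores are hypothetical.

```python
def partition_answers(scores, k):
    """scores: obj_id -> psi(d) at the current query position.

    Returns (answers, nonanswers, lowest_answer, highest_nonanswer),
    with both lists ordered by descending score.
    """
    ranked = sorted(scores, key=lambda d: scores[d], reverse=True)
    answers, nonanswers = ranked[:k], ranked[k:]
    lowest_answer = answers[-1] if answers else None
    highest_nonanswer = nonanswers[0] if nonanswers else None
    return answers, nonanswers, lowest_answer, highest_nonanswer


# Hypothetical scores at some point p.
scores_at_p = {"d1": 0.25, "d2": 0.40, "d3": 0.10, "d4": 0.30}
answers, nonanswers, d_low, d_high = partition_answers(scores_at_p, k=2)
```

Only the pair `(d_low, d_high)` needs to be watched as the query moves: the top-k set stays the same as long as the score of `d_low` remains above the score of `d_high`.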

As discussed earlier, the main challenge in the continuous processing of moving TkSK is to maintain the validity of the result set because the movement of query objects can nullify the result set. To monitor the validity of the result set, we propose a safe-region-based approach.

5.1. Computation of Safe Exit Points. In this section, we present our technique to compute the safe exit points. The main goal is to find a point in the road network where the query result set will change. The result set will change when the score of the highest nonanswer object $d_h^-$ surpasses the score of the lowest answer object $d_l^+$. Generally, the textual relevance score does not change. Therefore, the score of data objects only changes because of the spatial relevance score, which can only change by the movement of query objects. The computation of the safe exit point is based on two key observations.

Observation 1. If $D^+_{n_\beta} = D^+_{p_a}$, there is no safe exit point in the segment.

Explanation. $D^+_{p_a}$ represents the set of answer objects at anchor point $p_a$, whereas $D^+_{n_\beta}$ represents the set of answer objects at boundary node $n_\beta$. As discussed earlier, the safe exit point is the particular point where the query results change. If the query results at the starting node are the same as those at the ending node of any segment or edge, there does not exist any point where the query result changes. Hence, we do not search for the safe exit point in that segment.

Observation 2 If 119863+119901119886 = 119863+119899120573 there is a safe exit point in thesegment

Explanation In contrast to Observation 1 if the query resultsare different at the starting and ending points then thereexists a point where the query results are changing Hencethere is a safe exit point in the segment

To find the safe region we observe the following cases

Case 1 (when $\alpha = 1$ and the textual relevance of the highest nonanswer object and the lowest answer object is the same). In this case, both the textual and spatial relevance have the same importance (i.e., $\alpha = 1$). In addition, the top-k result depends only on the spatial relevance because the textual relevance of both objects is the same. The data object that is closer to query point q becomes the answer object. For an undirected edge, the safe exit point $p_{se}$ is the center point between the lowest answer object and the highest nonanswer object, i.e., the point that satisfies $\max(dist(p_{se}, d^+_1), dist(p_{se}, d^+_2), \ldots, dist(p_{se}, d^+_{|D^+|})) = \min(dist(p_{se}, d^-_1), dist(p_{se}, d^-_2), \ldots, dist(p_{se}, d^-_{|D^-|}))$. However, in the case of a directed edge, where $dist(p_a, n_\beta) \neq dist(n_\beta, p_a)$, the safe exit point is either $d^+_l$ or $p_a$. If $d^+_l \in (p_a, n_\beta)$, then the safe exit point is $d^+_l$; otherwise, the safe exit point is $p_a$.

Case 2 (when $\alpha = 1$ and the textual relevance of the highest nonanswer object and the lowest answer object is different). In this case, the top-k result depends on all factors, namely, $\alpha$, the spatial relevance, and the textual relevance. Clearly, for undirected edges, the midpoint between the lowest answer object and the highest nonanswer object does not provide a valid safe exit point. Therefore, we introduce a divide-and-conquer technique, which keeps dividing the search space until we reach the point where the score of the nonanswer object becomes greater than that of the answer object. Typically, the safe exit point should be closer to the data object whose score is lower. Based on this observation, we first compute the midpoint in a similar fashion to Case 1, and then we continue dividing the search space until we find the point. For directed edges, the safe exit point can be computed in a similar fashion to Case 1.

Case 2 also covers the other cases in which the safe exit point is not the midpoint between the lowest answer object and the highest nonanswer object. In these cases, the safe exit point depends on two or more factors. Therefore, the safe exit point can be computed using the aforementioned divide-and-conquer technique. The following are the scenarios in which the safe exit point can be computed using Case 2:

(a) When $\alpha \neq 1$ and the textual relevance of the nearest nonanswer object and the farthest answer object is different.

(b) When $\alpha \neq 1$ and the textual relevance of the nearest nonanswer object and the farthest answer object is the same.

Case 3 (when $\alpha = 0$). This means that the spatial relevance has no effect on the score of data objects. Hence, no monitoring is required for this scenario.
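The divide-and-conquer step of Case 2 can be sketched as a bisection over the position along an undirected segment. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `bisect_safe_exit` is hypothetical, the two score functions are passed in as callables of the offset x from the start of the segment, and the score gap is assumed to change sign exactly once on the segment.

```python
from typing import Callable


def bisect_safe_exit(
    score_answer: Callable[[float], float],
    score_nonanswer: Callable[[float], float],
    lo: float,
    hi: float,
    tol: float = 1e-9,
) -> float:
    """Offset in [lo, hi] at which the nonanswer score overtakes the answer score.

    Assumes the answer object leads at `lo`, the nonanswer object leads at
    `hi`, and the two score curves cross exactly once in between.
    """
    def gap(x: float) -> float:
        return score_answer(x) - score_nonanswer(x)

    if gap(lo) < 0 or gap(hi) > 0:
        raise ValueError("scores do not cross on this segment")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if gap(mid) >= 0:   # answer object still ranks first: move right
            lo = mid
        else:               # nonanswer object already ranks first: move left
            hi = mid
    return (lo + hi) / 2.0


# Illustrative Case 2 setting: alpha = 1, the answer object has textual
# relevance 1.0 and distance x + 3, the nonanswer object has textual
# relevance 0.5 and distance 5 - x (values chosen for the example only).
def psi_answer(x: float) -> float:
    return 1.0 / (1.0 + (x + 3.0))


def psi_nonanswer(x: float) -> float:
    return 0.5 / (1.0 + (5.0 - x))


crossing = bisect_safe_exit(psi_answer, psi_nonanswer, 0.0, 3.0)
```

With equal textual relevance the same routine returns the midpoint of Case 1, so the midpoint rule can be read as a closed-form shortcut for the bisection.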

Algorithm 3 retrieves the safe exit points using the observations discussed earlier. The core function in this algorithm is ComputeSafeExit($p_a$, $n_\beta$), which finds the safe exit point in a segment between $p_a$ and $n_\beta$. ComputeSafeExit($p_a$, $n_\beta$) is described in detail in Algorithm 4. First, Algorithm 4 determines $d^+_l$ and $d^-_h$ at each point $p \in [p_a, n_\beta]$. Recall that $d^+_l$ is the lowest answer object at p, whereas $d^-_h$ is the highest nonanswer object at p. Algorithm 4 then computes the safe exit point based on the cases discussed earlier. There are two further scenarios for each of Cases 1 and 2. For Case 1, if $dist(p_a, n_\beta) = dist(n_\beta, p_a)$, then the safe exit point is the midpoint between $d^+_l$ and $d^-_h$. If $dist(p_a, n_\beta) \neq dist(n_\beta, p_a)$, then the edge is directed and, therefore, the safe exit point is either $p_a$ or $d^+_l$. If $d^+_l$ lies on the edge $[p_a, n_\beta]$, then $d^+_l$ is the safe exit point. Otherwise, $p_a$ is the safe exit point.

Similarly, for Case 2, if $dist(p_a, n_\beta) = dist(n_\beta, p_a)$, then the safe exit point is computed by repeatedly halving the search space until we find the closest point such that $\psi(d^-_h) > \psi(d^+_l)$. If $dist(p_a, n_\beta) \neq dist(n_\beta, p_a)$, the safe exit point is computed in the same way as in Case 1.

5.2. Computation of Safe Exit Points for Example. Consider the same example in Figure 1, where the query point q issues a top-1 keyword query with $q.t$ = "Italian restaurant". For this example, let us consider $\alpha = 1$. The monitoring algorithm starts exploring from the active edge containing the query object q. Therefore, $\overrightarrow{(q, n_3)}$ is explored first. As shown in Table 3, for $\overrightarrow{(q, n_3)}$, $D^+_q = d_3$ and $D^+_{n_3} = d_3$. According to Observation 1, no safe exit point exists in this segment. Therefore, the edges adjacent to $n_3$ are explored and $n_3$ becomes the new $p_a$. The edge $(n_3, n_4)$ is explored next. Similarly, the answer object at $n_3$ and $n_4$ is the same: $D^+_{n_3} = D^+_{n_4} = d_3$. Therefore, a safe exit point does not exist in $(n_3, n_4)$. The edge $(n_3, n_7)$ is explored next. As shown in Table 3, $D^+_{n_3} = d_3$ and $D^+_{n_7} = d_6$. By Observation 2, there is a safe exit point in $(n_3, n_7)$. As shown in Figure 1, $d_3.t = d_6.t$ = "Italian Restaurant" and $dist(n_3, n_7) = dist(n_7, n_3)$.


(1) Input: same as Algorithm 1
(2) Output: $P_{SE}$, a set of safe exit points
(3) $P_{SE} \leftarrow \emptyset$ /* set of safe exit points */
(4) $D^+_{p_a} \leftarrow$ EvaluateSnapshotQuery($p_a$, $(p_a, n_\beta)$)
(5) /* results calculated using Algorithm 1 */
(6) $D^+_{n_\beta} \leftarrow$ EvaluateSnapshotQuery($n_\beta$, $(p_a, n_\beta)$)
(7) /* results calculated using Algorithm 1 */
(8) if $D^+_{p_a} = D^+_{n_\beta}$ then
(9)     no safe exit point /* refer to Observation 1 */
(10) end
(11) if $D^+_{p_a} \neq D^+_{n_\beta}$ then
(12)     $P_{SE} \leftarrow P_{SE} \cup$ ComputeSafeExit($p_a$, $n_\beta$) /* a safe exit point exists; refer to Observation 2 */
(13) end
(14) return $P_{SE}$

Algorithm 3: COSK monitoring algorithm.
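The control flow of Algorithm 3 can be sketched in Python as follows. The two callables stand in for EvaluateSnapshotQuery (Algorithm 1) and ComputeSafeExit (Algorithm 4); their signatures here are simplifying assumptions made for the illustration, and the lookup table in the usage lines simply mirrors the answer sets listed in Table 3.

```python
def monitor_segment(p_a, n_beta, evaluate_snapshot_query, compute_safe_exit):
    """Return the set of safe exit points for the segment (p_a, n_beta).

    evaluate_snapshot_query(point) -> iterable of answer objects (assumed)
    compute_safe_exit(p_a, n_beta) -> one safe exit point (assumed)
    """
    answers_start = frozenset(evaluate_snapshot_query(p_a))
    answers_end = frozenset(evaluate_snapshot_query(n_beta))
    if answers_start == answers_end:
        # Observation 1: same result at both ends, nothing to search for.
        return set()
    # Observation 2: the result changes somewhere inside the segment.
    return {compute_safe_exit(p_a, n_beta)}


# Stand-ins that replay the top-1 answers of Table 3.
TOP1 = {"q": {"d3"}, "n3": {"d3"}, "n4": {"d3"}, "n7": {"d6"}}


def lookup(point):
    return TOP1[point]


def fake_safe_exit(p_a, n_beta):
    return f"pse({p_a},{n_beta})"  # placeholder label, not a real location


no_exit = monitor_segment("q", "n3", lookup, fake_safe_exit)
one_exit = monitor_segment("n3", "n7", lookup, fake_safe_exit)
```

Comparing the answer sets as unordered sets reflects that only membership in the top-k result matters for the validity of the safe region in this sketch.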

(1) Input: same as Algorithm 1
(2) Output: $p_{se}$, the safe exit point in $(p_a, n_\beta)$
(3) $D^+_l \leftarrow \{\langle p, d^+_l \rangle \mid p \in [p_a, n_\beta]\}$, where $d^+_l$ satisfies $\psi(d^+_l)_p = \min(\psi(d^+_1)_p, \psi(d^+_2)_p, \ldots, \psi(d^+_{|D^+|})_p)$
(4) $D^-_h \leftarrow \{\langle p, d^-_h \rangle \mid p \in [p_a, n_\beta]\}$, where $d^-_h$ satisfies $\psi(d^-_h)_p = \max(\psi(d^-_1)_p, \psi(d^-_2)_p, \ldots, \psi(d^-_{|D^-|})_p)$
(5) if Case 1 then
(6)     if $dist(p_a, n_\beta) = dist(n_\beta, p_a)$ then
(7)         $p_{se} \leftarrow$ the point with $\max(dist(p_{se}, d^+_1), dist(p_{se}, d^+_2), \ldots, dist(p_{se}, d^+_{|D^+|})) = \min(dist(p_{se}, d^-_1), dist(p_{se}, d^-_2), \ldots, dist(p_{se}, d^-_{|D^-|}))$
(8)     end
(9)     if $dist(p_a, n_\beta) \neq dist(n_\beta, p_a)$ then
(10)         $p_{se} = d^+_l$ if $d^+_l \in (p_a, n_\beta)$; otherwise $p_{se} = p_a$
(11)     end
(12) end
(13) if Case 2 then
(14)     if $dist(p_a, n_\beta) = dist(n_\beta, p_a)$ then
(15)         $p_{se}$ = closest point to $p_a$ such that $\psi(d^-_h) > \psi(d^+_l)$
(16)     end
(17)     if $dist(p_a, n_\beta) \neq dist(n_\beta, p_a)$ then
(18)         same as Line (10)
(19)     end
(20) end
(21) return $p_{se}$

Algorithm 4: ComputeSafeExit($p_a$, $n_\beta$).

Therefore, according to Case 1, the safe exit point $p_{se1}$ is the midpoint between $d_3$ and $d_6$. That is, $dist(p_{se1}, d_3) = dist(p_{se1}, d_6)$, where $dist(p_{se1}, d_3) = x + 3$ and $dist(p_{se1}, d_6) = -x + 5$ for $0 < x < 3$, with x measured from $n_3$. Consequently, $x = 1$, which means that the distance from $n_3$ to $p_{se1}$ is 1.
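The same result can be checked on the scores themselves. The short derivation below assumes $\alpha = 1$, a textual relevance of 1 for both Italian restaurants, and the score form $\psi(d)_p = \mu(d.t, q.t) / (1 + \alpha \cdot \lambda(d, p))$, with x measured from $n_3$ as above.

```latex
\begin{align*}
\psi(d_3)_p &= \frac{1}{1 + (x + 3)} = \frac{1}{x + 4}, \qquad
\psi(d_6)_p = \frac{1}{1 + (5 - x)} = \frac{1}{6 - x}, \\
\psi(d_3)_p = \psi(d_6)_p
  &\iff x + 4 = 6 - x \iff x = 1, \\
\psi(d_3)_{p_{se1}} &= \psi(d_6)_{p_{se1}} = \frac{1}{5}.
\end{align*}
```

For $0 \le x < 1$ the score of $d_3$ is the larger one, so $d_3$ stays the top-1 result until the query passes $p_{se1}$.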

Next, we determine a safe exit point in $(n_3, n_5)$. As shown in Table 3, the answer object at $n_5$ is also the same as that at $n_3$. Hence, no safe exit point exists in this edge. Next, $\overleftarrow{(n_6, n_5)}$ is explored with $p_a = n_5$. According to Table 3, $D^+_{n_6} = d_4$ and $D^+_{n_5} = d_3$. Therefore, a safe exit point exists in this edge. This edge is directed, and for each point $p \in \overleftarrow{(n_6, n_5)}$, the shortest path from p to $d_3$ is $p \rightarrow n_6 \rightarrow n_2 \rightarrow n_3 \rightarrow d_3$. Therefore, $n_5$ is the safe exit point.

The bold lines in Figure 5 indicate the safe region of q. The top-1 result remains $d_3$ as long as the query q lies in the safe region.

Table 3: Computation of safe exit points for example scenario.

| Edge/Segment | $p_a$ | $D^+_{p_a}$ | $D^+_{n_\beta}$ | $p_{se}$ |
| $\overrightarrow{(q, n_3)}$ | $q$ | $D^+_q = d_3$ | $D^+_{n_3} = d_3$ | none |
| $(n_3, n_4)$ | $n_3$ | $D^+_{n_3} = d_3$ | $D^+_{n_4} = d_3$ | none |
| $(n_3, n_7)$ | $n_3$ | $D^+_{n_3} = d_3$ | $D^+_{n_7} = d_6$ | $p_{se1}$ |
| $\overrightarrow{(n_3, n_5)}$ | $n_3$ | $D^+_{n_3} = d_3$ | $D^+_{n_5} = d_3$ | none |
| $\overleftarrow{(n_6, n_5)}$ | $n_5$ | $D^+_{n_5} = d_3$ | $D^+_{n_6} = d_4$ | $p_{se2}$ |

Figure 5: Illustration of safe region of q.

Next, we analyze the time complexity of determining a set of safe exit points using a set of qualifying objects $d \in D^+_{p_a} \cup D^+_{n_\beta} \cup D(p_a, n_\beta)$. Note that $D^+_{p_a}$ ($D^+_{n_\beta}$) indicates the set of k data objects that satisfies the query condition at $p_a$ ($n_\beta$). According to Dijkstra's algorithm [26], the time complexity $O(D^+_q)$ for computing a set of answer objects at a query point q is $O(D^+_q) = O(|E| + |N| \log |N|)$. This means that $O(D^+_{p_a}) = O(D^+_{n_\beta}) = O(|E| + |N| \log |N|)$ holds for endpoints $p_a$ and $n_\beta$. Thus, the time complexity $O(\Omega_{kth})$ of determining the skyline $\Omega_{kth}$ with the k-th highest score is $O(\Omega_{kth}) = C_{kth} \cdot O(|D^+_{p_a} \cup D^+_{n_\beta} \cup D(p_a, n_\beta)|)$, where $C_{kth}$ is the number of qualifying objects that participate in the constitution of the skyline with the k-th highest score. Therefore, the time complexity of determining a safe exit point coincides with the time complexity of determining the two skylines, i.e., the skyline $D^+_l$ with the k-th highest (or lowest) score for answer objects and the skyline $D^-_h$ with the highest score for nonanswer objects. This is because the safe exit point is found at the cross point between these skylines.
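For reference, the distance computation that the bound above refers to can be sketched as Dijkstra's algorithm on a directed graph. This is a generic textbook version, not the COSK search itself: the adjacency-list format and the function name `network_distances` are assumptions, and the binary heap from Python's `heapq` gives $O((|E| + |N|) \log |N|)$, whereas the $O(|E| + |N| \log |N|)$ bound corresponds to a Fibonacci-heap priority queue.

```python
import heapq


def network_distances(graph, source):
    """Shortest network distance from `source` to every reachable node.

    `graph` maps a node to a list of (neighbor, weight) pairs, one pair per
    directed edge, so dist(a, b) and dist(b, a) may differ.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry, a shorter path was already found
        for v, w in graph.get(u, ()):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Toy directed cycle (illustrative weights, not one of the paper's figures).
toy = {"a": [("b", 1.0)], "b": [("c", 2.0)], "c": [("a", 5.0)]}
from_a = network_distances(toy, "a")
from_c = network_distances(toy, "c")
```

In the toy graph the distance from `a` to `c` differs from the distance from `c` to `a`, which is exactly the asymmetry that separates the directed cases from the undirected ones in Algorithm 4.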

Figure 6 represents the skyline graph for $k = 1$ on the edge $(n_7, n_3)$. Let us draw the score functions of $d_3$ and $d_6$ for the road segment $(n_7, n_3)$, where a safe exit point exists because $D^+_{n_3} = d_3$ and $D^+_{n_7} = d_6$ for $k = 1$. For each point $p \in (n_7, n_3)$, the distance between $d_3$ and point p can be represented as $dist(d_3, p) = dist(d_3, n_3) + len(n_3, p) = 6 - len(n_7, p)$. Similarly, for each point $p \in (n_7, n_3)$, the distance between $d_6$ and point p can be represented as $dist(d_6, p) = dist(d_6, n_7) + len(n_7, p) = 2 + len(n_7, p)$. Let $len(n_7, p)$ be a variable x ($0 \le x \le 3$). We can write $\lambda(d_3, p) = dist(d_3, p) = 6 - x$ and $\lambda(d_6, p) = dist(d_6, p) = 2 + x$. Then, we can represent the score functions $\psi(d_3)$ and $\psi(d_6)$ as follows:

$\psi(d_3) = \dfrac{\mu(d_3.t, q.t)}{1 + \alpha \cdot \lambda(d_3, p)} = \dfrac{1}{7 - x}$ for $0 \le x \le 3$,

$\psi(d_6) = \dfrac{\mu(d_6.t, q.t)}{1 + \alpha \cdot \lambda(d_6, p)} = \dfrac{1}{3 + x}$ for $0 \le x \le 3$.

Figure 6: Skyline graph for $k = 1$ on the road segment $(n_7, n_3)$. Horizontal axis: distance from $n_7$ to $n_3$, with $p_{se1}$ marked; vertical axis: score; curves $\psi(d_6) = 1/(x + 3)$ and $\psi(d_3) = 1/(-x + 7)$.

Finally, we present the lemma to prove that the safe exit points computed by COSK are correct.

Lemma 8. The COSK algorithm correctly computes a set of safe exit points.

Proof. We prove the correctness of the COSK algorithm by contradiction. Assume that $D^+_{p_a} \neq D^+_{n_\beta}$ but there is no safe exit point in the road segment $(p_a, n_\beta)$. This means that, for each point p in the road segment $(p_a, n_\beta)$, the query result at p equals $D^+_{p_a}$, i.e., $D^+_p = D^+_{p_a}$ $\forall p \in (p_a, n_\beta)$. However, this leads to the contradiction that $D^+_{n_\beta} = D^+_{p_a}$ when $p = n_\beta$. Therefore, if $D^+_{p_a} \neq D^+_{n_\beta}$, a safe exit point exists in $(p_a, n_\beta)$. In addition, when $D^+_{p_a} \neq D^+_{n_\beta}$, a safe exit point is determined using the skyline $D^+_l$ for answer objects and the skyline $D^-_h$ with the highest score for nonanswer objects. The first skyline is a composite polyline drawn from the answer objects in $D^+_{p_a}$. The second skyline is a composite polyline drawn from the nonanswer objects in $D^+_{n_\beta} \cup D(p_a, n_\beta) - D^+_{p_a}$.

6. Monitoring Query Results and Safe Regions in Dynamic Directed Road Networks

In this section, we discuss the monitoring of spatial keyword queries in dynamic road networks, where the network distance changes depending on the traffic conditions. Updates to the weights of some edges may invalidate the query results or the safe region of q even though the query object q remains within its safe region. Figure 7 illustrates an example of changing the weights of edges $\overleftarrow{(n_1, n_2)}$ and $\overleftarrow{(n_1, n_6)}$. For convenience, we consider $\alpha = 1$ and $q.t$ = "Italian restaurant". In Figure 7(a), the top-1 result is $d_1$, and the bold lines show the safe region of query q. Now consider that, at time $t_j$, the weights of the two edges $\overleftarrow{(n_1, n_2)}$ and $\overleftarrow{(n_1, n_6)}$ change due to heavy traffic, as shown in Figure 7(b). The update in the weights of the edges may invalidate the query result or the safe region of q. Therefore, it is necessary to monitor the validity of the results and the safe region when such changes occur.

Next, we introduce a monitoring region to monitor the validity of the safe region effectively when the weight of an edge changes. The monitoring region MR contains all the points between the query point q and the lowest answer object and between q and the highest nonanswer object. Formally, it is defined as $MR = dist(q, D^+_l) \cup dist(q, D^-_h)$, where $dist(q, D^+_l)$ is the distance between q and the lowest answer object and $dist(q, D^-_h)$ is the distance between q and the highest nonanswer object. In the given example, $D^+_l = d_1$ and $D^-_h = d_2, d_3$. Therefore, the dotted lines in Figure 8(a) show the monitoring region of query object q.

Now, at time $t_j$, the updates to edges $\overleftarrow{(n_1, n_6)}$ and $\overleftarrow{(n_1, d_1)}$, which are not part of the monitoring region, can safely be ignored. However, the update on segment $\overrightarrow{(n_2, d_1)}$, which is associated with the monitoring region, may invalidate the results. As shown in Figure 8(b), after the update, the top-1 result becomes $d_2$, and the bold lines represent the new safe region of q.

Algorithm 5 monitors the validity of the result set and the safe region of query object q when the weight of any edge changes. Let us assume that the weight of edge $(n_i, n_j)$ changes at time $t_j$. First, the algorithm checks whether edge $(n_i, n_j)$ is associated with the monitoring region. If it is not part of the monitoring region, the algorithm simply ignores the update to edge $(n_i, n_j)$, and the query results and safe region remain valid. In contrast, if the edge is associated with the monitoring region (i.e., $MR \cap (n_i, n_j) \neq \emptyset$), the algorithm reevaluates the query; consequently, the top-k results and the safe region of query q are updated. Finally, the algorithm updates the monitoring region of q.
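The filtering idea behind Algorithm 5 can be sketched as follows. This is a simplified illustration under stated assumptions: the monitoring region is modelled as a set of directed edges, `QueryState` and `handle_weight_update` are hypothetical names, and `recompute` stands in for rerunning the snapshot query, the safe exit computation, and the monitoring-region construction. The edge labels in the usage lines are chosen for the example and are not read off Figure 8.

```python
from dataclasses import dataclass


@dataclass
class QueryState:
    """Hypothetical per-query state kept by the server."""
    top_k: list
    safe_exit_points: set
    monitoring_region: set  # directed edges as (tail, head) tuples


def handle_weight_update(state, updated_edge, recompute):
    """Apply one edge-weight update; return True if the state was refreshed.

    recompute() -> (top_k, safe_exit_points, monitoring_region) is assumed
    to rerun the query evaluation for the current query location.
    """
    if updated_edge not in state.monitoring_region:
        # The edge cannot affect the result or the safe region: ignore it.
        return False
    state.top_k, state.safe_exit_points, state.monitoring_region = recompute()
    return True


# Illustrative state and update stream.
state = QueryState(
    top_k=["d1"],
    safe_exit_points={"pse1", "pse2", "pse3"},
    monitoring_region={("q", "n2"), ("n2", "d1"), ("n2", "n3")},
)


def recompute():
    return ["d2"], {"pse1", "pse2", "pse3"}, {("q", "n2"), ("n2", "n3")}


ignored = handle_weight_update(state, ("n1", "n6"), recompute)
refreshed = handle_weight_update(state, ("n2", "d1"), recompute)
```

Keeping the membership test cheap is the point of the design: most weight updates in a large network fall outside the monitoring region and cost only one set lookup.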

7. Performance Evaluation

In this section, we evaluate the performance of COSK through simulation experiments. We describe our experimental settings in Section 7.1, and we present our experimental results for static and dynamic road networks in Sections 7.2 and 7.3, respectively.

7.1. Experimental Settings. All of our experiments were performed using real road networks, namely, Oldenburg, San Francisco, and San Joaquin. All three road networks were obtained from [27]. The original road network of San Francisco had 21047 nodes and 21692 edges. We reformatted the network, pruned approximately 30% of the nodes, and adjusted the edges and their weights accordingly. This resulted in a network with 14732 nodes and 14316 edges. Both the direction of the edges and the data objects on the edges were generated randomly. The description of each data object was extracted from Twitter messages [28], and we assigned one tweet per data object. Table 4 presents the characteristics of the data sets used in the experimental evaluation. We simulated moving query objects by using a spatiotemporal data generator [29]. The input to the generator was the road network of the data set used, and the output was the set of query objects moving on the road network. Each experiment had 100 moving queries, which were continuously monitored for 100 timestamps (1 timestamp = 1 second), and the average result was reported.

Figure 7: Updating the weight of edges in a dynamic road network, where $t_i < t_j$. (a) Safe region at time $t_i$. (b) Updating the weights of $\overleftarrow{(n_1, n_2)}$ and $\overleftarrow{(n_1, n_6)}$ at time $t_j$.

Figure 8: Monitoring region and updated safe region at time $t_j$. (a) Monitoring region at time $t_i$. (b) New safe region at time $t_j$.

(1) Input: monitoring region MR, updated edge $(n_i, n_j)$
(2) Output: none
(3) if $MR \cap (n_i, n_j) = \emptyset$ then
(4)     /* edge $(n_i, n_j)$ is not part of the monitoring region */
(5)     ignore the change in the weight of edge $(n_i, n_j)$
(6) else
(7)     $P_{SE} \leftarrow \emptyset$ /* set of safe exit points */
(8)     $D^k_{upd} \leftarrow$ EvaluateSnapshotQuery($n_i$, $e_i$) /* update set of top-k results */
(9)     $P_{SE_{upd}} \leftarrow$ ComputeSafeExit($p_a$, $n_\beta$) /* update safe exit points */
(10)     $MR_{upd} \leftarrow$ ComputeMonitoringRegion($D^+_l$, $D^-_h$) /* update monitoring region */
(11) end

Algorithm 5: MonitoringSafeRegion(MR, $(n_i, n_j)$).

As a benchmark for COSK in static road networks, we implemented the CMTkSK algorithm [22], which also continuously monitors moving top-k spatial keyword queries in road networks. However, this algorithm was originally designed for undirected road networks. To make a fair comparison, we modified CMTkSK to process top-k spatial keyword queries in directed road networks and called it CMTkSK+. Specifically, we modified the distance computation method between two points such that, in directed road networks, $dist(p_1, p_2) \neq dist(p_2, p_1)$. Since CMTkSK+ does not handle top-k spatial keyword queries in dynamic road networks, we compared the performance of COSK there with a basic algorithm, which recomputes the results whenever the query object changes its location. All algorithms were implemented in Java and were executed on a desktop PC with a 2.80-GHz Intel Core i5 and 8 GB of memory. In the experiments, we compared (1) query processing times, (2) edges processed, i.e., the number of edges processed for retrieving query results, and (3) index sizes. Table 5 summarizes the parameters used in the experiments. In each experiment, we varied a single parameter within the range shown in Table 5 while maintaining the other parameters at their default values.

Table 4: Summary of datasets.

| Attribute | Oldenburg | San Francisco | San Joaquin |
| Total no. of nodes | 6104 | 14732 | 18262 |
| Total no. of edges | 7034 | 14316 | 23876 |
| Percentage of directed edges | 30 | 30 | 30 |
| Total no. of objects | 5627 | 11453 | 19098 |
| Average no. of objects per edge | 0.8 | 0.8 | 0.8 |
| Total no. of words | 49517 | 103649 | 166153 |

Table 5: Experimental parameter settings.

| Parameter | Range |
| Number of results (k) | 5, 10, 15, 20, 25 |
| Number of keywords (n) | 1, 2, 3, 4, 5 |
| Query parameter ($\alpha$) | 0.01, 0.1, 1, 10, 100 |
| Dataset | Oldenburg, San Francisco, San Joaquin |
| Number of data objects ($N_D$) | 10, 20, 30, 40, 50 (×1000) |
| Speed of query objects ($V_{qry}$) | 25, 50, 75, 100, 125 (km/h) |
| Mobility ($M_{qry}$) | 20, 40, 60, 80, 100 (%) |
| Ratio of directed edges ($E_{dir}$) | 10, 20, 30, 40, 50 (%) |
| Ratio of updated edges ($E_{upd}$) | 15, 30, 60, 80, 100 (%) |

We evaluated the performance of the algorithms by using the following measures: (1) the total amount of server CPU time, which indicates the query processing time, and (2) the total communication cost, measured as the total number of points (i.e., the location updates sent by query objects and the query results and safe exit points returned by the server) transferred between clients and the server. The battery power and wireless bandwidth consumption typically increase with the amount of data transferred between objects (clients) and servers. Thus, we used the amount of transferred data as a metric to evaluate the communication cost.

7.2. Experimental Results of Top-k Spatial Keyword Queries in Static Road Networks

7.2.1. Effect of k. Figure 9 shows the effect of the number of results on the query processing time and communication cost for both algorithms. Figure 9(a) indicates that the query processing time increases for both algorithms as the value of k increases. This is expected because, with an increase in k, more data objects must be explored and verified. Nevertheless, COSK significantly outperforms CMTkSK+ for two main reasons. First, the relevant object search is very efficient when using the highest significance factor, and second, COSK does not need to verify the set of answer objects as long as the query object lies in a safe region. On the other hand, the CMTkSK+ query processing time increases significantly because it has to monitor and verify the set of candidate objects periodically. In Figure 9(b), the communication costs for both algorithms increase as the number of objects increases. However, the proposed algorithm demonstrates superior performance compared to CMTkSK+ because client-server communication is not required while the query object lies within the safe exit points, whereas in CMTkSK+ the query object is required to report its location to the server whenever it moves.

7.2.2. Effect of $N_D$. This experiment was conducted on the San Joaquin dataset. This dataset included 19098 data objects; therefore, we randomly generated approximately 30000 additional data objects on different edges. In Figure 10, we evaluate the performance of COSK and CMTkSK+ by varying the cardinality of the data objects. Note that $N_D = 10K$ corresponds to a low density of data points, while $N_D = 50K$ corresponds to a high density. In Figure 10(a), it is interesting to notice that the query processing times of both algorithms decrease as the cardinality of the data objects increases. For CMTkSK+, this is because, with high density, the monitoring range of a query decreases. For COSK, however, it is mainly because, when the data density is high, fewer edges need to be expanded, which decreases the query processing time. In Figure 10(b), we study the influence of the cardinality of the data objects on the communication costs. The experimental results indicate that CMTkSK+ incurs almost constant communication costs regardless of the data object cardinality. However, the communication costs of COSK increase in proportion to the $N_D$ value. This is expected because the safe region becomes smaller as the density of the data objects increases, which increases the communication costs.

7.2.3. Effect of Query Keywords (n). Figure 11 shows the query processing time and communication cost for COSK and CMTkSK+ as a function of the number of query keywords. Figures 11(a) and 11(b) show that the performance of both algorithms degrades when the number of keywords increases. This is mainly because, by increasing the number of query keywords, the number of relevant objects may also increase, resulting in a higher query processing time and communication cost. However, the safe-region-based algorithm COSK scales better than CMTkSK+ because of its less expensive monitoring technique.

Figure 9: Effect of k on query processing time and communication cost (COSK versus CMTkSK+). (a) Query processing time (s). (b) Communication cost (number of messages transferred).

Figure 10: Effect of $N_D$ on query processing time and communication cost (COSK versus CMTkSK+). (a) Query processing time (s). (b) Communication cost (number of messages transferred).

7.2.4. Effect of $\alpha$. Figure 12 demonstrates the impact of query parameter $\alpha$ on the query processing time and on the communication cost. A small value of $\alpha$ indicates a greater importance of textual relevance, whereas a high value of $\alpha$ gives more preference to the spatial relevance. It is interesting to note that the query processing time is lower for higher values of $\alpha$, which give more importance to the spatial relevance. This is mainly because, when the spatial relevance is weighted more heavily, fewer edges and objects need to be explored and processed to determine the top-k data objects. Observe that, in Figure 12(b), the number of messages sent by COSK decreases sharply with an increase in $\alpha$.

7.2.5. Effect of Speed. Figure 13(a) demonstrates the influence of the speed of the query objects on the query processing time of the COSK and CMTkSK+ algorithms. The experimental results indicate that the performance of CMTkSK+ is not significantly influenced by the speed of the query objects because the candidate objects must be continuously monitored at regular intervals regardless of the speed. On the other hand, for COSK, the performance gradually decreases as the speed of the query objects increases because the objects leave their respective safe regions more frequently. Figure 13(b) shows the communication costs of COSK and CMTkSK+ with respect to the speed of the query objects. CMTkSK+ incurs almost constant communication costs because a server-initiated request to verify the candidate objects does not depend on the speed. For COSK, the query objects cross safe regions more frequently when the speed is high, which increases the communication costs.

Figure 11: Effect of number of keywords on query processing time and communication cost (COSK versus CMTkSK+). (a) Query processing time (s). (b) Communication cost (number of messages transferred).

Figure 12: Effect of $\alpha$ on query processing time and communication cost (COSK versus CMTkSK+). (a) Query processing time (s). (b) Communication cost (number of messages transferred).

7.2.6. Effect of Mobility. Figure 14 shows the effect of mobility $M_{qry}$ (mobility refers to the percentage of query objects that are moving at any timestamp) on the performance of the COSK and CMTkSK+ algorithms. As expected, the query processing time and communication costs for both algorithms increase with $M_{qry}$. Nevertheless, COSK performs better than CMTkSK+ in terms of both query processing time and communication costs.

727 Effect of Directed Edges Figure 15 shows the impactof percentage of directed edges 119864119889119894119903 on the performance ofCOSK and CMTkSK+ algorithms The query processing time

increases with 119864119889119894119903 because algorithm needs to explore moreedges to retrieve the top-k keyword queries However thecommunication cost is not significantly affected by the valueof 119864119889119894119903 for both the algorithms

728 Effect of Datasets Figure 16 demonstrates the indexsizes of the COSK and CMTkSK+ approaches for differentdatasets As shown in Figure 16 both algorithms have similarindex sizes However COSK has minor space overheadbecause it stores additional information of the highest signifi-cance factor 120579119905 of edges More important this space overheadis minimal as compared to the gain achieved by COSK inquery processing time and communication costs

[Figure 13: Effect of speed on query processing time and communication cost. (a) Query processing time (s) versus $V_{qry}$ (25 to 125). (b) Communication cost (number of messages transferred, 100 to 1M) versus $V_{qry}$. Series: COSK and CMTkSK+.]

[Figure 14: Effect of mobility on query processing time and communication cost. (a) Query processing time (s) versus $M_{qry}$ (20 to 100). (b) Communication cost (number of messages transferred, 100 to 1M) versus $M_{qry}$. Series: COSK and CMTkSK+.]

7.3. Experimental Results of Top-k Spatial Keyword Queries in Dynamic Road Networks. In this section, we evaluate the performance of COSK and the basic algorithm for dynamic road networks. $E_{upd}$ indicates the percentage of all edges that change their weight at each timestamp. The length of an updated edge is randomly selected between 0.1 and 10 times the original length. Figure 17(a) depicts the query processing time of COSK and the basic algorithm. It is evident from the figure that the query processing time of the basic algorithm is not significantly affected by $E_{upd}$. This is mainly because the query objects issue top-k spatial keyword queries at each timestamp. However, the query processing time of COSK increases with the value of $E_{upd}$ because the probability that an updated edge is associated with the monitoring region of query q increases with $E_{upd}$. Therefore, when $E_{upd}$ becomes large, the results need to be updated frequently, which increases the query processing time. Figure 17(b) shows the communication costs of COSK and the basic algorithm with respect to $E_{upd}$. The basic algorithm incurs almost constant communication costs regardless of the value of $E_{upd}$. In contrast, the communication cost of COSK increases with $E_{upd}$ because the query result and safe regions need to be updated frequently.
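The dependence of COSK on $E_{upd}$ can be illustrated with a minimal sketch of the server-side check. It assumes, purely for illustration, that the server keeps the set of edge ids in each query's monitoring region; the function name and data layout are hypothetical and are not taken from the paper.

```python
def queries_to_refresh(updated_edges, monitoring_regions):
    """Return ids of queries whose monitoring region contains an updated edge.

    updated_edges: iterable of edge ids whose weight changed at this timestamp.
    monitoring_regions: dict mapping query id -> set of edge ids (assumed layout).
    """
    updated = set(updated_edges)
    # Only queries whose region intersects the updated edges need their
    # result and safe region recomputed; all other queries are left untouched.
    return sorted(qid for qid, edges in monitoring_regions.items()
                  if edges & updated)


regions = {"q1": {1, 2}, "q2": {3}, "q3": {4, 5}}
print(queries_to_refresh([2, 5], regions))
```

As the fraction of updated edges grows, more monitoring regions intersect the update set, which matches the trend reported for Figure 17.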

[Figure 15: Effect of $E_{dir}$ on query processing time and communication cost. (a) Query processing time (s) versus $E_{dir}$ (10 to 50); legend: COSK, CMTkSK+. (b) Communication cost (number of messages transferred, 100 to 1M) versus $E_{dir}$; legend: eSPAK, CMTkSK+.]

[Figure 16: Effect of dataset on index size. Index size (MB) for the Oldenburg, San Francisco, and San Joaquin datasets. Series: COSK and CMTkSK+.]

8 Conclusion

In this paper, we investigated moving top-k spatial keyword queries in directed and dynamic road networks. We presented an efficient indexing framework using inverted files that indexes the data objects on edges, allowing for the effective searching of data objects relevant to queries in terms of both textual and spatial relevance. We also presented a safe-exit-based algorithm, called COSK, to monitor moving top-k spatial keyword queries. We demonstrated that the query results remain valid as long as the query object resides within a safe region. Furthermore, COSK can effectively monitor the validity of query results and safe regions in dynamic road networks. Finally, an experimental evaluation conducted on real road networks demonstrated that COSK significantly reduced the query processing time and communication costs compared to the CMTkSK+ algorithm.

Data Availability

The real road network data used in this study are also used in many previous studies. The road network data are cited in the manuscript and are available at https://www.cs.utah.edu/~lifeifei/SpatialDataset.htm. To simulate the moving queries, the authors used the spatiotemporal data generator, which is also used in previous studies. The research article describing the generator is cited in the manuscript. The documentation and source files of the generator are available at https://iapg.jade-hs.de/personen/brinkhoff/generator/. The authors used Twitter tweets to generate the descriptions of data objects and the query keywords. The tweets used are accessible at http://followthehashtag.com/datasets/free-twitter-dataset-usa-200000-free-usa-tweets/.

[Figure 17: Effect of $E_{upd}$ on query processing time and communication cost. (a) Query processing time (s) versus $E_{upd}$ (15 to 75). (b) Communication cost (number of messages transferred, 100 to 1M) versus $E_{upd}$. Series: COSK and Basic.]

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

Hyung-Ju Cho was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean Government (MSIP) (NRF-2016R1A2B4009793), and this research was partially supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2016R1D1A1B03934129).

References

[1] D. Papadias, N. Mamoulis, J. Zhang, and Y. Tao, "Query processing in spatial network databases," in Proceedings of the 29th International Conference on Very Large Data Bases (VLDB '03), pp. 802–813, September 2003.

[2] H.-J. Cho, K. Ryu, and T.-S. Chung, "An efficient algorithm for computing safe exit points of moving range queries in directed road networks," Information Systems, vol. 41, pp. 1–19, 2014.

[3] G. Tsatsanifos and A. Vlachou, "On processing Top-k spatio-textual preference queries," in Proceedings of the 18th International Conference on Extending Database Technology (EDBT '15), pp. 433–444, March 2015.

[4] R. Li, A. X. Liu, A. L. Wang, and B. Bruhadeshwar, "Fast range query processing with strong privacy protection for cloud computing," Proceedings of the VLDB Endowment, vol. 7, no. 14, pp. 1953–1964, 2014.

[5] G. Cong, C. S. Jensen, and D. Wu, "Efficient retrieval of the Top-k most relevant spatial web objects," Proceedings of the VLDB Endowment, vol. 2, no. 1, pp. 337–348, 2009.

[6] Z. Li, K. C. K. Lee, B. Zheng, W.-C. Lee, D. Lee, and X. Wang, "IR-tree: An efficient index for geographic document search," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 4, pp. 585–599, 2011.

[7] Y. Zhou, X. Xie, C. Wang, Y. Gong, and W. Ma, "Hybrid index structures for location-based web search," in Proceedings of the 14th ACM International Conference on Information and Knowledge Management, pp. 155–162, Bremen, Germany, October 2005.

[8] J. Zobel and A. Moffat, "Inverted files for text search engines," ACM Computing Surveys, vol. 38, no. 2, 2006.

[9] N. Beckmann, H. Kriegel, R. Schneider, and B. Seeger, "R*-tree: an efficient and robust access method for points and rectangles," in Proceedings of the ACM SIGMOD International Conference on Management of Data, vol. 19, pp. 322–331, May 1990.

[10] R. Hariharan, B. Hore, C. Li, and S. Mehrotra, "Processing spatial-keyword (sk) queries in geographic information retrieval (gir) systems," in Proceedings of the 19th International Conference on Scientific and Statistical Database Management (SSDBM '07), July 2007.

[11] I. De Felipe, V. Hristidis, and N. Rishe, "Keyword search on spatial databases," in Proceedings of the 24th International Conference on Data Engineering (ICDE '08), pp. 656–665, April 2008.

[12] J. B. Rocha-Junior, O. Gkorgkas, S. Jonassen, and K. Nørvåg, "Efficient processing of top-k spatial keyword queries," in Proceedings of the International Symposium on Spatial and Temporal Databases, pp. 205–222, Springer, 2011.

[13] D. Zhang, K.-L. Tan, and A. K. Tung, "Scalable top-k spatial keyword search," in Proceedings of the 16th International Conference on Extending Database Technology, pp. 359–370, 2013.

[14] J. B. Rocha-Junior and K. Nørvåg, "Top-k spatial keyword queries on road networks," in Proceedings of the 15th International Conference on Extending Database Technology, pp. 168–179, Berlin, Germany, March 2012.

[15] H.-J. Cho, S. J. Kwon, and T.-S. Chung, "A safe exit algorithm for continuous nearest neighbor monitoring in road networks," Mobile Information Systems, vol. 9, no. 1, pp. 37–53, 2013.

[16] D. Yung, M. L. Yiu, and E. Lo, "A safe-exit approach for efficient network-based moving range queries," Data & Knowledge Engineering, vol. 72, pp. 126–147, 2012.

[17] M. Attique, H. Cho, R. Jin, and T. Chung, "Efficient Processing of Continuous Reverse k Nearest Neighbor on Moving Objects in Road Networks," ISPRS International Journal of Geo-Information, vol. 5, no. 12, p. 247, 2016.

[18] H. G. Elmongui, M. F. Mokbel, and W. G. Aref, "Continuous aggregate nearest neighbor queries," GeoInformatica, vol. 17, no. 1, pp. 63–95, 2013.

[19] D. Wu, M. L. Yiu, C. S. Jensen, and G. Cong, "Efficient continuously moving top-k spatial keyword query processing," in Proceedings of the IEEE International Conference on Data Engineering (ICDE '11), pp. 541–552, Hannover, Germany, April 2011.

[20] W. Huang, G. Li, K.-L. Tan, and J. Feng, "Efficient safe-region construction for moving top-k spatial keyword queries," in Proceedings of the 21st ACM International Conference on Information and Knowledge Management, pp. 932–941, 2012.

[21] L. Guo, J. Shao, H. H. Aung, and K.-L. Tan, "Efficient continuous top-k spatial keyword queries on road networks," GeoInformatica, vol. 19, no. 1, pp. 29–60, 2014.

[22] Y. Li, G. Li, L. Shu, Q. Huang, and H. Jiang, "Continuous monitoring of top-k spatial keyword queries in road networks," Journal of Information Science and Engineering, vol. 31, no. 6, pp. 1831–1848, 2015.

[23] M. Attique, A. Khan, and T.-S. Chung, "ESPAK: Top-k spatial keyword query processing in directed road networks," in Proceedings of the Workshops of the International Conference on Extending Database Technology and the International Conference on Database Theory (EDBT/ICDT '17), March 2017.

[24] G. Salton and C. Buckley, "Term-weighting approaches in automatic text retrieval," Information Processing & Management, vol. 24, no. 5, pp. 513–523, 1988.

[25] V. N. Anh, O. de Kretser, and A. Moffat, "Vector-space ranking with effective early termination," in Proceedings of the 24th Annual International ACM SIGIR Conference, pp. 35–42, New Orleans, LA, USA, 2001.

[26] E. W. Dijkstra, "A note on two problems in connexion with graphs," Numerische Mathematik, vol. 1, pp. 269–271, 1959.

[27] "Real datasets for spatial databases," https://www.cs.utah.edu/~lifeifei/SpatialDataset.htm.

[28] "Twitter," https://twitter.com/.

[29] T. Brinkhoff, "A framework for generating network-based moving objects," GeoInformatica, vol. 6, no. 2, pp. 153–180, 2002.

of one or more segments. An edge is also considered a segment where the nodes are the end points of the edge. The weight of a segment $s = (p_1, p_2)$ is denoted by $W(s)$.

3.2. Problem Formulation. Similar to previous studies [5, 14, 23], we assume each data object $d \in D$ has a point location $d.l$ in the road network and a text description $d.t$. Given a query location $q.l$, a set of keywords $q.t$, and the number k of data objects to return, the top-k spatial keyword query $Q_k$ is defined as $Q_k = (q.l, q.t, k)$, which takes three arguments and returns the best k data objects from D according to a score that considers spatial proximity and text relevance. The score $\psi(d)$ of a data object d is defined by the following equation:

$$\psi(d) = \frac{\mu(d.t, q.t)}{1 + \alpha \cdot \lambda(d.l, q.l)} \tag{1}$$

where $\lambda(d.l, q.l)$ is the spatial relevance between $d.l$ and $q.l$, $\mu(d.t, q.t)$ is the textual relevance between $d.t$ and $q.t$, and $\alpha$ is a nonnegative real number that determines the importance of one measure over the other. For example, if only textual relevance is considered, then $\alpha = 0$. If more importance is given to spatial relevance, then $\alpha > 1$.

Spatial relevance ($\lambda$) is defined as the shortest distance between data object d and query q: $\lambda(d.l, q.l) = dist(d.l, q.l)$. Thus, $dist(d_i.l, q.l) < dist(d_j.l, q.l)$ indicates that data object $d_i$ is more spatially relevant to q than data object $d_j$. The textual relevance ($\mu$) can be computed using any popular information retrieval model, such as cosine similarity or the language model. In this study, we use the cosine similarity between $d.t$ and $q.t$. The textual relevance is defined as follows:

$$\mu(d.t, q.t) = \frac{\sum_{t \in q.t} w_t(d.t) \, w_t(q.t)}{\sqrt{\sum_{t \in d.t} [w_t(d.t)]^2 \, \sum_{t \in q.t} [w_t(q.t)]^2}} \tag{2}$$

The weight $w_t(d.t) = 1 + \ln(f_t(d.t))$, where $f_t(d.t)$ represents the frequency of term t in $d.t$. The weight $w_t(q.t) = \ln(1 + |D| / df_t)$, where $|D|$ is the number of objects in D and $df_t$ is the document frequency of term t. A higher $\mu$ means a higher textual relevance to the query keywords. We use the variation of cosine similarity based on the significance factor $\theta_t(n)$ of term t in a document n, where n represents the description of a data object $d.t$ or the query keywords $q.t$. The significance $\theta_t(n) = w_t(n) / \sqrt{\sum_{t \in n} (w_t(n))^2}$ is the weight of the term in the document, normalized by the length of the document [24, 25]. Hence, the textual relevance $\mu(d.t, q.t)$ can be rewritten as

$$\mu(d.t, q.t) = \sum_{t \in q.t} \theta_t(d.t) \, \theta_t(q.t) \tag{3}$$
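The scoring function of Eqs. (1)–(3) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the dictionary-based term representation, and the example document frequencies are assumptions made here, and the network distance is passed in as a precomputed number rather than derived from a road network.

```python
import math
from collections import Counter


def significance(weights):
    """Divide each term weight by the Euclidean length of the weight vector."""
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm > 0 else {}


def doc_significance(description_terms):
    """theta_t(d.t), using w_t(d.t) = 1 + ln(f_t(d.t))."""
    freq = Counter(description_terms)
    return significance({t: 1.0 + math.log(f) for t, f in freq.items()})


def query_significance(query_terms, num_objects, doc_freq):
    """theta_t(q.t), using w_t(q.t) = ln(1 + |D| / df_t)."""
    weights = {t: math.log(1.0 + num_objects / doc_freq[t])
               for t in set(query_terms) if doc_freq.get(t, 0) > 0}
    return significance(weights)


def textual_relevance(theta_d, theta_q):
    """Eq. (3): sum over query terms of theta_t(d.t) * theta_t(q.t)."""
    return sum(theta_d.get(t, 0.0) * w for t, w in theta_q.items())


def score(theta_d, theta_q, network_dist, alpha):
    """Eq. (1): psi(d) = mu(d.t, q.t) / (1 + alpha * dist(d.l, q.l))."""
    return textual_relevance(theta_d, theta_q) / (1.0 + alpha * network_dist)


# Hypothetical example values, chosen only to exercise the functions.
theta_d = doc_significance(["italian", "restaurant", "pizza"])
theta_q = query_significance(["italian", "restaurant"], num_objects=100,
                             doc_freq={"italian": 10, "restaurant": 20})
print(score(theta_d, theta_q, network_dist=4.0, alpha=1.0))
```

Because both significance vectors have unit length, the textual relevance stays in [0, 1], so the score of any object at network distance x is at most 1 / (1 + alpha * x).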

4 Query Processing System

In this section, we present the proposed query processing system, which indexes the data objects and prunes the irrelevant edges for efficient query processing. In Section 4.1, we discuss the indexing framework, and in Section 4.2, we present an efficient keyword query processing algorithm for snapshot queries.

4.1. Indexing Framework. In this study, our main work focuses on moving queries in directed and dynamic road networks. We use a method similar to the enhanced technique presented in [12] as our basic framework for processing snapshot queries in directed and dynamic road networks. The indexing framework combines a road network framework [1] for storing spatial information and an inverted file for indexing data objects. For easy traversal of the network, we store the adjacent nodes of each given node by storing the node id ($n_{id}$), the edge id ($e_{id}$), the direction of the edge, and the weight of the edge. The indexing framework consists of two main components: a pruning component and an inverted file component. Figure 2 illustrates the main components of the indexing framework. The pruning component first prunes the edges that contain only data objects irrelevant to the query keywords. To achieve this, we introduce the highest significance $\theta_t^+$ of a given term t in the descriptions of the objects lying on the edge. The $\theta_t^+$ on an edge is retrieved by a key composed of a pair of edge id and term id $(e_{id}, t_{id})$. The $\theta_t^+$ represents an upper-bound significance of any object lying on an edge with term t in its description. The inverted list of a term t on an edge is accessed only if the upper-bound score composed of $\theta_t^+$ and the minimum network distance between the starting node of the edge and query q may return a candidate data object. Naturally, the edges with upper-bound scores smaller than the score of the k-th object found so far are pruned.

We implement an inverted file for indexing data objects. The inverted file contains a vocabulary and inverted lists. The vocabulary keeps general information about each term (such as the frequency of the term), which is helpful in computing the textual relevance of the data objects. The inverted list stores the data objects located on the edge $\overrightarrow{(n_s, n_e)}$ that have a term t in their description. An inverted list is identified by a key composed of $(e_{id}, t_{id})$. Each inverted file is a set of inverted lists. A separate inverted list is used for each term in the object description. An inverted list stores two attributes for each data object: first, the distance between the data object and the starting node, $dist(n_s, d_i)$; second, the significance factor $\theta(t_i, d_i)$ of the term $t_i$ in the description of the data object. Note that the network distance between two points in a directed road network is not symmetrical (i.e., $dist(n_s, d_i) \neq dist(d_i, n_s)$). Recall that the starting node is chosen according to the orientation of the edge such that the direction of the edge is from the node toward the data object. In Figure 1, $n_3$ is the starting node for $d_7$. For bidirectional edges, any of the adjacent nodes can act as a starting node.

The proposed indexing scheme has three main advantages. First, the search for objects relevant to the query keywords is very efficient using the $(e_{id}, t_{id})$ pair. Second, the inverted files also store the network distance between the starting node and the data object, which helps in accessing the data object in the directed road network. Finally, the pruning technique allows for faster query processing by exploring fewer edges.
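To make the role of the $(e_{id}, t_{id})$ key concrete, the following is a minimal in-memory sketch of an inverted file with a per-key upper bound. The class and method names are hypothetical, the storage layout is an assumption (the paper does not prescribe one here), and the upper bound sums the per-term maxima, which is one simple way to bound the textual relevance of any object on the edge.

```python
from collections import defaultdict


class EdgeKeywordIndex:
    """Toy inverted file keyed by (eid, tid) with a highest-significance bound."""

    def __init__(self):
        # (eid, tid) -> list of (dist_from_start_node, object_id, theta)
        self.inverted_lists = defaultdict(list)
        # (eid, tid) -> highest significance of the term on that edge
        self.theta_max = defaultdict(float)

    def add(self, eid, tid, object_id, dist_from_start, theta):
        self.inverted_lists[(eid, tid)].append((dist_from_start, object_id, theta))
        self.theta_max[(eid, tid)] = max(self.theta_max[(eid, tid)], theta)

    def upper_bound(self, eid, query_theta, min_dist, alpha):
        """Best score any object on the edge could reach for this query."""
        mu_max = sum(self.theta_max.get((eid, tid), 0.0) * w
                     for tid, w in query_theta.items())
        return mu_max / (1.0 + alpha * min_dist)

    def candidates(self, eid, query_theta, min_dist, alpha, s_k):
        """Read inverted lists only when the edge cannot be pruned.

        Returns object_id -> (offset from the starting node, textual relevance).
        """
        if self.upper_bound(eid, query_theta, min_dist, alpha) <= s_k:
            return {}  # edge pruned without touching any inverted list
        merged = {}
        for tid, w in query_theta.items():
            for offset, object_id, theta in self.inverted_lists.get((eid, tid), []):
                _, mu = merged.get(object_id, (offset, 0.0))
                merged[object_id] = (offset, mu + theta * w)
        return merged


index = EdgeKeywordIndex()
index.add(eid=1, tid="italian", object_id="d3", dist_from_start=2.0, theta=0.5)
print(index.candidates(1, {"italian": 1.0}, min_dist=0.0, alpha=1.0, s_k=0.1))
```

Keeping the bound separate from the lists means that a pruned edge costs a single lookup per query term, which is the benefit the third advantage above refers to.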

Table 2 presents the notations used in this study.

[Figure 2: Indexing framework. The pruning component (1) computes an upper-bound score using $dist(n, q)$ and $\theta_t^+$, where $\theta_t^+$ is retrieved by the key <eid, tid>, and (2) accesses the inverted list of a term only if the upper-bound score is greater than the score of the k-th object. The inverted file component consists of a vocabulary (tid, $df_{tid}$) and inverted lists keyed by <eid, tid>, each entry storing $d_i$, $dist(n_s, d_i)$, and $\theta(d, t)$.]

Table 2: Summary of notations used in this paper.

Notation | Definition
$G = (N, E, W)$ | Graph model of road network
$dist(p_s, p_e)$ | Length of shortest path from $p_s$ to $p_e$, where $p_s$ and $p_e$ represent start and end points, respectively
$len(p_1, p_2)$ | Length of segment connecting two points $p_1$ and $p_2$
$n_i$ | Node in road network
$e = (n_s, n_e)$ | Edge in edge set E, where $n_s$ and $n_e$ are start and end points of the edge
$n_\beta$ | Boundary node corresponding to start ($n_s$) or end ($n_e$) point of an edge
$W(e)$ | Weight of edge $(n_s, n_e)$
$q$ | Query point in road network
$k$ | Number of data objects to return
$D$ | Set of data objects, $D = \{d_1, d_2, \ldots, d_{|D|}\}$
$D(n_s, n_e)$ | Set of data objects in an edge
$p_a$ | Anchor point that corresponds to start point of expansion
$P_{SE}$ | Safe exit point where safe and non-safe regions of q intersect
$\alpha$ | Query parameter
$\psi(d)$ | Score of data object d
$\mu(d.t, q.t)$ | Textual relevance of data object d with query keywords
$\lambda(d.l, q.l)$ | Spatial relevance of data object d with query location
$D^+$ | Set of answer objects
$D^-$ | Set of non-answer objects
$d_l^+$ | Lowest answer object
$d_h^-$ | Highest non-answer object

4.2. Query Processing Algorithm. Our algorithm traverses the road network incrementally in a similar fashion to Dijkstra's algorithm [26]. Algorithm 1 returns the top-k data objects with the highest scores according to their joint textual and spatial relevance to the query. The algorithm begins by exploring the active edge, where query object q is located, and expands the network in increasing order of distance from q. Each entry in the min-heap has the form $(p_a, edge)$, where $p_a$ indicates the anchor point in the edge. For an active edge, q becomes the anchor point. Otherwise, for directed edges, the ending node $n_e$ becomes the anchor point. For bidirectional edges, either of the adjacent boundary nodes, i.e., $n_s$ or $n_e$, becomes the anchor point. Let $D_k$ be the current set of top-k data objects and $s_k$ be the score of the k-th data object in $D_k$. The $candsearch((e_{id}, t_{id}), s_k)$ function retrieves the candidate data objects $D_c$ located in an edge with a better score $\psi(d)$ than $s_k$. Next, the $D_k$ set is updated with the data objects in $D_c$, and so is $s_k$. The algorithm continues its expansion and inserts the adjacent edges of the boundary node until the heap is exhausted or the remaining data objects cannot have a better score than $s_k$. The upper-bound score $\psi(n)$ of node n is computed using $dist(n, q)$ and the maximum textual relevance ($\mu = 1$). Therefore, if $\psi(n) \le s_k$, then even if there is an unexplored data object d matching all query keywords, its score cannot be better than that of the k-th object in $D_k$ because $dist(d, q.l) \ge dist(n, q.l)$. This is certain owing to the fact that the algorithm strictly expands the node with the minimum distance to the query location.

Algorithm 2 presents the $candsearch((e_{id}, t_{id}), s_k)$ procedure, which finds the candidate data objects. This procedure has two main steps. In the first step, the upper-bound score of the edges is computed using a significance factor ($\theta_t$) of a term $t \in q.t$ and the shortest distance $sdist(e_i, q.l)$ between the edge and the query location. In the next step, the inverted lists of term t are fetched if their upper-bound score is greater than $s_k$. In the inverted lists, the objects with score $\psi(d)$ greater than $s_k$ are returned.

(1) Input: Top-k spatial keyword query Q_k = (q.l, q.t, k)
(2) Output: Top-k data objects with highest score
(3) D_c ← ∅ /* set of candidate data objects */
(4) max-heap D_k ← ∅ /* current top-k set */
(5) s_k ← 0 /* k-th score in D_k */
(6) min-heap ← ∅
(7) explored ← ∅
(8) min-heap.insert(q.l, edge_active)
(9) D_c ← candsearch((e_id, t_id), s_k)
(10) update D_k and s_k with d ∈ D_c
(11) while min-heap ≠ ∅ and (1/(1 + α λ(d.l, q.l)) > s_k) do
(12)     for each unexplored adjacent edge of (p_a, edge) do
(13)         explored ← explored ∪ (p_a, edge)
(14)         D_c ← candsearch((e_id, t_id), s_k)
(15)         update D_k and s_k with d ∈ D_c
(16)     end
(17)     min-heap.insert(adjacent node, edge)
(18) end
(19) return D_k

Algorithm 1: EvaluateSnapshotQuery(Node n_i, Edge e_i).

(1) Input: Edge ID e_id, Term ID t_id, score of k-th object s_k
(2) Output: candidate list D_c
(3) compute θ_t(e_i)
(4) if θ_t(e_i) > 0 then
(5)     maxscore(e_i) ← computemaxscore(θ_t, dist(e_i, q.l))
(6) end
(7) if maxscore(e_i) > s_k then
(8)     for each data object in e_i do
(9)         compute d_score
(10)     end
(11)     if d_score > s_k then
(12)         D_c ← D_c ∪ d
(13)     end
(14) end
(15) return D_c

Algorithm 2: CandidateSearch((e_id, t_id), s_k).
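The expansion and termination logic of the snapshot algorithm can be sketched as follows. This is a simplified illustration and not the authors' implementation: it assumes the query sits on a node rather than inside an edge, that every edge is unidirectional, and that the textual relevance of each object has already been computed. The function name and the `graph` and `objects` layouts are hypothetical.

```python
import heapq


def snapshot_topk(graph, objects, q_node, k, alpha):
    """Dijkstra-style expansion with an early stop based on the k-th score.

    graph:   node -> list of (neighbor, eid, weight) for directed edges.
    objects: eid -> list of (offset_from_start_node, object_id, mu).
    Returns a list of (score, object_id), best first.
    """
    top = []                    # min-heap; top[0] holds the k-th best score
    s_k = 0.0                   # score of the k-th object found so far
    settled = set()
    frontier = [(0.0, q_node)]  # (network distance from the query, node)
    while frontier:
        d_n, n = heapq.heappop(frontier)
        if n in settled:
            continue
        # With mu = 1, nothing reachable through n can score above this bound.
        if 1.0 / (1.0 + alpha * d_n) <= s_k:
            break
        settled.add(n)
        for v, eid, weight in graph.get(n, []):
            for offset, obj_id, mu in objects.get(eid, []):
                psi = mu / (1.0 + alpha * (d_n + offset))
                if psi > s_k:
                    heapq.heappush(top, (psi, obj_id))
                    if len(top) > k:
                        heapq.heappop(top)
                    if len(top) == k:
                        s_k = top[0][0]
            if v not in settled:
                heapq.heappush(frontier, (d_n + weight, v))
    return sorted(top, reverse=True)


# Hypothetical three-node network with one object on each edge.
graph = {0: [(1, "e1", 2.0)], 1: [(2, "e2", 3.0)]}
objects = {"e1": [(1.0, "a", 0.5)], "e2": [(2.0, "b", 1.0)]}
print([obj for _, obj in snapshot_topk(graph, objects, q_node=0, k=1, alpha=1.0)])
```

Nodes leave the frontier in nondecreasing order of distance, so once the bound for the current node is no greater than $s_k$, no later node can contribute a better object and the loop can stop.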

To understand the proposed algorithm, consider the road network presented in Figure 1. Assume that a query q generated a top-1 keyword query with $q.t$ = "Italian Restaurant". For ease of presentation, we assume $\alpha = 1$ and the textual relevance $\mu$ is the number of occurrences of query keywords in $d.t$ divided by the number of keywords in the document (description of the data object). For example, $\psi(d_4) = \mu(d_4.t, q.t) / (1 + \lambda(d_4.l, q.l)) = 0.5/8 = 0.06$. The algorithm starts the network expansion from the active edge $\overrightarrow{(n_2, n_3)}$, where q is the anchor point. Note that the direction of the edge $\overrightarrow{(n_2, n_3)}$ is from $n_2$ to $n_3$. Therefore, the algorithm explores only $\overrightarrow{(q, n_3)}$. No data object is found in $\overrightarrow{(q, n_3)}$. Then $n_3$ becomes the anchor point, and edges $(n_3, n_4)$, $(n_3, n_5)$, and $(n_3, n_7)$ are inserted in the min-heap. Next, the candsearch function retrieves the candidate data objects on edges $(n_3, n_4)$, $(n_3, n_5)$, and $(n_3, n_7)$ whose score is better than $s_k$. On edge $(n_3, n_5)$, data object $d_3$ is retrieved with $\psi(d_3) = 0.2$. Data object $d_3$ is inserted in the $D_k$ set and the value of $s_k$ is set to 0.2. For edges $(n_3, n_4)$ and $(n_3, n_7)$, no candidate object is found because $d_2.t$ ("Cafe") and $d_7.t$ ("Cafe and Bakery") do not match $q.t$. The algorithm continues expanding the edges whose upper-bound score is greater than $s_k$. The edge $\overrightarrow{(n_7, n_2)}$ is explored next. The upper-bound score of $\overrightarrow{(n_7, n_2)}$ is 1/7, which is less than $s_k$. Similarly, for edge $\overleftarrow{(n_6, n_5)}$, the upper-bound score is $0.5/8 < s_k$. Therefore, the algorithm terminates and reports $d_3$ as the top-1 result.

[Figure 3: Illustration of directed road network. Steps shown: q issues a TkSK query at $p_1$; the server returns a set of objects for $p_1$.]

[Figure 4: Illustration of directed road network. Steps shown: q issues a TkSK query at $p_2$; the server returns a set of objects for $p_2$.]

5 Moving Top-k Spatial Keyword Queries

In this section, we present our method to monitor moving top-k spatial keyword queries, where query objects are moving in a directed road network. Figure 3 provides an example of TkSK in road networks, where query point q issues a TkSK query at point $p_1$. Note that the numbers on the arrows in the figure indicate the order of the steps. To obtain the top-k results at $p_1$, the server executes Algorithm 1, as described in Section 4.2. Now consider that the query object moves to $p_2$, as shown in Figure 4, and needs the top-k results at point $p_2$. The simple method is to repeat the procedure executed at $p_1$. However, recomputing the result whenever query q changes its location significantly increases the computation cost. Furthermore, it also increases the communication overhead because the query object must report its location whenever it moves and the server must send the result set. To address these issues, we introduce the safe exit approach.

In the proposed framework, the server computes safe exit points for a query object. The server maintains a set of moving queries, and each query result remains valid as long as the query object stays within the region bounded by its safe exit points. Whenever a query object passes one of its safe exit points, the server recomputes the TkSK result and the safe exit points for the query object.
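The communication saving of this protocol can be illustrated with a small client-side sketch. Everything here is assumed for illustration: a safe region is modeled as a list of intervals along edges, `recompute` stands in for the server call that returns a new result and safe region, and the names are hypothetical rather than part of COSK.

```python
from dataclasses import dataclass


@dataclass
class SafeSegment:
    edge_id: int
    lo: float  # offset of one safe exit point along the edge
    hi: float  # offset of the other safe exit point along the edge


def inside(safe_region, edge_id, offset):
    """True if the position lies on some segment of the safe region."""
    return any(s.edge_id == edge_id and s.lo <= offset <= s.hi
               for s in safe_region)


def count_server_requests(trajectory, recompute):
    """Count how often a moving query has to contact the server.

    trajectory: list of (edge_id, offset) positions, one per timestamp.
    recompute:  callable (edge_id, offset) -> (result, safe_region).
    """
    requests = 0
    result, safe_region = None, []
    for edge_id, offset in trajectory:
        if not inside(safe_region, edge_id, offset):
            # The query left its safe region: request a fresh result and region.
            result, safe_region = recompute(edge_id, offset)
            requests += 1
    return requests


# Stand-in server: the safe region is 2 units on either side of the query.
def fake_server(edge_id, offset):
    return "top-k result", [SafeSegment(edge_id, offset - 2.0, offset + 2.0)]


trajectory = [(1, float(i)) for i in range(6)]
print(count_server_requests(trajectory, fake_server))
```

With per-timestamp recomputation the same trajectory would trigger six requests, one for each position, whereas the safe-region check contacts the server only when the query actually leaves its region.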

Next we present our method to compute the safe exitpoints for a query objectThe safe exit point represents a pointin the segment where a safe region and nonsafe region meetWe compute the safe exit point using the divide-and-conquertechnique Before presenting the detailed methodology wedefine the terminologies used in this section

Definition 1 (safe region) A portion of a road segment thatcan guarantee that as long as the query point lies in it itstop-k results remain valid

Definition 2 (answer objects 119863+) A data object d is calledan answer object of query q if the score of data object d(120595(119889) gt 120595(119889119886)) where 119889119886 represents any other data object inthe directed road network Similarly we can generalize thisdefinition for TkSK a data object d is called an answer object

of query q if the score of a data object d (120595(119889) gt 120595(119889119896+1))where 119889119896+1 represents the (119896+1)119905ℎ data object in the directedroad network In other words we can state that all answerobjects are top-k results of query q

Definition 3 (nonanswer objects $D^-$). A data object d is called a nonanswer object of query q if $\psi(d) < \psi(d_a)$, where $d_a$ represents any other data object in the directed road network. We can generalize this definition for TkSK: a data object d is called a nonanswer object of query q if $\psi(d) < \psi(d_k)$, where $d_k$ represents the kth data object in the directed road network. Because all answer objects are top-k results of query q, none of the nonanswer objects are in the top-k results of query q.

Definition 4 (lowest answer object $D^+_l$). An answer object $d^+ \in D^+$ is called the lowest answer object at a point $p \in G$ if $\psi(d^+_l)_p = \min(\psi(d^+_1)_p, \psi(d^+_2)_p, \ldots, \psi(d^+_{|D^+|})_p)$, where $\psi(d^+_l)_p$ represents the score of the lowest answer object at point p. In other words, $\psi(d^+_l)_p < \psi(d^+_a)_p$ at point p, where $d^+_a$ is any other answer object in the set $D^+$.

Definition 5 (highest nonanswer object $D^-_h$). A nonanswer object $d^- \in D^-$ is called the highest nonanswer object at a point $p \in G$ if $\psi(d^-_h)_p = \max(\psi(d^-_1)_p, \psi(d^-_2)_p, \ldots, \psi(d^-_{|D^-|})_p)$, where $\psi(d^-_h)_p$ represents the score of the highest nonanswer object at point p. In other words, $\psi(d^-_h)_p > \psi(d^-_a)_p$ at point p, where $d^-_a$ is any other nonanswer object in the set $D^-$.
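Definitions 2 to 5 can be illustrated with a short sketch. The function names and the scores below are hypothetical and chosen only for illustration; ties are broken by input order, which is an assumption the definitions do not address.

```python
def split_top_k(scores, k):
    """Split object ids into answer objects (top-k by score) and nonanswer objects."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], ranked[k:]


def lowest_answer(scores, answers):
    """Answer object with the minimum score at the current point (Definition 4)."""
    return min(answers, key=scores.get)


def highest_nonanswer(scores, nonanswers):
    """Nonanswer object with the maximum score at the current point (Definition 5)."""
    return max(nonanswers, key=scores.get)


# Hypothetical scores of five data objects at some point p (not from the paper).
scores_at_p = {"d1": 0.10, "d2": 0.30, "d3": 0.25, "d4": 0.05, "d6": 0.20}

answers, nonanswers = split_top_k(scores_at_p, k=2)
d_l = lowest_answer(scores_at_p, answers)
d_h = highest_nonanswer(scores_at_p, nonanswers)
```

The top-k result stays valid at p as long as the score of `d_l` exceeds the score of `d_h`, which is the comparison the safe exit computation tracks.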

As discussed earlier, the main challenge in the continuous processing of moving TkSK queries is maintaining the validity of the result set, because the movement of a query object can invalidate it. To monitor the validity of the result set, we propose a safe-region-based approach.

5.1. Computation of Safe Exit Points. In this section, we present our technique for computing the safe exit points. The main goal is to find the point in the road network where the query result set changes. The result set changes when the score of the highest nonanswer object $D^-_h$ surpasses the score of the lowest answer object $D^+_l$. The textual relevance score generally does not change. Therefore, the score of a data object changes only through its spatial relevance score, which in turn changes only when the query object moves. The computation of the safe exit point is based on two key observations.

Observation 1. If $D^+_{n_\beta} = D^+_{p_a}$, there is no safe exit point in the segment.

Explanation. $D^+_{p_a}$ represents the set of answer objects at anchor point $p_a$, whereas $D^+_{n_\beta}$ represents the set of answer objects at boundary node $n_\beta$. As discussed earlier, a safe exit point is a point where the query result changes. If the query results at the starting node and the ending node of a segment/edge are the same, there is no point on it where the query result changes. Hence, we do not search for a safe exit point in that segment.

Observation 2. If $D^+_{p_a} \neq D^+_{n_\beta}$, there is a safe exit point in the segment.

Explanation. In contrast to Observation 1, if the query results are different at the starting and ending points, then there exists a point where the query result changes. Hence, there is a safe exit point in the segment.

To find the safe region, we observe the following cases.

Case 1 (when $\alpha = 1$ and the textual relevance of the highest nonanswer object and the lowest answer object is the same). In this case, the textual and spatial relevance have the same importance (i.e., $\alpha = 1$). In addition, the top-k result depends only on the spatial relevance, because the textual relevance of both objects is the same. The data object that is closer to query point q becomes the answer object. For an undirected edge, the safe exit point $p_{se}$ is the midpoint between the lowest answer object and the highest nonanswer object, i.e., $\max(dist(p_{se}, d^+_1), dist(p_{se}, d^+_2), \ldots, dist(p_{se}, d^+_{|D^+|})) = \min(dist(p_{se}, d^-_1), dist(p_{se}, d^-_2), \ldots, dist(p_{se}, d^-_{|D^-|}))$. However, in the case of a directed edge, where $dist(p_a, n_\beta) \neq dist(n_\beta, p_a)$, the safe exit point is either $d^+_l$ or $p_a$. If $d^+_l \in (p_a, n_\beta)$, then the safe exit point is $d^+_l$; otherwise, the safe exit point is $p_a$.

Case 2 (when $\alpha = 1$ and the textual relevance of the highest nonanswer object and the lowest answer object is different). In this case, the top-k result depends on all the factors, that is, $\alpha$, the spatial relevance, and the textual relevance. Clearly, for undirected edges, the midpoint between the lowest answer object and the highest nonanswer object does not provide a valid safe exit point. Therefore, we introduce a divide-and-conquer technique, which keeps dividing the search space until it reaches the point where the score of the nonanswer object becomes greater than that of the answer object. Typically, the safe exit point should be closer to the data object whose score is lower. Based on this observation, we first compute the midpoint in a similar fashion to Case 1 and then continue dividing the search space until we find the point. For directed edges, the safe exit point can be computed in a similar fashion to Case 1.
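The divide-and-conquer search of Case 2 can be sketched as a bisection on the score gap between the lowest answer object and the highest nonanswer object. This is a sketch under stated assumptions rather than the authors' code: the score is assumed to have the form $\mu / (1 + \alpha \cdot \lambda)$ used in the paper's worked example, the function names are hypothetical, and the gap is assumed to change sign exactly once on the segment. The distances $3 + x$ and $5 - x$ are borrowed from the running example on edge $(n_3, n_7)$, with x measured from $n_3$; the textual relevance value 0.5 is made up.

```python
def score(mu, alpha, dist):
    """Score of the assumed form mu / (1 + alpha * dist)."""
    return mu / (1.0 + alpha * dist)


def bisect_safe_exit(answer_score, nonanswer_score, lo, hi, tol=1e-9):
    """Offset in [lo, hi] where the nonanswer score overtakes the answer score.

    Assumes the answer object leads at lo, the nonanswer object leads at hi,
    and the ranking flips exactly once in between.
    """
    gap = lambda x: answer_score(x) - nonanswer_score(x)
    if gap(lo) <= 0 or gap(hi) >= 0:
        raise ValueError("the ranking does not flip inside [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if gap(mid) > 0:
            lo = mid      # answer still ahead: the flip lies further on
        else:
            hi = mid      # nonanswer already ahead: the flip lies earlier
    return (lo + hi) / 2.0


alpha = 1.0
# Equal textual relevance (Case 1 setting): the search recovers the midpoint.
x_same = bisect_safe_exit(lambda x: score(1.0, alpha, 3 + x),
                          lambda x: score(1.0, alpha, 5 - x), 0.0, 3.0)
# Different textual relevance (hypothetical 1.0 vs 0.5): the point shifts.
x_diff = bisect_safe_exit(lambda x: score(1.0, alpha, 3 + x),
                          lambda x: score(0.5, alpha, 5 - x), 0.0, 3.0)
```

With equal textual relevance the search converges to the midpoint at offset 1, and with the weaker nonanswer object the safe exit point moves toward that object, in line with the remark that the safe exit point lies closer to the lower-scoring object.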

Case 2 also covers the other cases in which the safe exit point is not the midpoint between the lowest answer object and the highest nonanswer object. In these cases, the safe exit point depends on two or more factors and can therefore be computed using the aforementioned divide-and-conquer technique. The following are the scenarios in which the safe exit point can be computed as in Case 2:

(a) When $\alpha \neq 1$ and the textual relevance of the nearest nonanswer object and the farthest answer object is different.

(b) When $\alpha \neq 1$ and the textual relevance of the nearest nonanswer object and the farthest answer object is the same.

Case 3 (when $\alpha = 0$). This means that the spatial relevance has no effect on the score of data objects. Hence, no monitoring is required for this scenario.

Algorithm 3 retrieves the safe exit points using the observations discussed earlier. The core function in this algorithm is ComputeSafeExit($p_a$, $n_\beta$), which finds the safe exit point in the segment between $p_a$ and $n_\beta$ and is described in detail in Algorithm 4. First, Algorithm 4 determines $d^+_l$ and $d^-_h$ at each point $p \in [p_a, n_\beta]$. Recall that $d^+_l$ is the lowest answer object at p, whereas $d^-_h$ is the highest nonanswer object at p. Algorithm 4 then computes the safe exit point based on the cases discussed earlier. There are two further scenarios for each of Cases 1 and 2. For Case 1, if $dist(p_a, n_\beta) = dist(n_\beta, p_a)$, then the safe exit point is the midpoint between $d^+_l$ and $d^-_h$. If $dist(p_a, n_\beta) \neq dist(n_\beta, p_a)$, then the edge is directed, and therefore the safe exit point is either $p_a$ or $d^+_l$. If $d^+_l$ lies on the edge $[p_a, n_\beta]$, then $d^+_l$ is the safe exit point. Otherwise, $p_a$ is the safe exit point.

Similarly, for Case 2, if $dist(p_a, n_\beta) = dist(n_\beta, p_a)$, then the safe exit point is computed by repeatedly halving the search space until we find the closest point such that $\psi(d^-_h) > \psi(d^+_l)$. If $dist(p_a, n_\beta) \neq dist(n_\beta, p_a)$, the safe exit point is computed in the same way as for the directed edge in Case 1.

5.2. Computation of Safe Exit Points for Example. Consider the same example in Figure 1, where the query point q issues a top-1 keyword query with q.t = "Italian restaurant". For this example, let us consider $\alpha = 1$. The monitoring algorithm starts exploring from the active edge containing the query object q. Therefore, $\overrightarrow{(q, n_3)}$ is explored first. As shown in Table 3, for $\overrightarrow{(q, n_3)}$, $D^+_q = d_3$ and $D^+_{n_3} = d_3$. According to Observation 1, no safe exit point exists in this segment. Therefore, the edges adjacent to $n_3$ are explored, and $n_3$ becomes the new $p_a$. The edge $(n_3, n_4)$ is explored next. Similarly, the answer object at $n_3$ and $n_4$ is the same, $D^+_{n_3} = D^+_{n_4} = d_3$. Therefore, a safe exit point does not exist in $(n_3, n_4)$. The edge $(n_3, n_7)$ is explored next. As shown in Table 3, $D^+_{n_3} = d_3$ and $D^+_{n_7} = d_6$. By Observation 2, there is a safe exit point in $(n_3, n_7)$. As shown in Figure 1, $d_3.t = d_6.t$ = "Italian Restaurant" and $dist(n_3, n_7) = dist(n_7, n_3)$.


(1) Input: same as Algorithm 1
(2) Output: $P_{SE}$, a set of safe exit points
(3) $P_{SE} \leftarrow \emptyset$ /* set of safe exit points */
(4) $D^+_{p_a} \leftarrow$ EvaluateSnapshotQuery($p_a$, $(p_a, n_\beta)$)
(5) /* results calculated using Algorithm 1 */
(6) $D^+_{n_\beta} \leftarrow$ EvaluateSnapshotQuery($n_\beta$, $(p_a, n_\beta)$)
(7) /* results calculated using Algorithm 1 */
(8) if $D^+_{p_a} = D^+_{n_\beta}$ then
(9)     no safe exit point /* refer to Observation 1 */
(10) end
(11) if $D^+_{p_a} \neq D^+_{n_\beta}$ then
(12)     $P_{SE} \leftarrow P_{SE} \cup$ ComputeSafeExit($p_a$, $n_\beta$) /* safe exit point exists; refer to Observation 2 */
(13) end
(14) return $P_{SE}$

Algorithm 3: COSK monitoring algorithm.
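The endpoint test at the heart of Algorithm 3 can be sketched in a few lines. This is an illustrative sketch only: `evaluate_snapshot_query` and `compute_safe_exit` are passed in as hypothetical callables standing in for Algorithm 1 and Algorithm 4, and the answer sets are copied from Table 3 for the top-1 example.

```python
def cosk_monitor_segment(p_a, n_b, evaluate_snapshot_query, compute_safe_exit):
    """Return the set of safe exit points found on segment (p_a, n_b)."""
    safe_exits = set()
    answers_at_pa = frozenset(evaluate_snapshot_query(p_a))
    answers_at_nb = frozenset(evaluate_snapshot_query(n_b))
    if answers_at_pa != answers_at_nb:          # Observation 2
        safe_exits.add(compute_safe_exit(p_a, n_b))
    return safe_exits                           # stays empty under Observation 1


# Top-1 answer object at each location, as listed in Table 3.
TOP1 = {"q": {"d3"}, "n3": {"d3"}, "n4": {"d3"},
        "n7": {"d6"}, "n5": {"d3"}, "n6": {"d4"}}
segments = [("q", "n3"), ("n3", "n4"), ("n3", "n7"), ("n3", "n5"), ("n5", "n6")]

found = set()
for p_a, n_b in segments:
    found |= cosk_monitor_segment(
        p_a, n_b,
        evaluate_snapshot_query=lambda point: TOP1[point],
        # Stub: a real implementation would return a location on the segment.
        compute_safe_exit=lambda a, b: f"safe exit in ({a}, {b})",
    )
```

Only the two segments whose endpoint answers differ, $(n_3, n_7)$ and the edge between $n_5$ and $n_6$, yield a safe exit point, matching the last column of Table 3.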

(1) Input: same as Algorithm 1
(2) Output: $p_{se}$, the safe exit point in $(p_a, n_\beta)$
(3) $D^+_l \leftarrow \langle p, d^+_l \rangle$ for each point $p \in [p_a, n_\beta]$, with $d^+_l$ such that $\psi(d^+_l)_p = \min(\psi(d^+_1)_p, \psi(d^+_2)_p, \ldots, \psi(d^+_{|D^+|})_p)$
(4) $D^-_h \leftarrow \langle p, d^-_h \rangle$ for each point $p \in [p_a, n_\beta]$, with $d^-_h$ such that $\psi(d^-_h)_p = \max(\psi(d^-_1)_p, \psi(d^-_2)_p, \ldots, \psi(d^-_{|D^-|})_p)$
(5) if Case 1 then
(6)     if $dist(p_a, n_\beta) = dist(n_\beta, p_a)$ then
(7)         $p_{se}$ = point se such that $\max(dist(se, d^+_1), dist(se, d^+_2), \ldots, dist(se, d^+_{|D^+|})) = \min(dist(se, d^-_1), dist(se, d^-_2), \ldots, dist(se, d^-_{|D^-|}))$
(8)     end
(9)     if $dist(p_a, n_\beta) \neq dist(n_\beta, p_a)$ then
(10)         $p_{se} = p_a$, or $p_{se} = d^+_l$ where $d^+_l \in (p_a, n_\beta)$
(11)     end
(12) end
(13) if Case 2 then
(14)     if $dist(p_a, n_\beta) = dist(n_\beta, p_a)$ then
(15)         $p_{se}$ = closest point to $p_a$ such that $\psi(d^-_h) > \psi(d^+_l)$
(16)     end
(17)     if $dist(p_a, n_\beta) \neq dist(n_\beta, p_a)$ then
(18)         Same as Line (10)
(19)     end
(20) end
(21) return $p_{se}$

Algorithm 4: ComputeSafeExit($p_a$, $n_\beta$).
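The case analysis of Algorithm 4 can be sketched as a small dispatch function. The sketch makes several simplifying assumptions that are not in the paper: positions are offsets from $p_a$ along the segment, moving along an undirected segment by x increases the distance to the lowest answer object by x and decreases the distance to the highest nonanswer object by x, and the Case 2 search is supplied by the caller as a hypothetical `case2_search` callable.

```python
def compute_safe_exit(case, directed, dist_answer_at_pa, dist_nonanswer_at_pa,
                      answer_offset=None, case2_search=None):
    """Return the safe exit point as an offset from p_a along (p_a, n_beta).

    answer_offset: offset of the lowest answer object if it lies on the edge,
    otherwise None. case2_search: callable performing the divide-and-conquer
    search for undirected Case 2 (assumed to be provided by the caller).
    """
    if case not in (1, 2):
        raise ValueError("case must be 1 or 2")
    if directed:
        # Directed edge, both cases: the lowest answer object if it lies on
        # the edge, otherwise the anchor point p_a itself (offset 0).
        return answer_offset if answer_offset is not None else 0.0
    if case == 1:
        # Undirected, equal textual relevance: the point equidistant from the
        # lowest answer object and the highest nonanswer object.
        return (dist_nonanswer_at_pa - dist_answer_at_pa) / 2.0
    # Undirected Case 2: delegate to the divide-and-conquer search.
    return case2_search()


# Edge (n3, n7) of the running example: d3 is 3 away and d6 is 5 away at n3.
pse1_offset = compute_safe_exit(1, False, 3.0, 5.0)
# Directed edge whose lowest answer object is not on the edge: exit at p_a.
pse2_offset = compute_safe_exit(1, True, 3.0, 5.0, answer_offset=None)
```

For the running example this reproduces the offset of 1 from $n_3$ on the undirected edge, and it returns the anchor point for a directed edge that does not contain the lowest answer object.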

Therefore, according to Case 1, the safe exit point $p_{se1}$ is the midpoint between $d_3$ and $d_6$. That is, $dist(p_{se1}, d_3) = dist(p_{se1}, d_6)$, where $dist(p_{se1}, d_3) = x + 3$ and $dist(p_{se1}, d_6) = -x + 5$ for $0 < x < 3$. Consequently, $x = 1$, which means that the distance from $n_3$ to $p_{se1}$ is 1.

Next, we determine a safe exit point in $(n_3, n_5)$. As shown in Table 3, the answer object at $n_5$ is the same as that at $n_3$. Hence, no safe exit point exists in this edge. Next, $\overleftarrow{(n_6, n_5)}$ is explored with $p_a = n_5$. According to Table 3, $D^+_{n_6} = d_4$ and $D^+_{n_5} = d_3$. Therefore, a safe exit point exists in this edge. This edge is directed, and for each point $p \in \overleftarrow{(n_6, n_5)}$, the shortest path from p to $d_3$ is $p \rightarrow n_6 \rightarrow n_2 \rightarrow n_3 \rightarrow d_3$. Therefore, $n_5$ is the safe exit point.

The bold lines in Figure 5 indicate the safe region of q. The top-1 result remains $d_3$ as long as the query q lies in the safe region.

Table 3: Computation of safe exit points for example scenario.

| Edge/segment | $p_a$ | $D^+_{p_a}$ | $D^+_{n_\beta}$ | $p_{se}$ |
|---|---|---|---|---|
| $\overrightarrow{(q, n_3)}$ | q | $D^+_q = d_3$ | $D^+_{n_3} = d_3$ | none |
| $(n_3, n_4)$ | $n_3$ | $D^+_{n_3} = d_3$ | $D^+_{n_4} = d_3$ | none |
| $(n_3, n_7)$ | $n_3$ | $D^+_{n_3} = d_3$ | $D^+_{n_7} = d_6$ | $p_{se1}$ |
| $\overrightarrow{(n_3, n_5)}$ | $n_3$ | $D^+_{n_3} = d_3$ | $D^+_{n_5} = d_3$ | none |
| $\overleftarrow{(n_6, n_5)}$ | $n_5$ | $D^+_{n_5} = d_3$ | $D^+_{n_6} = d_4$ | $p_{se2}$ |

Figure 5: Illustration of safe region of q. [Road network with nodes n1 to n7, query point q, safe exit points pse1 and pse2, and data objects d1 (Grand Hotel), d2 (Cafe), d3 (Italian Restaurant), d4 (Chinese Restaurant), d5 (Pub and Bar), d6 (Italian Restaurant), and d7 (Cafe and Bakery).]

Next, we analyze the time complexity of determining a set of safe exit points using the set of qualifying objects $d \in D^+_{p_a} \cup D^+_{n_\beta} \cup D(p_a, n_\beta)$. Note that $D^+_{p_a}$ ($D^+_{n_\beta}$) indicates the set of k data objects that satisfies the query condition at $p_a$ ($n_\beta$). According to Dijkstra's algorithm [26], the time complexity $O(D^+_q)$ of computing the set of answer objects at a query point q is $O(D^+_q) = O(|E| + |N| \log |N|)$. This means that $O(D^+_{p_a}) = O(D^+_{n_\beta}) = O(|E| + |N| \log |N|)$ holds for endpoints $p_a$ and $n_\beta$. Thus, the time complexity $O(\Omega_{kth})$ of determining the skyline $\Omega_{kth}$ with the k-th highest score is $O(\Omega_{kth}) = C_{kth} \cdot O(|D^+_{p_a} \cup D^+_{n_\beta} \cup D(p_a, n_\beta)|)$, where $C_{kth}$ is the number of qualifying objects that participate in the constitution of the skyline with the k-th highest score. Therefore, the time complexity of determining a safe exit point coincides with the time complexity of determining the two skylines, i.e., the skyline $D^+_l$ with the k-th highest (or lowest) score for answer objects and the skyline $D^-_h$ with the highest score for nonanswer objects. This is because the safe exit point is found at the crossing point between these skylines.
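The shortest-path computation that the complexity analysis relies on can be sketched with a standard Dijkstra search over a directed graph. This is a generic textbook sketch, not the authors' implementation: it uses a binary heap from Python's `heapq`, which gives $O((|E| + |N|) \log |N|)$ rather than the $O(|E| + |N| \log |N|)$ bound quoted above (that bound requires a Fibonacci heap). The toy graph and its weights are made up for illustration.

```python
import heapq


def dijkstra(graph, source):
    """Shortest network distances from source in a directed, weighted graph.

    graph maps each node to a list of (neighbor, weight) pairs; an undirected
    edge is represented by listing it in both directions.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Hypothetical network: a->b, b->c, and c->a are one-way; b<->d is two-way.
toy = {
    "a": [("b", 1.0)],
    "b": [("c", 2.0), ("d", 3.0)],
    "c": [("a", 4.0)],
    "d": [("b", 3.0)],
}
from_a = dijkstra(toy, "a")
from_b = dijkstra(toy, "b")
```

In this toy network the distance from a to b is 1 while the distance from b to a is 6, which illustrates why $dist(p_1, p_2)$ and $dist(p_2, p_1)$ must be treated separately on directed edges.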

Figure 6 represents the skyline graph for $k = 1$ on the edge $(n_7, n_3)$. Let us draw the score functions of $d_3$ and $d_6$ for the road segment $(n_7, n_3)$, in which a safe exit point exists because $D^+_{n_3} = d_3$ and $D^+_{n_7} = d_6$ for $k = 1$. For each point $p \in (n_7, n_3)$, the distance between $d_3$ and point p can be represented as $dist(d_3, p) = dist(d_3, n_3) + len(n_3, p) = 6 - len(n_7, p)$. Similarly, for each point $p \in (n_7, n_3)$, the distance between $d_6$ and point p can be represented as $dist(d_6, p) = dist(d_6, n_7) + len(n_7, p) = 2 + len(n_7, p)$. Let $len(n_7, p)$ be a variable x ($0 \le x \le 3$). We can write $\lambda(d_3, p) = dist(d_3, p) = 6 - x$ and $\lambda(d_6, p) = dist(d_6, p) = 2 + x$. Then, we can represent the score functions $\psi(d_3)$ and $\psi(d_6)$ as follows:

$$\psi(d_3) = \frac{\mu(d_3.t, q.t)}{1 + \alpha \cdot \lambda(d_3, p)} = \frac{1}{7 - x} \quad \text{for } 0 \le x \le 3,$$

$$\psi(d_6) = \frac{\mu(d_6.t, q.t)}{1 + \alpha \cdot \lambda(d_6, p)} = \frac{1}{3 + x} \quad \text{for } 0 \le x \le 3.$$

Figure 6: Skyline graph for $k = 1$ on the road segment $(n_7, n_3)$. [Horizontal axis: distance from $n_7$ to $n_3$, with $p_{se1}$ marked; vertical axis: score; curves $\psi(d_6) = 1/(x + 3)$ and $\psi(d_3) = 1/(-x + 7)$.]

Finally, we present a lemma to prove that the safe exit points computed by COSK are correct.

Lemma 8. The COSK algorithm correctly computes a set of safe exit points.

Proof. We prove the correctness of the COSK algorithm by contradiction. Assume that $D^+_{p_a} \neq D^+_{n_\beta}$ but there is no safe exit point in the road segment $(p_a, n_\beta)$. This means that, for each point p in the road segment $(p_a, n_\beta)$, the query result at p equals $D^+_{p_a}$, i.e., $D^+_p = D^+_{p_a}$ for all $p \in (p_a, n_\beta)$. However, this leads to the contradiction that $D^+_{n_\beta} = D^+_{p_a}$ when $p = n_\beta$. Therefore, if $D^+_{p_a} \neq D^+_{n_\beta}$, a safe exit point exists in $(p_a, n_\beta)$. In addition, when $D^+_{p_a} \neq D^+_{n_\beta}$, the safe exit point is determined using the skyline $D^+_l$ for answer objects and the skyline $D^-_h$ with the highest score for nonanswer objects. The first skyline is a composite polyline drawn from the answer objects in $D^+_{p_a}$. The second skyline is a composite polyline drawn from the nonanswer objects in $D^+_{n_\beta} \cup D(p_a, n_\beta) - D^+_{p_a}$.

6. Monitoring Query Results and Safe Regions in Dynamic Directed Road Networks

In this section, we discuss the monitoring of spatial keyword queries in dynamic road networks, where the network distance changes depending on the traffic conditions. Updates to the weights of some edges may invalidate the query results or the safe region of q even though the query object q remains within its safe region. Figure 7 illustrates an example of changing the weights of edges $\overleftarrow{(n_1, n_2)}$ and $\overleftarrow{(n_1, n_6)}$. For convenience, we consider $\alpha = 1$ and q.t = "Italian restaurant". In Figure 7(a), the top-1 result is $d_1$, and the bold lines show the safe region of query q. Now consider that, at time $t_j$, the weights of the two edges $\overleftarrow{(n_1, n_2)}$ and $\overleftarrow{(n_1, n_6)}$ change because of heavy traffic, as shown in Figure 7(b). This update to the edge weights may invalidate the query result or the safe region of q. Therefore, it is necessary to monitor the validity of the results and the safe region when such changes occur.

Next, we introduce a monitoring region to monitor the validity of the safe region effectively when the weight of an edge changes. The monitoring region MR contains all the points between query point q and the lowest answer object and between q and the highest nonanswer object. Formally, it is defined as $MR = dist(q, D^+_l) \cup dist(q, D^-_h)$, where $dist(q, D^+_l)$ is the distance between q and the lowest answer object and $dist(q, D^-_h)$ is the distance between q and the highest nonanswer object. In the given example, $D^+_l = d_1$ and $D^-_h = d_2, d_3$. Therefore, the dotted lines in Figure 8(a) show the monitoring region of query object q.

Now, at time $t_j$, the updates to edges $\overleftarrow{(n_1, n_6)}$ and $\overleftarrow{(n_1, d_1)}$, which are not part of the monitoring region, can safely be ignored. However, the update on segment $\overrightarrow{(n_2, d_1)}$, which is associated with the monitoring region, may invalidate the results. As shown in Figure 8(b), after the update, the top-1 result becomes $d_2$, and the bold lines represent the new safe region of q.

Algorithm 5 monitors the validity of the result set and the safe region of query object q when the weight of any edge changes. Suppose that the weight of edge $(n_i, n_j)$ changes at time $t_j$. First, the algorithm checks whether edge $(n_i, n_j)$ is associated with the monitoring region. If it is not part of the monitoring region, the algorithm simply ignores the update to edge $(n_i, n_j)$, and the query results and safe region remain valid. In contrast, if the edge is associated with the monitoring region (i.e., $MR \cap (n_i, n_j) \neq \emptyset$), the algorithm reevaluates the query results. Consequently, the top-k results and the safe region of query q are updated. Finally, the algorithm updates the monitoring region of q.
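The filtering step described above can be sketched as follows. The sketch assumes that the monitoring region is stored as a set of edges and that the three recomputation steps are passed in as callables; all names, the edge set, and the stub return values are hypothetical and serve only to show the control flow.

```python
def monitoring_safe_region(updated_edge, monitoring_region,
                           evaluate_snapshot_query, compute_safe_exits,
                           compute_monitoring_region):
    """React to a weight change on updated_edge.

    Returns None when the update can be ignored, otherwise the tuple
    (top-k results, safe exit points, new monitoring region).
    """
    if updated_edge not in monitoring_region:
        return None  # edge outside the monitoring region: nothing to do
    results = evaluate_snapshot_query()
    safe_exits = compute_safe_exits(results)
    new_region = compute_monitoring_region(results)
    return results, safe_exits, new_region


# Hypothetical monitoring region, stored as a set of edges.
mr = {("q", "n2"), ("n2", "d1"), ("n2", "d2")}
stubs = dict(
    evaluate_snapshot_query=lambda: ["d2"],                  # stub result
    compute_safe_exits=lambda res: ["pse1", "pse2", "pse3"],  # stub result
    compute_monitoring_region=lambda res: {("q", "n2"), ("n2", "d2")},
)

ignored = monitoring_safe_region(("n1", "n6"), mr, **stubs)
refreshed = monitoring_safe_region(("n2", "d1"), mr, **stubs)
```

Storing the monitoring region as a hash set makes the membership test constant time per updated edge, so updates far from the query cost almost nothing.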

7. Performance Evaluation

In this section, we evaluate the performance of COSK through simulation experiments. We describe our experimental settings in Section 7.1 and present our experimental results for static and dynamic road networks in Sections 7.2 and 7.3, respectively.

7.1. Experimental Settings. All of our experiments were performed using real road networks, namely, Oldenburg, San Francisco, and San Joaquin. All three road networks were obtained from [27]. The original road network of San Francisco had 21047 nodes and 21692 edges. We reformatted the network, pruned approximately 30% of the nodes, and adjusted the edges and their weights accordingly. This resulted in a network with 14732 nodes and 14316 edges. Both the direction of edges and the data objects on the edges were generated randomly. The description of each data object was extracted from Twitter messages [28], and we assigned one tweet per data object. Table 4 presents the characteristics of the data sets used in the experimental evaluation. We simulated moving query objects by using a spatiotemporal data generator [29]. The input to the generator was the road network of the data set used, and the output was the set of query objects moving on the road network. Each experiment had 100 moving queries, which were continuously monitored for 100 timestamps (1 timestamp = 1 second), and the average result was reported in the experiments.

As a benchmark for COSK in static road networks, we implemented the CMTkSK+ algorithm [22], which also continuously monitors moving top-k spatial keyword queries in road networks. However, this algorithm was originally designed for undirected road networks. To make a fair comparison, we modified the algorithm to process top-k spatial keyword queries in directed road networks and refer to the modified version as CMTkSK+. Specifically, we modified the distance computation method between two points such that, in directed road networks, $dist(p_1, p_2) \neq dist(p_2, p_1)$. Because CMTkSK+ does not handle top-k spatial queries in dynamic road networks, we compared the performance of COSK in that setting with a basic algorithm that recomputes the results whenever the query object changes its location. All algorithms were implemented in Java and were executed on a desktop PC with a 2.80-GHz Intel Core i5 processor and 8 GB of memory. In the experiments, we compared (1) query processing times, (2) edges processed, i.e., the number of edges processed for retrieving query results, and (3) index sizes. Table 5 summarizes the parameters used in the experiments. In each experiment, we varied a single parameter within the range shown in Table 5 while maintaining the other parameters at their default values.

Figure 7: Updating the weight of edges in a dynamic road network, where $t_i < t_j$. (a) Safe region at time $t_i$. (b) Updating the weights of $\overleftarrow{(n_1, n_2)}$ and $\overleftarrow{(n_1, n_6)}$ at time $t_j$. [Road network with nodes n1 to n6, query point q, safe exit points pse1 to pse3, and data objects d1 (Italian Restaurant), d2 (Italian Restaurant), and d3 (Chinese Restaurant).]

Figure 8: Monitoring region and updated safe region at time $t_j$. (a) Monitoring region at time $t_i$. (b) New safe region at time $t_j$.

(1) Input: monitoring region MR, updated edge $(n_i, n_j)$
(2) Output: none
(3) if $MR \cap (n_i, n_j) = \emptyset$ then
(4)     /* edge $(n_i, n_j)$ is not part of the monitoring region */
(5)     ignore the change in the weight of edge $(n_i, n_j)$
(6) else
(7)     $P_{SE} \leftarrow \emptyset$ /* set of safe exit points */
(8)     $D^k_{upd} \leftarrow$ EvaluateSnapshotQuery($n_i$, $e_i$) /* update set of top-k results */
(9)     $P_{SE_{upd}} \leftarrow$ ComputeSafeExit($p_a$, $n_\beta$) /* update safe exit points */
(10)     $MR_{upd} \leftarrow$ ComputeMonitoringRegion($D^+_l$, $D^-_h$) /* update monitoring region */
(11) end

Algorithm 5: MonitoringSafeRegion(MR, $(n_i, n_j)$).

Table 4: Summary of datasets.

| Attribute | Oldenburg | San Francisco | San Joaquin |
|---|---|---|---|
| Total no. of nodes | 6104 | 14732 | 18262 |
| Total no. of edges | 7034 | 14316 | 23876 |
| Percentage of directed edges | 30 | 30 | 30 |
| Total no. of objects | 5627 | 11453 | 19098 |
| Average no. of objects per edge | 0.8 | 0.8 | 0.8 |
| Total no. of words | 49517 | 103649 | 166153 |

Table 5: Experimental parameter settings.

| Parameter | Range |
|---|---|
| Number of results (k) | 5, 10, 15, 20, 25 |
| Number of keywords (n) | 1, 2, 3, 4, 5 |
| Query parameter ($\alpha$) | 0.01, 0.1, 1, 10, 100 |
| Dataset | Oldenburg, San Francisco, San Joaquin |
| Number of data objects ($N_D$) | 10, 20, 30, 40, 50 (x1000) |
| Speed of query objects ($V_{qry}$) | 25, 50, 75, 100, 125 (km/h) |
| Mobility ($M_{qry}$) | 20, 40, 60, 80, 100 |
| Ratio of directed edges ($E_{dir}$) | 10, 20, 30, 40, 50 |
| Ratio of updated edges ($E_{upd}$) | 15, 30, 60, 80, 100 |

We evaluated the performance of the algorithms by using the following measures: (1) the total amount of server CPU time, which indicates the query processing time, and (2) the total communication cost, measured as the total number of points (i.e., the location updates sent by query objects and the query results and safe exit points returned by the server) transferred between the clients and the server. Battery power and wireless bandwidth consumption typically increase with the amount of data transferred between objects (clients) and servers. Thus, we used the amount of transferred data as a metric to evaluate the communication cost.
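The communication cost metric can be made concrete with a small counting sketch. The function name and every number below are hypothetical inputs chosen to show how the count is formed; they are not measurements from the experiments.

```python
def communication_cost(messages):
    """Total number of points exchanged between a client and the server.

    messages: iterable of (location_updates, result_objects, safe_exit_points)
    tuples, one per client-server interaction.
    """
    return sum(loc + res + exits for loc, res, exits in messages)


k = 5              # assumed number of requested results
timestamps = 100   # assumed monitoring duration

# Hypothetical periodic scheme: the client reports at every timestamp and
# the server returns k result objects each time.
periodic = [(1, k, 0)] * timestamps

# Hypothetical safe-exit scheme: the client reports only on 4 crossings and
# receives k result objects plus 2 safe exit points each time.
safe_exit = [(1, k, 2)] * 4

periodic_cost = communication_cost(periodic)
safe_exit_cost = communication_cost(safe_exit)
```

Each safe-exit interaction is larger than a periodic one because it also carries safe exit points, so the saving comes entirely from needing fewer interactions.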

7.2. Experimental Results of Top-k Spatial Keyword Queries in Static Road Networks

7.2.1. Effect of k. Figure 9 shows the effect of the number of results on the query processing time and communication cost for both algorithms. Figure 9(a) indicates that the query processing time increases for both algorithms as the value of k increases. This is expected because, with an increase in k, more data objects must be explored and verified. Nevertheless, COSK significantly outperforms CMTkSK+ for two main reasons. First, a relevant object search is very efficient when using the highest significant factor. Second, COSK does not need to verify the set of answer objects as long as the query object lies in a safe region. On the other hand, the CMTkSK+ query processing time increases significantly because it has to monitor and verify the set of candidate objects periodically. In Figure 9(b), the communication costs for both algorithms increase as the number of objects increases. However, the proposed algorithm demonstrates superior performance compared to CMTkSK+ because client-server communication is not required while the query object lies within the safe exit points, whereas in CMTkSK+, the query object is required to report its location to the server whenever it moves.

7.2.2. Effect of $N_D$. This experiment was conducted on the San Joaquin dataset. This dataset included 19098 data objects; therefore, we randomly generated approximately 30000 additional data objects on different edges. In Figure 10, we evaluate the performance of COSK and CMTkSK+ by varying the cardinality of the data objects. Note that $N_D = 10K$ corresponds to a low density of data points, while $N_D = 50K$ corresponds to a high density. In Figure 10(a), it is interesting to notice that the query processing times of both algorithms decrease as the cardinality of the data objects increases. For CMTkSK+, this is because, with high density, the monitoring range of a query decreases. For COSK, however, it is mainly because, when the data density is high, fewer edges need to be expanded, which decreases the query processing time. In Figure 10(b), we study the influence of the cardinality of the data objects on the communication costs. The experimental results indicate that CMTkSK+ incurs almost constant communication costs regardless of the data object cardinality. However, the communication costs of COSK increase in proportion to the $N_D$ value. This is expected because the safe region becomes smaller as the density of the data objects increases, which increases the communication costs.

7.2.3. Effect of Query Keywords (n). Figure 11 shows the query processing time and communication cost for COSK and CMTkSK+ as a function of the number of query keywords. Figures 11(a) and 11(b) show that the performance of both algorithms degrades as the number of keywords increases. This is mainly because, as the number of query keywords increases, the number of relevant objects may also increase, resulting in a higher query processing time and communication cost. However, the safe-region-based algorithm COSK scales better than CMTkSK+ because of its less expensive monitoring technique.

Figure 9: Effect of k on query processing time and communication cost. (a) Query processing time (s) versus k. (b) Communication cost (number of messages transferred) versus k. [Series: COSK and CMTkSK+.]

Figure 10: Effect of $N_D$ on query processing time and communication cost. (a) Query processing time (s) versus $N_D$. (b) Communication cost (number of transferred messages) versus $N_D$. [Series: COSK and CMTkSK+.]

7.2.4. Effect of $\alpha$. Figure 12 demonstrates the impact of query parameter $\alpha$ on the query processing time and on the communication cost. A small value of $\alpha$ indicates a greater importance of textual relevance, whereas a high value of $\alpha$ gives more preference to the spatial relevance. It is interesting to note that the query processing time is lower for higher values of $\alpha$, which give more importance to the spatial relevance. This is mainly because, when the spatial relevance carries more weight, fewer edges and objects need to be explored and processed to determine the top-k data objects. Observe that, in Figure 12(b), the number of messages sent by COSK decreases sharply with an increase in $\alpha$.

7.2.5. Effect of Speed. Figure 13(a) demonstrates the influence of the speed of the query objects on the query processing time of the COSK and CMTkSK+ algorithms. The experimental results indicate that the performance of CMTkSK+ is not significantly influenced by the speed of the query objects, because the candidate objects must be monitored continuously at regular intervals regardless of the speed. On the other hand, for COSK, the performance gradually decreases as the speed of the query objects increases, because the objects leave their respective safe regions more frequently. Figure 13(b) shows the communication costs of COSK and CMTkSK+ with respect to the speed of the query objects. CMTkSK+ incurs almost constant communication costs because a server-initiated request to verify the candidate objects does not depend on the speed. For COSK, the query objects cross safe regions more frequently when the speed is high, which increases the communication costs.


Figure 11: Effect of number of keywords on query processing time and communication cost. (a) Query processing time (s) versus number of keywords. (b) Communication cost (number of messages transferred) versus number of keywords. Both panels compare COSK and CMTkSK+.

Figure 12: Effect of $\alpha$ on query processing time and communication cost. (a) Query processing time (s) versus $\alpha$. (b) Communication cost (number of messages transferred) versus $\alpha$. Both panels compare COSK and CMTkSK+ for $\alpha$ = 0.01, 0.1, 1, 10, and 100.

7.2.6. Effect of Mobility. Figure 14 shows the effect of mobility $M_{qry}$ (the percentage of query objects that are moving at any timestamp) on the performance of the COSK and CMTkSK+ algorithms. As expected, the query processing time and communication costs for both algorithms increase with $M_{qry}$. Nevertheless, COSK performs better than CMTkSK+ in terms of both query processing time and communication costs.

7.2.7. Effect of Directed Edges. Figure 15 shows the impact of the percentage of directed edges $E_{dir}$ on the performance of the COSK and CMTkSK+ algorithms. The query processing time increases with $E_{dir}$ because the algorithms need to explore more edges to answer the top-k spatial keyword queries. However, the communication cost is not significantly affected by the value of $E_{dir}$ for either algorithm.

7.2.8. Effect of Datasets. Figure 16 demonstrates the index sizes of the COSK and CMTkSK+ approaches for different datasets. As shown in Figure 16, both algorithms have similar index sizes. However, COSK has a minor space overhead because it stores additional information, namely, the highest significance factor $\theta_t$ of the edges. More importantly, this space overhead is minimal compared to the gain achieved by COSK in query processing time and communication costs.


Figure 13: Effect of speed on query processing time and communication cost. (a) Query processing time (s) versus $V_{qry}$. (b) Communication cost (number of messages transferred) versus $V_{qry}$. Both panels compare COSK and CMTkSK+.

Figure 14: Effect of mobility on query processing time and communication cost. (a) Query processing time (s) versus $M_{qry}$. (b) Communication cost (number of messages transferred) versus $M_{qry}$. Both panels compare COSK and CMTkSK+.

7.3. Experimental Results of Top-k Spatial Keyword Queries in Dynamic Road Networks. In this section, we evaluate the performance of COSK and the basic algorithm for dynamic road networks. $E_{upd}$ indicates the percentage of all edges that change their weight at each timestamp. The length of an updated edge is randomly selected between 0.1 and 10 times the original length. Figure 17(a) depicts the query processing time of COSK and the basic algorithm. It is evident from the figure that the query processing time of the basic algorithm is not significantly affected by $E_{upd}$. This is mainly because the query objects issue top-k spatial keyword queries at each timestamp. However, the query processing time of COSK increases with the value of $E_{upd}$ because the probability that an updated edge is associated with the monitoring region of query q increases with $E_{upd}$. Therefore, when $E_{upd}$ becomes large, the results need to be updated frequently, which increases the query processing time. Figure 17(b) shows the communication costs of COSK and the basic algorithm with respect to $E_{upd}$. The basic algorithm incurs almost constant communication costs regardless of the value of $E_{upd}$. In contrast, the communication cost of COSK increases with $E_{upd}$ because the query result and safe regions need to be updated frequently.
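The update workload described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the generator used in the experiments: the helper names `apply_edge_updates` and `needs_reevaluation` are hypothetical, the new weight is drawn uniformly (the text only says "randomly selected"), and a monitoring region is modeled simply as a set of edges.

```python
import random

def apply_edge_updates(weights, original, e_upd, rng):
    """Rescale a random fraction e_upd of all edges (hypothetical helper).

    Each selected edge receives a weight between 0.1 and 10 times its
    original length; a uniform draw is an assumption of this sketch.
    Returns the set of updated edges.
    """
    n_updates = round(len(weights) * e_upd)
    updated = rng.sample(sorted(weights), n_updates)
    for edge in updated:
        weights[edge] = original[edge] * rng.uniform(0.1, 10.0)
    return set(updated)

def needs_reevaluation(updated_edges, monitoring_region):
    """A query is re-evaluated only if an updated edge lies in its region."""
    return not updated_edges.isdisjoint(monitoring_region)

# Toy network: 100 edges of length 10, 15% of them updated per timestamp.
original = {(i, i + 1): 10.0 for i in range(100)}
weights = dict(original)
updated = apply_edge_updates(weights, original, 0.15, random.Random(0))
```

With a larger update fraction, more queries have at least one updated edge inside their monitoring region, which matches the trend reported for COSK in Figure 17.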


Figure 15: Effect of $E_{dir}$ on query processing time and communication cost. (a) Query processing time (s) versus $E_{dir}$. (b) Communication cost (number of messages transferred) versus $E_{dir}$. Both panels compare COSK and CMTkSK+.

Figure 16: Effect of dataset on index size. Index size (MB) of COSK and CMTkSK+ for the Oldenburg, San Francisco, and San Joaquin datasets.

8. Conclusion

In this paper, we investigated moving top-k spatial keyword queries in directed and dynamic road networks. We presented an efficient indexing framework using inverted files that indexes the data objects on edges, allowing for the effective searching of data objects relevant to queries in terms of both textual and spatial relevance. We also presented a safe-exit-based algorithm, called COSK, to monitor moving top-k spatial keyword queries. We demonstrated that the query results remain valid as long as the query object resides within a safe region. Furthermore, COSK can effectively monitor the validity of query results and safe regions in dynamic road networks. Finally, an experimental evaluation conducted on real road networks demonstrated that COSK significantly reduced the query processing time and communication costs compared to the CMTkSK+ algorithm.

Data Availability

The real road network data used in this study are also used in many previous studies. The road network data are cited in the manuscript and are available at https://www.cs.utah.edu/~lifeifei/SpatialDataset.htm. To simulate the moving queries, the authors used the spatiotemporal data generator, which is also used in previous studies. The research article describing the generator is cited in the manuscript. The documentation and source files of the generator are available at https://iapg.jade-hs.de/personen/brinkhoff/generator. The authors used Twitter tweets to generate the descriptions of the data objects and the query keywords. The tweets used are accessible at http://followthehashtag.com/datasets/free-twitter-dataset-usa-200000-free-usa-tweets.

Figure 17: Effect of $E_{upd}$ on query processing time and communication cost. (a) Query processing time (s) versus $E_{upd}$. (b) Communication cost (number of messages transferred) versus $E_{upd}$. Both panels compare COSK and the basic algorithm.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

Hyung-Ju Cho was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean Government (MSIP) (NRF-2016R1A2B4009793), and this research was partially supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2016R1D1A1B03934129).

References

[1] D. Papadias, N. Mamoulis, J. Zhang, and Y. Tao, “Query processing in spatial network databases,” in Proceedings of the 29th International Conference on Very Large Data Bases (VLDB ’03), pp. 802–813, September 2003.

[2] H.-J. Cho, K. Ryu, and T.-S. Chung, “An efficient algorithm for computing safe exit points of moving range queries in directed road networks,” Information Systems, vol. 41, pp. 1–19, 2014.

[3] G. Tsatsanifos and A. Vlachou, “On processing Top-k spatio-textual preference queries,” in Proceedings of the 18th International Conference on Extending Database Technology (EDBT ’15), pp. 433–444, March 2015.

[4] R. Li, A. X. Liu, A. L. Wang, and B. Bruhadeshwar, “Fast range query processing with strong privacy protection for cloud computing,” Proceedings of the VLDB Endowment, vol. 7, no. 14, pp. 1953–1964, 2014.

[5] G. Cong, C. S. Jensen, and D. Wu, “Efficient retrieval of the Top-k most relevant spatial web objects,” Proceedings of the VLDB Endowment, vol. 2, no. 1, pp. 337–348, 2009.

[6] Z. Li, K. C. K. Lee, B. Zheng, W.-C. Lee, D. Lee, and X. Wang, “IR-tree: An efficient index for geographic document search,” IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 4, pp. 585–599, 2011.

[7] Y. Zhou, X. Xie, C. Wang, Y. Gong, and W. Ma, “Hybrid index structures for location-based web search,” in Proceedings of the 14th ACM International Conference on Information and Knowledge Management, pp. 155–162, Bremen, Germany, October 2005.

[8] J. Zobel and A. Moffat, “Inverted files for text search engines,” ACM Computing Surveys, vol. 38, no. 2, 2006.

[9] N. Beckmann, H. Kriegel, R. Schneider, and B. Seeger, “R*-tree: an efficient and robust access method for points and rectangles,” in Proceedings of the ACM SIGMOD International Conference on Management of Data, vol. 19, pp. 322–331, May 1990.

[10] R. Hariharan, B. Hore, C. Li, and S. Mehrotra, “Processing spatial-keyword (sk) queries in geographic information retrieval (gir) systems,” in Proceedings of the 19th International Conference on Scientific and Statistical Database Management (SSDBM ’07), July 2007.

[11] I. De Felipe, V. Hristidis, and N. Rishe, “Keyword search on spatial databases,” in Proceedings of the 24th International Conference on Data Engineering (ICDE ’08), pp. 656–665, April 2008.

[12] J. B. Rocha-Junior, O. Gkorgkas, S. Jonassen, and K. Nørvåg, “Efficient processing of top-k spatial keyword queries,” in Proceedings of the International Symposium on Spatial and Temporal Databases, pp. 205–222, Springer, 2011.

[13] D. Zhang, K.-L. Tan, and A. K. Tung, “Scalable top-k spatial keyword search,” in Proceedings of the 16th International Conference on Extending Database Technology, pp. 359–370, 2013.


[14] J. B. Rocha-Junior and K. Nørvåg, “Top-k spatial keyword queries on road networks,” in Proceedings of the 15th International Conference on Extending Database Technology, pp. 168–179, Berlin, Germany, March 2012.

[15] H.-J. Cho, S. J. Kwon, and T.-S. Chung, “A safe exit algorithm for continuous nearest neighbor monitoring in road networks,” Mobile Information Systems, vol. 9, no. 1, pp. 37–53, 2013.

[16] D. Yung, M. L. Yiu, and E. Lo, “A safe-exit approach for efficient network-based moving range queries,” Data & Knowledge Engineering, vol. 72, pp. 126–147, 2012.

[17] M. Attique, H. Cho, R. Jin, and T. Chung, “Efficient Processing of Continuous Reverse k Nearest Neighbor on Moving Objects in Road Networks,” ISPRS International Journal of Geo-Information, vol. 5, no. 12, p. 247, 2016.

[18] H. G. Elmongui, M. F. Mokbel, and W. G. Aref, “Continuous aggregate nearest neighbor queries,” GeoInformatica, vol. 17, no. 1, pp. 63–95, 2013.

[19] D. Wu, M. L. Yiu, C. S. Jensen, and G. Cong, “Efficient continuously moving top-k spatial keyword query processing,” in Proceedings of the IEEE International Conference on Data Engineering (ICDE ’11), pp. 541–552, Hannover, Germany, April 2011.

[20] W. Huang, G. Li, K.-L. Tan, and J. Feng, “Efficient safe-region construction for moving top-k spatial keyword queries,” in Proceedings of the 21st ACM International Conference on Information and Knowledge Management, pp. 932–941, 2012.

[21] L. Guo, J. Shao, H. H. Aung, and K.-L. Tan, “Efficient continuous top-k spatial keyword queries on road networks,” GeoInformatica, vol. 19, no. 1, pp. 29–60, 2014.

[22] Y. Li, G. Li, L. Shu, Q. Huang, and H. Jiang, “Continuous monitoring of top-k spatial keyword queries in road networks,” Journal of Information Science and Engineering, vol. 31, no. 6, pp. 1831–1848, 2015.

[23] M. Attique, A. Khan, and T.-S. Chung, “ESPAK: Top-k spatial keyword query processing in directed road networks,” in Proceedings of the Workshops of the International Conference on Extending Database Technology and the International Conference on Database Theory (EDBT/ICDT ’17), March 2017.

[24] G. Salton and C. Buckley, “Term-weighting approaches in automatic text retrieval,” Information Processing & Management, vol. 24, no. 5, pp. 513–523, 1988.

[25] V. N. Anh, O. de Kretser, and A. Moffat, “Vector-space ranking with effective early termination,” in Proceedings of the 24th Annual International ACM SIGIR Conference, pp. 35–42, New Orleans, LA, USA, 2001.

[26] E. W. Dijkstra, “A note on two problems in connexion with graphs,” Numerische Mathematik, vol. 1, pp. 269–271, 1959.

[27] “Real datasets for spatial databases,” https://www.cs.utah.edu/~lifeifei/SpatialDataset.htm.

[28] “Twitter,” https://twitter.com.

[29] T. Brinkhoff, “A framework for generating network-based moving objects,” GeoInformatica, vol. 6, no. 2, pp. 153–180, 2002.


Figure 2: Indexing framework. The inverted file consists of a vocabulary and inverted lists keyed by <eid, tid>. Pruning proceeds in two steps: (1) compute the upper-bound score using dist(n, q) and the highest significance factor $\theta_t^{+}$; (2) access the inverted list of a term only if the upper-bound score is greater than the score of the k-th object.

Table 2: Summary of notations used in this paper.

Notation | Definition
$G = (N, E, W)$ | Graph model of road network
$dist(p_s, p_e)$ | Length of shortest path from $p_s$ to $p_e$, where $p_s$ and $p_e$ represent start and end points, respectively
$len(p_1, p_2)$ | Length of segment connecting two points $p_1$ and $p_2$
$n_i$ | Node in road network
$e = (n_s, n_e)$ | Edge in edge set E, where $n_s$ and $n_e$ are start and end points of the edge
$n_\beta$ | Boundary node corresponding to start ($n_s$) or end ($n_e$) point of an edge
$W(e)$ | Weight of edge $(n_s, n_e)$
$q$ | Query point in road network
$k$ | A number that represents q can be among k number of closest facilities to a data object d
$D$ | Set of data objects, $D = \{d_1, d_2, \ldots, d_{|D|}\}$
$D(n_s, n_e)$ | Set of data objects in an edge
$p_a$ | Anchor point that corresponds to start point of expansion
$P_{SE}$ | Safe exit point where safe and non-safe regions of q intersect
$\alpha$ | Query parameter
$\psi(d)$ | Score of data object d
$\mu(d_t, q_t)$ | Textual relevance of data object d with query keywords
$\lambda(d_l, q_l)$ | Spatial relevance of data object d with query location
$D^+$ | Set of answer objects
$D^-$ | Set of non-answer objects
$d^+_l$ | Lowest answer object
$d^-_h$ | Highest non-answer object

algorithm [26]. Algorithm 1 returns the top-k data objects with the highest scores according to their joint textual and spatial relevance to the query. The algorithm begins by exploring the active edge where query object q is located and expands the network in increasing order of distance from q. Each entry in the min-heap has the form $(p_a, edge)$, where $p_a$ indicates the anchor point in the edge. For an active edge, q becomes the anchor point. Otherwise, for directed edges, the ending node $n_e$ becomes the anchor point. For bidirectional edges, either of the adjacent boundary nodes, i.e., $n_s$ or $n_e$, becomes the anchor point. Let $D_k$ be the current set of top-k data objects and $s_k$ be the score of the k-th data object in $D_k$. The $candsearch((eid, tid), s_k)$ function retrieves the candidate data objects $D_c$ located in an edge with a better score $\psi(d)$ than $s_k$. Next, the $D_k$ set is updated with the data objects in $D_c$, and $s_k$ is updated accordingly. The algorithm continues its expansion and inserts the adjacent edges of the boundary node until the heap is exhausted or the remaining data objects cannot have a better score than $s_k$. The upper-bound score $\psi(n)$ of node n is computed using $dist(n, q)$ and the maximum textual relevance ($\mu = 1$). Therefore, if $\psi(n) \le s_k$, then even if there is an unexplored data object d matching all query keywords, its score cannot be better than that of the k-th object in $D_k$ because $dist(d, q_l) \ge dist(n, q_l)$. This is guaranteed because the algorithm strictly expands the node with the minimum distance to the query location.

Algorithm 2 presents the $candsearch((eid, tid), s_k)$ procedure, which finds the candidate data objects. This procedure has two main steps. In the first step, the upper-bound score of the edge is computed using the significance factor ($\theta_t$) of a term $t \in q_t$ and the shortest distance $sdist(e_i, q_l)$ between the edge and the query location. In the next step, the inverted lists of term t are fetched if their upper-bound score is greater than $s_k$. In the inverted lists, the objects with a score $\psi(d)$ greater than $s_k$ are returned.

Algorithm 1: EvaluateSnapshotQuery(Node $n_i$, Edge $e_i$)
  (1) Input: Top-k spatial keyword query $Q_N = (q_l, q_t, k)$
  (2) Output: Top-k data objects with highest score
  (3) $D_c \leftarrow \emptyset$  /* set of candidate data objects */
  (4) max-heap $D_k \leftarrow \emptyset$  /* current top-k set */
  (5) $s_k \leftarrow 0$  /* k-th score in $D_k$ */
  (6) min-heap $\leftarrow \emptyset$
  (7) $explored \leftarrow \emptyset$
  (8) min-heap.insert($q_l$, $edge_{active}$)
  (9) $D_c \leftarrow candsearch((eid, tid), s_k)$
  (10) update $D_k$ and $s_k$ with $d \in D_c$
  (11) while min-heap $\neq \emptyset$ and ($1/(1 + \alpha\lambda(d_l, q_l)) > s_k$) do
  (12)   for each unexplored adjacent edge of $(p_a, edge)$ do
  (13)     $explored \leftarrow explored \cup (p_a, edge)$
  (14)     $D_c \leftarrow candsearch((eid, tid), s_k)$
  (15)     update $D_k$ and $s_k$ with $d \in D_c$
  (16)   end
  (17)   min-heap.insert(adjacent node, edge)
  (18) end
  (19) return $D_k$

Algorithm 2: CandidateSearch($(eid, tid)$, $s_k$)
  (1) Input: Edge ID $eid$, Term ID $tid$, score of k-th object $s_k$
  (2) Output: candidate list $D_c$
  (3) compute $\theta_t(e_i)$
  (4) if $\theta_t(e_i) > 0$ then
  (5)   $maxscore(e_i) \leftarrow computemaxscore(\theta_t, dist(e_i, q_l))$
  (6) end
  (7) if $maxscore(e_i) > s_k$ then
  (8)   for each data object in $e_i$ do
  (9)     compute $dscore$
  (10)   end
  (11)   if $dscore > s_k$ then
  (12)     $D_c \leftarrow D_c \cup d$
  (13)   end
  (14) end
  (15) return $D_c$
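The expansion and pruning logic described above can be illustrated with a short Python sketch. It is a simplified illustration, not the authors' implementation, and it rests on several assumptions that are not in the paper: the query is placed at a node rather than on an edge, a bidirectional edge is represented as two directed edges, the textual relevance $\mu$ of each object is precomputed, and the largest $\mu$ on an edge stands in for the significance factor $\theta_t$. The names `score` and `snapshot_topk` are hypothetical.

```python
import heapq

def score(mu, dist, alpha=1.0):
    """psi = mu / (1 + alpha * dist); mu = 1 gives the upper bound at distance dist."""
    return mu / (1.0 + alpha * dist)

def snapshot_topk(graph, objects, q, k, alpha=1.0):
    """graph: {node: [(neighbor, weight)]}, directed.
    objects: {(u, v): [(object id, offset from u, mu)]}."""
    topk = []                       # min-heap of (score, id); its root holds s_k
    dist = {q: 0.0}
    frontier = [(0.0, q)]
    done = set()
    while frontier:
        d, n = heapq.heappop(frontier)
        if n in done:
            continue
        done.add(n)
        s_k = topk[0][0] if len(topk) == k else 0.0
        if score(1.0, d, alpha) <= s_k:
            break                   # no unexplored object can beat the k-th score
        for v, w in graph.get(n, []):
            on_edge = objects.get((n, v), [])
            best_mu = max((mu for _, _, mu in on_edge), default=0.0)
            # Edge-level pruning: skip the edge if its upper bound cannot beat s_k.
            if best_mu > 0 and score(best_mu, d, alpha) > s_k:
                for oid, offset, mu in on_edge:
                    s = score(mu, d + offset, alpha)
                    if mu > 0 and (len(topk) < k or s > topk[0][0]):
                        heapq.heappush(topk, (s, oid))
                        if len(topk) > k:
                            heapq.heappop(topk)
                s_k = topk[0][0] if len(topk) == k else 0.0
            if v not in done and d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(frontier, (d + w, v))
    return sorted(topk, reverse=True)

# Toy directed network (hypothetical data).
graph = {"q": [("a", 2.0), ("b", 5.0)], "a": [("c", 3.0)], "b": [], "c": []}
objects = {
    ("q", "a"): [("d1", 1.0, 0.5)],
    ("q", "b"): [("d2", 4.0, 1.0)],
    ("a", "c"): [("d3", 0.5, 1.0), ("d4", 2.0, 0.0)],
}
top1 = snapshot_topk(graph, objects, "q", 1)
top2 = snapshot_topk(graph, objects, "q", 2)
```

Because nodes are popped in increasing order of distance, every object that has not yet been examined lies at least as far away as the popped node, which is what justifies stopping as soon as the node's upper bound no longer exceeds $s_k$.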

To understand the proposed algorithm, consider the road network presented in Figure 1. Assume that a query q issues a top-1 keyword query with the keywords “Italian Restaurant”. For ease of presentation, we assume $\alpha = 1$, and the textual relevance $\mu$ is the number of occurrences of query keywords in $d_t$ divided by the number of keywords in the document (the description of the data object). For example, $\psi(d_4) = \mu(d_{4t}, q_t)/(1 + \lambda(d_{4l}, q_l)) = 0.5/8 = 0.06$. The algorithm starts the network expansion from the active edge $\overrightarrow{(n_2, n_3)}$, where q is the anchor point. Note that the direction of the edge $\overrightarrow{(n_2, n_3)}$ is from $n_2$ to $n_3$. Therefore, the algorithm explores only $\overrightarrow{(q, n_3)}$. No data object is found in $\overrightarrow{(q, n_3)}$. Then, $n_3$ becomes the anchor point, and edges $(n_3, n_4)$, $(n_3, n_5)$, and $(n_3, n_7)$ are inserted in the min-heap. Next, the $candsearch$ function retrieves the candidate data objects on edges $(n_3, n_4)$, $(n_3, n_5)$, and $(n_3, n_7)$ whose score is better than $s_k$. On edge $(n_3, n_5)$, data object $d_3$ is retrieved with $\psi(d_3) = 0.2$. Data object $d_3$ is inserted in the $D_k$ set, and the value of $s_k$ is set to 0.2. For edges $(n_3, n_4)$ and $(n_3, n_7)$, no candidate object is found because $d_{2t}$ (“Cafe”) and $d_{7t}$ (“Cafe and Bakery”) do not match $q_t$. The algorithm continues expanding the edges whose upper-bound score is greater than $s_k$. The edge $\overrightarrow{(n_7, n_2)}$ is explored next. The upper-bound score of $\overrightarrow{(n_7, n_2)}$ is 0.17, which is less than $s_k$. Similarly, for edge $\overleftarrow{(n_6, n_5)}$, the upper-bound score is 0.058 < $s_k$. Therefore, the algorithm terminates and reports $d_3$ as the top-1 result.
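The scoring arithmetic in this example can be reproduced with a few lines of Python. The sketch below assumes the score has the form $\psi(d) = \mu/(1 + \alpha\lambda)$, where $\lambda$ is the network distance from q to d, which is consistent with the computation of $\psi(d_4)$ above. The function names and the sample description "Italian Pizza" are hypothetical, and the distance 7 is only implied by the denominator 8 in the example.

```python
def textual_relevance(description, query_keywords):
    """Occurrences of query keywords divided by the number of words in the description."""
    words = description.lower().split()
    hits = sum(1 for w in words if w in query_keywords)
    return hits / len(words)

def psi(mu, network_dist, alpha=1.0):
    """Combined score; higher is better."""
    return mu / (1.0 + alpha * network_dist)

def upper_bound(network_dist, alpha=1.0):
    """Best score any object at or beyond network_dist can reach (mu = 1)."""
    return psi(1.0, network_dist, alpha)

query = {"italian", "restaurant"}
mu_d4 = textual_relevance("Italian Pizza", query)   # one of two words matches
score_d4 = psi(mu_d4, 7.0)                          # 0.5 / 8
can_stop = upper_bound(5.0) <= 0.2                  # 1/6 does not exceed s_k = 0.2
```

The last line mirrors the termination test: once the upper bound at the frontier is no larger than $s_k$, expansion can stop.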


Figure 3: Illustration of directed road network. Query object q issues a TkSK query at $p_1$, and the server returns a set of objects for $p_1$.

Figure 4: Illustration of directed road network. Query object q issues a TkSK query at $p_2$, and the server returns a set of objects for $p_2$.

5. Moving Top-k Spatial Keyword Queries

In this section, we present our method to monitor moving top-k spatial keyword queries where query objects are moving in a directed road network. Figure 3 provides an example of TkSK in road networks where query point q issues a TkSK query at point $p_1$. Note that the numbers on the arrows in the figure indicate the order of the steps. To obtain the top-k results at $p_1$, the server executes Algorithm 1, as mentioned in Section 4.2. Now consider that the query object moves to $p_2$, as shown in Figure 4. To retrieve the top-k results at point $p_2$, the simple method is to repeat the procedure executed at $p_1$. However, recomputing the result whenever query q changes its location significantly increases the computation cost. Furthermore, it also increases the communication overhead because the query object must report its location whenever it moves and the server must send the result set. To address these issues, we introduce the safe exit approach.

In the proposed framework, the server computes safe exit points for a query object. The server maintains a set of moving queries, and the query result remains valid as long as each query object stays within the region bounded by its safe exit points. Whenever a query object passes one of its safe exit points, the server recomputes the TkSK result and the safe exit points for the query object.
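The saving in communication can be illustrated with a small Python simulation. This is a toy model under stated assumptions, not the protocol implementation: the query moves along a one-dimensional road, the safe region is an interval between two safe exit points, and `toy_safe_interval` is a hypothetical stand-in for the server-side computation.

```python
import math

def count_server_requests(path, compute_safe_interval):
    """Count how often a moving query must contact the server.

    The client asks for a new result only when its position falls outside
    the interval bounded by the safe exit points it last received.
    """
    requests = 0
    safe = None
    for pos in path:
        if safe is None or not (safe[0] <= pos <= safe[1]):
            safe = compute_safe_interval(pos)
            requests += 1
    return requests

def toy_safe_interval(pos):
    """Hypothetical server reply: safe exit points every 10 distance units."""
    lo = math.floor(pos / 10) * 10
    return (lo, lo + 10)

path = list(range(50))              # positions reported at 50 timestamps
with_safe_exits = count_server_requests(path, toy_safe_interval)
without_safe_exits = len(path)      # naive approach: one request per timestamp
```

In this toy run the query contacts the server 5 times instead of 50, because it stays silent while it remains between its two safe exit points.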

Next, we present our method to compute the safe exit points for a query object. The safe exit point represents a point in the segment where a safe region and a nonsafe region meet. We compute the safe exit point using the divide-and-conquer technique. Before presenting the detailed methodology, we define the terminology used in this section.

Definition 1 (safe region). A safe region is a portion of a road segment that guarantees that, as long as the query point lies in it, its top-k results remain valid.

Definition 2 (answer objects $D^+$). A data object d is called an answer object of query q if $\psi(d) > \psi(d_a)$, where $d_a$ represents any other data object in the directed road network. We can generalize this definition for TkSK: a data object d is called an answer object of query q if $\psi(d) > \psi(d_{k+1})$, where $d_{k+1}$ represents the (k+1)-th data object in the directed road network. In other words, all answer objects are top-k results of query q.

Definition 3 (nonanswer objects $D^-$). A data object d is called a nonanswer object of query q if $\psi(d) < \psi(d_a)$, where $d_a$ represents any other data object in the directed road network. We can generalize this definition for TkSK: a data object d is called a nonanswer object of query q if $\psi(d) < \psi(d_k)$, where $d_k$ represents the k-th data object in the directed road network. In other words, none of the nonanswer objects are in the top-k results of query q.

Definition 4 (lowest answer object $d^+_l$). An answer object $d^+ \in D^+$ is called the lowest answer object to a point $p \in G$ if $\psi(d^+_l)_p = \min(\psi(d^+_1)_p, \psi(d^+_2)_p, \ldots, \psi(d^+_{|D^+|})_p)$, where $\psi(d^+_l)_p$ represents the score of the lowest answer object at point p. In other words, $\psi(d^+_l)_p < \psi(d^+_a)_p$ at point p, where $d^+_a$ is any other answer object in the $D^+$ set.

Definition 5 (highest nonanswer object $d^-_h$). A nonanswer object $d^- \in D^-$ is called the highest nonanswer object to a point $p \in G$ if $\psi(d^-_h)_p = \max(\psi(d^-_1)_p, \psi(d^-_2)_p, \ldots, \psi(d^-_{|D^-|})_p)$, where $\psi(d^-_h)_p$ represents the score of the highest nonanswer object at point p. In other words, $\psi(d^-_h)_p > \psi(d^-_a)_p$ at point p, where $d^-_a$ is any other nonanswer object in the $D^-$ set.
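Definitions 2 to 5 can be made concrete with a short Python sketch. It assumes the scores of all objects at the current query position are already known; the function name `partition_answers` and the sample scores are hypothetical.

```python
def partition_answers(scores, k):
    """Split scored objects into answer and nonanswer sets at one query position.

    scores: {object id: psi value at the current position}.
    Returns (D_plus, D_minus, lowest answer object, highest nonanswer object).
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    answers, non_answers = ranked[:k], ranked[k:]
    d_plus = {oid for oid, _ in answers}
    d_minus = {oid for oid, _ in non_answers}
    lowest_answer = answers[-1][0] if answers else None
    highest_non_answer = non_answers[0][0] if non_answers else None
    return d_plus, d_minus, lowest_answer, highest_non_answer

scores_at_p = {"d1": 0.25, "d2": 0.20, "d3": 0.29, "d4": 0.06}
d_plus, d_minus, d_l, d_h = partition_answers(scores_at_p, k=2)
```

Only the pair formed by the lowest answer object and the highest nonanswer object matters for monitoring: the result set first changes at the point where their scores cross.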

As discussed earlier, the main challenge in the continuous processing of moving TkSK queries is to maintain the validity of the result set because the movement of query objects can invalidate the result set. To monitor the validity of the result set, we propose a safe-region-based approach.

5.1. Computation of Safe Exit Points. In this section, we present our technique to compute the safe exit points. The main goal is to find a point in the road network where the query result set will change. The result set will change when the score of the highest nonanswer object $d^-_h$ surpasses the score of the lowest answer object $d^+_l$. Generally, the textual relevance score does not change. Therefore, the score of a data object changes only because of the spatial relevance score, which can change only through the movement of query objects. The computation of the safe exit point is based on two key observations.

Observation 1. If $D^+_{n_\beta} = D^+_{p_a}$, there is no safe exit point in the segment.

Explanation. $D^+_{p_a}$ represents the set of answer objects at anchor point $p_a$, whereas $D^+_{n_\beta}$ represents the set of answer objects at boundary node $n_\beta$. As discussed earlier, the safe exit point is the particular point where the query results change. If the query results at the starting node are the same as those at the ending node of a segment or edge, there is no point at which the query result changes. Hence, we do not search for a safe exit point in that segment.

Observation 2. If $D^+_{p_a} \neq D^+_{n_\beta}$, there is a safe exit point in the segment.

Explanation. In contrast to Observation 1, if the query results are different at the starting and ending points, then there exists a point where the query results change. Hence, there is a safe exit point in the segment.

To find the safe region, we observe the following cases.

Case 1 (when $\alpha = 1$ and the textual relevance of the highest nonanswer object and the lowest answer object is the same). In this case, both the textual and spatial relevance have the same importance (i.e., $\alpha = 1$). In addition, the top-k result depends only on the spatial relevance because the textual relevance of both objects is the same. The data object that is closer to query point q becomes the answer object. For an undirected edge, the safe exit point $p_{se}$ is the center point between the lowest answer object and the highest nonanswer object, i.e., $\max(dist(p_{se}, d^+_1), dist(p_{se}, d^+_2), \ldots, dist(p_{se}, d^+_{|d^+|})) = \min(dist(p_{se}, d^-_1), dist(p_{se}, d^-_2), \ldots, dist(p_{se}, d^-_{|d^-|}))$. However, in the case of a directed edge, where $dist(p_a, n_\beta) \neq dist(n_\beta, p_a)$, the safe exit point is either $d^+_l$ or $p_a$. If $d^+_l \in (p_a, n_\beta)$, then the safe exit point is $d^+_l$; otherwise, the safe exit point is $p_a$.

Case 2 (when $\alpha = 1$ and the textual relevance of the highest nonanswer object and the lowest answer object is different). In this case, the top-k result depends on all functions, that is, $\alpha$, the spatial relevance, and the textual relevance. Clearly, for undirected edges, the midpoint between the lowest answer object and the highest nonanswer object does not provide a valid safe exit point. Therefore, we introduce a divide-and-conquer technique, which keeps dividing the search space until it reaches the point where the score of the nonanswer object becomes greater than that of the answer object. Typically, the safe exit point should be closer to the data object whose score is lower. Based on this observation, we first compute the midpoint in a similar fashion to Case 1 and then continue dividing the search space until we find the point. For directed edges, the safe exit point can be computed in a similar fashion to Case 1.

Case 2 also covers the other cases in which the safe exit point is not the midpoint between the lowest answer object and the highest nonanswer object. In these cases, the safe exit point depends on two or more functions, so it can be computed using the aforementioned divide-and-conquer technique. The following scenarios are handled by Case 2:

(a) When $\alpha \neq 1$ and the textual relevance of the nearest nonanswer object and the farthest answer object is different.

(b) When $\alpha \neq 1$ and the textual relevance of the nearest nonanswer object and the farthest answer object is the same.
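The divide-and-conquer search of Case 2 can be sketched as a bisection along an undirected segment. This is an illustrative sketch only, not the authors' implementation: it assumes the score function $\psi = \mu / (1 + \alpha \cdot \lambda)$ used in the skyline example of this section, it assumes that the answer object is reached through the start of the segment and the nonanswer object through its end, and the names `score` and `bisect_safe_exit` as well as the numeric values are hypothetical.

```python
def score(mu, alpha, dist):
    """Ranking score psi = mu / (1 + alpha * dist); higher is better."""
    return mu / (1.0 + alpha * dist)


def bisect_safe_exit(length, alpha, answer, nonanswer, tol=1e-9):
    """Return the offset x in [0, length] at which the nonanswer object's
    score overtakes the answer object's score.

    answer    = (mu, distance from the segment start to the answer object)
    nonanswer = (mu, distance from the segment end to the nonanswer object)
    Assumes the answer object wins at x = 0 and loses at x = length.
    """
    mu_a, da = answer
    mu_n, dn = nonanswer

    def gap(x):
        # Positive while the answer object still outranks the nonanswer object.
        return score(mu_a, alpha, da + x) - score(mu_n, alpha, dn + (length - x))

    lo, hi = 0.0, float(length)
    if gap(lo) <= 0 or gap(hi) >= 0:
        raise ValueError("no score crossing inside the segment")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if gap(mid) > 0:
            lo = mid   # answer still ahead: the crossing lies further along
        else:
            hi = mid   # nonanswer already ahead: the crossing lies before mid
    return (lo + hi) / 2.0


# Equal textual relevance reduces to the midpoint of Case 1.
same_text = bisect_safe_exit(3, 1, (1.0, 3), (1.0, 2))
# A weaker nonanswer object (mu = 0.5) pushes the exit point toward it.
diff_text = bisect_safe_exit(4, 1, (1.0, 1), (0.5, 0))
```

Because the answer score only decreases and the nonanswer score only increases as the offset grows, the gap changes sign exactly once, which is what makes halving the search space safe in this setting.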

Case 3 (when $\alpha = 0$). This means that the spatial relevance has no effect on the score of data objects. Hence, no monitoring is required for this scenario.
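Case 3 follows directly from the score function $\psi(d) = \mu(d.t, q.t) / (1 + \alpha \cdot \lambda(d, p))$ used in the skyline example of this section. Substituting $\alpha = 0$ removes the dependence on the query position p:

```latex
\psi(d) = \frac{\mu(d.t, q.t)}{1 + 0 \cdot \lambda(d, p)} = \mu(d.t, q.t)
\qquad \text{for every point } p .
```

Because the ranking no longer varies with p, the top-k result cannot change as q moves.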

Algorithm 3 retrieves the safe exit points using the observations discussed earlier. The core function in this algorithm is ComputeSafeExit($p_a$, $n_\beta$), which finds the safe exit point in a segment between $p_a$ and $n_\beta$. ComputeSafeExit($p_a$, $n_\beta$) is described in detail in Algorithm 4. First, Algorithm 4 determines $d^+_l$ and $d^-_h$ at each point $p \in [p_a, n_\beta]$. Recall that $d^+_l$ is the lowest answer object to p, whereas $d^-_h$ is the highest nonanswer object to p. Algorithm 4 then computes the safe exit point based on the cases discussed earlier. There are two further scenarios for each of Cases 1 and 2. For Case 1, if $dist(p_a, n_\beta) = dist(n_\beta, p_a)$, then the safe exit point is the midpoint between $d^+_l$ and $d^-_h$. If $dist(p_a, n_\beta) \neq dist(n_\beta, p_a)$, then the edge is directed and therefore the safe exit point is either $p_a$ or $d^+_l$. If $d^+_l$ lies on the edge $[p_a, n_\beta]$, then $d^+_l$ is the safe exit point. Otherwise, $p_a$ is the safe exit point.

Similarly, for Case 2, if $dist(p_a, n_\beta) = dist(n_\beta, p_a)$, then the safe exit point is computed by dividing the search space in half until we find the closest point such that $\psi(d^-_h) > \psi(d^+_l)$. If $dist(p_a, n_\beta) \neq dist(n_\beta, p_a)$, the safe exit point is computed in the same way as for a directed edge in Case 1.

5.2. Computation of Safe Exit Points for Example. Consider the same example in Figure 1, where the query point q issues a top-1 keyword query with $q.t$ = "Italian restaurant". For this example, let us consider $\alpha = 1$. The monitoring algorithm starts exploring from the active edge containing the query object q. Therefore, $\overrightarrow{(q, n_3)}$ is explored first. As shown in Table 3, for $\overrightarrow{(q, n_3)}$, $D^+_q = d_3$ and $D^+_{n_3} = d_3$. According to Observation 1, no safe exit point exists in this segment. Therefore, the edges adjacent to $n_3$ are explored and $n_3$ becomes the new $p_a$. The edge $(n_3, n_4)$ is explored next. Similarly, the answer object at $n_3$ and $n_4$ is the same, $D^+_{n_3} = D^+_{n_4} = d_3$. Therefore, a safe exit point does not exist in $(n_3, n_4)$. The edge $(n_3, n_7)$ is explored next. As shown in Table 3, $D^+_{n_3} = d_3$ and $D^+_{n_7} = d_6$. By Observation 2, there is a safe exit point in $(n_3, n_7)$. As shown in Figure 1, $d_3.t = d_6.t$ = "Italian Restaurant" and $dist(n_3, n_7) = dist(n_7, n_3)$.


(1) Input: same as Algorithm 1
(2) Output: $P_{SE}$, a set of safe exit points
(3) $P_{SE} \leftarrow \emptyset$ /* set of safe exit points */
(4) $D^+_{p_a} \leftarrow$ EvaluateSnapshotQuery($p_a$, $(p_a, n_\beta)$)
(5) /* results calculated using Algorithm 1 */
(6) $D^+_{n_\beta} \leftarrow$ EvaluateSnapshotQuery($n_\beta$, $(p_a, n_\beta)$)
(7) /* results calculated using Algorithm 1 */
(8) if $D^+_{p_a} = D^+_{n_\beta}$ then
(9)     no safe exit point /* refer to Observation 1 */
(10) end
(11) if $D^+_{p_a} \neq D^+_{n_\beta}$ then
(12)     $P_{SE} \leftarrow P_{SE} \cup$ ComputeSafeExit($p_a$, $n_\beta$) /* safe exit point exists; refer to Observation 2 */
(13) end
(14) return $P_{SE}$

Algorithm 3: COSK monitoring algorithm.
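The control flow of Algorithm 3 for a single segment can be sketched as follows. This is a minimal illustration, not the authors' Java code: the two callbacks are hypothetical stand-ins for Algorithms 1 and 4, and the demo values are the answer sets listed for $q$, $n_3$, and $n_7$ in Table 3.

```python
def monitor_segment(p_a, n_b, evaluate_snapshot_query, compute_safe_exit):
    """Sketch of Algorithm 3 for one segment (p_a, n_b).

    evaluate_snapshot_query(point, segment) -> iterable of answer-object ids
    compute_safe_exit(p_a, n_b)             -> safe exit point on the segment
    Both callbacks are hypothetical placeholders for Algorithms 1 and 4.
    """
    safe_exits = set()
    answers_at_pa = frozenset(evaluate_snapshot_query(p_a, (p_a, n_b)))
    answers_at_nb = frozenset(evaluate_snapshot_query(n_b, (p_a, n_b)))
    if answers_at_pa != answers_at_nb:
        # Observation 2: the answer set changes somewhere inside the segment.
        safe_exits.add(compute_safe_exit(p_a, n_b))
    # Observation 1: identical answer sets mean no safe exit point here.
    return safe_exits


# Top-1 answers at the segment endpoints, taken from Table 3.
top1 = {"q": {"d3"}, "n3": {"d3"}, "n7": {"d6"}}


def snapshot(point, segment):
    return top1[point]


def label_exit(p_a, n_b):
    # Placeholder: a real implementation would return a location on the edge.
    return "pse(%s,%s)" % (p_a, n_b)


no_exit = monitor_segment("q", "n3", snapshot, label_exit)
one_exit = monitor_segment("n3", "n7", snapshot, label_exit)
```

Comparing the two endpoint answer sets as sets (rather than as ranked lists) is a design assumption of this sketch; a ranking-sensitive variant would compare ordered tuples instead.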

(1) Input: same as Algorithm 1
(2) Output: $p_{se}$, the safe exit point in $(p_a, n_\beta)$
(3) $D^+_l \leftarrow \langle p, d^+_l \rangle$ for each point $p \in [p_a, n_\beta]$, with $d^+_l$ such that $\psi(d^+_l)_p = \min(\psi(d^+_1)_p, \psi(d^+_2)_p, \ldots, \psi(d^+_{|d^+|})_p)$
(4) $D^-_h \leftarrow \langle p, d^-_h \rangle$ for each point $p \in [p_a, n_\beta]$, with $d^-_h$ such that $\psi(d^-_h)_p = \max(\psi(d^-_1)_p, \psi(d^-_2)_p, \ldots, \psi(d^-_{|d^-|})_p)$
(5) if Case 1 then
(6)     if $dist(p_a, n_\beta) = dist(n_\beta, p_a)$ then
(7)         $p_{se} \leftarrow$ the point $se$ such that $\max(dist(se, d^+_1), dist(se, d^+_2), \ldots, dist(se, d^+_{|d^+|})) = \min(dist(se, d^-_1), dist(se, d^-_2), \ldots, dist(se, d^-_{|d^-|}))$
(8)     end
(9)     if $dist(p_a, n_\beta) \neq dist(n_\beta, p_a)$ then
(10)        $p_{se} = p_a$ or $p_{se} = d^+_l$, where $d^+_l \in (p_a, n_\beta)$
(11)    end
(12) end
(13) if Case 2 then
(14)    if $dist(p_a, n_\beta) = dist(n_\beta, p_a)$ then
(15)        $p_{se} \leftarrow$ closest point to $p_a$ such that $\psi(d^-_h) > \psi(d^+_l)$
(16)    end
(17)    if $dist(p_a, n_\beta) \neq dist(n_\beta, p_a)$ then
(18)        same as Line (10)
(19)    end
(20) end
(21) return $p_{se}$

Algorithm 4: ComputeSafeExit($p_a$, $n_\beta$).
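The case selection and the Case 1 branch of Algorithm 4 can be sketched as below. The function names, the offset-based representation of a segment, and the convention that an offset of 0.0 denotes $p_a$ are assumptions made for illustration; only the decision rules themselves come from the text.

```python
def classify_case(alpha, mu_answer, mu_nonanswer):
    """Map a query setting to Case 1, 2, or 3 as described in the text."""
    if alpha == 0:
        return 3   # spatial relevance has no effect: no monitoring needed
    if alpha == 1 and mu_answer == mu_nonanswer:
        return 1   # the result depends on distance only
    return 2       # every other combination uses divide and conquer


def case1_safe_exit(length, dist_answer_from_start, dist_nonanswer_from_end,
                    directed, answer_offset_on_segment=None):
    """Return the Case 1 safe exit point as an offset from p_a.

    Undirected edge: the point equidistant from the lowest answer object
    (reached via p_a) and the highest nonanswer object (reached via n_beta).
    Directed edge: the answer object itself if it lies on the edge, else p_a.
    """
    if not directed:
        return (length + dist_nonanswer_from_end - dist_answer_from_start) / 2.0
    if answer_offset_on_segment is not None:
        return answer_offset_on_segment
    return 0.0     # offset 0.0 stands for p_a itself


# Undirected (n3, n7): d3 is 3 from n3, d6 is 2 from n7, and the edge has length 3.
undirected_exit = case1_safe_exit(3, 3, 2, directed=False)
# Directed edge whose answer object is not on the edge: the exit point is p_a.
directed_exit = case1_safe_exit(1, 0, 0, directed=True)
```

Comparing textual relevance with `==` is adequate for a sketch; with floating-point relevance scores a tolerance would likely be needed.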

Therefore, according to Case 1, the safe exit point $p_{se1}$ is the midpoint between $d_3$ and $d_6$. That is, $dist(p_{se1}, d_3) = dist(p_{se1}, d_6)$, where $dist(p_{se1}, d_3) = x + 3$ and $dist(p_{se1}, d_6) = -x + 5$ for $0 < x < 3$. Consequently, $x = 1$, which means that the distance from $n_3$ to $p_{se1}$ is 1.
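Written out, the midpoint condition for this segment is a single linear equation in the offset x measured from $n_3$:

```latex
\begin{aligned}
dist(p_{se1}, d_3) &= dist(p_{se1}, d_6) \\
x + 3 &= -x + 5 \\
2x &= 2 \\
x &= 1, \qquad 0 < x < 3 .
\end{aligned}
```

Both objects are then at network distance 4 from $p_{se1}$.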

Next, we determine a safe exit point in $(n_3, n_5)$. As shown in Table 3, the answer object at $n_5$ is the same as that at $n_3$. Hence, no safe exit point exists in this edge. Next, $\overleftarrow{(n_6, n_5)}$ is explored with $p_a = n_5$. According to Table 3, $D^+_{n_6} = d_4$ and $D^+_{n_5} = d_3$. Therefore, a safe exit point exists in this edge. This edge is directed, and for each point $p \in \overleftarrow{(n_6, n_5)}$, the shortest path from p to $d_3$ is $p \rightarrow n_6 \rightarrow n_2 \rightarrow n_3 \rightarrow d_3$. Therefore, $n_5$ is the safe exit point.

The bold lines in Figure 5 indicate the safe region of q. The top-1 result remains $d_3$ as long as the query q lies in the safe region.

Table 3: Computation of safe exit points for the example scenario.

| Edge/Segment | $p_a$ | $D^+_{p_a}$ | $D^+_{n_\beta}$ | $p_{se}$ |
| --- | --- | --- | --- | --- |
| $\overrightarrow{(q, n_3)}$ | $q$ | $D^+_q = d_3$ | $D^+_{n_3} = d_3$ | none |
| $(n_3, n_4)$ | $n_3$ | $D^+_{n_3} = d_3$ | $D^+_{n_4} = d_3$ | none |
| $(n_3, n_7)$ | $n_3$ | $D^+_{n_3} = d_3$ | $D^+_{n_7} = d_6$ | $p_{se1}$ |
| $\overrightarrow{(n_3, n_5)}$ | $n_3$ | $D^+_{n_3} = d_3$ | $D^+_{n_5} = d_3$ | none |
| $\overleftarrow{(n_6, n_5)}$ | $n_5$ | $D^+_{n_5} = d_3$ | $D^+_{n_6} = d_4$ | $p_{se2}$ |

Figure 5: Illustration of the safe region of q. The figure shows the road network with nodes $n_1$ to $n_7$, the query point q, the data objects $d_1$ (Grand Hotel), $d_2$ (Cafe), $d_3$ (Italian Restaurant), $d_4$ (Chinese Restaurant), $d_5$ (Pub and Bar), $d_6$ (Italian Restaurant), and $d_7$ (Cafe and Bakery), and the safe exit points $p_{se1}$ and $p_{se2}$.

Next, we analyze the time complexity of determining a set of safe exit points using a set of qualifying objects $d \in D^+_{p_a} \cup D^+_{n_\beta} \cup D(p_a, n_\beta)$. Note that $D^+_{p_a}$ ($D^+_{n_\beta}$) indicates the set of k data objects that satisfies the query condition at $p_a$ ($n_\beta$). According to Dijkstra's algorithm [26], the time complexity $O(D^+_q)$ of computing a set of answer objects at a query point q is $O(D^+_q) = O(|E| + |N| \log |N|)$. This means that $O(D^+_{p_a}) = O(D^+_{n_\beta}) = O(|E| + |N| \log |N|)$ holds for the endpoints $p_a$ and $n_\beta$. Thus, the time complexity $O(\Omega_{kth})$ of determining the skyline $\Omega_{kth}$ with the k-th highest score is $O(\Omega_{kth}) = C_{kth} \cdot O(|D^+_{p_a} \cup D^+_{n_\beta} \cup D(p_a, n_\beta)|)$, where $C_{kth}$ is the number of qualifying objects that participate in the constitution of the skyline with the k-th highest score. Therefore, the time complexity of determining a safe exit point coincides with the time complexity of determining the two skylines, i.e., the skyline $D^+_l$ with the k-th highest (or lowest) score for answer objects and the skyline $D^-_h$ with the highest score for nonanswer objects. This is because the safe exit point is found at the cross point between these skylines.
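The cross point between the two skylines can be illustrated with a simple scan along a segment. This is a sampled approximation written for illustration, not the exact procedure analyzed above. The score curves reuse the distances of the midpoint example on $(n_3, n_7)$, namely $x + 3$ to $d_3$ and $-x + 5$ to $d_6$, and assume the score $\psi = \mu / (1 + \alpha \cdot \lambda)$ with $\mu = 1$ and $\alpha = 1$. The function name and the step count are hypothetical.

```python
def skyline_cross_point(length, answer_scores, nonanswer_scores, steps=3000):
    """Scan a segment of the given length and return the first sampled offset
    at which the nonanswer skyline rises above the answer skyline, or None.

    answer_scores / nonanswer_scores are lists of functions offset -> score.
    """
    for i in range(steps + 1):
        x = length * i / steps
        lowest_answer = min(f(x) for f in answer_scores)           # skyline D+_l
        highest_nonanswer = max(f(x) for f in nonanswer_scores)    # skyline D-_h
        if highest_nonanswer > lowest_answer:
            return x
    return None


def psi_d3(x):
    # Answer object d3: distance x + 3 from the point at offset x from n3.
    return 1.0 / (1.0 + (x + 3.0))


def psi_d6(x):
    # Nonanswer object d6: distance -x + 5 from the same point.
    return 1.0 / (1.0 + (5.0 - x))


cross = skyline_cross_point(3.0, [psi_d3], [psi_d6])
never = skyline_cross_point(3.0, [lambda x: 1.0], [lambda x: 0.5])
```

Each sampled point evaluates every qualifying object once, which mirrors why the cost of locating the cross point grows with the number of objects that contribute to the two skylines.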

Figure 6 represents the skyline graph for $k = 1$ on the edge $(n_7, n_3)$. Let us draw the score functions of $d_3$ and $d_6$ for the road segment $(n_7, n_3)$, where a safe exit point exists because $D^+_{n_3} = d_3$ and $D^+_{n_7} = d_6$ for $k = 1$. For each point $p \in (n_7, n_3)$, the distance between $d_3$ and point p can be represented as $dist(d_3, p) = dist(d_3, n_3) + len(n_3, p) = 6 - len(n_7, p)$. Similarly, for each point $p \in (n_7, n_3)$, the distance between $d_6$ and point p can be represented as $dist(d_6, p) = dist(d_6, n_7) + len(n_7, p) = 2 + len(n_7, p)$. Let $len(n_7, p)$ be a variable x ($0 \le x \le 3$). We can write $\lambda(d_3, p) = dist(d_3, p) = 6 - x$ and $\lambda(d_6, p) = dist(d_6, p) = 2 + x$. Then, we can represent the score functions $\psi(d_3)$ and $\psi(d_6)$ as follows:

$\psi(d_3) = \mu(d_3.t, q.t) / (1 + \alpha \cdot \lambda(d_3, p)) = 1 / (7 - x)$ for $0 \le x \le 3$,

$\psi(d_6) = \mu(d_6.t, q.t) / (1 + \alpha \cdot \lambda(d_6, p)) = 1 / (3 + x)$ for $0 \le x \le 3$.

Figure 6: Skyline graph for $k = 1$ on the road segment $(n_7, n_3)$. The plot shows the score (vertical axis) against the distance x from $n_7$ (horizontal axis) for $\psi(d_6) = 1 / (x + 3)$ and $\psi(d_3) = 1 / (-x + 7)$, with $p_{se1}$ marked between $n_7$ and $n_3$.

Finally, we present a lemma to prove that the safe exit points computed by COSK are correct.

Lemma 8. The COSK algorithm correctly computes a set of safe exit points.

Proof. We prove the correctness of the COSK algorithm by contradiction. Assume that, although $D^+_{p_a} \neq D^+_{n_\beta}$, there is no safe exit point in a road segment $(p_a, n_\beta)$. This means that for each point p in the road segment $(p_a, n_\beta)$, the query result at p equals $D^+_{p_a}$, i.e., $D^+_p = D^+_{p_a}$ $\forall p \in (p_a, n_\beta)$. However, this leads to the contradiction that $D^+_{n_\beta} = D^+_{p_a}$ when $p = n_\beta$. Therefore, if $D^+_{p_a} \neq D^+_{n_\beta}$, a safe exit point exists in $(p_a, n_\beta)$. In addition, when $D^+_{p_a} \neq D^+_{n_\beta}$, a safe exit point is determined using the skyline $D^+_l$ for answer objects and the skyline $D^-_h$ with the highest score for nonanswer objects. The first skyline is a composite polyline drawn from the answer objects in $D^+_{p_a}$. The second skyline is a composite polyline drawn from the nonanswer objects in $D^+_{n_\beta} \cup D(p_a, n_\beta) - D^+_{p_a}$.

6. Monitoring Query Results and Safe Regions in Dynamic Directed Road Networks

In this section, we discuss the monitoring of spatial keyword queries in dynamic road networks, where the network distance changes depending on the traffic conditions. Updates to the weights of some edges may invalidate the query results or the safe region of q even though the query object q remains within its safe region. Figure 7 illustrates an example in which the weights of edges $\overleftarrow{(n_1, n_2)}$ and $\overleftarrow{(n_1, n_6)}$ change. For convenience, we consider $\alpha = 1$ and $q.t$ = "Italian restaurant". In Figure 7(a), the top-1 result is $d_1$ and the bold lines show the safe region of query q. Now consider that, at time $t_j$, the weights of the two edges $\overleftarrow{(n_1, n_2)}$ and $\overleftarrow{(n_1, n_6)}$ change owing to heavy traffic, as shown in Figure 7(b). Such an update may invalidate the query result or the safe region of q. Therefore, it is necessary to monitor the validity of the results and the safe region when such changes occur.

Next, we introduce a monitoring region to monitor the validity of the safe region effectively when the weight of an edge changes. The monitoring region MR contains all the points between the query point q and the lowest answer object and between q and the highest nonanswer object. Formally, it is defined as $MR = dist(q, D^+_l) \cup dist(q, D^-_h)$, where $dist(q, D^+_l)$ is the distance between q and the lowest answer object and $dist(q, D^-_h)$ is the distance between q and the highest nonanswer object. In the given example, $D^+_l = d_1$ and $D^-_h = d_2, d_3$. The dotted lines in Figure 8(a) therefore show the monitoring region of query object q.
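One way to materialize such a monitoring region is to collect the directed edges that lie on shortest paths from q to the relevant objects. The sketch below assumes that interpretation (the text defines MR only through distances), uses a plain Dijkstra search over an adjacency dictionary, and runs on a small hypothetical graph rather than the network of Figure 8.

```python
import heapq


def shortest_path_edges(graph, source, targets):
    """Return the set of directed edges on one shortest path from source
    to each target.

    graph maps a node to a dict {neighbor: weight} of its outgoing edges.
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    edges = set()
    for target in targets:
        node = target
        while node != source and node in prev:
            edges.add((prev[node], node))  # orientation matters on directed edges
            node = prev[node]
    return edges


# Hypothetical directed network: q reaches d1 via a, and d2 via a and b.
toy_graph = {
    "q": {"a": 1.0, "b": 4.0},
    "a": {"d1": 2.0, "b": 1.0},
    "b": {"d2": 1.0},
    "d1": {},
    "d2": {},
}
monitoring_region = shortest_path_edges(toy_graph, "q", ["d1", "d2"])
```

Here the edge from q to b is not part of the region because the cheaper route to d2 runs through a, so a membership test such as `("q", "b") in monitoring_region` returns False.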

Now, at time $t_j$, the updates to edges $\overleftarrow{(n_1, n_6)}$ and $\overleftarrow{(n_1, d_1)}$, which are not part of the monitoring region, can safely be ignored. However, the update on segment $\overrightarrow{(n_2, d_1)}$, which is associated with the monitoring region, may nullify the results. As shown in Figure 8(b), after the update, the top-1 result becomes $d_2$ and the bold lines represent the new safe region of q.

Algorithm 5 monitors the validity of the result set and the safe region of query object q when the weight of any edge changes. Suppose that the weight of edge $(n_i, n_j)$ changes at time $t_j$. First, the algorithm checks whether edge $(n_i, n_j)$ is associated with the monitoring region. If it is not part of the monitoring region, then the algorithm simply ignores the update to edge $(n_i, n_j)$, and the query results and safe region remain valid. In contrast, if the edge is associated with the monitoring region (i.e., $MR \cap (n_i, n_j) \neq \emptyset$), then the algorithm reevaluates the query results. Consequently, the top-k results and the safe region of query q need to be updated. Finally, the algorithm updates the monitoring region of q.
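The update handling described above can be sketched as a small event handler. It is a schematic illustration only: the dictionary-based state and the three callbacks are hypothetical placeholders for the snapshot query, the safe exit computation, and the monitoring region computation, and edges are identified by oriented node pairs.

```python
def on_edge_update(edge, state, evaluate_snapshot_query,
                   compute_safe_exits, compute_monitoring_region):
    """Sketch of the control flow of Algorithm 5 for one edge-weight update.

    state is a dict with keys "mr" (set of edges), "results", and "safe_exits".
    Returns True if the state was refreshed and False if the update was ignored.
    """
    if edge not in state["mr"]:
        return False  # the update lies outside the monitoring region
    state["results"] = evaluate_snapshot_query()
    state["safe_exits"] = compute_safe_exits(state["results"])
    state["mr"] = compute_monitoring_region(state["results"])
    return True


state = {"mr": {("n2", "d1")}, "results": ["d1"], "safe_exits": ["pse1"]}

# An update outside the monitoring region leaves everything untouched.
ignored = on_edge_update(("n1", "n6"), state,
                         lambda: ["d2"], lambda r: ["pse_new"],
                         lambda r: {("q", "d2")})
results_after_ignored = list(state["results"])

# An update inside the monitoring region triggers a full refresh.
refreshed = on_edge_update(("n2", "d1"), state,
                           lambda: ["d2"], lambda r: ["pse_new"],
                           lambda r: {("q", "d2")})
```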

7. Performance Evaluation

In this section, we evaluate the performance of COSK through simulation experiments. We describe our experimental settings in Section 7.1, and we present our experimental results for static and dynamic road networks in Sections 7.2 and 7.3, respectively.

7.1. Experimental Settings. All of our experiments were performed using real road networks, namely, Oldenburg, San Francisco, and San Joaquin. All three road networks were obtained from [27]. The original road network of San Francisco had 21047 nodes and 21692 edges. We reformatted the network, pruned approximately 30% of the nodes, and adjusted the edges and their weights accordingly. This resulted in a network with 14732 nodes and 14316 edges. Both the direction of the edges and the data objects on the edges were generated randomly. The description of each data object was extracted from Twitter messages [28], and we assigned one tweet per data object. Table 4 presents the characteristics of the datasets used in the experimental evaluation. We simulated moving query objects by using a spatiotemporal data generator [29]. The input to the generator was the road network of the dataset used, and the output was the set of query objects moving on the road network. Each experiment had 100 moving queries, which were continuously monitored for 100 timestamps (1 timestamp = 1 second), and the average result was reported in the experiments.

As a benchmark for COSK in static road networks, we implemented a CMTkSK+ algorithm [22], which also continuously monitors moving top-k spatial keyword queries in road networks. However, this algorithm was originally designed for undirected road networks. To make a fair comparison, we modified CMTkSK+ to process top-k spatial keyword queries in directed road networks and called it CMTkSK+. Specifically, we modified the distance computation method between two points such that, in directed road networks, $dist(p_1, p_2) \neq dist(p_2, p_1)$. Since CMTkSK+ does not handle top-k spatial keyword queries in dynamic road networks, we compared the performance of COSK in that setting with a basic algorithm that recomputes the results whenever the query object changes its location. All algorithms were implemented in Java and were executed on a desktop PC with a 2.80-GHz Intel Core i5 and 8 GB of memory. In the experiments, we compared (1) query processing times, (2) edges processed, i.e., the number of edges processed for retrieving query results, and (3) index sizes. Table 5 summarizes the parameters used in the experiments. In each experiment, we varied a single parameter within the range shown in Table 5 while maintaining the other parameters at the bolded default values.

Figure 7: Updating the weight of edges in a dynamic road network, where $t_i < t_j$. (a) Safe region at time $t_i$; (b) updating the weights of $\overleftarrow{(n_1, n_2)}$ and $\overleftarrow{(n_1, n_6)}$ at time $t_j$. The network contains nodes $n_1$ to $n_6$, the query point q, and the data objects $d_1$ (Italian Restaurant), $d_2$ (Italian Restaurant), and $d_3$ (Chinese Restaurant); panel (a) marks the safe exit points $p_{se1}$, $p_{se2}$, and $p_{se3}$.

Figure 8: Monitoring region and updated safe region at time $t_j$. (a) Monitoring region at time $t_i$; (b) new safe region at time $t_j$, with the safe exit points $p_{se1}$, $p_{se2}$, and $p_{se3}$.

(1) Input: monitoring region MR, updated edge $(n_i, n_j)$
(2) Output: none
(3) if $MR \cap (n_i, n_j) = \emptyset$ then
(4)     /* edge $(n_i, n_j)$ is not part of the monitoring region */
(5)     ignore the change in the weight of edge $(n_i, n_j)$
(6) else
(7)     $P_{SE} \leftarrow \emptyset$ /* set of safe exit points */
(8)     $D^k_{upd} \leftarrow$ EvaluateSnapshotQuery($n_i$, $e_i$) /* update set of top-k results */
(9)     $P_{SE_{upd}} \leftarrow$ ComputeSafeExit($p_a$, $n_\beta$) /* update safe exit points */
(10)    $MR_{upd} \leftarrow$ ComputeMonitoringRegion($D^+_l$, $D^-_h$) /* update monitoring region */
(11) end

Algorithm 5: MonitoringSafeRegion(MR, $(n_i, n_j)$).

Table 4: Summary of datasets.

| Attribute | Oldenburg | San Francisco | San Joaquin |
| --- | --- | --- | --- |
| Total no. of nodes | 6104 | 14732 | 18262 |
| Total no. of edges | 7034 | 14316 | 23876 |
| Percentage of directed edges | 30 | 30 | 30 |
| Total no. of objects | 5627 | 11453 | 19098 |
| Average no. of objects per edge | 0.8 | 0.8 | 0.8 |
| Total no. of words | 49517 | 103649 | 166153 |

Table 5: Experimental parameter settings.

| Parameter | Range |
| --- | --- |
| Number of results (k) | 5, 10, 15, 20, 25 |
| Number of keywords (n) | 1, 2, 3, 4, 5 |
| Query parameter ($\alpha$) | 0.01, 0.1, 1, 10, 100 |
| Dataset | Oldenburg, San Francisco, San Joaquin |
| Number of data objects ($N_D$) | 10, 20, 30, 40, 50 (×1000) |
| Speed of query objects ($V_{qry}$) | 25, 50, 75, 100, 125 (km/h) |
| Mobility ($M_{qry}$) (%) | 20, 40, 60, 80, 100 |
| Ratio of directed edges ($E_{dir}$) (%) | 10, 20, 30, 40, 50 |
| Ratio of updated edges ($E_{upd}$) (%) | 15, 30, 60, 80, 100 |

We evaluated the performance of the algorithms by using the following measures: (1) the total amount of server CPU time, which indicates the query processing time, and (2) the total communication cost, measured as the total number of points (i.e., the location updates sent by query objects and the query results and safe exit points returned by the server) transferred between the clients and the server. The battery power and wireless bandwidth consumption typically increase with the amount of data transferred between objects (clients) and servers. Thus, we used the amount of transferred data as a metric to evaluate the communication cost.

7.2. Experimental Results of Top-k Spatial Keyword Queries in Static Road Networks

7.2.1. Effect of k. Figure 9 indicates the effect of the number of results on the query processing time and communication cost for both algorithms. Figure 9(a) indicates that the query processing time increases for both algorithms as the value of k increases. This is expected because, with an increase in k, more data objects need to be explored and verified. Nevertheless, COSK significantly outperforms CMTkSK+ for two main reasons. First, a relevant object search is very efficient when using the highest significant factor, and second, COSK does not need to verify the set of answer objects as long as the query object lies in a safe region. On the other hand, the CMTkSK+ query processing time increases significantly because it has to monitor and verify the set of candidate objects periodically. In Figure 9(b), the communication costs for both algorithms increase as the number of result objects (k) increases. However, the proposed algorithm demonstrates superior performance compared to CMTkSK+ because client-server communication is not required while the query object lies within the safe region, whereas in CMTkSK+ the query object is required to report its location to the server whenever it moves.

7.2.2. Effect of $N_D$. This experiment was conducted on the San Joaquin dataset. This dataset included 19098 data objects; therefore, we randomly generated approximately 30000 additional data objects on different edges. In Figure 10, we evaluate the performance of COSK and CMTkSK+ by varying the cardinality of the data objects. Note that $N_D = 10K$ corresponds to a low density of data points, while $N_D = 50K$ corresponds to a high density. In Figure 10(a), it is interesting to notice that the query processing times of both algorithms decrease as the cardinality of the data objects increases. For CMTkSK+, this is because, with high density, the monitoring range of a query decreases. For COSK, however, it is mainly because, when the data density is high, fewer edges need to be expanded, which decreases the query processing time. In Figure 10(b), we study the influence of the cardinality of the data objects on the communication costs. The experimental results indicate that CMTkSK+ incurs almost constant communication costs regardless of the data object cardinality. However, the communication costs of COSK increase in proportion to the $N_D$ value. This is expected because the safe region becomes smaller as the density of the data objects increases, which increases the communication costs.

7.2.3. Effect of Query Keywords (n). Figure 11 shows the query processing time and communication cost for COSK and CMTkSK+ as a function of the number of query keywords. Figures 11(a) and 11(b) show that the performance of both algorithms degrades when the number of keywords increases. This is mainly because, by increasing the number of query keywords, the number of relevant objects may also increase, resulting in a higher query processing time and communication cost. However, the safe-region-based algorithm COSK scales better than CMTkSK+ because of its less expensive monitoring technique.

7.2.4. Effect of $\alpha$. Figure 12 demonstrates the impact of the query parameter $\alpha$ on the query processing time and on the communication cost. A small value of $\alpha$ gives greater importance to the textual relevance, whereas a high value of $\alpha$ gives more preference to the spatial relevance. It is interesting to note that the query processing time is lower for higher values of $\alpha$, which give more importance to the spatial relevance. This is mainly because, when the spatial relevance is weighted more heavily, fewer edges and objects need to be explored and processed to determine the top-k data objects. Observe that, in Figure 12(b), the number of messages sent by COSK decreases sharply with an increase in $\alpha$.

7.2.5. Effect of Speed. Figure 13(a) demonstrates the influence of the speed of the query objects on the query processing time of the COSK and CMTkSK+ algorithms. The experimental results indicate that the performance of CMTkSK+ is not significantly influenced by the speed of the query objects because the candidate objects must be monitored at regular intervals regardless of the speed. On the other hand, the performance of COSK gradually degrades as the speed of the query objects increases because the objects leave their respective safe regions more frequently. Figure 13(b) shows the communication costs of COSK and CMTkSK+ with respect to the speed of the query objects. CMTkSK+ incurs almost constant communication costs because a server-initiated request to verify the candidate objects does not depend on the speed. For COSK, the query objects cross safe regions more frequently when the speed is high, which increases the communication costs.

Figure 9: Effect of k on query processing time and communication cost. (a) Query processing time (s) versus k; (b) communication cost (number of messages transferred) versus k. Both panels compare COSK and CMTkSK+.

Figure 10: Effect of $N_D$ on query processing time and communication cost. (a) Query processing time (s) versus $N_D$; (b) communication cost (number of messages transferred) versus $N_D$. Both panels compare COSK and CMTkSK+.

Figure 11: Effect of number of keywords on query processing time and communication cost. (a) Query processing time (s) versus the number of keywords; (b) communication cost (number of messages transferred) versus the number of keywords. Both panels compare COSK and CMTkSK+.

Figure 12: Effect of $\alpha$ on query processing time and communication cost. (a) Query processing time (s) versus $\alpha$; (b) communication cost (number of messages transferred) versus $\alpha$. Both panels compare COSK and CMTkSK+.

7.2.6. Effect of Mobility. Figure 14 shows the effect of mobility $M_{qry}$ (mobility refers to the percentage of query objects that are moving at any timestamp) on the performance of the COSK and CMTkSK+ algorithms. As expected, the query processing time and communication costs for both algorithms increase with $M_{qry}$. Nevertheless, COSK performs better than CMTkSK+ in terms of query processing time and communication costs.

7.2.7. Effect of Directed Edges. Figure 15 shows the impact of the percentage of directed edges $E_{dir}$ on the performance of the COSK and CMTkSK+ algorithms. The query processing time increases with $E_{dir}$ because the algorithms need to explore more edges to answer the top-k keyword queries. However, the communication cost is not significantly affected by the value of $E_{dir}$ for either algorithm.

7.2.8. Effect of Datasets. Figure 16 demonstrates the index sizes of the COSK and CMTkSK+ approaches for different datasets. As shown in Figure 16, both algorithms have similar index sizes. However, COSK has a minor space overhead because it stores additional information on the highest significance factor $\theta_t$ of edges. More importantly, this space overhead is minimal compared to the gain achieved by COSK in query processing time and communication costs.

16 Wireless Communications and Mobile Computing

25 50 75 100 125

COSKCMTkSK+

0

10

20

30

Que

ry p

roce

ssin

g tim

e (s)

40

Vqry

(a) Query processing time

COSKCMTkSK+

100

1k

10k

100k

of

mes

sage

s tra

nsfe

rred

1M

25 50 75 100 125Vqry

(b) Communication cost

Figure 13 Effect of speed on query processing time and communication cost

20 40 60 80 100Mqry

COSKCMTkSK+

0

15

45

30

60

Que

ry p

roce

ssin

g tim

e (s)

(a) Query processing time

100

10k

100k

of

mes

sage

s tra

nsfe

rred

1M

20 40 60 80 100Mqry

1k

COSKCMTkSK+

(b) Communication cost

Figure 14 Effect of mobility on query processing time and communication cost

7.3. Experimental Results of Top-k Spatial Keyword Queries in Dynamic Road Networks. In this section, we evaluate the performance of COSK and the basic algorithm for dynamic road networks. $E_{upd}$ indicates the percentage of all edges that change their weight at each timestamp. The length of an updated edge is randomly selected between 0.1 and 10 times the original length. Figure 17(a) depicts the query processing time of COSK and the basic algorithm. It is evident from the figure that the query processing time of the basic algorithm is not significantly affected by $E_{upd}$. This is mainly because the query objects issue top-k spatial queries at each timestamp. However, the query processing time of COSK increases with the value of $E_{upd}$ because the probability that an updated edge is associated with the monitoring region of query q increases with $E_{upd}$. Therefore, when $E_{upd}$ becomes large, the results need to be updated frequently, which increases the query processing time. Figure 17(b) shows the communication costs of COSK and the basic algorithm with respect to $E_{upd}$. The basic algorithm incurs almost constant communication costs regardless of the value of $E_{upd}$. In contrast, the communication cost of COSK increases with $E_{upd}$ because the query result and safe regions need to be updated frequently.
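The edge update workload described above can be reproduced in spirit with a small generator. This sketch is an assumption-laden illustration: the text states only that the new length lies between 0.1 and 10 times the original, so the uniform draw, the function name `update_edge_weights`, and the toy edge set are hypothetical choices.

```python
import random


def update_edge_weights(original_weights, ratio, rng, low=0.1, high=10.0):
    """Pick round(ratio * |E|) edges at random and set each one to a length
    drawn uniformly from [low, high] times its original length.

    Returns the updated weight map and the set of edges that were changed.
    """
    edges = list(original_weights)
    chosen = rng.sample(edges, round(ratio * len(edges)))
    updated = dict(original_weights)
    for edge in chosen:
        updated[edge] = original_weights[edge] * rng.uniform(low, high)
    return updated, set(chosen)


# Toy network of 20 directed edges, each of length 2.0; E_upd = 15%.
rng = random.Random(0)
weights = {(i, i + 1): 2.0 for i in range(20)}
new_weights, changed = update_edge_weights(weights, 0.15, rng)
```

Scaling from the original length at every timestamp, rather than from the previously updated length, keeps the weights from drifting without bound; whether the experiments did the same is not stated in the text.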


Figure 15: Effect of $E_{dir}$ on query processing time and communication cost. (a) Query processing time (s) versus $E_{dir}$. (b) Communication cost (number of messages transferred) versus $E_{dir}$. Both panels compare COSK and CMTkSK+.

Figure 16: Effect of dataset on index size. Index size (MB) of COSK and CMTkSK+ for the Oldenburg, San Francisco, and San Joaquin datasets.

8. Conclusion

In this paper, we investigated moving top-k spatial keyword queries in directed and dynamic road networks. We presented an efficient indexing framework using inverted files that indexes the data objects on edges, allowing for the effective searching of data objects relevant to queries in terms of both textual and spatial relevance. We also presented a safe-exit-based algorithm, called COSK, to monitor moving top-k spatial keyword queries. We demonstrated that the query results remain valid as long as the query object resides within a safe region. Furthermore, COSK can effectively monitor the validity of query results and safe regions in dynamic road networks. Finally, an experimental evaluation conducted on real road networks demonstrated that COSK significantly reduced the query processing time and communication costs compared to the CMTkSK+ algorithm.

Figure 17: Effect of $E_{upd}$ on query processing time and communication cost. (a) Query processing time (s) versus $E_{upd}$. (b) Communication cost (number of messages transferred) versus $E_{upd}$. Both panels compare COSK and the basic algorithm.

Data Availability

The real road network data used in this study are also used in many previous studies. The road network data are cited in the manuscript and are available at https://www.cs.utah.edu/~lifeifei/SpatialDataset.htm. To simulate the moving queries, the authors used the spatiotemporal data generator, which is also used in previous studies. The research article describing the generator is cited in the manuscript. The documentation and source files of the generator are available at https://iapg.jade-hs.de/personen/brinkhoff/generator/. The authors used Twitter tweets to generate the descriptions of data objects and the query keywords. The tweets used can be accessed at http://followthehashtag.com/datasets/free-twitter-dataset-usa-200000-free-usa-tweets/.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

Hyung-Ju Cho was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean Government (MSIP) (NRF-2016R1A2B4009793), and this research was partially supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2016R1D1A1B03934129).

References

[1] D. Papadias, N. Mamoulis, J. Zhang, and Y. Tao, “Query processing in spatial network databases,” in Proceedings of the 29th International Conference on Very Large Data Bases (VLDB ’03), pp. 802–813, September 2003.

[2] H.-J. Cho, K. Ryu, and T.-S. Chung, “An efficient algorithm for computing safe exit points of moving range queries in directed road networks,” Information Systems, vol. 41, pp. 1–19, 2014.

[3] G. Tsatsanifos and A. Vlachou, “On processing Top-k spatio-textual preference queries,” in Proceedings of the 18th International Conference on Extending Database Technology (EDBT ’15), pp. 433–444, March 2015.

[4] R. Li, A. X. Liu, A. L. Wang, and B. Bruhadeshwar, “Fast range query processing with strong privacy protection for cloud computing,” Proceedings of the VLDB Endowment, vol. 7, no. 14, pp. 1953–1964, 2014.

[5] G. Cong, C. S. Jensen, and D. Wu, “Efficient retrieval of the Top-k most relevant spatial web objects,” Proceedings of the VLDB Endowment, vol. 2, no. 1, pp. 337–348, 2009.

[6] Z. Li, K. C. K. Lee, B. Zheng, W.-C. Lee, D. Lee, and X. Wang, “IR-tree: An efficient index for geographic document search,” IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 4, pp. 585–599, 2011.

[7] Y. Zhou, X. Xie, C. Wang, Y. Gong, and W. Ma, “Hybrid index structures for location-based web search,” in Proceedings of the 14th ACM International Conference on Information and Knowledge Management, pp. 155–162, Bremen, Germany, October 2005.

[8] J. Zobel and A. Moffat, “Inverted files for text search engines,” ACM Computing Surveys, vol. 38, no. 2, 2006.

[9] N. Beckmann, H. Kriegel, R. Schneider, and B. Seeger, “R*-tree: an efficient and robust access method for points and rectangles,” in Proceedings of the ACM SIGMOD International Conference on Management of Data, vol. 19, pp. 322–331, May 1990.

[10] R. Hariharan, B. Hore, C. Li, and S. Mehrotra, “Processing spatial-keyword (sk) queries in geographic information retrieval (gir) systems,” in Proceedings of the 19th International Conference on Scientific and Statistical Database Management (SSDBM ’07), July 2007.

[11] I. De Felipe, V. Hristidis, and N. Rishe, “Keyword search on spatial databases,” in Proceedings of the 24th International Conference on Data Engineering (ICDE ’08), pp. 656–665, April 2008.

[12] J. B. Rocha-Junior, O. Gkorgkas, S. Jonassen, and K. Nørvåg, “Efficient processing of top-k spatial keyword queries,” in Proceedings of the International Symposium on Spatial and Temporal Databases, pp. 205–222, Springer, 2011.

[13] D. Zhang, K.-L. Tan, and A. K. Tung, “Scalable top-k spatial keyword search,” in Proceedings of the 16th International Conference on Extending Database Technology, pp. 359–370, 2013.


[14] J. B. Rocha-Junior and K. Nørvåg, “Top-k spatial keyword queries on road networks,” in Proceedings of the 15th International Conference on Extending Database Technology, pp. 168–179, Berlin, Germany, March 2012.

[15] H.-J. Cho, S. J. Kwon, and T.-S. Chung, “A safe exit algorithm for continuous nearest neighbor monitoring in road networks,” Mobile Information Systems, vol. 9, no. 1, pp. 37–53, 2013.

[16] D. Yung, M. L. Yiu, and E. Lo, “A safe-exit approach for efficient network-based moving range queries,” Data & Knowledge Engineering, vol. 72, pp. 126–147, 2012.

[17] M. Attique, H. Cho, R. Jin, and T. Chung, “Efficient Processing of Continuous Reverse k Nearest Neighbor on Moving Objects in Road Networks,” ISPRS International Journal of Geo-Information, vol. 5, no. 12, p. 247, 2016.

[18] H. G. Elmongui, M. F. Mokbel, and W. G. Aref, “Continuous aggregate nearest neighbor queries,” GeoInformatica, vol. 17, no. 1, pp. 63–95, 2013.

[19] D. Wu, M. L. Yiu, C. S. Jensen, and G. Cong, “Efficient continuously moving top-k spatial keyword query processing,” in Proceedings of the IEEE International Conference on Data Engineering (ICDE ’11), pp. 541–552, Hannover, Germany, April 2011.

[20] W. Huang, G. Li, K.-L. Tan, and J. Feng, “Efficient safe-region construction for moving top-k spatial keyword queries,” in Proceedings of the 21st ACM International Conference on Information and Knowledge Management, pp. 932–941, 2012.

[21] L. Guo, J. Shao, H. H. Aung, and K.-L. Tan, “Efficient continuous top-k spatial keyword queries on road networks,” GeoInformatica, vol. 19, no. 1, pp. 29–60, 2014.

[22] Y. Li, G. Li, L. Shu, Q. Huang, and H. Jiang, “Continuous monitoring of top-k spatial keyword queries in road networks,” Journal of Information Science and Engineering, vol. 31, no. 6, pp. 1831–1848, 2015.

[23] M. Attique, A. Khan, and T.-S. Chung, “ESPAK: Top-k spatial keyword query processing in directed road networks,” in Proceedings of the Workshops of the International Conference on Extending Database Technology and the International Conference on Database Theory (EDBT/ICDT ’17), March 2017.

[24] G. Salton and C. Buckley, “Term-weighting approaches in automatic text retrieval,” Information Processing & Management, vol. 24, no. 5, pp. 513–523, 1988.

[25] V. N. Anh, O. de Kretser, and A. Moffat, “Vector-space ranking with effective early termination,” in Proceedings of the 24th Annual International ACM SIGIR Conference, pp. 35–42, New Orleans, LA, USA, 2001.

[26] E. W. Dijkstra, “A note on two problems in connexion with graphs,” Numerische Mathematik, vol. 1, pp. 269–271, 1959.

[27] “Real datasets for spatial databases,” https://www.cs.utah.edu/~lifeifei/SpatialDataset.htm.

[28] “Twitter,” https://twitter.com.

[29] T. Brinkhoff, “A framework for generating network-based moving objects,” GeoInformatica, vol. 6, no. 2, pp. 153–180, 2002.


Algorithm 1: EvaluateSnapshotQuery(Node $n_i$, Edge $e_i$)

(1) Input: Top-k spatial keyword query $Q_N = (q.l, q.t, k)$
(2) Output: Top-k data objects with highest score
(3) $D_c \leftarrow \emptyset$ /* set of candidate data objects */
(4) max-heap $D_k \leftarrow \emptyset$ /* current top-k set */
(5) $s_k \leftarrow 0$ /* k-th score in $D_k$ */
(6) min-heap $\leftarrow \emptyset$
(7) $explored \leftarrow \emptyset$
(8) min-heap.insert($q.l$, $edge_{active}$)
(9) $D_c \leftarrow candsearch((e_{id}, t_{id}), s_k)$
(10) update $D_k$ and $s_k$ with $d \in D_c$
(11) while min-heap $\neq \emptyset$ and $(1/(1 + \alpha\lambda(d.l, q.l)) > s_k)$ do
(12)     for each unexplored adjacent edge of $(p_a, edge)$ do
(13)         $explored \leftarrow explored \cup (p_a, edge)$
(14)         $D_c \leftarrow candsearch((e_{id}, t_{id}), s_k)$
(15)         update $D_k$ and $s_k$ with $d \in D_c$
(16)     end
(17)     min-heap.insert(adjacent node, edge)
(18) end
(19) return $D_k$

Algorithm 2: CandidateSearch($(e_{id}, t_{id})$, $s_k$)

(1) Input: Edge ID $e_{id}$, Term ID $t_{id}$, score of k-th object $s_k$
(2) Output: candidate list $D_c$
(3) compute $\theta_t(e_i)$
(4) if $\theta_t(e_i) > 0$ then
(5)     $maxscore(e_i) \leftarrow computemaxscore(\theta_t, dist(e_i, q.l))$
(6) end
(7) if $maxscore(e_i) > s_k$ then
(8)     for each data object in $e_i$ do
(9)         compute $d_{score}$
(10)     end
(11)     if $d_{score} > s_k$ then
(12)         $D_c \leftarrow D_c \cup d$
(13)     end
(14) end
(15) return $D_c$

$t \in q.t$ and the shortest distance $sdist(e_i, q.l)$ between the edge and the query location. In the next step, the inverted lists of term t are fetched if their upper-bound score is greater than $s_k$. In the inverted lists, the objects with score $\psi(d)$ greater than $s_k$ are returned.
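The interplay between network expansion (Algorithm 1) and edge-level pruning (Algorithm 2) can be sketched in a few lines. This is a simplified illustration, not the authors' implementation. It assumes that the query sits on a node, that the road network is a directed adjacency dictionary, that each object is stored on one directed edge with its offset from the edge's start node, and that the per-edge textual upper bound is computed on the fly instead of being read from an inverted file. All function names, the toy graph, and the toy objects are hypothetical.

```python
import heapq

def text_rel(description, keywords):
    """Fraction of the description's terms that are query keywords."""
    terms = description.lower().split()
    wanted = {kw.lower() for kw in keywords}
    return sum(t in wanted for t in terms) / len(terms) if terms else 0.0

def snapshot_topk(graph, objects, source, keywords, k, alpha=1.0):
    topk = []                       # min-heap of (score, object id), at most k entries
    s_k = 0.0                       # k-th best score so far (0 until k objects are found)
    frontier = [(0.0, source)]      # min-heap of (network distance, node)
    settled = set()
    while frontier:
        dist_u, u = heapq.heappop(frontier)
        if u in settled:
            continue
        # No object beyond u can score above 1 / (1 + alpha * dist_u).
        if 1.0 / (1.0 + alpha * dist_u) <= s_k:
            break
        settled.add(u)
        for v, length in graph.get(u, []):
            on_edge = objects.get((u, v), [])
            theta = max((text_rel(desc, keywords) for _, _, desc in on_edge), default=0.0)
            # Edge-level pruning: skip edges whose upper-bound score cannot beat s_k.
            if theta > 0 and theta / (1.0 + alpha * dist_u) > s_k:
                for obj_id, offset, desc in on_edge:
                    score = text_rel(desc, keywords) / (1.0 + alpha * (dist_u + offset))
                    if score > s_k:
                        heapq.heappush(topk, (score, obj_id))
                        if len(topk) > k:
                            heapq.heappop(topk)
                        if len(topk) == k:
                            s_k = topk[0][0]
            if v not in settled:
                heapq.heappush(frontier, (dist_u + length, v))
    return sorted(topk, reverse=True)

# Toy directed network (not Figure 1): q -> a -> {b, c}, c -> q.
graph = {"q": [("a", 2.0)], "a": [("b", 3.0), ("c", 1.0)], "c": [("q", 4.0)]}
objects = {
    ("a", "b"): [("d1", 1.0, "Italian Restaurant")],
    ("a", "c"): [("d2", 0.5, "Cafe")],
    ("c", "q"): [("d3", 1.0, "Chinese Restaurant")],
}
top1 = snapshot_topk(graph, objects, "q", ["Italian", "Restaurant"], k=1)
top2 = snapshot_topk(graph, objects, "q", ["Italian", "Restaurant"], k=2)
```

In the top-1 run, the expansion stops at node c because the best possible score from there onward, 1/(1 + 3), no longer exceeds $s_k$ = 0.25.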

To understand the proposed algorithm, consider the road network presented in Figure 1. Assume that a query q generates a top-1 keyword query with $q.t$ = “Italian Restaurant”. For ease of presentation, we assume $\alpha = 1$, and the textual relevance $\mu$ is the number of occurrences of query keywords in $d.t$ divided by the number of keywords in the document (description of the data object). For example, $\psi(d_4) = \mu(d_4.t, q.t)/(1 + \lambda(d_4.l, q.l)) = 0.5/8 \approx 0.06$. The algorithm starts the network expansion from an active edge $\overrightarrow{(n_2, n_3)}$, where q is the anchor point. Note that the direction of the edge $\overrightarrow{(n_2, n_3)}$ is from $n_2$ to $n_3$. Therefore, the algorithm explores only $\overrightarrow{(q, n_3)}$. There is no data object found in $\overrightarrow{(q, n_3)}$. Then, $n_3$ becomes the anchor point and edges $(n_3, n_4)$, $(n_3, n_5)$, and $(n_3, n_7)$ are inserted in the min-heap. Next, the $candsearch$ function retrieves the candidate data objects on edges $(n_3, n_4)$, $(n_3, n_5)$, and $(n_3, n_7)$ whose score is better than $s_k$. On edge $(n_3, n_5)$, data object $d_3$ is retrieved with $\psi(d_3) = 0.2$. Data object $d_3$ is inserted in the $D_k$ set and the value of $s_k$ is set to 0.2. For edges $(n_3, n_4)$ and $(n_3, n_7)$, there is no candidate object found because $d_2.t$ (“Cafe”) and $d_7.t$ (“Cafe and Bakery”) do not match $q.t$. The algorithm continues expanding the edges whose upper-bound score is greater than $s_k$. The edge $\overrightarrow{(n_7, n_2)}$ is explored next. The upper-bound score of $\overrightarrow{(n_7, n_2)}$ is $1/7$, which is less than $s_k$. Similarly, for edge $\overleftarrow{(n_6, n_5)}$, the upper-bound score is $0.5/8 < s_k$. Therefore, the algorithm terminates and reports $d_3$ as the top-1 result.
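The scores quoted in this example can be reproduced with a small helper. The sketch below assumes the ranking function $\psi(d) = \mu(d.t, q.t)/(1 + \alpha\lambda(d.l, q.l))$ used in the example. The network distances 7 and 4 are not read from Figure 1; they are inferred from the quoted values 0.5/8 and 0.2, and the function names are illustrative.

```python
def textual_relevance(description, query):
    """Occurrences of query keywords in the description divided by its keyword count."""
    doc_terms = description.lower().split()
    query_terms = set(query.lower().split())
    if not doc_terms:
        return 0.0
    hits = sum(1 for term in doc_terms if term in query_terms)
    return hits / len(doc_terms)

def score(description, query, network_dist, alpha=1.0):
    """Ranking score: textual relevance damped by the weighted network distance."""
    return textual_relevance(description, query) / (1.0 + alpha * network_dist)

query = "Italian Restaurant"
psi_d4 = score("Chinese Restaurant", query, network_dist=7.0)   # 0.5 / 8
psi_d3 = score("Italian Restaurant", query, network_dist=4.0)   # 1.0 / 5
psi_d7 = score("Cafe and Bakery", query, network_dist=1.0)      # no keyword matches
```

With these inputs, $d_3$ outranks $d_4$, and an object without any matching keyword scores zero regardless of its distance.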


Figure 3: Illustration of directed road network. The query object q issues a TkSK query at $p_1$, and the server returns a set of objects for $p_1$.

Figure 4: Illustration of directed road network. The query object q issues a TkSK query at $p_2$, and the server returns a set of objects for $p_2$.

5. Moving Top-k Spatial Keyword Queries

In this section, we present our method to monitor moving top-k spatial keyword queries where query objects are moving in a directed road network. Figure 3 provides an example of TkSK in road networks where query point q issues a TkSK query at point $p_1$. Note that the numbers on the arrows in the figure indicate the order of the steps. To obtain the top-k results at $p_1$, the server executes Algorithm 1, as mentioned in Section 4.2. Now consider that the query object moves to $p_2$, as shown in Figure 4, and needs the top-k results at point $p_2$. The simple method is to repeat the procedure executed at $p_1$. However, recomputing the results whenever query q changes its location significantly increases the computation cost. Furthermore, it also increases the communication overhead because the query object must report its location whenever it moves and the server must send the result set. To address these issues, we introduce the safe exit approach.

In the proposed framework, the server computes safe exit points for a query object. The server maintains a set of moving queries, and the query result remains valid as long as the query objects remain inside the region bounded by their respective safe exit points. Whenever a query object passes one of its safe exit points, the server recomputes the TkSK result and the safe exit points for the query object.
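The communication saving of this framework can be illustrated with a deliberately simplified model. The sketch below assumes a single straight road, so that a safe region reduces to an interval of positions; in a road network the safe region is a set of segments bounded by safe exit points. The `toy_server` function and its objects are hypothetical stand-ins for the server-side evaluation.

```python
def monitor(trajectory, server_query):
    """Follow a trajectory, contacting the server only outside the safe interval."""
    result, safe = None, None
    requests = 0
    trace = []
    for pos in trajectory:
        if safe is None or not (safe[0] <= pos <= safe[1]):
            # One round trip: the client reports its location, the server
            # returns the result together with the new safe interval.
            result, safe = server_query(pos)
            requests += 1
        trace.append((pos, result))
    return trace, requests

def toy_server(pos):
    """Hypothetical server: road of length 10, equally relevant objects at both ends."""
    if pos <= 5.0:
        return "d_left", (0.0, 5.0)
    return "d_right", (5.0, 10.0)

trace, requests = monitor([1.0, 2.0, 4.0, 6.0, 7.0, 9.0], toy_server)
```

Six location changes trigger only two server requests here, whereas recomputation at every move would need six.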

Next, we present our method to compute the safe exit points for a query object. The safe exit point represents a point in the segment where a safe region and a nonsafe region meet. We compute the safe exit point using the divide-and-conquer technique. Before presenting the detailed methodology, we define the terminologies used in this section.

Definition 1 (safe region). A portion of a road segment that guarantees that, as long as the query point lies in it, its top-k results remain valid.

Definition 2 (answer objects, $D^+$). A data object d is called an answer object of query q if $\psi(d) > \psi(d_a)$, where $d_a$ represents any other data object in the directed road network. Similarly, we can generalize this definition for TkSK: a data object d is called an answer object of query q if $\psi(d) > \psi(d_{k+1})$, where $d_{k+1}$ represents the $(k+1)$th data object in the directed road network when objects are ranked by score. In other words, all answer objects are top-k results of query q.

Definition 3 (nonanswer objects, $D^-$). A data object d is called a nonanswer object of query q if $\psi(d) < \psi(d_a)$, where $d_a$ represents any other data object in the directed road network. Similarly, we can generalize this definition for TkSK: a data object d is called a nonanswer object of query q if $\psi(d) < \psi(d_k)$, where $d_k$ represents the kth data object in the directed road network when objects are ranked by score. That is, all answer objects are top-k results of query q, and none of the nonanswer objects are in the top-k results of query q.

Definition 4 (lowest answer object, $D^+_l$). An answer object $d^+ \in D^+$ is called a lowest answer object to a point $p \in G$ if $\psi(d^+_l)_p = \min(\psi(d^+_1)_p, \psi(d^+_2)_p, \ldots, \psi(d^+_{|D^+|})_p)$, where $\psi(d^+_l)_p$ represents the score of the lowest answer object at point p. In other words, $\psi(d^+_l)_p < \psi(d^+_a)_p$ at point p, where $d^+_a$ is any other answer object in the $D^+$ set.

Definition 5 (highest nonanswer object, $D^-_h$). A nonanswer object $d^- \in D^-$ is called a highest nonanswer object to a point $p \in G$ if $\psi(d^-_h)_p = \max(\psi(d^-_1)_p, \psi(d^-_2)_p, \ldots, \psi(d^-_{|D^-|})_p)$, where $\psi(d^-_h)_p$ represents the score of the highest nonanswer object at point p. In other words, $\psi(d^-_h)_p > \psi(d^-_a)_p$ at point p, where $d^-_a$ is any other nonanswer object in the $D^-$ set.
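Definitions 2 to 5 amount to ranking the objects by score at a given point p and cutting the ranking after position k. A minimal sketch follows, assuming the scores at p are already available in a dictionary. The values for $d_3$ and $d_4$ echo the earlier example, while the other values and the function name `split_by_rank` are made up for illustration.

```python
def split_by_rank(scores, k):
    """Return (answers, nonanswers, lowest_answer, highest_nonanswer) at one point p."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    answers, nonanswers = ranked[:k], ranked[k:]
    lowest_answer = answers[-1] if answers else None          # d_l^+ in Definition 4
    highest_nonanswer = nonanswers[0] if nonanswers else None  # d_h^- in Definition 5
    return answers, nonanswers, lowest_answer, highest_nonanswer

# Scores at some point p (illustrative values).
scores_at_p = {"d3": 0.2, "d6": 0.125, "d4": 0.0625, "d2": 0.0}
answers, nonanswers, d_low, d_high = split_by_rank(scores_at_p, k=2)
```

The pair (lowest answer, highest nonanswer) is what matters for monitoring, because the top-k set first changes when these two objects swap ranks.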

As discussed earlier, the main challenge in the continuous processing of moving TkSK queries is to maintain the validity of the result set because the movement of query objects can invalidate the result set. To monitor the validity of the result set, we propose a safe-region-based approach.

5.1. Computation of Safe Exit Points. In this section, we present our technique to compute the safe exit points. The main goal is to find a point in the road network where the query result set will change. The result set will change when the score of the highest nonanswer object $D^-_h$ surpasses the score of the lowest answer object $D^+_l$. Generally, the textual relevance score does not change. Therefore, the score of data objects only changes because of the spatial relevance score, which can only change with the movement of query objects. The computation of the safe exit point is based on two key observations.

Observation 1. If $D^+_{n_\beta} = D^+_{p_a}$, there is no safe exit point in the segment.

Explanation. $D^+_{p_a}$ represents the set of answer objects at anchor point $p_a$, whereas $D^+_{n_\beta}$ represents the set of answer objects at boundary node $n_\beta$. As discussed earlier, the safe exit point is the particular point where the query results change. If the query results at the starting node are the same as those at the ending node of any segment/edge, there does not exist any point where the query result changes. Hence, we do not search for the safe exit point in that segment.

Observation 2. If $D^+_{p_a} \neq D^+_{n_\beta}$, there is a safe exit point in the segment.

Explanation. In contrast to Observation 1, if the query results are different at the starting and ending points, then there exists a point where the query results change. Hence, there is a safe exit point in the segment.

To find the safe region, we observe the following cases.

Case 1 (when $\alpha = 1$ and the textual relevance of the highest nonanswer object and lowest answer object is the same). In this case, both the textual and spatial relevance have the same importance (i.e., $\alpha = 1$). In addition, the top-k result depends only on the spatial relevance because the textual relevance of both objects is the same. The data object that is closer to query point q becomes the answer object. For an undirected edge, the safe exit point $p_{se}$ is the center point between the lowest answer object and the highest nonanswer object, i.e., $\max(dist(p_{se}, d^+_1), dist(p_{se}, d^+_2), \ldots, dist(p_{se}, d^+_{|D^+|})) = \min(dist(p_{se}, d^-_1), dist(p_{se}, d^-_2), \ldots, dist(p_{se}, d^-_{|D^-|}))$. However, in the case of a directed edge, where $dist(p_a, n_\beta) \neq dist(n_\beta, p_a)$, the safe exit point is either $d^+_l$ or $p_a$. If $d^+_l \in (p_a, n_\beta)$, then the safe exit point is $d^+_l$; otherwise, the safe exit point is $p_a$.

Case 2 (when $\alpha = 1$ and the textual relevance of the highest nonanswer object and lowest answer object is different). In this case, the top-k result depends on all factors, namely $\alpha$, the spatial relevance, and the textual relevance. Clearly, for undirected edges, the midpoint between the lowest answer object and the highest nonanswer object does not provide a valid safe exit point. Therefore, we introduce the divide-and-conquer technique, which keeps dividing the search space until we reach the point where the score of the nonanswer object becomes greater than that of the answer object. Typically, the safe exit point should be closer to the data object whose score is lower. Based on this observation, we first compute the midpoint in a similar fashion to Case 1 and then continue dividing the search space until we find the point. For directed edges, the safe exit point can be computed in a similar fashion to Case 1.

Case 2 also works for other cases when the safe exit point is not the midpoint between the lowest answer object and the highest nonanswer object. In these cases, the safe exit point depends on two or more functions. Therefore, the safe exit point can be easily computed using the aforementioned divide-and-conquer technique. The following are the scenarios where the safe exit point can be computed using Case 2:

(a) When $\alpha \neq 1$ and the textual relevance of the nearest nonanswer object and farthest answer object is different

(b) When $\alpha \neq 1$ and the textual relevance of the nearest nonanswer object and farthest answer object is the same

Case 3 (when $\alpha = 0$). This means the spatial relevance has no effect on the score of data objects. Hence, no monitoring is required for this scenario.
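The divide-and-conquer step of Case 2 can be read as a bisection over the offset x along an undirected segment. The sketch below is one possible reading, not the authors' code. It assumes that the lowest answer object and the highest nonanswer object stay the same along the segment, that the answer leads at the anchor point, that the nonanswer leads at the boundary node, and that their score curves cross exactly once. The distance functions 3 + x and 5 − x are borrowed from the example in Section 5.2, and the textual relevance of 0.5 in the second call is hypothetical.

```python
def safe_exit_by_bisection(score_answer, score_nonanswer, length, tol=1e-9):
    """Approximate the first offset x in [0, length] where the nonanswer outranks the answer."""
    if score_nonanswer(0.0) > score_answer(0.0) or score_nonanswer(length) <= score_answer(length):
        raise ValueError("no rank change between the endpoints, so bisection does not apply")
    lo, hi = 0.0, length
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if score_nonanswer(mid) > score_answer(mid):
            hi = mid    # the result has already changed at mid: the exit lies to the left
        else:
            lo = mid    # the result is still valid at mid: the exit lies to the right
    return hi

def make_score(mu, dist, alpha=1.0):
    """Score of one object as a function of the offset x along the segment."""
    return lambda x: mu / (1.0 + alpha * dist(x))

# Same textual relevance (Case 1 setting): the exit is the midpoint, x = 1.
same_text = safe_exit_by_bisection(make_score(1.0, lambda x: 3.0 + x),
                                   make_score(1.0, lambda x: 5.0 - x), length=3.0)
# Different textual relevance (Case 2 setting): the exit moves away from the midpoint.
diff_text = safe_exit_by_bisection(make_score(1.0, lambda x: 3.0 + x),
                                   make_score(0.5, lambda x: 5.0 - x), length=3.0)
```

In the second call, the crossing satisfies 1/(4 + x) = 0.5/(6 − x), so x = 8/3, which shows why the midpoint is not a valid safe exit point when the textual relevance differs.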

Algorithm 3 retrieves the safe exit points using the observations discussed earlier. The core function in this algorithm is ComputeSafeExit($p_a$, $n_\beta$), which finds the safe exit point in a segment between $p_a$ and $n_\beta$. ComputeSafeExit($p_a$, $n_\beta$) is described in detail in Algorithm 4. First, Algorithm 4 determines $d^+_l$ and $d^-_h$ at each point $p \in [p_a, n_\beta]$. Recall that $d^+_l$ is the lowest answer object to p, whereas $d^-_h$ is the highest nonanswer object to p. Algorithm 4 computes the safe exit point based on the cases discussed earlier. There are two further scenarios for Cases 1 and 2. For Case 1, if $dist(p_a, n_\beta) = dist(n_\beta, p_a)$, then the safe exit point is the midpoint between $d^+_l$ and $d^-_h$. If $dist(p_a, n_\beta) \neq dist(n_\beta, p_a)$, then the edge is directed and, therefore, the safe exit point is either $p_a$ or $d^+_l$. If $d^+_l$ lies on the edge $[p_a, n_\beta]$, then $d^+_l$ is the safe exit point. Otherwise, $p_a$ is the safe exit point.

Similarly, for Case 2, if $dist(p_a, n_\beta) = dist(n_\beta, p_a)$, then the safe exit point is computed by dividing the search space in half until we find the closest point such that $\psi(d^-_h) > \psi(d^+_l)$. If $dist(p_a, n_\beta) \neq dist(n_\beta, p_a)$, the safe exit point is computed in the same way as in the directed scenario of Case 1.

5.2. Computation of Safe Exit Points for Example. Consider the same example in Figure 1, where the query point q issues a top-1 keyword query with $q.t$ = “Italian restaurant”. For this example, let us consider $\alpha = 1$. The monitoring algorithm starts exploring from the active edge containing the query object q. Therefore, $\overrightarrow{(q, n_3)}$ is explored first. As shown in Table 3, for $\overrightarrow{(q, n_3)}$, $D^+_q = \{d_3\}$ and $D^+_{n_3} = \{d_3\}$. According to Observation 1, no safe exit point exists in this segment. Therefore, edges adjacent to $n_3$ are explored and $n_3$ becomes the new $p_a$. The edge $(n_3, n_4)$ is explored next. Similarly, the answer object at $n_3$ and $n_4$ is the same: $D^+_{n_3} = D^+_{n_4} = \{d_3\}$. Therefore, a safe exit point does not exist in $(n_3, n_4)$. The edge $(n_3, n_7)$ is explored next. As shown in Table 3, $D^+_{n_3} = \{d_3\}$ and $D^+_{n_7} = \{d_6\}$. By Observation 2, there is a safe exit point in $(n_3, n_7)$. As shown in Figure 1, $d_3.t = d_6.t$ = “Italian Restaurant” and $dist(n_3, n_7) = dist(n_7, n_3)$.


Algorithm 3: COSK monitoring algorithm

(1) Input: Same as Algorithm 1
(2) Output: $P_{SE}$, a set of safe exit points
(3) $P_{SE} \leftarrow \emptyset$ /* set of safe exit points */
(4) $D^+_{p_a} \leftarrow EvaluateSnapshotQuery(p_a, (p_a, n_\beta))$
(5) /* results calculated using Algorithm 1 */
(6) $D^+_{n_\beta} \leftarrow EvaluateSnapshotQuery(n_\beta, (p_a, n_\beta))$
(7) /* results calculated using Algorithm 1 */
(8) if $D^+_{p_a} = D^+_{n_\beta}$ then
(9)     no safe exit point /* refer to Observation 1 */
(10) end
(11) if $D^+_{p_a} \neq D^+_{n_\beta}$ then
(12)     $P_{SE} \leftarrow P_{SE} \cup ComputeSafeExit(p_a, n_\beta)$ /* safe exit point exists; refer to Observation 2 */
(13) end
(14) return $P_{SE}$

Algorithm 4: ComputeSafeExit($p_a$, $n_\beta$)

(1) Input: same as Algorithm 1
(2) Output: $p_{se}$, safe exit point in $(p_a, n_\beta)$
(3) $D^+_l \leftarrow \langle p, d^+_l \rangle$ for each point $p \in [p_a, n_\beta]$, with $d^+_l$ such that $\psi(d^+_l)_p = \min(\psi(d^+_1)_p, \psi(d^+_2)_p, \ldots, \psi(d^+_{|D^+|})_p)$
(4) $D^-_h \leftarrow \langle p, d^-_h \rangle$ for each point $p \in [p_a, n_\beta]$, with $d^-_h$ such that $\psi(d^-_h)_p = \max(\psi(d^-_1)_p, \psi(d^-_2)_p, \ldots, \psi(d^-_{|D^-|})_p)$
(5) if Case 1 then
(6)     if $dist(p_a, n_\beta) = dist(n_\beta, p_a)$ then
(7)         $p_{se}$ = point such that $\max(dist(p_{se}, d^+_1), dist(p_{se}, d^+_2), \ldots, dist(p_{se}, d^+_{|D^+|})) = \min(dist(p_{se}, d^-_1), dist(p_{se}, d^-_2), \ldots, dist(p_{se}, d^-_{|D^-|}))$
(8)     end
(9)     if $dist(p_a, n_\beta) \neq dist(n_\beta, p_a)$ then
(10)         $p_{se} = p_a$ or $p_{se} = d^+_l$ where $d^+_l \in (p_a, n_\beta)$
(11)     end
(12) end
(13) if Case 2 then
(14)     if $dist(p_a, n_\beta) = dist(n_\beta, p_a)$ then
(15)         $p_{se}$ = closest point to $p_a$ such that $\psi(d^-_h) > \psi(d^+_l)$
(16)     end
(17)     if $dist(p_a, n_\beta) \neq dist(n_\beta, p_a)$ then
(18)         Same as Line (10)
(19)     end
(20) end
(21) return $p_{se}$
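The control flow of Algorithm 3 reduces to comparing the answer sets at the two endpoints of each explored segment. The sketch below illustrates only that comparison. It assumes that the answer sets have already been computed (here they are copied from Table 3), that the explored segments are given as a list, and that `compute_safe_exit` is a caller-supplied stand-in for Algorithm 4; the function name `find_safe_exits` is hypothetical.

```python
def find_safe_exits(segments, answers_at, compute_safe_exit):
    """Collect one safe exit point for every segment whose endpoint results differ."""
    exits = []
    for p_a, n_beta in segments:
        if answers_at[p_a] == answers_at[n_beta]:
            continue    # Observation 1: same result at both ends, nothing to search
        exits.append(compute_safe_exit(p_a, n_beta))    # Observation 2
    return exits

# Top-1 answer sets at the endpoints, as listed in Table 3.
answers_at = {"q": {"d3"}, "n3": {"d3"}, "n4": {"d3"},
              "n5": {"d3"}, "n6": {"d4"}, "n7": {"d6"}}
segments = [("q", "n3"), ("n3", "n4"), ("n3", "n7"), ("n3", "n5"), ("n5", "n6")]
# The stand-in only labels the segment instead of locating the exact point.
exits = find_safe_exits(segments, answers_at, lambda a, b: "exit on (%s, %s)" % (a, b))
```

Only the segments between $n_3$ and $n_7$ and between $n_5$ and $n_6$ require a call to the safe exit computation, which matches the two safe exit points reported in Table 3.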

Therefore, according to Case 1, the safe exit point $p_{se1}$ is the midpoint between $d_3$ and $d_6$. That is, $dist(p_{se1}, d_3) = dist(p_{se1}, d_6)$, where $dist(p_{se1}, d_3) = x + 3$ and $dist(p_{se1}, d_6) = -x + 5$ for $0 < x < 3$. Consequently, $x = 1$, which means that the distance from $n_3$ to $p_{se1}$ is 1.

Next, we determine a safe exit point in $(n_3, n_5)$. As shown in Table 3, the answer object at $n_5$ is the same as that at $n_3$. Hence, no safe exit point exists in this edge. Next, $\overleftarrow{(n_6, n_5)}$ is explored with $p_a = n_5$. According to Table 3, $D^+_{n_6} = \{d_4\}$ and $D^+_{n_5} = \{d_3\}$. Therefore, a safe exit point exists in this edge. This edge is directed and, for each point $p \in \overleftarrow{(n_6, n_5)}$, the shortest path from p to $d_3$ is $p \rightarrow n_6 \rightarrow n_2 \rightarrow n_3 \rightarrow d_3$. Therefore, $n_5$ is the safe exit point.

The bold lines in Figure 5 indicate the safe region of q. The top-1 result remains $d_3$ as long as the query q lies in the safe region.

Next we analyze the time complexity for determininga set of safe exit points using a set of qualifying objects119889 isin 119863+119901119886 cup 119863+119899120573 cup 119863(119901119886 119899120573) Note that 119863+119901119886 (119863+119899120573) indicates

10 Wireless Communications and Mobile Computing

Table 3 Computation of safe exit points for example scenario

EdgeSegment 119901119886 119863+119901119886 119863+119899120573 119901119904119890997888997888997888997888rarr(119902 1198993) q 119863+119902 = 1198893 119863+1198993 = 1198893 none(1198993 1198994) q 119863+1198993 = 1198893 119863+1198994 = 1198893 none(1198993 1198997) 1198993 119863+1198993 = 1198893 119863+1198997 = 1198896 1199011199041198901997888997888997888997888997888rarr(1198993 1198995) 1198993 119863+1198993 = 1198893 119863+1198995 = 1198893 nonelarr997888997888997888997888997888(1198996 1198995) 1198995 119863+1198995 = 1198893 119863+1198996 = 1198894 1199011199041198902

Figure 5: Illustration of the safe region of q. The figure shows nodes $n_1$ to $n_7$, the query point q, the safe exit points $p_{se1}$ and $p_{se2}$, and the data objects $d_1$ (Grand Hotel), $d_2$ (Cafe), $d_3$ (Italian Restaurant), $d_4$ (Chinese Restaurant), $d_5$ (Pub and Bar), $d_6$ (Italian Restaurant), and $d_7$ (Cafe and Bakery).

the set of $k$ data objects that satisfy the query condition at $p_a$ ($n_\beta$). According to Dijkstra's algorithm [26], the time complexity $O(D^+_q)$ of computing a set of answer objects at a query point $q$ is $O(D^+_q) = O(|E| + |N| \log |N|)$. This means that $O(D^+_{p_a}) = O(D^+_{n_\beta}) = O(|E| + |N| \log |N|)$ holds for the endpoints $p_a$ and $n_\beta$. Thus, the time complexity $O(\Omega_{kth})$ of determining the skyline $\Omega_{kth}$ with the $k$-th highest score is $O(\Omega_{kth}) = C_{kth} \cdot O(|D^+_{p_a} \cup D^+_{n_\beta} \cup D(p_a, n_\beta)|)$, where $C_{kth}$ is the number of qualifying objects that participate in the constitution of the skyline with the $k$-th highest score. Therefore, the time complexity of determining a safe exit point coincides with the time complexity of determining the two skylines, i.e., the skyline $D^+_l$ with the $k$-th highest (or lowest) score for answer objects and the skyline $D^-_h$ with the highest score for nonanswer objects. This is because the safe exit point is found at the cross point between these skylines.
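As a rough illustration of the cost discussed above, the following Python sketch computes network distances from a query node with Dijkstra's algorithm on a directed graph and then ranks objects by a score of the form $\mu / (1 + \alpha \cdot dist)$. It is a simplified sketch rather than the COSK procedure: objects are assumed to sit on nodes instead of on edges, and the names `dijkstra`, `top_k`, and the toy graph are hypothetical. The binary heap used here yields $O((|E| + |N|) \log |N|)$; the $O(|E| + |N| \log |N|)$ bound quoted in the text corresponds to a Fibonacci-heap implementation.

```python
import heapq


def dijkstra(graph, source):
    """Shortest network distances from source in a directed graph.

    graph maps node -> list of (neighbor, weight) for outgoing edges only,
    so dist(u, v) and dist(v, u) may differ.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def top_k(graph, source, objects, alpha, k):
    """Rank reachable, textually relevant objects by mu / (1 + alpha * dist).

    objects maps object id -> (node, textual relevance mu); this node-based
    placement is an assumption made to keep the sketch short.
    """
    dist = dijkstra(graph, source)
    scored = []
    for obj, (node, mu) in objects.items():
        if node in dist and mu > 0:
            scored.append((mu / (1.0 + alpha * dist[node]), obj))
    scored.sort(key=lambda t: (-t[0], t[1]))  # highest score first
    return [obj for _, obj in scored[:k]]


# Toy directed network with hypothetical weights.
graph = {"a": [("b", 1.0), ("c", 4.0)], "b": [("c", 1.0)], "c": [("a", 2.0)]}
objects = {"d1": ("b", 1.0), "d2": ("c", 1.0)}
```

In this toy network the distance from `a` to `b` is 1 while the distance from `b` to `a` is 3, which is the kind of asymmetry that distinguishes directed from undirected road networks.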

Figure 6 represents the skyline graph for $k = 1$ in an edge $(n_7, n_3)$. Let us draw the score functions of $d_3$ and $d_6$ for the road segment $(n_7, n_3)$, where a safe exit point exists. This is because $D^+_{n_3} = d_3$ and $D^+_{n_7} = d_6$ for $k = 1$. For each point $p \in (n_7, n_3)$, the distance between $d_3$ and point $p$ can be represented as $dist(d_3, p) = dist(d_3, n_3) + len(n_3, p) = 6 - len(n_7, p)$. Similarly, for each point $p \in (n_7, n_3)$, the distance between $d_6$ and point $p$ can be represented as $dist(d_6, p) = dist(d_6, n_7) + len(n_7, p) = 2 + len(n_7, p)$. Let $len(n_7, p)$ be

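Figure 6: Skyline graph for $k = 1$ on the road segment $(n_7, n_3)$. The horizontal axis is the distance from $n_7$ (0 to 3.0, with $p_{se1}$ and $n_3$ marked), and the vertical axis is the score (0.2 to 1.0). The plotted curves are $\psi(d_6) = 1/(x + 3)$ and $\psi(d_3) = 1/(-x + 7)$.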

a variable $x$ ($0 \le x \le 3$). We can write $\lambda(d_3, p) = dist(d_3, p) = 6 - x$ and $\lambda(d_6, p) = dist(d_6, p) = 2 + x$. Then, we can represent the score functions $\psi(d_3)$ and $\psi(d_6)$ as follows:

$\psi(d_3) = \frac{\mu(d_3.t, q.t)}{1 + \alpha \cdot \lambda(d_3, p)} = \frac{1}{7 - x}$ for $0 \le x \le 3$,

$\psi(d_6) = \frac{\mu(d_6.t, q.t)}{1 + \alpha \cdot \lambda(d_6, p)} = \frac{1}{3 + x}$ for $0 \le x \le 3$.

Finally, we present the lemma to prove that the safe exit points computed by COSK are correct.

Lemma 8. The COSK algorithm correctly computes a set of safe exit points.

Proof. We prove the correctness of the COSK algorithm by contradiction. Assume that, although $D^+_{p_a} \neq D^+_{n_\beta}$, there is no safe exit point in a road segment $(p_a, n_\beta)$. This means that, for each point $p$ in the road segment $(p_a, n_\beta)$, the query result at $p$ equals $D^+_{p_a}$, i.e., $D^+_p = D^+_{p_a}$, $\forall p \in (p_a, n_\beta)$. However, this leads to the contradiction that $D^+_{n_\beta} = D^+_{p_a}$ when $p = n_\beta$. Therefore, if $D^+_{p_a} \neq D^+_{n_\beta}$, a safe exit point exists in $(p_a, n_\beta)$. In addition, when $D^+_{p_a} \neq D^+_{n_\beta}$, a safe exit point is determined using the skyline $D^+_l$ for answer objects and the skyline $D^-_h$ with the highest score for nonanswer objects. The first skyline is a composite polyline drawn from the answer objects in $D^+_{p_a}$. The second skyline is a composite polyline drawn from the nonanswer objects in $D^+_{n_\beta} \cup D(p_a, n_\beta) - D^+_{p_a}$.
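The cross point used in the Figure 6 example can be worked out in closed form. The sketch below is a minimal illustration for $k = 1$, assuming (as in that example) that within the edge both network distances are linear in the offset $x$ with slope $+1$ or $-1$; in general the skylines are piecewise and the computation would have to be applied per piece. The function names `score` and `crossing_offset` are hypothetical and are not taken from the paper.

```python
from typing import Optional


def score(mu: float, alpha: float, dist: float) -> float:
    """Score of an object with textual relevance mu at network distance dist."""
    return mu / (1.0 + alpha * dist)


def crossing_offset(mu_a: float, d_a0: float, slope_a: int,
                    mu_b: float, d_b0: float, slope_b: int,
                    alpha: float, edge_len: float) -> Optional[float]:
    """Offset x in [0, edge_len] at which two score curves intersect.

    Object a has distance d_a0 + slope_a * x and object b has distance
    d_b0 + slope_b * x, where x is measured from the start of the edge.
    Equating mu_a / (1 + alpha * dist_a) and mu_b / (1 + alpha * dist_b)
    gives a linear equation in x.
    """
    denom = alpha * (mu_a * slope_b - mu_b * slope_a)
    if denom == 0:
        return None  # parallel curves: no single cross point
    x = (mu_b * (1 + alpha * d_a0) - mu_a * (1 + alpha * d_b0)) / denom
    return x if 0 <= x <= edge_len else None


# Figure 6 example: dist(d3, p) = 6 - x, dist(d6, p) = 2 + x, alpha = 1,
# textual relevance 1 for both objects, edge length 3.
x = crossing_offset(1.0, 6.0, -1, 1.0, 2.0, 1, 1.0, 3.0)
# x == 2.0, i.e., 2 units from n7 and 1 unit from n3
```

The result agrees with the earlier midpoint computation, which places $p_{se1}$ at distance 1 from $n_3$.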

6. Monitoring Query Results and Safe Regions in Dynamic Directed Road Networks

In this section, we discuss the monitoring of spatial keyword queries in dynamic road networks, where the network distance changes depending on the traffic conditions. Updates to the weights of some edges may invalidate the query results or the safe region of q even though the query object q remains within its safe region. Figure 7 illustrates an example of changing the weights of edges $\overleftarrow{(n_1, n_2)}$ and $\overleftarrow{(n_1, n_6)}$. For convenience, we consider $\alpha = 1$ and $q.t$ = "Italian restaurant". In Figure 7(a), the top-1 result is $d_1$, and the bold lines show the safe region of query q. Now consider that, at time $t_j$, the weights of the two edges $\overleftarrow{(n_1, n_2)}$ and $\overleftarrow{(n_1, n_6)}$ change due to heavy traffic, as shown in Figure 7(b). Such an update may invalidate the query result or the safe region of q. Therefore, it is necessary to monitor the validity of the results and the safe region when such changes occur.

Next, we introduce a monitoring region to monitor the validity of the safe region effectively when the weight of an edge changes. The monitoring region MR contains all the points between the query point q and the lowest answer object and between q and the highest nonanswer object. Formally, it is defined as $MR = dist(q, D^+_l) \cup dist(q, D^-_h)$, where $dist(q, D^+_l)$ refers to the points between q and the lowest answer object and $dist(q, D^-_h)$ refers to the points between q and the highest nonanswer object. In the given example, $D^+_l = d_1$ and $D^-_h = d_2, d_3$. Therefore, the dotted lines in Figure 8(a) show the monitoring region of query object q.

Now, at time $t_j$, the updates to $\overleftarrow{(n_1, n_6)}$ and $\overleftarrow{(n_1, d_1)}$, which are not part of the monitoring region, can safely be ignored. However, the update on segment $\overrightarrow{(n_2, d_1)}$, which is associated with the monitoring region, may nullify the results. As shown in Figure 8(b), after the update, the top-1 result becomes $d_2$, and the bold lines represent the new safe region of q.

Algorithm 5 monitors the validity of the result set and safe region of query object q when the weight of any edge changes. Let us consider that the weight of edge $(n_i, n_j)$ changes at time $t_j$. First, the algorithm checks whether edge $(n_i, n_j)$ is associated with the monitoring region. If it is not part of the monitoring region, the algorithm simply ignores the update to edge $(n_i, n_j)$, and the query results and safe region remain valid. In contrast, if the edge is associated with the monitoring region (i.e., $MR \cap (n_i, n_j) \neq \emptyset$), the algorithm reevaluates the query results, and the top-k results and safe region of query q are updated accordingly. Finally, the algorithm updates the monitoring region of q.
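The control flow described above can be sketched as follows. This is only an illustrative outline under simplifying assumptions: the monitoring region is represented as a set of whole edges or segments (so the intersection test becomes a membership test), and the three callbacks stand in for the snapshot query, safe exit, and monitoring region computations; their names and signatures are hypothetical.

```python
def handle_edge_update(monitoring_region, updated_edge, evaluate_query,
                       compute_safe_exits, compute_monitoring_region):
    """React to a weight change on updated_edge.

    Returns None when the update can be ignored, otherwise a tuple
    (results, safe_exits, new_monitoring_region).
    """
    if updated_edge not in monitoring_region:
        # The edge is outside the monitoring region, so the current
        # results and safe region remain valid.
        return None
    results = evaluate_query()                       # refreshed top-k results
    safe_exits = compute_safe_exits(results)         # refreshed safe exit points
    new_region = compute_monitoring_region(results)  # refreshed monitoring region
    return results, safe_exits, new_region
```

With this structure, only updates that touch the monitoring region trigger the comparatively expensive reevaluation; all other updates cost a single membership test.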

7. Performance Evaluation

In this section, we evaluate the performance of COSK through simulation experiments. We describe our experimental settings in Section 7.1, and we present our experimental results for static and dynamic road networks in Sections 7.2 and 7.3, respectively.

7.1. Experimental Settings. All of our experiments were performed using real road networks, namely, Oldenburg, San Francisco, and San Joaquin. All three road networks were obtained from [27]. The original road network of San Francisco had 21047 nodes and 21692 edges. We reformatted the network, pruned approximately 30% of the nodes, and adjusted the edges and their weights accordingly. This resulted in a network with 14732 nodes and 14316 edges. Both the directions of the edges and the data objects on the edges were generated randomly. The description of each data object was extracted from Twitter messages [28], and we assigned one tweet per data object. Table 4 presents the characteristics of the datasets used in the experimental evaluation. We simulated moving query objects using a spatiotemporal data generator [29]. The input to the generator was the road network of the dataset used, and the output was the set of query objects moving on the road network. Each experiment had 100 moving queries, which were continuously monitored for 100 timestamps (1 timestamp = 1 second), and the average result was reported.

As a benchmark for COSK in static road networks, we implemented the CMTkSK algorithm [22], which also continuously monitors moving top-k spatial keyword queries in road networks. However, this algorithm was originally designed for undirected road networks. To make a fair comparison, we modified CMTkSK to process top-k spatial keyword queries in directed road networks and called it CMTkSK+. Specifically, we modified the distance computation method between two points such that, in directed road networks, $dist(p_1, p_2) \neq dist(p_2, p_1)$. Since CMTkSK+ does not handle top-k spatial keyword queries in dynamic road networks, we compared the performance of COSK with a basic algorithm that recomputes the results whenever the query object changes its location. All algorithms were implemented in Java and were executed on a desktop PC with a 2.80-GHz Intel Core i5 with


Figure 7: Updating the weight of edges in a dynamic road network, where $t_i < t_j$. (a) Safe region at time $t_i$. (b) Updating the weights of $\overleftarrow{(n_1, n_2)}$ and $\overleftarrow{(n_1, n_6)}$ at time $t_j$. The network contains nodes $n_1$ to $n_6$, the query point q, the safe exit points $p_{se1}$, $p_{se2}$, and $p_{se3}$, and the data objects $d_1$ (Italian Restaurant), $d_2$ (Italian Restaurant), and $d_3$ (Chinese Restaurant).

Figure 8: Monitoring region and updated safe region at time $t_j$. (a) Monitoring region at time $t_i$. (b) New safe region at time $t_j$.

Input: monitoring region MR, updated edge $(n_i, n_j)$
Output: none
if $MR \cap (n_i, n_j) = \emptyset$ then
    /* edge $(n_i, n_j)$ is not part of the monitoring region */
    ignore the change in the weight of edge $(n_i, n_j)$
else
    $P_{SE} \leftarrow \emptyset$  /* set of safe exit points */
    $D^{upd}_k \leftarrow$ EvaluateSnapshotQuery($n_i$, $e_i$)  /* update set of top-k results */
    $P^{upd}_{SE} \leftarrow$ ComputeSafeExit($p_a$, $n_\beta$)  /* update safe exit points */
    $MR^{upd} \leftarrow$ ComputeMonitoringRegion($D^+_l$, $D^-_h$)  /* update monitoring region */
end

Algorithm 5: MonitoringSafeRegion(MR, $(n_i, n_j)$).

Table 4: Summary of datasets.

| Attribute | Oldenburg | San Francisco | San Joaquin |
|---|---|---|---|
| Total no. of nodes | 6104 | 14732 | 18262 |
| Total no. of edges | 7034 | 14316 | 23876 |
| Percentage of directed edges | 30 | 30 | 30 |
| Total no. of objects | 5627 | 11453 | 19098 |
| Average no. of objects per edge | 0.8 | 0.8 | 0.8 |
| Total no. of words | 49517 | 103649 | 166153 |


Table 5: Experimental parameter settings.

| Parameter | Range |
|---|---|
| Number of results ($k$) | 5, 10, 15, 20, 25 |
| Number of keywords ($n$) | 1, 2, 3, 4, 5 |
| Query parameter ($\alpha$) | 0.01, 0.1, 1, 10, 100 |
| Dataset | Oldenburg, San Francisco, San Joaquin |
| Number of data objects ($N_D$) | 10, 20, 30, 40, 50 (x1000) |
| Speed of query objects ($V_{qry}$) | 25, 50, 75, 100, 125 (km/h) |
| Mobility ($M_{qry}$) | 20, 40, 60, 80, 100 |
| Ratio of directed edges ($E_{dir}$) | 10, 20, 30, 40, 50 |
| Ratio of updated edges ($E_{upd}$) | 15, 30, 60, 80, 100 |

8 GB of memory. In the experiments, we compared (1) query processing times, (2) edges processed, i.e., the number of edges processed for retrieving query results, and (3) index sizes. Table 5 summarizes the parameters used in the experiments. In each experiment, we varied a single parameter within the range shown in Table 5 while maintaining the other parameters at the bolded default values.

We evaluated the performance of the algorithms using the following measures: (1) the total amount of server CPU time, which indicates the query processing time, and (2) the total communication cost, measured as the total number of points (i.e., the location updates sent by query objects and the query results and safe exit points returned by the server) transferred between clients and the server. Battery power and wireless bandwidth consumption typically increase with the amount of data transferred between objects (clients) and servers. Thus, we used the amount of transferred data as a metric to evaluate the communication cost.

7.2. Experimental Results of Top-k Spatial Keyword Queries in Static Road Networks

7.2.1. Effect of k. Figure 9 shows the effect of the number of results on the query processing time and communication cost of both algorithms. Figure 9(a) indicates that the query processing time increases for both algorithms as the value of k increases. This is expected because, with an increase in k, more data objects need to be explored and verified. Nevertheless, COSK significantly outperforms CMTkSK+ for two main reasons. First, the search for relevant objects is very efficient when using the highest significance factor; second, COSK does not need to verify the set of answer objects as long as the query object lies in a safe region. On the other hand, the CMTkSK+ query processing time increases significantly because it has to monitor and verify the set of candidate objects periodically. In Figure 9(b), the communication costs of both algorithms increase as the number of result objects k increases. However, the proposed algorithm demonstrates superior performance compared to CMTkSK+ because client-server communication is not required while the query object lies within the safe region bounded by the safe exit points, whereas in CMTkSK+ the query object is required to report its location to the server whenever it moves.

7.2.2. Effect of $N_D$. This experiment was conducted on the San Joaquin dataset. This dataset included 19098 data objects; therefore, we randomly generated approximately 30000 additional data objects on different edges. In Figure 10, we evaluate the performance of COSK and CMTkSK+ by varying the cardinality of the data objects. Note that $N_D = 10K$ corresponds to a low density of data points, while $N_D = 50K$ corresponds to a high density. In Figure 10(a), it is interesting to notice that the query processing times of both algorithms decrease as the cardinality of the data objects increases. For CMTkSK+, this is because, with high density, the monitoring range of a query decreases. For COSK, however, it is mainly because, when the data density is high, fewer edges need to be expanded, which decreases the query processing time. In Figure 10(b), we study the influence of the cardinality of the data objects on the communication costs. The experimental results indicate that CMTkSK+ incurs almost constant communication costs regardless of the data object cardinality. However, the communication costs of COSK increase in proportion to the $N_D$ value. This is expected because the safe region becomes smaller as the density of the data objects increases, which increases the communication costs.

7.2.3. Effect of Query Keywords (n). Figure 11 shows the query processing time and communication cost of COSK and CMTkSK+ as a function of the number of query keywords. Figures 11(a) and 11(b) show that the performance of both algorithms degrades when the number of keywords increases. This is mainly because, as the number of query keywords increases, the number of relevant objects may also increase, resulting in a higher query processing time and communication cost. However, the safe-region-based algorithm COSK scales better than CMTkSK+ because of its less expensive monitoring technique.

7.2.4. Effect of $\alpha$. Figure 12 demonstrates the impact of the query parameter $\alpha$ on the query processing time and the communication cost. A small value of $\alpha$ indicates a greater importance of textual relevance, whereas a high value of $\alpha$ gives more preference to spatial relevance. It is interesting to note that the query processing time is lower for higher


Figure 9: Effect of k on query processing time and communication cost. (a) Query processing time (s) versus k. (b) Communication cost (number of messages transferred, logarithmic scale) versus k. Both panels compare COSK and CMTkSK+.

Figure 10: Effect of $N_D$ on query processing time and communication cost. (a) Query processing time (s) versus $N_D$. (b) Communication cost (number of transferred messages, logarithmic scale) versus $N_D$. Both panels compare COSK and CMTkSK+.

values of $\alpha$, which give more importance to spatial relevance. This is mainly because, when spatial relevance is weighted more heavily, fewer edges and objects need to be explored and processed to determine the top-k data objects. Observe that, in Figure 12(b), the number of messages sent by COSK decreases sharply with an increase in $\alpha$.

7.2.5. Effect of Speed. Figure 13(a) demonstrates the influence of the speed of the query objects on the query processing time of the COSK and CMTkSK+ algorithms. The experimental results indicate that the performance of CMTkSK+ is not significantly influenced by the speed of the query objects because the candidate objects must be monitored at regular time intervals regardless of the speed. On the other hand, the performance of COSK gradually degrades as the speed of the query objects increases because the objects leave their respective safe regions more frequently. Figure 13(b) shows the communication costs of COSK and CMTkSK+ with respect to the speed of the query objects. CMTkSK+ incurs almost constant communication costs because a server-initiated request to verify the candidate objects does not depend on the speed. For COSK, the query objects cross safe regions more frequently when the speed is high, which increases the communication costs.


Figure 11: Effect of number of keywords on query processing time and communication cost. (a) Query processing time (s) versus number of keywords. (b) Communication cost (number of messages transferred, logarithmic scale) versus number of keywords. Both panels compare COSK and CMTkSK+.

Figure 12: Effect of $\alpha$ on query processing time and communication cost. (a) Query processing time (s) versus $\alpha$. (b) Communication cost (number of messages transferred, logarithmic scale) versus $\alpha$. Both panels compare COSK and CMTkSK+.

7.2.6. Effect of Mobility. Figure 14 shows the effect of the mobility $M_{qry}$ (mobility refers to the percentage of query objects that are moving at any timestamp) on the performance of the COSK and CMTkSK+ algorithms. As expected, the query processing time and communication costs of both algorithms increase with $M_{qry}$. Nevertheless, COSK performs better than CMTkSK+ in terms of both query processing time and communication costs.

7.2.7. Effect of Directed Edges. Figure 15 shows the impact of the percentage of directed edges $E_{dir}$ on the performance of the COSK and CMTkSK+ algorithms. The query processing time increases with $E_{dir}$ because the algorithms need to explore more edges to retrieve the top-k results. However, the communication cost of both algorithms is not significantly affected by the value of $E_{dir}$.

7.2.8. Effect of Datasets. Figure 16 shows the index sizes of the COSK and CMTkSK+ approaches for different datasets. As shown in Figure 16, both algorithms have similar index sizes. However, COSK has a minor space overhead because it stores additional information on the highest significance factor $\theta_t$ of edges. More importantly, this space overhead is minimal compared to the gain achieved by COSK in query processing time and communication costs.


Figure 13: Effect of speed on query processing time and communication cost. (a) Query processing time (s) versus $V_{qry}$. (b) Communication cost (number of messages transferred, logarithmic scale) versus $V_{qry}$. Both panels compare COSK and CMTkSK+.

Figure 14: Effect of mobility on query processing time and communication cost. (a) Query processing time (s) versus $M_{qry}$. (b) Communication cost (number of messages transferred, logarithmic scale) versus $M_{qry}$. Both panels compare COSK and CMTkSK+.

7.3. Experimental Results of Top-k Spatial Keyword Queries in Dynamic Road Networks

In this section, we evaluate the performance of COSK and the basic algorithm for dynamic road networks. $E_{upd}$ indicates the percentage of all edges that change their weight at each timestamp. The length of an updated edge is randomly selected between 0.1 and 10 times the original length. Figure 17(a) depicts the query processing time of COSK and the basic algorithm. It is evident from the figure that the query processing time of the basic algorithm is not significantly affected by $E_{upd}$. This is mainly because the query objects issue top-k spatial keyword queries at each timestamp. However, the query processing time of COSK increases with the value of $E_{upd}$ because the probability that an updated edge is associated with the monitoring region of query q increases with $E_{upd}$. Therefore, when $E_{upd}$ becomes large, the results need to be updated frequently, which increases the query processing time. Figure 17(b) shows the communication costs of COSK and the basic algorithm with respect to $E_{upd}$. The basic algorithm incurs almost constant communication costs regardless of the value of $E_{upd}$. In contrast, the communication cost of COSK increases with $E_{upd}$ because the query result and safe regions need to be updated frequently.
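The edge-update workload described above can be mimicked with a few lines of Python. The sketch below is one possible reading of the setup, not the authors' generator: it assumes that the updated edges are chosen uniformly at random without replacement and that the scaling factor is drawn uniformly from [0.1, 10]; the function name `update_edge_weights` and the dictionary-based edge representation are hypothetical.

```python
import random


def update_edge_weights(original, ratio, rng, low=0.1, high=10.0):
    """Rescale a fraction `ratio` of the edges for one timestamp.

    original maps edge -> original length. Each selected edge receives a new
    weight between `low` and `high` times its original length. Returns the
    updated weights and the set of edges that were changed; `original` is
    left unmodified.
    """
    edges = list(original)
    count = round(len(edges) * ratio)
    chosen = rng.sample(edges, count)  # sampling without replacement
    updated = dict(original)
    for edge in chosen:
        updated[edge] = original[edge] * rng.uniform(low, high)
    return updated, set(chosen)


# Example: 20 edges of length 2.0 with E_upd = 15%.
original = {(i, i + 1): 2.0 for i in range(20)}
updated, chosen = update_edge_weights(original, 0.15, random.Random(0))
```

Passing an explicit `random.Random` instance keeps the workload reproducible across runs, which is convenient when the same update sequence has to be replayed against several algorithms.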


Figure 15: Effect of $E_{dir}$ on query processing time and communication cost. (a) Query processing time (s) versus $E_{dir}$. (b) Communication cost (number of messages transferred, logarithmic scale) versus $E_{dir}$.

Figure 16: Effect of dataset on index size. Index size (MB) of COSK and CMTkSK+ for the Oldenburg, San Francisco, and San Joaquin datasets.

8. Conclusion

In this paper, we investigated moving top-k spatial keyword queries in directed and dynamic road networks. We presented an efficient indexing framework using inverted files that indexes the data objects on edges, allowing for the effective searching of data objects relevant to queries in terms of both textual and spatial relevance. We also presented a safe-exit-based algorithm, called COSK, to monitor moving top-k spatial keyword queries. We demonstrated that the query results remain valid as long as the query object resides within a safe region. Furthermore, COSK can effectively monitor the validity of query results and safe regions in dynamic road networks. Finally, an experimental evaluation conducted on real road networks demonstrated that COSK significantly reduced the query processing time and communication costs compared to the CMTkSK+ algorithm.

Data Availability

The real road network data used in this study are also used in many previous studies. The road network data are cited in the manuscript and are available at https://www.cs.utah.edu/~lifeifei/SpatialDataset.htm. To simulate the moving queries, the authors used the spatiotemporal data generator, which is also used in previous studies. The research article describing the generator is cited in the manuscript. The documentation and source


Figure 17: Effect of $E_{upd}$ on query processing time and communication cost. (a) Query processing time (s) versus $E_{upd}$. (b) Communication cost (number of messages transferred, logarithmic scale) versus $E_{upd}$. Both panels compare COSK and the basic algorithm.

files of the generator are available at https://iapg.jade-hs.de/personen/brinkhoff/generator. The authors used Twitter tweets to generate the descriptions of the data objects and the query keywords. The tweets used are accessible at http://followthehashtag.com/datasets/free-twitter-dataset-usa-200000-free-usa-tweets.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

Hyung-Ju Cho was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean Government (MSIP) (NRF-2016R1A2B4009793), and this research was partially supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2016R1D1A1B03934129).

References

[1] D. Papadias, N. Mamoulis, J. Zhang, and Y. Tao, "Query processing in spatial network databases," in Proceedings of the 29th International Conference on Very Large Data Bases (VLDB '03), pp. 802–813, September 2003.

[2] H.-J. Cho, K. Ryu, and T.-S. Chung, "An efficient algorithm for computing safe exit points of moving range queries in directed road networks," Information Systems, vol. 41, pp. 1–19, 2014.

[3] G. Tsatsanifos and A. Vlachou, "On processing Top-k spatio-textual preference queries," in Proceedings of the 18th International Conference on Extending Database Technology (EDBT '15), pp. 433–444, March 2015.

[4] R. Li, A. X. Liu, A. L. Wang, and B. Bruhadeshwar, "Fast range query processing with strong privacy protection for cloud computing," Proceedings of the VLDB Endowment, vol. 7, no. 14, pp. 1953–1964, 2014.

[5] G. Cong, C. S. Jensen, and D. Wu, "Efficient retrieval of the Top-k most relevant spatial web objects," Proceedings of the VLDB Endowment, vol. 2, no. 1, pp. 337–348, 2009.

[6] Z. Li, K. C. K. Lee, B. Zheng, W.-C. Lee, D. Lee, and X. Wang, "IR-tree: An efficient index for geographic document search," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 4, pp. 585–599, 2011.

[7] Y. Zhou, X. Xie, C. Wang, Y. Gong, and W. Ma, "Hybrid index structures for location-based web search," in Proceedings of the 14th ACM International Conference on Information and Knowledge Management, pp. 155–162, Bremen, Germany, October 2005.

[8] J. Zobel and A. Moffat, "Inverted files for text search engines," ACM Computing Surveys, vol. 38, no. 2, 2006.

[9] N. Beckmann, H. Kriegel, R. Schneider, and B. Seeger, "The R*-tree: an efficient and robust access method for points and rectangles," in Proceedings of the ACM SIGMOD International Conference on Management of Data, vol. 19, pp. 322–331, May 1990.

[10] R. Hariharan, B. Hore, C. Li, and S. Mehrotra, "Processing spatial-keyword (sk) queries in geographic information retrieval (gir) systems," in Proceedings of the 19th International Conference on Scientific and Statistical Database Management (SSDBM '07), July 2007.

[11] I. De Felipe, V. Hristidis, and N. Rishe, "Keyword search on spatial databases," in Proceedings of the 24th International Conference on Data Engineering (ICDE '08), pp. 656–665, April 2008.

[12] J. B. Rocha-Junior, O. Gkorgkas, S. Jonassen, and K. Nørvåg, "Efficient processing of top-k spatial keyword queries," in Proceedings of the International Symposium on Spatial and Temporal Databases, pp. 205–222, Springer, 2011.

[13] D. Zhang, K.-L. Tan, and A. K. Tung, "Scalable top-k spatial keyword search," in Proceedings of the 16th International Conference on Extending Database Technology, pp. 359–370, 2013.

[14] J. B. Rocha-Junior and K. Nørvåg, "Top-k spatial keyword queries on road networks," in Proceedings of the 15th International Conference on Extending Database Technology, pp. 168–179, Berlin, Germany, March 2012.

[15] H.-J. Cho, S. J. Kwon, and T.-S. Chung, "A safe exit algorithm for continuous nearest neighbor monitoring in road networks," Mobile Information Systems, vol. 9, no. 1, pp. 37–53, 2013.

[16] D. Yung, M. L. Yiu, and E. Lo, "A safe-exit approach for efficient network-based moving range queries," Data & Knowledge Engineering, vol. 72, pp. 126–147, 2012.

[17] M. Attique, H. Cho, R. Jin, and T. Chung, "Efficient Processing of Continuous Reverse k Nearest Neighbor on Moving Objects in Road Networks," ISPRS International Journal of Geo-Information, vol. 5, no. 12, p. 247, 2016.

[18] H. G. Elmongui, M. F. Mokbel, and W. G. Aref, "Continuous aggregate nearest neighbor queries," GeoInformatica, vol. 17, no. 1, pp. 63–95, 2013.

[19] D. Wu, M. L. Yiu, C. S. Jensen, and G. Cong, "Efficient continuously moving top-k spatial keyword query processing," in Proceedings of the IEEE International Conference on Data Engineering (ICDE '11), pp. 541–552, Hannover, Germany, April 2011.

[20] W. Huang, G. Li, K.-L. Tan, and J. Feng, "Efficient safe-region construction for moving top-k spatial keyword queries," in Proceedings of the 21st ACM International Conference on Information and Knowledge Management, pp. 932–941, 2012.

[21] L. Guo, J. Shao, H. H. Aung, and K.-L. Tan, "Efficient continuous top-k spatial keyword queries on road networks," GeoInformatica, vol. 19, no. 1, pp. 29–60, 2014.

[22] Y. Li, G. Li, L. Shu, Q. Huang, and H. Jiang, "Continuous monitoring of top-k spatial keyword queries in road networks," Journal of Information Science and Engineering, vol. 31, no. 6, pp. 1831–1848, 2015.

[23] M. Attique, A. Khan, and T.-S. Chung, "ESPAK: Top-k spatial keyword query processing in directed road networks," in Proceedings of the Workshops of the International Conference on Extending Database Technology and the International Conference on Database Theory (EDBT/ICDT '17), March 2017.

[24] G. Salton and C. Buckley, "Term-weighting approaches in automatic text retrieval," Information Processing & Management, vol. 24, no. 5, pp. 513–523, 1988.

[25] V. N. Anh, O. de Kretser, and A. Moffat, "Vector-space ranking with effective early termination," in Proceedings of the 24th Annual International ACM SIGIR Conference, pp. 35–42, New Orleans, LA, USA, 2001.

[26] E. W. Dijkstra, "A note on two problems in connexion with graphs," Numerische Mathematik, vol. 1, pp. 269–271, 1959.

[27] "Real datasets for spatial databases," https://www.cs.utah.edu/~lifeifei/SpatialDataset.htm.

[28] "Twitter," https://twitter.com.

[29] T. Brinkhoff, "A framework for generating network-based moving objects," GeoInformatica, vol. 6, no. 2, pp. 153–180, 2002.


Page 7: Efficient Processing of Moving Top- Spatial Keyword Queries ...downloads.hindawi.com/journals/wcmc/2018/7373286.pdfTop-k spatial keyword queries in road networks were introduced by

Wireless Communications and Mobile Computing 7

Figure 3: Illustration of directed road network. (Annotated steps: q issues a TkSK query at $p_1$; the server returns a set of objects for $p_1$.)

Figure 4: Illustration of directed road network. (Annotated steps: q issues a TkSK query at $p_2$; the server returns a set of objects for $p_2$.)

5. Moving Top-k Spatial Keyword Queries

In this section, we present our method for monitoring moving top-k spatial keyword queries, where query objects move in a directed road network. Figure 3 provides an example of a TkSK query in a road network, where query object q issues a TkSK query at point $p_1$. Note that the numbers on the arrows in the figure indicate the order of the steps. To obtain the top-k results at $p_1$, the server executes Algorithm 1, as described in Section 4.2. Now suppose that the query object moves to $p_2$, as shown in Figure 4, and needs the top-k results at $p_2$. The simple method is to repeat the procedure executed at $p_1$. However, recomputing the results whenever query q changes its location significantly increases the computation cost. It also increases the communication overhead, because the query object must report its location whenever it moves and the server must send back the result set. To address these issues, we introduce the safe exit approach.

In the proposed framework, the server computes safe exit points for each query object. The server maintains a set of moving queries, and a query result remains valid as long as the query object stays within its safe exit points. Whenever a query object passes one of its safe exit points, the server recomputes the TkSK result and the safe exit points for that query object.
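The saving promised by this framework can be illustrated with a minimal sketch. It assumes a hypothetical one-dimensional safe region (an interval of offsets along a single road segment) and a stand-in `toy_server` function; neither is part of COSK, and the positions are made-up values. The sketch only counts how often a moving query has to contact the server.

```python
from dataclasses import dataclass


@dataclass
class SafeRegion:
    """Hypothetical safe region: an interval of offsets on one road segment."""
    lo: float
    hi: float

    def contains(self, offset: float) -> bool:
        return self.lo <= offset <= self.hi


def count_server_contacts(trajectory, compute_region):
    """Count server contacts for a query moving along `trajectory`.

    `compute_region` stands in for the server-side work (top-k result plus
    safe exit points) and maps an offset to a SafeRegion.
    """
    contacts = 0
    region = None
    for offset in trajectory:
        # The client reports its location only after leaving the safe region.
        if region is None or not region.contains(offset):
            region = compute_region(offset)
            contacts += 1
    return contacts


def toy_server(offset):
    # Assumption for illustration: safe exit points at every multiple of 2.0.
    lo = (offset // 2.0) * 2.0
    return SafeRegion(lo, lo + 2.0)


trajectory = [0.5, 1.0, 1.5, 2.5, 3.0, 3.5, 4.5]
contacts = count_server_contacts(trajectory, toy_server)
print(contacts, "contacts for", len(trajectory), "location changes")
```

In this toy run the query contacts the server three times, whereas recomputation at every location change would require seven contacts.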

Next, we present our method for computing the safe exit points of a query object. A safe exit point is a point on a segment where a safe region and a nonsafe region meet. We compute the safe exit points using a divide-and-conquer technique. Before presenting the detailed methodology, we define the terminology used in this section.

Definition 1 (safe region). A safe region is a portion of a road segment such that, as long as the query point lies in it, the top-k results of the query remain valid.

Definition 2 (answer objects, $D^+$). A data object d is called an answer object of query q if $\psi(d) > \psi(d_a)$, where $d_a$ represents any other data object in the directed road network. This definition generalizes to TkSK queries: a data object d is called an answer object of query q if $\psi(d) > \psi(d_{k+1})$, where $d_{k+1}$ represents the $(k+1)$th data object in the directed road network. In other words, the answer objects are the top-k results of query q.

Definition 3 (nonanswer objects, $D^-$). A data object d is called a nonanswer object of query q if $\psi(d) < \psi(d_a)$ for some other data object $d_a$ in the directed road network. This definition generalizes to TkSK queries: a data object d is called a nonanswer object of query q if $\psi(d) < \psi(d_k)$, where $d_k$ represents the kth data object in the directed road network. Because all answer objects are top-k results of query q, none of the nonanswer objects are in the top-k results of query q.

Definition 4 (lowest answer object, $D^+_l$). An answer object $d^+_l \in D^+$ is called the lowest answer object at a point $p \in G$ if $\psi(d^+_l)_p = \min(\psi(d^+_1)_p, \psi(d^+_2)_p, \ldots, \psi(d^+_{|D^+|})_p)$, where $\psi(d^+_l)_p$ represents the score of the lowest answer object at point p. In other words, $\psi(d^+_l)_p \le \psi(d^+_a)_p$ at point p, where $d^+_a$ is any other answer object in $D^+$.

Definition 5 (highest nonanswer object, $D^-_h$). A nonanswer object $d^-_h \in D^-$ is called the highest nonanswer object at a point $p \in G$ if $\psi(d^-_h)_p = \max(\psi(d^-_1)_p, \psi(d^-_2)_p, \ldots, \psi(d^-_{|D^-|})_p)$, where $\psi(d^-_h)_p$ represents the score of the highest nonanswer object at point p. In other words, $\psi(d^-_h)_p \ge \psi(d^-_a)_p$ at point p, where $d^-_a$ is any other nonanswer object in $D^-$.
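Definitions 2 to 5 can be made concrete with a short sketch. It assumes the score form $\psi(d) = \mu / (1 + \alpha \cdot \lambda)$ that appears in the score functions of the worked example, where $\mu$ is the textual relevance and $\lambda$ the network distance. The object names and the numeric values below are hypothetical and chosen only to avoid ties.

```python
def score(mu, dist, alpha=1.0):
    """psi = mu / (1 + alpha * dist); assumed form of the ranking function."""
    return mu / (1.0 + alpha * dist)


def split_answers(objects, k, alpha=1.0):
    """Split objects into answer and nonanswer objects at one query point.

    `objects` maps an object name to (textual relevance, network distance).
    Returns (answers, nonanswers, lowest_answer, highest_nonanswer).
    """
    ranked = sorted(objects, key=lambda d: score(*objects[d], alpha), reverse=True)
    answers, nonanswers = ranked[:k], ranked[k:]
    lowest_answer = answers[-1] if answers else None            # d_l^+
    highest_nonanswer = nonanswers[0] if nonanswers else None   # d_h^-
    return answers, nonanswers, lowest_answer, highest_nonanswer


# Hypothetical objects: (textual relevance, distance from the query point).
objects = {"d3": (1.0, 3.0), "d6": (1.0, 5.0), "d4": (0.4, 1.0), "d2": (0.0, 2.0)}
answers, nonanswers, d_low, d_high = split_answers(objects, k=2)
print(answers, nonanswers, d_low, d_high)
```

The result set changes exactly when the score of `d_high` overtakes that of `d_low`, which is why only these two objects need to be tracked.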

As discussed earlier, the main challenge in the continuous processing of moving TkSK queries is maintaining the validity of the result set, because the movement of a query object can invalidate it. To monitor the validity of the result set, we propose a safe-region-based approach.

5.1. Computation of Safe Exit Points. In this section, we present our technique for computing the safe exit points. The main goal is to find the points in the road network where the query result set changes. The result set changes when the score of the highest nonanswer object $D^-_h$ surpasses the score of the lowest answer object $D^+_l$. The textual relevance score of a data object generally does not change. Therefore, the score of a data object changes only through its spatial relevance score, which in turn changes only when the query object moves. The computation of the safe exit points is based on two key observations.

Observation 1. If $D^+_{n_\beta} = D^+_{p_a}$, there is no safe exit point in the segment.

Explanation. $D^+_{p_a}$ represents the set of answer objects at anchor point $p_a$, whereas $D^+_{n_\beta}$ represents the set of answer objects at boundary node $n_\beta$. As discussed earlier, a safe exit point is a point where the query result changes. If the query result at the starting point of a segment (edge) is the same as that at its ending point, no point exists on the segment where the query result changes. Hence, we do not search for a safe exit point in that segment.

Observation 2. If $D^+_{p_a} \neq D^+_{n_\beta}$, there is a safe exit point in the segment.

Explanation. In contrast to Observation 1, if the query results differ at the starting and ending points, then there exists a point where the query result changes. Hence, there is a safe exit point in the segment.
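The two observations reduce to a comparison of the answer sets at the two ends of a segment. The sketch below applies that test to the endpoint answer sets listed in Table 3 for the top-1 example; the function name `has_safe_exit` and the string labels for the segments are illustrative choices, not names from the paper.

```python
def has_safe_exit(answers_at_anchor, answers_at_boundary):
    """Observations 1 and 2: search a segment only if its endpoint answers differ."""
    return set(answers_at_anchor) != set(answers_at_boundary)


# (segment, answer set at p_a, answer set at n_beta), as listed in Table 3.
segments = [
    ("(q, n3)", {"d3"}, {"d3"}),
    ("(n3, n4)", {"d3"}, {"d3"}),
    ("(n3, n7)", {"d3"}, {"d6"}),
    ("(n3, n5)", {"d3"}, {"d3"}),
    ("(n6, n5)", {"d3"}, {"d4"}),
]

# Segments that must be searched for a safe exit point.
to_search = [name for name, at_pa, at_nb in segments if has_safe_exit(at_pa, at_nb)]
print(to_search)
```

Only the two segments whose endpoint answers differ are handed to the safe exit computation; the other three are skipped without further work.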

To find the safe region, we consider the following cases.

Case 1 (when $\alpha = 1$ and the textual relevance of the highest nonanswer object and the lowest answer object is the same). In this case, the textual and spatial relevance have the same importance (i.e., $\alpha = 1$). In addition, the top-k result depends only on the spatial relevance, because the textual relevance of both objects is the same. The data object that is closer to query point q becomes the answer object. For an undirected edge, the safe exit point $p_{se}$ is the center point between the lowest answer object and the highest nonanswer object, i.e., the point at which

$$\max(dist(p_{se}, d^+_1), dist(p_{se}, d^+_2), \ldots, dist(p_{se}, d^+_{|D^+|})) = \min(dist(p_{se}, d^-_1), dist(p_{se}, d^-_2), \ldots, dist(p_{se}, d^-_{|D^-|})).$$

However, for a directed edge, where $dist(p_a, n_\beta) \neq dist(n_\beta, p_a)$, the safe exit point is either $d^+_l$ or $p_a$: if $d^+_l \in (p_a, n_\beta)$, the safe exit point is $d^+_l$; otherwise, the safe exit point is $p_a$.

Case 2 (when $\alpha = 1$ and the textual relevance of the highest nonanswer object and the lowest answer object is different). In this case, the top-k result depends on all factors, namely, $\alpha$, the spatial relevance, and the textual relevance. Clearly, for undirected edges, the midpoint between the lowest answer object and the highest nonanswer object does not provide a valid safe exit point. Therefore, we introduce a divide-and-conquer technique, which keeps dividing the search space until it reaches the point where the score of the nonanswer object becomes greater than that of the answer object. Typically, the safe exit point is closer to the data object whose score is lower. Based on this observation, we first compute the midpoint in a similar fashion to Case 1 and then continue dividing the search space until we find the point. For directed edges, the safe exit point can be computed in a similar fashion to Case 1.

Case 2 also covers the other situations in which the safe exit point is not the midpoint between the lowest answer object and the highest nonanswer object. In these situations, the safe exit point depends on two or more factors, so it can be computed using the aforementioned divide-and-conquer technique. The following scenarios are handled in the same way as Case 2:

(a) When $\alpha \neq 1$ and the textual relevance of the nearest nonanswer object and the farthest answer object is different

(b) When $\alpha \neq 1$ and the textual relevance of the nearest nonanswer object and the farthest answer object is the same

Case 3 (when $\alpha = 0$). This means that the spatial relevance has no effect on the score of the data objects. Hence, no monitoring is required in this scenario.
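The divide-and-conquer search of Case 2 can be sketched as a bisection over the offset along an undirected segment. This is only an illustration under stated assumptions: the score form $\psi = \mu / (1 + \alpha \cdot \lambda)$, a single crossing of the two score curves on the segment, and made-up values for the textual relevance and distances of the two objects.

```python
def score(mu, dist, alpha=1.0):
    """psi = mu / (1 + alpha * dist); assumed form of the ranking function."""
    return mu / (1.0 + alpha * dist)


def safe_exit_by_bisection(score_answer, score_nonanswer, lo, hi, tol=1e-9):
    """Halve [lo, hi] until the point where the nonanswer overtakes the answer.

    Assumes the answer object wins at `lo`, the nonanswer object wins at `hi`,
    and the two score curves cross exactly once in between.
    """
    if score_nonanswer(lo) > score_answer(lo) or score_nonanswer(hi) <= score_answer(hi):
        raise ValueError("the result does not change between lo and hi")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if score_nonanswer(mid) > score_answer(mid):
            hi = mid  # the result has already changed at mid
        else:
            lo = mid  # the result is still valid at mid
    return (lo + hi) / 2.0


# Hypothetical segment of length 3, offset x measured from p_a, alpha = 1.
# The answer object (mu = 1.0) recedes as x grows; the nonanswer (mu = 0.8) approaches.
answer = lambda x: score(1.0, 3.0 + x)
nonanswer = lambda x: score(0.8, 5.0 - x)
x_se = safe_exit_by_bisection(answer, nonanswer, 0.0, 3.0)
print(round(x_se, 4))
```

With equal textual relevance the crossing would sit at the distance midpoint $x = 1$; with the lower relevance assumed here for the nonanswer object, the safe exit point moves to roughly $x = 1.56$, that is, closer to the object with the lower score.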

Algorithm 3 retrieves the safe exit points using the observations discussed earlier. The core function in this algorithm is ComputeSafeExit($p_a$, $n_\beta$), which finds the safe exit point in the segment between $p_a$ and $n_\beta$ and is described in detail in Algorithm 4. First, Algorithm 4 determines $d^+_l$ and $d^-_h$ at each point $p \in [p_a, n_\beta]$. Recall that $d^+_l$ is the lowest answer object at p, whereas $d^-_h$ is the highest nonanswer object at p. Algorithm 4 then computes the safe exit point based on the cases discussed earlier. Cases 1 and 2 each have two further scenarios. For Case 1, if $dist(p_a, n_\beta) = dist(n_\beta, p_a)$, the safe exit point is the midpoint between $d^+_l$ and $d^-_h$. If $dist(p_a, n_\beta) \neq dist(n_\beta, p_a)$, the edge is directed, and the safe exit point is therefore either $p_a$ or $d^+_l$: if $d^+_l$ lies on the edge $[p_a, n_\beta]$, then $d^+_l$ is the safe exit point; otherwise, $p_a$ is the safe exit point.

Similarly, for Case 2, if $dist(p_a, n_\beta) = dist(n_\beta, p_a)$, the safe exit point is computed by repeatedly halving the search space until we find the closest point such that $\psi(d^-_h) > \psi(d^+_l)$. If $dist(p_a, n_\beta) \neq dist(n_\beta, p_a)$, the safe exit point is computed in the same way as in the directed scenario of Case 1.

5.2. Computation of Safe Exit Points for Example. Consider the same example in Figure 1, where the query point q issues a top-1 keyword query with $q.t$ = "Italian restaurant". For this example, let $\alpha = 1$. The monitoring algorithm starts exploring from the active edge containing the query object q. Therefore, $\overrightarrow{(q, n_3)}$ is explored first. As shown in Table 3, for $\overrightarrow{(q, n_3)}$, $D^+_q = d_3$ and $D^+_{n_3} = d_3$. According to Observation 1, no safe exit point exists in this segment. Therefore, the edges adjacent to $n_3$ are explored, and $n_3$ becomes the new $p_a$. The edge $(n_3, n_4)$ is explored next. The answer object at $n_3$ and $n_4$ is again the same, $D^+_{n_3} = D^+_{n_4} = d_3$. Therefore, a safe exit point does not exist in $(n_3, n_4)$. The edge $(n_3, n_7)$ is explored next. As shown in Table 3, $D^+_{n_3} = d_3$ and $D^+_{n_7} = d_6$. By Observation 2, there is a safe exit point in $(n_3, n_7)$. As shown in Figure 1, $d_3.t = d_6.t$ = "Italian Restaurant" and $dist(n_3, n_7) = dist(n_7, n_3)$.


Input: same as Algorithm 1
Output: $P_{SE}$, a set of safe exit points
$P_{SE} \leftarrow \emptyset$ /* set of safe exit points */
$D^+_{p_a} \leftarrow$ EvaluateSnapshotQuery($p_a$, $(p_a, n_\beta)$) /* results calculated using Algorithm 1 */
$D^+_{n_\beta} \leftarrow$ EvaluateSnapshotQuery($n_\beta$, $(p_a, n_\beta)$) /* results calculated using Algorithm 1 */
if $D^+_{p_a} = D^+_{n_\beta}$ then
    no safe exit point /* refer to Observation 1 */
end
if $D^+_{p_a} \neq D^+_{n_\beta}$ then
    $P_{SE} \leftarrow P_{SE} \cup$ ComputeSafeExit($p_a$, $n_\beta$) /* safe exit point exists; refer to Observation 2 */
end
return $P_{SE}$

Algorithm 3: COSK monitoring algorithm.

Input: same as Algorithm 1
Output: $p_{se}$, the safe exit point in $(p_a, n_\beta)$
$D^+_l \leftarrow \{\langle p, d^+_l \rangle \mid p \in [p_a, n_\beta]\}$, where $d^+_l$ is such that $\psi(d^+_l)_p = \min(\psi(d^+_1)_p, \psi(d^+_2)_p, \ldots, \psi(d^+_{|D^+|})_p)$
$D^-_h \leftarrow \{\langle p, d^-_h \rangle \mid p \in [p_a, n_\beta]\}$, where $d^-_h$ is such that $\psi(d^-_h)_p = \max(\psi(d^-_1)_p, \psi(d^-_2)_p, \ldots, \psi(d^-_{|D^-|})_p)$
if Case 1 then
    if $dist(p_a, n_\beta) = dist(n_\beta, p_a)$ then
        $p_{se} \leftarrow$ the point $se$ such that $\max(dist(se, d^+_1), dist(se, d^+_2), \ldots, dist(se, d^+_{|D^+|})) = \min(dist(se, d^-_1), dist(se, d^-_2), \ldots, dist(se, d^-_{|D^-|}))$
    end
    if $dist(p_a, n_\beta) \neq dist(n_\beta, p_a)$ then
        $p_{se} \leftarrow d^+_l$ if $d^+_l \in (p_a, n_\beta)$; otherwise $p_{se} \leftarrow p_a$
    end
end
if Case 2 then
    if $dist(p_a, n_\beta) = dist(n_\beta, p_a)$ then
        $p_{se} \leftarrow$ the closest point to $p_a$ such that $\psi(d^-_h) > \psi(d^+_l)$
    end
    if $dist(p_a, n_\beta) \neq dist(n_\beta, p_a)$ then
        $p_{se} \leftarrow d^+_l$ if $d^+_l \in (p_a, n_\beta)$; otherwise $p_{se} \leftarrow p_a$ /* same as the directed scenario of Case 1 */
    end
end
return $p_{se}$

Algorithm 4: ComputeSafeExit($p_a$, $n_\beta$).

Therefore, according to Case 1, the safe exit point $p_{se1}$ is the midpoint between $d_3$ and $d_6$. That is, $dist(p_{se1}, d_3) = dist(p_{se1}, d_6)$, where $dist(p_{se1}, d_3) = x + 3$ and $dist(p_{se1}, d_6) = -x + 5$ for $0 < x < 3$. Consequently, $x = 1$, which means that the distance from $n_3$ to $p_{se1}$ is 1.
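For completeness, the midpoint computation can be written out step by step. The last line is an added check that assumes the score form $\psi = \mu / (1 + \alpha \cdot \lambda)$ with $\mu = 1$ for both objects and $\alpha = 1$, as in this example.

```latex
\begin{aligned}
dist(p_{se1}, d_3) &= dist(p_{se1}, d_6) \\
x + 3 &= -x + 5 \\
2x &= 2 \;\Longrightarrow\; x = 1 \\
\psi(d_3)_{p_{se1}} = \frac{1}{1 + (1 + 3)} &= \frac{1}{5} = \frac{1}{1 + (-1 + 5)} = \psi(d_6)_{p_{se1}}
\end{aligned}
```

Both objects are at distance 4 from $p_{se1}$, so their scores coincide there, and $d_6$ overtakes $d_3$ once the query moves past this point toward $n_7$.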

Next, we determine whether a safe exit point exists in $(n_3, n_5)$. As shown in Table 3, the answer object at $n_5$ is the same as that at $n_3$. Hence, no safe exit point exists in this edge. Next, $\overleftarrow{(n_6, n_5)}$ is explored with $p_a = n_5$. According to Table 3, $D^+_{n_6} = d_4$ and $D^+_{n_5} = d_3$. Therefore, a safe exit point exists in this edge. This edge is directed, and for each point $p \in \overleftarrow{(n_6, n_5)}$, the shortest path from p to $d_3$ is $p \to n_6 \to n_2 \to n_3 \to d_3$. Therefore, $n_5$ is the safe exit point.
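The reason a directed edge behaves differently is that network distance is no longer symmetric. The following sketch runs Dijkstra's algorithm [26] on a small hypothetical directed graph (nodes `a`, `b`, `c` and their weights are invented, not taken from Figure 1) and shows that the distance from one point to another can differ from the distance back.

```python
import heapq


def dijkstra(graph, source):
    """Shortest network distances from `source` in a directed, weighted graph.

    `graph` maps a node to a list of (neighbor, weight) pairs for outgoing edges.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, ()):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Hypothetical network: a -> b and c -> a are one-way; b <-> c is two-way.
graph = {
    "a": [("b", 1.0)],
    "b": [("c", 1.0)],
    "c": [("b", 1.0), ("a", 5.0)],
}
forward = dijkstra(graph, "a")["b"]   # a -> b
backward = dijkstra(graph, "b")["a"]  # b -> c -> a
print(forward, backward)
```

Here the forward distance is 1 while the return trip costs 6, so a query that passes the start of a one-way edge cannot simply turn back, which is why the example above places the safe exit point at $n_5$.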

The bold lines in Figure 5 indicate the safe region of q. The top-1 result remains $d_3$ as long as the query q lies in the safe region.

Next, we analyze the time complexity of determining a set of safe exit points using a set of qualifying objects $d \in D^+_{p_a} \cup D^+_{n_\beta} \cup D(p_a, n_\beta)$. Note that $D^+_{p_a}$ ($D^+_{n_\beta}$) indicates the set of k data objects that satisfies the query condition at $p_a$ ($n_\beta$). According to Dijkstra's algorithm [26], the time complexity $O(D^+_q)$ of computing a set of answer objects at a query point q is $O(D^+_q) = O(|E| + |N| \log |N|)$. This means that $O(D^+_{p_a}) = O(D^+_{n_\beta}) = O(|E| + |N| \log |N|)$ holds for endpoints $p_a$ and $n_\beta$. Thus, the time complexity $O(\Omega_{kth})$ of determining the skyline $\Omega_{kth}$ with the k-th highest score is $O(\Omega_{kth}) = C_{kth} \cdot O(|D^+_{p_a} \cup D^+_{n_\beta} \cup D(p_a, n_\beta)|)$, where $C_{kth}$ is the number of qualifying objects that participate in the constitution of the skyline with the k-th highest score. Therefore, the time complexity of determining a safe exit point coincides with the time complexity of determining the two skylines, i.e., the skyline $D^+_l$ with the k-th highest (or lowest) score for answer objects and the skyline $D^-_h$ with the highest score for nonanswer objects. This is because the safe exit point is found at the cross point between these skylines.

Table 3: Computation of safe exit points for example scenario.

| Edge/Segment | $p_a$ | $D^+_{p_a}$ | $D^+_{n_\beta}$ | $p_{se}$ |
|---|---|---|---|---|
| $\overrightarrow{(q, n_3)}$ | $q$ | $D^+_q = d_3$ | $D^+_{n_3} = d_3$ | none |
| $(n_3, n_4)$ | $n_3$ | $D^+_{n_3} = d_3$ | $D^+_{n_4} = d_3$ | none |
| $(n_3, n_7)$ | $n_3$ | $D^+_{n_3} = d_3$ | $D^+_{n_7} = d_6$ | $p_{se1}$ |
| $\overrightarrow{(n_3, n_5)}$ | $n_3$ | $D^+_{n_3} = d_3$ | $D^+_{n_5} = d_3$ | none |
| $\overleftarrow{(n_6, n_5)}$ | $n_5$ | $D^+_{n_5} = d_3$ | $D^+_{n_6} = d_4$ | $p_{se2}$ |

Figure 5: Illustration of safe region of q. (The figure shows nodes $n_1$ to $n_7$, the query point q, the safe exit points $p_{se1}$ and $p_{se2}$, and the data objects $d_1$ (Grand Hotel), $d_2$ (Cafe), $d_3$ (Italian Restaurant), $d_4$ (Chinese Restaurant), $d_5$ (Pub and Bar), $d_6$ (Italian Restaurant), and $d_7$ (Cafe and Bakery).)

Figure 6 represents the skyline graph for $k = 1$ on the edge $(n_7, n_3)$. Let us draw the score functions of $d_3$ and $d_6$ for the road segment $(n_7, n_3)$, where a safe exit point exists because $D^+_{n_3} = d_3$ and $D^+_{n_7} = d_6$ for $k = 1$. For each point $p \in (n_7, n_3)$, the distance between $d_3$ and point p can be represented as $dist(d_3, p) = dist(d_3, n_3) + len(n_3, p) = 6 - len(n_7, p)$. Similarly, for each point $p \in (n_7, n_3)$, the distance between $d_6$ and point p can be represented as $dist(d_6, p) = dist(d_6, n_7) + len(n_7, p) = 2 + len(n_7, p)$. Let $len(n_7, p)$ be a variable x ($0 \le x \le 3$). We can write $\lambda(d_3, p) = dist(d_3, p) = 6 - x$ and $\lambda(d_6, p) = dist(d_6, p) = 2 + x$. Then, we can represent the score functions $\psi(d_3)$ and $\psi(d_6)$ as follows:

$$\psi(d_3) = \frac{\mu(d_3.t, q.t)}{1 + \alpha \cdot \lambda(d_3, p)} = \frac{1}{7 - x} \quad \text{for } 0 \le x \le 3,$$

$$\psi(d_6) = \frac{\mu(d_6.t, q.t)}{1 + \alpha \cdot \lambda(d_6, p)} = \frac{1}{3 + x} \quad \text{for } 0 \le x \le 3.$$

Figure 6: Skyline graph for $k = 1$ on the road segment $(n_7, n_3)$. (Horizontal axis: distance, from $n_7$ to $n_3$; vertical axis: score; plotted curves: $\psi(d_6) = 1/(x + 3)$ and $\psi(d_3) = 1/(-x + 7)$; $p_{se1}$ is marked on the distance axis.)

Finally, we present a lemma to prove that the safe exit points computed by COSK are correct.

Lemma 8. The COSK algorithm correctly computes a set of safe exit points.

Proof. We prove the correctness of the COSK algorithm by contradiction. Assume that $D^+_{p_a} \neq D^+_{n_\beta}$ but there is no safe exit point in the road segment $(p_a, n_\beta)$. This means that, for each point p in the road segment $(p_a, n_\beta)$, the query result at p equals $D^+_{p_a}$, i.e., $D^+_p = D^+_{p_a}$ for all $p \in (p_a, n_\beta)$. However, taking $p = n_\beta$ yields $D^+_{n_\beta} = D^+_{p_a}$, which contradicts the assumption. Therefore, if $D^+_{p_a} \neq D^+_{n_\beta}$, a safe exit point exists in $(p_a, n_\beta)$. In addition, when $D^+_{p_a} \neq D^+_{n_\beta}$, a safe exit point is determined using the skyline $D^+_l$ for answer objects and the skyline $D^-_h$ with the highest score for nonanswer objects. The first skyline is a composite polyline drawn from the answer objects in $D^+_{p_a}$. The second skyline is a composite polyline drawn from the nonanswer objects in $D^+_{n_\beta} \cup D(p_a, n_\beta) - D^+_{p_a}$.
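The crossing of the two skylines that the proof relies on can be checked for the segment $(n_7, n_3)$ of the running example, where $\lambda(d_3, p) = 6 - x$ and $\lambda(d_6, p) = 2 + x$. The sketch below solves for the crossing in closed form; it assumes that each distance is linear in the offset x and that the score has the form $\mu / (1 + \alpha \cdot \lambda)$, and the helper name `crossing_offset` is illustrative.

```python
def crossing_offset(mu_a, c_a, s_a, mu_b, c_b, s_b, alpha=1.0):
    """Offset x at which two score curves meet on a segment.

    Curve i is mu_i / (1 + alpha * (c_i + s_i * x)), i.e. the distance to
    object i is c_i at x = 0 and changes with slope s_i (+1 or -1) along x.
    Cross-multiplying the two scores gives a linear equation in x.
    """
    numerator = mu_b * (1.0 + alpha * c_a) - mu_a * (1.0 + alpha * c_b)
    denominator = alpha * (mu_a * s_b - mu_b * s_a)
    if denominator == 0:
        raise ValueError("the score curves are parallel and never cross")
    return numerator / denominator


# Segment (n7, n3) with x = len(n7, p), alpha = 1, and mu = 1 for both objects:
# dist(d3, p) = 6 - x and dist(d6, p) = 2 + x.
x = crossing_offset(1.0, 6.0, -1.0, 1.0, 2.0, 1.0)
distance_from_n3 = 3.0 - x
print(x, distance_from_n3)
```

The curves meet at $x = 2$ measured from $n_7$, that is, at distance 1 from $n_3$, which agrees with the position of $p_{se1}$ obtained in the worked example.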

6. Monitoring Query Results and Safe Regions in Dynamic Directed Road Networks

In this section, we discuss the monitoring of spatial keyword queries in dynamic road networks, where the network distance changes depending on the traffic conditions. Updates to the weights of some edges may invalidate the query results or the safe region of q even though the query object q remains within its safe region. Figure 7 illustrates an example in which the weights of edges $\overleftarrow{(n_1, n_2)}$ and $\overleftarrow{(n_1, n_6)}$ change. For convenience, we consider $\alpha = 1$ and $q.t$ = "Italian restaurant". In Figure 7(a), the top-1 result is $d_1$, and the bold lines show the safe region of query q. Now consider that, at time $t_j$, the weights of the two edges $\overleftarrow{(n_1, n_2)}$ and $\overleftarrow{(n_1, n_6)}$ change owing to heavy traffic, as shown in Figure 7(b). Such updates may invalidate the query result or the safe region of q. Therefore, it is necessary to monitor the validity of the results and the safe region when such changes occur.

Next, we introduce a monitoring region to monitor the validity of the safe region effectively when the weight of an edge changes. The monitoring region MR contains all the points between query point q and the lowest answer object and between q and the highest nonanswer object. Formally, it is defined as $MR = dist(q, D^+_l) \cup dist(q, D^-_h)$, where, in this definition, $dist(q, D^+_l)$ stands for the points between q and the lowest answer object and $dist(q, D^-_h)$ stands for the points between q and the highest nonanswer object. In the given example, $D^+_l = d_1$ and $D^-_h = d_2, d_3$. Therefore, the dotted lines in Figure 8(a) show the monitoring region of query object q.

Now, at time $t_j$, the updates to edges $\overleftarrow{(n_1, n_6)}$ and $\overleftarrow{(n_1, d_1)}$, which are not part of the monitoring region, can safely be ignored. However, the update on segment $\overrightarrow{(n_2, d_1)}$, which is associated with the monitoring region, may invalidate the results. As shown in Figure 8(b), after the update, the top-1 result becomes $d_2$, and the bold lines represent the new safe region of q.

Algorithm 5 monitors the validity of the result set and the safe region of query object q when the weight of any edge changes. Suppose that the weight of edge $(n_i, n_j)$ changes at time $t_j$. First, the algorithm checks whether edge $(n_i, n_j)$ is associated with the monitoring region. If it is not part of the monitoring region, the algorithm simply ignores the update to edge $(n_i, n_j)$, and the query results and safe region remain valid. In contrast, if the edge is associated with the monitoring region (i.e., $MR \cap (n_i, n_j) \neq \emptyset$), the algorithm reevaluates the query results, and the top-k results and safe region of query q are updated accordingly. Finally, the algorithm updates the monitoring region of q.
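The filtering step of Algorithm 5 can be sketched as follows. The sketch makes a simplifying assumption that is not in the paper: the monitoring region is stored as a set of directed edges, so the test $MR \cap (n_i, n_j) \neq \emptyset$ becomes a set membership test. The edge lists and the `recompute` callback are hypothetical stand-ins for the snapshot query, safe exit, and monitoring region computations.

```python
def on_edge_update(monitoring_region, updated_edge, recompute):
    """Handle one weight update; return True if the query was reevaluated.

    `monitoring_region` is a set of directed edges (u, v), an assumed
    representation. `recompute` stands in for refreshing the top-k results,
    the safe exit points, and the monitoring region itself.
    """
    if updated_edge not in monitoring_region:
        return False  # update lies outside MR: results and safe region stay valid
    recompute()
    return True


# Hypothetical monitoring region and update stream for one query.
monitoring_region = {("q", "n2"), ("n2", "d1"), ("n2", "n3")}
recomputations = []
updates = [("n1", "n6"), ("n1", "d1"), ("n2", "d1")]
handled = [
    on_edge_update(monitoring_region, edge, lambda: recomputations.append(edge))
    for edge in updates
]
print(handled, recomputations)
```

Only the third update touches the monitoring region, so the query is reevaluated once instead of three times.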

7. Performance Evaluation

In this section, we evaluate the performance of COSK through simulation experiments. We describe our experimental settings in Section 7.1 and present our experimental results for static and dynamic road networks in Sections 7.2 and 7.3, respectively.

7.1. Experimental Settings. All of our experiments were performed using real road networks, namely, Oldenburg, San Francisco, and San Joaquin. All three road networks were obtained from [27]. The original road network of San Francisco had 21047 nodes and 21692 edges. We reformatted the network, pruned approximately 30% of the nodes, and adjusted the edges and their weights accordingly. This resulted in a network with 14732 nodes and 14316 edges. Both the direction of the edges and the data objects on the edges were generated randomly. The description of each data object was extracted from Twitter messages [28], and we assigned one tweet per data object. Table 4 presents the characteristics of the data sets used in the experimental evaluation. We simulated moving query objects using a spatiotemporal data generator [29]. The input to the generator was the road network of the data set used, and the output was the set of query objects moving on the road network. Each experiment had 100 moving queries, which were continuously monitored for 100 timestamps (1 timestamp = 1 second), and the average result was reported in the experiments.

As a benchmark for COSK in static road network weimplemented a CMTkSK+ algorithm [22] which also contin-uously monitored the moving top-k spatial keyword queriesin the road networks However this algorithm was originallydesigned for undirected road networks To make a faircomparison we modified CMTkSK+ to process top-k spatialkeyword queries in directed road networks and called itCMTkSK+ Specifically we modified the distance computa-tion method between two points such that in directed roadnetworks 119889119894119904119905(1199011 1199012) = 119889119894119904119905(1199012 1199011) Since CMTkSK+ doesnot handle top-k spatial queries in dynamic road roads wecompared the performance of COSK with basic algorithmwhich recomputes the results whenever query object changesits location All algorithms were implemented in Java andwere executed on a desktop PC 280-GHz Intel Core i5 with

12 Wireless Communications and Mobile Computing

3

q5 5

2 3

3

2

2 3 5

11

d3 (Chinese Restaurant)

n1

n6

n2 pse2

pse1

pse3

n4n5

n3d2 (Italian Restaurant)d1 (Italian Restaurant)

(a) Safe region at time 119905119894

9

q10 5

6 4

3

2

2 3 5

1

d3 (Chinese Restaurant)

n1

n6

n2 n3

n4n5

d2 (Italian Restaurant)d1 (Italian Restaurant)

(b) Updating weight oflarr997888997888997888997888997888997888(1198991 1198992) and

larr997888997888997888997888997888997888(1198991 1198996) at time 119905119895

Figure 7 Updating the weight of edges in a dynamic road network where 119905119894 lt 119905119895

3

q5 5

2 4

3

2

2 3 5

1

d3 (Chinese Restaurant)

n1

n6 n4n5

n2 n3d2 (Italian Restaurant)d1 (Italian Restaurant)

(a) Monitoring region at time 119905119894

9

q10 5

5 4

233

2

2 3 5

11

037

pse2pse1

pse3

d3 (Chinese Restaurant)n6 n4n5

n2 n3d2 (Italian Restaurant)n1 d1 (Italian Restaurant)

(b) New safe region at time 119905119895

Figure 8 Monitoring region and updated safe region at time 119905119895

(1) InputMonitoring regionMR updated edge (119899119894 119899119895)(2) Output none(3) if 119872119877cap (119899119894 119899119895) = 0 then(4) lowastedge (119899119894 119899119895) is not part of monitoring region(5) ignore the change in the weight of edge (119899119894 119899119895)(6) end(7) 119875119878119864 larr997888 0 lowastset of safe exit points(8) else(9) 119863119896119906119901119889 larr997888 119864V119886119897119906119886119905119890119878119899119886119901119904ℎ119900119905119876119906119890119903119910(119899119894 119890119894) lowastupdate set of

top-k results(10) 119875119878119864119906119901119889 larr997888 119862119900119898119901119906119905119890119878119886119891119890119864119909119894119905(119875119886 119899120573) lowastupdate safe exit

points(11) 119872119877119906119901119889 larr997888 119862119900119898119901119906119905119890119872119900119899119894119905119900119903119894119899119892119877119890119892119894119900119899(119863+119897 119863minusℎ )

lowastupdate monitoring region(12) end

Algorithm 5 MonitoringSafeRegion(MR(119899119894 119899119895))

Table 4 Summary of datasets

Attribute Oldenburg San Francisco San JoaquinTotal no of nodes 6104 14732 18262Total no of edges 7034 14316 23876Percentage of directed edges 30 30 30Total no of objects 5627 11453 19098Average no of objects per edge 08 08 08Total no of words 49517 103649 166153

Wireless Communications and Mobile Computing 13

Table 5 Experimental parameter settings

Parameter RangeNumber of results (k) 5 10 15 20 25Number of keywords (n) 1 2 3 4 5Query parameter (120572) 001 01 1 10 100Dataset Oldenburg San Francisco San JoaquinNumber of data objects (119873119863) 10 20 30 40 50 (x1000)Speed of query objects (119881119902119903119910) 25 50 75 100 125 (kmh)Mobility (119872119902119903119910) 20 40 60 80 100Ratio of directed edges (119864119889119894119903) 10 20 30 40 50Ratio of updated edges (119864119906119901119889) 15 30 60 80 100

8 GB of memory. In the experiments, we compared (1) query processing times, (2) edges processed, i.e., the number of edges processed for retrieving query results, and (3) index sizes. Table 5 summarizes the parameters used in the experiments. In each experiment, we varied a single parameter within the range shown in Table 5 while keeping the other parameters at their default values.

We evaluated the performance of the algorithms using the following measures: (1) the total server CPU time, which indicates the query processing time, and (2) the total communication cost, measured as the total number of points (i.e., the location updates sent by query objects and the query results and safe exit points returned by the server) transferred between the clients and the server. Battery power and wireless bandwidth consumption typically increase with the amount of data transferred between objects (clients) and servers. Thus, we used the amount of transferred data as the metric for the communication cost.

7.2. Experimental Results of Top-k Spatial Keyword Queries in Static Road Networks

7.2.1. Effect of k. Figure 9 shows the effect of the number of results on the query processing time and communication cost for both algorithms. Figure 9(a) indicates that the query processing time increases for both algorithms as the value of k increases. This is expected because, as k increases, more data objects must be explored and verified. Nevertheless, COSK significantly outperforms CMTkSK+ for two main reasons. First, the search for relevant objects is very efficient when using the highest significance factor. Second, COSK does not need to verify the set of answer objects as long as the query object lies in a safe region. In contrast, the CMTkSK+ query processing time increases significantly because it has to monitor and verify the set of candidate objects periodically. In Figure 9(b), the communication costs for both algorithms increase as the number of requested results increases. However, the proposed algorithm performs better than CMTkSK+ because no client-server communication is required while the query object remains inside the region bounded by its safe exit points, whereas in CMTkSK+ the query object must report its location to the server whenever it moves.

7.2.2. Effect of $N_D$. This experiment was conducted on the San Joaquin dataset. This dataset includes 19,098 data objects; therefore, we randomly generated approximately 30,000 additional data objects on different edges. In Figure 10, we evaluate the performance of COSK and CMTkSK+ by varying the cardinality of the data objects. Note that $N_D = 10K$ corresponds to a low density of data points, while $N_D = 50K$ corresponds to a high density. In Figure 10(a), it is interesting to note that the query processing times of both algorithms decrease as the cardinality of the data objects increases. For CMTkSK+, this is because the monitoring range of a query shrinks at high density. For COSK, it is mainly because fewer edges need to be expanded when the data density is high, which decreases the query processing time. In Figure 10(b), we study the influence of the cardinality of the data objects on the communication costs. The experimental results indicate that CMTkSK+ incurs almost constant communication costs regardless of the data object cardinality. However, the communication costs of COSK increase in proportion to the $N_D$ value. This is expected because the safe region becomes smaller as the density of the data objects increases, which increases the communication costs.

7.2.3. Effect of Query Keywords (n). Figure 11 shows the query processing time and communication cost for COSK and CMTkSK+ as a function of the number of query keywords. Figures 11(a) and 11(b) show that the performance of both algorithms degrades as the number of keywords increases. This is mainly because increasing the number of query keywords may also increase the number of relevant objects, resulting in a higher query processing time and communication cost. However, the safe-region-based algorithm COSK scales better than CMTkSK+ because of its less expensive monitoring technique.

7.2.4. Effect of $\alpha$. Figure 12 demonstrates the impact of the query parameter $\alpha$ on the query processing time and on the communication cost. A small value of $\alpha$ gives greater importance to textual relevance, whereas a high value of $\alpha$ gives more preference to spatial relevance. It is interesting to note that the query processing time is lower for higher


Figure 9: Effect of k on query processing time and communication cost. (a) Query processing time (s) versus k. (b) Communication cost (number of messages transferred) versus k. Both panels compare COSK and CMTkSK+.

Figure 10: Effect of $N_D$ on query processing time and communication cost. (a) Query processing time (s) versus $N_D$. (b) Communication cost (number of messages transferred) versus $N_D$. Both panels compare COSK and CMTkSK+.

values of $\alpha$, which give more importance to spatial relevance. This is mainly because, when spatial relevance matters more, fewer edges and objects need to be explored and processed to determine the top-k data objects. Observe that in Figure 12(b) the number of messages sent by COSK decreases sharply with an increase in $\alpha$.

7.2.5. Effect of Speed. Figure 13(a) demonstrates the influence of the speed of the query objects on the query processing time of the COSK and CMTkSK+ algorithms. The experimental results indicate that the performance of CMTkSK+ is not significantly influenced by the speed of the query objects because the candidate objects must be monitored at regular time intervals regardless of the speed. For COSK, on the other hand, the performance gradually degrades as the speed of the query objects increases because the objects leave their respective safe regions more frequently. Figure 13(b) shows the communication costs of COSK and CMTkSK+ with respect to the speed of the query objects. CMTkSK+ incurs almost constant communication costs because a server-initiated request to verify the candidate objects does not depend on the speed. For COSK, the query objects cross safe regions more frequently when the speed is high, which increases the communication costs.


Figure 11: Effect of number of keywords on query processing time and communication cost. (a) Query processing time (s) versus number of keywords. (b) Communication cost (number of messages transferred) versus number of keywords. Both panels compare COSK and CMTkSK+.

Figure 12: Effect of $\alpha$ on query processing time and communication cost. (a) Query processing time (s) versus $\alpha$. (b) Communication cost (number of messages transferred) versus $\alpha$. Both panels compare COSK and CMTkSK+.

7.2.6. Effect of Mobility. Figure 14 shows the effect of mobility $M_{qry}$ (mobility refers to the percentage of query objects that are moving at any timestamp) on the performance of the COSK and CMTkSK+ algorithms. As expected, the query processing time and communication costs of both algorithms increase with $M_{qry}$. Nevertheless, COSK performs better than CMTkSK+ in terms of both query processing time and communication costs.

7.2.7. Effect of Directed Edges. Figure 15 shows the impact of the percentage of directed edges $E_{dir}$ on the performance of the COSK and CMTkSK+ algorithms. The query processing time increases with $E_{dir}$ because the algorithms need to explore more edges to answer the top-k spatial keyword queries. However, the communication cost of both algorithms is not significantly affected by the value of $E_{dir}$.

7.2.8. Effect of Datasets. Figure 16 shows the index sizes of the COSK and CMTkSK+ approaches for the different datasets. As shown in Figure 16, both algorithms have similar index sizes. However, COSK has a minor space overhead because it stores additional information, namely the highest significance factor $\theta_t$ of the edges. More importantly, this space overhead is minimal compared to the gain achieved by COSK in query processing time and communication costs.


Figure 13: Effect of speed on query processing time and communication cost. (a) Query processing time (s) versus $V_{qry}$. (b) Communication cost (number of messages transferred) versus $V_{qry}$. Both panels compare COSK and CMTkSK+.

Figure 14: Effect of mobility on query processing time and communication cost. (a) Query processing time (s) versus $M_{qry}$. (b) Communication cost (number of messages transferred) versus $M_{qry}$. Both panels compare COSK and CMTkSK+.

7.3. Experimental Results of Top-k Spatial Keyword Queries in Dynamic Road Networks

In this section, we evaluate the performance of COSK and the basic algorithm for dynamic road networks. $E_{upd}$ indicates the percentage of all edges that change their weight at each timestamp. The length of an updated edge is randomly selected between 0.1 and 10 times the original length. Figure 17(a) depicts the query processing time of COSK and the basic algorithm. It is evident from the figure that the query processing time of the basic algorithm is not significantly affected by $E_{upd}$. This is mainly because the query objects issue top-k spatial queries at each timestamp. However, the query processing time of COSK increases with the value of $E_{upd}$ because the probability that an updated edge is associated with the monitoring region of query q increases with $E_{upd}$. Therefore, when $E_{upd}$ becomes large, the results need to be updated frequently, which increases the query processing time. Figure 17(b) shows the communication costs of COSK and the basic algorithm with respect to $E_{upd}$. The basic algorithm incurs almost constant communication costs regardless of the value of $E_{upd}$. In contrast, the communication cost of COSK increases with $E_{upd}$ because the query result and safe regions need to be updated frequently.
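The update workload described above can be mimicked with a short Python sketch. It is an illustration only: the text states that new lengths fall between 0.1 and 10 times the original, but the uniform draw, the function name `apply_random_updates`, and the dictionary representation of edge weights are assumptions made here.

```python
import random
from typing import Dict, Tuple

Edge = Tuple[str, str]


def apply_random_updates(
    weights: Dict[Edge, float], e_upd_percent: float, rng: random.Random
) -> Dict[Edge, float]:
    """Return new weights after rescaling e_upd_percent % of the edges."""
    n_updates = round(len(weights) * e_upd_percent / 100)
    chosen = rng.sample(sorted(weights), n_updates)
    updated = dict(weights)
    for edge in chosen:
        # Assumed uniform factor: new length is 0.1x to 10x the original.
        updated[edge] = weights[edge] * rng.uniform(0.1, 10.0)
    return updated


# Hypothetical 20-edge network with E_upd = 15 (three edges change per timestamp).
original = {(f"n{i}", f"n{i + 1}"): float(i + 1) for i in range(20)}
updated = apply_random_updates(original, 15, random.Random(0))
changed = [e for e in original if updated[e] != original[e]]
```

Feeding each changed edge to a monitoring check then shows why the cost of a safe-region method grows with $E_{upd}$: more updates land inside some query's monitoring region.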


Figure 15: Effect of $E_{dir}$ on query processing time and communication cost. (a) Query processing time (s) versus $E_{dir}$. (b) Communication cost (number of messages transferred) versus $E_{dir}$. Both panels compare COSK and CMTkSK+.

Figure 16: Effect of dataset on index size. Index size (MB) of COSK and CMTkSK+ for the Oldenburg, San Francisco, and San Joaquin datasets.

8. Conclusion

In this paper, we investigated moving top-k spatial keyword queries in directed and dynamic road networks. We presented an efficient indexing framework using inverted files that indexes the data objects on edges, allowing for the effective searching of data objects relevant to queries in terms of both textual and spatial relevance. We also presented a safe-exit-based algorithm, called COSK, to monitor moving top-k spatial keyword queries. We demonstrated that the query results remain valid as long as the query object resides within a safe region. Furthermore, COSK can effectively monitor the validity of query results and safe regions in dynamic road networks. Finally, an experimental evaluation conducted on real road networks demonstrated that COSK significantly reduced the query processing time and communication costs compared to the CMTkSK+ algorithm.

Data Availability

The real road network data used in this study are also used in many previous studies. The road network data are cited in the manuscript and are available at https://www.cs.utah.edu/~lifeifei/SpatialDataset.htm. To simulate the moving queries, the authors used the spatiotemporal data generator, which is also used in previous studies. The research article describing the generator is cited in the manuscript. The documentation and source


Figure 17: Effect of $E_{upd}$ on query processing time and communication cost. (a) Query processing time (s) versus $E_{upd}$. (b) Communication cost (number of messages transferred) versus $E_{upd}$. Both panels compare COSK and the basic algorithm.

files of the generator are available at https://iapg.jade-hs.de/personen/brinkhoff/generator/. They used Twitter tweets to generate the descriptions of data objects and the query keywords. The tweets used are accessible at http://followthehashtag.com/datasets/free-twitter-dataset-usa-200000-free-usa-tweets/.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

Hyung-Ju Cho was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean Government (MSIP) (NRF-2016R1A2B4009793), and this research was partially supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2016R1D1A1B03934129).

References

[1] D. Papadias, N. Mamoulis, J. Zhang, and Y. Tao, “Query processing in spatial network databases,” in Proceedings of the 29th International Conference on Very Large Data Bases (VLDB ’03), pp. 802–813, September 2003.

[2] H.-J. Cho, K. Ryu, and T.-S. Chung, “An efficient algorithm for computing safe exit points of moving range queries in directed road networks,” Information Systems, vol. 41, pp. 1–19, 2014.

[3] G. Tsatsanifos and A. Vlachou, “On processing Top-k spatio-textual preference queries,” in Proceedings of the 18th International Conference on Extending Database Technology (EDBT ’15), pp. 433–444, March 2015.

[4] R. Li, A. X. Liu, A. L. Wang, and B. Bruhadeshwar, “Fast range query processing with strong privacy protection for cloud computing,” Proceedings of the VLDB Endowment, vol. 7, no. 14, pp. 1953–1964, 2014.

[5] G. Cong, C. S. Jensen, and D. Wu, “Efficient retrieval of the Top-k most relevant spatial web objects,” Proceedings of the VLDB Endowment, vol. 2, no. 1, pp. 337–348, 2009.

[6] Z. Li, K. C. K. Lee, B. Zheng, W.-C. Lee, D. Lee, and X. Wang, “IR-tree: An efficient index for geographic document search,” IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 4, pp. 585–599, 2011.

[7] Y. Zhou, X. Xie, C. Wang, Y. Gong, and W. Ma, “Hybrid index structures for location-based web search,” in Proceedings of the 14th ACM International Conference on Information and Knowledge Management, pp. 155–162, Bremen, Germany, October 2005.

[8] J. Zobel and A. Moffat, “Inverted files for text search engines,” ACM Computing Surveys, vol. 38, no. 2, 2006.

[9] N. Beckmann, H. Kriegel, R. Schneider, and B. Seeger, “R*-tree: an efficient and robust access method for points and rectangles,” in Proceedings of the ACM SIGMOD International Conference on Management of Data, vol. 19, pp. 322–331, May 1990.

[10] R. Hariharan, B. Hore, C. Li, and S. Mehrotra, “Processing spatial-keyword (sk) queries in geographic information retrieval (gir) systems,” in Proceedings of the 19th International Conference on Scientific and Statistical Database Management (SSDBM ’07), July 2007.

[11] I. De Felipe, V. Hristidis, and N. Rishe, “Keyword search on spatial databases,” in Proceedings of the 24th International Conference on Data Engineering (ICDE ’08), pp. 656–665, April 2008.

[12] J. B. Rocha-Junior, O. Gkorgkas, S. Jonassen, and K. Nørvåg, “Efficient processing of top-k spatial keyword queries,” in Proceedings of the International Symposium on Spatial and Temporal Databases, pp. 205–222, Springer, 2011.

[13] D. Zhang, K.-L. Tan, and A. K. Tung, “Scalable top-k spatial keyword search,” in Proceedings of the 16th International Conference on Extending Database Technology, pp. 359–370, 2013.


[14] J. B. Rocha-Junior and K. Nørvåg, “Top-k spatial keyword queries on road networks,” in Proceedings of the 15th International Conference on Extending Database Technology, pp. 168–179, Berlin, Germany, March 2012.

[15] H.-J. Cho, S. J. Kwon, and T.-S. Chung, “A safe exit algorithm for continuous nearest neighbor monitoring in road networks,” Mobile Information Systems, vol. 9, no. 1, pp. 37–53, 2013.

[16] D. Yung, M. L. Yiu, and E. Lo, “A safe-exit approach for efficient network-based moving range queries,” Data & Knowledge Engineering, vol. 72, pp. 126–147, 2012.

[17] M. Attique, H. Cho, R. Jin, and T. Chung, “Efficient Processing of Continuous Reverse k Nearest Neighbor on Moving Objects in Road Networks,” ISPRS International Journal of Geo-Information, vol. 5, no. 12, p. 247, 2016.

[18] H. G. Elmongui, M. F. Mokbel, and W. G. Aref, “Continuous aggregate nearest neighbor queries,” GeoInformatica, vol. 17, no. 1, pp. 63–95, 2013.

[19] D. Wu, M. L. Yiu, C. S. Jensen, and G. Cong, “Efficient continuously moving top-k spatial keyword query processing,” in Proceedings of the IEEE International Conference on Data Engineering (ICDE ’11), pp. 541–552, Hannover, Germany, April 2011.

[20] W. Huang, G. Li, K.-L. Tan, and J. Feng, “Efficient safe-region construction for moving top-k spatial keyword queries,” in Proceedings of the 21st ACM International Conference on Information and Knowledge Management, pp. 932–941, 2012.

[21] L. Guo, J. Shao, H. H. Aung, and K.-L. Tan, “Efficient continuous top-k spatial keyword queries on road networks,” GeoInformatica, vol. 19, no. 1, pp. 29–60, 2014.

[22] Y. Li, G. Li, L. Shu, Q. Huang, and H. Jiang, “Continuous monitoring of top-k spatial keyword queries in road networks,” Journal of Information Science and Engineering, vol. 31, no. 6, pp. 1831–1848, 2015.

[23] M. Attique, A. Khan, and T.-S. Chung, “ESPAK: Top-k spatial keyword query processing in directed road networks,” in Proceedings of the Workshops of the International Conference on Extending Database Technology and the International Conference on Database Theory (EDBT/ICDT ’17), March 2017.

[24] G. Salton and C. Buckley, “Term-weighting approaches in automatic text retrieval,” Information Processing & Management, vol. 24, no. 5, pp. 513–523, 1988.

[25] V. N. Anh, O. de Kretser, and A. Moffat, “Vector-space ranking with effective early termination,” in Proceedings of the 24th Annual International ACM SIGIR Conference, pp. 35–42, New Orleans, LA, USA, 2001.

[26] E. W. Dijkstra, “A note on two problems in connexion with graphs,” Numerische Mathematik, vol. 1, pp. 269–271, 1959.

[27] “Real datasets for spatial databases,” https://www.cs.utah.edu/~lifeifei/SpatialDataset.htm.

[28] “Twitter,” https://twitter.com.

[29] T. Brinkhoff, “A framework for generating network-based moving objects,” GeoInformatica, vol. 6, no. 2, pp. 153–180, 2002.


query result set will change. The result set will change when the score of the highest nonanswer object $D^-_h$ surpasses the score of the lowest answer object $D^+_l$. Generally, the textual relevance score does not change. Therefore, the score of a data object changes only through its spatial relevance score, which in turn changes only when the query object moves. The computation of the safe exit point is based on two key observations.

Observation 1. If $D^+_{n_\beta} = D^+_{p_a}$, there is no safe exit point in the segment.

Explanation. $D^+_{p_a}$ represents the set of answer objects at anchor point $p_a$, whereas $D^+_{n_\beta}$ represents the set of answer objects at boundary node $n_\beta$. As discussed earlier, the safe exit point is the particular point at which the query results change. If the query results at the starting node of a segment or edge are the same as those at its ending node, there is no point at which the query result changes. Hence, we do not search for a safe exit point in that segment.

Observation 2. If $D^+_{p_a} \neq D^+_{n_\beta}$, there is a safe exit point in the segment.

Explanation. In contrast to Observation 1, if the query results differ at the starting and ending points, then there exists a point at which the query results change. Hence, there is a safe exit point in the segment.
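The two observations reduce to a single comparison of the answer sets at the two endpoints of a segment. A minimal Python sketch follows; `top_k_at` is a hypothetical stand-in for the snapshot query evaluation, and the small lookup table reuses the top-1 answers of the paper's running example.

```python
from typing import Callable, FrozenSet, Hashable


def has_safe_exit(
    top_k_at: Callable[[Hashable], FrozenSet[str]],
    p_a: Hashable,
    n_beta: Hashable,
) -> bool:
    """Observations 1 and 2: a safe exit exists iff the answer sets differ."""
    return top_k_at(p_a) != top_k_at(n_beta)


# Top-1 answer sets at three points of the running example.
answers = {
    "q": frozenset({"d3"}),
    "n3": frozenset({"d3"}),
    "n7": frozenset({"d6"}),
}
no_exit = has_safe_exit(answers.__getitem__, "q", "n3")      # same answers
exit_found = has_safe_exit(answers.__getitem__, "n3", "n7")  # answers differ
```

Comparing sets rather than ranked lists is a simplification chosen here; it is enough to detect that some object enters or leaves the result.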

To find the safe region, we consider the following cases.

Case 1 (when $\alpha = 1$ and the textual relevance of the highest nonanswer object and lowest answer object is the same). In this case, both the textual and spatial relevance have the same importance (i.e., $\alpha = 1$). In addition, the top-k result depends only on the spatial relevance because the textual relevance of both objects is the same. The data object that is closer to query point q becomes the answer object. For an undirected edge, the safe exit point $p_{se}$ is the center point between the lowest answer object and the highest nonanswer object, i.e., the point satisfying $\max(dist(p_{se}, d^+_1), dist(p_{se}, d^+_2), \ldots, dist(p_{se}, d^+_{|d^+|})) = \min(dist(p_{se}, d^-_1), dist(p_{se}, d^-_2), \ldots, dist(p_{se}, d^-_{|d^-|}))$. However, in the case of a directed edge, where $dist(p_a, n_\beta) \neq dist(n_\beta, p_a)$, the safe exit point is either $d^+_l$ or $p_a$. If $d^+_l \in (p_a, n_\beta)$, then the safe exit point is $d^+_l$; otherwise, the safe exit point is $p_a$.

Case 2 (when $\alpha = 1$ and the textual relevance of the highest nonanswer object and lowest answer object is different). In this case, the top-k result depends on all functions, that is, $\alpha$, the spatial relevance, and the textual relevance. Clearly, for undirected edges, the midpoint between the lowest answer object and the highest nonanswer object does not provide a valid safe exit point. Therefore, we introduce a divide-and-conquer technique, which keeps dividing the search space until it reaches the point where the score of the nonanswer object becomes greater than that of the answer object. Typically, the safe exit point should be closer to the data object whose score is lower. Based on this observation, we first compute the midpoint in a similar fashion to Case 1 and then continue dividing the search space until we find the point. For directed edges, the safe exit point can be computed in a similar fashion to Case 1.

Case 2 also works for other cases in which the safe exit point is not the midpoint between the lowest answer object and the highest nonanswer object. In these cases, the safe exit point depends on two or more functions. Therefore, the safe exit point can be computed using the aforementioned divide-and-conquer technique. The following are the scenarios where the safe exit point can be computed using Case 2:

(a) When $\alpha \neq 1$ and the textual relevance of the nearest nonanswer object and farthest answer object is different.

(b) When $\alpha \neq 1$ and the textual relevance of the nearest nonanswer object and farthest answer object is the same.
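The divide-and-conquer search of Case 2 behaves like a bisection on the position along the segment. The sketch below is illustrative only: the two score functions are hypothetical (textual relevance divided by one plus network distance), not the ranking function of the paper, and the search assumes that the answer object wins at the start of the segment, loses at the end, and that the ranking flips only once in between.

```python
from typing import Callable


def bisect_safe_exit(
    answer_score: Callable[[float], float],
    nonanswer_score: Callable[[float], float],
    lo: float,
    hi: float,
    tol: float = 1e-6,
) -> float:
    """Offset in [lo, hi] where the nonanswer object starts to outrank
    the answer object (single crossover assumed)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if nonanswer_score(mid) > answer_score(mid):
            hi = mid  # crossover lies at or before mid
        else:
            lo = mid  # answer object still ranks higher at mid
    return (lo + hi) / 2


def answer(x: float) -> float:
    # Hypothetical lowest answer object: relevance 0.9, distance x + 3.
    return 0.9 / (1 + (x + 3))


def nonanswer(x: float) -> float:
    # Hypothetical highest nonanswer object: relevance 0.6, distance 5 - x.
    return 0.6 / (1 + (5 - x))


p_se = bisect_safe_exit(answer, nonanswer, 0.0, 3.0)  # converges near x = 2
```

With these assumed scores the crossover sits at x = 2, which is closer to the object with the lower textual relevance, in line with the remark that the safe exit point lies nearer to the lower-scoring object.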

Case 3 (when $\alpha = 0$). This means that the spatial relevance has no effect on the score of data objects. Hence, no monitoring is required for this scenario.

Algorithm 3 retrieves the safe exit points using the observations discussed earlier. The core function in this algorithm is ComputeSafeExit($p_a$, $n_\beta$), which finds the safe exit point in a segment between $p_a$ and $n_\beta$. ComputeSafeExit($p_a$, $n_\beta$) is described in detail in Algorithm 4. First, Algorithm 4 determines $d^+_l$ and $d^-_h$ at each point $p \in [p_a, n_\beta]$. Recall that $d^+_l$ is the lowest answer object to p, whereas $d^-_h$ is the highest nonanswer object to p. Algorithm 4 computes the safe exit point based on the cases discussed earlier. There are two further scenarios for each of Cases 1 and 2. For Case 1, if $dist(p_a, n_\beta) = dist(n_\beta, p_a)$, then the safe exit point is the midpoint between $d^+_l$ and $d^-_h$. If $dist(p_a, n_\beta) \neq dist(n_\beta, p_a)$, then the edge is directed, and therefore the safe exit point is either $p_a$ or $d^+_l$. If $d^+_l$ lies on the edge $[p_a, n_\beta]$, then $d^+_l$ is the safe exit point. Otherwise, $p_a$ is the safe exit point.

Similarly, for Case 2, if $dist(p_a, n_\beta) = dist(n_\beta, p_a)$, then the safe exit point is computed by repeatedly halving the search space until we find the closest point such that $\psi(d^-_h) > \psi(d^+_l)$. If $dist(p_a, n_\beta) \neq dist(n_\beta, p_a)$, the safe exit point is computed in the same way as for the directed edge in Case 1.

5.2. Computation of Safe Exit Points for Example

Consider the same example in Figure 1, where the query point q issues a top-1 keyword query with $q.t$ = “Italian restaurant”. For this example, let us consider $\alpha = 1$. The monitoring algorithm starts exploring from the active edge containing the query object q. Therefore, $\overrightarrow{(q, n_3)}$ is explored first. As shown in Table 3, for $\overrightarrow{(q, n_3)}$, $D^+_q = d_3$ and $D^+_{n_3} = d_3$. According to Observation 1, no safe exit point exists in this segment. Therefore, the edges adjacent to $n_3$ are explored, and $n_3$ becomes the new $p_a$. The edge $(n_3, n_4)$ is explored next. Similarly, the answer object at $n_3$ and $n_4$ is the same: $D^+_{n_3} = D^+_{n_4} = d_3$. Therefore, a safe exit point does not exist in $(n_3, n_4)$. The edge $(n_3, n_7)$ is explored next. As shown in Table 3, $D^+_{n_3} = d_3$ and $D^+_{n_7} = d_6$. By Observation 2, there is a safe exit point in $(n_3, n_7)$. As shown in Figure 1, $d_3.t = d_6.t$ = “Italian Restaurant” and $dist(n_3, n_7) = dist(n_7, n_3)$.


(1) Input: same as Algorithm 1
(2) Output: $P_{SE}$, a set of safe exit points
(3) $P_{SE} \leftarrow \emptyset$ /* set of safe exit points */
(4) $D^+_{p_a} \leftarrow EvaluateSnapshotQuery(p_a, (p_a, n_\beta))$
(5) /* results calculated using Algorithm 1 */
(6) $D^+_{n_\beta} \leftarrow EvaluateSnapshotQuery(n_\beta, (p_a, n_\beta))$
(7) /* results calculated using Algorithm 1 */
(8) if $D^+_{p_a} = D^+_{n_\beta}$ then
(9)     no safe exit point /* refer to Observation 1 */
(10) end
(11) if $D^+_{p_a} \neq D^+_{n_\beta}$ then
(12)     $P_{SE} \leftarrow P_{SE} \cup ComputeSafeExit(p_a, n_\beta)$ /* safe exit point exists; refer to Observation 2 */
(13) end
(14) return $P_{SE}$

Algorithm 3: COSK monitoring algorithm.
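Algorithm 3 handles a single segment; the worked example applies it repeatedly while expanding adjacent edges. One way to sketch that outer loop in Python is shown below. The breadth-first expansion, the `adjacency` dictionary, and the `safe_exit_on` callback (returning a safe exit label or `None`) are assumptions for illustration, with the toy data loosely mirroring the segments of the running example.

```python
from collections import deque
from typing import Callable, Dict, List, Optional


def collect_safe_exits(
    start: str,
    adjacency: Dict[str, List[str]],
    safe_exit_on: Callable[[str, str], Optional[str]],
) -> List[str]:
    """Expand segments outward from start; stop expanding past a segment
    once a safe exit point is found on it."""
    exits: List[str] = []
    visited = {start}
    queue = deque([start])
    while queue:
        p_a = queue.popleft()
        for n_beta in adjacency.get(p_a, []):
            exit_point = safe_exit_on(p_a, n_beta)
            if exit_point is not None:
                exits.append(exit_point)  # results differ: exit found
            elif n_beta not in visited:
                visited.add(n_beta)       # results equal: keep expanding
                queue.append(n_beta)
    return exits


# Simplified expansion order of the running example (assumed adjacency).
adjacency = {"q": ["n3"], "n3": ["n4", "n7", "n5"], "n5": ["n6"]}
known_exits = {("n3", "n7"): "pse1", ("n5", "n6"): "pse2"}
exits = collect_safe_exits("q", adjacency, lambda a, b: known_exits.get((a, b)))
```

The traversal stops at every segment that contains a safe exit point, so the explored edges together approximate the safe region of the query.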

(1) Input: same as Algorithm 1
(2) Output: $p_{se}$, the safe exit point in $(p_a, n_\beta)$
(3) $D^+_l \leftarrow \langle p, D^+_l \rangle$ | for each point $p \in [p_a, n_\beta]$, $d^+_l$ such that $\psi(d^+_l)_p = \min(\psi(d^+_1)_p, \psi(d^+_2)_p, \ldots, \psi(d^+_{|d^+|})_p)$
(4) $D^-_h \leftarrow \langle p, D^-_h \rangle$ | for each point $p \in [p_a, n_\beta]$, $d^-_h$ such that $\psi(d^-_h)_p = \max(\psi(d^-_1)_p, \psi(d^-_2)_p, \ldots, \psi(d^-_{|d^-|})_p)$
(5) if Case 1 then
(6)     if $dist(p_a, n_\beta) = dist(n_\beta, p_a)$ then
(7)         $p_{se}$ = point $se$ such that $\max(dist(se, d^+_1), dist(se, d^+_2), \ldots, dist(se, d^+_{|d^+|})) = \min(dist(se, d^-_1), dist(se, d^-_2), \ldots, dist(se, d^-_{|d^-|}))$
(8)     end
(9)     if $dist(p_a, n_\beta) \neq dist(n_\beta, p_a)$ then
(10)        $p_{se} = p_a$ or $p_{se} = d^+_l$ where $d^+_l \in (p_a, n_\beta)$
(11)    end
(12) end
(13) if Case 2 then
(14)    if $dist(p_a, n_\beta) = dist(n_\beta, p_a)$ then
(15)        $p_{se}$ = closest point to $p_a$ such that $\psi(d^-_h) > \psi(d^+_l)$
(16)    end
(17)    if $dist(p_a, n_\beta) \neq dist(n_\beta, p_a)$ then
(18)        Same as Line (10)
(19)    end
(20) end
(21) return $p_{se}$

Algorithm 4: ComputeSafeExit($p_a$, $n_\beta$).
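The Case 1 branch of Algorithm 4 can be illustrated with a deliberately simplified Python sketch. It assumes that all points are given as one-dimensional offsets along the line through the segment, which is a modeling shortcut made here and not part of the paper; the function name and parameters are likewise hypothetical.

```python
def case1_safe_exit(
    p_a: float,
    n_beta: float,
    d_answer: float,
    d_nonanswer: float,
    directed: bool,
) -> float:
    """Safe exit offset for Case 1 (equal textual relevance, alpha = 1)."""
    if not directed:
        # Undirected edge: the point equidistant from both objects.
        return (d_answer + d_nonanswer) / 2
    # Directed edge: d_l^+ if it lies strictly inside the segment, else p_a.
    if min(p_a, n_beta) < d_answer < max(p_a, n_beta):
        return d_answer
    return p_a


# Undirected segment with p_a at offset 0, the answer object 3 units behind
# p_a and the nonanswer object 5 units ahead of it.
undirected_exit = case1_safe_exit(0.0, 3.0, -3.0, 5.0, directed=False)
# Directed segment whose answer object lies outside the segment.
directed_exit = case1_safe_exit(0.0, 2.0, -4.0, 1.0, directed=True)
```

The undirected call places the exit one unit from $p_a$, and the directed call falls back to $p_a$ itself because the answer object is not on the segment.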

Therefore, according to Case 1, the safe exit point $p_{se1}$ is the midpoint between $d_3$ and $d_6$. That is, $dist(p_{se1}, d_3) = dist(p_{se1}, d_6)$, where $dist(p_{se1}, d_3) = x + 3$ and $dist(p_{se1}, d_6) = -x + 5$ for $0 < x < 3$. Consequently, $x = 1$, which means that the distance from $n_3$ to $p_{se1}$ is 1.
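The value of x follows directly from equating the two distance expressions; written out step by step with the same symbols:

```latex
\begin{aligned}
dist(p_{se1}, d_3) &= dist(p_{se1}, d_6) \\
x + 3 &= -x + 5 \\
2x &= 2 \\
x &= 1
\end{aligned}
```

At $x = 1$ both distances equal 4, and $0 < x < 3$ holds, so the point lies inside the segment.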

Next, we determine a safe exit point in $(n_3, n_5)$. As shown in Table 3, the answer object at $n_5$ is the same as that at $n_3$. Hence, no safe exit point exists in this edge. Next, $\overleftarrow{(n_6, n_5)}$ is explored with $p_a = n_5$. According to Table 3, $D^+_{n_6} = d_4$ and $D^+_{n_5} = d_3$. Therefore, a safe exit point exists in this edge. This edge is directed, and for each point $p \in \overleftarrow{(n_6, n_5)}$, the shortest path from $p$ to $d_3$ is $p \rightarrow n_6 \rightarrow n_2 \rightarrow n_3 \rightarrow d_3$. Therefore, $n_5$ is the safe exit point.

The bold lines in Figure 5 indicate the safe region of q. The top-1 result remains $d_3$ as long as the query q lies in the safe region.

Table 3: Computation of safe exit points for the example scenario.

Edge/Segment | $p_a$ | $D^+_{p_a}$ | $D^+_{n_\beta}$ | $p_{se}$
$\overrightarrow{(q, n_3)}$ | $q$ | $D^+_q = d_3$ | $D^+_{n_3} = d_3$ | none
$(n_3, n_4)$ | $q$ | $D^+_{n_3} = d_3$ | $D^+_{n_4} = d_3$ | none
$(n_3, n_7)$ | $n_3$ | $D^+_{n_3} = d_3$ | $D^+_{n_7} = d_6$ | $p_{se1}$
$\overrightarrow{(n_3, n_5)}$ | $n_3$ | $D^+_{n_3} = d_3$ | $D^+_{n_5} = d_3$ | none
$\overleftarrow{(n_6, n_5)}$ | $n_5$ | $D^+_{n_5} = d_3$ | $D^+_{n_6} = d_4$ | $p_{se2}$

Figure 5: Illustration of the safe region of q. The network contains nodes $n_1$–$n_7$, the query point q, the safe exit points $p_{se1}$ and $p_{se2}$, and the data objects $d_1$ (Grand Hotel), $d_2$ (Cafe), $d_3$ (Italian Restaurant), $d_4$ (Chinese Restaurant), $d_5$ (Pub and Bar), $d_6$ (Italian Restaurant), and $d_7$ (Cafe and Bakery).

Next, we analyze the time complexity of determining a set of safe exit points using a set of qualifying objects $d \in D^+_{p_a} \cup D^+_{n_\beta} \cup D(p_a, n_\beta)$. Note that $D^+_{p_a}$ ($D^+_{n_\beta}$) indicates the set of k data objects that satisfy the query condition at $p_a$ ($n_\beta$). According to Dijkstra's algorithm [26], the time complexity $O(D^+_q)$ of computing a set of answer objects at a query point q is $O(D^+_q) = O(|E| + |N| \log |N|)$. This means that $O(D^+_{p_a}) = O(D^+_{n_\beta}) = O(|E| + |N| \log |N|)$ holds for the endpoints $p_a$ and $n_\beta$. Thus, the time complexity $O(\Omega_{kth})$ of determining the skyline $\Omega_{kth}$ with the k-th highest score is $O(\Omega_{kth}) = C_{kth} \cdot O(|D^+_{p_a} \cup D^+_{n_\beta} \cup D(p_a, n_\beta)|)$, where $C_{kth}$ is the number of qualifying objects that participate in the constitution of the skyline with the k-th highest score. Therefore, the time complexity of determining a safe exit point coincides with the time complexity of determining the two skylines, i.e., the skyline $D^+_l$ with the k-th highest (or lowest) score for answer objects and the skyline $D^-_h$ with the highest score for nonanswer objects. This is because the safe exit point is found at the cross point between these skylines.
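The network-distance computation that this analysis relies on can be illustrated with a small directed-graph version of Dijkstra's algorithm. The sketch below is an assumption-laden illustration rather than the paper's code: the adjacency-list layout, the function name `dijkstra`, and the toy network are hypothetical. It uses Python's binary heap, which gives $O((|E| + |N|) \log |N|)$ time; the $O(|E| + |N| \log |N|)$ bound quoted in the text is the one obtained with a Fibonacci heap.

```python
import heapq
from typing import Dict, List, Tuple

# Adjacency list of a directed graph: node -> [(successor, edge weight), ...].
Graph = Dict[str, List[Tuple[str, float]]]


def dijkstra(graph: Graph, source: str) -> Dict[str, float]:
    """Shortest directed network distance from source to every reachable node."""
    dist: Dict[str, float] = {source: 0.0}
    heap: List[Tuple[float, str]] = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry; a shorter path to u was already found
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Hypothetical toy network: a -> b is one-way, b <-> c is two-way, c -> a is one-way.
toy: Graph = {
    "a": [("b", 2.0)],
    "b": [("c", 1.0)],
    "c": [("b", 1.0), ("a", 5.0)],
}
print(dijkstra(toy, "a")["c"], dijkstra(toy, "c")["a"])
```

In the toy network the distance from `a` to `c` differs from the distance from `c` to `a`, which is the asymmetry that distinguishes directed road networks from undirected ones.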

Figure 6 shows the skyline graph for $k = 1$ on the edge $(n_7, n_3)$. We draw the score functions of $d_3$ and $d_6$ for the road segment $(n_7, n_3)$, where a safe exit point exists because $D^+_{n_3} = d_3$ and $D^+_{n_7} = d_6$ for $k = 1$. For each point $p \in (n_7, n_3)$, the distance between $d_3$ and $p$ can be represented as $dist(d_3, p) = dist(d_3, n_3) + len(n_3, p) = 6 - len(n_7, p)$. Similarly, for each point $p \in (n_7, n_3)$, the distance between $d_6$ and $p$ can be represented as $dist(d_6, p) = dist(d_6, n_7) + len(n_7, p) = 2 + len(n_7, p)$. Let $len(n_7, p)$ be a variable $x$ ($0 \le x \le 3$). We can write $\lambda(d_3, p) = dist(d_3, p) = 6 - x$ and $\lambda(d_6, p) = dist(d_6, p) = 2 + x$. Then, we can represent the score functions $\psi(d_3)$ and $\psi(d_6)$ as follows:

$\psi(d_3) = \frac{\mu(d_3.t, q.t)}{1 + \alpha \cdot \lambda(d_3, p)} = \frac{1}{7 - x}$ for $0 \le x \le 3$,

$\psi(d_6) = \frac{\mu(d_6.t, q.t)}{1 + \alpha \cdot \lambda(d_6, p)} = \frac{1}{3 + x}$ for $0 \le x \le 3$.

The two curves cross where $7 - x = 3 + x$, that is, at $x = 2$. This point lies 2 units from $n_7$ and 1 unit from $n_3$, which agrees with the position of $p_{se1}$ derived earlier.

Figure 6: Skyline graph for $k = 1$ on the road segment $(n_7, n_3)$. The horizontal axis is the distance $x$ from $n_7$ toward $n_3$, the vertical axis is the score, and the plotted curves are $\psi(d_6) = 1/(x + 3)$ and $\psi(d_3) = 1/(-x + 7)$, which cross at $p_{se1}$.

Finally, we present a lemma to prove that the safe exit points computed by COSK are correct.

Lemma 8. The COSK algorithm correctly computes a set of safe exit points.

Proof. We prove the correctness of the COSK algorithm by contradiction. Assume that $D^+_{p_a} \neq D^+_{n_\beta}$ and that there is no safe exit point in the road segment $(p_a, n_\beta)$. This means that, for each point p in the road segment $(p_a, n_\beta)$, the query result at p equals $D^+_{p_a}$, i.e., $D^+_p = D^+_{p_a}$ $\forall p \in (p_a, n_\beta)$. However, this leads to the contradiction that $D^+_{n_\beta} = D^+_{p_a}$ when $p = n_\beta$. Therefore, if $D^+_{p_a} \neq D^+_{n_\beta}$, a safe exit point exists in $(p_a, n_\beta)$. In addition, a safe exit point is determined using the skyline $D^+_l$ for answer objects and the skyline $D^-_h$ with the highest score for nonanswer objects when $D^+_{p_a} \neq D^+_{n_\beta}$. The first skyline is a composite polyline drawn from the answer objects in $D^+_{p_a}$. The second skyline is a composite polyline drawn from the nonanswer objects in $D^+_{n_\beta} \cup D(p_a, n_\beta) - D^+_{p_a}$.

6. Monitoring Query Results and Safe Regions in Dynamic Directed Road Networks

In this section, we discuss the monitoring of spatial keyword queries in dynamic road networks, where the network distance changes depending on the traffic conditions. Updates to the weights of some edges may invalidate the query results or the safe region of q even though the query object q remains within its safe region. Figure 7 illustrates an example of changing the weights of the edges $\overleftarrow{(n_1, n_2)}$ and $\overleftarrow{(n_1, n_6)}$. For convenience, we consider $\alpha = 1$ and $q.t$ = "Italian restaurant". In Figure 7(a), the top-1 result is $d_1$, and the bold lines show the safe region of query q. Now consider that, at time $t_j$, the weights of the two edges $\overleftarrow{(n_1, n_2)}$ and $\overleftarrow{(n_1, n_6)}$ change owing to heavy traffic, as shown in Figure 7(b). Such updates may invalidate the query result or the safe region of q. Therefore, it is necessary to monitor the validity of the results and the safe region when changes occur.

Next, we introduce a monitoring region to monitor the validity of the safe region effectively when the weight of an edge changes. The monitoring region MR contains all the points between the query point q and the lowest answer object and between q and the highest nonanswer object. Formally, it is defined as $MR = dist(q, D^+_l) \cup dist(q, D^-_h)$, where $dist(q, D^+_l)$ is the distance between q and the lowest answer object and $dist(q, D^-_h)$ is the distance between q and the highest nonanswer object. In the given example, $D^+_l = d_1$ and $D^-_h = \{d_2, d_3\}$. Therefore, the dotted lines in Figure 8(a) show the monitoring region of query object q.
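One way to materialize such a monitoring region is as the set of directed edges lying on shortest paths from q to the lowest answer object and to each highest nonanswer object. The following sketch takes that reading as an assumption; the paper defines MR only through the two distances. The function names, the modeling of data objects as graph nodes, and the toy network (which is not the network of Figure 8) are all hypothetical.

```python
import heapq
from typing import Dict, Iterable, List, Set, Tuple

Graph = Dict[str, List[Tuple[str, float]]]
Edge = Tuple[str, str]


def shortest_path_edges(graph: Graph, source: str, target: str) -> Set[Edge]:
    """Edges on one shortest directed path source -> target (empty if unreachable)."""
    dist: Dict[str, float] = {source: 0.0}
    prev: Dict[str, str] = {}
    heap: List[Tuple[float, str]] = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale entry
        if u == target:
            break  # the target's distance is final once it is popped
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    # Walk the predecessor chain back from the target to collect the path edges.
    edges: Set[Edge] = set()
    node = target
    while node in prev:
        edges.add((prev[node], node))
        node = prev[node]
    return edges


def monitoring_region(graph: Graph, q: str, lowest_answer: str,
                      highest_nonanswers: Iterable[str]) -> Set[Edge]:
    """Union of the path edges from q to D+_l and to every object in D-_h."""
    region = shortest_path_edges(graph, q, lowest_answer)
    for obj in highest_nonanswers:
        region |= shortest_path_edges(graph, q, obj)
    return region


# Hypothetical toy network with objects d1 and d2 modeled as nodes.
toy: Graph = {
    "q": [("n2", 2.0), ("n1", 5.0)],
    "n2": [("d1", 3.0), ("n3", 4.0)],
    "n3": [("d2", 1.0)],
    "n1": [("n6", 5.0)],
}
region = monitoring_region(toy, "q", "d1", ["d2"])
print(sorted(region))
```

Storing the region as a set of edges makes the later membership check for an updated edge a constant-time lookup.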

Now, at time $t_j$, the updates to the edges $\overleftarrow{(n_1, n_6)}$ and $\overleftarrow{(n_1, d_1)}$, which are not part of the monitoring region, can safely be ignored. However, the update on the segment $\overrightarrow{(n_2, d_1)}$, which is associated with the monitoring region, may nullify the results. As shown in Figure 8(b), after the update, the top-1 result becomes $d_2$, and the bold lines represent the new safe region of q.

Algorithm 5 monitors the validity of the result set and the safe region of query object q when the weight of any edge changes. Consider that the weight of edge $(n_i, n_j)$ changes at time $t_j$. First, the algorithm checks whether edge $(n_i, n_j)$ is associated with the monitoring region. If it is not part of the monitoring region, the algorithm simply ignores the update to edge $(n_i, n_j)$, and the query results and safe region remain valid. In contrast, if the edge is associated with the monitoring region (i.e., $MR \cap (n_i, n_j) \neq \emptyset$), the algorithm re-evaluates the query results. Consequently, the top-k results and the safe region of query q are updated. Finally, the algorithm updates the monitoring region of q.
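The control flow described above can be sketched as follows. This is a minimal illustration, not the authors' Java implementation: `QueryState`, `on_edge_update`, and the three callbacks are hypothetical stand-ins for the snapshot query evaluation, the safe exit computation, and the monitoring region computation, and the monitoring region is assumed to be stored as a set of directed edges.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple

Edge = Tuple[str, str]


@dataclass
class QueryState:
    """Per-query state kept by the server (hypothetical layout)."""
    top_k: List[str]
    safe_exits: List[str]
    monitoring_region: Set[Edge] = field(default_factory=set)


def on_edge_update(state: QueryState,
                   edge: Edge,
                   evaluate: Callable[[], List[str]],
                   compute_safe_exits: Callable[[List[str]], List[str]],
                   compute_region: Callable[[List[str]], Set[Edge]]) -> bool:
    """Return True if the state was recomputed, False if the update was ignored."""
    if edge not in state.monitoring_region:
        # The updated edge does not touch the monitoring region:
        # the results and the safe region remain valid.
        return False
    state.top_k = evaluate()                                # refresh top-k results
    state.safe_exits = compute_safe_exits(state.top_k)      # refresh safe exit points
    state.monitoring_region = compute_region(state.top_k)   # refresh monitoring region
    return True


# Stub callbacks that return fixed values, for illustration only.
state = QueryState(top_k=["d1"], safe_exits=["pse1"],
                   monitoring_region={("n2", "d1")})
stubs = dict(evaluate=lambda: ["d2"],
             compute_safe_exits=lambda top: ["pse_new"],
             compute_region=lambda top: {("n2", "d2")})
ignored = on_edge_update(state, ("n1", "n6"), **stubs)
recomputed = on_edge_update(state, ("n2", "d1"), **stubs)
print(ignored, recomputed, state.top_k)
```

Passing the three computations as callbacks keeps the sketch independent of how the snapshot query and the safe exit points are actually evaluated.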

7. Performance Evaluation

In this section, we evaluate the performance of COSK through simulation experiments. We describe our experimental settings in Section 7.1, and we present our experimental results for static and dynamic road networks in Sections 7.2 and 7.3, respectively.

7.1. Experimental Settings. All of our experiments were performed using real road networks, namely, Oldenburg, San Francisco, and San Joaquin. All three road networks were obtained from [27]. The original road network of San Francisco had 21,047 nodes and 21,692 edges. We reformatted the network, pruned approximately 30% of the nodes, and adjusted the edges and their weights accordingly. This resulted in a network with 14,732 nodes and 14,316 edges. Both the directions of the edges and the data objects on the edges were generated randomly. The description of each data object was extracted from Twitter messages [28], and we assigned one tweet per data object. Table 4 presents the characteristics of the datasets used in the experimental evaluation. We simulated moving query objects by using a spatiotemporal data generator [29]. The input to the generator was the road network of the dataset used, and the output was the set of query objects moving on the road network. Each experiment had 100 moving queries, which were continuously monitored for 100 timestamps (1 timestamp = 1 second), and the average result was reported in the experiments.

As a benchmark for COSK in static road networks, we implemented the algorithm of [22], which also continuously monitors moving top-k spatial keyword queries in road networks. However, this algorithm was originally designed for undirected road networks. To make a fair comparison, we modified it to process top-k spatial keyword queries in directed road networks and refer to the modified version as CMTkSK+. Specifically, we modified the distance computation between two points such that, in directed road networks, $dist(p_1, p_2) \neq dist(p_2, p_1)$. Since CMTkSK+ does not handle top-k spatial keyword queries in dynamic road networks, we compared the performance of COSK with a basic algorithm that recomputes the results whenever the query object changes its location. All algorithms were implemented in Java and were executed on a desktop PC with a 2.80-GHz Intel Core i5 and 8 GB of memory. In the experiments, we compared (1) query processing times, (2) edges processed, i.e., the number of edges processed for retrieving query results, and (3) index sizes. Table 5 summarizes the parameters used in the experiments. In each experiment, we varied a single parameter within the range shown in Table 5 while maintaining the other parameters at the bolded default values.

Figure 7: Updating the weight of edges in a dynamic road network, where $t_i < t_j$. (a) Safe region at time $t_i$. (b) Updating the weights of $\overleftarrow{(n_1, n_2)}$ and $\overleftarrow{(n_1, n_6)}$ at time $t_j$. The network contains nodes $n_1$–$n_6$, the query point q, and the data objects $d_1$ (Italian Restaurant), $d_2$ (Italian Restaurant), and $d_3$ (Chinese Restaurant).

Figure 8: Monitoring region and updated safe region at time $t_j$. (a) Monitoring region at time $t_i$. (b) New safe region at time $t_j$.

(1) Input: monitoring region MR, updated edge $(n_i, n_j)$
(2) Output: none
(3) if $MR \cap (n_i, n_j) = \emptyset$ then
(4)   /* edge $(n_i, n_j)$ is not part of the monitoring region */
(5)   ignore the change in the weight of edge $(n_i, n_j)$
(6) end
(7) $P_{SE} \leftarrow \emptyset$ /* set of safe exit points */
(8) else
(9)   $D^k_{upd} \leftarrow EvaluateSnapshotQuery(n_i, e_i)$ /* update set of top-k results */
(10)   $P_{SE_{upd}} \leftarrow ComputeSafeExit(P_a, n_\beta)$ /* update safe exit points */
(11)   $MR_{upd} \leftarrow ComputeMonitoringRegion(D^+_l, D^-_h)$ /* update monitoring region */
(12) end

Algorithm 5: MonitoringSafeRegion(MR, $(n_i, n_j)$).

Table 4: Summary of datasets.

Attribute | Oldenburg | San Francisco | San Joaquin
Total no. of nodes | 6,104 | 14,732 | 18,262
Total no. of edges | 7,034 | 14,316 | 23,876
Percentage of directed edges | 30 | 30 | 30
Total no. of objects | 5,627 | 11,453 | 19,098
Average no. of objects per edge | 0.8 | 0.8 | 0.8
Total no. of words | 49,517 | 103,649 | 166,153

Table 5: Experimental parameter settings.

Parameter | Range
Number of results (k) | 5, 10, 15, 20, 25
Number of keywords (n) | 1, 2, 3, 4, 5
Query parameter ($\alpha$) | 0.01, 0.1, 1, 10, 100
Dataset | Oldenburg, San Francisco, San Joaquin
Number of data objects ($N_D$) | 10, 20, 30, 40, 50 (×1000)
Speed of query objects ($V_{qry}$) | 25, 50, 75, 100, 125 (km/h)
Mobility ($M_{qry}$) | 20, 40, 60, 80, 100
Ratio of directed edges ($E_{dir}$) | 10, 20, 30, 40, 50
Ratio of updated edges ($E_{upd}$) | 15, 30, 60, 80, 100

We evaluated the performance of the algorithms by using the following measures: (1) the total amount of server CPU time, which indicates the query processing time, and (2) the total communication cost, measured as the total number of points (i.e., the location updates sent by query objects and the query results and safe exit points returned by the server) transferred between clients and the server. The battery power and wireless bandwidth consumption typically increase with the amount of data transferred between objects (clients) and servers. Thus, we used the amount of transferred data as a metric to evaluate the communication cost.

7.2. Experimental Results of Top-k Spatial Keyword Queries in Static Road Networks

7.2.1. Effect of k. Figure 9 shows the effect of the number of results on the query processing time and communication cost for both algorithms. Figure 9(a) indicates that the query processing time increases for both algorithms as the value of k increases. This is expected because, with an increase in k, more data objects must be explored and verified. Nevertheless, COSK significantly outperforms CMTkSK+ for two main reasons. First, the search for relevant objects is very efficient when using the highest significance factor, and second, COSK does not need to verify the set of answer objects as long as the query object lies in a safe region. On the other hand, the query processing time of CMTkSK+ increases significantly because it has to monitor and verify the set of candidate objects periodically. In Figure 9(b), the communication costs for both algorithms increase as the number of objects increases. However, the proposed algorithm demonstrates superior performance compared to CMTkSK+ because client-server communication is not required while the query object lies within the safe exit points, whereas in CMTkSK+ the query object is required to report its location to the server whenever it moves.

7.2.2. Effect of $N_D$. This experiment was conducted on the San Joaquin dataset. This dataset included 19,098 data objects; therefore, we randomly generated approximately 30,000 additional data objects on different edges. In Figure 10, we evaluate the performance of COSK and CMTkSK+ by varying the cardinality of the data objects. Note that $N_D = 10K$ corresponds to a low density of data points, while $N_D = 50K$ corresponds to a high density. In Figure 10(a), it is interesting to notice that the query processing times of both algorithms decrease as the cardinality of the data objects increases. For CMTkSK+, this is because the monitoring range of a query decreases at high density. For COSK, it is mainly because fewer edges need to be expanded when the data density is high, which decreases the query processing time. In Figure 10(b), we study the influence of the cardinality of the data objects on the communication costs. The experimental results indicate that CMTkSK+ incurs almost constant communication costs regardless of the data object cardinality. However, the communication costs of COSK increase in proportion to the $N_D$ value. This is expected because the safe region becomes smaller as the density of the data objects increases, which increases the communication costs.

7.2.3. Effect of Query Keywords (n). Figure 11 shows the query processing time and communication cost for COSK and CMTkSK+ as a function of the number of query keywords. Figures 11(a) and 11(b) show that the performance of both algorithms degrades when the number of keywords increases. This is mainly because, by increasing the number of query keywords, the number of relevant objects may also increase, resulting in a higher query processing time and communication cost. However, the safe-region-based algorithm COSK scales better than CMTkSK+ because of its less expensive monitoring technique.

7.2.4. Effect of $\alpha$. Figure 12 demonstrates the impact of the query parameter $\alpha$ on the query processing time and on the communication cost. A small value of $\alpha$ indicates a greater importance of textual relevance, whereas a high value of $\alpha$ gives more preference to spatial relevance. It is interesting to note that the query processing time is lower for higher values of $\alpha$, which give more importance to spatial relevance. This is mainly because, when the spatial relevance is weighted more heavily, fewer edges and objects need to be explored and processed to determine the top-k data objects. Observe that, in Figure 12(b), the number of messages sent by COSK decreases sharply with an increase in $\alpha$.

Figure 9: Effect of k on query processing time and communication cost. (a) Query processing time (s) versus k. (b) Communication cost (number of messages transferred) versus k. Both panels compare COSK and CMTkSK+.

Figure 10: Effect of $N_D$ on query processing time and communication cost. (a) Query processing time (s) versus $N_D$. (b) Communication cost (number of transferred messages) versus $N_D$. Both panels compare COSK and CMTkSK+.

7.2.5. Effect of Speed. Figure 13(a) demonstrates the influence of the speed of the query objects on the query processing time of the COSK and CMTkSK+ algorithms. The experimental results indicate that the performance of CMTkSK+ is not significantly influenced by the speed of the query objects because the candidate objects must be monitored at regular time intervals regardless of the speed. On the other hand, the performance of COSK gradually decreases as the speed of the query objects increases because the objects leave their respective safe regions more frequently. Figure 13(b) shows the communication costs of COSK and CMTkSK+ with respect to the speed of the query objects. CMTkSK+ incurs almost constant communication costs because a server-initiated request to verify the candidate objects does not depend on the speed. For COSK, the query objects cross safe regions more frequently when the speed is high, which increases the communication costs.

Figure 11: Effect of number of keywords on query processing time and communication cost. (a) Query processing time (s) versus number of keywords. (b) Communication cost (number of messages transferred) versus number of keywords. Both panels compare COSK and CMTkSK+.

Figure 12: Effect of $\alpha$ on query processing time and communication cost. (a) Query processing time (s) versus $\alpha$. (b) Communication cost (number of messages transferred) versus $\alpha$. Both panels compare COSK and CMTkSK+.

7.2.6. Effect of Mobility. Figure 14 shows the effect of the mobility $M_{qry}$ (mobility refers to the percentage of query objects that are moving at any timestamp) on the performance of the COSK and CMTkSK+ algorithms. As expected, the query processing time and communication costs for both algorithms increase with $M_{qry}$. Nevertheless, COSK performs better than CMTkSK+ in terms of query processing time and communication costs.

7.2.7. Effect of Directed Edges. Figure 15 shows the impact of the percentage of directed edges $E_{dir}$ on the performance of the COSK and CMTkSK+ algorithms. The query processing time increases with $E_{dir}$ because the algorithms need to explore more edges to retrieve the top-k results. However, the communication cost is not significantly affected by the value of $E_{dir}$ for either algorithm.

7.2.8. Effect of Datasets. Figure 16 shows the index sizes of the COSK and CMTkSK+ approaches for different datasets. As shown in Figure 16, both algorithms have similar index sizes. However, COSK has a minor space overhead because it stores additional information on the highest significance factor $\theta_t$ of edges. More importantly, this space overhead is minimal compared to the gain achieved by COSK in query processing time and communication costs.

Figure 13: Effect of speed on query processing time and communication cost. (a) Query processing time (s) versus $V_{qry}$. (b) Communication cost (number of messages transferred) versus $V_{qry}$. Both panels compare COSK and CMTkSK+.

Figure 14: Effect of mobility on query processing time and communication cost. (a) Query processing time (s) versus $M_{qry}$. (b) Communication cost (number of messages transferred) versus $M_{qry}$. Both panels compare COSK and CMTkSK+.

7.3. Experimental Results of Top-k Spatial Keyword Queries in Dynamic Road Networks. In this section, we evaluate the performance of COSK and the basic algorithm for dynamic road networks. $E_{upd}$ indicates the percentage of all edges that change their weight at each timestamp. The length of an updated edge is randomly selected between 0.1 and 10 times the original length. Figure 17(a) depicts the query processing time of COSK and the basic algorithm. It is evident from the figure that the query processing time of the basic algorithm is not significantly affected by $E_{upd}$. This is mainly because the query objects issue top-k spatial queries at each timestamp. However, the query processing time of COSK increases with the value of $E_{upd}$ because the probability that an updated edge is associated with the monitoring region of query q increases with $E_{upd}$. Therefore, when $E_{upd}$ becomes large, the results need to be updated frequently, which increases the query processing time. Figure 17(b) shows the communication costs of COSK and the basic algorithm with respect to $E_{upd}$. The basic algorithm incurs almost constant communication costs regardless of the value of $E_{upd}$. In contrast, the communication cost of COSK increases with $E_{upd}$ because the query result and safe regions need to be updated frequently.
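The update workload described in this section could be generated along the following lines. The sketch is an assumption, not the authors' generator: the function name `apply_random_updates` and the edge-weight dictionary are hypothetical, and the scaling factor is assumed to be drawn uniformly from [0.1, 10], whereas the text says only that the new length is "randomly selected" in that range.

```python
import random
from typing import Dict, Tuple

Edge = Tuple[str, str]


def apply_random_updates(original: Dict[Edge, float],
                         ratio: float,
                         rng: random.Random) -> Dict[Edge, float]:
    """Return new weights in which a `ratio` fraction of edges is rescaled.

    Each selected edge gets a length between 0.1 and 10 times its ORIGINAL
    length (uniform factor, which is an assumption of this sketch).
    """
    count = round(len(original) * ratio)
    # Sort the keys so that the selection is reproducible for a seeded rng.
    chosen = rng.sample(sorted(original), count)
    updated = dict(original)
    for edge in chosen:
        updated[edge] = original[edge] * rng.uniform(0.1, 10.0)
    return updated


# Hypothetical network of 20 edges with original length 2.0; E_upd = 30%.
base = {(f"n{i}", f"n{i + 1}"): 2.0 for i in range(20)}
new_weights = apply_random_updates(base, 0.30, random.Random(7))
changed = [e for e in base if new_weights[e] != base[e]]
print(len(changed))
```

Scaling from the original length rather than the current one keeps every weight within a fixed band across timestamps, which matches the wording in the text.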

Figure 15: Effect of $E_{dir}$ on query processing time and communication cost. (a) Query processing time (s) versus $E_{dir}$. (b) Communication cost (number of messages transferred) versus $E_{dir}$.

Figure 16: Effect of dataset on index size. Index size (MB) of COSK and CMTkSK+ for the Oldenburg, San Francisco, and San Joaquin datasets.

8. Conclusion

In this paper, we investigated moving top-k spatial keyword queries in directed and dynamic road networks. We presented an efficient indexing framework using inverted files that indexes the data objects on edges, allowing for the effective searching of data objects relevant to queries in terms of both textual and spatial relevance. We also presented a safe-exit-based algorithm, called COSK, to monitor moving top-k spatial keyword queries. We demonstrated that the query results remain valid as long as the query object resides within a safe region. Furthermore, COSK can effectively monitor the validity of query results and safe regions in dynamic road networks. Finally, an experimental evaluation conducted on real road networks demonstrated that COSK significantly reduced the query processing time and communication costs compared to the CMTkSK+ algorithm.

Figure 17: Effect of $E_{upd}$ on query processing time and communication cost. (a) Query processing time (s) versus $E_{upd}$. (b) Communication cost (number of messages transferred) versus $E_{upd}$. Both panels compare COSK and the basic algorithm.

Data Availability

The real road network data used in this study have also been used in many previous studies. The road network data are cited in the manuscript and are available at https://www.cs.utah.edu/~lifeifei/SpatialDataset.htm. To simulate the moving queries, the authors used a spatiotemporal data generator, which has also been used in previous studies. The research article describing the generator is cited in the manuscript. The documentation and source files of the generator are available at https://iapg.jade-hs.de/personen/brinkhoff/generator/. The authors used Twitter tweets to generate the descriptions of the data objects and the query keywords. The tweets used can be accessed at http://followthehashtag.com/datasets/free-twitter-dataset-usa-200000-free-usa-tweets/.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

Hyung-Ju Cho was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean Government (MSIP) (NRF-2016R1A2B4009793), and this research was partially supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2016R1D1A1B03934129).

References

[1] D. Papadias, N. Mamoulis, J. Zhang, and Y. Tao, "Query processing in spatial network databases," in Proceedings of the 29th International Conference on Very Large Data Bases (VLDB '03), pp. 802–813, September 2003.

[2] H.-J. Cho, K. Ryu, and T.-S. Chung, "An efficient algorithm for computing safe exit points of moving range queries in directed road networks," Information Systems, vol. 41, pp. 1–19, 2014.

[3] G. Tsatsanifos and A. Vlachou, "On processing Top-k spatio-textual preference queries," in Proceedings of the 18th International Conference on Extending Database Technology (EDBT '15), pp. 433–444, March 2015.

[4] R. Li, A. X. Liu, A. L. Wang, and B. Bruhadeshwar, "Fast range query processing with strong privacy protection for cloud computing," Proceedings of the VLDB Endowment, vol. 7, no. 14, pp. 1953–1964, 2014.

[5] G. Cong, C. S. Jensen, and D. Wu, "Efficient retrieval of the Top-k most relevant spatial web objects," Proceedings of the VLDB Endowment, vol. 2, no. 1, pp. 337–348, 2009.

[6] Z. Li, K. C. K. Lee, B. Zheng, W.-C. Lee, D. Lee, and X. Wang, "IR-tree: An efficient index for geographic document search," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 4, pp. 585–599, 2011.

[7] Y. Zhou, X. Xie, C. Wang, Y. Gong, and W. Ma, "Hybrid index structures for location-based web search," in Proceedings of the 14th ACM International Conference on Information and Knowledge Management, pp. 155–162, Bremen, Germany, October 2005.

[8] J. Zobel and A. Moffat, "Inverted files for text search engines," ACM Computing Surveys, vol. 38, no. 2, 2006.

[9] N. Beckmann, H. Kriegel, R. Schneider, and B. Seeger, "The R*-tree: an efficient and robust access method for points and rectangles," in Proceedings of the ACM SIGMOD International Conference on Management of Data, vol. 19, pp. 322–331, May 1990.

[10] R. Hariharan, B. Hore, C. Li, and S. Mehrotra, "Processing spatial-keyword (sk) queries in geographic information retrieval (gir) systems," in Proceedings of the 19th International Conference on Scientific and Statistical Database Management (SSDBM '07), July 2007.

[11] I. De Felipe, V. Hristidis, and N. Rishe, "Keyword search on spatial databases," in Proceedings of the 24th International Conference on Data Engineering (ICDE '08), pp. 656–665, April 2008.

[12] J. B. Rocha-Junior, O. Gkorgkas, S. Jonassen, and K. Nørvåg, "Efficient processing of top-k spatial keyword queries," in Proceedings of the International Symposium on Spatial and Temporal Databases, pp. 205–222, Springer, 2011.

[13] D. Zhang, K.-L. Tan, and A. K. Tung, "Scalable top-k spatial keyword search," in Proceedings of the 16th International Conference on Extending Database Technology, pp. 359–370, 2013.

[14] J. B. Rocha-Junior and K. Nørvåg, "Top-k spatial keyword queries on road networks," in Proceedings of the 15th International Conference on Extending Database Technology, pp. 168–179, Berlin, Germany, March 2012.

[15] H.-J. Cho, S. J. Kwon, and T.-S. Chung, "A safe exit algorithm for continuous nearest neighbor monitoring in road networks," Mobile Information Systems, vol. 9, no. 1, pp. 37–53, 2013.

[16] D. Yung, M. L. Yiu, and E. Lo, "A safe-exit approach for efficient network-based moving range queries," Data & Knowledge Engineering, vol. 72, pp. 126–147, 2012.

[17] M. Attique, H. Cho, R. Jin, and T. Chung, "Efficient Processing of Continuous Reverse k Nearest Neighbor on Moving Objects in Road Networks," ISPRS International Journal of Geo-Information, vol. 5, no. 12, p. 247, 2016.

[18] H. G. Elmongui, M. F. Mokbel, and W. G. Aref, "Continuous aggregate nearest neighbor queries," GeoInformatica, vol. 17, no. 1, pp. 63–95, 2013.

[19] D. Wu, M. L. Yiu, C. S. Jensen, and G. Cong, "Efficient continuously moving top-k spatial keyword query processing," in Proceedings of the IEEE International Conference on Data Engineering (ICDE '11), pp. 541–552, Hannover, Germany, April 2011.

[20] W. Huang, G. Li, K.-L. Tan, and J. Feng, "Efficient safe-region construction for moving top-k spatial keyword queries," in Proceedings of the 21st ACM International Conference on Information and Knowledge Management, pp. 932–941, 2012.

[21] L. Guo, J. Shao, H. H. Aung, and K.-L. Tan, "Efficient continuous top-k spatial keyword queries on road networks," GeoInformatica, vol. 19, no. 1, pp. 29–60, 2014.

[22] Y. Li, G. Li, L. Shu, Q. Huang, and H. Jiang, "Continuous monitoring of top-k spatial keyword queries in road networks," Journal of Information Science and Engineering, vol. 31, no. 6, pp. 1831–1848, 2015.

[23] M. Attique, A. Khan, and T.-S. Chung, "ESPAK: Top-k spatial keyword query processing in directed road networks," in Proceedings of the Workshops of the International Conference on Extending Database Technology and the International Conference on Database Theory (EDBT/ICDT '17), March 2017.

[24] G. Salton and C. Buckley, "Term-weighting approaches in automatic text retrieval," Information Processing & Management, vol. 24, no. 5, pp. 513–523, 1988.

[25] V. N. Anh, O. de Kretser, and A. Moffat, "Vector-space ranking with effective early termination," in Proceedings of the 24th Annual International ACM SIGIR Conference, pp. 35–42, New Orleans, LA, USA, 2001.

[26] E. W. Dijkstra, "A note on two problems in connexion with graphs," Numerische Mathematik, vol. 1, pp. 269–271, 1959.

[27] "Real datasets for spatial databases," https://www.cs.utah.edu/~lifeifei/SpatialDataset.htm.

[28] "Twitter," https://twitter.com.

[29] T. Brinkhoff, "A framework for generating network-based moving objects," GeoInformatica, vol. 6, no. 2, pp. 153–180, 2002.


(1) Input: Same as Algorithm 1
(2) Output: $P_{SE}$, a set of safe exit points
(3) $P_{SE} \leftarrow \emptyset$ /* set of safe exit points */
(4) $D^+_{p_a} \leftarrow EvaluateSnapshotQuery(p_a, (p_a, n_\beta))$
(5) /* Results calculated using Algorithm 1 */
(6) $D^+_{n_\beta} \leftarrow EvaluateSnapshotQuery(n_\beta, (p_a, n_\beta))$
(7) /* Results calculated using Algorithm 1 */
(8) if $D^+_{p_a} = D^+_{n_\beta}$ then
(9)   no safe exit point /* refer to Observation 1 */
(10) end
(11) if $D^+_{p_a} \neq D^+_{n_\beta}$ then
(12)   $P_{SE} \leftarrow P_{SE} \cup ComputeSafeExit(p_a, n_\beta)$ /* safe exit point exists; refer to Observation 2 */
(13) end
(14) return $P_{SE}$

Algorithm 3: COSK monitoring algorithm.
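The control flow of Algorithm 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `monitor_segment`, `TOP1`, and `EXITS` are hypothetical names, the snapshot evaluation (Algorithm 1) is replaced by a lookup of the top-1 answers listed in Table 3, and the safe exit computation (Algorithm 4) is replaced by a lookup of the exit labels in the same table.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple


def monitor_segment(
    p_a: str,
    n_b: str,
    evaluate_snapshot: Callable[[str], FrozenSet[str]],
    compute_safe_exit: Callable[[str, str], str],
) -> List[str]:
    """Safe exit points on the segment (p_a, n_b), following Algorithm 3."""
    safe_exits: List[str] = []
    answers_at_start = evaluate_snapshot(p_a)  # top-k answer set at p_a
    answers_at_end = evaluate_snapshot(n_b)    # top-k answer set at n_b
    if answers_at_start != answers_at_end:
        # The answer set changes somewhere on the segment (Observation 2).
        safe_exits.append(compute_safe_exit(p_a, n_b))
    # Equal answer sets: no safe exit point on this segment (Observation 1).
    return safe_exits


# Top-1 answers at the segment endpoints, copied from Table 3.
TOP1: Dict[str, FrozenSet[str]] = {
    "q": frozenset({"d3"}), "n3": frozenset({"d3"}), "n4": frozenset({"d3"}),
    "n5": frozenset({"d3"}), "n6": frozenset({"d4"}), "n7": frozenset({"d6"}),
}
# Stand-in for Algorithm 4: look up the exit labels listed in Table 3.
EXITS: Dict[Tuple[str, str], str] = {("n3", "n7"): "pse1", ("n5", "n6"): "pse2"}

for start, end in [("q", "n3"), ("n3", "n7"), ("n3", "n5"), ("n5", "n6")]:
    exits = monitor_segment(start, end, TOP1.__getitem__, lambda a, b: EXITS[(a, b)])
    print((start, end), exits)
```

Only the segments whose endpoints have different answer sets trigger the more expensive safe exit computation, which is the point of the two observations.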

(1) Input: same as Algorithm 1
(2) Output: $p_{se}$, safe exit point in $(p_a, n_\beta)$
(3) $D^+_l \leftarrow \langle p, D^+_l \rangle$ | for each point $p \in [p_a, n_\beta]$, $d^+_l$ such that $\psi(d^+_l)_p = \min(\psi(d^+_1)_p, \psi(d^+_2)_p, \ldots, \psi(d^+_{|d^+|})_p)$
(4) $D^-_h \leftarrow \langle p, D^-_h \rangle$ | for each point $p \in [p_a, n_\beta]$, $d^-_h$ such that $\psi(d^-_h)_p = \max(\psi(d^-_1)_p, \psi(d^-_2)_p, \ldots, \psi(d^-_{|d^-|})_p)$
(5) if Case 1 then
(6)   if $dist(p_a, n_\beta) = dist(n_\beta, p_a)$ then
(7)     $p_{se}$ = point $se$ such that $\max(dist(se, d^+_1), dist(se, d^+_2), \ldots, dist(se, d^+_{|d^+|})) = \min(dist(se, d^-_1), dist(se, d^-_2), \ldots, dist(se, d^-_{|d^-|}))$
(8)   end
(9)   if $dist(p_a, n_\beta) \neq dist(n_\beta, p_a)$ then
(10)     $p_{se} = p_a$ or $p_{se} = d^+_l$, where $d^+_l \in (p_a, n_\beta)$
(11)   end
(12) end
(13) if Case 2 then
(14)   if $dist(p_a, n_\beta) = dist(n_\beta, p_a)$ then
(15)     $p_{se}$ = closest point to $p_a$ such that $\psi(d^-_h) > \psi(d^+_l)$
(16)   end
(17)   if $dist(p_a, n_\beta) \neq dist(n_\beta, p_a)$ then
(18)     Same as Line (10)
(19)   end
(20) end
(21) return $p_{se}$

Algorithm 4: ComputeSafeExit($p_a$, $n_\beta$).

Therefore, according to Case 1, the safe exit point $p_{se1}$ is the midpoint between $d_3$ and $d_6$. That is, $dist(p_{se1}, d_3) = dist(p_{se1}, d_6)$, where $dist(p_{se1}, d_3) = x + 3$ and $dist(p_{se1}, d_6) = -x + 5$ for $0 < x < 3$. Consequently, $x = 1$, which means that the distance from $n_3$ to $p_{se1}$ is 1.

Next, we determine a safe exit point in $(n_3, n_5)$. As shown in Table 3, the answer object at $n_5$ is the same as that at $n_3$. Hence, no safe exit point exists in this edge. Next, $\overleftarrow{(n_6, n_5)}$ is explored with $p_a = n_5$. According to Table 3, $D^+_{n_6} = d_4$ and $D^+_{n_5} = d_3$. Therefore, a safe exit point exists in this edge. This edge is directed, and for each point $p \in \overleftarrow{(n_6, n_5)}$, the shortest path from $p$ to $d_3$ is $p \rightarrow n_6 \rightarrow n_2 \rightarrow n_3 \rightarrow d_3$. Therefore, $n_5$ is the safe exit point.

The bold lines in Figure 5 indicate the safe region of q. The top-1 result remains $d_3$ as long as the query q lies in the safe region.

Table 3: Computation of safe exit points for example scenario.

| Edge/Segment | $p_a$ | $D^+_{p_a}$ | $D^+_{n_\beta}$ | $p_{se}$ |
|---|---|---|---|---|
| $\overrightarrow{(q, n_3)}$ | q | $D^+_q = d_3$ | $D^+_{n_3} = d_3$ | none |
| $(n_3, n_4)$ | q | $D^+_{n_3} = d_3$ | $D^+_{n_4} = d_3$ | none |
| $(n_3, n_7)$ | $n_3$ | $D^+_{n_3} = d_3$ | $D^+_{n_7} = d_6$ | $p_{se1}$ |
| $\overrightarrow{(n_3, n_5)}$ | $n_3$ | $D^+_{n_3} = d_3$ | $D^+_{n_5} = d_3$ | none |
| $\overleftarrow{(n_6, n_5)}$ | $n_5$ | $D^+_{n_5} = d_3$ | $D^+_{n_6} = d_4$ | $p_{se2}$ |

Figure 5: Illustration of safe region of q.

Next, we analyze the time complexity for determining a set of safe exit points using a set of qualifying objects $d \in D^+_{p_a} \cup D^+_{n_\beta} \cup D(p_a, n_\beta)$. Note that $D^+_{p_a}$ ($D^+_{n_\beta}$) indicates the set of k data objects that satisfies the query condition at $p_a$ ($n_\beta$). According to Dijkstra's algorithm [26], the time complexity $O(D^+_q)$ for computing a set of answer objects at a query point q is $O(D^+_q) = O(|E| + |N| \log |N|)$. This means that $O(D^+_{p_a}) = O(D^+_{n_\beta}) = O(|E| + |N| \log |N|)$ holds for endpoints $p_a$ and $n_\beta$. Thus, the time complexity $O(\Omega_{kth})$ when determining the skyline $\Omega_{kth}$ with the k-th highest score is $O(\Omega_{kth}) = C_{kth} \cdot O(|D^+_{p_a} \cup D^+_{n_\beta} \cup D(p_a, n_\beta)|)$, where $C_{kth}$ is the number of qualifying objects that participate in the constitution of the skyline with the k-th highest score. Therefore, the time complexity of determining a safe exit point coincides with the time complexity of determining the two skylines, i.e., the skyline $D^+_l$ with the k-th highest (or lowest) score for answer objects and the skyline $D^-_h$ with the highest score for nonanswer objects. This is because the safe exit point is found at the cross point between these skylines.
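The snapshot cost above rests on a single-source shortest-path search over directed edges. A minimal sketch with Python's `heapq` is given below; the function name `directed_distances` and the three-node toy graph are assumptions made for illustration and are not taken from the paper. Note that a binary heap yields $O((|E| + |N|) \log |N|)$, whereas the $O(|E| + |N| \log |N|)$ bound quoted above corresponds to a Fibonacci-heap implementation of Dijkstra's algorithm.

```python
import heapq
from typing import Dict, List, Tuple

# Directed adjacency list: node -> [(successor, edge weight), ...]
Graph = Dict[str, List[Tuple[str, float]]]


def directed_distances(graph: Graph, source: str) -> Dict[str, float]:
    """Shortest network distance from source to every reachable node."""
    dist: Dict[str, float] = {source: 0.0}
    heap: List[Tuple[float, str]] = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for v, w in graph.get(u, []):
            candidate = d + w
            if candidate < dist.get(v, float("inf")):
                dist[v] = candidate
                heapq.heappush(heap, (candidate, v))
    return dist


# Hypothetical one-way triangle a -> b -> c -> a.
toy: Graph = {"a": [("b", 2.0)], "b": [("c", 1.0)], "c": [("a", 4.0)]}
print(directed_distances(toy, "a"))       # distances measured from a
print(directed_distances(toy, "b")["a"])  # b must travel via c to reach a
```

In the toy graph the distance from a to b is 2 while the distance from b to a is 5, which is the asymmetry that distinguishes directed from undirected road networks.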

Figure 6 represents the skyline graph for $k = 1$ in an edge $(n_7, n_3)$. Let us draw the score function for $d_3$ and $d_6$ for the road segment $(n_7, n_3)$, where a safe exit point exists. This is because $D^+_{n_3} = d_3$ and $D^+_{n_7} = d_6$ for $k = 1$. For each point $p \in (n_7, n_3)$, the distance between $d_3$ and point p can be represented as $dist(d_3, p) = dist(d_3, n_3) + len(n_3, p) = 6 - len(n_7, p)$. Similarly, for each point $p \in (n_7, n_3)$, the distance between $d_6$ and point p can be represented as $dist(d_6, p) = dist(d_6, n_7) + len(n_7, p) = 2 + len(n_7, p)$. Let $len(n_7, p)$ be a variable x ($0 \le x \le 3$). We can write $\lambda(d_3, p) = dist(d_3, p) = 6 - x$ and $\lambda(d_6, p) = dist(d_6, p) = 2 + x$. Then, we can represent the score functions $\psi(d_3)$ and $\psi(d_6)$ as follows:

$\psi(d_3) = \dfrac{\mu(d_3.t, q.t)}{1 + \alpha \cdot \lambda(d_3, p)} = \dfrac{1}{7 - x}$ for $0 \le x \le 3$

$\psi(d_6) = \dfrac{\mu(d_6.t, q.t)}{1 + \alpha \cdot \lambda(d_6, p)} = \dfrac{1}{3 + x}$ for $0 \le x \le 3$

Figure 6: Skyline graph for $k = 1$ on the road segment $(n_7, n_3)$. The horizontal axis is the distance x from $n_7$ toward $n_3$ and the vertical axis is the score; the plotted curves are $\psi(d_6) = 1/(x + 3)$ and $\psi(d_3) = 1/(-x + 7)$, which cross at $p_{se1}$.

Finally, we present the lemma to prove that safe exit points computed by COSK are correct.

Lemma 8. The COSK algorithm correctly computes a set of safe exit points.

Proof. We will prove the correctness of the COSK algorithm by contradiction. We assume that if $D^+_{p_a} \neq D^+_{n_\beta}$, there is no safe exit point in a road segment $(p_a, n_\beta)$. This means that for each point p in the road segment $(p_a, n_\beta)$, the query result at p equals $D^+_{p_a}$, i.e., $D^+_p = D^+_{p_a}$, $\forall p \in (p_a, n_\beta)$. However, it leads to a contradiction that $D^+_{n_\beta} = D^+_{p_a}$ when $p = n_\beta$. Therefore, if $D^+_{p_a} \neq D^+_{n_\beta}$, a safe exit point exists in $(p_a, n_\beta)$. In addition, a safe exit point is determined using the skyline $D^+_l$ for answer objects and the skyline $D^-_h$ with the highest score for nonanswer objects when $D^+_{p_a} \neq D^+_{n_\beta}$. The first skyline is a composite polyline drawn from answer objects in $D^+_{p_a}$. The second skyline is a composite polyline drawn from nonanswer objects in $D^+_{n_\beta} \cup D(p_a, n_\beta) - D^+_{p_a}$.
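For $k = 1$, the cross point of the two skylines reduces to the intersection of two score curves, which has a closed form when each network distance is linear in the offset x along the edge. The sketch below works this out for the Figure 6 example. The helper names `score` and `score_crossing` are hypothetical, and the linear distance model $dist_i(x) = a_i + b_i x$ is an assumption that holds on a single edge as in the example; the score form $\psi = \mu / (1 + \alpha \cdot \lambda)$ is the one used in the text.

```python
from typing import Optional


def score(mu: float, alpha: float, dist: float) -> float:
    """Ranking score psi = mu / (1 + alpha * dist), as in the Figure 6 example."""
    return mu / (1.0 + alpha * dist)


def score_crossing(mu1: float, a1: float, b1: float,
                   mu2: float, a2: float, b2: float,
                   alpha: float, length: float) -> Optional[float]:
    """Offset x in [0, length] where both objects score equally, else None.

    Object i is at network distance a_i + b_i * x from the point at offset x.
    Setting mu1 / (1 + alpha*(a1 + b1*x)) equal to mu2 / (1 + alpha*(a2 + b2*x))
    and cross-multiplying gives a linear equation in x.
    """
    denominator = alpha * (mu1 * b2 - mu2 * b1)
    if denominator == 0:
        return None  # parallel curves: they never cross (or coincide everywhere)
    x = (mu2 * (1.0 + alpha * a1) - mu1 * (1.0 + alpha * a2)) / denominator
    return x if 0.0 <= x <= length else None


# Figure 6 values: dist(d3, p) = 6 - x, dist(d6, p) = 2 + x, mu = alpha = 1.
x_cross = score_crossing(1.0, 6.0, -1.0, 1.0, 2.0, 1.0, alpha=1.0, length=3.0)
print(x_cross, score(1.0, 1.0, 6.0 - x_cross), score(1.0, 1.0, 2.0 + x_cross))
```

The crossing lies at x = 2 measured from $n_7$, i.e., at distance 1 from $n_3$ on the edge of length 3, which agrees with the position of $p_{se1}$ derived earlier from the distance equation.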

6. Monitoring Query Results and Safe Regions in Dynamic Directed Road Networks

In this section, we discuss the monitoring of spatial keyword queries in dynamic road networks, where the network distance changes depending on the traffic conditions. Updates to the weights of some edges may invalidate the query results or the safe region of q even though the query object q remains within its safe region. Figure 7 illustrates an example of changing the weights of edges $\overleftarrow{(n_1, n_2)}$ and $\overleftarrow{(n_1, n_6)}$. For convenience, we consider $\alpha = 1$ and $q.t$ = “Italian restaurant”. In Figure 7(a), the top-1 result is $d_1$, and bold lines show the safe region of query q. Now consider that, at time $t_j$, the weights of the two edges $\overleftarrow{(n_1, n_2)}$ and $\overleftarrow{(n_1, n_6)}$ change due to heavy traffic conditions, as shown in Figure 7(b). The update in the weight of edges may invalidate the query result or the safe region of q. Therefore, it is necessary to monitor the validity of the results and the safe region when such changes occur.

Next, we introduce a monitoring region to monitor the validity of the safe region effectively when the weight of an edge is changed. The monitoring region MR contains all the points between the query point q and the lowest answer object and between q and the highest nonanswer object. Formally, it is defined as $MR = dist(q, D^+_l) \cup dist(q, D^-_h)$, where $dist(q, D^+_l)$ corresponds to the shortest path between q and the lowest answer object and $dist(q, D^-_h)$ corresponds to the shortest path between q and the highest nonanswer object. In the given example, $D^+_l = d_1$ and $D^-_h = d_2, d_3$. Therefore, the dotted lines in Figure 8(a) show the monitoring region of query object q.
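One way to materialize a monitoring region is as the set of directed edges lying on the shortest paths from q to the two boundary objects. The sketch below takes that reading, which is an interpretation of the definition above rather than the authors' implementation. The names `shortest_path_edges` and `monitoring_region` and the toy network are hypothetical, and data objects are treated as graph nodes for simplicity.

```python
import heapq
from typing import Dict, List, Set, Tuple

Graph = Dict[str, List[Tuple[str, float]]]
Edge = Tuple[str, str]


def shortest_path_edges(graph: Graph, source: str, target: str) -> Set[Edge]:
    """Directed edges on one shortest path from source to target (empty if none)."""
    dist: Dict[str, float] = {source: 0.0}
    prev: Dict[str, str] = {}
    heap: List[Tuple[float, str]] = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale entry
        if u == target:
            break  # target settled, its predecessor chain is final
        for v, w in graph.get(u, []):
            candidate = d + w
            if candidate < dist.get(v, float("inf")):
                dist[v] = candidate
                prev[v] = u
                heapq.heappush(heap, (candidate, v))
    edges: Set[Edge] = set()
    node = target
    while node in prev:  # walk the predecessor chain back to the source
        edges.add((prev[node], node))
        node = prev[node]
    return edges


def monitoring_region(graph: Graph, q: str, lowest_answer: str,
                      highest_nonanswer: str) -> Set[Edge]:
    """Union of the shortest paths from q to the two boundary objects."""
    return (shortest_path_edges(graph, q, lowest_answer)
            | shortest_path_edges(graph, q, highest_nonanswer))


# Hypothetical network: the direct edge q -> n2 is longer than q -> n1 -> n2.
toy: Graph = {
    "q": [("n1", 1.0), ("n2", 4.0)],
    "n1": [("d1", 2.0), ("n2", 1.0)],
    "n2": [("d2", 1.0)],
}
mr = monitoring_region(toy, "q", "d1", "d2")
print(sorted(mr))
```

Edges outside the returned set, such as (q, n2) in the toy network, are the ones whose weight updates can be ignored without re-evaluating the query.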

Now, at time $t_j$, the updates to edges $\overleftarrow{(n_1, n_6)}$ and $\overleftarrow{(n_1, d_1)}$, which are not part of the monitoring region, can safely be ignored. However, the update on segment $\overrightarrow{(n_2, d_1)}$, which is associated with the monitoring region, may nullify the results. As shown in Figure 8(b), after the update, the top-1 result becomes $d_2$, and bold lines represent the new safe region of q.

Algorithm 5 monitors the validity of the result set and safe region of query object q when the weight of any edge changes. Let us consider that the weight of edge $(n_i, n_j)$ changes at time $t_j$. First, the algorithm checks whether edge $(n_i, n_j)$ is associated with the monitoring region. If it is not part of the monitoring region, then the algorithm simply ignores the update in edge $(n_i, n_j)$, and the query results and safe region remain valid. In contrast, if the edge is associated with the monitoring region (i.e., $MR \cap (n_i, n_j) \neq \emptyset$), then the algorithm re-evaluates the query results. Consequently, the top-k results and safe region of query q need to be updated. Finally, the algorithm updates the monitoring region of q.
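The update handling described above amounts to a membership test followed by an optional re-evaluation. A minimal sketch follows, assuming the monitoring region is stored as a set of edges. `QueryState`, `on_weight_update`, and the `reevaluate` callback (standing in for Algorithm 1, Algorithm 4, and the monitoring region computation) are hypothetical names, and the edge sets and exit labels are illustrative values loosely modeled on the Figure 8 example.

```python
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple

Edge = Tuple[str, str]


@dataclass
class QueryState:
    results: List[str]            # current top-k answer objects
    safe_exits: List[str]         # current safe exit points
    monitoring_region: Set[Edge]  # edges whose weight affects the result


def on_weight_update(state: QueryState, edge: Edge,
                     reevaluate: Callable[[], QueryState]) -> bool:
    """Return True if the update forced a re-evaluation, False if it was ignored."""
    if edge not in state.monitoring_region:
        return False  # results and safe region remain valid
    fresh = reevaluate()  # recompute results, safe exits, and monitoring region
    state.results = fresh.results
    state.safe_exits = fresh.safe_exits
    state.monitoring_region = fresh.monitoring_region
    return True


state = QueryState(["d1"], ["pse1", "pse2", "pse3"], {("q", "n2"), ("n2", "d1")})
after_update = lambda: QueryState(["d2"], ["pse1", "pse2", "pse3"],
                                  {("q", "n2"), ("n2", "d2")})

print(on_weight_update(state, ("n1", "n6"), after_update), state.results)
print(on_weight_update(state, ("n2", "d1"), after_update), state.results)
```

Because the membership test is a constant-time set lookup, edges far from the query cost almost nothing to process, and only updates that touch the monitoring region pay for a full snapshot evaluation.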

7. Performance Evaluation

In this section, we evaluate the performance of COSK through simulation experiments. We describe our experimental settings in Section 7.1, and we present our experimental results for static and dynamic road networks in Sections 7.2 and 7.3, respectively.

7.1. Experimental Settings. All of our experiments were performed using real road networks, namely, Oldenburg, San Francisco, and San Joaquin. All three road networks were obtained from [27]. The original road network of San Francisco had 21,047 nodes and 21,692 edges. We reformatted the network, pruned approximately 30% of the nodes, and adjusted the edges and their weights accordingly. This resulted in a network with 14,732 nodes and 14,316 edges. Both the direction of edges and the data objects on the edges were generated randomly. The description of each data object was extracted from Twitter messages [28], and we assigned one tweet per data object. Table 4 presents the characteristics of the data sets used in the experimental evaluation. We simulated moving query objects by using a spatiotemporal data generator [29]. The input to the generator was the road network of the data set used, and the output was the set of query objects moving on the road network. Each experiment had 100 moving queries, which were continuously monitored for 100 timestamps (1 timestamp = 1 second), and the average result was reported in the experiments.

As a benchmark for COSK in static road networks, we implemented the CMTkSK+ algorithm [22], which also continuously monitors moving top-k spatial keyword queries in road networks. However, this algorithm was originally designed for undirected road networks. To make a fair comparison, we modified it to process top-k spatial keyword queries in directed road networks; CMTkSK+ denotes this modified version. Specifically, we modified the distance computation method between two points such that, in directed road networks, $dist(p_1, p_2) \neq dist(p_2, p_1)$. Since CMTkSK+ does not handle top-k spatial keyword queries in dynamic road networks, we compared the performance of COSK with a basic algorithm, which recomputes the results whenever the query object changes its location. All algorithms were implemented in Java and were executed on a desktop PC with a 2.80-GHz Intel Core i5 and 8 GB of memory. In the experiments, we compared (1) query processing times, (2) edges processed, i.e., the number of edges processed for retrieving query results, and (3) index sizes. Table 5 summarizes the parameters used in the experiments. In each experiment, we varied a single parameter within the range that is shown in Table 5 while maintaining the other parameters at the bolded default values.

Figure 7: Updating the weight of edges in a dynamic road network, where $t_i < t_j$. (a) Safe region at time $t_i$. (b) Updating weight of $\overleftarrow{(n_1, n_2)}$ and $\overleftarrow{(n_1, n_6)}$ at time $t_j$.

Figure 8: Monitoring region and updated safe region at time $t_j$. (a) Monitoring region at time $t_i$. (b) New safe region at time $t_j$.

(1) Input: monitoring region MR, updated edge $(n_i, n_j)$
(2) Output: none
(3) if $MR \cap (n_i, n_j) = \emptyset$ then
(4)   /* edge $(n_i, n_j)$ is not part of monitoring region */
(5)   ignore the change in the weight of edge $(n_i, n_j)$
(6) else
(7)   $P_{SE} \leftarrow \emptyset$ /* set of safe exit points */
(8)   $D^k_{upd} \leftarrow EvaluateSnapshotQuery(n_i, e_i)$ /* update set of top-k results */
(9)   $P_{SE_{upd}} \leftarrow ComputeSafeExit(p_a, n_\beta)$ /* update safe exit points */
(10)   $MR_{upd} \leftarrow ComputeMonitoringRegion(D^+_l, D^-_h)$ /* update monitoring region */
(11) end

Algorithm 5: MonitoringSafeRegion(MR, $(n_i, n_j)$).

Table 4: Summary of datasets.

| Attribute | Oldenburg | San Francisco | San Joaquin |
|---|---|---|---|
| Total no. of nodes | 6,104 | 14,732 | 18,262 |
| Total no. of edges | 7,034 | 14,316 | 23,876 |
| Percentage of directed edges | 30% | 30% | 30% |
| Total no. of objects | 5,627 | 11,453 | 19,098 |
| Average no. of objects per edge | 0.8 | 0.8 | 0.8 |
| Total no. of words | 49,517 | 103,649 | 166,153 |

Table 5: Experimental parameter settings.

| Parameter | Range |
|---|---|
| Number of results (k) | 5, 10, 15, 20, 25 |
| Number of keywords (n) | 1, 2, 3, 4, 5 |
| Query parameter ($\alpha$) | 0.01, 0.1, 1, 10, 100 |
| Dataset | Oldenburg, San Francisco, San Joaquin |
| Number of data objects ($N_D$) | 10, 20, 30, 40, 50 (×1000) |
| Speed of query objects ($V_{qry}$) | 25, 50, 75, 100, 125 (km/h) |
| Mobility ($M_{qry}$) | 20, 40, 60, 80, 100 (%) |
| Ratio of directed edges ($E_{dir}$) | 10, 20, 30, 40, 50 (%) |
| Ratio of updated edges ($E_{upd}$) | 15, 30, 60, 80, 100 (%) |

We evaluated the performance of the algorithms by using the following measures: (1) total amount of server CPU time, which indicates the query processing time, and (2) total communication cost, measured as the total number of points (i.e., the location updates sent by query objects and the query results and safe exit points returned by the server) transferred between clients and the server. The battery power and wireless bandwidth consumption typically increase with the amount of data transferred between objects (clients) and servers. Thus, we used the amount of transferred data as a metric to evaluate the communication cost.

7.2. Experimental Results of Top-k Spatial Keyword Queries in Static Road Networks

7.2.1. Effect of k. Figure 9 indicates the effect of the number of results on the query processing time and communication cost for both algorithms. Figure 9(a) indicates that the query processing time increases for both algorithms as the value of k increases. This is expected because, with an increase in k, more data objects are required to be explored and verified. Nevertheless, COSK significantly outperforms CMTkSK+ for two main reasons. First, a relevant object search is very efficient when using the highest significance factor, and second, COSK does not need to verify the set of answer objects as long as the query object lies in a safe region. On the other hand, the CMTkSK+ query processing time increases significantly because it has to monitor and verify the set of candidate objects periodically. In Figure 9(b), the communication costs for both algorithms increase as the number of result objects k increases. However, the proposed algorithm demonstrates superior performance compared to CMTkSK+ because client-server communication is not required when the query object lies within the safe exit points, whereas in CMTkSK+ the query object is required to report its location to the server whenever it moves.

7.2.2. Effect of $N_D$. This experiment was conducted on the San Joaquin dataset. This dataset included 19,098 data objects; therefore, we randomly generated approximately 30,000 additional data objects on different edges. In Figure 10, we evaluate the performance of COSK and CMTkSK+ by varying the cardinality of the data objects. Note that $N_D = 10K$ corresponds to a low density of data points, while $N_D = 50K$ corresponds to a high density. In Figure 10(a), it is interesting to notice that the query processing times of both algorithms decrease as the cardinality of the data objects increases. For CMTkSK+, this is because, with high density, the monitoring range of a query decreases. However, for COSK, it is mainly because, when the data density is high, fewer edges are required to be expanded, which decreases the query processing time. In Figure 10(b), we study the influence of the cardinality of the data objects on the communication costs. The experimental results indicate that CMTkSK+ incurs almost constant communication costs regardless of data object cardinality. However, the communication costs of COSK increase in proportion to the $N_D$ value. This is expected because the safe region becomes smaller as the density of the data objects increases, which increases the communication costs.

7.2.3. Effect of Query Keywords (n). Figure 11 shows the query processing time and communication cost for COSK and CMTkSK+ as a function of the number of query keywords. Figures 11(a) and 11(b) show the trend that the performance of both algorithms degrades when the number of keywords increases. This is mainly because, by increasing the number of query keywords, the number of relevant objects may also increase, resulting in a higher query processing time and communication cost. However, the safe-region-based algorithm COSK scales better than CMTkSK+ because of its less expensive monitoring technique.

Figure 9: Effect of k on query processing time and communication cost. (a) Query processing time (s) versus k. (b) Communication cost (number of messages transferred) versus k.

Figure 10: Effect of $N_D$ on query processing time and communication cost. (a) Query processing time (s) versus $N_D$. (b) Communication cost (number of transferred messages) versus $N_D$.

7.2.4. Effect of $\alpha$. Figure 12 demonstrates the impact of query parameter $\alpha$ on the query processing time and on the communication cost. A small value of $\alpha$ indicates a greater importance of textual relevance, whereas a high value of $\alpha$ gives more preference to the spatial relevance. It is interesting to note that the query processing time is lower for higher values of $\alpha$, which give more importance to the spatial relevance. This is mainly because, when the spatial relevance is higher, fewer edges and objects are required to be explored and processed to determine the top-k data objects. Observe that, in Figure 12(b), the number of messages sent by COSK decreases sharply with an increase in $\alpha$.

7.2.5. Effect of Speed. Figure 13(a) demonstrates the influence of the speed of the query objects on the query processing time of the COSK and CMTkSK+ algorithms. The experimental results indicate that the performance of CMTkSK+ is not significantly influenced by the speed of the query objects because the candidate objects must be continuously monitored at regular time intervals regardless of the speed. On the other hand, for COSK, the performance gradually decreases as the speed of the query objects increases because the objects leave their respective safe regions more frequently. Figure 13(b) shows the communication costs of COSK and CMTkSK+ with respect to the speed of the query objects. CMTkSK+ incurs almost constant communication costs because a server-initiated request to verify the candidate objects does not depend on the speed. For COSK, the query objects cross safe regions more frequently when the speed is high, which increases the communication costs.

Figure 11: Effect of number of keywords on query processing time and communication cost. (a) Query processing time (s) versus number of keywords. (b) Communication cost (number of messages transferred) versus number of keywords.

Figure 12: Effect of $\alpha$ on query processing time and communication cost. (a) Query processing time (s) versus $\alpha$. (b) Communication cost (number of messages transferred) versus $\alpha$.

7.2.6. Effect of Mobility. Figure 14 shows the effect of mobility $M_{qry}$ (mobility refers to the percentage of query objects that are moving at any timestamp) on the performance of the COSK and CMTkSK+ algorithms. As expected, the query processing time and communication costs for both algorithms increase with $M_{qry}$. Nevertheless, COSK performs better than CMTkSK+ in terms of query processing time and communication costs.

7.2.7. Effect of Directed Edges. Figure 15 shows the impact of the percentage of directed edges $E_{dir}$ on the performance of the COSK and CMTkSK+ algorithms. The query processing time increases with $E_{dir}$ because the algorithms need to explore more edges to retrieve the top-k results. However, the communication cost is not significantly affected by the value of $E_{dir}$ for either algorithm.

7.2.8. Effect of Datasets. Figure 16 demonstrates the index sizes of the COSK and CMTkSK+ approaches for different datasets. As shown in Figure 16, both algorithms have similar index sizes. However, COSK has a minor space overhead because it stores additional information on the highest significance factor $\theta_t$ of edges. More importantly, this space overhead is minimal compared to the gain achieved by COSK in query processing time and communication costs.

Figure 13: Effect of speed on query processing time and communication cost. (a) Query processing time (s) versus $V_{qry}$. (b) Communication cost (number of messages transferred) versus $V_{qry}$.

Figure 14: Effect of mobility on query processing time and communication cost. (a) Query processing time (s) versus $M_{qry}$. (b) Communication cost (number of messages transferred) versus $M_{qry}$.

7.3. Experimental Results of Top-k Spatial Keyword Queries in Dynamic Road Networks. In this section, we evaluate the performance of COSK and the basic algorithm for dynamic road networks. $E_{upd}$ indicates the percentage of all edges that change their weight at each timestamp. The length of an updated edge is randomly selected between 0.1 and 10 times the original length. Figure 17(a) depicts the query processing time of COSK and the basic algorithm. It is evident from the figure that the query processing time of the basic algorithm is not significantly affected by $E_{upd}$. This is mainly because the query objects issue top-k spatial keyword queries at each timestamp. However, the query processing time of COSK increases with the value of $E_{upd}$ because the probability that an updated edge is associated with the monitoring region of query q increases with $E_{upd}$. Therefore, when $E_{upd}$ becomes large, the results need to be updated frequently, which increases the query processing time. Figure 17(b) shows the communication costs of COSK and the basic algorithm with respect to $E_{upd}$. The basic algorithm incurs almost constant communication costs regardless of the value of $E_{upd}$. In contrast, the communication cost of COSK increases with $E_{upd}$ because the query results and safe regions need to be updated frequently.
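The dynamic workload described above can be reproduced approximately with a few lines of Python. This is a sketch under assumptions rather than the authors' generator: the function name `sample_edge_updates` is hypothetical, the scaling factor is assumed to be drawn uniformly from [0.1, 10] (the text only says "randomly selected"), and the toy edge list is illustrative.

```python
import random
from typing import Dict, Tuple

Edge = Tuple[int, int]


def sample_edge_updates(original_lengths: Dict[Edge, float],
                        e_upd_percent: float,
                        rng: random.Random) -> Dict[Edge, float]:
    """New lengths for a random e_upd_percent % of the edges at one timestamp.

    Each selected edge gets a length between 0.1 and 10 times its ORIGINAL
    length, so repeated updates do not drift away from the base network.
    """
    edges = sorted(original_lengths)  # fixed order keeps runs reproducible
    count = round(len(edges) * e_upd_percent / 100)
    chosen = rng.sample(edges, count)
    return {e: original_lengths[e] * rng.uniform(0.1, 10.0) for e in chosen}


# Hypothetical chain of 20 edges with lengths 1.0, 2.0, ..., 20.0.
lengths = {(i, i + 1): 1.0 + i for i in range(20)}
updates = sample_edge_updates(lengths, 15, random.Random(7))
print(len(updates), "of", len(lengths), "edges updated")
```

Passing an explicit `random.Random` instance keeps the workload repeatable across runs, which matters when comparing two algorithms on the same sequence of weight changes.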

Figure 15: Effect of $E_{dir}$ on query processing time and communication cost. (a) Query processing time (s) versus $E_{dir}$. (b) Communication cost (number of messages transferred) versus $E_{dir}$.

Figure 16: Effect of dataset on index size. Index size (MB) for the Oldenburg, San Francisco, and San Joaquin datasets.

8. Conclusion

In this paper, we investigated moving top-k spatial keyword queries in directed and dynamic road networks. We presented an efficient indexing framework using inverted files that indexes the data objects on edges, allowing for the effective searching of data objects relevant to queries in terms of both textual and spatial relevance. We also presented a safe-exit-based algorithm, called COSK, to monitor moving top-k spatial keyword queries. We demonstrated that the query results remain valid as long as the query object resides within a safe region. Furthermore, COSK can effectively monitor the validity of query results and safe regions in dynamic road networks. Finally, an experimental evaluation conducted on real road networks demonstrated that COSK significantly reduced the query processing time and communication costs compared to the CMTkSK+ algorithm.

Data Availability

The real road network data used in this study are also used in many previous studies. The road network data are cited in the manuscript and are available at https://www.cs.utah.edu/~lifeifei/SpatialDataset.htm. To simulate the moving queries, the authors used a spatiotemporal data generator that is also used in previous studies. The research article describing the generator is cited in the manuscript. The documentation and source files of the generator are available at https://iapg.jade-hs.de/personen/brinkhoff/generator/. The authors used Twitter tweets to generate the descriptions of data objects and the query keywords. The tweets used are accessible at http://followthehashtag.com/datasets/free-twitter-dataset-usa-200000-free-usa-tweets/.

Figure 17: Effect of $E_{upd}$ on query processing time and communication cost. (a) Query processing time; (b) communication cost.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

Hyung-Ju Cho was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean Government (MSIP) (NRF-2016R1A2B4009793), and this research was partially supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2016R1D1A1B03934129).

References

[1] D. Papadias, N. Mamoulis, J. Zhang, and Y. Tao, “Query processing in spatial network databases,” in Proceedings of the 29th International Conference on Very Large Data Bases (VLDB ’03), pp. 802–813, September 2003.

[2] H.-J. Cho, K. Ryu, and T.-S. Chung, “An efficient algorithm for computing safe exit points of moving range queries in directed road networks,” Information Systems, vol. 41, pp. 1–19, 2014.

[3] G. Tsatsanifos and A. Vlachou, “On processing Top-k spatio-textual preference queries,” in Proceedings of the 18th International Conference on Extending Database Technology (EDBT ’15), pp. 433–444, March 2015.

[4] R. Li, A. X. Liu, A. L. Wang, and B. Bruhadeshwar, “Fast range query processing with strong privacy protection for cloud computing,” Proceedings of the VLDB Endowment, vol. 7, no. 14, pp. 1953–1964, 2014.

[5] G. Cong, C. S. Jensen, and D. Wu, “Efficient retrieval of the Top-k most relevant spatial web objects,” Proceedings of the VLDB Endowment, vol. 2, no. 1, pp. 337–348, 2009.

[6] Z. Li, K. C. K. Lee, B. Zheng, W.-C. Lee, D. Lee, and X. Wang, “IR-tree: An efficient index for geographic document search,” IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 4, pp. 585–599, 2011.

[7] Y. Zhou, X. Xie, C. Wang, Y. Gong, and W. Ma, “Hybrid index structures for location-based web search,” in Proceedings of the 14th ACM International Conference on Information and Knowledge Management, pp. 155–162, Bremen, Germany, October 2005.

[8] J. Zobel and A. Moffat, “Inverted files for text search engines,” ACM Computing Surveys, vol. 38, no. 2, 2006.

[9] N. Beckmann, H. Kriegel, R. Schneider, and B. Seeger, “The R*-tree: an efficient and robust access method for points and rectangles,” in Proceedings of the ACM SIGMOD International Conference on Management of Data, vol. 19, pp. 322–331, May 1990.

[10] R. Hariharan, B. Hore, C. Li, and S. Mehrotra, “Processing spatial-keyword (sk) queries in geographic information retrieval (gir) systems,” in Proceedings of the 19th International Conference on Scientific and Statistical Database Management (SSDBM ’07), July 2007.

[11] I. De Felipe, V. Hristidis, and N. Rishe, “Keyword search on spatial databases,” in Proceedings of the 24th International Conference on Data Engineering (ICDE ’08), pp. 656–665, April 2008.

[12] J. B. Rocha-Junior, O. Gkorgkas, S. Jonassen, and K. Nørvåg, “Efficient processing of top-k spatial keyword queries,” in Proceedings of the International Symposium on Spatial and Temporal Databases, pp. 205–222, Springer, 2011.

[13] D. Zhang, K.-L. Tan, and A. K. Tung, “Scalable top-k spatial keyword search,” in Proceedings of the 16th International Conference on Extending Database Technology, pp. 359–370, 2013.

[14] J. B. Rocha-Junior and K. Nørvåg, “Top-k spatial keyword queries on road networks,” in Proceedings of the 15th International Conference on Extending Database Technology, pp. 168–179, Berlin, Germany, March 2012.

[15] H.-J. Cho, S. J. Kwon, and T.-S. Chung, “A safe exit algorithm for continuous nearest neighbor monitoring in road networks,” Mobile Information Systems, vol. 9, no. 1, pp. 37–53, 2013.

[16] D. Yung, M. L. Yiu, and E. Lo, “A safe-exit approach for efficient network-based moving range queries,” Data & Knowledge Engineering, vol. 72, pp. 126–147, 2012.

[17] M. Attique, H. Cho, R. Jin, and T. Chung, “Efficient Processing of Continuous Reverse k Nearest Neighbor on Moving Objects in Road Networks,” ISPRS International Journal of Geo-Information, vol. 5, no. 12, p. 247, 2016.

[18] H. G. Elmongui, M. F. Mokbel, and W. G. Aref, “Continuous aggregate nearest neighbor queries,” GeoInformatica, vol. 17, no. 1, pp. 63–95, 2013.

[19] D. Wu, M. L. Yiu, C. S. Jensen, and G. Cong, “Efficient continuously moving top-k spatial keyword query processing,” in Proceedings of the IEEE International Conference on Data Engineering (ICDE ’11), pp. 541–552, Hannover, Germany, April 2011.

[20] W. Huang, G. Li, K.-L. Tan, and J. Feng, “Efficient safe-region construction for moving top-k spatial keyword queries,” in Proceedings of the 21st ACM International Conference on Information and Knowledge Management, pp. 932–941, 2012.

[21] L. Guo, J. Shao, H. H. Aung, and K.-L. Tan, “Efficient continuous top-k spatial keyword queries on road networks,” GeoInformatica, vol. 19, no. 1, pp. 29–60, 2014.

[22] Y. Li, G. Li, L. Shu, Q. Huang, and H. Jiang, “Continuous monitoring of top-k spatial keyword queries in road networks,” Journal of Information Science and Engineering, vol. 31, no. 6, pp. 1831–1848, 2015.

[23] M. Attique, A. Khan, and T.-S. Chung, “ESPAK: Top-k spatial keyword query processing in directed road networks,” in Proceedings of the Workshops of the International Conference on Extending Database Technology and the International Conference on Database Theory (EDBT/ICDT ’17), March 2017.

[24] G. Salton and C. Buckley, “Term-weighting approaches in automatic text retrieval,” Information Processing & Management, vol. 24, no. 5, pp. 513–523, 1988.

[25] V. N. Anh, O. de Kretser, and A. Moffat, “Vector-space ranking with effective early termination,” in Proceedings of the 24th Annual International ACM SIGIR Conference, pp. 35–42, New Orleans, LA, USA, 2001.

[26] E. W. Dijkstra, “A note on two problems in connexion with graphs,” Numerische Mathematik, vol. 1, pp. 269–271, 1959.

[27] “Real datasets for spatial databases,” https://www.cs.utah.edu/~lifeifei/SpatialDataset.htm.

[28] “Twitter,” https://twitter.com/.

[29] T. Brinkhoff, “A framework for generating network-based moving objects,” GeoInformatica, vol. 6, no. 2, pp. 153–180, 2002.


Table 3: Computation of safe exit points for example scenario.

| Edge/Segment | $p_a$ | $D^+_{p_a}$ | $D^+_{n_\beta}$ | $p_{se}$ |
|---|---|---|---|---|
| $\overrightarrow{(q, n_3)}$ | $q$ | $D^+_{q} = \{d_3\}$ | $D^+_{n_3} = \{d_3\}$ | none |
| $(n_3, n_4)$ | $q$ | $D^+_{n_3} = \{d_3\}$ | $D^+_{n_4} = \{d_3\}$ | none |
| $(n_3, n_7)$ | $n_3$ | $D^+_{n_3} = \{d_3\}$ | $D^+_{n_7} = \{d_6\}$ | $p_{se1}$ |
| $\overrightarrow{(n_3, n_5)}$ | $n_3$ | $D^+_{n_3} = \{d_3\}$ | $D^+_{n_5} = \{d_3\}$ | none |
| $\overleftarrow{(n_6, n_5)}$ | $n_5$ | $D^+_{n_5} = \{d_3\}$ | $D^+_{n_6} = \{d_4\}$ | $p_{se2}$ |

Figure 5: Illustration of safe region of q. The example network has nodes $n_1$ to $n_7$, safe exit points $p_{se1}$ and $p_{se2}$, and data objects $d_1$ (Grand Hotel), $d_2$ (Cafe), $d_3$ (Italian Restaurant), $d_4$ (Chinese Restaurant), $d_5$ (Pub and Bar), $d_6$ (Italian Restaurant), and $d_7$ (Cafe and Bakery).

the set of k data objects that satisfies the query condition at $p_a$ ($n_\beta$). According to Dijkstra's algorithm [26], the time complexity $O(D^+_q)$ for computing a set of answer objects at a query point q is $O(D^+_q) = O(|E| + |N| \log |N|)$. This means that $O(D^+_{p_a}) = O(D^+_{n_\beta}) = O(|E| + |N| \log |N|)$ holds for endpoints $p_a$ and $n_\beta$. Thus, the time complexity $O(\Omega_{kth})$ when determining the skyline $\Omega_{kth}$ with the k-th highest score is $O(\Omega_{kth}) = C_{kth} \cdot O(|D^+_{p_a} \cup D^+_{n_\beta} \cup D(p_a, n_\beta)|)$, where $C_{kth}$ is the number of qualifying objects that participate in the constitution of the skyline with the k-th highest score. Therefore, the time complexity of determining a safe exit point coincides with the time complexity of determining the two skylines, i.e., the skyline $D^+_l$ with the k-th highest (or lowest) score for answer objects and the skyline $D^-_h$ with the highest score for nonanswer objects. This is because the safe exit point is found at the cross point between these skylines.
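As a point of reference for the complexity argument above, the following is a minimal sketch of Dijkstra's algorithm on a directed graph. The adjacency-list layout and the three-node graph are illustrative assumptions, not the paper's data structures. Note that this version uses a binary heap from Python's `heapq`, which gives $O((|E| + |N|) \log |N|)$; the $O(|E| + |N| \log |N|)$ bound quoted above is the one obtained with a Fibonacci heap.

```python
import heapq


def dijkstra(adj, source):
    """Shortest network distances from source in a directed graph.

    adj maps a node to a list of (neighbor, weight) pairs, one per
    outgoing edge. Nodes unreachable from source are absent from the result.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry; a shorter path to u was already found
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Hypothetical one-way cycle a -> b -> c -> a.
adj = {"a": [("b", 1.0)], "b": [("c", 2.0)], "c": [("a", 5.0)]}
from_a = dijkstra(adj, "a")
from_c = dijkstra(adj, "c")
```

On this graph the distance from a to c is 3 while the distance from c to a is 5, which illustrates why distances must be computed per direction in a directed road network.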

Figure 6 represents the skyline graph for $k = 1$ on an edge $(n_7, n_3)$. Let us draw the score functions for $d_3$ and $d_6$ for the road segment $(n_7, n_3)$, where a safe exit point exists. This is because $D^+_{n_3} = \{d_3\}$ and $D^+_{n_7} = \{d_6\}$ for $k = 1$. For each point $p \in (n_7, n_3)$, the distance between $d_3$ and point p can be represented as $dist(d_3, p) = dist(d_3, n_3) + len(n_3, p) = 6 - len(n_7, p)$. Similarly, for each point $p \in (n_7, n_3)$, the distance between $d_6$ and point p can be represented as $dist(d_6, p) = dist(d_6, n_7) + len(n_7, p) = 2 + len(n_7, p)$. Let $len(n_7, p)$ be a variable x ($0 \le x \le 3$). We can write $\lambda(d_3, p) = dist(d_3, p) = 6 - x$ and $\lambda(d_6, p) = dist(d_6, p) = 2 + x$. Then, we can represent the score functions $\psi(d_3)$ and $\psi(d_6)$ as follows:

$\psi(d_3) = \frac{\mu(d_3.t, q.t)}{1 + \alpha \cdot \lambda(d_3, p)} = \frac{1}{7 - x}$ for $0 \le x \le 3$,

$\psi(d_6) = \frac{\mu(d_6.t, q.t)}{1 + \alpha \cdot \lambda(d_6, p)} = \frac{1}{3 + x}$ for $0 \le x \le 3$.

Figure 6: Skyline graph for $k = 1$ on the road segment $(n_7, n_3)$. The horizontal axis is the distance from $n_7$ and the vertical axis is the score; the plotted curves are $\psi(d_6) = 1/(x + 3)$ and $\psi(d_3) = 1/(-x + 7)$, and $p_{se1}$ lies between $n_7$ and $n_3$.

Finally, we present the lemma to prove that safe exit points computed by COSK are correct.

Lemma 8. The COSK algorithm correctly computes a set of safe exit points.

Proof. We will prove the correctness of the COSK algorithm by contradiction. We assume that if $D^+_{p_a} \neq D^+_{n_\beta}$, there is no safe exit point in a road segment $(p_a, n_\beta)$. This means that, for each point p in the road segment $(p_a, n_\beta)$, the query result at p equals $D^+_{p_a}$, i.e., $D^+_p = D^+_{p_a}$, $\forall p \in (p_a, n_\beta)$. However, it leads to a contradiction that $D^+_{n_\beta} = D^+_{p_a}$ when $p = n_\beta$. Therefore, if $D^+_{p_a} \neq D^+_{n_\beta}$, a safe exit point exists in $(p_a, n_\beta)$. In addition, a safe exit point is determined using the skyline $D^+_l$ for answer objects and the skyline $D^-_h$ with the highest score for nonanswer objects when $D^+_{p_a} \neq D^+_{n_\beta}$. The first skyline is a composite polyline drawn from answer objects in $D^+_{p_a}$. The second skyline is a composite polyline drawn from nonanswer objects in $D^+_{n_\beta} \cup D(p_a, n_\beta) - D^+_{p_a}$.
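As a concrete check of the skyline construction, the Figure 6 example can be worked through numerically. Setting $1/(7 - x) = 1/(3 + x)$ gives $x = 2$, so under the values used in that example the crossing point lies 2 distance units from $n_7$. The sketch below generalizes this to two objects with arbitrary textual relevance; the function names and the closed-form solution are illustrative assumptions derived from the score function $\psi(d) = \mu / (1 + \alpha \cdot \lambda)$, not code from the paper.

```python
def score(mu, alpha, dist):
    """Score psi = mu / (1 + alpha * dist), following the score function above."""
    return mu / (1.0 + alpha * dist)


def crossing_offset(mu_dec, d_dec, mu_inc, d_inc, alpha, seg_len):
    """Offset x from the segment start at which the two scores are equal.

    The first object's distance shrinks along the segment (d_dec - x);
    the second object's distance grows (d_inc + x). Solving
    mu_dec / (1 + alpha*(d_dec - x)) = mu_inc / (1 + alpha*(d_inc + x))
    for x gives the closed form below. Returns None if the crossing
    falls outside the segment [0, seg_len].
    """
    x = (mu_inc * (1 + alpha * d_dec) - mu_dec * (1 + alpha * d_inc)) / (
        alpha * (mu_dec + mu_inc)
    )
    return x if 0 <= x <= seg_len else None


# Figure 6 values: d3 is 6 - x away, d6 is 2 + x away, mu = 1, alpha = 1,
# and the segment (n7, n3) has length 3.
x_se = crossing_offset(mu_dec=1.0, d_dec=6.0, mu_inc=1.0, d_inc=2.0,
                       alpha=1.0, seg_len=3.0)
```

At the start of the segment ($x = 0$) the score of $d_6$ exceeds that of $d_3$, and at the end ($x = 3$) the order is reversed, which agrees with $D^+_{n_7} = \{d_6\}$ and $D^+_{n_3} = \{d_3\}$.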

6. Monitoring Query Results and Safe Regions in Dynamic Directed Road Networks

In this section, we discuss the monitoring of spatial keyword queries in dynamic road networks, where the network distance changes depending on the traffic conditions. Updates to the weights of some edges may invalidate the query results or the safe region of q even though the query object q remains within its safe region. Figure 7 illustrates an example of changing the weights of edges $\overleftarrow{(n_1, n_2)}$ and $\overleftarrow{(n_1, n_6)}$. For convenience, we consider $\alpha = 1$ and $q.t$ = “Italian restaurant”. In Figure 7(a), the top-1 result is $d_1$, and bold lines show the safe region of query q. Now consider that, at time $t_j$, the weights of the two edges $\overleftarrow{(n_1, n_2)}$ and $\overleftarrow{(n_1, n_6)}$ change due to heavy traffic, as shown in Figure 7(b). The update in the weights of edges may invalidate the query result or the safe region of q. Therefore, it is necessary to monitor the validity of the results and the safe region when such changes occur.

Next, we introduce a monitoring region to monitor the validity of the safe region effectively when the weight of an edge is changed. The monitoring region MR contains all the points between the query point q and the lowest answer object and the highest nonanswer object. Formally, it is defined as $MR = dist(q, D^+_l) \cup dist(q, D^-_h)$, where $dist(q, D^+_l)$ covers the path between q and the lowest answer object and $dist(q, D^-_h)$ covers the path between q and the highest nonanswer object. In the given example, $D^+_l = \{d_1\}$ and $D^-_h = \{d_2, d_3\}$. Therefore, the dotted lines in Figure 8(a) show the monitoring region of query object q.

Now, at time $t_j$, the updates to edges $\overleftarrow{(n_1, n_6)}$ and $\overleftarrow{(n_1, d_1)}$, which are not part of the monitoring region, can safely be ignored. However, the update on segment $\overrightarrow{(n_2, d_1)}$, which is associated with the monitoring region, may invalidate the results. As shown in Figure 8(b), after the update, the top-1 result becomes $d_2$, and the bold lines represent the new safe region of q.

Algorithm 5 monitors the validity of the result set and safe region of query object q when the weight of any edge changes. Let us consider that the weight of edge $(n_i, n_j)$ changes at time $t_j$. First, the algorithm checks whether edge $(n_i, n_j)$ is associated with the monitoring region. If it is not part of the monitoring region, then the algorithm simply ignores the update to edge $(n_i, n_j)$, and the query results and safe region remain valid. In contrast, if the edge is associated with the monitoring region (i.e., $MR \cap (n_i, n_j) \neq \emptyset$), then the algorithm reevaluates the query results. Consequently, the top-k results and safe region of query q are updated. Finally, the algorithm updates the monitoring region of q.
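The control flow of Algorithm 5 can be sketched as below. This is a simplified illustration, not the authors' implementation: the monitoring region is modelled as a set of whole directed edges (the paper's region may cover only part of an edge), and the three callbacks stand in for the snapshot query, safe exit computation, and monitoring region computation, whose internals are not reproduced here. All names and the example edge labels are hypothetical.

```python
def monitor_edge_update(state, updated_edge, evaluate_snapshot,
                        compute_safe_exits, compute_monitoring_region):
    """React to a weight change on updated_edge for one query.

    state holds "top_k", "safe_exits", and "monitoring_region" (a set of
    directed edges, each a (tail, head) tuple). Returns True if the
    query state was recomputed, False if the update was ignored.
    """
    if updated_edge not in state["monitoring_region"]:
        # The edge does not touch the monitoring region: results stay valid.
        return False
    state["top_k"] = evaluate_snapshot()
    state["safe_exits"] = compute_safe_exits(state["top_k"])
    state["monitoring_region"] = compute_monitoring_region(state["top_k"])
    return True


# Stub callbacks with fixed, made-up return values.
state = {
    "top_k": ["d1"],
    "safe_exits": ["pse1", "pse2", "pse3"],
    "monitoring_region": {("q", "n2"), ("n2", "d1")},
}
stubs = dict(
    evaluate_snapshot=lambda: ["d2"],
    compute_safe_exits=lambda top_k: ["pse_new"],
    compute_monitoring_region=lambda top_k: {("q", "n3"), ("n3", "d2")},
)
ignored = monitor_edge_update(state, ("n1", "n6"), **stubs)
refreshed = monitor_edge_update(state, ("n2", "d1"), **stubs)
```

The first call leaves the state untouched because the updated edge is outside the monitoring region; the second call recomputes the results, safe exit points, and monitoring region in that order.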

7. Performance Evaluation

In this section, we evaluate the performance of COSK through simulation experiments. We describe our experimental settings in Section 7.1, and we present our experimental results for static and dynamic road networks in Sections 7.2 and 7.3, respectively.

7.1. Experimental Settings. All of our experiments were performed using real road networks, namely, Oldenburg, San Francisco, and San Joaquin. All three road networks were obtained from [27]. The original road network of San Francisco had 21,047 nodes and 21,692 edges. We reformatted the network, pruned approximately 30% of the nodes, and adjusted the edges and their weights accordingly. This resulted in a network with 14,732 nodes and 14,316 edges. Both the direction of edges and the data objects on the edges were generated randomly. The description of each data object was extracted from Twitter messages [28], and we assigned one tweet per data object. Table 4 presents the characteristics of the datasets used in the experimental evaluation. We simulated moving query objects by using a spatiotemporal data generator [29]. The input to the generator was the road network of the dataset used, and the output was the set of query objects moving on the road network. Each experiment had 100 moving queries, which were continuously monitored for 100 timestamps (1 timestamp = 1 second), and the average result was reported.

As a benchmark for COSK in static road networks, we implemented the CMTkSK+ algorithm [22], which also continuously monitors moving top-k spatial keyword queries in road networks. However, this algorithm was originally designed for undirected road networks. To make a fair comparison, we modified it to process top-k spatial keyword queries in directed road networks and refer to the modified version as CMTkSK+. Specifically, we modified the distance computation method between two points such that, in directed road networks, $dist(p_1, p_2) \neq dist(p_2, p_1)$ in general. Since CMTkSK+ does not handle top-k spatial keyword queries in dynamic road networks, we compared the performance of COSK with a basic algorithm that recomputes the results whenever the query object changes its location. All algorithms were implemented in Java and were executed on a desktop PC with a 2.80-GHz Intel Core i5 processor and

Figure 7: Updating the weight of edges in a dynamic road network, where $t_i < t_j$. (a) Safe region at time $t_i$; (b) updating the weights of $\overleftarrow{(n_1, n_2)}$ and $\overleftarrow{(n_1, n_6)}$ at time $t_j$. The example network has nodes $n_1$ to $n_6$ and data objects $d_1$ (Italian Restaurant), $d_2$ (Italian Restaurant), and $d_3$ (Chinese Restaurant).

Figure 8: Monitoring region and updated safe region at time $t_j$. (a) Monitoring region at time $t_i$; (b) new safe region at time $t_j$.

(1) Input: monitoring region $MR$, updated edge $(n_i, n_j)$
(2) Output: none
(3) if $MR \cap (n_i, n_j) = \emptyset$ then
(4)   /* edge $(n_i, n_j)$ is not part of the monitoring region */
(5)   ignore the change in the weight of edge $(n_i, n_j)$
(6) else
(7)   $P_{SE} \leftarrow \emptyset$ /* set of safe exit points */
(8)   $D^k_{upd} \leftarrow EvaluateSnapshotQuery(n_i, e_i)$ /* update set of top-k results */
(9)   $P_{SE_{upd}} \leftarrow ComputeSafeExit(P_a, n_\beta)$ /* update safe exit points */
(10)   $MR_{upd} \leftarrow ComputeMonitoringRegion(D^+_l, D^-_h)$ /* update monitoring region */
(11) end

Algorithm 5: MonitoringSafeRegion($MR$, $(n_i, n_j)$).

Table 4: Summary of datasets.

| Attribute | Oldenburg | San Francisco | San Joaquin |
|---|---|---|---|
| Total no. of nodes | 6,104 | 14,732 | 18,262 |
| Total no. of edges | 7,034 | 14,316 | 23,876 |
| Percentage of directed edges | 30 | 30 | 30 |
| Total no. of objects | 5,627 | 11,453 | 19,098 |
| Average no. of objects per edge | 0.8 | 0.8 | 0.8 |
| Total no. of words | 49,517 | 103,649 | 166,153 |

Table 5: Experimental parameter settings.

| Parameter | Range |
|---|---|
| Number of results ($k$) | 5, 10, 15, 20, 25 |
| Number of keywords ($n$) | 1, 2, 3, 4, 5 |
| Query parameter ($\alpha$) | 0.01, 0.1, 1, 10, 100 |
| Dataset | Oldenburg, San Francisco, San Joaquin |
| Number of data objects ($N_D$) | 10, 20, 30, 40, 50 (×1000) |
| Speed of query objects ($V_{qry}$) | 25, 50, 75, 100, 125 (km/h) |
| Mobility ($M_{qry}$) | 20, 40, 60, 80, 100 |
| Ratio of directed edges ($E_{dir}$) | 10, 20, 30, 40, 50 |
| Ratio of updated edges ($E_{upd}$) | 15, 30, 60, 80, 100 |

8 GB of memory. In the experiments, we compared (1) query processing times, (2) edges processed, i.e., the number of edges processed for retrieving query results, and (3) index sizes. Table 5 summarizes the parameters used in the experiments. In each experiment, we varied a single parameter within the range shown in Table 5 while maintaining the other parameters at the bolded default values.

We evaluated the performance of the algorithms by using the following measures: (1) the total amount of server CPU time, which indicates the query processing time, and (2) the total communication cost, measured as the total number of points (i.e., the location updates sent by query objects and the query results and safe exit points returned by the server) transferred between clients and the server. The battery power and wireless bandwidth consumption typically increase with the amount of data transferred between objects (clients) and servers. Thus, we used the amount of transferred data as a metric to evaluate the communication cost.
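The communication cost measure defined above can be sketched as a simple count of points per client-server exchange. The `Exchange` record, the function name, and the two example traces are hypothetical and only show how the count is assembled; the figures in them are made up and are not measurements from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Exchange:
    """One client-server round trip, counted in points."""
    location_updates: int = 0                        # sent by the query object
    results: list = field(default_factory=list)      # returned by the server
    safe_exits: list = field(default_factory=list)   # returned by the server


def communication_cost(exchanges):
    """Total number of points transferred over all exchanges."""
    return sum(
        x.location_updates + len(x.results) + len(x.safe_exits)
        for x in exchanges
    )


# Made-up trace over five timestamps with k = 2.
# Safe-exit style: the client contacts the server only at the start and
# once more when it passes a safe exit point.
safe_exit_trace = [
    Exchange(1, ["d1", "d2"], ["pse1", "pse2", "pse3"]),
    Exchange(1, ["d2", "d3"], ["pse4", "pse5"]),
]
# Periodic style: the client reports its location at every timestamp.
periodic_trace = [Exchange(1, ["d1", "d2"]) for _ in range(5)]

safe_exit_cost = communication_cost(safe_exit_trace)
periodic_cost = communication_cost(periodic_trace)
```

In this toy trace the safe-exit client transfers 11 points and the periodic client 15, even though each safe-exit exchange is larger because it also carries safe exit points.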

7.2. Experimental Results of Top-k Spatial Keyword Queries in Static Road Networks

7.2.1. Effect of k. Figure 9 indicates the effect of the number of results on the query processing time and communication cost for both algorithms. Figure 9(a) indicates that the query processing time increases for both algorithms as the value of k increases. This is expected because, with an increase in k, more data objects need to be explored and verified. Nevertheless, COSK significantly outperforms CMTkSK+ for two main reasons. First, a relevant object search is very efficient when using the highest significance factor, and second, COSK does not need to verify the set of answer objects as long as the query object lies in a safe region. On the other hand, the CMTkSK+ query processing time increases significantly because it has to monitor and verify the set of candidate objects periodically. In Figure 9(b), the communication costs for both algorithms increase as the number of objects increases. However, the proposed algorithm demonstrates superior performance compared to CMTkSK+ because client-server communication is not required while the query object lies within the safe exit points, whereas in CMTkSK+ the query object is required to report its location to the server whenever it moves.

7.2.2. Effect of $N_D$. This experiment was conducted on the San Joaquin dataset. This dataset included 19,098 data objects; therefore, we randomly generated approximately 30,000 additional data objects on different edges. In Figure 10, we evaluate the performance of COSK and CMTkSK+ by varying the cardinality of the data objects. Note that $N_D = 10K$ corresponds to a low density of data points, while $N_D = 50K$ corresponds to a high density. In Figure 10(a), it is interesting to notice that the query processing times of both algorithms decrease as the cardinality of the data objects increases. For CMTkSK+, this is because, with high density, the monitoring range of a query decreases. For COSK, it is mainly because, when the data density is high, fewer edges need to be expanded, which decreases the query processing time. In Figure 10(b), we study the influence of the cardinality of the data objects on the communication costs. The experimental results indicate that CMTkSK+ incurs almost constant communication costs regardless of data object cardinality. However, the communication costs of COSK increase in proportion to the $N_D$ value. This is expected because the safe region becomes smaller as the density of the data objects increases, which increases the communication costs.

7.2.3. Effect of Query Keywords (n). Figure 11 shows the query processing time and communication cost for COSK and CMTkSK+ as a function of the number of query keywords. Figures 11(a) and 11(b) show that the performance of both algorithms degrades when the number of keywords increases. This is mainly because, by increasing the number of query keywords, the number of relevant objects may also increase, resulting in a higher query processing time and communication cost. However, the safe-region-based algorithm COSK scales better than CMTkSK+ because of its less expensive monitoring technique.

7.2.4. Effect of $\alpha$. Figure 12 demonstrates the impact of the query parameter $\alpha$ on the query processing time and on the communication cost. A small value of $\alpha$ gives greater importance to textual relevance, whereas a high value of $\alpha$ gives more preference to spatial relevance. It is interesting to note that the query processing time is lower for higher values of $\alpha$, which give more importance to spatial relevance. This is mainly because, when the spatial relevance is weighted more heavily, fewer edges and objects need to be explored and processed to determine the top-k data objects. Observe that, in Figure 12(b), the number of messages sent by COSK decreases sharply with an increase in $\alpha$.

Figure 9: Effect of k on query processing time and communication cost. (a) Query processing time; (b) communication cost.

Figure 10: Effect of $N_D$ on query processing time and communication cost. (a) Query processing time; (b) communication cost.

7.2.5. Effect of Speed. Figure 13(a) demonstrates the influence of the speed of the query objects on the query processing time of the COSK and CMTkSK+ algorithms. The experimental results indicate that the performance of CMTkSK+ is not significantly influenced by the speed of the query objects because the candidate objects must be monitored at regular time intervals regardless of the speed. On the other hand, for COSK, the performance gradually degrades as the speed of the query objects increases because the objects leave their respective safe regions more frequently. Figure 13(b) shows the communication costs of COSK and CMTkSK+ with respect to the speed of the query objects. CMTkSK+ incurs almost constant communication costs because a server-initiated request to verify the candidate objects does not depend on the speed. For COSK, the query objects cross safe regions more frequently when the speed is high, which increases the communication costs.
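Returning to the query parameter $\alpha$ from Section 7.2.4, its role in trading textual against spatial relevance can be seen on a two-object toy example. The sketch assumes the score function $\psi(d) = \mu / (1 + \alpha \cdot \lambda)$ used in the Figure 6 example, with $\mu$ the textual relevance and $\lambda$ the network distance; the object names and values are made up for illustration.

```python
def score(mu, alpha, dist):
    """Score psi = mu / (1 + alpha * dist)."""
    return mu / (1.0 + alpha * dist)


def rank(objects, alpha):
    """Object names ordered by decreasing score.

    objects maps a name to a (textual relevance, network distance) pair.
    """
    return sorted(objects, key=lambda name: score(objects[name][0], alpha,
                                                  objects[name][1]),
                  reverse=True)


# One object is close but a weak textual match; the other is far but a
# strong textual match.
objects = {"near_weak": (0.3, 1.0), "far_strong": (0.9, 10.0)}
text_heavy = rank(objects, alpha=0.01)
space_heavy = rank(objects, alpha=100.0)
```

With $\alpha = 0.01$ the distant but textually strong object ranks first, while with $\alpha = 100$ the nearby object wins, which matches the description of small $\alpha$ favoring textual relevance and large $\alpha$ favoring spatial relevance.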

Figure 11: Effect of number of keywords on query processing time and communication cost. (a) Query processing time; (b) communication cost.

Figure 12: Effect of $\alpha$ on query processing time and communication cost. (a) Query processing time; (b) communication cost.

7.2.6. Effect of Mobility. Figure 14 shows the effect of mobility $M_{qry}$ (mobility refers to the percentage of query objects that are moving at any timestamp) on the performance of the COSK and CMTkSK+ algorithms. As expected, the query processing time and communication costs for both algorithms increase with $M_{qry}$. Nevertheless, COSK performs better than CMTkSK+ in terms of query processing time and communication costs.

7.2.7. Effect of Directed Edges. Figure 15 shows the impact of the percentage of directed edges $E_{dir}$ on the performance of the COSK and CMTkSK+ algorithms. The query processing time increases with $E_{dir}$ because the algorithms need to explore more edges to answer the top-k spatial keyword queries. However, the communication cost is not significantly affected by the value of $E_{dir}$ for either algorithm.

7.2.8. Effect of Datasets. Figure 16 demonstrates the index sizes of the COSK and CMTkSK+ approaches for different datasets. As shown in Figure 16, both algorithms have similar index sizes. However, COSK has a minor space overhead because it stores additional information on the highest significance factor $\theta_t$ of edges. More importantly, this space overhead is minimal compared to the gain achieved by COSK in query processing time and communication costs.

Figure 13: Effect of speed on query processing time and communication cost. (a) Query processing time; (b) communication cost.

Figure 14: Effect of mobility on query processing time and communication cost. (a) Query processing time; (b) communication cost.

7.3. Experimental Results of Top-k Spatial Keyword Queries in Dynamic Road Networks. In this section, we evaluate the performance of COSK and the basic algorithm on dynamic road networks. $E_{upd}$ denotes the percentage of all edges whose weights change at each timestamp. The length of an updated edge is randomly selected between 0.1 and 10 times the original length. Figure 17(a) depicts the query processing times of COSK and the basic algorithm. The query processing time of the basic algorithm is not significantly affected by $E_{upd}$, mainly because the query objects issue top-k spatial keyword queries at each timestamp regardless of edge updates. However, the query processing time of COSK increases with $E_{upd}$ because the probability that an updated edge is associated with the monitoring region of query q increases with $E_{upd}$. Therefore, when $E_{upd}$ becomes large, the results need to be updated frequently, which increases the query processing time. Figure 17(b) shows the communication costs of COSK and the basic algorithm with respect to $E_{upd}$. The basic algorithm incurs almost constant communication costs regardless of the value of $E_{upd}$. In contrast, the communication cost of COSK increases with $E_{upd}$ because the query results and safe regions need to be updated frequently.
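As a rough illustration of the update model used in this experiment, the following Python sketch perturbs a fraction $E_{upd}$ of the edges at one timestamp. It is only a sketch under stated assumptions: the function name `update_edge_weights`, the dictionary-based edge store, the toy network, and the choice of a uniform distribution over [0.1, 10] are illustrative and are not taken from the authors' Java implementation.

```python
import random


def update_edge_weights(original, current, e_upd, rng):
    """Rescale e_upd percent of the edges to 0.1x-10x their original length.

    original: dict mapping a directed edge (tail, head) to its original length.
    current:  dict with the same keys holding the current lengths (mutated in place).
    e_upd:    percentage of edges to update at this timestamp (0-100).
    rng:      random.Random instance, passed in so runs are reproducible.
    Returns the list of edges whose weight was changed.
    """
    edges = list(original)
    n_updates = round(len(edges) * e_upd / 100)
    chosen = rng.sample(edges, n_updates)
    for edge in chosen:
        # Assumption: the scaling factor is drawn uniformly from [0.1, 10].
        current[edge] = original[edge] * rng.uniform(0.1, 10.0)
    return chosen


# Hypothetical toy network with four directed edges.
original = {("n1", "n2"): 2.0, ("n1", "n6"): 5.0, ("n2", "n3"): 3.0, ("n4", "n5"): 2.0}
current = dict(original)
changed = update_edge_weights(original, current, e_upd=50, rng=random.Random(7))
```

With $E_{upd} = 50$ and four edges, two edges are rescaled and the remaining two keep their original lengths. Each rescaled edge is a candidate for the monitoring-region check that COSK performs.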

Figure 15: Effect of $E_{dir}$ on query processing time and communication cost. (a) Query processing time. (b) Communication cost.

Figure 16: Effect of dataset (Oldenburg, San Francisco, San Joaquin) on index size (MB).

8. Conclusion

In this paper, we investigated moving top-k spatial keyword queries in directed and dynamic road networks. We presented an efficient indexing framework using inverted files that indexes the data objects on edges, allowing for the effective searching of data objects relevant to queries in terms of both textual and spatial relevance. We also presented a safe-exit-based algorithm, called COSK, to monitor moving top-k spatial keyword queries. We demonstrated that the query results remain valid as long as the query object resides within a safe region. Furthermore, COSK can effectively monitor the validity of query results and safe regions in dynamic road networks. Finally, an experimental evaluation conducted on real road networks demonstrated that COSK significantly reduced the query processing time and communication costs compared to the CMTkSK+ algorithm.

Figure 17: Effect of $E_{upd}$ on query processing time and communication cost. (a) Query processing time. (b) Communication cost.

Data Availability

The real road network data used in this study have also been used in many previous studies. The road network data are cited in the manuscript and are available at https://www.cs.utah.edu/~lifeifei/SpatialDataset.htm. To simulate the moving queries, the authors used a spatiotemporal data generator that has also been used in previous studies. The research article describing the generator is cited in the manuscript. The documentation and source files of the generator are available at https://iapg.jade-hs.de/personen/brinkhoff/generator. The authors used Twitter tweets to generate the descriptions of data objects and the query keywords. The tweets are accessible at http://followthehashtag.com/datasets/free-twitter-dataset-usa-200000-free-usa-tweets.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

Hyung-Ju Cho was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean Government (MSIP) (NRF-2016R1A2B4009793), and this research was partially supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2016R1D1A1B03934129).

References

[1] D. Papadias, N. Mamoulis, J. Zhang, and Y. Tao, “Query processing in spatial network databases,” in Proceedings of the 29th International Conference on Very Large Data Bases (VLDB ’03), pp. 802–813, September 2003.

[2] H.-J. Cho, K. Ryu, and T.-S. Chung, “An efficient algorithm for computing safe exit points of moving range queries in directed road networks,” Information Systems, vol. 41, pp. 1–19, 2014.

[3] G. Tsatsanifos and A. Vlachou, “On processing Top-k spatio-textual preference queries,” in Proceedings of the 18th International Conference on Extending Database Technology (EDBT ’15), pp. 433–444, March 2015.

[4] R. Li, A. X. Liu, A. L. Wang, and B. Bruhadeshwar, “Fast range query processing with strong privacy protection for cloud computing,” Proceedings of the VLDB Endowment, vol. 7, no. 14, pp. 1953–1964, 2014.

[5] G. Cong, C. S. Jensen, and D. Wu, “Efficient retrieval of the Top-k most relevant spatial web objects,” Proceedings of the VLDB Endowment, vol. 2, no. 1, pp. 337–348, 2009.

[6] Z. Li, K. C. K. Lee, B. Zheng, W.-C. Lee, D. Lee, and X. Wang, “IR-tree: An efficient index for geographic document search,” IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 4, pp. 585–599, 2011.

[7] Y. Zhou, X. Xie, C. Wang, Y. Gong, and W. Ma, “Hybrid index structures for location-based web search,” in Proceedings of the 14th ACM International Conference on Information and Knowledge Management, pp. 155–162, Bremen, Germany, October 2005.

[8] J. Zobel and A. Moffat, “Inverted files for text search engines,” ACM Computing Surveys, vol. 38, no. 2, 2006.

[9] N. Beckmann, H. Kriegel, R. Schneider, and B. Seeger, “The R*-tree: an efficient and robust access method for points and rectangles,” in Proceedings of the ACM SIGMOD International Conference on Management of Data, vol. 19, pp. 322–331, May 1990.

[10] R. Hariharan, B. Hore, C. Li, and S. Mehrotra, “Processing spatial-keyword (sk) queries in geographic information retrieval (gir) systems,” in Proceedings of the 19th International Conference on Scientific and Statistical Database Management (SSDBM ’07), July 2007.

[11] I. De Felipe, V. Hristidis, and N. Rishe, “Keyword search on spatial databases,” in Proceedings of the 24th International Conference on Data Engineering (ICDE ’08), pp. 656–665, April 2008.

[12] J. B. Rocha-Junior, O. Gkorgkas, S. Jonassen, and K. Nørvåg, “Efficient processing of top-k spatial keyword queries,” in Proceedings of the International Symposium on Spatial and Temporal Databases, pp. 205–222, Springer, 2011.

[13] D. Zhang, K.-L. Tan, and A. K. Tung, “Scalable top-k spatial keyword search,” in Proceedings of the 16th International Conference on Extending Database Technology, pp. 359–370, 2013.

[14] J. B. Rocha-Junior and K. Nørvåg, “Top-k spatial keyword queries on road networks,” in Proceedings of the 15th International Conference on Extending Database Technology, pp. 168–179, Berlin, Germany, March 2012.

[15] H.-J. Cho, S. J. Kwon, and T.-S. Chung, “A safe exit algorithm for continuous nearest neighbor monitoring in road networks,” Mobile Information Systems, vol. 9, no. 1, pp. 37–53, 2013.

[16] D. Yung, M. L. Yiu, and E. Lo, “A safe-exit approach for efficient network-based moving range queries,” Data & Knowledge Engineering, vol. 72, pp. 126–147, 2012.

[17] M. Attique, H. Cho, R. Jin, and T. Chung, “Efficient Processing of Continuous Reverse k Nearest Neighbor on Moving Objects in Road Networks,” ISPRS International Journal of Geo-Information, vol. 5, no. 12, p. 247, 2016.

[18] H. G. Elmongui, M. F. Mokbel, and W. G. Aref, “Continuous aggregate nearest neighbor queries,” GeoInformatica, vol. 17, no. 1, pp. 63–95, 2013.

[19] D. Wu, M. L. Yiu, C. S. Jensen, and G. Cong, “Efficient continuously moving top-k spatial keyword query processing,” in Proceedings of the IEEE International Conference on Data Engineering (ICDE ’11), pp. 541–552, Hannover, Germany, April 2011.

[20] W. Huang, G. Li, K.-L. Tan, and J. Feng, “Efficient safe-region construction for moving top-k spatial keyword queries,” in Proceedings of the 21st ACM International Conference on Information and Knowledge Management, pp. 932–941, 2012.

[21] L. Guo, J. Shao, H. H. Aung, and K.-L. Tan, “Efficient continuous top-k spatial keyword queries on road networks,” GeoInformatica, vol. 19, no. 1, pp. 29–60, 2014.

[22] Y. Li, G. Li, L. Shu, Q. Huang, and H. Jiang, “Continuous monitoring of top-k spatial keyword queries in road networks,” Journal of Information Science and Engineering, vol. 31, no. 6, pp. 1831–1848, 2015.

[23] M. Attique, A. Khan, and T.-S. Chung, “ESPAK: Top-k spatial keyword query processing in directed road networks,” in Proceedings of the Workshops of the International Conference on Extending Database Technology and the International Conference on Database Theory (EDBT/ICDT ’17), March 2017.

[24] G. Salton and C. Buckley, “Term-weighting approaches in automatic text retrieval,” Information Processing & Management, vol. 24, no. 5, pp. 513–523, 1988.

[25] V. N. Anh, O. de Kretser, and A. Moffat, “Vector-space ranking with effective early termination,” in Proceedings of the 24th Annual International ACM SIGIR Conference, pp. 35–42, New Orleans, LA, USA, 2001.

[26] E. W. Dijkstra, “A note on two problems in connexion with graphs,” Numerische Mathematik, vol. 1, pp. 269–271, 1959.

[27] “Real datasets for spatial databases,” https://www.cs.utah.edu/~lifeifei/SpatialDataset.htm.

[28] “Twitter,” https://twitter.com.

[29] T. Brinkhoff, “A framework for generating network-based moving objects,” GeoInformatica, vol. 6, no. 2, pp. 153–180, 2002.


$\psi(d_6) = \dfrac{\mu(d_6.t, q.t)}{1 + \alpha \cdot \lambda(d_6, p)} = \dfrac{1}{3 + x}$ for $0 \le x \le 3$.

Finally, we present a lemma to prove that the safe exit points computed by COSK are correct.

Lemma 8. The COSK algorithm correctly computes a set of safe exit points.

Proof. We prove the correctness of the COSK algorithm by contradiction. Assume that $D^+_{p_a} \neq D^+_{n_\beta}$ but there is no safe exit point in the road segment $(p_a, n_\beta)$. This means that, for each point p in the road segment $(p_a, n_\beta)$, the query result at p equals $D^+_{p_a}$, i.e., $D^+_p = D^+_{p_a}$ for all $p \in (p_a, n_\beta)$. However, this leads to the contradiction that $D^+_{n_\beta} = D^+_{p_a}$ when $p = n_\beta$. Therefore, if $D^+_{p_a} \neq D^+_{n_\beta}$, a safe exit point exists in $(p_a, n_\beta)$. In addition, when $D^+_{p_a} \neq D^+_{n_\beta}$, a safe exit point is determined using the skyline $D^+_l$ for answer objects and the skyline $D^-_h$ with the highest score for nonanswer objects. The first skyline is a composite polyline drawn from the answer objects in $D^+_{p_a}$. The second skyline is a composite polyline drawn from the nonanswer objects in $D^+_{n_\beta} \cup D(p_a, n_\beta) - D^+_{p_a}$.
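To make the idea of a safe exit point more concrete, the following Python sketch locates the offset along a segment at which the score of an answer object and the score of a nonanswer object become equal. It is a simplified illustration, not the COSK procedure itself: it assumes the ranking score has the form $\mu / (1 + \alpha \cdot \lambda)$ used in the example above, that each object's network distance changes linearly (slope +1 or -1) as the query advances along the segment, and that only one answer object and one nonanswer object are compared. The names `score` and `crossing_offset` are hypothetical.

```python
def score(mu, alpha, dist):
    """Ranking score: textual relevance mu damped by the network distance dist."""
    return mu / (1.0 + alpha * dist)


def crossing_offset(mu_a, dist_a, slope_a, mu_n, dist_n, slope_n, alpha, seg_len):
    """Offset x in [0, seg_len] where answer and nonanswer scores are equal, else None.

    dist_a, dist_n:   network distances of the two objects at offset x = 0.
    slope_a, slope_n: +1 if the object gets farther as x grows, -1 if it gets closer.
    """
    # Setting mu_a / (1 + alpha*(dist_a + slope_a*x)) equal to
    # mu_n / (1 + alpha*(dist_n + slope_n*x)) and solving for x gives:
    denom = alpha * (mu_a * slope_n - mu_n * slope_a)
    if denom == 0:
        return None  # the two score curves never cross
    x = (mu_n * (1 + alpha * dist_a) - mu_a * (1 + alpha * dist_n)) / denom
    return x if 0 <= x <= seg_len else None


# Illustrative numbers: the answer object starts 2 units away and recedes,
# the nonanswer object starts 6 units away and approaches, with alpha = 1.
x_exit = crossing_offset(mu_a=1.0, dist_a=2.0, slope_a=+1,
                         mu_n=1.0, dist_n=6.0, slope_n=-1,
                         alpha=1.0, seg_len=3.0)
```

For these values the answer object's score is $1/(3 + x)$ and the nonanswer object's score is $1/(7 - x)$, so the curves meet at $x = 2$. Beyond that offset the nonanswer object would outrank the answer object, which is the situation a safe exit point marks.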

6. Monitoring Query Results and Safe Regions in Dynamic Directed Road Networks

In this section, we discuss the monitoring of spatial keyword queries in dynamic road networks, where the network distance changes depending on the traffic conditions. Updates to the weights of some edges may invalidate the query results or the safe region of q even though the query object q remains within its safe region. Figure 7 illustrates an example of changing the weights of edges $\overleftarrow{(n_1, n_2)}$ and $\overleftarrow{(n_1, n_6)}$. For convenience, we consider $\alpha = 1$ and $q.t$ = “Italian restaurant”. In Figure 7(a), the top-1 result is $d_1$, and the bold lines show the safe region of query q. Now consider that, at time $t_j$, the weights of the two edges $\overleftarrow{(n_1, n_2)}$ and $\overleftarrow{(n_1, n_6)}$ change because of heavy traffic, as shown in Figure 7(b). This update in the weights of edges may invalidate the query result or the safe region of q. Therefore, it is necessary to monitor the validity of the results and the safe region when such changes occur.

Next, we introduce a monitoring region to monitor the validity of the safe region effectively when the weight of an edge changes. The monitoring region MR contains all the points between the query point q and the lowest answer object, and between q and the highest nonanswer object. Formally, it is defined as $MR = dist(q, D^+_l) \cup dist(q, D^-_h)$, where $dist(q, D^+_l)$ is the distance between q and the lowest answer object and $dist(q, D^-_h)$ is the distance between q and the highest nonanswer object. In the given example, $D^+_l = d_1$ and $D^-_h = \{d_2, d_3\}$. Therefore, the dotted lines in Figure 8(a) show the monitoring region of query object q.

Now, at time $t_j$, the updates to edges $\overleftarrow{(n_1, n_6)}$ and $\overleftarrow{(n_1, d_1)}$, which are not part of the monitoring region, can safely be ignored. However, the update to segment $\overrightarrow{(n_2, d_1)}$, which is associated with the monitoring region, may invalidate the results. As shown in Figure 8(b), after the update the top-1 result becomes $d_2$, and the bold lines represent the new safe region of q.

Algorithm 5 monitors the validity of the result set and safe region of query object q when the weight of any edge changes. Suppose that the weight of edge $(n_i, n_j)$ changes at time $t_j$. First, the algorithm checks whether edge $(n_i, n_j)$ is associated with the monitoring region. If it is not part of the monitoring region, the algorithm simply ignores the update to edge $(n_i, n_j)$, and the query results and safe region remain valid. In contrast, if the edge is associated with the monitoring region (i.e., $MR \cap (n_i, n_j) \neq \emptyset$), the algorithm reevaluates the query results. Consequently, the top-k results and the safe region of query q are updated. Finally, the algorithm updates the monitoring region of q.

7. Performance Evaluation

In this section, we evaluate the performance of COSK through simulation experiments. We describe our experimental settings in Section 7.1, and we present our experimental results for static and dynamic road networks in Sections 7.2 and 7.3, respectively.

7.1. Experimental Settings. All of our experiments were performed using real road networks, namely, Oldenburg, San Francisco, and San Joaquin. All three road networks were obtained from [27]. The original road network of San Francisco had 21,047 nodes and 21,692 edges. We reformatted the network, pruned approximately 30% of the nodes, and adjusted the edges and their weights accordingly. This resulted in a network with 14,732 nodes and 14,316 edges. Both the directions of edges and the data objects on the edges were generated randomly. The description of each data object was extracted from Twitter messages [28], and we assigned one tweet per data object. Table 4 presents the characteristics of the datasets used in the experimental evaluation. We simulated moving query objects by using a spatiotemporal data generator [29]. The input to the generator was the road network of the dataset used, and the output was the set of query objects moving on the road network. Each experiment had 100 moving queries, which were continuously monitored for 100 timestamps (1 timestamp = 1 second), and the average result was reported.

As a benchmark for COSK in static road networks, we implemented the CMTkSK algorithm [22], which also continuously monitors moving top-k spatial keyword queries in road networks. However, this algorithm was originally designed for undirected road networks. To make a fair comparison, we modified it to process top-k spatial keyword queries in directed road networks and called the result CMTkSK+. Specifically, we modified the distance computation between two points so that, in directed road networks, $dist(p_1, p_2) \neq dist(p_2, p_1)$ in general. Since CMTkSK+ does not handle top-k spatial keyword queries in dynamic road networks, we compared the performance of COSK with a basic algorithm, which recomputes the results whenever the query object changes its location. All algorithms were implemented in Java and were executed on a desktop PC (2.80-GHz Intel Core i5) with

Figure 7: Updating the weight of edges in a dynamic road network, where $t_i < t_j$. (a) Safe region at time $t_i$. (b) Updating the weights of $\overleftarrow{(n_1, n_2)}$ and $\overleftarrow{(n_1, n_6)}$ at time $t_j$.

Figure 8: Monitoring region and updated safe region at time $t_j$. (a) Monitoring region at time $t_i$. (b) New safe region at time $t_j$.

Input: monitoring region MR, updated edge $(n_i, n_j)$
Output: none
if $MR \cap (n_i, n_j) = \emptyset$ then
    /* edge $(n_i, n_j)$ is not part of the monitoring region */
    ignore the change in the weight of edge $(n_i, n_j)$
else
    $P_{SE} \leftarrow \emptyset$  /* set of safe exit points */
    $D_k^{upd} \leftarrow EvaluateSnapshotQuery(n_i, e_i)$  /* update set of top-k results */
    $P_{SE}^{upd} \leftarrow ComputeSafeExit(P_a, n_\beta)$  /* update safe exit points */
    $MR_{upd} \leftarrow ComputeMonitoringRegion(D^+_l, D^-_h)$  /* update monitoring region */
end

Algorithm 5: MonitoringSafeRegion(MR, $(n_i, n_j)$)
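The control flow of Algorithm 5 can be sketched in Python as follows. This is a deliberately simplified illustration: it assumes the monitoring region is represented as a set of whole directed edges (the paper's monitoring region may also cover partial segments), and the `recompute` callback is a hypothetical stand-in for the snapshot query evaluation, safe exit computation, and monitoring region computation. None of these names come from the authors' implementation.

```python
def monitor_edge_update(monitoring_region, updated_edge, recompute):
    """Return fresh query state if the updated edge touches the monitoring region.

    monitoring_region: set of directed edges (tail, head) covered by MR
                       (assumption: whole edges only).
    updated_edge:      the directed edge whose weight just changed.
    recompute:         hypothetical callback returning
                       (top_k, safe_exit_points, new_monitoring_region).
    Returns None when the update can be ignored.
    """
    if updated_edge not in monitoring_region:
        # The edge lies outside MR, so results and safe region stay valid.
        return None
    # The edge intersects MR: re-evaluate results, safe exits, and MR.
    return recompute()


calls = []


def recompute():
    # Illustrative placeholder values only; a real system would run the
    # snapshot query and safe exit computation here.
    calls.append("recomputed")
    return ["d2"], ["pse1", "pse2", "pse3"], {("n2", "d2")}


mr = {("q", "n2"), ("n2", "d1")}
ignored = monitor_edge_update(mr, ("n1", "n6"), recompute)
refreshed = monitor_edge_update(mr, ("n2", "d1"), recompute)
```

Here the update to the edge outside the monitoring region is ignored, while the update to the edge inside it triggers exactly one recomputation. This is why the cost of COSK grows with the number of updated edges that fall inside monitoring regions.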

Table 4: Summary of datasets.

Attribute                        Oldenburg   San Francisco   San Joaquin
Total no. of nodes               6,104       14,732          18,262
Total no. of edges               7,034       14,316          23,876
Percentage of directed edges     30          30              30
Total no. of objects             5,627       11,453          19,098
Average no. of objects per edge  0.8         0.8             0.8
Total no. of words               49,517      103,649         166,153

Table 5: Experimental parameter settings.

Parameter                               Range
Number of results (k)                   5, 10, 15, 20, 25
Number of keywords (n)                  1, 2, 3, 4, 5
Query parameter ($\alpha$)              0.01, 0.1, 1, 10, 100
Dataset                                 Oldenburg, San Francisco, San Joaquin
Number of data objects ($N_D$)          10, 20, 30, 40, 50 (×1000)
Speed of query objects ($V_{qry}$)      25, 50, 75, 100, 125 (km/h)
Mobility ($M_{qry}$)                    20, 40, 60, 80, 100 (%)
Ratio of directed edges ($E_{dir}$)     10, 20, 30, 40, 50 (%)
Ratio of updated edges ($E_{upd}$)      15, 30, 60, 80, 100 (%)

8 GB of memory. In the experiments, we compared (1) query processing times, (2) edges processed, i.e., the number of edges processed for retrieving query results, and (3) index sizes. Table 5 summarizes the parameters used in the experiments. In each experiment, we varied a single parameter within the range shown in Table 5 while maintaining the other parameters at the bolded default values.

We evaluated the performance of the algorithms by using the following measures: (1) the total amount of server CPU time, which indicates the query processing time, and (2) the total communication cost, measured as the total number of points (i.e., the location updates sent by query objects and the query results and safe exit points returned by the server) transferred between clients and the server. The battery power and wireless bandwidth consumption typically increase with the amount of data transferred between objects (clients) and servers. Thus, we used the amount of transferred data as a metric to evaluate the communication cost.
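The asymmetry of distances in directed road networks, which motivated the modification of the benchmark algorithm described in the experimental settings, can be illustrated with a short Python sketch based on Dijkstra's algorithm [26]. The three-node graph and the function name `network_distance` are hypothetical and chosen only for illustration. The sketch is not the distance routine used by COSK or CMTkSK+.

```python
import heapq


def network_distance(graph, source, target):
    """Shortest-path distance from source to target in a directed weighted graph.

    graph: dict mapping a node to a dict {neighbor: edge_weight}, where each
           entry is a one-way edge.
    Returns float('inf') if target is unreachable from source.
    """
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > best.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < best.get(v, float("inf")):
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")


# Hypothetical one-way triangle: a -> b -> c -> a.
graph = {
    "a": {"b": 2.0},
    "b": {"c": 3.0},
    "c": {"a": 4.0},
}
forward = network_distance(graph, "a", "b")   # direct edge a -> b
backward = network_distance(graph, "b", "a")  # must detour via c
```

In this toy network the distance from a to b is 2, whereas the distance from b to a is 7 because the only route back passes through c. An algorithm designed for undirected networks would treat both distances as equal, which is why the distance computation had to be adapted for the comparison.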

7.2. Experimental Results of Top-k Spatial Keyword Queries in Static Road Networks

7.2.1. Effect of k. Figure 9 shows the effect of the number of results k on the query processing time and communication cost of both algorithms. Figure 9(a) indicates that the query processing time increases for both algorithms as the value of k increases. This is expected because, with an increase in k, more data objects need to be explored and verified. Nevertheless, COSK significantly outperforms CMTkSK+ for two main reasons. First, the search for relevant objects is very efficient when using the highest significance factor. Second, COSK does not need to verify the set of answer objects as long as the query object lies in a safe region. In contrast, the query processing time of CMTkSK+ increases significantly because it has to monitor and verify the set of candidate objects periodically. In Figure 9(b), the communication costs of both algorithms increase as the number of requested objects increases. However, the proposed algorithm demonstrates superior performance compared to CMTkSK+ because client-server communication is not required while the query object lies within the safe exit points, whereas in CMTkSK+ the query object is required to report its location to the server whenever it moves.

7.2.2. Effect of $N_D$. This experiment was conducted on the San Joaquin dataset. This dataset included 19,098 data objects; therefore, we randomly generated approximately 30,000 additional data objects on different edges. In Figure 10, we evaluate the performance of COSK and CMTkSK+ by varying the cardinality of the data objects. Note that $N_D = 10K$ corresponds to a low density of data points, while $N_D = 50K$ corresponds to a high density. In Figure 10(a), it is interesting to notice that the query processing times of both algorithms decrease as the cardinality of the data objects increases. For CMTkSK+, this is because the monitoring range of a query decreases with high density. For COSK, it is mainly because fewer edges need to be expanded when the data density is high, which decreases the query processing time. In Figure 10(b), we study the influence of the cardinality of the data objects on the communication costs. The experimental results indicate that CMTkSK+ incurs almost constant communication costs regardless of data object cardinality. However, the communication costs of COSK increase in proportion to the $N_D$ value. This is expected because the safe region becomes smaller as the density of the data objects increases, which increases the communication costs.

7.2.3. Effect of Query Keywords (n). Figure 11 shows the query processing time and communication cost of COSK and CMTkSK+ as a function of the number of query keywords. Figures 11(a) and 11(b) show that the performance of both algorithms degrades when the number of keywords increases. This is mainly because, as the number of query keywords increases, the number of relevant objects may also increase, resulting in a higher query processing time and communication cost. However, the safe-region-based algorithm COSK scales better than CMTkSK+ because of its less expensive monitoring technique.

7.2.4. Effect of $\alpha$. Figure 12 demonstrates the impact of the query parameter $\alpha$ on the query processing time and on the communication cost. A small value of $\alpha$ gives greater importance to textual relevance, whereas a high value of $\alpha$ gives more preference to spatial relevance. It is interesting to note that the query processing time is lower for higher values of $\alpha$, which give more importance to spatial relevance. This is mainly because, when spatial relevance matters more, fewer edges and objects need to be explored and processed to determine the top-k data objects. Observe that, in Figure 12(b), the number of messages sent by COSK decreases sharply with an increase in $\alpha$.

7.2.5. Effect of Speed. Figure 13(a) demonstrates the influence of the speed of the query objects on the query processing time of the COSK and CMTkSK+ algorithms. The experimental results indicate that the performance of CMTkSK+ is not significantly influenced by the speed of the query objects because the candidate objects must be monitored continuously at regular time intervals regardless of the speed. On the other hand, the performance of COSK gradually decreases as the speed of the query objects increases because the objects leave their respective safe regions more frequently. Figure 13(b) shows the communication costs of COSK and CMTkSK+ with respect to the speed of the query objects. CMTkSK+ incurs almost constant communication costs because a server-initiated request to verify the candidate objects does not depend on the speed. For COSK, the query objects cross safe regions more frequently when the speed is high, which increases the communication costs.

Figure 9: Effect of k on query processing time and communication cost. (a) Query processing time. (b) Communication cost.

Figure 10: Effect of $N_D$ on query processing time and communication cost. (a) Query processing time. (b) Communication cost.


100k

of

mes

sage

s tra

nsfe

rred

1M

COSKBasic

(b) Communication cost

Figure 17 Effect of 119864119906119901119889 on query processing time and communication cost

files of generator are available at httpsiapgjade-hsdeper-sonenbrinkhoffgenerator They used the Twitter tweetsfor generating the description of data objects and also querykeywords The tweets used can be accessible at httpfollow-thehashtagcomdatasetsfree-twitter-dataset-usa-200000-free-usa-tweets

Conflicts of Interest

The authors declare that there is no conflicts of interestregarding the publication of this paper

Acknowledgments

Hyung-JuChowas supported by theNational Research Foun-dation of Korea (NRF) grant funded by the Korean Govern-ment (MSIP) (NRF-2016R1A2B4009793) and this researchwas partially supported by Basic Science Research Programthrough the National Research Foundation of Korea (NRF)fundedby theMinistry of Education (2016R1D1A1B03934129)

References

[1] D Papadias N Mamoulis J Zhang and Y Tao ldquoQuery pro-cessing in spatial network databasesrdquo in Proceedings of the 29thInternational Conference on Very Large Data Bases (VLDB rsquo03)pp 802ndash813 September 2003

[2] H-J Cho K Ryu and T-S Chung ldquoAn efficient algorithm forcomputing safe exit points of moving range queries in directedroad networksrdquo Information Systems vol 41 pp 1ndash19 2014

[3] G Tsatsanifos and A Vlachou ldquoOn processing Top-k spatio-textual preference queriesrdquo in Proceedings of the 18th Interna-tional Conference on ExtendingDatabase Technology (EDBT rsquo15)pp 433ndash444 March 2015

[4] R Li A X Liu A L Wang and B Bruhadeshwar ldquoFast rangequery processing with strong privacy protection for cloud com-putingrdquo Proceedings of the VLDB Endowment vol 7 no 14 pp1953ndash1964 2014

[5] G Cong C S Jensen andDWu ldquoEfficient retrieval of the Top-k most relevant spatial web objectsrdquo Proceedings of the VLDBEndowment vol 2 no 1 pp 337ndash348 2009

[6] Z Li K C K Lee B Zheng W-C Lee D Lee and X WangldquoIR-tree An efficient index for geographic document searchrdquoIEEE Transactions on Knowledge and Data Engineering vol 23no 4 pp 585ndash599 2011

[7] Y Zhou X Xie C Wang Y Gong and W Ma ldquoHybrid indexstructures for location-based web searchrdquo in Proceedings of the14th ACM International Conference on Information and Knowl-edge Management pp 155ndash162 Bremen Germany October2005

[8] J Zobel and A Moffat ldquoInverted files for text search enginesrdquoACM Computing Surveys vol 38 no 2 2006

[9] N Beckmann H Kriegel R Schneider and B Seeger ldquoR-anefficient and robust accessmethod for points and rectanglesrdquo inProceedings of the ACM SIGMOD International Conference onManagement of Data vol 19 pp 322ndash331 May 1990

[10] R Hariharan B Hore C Li and S Mehrotra ldquoProcessing spa-tial-keyword (sk) queries in geographic information retrieval(gir) systemsrdquo in Proceedings of the 19th International Confer-ence on Scientific and Statistical DatabaseManagement (SSDBMrsquo07) July 2007

[11] I De FelipeV Hristidis andN Rishe ldquoKeyword search on spa-tial databasesrdquo in Proceedings of the 24th International Confer-ence on Data Engineering (ICDE rsquo08) pp 656ndash665 April 2008

[12] J B Rocha-Junior O Gkorgkas S Jonassen and K NoslashrvagldquoEfficient processing of top-k spatial keyword queriesrdquo inProceedings of the International Symposium on Spatial andTemporal Databases pp 205ndash222 Springer 2011

[13] D Zhang K-L Tan andAK Tung ldquoScalable top-k spatial key-word searchrdquo in Proceedings of the 16th International Conferenceon Extending Database Technology pp 359ndash370 2013

Wireless Communications and Mobile Computing 19

[14] J B Rocha-Junior andK Noslashrvag ldquoTop-k spatial keyword quer-ies on road networksrdquo in Proceedings of the 15th InternationalConference on Extending Database Technology pp 168ndash179Berlin Germany March 2012

[15] H-J Cho S J Kwon and T-S Chung ldquoA safe exit algorithmfor continuous nearest neighbor monitoring in road networksrdquoMobile Information Systems vol 9 no 1 pp 37ndash53 2013

[16] D Yung M L Yiu and E Lo ldquoA safe-exit approach for efficientnetwork-based moving range queriesrdquo Data amp KnowledgeEngineering vol 72 pp 126ndash147 2012

[17] M Attique H Cho R Jin and T Chung ldquoEfficient Processingof Continuous Reverse k Nearest Neighbor on Moving Objectsin Road Networksrdquo ISPRS International Journal of Geo-Infor-mation vol 5 no 12 p 247 2016

[18] H G Elmongui M F Mokbel and W G Aref ldquoContinuousaggregate nearest neighbor queriesrdquoGeoInformatica vol 17 no1 pp 63ndash95 2013

[19] D Wu M L Yiu C S Jensen and G Cong ldquoEfficient con-tinuously moving top-k spatial keyword query processingrdquo inProceedings of the IEEE International Conference on Data En-gineering (ICDE rsquo11) pp 541ndash552 Hannover Germany April2011

[20] W Huang G Li K-L Tan and J Feng ldquoEfficient safe-re-gion construction for moving top-k spatial keyword queriesrdquoin Proceedings of the 21st ACM International Conference onInformation and Knowledge Management pp 932ndash941 2012

[21] L Guo J ShaoHHAung andK-L Tan ldquoEfficient continuoustop-k spatial keyword queries on road networksrdquoGeoInformat-ica vol 19 no 1 pp 29ndash60 2014

[22] Y Li G Li L Shu Q Huang and H Jiang ldquoContinuous moni-toring of top-k spatial keyword queries in road networksrdquo Jour-nal of Information Science and Engineering vol 31 no 6 pp1831ndash1848 2015

[23] M Attique A Khan and T-S Chung ldquoESPAK Top-k spatialkeyword query processing in directed road networksrdquo in Pro-ceedings of the Workshops of the International Conference onExtending Database Technology and the International Confer-ence on DatabaseTheory (EDBTICDT rsquo17) March 2017

[24] G Salton and C Buckley ldquoTerm-weighting approaches in auto-matic text retrievalrdquo Information Processing ampManagement vol24 no 5 pp 513ndash523 1988

[25] V N Anh O de Kretser and A Moffat ldquoVector-space rankingwith effective early terminationrdquo in Proceedings of the 24th An-nual International ACM SIGIR Conference pp 35ndash42 NewOrleans LO USA 2001

[26] E W Dijkstra ldquoA note on two problems in connexion withgraphsrdquo Numerische Mathematik vol 1 pp 269ndash271 1959

[27] ldquoReal datasets for spatial databasesrdquo httpswwwcsutahedulifeifeiSpatialDatasethtm

[28] ldquoTwitterrdquo httpstwittercom[29] T Brinkhoff ldquoA framework for generating network-basedmov-

ing objectsrdquo GeoInformatica vol 6 no 2 pp 153ndash180 2002

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom


Figure 7: Updating the weight of edges in a dynamic road network, where $t_i < t_j$. (a) Safe region at time $t_i$. (b) Updating the weights of $\overleftarrow{(n_1, n_2)}$ and $\overleftarrow{(n_1, n_6)}$ at time $t_j$. The example network has nodes $n_1$ to $n_6$, a query $q$, data objects $d_1$ and $d_2$ (Italian Restaurant) and $d_3$ (Chinese Restaurant); panel (a) marks the safe exit points $p_{se1}$, $p_{se2}$, and $p_{se3}$.

Figure 8: Monitoring region and updated safe region at time $t_j$. (a) Monitoring region at time $t_i$. (b) New safe region at time $t_j$, with safe exit points $p_{se1}$, $p_{se2}$, and $p_{se3}$. The network, query $q$, and data objects $d_1$, $d_2$ (Italian Restaurant), and $d_3$ (Chinese Restaurant) are the same as in Figure 7.

Algorithm 5: MonitoringSafeRegion($MR$, $(n_i, n_j)$)

(1) Input: monitoring region $MR$, updated edge $(n_i, n_j)$
(2) Output: none
(3) if $MR \cap (n_i, n_j) = \emptyset$ then
(4)     /* edge $(n_i, n_j)$ is not part of the monitoring region */
(5)     ignore the change in the weight of edge $(n_i, n_j)$
(6) end
(7) $P_{SE} \leftarrow \emptyset$ /* set of safe exit points */
(8) else
(9)     $D^{k}_{upd} \leftarrow$ EvaluateSnapshotQuery($n_i$, $e_i$) /* update set of top-k results */
(10)    $P_{SE}^{upd} \leftarrow$ ComputeSafeExit($P_a$, $n_\beta$) /* update safe exit points */
(11)    $MR_{upd} \leftarrow$ ComputeMonitoringRegion($D^{+}_{l}$, $D^{-}_{h}$) /* update monitoring region */
(12) end
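The control flow of Algorithm 5 can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the monitoring region is modelled as a plain set of directed edges, and the three recomputation steps (top-k result, safe exit points, monitoring region) are bundled into a hypothetical `recompute` callback rather than implemented. None of these names come from the paper.

```python
from typing import Callable, Optional, Set, Tuple

Edge = Tuple[str, str]  # directed edge (n_i, n_j); node ids are hypothetical labels
State = Tuple[list, list, Set[Edge]]  # (top-k result, safe exit points, monitoring region)


def on_edge_update(
    monitoring_region: Set[Edge],
    updated_edge: Edge,
    recompute: Callable[[], State],
) -> Optional[State]:
    """Return None when the weight change can be ignored, else the refreshed state."""
    if updated_edge not in monitoring_region:
        # The edge lies outside the monitoring region, so the weight change
        # is ignored and the current result and safe exit points are kept.
        return None
    # The edge belongs to the monitoring region: re-evaluate the top-k result,
    # the safe exit points, and the monitoring region (all assumed to be done
    # by the hypothetical callback).
    return recompute()
```

The point of the membership test is that an update outside the monitoring region costs a single set lookup, whereas an update inside it triggers a full re-evaluation.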

Table 4: Summary of datasets

| Attribute | Oldenburg | San Francisco | San Joaquin |
|---|---|---|---|
| Total no. of nodes | 6104 | 14732 | 18262 |
| Total no. of edges | 7034 | 14316 | 23876 |
| Percentage of directed edges | 30 | 30 | 30 |
| Total no. of objects | 5627 | 11453 | 19098 |
| Average no. of objects per edge | 0.8 | 0.8 | 0.8 |
| Total no. of words | 49517 | 103649 | 166153 |
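As a quick consistency check, the average number of objects per edge in Table 4 can be recomputed from the object and edge totals. The snippet below is a small sketch; the dictionary layout and function name are illustrative choices, and only the numbers are taken from the table.

```python
# Totals copied from Table 4.
datasets = {
    "Oldenburg": {"edges": 7034, "objects": 5627},
    "San Francisco": {"edges": 14316, "objects": 11453},
    "San Joaquin": {"edges": 23876, "objects": 19098},
}


def objects_per_edge(stats):
    # Average object density, rounded to one decimal place as in the table.
    return round(stats["objects"] / stats["edges"], 1)


density = {name: objects_per_edge(stats) for name, stats in datasets.items()}
```

All three datasets give the same rounded density, which matches the 0.8 objects per edge reported in the table.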


Table 5: Experimental parameter settings

| Parameter | Range |
|---|---|
| Number of results ($k$) | 5, 10, 15, 20, 25 |
| Number of keywords ($n$) | 1, 2, 3, 4, 5 |
| Query parameter ($\alpha$) | 0.01, 0.1, 1, 10, 100 |
| Dataset | Oldenburg, San Francisco, San Joaquin |
| Number of data objects ($N_D$) | 10, 20, 30, 40, 50 (×1000) |
| Speed of query objects ($V_{qry}$) | 25, 50, 75, 100, 125 (km/h) |
| Mobility ($M_{qry}$) | 20, 40, 60, 80, 100 (%) |
| Ratio of directed edges ($E_{dir}$) | 10, 20, 30, 40, 50 (%) |
| Ratio of updated edges ($E_{upd}$) | 15, 30, 60, 80, 100 (%) |

8 GB of memory. In the experiments, we compared (1) query processing times, (2) edges processed, i.e., the number of edges processed for retrieving query results, and (3) index sizes. Table 5 summarizes the parameters used in the experiments. In each experiment, we varied a single parameter within the range shown in Table 5 while keeping the other parameters at their default values.

We evaluated the performance of the algorithms using the following measures: (1) the total amount of server CPU time, which indicates the query processing time, and (2) the total communication cost, measured as the total number of points (i.e., the location updates sent by query objects and the query results and safe exit points returned by the server) transferred between the clients and the server. Battery power and wireless bandwidth consumption typically increase with the amount of data transferred between objects (clients) and servers. Thus, we used the amount of transferred data as the metric for communication cost.
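The communication-cost measure described above amounts to counting points in both directions. The following Python sketch shows one way such a tally could be kept; the class and method names are hypothetical, and the assumption that each client request carries exactly one location point is ours, not the paper's.

```python
from dataclasses import dataclass


@dataclass
class CommunicationLog:
    location_updates: int = 0   # points sent by query objects (clients)
    result_points: int = 0      # query-result objects returned by the server
    safe_exit_points: int = 0   # safe exit points returned by the server

    def record_round_trip(self, k: int, num_safe_exits: int) -> None:
        # One client request (assumed to carry a single location point),
        # answered with k result objects and a set of safe exit points.
        self.location_updates += 1
        self.result_points += k
        self.safe_exit_points += num_safe_exits

    @property
    def total_points(self) -> int:
        # Total number of points transferred between clients and server.
        return self.location_updates + self.result_points + self.safe_exit_points
```

For instance, two round trips with k = 5 that return three and two safe exit points transfer 1 + 5 + 3 + 1 + 5 + 2 = 17 points in total.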

7.2. Experimental Results of Top-k Spatial Keyword Queries in Static Road Networks

7.2.1. Effect of k. Figure 9 shows the effect of the number of results on the query processing time and communication cost of both algorithms. Figure 9(a) indicates that the query processing time of both algorithms increases with k. This is expected because, as k increases, more data objects must be explored and verified. Nevertheless, COSK significantly outperforms CMTkSK+ for two main reasons. First, the search for relevant objects is very efficient when the highest significance factor is used. Second, COSK does not need to verify the set of answer objects as long as the query object lies in a safe region. In contrast, the query processing time of CMTkSK+ increases significantly because it has to monitor and verify the set of candidate objects periodically. In Figure 9(b), the communication costs of both algorithms increase as the number of requested objects (k) increases. However, the proposed algorithm performs better than CMTkSK+ because client-server communication is not required while the query object lies within the safe exit points, whereas in CMTkSK+ the query object must report its location to the server whenever it moves.
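The difference in reporting behaviour can be illustrated with a toy model. The sketch below is not the COSK algorithm: it assumes a one-dimensional trajectory and a hypothetical safe region of fixed half-width that the server re-centres on the client after every report. It only contrasts "report whenever the object moves" with "report only after leaving the safe region".

```python
def count_reports_every_move(positions):
    # Baseline behaviour: one initial registration, then one report
    # for every step in which the position actually changes.
    return 1 + sum(1 for prev, cur in zip(positions, positions[1:]) if cur != prev)


def count_reports_safe_region(positions, half_width):
    # Hypothetical 1-D safe region [center - half_width, center + half_width].
    reports = 1  # initial registration
    center = positions[0]
    for p in positions[1:]:
        if abs(p - center) > half_width:
            reports += 1
            center = p  # server is assumed to return a new region around p
    return reports


trajectory = list(range(11))  # client moves one unit per timestamp
every_move = count_reports_every_move(trajectory)
with_safe_region = count_reports_safe_region(trajectory, half_width=3)
```

On this trajectory the baseline sends 11 reports while the safe-region client sends 3, which mirrors the qualitative gap between the two approaches described above.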

7.2.2. Effect of $N_D$. This experiment was conducted on the San Joaquin dataset. The dataset included 19098 data objects; therefore, we randomly generated approximately 30000 additional data objects on different edges. In Figure 10, we evaluate the performance of COSK and CMTkSK+ by varying the cardinality of the data objects. Note that $N_D = 10K$ corresponds to a low density of data points, while $N_D = 50K$ corresponds to a high density. In Figure 10(a), it is interesting to note that the query processing times of both algorithms decrease as the cardinality of the data objects increases. For CMTkSK+, this is because the monitoring range of a query shrinks at high density. For COSK, it is mainly because fewer edges need to be expanded when the data density is high, which decreases the query processing time. In Figure 10(b), we study the influence of the cardinality of the data objects on the communication costs. The experimental results indicate that CMTkSK+ incurs almost constant communication costs regardless of the data object cardinality. However, the communication costs of COSK increase in proportion to $N_D$. This is expected because the safe region becomes smaller as the density of the data objects increases, which increases the communication costs.

7.2.3. Effect of Query Keywords (n). Figure 11 shows the query processing time and communication cost of COSK and CMTkSK+ as a function of the number of query keywords. Figures 11(a) and 11(b) show that the performance of both algorithms degrades as the number of keywords increases. This is mainly because increasing the number of query keywords may also increase the number of relevant objects, resulting in a higher query processing time and communication cost. However, the safe-region-based algorithm COSK scales better than CMTkSK+ because of its less expensive monitoring technique.

7.2.4. Effect of $\alpha$. Figure 12 demonstrates the impact of the query parameter $\alpha$ on the query processing time and the communication cost. A small value of $\alpha$ gives greater importance to textual relevance, whereas a high value of $\alpha$ gives more preference to spatial relevance. It is interesting to note that the query processing time is lower for higher values of $\alpha$, that is, when spatial relevance carries more weight. This is mainly because, when spatial relevance matters more, fewer edges and objects need to be explored and processed to determine the top-k data objects. Observe that, in Figure 12(b), the number of messages sent by COSK decreases sharply as $\alpha$ increases.

Figure 9: Effect of $k$ on query processing time and communication cost. (a) Query processing time (s) versus $k$. (b) Communication cost (number of messages transferred) versus $k$. Both panels compare COSK and CMTkSK+.

Figure 10: Effect of $N_D$ on query processing time and communication cost. (a) Query processing time (s) versus $N_D$. (b) Communication cost (number of messages transferred) versus $N_D$. Both panels compare COSK and CMTkSK+.

7.2.5. Effect of Speed. Figure 13(a) demonstrates the influence of the speed of the query objects on the query processing time of the COSK and CMTkSK+ algorithms. The experimental results indicate that the performance of CMTkSK+ is not significantly influenced by the speed of the query objects because the candidate objects must be monitored at regular time intervals regardless of the speed. For COSK, on the other hand, the performance gradually degrades as the speed of the query objects increases because the objects leave their respective safe regions more frequently. Figure 13(b) shows the communication costs of COSK and CMTkSK+ with respect to the speed of the query objects. CMTkSK+ incurs almost constant communication costs because a server-initiated request to verify the candidate objects does not depend on the speed. For COSK, the query objects cross safe regions more frequently when the speed is high, which increases the communication costs.


Figure 11: Effect of number of keywords on query processing time and communication cost. (a) Query processing time (s) versus number of keywords. (b) Communication cost (number of messages transferred) versus number of keywords. Both panels compare COSK and CMTkSK+.

Figure 12: Effect of $\alpha$ on query processing time and communication cost. (a) Query processing time (s) versus $\alpha$. (b) Communication cost (number of messages transferred) versus $\alpha$. Both panels compare COSK and CMTkSK+.

7.2.6. Effect of Mobility. Figure 14 shows the effect of the mobility $M_{qry}$ (the percentage of query objects that are moving at any timestamp) on the performance of the COSK and CMTkSK+ algorithms. As expected, the query processing time and communication costs of both algorithms increase with $M_{qry}$. Nevertheless, COSK performs better than CMTkSK+ in terms of both query processing time and communication costs.

7.2.7. Effect of Directed Edges. Figure 15 shows the impact of the percentage of directed edges $E_{dir}$ on the performance of the COSK and CMTkSK+ algorithms. The query processing time increases with $E_{dir}$ because the algorithms need to explore more edges to retrieve the top-k results. However, the communication cost of both algorithms is not significantly affected by the value of $E_{dir}$.

7.2.8. Effect of Datasets. Figure 16 shows the index sizes of the COSK and CMTkSK+ approaches for the different datasets. As shown in Figure 16, both algorithms have similar index sizes. COSK has a minor space overhead because it stores additional information, namely the highest significance factor $\theta_t$ of the edges. More importantly, this space overhead is minimal compared to the gain achieved by COSK in query processing time and communication costs.


Figure 13: Effect of speed on query processing time and communication cost. (a) Query processing time (s) versus $V_{qry}$. (b) Communication cost (number of messages transferred) versus $V_{qry}$. Both panels compare COSK and CMTkSK+.

Figure 14: Effect of mobility on query processing time and communication cost. (a) Query processing time (s) versus $M_{qry}$. (b) Communication cost (number of messages transferred) versus $M_{qry}$. Both panels compare COSK and CMTkSK+.

7.3. Experimental Results of Top-k Spatial Keyword Queries in Dynamic Road Networks

In this section, we evaluate the performance of COSK and the basic algorithm in dynamic road networks. $E_{upd}$ indicates the percentage of all edges that change their weight at each timestamp. The length of an updated edge is randomly selected between 0.1 and 10 times the original length. Figure 17(a) depicts the query processing time of COSK and the basic algorithm. It is evident from the figure that the query processing time of the basic algorithm is not significantly affected by $E_{upd}$. This is mainly because the query objects issue top-k spatial queries at each timestamp. However, the query processing time of COSK increases with $E_{upd}$ because the probability that an updated edge is associated with the monitoring region of query q increases with $E_{upd}$. Therefore, when $E_{upd}$ becomes large, the results need to be updated frequently, which increases the query processing time. Figure 17(b) shows the communication costs of COSK and the basic algorithm with respect to $E_{upd}$. The basic algorithm incurs almost constant communication costs regardless of the value of $E_{upd}$. In contrast, the communication cost of COSK increases with $E_{upd}$ because the query result and safe regions need to be updated frequently.
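The update model used in this experiment can be sketched as follows. This is a hedged illustration, not the authors' generator: it assumes that the updated edges are sampled uniformly without replacement and that the scaling factor is drawn uniformly from [0.1, 10], neither of which is stated in the text. The function and parameter names are hypothetical.

```python
import random


def update_edge_weights(original, current, e_upd, rng):
    """Re-weight a fraction e_upd (between 0 and 1) of the edges.

    original: dict mapping edge -> original length
    current:  dict mapping edge -> current weight (modified in place)
    Returns the list of edges whose weight was changed.
    """
    num_updates = round(len(original) * e_upd)
    chosen = rng.sample(sorted(original), num_updates)
    for edge in chosen:
        # New length is between 0.1 and 10 times the ORIGINAL length
        # (uniform sampling of the factor is an assumption).
        factor = rng.uniform(0.1, 10.0)
        current[edge] = original[edge] * factor
    return chosen
```

Calling this once per timestamp with, for example, `e_upd=0.15` would correspond to the smallest $E_{upd}$ setting; each returned edge would then be checked against the monitoring regions of the active queries.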


Figure 15: Effect of $E_{dir}$ on query processing time and communication cost. (a) Query processing time (s) versus $E_{dir}$. (b) Communication cost (number of messages transferred) versus $E_{dir}$.

Figure 16: Effect of dataset on index size. Index size (MB) of COSK and CMTkSK+ for the Oldenburg, San Francisco, and San Joaquin datasets.

Figure 17: Effect of $E_{upd}$ on query processing time and communication cost. (a) Query processing time (s) versus $E_{upd}$. (b) Communication cost (number of messages transferred) versus $E_{upd}$. Both panels compare COSK and the basic algorithm.

8. Conclusion

In this paper, we investigated moving top-k spatial keyword queries in directed and dynamic road networks. We presented an efficient indexing framework using inverted files that indexes the data objects on edges, allowing for the effective searching of data objects relevant to queries in terms of both textual and spatial relevance. We also presented a safe-exit-based algorithm, called COSK, to monitor moving top-k spatial keyword queries. We demonstrated that the query results remain valid as long as the query object resides within a safe region. Furthermore, COSK can effectively monitor the validity of query results and safe regions in dynamic road networks. Finally, an experimental evaluation conducted on real road networks demonstrated that COSK significantly reduced the query processing time and communication costs compared to the CMTkSK+ algorithm.

Data Availability

The real road network data used in this study are also used in many previous studies. The road network data are cited in the manuscript and are available at https://www.cs.utah.edu/~lifeifei/SpatialDataset.htm. To simulate the moving queries, the authors used a spatiotemporal data generator that is also used in previous studies. The research article describing the generator is cited in the manuscript. The documentation and source files of the generator are available at https://iapg.jade-hs.de/personen/brinkhoff/generator. The authors used Twitter tweets to generate the descriptions of data objects and the query keywords. The tweets used are accessible at http://followthehashtag.com/datasets/free-twitter-dataset-usa-200000-free-usa-tweets.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

Hyung-Ju Cho was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean Government (MSIP) (NRF-2016R1A2B4009793), and this research was partially supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2016R1D1A1B03934129).

References

[1] D. Papadias, N. Mamoulis, J. Zhang, and Y. Tao, "Query processing in spatial network databases," in Proceedings of the 29th International Conference on Very Large Data Bases (VLDB '03), pp. 802–813, September 2003.

[2] H.-J. Cho, K. Ryu, and T.-S. Chung, "An efficient algorithm for computing safe exit points of moving range queries in directed road networks," Information Systems, vol. 41, pp. 1–19, 2014.

[3] G. Tsatsanifos and A. Vlachou, "On processing Top-k spatio-textual preference queries," in Proceedings of the 18th International Conference on Extending Database Technology (EDBT '15), pp. 433–444, March 2015.

[4] R. Li, A. X. Liu, A. L. Wang, and B. Bruhadeshwar, "Fast range query processing with strong privacy protection for cloud computing," Proceedings of the VLDB Endowment, vol. 7, no. 14, pp. 1953–1964, 2014.

[5] G. Cong, C. S. Jensen, and D. Wu, "Efficient retrieval of the Top-k most relevant spatial web objects," Proceedings of the VLDB Endowment, vol. 2, no. 1, pp. 337–348, 2009.

[6] Z. Li, K. C. K. Lee, B. Zheng, W.-C. Lee, D. Lee, and X. Wang, "IR-tree: An efficient index for geographic document search," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 4, pp. 585–599, 2011.

[7] Y. Zhou, X. Xie, C. Wang, Y. Gong, and W. Ma, "Hybrid index structures for location-based web search," in Proceedings of the 14th ACM International Conference on Information and Knowledge Management, pp. 155–162, Bremen, Germany, October 2005.

[8] J. Zobel and A. Moffat, "Inverted files for text search engines," ACM Computing Surveys, vol. 38, no. 2, 2006.

[9] N. Beckmann, H. Kriegel, R. Schneider, and B. Seeger, "R*-tree: an efficient and robust access method for points and rectangles," in Proceedings of the ACM SIGMOD International Conference on Management of Data, vol. 19, pp. 322–331, May 1990.

[10] R. Hariharan, B. Hore, C. Li, and S. Mehrotra, "Processing spatial-keyword (sk) queries in geographic information retrieval (gir) systems," in Proceedings of the 19th International Conference on Scientific and Statistical Database Management (SSDBM '07), July 2007.

[11] I. De Felipe, V. Hristidis, and N. Rishe, "Keyword search on spatial databases," in Proceedings of the 24th International Conference on Data Engineering (ICDE '08), pp. 656–665, April 2008.

[12] J. B. Rocha-Junior, O. Gkorgkas, S. Jonassen, and K. Nørvåg, "Efficient processing of top-k spatial keyword queries," in Proceedings of the International Symposium on Spatial and Temporal Databases, pp. 205–222, Springer, 2011.

[13] D. Zhang, K.-L. Tan, and A. K. Tung, "Scalable top-k spatial keyword search," in Proceedings of the 16th International Conference on Extending Database Technology, pp. 359–370, 2013.

[14] J. B. Rocha-Junior and K. Nørvåg, "Top-k spatial keyword queries on road networks," in Proceedings of the 15th International Conference on Extending Database Technology, pp. 168–179, Berlin, Germany, March 2012.

[15] H.-J. Cho, S. J. Kwon, and T.-S. Chung, "A safe exit algorithm for continuous nearest neighbor monitoring in road networks," Mobile Information Systems, vol. 9, no. 1, pp. 37–53, 2013.

[16] D. Yung, M. L. Yiu, and E. Lo, "A safe-exit approach for efficient network-based moving range queries," Data & Knowledge Engineering, vol. 72, pp. 126–147, 2012.

[17] M. Attique, H. Cho, R. Jin, and T. Chung, "Efficient Processing of Continuous Reverse k Nearest Neighbor on Moving Objects in Road Networks," ISPRS International Journal of Geo-Information, vol. 5, no. 12, p. 247, 2016.

[18] H. G. Elmongui, M. F. Mokbel, and W. G. Aref, "Continuous aggregate nearest neighbor queries," GeoInformatica, vol. 17, no. 1, pp. 63–95, 2013.

[19] D. Wu, M. L. Yiu, C. S. Jensen, and G. Cong, "Efficient continuously moving top-k spatial keyword query processing," in Proceedings of the IEEE International Conference on Data Engineering (ICDE '11), pp. 541–552, Hannover, Germany, April 2011.

[20] W. Huang, G. Li, K.-L. Tan, and J. Feng, "Efficient safe-region construction for moving top-k spatial keyword queries," in Proceedings of the 21st ACM International Conference on Information and Knowledge Management, pp. 932–941, 2012.

[21] L. Guo, J. Shao, H. H. Aung, and K.-L. Tan, "Efficient continuous top-k spatial keyword queries on road networks," GeoInformatica, vol. 19, no. 1, pp. 29–60, 2014.

[22] Y. Li, G. Li, L. Shu, Q. Huang, and H. Jiang, "Continuous monitoring of top-k spatial keyword queries in road networks," Journal of Information Science and Engineering, vol. 31, no. 6, pp. 1831–1848, 2015.

[23] M. Attique, A. Khan, and T.-S. Chung, "ESPAK: Top-k spatial keyword query processing in directed road networks," in Proceedings of the Workshops of the International Conference on Extending Database Technology and the International Conference on Database Theory (EDBT/ICDT '17), March 2017.

[24] G. Salton and C. Buckley, "Term-weighting approaches in automatic text retrieval," Information Processing & Management, vol. 24, no. 5, pp. 513–523, 1988.

[25] V. N. Anh, O. de Kretser, and A. Moffat, "Vector-space ranking with effective early termination," in Proceedings of the 24th Annual International ACM SIGIR Conference, pp. 35–42, New Orleans, LA, USA, 2001.

[26] E. W. Dijkstra, "A note on two problems in connexion with graphs," Numerische Mathematik, vol. 1, pp. 269–271, 1959.

[27] "Real datasets for spatial databases," https://www.cs.utah.edu/~lifeifei/SpatialDataset.htm.

[28] "Twitter," https://twitter.com.

[29] T. Brinkhoff, "A framework for generating network-based moving objects," GeoInformatica, vol. 6, no. 2, pp. 153–180, 2002.


Table 5: Experimental parameter settings.

Parameter                               Range
Number of results (k)                   5, 10, 15, 20, 25
Number of keywords (n)                  1, 2, 3, 4, 5
Query parameter ($\alpha$)              0.01, 0.1, 1, 10, 100
Dataset                                 Oldenburg, San Francisco, San Joaquin
Number of data objects ($N_D$)          10, 20, 30, 40, 50 (×1000)
Speed of query objects ($V_{qry}$)      25, 50, 75, 100, 125 (km/h)
Mobility ($M_{qry}$)                    20, 40, 60, 80, 100 (%)
Ratio of directed edges ($E_{dir}$)     10, 20, 30, 40, 50 (%)
Ratio of updated edges ($E_{upd}$)      15, 30, 60, 80, 100 (%)

8 GB of memory. In the experiments, we compared (1) query processing times, (2) edges processed, i.e., the number of edges processed for retrieving query results, and (3) index sizes. Table 5 summarizes the parameters used in the experiments. In each experiment, we varied a single parameter within the range shown in Table 5 while keeping the other parameters at their default values (shown in bold in the published Table 5).
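The one-parameter-at-a-time protocol described above can be sketched as follows. This is only an illustration: the ranges are a subset of Table 5, the parameter names are hypothetical, and the default values are an assumption (the middle of each range), because the bolding that marks the real defaults is not available in this text.

```python
# Parameter ranges transcribed from Table 5 (subset; names are illustrative).
RANGES = {
    "k": [5, 10, 15, 20, 25],
    "n_keywords": [1, 2, 3, 4, 5],
    "alpha": [0.01, 0.1, 1, 10, 100],
    "n_objects": [10_000, 20_000, 30_000, 40_000, 50_000],
    "speed_kmh": [25, 50, 75, 100, 125],
    "mobility_pct": [20, 40, 60, 80, 100],
    "directed_pct": [10, 20, 30, 40, 50],
}

# ASSUMPTION: the real defaults are not recoverable here, so the middle value
# of each range is used purely as a placeholder.
DEFAULTS = {name: values[len(values) // 2] for name, values in RANGES.items()}


def one_factor_at_a_time(ranges, defaults):
    """Yield (varied_parameter, configuration), changing one parameter per run."""
    for name, values in ranges.items():
        for value in values:
            config = dict(defaults)  # every other parameter stays at its default
            config[name] = value
            yield name, config


configs = list(one_factor_at_a_time(RANGES, DEFAULTS))
print(len(configs))  # 35
```

Each of the 35 configurations differs from the defaults in at most one parameter, so any change in a measured cost can be attributed to that parameter alone.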

We evaluated the performance of the algorithms using the following measures: (1) the total server CPU time, which indicates the query processing time, and (2) the total communication cost, measured as the total number of points (i.e., the location updates sent by query objects and the query results and safe exit points returned by the server) transferred between the clients and the server. Battery power and wireless bandwidth consumption typically increase with the amount of data transferred between objects (clients) and servers. Thus, we used the amount of transferred data as a metric to evaluate the communication cost.
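As a minimal sketch of the communication cost measure, the following counts the points transferred in a log of client-server exchanges. The `Message` record and its field names are hypothetical and serve only to make the definition concrete.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """One client-server exchange; each field counts transferred points."""
    location_updates: int = 0   # sent by the query object to the server
    result_objects: int = 0     # returned by the server
    safe_exit_points: int = 0   # returned by the server


def communication_cost(messages):
    """Total number of points transferred between clients and the server."""
    return sum(
        m.location_updates + m.result_objects + m.safe_exit_points
        for m in messages
    )


log = [
    Message(location_updates=1),                     # query reports its location
    Message(result_objects=10, safe_exit_points=4),  # server answers
]
print(communication_cost(log))  # 15
```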

7.2. Experimental Results of Top-k Spatial Keyword Queries in Static Road Networks

7.2.1. Effect of k. Figure 9 shows the effect of the number of results k on the query processing time and communication cost for both algorithms. Figure 9(a) indicates that the query processing time increases for both algorithms as k increases. This is expected because, as k increases, more data objects must be explored and verified. Nevertheless, COSK significantly outperforms CMTkSK+ for two main reasons. First, the search for relevant objects is very efficient when the highest significance factor is used; second, COSK does not need to verify the set of answer objects as long as the query object lies in a safe region. In contrast, the query processing time of CMTkSK+ increases significantly because it has to monitor and verify the set of candidate objects periodically. In Figure 9(b), the communication costs of both algorithms increase as the number of requested objects increases. However, the proposed algorithm outperforms CMTkSK+ because no client-server communication is required while the query object remains within its safe exit points, whereas in CMTkSK+ the query object must report its location to the server whenever it moves.
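The difference between the two reporting policies can be illustrated with a toy model. It assumes a query moving along a single straight road with fixed safe exit points; in the actual method the server would recompute the exit points after each report, so the numbers below are illustrative only and the function names are hypothetical.

```python
def periodic_reports(positions):
    """A client that reports its location at every timestamp."""
    return len(positions)


def safe_exit_reports(positions, exit_points):
    """A client that contacts the server only when it passes a safe exit point.

    positions: 1-D positions of the query at successive timestamps.
    exit_points: 1-D positions beyond which the current result may be invalid.
    """
    reports = 1  # the initial query registration
    for prev, cur in zip(positions, positions[1:]):
        lo, hi = min(prev, cur), max(prev, cur)
        # Passing any exit point between two timestamps triggers one report.
        if any(lo < p <= hi for p in exit_points):
            reports += 1
    return reports


trajectory = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
exits = [3.5, 7.5]
print(periodic_reports(trajectory), safe_exit_reports(trajectory, exits))  # 11 3
```

In this toy run the safe-exit client sends 3 messages where the always-reporting client sends 11, which matches the direction of the gap visible in Figure 9(b).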

7.2.2. Effect of $N_D$. This experiment was conducted on the San Joaquin dataset. This dataset includes 19,098 data objects; therefore, we randomly generated approximately 30,000 additional data objects on different edges. In Figure 10, we evaluate the performance of COSK and CMTkSK+ by varying the cardinality of the data objects. Note that $N_D = 10K$ corresponds to a low density of data points, while $N_D = 50K$ corresponds to a high density. In Figure 10(a), it is interesting to note that the query processing times of both algorithms decrease as the cardinality of the data objects increases. For CMTkSK+, this is because the monitoring range of a query shrinks when the density is high. For COSK, it is mainly because fewer edges need to be expanded when the data density is high, which decreases the query processing time. In Figure 10(b), we study the influence of the cardinality of the data objects on the communication costs. The experimental results indicate that CMTkSK+ incurs almost constant communication costs regardless of the data object cardinality. However, the communication costs of COSK increase in proportion to $N_D$. This is expected because the safe region becomes smaller as the density of the data objects increases, which increases the communication costs.
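Why a higher object density means fewer expanded edges can be seen in a small sketch of a Dijkstra-style network expansion. This is a simplification rather than the COSK search itself: it ignores textual relevance and stops as soon as k objects have been seen on the expanded edges. The function name and the tiny graph are assumptions made for illustration.

```python
import heapq


def edges_expanded(graph, objects_on_edge, source, k):
    """Count edges visited by a Dijkstra-style expansion until k objects are seen.

    graph: {node: [(neighbor, weight), ...]} describing directed edges.
    objects_on_edge: {(u, v): number of data objects lying on edge u -> v}.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    done = set()
    found = 0
    expanded = 0
    while heap and found < k:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, w in graph.get(u, []):
            expanded += 1
            found += objects_on_edge.get((u, v), 0)
            if found >= k:
                break
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return expanded


# A directed chain a -> b -> c -> d -> e with unit-length edges.
chain = {"a": [("b", 1)], "b": [("c", 1)], "c": [("d", 1)], "d": [("e", 1)]}
sparse = {("c", "d"): 1, ("d", "e"): 1}
dense = {("a", "b"): 1, ("b", "c"): 1, ("c", "d"): 1, ("d", "e"): 1}
print(edges_expanded(chain, sparse, "a", 2))  # 4
print(edges_expanded(chain, dense, "a", 2))   # 2
```

With objects on every edge, the expansion stops after two edges; with objects only on the far edges, it has to traverse all four.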

7.2.3. Effect of Query Keywords (n). Figure 11 shows the query processing time and communication cost of COSK and CMTkSK+ as a function of the number of query keywords. Figures 11(a) and 11(b) show that the performance of both algorithms degrades as the number of keywords increases. This is mainly because increasing the number of query keywords may also increase the number of relevant objects, resulting in a higher query processing time and communication cost. However, the safe-region-based algorithm COSK scales better than CMTkSK+ because of its less expensive monitoring technique.
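The growth of the relevant object set with the number of query keywords can be sketched with a plain inverted file. The sketch assumes that an object is relevant when it contains at least one query keyword; the object descriptions and function names are made up for illustration.

```python
from collections import defaultdict


def build_inverted_file(objects):
    """Map each term to the set of object ids whose description contains it."""
    index = defaultdict(set)
    for obj_id, text in objects.items():
        for term in text.lower().split():
            index[term].add(obj_id)
    return index


def relevant_objects(index, keywords):
    """Objects containing at least one query keyword (ASSUMED relevance test)."""
    result = set()
    for keyword in keywords:
        result |= index.get(keyword, set())
    return result


objects = {
    "o1": "pizza pasta",
    "o2": "coffee cake",
    "o3": "pizza coffee",
    "o4": "sushi bar",
}
index = build_inverted_file(objects)
for n in range(1, 4):
    keywords = ["pizza", "coffee", "sushi"][:n]
    print(n, len(relevant_objects(index, keywords)))  # 1 2, then 2 3, then 3 4
```

Every additional keyword can only add posting lists to the union, so the number of candidate objects never shrinks as n grows.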

Figure 9: Effect of k on query processing time and communication cost, comparing COSK and CMTkSK+. (a) Query processing time (s) versus k. (b) Communication cost (number of messages transferred, from 100 to 1M) versus k.

Figure 10: Effect of $N_D$ on query processing time and communication cost, comparing COSK and CMTkSK+. (a) Query processing time (s) versus $N_D$. (b) Communication cost (number of messages transferred, from 100 to 1M) versus $N_D$.

7.2.4. Effect of $\alpha$. Figure 12 demonstrates the impact of the query parameter $\alpha$ on the query processing time and the communication cost. A small value of $\alpha$ gives greater importance to textual relevance, whereas a large value of $\alpha$ gives more preference to spatial relevance. It is interesting to note that the query processing time is lower for higher values of $\alpha$, that is, when more importance is given to spatial relevance. This is mainly because, when spatial relevance carries more weight, fewer edges and objects need to be explored and processed to determine the top-k data objects. Observe that in Figure 12(b) the number of messages sent by COSK decreases sharply as $\alpha$ increases.

7.2.5. Effect of Speed. Figure 13(a) demonstrates the influence of the speed of the query objects on the query processing time of the COSK and CMTkSK+ algorithms. The experimental results indicate that the performance of CMTkSK+ is not significantly influenced by the speed of the query objects because the candidate objects must be monitored at regular time intervals regardless of the speed. In contrast, the performance of COSK gradually degrades as the speed of the query objects increases because the objects leave their respective safe regions more frequently. Figure 13(b) shows the communication costs of COSK and CMTkSK+ with respect to the speed of the query objects. CMTkSK+ incurs almost constant communication costs because a server-initiated request to verify the candidate objects does not depend on the speed. For COSK, the query objects cross safe regions more frequently when the speed is high, which increases the communication costs.
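Returning to the query parameter $\alpha$ of Section 7.2.4, its role can be illustrated with a small ranking sketch. The scoring form used here, textual relevance divided by $1 + \alpha \cdot d$, is an assumption borrowed from earlier work on road networks; the exact ranking function of this paper is defined outside this section. The object names and values are made up.

```python
def score(text_relevance, distance, alpha):
    """ASSUMED ranking form: textual relevance damped by alpha-weighted distance."""
    return text_relevance / (1.0 + alpha * distance)


def top_k(objects, alpha, k):
    """Return the names of the k best objects under the assumed score."""
    ranked = sorted(objects, key=lambda o: score(o[1], o[2], alpha), reverse=True)
    return [name for name, _, _ in ranked[:k]]


# (name, textual relevance in [0, 1], network distance), illustrative values.
objects = [("near_weak", 0.4, 1.0), ("far_strong", 0.9, 5.0)]
print(top_k(objects, alpha=0.01, k=1))  # ['far_strong']
print(top_k(objects, alpha=10, k=1))    # ['near_weak']
```

With a small $\alpha$ the textually stronger but distant object wins, whereas a large $\alpha$ favors the nearby object. Under this assumed form, a large $\alpha$ lets nearby objects dominate the ranking, which is consistent with fewer edges being explored.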

Figure 11: Effect of number of keywords on query processing time and communication cost, comparing COSK and CMTkSK+. (a) Query processing time (s) versus number of keywords. (b) Communication cost (number of messages transferred, from 100 to 1M) versus number of keywords.

Figure 12: Effect of $\alpha$ on query processing time and communication cost, comparing COSK and CMTkSK+. (a) Query processing time (s) versus $\alpha$. (b) Communication cost (number of messages transferred, from 100 to 1M) versus $\alpha$.

7.2.6. Effect of Mobility. Figure 14 shows the effect of the mobility $M_{qry}$ (mobility refers to the percentage of query objects that are moving at any timestamp) on the performance of the COSK and CMTkSK+ algorithms. As expected, the query processing time and communication costs of both algorithms increase with $M_{qry}$. Nevertheless, COSK performs better than CMTkSK+ in terms of both query processing time and communication costs.

7.2.7. Effect of Directed Edges. Figure 15 shows the impact of the percentage of directed edges $E_{dir}$ on the performance of the COSK and CMTkSK+ algorithms. The query processing time increases with $E_{dir}$ because the algorithms need to explore more edges to answer the top-k spatial keyword queries. However, the communication cost of both algorithms is not significantly affected by the value of $E_{dir}$.

7.2.8. Effect of Datasets. Figure 16 shows the index sizes of the COSK and CMTkSK+ approaches for the different datasets. As shown in Figure 16, both algorithms have similar index sizes. However, COSK has a minor space overhead because it stores additional information, namely the highest significance factor $\theta_t$ of the edges. More importantly, this space overhead is minimal compared with the gains achieved by COSK in query processing time and communication costs.

Figure 13: Effect of speed on query processing time and communication cost, comparing COSK and CMTkSK+. (a) Query processing time (s) versus $V_{qry}$. (b) Communication cost (number of messages transferred, from 100 to 1M) versus $V_{qry}$.

Figure 14: Effect of mobility on query processing time and communication cost, comparing COSK and CMTkSK+. (a) Query processing time (s) versus $M_{qry}$. (b) Communication cost (number of messages transferred, from 100 to 1M) versus $M_{qry}$.

7.3. Experimental Results of Top-k Spatial Keyword Queries in Dynamic Road Networks

In this section, we evaluate the performance of COSK and the basic algorithm in dynamic road networks. $E_{upd}$ indicates the percentage of all edges that change their weight at each timestamp. The length of an updated edge is randomly selected between 0.1 and 10 times the original length. Figure 17(a) depicts the query processing time of COSK and the basic algorithm. It is evident from the figure that the query processing time of the basic algorithm is not significantly affected by $E_{upd}$. This is mainly because the query objects issue top-k spatial keyword queries at each timestamp. However, the query processing time of COSK increases with the value of $E_{upd}$ because the probability that an updated edge is associated with the monitoring region of a query q increases with $E_{upd}$. Therefore, when $E_{upd}$ becomes large, the results need to be updated frequently, which increases the query processing time. Figure 17(b) shows the communication costs of COSK and the basic algorithm with respect to $E_{upd}$. The basic algorithm incurs almost constant communication costs regardless of the value of $E_{upd}$. In contrast, the communication cost of COSK increases with $E_{upd}$ because the query results and safe regions need to be updated frequently.
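The dependence on $E_{upd}$ can be sketched as a membership test between updated edges and monitoring regions. This is an illustrative assumption about how the check could be organized, not the paper's implementation. The data layout and the names `affected_queries` and `pick_updates` are hypothetical.

```python
import random


def affected_queries(monitoring_regions, updated_edges):
    """Return the queries whose monitoring region contains an updated edge.

    monitoring_regions: {query_id: set of edges (u, v) the result depends on}.
    updated_edges: set of edges whose weight changed at this timestamp.
    """
    return {q for q, edges in monitoring_regions.items() if edges & updated_edges}


def pick_updates(all_edges, e_upd_pct, rng):
    """Select e_upd_pct percent of all edges to receive a new weight."""
    count = round(len(all_edges) * e_upd_pct / 100)
    return set(rng.sample(sorted(all_edges), count))


edges = {(i, i + 1) for i in range(10)}
regions = {"q1": {(0, 1), (1, 2)}, "q2": {(7, 8)}}

print(affected_queries(regions, {(1, 2)}))  # {'q1'}
print(len(pick_updates(edges, 30, random.Random(0))))  # 3
```

Only the affected queries need their results and safe regions recomputed, so the work grows with the share of updated edges. A baseline that reissues every query at each timestamp pays the full cost regardless of how many edges changed.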

Figure 15: Effect of $E_{dir}$ on query processing time and communication cost, comparing COSK and CMTkSK+. (a) Query processing time (s) versus $E_{dir}$. (b) Communication cost (number of messages transferred, from 100 to 1M) versus $E_{dir}$.

Figure 16: Effect of dataset on index size. Index size (MB) of COSK and CMTkSK+ for the Oldenburg, San Francisco, and San Joaquin datasets.

8. Conclusion

In this paper, we investigated moving top-k spatial keyword queries in directed and dynamic road networks. We presented an efficient indexing framework using inverted files that indexes the data objects on edges, allowing for the effective searching of data objects relevant to queries in terms of both textual and spatial relevance. We also presented a safe-exit-based algorithm, called COSK, to monitor moving top-k spatial keyword queries. We demonstrated that the query results remain valid as long as the query object resides within a safe region. Furthermore, COSK can effectively monitor the validity of query results and safe regions in dynamic road networks. Finally, an experimental evaluation conducted on real road networks demonstrated that COSK significantly reduced the query processing time and communication costs compared to the CMTkSK+ algorithm.

Figure 17: Effect of $E_{upd}$ on query processing time and communication cost, comparing COSK and the basic algorithm. (a) Query processing time (s) versus $E_{upd}$. (b) Communication cost (number of messages transferred, from 100 to 1M) versus $E_{upd}$.

Data Availability

The real road network data used in this study have also been used in many previous studies. The road network data are cited in the manuscript and are available at https://www.cs.utah.edu/~lifeifei/SpatialDataset.htm. To simulate the moving queries, the authors used a spatiotemporal data generator that has also been used in previous studies. The research article describing the generator is cited in the manuscript. The documentation and source files of the generator are available at https://iapg.jade-hs.de/personen/brinkhoff/generator/. The authors used Twitter tweets to generate the descriptions of the data objects and the query keywords. The tweets used can be accessed at http://followthehashtag.com/datasets/free-twitter-dataset-usa-200000-free-usa-tweets.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

Hyung-Ju Cho was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean Government (MSIP) (NRF-2016R1A2B4009793), and this research was partially supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2016R1D1A1B03934129).

References

[1] D. Papadias, N. Mamoulis, J. Zhang, and Y. Tao, “Query processing in spatial network databases,” in Proceedings of the 29th International Conference on Very Large Data Bases (VLDB ’03), pp. 802–813, September 2003.

[2] H.-J. Cho, K. Ryu, and T.-S. Chung, “An efficient algorithm for computing safe exit points of moving range queries in directed road networks,” Information Systems, vol. 41, pp. 1–19, 2014.

[3] G. Tsatsanifos and A. Vlachou, “On processing Top-k spatio-textual preference queries,” in Proceedings of the 18th International Conference on Extending Database Technology (EDBT ’15), pp. 433–444, March 2015.

[4] R. Li, A. X. Liu, A. L. Wang, and B. Bruhadeshwar, “Fast range query processing with strong privacy protection for cloud computing,” Proceedings of the VLDB Endowment, vol. 7, no. 14, pp. 1953–1964, 2014.

[5] G. Cong, C. S. Jensen, and D. Wu, “Efficient retrieval of the Top-k most relevant spatial web objects,” Proceedings of the VLDB Endowment, vol. 2, no. 1, pp. 337–348, 2009.

[6] Z. Li, K. C. K. Lee, B. Zheng, W.-C. Lee, D. Lee, and X. Wang, “IR-tree: An efficient index for geographic document search,” IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 4, pp. 585–599, 2011.

[7] Y. Zhou, X. Xie, C. Wang, Y. Gong, and W. Ma, “Hybrid index structures for location-based web search,” in Proceedings of the 14th ACM International Conference on Information and Knowledge Management, pp. 155–162, Bremen, Germany, October 2005.

[8] J. Zobel and A. Moffat, “Inverted files for text search engines,” ACM Computing Surveys, vol. 38, no. 2, 2006.

[9] N. Beckmann, H. Kriegel, R. Schneider, and B. Seeger, “R*-tree: an efficient and robust access method for points and rectangles,” in Proceedings of the ACM SIGMOD International Conference on Management of Data, vol. 19, pp. 322–331, May 1990.

[10] R. Hariharan, B. Hore, C. Li, and S. Mehrotra, “Processing spatial-keyword (sk) queries in geographic information retrieval (gir) systems,” in Proceedings of the 19th International Conference on Scientific and Statistical Database Management (SSDBM ’07), July 2007.

[11] I. De Felipe, V. Hristidis, and N. Rishe, “Keyword search on spatial databases,” in Proceedings of the 24th International Conference on Data Engineering (ICDE ’08), pp. 656–665, April 2008.

[12] J. B. Rocha-Junior, O. Gkorgkas, S. Jonassen, and K. Nørvåg, “Efficient processing of top-k spatial keyword queries,” in Proceedings of the International Symposium on Spatial and Temporal Databases, pp. 205–222, Springer, 2011.

[13] D. Zhang, K.-L. Tan, and A. K. Tung, “Scalable top-k spatial keyword search,” in Proceedings of the 16th International Conference on Extending Database Technology, pp. 359–370, 2013.

[14] J. B. Rocha-Junior and K. Nørvåg, “Top-k spatial keyword queries on road networks,” in Proceedings of the 15th International Conference on Extending Database Technology, pp. 168–179, Berlin, Germany, March 2012.

[15] H.-J. Cho, S. J. Kwon, and T.-S. Chung, “A safe exit algorithm for continuous nearest neighbor monitoring in road networks,” Mobile Information Systems, vol. 9, no. 1, pp. 37–53, 2013.

[16] D. Yung, M. L. Yiu, and E. Lo, “A safe-exit approach for efficient network-based moving range queries,” Data & Knowledge Engineering, vol. 72, pp. 126–147, 2012.

[17] M. Attique, H. Cho, R. Jin, and T. Chung, “Efficient Processing of Continuous Reverse k Nearest Neighbor on Moving Objects in Road Networks,” ISPRS International Journal of Geo-Information, vol. 5, no. 12, p. 247, 2016.

[18] H. G. Elmongui, M. F. Mokbel, and W. G. Aref, “Continuous aggregate nearest neighbor queries,” GeoInformatica, vol. 17, no. 1, pp. 63–95, 2013.

[19] D. Wu, M. L. Yiu, C. S. Jensen, and G. Cong, “Efficient continuously moving top-k spatial keyword query processing,” in Proceedings of the IEEE International Conference on Data Engineering (ICDE ’11), pp. 541–552, Hannover, Germany, April 2011.

[20] W. Huang, G. Li, K.-L. Tan, and J. Feng, “Efficient safe-region construction for moving top-k spatial keyword queries,” in Proceedings of the 21st ACM International Conference on Information and Knowledge Management, pp. 932–941, 2012.

[21] L. Guo, J. Shao, H. H. Aung, and K.-L. Tan, “Efficient continuous top-k spatial keyword queries on road networks,” GeoInformatica, vol. 19, no. 1, pp. 29–60, 2014.

[22] Y. Li, G. Li, L. Shu, Q. Huang, and H. Jiang, “Continuous monitoring of top-k spatial keyword queries in road networks,” Journal of Information Science and Engineering, vol. 31, no. 6, pp. 1831–1848, 2015.

[23] M. Attique, A. Khan, and T.-S. Chung, “ESPAK: Top-k spatial keyword query processing in directed road networks,” in Proceedings of the Workshops of the International Conference on Extending Database Technology and the International Conference on Database Theory (EDBT/ICDT ’17), March 2017.

[24] G. Salton and C. Buckley, “Term-weighting approaches in automatic text retrieval,” Information Processing & Management, vol. 24, no. 5, pp. 513–523, 1988.

[25] V. N. Anh, O. de Kretser, and A. Moffat, “Vector-space ranking with effective early termination,” in Proceedings of the 24th Annual International ACM SIGIR Conference, pp. 35–42, New Orleans, LA, USA, 2001.

[26] E. W. Dijkstra, “A note on two problems in connexion with graphs,” Numerische Mathematik, vol. 1, pp. 269–271, 1959.

[27] “Real datasets for spatial databases,” https://www.cs.utah.edu/~lifeifei/SpatialDataset.htm.

[28] “Twitter,” https://twitter.com.

[29] T. Brinkhoff, “A framework for generating network-based moving objects,” GeoInformatica, vol. 6, no. 2, pp. 153–180, 2002.


14 Wireless Communications and Mobile Computing

k

50

10

10

15 20

20

30

Que

ry p

roce

ssin

g tim

e (s)

COSKCMTkSK+

40

25

(a) Query processing time

COSKCMTkSK+

100

1k

10k

100k

of

mes

sage

s tra

nsfe

rred

1M

k

5 10 15 20 25

(b) Communication cost

Figure 9 Effect of k on query processing time and number of edges processed

COSKCMTkSK+

0

10

20

30

Que

ry p

roce

ssin

g tim

e (s)

40

10k 20k 30k 40k 50kND

(a) Query processing time

COSKCMTkSK+

100

1k

10k

100k

of

tran

sferr

ed m

essa

ges

1M

10 20 30 40 50ND

(b) Communication cost

Figure 10 Effect of119873119863 on query processing time and communication cost

values of 120572 which indicates more importance to the spatialrelevance This is mainly because when the spatial relevanceis higher fewer edges and objects are required to be exploredand processed to determine the top-k data objects Observethat in Figure 12(b) the number of messages sent by COSKdecreases sharply with an increase in 120572725 Effect of Speed Figure 13(a) demonstrates the influenceof the speed of the query objects on the query processingtime of the COSK and CMTkSK+ algorithms The experi-mental results indicate that the performance of CMTkSK+is not significantly influenced by the speed of the query

objects because the candidate objects must be continuouslymonitored after a regular interval of time regardless ofthe speed On the other hand for COSK the performancegradually decreases as the speed of the query objects increasesbecause the objects leave their respective safe regions morefrequently Figure 13(b) shows the communication costs ofCOSK and CMTkSK+ with respect to the speed of the queryobjects CMTkSK+ incurs almost constant communicationcosts because a server-initiated request to verify the candidateobjects does not depend on the speed For COSK the queryobjects cross safe regions more frequently when the speed ishigh which increases the communication costs

Wireless Communications and Mobile Computing 15

Number of keywords1 2 3 4 5

COSKCMTkSK+

0

15

30

45

Que

ry p

roce

ssin

g tim

e (s)

60

(a) Query processing time

COSK

Number of keywords

CMTkSK+

100

1k

10k

100k

of

mes

sage

s tra

nsfe

rred

1M

1 2 3 4 5

(b) Communication cost

Figure 11 Effect of number of keywords on query processing time and communication cost

001 01 1 10 100

COSKCMTkSK+

0

10

20

30

Que

ry p

roce

ssin

g tim

e (s)

40

(a) Query processing time

COSKCMTkSK+

100

1k

10k

100k

of

mes

sage

s tra

nsfe

rred

1M

001 01 1 10 100

(b) Communication cost

Figure 12 Effect of 120572 on query processing time and communication cost

726 Effect of Mobility Figure 14 shows the effect of mobility119872119902119903119910 (mobility refers to the percentage of query objects thatare moving at any timestamp) on the performance of COSKand CMTkSK+ algorithms As expected the query pro-cessing time and communication costs for both algorithmsincrease with119872119902119903y Nevertheless COSK performs better thanCMTkSK+ in terms of query processing time and commu-nication costs

727 Effect of Directed Edges Figure 15 shows the impactof percentage of directed edges 119864119889119894119903 on the performance ofCOSK and CMTkSK+ algorithms The query processing time

increases with 119864119889119894119903 because algorithm needs to explore moreedges to retrieve the top-k keyword queries However thecommunication cost is not significantly affected by the valueof 119864119889119894119903 for both the algorithms

728 Effect of Datasets Figure 16 demonstrates the indexsizes of the COSK and CMTkSK+ approaches for differentdatasets As shown in Figure 16 both algorithms have similarindex sizes However COSK has minor space overheadbecause it stores additional information of the highest signifi-cance factor 120579119905 of edges More important this space overheadis minimal as compared to the gain achieved by COSK inquery processing time and communication costs

16 Wireless Communications and Mobile Computing

25 50 75 100 125

COSKCMTkSK+

0

10

20

30

Que

ry p

roce

ssin

g tim

e (s)

40

Vqry

(a) Query processing time

COSKCMTkSK+

100

1k

10k

100k

of

mes

sage

s tra

nsfe

rred

1M

25 50 75 100 125Vqry

(b) Communication cost

Figure 13 Effect of speed on query processing time and communication cost

20 40 60 80 100Mqry

COSKCMTkSK+

0

15

45

30

60

Que

ry p

roce

ssin

g tim

e (s)

(a) Query processing time

100

10k

100k

of

mes

sage

s tra

nsfe

rred

1M

20 40 60 80 100Mqry

1k

COSKCMTkSK+

(b) Communication cost

Figure 14 Effect of mobility on query processing time and communication cost

73 Experimental Results of Top-k Spatial Keyword Queriesin Dynamic Road Networks In this section we evaluate theperformance of COSK and basic algorithm for dynamic roadnetworks The 119864119906119901119889 indicates the percentage of all edges thatchange their weight at each timestamp The length of anupdated edge is randomly selected between 01 to 10 times theoriginal length Figure 17(a) depicts the query processing timeof COSK and basic algorithm It is evident from the figure thatquery processing time of basic algorithm is not significantlyaffected by 119864119906119901119889 This is mainly because the query objectsissue top-k spatial queries at each timestamp However query

processing time of COSK increases with the value of 119864119906119901119889because the probability that the updated edge may associatedwith the monitoring region of query q increases with 119864119906119901119889Therefore when 119864119906119901119889 becomes large the results need to befrequently updated which increases the query processingtime Figure 17(b) shows the communication costs of COSKand basic algorithm with respect to 119864119906119901119889 Basic algorithmincurs almost constant communication costs regardless of thevalue of 119864119906119901119889 In contrast the communication cost of COSKincreases with 119864119906119901119889 because the query result and safe regionsneeds to be frequently updated

Wireless Communications and Mobile Computing 17

COSKCMTkSK+

10 20 30 40 50Edir

0

10

20

30

Que

ry p

roce

ssin

g tim

e (s)

40

(a) Query processing time

100

10k

100k

of

mes

sage

s tra

nsfe

rred

1M

1k

10 20 30 40 50Edir

eSPAKCMTkSK+

(b) Communication cost

Figure 15 Effect of 119864119889119894119903 on query processing time and communication cost

COSKCMTkSK+

0

15

45

30

60

Inde

x siz

e (M

B)

OldenburgDatasets

San Francisco San Joaquin

Figure 16 Effect of dataset on index size

8 Conclusion

In this paper we investigated moving top-k spatial keywordqueries in directed and dynamic road networksWepresentedan efficient indexing framework using inverted files thatindexes the data objects on edges allowing for the effectivesearching of data objects relevant to queries in terms ofboth textual and spatial relevance We also presented a safe-exit-based algorithm called COSK to monitor moving top-k spatial keyword queries We demonstrated that the queryresults remain valid as long as the query object resides withina safe region Furthermore COSK can effectively monitor thevalidity of query results and safe regions in dynamic roadnetworks Finally an experimental evaluation conducted on

real road networks demonstrated that COSK significantlyreduced the query processing time and communication costscompared to the CMTkSK+ algorithm

Data Availability

The real road network data used in this study are also used inmany previous studies The road network data is cited in themanuscript and it is available at httpswwwcsutahedusimlifeifeiSpatialDatasethtm To simulate the moving queriesthe authors used the spatiotemporal data generator which isalso used in previous studiesThe research article of generatoris cited in the manuscript The documentation and source

18 Wireless Communications and Mobile Computing

0

20

40

60

Que

ry p

roce

ssin

g tim

e (s)

80

15 30 45 60 75Eupd

COSKBasic

(a) Query processing time

15 30 45 60 75Eupd

100

1k

10k

100k

of

mes

sage

s tra

nsfe

rred

1M

COSKBasic

(b) Communication cost

Figure 17 Effect of 119864119906119901119889 on query processing time and communication cost

files of generator are available at httpsiapgjade-hsdeper-sonenbrinkhoffgenerator They used the Twitter tweetsfor generating the description of data objects and also querykeywords The tweets used can be accessible at httpfollow-thehashtagcomdatasetsfree-twitter-dataset-usa-200000-free-usa-tweets

Conflicts of Interest

The authors declare that there is no conflicts of interestregarding the publication of this paper

Acknowledgments

Hyung-JuChowas supported by theNational Research Foun-dation of Korea (NRF) grant funded by the Korean Govern-ment (MSIP) (NRF-2016R1A2B4009793) and this researchwas partially supported by Basic Science Research Programthrough the National Research Foundation of Korea (NRF)fundedby theMinistry of Education (2016R1D1A1B03934129)

References

[1] D. Papadias, N. Mamoulis, J. Zhang, and Y. Tao, “Query processing in spatial network databases,” in Proceedings of the 29th International Conference on Very Large Data Bases (VLDB ’03), pp. 802–813, September 2003.

[2] H.-J. Cho, K. Ryu, and T.-S. Chung, “An efficient algorithm for computing safe exit points of moving range queries in directed road networks,” Information Systems, vol. 41, pp. 1–19, 2014.

[3] G. Tsatsanifos and A. Vlachou, “On processing Top-k spatio-textual preference queries,” in Proceedings of the 18th International Conference on Extending Database Technology (EDBT ’15), pp. 433–444, March 2015.

[4] R. Li, A. X. Liu, A. L. Wang, and B. Bruhadeshwar, “Fast range query processing with strong privacy protection for cloud computing,” Proceedings of the VLDB Endowment, vol. 7, no. 14, pp. 1953–1964, 2014.

[5] G. Cong, C. S. Jensen, and D. Wu, “Efficient retrieval of the Top-k most relevant spatial web objects,” Proceedings of the VLDB Endowment, vol. 2, no. 1, pp. 337–348, 2009.

[6] Z. Li, K. C. K. Lee, B. Zheng, W.-C. Lee, D. Lee, and X. Wang, “IR-tree: An efficient index for geographic document search,” IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 4, pp. 585–599, 2011.

[7] Y. Zhou, X. Xie, C. Wang, Y. Gong, and W. Ma, “Hybrid index structures for location-based web search,” in Proceedings of the 14th ACM International Conference on Information and Knowledge Management, pp. 155–162, Bremen, Germany, October 2005.

[8] J. Zobel and A. Moffat, “Inverted files for text search engines,” ACM Computing Surveys, vol. 38, no. 2, 2006.

[9] N. Beckmann, H. Kriegel, R. Schneider, and B. Seeger, “R*-tree: an efficient and robust access method for points and rectangles,” in Proceedings of the ACM SIGMOD International Conference on Management of Data, vol. 19, pp. 322–331, May 1990.

[10] R. Hariharan, B. Hore, C. Li, and S. Mehrotra, “Processing spatial-keyword (sk) queries in geographic information retrieval (gir) systems,” in Proceedings of the 19th International Conference on Scientific and Statistical Database Management (SSDBM ’07), July 2007.

[11] I. De Felipe, V. Hristidis, and N. Rishe, “Keyword search on spatial databases,” in Proceedings of the 24th International Conference on Data Engineering (ICDE ’08), pp. 656–665, April 2008.

[12] J. B. Rocha-Junior, O. Gkorgkas, S. Jonassen, and K. Nørvåg, “Efficient processing of top-k spatial keyword queries,” in Proceedings of the International Symposium on Spatial and Temporal Databases, pp. 205–222, Springer, 2011.

[13] D. Zhang, K.-L. Tan, and A. K. Tung, “Scalable top-k spatial keyword search,” in Proceedings of the 16th International Conference on Extending Database Technology, pp. 359–370, 2013.

[14] J. B. Rocha-Junior and K. Nørvåg, “Top-k spatial keyword queries on road networks,” in Proceedings of the 15th International Conference on Extending Database Technology, pp. 168–179, Berlin, Germany, March 2012.

[15] H.-J. Cho, S. J. Kwon, and T.-S. Chung, “A safe exit algorithm for continuous nearest neighbor monitoring in road networks,” Mobile Information Systems, vol. 9, no. 1, pp. 37–53, 2013.

[16] D. Yung, M. L. Yiu, and E. Lo, “A safe-exit approach for efficient network-based moving range queries,” Data & Knowledge Engineering, vol. 72, pp. 126–147, 2012.

[17] M. Attique, H. Cho, R. Jin, and T. Chung, “Efficient Processing of Continuous Reverse k Nearest Neighbor on Moving Objects in Road Networks,” ISPRS International Journal of Geo-Information, vol. 5, no. 12, p. 247, 2016.

[18] H. G. Elmongui, M. F. Mokbel, and W. G. Aref, “Continuous aggregate nearest neighbor queries,” GeoInformatica, vol. 17, no. 1, pp. 63–95, 2013.

[19] D. Wu, M. L. Yiu, C. S. Jensen, and G. Cong, “Efficient continuously moving top-k spatial keyword query processing,” in Proceedings of the IEEE International Conference on Data Engineering (ICDE ’11), pp. 541–552, Hannover, Germany, April 2011.

[20] W. Huang, G. Li, K.-L. Tan, and J. Feng, “Efficient safe-region construction for moving top-k spatial keyword queries,” in Proceedings of the 21st ACM International Conference on Information and Knowledge Management, pp. 932–941, 2012.

[21] L. Guo, J. Shao, H. H. Aung, and K.-L. Tan, “Efficient continuous top-k spatial keyword queries on road networks,” GeoInformatica, vol. 19, no. 1, pp. 29–60, 2014.

[22] Y. Li, G. Li, L. Shu, Q. Huang, and H. Jiang, “Continuous monitoring of top-k spatial keyword queries in road networks,” Journal of Information Science and Engineering, vol. 31, no. 6, pp. 1831–1848, 2015.

[23] M. Attique, A. Khan, and T.-S. Chung, “ESPAK: Top-k spatial keyword query processing in directed road networks,” in Proceedings of the Workshops of the International Conference on Extending Database Technology and the International Conference on Database Theory (EDBT/ICDT ’17), March 2017.

[24] G. Salton and C. Buckley, “Term-weighting approaches in automatic text retrieval,” Information Processing & Management, vol. 24, no. 5, pp. 513–523, 1988.

[25] V. N. Anh, O. de Kretser, and A. Moffat, “Vector-space ranking with effective early termination,” in Proceedings of the 24th Annual International ACM SIGIR Conference, pp. 35–42, New Orleans, LA, USA, 2001.

[26] E. W. Dijkstra, “A note on two problems in connexion with graphs,” Numerische Mathematik, vol. 1, pp. 269–271, 1959.

[27] “Real datasets for spatial databases,” https://www.cs.utah.edu/~lifeifei/SpatialDataset.htm.

[28] “Twitter,” https://twitter.com.

[29] T. Brinkhoff, “A framework for generating network-based moving objects,” GeoInformatica, vol. 6, no. 2, pp. 153–180, 2002.


Figure 11: Effect of number of keywords on query processing time and communication cost. (a) Query processing time (s) versus number of keywords for COSK and CMTkSK+; (b) communication cost (number of messages transferred) versus number of keywords for COSK and CMTkSK+.

Figure 12: Effect of $\alpha$ on query processing time and communication cost. (a) Query processing time (s) versus $\alpha$ for COSK and CMTkSK+; (b) communication cost (number of messages transferred) versus $\alpha$ for COSK and CMTkSK+.

7.2.6. Effect of Mobility. Figure 14 shows the effect of mobility $M_{qry}$ (the percentage of query objects that are moving at any timestamp) on the performance of the COSK and CMTkSK+ algorithms. As expected, the query processing time and communication costs of both algorithms increase with $M_{qry}$. Nevertheless, COSK outperforms CMTkSK+ in terms of both query processing time and communication cost.

7.2.7. Effect of Directed Edges. Figure 15 shows the impact of the percentage of directed edges $E_{dir}$ on the performance of the COSK and CMTkSK+ algorithms. The query processing time increases with $E_{dir}$ because each algorithm needs to explore more edges to answer the top-k spatial keyword queries. However, the communication cost of both algorithms is not significantly affected by the value of $E_{dir}$.

7.2.8. Effect of Datasets. Figure 16 shows the index sizes of the COSK and CMTkSK+ approaches for different datasets. Both algorithms have similar index sizes. COSK incurs a minor space overhead because it stores additional information, namely the highest significance factor $\theta_t$ of edges. More importantly, this space overhead is small compared to the gain achieved by COSK in query processing time and communication costs.

Figure 13: Effect of speed on query processing time and communication cost. (a) Query processing time (s) versus $V_{qry}$ for COSK and CMTkSK+; (b) communication cost (number of messages transferred) versus $V_{qry}$ for COSK and CMTkSK+.

Figure 14: Effect of mobility on query processing time and communication cost. (a) Query processing time (s) versus $M_{qry}$ for COSK and CMTkSK+; (b) communication cost (number of messages transferred) versus $M_{qry}$ for COSK and CMTkSK+.

7.3. Experimental Results of Top-k Spatial Keyword Queries in Dynamic Road Networks. In this section, we evaluate the performance of COSK and the basic algorithm for dynamic road networks. $E_{upd}$ indicates the percentage of all edges that change their weight at each timestamp. The length of an updated edge is randomly selected between 0.1 and 10 times the original length. Figure 17(a) depicts the query processing time of COSK and the basic algorithm. It is evident from the figure that the query processing time of the basic algorithm is not significantly affected by $E_{upd}$, mainly because its query objects issue top-k spatial keyword queries at every timestamp regardless of edge updates. However, the query processing time of COSK increases with $E_{upd}$ because the probability that an updated edge is associated with the monitoring region of a query q increases with $E_{upd}$. Therefore, when $E_{upd}$ becomes large, the results need to be updated frequently, which increases the query processing time. Figure 17(b) shows the communication costs of COSK and the basic algorithm with respect to $E_{upd}$. The basic algorithm incurs almost constant communication costs regardless of the value of $E_{upd}$. In contrast, the communication cost of COSK increases with $E_{upd}$ because the query results and safe regions need to be updated frequently.
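The update workload described above can be sketched as follows. This is only an illustrative reconstruction of the experimental setup, not the authors' code: the uniform draw of the scaling factor, the representation of a monitoring region as a set of edge identifiers, and the names `apply_edge_updates` and `queries_to_refresh` are all assumptions.

```python
import random


def apply_edge_updates(weights, original, e_upd, rng, low=0.1, high=10.0):
    """Re-weight e_upd percent of the edges for one timestamp.

    weights:  dict edge_id -> current weight (modified in place)
    original: dict edge_id -> original length
    The new weight is the original length times a factor drawn uniformly
    from [low, high] (the uniform distribution is an assumption).
    Returns the list of updated edge ids.
    """
    count = round(len(weights) * e_upd / 100.0)
    updated = rng.sample(sorted(weights), count)
    for edge in updated:
        weights[edge] = original[edge] * rng.uniform(low, high)
    return updated


def queries_to_refresh(updated_edges, monitoring_regions):
    """Return the queries whose monitoring region contains an updated edge.

    monitoring_regions: dict query_id -> set of edge ids (assumed layout).
    """
    changed = set(updated_edges)
    return [q for q, region in monitoring_regions.items() if region & changed]
```

Under these assumptions, a larger `e_upd` makes it more likely that some updated edge intersects a given monitoring region, which matches the trend reported for COSK: more queries need their results and safe regions recomputed as $E_{upd}$ grows.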

Figure 15: Effect of $E_{dir}$ on query processing time and communication cost. (a) Query processing time (s) versus $E_{dir}$; (b) communication cost (number of messages transferred) versus $E_{dir}$.

Figure 16: Effect of dataset on index size. Index size (MB) of COSK and CMTkSK+ for the Oldenburg, San Francisco, and San Joaquin datasets.


Page 16: Efficient Processing of Moving Top- Spatial Keyword Queries ...downloads.hindawi.com/journals/wcmc/2018/7373286.pdfTop-k spatial keyword queries in road networks were introduced by

16 Wireless Communications and Mobile Computing

25 50 75 100 125

COSKCMTkSK+

0

10

20

30

Que

ry p

roce

ssin

g tim

e (s)

40

Vqry

(a) Query processing time

COSKCMTkSK+

100

1k

10k

100k

of

mes

sage

s tra

nsfe

rred

1M

25 50 75 100 125Vqry

(b) Communication cost

Figure 13 Effect of speed on query processing time and communication cost

20 40 60 80 100Mqry

COSKCMTkSK+

0

15

45

30

60

Que

ry p

roce

ssin

g tim

e (s)

(a) Query processing time

100

10k

100k

of

mes

sage

s tra

nsfe

rred

1M

20 40 60 80 100Mqry

1k

COSKCMTkSK+

(b) Communication cost

Figure 14 Effect of mobility on query processing time and communication cost

73 Experimental Results of Top-k Spatial Keyword Queriesin Dynamic Road Networks In this section we evaluate theperformance of COSK and basic algorithm for dynamic roadnetworks The 119864119906119901119889 indicates the percentage of all edges thatchange their weight at each timestamp The length of anupdated edge is randomly selected between 01 to 10 times theoriginal length Figure 17(a) depicts the query processing timeof COSK and basic algorithm It is evident from the figure thatquery processing time of basic algorithm is not significantlyaffected by 119864119906119901119889 This is mainly because the query objectsissue top-k spatial queries at each timestamp However query

processing time of COSK increases with the value of 119864119906119901119889because the probability that the updated edge may associatedwith the monitoring region of query q increases with 119864119906119901119889Therefore when 119864119906119901119889 becomes large the results need to befrequently updated which increases the query processingtime Figure 17(b) shows the communication costs of COSKand basic algorithm with respect to 119864119906119901119889 Basic algorithmincurs almost constant communication costs regardless of thevalue of 119864119906119901119889 In contrast the communication cost of COSKincreases with 119864119906119901119889 because the query result and safe regionsneeds to be frequently updated

Wireless Communications and Mobile Computing 17

COSKCMTkSK+

10 20 30 40 50Edir

0

10

20

30

Que

ry p

roce

ssin

g tim

e (s)

40

(a) Query processing time

100

10k

100k

of

mes

sage

s tra

nsfe

rred

1M

1k

10 20 30 40 50Edir

eSPAKCMTkSK+

(b) Communication cost

Figure 15 Effect of 119864119889119894119903 on query processing time and communication cost

COSKCMTkSK+

0

15

45

30

60

Inde

x siz

e (M

B)

OldenburgDatasets

San Francisco San Joaquin

Figure 16 Effect of dataset on index size

8 Conclusion

In this paper we investigated moving top-k spatial keywordqueries in directed and dynamic road networksWepresentedan efficient indexing framework using inverted files thatindexes the data objects on edges allowing for the effectivesearching of data objects relevant to queries in terms ofboth textual and spatial relevance We also presented a safe-exit-based algorithm called COSK to monitor moving top-k spatial keyword queries We demonstrated that the queryresults remain valid as long as the query object resides withina safe region Furthermore COSK can effectively monitor thevalidity of query results and safe regions in dynamic roadnetworks Finally an experimental evaluation conducted on

real road networks demonstrated that COSK significantlyreduced the query processing time and communication costscompared to the CMTkSK+ algorithm

Data Availability

The real road network data used in this study are also used inmany previous studies The road network data is cited in themanuscript and it is available at httpswwwcsutahedusimlifeifeiSpatialDatasethtm To simulate the moving queriesthe authors used the spatiotemporal data generator which isalso used in previous studiesThe research article of generatoris cited in the manuscript The documentation and source

18 Wireless Communications and Mobile Computing

0

20

40

60

Que

ry p

roce

ssin

g tim

e (s)

80

15 30 45 60 75Eupd

COSKBasic

(a) Query processing time

15 30 45 60 75Eupd

100

1k

10k

100k

of

mes

sage

s tra

nsfe

rred

1M

COSKBasic

(b) Communication cost

Figure 17 Effect of 119864119906119901119889 on query processing time and communication cost

files of generator are available at httpsiapgjade-hsdeper-sonenbrinkhoffgenerator They used the Twitter tweetsfor generating the description of data objects and also querykeywords The tweets used can be accessible at httpfollow-thehashtagcomdatasetsfree-twitter-dataset-usa-200000-free-usa-tweets

Conflicts of Interest

The authors declare that there is no conflicts of interestregarding the publication of this paper

Acknowledgments

Hyung-JuChowas supported by theNational Research Foun-dation of Korea (NRF) grant funded by the Korean Govern-ment (MSIP) (NRF-2016R1A2B4009793) and this researchwas partially supported by Basic Science Research Programthrough the National Research Foundation of Korea (NRF)fundedby theMinistry of Education (2016R1D1A1B03934129)

References

[1] D Papadias N Mamoulis J Zhang and Y Tao ldquoQuery pro-cessing in spatial network databasesrdquo in Proceedings of the 29thInternational Conference on Very Large Data Bases (VLDB rsquo03)pp 802ndash813 September 2003

[2] H-J Cho K Ryu and T-S Chung ldquoAn efficient algorithm forcomputing safe exit points of moving range queries in directedroad networksrdquo Information Systems vol 41 pp 1ndash19 2014

[3] G Tsatsanifos and A Vlachou ldquoOn processing Top-k spatio-textual preference queriesrdquo in Proceedings of the 18th Interna-tional Conference on ExtendingDatabase Technology (EDBT rsquo15)pp 433ndash444 March 2015

[4] R Li A X Liu A L Wang and B Bruhadeshwar ldquoFast rangequery processing with strong privacy protection for cloud com-putingrdquo Proceedings of the VLDB Endowment vol 7 no 14 pp1953ndash1964 2014

[5] G Cong C S Jensen andDWu ldquoEfficient retrieval of the Top-k most relevant spatial web objectsrdquo Proceedings of the VLDBEndowment vol 2 no 1 pp 337ndash348 2009

[6] Z Li K C K Lee B Zheng W-C Lee D Lee and X WangldquoIR-tree An efficient index for geographic document searchrdquoIEEE Transactions on Knowledge and Data Engineering vol 23no 4 pp 585ndash599 2011

[7] Y Zhou X Xie C Wang Y Gong and W Ma ldquoHybrid indexstructures for location-based web searchrdquo in Proceedings of the14th ACM International Conference on Information and Knowl-edge Management pp 155ndash162 Bremen Germany October2005

[8] J Zobel and A Moffat ldquoInverted files for text search enginesrdquoACM Computing Surveys vol 38 no 2 2006

[9] N Beckmann H Kriegel R Schneider and B Seeger ldquoR-anefficient and robust accessmethod for points and rectanglesrdquo inProceedings of the ACM SIGMOD International Conference onManagement of Data vol 19 pp 322ndash331 May 1990

[10] R Hariharan B Hore C Li and S Mehrotra ldquoProcessing spa-tial-keyword (sk) queries in geographic information retrieval(gir) systemsrdquo in Proceedings of the 19th International Confer-ence on Scientific and Statistical DatabaseManagement (SSDBMrsquo07) July 2007

[11] I De FelipeV Hristidis andN Rishe ldquoKeyword search on spa-tial databasesrdquo in Proceedings of the 24th International Confer-ence on Data Engineering (ICDE rsquo08) pp 656ndash665 April 2008

[12] J B Rocha-Junior O Gkorgkas S Jonassen and K NoslashrvagldquoEfficient processing of top-k spatial keyword queriesrdquo inProceedings of the International Symposium on Spatial andTemporal Databases pp 205ndash222 Springer 2011

[13] D Zhang K-L Tan andAK Tung ldquoScalable top-k spatial key-word searchrdquo in Proceedings of the 16th International Conferenceon Extending Database Technology pp 359ndash370 2013

Wireless Communications and Mobile Computing 19

[14] J B Rocha-Junior andK Noslashrvag ldquoTop-k spatial keyword quer-ies on road networksrdquo in Proceedings of the 15th InternationalConference on Extending Database Technology pp 168ndash179Berlin Germany March 2012

[15] H-J Cho S J Kwon and T-S Chung ldquoA safe exit algorithmfor continuous nearest neighbor monitoring in road networksrdquoMobile Information Systems vol 9 no 1 pp 37ndash53 2013

[16] D Yung M L Yiu and E Lo ldquoA safe-exit approach for efficientnetwork-based moving range queriesrdquo Data amp KnowledgeEngineering vol 72 pp 126ndash147 2012

[17] M Attique H Cho R Jin and T Chung ldquoEfficient Processingof Continuous Reverse k Nearest Neighbor on Moving Objectsin Road Networksrdquo ISPRS International Journal of Geo-Infor-mation vol 5 no 12 p 247 2016

[18] H G Elmongui M F Mokbel and W G Aref ldquoContinuousaggregate nearest neighbor queriesrdquoGeoInformatica vol 17 no1 pp 63ndash95 2013

[19] D Wu M L Yiu C S Jensen and G Cong ldquoEfficient con-tinuously moving top-k spatial keyword query processingrdquo inProceedings of the IEEE International Conference on Data En-gineering (ICDE rsquo11) pp 541ndash552 Hannover Germany April2011

[20] W Huang G Li K-L Tan and J Feng ldquoEfficient safe-re-gion construction for moving top-k spatial keyword queriesrdquoin Proceedings of the 21st ACM International Conference onInformation and Knowledge Management pp 932ndash941 2012

[21] L Guo J ShaoHHAung andK-L Tan ldquoEfficient continuoustop-k spatial keyword queries on road networksrdquoGeoInformat-ica vol 19 no 1 pp 29ndash60 2014

[22] Y Li G Li L Shu Q Huang and H Jiang ldquoContinuous moni-toring of top-k spatial keyword queries in road networksrdquo Jour-nal of Information Science and Engineering vol 31 no 6 pp1831ndash1848 2015

[23] M Attique A Khan and T-S Chung ldquoESPAK Top-k spatialkeyword query processing in directed road networksrdquo in Pro-ceedings of the Workshops of the International Conference onExtending Database Technology and the International Confer-ence on DatabaseTheory (EDBTICDT rsquo17) March 2017

[24] G Salton and C Buckley ldquoTerm-weighting approaches in auto-matic text retrievalrdquo Information Processing ampManagement vol24 no 5 pp 513ndash523 1988

[25] V N Anh O de Kretser and A Moffat ldquoVector-space rankingwith effective early terminationrdquo in Proceedings of the 24th An-nual International ACM SIGIR Conference pp 35ndash42 NewOrleans LO USA 2001

[26] E W Dijkstra ldquoA note on two problems in connexion withgraphsrdquo Numerische Mathematik vol 1 pp 269ndash271 1959

[27] ldquoReal datasets for spatial databasesrdquo httpswwwcsutahedulifeifeiSpatialDatasethtm

[28] ldquoTwitterrdquo httpstwittercom[29] T Brinkhoff ldquoA framework for generating network-basedmov-

ing objectsrdquo GeoInformatica vol 6 no 2 pp 153ndash180 2002


Wireless Communications and Mobile Computing 17

Figure 15: Effect of $E_{dir}$ on query processing time and communication cost. (a) Query processing time (s) versus $E_{dir}$ (10 to 50) for COSK and CMTkSK+. (b) Communication cost (number of messages transferred, 100 to 1M) versus $E_{dir}$ (10 to 50).

Figure 16: Effect of dataset on index size. Index size (MB, 0 to 60) of COSK and CMTkSK+ on the Oldenburg, San Francisco, and San Joaquin datasets.

8. Conclusion

In this paper, we investigated moving top-k spatial keyword queries in directed and dynamic road networks. We presented an efficient indexing framework using inverted files that indexes the data objects on edges, allowing for the effective searching of data objects relevant to queries in terms of both textual and spatial relevance. We also presented a safe-exit-based algorithm, called COSK, to monitor moving top-k spatial keyword queries. We demonstrated that the query results remain valid as long as the query object resides within a safe region. Furthermore, COSK can effectively monitor the validity of query results and safe regions in dynamic road networks. Finally, an experimental evaluation conducted on real road networks demonstrated that COSK significantly reduced the query processing time and communication costs compared to the CMTkSK+ algorithm.
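The two ideas summarized above, an inverted file keyed by road-network edge and a validity check against a safe region, can be illustrated with a minimal Python sketch. This is not the COSK implementation. The class `EdgeInvertedIndex`, the function `inside_safe_region`, the per-keyword weights, and the representation of a safe region as one offset interval per edge are all assumptions introduced here for illustration only.

```python
from collections import defaultdict


class EdgeInvertedIndex:
    """Hypothetical edge-based inverted file (illustrative, not from the paper).

    Maps (edge_id, keyword) -> list of (object_id, offset_on_edge, weight).
    """

    def __init__(self):
        self._postings = defaultdict(list)

    def add(self, edge_id, object_id, offset, weights):
        # weights: {keyword: textual weight of that keyword for this object}
        for keyword, weight in weights.items():
            self._postings[(edge_id, keyword)].append((object_id, offset, weight))

    def lookup(self, edge_id, keywords):
        # Return {object_id: (offset, summed weight)} for the objects on
        # edge_id that match at least one of the query keywords.
        hits = {}
        for keyword in keywords:
            for object_id, offset, weight in self._postings.get((edge_id, keyword), []):
                _, total = hits.get(object_id, (offset, 0.0))
                hits[object_id] = (offset, total + weight)
        return hits


def inside_safe_region(safe_region, edge_id, offset):
    # Assumed representation: safe_region maps edge_id -> (low, high) offsets.
    # While the query stays inside, the current top-k result is taken as valid.
    interval = safe_region.get(edge_id)
    return interval is not None and interval[0] <= offset <= interval[1]
```

In such a setup, a client would reissue the query only when `inside_safe_region` returns `False`, which is the intuition behind the reduced communication cost reported in the evaluation. How the safe region and its safe exit points are actually computed is described in the body of the paper and is not reproduced here.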

Data Availability

The real road network data used in this study have also been used in many previous studies. The road network data are cited in the manuscript and are available at https://www.cs.utah.edu/~lifeifei/SpatialDataset.htm. To simulate the moving queries, the authors used a spatiotemporal data generator that has also been used in previous studies. The research article describing the generator is cited in the manuscript. The documentation and source files of the generator are available at https://iapg.jade-hs.de/personen/brinkhoff/generator/. The authors used Twitter tweets to generate the descriptions of the data objects and the query keywords. The tweets used can be accessed at http://followthehashtag.com/datasets/free-twitter-dataset-usa-200000-free-usa-tweets/.

Figure 17: Effect of $E_{upd}$ on query processing time and communication cost. (a) Query processing time (s) versus $E_{upd}$ (15 to 75) for COSK and Basic. (b) Communication cost (number of messages transferred, 100 to 1M) versus $E_{upd}$ (15 to 75) for COSK and Basic.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

Hyung-Ju Cho was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean Government (MSIP) (NRF-2016R1A2B4009793), and this research was partially supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2016R1D1A1B03934129).

References

[1] D. Papadias, N. Mamoulis, J. Zhang, and Y. Tao, “Query processing in spatial network databases,” in Proceedings of the 29th International Conference on Very Large Data Bases (VLDB ’03), pp. 802–813, September 2003.

[2] H.-J. Cho, K. Ryu, and T.-S. Chung, “An efficient algorithm for computing safe exit points of moving range queries in directed road networks,” Information Systems, vol. 41, pp. 1–19, 2014.

[3] G. Tsatsanifos and A. Vlachou, “On processing Top-k spatio-textual preference queries,” in Proceedings of the 18th International Conference on Extending Database Technology (EDBT ’15), pp. 433–444, March 2015.

[4] R. Li, A. X. Liu, A. L. Wang, and B. Bruhadeshwar, “Fast range query processing with strong privacy protection for cloud computing,” Proceedings of the VLDB Endowment, vol. 7, no. 14, pp. 1953–1964, 2014.

[5] G. Cong, C. S. Jensen, and D. Wu, “Efficient retrieval of the Top-k most relevant spatial web objects,” Proceedings of the VLDB Endowment, vol. 2, no. 1, pp. 337–348, 2009.

[6] Z. Li, K. C. K. Lee, B. Zheng, W.-C. Lee, D. Lee, and X. Wang, “IR-tree: An efficient index for geographic document search,” IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 4, pp. 585–599, 2011.

[7] Y. Zhou, X. Xie, C. Wang, Y. Gong, and W. Ma, “Hybrid index structures for location-based web search,” in Proceedings of the 14th ACM International Conference on Information and Knowledge Management, pp. 155–162, Bremen, Germany, October 2005.

[8] J. Zobel and A. Moffat, “Inverted files for text search engines,” ACM Computing Surveys, vol. 38, no. 2, 2006.

[9] N. Beckmann, H. Kriegel, R. Schneider, and B. Seeger, “R*-tree: an efficient and robust access method for points and rectangles,” in Proceedings of the ACM SIGMOD International Conference on Management of Data, vol. 19, pp. 322–331, May 1990.

[10] R. Hariharan, B. Hore, C. Li, and S. Mehrotra, “Processing spatial-keyword (sk) queries in geographic information retrieval (gir) systems,” in Proceedings of the 19th International Conference on Scientific and Statistical Database Management (SSDBM ’07), July 2007.

[11] I. De Felipe, V. Hristidis, and N. Rishe, “Keyword search on spatial databases,” in Proceedings of the 24th International Conference on Data Engineering (ICDE ’08), pp. 656–665, April 2008.

[12] J. B. Rocha-Junior, O. Gkorgkas, S. Jonassen, and K. Nørvåg, “Efficient processing of top-k spatial keyword queries,” in Proceedings of the International Symposium on Spatial and Temporal Databases, pp. 205–222, Springer, 2011.

[13] D. Zhang, K.-L. Tan, and A. K. Tung, “Scalable top-k spatial keyword search,” in Proceedings of the 16th International Conference on Extending Database Technology, pp. 359–370, 2013.


[14] J. B. Rocha-Junior and K. Nørvåg, “Top-k spatial keyword queries on road networks,” in Proceedings of the 15th International Conference on Extending Database Technology, pp. 168–179, Berlin, Germany, March 2012.

[15] H.-J. Cho, S. J. Kwon, and T.-S. Chung, “A safe exit algorithm for continuous nearest neighbor monitoring in road networks,” Mobile Information Systems, vol. 9, no. 1, pp. 37–53, 2013.

[16] D. Yung, M. L. Yiu, and E. Lo, “A safe-exit approach for efficient network-based moving range queries,” Data & Knowledge Engineering, vol. 72, pp. 126–147, 2012.

[17] M. Attique, H. Cho, R. Jin, and T. Chung, “Efficient Processing of Continuous Reverse k Nearest Neighbor on Moving Objects in Road Networks,” ISPRS International Journal of Geo-Information, vol. 5, no. 12, p. 247, 2016.

[18] H. G. Elmongui, M. F. Mokbel, and W. G. Aref, “Continuous aggregate nearest neighbor queries,” GeoInformatica, vol. 17, no. 1, pp. 63–95, 2013.

[19] D. Wu, M. L. Yiu, C. S. Jensen, and G. Cong, “Efficient continuously moving top-k spatial keyword query processing,” in Proceedings of the IEEE International Conference on Data Engineering (ICDE ’11), pp. 541–552, Hannover, Germany, April 2011.

[20] W. Huang, G. Li, K.-L. Tan, and J. Feng, “Efficient safe-region construction for moving top-k spatial keyword queries,” in Proceedings of the 21st ACM International Conference on Information and Knowledge Management, pp. 932–941, 2012.

[21] L. Guo, J. Shao, H. H. Aung, and K.-L. Tan, “Efficient continuous top-k spatial keyword queries on road networks,” GeoInformatica, vol. 19, no. 1, pp. 29–60, 2014.

[22] Y. Li, G. Li, L. Shu, Q. Huang, and H. Jiang, “Continuous monitoring of top-k spatial keyword queries in road networks,” Journal of Information Science and Engineering, vol. 31, no. 6, pp. 1831–1848, 2015.

[23] M. Attique, A. Khan, and T.-S. Chung, “ESPAK: Top-k spatial keyword query processing in directed road networks,” in Proceedings of the Workshops of the International Conference on Extending Database Technology and the International Conference on Database Theory (EDBT/ICDT ’17), March 2017.

[24] G. Salton and C. Buckley, “Term-weighting approaches in automatic text retrieval,” Information Processing & Management, vol. 24, no. 5, pp. 513–523, 1988.

[25] V. N. Anh, O. de Kretser, and A. Moffat, “Vector-space ranking with effective early termination,” in Proceedings of the 24th Annual International ACM SIGIR Conference, pp. 35–42, New Orleans, LA, USA, 2001.

[26] E. W. Dijkstra, “A note on two problems in connexion with graphs,” Numerische Mathematik, vol. 1, pp. 269–271, 1959.

[27] “Real datasets for spatial databases,” https://www.cs.utah.edu/~lifeifei/SpatialDataset.htm.

[28] “Twitter,” https://twitter.com/.

[29] T. Brinkhoff, “A framework for generating network-based moving objects,” GeoInformatica, vol. 6, no. 2, pp. 153–180, 2002.



