
Modeling Crowd Flows Network to Infer Origins and Destinations of Crowds from Cellular Data

Ai-Jou Chou
Department of Computer Science
National Chiao Tung University
Hsinchu, Taiwan
[email protected]

Gunarto Sindoro Njoo
Department of Computer Science
National Chiao Tung University
Hsinchu, Taiwan
[email protected]

Wen-Chih Peng
Department of Computer Science
National Chiao Tung University
Hsinchu, Taiwan
[email protected]

ABSTRACT
Massive spatiotemporal data is generated every day and everywhere, which makes it possible to model citywide human mobility. However, few studies have investigated where the crowd flows in urban areas originate and where they end. The aim of this paper is therefore to infer the origins and destinations of crowds. We present a dynamic weighted directed graph, called the crowd flows network, to model the mobility of crowds. In particular, we use a cellular dataset of approximately 7 million spatiotemporal records from Chunghwa Telecom in Taiwan to demonstrate the practicality of the proposed crowd flows network. The results reveal that the crowd flows network captures meaningful features from the cellular data, and the findings can be seen as a first step towards a more comprehensive understanding of citywide crowd mobility.

CCS CONCEPTS
• Information systems → Spatial-temporal systems;

KEYWORDS
Crowd flows modeling; urban computing; cellular data

UrbComp’18, August 2018, London, UK
© Copyright held by the owner/author(s).

1 INTRODUCTION
In recent years, there has been a rapid proliferation of smartphones, which is not only changing the way people behave and interact but also generating massive spatial- and temporal-aware data, such as cellular data from telecommunication providers, GPS data from location-based applications, and geo-tagged content published in online social networks. This spatial- and temporal-aware data can be used as a proxy to depict the movement of people in densely populated cities. For example, business managers would like to know where potential customers come from, car-hailing companies want to know the traffic demand in a particular area of the city, and telecommunication providers are interested in where slow-speed experiences arise. These problems are highly pertinent to the crowd flows in the city.

In particular, there are three types of spatiotemporal data that can be leveraged to understand the crowds’ movement at the urban level, and each data source exhibits different characteristics. GPS data extracted from smartphones has very high density, yet it incurs high energy consumption and is sometimes unavailable due to a lack of signal or due to privacy issues. Check-in data, on the other hand, is provided willingly by the users; it is more energy efficient and does not pose privacy problems. However, check-in data is known for its sparsity and its bias towards certain kinds of users [5]. Finally, cellular data is widely available, since it is collected by the telecom company whenever the user utilizes communication services. Unlike GPS data, it neither incurs high energy consumption nor violates the user’s privacy, and it is denser than check-in data. The only limitation of cellular data is that the user’s location is not fine-grained.

Figure 1: Illustration of modeling the origins and destinations of the crowds’ movement. (a) Modeling the destinations. (b) Modeling the origins.

There are two types of crowd flows that can be observed in movement data: inflow and outflow [8]. To forecast the inflow and outflow of crowds, a deep-learning-based method called ST-ResNet [7] has been applied to taxicab GPS data and bike rental data. Because inflow and outflow can be measured from different data sources, the authors of [6] explored a multi-view deep-learning-based framework for predicting taxi demand. The limitation of these studies is that their models only consider data from a single transportation mode; that is, the crowd flow data only comes from people who use a particular transportation mode, e.g., people in a taxicab or people using a bike-sharing system. In contrast, the cellular data we use records the approximate location and timestamp of the crowd across various transportation modes, because a user’s device connects to the base station actively or passively whether the user is stationary, walking, taking public transportation, or driving. Moreover, the biggest difference between the above studies and ours is that they only discuss the volume of crowd flows into and out of each region, whereas we look at where the crowd flows of a region go to or come from, given that region as an origin or a destination. This makes our problem different and more practical.

With regard to next-location prediction, [1–4] are some of the state-of-the-art methods in the area. However, this research mostly focuses on personal mobility modeling and location recommendation from check-in data, while our work focuses on exploring the origins and destinations of crowd flows using cellular data at the city level. Figure 1 illustrates the problem that we tackle in our work. Instead of predicting the outflow volume of a particular region (shown in pale yellow in Figure 1(a)), we aim to infer the possible destinations of the crowds. Similarly, instead of knowing the inflow volume of the center region, we aim to infer the origins of the crowds (as shown in Figure 1(b)).

Table 1: Example of mobility trace

rid  timestamp         coordinates
1    01/13/17 8:30AM   (25.038, 121.529)
2    01/13/17 8:34AM   (25.038, 121.536)
3    01/13/17 9:04AM   (25.045, 121.533)
4    01/13/17 9:13AM   (25.051, 121.551)
5    01/13/17 9:16AM   (25.051, 121.563)

The key contributions of this work are as follows. We propose a method to model the mobility of crowds based on a dynamic weighted directed graph called the crowd flows network. A crowd flows network characterizes the transition of crowds between origin and destination regions and describes the inflow and outflow time series between regions. The practicality of the proposed methodology is demonstrated through case studies. To the best of our knowledge, this is the first work to model a crowd flows network in order to infer possible origins and destinations from cellular data. While our work is still at an early stage, the findings from the crowd flows network could have broad implications for understanding citywide crowd movement.

2 PROBLEM DEFINITION

2.1 Preliminaries
We denote by $L = \{l_0, l_1, \ldots, l_N\}$ the set of regions, which partitions a city into N non-overlapping rectangular areas, and by $T = \{t_0, t_1, \ldots, t_M\}$ the set of time slots, which divides a day into M time slots, each $\Delta t$ minutes long.

The mobility trace of a user u is a time-ordered sequence $S = record_1, record_2, \ldots, record_n$. Each record is a tuple (timestamp, coordinates), where (1) timestamp is the timestamp string and (2) coordinates are the geographical coordinates representing the location of the user at that timestamp. Table 1 shows an example of a mobility trace.

Given a mobility trace S, the region set L, and the time slot set T, a projection trajectory $S' = r_1, r_2, \ldots, r_m$ is the mobility trace bound to L and T. A record $r_i = (t_i, l_i)$ projected from (timestamp, coordinates) satisfies: (1) $t_i \le timestamp < t_{i+1}$; (2) coordinates lies within $l_i$. We treat coordinates that lie outside L as outliers and discard the record.
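The projection step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the grid origin, cell size in degrees, grid dimensions, slot length, and the function name `project` are all hypothetical values chosen so that the records of Table 1 fall inside a small grid.

```python
from datetime import datetime

# Hypothetical grid parameters (assumptions, not taken from the paper):
# south-west corner, cell size in degrees, and grid dimensions.
LAT0, LON0 = 25.00, 121.50
CELL_DEG = 0.01
N_ROWS, N_COLS = 10, 10
DELTA_T = 5  # assumed length of a time slot, in minutes


def project(timestamp, coordinates):
    """Map one (timestamp, coordinates) record to (time slot, region id).

    Returns None when the coordinates fall outside the grid, mirroring
    the rule that such records are treated as outliers and discarded.
    """
    lat, lon = coordinates
    row = int((lat - LAT0) // CELL_DEG)
    col = int((lon - LON0) // CELL_DEG)
    if not (0 <= row < N_ROWS and 0 <= col < N_COLS):
        return None
    # Timestamp layout follows the example in Table 1, e.g. "01/13/17 8:30AM".
    t = datetime.strptime(timestamp, "%m/%d/%y %I:%M%p")
    slot = (t.hour * 60 + t.minute) // DELTA_T
    return slot, row * N_COLS + col


trace = [
    ("01/13/17 8:30AM", (25.038, 121.529)),
    ("01/13/17 8:34AM", (25.038, 121.536)),
    ("01/13/17 9:04AM", (25.045, 121.533)),
]
# Keep only records that land inside the grid.
projected = [r for r in (project(ts, xy) for ts, xy in trace) if r is not None]
```

Region ids here are simply row-major cell indices; any other one-to-one numbering of the cells would serve equally well.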

2.2 Origins and Destinations of Crowds
Considering a large number of users U and the trajectories they generate, we aim to infer the origins and destinations of U in any time period by modeling the crowd flows around the urban space.

Definition 1 (Origin and Destination). If two consecutive records $r_i$ and $r_j$ in the trajectory of user u satisfy (1) $r_i \neq r_j$ and (2) $t_i \neq t_j$, we regard $r_i$ as the origin and $r_j$ as the destination.
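As a rough sketch of Definition 1, the snippet below walks over consecutive records of a projected trajectory and emits origin-destination pairs. It assumes each record is a `(slot, region)` tuple and reads the first condition as "the region changes"; both the representation and the function name `od_pairs` are illustrative assumptions.

```python
def od_pairs(trajectory):
    """Return (origin, destination, start_slot, travel_time) tuples.

    `trajectory` is a time-ordered list of (slot, region) records.
    A pair is emitted only when both the region and the time slot
    differ between two consecutive records (our reading of Definition 1).
    """
    pairs = []
    for (t_i, l_i), (t_j, l_j) in zip(trajectory, trajectory[1:]):
        if l_i != l_j and t_i != t_j:
            pairs.append((l_i, l_j, t_i, t_j - t_i))
    return pairs


# Made-up projected trajectory of one user.
example = [(102, 32), (102, 33), (109, 43), (110, 55), (111, 56)]
pairs = od_pairs(example)
```

The travel time is measured in time slots, which is what a later comparison against the threshold τ would need.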

Figure 2: An illustration of a crowd flows network and its adjacency matrix. (a) Toy example of a crowd flows network. (b) Adjacency matrix. The boxed row indicates Region b as origin with destinations (a, c) and corresponding outflow (3, 10). The boxed column indicates Region c as destination with origins (b, d) and corresponding inflow (10, 1).

Inferring crowd flows destinations. Given a region l as the origin, the current time period p, and a time threshold τ, the problem is to identify the set of regions into which users U will move after τ time slots.

Inferring crowd flows origins. Given a region l as the destination, the current time period p, and a time threshold τ, the problem is to identify the set of regions from which users U will move into region l after τ time slots.

3 METHODOLOGY
Our goal is to infer the origins and destinations of crowds in the city. First, we discuss how to model the crowd flows as directed networks from large-scale cellular data. Second, we show that our crowd flows network reflects several static and dynamic properties relating crowds, regions, and time.

3.1 Crowd Flows Network Formulation
A crowd flows network is a graph in which nodes represent regions. Two nodes are connected by a directed edge at a specific time period if individuals move from one region to the other during that time period. Edge weights represent the number of individuals moving during that time period. The crowd flows network characterizes the dynamics of people moving around an urban space. Nodes can further hold attributes such as the coordinates of the center point, a POI list, etc. Figure 2(a) shows a toy example of a crowd flows network with five nodes, and Figure 2(b) is the corresponding adjacency matrix. A row in the adjacency matrix represents a node as origin and shows which nodes the crowd flows move into. The sum of the values in a row is the total outflow of the node. Similarly, a column in the adjacency matrix represents a node as destination and shows which nodes the crowd flows come from. The sum of the values in a column is the total inflow of the node.
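The row and column sums can be illustrated with a small sketch. Only the three edge weights stated in the caption of Figure 2 are used (b→a = 3, b→c = 10, d→c = 1); the remaining entries of the toy matrix are not recoverable from the text and are set to zero here purely as an assumption.

```python
nodes = ["a", "b", "c", "d", "e"]
idx = {name: i for i, name in enumerate(nodes)}

# A[i][j] is the number of individuals moving from node i to node j.
A = [[0] * len(nodes) for _ in nodes]
for origin, dest, weight in [("b", "a", 3), ("b", "c", 10), ("d", "c", 1)]:
    A[idx[origin]][idx[dest]] = weight


def total_outflow(node):
    """Sum of the node's row: everything leaving the node."""
    return sum(A[idx[node]])


def total_inflow(node):
    """Sum of the node's column: everything arriving at the node."""
    return sum(row[idx[node]] for row in A)
```

With these entries, the outflow of b is the sum of its boxed row and the inflow of c is the sum of its boxed column.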

Based on Definition 1 in Section 2, each origin-destination pair can be collected from a user trajectory together with its time information, including start time, end time, and travel time. Recall that we have discretized time into fixed-size time slots. The crowd flows network is then defined as follows.


Figure 3: Crowd flows network constructed from real-world cellular data with the corresponding city map. For simplicity, we ignore the direction of edges in the network.

Definition 2 (Crowd flows network). Given a time window size w and a time threshold τ, we aggregate the origin-destination pairs of users U over every w time slots, keeping only the pairs whose travel time is equal to τ. The crowd flows network is defined as $G_k = (V, E_k)$, where V is the stationary set of regions and $E_k$ is the set of edges over a period k. An edge $(v_i, v_j, w) \in E_k$ points from the origin $v_i$ to the destination $v_j$ and carries a weight $w \in [1, \infty)$ indicating the number of individuals; this edge weight reuses the symbol w and should not be confused with the window size. Note that $(v_i, v_j, w)$ is different from $(v_j, v_i, w)$ since edges are directed.
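One way to read Definition 2 in code is sketched below. It assumes origin-destination records of the form `(origin, destination, start_slot, travel_time)`, assigns each record to a period by its start slot, and counts every qualifying record as one individual; these choices and the name `build_networks` are assumptions for illustration only.

```python
from collections import defaultdict


def build_networks(od_records, window, tau):
    """Aggregate OD records into one weighted edge set per period k.

    Returns {k: {(origin, destination): count}}. Only records whose
    travel time equals `tau` are kept, and period k covers the slots
    [k * window, (k + 1) * window).
    """
    networks = defaultdict(lambda: defaultdict(int))
    for origin, dest, start_slot, travel_time in od_records:
        if travel_time != tau:
            continue
        k = start_slot // window
        networks[k][(origin, dest)] += 1
    return networks


# Made-up records: (origin, destination, start_slot, travel_time).
records = [(1, 2, 0, 1), (1, 2, 1, 1), (1, 3, 2, 2), (2, 1, 5, 1)]
nets = build_networks(records, window=4, tau=1)
```

Because edges are keyed by the ordered pair, (1, 2) and (2, 1) are stored separately, matching the directed nature of the network.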

3.2 Network Properties
3.2.1 Connection Structure. Intuitively, people driving a car or taking the subway can reach farther regions of the city in a short time. Since the travel time is fixed to τ when we construct a crowd flows network, the network reflects this intuition and implicitly depicts the transport infrastructure. In Figure 3, the dense horizontal part is a highway, corresponding to the black line in the middle of the map. We observe that nodes in the highway region are linked to more distant nodes and have a wider east-west range. Therefore, the length of a connection in the network can be seen as a speed factor, and the direction of a connection reflects the general direction of transport. For instance, a node that represents a train station region has connections exhibiting a southwest-northeast structure that matches the railway.

3.2.2 Region Importance. Region importance is a measure based on global node ranking in the crowd flows network. The higher the region importance, the more likely it is that crowds move into and out of the region; such a region is possibly a commercial intersection or a transport hub. The lower the region importance, the more likely the region is sparsely inhabited or inaccessible. Region importance is thus closely tied to crowd flow movement.
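The text does not name the ranking algorithm. Purely as an illustration of what a global node ranking could look like, the sketch below applies a weighted PageRank by power iteration; the choice of PageRank, the damping factor, and the iteration count are assumptions and are not taken from the paper.

```python
def pagerank(edges, nodes, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration (illustrative only).

    `edges` maps (origin, destination) to a positive weight. Rank held by
    nodes without outgoing edges is redistributed uniformly.
    """
    n = len(nodes)
    out_weight = {v: 0.0 for v in nodes}
    for (origin, _dest), weight in edges.items():
        out_weight[origin] += weight

    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if out_weight[v] == 0)
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for (origin, dest), weight in edges.items():
            new_rank[dest] += damping * rank[origin] * weight / out_weight[origin]
        rank = new_rank
    return rank


# Edge weights stated in the caption of Figure 2; other edges are omitted.
toy_edges = {("b", "a"): 3, ("b", "c"): 10, ("d", "c"): 1}
importance = pagerank(toy_edges, ["a", "b", "c", "d", "e"])
```

In this toy network, region c collects most of the flow and therefore ranks highest, which is the kind of behavior one would expect from a hub region.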

3.2.3 Contextual Neighbors. A node is a contextual neighbor of another node if the two are directly connected in the crowd flows network. When a directed edge exists between two regions, the region the edge points from is an origin and the region the edge points to is a destination; together they represent a crowd flow movement behavior. Regions are not arbitrarily connected. Intuitively, two regions that are geographically close are more likely to be connected. We also found that two regions linked by the transportation network are often connected, and their edge weight can even be higher than that of two geographically close regions with the same origin region. In addition, contextual relationships are influenced by time. For example, people tend to have lunch near their workplace at noon, but in the evening they tend to go farther away to shopping malls, theatres, and bars to relax after work.

Figure 4: An illustration of contextual and semantic neighbors. (a) Contextual neighbors. (b) Semantic neighbors. In the left figure, the red regions are contextual neighbors of the green region. In the right figure, all red regions are semantic neighbors of each other since they have a common destination.

3.2.4 Semantic Neighbors. Two nodes are semantic neighbors if they share common contextual neighbors in the crowd flows network, whether or not they are directly connected. Compared to contextual neighbors, semantic neighbors are implicit in the crowd flows network. In reality, a semantic relationship reflects similar region functionality between two regions. A real illustration of this is Figure 4(b): the red regions have a semantic relationship with each other, and their common neighbors are MRT stations on the green line in the middle. In fact, these red regions are residential areas along the MRT green line.
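The two neighbor notions can be sketched as simple set operations. The snippet assumes edges are given as `(origin, destination)` pairs and ignores edge direction when collecting contextual neighbors; that simplification and the function names are assumptions made for illustration.

```python
from collections import defaultdict


def contextual_neighbors(edges):
    """Map each node to the set of nodes it is directly connected to."""
    nbrs = defaultdict(set)
    for origin, dest in edges:
        nbrs[origin].add(dest)
        nbrs[dest].add(origin)
    return dict(nbrs)


def semantic_neighbors(edges, node):
    """Nodes that share at least one contextual neighbor with `node`."""
    nbrs = contextual_neighbors(edges)
    mine = nbrs.get(node, set())
    return {other for other, theirs in nbrs.items()
            if other != node and mine & theirs}


# Made-up example: two residential regions r1 and r2 both connect to station m.
toy = [("r1", "m"), ("r2", "m"), ("r3", "x")]
result = semantic_neighbors(toy, "r1")
```

Here r1 and r2 are never directly connected, yet they come out as semantic neighbors because both connect to m.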

4 EXPERIMENTS

4.1 Dataset
We use a cellular dataset from Chunghwa Telecom, the largest telecommunication provider in Taiwan. The dataset contains cellular data of about 314,000 users from 12/31/2016 to 01/13/2017 for Taipei City and New Taipei City, the so-called Greater Taipei area. We divided the area into 49 × 49 regions. The size of each region is 0.5 km × 0.5 km.

4.2 Case Studies
4.2.1 Inferring K destinations of crowd flows. Given a specific origin and K, we infer K destinations from the crowd flows network. Figure 5 shows four origins and the query results. K is set to 30 and the query period is 01/09/2017 from 8 a.m. to 10 a.m. The time threshold τ is set to 5 minutes. The four origins are located in Shilin district, Songshan district, Banqiao district, and the vicinity of Taipei 101. The query results show the K destinations that have an edge connected to the origin at the center. A darker destination color means that more people move from the origin to that destination.

Figure 5: Inferring K destinations (τ = 5 minutes). (a) Origins. (b) Shilin. (c) Songshan. (d) Banqiao. (e) Taipei 101.

Figure 6: Inferring K destinations (τ = 10 minutes). (a) Shilin. (b) Taipei 101.
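The top-K destination query of Section 4.2.1 can be sketched as a sort over the outgoing edges of the origin. The edge representation (`{(origin, destination): weight}` for one period) and the function names are assumptions; ties are broken by destination id only to make the output deterministic.

```python
def top_k_destinations(edges, origin, k):
    """Return up to k (destination, weight) pairs, heaviest edge first."""
    outgoing = [(dest, w) for (src, dest), w in edges.items() if src == origin]
    outgoing.sort(key=lambda item: (-item[1], item[0]))
    return outgoing[:k]


def top_k_origins(edges, destination, k):
    """Symmetric query: up to k (origin, weight) pairs for a destination."""
    incoming = [(src, w) for (src, dest), w in edges.items() if dest == destination]
    incoming.sort(key=lambda item: (-item[1], item[0]))
    return incoming[:k]


# Edge weights stated in the caption of Figure 2.
toy_edges = {("b", "a"): 3, ("b", "c"): 10, ("d", "c"): 1}
```

When the origin has fewer than K outgoing edges, as in this toy network, the query simply returns all of them.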

Figure 5 reveals that the destinations of crowds are influenced by the transportation infrastructure as well as by region functionality in the city. All four origins contain Taipei mass rapid transit (MRT) stations serving Greater Taipei. Figure 5(a) shows that Shilin station is on the red line, Songshan station is a terminal station of the green line, and Banqiao and Taipei 101 stations are on the blue line. Moreover, Songshan and Banqiao stations are also train stations (TRA) on the conventional rail network in Taiwan. Banqiao is also a High-Speed Rail (HSR) station on the high-speed line along the west coast of Taiwan. Therefore, Banqiao and Songshan connect to remote destinations in Figure 5(c) and Figure 5(d), which are other TRA and HSR stops. Furthermore, the connection structures of Shilin, Songshan, and Banqiao match the railway routes on the map in Figure 5(a). Figure 5(e) shows that destinations in the vicinity of Taipei 101, one of the most crowded districts in Taipei, are limited to nearby regions.

4.2.2 Destinations of different time threshold. We experiment with different values of the time threshold τ to test our crowd flows network under different travel times between origin and destination. Figure 6 presents the query results for Shilin and Taipei 101 when τ is set to 10 minutes. K and the query period are the same as in the previous section. Compared to Figure 5(b), where τ is set to 5 minutes, the destinations of crowds in Figure 6(a) reach more distant regions. The farthest destination from Shilin is Fuxinggang station on the red line. The travel time between Shilin and Fuxinggang station published by the Taipei Metro Company is about 14 minutes, which is broadly in line with our time threshold setting.

Figure 7: Top destinations of the different time period from Region 1543. (a) Inflow of top 4 destinations. (b) Region map.

4.2.3 Destinations of different time period. We aim to show how contextual neighbors change over time in our crowd flows network. This case can be seen as the commuting behavior of crowds. In our crowd flows network, the outflow and inflow time series between any two regions are the edge weights at the different time periods. We take Region 1543 as the origin and consider its top 4 destinations, Regions 1447, 1400, 1586, and 1732. The four inflow time series of the destinations are plotted in different colors in Figure 7. Figure 7(a) shows that Region 1447 and Region 1440 have higher crowd inflows from the origin 1543 in the morning. Figure 7(b) shows that this route runs from the suburbs to the core city. In the evening, however, Region 1586 and Region 1732 have higher crowd inflows from the origin 1543, and Figure 7(b) shows that this route runs in the reverse direction, from the core city to the suburbs. This is an example of the commuting behavior of crowds in urban space. We found similar results for other regions located between the suburbs and the core city.
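Reading an inflow time series off the network amounts to collecting one edge weight per period. The sketch below assumes the networks are stored as `{period: {(origin, destination): weight}}`; the region names and weights are made up for illustration and are not values from the dataset.

```python
def flow_series(networks, origin, dest):
    """Edge weight from origin to dest for each period, in period order.

    Periods in which the edge is absent contribute a weight of 0.
    """
    return [networks[k].get((origin, dest), 0) for k in sorted(networks)]


# Hypothetical networks for three periods (morning, noon, evening).
toy_networks = {
    0: {("mid", "core"): 5},
    1: {("mid", "suburb"): 2},
    2: {("mid", "core"): 1, ("mid", "suburb"): 7},
}
to_core = flow_series(toy_networks, "mid", "core")
to_suburb = flow_series(toy_networks, "mid", "suburb")
```

Plotting such series for the top destinations of one origin would give a chart of the same kind as Figure 7(a).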

5 CONCLUSIONS
In this paper, we present a method to infer the origins and destinations of crowds by modeling a crowd flows network from large-scale cellular data. The crowd flows network described in this work could serve as the basis for understanding the interactions between regions, crowds, and time in the urban space. The four properties of the crowd flows network that we illustrated are important for explaining urban dynamics. For evaluation, we conducted several case studies covering three different scenarios: inferring top K destinations, querying destinations under different time thresholds, and querying destinations in different time periods. The results demonstrate that the crowd flows network is interpretable and distills valuable information from massive spatial- and temporal-aware data. For future work, we are exploring network embedding methods that properly preserve the properties of the crowd flows network in order to predict the origins and destinations of crowds.

REFERENCES
[1] Shanshan Feng, Gao Cong, Bo An, and Yeow Meng Chee. 2017. POI2Vec: Geographical Latent Representation for Predicting Future Visitors. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence.

[2] Jing He, Xin Li, Lejian Liao, Dandan Song, and William K Cheung. 2016. Inferring a Personalized Next Point-of-Interest Recommendation Model with Latent Behavior Patterns. Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (2016), 137–143.

[3] Qiang Liu, Shu Wu, Liang Wang, and Tieniu Tan. 2016. Predicting the Next Location: A Recurrent Model with Spatial and Temporal Contexts. Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (2016), 194–200.

[4] Xin Liu, Yong Liu, and Xiaoli Li. 2016. Exploring the context of locations for personalized location recommendations. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, Vol. 2016-Janua. 1188–1194.

[5] Gang Wang, Sarita Y Schoenebeck, Haitao Zheng, and Ben Y Zhao. 2016. "Will Check-in for Badges": Understanding Bias and Misbehavior on Location-based Social Networks. In Proceedings of the Tenth International AAAI Conference on Web and Social Media (ICWSM). 417–426.

[6] Huaxiu Yao, Fei Wu, Jintao Ke, Xianfeng Tang, Yitian Jia, Siyu Lu, Pinghua Gong, Jieping Ye, Didi Chuxing, and Zhenhui Li. 2018. Deep Multi-View Spatial-Temporal Network for Taxi Demand Prediction. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence.

[7] Junbo Zhang, Yu Zheng, and Dekang Qi. 2017. Deep Spatio-Temporal Residual Networks for Citywide Crowd Flows Prediction. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence.

[8] Yu Zheng, Licia Capra, Ouri Wolfson, and Hai Yang. 2014. Urban Computing: Concepts, Methodologies, and Applications. ACM Transactions on Intelligent Systems and Technology 5, 3 (2014), 1–55.