Classiﬁcation and Visualization for Symbolic People Flow...

Classification and Visualization for Symbolic People Flow Data

Yuri Miyagia, Masaki Onishib, Chiemi Watanabec, Takayuki Itoha, Masahiro Takatsukad

aOchanomizu University, Japan.bNational Institute of Advanced Industrial Science and Technology, Japan.

cUniversity of Tsukuba, Japan.dThe University of Sydney, Australia.

Abstract

People flow information brings us useful knowledge in various industrial and social fields including traffic, disaster prevention, andmarketing. However, it is still an open problem to develop effective people flow analysis techniques. We considered compressionand data mining techniques are especially important for analysis and visualization of large-scale people flow datasets. This paperpresents a visualization method for large-scale people flow dataset featuring compression and data mining techniques. This methodfirstly compresses the people flow datasets using UniversalSAX, an extended method of SAX (Symbolic Aggregate Approxima-tion). Next, we apply algorithms inspired by natural language processing to extract movement patterns and classify walking routes.After this process, users can interactively observe trajectories and extracted features such as congestions and popular walking routesusing a visualization tool. We had experiments of classifying and visualizing walking routes using two types of people flow datasetrecorded at an exhibition and a corridor applying our method. The results allow us to discover characteristic movements such asstopping in front of particular exhibits, or persons who passed same places but walked at different speeds.

Keywords: People flow, Symbolic trajectory, Movement pattern, Graph visualization

1. Introduction

Security cameras record many pictures of pedestrians everyday. It is worth analyzing the pictures to discover movementpatterns of the people, since we can get useful information tosolve many social problems. For example, we can establishbetter evacuation routes, find causes of traffic jams, and comeup with product displays that attract more customers, by inter-preting the discovered movement patterns. However, it may bedifficult to understand overall trends and find important knowl-edge from large amount of people flow datasets in a short time.For example, sizes of our datasets containing people flow dur-ing several weeks may be more than 10 GB. We can preservesuch datasets; however, it is not easy to observe or edit thedatasets shortly (e.g. within a couple of minutes). It is stillan open problem to develop compression and data mining tech-niques and visualize the movement patterns discovered fromlarge-scale people flow datasets.

In this paper, we propose a visualization technique for large-scale people flow datasets. The technique firstly compressesthe people flow data applying UniversalSAX [1], an extendedimplementation of SAX (Symbolic Aggregate Approximation).It converts the numeric position data to sets of smaller sizes ofcharacter datasets. It then applies natural language processingalgorithms to the character datasets to quickly discover typical

Email addresses: [email protected] (Yuri Miyagi),[email protected] (Masaki Onishi), [email protected](Chiemi Watanabe), [email protected] (Takayuki Itoh),[email protected] (Masahiro Takatsuka)

movement patterns and classify them. Finally, the technique vi-sualizes the people flow datasets focusing on the movement pat-terns. Users can observe people flow interactively using threetypes of views, overview view, detailed view and clusteringview.

This paper shows case studies with real-world people flowdatasets in an exhibition and a corridor, then discusses effective-ness of the presented tool. In the first case, we used a 669MBpeople flow dataset and compressed it to only 115KB. Then, wevisualized characteristic points such as congestion and walkingroutes. In the second case, we applied clustering before visual-izing trajectories. We compared results of clustering followingdifferent conditions.

The remainder of this paper is organized as follows. In Sec-tion 2, we introduce related work on analysis of trajectories ofpedestrians and other moving things. Section 3 presents the de-tail of the proposed technique. Section 4 introduces case studiesusing trajectory datasets recorded at an exhibition and a corri-dor. Section 5 contains a discussion of visualizations using thetechnique. Section 6 summarizes this paper.

2. Related Work

This section introduces existing studies to analyze peopleflow and trajectories of other moving things. Visualizing tra-jectories using graphs is major technique. Hocker [2] et al. pro-posed a graph structure to represent paths of walkers and analgorithm which searches for particular trajectories. Cetinkaya[3] et al. compared 4 types of graph visualization techniques us-ing datasets of locations of states. These studies did not focus

Preprint submitted to Elsevier August 23, 2017

on summarization of trajectories, or development of interactivevisualization tools.

Not only using graphs, there have been many existing meth-ods which classify and visualize spatio-temporal people flowdata recorded as real values. Following studies commonly vi-sualized trajectories or their features on maps. Andrienko et al.[4] collected datasets from wide range using GPS, and analyzedproperties of various moving objects, using both of drawingspecific trajectories and bar charts on maps. On the contrary,our methods supposes that datasets are collected by camerasin small area like one floor in a shop. Andrienko et al. [5] alsoproposed interactive clustering method for trajectories. Further-more, they visualized popular walking patterns on maps. Guo etal. [6] developed a composite visualization tool to analyze pat-terns of various objects such as pedestrians, bicycles, and cars.They adopted not only direct drawing of trajectories on maps,but also other visualization methods including piled polylinecharts, scatterplots, and parallel coordinates plots. These tech-niques do not apply data compression techniques for trajectorydatasets. Wang et al. [7] also extended the technique presentedin [6], and visualized moving patterns of taxis in wider regions.Krueger et al. [8] presented an improved visualization systemfor chronologically GPS datasets. The main feature of this sys-tem is that users can move a circle looked like a lens, and focuson particular regions to observe detailed information such asspeeds and directions. Wang et al. [9] extracted and visualizedfeatures of automobiles passing at particular positions, apply-ing datasets collected using many sensors on roads. Lu et al.[10] proposed TrajRank, a visualization system for trajectoriesfor vehicles like taxies. They separated trajectories to segmentsand calculated their ranks to suggest characteristic travel behav-iors.

Other techniques introduced below commonly visualized tra-jectories on pictures of camera views. Mehran et al. [11] pro-posed a technique using streamline visualization and abnormal-ity detection methods. They claimed that streamlines are supe-rior to path lines. Yabushita et al. [12] proposed a techniquewhich summarizes pedestrians’ trajectories recorded at openspaces where definite routes are not constructed. This tech-nique effectively represents major routes of pedestrians; how-ever, it misses some types of important information includingtemporal tendency and walking speeds. Ko et al. [13] focusedon angles of trajectories of walking people, and extracted ir-regular movings such as a zigzag. They also tried to transformthe trajectories onto pictures of camera views, and visualizedthem in order for users to understand the angles of the trajecto-ries. Fukute et al. [14] applied a spectral clustering algorithmto pedestrians’ trajectories to classify them to meaningful setsof walking patterns, and visualized temporal transition of pop-ulations for each cluster by applying a piled polyline chart. An-drienko et al. [15] proposed “trajectory wall” as an extension ofspace-time cubes. Users can grasp trajectories in a same clus-ter which have similarities about routes. Guo et al. [16] classi-fied walkers’ trajectories according to their speed and direction.They also developed a system to visualize important trajectoriesusing meaningful colors based on HSV model. However, thetool does not support interactive trajectory selection for detail-

on-demand visualization.Following researches focused on observation of moving pat-

terns, though analysis of trajectories is not exactly a main task.Gupta et al. [17] worked on to visualize relationships amongsmall number of pedestrians. They did not visualize particularshapes of trajectories, but applied a gantt chart and visualizedplaces where people stayed. This representation is especiallyuseful for users because they can find places where multiplepersons stayed at the same time. Krueger et al. [18] presentedan improvement of another visualization system named Trav-elDiff. The system consumes messages in Twitter instead ofapplying datasets of trajectories, and visualizes movement pat-terns and crowded places. They tried to visualize three typesof datasets, for pedestrians, taxies, and airplanes, as graphs andheatmaps.

AI-Dohuki et al. [19] developed a visualization system fortrajectories of taxies. The system can visualize not only sta-tistical traffic information but also messages which answer toquestions input by users. For example, users can select particu-lar streets or time periods, and visualize related datasets.

Also, there have been many techniques on compression andpattern discovery techniques for people flow datasets. Teknomoet al. [20] analyzed moving patterns of customers at a super-market. They allocated letters to each intersection in a hy-permarket and expressed customers walking routes as strings.Also, they analyzed populations and length of walking times.Thack et al. [21] converted the spatiotemporal trajectories pre-serving the distances in the original space, and then divided thetrajectories. They succeeded to distinguish four types of tra-jectories collected under different conditions, by taking into ac-count both positions and movement patterns. Oates et al. [22]successfully extracted motifs of trajectories from noisy peopleflow datasets by applying a context-free grammar technique.These studies adopted SAX or TraSAX (an extension of SAX)during compressing and visualizing datasets, as we also adopt.However, a common issue in these studies is that they displayedall recorded trajectories as is. Their visualization results aretherefore so complex that it may be difficult to find importantroutes or places. Contrary to these methods, our techniquefirstly extracts typical movement patterns and then visualizesthem noticeably. Yada [23] analyzed movements of customersin a supermarket, and searched for popular sections. He con-verted original datasets to sequences of characters that indicatesections where customers moved to. However, important in-formation including times of staying at each section is not pre-served after converting.

3. Presented Visualization Tool

This section presents the processing flow and detailed de-scription of technical components of the presented technique.Figure 1 shows the whole processing flow of our proposed tech-nique. After the preprocessing including trajectory recording,conversion to strings and feature extraction, users can interactwith the visualization.

Section 3.1 defines the people flow datasets. Section 3.2 de-scribes a technique to convert the trajectories into sets of char-

2

Figure 1: Processing flow of the proposed technique.

acters, following by the discovery of typical movement featuresdescribed in Section 3.3. Lastly, we describe a visualizationtechnique which emphatically displays the features in Section3.4.

3.1. Recording People Flow DataWe define that a record of people flow data includes the fol-

lowing information:

• Time at which the position of a walker is measured.

• ID of the walker.

• Position of the walker in a 2D space (x, y).

We can construct a trajectory of this walker by collectingthe records which have the particular ID corresponding to thewalker, and then chronologically ordering the collected records.

Our current implementation uses a RGB-D camera Xtion torecord the people flow data applying a technique proposed in[24]. We assigned a particular ID to each walker, and measuredpositions of heads of pedestrians every dozens of milliseconds.Xtion can measure positions of pedestrians in a real three di-mensional space; however, we adopted only two dimensionalcoordinates on floors.

3.2. Converting People Flow Data to Sequences of CharactersIn the next process, we generate sequences of characters from

position values in the people flow datasets, in order to reducethe data sizes and make it easier to extract movement features.Figure 2 illustrates this process. We apply UniversalSAX [1],

Figure 2: Flow of converting position values.

an extended implementation of SAX (Symbolic Aggregate Ap-proximation) which converts time series data recorded as realvalues to sequences of characters. There have been severalother techniques on extended implementation of SAX to dealwith multidimensional real values; UniversalSAX has advan-tages against other techniques on preservation of numeric fea-tures of all dimensions and distances among data items.

The following briefly describes the processing flow of thesetup phase of UniversalSAX:

1. Divide multidimensional space to multiple regions, andgenerate a distances table among the regions.

2. Allocate a particular character to each of regions so thatwe can convert positions described as real values to char-acters.

Users can adjust the resolution of space division with the fol-lowing four parameters:

d : dimension of data

2q : number of partitions in each axis

z : number of characters

2b : threshold value to divide large regions

UniversalSAX assigns characters to region by applying aHilbert curve, a kind of space-filling curves. In this process,we firstly separate a d-dimensional space (d is 2 in this study)to lattices where each axis is divided to 2q segments. Then,we generate a Hilbert curve that passes every block once. Ablock is a smallest square which composes regions. Each blockis assigned a sequential number that indicates the order whichthe Hilbert curve passes. Unlike other space-filling curvessuch as Z-ordering, Hilbert curve always passes an adjoiningblock right after passed one block. Pairs of neighbor blocksare assigned closer numbers, while apart blocks are assignedentirely different ones. Therefore, the one dimensional blocknumbers represent original coordinates in a multi-dimensionalspace, preserving correlations of distances among data items.Next, we generate z regions by grouping these blocks, and as-signs characters alphabetically to each region in the order ofthe sequential numbers of the blocks. The number of blocksbelonging to one region depends on distribution of data items(i.e. walkers).

3

Figure 3: Applying run length encoding to a walking route.

Smaller number of blocks are assigned to regions that manydata items belong to. On the contrary, larger number of blocksare assigned to a region if few data items belong to. Thereare two reasons to adjust areas of regions according to den-sity of data items, not regularly separating the space in a gridpattern. One is to keep fine resolution to preserve more accu-rate spatial information in high density regions such as crowdedplaces. Crowded place is usually worth paying attention to un-derstand walking patterns of people. The other is to avoid lackof characters to assign as names of regions by too fine separa-tion at sparse regions. At the same time, this process saves atable of distances among all regions conveniently. We can re-fer the table while calculating distances among walking routes.This lookup-table-based implementation reduces the computa-tion time of this process.

After completing the setup process described above, we canconvert people flow datasets to sequences of characters whichform much smaller datasets. In this process, we firstly apply anAffin transformation to positions at each time step of the trajec-tories to complete the calibration. Then, we generate sequencesof characters from pedestrians’ walking routes with the regionsdivided by the setup process. The following set G is a group ofcharacters for names of regions.

G = {g1 g2 ... gz} (1)

gi (i = 1, 2, ... , z) are characters as names of regions, selectedfrom ‘A’-‘Z’, ‘a’-‘z’, and some symbols such as ‘\’, ‘[’, and ‘]’in this order. Then, the process chooses l letters as s j ( j =1, 2, ... , l) from gi allowing duplication in order to representa walking route. The selected s j are chronologically ordered,and compose one string S k (k = 1, 2, ... , p) which mean awalking route. p is the number of recorded walkers. Then wegenerate P as a collections of walking routes.

P = {S 1 S 2, ... S p} (2)

The sequence of characters usually construct much smallerdatasets comparing with the original people flow datasets. Tocompress the datasets further, we apply the Run-Length en-coding to the sequences of characters, as shown in Figure 3.From each S k, we generate p Run-Length codes defined as S ′k.Run-length encoding is a reversible compression method whichis especially effective for sequences which include repetitiousappearance of any letters; it corresponds to a situation that a

walker stays for a while in one region. Contribution of run-length encoding is not only the compression of the sequencesof characters: it also assists the discovery of places which walk-ers stay for a while, by searching for continuously appearingcharacters.

Above process is deeply related to complexity and scalabilityof our technique. We suppose that one character correspondsto a name of a region. The maximum number of regions areapproximately 60 which is a total of available alphabets andother characters in our implementation. Users have to apply re-gion division and generation of graphs to datasets in each Xtionwhich can record about 5 square meters, if they analyze widerareas at the same time.

3.3. Extracting Features of People Flow

We extract features in the sequences of characters to under-stand property of people flow datasets. In this study, we focusedon following features to find.

• Congestions.

• Changes of populations in different time periods.

• Representative walking routes.

One of the features is where people stop walking. We can dis-cover congestions and their average staying time, by searchingfor the gi which continuously appear in S k. Furthermore, wecalculate distances among the sequences of characters to sum-marize or search for particular walking routes.

Before calculating distances, we correct abnormally longstrings generated by the people flow conversion process de-scribed in Section 3.2 to calculate appropriate distances. (SeeFigure 4.) Usually, Run-Length codes generated from move-ments covering wide places get longer than other ones frommovements staying at limited regions. However, if a walkerstopped on a border of two regions ‘A’ and ‘B’, positionsrecorded at regular time intervals may be occasionally judgedas ‘A’ and converted to ‘B’ at other times, affected by noises orother factors. It happens that a generated Run-length code havealternately appearing ‘A’ and ‘B’ with small numbers whichmeans short staying times while converting such positions. Inother words, the code means round trips between two regions‘A’ and ‘B’, even though the walker actually stopped. Focus-ing on camera view in this study, whole view is about fivesquare meters, and approximate area of each region is fromtens of square centimeters to one square meter, so actually itdoes not often happen that walkers have such round trips manytimes.）To solve this problem, we arrange such Run-lengthcodes. Specifically, we convert repetitions of two letters andstaying times at each region, to one of the letters and stayingtime there. We select one letter which has longer staying timethan the other regardless of a degree of difference, and set newstaying times by accumulating staying times at each region. Inthis implementation, we correct parts which have repetitions ofpatterns “ABA” of way points (like “ABABA”). We do not ad-just only one appearing (like “ABA”), since it is likely a realturning.

4

Figure 4: Correction of abnormal strings.

Figure 5: An example of calculating original Levenshtein Distance.

After this process, we calculate distances among S ′k. Weadopt Levenshtein Distance (LD), one of popular definitionsof distances between sequences of characters. An advantageof applying LD is that we can easily extend LD fitting to data,because the algorithm of LD calculation is simple. We sup-pose to use the distances in searching for specified trajectoriesand clustering. Users can control conditions of these processesby replacing the formula of the distance. We can calculate LDbetween every (S ′1, S ′2) pair, even if S ′1 and S ′2 have differentlengths. If length of S ′1 is m1, and that of S ′2 is m2, time com-plexity to calculate LD between them is O(m1m2). This methoddetermines how many operations (insert, delete or replace onecharacter) are required to convert S ′1 to S ′2. The number of op-erations corresponds to the distance between sequences S ′1 andS ′2. We regard S ′1 as closer to S ′2, if the less number of operationis required to convert S ′1 to S ′2. Specifically, this method calcu-lates sums of conversion costs, while supposing costs of anyoperations as 1. We can determine the smallest cost to convertS ′1 to S ′2 applying dynamic programming. (See Figure 5.)

Here, original definition of LD causes a problem in our study.Degrees of a difference between two characters are ignored inthe original definition since all costs are uniformly defined as 1.For this reason, we cannot calculate degrees of differences be-tween any two letters, and express differences among walkingroutes accurately. Suppose three region ‘A’, ‘B’, and ‘C’, anda walking route“ABC”, and compare this walking route “ABC”to other walking routes “ADC” and “AZC”, by calculating dif-ferences using LD. If region ‘D’ is close to ‘B’, and region

Figure 6: Costs of each operation.

‘Z’ is far from ‘B’, we should set a distance between “ABC”and “AZC” larger than that of “ABC” and “ADC”. However, inpractice LD(“ABC”, “ADC”) and LD(“ABC”, “AZC”) get thesame value 1, and no differences between the distances are pro-duced.

To solve this problem, we apply a weighted LD (WLD) [25]which can reflect differences between arbitrary pairs of char-acters by flexibly adjusting costs of insert, delete and replaceoperations. Furthermore, we regard a pair of one name of re-gion and staying time there as one unit u j′ ( j′ = 1, 2, ... , l′)and calculate sum of cost of operating one unit. We defineS ′k = (u1, u2, ..., ul′ ), where t j′ denotes how many times g j′ con-tinuously appeared:　

u j′ =

g j′ (t j′ = 1)g j′ t j′ (otherwise)

(3)

We define costs of each operations as shown in Figure 6 usingthese definitions.

ud(u j′1 , u j′2 ) denotes distances between two units defined as:

ud(u j′1 , u j′2 ) = d(g j′1 , g j′2 ) + wtime

√|t j′1 − t j′2 | (4)

The first term denotes distances between two regions, and thesecond one means staying times at each region. We simply re-fer a lookup-table generated by the setup process described inSection 3.2, to identify a distance between two regions g j′1 andg j′2 as d(g j′1 , g j′2 ). c(g j′1 , g j′2 ), which is a cost to operate usingg j′1 and g j′2 , increases when the region g j′1 and g j′2 are far awaysince d(g j′1 , g j′2 ) gets higher value. wtime is weight of influenceof staying times. If wtime is zero, the system distinguish walk-ing routes according to only shapes or way points. We definesame formulas for inserting and deleting operations since theyare just opposite each other. We treat the formula as insertingoperation if the modified unit is at the top of S ′k. Conversely,we treat it as deleting operation if the unit is at the last. We canchange in values of costs by not only which type of operation isapplied but also to which letter we apply operations, by usingd(g j′1 , g j′2 ). If two S ′k include different letters, we can express notonly information that way points are not same, but also degreesof differences between g j′1 and g j′2 . At this moment, we add apositive real number wlength to costs of inserting and deleting, so

5

as to make these costs larger than cost of converting. When wecalculate LD between two S ′k that have different lengths, insertand delete tend to be required operations. Hence, adding wlength

causes that LD between S ′k that have different lengths get larger.Length of S ′k denotes how many regions does the walker pass,and get a different value whether the walker moved wide areaor not. We can classify these movements easily using u j′ . Itis possible to define an operation to swap the order of a pairof characters as one operation. We did not implement it as anindividual operation because a sequence and its reverse cannotbe treated as the same meaning; such sequences correspond toopposite directions of walking routes in our study.

In summary, LD calculation in our study takes into accountthe following features:

• Places where walkers passed.

• Lengths of the walking routes.

• Directions.

• Staying times.

We can classify trajectories or search for particular walkingroutes using results of these calculations, and show walking pat-terns such as typical walking routes to observers. Users can se-lect a method to display the patterns. For example, if they wantto count numbers of persons who passed specified routes, theycan input the route as search term and find corresponding trajec-tories. On the other hand, when using people flow data recordedat places which do not have designated walking routes, userscan hardly understand existing patterns of movements. Thenwe apply clustering to dataset of strings, and divide them tosome groups automatically.

We select K-medoids as a method of clustering. K-medoidsis a non-hierarchical clustering method which is suitable fordatasets containing different lengths of trajectories. The processis similar to the major non-hierarchical clustering algorithm K-means. But K-medoids applies not centroids but medoids se-lected from existing elements. That means we have to calculateonly distances among elements. Instead of describing walk-ing routes by vectors that have same length, we calculate dis-tance matrix among elements as mentioned above, and applyK-medoids clustering. Users have to carry out the calculationof the matrix before the clustering only once. Hence, we havenot developed user interfaces for the processes which we men-tioned in section 3.1, 3.2, and 3.3. This process takes the largestpart of the computation time in our technique, because the sys-tem compares all the possible pairs of trajectories.

3.4. Visualizing Walking RoutesFinally, we visualize people flow data as sequences of char-

acters emphasizing its features. Briefly, users can observe threetypes of visualizations as shown in Figure 1. Overview anddetailed views consume strings generated by the process intro-duced in Section 3.2. Clustering view needs a result of cluster-ing which is mentioned in Section 3.3. This process displaysthe connections of the nodes based on the sequences of charac-ters to represent the walking routes. Figure 7 shows an interface

Figure 7: An interface of the visualization tool.

for the visualization. Users can select contents to visualize us-ing a panel on the left side. At the same time, the system drawsa graph on the right side. We generate nodes at the centers ofthe regions divided by the setup process described in Section3.2. We propose two types of operations with this representa-tion. One is to represent abstract information of the data withthe overview, and the other is to select particular regions inter-actively and then display detailed information at the selectedregions. Users can find multiple regions that are worth observ-ing by selectively displaying the following:

• Regions in which many walkers densely stopped.

• Number of walkers moving across a particular pair of re-gions.

We visualize the congestion of walkers at each region bydrawing circles at the nodes. Their radii depict the numbers ofwalkers divided by area of each region, and their colors depictthe average staying times of the walkers at each region. Warmercolors are assigned to the circles when the average staying timesare longer. We also represent the populations of walkers mov-ing across the pairs of regions by widths of segments connect-ing the corresponding pairs or nodes.

Using the functions mentioned above, users can narrow downseveral regions that worth exploring finely. They can select par-ticular regions and observe detailed information related to theregions using the following:

• Animation of walking routes.

• Search function for walkers who passed a particular route.

Users can observe animations that show particular walkingroutes. Our implementation draws segments connecting pairsof regions increasingly in the temporal order. Users can selectsegments to be drawn by selecting a node and a radio button.Specifically, our implementation allows to interactively select aparticular region to draw segments representing walkers leav-ing or coming to. We adjust densities in segments reflectingthe number of the segments. Our implementation separately

6

Figure 8: Visualizations of small number of segments (left) and bundles ofsimilar segments(right).

draws each segment to directly compare them, as shown inFigure 8(left), if there are small number of segments to dis-play. Otherwise, our implementation bundles similar segmentsso that they look like a single thick segment, as shown in Figure8(right). This representation avoids too complex visualizationresults caused by huge number of scattered visible segments.Colors of the segments indicate the time when walkers passedthe selected region: earlier walking routes are drawn in warmercolors, while later routes are in cooler colors. Users can under-stand various features of the people flow, such as where walk-ers passed continuously, and which time the regions are mostcrowded, by observing the visualization results.

Our implementation also provides a user interface for searchoperations to specify the segments to be drawn. When auser enters a keyword associated to a particular walking route,our implementation calculates distances between the specifiedwalking routes and others. Keyword is a Run-Length codelike ”A2B4C”, or a connection of region names like ”ABC”.Then, only walking routes similar to the specified walking routeare displayed. The presented technique can assist the under-standing of the peculiarity of walkers’ actions sufficiently andquickly, by visualizing people flow data emphasizing the ten-dency of the walkers.

Our implementation also features visualization of clusteringresults. Users can choose all clusters or one of them to be dis-played. Our implementation specifies colors of clusters basedon the following rule using the HSB model as shown in Figure9.

Hue =300 v

(V − 1)(5)

S aturation = smin + wsattgv

pv(6)

Brightness = bmin + wbright (255 − bmin)pv

p(7)

In fomula (5), V is the total number of clusters, and v is thenumber of selected clusters to be visualized. Particular hues areassigned to each of the clusters, for example cluster 0 gets red,and cluster 5 gets blue.

Saturation is also specified for each region. smin is a naturalnumber to avoid S gets too low and therefore it gets difficultto visually distinguish hues of clusters.wsat arranges ranges of

Figure 9: Selecting colors of trajectories.

saturation. tgv is the average staying time of people who passedthe region g and belonged to the cluster v. Vivid colors areassigned to places that many people tend to stop.

Brightness denotes populations of each cluster in our imple-mentation. Here, we would like to discover typical moving pat-terns rather than exceptional ones in this study. Thus, we colorclusters that have many trajectories lighter in order to make theclusters conspicuous. If the clusters have larger brightness, it iseasier to visually recognize changes of colors while increasingor decreasing their saturation. Therefore, users can recognizedifferences of average staying times at each region. bmin is anatural number to avoid assimilation of trajectories and blackbackground. vsat sets amount of changes when pv changes.

We also draw yellow hexagons on departure points of tra-jectories, and white rhombuses on their arrival points to de-pict moving directions. Sizes of the figures reflect numbers ofpassed trajectories there.

4. Experiments

4.1. Exhibition use case

We visualized people flow data recorded in an exhibition.There were two doorways in the room of the exhibition. Onelied at the lower edge in the recorded picture, while the otherwas at the upper right corner. There were exhibits in the left andupper sides, as shown in Figure 10. The dataset contained 5531trajectories recorded for 8 hours (9:00 to 17:00). We dividedthe whole floor in the recorded view to 44 regions by Univer-salSAX, and assigned characters ‘A’-‘Z’, ‘\’, ‘]’, ‘ˆ’, ‘ ’, ‘”,‘a’-‘l’ to the regions. The size of original dataset was 669MB.It was compressed into 115KB including 62KB sequences ofcharacters and 53KB region information after applying Univer-salSAX. We firstly visualized congestions to briefly understandthe tendency of the people flow as shown in Figure 11, and thensearched for where many people passed.

There were many large circles painted in yellow or orange,from lower-left to upper-right regions, in the visualization re-sult. It depicted the regions that were on a walking route wheremany people passed smoothly. On the other hand, circles atupper-left regions got red, which suggested certain number ofwalkers stopped in front of exhibit there to look carefully. Es-pecially, circles at the upper-left corner were larger than others,

7

Figure 10: Camera view.

Figure 11: Visualization of congestions.

which depicted exhibits on the circles got attention of the par-ticipants.

Next, we visualized particular walking routes and comparedpopulations in each time periods, as shown in Figure 12. Thesystem generated a graph shown in Figure 12. We added an-notations regarding the room and colors of nodes. We coloredtrajectories based on HSV model, where higher hue values cor-respond to movements that occurred later. Each colors corre-spond to a particular hour; for example, segments drawn in reddepict walkers’ movement during 9:00 to 10:00. We could findvarious colors from red to purple near the exhibits, which il-lustrates the exhibit attracted people constantly. In particular,there were more dark blue segments corresponding to partici-pants visited during 14:00 to 15:00, which illustrates more par-ticipants visited the exhibits around midday compared to othertimes.

Then, we selected more specific walking routes as sequen-tial characters, and counted numbers of people who passed theroutes by the search function. Figure 13 shows eight types ofroutes that we searched for. We focused on two doors and theexhibition at the upper left corner, and specified these routesbased on their directions and way points. We manually dreweight arrows which depict each routes as input. There werelarger number of walkers (corresponding to routes 1, 3, and 5

Figure 12: Visualizing walking routes passing through the selected region.

Figure 13: Result of counting numbers of people.

in Figure 13) coming from door A, while fewer other walkers(corresponding to routes 2, 4, and 6 in Figure 13) came fromthe opposite direction. Totally, many people entered the roomfrom door A. As above, exhibits constantly had walkers’ atten-tion enough to make them stop walking. On the other hand,several walkers (corresponding to routes 5 and 6 in Figure 13)straightly moved from a door to the other, not passing in front ofthe exhibits. Moreover, small number of walkers (correspond-ing to routes 7 and 8 in Figure 13) turned buck after moving tothe upper left exhibition. These indicate that we ought to devisearrangement of the exhibition to attract more visitors.

As introduced above, we successfully visualized various fea-tures of the people flow. There might be several lacks of fineinformation comparing with the original data, such as exactpassing time and smooth geometry of the walking routes. Onthe other hand, we could quickly find the properties of peo-ple flow using only 62KB sequences of characters and 53KBregion information, compressed from 669MB of the originaldata. This section illustrated that the presented visualization

8

Figure 14: Camera view at a corridor and a result of visualizing passable routes.

technique succeeds to reduce the costs required to analyze peo-ple flow data.

4.2. Corridor use case

This subsection introduces results of another experiment us-ing datasets recorded at a corridor next to the exhibition room,as shown in Figure 14. We compressed the dataset including302 trajectories from 5MB to 8KB (99.84% Compression rate).We intentionally used this small dataset, because we repeatedlyapplied clustering while changing parameter values for distancecalculation. Remark that users do not have to repeat cluster-ing in practical usage. We applied clustering to visualize thedatasets, changing in numbers of clusters, weights of featureswhile calculating distances. We selected numbers of clusters as5, 10, and 15. We found 5 was not an appropriate number torepresent every known movement patterns. On the other hand,unnoticeable trajectories formed a single cluster when the num-ber is 10 or more. We changed in the weights as follows.

• wlength = 0.1 or 1.0

• wtimes = 0.0 or 5.0

Major differences among clusters are not observed while chang-ing in weights. Comparing constructions where wtimes is 0.0 or5.0, we can find correspondence among clusters and how tra-jectories belonged to different clusters.

In this experiment, we verified efficiency of correction forabnormally long strings that we described in section 3.3. Thesystem generated unnatural clusters from strings when we didnot apply the correction process. The clusters included partic-ular trajectories that are similarly very long but contain variousletters. On the other hand, visualization results seemed to bemore natural when we applied the corrected dataset, becausesuch unnatural clusters disappeared. Figure 15 shows the dis-tribution of trajectories while staying time information is nottaken into account. Figure 16 shows the result when stayingtime information is taken into account.

Figure 17 is an example of visualizing results of clustering.Trajectories in these clusters did not disperse to many types ofclusters. Clusters in the left column are results where wtimes =

0.0, and the right ones are generated where wtimes = 5.0. Whilevisualizing each cluster, we can find that trajectories in onecluster have a couple of representative departure and arrivalpoints. On the other hand, when wtimes = 5.0, many trajecto-ries belong to clusters in the right column. Arrows depict how

Figure 15: Generated clusters excluding staying time information.

Figure 16: Generated clusters including staying time information.

many trajectories belong to clusters in the left column movedto clusters in the right column. Observing the results of visual-izing cluster 1 and 9 in the right column, shapes of clusters aresimilar, though they have different saturations. Whole satura-tions of cluster 1 is higher than those of cluster 9. That meanscluster 1 includes trajectories of walkers who stopped or walkedslowly in the way, and trajectories belonged to cluster 9 rep-resent moving people. However, while increasing weights oftimes, representative departure points are scattered and classi-fying based on way points are ambiguous. Specifically, trajec-tories started from the upper border and went to the lower oneand ones from door of the left room are mixed. Users shouldchange in weights based on whether they want to focus on waypoints or times to move.

Figure 18 is a part of a clustering result where the numberof clusters is 15. We focused on cluster 14 generated wherewtimes = 0.0, and traversed trajectories in the cluster. Whilesetting wtimes = 5.0, almost 80 percent of trajectories went tofive clusters in Figure 18, and approximately half of trajectoriesmoved into cluster 1 or cluster 14. Cluster 1 includes trajecto-ries that passed to the left side or the corridor without stoppingfor a long time. On the other hand, cluster 14 includes trajec-tories at the right side and high saturations which depict longerstaying time. Trajectories that passed the left side denote peo-ple who came from the room and turned right soon to move.Meanwhile, people at the right side of the corridor are not al-ways moving, but stopped near the wall so as not to distributewalkers. Thus, we can separate these different types of move-ment not from way points but staying times.

9

Figure 17: Comparing affiliations of trajectories in particular clusters.

Figure 18: Dividing trajectories which passed almost same places by featuresof staying times.

5. Discussion

The presented technique can deal with huge people flowdatasets and simply visualize movement patterns discoveredfrom the datasets, as solutions of the two problems which wementioned in section 1. Compression by UniversalSAX andRun-Length encoding can archive close to 100% of compres-sion rates. The rate is high enough to increase capacity forlarge datasets and process the data in a short time. Our sys-tem can reduce the sizes of processing data while preservinginformation of way points and staying times. These propertieslead the results of the visualizations. This paper introduced that

the presented analysis and visualization techniques generatedmeaningful results from the compressed data. We found vari-ous information; such as crowded places and popular walkingroutes. Finding such information is useful in a variety of loca-tions including markets and stations, as well as exhibitions. Es-pecially, the system distinguished way points and staying timesby the formulas defined in section 3.3. By arranging the valuesof weights in the calculation of distances, we found differenttypes of patterns. For example, we found particular walkingpatterns including staying next to the wall in the corridor.

On the other hand, we found remaining issues in our tech-nique. First, the bundling is not exactly effective in this ex-periment. The system visualized trajectories along edges of agraph. However, we cannot easily and accurately understand di-rections of the walkers. Also, it is also difficult to correctly un-derstand complex patterns, such as patterns generated by peo-ple who repeatedly passed the same place. Second, the ruleof colors for the visualization of clustering results may cause acouple of faults. When the values get low, users cannot alwaysdistinguish the trajectories, since the trajectories get dark. Inthe current implementation, we defined representative walkingpatterns and crowded places as primary targets which we wantto find. They correspond to highlighted trajectories which havehigh values of brightness or saturation. Therefore, we couldfind such trajectories or clusters quickly from the results. Onthe other hand, even though dark trajectories are not the primarytargets of the visualization, they also depict useful informationsuch as irregular movings. So we should develop better rules ofcolors to catch such trajectories quickly.

6. Conclusions

In this paper, we proposed a technique to visualize featuresof people flow data by converting real values of walking routesinto sequences of characters. This technique allows users find-ing nature of people flow and important factors quickly. Es-pecially, we focused on two kinds of features, way points andstaying times, then showed they are effective to classify trajec-tories.

Future issues of this study are the following.

• Improvement of formulas and conditions of clustering.

• More functionality for searching for walking routes.

• Manual specification of region division in the setup pro-cess.

• Inferring the semantics of walking actions.

In these experiments, we demonstrated shapes of trajecto-ries and staying times are important to distinguish movementsof people. On the other hand, we should discuss further goalsand methodologies, such as how to automatically specify ap-propriate number of clusters for proper classification, or how todetermine balance of weights for calculating distances amongsequences of characters.

10

We would like to develop additional user-specified condi-tions to the search user interface. We would like to ad-just the costs for distance calculation according to the user-specified conditions, so that we can search for various walk-ing routes from various viewpoints. It is also useful to developa sketch user interface to query the walking routes with arbi-trarily shaped trajectories, instead of specifying sequences ofcharacters.

In current processing of region division, we cannot specifypositions and shapes of borders between the regions. We wouldlike to develop a user interface to specify the attributes, posi-tions, and shapes of the regions for the cases that we know theobjects in the scene which should divide the regions.

In addition to the above development, we would like to chal-lenge how we can visualize the semantics of walking actionsfrom the results of summarization and grouping of walkingroutes. Adopting a method of analyzing actions of people, wewould like to visualize its result with trajectories.

References

[1] A. Onishi, C. Watanabe: Universal SAX: Applied SAX to the Multidi-mensional Time Series Data Using the Space Filling Curve, DBSJ Jour-nal, 11 (1), 43-48, 2012.

[2] M. Hocker, V. Berkhahn, A. Kneidl, A. Borrmann and, W. Klein: Graph-based approaches for simulating pedestrian dynamics in building mod-els, eWork and eBusiness in Architecture, Engineering and Construction,389-394, 2010.

[3] E. K. Cetinkaya, M. J. F. Alenazi, Y. Cheng, A. M. Peck, J. P. G. Sterbenz:A comparative analysis of geometric graph models for modelling back-bone networks, Optical Switching and Networking, 14, 95-106, 2014.

[4] G. Andrienko, N. Andrienko, S. Wrobel: Visual Analytics Tools for Anal-ysis of Movement Data, ACM SIGKDD Explorations Newsletter, 9 (2),38-46, 2007.

[5] G. Andrienko, N. Andrienko, S. Rinzivillo, M. Nanni, D. Pedreschi, F.Giannotti: Interactive visual clustering of large collections of trajectories,2009 IEEE Symposium on Visual Analytics Science and Technology, 3-10, 2009.

[6] H. Guo, Z. Wang, B. Yu, H. Zhao, X. Yuan: TripVista: Triple PerspectiveVisual Trajectory Analytics and its application on microscopic traffic dataat a road intersection, IEEE Pacific Visualization Symposium, 163-170,2011.

[7] Z. Wang, H. Guo, X. Yuan, H. Liu, H. Zhang: Discovery Exhibition:Visual Analysis on Traffic Trajectory Data, Poster Proceedings of IEEEVisualization Discovery Exhibition, 2011.

[8] R. Krueger, D. Thom, M. Woerner, H. Bosch, and T. Ertl: Trajecto-ryLenses - A Set-based Filtering and Exploration Technique for Long-term Trajectory Data, Computer Graphics Forum (proceedings of TheEurographics Conference on Visualization), 32 (3), 451-460, 2013.

[9] Z. Wang, T. Ye, M. Lu, X. Yuan, H. Qu, J. Yuan, Q. Wu: Visual Explo-ration of Sparse Traffic Trajectory Data, IEEE Transactions on Visualiza-tion and Computer Graphics, 20 (12), 1813-1822, 2014.

[10] M. Lu, Z. Wang, X. Yuan: TrajRank: Exploring travel behaviour on aroute by trajectory ranking, IEEE Pacific Visualization Symposium, 311-318, 2015.

[11] R. Mehran, B. E. Moore, M. Shah: A Streakline Representation of Flowin Crowded Scenes, 11th European conference on computer vision con-ference on Computer vision, 439-452, 2010.

[12] H. Yabushita, T. Itoh: Summarization and Visualization of PedestrianTracking Data, 15th International Conference on Information Visualisa-tion (IV2011), 537-542, 2011.

[13] J. G. Ko, J. H. Yoo: Rectified Trajectory Analysis Based Abnormal Loi-tering Detection for Video Surveillance, 1st International Conference onArtificial Intelligence, Modelling and Simulation, 289-293, 2013.

[14] A. Fukute, M. Onishi, T. Itoh: A Linked Visualization of Trajectory andFlow Quantity to Support Analysis of People Flow, 17th InternationalConference on Information Visualisation (IV2013), 561-567, 2013.

[15] G. Andrienko, N. Andrienko, H. Schumann, C. Tominski: Visualizationof Trajectory Attributes in Space-Time Cube and Trajectory Wall, Car-tography from Pole to Pole - Lecture Notes in Geoinformation and Car-tography, 157-163, 2014.

[16] Y. Guo, Q. Xu, X. Li, X. Luo, M. Sbert: A New Scheme for TrajectoryVisualization, 18th International Conference on Information Visualisation(IV2014), 40-45, 2014.

[17] S. Gupta, M. Dumas, M. J. McGuffin, T. Kapler: MovementSlicer: BetterGantt charts for visualizing behaviors and meetings in movement data,IEEE Pacific Visualization Symposium, 168-175, 2016.

[18] R. Krueger, G. Sun, F. Beck, R. Liang, T. Ertl: TravelDiff: Visual compar-ison analytics for massive movement patterns derived from Twitter, IEEEPacific Visualization Symposium, 176-183, 2016.

[19] S. Al-Dohuki, F. Kamw, Y. Zhao, C. Ma, Y. Wu, J. Yang, X. Ye, F. Wang,X. Li, W. Chen: SemanticTraj: A New Approach to Interacting with Mas-sive Taxi Trajectories, IEEE Transactions On Visualization and ComputerGraphics, 23 (1), 11-20, 2017.

[20] K. Teknomo, G. P. Gerilla: Pedestrian Static Trajectory Analysis of aHypermarket, Proceedings of the Eastern Asia Society for TransportationStudies, 7, 2009.

[21] N. H. Thach, E. Suzuki: A Symbolic Representation for Trajectory Data,The Japanese Society Artificial Intelligence, 1A2-2, 2010.

[22] T. Oates, A. P. Boedihardjo, J. Lin, C. Chen, S. Frankenstein, S. Gandhi:Motif Discovory in Spatial Trajectories using Grammar Inference, ACM

11

International Conference on Information and Knowledge Management(CIKM 2013), 1465-1468, 2013.

[23] K. Yada: String analysis technique for shopping path in a supermarket,Journal of Intelligent Information Systems, 36 (3), 385-402. 2011.

[24] M. Onishi, I. Yoda: Dynamic Trajectory Extraction from Stereo VisionUsing Fuzzy Clustering, The transactions of the Institute of Electrical En-gineers of Japan. C, A publication of Electronics, Information and SystemSociety, 128 (9), 1438-1446, 2008.

[25] K. Yarimizu: “Weighted Levenshtein Distance” for Calculating the Dis-tance between Words, Meikai Japanese language journal (18), 179-194,2013.

12

Classiﬁcation and Visualization for Symbolic People Flow...

Documents

Transcript of Classiﬁcation and Visualization for Symbolic People Flow...