Enhanching in Vehicle Digital Maps via GPS...

1

Enhanching in Vehicle Digital Maps via GPSCrowdsourcing

AbstractmdashIn this paper we propose a simple and effectivemethod to extend digital maps with the location and timing ofstop-signs and traffic lights in a city given GPS traces collectedby on-road vehicles Our system finds the location and timingof traffic lights and stop-signs using a small number of tracesper road segment We developed and evaluated the system byapplying it to two sets of GPS traces mdashopen street map a multi-city publicly available database of GPS traces and a set of tracescollected in west Los Angeles Evaluation results show that oursystem can estimate the locationtype of stop-signs with as littleas 5 traversal per road segment and the locationtiming of trafficlights with as little as 7 traversals with more than 90 accuracy

I INTRODUCTION

Digital Maps are part of our daily activities mdashthey conveyvital information for a plethora of location-enhanced appli-cations such as Yelp Flicker Facebook Google Maps orsimply location-aware searches PDA and smartphone usersare progressively more dependent on GPS and location basedservices (LBS) for both in vehicle and pedestrian naviga-tion [1] Map services are also evolving in nature from uni-directional to bi-directional mdashthe street topology is completedwith dynamic information including traffic and user-createdcontents Current market trends show an increasing need ofup-to date geo-referenced information as a key componentin the design of mobile systems such as networked vehiclesparticipatory sensing mobile social networks or personal fit-ness [2][3] The Android based Google Maps-Navigation forinstance is one of the first pro-sumer based navigation servicesfor smartphones Participating users upload GPS probes ontogoogle servers User-provided content is used in the trafficestimation together with more traditional data sources suchas roadway sensors cellular video etc [4][5] The currentbusiness model based on few companies that pay up-frontfor the hefty costs to build the world geographical databaseis no longer able to sustain the increasing demand for newdata attributes such as realtime traffic points of interest userreviews etc Navteq and Teleatlas the two major map providersfor both in-dash and on line navigation services are forced torely gradually more on user provided data [6][7] Similarlyon-line and in-dash navigation providers such as Google andTomTom are using customer provided data to increase thedata accuracy and definition while cutting data-licensing costsand reducing the dependency from cartography providers [8]Finally consumers gained an essential role in the processof building open-source digital maps as demonstrated by theOpen Street Map project (OSM) an on line-site designedto build a free editable map of the whole planet derivedfrom GPS traces uploaded by users worldwide [9] Updatabledigital maps are poised to become an enormous enabler of

highly dynamic and interactive mobile systems that piggybackon few user-provided information to convey a plethora ofadvanced applications to a large base of consumers [10]Future navigation systems for example may employ dynamicextended attributes on stops signs traffic lights and trafficstatus to perform a multi-commodity optimization on carbonfootprint commuting time and traffic flow at the same time

In this paper we show how GPS data collected from personaland in-vehicle devices can be used to extend digital mapswith the location of stop signs traffic lights and relativetiming mdashthese information are today rare and local to eachmunicipality We present an algorithm that can extract usefultopology information from such collective GPS traces andshow that even a tiny fraction of vehicles so equipped canenable the extension of digital maps with new attributes GPSdevices introduce inaccuracies and discontinuities in the datamaking the identification of map features more difficult Forinstance the sampling may suffer from sudden discontinuitieschanges in sample frequency and location errors thus reportingas moving a vehicle lined-up at the traffic light We copewith such artifacts by performing a timelocation correlationanalysis on the GPS traces to identify unique patterns on eachstreet-segment and intersection that indicate the likely locationand timing of stop-signs and traffic lights Furthermore weshow that the proposed algorithms yet work with missing datacorrectly identifying the location of traffic lights and stop signsusing only the data relative to a subset of the roads formingthe intersection For evaluation we applied our method totwo different trace collections the Open Street Maps publicdatabase that contains more than 300000 GPS traces worldwide and a set of traces we collected in the greater LosAngeles area

II SYSTEM DESCRIPTION

The core of our system can be summarized as the sequenceof the following basic tasksMap Information Extraction and processing the raw filescontaining the digital maps are parsed and the information ishomogenized and loaded into efficient data structuresTrace Selection the GPS traces are preprocessed discardingthe ones that present too many discontinuities and changes inthe sampling frequencyTrace Mapping the GPS traces are mapped onto the basicgeographic maps through an advanced reverse geocodingprocess applied on a dynamic time-based window instead ofa single GPS pointStop-sign and traffic light information extraction a time-space pattern analysis on the resulting dataset extrapolates theposition of stop signs and traffic lightsIn the following we describe in detail each part of our system

2

A Map Preprocessing

The best representation of a digital map is a directed graphthat consists of a set of nodes (ie the intersections) andoriented edges (ie the road segments) This representationis indeed one of the most used by most map databases built torepresent the physical road layout for navigation and displaypurposes However the mere physical representation of thetopology is not always suitable for our feature extractionalgorithms We need a logical representation of the roadlayout in particular we need to redefine the intersections androads as atomic entities that can be directly used in the featureextraction For example intersections of separated roads1represented by two separated edges would be stored in adigital map database such as the census TIGER datbase [11]as two different nodes However urban planners and trafficengineers consider these intersections as a single intersectionand it are indeed regulated by a single traffic regulator (stopsign or traffic light) We store the digital maps in dictionariesusing a logical representation in which the road system isseen as a set of intersections and a set of oriented waysthat interconnects them intersections and ways are abstractedas atomic and may include several physical nodes and edgesrespectively While this representation can be obtained startingfrom any map database for the sake of this study we consid-ered two very well known databases TIGER Census [11] andOpenStreetMap [9]

B Trace SelectionOpenStreetMap provides a large amount of user generated

GPS traces However the wide variety of consumer GPSdevices results in a large number of traces that do not meetthe minimum requirements for our algorithm to extract usefulinformation from them The most impacting feature is thesampling rate Many of the traces that we analyzed areaffected by a highly variable sampling interval A variablesampling interval is most likely due to a poor GPS signal asmany commercial GPS devices do not output any positioninformation when it is not possible to obtain a fixing forthe current point For each trace in the dataset we computethe most common sampling interval (SI) we then define asampling band as shown in equation 1

12

SI lt SI lt32

SI (1)

We discard the traces that do not have at least 80 of thesamples contained within the sampling-band In essence weonly consider traces that have a fairly constant samplingrate Furthermore in order to extract user-behavior such asslow-down and standstill vehicles we need a relatively highfrequency sampling Longer sampling periods result in aninaccurate estimation of the speed of the vehicle Thereforewe discard all traces that have SI gt 4 Finally we also discardtraces that are not recorded by vehicles moving in theirnormal environment like traces recorded by pedestrians easilyrecognizable analyzing the maximum speed

1For separated road is intended a road where the lanes in one direction arephysically separated usually with a concrete island from the lanes on theopposite direction

C Trace MappingOnce the traces dataset is cleaned from unstable and unreli-

able traces as described above it is possible to map each GPStrace onto the underlying road system Our approach involvesan efficient reverse geocoding algorithm for the mapping ofeach single point of the trace and an advanced correctionalgorithm that takes into account the actual space-temporalrelationship among subsequent points of the same trace

The reverse geocoding algorithm maps each GPS point tothe closest road segment This process performs a lookup overthe entire street database to find the road segment that is closestto the current GPS coordinate We optimized the lookup pre-ordering all the streets and intersections by longitudelatitudeThere are several drawbacks in evaluating each point in a GPStrace as independent entity In particular

bull GPS traces are affected by bursty positioning errorsIn fact as the car moves the surrounding environment(buildings vegetation etc) might shade some of thesatellites causing a local positioning error [12] Theselocal errors might result in a wrong reverse geocodinglocation

bull In digital maps roads are represented as segments thuswith no information about their width Therefore inproximity of an intersection the position of a vehicletraveling on a straight road may be geocoded into thecrossing road [13][14] In fact the car would be movingaside of the segment representing the road is traveling onAt intersections the segment representing the crossingroad will be geometrically closer than the actual traveledroad as shown in figure 1

Fig 1 Common reverse geocoding failure approaching an intersection

For the above reasons it is crucial to exploit the temporal-space relationship between GPS samples belonging to thesame trace We implemented a reverse geocoding correctionalgorithm that considers both spatial and temporal informationOur algorithm scans each trace two times first forwardsand then backwards In both scans we take advantage ofthe previous reverse-geocoding results thus the first scanpropagates forward information about the past and the secondscan propagates backward information about the future Forthe sake of simplicity we here detail the forward scan asthe backward scan just operates inverting the indexes Thescanning algorithm takes different actions if the vehicle ismoving or if it is standstill we will discuss how to distinguishbetween these two cases in the next section

3

For each GPS sample our mapping procedure considersa set of candidate roads ranked by distance from the pointcoordinates If the vehicle is standstill the algorithm mapsthe current position on the same road of the previous positionif such road is in the current candidate road-set Instead if thevehicle is moving the algorithm will try to map the vehicleamong the candidate road-set on a street oriented as thevehicle movement giving highest priority to the road selectedby the previous match and decreasing priority accordingto the orientation to the streets connected to the previousmatch If none of these is a possible ie the candidate road-set does not contain either the previous match nor a roadconnected to it the algorithm returns null and enters theseek for temporary match mode In this mode the algorithmcompares the closest road to the road most closely oriented asthe vehicle movement If they are the same (ie same road)and remain the same for 3 consecutive samples the last matchbecomes effective and the seek for temporary match mode isover The remaining null mapping points will then be fixed bythe backward scan if possible

After the two scans we further process the trace to identifytemporal and path holes Indeed the traces might still havesome missing samples or errors In such cases if the twosides of the temporal hole are matched on roads that are notconnected to each other (indicating a path hole) we split thetrace into two new ones Finally we proceed removing allpoints that could not be matched on the road system andthe points that are matched onto a freeway In this study weare seeking stop signs and traffic lights that are not presenton freeways In particular if a trace contains freeway andnon freeway components (ie a trace moving on surfaceroad going on a freeway and then back on surface roads)we identify each component and split the trace accordinglyremoving the freeway points

D Trace Processing

The ultimate goal of this work is extracting the locationand timing of traffic regulators such as stop signs and trafficlights in a urban road system In order to achieve this goalwe need to initially classify the drivers behavior to build a setof abstractions that can be used in the identification of roadregulators In particular we initially focus on the followingbasic behaviors

bull Slow downs the vehicle speed decreasesbull Standstillsbull Turning choices at each intersection

Slow Downs For the purpose of this study we only con-sider slow downs that happen in proximity of an intersectionIn fact a vehicle can slow down for any number of reasonsbut it is only at intersections that is forced to slow down dueto road or traffic signals We estimate the speed of the vehicleat each trace point as the derivative of the movement betweenthe point itself and the next one We consider a vehicle asslowing down if its speed falls under 5ms If this slow downhappens within 50 meters of an intersection (that the vehiclewill traverse) we save it as related to the way the vehicle istraveling on and the upcoming intersection

Standstills Standstill situations recognition might seema very simple operation however the GPS error turns it in avery hard task As shown in figure 2 a vehicle that is standingstill will record a trace that is in continuous movement Forthis reason a simple evaluation of the speed is not sufficientas the speed will always be greater than zero Through aset of experiments we measured that a fixed GPS devicereturns geographical coordinates that are within 10 metersof its actual position We designed a heuristic to determinestandstill situation that takes into account GPS artifacts Thealgorithm is divided into two steps The first step recognizesall ldquocandidatesrdquo for standstill selecting the points of the tracecharacterized by a speed lower than 4ms In the second stepwe analyze these selected dataset using a time window of 10seconds If possible the time window is centered on the pointwe are analyzing This is not possible at the beginning and atthe end of the sequence of ldquocandidaterdquo points in which casewe use only the future or the past points respectively If thedistance between the two extremes of the time windows issmaller than 20 meters we consider the point as a standstill

Fig 2 Random movement recorded by the GPS for a standstill vehicle

Turning Choices The traces at this point are mappedon to the road network However there might still be someerrors in the sequence of ways that the trace is mapped onTo correct these errors we scan the trace considering eachway and intersection they are going through Considering setsof two intersections at a time we are able to remove wrongmappings In fact a vehicle can move from one intersectionto another using only the way that is connecting them Thenif any other mapping is present in between the traversal oftwo intersections we correct it and map those points onto theway that is connecting the two intersections By performingthis operation we can also extract statistics about the turningchoices at intersections (left right turn or straight)

E Feature Extraction

Slow-downs and a standstills defined in section II-D pavethe road for a further level of abstraction slow-downs andstandstill events are indeed the foundation on which we canbuild the traffic regulators identification algorithm The trafficregulators that we focus on are traffic lights and stop signsWe designed a set of heuristics derived from on-the-roadobservations and traffic regulations These heuristics take intoconsideration the fact that the GPS traces may contain artifacts

4

are affected by inaccuracy and have a finite sampling rateWe separated the two procedures for the extrapolation of stopsigns and traffic lights In particular for each intersection wefirst look for a stop sign regulator and only if the probabilityof being a stop sign is below threshold we check if it isregulated by a traffic light Due to missing or incomplete databoth heuristics might not be able to provide a clear result insuch cases we provide our ldquobest guessrdquo through heuristics thatconsider some basic traffic rules In the following we describeeach procedure in detail

Stop Signs Field observations and the traffic law sug-gest that all cars approaching an intersection with a stop signwill slow down stop for a few seconds and then start movingagain This assumption though is not confirmed by the GPStraces dataset A first explanation is the actual behavior ofdrivers most drivers do not exactly stop but they just slowdown in proximity of the intersection Secondly as discussedin the previous section even when the vehicle is actuallystandstill the GPS output may be misleading Finally thetrace sample rate implies a lower bound to the length of theperiod that we can recognize as a standstill For instancea sampling period of 1 second (that is actually the best wecould find in the dataset) only allows to detect stopping timesof at least 2 seconds For these reasons we use the slowdowns informations as defined in section II-D as indicationthat a vehicle is approaching an intersection with a stop Foreach intersection we analyze the GPS traces on its incomingways We mark a way as regulated by a potential stop if atleast SST of the traces slows down We define SST as theStop Sign Threshold and we found its optimal value to be80 (see section III-B) The obtained per-way informationis aggregated to include all the ways connected to the sameintersection In a crossroads regulated by stop signs at most2 ways could have the right of way while all the others haveto yield regardless of how many they are Therefore if all orall but one ways belonging to a given intersection are markedas potential stops the intersection is identified as stop-signregulated and all the ways marked with a potential stopsbecome actual stop signs If all but two ways are markedas potential stops the algorithms is not able to make animmediate decision and refers the intersection to the trafficlight recognition algorithm On the road observations showthat at the intersections between main roads and secondaryroads the traffic lights are usually enhanced with inductionloops that only switch the light to green on demand for thesecondary road As a result most vehicles approaching thiskind of intersection from the secondary road will have to stopTo make sure we are not confusing this kind of intersectionwith a 2-way stop case the decision is deferred

Traffic Lights The behavior of vehicles approaching atraffic light depends on if the light is actually red or greenAnalyzing the traces approaching a traffic light as expectedwe could observe that only a fraction of vehicles stops Thisfraction depends on the ratio between the green and the redlight duration an information very hard to have a priori Wemark a way as a potential traffic light if at least T LT of the

vehicles approaching the intersection comes to a stop2 Wedefine T LT as the Traffic Light Threshold and we found itsoptimal value to be 15 (see section III-B) If an intersectionis regulated by a traffic light all its incoming ways shouldcomply with this requirement However due to the limitedamount of traces that we have per each intersection and to therandomness of the driver behavior we consider an intersectionregulated by a traffic light if half plus one of the incomingways are marked as potential traffic light

A special case is a potential two-way stop intersection de-tailed in the previous paragraph The intersection is marked asregulated by a traffic light only if all the ways not classified bythe stop-sign identification mechanism satisfy the requirementsto be potential traffic light In all other cases we mark it as atwo-way stop intersection

Once the intersection is marked as a traffic light we thenproceed estimating the red light duration on each way For thispurpose we select the 95th percentile of the standstill timesas the duration of the red light thus discarding peculiar casessuch as left turns parkings etc as also suggested in [15]

Dealing with Missing Information If an intersectioncould not be classified either as regulated by a stop signor regulated by a traffic light it might be because someinformation is missing If actually one or two ways cominginto the intersection do not have traces traveling on them wereprocess the intersection Both the heuristics described aboveare executed one more time but this time the ways with missingvalues are not considered ie we evaluate the intersection asif the ways with missing traces would not exist The results ofthis further processing are marked as best guesses as we donot have sufficient information to provide a certain estimationHowever as shown in the evaluation section these guesses stillperform fairly well

III EVALUATION

A Characteristics of The Dataset

To assess the performance of our system we tested it ontwo sets of traces traces extracted from OpenStreetMap and aset of traces that we collected driving vehicles equipped withGPS for several days in the greater Los Angeles Area

OpenStreetMap OpenStreetMap is a free editable mapdatabase of the whole world It is built by the contributionof users from all over the world Conceptually users uploadtheir GPS traces and as an option they can edit the roads theyhave traveled on Afterwards system administrators check theresult of the editing and validate it This approach guaranteesan always up-to-date map at virtually no cost In addition allthe GPS traces that are uploaded are available to other users fordownload At the time of writing there were 307000 traces onthe database but this number keeps growing as more and moreusers join the community In Table I are reported the number oftraces that traverse some of the regions where OpenStreetMapusers are most active in USA and Germany Indeed Germanyis the most active country and its digital map is for the mostpart created by users In USA instead the map is a derivationof the TIGER database with updates performed by the users

2For Coming to a Stop we refer to the standstill as defined in section II-D

5

Region Number of TracesOberbayern Germany 37795Los Angeles County California 2584Marion County Indiana 1133Orange County California 876Santa Clara County California 848Travis County Texas 845

TABLE IDISTRIBUTION OF OPENSTREETMAP TRACES OVER THE MOST ACTIVE

REGIONS OF USA AND GERMANY

065

07

075

08

085

09

095

1

0 10 20 30 40 50

Por

tion

of In

ters

ectio

n

Number of traces

Los Angeles County CAMarion County IN

Oberbayern GermanyOrange County CA

Fig 3 Cumulative distribution of intersections as a function of the numberof traces traversing them

Although the number of traces is fairly large their geo-graphical coverage is far from global In figure 3 we show thecumulative distribution of intersections as a function of thenumber of traces that traverses them We can observe that 95of the intersections are traversed by less than 10 traces Thesenumbers get worse if we consider how the traces traverse eachintersection In figure 4 we show the number of intersectionsthat are traversed by a minimum number of traces on all (N)incoming ways all but one (Nminus1) and all but two (Nminus2) ofthe incoming ways for the region of Oberbayern Germany Wecan observe that the number of intersections that have at least5 traces on all ways is close to 0 Furthermore the data showthat most of the intersections are largely traversed through thesame way that is usually a primary road To partially copewith this our algorithms provide a best guess performing ananalysis with missing data

0 2000 4000 6000 8000

10000 12000 14000 16000 18000 20000

0 2 4 6 8 10 12 14

Por

tion

of In

ters

ectio

n

Number of traces

N WaysN - 1 WaysN - 2 Ways

Fig 4 Distribution of intersections as a function of the number of tracestraversing them on varying number of ways for the Oberbayern region

The CALI Dataset As discussed in the previous para-graph using only the OpenStreetMap traces there are veryfew intersections that are covered by a sufficient amount oftraces to be considered Moreover on these few intersectionsthe traces are unevenly distributed as most traces traverse

the main roads and just a few traverse secondary roads Forevaluation we had to know the ground truth of GPS traceson traffic regulators Thus we decided to collect a new morecomplete dataset of traces to find what is the lowest amountof information that our system needs to provide confidentresults We collected GPS traces in two different areas ofcalifornia The first area3 consists of a 800x800 meters squarethat includes 28 intersections of which 3 regulated by a trafficlight and 25 by stop signs mdashwe label this dataset as CALI-IThe second area4 consists just of 4 subsequent intersections allregulated by traffic lights mdash this dataset is labeled as CALI-II We designed our mobility pattern in such a way that thetraces would be evenly distributed over all the intersectionsin the two areas In order to minimize potential interferencein the data acquisition we asked some of our students todrive the instrumented vehicles in three different week dayson October 26-29 2009 and November 26 2009 To capturean extensive time window the vehicles have been driven from7am to 1230pm and from 2pm to 6pm in all the experiments

B Experimental Results

In this section we present the results of the experimentalevaluation of the proposed system We first present an anal-ysis on the sensitivity of the proposed system to the setupparameters We then evaluate the robustness of the systemagainst missing data Finally we test the overall performanceof the system on the OpenStreetMap database for both trafficregulator recognition and estimation of the red light durationWe present the performance of our system only for the greaterLos Angeles Area Nevertheless we tested our algorithms onother geographical areas but we were not able to verify ourresults and therefore to present them

Sensitivity to Setup Parameters The proposed systeminvolves the use of a set of parameters that determine theoverall performance In this section we present an analysis ofthe impact of these parameters on the performance provingthat the values chosen for these parameters are actually theoptimal ones In the following we focus on the two mostimpacting parameters

bull Stop Sign Threshold (SST ) the threshold of minimumportion of slow downs needed to mark a way as a wayregulated by a potential stop (see section II-E)

bull Traffic Light Threshold (T LT ) the threshold of minimumportion of standstills needed to mark a way as a potentialtraffic light (see section II-E)

We selected a set of 61 intersections in the Los Angelescounty Among these intersections 25 are regulated by stopsigns and 36 are regulated by traffic lights We first investigatethe impact of SST by running our algorithm varying SST intthe interval [051] and fixing T LT = 015 Figure 5 showsthe portion of correctly recognized traffic regulators using thissetup In particular we show the correctly recognized stopsigns traffic lights and the total aggregate success ratio forall intersections together with the portion of intersection for

3 Centered at 342prime4815primeprimeN 11826prime292primeprimeW4 Centered at 341prime465primeprimeN 11829prime3548primeprimeW

6

013 0113 0213 0313 0413 0513 0613 0713 0813 0913 113

0513 05513 0613 06513 0713 07513 0813 08513 0913 09513 113 Por$on

13 of13 R

ecognized13 Re

gulators13

Stop13 Sign13 Threshold13

Stop13 Signs13 Traffic13 Lights13 Total13 No13 Response13

Fig 5 Sensitivity to Setup Parameters Portion of correctly recognizedregulators for varying SST with fixed T LT = 15

which the system could not determine the traffic regulator Wecan observe that for low values of SST the system correctlyrecognizes all stop signs but at the same time performspoorly on traffic lights This is due to the fact that the systemprocesses as traffic light candidates only the intersections thatare not marked as regulated by stop signs Then for low SSTmany intersections will be erroneously marked as regulatedby stop signs and not considered as traffic lights For highervalues of SST the system starts failing to recognize stopsigns Consequently a higher portion of traffic light is correctlyextrapolated but at the same time the portion of intersectionsthat can not be assigned any regulator grows as well We findthe best trade-off at SST = 08

013 0113 0213 0313 0413 0513 0613 0713 0813 0913 113

00513 0113 01513 0213 02513 0313 03513 0413 04513 0513 Por$on

13 of13 R

ecognized13 Re

gulators13

Traffic13 Light13 Threshold13


Fig 6 Sensitivity to Setup Parameters Portion of correctly recognizedregulators for varying T LT with fixed SST = 80

We investigate also the impact of T LT by running ouralgorithm varying T LT int the interval [00505] and fixingSST = 08 Figure 6 shows the portion of correctly identifiedregulators as a function of T LT In particular we present theportion of correctly identified stop signs and traffic lights theresulting overall performance on all regulators and the portionof intersections that could not be assigned either a stop signor a traffic light We can observe a trend that is inverse tothe one related to SST Indeed for a growing T LT the systemidentifies correctly more and more stop signs and always lesstraffic lights At the same time the portion of intersection that

could not be assigned any regulator increases We can find thebest trade-off at T LT = 015

075

08

085

09

095

1

2 3 4 5 6 7 8 9 10

Por

tion

of S

top

Sig

ns R

ecog

nize

d

Number of traces


(a)

02

03

04

05

06

07

08

09

1

2 3 4 5 6 7 8 9 10Por

tion

of T

raffi

c Li

ghts

Rec

ogni

zed

Number of traces


(b)

06

065

07

075

08

085

09

095

1

2 3 4 5 6 7 8 9 10

Por

tion

of R

egul

ator

s R

ecog

nize

d

Number of traces


(c)

0

02

04

06

08

1

12

14

2 3 4 5 6 7 8 9 10

Val

ue o

f the

Thr

esho

ld

Number of traces

Stop Discretized ThresholdStop Ideal Threshold

Traffic Light Discretized ThresholdTraffic Light Ideal Threshold

(d)

Fig 7 Robustness to missing data results on the CALI data set Averagenumber of successful extrapolations for intersections regulated by (a) stopsigns (b) by traffic lights and (c) for all intersections as a function of thenumber of traces used on each incoming way for all (N) all but one (Nminus1)all but two (Nminus2) ways (d) Value of the threshold that is used in the discretecase for both traffic lights and stop signs as a function of the number of tracesused on each incoming way

Robustness to missing data We tested the robustnessof the system against missing data on all the intersectionsin the CALI dataset We consider only intersections that aretraversed by at least 10 traces on all incoming ways All 32intersections in the CALI dataset fulfill this requirement We

7

performed the tests using information on all all but one and allbut two ways coming into the intersection In all three cases werun the algorithm 100000 times randomly selecting a subsetof incoming ways In each run we use a fixed number ofrandomly chosen traces on each selected way Figure 7 showsthe results obtained on the CALI dataset In particular figure7(a) shows the average number of successful extrapolations forintersections regulated by stop signs The system successfullyrecognizes the traffic regulator in more than 90 of the casesif we use all or all but one incoming ways If we use all buttwo ways the performance drops but it is still around 80that is a very good result as we have information on onlyhalf of the incoming ways (all intersections are 4-way) It isremarkable that the performance is better in the case of all butone ways than the performance obtained using all ways This iseasily explainable if we consider that when one way is missinginformation that way is not considered by the algorithm thusthe traces that do not slow down on that way are not taken intoaccount Figure 7(b) shows the average number of successfulextrapolations for intersections regulated by traffic lights Wecan observe a drop in the success ratio with 3 and 8 tracesThis drop is due to the discretization of SST As shown infigure 7(d) the actual value of SST in the discrete case isdifferent from the optimal 80 In particular for 3 traces SSTdiscretized is equal to 067 Such a low SST results in a verygood performance for the stop signs but a poor performancefor traffic lights as shown in figure 5 The same conclusioncan be drawn for the case of 8 traces Figure 7(c) shows theaggregated result of the success ratio over all the intersectionsThe system can correctly extrapolate the traffic regulator inmore than 90 of the cases with as low as 5 traces on eachway

Overall Performance Evaluation Having proved that theproposed system can cope with missing data we evaluate itsperformance over the complete dataset In the area relative tothe CALI dataset we can achieve a 100 accuracy In fact thesystem can correctly recognize the 25 intersections regulatedby stop signs and the 7 intersections regulated by trafficlights In addition the system provides ldquobest guessesrdquo on 17intersections surrounding the considered area that actuallymatch the real traffic regulators of such intersections Figure8 shows a snapshot of the map of area 1 An additionallayer shows the type of traffic regulator assigned to eachintersection white hexagons represent stop signs and purplecircles represent traffic lights

As discussed in section III-A in the OpenStreetMap datasetthere are very few intersection that offer a good set oftraces For this reason we have been forced to select the 65intersections for which we have at least 5 traces on all butone incoming ways Out of these 65 intersections the systemrecognizes correctly 51 of them leading to a success rate of78 The lower performance is due to the missing data in oneway and the overall quality of the GPS traces on the remainingways

Red Light Duration Estimation Finally we present theresults of the evaluation of the red light duration In tableII we report the measured duration of the red light and itsstandard deviation together with the estimate provided by our

Fig 8 Traffic regulators extrapolation from the CALI dataset matches reality100

algorithm Our algorithm estimates the red light duration asthe 95th percentile of the stop time of all traces incomingfrom each particular way For this reason the estimate reflectsa higher bound to the red light duration In addition we canobserve that the red light duration is always overestimatedby about 5 seconds This overestimation is due to the wayour algorithm recognizes the vehicles stops (see section II-D)Since we are using a 10 seconds window the vehicle could stillbe slowing down for the first part of the time window and stopin the second part Therefore the stop times are likely to beoverestimated and consequently the red light duration This iseasily fixable introducing a simple correction offset to accountfor the error introduced by the time-window estimator

Intersection (Direction) Average Time (σ) EstimateA (EW WE) 1775(52) 24A (NS SN) 4248(598) 51B (EW WE) 2436(044) 29B (NS SN) 3352(029) 36C (EW WE) 2249(526) 31C (NS SN) 2252(748) 27D (EW WE) 2235(37) 29D (NS SN) 2356(164) 28

TABLE IIAVERAGE RED LIGHT DURATION MEASURED 10 TIMES COMPARED TO THE

ESTIMATE OF OUR ALGORITHM σ IS THE STANDARD DEVIATION

IV RELATED WORK

Feature extraction from GPS traces has been researchedsince the the first GPS devices hit the markets aiming at theextraction of information on user habits as well as environmen-tal features most the research contributions however focus onTraffic status and lane extraction techniques

For example in [16] and [17] the authors present datamining algorithms for the refinement and enhancement ofexisting map information Information such as number of lanesand intersection structure could be useful for safety applicationsuch as aided lane departure or collision prevision Howeverthis information represents high-costlow-gain additional detail

8

and the mapping companies are not willing to invest onits gathering Therefore the use of low-cost collective GPStraces seems to be the best solution In [18] the authorspropose an automated procedure for the inference of the mapinformation itself Such procedures might be very useful inorder to avoid human intervention in the creation of themaps Similar approach have been used by popular sites suchOpenStreetMaps[9]

Recently many research studies focused on extracting theTraffic Status GPS traces In particular in [15] the authorsidentify congestion situation on the road using GPS traces anda model based on historical data Similarly in [19] the authorsexploit GPS traces to infer the travel behaviors of swedishdrivers as first step to predict the traffic conditions on theroad In [20] the authors propose the use of roadside units todetect the road congestion and a GPSGSM assisted navigatorto re-route the traffic in less congested areas

In [21] the authors present a technique for the associationof traffic controls to the driver behavior The purpose of thiswork is very similar to the one we are presenting in this paperHowever the approach is significantly different It involves theuse of machine learning techniques and specifically neuralnetworks thus requiring an initial training and limiting theability to perform on-line recognition Changes in the datasetstructure require a new training since the authors did notestablish a formal relationship between the driver behaviorand the various factors taken as inputs Finally the authors donot look at traffic light cycles and performed both the trainingand test phase using a small ad hoc dataset Our work incontrast perform an analysis based on the trace key featuresand exploits them in discovering traffic regulators location andcharacteristics additionally our system has been designed towork on-line as well as off-line the data can be indeed updatedwhile the vehicles are still on the road Finally our algorithmsallows to dynamically exploit changes in the data structure(ie new stops new traffic in another way etc) We testedour algorithms against a large public dataset of GPS traceson different cities in different continents and the results showthat the proposed methods performs well in the majority ofscenarios

V CONCLUDING REMARKS

In this work we presented a simple yet effective method toextract location and timing for traffic regulators such as stop-signs and traffic lights with just 5 and 7 traversals respectivelyThe evaluation shows that our methods perform well also withmissing data identifying with an accuracy higher than 85 thetype and timing of the traffic regulator even if data is missingon one of the ways belonging to the intersection

We learnt that the quality of the GPS dataset is key toperform feature extraction on mobile vehicles In particularvariations of the sampling rate and the sampling frequencyitself are key to infer useful information from cars on theroad

This paper provides a proof of concept for the realizationof a system for processing large amount of real time tracesextracting useful information The location of stop signs and

traffic lights are only an example of useful informationthey could be extended to commuting times real time trafficupdates turning statistics etc

REFERENCES

[1] ABI-Research Handset-based navigation for pedestrians mdashtrends mar-ket players and forecasts Technical report ABI Research Technologyand Market Intelligence Company September 2009

[2] ABI-Research Gps-enabled handsets mdashmarket technology and businessfactors driving the growth of gps a-gps and hybrid positioning formobile handset location Technical report ABI Research Technologyand Market Intelligence Company March 2009

[3] ABI-Research Sport and fitness gps devices and applications mdashgps watches cycling computers golf rangefinders and smartphoneapplications Technical report ABI Research Technology and MarketIntelligence Company September 2009

[4] Google inc Google maps navigation (beta) httpwwwgooglecommobilenavigation November 2009

[5] Dash Express inc mdash dash driver network - trutraffic httpwwwdashnetproduct

[6] NAVITEQ inc mdash company site httpwwwnavteqcom November2009

[7] TELEATLAS inc mdash whitepapers Hd traffic - report n ta-ct031959httpwwwteleatlascom November 2009

[8] TomTom inc mdash tomtom plus services httpwwwtomtomcomplusNovember 2009

[9] M Haklay and P Weber OpenStreetMap user-generated street mapsIEEE Pervasive Computing 7(4)12ndash18 2008

[10] ABI-Research Traffic information for navigation systems mdashservicesand infrastructure for real-time historical and predictive traffic dataTechnical report ABI Research Technology and Market IntelligenceCompany September 2007

[11] Census Topologically Integrated Geographic Encoding and Referencingsystem httpwwwcensusgovgeowwwtiger Last Accessed in April2010

[12] S Savasta M Pini and G Marfia Performance assessment of a com-mercial gps receiver for networking applications In Proceedings IEEEInternational Consumer Communications and Networking ConferenceLas Vegas NV USA pages 613ndash617 2008

[13] DL Zimmerman X Fang S Mazumdar and G Rushton Modelingthe probability distribution of positional errors incurred by residentialaddress geocoding International Journal of Health Geographics 6(1)12007

[14] N Bicocchi G Castelli M Mamei A Rosi and F ZambonelliSupporting location-aware services for mobile users with the where-abouts diary In Proceedings of the 1st international conference onMOBILe Wireless MiddleWARE Operating Systems and ApplicationsICST (Institute for Computer Sciences Social-Informatics and Telecom-munications Engineering) ICST Brussels Belgium Belgium 2008

[15] J Yoon B Noble and M Liu Surface street traffic estimation InProceedings of the 5 th international conference on Mobile systemsapplications and services 2007

[16] S Rogers and S Schroedl Creating and evaluating highly accuratemaps with probe vehicles Intelligent Transportation Systems IEEEpages 125ndash130

[17] S Schroedl K Wagstaff S Rogers P Langley and C Wilson MiningGPS traces for map refinement Data mining and knowledge Discovery9(1)59ndash87 2004

[18] S Edelkamp and S Schrodl Route planning and map inference withglobal positioning traces Lecture Notes In Computer Science 2598128ndash151 2003

[19] S Schrdquoonfelder KW Axhausen N Antille and M Bierlaire Exploringthe potentials of automatically collected GPS data for travel behaviouranalysis-a swedish data source GI-Technologien frdquour Verkehr und Logistik 13155ndash179 2002

[20] J Fawcett and P Robinson Adaptive routing for road traffic IEEEComputer Graphics and Applications 20(3)46ndash53 2000

[21] CA Pribe and SO Rogers Learning to associate observed driverbehavior with traffic controls Transportation Research Record Journalof the Transportation Research Board 1679(-1)95ndash100 1999

[22] E Inc ESRI Shapefile Technical Description httpwwwesricom LastAccessed in April 2010

Introduction

System Description

Map Preprocessing

Trace Selection

Trace Mapping

Trace Processing

Feature Extraction

Evaluation

Characteristics of The Dataset

Experimental Results

Related Work

Concluding Remarks

References

2

A Map Preprocessing

The best representation of a digital map is a directed graphthat consists of a set of nodes (ie the intersections) andoriented edges (ie the road segments) This representationis indeed one of the most used by most map databases built torepresent the physical road layout for navigation and displaypurposes However the mere physical representation of thetopology is not always suitable for our feature extractionalgorithms We need a logical representation of the roadlayout in particular we need to redefine the intersections androads as atomic entities that can be directly used in the featureextraction For example intersections of separated roads1represented by two separated edges would be stored in adigital map database such as the census TIGER datbase [11]as two different nodes However urban planners and trafficengineers consider these intersections as a single intersectionand it are indeed regulated by a single traffic regulator (stopsign or traffic light) We store the digital maps in dictionariesusing a logical representation in which the road system isseen as a set of intersections and a set of oriented waysthat interconnects them intersections and ways are abstractedas atomic and may include several physical nodes and edgesrespectively While this representation can be obtained startingfrom any map database for the sake of this study we consid-ered two very well known databases TIGER Census [11] andOpenStreetMap [9]

B Trace SelectionOpenStreetMap provides a large amount of user generated

GPS traces However the wide variety of consumer GPSdevices results in a large number of traces that do not meetthe minimum requirements for our algorithm to extract usefulinformation from them The most impacting feature is thesampling rate Many of the traces that we analyzed areaffected by a highly variable sampling interval A variablesampling interval is most likely due to a poor GPS signal asmany commercial GPS devices do not output any positioninformation when it is not possible to obtain a fixing forthe current point For each trace in the dataset we computethe most common sampling interval (SI) we then define asampling band as shown in equation 1

12

SI lt SI lt32

SI (1)

We discard the traces that do not have at least 80 of thesamples contained within the sampling-band In essence weonly consider traces that have a fairly constant samplingrate Furthermore in order to extract user-behavior such asslow-down and standstill vehicles we need a relatively highfrequency sampling Longer sampling periods result in aninaccurate estimation of the speed of the vehicle Thereforewe discard all traces that have SI gt 4 Finally we also discardtraces that are not recorded by vehicles moving in theirnormal environment like traces recorded by pedestrians easilyrecognizable analyzing the maximum speed

1For separated road is intended a road where the lanes in one direction arephysically separated usually with a concrete island from the lanes on theopposite direction

C Trace MappingOnce the traces dataset is cleaned from unstable and unreli-

able traces as described above it is possible to map each GPStrace onto the underlying road system Our approach involvesan efficient reverse geocoding algorithm for the mapping ofeach single point of the trace and an advanced correctionalgorithm that takes into account the actual space-temporalrelationship among subsequent points of the same trace

The reverse geocoding algorithm maps each GPS point tothe closest road segment This process performs a lookup overthe entire street database to find the road segment that is closestto the current GPS coordinate We optimized the lookup pre-ordering all the streets and intersections by longitudelatitudeThere are several drawbacks in evaluating each point in a GPStrace as independent entity In particular

bull GPS traces are affected by bursty positioning errorsIn fact as the car moves the surrounding environment(buildings vegetation etc) might shade some of thesatellites causing a local positioning error [12] Theselocal errors might result in a wrong reverse geocodinglocation

bull In digital maps roads are represented as segments thuswith no information about their width Therefore inproximity of an intersection the position of a vehicletraveling on a straight road may be geocoded into thecrossing road [13][14] In fact the car would be movingaside of the segment representing the road is traveling onAt intersections the segment representing the crossingroad will be geometrically closer than the actual traveledroad as shown in figure 1

Fig 1 Common reverse geocoding failure approaching an intersection

For the above reasons it is crucial to exploit the temporal-space relationship between GPS samples belonging to thesame trace We implemented a reverse geocoding correctionalgorithm that considers both spatial and temporal informationOur algorithm scans each trace two times first forwardsand then backwards In both scans we take advantage ofthe previous reverse-geocoding results thus the first scanpropagates forward information about the past and the secondscan propagates backward information about the future Forthe sake of simplicity we here detail the forward scan asthe backward scan just operates inverting the indexes Thescanning algorithm takes different actions if the vehicle ismoving or if it is standstill we will discuss how to distinguishbetween these two cases in the next section

3



D Trace Processing









4








III EVALUATION





5




065

07

075

08

085

09

095

1

0 10 20 30 40 50

Por

tion

of In

ters

ectio

n

Number of traces





0 2000 4000 6000 8000

10000 12000 14000 16000 18000 20000

0 2 4 6 8 10 12 14

Por

tion

of In

ters

ectio

n

Number of traces












6

013 0113 0213 0313 0413 0513 0613 0713 0813 0913 113

0513 05513 0613 06513 0713 07513 0813 08513 0913 09513 113 Por$on

13 of13 R

ecognized13 Re

gulators13





013 0113 0213 0313 0413 0513 0613 0713 0813 0913 113

00513 0113 01513 0213 02513 0313 03513 0413 04513 0513 Por$on

13 of13 R

ecognized13 Re

gulators13






075

08

085

09

095

1

2 3 4 5 6 7 8 9 10

Por

tion

of S

top

Sig

ns R

ecog

nize

d

Number of traces


(a)

02

03

04

05

06

07

08

09

1

2 3 4 5 6 7 8 9 10Por

tion

of T

raffi

c Li

ghts

Rec

ogni

zed

Number of traces


(b)

06

065

07

075

08

085

09

095

1

2 3 4 5 6 7 8 9 10

Por

tion

of R

egul

ator

s R

ecog

nize

d

Number of traces


(c)

0

02

04

06

08

1

12

14

2 3 4 5 6 7 8 9 10

Val

ue o

f the

Thr

esho

ld

Number of traces



(d)



7










IV RELATED WORK



8









REFERENCES























Introduction

System Description

Map Preprocessing

Trace Selection

Trace Mapping

Trace Processing

Feature Extraction

Evaluation



Related Work

Concluding Remarks

References

3



D Trace Processing









4








III EVALUATION





5




065

07

075

08

085

09

095

1

0 10 20 30 40 50

Por

tion

of In

ters

ectio

n

Number of traces





0 2000 4000 6000 8000

10000 12000 14000 16000 18000 20000

0 2 4 6 8 10 12 14

Por

tion

of In

ters

ectio

n

Number of traces












6

013 0113 0213 0313 0413 0513 0613 0713 0813 0913 113

0513 05513 0613 06513 0713 07513 0813 08513 0913 09513 113 Por$on

13 of13 R

ecognized13 Re

gulators13





013 0113 0213 0313 0413 0513 0613 0713 0813 0913 113

00513 0113 01513 0213 02513 0313 03513 0413 04513 0513 Por$on

13 of13 R

ecognized13 Re

gulators13






075

08

085

09

095

1

2 3 4 5 6 7 8 9 10

Por

tion

of S

top

Sig

ns R

ecog

nize

d

Number of traces


(a)

02

03

04

05

06

07

08

09

1

2 3 4 5 6 7 8 9 10Por

tion

of T

raffi

c Li

ghts

Rec

ogni

zed

Number of traces


(b)

06

065

07

075

08

085

09

095

1

2 3 4 5 6 7 8 9 10

Por

tion

of R

egul

ator

s R

ecog

nize

d

Number of traces


(c)

0

02

04

06

08

1

12

14

2 3 4 5 6 7 8 9 10

Val

ue o

f the

Thr

esho

ld

Number of traces



(d)



7










IV RELATED WORK



8









REFERENCES























Introduction

System Description

Map Preprocessing

Trace Selection

Trace Mapping

Trace Processing

Feature Extraction

Evaluation



Related Work

Concluding Remarks

References

4








III EVALUATION





5




065

07

075

08

085

09

095

1

0 10 20 30 40 50

Por

tion

of In

ters

ectio

n

Number of traces





0 2000 4000 6000 8000

10000 12000 14000 16000 18000 20000

0 2 4 6 8 10 12 14

Por

tion

of In

ters

ectio

n

Number of traces












6

013 0113 0213 0313 0413 0513 0613 0713 0813 0913 113

0513 05513 0613 06513 0713 07513 0813 08513 0913 09513 113 Por$on

13 of13 R

ecognized13 Re

gulators13





013 0113 0213 0313 0413 0513 0613 0713 0813 0913 113

00513 0113 01513 0213 02513 0313 03513 0413 04513 0513 Por$on

13 of13 R

ecognized13 Re

gulators13






075

08

085

09

095

1

2 3 4 5 6 7 8 9 10

Por

tion

of S

top

Sig

ns R

ecog

nize

d

Number of traces


(a)

02

03

04

05

06

07

08

09

1

2 3 4 5 6 7 8 9 10Por

tion

of T

raffi

c Li

ghts

Rec

ogni

zed

Number of traces


(b)

06

065

07

075

08

085

09

095

1

2 3 4 5 6 7 8 9 10

Por

tion

of R

egul

ator

s R

ecog

nize

d

Number of traces


(c)

0

02

04

06

08

1

12

14

2 3 4 5 6 7 8 9 10

Val

ue o

f the

Thr

esho

ld

Number of traces



(d)



7










IV RELATED WORK



8









REFERENCES























Introduction

System Description

Map Preprocessing

Trace Selection

Trace Mapping

Trace Processing

Feature Extraction

Evaluation



Related Work

Concluding Remarks

References

5




065

07

075

08

085

09

095

1

0 10 20 30 40 50

Por

tion

of In

ters

ectio

n

Number of traces





0 2000 4000 6000 8000

10000 12000 14000 16000 18000 20000

0 2 4 6 8 10 12 14

Por

tion

of In

ters

ectio

n

Number of traces












6

013 0113 0213 0313 0413 0513 0613 0713 0813 0913 113

0513 05513 0613 06513 0713 07513 0813 08513 0913 09513 113 Por$on

13 of13 R

ecognized13 Re

gulators13





013 0113 0213 0313 0413 0513 0613 0713 0813 0913 113

00513 0113 01513 0213 02513 0313 03513 0413 04513 0513 Por$on

13 of13 R

ecognized13 Re

gulators13






075

08

085

09

095

1

2 3 4 5 6 7 8 9 10

Por

tion

of S

top

Sig

ns R

ecog

nize

d

Number of traces


(a)

02

03

04

05

06

07

08

09

1

2 3 4 5 6 7 8 9 10Por

tion

of T

raffi

c Li

ghts

Rec

ogni

zed

Number of traces


(b)

06

065

07

075

08

085

09

095

1

2 3 4 5 6 7 8 9 10

Por

tion

of R

egul

ator

s R

ecog

nize

d

Number of traces


(c)

0

02

04

06

08

1

12

14

2 3 4 5 6 7 8 9 10

Val

ue o

f the

Thr

esho

ld

Number of traces



(d)



7










IV RELATED WORK



8









REFERENCES























Introduction

System Description

Map Preprocessing

Trace Selection

Trace Mapping

Trace Processing

Feature Extraction

Evaluation



Related Work

Concluding Remarks

References

6

013 0113 0213 0313 0413 0513 0613 0713 0813 0913 113

0513 05513 0613 06513 0713 07513 0813 08513 0913 09513 113 Por$on

13 of13 R

ecognized13 Re

gulators13





013 0113 0213 0313 0413 0513 0613 0713 0813 0913 113

00513 0113 01513 0213 02513 0313 03513 0413 04513 0513 Por$on

13 of13 R

ecognized13 Re

gulators13






075

08

085

09

095

1

2 3 4 5 6 7 8 9 10

Por

tion

of S

top

Sig

ns R

ecog

nize

d

Number of traces


(a)

02

03

04

05

06

07

08

09

1

2 3 4 5 6 7 8 9 10Por

tion

of T

raffi

c Li

ghts

Rec

ogni

zed

Number of traces


(b)

06

065

07

075

08

085

09

095

1

2 3 4 5 6 7 8 9 10

Por

tion

of R

egul

ator

s R

ecog

nize

d

Number of traces


(c)

0

02

04

06

08

1

12

14

2 3 4 5 6 7 8 9 10

Val

ue o

f the

Thr

esho

ld

Number of traces



(d)



7










IV RELATED WORK



8









REFERENCES























Introduction

System Description

Map Preprocessing

Trace Selection

Trace Mapping

Trace Processing

Feature Extraction

Evaluation



Related Work

Concluding Remarks

References

7










IV RELATED WORK



8









REFERENCES























Introduction

System Description

Map Preprocessing

Trace Selection

Trace Mapping

Trace Processing

Feature Extraction

Evaluation



Related Work

Concluding Remarks

References

8









REFERENCES























Introduction

System Description

Map Preprocessing

Trace Selection

Trace Mapping

Trace Processing

Feature Extraction

Evaluation



Related Work

Concluding Remarks

References

Enhanching in Vehicle Digital Maps via GPS...

Documents

Transcript of Enhanching in Vehicle Digital Maps via GPS...