Georgia Tech

Learning Significant Locations and Predicting User Movement with GPS

Daniel Ashbrook and Thad Starner

Contextual Computing Group, http://www.cc.gatech.edu/ccg
College of Computing, GVU Center, Georgia Institute of Technology
Atlanta, GA USA


Daniel Ashbrook and Thad Starner, Georgia Tech

Motivation

• Location is a very common form of context
– easy to collect
– can be used to infer other pieces of context
• Most applications rely only on the user’s current location


Motivation

• How can we improve location context?
• Look for patterns of movement and learn the user’s daily schedule
– predict where the user is going based on where the user has been
• Goal: the computer can act as an agent
– offer suggestions at appropriate times
– enable collaboration between colleagues


Applications

• Potential applications for location prediction

• Single-user applications
– system only knows about one user’s movements
• Multi-user applications
– system combines predictions for several people


Applications

• Single user: Pre-emptive reminders
– remind the user at an appropriate time
– example: library book
• try to determine whether the user will pass the library today
• only then remind the user to take the book before leaving home


Applications

• Single user: Wireless caching
– wireless networks are often unavailable
• lack of infrastructure
• radio shadows (buildings, subway)
– hide the lack of connectivity by caching
– predict when caching will be insufficient
• warn the user
• suggest alternative routes


Applications

• Single user: Wireless caching
– cache even when a network is available
• the required transmission power can grow with the 4th power of distance in complex environments (e.g., a city)
• cost can vary with the network used and the time of day
– prediction can allow savings
• of battery power
• of money


Applications

• Multi-user: Enabling collaboration
– “Will I see Bob today?”
• compare the user’s and Bob’s schedules
• give a yes or no answer
– Scheduling many-person meetings
• find when most people are free and suggest a time
• also discover the most convenient place to meet


Applications

• Multi-user: Favor exchange
– remotely coordinate favor trading
– example: FedEx/UPS package trading


Related Work

• Bhattacharya: cell phone prediction
• Davis: prediction with ad-hoc networks
• Kortuem: Walid
• Marmasse: comMotion
• Liu: predictive caching network architecture
• Orwant: Doppelgänger
• Sparacino: Museum Wearable
• Wolf: travel diaries


Hardware

• Garmin GPS model 35-LVS

• GeoStats data logger
– 1 MPH recording limit


Hardware

• Preliminary data collected in Atlanta Sep-Dec 2001

• Data currently being collected from multiple users in Zürich, Switzerland

[Figure: Preliminary data, Atlanta, GA]


Software

• Preliminary implementation
– finds points of possible significance
– creates a probabilistic model of the user’s movements
• Markov model
– using the model, simple queries are possible:
• “The user is at home. Where will she go next?”
• “How likely is the user to visit the grocery store today?”
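The two example queries could be answered from a first-order model roughly as follows. This is a sketch under stated assumptions: the model is a plain dictionary of outgoing transition probabilities, the location names and numbers are invented, and approximating “today” as “within the next k trips” is our assumption rather than part of the original system.

```python
# Hypothetical first-order model: TRANSITIONS[a][b] = P(next location is b | current is a).
# Names and probabilities are invented for illustration only.
TRANSITIONS = {
    "home": {"work": 0.6, "grocery": 0.3, "gym": 0.1},
    "work": {"home": 0.7, "grocery": 0.2, "gym": 0.1},
    "grocery": {"home": 1.0},
    "gym": {"home": 0.5, "grocery": 0.5},
}


def most_likely_next(model, current):
    """'The user is at X. Where will she go next?': the most probable successor."""
    successors = model[current]
    return max(successors, key=successors.get)


def prob_visit_within(model, start, target, k):
    """Probability of reaching `target` within `k` trips from `start`.

    `target` is treated as absorbing: once it is reached, the answer is "yes".
    """
    dist = {start: 1.0}  # probability mass that has not reached target yet
    hit = 0.0
    for _ in range(k):
        nxt = {}
        for loc, p in dist.items():
            for succ, q in model.get(loc, {}).items():
                if succ == target:
                    hit += p * q
                else:
                    nxt[succ] = nxt.get(succ, 0.0) + p * q
        dist = nxt
    return hit
```

For example, `prob_visit_within(TRANSITIONS, "home", "grocery", 2)` sums the direct trip (0.3) and the two-trip routes through work (0.6 · 0.2) and the gym (0.1 · 0.5), giving 0.47.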


Software

• Markov model
– collection of nodes
– transitions between nodes
– each transition has a probability of occurring
– can also have self-transitions
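A Markov model like this can be held in a mapping from each node to its outgoing transition probabilities. A self-transition is simply an entry whose target equals its source. In the sketch below, the node names and probabilities are hypothetical. It checks that each node’s outgoing probabilities sum to 1 and samples a random walk.

```python
import random

# Hypothetical model: each node maps to {next_node: probability}.
# "A" -> "A" and "C" -> "C" are self-transitions.
MODEL = {
    "A": {"A": 0.2, "B": 0.5, "C": 0.3},
    "B": {"A": 1.0},
    "C": {"B": 0.4, "C": 0.6},
}


def validate(model, tol=1e-9):
    """Raise ValueError unless every node's outgoing probabilities are valid."""
    for node, row in model.items():
        if any(p < 0 for p in row.values()):
            raise ValueError(f"negative probability out of {node}")
        if abs(sum(row.values()) - 1.0) > tol:
            raise ValueError(f"probabilities out of {node} sum to {sum(row.values())}")
        if not set(row) <= set(model):
            raise ValueError(f"{node} has a transition to an unknown node")


def walk(model, start, n, seed=0):
    """Generate a random walk of n transitions starting at `start`."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(n):
        row = model[path[-1]]
        targets = list(row)
        path.append(rng.choices(targets, weights=[row[t] for t in targets], k=1)[0])
    return path
```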


Software

• Our Markov model
– nodes are significant locations
– transitions are trips between those locations


Software

• Significance
– how do we determine whether a particular GPS coordinate might have some meaning to the user?


Software

• Places
– logged GPS coordinates with more than time t of “resting time”
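Here is a minimal sketch of place extraction. It assumes the log is a time-ordered list of (timestamp in seconds, latitude, longitude) fixes. Because the logger has a 1 MPH recording limit, it also assumes that the “resting time” at a fix can be read as the gap before the next logged fix. Both the record layout and this reading of resting time are illustrative assumptions.

```python
from typing import List, Tuple

# Assumed record layout: (timestamp_seconds, latitude, longitude)
Fix = Tuple[float, float, float]


def find_places(log: List[Fix], t_seconds: float) -> List[Fix]:
    """Return fixes whose resting time is more than t_seconds.

    Assumption: the logger records nothing while the user is (nearly) stationary,
    so a long gap before the next fix means the user rested at this fix.
    """
    places = []
    for current, following in zip(log, log[1:]):
        if following[0] - current[0] > t_seconds:
            places.append(current)
    return places
```

With `t_seconds=600` (10 minutes), only fixes followed by a logging gap longer than 10 minutes are kept.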


Software

• How to pick t?
– try lots of values
– graph the number of places found for each value
– but the relationship is nearly linear, so there is no natural cutoff!
– so we pick an arbitrary value: t = 10 minutes
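Once each fix’s resting time is known, trying many values of t takes one pass per value. This sketch assumes the resting times (in seconds) have already been computed. The sample resting times and candidate thresholds are hypothetical.

```python
from typing import Dict, Iterable, List


def places_per_threshold(resting_times: List[float], thresholds: Iterable[float]) -> Dict[float, int]:
    """Count how many fixes have resting time strictly greater than each threshold t."""
    return {t: sum(1 for r in resting_times if r > t) for t in thresholds}


# Hypothetical resting times in seconds, for illustration only.
resting = [30, 45, 120, 700, 900, 1500, 3600, 7200, 28800]
minutes = [1, 5, 10, 20, 30, 60]
counts = places_per_threshold(resting, [m * 60 for m in minutes])
for m in minutes:
    print(f"t = {m:>2} min: {counts[m * 60]} places")
```

Plotting these counts against t gives the graph described above. With real data, a nearly straight line offers no obvious threshold to prefer.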


Software

[Figure: All data | Only places, with t = 10 min]


Software

• Locations
– problem: too many places
• GPS inaccuracy
• different exit points from buildings
– solution: cluster places to form locations
• all places within a radius r of a particular place form a single location


Software

[Figure: All data | Only places, with t = 10 min | Only locations]


Software

• How to pick radius r?
– too large a value
• too few clusters
• unrelated places grouped together
– too small a value
• too many clusters
• Solution:
– try various values for r
– find the knee in the graph of number of locations vs. r
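The slides do not say how the knee is located. One common heuristic, used here purely as an assumption, first normalizes both axes of the (r, number of locations) curve. It then picks the interior point farthest from the straight line joining the first and last points. The `min_distance` cutoff and the sample values are hypothetical.

```python
import math
from typing import List, Optional, Tuple


def find_knee(points: List[Tuple[float, float]], min_distance: float = 0.05) -> Optional[float]:
    """Return the radius at the knee of a (radius, count) curve, or None if there is none.

    `points` must be sorted by radius. A point only counts as a knee if its normalized
    distance from the endpoint chord exceeds `min_distance` (an assumed cutoff).
    """
    if len(points) < 3:
        return None
    xs, ys = [p[0] for p in points], [p[1] for p in points]
    x_min, y_min = min(xs), min(ys)
    x_span, y_span = max(xs) - x_min, max(ys) - y_min
    if x_span == 0 or y_span == 0:
        return None
    # Normalize both axes to [0, 1] so radius units and counts are comparable.
    norm = [((x - x_min) / x_span, (y - y_min) / y_span) for x, y in points]
    (x0, y0), (x1, y1) = norm[0], norm[-1]
    chord = math.hypot(x1 - x0, y1 - y0)
    best_i, best_d = None, min_distance
    for i in range(1, len(norm) - 1):
        x, y = norm[i]
        # Perpendicular distance from (x, y) to the line through the two endpoints.
        d = abs((y1 - y0) * x - (x1 - x0) * y + x1 * y0 - y1 * x0) / chord
        if d > best_d:
            best_i, best_d = i, d
    return None if best_i is None else points[best_i][0]
```

On a curve that drops steeply and then flattens, such as `[(10, 100), (20, 60), (30, 30), (40, 25), (50, 22), (60, 20)]`, this picks r = 30. A straight line yields None, which matches the “no knee” case.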


Software

• Clustering places into locations
– pick one place (•)
– find all places within radius r (•)
– find the mean of those places (x)
– repeat with x as the new center
– continue until the mean stops changing
– start again with another place
– repeat until no more places remain
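The clustering loop above can be sketched as follows. The sketch assumes places have already been projected to planar coordinates in meters, so that Euclidean distance is meaningful. It also assumes that places absorbed into one location are removed before the next place is picked. The seed order, tolerance, and iteration cap are illustrative choices, not details from the slides.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x_meters, y_meters) after an assumed planar projection


def _mean(points: List[Point]) -> Point:
    return (sum(p[0] for p in points) / len(points), sum(p[1] for p in points) / len(points))


def cluster_places(places: List[Point], r: float, max_iter: int = 100,
                   tol: float = 1e-6) -> List[Tuple[Point, List[Point]]]:
    """Group places into locations, returned as (center, member places) pairs."""
    remaining = list(places)
    locations = []
    while remaining:
        center = remaining[0]                      # pick one place
        members = [center]
        for _ in range(max_iter):
            # find all places within radius r of the current center
            members = [p for p in remaining if math.dist(p, center) <= r]
            new_center = _mean(members)            # mean of those places
            moved = math.dist(new_center, center)
            center = new_center                    # repeat with the mean as the new center
            if moved < tol:                        # stop once the mean stops changing
                break
        locations.append((center, members))
        # start again with the places not yet assigned to a location
        remaining = [p for p in remaining if p not in members]
    return locations
```

For example, with r = 50 m the places `(0, 0)`, `(10, 0)` and `(0, 10)` form one location centered near `(3.33, 3.33)`. Two places about 1 km away form a second location.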


Software

• Sublocations
– problem: large clusters subsume smaller-scale paths
– solution: create sublocations within larger clusters


Software

• How to determine whether sublocations exist?
– use the same knee-and-graph algorithm on each location
– if no knee exists, there are not enough points to form sublocations


Software

• Sublocations can have multiple scales
– Country level
– State level
– City level
– Campus level


Software

• Prediction
– each location gets a unique ID
• the user may provide a name for each location, such as “home” or “work”
– replace each place in the original list with its location ID
• result: a list of the locations that were visited, in the order they were visited
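Replacing each place with its location ID can be sketched as a nearest-center lookup. In this sketch, location IDs are simply list indices of cluster centers in planar meters. The optional `names` mapping stands in for user-supplied labels such as “home” or “work”. The function name, the `L<index>` ID format, and the coordinates are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # planar coordinates in meters (assumed)


def to_location_sequence(places: List[Point], centers: List[Point],
                         names: Optional[Dict[int, str]] = None) -> List[str]:
    """Map each place, in visit order, to the name or ID of its nearest location center."""
    names = names or {}
    sequence = []
    for p in places:
        loc_id = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        sequence.append(names.get(loc_id, f"L{loc_id}"))  # user label if one was given
    return sequence
```

For example, with centers `[(0, 0), (1000, 0)]` and `names={0: "home"}`, the places `[(5, 5), (990, 3), (2, -1)]` become `["home", "L1", "home"]`.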


Software

• For each location
– count the number of visits to each other location
– count the total number of visits to other locations
– divide to get the probability of each transition
– result: a Markov model for each location
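The counting step amounts to tallying consecutive pairs in the visit sequence and then dividing each count by the total number of trips out of that location. The sequence below is a made-up example, not data from the study.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def first_order_model(sequence: List[str]) -> Dict[str, Dict[str, float]]:
    """Estimate P(next = b | current = a) from consecutive pairs in a visit sequence."""
    pair_counts: Dict[str, Counter] = defaultdict(Counter)
    for a, b in zip(sequence, sequence[1:]):
        pair_counts[a][b] += 1            # visits from a to b
    model = {}
    for a, counts in pair_counts.items():
        total = sum(counts.values())      # total visits leaving a
        model[a] = {b: n / total for b, n in counts.items()}
    return model


visits = ["home", "work", "home", "work", "grocery", "home", "work", "home"]
model = first_order_model(visits)
```

In this toy sequence, work is followed by home twice and by grocery once, so the model gives P(home | work) = 2/3 and P(grocery | work) = 1/3.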


Software

• People don’t move randomly!
– 23 locations total, so the chance of A → any particular other location = 1/22 = 4.5%
– measured ratio CRB → Home = 16/77 = 21%
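Spelled out, the random baseline assumes that a trip from any location is equally likely to end at each of the other 22 locations:

```latex
P_{\text{random}}(A \to B) = \frac{1}{23 - 1} = \frac{1}{22} \approx 4.5\%
\qquad
\hat{P}(\text{CRB} \to \text{Home}) = \frac{16}{77} \approx 20.8\%
```

The observed CRB → Home rate is therefore roughly 4.6 times the random baseline.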


Software

• Orders of Markov model
– 1st order: A → ?
• a given state’s transition probabilities depend only on that state
– 2nd order: B → A → ?
• a given state’s transition probabilities depend on that state and the previous state
– and so on…
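Moving to a higher order only changes what the counts are keyed on. Instead of the current location alone, the context becomes the tuple of the last n locations. Here is a minimal sketch using a hypothetical visit sequence:

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def nth_order_model(sequence: List[str], order: int) -> Dict[Tuple[str, ...], Dict[str, float]]:
    """Estimate P(next | last `order` locations) by counting in a visit sequence."""
    if order < 1:
        raise ValueError("order must be at least 1")
    counts: Dict[Tuple[str, ...], Counter] = defaultdict(Counter)
    for i in range(order, len(sequence)):
        context = tuple(sequence[i - order:i])   # e.g. ("B", "A") for B -> A -> ?
        counts[context][sequence[i]] += 1
    model = {}
    for ctx, c in counts.items():
        total = sum(c.values())
        model[ctx] = {nxt: n / total for nxt, n in c.items()}
    return model
```

With `order=1` this reduces to the first-order model, with single-element tuples as keys.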


Software

• First order predictions

Movement                    Probability   % Chance
CRB → Home                  16/77         21%
CRB → Hardware store        10/77         13%
CRB → Jake’s Ice Cream      10/77         13%
CRB → Grocery store         8/77          10%
CRB → GA400                 5/77          6%
CRB → 10th/14th St.         5/77          6%
CRB → Taco Bell             3/77          4%

Random chance: 1/22 = 4.5%


Software

• Second order predictions

Movement                          Probability   % Chance
Home → CRB                        14/20         70%
Home → CRB → Home                 3/14          21%
Home → CRB → Grocery store        3/14          21%
Home → CRB → Jake’s Ice Cream     2/14          14%
Home → CRB → GA400                1/14          7%
Home → CRB → 10th/14th St.        1/14          7%
Home → CRB → Hardware store       0/14          0%

Random chance: 1/22 = 4.5%


Software

• How many orders to use?
– sequence of 141 locations visited
– 23 total unique locations

Order   Permutations              Approx. expected unique paths   Observed unique paths
1       23 × 22^1 = 506           140                             56
2       23 × 22^2 = 11,132        139                             73
3       23 × 22^3 = 244,904       138                             82
4       23 × 22^4 = 5,387,888     137                             86
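The table’s first two numeric columns can be reproduced mechanically. For order n, the number of possible paths is 23 · 22^n: any starting location, then n moves, each to a different location. A sequence of 141 visits contains 141 − n overlapping paths of n moves. Since the space of possible paths is far larger, nearly all of those paths would be distinct if movement were random. The observed column counts the distinct paths actually present. The helper names and the short test sequence are hypothetical.

```python
from typing import List


def possible_paths(num_locations: int, order: int) -> int:
    """Paths of `order` moves, assuming every move goes to a different location."""
    return num_locations * (num_locations - 1) ** order


def paths_in_sequence(sequence_length: int, order: int) -> int:
    """Number of overlapping paths of `order` moves in a visit sequence."""
    return sequence_length - order


def observed_unique_paths(sequence: List[str], order: int) -> int:
    """Distinct paths of `order` moves (order + 1 consecutive visits) in the sequence."""
    return len({tuple(sequence[i:i + order + 1]) for i in range(len(sequence) - order)})


for n in range(1, 5):
    print(n, possible_paths(23, n), paths_in_sequence(141, n))
```

The gap between the expected and observed columns is what shows that people repeat the same paths rather than moving randomly.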


Future Work

• Collect more data
– Georgia Tech students in Zürich & Atlanta
• Investigate other sensors for smaller scales
– RF/IR beacons
• Consider privacy policies
• Add time of day to the Markov model
– predict when a user will leave as well as where they’re going


Future Work

• Schedule “sharpness”
– does always being on time mean a location is important?
– example: work at 8 AM vs. the grocery store
• Speed of model update vs. accuracy
– college students have a new schedule every term
– weight new events more heavily?
• how to avoid unduly weighting one-time trips?
• use confidence intervals to detect schedule changes


Future Work

• Real-time update of models
– currently, data is post-processed
– need full wearable computers for real-time updates
• User interface
– visualize the location model
– allow the user to influence the model
• Favor trading implementation


Thank You

Questions?