![Page 1: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/1.jpg)
Data Mining in Mobile Applications (Ubiquitous Computing)
Xing Xie
Microsoft Research Asia, October 18, 2011
![Page 2: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/2.jpg)
Ubiquitous Computing (1988)
Mark Weiser (director, computer science lab, Xerox PARC)
Ubiquitous computing names the third wave in computing, just now beginning. First were mainframes, each shared by lots of people. Now we are in the personal computing era, person and machine staring uneasily at each other across the desktop. Next comes ubiquitous computing, or the age of calm technology, when technology recedes into the background of our lives.
UbiComp principles
The purpose of a computer is to help you do something else
The best computer is a quiet, invisible servant
The more you can do by intuition the smarter you are; the computer should extend your unconscious
Technology should create calm
![Page 3: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/3.jpg)
Devices: Tabs, Pads and Boards
Tab: accompanied or wearable centimeter-sized devices, e.g., smartphones, smart cards
Pad: hand-held decimeter-sized devices, e.g., laptops
Board: meter-sized interactive display devices, e.g., horizontal surface computers and vertical smart boards
Three screens and a cloud
![Page 4: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/4.jpg)
InfoPad (UC Berkeley, 1990-1996)
Explore architecture and systems level issues for the design of a mobile computer providing ubiquitous access to real-time media in an indoor environment
![Page 5: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/5.jpg)
Mobile Operating Systems
TRON (1984) and T-Engine (2002)
Started by Ken Sakamura (Professor, University of Tokyo)
Commercial OS
Windows Phone/Windows CE
Apple iOS
Google Android
Symbian OS
RIM BlackBerry OS
![Page 6: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/6.jpg)
Active Badges (Olivetti Research, 1989)
First automated indoor location system
The small device worn by personnel transmits a unique infra-red signal every 10 seconds.
Each office within a building is equipped with one or more networked sensors which detect these transmissions.
![Page 7: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/7.jpg)
Cooltown (HP Labs Palo Alto, 1999)
Vision: CoolTown will interweave the World Wide Web with people, places and things in the physical world. In essence, everything will have a Web page.
Internet of Things (Auto-ID Center, MIT, 1999): networked interconnection of everyday objects
![Page 8: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/8.jpg)
Smart-Its (Karlsruhe, 2001)
Small embedded computers with communication and sensing components that can be further integrated with everyday objects
Collaboration with many European research institutes
MediaCup (Karlsruhe, 1999): ordinary coffee cup augmented with sensing, processing and communication capabilities
![Page 9: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/9.jpg)
Guide (Lancaster University, 1997)
First mobile electronic guidebook for use by tourists
Context-sensitive
Obtain all information via a wireless communications link
![Page 10: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/10.jpg)
Living Laboratories (Georgia Tech, 1995)
Classroom 2000 -> eClass (1995)
An excellent example of ubicomp being applied to a real problem, and creating a solution of measurable value
Aware Home (1999)
Camera and RFID tags
Smart floor
Monitor the electricity, gas, water and waste lines
Augmented Offices
![Page 11: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/11.jpg)
Microsoft Research Projects
SenseCam (1999): a wearable digital camera that is designed to take photographs passively, without user intervention, while it is being worn.
MyLifeBits (2002): a lifetime store of everything
EasyLiving (1998): A smart room designed to support both work and recreational activities
RADAR (2000): Wi-Fi signal-strength based indoor positioning system.
![Page 12: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/12.jpg)
Intel Research Projects
Place Lab (2003)
Low-cost, easy-to-use device positioning for location-enhanced computing applications
GSM tower, Bluetooth, 802.11 access points
Mote (2004)
Tiny, self-contained, battery-powered computers with radio links
Communicate and exchange data with one another, and self-organize into ad hoc networks
![Page 13: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/13.jpg)
Context Awareness
A key concept in Ubicomp: it deals with linking changes in the environment (physical world) with computing systems
Acquisition of context
Abstraction and understanding of context
Application behavior based on the recognized context
Build intelligence about physical world in computing systems
Environment Users
Computing systems
![Page 14: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/14.jpg)
Context and Sensors
Sensor: a device that measures a physical quantity and converts it into a signal which can be read by an observer or by an instrument (from Wikipedia)
Device time
Device location: GPS, Wi-Fi, cell-tower, Bluetooth
Device movement: accelerometer, gyroscope, digital compass
Environment: microphone, camera, ambient light sensor, proximity sensor, barometer, humidity sensor, thermometer
![Page 15: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/15.jpg)
Make the Cloud Intelligent
The coming era of cloud computing brings new opportunities to this long studied research area
By accumulating and aggregating context from multiple users, multiple devices, and over a long period, we can obtain collective social intelligence from them
Environment Users
Cloud
![Page 16: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/16.jpg)
Future Devices = Universal Sensors
Data +
Intelligence
Third Party Services
Microsoft Services
![Page 17: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/17.jpg)
Location/Sensor + Social Networks
Color, Foursquare, IntoNow
![Page 18: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/18.jpg)
Location: the Most Important Context Data
GPS will be installed on more than 40% of phones worldwide by 2011
Location-based services (LBS) will become a 13B business by 2013
[Chart: percentage of total sales (0% to 100%) by year, 2004 to 2011, for Africa, Asia/Pacific, Eastern Europe, Japan, Latin America, Middle East, North America, Western Europe, and Total. Source: Gartner Dataquest]
![Page 19: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/19.jpg)
Projects in MSR Asia
GeoLife: Building Social Networks Using Human Location History (WWW 2010/2009, AAAI 2010, SIGMOD 2010)
Mining Geo-Tagged Photos for Travel Recommendation (ACM MM 2010/2009)
T-Drive: Driving Directions Based on Taxi Traces (ACM GIS 2010/2009)
Knowledge from taxi drivers -> map and navigation service
Knowledge from photographers -> travel service
Knowledge from general people -> social network service
![Page 20: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/20.jpg)
GeoLife: Building Social Networks Using Human Location History
![Page 21: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/21.jpg)
GPS Devices and Users
167 users, Apr. 2007 ~ Dec. 2010
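[Pie chart: user age groups (age<=22, 22<age<=25, 26<=age<29, age>=30); shares shown: 16%, 45%, 30%, 9%]
[Pie chart: user occupations (Microsoft employees, employees of other companies, government staff, college students); shares shown: 18%, 14%, 10%, 58%]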
![Page 22: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/22.jpg)
A Free Large-Scale GPS Dataset
http://research.microsoft.com/en-us/downloads/b16d359d-d164-469e-9fd4-daa38f2b2e13/
![Page 23: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/23.jpg)
GPS Log Processing
GPS trajectories*
A GPS trajectory is a sequence of points, each given as Latitude, Longitude, Arrival Timestamp:
p1: 39.975, 116.331, 9/9/2009 17:54
p2: 39.978, 116.308, 9/9/2009 18:08
…
pK: 39.992, 116.333, 9/12/2009 13:56
[Figure: a GPS trajectory p1, p2, …, p7, a stay point s, and a stay region r]
Raw GPS points -> Stay points -> Stay regions
Stay points
• Stand for a geo-spot where a user has stayed for a while
• Preserve the sequence and vicinity info
Stay regions
• Stand for a geo-region that we may recommend
• Discover the meaningful locations
* In GPS logs, we have some user comments associated with the trajectories.
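A stay point can be detected by looking for a run of consecutive GPS points that remain close to an anchor point for long enough. The following is a minimal sketch, not the GeoLife implementation: the `Point` type, the function names, and the thresholds (200 m, 20 minutes) are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt


@dataclass
class Point:
    lat: float
    lon: float
    t: datetime


def haversine_m(a, b):
    """Great-circle distance between two points, in meters."""
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = sin(dlat / 2) ** 2 + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2
    return 2 * 6371000.0 * asin(sqrt(h))


def detect_stay_points(points, dist_m=200.0, min_stay=timedelta(minutes=20)):
    """Return (mean_lat, mean_lon, arrival, departure) tuples.

    dist_m and min_stay are illustrative thresholds, not values from the slides.
    """
    stays = []
    i, n = 0, len(points)
    while i < n:
        j = i + 1
        # Extend the run while points stay within dist_m of the anchor points[i].
        while j < n and haversine_m(points[i], points[j]) <= dist_m:
            j += 1
        group = points[i:j]
        if group[-1].t - group[0].t >= min_stay:
            lat = sum(p.lat for p in group) / len(group)
            lon = sum(p.lon for p in group) / len(group)
            stays.append((lat, lon, group[0].t, group[-1].t))
            i = j  # continue after the stay
        else:
            i += 1
    return stays
```

Stay regions could then be obtained by clustering the resulting stay points of many users; that step is not shown here.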
![Page 24: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/24.jpg)
Understanding User Mobility - 1
Inferring transportation modes from GPS data
Differentiate driving, riding a bike, taking a bus and walking
Difficulties
Velocity-based method cannot handle this problem well (<0.5 accuracy)
People often change transportation modes within a trip
The observation of a mode is vulnerable to traffic conditions and weather
![Page 25: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/25.jpg)
Understanding User Mobility - 2
The 1st finding: walking is a transition between other modes
Partition a trajectory into segments of different modes
Handle congestion to some extent
[Figure: panels (a) to (c): a trajectory (Driving, Walk, Bus) is partitioned, backward and forward, into certain segments and 3 uncertain segments]
Non-Walk point: P.V > Vt or P.a > at
Possible Walk point: P.V < Vt and P.a < at
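The walk-based partitioning can be sketched as follows: label each point with the velocity and acceleration thresholds, group consecutive points with the same label, and absorb very short runs into the preceding segment so that brief slowdowns (for example, congestion) do not split a segment. The threshold values, `min_len`, and the function names are assumptions for illustration only.

```python
from itertools import groupby


def label_points(points, v_t=2.5, a_t=1.5):
    """points: list of (velocity, acceleration) pairs; thresholds are hypothetical."""
    return ["walk" if v < v_t and a < a_t else "non-walk" for v, a in points]


def segment(labels, min_len=3):
    """Group labels into runs and merge runs shorter than min_len into the previous run."""
    merged = []
    for label, group in groupby(labels):
        length = len(list(group))
        if merged and (length < min_len or merged[-1][0] == label):
            merged[-1][1] += length  # absorb into the preceding segment
        else:
            merged.append([label, length])
    return [(label, length) for label, length in merged]
```

Each resulting non-walk segment would then be passed to a classifier that decides among driving, bus, and bike.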
![Page 26: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/26.jpg)
Understanding User Mobility - 3
The 2nd finding: many features are more discriminative than velocity
Heading Change Rate (HCR)
Stop Rate (SR)
Velocity change rate (VCR)
>0.65 accuracy
[Figure: heading change H1 between consecutive points p1, p2, p3 (p1.head, p2.head), with velocities p1.V1, p2.V2 over distance L1 and time T1]
[Figure: velocity versus distance for (a) Driving, (b) Bus, (c) Walking, each with the stop threshold Vs]
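The slide only names the three features. One common way to define them, assumed here, is as counts per unit distance: points whose heading changes by more than a threshold (HCR), points whose velocity falls below a stop threshold Vs (SR), and points whose relative velocity change exceeds a threshold (VCR). The threshold values and function names below are hypothetical.

```python
def heading_change(h1, h2):
    """Smallest absolute difference between two headings, in degrees."""
    d = abs(h2 - h1) % 360.0
    return min(d, 360.0 - d)


def segment_features(velocities, headings, distance_m, v_s=1.0, h_c=30.0, v_r=0.5):
    """Per-segment HCR, SR and VCR, each normalized by the segment length.

    v_s, h_c and v_r are illustrative thresholds, not values from the slides.
    """
    n_stop = sum(1 for v in velocities if v < v_s)
    n_turn = sum(
        1 for h1, h2 in zip(headings, headings[1:]) if heading_change(h1, h2) > h_c
    )
    n_vchange = sum(
        1
        for v1, v2 in zip(velocities, velocities[1:])
        if v1 > 0 and abs(v2 - v1) / v1 > v_r
    )
    return {
        "HCR": n_turn / distance_m,
        "SR": n_stop / distance_m,
        "VCR": n_vchange / distance_m,
    }
```

Normalizing by distance makes segments of different lengths comparable, which a raw count would not.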
![Page 27: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/27.jpg)
Understanding User Mobility - 4
Post-processing
Transition probability between different transportation modes
P(Bike|Walk) and P(Bike|Driving)
Typical user behaviors based on location
Constraints of the real world
[Figure: three consecutive segments, Segment[i-1]: Driving, Segment[i]: Walk, Segment[i+1]: Bike, shown as ground truth and inference result, with the transitions P(Walk|Driving) and P(Bike|Walk) and two bus stops marked]
Inferred probabilities for the three segments:
P(Driving): 75%, P(Bus): 10%, P(Bike): 8%, P(Walk): 7%
P(Bike): 62%, P(Walk): 24%, P(Bus): 8%, P(Driving): 6%
P(Bike): 40%, P(Walk): 30%, P(Bus): 20%, P(Driving): 10%
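One way to apply transition probabilities in post-processing is to rescale each segment's mode probabilities by the probability of following the previous segment's mode, when that previous mode was inferred with high confidence. This is a sketch under stated assumptions: the confidence threshold and the transition table in the usage note are hypothetical, and only the per-segment probabilities come from the slide.

```python
def post_process(segment_probs, transition, confident=0.6):
    """Rescale each segment's probabilities by P(mode | previous mode).

    segment_probs: list of dicts mode -> probability, one per segment.
    transition: dict previous_mode -> dict mode -> probability (hypothetical values).
    """
    result = [dict(segment_probs[0])]
    for probs in segment_probs[1:]:
        prev = result[-1]
        prev_mode = max(prev, key=prev.get)
        if prev[prev_mode] >= confident and prev_mode in transition:
            scaled = {m: p * transition[prev_mode].get(m, 0.0) for m, p in probs.items()}
            total = sum(scaled.values())
            if total > 0:
                probs = {m: s / total for m, s in scaled.items()}
        result.append(dict(probs))
    return result
```

With the slide's probabilities and an assumed P(Walk|Driving) much larger than P(Bike|Driving), the middle segment changes from Bike (62%) to Walk, which matches the intuition that people walk after leaving a car.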
![Page 28: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/28.jpg)
Understanding User Mobility - 5
The 3rd finding: users’ GPS logs imply road network
Use the location constraints and typical user behaviors as probabilistic cues
Being independent of the map information
M={Driving, Walk, Bike, Bus},
E.g., P(M0) = P(Driving); P(M3|M1)= P(Bus | Walk);
[Figure: pipeline from change points and start/end points, through graph building over nodes N1 to N8 and spatial indexing, to probability calculation, with edge probabilities such as P18(Mi), P85(Mi), P54(Mi) and transition probabilities such as P185(Mi|Mj), P854(Mi|Mj), P581(Mi|Mj), P458(Mi|Mj)]
![Page 29: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/29.jpg)
Understanding User Mobility - 6
| Method | AD | CP/P | CP/R |
| --- | --- | --- | --- |
| Velocity-based method | 0.49 | 0.15 | 0.58 |
| Advanced features (SR+HCR+VCR) | 0.65 | 0.25 | 0.72 |
| Velocity-based features + advanced features | 0.728 | 0.27 | 0.78 |
| EF + normal post-processing | 0.741 | 0.31 | 0.77 |
| EF + graph-based post-processing | 0.762 | 0.34 | 0.77 |
![Page 30: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/30.jpg)
Mining User Similarity Based on Location History - 1
Friend recommendation and personalized location recommendation
Motivation
First law of geography
Significance of user similarity in communities
Increasing availability of user-generated trajectories
Difficulties
How to uniformly model users’ location histories
How to measure user similarity
![Page 31: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/31.jpg)
Mining User Similarity Based on Location History - 2
Location history representation
Stay point detection
Hierarchical clustering
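Personal graph building
[Figure: GPS points P1 to P9 grouped into Stay Point 1 and Stay Point 2]
Latitude, Longitude, Time
P1: Lat1, Lngt1, T1
P2: Lat2, Lngt2, T2
…
Pn: Latn, Lngtn, Tn
[Figure: hierarchical clustering of stay points into Layers 1 to 3, with clusters A, B on an upper layer and a, b, c, d, e on a lower layer, and the user's personal graphs G1, G2, G3 built on the layers]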
![Page 32: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/32.jpg)
Mining User Similarity Based on Location History - 3
Similar sequences
Same visiting order: ai == bi
Similar transition time:
Similarity estimation
The length of the matched similar sequence
The layer of the matched similar sequence
[Figure: hierarchical graphs (Layers 1 to 3, G1 to G3, high to low) of three users]
User 1: A B on the upper layer, a c e on the lower layer
User 2: A B on the upper layer, b d on the lower layer
User 3: A B on the upper layer, b c e on the lower layer
Matched sequences: User 1 and User 2 share A B; User 1 and User 3 share A B and c e
For User 1: User 3 > User 2
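The two similarity factors, the length of the matched sequence and the layer on which it is matched, can be illustrated with a simplified score. This sketch uses the longest common subsequence of cluster IDs per layer and a per-layer weight; it ignores the transition-time constraint, and the layer keys and weights are assumptions rather than the measure used in the original work.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two cluster-ID sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def similarity(user_a, user_b, layer_weight):
    """user_*: dict layer -> visited cluster sequence; layer_weight: dict layer -> weight."""
    score = 0.0
    for layer, weight in layer_weight.items():
        matched = lcs_length(user_a.get(layer, []), user_b.get(layer, []))
        if matched >= 2:  # a single shared cluster is not a sequence
            score += weight * matched
    return score
```

Giving the finer-grained layer a larger weight reflects the idea that sharing a sequence of small regions is stronger evidence of similarity than sharing a sequence of large ones.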
![Page 33: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/33.jpg)
Mining User Similarity Based on Location History - 4
[Charts: MAP, nDCG@5, and nDCG@10 of the compared methods]
![Page 34: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/34.jpg)
Mining Interesting Locations and Travel Sequences - 1
When people come to an unfamiliar city
What are the top interesting locations in this city?
How should I travel among these places (travel sequences)?
Difficulties
The interest level of a location depends not only on the number of users visiting it
but also on these users’ travel experiences
How to determine a user’s travel experience?
The location interest and user travel experience are region-related
and are relative values
![Page 35: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/35.jpg)
Mining Interesting Locations and Travel Sequences - 2
Mutual reinforcement relationship
A user with rich travel knowledge is more likely to visit more interesting locations
An interesting location would be accessed by many users with rich travel knowledge
A HITS-based inference model
Users are hub nodes
Locations are authority nodes
Topic is the geo-region
![Page 36: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/36.jpg)
Users: Hub nodes
Locations: Authority nodes
The HITS-based inference model
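The mutual reinforcement between user experience (hub scores) and location interest (authority scores) can be computed by power iteration. The sketch below handles a single geo-region and takes a hypothetical `visits` mapping as input; the function name and the fixed iteration count are assumptions.

```python
from math import sqrt


def hits(visits, iterations=50):
    """visits: dict user -> dict location -> visit count, for one geo-region."""
    users = list(visits)
    locations = sorted({loc for v in visits.values() for loc in v})
    hub = {u: 1.0 for u in users}
    auth = {loc: 1.0 for loc in locations}
    for _ in range(iterations):
        # A location is interesting if experienced users visit it.
        auth = {loc: sum(visits[u].get(loc, 0) * hub[u] for u in users) for loc in locations}
        norm = sqrt(sum(a * a for a in auth.values())) or 1.0
        auth = {loc: a / norm for loc, a in auth.items()}
        # A user is experienced if they visit interesting locations.
        hub = {u: sum(c * auth[loc] for loc, c in visits[u].items()) for u in users}
        norm = sqrt(sum(h * h for h in hub.values())) or 1.0
        hub = {u: h / norm for u, h in hub.items()}
    return hub, auth
```

Because the scores are relative to a region, a separate run per geo-region would be needed to compare locations within that region only.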
![Page 37: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/37.jpg)
Mining Interesting Locations and Travel Sequences - 3
Three factors determining the classical score of a sequence:
Travel experiences (hub scores) of the users taking the sequence
The location interests (authority scores), weighted by
the probability that people would take a specific sequence
The classical score of sequence AC:
$$S_{AC} = \sum_{u_k \in U_{AC}} \left( a_A \cdot Out_{AC} + a_C \cdot In_{AC} + h_k \right)$$
$a_A$: Authority score of location A
$a_C$: Authority score of location C
$h_k$: User k’s hub score
[Figure: graph of locations A, B, C, D, E with the number of users on each edge]
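The classical score of a sequence from A to C can be computed directly once the authority scores, the hub scores of the users who took the sequence, and the two weights are known. In this sketch, `out_prob` and `in_prob` are assumed to be the share of departures from A that go to C and the share of arrivals at C that come from A; all numbers in the usage note are hypothetical.

```python
def transition_share(count, total):
    """Share of transitions taking a given edge; 0.0 when there are none."""
    return count / total if total else 0.0


def classical_score(a_from, a_to, out_prob, in_prob, hub_scores):
    """Sum over the users who took the sequence of (a_A * Out_AC + a_C * In_AC + h_k)."""
    return sum(a_from * out_prob + a_to * in_prob + h for h in hub_scores)
```

For example, with a_A = 0.5, a_C = 0.4, 5 of 7 departures from A going to C, 5 of 8 arrivals at C coming from A, and two users with hub scores 0.3 and 0.2, the call is `classical_score(0.5, 0.4, transition_share(5, 7), transition_share(5, 8), [0.3, 0.2])`.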
![Page 38: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/38.jpg)
Mining Interesting Locations and Travel Sequences - 4
29 subjects
14 females and 15 males
have been in Beijing for more than 6 years
The test region:
specified by the fourth ring road of Beijing
Evaluated objects
top 10 interesting locations
top 5 classical travel sequences
![Page 39: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/39.jpg)
Mining Interesting Locations and Travel Sequences - 5
[Figure: evaluation framework for a geospatial region]
Top 10 interesting locations (C1, C2, …, C10)
Presentation: Representative Rating (0~10), Comprehensive Rating (1~5), Novelty Rating (0~10)
Rank: User Desirability Rating on each location (-1, 0, 1, 2), evaluated with nDCG & MAP
Top 5 classical travel sequences (Sq1, Sq2, …, Sq5)
User Desirability Rating on each sequence (-1, 0, 1, 2)

| Rating | Explanation (locations) |
| --- | --- |
| 2 | I’d like to plan a trip to that location. |
| 1 | I’d like to visit that location if passing by. |
| 0 | I have no feeling about this location, but don’t object to others visiting it. |
| -1 | This location is not worth visiting. |

| Rating | Explanation (travel sequences) |
| --- | --- |
| 2 | I’d like to plan a trip with this travel sequence. |
| 1 | I’d like to take that sequence if visiting the region. |
| 0 | I have no feeling about this sequence, but don’t object to others choosing it. |
| -1 | It is not a good choice to select this sequence. |
![Page 40: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/40.jpg)
Mining Interesting Locations and Travel Sequences - 6
Comparison on the presentation ability of different methods

| | Ours | Rank-by-count | Rank-by-frequency |
| --- | --- | --- | --- |
| Representative | 5.4 | 4.5 | 3.1 |
| Comprehensive | 4 | 3.4 | 2.3 |
| Novelty | 3.4 | 2.4 | 2.2 |

Ranking ability of different methods for locations

| | Ours | Rank-by-count | Rank-by-frequency |
| --- | --- | --- | --- |
| nDCG@5 | 0.823 | 0.714 | 0.598 |
| nDCG@10 | 0.943 | 0.848 | 0.859 |
| MAP | 0.759 | 0.532 | 0.365 |

Ranking ability of different methods for travel sequences

| | Ours (Interest + Experience) | Rank-by-counts | Rank-by-interest | Rank-by-experience |
| --- | --- | --- | --- | --- |
| Mean score | 1.6 | 1.2 | 1.4 | 1.5 |
| Classical Rate | 0.6 | 0.3 | 0.4 | 0.4 |
![Page 41: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/41.jpg)
Collaborative Activity and Location Recommendation
Location Recommendation
Question: I want to find nice food, where should I go?
Activity Recommendation
Question: I will visit the downtown, what can I do there?
Nice food!
Big sale!
![Page 42: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/42.jpg)
Data Modeling
User <-> Location <-> Activity
Activity: tourism
“User Vincent: We took a tour bus to see around along the Forbidden City moat …”
GPS: “39.903, 116.391, 14/9/2009 15:25”
Stay Region: “39.910, 116.400 (Forbidden City)”
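[Figure: the user-location-activity tensor (users Vincent, Alex, …); the entry for Vincent, Forbidden City, Tourism is incremented by 1]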
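Building the tensor amounts to counting (user, location, activity) triples extracted from commented GPS data. A minimal sketch, assuming the comments have already been mapped to a stay region and an activity label; the sparse dictionary representation and the function name are choices made for this example.

```python
from collections import defaultdict


def build_tensor(records):
    """records: iterable of (user, stay_region, activity) triples.

    Returns a sparse tensor as a dict keyed by the triple, with counts as values.
    """
    tensor = defaultdict(int)
    for user, location, activity in records:
        tensor[(user, location, activity)] += 1
    return dict(tensor)
```

A sparse dictionary fits this data well because only a small fraction of the possible entries is ever observed.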
![Page 43: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/43.jpg)
How to Do Recommendation?
If the tensor is full, then for each user:

| Vincent | Tourism | Exhibition | Shopping |
| --- | --- | --- | --- |
| Forbidden City | 5 | 4 | 1 |
| Bird’s Nest | 4 | 3 | 2 |
| Zhongguancun | 2 | 1 | 6 |

Location recommendation for Vincent
Tourism: Forbidden City > Bird’s Nest > Zhongguancun
Activity recommendation for Vincent
Forbidden City: Tourism > Exhibition > Shopping
Unfortunately, in practice, the tensor is usually sparse!
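With a full tensor, both recommendations reduce to sorting one user's location-activity matrix along a column or a row. In the sketch below, the matrix values are arranged so that they reproduce the two rankings stated on the slide; the variable and function names are hypothetical.

```python
# One user's location-activity slice of the tensor (illustrative values).
VINCENT = {
    "Forbidden City": {"Tourism": 5, "Exhibition": 4, "Shopping": 1},
    "Bird's Nest": {"Tourism": 4, "Exhibition": 3, "Shopping": 2},
    "Zhongguancun": {"Tourism": 2, "Exhibition": 1, "Shopping": 6},
}


def recommend_locations(matrix, activity):
    """Rank locations for a given activity (sort a column, best first)."""
    return sorted(matrix, key=lambda loc: matrix[loc][activity], reverse=True)


def recommend_activities(matrix, location):
    """Rank activities for a given location (sort a row, best first)."""
    row = matrix[location]
    return sorted(row, key=row.get, reverse=True)
```

When most entries are missing, these sorts have nothing to work with, which is why the missing values must be predicted first.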
![Page 44: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/44.jpg)
Our First Solution (WWW 2010)
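[Figure: the location-activity matrix is factorized together with a location-feature matrix and an activity-activity matrix to fill in its missing entries (?)]

| | Tourism | Exhibition | Shopping |
| --- | --- | --- | --- |
| Forbidden City | 5 | ? | ? |
| Bird’s Nest | ? | 1 | ? |
| Zhongguancun | 1 | ? | 6 |

User not explicitly modeled!
1. Not modeling each single user’s Loc-Act history
2. The location-activity matrix = a sum compression of our tensor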
![Page 45: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/45.jpg)
Our Second Solution
Regularized Tensor and Matrix Decomposition
[Figure: the user-location-activity tensor, with missing entries (?), decomposed together with the user-user, user-location, location-feature, and activity-activity matrices]
![Page 46: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/46.jpg)
Our Model
[Figure: the model, with factors labeled X, Y, Z]
![Page 47: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/47.jpg)
Location Feature Extraction
Location features: Points of Interest (POIs)
[Figure: map of the stay region with its POIs: three restaurants, one bank, one shopping mall]
Stay Region: “39.980, 116.306 (Zhongguancun)”
[restaurant, bank, shop] = [3, 1, 1]
TF-IDF style normalization*: feature = [0.13, 0.32, 0.18]
* TF-IDF (Term Frequency-Inverse Document Frequency). Example: assume that among 10 locations, 8 have restaurants (less distinguishing), while 2 have banks and 4 have shops:
tf-idf(restaurant) = (3/5)*log(10/8) = 0.13
tf-idf(bank) = (1/5)*log(10/2) = 0.32
tf-idf(shop) = (1/5)*log(10/4) = 0.18
Location-Feature Matrix: one row per location (Forbidden City, …, Zhongguancun) and one column per feature (restaurant, bank, …); the Zhongguancun row starts with 0.13, 0.32
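The TF-IDF style normalization can be checked with a few lines of code. This sketch uses the natural logarithm, which reproduces the slide's values; the function and parameter names are assumptions.

```python
from math import log


def tf_idf(counts, doc_freq, n_locations):
    """counts: POI category -> count in this stay region.
    doc_freq: POI category -> number of locations that contain the category.
    """
    total = sum(counts.values())
    return {
        feature: (count / total) * log(n_locations / doc_freq[feature])
        for feature, count in counts.items()
    }


features = tf_idf(
    {"restaurant": 3, "bank": 1, "shop": 1},
    {"restaurant": 8, "bank": 2, "shop": 4},
    n_locations=10,
)
```

Rounded to two decimals, `features` gives 0.13 for restaurant, 0.32 for bank, and 0.18 for shop: the bank is rare across locations, so a single bank counts for more than three restaurants.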
![Page 48: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/48.jpg)
Activity Correlation Extraction
How likely is one activity to happen if another activity happens?
Automatically mined from the Web, potentially useful when #(act) is large
Most mined correlations are reasonable. Example: “Tourism” with other activities.
[Charts: correlation of “Tourism” with Food, Sports, Movie, and Shopping (vertical axis from -0.1 to 0.6), from Web search (Bing) and from human design (average on 8 subjects)]
Web search example: “Tourism and Amusement” and “Food and Drink”
Correlation = h(1.16M), where h is a normalization function
Tourism-Shopping is more likely to happen together than Tourism-Sports
![Page 49: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/49.jpg)
Optimization
Minimize the objective function L(X, Y, Z, U)
Gradient descent
Complexity: $O(T \times (mnr + m^2 + r^2))$, where T is #(iteration), m is #(user), n is #(location), r is #(activity)
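The slide's objective function is not reproduced in the transcript, so the following only illustrates the general technique: gradient descent on a regularized squared error over the observed entries of a three-way tensor, with the prediction for entry (i, j, k) being the sum over d of X[i][d] * Y[j][d] * Z[k][d]. It omits the additional matrices of the actual model, and the function name, learning rate, and regularization weight are assumptions.

```python
import random


def fit_cp(entries, shape, dim=2, lr=0.05, lam=0.01, iters=200, seed=0):
    """entries: dict (i, j, k) -> observed value; shape: (n_users, n_locs, n_acts).

    Returns the factor matrices X, Y, Z and the loss recorded at each iteration.
    """
    rng = random.Random(seed)
    X, Y, Z = (
        [[rng.uniform(0.1, 0.5) for _ in range(dim)] for _ in range(size)]
        for size in shape
    )
    losses = []
    for _ in range(iters):
        # Each gradient starts with the L2 regularization term.
        gX, gY, gZ = ([[lam * v for v in row] for row in M] for M in (X, Y, Z))
        loss = 0.5 * lam * sum(v * v for M in (X, Y, Z) for row in M for v in row)
        for (i, j, k), value in entries.items():
            err = sum(X[i][d] * Y[j][d] * Z[k][d] for d in range(dim)) - value
            loss += 0.5 * err * err
            for d in range(dim):
                gX[i][d] += err * Y[j][d] * Z[k][d]
                gY[j][d] += err * X[i][d] * Z[k][d]
                gZ[k][d] += err * X[i][d] * Y[j][d]
        losses.append(loss)
        # Take one gradient step on every factor matrix.
        for M, G in ((X, gX), (Y, gY), (Z, gZ)):
            for row, grad_row in zip(M, G):
                for d in range(dim):
                    row[d] -= lr * grad_row[d]
    return X, Y, Z, losses
```

Each iteration touches only the observed entries, so its cost grows with the number of known values rather than with the full size of the tensor.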
![Page 50: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/50.jpg)
Experiments
Data
GeoLife data set
13K GPS trajectories, 140K km long
530 comments
After clustering, #(loc) = 168; #(user) = 164, #(act) = 5, #(loc_fea) = 14
The user-loc-act tensor has 1.04% of the entries with values
Evaluation
Ranking over the hold-out test dataset
Metrics: Root Mean Square Error (RMSE)
Normalized discounted cumulative gain (nDCG)
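Both metrics are short to compute. The sketch below assumes the common nDCG variant with gain 2^rel - 1 and a log2 position discount; the slides do not state which variant was used, and the function names are hypothetical.

```python
from math import log2, sqrt


def rmse(predicted, truth):
    """Root mean square error between two equally long value lists."""
    return sqrt(sum((p - t) ** 2 for p, t in zip(predicted, truth)) / len(truth))


def dcg(relevances):
    """Discounted cumulative gain of relevance values listed in ranked order."""
    return sum((2 ** rel - 1) / log2(pos + 2) for pos, rel in enumerate(relevances))


def ndcg(ranked_relevances, k=None):
    """nDCG@k: DCG of the produced ranking divided by the DCG of the ideal ranking."""
    k = k or len(ranked_relevances)
    ideal = dcg(sorted(ranked_relevances, reverse=True)[:k])
    return dcg(ranked_relevances[:k]) / ideal if ideal > 0 else 0.0
```

RMSE measures how close the predicted values are to the held-out ones, while nDCG measures whether the most relevant locations or activities are ranked near the top.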
![Page 51: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/51.jpg)
Baselines – Category I
Tensor -> Independent matrices [Herlocker et al. 1999]
Baseline 1: UCF (user-based CF)
CF on each user-loc matrix + Top N similar users for weighted average
Baseline 2: LCF (location-based CF)
CF on each loc-act matrix + Top N similar locations for weighted average
Baseline 3: ACF (activity-based CF)
CF on each loc-act matrix + Top N similar activities for weighted average
[Figure: the tensor split into independent matrices for UCF, LCF, and ACF]
![Page 52: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/52.jpg)
Baselines – Category II
Tensor-based CF
Baseline 4: ULA (unifying user-loc-act CF) [Wang et al. 2006]
Top Nu similar users, top Nl similar loc’s, top Na similar act’s
Similarities from additional matrices + Small cube for weighted average
Baseline 5: HOSVD (high order SVD) [Symeonidis et al. 2008]
Singular value decomposition with matrix unfolding
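[Figure: ULA takes a small cube of the top Nu users, Nl locations, and Na activities, with similarities from the user-user, loc-fea, and act-act matrices; HOSVD decomposes the unfolded tensor]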
![Page 53: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/53.jpg)
Comparison with Baselines
Reported in “mean ± std”
[Herlocker et al. 1999]
[Wang et al. 2006] [Symeonidis et al. 2008]
![Page 54: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/54.jpg)
Comparison with Our First Solution
Current user-centric solution vs. previous generic solution
Performance

| Metric | Current solution | Previous solution |
| --- | --- | --- |
| RMSE | 0.006 ± 0.001 | 0.041 ± 0.006 |
| nDCGloc | 0.576 ± 0.043 | 0.552 ± 0.027 |
| nDCGact | 0.931 ± 0.009 | 0.885 ± 0.019 |
![Page 55: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/55.jpg)
Impacts of the Model Parameters
Some observations
Using additional info (i.e. λi > 0) is better than not using it (i.e. λi = 0)
The model is not very sensitive to most parameters: it is robust, but the contribution from additional info is limited
As λ2 increases, nDCG for loc recommendation decreases greatly, perhaps because the loc-feature matrix is noisy in extracting the POIs
The loc-feature matrix is not directly related to act, so there is no similar observation for act recommendation
![Page 56: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/56.jpg)
Collaborative Activity and Location Recommendation
We showed how to mine knowledge from GPS data to answer
If I want to do something, where should I go?
If I visit some place, what can I do there?
We evaluated our system on a large GPS dataset
19% improvement on location recommendation
22% improvement on activity recommendation
over the simple memory-based CF baseline (i.e. UCF, LCF, ACF)
![Page 57: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/57.jpg)
Homework (1)
Write your own application which uses GeoLife dataset
http://research.microsoft.com/en-us/downloads/b16d359d-d164-469e-9fd4-daa38f2b2e13/
![Page 58: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/58.jpg)
TreasureMap: A Game Based Approach to Assign Geographical Relevance to Web Images
Highly relevant Lowly relevant
Captain Sailor Captain
Birds’ Net
![Page 59: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/59.jpg)
Game Logs
| Name | Description | Example |
| --- | --- | --- |
| Selected city | Name of selected city | Beijing |
| Selected location | Name of selected location | Summer Palace |
| Selected image | Selected image ID | 5514 |
| Guessed location | Latitude and longitude of Sailor’s guessed point | (40.022, 116.200) |
| Start time | Date and time the countdown started | 2008/09/08 22:02:33 |
| End time | Date and time Sailor clicked a location on the map | 2008/09/08 22:02:41 |
| Actions | Information on Captain’s clicks | (fields listed in the next table) |

Fields of each action:

| Name | Description | Example |
| --- | --- | --- |
| Clicked point | X, Y coordinates of Captain’s click on the image | (138, 87) |
| Clicked time | Date and time of Captain’s click | 2008/09/08 22:02:37 |
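To make the log fields concrete, here is a small sketch that parses the example record and derives two quantities used later: the time Sailor needed to guess and the discrepancy distance of the guess. The dictionary layout and the ground-truth coordinate for Summer Palace are assumptions for illustration; the slides do not give either.

```python
from datetime import datetime
from math import asin, cos, radians, sin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

# Example record from the table (field names are hypothetical).
log = {
    "city": "Beijing",
    "location": "Summer Palace",
    "image_id": 5514,
    "guess": (40.022, 116.200),
    "start": "2008/09/08 22:02:33",
    "end": "2008/09/08 22:02:41",
    "actions": [{"point": (138, 87), "time": "2008/09/08 22:02:37"}],
}

fmt = "%Y/%m/%d %H:%M:%S"
guess_seconds = (datetime.strptime(log["end"], fmt)
                 - datetime.strptime(log["start"], fmt)).total_seconds()

# Assumed approximate coordinate of the correct location (not from the slides).
true_location = (40.000, 116.275)
discrepancy_km = haversine_km(*log["guess"], *true_location)
```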
![Page 60: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/60.jpg)
Experimental Results
Released our game prototype for internal study in Microsoft Research Asia.
Test duration: three weeks in the summer of 2008
Participants: 147 interns who are undergraduates or graduate students with computer science background.
Data set
50 locations in Beijing and 30 locations in Shanghai
1641 images were collected by using Live Image Search
2761 game logs were obtained
![Page 61: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/61.jpg)
Similar Landmark Detection
Co-location pattern mining based on players’ guesses for each location
By enlarging the game to world-based, we can derive similar locations in the world
Help travel recommendation services
Zhengyang-gate
Desheng-gate Archery Tower
![Page 62: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/62.jpg)
Image Relevance
Data to assess geographical relevance
Number of times of image selection
Average number of clicks
Average discrepancy distance between players’ guesses and the correct location
No correlation between number of times of image selection and average discrepancy distance
Average discrepancy distance is affected by players’ knowledge of locations.
![Page 63: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/63.jpg)
Image Relevance
![Page 64: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/64.jpg)
Image Relevance
![Page 65: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/65.jpg)
Image Region Relevance
(a) Original image (b) Saliency map (c) Neutral heat map (d) Weighted heat map
![Page 66: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/66.jpg)
Re-ranking Image Search Results
Propose methods based on game logs to determine geographical relevance to images
Normalized number of times of image selection (Freq)
Normalized average number of clicks (Click)
Normalized average discrepancy distance (Dist)
Each value ranges from 0 to 1; the larger the value, the higher the geographical relevance
$$\text{Freq} = \frac{\text{Number of times of image selection}}{\text{Max. number of times of image selection on the location}}$$

$$\text{Click} = \frac{\text{Min. average number of clicks on the location}}{\text{Average number of clicks on an image}}$$

$$\text{Dist} = \frac{\text{Min. average discrepancy distance on the location}}{\text{Average discrepancy distance of an image}}$$
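Read as ratios, the three measures are: Freq is an image's selection count over the maximum selection count at its location, while Click and Dist are the location's minimum average over the image's own average. A minimal sketch under that reading follows; the input layout and function name are hypothetical, and it assumes every image has a positive average click count and distance.

```python
def relevance_scores(stats):
    """Compute Freq, Click, and Dist for all images of one location.

    stats: {image_id: (times_selected, avg_clicks, avg_distance)} (assumed layout).
    """
    max_selected = max(s[0] for s in stats.values())
    min_clicks = min(s[1] for s in stats.values())
    min_distance = min(s[2] for s in stats.values())
    return {
        image: {
            "Freq": selected / max_selected,   # most-selected image gets 1
            "Click": min_clicks / clicks,      # fewest clicks gets 1
            "Dist": min_distance / distance,   # smallest discrepancy gets 1
        }
        for image, (selected, clicks, distance) in stats.items()
    }
```

Each score lands in (0, 1], so the three measures can be compared or combined directly when re-ranking.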
![Page 67: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/67.jpg)
Evaluation
Selected 15 frequently selected locations and assigned ground-truth geographical relevance to images
4 (relevant) to 0 (irrelevant)
Calculate and compare the normalized discounted cumulative gain (NDCG) with Live Image Search.
$$\mathrm{NDCG} = \frac{1}{f(n)} \left( rel_1 + \sum_{i=2}^{n} \frac{rel_i}{\log_2 i} \right)$$

rel_i: relevance value of the i-th image
n: number of images
f(n): normalization factor
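A small sketch of computing NDCG for one ranked list of images. It assumes the DCG form rel_1 plus the sum of rel_i / log2(i) for i >= 2, and it takes the normalization factor to be the DCG of the ideal (descending) ordering; both are assumptions about the slide's formula rather than something it states explicitly.

```python
from math import log2

def dcg(rels):
    """Discounted cumulative gain: rel_1 + sum over i >= 2 of rel_i / log2(i)."""
    return sum(r if i == 1 else r / log2(i) for i, r in enumerate(rels, start=1))

def ndcg(rels):
    """DCG of the given ranking divided by the DCG of the ideal ranking (assumed f(n))."""
    ideal = dcg(sorted(rels, reverse=True))
    return dcg(rels) / ideal if ideal > 0 else 0.0
```

Here `rels` holds the ground-truth grades (4 for relevant down to 0 for irrelevant) in the order a method ranked the images.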
![Page 68: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/68.jpg)
Result
Freq performs the best
Its player interaction is simpler than in the other two methods
It sometimes shows a drop in NDCG
Click achieves more stable performance
Combine Freq and Click to improve the performance

| Method | Average NDCG | Improvement over Live Image Search |
| --- | --- | --- |
| Freq | 0.9244 | 0.1445 |
| Click | 0.8547 | 0.0748 |
| Dist | 0.8344 | 0.0545 |
| Live Image Search | 0.7799 | |
![Page 69: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/69.jpg)
Mining City Landmarks from Photos
![Page 70: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/70.jpg)
System Framework
![Page 71: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/71.jpg)
View Generation by Visual Clustering
[Figure: two-layer example. Scene layer: Beijing scenes such as Lama Temple, Summer Palace, Temple of Heaven, Tiananmen, Forbidden City, and Tsinghua University, with photos from users Mary, James, and Jerry. View layer: Tiananmen split into View 1 (Tiananmen Gate) and View 2 (Tiananmen Square).]
Near-duplicated visual clustering in view generation: candidate views (View 1 to View 4) are merged or discarded, controlled by a merge threshold t and a discard threshold M
![Page 72: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/72.jpg)
Mining Landmarks by Graph Modeling
[Figure: three-layer graph with author nodes A1 to A3, photo nodes P1 to P7, and scene nodes S1 and S2, connected by authority links and photo links (content & context associations)]
PhotoRank is conducted in the photo node layer
The author node layer owns the hub weight, as in HITS
The scene node layer owns the authority weight, as in HITS
A HITS-like process is conducted in the author and scene layers and affects the PhotoRank iteratively
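For readers unfamiliar with HITS, the sketch below runs the plain hub/authority iteration between authors (hubs) and scenes (authorities). It is a simplification: the coupling with PhotoRank in the photo layer is left out, and the `links` matrix encoding is an assumption.

```python
import numpy as np

def hits(links, iters=50):
    """Plain HITS on a bipartite author-scene graph.

    links[a, s] = 1.0 if author a has photos of scene s (assumed encoding).
    Returns (hub weights per author, authority weights per scene).
    """
    hubs = np.ones(links.shape[0])
    auth = np.ones(links.shape[1])
    for _ in range(iters):
        # A scene is authoritative if good hubs (authors) point to it.
        auth = links.T @ hubs
        auth /= np.linalg.norm(auth)
        # An author is a good hub if they cover authoritative scenes.
        hubs = links @ auth
        hubs /= np.linalg.norm(hubs)
    return hubs, auth
```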
![Page 73: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/73.jpg)
Experimental Results
[Figure: results on a 0.00 to 1.00 scale for Content Clustering, Content & Context Clustering, Content PhotoRank, Block-based Content PhotoRank, and Content & Context PhotoRank]
Top ranked landmarks at worldwide scale by Landmark-HITS, per group of blog users
Asian: 1. Summer Palace (Beijing), 2. Sydney Opera House (Sydney), 3. Louvre Museum (Paris), 4. Tiananmen (Beijing), 5. Tokyo Tower (Tokyo), 6. Universal Studios (L.A.), 7. Oriental Pearl (Shanghai), 8. Tower of London (London), 9. Empire State Building (New York), 10. Statue of Liberty (New York)
European: 1. Sydney Opera House (Sydney), 2. Louvre Museum (Paris), 3. London Museum (London), 4. Summer Palace (Beijing), 5. Tower of London (London), 6. Empire State Building (New York), 7. Statue of Liberty (New York), 8. Oriental Pearl (Shanghai), 9. Tokyo Tower (Tokyo), 10. Universal Studios (L.A.)
American: 1. Statue of Liberty (New York), 2. Universal Studios (L.A.), 3. Sydney Opera House (Sydney), 4. Empire State Building (New York), 5. Louvre Museum (Paris), 6. Space Needle (Seattle), 7. Summer Palace (Beijing), 8. CN Tower (Toronto), 9. Tokyo Tower (Tokyo), 10. Oriental Pearl (Shanghai)
![Page 74: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/74.jpg)
Mining Trip Knowledge from Geo-tagged Photos
Trace people’s trips from geo-tagged photo collections
Photo trip patterns:
Sequence of visited cities and durations of stay
Typical description of trips represented by tags
Classify photo trip patterns based on their trip themes
Example: Milano (Duomo, La Scala) -> Venice (Gondola, San Marco) -> Pisa (Leaning Tower), with durations of 1 day and 2 days
![Page 75: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/75.jpg)
Photo Trip Pattern Mining: Segmentation
Detect changes of trips based on captured time gaps, distance between photos, and tags
[Figure: photo timeline over Paris, Barcelona, London, and Honolulu, with tag groups (club, beer, friends, chips), (Eiffel Tower, Notre-Dame, Louvre), (Sagrada Familia, Picasso), and (honeymoon, wedding, Honolulu, bay)]
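A deliberately simplified sketch of the segmentation idea: walk through a user's photos in time order and start a new trip when the time gap or the jump in distance is too large. The thresholds `max_gap` and `max_km` are hypothetical, and the tag-based cue mentioned on the slide is not modeled here.

```python
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) pairs."""
    lat1, lon1, lat2, lon2 = map(radians, (*p, *q))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

def segment_trips(photos, max_gap=timedelta(days=3), max_km=2000.0):
    """Split a time-sorted list of (timestamp, (lat, lon)) photos into trips.

    max_gap and max_km are illustrative thresholds, not values from the slides.
    """
    trips = []
    for photo in photos:
        if trips:
            prev_time, prev_pos = trips[-1][-1]
            same_trip = (photo[0] - prev_time <= max_gap
                         and haversine_km(prev_pos, photo[1]) <= max_km)
        else:
            same_trip = False
        if same_trip:
            trips[-1].append(photo)   # continue the current trip
        else:
            trips.append([photo])     # open a new trip
    return trips
```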
![Page 76: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/76.jpg)
Photo Trip Pattern Mining: Classification
Classify photo trips into categories by SVMs
Landmark/Nature/Gourmet/Event/Business/Local
Features: tags and locations
[Figure: timeline over Paris, Barcelona, London, and Honolulu, with trips labeled Landmark, Local, and Event]
![Page 77: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/77.jpg)
Trip Pattern Mining for Trip Classes
Apply the TAS (Temporally Annotated Sequences) mining algorithm
Input: Set of trips extracted from all users
Output: Frequent trip patterns, e.g., a set of visited cities and typical transition times.
Example: Paris -> Barcelona, 7 hrs
![Page 78: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/78.jpg)
Trip Semantic Identification
[Figure: trip pattern Paris -> Barcelona (7 hrs) with candidate tags such as Sky, Paris, London-Eye, Summer, Duomo, 2007, Concert, Sforza Castle, and Wedding; Notre-Dame, Louvre, and Picasso are marked as trip semantics]
![Page 79: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/79.jpg)
Trip Semantic Identification
Detect descriptive tags for each trip pattern
TF/IDF based method
Tag frequency, inverse tag frequency
User frequency
Consider geographical scale of tags to exclude locally/globally common tags
“shop”: globally common tags
“Beijing,” “BJ”: locally common tags
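The TF/IDF part can be sketched as follows: score each tag by how frequent it is within one trip pattern and how rare it is across patterns, then keep the top-scoring tags. This is a minimal illustration with hypothetical names; user frequency and the geographical-scale filter from the slide are not modeled, and every pattern is assumed to have at least one tag.

```python
from collections import Counter
from math import log

def descriptive_tags(pattern_tags, top_k=3):
    """Pick the top_k TF-IDF tags for each trip pattern.

    pattern_tags: {pattern_id: list of tags from photos in that pattern} (assumed layout).
    """
    n_patterns = len(pattern_tags)
    # Document frequency: in how many patterns does each tag occur?
    df = Counter(tag for tags in pattern_tags.values() for tag in set(tags))
    result = {}
    for pattern, tags in pattern_tags.items():
        tf = Counter(tags)
        scores = {t: (c / len(tags)) * log(n_patterns / df[t]) for t, c in tf.items()}
        # Highest score first; ties broken alphabetically for determinism.
        result[pattern] = sorted(scores, key=lambda t: (-scores[t], t))[:top_k]
    return result
```

A tag that appears in every pattern, such as a globally common "shop", gets an inverse frequency of zero and drops to the bottom.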
![Page 80: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/80.jpg)
Evaluation
Collected 5.7 million geo-tagged photos and conducted evaluation
72% precision and 85% recall for segmentation detection
79% accuracy for trip classification
Tags are the most dominant feature
Combination of tags and locations performed best
Locations can compensate for photos without tags
![Page 81: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/81.jpg)
Examples
| Trip class | Trip pattern | Trip semantics |
| --- | --- | --- |
| Landmark | {Paradise, Las Vegas} | casinos, VMA, Bellagio, The Strip, WYNN |
| Nature | {Sydney, Randwick} | blue sky, barbed wire, inner, bay, Manly |
| Gourmet | {Camberwell, Melbourne} | cookie, spoon, rice, Colonial hotel, DJ |
| Event | {Washington D.C, Arlington} | mountain biking, WW, Wednesdays at Wakefield, mountain bike race, racing |
| Business | {Jersey City, New York, Jersey City} | comedians, MSN, live.com, Steve Kelley, Yahoo |
| Local | {Boston, Cambridge} | ants, mall, hospital, highway, living room |
![Page 82: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/82.jpg)
Homework (2)
Write an algorithm to decide the location of a photo on Weibo
Example input (URL to a weibo with a photo):
http://weibo.com/1649152460/eAfCrKUfDuX
Example output:
贵州省纳雍县羊场乡
![Page 83: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/83.jpg)
T-Drive: Driving Directions Based on Taxi Traces
![Page 84: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/84.jpg)
t =7:00am
t = 8:30am
Q = (𝑞𝑠, 𝑞𝑑, t)
![Page 85: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/85.jpg)
Background
Shortest path and Fastest path (speed constraints)
Real-time traffic analysis Methods
Road sensors
Visual-based (camera)
Floating car data
Open challenges: coverage, accuracy,…
Have not been integrated into routing
Traffic light
parking
Human factor
![Page 86: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/86.jpg)
Background
What does a driver really need?
Finding driving directions >> traffic analysis
Sensor Data
Traffic Estimation (Speed)
Driving Directions
Physical Routes Traffic flows Drivers
![Page 87: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/87.jpg)
Observations
A big city with traffic problems usually has many taxis
Beijing has 70,000+ taxis equipped with GPS sensors
They send (geo-position, time) to a management center
![Page 88: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/88.jpg)
Motivation
Human Intelligence Traffic patterns
![Page 89: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/89.jpg)
Challenges we face
Intelligence modeling
Data sparseness
Low sampling rate
![Page 90: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/90.jpg)
Methodology
Pre-processing
Building landmark graph
Estimate travel time
Time-dependent two-stage routing
[Figure: taxi trajectories and a road network are used to build a time-dependent landmark graph, which feeds rough routing followed by refined routing]
![Page 91: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/91.jpg)
Step 1: Pre-processing
Trajectory segmentation
Find effective trips with passengers inside a taxi
A tag generated by the taxi meter
Map-matching
Map a GPS point to a road segment
IVMM method (accuracy 0.8, <3min)
[Figure: example with road segments e1 to e4 (with e3.start and e3.end), vertices Vi and Vj, labels R1 to R4, and points a and b]
![Page 92: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/92.jpg)
Step 2: Building landmark graphs
Detecting landmarks
A landmark is a frequently traversed road segment
Top k road segments, e.g. k=4
Establishing landmark edges
Number of transitions between two landmarks > 𝛿
E.g., 𝛿 = 1
[Figure: A) Matched taxi trajectories Tr1 to Tr5 over road segments r1 to r10; B) Detected landmarks r1, r3, r6, and r9; C) A landmark graph with edges e13, e16, e63, e93, and e96]
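The two rules of this step (top k road segments as landmarks, an edge when transitions exceed 𝛿) can be sketched as below. The input format, a list of map-matched road-segment ID sequences, and the function name are assumptions; any time constraint on transitions is omitted.

```python
from collections import Counter

def build_landmark_graph(trajectories, k, delta):
    """Detect landmarks and landmark edges from map-matched trajectories.

    trajectories: list of road-segment ID sequences (assumed input format).
    Returns (set of landmark segments, {(u, v): transition count}).
    """
    # A landmark is one of the k road segments traversed by the most trajectories.
    seg_counts = Counter(seg for traj in trajectories for seg in set(traj))
    landmarks = {seg for seg, _ in seg_counts.most_common(k)}

    # Count transitions between consecutive landmarks within each trajectory.
    transitions = Counter()
    for traj in trajectories:
        visited = [seg for seg in traj if seg in landmarks]
        for u, v in zip(visited, visited[1:]):
            if u != v:
                transitions[(u, v)] += 1

    # Keep an edge only if its transition count exceeds delta.
    edges = {edge: count for edge, count in transitions.items() if count > delta}
    return landmarks, edges
```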
![Page 93: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/93.jpg)
Step 3: Travel time estimation
The travel time of a landmark edge
Varies with the time of day
Is not a Gaussian distribution
Looks like a set of clusters
A time-based single-valued function is not a good choice
Data sparseness
Loses information related to drivers
Different landmark edges have different time-variant patterns
Cannot use predefined time splits
VE-Clustering
Cluster samples according to variance
Split the time line in terms of entropy
![Page 94: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/94.jpg)
Step 3: Travel time estimation
V-Clustering
Sort the transitions by their travel times
Find the best split points on the Y axis in a binary-recursive way
E-Clustering
Represent a transition with a cluster ID
Find the best split points on the X axis iteratively
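A minimal sketch of the V-Clustering half: sort the travel times, find the split that minimizes the size-weighted variance of the two parts, and recurse while the variance reduction is large enough. The stopping threshold `min_gain` is a hypothetical parameter, and the entropy-based E-Clustering on the time axis is not shown.

```python
import numpy as np

def best_split(values):
    """Index that minimises the size-weighted variance of the two parts."""
    best_i, best_wav = None, float("inf")
    for i in range(1, len(values)):
        left, right = values[:i], values[i:]
        wav = (len(left) * np.var(left) + len(right) * np.var(right)) / len(values)
        if wav < best_wav:
            best_i, best_wav = i, wav
    return best_i, best_wav

def v_clustering(times, min_gain=1.0):
    """Recursively split sorted travel times into clusters.

    min_gain: minimum variance reduction required to accept a split (assumption).
    """
    values = np.sort(np.asarray(times, dtype=float))
    if len(values) < 2:
        return [values.tolist()]
    i, wav = best_split(values)
    if np.var(values) - wav < min_gain:
        return [values.tolist()]
    return v_clustering(values[:i], min_gain) + v_clustering(values[i:], min_gain)
```

Each resulting cluster would then get an ID, which is the representation E-Clustering works with.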
![Page 95: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/95.jpg)
Step 4: Two-stage routing
Rough routing Search a landmark graph for
A rough route: a sequence of landmarks
Based on a user query (𝑞𝑠, 𝑞𝑑, t, 𝛼)
Using a time-dependent routing algorithm
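[Figure: landmark graph from qs to qd with landmark edges e12 (r1 to r2) and e34 (r3 to r4); connecting links are labeled 0.1, and the time-dependent costs are C12(0.1)=2, C34(0.1)=1, C12(1.1)=1, C34(1.1)=2]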
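One way to picture a time-dependent routing algorithm is Dijkstra's algorithm with edge costs that are functions of the entry time. The sketch below is a generic version under assumed names, not the exact algorithm of the talk; it also assumes that entering an edge later never lets you arrive earlier (the FIFO property).

```python
import heapq

def time_dependent_route(graph, source, target, depart):
    """Earliest-arrival route on a graph with time-dependent edge costs.

    graph[u] = list of (v, cost_fn), where cost_fn(t) is the travel time
    when entering the edge at time t (assumed representation).
    Returns (path, arrival time), or (None, inf) if target is unreachable.
    """
    best = {source: depart}
    prev = {}
    heap = [(depart, source)]
    while heap:
        t, u = heapq.heappop(heap)
        if u == target:
            break
        if t > best.get(u, float("inf")):
            continue  # stale queue entry
        for v, cost_fn in graph.get(u, []):
            arrival = t + cost_fn(t)
            if arrival < best.get(v, float("inf")):
                best[v] = arrival
                prev[v] = u
                heapq.heappush(heap, (arrival, v))
    if target not in best:
        return None, float("inf")
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], best[target]
```

With costs that change over the day, the same source and destination can yield different rough routes for different departure times.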
![Page 96: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/96.jpg)
Step 4: Two-stage routing
Refined routing Find out the fastest path connecting the consecutive landmarks
Can use speed constraints
Dynamic programming
Very efficient Smaller search spaces
Computed in parallel
[Figure: A) A rough route from qs through landmarks r2, r4, r5, and r6 to qe; B) The refined routing over the start and end points of each landmark, with travel-time weights; C) A fastest path]
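The dynamic-programming step can be sketched as a shortest path through layers: each step of the rough route offers a few candidate points, and the best time to reach a point depends only on the best times of the previous layer. The `stages` and `cost` inputs are hypothetical; in practice the costs would come from fastest-path searches between consecutive landmarks.

```python
def refine_route(stages, cost):
    """Fastest way through a layered route by dynamic programming.

    stages: list of lists of candidate points, one list per step of the rough route.
    cost:   {(a, b): travel time} between points of consecutive stages;
            missing pairs are treated as unreachable (assumed encoding).
    Returns (total travel time, chosen path).
    """
    inf = float("inf")
    best = {p: (0.0, [p]) for p in stages[0]}
    for layer in stages[1:]:
        nxt = {}
        for b in layer:
            # Extend every best path of the previous layer to b and keep the fastest.
            options = [(t + cost.get((a, b), inf), path + [b])
                       for a, (t, path) in best.items()]
            nxt[b] = min(options, key=lambda o: o[0])
        best = nxt
    return min(best.values(), key=lambda o: o[0])
```

Because each pair of consecutive layers only needs its own costs, those costs can be computed independently, which matches the slide's point about small search spaces and parallel computation.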
![Page 97: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/97.jpg)
Implementation & Evaluation
6-month real dataset of 30,000 taxis in Beijing
Total distance: almost 0.5 billion (446 million) km
Number of GPS points: almost 1 billion (855 million)
Average time interval between two points is 2 minutes
Average distance between two GPS points is 600 meters
Evaluating landmark graphs
Evaluating the suggested routes
Using synthetic queries
In-the-field studies
![Page 98: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/98.jpg)
Evaluating landmark graphs
Estimate travel time with a landmark graph
Using real-user trajectories
30 users’ driving paths in 2 months
GeoLife GPS trajectories (released)
[Figure: landmark graphs for K=500, K=2000, and K=4000]
![Page 99: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/99.jpg)
Evaluating landmark graphs
Accurately estimate the travel time of a route
10 taxis/km² is enough
![Page 100: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/100.jpg)
Synthetic queries
Baselines Speed-constraints-based method (SC)
Real-time traffic-based method (RT)
Measurements FR1, FR2 and SR
Using SC method as a basis
![Page 101: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/101.jpg)
In the field study
Evaluation 1
The same drivers traverse different routes at different times
Evaluation 2
Two different users with similar driving skills
Traverse two routes simultaneously
![Page 102: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/102.jpg)
Results
• More effective • 60-70% of the routes suggested by our method are faster than Bing and Google Maps.
• Over 50% of the routes are 20+% faster than Bing and Google.
• On average, we save 5 minutes per 30 minutes driving trip.
• More efficient
• More functional
![Page 103: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/103.jpg)
Conclusions
Build intelligence from the physical world
Activity/location recommendation based on GPS trajectories
Mining geo-tagged photos for travel recommendation
Driving directions based on taxi traces
Challenges and future directions
How to protect privacy?
How to support real-time information sharing and search?
How to reduce energy consumption?
![Page 104: 移动应用 普适计算 中的数据挖掘59.108.48.12/lcwm/course/WebDataMining/slides2011/第六... · 2011-10-18 · Ubiquitous Computing (1988) Mark Weiser (director, computer](https://reader033.fdocuments.net/reader033/viewer/2022041719/5e4d148981013e09b5127748/html5/thumbnails/104.jpg)
Thank you!
Xing Xie (谢幸)
Microsoft Research Asia, October 18, 2011