Exploring the Data Universe with Semantic Signatures: Plous Lecture 2015


Transcript of Exploring the Data Universe with Semantic Signatures: Plous Lecture 2015

Page 1: Exploring the Data Universe with Semantic Signatures: Plous Lecture 2015

Analogies Observatories Semantic Signatures Challenges Next Steps

Exploring the Data Universe with Semantic Signatures

Plous Lecture 2015

Krzysztof Janowicz, STKO Lab, University of California, Santa Barbara, USA

Exploring the Data Universe with Semantic Signatures K. Janowicz

Page 2

Pudding and Planets

Page 3

Analogies & Atoms

Plum Pudding

Page 4

Analogies & Atoms

Thomson’s Plum Pudding Model (1904)

Positive charge is distributed uniformly throughout the atom, with electrons embedded in it like raisins in a pudding

Page 5

Analogies & Atoms

Rutherford(-Bohr) Solar System Model (1911/13)

Small nucleus with a high mass and electrons that revolve around it

Page 6

Analogies & Atoms

Analogies

‘And I cherish more than anything the Analogies, my most trustworthy masters. They know all the secrets of Nature, and they ought to be least neglected in Geometry.’ (Johannes Kepler)

Analogies enable us to explore a new domain (target) by mapping its structure to another, more familiar domain (source). They allow us to ask new questions that only become meaningful in the new domain.

Page 7

Observatories and Sensors

Page 8

Astronomical Observatories And Their Sensors

The Griffith Observatory

Griffith donated funds and land to build the observatory to make astronomy accessible to the public. This was in clear contrast to the prevailing idea of locating observatories on remote mountaintops and restricting them to scientists. Today, our society is willing to invest billions to study phenomena that may not even exist anymore (e.g., the Pillars of Creation).

Page 9

Astronomical Observatories And Their Sensors

Observatories and Their Sensors

Whether on land or in space, observatories and their sensors serve different purposes and are most useful when they work together.

Page 10

Astronomical Observatories And Their Sensors

Spectral Signatures, Bands, and Remote Sensing

Spectral signatures are the combination of emitted, reflected, or absorbed electromagnetic radiation at varying wavelengths (bands) that uniquely identify a feature type.
Spectral libraries, the idea of sharing spectral signatures, have revolutionized remote sensing.

Page 11

A Universe of Data?

The Data Universe: Synthesis Is The New Analysis

What is the common core of the digital universe, physical-cyber-social systems, the digital earth, the 4th paradigm, big data, social machines, and so forth?
Synthesis is the new analysis
Observational science versus experimental science
(Unintended) reuse of existing data, semantic interoperability
Heterogeneity: multi-thematic, multi-perspective, multi-resolution

Page 12

A Universe of Data?

Towards Data Observatories

Web Science Trust: ‘A web observatory is a system that gives public access to some specific aspects of the WWW and provides the infrastructure and visualization techniques to support monitoring, analysis, and experiments.’
The Web Science Trust wants to establish a network of observatories.
New questions: are there laws of the data universe?

Page 13

Constructing the Analogy

Page 14

Semantic Signatures and Bands

Semantic Signatures As Analogy To Spectral Signatures

Geospatial bands: based on geographic location
ANND, Ripley’s K, Bins, J Measure, Dzero

Temporal bands: based on geo-social check-ins
24 hours, 7 days, seasons

Thematic bands: based on venue tips and reviews
LDA topics, TF-IDF

Makes use of data heterogeneity
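One way to picture a semantic signature built from these bands is as a set of named band vectors that are compared band by band. The sketch below assumes exactly that; the `Signature` layout, the band names, cosine similarity per band, and the optional per-band weights are all illustrative assumptions, not the method used in the lecture.

```python
import math
from typing import Dict, Optional, Sequence

# Hypothetical layout: band name -> band values, e.g. "temporal" -> 168 hour-of-week shares.
Signature = Dict[str, Sequence[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equally long band vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("bands must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def signature_similarity(s1: Signature, s2: Signature,
                         weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted mean of per-band cosine similarities over the bands both signatures share."""
    shared = sorted(set(s1) & set(s2))
    if not shared:
        raise ValueError("signatures share no bands")
    weights = weights or {}  # bands without an explicit weight count with weight 1.0
    total = sum(weights.get(band, 1.0) for band in shared)
    return sum(weights.get(band, 1.0) * cosine(s1[band], s2[band]) for band in shared) / total
```

Keeping the bands separate, rather than concatenating them into one vector, means that spatial statistics, hourly shares, and topic weights never have to be forced onto a common scale.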

Page 15

Thematic Bands

Thematic Bands & Geo-Indicativeness

Places at geographic location 34.43, -119.71 are:
of types city, county seat, ...
at the coastline, near the mountains, have a Mediterranean climate, ...
described in terms of urban area, economy, tourism, government, employment, ...

Interesting observation: some of these terms co-occur by type, others by region.

Page 16

Thematic Bands

Thematic Bands & Geo-Indicativeness

A thematic band can be computed from unstructured text from sources such as Wikipedia, travel blogs, news articles, and so forth.
Non-georeferenced plain text is often still geo-indicative.
Different types of geographic features have different, diagnostic topics associated with them (out of 500 topics).
Indicative topics can be lifted to the type level.
Here, we modeled topics using latent Dirichlet allocation (LDA).

Page 17

Thematic Bands

Thematic Bands & Geographic Feature Types

City topics: 204 > 450 > 104 > 282 > 267 > 497 > 443 > 484 > 277 > 97 > ...
Town topics: 425 > 450 > 419 > 367 > 104 > 429 > 266 > 69 > 204 > 308 > ...
Mountain topics: 27 > 110 > 5 > 172 > 208 > 459 > 232 > 398 > 453 > 183 > ...
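Ordered topic lists like these can be read as type-level thematic signatures. A minimal sketch, assuming LDA has already been run (for example with gensim or scikit-learn) and that each document comes with its feature type and its per-document topic distribution; lifting to the type level by a plain mean is an assumption here, and the function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def type_topic_signatures(docs: Sequence[Tuple[str, Sequence[float]]]) -> Dict[str, List[float]]:
    """Average the per-document topic distributions of all documents of each feature type."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for feature_type, theta in docs:
        acc = sums.setdefault(feature_type, [0.0] * len(theta))
        if len(acc) != len(theta):
            raise ValueError("all documents must use the same number of topics")
        for k, weight in enumerate(theta):
            acc[k] += weight
        counts[feature_type] = counts.get(feature_type, 0) + 1
    return {t: [v / counts[t] for v in acc] for t, acc in sums.items()}


def ranked_topics(signature: Sequence[float], top_n: int = 10) -> List[int]:
    """Topic ids by descending weight (ties broken by lower id), as in 'City topics: 204 > 450 > ...'."""
    return sorted(range(len(signature)), key=lambda k: (-signature[k], k))[:top_n]
```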

Page 18

Temporal Bands

Temporal Bands

Study geo-social check-in data from location-based social networks.
Aggregate them to the feature type level and clean them.
Intuitively, people visit wineries in the afternoon and evening, and bakeries in the morning.
Combine weekly and hourly bands to create place type signatures.
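A minimal sketch of combining the weekly and hourly bands into a single hour-of-week histogram, assuming the check-in timestamps are already in the venue's local time and that the check-ins of one place type have been collected and cleaned. The 168-bin layout and the name `temporal_signature` are illustrative choices.

```python
from datetime import datetime
from typing import Iterable, List

HOURS_PER_WEEK = 7 * 24  # weekly band x hourly band


def temporal_signature(check_ins: Iterable[datetime]) -> List[float]:
    """Share of check-ins per hour of the week; bin 0 is Monday 00:00-00:59 local time."""
    counts = [0] * HOURS_PER_WEEK
    total = 0
    for ts in check_ins:
        counts[ts.weekday() * 24 + ts.hour] += 1  # weekday(): Monday == 0, Sunday == 6
        total += 1
    if total == 0:
        raise ValueError("no check-ins for this place type")
    return [c / total for c in counts]
```

Aggregating over all venues of one type (for example, all wineries) before normalizing gives the type-level signature; a seasonal band would need an additional month or season dimension.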

Page 19

Spatial Bands

Spatial Bands

POI plotted by similarity to bar and post office in OpenStreetMap data (London).
Similarity is measured as association strength in the OSM change history.
Bars (and similar features) tend to clump together.
Post offices (and similar features) are rather uniformly distributed.
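The contrast between clumped bars and evenly spread post offices is what an average nearest neighbor distance (ANND) statistic picks up. The sketch below uses the Clark-Evans ratio, one common ANND formulation: the observed mean nearest-neighbor distance divided by the value expected under complete spatial randomness, 0.5 / sqrt(n / A). It assumes projected planar coordinates and ignores edge effects; it is not the similarity-based plot shown on the slide.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]  # projected x, y (e.g., meters)


def mean_nearest_neighbor_distance(points: Sequence[Point]) -> float:
    """Average distance from each point to its nearest other point (brute force, O(n^2))."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    total = 0.0
    for i, (x1, y1) in enumerate(points):
        total += min(math.hypot(x1 - x2, y1 - y2)
                     for j, (x2, y2) in enumerate(points) if j != i)
    return total / len(points)


def clark_evans_ratio(points: Sequence[Point], area: float) -> float:
    """Observed ANND divided by the ANND expected under complete spatial randomness."""
    observed = mean_nearest_neighbor_distance(points)  # raises for fewer than two points
    expected = 0.5 / math.sqrt(len(points) / area)
    return observed / expected
```

Values well below 1 suggest clustering (as expected for bars); values above 1 suggest a more regular spread (as expected for post offices).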

Page 20

Spatial Bands

Spatial Bands

Dzero measures the likelihood that features of a certain type co-occur within a specific semantic and spatial range.
General idea: generate recommendations and clean up data based on type likelihood. ‘How likely is a post office directly next to an existing one?’
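The definition of Dzero itself is not given here, so the following is only a crude, hypothetical stand-in for the post-office question: the share of features of one type that already have another feature of the same type within a radius. It assumes projected coordinates in meters; the `Feature` tuple layout and the function name are made up for illustration.

```python
import math
from typing import Sequence, Tuple

Feature = Tuple[str, float, float]  # hypothetical layout: (type, x, y) in projected meters


def same_type_neighbor_rate(features: Sequence[Feature], feature_type: str, radius: float) -> float:
    """Share of `feature_type` features that have another feature of that type within `radius`."""
    pts = [(x, y) for t, x, y in features if t == feature_type]
    if len(pts) < 2:
        return 0.0
    hits = sum(
        1
        for i, (x1, y1) in enumerate(pts)
        if any(math.hypot(x1 - x2, y1 - y2) <= radius
               for j, (x2, y2) in enumerate(pts) if j != i)
    )
    return hits / len(pts)
```

A low rate for post offices in existing data would make a newly added post office right next to an existing one a candidate for double-checking.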

Page 21

Sensor Resolution & Social Sensing

Sensor Resolution & Social Sensing

(Remote sensing) sensors can be characterized by their resolution

Spatial resolution: the smallest feature that can be detected, i.e., the pixel size.
Temporal resolution: the smallest time interval between repeated observations.
Spectral resolution: the number, position, and width of spectral bands.
Radiometric resolution: the smallest distinguishable difference in radiation magnitude.

Analogous resolutions apply to social sensors, e.g., the types of bands or the number of topics.

Page 22

Sensor Resolution & Social Sensing

Platial Resolution of Temporal Signatures

Circular temporal signature histograms for Theme Park (a, b, c) and Drugstore (d, e, f).
About 50% of the ≈ 400 Point of Interest (POI) types are regionally invariant in the USA.

Page 23

Sensor Resolution & Social Sensing

Temporal Resolution of Temporal Signatures

[Figure: check-ins by hour of the day, 1 to 24]

The ‘Foursquare-day’: How and when do people check in at places, manually or automatically?
Do they check out? If not, after what time are they checked out automatically?

Page 24

Sensor Resolution & Social Sensing

Distinguishable Feature Types for Thematic Signatures from 500 Topics

Which classes in a feature type schema can be meaningfully distinguished?

Page 25

Spatial Search Challenges

Page 26

From Space to Place Through Time

1. Challenge: Mapping User Locations from Spaces to Places

Page 27

From Space to Place Through Time

1. Challenge: Mapping User Locations from Spaces to Places

Estimate the place visited by a user from the user’s spatial location (e.g., as measured by their smartphone).

Page 28

From Space to Place Through Time

Baseline: Google Place API

Marker  Category                Distance (m)
A       Bakery                  39.2
B       Nightclub               41.4
C       Nightclub               69.9
D       American Restaurant     62.7
E       Bakery                  73.7
F       Fast Food               65.0
G       Apparel Store           85.8
H       Ice Cream Shop          82.6
I       Movie Theater           94.2
J       Pub                     88.9
K       Cosmetics Shop          60.9
L       Diner                   70.0
M       Italian Restaurant      45.7
N       Furniture / Home Store  114.9
O       Grocery Store           147.8
P       BBQ Joint               82.3
Q       Burrito Place           88.1
R       Italian Restaurant      93.6

Geolocation APIs map geographic coordinates, e.g., from a user’s smartphone, to an ordered set of nearby candidate POI.
These services typically return the n nearest POI within a certain radius and order them by their spatial distance to the provided coordinates.
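For comparison, the baseline behavior described above (the n nearest POI within a radius, ordered by distance) can be sketched locally. This does not call the Google Places API; the `POI` class, the 150 m default radius, and the haversine distance on a spherical Earth are assumptions for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical approximation


@dataclass
class POI:
    name: str
    category: str
    lat: float
    lon: float


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two latitude/longitude pairs given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def nearest_pois(lat: float, lon: float, pois: Sequence[POI],
                 n: int = 20, radius_m: float = 150.0) -> List[Tuple[float, POI]]:
    """Return up to n (distance, POI) pairs within radius_m, closest first."""
    scored = [(haversine_m(lat, lon, p.lat, p.lon), p) for p in pois]
    in_range = [(d, p) for d, p in scored if d <= radius_m]
    in_range.sort(key=lambda pair: pair[0])  # sort by distance only; POI objects are not orderable
    return in_range[:n]
```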

Page 29

From Space to Place Through Time

Our Approach: Distort POI Locations Using Temporal Signatures

Check-in probability of each category at two query times (×10⁻³):
Marker  Category                Distance (m)  Monday 10 AM  Saturday 11 PM
A       Bakery                  39.2          6.28          4.08
B       Nightclub               41.4          0.26          44.16
C       Nightclub               69.9          0.26          44.16
D       American Restaurant     62.7          1.61          9.50
E       Bakery                  73.7          6.28          4.08
F       Fast Food               65.0          4.80          5.78
G       Apparel Store           85.8          2.51          1.09
H       Ice Cream Shop          82.6          0.84          15.88
I       Movie Theater           94.2          1.44          11.00
J       Pub                     88.9          0.53          22.66
K       Cosmetics Shop          60.9          3.87          1.57
L       Diner                   70.0          5.49          7.56
M       Italian Restaurant      45.7          1.42          7.96
N       Furniture / Home Store  114.9         4.79          5.01
O       Grocery Store           147.8         4.53          1.38
P       BBQ Joint               82.3          0.43          9.35
Q       Burrito Place           88.1          0.54          3.16
R       Italian Restaurant      93.6          1.42          7.96

The likelihood of visiting a coffee shop, university, bakery, etc., at 7 pm is rather low, while 7 pm is a peak hour for restaurants.
In analogy to scale distortion in cartography, we can modify the purely spatial ranking by pulling places closer or pushing them away based on the check-in probability of their temporal type signatures.
Different distortion models: linear, non-linear, symmetric, non-symmetric.
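The pull-and-push idea can be illustrated with one hypothetical, non-linear distortion model: scale each distance by (p̄ / p)^α, where p is the check-in probability of the candidate's type at query time, p̄ is the mean over all candidates, and α ≥ 0 sets the strength (α = 0 gives the purely spatial ranking). This formula is an assumption for illustration; it is not the model that produced the distorted distances reported in the lecture.

```python
from typing import List, Sequence, Tuple

# Hypothetical layout: (marker, distance in m, check-in probability of the POI's type at query time)
Candidate = Tuple[str, float, float]


def distort(candidates: Sequence[Candidate], alpha: float = 0.5) -> List[Tuple[str, float]]:
    """Scale distances by (mean_p / p) ** alpha and re-rank; likely places are pulled closer."""
    if not candidates:
        return []
    probs = [p for _, _, p in candidates]
    if any(p <= 0 for p in probs):
        raise ValueError("check-in probabilities must be positive")
    # Only the ratio p_mean / p matters, so probabilities may use any common scale (e.g., x10^-3).
    p_mean = sum(probs) / len(probs)
    distorted = [(marker, d * (p_mean / p) ** alpha) for marker, d, p in candidates]
    return sorted(distorted, key=lambda pair: pair[1])
```

With the Saturday 11 PM probabilities from the table for markers A, B, and M and α = 0.5, the nightclub (B) moves ahead of the bakery (A), even though the bakery is closer.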

Page 30

From Space to Place Through Time

Our Approach: Distort POI Locations Using Temporal Signatures

Marker  Actual Dist. (m)  Distorted Dist. (m)
A       39.2              25.8
B       41.4              71.4
C       69.9              99.9
D       62.7              79.8
E       73.7              60.3
F       65.0              59.5
G       85.8              95.6
H       82.6              106.7
I       94.2              112.8
J       88.9              116.1
K       60.9              61.1
L       70.0              60.6
M       45.7              64.5
N       114.9             109.5
O       147.8             143.9
P       82.3              110.5
Q       88.1              115.2
R       93.6              112.4

Method               MRR    SRR    nDCG   1st Pos.
Distance-Only        0.359  443.8  0.583  211
Temporally Adjusted  0.453  793.5  0.711  423
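MRR and nDCG can be computed from the rank at which the place the user actually visited appears. The sketch assumes exactly one correct place per query, so the ideal DCG is 1 and nDCG reduces to 1 / log2(rank + 1); it also reads "1st Pos." as the number of queries with the correct place at rank 1. SRR is not reproduced because its definition is not given here.

```python
import math
from typing import Dict, Optional, Sequence


def evaluate_rankings(ranks: Sequence[Optional[int]]) -> Dict[str, float]:
    """ranks[i] = 1-based rank of the true place for query i, or None if it was not returned."""
    if not ranks:
        raise ValueError("no queries to evaluate")
    found = [r for r in ranks if r is not None]
    if any(r < 1 for r in found):
        raise ValueError("ranks are 1-based")
    mrr = sum(1.0 / r for r in found) / len(ranks)  # missing places contribute 0
    # Single relevant place per query: ideal DCG is 1, so nDCG = 1 / log2(rank + 1).
    ndcg = sum(1.0 / math.log2(r + 1) for r in found) / len(ranks)
    first = sum(1 for r in found if r == 1)
    return {"MRR": mrr, "nDCG": ndcg, "first_position": float(first)}
```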

Page 31

Vague Cognitive Regions: Where is SoCal?

2. Challenge: Vague Cognitive Regions

Where are SoCal and NorCal?

Page 32

Vague Cognitive Regions: Where is SoCal?

Baseline: Tests With Human Participants

44 participants, 90-hexagon tessellation (≈ 4,920 km² each)

Google Maps search for SoCal

[More on the extraction of polygons at [email protected]]

Page 33

Vague Cognitive Regions: Where is SoCal?

Data and Correlations

Source        SoCal    NorCal   Total
Flickr        22132    19706    41838
Instagram     169648   116984   286632
Twitter       10376    3294     13670
Travel Blogs  107      78       185
Wikipedia     1450     700      2150

[Figure: empirical cumulative distribution function (ECDF) of Flickr photo counts per user (x-axis 0 to 4,000)]

Source                    ρ (M1)  τ (M1)
Flickr                    0.881   0.721
Instagram                 0.867   0.711
Twitter                   0.874   0.714
Travel Blogs & Wikipedia  0.897   0.74
Means                     0.870   0.712
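ρ and τ are conventionally Spearman's and Kendall's rank correlation coefficients. Assuming that is what the table reports (the meaning of M1 is not given here), they could be computed per source against the human-participant scores for the same hexagons. In practice `scipy.stats.spearmanr` and `scipy.stats.kendalltau` do this; the dependency-free sketch below shows the computation, using average ranks for ties and the tau-b variant.

```python
import math
from itertools import combinations
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-b, which adjusts the denominator for ties in x and in y."""
    n0 = n_tied_x = n_tied_y = concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        n0 += 1
        n_tied_x += x[i] == x[j]
        n_tied_y += y[i] == y[j]
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / math.sqrt((n0 - n_tied_x) * (n0 - n_tied_y))
```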

Page 34

Vague Cognitive Regions: Where is SoCal?

Vague Cognitive Regions and Inter-rater Agreement

[Figure: map of California divided into hexagon cells (scale bar 0 to 320 miles), each cell classified on a scale from Very Northern Californian through Equally Northern and Southern Californian to Very Southern Californian, with cells lacking data marked Insufficient Data; a second legend gives the standard deviation per cell (< 0.01, 0.01 to 0.50, 0.51 to 1.00, 1.01 to 1.73, > 1.73)]

Statistic     Four Raters  Five Raters
Kendall’s W   0.953        0.929
p-value       < 0.001      < 0.001

Key idea: data sources become raters/participants.
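Kendall's W measures agreement among m raters ranking the same n items: W = 12S / (m²(n³ − n)), where S is the sum of squared deviations of each item's rank sum from its mean m(n + 1)/2. Treating each data source as a rater that scores the same hexagon cells, a minimal sketch looks like this. It omits the tie correction, so results on data with many tied scores will differ somewhat from a tie-corrected implementation.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kendalls_w(ratings: Sequence[Sequence[float]]) -> float:
    """Kendall's W for m raters (rows) scoring the same n items (columns); no tie correction."""
    m = len(ratings)
    if m < 2:
        raise ValueError("need at least two raters")
    n = len(ratings[0])
    if n < 2 or any(len(r) != n for r in ratings):
        raise ValueError("every rater must score the same >= 2 items")
    ranked = [average_ranks(r) for r in ratings]
    rank_sums = [sum(r[i] for r in ranked) for i in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((total - mean_sum) ** 2 for total in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))
```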

Page 35

Vague Cognitive Regions: Where is SoCal?

Vague Cognitive Regions and Thematic Signatures

Do you even have to mine for the terms SoCal and NorCal directly?

Page 36

Vague Cognitive Regions: Where is SoCal?

Vague Cognitive Regions and Self-Similarity

[Figure: density of KL divergence values (x-axis 0 to 15) for Northern California, Southern California, and Both Northern & Southern California]

Based on 60 topics, cells within SoCal (and within NorCal) are more similar to each other than SoCal cells are to NorCal cells.
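The within-versus-between comparison can be sketched with KL divergence over topic distributions. Assumptions: each cell is represented by a topic distribution (e.g., over 60 topics), a small epsilon is added before renormalizing so that empty topics do not produce log(0), and the natural logarithm is used. Because KL divergence is asymmetric, the mean runs over ordered pairs; `mean_divergence` is a hypothetical helper.

```python
import math
from itertools import product
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-10) -> float:
    """KL(p || q) in nats; eps smoothing avoids log(0) for topics missing in q."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of topics")
    p_s = [x + eps for x in p]
    q_s = [x + eps for x in q]
    zp, zq = sum(p_s), sum(q_s)
    return sum((a / zp) * math.log((a / zp) / (b / zq)) for a, b in zip(p_s, q_s))


def mean_divergence(cells_a: Sequence[Sequence[float]], cells_b: Sequence[Sequence[float]]) -> float:
    """Mean KL divergence over ordered pairs; passing the same list twice skips self-pairs."""
    values = [kl_divergence(a, b) for a, b in product(cells_a, cells_b) if a is not b]
    if not values:
        raise ValueError("need at least one pair of distinct cells")
    return sum(values) / len(values)
```

Under the self-similarity claim above, `mean_divergence(socal, socal)` would be expected to come out lower than `mean_divergence(socal, norcal)`.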

Page 37

Is the Data Universe Homogeneous and Isotropic?

Limits Of The Data Universe Analogy

At large scales, the physical universe is homogeneous and isotropic.

Page 38

Is the Data Universe Homogeneous and Isotropic?

Limits Of The Data Universe Analogy

In terms of geospatial distribution, the Social Media Web is neither homogeneous nor isotropic. If you direct your social sensing instrument at certain regions, there will be no signal.

Page 39

Next Steps

Page 40

POI Pulse Observatory

POI Pulse Observatory: Explore the Pulse of Los Angeles Using Signatures

http://poipulse.com/

Page 41

POI Pulse Observatory

A Public Data Observatory at UCSB?

A tangible & public observatory at UCSB; remember Griffith’s will.
Show & stream data from different sources and show analysis results.
Visualize the privacy implications of data.
Show citizens how their everyday data is used for scientific discoveries.

Page 42

The Right Place
