Nielsen x DataScience SG Meetup (Apr 2015)
Uploaded by: eugene-yan
Category: Data & Analytics
SMU - SCHOOL OF BUSINESS (SR 2.2)
20 APRIL 2015
Singapore Data Science Innovation Lab/Institute, The Nielsen Company (Singapore), 47 Scotts Road #13-00 Goldbell Towers, Singapore 228233
DATASCIENCE.SG MEETUP: LOCATION-BASED ANALYTICS FOR MARKETING RESEARCH
Nielsen Singapore Data Science Innovation Lab
Copyright © 2013 The Nielsen Company. Confidential and proprietary.
OUTLINE
• Brief overview of Nielsen
• Selected case studies:
• Eye in the sky
• Large-scale survey fieldwork design & management
• Store-matching using location information
• Measuring exposure to outdoor advertising
• Q & A
Help our clients have the most complete understanding of consumers worldwide
Our Mission
Nielsen – A Truly Global Company
• Founded in 1923
• Global footprint in >100 countries around the world
• Employs >34,000 people globally
Our 2012 revenue was USD 5.4 billion
THE LATEST INDUSTRY BENCHMARK...
Source: Global Market Research 2014 Report by ESOMAR (European Society for Opinion & Marketing Research)
Our clients… Buy Watch
foresight on the Asian consumer.
Eye in the Sky: Rural-Urban Classification Using Satellite Images
RES – RETAIL ESTABLISHMENT SURVEY
Number of sales outlets, types of outlets (market size & composition)
SAMPLING CENSUS IN LARGE COUNTRIES
E.g. Indonesia (1.9 million square kilometers)
STRATIFIED RANDOM SAMPLING
• In RES, a target country is ‘carved’ up into small, manageable survey areas
• Stratified sampling is used to ensure the representativeness of the data collected
• E.g.: Indonesia
Rural-urban status is an important factor in the stratification process
Problem: Official info from the Indonesian govt is not current, and important info may be missing/unavailable
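A minimal sketch of stratified random sampling over survey areas, to make the role of the rural-urban label concrete. The function name, the 10% sampling fraction and the toy list of areas are assumptions for illustration only; the slides do not describe Nielsen's actual sampling design.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, fraction, seed=0):
    """Draw the same fraction of units at random from every stratum."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for unit in units:
        by_stratum[stratum_of(unit)].append(unit)
    sample = []
    for stratum in sorted(by_stratum):
        members = by_stratum[stratum]
        # Take at least one unit so that no stratum is left out entirely.
        k = max(1, round(fraction * len(members)))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical survey areas: (area_id, rural/urban status).
areas = [(i, "urban" if i % 4 == 0 else "rural") for i in range(200)]
picked = stratified_sample(areas, stratum_of=lambda a: a[1], fraction=0.1)
```

If the rural-urban label of an area is wrong or out of date, the area lands in the wrong stratum, which is why the status needs to be current.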
PROPOSED METHODOLOGY: SCIENTIFIC, OBJECTIVE, TRACTABLE
Step 1
Turning to remote sensing (satellite imagery – DigitalGlobe/RapidEye) to provide scientific, objective and continuous monitoring of survey regions
Pilot area: Bali
Land use report
Step 2
Step 3
Computational Intelligence
Machine Learning
Step 4
Rural-urban classifier
PILOT REGION: BALI, INDONESIA
• Bali (smallest of 34 provinces)
• Organized into:
Ø Regencies (Kabupaten)
Ø Districts (Kecamatan)
Ø Towns/Villages (645 DESAs = Nielsen survey areas)
RAPIDEYE IMAGES OF BALI, INDONESIA
Bali land use pattern dataset: 383 DESAs used in this study
GETTING THE GROUND TRUTH (RURAL OR URBAN)
Crowdsourcing approach
• Group of human volunteers used
• Image order randomized
• Majority voting strategy adopted to derive final class label
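The labelling steps above can be sketched as follows. This is an illustration only: the DESA identifiers and votes are made up, and returning no label on a tie is an assumption, since the slides do not say how ties were resolved.

```python
import random
from collections import Counter

def shuffled_order(image_ids, seed):
    """Give each volunteer the images in a different random order."""
    rng = random.Random(seed)
    order = list(image_ids)
    rng.shuffle(order)
    return order

def majority_label(votes):
    """Return the most common label, or None if there is no clear winner."""
    counts = Counter(votes).most_common()
    if not counts:
        return None
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # tie: left unlabelled in this sketch
    return counts[0][0]

# Hypothetical votes from volunteers for three images.
votes_by_image = {
    "desa_001": ["urban", "urban", "rural"],
    "desa_002": ["rural", "rural", "rural"],
    "desa_003": ["urban", "rural"],
}
labels = {img: majority_label(v) for img, v in votes_by_image.items()}
```

Using an odd number of volunteers per image avoids ties in the two-class case.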
RESULTS FROM TWO-CLASS APPROACH
• Results from 1000 groupings of training and hold-out subsets at a 90%:10% partition ratio
Results are satisfactory, but error rates are still too high to meet Nielsen’s standard for data quality
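A sketch of the evaluation protocol described above: 1000 random 90%:10% splits, with the hold-out error recorded for each. The nearest-centroid classifier and the two synthetic features are stand-ins chosen to keep the example self-contained; the slides do not name the classifier or the features actually used.

```python
import random

def nearest_centroid_fit(X, y):
    """Stand-in classifier: one mean feature vector per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def nearest_centroid_predict(centroids, x):
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

def repeated_holdout_errors(X, y, n_repeats=1000, holdout=0.10, seed=0):
    """Error rate on a fresh random hold-out subset, repeated n_repeats times."""
    rng = random.Random(seed)
    n = len(X)
    n_test = max(1, round(holdout * n))
    errors = []
    for _ in range(n_repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        model = nearest_centroid_fit([X[i] for i in train], [y[i] for i in train])
        wrong = sum(nearest_centroid_predict(model, X[i]) != y[i] for i in test)
        errors.append(wrong / n_test)
    return errors

# Synthetic data: 383 areas with two made-up land-use features.
rng = random.Random(42)
X, y = [], []
for i in range(383):
    urban = i % 3 == 0
    centre = (0.7, 0.2) if urban else (0.2, 0.7)
    X.append([rng.gauss(centre[0], 0.1), rng.gauss(centre[1], 0.1)])
    y.append("urban" if urban else "rural")

errors = repeated_holdout_errors(X, y)
print(f"mean hold-out error: {sum(errors) / len(errors):.3f}")
```

Keeping the whole list of error rates, rather than only the mean, allows two approaches to be compared statistically later.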
APPLYING K-MEANS CLUSTERING TO BALI DATASET
Bali land use pattern dataset: 383 DESAs used in this study
K-Means result concurs with visual observations!
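For readers unfamiliar with K-Means, a plain-Python sketch of the standard (Lloyd's) algorithm follows. The random initialisation, iteration cap and toy points are assumptions for illustration; the slides do not state which implementation or settings were used on the Bali dataset.

```python
import random

def kmeans(points, k, n_iter=50, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid update steps."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assignment = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, new_assignment) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
        if new_assignment == assignment:
            break  # assignments stopped changing
        assignment = new_assignment
    return assignment, centroids

# Toy land-use vectors (made up): two obviously separate groups.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
toy_assignment, toy_centroids = kmeans(toy_points, k=2)
```

K-Means needs the number of clusters up front; with k=4 it would produce four land-use groups from the same kind of feature vectors.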
Sub-sub-urban/Sub-rural areas:
- Region with large open areas
- Undeveloped land/farmlands
- Low building density
Core urban areas:
- High building density
- Little/no vegetation cover
- Little/no farmlands
Core rural areas:
- Dense vegetation
- Natural lands
Sub-urban areas:
- Mix of buildings & farmlands
- Little/no dense vegetation
RESULTS FROM FOUR-CLASS APPROACH
• Results from 1000 groupings of training and hold-out subsets at a 90%:10% partition ratio
We need to ascertain that the new set of results is significantly better than the one from the two-class approach
At the 1% test level, the results from the four-class approach are better!
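One way to check such a claim is a one-sided test on the two sets of 1000 hold-out error rates. The sketch below uses a large-sample z approximation with unequal variances; this choice of test is an assumption, as the slides do not say which test was applied, and the error rates below are made-up numbers, not results from the talk. Repeated hold-out splits overlap, so the independence assumption holds only approximately.

```python
from statistics import NormalDist, mean, variance

def one_sided_mean_test(errors_new, errors_old):
    """p-value for H1: mean(errors_new) < mean(errors_old)."""
    se = (variance(errors_new) / len(errors_new)
          + variance(errors_old) / len(errors_old)) ** 0.5
    z = (mean(errors_new) - mean(errors_old)) / se
    return NormalDist().cdf(z)  # small p supports "new is better"

# Made-up error rates for illustration only.
two_class = [0.10 + 0.01 * ((i % 5) - 2) for i in range(1000)]
four_class = [0.07 + 0.01 * ((i % 5) - 2) for i in range(1000)]

p_value = one_sided_mean_test(four_class, two_class)
better_at_1pct = p_value < 0.01
```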
CONFIRMATION OF RESULTS USING NIGHT IMAGERY
Earth-at-Night (EaN) imagery from NASA Earth Observatory & NOAA satellites
Good fit between our classification results and the EaN images
TO RECAP
What is urban?
Analytics: rigorous and sustainable
Solution must be practical (cost)
Large-scale Survey Fieldwork Design & Management
Nielsen Singapore Data Science Innovation Lab/Institute
WHAT IS REQUIRED...
Survey: Listing of 32k respondents over a period of 12 months (Jul14 – May15)
Listing released by client, ~32K listings in total:
Phase 1: 10,500
Phase 2: 10,500
Phase 3: 7,000
Phase 4: 3,500
CHALLENGES
(1) Client’s requirement: Similar distribution of listings across phases
Nielsen: Task allocation (fieldwork efficiencies + productivity) => reduced cost
(2) Client provided address & postal code for listings (with name, age, race, gender)
Nielsen: Manual sorting and grouping of addresses (32k respondents) requires weeks
• Time-consuming to check addresses manually
• Even more time to group addresses to ensure even distribution
• No classification of dwelling type (public vs. private)
• Private housing has restricted access (condo names not provided by client)
OBTAINING THE LOCATION INFORMATION
Translating postal codes to geocodes (geo-coordinates)
Changi Airport
Paya Lebar Airbase, Industrial land
Nature reserve, Central Catchment Area
Jurong Industrial Estate
Tengah Airbase, Agricultural land
Seletar Airport
Black: Public housing Blue: Private housing
EVEN DISTRIBUTION (GROUPING BY POSTAL REGIONS)
Singapore is organized into postal regions
• SG postal code has 6 digits
• First 2 digits denote postal region
• Each building in postal region is assigned a number (last 4 digits)
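The postal-code structure above makes the first grouping step a simple string operation. A minimal sketch, assuming each listing is a dict with a hypothetical `postal_code` field; the zero-padding handles codes that lost a leading zero when stored as numbers.

```python
from collections import defaultdict

def postal_region(postal_code):
    """Return the 2-digit postal region of a 6-digit Singapore postal code."""
    code = str(postal_code).strip().zfill(6)  # restore a dropped leading zero
    if len(code) != 6 or not code.isdigit():
        raise ValueError(f"not a 6-digit postal code: {postal_code!r}")
    return code[:2]

def group_by_region(records):
    """Group listing records by postal region."""
    groups = defaultdict(list)
    for rec in records:
        groups[postal_region(rec["postal_code"])].append(rec)
    return dict(groups)

# Hypothetical listings; 228233 is the venue postal code from the title slide.
listings = [
    {"id": 1, "postal_code": "228233"},
    {"id": 2, "postal_code": "760123"},
    {"id": 3, "postal_code": "760456"},
    {"id": 4, "postal_code": 18956},
]
by_region = group_by_region(listings)
```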
LOCAL CLUSTERING WITHIN EACH POSTAL REGION
Clustering is applied to group locations by proximity
• Same methodology applied for both public and private dwellings
Yishun (76)
Woodlands/Sembawang (73)
Marsiling/Admiralty (75)
FINAL SAMPLE DISTRIBUTION BY PHASES
Local grouping strategy ensures:
• Locations close to one another are visited in the same phase
• Methodology is fast (clustering < 5 mins)
• Manual adjustment can be used to fine-tune results
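One possible way to turn local clusters into a phase plan is sketched below: within a postal region, whole clusters are assigned greedily to whichever phase has the most room left, with each phase's share scaled from the overall phase sizes. This greedy rule and the cluster sizes are assumptions for illustration; the slides do not describe the actual allocation procedure.

```python
def assign_clusters_to_phases(cluster_sizes, phase_quotas):
    """Assign whole clusters to phases, keeping phase shares roughly proportional."""
    total = sum(cluster_sizes.values())
    quota_total = sum(phase_quotas.values())
    # Scale the overall phase quotas down to this region's size.
    remaining = {ph: total * q / quota_total for ph, q in phase_quotas.items()}
    assignment = {}
    # Place the largest clusters first; they are the hardest to fit.
    for cluster, size in sorted(cluster_sizes.items(), key=lambda kv: -kv[1]):
        phase = max(remaining, key=remaining.get)  # phase with most room left
        assignment[cluster] = phase
        remaining[phase] -= size
    return assignment

# Phase sizes in listing order from the requirements slide;
# cluster sizes for one region are made up.
phase_quotas = {"Phase 1": 10500, "Phase 2": 10500, "Phase 3": 7000, "Phase 4": 3500}
cluster_sizes = {"c1": 30, "c2": 30, "c3": 20, "c4": 10, "c5": 10}
plan = assign_clusters_to_phases(cluster_sizes, phase_quotas)
```

Because a cluster is never split, nearby addresses stay in the same phase, and the result can still be adjusted by hand afterwards.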
MAP OF INTERVIEWER AND RESPONDENT LOCATIONS FOR SELECTED SUBGROUP – PSO RESULT ILLUSTRATION
USING GEOCODING TO MATCH TWO LISTS OF STORES
Two store lists, each with the columns: Name of Store, Store Address
Key observations:
• May have similar names among store list
• Address formats are non-standardised
• Geocode List A and B addresses using Google API
• Plot standardized geo-coordinates for visual view of overlaps
• Perform matching based on pairwise distance and store name
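The matching step above can be sketched in plain Python once both lists have coordinates. The haversine distance and the Levenshtein edit distance are standard; the 100 m distance cut-off, the 0.6 name-similarity cut-off, the record layout and the sample stores are all assumptions for illustration, not values from the talk.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    r = 6371000.0  # mean Earth radius
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * r * asin(sqrt(a))

def levenshtein(a, b):
    """Edit distance between two strings (dynamic programming, row by row)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def name_similarity(a, b):
    """1.0 for identical names, falling towards 0.0 as they diverge."""
    a, b = a.lower().strip(), b.lower().strip()
    longest = max(len(a), len(b)) or 1
    return 1 - levenshtein(a, b) / longest

def match_stores(list_a, list_b, max_dist_m=100.0, min_name_sim=0.6):
    """For each store in A, pick the best nearby, similarly named store in B."""
    matches = []
    for sa in list_a:
        best = None
        for sb in list_b:
            d = haversine_m(sa["lat"], sa["lon"], sb["lat"], sb["lon"])
            s = name_similarity(sa["name"], sb["name"])
            if d <= max_dist_m and s >= min_name_sim:
                if best is None or (s, -d) > (best[1], -best[2]):
                    best = (sb["name"], s, d)
        if best:
            matches.append((sa["name"], best[0]))
    return matches

# Made-up stores with already geocoded coordinates.
list_a = [{"name": "ABC Mini Mart", "lat": 1.3000, "lon": 103.8000}]
list_b = [
    {"name": "ABC Minimart", "lat": 1.3001, "lon": 103.8001},
    {"name": "XYZ Provision Shop", "lat": 1.3500, "lon": 103.9000},
]
matches = match_stores(list_a, list_b)
```

Requiring both conditions guards against two failure modes: similarly named stores in different places, and different stores that share a building.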
USEFUL PYTHON PACKAGES
• geopy: easy to geocode/reverse geocode through various geocoder APIs, and to compute geographical distances
• python-levenshtein: Levenshtein function produces a metric for fuzzy string matching
Scale: 1 : 10e6
Scale: 1 : 50’000
QUESTION: HOW CAN WE OBJECTIVELY MEASURE EXPOSURE TO OUTDOOR ADVERTISING?
IN TODAY’S MEDIA ENVIRONMENT, THE EXPOSURES TO A MESSAGE PROVIDED BY OUTDOOR ADVERTISING ARE MORE VALUABLE THAN EVER. BECAUSE IT IS INCREASINGLY DIFFICULT TO GET MESSAGES NOTICED AND/OR REMEMBERED, THE UNCLUTTERED ENVIRONMENT IN WHICH OUTDOOR ADS ARE SEEN (OFTEN WITH HIGH FREQUENCY) HELPS TO OVERCOME PROBLEMS OF MEDIA FRAGMENTATION AND SELECTIVE PERCEPTION.
-- C.R. Taylor (2006)
MEASURING EXPOSURE TO OUTDOOR ADVERTISING
WE’RE USED TO THE IDEA OF ROUTE PLANNING…
…paths possible if enough digital breadcrumbs
HOW TO FIND WHEN A PERSON GOES PAST A BILLBOARD?
• Looking into Google Maps API for Work and Google Directions (23 waypoints allowed)
• Inside a Python program, pass a request like:
https://maps.googleapis.com/maps/api/directions/json?origin=%221%20marnham%20street,%20brisbane,%20australia%22&destination=%22116%20daw%20street,%20brisbane,%20australia%22
• Returns a JSON object, with the (approximate) paths as polylines:
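A sketch of building such a request and pulling the encoded path out of the reply, without making any network call. The endpoint and the 23-waypoint limit come from the slide; the helper names, the placeholder API key and the hand-written sample response are assumptions, and the sample is trimmed to the fields used here (`routes`, `overview_polyline`, `points`).

```python
from urllib.parse import urlencode

DIRECTIONS_URL = "https://maps.googleapis.com/maps/api/directions/json"

def build_directions_url(origin, destination, waypoints=(), api_key="YOUR_API_KEY"):
    """Build a Directions request URL; parameters are URL-encoded for us."""
    if len(waypoints) > 23:
        raise ValueError("at most 23 waypoints per request")
    params = {"origin": origin, "destination": destination}
    if waypoints:
        params["waypoints"] = "|".join(waypoints)  # pipe-separated list
    params["key"] = api_key
    return DIRECTIONS_URL + "?" + urlencode(params)

def overview_polylines(response):
    """Return the encoded overview polyline of every route in a parsed reply."""
    return [route["overview_polyline"]["points"] for route in response.get("routes", [])]

url = build_directions_url(
    "1 marnham street, brisbane, australia",
    "116 daw street, brisbane, australia",
)

# Hand-written stand-in for a parsed JSON reply (not real API output).
sample_response = {"status": "OK", "routes": [{"overview_polyline": {"points": "_p~iF~ps|U"}}]}
encoded_paths = overview_polylines(sample_response)
```

In a real program the URL would be fetched (for example with `urllib.request.urlopen`) and the body parsed with `json.loads` before calling `overview_polylines`.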
HOW TO ENCODE/DECODE THE POLYLINE?
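A compact sketch of Google's encoded polyline format: each coordinate is stored as a delta from the previous point, scaled by 1e5, zig-zag encoded for sign, and written in 5-bit chunks offset by 63 to give printable ASCII. The function names are illustrative; the sample string is the worked example from Google's format description.

```python
def decode_polyline(encoded, precision=5):
    """Decode an encoded polyline string into a list of (lat, lon) tuples."""
    coords, index, lat, lon = [], 0, 0, 0
    factor = 10 ** precision
    while index < len(encoded):
        deltas = []
        for _ in range(2):  # latitude delta first, then longitude delta
            shift = result = 0
            while True:
                b = ord(encoded[index]) - 63
                index += 1
                result |= (b & 0x1F) << shift
                shift += 5
                if b < 0x20:  # no continuation bit: last chunk of this value
                    break
            # Undo the zig-zag sign encoding.
            deltas.append(~(result >> 1) if result & 1 else result >> 1)
        lat += deltas[0]
        lon += deltas[1]
        coords.append((lat / factor, lon / factor))
    return coords

def encode_polyline(coords, precision=5):
    """Encode a list of (lat, lon) tuples as a polyline string."""
    factor = 10 ** precision
    out, prev_lat, prev_lon = [], 0, 0
    for lat, lon in coords:
        ilat, ilon = round(lat * factor), round(lon * factor)
        for delta in (ilat - prev_lat, ilon - prev_lon):
            value = ~(delta << 1) if delta < 0 else delta << 1
            while value >= 0x20:
                out.append(chr((0x20 | (value & 0x1F)) + 63))
                value >>= 5
            out.append(chr(value + 63))
        prev_lat, prev_lon = ilat, ilon
    return "".join(out)

sample = "_p~iF~ps|U_ulLnnqC_mqNvxq`@"
path = decode_polyline(sample)
```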
USE A GIS: SEE WHERE TRAVEL LINES INTERCEPT BUFFERS…
…automate using Python
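Without a GIS, the same check can be approximated in plain Python: project the decoded path into metres around the billboard and test whether any path segment comes within the buffer radius. The flat-earth projection is an assumption that is reasonable only over short, city-scale distances, and the coordinates and 50 m radius are made up for illustration.

```python
from math import cos, radians, hypot

EARTH_RADIUS_M = 6371000.0

def to_local_xy(lat, lon, ref_lat, ref_lon):
    """Approximate metres east/north of a reference point (small areas only)."""
    x = radians(lon - ref_lon) * cos(radians(ref_lat)) * EARTH_RADIUS_M
    y = radians(lat - ref_lat) * EARTH_RADIUS_M
    return x, y

def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b, all in metres."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return hypot(px - ax, py - ay)
    # Clamp the projection of p onto the segment to its end points.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return hypot(px - (ax + t * dx), py - (ay + t * dy))

def path_enters_buffer(path, billboard, radius_m):
    """True if the travel path passes within radius_m of the billboard."""
    ref_lat, ref_lon = billboard
    pts = [to_local_xy(lat, lon, ref_lat, ref_lon) for lat, lon in path]
    origin = (0.0, 0.0)  # the billboard itself in local coordinates
    if len(pts) == 1:
        return hypot(*pts[0]) <= radius_m
    return any(point_segment_distance(origin, a, b) <= radius_m
               for a, b in zip(pts, pts[1:]))

billboard = (1.3000, 103.8000)
passing_path = [(1.2990, 103.7995), (1.3010, 103.8005)]   # runs through the billboard
distant_path = [(1.3100, 103.7900), (1.3100, 103.8100)]   # about 1.1 km to the north
```

Testing segments rather than vertices matters because a decoded polyline can have widely spaced points, so a path may cross the buffer between two of them.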
WHAT OTHER POSSIBLE DATA SOURCES COULD THERE BE?
QUESTIONS?
THANK YOU!