Bringing big data to life
Jeroen Hardon
1.3 exabytes
2.9 million per second
375 megabytes per day
24 petabytes per day
50 million per day
700 billion minutes per month
73 items per second
Big data is everywhere
20 hours per minute
A journey in segmentation with data scientists and big data.
What was the problem?
What was the solution?
How well did it work?
Needs-based segmentation
7 segments created
Classifier tool built using 10 questions
Original segmentation study
This resulted in a happy client.
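The deck does not say what kind of model sits behind the 10-question classifier tool. As a minimal sketch, assuming answers on a 1 to 5 scale and a nearest-centroid rule (an illustrative choice, not necessarily the method used in this study), a classifier can store each segment's average answer profile and assign a new respondent to the closest one. `N_QUESTIONS`, `fit_centroids` and `classify` are hypothetical names.

```python
import math
from collections import defaultdict

N_QUESTIONS = 10  # hypothetical: the classifier tool's 10 questions

def fit_centroids(answers, segments):
    """Average answer profile per segment (nearest-centroid classifier)."""
    sums = defaultdict(lambda: [0.0] * N_QUESTIONS)
    counts = defaultdict(int)
    for row, seg in zip(answers, segments):
        counts[seg] += 1
        for i, value in enumerate(row):
            sums[seg][i] += value
    return {seg: [s / counts[seg] for s in sums[seg]] for seg in sums}

def classify(row, centroids):
    """Assign the segment whose average profile is closest (Euclidean distance)."""
    return min(centroids, key=lambda seg: math.dist(row, centroids[seg]))

# Toy training data: answers on a 1-5 scale with known segments 1 and 2.
train = [[5] * 10, [4] * 10, [1] * 10, [2] * 10]
segments = [1, 1, 2, 2]
centroids = fit_centroids(train, segments)
print(classify([5, 4, 5, 4, 5, 4, 5, 4, 5, 4], centroids))  # 1
```

With 7 segments the same code applies unchanged; only the training labels differ.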
“Let’s tag a segment to each person in our database of 40 million.”
12,000 people from the database answered the classifier questions
Each of those 12,000 was classified into 1 of the 7 segments
Attitudinal segments are not explained by demographics
Attitudes ≠ Demographics
Revised segments should align better with big data
They must still predict the original segments from the segmentation study
Merging the 2 types of data
New classification tool
The demographics of the database and the survey did not match
We built classifiers after matching the survey data to resemble the database
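How the survey was matched to the database is not spelled out. One common way to make a sample resemble a target population is cell weighting: each respondent gets the ratio of the database share to the survey share of their demographic cell. The sketch below assumes a single hypothetical demographic variable (age band) with known database shares; `cell_weights` and the example shares are illustrative, not figures from the study.

```python
from collections import Counter

def cell_weights(survey_cells, database_shares):
    """Weight each respondent by (database share / survey share) of their cell,
    so the weighted survey mix matches the database mix."""
    n = len(survey_cells)
    survey_shares = {cell: count / n for cell, count in Counter(survey_cells).items()}
    return [database_shares[cell] / survey_shares[cell] for cell in survey_cells]

# Hypothetical example: the survey over-represents younger people.
survey_cells = ["18-34"] * 6 + ["35+"] * 4
database_shares = {"18-34": 0.3, "35+": 0.7}
weights = cell_weights(survey_cells, database_shares)

weighted_35_plus = sum(w for w, c in zip(weights, survey_cells) if c == "35+") / sum(weights)
print(round(weighted_35_plus, 3))  # 0.7
```

In practice several demographic variables would be crossed into cells (or raked), but the ratio logic is the same.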
We generated many samples of our survey data and built an ensemble of classifiers
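A sketch of the ensemble idea, assuming each survey respondent already carries a matching weight: draw many weighted bootstrap samples, fit one simple classifier per sample, and combine them by majority vote (bagging). The nearest-centroid base classifier, the 25 models and all function names are assumptions for illustration.

```python
import math
import random
from collections import Counter

def fit_centroids(rows, labels):
    """Mean feature vector per segment (nearest-centroid base classifier)."""
    sums, counts = {}, Counter(labels)
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def nearest(row, centroids):
    return min(centroids, key=lambda label: math.dist(row, centroids[label]))

def build_ensemble(rows, labels, weights, n_models=25, seed=0):
    """Fit one classifier per weighted bootstrap sample (bagging)."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        idx = rng.choices(range(len(rows)), weights=weights, k=len(rows))
        models.append(fit_centroids([rows[i] for i in idx], [labels[i] for i in idx]))
    return models

def predict(row, models):
    """Majority vote over the ensemble's predictions."""
    return Counter(nearest(row, m) for m in models).most_common(1)[0][0]

rows = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels = ["A", "A", "A", "B", "B", "B"]
models = build_ensemble(rows, labels, weights=[1.0] * len(rows))
print(predict([0.5, 0.5], models), predict([10.5, 10.5], models))
```

Passing non-uniform matching weights makes over-represented respondents appear less often in each bootstrap sample.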
Ensembles
While building ensembles of classifiers helped, it was still inadequate.
We needed to strengthen the demographic / behavioral signal
Expectation Maximization
How do I "assign" each of the individual fruits to a tree type?
What are the characteristics of the fruit of each tree type?
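The two fruit questions map onto the two EM steps: the E-step softly assigns each fruit to a tree type given the current characteristics, and the M-step re-estimates each tree type's characteristics from those soft assignments. A minimal sketch for a one-dimensional Gaussian mixture, with fruit weight as the only (hypothetical) characteristic; `em_1d` is an illustrative name and, as a toy, it assumes no component ends up empty.

```python
import math

def log_normal(x, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def em_1d(data, init_means, n_iter=100):
    """EM for a 1-D Gaussian mixture (toy: assumes no component becomes empty)."""
    k, n = len(init_means), len(data)
    mu = sum(data) / n
    means = list(init_means)
    variances = [sum((x - mu) ** 2 for x in data) / n] * k
    priors = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: soft-assign each fruit to each tree type (log-space for stability).
        resp = []
        for x in data:
            logp = [math.log(priors[j]) + log_normal(x, means[j], variances[j]) for j in range(k)]
            top = max(logp)
            p = [math.exp(lp - top) for lp in logp]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate each tree type's share, mean weight and spread.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            priors[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)
    return means, variances, priors, resp

fruit = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]
means, variances, priors, resp = em_1d(fruit, init_means=[0.5, 3.5])
print([round(m, 2) for m in means])  # [1.05, 3.05]
```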
Expectation Maximization
Observed data:
Initial segmentation data, 6,500 respondents (known, fixed segment)
Augment of 12,000 from big data (unknown segment)
+ Model 1
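In Model 1 the 6,500 survey respondents keep their known segment, while the segment of the 12,000 augment records is treated as missing and estimated. A sketch of that semi-supervised EM, assuming a latent-class (naive-Bayes) model over categorical answers: labelled rows keep one-hot responsibilities throughout, and only unlabelled rows are updated in the E-step. The data layout, the smoothing constant `alpha` and the function name are assumptions, not details from the study.

```python
import math
from collections import defaultdict

def semi_supervised_em(rows, known, n_segments, n_iter=50, alpha=1.0):
    """Latent-class EM where rows with a known segment stay fixed.

    rows  : list of dicts {variable: category}; a variable may be absent.
    known : segment index for labelled rows, None for unlabelled rows.
    """
    resp = [[1.0 if k == s else 0.0 for k in range(n_segments)] if s is not None
            else [1.0 / n_segments] * n_segments
            for s in known]
    categories = defaultdict(set)
    for row in rows:
        for var, cat in row.items():
            categories[var].add(cat)
    for _ in range(n_iter):
        # M-step: segment sizes and smoothed answer distributions per segment.
        sizes = [sum(r[k] for r in resp) for k in range(n_segments)]
        priors = [(s + alpha) / (len(rows) + alpha * n_segments) for s in sizes]
        probs = {}
        for var, cats in categories.items():
            for k in range(n_segments):
                counts = {c: alpha for c in cats}
                for row, r in zip(rows, resp):
                    if var in row:
                        counts[row[var]] += r[k]
                total = sum(counts.values())
                probs[var, k] = {c: v / total for c, v in counts.items()}
        # E-step: re-estimate only rows whose segment is unknown.
        for i, (row, s) in enumerate(zip(rows, known)):
            if s is not None:
                continue
            logp = [math.log(priors[k])
                    + sum(math.log(probs[var, k][cat]) for var, cat in row.items())
                    for k in range(n_segments)]
            top = max(logp)
            p = [math.exp(lp - top) for lp in logp]
            total = sum(p)
            resp[i] = [pk / total for pk in p]
    return resp

yes, no = {"q1": "yes", "q2": "yes"}, {"q1": "no", "q2": "no"}
rows = [yes, yes, no, no, yes, no]
known = [0, 0, 1, 1, None, None]  # last two rows play the role of the augment
resp = semi_supervised_em(rows, known, n_segments=2)
```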
Expectation Maximization
Observed data:
Initial segmentation data, 6,500 respondents (known, fixed segment)
Augment of 12,000 from big data (unknown segment)
+ Big data variables
Model 2
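Model 2 adds big data variables to the same model. Under a latent-class (naive-Bayes) assumption, each extra variable multiplies in one more per-segment likelihood term, and a variable that is missing for a record is simply left out. The sketch below uses made-up segment profiles and hypothetical variable names (`q1` for a survey question, `channel` for a big-data variable) to show how one big-data variable can sharpen a record's segment posterior.

```python
import math

def segment_posterior(record, priors, profiles):
    """P(segment | observed variables) under a naive-Bayes / latent-class model.
    Variables not present in `record` are left out of the likelihood."""
    log_scores = {}
    for seg, prior in priors.items():
        log_scores[seg] = math.log(prior) + sum(
            math.log(profiles[seg][var][value]) for var, value in record.items())
    top = max(log_scores.values())
    unnorm = {seg: math.exp(s - top) for seg, s in log_scores.items()}
    total = sum(unnorm.values())
    return {seg: u / total for seg, u in unnorm.items()}

# Made-up profiles for illustration only.
priors = {"A": 0.5, "B": 0.5}
profiles = {
    "A": {"q1": {"agree": 0.6, "disagree": 0.4}, "channel": {"online": 0.9, "store": 0.1}},
    "B": {"q1": {"agree": 0.4, "disagree": 0.6}, "channel": {"online": 0.2, "store": 0.8}},
}
survey_only = segment_posterior({"q1": "agree"}, priors, profiles)
with_big_data = segment_posterior({"q1": "agree", "channel": "online"}, priors, profiles)
print(round(survey_only["A"], 2), round(with_big_data["A"], 2))  # 0.6 0.87
```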
We got classifiers that were slightly less accurate at predicting the survey data, but much more aligned with the big data. We made sure not to let predictive accuracy drop below 70% (originally 80%).
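The trade-off described here is a constrained choice: maximize alignment with the big data subject to a floor on survey accuracy. A sketch, assuming each candidate classifier has already been scored; the deck does not say how alignment was measured, so `big_data_alignment` and all candidate scores below are made-up placeholders (only the 80% original and the 70% floor come from the slide).

```python
def accuracy(predicted, actual):
    """Share of survey respondents whose predicted segment matches their study segment."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

def pick_classifier(candidates, min_accuracy=0.70):
    """Keep candidates at or above the accuracy floor, then take the one
    most aligned with the big data."""
    eligible = [c for c in candidates if c["survey_accuracy"] >= min_accuracy]
    if not eligible:
        raise ValueError("no candidate keeps survey accuracy above the floor")
    return max(eligible, key=lambda c: c["big_data_alignment"])

# Made-up scores for illustration only.
candidates = [
    {"name": "original", "survey_accuracy": 0.80, "big_data_alignment": 0.30},
    {"name": "revised", "survey_accuracy": 0.72, "big_data_alignment": 0.55},
    {"name": "too_far", "survey_accuracy": 0.65, "big_data_alignment": 0.70},
]
print(pick_classifier(candidates)["name"])  # revised
```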
How well did it work?
Rows: initial classifier segment. Columns: revised classifier segment.
        Seg 1  Seg 2  Seg 3  Seg 4  Seg 5  Seg 6  Seg 7
Seg 1     564     84     15     56     36     14     18
Seg 2      68    844     84     13      7     13     10
Seg 3      33     72    561      2      3      1      5
Seg 4      34      8      0    567      5     81     29
Seg 5      27     12      1      6    635     50     57
Seg 6      21     27      6     76     43    873     30
Seg 7      18     28      9     50     59     52   1193
Only 19% changed segment
Data source: survey data, 6,500 respondents
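The 19% is the off-diagonal share of the table: people whose revised segment differs from their initial one. A quick check against the counts above (`share_changed` is an illustrative helper name):

```python
def share_changed(matrix):
    """Share of people off the diagonal: revised segment != initial segment."""
    total = sum(sum(row) for row in matrix)
    unchanged = sum(matrix[i][i] for i in range(len(matrix)))
    return (total - unchanged) / total

# Rows: initial classifier segment; columns: revised classifier segment.
survey = [
    [564, 84, 15, 56, 36, 14, 18],
    [68, 844, 84, 13, 7, 13, 10],
    [33, 72, 561, 2, 3, 1, 5],
    [34, 8, 0, 567, 5, 81, 29],
    [27, 12, 1, 6, 635, 50, 57],
    [21, 27, 6, 76, 43, 873, 30],
    [18, 28, 9, 50, 59, 52, 1193],
]
print(sum(map(sum, survey)), f"{share_changed(survey):.1%}")  # 6500 19.4%
```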
How well did it work?
Rows: initial classifier segment. Columns: revised classifier segment.
        Seg 1  Seg 2  Seg 3  Seg 4  Seg 5  Seg 6  Seg 7
Seg 1     135    102     18     66    207    157     45
Seg 2     119    545    171     58    174    203    101
Seg 3      55    113    316     44    240    219     72
Seg 4      90     67      4    283    233    287     69
Seg 5     303    169     41    216   1994    925    205
Seg 6     325    259     36    261    646   1591    127
Seg 7      52     26      3     90    193    191    156
Over 58% changed segment
Data source: augment of 12,000
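The same off-diagonal calculation on this table gives the 58% (the cells sum to 12,002 rather than exactly 12,000). Dividing each diagonal cell by its row total also shows how much of each initial segment kept its segment; `share_changed` and `retention` are illustrative helper names.

```python
def share_changed(matrix):
    """Share of people off the diagonal: revised segment != initial segment."""
    total = sum(sum(row) for row in matrix)
    unchanged = sum(matrix[i][i] for i in range(len(matrix)))
    return (total - unchanged) / total

def retention(matrix):
    """Per initial segment: share who keep the same segment after revision."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]

# Rows: initial classifier segment; columns: revised classifier segment.
augment = [
    [135, 102, 18, 66, 207, 157, 45],
    [119, 545, 171, 58, 174, 203, 101],
    [55, 113, 316, 44, 240, 219, 72],
    [90, 67, 4, 283, 233, 287, 69],
    [303, 169, 41, 216, 1994, 925, 205],
    [325, 259, 36, 261, 646, 1591, 127],
    [52, 26, 3, 90, 193, 191, 156],
]
print(sum(map(sum, augment)), f"{share_changed(augment):.1%}")  # 12002 58.2%
print([round(r, 2) for r in retention(augment)])  # [0.18, 0.4, 0.3, 0.27, 0.52, 0.49, 0.22]
```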
Conclusions
Big data cannot predict everything
No need to be scared of big data.
Surveys and big data can coexist
Expectation maximization provides a framework for joint modeling
So what?
Jeroen Hardon, Director Methodology and Innovation EU
Contact us
skimgroup.com
@SKIMgroup
SKIMgroup
Kevin Lattery, VP Methodology & Innovation