Posted on 12-Feb-2017
LARGE SCALE DECISION SUPPORT SYSTEMS
ABOUT ME
VP Engineering, MarketShare DecisionCloud at Neustar
DECISION SUPPORT SYSTEMS
ANALYTICAL SYSTEMS
DATABASE SYSTEMS
MODEL BASED DECISION SUPPORT SYSTEMS
New Client Onboarding
Data Management
Data Adapters
Modeling
Post Processing
ETL
Discovery
Scenario Analysis
I have my planned spend for next year.
What will my planned spend yield in terms of my sales/revenue, and how does that compare with my sales/revenue forecasts?
BI IS WELL UNDERSTOOD
Data: Spend in $, number of impressions bought, and the corresponding Google Query Volume (GQV) for 1st Quarter 2015

FACT
TIME   PRODUCT  REGION  # IMPRESSIONS  SPEND ($$)  GQV
TID_1  PID_1    ALL     176843600      19885.75    8092237
TID_1  PID_2    ALL     185465400      300730      3691062
TID_1  PID_3    ALL     56838300       286989.8    421021
TID_1  PID_1    GID_1   0              4679.25     8091825
TID_1  PID_2    GID_1   0              40392       3684004
TID_1  PID_3    GID_1   0              37986.5     414270
TID_1  PID_1    GID_2   0              15206.5     412
TID_1  PID_2    GID_2   0              260338      7058
TID_1  PID_3    GID_2   0              249003.25   6751

Product DIM
ID     CODE
PID_1  COMP
PID_2  DIGI
PID_3  GAME

Geo DIM
ID     CODE
GID_1  GL
GID_2  MW

Time DIM
ID     CODE
TID_1  1/1/2015
TID_2  4/1/2015
TID_3  7/1/2015
BI ANSWERS IMPORTANT QUESTIONS ABOUT THE PAST
[Chart: Business Intelligence stages, Complexity vs. Business Value]
Reporting: What happened?
Analysis: Why did it happen?
Monitoring: What’s happening now?
Tools and techniques shown:
Query, reporting & search tools
Dashboards, scorecards, listening, real time reporting
OLAP and visualization tools
Complex event processing; NLP; Text mining
Time series analysis, data mining, clustering
IMPACT OF INCREASING SPEND BY 10% IN 3RD QUARTER?
TIME PRODUCT REGION # IMPRESSIONS SPEND ($$) GQV
TID_1 PID_1 ALL 176843600 19885.75 8092237
TID_1 PID_2 ALL 185465400 300730 3691062
TID_1 PID_3 ALL 56838300 286989.8 421021
TID_1 PID_1 GID_1 0 4679.25 8091825
TID_1 PID_2 GID_1 0 40392 3684004
TID_1 PID_3 GID_1 0 37986.5 414270
TID_1 PID_1 GID_2 0 15206.5 412
TID_1 PID_2 GID_2 0 260338 7058
TID_1 PID_3 GID_2 0 249003.25 6751
TID_3 PID_1 GID_1 0 4679.25
TID_3 PID_2 GID_1 0 40392
TID_3 PID_3 GID_1 0 37986.5
TID_3 PID_1 GID_2 0 15206.5
TID_3 PID_2 GID_2 0 260338
TID_3 PID_3 GID_2 0 249003.25
3RD QUARTER SPEND COULD BE REPLICATED FROM 1ST
TIME PRODUCT REGION # IMPRESSIONS SPEND ($$) GQV
TID_3 ALL ALL 419147300 607605.5
TID_1 PID_1 ALL 176843600 19885.75 8092237
TID_1 PID_2 ALL 185465400 300730 3691062
TID_1 PID_3 ALL 56838300 286989.8 421021
TID_1 PID_1 GID_1 0 4679.25 8091825
TID_1 PID_2 GID_1 0 40392 3684004
TID_1 PID_3 GID_1 0 37986.5 414270
TID_1 PID_1 GID_2 0 15206.5 412
TID_1 PID_2 GID_2 0 260338 7058
TID_1 PID_3 GID_2 0 249003.25 6751
TID_3 PID_1 GID_1 0 4679.25
TID_3 PID_2 GID_1 0 40392
TID_3 PID_3 GID_1 0 37986.5
TID_3 PID_1 GID_2 0 15206.5
TID_3 PID_2 GID_2 0 260338
TID_3 PID_3 GID_2 0 249003.25
TOTAL SPEND COULD BE CALCULATED
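The two BI-style steps above (copy the 1st quarter plan into the 3rd quarter, then total it) can be sketched in a few lines. This is only an illustration using the figures from the fact table; the names `q1_spend`, `q1_impressions` and `replicate` are hypothetical and not part of the platform.

```python
# Q1 2015 local spend per (product, region), copied from the fact table.
q1_spend = {
    ("COMP", "GL"): 4679.25,
    ("DIGI", "GL"): 40392.0,
    ("GAME", "GL"): 37986.5,
    ("COMP", "MW"): 15206.5,
    ("DIGI", "MW"): 260338.0,
    ("GAME", "MW"): 249003.25,
}

# Q1 2015 impressions per product (the REGION = ALL rows).
q1_impressions = {"COMP": 176843600, "DIGI": 185465400, "GAME": 56838300}


def replicate(plan):
    """Copy one quarter's plan unchanged into another quarter."""
    return dict(plan)


q3_spend = replicate(q1_spend)
q3_impressions = replicate(q1_impressions)

# Roll the replicated plan up to the TID_3 / ALL / ALL row.
q3_total_spend = sum(q3_spend.values())
q3_total_impressions = sum(q3_impressions.values())

print(q3_total_spend, q3_total_impressions)  # 607605.5 419147300
```

Both totals match the TID_3 ALL ALL row. The GQV column for that row stays empty because no amount of copying or summing can produce it.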
TO GET THE GQV VALUE ONE HAS TO BUILD PREDICTIVE MODELS
NEXT-GEN ANALYTICS DRIVES DECISION MAKING
[Chart: analytics stages, Complexity vs. Business Value]
Reporting: What happened?
Analysis: Why did it happen?
Monitoring: What’s happening now?
Prediction: What might happen?
Decision: What should I do now?
Categories shown: Business Intelligence; Predictive analytics; Decision support and management
Tools and techniques shown:
Query, reporting & search tools
Dashboards, scorecards, listening, real time reporting
OLAP and visualization tools
Complex event processing; NLP; Text mining
Time series analysis, data mining, clustering
Time series analysis, predictive modeling, ensemble modeling, machine learning
Constraint based optimization; choice modeling; decision trees
TO BUILD MODELS FROM DATA
THE DATA IS FLATTENED OUT
DENORMALIZED FLATTENED DATA
TIME PRODUCT REGION # IMPRESSIONS SPEND ($$) GQV
1/1/2015 COMP ALL 176843600 19885.75 8092237
1/1/2015 DIGI ALL 185465400 300730 3691062
1/1/2015 GAME ALL 56838300 286989.75 421021
1/1/2015 COMP GL 0 4679.25 8091825
1/1/2015 DIGI GL 0 40392 3684004
1/1/2015 GAME GL 0 37986.5 414270
1/1/2015 COMP MW 0 15206.5 412
1/1/2015 DIGI MW 0 260338 7058
1/1/2015 GAME MW 0 249003.25 6751
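As a rough illustration of the flattening step, the sketch below replaces each dimension ID in a fact row with its code from the dimension tables. It assumes simple in-memory lookups; the names `flatten`, `fact`, `product_dim`, `geo_dim` and `time_dim` are hypothetical, and only three fact rows are shown.

```python
# Dimension tables from the slides, as ID -> CODE lookups.
product_dim = {"PID_1": "COMP", "PID_2": "DIGI", "PID_3": "GAME"}
geo_dim = {"GID_1": "GL", "GID_2": "MW"}
time_dim = {"TID_1": "1/1/2015", "TID_2": "4/1/2015", "TID_3": "7/1/2015"}

# A few fact rows: (time, product, region, impressions, spend, gqv).
fact = [
    ("TID_1", "PID_1", "ALL", 176843600, 19885.75, 8092237),
    ("TID_1", "PID_1", "GID_1", 0, 4679.25, 8091825),
    ("TID_1", "PID_1", "GID_2", 0, 15206.5, 412),
]


def flatten(rows):
    """Denormalize fact rows by substituting dimension codes for IDs."""
    flat = []
    for tid, pid, gid, impressions, spend, gqv in rows:
        flat.append({
            "TIME": time_dim[tid],
            "PRODUCT": product_dim[pid],
            # "ALL" is not a dimension member, so it passes through as-is.
            "REGION": geo_dim.get(gid, gid),
            "IMPRESSIONS": impressions,
            "SPEND": spend,
            "GQV": gqv,
        })
    return flat


flat_rows = flatten(fact)
for row in flat_rows:
    print(row["TIME"], row["PRODUCT"], row["REGION"], row["SPEND"])
```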
DATA IS GROUPED INTO SETS CALLED FEATURES
TIME PRODUCT REGION VARIABLE # IMPRESSIONS SPEND ($$) GQV
1/1/2015 COMP ALL TV_PRD_IM 176843600 19885.75 8092237
1/1/2015 DIGI ALL TV_PRD_IM 185465400 300730 3691062
1/1/2015 GAME ALL TV_PRD_IM 56838300 286989.75 421021
1/1/2015 COMP GL TV_LOCAL_PRD_SP 0 4679.25 8091825
1/1/2015 DIGI GL TV_LOCAL_PRD_SP 0 40392 3684004
1/1/2015 GAME GL TV_LOCAL_PRD_SP 0 37986.5 414270
1/1/2015 COMP MW TV_LOCAL_PRD_SP 0 15206.5 412
1/1/2015 DIGI MW TV_LOCAL_PRD_SP 0 260338 7058
1/1/2015 GAME MW TV_LOCAL_PRD_SP 0 249003.25 6751
DENORMALIZED FLATTENED DATA
Feature selection, also known as variable selection, attribute selection or variable
subset selection, is the process of selecting a subset of relevant features for use in
model construction.
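A minimal sketch of the grouping shown in the table: every flattened row is tagged with a variable name, and rows sharing a name form one feature. The rule used here (REGION = ALL maps to TV_PRD_IM, anything else to TV_LOCAL_PRD_SP) is read off the table and is an assumption about how the labels were assigned; `variable_for` is a hypothetical helper.

```python
from collections import defaultdict

# (time, product, region) keys of the flattened rows.
rows = [
    ("1/1/2015", product, region)
    for region in ("ALL", "GL", "MW")
    for product in ("COMP", "DIGI", "GAME")
]


def variable_for(region):
    """Assumed labeling rule: national rows carry impressions, local rows carry spend."""
    return "TV_PRD_IM" if region == "ALL" else "TV_LOCAL_PRD_SP"


# Group row keys into features by variable name.
features = defaultdict(list)
for time, product, region in rows:
    features[variable_for(region)].append((time, product, region))

for name, members in features.items():
    print(name, len(members))
```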
FEATURES ARE ASSEMBLED INTO AN EQUATION
TIME PRODUCT REGION VARIABLE # IMPRESSIONS SPEND ($$) GQV
1/1/2015 COMP ALL TV_PRD_IM 176843600 19885.75 8092237
1/1/2015 DIGI ALL TV_PRD_IM 185465400 300730 3691062
1/1/2015 GAME ALL TV_PRD_IM 56838300 286989.75 421021
1/1/2015 COMP GL TV_LOCAL_PRD_SP 0 4679.25 8091825
1/1/2015 DIGI GL TV_LOCAL_PRD_SP 0 40392 3684004
1/1/2015 GAME GL TV_LOCAL_PRD_SP 0 37986.5 414270
1/1/2015 COMP MW TV_LOCAL_PRD_SP 0 15206.5 412
1/1/2015 DIGI MW TV_LOCAL_PRD_SP 0 260338 7058
1/1/2015 GAME MW TV_LOCAL_PRD_SP 0 249003.25 6751
FLATTENED DENORMALIZED DATA
Model Equation LOG(GQV_PD + 1) := TV_PRD_IM_LOGC*C(1) + TV_LOCAL_PRD_SP_LOGC*C(2)
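Read literally, the equation gives the log of GQV_PD + 1 as a linear combination of two transformed features, so a prediction is recovered by inverting the log. The sketch below assumes a natural logarithm and assumes the inputs are already in their `_LOGC` form; the slides do not define that transform, and `predict_gqv_pd` is a hypothetical name.

```python
import math


def predict_gqv_pd(tv_prd_im_logc, tv_local_prd_sp_logc, c1, c2):
    """Evaluate LOG(GQV_PD + 1) = x1*C(1) + x2*C(2) and invert the log.

    Assumes natural log and inputs that are already transformed.
    """
    log_gqv_plus_1 = tv_prd_im_logc * c1 + tv_local_prd_sp_logc * c2
    # expm1(x) computes exp(x) - 1 accurately, undoing LOG(y + 1).
    return math.expm1(log_gqv_plus_1)


# Illustrative inputs only; these are not values from the slides.
example = predict_gqv_pd(2.0, 3.0, 0.5, 0.25)
print(example)
```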
CORRESPONDING COEFFICIENTS ARE ESTIMATED
TIME PRODUCT REGION VARIABLE # IMPRESSIONS SPEND ($$) GQV COEFF_VALUE
1/1/2015 COMP ALL TV_PRD_IM 176843600 19885.75 8092237 0.045756241
1/1/2015 DIGI ALL TV_PRD_IM 185465400 300730 3691062 0.01985766
1/1/2015 GAME ALL TV_PRD_IM 56838300 286989.75 421021 0.007270448
1/1/2015 COMP GL TV_LOCAL_PRD_SP 0 4679.25 8091825 0.027113343
1/1/2015 DIGI GL TV_LOCAL_PRD_SP 0 40392 3684004 0.027113343
1/1/2015 GAME GL TV_LOCAL_PRD_SP 0 37986.5 414270 0.027113343
1/1/2015 COMP MW TV_LOCAL_PRD_SP 0 15206.5 412 0.027113343
1/1/2015 DIGI MW TV_LOCAL_PRD_SP 0 260338 7058 0.027113343
1/1/2015 GAME MW TV_LOCAL_PRD_SP 0 249003.25 6751 0.027113343
FLATTENED DENORMALIZED DATA
Model Equation LOG(GQV_PD + 1) := TV_PRD_IM_LOGC*C(1) + TV_LOCAL_PRD_SP_LOGC*C(2)
CALCULATIONS ARE MATRIX OPERATIONS ON THE MODEL AND THE COEFFICIENTS
NUMBER OF COLUMNS IS NUMBER OF COEFFICIENTS
NUMBER OF ROWS IS NUMBER OF DISTINCT COMBINATIONS OF DIMENSIONS

Model Input
                     TV_PRD_IM   TV_PRD_IM   TV_PRD_IM   TV_LOCAL_PRD_SP
PRD=COMP & GEO=GL    176843600   0           0           4679.25
PRD=DIGI & GEO=GL    0           185465400   0           40392
PRD=GAME & GEO=GL    0           0           56838300    37986.5
PRD=COMP & GEO=MW    176843600   0           0           15206.5
PRD=DIGI & GEO=MW    0           185465400   0           260338
PRD=GAME & GEO=MW    0           0           56838300    249003.3

Coeff. Stack
C1, PRD=COMP    0.045756241
C1, PRD=DIGI    0.01985766
C1, PRD=GAME    0.007270448
C2, PRD=ALL     0.027113343

Outcome (GQV_PD)
PRD=COMP & GEO=GL    8091825
PRD=DIGI & GEO=GL    3684004
PRD=GAME & GEO=GL    414270
PRD=COMP & GEO=MW    412
PRD=DIGI & GEO=MW    7058
PRD=GAME & GEO=MW    6751
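The matrix step can be sketched as a plain matrix-vector product in pure Python. The printed outcomes are matched, to within rounding, by multiplying the raw inputs by the coefficient stack. One assumption is needed to get there: the MW rows are given zeros in the impressions columns, because with the impression values printed for those rows the product would not equal the printed MW outcomes. `mat_vec` and the variable names are hypothetical.

```python
# Coefficient stack: C1 per product (COMP, DIGI, GAME), then C2 for all products.
coeffs = [0.045756241, 0.01985766, 0.007270448, 0.027113343]

row_labels = ["COMP/GL", "DIGI/GL", "GAME/GL", "COMP/MW", "DIGI/MW", "GAME/MW"]

# One row per product/region combination, one column per coefficient.
# Assumption: impressions are credited to the GL rows only (zeros for MW).
model_input = [
    [176843600, 0, 0, 4679.25],
    [0, 185465400, 0, 40392.0],
    [0, 0, 56838300, 37986.5],
    [0, 0, 0, 15206.5],
    [0, 0, 0, 260338.0],
    [0, 0, 0, 249003.25],
]


def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a column vector."""
    return [sum(x * c for x, c in zip(row, vector)) for row in matrix]


outcome = mat_vec(model_input, coeffs)

# Outcome column as printed on the slide, for comparison.
printed = [8091825, 3684004, 414270, 412, 7058, 6751]
for label, value, expected in zip(row_labels, outcome, printed):
    print(label, round(value), expected)
```

Scoring a scenario is then the same product with a different input matrix, which is why the shape of the matrix (rows by coefficients) drives calculation cost.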
IMPACT OF INCREASING SPEND BY 10% IN 3RD QUARTER?
TIME PRODUCT REGION # IMPRESSIONS SPEND ($$) GQV
TID_3 ALL ALL 419147300 668366.05
TID_1 PID_1 ALL 176843600 19885.75 8092237
TID_1 PID_2 ALL 185465400 300730 3691062
TID_1 PID_3 ALL 56838300 286989.8 421021
TID_1 PID_1 GID_1 0 4679.25 8091825
TID_1 PID_2 GID_1 0 40392 3684004
TID_1 PID_3 GID_1 0 37986.5 414270
TID_1 PID_1 GID_2 0 15206.5 412
TID_1 PID_2 GID_2 0 260338 7058
TID_1 PID_3 GID_2 0 249003.25 6751
TID_3 PID_1 GID_1 0 5147.18
TID_3 PID_2 GID_1 0 44431.2
TID_3 PID_3 GID_1 0 41785.15
TID_3 PID_1 GID_2 0 16727.15
TID_3 PID_2 GID_2 0 286371.8
TID_3 PID_3 GID_2 0 273903.6
1ST STEP : DISTRIBUTE
DISTRIBUTE THE SPEND AT THE LEVEL WHERE THE MODEL IS DEFINED
TIME PRODUCT REGION VARIABLE # IMPRESSIONS SPEND ($$)
7/1/2015 ALL ALL M_TV_N_BRD_SP 419147300 668366.05
7/1/2015 COMP ALL M_TV_N_PD_IM 176843600
7/1/2015 DIGI ALL M_TV_N_PD_IM 185465400
7/1/2015 GAME ALL M_TV_N_PD_IM 56838300
7/1/2015 COMP GL M_TV_L_P_SP 0 5147.18
7/1/2015 DIGI GL M_TV_L_P_SP 0 44431.2
7/1/2015 GAME GL M_TV_L_P_SP 0 41785.15
7/1/2015 COMP MW M_TV_L_P_SP 0 16727.15
7/1/2015 DIGI MW M_TV_L_P_SP 0 286371.8
7/1/2015 GAME MW M_TV_L_P_SP 0 273903.6
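A minimal sketch of the distribute step, assuming the new total is allocated in proportion to the baseline spend at the product/region level. Proportional allocation is an assumption; for a uniform 10% increase it is the same as scaling every row by 1.10. The name `distribute` is hypothetical.

```python
# Baseline Q3 spend per (product, region), replicated from Q1.
baseline = {
    ("COMP", "GL"): 4679.25,
    ("DIGI", "GL"): 40392.0,
    ("GAME", "GL"): 37986.5,
    ("COMP", "MW"): 15206.5,
    ("DIGI", "MW"): 260338.0,
    ("GAME", "MW"): 249003.25,
}


def distribute(total, base):
    """Spread a top-level total over leaf rows in proportion to the baseline."""
    base_total = sum(base.values())
    return {key: total * value / base_total for key, value in base.items()}


# The user edits only the TID_3 / ALL / ALL number: +10% on 607605.5.
new_total = sum(baseline.values()) * 1.10
q3_spend = distribute(new_total, baseline)

for key, value in q3_spend.items():
    print(key, round(value, 2))
```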
2ND STEP : CALCULATE
TV_PRD_IM TV_PRD_IM TV_PRD_IM TV_LOCAL_PRD_SP
176843600 0 0 5147.18
0 185465400 0 44431.2
0 0 56838300 41785.15
176843600 0 0 16727.15
0 185465400 0 286371.8
0 0 56838300 273903.6
GQV_PD
8091838
3684114
414372.8
453.23
7763.86
7426.13
TIME PRODUCT REGION # IMPRESSIONS SPEND ($$) GQV
TID_3 ALL ALL 419147300 668366.05 12205968
TID_1 PID_1 ALL 176843600 19885.75 8092237
TID_1 PID_2 ALL 185465400 300730 3691062
TID_1 PID_3 ALL 56838300 286989.8 421021
TID_1 PID_1 GID_1 0 4679.25 8091825
TID_1 PID_2 GID_1 0 40392 3684004
TID_1 PID_3 GID_1 0 37986.5 414270
TID_1 PID_1 GID_2 0 15206.5 412
TID_1 PID_2 GID_2 0 260338 7058
TID_1 PID_3 GID_2 0 249003.25 6751
TID_3 PID_1 GID_1 0 5147.18 8091838
TID_3 PID_2 GID_1 0 44431.2 3684114
TID_3 PID_3 GID_1 0 41785.15 414372.8
TID_3 PID_1 GID_2 0 16727.15 453.23
TID_3 PID_2 GID_2 0 286371.8 7763.86
TID_3 PID_3 GID_2 0 273903.6 7426.13
3RD STEP : AGGREGATE
AGGREGATE THE OUTPUT AT THE DESIRED LEVEL
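The roll-up can be sketched as a group-by sum over the leaf rows. The example uses the 1st quarter GQV figures, because the fact table also prints their product-level totals and the result can be checked against them; the same function would apply to scored 3rd quarter rows. `aggregate` is a hypothetical helper.

```python
from collections import defaultdict

# Leaf-level GQV per (product, region) for Q1 2015, from the fact table.
leaf_gqv = {
    ("COMP", "GL"): 8091825,
    ("DIGI", "GL"): 3684004,
    ("GAME", "GL"): 414270,
    ("COMP", "MW"): 412,
    ("DIGI", "MW"): 7058,
    ("GAME", "MW"): 6751,
}


def aggregate(leaf, key_index):
    """Sum leaf values, grouped by one component of the key (0=product, 1=region)."""
    totals = defaultdict(int)
    for key, value in leaf.items():
        totals[key[key_index]] += value
    return dict(totals)


by_product = aggregate(leaf_gqv, 0)
by_region = aggregate(leaf_gqv, 1)

print(by_product)  # {'COMP': 8092237, 'DIGI': 3691062, 'GAME': 421021}
print(by_region)
```

The product totals equal the REGION = ALL rows of the fact table, which is the consistency check a desired-level aggregate should pass.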
Cube
131M granular rows
Dimensions: Publisher, Marketing Driver, Tactic, Creative Concept, Geo, Time, Campaign, Product, Non-Marketing Driver
Time: Year, Quarter, Month, Week
Non-Marketing Driver: Coupon redemption, Discounts, Macro economics, Pricing, Tabs, Weather, eCircular
Geo:
National
  Central: Dallas, Houston
  Great Lakes: Chicago, Cincinnati
  Northeast: Boston, New York
  Southeast: Atlanta, Charlotte
  West: Denver, Los Angeles, Seattle
Placement: Online Media Channel, Offline Media Channel, Social, Paid Social, Other, Affiliate, Display, Mobile, Video, Desktop, Other, Paid Search, Branded, Non Branded, Audio, Magazine, Radio, TV, Leads, Tab, Product Listing, Services Directory
Product:
All Products
  Dept 21: D21 Core, D21 Fencing
  Dept 22: D22 Concrete
  Dept 23F: D23F Area Rugs, D23F Carpeting
  Dept 24: D24 Applicators, D24 Caulks/Tape/Oth
  Dept 59: D59 Decor/Furniture, D59 Organization, D59 Window Coverings
Figures shown on the diagram: 866, 1590948, 23910, 336, 15790, 28, 82, 156, 33, 23
Measures: Assists, Orders, Revenue, Clicks, Events, Impressions, Spend, Last touches, Click converting rate, Converting click
SCENARIO ANALYSIS (AS SEEN BY THE USER)
Digital manager gets an additional 10% marketing budget to spend in Q4 2015 on online media and wants to understand its effect.
Spend view seen by the user in the application: Online ($), Q4 2015: 40,500,000
After the user increases the spend: Online ($), Q4 2015: 44,550,000
Scoring generates the KPI view: Revenue ($) 3%, Profit ($) 8%
EXPONENTIAL IMPROVEMENT IN SCENARIO CALCULATION
[Chart: scenario calculation time in seconds, by representative deployment and modeling scale]
Representative deployment: BANK CO. (2010), FINANCE CO. (2012), TELECOM CO. (2014), TRAVEL CO. (2015)
Modeling scale:
4.5 B data points, 600MM variables
170 MM data points, 90,000 variables
20 MM data points, 7,000 variables
2MM data points, 1,200 variables
LOT LESS EFFORT FOR MUCH LARGER DEPLOYMENTS
[Chart: total hours per variable by role (Strategist, Data Mgmt., Modeler, Ops Support), for BANK CO. (2010), FINANCE CO. (2012), TELECOM CO. (2014), TRAVEL CO. (2015)]
Software automation has driven a decline in service requirements.
The size of models has grown significantly, yet the deployment effort required has decreased significantly.
MS DECISIONCLOUD ANALYTICS WORKFLOW
[Architecture diagram; components shown:]
Distributed Cache, Tool, Application Engines, Calculation Engines, Execution Systems, Elastic Load Balancer
Client Onboarding, Model Store, Config Store, Metadata Store, Orchestrator
Attribution, Funnel creation, Post Processing
Modeling Stack, Transformation Stack, ETL Configurations, ETL
Model UAT, Attribution Models, Evaluate, Automated Model Generation
STAND ON THE SHOULDERS OF GIANTS
KEY FOCUS AREAS
o Configuration Driven Platform
  o All Modules run via Configurations
  o Will use Metadata to automatically fill in Configurations
o Real-time Simulation Engine
  o Real-time propagation of changes to modeling stack-frames
o Supporting infrastructure
  o Constraint Engine
  o Collaboration
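One way to picture metadata filling in configurations is the sketch below. It is purely illustrative: the config keys, the metadata layout and the `fill_config` helper are all hypothetical and are not taken from the platform.

```python
def fill_config(config, metadata):
    """Return a copy of config with unset (None) values taken from metadata."""
    filled = dict(config)
    for key, value in config.items():
        if value is None and key in metadata:
            filled[key] = metadata[key]
    return filled


# Hypothetical metadata captured once, for example at client onboarding.
metadata = {
    "time_grain": "quarter",
    "dimensions": ["TIME", "PRODUCT", "REGION"],
}

# Hypothetical module configuration with gaps left for metadata to fill.
etl_config = {
    "module": "etl",
    "time_grain": None,
    "dimensions": None,
    "target": "GQV",
}

resolved = fill_config(etl_config, metadata)
print(resolved)
```

Explicit values in a configuration win over metadata in this sketch, so a module can still be overridden by hand.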
Questions & Discussion