MODELS & DATA
1. A Four-Box Model of a DSS / BI System
2. Implicit vs Explicit Models
3. Typologies of Models
4. Types of Data
5. The Model-Data Interdependency
6. Is Quality Data Worth It?
7. A Predictive Model for Evaluating Pricing Policies
2
A FOUR-BOX MODEL OF A DSS/BI SYSTEM
[Figure: four-box model diagram. Labels: Manager, Decision Models, User Interface, Statistical Analysis, Database, Environment]
3
ANALYSIS OF DATA
Most frequently used operations are simple:
• Segregating data into groups
• Aggregating data
• Making comparisons
• Taking ratios
• Picking out exceptions
• Ranking, plotting, making tables, etc.
Standard statistical packages for:
• Time series analysis
• Moving averages
• Exponential smoothing
• Seasonal adjustments
• Trend curves
• Regression analysis, etc.
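Two of the time-series techniques named above can be sketched in a few lines. This is a minimal, dependency-free illustration, not the routine of any particular statistical package; the function names and the sample sales figures are hypothetical.

```python
from typing import List, Sequence


def moving_average(series: Sequence[float], window: int) -> List[float]:
    """Trailing moving average: the mean of the last `window` observations."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]


def exponential_smoothing(series: Sequence[float], alpha: float) -> List[float]:
    """Simple exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_(t-1)."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = [float(series[0])]  # initialise with the first observation
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


if __name__ == "__main__":
    sales = [100, 120, 110, 130, 125, 140]  # hypothetical weekly sales
    print(moving_average(sales, 3))
    print(exponential_smoothing(sales, 0.5))
```

A larger `window` or a smaller `alpha` smooths more heavily but reacts more slowly to genuine shifts in sales.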
4
MODELS AND DATA
Good data are vital ... but data for data's sake is a worthless luxury
John D.C. Little
Models provide a framework for identifying what data should be collected and how it should be processed once obtained
David Montgomery & Glen Urban
5
What is a Model?
Whenever a manager (or anybody else) looks at data, he or she has a preconceived idea of how the world works and therefore of what is interesting or worthwhile in the data. We shall call such ideas models.
John D. C. Little
Models provide the means for converting data into actionable information...
6
WHAT IS A MODEL ?
A model is the decision-maker's perception of how something works
All decisions are based on some kind of model
[Figure: Problem, Data Analysis, Actionable Information, Model]
7
IMPLICIT vs EXPLICIT MODELS
Implicit Models (or Mental Models) - Models carried in people's heads
Explicit Models - Prose Models, Flow Models, Mathematical Models
Key Issues
• Why do managers use implicit models?
• What are the benefits of explicating an implicit model?
• What problems are encountered when explicating an implicit model?
8
A Typology of Models - What Is the Purpose?
• Descriptive Models – describe how something works
• Predictive Models – provide “what if” information
• Normative Models – prescribe the “best” solution to the problem
9
A Typology of Models - How Is the Real World Represented?
• How is the model formulated? Linear vs. non-linear models
• How is time handled? Static vs. dynamic models
• How is risk handled? Deterministic vs. stochastic models
• At what level of detail? Micro vs. macro models
10
A Typology of Models - How Is the Model Analyzed?
• Optimization Models – determine the “best” values for the decision variables in the model
• Simulation Models – evaluate the consequences of alternative decisions
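The difference between simulation and optimization can be shown on one toy model. The linear demand curve, the unit cost and all numbers below are hypothetical assumptions chosen only for illustration, and the optimizer is a plain brute-force grid search rather than any specific optimization method.

```python
from typing import Dict, Iterable

# Hypothetical descriptive model: linear demand and a constant unit cost.
DEMAND_INTERCEPT = 1000.0  # units sold at a price of zero (assumption)
DEMAND_SLOPE = 20.0        # units lost per $1 of price (assumption)
UNIT_COST = 10.0           # $ per unit (assumption)


def profit(price: float) -> float:
    demand = max(DEMAND_INTERCEPT - DEMAND_SLOPE * price, 0.0)
    return (price - UNIT_COST) * demand


def simulate(prices: Iterable[float]) -> Dict[float, float]:
    """Simulation: report the consequences of each alternative the user proposes."""
    return {p: profit(p) for p in prices}


def optimize(low: float, high: float, step: float) -> float:
    """Optimization: brute-force grid search for the price with the highest profit."""
    n = int(round((high - low) / step))
    return max((low + i * step for i in range(n + 1)), key=profit)


if __name__ == "__main__":
    print(simulate([20.0, 25.0, 35.0]))  # "what if" for three user-chosen prices
    print(optimize(10.0, 50.0, 0.5))     # the model searches for the best price itself
```

Both functions use the same underlying model; the difference is who generates the alternatives, the user (simulation) or the search procedure (optimization).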
11
"SATISFICING" vs. "OPTIMIZING" IN DECISION-MAKING
Satisficing: choose a solution that is good enough, using the manager's rules of thumb or heuristics
Benefits: saves time and cost; easy to implement
versus
Optimizing: search for the best solution using an optimizing model
Problems: model may not fit the problem; more data needed; more time and cost; higher intellectual cost
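A sketch of the two decision rules, assuming each alternative can be scored by some value function (the schedules and scores below are hypothetical). Satisficing stops at the first alternative that meets an aspiration level; optimizing has to evaluate every alternative.

```python
from typing import Callable, List, Optional, Tuple, TypeVar

T = TypeVar("T")


def satisfice(options: List[T], value: Callable[[T], float],
              aspiration: float) -> Tuple[Optional[T], int]:
    """Return the first option that is 'good enough', plus how many were evaluated."""
    for evaluated, option in enumerate(options, start=1):
        if value(option) >= aspiration:
            return option, evaluated
    return None, len(options)


def optimize(options: List[T], value: Callable[[T], float]) -> Tuple[T, int]:
    """Return the best option; every option must be evaluated."""
    return max(options, key=value), len(options)


if __name__ == "__main__":
    # Hypothetical scores for five candidate delivery schedules.
    scores = {"A": 60, "B": 82, "C": 75, "D": 91, "E": 88}
    options = list(scores)
    print(satisfice(options, scores.get, aspiration=80))
    print(optimize(options, scores.get))
```

The evaluation count returned by each function is a rough stand-in for the time and cost trade-off listed on the slide.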
12
TYPES OF DATA
1. SECONDARY DATA - Readily available data
2. PRIMARY DATA - Data generated for the problem at hand
3. JUDGEMENTAL DATA - Data based on experience, knowledge and judgement
13
ITERATIVE PROCESS OF BUILDING MODELS
1. Define the Problem to be Addressed by the Model
2. List Relevant Factors - Do not worry about Data
3. Select the Most Critical Factors
4. Link the Selected Factors
5. Obtain the Required Data
6. Develop the System
7. Validate the Output from the System
8. Sensitivity Analysis of the Output from the System
14
RIMMS: A Model-Based System For Efficient Routing & Scheduling
Whirlpool - Schedules service calls of all technicians from a single site in Knoxville, Tennessee
Oakwood Medical Labs, Detroit - Arranges the 800 stops of 26 drivers each day to pick up blood samples from, and drop off time-sensitive results to, 1,000 clinics and hospitals
Sleepy’s, a Mattress Chain in Bethpage, N.Y. - Promises quicker home delivery than its competition
Homemakers, a Furniture Superstore in Des Moines, Iowa - Offers a two-hour window on next-day home delivery. Previously, “it would take two days to prepare the schedules and, even though we used to give a 4-hour delivery window, maybe we made it on time or maybe not.”
Source: Wall Street Journal, Apr 2, 1998
15
Biggest Strength: Good Data
Uses detailed street maps and other data affecting schedules, e.g., toll gates and posted speed limits
Users add data on scheduled stops, pickups and individual customer time-demands
Model calculates the best way to manage a day’s deliveries and pick-ups
Users can incorporate soft data on other relevant factors, for example:
– courier pick-ups take several minutes longer than drop-offs, a devilish problem that can throw off schedules
– a storm the previous night can slow driving speeds
16
"BAD" vs "GOOD" MODELS
Models that are simply wrong, e.g., a linear model of sales to advertising
Models that are too big
– require too much data
– "larger" is not always "better"
What is a "good" model?
– easy to understand
– complete on important issues
– just enough detail for operational accuracy
– judicious use of all types of data
17
EVALUATING MODELS
What are the objectives of the model ?
What is the scope of the model ?
What data will be used ?
How was the model validated ?
How sensitive is the output to:
- data inputs
- model structure
- analysis techniques
What significant factors have been excluded ?
18
The Model-Data Interdependency
The “Chicken or Egg” Question -- An Approach
• Build the simplest model
• Use judgmental data if necessary
• Test sensitivity of the information
• Get better data
• Or, improve the model
[Figure: Data and Model. The model is constrained by available data; the model specifies data requirements.]
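A minimal one-at-a-time sensitivity test in the spirit of the approach above: a deliberately simple model fed with judgmental low/high ranges. The revenue model, the input names and the ranges are all hypothetical. The input whose range moves the output the most is the natural candidate for "get better data".

```python
from typing import Dict, Tuple


def revenue(inputs: Dict[str, float]) -> float:
    """Deliberately simple (hypothetical) model: market size x share x price."""
    return inputs["market_size"] * inputs["share"] * inputs["price"]


def one_at_a_time(base: Dict[str, float],
                  ranges: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Swing in model output as each input moves across its judgmental
    low-high range while all other inputs stay at their base values."""
    swings = {}
    for name, (low, high) in ranges.items():
        out_low = revenue({**base, name: low})
        out_high = revenue({**base, name: high})
        swings[name] = abs(out_high - out_low)
    return swings


if __name__ == "__main__":
    base = {"market_size": 10_000, "share": 0.10, "price": 50.0}
    ranges = {"market_size": (9_000, 11_000), "share": (0.05, 0.15), "price": (45.0, 55.0)}
    swings = one_at_a_time(base, ranges)
    for name, swing in sorted(swings.items(), key=lambda kv: -kv[1]):
        print(f"{name:12s} swing = {swing:,.0f}")
```

If the largest swing does not change the decision, the judgmental data may be good enough; if it does, that input is where better data (or a better model) pays off.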
19
An Example: Forecasting Sales
• Time Series Models (e.g., Moving Averages, Exponential Smoothing)
– Data readily available
– Straightforward models
– BUT ... they ignore what causes sales
• Regression Models
– Better because they link sales to “explanatory” variables
– However ...
... Which variables? Cost of data?
... What type of relationship?
... Accuracy of projections of the explanatory variables?
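A minimal ordinary-least-squares sketch of a one-variable regression model for sales. The explanatory variable (number of stores carrying the brand), the data and the linear form are all hypothetical assumptions; choosing them is exactly the "which variables?" and "what type of relationship?" question raised above.

```python
from typing import Sequence, Tuple


def fit_simple_regression(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("explanatory variable has no variation")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b


if __name__ == "__main__":
    stores = [10, 20, 30, 40, 50]   # hypothetical: stores carrying the brand
    sales = [31, 49, 72, 88, 110]   # hypothetical: sales, thousands of units
    a, b = fit_simple_regression(stores, sales)
    # A sales forecast needs a projected value of the explanatory variable,
    # so it is only as accurate as that projection.
    projected_stores = 60
    print(f"sales = {a:.2f} + {b:.2f} * stores; "
          f"forecast at {projected_stores} stores = {a + b * projected_stores:.1f}")
```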
23
A Data Warehouse Is Not Enough Because ... Managers Ask for Analysis, Not Retrieval
Sometimes retrieval questions come up of course, but most often the answers to important questions require non-trivial manipulation of stored data. Knowing this tells us much about the kind of software required. For example, a database management system is not enough.
- John Little (1979)
“Data” has to be converted into “Information” that triggers managerial action.
The conversion process is critical to get value from the data warehouse.
24
Models Help in Data Conversion
• A framework for identifying what data should be collected and how it should be processed
– Avoids the “completeness” trap in building a data warehouse
• A “good” model:
– simple
– complete on important issues
– just enough detail for operational accuracy
– judicious use of hard and soft data
25
Better Models Require . . .
. . . More Data
. . . More Time to Develop
. . . And Cost More: not just $ but the intellectual cost
People tend to reject what they do not understand. The manager carries responsibility for outcomes. We should not be surprised if he prefers a simple analysis that he can grasp, even though it may have qualitative structure, broad assumptions, and only a little relevant data, to a complex model whose assumptions may be partially hidden or couched in jargon and whose parameters may be the result of obscure statistical manipulations. - John Little (1970)
26
How to Assess Cost-Effectiveness of Data - A Pragmatic Approach
1. Design a prototype scaled to the barest minimum
2. Collect data for the prototype (lowest data cost)
3. Develop the prototype using real data
4. Users evaluate the benefits of the system
5. Value vs. cost? “No Go”: stop. “Go”: build the full-blown system.
27
Case Example: A Consumer Packaged Goods Company
System Objective: To evaluate the sales impact of trade promotions
Data Problem: Serious gaps in operational data
Available data on promotions: how much was spent; when the bills were paid
Missing key data: when the promotions were run, which is needed to correlate promotions with sales data
Issue: Data problem is solvable in principle
But... Is it worth the effort and cost?
28
The Low-Cost Prototype- To Assess Value of Data
• Model limited to the core variables
– sales, promotion expenditures and dates, margins
• Detailed data needed for useful information
– by packs for each brand and by markets
– weekly data for capturing sales fluctuations
– two years of data to compare pre- with post-deal sales levels
• Cost of data
– Manual effort to extract dates of promotions from logbooks
• Barest-minimum Prototype
– 2 brands, a major brand and a new brand
– 8 markets (out of 50): 3 large, 3 medium and 2 small
• Results
– Demonstrated the value of collecting the missing data and building an integrated database
– Led to the development of a promotion-event calendar system
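Once promotion dates are recovered, the pre- versus post-deal comparison is simple arithmetic on weekly sales. This is a hypothetical sketch, not the system built in the case: it assumes a single promotion occupying contiguous weeks and uses a plain average of the surrounding weeks as the baseline.

```python
from statistics import mean
from typing import Dict, Sequence, Set


def promotion_effect(weekly_sales: Sequence[float], promo_weeks: Set[int],
                     window: int = 4) -> Dict[str, float]:
    """Compare average weekly sales before, during and after one promotion.

    Assumes `promo_weeks` holds the contiguous week indices of a single deal,
    e.g. recovered from logbooks or a promotion-event calendar."""
    start, end = min(promo_weeks), max(promo_weeks)
    before = weekly_sales[max(0, start - window):start]
    after = weekly_sales[end + 1:end + 1 + window]
    if not before or not after:
        raise ValueError("need sales history on both sides of the promotion")
    baseline = mean(before)
    during = mean(weekly_sales[w] for w in range(start, end + 1))
    return {
        "baseline": baseline,
        "during": during,
        "after": mean(after),
        "lift_pct": 100.0 * (during - baseline) / baseline,
    }


if __name__ == "__main__":
    sales = [100.0, 104.0, 96.0, 100.0, 150.0, 140.0, 90.0, 98.0]  # hypothetical
    print(promotion_effect(sales, {4, 5}))
```

Without the promotion dates (the missing data in this case), `promo_weeks` cannot be filled in and none of these figures can be computed.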
29
Gaps in Operational Data: A Perennial Problem -- Why?
Because of the narrow focus of operational systems
Operational systems are an important source of data for decision support
Design of operational systems must incorporate data requirements of management support systems
An Example:
When implementing new Human Resource Information Systems (e.g., PeopleSoft), are the data requirements of human resource management considered? For evaluating hiring sources? Career development? Etc.
30
The Product Pricing Problem
• Critical Problem for ALL Enterprises
– Private Sector and Public Sector
• Predicting Customer Response is Difficult
– Past behavior is of limited value
– Competitors’ reactions to “our” price are unpredictable
• Even More Difficult in the Public Sector
– Bottom-line impact is not enough
– Must consider: Who is affected? How?
31
Price and Demand Relationships Are Complex
• Highly non-linear
• Exhibit “threshold effects”
• Delayed response
• Price is only one factor -- other decision variables (e.g., distribution, promotion) interact with price to affect demand
• External factors, about which we have imperfect information, impact pricing decisions
32
The Transit Pricing Problem
• Current Fare Structure
– Essentially a “flat” fare
– Insensitive to distance traveled
• Inequities of Present Fare Structure
– Favors long trips at the expense of short ones
– Long-distance riders -- mostly suburban commuters with relatively high incomes
– Short-distance riders -- mostly urban residents traveling off-peak for discretionary purposes
– Thus, distance inequities often imply social inequities
33
Why Consider Distance-Based Fares?
• Evens out the fare per mile paid by all riders
– e.g., with a 25-cent flat fare:
» Rider #1 travels 1 mile and pays 25 cents per mile
» Rider #2 travels 5 miles and pays 5 cents per mile
• Drawback of Flat Fares: Long-distance riders are subsidized by short-distance riders
• Potential of Distance-Based Fares to:
– Reduce inequities in fare per mile
– Increase revenue
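The fare-per-mile comparison can be checked directly. The flat fare is the 25 cents used above; the distance-based parameters (10-cent base plus 5 cents per mile) are borrowed from the worked pricing example later in the deck and serve here only as an illustration.

```python
from typing import Callable


def flat_fare(miles: float, fare: float = 25.0) -> float:
    """Current policy: the same fare (cents) regardless of distance."""
    return fare


def distance_fare(miles: float, base: float = 10.0, per_mile: float = 5.0) -> float:
    """Distance-based policy: base fare plus an increment per mile (cents)."""
    return base + per_mile * miles


def fare_per_mile(fare_rule: Callable[[float], float], miles: float) -> float:
    return fare_rule(miles) / miles


if __name__ == "__main__":
    for miles in (1, 5):
        print(f"{miles} mile(s): flat {fare_per_mile(flat_fare, miles):.0f} cents/mile, "
              f"distance-based {fare_per_mile(distance_fare, miles):.0f} cents/mile")
```

Under the flat fare the 1-mile rider pays five times as much per mile as the 5-mile rider (25 vs. 5 cents); under this distance-based fare the gap narrows to roughly 2:1 (15 vs. 7 cents).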
34
Macro Models for Demand Forecasting - The Conventional Tool
• Operate on aggregate data
• Relate a measure of travel demand to a set of explanatory (“independent”) variables
– Measures of travel demand:
» # of passengers or # of trips
– Explanatory variables:
» Demographic variables (e.g., median income), trip characteristics (e.g., peak/off-peak), and decision variables (e.g., fares)
35
Macro Models versus Micro Models
• Macro Models are useless for evaluating who is affected by a change in transit fares
– For example:
» Would a price increase hurt inner city residents more or less than suburban commuters?
» Would loss in patronage be greater off-peak than peak?
» Would a lower fare benefit work trips? Shopping trips?
• A Micro Model at the level of the individual rider is needed to handle the variety of ridership characteristics such as age, income, place of residence, time and purpose of travel, etc.
36
Micro - Simulation Model
• The Micro Model focuses on the behavior of the individual rider: how is his/her transit usage affected by a fare change?
• The “what if” forecasts for the individual riders are then aggregated by age, income, purpose of trip, etc. to show what groups of riders would be affected by the fare change.
37
Gist of the Micro Model for Transit Pricing
1. Travel demand of a rider would change in a manner governed by the fare elasticity appropriate to that rider.
2. Forecast transit usage and revenue for individual riders in the sample survey.
3. Weight the individual rider’s figures by an expansion factor to project the results to the population.
4. Aggregate the weighted figures by the desired ridership categories to assess the revenue and equity effects.
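The four steps above can be sketched as a small micro-simulation. The rider fields, segment names, elasticity values and sample figures are hypothetical. The demand update multiplies present usage by (1 + elasticity × fare change ratio), which reproduces the deck's worked example (a 40% fare increase with elasticity -0.25 cuts demand by 10%).

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rider:
    segment: str          # hypothetical segment label, e.g. "suburban-peak"
    weekly_trips: float   # present usage reported in the sample survey
    trip_miles: float
    current_fare: float   # cents per trip
    expansion: float      # population riders represented by this respondent


def simulate_policy(riders: List[Rider], elasticity: Dict[str, float],
                    new_fare: Callable[[float], float]) -> Dict[str, Dict[str, float]]:
    totals: Dict[str, Dict[str, float]] = defaultdict(lambda: {
        "trips_before": 0.0, "trips_after": 0.0,
        "revenue_before": 0.0, "revenue_after": 0.0})
    for r in riders:
        # Steps 1-2: change this rider's demand using the segment's elasticity,
        # then forecast the rider's usage and revenue under the new fare.
        fare = new_fare(r.trip_miles)
        fare_change_ratio = (fare - r.current_fare) / r.current_fare
        trips = max(r.weekly_trips * (1 + elasticity[r.segment] * fare_change_ratio), 0.0)
        # Steps 3-4: weight by the expansion factor and aggregate by segment.
        t = totals[r.segment]
        t["trips_before"] += r.expansion * r.weekly_trips
        t["trips_after"] += r.expansion * trips
        t["revenue_before"] += r.expansion * r.weekly_trips * r.current_fare
        t["revenue_after"] += r.expansion * trips * fare
    return dict(totals)


def pct_change(before: float, after: float) -> float:
    return 100.0 * (after - before) / before


def distance_policy(miles: float) -> float:
    return 10 + 5 * miles  # base fare 10 cents + 5 cents per mile


if __name__ == "__main__":
    sample = [
        Rider("urban-offpeak", weekly_trips=6, trip_miles=1, current_fare=25, expansion=200),
        Rider("suburban-peak", weekly_trips=5, trip_miles=5, current_fare=25, expansion=100),
    ]
    elasticities = {"urban-offpeak": -0.4, "suburban-peak": -0.25}  # judgmental guesses
    for seg, t in simulate_policy(sample, elasticities, distance_policy).items():
        print(f"{seg}: ridership {pct_change(t['trips_before'], t['trips_after']):+.1f}%, "
              f"revenue {pct_change(t['revenue_before'], t['revenue_after']):+.1f}%")
```

Grouping by `segment` is one choice; the same loop could aggregate by income, age or trip purpose if those fields were carried on each rider.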
38
Merits of the Micro-Simulation Approach
1. Model is complete on important factors that affect demand -- income, age, purpose of trip, time of travel, etc. are all represented in the individual riders in the sample -- the “Micro” approach
2. The “what if” demand for a new fare policy is determined through the fare elasticity appropriate for that rider -- the “Simulation” approach
39
Merits of the Micro-Simulation Approach
3. The micro-simulation results can be subsequently aggregated by any desired rider characteristic for the equity analysis
4. Model is easy to understand -- critical, since users will not risk basing pricing decisions on a model they do not understand; even more so when multiple parties are involved, as in transit pricing
40
Design of the Transit Pricing Model
• Conventional wisdom: “the bigger the better”
• Problem: The more elaborate the model, the more data needed to set up the model
• For the model to be useful, it should be:
– Simple enough for transit managers to readily understand, but not simplistic
– Complete on important issues for a valid assessment of the impact of new fare policies
– Independent of historical data for calibration
– Able to generate outputs that the user finds easy to interpret
41
What is the Model?
• Forecast Usage for Rider #1 = Present Usage of Rider #1 x (1 + Elasticity of Rider #1 x Fare Change Ratio), where Fare Change Ratio = (New Fare - Present Fare) / Present Fare
• Above equation adjusts the current demand through a ratio based on the fare elasticity that is appropriate for that rider
• Micro-simulation is better than a macro regression model in an important way -- the model is robust because reasonable values for the elasticity will not yield unreasonable values for forecast demand
42
An Example
• Individual X uses the travel system 5 times per week paying a flat fare of 25 ¢ and traveling a distance of 5 miles per trip
• Proposed distance-based fare policy: a base fare of 10¢ and a 5¢ increment per mile
• New fare for this rider is 35 ¢ per trip
• % change in fare paid by this rider = (10 ¢/25 ¢) x 100 = 40%
• % change in frequency of ridership = (% change in fare paid) x E, where E = “fare elasticity of demand” = % change in demand for a 1% change in fare
• e.g., an E value of -.25 implies that a 1% increase in fare will reduce demand by .25%
• Hence, for the 40% increase in fare paid by this rider under the new policy, the percent reduction in demand is predicted to be 10%
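The arithmetic of this example, carried one step further to the rider's new weekly usage and spending (no figures beyond those given for Individual X are assumed):

```latex
\begin{aligned}
\text{fare change ratio} &= \frac{35 - 25}{25} = 0.40 \\
\text{proportional change in trips} &= E \times 0.40 = (-0.25)(0.40) = -0.10 \\
\text{new weekly trips} &= 5 \times (1 - 0.10) = 4.5 \\
\text{weekly spending} &: \; 5 \times 25 = 125 \text{ cents} \;\rightarrow\; 4.5 \times 35 = 157.5 \text{ cents} \quad (+26\%)
\end{aligned}
```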
43
Key Features of the Model
1. Different fare elasticities can be applied to individual riders, thus making the model complete on important factors that affect travel demand
2. Calibration of the model involves the estimation of only one parameter - fare elasticity
3. To simplify the calibration, segment the sample of riders into groups that are expected to have the same elasticity
4. Since fare elasticity has a clear operational meaning, it is feasible for the transit managers to judgmentally segment the market and estimate fare elasticities for each segment
44
Decision Calculus Concept
• A model-based set of procedures for processing data and judgments to assist a manager in decision making
• Enables more policy alternatives to be examined than if the manager relied on judgment alone
• Uses sensitivity analysis to test the robustness of the conclusions with regard to the soft data inputs used in the analysis
• Key element of this concept is its approach to calibration: Use the manager’s judgment, especially when available data are either inadequate or dirty
45
How the Model Works
1. For an individual rider in the sample survey:
– The model calculates the % change in frequency of ridership for the proposed fare change based on the elasticity appropriate for that rider
– The model applies this % change to current weekly frequency of ridership to obtain predicted new frequency with the proposed policy
– The model calculates the fare paid per trip under the new policy and the predicted weekly revenue for the individual rider
46
How the Model Works
2. Predicted ridership and revenue figures for each rider are expanded by suitable factors to project the sample to the ridership population
3. The expanded ridership and revenue figures are then aggregated according to income, age, etc.
4. Computer output includes % changes in ridership and revenue to facilitate “before” and “after” comparisons
47
Why the Model Works
• Crux of the model: fare elasticity, which managers can estimate judgmentally, using historical estimates (if available) as a first cut
• Since not all riders in the population react in the same way to fare changes, the population should first be subdivided into segments whose members are expected to respond similarly to fare changes
• Since elasticity estimates are soft, sensitivity analysis has to be done using multiple elasticity values, to select a fare policy that performs in a satisficing manner across the range of estimates used
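A sketch of this sensitivity step: evaluate each candidate fare policy under several elasticity values and keep only the policies that meet a revenue target in every scenario. The survey sample, the candidate policies, the elasticity range and the 95%-of-current-revenue target are hypothetical, and, to stay short, each scenario applies one elasticity to all riders rather than segment-specific values.

```python
from typing import Iterable, List, Tuple

# Hypothetical survey sample: (weekly_trips, trip_miles, expansion_factor)
SAMPLE: List[Tuple[float, float, float]] = [
    (6.0, 1.0, 200.0),
    (5.0, 5.0, 100.0),
    (10.0, 8.0, 50.0),
]
FLAT_FARE = 25.0  # present flat fare, cents


def weekly_revenue(base: float, per_mile: float, elasticity: float) -> float:
    """Expanded weekly revenue (cents) for a base + per-mile fare policy.
    Simplification: one elasticity for every rider in a given scenario."""
    total = 0.0
    for trips, miles, weight in SAMPLE:
        fare = base + per_mile * miles
        ratio = (fare - FLAT_FARE) / FLAT_FARE
        new_trips = max(trips * (1 + elasticity * ratio), 0.0)
        total += weight * new_trips * fare
    return total


def satisficing_policies(policies: Iterable[Tuple[float, float]],
                         elasticities: Iterable[float],
                         min_revenue: float) -> List[Tuple[float, float]]:
    """Keep the policies whose revenue meets the target for EVERY elasticity tried."""
    scenarios = list(elasticities)
    return [p for p in policies
            if all(weekly_revenue(p[0], p[1], e) >= min_revenue for e in scenarios)]


if __name__ == "__main__":
    # With the present flat fare the fare change ratio is 0, so elasticity is irrelevant.
    current = weekly_revenue(FLAT_FARE, 0.0, elasticity=-0.3)
    target = 0.95 * current  # hypothetical goal: keep at least 95% of current revenue
    candidates = [(10.0, 5.0), (15.0, 3.0)]
    for base, per_mile in candidates:
        revs = [weekly_revenue(base, per_mile, e) for e in (-0.2, -0.4)]
        print(base, per_mile, [round(r) for r in revs])
    print("robust:", satisficing_policies(candidates, (-0.2, -0.4), target))
```

A policy that clears the target only under the most favorable elasticity is exactly the kind of choice this sensitivity step is meant to screen out.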