MODELS & DATA

1

MODELS & DATA

1. A Four-Box Model of a DSS / BI System

2. Implicit vs Explicit Models

3. Typologies of Models

4. Types of Data

5. The Model-Data Interdependency

6. Is Quality Data Worth It?

7. A Predictive Model for Evaluating Pricing Policies

2

A FOUR-BOX MODEL OF A DSS/BI SYSTEM

MANAGER

DECISION MODELS

USER INTERFACE

STATISTICAL ANALYSIS

DATABASE

ENVIRONMENT

3

ANALYSIS OF DATA

Most frequently used operations are simple:
- Segregating data into groups
- Aggregating data
- Making comparisons
- Taking ratios
- Picking out exceptions
- Ranking, plotting, making tables, etc.

Standard statistical packages for:
- Time series analysis
- Moving averages
- Exponential smoothing
- Seasonal adjustments
- Trend curves
- Regression analysis, etc.
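Two of the operations named on this slide, moving averages and exponential smoothing, can be sketched in a few lines. This is a minimal illustration, not a replacement for a statistical package; the function names, the `weekly_sales` figures and the choice of `window=3` and `alpha=0.5` are assumptions made for the example.

```python
def moving_average(series, window):
    """Trailing moving average: one value for each full window of data."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]


def exponential_smoothing(series, alpha):
    """Simple exponential smoothing, seeded with the first observation."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = [series[0]]
    for value in series[1:]:
        # New estimate = weighted blend of the latest value and the old estimate.
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical weekly sales figures, for illustration only.
weekly_sales = [100, 120, 110, 130, 125, 140]
ma3 = moving_average(weekly_sales, window=3)
ses = exponential_smoothing(weekly_sales, alpha=0.5)
```

A larger `window` or a smaller `alpha` smooths more heavily but reacts more slowly to a real change in the sales level.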

4

MODELS AND DATA

Good data are vital ... but data for data's sake is a worthless luxury

John D.C. Little

Models provide a framework for identifying what data should be collected and how it should be processed once obtained

David Montgomery & Glen Urban

5

What is a Model?

Whenever a manager (or anybody else) looks at data, he or she has a preconceived idea of how the world works and therefore of what is interesting or worthwhile in the data. We shall call such ideas models.

John D. C. Little

Models provide the means for converting data into actionable information...

6

WHAT IS A MODEL ?

A model is the decision-maker's perception of how something works

All decisions are based on some kind of model

PROBLEM

DATA ANALYSIS

ACTIONABLE INFORMATION

MODEL

7

IMPLICIT vs EXPLICIT MODELS

Implicit Models (or Mental Models)
- Models carried in people's heads

Explicit Models
- Prose Models
- Flow Models
- Mathematical Models

Key Issues
- Why do managers use implicit models?
- What are the benefits of explicating an implicit model?
- What problems are encountered when explicating an implicit model?

8

A Typology of Models - What is the Purpose?

• Descriptive Models
– Describe how something works

• Predictive Models
– Provide “what if” information

• Normative Models
– Prescribe the “best” solution to the problem

9

A Typology of Models - How is the Real World Represented?

• How is the model formulated? Linear vs. Non-linear Models

• How is time handled? Static vs. Dynamic Models

• How is risk handled? Deterministic vs. Stochastic Models

• At what level of detail? Micro vs. Macro Models

10

A Typology of Models - How is the Model Analyzed?

• Optimization Models
– Determine the “best” values for the decision variables in the model

• Simulation Models
– Evaluate the consequences of alternative decisions

11

"SATISFICING" vs. "OPTIMIZING" IN DECISION-MAKING

Choose a solution that is good enough using the manager's rules of thumb or heuristics.

Benefits:
- Saves time and cost
- Easy to implement

versus

Search for the best solution using an optimizing model.

Problems:
- Model may not fit the problem
- More data needed
- More time and cost
- Higher intellectual cost

12

TYPES OF DATA

1. SECONDARY DATA - Readily available data

2. PRIMARY DATA - Data generated for the problem at hand

3. JUDGEMENTAL DATA - Data based on experience, knowledge and judgement

13

ITERATIVE PROCESS OF BUILDING MODELS

1. Define the Problem to be Addressed by the Model

2. List Relevant Factors - Do not worry about Data

3. Select the Most Critical Factors

4. Link the Selected Factors

5. Obtain the Required Data

6. Develop the System

7. Validate the Output from the System

8. Sensitivity Analysis of the Output from the System

14

RIMMS: A Model-Based System For Efficient Routing & Scheduling

Whirlpool
- Schedules service calls of all technicians from a single site in Knoxville, Tennessee

Oakwood Medical Labs, Detroit
- Arranges the 800 stops of 26 drivers each day to pick up blood samples from, and drop off time-sensitive results to, 1000 clinics and hospitals

Sleepy’s - A Mattress Chain in Bethpage, N.Y.
- Promises quicker home delivery than its competition

Homemakers, a Furniture Superstore in Des Moines, Iowa
- Offers a two-hour window on next-day home delivery
- Previously, “it would take two days to prepare the schedules and, even though we used to give a 4-hour delivery window, maybe we made it on time or maybe not.”

Source: Wall Street Journal, Apr 2, 1998

15

Biggest Strength: Good Data

Uses detailed street maps and other data affecting schedules, e.g., toll gates and posted speed limits

Users add data on scheduled stops, pickups and individual customer time-demands

Model calculates the best way to manage a day’s deliveries and pick-ups

Users can incorporate soft data on other relevant factors, for example:
- courier pick-ups take several minutes longer than drop-offs, a devilish problem that can throw off schedules
- how a storm the previous night can slow driving speeds

16

"BAD" vs "GOOD" MODELS

Models that are simply wrong
- e.g., linear model of sales to advertising

Models that are too big
- require too much data
- "larger" is not always "better"

What is a "good" model?
- easy to understand
- complete on important issues
- just enough detail for operational accuracy
- judicious use of all types of data

17

EVALUATING MODELS

What are the objectives of the model ?

What is the scope of the model ?

What data will be used ?

How was the model validated ?

How sensitive is the output to:

- data inputs

- model structure

- analysis techniques

What significant factors have been excluded ?

18

The Model-Data Interdependency

The “Chicken or Egg” Question -- An Approach

• Build the simplest model

• Use judgmental data if necessary

• Test sensitivity of the information

• Get better data

• Or, improve the model

[Diagram: DATA and MODEL linked in both directions]

- The model is constrained by available data
- The model specifies data requirements

19

An Example: Forecasting Sales

• Time Series Models (e.g., Moving Averages, Exponential Smoothing)
– Data readily available
– Straightforward models
– BUT ... they ignore what causes sales

• Regression Models
– Better because they link sales to “explanatory” variables
– However ...
» Which variables? Cost of data?
» What type of relationship?
» Accuracy of projections of the explanatory variables?

[Chart: Trends in Rx Sales vs Symptoms]

[Chart: Our Promotions vs Comp. Promotions]

[Chart: Actual vs Predicted Rx Sales]

Rx Sales = 527 + 0.13*Symptoms + 74*(Our Prom / Comp Prom)
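To see how the fitted equation turns inputs into a forecast, it can be evaluated directly. The sketch below is only an illustration: the function and argument names are hypothetical, the input values are made up, and the slides do not state the units of any variable.

```python
def predicted_rx_sales(symptoms, our_promotion, competitor_promotion):
    """Evaluate the regression equation from the slide for one period."""
    if competitor_promotion <= 0:
        raise ValueError("competitor_promotion must be positive")
    promotion_ratio = our_promotion / competitor_promotion
    return 527 + 0.13 * symptoms + 74 * promotion_ratio


# Hypothetical inputs: same symptom level, promotion at parity vs doubled.
baseline = predicted_rx_sales(symptoms=1000, our_promotion=50, competitor_promotion=50)
more_promotion = predicted_rx_sales(symptoms=1000, our_promotion=100, competitor_promotion=50)

# Each unit increase in the promotion ratio adds 74 to predicted Rx sales.
uplift = more_promotion - baseline
```

Note that the forecast is only as good as the projections of `symptoms` and of the competitor's promotion, which is the "accuracy of projections" concern raised for regression models.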

23

A Data Warehouse is Not Enough Because ... Managers Ask for Analysis, Not Retrieval

Sometimes retrieval questions come up of course, but most often the answers to important questions require non-trivial manipulation of stored data. Knowing this tells us much about the kind of software required. For example, a database management system is not enough.

- John Little (1979)

“Data” has to be converted into “Information” that triggers managerial action.

The conversion process is critical to get value from the data warehouse.

24

Models Help in Data Conversion

• A framework for identifying what data should be collected and how it should be processed
– Avoids the “completeness” trap in building a data warehouse

• A “good” model is ...
– simple
– complete on important issues
– just enough detail for operational accuracy
– judicious use of hard and soft data

25

Better Models Require . . .

. . . More Data

. . . More Time to Develop

. . . And, Cost More

Not just $ but the Intellectual Cost

People tend to reject what they do not understand. The manager carries responsibility for outcomes. We should not be surprised if he prefers a simple analysis that he can grasp, even though it may have a qualitative structure, broad assumptions, and only a little relevant data, to a complex model whose assumptions may be partially hidden or couched in jargon and whose parameters may be the result of obscure statistical manipulations.

- John Little (1970)

26

How to Assess Cost-Effectiveness of Data- A Pragmatic Approach

Design a Prototype scaled to the barest minimum

Collect data for the Prototype

- Lowest data cost

Develop Prototype using real data

Users evaluate benefits of system

Value vs Cost?

- “No Go”: Stop
- “Go”: Full-blown System

27

Case Example: A Consumer Packaged Goods Company

System Objective: To evaluate sales impact of trade promotions

Data Problem: Serious gaps in operational data

Available data on promotions:
- How much was spent
- When the bills were paid

Missing key data:
- When the promotions were run ... to correlate with sales data

Issue: Data problem is solvable in principle

But... Is it worth the effort and cost?

28

The Low-Cost Prototype- To Assess Value of Data

• Model limited to the core variables
– sales, promotion expenditures and dates, margins

• Detailed data needed for useful information
– by packs for each brand and by markets
– weekly data for capturing sales fluctuations
– two years of data to compare pre- with post-deal sales levels

• Cost of data
– Manual effort to extract dates of promotions from logbooks

• Barest-minimum Prototype
– 2 brands, a major brand and a new brand
– 8 markets (out of 50), 3 large, 3 medium and 2 small

• Results
– Demonstrated the value of collecting the missing data and building an integrated database
– Led to the development of a promotion-event calendar system

29

Gaps in Operational Data: A Perennial Problem -- Why?

Because of the narrow focus of operational systems

Operational systems are an important source of data for decision support

Design of operational systems must incorporate data requirements of management support systems

An Example:

When implementing new Human Resource Information Systems (e.g., PeopleSoft), are the data requirements of human resource management considered? For evaluating hiring sources? Career development? Etc.

30

The Product Pricing Problem

• Critical Problem for ALL Enterprises
– Private Sector and Public Sector

• Predicting Customer Response is Difficult
– Past behavior is of limited value
– Competitors’ reactions to “our” price are unpredictable

• Even More Difficult in the Public Sector
– Bottom-line impact is not enough
– Must consider: Who is affected? How?

31

Price and Demand Relationships Are Complex

• Highly non-linear

• Exhibit “threshold effects”

• Delayed response

• Price is only one factor -- other decision variables (e.g., distribution, promotion) interact with price to affect demand

• External factors, about which we have imperfect information, impact pricing decisions

32

The Transit Pricing Problem

• Current Fare Structure
– Essentially a “flat” fare
– Insensitive to distance traveled

• Inequities of Present Fare Structure
– Favors long trips at the expense of short ones
– Long-distance riders -- mostly suburban commuters with relatively high incomes
– Short-distance riders -- mostly urban residents traveling off-peak for discretionary purposes
– Thus, distance inequities often imply social inequities

33

Why Consider Distance-Based Fares?

• Evens out the fare per mile paid by all riders
– e.g., with a 25 cent Flat Fare:
» Rider #1 travels 1 mile and pays 25 cents per mile
» Rider #2 travels 5 miles and pays 5 cents per mile

• Drawback of Flat Fares: Long-distance riders are subsidized by short-distance riders

• Potential of Distance-Based Fares to:
– Reduce inequities in fare per mile
– Increase revenue
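The fare-per-mile comparison can be checked with a short calculation. This is a sketch under stated assumptions: the function names are hypothetical, and the distance-based structure (a 10 cent base plus 5 cents per mile) is borrowed from the worked example later in the deck purely for illustration.

```python
def fare_per_mile(fare_cents, miles):
    """Fare paid per mile for one trip, in cents."""
    if miles <= 0:
        raise ValueError("miles must be positive")
    return fare_cents / miles


def distance_based_fare(miles, base_cents, per_mile_cents):
    """Fare for one trip under a base-plus-increment structure, in cents."""
    return base_cents + per_mile_cents * miles


FLAT_FARE_CENTS = 25  # flat fare used on the slide

# Cents per mile for a 1-mile rider and a 5-mile rider under each structure.
flat = {miles: fare_per_mile(FLAT_FARE_CENTS, miles) for miles in (1, 5)}
distance = {
    miles: fare_per_mile(distance_based_fare(miles, 10, 5), miles)
    for miles in (1, 5)
}
# flat     -> {1: 25.0, 5: 5.0}
# distance -> {1: 15.0, 5: 7.0}
```

Under these assumed numbers the short-trip rider still pays more per mile, but the gap narrows from five times to roughly two times the long-trip rider's rate.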

34

Macro Models for Demand Forecasting - The Conventional Tool

• Operate on aggregate data

• Relate a measure of travel demand to a set of explanatory (“independent”) variables

– Measures of travel demand:

» # of passengers or # of trips

– Explanatory variables:

» Demographic variables (e.g., median income), trip characteristics (e.g., peak/off-peak), and decision variables (e.g., fares)

35

Macro Models versus Micro Models

• Macro Models are useless for evaluating who is affected by a change in transit fares

– For example:

» Would a price increase hurt inner city residents more or less than suburban commuters?

» Would loss in patronage be greater off-peak than peak?

» Would a lower fare benefit work trips? Shopping trips?

• A Micro Model at the level of the individual rider is needed to handle the variety of ridership characteristics such as age, income, place of residence, time and purpose of travel, etc.

36

Micro - Simulation Model

• The Micro Model focuses on the behavior of the individual rider: how is his/her transit usage affected by a fare change?

• The “what if” forecasts for the individual riders are then aggregated by age, income, purpose of trip, etc. to show what groups of riders would be affected by the fare change.

37

Gist of the Micro Model for Transit Pricing

1. Travel demand of a rider would change in a manner governed by the fare elasticity appropriate to that rider.

2. Forecast transit usage and revenue for individual riders in the sample survey.

3. Weight the individual rider’s figures by an expansion factor to project the results to the population.

4. Aggregate the weighted figures by the desired ridership categories to assess the revenue and equity effects.

38

Merits of the Micro-Simulation Approach

1. Model is complete on important factors that affect demand -- income, age, purpose of trip, time of travel, etc. are all represented in the individual riders in the sample -- the “Micro” approach

2. The “what if” demand for a new fare policy is determined through the fare elasticity appropriate for that rider -- the “Simulation” approach

39

Merits of the Micro-Simulation Approach

3. The micro-simulation results can be subsequently aggregated by any desired rider characteristic for the equity analysis

4. Model is easy to understand -- critical, since otherwise the user will not risk using it for pricing decisions; even more so when a multiplicity of parties is involved, as in transit pricing

40

Design of the Transit Pricing Model

• Conventional wisdom: “the bigger the better”

• Problem: The more elaborate the model, the more data needed to set up the model

• For the model to be useful, it should be:
– Simple enough for transit managers to readily understand, but not simplistic
– Complete on important issues for a valid assessment of the impact of new fare policies
– Calibrated without relying on historical data
– Able to generate outputs that the user finds easy to interpret

41

What is the Model?

• Forecast Usage for Rider #1 = Present Usage of Rider #1 x (1 + Elasticity of Rider #1 x Fare Change Ratio), where Fare Change Ratio = (New Fare - Present Fare) / Present Fare

• Above equation adjusts the current demand through a ratio based on the fare elasticity that is appropriate for that rider

• Micro-simulation is better than a macro regression model in an important way -- the model is robust because reasonable values for the elasticity will not yield unreasonable values for forecast demand

42

An Example

• Individual X uses the travel system 5 times per week paying a flat fare of 25 ¢ and traveling a distance of 5 miles per trip

• Proposed distance-based fare policy: a base fare of 10¢ and a 5¢ increment per mile

• New fare for this rider is 35 ¢ per trip

• % change in fare paid by this rider = (10 ¢/25 ¢) x 100 = 40%

• % change in frequency of ridership = (% change in fare paid) x E

• E = “fare elasticity of demand” = % change in demand for a 1% change in fare

• e.g., an E value of -.25 implies that a 1% increase in fare will reduce demand by .25%

• Hence, for the 40% increase in fare paid by this rider under the new policy, the percent reduction in demand is predicted to be 10%
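The arithmetic of this example can be reproduced step by step. The sketch below uses the figures from the slide (25¢ flat fare, 10¢ base plus 5¢ per mile, 5 miles, 5 trips per week, E = -0.25); the function names are hypothetical, and applying the percent change to the current weekly frequency follows the description of how the model works.

```python
def distance_based_fare(miles, base_cents=10, per_mile_cents=5):
    """New fare per trip under the proposed policy, in cents."""
    return base_cents + per_mile_cents * miles


def forecast_rider(current_trips, old_fare, new_fare, elasticity):
    """Return (% fare change, % demand change, forecast weekly trips)."""
    pct_fare_change = (new_fare - old_fare) / old_fare * 100
    pct_demand_change = pct_fare_change * elasticity
    new_trips = current_trips * (1 + pct_demand_change / 100)
    return pct_fare_change, pct_demand_change, new_trips


new_fare = distance_based_fare(miles=5)  # 35 cents, as on the slide
pct_fare, pct_demand, new_trips = forecast_rider(
    current_trips=5, old_fare=25, new_fare=new_fare, elasticity=-0.25
)
# pct_fare is the 40% increase and pct_demand the 10% reduction from the slide.

# Weekly revenue from this rider before and after, in cents.
revenue_before = 5 * 25
revenue_after = new_trips * new_fare
```

For this rider the model predicts fewer trips but more revenue, because the 40% fare increase outweighs the 10% drop in usage.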

43

Key Features of the Model

1. Different fare elasticities can be applied to individual riders, thus making the model complete on important factors that affect travel demand

2. Calibration of the model involves the estimation of only one parameter - fare elasticity

3. To simplify the calibration, segment the sample of riders into groups that are expected to have the same elasticity

4. Since fare elasticity has a clear operational meaning, it is feasible for the transit managers to judgmentally segment the market and estimate fare elasticities for each segment

44

Decision Calculus Concept

• A model-based set of procedures for processing data and judgments to assist a manager in decision making

• Enables more policy alternatives to be examined than if the manager relied on judgment alone

• Uses sensitivity analysis to test the robustness of the conclusions with regard to the soft data inputs used in the analysis

• Key element of this concept is its approach to calibration: Use the manager’s judgment, especially when available data are either inadequate or dirty

45

How the Model Works

1. For an individual rider in the sample survey:

– The model calculates the % change in frequency of ridership for the proposed fare change based on the elasticity appropriate for that rider

– The model applies this % change to current weekly frequency of ridership to obtain predicted new frequency with the proposed policy

– The model calculates the fare paid per trip under the new policy and the predicted weekly revenue for the individual rider

46

How the Model Works

2. Predicted ridership and revenue figures for each rider are expanded by suitable factors to project the sample to the ridership population

3. The expanded ridership and revenue figures are then aggregated according to income, age, etc.

4. Computer output includes % changes in ridership and revenue to facilitate “before” and “after” comparisons
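Steps 1 to 4 can be sketched end to end as follows. This is a minimal illustration, not the original system: the `Rider` fields, the two sample riders, their elasticities and their expansion factors are all hypothetical, and a single `group` label stands in for the income, age and other categories used for aggregation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Rider:
    group: str               # aggregation category, e.g. an income band
    weekly_trips: float      # current frequency of ridership
    miles_per_trip: float
    elasticity: float        # judgmental fare elasticity for the rider's segment
    expansion_factor: float  # population riders represented by this respondent


def simulate(riders, old_fare, base, per_mile):
    """Return % changes in ridership and revenue for each group."""
    totals = defaultdict(lambda: defaultdict(float))
    for r in riders:
        # Step 1: per-rider forecast under the proposed fare.
        new_fare = base + per_mile * r.miles_per_trip
        fare_change = (new_fare - old_fare) / old_fare
        new_trips = r.weekly_trips * (1 + r.elasticity * fare_change)
        # Steps 2 and 3: expand to the population and aggregate by group.
        t = totals[r.group]
        t["trips_before"] += r.expansion_factor * r.weekly_trips
        t["trips_after"] += r.expansion_factor * new_trips
        t["rev_before"] += r.expansion_factor * r.weekly_trips * old_fare
        t["rev_after"] += r.expansion_factor * new_trips * new_fare
    # Step 4: "before" vs "after" comparison as % changes.
    return {
        group: {
            "pct_ridership_change": 100 * (t["trips_after"] / t["trips_before"] - 1),
            "pct_revenue_change": 100 * (t["rev_after"] / t["rev_before"] - 1),
        }
        for group, t in totals.items()
    }


# Hypothetical two-rider sample, fares in cents.
sample = [
    Rider("suburban", weekly_trips=5, miles_per_trip=5, elasticity=-0.25, expansion_factor=100),
    Rider("urban", weekly_trips=4, miles_per_trip=1, elasticity=-0.5, expansion_factor=200),
]
report = simulate(sample, old_fare=25, base=10, per_mile=5)
```

Because every respondent carries their own characteristics, the same loop can aggregate by any other field added to `Rider` without changing the forecast logic.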

47

Why the Model Works

• Crux of the model: fare elasticity which can be judgmentally estimated by managers using historical estimates, if available, as a first cut.

• Since all riders in the population do not react in the same way to fare changes, the population should be first subdivided into segments whose members are expected to be fairly similar in terms of their responses to fare changes

• Since elasticity estimates are soft, sensitivity analysis has to be done using multiple elasticity values to select a fare policy that performs in a satisficing manner with the range of estimates used
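One way to run such a sensitivity analysis is to scale every judgmental elasticity up and down and check whether a candidate fare policy stays acceptable in all scenarios. The sketch below is an assumption-laden illustration: the rider tuples, the scaling factors, the candidate policies and the acceptance thresholds are all hypothetical, and "satisficing" is interpreted here as meeting both thresholds in every scenario.

```python
def evaluate(policy, riders, old_fare, elasticity_scale):
    """Return (% ridership change, % revenue change) for one scenario."""
    base, per_mile = policy
    trips_before = trips_after = rev_before = rev_after = 0.0
    for trips, miles, elasticity, weight in riders:
        new_fare = base + per_mile * miles
        fare_change = (new_fare - old_fare) / old_fare
        new_trips = trips * (1 + elasticity * elasticity_scale * fare_change)
        trips_before += weight * trips
        trips_after += weight * new_trips
        rev_before += weight * trips * old_fare
        rev_after += weight * new_trips * new_fare
    return (
        100 * (trips_after / trips_before - 1),
        100 * (rev_after / rev_before - 1),
    )


def satisfices(policy, riders, old_fare, scales, max_ridership_loss, min_revenue_gain):
    """True if the policy meets both thresholds under every elasticity scenario."""
    for scale in scales:
        ridership, revenue = evaluate(policy, riders, old_fare, scale)
        if ridership < -max_ridership_loss or revenue < min_revenue_gain:
            return False
    return True


# Hypothetical riders: (weekly trips, miles per trip, elasticity, expansion factor).
riders = [(5, 5, -0.25, 100), (4, 1, -0.5, 200)]
scales = [0.5, 1.0, 1.5]  # halve, keep, and increase by half the soft estimates

# Candidate policies as (base fare, increment per mile), in cents.
ok_10_5 = satisfices((10, 5), riders, 25, scales, max_ridership_loss=5, min_revenue_gain=0)
ok_15_5 = satisfices((15, 5), riders, 25, scales, max_ridership_loss=5, min_revenue_gain=0)
```

With these made-up inputs the 10¢ base policy loses revenue in every scenario, while the 15¢ base policy clears both thresholds across the whole range, so it would be the satisficing choice of the two.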
