Data Clustering for Forecasting

James B. Orlin, MIT Sloan School and OR Center
Mahesh Kumar, MIT OR Center
Nitin Patel, Visiting Professor
Jonathan Woo, ProfitLogic Inc.

(Transcript of slides: ebusiness.mit.edu/sponsors/common/2002-June-Wksp-DataM/orlin.pdf)

Overview of Talk

• Overview of Clustering

• Error-based clustering

• Use of clustering in forecasting

• But first, a few words from Scott Adams

[Slides 3-5: cartoons from Scott Adams.]

What is clustering?

Clustering is the process of partitioning a set of data or objects into clusters with the following properties:

• Homogeneity within clusters: data that belong to the same cluster should be as similar as possible

• Heterogeneity between clusters: data that belong to different clusters should be as different as possible.


Overview of this talk

• Provide a somewhat personal view of the significance of clustering in life, and why it has not met its promise

• Present our technique for incorporating uncertainty about the data into clustering, so as to reduce uncertainty in forecasting.


Iris Data (Fisher, 1936)

[Scatter plot of the iris data on canonical axes can1 and can2, with species Setosa, Versicolor, and Virginica.]


Cluster the iris data

• This is a 2-dimensional projection of 4-dimensional data (sepal length and width, petal length and width).

• It is not clear if there are 2, 3 or 4 clusters

• There are 3 clusters

• Clusters are usually chosen to minimize some metric (e.g., sum of squared distances from center of the cluster)
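As a concrete illustration of the metric in the last bullet, here is a minimal sketch (made-up 2-D points, not the iris measurements) that computes the within-cluster sum of squared distances to each cluster's centroid:

    import numpy as np

    def within_cluster_ss(points, labels):
        """Sum over clusters of squared distances to the cluster centroid."""
        total = 0.0
        for k in np.unique(labels):
            members = points[labels == k]
            center = members.mean(axis=0)        # the cluster's centroid
            total += ((members - center) ** 2).sum()
        return total

    points = np.array([[1.0, 2.0], [1.2, 1.9], [4.0, 4.1], [4.2, 3.8]])
    labels = np.array([0, 0, 1, 1])              # a candidate 2-cluster partition
    print(within_cluster_ss(points, labels))     # clustering seeks to minimize this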


Iris Data

[The same scatter plot (can1 vs. can2), with the three species indicated.]


Iris Data, using ellipses

[The same scatter plot with an ellipse drawn around each species.]


Why is clustering important: a personal perspective

• Two very natural aspects of intelligence: grouping (clustering) and categorizing. It's an organizing principle of our minds and of our lives.
• Just a few examples:
  – We cluster life into "work life" and "family life".
  – We cluster our lives by our "roles": father, mother, sister, brother, teacher, manager, researcher, analyst, etc.
  – We cluster our work lives in various ways, perhaps organized by projects, by who we report to, or by who reports to us.
  – We even cluster the talks we attend, perhaps organized by quality, by what we learned, or by where they were held.


More on Clustering in Life

• More clustering examples:
  – Shopping: products are clustered in the store (useful for locating things).
  – As a professor, I need to cluster students into letter grades: what really is the difference between a B+ and an A−? (useful in evaluations)
  – When we figure out what to do, we often prioritize by clustering things (important vs. non-important).
  – We cluster people along multiple dimensions based on appearance, intelligence, character, religion, sexual orientation, place of origin, etc.
• Conclusion: Humans cluster and categorize by nature. It is part of our nature. It is part of our intelligence.


Fields that have used clustering

• Marketing (market segmentation, catalogues)
• Chemistry (the periodic table is a great example)
• Finance (making sense of stock transactions)
• Medicine (clustering patients)
• Data mining (what can we do with transactional data, such as click-stream data?)
• Bioinformatics (how can we make sense of proteins?)
• Data compression and aggregation (can we cluster massive data sets into smaller data sets for subsequent analysis?)
• plus much more


Has clustering been successful in data mining?

• Initial hope: clustering would find many interesting patterns and surprising relationships
  – arguably not met, at least not nearly enough
  – perhaps it requires too much intelligence
  – perhaps we can do better in the future
• Nevertheless: clustering has been successful in using computers for things that humans are quite bad at
  – dealing with massive amounts of data
  – effectively using knowledge of "uncertainty"


An issue in clustering: the effect of scale

• Background: an initial motivation for our work in clustering (as sponsored by the e-business Center) is to eliminate the effect of scale in clustering


A Chart of 6 Points

[Scatter chart "Clustering 6 points": six points, y-axis 0 to 6, x-axis 1.5 to 4.5.]


Two Clusters of the 6 Points

[The same chart with the six points grouped into two clusters.]


We added two points and adjusted the scale

[Chart "Clustering 8 points": eight points, y-axis 3.5 to 5.5, x-axis 0 to 60.]


3 clusters of the 8 points

[The same chart with the eight points grouped into three clusters.]

The 6 points on the left are clustered differently


Scale Invariance

• A clustering approach is called “scale invariant” if it develops the same solution, independent of the scales used

• The approach developed next is scale invariant
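To see the scale problem concretely, here is a small sketch (made-up points, not the charts above): rescaling one axis, as happens when units change, alters which points are nearest under Euclidean distance.

    import numpy as np
    from itertools import combinations

    def closest_pair(pts):
        """Indices of the two points with the smallest Euclidean distance."""
        return min(combinations(range(len(pts)), 2),
                   key=lambda ij: np.linalg.norm(pts[ij[0]] - pts[ij[1]]))

    points = np.array([[0.0, 0.0], [3.0, 0.1], [0.5, 2.0]])
    print(closest_pair(points))                # (0, 2)
    rescaled = points * np.array([0.01, 1.0])  # shrink the x-axis units
    print(closest_pair(rescaled))              # (0, 1): the grouping changes

The error-based distance defined later divides differences by error variances, which rescale together with the data, so the resulting clustering does not depend on the choice of units.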


Using clustering to reduce uncertainty: try to find the average of the 3 populations

[The iris scatter plot again (can1 vs. can2), one cloud of points per species.]


Using uncertainty to improve clustering: an example with 4 points in 1 dimension

[Dot plot of the four points on the interval 0.48 to 0.6.]

The four points were obtained as sample means for four samples, two from one distribution, and two from another.

Objective: cluster into two groups of two each so as to maximize the probability that each cluster represents two samples from the same distribution.


Standard Approach

Consider the four data points, and cluster based on these values.

Resulting cluster:

[Dot plot of the resulting two clusters.]


Incorporating Uncertainty

• A common assumption in statistics:
  – data comes from "populations" or distributions
  – from the data, we can estimate the mean of the population and the standard deviation of that estimate
• Usual approach to clustering:
  – keep track of the estimated mean
  – ignore the standard deviation (the estimate of the error)
• Our approach: use both the estimated mean and the estimate of the error.



[Dot plot: the four points drawn as circles whose radii show their uncertainty.]

The two samples on the left had 10,000 points each; the two samples on the right had 100 points each.

The radius of each circle corresponds to the standard deviation.

Smaller circles → larger data sets → more certainty.
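The shrinking circles are just the standard error of a sample mean, σ/√n. A quick check with an assumed population standard deviation:

    import math

    sigma = 0.05                        # assumed population std. deviation
    for n in (100, 10_000):
        print(n, sigma / math.sqrt(n))  # 100 -> 0.005; 10000 -> 0.0005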


[Dot plot of the three ways to pair up the four points, each labeled with its probability:]

• probability = 4/19
• probability = 8/19
• probability = 7/19


[Two dot plots on the interval 0.48 to 0.6, showing the effect of merging each large sample with its small-sample partner:]

Left cluster (true mean: .5):
  10,000 points with mean .501
  100 points with mean .562
  merged: 10,100 points with mean .501

Right cluster (true mean: .53):
  10,000 points with mean .536
  100 points with mean .592
  merged: 10,100 points with mean .537


More on using uncertainty

• We will use clustering to reduce uncertainty

• We will use our knowledge of the uncertainty to improve the clustering

• In the previous example, the correct clustering was the one shown above with probability = 8/19.

• We had generated 20 sets of four points at random. The data shown came from the second set of four points.


Error-based clustering

1. Start with n points in k-dimensional space.
   – the next example has 15 points in 2 dimensions
   – each point has an estimated mean as well as a standard deviation of the estimate
2. Determine the likelihood of each pair of points coming from the same distribution.
3. Merge the two points with the greatest likelihood.
4. Return to Step 2.


Using Maximum Likelihood

• Maximum Likelihood Method
  – Suppose we have G clusters, C1, C2, …, CG. Out of the exponentially many possible clusterings, which clustering is most likely with respect to the observed data?

• Objective:

  $\max \sum_{k=1}^{G} \left(\sum_{i \in C_k} \frac{x_i}{\sigma_i^2}\right)^{t} \left(\sum_{i \in C_k} \frac{1}{\sigma_i^2}\right)^{-1} \left(\sum_{i \in C_k} \frac{x_i}{\sigma_i^2}\right)$

• Computationally difficult!
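For scalar points, each cluster contributes (Σ x_i/σ_i²)² / (Σ 1/σ_i²) to this objective. A small sketch (assumed numbers, loosely echoing the four-point example) that scores two candidate clusterings:

    import numpy as np

    def ml_objective(x, var, clusters):
        """x: point estimates; var: error variances; clusters: lists of indices."""
        total = 0.0
        for c in clusters:
            w = 1.0 / var[c]                    # precision weights
            total += (w @ x[c]) ** 2 / w.sum()  # scalar case of the objective
        return total

    x = np.array([0.501, 0.536, 0.562, 0.592])
    var = np.array([5e-7, 5e-7, 5e-5, 5e-5])    # tight errors for big samples
    for clusters in ([[0, 1], [2, 3]], [[0, 2], [1, 3]]):
        print(clusters, ml_objective(x, var, clusters))
    # With these numbers, pairing each large sample with a small one
    # ([[0, 2], [1, 3]]) scores slightly higher than pairing by proximity.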


Heuristic solution based on maximum likelihood

• Greedy heuristic
  – Start with n single-point clusters.
  – Combine the pair of clusters that leads to the maximum increase in the objective value (based on maximum likelihood).
  – Stop when we have G clusters.

Similar to hierarchical clustering.


Error-based clustering

• At each step, combine the pair of clusters Ci, Cj with the smallest distance, where
  – xi, xj are the maximum-likelihood estimates of the cluster means
  – σi, σj are the standard errors of the x's
• We define the distance between two clusters as

  $(x_i - x_j)^{t} \, (\sigma_i^2 + \sigma_j^2)^{-1} \, (x_i - x_j)$

• Computationally much easier!


Error-based Clustering Algorithm

• distance(Ci, Cj) = $(x_i - x_j)^{t} \, (\sigma_i^2 + \sigma_j^2)^{-1} \, (x_i - x_j)$
• Start with n singleton clusters.
• At each step, combine the pair of clusters Ci, Cj with the smallest distance.
• Stop when we have the desired number of clusters.

It is a generalization of Ward's method.
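Here is a compact sketch of this greedy loop for 1-dimensional points with independent Gaussian errors (illustrative numbers; a merged cluster takes the precision-weighted mean and combined variance of its members):

    import numpy as np

    def error_based_clustering(x, var, n_clusters):
        """x[i]: estimated mean; var[i]: squared standard error of x[i]."""
        clusters = [([i], x[i], var[i]) for i in range(len(x))]
        while len(clusters) > n_clusters:
            # Find the pair with the smallest error-adjusted distance
            # (x_i - x_j)^2 / (var_i + var_j).
            a, b = min(((i, j) for i in range(len(clusters))
                               for j in range(i + 1, len(clusters))),
                       key=lambda ij: (clusters[ij[0]][1] - clusters[ij[1]][1]) ** 2
                                      / (clusters[ij[0]][2] + clusters[ij[1]][2]))
            ia, ma, va = clusters[a]
            ib, mb, vb = clusters[b]
            v = 1.0 / (1.0 / va + 1.0 / vb)  # combined (smaller) variance
            m = v * (ma / va + mb / vb)      # precision-weighted mean
            clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
            clusters.append((ia + ib, m, v))
        return [c[0] for c in clusters]

    x = np.array([0.501, 0.536, 0.562, 0.592])
    var = np.array([5e-7, 5e-7, 5e-5, 5e-5])
    print(error_based_clustering(x, var, 2))  # index groups of the two clusters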


The mean is the dot. The error is given by the ellipse.

A small ellipse means that the data is quite accurate.


Determine the two elements most likely to come from the same distribution.

Merge them into a single element.


Continue this process, reducing the number of clusters one at a time.

[Slides 39-49: further frames of the merging animation.]


Here we went all the way to a single cluster.

We could stop with 2 or 3 or more clusters. We can also evaluate different numbers of clusters at the end.


Rest of the Lecture

• The use of clustering in forecasting was developed while Mahesh Kumar worked at ProfitLogic.

• Joint work: Mahesh Kumar, Nitin Patel, Jonathan Woo.


Motivation

• Accurate sales forecasting is very important in the retail industry in order to make good decisions.

  Manufacturer → Wholesaler → Retailer → Customer
  (with shipping, allocation, and pricing decisions along the chain)

• Kumar et al. used clustering to help make sales forecasting more accurate.


Forecasting Problem

• Goal: forecast sales.
• Parameters that affect sales:
  – Price
  – When a product is introduced
  – Promotions
  – Inventory
  – Base demand as a function of the time of year
  – Random effects


Seasonality Definition

• Seasonality is the hypothesized underlying base demand of a group of similar merchandise as a function of the time of year.

• It is a vector of size 52, describing variations over the year.

• It is independent of external factors like changes in price, promotions, inventory, etc. and is modeled as a multiplicative factor.

• e.g., two portable CD players have essentially the same seasonality, but they may differ in price, promotions, inventory, etc.
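A toy sketch (hypothetical numbers) of the multiplicative model this describes: expected weekly sales are a base demand, times external-factor effects such as price, times the 52-entry seasonality vector.

    import numpy as np

    weeks = np.arange(52)
    # Hypothetical seasonality for a summer item: 52 multiplicative factors.
    seasonality = 1.0 + 0.8 * np.sin((weeks - 13) * 2 * np.pi / 52)
    base_demand = 200.0    # weekly demand absent seasonal variation
    price_effect = 0.9     # e.g., a higher price dampens sales by 10%

    expected_sales = base_demand * price_effect * seasonality
    print(expected_sales.round(1)[:5])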


Seasonality Examples (made up data)

[Two line charts of made-up data: weekly sales for summer shoes and weekly sales for winter boots.]


Objective: determine the seasonality of products

• Difficulty: observation of a product's seasonality is complicated by other factors
  – when the product is introduced
  – sales and promotions
  – inventory
• Solution methods
  – preprocess the data to compensate for sales, promotion, and inventory effects
  – average over lots of very similar products to eliminate some of the uncertainty
  – further clustering of products can eliminate more uncertainty


Retail Merchandise Hierarchy

Chain → Department → Class → Item
(e.g., J-Mart → Shoes → Men's summer shoes → Debok walkers)

Sales data is available for items.


Modeling Seasonality

• Seasonality is modeled as a vector with 52 components.
• Assumptions:
  – We assume the errors are Gaussian.
  – We treat the estimates of the σ's as if they are the correct values.

  $Seas_i = \{(x_{i1}, \sigma_{i1}^2), (x_{i2}, \sigma_{i2}^2), \ldots, (x_{i52}, \sigma_{i52}^2)\} = (x_i, \sigma_i^2)$
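A minimal sketch (simulated observations, not the retail data) of building such an estimate: for each of the 52 weeks, keep the sample mean and the squared standard error of that mean.

    import numpy as np

    rng = np.random.default_rng(0)
    n_items, n_weeks = 25, 52
    # Hypothetical normalized weekly observations for the items in one class:
    obs = 1.0 + 0.3 * rng.standard_normal((n_items, n_weeks))

    x = obs.mean(axis=0)                        # x_i: the 52 weekly means
    sigma2 = obs.var(axis=0, ddof=1) / n_items  # sigma_i^2: squared std. errors
    seas = (x, sigma2)                          # Seas_i = (x_i, sigma_i^2)
    print(x[:3].round(3), sigma2[:3].round(5))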


Illustration on simulated data

• Kumar et al. generated data with 3 different seasonalities.

• They then combined similar products and produced estimates of seasonalities.

• Clustering produced much better final estimates.


Simulation Study

• 3 different seasonalities were used to generate sales data for 300 items.

• All 300 items were divided into 12 classes.

• This gave 12 estimates of the seasonality coefficients, along with their associated errors.

• Clustering these estimates into three clusters was used to recover the correct seasonalities.


Seasonalities

[Chart: the 3 true seasonality curves used in the simulation.]


Initial seasonality estimates

[Chart: the 12 class-level seasonality estimates before clustering.]


Clustering

• Cluster classes with similar seasonality to reduce errors.
  – Example: men's winter shoes, men's winter coats.
• Standard clustering methods do not incorporate the information contained in the errors:
  – Hierarchical clustering
  – K-means clustering
  – Ward's method


Further Clustering

• They used K-means, hierarchical, and Ward’s technique

• They also used error-based clustering.


K-means, hierarchical (average linkage), and Ward's method results

[Chart: seasonality estimates produced by the standard clustering methods.]


Error-based Clustering Result

[Chart: seasonality estimates produced by error-based clustering.]


Real Data Study

• Data from the retail industry.
• 6 departments: books, sporting goods, greeting cards, videos, etc.
• 45 classes.
• Sales forecasts were produced three ways:
  – without clustering
  – with standard clustering
  – with error-based clustering


Forecast Result (An example)

[Line chart: sales vs. weeks for one example, comparing three forecasts: no clustering, standard clustering, and error-based clustering.]


Result Statistics

• Average Forecast Error

  $\sum \frac{|\text{ActualSale} - \text{ForecastSale}|}{\text{ActualSale}}$
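A short sketch of computing this statistic, assuming (as is standard for this kind of measure) that the relative errors are taken in absolute value and averaged:

    import numpy as np

    def average_forecast_error(actual, forecast):
        """Mean of |actual - forecast| / actual over all periods."""
        actual = np.asarray(actual, dtype=float)
        forecast = np.asarray(forecast, dtype=float)
        return np.mean(np.abs(actual - forecast) / actual)

    print(average_forecast_error([120, 150, 90, 60],
                                 [110, 160, 95, 50]))  # ~0.093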


Summary and Conclusion

• A new clustering method that incorporates information contained in errors

• It has strong theoretical justification under appropriate assumptions

• Computationally easy

• Works well in practice


Summary and Conclusion (continued)

• Major point: if one is using clustering to reduce uncertainty, then it makes sense to use error-based clustering.

• Scale invariance.

• Error-based clustering has strong theoretical justification and works well in practice.

• The concept of using errors can be applied to many other applications where one has a reasonable estimate of the errors.