Lecture 3 MARK2039 Winter 2006 George Brown College Wednesday 9-12.


Transcript of Lecture 3 MARK2039 Winter 2006 George Brown College Wednesday 9-12.

Page 1: Lecture 3 MARK2039 Winter 2006 George Brown College Wednesday 9-12.

Lecture 3

MARK2039

Winter 2006

George Brown College

Wednesday 9-12

Page 2: Lecture 3 MARK2039 Winter 2006 George Brown College Wednesday 9-12.

Boire Filler Group

Recap
• What are the four stages of data mining, and who are the stakeholders?
• Data mining measures and metrics
  – Mean
  – Median
  – Mode
  – Standard Deviation
• Why are the statistics above important in evaluating numbers?

Page 3: Lecture 3 MARK2039 Winter 2006 George Brown College Wednesday 9-12.

Recap
• Is the average (mean) appropriate in deriving insight about the behaviour of a group, segment or sample?
• Why do we need to look at how numbers vary?
• What are some of the measures used to assess variation?

Page 4: Lecture 3 MARK2039 Winter 2006 George Brown College Wednesday 9-12.

Recap

Consider the two distributions below. What do they mean, and how would you interpret the results? Both distributions have the same median and mean.

Distribution A   Distribution B
350              700
500              725
750              750
1000             775
1150             800

Page 5: Lecture 3 MARK2039 Winter 2006 George Brown College Wednesday 9-12.

Recap

Distribution A   Distribution B
3                3
4                4
5                5
6                6
7                7
8                1000

What is the problem here?

Page 6: Lecture 3 MARK2039 Winter 2006 George Brown College Wednesday 9-12.

Recap
• Consider the following two distributions ...

        Distribution A   Distribution B
        0                4,500
        1,000            4,600
        2,000            4,700
        3,000            4,800
        4,000            4,900
        5,000            5,000
        6,000            5,100
        7,000            5,200
        8,000            5,300
        9,000            5,400
        10,000           5,500

Mean    5,000            5,000
Stdev   3,316.62         331.66
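The figures above can be reproduced with a short Python sketch. It assumes the sample standard deviation (dividing by n - 1), which is the version that matches the slide's numbers.

```python
from statistics import mean, stdev

# Values from the two distributions on the slide.
dist_a = list(range(0, 10001, 1000))     # 0, 1,000, ..., 10,000
dist_b = list(range(4500, 5501, 100))    # 4,500, 4,600, ..., 5,500

for name, values in (("A", dist_a), ("B", dist_b)):
    # statistics.stdev is the sample standard deviation (divides by n - 1).
    print(name, mean(values), round(stdev(values), 2))
```

Both distributions share a mean of 5,000, but A's values spread ten times as widely around it.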

Page 7: Lecture 3 MARK2039 Winter 2006 George Brown College Wednesday 9-12.

Recap
• For a binomial distribution, such as response, we must use a different formula:

$$\text{Stdev} = \sqrt{\frac{p \cdot q}{N}}$$

Value   Outcome
1       Responder
0       Non-responder
0       Non-responder
1       Responder
0       Non-responder
1       Responder
0       Non-responder
0       Non-responder
0       Non-responder
0       Non-responder

Mean    0.300
Stdev   0.145
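A minimal sketch of the calculation above, assuming p is the share of responders, q = 1 - p, and N is the number of records (the variable names are illustrative):

```python
from math import sqrt

# 1 = responder, 0 = non-responder, in the order shown on the slide.
responses = [1, 0, 0, 1, 0, 1, 0, 0, 0, 0]

n = len(responses)
p = sum(responses) / n        # response rate (the mean of the 0/1 values)
q = 1 - p                     # non-response rate
binomial_stdev = sqrt(p * q / n)

print(round(p, 3), round(binomial_stdev, 3))
```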

Page 8: Lecture 3 MARK2039 Winter 2006 George Brown College Wednesday 9-12.

Recap
• What are indexes?
• Give me some examples.
• Why are they important in the marketing world?
• What is the most common one used in the marketing world?

Page 9: Lecture 3 MARK2039 Winter 2006 George Brown College Wednesday 9-12.

Lift
• Lift represents a relative comparison between two numbers. It is a type of index. How is it normally used?
• Typically, it represents the value for a particular group divided by the average (X1/average).
• Example:

               Response Rate
Target Group   2%
Average        1.50%
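As a worked version of the example, here is a small Python sketch of X1/average. The function name `lift` is only illustrative.

```python
def lift(group_rate, average_rate):
    """Index of a group's rate relative to the average (1.0 = average)."""
    return group_rate / average_rate

# Target group responds at 2%, the average is 1.5%.
target_lift = lift(0.02, 0.015)
print(round(target_lift, 2))
```

A lift above 1 means the target group responds better than average; here it responds about a third better.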

Page 10: Lecture 3 MARK2039 Winter 2006 George Brown College Wednesday 9-12.

Recap-Lift
• Use relative measures and not absolutes
• The notion of “lift” should be the marketer’s key determinant of success

Example

             Campaign 1        Campaign 2
Strategy 1   3% Resp. Rate     23% Resp. Rate
Strategy 2   1.5% Resp. Rate   21.5% Resp. Rate
Difference   1.5% Resp. Rate   1.5% Resp. Rate

What is the key learning here?
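One way to explore the question is to put the absolute difference next to the relative lift for each campaign. This is a sketch using the response rates on the slide; the dictionary layout is an assumption made for illustration.

```python
# Response rates from the slide, as (strategy 1, strategy 2) per campaign.
campaigns = {"Campaign 1": (0.03, 0.015), "Campaign 2": (0.23, 0.215)}

results = {}
for name, (strategy_1, strategy_2) in campaigns.items():
    absolute_diff = strategy_1 - strategy_2     # same in both campaigns
    relative_lift = strategy_1 / strategy_2     # very different
    results[name] = (round(absolute_diff, 3), round(relative_lift, 2))
    print(name, results[name])
```

The absolute gap is 1.5 points in both cases, while the relative measure separates the two campaigns clearly.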

Page 11: Lecture 3 MARK2039 Winter 2006 George Brown College Wednesday 9-12.

Assignment 2
1. Answer the following questions on the table listed below:

Calculate the averages and medians for each column of numbers.

         Col. A   Col. B   Col. C
         240      4000     300
         250      4000     300
         220      3000     300
         250      2000     400
         240      1000     100
         240      5000     150
         240      3000     150
         260      3000     400
         50       2000     500
         235      3000     2000

mean     222.5    3000     460
median   240      3000     300
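The answers in the table can be checked with Python's standard library. This is a minimal sketch, with the column values typed in from the slide.

```python
from statistics import mean, median

columns = {
    "Col. A": [240, 250, 220, 250, 240, 240, 240, 260, 50, 235],
    "Col. B": [4000, 4000, 3000, 2000, 1000, 5000, 3000, 3000, 2000, 3000],
    "Col. C": [300, 300, 300, 400, 100, 150, 150, 400, 500, 2000],
}

# (mean, median) for each column.
summary = {name: (mean(values), median(values)) for name, values in columns.items()}
for name, (avg, med) in summary.items():
    print(name, avg, med)
```

In Col. A and Col. C a single extreme value (50 and 2000) pulls the mean away from the median.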

Page 12: Lecture 3 MARK2039 Winter 2006 George Brown College Wednesday 9-12.

Assignment 2
What kind of distributions are Col. A and Col. C, and what metric would be best used to communicate to business users?

Skewed, or asymmetric, or non-normal. The median is the key measure.

Which column would be most reliable in estimating results for a larger population, and why?

Col. A, as its standard deviation is the smallest, which allows our range around the mean to be much tighter.

Page 13: Lecture 3 MARK2039 Winter 2006 George Brown College Wednesday 9-12.

Assignment 2
2. (2 marks) The median height of 65 inches is the same for two classes. Yet the average in one class is 65 inches vs. 70 inches in the other class. What is causing this difference?

An outlier, a very tall person, is causing the mean of one class to be 70 inches.

Page 14: Lecture 3 MARK2039 Winter 2006 George Brown College Wednesday 9-12.

Assignment 2

Calculate the index values for each variable for Customer A. Why are indexes useful in database marketing?

Spending: .5   Tenure: .5   Income: 1.2

Indexes are useful as relative measures: they compare a value to the average and make it possible to rank-order or prioritize records.

Page 15: Lecture 3 MARK2039 Winter 2006 George Brown College Wednesday 9-12.

Evaluating test results
• In database marketing, marketers are constantly asked what to conclude from their testing results.
• For instance, are the results of one strategy significantly different from those of another strategy?
• Let’s take a look at some examples.

Page 16: Lecture 3 MARK2039 Winter 2006 George Brown College Wednesday 9-12.

Evaluating Marketing Test
• Two groups of cells have been tested for different communication strategies. Results are as follows. What would you conclude?

Strategy   Sample Size   Response Rate
A          10000         2.30%
B          5000          2%

Page 17: Lecture 3 MARK2039 Winter 2006 George Brown College Wednesday 9-12.

Evaluating Marketing Test
• To determine this, you need to do statistical testing, which essentially comprises three factors:
  – Confidence level that you want
  – Actual standard deviation based on the lower sample size
  – Response rate or performance rate
  – For our purposes, we will use a 95% confidence interval, which essentially translates into 2 standard deviations around the mean

Page 18: Lecture 3 MARK2039 Winter 2006 George Brown College Wednesday 9-12.

Evaluating Marketing Test
• Calculate the following confidence intervals at 95%:
  – 1% with a std. deviation of .1%
  – 2% with a std. deviation of .05%
  – 5% with a std. deviation of .5%
  – 5% with a std. deviation of .3%
• Let’s get back to the problem
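The exercise can be worked through with a small Python sketch. It assumes the lecture's rule of thumb that a 95% interval is the rate plus or minus 2 standard deviations; `confidence_interval` is an illustrative helper name.

```python
def confidence_interval(rate, std_dev, multiplier=2):
    """Return (lower, upper) as rate -/+ multiplier * std_dev."""
    margin = multiplier * std_dev
    return rate - margin, rate + margin

# (rate, standard deviation) pairs from the slide, expressed in percent.
cases = [(1.0, 0.1), (2.0, 0.05), (5.0, 0.5), (5.0, 0.3)]
intervals = [confidence_interval(rate, sd) for rate, sd in cases]

for (rate, sd), (low, high) in zip(cases, intervals):
    print(f"{rate}% +/- 2 x {sd}%: {low:.2f}% to {high:.2f}%")
```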

Page 19: Lecture 3 MARK2039 Winter 2006 George Brown College Wednesday 9-12.

Evaluating Marketing Test
• Two groups of cells have been tested for different communication strategies. Results are as follows. What would you conclude?

Strategy   Sample Size   Response Rate
A          10000         2.30%
B          5000          2%

Page 20: Lecture 3 MARK2039 Winter 2006 George Brown College Wednesday 9-12.

Evaluating Marketing Test
• Calculate the standard deviation first, using the sample with the lower quantity (Strategy B):

$$\sqrt{\frac{p \cdot q}{N}}$$

  – Sq. root of ((.02 x .98)/5000) = .00198
  – 95% confidence interval = .02 - 2*.00198 to .02 + 2*.00198
    • .01604 <= .02 <= .02396
  – Based on this result, what can you conclude about Strategy A versus Strategy B?
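The steps above can be sketched in Python. This follows the lecture's rule of thumb (interval built around the smaller sample, 2 standard deviations wide on each side); it is not a formal two-sample test, and the helper name `binomial_interval` is an assumption.

```python
from math import sqrt

def binomial_interval(rate, sample_size, multiplier=2):
    """Approximate 95% interval: rate -/+ 2 * sqrt(p * q / N)."""
    std_dev = sqrt(rate * (1 - rate) / sample_size)
    return rate - multiplier * std_dev, rate + multiplier * std_dev

# Strategy B has the smaller sample, so the interval is built around it.
low, high = binomial_interval(0.02, 5000)

rate_a = 0.023
a_inside_interval = low <= rate_a <= high
print(round(low, 5), round(high, 5), a_inside_interval)
```

Under this rule of thumb, a competing rate that falls inside the interval would not be read as significantly different.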

Page 21: Lecture 3 MARK2039 Winter 2006 George Brown College Wednesday 9-12.

Evaluating Marketing Test Results
• Two other groups of cells have been tested for different communication strategies. Results are as follows. What would you conclude?

Strategy   Sample Size   Response Rate
A          1000          5.00%
B          2000          3%

Suppose A becomes 3.3%. What would you conclude?

Page 22: Lecture 3 MARK2039 Winter 2006 George Brown College Wednesday 9-12.

Evaluating Marketing Test
• Calculate the standard deviation first, using the sample with the lower quantity (Strategy A):

$$\sqrt{\frac{p \cdot q}{N}}$$

  – Sq. root of ((.05 x .95)/1000) = .00689
  – 95% confidence interval = .05 - 2*.00689 to .05 + 2*.00689
    • .03622 <= .05 <= .06378
  – Based on this result, what can you conclude about Strategy A versus Strategy B?
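The same arithmetic for this second test, as a short sketch. It again assumes the lecture's 2-standard-deviation rule of thumb, and compares Strategy B's 3% rate against the interval around Strategy A.

```python
from math import sqrt

p, n = 0.05, 1000                      # Strategy A (the smaller sample)
std_dev = sqrt(p * (1 - p) / n)
low, high = p - 2 * std_dev, p + 2 * std_dev

rate_b = 0.03                          # Strategy B's response rate
b_inside_interval = low <= rate_b <= high
print(round(std_dev, 5), round(low, 5), round(high, 5), b_inside_interval)
```

Here the competing rate falls outside the interval, which under this rule of thumb points to a real difference between the strategies.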

Page 23: Lecture 3 MARK2039 Winter 2006 George Brown College Wednesday 9-12.

Evaluating Marketing Test Results
• Two other groups of cells have been tested for different communication strategies. Results are as follows. What would you conclude?

Strategy   Sample Size   Response Rate
A          1000          5.00%
B          2000          4.0%

Suppose B becomes 4.0%. What would you conclude?

Page 24: Lecture 3 MARK2039 Winter 2006 George Brown College Wednesday 9-12.

Evaluating Marketing Test Results
• Calculate the standard deviation first, using the sample with the lower quantity (Strategy A):

$$\sqrt{\frac{p \cdot q}{N}}$$

  – Sq. root of ((.05 x .95)/1000) = .00689
  – 95% confidence interval = .05 - 2*.00689 to .05 + 2*.00689
    • .03622 <= .05 <= .06378
  – Based on this result, what can you conclude about Strategy A versus Strategy B?

Page 25: Lecture 3 MARK2039 Winter 2006 George Brown College Wednesday 9-12.

Evaluating Marketing Test Results
• Having done several of these tests, what will cause your confidence range to narrow?
  – Larger sample size
  – Smaller response rates
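Both effects can be seen numerically. The sketch below uses the sqrt(p*q/N) formula from the earlier slides; the sample sizes and rates are hypothetical, chosen only for illustration, and the second effect holds for response rates below 50%.

```python
from math import sqrt

def std_dev(rate, sample_size):
    """Binomial standard deviation of a response rate."""
    return sqrt(rate * (1 - rate) / sample_size)

def interval_width(rate, sample_size):
    """Full width of the 2-standard-deviation interval."""
    return 2 * 2 * std_dev(rate, sample_size)

# Same 2% response rate, growing sample size: the interval narrows.
by_size = [interval_width(0.02, n) for n in (1000, 5000, 20000)]
# Same sample size of 5000, falling response rate: the interval narrows too.
by_rate = [interval_width(p, 5000) for p in (0.10, 0.05, 0.02)]

print([round(w, 4) for w in by_size])
print([round(w, 4) for w in by_rate])
```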

Page 26: Lecture 3 MARK2039 Winter 2006 George Brown College Wednesday 9-12.

Data

Review of Data

Page 27: Lecture 3 MARK2039 Winter 2006 George Brown College Wednesday 9-12.

Types Of Data/Format
• Character-Level Data
• Numeric Data
• Date
• Give me some examples
• In data mining, what do we have to do with all data before building a solution?

Page 28: Lecture 3 MARK2039 Winter 2006 George Brown College Wednesday 9-12.

Data Format Examples
• Gender
• Income
• Spending
• Birthdate
• Customer type
• How would you use gender, customer type, and birthdate in a data mining exercise?

Page 29: Lecture 3 MARK2039 Winter 2006 George Brown College Wednesday 9-12.

Data Transformation
• Gender Variable
  – Male=1, non male=0
  – Female=1, non female=0
  – What happens to missing values here?
• Customer Type Variable
  – Gold member=1, non gold member=0
  – Platinum member=1, non platinum member=0
  – Etc.
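The 1/0 transformation can be sketched as follows. The sample records are hypothetical, and mapping a missing value to 0 on both flags is one possible treatment, not necessarily the one the lecture intends.

```python
def indicator(value, target):
    """1 if value equals target, otherwise 0 (missing values also map to 0)."""
    return 1 if value == target else 0

genders = ["M", "F", None, "F"]        # hypothetical records; None = missing

encoded = [
    {"male": indicator(g, "M"), "female": indicator(g, "F")} for g in genders
]
for original, flags in zip(genders, encoded):
    print(original, flags)
```

With separate male and female flags, a missing gender shows up as 0 on both, so it is not silently counted as either group.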

Page 30: Lecture 3 MARK2039 Winter 2006 George Brown College Wednesday 9-12.

Data Transformation
• Birthdate
  – Convert birthdate to age
  – Extract birth year from the birthdate field and subtract it from the current year (e.g. 2005 - 1954)
• Date of last Spending Activity
  – Create recency of last spend
  – Create tenure variable
  – How would this be done?
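A minimal sketch of these date transformations. The reference date and all customer dates are hypothetical, and tenure is assumed here to be measured from a first-purchase date, which the slide does not specify.

```python
from datetime import date

def age_in_years(birthdate, today):
    """The slide's simple method: current year minus birth year."""
    return today.year - birthdate.year

def days_since(event_date, today):
    """Days elapsed since a date (recency or tenure, depending on the date)."""
    return (today - event_date).days

today = date(2005, 12, 31)                            # assumed reference date
age = age_in_years(date(1954, 6, 15), today)          # hypothetical birthdate
recency_days = days_since(date(2005, 12, 1), today)   # hypothetical last spend
tenure_days = days_since(date(2003, 12, 31), today)   # hypothetical first purchase

print(age, recency_days, tenure_days)
```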

Page 31: Lecture 3 MARK2039 Winter 2006 George Brown College Wednesday 9-12.

Data
• Discrete vs. index vs. continuous
• Discrete
  – Yes/No
  – On/Off
• Convert the above type of data to a 1/0 type scenario

Page 32: Lecture 3 MARK2039 Winter 2006 George Brown College Wednesday 9-12.

Data
• Index Type Data

Customer Type   Average Spend
Regular         100
Gold            200
Platinum        300
Average         125

Could convert each customer type to a binary value.

But what would be a more valuable way to convert or transform this variable?

List Source   Average Spend
A             200
B             400
C             600
Average       400
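One possible transformation is to replace each category with an index: its average spend divided by the overall average. The sketch below uses the figures from the two tables; `to_index` is an illustrative helper name.

```python
def to_index(group_averages, overall_average):
    """Replace each category with its average spend divided by the overall average."""
    return {k: v / overall_average for k, v in group_averages.items()}

customer_type_index = to_index({"Regular": 100, "Gold": 200, "Platinum": 300}, 125)
list_source_index = to_index({"A": 200, "B": 400, "C": 600}, 400)

print(customer_type_index)
print(list_source_index)
```

Unlike a set of binary flags, the index keeps both the order of the categories and the size of the gap between them.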

Page 33: Lecture 3 MARK2039 Winter 2006 George Brown College Wednesday 9-12.

Data
• Continuous data
  – What are some examples?
• What does it mean when we say that data is continuous?

Page 34: Lecture 3 MARK2039 Winter 2006 George Brown College Wednesday 9-12.

Data Type
• Looking at data as we have in the last number of slides, we can create what we call data categories:
  – Nominal
  – Ordinal
  – Interval

Page 35: Lecture 3 MARK2039 Winter 2006 George Brown College Wednesday 9-12.

Data Categories
• Nominal variables are variables where the values do not represent any real order or magnitude of value.
• Examples:
  – Gender
  – Product Category
  – Promotion Category

Page 36: Lecture 3 MARK2039 Winter 2006 George Brown College Wednesday 9-12.

Data Categories
• Ordinal variables represent fields where the values have some order
• Good examples are:
  – Index-type variables
  – Model rank
  – Etc.

Page 37: Lecture 3 MARK2039 Winter 2006 George Brown College Wednesday 9-12.

Data Categories
• Interval variables represent fields where the actual values indicate order but also magnitude.
  – Income
  – Spend
  – Model Score
• What data category is the most granular?
• Which category might you typically expect to be more powerful in a data mining exercise?

Page 38: Lecture 3 MARK2039 Winter 2006 George Brown College Wednesday 9-12.

Data Usefulness
• When is Data Useful?
  – Few missing values
  – Variable does not consist primarily of one value
  – Non-numeric data does not consist of too many values which cannot be properly grouped into more meaningful categories

Page 39: Lecture 3 MARK2039 Winter 2006 George Brown College Wednesday 9-12.

Examples-Analytical Perspective

Variable         # of records   Data Field Format   # of unique values   # of missing values
Income           100000         numeric             50000                2000
Customer Type    100000         character           4                    10000
Gender           100000         character           2                    50000
Household Size   100000         numeric             7                    90000
Product Type     100000         character           3000                 5000
Customer Name    100000         character           100000               0
Postal Code      100000         character           50000                0

What fields are useful and why?
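A first pass at the question could compute the missing rate per field. In the sketch below the counts come from the table, while the 40% cut-off is an assumption made for illustration and is not from the lecture.

```python
# variable -> (records, unique values, missing values), from the slide's table.
fields = {
    "Income": (100000, 50000, 2000),
    "Customer Type": (100000, 4, 10000),
    "Gender": (100000, 2, 50000),
    "Household Size": (100000, 7, 90000),
    "Product Type": (100000, 3000, 5000),
    "Customer Name": (100000, 100000, 0),
    "Postal Code": (100000, 50000, 0),
}
MAX_MISSING_RATE = 0.4   # assumed cut-off, not from the lecture

missing_rate = {
    name: missing / records for name, (records, _, missing) in fields.items()
}
high_missing = [name for name, rate in missing_rate.items() if rate > MAX_MISSING_RATE]

print(missing_rate)
print(high_missing)
```

Missing rate is only one of the criteria: a character field with as many unique values as records, such as Customer Name, has no missing values but still cannot be grouped usefully.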

Page 40: Lecture 3 MARK2039 Winter 2006 George Brown College Wednesday 9-12.

Examples

Closer look at income

Income        % of records
<25000        25%
25000-50000   25%
50000-75000   25%
75000+        23%
Missing       2%

Closer look at gender

Gender    % of records
Male      23%
Female    27%
Missing   50%

Page 41: Lecture 3 MARK2039 Winter 2006 George Brown College Wednesday 9-12.

Examples
• Closer look at Customer Type

Customer Type   % of records
Gold            5%
Bronze          40%
Silver          30%
Platinum        15%
Missing         10%

• Closer look at Product Type

Product Type   % of records   Cum. % of records
A001           0.07%          0.07%
B001           0.08%          0.15%
C003           0.06%          0.21%
A010           0.06%          0.27%
….
missing        5%             99.92%
Z004           0.08%          100%

Page 42: Lecture 3 MARK2039 Winter 2006 George Brown College Wednesday 9-12.

Examples

Variables                                # of records   Data Field Format   # of unique values   # of missing values
1st 3 digits of postal code              100000         character           ?                    100000
household size                           100000         numeric             ?                    100000
Credit score                             100000         numeric             ?                    100000
mortgage account                         100000         character           ?                    100000
Product code                             100000         character           ?                    100000
Median Income of Postal Code of record   100000         numeric             ?                    100000

• What variables would be useful here?
• What would be the number of unique values?

Page 43: Lecture 3 MARK2039 Winter 2006 George Brown College Wednesday 9-12.

Examples

Variables                                # of records   Data Field Format   # of unique values   # of missing values
1st 3 digits of postal code              100000         character           100000               0
household size                           100000         numeric             100000               0
Credit score                             100000         numeric             100000               0
mortgage account                         100000         character           100000               0
Product code                             100000         character           100000               0
Median Income of Postal Code of record   100000         numeric             100000               0

• What variables would be useful here?

Page 44: Lecture 3 MARK2039 Winter 2006 George Brown College Wednesday 9-12.

Examples-Marketing Perspective
• A mortgage company is conducting a campaign to its high value customers. One of the key characteristics of value is high income, which is self-reported at time of application.

Income         % of records
< 30000        5%
30000-60000    5%
60000-80000    20%
80000-100000   10%
100000+        10%
missing        50%

As a marketer, how will you use this information and what do you need to consider?

Page 45: Lecture 3 MARK2039 Winter 2006 George Brown College Wednesday 9-12.

Examples-Marketing Perspective
• An insurance company is marketing an insurance product to people over the age of 60. Listed below is a report indicating the distribution of age.

Age       % of records
<30       5%
30-40     10%
40-55     15%
55-65     10%
65+       10%
missing   50%

As a marketer, how will you use this information?

Page 46: Lecture 3 MARK2039 Winter 2006 George Brown College Wednesday 9-12.

Examples-Marketing Perspective
• A retail company has over 1000 product SKUs. After investigation, it has been determined that the 1st digit represents a broader product category. You have been asked to design the product layout for all stores.

As a marketer, how will you use this information?

Product SKU   % of records   Cum. % of records
A000003       0.03%          0.03%
A000004       0.02%          0.05%
B000005       0.03%          0.08%
B000006       0.04%          0.12%
….
Z999999       0.02%          100%
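One way to make the SKU field usable is to roll it up to the broader category carried by its first character. This is a sketch using only the four SKUs visible in the table; the rest of the list is elided on the slide and is not reconstructed here.

```python
from collections import defaultdict

# SKU -> % of records, for the rows visible on the slide.
sku_share = {"A000003": 0.03, "A000004": 0.02, "B000005": 0.03, "B000006": 0.04}

category_share = defaultdict(float)
for sku, share in sku_share.items():
    category_share[sku[0]] += share      # first character = broader category

for category, share in sorted(category_share.items()):
    print(category, round(share, 2))
```

Grouping over 1000 SKUs into a few dozen categories at most gives each group enough records to compare.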

Page 47: Lecture 3 MARK2039 Winter 2006 George Brown College Wednesday 9-12.

Examples-Marketing Perspective

Gender    % of records
Male      10%
Female    12%
Missing   88%

Income    % of records
0-20K     5%
20K-40K   4%
40K-60K   7%
60K-80K   6%
80K+      5%
missing   73%

What can be done here, if anything, and what else can we consider in terms of using gender and income information?

Page 48: Lecture 3 MARK2039 Winter 2006 George Brown College Wednesday 9-12.

Examples-Marketing Perspective
• You have postal code information for each customer. You are asked to design customer reports by province. How would you do this?
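One possible approach, sketched in Python, relies on the first letter of a Canadian postal code identifying the province or territory. The function name and the sample postal codes are illustrative.

```python
# First letter of a Canadian postal code -> province or territory.
PROVINCE_BY_FIRST_LETTER = {
    "A": "NL", "B": "NS", "C": "PE", "E": "NB",
    "G": "QC", "H": "QC", "J": "QC",
    "K": "ON", "L": "ON", "M": "ON", "N": "ON", "P": "ON",
    "R": "MB", "S": "SK", "T": "AB", "V": "BC",
    "X": "NT/NU", "Y": "YT",
}

def province_from_postal_code(postal_code):
    """Return the province code, or None if missing or unrecognized."""
    if not postal_code:
        return None
    return PROVINCE_BY_FIRST_LETTER.get(postal_code.strip().upper()[:1])

for code in ("M5T 2T9", "v6b 1a1", None):   # sample values for illustration
    print(code, province_from_postal_code(code))
```

The derived province can then serve as the grouping field for the customer reports.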

Page 49: Lecture 3 MARK2039 Winter 2006 George Brown College Wednesday 9-12.

Examples-Data Mining Perspective
• You have the following variables and values
  – Gender: ‘M’: Male, ‘F’: Female
  – Age: ‘B’: <20M, ‘F’: 20M-40M, ‘R’: 40M-60M, ‘S’: 60M-80M, ‘T’: 80M-100M, ‘Z’: 100M+
• What must be done here?