Lecture 3 MARK2039 Winter 2006 George Brown College Wednesday 9-12.
Boire Filler Group
Recap
• What are the four stages of data mining, and who are the stakeholders?
• Data mining measures and metrics
  – Mean
  – Median
  – Mode
  – Standard deviation
• Why are the statistics above important in evaluating numbers?
Recap
• Is the average (mean) appropriate for deriving insight about the behaviour of a group, segment or sample?
• Why do we need to look at how numbers vary?
• What are some of the measures used to assess variation?
Recap
Both distributions below have the same median and mean. What do they mean, and how would you interpret the results?
Distribution A   Distribution B
350              700
500              725
750              750
1000             775
1150             800
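As an illustrative sketch (not part of the original lecture), Python's standard `statistics` module confirms the point: both lists have a mean and median of 750, but their sample standard deviations differ by a factor of more than eight.

```python
from statistics import mean, median, stdev

# The two distributions from the slide
dist_a = [350, 500, 750, 1000, 1150]
dist_b = [700, 725, 750, 775, 800]

def summarize(values):
    """Return centre and spread; stdev() is the sample (n - 1) standard deviation."""
    return {
        "mean": mean(values),
        "median": median(values),
        "stdev": round(stdev(values), 2),
    }

for name, values in (("A", dist_a), ("B", dist_b)):
    print(name, summarize(values))
```

Same centre, very different spread: B's values sit much closer to 750, so its average is a far better description of a typical record.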
Recap
Distribution A   Distribution B
3                3
4                4
5                5
6                6
7                7
8                1000
What is the problem here?
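A small sketch of the problem, assuming the flattened slide reads as six rows where B's last value is 1000 instead of 8: one extreme value pulls B's mean far from its median, while the medians stay identical.

```python
from statistics import mean, median

# Six rows as read from the slide; B's last value is an extreme outlier
dist_a = [3, 4, 5, 6, 7, 8]
dist_b = [3, 4, 5, 6, 7, 1000]

for name, values in (("A", dist_a), ("B", dist_b)):
    # The mean reacts to the outlier; the median does not
    print(name, round(mean(values), 2), median(values))
```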
Recap
• Consider the following two distributions...
          Distribution A   Distribution B
          0                4,500
          1,000            4,600
          2,000            4,700
          3,000            4,800
          4,000            4,900
          5,000            5,000
          6,000            5,100
          7,000            5,200
          8,000            5,300
          9,000            5,400
          10,000           5,500
Mean      5,000            5,000
Stdev     3,316.62         331.66
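The slide's standard deviations can be reproduced in Python. This sketch shows they match the sample formula (dividing by n − 1); the population formula (dividing by n) would give 3,162.28 and 316.23 instead.

```python
from statistics import mean, pstdev, stdev

dist_a = list(range(0, 10_001, 1_000))   # 0, 1,000, ..., 10,000
dist_b = list(range(4_500, 5_501, 100))  # 4,500, 4,600, ..., 5,500

for name, values in (("A", dist_a), ("B", dist_b)):
    # stdev() divides by n - 1 (sample); pstdev() divides by n (population)
    print(name, mean(values), round(stdev(values), 2), round(pstdev(values), 2))
```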
Recap
• For a binomial distribution, such as response, we must use a different formula:
$\sigma = \sqrt{(p \times q)/N}$, where p is the response rate, q = 1 − p, and N is the number of records.
1   Responder
0   Non-responder
0   Non-responder
1   Responder
0   Non-responder
1   Responder
0   Non-responder
0   Non-responder
0   Non-responder
0   Non-responder
Mean    0.300
Stdev   0.145
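A minimal Python sketch of the binomial formula applied to the ten records above. Note that sqrt(p × q / N) is the standard deviation of the response rate itself (often called the standard error of a proportion), which is the quantity the confidence intervals later in this lecture are built on.

```python
from math import sqrt

# 1 = responder, 0 = non-responder, in the order listed on the slide
responses = [1, 0, 0, 1, 0, 1, 0, 0, 0, 0]

n = len(responses)
p = sum(responses) / n   # response rate (the mean of the 0/1 values)
q = 1 - p                # non-response rate
sigma = sqrt(p * q / n)  # sqrt(p*q/N), as on the slide

print(f"mean={p:.3f} stdev={sigma:.3f}")
```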
Recap
• What are indexes?
• Give me some examples.
• Why are they important in the marketing world?
• What is the most common one used in the marketing world?
Lift
• Lift represents a relative comparison between two numbers. It is a type of index. How is it normally used?
• Typically, it is the value for a particular group divided by the average (X1/average).
• Example:
Group          Response rate
Target group   2%
Average        1.50%
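A one-line calculation, sketched in Python, for the example above: 2% / 1.5% gives a lift of about 1.33. Multiplying by 100 expresses it as an index where the average equals 100, a common marketing convention that is assumed here rather than taken from the slide.

```python
def lift(group_value, average_value):
    """Lift as a ratio: the group's value relative to the average."""
    return group_value / average_value

target_lift = lift(0.02, 0.015)
# Ratio, and the same figure as an index (average = 100)
print(round(target_lift, 2), round(100 * target_lift))
```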
Recap-Lift
• Use relative measures and not absolutes.
• The notion of “lift” should be the marketer’s key determinant of success.
Example
               Campaign 1          Campaign 2
Strategy 1     3% resp. rate       23% resp. rate
Strategy 2     1.5% resp. rate     21.5% resp. rate
Difference     1.5% resp. rate     1.5% resp. rate
What is the key learning here?
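One way to see the key learning is to compute both the absolute difference and the lift for each campaign. In this Python sketch, the absolute gap is 1.5 points in both campaigns, yet Strategy 1 doubles the response rate in Campaign 1 and improves it by only about 7% in Campaign 2.

```python
# (strategy 1 rate, strategy 2 rate) for each campaign, from the table above
campaigns = {
    "Campaign 1": (0.03, 0.015),
    "Campaign 2": (0.23, 0.215),
}

for name, (s1, s2) in campaigns.items():
    difference = s1 - s2  # absolute gap (percentage points when shown as %)
    lift = s1 / s2        # relative gap
    print(f"{name}: difference={difference:.1%} lift={lift:.2f}")
```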
Assignment 2
1. Answer the following questions on the table listed below:
Calculate the mean and median for each column of numbers.
          Col. A   Col. B   Col. C
          240      4000     300
          250      4000     300
          220      3000     300
          250      2000     400
          240      1000     100
          240      5000     150
          240      3000     150
          260      3000     400
          50       2000     500
          235      3000     2000
Mean      222.5    3000     460
Median    240      3000     300
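The answer rows can be checked with a short Python sketch using the standard `statistics` module. With ten values per column, the median is the average of the 5th and 6th values after sorting.

```python
from statistics import mean, median

columns = {
    "Col. A": [240, 250, 220, 250, 240, 240, 240, 260, 50, 235],
    "Col. B": [4000, 4000, 3000, 2000, 1000, 5000, 3000, 3000, 2000, 3000],
    "Col. C": [300, 300, 300, 400, 100, 150, 150, 400, 500, 2000],
}

for name, values in columns.items():
    # median() sorts internally and averages the two middle values for even n
    print(name, mean(values), median(values))
```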
Assignment 2
What kind of distributions are col. A and col. C, and what metric would be best used to communicate them to business users?
Skewed (asymmetric, or non-normal). The median is the key measure.
Which column would be most reliable in estimating results for a larger population, and why?
Col. A, as its standard deviation is the smallest, which allows our range around the mean to be much tighter.
Assignment 2
2. (2 marks) The median height of 65 inches is the same for two classes. Yet the average in one class is 65 inches vs. 70 inches in the other class. What is causing this difference?
An outlier value (a very tall person) is pulling the mean of one class up to 70 inches.
Assignment 2
Calculate the index values for each variable for Customer A. Why are indexes useful in database marketing?
Spending: 0.5   Tenure: 0.5   Income: 1.2
Indexes are useful as relative measures: they compare a value with the average, which lets us rank-order or prioritize records.
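The assignment's own input table is not part of this transcript, so this Python sketch uses hypothetical spending values to show the mechanics: each value is divided by the average, and the resulting indexes rank-order the records.

```python
# Hypothetical spending values; the assignment's table is not reproduced here
spend = {"Customer A": 150.0, "Customer B": 300.0, "Customer C": 450.0}

average = sum(spend.values()) / len(spend)
index = {name: value / average for name, value in spend.items()}

# Rank-order records from highest to lowest index
ranked = sorted(index, key=index.get, reverse=True)
print(index)
print(ranked)
```

An index below 1 means below average, above 1 means above average, regardless of the units of the underlying variable.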
Evaluating test results
• In database marketing, marketers are constantly asked what to conclude from their testing results.
• For instance, are the results of one strategy significantly different from those of another strategy?
• Let’s take a look at some examples.
Evaluating Marketing Test
• Two groups of cells have been tested for different communication strategies. Results are as follows. What would you conclude?
Strategy   Sample size   Response rate
A          10,000        2.30%
B          5,000         2%
Evaluating Marketing Test
• To determine this, you need to do statistical testing, which essentially comprises three factors:
  – The confidence level that you want
  – The actual standard deviation, based on the smaller sample size
  – The response rate or performance rate
  – For our purposes, we will use a 95% confidence interval, which translates into approximately 2 standard deviations around the mean (1.96, strictly).
Evaluating Marketing Test
• Calculate the following confidence intervals at 95%:
  – 1% with a std. deviation of 0.1%
  – 2% with a std. deviation of 0.05%
  – 5% with a std. deviation of 0.5%
  – 5% with a std. deviation of 0.3%
• Let’s get back to the problem.
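The four exercises above can be worked with a small Python helper that applies the lecture's "rate ± 2 standard deviations" rule. The function name `confidence_interval` is illustrative; `z=2.0` follows the slide's approximation of the 95% level.

```python
def confidence_interval(rate, std_dev, z=2.0):
    """Approximate 95% interval using the lecture's 2-standard-deviation rule."""
    return rate - z * std_dev, rate + z * std_dev

exercises = [(0.01, 0.001), (0.02, 0.0005), (0.05, 0.005), (0.05, 0.003)]

for rate, std_dev in exercises:
    low, high = confidence_interval(rate, std_dev)
    print(f"{rate:.0%} +/- 2 x {std_dev:.2%}: {low:.2%} to {high:.2%}")
```

This gives 0.8% to 1.2%, 1.9% to 2.1%, 4% to 6%, and 4.4% to 5.6%. Passing `z=1.96` would give the exact 95% normal interval.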
Evaluating Marketing Test
• Two groups of cells have been tested for different communication strategies. Results are as follows. What would you conclude?
Strategy   Sample size   Response rate
A          10,000        2.30%
B          5,000         2%
Evaluating Marketing Test
• Calculate the standard deviation first, using the sample with the smaller quantity (Strategy B).
  – Square root of [(0.02 × 0.98)/5000] = 0.00198
  – 95% confidence interval = 0.02 + 2 × 0.00198 and 0.02 − 2 × 0.00198:
    • 0.01604 <= 0.02 <= 0.02396
  – Based on this result, what can you conclude about Strategy A vs. Strategy B?
$\sigma = \sqrt{(p \times q)/N}$
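A Python sketch of the slide's procedure: build the interval around the smaller sample's rate with sigma = sqrt(p × q / N), then check whether the other strategy's rate falls inside it. This follows the lecture's rule of thumb only; a formal two-proportion z-test, which uses both samples, is the more standard way to test this.

```python
from math import sqrt

def rate_interval(rate, n, z=2.0):
    """Interval around a response rate using sigma = sqrt(p*q/N) and +/- z sigma."""
    sigma = sqrt(rate * (1 - rate) / n)
    return rate - z * sigma, rate + z * sigma

# Slide's rule of thumb: build the interval on the smaller sample (Strategy B)
low, high = rate_interval(0.02, 5000)
rate_a = 0.023
print(f"B interval: {low:.5f} to {high:.5f}")
print("A outside B's interval:", not (low <= rate_a <= high))
```

Strategy A's 2.3% lies inside B's interval, so under this rule the difference is not significant at the 95% level.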
Evaluating Marketing Test Results
• Two other groups of cells have been tested for different communication strategies. Results are as follows. What would you conclude?
Strategy   Sample size   Response rate
A          1,000         5.00%
B          2,000         3%
Suppose A becomes 3.3%. What would you conclude?
Evaluating Marketing Test
• Calculate the standard deviation first, using the sample with the smaller quantity (Strategy A).
  – Square root of [(0.05 × 0.95)/1000] = 0.00689
  – 95% confidence interval = 0.05 + 2 × 0.00689 and 0.05 − 2 × 0.00689:
    • 0.03622 <= 0.05 <= 0.06378
  – Based on this result, what can you conclude about Strategy A vs. Strategy B?
$\sigma = \sqrt{(p \times q)/N}$
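The same rule-of-thumb check, sketched in Python for this test: Strategy A (1,000 records) is the smaller sample, so the interval is built around its 5% rate and Strategy B's 3% is compared against it.

```python
from math import sqrt

def rate_interval(rate, n, z=2.0):
    """Interval around a response rate using sigma = sqrt(p*q/N) and +/- z sigma."""
    sigma = sqrt(rate * (1 - rate) / n)
    return rate - z * sigma, rate + z * sigma

# Strategy A (1,000 records) is the smaller sample here
low, high = rate_interval(0.05, 1000)
rate_b = 0.03
print(f"A interval: {low:.5f} to {high:.5f}")
print("B outside A's interval:", not (low <= rate_b <= high))
```

Here 3% falls below the lower bound of about 3.62%, so under the lecture's rule the two strategies differ.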
Evaluating Marketing Test Results
• Two other groups of cells have been tested for different communication strategies. Results are as follows. What would you conclude?
Strategy   Sample size   Response rate
A          1,000         5.00%
B          2,000         4.0%
Suppose B becomes 4.0%. What would you conclude?
Evaluating Marketing Test Results
• Calculate the standard deviation first, using the sample with the smaller quantity (Strategy A).
  – Square root of [(0.05 × 0.95)/1000] = 0.00689
  – 95% confidence interval = 0.05 + 2 × 0.00689 and 0.05 − 2 × 0.00689:
    • 0.03622 <= 0.05 <= 0.06378
  – Based on this result, what can you conclude about Strategy A vs. Strategy B?
$\sigma = \sqrt{(p \times q)/N}$
Evaluating Marketing Test Results
• Having done several of these tests, what will cause your confidence range to narrow?
  – Larger sample size
  – Smaller response rates
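A short Python sketch of why both factors narrow the range: the half-width is 2 × sqrt(p × q / N), so it shrinks as N grows. Because p × q peaks at p = 0.5, a smaller response rate narrows the interval only for rates below 50%, which covers typical direct-marketing response rates.

```python
from math import sqrt

def half_width(rate, n, z=2.0):
    """Half-width of the +/- z sigma interval, with sigma = sqrt(p*q/N)."""
    return z * sqrt(rate * (1 - rate) / n)

# Larger sample, same 2% rate: narrower interval
print(half_width(0.02, 1_000), half_width(0.02, 10_000))
# Same 5,000 sample, lower response rate: narrower interval
print(half_width(0.20, 5_000), half_width(0.02, 5_000))
```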
Data
Review of Data
Types Of Data/Format
• Character-level data
• Numeric data
• Date
• Give me some examples.
• In data mining, what do we have to do with all data before building a solution?
Data Format Examples
• Gender
• Income
• Spending
• Birthdate
• Customer type
• How would you use gender, customer type, and birthdate in a data mining exercise?
Data Transformation
• Gender variable
  – Male=1, non-male=0
  – Female=1, non-female=0
  – What happens to missing values here?
• Customer type variable
  – Gold member=1, non-gold member=0
  – Platinum member=1, non-platinum member=0
  – Etc.
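A minimal Python sketch of the 1/0 flags described above. The function name `dummy_code` is hypothetical. It also makes the missing-value issue visible: a missing gender gets 0 for both male and female, so the extra `is_missing` flag (an added assumption, one common option) keeps "missing" distinguishable from "the other category".

```python
def dummy_code(value, categories):
    """One 0/1 flag per category; a missing value (None) gets 0 in every flag."""
    flags = {f"is_{c.lower()}": int(value == c) for c in categories}
    # Separate flag so 'missing' is not confused with 'the other category'
    flags["is_missing"] = int(value is None)
    return flags

for gender in ["Male", "Female", None]:
    print(gender, dummy_code(gender, ["Male", "Female"]))

print(dummy_code("Gold", ["Gold", "Platinum"]))
```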
Data Transformation
• Birthdate
  – Convert birthdate to age
  – Extract the birth year from the birthdate field and subtract it from the current year (e.g. 2005 − 1954)
• Date of last spending activity
  – Create recency of last spend
  – Create tenure variable
  – How would this be done?
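One way these date transformations could be done, sketched with Python's `datetime` module. The record, the "as of" date and the first-spend date are hypothetical; tenure needs some start date (for example, an account-open or first-purchase date), which is an assumption here rather than a field named on the slide.

```python
from datetime import date

# Hypothetical record and reference ("as of") date, for illustration only
as_of = date(2005, 3, 1)
birthdate = date(1954, 6, 15)
last_spend = date(2004, 12, 1)
first_spend = date(2001, 3, 1)  # hypothetical start date, needed for tenure

# Slide's method: current year minus birth year
age_by_year = as_of.year - birthdate.year
# Exact age: subtract one if this year's birthday has not happened yet
age_exact = age_by_year - ((as_of.month, as_of.day) < (birthdate.month, birthdate.day))

recency_days = (as_of - last_spend).days
tenure_months = (as_of.year - first_spend.year) * 12 + (as_of.month - first_spend.month)

print(age_by_year, age_exact, recency_days, tenure_months)
```

The year-only method overstates age by one for anyone whose birthday falls later in the year than the reference date.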
Data
• Discrete vs. index vs. continuous
• Discrete
  – Yes/No
  – On/Off
• Convert the above type of data to a 1/0 scenario.
Data
• Index-type data
Customer type   Average spend
Regular         100
Gold            200
Platinum        300
Average         125
Could convert each customer type to a binary value.
But what would be a more valuable way to convert or transform this variable?
List source     Average spend
A               200
B               400
C               600
Average         400
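One possible answer, sketched in Python under the assumption that "more valuable" means keeping the magnitude information: replace each customer type with its index (group average spend divided by the overall average) instead of a set of binary flags. The overall average of 125 is taken as given on the slide (it is presumably weighted by the number of customers in each type).

```python
# Average spend by customer type and the overall average, from the slide
avg_spend_by_type = {"Regular": 100, "Gold": 200, "Platinum": 300}
overall_average = 125

# Replace each category with its index (group average / overall average)
type_index = {t: spend / overall_average for t, spend in avg_spend_by_type.items()}

customers = ["Gold", "Regular", "Platinum", "Gold"]
encoded = [type_index[t] for t in customers]
print(type_index)
print(encoded)
```

The same idea applied to the list-source table gives A = 0.5, B = 1.0 and C = 1.5.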
Data
• Continuous data
  – What are some examples?
• What does it mean when we say that data is continuous?
Data Type
• Looking at data as we have in the last number of slides, we can create what we call data categories:
  – Nominal
  – Ordinal
  – Interval
Data Categories
• Nominal variables are variables where the values do not represent any real order or magnitude of value.
• Examples:
  – Gender
  – Product category
  – Promotion category
Data Categories
• Ordinal variables represent fields where the values have some order.
• Good examples are:
  – Index-type variables
  – Model rank
  – Etc.
Data Categories
• Interval variables represent fields where the actual values indicate not only order but also magnitude.
  – Income
  – Spend
  – Model score
• Which data category is the most granular?
• Which category might you typically expect to be more powerful in a data mining exercise?
Data Usefulness
• When is data useful?
  – Few missing values
  – The variable does not consist primarily of one value
  – Non-numeric data does not consist of too many values that cannot be properly grouped into more meaningful categories
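The three usefulness checks can be measured per variable. This Python sketch (function name `profile` and the gender column are hypothetical) reports the share of missing values, the number of distinct values, and how much of the file the single most common value covers. It deliberately sets no pass/fail thresholds, since the lecture does not give any.

```python
from collections import Counter

def profile(values):
    """Basic usefulness checks for one variable; None marks a missing value."""
    n = len(values)
    present = [v for v in values if v is not None]
    counts = Counter(present)
    top_count = counts.most_common(1)[0][1] if counts else 0
    return {
        "missing_pct": 100 * (n - len(present)) / n,
        "unique_values": len(counts),
        "top_value_pct": 100 * top_count / n,
    }

# Hypothetical gender column with many missing values
gender = ["M", "F", None, None, "M", None, None, "F", None, None]
print(profile(gender))
```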
Examples-Analytical Perspective
Variable         # of records   Data field format   # of unique values   # of missing values
Income           100000         numeric             50000                2000
Customer Type    100000         character           4                    10000
Gender           100000         character           2                    50000
Household Size   100000         numeric             7                    90000
Product Type     100000         character           3000                 5000
Customer Name    100000         character           100000               0
Postal Code      100000         character           50000                0
What fields are useful and why?
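A Python sketch that turns the table's counts into two ratios that help answer the question: the percentage of missing values, and unique values per record (1.0 means every record is distinct, as with Customer Name). The numbers come from the table above; how to weigh them is left to the analyst.

```python
# (records, unique values, missing values) per field, from the table above
fields = {
    "Income": (100_000, 50_000, 2_000),
    "Customer Type": (100_000, 4, 10_000),
    "Gender": (100_000, 2, 50_000),
    "Household Size": (100_000, 7, 90_000),
    "Product Type": (100_000, 3_000, 5_000),
    "Customer Name": (100_000, 100_000, 0),
    "Postal Code": (100_000, 50_000, 0),
}

summary = {
    name: {
        "missing_pct": 100 * missing / records,
        "unique_per_record": unique / records,  # 1.0 means every record is distinct
    }
    for name, (records, unique, missing) in fields.items()
}

for name, stats in summary.items():
    print(f"{name:15} missing={stats['missing_pct']:5.1f}%  unique/record={stats['unique_per_record']:.5f}")
```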
Examples
Closer look at income
Income          % of records
<25000          25%
25000-50000     25%
50000-75000     25%
75000+          23%
Missing         2%
Closer look at gender
Gender          % of records
Male            23%
Female          27%
Missing         50%
Examples
• Closer look at customer type
Customer type   % of records
Gold            5%
Bronze          40%
Silver          30%
Platinum        15%
Missing         10%
Closer look at product type
Product type    % of records   Cum. % of records
A001            0.07%          0.07%
B001            0.08%          0.15%
C003            0.06%          0.21%
A010            0.06%          0.27%
…
Missing         5%             99.92%
Z004            0.08%          100%
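A report like the product-type table can be produced with a short Python sketch: count each value, convert to a percentage of records, and accumulate. Sorting from most to least frequent (a choice made here, not on the slide) shows how slowly the cumulative percentage builds when thousands of codes each hold a tiny share. The product codes below are hypothetical.

```python
from collections import Counter
from itertools import accumulate

def frequency_table(values):
    """(value, % of records, cumulative %) sorted from most to least frequent."""
    counts = Counter(values)
    n = len(values)
    rows = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    pcts = [100 * count / n for _, count in rows]
    return [(value, pct, cum) for (value, _), pct, cum in zip(rows, pcts, accumulate(pcts))]

# Hypothetical product-type codes
codes = ["A001", "A001", "B001", "C003"]
for row in frequency_table(codes):
    print(row)
```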
Examples
Variable                                 # of records   Data field format   # of unique values   # of missing values
1st 3 digits of postal code              100000         character           ?                    100000
Household size                           100000         numeric             ?                    100000
Credit score                             100000         numeric             ?                    100000
Mortgage account                         100000         character           ?                    100000
Product code                             100000         character           ?                    100000
Median income of postal code of record   100000         numeric             ?                    100000
• What variables would be useful here?
• What would be the number of unique values?
Examples
Variable                                 # of records   Data field format   # of unique values   # of missing values
1st 3 digits of postal code              100000         character           100000               0
Household size                           100000         numeric             100000               0
Credit score                             100000         numeric             100000               0
Mortgage account                         100000         character           100000               0
Product code                             100000         character           100000               0
Median income of postal code of record   100000         numeric             100000               0
• What variables would be useful here?
Examples-Marketing Perspective
• A mortgage company is conducting a campaign to its high-value customers. One of the key characteristics of value is high income, which is self-reported at the time of application.
Income          % of records
<30000          5%
30000-60000     5%
60000-80000     20%
80000-100000    10%
100000+         10%
Missing         50%
As a marketer, how will you use this information, and what do you need to consider?
Examples-Marketing Perspective
• An insurance company is marketing an insurance product to people over the age of 60. Listed below is a report indicating the distribution of age.
As a marketer, how will you use this information?
Age       % of records
<30       5%
30-40     10%
40-55     15%
55-65     10%
65+       10%
Missing   50%
Examples-Marketing Perspective
• A retail company has over 1,000 product SKUs. After investigation, it has been determined that the first character represents a broader product category. You have been asked to design the product layout for all stores.
As a marketer, how will you use this information?
Product SKU   % of records   Cum. % of records
A000003       0.03%          0.03%
A000004       0.02%          0.05%
B000005       0.03%          0.08%
B000006       0.04%          0.12%
…
Z999999       0.02%          100%
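A Python sketch of rolling SKUs up to the broader category encoded in the first character. Only the rows shown on the slide are used (the elided rows are not reproduced), so the category totals are partial; on the full file the same loop would give each category's share of records.

```python
from collections import defaultdict

# The SKU rows shown on the slide (% of records); elided rows are not included
sku_pct = {
    "A000003": 0.03, "A000004": 0.02,
    "B000005": 0.03, "B000006": 0.04,
    "Z999999": 0.02,
}

# Roll each SKU up to its broader category: the first character
category_pct = defaultdict(float)
for sku, pct in sku_pct.items():
    category_pct[sku[0]] += pct

print({cat: round(pct, 2) for cat, pct in sorted(category_pct.items())})
```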
Examples-Marketing Perspective
Gender    % of records
Male      10%
Female    12%
Missing   88%
Income    % of records
0-20K     5%
20K-40K   4%
40K-60K   7%
60K-80K   6%
80K+      5%
Missing   73%
What can be done here, if anything, and what else can we consider in terms of using gender and income information?
Examples-Marketing Perspective
• You have postal code information for each customer. You are asked to design customer reports by province. How would you do this?
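One way to do this, assuming Canadian postal codes: the first letter of a Canadian postal code identifies the province or territory, so a lookup on that letter is enough to group customers for the reports. The function name and sample codes below are hypothetical.

```python
from collections import Counter

# Assumes Canadian postal codes: the first letter identifies the province/territory
FIRST_LETTER_TO_PROVINCE = {
    "A": "NL", "B": "NS", "C": "PE", "E": "NB",
    "G": "QC", "H": "QC", "J": "QC",
    "K": "ON", "L": "ON", "M": "ON", "N": "ON", "P": "ON",
    "R": "MB", "S": "SK", "T": "AB", "V": "BC",
    "X": "NT/NU",  # X is shared by the Northwest Territories and Nunavut
    "Y": "YT",
}

def province_from_postal_code(postal_code):
    """Return the province code, or None for a missing or unrecognized value."""
    code = (postal_code or "").strip().upper()
    if not code:
        return None
    return FIRST_LETTER_TO_PROVINCE.get(code[0])

# Hypothetical customer postal codes
customers = ["M5V 2T6", "h2x 1y4", "V6B 1A1", None, "M4C 1B5"]
print(Counter(province_from_postal_code(pc) for pc in customers))
```

Missing or unrecognized codes come back as None, so they show up as their own group in the report rather than being silently dropped.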
Examples-Data Mining Perspective
• You have the following variables and values:
  – Gender: 'M': Male, 'F': Female
  – Age: 'B': <20M, 'F': 20M-40M, 'R': 40M-60M, 'S': 60M-80M, 'T': 80M-100M, 'Z': 100M+
• What must be done here?
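A Python sketch of one possible recoding. The letter codes must be translated per field, because 'F' means Female for gender but the 20M-40M band for age. Here gender becomes a 0/1 flag and the age bands become ordinal ranks 1 to 6 in the slide's order; the ranks are an assumption (separate 0/1 flags per band would also work).

```python
# Per-field lookups: 'F' means Female for gender but 20M-40M for age
GENDER_TO_MALE_FLAG = {"M": 1, "F": 0}
AGE_CODE_TO_RANK = {"B": 1, "F": 2, "R": 3, "S": 4, "T": 5, "Z": 6}  # slide's band order

def recode(record):
    """Turn letter codes into numbers; unknown or missing codes become None."""
    return {
        "is_male": GENDER_TO_MALE_FLAG.get(record.get("gender")),
        "age_rank": AGE_CODE_TO_RANK.get(record.get("age")),
    }

print(recode({"gender": "F", "age": "F"}))
print(recode({"gender": "M", "age": "Z"}))
print(recode({"gender": None, "age": "Q"}))
```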