M28- Categorical Analysis 1 Department of ISM, University of Alabama, 1992-2003 Categorical Data.
Categorical Data
Lesson Objective
Understand basic rules of probability.
Calculate marginal and conditional probabilities.
Determine if two categorical variables are independent.
Recall Rule of Thumb:
Quantitative variables: averages or differences have meaning.
Ex: weight, height, income, age
Recall Rule of Thumb:
Categorical variables: classify people or things.
Ex: gender, race, occupation, political affiliation, country of origin
Note: Sometimes quantitative variables are expressed as categorical.
Income (Family Economic Income):
Class   Definition
1.      Less than $30,000
2.      $30,000 but less than $100,000
3.      $100,000 or more
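A minimal Python sketch of this conversion, assuming income is a dollar amount stored as a number. The function name `income_class` is hypothetical; the cut points are the three class definitions above.

```python
def income_class(income: float) -> int:
    """Map a family income in dollars to income class 1, 2, or 3.

    Hypothetical helper; the cut points follow the class definitions above.
    """
    if income < 30_000:
        return 1  # Less than $30,000
    if income < 100_000:
        return 2  # $30,000 but less than $100,000
    return 3      # $100,000 or more


for amount in [12_500, 30_000, 99_999, 100_000]:
    print(amount, "->", income_class(amount))
```

Boundary values fall into the higher class ($30,000 is class 2), matching the "but less than" wording of the definitions.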
Relationships between Variables
Relationship between two quantitative variables?
Is relationship linear (scatterplot)?
Use Correlation & Least Squares Regression.
Data transformations.
Recall: Boxplots
Best graphical tool for examining the relationship between a quantitative variable and a categorical variable (i.e., comparing distributions).
Example: Weight vs. Country of Origin
[Figure: side-by-side boxplots of weight (axis roughly 2000 to 4000) for three origin groups: US, Far East, Europe]
Boxplot can be used to answer: “Do the distributions of weights vary for different countries of origin?”
Relationship between two categorical variables?
Use two-way frequency tables:
Look at marginal probabilities and conditional probabilities.
STATISTICS is the science of transforming data into information to make decisions in the face of uncertainty.
How do we measure "uncertainty"?
Probability: a numerical measure of the likelihood that an outcome or an event occurs.
P(A) = probability of event A
Three Methods for Assessing Probability
Classical
Relative Frequency
Subjective
Probability requirements for discrete variables:
1. 0 ≤ P(A) ≤ 1
   P(A) = 0: impossible event
   P(A) = 1: certain event
2. The sum of the probabilities of all possible outcomes must equal 1. (e.g., Binomial, Poisson)
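Both requirements can be checked mechanically. A small Python sketch, assuming the distribution is given as a list of probabilities; the name `is_valid_distribution` and the floating-point tolerance `tol` are assumptions of this sketch, not part of the slides. The Binomial example uses n = 3 trials with success probability 0.5.

```python
import math


def is_valid_distribution(probs, tol=1e-9):
    """Check the two requirements for a discrete probability distribution.

    1. Every probability lies between 0 and 1 (inclusive).
    2. The probabilities of all possible outcomes sum to 1; `tol` absorbs
       floating-point rounding and is an assumption of this sketch.
    """
    in_range = all(0.0 <= p <= 1.0 for p in probs)
    sums_to_one = math.isclose(sum(probs), 1.0, abs_tol=tol)
    return in_range and sums_to_one


# Binomial example: n = 3 trials, success probability p = 0.5.
n, p = 3, 0.5
binom = [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
print(binom)                               # [0.125, 0.375, 0.375, 0.125]
print(is_valid_distribution(binom))        # True
print(is_valid_distribution([0.5, 0.6]))   # False: sums to 1.1
print(is_valid_distribution([1.2, -0.2]))  # False: values outside [0, 1]
```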
Conditional probability: the chance that one event happens, given that another event occurs.
P(A | B) = P(A and B) / P(B)
Numerator: all outcomes belonging to BOTH A AND B.
Denominator: those outcomes in the restricted group, B.
Problem: Credit Card Manager
New credit test to determine creditworthiness.
Credit test checked against 500 previous customers.
Credit Test A (columns) vs. Credit History (rows), counts:
                  Passed (P)   Failed (F)   Total
Good (G)             350           50        400
Default (D)           20           80        100
Total                370          130        500
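The table's margins can be checked with a short Python sketch. The cell counts are typed in from the table; the helper names `row_totals` and `column_totals` are illustrative, not from the slides. It recomputes the marginal totals 400, 100, 370, 130 and the grand total 500.

```python
# Cell counts from the Credit Test A table, keyed by credit history (row)
# and then by test result (column).
table_a = {
    "Good": {"Passed": 350, "Failed": 50},
    "Default": {"Passed": 20, "Failed": 80},
}


def row_totals(table):
    """Marginal total for each row category (credit history)."""
    return {row: sum(cells.values()) for row, cells in table.items()}


def column_totals(table):
    """Marginal total for each column category (test result)."""
    totals = {}
    for cells in table.values():
        for col, count in cells.items():
            totals[col] = totals.get(col, 0) + count
    return totals


rows = row_totals(table_a)
cols = column_totals(table_a)
grand_total = sum(rows.values())
print(rows)         # {'Good': 400, 'Default': 100}
print(cols)         # {'Passed': 370, 'Failed': 130}
print(grand_total)  # 500
```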
What is the probability of a customer defaulting?
P(D) = P(Defaults) = 100/500 = .20
What is the probability of a customer defaulting given that he fails test A?
P(D | F) = P(Defaults given failed test A) = 80/130 ≈ .615
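The same two probabilities, computed directly from the cell counts. This Python sketch treats each probability as a relative frequency and applies P(A | B) = P(A and B) / P(B); the helper names (`prob`, `cond_prob`, `defaulted`, `failed`) are hypothetical, and `fractions.Fraction` keeps the ratios exact.

```python
from fractions import Fraction

# Credit Test A counts keyed by (credit history, test result).
counts = {
    ("Good", "Passed"): 350, ("Good", "Failed"): 50,
    ("Default", "Passed"): 20, ("Default", "Failed"): 80,
}
n = sum(counts.values())  # 500 customers


def prob(event):
    """Relative-frequency P(event): share of customers whose cell satisfies event."""
    return Fraction(sum(c for cell, c in counts.items() if event(cell)), n)


def cond_prob(event, given):
    """P(event | given) = P(event and given) / P(given)."""
    return prob(lambda cell: event(cell) and given(cell)) / prob(given)


def defaulted(cell):
    return cell[0] == "Default"


def failed(cell):
    return cell[1] == "Failed"


p_d = prob(defaulted)                       # 100/500
p_d_given_f = cond_prob(defaulted, failed)  # 80/130
print(f"P(D) = {float(p_d):.3f}")              # P(D) = 0.200
print(f"P(D | F) = {float(p_d_given_f):.3f}")  # P(D | F) = 0.615
```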
General Rules:
P(A and B) = P(A) P(B|A)
= P(B) P(A|B)
P(A or B) = P(A) + P(B) - P(A and B)
P(Fails AND Defaults)
= P(F) P(D|F) = (130/500)(80/130) = 80/500 = .16
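As an arithmetic check of the multiplication rule, a Python sketch with exact fractions; the three counts are read from the Credit Test A table, and the variable names are illustrative. The product P(F) P(D|F) should equal the share read straight from the table, 80/500.

```python
from fractions import Fraction

# Counts from the Credit Test A table.
n = 500                     # all customers
n_failed = 130              # failed test A (column total)
n_failed_and_default = 80   # failed test A AND defaulted

p_f = Fraction(n_failed, n)                              # P(F) = 130/500
p_d_given_f = Fraction(n_failed_and_default, n_failed)   # P(D | F) = 80/130
p_f_and_d = p_f * p_d_given_f                            # multiplication rule

print(p_f_and_d, float(p_f_and_d))  # 4/25 0.16
# Same answer read directly from the table: 80/500.
assert p_f_and_d == Fraction(n_failed_and_default, n)
```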
P(Fails OR Defaults)
= P(F) + P(D) - P(D AND F) = 130/500 + 100/500 - 80/500 = 150/500 = .30
Note: The “overlap” group would be counted twice if there were no subtraction.
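A Python sketch of the addition rule on the Credit Test A counts, with a direct count as a cross-check: everyone who failed (130) plus the defaulters who passed (20), so the 80 customers in the overlap are counted only once. Variable names are illustrative.

```python
from fractions import Fraction

n = 500
n_failed = 130   # failed test A
n_default = 100  # defaulted
n_both = 80      # failed test A AND defaulted (the overlap)

# Addition rule: subtract the overlap so it is not counted twice.
p_f_or_d = Fraction(n_failed, n) + Fraction(n_default, n) - Fraction(n_both, n)
print(p_f_or_d, float(p_f_or_d))  # 3/10 0.3

# Direct count: all who failed, plus the defaulters who passed.
n_passed_and_default = 20
assert p_f_or_d == Fraction(n_failed + n_passed_and_default, n)
```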
Does knowledge of “test A result” help you make a better decision?
P(D) = .20, but P(D | F) ≈ .615
Do you want to know the test A results before you give the loan?
“Credit test A results” and “defaulting” are dependent on each other.
A “Newer” Credit Test. Is it even better?
A different sample of 500 credit records.
Credit Test B (columns) vs. Credit History (rows), counts:
                  Passed (P)   Failed (F)   Total
Good (G)             340           60        400
Default (D)           85           15        100
Total                425           75        500
What is the probability of a customer defaulting?
P(D) = P(Defaults) = 100/500 = .20
What is the probability of a customer defaulting given that he fails test B?
P(D | F) = P(Defaults given failed test B) = 15/75 = .20
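A side-by-side comparison of the two tests in Python. The two tables come from different samples of 500 customers, but each contains 100 defaults, so P(D) = .20 in both; only P(D | F) differs. The dictionary layout is an illustrative choice, and exact equality is meaningful here only because the Test B counts happen to match exactly.

```python
from fractions import Fraction

# Counts needed from each test's table (both samples: n = 500, 100 defaults).
tests = {
    "A": {"failed": 130, "failed_and_default": 80},
    "B": {"failed": 75, "failed_and_default": 15},
}
n, n_default = 500, 100
p_d = Fraction(n_default, n)

p_d_given_f = {}
for name, t in tests.items():
    p_d_given_f[name] = Fraction(t["failed_and_default"], t["failed"])
    print(f"Test {name}: P(D) = {float(p_d):.3f}, "
          f"P(D | F) = {float(p_d_given_f[name]):.3f}")
# Test A: P(D) = 0.200, P(D | F) = 0.615
# Test B: P(D) = 0.200, P(D | F) = 0.200
```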
Does knowledge of “test B result” help you make a better decision?
P(D) = .20 and P(D | F) = .20
Test B tells me nothing new about defaulting. “Credit test B results” and “defaulting” are independent of each other.
Independence
Two events are independent if the occurrence, or non-occurrence, of one does not affect the chances of the other occurring, or not occurring.
Otherwise, we say the
events are dependent.
If A and B are independent, then
P(A and B) = P(A) P(B)
P(A or B) = P(A) + P(B) - P(A) P(B)
P(A|B) = P(A)
P(B|A) = P(B)
Note: The condition does NOT change the probability.
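These rules can be checked numerically. The Python sketch below tests the product rule P(A and B) = P(A) P(B) exactly with fractions, using the Credit Test A and Test B counts from earlier; the function name `independent` is hypothetical. With real sample data the two sides are rarely exactly equal, so in practice one judges whether they are close, as the survey example does.

```python
from fractions import Fraction


def independent(p_a, p_b, p_a_and_b):
    """Exact check of the product rule P(A and B) = P(A) P(B)."""
    return p_a_and_b == p_a * p_b


n = 500
# Test B: P(F) = 75/500, P(D) = 100/500, P(F and D) = 15/500
p_f_b, p_d_b, p_fd_b = Fraction(75, n), Fraction(100, n), Fraction(15, n)
# Test A: P(F) = 130/500, P(D) = 100/500, P(F and D) = 80/500
p_f_a, p_d_a, p_fd_a = Fraction(130, n), Fraction(100, n), Fraction(80, n)

print(independent(p_f_b, p_d_b, p_fd_b))  # True
print(independent(p_f_a, p_d_a, p_fd_a))  # False

# For independent events the addition rule becomes P(A) + P(B) - P(A) P(B).
p_f_or_d_b = p_f_b + p_d_b - p_f_b * p_d_b
print(p_f_or_d_b)  # 8/25, i.e. (75 failed + 85 passed-but-defaulted) / 500
```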
Survey of randomly selected people (voters) in Jan. 2001:
Q1: Did you vote in the 2000 election?
Q2: Do you favor an amendment to require a balanced budget?
Q3: To which political party do you belong?
Survey results (sample size n = 400):
                  Favor amendment for a balanced budget?
Political Party      Yes      No     Total
Republican            90      82      172
Democrat              44     104      148
Other                 48      32       80
Total                182     218      400
Marginal totals for opinion: the Total row.
Marginal totals for Party: the Total column.
What proportion favor the amendment?
What proportion claim to be Republican?
What proportion favor the amendment and are Other?
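The three questions above can be answered from the survey counts. A Python sketch, with the table entered by hand as a nested dictionary (an illustrative layout, not from the slides): the first two answers use marginal totals, the third uses a single cell.

```python
from fractions import Fraction

# Counts from the survey table: party -> {opinion: count}.
survey = {
    "Republican": {"Yes": 90, "No": 82},
    "Democrat": {"Yes": 44, "No": 104},
    "Other": {"Yes": 48, "No": 32},
}
n = sum(sum(row.values()) for row in survey.values())  # 400

p_yes = Fraction(sum(row["Yes"] for row in survey.values()), n)  # marginal: 182/400
p_rep = Fraction(sum(survey["Republican"].values()), n)          # marginal: 172/400
p_yes_and_other = Fraction(survey["Other"]["Yes"], n)            # one cell: 48/400

print(float(p_yes))            # 0.455
print(float(p_rep))            # 0.43
print(float(p_yes_and_other))  # 0.12
```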
What proportion favor the amendment, given those that claim to be Republican?
Of those that claim to be Democrat, what proportion favor the amendment?
Considering only those opposed, what proportion are not Republican?
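Each of these questions restricts the sample first (to one row or one column) and then takes a share within that restricted group. A Python sketch on the survey counts; the helper name `p_opinion_given_party` is hypothetical.

```python
from fractions import Fraction

# Counts from the survey table: party -> {opinion: count}.
survey = {
    "Republican": {"Yes": 90, "No": 82},
    "Democrat": {"Yes": 44, "No": 104},
    "Other": {"Yes": 48, "No": 32},
}


def p_opinion_given_party(opinion, party):
    """P(opinion | party): restrict to one party (a row), then take the share."""
    row = survey[party]
    return Fraction(row[opinion], sum(row.values()))


p_yes_given_rep = p_opinion_given_party("Yes", "Republican")  # 90/172
p_yes_given_demo = p_opinion_given_party("Yes", "Democrat")   # 44/148

# Restrict to those opposed (the "No" column), then drop the Republicans.
n_no = sum(row["No"] for row in survey.values())              # 218
n_no_not_rep = n_no - survey["Republican"]["No"]              # 136
p_not_rep_given_no = Fraction(n_no_not_rep, n_no)

for label, p in [("P(Yes | Rep.)", p_yes_given_rep),
                 ("P(Yes | Demo)", p_yes_given_demo),
                 ("P(not Rep. | No)", p_not_rep_given_no)]:
    print(f"{label} = {float(p):.3f}")  # prints .523, .297, .624 in turn
```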
Conditional Distribution:
Restrict subjects to only those that meet a condition (“given that”). Within this restricted group, what is the distribution of some other variable?
Distribution of “opinion” given those that claim to be Republican:
P( Yes | Rep. ) = 90/172 = .523
P( No | Rep. ) = 82/172 = .477
Is there a relationship between the party and the opinion on the amendment?
What would you expect to happen if no relationship existed?
Three Conditional Distributions:
P( Yes | Rep.) = .523, P( No | Rep.) = .477
P( Yes | Demo) = .297, P( No | Demo) = .703
P( Yes | Other) = .600, P( No | Other) = .400
Marginal Distribution: P( Yes ) = .455, P( No ) = .545
Is there a relationship? Why or why not?
If there is NO relationship (i.e., independence) between the party and the opinion, then
“the three conditional probabilities should be close to each other and close to the marginal probability.”
Three Conditional Probabilities:
P( Yes | Rep.) = .523
P( Yes | Demo) = .297
P( Yes | Other) = .600
Marginal Probability: P( Yes ) = .455
Are these close to each other, AND close to the “marginal”?
Not close; therefore, “party” and the “opinion” are dependent.
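A Python sketch of this comparison on the survey counts: it builds each party's conditional distribution of opinion (each sums to 1) and prints its gap from the marginal P(Yes). As on the slide, deciding whether a gap is "close" is left to judgment; the code applies no cutoff. The dictionary layout is an illustrative choice.

```python
from fractions import Fraction

# Counts from the survey table: party -> {opinion: count}.
survey = {
    "Republican": {"Yes": 90, "No": 82},
    "Democrat": {"Yes": 44, "No": 104},
    "Other": {"Yes": 48, "No": 32},
}
n = sum(sum(row.values()) for row in survey.values())
p_yes = Fraction(sum(row["Yes"] for row in survey.values()), n)  # marginal

# Conditional distribution of opinion within each party (each sums to 1).
conditional = {
    party: {op: Fraction(c, sum(row.values())) for op, c in row.items()}
    for party, row in survey.items()
}

for party, dist in conditional.items():
    gap = dist["Yes"] - p_yes
    print(f"{party:<10} P(Yes|party) = {float(dist['Yes']):.3f}  "
          f"P(No|party) = {float(dist['No']):.3f}  "
          f"gap vs P(Yes) = {float(gap):+.3f}")
```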
Visual Displays
Create with “Pivot Tables” in Excel.
[Figure: Bar chart, clustered: frequency of opinion (Yes/No) for Rep., Demo., Other]
[Figure: Bar chart, stacked: frequency of opinion (Yes/No) for Rep., Demo., Other]
[Figure: Bar chart, percents: percent of opinion (Yes/No) for Rep., Demo., Other]
Summary
For two categorical variables: must use conditional probabilities to determine if a relationship exists.
Cannot use correlation.
Visual display: Stacked percentage bar charts
Associations between TWO Variables
Variables          Numerical                                              Graphical
Quant. vs. Quant.  LS regression line, r, r-sq, std error                 Scatterplot, residual plots
Quant. vs. Cat.    X-bar and s for each category                          Side-by-side box plots
Cat. vs. Cat.      Two-way table, conditional & marginal distributions    Bar chart: stacked, percent