© 2008 McGraw-Hill Higher Education
The Statistical Imagination
• Chapter 13:
Nominal Variables: The Chi-Square and Binomial Distributions
The Chi-Square Test
• Chi-Square is a test for a relationship between two nominal variables
• Calculations are made using a cross-tabulation (or “crosstab”) table, which reports frequencies of joint occurrences of attributes
Crosstab Tables
• Cross-tabulation or “crosstab” tables are designed to compare the frequencies of two nominal/ordinal variables at once
Sample Crosstab Table
• Spent night on streets in last 2 weeks by gender among homeless persons
On streets   Male   Female   Total
Yes            28       10      38
No             79       44     123
Total         107       54     161
Reading a Crosstab Table
• The number in a cell is the frequency of joint occurrences, where a joint occurrence is the combination of categories of the two variables for a single individual
• From the cell, look up, then look to the left
• E.g., in the table above, the joint occurrence of “male and on-street” is 28, the number in the sample who are both male and spent a night on the streets
Reading a Crosstab Table (cont.)
• The numbers in the margins on the right side and the bottom present marginal totals, the total number of subjects in a category
• The grand total (n, the sample size) is presented in the bottom right-hand corner
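As a minimal sketch of how the marginal totals and the grand total follow from the cell frequencies, the sample crosstab can be summed by row and by column. The dictionary layout and variable names are illustrative choices, not part of the text.

```python
# Observed cell frequencies from the sample crosstab
# (rows: spent night on streets Yes/No; columns: Male/Female).
observed = {
    "Yes": {"Male": 28, "Female": 10},
    "No": {"Male": 79, "Female": 44},
}
columns = ["Male", "Female"]

# Row marginal totals: sum across each row (right-hand margin).
row_totals = {row: sum(cells.values()) for row, cells in observed.items()}

# Column marginal totals: sum down each column (bottom margin).
col_totals = {col: sum(observed[row][col] for row in observed) for col in columns}

# Grand total n: the sum of all cells (bottom right-hand corner).
grand_total = sum(row_totals.values())

print(row_totals)   # {'Yes': 38, 'No': 123}
print(col_totals)   # {'Male': 107, 'Female': 54}
print(grand_total)  # 161
```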
Crosstab Tables and the Chi-Square Test
• For the chi-square test, the categories of the independent variable (X) go in the columns of the table, and those of the dependent variable (Y), in the rows
• E.g.: Is gender a good predictor of who among homeless persons is likely to spend a night on the streets?
Calculating Expected Frequencies
• In addition to the observed joint frequencies, the chi-square test involves calculating the expected frequency of each table cell
• The expected frequency of a cell is equal to the column marginal total for the cell (look down) times the row marginal total for the cell (look to the right), divided by the grand total
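The rule above can be sketched for the sample crosstab on homeless persons. The function name `expected_frequencies` and the list-of-lists layout are assumptions made for illustration.

```python
# Observed frequencies from the sample crosstab.
observed = [
    [28, 10],  # Yes: Male, Female
    [79, 44],  # No:  Male, Female
]

row_totals = [sum(row) for row in observed]        # [38, 123]
col_totals = [sum(col) for col in zip(*observed)]  # [107, 54]
n = sum(row_totals)                                # 161


def expected_frequencies(row_totals, col_totals, n):
    """Expected frequency of each cell: column total * row total / grand total."""
    return [[c * r / n for c in col_totals] for r in row_totals]


expected = expected_frequencies(row_totals, col_totals, n)
for label, row in zip(["Yes", "No"], expected):
    print(label, [round(e, 2) for e in row])
# Yes [25.25, 12.75]
# No [81.75, 41.25]
```

Note that the expected frequencies in each row and column add up to the same marginal totals as the observed frequencies.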
Using Expected Frequencies to Test the Hypothesis
• The expected frequencies are those that would occur if there is no relationship between the two nominal/ordinal variables
• The chi-square statistic measures the gap between expected and observed frequencies
• If the observed frequencies exactly match the expected frequencies, chi-square computes to zero; when there is no relationship between the variables, it falls close to zero, give or take sampling error
The Chi-Square Statistic
• The sampling distribution is generated using the chi-square equation:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where O is the observed frequency of a cell, and E is the expected frequency
• Chi-square tells us whether the summed squared differences between the observed and expected cell frequencies are so great that they are not simply the result of sampling error
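A short sketch of the chi-square equation applied to the sample crosstab on homeless persons may help. The function name `chi_square` is an illustrative choice; the expected frequencies are computed as column total times row total divided by the grand total.

```python
def chi_square(observed):
    """Sum of (O - E)^2 / E over all cells of a crosstab given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = col_totals[j] * row_totals[i] / n  # expected frequency of the cell
            total += (o - e) ** 2 / e
    return total


observed = [
    [28, 10],  # Yes: Male, Female
    [79, 44],  # No:  Male, Female
]
print(round(chi_square(observed), 3))  # 1.165
```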
When to Use the Chi-Square Statistic
1) There is one population with a representative sample from it
2) There are two variables, both of a nominal/ordinal level of measurement
3) The expected frequency of each cell in the crosstab table is at least five
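Criterion 3 can be checked mechanically before running the test. The sketch below does so for the sample crosstab on homeless persons; the helper name `min_expected_frequency` is hypothetical, and the threshold of five comes from the list above.

```python
def min_expected_frequency(observed):
    """Smallest expected cell frequency in a crosstab given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return min(c * r / n for r in row_totals for c in col_totals)


observed = [
    [28, 10],  # Yes: Male, Female
    [79, 44],  # No:  Male, Female
]
smallest = min_expected_frequency(observed)
print(round(smallest, 2))  # 12.75
print(smallest >= 5)       # True, so criterion 3 is met for this table
```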
Features of the Chi-Square Hypothesis Test
• Step 1. The H0 states that there is no relationship between the two variables. When this is the case, chi-square calculates to a value of zero, give or take some sampling error
• This null hypothesis asserts no difference in observed and expected frequencies
Features of the Chi-Square Hypothesis Test (cont.)
• Step 2. The sampling distribution is the chi-square distribution. It describes all possible outcomes of the chi-square statistic with repeated sampling when there is no relationship between X and Y
• Degrees of freedom are determined by the number of rows and columns in the crosstab table: df = (r - 1)(c - 1), where r is the number of rows and c is the number of columns
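The degrees-of-freedom rule is simple enough to state in a line of code. In this sketch the function name is an assumption, and the 3-by-4 table is a hypothetical shape used only to show a second case.

```python
def degrees_of_freedom(n_rows, n_cols):
    """df = (r - 1)(c - 1); marginal totals are not counted as rows or columns."""
    return (n_rows - 1) * (n_cols - 1)


print(degrees_of_freedom(2, 2))  # 1  (the 2-by-2 sample crosstab)
print(degrees_of_freedom(3, 4))  # 6  (a hypothetical 3-by-4 table)
```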
Features of the Chi-Square Hypothesis Test (cont.)
• Step 4. The test effects are the differences between expected and observed frequencies
• The test statistic is the chi-square statistic
• The p-value is obtained by comparing the calculated chi-square value to the critical values of the chi-square distribution in Statistical Table G of Appendix B
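The text obtains the p-value from a printed table. As an alternative sketch for the special case df = 1 only, the upper-tail probability of a chi-square value can be computed with the complementary error function, since a chi-square variable with one degree of freedom is a squared standard normal score. This shortcut is an assumption of the example, not the text's procedure, and it does not apply to larger tables.

```python
import math


def chi_square(observed):
    """Sum of (O - E)^2 / E over all cells of a crosstab given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return sum(
        (o - col_totals[j] * row_totals[i] / n) ** 2 / (col_totals[j] * row_totals[i] / n)
        for i, row in enumerate(observed)
        for j, o in enumerate(row)
    )


def p_value_df1(chi2):
    """Upper-tail probability of a chi-square value; valid only when df = 1."""
    return math.erfc(math.sqrt(chi2 / 2))


observed = [[28, 10], [79, 44]]  # sample crosstab: rows Yes/No, columns Male/Female
chi2 = chi_square(observed)
p = p_value_df1(chi2)
print(round(chi2, 3), round(p, 2))  # 1.165 0.28
```

For comparison, the critical value for df = 1 at the .05 level of significance is 3.84, so a chi-square of about 1.165 falls well short of it.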
The Existence of a Relationship for the Chi-Square Test
• Existence: Test the H0 that $\chi^2 = 0$; that is, there is no relationship between X and Y
• If the H0 is rejected, a relationship exists
Direction and Strength of a Relationship for Chi-Square
• Direction: Not applicable (because the variables are nominal level)
• Strength: Measures of strength exist for chi-square but are seldom reported because they are prone to misinterpretation
Nature of a Relationship for the Chi-Square Test
• Nature: Report the differences between the observed and expected cell frequencies for a couple of outstanding cells
• Calculate column percentages for selected cells
Column and Row Percentages
• A column percentage is a cell’s frequency as a percentage of the column marginal total
• A row percentage is a cell’s frequency as a percentage of the row marginal total
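The two definitions can be illustrated with the sample crosstab on homeless persons. The helper names `column_percentage` and `row_percentage` are illustrative, not from the text.

```python
# Observed frequencies (rows: on streets Yes/No; columns: Male/Female).
observed = {
    "Yes": {"Male": 28, "Female": 10},
    "No": {"Male": 79, "Female": 44},
}


def column_percentage(row, col):
    """Cell frequency as a percentage of its column marginal total."""
    col_total = sum(observed[r][col] for r in observed)
    return 100 * observed[row][col] / col_total


def row_percentage(row, col):
    """Cell frequency as a percentage of its row marginal total."""
    row_total = sum(observed[row].values())
    return 100 * observed[row][col] / row_total


print(round(column_percentage("Yes", "Male"), 1))    # 26.2 (28 of 107 males)
print(round(column_percentage("Yes", "Female"), 1))  # 18.5 (10 of 54 females)
print(round(row_percentage("Yes", "Male"), 1))       # 73.7 (28 of 38 on the streets)
```

With the independent variable (gender) in the columns, the column percentages are the ones that compare the groups: 26.2 percent of males versus 18.5 percent of females spent a night on the streets.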
Chi-Square as a Difference of Proportions Test
• The chi-square test is frequently used to compare proportions of categories of a nominal/ordinal variable for two or more groups of a second nominal/ordinal variable
• Thus, it may be viewed as a difference of proportions test as illustrated in Figure 13-2 in the text
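One way to see the connection, sketched here for the 2-by-2 sample crosstab, is to compare the proportion of males and females who spent a night on the streets. The pooled two-group z statistic used below is an assumption of this example (the text's own illustration is Figure 13-2); for a 2-by-2 table its square equals the chi-square statistic.

```python
import math

# Sample crosstab counts: spent a night on the streets, by gender.
yes_male, yes_female = 28, 10
n_male, n_female = 107, 54

p_male = yes_male / n_male        # proportion of males on the streets
p_female = yes_female / n_female  # proportion of females on the streets

# Pooled proportion under the null hypothesis of no difference between groups.
pooled = (yes_male + yes_female) / (n_male + n_female)
standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_male + 1 / n_female))

z = (p_male - p_female) / standard_error
print(round(z, 3))       # 1.079
print(round(z ** 2, 3))  # 1.165, the same value as chi-square for this table
```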
The Binomial Distribution
• The binomial distribution test is a small single-sample proportions test. Contrast it to the large single-sample proportions test of Chapter 10
• The test hinges on mathematically expanding the binomial distribution equation, $(P + Q)^n$
When to Use the Binomial Distribution
1) There is only one nominal variable and it is dichotomous, with P = p [of success] and Q = p [of failure]
2) There is a single, representative sample from one population
3) Sample size is such that $(p_{smaller})(n) < 5$, where $p_{smaller}$ = the smaller of $P_u$ and $Q_u$
4) There is a target value of the variable to which we may compare the sample proportion
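Criterion 3 is easy to check in code. In this sketch the function name is hypothetical, and the two calls use a fair-coin proportion of .5 with sample sizes chosen only for illustration.

```python
def small_sample_binomial_applies(p_u, n):
    """True when (p_smaller)(n) < 5, where p_smaller is the smaller of P_u and Q_u."""
    q_u = 1 - p_u
    p_smaller = min(p_u, q_u)
    return p_smaller * n < 5


print(small_sample_binomial_applies(0.5, 4))   # True:  .5 * 4 = 2, less than 5
print(small_sample_binomial_applies(0.5, 20))  # False: .5 * 20 = 10, not less than 5
```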
Expansion of the Binomial Distribution Equation
• Expansion of the binomial distribution equation, $(P + Q)^n$, provides the sampling distribution for dichotomous events. That is, the equation describes all possible sampling outcomes and the probability of each, where there are only two possible categories of a nominal variable
An Example of an Expanded Binomial Equation
• The equation reveals, for example, the possible outcomes of the tossing of 4 coins
• P = p [heads] = .5; Q = p [tails] = .5; n = 4 coins
• $(P + Q)^4 = P^4 + 4P^3Q + 6P^2Q^2 + 4PQ^3 + Q^4$
• Add the coefficients to get the total number of possible outcomes = 16
• The probability of 3 heads and 1 tail is the coefficient of $P^3Q$ over the sum of coefficients = 4 over 16 = .25 (this shortcut works here because P = Q = .5, so all 16 outcomes are equally likely)
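The four-coin example can be reproduced with a short sketch. It uses `math.comb` (Python 3.8 or later) for the coefficients; the variable names are illustrative.

```python
from math import comb

n = 4    # number of coins tossed
P = 0.5  # p[heads]
Q = 0.5  # p[tails]

# Coefficients of the expanded equation, from P^4 (4 heads) down to Q^4 (0 heads).
coefficients = [comb(n, heads) for heads in range(n, -1, -1)]
print(coefficients)       # [1, 4, 6, 4, 1]
print(sum(coefficients))  # 16

# Probability of 3 heads and 1 tail: the full term 4 * P^3 * Q^1.
prob_3_heads = comb(n, 3) * P**3 * Q**1
print(prob_3_heads)  # 0.25
```

Evaluating the full term, rather than dividing the coefficient by 16, also works when P and Q are not equal.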
Pascal’s Triangle
• Pascal’s Triangle provides a shortcut method for expanding the binomial equation
• It provides the coefficients for small samples and allows a quick computation of the probabilities of all possible outcomes when P and Q are equal to .5
• See Table 13-7 in the text
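As a rough sketch of the shortcut, each row of Pascal's Triangle can be built from the row above by adding adjacent pairs of numbers. The function name `pascal_rows` is an illustrative choice, and the layout of Table 13-7 in the text may differ.

```python
def pascal_rows(max_n):
    """Rows 0 through max_n of Pascal's Triangle; row n holds the coefficients of (P + Q)^n."""
    rows = [[1]]
    for _ in range(max_n):
        prev = rows[-1]
        # Each inner entry is the sum of the two entries above it.
        rows.append([1] + [a + b for a, b in zip(prev, prev[1:])] + [1])
    return rows


rows = pascal_rows(4)
for row in rows:
    print(row)
# [1]
# [1, 1]
# [1, 2, 1]
# [1, 3, 3, 1]
# [1, 4, 6, 4, 1]

# When P = Q = .5, each outcome's probability is its coefficient over the row sum.
probabilities = [c / sum(rows[4]) for c in rows[4]]
print(probabilities)  # [0.0625, 0.25, 0.375, 0.25, 0.0625]
```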
Features of the Binomial Distribution Test
• Step 1. H0: $P_u$ = a target value
• Step 2. The sampling distribution is an expanded binomial equation for the given sample size
Features of the Binomial Distribution Test (cont.)
• Step 4. The effect is the observed combination of successes and failures, which corresponds to a term in the equation (e.g., 3 heads and 1 tail is represented by the term $4P^3Q$)
• The test statistic is the expanded binomial equation
• The p-value is taken directly from the equation (not from a statistical table)
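A minimal sketch of reading a p-value off the expanded equation, using the four-coin example with a target value of P = .5. It assumes a one-tailed test in which the p-value is the sum of the term for the observed outcome and the terms for all more extreme outcomes; the function names are hypothetical.

```python
from math import comb


def binomial_term(n, successes, p):
    """One term of the expanded (P + Q)^n: the probability of exactly `successes` successes."""
    return comb(n, successes) * p**successes * (1 - p) ** (n - successes)


def upper_tail_p_value(n, observed_successes, p):
    """One-tailed p-value: probability of the observed number of successes or more."""
    return sum(binomial_term(n, k, p) for k in range(observed_successes, n + 1))


# Observed effect: 3 heads and 1 tail in 4 tosses, with H0: P_u = .5.
print(binomial_term(4, 3, 0.5))       # 0.25
print(upper_tail_p_value(4, 3, 0.5))  # 0.3125 (terms for 3 heads and 4 heads)
```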
Statistical Follies: Statistical Power and Sample Size
• For a given level of significance, statistical power is a test statistic’s probability of not incurring a Type II error (i.e., unknowingly making the incorrect decision of failing to reject a false null hypothesis)
• Low statistical power can result from having too small a sample size
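The effect of sample size on power can be sketched with a one-tailed binomial test. All of the numbers here are hypothetical and chosen only for illustration: a null value of .5, a true population proportion of .7, a .05 level of significance, and sample sizes of 10 and 50.

```python
from math import comb


def upper_tail(n, k, p):
    """Probability of k or more successes in n trials with success probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def power_one_tailed(n, p_null, p_true, alpha=0.05):
    """Probability of rejecting H0 when the true proportion is p_true."""
    # Critical count: the smallest number of successes whose upper-tail
    # probability under H0 is no greater than alpha.
    critical = next(
        (k for k in range(n + 1) if upper_tail(n, k, p_null) <= alpha), None
    )
    if critical is None:
        return 0.0  # sample too small for any outcome to reach significance
    return upper_tail(n, critical, p_true)


small = power_one_tailed(10, 0.5, 0.7)
large = power_one_tailed(50, 0.5, 0.7)
print(round(small, 3))  # 0.149
print(large > small)    # True
```

Under these assumptions, the sample of 10 would detect the false null hypothesis only about 15 percent of the time, while the larger sample has a much better chance.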