Sociology 601: Midterm review, October 15, 2009
• Basic information for the midterm
– Date: Tuesday, October 20, 2009
– Start time: 2 pm
– Place: usual classroom, Art/Sociology 3221
– Bring a sheet of notes, a calculator, and two pens or pencils
– Notify me if you anticipate any timing problems
• Review for midterm
– terms
– symbols
– steps in a significance test
– testing differences in groups
– contingency tables and measures of association
– equations
Important terms from chapter 1
Terms for statistical inference:
• population
• sample
• parameter
• statistic
Key idea: You use a sample to make inferences about a population
Important terms from chapter 2
2.1) Measurement:
• variable
• interval scale
• ordinal scale
• nominal scale
• discrete variable
• continuous variable
2.2-2.4) Sampling:
• simple random sample
• probability sampling
• stratified sampling
• cluster sampling
• multistage sampling
• sampling error
Key idea: Statistical inferences depend on measurement and sampling.
Important terms from chapter 3
3.1) Tabular and graphic description
• frequency distribution
• relative frequency distribution
• histogram
• bar graph
3.2-3.4) Measures of central tendency and variation
• mean
• median
• mode
• proportion
• standard deviation
• variance
• interquartile range
• quartile, quintile, percentile
Important terms from chapter 3
Key ideas:
1.) Statistical inferences are often made about a measure of central tendency.
2.) Measures of variation help us estimate certainty about an inference.
Important terms from Chapter 4
• probability distribution
• sampling distribution
• sample distribution
• normal distribution
• standard error
• central limit theorem
• z-score
Key ideas:
1.) If we know what the population is like, we can predict what a sample might be like.
2.) A sample statistic gives us a best guess of the population parameter.
3.) If we work carefully, a sample can tell us how confident to be about our sample statistic.
Important terms from chapter 5
• point estimator
• estimate
• unbiased
• efficient
• confidence interval
Key ideas:
1.) We have a standard set of equations we use to make estimates.
2.) These equations are used because they have specific desirable properties.
3.) A confidence interval provides your best guess of a parameter.
4.) A confidence interval also shows how close that best guess (from idea 3) will typically be to the parameter.
Important terms from chapter 6
6.1 – 6.3) Statistical inference: Significance tests
• assumptions
• hypothesis
• test statistic
• p-value
• conclusion
• null hypothesis
• one-sided test
• two-sided test
• z-statistic
Key Idea from chapter 6
A significance test is a ritualized way to ask about a population parameter.
1.) Clearly state assumptions
2.) Hypothesize a value for a population parameter
3.) Calculate a sample statistic.
4.) Estimate how unlikely it is for the hypothesized population to produce such a sample statistic.
5.) Decide whether the hypothesis can be thrown out.
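A minimal Python sketch of the five steps, assuming a large-sample, two-sided z test for a mean. The numbers are those of A&F problem 6.8 (n=100, Ybar=508, sd=100, mu0=500); the function names and the α = 0.05 level are illustrative assumptions, not part of the course materials.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative probability, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def one_sample_z_test(n, ybar, s, mu0):
    """Large-sample z test of H0: mu = mu0 against a two-sided alternative."""
    se = s / math.sqrt(n)                            # estimated standard error of Ybar
    z = (ybar - mu0) / se                            # step 3: test statistic
    p_two_sided = 2.0 * (1.0 - normal_cdf(abs(z)))   # step 4: p-value
    return se, z, p_two_sided

# Step 1 (assumptions): random sample, interval-scale variable, large n.
# Step 2 (hypothesis): H0: mu = 500 versus Ha: mu != 500.
se, z, p = one_sample_z_test(n=100, ybar=508, s=100, mu0=500)

# Step 5 (conclusion): compare p with the alpha level declared in step 1.
alpha = 0.05                                         # assumed level, for illustration
reject_h0 = p < alpha
print(f"se={se:.4f}  z={z:.4f}  p={p:.4f}  reject H0: {reject_h0}")
```

With these inputs the standard error is 10 and z is 0.8, so the two-sided p-value is well above 0.05 and H0 is not rejected.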
More important terms from chapter 6
6.4, 6.7) Decisions and types of errors in hypothesis tests
• type I error
• type II error
• power
6.5-6.6) Small sample tests
• t-statistic
• binomial distribution
• binomial test
Key ideas:
1.) Modeling decisions and population characteristics can affect the probability of a mistaken inference.
2.) Small sample tests have the same principles as large sample tests, but require different assumptions and techniques.
Significance tests, Step 1: assumptions
• An assumption that the sample was drawn at random.
– This is pretty much a universal assumption for all significance tests.
• An assumption about whether the variable has two outcome categories (proportion) or many intervals (mean).
• An assumption that enables us to assume a normal sampling distribution. This assumption varies from test to test.
– Some tests assume a normal population distribution.
– Other tests assume different minimum sample sizes.
– Some tests do not make this assumption.
• Declare the α level at the start, if you use one.
Significance Tests, Step 2: Hypothesis
• State the hypothesis as a null hypothesis.
– Remember that the null hypothesis is about the population from which you draw your sample.
• Write the equation for the null hypothesis.
• The null hypothesis can imply a one- or two-sided test.
– Be sure the statement and equation are consistent.
Significance Tests, Step 3: Test statistic
For the test statistic, write:
• the equation,
• your work, and
• the answer.
– Full disclosure maximizes partial credit.
– I recommend four significant digits at each computational step, but present three as the answer.
Significance tests, Step 4: p-value
Calculate an appropriate p-value for the test-statistic.
– Use the correct table for the type of test;
– Use the correct degrees of freedom if applicable;
– Use a correct p-value for a one- or two-sided test, as you declared in the hypothesis step.
Significance Tests, Step 5: Conclusion
Write a conclusion
– write the p-value, your decision to reject H0 or not;
– a statement of what your decision means;
– discuss the substantive importance of your sample statistic.
Useful STATA outputs
• immediate test for sample mean using TTESTI:
. * for example, in A&F problem 6.8, n=100 Ybar=508 sd=100 and mu0=500
. ttesti 100 508 100 500, level(95)
One-sample t test
------------------------------------------------------------------------------
         |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       x |     100         508          10         100    488.1578    527.8422
------------------------------------------------------------------------------
Degrees of freedom: 99
Ho: mean(x) = 500
    Ha: mean < 500            Ha: mean != 500             Ha: mean > 500
      t =   0.8000              t =   0.8000                t =   0.8000
  P < t =   0.7872        P > |t| =   0.4256            P > t =   0.2128
Useful STATA outputs
• immediate test for sample proportion using PRTESTI:
. * for proportion: in A&F problem 6.12, n=832 p=.53 and p0=.5
. prtesti 832 .53 .50, level(95)
One-sample test of proportion                      x: Number of obs =      832
------------------------------------------------------------------------------
    Variable |       Mean   Std. Err.                     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |        .53   .0173032                      .4960864    .5639136
------------------------------------------------------------------------------
Ho: proportion(x) = .5
      Ha: x < .5              Ha: x != .5               Ha: x > .5
      z =  1.731              z =  1.731                z =  1.731
  P < z = 0.9582         P > |z| = 0.0835           P > z = 0.0418
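The PRTESTI figures can be checked by hand. The Python sketch below is only an illustration (the variable and function names are assumptions); it uses the same inputs, n=832, p=.53, p0=.5. Note that the test statistic uses the standard error under the null hypothesis, while the reported Std. Err. and confidence interval use the sample proportion.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative probability, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n, p_hat, p0 = 832, 0.53, 0.50

# Standard error under H0 (pi = p0): used for the z statistic.
se_null = math.sqrt(p0 * (1 - p0) / n)
# Standard error from the sample proportion: used for the confidence interval.
se_sample = math.sqrt(p_hat * (1 - p_hat) / n)

z = (p_hat - p0) / se_null
p_upper = 1 - normal_cdf(z)                    # Ha: pi > .5
p_two_sided = 2 * (1 - normal_cdf(abs(z)))     # Ha: pi != .5

# 95% confidence interval, using the usual 1.96 multiplier.
ci_low = p_hat - 1.96 * se_sample
ci_high = p_hat + 1.96 * se_sample

print(f"z={z:.3f}  P>z={p_upper:.4f}  P>|z|={p_two_sided:.4f}")
print(f"se={se_sample:.7f}  CI=({ci_low:.4f}, {ci_high:.4f})")
```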
Useful STATA outputs
• Comparison of two means using ttesti
. ttesti 4252 18.1 12.9 6764 32.6 18.2, unequal
Two-sample t test with unequal variances
------------------------------------------------------------------------------
         |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       x |    4252        18.1    .1978304        12.9    17.71215    18.48785
       y |    6764        32.6     .221294        18.2    32.16619    33.03381
---------+--------------------------------------------------------------------
combined |   11016    27.00323    .1697512     17.8166    26.67049    27.33597
---------+--------------------------------------------------------------------
    diff |               -14.5    .2968297               -15.08184   -13.91816
------------------------------------------------------------------------------
Satterthwaite's degrees of freedom:  10858.6
Ho: mean(x) - mean(y) = diff = 0
     Ha: diff < 0              Ha: diff != 0              Ha: diff > 0
     t = -48.8496              t = -48.8496               t = -48.8496
 P < t =   0.0000        P > |t| =   0.0000           P > t =   1.0000
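The key numbers in this two-sample output can be reproduced from the summary statistics alone. The sketch below is a hedged illustration in Python (variable names are assumptions); it computes the unequal-variance standard error, the t statistic for mean(x) − mean(y), and Satterthwaite's approximate degrees of freedom.

```python
import math

# Summary statistics passed to ttesti: n, mean, sd for each group.
n1, mean1, sd1 = 4252, 18.1, 12.9   # group x
n2, mean2, sd2 = 6764, 32.6, 18.2   # group y

# Squared standard error of each group mean.
v1 = sd1 ** 2 / n1
v2 = sd2 ** 2 / n2

# Standard error of the difference, with no equal-variance assumption.
se_diff = math.sqrt(v1 + v2)

diff = mean1 - mean2                # Stata reports mean(x) - mean(y)
t = diff / se_diff

# Satterthwaite's approximation to the degrees of freedom.
df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

print(f"diff={diff:.1f}  se={se_diff:.7f}  t={t:.4f}  df={df:.1f}")
```

The sign of t depends only on which group is subtracted from which; the equations in this review use group 2 minus group 1, while Stata uses x minus y.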
Chapter 6: Significance Tests for Single Sample
μ or π        sample size    best test
mean          large          z-test for Ȳ − μ0
proportion    large          z-test for π̂ − π0
mean          small          t-test for Ȳ − μ0
proportion    small          Fisher’s exact test
Equations for tests of statistical significance
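$$ z = \frac{\bar{Y} - \mu_0}{\hat{\sigma}_{\bar{Y}}} $$

$$ z = \frac{\hat{\pi} - \pi_0}{\sigma_{\hat{\pi}}} $$

$$ t = \frac{\bar{Y} - \mu_0}{\hat{\sigma}_{\bar{Y}}} $$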
Chapter 7: Comparing scores for two groups
μ or π        sample size    sample scheme    best test
mean          large          independent      z-test for μ2 − μ1
proportion    large          independent      z-test for π2 − π1
mean          small          independent      t-test for μ2 − μ1
proportion    small          independent      Fisher’s exact test
mean          large          dependent        z-test for D̄
proportion    large          dependent        McNemar test
mean          small          dependent        t-test for D̄
proportion    small          dependent        Binomial test
Two Independent Groups: Large Samples, Means
7.1 Difference of two large sample means:
$$ z = \frac{(\bar{Y}_2 - \bar{Y}_1) - 0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} $$
• It is important to be able to recognize the parts of the equation, what they mean, and why they are used.
• Equal variance assumption? NO
Two Independent Groups: Large Samples, Proportions
7.2 Difference of two large sample proportions:
$$ z = \frac{(\hat{\pi}_2 - \hat{\pi}_1) - 0}{\sqrt{\dfrac{\hat{\pi}(1 - \hat{\pi})}{n_1} + \dfrac{\hat{\pi}(1 - \hat{\pi})}{n_2}}} $$
• Equal variance assumption? YES (if proportions are equal then so are variances).
• df = N1 + N2 - 2
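A short Python sketch of the 7.2 test, assuming the unsubscripted π̂ in the denominator is the pooled proportion (all successes divided by all cases). The counts below are hypothetical, chosen only to make the arithmetic easy to follow, and the function name is an assumption.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: pi2 - pi1 = 0, using the pooled proportion."""
    p1 = x1 / n1
    p2 = x2 / n2
    pooled = (x1 + x2) / (n1 + n2)       # estimate of the common pi under H0
    se = math.sqrt(pooled * (1 - pooled) / n1 + pooled * (1 - pooled) / n2)
    z = ((p2 - p1) - 0) / se
    return pooled, se, z

# Hypothetical data: 40 of 100 successes in group 1, 55 of 100 in group 2.
pooled, se, z = two_proportion_z(x1=40, n1=100, x2=55, n2=100)
print(f"pooled={pooled:.3f}  se={se:.4f}  z={z:.3f}")
```

Because H0 says the two proportions are equal, a single pooled estimate is used for both variance terms, which is why the equal variance assumption holds here.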
Two Independent Groups: Small Samples, Means
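7.3 Difference of two small sample means:
$$ t\ (\text{or } z) = \frac{(\bar{Y}_2 - \bar{Y}_1) - 0}{\hat{\sigma}_{\bar{Y}_2 - \bar{Y}_1}} = \frac{\bar{Y}_2 - \bar{Y}_1}{\sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}\,\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} $$
Equal variance assumption:
• SOMETIMES (for ease)
• NO (in computer programs)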
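The pooled (equal variance) version of equation 7.3 can be sketched in Python as follows. The summary statistics are hypothetical and the function name is an assumption; the p-value would still be read from a t table with n1 + n2 − 2 degrees of freedom.

```python
import math

def pooled_t(n1, mean1, s1, n2, mean2, s2):
    """Equal-variance t statistic for H0: mu2 - mu1 = 0."""
    df = n1 + n2 - 2
    # Pooled variance: a weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    se_diff = math.sqrt(pooled_var) * math.sqrt(1 / n1 + 1 / n2)
    t = ((mean2 - mean1) - 0) / se_diff
    return pooled_var, se_diff, t, df

# Hypothetical small samples.
pooled_var, se_diff, t, df = pooled_t(n1=10, mean1=20.0, s1=4.0,
                                      n2=12, mean2=24.0, s2=5.0)
print(f"pooled variance={pooled_var:.2f}  se={se_diff:.4f}  t={t:.3f}  df={df}")
```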
Two Independent Groups: Small Samples, Proportions
Fisher’s exact test
• via Stata, SAS, or SPSS
• calculates exact probability of all possible occurrences
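To show what "exact probability of all possible occurrences" means, here is a minimal Python sketch for a 2×2 table, not a substitute for the packaged routines. It enumerates every table with the same row and column totals and uses the hypergeometric probability of each. The two-sided rule used here (sum the tables that are no more probable than the observed one) is one common convention and is an assumption; the data are hypothetical.

```python
from math import comb

def table_probability(a, row1, row2, col1):
    """Probability that the top-left cell equals a, with all margins fixed."""
    return comb(row1, a) * comb(row2, col1 - a) / comb(row1 + row2, col1)

def fisher_exact(a, b, c, d):
    """One-sided (upper) and two-sided exact p-values for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    lowest = max(0, col1 - row2)         # smallest possible top-left count
    highest = min(row1, col1)            # largest possible top-left count
    probs = {k: table_probability(k, row1, row2, col1)
             for k in range(lowest, highest + 1)}
    p_obs = probs[a]
    p_upper = sum(p for k, p in probs.items() if k >= a)
    # Small tolerance so tables tied with the observed one are included.
    p_two_sided = sum(p for p in probs.values() if p <= p_obs * (1 + 1e-9))
    return p_upper, p_two_sided

# Hypothetical 2x2 table with 8 cases.
p_upper, p_two_sided = fisher_exact(3, 1, 1, 3)
print(f"one-sided p={p_upper:.4f}  two-sided p={p_two_sided:.4f}")
```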
Chapter 8: Analyzing associations
• Contingency tables and their terminologies:
– marginal distributions and joint distributions
– conditional distribution of R, given a value of E (as counts or percentages in A & F)
– marginal, joint, and conditional probabilities (as proportions in A & F)
• “Are two variables statistically independent?”
Descriptive statistics you need to know
• How to draw and interpret contingency tables (crosstabs)
• Frequency and probability/percentage terms
– marginal
– conditional
– joint
• Measures of relationships:
– odds, odds ratios
– gamma and tau-b
Observed and expected cell counts
• fo, the observed cell count, is the number of cases in a given cell.
• fe, the expected cell count, is the number of cases we would predict in a cell if the variables were independent of each other.
• fe = row total * column total / N
– the equation for fe is a correction for rows or columns with small totals.
Chi-squared test of independence
• Assumptions: 2 categorical variables, random sampling, fe >= 5
• Ho: variables are statistically independent (crudely, the score for one variable is independent of the score for the other.)
• Test statistic: χ² = Σ((fo − fe)² / fe)
• p-value from χ² table, df = (r − 1)(c − 1)
• Conclusion: reject or do not reject based on p-value and prior α-level, if necessary. Then, describe your conclusion.
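A small Python sketch of the chi-squared calculation, using the expected-count rule fe = row total * column total / N. The 2×2 counts are hypothetical and the function names are assumptions; the p-value is still looked up in a χ² table with the computed df.

```python
def expected_counts(observed):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_squared(observed):
    """Chi-squared statistic and degrees of freedom for an r x c table."""
    expected = expected_counts(observed)
    stat = sum((fo - fe) ** 2 / fe
               for obs_row, exp_row in zip(observed, expected)
               for fo, fe in zip(obs_row, exp_row))
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df, expected

# Hypothetical 2x2 table of observed counts.
observed = [[30, 20],
            [20, 30]]
chi2, df, expected = chi_squared(observed)

# Check the fe >= 5 assumption before trusting the test.
assumption_ok = all(fe >= 5 for row in expected for fe in row)
print(f"chi2={chi2:.2f}  df={df}  all fe >= 5: {assumption_ok}")
```

Here every expected count is 25, so χ² = 4 × (5² / 25) = 4.0 with df = 1.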
Probabilities, odds, and odds ratios.
• Given a probability, you can calculate an odds and a log odds.
– odds = p / (1 − p)
   • 50/50 = 1.0
   • range: 0 to ∞
– log odds = log(p / (1 − p)) = log(p) − log(1 − p)
   • 50/50 = 0.0
   • range: −∞ to +∞
– odds ratio = [p1 / (1 − p1)] / [p2 / (1 − p2)]
• Given an odds, you can calculate a probability.
– p = odds / (1 + odds)
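These conversions are easy to check numerically. The following Python sketch is illustrative only (function names are assumptions, and the probabilities 0.8 and 0.5 are hypothetical); it uses the natural log for the log odds.

```python
import math

def odds(p):
    """Odds of an event with probability p (0 <= p < 1)."""
    return p / (1 - p)

def log_odds(p):
    """Natural log of the odds (0 < p < 1)."""
    return math.log(p) - math.log(1 - p)

def odds_ratio(p1, p2):
    """Ratio of the odds for group 1 to the odds for group 2."""
    return odds(p1) / odds(p2)

def probability_from_odds(o):
    """Convert an odds back to a probability."""
    return o / (1 + o)

even_odds = odds(0.5)                    # 50/50 gives odds of 1.0
even_log_odds = log_odds(0.5)            # 50/50 gives log odds of 0.0
ratio = odds_ratio(0.8, 0.5)             # odds of 4 versus odds of 1
back_to_p = probability_from_odds(odds(0.8))

print(even_odds, even_log_odds, round(ratio, 6), round(back_to_p, 6))
```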
Measures of association with ordinal data
• concordant observations C:
– in a pair, one is higher on both x and y
• discordant observations D:
– in a pair, one is higher on x and lower on y
• ties
– in a pair, same on x or same on y
• gamma (ignores ties)
$$ \gamma = \frac{C - D}{C + D} $$
• tau-b is a gamma that adjusts for “ties”
– γ often increases with more collapsed tables
– τb and γ both have standard errors in computer output
– τb can be interpreted as a correlation coefficient
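Counting concordant and discordant pairs from a contingency table can be sketched in Python as below. This assumes rows are ordered from low to high on x and columns from low to high on y; the table is hypothetical and the function names are assumptions. Tied pairs (same row or same column) are skipped, as gamma ignores them.

```python
def concordant_discordant(table):
    """Count concordant (C) and discordant (D) pairs in an ordered r x c table."""
    n_rows, n_cols = len(table), len(table[0])
    concordant = discordant = 0
    for i in range(n_rows):
        for j in range(n_cols):
            # Pair each case in cell (i, j) with cases in rows that are higher on x.
            for k in range(i + 1, n_rows):
                for l in range(n_cols):
                    if l > j:        # also higher on y: concordant
                        concordant += table[i][j] * table[k][l]
                    elif l < j:      # lower on y: discordant
                        discordant += table[i][j] * table[k][l]
    return concordant, discordant

def gamma(table):
    """Gamma = (C - D) / (C + D)."""
    c, d = concordant_discordant(table)
    return (c - d) / (c + d)

# Hypothetical 2x3 table: rows low/high on x, columns low/medium/high on y.
table = [[10, 5, 2],
         [3, 6, 9]]
C, D = concordant_discordant(table)
g = gamma(table)
print(f"C={C}  D={D}  gamma={g:.4f}")
```

For this table C = 195 and D = 33, so gamma = 162 / 228, a fairly strong positive association.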