Section 9-1: Inference for Slope and Correlation; Section 9-3: Confidence and Prediction Intervals
Introduction To Data Analysis Lab 10
Section 9-1: Inference for Slope and Correlation
Section 9-3: Confidence and Prediction Intervals
[email protected]
The Maths Study Centre: 11am-5pm, CB04.03.331
This presentation is viewable on http://mahritaharahap.wordpress.com/teaching-areas/
Check out the www.khanacademy.org website
Statistical inference is the process of drawing conclusions about the entire population based on information in a sample by:
• constructing confidence intervals on population parameters
• or by setting up a hypothesis test on a population parameter
Regression
The linear regression line characterises the relationship between two quantitative variables. Using regression analysis on data can help us draw insights about that data: it helps us understand the impact of one of the variables on the other. It examines the relationship between one independent variable (predictor/explanatory) and one dependent variable (response/outcome). The linear regression line equation is based on the equation of a line in mathematics.
The simple linear regression model: Y = β0 + β1X + ε, where the errors are assumed ε ~ N(0, σ)

X: Predictor Variable / Explanatory Variable / Independent Variable. The variable one can control.
Y: Outcome Variable / Response Variable / Dependent Variable. The outcome to be measured/predicted.

Residuals: e_i = y_i − ŷ_i
General: Hypothesis Testing
We use hypothesis testing to infer conclusions about the population parameters based on analysing the statistics of the sample. In statistics, a hypothesis is a statement about a population parameter.
1. The null hypothesis, denoted H0, is a statement or claim about a population parameter that is initially assumed to be true. No "effect" or no "difference". It is always an equality. (E.g. H0: population parameter = hypothesised null parameter)
2. The alternative hypothesis, denoted H1, is the competing claim: what we are trying to prove, the claim we seek evidence for. (E.g. H1: population parameter ≠, < or > hypothesised null parameter)
3. Test Statistic: a measure of compatibility between the statement in the null hypothesis and the data obtained.
4. Decision Criteria: the p-value is the probability of obtaining a test statistic as extreme as, or more extreme than, the observed sample value, given H0 is true.
If p-value ≤ 0.05, reject H0.
If p-value > 0.05, do not reject H0.
5. Conclusion: make your conclusion in context of the problem.
Hypothesis Test for Correlation Coefficient
H0: ρ=0
H1: ρ≠0 (or ρ≠ρ*, ρ<ρ* or ρ>ρ*)
Test Statistic: t = r√(n−2) / √(1−r²) ~ t(n−2)
If p-value≤α, reject H0. We conclude that the correlation is significantly different from zero.
If p-value>α, do not reject H0. We conclude that the correlation is not significantly different from zero.
[Figure: scatterplots contrasting a non-significant correlation with a significant correlation]
Correlation measures the strength of the linear association between two variables.
Sample Correlation: r = Σ(x−x̄)(y−ȳ) / √( Σ(x−x̄)² Σ(y−ȳ)² )
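As a concrete sketch, the sample correlation and its test statistic can be computed by hand with only the standard library (the data here are made up purely for illustration):

```python
import math

# Hypothetical sample data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
sxx = sum((xi - xbar) ** 2 for xi in x)
syy = sum((yi - ybar) ** 2 for yi in y)

# Sample correlation r, and the t statistic with n-2 df under H0: rho = 0.
r = sxy / math.sqrt(sxx * syy)
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
```

In practice the p-value for t_stat would then be read off a t table (or software) with n−2 degrees of freedom.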
Interpretation of the Slope
The slope β represents the predicted change in the response variable y given a one-unit increase in the explanatory variable x. As the independent variable increases by 1 unit, the predicted dependent variable increases/decreases by β units on average.
y = α + βX
Hypothesis Test for Slope
H0: β=0. There is no association between the response variable and the independent variable. (Regression is insignificant.) y = α + 0*X
H1: β≠0. The independent variable affects the response variable. (Regression is significant.) y = α + βX
If p-value ≤ 0.05, we reject H0. There is evidence that β≠0, which means that the independent variable is an effective predictor of the dependent variable, at the 5% level of significance.
If p-value > 0.05, we do not reject H0. There is no evidence that β≠0, which means that the independent variable is NOT an effective predictor of the dependent variable, at the 5% level of significance.
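The slope test can be carried out by hand as follows; a minimal sketch, with made-up data for illustration:

```python
import math

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
syy = sum((yi - ybar) ** 2 for yi in y)

b1 = sxy / sxx                # least-squares slope
b0 = ybar - b1 * xbar         # intercept
sse = syy - b1 * sxy          # residual sum of squares
s = math.sqrt(sse / (n - 2))  # residual standard error
se_b1 = s / math.sqrt(sxx)    # standard error of the slope
t_stat = (b1 - 0) / se_b1     # compare to a t distribution with n-2 df
```

The p-value is then found from the t distribution with n−2 degrees of freedom, exactly as in the correlation test.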
Confidence Interval for Slope
b1 ± t(α/2, df=n−2) × S.E.(b1)
Coefficient of Determination R²
R-squared gives us the proportion of the total variability in the response variable (Y) that is "explained" by the least squares regression line based on the predictor variable (X). It is usually stated as a percentage.
Interpretation: On average, R²% of the variation in the dependent variable can be explained by the independent variable through the regression model.
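In simple linear regression R² can be computed either as 1 − SSE/SST or as the square of the sample correlation; a small check with hypothetical data:

```python
import math

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
syy = sum((yi - ybar) ** 2 for yi in y)  # SST: total variability in y

b1 = sxy / sxx
sse = syy - b1 * sxy                     # unexplained variability
r_squared = 1 - sse / syy                # proportion explained
r = sxy / math.sqrt(sxx * syy)           # r_squared equals r ** 2
```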
Confidence Intervals and Prediction Intervals
The key point is that the prediction interval tells you about the distribution of individual values, not the uncertainty in estimating the population mean. A prediction interval must account for both the uncertainty in estimating the population mean and the scatter of the data, so a prediction interval is always wider than a confidence interval.
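The difference shows up directly in the standard-error formulas: the prediction interval's SE has an extra "1 +" term for the scatter of individual observations. A sketch with hypothetical data:

```python
import math

# Hypothetical data; compare the two standard errors at x0.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
syy = sum((yi - ybar) ** 2 for yi in y)
b1 = sxy / sxx
s = math.sqrt((syy - b1 * sxy) / (n - 2))  # residual standard error

x0 = 3
# Confidence interval SE: uncertainty in the mean response at x0.
se_mean = s * math.sqrt(1 / n + (x0 - xbar) ** 2 / sxx)
# Prediction interval SE: adds the scatter of individual observations.
se_pred = s * math.sqrt(1 + 1 / n + (x0 - xbar) ** 2 / sxx)
```

Because se_pred > se_mean at every x0, the prediction interval is always the wider of the two.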
REVISION
Statistical inference is the process of drawing conclusions about the entire population based on information in a sample by:
• constructing confidence intervals on population parameters
• or by setting up a hypothesis test on a population parameter
General: Hypothesis Testing
We use hypothesis testing to infer conclusions about the population parameters based on analysing the statistics of the sample. In statistics, a hypothesis is a statement about a population parameter.
1. The null hypothesis, denoted H0, is a statement or claim about a population parameter that is initially assumed to be true. No "effect" or no "difference". It is always an equality. (E.g. H0: population parameter = hypothesised null parameter)
2. The alternative hypothesis, denoted H1, is the competing claim: what we are trying to prove, the claim we seek evidence for. (E.g. H1: population parameter ≠, < or > hypothesised null parameter)
3. Test Statistic: a measure of compatibility between the statement in the null hypothesis and the data obtained.
4. Decision Criteria: the p-value is the probability of obtaining a test statistic as extreme as, or more extreme than, the observed sample value, given H0 is true.
If p-value ≤ 0.05, reject H0.
If p-value > 0.05, do not reject H0.
5. Conclusion: make your conclusion in context of the problem.
Hypothesis Testing for a Single Mean
Ho: μ=null parameter
Ha: μ≠null parameter or μ<null parameter or μ>null parameter
Test Statistic: t = (sample statistic − null)/S.E. = (x̄ − null)/(s/√n)
If p-value < 0.05, we reject H0. Conclude that we have enough evidence to prove the alternative hypothesis is true at the 5% level of significance.
If p-value ≥ 0.05, we do not reject H0. Conclude that we do not have enough evidence to prove the alternative hypothesis is true at the 5% level of significance.
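The one-sample t statistic above can be sketched with the standard library (the sample and null value are made up for illustration):

```python
import math
import statistics

# Hypothetical sample; test H0: mu = 48 against H1: mu != 48.
sample = [48, 52, 50, 49, 51]
null = 48
n = len(sample)

xbar = statistics.mean(sample)
s = statistics.stdev(sample)                 # sample standard deviation
t_stat = (xbar - null) / (s / math.sqrt(n))  # compare to t with n-1 df
```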
Hypothesis Testing for Difference in Means (2 independent samples)
Ho: μ1=μ2 (μ1-μ2=0)
Ha: μ1≠μ2 (μ1-μ2≠0) or μ1-μ2<0 or μ1-μ2>0
Test Statistic: t = (x̄1 − x̄2) / √(s1²/n1 + s2²/n2)
Find p-value using the t-distribution and compare to significance level α=0.05.
If p-value< α, reject Ho. Conclude that we have evidence to prove the alternative hypothesis (in context of the question) is true at the α % level of significance.
If p-value≥ α, do not reject Ho. Conclude that we do not have enough evidence to prove the alternative hypothesis (in context of the question) is true at the α % level of significance.
α is 5% by default unless stated otherwise
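The two-sample statistic can be sketched as below, using the unpooled standard error; the groups are hypothetical:

```python
import math
import statistics

# Hypothetical independent samples; test H0: mu1 = mu2.
group1 = [5, 7, 6, 8, 9]
group2 = [4, 5, 6, 5, 5]
n1, n2 = len(group1), len(group2)

diff = statistics.mean(group1) - statistics.mean(group2)
# Unpooled standard error of the difference in means.
se = math.sqrt(statistics.variance(group1) / n1
               + statistics.variance(group2) / n2)
t_stat = diff / se  # compare to a t distribution
```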
Hypothesis Testing for Paired Difference in Means
Data is paired if the data being compared consists of paired data values (two measurements on each case).
Ho: μD=0
Ha: μD≠0 or μD<0 or μD>0
Test Statistic: t = x̄D / (sD/√n)
Find p-value using the t-distribution and compare to significance level α=0.05.
If p-value< α, reject Ho. Conclude that we have evidence to prove the alternative hypothesis (in context of the question) is true at the α % level of significance.
If p-value≥ α, do not reject Ho. Conclude that we do not have enough evidence to prove the alternative hypothesis (in context of the question) is true at the α % level of significance.
α is 5% by default unless stated otherwise
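The paired test is just a one-sample t test on the differences; a sketch with hypothetical before/after measurements on the same five cases:

```python
import math
import statistics

# Hypothetical paired data: two measurements on each case.
before = [10, 12, 14, 11, 13]
after = [12, 13, 17, 13, 15]
diffs = [a - b for a, b in zip(after, before)]
n = len(diffs)

dbar = statistics.mean(diffs)                # mean difference
sd = statistics.stdev(diffs)                 # sd of the differences
t_stat = dbar / (sd / math.sqrt(n))          # compare to t with n-1 df
```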
Hypothesis Testing for 1 Proportion
Ho: p=null parameter
Ha: p≠null parameter or p<null parameter or p>null parameter
Test Statistic: z = (p̂ − p0) / √( p0(1−p0)/n ), where p0 is the null parameter
Find p-value using the z distribution and compare to significance level α=0.05.
If p-value< α, reject Ho. Conclude that we have evidence to prove the alternative hypothesis (in context of the question) is true at the α % level of significance.
If p-value≥ α, do not reject Ho. Conclude that we do not have enough evidence to prove the alternative hypothesis (in context of the question) is true at the α % level of significance.
α is 5% by default unless stated otherwise
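A sketch of the one-proportion z statistic with hypothetical counts; note the standard error uses the null proportion, not the sample proportion:

```python
import math

# Hypothetical: 60 successes in 100 trials; test H0: p = 0.5.
successes, n = 60, 100
p_null = 0.5

p_hat = successes / n
se = math.sqrt(p_null * (1 - p_null) / n)  # SE under the null hypothesis
z_stat = (p_hat - p_null) / se             # compare to the z distribution
```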
Hypothesis Testing for Difference in 2 Proportions
Ho: p1=p2 (p1-p2=0)
Ha: p1≠p2 (p1-p2≠0) or p1-p2<0 or p1-p2>0
Test Statistic: z = (p̂1 − p̂2) / √( p̂(1−p̂)(1/n1 + 1/n2) ), where p̂ is the pooled sample proportion
Find p-value using the z distribution and compare to significance level α=0.05.
If p-value< α, reject Ho. Conclude that we have evidence to prove the alternative hypothesis (in context of the question) is true at the α % level of significance.
If p-value≥ α, do not reject Ho. Conclude that we do not have enough evidence to prove the alternative hypothesis (in context of the question) is true at the α % level of significance.
α is 5% by default unless stated otherwise
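With hypothetical counts, the pooled two-proportion z statistic can be sketched as:

```python
import math

# Hypothetical counts; test H0: p1 = p2 using the pooled proportion.
x1, n1 = 45, 100
x2, n2 = 30, 100

p1_hat, p2_hat = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)  # pooled proportion under H0
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z_stat = (p1_hat - p2_hat) / se  # compare to the z distribution
```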
Hypothesis Testing for a Single Categorical Variable (Goodness of Fit test)
Ho: Hypothesised proportions for each category pi=….
Ha: At least one pi is different.
Test Statistic: Calculate the expected counts for each cell as npi. Make sure they are all greater than 5 to proceed. Calculate the chi-squared statistic: χ² = Σ (observed − expected)² / expected
Find p-value as the area in the tail to the right of the chi-squared statistic (always select right tail) for a chi-squared distribution with df=(# of categories-1) and compare to significance level α=0.05.
If p-value< α, reject Ho. Conclude that we have evidence to prove the alternative is true at the α % level of significance.
If p-value≥ α, do not reject Ho. Conclude that we do not have enough evidence to prove the alternative is true at the α % level of significance.
α is 5% by default unless stated otherwise
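The goodness-of-fit statistic can be sketched directly from the formula (counts and hypothesised proportions are made up for illustration):

```python
# Hypothetical: 100 observations over three categories.
# H0: p1 = 0.25, p2 = 0.25, p3 = 0.5.
observed = [30, 20, 50]
probs = [0.25, 0.25, 0.5]
n = sum(observed)

expected = [n * p for p in probs]       # expected counts n * pi
assert all(e > 5 for e in expected)     # condition to proceed

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
# Compare to a chi-squared distribution with df = (# of categories - 1) = 2.
```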
Hypothesis Testing for an Association between two categorical variables
Ho: The two variables are not associated
Ha: The two variables are associated
Test Statistic: Calculate the expected counts for each cell as (row total*column total)/n. Make sure they are all greater than 5 to proceed. Calculate the chi-squared statistic: χ² = Σ (observed − expected)² / expected
Find p-value as the area in the tail to the right of the chi-squared statistic (always select right tail) for a chi-squared distribution with df=(r-1)*(c-1) and compare to significance level α=0.05.
If p-value< α, reject Ho. Conclude that we have evidence to prove the alternative hypothesis (in context of the question) is true at the α % level of significance.
If p-value≥ α, do not reject Ho. Conclude that we do not have enough evidence to prove the alternative hypothesis (in context of the question) is true at the α % level of significance.
α is 5% by default unless stated otherwise
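The steps above can be sketched for a hypothetical 2x2 table:

```python
# Hypothetical 2x2 table of counts for two categorical variables.
table = [[30, 20],
         [20, 30]]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

# Expected count for each cell: (row total * column total) / n.
chi_sq = 0.0
for i, row in enumerate(table):
    for j, obs in enumerate(row):
        exp = row_totals[i] * col_totals[j] / n
        chi_sq += (obs - exp) ** 2 / exp
# Compare to a chi-squared distribution with df = (r-1)*(c-1) = 1.
```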
Hypothesis Testing for the difference in means of more than two samples (ANOVA – Analysis of Variance test)
Equal-variance condition: met when the ratio of the largest standard deviation to the smallest is less than 2. The assumption of equal variances then holds, and it is appropriate to use the ANOVA table when testing the difference in the means.
Ho: μ1=μ2=μ3, i.e. the means are all equal
Ha: Not all means are equal (at least one mean is different)
Construct an ANOVA table to calculate the F-Test Statistic based on your sample data:
F-statistic = MSG/MSE. Find the p-value as the area in the tail to the right of the F-statistic (always select right tail) for an F distribution with df = (k−1, n−k), where k = number of groups and n = total number of observations, and compare to significance level α=0.05.
If p-value< α, reject Ho. Conclude that we have evidence to prove the alternative hypothesis (in context of the question) is true at the α % level of significance.
If p-value≥ α, do not reject Ho. Conclude that we do not have enough evidence to prove the alternative hypothesis (in context of the question) is true at the α % level of significance.
α is 5% by default unless stated otherwise
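The ANOVA table entries can be sketched by hand for three hypothetical groups (here every group standard deviation is 1, so the ratio condition is easily met):

```python
import statistics

# Hypothetical data for three groups.
groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
k = len(groups)                           # number of groups
n = sum(len(g) for g in groups)           # total number of observations
grand_mean = sum(sum(g) for g in groups) / n

# Between-group and within-group sums of squares.
ssg = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
sse = sum(sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups)

msg = ssg / (k - 1)   # mean square between groups
mse = sse / (n - k)   # mean square error
f_stat = msg / mse    # compare to F with (k-1, n-k) df
```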