Section 9-1: Inference for Slope and Correlation; Section 9-3: Confidence and Prediction Intervals
Introduction To Data Analysis Lab 10
Section 9-1: Inference for Slope and Correlation
Section 9-3: Confidence and Prediction Intervals
[email protected]
The Maths Study Centre: 11am-5pm, CB04.03.331
This presentation is viewable on http://mahritaharahap.wordpress.com/teaching-areas/
Check out the www.khanacademy.org website
Statistical inference is the process of drawing conclusions about the entire population based on information in a sample by:
• constructing confidence intervals on population parameters
• or by setting up a hypothesis test on a population parameter
Regression
The linear regression line characterises the relationship between two quantitative variables. Using regression analysis on data can help us draw insights about that data: it helps us understand the impact of one of the variables on the other. It examines the relationship between one independent variable (predictor/explanatory) and one dependent variable (response/outcome). The linear regression line equation is based on the equation of a line in mathematics.
The simple linear regression model: Y = β0 + β1X + ε, where the errors are assumed ε ~ N(0, σ)

X: Predictor Variable / Explanatory Variable / Independent Variable. The variable one can control.
Y: Outcome Variable / Response Variable / Dependent Variable. The outcome to be measured/predicted.

Residuals: e_i = y_i − ŷ_i
General: Hypothesis Testing
We use hypothesis testing to infer conclusions about the population parameters based on analysing the statistics of the sample. In statistics, a hypothesis is a statement about a population parameter.
1. The null hypothesis, denoted H0, is a statement or claim about a population parameter that is initially assumed to be true. No "effect" or no "difference". It is always an equality. (E.g. H0: population parameter = hypothesised null parameter)
2. The alternative hypothesis, denoted H1, is the competing claim: what we are trying to prove, the claim we seek evidence for. (E.g. H1: population parameter ≠, < or > hypothesised null parameter)
3. Test Statistic: a measure of compatibility between the statement in the null hypothesis and the data obtained.
4. Decision Criteria: the p-value is the probability of obtaining a test statistic as extreme as, or more extreme than, the observed sample value, given H0 is true.
If p-value ≤ 0.05, reject H0.
If p-value > 0.05, do not reject H0.
5. Conclusion: make your conclusion in context of the problem.
Hypothesis Test for Correlation Coefficient
H0: ρ=0
H1: ρ≠0 (or ρ≠ρ*, ρ<ρ* or ρ>ρ*)
Test Statistic: t = r√(n−2) / √(1−r²) ~ t(n−2)
If p-value≤α, reject H0. We conclude that the correlation is significantly different from zero.
If p-value>α, do not reject H0. We conclude that the correlation is not significantly different from zero.
[Figure: scatterplots contrasting a non-significant correlation with a significant correlation]
Correlation measures the strength of the linear association between two variables.
Sample Correlation: r = Σ(x−x̄)(y−ȳ) / √( Σ(x−x̄)² Σ(y−ȳ)² )
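As a concrete sketch, the sample correlation and its test statistic can be computed by hand with only the standard library (the data here are made up purely for illustration):

```python
import math

# Hypothetical sample data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
sxx = sum((xi - xbar) ** 2 for xi in x)
syy = sum((yi - ybar) ** 2 for yi in y)

# Sample correlation r, and the t statistic with n-2 df under H0: rho = 0.
r = sxy / math.sqrt(sxx * syy)
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
```

In practice the p-value for t_stat would then be read off a t table (or software) with n−2 degrees of freedom.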
Interpretation of the Slope
The slope β represents the predicted change in the response variable y given a one-unit increase in the explanatory variable x. As the independent variable increases by 1 unit, the predicted dependent variable increases/decreases by β units on average.
y = α + βX
Hypothesis Test for Slope
H0: β=0. There is no association between the response variable and the independent variable. (Regression is insignificant.) y = α + 0*X
H1: β≠0. The independent variable affects the response variable. (Regression is significant.) y = α + βX
If p-value ≤ 0.05, we reject H0. There is evidence that β≠0, which means that the independent variable is an effective predictor of the dependent variable, at the 5% level of significance.
If p-value > 0.05, we do not reject H0. There is no evidence that β≠0, which means that the independent variable is NOT an effective predictor of the dependent variable, at the 5% level of significance.
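The slope test can be carried out by hand as follows; a minimal sketch, with made-up data for illustration:

```python
import math

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
syy = sum((yi - ybar) ** 2 for yi in y)

b1 = sxy / sxx                # least-squares slope
b0 = ybar - b1 * xbar         # intercept
sse = syy - b1 * sxy          # residual sum of squares
s = math.sqrt(sse / (n - 2))  # residual standard error
se_b1 = s / math.sqrt(sxx)    # standard error of the slope
t_stat = (b1 - 0) / se_b1     # compare to a t distribution with n-2 df
```

The p-value is then found from the t distribution with n−2 degrees of freedom, exactly as in the correlation test.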
Confidence Interval for Slope
b1 ± t(α/2, df=n−2) × S.E.(b1)
Coefficient of Determination R²
R-squared gives us the proportion of the total variability in the response variable (Y) that is "explained" by the least squares regression line based on the predictor variable (X). It is usually stated as a percentage.
Interpretation: On average, R²% of the variation in the dependent variable can be explained by the independent variable through the regression model.
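In simple linear regression R² can be computed either as 1 − SSE/SST or as the square of the sample correlation; a small check with hypothetical data:

```python
import math

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
syy = sum((yi - ybar) ** 2 for yi in y)  # SST: total variability in y

b1 = sxy / sxx
sse = syy - b1 * sxy                     # unexplained variability
r_squared = 1 - sse / syy                # proportion explained
r = sxy / math.sqrt(sxx * syy)           # r_squared equals r ** 2
```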
Confidence Intervals and Prediction Intervals
The key point is that the prediction interval tells you about the distribution of individual values, not the uncertainty in estimating the population mean. A prediction interval must account for both the uncertainty in estimating the population mean and the scatter of the data, so a prediction interval is always wider than a confidence interval.
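The difference shows up directly in the standard-error formulas: the prediction interval's SE has an extra "1 +" term for the scatter of individual observations. A sketch with hypothetical data:

```python
import math

# Hypothetical data; compare the two standard errors at x0.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
syy = sum((yi - ybar) ** 2 for yi in y)
b1 = sxy / sxx
s = math.sqrt((syy - b1 * sxy) / (n - 2))  # residual standard error

x0 = 3
# Confidence interval SE: uncertainty in the mean response at x0.
se_mean = s * math.sqrt(1 / n + (x0 - xbar) ** 2 / sxx)
# Prediction interval SE: adds the scatter of individual observations.
se_pred = s * math.sqrt(1 + 1 / n + (x0 - xbar) ** 2 / sxx)
```

Because se_pred > se_mean at every x0, the prediction interval is always the wider of the two.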
REVISION
Statistical inference is the process of drawing conclusions about the entire population based on information in a sample by:
• constructing confidence intervals on population parameters
• or by setting up a hypothesis test on a population parameter
General: Hypothesis Testing
We use hypothesis testing to infer conclusions about the population parameters based on analysing the statistics of the sample. In statistics, a hypothesis is a statement about a population parameter.
1. The null hypothesis, denoted H0, is a statement or claim about a population parameter that is initially assumed to be true. No "effect" or no "difference". It is always an equality. (E.g. H0: population parameter = hypothesised null parameter)
2. The alternative hypothesis, denoted H1, is the competing claim: what we are trying to prove, the claim we seek evidence for. (E.g. H1: population parameter ≠, < or > hypothesised null parameter)
3. Test Statistic: a measure of compatibility between the statement in the null hypothesis and the data obtained.
4. Decision Criteria: the p-value is the probability of obtaining a test statistic as extreme as, or more extreme than, the observed sample value, given H0 is true.
If p-value ≤ 0.05, reject H0.
If p-value > 0.05, do not reject H0.
5. Conclusion: make your conclusion in context of the problem.
Hypothesis Testing for a Single Mean
Ho: μ=null parameter
Ha: μ≠null parameter or μ<null parameter or μ>null parameter
Test Statistic: t = (sample statistic − null)/S.E. = (x̄ − null)/(s/√n)
If p-value < 0.05, we reject H0. Conclude that we have enough evidence to prove the alternative hypothesis is true at the 5% level of significance.
If p-value ≥ 0.05, we do not reject H0. Conclude that we do not have enough evidence to prove the alternative hypothesis is true at the 5% level of significance.
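The one-sample t statistic above can be sketched with the standard library (the sample and null value are made up for illustration):

```python
import math
import statistics

# Hypothetical sample; test H0: mu = 48 against H1: mu != 48.
sample = [48, 52, 50, 49, 51]
null = 48
n = len(sample)

xbar = statistics.mean(sample)
s = statistics.stdev(sample)                 # sample standard deviation
t_stat = (xbar - null) / (s / math.sqrt(n))  # compare to t with n-1 df
```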
Hypothesis Testing for Difference in Means (2 independent samples)
Ho: μ1=μ2 (μ1-μ2=0)
Ha: μ1≠μ2 (μ1-μ2≠0) or μ1-μ2<0 or μ1-μ2>0
Test Statistic: t = (x̄1 − x̄2) / √(s1²/n1 + s2²/n2)
Find p-value using the t-distribution and compare to significance level α=0.05.
If p-value< α, reject Ho. Conclude that we have evidence to prove the alternative hypothesis (in context of the question) is true at the α % level of significance.
If p-value≥ α, do not reject Ho. Conclude that we do not have enough evidence to prove the alternative hypothesis (in context of the question) is true at the α % level of significance.
α is 5% by default unless stated otherwise
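The two-sample statistic can be sketched as below, using the unpooled standard error; the groups are hypothetical:

```python
import math
import statistics

# Hypothetical independent samples; test H0: mu1 = mu2.
group1 = [5, 7, 6, 8, 9]
group2 = [4, 5, 6, 5, 5]
n1, n2 = len(group1), len(group2)

diff = statistics.mean(group1) - statistics.mean(group2)
# Unpooled standard error of the difference in means.
se = math.sqrt(statistics.variance(group1) / n1
               + statistics.variance(group2) / n2)
t_stat = diff / se  # compare to a t distribution
```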
Hypothesis Testing for Paired Difference in Means
Data is paired if the data being compared consists of paired data values (two measurements on each case).
Ho: μD=0
Ha: μD≠0 or μD<0 or μD>0
Test Statistic: t = x̄D / (sD/√n)
Find p-value using the t-distribution and compare to significance level α=0.05.
If p-value< α, reject Ho. Conclude that we have evidence to prove the alternative hypothesis (in context of the question) is true at the α % level of significance.
If p-value≥ α, do not reject Ho. Conclude that we do not have enough evidence to prove the alternative hypothesis (in context of the question) is true at the α % level of significance.
α is 5% by default unless stated otherwise
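The paired test is just a one-sample t test on the differences; a sketch with hypothetical before/after measurements on the same five cases:

```python
import math
import statistics

# Hypothetical paired data: two measurements on each case.
before = [10, 12, 14, 11, 13]
after = [12, 13, 17, 13, 15]
diffs = [a - b for a, b in zip(after, before)]
n = len(diffs)

dbar = statistics.mean(diffs)                # mean difference
sd = statistics.stdev(diffs)                 # sd of the differences
t_stat = dbar / (sd / math.sqrt(n))          # compare to t with n-1 df
```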
Hypothesis Testing for 1 Proportion
Ho: p=null parameter
Ha: p≠null parameter or p<null parameter or p>null parameter
Test Statistic: z = (p̂ − p0) / √( p0(1−p0)/n ), where p0 is the null parameter
Find p-value using the z distribution and compare to significance level α=0.05.
If p-value< α, reject Ho. Conclude that we have evidence to prove the alternative hypothesis (in context of the question) is true at the α % level of significance.
If p-value≥ α, do not reject Ho. Conclude that we do not have enough evidence to prove the alternative hypothesis (in context of the question) is true at the α % level of significance.
α is 5% by default unless stated otherwise
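A sketch of the one-proportion z statistic with hypothetical counts; note the standard error uses the null proportion, not the sample proportion:

```python
import math

# Hypothetical: 60 successes in 100 trials; test H0: p = 0.5.
successes, n = 60, 100
p_null = 0.5

p_hat = successes / n
se = math.sqrt(p_null * (1 - p_null) / n)  # SE under the null hypothesis
z_stat = (p_hat - p_null) / se             # compare to the z distribution
```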
Hypothesis Testing for Difference in 2 Proportions
Ho: p1=p2 (p1-p2=0)
Ha: p1≠p2 (p1-p2≠0) or p1-p2<0 or p1-p2>0
Test Statistic: z = (p̂1 − p̂2) / √( p̂(1−p̂)(1/n1 + 1/n2) ), where p̂ is the pooled sample proportion
Find p-value using the z distribution and compare to significance level α=0.05.
If p-value< α, reject Ho. Conclude that we have evidence to prove the alternative hypothesis (in context of the question) is true at the α % level of significance.
If p-value≥ α, do not reject Ho. Conclude that we do not have enough evidence to prove the alternative hypothesis (in context of the question) is true at the α % level of significance.
α is 5% by default unless stated otherwise
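With hypothetical counts, the pooled two-proportion z statistic can be sketched as:

```python
import math

# Hypothetical counts; test H0: p1 = p2 using the pooled proportion.
x1, n1 = 45, 100
x2, n2 = 30, 100

p1_hat, p2_hat = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)  # pooled proportion under H0
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z_stat = (p1_hat - p2_hat) / se  # compare to the z distribution
```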
Hypothesis Testing for a Single Categorical Variable (Goodness of Fit test)
Ho: Hypothesised proportions for each category pi=….
Ha: At least one pi is different.
Test Statistic: Calculate the expected counts for each cell as npi. Make sure they are all greater than 5 to proceed. Calculate the chi-squared statistic: χ² = Σ (observed − expected)² / expected
Find p-value as the area in the tail to the right of the chi-squared statistic (always select right tail) for a chi-squared distribution with df=(# of categories-1) and compare to significance level α=0.05.
If p-value< α, reject Ho. Conclude that we have evidence to prove the alternative is true at the α % level of significance.
If p-value≥ α, do not reject Ho. Conclude that we do not have enough evidence to prove the alternative is true at the α % level of significance.
α is 5% by default unless stated otherwise
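The goodness-of-fit statistic can be sketched directly from the formula (counts and hypothesised proportions are made up for illustration):

```python
# Hypothetical: 100 observations over three categories.
# H0: p1 = 0.25, p2 = 0.25, p3 = 0.5.
observed = [30, 20, 50]
probs = [0.25, 0.25, 0.5]
n = sum(observed)

expected = [n * p for p in probs]       # expected counts n * pi
assert all(e > 5 for e in expected)     # condition to proceed

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
# Compare to a chi-squared distribution with df = (# of categories - 1) = 2.
```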
Hypothesis Testing for an Association between two categorical variables
Ho: The two variables are not associated
Ha: The two variables are associated
Test Statistic: Calculate the expected counts for each cell as (row total*column total)/n. Make sure they are all greater than 5 to proceed. Calculate the chi-squared statistic: χ² = Σ (observed − expected)² / expected
Find p-value as the area in the tail to the right of the chi-squared statistic (always select right tail) for a chi-squared distribution with df=(r-1)*(c-1) and compare to significance level α=0.05.
If p-value< α, reject Ho. Conclude that we have evidence to prove the alternative hypothesis (in context of the question) is true at the α % level of significance.
If p-value≥ α, do not reject Ho. Conclude that we do not have enough evidence to prove the alternative hypothesis (in context of the question) is true at the α % level of significance.
α is 5% by default unless stated otherwise
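The steps above can be sketched for a hypothetical 2x2 table:

```python
# Hypothetical 2x2 table of counts for two categorical variables.
table = [[30, 20],
         [20, 30]]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

# Expected count for each cell: (row total * column total) / n.
chi_sq = 0.0
for i, row in enumerate(table):
    for j, obs in enumerate(row):
        exp = row_totals[i] * col_totals[j] / n
        chi_sq += (obs - exp) ** 2 / exp
# Compare to a chi-squared distribution with df = (r-1)*(c-1) = 1.
```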
Hypothesis Testing for the difference in means of more than two samples (ANOVA – Analysis of Variance test)
Equal-variance condition: met when the ratio of the largest standard deviation to the smallest is less than 2. The assumption of equal variances then holds, and it is appropriate to use the ANOVA table when testing the difference in the means.
Ho: μ1=μ2=μ3, i.e. the means are all equal
Ha: Not all means are equal (at least one mean is different)
Construct an ANOVA table to calculate the F-Test Statistic based on your sample data:
F-statistic = MSG/MSE. Find the p-value as the area in the tail to the right of the F-statistic (always select right tail) for an F distribution with df = (k−1, n−k), where k = number of groups and n = total number of observations, and compare to significance level α=0.05.
If p-value< α, reject Ho. Conclude that we have evidence to prove the alternative hypothesis (in context of the question) is true at the α % level of significance.
If p-value≥ α, do not reject Ho. Conclude that we do not have enough evidence to prove the alternative hypothesis (in context of the question) is true at the α % level of significance.
α is 5% by default unless stated otherwise
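The ANOVA table entries can be sketched by hand for three hypothetical groups (here every group standard deviation is 1, so the ratio condition is easily met):

```python
import statistics

# Hypothetical data for three groups.
groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
k = len(groups)                           # number of groups
n = sum(len(g) for g in groups)           # total number of observations
grand_mean = sum(sum(g) for g in groups) / n

# Between-group and within-group sums of squares.
ssg = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
sse = sum(sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups)

msg = ssg / (k - 1)   # mean square between groups
mse = sse / (n - k)   # mean square error
f_stat = msg / mse    # compare to F with (k-1, n-k) df
```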