Day 3 SPSS


Statistical Inference: (a) Estimation and (b) Test of Hypothesis

M. Amir Hossain, Ph.D.

January 19, 2016

Point and Interval Estimates

Point Estimation: A point estimate is a single value (a point) that is used to estimate a population parameter.

Examples of point estimates are the sample mean, the sample standard deviation, the sample variance, and the sample proportion.

EXAMPLE: The number of defective items produced by a machine was recorded for five randomly selected hours during a 40-hour work week. The observed numbers of defectives were 12, 4, 7, 14, and 10, so the sample mean is (12 + 4 + 7 + 14 + 10)/5 = 9.4. Thus a point estimate for the hourly mean number of defectives is 9.4.
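As a minimal sketch, the point estimates for the defectives example can be computed with Python's standard `statistics` module. The variable names are illustrative only and are not part of the original slides.

```python
from statistics import mean, stdev, variance

# Observed defectives in the five randomly selected hours (from the example)
defectives = [12, 4, 7, 14, 10]

sample_mean = mean(defectives)          # point estimate of the population mean
sample_variance = variance(defectives)  # sample variance (divides by n - 1)
sample_sd = stdev(defectives)           # sample standard deviation

print("mean:", sample_mean)
print("variance:", sample_variance)
print("standard deviation:", round(sample_sd, 2))
```

Each printed value is a single number, which is what makes it a point estimate rather than an interval estimate.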


Point and Interval Estimates

Interval Estimation: An interval estimate states the range within which a population parameter lies with a certain probability.

The interval within which a population parameter is expected to occur is called a confidence interval.

The two confidence intervals that are used extensively are the 95% and the 99%.

Interval Estimates

A 95% confidence interval means that about 95% of similarly constructed intervals will contain the parameter being estimated. Equivalently, 95% of the sample means for a specified sample size will lie within 1.96 standard errors (standard deviations of the sampling distribution) of the hypothesized population mean.

For the 99% confidence interval, 99% of the sample means for a specified sample size will lie within 2.58 standard errors of the hypothesized population mean.
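The "95% of similarly constructed intervals" reading can be illustrated with a small simulation. This is only a sketch: the population mean, standard deviation, sample size and number of trials below are hypothetical values chosen for illustration, and the population is assumed to be normal.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population and sampling set-up (not from the slides)
MU, SIGMA, N, TRIALS = 50.0, 10.0, 36, 2000

covered = 0
for _ in range(TRIALS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    x_bar = mean(sample)
    half_width = 1.96 * SIGMA / N ** 0.5  # 1.96 standard errors
    # Count the interval if it contains the true population mean
    if x_bar - half_width <= MU <= x_bar + half_width:
        covered += 1

coverage = covered / TRIALS
print(f"share of intervals containing the mean: {coverage:.3f}")
```

The printed share should be close to 0.95; replacing 1.96 with 2.58 should move it close to 0.99.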


Standard Error of the Sample Means

The standard error of the sample means is the standard deviation of the sampling distribution of the sample means.

It is computed by

$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$

$\sigma_{\bar{x}}$ is the symbol for the standard error of the sample means.

$\sigma$ is the standard deviation of the population.

$n$ is the size of the sample.

Standard Error of the Sample Means

If $\sigma$ is not known and $n \ge 30$, the standard deviation of the sample, designated $s$, is used to approximate the population standard deviation. The formula for the standard error then becomes:

$s_{\bar{x}} = \dfrac{s}{\sqrt{n}}$


95% and 99% Confidence Intervals for µ

The 95% and 99% confidence intervals for µ are constructed as follows when $n \ge 30$:

95% CI for the population mean is given by

$\bar{X} \pm 1.96 \dfrac{s}{\sqrt{n}}$

99% CI for the population mean is given by

$\bar{X} \pm 2.58 \dfrac{s}{\sqrt{n}}$

Constructing General Confidence Intervals for µ

In general, a confidence interval for the mean is computed by:

$\bar{X} \pm Z \dfrac{s}{\sqrt{n}}$


EXAMPLE: The Dean of the Business School wants to estimate the mean number of hours worked per week by students. A sample of 49 students showed a mean of 24 hours with a standard deviation of 4 hours.

The point estimate is 24 hours (sample mean).

The 95% confidence interval for the average number of hours worked per week by the students is:

$24 \pm 1.96\,(4/\sqrt{49}) = 24 \pm 1.96\,(4/7) = 22.88 \text{ to } 25.12$

The endpoints of the confidence interval are the confidence limits. The lower confidence limit is 22.88 and the upper confidence limit is 25.12.
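The large-sample interval for the mean can be sketched in a few lines of Python. The function name `confidence_interval` is a hypothetical helper, not something from the slides; it simply evaluates the sample mean plus or minus Z times s over the square root of n.

```python
from math import sqrt

def confidence_interval(x_bar, s, n, z):
    """Return (lower, upper) limits of x_bar +/- z * s / sqrt(n)."""
    margin = z * s / sqrt(n)
    return x_bar - margin, x_bar + margin

# Dean's example: n = 49 students, mean 24 hours, standard deviation 4 hours
lower95, upper95 = confidence_interval(24, 4, 49, 1.96)
lower99, upper99 = confidence_interval(24, 4, 49, 2.58)

print("95% CI:", round(lower95, 2), "to", round(upper95, 2))
print("99% CI:", round(lower99, 2), "to", round(upper99, 2))
```

The 99% interval is wider than the 95% interval because a larger Z value is used with the same standard error.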

What is a Hypothesis?

Hypothesis: A statement about the value of a population parameter developed for the purpose of testing.

Examples:

The mean monthly income for systems analysts is $3,625.

Twenty percent of all juvenile offenders are caught and sentenced to prison.

Hypothesis testing: A procedure, based on sample evidence and probability theory, used to determine whether the hypothesis is a reasonable statement and should not be rejected, or is unreasonable and should be rejected.


Terminologies

Null Hypothesis H0: A statement about the value of a population parameter that we want to test on the basis of sample evidence.

Alternative Hypothesis H1: A statement about the value of the population parameter that differs from the null hypothesis; it is accepted if the null hypothesis is rejected.

Level of Significance: The probability of rejecting the null hypothesis when it is actually true.

Terminologies

Type I Error: Rejecting the null hypothesis when it is actually true.

Type II Error: Accepting the null hypothesis when it is actually false.

Test statistic: A value, determined from sample information, used to determine whether or not to reject the null hypothesis.

Critical value: The dividing point between the region where the null hypothesis is rejected and the region where it is not rejected.


[Diagram: a court trial compared with a test of hypothesis. Labels: Court / Test of hypothesis; Accused; Judge; Law; Basis: witness, evidence / sample info.; Judge's decision: guilty (punished) or not guilty (not punished); outcomes: correct (O.K.) or not correct (Type I error, Type II error); notes: "Fix at min.", "Minimize", "more info."; example case: assassination.]

Steps of Hypothesis Testing

Step 1: State null and alternate hypotheses

Step 2: Select a level of significance

Step 3: Identify the test statistic

Step 4: Formulate a decision rule

Step 5: Take a sample, arrive at a decision

Outcome: Do not reject null, or reject null and accept alternate


One-Tail and Two-Tail Tests of Significance

A test is one-tailed when the critical region lies on one side of the probability curve of the test statistic. This is the case when H1 specifies a direction.

H0 : Average income of females and males is equal.

H1 : Average income of females is greater than that of males.

A test is two-tailed when the critical region lies on both sides of the probability curve of the test statistic (no direction is specified by H1).

H0 : Average income of females and males is equal.

H1 : Average income of females is not equal to that of males.

Testing for the Population Mean

When testing for the population mean from a large sample and the population standard deviation is known, the test statistic is given by:

$z = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}}$

Assumption: Large Sample, Population Standard Deviation Known


EXAMPLE: The processors of Fries’ Catsup indicate on the label that the bottle contains 16 ounces of catsup. A sample of 36 bottles is selected hourly and the contents weighed. Last hour a sample of 36 bottles had a mean weight of 16.12 ounces with a standard deviation of 0.5 ounces. At the 0.05 significance level, is the process out of control?

Step 1: State the null and the alternative hypotheses: $H_0: \mu = 16$, $H_1: \mu \ne 16$

Step 2: Set the level of significance: α = 0.05

Step 3: Decide on the test statistic: $z = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}}$

Step 4: State the decision rule: $H_0$ is accepted if $-1.96 \le z \le 1.96$; otherwise reject $H_0$.

Step 5: Compute the value of the test statistic: $z = [16.12 - 16] / [0.5 / \sqrt{36}] = 1.44$

Step 6: Decide on H0: H0 is not rejected because 1.44 is less than the critical value of 1.96, i.e. the process is not out of control.
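The catsup example can be checked with a short Python sketch. The helper name `z_statistic` is hypothetical, and the two-tailed critical value 1.96 for the 0.05 level is taken from the example's decision rule.

```python
from math import sqrt

def z_statistic(x_bar, mu0, sd, n):
    """Large-sample z statistic: (x_bar - mu0) / (sd / sqrt(n))."""
    return (x_bar - mu0) / (sd / sqrt(n))

# Catsup example: hypothesized mean 16 oz, sample of 36 bottles,
# sample mean 16.12 oz, standard deviation 0.5 oz
z = z_statistic(16.12, 16, 0.5, 36)

critical_value = 1.96                # two-tailed test at the 0.05 level
reject_h0 = abs(z) > critical_value  # decision rule from Step 4

print("z =", round(z, 2))
print("reject H0:", reject_h0)
```

Because the computed z falls between -1.96 and 1.96, the sketch reaches the same decision as the worked example: H0 is not rejected.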


p-Value in Hypothesis Testing

p-Value: the probability, assuming that the null hypothesis is true, of getting a value of the test statistic at least as extreme as the computed value for the test.

If the p-value is smaller than the significance level, H0 is rejected.

If the p-value is larger than the significance level, H0 is not rejected.
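For a z test, the p-value can be obtained from the standard normal distribution. The following is a sketch using only Python's `math` module; the function names are hypothetical, and a two-tailed test is assumed. It is applied to z = 1.44, the value computed in the catsup example.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Cumulative probability of the standard normal distribution at z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def two_tailed_p_value(z):
    """Probability of a z value at least as extreme as the one observed."""
    return 2.0 * (1.0 - normal_cdf(abs(z)))

alpha = 0.05
p_value = two_tailed_p_value(1.44)

print("p-value:", round(p_value, 4))
print("reject H0:", p_value < alpha)
```

Comparing the p-value with the significance level leads to the same decision as comparing the test statistic with the critical value.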

Some frequently used test statistics

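If σ is unknown, and sample size n ≥ 30, then

$z = \dfrac{\bar{X} - \mu}{s / \sqrt{n}}$

For equality of two population means

$z = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$

Test concerning single proportion

$z = \dfrac{p - \pi}{\sqrt{\dfrac{\pi (1 - \pi)}{n}}}$

Test concerning two proportions

$z = \dfrac{p_1 - p_2}{\sqrt{\dfrac{p_c (1 - p_c)}{n_1} + \dfrac{p_c (1 - p_c)}{n_2}}}$

Here $p$ is a sample proportion, $\pi$ is the hypothesized population proportion, and $p_c$ is the pooled proportion of the two samples.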


Exercise: Test for equality of two proportions

Data set 1:
Chittagong city: Total sample HH (households) = 1000, # eat paijam rice = 600
Dhaka city: Total sample HH = 1000, # eat paijam rice = 550

Data set 2:
Chittagong city: Total sample HH = 500, # eat paijam rice = 300
Dhaka city: Total sample HH = 1000, # eat paijam rice = 500

Do they differ significantly?
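One way to approach this exercise is the two-proportion z test with a pooled proportion. The sketch below assumes a two-tailed test at the 0.05 level (critical value 1.96), which the exercise does not state explicitly; the function name is hypothetical.

```python
from math import sqrt

def two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: the two population proportions are equal."""
    p1, p2 = x1 / n1, x2 / n2
    p_c = (x1 + x2) / (n1 + n2)  # pooled sample proportion
    se = sqrt(p_c * (1 - p_c) / n1 + p_c * (1 - p_c) / n2)
    return (p1 - p2) / se

# (Chittagong eaters, Chittagong sample, Dhaka eaters, Dhaka sample)
z_a = two_proportion_z(600, 1000, 550, 1000)
z_b = two_proportion_z(300, 500, 500, 1000)

for label, z in (("first data set", z_a), ("second data set", z_b)):
    # Assumed decision rule: reject H0 when |z| > 1.96
    print(label, "z =", round(z, 2), "significant:", abs(z) > 1.96)
```

A different significance level, or a one-tailed alternative, would change the critical value but not the computation of z.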

Topic:

Bi-variate data - test of association

Dr. M. Amir Hossain

Professor

ISRT, University of Dhaka


Cross Tables

Cross Tables: A cross table (or bi-variate table) lists the number of observations for every combination of values of the two variables concerned.

If both variables are categorical or ordinal (qualitative), the table is a contingency table.

If both variables are interval or ratio (quantitative), the table is a bi-variate table.

If there are r categories for the first variable (rows) and c categories for the second variable (columns), the table is called an r x c cross table.

Cross Tables: Example

4 x 3 Cross Table for Investment Choices by Investor (values in $1000’s)

Investment Category   Investor A   Investor B   Investor C   Total
Knitting                      46           55           27     128
Spinning                      32           44           19      95
Dyeing                        16           20           14      50
Finishing                     16           28            7      51
Total                        110          147           67     324


Cross Tables: Example

r x c Contingency Table

                      Attribute B
Attribute A     1      2      . . .   c      Totals
1               O11    O12    . . .   O1c    R1
2               O21    O22    . . .   O2c    R2
.               .      .              .      .
.               .      .              .      .
.               .      .              .      .
r               Or1    Or2    . . .   Orc    Rr
Totals          C1     C2     . . .   Cc     n

Cross Tables (Test of Association)

Consider n observations tabulated in an r x c contingency table.

Denote by $O_{ij}$ the number of observations in the cell that is in the ith row and the jth column.

The null hypothesis is

$H_0$: No association exists between the two attributes in the population

The appropriate test is a chi-square test with (r-1)(c-1) d.f.


Test for Association

Let $R_i$ and $C_j$ be the row and column totals.

The expected number of observations in cell row i and column j, given that H0 is true, is

$E_{ij} = \dfrac{R_i C_j}{n}$

A test of association at a significance level α is based on the chi-square distribution and the following decision rule:

Reject $H_0$ if $\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \dfrac{(O_{ij} - E_{ij})^2}{E_{ij}} > \chi^2_{(r-1)(c-1),\,\alpha}$

Contingency Table Example

Dominant Hand: Left vs. Right

Gender: Male vs. Female

H0: There is no association between hand preference and gender

H1: Hand preference is not independent of gender

Sample results organized in a contingency table: sample size n = 300. Of 120 females, 12 were left handed; of 180 males, 24 were left handed.

                Hand Preference
Gender          Left    Right    Total
Female            12      108      120
Male              24      156      180
Total             36      264      300


Logic of the Test

If H0 is true, then the proportion of left-handed females should be the same as the proportion of left-handed males.

The two proportions above should be the same as the proportion of left-handed people overall.


Finding Expected Frequencies

Overall: P(Left Handed) = 36/300 = .12

120 Females, 12 were left handed

180 Males, 24 were left handed

If no association, then P(Left Handed | Female) = P(Left Handed | Male) = .12

So we would expect 12% of the 120 females and 12% of the 180 males to be left handed. That is, we would expect (120)(.12) = 14.4 females to be left handed and (180)(.12) = 21.6 males to be left handed.


Expected Cell Frequencies

Expected cell frequencies:

$E_{ij} = \dfrac{R_i C_j}{n} = \dfrac{(i^{\text{th}} \text{ row total})(j^{\text{th}} \text{ column total})}{\text{Total sample size}}$

Example:

$E_{11} = \dfrac{(120)(36)}{300} = 14.4$
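The expected cell frequencies for a whole table can be computed in one pass from the row and column totals. This is a minimal sketch; the function name `expected_frequencies` and the list-of-rows layout are assumptions made for illustration.

```python
def expected_frequencies(observed):
    """Expected count for each cell: (row total)(column total) / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Rows: Female, Male; columns: Left, Right (hand preference example)
observed = [[12, 108],
            [24, 156]]

expected = expected_frequencies(observed)
for row in expected:
    print([round(e, 1) for e in row])
```

The first entry of the first row corresponds to E11 in the example, the expected number of left-handed females.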

Observed vs. Expected Frequencies

Observed frequencies vs. expected frequencies:

                        Hand Preference
Gender      Left                Right                 Total
Female      Observed = 12       Observed = 108        120
            Expected = 14.4     Expected = 105.6
Male        Observed = 24       Observed = 156        180
            Expected = 21.6     Expected = 158.4
Total       36                  264                   300


The Chi-Square Test Statistic

The Chi-square test statistic is:

$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \dfrac{(O_{ij} - E_{ij})^2}{E_{ij}}$ with d.f. = (r-1)(c-1)

• where:

Oij = observed frequency in cell (i, j)

Eij = expected frequency in cell (i, j)

r = number of rows

c = number of columns

Observed vs. Expected Frequencies

$\chi^2 = \dfrac{(12 - 14.4)^2}{14.4} + \dfrac{(108 - 105.6)^2}{105.6} + \dfrac{(24 - 21.6)^2}{21.6} + \dfrac{(156 - 158.4)^2}{158.4} = 0.7576$

Contingency Analysis

$\chi^2 = 0.7576$ with d.f. = (r-1)(c-1) = (1)(1) = 1

Decision Rule: If $\chi^2 > 3.841$, reject H0; otherwise, do not reject H0.

[Figure: chi-square distribution with α = 0.05 in the upper tail; the critical value $\chi^2_{.05} = 3.841$ separates the "Do not reject H0" region from the "Reject H0" region.]

Here, $\chi^2 = 0.7576 < 3.841$, so we do not reject H0 and conclude that gender and hand preference are not associated.
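The whole test of association can be sketched in plain Python. The function name `chi_square_test` is hypothetical, and the critical value 3.841 (α = 0.05, 1 d.f.) is taken from the decision rule in the slides rather than computed.

```python
def chi_square_test(observed):
    """Return (chi-square statistic, degrees of freedom) for an r x c table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o_ij in enumerate(row):
            e_ij = row_totals[i] * col_totals[j] / n  # expected frequency
            chi2 += (o_ij - e_ij) ** 2 / e_ij
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df

# Rows: Female, Male; columns: Left, Right
chi2, df = chi_square_test([[12, 108], [24, 156]])

critical_value = 3.841  # chi-square, alpha = 0.05, 1 d.f. (from the slides)
reject_h0 = chi2 > critical_value

print("chi-square =", round(chi2, 4), "with d.f. =", df)
print("reject H0:", reject_h0)
```

For tables with other dimensions the degrees of freedom change, so the critical value would have to be looked up again for the new d.f.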