Statistics Lecture


Page 1: Statistics Lecture

1

St. Paul University Philippines

Graduate School

A Course Presentation in Statistics

Course Content

• Basic Concepts in Statistics

• Measures of Central Tendency

• Measures of Variability

• Correlation and Regression Analysis

• Test of Hypothesis

– Z – Test

– T – Test

– Chi – Square Test

– Analysis of Variance (ANOVA)

• EXPLORING THE SPSS

Course Requirements

Reaction Paper/ Film Clip Analysis

Problem Set

Final Examination

Reaction Paper (Film Clip Analysis)

Lies, Damned Lies and Statistics: The

Misapplication of Statistics in

Everyday Life

Page 2: Statistics Lecture

2

Statistics defined . . .

• STATISTICS is a collection of methods for planning experiments, obtaining data, and then organizing, summarizing, presenting, analyzing, interpreting and drawing conclusions based on the data.

Main Divisions

Descriptive Statistics

- summarize or describe the important

characteristics of a known set of

population data

Inferential Statistics

-use sample data to make inferences (or

generalizations) about a population

Population vs. Sample

• A POPULATION is the complete collection of

elements (scores, people, measurements, and so

on)

• A SAMPLE is a portion / subset of elements

drawn from a population

Parameter vs. Statistic

• A PARAMETER is a numerical measurement describing some characteristic of a population

• A STATISTIC is a numerical measurement

describing some characteristic of a sample

Page 3: Statistics Lecture

3

Qualitative vs. Quantitative Data

• Qualitative (categorical or attribute) data

can be separated into different categories

that are distinguished by some non –

numerical characteristics

• Quantitative data consists of numbers

representing counts or measurements

Discrete vs Continuous Data

• Discrete data result from either a finite number of possible values or a countable number of possible values (that is, the number of possible values is 0, 1, 2, and so on)

• Continuous data result from infinitely many

possible values that can be associated with points

on a continuous scale in such a way that there are

no gaps or interruptions

Dependent vs Independent Variable

• Dependent variable – the variable that is being affected; the variable that is being explained

• Independent variable – the variable that affects; the variable that explains

Nominal Level of Measurement

• The nominal level of measurement is

characterized by data that consists of names,

labels or categories only. The data cannot be

arranged in an ordering scheme

• Examples:

gender of employees, civil status,

nationality, religion, etc

Page 4: Statistics Lecture

4

Ordinal Level of Measurement

• The ordinal level of measurement involves

data that may be arranged in some order, but

differences between data values are either

meaningless or cannot be determined.

• Examples:

good, better or best speakers; 1 star, 2 star

or 3 star movie; rank of an employee

Interval Level of Measurement

• The interval level of measurement is like the ordinal level, with the additional property that meaningful amounts of differences between data can be determined. However, there is no inherent (natural) zero starting point

• Examples:

body temperature, year (2007, 2008, 2013, etc)

Ratio Level of Measurement

• The ratio level of measurement is the

interval modified to include the inherent

zero starting point. For values at this level,

differences and ratios are meaningful.

• Examples:

weights, lengths, distance traveled

Visual Summary of the Scales of Measurement

[Flowchart: "Are there named categories?" leads to the nominal scale of measurement; "Are the scores ranked?" leads to the ordinal scale of measurement; "Are there equal intervals with a meaningful zero point?" leads to the ratio scale of measurement if YES and to the interval scale of measurement if NO]

Page 5: Statistics Lecture

5

Measures of Central Tendency (UNGROUPED DATA)

Mean, Median, Mode

The Mean

• Two Forms

– Simple mean

– Weighted mean

The mean takes the symbol $\bar{X}$.

Arithmetic Mean (Mean)

“balancing point” of a set of scores

the “average score”

The Mean

If you have a population:

• Total number of cases is N

• Sum of the scores is ΣX

• Compute the mean of the population: $\mu = \dfrac{\sum X}{N}$

If you have a sample:

• Total number of cases is n

• Sum of the scores is ΣX

• Compute the mean of the sample: $\bar{X} = \dfrac{\sum X}{n}$

Page 6: Statistics Lecture

6

Simple Arithmetic Mean

$\bar{X} = \dfrac{\sum X}{n}$

Where:

X = an individual score

n = the number of scores/cases

ΣX = sum of the individual score values

Example:

Consider the following data set:

1, 2, 3, 4, 5, 6, 7, 8, 9, 10

Solution:

$\bar{X} = \dfrac{\sum X}{n} = \dfrac{1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10}{10}$

Mean = 5.5

Example:

• The following data represents the ages of the mothers

of Paulinian Graders randomly selected from four

different grade levels who attended a session on

Counseling. What is the mean age of the mothers per

grade level?

• Grade 1: 35, 37, 45, 54, 39, 48

• Grade 2: 54, 63, 47, 63, 45, 53, 52, 48, 55, 48, 63

• Grade 3: 56, 48, 39, 48, 55, 57, 41, 56

• Grade 4: 53, 47, 49, 59, 60, 45, 53, 59, 47

Solution:

• To obtain the mean age of the mothers of the Grade 1 pupils, we have

$\bar{X} = \dfrac{35 + 37 + 45 + 54 + 39 + 48}{6} = \dfrac{258}{6} = 43$

**This means that the mothers of the Grade 1 pupils are relatively young.

Page 7: Statistics Lecture

7

Example:

• Find the mean of the other grade levels. Round off

your answers to the nearest hundredths.

• Grade 2: 54, 63, 47, 63, 45, 53, 52, 48, 55, 48, 63

• Grade 3: 56, 48, 39, 48, 55, 57, 41, 56

• Grade 4: 53, 47, 49, 59, 60, 45, 53, 59, 47

Answers:

• Grade 2: 54, 63, 47, 63, 45, 53, 52, 48, 55, 48, 63

ANSWER: 53.73

• Grade 3: 56, 48, 39, 48, 55, 57, 41, 56

ANSWER: 50

• Grade 4: 53, 47, 49, 59, 60, 45, 53, 59, 47

ANSWER: 52.44
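The four means above can be checked with a few lines of Python. This is only a sketch: the dictionary name `ages` and the use of the standard-library `statistics.mean` are choices made for illustration, not part of the lecture.

```python
from statistics import mean

# Ages of the mothers per grade level, copied from the example above.
ages = {
    "Grade 1": [35, 37, 45, 54, 39, 48],
    "Grade 2": [54, 63, 47, 63, 45, 53, 52, 48, 55, 48, 63],
    "Grade 3": [56, 48, 39, 48, 55, 57, 41, 56],
    "Grade 4": [53, 47, 49, 59, 60, 45, 53, 59, 47],
}

# Simple arithmetic mean (sum of scores divided by number of scores),
# rounded to the nearest hundredth as the exercise asks.
means = {grade: round(mean(values), 2) for grade, values in ages.items()}

for grade, value in means.items():
    print(grade, value)
```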

Weighted Mean

$\bar{X}_w = \dfrac{w_1X_1 + w_2X_2 + w_3X_3 + \dots + w_nX_n}{\text{total number of weights}}$

Where:

w = weight per item value

x = individual score values

Example:

• The following are the responses of 30 randomly chosen respondents in one item of a research questionnaire.

** Find the weighted response of the respondents and interpret the result.

Verbal Description     Weight   No. of Responses
Very strongly agree    5        7
Strongly agree         4        11
Agree                  3        9
Disagree               2        2
Strongly disagree      1        1

Page 8: Statistics Lecture

8

Solution:

• To obtain the weighted response, we have

$\bar{X}_w = \dfrac{5(7) + 4(11) + 3(9) + 2(2) + 1(1)}{30} = \dfrac{111}{30} = 3.70$ (strongly agree)

Interpretation of Values

Range Verbal Description

4.20 – 5.00 Very strongly agree

3.40 – 4.19 Strongly agree

2.60 – 3.39 Agree

1.80 – 2.59 Disagree

1.00 – 1.79 Strongly disagree
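The weighted response and its verbal interpretation can be sketched in Python as follows. The function names `weighted_mean` and `describe` and the `SCALE` list are hypothetical helpers introduced for illustration; the numbers come from the example and the interpretation table above.

```python
# (weight, number of responses) pairs from the questionnaire example.
responses = [(5, 7), (4, 11), (3, 9), (2, 2), (1, 1)]

# Lower bound of each range in the interpretation table, highest first.
SCALE = [
    (4.20, "Very strongly agree"),
    (3.40, "Strongly agree"),
    (2.60, "Agree"),
    (1.80, "Disagree"),
    (1.00, "Strongly disagree"),
]

def weighted_mean(pairs):
    """Sum of weight x count, divided by the total number of responses."""
    total = sum(weight * count for weight, count in pairs)
    n = sum(count for _, count in pairs)
    return total / n

def describe(value):
    """Return the verbal description whose range contains the value."""
    for lower_bound, label in SCALE:
        if value >= lower_bound:
            return label
    raise ValueError("value is below the lowest range of the scale")

result = weighted_mean(responses)
print(round(result, 2), describe(result))
```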

Exercise:

• Construct a Likert scale to interpret items of a questionnaire with weights 1 – 4.

• Assume the following descriptions were used:

4 – always

3 – sometimes

2 – seldom

1 – never

Example:

• The following are the grades of one student in one summer term.

** Find the weighted average of the student.

** What would the student’s average have been if all his subjects had equal weights?

Subject      No. of Units   Grade
Statistics   3              98
PE           2              90
Chemistry    5              93

Page 9: Statistics Lecture

9

Characteristics of the Mean

an interval statistic

calculated average

affected by extreme values

most widely used

most sensitive measure

value is determined by every

case in the distribution

[Figure: scores A to E plotted on a number line from 3 to 9, each labeled with its deviation from the mean: (–1), (–2), (–2), (+1) and (+4)]

(– 1) + (– 2) + (– 2) + 1 + 4 = 0

sum of the deviations about the mean is zero

Median

the value that lies in the middle after ranking all

the scores

positional measure

the midpoint or the

50th percentile of a

distribution

Median

the value at which 1/2 of the ordered scores fall above

and 1/2 of the scores fall below

n = odd: 1 2 3 4 5, Median = 3

n = even: 1 2 3 4, Median = 2.5

Page 10: Statistics Lecture

10

Example

[Figure: an ordered set of observations in which the 4th observation is the median]

Example:

5.40 1.10 0.42 0.73 0.48 1.10

0.42 0.48 0.73 1.10 1.10 5.40 (in order)

(even number of values: there is no exact middle, so the middle is shared by two numbers)

$\dfrac{0.73 + 1.10}{2} = 0.915$

MEDIAN is 0.915

Example

5.40 1.10 0.42 0.73 0.48 1.10 0.66

0.42 0.48 0.66 0.73 1.10 1.10 5.40

(in order - odd number of values)

The exact middle value is 0.73, so the MEDIAN is 0.73
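Both median examples can be reproduced with the standard-library `statistics.median`, which sorts the values and averages the two middle ones when the count is even. The variable names are illustrative only.

```python
from statistics import median

# Even number of values: the median is the average of the two middle scores.
even_case = [5.40, 1.10, 0.42, 0.73, 0.48, 1.10]

# Odd number of values: the median is the single middle score.
odd_case = [5.40, 1.10, 0.42, 0.73, 0.48, 1.10, 0.66]

median_even = median(even_case)
median_odd = median(odd_case)

print(sorted(even_case), median_even)
print(sorted(odd_case), median_odd)
```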

Characteristics of the Median

an ordinal statistic

rank or position average

not affected by extreme values

can be subjected to a few mathematical computations

less widely used than the mean

represents a typical score

Page 11: Statistics Lecture

11

Exercise

• The following data represents the ages of the mothers of Paulinian Graders randomly selected from four different grade levels who attended a session on Counseling. What is the median of the ages of the mothers per grade level?

• Grade 1: 35, 37, 45, 54, 39, 48

• Grade 2: 54, 63, 47, 63, 45, 53, 52, 48, 55, 48, 63

• Grade 3: 56, 48, 39, 48, 55, 57, 41, 56

• Grade 4: 53, 47, 49, 59, 60, 45, 53, 59, 47

Mode

the value which occurs most frequently in a given data

set

does not involve any calculation or ordering of data

Example

Consider the following data set:

Observation   Value/Score
1             5
2             7
3             3
4             8
5             7

Examples

a. 5.40 1.10 0.42 0.73 0.48 1.10 (Mode is 1.10)

b. 27 27 27 55 55 55 88 88 99 (Bimodal - 27 & 55)

c. 1 2 3 6 7 8 9 10 (No Mode)
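The three cases above (one mode, two modes, no mode) can be sketched with a small helper. The function name `modes` is hypothetical, and the rule that a data set has no mode when every value occurs equally often is an assumption made here to match example c.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s), or an empty list if there is no mode.

    Assumption: when every distinct value occurs the same number of times
    (and there is more than one distinct value), the set has no mode.
    """
    counts = Counter(values)
    highest = max(counts.values())
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)

example_a = [5.40, 1.10, 0.42, 0.73, 0.48, 1.10]
example_b = [27, 27, 27, 55, 55, 55, 88, 88, 99]
example_c = [1, 2, 3, 6, 7, 8, 9, 10]

for data in (example_a, example_b, example_c):
    print(modes(data))
```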

Page 12: Statistics Lecture

12

Characteristics of the Mode

a nominal statistic

an inspection average

most frequently occurring value

cannot be manipulated mathematically

rarely used

most “popular” score

Advantages and Disadvantages

Mode
Advantages: Quick and easy to calculate.
Disadvantages: May not be representative of the whole sample.

Median
Advantages: Fairly easy to calculate. Half of the scores lie above the median.
Disadvantages: Tedious to find for a large set of numbers or for a set that is not in order.

Mean
Advantages: Takes all numbers into account.
Disadvantages: Can be affected by outliers.

Which is best?

When to use . . .

Mean
- an interval interpretation is needed
- the value of each score is desired
- further statistical computation is expected

Median
- an ordinal interpretation is needed
- the middle score is desired
- avoidance of the influence of extreme values is needed

Mode
- a nominal interpretation is needed
- a quick approximation of a central tendency measure is desired
- the most frequently occurring score is needed

Measures of Central Tendency (GROUPED DATA)

Mean, Median, Mode

Page 13: Statistics Lecture

13

The Mean

i.) Classmark method

$\bar{X} = \dfrac{\sum f X_m}{n}$

Where:

Xm – class mark / class midpoint

f – frequency

n – number of cases / observations

The Mean

ii.) Coded – deviation method

$\bar{X} = AM + \left( \dfrac{\sum f d}{n} \right) i$

Where:

AM – assumed mean (Xm of the class where the zero deviation is set)

f – frequency

d – deviation

n – number of cases / observations

i – class size/width

Example

**Find the mean, median and mode of the following

data set:

X F

24 – 26 3

21 – 23 12

18 – 20 10

15 – 17 6

12 – 14 6

9 – 11 5

6 – 8 5

3 – 5 3

The Median

$Md = X_{LB} + \left( \dfrac{\frac{n}{2} - cf_p}{f} \right) i$

Where:

XLB – lower boundary of the median class

cfp – cumulative frequency preceding the median class

n – number of cases

f – frequency of the median class

i – class size/width

Page 14: Statistics Lecture

14

The Mode

$Mo = X_{LB} + \left( \dfrac{\Delta_1}{\Delta_1 + \Delta_2} \right) i$

Where:

XLB – lower boundary of the modal class

∆1 – difference between frequency of the modal class and frequency below it

∆2 – difference between frequency of the modal class and frequency above it

i – class size/width
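The grouped-data formulas for the mean (classmark method), median and mode can be sketched in Python using the example distribution with classes 3 – 5 up to 24 – 26. Several things here are assumptions for illustration: the function names, listing the classes in ascending order, taking each lower boundary as the lower class limit minus 0.5 (integer class limits), and reading "frequency below it" as the frequency of the next lower class.

```python
# (lower limit, upper limit, frequency), listed in ascending order.
classes = [
    (3, 5, 3), (6, 8, 5), (9, 11, 5), (12, 14, 6),
    (15, 17, 6), (18, 20, 10), (21, 23, 12), (24, 26, 3),
]
WIDTH = 3  # class size i

def grouped_mean(classes):
    """Classmark method: sum of f * Xm divided by n."""
    n = sum(f for _, _, f in classes)
    return sum(f * (lo + hi) / 2 for lo, hi, f in classes) / n

def grouped_median(classes, width):
    """Md = XLB + ((n/2 - cfp) / f) * i."""
    n = sum(f for _, _, f in classes)
    cumulative = 0  # cfp: cumulative frequency before the current class
    for lo, _, f in classes:
        if cumulative + f >= n / 2:
            return (lo - 0.5) + (n / 2 - cumulative) / f * width
        cumulative += f

def grouped_mode(classes, width):
    """Mo = XLB + (d1 / (d1 + d2)) * i, using the class with the top frequency."""
    freqs = [f for _, _, f in classes]
    m = freqs.index(max(freqs))
    below = freqs[m - 1] if m > 0 else 0
    above = freqs[m + 1] if m < len(freqs) - 1 else 0
    d1 = freqs[m] - below
    d2 = freqs[m] - above
    return (classes[m][0] - 0.5) + d1 / (d1 + d2) * width

print(grouped_mean(classes))
print(grouped_median(classes, WIDTH))
print(round(grouped_mode(classes, WIDTH), 2))
```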

Exercise

**Find the mean, median and mode of the following data set:

X         F
56 – 62   4
49 – 55   9
42 – 48   12
35 – 41   12
28 – 34   10
21 – 27   8
14 – 20   6
7 – 13    4

Other Measures of Position

(QUANTILES)

1. Quartile (Qk ) – divides the distribution into 4

equal parts

2. Decile (Dk )- divides the distribution into 10

equal parts

3. Percentile (Pk )- divides the distribution into

100 equal parts

The Quartile

$Q_k = X_{LB} + \left( \dfrac{\frac{kn}{4} - cf_p}{f} \right) i$

Where:

XLB – lower boundary of the quartile class

cfp – cumulative frequency preceding the quartile class

n – number of cases

f – frequency of the quartile class

i – class size/width

Page 15: Statistics Lecture

15

The Decile

$D_k = X_{LB} + \left( \dfrac{\frac{kn}{10} - cf_p}{f} \right) i$

Where:

XLB – lower boundary of the decile class

cfp – cumulative frequency preceding the decile class

n – number of cases

f – frequency of the decile class

i – class size/width

The Percentile

$P_k = X_{LB} + \left( \dfrac{\frac{kn}{100} - cf_p}{f} \right) i$

Where:

XLB – lower boundary of the percentile class

cfp – cumulative frequency preceding the percentile class

n – number of cases

f – frequency of the percentile class

i – class size/width
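The quartile, decile and percentile formulas differ only in the divisor (4, 10 or 100), so one function can cover all three. The sketch below is illustrative: `grouped_quantile` is a hypothetical name, the sample distribution is made up of classes 3 – 5 up to 24 – 26 with integer limits, and each lower boundary is assumed to be the lower limit minus 0.5.

```python
def grouped_quantile(classes, k, parts):
    """k-th quantile of grouped data: XLB + ((k*n/parts - cfp) / f) * i.

    classes: (lower limit, upper limit, frequency) tuples in ascending order.
    parts:   4 for quartiles, 10 for deciles, 100 for percentiles.
    """
    n = sum(f for _, _, f in classes)
    target = k * n / parts
    cumulative = 0  # cfp: cumulative frequency before the current class
    for lo, hi, f in classes:
        if f > 0 and cumulative + f >= target:
            width = hi - lo + 1  # class size i, assuming integer limits
            return (lo - 0.5) + (target - cumulative) / f * width
        cumulative += f
    raise ValueError("k must not exceed the number of parts")

# Illustrative distribution with n = 50.
classes = [
    (3, 5, 3), (6, 8, 5), (9, 11, 5), (12, 14, 6),
    (15, 17, 6), (18, 20, 10), (21, 23, 12), (24, 26, 3),
]

q1 = grouped_quantile(classes, 1, 4)     # first quartile
d5 = grouped_quantile(classes, 5, 10)    # fifth decile
p50 = grouped_quantile(classes, 50, 100) # fiftieth percentile
print(q1, d5, p50)
```

The fifth decile and the fiftieth percentile both coincide with the median, which is a quick way to sanity-check an implementation.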

Exercise

**Using the frequency distribution below, find:

1. Q1 3. D3 5. P3

2. D6 4. P78

X F

56 – 62 6

49 – 55 9

42 – 48 10

35 – 41 12

28 – 34 10

21 – 27 8

14 – 20 6

7 – 13 4

Measures of Variability

• The statistical tool used to describe the degree to

which scores/ observations are scattered.

•It is used to determine the degree of consistency /

homogeneity of scores.

1. range

2. mean absolute deviation

3. semi – interquartile range/ quartile deviation

4. variance

5. standard deviation

Page 16: Statistics Lecture

16

Formulas (Ungrouped Data)

1. Range

R = HOV – LOV

2. Mean absolute deviation

$MAD = \dfrac{\sum |X - \bar{X}|}{n}$

3. Semi – interquartile range/ quartile deviation

$QD = \dfrac{Q_3 - Q_1}{2}$

4. Variance

$s^2 = \dfrac{\sum (X - \bar{X})^2}{n - 1}$

5. Standard deviation

$s = \sqrt{s^2}$

Exercise:

• Given the following data, find the range, MAD,

variance and the standard deviation.

20, 26, 40, 39, 35

Application:

• Two seemingly equally excellent students are

vying for an academic honor where only one must

have to be chosen to get the award. The following

are their grades which are used as a basis for giving

the award.

•Student A: 90, 92, 92, 94, 95

•Student B: 90, 91, 93, 94, 95

•Who do you think deserves the award? Why?

Page 17: Statistics Lecture

17

Guiding Principle

• The lesser the value of the measure, the

more consistent, the more homogenous and

the less scattered are the observations in the

set of data.
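The guiding principle can be tried out on the grades of Student A and Student B from the application problem. This is a sketch only: `summarize` is a hypothetical helper, and it relies on the standard-library `statistics.variance` and `statistics.stdev`, which use the n - 1 divisor.

```python
from statistics import mean, stdev, variance

def summarize(scores):
    """Range, mean absolute deviation, sample variance and standard deviation."""
    m = mean(scores)
    return {
        "mean": m,
        "range": max(scores) - min(scores),
        "mad": sum(abs(x - m) for x in scores) / len(scores),
        "variance": variance(scores),  # divides by n - 1
        "sd": stdev(scores),           # square root of the variance
    }

student_a = summarize([90, 92, 92, 94, 95])
student_b = summarize([90, 91, 93, 94, 95])

for name, stats in (("Student A", student_a), ("Student B", student_b)):
    print(name, {key: round(value, 3) for key, value in stats.items()})
```

Both students have the same mean and the same range, so the comparison rests on the MAD, variance and standard deviation; the student with the smaller values has the more consistent grades.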

Formulas (Grouped Data)

1. Range

R = HOV – LOV

2. Mean absolute deviation

$MAD = \dfrac{\sum f |X_m - \bar{X}|}{n}$

3. Semi – interquartile range/ quartile deviation

$QD = \dfrac{Q_3 - Q_1}{2}$

4. Variance

$s^2 = \dfrac{\sum f (X_m - \bar{X})^2}{n - 1}$

5. Standard deviation

$s = \sqrt{s^2}$

Exercise:

**Using the frequency distribution below, find:

1. Range 2. MAD 3. QD 4. Variance 5. Standard Deviation

X         F
56 – 62   6
49 – 55   9
42 – 48   10
35 – 41   12
28 – 34   10
21 – 27   8
14 – 20   6
7 – 13    4

Page 18: Statistics Lecture

18

Tests of Hypothesis

Hypothesis

•A statement or tentative theory which aims to

explain facts about the real world

•An educated guess

•It is subject to testing. If it is found to be statistically true, it is accepted. Otherwise, it gets rejected.

Kinds of Hypotheses

1. Null Hypothesis (Ho)

• It serves as the working hypothesis

• It is that which one hopes to accept or reject

• It must always express the idea of no

significant difference

2. Alternative Hypothesis (H1 or Ha)

• It generally represents the hypothetical

statement that the researcher wants to prove.

Types of Alternative Hypotheses (Ha)

1. Directional hypothesis

expresses direction

one – tailed

uses order relation of “greater than” or “less than”,

2. Non – directional hypothesis

does not express direction

two – tailed

uses the “not equal to”

Type I and Type II Errors

When making a decision about a proposed

hypothesis based on the sample data, one runs the

risk of making an error. The following table on the

next slide summarizes the possibilities:

Page 19: Statistics Lecture

19

Type I and Type II Errors

A Type I error is the mistake of rejecting the null

hypothesis when it is true.

The symbol α (alpha) is used to represent the probability of a type I error.

A Type II error is the mistake of failing to reject the null hypothesis when it is false.

The symbol β (beta) is used to represent the probability of a type II error.

Level of Significance

The probability of making Type I error or alpha

error in a test is called the significance level of the

test. The significance level of a test is the maximum

value of the probability of rejecting the null

hypothesis (Ho) when in fact it is true.

Critical Region

The critical region (or rejection region) is the set of all values

of the test statistic that cause us to reject the null hypothesis.

[Figure: sampling distribution showing the region of acceptance and the region of rejection, with the critical value and the P-value marked]

Page 20: Statistics Lecture

20

Critical Value

A critical value is any value that separates the critical region (where we reject the null hypothesis) from the values of the test statistic that do not lead to rejection of the null hypothesis. The critical values depend on the nature of the null hypothesis, the sampling distribution that applies, and the significance level α.

P - Value

The P-value (probability value) is the probability of

getting a value of the test statistic that is at least as

extreme as the one representing the sample data,

assuming that the null hypothesis is true. The null

hypothesis is rejected if the P-value is very small,

such as 0.05 or less.

Two-tailed, Right-tailed and

Left-tailed Tests

• The tails in a distribution are the extreme

regions bounded by critical values.

Two-tailed Tests

Given:

H0: μ = μ0 ; H1: μ ≠ μ0

Page 21: Statistics Lecture

21

Right – tailed Tests

Given:

H0: μ = μ0 ; H1: μ > μ0

Left – tailed Tests

Given:

H0: μ = μ0 ; H1: μ < μ0

Steps in Hypothesis Testing

1. Formulate the null hypothesis (Ho) that there is no

significant difference between the items compared. State

the alternative hypothesis (Ha) which is used in case Ho

is rejected.

2. Set the level of significance of the test, α.

3. Determine the test to be used.

Z – TEST – used if the population standard deviation

is given

T – TEST – used if the sample standard deviation is

given

Steps in Hypothesis Testing

4. Determine the tabular value of the test.

***For a Z – test, the table below summarizes the

critical values at varying significance levels

Type of Test   Level of Significance
               0.10      0.05      0.025     0.01
One-Tailed     ± 1.28    ± 1.645   ± 1.96    ± 2.33
Two-Tailed     ± 1.645   ± 1.96    ± 2.24    ± 2.58
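Critical values of z can also be computed rather than read from a table. The sketch below uses `statistics.NormalDist` from the Python standard library; the function name `z_critical` is an illustrative choice. For a one-tailed test the whole of α sits in one tail, and for a two-tailed test α is split equally between the two tails.

```python
from statistics import NormalDist

def z_critical(alpha, tails):
    """Positive critical value of z for a one-tailed (1) or two-tailed (2) test."""
    tail_area = alpha / tails
    return NormalDist().inv_cdf(1 - tail_area)

levels = (0.10, 0.05, 0.025, 0.01)
table = {
    (alpha, tails): round(z_critical(alpha, tails), 3)
    for tails in (1, 2)
    for alpha in levels
}

for tails, label in ((1, "One-tailed"), (2, "Two-tailed")):
    print(label, [table[(alpha, tails)] for alpha in levels])
```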

Page 22: Statistics Lecture

22

Steps in Hypothesis Testing

4. Determine the tabular value of the test.

***For a T – test, one must compute first the

degree/s of freedom (df) then look for the tabular

value from the table of Students’ T – Distribution.

i. For a single sample

df = n – 1

ii. For two samples

df = n1 + n2 – 2

Steps in Hypothesis Testing

5. Compute for z or t as needed. Vary your solutions using

the formulas:

For z – test

i. Sample mean compared with a population mean

ii. Comparing two sample means

iii. Comparing two sample proportions

For t – test

i. Sample mean compared with a population mean

ii. Comparing two sample means

Steps in Hypothesis Testing

6. Compare the computed value with its

corresponding tabular value, then state your

conclusions based on the following guidelines:

Reject Ho if the absolute computed value is

equal to or greater than the absolute tabular value

Accept Ho if the absolute computed value is less

than the absolute tabular value

Decision Criterion

Traditional Method:

***Reject H0 (Accept H1 ) if the test

statistic falls within the critical region.

***Fail to reject H0 (Accept Ho) if the

test statistic does not fall within the critical

region.

Page 23: Statistics Lecture

23

Decision Criterion

P - value method:

*** Reject Ho (Accept H1) if P-value ≤ α (where α is the significance level, such as 0.05)

*** Fail to reject Ho (Accept Ho) if P-value > α

Decision Criterion

Another option:

Instead of using a significance level

such as 0.05, simply identify the P-value and

leave the decision to the reader.

Z - TEST

1. Sample Mean (X) Compared with a Population Mean (μ)

$Z = \dfrac{(\bar{X} - \mu)\sqrt{n}}{\sigma}$

Where:

X̄ – sample mean

μ – population mean

n – number of items in the sample

σ – population standard deviation

Z - TEST

2. Comparing Two Sample Means (X1 & X2)

$Z = \dfrac{\bar{X}_1 - \bar{X}_2}{\sigma \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$

Where:

X̄1 – mean of the first sample

X̄2 – mean of the second sample

n1 – number of items in the first sample

n2 – number of items in the second sample

σ – population standard deviation

Page 24: Statistics Lecture

24

Z- TEST

3. Comparing Two Sample Proportions (p1 & p2)

$Z = \dfrac{p_1 - p_2}{\sqrt{\dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2}}}$

Where:

p1 – proportion of the first sample

p2 – proportion of the second sample

n1 – number of items in the first sample

n2 – number of items in the second sample

q1 = 1 – p1

q2 = 1 – p2

T- TEST

4. Sample Mean (X) Compared with a Population Mean (μ)

$t = \dfrac{(\bar{X} - \mu)\sqrt{n - 1}}{s}$

Where:

X̄ – sample mean

μ – population mean

n – number of items in the sample

s – sample standard deviation

T- TEST

5. Comparing Two Sample Means (X1 & X2)

$t = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{\left[ \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} \right] \left[ \dfrac{1}{n_1} + \dfrac{1}{n_2} \right]}}$

Where:

X̄1 – mean of the first sample

X̄2 – mean of the second sample

n1 – number of items in the first sample

n2 – number of items in the second sample

s1 – standard deviation of the first sample

s2 – standard deviation of the second sample

Example 1

Data from a school census show that the

mean weight of college students is 45 kilos with a

standard deviation of 3 kilos. A sample of 100

college students were found to have a mean of 47

kilos. Are the college students really heavier than

the rest using the 0.05 level of significance?
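Example 1 can be worked through in Python as a right-tailed z-test (H1: the mean weight is greater than 45 kilos). This is a sketch under that reading of the problem; `one_sample_z` is a hypothetical helper and `statistics.NormalDist` supplies the critical value and the P-value.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z(sample_mean, pop_mean, pop_sd, n):
    """Z = (sample mean - population mean) * sqrt(n) / population SD."""
    return (sample_mean - pop_mean) * sqrt(n) / pop_sd

alpha = 0.05
z = one_sample_z(sample_mean=47, pop_mean=45, pop_sd=3, n=100)

# Right-tailed test: all of alpha is in the upper tail.
critical = NormalDist().inv_cdf(1 - alpha)
p_value = 1 - NormalDist().cdf(z)

# Reject Ho when the computed value reaches the tabular value.
reject_h0 = z >= critical

print(round(z, 2), round(critical, 3), reject_h0)
```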

Page 25: Statistics Lecture

25

Example 2

A researcher wishes to find out whether or not there

is significant difference in the monthly allowance of

morning and afternoon students in his school. By random

sampling, he took a sample of 239 students in the morning

session. The students were found to have a mean monthly

allowance of P142.00. The researcher also took a sample of

209 students in the afternoon session . They were found to

have a mean monthly allowance of P148.00. The population

of students in that school have a standard deviation of

P40.00. Is there a significant difference between the two

samples at 0.01 level?
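Example 2 compares two sample means with a known population standard deviation, so the two-sample z formula applies. The sketch below treats it as a two-tailed test at the 0.01 level, as the wording "significant difference" suggests; the helper name `two_sample_z` is an assumption for illustration.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z(mean1, mean2, pop_sd, n1, n2):
    """Z = (mean1 - mean2) / (sigma * sqrt(1/n1 + 1/n2))."""
    return (mean1 - mean2) / (pop_sd * sqrt(1 / n1 + 1 / n2))

alpha = 0.01
z = two_sample_z(mean1=142, mean2=148, pop_sd=40, n1=239, n2=209)

# Two-tailed test: alpha is split between the two tails.
critical = NormalDist().inv_cdf(1 - alpha / 2)
reject_h0 = abs(z) >= critical

print(round(z, 3), round(critical, 3), reject_h0)
```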

Example 3

A sample survey of television programs in

Metro Manila shows that 80 out of 200 men and 75

out of 250 women dislike “May Bukas Pa”

program. One likes to know whether the difference

between the two sample proportions, 80/200 = 0.40

and 75/250 = 0.30, is significant or not at 0.05

level.
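Example 3 uses the z formula for two sample proportions. A minimal sketch, assuming a two-tailed test at the 0.05 level; `two_proportion_z` is a hypothetical function name.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """Z = (p1 - p2) / sqrt(p1*q1/n1 + p2*q2/n2), with q = 1 - p."""
    p1, p2 = x1 / n1, x2 / n2
    q1, q2 = 1 - p1, 1 - p2
    return (p1 - p2) / sqrt(p1 * q1 / n1 + p2 * q2 / n2)

alpha = 0.05
z = two_proportion_z(x1=80, n1=200, x2=75, n2=250)

# Two-tailed critical value at the 0.05 level.
critical = NormalDist().inv_cdf(1 - alpha / 2)
reject_h0 = abs(z) >= critical

print(round(z, 3), round(critical, 2), reject_h0)
```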

Example 4

A researcher knows that the average height of

Filipino women is 1.525 meters. A random sample

of 26 women was taken and was found to have a

mean height of 1.56 meters, with a standard

deviation of 0.10 meters. Is there reason to believe

that the 26 women are significantly taller than the

rest using the 0.05 level of significance?

Example 5

Beta company is manufacturing steel wire

with an average tensile strength of 50 kilos. The

laboratory tests 16 pieces and finds that the mean is

47 kilos with a standard deviation of 15 kilos. Are

the results in accordance with the hypothesis that

the population mean is 50 kilos?

Page 26: Statistics Lecture

26

Example 6

It is known from the records of the city

schools that the standard deviation of math test

scores on ABC test is 5. A sample of 200 students

from the system was taken and it was found out that

the sample mean is 75. Previous tests showed the

population mean to be 70. Is it safe to conclude that

the sample is significantly different from the

population at 0.01 level?

Example 7

Two types of rice varieties are being considered for

yield and a comparison is needed. Thirty hectares were

planted with the rice varieties exposed to fairly uniform

conditions. The results are tabulated below:

Variety A Variety B

Average yield 80 sack/hec 85 sack/hec

Sample Variance 5.90 12.10

Is there significant difference in the yield of the two

varieties at 0.05 level of significance?

Example 8

A manufacturer of flashlight batteries claims

that the average life of his product will exceed 40

hours. A company is willing to buy a very large

shipment of batteries provided the claim is true. A

random sample of 36 batteries is tested, and it was

found out that the sample mean is 45 hours. If the

population of batteries has a standard deviation of 5

hours, is it likely that the batteries will be bought?

Example 9

A company is trying to decide which of two brands of tires to buy for their trucks. They would like to adopt Brand C unless there is some evidence that Brand D is better. An

experiment was conducted where 16 from each brand were

used. The tires were run under uniform conditions until they

wore out. The results are:

Brand C: X1 = 40,000 km s1 = 5,400 km

Brand D: X2 = 38,000 km s2 = 3,200 km

What conclusion can be drawn?
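The computed t for Example 9 can be sketched with the two-sample formula, which pools the two sample variances. The function name `pooled_t` is hypothetical. The example does not state a significance level, so the sketch stops at the computed value and the degrees of freedom; the tabular value still has to be looked up in a table of Student's t distribution.

```python
from math import sqrt

def pooled_t(mean1, s1, n1, mean2, s2, n2):
    """Two-sample t with pooled variance and df = n1 + n2 - 2."""
    df = n1 + n2 - 2
    pooled_variance = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    t = (mean1 - mean2) / sqrt(pooled_variance * (1 / n1 + 1 / n2))
    return t, df

# Brand C versus Brand D, 16 tires each.
t, df = pooled_t(mean1=40_000, s1=5_400, n1=16, mean2=38_000, s2=3_200, n2=16)

print(round(t, 3), df)
```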

Page 27: Statistics Lecture

27

Example 10

All freshmen in a particular school were

found to have a variability in grades expressed as a

standard deviation of 3. Two samples among these

freshmen, made up of 20 and 50 students each,

were found to have means of 88 and 85 respectively.

Based on their grades, is the first group really

brighter than the second group using 0.01 level of

significance?

Analysis of Variance (F - Test)

-A test that was developed by Ronald A. Fisher

-A technique in inferential statistics designed to test

whether or not more than two samples (or groups)

are significantly different from each other

Analysis of Variance

Steps:

1. Compute for the sum of squares

$TSS = \sum x^2 - \dfrac{(\sum x)^2}{N}$

$SSB = \dfrac{1}{r} \sum_{j} \left( \sum_{i} x_{ij} \right)^2 - \dfrac{(\sum x)^2}{N}$

$SSW = TSS - SSB$

Analysis of Variance

2. Compute degrees of freedom

dft = rk – 1 = N – 1

dfb = k – 1

dfw = dft – dfb

Page 28: Statistics Lecture

28

Analysis of Variance

3. Compute for the mean sum of squares

$MSSB = \dfrac{SSB}{df_b}$

$MSSW = \dfrac{SSW}{df_w}$

4. Compute for the F – Ratio

$F = \dfrac{MSSB}{MSSW}$

Contingency Table for ANOVA

Sources of Variation   Sum of Squares   Degree of Freedom (df)   Mean Sum of Squares   F – Ratio
Between Column         SSB              dfb                      MSSB                  F
Within Column          SSW              dfw                      MSSW
Total                  TSS              dft
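The four ANOVA steps can be sketched as one function that fills in the table above. Everything named here is illustrative: `one_way_anova` is a hypothetical helper and the three small groups are made-up numbers chosen so the arithmetic is easy to follow by hand. The between-groups sum of squares divides each squared group total by that group's size, which matches dividing by r when every group has r members.

```python
def one_way_anova(groups):
    """Return the entries of a one-way ANOVA table for a list of groups."""
    values = [x for group in groups for x in group]
    n_total = len(values)   # N
    k = len(groups)         # number of columns (groups)

    correction = sum(values) ** 2 / n_total  # (sum of x)^2 / N
    tss = sum(x * x for x in values) - correction
    ssb = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    ssw = tss - ssb

    dft = n_total - 1
    dfb = k - 1
    dfw = dft - dfb

    mssb = ssb / dfb
    mssw = ssw / dfw
    return {
        "TSS": tss, "SSB": ssb, "SSW": ssw,
        "dft": dft, "dfb": dfb, "dfw": dfw,
        "MSSB": mssb, "MSSW": mssw, "F": mssb / mssw,
    }

# Hypothetical data: three groups of three members each.
anova_table = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
for name, value in anova_table.items():
    print(name, value)
```

The computed F is then compared with the tabular F for (dfb, dfw) degrees of freedom at the chosen significance level.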

Exercise

1. The weights in kilograms of three groups of 5 members

each are shown in the table below. Is there unusual

variation among the groups? (use α = 0.05)

           Group
Members    A     B     C
1          50    60    53
2          48    40    55
3          55    50    40
4          50    60    40
5          46    52    47

Exercise

2. The following are the mileage obtained after several road tests were

run using 5 different kinds of gasoline on a Toyota Car.

Is there significant difference among the mileage yields, at 1% level?

             Type of Gasoline
Road Test    A     B     C     D     E
1ST          35    61    38    65    56
2ND          31    63    54    60    69
3RD          42    50    47    57    70
4TH          48    42    60    55    50
5TH          40    49    55    60    48

Page 29: Statistics Lecture

29

Exercise

3. Below are the bowling scores of four groups of four

members each. At 5% significance level, find out if there

is unusual variation among the groups.

           Group
Members    A     B     C     D
1          98    100   87    90
2          78    95    92    93
3          95    90    105   95
4          110   85    88    97

Chi – Square Test (χ²)

- Used to test significant difference or relationship

- Used if data are in frequencies (enumeration data)

USES:

1. to test the goodness of fit of a normal curve; that is to

find out whether or not a sample distribution conforms

with the hypothetical normal distribution

2. to find out whether or not an observed proportion is

equal to some given ideal or expected proportion

3. to test the independence of one variable from another

variable.

Formulas:

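i. For a 2 x 2 table (with Yates' correction for continuity)

$\chi^2 = \sum \dfrac{(|OF - EF| - 0.5)^2}{EF}$

ii. For a non 2 x 2 table

$\chi^2 = \sum \dfrac{(OF - EF)^2}{EF}$

Where:

OF – observed frequency

EF – expected frequency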

Exercise

1. Test the hypothesis that educational attainment does not

depend on socio – economic status for the following 100

persons in a particular community.

Socio-economic status   Educational Attainment
                        Finished College   Did Not Finish College
Poor                    18                 10
Middle Class            28                 25
Rich                    14                 5
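The chi-square computation for a contingency table like this one can be sketched in Python. The function name `chi_square` is hypothetical. Expected frequencies are taken as (row total × column total) / grand total, the usual rule under the hypothesis of independence, and the Yates' correction is applied only when the table is 2 x 2.

```python
def chi_square(observed):
    """Chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    is_2x2 = len(observed) == 2 and len(observed[0]) == 2

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, of in enumerate(row):
            ef = row_totals[i] * col_totals[j] / grand_total
            difference = abs(of - ef)
            if is_2x2:
                difference -= 0.5  # Yates' correction for continuity
            statistic += difference ** 2 / ef

    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df

# Rows: Poor, Middle Class, Rich. Columns: finished / did not finish college.
education = [[18, 10], [28, 25], [14, 5]]
chi2, df = chi_square(education)
print(round(chi2, 3), df)
```

The computed value is then compared with the tabular chi-square value for the resulting degrees of freedom at the chosen significance level.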

Page 30: Statistics Lecture

30

Exercise

2. At 1% significance level, does college academic grade

depend on the high school NSAT results for the following

200 students?

Academic Grade   NSAT Rating
                 Low    Average   High
Above 85         13     25        21
75 – 85          18     31        38
Below 75         14     20        20

Exercise

3. At ABC Company, there are 28 males and 32 females. Out of the 28 males, 10 hold executive posts and the others do clerical work. Of the 32 females, only 5 hold executive positions and the others do clerical work. Prepare a contingency table, then test the hypothesis that position is independent of sex.

Exercise

4. To determine whether type of personality is related to

academic performance, a random sample of 180 high

school students from a certain college were taken and the

data are as follows:

Is there a significant relationship between personality type

and academic performance?

Personality type   Low Average   Average   High Average
Introvert          35            30        25
Extrovert          31            23        36

Correlation

and

Regression Analysis

Page 31: Statistics Lecture

31

Regression Analysis

- concerned with the problem of estimation and

forecasting

FORMULA:

y = a + bx

Where:

y – predicted score

a – y-intercept

b – slope of the line

Regression Analysis

$b = \dfrac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$

$a = \bar{Y} - b\bar{X}$

Where:

Ȳ – mean of the y values

X̄ – mean of the x values

Correlation Analysis

- Concerned with the relationship between the changes in the variables

Formula: Pearson Product Moment Correlation (r)

$r = \dfrac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}$

Range of Values: –1 ≤ r ≤ 1

(+) r – shows a direct positive relationship

(- ) r – shows a negative or inverse relationship

r = 0 this indicates no relationship

r = 1 perfect positive relationship

r = -1 perfect negative relationship

Page 32: Statistics Lecture

32

Interpretation:

Pearson r Qualitative Description

± 1 Perfect Correlation

± 0.91 – ± 0.99 Very High

± 0.71 – ± 0.90 High

± 0.41 – ± 0.70 Marked

± 0.21 – ± 0.40 Slight/Low

0 – ± 0.20 Negligible

Testing the Significance of r

$t = r\sqrt{\dfrac{n - 2}{1 - r^2}}$

Exercise

1. It is generally known that the number of road accidents is inversely

proportional with road width. The following data shows the result of

a study indicating the number of accidents occurring per hundred

thousand vehicles.

a. draw a scatter diagram

b. find the equation of the LSRL

c. predict accident frequency for a road whose width is 55 feet;

48 feet

d. find the degree of relationship between road width and

accident frequency.

Road width (in feet) (x) 75 52 60 33 22

Number of accidents (y) 40 84 55 92 90
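The regression and correlation formulas can be applied to the road-width data with plain Python. This is a sketch: the variable names and the `predict` helper are illustrative, and the code only evaluates the least-squares slope and intercept, Pearson r, and the t value for testing the significance of r.

```python
from math import sqrt

x = [75, 52, 60, 33, 22]   # road width (in feet)
y = [40, 84, 55, 92, 90]   # number of accidents

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)

# Least-squares regression line y = a + bx.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b * sum_x / n

# Pearson product moment correlation coefficient.
r = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

# t value for testing the significance of r, with n - 2 degrees of freedom.
t = r * sqrt((n - 2) / (1 - r ** 2))

def predict(width):
    """Predicted accident frequency for a given road width."""
    return a + b * width

print(round(a, 3), round(b, 4), round(r, 4), round(t, 3))
print(round(predict(55), 1), round(predict(48), 1))
```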

Exercise

2. The following table shows the final grades of ten students

in Algebra and Statistics.

a. draw a scatter diagram

b. find the equation of the LSRL

c. predict grade in Statistics if grade in

Algebra is 78; 82; 89; 95; 100

d. find the degree of relationship between grades in

Algebra and Statistics

Algebra (x) 75 80 93 65 87 71

Statistics (y) 82 78 86 72 91 80

Page 33: Statistics Lecture

33

Pilar B. Acorda

Email Address : [email protected]

Mobile Number: 09359547319