Statistics Lecture
Transcript of Statistics Lecture
St. Paul University Philippines
Graduate School
A Course Presentation in Statistics
Course Content
• Basic Concepts in Statistics
• Measures of Central Tendency
• Measures of Variability
• Correlation and Regression Analysis
• Test of Hypothesis
– Z – Test
– T – Test
– Chi – Square Test
– Analysis of Variance (ANOVA)
• EXPLORING THE SPSS
Course Requirements
Reaction Paper/ Film Clip Analysis
Problem Set
Final Examination
Reaction Paper (Film Clip Analysis)
Lies, Damned Lies and Statistics: The
Misapplication of Statistics in
Everyday Life
Statistics defined . . .
• STATISTICS is a collection of methods for
planning experiments, obtaining data, and
then organizing, summarizing, presenting,
analyzing, interpreting and drawing
conclusions based on the data.
Main Divisions
Descriptive Statistics
- summarize or describe the important
characteristics of a known set of
population data
Inferential Statistics
-use sample data to make inferences (or
generalizations) about a population
Population vs. Sample
• A POPULATION is the complete collection of
elements (scores, people, measurements, and so
on)
• A SAMPLE is a portion / subset of elements
drawn from a population
Parameter vs. Statistic
• A PARAMETER is a numerical measurement
describing some characteristic of a population
• A STATISTIC is a numerical measurement
describing some characteristic of a sample
Qualitative vs. Quantitative Data
• Qualitative (categorical or attribute) data
can be separated into different categories
that are distinguished by some non –
numerical characteristics
• Quantitative data consists of numbers
representing counts or measurements
Discrete vs Continuous Data
• Discrete data result from either a finite number of
possible values or a countable number of possible
values (that is, the number of possible values is
0, 1, 2, or more)
• Continuous data result from infinitely many
possible values that can be associated with points
on a continuous scale in such a way that there are
no gaps or interruptions
Dependent vs Independent Variable
• Dependent variable – the variable that is being
affected
- the variable that is being
explained
• Independent variable – the variable that affects
- the variable that explains
Nominal Level of Measurement
• The nominal level of measurement is
characterized by data that consists of names,
labels or categories only. The data cannot be
arranged in an ordering scheme
• Examples:
gender of employees, civil status,
nationality, religion, etc
Ordinal Level of Measurement
• The ordinal level of measurement involves
data that may be arranged in some order, but
differences between data values are either
meaningless or cannot be determined.
• Examples:
good, better or best speakers; 1 star, 2 star
or 3 star movie; rank of an employee
Interval Level of Measurement
• The interval level of measurement is like the
ordinal level, with the additional property that
meaningful amounts of differences between data
can be determined. However, there is no inherent
(natural) zero starting point
• Examples:
body temperature, year (2007, 2008, 2013, etc)
Ratio Level of Measurement
• The ratio level of measurement is the
interval modified to include the inherent
zero starting point. For values at this level,
differences and ratios are meaningful.
• Examples:
weights, lengths, distance traveled
Visual Summary of the Scales of Measurement
Are there named categories?
   YES: Nominal scale of measurement
   NO: Are the scores ranked?
      YES: Ordinal scale of measurement
      NO: Are there equal intervals with a meaningful zero point?
         YES: Ratio scale of measurement
         NO: Interval scale of measurement
Measures of Central Tendency (UNGROUPED DATA)
Mean, Median, Mode
The Mean
• Two Forms
– Simple mean
– Weighted mean
The mean takes the symbol $\bar{X}$ (read "X bar").
Arithmetic Mean (Mean)
“balancing point” of a set of scores
the “average score”
The Mean
If you have a population:
Total number of cases is N
Sum of the scores is ΣX
Compute the mean of the population:
$\mu = \frac{\sum X}{N}$
If you have a sample:
Total number of cases is n
Sum of the scores is ΣX
Compute the mean of the sample:
$\bar{X} = \frac{\sum X}{n}$
Simple Arithmetic Mean
$\bar{X} = \frac{\sum X}{n}$
Where:
X = an individual score
n = the number of scores/cases
ΣX = sum of the individual score values
Example:
Consider the following data set:
1, 2, 3, 4, 5, 6, 7, 8, 9, 10
Solution:
$\bar{X} = \frac{\sum X}{n} = \frac{1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10}{10} = \frac{55}{10}$
Mean = 5.5
Example:
• The following data represent the ages of the mothers
of Paulinian Graders randomly selected from four
different grade levels who attended a session on
Counseling. What is the mean age of the mothers per
grade level?
• Grade 1: 35, 37, 45, 54, 39, 48
• Grade 2: 54, 63, 47, 63, 45, 53, 52, 48, 55, 48, 63
• Grade 3: 56, 48, 39, 48, 55, 57, 41, 56
• Grade 4: 53, 47, 49, 59, 60, 45, 53, 59, 47
Solution:
• To obtain the mean age of the mothers of the Grade 1 pupils,
we have
$\bar{X} = \frac{35 + 37 + 45 + 54 + 39 + 48}{6} = \frac{258}{6}$
$\bar{X} = 43$
**This means that the mothers of the Grade 1 pupils are relatively young.
Example:
• Find the mean of the other grade levels. Round off
your answers to the nearest hundredth.
• Grade 2: 54, 63, 47, 63, 45, 53, 52, 48, 55, 48, 63
• Grade 3: 56, 48, 39, 48, 55, 57, 41, 56
• Grade 4: 53, 47, 49, 59, 60, 45, 53, 59, 47
Answers:
• Grade 2: 54, 63, 47, 63, 45, 53, 52, 48, 55, 48, 63
ANSWER: 53.73
• Grade 3: 56, 48, 39, 48, 55, 57, 41, 56
ANSWER: 50
• Grade 4: 53, 47, 49, 59, 60, 45, 53, 59, 47
ANSWER: 52.44
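The answers above can be checked with a short script. This is only a sketch in plain Python; the dictionary name `ages` and the rounding to two decimals are choices made here for illustration, not part of the lecture.

```python
# Ages of the mothers per grade level, copied from the slide.
ages = {
    "Grade 1": [35, 37, 45, 54, 39, 48],
    "Grade 2": [54, 63, 47, 63, 45, 53, 52, 48, 55, 48, 63],
    "Grade 3": [56, 48, 39, 48, 55, 57, 41, 56],
    "Grade 4": [53, 47, 49, 59, 60, 45, 53, 59, 47],
}

# Simple arithmetic mean: sum of the scores divided by the number of scores.
means = {grade: round(sum(values) / len(values), 2) for grade, values in ages.items()}

for grade, value in means.items():
    print(f"{grade}: {value:.2f}")
```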
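Weighted Mean
$\bar{X}_w = \frac{w_1X_1 + w_2X_2 + w_3X_3 + \dots + w_nX_n}{w_1 + w_2 + w_3 + \dots + w_n}$
Where:
w = weight per item value (how many times the value counts, e.g. number of responses or number of units)
X = individual score values
In a questionnaire item, the scale value (5, 4, 3, ...) plays the role of X and the number of responses plays the role of w, so the divisor is the total number of responses.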
Example:
• The following are the responses of 30 randomly chosen
respondents to one item of a research questionnaire.
** Find the weighted response of the respondents and
interpret the result.
Verbal Description | Weight | No. of Responses
Very strongly agree | 5 | 7
Strongly agree | 4 | 11
Agree | 3 | 9
Disagree | 2 | 2
Strongly disagree | 1 | 1
Solution:
• To obtain the weighted response, we have
$\bar{X}_w = \frac{5(7) + 4(11) + 3(9) + 2(2) + 1(1)}{30} = \frac{111}{30}$
$\bar{X}_w = 3.70$ (strongly agree)
Interpretation of Values
Range Verbal Description
4.20 – 5.00 Very strongly agree
3.40 – 4.19 Strongly agree
2.60 – 3.39 Agree
1.80 – 2.59 Disagree
1.00 – 1.79 Strongly disagree
Exercise:
• Construct a Likert scale to interpret items of a
questionnaire with weights 1 – 4.
• Assume the following descriptions were used:
4 – always
3 – sometimes
2 – seldom
1 – never
Example:
• The following are the grades of one student in one
summer term.
** Find the weighted average of the student.
** What would the student’s average have been if all his
subjects were of equal weight?
Subject | No. of Units | Grade
Statistics | 3 | 98
PE | 2 | 90
Chemistry | 5 | 93
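A minimal sketch of this example in Python, assuming the number of units is the weight of each grade. The variable names are illustrative only.

```python
# Units (weights) and grades, copied from the slide.
units = {"Statistics": 3, "PE": 2, "Chemistry": 5}
grades = {"Statistics": 98, "PE": 90, "Chemistry": 93}

# Weighted mean: sum of (weight x grade) divided by the sum of the weights.
weighted_average = sum(units[s] * grades[s] for s in grades) / sum(units.values())

# Simple mean: every subject counts equally.
simple_average = sum(grades.values()) / len(grades)

print(f"Weighted average: {weighted_average:.2f}")
print(f"Equal-weight average: {simple_average:.2f}")
```

The 5-unit Chemistry grade pulls the weighted average toward 93, while the equal-weight average treats the 2-unit PE grade as just as important.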
Characteristics of the Mean
an interval statistic
calculated average
affected by extreme values
most widely used
most sensitive measure
value is determined by every
case in the distribution
[Figure: five scores A to E plotted on a number line from 3 to 9, with their deviations from the mean: (– 1) + (– 2) + (– 2) + 1 + 4 = 0]
sum of the deviations about the mean is zero
Median
the value that lies in the middle after ranking all
the scores
positional measure
the midpoint or the
50th percentile of a
distribution
Median
the value at which 1/2 of the ordered scores fall above
and 1/2 of the scores fall below
n = odd: 1 2 3 4 5, Median = 3
n = even: 1 2 3 4, Median = 2.5
Example
[Figure: ordered observations with a callout on the middle one: "I am the 4th observation. I am the median."]
Example:
5.40 1.10 0.42 0.73 0.48 1.10
0.42 0.48 0.73 1.10 1.10 5.40 (in order)
(even number of values – no exact middle; the middle is shared by two numbers)
$\frac{0.73 + 1.10}{2} = 0.915$
MEDIAN is 0.915
Example
5.40 1.10 0.42 0.73 0.48 1.10 0.66
0.42 0.48 0.66 0.73 1.10 1.10 5.40
(in order - odd number of values)
The exact middle value is 0.73, so the MEDIAN is 0.73
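The two median examples can be reproduced with Python's standard `statistics.median`, which sorts the data and averages the two middle values when the count is even. The list names below are illustrative only.

```python
from statistics import median

# Six values: even count, so the median is the average of the two middle values.
even_set = [5.40, 1.10, 0.42, 0.73, 0.48, 1.10]

# Seven values: odd count, so the median is the single middle value.
odd_set = even_set + [0.66]

median_even = median(even_set)
median_odd = median(odd_set)

print(sorted(even_set), "median:", round(median_even, 3))
print(sorted(odd_set), "median:", median_odd)
```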
Characteristics of the Median
an ordinal statistic
rank or position average
not affected by extreme values
can be subjected to a few mathematical computations
less widely used than the mean
represents a typical score
Exercise
• The following data represent the ages of the mothers
of Paulinian Graders randomly selected from four
different grade levels who attended a session on
Counseling. What is the median of the ages of the
mothers per grade level?
• Grade 1: 35, 37, 45, 54, 39, 48
• Grade 2: 54, 63, 47, 63, 45, 53, 52, 48, 55, 48, 63
• Grade 3: 56, 48, 39, 48, 55, 57, 41, 56
• Grade 4: 53, 47, 49, 59, 60, 45, 53, 59, 47
Mode
the value which occurs most frequently in a given data
set
does not involve any calculation or ordering of data
Example
Consider the following data set:
Observation | Value/Score
1 | 5
2 | 7
3 | 3
4 | 8
5 | 7
Examples
a. 5.40 1.10 0.42 0.73 0.48 1.10 : Mode is 1.10
b. 27 27 27 55 55 55 88 88 99 : Bimodal - 27 & 55
c. 1 2 3 6 7 8 9 10 : No Mode
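The examples above can be sketched in Python as follows. The helper `modes` is hypothetical and written for this illustration; it follows the slide's convention that a data set in which every value occurs equally often has no mode.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s); an empty list means 'no mode'."""
    counts = Counter(values)
    highest = max(counts.values())
    winners = [value for value, count in counts.items() if count == highest]
    # If several distinct values all tie, no value stands out: no mode.
    if len(counts) > 1 and len(winners) == len(counts):
        return []
    return winners

mode_table = modes([5, 7, 3, 8, 7])                       # the five-observation table
mode_a = modes([5.40, 1.10, 0.42, 0.73, 0.48, 1.10])      # example a
mode_b = modes([27, 27, 27, 55, 55, 55, 88, 88, 99])      # example b
mode_c = modes([1, 2, 3, 6, 7, 8, 9, 10])                 # example c

print(mode_table, mode_a, mode_b, mode_c)
```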
Characteristics of the Mode
a nominal statistic
an inspection average
most frequently occurring value
cannot be manipulated mathematically
rarely used
most “popular” score
Measure | Advantages | Disadvantages
Mode | Quick and easy to calculate. | May not be representative of the whole sample.
Median | Fairly easy to calculate. Half of the scores lie above the median. | Tedious to find for a large set of numbers or for a set that is not in order.
Mean | Takes all numbers into account. | Can be affected by outliers.
Which is best?
When to use . . .
Mean -an interval interpretation is needed
-the value of each score is desired
-further statistical computation is expected
Median -an ordinal interpretation is needed
-the middle score is desired
-avoidance of the influence of extreme values is
needed
Mode -a nominal interpretation is needed
-a quick approximation of a central tendency
measure is desired
-most frequently occurring score is needed
Measures of Central Tendency (GROUPED DATA)
Mean, Median, Mode
The Mean
i.) Class mark method
$\bar{X} = \frac{\sum f X_m}{n}$
Where:
Xm – class mark / class midpoint
f – frequency
n – number of cases / observations
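The Mean
ii.) Coded – deviation method
$\bar{X} = AM + \left(\frac{\sum f d}{n}\right) i$
Where:
AM – assumed mean (Xm of the class where the zero deviation is set)
f – frequency
d – deviation (number of class steps away from the assumed-mean class)
n – number of cases / observations
i – class size/width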
Example
**Find the mean, median and mode of the following
data set:
X F
24 – 26 3
21 – 23 12
18 – 20 10
15 – 17 6
12 – 14 6
9 – 11 5
6 – 8 5
3 – 5 3
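As a sketch, the grouped mean of this data set can be computed with both methods in Python. The choice of the 15 – 17 class as the assumed-mean class is an assumption made here; any class gives the same result.

```python
# (lower limit, upper limit, frequency) for each class, copied from the slide.
classes = [(24, 26, 3), (21, 23, 12), (18, 20, 10), (15, 17, 6),
           (12, 14, 6), (9, 11, 5), (6, 8, 5), (3, 5, 3)]

freqs = [f for _, _, f in classes]
midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]   # class marks Xm
n = sum(freqs)

# i.) Class mark method: sum of f * Xm divided by n.
mean_classmark = sum(f * xm for f, xm in zip(freqs, midpoints)) / n

# ii.) Coded-deviation method.
width = 3              # class size i
assumed_mean = 16.0    # assumption: class mark of the 15 - 17 class
deviations = [(xm - assumed_mean) / width for xm in midpoints]
mean_coded = assumed_mean + sum(f * d for f, d in zip(freqs, deviations)) / n * width

print(n, mean_classmark, mean_coded)
```

Both methods agree because the coded deviations are only a rescaling of the class marks.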
The Median
$Md = X_{LB} + \left(\frac{\frac{n}{2} - cf_p}{f}\right) i$
Where:
XLB – lower boundary of the median class
cfp – cumulative frequency preceding the median class
n – number of cases
f – frequency of the median class
i – class size/width
The Mode
$Mo = X_{LB} + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) i$
Where:
XLB – lower boundary of the modal class
∆1 – difference between frequency of the modal class
and frequency of the class below it
∆2 – difference between frequency of the modal class
and frequency of the class above it
i – class size/width
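The median and mode formulas can be sketched in Python using the frequency distribution of the earlier grouped-data Example (classes 3 – 5 up to 24 – 26). The function names are hypothetical, and taking the lower boundary as the lower limit minus 0.5 assumes integer class limits.

```python
# Example distribution, listed from the lowest class upward:
# (lower limit, upper limit, frequency)
classes = [(3, 5, 3), (6, 8, 5), (9, 11, 5), (12, 14, 6),
           (15, 17, 6), (18, 20, 10), (21, 23, 12), (24, 26, 3)]
width = 3  # class size i

def grouped_median(classes, width):
    n = sum(f for _, _, f in classes)
    cum = 0  # cumulative frequency preceding the current class (cfp)
    for lo, _, f in classes:
        if cum + f >= n / 2:                       # first class reaching n/2
            return (lo - 0.5) + (n / 2 - cum) / f * width
        cum += f

def grouped_mode(classes, width):
    freqs = [f for _, _, f in classes]
    k = freqs.index(max(freqs))                    # first class with the top frequency
    below = freqs[k - 1] if k > 0 else 0
    above = freqs[k + 1] if k + 1 < len(freqs) else 0
    d1 = freqs[k] - below                          # delta 1
    d2 = freqs[k] - above                          # delta 2
    return (classes[k][0] - 0.5) + d1 / (d1 + d2) * width

median_value = grouped_median(classes, width)
mode_value = grouped_mode(classes, width)
print(median_value, round(mode_value, 2))
```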
Exercise
**Find the mean, median and mode of the following
data set:
X F
56 – 62 4
49 – 55 9
42 – 48 12
35 – 41 12
28 – 34 10
21 – 27 8
14 – 20 6
7 – 13 4
Other Measures of Position
(QUANTILES)
1. Quartile (Qk ) – divides the distribution into 4
equal parts
2. Decile (Dk )- divides the distribution into 10
equal parts
3. Percentile (Pk )- divides the distribution into
100 equal parts
The Quartile
$Q_k = X_{LB} + \left(\frac{\frac{kn}{4} - cf_p}{f}\right) i$
Where:
XLB – lower boundary of the quartile class
cfp – cumulative frequency preceding the quartile class
n – number of cases
f – frequency of the quartile class
i – class size/width
The Decile
$D_k = X_{LB} + \left(\frac{\frac{kn}{10} - cf_p}{f}\right) i$
Where:
XLB – lower boundary of the decile class
cfp – cumulative frequency preceding the decile class
n – number of cases
f – frequency of the decile class
i – class size/width
The Percentile
$P_k = X_{LB} + \left(\frac{\frac{kn}{100} - cf_p}{f}\right) i$
Where:
XLB – lower boundary of the percentile class
cfp – cumulative frequency preceding the percentile class
n – number of cases
f – frequency of the percentile class
i – class size/width
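Since the quartile, decile and percentile formulas differ only in the divisor (4, 10 or 100), one function can cover all three. The sketch below uses the distribution of the earlier grouped-data Example (classes 3 – 5 up to 24 – 26); the function name and the lower-limit-minus-0.5 boundary (integer class limits) are assumptions of this illustration.

```python
# Example distribution, lowest class first: (lower limit, upper limit, frequency)
classes = [(3, 5, 3), (6, 8, 5), (9, 11, 5), (12, 14, 6),
           (15, 17, 6), (18, 20, 10), (21, 23, 12), (24, 26, 3)]
width = 3

def grouped_quantile(classes, width, k, parts):
    """k-th quantile when the distribution is cut into `parts` equal parts."""
    n = sum(f for _, _, f in classes)
    target = k * n / parts          # kn/4, kn/10 or kn/100
    cum = 0                         # cumulative frequency preceding the class
    for lo, _, f in classes:
        if cum + f >= target:
            return (lo - 0.5) + (target - cum) / f * width
        cum += f

q1 = grouped_quantile(classes, width, 1, 4)       # first quartile
d5 = grouped_quantile(classes, width, 5, 10)      # fifth decile
p50 = grouped_quantile(classes, width, 50, 100)   # fiftieth percentile
print(round(q1, 2), d5, p50)
```

The fifth decile and the fiftieth percentile both coincide with the median, which is a quick sanity check on the function.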
Exercise
**Using the frequency distribution below, find:
1. Q1 3. D3 5. P3
2. D6 4. P78
X F
56 – 62 6
49 – 55 9
42 – 48 10
35 – 41 12
28 – 34 10
21 – 27 8
14 – 20 6
7 – 13 4
Measures of Variability
• A measure of variability is a statistical tool used to describe the degree to
which scores/observations are scattered.
• It is used to determine the degree of consistency /
homogeneity of scores.
1. range
2. mean absolute deviation
3. semi – interquartile range/ quartile deviation
4. variance
5. standard deviation
Formulas (Ungrouped Data)
1. Range
R = HOV – LOV (highest observed value minus lowest observed value)
2. Mean absolute deviation
$MAD = \frac{\sum |X - \bar{X}|}{n}$
3. Semi – interquartile range/ quartile deviation
$QD = \frac{Q_3 - Q_1}{2}$
4. Variance
$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$
5. Standard deviation
$s = \sqrt{s^2}$
Exercise:
• Given the following data, find the range, MAD,
variance and the standard deviation.
20, 26, 40, 39, 35
Application:
• Two seemingly equally excellent students are
vying for an academic honor, and only one of them
can be chosen to get the award. The following
are their grades, which are used as a basis for giving
the award.
• Student A: 90, 92, 92, 94, 95
• Student B: 90, 91, 93, 94, 95
• Who do you think deserves the award? Why?
Guiding Principle
• The smaller the value of the measure, the
more consistent, the more homogeneous and
the less scattered are the observations in the
set of data.
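To illustrate the guiding principle, the ungrouped-data formulas can be applied to the grades of Student A and Student B from the Application. This is a sketch only; the helper `spread` is a hypothetical name, and the variance uses the n – 1 divisor given in the formulas.

```python
import math

def spread(scores):
    """Range, MAD, variance and standard deviation of ungrouped scores."""
    n = len(scores)
    mean = sum(scores) / n
    variance = sum((x - mean) ** 2 for x in scores) / (n - 1)
    return {
        "mean": mean,
        "range": max(scores) - min(scores),
        "mad": sum(abs(x - mean) for x in scores) / n,
        "variance": variance,
        "sd": math.sqrt(variance),
    }

student_a = spread([90, 92, 92, 94, 95])
student_b = spread([90, 91, 93, 94, 95])

for name, result in (("A", student_a), ("B", student_b)):
    print(name, {key: round(value, 2) for key, value in result.items()})
```

Both students have the same mean and the same range, so only the MAD, variance and standard deviation separate them; under the guiding principle the student with the smaller values is the more consistent one.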
Formulas (Grouped Data)
1. Range
R = HOV – LOV
2. Mean absolute deviation
$MAD = \frac{\sum f\,|X_m - \bar{X}|}{n}$
3. Semi – interquartile range/ quartile deviation
$QD = \frac{Q_3 - Q_1}{2}$
4. Variance
$s^2 = \frac{\sum f\,(X_m - \bar{X})^2}{n - 1}$
5. Standard deviation
$s = \sqrt{s^2}$
Exercise:
**Using the frequency distribution below, find:
1. Range
2. MAD
3. QD
4. Variance
5. Standard Deviation
X F
56 – 62 6
49 – 55 9
42 – 48 10
35 – 41 12
28 – 34 10
21 – 27 8
14 – 20 6
7 – 13 4
Tests of Hypothesis
Hypothesis
•A statement or tentative theory which aims to
explain facts about the real world
•An educated guess
• It is subject to testing. If it is found to be
statistically tenable, it is accepted. Otherwise, it is
rejected.
Kinds of Hypotheses
1. Null Hypothesis (Ho)
• It serves as the working hypothesis
• It is that which one hopes to accept or reject
• It must always express the idea of no
significant difference
2. Alternative Hypothesis (H1 or Ha)
• It generally represents the hypothetical
statement that the researcher wants to prove.
Types of Alternative Hypotheses (Ha)
1. Directional hypothesis
expresses direction
one – tailed
uses the order relation “greater than” or “less than”
2. Non – directional hypothesis
does not express direction
two – tailed
uses the “not equal to”
Type I and Type II Errors
When making a decision about a proposed
hypothesis based on the sample data, one runs the
risk of making an error. The following table on the
next slide summarizes the possibilities:
Type I and Type II Errors
A Type I error is the mistake of rejecting the null
hypothesis when it is true.
The symbol α (alpha) is used to represent the probability
of a Type I error.
A Type II error is the mistake of failing to reject the null
hypothesis when it is false.
The symbol β (beta) is used to represent the probability of
a Type II error.
Level of Significance
The probability of making Type I error or alpha
error in a test is called the significance level of the
test. The significance level of a test is the maximum
value of the probability of rejecting the null
hypothesis (Ho) when in fact it is true.
Critical Region
The critical region (or rejection region) is the set of all values
of the test statistic that cause us to reject the null hypothesis.
[Figure: sampling distribution showing the region of acceptance, the region of rejection, the critical value and the P-value]
Critical Value
A critical value is any value that separates the
critical region (where we reject the null
hypothesis) from the values of the test statistic
that do not lead to rejection of the null
hypothesis. The critical values depend on the nature
of the null hypothesis, the sampling distribution that
applies, and the significance level α.
P - Value
The P-value (probability value) is the probability of
getting a value of the test statistic that is at least as
extreme as the one representing the sample data,
assuming that the null hypothesis is true. The null
hypothesis is rejected if the P-value is very small,
such as 0.05 or less.
Two-tailed, Right-tailed and
Left-tailed Tests
• The tails in a distribution are the extreme
regions bounded by critical values.
Two-tailed Tests
Given:
H0: parameter = hypothesized value ; H1: parameter ≠ hypothesized value
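Right – tailed Tests
Given:
H0: parameter = hypothesized value ; H1: parameter > hypothesized value
Left – tailed Tests
Given:
H0: parameter = hypothesized value ; H1: parameter < hypothesized value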
Steps in Hypothesis Testing
1. Formulate the null hypothesis (Ho) that there is no
significant difference between the items compared. State
the alternative hypothesis (Ha) which is used in case Ho
is rejected.
2. Set the level of significance of the test, α.
3. Determine the test to be used.
Z – TEST – used if the population standard deviation
is given
T – TEST – used if the sample standard deviation is
given
Steps in Hypothesis Testing
4. Determine the tabular value of the test.
***For a Z – test, the table below summarizes the
critical values at varying significance levels
Type of Test | α = 0.10 | α = 0.05 | α = 0.025 | α = 0.01
One – Tailed | ± 1.28 | ± 1.645 | ± 1.96 | ± 2.33
Two – Tailed | ± 1.645 | ± 1.96 | ± 2.24 | ± 2.58
Steps in Hypothesis Testing
4. Determine the tabular value of the test.
***For a T – test, one must compute first the
degree/s of freedom (df) then look for the tabular
value from the table of Students’ T – Distribution.
i. For a single sample
df = n – 1
ii. For two samples
df = n1 + n2 – 2
Steps in Hypothesis Testing
5. Compute for z or t as needed. Vary your solutions using
the formulas:
For z – test
i. Sample mean compared with a population mean
ii. Comparing two sample means
iii. Comparing two sample proportions
For t – test
i. Sample mean compared with a population mean
ii. Comparing two sample means
Steps in Hypothesis Testing
6. Compare the computed value with its
corresponding tabular value, then state your
conclusions based on the following guidelines:
Reject Ho if the absolute computed value is
equal to or greater than the absolute tabular value
Accept Ho if the absolute computed value is less
than the absolute tabular value
Decision Criterion
Traditional Method:
***Reject H0 (Accept H1 ) if the test
statistic falls within the critical region.
***Fail to reject H0 (Accept Ho) if the
test statistic does not fall within the critical
region.
Decision Criterion
P - value method:
*** Reject Ho (Accept H1 ) if P-value ≤ α
(where α is the significance level, such as
0.05)
*** Fail to reject H0 (Accept Ho) if
P-value > α
Decision Criterion
Another option:
Instead of using a significance level
such as 0.05, simply identify the P-value and
leave the decision to the reader.
Z - TEST
1. Sample Mean ($\bar{X}$) Compared with a Population Mean (μ)
$Z = \frac{(\bar{X} - \mu)\sqrt{n}}{\sigma}$
Where:
$\bar{X}$ – sample mean
μ – population mean
n – number of items in the sample
σ – population standard deviation
Z - TEST
2. Comparing Two Sample Means ($\bar{X}_1$ & $\bar{X}_2$)
$Z = \frac{\bar{X}_1 - \bar{X}_2}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$
Where:
$\bar{X}_1$ – mean of the first sample
$\bar{X}_2$ – mean of the second sample
n1 – number of items in the first sample
n2 – number of items in the second sample
σ – population standard deviation
Z - TEST
3. Comparing Two Sample Proportions (p1 & p2)
$Z = \frac{p_1 - p_2}{\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}}$
Where:
p1 – proportion of the first sample
p2 – proportion of the second sample
n1 – number of items in the first sample
n2 – number of items in the second sample
q1 = 1 – p1
q2 = 1 – p2
T - TEST
4. Sample Mean ($\bar{X}$) Compared with a Population Mean (μ)
$t = \frac{(\bar{X} - \mu)\sqrt{n - 1}}{s}$
Where:
$\bar{X}$ – sample mean
μ – population mean
n – number of items in the sample
s – sample standard deviation
T - TEST
5. Comparing Two Sample Means ($\bar{X}_1$ & $\bar{X}_2$)
$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\left[\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\right]\left[\frac{1}{n_1} + \frac{1}{n_2}\right]}}$
Where:
$\bar{X}_1$ – mean of the first sample
$\bar{X}_2$ – mean of the second sample
n1 – number of items in the first sample
n2 – number of items in the second sample
s1 – standard deviation of the first sample
s2 – standard deviation of the second sample
Example 1
Data from a school census show that the
mean weight of college students is 45 kilos with a
standard deviation of 3 kilos. A sample of 100
college students were found to have a mean of 47
kilos. Are the college students really heavier than
the rest using the 0.05 level of significance?
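Example 1 can be worked through as a one-tailed Z-test, since the population standard deviation is given. The sketch below follows the steps in hypothesis testing; the function names are hypothetical, the critical value 1.645 is the one-tailed entry of the table of critical values, and the P-value uses the standard normal tail computed with `math.erfc`.

```python
import math

def z_one_sample(sample_mean, pop_mean, pop_sd, n):
    """Z = (sample mean - population mean) * sqrt(n) / population sd."""
    return (sample_mean - pop_mean) * math.sqrt(n) / pop_sd

def upper_tail_p(z):
    """P(Z >= z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Ho: the mean weight is 45 kilos; Ha: the mean weight is greater than 45 kilos.
z = z_one_sample(sample_mean=47, pop_mean=45, pop_sd=3, n=100)
critical = 1.645                 # one-tailed test at the 0.05 level
p_value = upper_tail_p(z)
reject_h0 = z >= critical

print(f"z = {z:.2f}, critical = {critical}, reject Ho: {reject_h0}")
print(f"P-value = {p_value:.2e}")
```

Both decision criteria agree here: the computed z exceeds the tabular value, and the P-value is far below 0.05.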
Example 2
A researcher wishes to find out whether or not there
is significant difference in the monthly allowance of
morning and afternoon students in his school. By random
sampling, he took a sample of 239 students in the morning
session. The students were found to have a mean monthly
allowance of P142.00. The researcher also took a sample of
209 students in the afternoon session . They were found to
have a mean monthly allowance of P148.00. The population
of students in that school have a standard deviation of
P40.00. Is there a significant difference between the two
samples at 0.01 level?
Example 3
A sample survey of television programs in
Metro Manila shows that 80 out of 200 men and 75
out of 250 women dislike the program “May Bukas Pa”.
We would like to know whether the difference
between the two sample proportions, 80/200 = 0.40
and 75/250 = 0.30, is significant or not at 0.05
level.
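A sketch of Example 3 in Python, using the Z-test formula for two sample proportions. The question asks only whether the proportions differ, so a two-tailed test is assumed here, with the tabular value 1.96 at the 0.05 level; variable names are illustrative.

```python
import math

# Sample data from the example.
n1, n2 = 200, 250
p1, p2 = 80 / n1, 75 / n2        # 0.40 and 0.30
q1, q2 = 1 - p1, 1 - p2

# Z = (p1 - p2) / sqrt(p1*q1/n1 + p2*q2/n2)
z = (p1 - p2) / math.sqrt(p1 * q1 / n1 + p2 * q2 / n2)

critical = 1.96                  # two-tailed test at the 0.05 level (assumed)
reject_h0 = abs(z) >= critical

print(f"z = {z:.3f}, reject Ho: {reject_h0}")
```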
Example 4
A researcher knows that the average height of
Filipino women is 1.525 meters. A random sample
of 26 women was taken and was found to have a
mean height of 1.56 meters, with a standard
deviation of 0.10 meters. Is there reason to believe
that the 26 women are significantly taller than the
rest using the 0.05 level of significance?
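Example 4 gives a sample standard deviation, so a t-test applies. The sketch below uses the lecture's formula with √(n – 1), which corresponds to a standard deviation computed with divisor n; many textbooks instead use √n with a standard deviation computed with divisor n – 1, so treat the choice as an assumption. The tabular value 1.708 (df = 25, one-tailed, 0.05 level) is taken from a standard table of Student's t-distribution, not from the slides.

```python
import math

def t_one_sample(sample_mean, pop_mean, sample_sd, n):
    """t = (sample mean - population mean) * sqrt(n - 1) / s, as on the slide."""
    return (sample_mean - pop_mean) * math.sqrt(n - 1) / sample_sd

n = 26
t = t_one_sample(sample_mean=1.56, pop_mean=1.525, sample_sd=0.10, n=n)
df = n - 1                       # degrees of freedom for a single sample
critical = 1.708                 # t-table value for df = 25, one-tailed, alpha = 0.05
reject_h0 = t >= critical

print(f"t = {t:.2f}, df = {df}, reject Ho: {reject_h0}")
```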
Example 5
Beta company is manufacturing steel wire
with an average tensile strength of 50 kilos. The
laboratory tests 16 pieces and finds that the mean is
47 kilos with a standard deviation of 15 kilos. Are
the results in accordance with the hypothesis that
the population mean is 50 kilos?
Example 6
It is known from the records of the city
schools that the standard deviation of math test
scores on ABC test is 5. A sample of 200 students
from the system was taken and it was found out that
the sample mean is 75. Previous tests showed the
population mean to be 70. Is it safe to conclude that
the sample is significantly different from the
population at 0.01 level?
Example 7
Two types of rice varieties are being considered for
yield and a comparison is needed. Thirty hectares were
planted with the rice varieties exposed to fairly uniform
conditions. The results are tabulated below:
Measure | Variety A | Variety B
Average yield | 80 sack/hec | 85 sack/hec
Sample Variance | 5.90 | 12.10
Is there a significant difference in the yield of the two
varieties at 0.05 level of significance?
Example 8
A manufacturer of flashlight batteries claims
that the average life of his product will exceed 40
hours. A company is willing to buy a very large
shipment of batteries provided the claim is true. A
random sample of 36 batteries is tested, and it was
found out that the sample mean is 45 hours. If the
population of batteries has a standard deviation of 5
hours, is it likely that the batteries will be bought?
Example 9
A company is trying to decide which of two brands of
tires to buy for its trucks. They would like to adopt Brand
C unless there is some evidence that Brand D is better. An
experiment was conducted where 16 tires from each brand were
used. The tires were run under uniform conditions until they
wore out. The results are:
Brand C: $\bar{X}_1$ = 40,000 km, s1 = 5,400 km
Brand D: $\bar{X}_2$ = 38,000 km, s2 = 3,200 km
What conclusion can be drawn?
Example 10
All freshmen in a particular school were
found to have a variability in grades expressed as a
standard deviation of 3. Two samples among these
freshmen, made up of 20 and 50 students each,
were found to have means of 88 and 85, respectively.
Based on their grades, is the first group really
brighter than the second group using 0.01 level of
significance?
Analysis of Variance (F - Test)
-A test that was developed by Ronald A. Fisher
-A technique in inferential statistics designed to test
whether or not more than two samples (or groups)
are significantly different from each other
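Analysis of Variance
Steps:
1. Compute for the sum of squares
$TSS = \sum x^2 - \frac{(\sum x)^2}{N}$
$SSB = \frac{1}{r}\sum_j \left(\sum_i x_{ij}\right)^2 - \frac{(\sum x)^2}{N}$
SSW = TSS – SSB
Where r is the number of observations per group (rows), k is the number of groups (columns), and N = rk is the total number of observations.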
Analysis of Variance
2. Compute degrees of freedom
dft = rk – 1 = N – 1
dfb = k – 1
dfw = dft – dfb
Analysis of Variance
3. Compute for the mean sum of squares
$MSSB = \frac{SSB}{df_b}$
$MSSW = \frac{SSW}{df_w}$
4. Compute for the F – Ratio
$F = \frac{MSSB}{MSSW}$
Contingency Table for ANOVA
Sources of Variation | Sum of Squares | Degree of Freedom (df) | Mean Sum of Squares | F – Ratio
Between Column | SSB | dfb | MSSB |
Within Column | SSW | dfw | MSSW |
Total | TSS | dft | |
Exercise
1. The weights in kilograms of three groups of 5 members
each are shown in the table below. Is there unusual
variation among the groups? (use α = 0.05)
Member | Group A | Group B | Group C
1 | 50 | 60 | 53
2 | 48 | 40 | 55
3 | 55 | 50 | 40
4 | 50 | 60 | 40
5 | 46 | 52 | 47
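The four ANOVA steps can be sketched in Python on the data of this exercise (three groups of 5 weights). Variable names are illustrative, and the tabular value 3.89 (df = 2 and 12, 0.05 level) is taken from a standard F table, not from the slides.

```python
# Weights per group, copied from the exercise table.
groups = {
    "A": [50, 48, 55, 50, 46],
    "B": [60, 40, 50, 60, 52],
    "C": [53, 55, 40, 40, 47],
}
r = 5                      # observations per group (rows)
k = len(groups)            # number of groups (columns)
N = r * k

all_values = [x for values in groups.values() for x in values]
correction = sum(all_values) ** 2 / N          # (sum of x)^2 / N

# Step 1: sums of squares
tss = sum(x * x for x in all_values) - correction
ssb = sum(sum(values) ** 2 for values in groups.values()) / r - correction
ssw = tss - ssb

# Step 2: degrees of freedom
df_t = N - 1
df_b = k - 1
df_w = df_t - df_b

# Steps 3 and 4: mean sums of squares and the F-ratio
mssb = ssb / df_b
mssw = ssw / df_w
f_ratio = mssb / mssw

critical = 3.89            # F table, df = (2, 12), alpha = 0.05
reject_h0 = f_ratio >= critical
print(f"TSS={tss:.2f} SSB={ssb:.2f} SSW={ssw:.2f} F={f_ratio:.3f} reject Ho: {reject_h0}")
```

Because the computed F is below the tabular value, these data give no evidence of unusual variation among the groups at the 0.05 level.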
Exercise
2. The following are the mileages obtained after several road tests were
run using 5 different kinds of gasoline on a Toyota car.
Is there a significant difference among the mileage yields at the 1% level?
Road Test | Gasoline A | Gasoline B | Gasoline C | Gasoline D | Gasoline E
1st | 35 | 61 | 38 | 65 | 56
2nd | 31 | 63 | 54 | 60 | 69
3rd | 42 | 50 | 47 | 57 | 70
4th | 48 | 42 | 60 | 55 | 50
5th | 40 | 49 | 55 | 60 | 48
Exercise
3. Below are the bowling scores of four groups of four
members each. At 5% significance level, find out if there
is unusual variation among the groups.
Member | Group A | Group B | Group C | Group D
1 | 98 | 100 | 87 | 90
2 | 78 | 95 | 92 | 93
3 | 95 | 90 | 105 | 95
4 | 110 | 85 | 88 | 97
Chi – Square Test ($\chi^2$)
- Used to test significant difference or relationship
- Used if data are in frequencies (enumeration data)
USES:
1. to test the goodness of fit of a normal curve; that is to
find out whether or not a sample distribution conforms
with the hypothetical normal distribution
2. to find out whether or not an observed proportion is
equal to some given ideal or expected proportion
3. to test the independence of one variable from another
variable.
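Formulas:
i. For a 2 x 2 table (with Yates' correction for continuity)
$\chi^2 = \sum \frac{(|OF - EF| - 0.5)^2}{EF}$
ii. For a non 2 x 2 table
$\chi^2 = \sum \frac{(OF - EF)^2}{EF}$
Where OF is the observed frequency and EF is the expected frequency.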
Exercise
1. Test the hypothesis that educational attainment does not
depend on socio – economic status for the following 100
persons in a particular community.
Socio – economic status | Educational Attainment: Finished College | Educational Attainment: Did Not Finish College
Poor | 18 | 10
Middle Class | 28 | 25
Rich | 14 | 5
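A sketch of the chi-square test of independence on this 3 x 2 table, using the formula for a non 2 x 2 table. Several things here are assumptions, because the slides do not state them: each expected frequency is taken as (row total x column total) / grand total, the degrees of freedom as (rows – 1)(columns – 1), and the significance level as 0.05, for which a standard chi-square table gives 5.991 at df = 2.

```python
# Observed frequencies: rows are Poor, Middle Class, Rich;
# columns are Finished College, Did Not Finish College.
observed = [[18, 10], [28, 25], [14, 5]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, of in enumerate(row):
        ef = row_totals[i] * col_totals[j] / grand_total   # expected frequency
        chi_square += (of - ef) ** 2 / ef

df = (len(observed) - 1) * (len(observed[0]) - 1)
critical = 5.991          # chi-square table, df = 2, alpha = 0.05 (assumed level)
reject_h0 = chi_square >= critical

print(f"chi-square = {chi_square:.3f}, df = {df}, reject Ho: {reject_h0}")
```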
Exercise
2. At 1% significance level, does college academic grade
depend on the high school NSAT results for the following
200 students?
Academic Grade | NSAT Rating: Low | NSAT Rating: Average | NSAT Rating: High
Above 85 | 13 | 25 | 21
75 – 85 | 18 | 31 | 38
Below 75 | 14 | 20 | 20
Exercise
3. At ABC Company, there are 28 males and 32
females. Out of the 28 males, 10 hold executive
posts and the others do clerical work. Of the 32
females, only 5 hold executive positions and the
others do clerical work. Prepare a contingency
table, then test the hypothesis that position is
independent of sex.
Exercise
4. To determine whether type of personality is related to
academic performance, a random sample of 180 high
school students from a certain college was taken and the
data are as follows:
Personality Type | Low Average | Average | High Average
Introvert | 35 | 30 | 25
Extrovert | 31 | 23 | 36
Is there a significant relationship between personality type
and academic performance?
Correlation and Regression Analysis
Regression Analysis
- concerned with the problem of estimation and
forecasting
FORMULA:
y = a + bx
Where:
y = predicted score
a = y – intercept
b = slope of the line
Regression Analysis
$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$
$a = \bar{Y} - b\bar{X}$
Where:
$\bar{Y}$ = mean of the y values
$\bar{X}$ = mean of the x values
Correlation Analysis
- Concerned with the relationship between the changes in
the variables
Formula: Pearson Product Moment Correlation (r)
$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}$
Range of Values: –1 ≤ r ≤ 1
(+) r – shows a direct positive relationship
(- ) r – shows a negative or inverse relationship
r = 0 this indicates no relationship
r = 1 perfect positive relationship
r = -1 perfect negative relationship
Interpretation:
Pearson r Qualitative Description
± 1 Perfect Correlation
± 0.91 – ± 0.99 Very High
± 0.71 – ± 0.90 High
± 0.41 – ± 0.70 Marked
± 0.21 – ± 0.40 Slight/Low
0 – ± 0.20 Negligible
Testing the Significance of r
$t = r\sqrt{\frac{n - 2}{1 - r^2}}$
Exercise
1. It is generally known that the number of road accidents is inversely
related to road width. The following data show the result of
a study indicating the number of accidents occurring per hundred
thousand vehicles.
a. draw a scatter diagram
b. find the equation of the LSRL
c. predict accident frequency for a road whose width is 55 feet;
48 feet
d. find the degree of relationship between road width and
accident frequency.
Road width (in feet) (x) 75 52 60 33 22
Number of accidents (y) 40 84 55 92 90
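Parts b to d of this exercise can be sketched in Python with the regression and Pearson r formulas, reading LSRL as the least-squares regression line. The variable names and the helper `predict` are illustrative only; the last line also applies the t formula for testing the significance of r.

```python
import math

# Road width in feet (x) and number of accidents (y), copied from the table.
x = [75, 52, 60, 33, 22]
y = [40, 84, 55, 92, 90]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi * xi for xi in x)
sum_y2 = sum(yi * yi for yi in y)

# Slope and intercept of the regression line y = a + bx
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b * (sum_x / n)

def predict(width):
    """Predicted accident frequency for a given road width."""
    return a + b * width

# Pearson product moment correlation
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

# t statistic for testing the significance of r
t = r * math.sqrt((n - 2) / (1 - r ** 2))

print(f"y = {a:.3f} + ({b:.4f})x")
print(f"predicted at 55 ft: {predict(55):.2f}, at 48 ft: {predict(48):.2f}")
print(f"r = {r:.4f}, t = {t:.3f}")
```

The negative slope and the negative r both reflect the inverse relationship stated in the exercise: wider roads go with fewer accidents in this sample.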
Exercise
2. The following table shows the final grades of ten students
in Algebra and Statistics.
a. draw a scatter diagram
b. find the equation of the LSRL
c. predict grade in Statistics if grade in
Algebra is 78; 82; 89; 95; 100
d. find the degree of relationship between grades in
Algebra and Statistics
Algebra (x) 75 80 93 65 87 71
Statistics (y) 82 78 86 72 91 80