Statistical Methods for Engineering.
SCHOOL OF ENGINEERING
DIPLOMA IN INDUSTRIAL & OPERATIONS MANAGEMENT
DIPLOMA IN SUPPLY CHAIN MANAGEMENT
DIPLOMA IN CIVIL AVIATION
P01 – SUMMARIZE AND PRESENT
E214 : STATISTICAL METHODS FOR ENGINEERING
Copyright © 2009 School of Engineering, Republic Polytechnic, Singapore
All rights reserved. No part of this document may be reproduced, stored in a retrieval
system, or transmitted, in any form or by any means, electronic, mechanical,
photocopying, recording or otherwise, without the prior permission of the School of
Engineering, Republic Polytechnic, Singapore.
Summarize and Present
It is reported that male polytechnic students have a higher failure rate in the National Physical Fitness Award (Napfa) test compared with JC students. This is despite the fact that many polytechnic students play sports or exercise at least once a week. To help male students pass the Napfa test, your school has introduced an exercise program to help improve their fitness level. Attached are some data from the exercise program:
Data.xls
You are asked to summarize the data and present them in a form that is useful and easy to understand at a glance. How would you describe and measure the different sets of data? What display graphs or tools can you use?
What is Statistics?
• Statistics provides a basis for assessing and drawing a conclusion.
• Statistics plays a critical role in the improvement of the quality of any product or service. It enables engineers to understand phenomena subject to variation and to effectively predict or control them.
• The basic idea behind statistical methods of data analysis is to make inferences about a population by studying a relatively small sample chosen from it.
• Everything dealing with the collection, processing, analysis, and interpretation of numerical data belongs to the domain of statistics.
Descriptive Vs Inferential Statistics
Descriptive
• Enables understanding of the important features of the data, or provides insight into them, through the use of values and graphical presentations
• Purpose is to organize and summarize the collected data in meaningful forms or measures that are easily understood
• Examples:
Charts, graphs, plots, measures of mean, median, frequency, standard deviation.
Inferential
Consists of:
• Making claims about population from data collected in sample
• Performing estimations about population characteristics and making predictions
• Determining relationships among variables
• Examples:
Hypothesis Testing, ANOVA, correlation analysis
Stem-and-Leaf Plot
• Simple way to summarize a data set
• Compact way to represent data, and provides some indication of its shape
• A stem-and-leaf plot displays all the sample values, but the order in which the items were sampled cannot be determined
• Example of a Stem-and-Leaf Plot
24 24 26 26 26 27 27 27 27 28 29 30 30
31 33 35 36 36 37 37 43 45 45 46 48 49
50 50 51 53 53 55 56 57 58 59 59 60 60
Stem | Leaf
2 | 44666777789
3 | 001356677
4 | 355689
5 | 00133567899
6 | 00
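The plot above can be rebuilt from the raw values. The following is a minimal sketch, assuming two-digit data so that the tens digit is the stem and the units digit is the leaf; the function name `stem_and_leaf` is introduced here for illustration only.

```python
from collections import defaultdict

# Sample values copied from the slide.
data = [24, 24, 26, 26, 26, 27, 27, 27, 27, 28, 29, 30, 30,
        31, 33, 35, 36, 36, 37, 37, 43, 45, 45, 46, 48, 49,
        50, 50, 51, 53, 53, 55, 56, 57, 58, 59, 59, 60, 60]

def stem_and_leaf(values):
    """Group values by tens digit (stem); each leaf is a units digit."""
    groups = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        groups[stem].append(leaf)
    # Join the leaves of each stem into one string, as on the slide.
    return {stem: "".join(str(leaf) for leaf in leaves)
            for stem, leaves in groups.items()}

plot = stem_and_leaf(data)
for stem, leaves in plot.items():
    print(stem, "|", leaves)
```

Because the values are sorted before grouping, the sampling order is lost, which is the limitation noted in the bullet points.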
Box Plot
• Box Plot presents the median, first and third quartiles, and outliers. It is used to compare samples.
• The box plot has two whiskers and two parts of the box, each representing one quarter of the data.
• ‘Whiskers’ extend from the top and bottom of the box and end at the most extreme data point that is not an outlier
• Interquartile range (IQR) is the difference between the third quartile and first quartile
• Outliers are points that lie more than 1.5 IQR below the first quartile or more than 1.5 IQR above the third quartile
Anatomy of a Box Plot
[Figure: box plot with labelled parts: outliers (marked X); third quartile; median; first quartile; upper whisker ending at the largest data point within 1.5 IQR of the third quartile; lower whisker ending at the smallest data point within 1.5 IQR of the first quartile]
(Taken from Navidi W., Statistics for Engineers and Scientists)
Distribution Shape and Box Plot
[Figure: box plots with Q1, Q2 and Q3 marked for left-skewed, symmetric and right-skewed distributions]
Histogram
• Most common form of graphical representation of a frequency distribution
• Useful in displaying shape, location and variability of the data
• Emphasizes irregularities and unusual features
• Sometimes drawing a histogram can be enough to solve an engineering problem
[Figure: histogram of Frequency of Pupils (axis from 0 to 45) against Marks, in classes 11-20, 21-30, 31-40, 41-50, 51-60, 61-70, 71-80 and 81-90]
Cumulative Frequency
A point on the horizontal axis of the cumulative frequency graph represents a possible data value.
The corresponding height of the plot gives the number of data values that are less than or equal to it.
A cumulative frequency plot is called an ogive.
[Figure: Cumulative Frequency (axis from 0 to 180) against Marks (axis from 10 to 80)]
Pareto Diagram
• Orders each type of failure or defect according to its frequency
• Very useful in the analysis of defect data in manufacturing systems. Helps engineers identify important defects and their causes
• When a process is identified as a candidate for improvement, the first step is to collect data on the frequency of each type of failure and then present the data on a Pareto Diagram
• Always arrange categories in descending order of frequency of occurrence, that is, the most frequently occurring is on the left, followed by the next most frequently occurring
• The horizontal scale of a Pareto Diagram is usually categorical classifications
Pareto Diagram
The Pareto Diagram highlights the relatively few types of defects that are responsible for most of the observed defects.
Pareto diagram is an important part of a quality improvement program as it forces attention to the most critical defects.
Pareto diagram graphically depicts Pareto’s empirical law that any assortment of events consists of a few major and many minor elements. Typically, two or three elements will account for more than half the total frequency.
It is much easier to reduce or eliminate frequently occurring defects than rare ones.
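The ordering step behind a Pareto Diagram can be sketched in a few lines. The defect names and counts below are invented for illustration and do not come from the slides; `pareto_table` is a hypothetical helper that sorts categories by descending frequency and adds a running cumulative percentage.

```python
# Hypothetical defect counts, invented for illustration only.
defects = {"scratch": 12, "misalignment": 45, "crack": 5,
           "solder bridge": 30, "other": 8}

def pareto_table(counts):
    """Return (category, count, cumulative %) rows, most frequent first."""
    total = sum(counts.values())
    rows, running = [], 0
    for name, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        running += count
        rows.append((name, count, round(100 * running / total, 1)))
    return rows

rows = pareto_table(defects)
for name, count, cumulative in rows:
    print(f"{name:15s} {count:3d} {cumulative:6.1f}%")
```

In this made-up data set the two most frequent categories already account for 75% of all defects, which is the pattern the Pareto Diagram is meant to expose.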
Mean
• Defined as the sum of the observations divided by
sample size
• To emphasize that it is based on a set of observations, it
is often referred to as the sample mean
• It indicates the center of the data
• Affected by extreme values (outliers)
0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10 12 14
Mean = 5 Mean = 6
Median
• The median of a sample is the middle value after the data are arranged from smallest to largest
• It is not affected by extreme (very large or very small) values (outliers)
• If n numbers are ordered from smallest to largest:
– If n is odd, the median is the number in position (n+1)/2
– If n is even, the median is the average of the numbers in positions (n/2) and (n/2 + 1)
0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10 12 14
Median = 5 Median = 5
Mode
• Most frequently occurring value in a
sample
• There may be no mode, there may be
several modes
• It is not affected by extreme values
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
Mode = 9
0 1 2 3 4 5 6
No Mode
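The effect of an extreme value on the three measures of central tendency can be checked with Python's standard `statistics` module. This is a minimal sketch; the sample values are hypothetical and were chosen only so that the arithmetic is easy to follow.

```python
import statistics

sample = [1, 3, 4, 5, 5, 6, 7, 9]     # hypothetical observations
with_outlier = sample + [50]          # same data plus one extreme value

summary = {
    "mean": (statistics.mean(sample), statistics.mean(with_outlier)),
    "median": (statistics.median(sample), statistics.median(with_outlier)),
    # multimode returns every value that ties for the highest frequency
    "modes": (statistics.multimode(sample), statistics.multimode(with_outlier)),
}

for name, (before, after) in summary.items():
    print(f"{name:6s} without outlier: {before}  with outlier: {after}")
```

Only the mean moves when the extreme value is added. Note that `multimode` returns all values when every value occurs once, whereas the slide describes that case as having no mode.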
Time Series Plot
• Also known as Line Graph or Run Chart
• Displays data in a time sequence for a given period of time
• Used to monitor whether there is a systematic change of the data over time (trend)
Scatter Diagram
• Two sets of data are plotted against each other, one on each axis
• Shows whether a relationship exists between the 2 sets of data, i.e. how one varies with the other
Shape of a Distribution
• Describes how data is distributed
• Measures of shape
– Symmetric or Skewed
Left-Skewed: Mean < Median < Mode
Symmetric: Mean = Median = Mode
Right-Skewed: Mode < Median < Mean
Quartiles and Percentiles
• The quartiles are the 25th, 50th and 75th percentiles
– First quartile Q1 = 25th percentile
– Second quartile Q2 = 50th percentile
– Third quartile Q3 = 75th percentile
• Second quartile is equal to the median
• Interquartile range = third quartile - first quartile
• Example: Percentiles are often used to interpret scores on standardized tests. If a student is informed that her score is at the 70th percentile, this means that 70 percent of the students who took the test received lower scores.
Determining Quartiles and Percentiles
To calculate the sample 100p-th percentile (0 < p < 1):
1. Order the n observations from smallest to largest
2. Determine the product np
If np is not an integer, round it up to the next integer and find the corresponding ordered value.
If np is an integer, say k, calculate the mean of the k-th and the (k+1)-st ordered observations.
Example:
If n is 80, in order to find Q1, first obtain np
np = (80)(0.25)=20
Since np is an integer, Q1 is obtained by taking the
average of the 20th and 21st ordered observations.
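The two-step rule above can be sketched as a small function. This is an illustration under stated assumptions: the percentile is passed as a whole number `pct` (so p = pct/100) to keep the "is np an integer" test exact, and the 80 observations are hypothetical.

```python
def percentile(values, pct):
    """Sample percentile by the slide's rule; pct is a whole number, 0 < pct < 100."""
    ordered = sorted(values)
    # np = n * pct / 100, split into its whole part k and a remainder
    k, remainder = divmod(len(ordered) * pct, 100)
    if remainder:
        # np is not an integer: round up to k + 1 and take that ordered value
        return ordered[k]
    # np is the integer k: average the k-th and (k+1)-st ordered values
    return (ordered[k - 1] + ordered[k]) / 2

observations = list(range(1, 81))   # hypothetical sample with n = 80
q1 = percentile(observations, 25)   # np = 20: average of 20th and 21st values
q3 = percentile(observations, 75)   # np = 60: average of 60th and 61st values
iqr = q3 - q1
print(q1, q3, iqr)
```

Other textbooks and software packages use slightly different interpolation rules, so their quartiles may differ a little from the values this rule gives.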
Position of Quartiles and Percentiles
[Figure: cumulative frequency graph with the 25th, 50th and 75th percentiles marked against Q1, Q2 and Q3]
Dispersion
• Measures the spread of the values around the central tendency
• 2 common measures: range and standard deviation
• Standard deviation is an important measure of the variation in the data. You will learn more about it!
Today’s Problem
Conclusion
• Different graphical representations have different advantages
• Stem & Leaf Plot is a compact way to represent data, and provides some indication of its shape
• Box Plot presents the median, first and third quartiles, and outliers. It is used to compare samples
• Histogram is a common form of graphical representation of frequency distribution used for displaying shape, location and variability of the data
• A suitable graphical representation should be chosen depending on what you want to display
Learning Outcomes
• Differentiate between Descriptive and Inferential Statistics
• Select the Appropriate Data Display Tools
– Frequency of Occurrence (Pie Chart, Pareto Chart)
– Distribution of Data (Stem and Leaf Plot, Box Plot, Histogram Plot)
– Trends over Time (Time Series Plot)
– Association (Scatter Diagram)
• Summary Measurements
– Distribution
– Measure of Central Tendency (Mean, Median, Mode)
– Dispersion (Range, Standard Deviation)
– Quartiles and Percentiles
P02 – Describe it with Venn
E214 : STATISTICAL METHODS FOR ENGINEERING
Describe it with Venn
The Land Transport Authority of Singapore (LTA) wants to find out whether Republic Polytechnic students have been adequately served by the public transport system. They are planning to conduct a survey to find out the proportion of students who take public transport to school, as well as the number who get to school on time using public transport. The mode of transport, whether it is by means of bus, train or both, should be indicated in the survey. How would you advise LTA to conduct this survey? When the survey is completed, how can the responses be analyzed using a Venn diagram to determine if the public transport system serving Republic Polytechnic is satisfactory?
Sample Space and Events
• In statistics, a set of all possible outcomes of an
experiment is called a sample space.
• Sample spaces are usually denoted by the
letter S.
• In statistics, any subset of a sample space is
called an event.
• A subset is any part of a set, including the
whole set, and a set called the empty set (denoted by Ø).
• The empty set has no elements at all.
Sample Space and Events
An Example:
• Roll a die and observe the number obtained.
• In this example, rolling the die is the experiment.
• The only possible outcomes are 1, 2, 3, 4, 5 or 6.
• The event that the die comes up an even number is described as follows:
– The sample space for the experiment is S = {1,2,3,4,5,6}
– Coming up an even number corresponds to the event Even = {2,4,6}
Mutually Exclusive Events
• Mutually exclusive events have no elements in common.
• For example, a single coin toss cannot come up both heads and tails.
• Such events are said to be mutually exclusive.
• The events A and B are said to be mutually exclusive if they have no outcomes in common.
• E.g. Rain ('A') or no rain ('B') at 12pm are mutually exclusive events
[Figure: Venn diagram of sample space S with non-overlapping events A and B]
Union
• If A and B are any two sets in a sample space S, their union, denoted by A ∪ B, is the subset of S that contains all elements that are either in A, in B, or in both.
• In words, A ∪ B means “A and/or B”.
• E.g. Number of students clearing either PP ('A') or CE ('B') or both.
[Figure: Venn diagram of S with the union of A and B shaded]
Intersection
• If A and B are any two sets in a sample space S, their intersection, denoted by A ∩ B, is the subset of S that contains all elements that are in both A and B.
• In words, A ∩ B means “A and B”.
• In the previous example, A ∩ B indicates the number of students clearing both CE and PP.
[Figure: Venn diagram of S with the intersection of A and B shaded]
Complement
• The complement of an event A, denoted by Ac, is the subset of S that contains all the elements of S that are not in A.
• In words, Ac means “not A”.
• E.g. Ac for Event A which is taking bus means all responses other than 'Bus', i.e. 'Train' or 'Both' or 'Others'
[Figure: Venn diagram of S with event A and its complement Ā]
Probability Theorems
Given a finite sample space S and an event A in S, we define P(A), the probability of A, to be a value of an additive set function that satisfies the following three conditions:
• Axiom 1: 0 ≤ P(A) ≤ 1 for each event A in S.
• Axiom 2: P(S) = 1.
• Axiom 3: If A and B are mutually exclusive events in S, then P(A ∪ B) = P(A) + P(B).
(An Axiom is any starting assumption from which other statements are logically derived. It requires no proof.)
• Probability functions must be additive.
– If A1, A2, A3, ..., An are mutually exclusive events in a sample space S, then
P(A1 ∪ A2 ∪ A3 ∪ … ∪ An) = P(A1) + P(A2) + P(A3) + … + P(An)
• For any event A, P(Ac) = 1 – P(A)
Addition Rule
When A and B are not mutually exclusive events in S,
P(A ∪ B) = P(A) + P(B) – P(A ∩ B)
When A and B are mutually exclusive events in S,
P(A ∪ B) = P(A) + P(B), since P(A ∩ B) = 0
[Figure: Venn diagram of S with overlapping events A and B, showing A ∩ B and A ∪ B]
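The addition rule and the complement rule can be verified on the die example using Python sets. In this sketch A is the even-number event from the earlier slide, while B (a number greater than 3) is a hypothetical second event chosen so that A and B overlap; outcomes are assumed equally likely.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}   # sample space for one roll of a die
A = {2, 4, 6}            # even number
B = {4, 5, 6}            # hypothetical second event: number greater than 3

def prob(event):
    """Probability of an event when all outcomes in S are equally likely."""
    return Fraction(len(event), len(S))

union_direct = prob(A | B)                        # count outcomes in A ∪ B
union_by_rule = prob(A) + prob(B) - prob(A & B)   # addition rule
complement = prob(S - A)                          # P(Ac)

print(union_direct, union_by_rule, complement)
```

Without the subtraction of P(A ∩ B), the outcomes 4 and 6 would be counted twice.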
Mutually Exclusive vs Independent
If A and B are mutually exclusive events, then A ∩ B = Ø, so the probability of A occurring given that B has occurred is P(A|B) = 0, and likewise P(B|A) = 0.
Two events are considered to be independent if the occurrence of one is not affected by the occurrence or non-occurrence of the other.
The Multiplication Rule below applies if and only if A and B are independent:
P(A ∩ B) = P(A).P(B)
Hence, P(A|B) = P(A) and P(B|A) = P(B)
Independent Events
• If A and B are independent, then the following pairs are also independent:
– A and Bc, Ac and B, and Ac and Bc
• An Example
A die is thrown twice. Find the probability of obtaining a 4 on the first throw and an odd number on the second throw.
Let A be the event 'a 4 is obtained on the 1st throw' => P(A) = 1/6
Let B be the event 'an odd number is obtained on the 2nd throw' => P(B) = 3/6 = 1/2, since B = {1, 3, 5}
Since the result of the 2nd throw is clearly not affected by the result of the 1st throw, A and B are independent events.
Hence, P(A ∩ B) = P(A).P(B) = 1/6 . 1/2 = 1/12
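The result can be cross-checked by listing all 36 equally likely outcomes of the two throws and counting, rather than relying on the multiplication rule. This is a minimal sketch; the helper name `prob` is introduced here for illustration.

```python
from fractions import Fraction
from itertools import product

# All ordered pairs (first throw, second throw)
outcomes = list(product(range(1, 7), repeat=2))

A = {o for o in outcomes if o[0] == 4}        # a 4 on the 1st throw
B = {o for o in outcomes if o[1] % 2 == 1}    # an odd number on the 2nd throw

def prob(event):
    return Fraction(len(event), len(outcomes))

p_both = prob(A & B)
is_independent = p_both == prob(A) * prob(B)
print(prob(A), prob(B), p_both, is_independent)
```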
Today’s Problem
Survey Questions:
Question 1: What is your primary mode of transport to RP?
Response: Bus, Train, Both, Others
Question 2: Under normal circumstances, do you usually arrive in RP on time?
Response: Yes, No
Proposed Solution
[Figure: Venn diagram of sample space S with events A, B, C and D. The region counts shown are 9, 2, 1, 2, 0, 2, 5 and 4, which total 25 respondents.]
Events
A : By Bus
B : By Train
C : By Bus and Train
Ac ∩ Bc ∩ Cc : By Others
D : On Time
Dc : Late
A ∩ Dc : Late by Bus
B ∩ D : Punctual by Train
(A ∩ D) ∪ (B ∩ D) ∪ (C ∩ D) : Punctual by Public Transport
Analysis
• 16/25 or 64% of respondents take public transport to RP and arrive on time.
• It may be more meaningful to estimate the probability of students being on time given that they take public transport (16/23 ≈ 70%). This is known as conditional probability.
• A follow-up survey can be conducted to determine the reasons for arriving late by bus
• A bigger sample size is needed to better represent the population that takes the bus to RP, and respondents should be selected randomly to avoid bias, e.g. respondents should have addresses in many parts of the island
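The two percentages quoted in the analysis differ only in their denominator. The sketch below uses the three counts stated on the slide (25 respondents, 23 public transport users, 16 of them on time); the variable names are introduced here for illustration.

```python
from fractions import Fraction

respondents = 25                 # everyone surveyed
public_transport = 23            # respondents using bus, train or both
on_time_public_transport = 16    # of those, the ones who arrive on time

# Joint probability: on time AND public transport, out of all respondents
p_joint = Fraction(on_time_public_transport, respondents)

# Conditional probability: on time GIVEN public transport
p_conditional = Fraction(on_time_public_transport, public_transport)

# The same value follows from the definition P(D | PT) = P(D and PT) / P(PT)
p_public_transport = Fraction(public_transport, respondents)
print(float(p_joint), round(float(p_conditional), 3))
```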
Learning Outcomes
• Concept of:
– Sample Space
– Events
– Mutually Exclusive Events
– Independent Events
– Unions, Intersections, and Complements
• Venn diagrams
• Additive Rules
• Multiplicative Rules
P03 – Dependent or Independent
E214 : STATISTICAL METHODS FOR ENGINEERING
Dependent or Independent
Having developed a microcontroller-based actuator that automatically lowers window shades in strong sunlight, Nathan knows he has to put the system through the rigors of testing before he can unveil it. His ego was bruised when home tests conducted by his schoolmates revealed that the shades did not lower with every incidence of strong sunlight. Knowing that a reliability of 95 percent is the industry tolerance for microcontroller-based actuators, Nathan is determined to ascertain the reliability of his invention. He commissioned a laboratory to test the reliability of the system over the course of 100 incidences of strong sunlight. Attached are the schematic of the system and the results of the tests conducted by the laboratory:
Schematic of the System and Test Data
Based on the data collected, help Nathan analyse the reliability of the system.
Dependent Events
• Two events are said to be dependent if the occurrence or outcome of the first event affects the probability of occurrence of the second event.
Probability of both events occurring, P(A ∩ B) = P(A).P(B|A) = P(B).P(A|B)
• Example
There are 2 red balls and 3 blue balls in a bag. If two balls are drawn at random without replacement, find the probability that both balls are red.
P(both balls red) = P(1st ball is red).P(2nd ball is red | 1st ball is red) = 2/5 x 1/4 = 1/10
The probability that the 2nd ball is red is clearly dependent on the result of the 1st draw.
Conditional Probability
• The probability of event A occurring given that event B has already occurred is written as P(A|B)
[Figure: Venn diagram of S with overlapping events A and B]
P(A|B) = P(A∩B)/P(B) …….. (1)
P(B|A) = P(B∩A)/P(A) …….. (2)
Since P(A∩B) = P(B∩A), combining (1) and (2) gives
P(A|B).P(B) = P(B|A).P(A)
P(A|B) = [P(B|A).P(A)]/P(B) ------- BAYES’ RULE
Note that, in general, P(A|B) ≠ P(B|A).
Conditional Probability
Example
• Given that a heart is picked at random from a pack of 52 playing cards, find the probability that it is a picture card
• P(picture card | heart card)
= P(picture card ∩ heart card) / P(heart card)
= (3/52) / (13/52)
= 3/13
Bayes’ Theorem: An Example
Andy, Ben and Carrie pack biscuits in a factory. From the batch allotted to them, Andy packs 55%, Ben 30% and Carrie 15%. The probability that Andy breaks some biscuits in a packet is 0.7, for Ben it is 0.2, and for Carrie it is 0.1. What is the probability that a randomly selected packet with broken biscuits was packed by Andy?
Solution:
Let A be the event ‘the packet was packed by Andy’, B be the event ‘the packet was packed by Ben’, C be the event ‘the packet was packed by Carrie’, and D be the event ‘the packet contains broken biscuits’.
Given P(A) = 0.55, P(B) = 0.3, P(C) = 0.15, P(D|A) = 0.7, P(D|B) = 0.2, P(D|C) = 0.1
We require P(A|D). Using Bayes’ Rule,
P(A|D) = P(D|A).P(A) / P(D)
P(D) = P(D|A).P(A) + P(D|B).P(B) + P(D|C).P(C)
= (0.7)(0.55) + (0.2)(0.3) + (0.1)(0.15) = 0.46
P(A|D) = (0.7)(0.55) / 0.46 = 0.837
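The same calculation can be written so that it returns the posterior probability for every packer at once. This is a minimal sketch using the figures given in the example; the dictionary names are introduced here for illustration.

```python
# P(packer) and P(broken | packer), as given in the example
share_packed = {"Andy": 0.55, "Ben": 0.30, "Carrie": 0.15}
p_broken_given = {"Andy": 0.7, "Ben": 0.2, "Carrie": 0.1}

# Total probability of a broken packet, P(D)
p_broken = sum(share_packed[w] * p_broken_given[w] for w in share_packed)

# Bayes' Rule: P(packer | broken) = P(broken | packer) P(packer) / P(D)
posterior = {w: share_packed[w] * p_broken_given[w] / p_broken
             for w in share_packed}

for worker, p in posterior.items():
    print(f"P({worker} | broken) = {p:.3f}")
```

The three posterior probabilities sum to 1, since a broken packet must have been packed by one of the three workers.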
Probability Tree
• The probability of the final outcome is given by the sum of the products of the probabilities along each branch of the tree that leads to that outcome.
• A probability tree can be used for both dependent and independent events
• Using the probability tree to solve the preceding example:
Packer | P(packer) | P(Broken | packer) | P(Not Broken | packer) | P(packer and Broken)
A | 0.55 | 0.7 | 0.3 | (0.55)(0.7)
B | 0.30 | 0.2 | 0.8 | (0.30)(0.2)
C | 0.15 | 0.1 | 0.9 | (0.15)(0.1)
P(A|D) = (0.55)(0.7) / [(0.55)(0.7) + (0.3)(0.2) + (0.15)(0.1)] = 0.837
Sequence of Dependent Events
Example
A bag contains eight green counters and three black counters. Two counters are drawn, one after the other without replacement. Find the probability of drawing one green and one black counter, in any order.
1st Draw: P(G1) = 8/11, P(B1) = 3/11
2nd Draw after G1: P(G2|G1) = 7/10, P(B2|G1) = 3/10
2nd Draw after B1: P(G2|B1) = 8/10, P(B2|B1) = 2/10
P(Drawing 1G & 1B) = (8/11)(3/10) + (3/11)(8/10) = 24/110 + 24/110 = 24/55
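The tree calculation can be checked against a direct count of every ordered pair of counters. In this sketch the counters are treated as distinguishable objects, so all 11 x 10 ordered draws are equally likely; the variable names are introduced here for illustration.

```python
from fractions import Fraction
from itertools import permutations

# Sum of the two branches: green then black, or black then green
by_tree = Fraction(8, 11) * Fraction(3, 10) + Fraction(3, 11) * Fraction(8, 10)

# Cross-check by listing every ordered draw of 2 counters from the bag
counters = ["G"] * 8 + ["B"] * 3
draws = list(permutations(counters, 2))
mixed = [d for d in draws if set(d) == {"G", "B"}]
by_counting = Fraction(len(mixed), len(draws))

print(by_tree, by_counting)
```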
Proposed Probability Tree Solution
Stage 1: Strong sunlight, and at least one sensor works. Yes: 0.9726; No: 0.30(0.24)(0.38) = 0.0274
Stage 2: Microcontroller works. Yes: 0.94; No: 0.06
Stage 3: Actuator works. Yes: 0.88; No: 0.12
Sensor | Microcontroller | Actuator | System state (probability)
Yes | Yes | Yes | 0.804535 (Prob. of shade working)
Yes | Yes | No | 0.109709
Yes | No | Yes | 0.051353
Yes | No | No | 0.007003
No | Yes | Yes | 0.022665
No | Yes | No | 0.003091
No | No | Yes | 0.001447
No | No | No | 0.000197
Solution Analysis
Assumptions
1. Non-mutually exclusive events: Failure of one component does not preclude the failure of another.
Example: A sensor and the actuator can fail together.
2. Independent failures: The likelihood of a component failing is not affected by the occurrence of other failures.
Example: The probability of the microcontroller working is the same regardless of whether the sensors are functioning.
Solution Analysis
From the Probability Tree:
• Probability of system working successfully on a sunny day = 0.8045
• Probability of shade not lowering on a sunny day = 1 – 0.8045 = 0.1955
• P(system fails and no sensor works) = 0.0274
• P(system fails and only microcontroller is faulty) = 0.0514
• P(system fails and only actuator is faulty) = 0.1097
Solution Analysis
Conditional Probability:
• P(system fails | microcontroller is faulty) = 1
• P(system fails | sensor is working) = P(system fails and sensor works) / P(sensor works) = 0.1681 / 0.9726 = 0.1728
• P(only actuator is faulty | system fails) = P(only actuator fails and system fails) / P(system fails) = 0.1097 / (1 – 0.8045) = 0.5613
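The whole tree and the conditional probabilities derived from it can be recomputed in one pass. This is a sketch under two assumptions: the stages fail independently (as stated in the assumptions slide), and the three factors 0.30, 0.24 and 0.38 in the tree are the failure probabilities of three individual sensors. Results may differ from the slides in the fourth decimal place because the slides round the sensor-stage probability to 0.9726 before multiplying.

```python
from itertools import product

# Probability that each stage works, read from the probability tree.
p_works = {
    "sensor": 1 - 0.30 * 0.24 * 0.38,   # at least one sensor works (assumed reading)
    "microcontroller": 0.94,
    "actuator": 0.88,
}
stages = list(p_works)

# One leaf per combination of working (True) and failed (False) stages
leaves = {}
for states in product([True, False], repeat=len(stages)):
    prob = 1.0
    for stage, works in zip(stages, states):
        prob *= p_works[stage] if works else 1 - p_works[stage]
    leaves[states] = prob

p_system_works = leaves[(True, True, True)]
p_system_fails = 1 - p_system_works
p_only_actuator_faulty = leaves[(True, True, False)]
p_fail_and_sensor_ok = sum(p for s, p in leaves.items() if s[0] and not all(s))

p_fail_given_sensor_ok = p_fail_and_sensor_ok / p_works["sensor"]
p_only_actuator_given_fail = p_only_actuator_faulty / p_system_fails

print(round(p_system_works, 4), round(p_fail_given_sensor_ok, 4),
      round(p_only_actuator_given_fail, 4))
```

The computed reliability of about 0.80 is well below the 95 percent industry tolerance mentioned in the problem statement.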
Learning Outcomes
• Conditional Probability
• Bayes’ Theorem
• Probability Tree
P04 – MANY COMBINATIONS
E214 : STATISTICAL METHODS FOR ENGINEERING
Many Combinations
You are a perfume connoisseur who is in charge of creating a new family of perfumes. The scent of a perfume is formed by different components known as notes. There are 3 types of notes, namely top note, middle note and base note. The different notes unfold over time when a perfume is applied and combine to describe the scent of the perfume. You have shortlisted a list of 16 aromatic compounds of which 3 will be used for top notes, 8 for middle notes and 5 for base notes. The new perfume is designed to have 6 different aromatic compounds and must contain at least one of each type of notes. If it takes the company 3 days to produce and test 100 different perfumes, how much time is required for testing all possible combinations of aromas? You may assume that each different perfume would contain a unique combination of compounds.
Illustrative Figures on Perfume Notes
Permutations
• A permutation is an ordered arrangement of distinct objects.
• One permutation differs from another if the order of arrangement differs or if the content differs.
• How many ways are there to arrange three boys –A, B, and C?
• The possible arrangements are ABC, ACB, BAC, BCA, CAB, CBA. There are six ways.
• Each arrangement is called a permutation.
Permutation of n different objects
• For the first boy, we can choose from A, B or C (3 ways).
• Once the first boy is chosen, the second boy can be chosen from the 2 remaining boys (2 ways).
• The third boy has to be the remaining boy (1 way).
• Number of ways = 3 x 2 x 1 = 3! = 6
• Number of ways of arranging n different objects is n!
• n! = n(n-1)(n-2)….(3)(2)(1)
Permutation of n distinct objects
Example:
• How many ways can the letters A, B, C and D be arranged?
Approach:
• The 1st letter can be chosen in 4 ways (either A or B or C or D)
• The 2nd letter can be chosen in 3 ways.
• The 3rd letter can be chosen in 2 ways.
• The 4th letter can be chosen in only 1 way.
• Therefore, number of ways of arranging 4 letters
= 4! = 24
Permutation of n objects (not all distinct)
Example:
• Suppose that instead of the letters A, B, C, D, we have the letters A, A, A, D
Approach:
• The 24 arrangements reduce to 4: AAAD, AADA, ADAA, DAAA
• The number of ways of arranging 4 objects, of which 3 are alike = 4! / 3! = 4
• The number of ways of arranging n objects of which p are of one type, q of another type, r of a third type and so on is n! / (p! q! r! ...)
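The formula for arrangements of objects that are not all distinct can be checked against a brute-force listing. This is a minimal sketch; `distinct_arrangements` is a hypothetical helper name introduced here.

```python
from collections import Counter
from itertools import permutations
from math import factorial

def distinct_arrangements(items):
    """n! divided by the factorial of each group of alike objects."""
    result = factorial(len(items))
    for group_size in Counter(items).values():
        result //= factorial(group_size)
    return result

by_formula = distinct_arrangements("AAAD")
# Brute force: generate all 4! orderings and keep only the distinct ones
by_listing = sorted({"".join(p) for p in permutations("AAAD")})

print(by_formula, by_listing)
```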
Permutation of r objects from n objects
• Consider the number of ways of placing 3 of the letters A, B, C, D, E, F, G in 3 empty spaces.
• The 1st space can be filled in 7 ways. The 2nd space can be filled in 6 ways. The 3rd space can be filled in 5 ways.
• Therefore, there are (7)(6)(5) = 210 ways of arranging 3 letters taken from 7 letters.
• Number of permutations of 3 objects taken from 7
= 7P3 = (7)(6)(5) = (7)(6)(5)(4)(3)(2)(1) / (4)(3)(2)(1) = 7!/4! = 7!/(7-3)!
• Number of permutations of r objects taken from n different objects is nPr = n!/(n-r)!
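The value of 7P3 can be confirmed in three ways: from the factorial formula, from the standard library function `math.perm` (available in Python 3.8 and later), and by listing the arrangements. A minimal sketch:

```python
from itertools import permutations
from math import factorial, perm

letters = "ABCDEFG"

by_formula = factorial(7) // factorial(7 - 3)          # n! / (n - r)!
by_library = perm(7, 3)                                # built-in nPr
by_listing = len(list(permutations(letters, 3)))       # count ordered triples

print(by_formula, by_library, by_listing)
```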
Combinations
• A combination is a selection of distinct objects where one combination differs from another only if the content of the selection differs. Order does not matter.
• The number of combinations of n different objects taken r at a time, denoted by nCr, is n! / (r! (n-r)!)
Proof:
We are interested in determining the number of combinations when there are n distinct objects to be selected r at a time. Since the number of permutations is the number of ways to select r objects from the n and then permute the r objects, we note that nPr = r! nCr
Hence nCr = nPr / r! = n! / (r! (n-r)!)
Combinations: Example 1
• Four letters are chosen at random from the word RANDOMLY. Find the probability that all four letters chosen are consonants.
• Let S be the possibility space, then n(S) = 8C4 = 70
• Let E be the event ‘4 consonants are chosen’. As there are 6 consonants, n(E) = 6C4 = 15
• P(E) = n(E)/n(S) = 15/70 = 3/14
Combinations: Example 2
• Suppose a box contains 8 chip processors, 3 of which are
defective. If 3 are sold at random, find the probabilities that:
– Exactly 2 are defective
– All 3 are defective
– At least 1 is defective
• Taking each chip as individual entities, we need to determine
all combinations when 3 chips are sold 8C3
Exactly 2 are defective
– Combination of 2 defective chips and 1 good chip 3C2 x 5C1
Hence, probability is 3C2 x 5C1/ 8C3 = (3x5) / 56 = 0.27
Combinations: Example 2
All 3 are defective
– Combination of 3 defective chips 3C3
Hence, probability is 3C3/ 8C3= 1 / 56 = 0.018
At least 1 is defective
– This is the complement of no defective chips 1 – P(no defective chip)
– Combination of 3 good chips 5C3
Hence, probability is 1- 5C3/ 8C3 = 1- 10/56 = 0.82
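All three answers in this example follow one pattern: choose k of the defective chips and the rest from the good chips, then divide by the number of ways to choose the 3 chips sold. The sketch below wraps that pattern in a hypothetical helper, `p_defective`, using `math.comb` from the standard library.

```python
from fractions import Fraction
from math import comb

def p_defective(k, total=8, defective=3, sold=3):
    """P(exactly k defective chips among those sold)."""
    good = total - defective
    return Fraction(comb(defective, k) * comb(good, sold - k), comb(total, sold))

p_exactly_two = p_defective(2)        # 3C2 x 5C1 / 8C3
p_all_three = p_defective(3)          # 3C3 / 8C3
p_at_least_one = 1 - p_defective(0)   # complement of no defective chip

print(float(p_exactly_two), float(p_all_three), float(p_at_least_one))
```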
Proposed Solution
• The new perfume should have 6 different aromas
with at least 1 aroma in each note (top, middle and
base)
• Possible combinations are:
– 3 top notes + 2 middle + 1 base
– 3 top notes + 1 middle + 2 base
– 2 top notes + 3 middle + 1 base
– 2 top notes + 2 middle + 2 base
– 2 top notes + 1 middle + 3 base
…and so on. In total there are 9 different combinations of top, middle and base notes.
Proposed Solution
Total number of ways to create the perfume:
• 3 top notes + 1 middle + 2 base = 3C3 x 8C1 x 5C2 = 80
• 3 top notes + 2 middle + 1 base = 3C3 x 8C2 x 5C1 = 140
• 2 top notes + 3 middle + 1 base = 3C2 x 8C3 x 5C1 = 840
• 2 top notes + 2 middle + 2 base = 3C2 x 8C2 x 5C2 = 840
• 2 top notes + 1 middle + 3 base = 3C2 x 8C1 x 5C3 = 240
• 1 top note + 4 middle + 1 base = 3C1 x 8C4 x 5C1 = 1050
• 1 top note + 3 middle + 2 base = 3C1 x 8C3 x 5C2 = 1680
• 1 top note + 2 middle + 3 base = 3C1 x 8C2 x 5C3 = 840
• 1 top note + 1 middle + 4 base = 3C1 x 8C1 x 5C4 = 120
• Total = 5830
• Time required = 5830/100*3 = 174.9, i.e. about 175 days
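The nine cases can be enumerated programmatically, which also gives a check on each product and on the total. This is a minimal sketch using the numbers in the problem statement (3 top, 8 middle and 5 base compounds, 6 compounds per perfume, at least one of each type); the variable names are introduced here for illustration.

```python
from math import ceil, comb

available = {"top": 3, "middle": 8, "base": 5}
compounds_per_perfume = 6

ways_by_split = {}
for top in range(1, available["top"] + 1):
    for middle in range(1, available["middle"] + 1):
        base = compounds_per_perfume - top - middle
        if 1 <= base <= available["base"]:
            ways_by_split[(top, middle, base)] = (
                comb(available["top"], top)
                * comb(available["middle"], middle)
                * comb(available["base"], base)
            )

total = sum(ways_by_split.values())
days = total / 100 * 3          # 100 perfumes take 3 days to produce and test

for split, ways in ways_by_split.items():
    print(split, ways)
print(total, ceil(days))
```

As an independent check, the same total follows from counting all 16C6 selections and removing those that miss a note type: 8008 - (1716 + 28 + 462) + 28 = 5830.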
Learning Outcomes
• Permutations
• Combinations
P05 – Chance Winnings
E214 : STATISTICAL METHODS FOR ENGINEERING
Chance Winnings
Entrusted with raising funds, and inspired by his recent holiday to Las Vegas, James determines that the fruit machine is a sure bet. It has, after all, earned the informal name of the one-armed bandit owing to its appearance and its ability to leave the gamer penniless. Wanting to raise as much money as possible for the charity, James knows he must carefully consider the payout for each winning combination. If he pays out too much, he may end up making a loss over the three-day fund raiser. If the payout is too little, it might not generate any interest in playing the fruit machine. The fruit machine has four windows, each showing at any one time either a lemon, an orange, an apple or cherries, and has been configured to pay out when at least three windows show the same fruit. Given that the probability of a window showing a particular fruit is 0.4 for lemons, 0.3 for oranges, 0.2 for apples and 0.1 for cherries, how should James set the cost of each play and the payouts to support the fund-raising?
School of Engineering
E214 Statistical Methods for Engineering
P05 – Chance Winnings
School of Engineering
E214 Statistical Methods for Engineering
(2)
Discrete Random Variable
• A random variable (r.v.) is a variable whose values are determined by chance.
• Random variables are denoted by capital letters (X, Y, etc.) to distinguish them from their possible values given in lower case x, y.
• Discrete random variables can take on only a finite number of values or an infinite number of values that can be counted.
• Example: A die is thrown 6 times. Let X = number of 5’s obtained.
– Then X is a discrete r.v. and x = 0, 1, 2, 3, 4, 5, 6
School of Engineering
E214 Statistical Methods for Engineering
(3)
Basic Properties of a pdf
• The probability distribution of a discrete r.v. X is a list of the possible values of X together with their probabilities
f(x) = P[X=x]
• The probability of each event in the sample space must be between 0 and 1 inclusive:
0 ≤ P[X=x] ≤ 1 for all x
• The sum of the probabilities of all events in the sample space must equal 1:
∑all x P[X=x] = 1
School of Engineering
E214 Statistical Methods for Engineering
(4)
Probability Density Function (pdf)
• A discrete PDF consists of all possible values that a discrete r.v. can take on, together with the associated probabilities.
• Example: Let X represent the outcomes when a fair die is tossed once. The pdf of X is:
x 1 2 3 4 5 6
P[X=x] 1/6 1/6 1/6 1/6 1/6 1/6
where P[X=x] means probability that the r.v. X takes
the value x.
Formula form: P[X=x] =1/6, for x=1,2,3,4,5,6
School of Engineering
E214 Statistical Methods for Engineering
(5)
Expected Value E(X)
Let X be a discrete random variable. Then the
expected value of X, also known as the mean of
the r.v., is denoted by E(X):
E(X) = ∑all x xP(X=x)
School of Engineering
E214 Statistical Methods for Engineering
(6)
Example
• Find E(X) for the pdf of a single throw of a fair die.
x 1 2 3 4 5 6
P[X=x] 1/6 1/6 1/6 1/6 1/6 1/6
Solution:
• E(X) = ∑all x xP(X=x)
= (1/6)(1) + (1/6)(2) + (1/6)(3) + (1/6)(4) + (1/6)(5) + (1/6)(6)
= 21/6 = 3.5
School of Engineering
E214 Statistical Methods for Engineering
(7)
Properties of E(X)
• E(a) = a
• E(aX) = aE(X)
• E(aX + b) = aE(X) + b
• E[f(X) ± g(X)] = E[f(X)] ± E[g(X)]
where a and b are constant values
School of Engineering
E214 Statistical Methods for Engineering
(8)
Example
The r.v. X has pdf P(X=x) for x = 1, 2, 3, as tabulated below.
x 1 2 3
P(X = x) 0.1 0.6 0.3
Calculate E(3), E(X), E(5X), E(5X + 3), E(X^2), E(4X^2 – 3)
E(3) = ∑all x 3P(X=x) = 3(0.1) + 3(0.6) + 3(0.3) = 3
E(X) = ∑all x xP(X=x) = 1(0.1) + 2(0.6) + 3(0.3) = 2.2
School of Engineering
E214 Statistical Methods for Engineering
(9)
Example
E(5X) = ∑all x 5xP(X=x) = 5(0.1) + 10(0.6) + 15(0.3)
= 11 = 5E(X)
E(5X + 3) = ∑all x (5x + 3)P(X=x)
= 8(0.1) + 13(0.6) + 18(0.3) = 14 = 5E(X) + 3
E(X^2) = ∑all x x^2 P(X=x) = 1(0.1) + 4(0.6) + 9(0.3) = 5.2
E(4X^2 – 3) = ∑all x (4x^2 – 3)P(X=x) = 1(0.1) + 13(0.6) + 33(0.3) = 17.8 = 4E(X^2) – 3
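The worked expectations can be reproduced with one small helper that applies the definition E[g(X)] = ∑ g(x)P(X=x). This is only a sketch: the helper name `expect` is hypothetical, and the pdf is the one tabulated for this example.

```python
# pdf from the example: P(X = x) for x = 1, 2, 3
pdf = {1: 0.1, 2: 0.6, 3: 0.3}

def expect(g, pdf):
    """E[g(X)] = sum over all x of g(x) * P(X = x)."""
    return sum(g(x) * p for x, p in pdf.items())

e_const = expect(lambda x: 3, pdf)                # E(3)
e_x = expect(lambda x: x, pdf)                    # E(X)
e_5x = expect(lambda x: 5 * x, pdf)               # E(5X)
e_5x3 = expect(lambda x: 5 * x + 3, pdf)          # E(5X + 3)
e_x2 = expect(lambda x: x ** 2, pdf)              # E(X^2)
e_4x2m3 = expect(lambda x: 4 * x ** 2 - 3, pdf)   # E(4X^2 - 3)

print(round(e_const, 4), round(e_x, 4), round(e_5x, 4),
      round(e_5x3, 4), round(e_x2, 4), round(e_4x2m3, 4))
```

Comparing `e_5x3` with `5 * e_x + 3`, and `e_4x2m3` with `4 * e_x2 - 3`, illustrates the linearity properties listed earlier.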
School of Engineering
E214 Statistical Methods for Engineering
(10)
Variance, Var(X)
The variance of a discrete r.v. X measures the spread
or deviation of the r.v. about its mean value µ = E(X). It is
denoted by Var(X) or σ^2:
Var(X) = E[(X – µ)^2]
= E(X^2 – 2µX + µ^2) = E(X^2) – 2µE(X) + E(µ^2)
= E(X^2) – 2µ^2 + µ^2
= E(X^2) – µ^2
= E(X^2) – [E(X)]^2
School of Engineering
E214 Statistical Methods for Engineering
(11)
Example
The r.v. X has pdf shown below:
x 1 2 3 4 5
P(X = x) 0.1 0.3 0.2 0.3 0.1
Find Var(X).
E(X) = 1(0.1) + 2(0.3) + 3(0.2) + 4(0.3) + 5(0.1) = 3
E(X^2) = 1(0.1) + 4(0.3) + 9(0.2) + 16(0.3) + 25(0.1) = 10.4
Var(X) = E(X^2) – [E(X)]^2 = 10.4 – 3^2 = 1.4
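As a check, the variance of this pdf can be computed both from the definition E[(X – µ)^2] and from the shortcut E(X^2) – [E(X)]^2. The variable names in this sketch are illustrative.

```python
# pdf from the variance example
pdf = {1: 0.1, 2: 0.3, 3: 0.2, 4: 0.3, 5: 0.1}

mean = sum(x * p for x, p in pdf.items())            # E(X)
mean_sq = sum(x ** 2 * p for x, p in pdf.items())    # E(X^2)

var_shortcut = mean_sq - mean ** 2                   # E(X^2) - [E(X)]^2
var_definition = sum((x - mean) ** 2 * p             # E[(X - mu)^2]
                     for x, p in pdf.items())

print(round(mean, 4), round(mean_sq, 4),
      round(var_shortcut, 4), round(var_definition, 4))
```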
School of Engineering
E214 Statistical Methods for Engineering
(12)
Properties of Var(X)
Var(c) = 0, where c is any constant
Var(cX) = c^2 Var(X)
Var(cX + d) = c^2 Var(X), where d is a constant
Proof:
Var(cX) = E(c^2 X^2) – [E(cX)]^2 = c^2 E(X^2) – [cE(X)]^2
= c^2 E(X^2) – c^2 [E(X)]^2 = c^2 (E(X^2) – [E(X)]^2)
= c^2 Var(X)
School of Engineering
E214 Statistical Methods for Engineering
(13)
Discussion for Today’s Problem
Winning Combination | 3 Lemons | 3 Oranges | 3 Apples | 3 Cherries
Payout | $1 | $2 | $3 | $5
Winning Combination | 4 Lemons | 4 Oranges | 4 Apples | 4 Cherries
Payout | $3 | $4 | $6 | $12
School of Engineering
E214 Statistical Methods for Engineering
(14)
Discussion for Today’s Problem
P(lemons) = 0.4, P(oranges) = 0.3,
P(apples) = 0.2, P(cherries) = 0.1
Calculations for:
3 fruits of the same kind:
e.g. P(3 lemons) = 4C3 (0.4)^3 (1 – 0.4)^1 = 0.1536
4 fruits of the same kind:
e.g. P(4 oranges) = (0.3)^4 = 0.0081
P(James wins, i.e. no winning combination appears) = 1 – P(a winning combination appears)
School of Engineering
E214 Statistical Methods for Engineering
(15)
Discussion for Today’s Problem
Let the cost of one play be $1 and let X be James’ profit per play (in $).
Outcome | x | P(X=x)
No win | 1 | 0.7062
3 lemons | 0 | 0.1536
3 oranges | -1 | 0.0756
3 apples | -2 | 0.0256
3 cherries | -4 | 0.0036
4 lemons | -2 | 0.0256
4 oranges | -3 | 0.0081
4 apples | -5 | 0.0016
4 cherries | -11 | 0.0001
Expected profit per play, E(X) = $0.480
Variance of profit, Var(X) = 0.938
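The profit distribution, its expected value and its variance can be rebuilt from the fruit probabilities and the payout table. The sketch below assumes a $1 cost per play, as in the discussion; the dictionary and variable names are illustrative.

```python
from math import comb

p_fruit = {"lemon": 0.4, "orange": 0.3, "apple": 0.2, "cherry": 0.1}
payout_3 = {"lemon": 1, "orange": 2, "apple": 3, "cherry": 5}
payout_4 = {"lemon": 3, "orange": 4, "apple": 6, "cherry": 12}
COST = 1      # assumed cost of one play, in dollars
WINDOWS = 4

# Distribution of James' profit X as (profit, probability) pairs.
dist = []
for fruit, p in p_fruit.items():
    p3 = comb(WINDOWS, 3) * p ** 3 * (1 - p)   # exactly three windows match
    p4 = p ** 4                                 # all four windows match
    dist.append((COST - payout_3[fruit], p3))
    dist.append((COST - payout_4[fruit], p4))

# No winning combination: James keeps the full cost of the play.
p_no_win = 1 - sum(prob for _, prob in dist)
dist.append((COST, p_no_win))

mean = sum(x * prob for x, prob in dist)                   # E(X)
var = sum(x ** 2 * prob for x, prob in dist) - mean ** 2   # E(X^2) - [E(X)]^2
print(round(p_no_win, 4), round(mean, 3), round(var, 3))
```

Changing `COST` or the payout dictionaries shows how each pricing choice moves the expected profit and its variance.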
School of Engineering
E214 Statistical Methods for Engineering
(16)
Discussion for Today’s Problem
By doubling the payout and doubling the price to
play, the expected value of James’ profit will double
and the variance will increase by four times.
• E(2X) = 2E(X) = $0.961
• Var(2X) = 4Var(X) = 3.754
• James will want to increase his expected profit and reduce
the variance so that his earnings will be more certain.
• This can be achieved by reducing the number of winning
combinations (e.g. no win for 3 lemons), increasing the cost
of play and/or reducing the payout for the 3 lemons winning
combination
School of Engineering
E214 Statistical Methods for Engineering
(17)
Learning Outcomes
• Discrete Random Variable
• Probability Density Function (pdf)
• Expectation
• Variance
SCHOOL OF ENGINEERING
DIPLOMA IN INDUSTRIAL & OPERATIONS MANAGEMENT
DIPLOMA IN SUPPLY CHAIN MANAGEMENT
DIPLOMA IN CIVIL AVIATION
P06 – UNDERWEIGHT OR NOT
E214 : STATISTICAL METHODS FOR ENGINEERING
Underweight or Not
You are a purchaser in a food company. Recently, your supplier for frozen chicken fillet was bought over by a competitor, and the company initiated a major change in the packaging and production method. Even though the agreement for the supply of the fillet remains unchanged, you are concerned that the amount of fillet in the new packaging may be different. One day, you carried out a sampling check on 40 packets of chicken fillet and collected the following data:
P6 Data.xls
Past records show that on the average, out of 100 packets of chicken fillet, 16 packets were underweight. Your company wants to know if the claim by the supplier that the weight of the packet remains unchanged is substantiated by the data. How do you decide from the data collected? If similar checks were to be carried out in the future, what acceptance criteria should be used?
School of Engineering
E214 Statistical Methods for Engineering
P06 – Underweight or Not
School of Engineering
E214 Statistical Methods for Engineering
(2)
Probability Problems with 2 Outcomes
• Many types of probability problems have only two outcomes or can be reduced to two outcomes.
• For example:
– When a coin is tossed, it can land heads or tails.
– When a baby is born, it will be male or female.
– In an examination, you either pass or fail.
• Situations that can be reduced to 2 outcomes:
– A medical treatment can be classified as effective or ineffective.
– A person can be classified as having normal or abnormal blood pressure, depending on the measure of the blood pressure.
– A multiple-choice question response, although there are 4 or 5 answer choices, can be classified as correct or incorrect.
• Situations like these are called binomial experiments.
School of Engineering
E214 Statistical Methods for Engineering
(3)
Binomial Experiment
• A binomial experiment is an experiment that satisfies the following properties:
– Experiment consists of n repeated trials.
– Each trial has two possible outcomes: success or failure.
– Probability of success, denoted by p, is the same in each trial.
– Repeated trials are independent.
• Outcomes of a binomial experiment and the corresponding probabilities of these outcomes are called a binomial distribution.
School of Engineering
E214 Statistical Methods for Engineering
(4)
Binomial Distribution
• Let X be the number of successes in n trials of a binomial experiment
• X is called a binomial random variable with pdf given by:
P(X = r) = nCr p^r (1 – p)^(n – r), where r = 0, 1, 2, …, n
• p is the probability of success.
• It can also be expressed as X ~ B(n,p).
School of Engineering
E214 Statistical Methods for Engineering
(5)
Example: Positively and Negatively Skewed Binomial Distributions
[Figure: two binomial distributions, one right-skewed (positively skewed) and one left-skewed (negatively skewed)]
School of Engineering
E214 Statistical Methods for Engineering
(6)
An Example
A coin is tossed three times. Find the probability of getting exactly two heads.
– This problem can be solved by looking at the sample space:
HHH, HHT, HTH, THH, TTH, THT, HTT, TTT
– There are 3 ways to get 2 heads, therefore,
• P(exactly 2 heads) = 3/8 = 0.375
School of Engineering
E214 Statistical Methods for Engineering
(7)
An Example – Coin Toss, P(2 heads)
• Consists of three trials (tosses)
• Each trial has only two possible outcomes: heads or tails
• Probability of success (head) = 0.5 for each trial
• Outcomes are independent of each other (the outcome of one toss does not affect the outcome of the other tosses)
Solution:
• Applying Binomial Distribution, let X be the random variable representing the number of heads
• X ~ B(3, 0.5)
• P(X = 2) = 3C2 (0.5)^2 (0.5)^1 = 0.375
School of Engineering
E214 Statistical Methods for Engineering
(8)
An Example
There are five multiple choice questions in a test. Each question has five possible choices. If a student randomly guesses the answers to all five questions, find the probability that he gets exactly three correct.
Solution:
Let X be the r.v. representing the number of correct answers.
X ~ B(5, 0.2)
P(X = 3) = 5C3 (0.2)^3 (0.8)^2 = 0.0512
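Both binomial examples (the coin toss and the guessed multiple-choice test) follow from the same formula, so a single function covers them. This is a minimal sketch; the function name `binomial_pmf` is hypothetical.

```python
from math import comb

def binomial_pmf(r, n, p):
    """P(X = r) for X ~ B(n, p): nCr * p^r * (1 - p)^(n - r)."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

p_two_heads = binomial_pmf(2, 3, 0.5)       # coin tossed three times
p_three_correct = binomial_pmf(3, 5, 0.2)   # five questions, five choices each
print(round(p_two_heads, 4), round(p_three_correct, 4))
```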
School of Engineering
E214 Statistical Methods for Engineering
(9)
Mean and Variance for Binomial Distribution
Let X ~ B(n,p), then
E(X) = µ = np
Var(X) = σ^2 = np(1-p)
School of Engineering
E214 Statistical Methods for Engineering
(10)
An Example
A die is rolled 480 times. Find the mean, variance and standard deviation of the number of 2’s obtained.
Solution:
Let X be the r.v. representing the number of 2’s obtained.
µ = np = 480 x 1/6 = 80
σ^2 = np(1-p) = 480 x 1/6 x 5/6 = 66.67
σ = sqrt[np(1-p)] = sqrt(66.67) = 8.16
School of Engineering
E214 Statistical Methods for Engineering
(11)
Today’s Problem
Let X be the random variable representing the number of underweight
fillets. Assuming probability of the fillet being underweight is the same
for all packets and the result of each weighing is independent,
X ~ binomial (40, 16/100)
Average number of underweight fillets in a sample of 40 is
E(X) = 0.16*40 = 6.4
The variance is Var(X) = 5.38 and Standard Deviation = 2.32. This
means that most of the checks should yield between 4 and 9
underweight packets.
Using Excel, work out the probability of each x (x is from 0 to 40) using:
BINOMDIST(x, 40, 16/100, 0)
School of Engineering
E214 Statistical Methods for Engineering
(12)
Today’s Problem
From the graph, it is seen that the probability of getting exactly 8
packets of underweight fillets from the check is 0.125. However, we
cannot make our decision on this probability.
School of Engineering
E214 Statistical Methods for Engineering
(13)
Today’s Problem
It is more useful to determine the cumulative probability in setting the
acceptance criterion. For example, if the acceptance criterion is 9 or
less, then it means that the probability of getting more than 9 packets of
underweight fillets is 1-0.9= 0.1, which is unlikely.
x P(X<=x)
0 0.00
1 0.01
2 0.03
3 0.10
4 0.21
5 0.37
6 0.54
7 0.70
8 0.82
9 0.90
10 0.95
11 0.98
12 0.99
13 1.00
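The cumulative table, and an acceptance limit derived from it, can be reproduced without Excel. The sketch below assumes X ~ B(40, 0.16) as stated for this problem; `acceptance_limit` is a hypothetical helper that returns the smallest x whose cumulative probability reaches a chosen coverage.

```python
from math import comb

N_PACKETS, P_UNDERWEIGHT = 40, 16 / 100

def binomial_pmf(r, n, p):
    """P(X = r), like Excel's BINOMDIST(r, n, p, 0)."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

def binomial_cdf(x, n, p):
    """P(X <= x), like Excel's BINOMDIST(x, n, p, 1)."""
    return sum(binomial_pmf(r, n, p) for r in range(x + 1))

def acceptance_limit(n, p, coverage):
    """Smallest x with P(X <= x) >= coverage (hypothetical helper)."""
    for x in range(n + 1):
        if binomial_cdf(x, n, p) >= coverage:
            return x
    return n

limit = acceptance_limit(N_PACKETS, P_UNDERWEIGHT, 0.90)
p_exceed = 1 - binomial_cdf(limit, N_PACKETS, P_UNDERWEIGHT)

for x in range(14):
    print(x, round(binomial_cdf(x, N_PACKETS, P_UNDERWEIGHT), 2))
print("limit:", limit, "P(X > limit):", round(p_exceed, 2))
```

Raising the coverage argument (for example to 0.95) gives a looser acceptance limit with fewer false alarms, which is the trade-off the acceptance criterion has to settle.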
School of Engineering
E214 Statistical Methods for Engineering
(14)
Learning Outcomes
Binomial Distribution
– Properties
– Probabilities
– Mean
– Variance
SCHOOL OF ENGINEERING
DIPLOMA IN INDUSTRIAL & OPERATIONS MANAGEMENT
DIPLOMA IN SUPPLY CHAIN MANAGEMENT
DIPLOMA IN CIVIL AVIATION
P07 – ENOUGH AMBULANCES?
E214 : STATISTICAL METHODS FOR ENGINEERING
Enough Ambulances?
A Straits Times article on Jan 28, 2009 reported that the Singapore Civil Defence Force (SCDF) is planning to open up its emergency ambulance service to the private sector. This is in response to the growing number of emergency calls, which is expected to increase further with a growing and aging population. SCDF wants to add ten more private ambulances to its current fleet of forty emergency ambulances. According to the report, a total of 111,127 emergency calls were made to SCDF last year, 9 per cent more than the calls received in 2007. The number of prank calls to the emergency lines also went up to at least 11 calls per day. Being a statistics student, you are naturally curious about the numbers. Doing a quick mental calculation, you work out that the average number of calls made daily is 316, including prank calls. However, knowing that emergencies occur randomly, you wonder if adding ten more ambulances will be sufficient for SCDF’s needs. How do you think the statistical nature of the problem is considered in the planning? Assuming that your team is a consultant to SCDF, conduct a study based on the numbers given, make reasonable assumptions and present your findings.
School of Engineering
E214 Statistical Methods for Engineering
P07 – Enough Ambulances?
School of Engineering
E214 Statistical Methods for Engineering
(2)
Recall: Probability Distribution
• A probability distribution lists all the outcomes of an experiment and the probabilities associated with each outcome.
• It describes the likelihood of some future event.
• Two important characteristics of a probability distribution are:
– The probability of a particular outcome is between 0 and 1, inclusive.
– The sum of the probabilities of all mutually exclusive events is 1.0.
School of Engineering
E214 Statistical Methods for Engineering
(3)
Recall: Random Variable
• In any experiment of chance, the outcomes
occur randomly. A quantity whose value is determined
by such an outcome is called a Random Variable.
• Random Variables can be Discrete or
Continuous.
– Discrete random variables can assume only certain
clearly separated values (countable).
– Continuous random variables can assume one of an
infinitely large number of values (measurable)
School of Engineering
E214 Statistical Methods for Engineering
(4)
Poisson Distribution
• Poisson probability distribution describes the number of times some event occurs during a specified interval.
• Interval may be time, distance, area, or volume.
• Poisson distribution is based on two assumptions:
– Probability of a “success” is proportional to the length of the interval
– Intervals are independent
• The longer the interval, the larger the probability, and the number of occurrences in one interval does not affect the other intervals
• It is a discrete probability distribution because it is formed by counting.
School of Engineering
E214 Statistical Methods for Engineering
(5)
Poisson Random Variable
Examples of Poisson Random Variables:
• Number of people who arrived at a hospital emergency
room in 1-hour interval
• Number of customers queuing up at a POSB bank counter
• Number of flaws (cracks and deep scratches) in an area
of ceramic flooring in a newly built HDB flat
[Figure: events marked at random positions along an interval]
In a Poisson process, events occur at random in an interval
School of Engineering
E214 Statistical Methods for Engineering
(6)
Poisson Distribution
The Poisson Distribution is given by the formula:
P(x) = (λ^x e^(–λ)) / x!, for x = 0, 1, 2, 3, …
Where:
λ is the mean number of occurrences (successes) in a particular interval
x is the number of occurrences (successes)
e is the constant 2.71828 (base of the natural logarithm)
P(x) is the probability for a specified value of x
When X is a Poisson variable, we write
X ~ Poisson(λ), or,
X ~ Po(λ)
School of Engineering
E214 Statistical Methods for Engineering
(7)
Characteristics of Poisson Distribution
Expected Value:
E(X) = λ, the mean number of occurrences in the interval
Variance:
Var(X) = np(1 – p)
= λ(1 – p), since λ = np
As p tends to zero, Var(X) tends to λ
Poisson Distribution has the same Expected Value and Variance.
School of Engineering
E214 Statistical Methods for Engineering
(8)
Approximation of Poisson Distribution
• Poisson Distribution can be used to approximate binomial distribution B(n,p) when n is large and p is small
• 2 general rules-of-thumb:
– n≥20 and p≤0.1 or
– n≥100 and np≤10
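How close the approximation is can be seen by comparing the two probability functions term by term. The sketch below uses illustrative values n = 100 and p = 0.05 (chosen here to satisfy the second rule of thumb; they are not from the slides), and the function names are hypothetical.

```python
from math import comb, exp, factorial

def binomial_pmf(r, n, p):
    """P(X = r) for X ~ B(n, p)."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

def poisson_pmf(x, mean):
    """P(X = x) for a Poisson variable with the given mean."""
    return mean ** x * exp(-mean) / factorial(x)

n, p = 100, 0.05      # illustrative values: n >= 100 and np = 5 <= 10
mean = n * p          # the Poisson mean is set equal to np

# Largest absolute difference between the two probability functions
max_gap = max(abs(binomial_pmf(x, n, p) - poisson_pmf(x, mean))
              for x in range(n + 1))

for x in range(11):
    print(x, round(binomial_pmf(x, n, p), 4), round(poisson_pmf(x, mean), 4))
print("largest gap:", round(max_gap, 4))
```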
School of Engineering
E214 Statistical Methods for Engineering
(9)
Example
Given a mean of λ = 5 cars arriving in a 5-minute period,
• Probability of 8 cars arriving in a 5-minute period,
P(X=8) = (5^8 x e^(–5))/8! = 0.065
• Probability of more than 6 cars arriving in a 5-minute period,
P(X>6) = 1 – P(X<=6) = 0.24
[Using Excel, 1 – Poisson(6,5,1)]
• Mean number of cars arriving in 1 hour = 12 x 5 = 60
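The same three results can be obtained directly from the Poisson formula. This is a minimal sketch assuming a mean of 5 cars per 5-minute period; the function names are hypothetical.

```python
from math import exp, factorial

def poisson_pmf(x, mean):
    """P(X = x) for a Poisson variable with the given mean."""
    return mean ** x * exp(-mean) / factorial(x)

def poisson_cdf(x, mean):
    """P(X <= x), like Excel's POISSON(x, mean, 1)."""
    return sum(poisson_pmf(k, mean) for k in range(x + 1))

MEAN_PER_5_MIN = 5

p_exactly_8 = poisson_pmf(8, MEAN_PER_5_MIN)
p_more_than_6 = 1 - poisson_cdf(6, MEAN_PER_5_MIN)
mean_per_hour = 12 * MEAN_PER_5_MIN   # twelve 5-minute periods in an hour

print(round(p_exactly_8, 3), round(p_more_than_6, 2), mean_per_hour)
```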
School of Engineering
E214 Statistical Methods for Engineering
(10)
Example
[Figure: Poisson probability distribution (mean = 5); x-axis: x = 1 to 15; y-axis: probability, 0 to 0.2]
School of Engineering
E214 Statistical Methods for Engineering
(11)
Today’s Problem
• 111,127 calls were received in 2008. Projecting a 10% increase, the number of calls in 2009 would be 122,240.
• Average no. of calls in a day is 335 + 12 (prank calls) = 347
• Average calls per hour is 14.5
• A few assumptions are required:
– The average duration an ambulance is engaged during a call is 1 hour
– The distribution of calls throughout the day is not uniform. Assume that there is a peak hour each day and that the number of calls during the peak hour is 2 times the average no. of calls, i.e. 29
– All 40 ambulances are available at all times (together with the associated manpower and equipment)
School of Engineering
E214 Statistical Methods for Engineering
(12)
Today’s Problem
• Is the number of ambulances sufficient? To answer this question, we have to calculate the probability that there will be more than 40 calls made in 1 hour:
P(X > 40) = 1 – P(X ≤ 40)
• From Excel, Poisson(40,29,1) = 0.979
• Hence there is approximately a 2% chance of running out of ambulances
• This may seem low, but if we assume that the peak hour occurs every day, then in one year there would be more than 7 incidents where there is a shortage of ambulances. It could mean 7 lives lost!
School of Engineering
E214 Statistical Methods for Engineering
(13)
Today’s Problem
• If 10 ambulances are added, from Excel, Poisson(50,29,1) = 0.9999. There is almost zero chance of shortage.
• In this case, will there be too many ambulances? What percent risk is acceptable?
• Do bear in mind this is a statistical exercise (see footnote). Other factors that should be considered include the availability of manpower and equipment, the reliability of vehicle, the response time requirement, the likelihood of a disease outbreak, cost involved, etc. Can you think of any others?
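The shortage probabilities for fleets of 40 and 50 ambulances can be compared side by side. The sketch below relies on the assumptions made for this problem (a peak-hour mean of 29 calls, one ambulance engaged for one hour per call, one peak hour per day); the names are illustrative.

```python
from math import exp, factorial

def poisson_cdf(x, mean):
    """P(X <= x) for a Poisson variable, like Excel's POISSON(x, mean, 1)."""
    return sum(mean ** k * exp(-mean) / factorial(k) for k in range(x + 1))

PEAK_CALLS_PER_HOUR = 29   # assumed peak-hour mean (2 x 14.5)
DAYS_PER_YEAR = 365        # assumes one peak hour every day

# P(more calls than ambulances) in the peak hour, for each fleet size
shortage = {fleet: 1 - poisson_cdf(fleet, PEAK_CALLS_PER_HOUR)
            for fleet in (40, 50)}

for fleet, p in shortage.items():
    print(fleet, "ambulances: P(shortage) =", round(p, 4),
          "expected shortage days per year =", round(p * DAYS_PER_YEAR, 1))
```

Trying intermediate fleet sizes in the tuple shows how quickly the risk falls with each added ambulance, which helps frame the question of what percent risk is acceptable.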
School of Engineering
E214 Statistical Methods for Engineering
(14)
Learning Outcomes
• Poisson Probability Distribution
• Poisson Random Variables
• Characteristics of a Poisson Distribution
SCHOOL OF ENGINEERING
DIPLOMA IN INDUSTRIAL & OPERATIONS MANAGEMENT
DIPLOMA IN SUPPLY CHAIN MANAGEMENT
DIPLOMA IN CIVIL AVIATION
P08 – OF PISTONS AND CYLINDERS
E214 : STATISTICAL METHODS FOR ENGINEERING
Of Pistons and Cylinders
You are an engineer working for an engine manufacturer. Your company has received a few complaints from customers about a recently launched engine. The engine sometimes does not perform to specifications and fails to deliver the stated torque. An investigation reveals that the cause is sub-performance of the main piston. In order to perform optimally, strict specifications require that the gap between the piston and cylinder be between 0.12 and 0.40mm. After eliminating the likelihood of problems in the assembly process, you turn your attention to the dimensions of the piston and cylinder. A request is made to the supplier of the components for the exact diameter measurements of all the pistons and cylinders delivered so far.
Data.xls
What can you conclude from the data? Suppose the supplier offers a new type of piston that is touted to deliver better performance. Your colleague John asked you to conduct a sampling check on the diameter of the new pistons. You need to measure the diameters for 10 pistons and determine the mean diameter. John said that if the mean falls within one standard deviation (of the mean of the old pistons), the new pistons should be accepted. Is John’s approach correct?
School of Engineering
E214 Statistical Methods for Engineering
P08 – Of Pistons and Cylinders
School of Engineering
E214 Statistical Methods for Engineering
(2)
Why Study Normal Distribution?
• Certain probability densities have so much importance
in statistics that areas under the curve have been
tabulated for future reference.
• One such distribution is the Normal, or Bell-shaped,
Distribution.
• This distribution is useful for describing variability in
industrial measurements such as lengths or weights.
• Natural variation in living organisms and their
characteristics also tend to follow a Normal
Distribution.
School of Engineering
E214 Statistical Methods for Engineering
(3)
Properties of Normal Distribution Curve
• The normal distribution curve is bell-shaped.
• The mean, median and mode are equal and located at the centre of the
distribution.
• The curve is symmetrical about the mean.
• The standard deviation (σ) specifies the amount of dispersion around the
mean.
• Two parameters, µ and σ, completely define a normal distribution curve.
• The further away from the mean the curve moves, the closer it gets to
the x-axis but it never touches.
• The curve is represented by the formula:
f(x) = [1 / (σ√(2π))] e^(–(x – µ)^2 / (2σ^2))
Where μ = mean, σ = std deviation, e = 2.718282, -∞<x< ∞
School of Engineering
E214 Statistical Methods for Engineering
(4)
Histogram for blood pressure measurements
for sample of 118 men
[Figure: histogram; x-axis: Systolic BP (mmHg), 80 to 160; y-axis: Percentage of Men]
School of Engineering
E214 Statistical Methods for Engineering
(5)
Histogram for blood pressure measurements
for sample of 5000 men
[Figure: histogram; x-axis: Blood Pressure (mmHg), 80 to 140; y-axis: Percentage of Men]
School of Engineering
E214 Statistical Methods for Engineering
(6)
Examples of Normal Distribution Curves
Typical normal distribution
with mean=5 and variance=1
Two normal distributions with different
mean values and same variance
Two normal distributions with different variances
and the same mean
School of Engineering
E214 Statistical Methods for Engineering
(7)
The Normal Distribution
• Each normal density curve is completely defined by two parameters
– mean (average), represented by μ, and,
– standard deviation, represented by σ.
School of Engineering
E214 Statistical Methods for Engineering
(8)
Interpreting the Normal Curve
[Figure: normal curve with the region between a and b shaded]
• Probability = Area under the curve = shaded region
• P (a<X<b) = area under the curve between a and b
Area under the curve is obtained using NORMDIST in Excel or, in the case of a Standard Normal Distribution, NORMSDIST.
School of Engineering
E214 Statistical Methods for Engineering
(9)
Standard Normal Distribution
• Standard Normal Distribution is a distribution with mean 0 and variance 1
• It is represented by the standard normal variable Z where
Z = (X - µ) / σ
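Standardising does not change the probability: the area between a and b under N(µ, σ^2) equals the area between the corresponding z-values under N(0, 1). The sketch below uses Python's `statistics.NormalDist` with illustrative values (mean 5, variance 1, and hypothetical endpoints a = 4 and b = 6.5).

```python
from statistics import NormalDist

MU, SIGMA = 5.0, 1.0      # illustrative values (mean 5, variance 1)
a, b = 4.0, 6.5           # hypothetical interval endpoints

# Direct calculation on X ~ N(MU, SIGMA^2), like Excel's NORMDIST
x_dist = NormalDist(MU, SIGMA)
p_direct = x_dist.cdf(b) - x_dist.cdf(a)

# Same probability after standardising: Z = (X - MU) / SIGMA ~ N(0, 1)
z_dist = NormalDist()     # standard normal, like Excel's NORMSDIST
z_a = (a - MU) / SIGMA
z_b = (b - MU) / SIGMA
p_standardised = z_dist.cdf(z_b) - z_dist.cdf(z_a)

print(round(p_direct, 4), round(p_standardised, 4))
```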
School of Engineering
E214 Statistical Methods for Engineering
Linear Combinations of Normal R.V.
• Linear combinations of normal random variables are also
normally distributed
Linear Functions
• If X ~ N(µ, σ^2) and a and b are constants, then
Y = aX + b ~ N(aµ + b, a^2 σ^2)
Sum of Two Independent Normal R.V.
• If X1 ~ N(µ1, σ1^2) and X2 ~ N(µ2, σ2^2) are independent
r.v., then
Y = X1 + X2 ~ N(µ1 + µ2, σ1^2 + σ2^2)
(10)
School of Engineering
E214 Statistical Methods for Engineering
Sampling
Sampling is a process of selecting a subset of data from the
population
Reasons for Sampling:
• Time Constraints
• Cost Constraints
• Impossibility of a Census
• Population is infinite
• Measuring process is destructive
School of Engineering
E214 Statistical Methods for Engineering
Sampling Distribution of Sample Mean
Take k samples each of size n and calculate the sample mean for each
sample. Using these sample means, a distribution known as the sampling
distribution of the mean can be obtained.
Sample | Observations | Sample Mean
1 | X1,1 X1,2 …… X1,n | x̄1
2 | X2,1 X2,2 …… X2,n | x̄2
… | … | …
k | Xk,1 Xk,2 …… Xk,n | x̄k
Essentially, a sampling distribution is the distribution of values for a sample
statistic obtained from repeated samples, all of the same size and all
drawn from the same population.
School of Engineering
E214 Statistical Methods for Engineering
Central Limit Theorem
Let X1, X2, …, Xn denote a random sample selected from a population having mean µ and variance σ^2.
Central Limit Theorem states that as sample size n increases (rule of thumb, n ≥
30), the sampling distribution of the sample mean x̄ will:
1. Have a mean µx̄ = µ
2. Have a standard deviation σx̄ = σ / √n
3. Be approximately normally distributed
• The sampling distribution has a normal distribution if the population is
normally distributed. For other types of population, it will approximate
a normal distribution when n is large (rule of thumb, n ≥ 30)
• The standard deviation of the sample mean is known as the standard
error of the sample mean and indicates how accurately the sample mean
estimates the ‘true’ mean
School of Engineering
E214 Statistical Methods for Engineering
Central Limit Theorem (CLT)
[Figure: population distributions (Normal, Uniform, Exponential) and the sampling distributions of the mean for samples of n=2, n=5 and n=30]
Significance of CLT
It permits us to use sample statistics to make inferences about the
population parameters without knowing anything about the specific
shape of the population distribution.
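A short simulation illustrates the theorem for a clearly non-normal population. The sketch below draws repeated samples of size 30 from an exponential population with mean 1 and standard deviation 1 (an illustrative choice, as are the seed and the number of samples) and compares the spread of the sample means with σ/√n.

```python
import random
from statistics import mean, pstdev

random.seed(2009)          # fixed seed so the sketch is repeatable

POP_MEAN = 1.0             # exponential population: mean 1 and s.d. 1
N = 30                     # size of each sample
K = 5000                   # number of samples

# One sample mean per sample of N exponential observations
sample_means = [
    mean(random.expovariate(1 / POP_MEAN) for _ in range(N))
    for _ in range(K)
]

mean_of_means = mean(sample_means)       # should be close to the population mean
sd_of_means = pstdev(sample_means)       # should be close to sigma / sqrt(n)
standard_error = POP_MEAN / N ** 0.5     # sigma / sqrt(n), with sigma = 1 here

print(round(mean_of_means, 3), round(sd_of_means, 3), round(standard_error, 3))
```

Plotting a histogram of `sample_means` for N = 2, 5 and 30 would show the shape moving from skewed towards bell-shaped.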
School of Engineering
E214 Statistical Methods for Engineering
(15)
Today’s Problem
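• Let the gap between piston and cylinder be
Y = X2 – X1, where X1 is the piston diameter and X2 is the cylinder diameter
• It follows that Y would be normally distributed as it is a linear combination of normal random variables X1 and X2 (assumed independent).
• Mean µY = µX2 – µX1 = 25.26 – 25.00 = 0.26
• Variance σY^2 = σX2^2 + σX1^2 = 0.0094 (note: the rounded standard deviations 0.06 and 0.08 give 0.06^2 + 0.08^2 = 0.0100; the value 0.0094 presumably comes from the unrounded standard deviations in the data)
• Hence Y ~ N(0.26, 0.0094)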
School of Engineering
E214 Statistical Methods for Engineering
(16)
Today’s Problem
• A piston will not fit in a cylinder when the gap is less than zero. The probability of this is
P(Y<0) = P(z < (0 – µY)/σY)
= P(z < (0 – 0.26)/√0.0094) = P(z < –2.68)
= 0.0037
• A piston will perform optimally when the gap is between 0.12mm and 0.40mm. The probability of this is
P(0.12<Y<0.40) = P[(0.12 – µY)/σY < z < (0.40 – µY)/σY]
= P[(0.12 – 0.26)/√0.0094 < z < (0.40 – 0.26)/√0.0094]
= P(z<1.44) – P(z<–1.44) = 0.851
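The two gap probabilities can be checked without a z-table. This sketch takes the mean and variance of the gap as quoted above (0.26 and 0.0094) and uses `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

MEAN_GAP = 0.26       # mean of Y = X2 - X1, in mm
VAR_GAP = 0.0094      # variance of Y as quoted for this problem

# NormalDist takes the standard deviation, not the variance
gap = NormalDist(MEAN_GAP, VAR_GAP ** 0.5)

p_no_fit = gap.cdf(0)                        # P(Y < 0)
p_optimal = gap.cdf(0.40) - gap.cdf(0.12)    # P(0.12 < Y < 0.40)

print(round(p_no_fit, 4), round(p_optimal, 3))
```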
School of Engineering
E214 Statistical Methods for Engineering
(17)
Today’s Problem
• Assuming that diameters of the new pistons have the same distribution as the old ones, the distribution of the sample mean of 10 pistons can be written as:
X̄ ~ N(µ, σ^2/10)
• If the underlying distribution of the new piston diameters is unknown or cannot be assumed to be normal, then we can increase the sample size to 30 or more pistons so that, by the Central Limit Theorem, the sample mean is approximately normally distributed.
School of Engineering
E214 Statistical Methods for Engineering
(18)
Today’s Problem
• The interval [24.94, 25.06] represents piston diameters within one standard deviation of the mean.
• Probability that the sample mean lies within one s.d. is
P(µ – σ < X̄ < µ + σ)
= P[(µ – σ – µ)/√(σ^2/10) < z < (µ + σ – µ)/√(σ^2/10)]
= P(z < 0.06/√(0.06^2/10)) – P(z < –0.06/√(0.06^2/10))
= P(z<3.16) – P(z<–3.16)
= 0.9984
[Figure: population distribution and the narrower distribution of the sample mean, with the 1 S.D. interval marked on the x-axis]
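The contrast between a single diameter and the mean of ten diameters can be computed directly. The sketch below assumes the old-piston values implied by the interval [24.94, 25.06] (mean 25.00 mm, standard deviation 0.06 mm) and that the new pistons share that distribution; the names are illustrative.

```python
from statistics import NormalDist

MU, SIGMA, N = 25.00, 0.06, 10    # assumed mean and s.d. (mm), sample size

single = NormalDist(MU, SIGMA)                    # one piston diameter
sample_mean = NormalDist(MU, SIGMA / N ** 0.5)    # mean of N diameters

low, high = MU - SIGMA, MU + SIGMA                # one s.d. either side of the mean
p_single = single.cdf(high) - single.cdf(low)
p_sample_mean = sample_mean.cdf(high) - sample_mean.cdf(low)

print(round(p_single, 4), round(p_sample_mean, 4))
```

Because the sample mean almost always lands inside the one-s.d. interval, that interval is a weak acceptance test on its own.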
School of Engineering
E214 Statistical Methods for Engineering
(19)
Today’s Problem
• Although there is a 68% chance that the diameters of the old pistons lie within one standard deviation of the mean, the average diameter of the ten new pistons has a 99.8% chance of falling within the interval if both have the same distribution.
• The acceptance criteria should be based on comparison with population mean and not the distribution of the population.
• Hypothesis testing of the mean should be conducted to check whether the diameter of the new pistons is the same.
School of Engineering
E214 Statistical Methods for Engineering
(20)
Learning Outcomes
• Properties of a Normal Distribution Curve
• Standard Normal Distribution
– Standard Normal Variable z
– Applications of the Standard Normal
Distribution Table
• Central Limit Theorem
• Sampling Distribution of Sample Mean
SCHOOL OF ENGINEERING
DIPLOMA IN INDUSTRIAL & OPERATIONS MANAGEMENT
DIPLOMA IN SUPPLY CHAIN MANAGEMENT
DIPLOMA IN CIVIL AVIATION
P09 – CASE OR NO CASE
E214 : STATISTICAL METHODS FOR ENGINEERING
Case or No Case
A consumer product company, A & B, has been producing its popular foam facial wash for the past 10 years. A & B has always been proud of its ability to provide consumers with exciting foam height per pump of its liquid-to-foam facial foam. In a recent advertising campaign, A & B’s endorsing artiste mentioned on national TV that the average foam height is 75 millimeters. Foam height is approximately normally distributed and has a standard deviation of 5 millimeters. Vivian, an avid blogger, has been intrigued by A & B’s claim. She decided to conduct her own experiment to challenge the claim that the average foam height is 75 millimeters. Vivian obtained 50 foam height measurements, attached below:
P09_Foam Height Data_AllTeams.xlsx
Vivian looked at the data and analyzed that she can reject A & B’s claim. Thus she wrote about her findings in her blog. Vivian wondered if her analysis will be affected if she is unaware of the standard deviation and is there an estimated sample size for a predefined error. Is Vivian’s conclusion valid? How will you help to address Vivian’s doubts?
Illustrative Figure on foam height
Statistical Hypotheses
• Many problems in daily life require that we decide whether to accept or reject a statement about some parameter
• The statement is called a hypothesis, and the decision-making procedure about the hypothesis is called hypothesis testing.
• A hypothesis is thus a claim or statement about a property of a population.
Terms used
• Significance Level is the probability of making a decision to reject the null hypothesis when the null hypothesis is actually true.
• Critical Region is the set of values for which we reject the null hypothesis.
• Critical Values mark the boundary between rejecting and not rejecting the null hypothesis.
Forming the Hypothesis
• Manufacturer’s claim
– The average foam height per pump is 75 mm.
– This claim is commonly referred to as the null hypothesis, H0.
– The null hypothesis is presumed true unless we have enough evidence to reject it.
• Blogger’s suspicion
– The average foam height per pump is NOT 75 mm.
– This is commonly referred to as the alternative hypothesis, H1.
Null and Alternative Hypothesis
• Null Hypothesis
– The null hypothesis (denoted by H0) is a statement about the value of a population parameter.
– Here it states that the population mean equals the claimed value:
H0 : μ = 75
• Alternative Hypothesis
– The alternative hypothesis (denoted by H1) is the statement that must be true if the null hypothesis is false.
H1 : μ ≠ 75
– This is a two – tailed test.
One – Tailed and Two – Tailed Test
• Example:
Hypothesis Testing Problem | Null and Alternative Hypothesis
Mean burn rate is not 50 cm/s | H0 : μ = 50, H1 : μ ≠ 50
Mean burn rate is less than 50 cm/s | H0 : μ = 50, H1 : μ < 50
Mean burn rate is more than 50 cm/s | H0 : μ = 50, H1 : μ > 50
One – Tailed Test
• One – Tailed test
– In such a test, the critical region lies in the direction of the inequality in the alternative hypothesis (i.e. < will be left, > will be right).
Null and Alternative Hypothesis | Acceptance Region
H0 : μ = 50, H1 : μ < 50 | (figure: critical region in the left tail)
H0 : μ = 175, H1 : μ > 175 | (figure: critical region in the right tail)
Two – Tailed Test
• Two – Tailed Test
– In such a test, the critical region is split into two parts, with (usually) equal probability placed in each tail of the distribution of the test statistic.
Null and Alternative Hypothesis | Acceptance Region
H0 : μ = 50, H1 : μ ≠ 50 | (figure: critical region split between both tails)
Test Statistic
• Hypothesis Tests on Mean
– When conducting hypothesis testing on the mean of a normally distributed population, the variance can be either known or unknown, resulting in a different sampling distribution and test statistic.
Variance Known? | Sampling Distribution | Test | Test Statistic
Known (or Large Sample Size) | Normal Distribution | Z – Test | $z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
Unknown | t – Distribution (n-1 degrees of freedom) | 1 Sample t – Test | $t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$
Critical Region and Value
• The critical region is the set of all values of the test statistic that would cause rejection of the null hypothesis.
• The critical value is the value separating the critical region from the values of the test statistic that would not lead to rejection of the null hypothesis.
Significance Level | One – Tailed Test | Two – Tailed Test
5% (95% Confidence Level) | -1.645 | -1.96 and 1.96
Critical Region and Value
Significance Level | One – Tailed Test
5% (95% Confidence Level) | -1.645
= NORMSINV(0.05) = -1.645
Critical Region and Value
Significance Level | Two – Tailed Test
5% (95% Confidence Level), α/2 = 0.025 in each tail | -1.96 and 1.96
= NORMSINV(0.025) = -1.96
= NORMSINV(0.975) = 1.96
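The NORMSINV values above can be cross-checked outside a spreadsheet. The following is a minimal sketch in Python, assuming the standard library's `statistics.NormalDist` is an acceptable stand-in for NORMSINV; the function name `critical_values` is illustrative and not part of the course material.

```python
from statistics import NormalDist


def critical_values(alpha, tails):
    """Return z critical value(s) for significance level alpha.

    tails=1 gives the left-tail critical value (mirror it for a right-tailed test);
    tails=2 gives the (lower, upper) pair with alpha/2 placed in each tail.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tails == 1:
        return std_normal.inv_cdf(alpha)
    if tails == 2:
        lower = std_normal.inv_cdf(alpha / 2)
        return lower, -lower
    raise ValueError("tails must be 1 or 2")


one_tailed = critical_values(0.05, 1)
lower, upper = critical_values(0.05, 2)
print(round(one_tailed, 3))               # one-tailed test at 5%
print(round(lower, 2), round(upper, 2))   # two-tailed test at 5%
```

Changing `alpha` to 0.01 or 0.10 gives the critical values for other significance levels in the same way.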
Test Statistic: Example 1
• Suppose we are interested in the burn rate of a solid propellant used to power aircrew escape systems, where it has been claimed that the mean burn rate is 50 cm/s, and we want to test whether the mean burn rate is not 50 cm/s.
– Given σ = 2.5 cm/s; n = 50 (large sample size, thus Normal sampling distribution); x̄ = 50.25 cm/s; 95% Confidence Level
Sol:
H0: µ = 50 cm/s
H1: µ ≠ 50 cm/s
Normal Sampling Distribution:
$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{50.25 - 50}{2.5 / \sqrt{50}} = 0.707$
At the 95% confidence level, the critical values are -1.96 and 1.96.
Since z = 0.707 is in the acceptance region, we will not reject H0.
Test Statistic: Example 2
• Suppose we are interested in the burn rate of a solid propellant used to power aircrew escape systems, where it has been claimed that the mean burn rate is 50 cm/s, and we want to test whether the mean burn rate is not 50 cm/s.
– Given σ = 2.5 cm/s; n = 50 (large sample size, thus Normal sampling distribution); x̄ = 46.55 cm/s; 95% Confidence Level
Sol:
H0: µ = 50 cm/s
H1: µ ≠ 50 cm/s
Normal Sampling Distribution:
$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{46.55 - 50}{2.5 / \sqrt{50}} = -9.76$
At the 95% confidence level, the critical values are -1.96 and 1.96.
Since z = -9.76 is NOT in the acceptance region, we will reject H0.
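Examples 1 and 2 differ only in the sample mean, so both can be reproduced with one small function. This is a sketch, assuming σ = 2.5 and n = 50 as given; the helper names `z_statistic` and `reject_two_tailed` are hypothetical.

```python
from math import sqrt


def z_statistic(sample_mean, mu0, sigma, n):
    """z = (sample mean - hypothesized mean) / (sigma / sqrt(n)), sigma known."""
    return (sample_mean - mu0) / (sigma / sqrt(n))


def reject_two_tailed(z, critical=1.96):
    """Two-tailed decision: reject H0 when z falls outside +/- critical."""
    return abs(z) > critical


z1 = z_statistic(50.25, 50, 2.5, 50)  # Example 1
z2 = z_statistic(46.55, 50, 2.5, 50)  # Example 2
print(round(z1, 3), reject_two_tailed(z1))
print(round(z2, 2), reject_two_tailed(z2))
```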
Proposed Solution (Known Variance)
• Hypothesis:
– H0: µ = 75 mm
– H1: µ ≠ 75 mm
Normal Sampling Distribution:
$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{80.22 - 75}{5 / \sqrt{50}} = 7.382$
At the 95% confidence level, the critical values are -1.96 and 1.96.
Since z = 7.382 is NOT in the acceptance region, we will reject H0.
Proposed Solution (Unknown Variance)
• Hypothesis:
– H0: µ = 75 mm
– H1: µ ≠ 75 mm
1 Sample t-Test Sampling Distribution:
$t = \frac{\bar{x} - \mu}{s / \sqrt{n}} = \frac{80.22 - 75}{5.877 / \sqrt{50}} = 6.28$
At the 95% confidence level with n-1 = 49 degrees of freedom, the critical values are -2.01 and 2.01. Can also use Excel function TINV(0.05,49).
Since t = 6.28 is NOT in the acceptance region, we will reject H0.
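The unknown-variance case can be sketched the same way. The block below assumes a sample mean of 80.22 mm and a sample standard deviation of 5.877 mm, as read from the worked figures, and takes the critical value 2.01 from the slide because the Python standard library has no t-distribution quantile function. The names `t_statistic` and `T_CRITICAL_49_DF` are illustrative.

```python
from math import sqrt


def t_statistic(sample_mean, mu0, sample_sd, n):
    """t = (sample mean - hypothesized mean) / (s / sqrt(n)), sigma unknown."""
    return (sample_mean - mu0) / (sample_sd / sqrt(n))


# Two-tailed 5% critical value for 49 degrees of freedom, as quoted on the slide.
T_CRITICAL_49_DF = 2.01

t = t_statistic(80.22, 75, 5.877, 50)
reject_h0 = abs(t) > T_CRITICAL_49_DF
print(round(t, 2), reject_h0)
```

The only change from the z-test is that the sample standard deviation s replaces σ and the critical value comes from the t-distribution with n-1 degrees of freedom.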
Proposed Solution (Estimate Sample Size)
• Sample size for a predefined error E:
– Given: α = 0.1, E = 1.5 mm, σ = 5 mm
$n = \left( \frac{z_{\alpha/2} \, \sigma}{E} \right)^2 = \left( \frac{(1.645)(5)}{1.5} \right)^2 \approx 30.07$, rounded up to n = 31
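The sample size estimate can be checked with a few lines of Python. This is a sketch under the stated values (α = 0.1, E = 1.5 mm, σ = 5 mm); the function name `sample_size_for_mean` is hypothetical, and rounding up reflects the usual convention that the sample must be at least as large as the computed value.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_mean(alpha, sigma, max_error):
    """Smallest n such that z_(alpha/2) * sigma / sqrt(n) <= max_error."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.645 for alpha = 0.1
    return ceil((z * sigma / max_error) ** 2)


n_required = sample_size_for_mean(alpha=0.1, sigma=5, max_error=1.5)
print(n_required)
```

Halving the allowed error E roughly quadruples the required sample size, since n grows with 1/E².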
Proposed Solution (HT Methodology)
1. From the problem, identify the parameter of interest
2. State the Null Hypothesis, H0
3. Specify an appropriate alternative hypothesis, H1
4. Choose a significance level, α
5. Determine an appropriate test statistic
6. State the rejection region for the statistic
7. Compute any necessary sample quantities, substitute these into the equation for the test statistic, and compute that value
8. Decide whether or not H0 should be rejected and report in problem context
Learning Outcomes
• Hypothesis Testing
– Null and Alternative Hypothesis (One-tailed and Two-tailed)
– Significance Level
– Test Statistic
– Methodology
• Z – Test
• 1 Sample t – Test
• Estimate Sample Size
P10 – CHARGED OR RECHARGE
Charged or Recharge
The advertised claim for ABC batteries for mobile phones is set at 48 operating hours, with proper charging procedures. A study of 5000 batteries is carried out and 7 stop operating prior to 48 hours. Do these experimental results support the claim that less than 0.2 percent of the company’s batteries will fail during the advertised time period, with proper charging procedures?
Perform a hypothesis-testing procedure with α = 0.05, discussing the errors that could arise when a wrong decision is made from the result. Can you also estimate the confidence interval for the mean percent of batteries that will fail, based on the experimental results? What is the relationship between the confidence interval and the hypothesis testing outcome?
Proportions
• Proportions provide useful information in summary
• Hypothesis testing can be applied not only to absolute data (such as sample mean), but also for population proportions.
Significance of Proportions
Assumptions
Certain assumptions must be made when testing a claim about a population proportion, probability or percentage:
1. The conditions for a binomial experiment are satisfied. That is, there is a fixed number of independent trials having constant probabilities, and each trial has only two possible outcomes.
2. The conditions np ≥ 5 and nq ≥ 5 are both satisfied, so that the binomial distribution of sample proportions can be approximated by a normal distribution with μ = np and σ = √(npq).
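The second assumption is easy to check numerically before running the test. The sketch below uses the battery study's n = 5000 and the hypothesized p = 0.002; the function name `normal_approximation` and its return layout are illustrative choices.

```python
from math import sqrt


def normal_approximation(n, p, threshold=5):
    """Check np >= threshold and nq >= threshold; return (ok, mean, sd).

    mean = n*p and sd = sqrt(n*p*q) are the parameters of the approximating
    normal distribution for the number of 'successes'.
    """
    q = 1 - p
    ok = n * p >= threshold and n * q >= threshold
    return ok, n * p, sqrt(n * p * q)


ok, mean_count, sd_count = normal_approximation(5000, 0.002)
print(ok, round(mean_count, 2), round(sd_count, 3))
```

With a much smaller sample, for example n = 1000 at the same p, np falls to 2 and the check fails, so the normal approximation should not be relied on there.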
Terms used
Notations used for hypothesis testing of one proportion:
n = number of trials
p̂ = x/n (sample proportion), where x is the number of ‘successes’ considered
p = population proportion
q = 1 – p
Test Statistic
$Z = \frac{\hat{p} - p}{\sqrt{pq / n}}$
Recall: Steps in Hypothesis Testing
1. From the problem, identify the parameter of interest
2. State the Null Hypothesis, H0
3. Specify an appropriate alternative hypothesis, H1
4. Choose a significance level, α
5. Determine an appropriate test statistic
6. State the rejection region for the statistic
7. Compute any necessary sample quantities, substitute these into the equation for the test statistic, and compute that value
8. Decide whether or not H0 should be rejected and report in problem context
Proposed Solution
Step 1: The parameter of interest is the proportion of batteries that fail during the advertised period, p
Step 2: Null Hypothesis, H0: p = 0.002
Step 3: Alternative hypothesis, H1: p < 0.002
(This formulation will allow manufacturers to make a strong claim about the proportion of batteries that fail if the null hypothesis, H0: p = 0.002, is rejected)
Step 4: Chosen significance level is α = 0.05
Proposed Solution
Step 5: The test statistic to be used is:
$Z_0 = \frac{\hat{p} - p}{\sqrt{pq / n}}$
Step 6: Reject H0: p = 0.002 if Z0 < -Z0.05 = -1.645
Step 7: Compute the test statistic:
$Z_0 = \frac{\hat{p} - p}{\sqrt{pq / n}} = \frac{(7/5000) - 0.002}{\sqrt{0.002 \times 0.998 / 5000}} = -0.950$
Proposed Solution
Step 8: Conclusions:
Since Z0 = -0.95 is not < -Z0.05 = -1.645, we cannot reject H0. We conclude that the manufacturers cannot claim that less than 0.2 percent of the company’s batteries will fail during the advertised time period at α = 0.05
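Steps 5 to 8 of the battery problem can be reproduced in a few lines. This is a sketch, assuming the left-tailed test described in the solution; the function name `one_proportion_z` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z(successes, n, p0):
    """Z0 = (p_hat - p0) / sqrt(p0 * q0 / n) for a test of one proportion."""
    p_hat = successes / n
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)


z0 = one_proportion_z(7, 5000, 0.002)     # 7 failures out of 5000 batteries
critical = NormalDist().inv_cdf(0.05)     # left-tail critical value, about -1.645
reject_h0 = z0 < critical
print(round(z0, 3), round(critical, 3), reject_h0)
```

Because the test statistic lies to the right of the critical value, the sketch reaches the same decision as Step 8: H0 is not rejected.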
Type I (α) and Type II errors (β)
Decision | H0 is Actually True | H0 is Actually False
Fail to reject H0 | No error (1-α) | Type II error, β (failing to reject a false null hypothesis)
Reject H0 | Type I error, α (rejecting a true null hypothesis) | No error (1-β)
Type I (α) and Type II errors (β)
• β usually cannot be calculated as it depends on the actual difference between the hypothesized value of the parameter and the true value (we don’t know the true value!).
• 1-β is also known as the Power of a Test. It measures the sensitivity of the test to detect a real difference in parameters if one actually exists.
• A larger α results in a smaller β, and a smaller α results in a larger β.
• To increase the Power, either increase the value of α and/or increase the sample size (which would reduce β as well). A larger sample size would also narrow the confidence interval of the sample parameter and increase the ‘precision’ of the experiment.
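The trade-off between α, β and sample size can be seen numerically if a true value is assumed. The sketch below computes the power of a left-tailed z-test of a mean; the true mean of 74, together with μ0 = 75 and σ = 5, are hypothetical numbers chosen only for illustration, and the function name `power_left_tailed_z` is not from the course material.

```python
from math import sqrt
from statistics import NormalDist


def power_left_tailed_z(mu0, mu_true, sigma, n, alpha):
    """Power (1 - beta) of a left-tailed z-test of H0: mu = mu0
    when the true mean is mu_true and sigma is known."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(alpha)          # negative critical value
    shift = (mu0 - mu_true) * sqrt(n) / sigma    # standardized true difference
    return std_normal.cdf(z_alpha + shift)


# Hypothetical scenarios: baseline, larger alpha, larger sample size.
for alpha, n in [(0.05, 50), (0.10, 50), (0.05, 100)]:
    power = power_left_tailed_z(mu0=75, mu_true=74, sigma=5, n=n, alpha=alpha)
    print(f"alpha={alpha}, n={n}: power={power:.3f}, beta={1 - power:.3f}")
```

In this sketch both a larger α and a larger n raise the power relative to the baseline, which matches the last two bullet points.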
Type I and Type II errors: Example
Suppose that you are a lawyer trying to establish that a company has been unfair to workers above 50 years old with regard to salary increases. Suppose the mean salary increase per year is 8%.
H0: µ = 0.08 ; H1: µ < 0.08
Decision | H0 is Actually True | H0 is Actually False
Fail to reject H0 (Decide that company is NOT unfair) | No error (1-α) | Type II error, β (Not suing company when they are ACTUALLY unfair)
Reject H0 (Decide that company is unfair) | Type I error, α (Suing company when they are NOT unfair) | No error (1-β)
Type I and Type II errors: Problem
The advertised claim for ABC batteries for mobile phones is set at 48 operating hours. The manufacturer claims that less than 0.2 percent of the company’s batteries will fail.
H0: p = 0.002 ; H1: p < 0.002
Decision | H0 is Actually True | H0 is Actually False
Fail to reject H0 (Decide that NOT less than 0.2% batteries fail) | No error (1-α) | Type II error, β (Not accepting that less than 0.2% batteries fail when it is true)
Reject H0 (Decide that less than 0.2% batteries fail) | Type I error, α (Accepting that less than 0.2% batteries fail when it is NOT true) | No error (1-β)
Confidence Interval - Definition
• Confidence interval gives an estimated range of values
which is likely to include an unknown population parameter,
the estimated range being calculated from a given set of
sample data.
• The width of the confidence interval gives us some idea
about how uncertain we are about the unknown parameter. A
very wide interval may indicate that more data should be
collected before anything very definite can be said about the
parameter.
Confidence Interval and Hypothesis Testing Outcome
• There is a close relationship between confidence intervals and hypothesis testing
• Examples:
1. For a 95% confidence interval, all values in the interval are considered plausible values for the parameter being estimated. If the value of the parameter specified by the null hypothesis is contained in the 95% interval, then the null hypothesis cannot be rejected at the 0.05 level.
2. For a 99% confidence interval, values outside the interval are rejected at the 0.01 level.
2-tailed Confidence Interval for Problem Statement
Based on the sampling of 5000 batteries, the 95% 2-tailed confidence interval for the mean proportion of batteries that will fail the specs is:
$\hat{p} - Z_{0.025} \sqrt{\frac{\hat{p}\hat{q}}{n}} \le p \le \hat{p} + Z_{0.025} \sqrt{\frac{\hat{p}\hat{q}}{n}}$
$0.0014 - 1.96 \sqrt{\frac{0.0014(0.9986)}{5000}} \le p \le 0.0014 + 1.96 \sqrt{\frac{0.0014(0.9986)}{5000}}$
$0.00036 \le p \le 0.00244$
1-tailed Confidence Interval for Problem Statement
The 95% 1-tailed confidence interval is:
$0 \le p \le \hat{p} + Z_{0.05} \sqrt{\frac{\hat{p}\hat{q}}{n}}$
$0 \le p \le 0.0014 + 1.65 \sqrt{\frac{0.0014(0.9986)}{5000}}$
$0 \le p \le 0.00227$
Since the null hypothesis proportion value is 0.002 and lies within the interval, we cannot reject H0 at 0.05 level of significance.
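Both intervals for the battery data can be recomputed as a check. This is a sketch using the sample proportion 7/5000 = 0.0014; the function names `proportion_ci` and `proportion_upper_bound` are hypothetical, and the z values come from `statistics.NormalDist` rather than the rounded 1.96 and 1.65.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Two-sided interval: p_hat +/- z_(alpha/2) * sqrt(p_hat * q_hat / n)."""
    p_hat = successes / n
    se = sqrt(p_hat * (1 - p_hat) / n)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return p_hat - z * se, p_hat + z * se


def proportion_upper_bound(successes, n, confidence=0.95):
    """One-sided interval: 0 <= p <= p_hat + z_alpha * sqrt(p_hat * q_hat / n)."""
    p_hat = successes / n
    se = sqrt(p_hat * (1 - p_hat) / n)
    z = NormalDist().inv_cdf(confidence)
    return p_hat + z * se


lower, upper = proportion_ci(7, 5000)
bound = proportion_upper_bound(7, 5000)
print(f"{lower:.5f} <= p <= {upper:.5f}")
print(f"0 <= p <= {bound:.5f}")
print(0 <= 0.002 <= bound)  # True means H0: p = 0.002 cannot be rejected
```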
Learning Outcomes
• What Proportions are and their significance
• Hypothesis Testing a Proportion
• Assumptions when testing a claim about Proportion
• Type I and Type II error
• Confidence Interval and Hypothesis Testing
P11 – TRUE OR NOT TRUE
True or Not True
It has been reported that Singapore youths (aged 15 – 24) spend the longest hours daily in the region on instant messaging. Attached are the data collected from 10 youths (aged 15 – 24) from South Korea, a country known for its superb IT infrastructure and high internet usage rates.
P11-Students response.xls
How can we prove whether the report is true or not true with reasonable statistical confidence, assuming the populations under comparison have equal variances?
Testing Between 2 Samples
• There are many cases where researchers wish to compare 2 sample means. For example:
– Is there a difference between the average lifetimes of 2 different brands of tires?
– Did the students from college A score better in a common exam compared with those from college B?
– How does the mean selling price of 4-room flats in one town compare with another one?
– Have the soldiers’ fitness levels improved after training?
• To answer the above questions, we would collect data for 2 samples and compare them by testing to see if there is a statistically significant difference between the means
Case 1: 2 Sample z-test
• If we sample from 2 normal populations that are independent of each other (meaning no relationship between the subjects in each sample), and the standard deviation of each variable is known, then we use z-test for comparing the 2 means:
$z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$
• If the population standard deviations are unknown, both sample sizes must be 30 or more, and σ is replaced with the sample standard deviation, s.
z-Test : Example 1
• The same physical fitness test was given to a group of 100 scouts and 144 guides. The maximum score was 30. The guides obtained a mean score of 26.81 and the scouts obtained a mean score of 27.53. Assuming that the fitness scores are normally distributed with a common population standard deviation of 3.48, test at the 95% confidence level whether the guides did not do as well as the scouts in the fitness test.
Solution:
– Let X1 be the guides’ score and let the population mean be µ1.
X1 ~ N (µ1, σ²)
– Let X2 be the scouts’ score and let the population mean be µ2.
X2 ~ N (µ2, σ²)
– Given σ = 3.48; n1 = 144, n2 = 100, X̄1 = 26.81, X̄2 = 27.53; 95% Confidence Level
z-Test : Example 1
Solution:
Hypothesis:
H0: µ1 - µ2 = 0
H1: µ1 - µ2 < 0 (1-tailed test since we are interested to find out if the guides did not perform as well as the scouts)
At α = 0.05, critical z-value is -1.645
Using:
$z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} = \frac{26.81 - 27.53}{\sqrt{\frac{3.48^2}{144} + \frac{3.48^2}{100}}} = -1.589$
Since z calculated is > -1.645, we do not reject H0. Thus there is no evidence, at 5% level, that the guides did not perform as well as the scouts in the fitness tests.
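The scouts and guides calculation can be verified with a short sketch. It assumes the values given in the example (common σ = 3.48, n1 = 144 guides, n2 = 100 scouts); the function name `two_sample_z` is hypothetical.

```python
from math import sqrt


def two_sample_z(mean1, mean2, sigma1, sigma2, n1, n2):
    """z = (mean1 - mean2) / sqrt(sigma1^2/n1 + sigma2^2/n2), sigmas known."""
    return (mean1 - mean2) / sqrt(sigma1**2 / n1 + sigma2**2 / n2)


z = two_sample_z(26.81, 27.53, 3.48, 3.48, 144, 100)  # guides minus scouts
reject_h0 = z < -1.645                                # left-tailed test at 5%
print(round(z, 3), reject_h0)
```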
Case 2: 2 Sample t-test
• If we sample from 2 independent normal populations with unknown variances and the sample sizes are small, then we use 2-sample t-test for comparing the 2 means:
$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$
where the degrees of freedom are equal to n1 + n2 – 2
• The above t-test assumes that the variances of the populations are equal.
Is it reasonable to assume same variances between populations for t-Test?
• In the t-test, the population variances are unknown so often we do not know if the variances can be assumed to be equal.
• If the population variances are very different, the 2-sample t-test may not be accurate as the results may be influenced by the difference in the variances.
• However, the 2-sample t-test is not overly sensitive to small differences between population variances so most of the time this test can be used.
2 Sample t-Test: One-sided Vs Two-sided
Test | Hypotheses | Rejection region (figure)
Left-tailed Test | H0: µ1 - µ2 ≥ 0, H1: µ1 - µ2 < 0 | Reject in the left tail (area .10)
Right-tailed Test | H0: µ1 - µ2 ≤ 0, H1: µ1 - µ2 > 0 | Reject in the right tail (area .10)
Two-tailed Test | H0: µ1 - µ2 = 0, H1: µ1 - µ2 ≠ 0 | Reject in both tails (area .05 each)
P-Value Method for Hypothesis Testing
• Hypothesis testing commonly uses a level of significance α of 0.05 or 0.1, which is the probability of a Type I error.
• P-value represents the calculated probability of getting the sample statistic or a more extreme value, assuming the null hypothesis is true. It is the actual area under the distribution curve beyond the test statistic.
Example
Use NORMSDIST(-1) function in Excel to find the one-sided P-value of the standard normal distribution (z = 1.0):
P(Z ≥ 1.0) = 0.159
(figure: standard normal curve with the tail area beyond Z = 1.0 shaded, P-value = 0.159)
Interpretation of P-Value
There are 2 ways to interpret the p-value. Assuming the null hypothesis is true, p-value is the probability of:
• Getting a test statistic like the one calculated or an even more extreme value
• Rejecting the null hypothesis when it is true
It answers this question – ‘To what extent does the data support the null hypothesis?’
The smaller the p-value, the less the data supports the null hypothesis.
Decision Making Based on P-Value
α Criteria
• If p-value is smaller than or equal to level of significance α, reject null hypothesis.
• If p-value is greater than level of significance α, do not reject null hypothesis
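The α criteria can be written as a two-line rule. The sketch below assumes a z test statistic and uses `statistics.NormalDist().cdf` in place of NORMSDIST; the function names `left_tail_p_value` and `decide` are hypothetical.

```python
from statistics import NormalDist


def left_tail_p_value(z):
    """Area under the standard normal curve to the left of z."""
    return NormalDist().cdf(z)


def decide(p_value, alpha):
    """Apply the alpha criteria: reject H0 when p-value <= alpha."""
    return "reject H0" if p_value <= alpha else "do not reject H0"


p = left_tail_p_value(-1.0)  # same area as NORMSDIST(-1)
print(round(p, 3), decide(p, 0.05))
```

By symmetry the same area is the right-tail p-value for z = 1.0, which is why NORMSDIST(-1) is used for that case.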
Conventional Interpretation of P-values
• P > 0.10: Result is not significant
• 0.05 < P < 0.10: Result is marginally significant
• 0.01 < P < 0.05: Result is significant
• P < 0.01: Result is highly significant
This is a rule-of-thumb interpretation without the need to set α value.
Proposed Solution
Let Sample 1 be Singapore youths’ instant messaging time, and Sample 2 be Korean youths’ instant messaging time
Hypothesis:
H0: µ1 - µ2 = 0
H1: µ1 - µ2 > 0 (1-tailed test since we are proving whether the report is true or not true)
Proposed Solution
Given, n1 = 20, n2 = 10, s1 = 100.85, s2 = 24.024, X̄1 = 268, X̄2 = 166.6
Using:
$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$
t-Statistic = 3.11 and p-value = 0.00214
[Excel function TDIST(3.11,28,1)]
Since p-value is << 0.05, reject the null hypothesis
I.e. the data support the report that Singapore youths spend the most time on instant messaging
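The pooled t-statistic for the instant messaging data can be reproduced from the summary values alone. This is a sketch assuming equal population variances, as the problem states; the function name `pooled_t` is hypothetical, and the p-value is left to Excel's TDIST because the Python standard library has no t-distribution.

```python
from math import sqrt


def pooled_t(mean1, mean2, s1, s2, n1, n2):
    """Equal-variance 2-sample t statistic and its degrees of freedom."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    t = (mean1 - mean2) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df


t, df = pooled_t(268, 166.6, 100.85, 24.024, 20, 10)
print(round(t, 2), df)
```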
Learning Outcomes
• To test the hypotheses about the difference
between two population means
• Test Statistic for the difference between two
means (independent normal populations)
– z-Test (known variances)
– 2 sample t-Test (unknown equal variances)
• p-Value calculation and significance
• 2 Sample t-Test
P12 – WHO TYPES FASTER
Who Types Faster
Some people believe that women in general can type faster than men since there are more female administrative staff compared to male ones. Others think that men have better hand-eye coordination and thus can type faster. Carry out a hypothesis test to determine if there is basis to further investigate those beliefs. You may carry out an experiment in your class and collect the relevant data by making use of typing tests provided in the following website: http://www.powertyping.com/typing_test/typing_test.shtml
A second thought about typing speed is that typing an article containing non-English names will decrease typing speed. To find out whether this is true, conduct another study using hypothesis testing. Should you use the same test statistic for the two hypothesis tests? That is, is there any difference between the two studies in relation to assumptions about the population distribution, the relationship of the samples and the parameter under testing?
Recall: Testing of 2 Sample Means
• If we are comparing the means of 2 independent normal populations with unknown variances and sampling sizes are small, the 2-sample t-test statistic can be applied:
$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$
where the degrees of freedom are equal to n1 + n2 – 2
• This t-test requires that the variances of the populations be equal.
What if population variances are different?
• Often there is reason to suspect that variances between 2 populations may be very different. For example, output of a newly set-up process Vs a long-run stable process.
• We may examine the sample variances. As a rule of thumb, if one sample variance is 4 times the other or more, then we cannot assume the population variances to be equal.
• In this case, a more appropriate test, called the Smith-Satterthwaite Test, can be used. It is also known as the 2-sample t-test with unequal variances.
Smith-Satterthwaite Test (2-Sample T-test with Unequal Variances)
• When comparing the means of 2 independent samples from normal populations whose variances are unknown and unequal, use the following test statistic:
$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$
which is a random variable that approximately follows a t-distribution with degrees of freedom equal to (round down to nearest integer):
$\nu = \frac{\left( \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} \right)^2}{\frac{(s_1^2 / n_1)^2}{n_1 - 1} + \frac{(s_2^2 / n_2)^2}{n_2 - 1}}$
T-Test with Unequal Variances: An Example
A researcher wants to determine whether the salaries of professional nurses employed in private hospitals are higher than those employed by government hospitals.
Data collected:
Private | Government
n1 = 10 | n2 = 8
s1 = $800 | s2 = $400
X̄1 = $26800 | X̄2 = $25700
At 99% confidence level, can it be concluded that the private hospitals pay more than the government ones?
T-Test with Unequal Variances: An Example
Let μ1 and μ2 be the average salaries of nurses in private and government hospitals respectively.
H0: μ1 = μ2 and H1: μ1 > μ2 (Right-tailed T-test)
Assuming variances are not equal,
$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{26800 - 25700}{\sqrt{\frac{800^2}{10} + \frac{400^2}{8}}} = 3.80$, DOF = 13
P-value is 0.0011, which is smaller than α = 0.01. Hence reject the null hypothesis. I.e. private hospitals pay nurses more than government ones.
Dependent or Related Samples
• Sometimes, samples under study are related or they
contain the same subjects but under different conditions.
In this case, the samples are not independent of each
other and we cannot use 2-sample t-test.
• Examples of dependent samples:
– Performance of workers before and after a training program
– Effectiveness of a drug on patients
– Comparison of IQ scores of pairs of children matched with the
same age (to block out the differences in scores due to age)
Independent vs. Related Populations
Independent
• Independent Data Sources
• Use Difference Between the 2 Sample Means: X̄1 - X̄2
Related
• Same Data Source
– Paired/Matched
– Repeated Measures (Before/After)
• Use Difference between Each Pair of Observations: Dn = X1n - X2n
Two Related Populations: Paired Sample t-Test
• The paired sample t-test is used to test means of 2 related populations
– Paired or Matched samples
– Repeated Measures (Before/After)
• Eliminates variation among subjects in the same sample
• Assumptions
– If the sample is small, the difference scores should be normally distributed
– Both Populations Are Normally Distributed
– If Not Normal, Can Be Approximated by Normal Distribution (n1 ≥ 30 & n2 ≥ 30)
Paired Sample t-Test Statistic
In paired sample t-test, we test the mean of the differences between each pair of subjects. The test statistic is:
$t = \frac{\bar{D} - \mu_D}{S_D / \sqrt{n}}$ with df = n - 1
Sample Mean: $\bar{D} = \frac{\sum_{i=1}^{n} D_i}{n}$
Sample Standard Deviation: $S_D = \sqrt{\frac{\sum_{i=1}^{n} (D_i - \bar{D})^2}{n - 1}}$
Paired-Sample t-Test: An Example
To ascertain the effectiveness of a training program, the following test score data is collected:
Name | Before Training | After Training
Sam | 85 | 94
Tamika | 94 | 87
Brian | 78 | 79
Mike | 87 | 88
At the 90% confidence level, determine the effectiveness of the training.
Paired-Sample t-Test: Calculation of Test Statistic
Name | Before | After | Difference
Sam | 85 | 94 | -9
Tamika | 94 | 87 | 7
Brian | 78 | 79 | -1
Mike | 87 | 88 | -1
Total | | | -4
$\bar{D} = \frac{-4}{4} = -1$
$S_D = \sqrt{\frac{(-9 - (-1))^2 + (7 - (-1))^2 + (-1 - (-1))^2 + (-1 - (-1))^2}{4 - 1}} = \sqrt{\frac{128}{3}} = 6.53$
Paired Sample t-Test : Solution
H0: μD ≥ 0 (D = B - A)
H1: μD < 0
α = 0.10
df = 4 - 1 = 3
Critical Value(s): -1.6377 (left tail, rejection area .10)
Test Statistic:
$t = \frac{\bar{D} - \mu_D}{S_D / \sqrt{n}} = \frac{-1 - 0}{6.53 / \sqrt{4}} = -0.306$
Decision: Do not reject H0 at α = 0.10
Conclusion: There is no evidence the training is effective.
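The paired calculation for the training scores can be reproduced directly from the raw data. This is a sketch with the hypothesized mean difference set to 0 and the critical value -1.6377 taken from the slide; the function name `paired_t` is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """Paired-sample t statistic for D = before - after, with df = n - 1."""
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    d_bar = mean(diffs)
    s_d = stdev(diffs)  # sample standard deviation (n - 1 divisor)
    return d_bar / (s_d / sqrt(n)), n - 1


before = [85, 94, 78, 87]  # Sam, Tamika, Brian, Mike
after = [94, 87, 79, 88]
t, df = paired_t(before, after)
reject_h0 = t < -1.6377    # left-tailed critical value for df = 3, alpha = 0.10
print(round(t, 3), df, reject_h0)
```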
Problem Statement
• “Some people believe that women in general can type faster
than men since there are more female administrative staff
compared to male ones. Others think that men have better
hand-eye coordination and thus type faster. Carry out a
hypothesis test to determine if there is basis to further
investigate those beliefs.”
• How should you formulate H0 and H1?
• Do you use z test or t-test?
• What assumptions do you make in this test?
Solution: 2 Sample t-Test with unequal variances
Let µ1 be the mean typing speed (words/min) for females
Let µ2 be the mean typing speed (words/min) for males
Test Hypotheses:
H0: µ1 - µ2 = 0
H1: µ1 - µ2 ≠ 0 (2-tailed test as the results may show men type faster!)
Are the population variances known? Are they the same?
No, we do not know what the population variances are or whether they are the same or not. Let us apply the 2 sample t-test with unequal variances.
Calculations: 2 Sample t-Test with unequal variances
Assume the following has been calculated from data collected:
x̄1 = 31.42
x̄2 = 27.64
s1 = 6.35
s2 = 4.54
n1 = n2 = 10
$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{31.42 - 27.64}{\sqrt{\frac{6.35^2}{8} + \frac{4.54^2}{8}}} = 1.37$, DOF = 16
• From Excel function TDIST(1.37,16,2), P(T>t or T<-t) = 0.190 (p-value)
• H0 cannot be rejected at level of significance α = 0.1
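Both unequal-variance examples (nurses' salaries and typing speeds) can be checked with one function. This is a sketch, assuming the Welch-Satterthwaite form of the degrees of freedom, which reproduces the DOF values of 13 and 16 quoted on the slides; the function name `welch_t` is hypothetical. With n1 = n2 = 10 as stated for the typing data, the sketch gives t of about 1.53 with 16 degrees of freedom, whereas the 1.37 on the slide corresponds to a divisor of 8 in the denominator. Either value is below the two-tailed critical value of 1.746 for 16 degrees of freedom at α = 0.1, so the conclusion is unchanged.

```python
from math import floor, sqrt


def welch_t(mean1, mean2, s1, s2, n1, n2):
    """Unequal-variance t statistic and degrees of freedom (rounded down)."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    t = (mean1 - mean2) / sqrt(v1 + v2)
    dof = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, floor(dof)


t_nurses, dof_nurses = welch_t(26800, 25700, 800, 400, 10, 8)
t_typing, dof_typing = welch_t(31.42, 27.64, 6.35, 4.54, 10, 10)
print(round(t_nurses, 2), dof_nurses)   # nurses' salaries
print(round(t_typing, 2), dof_typing)   # typing speeds with n1 = n2 = 10
```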
Paired t-Test
• How can we prove if typing an article containing non-English names affects the typing speed?
• In this case, since different people have different typing speeds, we should compare the speed of the same person typing two articles, one with and the other without non-English names.
• Paired t-test gives a more accurate result here as it is able to detect the differences (Xi1 – Xi2) to a greater extent compared with 2 sample t-test.
Person Article 1 typing speed Article 2 typing speed
1 X11 X12
2 X21 X22
3 X31 X32
4 X41 X42
Summary of Learning Outcomes
• Perform t-test for 2 independent samples with unequal variances
• Understand the difference between independent and related (dependent) populations
• Perform paired t-test for 2 dependent samples
P13 – TEST IT FOR FAIRNESS
Test it for Fairness
You have just developed a program that simulates a six-sided die. You are thinking of selling it commercially as an embedded software application or an online tool. Before you launch it, you want to test whether the die is really fair. How would you go about conducting a statistical test to decide whether the die program you have developed is fair?
Test a Distribution for Goodness-of-Fit using Chi-Square
• Previously, we used statistical hypothesis tests on single population parameters.
• For today’s problem, we use a statistical hypothesis test to determine whether a population has a specified theoretical distribution.
• This test is based on how good a fit we have between:
– the frequency of occurrence of observations in an observed
sample
– the expected frequencies obtained from the hypothesized
distribution
The Multinomial Experiment
• The experiment consists of n identical, independent trials.
• The outcome of each trial falls in one of k categories.
• The probabilities associated with the k outcomes, denoted by π1, π2, …, πk, remain the same from trial to trial. Since there are only k possible outcomes, we have π1 + π2 + … + πk = 1
• The experimenter records the values of o1, o2, …, ok, where oj (j = 1, 2, …, k) is equal to the observed number of trials in which the outcome is in category j.
Note that n = o1 + o2 + … + ok
The χ² Test Statistic
The χ² test statistic measures the amount of
disagreement between the observed data and
the expected data.
\[ \chi^2 = \sum_j \frac{(o_j - e_j)^2}{e_j} \]
where the sum is over all categories, with oj being
the observed frequency count and ej the
expected frequency count in category j.
Test Statistic and its Applicability
Test Statistic:
\[ \chi^2 = \sum_j \frac{(o_j - e_j)^2}{e_j} \]
with degrees of freedom equal to the number of categories minus 1 (right-tailed test), where
oj = observed frequency
ej = expected frequency
Assumptions for Chi-Square Goodness-of-Fit Test
1. The experiment satisfies the properties of a multinomial experiment.
2. No expected cell count, ej, is less than 5.
Typical χ² Density Curve
The curve begins at zero and is skewed right. As the degrees
of freedom increase, the distribution stretches out along the
horizontal axis.
Step 1: State the Null and Alternative Hypotheses
H0: Newly developed die is fair
H1: Newly developed die is not fair
Step 2: Compute Expected (Ei) and Observed (Oi) Frequencies
Step 3: Decide on Rejection Criterion
Degrees of freedom
= Number of classes – number of restrictions
= 6 – 1 = 5
Test at the 5% significance level:
Reject H0 if χ²(calc) > χ²(5%, 5),
i.e. if χ²(calc) > 11.07
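The goodness-of-fit steps can be sketched in Python as follows. The observed counts are hypothetical (60 simulated rolls, invented purely for illustration) because the frequency table from Step 2 is not reproduced here; the critical value 11.07 is the one quoted for 5 degrees of freedom at the 5% level, and the function name chi_square_gof is an assumption.

```python
def chi_square_gof(observed, probabilities):
    """Return (chi-square statistic, expected counts) for a goodness-of-fit test."""
    n = sum(observed)
    expected = [n * p for p in probabilities]
    # Assumption 2 of the test: no expected cell count below 5
    if min(expected) < 5:
        raise ValueError("every expected count must be at least 5")
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, expected


# Hypothetical counts of faces 1..6 in 60 rolls (NOT the course data)
observed = [8, 12, 9, 11, 10, 10]
fair_die = [1 / 6] * 6  # H0: each face has probability 1/6

statistic, expected = chi_square_gof(observed, fair_die)
CRITICAL_VALUE = 11.07  # chi-square(5%, 5) as quoted on the slide
reject_h0 = statistic > CRITICAL_VALUE  # right-tailed test

print(round(statistic, 2), "Reject H0:", reject_h0)
```

For these hypothetical counts the statistic is small, so H0 would not be rejected; real data from the die program would replace the observed list.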
χ² Distribution with 5 Degrees of Freedom
[Figure: χ² density curve with 5 degrees of freedom; the rejection region of area α = 0.05 lies to the right of χ²(0.05, 5) = 11.07]
Step 4: Conclusion
Since χ²(calc) = 2.12 < 11.07,
H0 is not rejected.
Conclusion:
At the 5% significance level there is no evidence that the newly developed die is unfair, so you can go ahead and release it commercially.
Learning Outcomes
The Chi-Square Goodness-of-Fit Test
• Understand the basic properties of the multinomial experiment
• Know how to calculate the expected number of outcomes to fall in categories of a multinomial experiment
• Know the assumptions required for a chi-square goodness-of-fit test
• Know how to conduct a chi-square goodness of fit test
P14 – Testing with Signs
Testing with Signs
Your statistics facilitator thinks that RP students have less sleep than a typical person in Singapore. Can you test whether this is true with reasonable statistical confidence, if the median duration of sleep in Singapore is known to be 7 hours? Note that the distribution of sleep duration cannot be assumed to be normal. Take a small sample from your class and perform a sign test to test the claim.
Nonparametric Tests
• Statistical tests such as z, t and F tests are called
parametric tests
• Parametric tests require the assumption that
sampling populations are normally distributed
• In situations where population distribution is not
normal, nonparametric (or distribution-free) tests
can be used
Pros and Cons of Nonparametric Tests
Advantages
• Variables under test need not be normally distributed
• Can be used to test hypotheses that do not involve population parameters, such as the randomness of a sample or the relationship between 2 samples
• Computations are generally easier compared with parametric tests
Disadvantages
• Less sensitive than parametric tests when the normality assumption is met. Thus, larger differences are needed before the null hypothesis can be rejected
• Use less information than parametric tests
• Less efficient, in the sense that a larger sample size is required to overcome the loss of information
Sign Test
• The sign test is used to test a hypothesized value of the population median using a sample
• An alternative to the 1-sample t-test or paired t-test
• Can be used for small sample sizes
• Assigns a ‘+’ to sample values above the hypothesized median value and a ‘-’ to sample values below it
• Does not account for the size of the difference between the data values and the median, only its sign
Procedure in Sign Test
1. From the problem, identify the claim.
2. State the null (H0) and alternative (H1) hypotheses and the significance level α.
3. For a single-sample test, compare each value with the hypothesized median. If the value is larger, replace it with a ‘+’ sign. If it is smaller, replace it with a ‘-’ sign. If equal, discard the value.
   For a paired-sample test, subtract each after value from the before value and mark the difference with a ‘+’ or ‘-’ sign or 0. Discard the ‘0’ value(s).
4. Count the number of ‘+’ signs (r) and the total number of signs (n).
5. Compute the p-value based on the binomial distribution with n, r and p = 0.5: P(X<=r) if H1 contains ‘<’, P(X>=r) if H1 contains ‘>’, while the p-value for a 2-sided test is twice the smaller of these two probabilities.
6. If the p-value is smaller than the significance level α, reject the null hypothesis. Conclude appropriately.
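The p-value step of the procedure can be sketched in Python using only the standard library. The function names and the labels "less", "greater" and "two-sided" are illustrative choices, and the counts in the usage lines (2 ‘+’ signs out of 10) are hypothetical.

```python
from math import comb


def binom_cdf(r, n, p=0.5):
    """P(X <= r) for X ~ Binomial(n, p); returns 0 when r < 0."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(r + 1))


def sign_test_p_value(r, n, alternative):
    """p-value of a sign test with r '+' signs out of n non-tied signs."""
    lower = binom_cdf(r, n)            # P(X <= r), used when H1 contains '<'
    upper = 1.0 - binom_cdf(r - 1, n)  # P(X >= r), used when H1 contains '>'
    if alternative == "less":
        return lower
    if alternative == "greater":
        return upper
    if alternative == "two-sided":
        return min(1.0, 2 * min(lower, upper))
    raise ValueError("alternative must be 'less', 'greater' or 'two-sided'")


# Hypothetical example: 2 '+' signs among 10 signs
p_less = sign_test_p_value(2, 10, "less")
p_two_sided = sign_test_p_value(2, 10, "two-sided")
print(round(p_less, 4), round(p_two_sided, 4))
```

Capping the two-sided value at 1 guards against the doubled probability exceeding 1 when r is close to n/2.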
Example 1: One-sample Sign Test
A researcher read that the median age for
viewers of the Singapore Idol show is 20 years.
To test the claim, 80 viewers were surveyed, and
30 were under the age of 20 years old while
exactly 4 were 20 years old.
At α = 0.05, test the claim. Give one reason why
an advertiser might like to know the result of this
study.
Example 1: Solution
The claim under test is that the median age of viewers (ū) is 20 years.
Null Hypothesis: H0: ū = 20
Alternative Hypothesis: H1: ū ≠ 20
Letting n = 76 (4 values are discarded), x = 30, p = 0.5, the probability of getting 30 or fewer ‘-’ signs is:
\[ P(X \le 30) = \sum_{r=0}^{30} \binom{76}{r} (0.5)^r (1 - 0.5)^{76 - r} = 0.0423 \]
The p-value is 0.0423 × 2 = 0.0846, as this is a 2-tailed test.
Since the p-value is greater than α = 0.05, there is not enough evidence to reject the null hypothesis, so the claim that the median age of viewers is 20 years is not rejected.
Example 2: Paired-sample Sign Test
The following are the average weekly losses of
worker-hours due to accidents at 13 industrial
sites before and after a certain safety program
was put into operation:
23 and 35, 41 and 30, 20 and 8, 28 and 35, 45 and 24, 83
and 77, 26 and 24, 17 and 11, 55 and 58, 29 and 25, 15
and 10, 28 and 22, and 37 and 35.
Use 0.05 level of significance to test whether the
safety program is effective.
Example 2: Solution
Let ūd be the median difference in worker-hours lost (before − after the program).
Null Hypothesis: H0: ūd = 0 (safety program is not effective)
Alternative Hypothesis: H1: ūd > 0 (safety program is effective)
The 13 sample pairs yield: - + + - + + + + - + + + +
Letting n = 13, x = 10, p = 0.5, the probability of getting 10 or more ‘+’ signs is:
\[ P(X \ge 10) = 1 - P(X \le 9) = 1 - \sum_{r=0}^{9} \binom{13}{r} (0.5)^r (1 - 0.5)^{13 - r} = 0.0461 \]
The p-value is 0.0461 (1-tailed test).
Since the p-value is smaller than α = 0.05, reject the null hypothesis and conclude that the safety program is effective.
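As a check on Example 2, the signs and the p-value can be derived directly from the 13 (before, after) pairs. This is a sketch using only the standard library; the variable names are illustrative.

```python
from math import comb

# (before, after) weekly worker-hours lost at the 13 industrial sites
pairs = [(23, 35), (41, 30), (20, 8), (28, 35), (45, 24), (83, 77), (26, 24),
         (17, 11), (55, 58), (29, 25), (15, 10), (28, 22), (37, 35)]

# '+' when before - after > 0 (fewer hours lost after); ties would be discarded
signs = ["+" if before > after else "-" for before, after in pairs if before != after]
n = len(signs)
r = signs.count("+")

# Right-tailed p-value: P(X >= r) for X ~ Binomial(n, 0.5)
p_value = sum(comb(n, k) for k in range(r, n + 1)) / 2 ** n

print(" ".join(signs))
print(n, r, round(p_value, 4))
```

Because p = 0.5, every outcome has probability 1/2^n, so the p-value reduces to a count of favourable combinations divided by 2^n.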
Problem Statement: Solution Example
Suppose data (average sleep duration in hours) from 16 students are collected as follows:
6.5 7 6 5
5.5 5 6 6
6.5 7 8 6.5
5.5 5 7.5 6
Problem Statement: Solution Example
1. The claim is that RP students typically get less than 7 hours of sleep every night, i.e. their median sleep duration (ū) is below 7 hours.
2. Null Hypothesis:
H0: ū = 7
Alternative Hypothesis:
H1: ū < 7
3. Converting values into positive and negative signs, we have 2 ‘+’, 12 ‘-’ and 2 discarded values (tie)
Problem Statement: Solution Example
4. Letting n = 14, x = 2, p = 0.5, the probability of getting 2 or fewer ‘+’ signs is
\[ P(X \le 2) = \sum_{r=0}^{2} \binom{14}{r} (0.5)^r (1 - 0.5)^{14 - r} = 0.0065 \]
The answer is calculated from the Excel function BINOMDIST(2,14,0.5,1)
5. Since the p-value is 0.0065 < α = 0.01, reject the null hypothesis
6. At the 1% significance level, we conclude that RP students have a median of less than 7 hours of sleep per night.
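The same result can be reproduced from the 16 recorded sleep durations. This Python sketch mirrors what BINOMDIST(2,14,0.5,1) computes; the variable names are illustrative.

```python
from math import comb

# Average sleep duration (hours) of the 16 students in the example
sleep_hours = [6.5, 7, 6, 5,
               5.5, 5, 6, 6,
               6.5, 7, 8, 6.5,
               5.5, 5, 7.5, 6]
HYPOTHESIZED_MEDIAN = 7

plus = sum(1 for h in sleep_hours if h > HYPOTHESIZED_MEDIAN)   # '+' signs
minus = sum(1 for h in sleep_hours if h < HYPOTHESIZED_MEDIAN)  # '-' signs
ties = len(sleep_hours) - plus - minus                          # discarded values
n = plus + minus

# Left-tailed p-value: P(X <= plus) for X ~ Binomial(n, 0.5)
p_value = sum(comb(n, k) for k in range(plus + 1)) / 2 ** n

ALPHA = 0.01
print(plus, minus, ties, round(p_value, 4), "Reject H0:", p_value < ALPHA)
```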
Learning Outcomes
The Sign Test (Nonparametric Test)
• Understand when to apply nonparametric tests
• Know how to apply the Sign Test for small sample size where the normality assumption is not valid
• Know that the sign test is used to test population median for both one-sample and paired-sample tests
P15 – CHOCOLATE ADVERTISEMENTS
Chocolate Advertisements
You are working in an advertising company. Recently, your company won a contract to design a poster advertisement to attract people to try a new series of chocolates launched by your customer. To kick off the project, you have been tasked to look into current advertisement posters by other companies and evaluate their attractiveness. You have identified the 3 posters below. Conduct an investigation to evaluate whether there is any significant difference in the attractiveness of the 3 advertisement posters.
[Images: Poster 1, Poster 2, Poster 3]
ANOVA
What is ANOVA?
Analysis of Variance (ANOVA) provides the tools to compare the means of several
populations with a single test.
The role of ANOVA is to perform a numerical test of significance that will test the equality
of all the means.
Underlying Assumptions for ANOVA
The F distribution is also used for testing whether two or more
sample means come from populations with the same mean.
This technique is called Analysis of Variance or ANOVA.
ANOVA requires the following conditions:
– The sampled populations follow the normal distribution.
– The populations have equal standard deviations.
– The samples are randomly selected and are
independent.
The F-Statistic
ANOVA is a procedure that compares the variability
between the samples to the variability within the
samples by computing the ratio
\[ F = \frac{\text{variance between the samples}}{\text{variance within the samples}} \]
The F-statistic is a numerical measure of how much
the sample means differ.
Characteristics of the F Distribution
• Each member of the family is determined by two
parameters: the numerator degrees of freedom and
the denominator degrees of freedom.
• F cannot be negative, and is a continuous
distribution.
• The F distribution is positively skewed.
• Its value ranges from 0 to ∞. As F → ∞, the
curve approaches the x-axis.
Procedure for the Analysis of Variance
• Null Hypothesis:
– Population means are the same.
• Alternative Hypothesis:
– At least one of the means is different.
• The test statistic follows the F distribution.
• Decision rule is to reject the null hypothesis if
Fcalculated > Fcritical
Procedure for the Analysis of Variance
• For k populations sampled, the numerator degrees of freedom is (k – 1).
• For a total of n observations, the denominator degrees of freedom is (n – k).
• The test statistic is computed by:
\[ F = \frac{MS(Tr)}{MSE} = \frac{SS(Tr)/(k - 1)}{SSE/(n - k)} \]
where MS(Tr) is the Mean Square for Treatments and MSE is the Mean Square Error
Procedure for the Analysis of Variance
• SS(Tr) is the Treatment Sum of Squares.
\[ SS(Tr) = \sum \frac{T_c^2}{n_c} - \frac{(\sum X)^2}{n} \]
where
Tc is the column total
nc is the number of observations in each column
ΣX is the sum of all the observations
n is the total number of observations
Procedure for the Analysis of Variance
• SST is the Total Sum of Squares
\[ SST = \sum X^2 - \frac{(\sum X)^2}{n} \]
• SSE is the Sum of Squares Error
\[ SSE = SST - SS(Tr) \]
Example 1
Specializing in meals for the elderly, a restaurant recently introduced vegetarian porridge at three of its branches.
Data on the number of bowls of vegetarian porridge ordered were collected over a period of 5 days. Assuming a 5% level of significance, determine whether there is a difference in the mean number of bowls ordered per day at the three branches.
Branch 1 Branch 2 Branch 3
Day 1 13 10 18
Day 2 12 12 16
Day 3 14 13 17
Day 4 12 11 17
Day 5 14 13 17
Example 1: Proposed Solution
• SST:
\[ SST = \sum X^2 - \frac{(\sum X)^2}{n} = 2999 - \frac{209^2}{15} = 86.93 \]
• SS(Tr):
\[ SS(Tr) = \sum \frac{T_c^2}{n_c} - \frac{(\sum X)^2}{n} = \frac{65^2}{5} + \frac{59^2}{5} + \frac{85^2}{5} - \frac{(209)^2}{15} = 74.13 \]
• SSE:
SSE = SST – SS(Tr)
= 86.93 – 74.13
= 12.8
Example 1: Proposed Solution (continued)
• Step 1:
H0: The mean numbers of bowls of porridge sold at the 3 branches are the same
H1: The mean numbers of bowls of porridge sold at the 3 branches are not all the same
• Step 2:
– H0 is rejected if F > Fcritical
– Fcritical = 3.89 at the 5% significance level, as there are 2 df in the numerator and 12 df in the denominator
Example 1: Proposed Solution (continued)
• Calculating the value of F:
\[ F = \frac{SS(Tr)/(k - 1)}{SSE/(n - k)} = \frac{74.13/(3 - 1)}{12.8/(15 - 3)} = 34.75 \]
• Since F = 34.75 > Fcritical = 3.89, the decision is to reject the null hypothesis.
• The mean number of bowls of vegetarian porridge sold at the three branches is not the same.
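The sums of squares and the F statistic for Example 1 can be recomputed from the raw table with a short Python sketch. The function name one_way_anova is illustrative, and the critical value 3.89 is taken from the example rather than computed. Carried to full precision, the calculation gives SST of about 86.93, SS(Tr) of about 74.13, SSE = 12.8 and F = 34.75.

```python
def one_way_anova(groups):
    """Return (SST, SS(Tr), SSE, F) for a list of samples (single-factor ANOVA)."""
    all_obs = [x for g in groups for x in g]
    n, k = len(all_obs), len(groups)
    correction = sum(all_obs) ** 2 / n                 # (sum of X)^2 / n
    sst = sum(x * x for x in all_obs) - correction     # total sum of squares
    ss_tr = sum(sum(g) ** 2 / len(g) for g in groups) - correction  # treatments
    sse = sst - ss_tr                                  # error sum of squares
    f = (ss_tr / (k - 1)) / (sse / (n - k))            # MS(Tr) / MSE
    return sst, ss_tr, sse, f


# Bowls of vegetarian porridge ordered on Days 1 to 5 at each branch
branch_1 = [13, 12, 14, 12, 14]
branch_2 = [10, 12, 13, 11, 13]
branch_3 = [18, 16, 17, 17, 17]

sst, ss_tr, sse, f = one_way_anova([branch_1, branch_2, branch_3])
F_CRITICAL = 3.89  # from the example: 2 and 12 df at the 5% level

print(round(sst, 2), round(ss_tr, 2), round(sse, 2), round(f, 2))
print("Reject H0:", f > F_CRITICAL)
```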
Inferences About Treatment Means
When the null hypothesis that the means are
equal is rejected, it may be necessary to know
which treatment means differ.
One of the simplest procedures to determine
this is through the use of confidence intervals.
Confidence Interval for the Difference Between Two Means
\[ (\bar{X}_1 - \bar{X}_2) \pm t \sqrt{MSE \left( \frac{1}{n_1} + \frac{1}{n_2} \right)} \]
where
• t is obtained from the t table with degrees of freedom (n - k),
• MSE (Mean Square Error) = [SSE/(n - k)]
Example 2
Continuing from Example 1, develop a 95% confidence
interval for the difference in the mean number of bowls of
vegetarian porridge sold in Branch 2 and Branch 3.
Can management conclude that there is a difference
between the two branches?
Example 2: Proposed Solution
• Confidence Interval:
(17 – 11.8) ± 2.179 √(1.067(1/5 + 1/5))
= 5.2 ± 1.424
= (3.776, 6.624)
Since zero is not in the interval, conclude that this pair of means differs.
Hence, the mean number of bowls of vegetarian porridge sold in Branch 2 is different from that in Branch 3.
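The interval in Example 2 can be reproduced with a small Python helper. The function name mean_difference_ci is illustrative; the t value 2.179 (12 degrees of freedom, 95% confidence) is the one used in the example and is passed in rather than computed.

```python
import math


def mean_difference_ci(mean_a, mean_b, mse, n_a, n_b, t_crit):
    """Confidence interval for (mean_a - mean_b) using the pooled MSE from ANOVA."""
    margin = t_crit * math.sqrt(mse * (1 / n_a + 1 / n_b))
    diff = mean_a - mean_b
    return diff - margin, diff + margin


mse = 12.8 / 12  # SSE / (n - k) from Example 1, about 1.067
low, high = mean_difference_ci(17, 11.8, mse, 5, 5, t_crit=2.179)

print(round(low, 3), round(high, 3))
print("Means differ:", not (low <= 0 <= high))
```

The means 17 and 11.8 are the Branch 3 and Branch 2 averages (85/5 and 59/5), so the interval is for Branch 3 minus Branch 2.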
Today’s Problem
• H0: The mean scores for the 3 posters are the same
• H1: The mean scores for the 3 posters are not all the same
• Critical value = 3.124
• p-value = 0.0000056
• Since F = 14.37 > 3.124 and p-value < 0.05, we reject the null hypothesis and conclude that there is a significant difference in the mean scores for the 3 posters.
Today’s Problem
Using Excel, the ANOVA output is as follows:
Anova: Single Factor
SUMMARY
Groups     Count   Sum   Average   Variance
Poster 1     25    117    4.68     0.476666667
Poster 2     25    145    5.8      0.583333333
Poster 3     25    137    5.48     0.676666667
ANOVA
Source of Variation   SS      df   MS            F             P-value       F crit
Between Groups        16.64    2   8.32          14.37236084   5.59715E-06   3.123907449
Within Groups         41.68   72   0.578888889
Total                 58.32   74
Key: Between Groups SS = SS(Tr), df = k − 1, MS = MS(Tr); Within Groups SS = SSE, df = n − k, MS = MSE; Total SS = SST; F = F statistic.
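The ANOVA rows of the Excel output can be rebuilt from the SUMMARY rows alone, which is a useful cross-check when the raw scores are not at hand. This Python sketch assumes the standard identities SS(Tr) = Σ n_c (mean_c − grand mean)² and SSE = Σ (n_c − 1) × variance_c; the variable names are illustrative.

```python
# (count, average, variance) for each poster, copied from the SUMMARY table
summary = {
    "Poster 1": (25, 4.68, 0.476666667),
    "Poster 2": (25, 5.8, 0.583333333),
    "Poster 3": (25, 5.48, 0.676666667),
}

n = sum(count for count, _, _ in summary.values())  # total observations
k = len(summary)                                    # number of groups
grand_mean = sum(count * mean for count, mean, _ in summary.values()) / n

# Between-groups (treatment) and within-groups (error) sums of squares
ss_between = sum(count * (mean - grand_mean) ** 2 for count, mean, _ in summary.values())
ss_within = sum((count - 1) * var for count, _, var in summary.values())

ms_between = ss_between / (k - 1)  # MS(Tr)
ms_within = ss_within / (n - k)    # MSE
f_statistic = ms_between / ms_within

print(round(ss_between, 2), round(ss_within, 2), round(f_statistic, 2))
```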
Today’s Problem
• 95% confidence interval for the difference between the mean scores for Poster 2 and Poster 1:
\[ (\bar{X}_2 - \bar{X}_1) \pm t \sqrt{MSE \left( \frac{1}{n_2} + \frac{1}{n_1} \right)} \]
= 0.69 to 1.55
• Since 0 is not in the interval, we conclude that this pair of means differs. We further conclude that the mean score for Poster 2 is significantly higher than that for Poster 1.
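The interval for Poster 2 versus Poster 1 can be checked numerically. In this Python sketch the means, group sizes and MSE come from the Excel output; the t value of about 1.993 (72 degrees of freedom, 95% confidence) is not given in the slides and is an assumed t-table value.

```python
import math

mean_poster_2, mean_poster_1 = 5.8, 4.68  # averages from the Excel SUMMARY
n_2 = n_1 = 25                            # scores per poster
mse = 0.578888889                         # Within Groups MS from the ANOVA table
T_CRIT = 1.993                            # assumed t-table value for df = 72

diff = mean_poster_2 - mean_poster_1
margin = T_CRIT * math.sqrt(mse * (1 / n_2 + 1 / n_1))
low, high = diff - margin, diff + margin

print(round(low, 2), "to", round(high, 2))
```

Because the whole interval lies above zero, the sketch agrees with the conclusion that Poster 2 scores higher on average than Poster 1.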
Learning Outcomes
• What is Analysis of Variance (ANOVA)?
• Characteristics of the F Distribution
• Test for Equal Means (single factor ANOVA test)
• Underlying Assumptions for ANOVA
• Confidence Interval for the Difference
Between Two Means