MATH408: Probability & Statistics Summer 1999 WEEK 2 Dr. Srinivas R. Chakravarthy Professor of Mathematics and Statistics Kettering University (GMI Engineering & Management Institute) Flint, MI 48504-4898 Phone: 810.762.7906 Email: [email protected] Homepage: www.kettering.edu/~schakrav


MATH408: Probability & Statistics
Summer 1999

WEEK 2

Dr. Srinivas R. Chakravarthy
Professor of Mathematics and Statistics

Kettering University
(GMI Engineering & Management Institute)

Flint, MI 48504-4898
Phone: 810.762.7906

Email: [email protected]
Homepage: www.kettering.edu/~schakrav


SAMPLE

• Sample: is a subset (part) of the population.

• Since it is infeasible (and impossible in many cases) to study the entire population, one has to rely on samples to make the study.

• Samples have to be as representative as possible in order to make valid conclusions about the populations under study.


SAMPLE (cont'd)

• A representative sample contains more or less the same type of information that the population has.

• For example, if workers from three shifts are involved in assembling cars of a particular model, then the sample should contain units from all three shifts.

• Samples will be used to “estimate” the parameters.


SAMPLE (cont’d)

• Much care should be devoted to the sampling.

• There is always going to be some error involved in making inferences about the populations based on the samples.

• The goal is to minimize this error as much as possible.

• There are many ways of introducing systematic bias (consistently misrepresenting the population).


SAMPLE (cont’d)

• Systematic bias can be avoided by taking random samples.

• Simple random sample: all units are equally likely to be selected.

• Multi-stage sample: units are selected in several stages.

• Cluster sample: is used when there is no list of all the elements in the population and the elements are clustered in larger units.


SAMPLE (cont’d)

• Stratified sample: In cases where the population under study may be viewed as comprising different groups (strata) and where the elements in each group are more or less homogeneous, we randomly select elements from every one of the strata.

• Convenience sample: samples are taken based on convenience of the experimenter.

• Systematic sample: units are taken in a systematic way such as selecting every 10th item after selecting the first item at random.
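The sampling schemes listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the car IDs, the three shift groups, the sample sizes, and the function names are all hypothetical and do not come from the course material.

```python
import random

def simple_random_sample(units, n, rng):
    """Draw n distinct units; every unit is equally likely to be selected."""
    return rng.sample(units, n)

def systematic_sample(units, step, rng):
    """Pick a random start among the first `step` units, then every step-th unit."""
    start = rng.randrange(step)
    return units[start::step]

def stratified_sample(strata, n_per_stratum, rng):
    """Draw a simple random sample of fixed size from every stratum."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

rng = random.Random(408)          # fixed seed so the sketch is repeatable
cars = list(range(1, 91))         # hypothetical car IDs 1..90
shifts = {"shift 1": cars[:30], "shift 2": cars[30:60], "shift 3": cars[60:]}

srs = simple_random_sample(cars, 9, rng)
sys_sample = systematic_sample(cars, 10, rng)   # every 10th car
strat = stratified_sample(shifts, 3, rng)       # 3 cars from each shift
```

The stratified version guarantees that all three shifts are represented, which a simple random sample of the same size does not.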


HOW TO USE SAMPLES?

• Samples should represent the population.

• A random sample will not always be an exact copy of the population.

• Thus, there is bound to be some error:


SAMPLES (cont’d)

• Random or unbiased error: This is due to the random selection of the sample and the mean of such error will be 0 as positive deviation and negative deviation cancel out. This random error is also referred to as random deviation and is measured by the standard deviation of the estimator.

• Non-random or biased error: This arises from several sources, such as humans, machines, and mistakes in copying, punching, or recording the data. Through careful planning we should try to avoid or minimize this error.
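The difference between the two kinds of error can be illustrated by listing every possible sample from a tiny population. The population values and the 0.5 gauge offset below are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
from itertools import combinations
from statistics import mean, pstdev

population = [2, 4, 6, 8, 10]     # hypothetical population values
mu = mean(population)

# Sampling error of the sample mean for every possible sample of size 2.
errors = [mean(s) - mu for s in combinations(population, 2)]

avg_error = mean(errors)          # random error: positive and negative deviations cancel
spread = pstdev(errors)           # standard deviation of the estimator

# A systematic error, e.g. a gauge that reads 0.5 too high, shifts every
# estimate the same way and therefore does not average out.
biased_errors = [e + 0.5 for e in errors]
avg_biased_error = mean(biased_errors)
```

Here `avg_error` is zero while `avg_biased_error` stays at the size of the offset, however many samples are taken.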


EXPERIMENTS USING MINITAB

• We will illustrate the concepts of sample, sampling error, etc with practical data using MINITAB when we go to the laboratory next time.


NEXT?

• Once the data has been gathered, what do we do next?

• Before any formal statistical inference through estimation or test of hypotheses is conducted, EDA should be employed.


EXPLORATORY DATA ANALYSIS (EDA)

• This is a procedure by which the data are carefully examined for patterns, if any, in order to isolate them.

• It is the first step in identifying an appropriate model.


EDA (cont'd)

• The main difference between EDA and conventional data analysis is:

– while the former, which is more flexible (in terms of any assumptions on the nature of the populations from which the data are gathered), emphasizes searching for evidence and clues for the patterns, the latter concentrates on evaluating the evidence and the hypotheses on the nature of the parameters of the population(s) under study.


CAPABILITY ANALYSIS

• Deals with the study of the ability of the process to manufacture products within specifications.

• In order to perform the capability analysis, the process must be stable (i.e., things such as warm up period needed on the process before manufacturing products and others should be taken care of).


CAPABILITY ANALYSIS (cont'd)

• The process specifications are compared to the variance (or the spread) of the process.

• For a process to be more capable, more measurements would be expected to fall within the specifications.


CAPABILITY ANALYSIS (cont'd)

A commonly used capability index is given by

$$C_{pk} = \min\left\{ \frac{\bar{x} - LSL}{3s},\; \frac{USL - \bar{x}}{3s} \right\}$$


CAPABILITY ANALYSIS (cont'd)

• The larger the value of Cpk, the less evidence that the process is outside the specifications. A value of 1.5 or higher for Cpk is usually desired. More on this will be seen later.
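A minimal sketch of the index, assuming the usual definition of Cpk as the smaller of (x̄ − LSL)/(3s) and (USL − x̄)/(3s), where x̄ and s are the sample mean and standard deviation and LSL and USL are the lower and upper specification limits. The measurements and limits are hypothetical.

```python
from statistics import mean, stdev

def cpk(data, lsl, usl):
    """C_pk = min((xbar - LSL) / (3 s), (USL - xbar) / (3 s))."""
    xbar = mean(data)
    s = stdev(data)               # sample standard deviation (n - 1 divisor)
    return min((xbar - lsl) / (3 * s), (usl - xbar) / (3 * s))

# Hypothetical measurements and specification limits, for illustration only.
measurements = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.9, 10.1]
index = cpk(measurements, lsl=9.4, usl=10.6)
```

Taking the minimum means the index is driven by whichever specification limit lies closer to the process mean, so an off-center process is penalized even when its spread is small.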


DESCRIPTIVE STATISTICS

• Deals with characterization and summary of key observations from the data.

• Quantitative measures: mean, median, mode, standard deviation, percentiles, etc.

• Graphs: histogram, Box plot, scatter plot, Pareto diagram, stem-and-leaf plot, etc.

• Here one has to be careful in interpreting the numbers. Usually more than one descriptive measure will be used to assess the problem on hand.


DIFFERENT TYPES OF PLOTS

• Point plot: Draw a horizontal axis (x-axis) covering the range of the data values and plot each point above it, stacking any repeated values vertically.

• Time series plot: The x-axis corresponds to the observation number or the time (or day) of the observation, and the y-axis corresponds to the value of the observation.


Time-series plot


PLOTS (cont'd)

• Scatter plot: Construct an x-axis and a y-axis that cover the ranges of the two variables. Plot the points $(x_i, y_i)$ for each observation in the data set.

• Histogram: This is a bar graph in which the data are grouped into classes. The x-axis corresponds to the classes and the y-axis gives the frequency of the observations in each class.


Histogram


PLOTS (cont'd)

• Stem-and-leaf plot: Data are plotted in such a way that the output looks like a histogram and also features a frequency distribution. The idea is to use the digits of the data to illustrate its range, shape and density. Each observation is split into leading digits and trailing digits. All the leading digits are sorted and listed to the left of a vertical line. The trailing digits are written to the right of the vertical line.
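The splitting into leading and trailing digits can be sketched as follows. This is a simplified version that assumes non-negative data and one leaf digit per observation. The function name and the data are hypothetical, and statistical packages typically add refinements such as depth counts and split stems.

```python
from collections import defaultdict

def stem_and_leaf(values, leaf_unit=1):
    """Return {stem: sorted leaf digits}; digits below the leaf unit are dropped."""
    rows = defaultdict(list)
    for v in sorted(values):
        scaled = int(v // leaf_unit)          # truncate to the leaf unit
        rows[scaled // 10].append(scaled % 10)
    return dict(rows)

# Hypothetical data, for illustration only.
data = [12, 15, 21, 23, 23, 28, 34, 37, 41]
display = stem_and_leaf(data)
for stem in sorted(display):
    print(f"{stem:2d} | {''.join(str(d) for d in display[stem])}")
```

With `leaf_unit=100`, an observation of 375 goes to stem 0 with leaf 3, and 2265 goes to stem 2 with leaf 2.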


Stem-and-leaf


PLOTS (cont'd)

• Pareto Diagram: Named after the Italian economist Vilfredo Pareto. This is a bar diagram for qualitative factors, with the bars arranged in decreasing order of frequency. It is very useful for identifying and separating the most commonly occurring factors from the less important ones. Visually it conveys the information very easily.


Pareto Diagram

[Figure: Pareto Chart for Failures, Solder Defects (Messina: SQC for MM, 1987). Left axis: Number of Occurrences (0 to 800). Right axis: Percent (0 to 100). The defect category labels are not recoverable.]

Count     440    120     80     64     56     40
Percent  55.0   15.0   10.0    8.0    7.0    5.0
Cum %    55.0   70.0   80.0   88.0   95.0  100.0
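The Percent and Cum % rows of a Pareto chart follow directly from the counts. The sketch below uses the six counts from the solder-defect chart. The category names are not available, so the labels are hypothetical placeholders.

```python
from itertools import accumulate

# Counts from the solder-defect chart; the labels are hypothetical placeholders.
counts = {"defect A": 440, "defect B": 120, "defect C": 80,
          "defect D": 64, "defect E": 56, "defect F": 40}

# Pareto order: most frequent category first.
ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
total = sum(counts.values())
percents = [100 * c / total for _, c in ordered]
cum_percents = list(accumulate(percents))
```

The first two categories already account for 70% of the 800 occurrences, which is the kind of information the chart is meant to show at a glance.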


PLOTS (cont'd)

• Box plot: is due to J. Tukey and provides a great deal of information. A rectangle whose lower and upper limits are the first and third quartiles, respectively, is drawn. The median is given by a horizontal line segment inside the rectangle. The average value is marked by a symbol such as “x” or “+”. Points that are far more extreme than the rest of the data (outliers) are identified individually.


Boxplot for MPG example

[Figure: Box Plot of City MPG vs Size. x-axis: size (Small, Midsize, Compact, Large, Sports, Van). y-axis: mpg_c (15 to 45).]


PLOTS (cont'd)

• Quantile plot: This plot is very useful when we want to identify or verify a hypothesized population distribution from which the data set could have been drawn. A quantile, Q(r), is a number that divides a sample (or population) into two groups so that the specified fraction r of the data values is less than or equal to the value of the quantile.


PLOTS (cont'd)

• Probability plot: This involves plotting the cumulative probability and the observed value of the variable against a suitable probability scale, which will result in linearization of the data. The basic steps involved here are: (a) Sorting the data into ascending order; (b) Computing the plotting points; (c) Selecting appropriate probability paper; (d) Plotting the points; (e) Fitting a “best” line to the data.
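Steps (a) to (e) can be sketched for a normal probability plot. Several choices here are assumptions and not taken from the slides: the plotting positions (i − 0.5)/n (other conventions exist), normal quantiles standing in for normal probability paper, and ordinary least squares for the “best” line. The data are the first eight hp@4500 readings from the horsepower example in these notes.

```python
from statistics import NormalDist

def normal_plot_points(data):
    """Return (theoretical normal quantile, observed value) pairs."""
    ordered = sorted(data)                               # step (a)
    n = len(ordered)
    probs = [(i - 0.5) / n for i in range(1, n + 1)]     # step (b), assumed convention
    z = [NormalDist().inv_cdf(p) for p in probs]         # step (c), replaces probability paper
    return list(zip(z, ordered))                         # step (d), points to plot

def least_squares_line(points):
    """Step (e): fit y = a + b x by ordinary least squares; returns (a, b)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    b = sxy / sxx
    return my - b * mx, b

points = normal_plot_points([243, 241, 233, 264, 233, 263, 248, 240])
intercept, slope = least_squares_line(points)
```

If the points fall close to the fitted line, the normal model is plausible, and the intercept and slope then serve as rough estimates of the mean and standard deviation.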


MEASURES OF LOCATION

• Mean: Used very often in analyzing the data.

– Although this is a common measure, if the data vary greatly the average may take a non-typical value and could be misleading.

• Median: is the halfway point of the data and tells us something about the location of the distribution of the data.

• Mode: if it exists, gives the data value that occurs most frequently.

– It is possible for a set of data to have 0, 1 or more modes.


LOCATION (cont’d)

• Mean and median always exist.

• Mode need not exist.

• Median and mode are less sensitive to extreme observations.

• Mean is most widely used.

• There are some data sets for which the median or mode may be more appropriate than the mean.


LOCATION (cont’d)

• Percentiles: The 100pth percentile of a set of data is the value below which a proportion p of the data points will lie.

• Percentiles convey more information and are very useful in setting up warranty or guarantee periods for manufactured items.

• Also referred to as quantiles.

• The shape of the frequency data can be classified into several classes.


LOCATION (cont’d)

• Symmetric: mean = median = mode

• Positively skewed: tail to the right; mean > median

• Negatively skewed: tail to the left; median > mean

• In problems such as waiting time problems, one is interested in the tails of the distributions.

• For skewed data, the median is preferred to the mean.
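A small illustration of how the measures of location behave on skewed data, using Python's standard `statistics` module. The waiting times and the added outlier are hypothetical.

```python
from statistics import mean, median, multimode

# Hypothetical waiting times (minutes) with a long right tail.
waits = [1, 2, 2, 3, 3, 3, 4, 5, 9, 18]

m, med, modes = mean(waits), median(waits), multimode(waits)
# Positively skewed data: the mean is pulled toward the tail, so m > med.

# One extreme observation moves the mean far more than the median.
with_outlier = waits + [200]
shift_mean = mean(with_outlier) - m
shift_median = median(with_outlier) - med

# multimode returns every most-frequent value, so a data set can have several modes.
two_modes = multimode([1, 1, 2, 2, 3])
```

The single extreme value shifts the mean by more than 17 minutes and leaves the median where it was, which is why the median is preferred for skewed data.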


MEASURES OF SPREAD

• One should not solely rely on mean or median or mode.

• Also two or more sets of data may have the same mean but they may be qualitatively different.

• In order to make a meaningful study, we need to rely on other measures.


MEASURES OF SPREAD (cont’d)

• For example, we may be interested in how the data are spread.

• Range: is the difference between the largest and the smallest observations.

• The range gives a quick estimate of the standard deviation.

• The range plays an important role in statistical process control (SPC).


SPREAD (cont’d)

• Standard deviation: describes how the data is spread around its mean.

• Coefficient of variation: The measures we have seen so far depend on the unit of measurements. It is sometimes necessary and convenient to have a measure that is independent of the unit and such a useful and common measure is given by the ratio of the standard deviation to the mean called the coefficient of variation.


SPREAD (cont’d)

• Interquartile range: is the difference between the 75th and 25th percentiles.

• It gives the interval that contains the central 50% of the observations.

• It avoids total dependence on extreme data values.


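$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \quad\text{and}\quad s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$$

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} X_i^2 - \frac{n}{n-1}\bar{X}^2$$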
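The measures of spread can be checked numerically. The sketch computes the sample variance from its definition, the sum of squared deviations from the mean divided by n − 1, and from the shortcut form, the sum of squares minus n times the squared mean, divided by n − 1. It then computes the range, the coefficient of variation and the interquartile range. The data are hypothetical. Software packages use slightly different quartile conventions, so the interquartile range from `statistics.quantiles` may not match other tools exactly.

```python
from statistics import mean, quantiles, stdev

x = [4, 7, 8, 9, 12]              # hypothetical observations
n = len(x)
xbar = mean(x)

# Definition: s^2 = sum (X_i - Xbar)^2 / (n - 1)
s2_definition = sum((xi - xbar) ** 2 for xi in x) / (n - 1)
# Shortcut: s^2 = (sum X_i^2 - n Xbar^2) / (n - 1)
s2_shortcut = (sum(xi ** 2 for xi in x) - n * xbar ** 2) / (n - 1)

s = stdev(x)                      # sample standard deviation
data_range = max(x) - min(x)
cv = s / xbar                     # unit-free coefficient of variation
q1, _, q3 = quantiles(x, n=4)     # quartile conventions differ between packages
iqr = q3 - q1
```

Both variance formulas give the same value. The shortcut form needs only one pass over the data, but it can lose precision when the mean is large relative to the spread.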


Stem-and-leaf of cycles   N = 70
Leaf Unit = 100   (Problem 2.2)

    1   0  3
    1   0
    5   0  7777
   10   0  88899
   22   1  000000011111
   33   1  22222223333
  (15)  1  444445555555555
   22   1  66667777777
   11   1  888899
    5   2  011
    2   2  22


Descriptive Statistics, Problem 2.2

Variable: cycles

[Figure: graphical summary of cycles. Data axis: 400 to 2000. Confidence-interval axis: 1250 to 1550, showing the 95% confidence intervals for Mu and for the Median.]

Anderson-Darling Normality Test
  A-Squared       0.250
  P-Value         0.735

  Mean            1403.66
  StDev           402.39
  Variance        161914
  Skewness        -2.5E-02
  Kurtosis        -4.4E-01
  N               70

  Minimum         375.00
  1st Quartile    1097.75
  Median          1436.50
  3rd Quartile    1735.00
  Maximum         2265.00

95% Confidence Interval for Mu:      1307.71 to 1499.60
95% Confidence Interval for Sigma:    345.01 to 482.82
95% Confidence Interval for Median:  1263.17 to 1538.24


INFERENTIAL STATISTICS

• Recall that a parameter is a descriptive measure of some characteristic of the population.

• The standard ones are the mean, variance and proportion.

• We will simply denote by θ the parameter of the population under study.


INFERENTIAL STATISTICS (cont’d)

• Estimation theory and tests of hypotheses are the two pillars of statistical inference.

• While estimation theory is concerned with giving point and interval estimates for the parameter(s) under study, tests of hypotheses deal with testing claims about the parameter(s).


Illustrative Example 1

• The following data corresponds to an experiment in which the effect of engine RPM on the horsepower is under study.

TABLE 1: Data for HP Example

hp@4500   243  241  233  264  233  263  248  240
hp@5500   211  234  218  240  218  237  216  203
hp@4500   258  257  247  276  248  272  261  268
hp@5500   252  264  245  275  244  270  257  273


Variable    N    Mean    Median   Tr Mean   StDev   SE Mean
hp@4500    16   253.25   252.50   253.07    13.51    3.38
hp@5500    16   241.06   242.00   241.36    23.16    5.79

Variable    Min      Max      Q1       Q3
hp@4500    233.00   276.00   241.50   263.75
hp@5500    203.00   275.00   218.00   262.25
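Most of the summary figures above can be reproduced from the Table 1 data with Python's standard library. The `summarize` helper is a hypothetical name used only for this sketch. The trimmed mean is omitted because its trimming rule is not stated here. For these data the default ("exclusive") method of `statistics.quantiles` happens to give the same quartiles as the listing, but other data or other packages may differ.

```python
from statistics import mean, median, quantiles, stdev

hp_4500 = [243, 241, 233, 264, 233, 263, 248, 240,
           258, 257, 247, 276, 248, 272, 261, 268]
hp_5500 = [211, 234, 218, 240, 218, 237, 216, 203,
           252, 264, 245, 275, 244, 270, 257, 273]

def summarize(values):
    """Basic descriptive statistics for one variable."""
    n = len(values)
    s = stdev(values)
    q1, _, q3 = quantiles(values, n=4)
    return {"N": n, "Mean": mean(values), "Median": median(values),
            "StDev": s, "SE Mean": s / n ** 0.5,
            "Min": min(values), "Max": max(values), "Q1": q1, "Q3": q3}

summary_4500 = summarize(hp_4500)
summary_5500 = summarize(hp_5500)
```

The standard error of the mean is the standard deviation divided by the square root of N, which is why it is exactly one quarter of the StDev column for N = 16.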


[Figure: Horse Power at 4500 RPM and 5500 RPM. Time series plot. x-axis: Time Order (10 to 30). y-axis: Horse Power (200 to 280).]


[Figure: Boxplots of hp@4500 and hp@5500 (means are indicated by solid circles). Vertical axis: 200 to 280.]


GROUNDWORK FOR PROBABILITY

• Looking at the data in Table 1, why is it that the hp values, say at 4500 RPM, are not exactly the same if the experiment is repeated under the “same conditions”?

• The fluctuation that occurs from one repetition to another is called experimental variation, which is usually referred to as “noise” or “statistical error” or simply “error” [Recall this term from earlier discussion on data collection].


PROBABILITY (cont’d)

• This represents the “variation” that is inherently present in any (practical) system.

• The noise is a “random variable” and is studied through probability.


What is Probability?

• A manufacturer of blender motors wants to determine the warranty period for this product.

• If motor life were constant, (say 8 years) the manufacturer would have no problem. The motor could be warranted for 8 years.

• But, in reality, the motor life is not a constant.


PROBABILITY (cont’d)

• Some motors will fail quickly and others will last for several years.

• There is an element of randomness in the life of the motors.

• The manufacturer cannot precisely predict how long any motor will last.

• Probability theory gives the manufacturer the means to quantify what is known about motor lifetimes and helps to quantify the risks involved in setting a warranty period.


PROBABILITY (cont’d)

• Similar problems arise in the context of other products.

• Flexible manufacturing systems (FMS) play an important role in modern manufacturing. Improved quality, lower inventory, shorter lead times, higher productivity and greater safety are some of the benefits derived from FMS.

• All of these have random elements.


PROBABILITY (cont’d)

• Probability theory deals with randomness, allowing the study of quantities whose behavior cannot be predicted completely in advance.

• The above examples deal with manufacturing.


PROBABILITY (cont’d)

• We could just as easily find examples in business, electrical and computer engineering, biomedical science and engineering, sociology, economics, marketing, civil engineering, the behavioral sciences and so on. The underlying problem, randomness, is the same.


PROBABILITY (cont’d)

• One should understand the ideas of probability and statistics from both theoretical and practical points of view.

• To properly apply probability and statistics in the real world, we must appreciate both sides of the picture.

• We cannot properly apply a procedure if we don't, at least in general terms, understand the reasoning (theory) behind it.


PROBABILITY (cont’d)

• On the other hand, trying to apply theory without knowledge of the area of application is foolish. We have to have a proper perspective on both before meaningful progress can be made.

• Probability theory develops mathematical models for random experiments.

• A random experiment is a sequence of actions whose outcome cannot be predicted with certainty.


PROBABILITY (cont’d)

• If you've used phrases like "one chance in a 1000", "50-50" or "3-to-2 odds" to describe something, you have most likely been using an informal probability model.

• If we throw two fair dice and our concern is about whether or not the dice eventually land and come to rest, then the throwing of the two dice is not a random experiment.

• Our knowledge of physical laws allows us to predict with virtual certainty that this outcome will happen.


PROBABILITY (cont’d)

• If, however, we are concerned with how many dots show on the topmost faces when the dice come to rest, then we are performing a random experiment in tossing the dice, since we cannot predict with certainty which faces will show.


PROBABILITY (cont’d)

• Outcomes of random experiments: the length of a phone call, the gender mix of three people chosen from a group of 25 people, the phenotype of the offspring of a cross-breeding experiment, and the number of defects on a painted panel.


EXPERIMENT

• Calculation of MPG of a new model car.

• Measurements of current in a thin copper wire.

• Measurements of film build thickness in a painting process.

• Duration of phone calls.

• Time to assemble a job.

• Tossing a coin.

• Throwing a die.


Sample space (S)

• Collection of all possible outcomes in an experiment.

– The MPGs of all cars of that particular model.

• Event (A)

– A subset of a sample space.

– The MPG of the new model car exceeds, say, 25 miles per gallon.


SET THEORY

[Figure: diagrams of $A \cup B$ (union) and $A \cap B$ (intersection)]


SET THEORY (cont’d)

A'


PROBABILITY

• is a function defined on the set of all possible events.

• is a number between 0 and 1.

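• satisfies a set of axioms:

– $P(A) \ge 0$.

– $P(S) = 1$.

– If $A_i \cap A_j = \emptyset$ for all $i \ne j$, then

$$P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i).$$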


PROPERTIES

• $0 \le P(A) \le 1$.

• $P(A') = 1 - P(A)$.

• $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.
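These properties can be checked on a small example. The sketch below assumes one roll of a fair die with equally likely faces. The events A and B are hypothetical choices, and exact fractions are used so the comparisons are not affected by rounding.

```python
from fractions import Fraction

S = frozenset(range(1, 7))            # one roll of a fair die

def P(event):
    """Equally likely outcomes: P(A) = |A| / |S|."""
    return Fraction(len(event & S), len(S))

A = frozenset({2, 4, 6})              # an even number shows
B = frozenset({4, 5, 6})              # at least four shows

# In Python, | is set union and & is set intersection (not "given").
complement_rule = P(S - A) == 1 - P(A)
addition_rule = P(A | B) == P(A) + P(B) - P(A & B)
```

Subtracting the probability of the intersection corrects for the outcomes 4 and 6, which would otherwise be counted twice.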


Classical definition

• While the axiomatic definition of probability is very useful in developing the theory of probability, it doesn’t tell us how to compute probabilities of events.

• Classical Definition: If S has a finite number of sample points and they are equally likely to occur, then P(A) = (number of points in A) / (number of points in S).

• If S doesn’t contain equally likely outcomes, then P(A) = sum of the weights (probabilities) associated with the points in A.
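Both cases can be sketched briefly. The first uses two fair dice, where all 36 outcomes are equally likely. The second uses a hypothetical loaded die whose weights are made up for illustration and sum to 1.

```python
from fractions import Fraction
from itertools import product

# Equally likely case: two fair dice, A = "the sum is 7".
S = list(product(range(1, 7), repeat=2))
A = [outcome for outcome in S if sum(outcome) == 7]
p_sum_7 = Fraction(len(A), len(S))          # points in A / points in S

# Unequal weights: a hypothetical loaded die that favors 6.
weights = {1: Fraction(1, 10), 2: Fraction(1, 10), 3: Fraction(1, 10),
           4: Fraction(1, 10), 5: Fraction(1, 10), 6: Fraction(1, 2)}
p_even = sum(weights[face] for face in (2, 4, 6))   # sum of weights of points in A
```

For the loaded die, counting points would give 3/6 for an even face, while the weights give 7/10, so the equally-likely assumption matters.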


Classical definition(cont’d)

• To use this definition, we need to calculate the number of points in S and in A.

• How do we do this without actually listing all possible outcomes?

• Using Counting Techniques.

– Principle of addition and multiplication

– Permutations and combinations


Principle of Addition and Multiplication

• Suppose a task has k subtasks, and subtask i can be done in ni ways (i = 1, 2, ..., k).

• If the task is done when any one of the (mutually exclusive) subtasks is done, then the total number of ways of doing the main task is n1 + n2 + ... + nk.

• If the task is done if and only if all the subtasks are done, then the total number of ways of doing the main task is the product n1 n2 ... nk.
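A small sketch of the two principles, using a hypothetical car-ordering task with three option lists. The options themselves are invented for illustration.

```python
from itertools import product
from math import prod

# Hypothetical option lists: n1 = 3, n2 = 2, n3 = 2.
colors = ["red", "blue", "white"]
engines = ["V6", "V8"]
trims = ["base", "sport"]
subtasks = (colors, engines, trims)

# Multiplication: every subtask must be done
# (choose a color AND an engine AND a trim).
ways_all = prod(len(options) for options in subtasks)
listed_all = list(product(*subtasks))       # explicit listing as a cross-check

# Addition: the task is done by any one subtask
# (choose exactly one item from one of the non-overlapping lists).
ways_any = sum(len(options) for options in subtasks)
```

Listing the outcomes confirms the count of 3 × 2 × 2 = 12 complete orders, but the multiplication principle gives the same number without building the list.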


PERMUTATION

• Suppose that r objects are to be drawn without replacement from n ($r \le n$).

• If the order of selection is important, then using the principle of multiplication we see that the total number of ways of doing this is n(n-1)...(n-r+1).

• This could be written in a compact form using factorials as $n!/(n-r)!$, denoted $P_r^n$.


COMBINATION

• Suppose that r objects are to be drawn without replacement from n ($r \le n$).

• When the order of selection is not important: since any particular set of r objects can be ordered in $P_r^r = r!$ ways, the total number of ways of selecting r out of n in which order is immaterial is $P_r^n / r!$. It is convenient to denote this by $C_r^n$ or by $\binom{n}{r}$.
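The permutation and combination counts can be checked for a small case. The values n = 5 and r = 3 and the item labels are hypothetical. `math.perm` and `math.comb` require Python 3.8 or later.

```python
import math
from itertools import combinations, permutations

n, r = 5, 3
items = "ABCDE"                       # n hypothetical objects

n_perm = math.perm(n, r)              # n! / (n - r)!, order matters
n_comb = math.comb(n, r)              # permutations divided by r!, order immaterial

# Cross-check by listing the selections explicitly.
listed_perm = len(list(permutations(items, r)))
listed_comb = len(list(combinations(items, r)))
```

Each unordered selection of 3 objects corresponds to 3! = 6 ordered ones, which is the factor between the two counts.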


EXAMPLES


CONDITIONAL PROBABILITY

• What we saw so far is referred to as unconditional probability. That is, the probabilities of events of interest were computed only based on the sample space and with no prior information.

• Sometimes it is convenient to compute certain unconditional probabilities by first conditioning on some event.


CONDITIONAL PROBABILITY

• Also, this plays an important role in stochastic modeling.

• In a finite buffer queuing model, computation of waiting time of an admitted customer involves conditional probability.

• DEFINITION: $P(B \mid A) = P(A \cap B) / P(A)$, provided $P(A) > 0$.

• Events A and B are independent if and only if $P(A \cap B) = P(A)P(B)$.
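The definition and the independence condition can be tried out on two fair dice. The events A, B and C below are hypothetical choices made for this sketch.

```python
from fractions import Fraction
from itertools import product

S = set(product(range(1, 7), repeat=2))        # two fair dice, 36 outcomes

def P(event):
    return Fraction(len(event), len(S))

def P_given(B, A):
    """P(B | A) = P(A and B) / P(A), defined only when P(A) > 0."""
    return P(A & B) / P(A)

A = {o for o in S if o[0] == 6}                # first die shows 6
B = {o for o in S if sum(o) >= 10}             # sum is at least 10
C = {o for o in S if o[1] % 2 == 0}            # second die is even

p_B = P(B)
p_B_given_A = P_given(B, A)
A_C_independent = P(A & C) == P(A) * P(C)
A_B_independent = P(A & B) == P(A) * P(B)
```

Knowing that the first die shows 6 raises the probability of a sum of at least 10 from 1/6 to 1/2, so A and B are dependent. A and C concern different dice and satisfy the product rule.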


RANDOM VARIABLES

• Often in probability and statistics, the quantities that are of interest are not the outcomes but rather the values associated with the outcome of the experiment.

• If n items are selected from a production lot the quality inspector is interested in the number of defectives out of the n chosen and the corresponding probabilities.


RANDOM VARIABLES (cont’d)

• A random variable, X, is a real-valued function defined on the sample space S.

• Random variables can be

– Discrete – taking only discrete values

• Number of defective molds

– Continuous – taking continuous values

• Time taken to assemble a product

– Mixture of discrete and continuous

• Waiting time of a customer


INDEPENDENCE


EXAMPLES


STUDY OF RANDOM VARIABLES

• Probability functions

• Probability mass function (discrete)

• Probability density function (continuous)

• Cumulative probability distribution function
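A discrete random variable and its probability functions can be built by direct enumeration. The sketch assumes three inspected items, each good (G) or defective (D), with all eight outcomes equally likely. That assumption is made only to keep the example small and is not a realistic defect rate.

```python
from collections import Counter
from fractions import Fraction
from itertools import accumulate, product

# Sample space: inspect 3 items, each good (G) or defective (D).
S = list(product("GD", repeat=3))

def X(outcome):
    """Random variable: the number of defectives in the outcome."""
    return outcome.count("D")

counts = Counter(X(o) for o in S)
# Probability mass function: P(X = x) for each possible value x.
pmf = {x: Fraction(c, len(S)) for x, c in sorted(counts.items())}
# Cumulative distribution function: P(X <= x), the running sum of the pmf.
cdf = dict(zip(pmf, accumulate(pmf.values())))
```

The random variable maps each of the eight outcomes to a number, and the pmf and cdf then describe those numbers without further reference to the outcomes themselves.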
