Statistics 3220


  • 8/12/2019 Statistics 3220


    where $\bar{x}$ (the average) represents the most probable estimate of the true value $x'$ based on the available data, and $\pm u_x$ is the confidence interval, or uncertainty, in that estimate at some probability level, P%. The confidence interval (or uncertainty) is based both on estimates of the precision error and on the bias error in the measurement of x.

    Statistical Measurement Theory

    A sample (N) of data refers to a set of data points (a sampling) obtained during repeated measurements of a variable x under fixed operating conditions. The measured variable is also known as the measurand. Fixed operating conditions imply that the external constraints that control the process from which the measured value is obtained are held at fixed values while obtaining the sample. In actual engineering practice, holding the constraints at truly fixed conditions may be impossible, and the term fixed operating conditions should be considered in a nominal sense. That is, the process conditions are maintained as closely as possible.

    Definitions:

    Mode:

    The mode is the value which occurs with the highest frequency.

    Example: [1]

    The 20 readings of a variable were collected as 26, 25, 28, 23, 25, 24, 24, 21, 23, 26, 28, 26, 24, 23, 24, 32, 25, 27, 24, and 22. Find the mode.

    Solution:

    Among these numbers, 21, 22, 27, and 32 each occur once; 28 occurs twice; 23, 25, and 26 each occur three times; and 24 occurs five times. Thus, 24 is the modal reading.

    Note: If no value occurs more than once, the mode does not exist.

    Median:

    The median of N values requires that we arrange the data according to size (in either ascending or descending order). Then, when N is odd, the median is the value of the item that is in the middle. When N is even, the median is the mean of the two items that are nearest to the middle.

    Example: [2]

    In a recent month, a state Game and Fish Department reported 53, 31, 67, 53, and 36 hunting or fishing violations for five different regions. Find the median number of violations for these regions.

    Solution:

    The median is not 67, the third (or middle) item as listed, because the figures must first be arranged according to size. Thus, we get 31, 36, 53, 53, 67, and it can be seen that the median is 53.

    Average:

    The average of N numbers is the sum of all their values divided by N.

    $$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i = \frac{x_1 + x_2 + \cdots + x_N}{N}$$ [2]


    Example: [3]

    On a certain day, nine students received 1, 4, 2, 0, 1, 5, 2, 1, and 3 pieces of mail. Find the average.

    Solution:

    The total number of pieces of mail which the nine students received is 1 + 4 + 2 + 0 + 1 + 5 + 2 + 1 + 3 = 19. Since 19/9 ≈ 2.11, the average number of pieces of mail per student is about 2.11.
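
    The three worked examples above can be checked with a short sketch. It assumes Python 3.8 or later and uses only the standard `statistics` module; the variable names are illustrative and not part of the original notes.

```python
from statistics import mean, median, multimode

# Example [1]: 20 readings of a variable
readings = [26, 25, 28, 23, 25, 24, 24, 21, 23, 26,
            28, 26, 24, 23, 24, 32, 25, 27, 24, 22]
# Example [2]: violations reported for five regions
violations = [53, 31, 67, 53, 36]
# Example [3]: pieces of mail received by nine students
mail = [1, 4, 2, 0, 1, 5, 2, 1, 3]

# multimode returns every value tied for the highest frequency
modes = multimode(readings)
# median sorts the data internally before picking the middle item
med = median(violations)
# mean is the sum of the values divided by their count
avg = mean(mail)

print("mode(s):", modes)
print("median:", med)
print("average:", round(avg, 2))
```

    Note that `multimode` returns all values when every value occurs exactly once, whereas the notes treat that case as having no mode.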

    Mean:

    The mean value, $x'$, is that which would be obtained if every x in the population could be averaged together. In other words, the sample average can be used to estimate the mean value.

    Note: In certain cases, the median can be used to estimate the mean value.

    Standard Deviation:

    The standard deviation describes the dispersion of a data set. If the data are closely bunched about their mean, the standard deviation is small. If the data are scattered widely about their mean, the standard deviation is large.

    If a set of numbers $x_1, x_2, x_3, \ldots, x_N$, constituting a population, has the average (mean) $\bar{x}$, the differences

    $$x_1 - \bar{x},\; x_2 - \bar{x},\; x_3 - \bar{x},\; \ldots,\; x_N - \bar{x}$$ [3]

    are called the deviations from the average (mean). The standard deviation for such discrete data, $S_x$, is given by

    $$S_x = \left[\frac{\sum_{i=1}^{N}(x_i - \bar{x})^2}{N-1}\right]^{1/2}$$ [4]

    where the N-1 term is called the degrees of freedom, $\nu$, of that sample. $S_x^2$ is called the sample variance.

    Example: [4]

    On six consecutive Sundays, a tow-truck operator received 9, 7, 11, 10, 13, and 7 service calls. Calculate the standard deviation, $S_x$.

    Solution:

    First calculating the average, we get

    $$\bar{x} = \frac{9 + 7 + 11 + 10 + 13 + 7}{6} = \frac{57}{6} = 9.5$$ [5]

    and the work required to find $\sum(x - \bar{x})^2$ may be arranged as in Table [1]. Dividing by 6-1 = 5 and taking the square root, we get

    $$S_x = \sqrt{\frac{27.50}{5}} = \sqrt{5.5} \approx 2.3$$ [6]

    Note that in Table [1] the total for the middle column, the deviations $x - \bar{x}$, is zero; since this must always be the case, it provides a check on the calculations.
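
    The arithmetic of Example [4] can be reproduced with a minimal sketch that follows Equation [4] step by step, including the zero-sum check on the deviations. The variable names are illustrative.

```python
from math import sqrt

# Example [4]: service calls on six consecutive Sundays
calls = [9, 7, 11, 10, 13, 7]

n = len(calls)
x_bar = sum(calls) / n                       # sample average, Equation [5]
deviations = [x - x_bar for x in calls]      # x_i - x_bar, Equation [3]
sum_sq = sum(d * d for d in deviations)      # sum of squared deviations
s_x = sqrt(sum_sq / (n - 1))                 # Equation [4], N-1 degrees of freedom

print("average:", x_bar)
print("sum of deviations (check):", sum(deviations))
print("sum of squared deviations:", sum_sq)
print("standard deviation:", round(s_x, 1))
```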


    Here, $x_i$ is the estimate of the interval of values at P%, $\bar{x}$ is the sample mean of a finite number of samples, and $t_{\nu,P}$ is obtained from a weighting function used for finite data sets. This value for the t estimator is a function of the probability, P, and the degrees of freedom, $\nu$, in the standard deviation.

    Student's t-Distribution:

    The definition of the t estimator is beyond the scope of this discussion. In short, this distribution is used in predicting the mean value of a Gaussian (normal probability distribution) population when only a small sample of data is available. t values can be obtained from Table [2] below, which is a tabulation of the Student's t-distribution as developed by William S. Gosset.

    Standard Deviation of the Means

    We must now recognize that the sample mean value itself has some degree of inherent uncertainty. The amount of variation possible in the sample means depends on two values: the sample variance, $S_x^2$, and the sample size, N, such that the discrepancy tends to increase with variance and decrease with $N^{1/2}$. The variance of the distribution of mean values that could be expected can be estimated from a single finite data set through the standard deviation of the means, $S_{\bar{x}}$.

    $$S_{\bar{x}} = \frac{S_x}{N^{1/2}}$$ [10]

    The standard deviation of the means represents a measure of the precision in a sample mean. The range over which the possible values of the true mean value might lie at some probability level, P, based on the information from a sample data set is given as

    $$\bar{x} \pm t_{\nu,P} S_{\bar{x}}$$ (P%) [11]

    where $\pm t_{\nu,P} S_{\bar{x}}$ represents a precision interval, at the assigned probability, P%, within which one should expect the true value of x to fall. As such, the precision interval is a quantified measure of the precision error in the estimate of the true value of variable x. This estimate of the true mean value based on a finite data set is now stated as

    $$x' = \bar{x} \pm t_{\nu,P} S_{\bar{x}}$$ (P%) [12]

    Example: [5]

    Statistics, Value Interval and True Mean Value. Consider the sample of variable x in Table [3]:

    a) Compute the sample statistics for this data set.

    b) Estimate the interval of values over which 95% of the measurements of the measurand should be expected to lie (or calculate the precision interval of each measurement).

    c) Estimate the true mean value of the measurand at 95% probability based on this finite data set (or calculate the precision interval of the mean of the measurements).

    Known: N = 20, $x_i$

    Find: $\bar{x}$, $\bar{x} \pm t S_x$, and $\bar{x} \pm t S_{\bar{x}}$


    Solution:

    a) The sample mean value is computed for the N = 20 values by the relation

    $$\bar{x} = \frac{1}{20}\sum_{i=1}^{20} x_i = 1.02$$ [13]

    Table [2] Student's t-Distribution

    ν      t50      t90      t95      t99
    1      1.000    6.314    12.706   63.657
    2      0.816    2.920    4.303    9.925
    3      0.765    2.353    3.182    5.841
    4      0.741    2.132    2.776    4.604
    5      0.727    2.015    2.571    4.032
    6      0.718    1.943    2.447    3.707
    7      0.711    1.895    2.365    3.499
    8      0.706    1.860    2.306    3.355
    9      0.703    1.833    2.262    3.250
    10     0.700    1.812    2.228    3.169
    11     0.697    1.796    2.201    3.106
    12     0.695    1.782    2.179    3.055
    13     0.694    1.771    2.160    3.012
    14     0.692    1.761    2.145    2.977
    15     0.691    1.753    2.131    2.947
    16     0.690    1.746    2.120    2.921
    17     0.689    1.740    2.110    2.898
    18     0.688    1.734    2.101    2.878
    19     0.688    1.729    2.093    2.861
    20     0.687    1.725    2.086    2.845
    21     0.686    1.721    2.080    2.831
    30     0.683    1.697    2.042    2.750
    40     0.681    1.684    2.021    2.704
    50     0.680    1.679    2.010    2.679
    60     0.679    1.671    2.000    2.660
    ∞      0.674    1.645    1.960    2.576


    This, in turn, is used to compute the sample standard deviation

    $$S_x = \left[\frac{1}{19}\sum_{i=1}^{20}(x_i - 1.02)^2\right]^{1/2} = 0.16$$ [14]

    The degrees of freedom in the standard deviation are $\nu = N - 1 = 19$.

    b) From Table [2] at 95% probability, $t_{19,95}$ is 2.093. Then, the interval of values in which 95% of the measurements of x should lie is given by equation (1.4):

    $$x_i = \bar{x} \pm 2.093(0.16) = 1.02 \pm 0.33$$ (95%) [15]

    Accordingly, if a 21st data point were to be taken, there is a 95% probability that its value would lie between 0.69 and 1.35.

    c) The true mean value is estimated by the sample mean value. However, the precision interval for this estimate is $\pm t_{19,95} S_{\bar{x}}$, where

    $$S_{\bar{x}} = \frac{S_x}{N^{1/2}} = \frac{0.16}{(20)^{1/2}} = 0.04$$ [16]

    Then from Equation [12]

    $$x' = \bar{x} \pm t_{19,95} S_{\bar{x}} = 1.02 \pm 0.08$$ (95%) [17]

    Accordingly, the true mean value should lie between 0.94 and 1.10 with 95% probability.

    Note: The difference between part (b) and part (c) is that part (b) estimates the interval for an individual measurement, whereas part (c) estimates the interval for the mean of the measurements (the true mean value).

    Table [3] Sample of Variable x

    i     x_i      i     x_i      i     x_i      i     x_i
    1     0.98     6     0.68     11    1.02     16    1.11
    2     1.07     7     1.34     12    1.26     17    0.99
    3     0.86     8     1.04     13    1.08     18    0.78
    4     1.16     9     1.21     14    1.02     19    1.06
    5     0.96     10    0.86     15    0.94     20    0.96
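
    As a cross-check on Example [5], the sketch below recomputes the sample statistics from the Table [3] data using Python's standard `statistics` module. The t value is copied from Table [2] rather than computed, and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Table [3]: sample of variable x, N = 20
x = [0.98, 1.07, 0.86, 1.16, 0.96, 0.68, 1.34, 1.04, 1.21, 0.86,
     1.02, 1.26, 1.08, 1.02, 0.94, 1.11, 0.99, 0.78, 1.06, 0.96]

T_19_95 = 2.093  # t estimator for nu = 19 at 95%, read from Table [2]

n = len(x)
x_bar = mean(x)                       # sample mean, Equation [13]
s_x = stdev(x)                        # sample standard deviation (N-1), Equation [14]
s_xbar = s_x / sqrt(n)                # standard deviation of the means, Equation [10]

interval_single = T_19_95 * s_x       # part (b): interval for one measurement
interval_mean = T_19_95 * s_xbar      # part (c): precision interval of the mean

print(f"x_bar = {x_bar:.2f}, S_x = {s_x:.2f}, S_xbar = {s_xbar:.2f}")
print(f"x_i = {x_bar:.2f} +/- {interval_single:.2f} (95%)")
print(f"x'  = {x_bar:.2f} +/- {interval_mean:.3f} (95%)")
```

    Carrying unrounded values gives a precision interval of the mean of about 0.074; the hand calculation above rounds the standard deviation of the means to 0.04 first, which yields 0.08.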


    Number of Measurements Required to Achieve a Given Precision

    Statistics can be used to assist in the design and planning of a test program. For example, how many measurements, N, are required to estimate the true mean value, $x'$, with acceptable precision? To answer this question, begin with Equation [12], which expresses the true value based on a sample mean and its precision interval:

    $$x' = \bar{x} \pm t_{\nu,P} S_{\bar{x}}$$ (P%) [18]

    where

    $$S_{\bar{x}} = \frac{S_x}{N^{1/2}}$$ [19]

    Therefore, we could rearrange Equation [18] to read

    $$x' = \bar{x} \pm t_{\nu,P} \frac{S_x}{N^{1/2}}$$ (P%) [20]

    We can express the precision interval in Equation [20] as the Confidence Interval, or CI, that is,

    $$CI = \pm t_{\nu,P} \frac{S_x}{N^{1/2}}$$ (P%) [21]

    To evaluate CI, we must assign a value to $S_x$. $S_x$ should be a conservative estimate based on previous test data, prior experience, or the manufacturer's information.

    The precision interval is two-sided about the mean, defining a range from $-t_{\nu,P} S_x / N^{1/2}$ to $+t_{\nu,P} S_x / N^{1/2}$. We introduce the one-sided precision value d as

    $$d = \frac{CI}{2} = \frac{t_{\nu,P} S_x}{N^{1/2}}$$ [22]

    Then, it follows that the required number of measurements is estimated by

    $$N \approx \left(\frac{t_{\nu,P} S_x}{d}\right)^2$$ (P%) [23]

    The approximate-equality sign serves as a reminder that this expression is based on an assumed value for $S_x$. The accuracy of Equation [23] will depend on how well the assumed value for $S_x$ approximates the standard deviation.

    The obvious deficiency in the above method is that an estimate for the sample variance is needed. One way around this is to make a preliminary small number of measurements, $N_1$, to obtain an estimate of the sample variance, $S_1^2$, to be expected. Then $S_1$ is used to estimate the number of measurements required. The total number of measurements, $N_T$, will be estimated by

    $$N_T \approx \left(\frac{t_{N_1-1,P}\, S_1}{d}\right)^2$$ (P%) [24]

    This establishes that $N_T - N_1$ additional measurements will be required. Because t itself depends on the number of measurements, this is an iterative process, as will be demonstrated next.
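
    The iteration on Equation [23] can be sketched as follows. This is only an illustration under stated assumptions: it works at 95% probability, the t values are the standard t95 tabulation as in Table [2], the helper names `t95` and `required_measurements` are hypothetical, degrees of freedom that are not tabulated fall back to the next lower tabulated row (a conservative choice), and anything above 60 uses the ∞ row. The trial uses the values that Example [6] works through by hand.

```python
from bisect import bisect_right
from math import ceil

# t95 values: degrees of freedom -> t estimator at 95%
T95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571, 6: 2.447,
       7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228, 11: 2.201, 12: 2.179,
       13: 2.160, 14: 2.145, 15: 2.131, 16: 2.120, 17: 2.110, 18: 2.101,
       19: 2.093, 20: 2.086, 21: 2.080, 30: 2.042, 40: 2.021, 50: 2.010,
       60: 2.000}
T95_INF = 1.960
_DOFS = sorted(T95)


def t95(dof):
    """Look up t at 95%; assumption: untabulated dof use the next lower row."""
    if dof < 1:
        raise ValueError("degrees of freedom must be at least 1")
    if dof > 60:
        return T95_INF
    return T95[_DOFS[bisect_right(_DOFS, dof) - 1]]


def required_measurements(s_x, d, n_guess, max_iter=50):
    """Iterate N = (t * S_x / d)^2, rounding up, until N stops changing."""
    n = n_guess
    for _ in range(max_iter):
        n_new = max(2, ceil((t95(n - 1) * s_x / d) ** 2))
        if n_new == n:
            return n
        n = n_new
    raise RuntimeError("iteration did not converge")


# Assumed S_x = 0.16 and one-sided precision d = 0.025, starting from N = 20
n_required = required_measurements(s_x=0.16, d=0.025, n_guess=20)
print("measurements required:", n_required)
```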


    Example: [6]

    Consider Example [5]. Determine the number of measurements required to reduce the precision

    interval of the mean value of a variable to within 5%. Assume P=95%.

    Known: CI = 5% = 0.05, P = 95%

    $$d = \frac{CI}{2} = \frac{0.05}{2} = 0.025$$ [25]

    where, from the example, $S_x = 0.16$.

    Solution:

    Because Equation [23] has two unknowns, N and t, begin this problem by guessing at some value for N. Then, using this guess value, compute the t variable at the probability level desired. An updated value for N can then be found from the formulation

    $$N = \left(\frac{t_{\nu,95}\, S_x}{d}\right)^2$$ (95%) [26]

    Then use trial-and-error iteration to converge on a value for N. We begin with N = 20. Then $\nu = 20 - 1 = 19$, $t_{19,95} = 2.093$, and

    $$N = \left(\frac{2.093 \times 0.16}{0.025}\right)^2 = 179.43 \approx 180$$

    This yields N = 180 samples.

    So, now guess N = 180. Then $\nu = 180 - 1 = 179$, $t_{179,95} = 1.96$, and

    $$N = \left(\frac{1.96 \times 0.16}{0.025}\right)^2 = 157.35 \approx 158$$

    This yields N = 158 samples.

    Then, guess again at N = 158, where $\nu = 158 - 1 = 157$, $t_{157,95} = 1.96$, and

    $$N = \left(\frac{1.96 \times 0.16}{0.025}\right)^2 = 157.35 \approx 158$$

    This yields N = 158 samples.

    We have converged on N = 158. Thus, at least 158 measurements must be made (138 more than the 20 already available) to achieve the desired precision interval in the measured variable. An analysis of the results after 158 measurements should also be made to ensure that the variance level used was representative of the actual data set.

    Error Sources

    As a guide to looking for measurement errors, it is possible to consider the measurement process as consisting of three distinct steps: calibration, data acquisition, and data reduction. Errors that enter during each of these steps will be grouped under their respective error source heading:

    Calibration errors
    Data acquisition errors
    Data reduction errors

    Within each of these three error source groups, an objective should be to list the types of errors encountered. Such errors are the elemental errors of the measurement.


    Each elemental error in the above categories can have a bias component, a precision component, or both. An error is the difference between the value indicated by a measurement system and the actual value measured.

    Bias Error:

    The constant offset between the average indicated value and the actual value measured.

    Precision Error:

    Statistical measure of the variation of the measured value during repeated measurements.

    Uncertainty Analysis:

    Uncertainty:

    An estimate of the range of a possible error or errors. An estimate of the probable error in a reported value.

    Uncertainty Analysis:

    A process of identifying the errors in a measurement and quantifying their effects.

    During measurement, we really cannot know if the system indicates the true value. However, from the calibration we can estimate the probable error in any subsequent measurement. From that we can estimate how closely the measured value should agree with the true value.

    In the previous part, we stated that the best estimate of the true value sought in a measurement is provided by its sample mean value and the uncertainty in that value,

    $$x' = \bar{x} \pm u_x$$ (P%) [27]

    where $u_x$ is called the uncertainty.

    Note: In comparing Equations [27] and [12], we observe that the terms $\pm t_{\nu,P} S_{\bar{x}}$ and $\pm u_x$ play the same role. Here, however, we combine the precision error with the bias error and call the result $u_x$, so that the uncertainty analysis takes into account the different kinds of errors that can arise in an experimental analysis.

    Uncertainty analysis is the method used to quantify the $u_x$ term, where in the case of a single error,

    $$u_x = \sqrt{B^2 + (t_{\nu,P} E)^2}$$ [28]

    Note:

    B = Bias Error

    E = Precision Error (also $S_{\bar{x}}$ from previous discussions)

    $t_{\nu,P}$ = t estimator from Table [2]

    If multiple elemental errors exist as a source of bias and precision errors, then

    $$B = \sqrt{B_1^2 + B_2^2 + \cdots + B_n^2}$$ where n = 1, 2, 3, ... [29]

    and

    $$E = \sqrt{E_1^2 + E_2^2 + \cdots + E_n^2}$$ where n = 1, 2, 3, ... [30]


    Here, n is the number of elemental errors. For instance, in measuring the density of a gas, if only pressure and temperature data are utilized, then n = 2.

    Estimation of the degrees of freedom in the precision index E requires some discussion, since E is composed of elements that usually have different degrees of freedom. In this case, the degrees of freedom in the measurement precision index is estimated using the Welch-Satterthwaite formula:

    $$\nu = \frac{\left(\sum_{i=1}^{n} E_i^2\right)^2}{\sum_{i=1}^{n} \left(E_i^4 / \nu_i\right)}$$ where i = 1, 2, 3, ... [31]

    Certain assumptions are implicit in an uncertainty analysis.

    1. The test objectives are known.

    2. The measurement itself is a clearly defined process in which all known calibration corrections for bias error have already been applied.

    3. Data are obtained under fixed operating conditions.

    4. Some system component experience is available.

    Component experience is defined as an estimate of component bias and precision errors based on some evidence, such as personal experience through previous or simulated tests and calibrations, or someone else's experience, such as the manufacturer's performance literature, an NIST bulletin, a professional test code, or performance information discussed in the technical literature.

    Example: [7]

    Find the best estimate of the true value sought in a measurement, which is provided by its sample mean value and the uncertainty in that value. Consider Example [5] again.

    This time the bias error is known to come from a single source with a value of 0.05, given by the manufacturer and recorded on the machine used to obtain the values $x_i$. Determine the best estimate of $x_i$ by performing the uncertainty analysis.

    Solution:

    Bias Error, B = 0.05
    Precision Error, $S_{\bar{x}}$ = 0.04 = E
    Sample Data, N = 20
    Degrees of freedom, $\nu$ = N-1 = 19
    Mean Value, $\bar{x}$ = 1.02
    Number of errors, n = 1

    We seek the statement $x' = \bar{x} \pm u_x$ (95%), where all the information needed is stated above. The uncertainty estimate in this measurement is obtained from the source error statements.

    B = 0.05
    E = 0.04
    $\nu$ = 19

    Therefore the t estimator can be determined from Table [2], where $t_{19,95} = 2.093$. The uncertainty estimate is found using Equation [28] as shown below:

    $$u_x = \sqrt{(0.05)^2 + (2.093 \times 0.04)^2} = 0.0975$$ [32]

    The best estimate is given in the form of Equation [27] as

    $$x' = 1.02 \pm 0.0975$$ (95%)

    This measurement has an uncertainty of about $\frac{0.0975}{1.02} \times 100 \approx 9.6$%.
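
    The uncertainty calculation of Example [7] can be sketched in a few lines. The function names `rss`, `welch_satterthwaite`, and `total_uncertainty` are illustrative, not from the original notes; they implement the root-sum-square combination of Equations [29] and [30], the Welch-Satterthwaite estimate of Equation [31], and the combined uncertainty of Equation [28]. The t value is read from Table [2].

```python
from math import sqrt


def rss(values):
    """Root-sum-square combination of elemental errors (Equations [29], [30])."""
    return sqrt(sum(v * v for v in values))


def welch_satterthwaite(errors, dofs):
    """Effective degrees of freedom of the precision index (Equation [31])."""
    numerator = sum(e ** 2 for e in errors) ** 2
    denominator = sum(e ** 4 / nu for e, nu in zip(errors, dofs))
    return numerator / denominator


def total_uncertainty(bias, precision, t):
    """Combined uncertainty u_x from bias B and precision index E (Equation [28])."""
    return sqrt(bias ** 2 + (t * precision) ** 2)


# Example [7]: a single bias source and a single precision source
B = rss([0.05])
E = rss([0.04])
nu = welch_satterthwaite([0.04], [19])   # reduces to 19 for a single source
T_19_95 = 2.093                          # from Table [2]

u_x = total_uncertainty(B, E, T_19_95)
percent = 100 * u_x / 1.02               # relative to the sample mean of 1.02

print(f"nu = {nu:.0f}, u_x = {u_x:.4f}, relative uncertainty = {percent:.1f}%")
```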


    Regression Analysis

    Objective

    1. To show how to analyze a set of experimental data using regression analysis.

    2. To generate a curve (line) that represents all the points with minimum error; that is, the deviation of the experimental data from the polynomial curve is minimal.

    Introduction

    The regression analysis for a single variable of the form y = f(x) provides an mth-order polynomial fit of the data in the form

    $$y_c = f(x) = C_0 + C_1 x + C_2 x^2 + \cdots + C_m x^m$$ [33]

    where $y_c$ refers to the value of the dependent variable obtained directly from the polynomial equation for a given value of x.

    For n different values of the independent variable included in the analysis, the highest order, m, of the polynomial that can be determined is restricted to $m \le n - 1$. The values of the m+1 coefficients $C_0, C_1, \ldots, C_m$ are determined analytically.

    The most common form of regression analysis for engineering applications is the method of least squares. The least-squares technique attempts to minimize the sum of the squares of the deviations between the actual data and the polynomial fit of a stated order by adjusting the values of the coefficients.

    An mth-order polynomial relationship is to be found for a set of N data points of the form (x, y), in which x and y are the independent and dependent variables, respectively. Consider the situation in which N values of y exist, $y_i$, where i = 1, 2, ..., N, over n values of x. The task is to find the m+1 coefficients, $C_0, C_1, \ldots, C_m$, of the polynomial of Equation [33]. Define the deviation between any dependent variable $y_i$ and the polynomial as $y_i - y_{ci}$, where $y_{ci}$ is the value of the polynomial evaluated at the data point $(x_i, y_i)$. The sum of the squares of this deviation for all values of $y_i$ is

    $$D = \sum_{i=1}^{N} (y_i - y_{ci})^2$$ [34]

    The goal is to reduce D to a minimum for a given order of polynomial. Combining Equations [33] and [34], one can write

    $$D = \sum_{i=1}^{N} \left[y_i - (C_0 + C_1 x_i + C_2 x_i^2 + \cdots + C_m x_i^m)\right]^2$$ [35]

    The total differential of D is dependent on the m+1 coefficients through

    $$dD = \frac{\partial D}{\partial C_0} dC_0 + \frac{\partial D}{\partial C_1} dC_1 + \cdots + \frac{\partial D}{\partial C_m} dC_m$$ [36]

    To minimize the sum of squares of the deviations, one wants dD to be zero. This is accomplished by setting each of the partial derivatives equal to zero:


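    $$\frac{\partial D}{\partial C_0} = 0 = \frac{\partial}{\partial C_0}\left\{\sum_{i=1}^{N} \left[y_i - (C_0 + C_1 x_i + C_2 x_i^2 + \cdots + C_m x_i^m)\right]^2\right\}$$ [37]

    $$\frac{\partial D}{\partial C_1} = 0 = \frac{\partial}{\partial C_1}\left\{\sum_{i=1}^{N} \left[y_i - (C_0 + C_1 x_i + C_2 x_i^2 + \cdots + C_m x_i^m)\right]^2\right\}$$ [38]

    $$\frac{\partial D}{\partial C_m} = 0 = \frac{\partial}{\partial C_m}\left\{\sum_{i=1}^{N} \left[y_i - (C_0 + C_1 x_i + C_2 x_i^2 + \cdots + C_m x_i^m)\right]^2\right\}$$ [39]

    This yields m+1 equations which are solved simultaneously to yield the unknown regression coefficients, $C_0, C_1, \ldots, C_m$.

    Example: [8]

    Least-Squares Regression Analysis. The data in Table [4] are suspected to follow a linear relationship. Find an appropriate equation of the first-order form.

    Known:

    Independent variable, x
    Dependent measured variable, y
    N = 5

    Assumptions:

    Linear relation.

    Find:

    $y_c = C_0 + C_1 x$

    Solution:

    We seek a polynomial of the form $y_c = C_0 + C_1 x$, which minimizes the term

    $$D = \sum_{i=1}^{N} (y_i - y_{ci})^2$$ [40]

    Setting the partial derivative with respect to $C_0$ equal to zero gives

    $$\frac{\partial D}{\partial C_0} = 0 = -2\sum_{i=1}^{N} \left[y_i - (C_0 + C_1 x_i)\right]$$ [41]

    Table [4] x and y data

    x      y
    1.0    1.2
    2.0    1.9
    3.0    3.2
    4.0    4.1
    5.0    5.3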


    $$\frac{\partial D}{\partial C_1} = 0 = -2\sum_{i=1}^{N} \left[y_i - (C_0 + C_1 x_i)\right] x_i$$ [42]

    yielding

    $$\sum_{i=1}^{N} \left[y_i - (C_0 + C_1 x_i)\right] = 0$$ [43]

    $$\sum_{i=1}^{N} \left[y_i - (C_0 + C_1 x_i)\right] x_i = 0$$ [44]

    Solving simultaneously for the coefficients $C_0$ and $C_1$ yields

    $$C_0 = \frac{\sum x_i \sum x_i y_i - \sum x_i^2 \sum y_i}{\left(\sum x_i\right)^2 - N \sum x_i^2}$$ [45]

    $$C_1 = \frac{\sum x_i \sum y_i - N \sum x_i y_i}{\left(\sum x_i\right)^2 - N \sum x_i^2}$$ [46]

    From the data set, one finds $C_0$ = 0.02 and $C_1$ = 1.04. Hence,

    $$y_c = 0.02 + 1.04x$$
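
    The coefficients of Example [8] can be verified with a minimal sketch of the closed-form first-order solution, Equations [45] and [46], applied to the Table [4] data. The function name `fit_line` is illustrative.

```python
def fit_line(xs, ys):
    """First-order least-squares fit y_c = C0 + C1*x via Equations [45] and [46]."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    denom = sum_x ** 2 - n * sum_xx      # common denominator of [45] and [46]
    if denom == 0:
        raise ValueError("all x values are identical; the slope is undefined")

    c0 = (sum_x * sum_xy - sum_xx * sum_y) / denom
    c1 = (sum_x * sum_y - n * sum_xy) / denom
    return c0, c1


# Table [4]: x and y data
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.2, 1.9, 3.2, 4.1, 5.3]

c0, c1 = fit_line(xs, ys)
# Sum of squared deviations D for the fitted line, Equation [40]
D = sum((y - (c0 + c1 * x)) ** 2 for x, y in zip(xs, ys))

print(f"y_c = {c0:.2f} + {c1:.2f} x")
```

    With the Table [4] sums (Σx = 15, Σy = 15.7, Σx² = 55, Σxy = 57.5) the denominator is 225 - 275 = -50, giving C0 = -1/-50 = 0.02 and C1 = -52/-50 = 1.04.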