Lean Six Sigma New GB v 5.0 Analyze

  • TCS Internal

    Lean Six Sigma Green Belt Training ANALYSE PHASE

  • 22 November 2013 Copyright 2013 Tata Consultancy Services limited 2

    DMAIC Roadmap

    Define

    Identify Project CTQs

    Develop Project Charter

    Prepare High Level Process map, SIPOC

    Measure

    Establish Performance Standard

    Assess Measurement System Variation

    Estimate Current Capability

    Identify Potential Causes

    Sampling & Data Collection

    Analyze

    Identify variation using Graphical analysis

    Prioritize & Validate causes

    Improve

    Define y = f(x)

    Identify Solutions

    Prioritize And Implement Solutions

    Control

    Optimize & refine solutions

    Control X's & Monitor Y's

    Measure actual benefits

    Close & Hand-over project

    Understand As-Is process

    Complete Stakeholder analysis

    Measure improvements

    Define Measure Analyze Improve Control

  • Slide 3

    Why Analyze ?

    To understand the problem and identify root causes

    To avoid actions based on intuition, preconceived ideas & symptoms

    To develop sustainable process improvements for long term benefits

    Recalibrate project scope

    Establish performance goals for the process

    Find the Xs that affect Y most

  • Slide 4

    Identify The Vital Few

    Process Measures (Xs)

    Process

    Input Measures (Xs)

    Outputs (Ys)

    Variation in Output Y depends on process as well as input variables (Xs)

  • Slide 5

    The Funnel Effect

    Y = f(x1, x2, x3, x4, ..., xn)

    Root-cause identification is a task of elimination

    30+ variables

    15-20 variables

    10-15 variables

    5-10 variables

    3-5 variables

    ANALYSE

  • Slide 6

    Analyze Phase FLOW :

    Identify Variation using Graphical Tools: Box Plot, Scatter Plot, Pareto Analysis

    Validate Causes: Hypothesis testing

  • Slide 7

    Why Graphical Analysis

    Graphs help us understand the nature of variation

    Graphs make the nature of data more accessible to the human mind

    Graphs help display the context of the data

    Graphs should be the primary presentation tool in data analysis

    If you can't show it graphically, you probably don't have a good conclusion

    Source: Donald Wheeler: Understanding Variation

  • Slide 8

    Box Plot

    Purpose:

    To begin an understanding of the distribution of the data

    To get a quick, graphical comparison of two or more processes

    When:

    First stages of data analysis

    Upper whisker: the maximum observation that falls within the upper limit, Q3 + 1.5 (Q3 - Q1)

    75th Percentile (Q3)

    Median (50th Percentile)

    25th Percentile (Q1)

    Lower whisker: the minimum observation that falls within the lower limit, Q1 - 1.5 (Q3 - Q1)

    * Outlier: any point outside the lower or upper limit
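
The limits on this slide can be worked out by hand or in a few lines of code. The sketch below is only an illustration: the turnaround times are hypothetical (they are not taken from the slides), the function name `box_plot_summary` is made up for this example, and the quartiles use the default method of Python's `statistics.quantiles`, which may differ slightly from the quartile rule in other tools.

```python
import statistics

def box_plot_summary(data):
    """Return quartiles, limits, whisker ends and outliers for a box plot."""
    values = sorted(data)
    # With n=4, statistics.quantiles returns the three quartile cut points.
    q1, median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    inside = [v for v in values if lower_limit <= v <= upper_limit]
    outliers = [v for v in values if v < lower_limit or v > upper_limit]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_limit": lower_limit,
        "upper_limit": upper_limit,
        "whisker_low": min(inside),    # smallest observation within the limits
        "whisker_high": max(inside),   # largest observation within the limits
        "outliers": outliers,
    }

# Hypothetical turnaround times in minutes (illustrative data only).
tat = [22, 25, 28, 30, 31, 33, 35, 38, 40, 44, 70]
summary = box_plot_summary(tat)
print(summary)
```

Here the whiskers stop at the last observations inside the limits, and the single value beyond the upper limit would be drawn as an asterisk.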

  • Slide 9

    Box Plot

    Things to look for in a Box Plot:

    Are the boxes about equal or different?

    Do the groups appear normal (symmetrical box halves and whiskers) or skewed?

    Are there outliers?

    [Figure: Boxplots of Op1 Cycl and Op2 Cycl (means are indicated by solid circles)]

  • Slide 10

    Box Plot Example

    Minitab Command: Graph > Box plot, Graph > Histogram

  • Slide 11

    Box Plot - Example

    [Figure: Boxplot of TAT- Agent 1, TAT- Agent 2]

    [Figure: Histogram of TAT- Agent 1 (Frequency vs TAT- Agent 1)]

    [Figure: Histogram of TAT- Agent 2 (Frequency vs TAT- Agent 2)]

    Can you now interpret Box Plots?

  • Slide 12

    Scatter Plot

    Scatter Plot tool can be used when

    Both X and Y are in continuous format

    We want to associate Y with a single X

    We want to judge the strength of the relationship between Y and X

    The strength of that relationship is denoted by the Coefficient of Correlation, r

  • Slide 13

    Correlation

    r is always between -1 and +1.

    A positive value of r means both variables move in the same direction

    A negative value of r means the variables move in opposite directions

    A zero value of r means no linear correlation between the two variables

    The higher the absolute value of r, the stronger the correlation between Y & X
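
As a rough illustration of where r comes from, the sketch below computes the Pearson coefficient directly from its definition (covariance divided by the product of the spreads). The scores are hypothetical, chosen only for the example, and `pearson_r` is a helper name made up here.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross products and sums of squares around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical on-boarding test scores (X) and floor performance scores (Y).
test_scores = [52, 55, 60, 63, 66, 70, 74]
floor_scores = [72, 78, 75, 84, 88, 86, 95]

r = pearson_r(test_scores, floor_scores)
print(f"r = {r:.3f}")
```

A perfectly increasing straight line gives r = +1 and a perfectly decreasing one gives r = -1; real data fall in between.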

  • Slide 14

    Types of Correlations

    Correlation measures the linear association between the Output (Y) and one input variable (X) only

    [Figure: six scatter plots of y-effect vs x-cause, n=30 each]

    Positive Correlation: r=0.9

    Negative Correlation: r=-0.9

    Positive Correlation May Be Present: r=0.6

    Negative Correlation May Be Present: r=-0.6

    No Linear Correlation: r=0.0

    No Correlation: r=0.0

  • Slide 15

    Scatter Plot & Correlation - Example

    Minitab Command: Stat > Basic Statistics > Correlation

    Variables: On-boarding Test score & Floor Performance Score

  • Slide 16

    Scatter Plot & Correlation - Example

    Minitab Command: Graph > Scatter plot

    Y variables: Floor performance Score, X variables: On-boarding Test scores

  • Slide 17

    Scatter Plot & Correlation - Example

    Minitab output:

    Correlations: On-boarding Test Score, Floor Performance Score

    Pearson correlation (r) of On-boarding Test Score and Floor Performance Score = 0.786

    [Figure: Scatterplot of Floor Performance Score vs On-boarding Test Score]

    The r value indicates a reasonably strong positive correlation.

  • Slide 18

    Scatter Plot Vs Correlation Analysis

    Scatter Plot: Suggests a relationship between two variables but does not quantify it

    Correlation Analysis: Quantifies the strength or degree of the relationship in terms of the Coefficient of Correlation r

  • Slide 19

    Pareto

    What is it ?

    The Pareto Principle states that only a "vital few" factors are responsible for producing most of the problems. This principle can be applied to quality improvement to the extent that a great majority of problems (80%) are produced by a few key causes (20%). If we correct these few key causes, we will have a greater probability of success.

    Why use it ?

    For the team to quickly focus its efforts on the key causes of a problem.

    When to use it ?

    Data is discrete, i.e., classified into types with frequencies for each type.

  • Slide 20

    Pareto - Example

    Minitab Command: Stat > Quality Tools > Pareto Chart

    Chart defects table: Query Type for "Labels in" & Total received for "Frequencies in"

  • Slide 21

    Pareto - Example

    [Figure: Pareto Chart of No. of Queries rec'd]

    Sub type                Total    Percent    Cum %
    ADDRESS CHANGE           4116       23.4     23.4
    DISPUTE                  3431       19.5     42.9
    CUSTOMER INFORMATION     2709       15.4     58.2
    CARD CANCELLATION        2685       15.2     73.5
    FRAUD RELATED            1749        9.9     83.4
    CARDS RELATED             800        4.5     88.0
    CLARIFICATION MAIL        506        2.9     90.8
    PAYMENT QUERY             269        1.5     92.4
    FAST CARD & PIN           247        1.4     93.8
    ANONYMOUS                 234        1.3     95.1
    Other                     864        4.9    100.0

    Which factors do you consider as vital from the above Pareto chart?
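
The Percent and Cum % rows of a Pareto chart are simple running totals. The sketch below recomputes them from the query counts shown on the chart. Two things are assumptions rather than facts from the slide: the pairing of each count with a sub type label (read from the chart axis, largest first, with "Other" kept last), and the use of an 80% cumulative cut-off to pick the "vital few".

```python
# Query counts from the Pareto example, in chart order ("Other" stays last).
# The label-to-count pairing is an assumption read from the chart axis.
counts = [
    ("ADDRESS CHANGE", 4116),
    ("DISPUTE", 3431),
    ("CUSTOMER INFORMATION", 2709),
    ("CARD CANCELLATION", 2685),
    ("FRAUD RELATED", 1749),
    ("CARDS RELATED", 800),
    ("CLARIFICATION MAIL", 506),
    ("PAYMENT QUERY", 269),
    ("FAST CARD & PIN", 247),
    ("ANONYMOUS", 234),
    ("Other", 864),
]

total = sum(count for _, count in counts)

# Build (label, count, percent, cumulative percent) rows.
rows = []
running = 0
for label, count in counts:
    running += count
    rows.append((label, count,
                 round(100 * count / total, 1),
                 round(100 * running / total, 1)))

# "Vital few": leading categories needed to reach the assumed 80% cut-off.
vital_few = []
for label, _, _, cum_pct in rows:
    vital_few.append(label)
    if cum_pct >= 80:
        break

for row in rows:
    print(row)
print("Vital few:", vital_few)
```

With these counts, the first five sub types account for a little over 80% of all queries.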

  • Slide 22

    Analyze Phase FLOW :

    Identify Variation using Graphical Tools: Box Plot, Scatter Plot, Pareto Analysis

    Validate Causes: Hypothesis testing

  • Slide 23

    What is Hypothesis Testing

    Measurements are organized into statistics to provide insight into spread, shape, consistency and location of the process

    A hypothesis test is simply comparing reality to an assumption and asking "Did things change?"

    A hypothesis test is testing whether real data fits the model

    A hypothesis test is comparing a statistic to a hypothesis

  • Slide 24

    Core of Hypothesis Testing

    Statistical Inference Relies On Sampling From A Population Of Data

    Sampling From a Population

    [Diagram: Entire Population of Data, Sample, Analysis, Statistical Inference]

    Parameters: μ, σ

    Statistics: x̄, s, etc.

    μ = mean of the population

    σ = standard deviation of the population

    x̄ = mean of the sample

    s = standard deviation of the sample

    Sampling saves costs and time.

    Sampling provides a good alternative to collecting all the data.

    Identifying a specific confidence level allows us to make reasonable business decisions.

  • Slide 25

    Common terms in Hypothesis Testing

    The Null Hypothesis (H0)

    The statement that there is no difference. It is assumed to be true unless the data provide evidence otherwise. You never prove it, you only fail to reject it.

    The Alternative Hypothesis (Ha)

    The statement that we would like to show is true. It usually defines the direction of desirable change. The alternative hypothesis can be: >, < or ≠

  • Slide 26

    Hypothesis Testing P Value

    P < α : Reject Ho

    P ≥ α : Fail to reject Ho

    Alpha (α) is the maximum acceptable probability of making a Type I error, i.e., rejecting a null hypothesis that is actually true. (In other words, the USL for Type I error.)

    The p-value is the probability of seeing data at least as extreme as the sample if the null hypothesis is true. A small p-value means that rejecting the null hypothesis carries only a small risk of a Type I error.

    For most decisions, the acceptance level of a Type I error is set at α = 0.05.

    Thus, any p-value less than 0.05 means we reject the null hypothesis.

  • Slide 27

    Hypothesis Testing Road Map

    Hypothesis testing: determining statistical differences within and between populations

    Continuous data

        Comparing Means: 1-sample t-test (one sample), 2-sample t-test (two samples), ANOVA (multiple samples)

        Comparing Variances: Test of equal variances (two samples)

    Discrete data

        Comparing Proportions: 1-Proportion test, 2-Proportion test

        Chi-square test

  • Slide 28

    Process Scenarios for Hypothesis tests

    Tool: Process Scenarios

    1 Sample t-test: To compare a team's performance against a target. Data set containing performance scores like Daily/Weekly scores. Sample size can be less than 30 as well, but higher is better.

    2 Sample t-test: To compare one team's performance against another, or to compare the performance of a team before and after improvement. Data set containing performance scores like Daily/Weekly scores. Sample size can be less than 30 as well, but higher is better.

    ANOVA: To compare the performance of multiple teams on a metric like Quality score. Data set containing performance scores like Daily/Weekly scores of multiple teams.

    Test of equal variances: To compare the variance or Std deviation of one team's performance with another. Data set containing performance scores like Daily/Weekly scores.

    1-Proportion test: To compare proportion defects/defectives of a team against a target.

    2-Proportion test: To compare proportion defects/defectives of a team against another team.

    Chi-square test: To check association between variables, e.g., whether there is any association between two teams with respect to their Error types.

  • Slide 29

    Hypothesis Testing t-Test Procedure

    t Test is mainly used to test differences in means. Theoretically, the t test can be used even for small sample sizes (as small as 10) when the data is normally distributed.

    The null hypothesis is that the averages of the two groups are the same.

    Ho : μ1 = μ2

    Ha : μ1 ≠ μ2 (or μ1 > μ2, or μ1 < μ2)

  • Slide 30

    One Sample T-test

    One sample T- test is used to compare the performance of a process with the set standard/ historical data/ target.

    e.g. The historical average CSI of a process is 4.35. Process Manager is interested in understanding the present CSI based on the data collected in last 15 days.

  • Slide 31

    One-sample T-test

    Example: Organization ABC is measuring the no. of days to get money from XYZ after invoices are sent. Historical data suggests that earlier payments were received within 25 days; however, some improvement actions were implemented. The process team wanted to check whether the improvement plans have had any impact on the performance.

    The sample data was collected. The times taken for receiving the payments are: 22, 23, 22, 25, 28, 27, 28, 25, 23, 21 days.

    Establish whether we get money in 25 days with 95% Confidence.

    Instructions: Stat > Basic stat > 1 sample t

    Enter data as: Variable C1 Days, Test Mean: 25, Alternative: Not Equal

  • Slide 32

    One-sample T-test using Minitab

    Stat > Basic Stat > 1 Sample t.

    Minitab Output

    T-Test of the Mean

    Test of mu = 25.000 vs mu not = 25.000

    Variable   N    Mean     StDev   SE Mean   T       P
    Days       10   24.400   2.591   0.819     -0.73   0.48

    Interpretation: Since p > 0.05, there is no statistically significant evidence that the improvement plan made a difference in the process performance.

    Since P is > 0.05, _________ Null Hypothesis
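
The Mean, StDev, SE Mean and T columns of the Minitab output can be reproduced by hand. The sketch below does so with Python's standard library, using the ten payment times from the example. The standard library has no t distribution, so instead of a p-value the sketch compares |T| with the two-sided 5% critical value for 9 degrees of freedom (2.262, from a standard t table); that shortcut is this example's choice, not part of the original slides.

```python
import math
import statistics

# Days to receive payment, as listed in the example.
days = [22, 23, 22, 25, 28, 27, 28, 25, 23, 21]
test_mean = 25  # hypothesized mean (mu) from the example

n = len(days)
sample_mean = statistics.mean(days)
sample_stdev = statistics.stdev(days)    # sample standard deviation (n - 1)
se_mean = sample_stdev / math.sqrt(n)    # standard error of the mean
t_value = (sample_mean - test_mean) / se_mean

# Two-sided critical value of t for alpha = 0.05 and n - 1 = 9 degrees of
# freedom, taken from a standard t table.
T_CRITICAL = 2.262
reject_null = abs(t_value) > T_CRITICAL

print(f"N={n} Mean={sample_mean:.3f} StDev={sample_stdev:.3f} "
      f"SE Mean={se_mean:.3f} T={t_value:.2f} reject H0: {reject_null}")
```

Because |T| is well below the critical value, the decision agrees with the p-value reported by Minitab.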

  • Slide 33

    Hypothesis Testing 2 Sample T Test

    2 Sample T test is used for comparing the averages of 2 sets of readings

    Test is used when the dependent variable (response or Y) is continuous and the independent variable (factor or X) is discrete.

    Test can be performed on data from independent samples stacked in a single column with a second discrete variable in another column.

    Variances may be equal or unequal.

    The null hypothesis is that the population means are not different.

    H0: μ1 = μ2

    Ha: μ1 ≠ μ2 (or μ1 > μ2, or μ1 < μ2)

  • Slide 34

    Hypothesis Testing 2 Sample T Test

    Example :

    The time required for installing a software by new and experienced engineers is given below. Establish whether experienced engineers are better.

    Experienced: 15.80, 14.19, 15.32, 14.65, 12.25, 15.42, 12.92, 13.98, 16.28, 14.53

    New: 16.10, 17.24, 17.65, 16.8, 18.42, 18.12, 15.24, 16.14, 15.26, 14.65

  • Slide 35

    Hypothesis Testing 2 Sample T Test

    Stat > Basic Statistics > 2 Sample t..

  • Slide 36

    Hypothesis Testing 2 Sample T Test

    Two-Sample T-Test and CI: Experienced, New

    Two-sample T for Experienced vs New

                  N    Mean    StDev   SE Mean
    Experienced   10   14.53   1.26    0.40
    New           10   16.56   1.29    0.41

    Difference = mu (Experienced) - mu (New)
    Estimate for difference: -2.028
    95% CI for difference: (-3.234, -0.822)
    T-Test of difference = 0 (vs not =): T-Value = -3.55 P-Value = 0.002 DF = 17

    Two-Sample T-Test and CI: Experienced, New

    Two-sample T for Experienced vs New

                  N    Mean    StDev   SE Mean
    Experienced   10   14.53   1.26    0.40
    New           10   16.56   1.29    0.41

    Difference = mu (Experienced) - mu (New)
    Estimate for difference: -2.028
    95% upper bound for difference: -1.034
    T-Test of difference = 0 (vs
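
The T-Value and DF in the first output can be checked with a short calculation. The sketch below assumes the unequal-variance (Welch) form of the two-sample t-test; that is an inference from the reported DF of 17 (a pooled test on two samples of 10 would have 18 degrees of freedom), not something the slides state. The helper name `welch_t` is made up for this example.

```python
import math
import statistics

# Installation times from the example.
experienced = [15.80, 14.19, 15.32, 14.65, 12.25,
               15.42, 12.92, 13.98, 16.28, 14.53]
new = [16.10, 17.24, 17.65, 16.8, 18.42,
       18.12, 15.24, 16.14, 15.26, 14.65]

def welch_t(a, b):
    """Two-sample t statistic and degrees of freedom, variances not pooled."""
    va = statistics.variance(a) / len(a)   # squared standard error of mean a
    vb = statistics.variance(b) / len(b)   # squared standard error of mean b
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

difference = statistics.mean(experienced) - statistics.mean(new)
t_value, df = welch_t(experienced, new)

print(f"Estimate for difference: {difference:.3f}")
print(f"T-Value = {t_value:.2f}, DF = {math.floor(df)}")
```

The degrees of freedom come out as a non-integer; rounding down gives the whole number shown in the output.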

  • Slide 37

    Analysis Of Variance (ANOVA)

    One-way ANOVA is used to compare several sample means for two or more levels of a single factor (groups of data). In this sense, it is an extension of a two-sample t-test.

    Comparing all groups at once with ANOVA is preferable to comparing two groups at a time with the two-sample t-test (pooled variance), because every additional pairwise test adds to the overall chance of a Type I error.

    Hypothesis: H0: μ1 = μ2 = μ3 = ... versus Ha: there is at least one difference

  • Slide 38

    ANOVA Assumption

    The purpose of one-way ANOVA is to compare means. The means of different groups of data can only be compared if the variances within each group are statistically the same.

    ANOVA has two assumptions:

    Data for each group should be normal

    The data sets have equal variances. H0: σ1² = σ2² = σ3² = ... versus Ha: there is at least one difference

    ANOVA is robust enough to give good results even if the assumptions are moderately violated.

  • Slide 39

    ANOVA: Example

    A contact centre receives calls for different processes within the organization. The Contact Centre head wanted to understand whether the response time is affected by the process.

    Response time data was collected for the 3 processes for doing ANOVA analysis.

    Process C   Process B   Process A
    6.5         7           6
    6           6           5.4
    6           6.5         4.4
    7           5.5         5.5
    6.5         6           4.5
    7           6.5         5
    6           7           6
    7.5         6           7
    6           5.5         6
    5.5         5           4.5
    5.5         4           3.5
    6           6.5         3.5
    7           6           4

  • Slide 40

    ANOVA: Assumptions Testing

    Stat > Basic Statistics > Graphical Summary

    [Figure: Graphical Summary for each process, with histogram and 95% Confidence Intervals for Mean and Median]

    Summary for Process B

    Anderson-Darling Normality Test: A-Squared 0.46, P-Value 0.212
    Mean 5.9615, StDev 0.8282, Variance 0.6859, Skewness -1.02716, Kurtosis 1.44419, N 13
    Minimum 4.0000, 1st Quartile 5.5000, Median 6.0000, 3rd Quartile 6.5000, Maximum 7.0000
    95% Confidence Interval for Mean: 5.4611 to 6.4620
    95% Confidence Interval for Median: 5.5000 to 6.5000
    95% Confidence Interval for StDev: 0.5939 to 1.3671

    Summary for Process C

    Anderson-Darling Normality Test: A-Squared 0.59, P-Value 0.101
    Mean 6.3462, StDev 0.6253, Variance 0.3910, Skewness 0.387879, Kurtosis -0.844201, N 13
    Minimum 5.5000, 1st Quartile 6.0000, Median 6.0000, 3rd Quartile 7.0000, Maximum 7.5000
    95% Confidence Interval for Mean: 5.9683 to 6.7240
    95% Confidence Interval for Median: 6.0000 to 7.0000
    95% Confidence Interval for StDev: 0.4484 to 1.0322

    All three process response time data pass the normality test.

    Even if the data is not normal, one can go ahead with the test of equal variances, using Levene's test, which applies to any continuous distribution.

  • Slide 41

    ANOVA: Assumptions Testing

    Assumption Testing:

    Variances testing requires stacked data.

    Stat > ANOVA > Test for Equal Variances

  • Slide 42

    ANOVA: Example

    Test for Equal Variances: Stacked versus Process

    95% Bonferroni confidence intervals for standard deviations

    Process     N    Lower      StDev     Upper
    Process A   13   0.717000   1.07094   2.00301
    Process B   13   0.554475   0.82819   1.54898
    Process C   13   0.418654   0.62532   1.16955

    Bartlett's Test (Normal Distribution)
    Test statistic = 3.25, p-value = 0.197

    Levene's Test (Any Continuous Distribution)
    Test statistic = 1.84, p-value = 0.173

    [Figure: Test for Equal Variances for Stacked, 95% Bonferroni Confidence Intervals for StDevs by Process]

    Since p value > 0.05 through Bartlett's test, data passes the test of equal variances assumption.

  • Slide 43

    ANOVA: Example

    Stat > ANOVA > One way (Unstacked)..

  • Slide 44

    ANOVA: Example

    One-way ANOVA: Process A, Process B, Process C

    Source   DF   SS       MS      F      P
    Factor   2    12.043   6.022   8.12   0.001
    Error    36   26.686   0.741
    Total    38   38.729

    S = 0.8610   R-Sq = 31.10%   R-Sq(adj) = 27.27%

    Level       N    Mean     StDev
    Process A   13   5.0231   1.0709
    Process B   13   5.9615   0.8282
    Process C   13   6.3462   0.6253

    Pooled StDev = 0.8610

    [Figure: Individual 95% CIs For Mean Based on Pooled StDev, axis from 4.80 to 6.60]

    Interpretation:

    Since p < 0.05, the difference in the response time is significant and the process can be called a significant factor.
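
The SS, MS and F columns of the ANOVA table follow from two sums of squares: variation between the group means (Factor) and variation within the groups (Error). The sketch below recomputes them from the response time data of the example. It assumes the three data columns are Process C, Process B and Process A in that order, which reproduces the group means in the Minitab output; `one_way_anova` is a helper name made up here. The p-value needs the F distribution, which the standard library does not provide, so it is left to Minitab.

```python
import statistics

# Response times from the example (column order assumed: C, B, A).
process_a = [6, 5.4, 4.4, 5.5, 4.5, 5, 6, 7, 6, 4.5, 3.5, 3.5, 4]
process_b = [7, 6, 6.5, 5.5, 6, 6.5, 7, 6, 5.5, 5, 4, 6.5, 6]
process_c = [6.5, 6, 6, 7, 6.5, 7, 6, 7.5, 6, 5.5, 5.5, 6, 7]

def one_way_anova(*groups):
    """Return the main entries of a one-way ANOVA table as a dict."""
    all_values = [v for group in groups for v in group]
    grand_mean = statistics.mean(all_values)
    # Between-group variation: group means around the grand mean.
    ss_factor = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2
                    for g in groups)
    # Within-group variation: observations around their own group mean.
    ss_error = sum((v - statistics.mean(g)) ** 2 for g in groups for v in g)
    df_factor = len(groups) - 1
    df_error = len(all_values) - len(groups)
    ms_factor = ss_factor / df_factor
    ms_error = ss_error / df_error
    return {
        "df_factor": df_factor, "df_error": df_error,
        "ss_factor": ss_factor, "ss_error": ss_error,
        "ms_factor": ms_factor, "ms_error": ms_error,
        "f": ms_factor / ms_error,
    }

table = one_way_anova(process_a, process_b, process_c)
print({name: round(value, 3) for name, value in table.items()})
```

A large F means the spread between the process means is large compared with the spread inside each process.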

  • Slide 45

    Proportion Testing

    Proportion testing is used to check whether an observed proportion (for example, the proportion of defectives at a factor level) differs significantly from a target or from another proportion.

    It can be of 2 types:

    One Proportion Test:

    Ho : PA = P0

    Ha : PA >, < or ≠ P0

    Two Proportion Test:

    Ho : PA = PB

    Ha : PA >, < or ≠ PB

  • Slide 46

    Proportion Testing Example

    1 Proportion Test:

    An HR Services complaints resolution process is meant for resolving the complaints raised by associates. The data provided in the table shows the % of complaints resolved by the process within the 8 Hrs timeline. The process manager claims that the process is resolving at least 30% of the complaints on more than 80% of the occasions. Is it possible to use the 1 P test for validating the claim of the process manager?

  • Slide 47

    1 Proportion Test: Example

    Day   Complaints Resolved %      Day   Complaints Resolved %
    1     25                         16    25
    2     35                         17    35
    3     30                         18    30
    4     36                         19    36
    5     32                         20    32
    6     33                         21    33
    7     34                         22    34
    8     36                         23    36
    9     28                         24    28
    10    30                         25    30
    11    29                         26    29
    12    32                         27    32
    13    31                         28    31
    14    28                         29    28
    15    35                         30    35

    Data Suggests:

    Total no. of trials: 30

    No. of events of complaints resolved >= 30% : 22

    One Proportion Test:

    Ho : PA = 0.8

    Ha : PA > 0.8

    Issues

  • Slide 48

    1 Proportion Test: Example

    Stat > Basic Statistics > 1 Proportion

  • Slide 49

    1 Proportion Test: Example

    Minitab Project Report

    Test and CI for One Proportion

    Test of p = 0.8 vs p > 0.8

                                   95% Lower   Exact
    Sample   X    N    Sample p    Bound       P-Value
    1        22   30   0.733333    0.570066    0.871

    Interpretation:

    Since p > 0.05 through 1 P test, it is not advisable to say that the team is resolving at least 30% of complaints per day more than 80% of the times. The process manager's claim of providing resolution on more than 80% of the occasions is not valid.
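
The "Exact P-Value" in this output is a binomial tail probability: the chance of seeing 22 or more qualifying days out of 30 if the true proportion were exactly 0.8. A minimal sketch of that calculation follows; the function name `binomial_upper_tail` is made up for this example, and the exact binomial method is assumed from the word "Exact" in the output.

```python
import math

def binomial_upper_tail(x, n, p0):
    """P(X >= x) for X ~ Binomial(n, p0)."""
    return sum(math.comb(n, k) * p0 ** k * (1 - p0) ** (n - k)
               for k in range(x, n + 1))

events = 22    # days on which at least 30% of complaints were resolved
trials = 30    # total days observed
p0 = 0.8       # hypothesized proportion under Ho

sample_p = events / trials
# Ha is "p > 0.8", so the p-value is the upper tail probability.
p_value = binomial_upper_tail(events, trials, p0)

print(f"Sample p = {sample_p:.6f}, exact P-Value = {p_value:.3f}")
```

The sample proportion is below 0.8, so the data cannot support a claim of "more than 80%", and the p-value is correspondingly large.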

  • Slide 50

    2 Proportion Test: Example

    2 Proportion Test: In an invoice processing process, the process manager is thinking of giving a promotion to one of the team members, A or B. For this he wants to look at the last 7 days of invoices processed by them to get a feel for the better performer. Can you use the 2P test to identify the better performer?

    Data Suggests:

    Team Member A:

    Total no. of invoices resolved: 60

    Total no. of invoices without error: 32

    Team Member B:

    Total no. of invoices resolved: 65

    Total no. of invoices without error: 48

  • Slide 51

    2 Proportion Test: Example

    Stat > Basic Statistics > 2 Proportion

    Test and CI for Two Proportions

    Sample   X    N    Sample p
    1        32   60   0.533333
    2        48   65   0.738462

    Difference = p (1) - p (2)
    Estimate for difference: -0.205128
    95% upper bound for difference: -0.0663404
    Test for difference = 0 (vs < 0): Z = -2.43 P-Value = 0.008
    Fisher's exact test: P-Value = 0.014

    Interpretation:

    Since p < 0.05 through 2 P test, the performance of Team Member A can be considered significantly less than performance of team member B. Hence process manager can select member B for promotion.
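
The Z and P-Value in this output come from a normal approximation to the difference of two proportions. The sketch below reproduces them with the standard library. It assumes the standard error is computed from each sample's own proportion (not a pooled estimate), since that choice matches the reported Z; `two_proportion_z` is a helper name made up here.

```python
import math
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """z statistic and lower-tail p-value for Ha: p1 < p2 (unpooled SE)."""
    p1 = x1 / n1
    p2 = x2 / n2
    # Standard error of the difference, using separate sample proportions.
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = (p1 - p2) / se
    p_value = NormalDist().cdf(z)   # probability of a z this low or lower
    return p1, p2, z, p_value

# Team Member A: 32 error-free invoices out of 60.
# Team Member B: 48 error-free invoices out of 65.
p_a, p_b, z_value, p_value = two_proportion_z(32, 60, 48, 65)

print(f"Sample p: {p_a:.6f}, {p_b:.6f}")
print(f"Estimate for difference: {p_a - p_b:.6f}")
print(f"Z = {z_value:.2f}, P-Value = {p_value:.3f}")
```

The test is one-sided here because the question is whether Team Member A's error-free proportion is lower than Team Member B's.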

  • Slide 52

    2 Proportion Test - Exercise

    On auditing two pizza outlets, 7 deliveries were late out of 155 in first one and 22 deliveries were late out of 200 in the second one. Find with 99% of confidence if the two proportions are different.

  • Slide 53

    Contingency Table

    A contingency table is used when both output and input variables are attribute (discrete) in nature. It uses the Chi square test to reach a conclusion.

    Chi Square Test:

    Ho : Y is independent of X

    Ha : Y is not independent of X

  • Slide 54

    Contingency Table : Example

    During a project for looking into the recruitment possibilities, the Personnel Department wanted to understand whether the chances of being hired is dependent upon the age of the person. Can the linkage between age and chances of being hired be statistically validated ?

    Hypothesis:

    Ho : Hiring of a person is independent of his/ her age

    Ha : Hiring of a person is not independent of his/ her age

  • Slide 55

            Hired   Not Hired   Total
    Old     30      150         180
    Young   45      230         275
    Total   75      380         455

    Data was collected for all the candidates who were taken through the recruitment process in last one year.

    Old: > 35 Years

    Young:

  • Slide 56

    Contingency Table : Example

    Stat > Table > Chi Square Test

    Each cell must have an expected count of >= 5 for going ahead with the test.

  • Slide 57

    Contingency Table: Analysis in Minitab

    Chi-Square Test

    Expected counts are printed below observed counts

            Hired    Not Hire    Total
    1       30       150         180
            29.67    150.33
    2       45       230         275
            45.33    229.67
    Total   75       380         455

    Chi-Sq = 0.004 + 0.001 + 0.002 + 0.000 = 0.007
    DF = 1, P-Value = 0.932

    Interpretation:

    Since p > 0.05, there is no significant evidence that the hiring of a candidate depends upon his/ her age.

    [Figure: chi square distribution, axis from 0 to 5]

    The contingency table analysis compares the observed values with the expected values calculated under independence. If there is independence, we expect the differences, and hence the Chi-Sq statistic, to be close to 0. The further away from 0 we are, the more likely the variables are dependent. To help us make that decision, we only need to look at the p value.
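
The expected counts and the Chi-Sq value in this output can be rebuilt from the observed table alone. The sketch below does that for the hiring example. The p-value line uses a shortcut that is valid only for 1 degree of freedom (a 2 x 2 table), where the chi-square tail probability equals erfc(sqrt(chi_sq / 2)); larger tables would need a chi-square distribution function from a statistics package. The function name `chi_square_test` is made up for this example.

```python
import math

# Observed counts from the example: rows are Old and Young,
# columns are Hired and Not Hired.
observed = [[30, 150],
            [45, 230]]

def chi_square_test(table):
    """Expected counts, chi-square statistic and degrees of freedom."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected count under independence: row total * column total / total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    chi_sq = sum((obs - exp) ** 2 / exp
                 for obs_row, exp_row in zip(table, expected)
                 for obs, exp in zip(obs_row, exp_row))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, chi_sq, df

expected, chi_sq, df = chi_square_test(observed)

# Tail probability for df = 1 only (see the note above).
p_value = math.erfc(math.sqrt(chi_sq / 2))

print("Expected:", [[round(e, 2) for e in row] for row in expected])
print(f"Chi-Sq = {chi_sq:.3f}, DF = {df}, P-Value = {p_value:.3f}")
```

Because the observed counts are almost identical to the expected ones, the statistic is close to 0 and the p-value is close to 1.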

  • Slide 58

    Chi Square Test : Exercise

    Are ladies more likely to be right handed compared to gentlemen?

    Hypothesis:

    Ho : There is no relationship between gender & dexterity

    Ha : There is a relationship between gender & dexterity

  • Slide 59

    End of Analyse Phase
