
STATISTICS DIGEST
The Newsletter of the ASQ Statistics Division

Chair’s Message ...........................1

Editor’s Corner ..............................3

MINI-PAPER Profound Statistical Concepts: When Theory Collides with Reality .......................................4

In Remembrance of Connie M. Borror, 1966–2016 ....17

Design of Experiments ..................19

Statistical Process Control .............20

Statistics for Quality Improvement ...............................22

Stats 101 ....................................26

Testing and Evaluation .................28

Standards InSide-Out ...................32

FEATURE Two Case Studies of Reducing Variability ..................33

Upcoming Conference Calendar ...43

Statistics Division Committee Roster 2016 ............................................44

IN THIS ISSUE

Vol. 35, No. 2 June 2016

This time of year, I am often involved in discussions exploring the question "What is Statistics?" As a person who has an advanced degree in Statistics and who has always referred to herself as a Statistician (even though my official company title is Engineer), I find it an interesting question and one where my answer is changing rather rapidly. If you Google the question, a formal definition pops up: "the practice or science of collecting and analyzing numerical data in large quantities, especially for the purpose of inferring proportions in a whole from those in a representative sample." With new and relatively inexpensive technology that collects and stores data, and software and computation power available to analyze massive amounts of data, our role as statisticians is changing and the answer to that question is changing along with it.

What comes to mind if you were asked this open-ended question? Do you think of the design of an experiment, collection of the data, and then a formal analysis of data? Do you think of parallel computing with machine learning algorithms being applied to every click a person makes when logged onto a website, where the profile of that person changes with every click? Do you think of applying analysis techniques to data that were collected by someone else and stored in a public domain for anyone to analyze? In my mind, all of these scenarios are something a statistician should be prepared to tackle. My answer to that question seems to change annually, given the rapid changes in technology.

You might be wondering why this question surfaces in the spring year after year. It is because of my involvement as a statistics judge in local, regional, and international science fairs. Statistics is not a category by itself, but can be found in any of the science fair categories. When giving awards for the appropriate use and application of statistics, my fellow judges and I look at all the projects at a fair. Every year and at almost every science fair, regardless of whether it is middle school or high school, there is a discussion on whether a certain project qualifies as one that contains statistics (or is it computer science, or is it mathematics?). In the many years that I have been a statistics judge at science fairs, I've seen the answer to that question change and evolve as technology and computing power have changed, and the field of statistics has developed new methods that embrace the new technology.

I enjoy being a judge, especially the opportunity to speak to these developing scientists. If you are interested, there are probably science fairs in your area. I know that they are always looking for qualified judges. If there is not a special award for statistics, consider starting one in your area. I should let you know that the students are amazing: their interest, knowledge, and the problems they tackle are really quite impressive! Judging is rewarding. However, you have been forewarned: judging these projects will force you to think about the question "What is Statistics?"

As for Statistics Division business, WCQI will be over by the time this is published. For those of you who attended WCQI, I hope you visited our booth and attended

Message from the Chair
by Theresa I. Utlaut

Theresa I. Utlaut


Submission Guidelines

Mini-Paper
Interesting topics pertaining to the field of statistics; should be understandable by non-statisticians with some statistical knowledge. Length: 1,500-4,000 words.

Feature
Focus should be on a statistical concept; can either be of a practical nature or a topic that would be of interest to practitioners who apply statistics. Length: 1,000-3,000 words.

General Information
Authors should have a conceptual understanding of the topic and should be willing to answer questions relating to the article through the newsletter. Authors do not have to be members of the Statistics Division. Submissions may be made at any time to [email protected].

All articles will be reviewed. The editor reserves the discretionary right to determine which articles are published. Submissions should not be overly controversial. Confirmation of receipt will be provided within one week of receipt of the email. Authors will receive feedback within two months. Acceptance of articles does not imply any agreement that a given article will be published.

Vision
The ASQ Statistics Division promotes innovation and excellence in the application and evolution of statistics to improve quality and performance.

Mission
The ASQ Statistics Division supports members in fulfilling their professional needs and aspirations in the application of statistics and development of techniques to improve quality and performance.

Strategies
1. Address core educational needs of members
• Assess member needs
• Develop a "base-level knowledge of statistics" curriculum
• Promote statistical engineering
• Publish featured articles, special publications, and webinars

2. Build community and increase awareness by using diverse and effective communications

• Webinars
• Newsletters
• Body of Knowledge
• Website
• Blog
• Social Media (LinkedIn)
• Conference presentations (Fall Technical Conference, WCQI, etc.)
• Short courses
• Mailings

3. Foster leadership opportunities throughout our membership and recognize leaders

• Advertise leadership opportunities/positions
• Invitations to participate in upcoming activities
• Student grants and scholarships
• Awards (e.g., Youden, Nelson, Hunter, and Bisgaard)
• Recruit, retain, and advance members (e.g., Senior and Fellow status)

4. Establish and Leverage Alliances
• ASQ Sections and other Divisions
• Non-ASQ (e.g., ASA)
• CQE Certification
• Standards
• Outreach (professional and social)

Updated October 19, 2013

Disclaimer

The technical content of material published in the ASQ Statistics Division Newsletter may not have been refereed to the same extent as the rigorous refereeing that is undergone for publication in Technometrics or J.Q.T. The objective of this newsletter is to be a forum for new ideas and to be open to differing points of view. The editor will strive to review all articles and to ask other statistics professionals to provide reviews of all content of this newsletter. We encourage readers with differing points of view to write to the editor; they will be given an opportunity to present their views via a letter to the editor. The views expressed in material published in this newsletter represent the views of the author of the material, and may or may not represent the official views of the Statistics Division of ASQ.

Vision, Mission, and Strategies of the ASQ Statistics Division

The Statistics Division was formed in 1979 and today it consists of both statisticians and others who practice statistics as part of their profession. The division has a rich history, with many thought leaders in the field contributing their time to develop materials, serve as members of the leadership council, or both. Would you like to be a part of the Statistics Division's continuing history? Feel free to contact [email protected] for information or to see what volunteer opportunities are available. No statistical knowledge is required, but a passion for statistics is expected.


Message from the Chair (continued from page 1)

our reception. If you did visit the booth, you probably noticed the new publication titled "Statistical Roundtables." It is a collection of 90 Statistics Roundtable articles that were originally published in Quality Progress. There are more details about the book on the ASQ website. The editors, Christine Anderson-Cook and Lu Lu, did an excellent job selecting the articles to include in this publication, which was not an easy task.

I want to thank our members and the 2015 Statistics Division Leadership team for their work in achieving two PAR medals. The Division was awarded a Gold PAR medal for Performance and a Silver PAR medal for Innovation. The Performance award is based on leader engagement, member value creation, and member retention and growth. The Innovation medal is for the work that was done to transition the Statistics Digest to a more technical publication. A special thank you to Matthew Barsalou, the editor of the Digest, for his tireless effort in putting the Digest together (and my apologies for needing more than one reminder to send you my article); and Adam Pintar, 2015 Statistics Division Chair, who submitted the application for the PAR Innovation award.

I hope the Statistics Division is serving you well and that you are taking full advantage of the resources we offer such as the webinars, conferences, resources on our website, and, of course, the Statistics Digest that you are reading now.

Please feel free to send your comments and suggestions to me at any time during the year. Also, if you are interested in getting more involved in the division, please don't be shy. Send me a note and let me know what your interests are and I'll see what I can do.

Editor's Corner
by Matt Barsalou

Welcome to the June 2016 issue of Statistics Digest. Unfortunately, I have sad news to announce. I am very sorry to inform the readers that our SPC columnist Connie Borror passed away in April 2016. Connie was the 2016 ASQ Shewhart Medal winner, an ASQ Fellow, and a former editor-in-chief of the journal Quality Engineering. A tribute to Connie, written by her friend and colleague Christine M. Anderson-Cook, is featured in this issue.

This issue's Mini-Paper is "Profound Statistical Concepts: When Theory Collides with Reality" by Beverly Daniels, a Master Black Belt and Director of Operational Excellence at IDEXX Laboratories. The paper is based on her talk at ASQ's 2015 World Conference on Quality and Improvement. I thought it was one of the most useful talks I have attended, so I am pleased to have this opportunity to share the paper with you.

Our Feature is "Two Case Studies of Reducing Variability" by Rich Newman and is based on his 2015 ASQ Joint Technical Communities Conference presentation. Rich illustrates the basic concepts using illustrations in place of formulas and walks us from a non-work example that introduces the concept to a real-world example for further illustration.

We also have our columns on DOE by Bradley Jones and Douglas C. Montgomery, Joel Smith covering this issue's SPC column, Statistics for Quality Improvement by Gordon Clark, Jack B. ReVelle's Stats 101, Testing and Evaluation by Laura Freeman, and Mark Johnson's Standards InSide-Out.

Finally, I am pleased to announce that this publication has earned the Statistics Division an ASQ PAR Silver award for Innovation (see Figure 1). This award was presented for changing the Statistics Digest from a newsletter to a publication with more technical content.

Matt Barsalou

Figure 1: 2015 PAR Silver


MINI-PAPER
Profound Statistical Concepts: When Theory Collides with Reality
by Beverly Daniels, Director, Operational Excellence and Master Black Belt, IDEXX Laboratories

The late George Box is famous for reminding us that "all models are wrong; some are useful."1 The most common statistical models that are taught to most Quality practitioners are too frequently limited to theoretical distributional models and tests of statistical significance that are focused on the Normal distribution and tests of means. Certainly other distributional models and other tests of significance are given some attention. These include the Binomial and Poisson distributions and tests of significance such as the homogeneity of variance and non-parametric tests. Most statistical training readily accessible to Quality practitioners is naturally of a limited nature—there never seems to be enough time to cover all of the important aspects of statistical analysis that we will need in our careers. While some of the more practical approaches that are designed to handle non-normal and non-homogenous distributions (e.g., multi-vari, blocking and split plots, etc.) are taught and are available for study in the literature, their presence is too often overwhelmed by a perceived lack of time to teach and a lack of time to learn. Of course, there are many statisticians, instructors, and practitioners who avoid this. ASQ was founded by many of these distinguished individuals. However, as Six Sigma has grown, this knowledge and training has become diffused by a large number of trainers who primarily teach the common ideal models.2

The emergence of powerful user-friendly statistical software—while a welcome addition to our arsenal of tools—hasn't been able to break this time constraint either. In fact, the ability to perform a plethora of statistical tests at the literal "push of a button" has had a further unintended consequence of narrowing focus on some of the more popular "ideal" conditions and tests. It's not that these software packages can't perform less common analyses intended for messy real-world conditions; it's that the tests based on the more commonly taught theoretical ideals are so accessible that little thought is now required to push the button. Because of the perceived time constraint, too many people get trained primarily on how to use the software and not on how to perform practical statistical analyses; they don't know how to assess the process variation to design the correct experimental test structures. Improving our knowledge of the limitations of the common models and the more useful alternatives is the responsibility of both instructors and students.

Several times a year I hear from practitioners the following statements:

• I have some data; what statistical tests can I perform?
• My distribution isn't Normal. Now what do I do?
• My p value was less than .05; why didn't my fix work?

Certainly many of you also hear fingernails on the blackboard every time you hear these and similar statements. But there is a contingent of instructors who limit their time to the common models and an even larger contingent of Quality practitioners who are trained in or only remember these common models.

This gap is further compounded by the almost total lack of training in the difference between an enumerative study and an analytic study as defined by Deming.3

The commonly taught distributional models and statistical tests of statistical significance or confidence intervals are usually taught in the ideal context of identically distributed and homogenous distributions. When confronted with what Deming referred to as an Enumerative study, where we are trying to characterize a population through random sampling, these statistical approaches work very well. However, the quality practitioner is more often confronted with problems that need to be solved or prevented. These situations require what Deming referred to as an Analytical study.4 Analytical studies are focused on understanding causal mechanisms so that we can "make predictions about the future," i.e., solve and prevent problems. These are not simple or easy studies to design or analyze. They require a deep understanding of both statistical variation and science. When we limit ourselves to the common statistical theories of distributional models and tests of statistical significance on Analytical studies, we too often get the wrong answer.5

In this paper I discuss a more effective alternative for solving and preventing most real world Quality problems.

"In theory, there is no difference between theory and practice. But, in practice, there is."

Jan L.A. van de Snepscheut6

This is a discussion of how our common "ideal" models are getting in the way of improvements and what we can do about it. It is meant to be thought provoking. It requires thought to achieve a deep understanding of our profound statistical concepts and to be able to overcome some of our preconceived notions of how statistics can be applied to real-world problems.

Some of the common models that trip us up:

• The Normal Distribution Assumption
• Homogenous Distribution Assumption
• Independent & Identically Distributed Distributions

These models form the basis of the tests of statistical significance that are so commonly taught.

So what happens when our reality doesn't match the common model? We may assume there is something wrong with our process, we may manipulate the data somehow to match the common theory, or we may feel paralyzed. None of this is necessary and, worse, it can lead to false conclusions and ineffective solutions.

The Normal Distribution

"Normality is a myth; there never was, and never will be, a normal distribution."7
Robert Charles Geary

The Normal Distribution is a man-made construct; it is not a law of physics. Many processes naturally produce symmetrical "bell" shaped distributions of data, but many naturally do not. Some examples are coating thickness, fill volumes, defect rates, flatness, true position, and processes that involve tool wear or some other component that is consumed. Where is it credibly proven that processes are supposed to be Normally distributed?

The Normal Reaction
What happens when a data set fails a test for Normality? Some Quality practitioners will try to search for outliers and throw them out of the data set; some will try to transform the data; some will be stumped because they might have heard the fallacy that "there is nothing you can do if your process isn't Normal." For something that doesn't actually occur very often and really isn't needed for many tests of statistical significance, we spend an awful lot of time testing for it and agonizing over its absence.

Let's first take a brief look at some of the negative consequences of placing so much reliance on the Normality of our data.

Censoring "outliers": Outlier detection was intended for static data sets (enumerative studies), not for data streams (analytical studies). Most outlier tests rely on the Normal distribution as the distributional model and also use a 95% coverage limit. If the data sets are large enough, some data that fit the distribution but lie beyond the 95% limits will be flagged as outliers. These data points are not outliers in the sense that they don't belong; they are simply extreme values (a short simulation after the list below illustrates this). Beyond that issue, we must think about what an outlier test actually tells us. A data point that is flagged as an outlier to any theoretical distribution doesn't mean that the data point is invalid.8 It doesn't mean that the customer won't get that value. When engaged in problem solving, the outlier data might be our best opportunity for solving the problem.9 When looking to censor data from any data set, the only values we should remove are those that are validated as being:

• Impossible results
• Mis-recorded results or typos
• Results from an invalid measurement event
• Misread measurements
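To make the point about extreme-but-valid values concrete, here is a minimal simulation sketch in Python (entirely hypothetical data, not taken from any study in this paper). It draws a large sample from a well-behaved Normal process and applies a common z-score rule with roughly 95% coverage; a predictable fraction of perfectly legitimate values gets flagged as "outliers."

    import numpy as np

    rng = np.random.default_rng(1)                    # illustrative seed

    # A perfectly well-behaved (homogeneous, Normal) process stream.
    data = rng.normal(loc=50.0, scale=2.0, size=500)

    # A common "outlier" rule: flag anything outside roughly 95% coverage,
    # i.e. beyond about 1.96 standard deviations from the mean.
    z = (data - data.mean()) / data.std(ddof=1)
    flagged = data[np.abs(z) > 1.96]

    # With 500 points, roughly 5% (about 25 values) will be flagged even though
    # every one of them is a legitimate result from the process.
    print(f"{flagged.size} of {data.size} points flagged as 'outliers'")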

Transforming the data: Transformations can hide the data and divert us from understanding the data toward trying to make our data set fit some theoretical model so we can apply tests of statistical significance. While transformations can work to allow us to use certain tests of statistical significance, they require special skills to perform correctly. Moreover, if the process is not homogenous the answer will be incorrect; no transformational formula will save you from non-homogeneity. In today's world many processes are not homogenous, especially those that are creating defects. Since many problem solving efforts do not require tests of statistical significance10, attempting to transform our data may be a waste of time.

Which comes first, the data or the model? An incorrect choice of statistical model prior to understanding the data can lead to errors in data collection, analysis, and interpretation. It can lock you into a theoretical model that doesn't approximately describe your data and it can hide the truth in the data. If your data doesn't fit a model, it's the selection of the model that is incorrect; it's not the data (or the process) that are 'wrong'. For some problems, distributional model fitting is at best wasted effort for the Quality professional. At worst it could result in incorrect conclusions and generate distrust in the Quality sciences.

So what can we do? The late Ellis Ott was continually exhorting his students to understand the technical background of the problem, collect some data, draw some plots of the data, and think about what they had learned and what action they should take. Dr. Ott wanted to teach his students statistical thinking: plot your data and think about your data.11

Let's look at an example of a problem that involved seriously non-Normal data but was solved with statistical thinking rather than forced theory.

Example 1, Noisy Gears (Non-Normal Data with Unequal Variances)
Two gears engage at high rpms. Frequently the two gears are rotating at different speeds (revolutions per minute: rpm). During prototype testing a loud noise occurs when the gears engage. Observation shows that the gears are not meshing smoothly and are grinding against each other. There is no obvious cause for this delayed engagement.

The team's first step was to establish a measurement system for the delayed engagement or noise. The first decision involved the conditions of the test: at which delta rpm would the test be run? There was no quick way to assess what range of conditions could be experienced in the field, so the engineers determined the likely range of deltas, and a Measurement System Analysis (MSA) would test gears across the range to determine the delta that would provide the best information. There were two choices for the measurement of the noise itself: duration and amplitude. Amplitude is measured in decibels and is a logarithmic function. Amplitude measures only one aspect of the noise (its loudest point) and is an indirect measure of the force of the collision. Duration is a measure of how long it takes the gears to mesh, which is a function of the physical features that are prolonging meshing as well as the delta rpm at the moment of engagement of the gears. Duration often follows a non-normal distribution and is bounded at 0.

The measurement system analysis had a different structure than a typical one since this is a measurement of a function. Five gear sets were selected for the study and were randomly engaged twice at five different delta rpms. Both duration and amplitude were measured for each event (Figure 1).


There is more to the selection of the appropriate measure of the problem than ease of collection, analysis, or passing an MSA; the measurement must be a true representation of the failure. In this case duration provides a better measure of what is actually occurring with the gears—it matches the physics of what is happening. While the customer's initial perception of the quality of the vehicle will be affected by the noise, it is the potential for physical damage to the gears by prolonged grinding that poses the largest threat. Duration of the noise is also a measure of how long it takes the gears to mesh, which is a function of the delta rpm and the features that are inhibiting meshing. Additionally, the data indicate that the larger the delta between the two gear rpms, the longer (and louder) the noise is. When the MSA data are plotted on a Youden plot we see a clearer picture of the noise and gain even further insight into the causal mechanism (Figure 2). This clarity comes from graphing the actual data in context of the study design rather than parameter estimates from a common theoretical statistical analysis.

A second view of the data that combines both the delta rpm and the resulting duration shows the effect of test condition and the measurement repeatability in the same chart (Figure 3).

Figure 1: Response of noise with increasing difference in delta rpm between gears

Figure 2: Youden Plot of MSA results shows the bias at higher delta rpm values


Figure 3: Multi-Vari Chart of Noise MSA showing both the delta rpm and the repeated measures of noise duration

This plot allows us to dissect the variation seen in the previous chart. A substantial hint as to what is happening is seen when we look at the repeated measurements of each unit at each delta rpm across the range of delta rpm settings. It is important to note that the 5 units were completely randomized for the 1st and 2nd run across delta rpm settings (i.e., the lower delta rpm settings were not run sequentially before the higher delta rpm settings). The second engagement noise duration is longer than the first at high delta rpms. Why? This is the important question. This insight comes from graphing the actual data in context of the study design. In this case the results lead us to test under the worst case condition of the highest delta rpm. This will provide much better detection of the causal mechanism.
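For readers who want to reproduce this kind of view, the sketch below (Python with matplotlib, using fabricated numbers only, since the study data are not published here) arranges a hypothetical MSA of five gear sets, each engaged twice at five delta rpm settings, into a multi-vari style plot of noise duration versus delta rpm.

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(7)                      # illustrative seed
    delta_rpm = np.array([200, 400, 600, 800, 1000])    # hypothetical settings

    fig, ax = plt.subplots()
    for gear in range(1, 6):                            # five gear sets
        # Fabricated durations: longer and more variable at higher delta rpm,
        # with the second engagement slightly longer (mimicking damage).
        run1 = 0.2 + 0.002 * delta_rpm + rng.normal(0, 0.05, delta_rpm.size)
        run2 = run1 + 0.0005 * delta_rpm + rng.normal(0, 0.05, delta_rpm.size)
        ax.plot(delta_rpm, run1, "o-", color=f"C{gear - 1}", label=f"gear {gear}")
        ax.plot(delta_rpm, run2, "s--", color=f"C{gear - 1}")  # dashed = 2nd run

    ax.set_xlabel("delta rpm at engagement")
    ax.set_ylabel("noise duration (s)")
    ax.set_title("Repeated engagements per gear set across delta rpm")
    ax.legend()
    plt.show()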

Occasionally, the MSA will also provide insight into the causal mechanism, as with this study. Physical observation of the parts reveals that the parts are being damaged. The data indicate that at high delta rpm settings it takes longer to fully engage due to the physical damage. Looking at the damaged areas made apparent the features that could create the noise. A Design of Experiments (DoE) was planned with these 3 features of interest.

In the interim, a second prototype had been built independently of this investigation. It was tested and the noise duration was substantially less at the higher delta rpm settings (Figure 4). The 3 factors that were suspected based on the damage were all changed in the second prototype design, confirming that the original selection of factors was appropriate.

Figure 4: Performance of Prototype 1 (solid line) vs. Prototype 2 (dashed line)


The study design was a 2³ full factorial (3 factors at 2 levels). The sample size was 2 gears per condition. Why 2? There was no sample size equation used; not that these aren't useful at times, it's just that in this case the difference in performance between prototype 1 and 2 was so patently obvious at a sample size of 2 gears. Testing would occur at only the highest delta rpm level and each gear would be tested twice in a random test order. The test would be truncated at 5 seconds to prevent catastrophic damage to the gears (Table 1).
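A rough sketch of how such a design matrix might be laid out is shown below (Python; the factor names A, B, C and the randomization seed are illustrative assumptions, not the actual study factors).

    import itertools
    import random

    random.seed(42)  # illustrative seed only

    # 2^3 full factorial: three factors, each at a low (-1) and high (+1) level.
    conditions = list(itertools.product([-1, 1], repeat=3))

    # Two gears per condition, each gear engaged twice, at the highest delta rpm;
    # the run order is fully randomized and each test is truncated at 5 seconds.
    runs = [
        {"A": a, "B": b, "C": c, "gear": gear, "engagement": rep}
        for (a, b, c) in conditions
        for gear in (1, 2)
        for rep in (1, 2)
    ]
    random.shuffle(runs)

    for order, run in enumerate(runs[:8], start=1):   # show the first 8 runs
        print(order, run)
    print(f"{len(runs)} engagements total "
          f"({len(conditions)} conditions x 2 gears x 2 engagements)")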

This design will produce data that are non-normal, non-symmetrical, and bounded by zero; the two repeated measurements of each part are not independent (although there is some independence between the two parts in each condition and replication across the conditions); the testing is truncated at 5 seconds; and the variances of the two levels are not equal. While we could try to determine the correct statistical analysis, we could also try to plot the data and look at it first (Figure 5).

Table 1: Experimental Design Structure for Noisy Gears

Figure 5: The main effects plot for noise duration


This plot says it all. Factor B is the controlling factor for noise. As Deming pointed out many times, there is no statistical test of significance that will increase our belief in this result or change the action that we take next. With this knowledge the team was able to alter Factor B such that the noise and the time to mesh the gears were minimal and within the specified range.

In Analytical studies, experimental structure and replication provide confidence. Sophisticated statistical tests can help us tease out small effects but solid experimental structures that account for process variation may tease out small differences with much smaller sample sizes. Further, in my experience, industrial quality practitioners are usually looking for large differences.

Homogenous Process Streams
Most tests of statistical significance rely less on a distributional assumption and more on the homogeneity of the process. If the process under study is non-homogenous, statistical tests of significance will correctly detect the non-homogeneity but will not necessarily give us a practically useful answer. A homogenous process will produce parts or results that will appear to vary randomly without trends, shifts, or cycles. A homogenous process is one in which the factors that primarily determine the location of the data are the same factors that primarily determine the piece-to-piece variation. The largest component of variation is part to part or event to event, and the variation of the sample means will be a function of the sample size and the population standard deviation (σ/√n). Not all homogenous processes are Normal and not all Normally distributed processes are homogenous.
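A small simulation sketch (Python, with hypothetical values) illustrates why this matters: for a homogeneous process the standard deviation of sample means tracks σ/√n, while a lot-to-lot shift inflates it well beyond that.

    import numpy as np

    rng = np.random.default_rng(3)        # illustrative seed
    sigma, n, n_samples = 2.0, 5, 2000    # hypothetical process values

    # Homogeneous process: every item comes from one stable distribution.
    homogeneous = rng.normal(100.0, sigma, size=(n_samples, n))

    # Non-homogeneous process: a lot-to-lot shift moves each whole sample.
    shifts = rng.normal(0.0, 3.0, size=(n_samples, 1))
    non_homogeneous = rng.normal(100.0, sigma, size=(n_samples, n)) + shifts

    print("theoretical sigma/sqrt(n):      ", round(sigma / np.sqrt(n), 2))
    print("SD of sample means, homogeneous:", round(homogeneous.mean(axis=1).std(ddof=1), 2))
    print("SD of sample means, shifted:    ", round(non_homogeneous.mean(axis=1).std(ddof=1), 2))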

For enumerative studies we create homogeneity by random sampling. Enumerative studies are concerned with characterizing a static data set and the action taken will be on that data set. A prime example of this is acceptance sampling. An enumerative study is not intended to make predictions about the future performance of a process. The majority of the work that a quality professional engages in is to determine the causes of poor performance and take action to improve the future performance. This is the purpose of an analytical study. Since many process streams are not homogenous, the first step in an analytical study is to understand the non-homogeneity so we can design studies that won’t result in a misleading conclusion from common or misapplied statistical analysis.12

Example 2, Stacking Faults (Non-Homogenous Categorical Data)
Stacking faults are a particular type of defect in a silicon wafer used to manufacture semiconductor devices. An additional silicon layer is grown on a silicon wafer's surface to produce a very clean substrate for subsequent fabrication steps. The additional layer is an epitaxial growth. If there are contaminants or structural imperfections at the surface of the wafer prior to the growth of the epitaxial layer, the contaminants can result in a larger structural defect that propagates through the epitaxial layer. These defects are collectively referred to as stacking faults. If the fault is in the wrong location, the device may fail at subsequent testing steps or may pose a reliability risk. The stacking fault inspection involves microscopically checking the surface of the wafer at 5 locations on each of 3 wafers. The maximum average allowable count is 10 faults. Seven of the last 25 lots were rejected and scrapped for exceeding this limit.

An engineer was assigned to test a new cleaning process to reduce the contaminants on the wafer surface. The first test was run on one production lot using the proposed process. A randomly selected lot that was cleaned with the current process (produced during the same time as the experimental lot) was also tested as a control. All of the wafers in both lots were tested for stacking faults. The original analysis involved the commonly applied t-test on the number of observed stacking faults on each wafer from both lots.

Although the p value was less than 0.05, no one really believed the conclusion that the new process was better than the current process, as the results for the two lots were so similar (Figure 6). Before abandoning the new cleaning method, the Quality Engineer (QE) took a stab at the analysis.

The QE recognized that the data were not continuous, but categorical counts. The better theoretical choice for a data model was the Poisson distribution, and the QE redid the analysis using Poisson confidence intervals (Figure 7).
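The sketch below shows one way such intervals could be computed (Python with scipy; the fault totals and the eight-wafers-per-lot exposure are hypothetical stand-ins, since the actual counts are not reproduced here). It uses the standard exact, chi-square based interval for a Poisson count.

    from scipy.stats import chi2

    def poisson_rate_ci(total_count, exposure, conf=0.95):
        """Exact (chi-square based) confidence interval for a Poisson rate,
        given a total count observed over 'exposure' units (here, wafers)."""
        alpha = 1.0 - conf
        lower = 0.0 if total_count == 0 else chi2.ppf(alpha / 2, 2 * total_count) / 2
        upper = chi2.ppf(1 - alpha / 2, 2 * (total_count + 1)) / 2
        return lower / exposure, upper / exposure

    # Hypothetical totals: stacking faults summed over 8 wafers in each lot.
    for label, faults in (("current process", 74), ("new process", 61)):
        lo, hi = poisson_rate_ci(faults, exposure=8)
        print(f"{label}: mean {faults / 8:.1f} faults/wafer, 95% CI ({lo:.1f}, {hi:.1f})")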


This analysis resulted in a conclusion that the two processes were not statistically significantly different. Now we have two conflicting conclusions—which one is correct? In this case, it turned out that wasn't the relevant question. It's not about the model or the statistical analysis; it's about the structure of the experiment in relation to the variation of the process.

A quick control chart of the stacking fault counts for the 25 previous lots reveals that the process is not in statistical control; it is non-homogenous (Figure 8).
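For reference, the control limits for a u chart like the one in Figure 8 can be computed as in the sketch below (Python; the lot totals are simulated stand-ins, not the actual inspection data, and the subgroup size of 3 wafers follows the Figure 8 note).

    import numpy as np

    rng = np.random.default_rng(11)                 # illustrative seed

    # Simulated totals: stacking faults summed over the 3 inspected wafers in
    # each of 25 lots, with a lot-to-lot rate that is deliberately not constant.
    lot_totals = rng.poisson(lam=rng.uniform(9, 45, size=25))
    n = 3                                           # wafers inspected per lot

    u = lot_totals / n                              # average faults per wafer, per lot
    u_bar = lot_totals.sum() / (len(lot_totals) * n)
    ucl = u_bar + 3 * np.sqrt(u_bar / n)            # u-chart control limits
    lcl = max(u_bar - 3 * np.sqrt(u_bar / n), 0.0)

    print(f"u-bar = {u_bar:.2f}, LCL = {lcl:.2f}, UCL = {ucl:.2f}")
    print("lots signaling out of control:", list(np.where((u > ucl) | (u < lcl))[0] + 1))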

The current process did not even have a Poisson distribution because the probability of a fault is not constant from wafer lot to lot. A t-test or one-way ANOVA isn't appropriate because the data are counts, not continuous data, and the Poisson confidence intervals and any transformation won't be valid because the process isn't homogenous.

Why is the process non-homogenous? Let's consider the science for a moment. Silicon wafers are cut from ingots of silicon that are grown from a silicon seed. Each silicon seed is different and the growth conditions will vary slightly from ingot to ingot. Wafers are then sliced from the ingot to form a wafer lot. The cutting tool used to slice each ingot is subject to wear and will not cut as cleanly as it wears. Theoretically, the science tells us that imperfections in the silicon structure will be relatively homogenous within an ingot and different from ingot to ingot. The cutting process will also introduce contaminants and surface imperfections that are relatively homogenous within a wafer lot but different from wafer lot to lot. The number of stacking faults will be partially dependent on the initial contamination of the wafers and the structural imperfection of the ingot. Although it would have been better to fix the causes of the variation and substantially reduce the stacking fault rates, this performance was considered state of the art at the time and the ingot supplier was not inclined to take this effort on. So the semiconductor manufacturer was stuck with improving their cleaning process.

Figure 6: t-test with confidence intervals for the new cleaning process vs. the current process

Figure 7: Poisson confidence intervals for the new cleaning process vs. the current process

A requirement of a t-test or any other test of statistical significance is that the parts within each level are independent of each other. Although there were 8 wafers used for the new process and the current process, the results were not independent due to the homogeneity within the wafer lots. So regardless of the statistical test, the basic requirement of independence was violated. Additionally, it is possible that the first experiment chose a wafer lot that was highly contaminated compared to the control lot making the initial comparison invalid.

A better experiment would be one that accounts for the actual physics of the process. The new design is a “matched pair” block design13 described in Table 2.

The results of the experiment are plotted in Figure 9.

The best statistical test of significance may not be obvious, but here the graphical display of the appropriately structured experiment makes the statistical mathematics redundant. Within each pair the new cleaning method results in fewer stacking faults than with the current method. This improvement is replicated with all 3 pairs within an ingot and across multiple ingots with different silicon seeds, growth conditions, sawing conditions and cleaning events. The probability of this happening simply by chance is vanishingly small.
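One way to put a number on "vanishingly small" is a simple sign-test argument, sketched below in Python (the pair count of 15, i.e., 3 wafer pairs in each of 5 lots, follows the study description, but treat the calculation as illustrative only).

    from math import comb

    # Under "no difference," each matched pair is equally likely to favor either
    # cleaning method, so the count of pairs favoring the new method is
    # Binomial(n, 0.5). If every pair favors the new method:
    n_pairs = 15                       # 3 pairs per lot x 5 lots (illustrative)
    print(f"P(all {n_pairs} pairs favor the new method by chance) = {0.5 ** n_pairs:.1e}")

    # General sign-test tail probability: k or more of n pairs favoring the new method.
    def sign_test_p(k, n):
        return sum(comb(n, i) for i in range(k, n + 1)) * 0.5 ** n

    print(f"sign-test p-value for 15 of 15: {sign_test_p(15, 15):.1e}")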

An alternative graphical approach is to plot the difference between the two methods (Figure 10). This plot clearly shows that the amount of improvement using the new method is dependent on the underlying (uncleaned) fault rate. There is more improvement in raw counts when a wafer has more contamination and damage. The improvement in counts is naturally less when a wafer has few faults.

Figure 8: Control chart for stacking fault inspection. Note: Each data point is not a single count. It is the average count of 3 wafers. The chart shown is a u chart with a subgroup size of 3.


Table 2: Experimental Design Structure for Stacking Fault Cleaning Method Comparison

Figure 9: Results of the “matched pair” study showing the effect of the new cleaning method in direct comparison to the current method


Are there statistical tests of significance that would fit this situation? Of course there are, but look at the charts again—do we need them? What additional value would one of these tests provide? Would a precise p value change the decision? The confidence in our conclusion that the new method is better than the old method comes from replication and the structure of the study, not from a p value or confidence interval.

"…levels of significance furnish no measure of belief in a prediction. Probability has use; tests of statistical significance do not."

W. Edwards Deming14

The important decision is whether or not the new cleaning method is worth implementing. Taking a closer look at the improvement (Table 3), we can see that the two lots that would have been rejected with the current cleaning method were passing with the new method.

How can we use this data to predict the improvement we might expect to see in the future?

Table 3: Experimental results of new vs. current cleaning method

Figure 10: Method comparison plot of difference between the new and current cleaning method


There are several methods available to us. We can use the method comparison chart and a simple calculation to make a rough estimate of the improvement we expect to see when lots would otherwise have counts above 10. Looking at the method comparison chart (Figure 10) shows roughly two groups of data (below 10 and above 10) with two different average levels of improvement (based on the current method results). Our historical data show that recent failing lots have average counts between 11 and 20. The average improvement for the two lots that had counts above 10 using the current method is 6.7. The average improvement for the 3 lots with counts less than 11 is 4.3. This provides a rough estimate of the improvement. Using this information, we can predict the level of improvement we might have had with our historical data (Figure 11).

Five of the seven failing lots would have had acceptable stacking fault counts with the new method. Additionally, the overall stacking fault rate improves for every lot, so that the die yield loss and field reliability failures would also be expected to improve.

Can we provide a more precise estimate? Yes, but it is still an estimate no matter how precise we make it. Is there any added value to a more precise estimate? We have a fairly small data set, so more sophisticated approaches may not provide a better estimate. The nature of the data set again provides some serious challenges to a choice of statistical modelling: there are only 5 lots and the 3 values within a lot are not independent. While the current method results correlate to the new method results, they do not have the traditional 'dependent' relationship of the common linear regression model. Certainly there are techniques to deal with this, but will they provide more confidence in the answer? All estimates at this point will be wrong; additional statistical manipulation will only increase the precision of the estimate; it cannot increase the accuracy. An example of trying to be more precise with the data we have is to calculate the best-fit line formula with the current method results as the independent variable (X) and the new method results as the dependent variable (Y). In this example we use all of the individual wafer counts, with the paired wafers serving as the X:Y values (Figure 12). The resulting equation of the line is used to predict the improvement using the historical baseline (Figure 13). As can be seen, the change in the predicted improvement is trivial.
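As a sketch of the best-fit line approach just described (Python; the paired wafer counts below are fabricated for illustration, not the experimental data), one could fit the line and apply it to historical lot averages:

    import numpy as np

    # Hypothetical paired counts: current-method wafer (X) vs. its new-method
    # pair (Y) from the same ingot position. Not the actual experimental data.
    current = np.array([4, 6, 7, 9, 10, 12, 13, 15, 16, 18, 19, 20, 8, 11, 14])
    new = np.array([2, 3, 4, 5, 5, 6, 7, 8, 9, 11, 12, 13, 4, 6, 8])

    # Least-squares line with current-method counts as the independent variable.
    slope, intercept = np.polyfit(current, new, deg=1)
    print(f"new = {slope:.2f} * current + {intercept:.2f} (least-squares fit)")

    # Apply the line to a few historical lot averages in the failing 11-20 range.
    for hist_avg in (12, 15, 18, 20):
        print(f"historical average {hist_avg} -> predicted {slope * hist_avg + intercept:.1f}")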

Certainly there will be improvement, but it is not complete; there will still be stacking faults. Is it enough? We could run more experiments to gather more data to increase our confidence in the estimate. But will it be worth it? The answer is in the science, not the statistics. If the new cleaning method is relatively inexpensive, easy to implement, and poses no risk of adverse consequences, any improvement will be welcome. If the method is expensive and time consuming to implement and/or poses a high risk of adverse consequences, we will certainly want to increase our confidence in the amount of improvement we can expect to see. This may require us to better understand the cost of lot rejections, the effect on individual die yield, and device reliability before we can make a better-informed decision. This would require a second, more complex experiment and/or analysis of existing data. This particular cleaning method was inexpensive, easy to implement, and had no adverse consequences. The method was implemented without additional analyses or experiments and the expected improvement was achieved.

Figure 11: Average amount of improvement for historical data


As far as the laws of mathematics refer to reality, they are not certain, and as far as they are certain, they do not refer to reality.

Albert Einstein15

As quality practitioners it is well worth our time to learn about analytical studies and the study designs and tools that help us perform more effective studies. We need to learn how to ask appropriate questions regarding the variation of our processes and how to better analyze the results of our experiments. This requires us to broaden our knowledge beyond the common statistical models and to apply more statistical thinking. Remember, statistics without physics is gambling; science without statistical structure is psychics.

Figure 12: Bivariate fit of the new cleaning method vs. the current cleaning method from the matched pairs experiment

Figure 13: Expected improvement of the historical baseline using different estimates of the expected improvement


References
1. Box, George E. P.; Draper, Norman R., Empirical Model-Building and Response Surfaces, Wiley, 1987, p. 424.
2. Schwinn, David, "Teaching Statistics that Help not Hinder Management," Quality Digest, September 2012.
3. Deming, W. Edwards, "On Probability as a Basis for Action," The American Statistician, 1975, Vol. 29, No. 4, pp. 146–152.
4. Ibid.
5. Stauffer, Rip, "Render unto Enumerative Studies…," Quality Digest, July 2013.
6. Rosenberg, Doug; Stephens, Matthew, Use Case Driven Object Modeling with UML: Theory and Practice, Apress, 2007, p. xxvii.
7. Geary, R. C., "Testing for Normality," Biometrika, Vol. 34, pp. 209–242.
8. Wheeler, Donald, "Probability Models do not Generate Your Data," Quality Digest, March 2009.
9. Wheeler, Donald, "All Outliers are Evidence," Quality Digest, May 2009.
10. Deming, W. Edwards, "On Probability as a Basis for Action," The American Statistician, 1975, Vol. 29, No. 4, pp. 146–152.
11. Neubauer, Dean V., "Pl-Ott the Data! A retrospective look at the contributions of master statistician Ellis R. Ott," Quality Digest, May 2007.
12. Daniels, Beverly, "Overcoming Doubt and Disbelief," Six Sigma Forum Magazine, November 2012.
13. Moen, Ronald D.; Nolan, Thomas W.; Provost, Lloyd P., Quality Improvement through Planned Experimentation, 2nd Edition, McGraw-Hill, 1999, chapter 4.
14. Deming, W. Edwards, Foreword to Statistical Method from the Viewpoint of Quality Control by Walter A. Shewhart, Dover reprint, 1986.
15. Albert Einstein's address on "Geometry and Experience" at the Prussian Academy of Sciences in Berlin on January 27, 1921.

About the Author
Bev Daniels graduated from Michigan Technological University with a BSEE. She has had extensive experience in the last 30 years developing and providing Lean and Six Sigma training courses as well as implementing Lean and Six Sigma in a variety of manufacturing industries. She is currently the Master Black Belt and Director of Operational Excellence at IDEXX Laboratories, Inc. She has held various positions in quality engineering and management with Fairchild and National Semiconductor, Honda of America Manufacturing, General Electric Aircraft Engines, Warn Industries, and Intel.


IN REMEMBRANCE OF CONNIE M. BORROR, 1966–2016
by Christine M. Anderson-Cook
Los Alamos National Laboratory

The statistics and quality communities of the ASQ and ASA mourn the loss of a colleague, teacher, mentor, leader, and dear friend, Connie M. Borror. On April 10, 2016, Connie lost her brave battle with cancer before reaching her 50th birthday. Connie was deeply connected to Arizona State University as she earned her Ph.D. in Industrial Engineering there in 1998 and was a Professor in the Division of Mathematical and Natural Sciences at Arizona State University West from 2005 until her death.

Connie and I were born in the same year. It feels like we grew up professionally together, sharing successes and setbacks throughout our developing careers. I got to know Connie well when I spent a sabbatical in the Industrial Engineering department at Arizona State University (ASU). She was talented, inspiring, and never hesitated to jump in when a friend or colleague needed help. In professional organizations, she was an enthusiastic contributor who held numerous leadership positions within the American Society for Quality, the ASQ Chemical and Process Industries Division, the section on Quality and Productivity of the American Statistical Association, the Institute of Industrial Engineers, and the ASQ and ASA Fall Technical Conference. It feels like her death leaves a huge hole in the many lives that she so deeply influenced. I would like to take this opportunity to showcase some of the highlights of her all-too-brief career, and share some personal reflections.

Connie Borror September 2012


Connie was recognized for her many contributions with a number of awards. She was an elected fellow of the American Society for Quality and the American Statistical Association, and a fellow and chartered statistician for the Royal Statistical Society. Professionally, Connie was soaring at the time of her diagnosis. At ASU, she was named a Foundation Professor in June 2015, and was part of a team that received the 2015 President's Award for Innovation as part of the Vietnam Higher Engineering Education Alliance team. She was also recognized with numerous teaching awards throughout her career. In 2016, she was the first female recipient of the Shewhart Medal, a prestigious career award from the American Society for Quality for the individual "who demonstrated outstanding technical leadership in the field of modern quality control, especially through the development to its theory, principles, and techniques." I was delighted when ASQ arranged for a special early ceremony in February of this year to award this honor to her, so that she would have a moment to be celebrated by her friends and colleagues. It was a special moment she got to enjoy surrounded by friends and family in the midst of a very difficult battle against cancer.

Connie was a prolific researcher in diverse areas including quality control, design of experiments, response surface methodology, measurement systems, robust parameter design, and reliability. Her focus was always on solving real problems and making statistical methods accessible and understandable for practitioners. She authored more than 80 peer-reviewed papers in statistical, engineering, and quality journals, was the author of 2 books, and edited a substantial revision of The Certified Quality Engineer Handbook, 3rd edition. She taught extensively at ASU as well as in short courses in industry and government. Connie was an exceptional teacher with a talent for communicating difficult concepts to people with diverse backgrounds while also making it fun to learn. Her humility made her approachable and learning less intimidating.

In addition to her research and teaching contributions, she was always generous with service to the profession. Connie served as editor of Quality Engineering from 2011 to 2013 and on its editorial board from 1999 through 2011. She was also on the editorial boards of the Journal of the American Statistical Association, Journal of Quality Technology, Quality and Reliability Engineering International, Quality Technology and Quantitative Management, International Journal of Statistics and Management Systems, and Journal of Probability and Statistical Science. She was a mentor to numerous undergraduate and graduate students, as well as to many early- and mid-career statistics and quality professionals. She made sure that the 9 Ph.D. students that she co-advised excelled not only as students, but also as professionals long after graduation as their careers developed. For the lives that she impacted, she was a lasting force that continuously looked for opportunities to share and encourage growth.

For those of us who were privileged to know Connie well, we understand that her impact goes so much further than this list of her technical contributions and recognitions. She served as a role model for statisticians, engineers, and women. In my mind her greatest weakness—she sometimes had self-doubt and I think did not always remember what a special researcher, collaborator, teacher, and friend she was—was also one of her greatest strengths. She made everyone feel comfortable, which meant that she was always looking to help anyone who crossed her path. Her passion and enthusiasm for statistics and quality made learning and collaborating with her fun. The big smile that she wore so much of the time made her incredibly approachable, particularly for young professionals looking to get a start in their careers.

Her loss leaves a hole in the many lives of those she interacted with. The statistics community will mourn the contributions that she never had an opportunity to make. As her nephew John Bringer said, Connie liked to adopt strays. Sometimes it was a stray animal, and sometimes it was a stray person. She adopted anything or anyone that she felt needed to be loved. Her empathy for others—people and animals—meant that she could sense when someone needed some extra support. She lived her life with the goal of elevating others and making their lives better in any way that she could. As the Dean at ASU West said at Connie's celebration of life tribute, the only time that Connie would get angry was when she was fighting on someone's behalf. My hope for you, Connie, is that you realized how many lives you deeply touched, and how your friendship, contributions, and dedication made such a big difference to us. Connie, you were a dear friend and we will miss you!


COLUMN: Design of Experiments
by Bradley Jones, PhD, JMP Division of SAS

Why isn’t Design of Experiments in Routine Use?

This may seem like an unusual question, given that we are in the midst of a widespread resurgence in the use of statistical methodology. To quote John Tukey, "…it's a great time to be a statistician." However, we both have spent a lot of time working with many different types of organizations, and while we have seen tremendous growth in the application of design of experiments, in our view there are still opportunities that are being missed. Below we identify some of the most common excuses for not making more extensive use of design of experiments, give at least one explanation about what the underlying causes are, and in some cases suggest possible ways to address the concern.

1. Organizational resistance to change. Some organizations have been motoring along in the middle lane, having at least reasonable success, and see little reason to try or adopt new approaches to problem solving. They have successfully developed new products and processes using best-guess experimentation and OFATs (One Factor at a Time) for years. The management and technical staff may be completely unaware of design of experiments and other powerful, modern statistical techniques. Competitive pressures, possible new business opportunities, or a change in management philosophy (such as consideration of a Six Sigma strategy) may change this. More succinctly, pain causes change.

2. Lack of management commitment. It is important to have management support when trying new techniques. Management may be unfamiliar with design of experiments, and statistical techniques could be out of their comfort zone. They may also have had a previous unsuccessful experience using design of experiments, perhaps because of inadequate pre-experimental planning, or because inappropriate design and/or analysis techniques were used. Introduction to designed experiments and data-driven decision making in business schools could help here.

3. Lack of statistical expertise in engineering and scientific personnel. Most BS-level engineers and scientists take at most one course in statistics. The typical "engineering statistics" course is loaded with a lot of topics: probability, probability distributions, sampling, descriptive statistics, statistical tests and intervals, ANOVA, and then, maybe if there is time remaining, a quick introduction to factorial designs and control charts. During the 1990s, effort was devoted towards making these courses more relevant to practice and less of a "baby math-stat" course. This has been somewhat successful, but there is more to do on that front. The fact is that many engineers and scientists do not take a university statistics course that really addresses their needs once they are in practice. Another unhappy aspect of this is that these courses are all too often poorly taught. This is often a service course and assigned to someone without any real experience in actually using the techniques in real engineering or scientific problems. Six Sigma initiatives have brought some practical training to industry, and for organizations that are doing this it can go a long way towards solving this problem. But the quality and relevance of these courses varies widely.

4. Over-reliance on knowledge of underlying theory. This is an all-too-common problem; team leadership believes that the project can be addressed by relying on "first principles." Utilizing one's knowledge of the underlying theory is an integral part of the successful use of the scientific method, but it needs to be integrated into a well-thought-out approach to research and development that also makes use of sound experimental strategy. The first principles approach often leads to viewing experimentation as confirmation only, and testing comes too late in the development cycle to take advantage of the discovery and exploration aspects of good experimental strategy. Greater use of computer simulation models will have a beneficial effect.



Control Chart Formulas and Constants

Depending on the data type and subgroup size, there are several commonly-used control charts in practice and each is constructed differently. In the tables below you will find a quick “cheat sheet” for the various formulas as well as the unbiasing constants used in their construction. Table 1 shows the formulas used for calculating standard deviations when using continuous data and Table 2 explains the abbreviations used in the first table. Table 3 is for continuous data control charts and Table 4 contains the unbiasing constants used in the formulas. Table 5 is for attribute data control charts.

COLUMN: Statistical Process Control
by Joel Smith, Minitab, Inc.


The tables in this article are copyright © Joel Smith but may be used without seeking permission. Republication and distribution are allowed, provided that the following credit line is included: © 2016, J. Smith. Originally published by the ASQ Statistics Division in Statistics Digest, June 2016.

Table 1: Formulas for continuous data standard deviations

Table 2: Abbreviations for continuous data standard deviations


Table 3: Formulas for continuous data centerlines and control limits

Table 4: Control chart unbiasing constants

Table 5: Formulas for attribute data centerlines and control limits
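As a quick illustration of how the formulas and unbiasing constants in these tables are applied, here is a minimal sketch (not from the article) that computes Xbar-R centerlines and control limits for subgroups of size 5, using the standard published constants for that subgroup size; the data are simulated for the example.

```python
import numpy as np

# Standard control chart constants for subgroup size n = 5
# (textbook values of the kind tabulated in the tables above).
A2, D3, D4 = 0.577, 0.0, 2.114

def xbar_r_limits(data):
    """Compute Xbar-R chart centerlines and control limits.

    `data` is a 2-D array: one row per subgroup of 5 consecutive measurements.
    """
    data = np.asarray(data, dtype=float)
    xbars = data.mean(axis=1)                        # subgroup means
    ranges = data.max(axis=1) - data.min(axis=1)     # subgroup ranges
    xbarbar, rbar = xbars.mean(), ranges.mean()
    return {
        "Xbar chart": (xbarbar - A2 * rbar, xbarbar, xbarbar + A2 * rbar),
        "R chart": (D3 * rbar, rbar, D4 * rbar),
    }

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    subgroups = rng.normal(loc=10.0, scale=0.5, size=(25, 5))  # 25 subgroups of 5
    for chart, (lcl, cl, ucl) in xbar_r_limits(subgroups).items():
        print(f"{chart}: LCL={lcl:.3f}  CL={cl:.3f}  UCL={ucl:.3f}")
```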


Reducing Patient Wait Time in Hospital Emergency Departments
Long waiting times and lengths of stay at hospital emergency departments are an important public health problem. This article describes an approach to improve patient flow in an Emergency Department (ED) by developing a simulation model and optimizing its predictions with respect to controllable variables and constraints. The approach described was applied at the Saint Camille hospital in Paris. The hospital has about 300 beds, and its ED operates 24 hours per day and serves more than 60,000 patients per year.

Long wait times are an increasing problem in the United States, and visits to hospital EDs have been increasing. From 1999 to 2009, visits increased by 32% to 136 million annual visits (Hing and Bhuiya, 2012), an annual rate of 2.8%. For some hospitals, this increase has resulted in crowding and longer wait times to see a provider. Between 2003 and 2009, the mean wait time to be examined by a provider increased by 25% to 58.1 minutes. However, the distributions of wait times are highly skewed since more serious conditions are treated more quickly. The median wait time increased by 22% to 33 minutes.

COLUMN: Statistics for Quality Improvement
by Gordon Clark, PhD, Professor Emeritus of the Department of Integrated Systems at The Ohio State University and Principal Consultant at Clark-Solutions, Inc.




The National Academy of Engineering and the Institute of Medicine prepared a report presenting the importance of systems engineering tools in improving healthcare processes (Reid, Compton, Grossman et al., 2005). They emphasized the use of simulation. A discrete-event simulation of patient flow through an ED represents the ED as it evolves over time. The simulation's state is stochastic since processes such as patient arrival times, patient severity, and treatment times are stochastic and represented by random variables. Thus one must replicate the simulation model to estimate performance measures, such as the average waiting time and the histogram of waiting times, for a specified set of ED resources such as number of beds, doctor availability, and nurse availability.
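The Saint Camille model described below was built in Arena; purely as an illustration of the replication idea, here is a minimal SimPy sketch of a toy ED queue in Python. The arrival rate, treatment time, and staffing are made-up values, and each replication produces one estimate of the mean door-to-doctor wait.

```python
import random
import statistics
import simpy

def patient(env, doctors, waits):
    """A patient arrives, waits for a doctor, is treated, and leaves."""
    arrival = env.now
    with doctors.request() as req:
        yield req                                    # wait for a free doctor
        waits.append(env.now - arrival)              # record door-to-doctor wait
        yield env.timeout(random.expovariate(1 / 20.0))  # ~20 min treatment

def arrivals(env, doctors, waits):
    """Generate patients with exponential interarrival times (~1 per 10 min)."""
    while True:
        yield env.timeout(random.expovariate(1 / 10.0))
        env.process(patient(env, doctors, waits))

def replicate(n_reps=30, n_doctors=3, horizon=24 * 60):
    """Run independent replications and summarize the mean waiting time."""
    means = []
    for rep in range(n_reps):
        random.seed(rep)
        env = simpy.Environment()
        doctors = simpy.Resource(env, capacity=n_doctors)
        waits = []
        env.process(arrivals(env, doctors, waits))
        env.run(until=horizon)
        means.append(statistics.mean(waits))
    return statistics.mean(means), statistics.stdev(means)

if __name__ == "__main__":
    avg, sd = replicate()
    print(f"Mean wait across replications: {avg:.1f} min (sd {sd:.1f})")
```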

Simulation Model Features
The simulation study of the Saint Camille ED is an excellent example since it uses a realistic model and it uses the Arena simulation package's OptQuest for simulation optimization (Ghanes, Wargon, Jouini et al., 2015). In addition to Arena, OptQuest is available for FlexSim, Simio, Simul8 and ProModel as well as other commercially available simulation software (OptTek, 2016). OptQuest combines metaheuristic procedures, including Tabu Search, Neural Networks, Scatter Search, and Linear/Integer Programming, into a single composite method. Kleijnen and Wan (2007) compared OptQuest with two other methods to optimize the output of an inventory simulation and found it to give the best result.

The objective of the study at Saint Camille was to determine how much the current staffing budget should be increased to alleviate ED congestion. One Key Performance Indicator (KPI) for the study was the average length of patient stay (LOS). It is also important to explicitly consider the wait time for urgent-care cases to see a doctor. Let DTD be the average door-to-doctor time for urgent-care patients. The triage nurse categorizes patients into five categories, where ES1 is the most severe and ES5 patients are the least severe. Urgent-care patients are ES1, ES2 and ES3 patients.

Figure 1 presents the Patient Flow Diagram, which is a simpler version of the Conceptual Model given by Ghanes, Wargon, Jouini et al. (2015). The Conceptual Model also shows nine places where patient wait times occur, such as waiting for a doctor, a nurse, or a stretcher.

Model Validation
The first step in their model validation was to have experts review each step in the Conceptual Model. The model was revised many times to obtain the experts' agreement that it was an accurate representation of the ED.

The next step was to compare the length of stay (LOS) for the real system with the simulation values. They simulated 11 weeks of ED operation (7,604 patients) and compared them with the LOS values of actual operations (37,986 patients). Plots of the LOS distributions are very similar, particularly after 200 minutes in the ED.

Staffing Level Optimization
The overall objective was to allocate the staffing budget to improve ED performance. They wanted to use the simulation results to address two questions:

• How much should they increase the current staffing budget?
• How should the additional budget be allocated?

Let the control variable Xi,j represent the amount of resource i used during shift j. Five categories of ED staff were considered. There were two categories of physicians. Junior physicians could only treat ES3, ES4 and ES5 patients. Senior physicians could treat all patient categories. The five categories of hospital staff were Senior, Junior, Nurse, Triage Nurse and Stretcher Bearer. Let i denote the staff category. The day shift was from 9:30 am to 6:30 pm, and the night shift was from 6:30 pm to 9:30 am. Let Ci,j be the average salary per person for resource i during shift j.


Figure 1: Patient Flow Diagram


The optimization problem is:
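In words: choose the staffing levels Xi,j that minimize the average length of stay, subject to the budget limit and the door-to-doctor constraint. One plausible algebraic form, reconstructed here from the definitions above rather than copied from the original article, is

```latex
\min_{X}\; \overline{\mathrm{LOS}}(X)
\quad \text{subject to} \quad
\sum_{i}\sum_{j} C_{i,j}\, X_{i,j} \;\le\; (1+b)\, B_{0},
\qquad
\overline{\mathrm{DTD}}(X) \;\le\; L,
\qquad
X_{i,j} \in \{0, 1, 2, \dots\},
```

where B0 is the current staffing budget, b is the fractional budget increase being considered, and L is the upper limit on the average door-to-doctor time.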

Figure 2 presents the simulation optimization results for seven different constraints on the overall staffing budget and five values of L, the upper limit on DTD. A value of L of 57 minutes is equivalent to no constraint on DTD. The absence of values in Figure 2 for particular combinations of L and staffing budget increase indicates that no feasible solution exists for those constraints. Note that the improvement in the average length of stay is significant until the budget increases by 20%. Based on this result, the managers at Saint Camille decided to increase the staffing budget by 10% to achieve a reduction in LOS of 33%.

In addition to estimating the level of improvement as a function of budget increase, the simulation results identified the staff changes that would be most effective.

Staff Changes in the 10% Budget Increase

• Maximum DTD of 57 minutes
  o Add one senior physician on night shift for ES1, ES2 and ES3 patients

• Maximum DTD of 50 minutes
  o Add one senior physician on night shift for ES1, ES2 and ES3 patients
  o Delete one senior physician on night shift for ES4 and ES5 patients
  o Add one junior physician on night shift for ES3, ES4 and ES5 patients
  o Add one triage nurse on the day shift

Figure 2: Simulation Optimization Results


The above results were surprising to managers, since the emphasis is on adding doctors rather than nurses. This tendency was apparent for other solutions at different budget increase levels. Also, the only nurse added in the above results was a triage nurse.

Conclusions
Construction of a detailed simulation model like the one described in this article does require a substantial effort. In addition to constructing the model, data must be collected to estimate service time distributions and the arrival patterns of patients. Using a simulation avoided the cost, interference, and disruption that designed experiments on the actual system would have caused.

This application of simulation also included the use of optimization software to identify preferred solutions. This software allows the use of constraints, such as an upper limit on the average time for an urgent-care patient to see a doctor, and it simplified the identification of preferred solutions.

References
1. K. Ghanes, M. Wargon, Q. Jouini et al. (2015). "Simulation-based Optimization of Staffing Levels in an Emergency Department," Simulation: Transactions of the Society for Modeling and Simulation International, 9(10), 942–953.
2. E. Hing, F. Bhuiya (2012). "Wait Time for Treatment in Hospital Emergency Departments: 2009," National Center for Health Statistics Data Brief, No. 102, August 2012.
3. J.P.C. Kleijnen, W. Wan (2007). "Optimization of Simulated Systems: OptQuest and Alternatives," Simulation Modelling Practice and Theory, 15(3), 354–362.
4. http://opttek.com/ (2016).
5. P.P. Reid, W.D. Compton, J.H. Grossman et al. (2005). Building a Better Delivery System: A New Engineering/Health Care Partnership, National Academies Press, Washington, DC.

COLUMN: Stats 101
by Jack B. ReVelle, PhD, Consulting Statistician at ReVelle Solutions, LLC

Reprinted with permission from Quality Press © 2004 ASQ; www.asq.org. No further distribution allowed without permission.

Orthogonal Array

If you or your organization is not already using Robust Design and/or Design of Experiments (DoE), and if you don't expect to do so in the near future, then you can pass on this column. You're missing a good bet, but that's your choice. For those of you who are already involved or plan on becoming involved in the use of Robust Design and/or DoE, read on.

In the course of designing a simple experiment, let's suppose your team has three factors which need to be studied at two levels each. Using the common practice of Factorial Design, the number of experimental runs needed for a Full Factorial Design is equal to the number of factor levels (in this case = 2) raised to the power of the number of factors (in this case = 3). Two raised to the third power, or 2³ = 8 experimental runs. The layout of the eight runs is as follows:

A Full Factorial Design is used when an experimenter needs to study both main effects (the results of just the factors) and interaction effects (the results of combining two or more factors). When an experimenter needs to study only the main effects or a combination of the main effects and only some of the interactions, then a Full Factorial Design is not required and a Fractional Factorial Design is selected.

In the preceding three-factor, two-level example, any four runs would constitute a one-half Fractional Factorial, e.g., Runs No. 1, 2, 3 & 4 or 3, 4, 5 & 6 or 1, 4, 6 & 7. These are all one-half Fractional Factorials because they have half as many runs as the Full Factorial from which they were drawn. In the third set of runs, we have a Fractional Factorial with some special properties and so this is referred to as an Orthogonal Array. When this set is studied, it appears as follows:


Special Property No. 3: To qualify as an Orthogonal Array, a Fractional Factorial must be the smallest one (i.e., the one with the least number of runs) for a given number of factors and factor levels and which possesses the first two special properties.

These Orthogonal Arrays are important for use by experimenters because they permit studies to be conducted without having to pay for more runs than are actually necessary. In the foregoing case, if there were no need to study the three 2-way interactions (A × B, A × C & B × C) or the one 3-way interaction (A × B × C), then it doesn't make sense to use a Full Factorial when a one-half Fractional Factorial can provide all the needed information.
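As a small numerical illustration (not the article's table, whose run numbering is not reproduced here), the following sketch generates the 2³ full factorial in coded units and confirms the balance that makes such a half fraction orthogonal: each factor level, and each pair of level combinations, appears equally often. The classic half fraction defined by C = A×B is used.

```python
from itertools import product
import numpy as np

# Full 2^3 factorial in coded units (-1 = low, +1 = high): 8 runs x 3 factors (A, B, C)
full = np.array(list(product([-1, 1], repeat=3)))

# Classic half fraction: keep runs where C equals A*B (defining relation I = ABC)
half = full[full[:, 2] == full[:, 0] * full[:, 1]]   # 4 runs

print("Half fraction (A, B, C):")
print(half)

# Balance check: each factor appears at each level equally often,
# and every pair of factors sees all four level combinations.
for i in range(3):
    assert np.sum(half[:, i] == 1) == np.sum(half[:, i] == -1)
for i in range(3):
    for j in range(i + 1, 3):
        combos = {tuple(row) for row in half[:, [i, j]]}
        assert len(combos) == 4   # all (+/-, +/-) combinations present
print("The 4-run half fraction is balanced (orthogonal).")
```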

In this case, the Orthogonal Array design costs half as much and takes half as long to conduct because only half the runs are necessary. In cases where an experimenter is studying more factors and factor levels than in this example, it is quite common to use an Orthogonal Array which is one-sixteenth of its corresponding Full Factorial. There is clearly a major savings in cost and time to be realized.

Last, but certainly not least, there is still another important advantage in using an Orthogonal Array. When a Design of Experiments (DoE) is performed using an Orthogonal Array, the results of the experiment can identify a combination of factors and factor levels which yield the best results (in terms of whatever Performance Measures/Metrics are used) even if that combination was not one contained in the original Orthogonal Array.

For example, in the foregoing case the DoE might identify Factors A, B & C all at level 2 as the optimal design combination even though that combination was not part of the experiment. Recall that it was a part of the original Full Factorial, but not a part of the subsequent Fractional Factorial (Orthogonal Array).

Value of Continuous Metrics in Testing

Probability-based requirements are common for Department of Defense (DoD) weapon systems. Probability-based metrics such as probability-of-detection or probability-of-hit provide meaningful and easy-to-interpret test outcomes. However, they are information-poor metrics that are extremely expensive to test. The reason is that there are only two possible outcomes of any test: success and failure. These binary outcomes provide no information on the distance from the line that divides success and failure for each outcome that occurred. For example, consider an aircraft searching for a submarine. The available fuel limits the aircraft's search time. A successful mission occurs if the aircraft finds the submarine before it runs out of fuel. Clearly, in the successful cases the time to find the submarine contains valuable information in addition to the mission pass/fail response, which has been used historically to assess a probability-based requirement.

Quality control and test engineers face similar issues in estimating the probability that a part measurement meets specifications. Hamada (2002) provides a great discussion on the advantages of continuous measurements in estimating conformance probabilities. In the paper, Hamada shows a 40% to 95% reduction in sample size requirements, depending on the actual conformance probability, for achieving similar lower confidence bounds. Similarly, Cohen (1983) shows that using the dichotomous (pass/fail) variable when a continuous variable is more appropriate results in a statistical power equivalent to discarding 38% to 60% of your data!

COLUMN: Testing and Evaluation
by Laura Freeman, PhD, Assistant Director of the Operational Evaluation Division and Test Science Task Leader at the Institute for Defense Analyses

with Dr. Rebecca Dickinson (also at IDA)



Table 1 shows common examples of information loss in defense testing based on a dichotomization of a probability based requirement.

So how is it possible to achieve such large reductions in the sample size simply by converting to a continuous metric? The trick is in the estimation of the probability density function and interpreting quantiles of that distribution for the probability requirements. This requires slightly more complex mathematics than pass/fail calculations, but for such dramatic reductions in the required sample size, the extra complexity is worthwhile. Data with a higher scale (continuous) can always be converted to lower scale data (binary), but not vice versa. Figure 1 illustrates the methodology for a detection time calculation for the chemical agent detector.

The far left panel shows a histogram of the detection times, the middle panel shows the normal distribution fit of the detection times, and the far right panel shows the probability that the continuous distribution is less than 60 seconds. Note, for simplicity we assumed a normal distribution; other distributions might improve the fit and therefore provide better probability calculations.
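The mechanics behind Figure 1 can be mimicked in a few lines. The sketch below uses made-up detection times, not the article's data, and simply contrasts the pass/fail proportion with the probability read off a fitted normal distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical detection times (seconds) for one agent/concentration cell
times = rng.normal(loc=45.0, scale=12.0, size=15)

threshold = 60.0  # requirement: detect within 60 seconds

# Pass/fail view: each trial only tells us whether the time beat 60 s
p_hat_binary = np.mean(times < threshold)

# Continuous view: fit a normal distribution and read off P(T < 60 s)
mu, sigma = stats.norm.fit(times)
p_hat_continuous = stats.norm.cdf(threshold, loc=mu, scale=sigma)

print(f"Pass/fail estimate of P(detect < 60 s):   {p_hat_binary:.3f}")
print(f"Normal-fit estimate of P(detect < 60 s): {p_hat_continuous:.3f} "
      f"(mu = {mu:.1f} s, sigma = {sigma:.1f} s)")
```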

Illustrative Example: Chemical Agent Detector
To illustrate exactly how much information is lost by only considering a pass/fail variable, consider an illustrative example based on the chemical agent detector requirement. A chemical agent detector must detect a number of agents at varying concentrations and under varying conditions (e.g., temperature and humidity range). Table 2 shows a notional test design matrix for characterizing detection probability. Statistical power analysis was used to determine the number of replications in each bin of the notional test design to detect a 10% change in probability of detection between conditions (confidence = 90% and power = 80%). For each agent type, the number of replications per bin can be reduced from 14 to 5 by converting to the time-based metric; a 65% reduction in test size.

The implications are even more dramatic when we compare the analysis of the data with the two different measures. Figure 2 shows detection time data for an individual agent type. Visually, it is clear that there is a lot of information in the continuous data.


Table 1: Examples of information loss

Figure 1: Methodology for a detection time calculation


For example, a clear linear trend is present between the concentration level and the detection time. Additionally, it looks likely that detections will occur below the 60 second requirement at the required concentration (0.5 mg/m3).

The table below compares predictions from a logistic regression analysis to a time-to-detect regression analysis at the required concentration of 0.5 mg/m3. The logistic regression analysis directly provides an estimate of the detection probability. On the other hand, the regression analysis estimate is based on a quantile prediction, and the confidence intervals are calculated using propagation of uncertainty through the multivariate delta method.

The most notable result is the quality of information we have on the detection probability. The pass/fail logistic analysis has an interval that is approximately 300% larger than the time-based analysis! It is also interesting to notice that there is a practical difference in the probability of detection. Using the time-based analysis, we clearly meet the requirement of 85% probability of detection with statistical confidence (see Table 3). However, using the pass/fail analysis the result is unclear. Re-examining the strength of the linear relationship in the above figure provides some insights into this difference.

It is worth reiterating in this example that data with a higher scale can always be converted to lower scale data, but not vice versa. Fortunately, in this case the detection times were recorded, which allowed us to compare the two analyses. In previous tests of the same system, only the pass/fail data were collected, resulting in missed opportunities to better understand the performance of the detector.

Table 2: Notional test design matrix

Figure 2: Detection time data

Page 31: STATISTICS DIGEST - ASQasq.org/statistics/2016/06/statistics-digest-june-2016.pdf · 2016-06-28 · by non-statisticians with some statistical knowledge. Length: 1,500-4,000 words.

asqstatdiv.org ASQ Statistics DIVISION NEWSLETTER Vol. 35, No. 2, 2016 31

Testing and Evaluation

Challenges and Potential Solutions
• Challenge 1: Measurement Error—One common reason engineers cite for using a pass/fail metric is that measurement error on the continuous metric is large. However, even when measurement error is large, it is bound to provide more information than a simple discretization of the data.

• Challenge 2: Accounting for non-detects—This methodology requires a one-to-one translation between the continuous metric and the probability. In the chemical agent detector test, the test had to be run longer to capture times beyond the threshold, and even then certain agent concentration levels were never detected. In these cases advanced statistical methods such as censored data analysis or mixture distributions provide potential solutions.

• Challenge 3: High fidelity instrumentation—Sometimes collecting continuous data is not as easy as breaking out a stop watch and can add significant costs to the test. For example, measuring miss distance on live weapon tests often requires high-speed cameras and carefully controlled conditions to ensure the results are within the camera's field of view. One must weigh the costs and benefits of gathering the continuous data against the potential test cost increases.

• Challenge 4: Pass/fail may be a function of multiple (possibly correlated) continuous variables—This challenge may be solved by multivariate statistical methods, but the methods would need to be unique to each specific situation.

The ultimate solution is to think continuously!

The whole process becomes much less complex and even more cost efficient if requirements are converted into continuous measurements. For example, if we can recast requirements in terms of a mean or median detection time with an associated variance, testing will be more efficient. This is also not without its own challenge of understanding the meaning of the continuous metrics. However, the potential cost saving is large. Figure 3 shows the cost inflation to detect either a 10 or 20 percent change in a probability metric versus detecting signal-to-noise ratios from 0.5 to 2.0 for continuous metrics (80% power for both).
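The kind of comparison behind Figure 3 is easy to sketch. The calculation below is not the article's; the 0.50 baseline probability and the 0.05 significance level are made-up illustration inputs, and it simply contrasts the runs needed to detect a 10-point change in a probability with those needed to detect a mean shift of one standard deviation on a continuous response.

```python
from statsmodels.stats.power import NormalIndPower, TTestIndPower
from statsmodels.stats.proportion import proportion_effectsize

alpha, power = 0.05, 0.80   # illustration values; the article quotes 80% power

# Binary metric: runs per condition to detect a 10-point change in probability
# (here 0.50 -> 0.60, a made-up baseline)
h = proportion_effectsize(0.60, 0.50)
n_binary = NormalIndPower().solve_power(effect_size=h, alpha=alpha, power=power)

# Continuous metric: runs per condition to detect a mean shift of one
# standard deviation (signal-to-noise ratio = 1.0) with a two-sample t-test
n_continuous = TTestIndPower().solve_power(effect_size=1.0, alpha=alpha, power=power)

print(f"Binary response:     ~{n_binary:.0f} runs per condition")
print(f"Continuous response: ~{n_continuous:.0f} runs per condition")
```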

References
Hamada, M. (2002). "The Advantages of Continuous Measurements over Pass/Fail Data," Quality Engineering, 15(2), 253–258.
Cohen, J. (1983). "The Cost of Dichotomization," Applied Psychological Measurement, 7(3), 249–253.

Table 3: Probability of detection

Figure 3: Signal to noise ratio for continuous response


COLUMN: Standards InSide-Out
by Mark Johnson, PhD, University of Central Florida, Standards Representative for the Statistics Division

Cochran’s Test for Homogeneity of Variances

This newsletter's contribution will be a bit more specific and technical than the previous ones. The topic du jour concerns Cochran's test for homogeneity of variances, which arises in four of the approximately 100 published ISO TC 69 statistical standards:

ISO 16269-4, Statistical interpretation of data — Part 4: Detection and treatment of outliers;
ISO 13528, Statistical methods for use in proficiency testing by interlaboratory comparison;
ISO 5725-2, Accuracy (trueness and precision) of measurement methods and results — Part 2: Basic method for the determination of repeatability and reproducibility of a standard measurement method;
ISO 5725-5, Accuracy (trueness and precision) of measurement methods and results — Part 5: Alternative methods for the determination of the precision of a standard measurement method.

An overview of the various parts of the ISO 5725 series is provided in ISO 5725-1, Accuracy (trueness and precision) of measurement methods and results — Part 1: General principles and definitions. In these standards, Cochran's test is recommended for use in identifying unusually large variances, typically in the context of recognizing an underperforming laboratory. Here I will argue that although Cochran's test is employed in these standards, standards are not infallible; when shortcomings are recognized, a standard may need revision, and to influence that revision the consensus process of standardization must take place.

First, some background. A homogeneity of variance test concerns a population of k groups, with the null hypothesis of interest being H0: σ1² = σ2² = … = σk² versus the alternative hypothesis HA: σi² ≠ σj² for some i ≠ j. The variances are associated with a specific measured characteristic from the items in each group. Assume a random sample of size ni is available for each group and compute the sample variance within each group, si² = Σj (xij − x̄i)² / (ni − 1).

The test statistic associated with Cochran's test is

C = max(s1², …, sk²) / (s1² + s2² + … + sk²),

that is, the largest of the k sample variances divided by their sum. Large values of C suggest unequal variances (one relatively large sample variance will do the trick), while values close to 1/k favor the null hypothesis. For equal-sized groups, the null distribution is functionally related to the F distribution, while for unequal-size groups, special tables are available (Pearson and Hartley, 1970).

So what is my concern with Cochran's test? It has been well known for over thirty years that Cochran's test, along with the well-known Bartlett's test, is notoriously non-robust to departures from normality (Conover et al., Technometrics, 1981). For example, for sample sizes of (10, 10, 10, 10), bloated significance levels of 0.23 (versus the advertised 0.05) for a double exponential distribution parent and 0.48 for a squared normal distribution parent have been noted. In contrast, the recommended Brown-Forsythe test (Lev1:med in the Technometrics paper) preserved the significance level and had superior power among those tests holding the significance level close to the nominal 5%. The Brown-Forsythe test is intuitively appealing as it consists of replacing each observation by its absolute difference from its group median and then testing for equality of means in the groups using these values (a test that is robust!).
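Both tests are easy to try on your own data. In the sketch below (simulated groups, with one deliberately more variable), Cochran's C is computed directly, although its critical value still has to come from tables such as Pearson and Hartley; SciPy's levene with center='median' is the Brown-Forsythe variant described above.

```python
import numpy as np
from scipy import stats

def cochran_c(*groups):
    """Cochran's C: largest sample variance divided by the sum of all variances."""
    variances = np.array([np.var(g, ddof=1) for g in groups])
    return variances.max() / variances.sum()

rng = np.random.default_rng(42)
# Four groups of size 10; one group has an inflated standard deviation
groups = [rng.normal(0, 1, 10) for _ in range(3)] + [rng.normal(0, 3, 10)]

print(f"Cochran's C = {cochran_c(*groups):.3f}  (values near 1/k = 0.25 favor H0)")

# Brown-Forsythe test: Levene's test using group medians as the center
stat, pvalue = stats.levene(*groups, center='median')
print(f"Brown-Forsythe: W = {stat:.2f}, p = {pvalue:.4f}")
```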



Getting back to standards, consider ISO 16269-4 (the outlier standard). This standard notes the importance of testing for outlying variances in the context of interlaboratory experiments. As motivation, the introduction to Cochran's test offers:

"It is of great importance to detect outliers from a given set of variances evaluated from sets of sample data, in particular in estimating the precision of measurement methods by means of a collaborative interlaboratory experiment."

This standard provides an equal sample size version of Cochran's test and then illustrates it with an example of 5 laboratories, each of which conducted 8 replicate tests. In this illustrative example, one laboratory's variance of 12.134 was slightly less than the sum of the other four laboratory variances, yielding a test statistic value of 0.4892, which exceeds the 5% critical value of 0.4564. Hence, the laboratory with the largest variance was deemed significantly different from the others.

The standard just mentioned cites another standard, ISO 5725-2, which also advocates Cochran's test in the context of interlaboratory testing. A balanced uniform-level experiment is conducted in which p laboratories are each charged to obtain n replicate test results under repeatability conditions at each of q levels. The results are obtained from q batches of materials (one per level of the test). An assumption in interlaboratory testing is that the within-laboratory variances are small and similar. Of course, prudence suggests that this assumption warrants inspection, for which ISO 5725-2 offers Cochran's test. This standard acknowledges that other tests are available but this one has been chosen—no rationale being given. Further commentary here emphasizes the importance of equal sample sizes and recommends that if the equal sample size provision is not met then merely using a common value of n is acceptable. Another possible outcome is that there may be one exceptionally small variance, indicating a high-performing laboratory (rather than one that is unlikely to be cheating on the numbers). Finally, it is mentioned with some caution that the test could be performed recursively, yielding further outliers beyond one initial aberrant laboratory. Of course, such an approach could lead to excessive numbers of outliers. Michael Morton, a TC 69 United States delegate, provided a compelling argument at the Dalian, China meeting in June 2015 that this recursive use of Cochran's test is very problematic.

The situation at present is that Cochran's test has been in use in these standards for decades, and revising these standards to reflect the latest thinking on homogeneity of variances tests (albeit circa the 1980s!) is challenging. As the crop of standards noted in this review comes up for periodic review, the US experts will strive to bring them up to the current state of the art via the international consensus process. If you are interested in becoming involved in standards work (for which this column's content indicates the sort of issues encountered), contact Jennifer Admussen at [email protected].

FEATURE: Two Case Studies of Reducing Variability
by Rich Newman, Johnson & Johnson Vision Care®

When process capability is considered undesirable, most folks attempt to center the process by using engineering knowledge or design of experiments. The appeal of this approach is that it usually requires changing only a set of process inputs, which may be relatively easy to do. As an example, increasing the temperature of a process from 265° to 275° may result in average tensile strength being at its target value. While centering the process can be of value, understanding and reducing the process variability may have a greater impact on process capability. To illustrate the concepts, I will first provide a non-work example and then provide a work example using contact lenses.

My goal in this article is to illustrate the concepts using many pictures and no equations. For details around the equations, a book is cited in the reference section.


Non-work Example
In December of 2014, my wife and I were discussing gifts for the holidays as well as upcoming resolutions. Fitness trackers, a.k.a. pedometers, were becoming very popular. We both felt like we were very active but weren't sure exactly how active we really were. So we decided to invest in the Jawbone Up™. My wife was ready to make some New Year's resolutions in the form of step goals. Before I arbitrarily picked goals which could have resulted in a resolution that I forgot about come February, I wanted to understand or characterize my process first.

After I set up and calibrated my tracker (Equipment IQ/OQ), the first step to characterize the process was to identify all of the potential sources that could lead to differences in my daily step count. For me, the potential sources of variability, or variability components, were month-to-month, week-to-week, weekend-vs-weekday, and day-to-day. Living in Florida where the weather is typically warm year round, I felt that there was little risk in assuming no season-to-season differences.

The next step was to determine how many levels of each source would be evaluated. For this study, I chose two months, thinking that collecting data from two months, across eight weeks, and across sixty days would provide a decent understanding of the process. Furthermore, my wife was on board with postponing making New Year's resolutions until the beginning of March.

The next step was to determine the order, or hierarchy, of the variability components from the highest to the lowest. One level is higher than another if it encompasses the lower level. In this example, the highest level is month since it encompasses all other variability components. In other words, days, weekends, weekdays, and weeks are all part of the month. The next highest level is week since it encompasses weekend/weekday and day. The next level is weekend vs. weekday since days are a part of the weekend and weekdays. The lowest level is day.

In variance component analysis, the lowest level is known as the within term because it contains any variability source that was not studied. As an example, I only planned on measuring my daily steps once per day. Thus, measurement error is part of the day-to-day variability. If I took two measurements per day by wearing two fitness trackers, then I would have been able to separate out day-to-day variability from measurement error since day would no longer be the lowest variance component.

As a side note, variance component studies tend to be “bottom-heavy” in that we see the lowest levels considerably more often than the higher levels. In this example, two months go into understanding month-to-month variability while there are sixty days that go into understanding day-to-day variability. As a result, we do a better job understanding the lower levels. For that reason, it is recommended when planning these experiments to take as many levels as possible of the highest sources and a small number of levels for the lower sources.

The next step was to compute the variance components and create the supporting graphs. When computing the variance components, most statistical software packages like Minitab® and JMP® will ask you to put in the variance components in order from the highest source to the second lowest source. The lowest source is not asked for since it is officially the within term. If you are not sure of the order of the sources, or if the ordering includes multiple sources at the same level (this can happen when you have an interaction between two sources), then it is recommended to partner with someone who is trained in statistics when performing the analyses.

Figure 1: Order of the variance components
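The article used Minitab and JMP for this computation. As one possible open-source route (a sketch only, with synthetic data standing in for the tracker data and hypothetical column names month, week, daytype, and steps), nested variance components can be fit as a mixed model; note that estimating a month-to-month component from only two months is fragile in any package, so convergence warnings are to be expected.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Synthetic stand-in for the tracker data: 2 months x 4 weeks x 7 days
rng = np.random.default_rng(0)
rows = []
for month in (1, 2):
    m_eff = rng.normal(0, 1000)                       # month-to-month component
    for week in range(1, 5):
        w_eff = rng.normal(0, 1500)                   # week-to-week component
        wd_eff = {"weekday": rng.normal(0, 1000), "weekend": rng.normal(0, 1000)}
        for day in range(7):
            daytype = "weekend" if day >= 5 else "weekday"
            steps = 14700 + m_eff + w_eff + wd_eff[daytype] + rng.normal(0, 3500)
            rows.append({"month": month, "week": f"{month}-{week}",
                         "daytype": daytype, "steps": steps})
df = pd.DataFrame(rows)

# Nested variance components: week within month, weekend/weekday within week;
# the residual variance plays the role of the "within" term.
model = sm.MixedLM.from_formula(
    "steps ~ 1",
    groups="month",
    re_formula="1",
    vc_formula={"week": "0 + C(week)",
                "daytype": "0 + C(week):C(daytype)"},
    data=df,
)
result = model.fit()
print(result.summary())   # variance components appear in the random-effects block
```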

In Figure 2, the y-axis is the number of daily steps taken. The x-axis indicates the month, week, and whether the day occurs during the week or weekend. Each point represents the step count for one day. The vertical lines connect the lowest and highest step counts for one set of weekends or weekdays. Note that week 1 of 2015 started on a Thursday. Since week 1 only included Thursday and Friday, I officially started the study on Saturday, January 3rd and counted it as the start of week 2.

Using statistical software, the variance components are presented in Figure 3. The first column lists the variability sources, the second column provides the % of total variability from each source, the third column provides a picture illustrating the % of total, and the last column provides the standard deviation associated with each component. Thus, the total standard deviation is 4107.6 steps, with 5.8% of the total variability from month-to-month differences, 12.6% from week-to-week, 6.4% from weekend vs. weekday, and 75.2% within a set of weekends and weekdays. The 75.2% includes day-to-day variability, measurement error, and whatever other sources that were not studied yet varied daily.

There are a couple of things to note from Figure 3. Day is not listed as a variance component. As mentioned previously, since it is the lowest level, day is part of the "Within" term. Next, the sum of the % of total variability for all of the variance components adds up to 100%. For that reason, one should look at the % of total and the size of the total standard deviation in the last column. For example, if the total standard deviation was 45 steps, one may not care if 98% of the total variability was week-to-week. On the other hand, if the total standard deviation was 15000 steps, one may care if 14% of the total variability is month-to-month.

Figure 2: Graph of daily steps

Figure 3: Variance components


To support the conclusions above, Figure 2 can be enhanced for each variability source. Specifically, a horizontal line representing the mean of each level can be added. Furthermore, each level can be graphed using a different symbol. Figures 4a and 4b are provided below. Figure 4a highlights the impact of month by adding horizontal lines for the average number of steps taken per month and coding each month with a different symbol. Figure 4b highlights the impact of week. From both graphs, month-to-month and week-to-week do not provide major opportunities to reduce the process variation.

The biggest source of variability is "within" at approximately 75% of the total variability. This is seen in Figures 2, 4a, and 4b by the lengths of the vertical lines connecting the lowest and highest step values per weekend or weekday.

The Empirical Rule, which states that about 95% of observations fall within two standard deviations of the average, can be used to provide a range for the number of steps I might take for any given day. The overall average from all sixty days is approximately 14700 steps and the overall standard deviation from Figure 2 is approximately 4100 steps. Thus, about 95% of the time, I walk between 6500 and 22900 steps (14700 +/– 2*4100). This is illustrated in Figure 5.

At this point, I can either accept my process or work on reducing the variability. Since the characterization indicated that the biggest opportunity to reduce the variability is to focus on the within source, I set my New Year's resolution goal to take a minimum of 12000 steps per day. This was chosen under the assumption that the lower values would rise while the higher values would stay the same. To meet this resolution, I would go for walks three times a day, regardless of month, week, weekend, or weekday.

Figures 4a and 4b: Variability graphs highlighting month (on the top) and week (on the bottom)


Figure 6 includes the characterization data and the first two months of step data after setting the goal of 12000 a day minimum. From the graph, one can see the reduced variability.

From Figure 7, the total standard deviation reduced from 4106 to 2376 steps. The impact is that the +/- associated with my daily number of steps was cut almost in half.

During these two months, my wife also tracked her daily steps. Figure 8 provides a graph of her daily steps and Figure 9 provides the analyses associated with her daily step data. For her, the largest variability source was weekend vs. weekday. Thus, she needs a different solution than me to reduce the variability of her process. Interestingly, she is a high school teacher who is on her feet most of the day but doesn't accumulate many steps while in the classroom. As a result, she chose a New Year's resolution to reduce the variability by increasing her steps during weekdays by going for a walk on her lunch break and in-between classes.

Figure 5: 95% bounds on my daily number of steps

Figure 6: Daily Steps before and after New Year’s Resolution

Figure 7: Variance component analyses for the 2 months after New Year's Resolution

Figure 8: Graph of my wife's daily steps

Figure 9: Analyses of my wife's daily steps

Work Example
For intellectual property (IP) reasons, the following work example alters the real data and description of the process. These changes protect IP but still demonstrate the concepts of improving capability by understanding and reducing variability.

During the spring of 2015, an engineer met with me about a new product that was at the early stages of development and having issues with sphere power, a measure of distance. The sphere power specifications of –3.0 +/– 0.25 Diopters are dictated by an ISO standard. Note that negative powers indicate myopia, or nearsightedness. The Ppk was at 0.52 and the process was relatively centered between the specifications. Since the specifications cannot be changed and shifting the process would only result in a larger amount of lenses out of specification, the only way to improve the capability is to reduce the variability.
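As a reminder of how Ppk summarizes centering and spread, here is a generic textbook calculation applied to made-up readings (the standard deviation used below is an illustration value, not the article's proprietary data).

```python
import numpy as np

def ppk(data, lsl, usl):
    """Process performance index: distance from the mean to the nearer
    specification limit, in units of three overall standard deviations."""
    data = np.asarray(data, dtype=float)
    mean, sd = data.mean(), data.std(ddof=1)   # overall (long-term) standard deviation
    return min(usl - mean, mean - lsl) / (3 * sd)

if __name__ == "__main__":
    rng = np.random.default_rng(7)
    # Hypothetical sphere power readings (Diopters) against the -3.0 +/- 0.25 D specs
    sphere = rng.normal(loc=-3.0, scale=0.14, size=500)
    print(f"Ppk = {ppk(sphere, lsl=-3.25, usl=-2.75):.2f}")
```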

The first step was to identify the potential variability sources. The engineer and I walked through the process and identified six variability sources. The first was monomer batch. In simple terms, this is the liquid material that turns into the contact lens.



A monomer batch is piped directly into the manufacturing line. Typically, one monomer batch feeds up to three manufacturing lines. Thus, line is the second variability source. For each line, lenses are manufactured in groups known as production lots which consist of lenses with identical prescriptions. A new production lot requires appropriate changes to the process to dial in the desired prescription. Thus, lot is the third variability source. Since lots can consist of 100,000 lenses, a lot can be arbitrarily split into multiple sublots. Note that nothing changes from a manufacturing perspective between the sublots. Their only purpose is to determine if the process changes over time within a production lot. For each sublot, lenses are tested. Since this particular test is non-destructive, multiple readings are taken for each lens. Thus, sublots and lenses are the fourth and fifth variability sources. The last level is the within term which includes the variability from the multiple readings of the same lens.

The next step was to choose the levels for each variability source. For this experiment, three monomer batches were evaluated. A typical, a weak, and a strong batch were each specifically selected to represent the range of monomer batches and force the variability to be as large as possible. In this manner, if the monomer batch variability is small, we believe that capability will be stable for any incoming monomer batch. For each batch, 3 manufacturing lines and 4 production lots per line were evaluated. These numbers were chosen to make the design as “top heavy” as possible considering time and cost constraints. For each of the 36 production lots, 5 sublots were evaluated. For each sublot, 2 lenses were measured 3 times each. This resulted in 1080 measurements from the 360 lenses.

From Figure 10, the histogram of the data supports the statement that the process is relatively centered with large variability. If only the histogram was evaluated, it would be impossible to identify the opportunities to reduce variability.

Figure 10: Histogram and variability graph of sphere power values

Figure 11: Analysis of sphere power values


Figures 12a and 12b: Variability graphs highlighting sublots (on the top) and lines (on the bottom)

From Figure 11, the three largest variability sources are sublot-to-sublot, line-to-line, and within lens. Figure 12a takes the graph in Figure 10 and gives each sublot a different symbol. Furthermore, sublots 1 and 4 are slightly darkened to illustrate that generally sublot 1 runs low and sublot 4 runs high. Figure 12b gives each manufacturing line a different symbol. Furthermore, for each batch, one manufacturing line is darkened to illustrate that the lines are not equivalent. With this information, the engineer investigated the manufacturing lines, sublots, and measurement system. He did not investigate the monomer batches, lots, or lenses since the % of total variability for each of these sources was relatively low. In the case of monomer batch, this was fantastic as the strongest, weakest, and typical strength batches are relatively repeatable. From the investigation, the engineer made a series of improvements to the process. A follow-up experiment was conducted to determine the impact. This experiment evaluated five monomer batches, three manufacturing lines per batch, one inspection lot per line, nine sublots per lot, one lens per sublot, and three measurements per lens.


Note that by evaluating one inspection lot per line and one lens per sublot, the variability sources of inspection lot and lens cannot be estimated. Also note that five monomer batches were chosen to increase the number of lines evaluated, as opposed to estimating the variability source of monomer batch.

Figure 13 includes the graph with data pre-changes and post-changes. Figure 14 provides the variability source information for the post-change data only. From the graph, the variability has been greatly reduced post change. From the table, the total standard deviation reduced from 0.13891 to 0.06115 Diopters. At the same time, the graph and the variability analyses indicate that there may be an opportunity to further reduce line-to-line differences.

In summary, variance components are fantastic for characterizing and finding opportunities to reduce variation. The experiment requires determining the sources of interest, the hierarchy of the sources, and the levels for each source. The analyses determine which sources are making the largest contributions to the total variability and which sources are negligible. The next step is to use engineering, science, statistics, or any other tool to reduce the variability. Ideally, an experiment is run to confirm that the variability has been reduced. As indicated in this article, understanding and reducing variability can apply to anything from daily steps to the manufacturing of contact lenses to helping determine New Year's resolutions.

Reference
Montgomery, Douglas C., Design and Analysis of Experiments, 3rd Edition, John Wiley and Sons, 1991.

About the Author
Rich Newman is a Senior Principal QA Statistician for Johnson & Johnson Vision Care®. He received his Ph.D. in Statistics from Texas A&M University in 1998. His entire career has been in the medical device industry. In addition to roles in statistics, he has taken on roles around validations, quality, operations, and technical recruiting.


Figure 13: Variability Graph pre and post changes

Figure 14: Analysis of post change data only


Nominations Sought for 2016 William G. Hunter Award

William G. Hunter was the founding chair of the Statistics Division of the American Society for Quality Control (now American Society for Quality). His leadership as a communicator, consultant, educator, and innovator and his ability to integrate statistical thinking into many disciplines serve as exemplary models for the Division’s members.

Objective: The Statistics Division established the William G. Hunter Award in 1987 to encourage and promote outstanding accomplishments during a career in the broad field of applied statistics, and to recognize implementers who get results.

Qualifications: Any outstanding leader in the field of applied statistics, regardless of ASQ or ASQ Statistics Division membership status, is qualified. Candidates must have demonstrated a high level of professionalism, significant contributions to the field, and a history of inspirational leadership. A person may be nominated many times, but can win the award only once.

Procedure: The nominator must have the permission of the person being nominated and letters from at least two other people supporting the nomination. Claims of accomplishments must be supported with objective evidence. Examples include publication lists and letters from peers. Nominators are encouraged to read the accompanying article “William G. Hunter: An Innovator and Catalyst for Quality Improvement” written by George Box in 1993 (http://williamghunter.net/george-box-articles/william-hunter-an-innovator-and-catalyst-forquality-improvement) to get a better idea of the characteristics this award seeks to recognize.

Nominations: Nominations for the current year will be accepted until June 30. Those received following June 30 will be held until the following year. A committee of past leaders of the Statistics Division selects the winner. The award is presented at the Fall Technical Conference in October. The award criteria and nomination form can be downloaded from the “Awards” page of the ASQ Statistics Division website (http://asq.org/statistics/about/awards-statistics.html) or may be obtained by contacting Necip Doganaksoy ([email protected]).

Statistics Division Members Recognized as ASQ Fellows

Eduardo Heidelberg, QRM Consultants LLC, Parlin, N.J.—In recognition of his long-term career as a quality practitioner; for continuous dedication to improving the understanding of quality risk management; and for leadership and mentoring efforts at the section, division and national levels within ASQ.

T. M. Kubiak, Performance Improvement Solutions, Fort Mohave, Ariz.—For exceptionally long service to ASQ and outstanding leadership and commitment to the quality profession and the Society; for passion and dedication to quality and excellence in utilizing diverse process improvement tools; and for dedicated professional leadership through the ASQ Six Sigma Black Belt handbook series.

Robert William Stoddard II, Software Engineering Institute, Pittsburgh—For contributions to the fields of quality, reliability, Six Sigma, and process improvement as a book author, innovative leader, patent holder, entrepreneur, corporate leader, recognized quality award winner, ASQ certified instructor, long-time active ASQ division leader and member of the core team initiating the ASQ Certified Software Quality Engineer exam.

Last chance to apply for BOTH the FTC Student Scholarship and FTC Early Career Grant to attend the Fall Technical Conference! Apply by August 1st, 2016. Applications available at

http://asq.org/statistics/about/awards-statistics.html



Upcoming Conference Calendar

RSS 2016 International Conference
5–8 September, 2016
Manchester, United Kingdom
http://www.statslife.org.uk/events/events-calendar/eventdetail/480/-/rss-2016-international-conference

The RSS conference provides a unique opportunity for anyone interested in data analysis and statistics to come together to share information and network. Whatever your experience, field of expertise, or background, the RSS conference provides an open forum to exchange knowledge and experiences, whether in the formal conference sessions or in the many networking opportunities at refreshment breaks and evening social events. The conference is popular with such a diverse audience because it offers a broad and varied program of talks and workshops not found in any other UK-based statistical conference.

ENBIS-16 Conference
11–15 September, 2016
Sheffield, United Kingdom
http://www.enbis.org/activities/events/current/424_ENBIS_16_in_Sheffield/?_ts=1880&_ts=1880

We cordially invite you not only to engage in highly rewarding scientific and professional exchange during the conference, but also to find some leisure time and explore what Sheffield and Yorkshire have to offer. More information and the formal Call for Papers will be available soon.

60th Annual Fall Technical Conference
6–7 October, 2016
Minneapolis, MN
http://asq.org/conferences/fall-technical/

The Fall Technical Conference is the premier forum for researchers and practitioners to discuss the more effective use of statistical methods for research, innovation, and quality improvement. It is co-sponsored by ASQ (Chemical & Process Industries Division and the Statistics Division) and the American Statistical Association (Section on Physical and Engineering Sciences and Section on Quality and Productivity).

Lean and Six Sigma Conference
27–28 February, 2017
Phoenix, AZ
http://asq.org/conferences/six-sigma/about.html

Do you have technical proficiencies and leadership responsibilities within your organization? Are you actively involved in process improvement, organizational change, and development dynamics related to a successful lean and Six Sigma culture? This conference is for you!

2017 Joint Statistical Meetings
29 July–3 August, 2017
Baltimore, MD
http://www.amstat.org/meetings/jsm.cfm

JSM (the Joint Statistical Meetings) is the largest gathering of statisticians held in North America. The JSM program consists not only of invited, topic-contributed, and contributed technical sessions, but also poster presentations, roundtable discussions, professional development courses and workshops, award ceremonies, and countless other meetings and activities.



Statistics Division Committee Roster 2016

CHAIR: Theresa, [email protected]
CHAIR-ELECT: Herb, [email protected]
TREASURER: Mindy, [email protected]
SECRETARY: Gary Gehring, [email protected]
PAST CHAIR: Adam, [email protected]

Operations
OPERATIONS CHAIR: Joel, [email protected]
MEMBERSHIP CHAIR: Gary Gehring, [email protected]
VOICE OF THE CUSTOMER CHAIR: Joel, [email protected]
CERTIFICATION CHAIR: Brian, [email protected]
STANDARDS CHAIR: Mark, [email protected]

Member Development
MEMBER DEVELOPMENT CHAIR: Mindy Hotchkiss, [email protected]
OUTREACH/SPEAKER LIST CHAIR: Steve Schuelka, [email protected]
EXAMINING CHAIR: Doug, [email protected]

Content
CONTENT CHAIR: Amy Ste. Croix, [email protected]
NEWSLETTER EDITOR: Matthew, [email protected], +49-152-05421794
WEBINAR COORDINATOR: Ashley, [email protected]
SOCIAL MEDIA MANAGER: Brian, [email protected]
WEBSITE AND INTERNET LIAISON: Landon, [email protected]
STATISTICS BLOG EDITOR: Gordon, [email protected]
STATISTICS DIGEST REVIEWER AND MEMBERSHIP COMMUNICATIONS COORDINATOR: Alex, [email protected]

Awards
AWARDS CHAIR: Scott, [email protected]
OTT SCHOLARSHIP CHAIR: Lynne, [email protected]
FTC STUDENT/EARLY CAREER GRANTS: Jennifer, [email protected]
HUNTER AWARD CHAIR: Joel, [email protected]
NELSON AWARD CHAIR: Open
BISGAARD AWARD CHAIR: Scott, [email protected]
YOUDEN AWARD CHAIR: Adam, [email protected]

Conferences
WCQI/TCC CONFERENCE: Gordon, [email protected]
FTC STEERING COMMITTEE: Peter Parker, [email protected]
FTC PROGRAM REPRESENTATIVE: Mindy Hotchkiss, [email protected]
FTC SHORT COURSE CHAIR: Jiguo, [email protected]

Auditing
AUDIT CHAIR: Steve, [email protected]

By-Laws
BY-LAWS CHAIR: Adam, [email protected]

Nominating
NOMINATING CHAIR: Adam, [email protected]

Planning
PLANNING CHAIR: Theresa, [email protected]



The ASQ Statistics Division Newsletter is published three times a year by the Statistics Division of the American Society for Quality.

All communications regarding this publication, EXCLUDING CHANGE OF ADDRESS, should be addressed to:

Matthew Barsalou, Editor
email: [email protected]

Other communications relating to the ASQ Statistics Division should be addressed to:

Theresa I. Utlaut, Division Chair
email: [email protected]
phone: (503) 613-7763

Communications regarding change of address should be sent to ASQ at:

ASQ
P.O. Box 3005
Milwaukee, WI 53201-3005

This will change the address for all publications you receive from ASQ. You can also handle this by phone (414) 272-8575 or (800) 248-1946.

Upcoming Newsletter Deadlines for Submissions

Issue     Vol.   No.   Due Date
October   35     3     August 15

ASQ Statistics Division

VISIT THE STATISTICS DIVISION WEBSITE: www.asq.org/statistics

ASQ Periodicals with Applied Statistics content

Journal of Quality Technology
http://www.asq.org/pub/jqt/

Quality Engineering
http://www.asq.org/pub/qe/

Six Sigma Forum
http://www.asq.org/pub/sixsigma/

STATISTICS DIVISION RESOURCES:

LinkedIn Statistics Division Group
https://www.linkedin.com/groups/ASQ-Statistics-Division-2115190

Scan this to visit our LinkedIn group!

Connect now by scanning this QR code with a smartphone (requires free QR app)

Check out our YouTube channel at youtube.com/asqstatsdivision

If confidence intervals on individual parameters do not overlap, we know for sure a statistically significant difference exists. It’s when the confidence intervals do overlap that the conclusions are unclear. We must rely on additional exploratory analysis to determine statistical significance and expert knowledge to determine practical significance.

Connie M. Borror, “Statistics Roundtable: On Overlapping.” Quality Progress. Vol. 45 No. 4.
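As a brief numerical illustration of the point Borror makes (this example is not from her article), the following Python sketch uses invented summary statistics for two groups whose individual 95% confidence intervals overlap even though a two-sample t-test still finds a statistically significant difference in means.

from scipy import stats

# Invented summary statistics for two groups (mean, standard deviation, sample size)
mean_a, sd_a, n_a = 10.0, 2.0, 40
mean_b, sd_b, n_b = 11.2, 2.0, 40

def ci95(mean, sd, n):
    """95% t confidence interval for a mean, from summary statistics."""
    half = stats.t.ppf(0.975, n - 1) * sd / n ** 0.5
    return mean - half, mean + half

print("CI for group A:", ci95(mean_a, sd_a, n_a))   # about ( 9.36, 10.64)
print("CI for group B:", ci95(mean_b, sd_b, n_b))   # about (10.56, 11.84), overlapping A

t_stat, p_value = stats.ttest_ind_from_stats(mean_a, sd_a, n_a, mean_b, sd_b, n_b)
print(f"two-sample t-test: t = {t_stat:.2f}, p = {p_value:.4f}")
# The two intervals overlap, yet p is well below 0.05: overlapping intervals alone
# do not settle the question, which is exactly the ambiguity the quote describes.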