
Volume 105, Number 4, July–August 2000
Journal of Research of the National Institute of Standards and Technology

[J. Res. Natl. Inst. Stand. Technol. 105, 511 (2000)]

Capability in Rockwell C Scale Hardness


Walter S. Liggett, Samuel R. Low, David J. Pitchure, and John Song

National Institute of Standards and Technology, Gaithersburg, MD 20899-8980

[email protected]
[email protected]
[email protected]
[email protected]

A measurement system is capable if it produces measurements with uncertainties small enough for demonstration of compliance with product specifications. To establish the capability of a system for Rockwell C scale hardness, one must assess measurement uncertainty and, when hardness is only an indicator, quantify the relation between hardness and the product property of real interest. The uncertainty involves several components, which we designate as lack of repeatability, lack of reproducibility, machine error, and indenter error. Component-by-component assessment leads to understanding of mechanisms and thus to guidance on system upgrades if these are necessary. Assessment of some components calls only for good-quality test blocks, and assessment of others requires test blocks that NIST issues as Standard Reference Materials (SRMs). The important innovation introduced in this paper is improved handling of the hardness variation across test-block surfaces. In addition to hardness itself, the methods in this paper might be applicable to other local measurements of a surface.

Key words: calibration; critical to product quality; experimental design; indentation hardness; measurement system comparison; spatial statistics; standard reference material; surface measurement; test method; trend elimination; uncertainty component.

Accepted: February 29, 2000

Available online: http://www.nist.gov/jres

1. Introduction

Systems for hardness measurement do not follow exactly a protocol that one might choose. For this reason, there is a difference between the measurement one obtains and the measurement one wishes to obtain. To characterize this difference, one can perform test measurements and from these compute the measurement uncertainty. This paper shows how to do this.

Rockwell C scale measurement is an indentation test [1,2]. A protocol for performing this test consists of prescriptions for choosing a properly shaped indenter, for driving the indenter into the material, and for calculating the hardness value from the indentation depths. The indenter prescribed is a diamond cone with 120° cone angle blended in a tangential manner with a spherical tip of 200 µm radius. Driving the indenter into the material involves two levels of force, 98.07 N and 1471 N. (These forces can be achieved with masses of 10 kg and 150 kg.) First, as shown in Fig. 1, the smaller force is applied for a prescribed time interval and then the first indentation depth is measured. Second, the force is increased to the higher level in a way that drives the indenter into the material at a prescribed velocity. Third, the larger force is held for a prescribed interval. Fourth, the force is reduced back to the lower level and held for a prescribed interval, after which the second indentation depth is measured. Not everyone who chooses a Rockwell C scale protocol chooses the same prescribed intervals and velocity. The American Society for Testing and Materials (ASTM) [1] and the International Organization for Standardization (ISO) [2] only limit the choices allowed. The hardness is calculated from the difference between the second depth and the first. This difference is expressed in multiples of 0.002 mm and subtracted from 100. For steels, Rockwell C scale hardness values are typically between 25 HRC and 65 HRC.

Fig. 1. Rockwell C scale hardness test illustrated with change of force with time (a) and resulting indentation depth (b). Indenter depth measurements used in calculation of the hardness value are indicated by plotted symbols.
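To make the arithmetic of the preceding paragraph concrete, here is a minimal Python sketch of the hardness calculation (the depth values are our own illustration, not data from the paper):

    def hrc_from_depths(depth1_mm, depth2_mm):
        # Hardness = 100 minus the depth difference expressed in 0.002 mm units.
        return 100.0 - (depth2_mm - depth1_mm) / 0.002

    # Example: first depth 0.050 mm, second depth 0.130 mm -> 100 - 40 = 60 HRC.
    print(hrc_from_depths(0.050, 0.130))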

In manufacturing, one generally indexes measurement capability by contrasting measurement uncertainty with the tolerance that the product is required to meet [3]. The tolerance is the difference between the upper and lower specification limits for the product property. In the case of hardness, tolerances are an issue because the bases on which they are set are often obscure. What is usually not made explicit is that hardness is an indicator of a product property critical to quality and that a tolerance on the critical property is the proper basis for a hardness tolerance.

Two features distinguish the approach to uncertainty assessment presented here. The first is reliance entirely on test blocks, and the second is specification of block locations for test measurements. Although we do not deal with it here, one can obtain insight into hardness uncertainty by comparing what one considers to be the ideal protocol with the actual realization of this protocol. Such an approach can be based on propagation of uncertainties [4]. For some deviations of the realization from the ideal, such as a deviation in applied forces, in indenter geometry, or in time intervals of force application, one can (1) characterize the deviation as an uncertainty, (2) obtain sensitivity coefficients that relate such deviations to the resulting hardness measurements, and (3) assess the corresponding component of the measurement uncertainty. The coefficients in the second step cannot be obtained mathematically as in the usual propagation of uncertainties but must be obtained experimentally. Rather than taking this three-step approach, we assess the contribution to the uncertainty of such deviations through use of the NIST Rockwell C scale SRMs [5, 6]. Another possibility that we do not deal with here is arbitrary choice of measurement locations on test blocks. Commonly, guidelines recommend use of the average of five measurements taken somewhere on the test block (Annex H in [4]). The computations associated with this are based on the assumption that the locations were chosen randomly. Rather than considering this possibility, we specify measurement locations and proceed accordingly.

The usefulness of measurement uncertainty for decision making increases when the relation between components of uncertainty and the physical mechanisms that affect the measurement system are clarified. In its organization, this paper proceeds component by component, giving for each component the needed test measurements and analysis. Although we do not define a component for each possible mechanism, the components we define can be related to groups of possible mechanisms, and further distinctions are sometimes possible. Thinking in terms of mechanisms is especially useful in deciding how one's hardness equipment might be upgraded. Such thinking is also important when one is trying to understand why the results of two hardness measurements differ. Not all mechanisms, and thus not all uncertainty components, necessarily contribute to the explanation of the difference between two measurements. Thinking in terms of mechanisms allows one to decide which components can properly be used to explain an observed inconsistency.

The mention of equipment upgrades raises the question of whether the expense is necessary. This involves not only the measurement uncertainty but also the product requirements that the measurement system is to ensure. This issue is the subject of the next section.

The remaining sections explain how to assess experimentally each uncertainty component. The experimental work makes use of test blocks, which introduce another source of variation, the nonuniformity of the hardness across test block surfaces. Dealing with this source of variation as one assesses the hardness uncertainty components is the subject of Secs. 3–5.

As usually applied, Rockwell C scale hardness is a test method that indicates properties of an entire unit through measurement of only a small portion of the unit. Development of test methods with this characteristic is important beyond hardness applications. For example, under the name "combinatorial methods," there is considerable interest in experiments involving runs under many different conditions but with only a small quantity of material for each condition, a quantity so small that only an indicator of the desired product property can be measured. Because such experiments are usually performed with material samples laid out on a surface, the uncertainty analysis methods in this paper may be instructive.

2. Product Specification Limits

Rockwell C scale hardness is used to test the safety of airplane landing gear, the strength of fasteners, the safety of gas cylinders under accidental impact, and the performance of rotary lawn mower blades. These product characteristics and hardness are related in the case of steel because both are affected by heat treatment. Excess heat treatment leads to landing gear that snap, gas cylinders that shatter, and mower blades that fracture. Insufficient heat treatment leads to landing gear that bend, gas cylinders that deform under stress, and mower blades that become dull rapidly.

Clearly, some product characteristics measured by Rockwell C scale hardness are critical to quality. Manufacturers of products for which this is true have little choice but to develop and maintain appropriate capability in hardness measurement. An alternative test method is, of course, a possibility, but such development would likely be very expensive. Thus, treating Rockwell hardness with disdain because of its simple protocol and long history is not an option.

For some product characteristics, but not hardness, the engineering knowledge needed to determine acceptable values of the characteristic, that is, to set specification limits, is available without further experiment. For example, say that the characteristic is the shape of a part. To calculate specification limits, one could envision how deviations from the ideal shape would affect the use of the part, identify various part dimensions that together can be employed to assure successful usage, and finally specify limits on these part dimensions. Similarly, say that the characteristic is the composition of some material. To calculate specification limits, one could anticipate how various constituents would affect usage and set specification limits on the concentrations of these constituents. These are examples of product characteristics for which one can set specification limits theoretically.

The case of Rockwell C scale hardness is different. In fact, this difference is sometimes emphasized by saying that determinations of dimension and of concentration are measurements whereas Rockwell hardness is a test method. As illustrated by Rockwell hardness, test-method protocols are sometimes quite removed from product characteristics of real interest. There is no quantitative model that connects Rockwell hardness and the performance of airplane landing gear under stress. The only way to establish the connection is through experiment. In the case of landing gear, this effort involves determination of the relation between tensile strength and hardness. In fact, one can find publications that give this relation [7], but one must be concerned with the reliability of the published relation and whether it holds for all classes of steel. It would seem that in many cases, the use of a test method to assure a product characteristic requires costly experimental work and that one would only do this work if the characteristic were critical to quality and a test method were the only way to gauge the characteristic.

To compensate for problems in the experimental work connecting product characteristic and test method, one might think of tightening the specification limits until one is sure that test method results indicate satisfactory product performance. This reasoning may be behind some customer-supplier agreements that contain specification limits on Rockwell C scale hardness. Tightening specification limits, of course, results in more stringent requirements on the uncertainty of the test method. Reducing this uncertainty may be difficult. In this case, one must decide what experimental work to invest in: One could improve one's understanding of the specification limits or reduce one's uncertainty in use of the test method.

The idea that specification limits drive uncertainty requirements is complicated in the case of Rockwell C scale hardness because of differences among uncertainty components. The most important difference is between the components that describe the variation of a particular testing system and the components that arise when one is comparing systems. Characterization of the former entails what is commonly referred to as gage repeatability and reproducibility (gage R&R) [3]. Characterization of the latter involves comparison of testing machines and indenters.

Rockwell C scale measurement can be used to reduce the variation in a heat treatment process. Although reduced variation alone does not guarantee that the parts produced will be satisfactory, such reduction is a reasonable first step. If this is one's goal, one should use only a single hardness measurement system. The measurement capability needed involves only the short and long term variation of this system. Thus, a gage R&R study is sufficient to characterize the uncertainty components of interest and thus to determine whether the hardness system is capable for the task.

More broadly, one must consider the relation of one's hardness measurements to the hardness measurements that were part of setting the specification limits. Generally, these two sets of measurements will involve different machines and indenters. Despite the differences, the specification limits must apply to measurements from both. There might be just two systems involved, the supplier's and the customer's. Alternatively, both systems might be referred to the hardness measurements made by NIST. In either case, the uncertainty components that characterize the machine and indenter errors enter the determination of measurement capability. Often, these errors are larger than the errors apparent in a gage R&R study, and determining how to reduce them by upgrading a measurement system is more difficult.

3. Gage R&R

3.1 Assessment With Nonuniform Blocks

Repeatability, the first "R" in "R&R", is defined as "closeness of the agreement between the results of successive measurements of the same measurand carried out under the same conditions of measurement" [4]. Two stipulations in this definition, same measurand and same conditions, require care. Successive measurements of the same measurand cannot be realized because a hardness measurement can be made only once at each test block location and test block hardness is not perfectly uniform. Thus, a way around block nonuniformity must be employed. In the definition, the words "same conditions" must be augmented by spelling out the conditions to be held constant.

For the dead-weight Rockwell-hardness machine at NIST [6], we specify that "successive measurements ... under the same conditions" means an uninterrupted sequence of measurements made with the same indenter and machine configuration. Our use of the word "uninterrupted" is intended to imply that the measurements are made one after another over a short period of time. "Made with the same machine configuration" means made without changing the test cycle sketched in Fig. 1 or changing the setting of the machine in any other way. Because the NIST machine has a flat anvil, we add the proviso that in the course of the measurements, the block not be removed from the anvil. For testing machines of other designs, other provisos might be suitable. Subsequently in this paper, we frequently stipulate that measurement conditions be held constant as in repeatability assessment.

We define repeatability in terms of measurements on a perfectly uniform test block and then show how repeatability can be estimated with nonuniform test blocks. The key is prescription of measurement locations on the test blocks. The prescribed locations must be far enough apart to avoid the hardening caused by the residual stress left by an indentation and must be symmetrical and close enough that deviations due to block nonuniformity cancel. Most of the methods presented in this section are based on measurement patterns composed of hexagons with 6 mm between adjacent vertices. The distance 6 mm seems large enough to avoid the crowding effect and small enough to allow sufficient indents on a block. Measuring at the vertices and center of a 6 mm hexagon requires that one develop one's experimental technique. This can be done with some effort.

That block nonuniformity cannot be ignored when the machine has good repeatability is illustrated by the measurements that are presented in Fig. 2 as a histogram. NIST obtained these 83 values by measuring one of the SRM blocks, 95N63001, on a square 5 mm grid. The measurement spread shown in this histogram is caused by both lack of repeatability and block nonuniformity. These two sources can be distinguished if the location of the measurements is taken into account. Block nonuniformity is largely characterized by smooth variation with location because the major causes of this nonuniformity, variation in the material and the heat treatment applied to it, vary smoothly. Thus, the hardness contour plot in Fig. 3 indicates how much the block nonuniformity contributes to the measurement spread shown in Fig. 2. Figure 3 shows that this block is harder near its edges and softer in the middle. Moreover, the variation shown by the contours largely explains the spread in Fig. 2. Not only do Figs. 2 and 3 illustrate the possible impact of block nonuniformity on uncertainty assessment, they also illustrate that the basis for dealing with block nonuniformity is the smooth variation of hardness with location.

Fig. 2. Histogram of measurements taken on block 95N63001.

Fig. 3. Contour plot for block 95N63001 obtained with commercial software.

3.2 Trend Elimination

As the basis for the methods in this section, we assume that hardness variation over a 6 mm hexagon resembles variation with a constant gradient. (See Sec. 3.8 for further discussion of this approximation.) Figure 4 shows a 6 mm hexagon with numbered vertices that is embedded in an equilateral triangular grid covering a block 52 mm in diameter. Say that we measure the center point, number 7, and a cross-hexagon pair, perhaps vertices 1 and 4, holding measurement conditions constant as in repeatability assessment. If the hardness has a constant gradient, then, except for the variation due to lack of repeatability, the center measurement equals the average of the other two measurements. By examining the deviation from equality, we can assess the repeatability without concern for block nonuniformity. In addition, arrangement of measurements on a 6 mm hexagon can be used to eliminate block nonuniformity in system comparisons such as comparison of indenters. If we choose conditions for the center measurement that differ from those for the other two, then we can determine the effect of this difference hindered only by the lack of repeatability, not by block nonuniformity.

Fig. 4. A 6 mm equilateral triangular grid for a block with a hexagonal measurement pattern highlighted and numbered.

The foregoing can be expressed mathematically. As shown in Fig. 4, let the locations on the outside of the hexagon be numbered clockwise from 1 to 6, and let the center be numbered 7. The 3 cross-hexagon pairs are vertices 1 and 4, vertices 2 and 5, and vertices 3 and 6, respectively. The x-y coordinates of the locations are approximately given in millimeters by (−6,0), (−3,5), (3,5), (6,0), (3,−5), (−3,−5), and (0,0). Let the seven hardness measurements be denoted Hi, i = 1,...,7. Let each cross-hexagon pair be measured under constant conditions as in repeatability assessment. The comparisons are among (Hi + H_{i+3})/2, i = 1,...,3 and H7. If the block nonuniformity has a constant gradient, then the actual hardness as a function of x-y coordinates on the surface is given by b0 + b1 xi + b2 yi for some coefficients b0, b1, and b2. We see that nonuniformity with this dependence on location does not enter the comparisons.
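A quick numerical check makes this cancellation concrete. The sketch below (our own illustration, with arbitrary gradient coefficients and the approximate coordinates given above) evaluates a constant-gradient hardness surface at the seven locations and confirms that every cross-hexagon average equals the center value:

    import numpy as np

    # Approximate locations in mm: vertices 1-6, then center 7.
    locs = np.array([(-6, 0), (-3, 5), (3, 5), (6, 0), (3, -5), (-3, -5), (0, 0)], float)

    # A planar hardness surface b0 + b1*x + b2*y with arbitrary coefficients.
    b0, b1, b2 = 45.0, 0.02, -0.01
    h = b0 + b1 * locs[:, 0] + b2 * locs[:, 1]

    # Each cross-hexagon pair average minus the center value is exactly zero.
    for i in range(3):
        print(h[[i, i + 3]].mean() - h[6])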

It is useful to express the hardness measurements in a way that makes explicit the effect of the lack of repeatability. Consider the general case of system comparisons. In this case, each cross-hexagon pair is measured under the same conditions, but different pairs and the center point may be measured under different conditions. We denote the system conditions for vertices 1 and 4 by A, the conditions for 2 and 5 by B, the conditions for 3 and 6 by C, and the conditions for the center point by D. We have

(H1 + H4)/2 = δ_A + (ε1 + ε4)/2
(H2 + H5)/2 = δ_B + (ε2 + ε5)/2
(H3 + H6)/2 = δ_C + (ε3 + ε6)/2
H7 = δ_D + ε7.

The εi, i = 1,...,7, denote the effects of the lack of repeatability. We model these seven effects as statistically independent with zero mean and standard deviation σ. We can think of δ_A, δ_B, δ_C, δ_D as the hardness values one would obtain at the center point of the hexagon under the four sets of conditions and in the absence of any effect of lack of repeatability. If the sets of conditions were the same, these values would be the same.

If we knew the value of σ, then we could obtain a confidence interval for δ_A − δ_B, say. The standard deviation of two-location averages is σ/√2. Recall that the variance (the square of the standard deviation) of a sum of independent errors is given by the sum of the variances of the individual errors. A 95 % confidence interval for δ_A − δ_B is given by

(H1 + H4)/2 − (H2 + H5)/2 ± 1.96 σ.

The formula that applies when the center value is involved is somewhat different. A 95 % confidence interval for δ_A − δ_D, say, is given by

(H1 + H4)/2 − H7 ± 1.96 √(3/2) σ.

Other comparisons are analogous to one or the other of these formulas. Note that whether or not these comparisons are satisfactory for the purpose depends on the size of the standard deviation σ. If the standard deviation is too large, more precise comparisons can be made by improving the repeatability of the machine or by repeating the measurements on hexagons laid out elsewhere on the block.
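These intervals are simple to compute. The following sketch (illustrative readings and an assumed known σ, both our own) evaluates the two formulas above:

    import math

    H = [45.12, 45.08, 45.15, 45.10, 45.11, 45.13, 45.09]  # H1,...,H7, illustrative
    sigma = 0.03  # repeatability standard deviation, assumed known here

    # 95 % confidence interval for delta_A - delta_B (two pair averages).
    est_ab = (H[0] + H[3]) / 2 - (H[1] + H[4]) / 2
    print(est_ab, "+/-", 1.96 * sigma)

    # 95 % confidence interval for delta_A - delta_D (pair average versus center).
    est_ad = (H[0] + H[3]) / 2 - H[6]
    print(est_ad, "+/-", 1.96 * math.sqrt(3 / 2) * sigma)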

3.3 Assessing Repeatability

Consider measurement of all seven points of a 6 mm hexagon under constant conditions as in repeatability assessment. As explained in the previous section, if the block nonuniformity has constant gradient, only the lack of repeatability causes (H1 + H4)/2, (H2 + H5)/2, (H3 + H6)/2, and H7 to differ. Moreover, only lack of repeatability causes H1 + H3 + H5 − H2 − H4 − H6 to differ from 0. Thus, one can estimate σ, that is, assess the repeatability, from these quantities.

Usually, one estimates the variance σ² and denotes the estimate by s². Let the mean of the seven readings be

H̄ = (1/7) Σ_{i=1}^{7} Hi.

The variance estimate is given by

s² = (1/4){ [H7 − H̄]² + Σ_{i=1}^{3} 2[(Hi + H_{i+3})/2 − H̄]² + [H1 + H3 + H5 − H2 − H4 − H6]²/6 }.

Because this estimate has only 4 degrees of freedom, its variability must be taken into account when it is used to draw conclusions. If the corresponding standard deviation s were to be used instead of σ to form confidence intervals, then the appropriate value from the Student's t-table, 2.776, would replace the 1.96 shown above.

For purposes such as reporting the lack of repeatability of a testing machine, one would want an estimate of σ with more degrees of freedom. One way to do this is to measure more than one hexagon with the same indenter, say m hexagons. Let the hardness measurement for location i on hexagon j be given by H_{ji}. Proceeding as above for each hexagon, we obtain an estimate of σ for each hexagon, which we denote by s_j. The overall estimate of the standard deviation is given by

√( (1/(4m)) Σ_{j=1}^{m} 4 s_j² ).

This estimate has 4m degrees of freedom. More generally, if the estimate s_j had ν_j degrees of freedom, then the overall estimate would have Σ_{j=1}^{m} ν_j degrees of freedom and would be given by

√( Σ_{j=1}^{m} ν_j s_j² / Σ_{j=1}^{m} ν_j ).

In many cases, the 12 degrees of freedom obtained from 3 hexagons is sufficient.
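As a sketch of these computations (our own code; each hexagon's readings are supplied in the order H1,...,H7):

    import math

    def hexagon_s2(H):
        # Variance estimate with 4 degrees of freedom from one hexagon.
        Hbar = sum(H) / 7
        pairs = sum(2 * ((H[i] + H[i + 3]) / 2 - Hbar) ** 2 for i in range(3))
        alt = (H[0] + H[2] + H[4] - H[1] - H[3] - H[5]) ** 2 / 6
        return ((H[6] - Hbar) ** 2 + pairs + alt) / 4

    def pooled_s(hexagons):
        # Pool m hexagons; each s_j^2 carries 4 degrees of freedom, 4m in all.
        m = len(hexagons)
        return math.sqrt(sum(hexagon_s2(H) for H in hexagons) / m)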

Figure 5 shows, for each of three hardness levels, the measurement results on three hexagons given as deviations from H̄_j. The units are hardness on the Rockwell C scale (HRC). The center is plotted with a circle, and members of different cross-hexagon pairs are plotted with triangles, squares, and diamonds, respectively. The variation shown is due to block nonuniformity and lack of repeatability. Nonuniformity with a steep gradient would lead to some pairs exhibiting large deviations in opposite directions from the center line. There is evidence of this. Members of pairs do not lie equal distances from the center line because of lack of repeatability.

For the three HRC levels 25, 45, and 63, the standard deviation estimates s for the variation due to lack of repeatability are 0.029 HRC, 0.033 HRC, and 0.024 HRC, respectively. Each of these estimates has 12 degrees of freedom. These estimates do not provide definitive evidence that the repeatability varies with hardness level.

Fig. 5. Repeatability data with different symbols for the center point (circle) and each cross-hexagon pair (triangle, square, or diamond).

3.4 Long-Term Variation

Comparison of cross-hexagon averages with center measurements obtained previously is one way to monitor the long-term variation of a hardness measurement system. One begins by measuring the centers of enough hexagons to cover the period during which monitoring is to occur. This may require laying out hexagons on several blocks of the same hardness level. One measures the centers under the constant conditions necessary to avoid any variation beyond the inevitable lack of repeatability. Then, over the monitoring period, one measures cross-hexagon pairs and compares pair averages with the center. Note that this approach cancels both block nonuniformity and block-to-block variation. For this reason, we can combine readings from hexagons on different blocks in the same way we combine readings from hexagons on the same block.

To describe this approach mathematically, let the outer points of each hexagon be indexed by i as above, and let j index the hexagons. Denote the center measurement for hexagon j by H_{0j}. The order in which the cross-hexagon pairs are measured during the monitoring is important. It is perhaps best to measure one pair on each hexagon before returning to measure a second pair on any hexagon. Thus, for some ordering of the hexagons, one measures a pair on each hexagon, then a second on each, and finally a third. For each monitoring point, one can place on a run chart the deviation of the cross-hexagon average from the center-point measurement

(H_{ji} + H_{j(i+3)})/2 − H_{0j}.

Frequency of monitoring and the measurement conditions to be held constant during monitoring are both issues to be decided.
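Each monitoring point thus reduces to a single number; a minimal sketch (our own illustration, with made-up readings):

    def monitoring_deviation(H_pair_1, H_pair_2, H_center):
        # Deviation of a cross-hexagon pair average from the stored center reading.
        return (H_pair_1 + H_pair_2) / 2 - H_center

    # Example: pair readings 45.14 and 45.10 against a day-0 center reading of 45.09.
    print(round(monitoring_deviation(45.14, 45.10, 45.09), 3))  # 0.03 HRC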

For each hardness level, Fig. 6 shows a run chart with these deviations plotted versus time. On the day the center measurements were made, three hexagons for each level were measured fully to assess the repeatability. These measurements are the ones shown in Fig. 5. From these data, we computed the nine deviations shown at day 0. A monitoring deviation was obtained each day the machine was to be used, but as shown, the machine was used irregularly during the period portrayed. On some days, two or even three monitoring deviations were obtained. All of these are shown in Fig. 6.

The run charts in Fig. 6 are worrisome because there seem to be special causes that cannot be easily identified. Clearly, the machine was operating differently on the day that the center measurements were made. For this reason, almost all the deviations are positive. Moreover, during the period covered by the last part of the chart, the deviations seem to fall and then recover. Concern about the causes of these appearances is reinforced by the fact that they appear to some degree at each hardness level, although they are most pronounced at the lowest level. The first place one might look for causes is in the mechanical operation of the testing machine. One would like to institute some procedure for using the machine that would assure that, whatever the cause, its effect would be limited. Some such procedures, for example, warm-up procedures, are already in place but may not be fully effective. In particular, the NIST machine provides a trace of indenter depth versus time for each measurement. At the beginning of each day, this trace is used to detect the need to adjust the moving parts of the machine so that the forces are applied at the proper rates. After some thought, we have concluded that although Fig. 6 is worrisome, the effects are not pronounced enough to permit identification of the causes.

Fig. 6. Long-term variation shown for each day by the difference between a cross-hexagon pair and the center.

3.5 Assessing Reproducibility

Parallel to the definition of repeatability, reproducibility, the second "R" in "R&R", is defined as "closeness of the agreement between the results of successive measurements of the same measurand carried out under changed conditions of measurement" [4]. What conditions change must be stated. In the operation of the NIST dead-weight Rockwell-hardness machine, the change of primary interest is the change from day to day that occurs with the same indenter and without changing the test cycle sketched in Fig. 1 or resetting the machine parameters in any other way. Often, the changed conditions in discussions of reproducibility involve changes in operator, although this is less important with computer-controlled machines. There are also other possibilities for defining changed conditions.

When one thinks of reproducibility, one usually thinks of sources of error in addition to the ones that cause lack of repeatability. Thus, one decomposes a hardness measurement H into three terms: ε, the term that reflects the error sources that cause lack of repeatability; η, the term that reflects the additional error sources associated with the lack of reproducibility; and δ, the actual hardness biased by the machine and indenter error. We have

H = ε + η + δ.

In thinking about a measurement system over time, one has to note when each term in this equation changes. In the case of our model of the NIST machine, the term ε changes with every new measurement; the term η changes with each new day; and the term δ changes only when the machine or indenter is changed intentionally.

The effect of the additional sources of error is to add a term to the cross-hexagon pair (H_{ji} + H_{j(i+3)})/2, which we denote η_{ji}, and to add a term to the center measurement H_{0j}, which we denote η_0. Say that we use just one cross-hexagon pair for each monitoring point, which we assume involves changes in the additional error sources. Denoting the deviations by D_{ji}, we have

D_{ji} = (H_{ji} + H_{j(i+3)})/2 − H_{0j} = η_{ji} − η_0 + (ε_{ji} + ε_{j(i+3)})/2 − ε_{0j}.

On the right side of this equation, the error terms are statistically independent if their subscripts differ. We see that the run chart has a constant offset because η_0 never changes. Moreover, there is some dependence between D_{ji} and D_{jk} because ε_{0j} only changes with j.

Typically, reproducibility is assessed through the use of a standard deviation estimate. This implies that the changes incorporated in the definition of reproducibility must have effects that are reasonably portrayed as random with constant standard deviation. Think of what a run chart fashioned as those discussed above would look like if this were true. Say that the conditions that change in accordance with the reproducibility definition change with every new point on the chart. Then, with one minor departure, the deviations portrayed by the run chart would appear to vary randomly without the amount of variation changing appreciably. The departure is the dependence discussed above caused by three run chart points having a common center measurement.

As an aside, note that one could form a run chart with changed conditions only every second point or every third point. Such an alternative is reasonable but requires a somewhat different reproducibility assessment.

If the lack of reproducibility is reasonably summarized by a standard deviation, then we can add control limits to the run chart of the deviations. Say that we have n deviations available, which, for generality, we take to have been obtained from one or more cross-hexagon pairs on several hexagons. The control limits should be centered at

D̄ = (1/n) Σ_j Σ_i D_{ji},

where the sums cover the pairs from the various hexagons for which deviations are available. Using 3 times the standard deviation, we obtain for the control limits

D̄ ± 3 √( (1/(n−1)) Σ_j Σ_i (D_{ji} − D̄)² ).

These control limits account for the repeatability and the reproducibility of the measurement system. Points outside the control limits would indicate a special cause of variation to be investigated. Note that the control limits do not have to be re-estimated even though the monitoring involves hexagons from several blocks.

We now have a model for hardness measurements with two sources of uncertainty, ε and η. The variance of ε is denoted by σ² and is estimated by s² as given in Sec. 3.3. The variance of η is denoted by σ_η². An estimate of σ_η² can be obtained from the deviations D_{ji}. We have

s_η² = (1/(n−1)) Σ_j Σ_i (D_{ji} − D̄)² − (3/2) s².

Note the subtraction of 3s²/2 to remove from the variability observed on the control chart the part attributable to the lack of repeatability. The stability of this estimate might be seriously compromised if, from deviation to deviation, measurement conditions do not change as envisioned in the definition of reproducibility. Note in particular that if the change envisioned is the change from day to day, then measuring several cross-hexagon pairs within a day will not do much to provide a better estimate of σ_η².

The desirability of randomness in the sequence of deviations D_{ji} arises from the idea that randomness implies that each deviation is the effect of many causes, none of which is dominant. If there are many causes, then one expects the statistical properties of the deviations, their mean and standard deviation, to remain constant in the future. What is worrisome about Fig. 6 is that there is some indication that a few dominant causes exist that might lead to deviations in the future that are unlike anything observed in Fig. 6.
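A sketch of the control-limit and reproducibility computations (our own code; D is the flat list of run-chart deviations and s2 is the repeatability variance estimate from Sec. 3.3):

    import math

    def control_limits(D):
        # Center line plus and minus 3 estimated standard deviations.
        n = len(D)
        Dbar = sum(D) / n
        sd = math.sqrt(sum((d - Dbar) ** 2 for d in D) / (n - 1))
        return Dbar - 3 * sd, Dbar + 3 * sd

    def s_eta_squared(D, s2):
        # Chart variance minus the repeatability contribution 3*s2/2.
        n = len(D)
        Dbar = sum(D) / n
        return sum((d - Dbar) ** 2 for d in D) / (n - 1) - 1.5 * s2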

Is a standard deviation an appropriate summary for Fig. 6? We computed s_η using only one value of D_{ji} from days during which two or three were measured. We obtained for the lowest level 0.056 HRC, for the middle level 0.038 HRC, and for the highest level 0.036 HRC. This standard deviation does gauge the day-to-day variation, the error component called lack of reproducibility, observed in Fig. 6. This component seems to be largely the effect of day-to-day variations in the way the machine moves in executing the test cycle. What is worrisome is that s_η may be too small to cover this error component as it might appear in future measurements. Another caution is that these results might be misleading in the interpretation of combinations of measurements made over successive days because Fig. 6 seems to indicate some lack of randomness, that is, some day-to-day dependence.

3.6 Comparison and Assessment

Consider now the comparison of indenters and testing machines. Generally, the notion that changes in indenters or machines are changes that should be incorporated in a definition of reproducibility seems contrived because one usually has only a few indenters or machines with which to experiment. Thus, a statement of differences would seem to be more satisfying than a standard deviation as a way of summarizing the variation observed when the indenter or machine is changed.

The comparisons we consider are set up so that only the lack of repeatability enters. One could do comparisons as suggested in Sec. 3.2 and use the repeatability assessment in Sec. 3.3 to obtain confidence intervals. However, for efficiency, one might compare indenters or machines and assess the repeatability in the same experiment. Say one has 3 indenters to be compared using the 14 locations available in 2 hexagons. Let the indenters be denoted A, B, and C. In each hexagon, we assign indenter A to vertices 1 and 4, B to 2 and 5, and C to 3 and 6. Further, we assign A to the center of hexagon 1 and B to the center of hexagon 2. As above, we denote the hardness reading for location i and hexagon j by H_{ji}.

Analysis of this experiment should be done with a program for multiple regression, which most readers will find available on their computers. A multiple regression program models the observations in terms of parameters and errors. In general, the model is given by

y = Xβ + ε,

where y is the vector of observations, X is referred to as the design matrix, β is the vector of parameters, and ε is the vector of unknown errors. These errors are due to lack of repeatability in the case considered here. A multiple regression program requires input of y and X; it returns an estimate of β, which we denote by β̂.

One might expect that we would take the 14 hardness measurements H_{ji}; j = 1,2; i = 1,...,7; as the elements of y, but this leads to the need for elements in β that model the block nonuniformity across the hexagons. Instead, we have decided to confine the elements of β to δ_A, δ_B, δ_C, the hardness readings with indenters A, B, and C, respectively, that one would obtain under perfect repeatability at the center of hexagon 1, and γ, the difference in hardness between the centers of hexagons 1 and 2. We have

β = (δ_A, δ_B, δ_C, γ)ᵀ.

So that we can confine β to these 4 parameters, we must adopt as elements of y the combinations of hardness readings that we introduced in Sec. 3.2. We take as observations the cross-hexagon averages and the center reading, scaled so that their standard deviation is σ. We add (H_{11} + H_{13} + H_{15} − H_{12} − H_{14} − H_{16})/√6 and the corresponding quantity from the other hexagon so that the regression routine includes them in the estimation of σ. The data values to which the regression model is fit are given by

y = [ (H_{11} + H_{14})/√2
      (H_{12} + H_{15})/√2
      (H_{13} + H_{16})/√2
      H_{17}
      (H_{11} + H_{13} + H_{15} − H_{12} − H_{14} − H_{16})/√6
      (H_{21} + H_{24})/√2
      (H_{22} + H_{25})/√2
      (H_{23} + H_{26})/√2
      H_{27}
      (H_{21} + H_{23} + H_{25} − H_{22} − H_{24} − H_{26})/√6 ].

As a consequence of our choice of β and y, the design matrix is given by

X = [ √2   0    0    0
      0    √2   0    0
      0    0    √2   0
      1    0    0    0
      0    0    0    0
      √2   0    0    √2
      0    √2   0    √2
      0    0    √2   √2
      0    1    0    1
      0    0    0    0 ].

The logic of the model we have adopted can be seen by considering the equation y = Xβ, the model for y with the error term removed. For example, from the sixth row, we have

(H_{21} + H_{24})/√2 = √2 δ_A + √2 γ.

This shows that the first cross-hexagon average for hexagon 2 is given by the hardness reading obtained from indenter A in hexagon 1 plus the hardness difference between hexagon 1 and hexagon 2. This illustrates the parameterization that we have chosen.

To estimate the elements of β, we need a regression program that runs without adding a constant term to the model. (Adding a constant term consists of adding a column with elements 1 to X and another element to β.) The proper regression program gives estimates of the parameters, which we denote

β̂ = (δ̂_A, δ̂_B, δ̂_C, γ̂)ᵀ.

Most programs will also produce an estimate of σ, which in this case has 6 degrees of freedom. If not, one can compute the residuals, which are given as the elements of the vector y − Xβ̂, compute the sum of squares of the residuals, and divide the sum by the degrees of freedom to obtain an estimate of σ². We denote this estimate by s².

What we wish to investigate is the difference between indenters, δ_A − δ_B, for example. We can easily estimate this difference as δ̂_A − δ̂_B, but obtaining a confidence interval is more involved. To express δ_A − δ_B as the dot product of vectors, let

a_{AB} = (1, −1, 0, 0)ᵀ

so that δ_A − δ_B = a_{AB}ᵀ β. As shown in texts on regression [8], an estimate of the covariance matrix of the parameter estimates is given by

s² (XᵀX)⁻¹.

Better regression programs give this matrix. Using this matrix, we obtain an estimate of the variance of the difference δ̂_A − δ̂_B given by s² a_{AB}ᵀ (XᵀX)⁻¹ a_{AB}. Similarly, the variances of other differences can be obtained. Taking into account the 6 degrees of freedom in the variance estimate, we obtain the 95 % confidence interval

δ̂_A − δ̂_B ± 2.447 s √( a_{AB}ᵀ (XᵀX)⁻¹ a_{AB} ).
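The whole analysis takes only a few lines with a least-squares routine. The sketch below (our own code, using numpy as a stand-in for the regression program; H1 and H2 hold the seven readings of hexagons 1 and 2 in the order used above) returns the parameter estimates, s, and the covariance matrix:

    import numpy as np

    def indenter_comparison(H1, H2):
        r2, r6 = np.sqrt(2.0), np.sqrt(6.0)

        def rows(H):
            # Scaled pair averages, center reading, and the alternating contrast.
            return [(H[0] + H[3]) / r2, (H[1] + H[4]) / r2, (H[2] + H[5]) / r2,
                    H[6], (H[0] + H[2] + H[4] - H[1] - H[3] - H[5]) / r6]

        y = np.array(rows(H1) + rows(H2))
        X = np.array([[r2, 0, 0, 0], [0, r2, 0, 0], [0, 0, r2, 0], [1, 0, 0, 0],
                      [0, 0, 0, 0], [r2, 0, 0, r2], [0, r2, 0, r2], [0, 0, r2, r2],
                      [0, 1, 0, 1], [0, 0, 0, 0]])
        beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta_hat
        s2 = float(resid @ resid) / 6          # 10 observations minus 4 parameters
        cov = s2 * np.linalg.inv(X.T @ X)      # covariance of the parameter estimates
        return beta_hat, np.sqrt(s2), cov

    # 95 % interval for delta_A - delta_B: a = np.array([1., -1., 0., 0.]);
    # half-width = 2.447 * np.sqrt(a @ cov @ a).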

As an example, consider comparison of the NIST primary indenter with four other indenters. To accommodate this number of indenters, we implemented the above design twice, on two sets of two hexagons. For both sets, the NIST indenter was assigned as the A indenter. Then, in the first set, indenters 1 and 2 were assigned to B and C, respectively, and in the second set, indenters 3 and 4 were assigned to B and C, respectively. The measurements are shown in Fig. 7 as deviations from the average of the measurements with the NIST indenter. Dominating Fig. 7 are the differences between the NIST indenter and the other four. We see, for example, that indenter 1 gives higher hardness readings at the 25 and 45 levels, and that indenter 2 gives lower readings at the 45 and 63 levels.

Figure 7 is affected by both block nonuniformity and lack of repeatability. The averages of cross-hexagon pairs are not affected by block nonuniformity. The two hexagons in a set are nearly replicates. Thus, the results for all three indenters should line up. They do not because we centered them on the average of the measurements for the NIST indenter. The effect of the lack of repeatability on this centering can be taken into account by sliding the entire column of symbols up and down to achieve a better match. This is effective in some cases, such as the rightmost two columns. Obtaining the best estimates of differences and an estimate of the standard deviation requires the regression methodology described above. We applied this methodology to each set of two hexagons and pooled the standard deviation estimates from the two sets. The results of this analysis, in terms of differences from the NIST indenter, are given in Table 1. The uncertainties are standard deviation estimates, each with 12 degrees of freedom. One can use these to obtain confidence intervals.


Fig. 7. Measurements with five indenters given as deviations from the averages of measurements with the NIST indenter.

Table 1. Differences from the NIST indenter of indenters 1 to 4, each given with 1 standard uncertainty, and the estimated standard deviation of the lack of repeatability

              HRC 25          HRC 45          HRC 63
Indenter 1    0.742 ± 0.021    0.532 ± 0.016   −0.008 ± 0.018
Indenter 2    0.111 ± 0.022   −0.182 ± 0.017   −0.274 ± 0.019
Indenter 3    0.357 ± 0.021    0.161 ± 0.016   −0.113 ± 0.018
Indenter 4    0.338 ± 0.022    0.399 ± 0.017    0.156 ± 0.019
Std. dev. s   0.033           0.025            0.028

3.7 More Elaborate Comparisons

Someone who is familiar with the matrix notation for multiple regression can generalize the foregoing to experimental designs involving different sets of comparisons or different numbers of hexagons. As an example, consider the following rigorous comparison of seven indenters labeled A through G. Generally, seven hexagons can be laid out on a block. One could assign the indenters to the cross-hexagon pairs as shown in Table 2. One would also want to assign the center points of the hexagons. This plan, without the center points, is called a balanced incomplete block plan in the experimental design literature. In that literature, a block is a hexagon, not a test block. Analysis of data from such a design can be done through generalization of the multiple regression approach discussed in Sec. 3.6.

Table 2. Design for comparison of indenters A to G

Hexagon    1, 4 pair    2, 5 pair    3, 6 pair
   1           A            B            D
   2           B            C            E
   3           C            D            F
   4           D            E            G
   5           E            F            A
   6           F            G            B
   7           G            A            C

3.8 Reflections on Methodology

Constant hardness gradients across 6 mm hexagons only approximate block nonuniformity, although the approximation is useful, as the methods in this section show. Referring to Fig. 3, one sees that hardness as a function of location is, at least in part, smoothly varying, but this variation is only approximately planar across 6 mm hexagons. Thus, to some extent, block nonuniformity affects the results obtained with the methods in this section. The methods in Sec. 4 are based on a more realistic model. In this subsection, we consider the constant-gradient approximation in terms of the more realistic model used in the next section.

The model for block nonuniformity presented in Sec. 4 pictures the hardness variation as composed of two components, a smooth but curvilinear function and an irregularly varying function that appears to have no spatial continuity. The smooth component is reduced but not eliminated through the trend elimination methods presented in this section. The irregular component cannot be clearly distinguished from the lack of repeatability of the testing machine. Because a hardness measurement interferes with a subsequent measurement that is too close, one cannot tell whether what appears to be lack of repeatability is in part small-scale variation in the test block that does not appear smooth at permissible distances between measurements. In the use of the methods in this section, one can proceed as though what is observed is all due to lack of repeatability but realize that the resulting assessment of this uncertainty component may be too large. The model of block nonuniformity in the next section is actually a combined model of block nonuniformity and lack of repeatability.

In Sec. 4, we model block nonuniformity probabilistically. Such a model is supported by contour plots such as Fig. 3 obtained for other blocks. Comparing these contour plots shows that the hardness variation is largely smooth but shows little similarity from block to block. Thus, we treat the block nonuniformity as a largely smooth random function and estimate the covariance properties of this random function. In fact, the certificates that accompany NIST's test blocks present such estimates [5]. If one had such estimates for one's blocks, then one could estimate the part of the block nonuniformity that remains after trend elimination. Generally, of course, covariance properties are not available for the test blocks one is using, and therefore one would have to estimate them.

There are cases in which the irregular component in block nonuniformity might be distinguished from other error sources that influence the repeatability. Presumably, the irregular component has a constant variance for all blocks in a single manufacturing batch. Consider first the case of two machines. If, for a given batch of blocks, one machine has better repeatability than the other, then one can conclude that the extra variation observed in the poorer machine is due to error sources other than the irregular component of the block nonuniformity. Consider second the case of two batches of blocks. It seems possible that the same machine could exhibit different repeatability on two different batches of blocks. One could attribute such a difference to differences between the irregular components of the batches. One might ask whether different types of steel were used for the different batches, for example.

The methods in this section offer a solution to a problem in the use of several blocks in the same experiment. The problem arises when the measurement locations on a block are treated as randomly chosen. Even if such a choice is properly made through designation of a set of locations on a block and random selection among these locations, the fact that the variance varies from block to block remains. The problem is that the variance induced by random selection depends on the nonuniformity of the block. The methods in this section remove the gross features of block nonuniformity and thus allow one to assume that whatever residual there is, it can be regarded as similarly distributed for every block in a manufacturing batch. Thus, the methods in this section allow pooling of variance estimates. Such pooling is also part of the estimation NIST uses in its block certification. Such pooling is especially valuable in the use of several blocks to apply control chart methodology over a long period.

4. Comparison with NIST

We now consider experimental methods based on the NIST SRMs. As shown in Sec. 3, there is much one can learn about a measurement system using any good-quality test blocks. More can be learned from NIST SRMs. In particular, judging the difference between what one's system does and ideal execution of the Rockwell C scale method is best done with NIST SRMs.

Test blocks, including NIST's, are nonuniform. Think of the test surface of a NIST block as a collection of measurement locations. The metal at each of these locations will have a hardness value that differs slightly from the hardness values at other locations. If it were possible to determine the hardness at every location, then we would have hardness as a function of location. The function we would see would be largely smooth, although perhaps also with a rapidly varying component. We expect the smooth part to be dominant because block nonuniformity is largely due to the nature of the steel and the test block manufacturing process, factors that vary smoothly across the block. NIST measured hardness as a function of location on several test blocks and found this to be true. Moreover, we found that these blocks have different hardness functions that are, however, similar in their smoothness. Thus, a model that portrays each block as having a different but similarly smooth hardness function seems reasonable.

Generally, the user of a NIST SRM asks for the difference in hardness readings between the user's own equipment and ideal equipment. The user might also ask for the difference between the user's equipment and NIST's. The day that NIST measures a user's SRM block at the seven locations, it could also make measurements at many more locations. The user can ask what NIST would have obtained at other locations, in particular, at the locations of the user's measurements. We now consider this question and answer it by providing a prediction of what NIST would have observed the same day as well as an uncertainty for this prediction. Note that this is not the same as a prediction of what NIST would have observed on a different day, which involves the NIST reproducibility. Also, note that in a comparison with user measurements, the user repeatability and reproducibility must be considered. We return to these uncertainty components in the next section. Because NIST could have continued to make measurements on an SRM block after it had made the initial seven, we can think in terms of a function of location s, H(s), the value NIST would have observed at s the day it made the initial seven measurements.

The method we use to predict hardness values at untested locations is based on a geostatistics formula that models the hardness across the surface of a block as a random function described by a semivariogram [9]. The semivariogram is a mathematical model that describes how the measured hardness difference between any two test block locations relates to the physical spacing that separates them. In statistical terms, this semivariogram gives one half the variance of the hardness difference between any two locations on the test block. Thus, the square root of twice the semivariogram is the standard deviation of this difference.

We can obtain a semivariogram estimate from the measurements made on block 95N63001, which we have already depicted in Figs. 2 and 3. If we let $i$ index these measurements, then for each value of $i$ we have a hardness value $H_i$ and a location given by the coordinate values $x_i$ and $y_i$. The semivariogram characterizes differences between these measurements as a function of the distance between them

$$d_{ij} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}.$$

The semivariogram for a particular distance is estimated from the hardness differences for points that distance apart. Let $\hat{\gamma}(d)$ denote the estimated semivariogram for distance $d$. We have

$$2\hat{\gamma}(d) = \frac{1}{n_d} \sum_{d = d_{ij}} (H_i - H_j)^2,$$

where the sum is over all the pairs of measurements $(i, j)$ for which $d = d_{ij}$ and $n_d$ is the number of pairs in the sum. We see that the estimated semivariogram is one half the average of the squared hardness differences and thus a variance estimate.
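This estimate is straightforward to compute. The following is a minimal sketch in Python; the function name and the exact-distance grouping are our own choices, suited to measurements on a regular grid such as the 5 mm grid used here:

```python
import numpy as np

def empirical_semivariogram(x, y, h, decimals=3):
    """Estimate gamma(d) for each distinct pairwise distance d:
    one half the average of (H_i - H_j)^2 over the n_d pairs
    separated by distance d."""
    sums, counts = {}, {}
    for i in range(len(h)):
        for j in range(i + 1, len(h)):
            d = round(float(np.hypot(x[i] - x[j], y[i] - y[j])), decimals)
            sums[d] = sums.get(d, 0.0) + (h[i] - h[j]) ** 2
            counts[d] = counts.get(d, 0) + 1
    # gamma(d) = one half the mean squared difference at distance d
    return {d: sums[d] / (2.0 * counts[d]) for d in sorted(sums)}
```

Plotting the returned values against distance gives an empirical semivariogram of the kind shown in Fig. 8.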

A semivariogram characterizes the smoothness of the hardness measurements across a surface through the way it decreases with decreasing distance. The semivariogram estimate obtained from block 95N63001 is shown in Fig. 8. The important part of this semivariogram is the part for distances less than 26 mm, half the diameter of the block. The values for distances near 50 mm are estimated from the few pairs of points that span the block. That these values tend toward zero is the result of the concave hardness surface shown in Fig. 3, which is peculiar to this block. For distances less than 26 mm, the semivariogram decreases with decreasing distance as expected from the smoothness of the block nonuniformity. Note that the smallest distance is 5 mm,

Fig. 8. Empirical semivariogram for block 95N63001 (one half the mean square of differences).


the smallest distance possible with a 5 mm grid. We would like to know the value that the semivariogram approaches as the distance approaches zero because this value characterizes the irregular component in the block nonuniformity and the lack of repeatability. One can estimate this value by extrapolating the values in Fig. 8 to zero. On this basis, Fig. 8 suggests that the repeatability of the NIST machine is very good. The value obtained by extrapolating to zero is, after taking the square root to convert it to a standard deviation, comparable to the repeatability results given in Sec. 3.3.

We estimated a semivariogram function for each SRM hardness range and use the result in our prediction of hardness values at unmeasured locations. This estimation is based on several blocks, not just one. An appropriate estimation algorithm is given by Curriero and Lele [10]. The algorithm actually used for the first batch of NIST Rockwell C scale SRMs is somewhat different, but the resulting estimates are nearly the same. We fit an exponential semivariogram model to the data [9]. This model is given by

$$\gamma(d) = \begin{cases} 0 & \text{if } d = 0 \\ c_0 + c_e\,(1 - \exp(-d/a_e)) & \text{if } d > 0. \end{cases}$$

The estimates are shown in Fig. 9. Note that these estimated functions provide an extrapolation to zero distance. Other extrapolations are possible. The way the zero value varies with hardness range raises an interesting question. Are there sources of error in NIST's testing machine that are more severe for softer blocks, or is there some variation in the blocks at distances much smaller than 5 mm that is more severe for softer blocks? Our data do not allow us to distinguish these alternatives.
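The exponential model can be coded directly. This is a sketch under the parameterization above; the parameters $c_0$, $c_e$, and $a_e$ could be fit to an empirical semivariogram by ordinary least squares (for example, with scipy.optimize.curve_fit), although, as noted, the NIST estimates were obtained from several blocks by a composite-likelihood style of algorithm [10]:

```python
import numpy as np

def exp_semivariogram(d, c0, ce, ae):
    """Exponential semivariogram model: gamma(0) = 0 and
    gamma(d) = c0 + ce * (1 - exp(-d / ae)) for d > 0."""
    d = np.asarray(d, dtype=float)
    return np.where(d == 0.0, 0.0, c0 + ce * (1.0 - np.exp(-d / ae)))
```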

Consider now prediction based on the seven NIST measurements, which are located at the vertices and at the center of a 20 mm hexagon. Let these initial locations be $s_{01}, \ldots, s_{0n_0}$, where $n_0 = 7$. The values $H(s_{01}), \ldots, H(s_{0n_0})$ are provided on the SRM certificate. Consider another $n$ points, $s_1, \ldots, s_n$. Of course, these $n + 7$ locations are subject to the minimum spacing requirements. We wish to predict

$$H_{\text{pred}} = \frac{1}{n} \sum_{k=1}^{n} H(s_k).$$

This formulation includes prediction for a single point ($n = 1$). We consider the more general case because users often make groups of measurements on test blocks. We compute not only a single-value prediction of this quantity, $H_{\text{pred}}$, but also the prediction variance, $\sigma^2_{\text{pred}}$. This variance corresponds to the first source of uncertainty listed on the NIST certificate. Put together, these give prediction intervals, for example, the 95 % interval $(H_{\text{pred}} - 1.96\,\sigma_{\text{pred}},\; H_{\text{pred}} + 1.96\,\sigma_{\text{pred}})$.

The prediction $H_{\text{pred}}$, which is a linear combination of the measurements on the certificate, is given by

$$H_{\text{pred}} = \sum_{i=1}^{n_0} \lambda_i\, H(s_{0i}),$$

where

$$\sum_{i=1}^{n_0} \lambda_i = 1.$$

The predictions are based on the semivariograms given on the SRM certificates. For the first SRM batch, they are shown in Fig. 9.

Fig. 9. Semivariogram functions applicable to the first issue of the NIST SRMs.


These semivariograms follow the exponential model given above and are functions of distance on the surface of the block. For convenience, we change our notation for the semivariogram: for two locations $s_a$ and $s_b$ separated by $\sqrt{(x_a - x_b)^2 + (y_a - y_b)^2}$, we denote the value of the semivariogram by $\gamma(s_a - s_b)$ instead of by $\gamma\bigl(\sqrt{(x_a - x_b)^2 + (y_a - y_b)^2}\bigr)$.

Computation of the coefficients and the prediction

variance consists of four steps. The first step is inversion of the $n_0 \times n_0$ matrix $\Gamma$ with $i,j$ element $\gamma(s_{0i} - s_{0j})$. Let the elements of the inverse $\Gamma^{-1}$ be denoted $g_{ij}$. This inverse matrix depends only on the semivariogram for the points measured by NIST and therefore can be computed once for each hardness level. The second step is computation of

$$\bar{\gamma}_i = \frac{1}{n} \sum_{k=1}^{n} \gamma(s_k - s_{0i}).$$

The third step is computation of three quadratic forms

$$Q_{22} = \sum_{i=1}^{n_0} \sum_{j=1}^{n_0} \bar{\gamma}_i\, g_{ij}\, \bar{\gamma}_j,$$

$$Q_{12} = \sum_{i=1}^{n_0} \sum_{j=1}^{n_0} g_{ij}\, \bar{\gamma}_j,$$

$$Q_{11} = \sum_{i=1}^{n_0} \sum_{j=1}^{n_0} g_{ij}.$$

The final step is computation of the coefficients that multiply the NIST measurements $H(s_{0i})$ to form the prediction $H_{\text{pred}}$, and of the prediction variance:

$$\lambda_i = \sum_{j=1}^{n_0} g_{ij}\, \bar{\gamma}_j + \frac{1 - Q_{12}}{Q_{11}} \sum_{j=1}^{n_0} g_{ij},$$

$$\sigma^2_{\text{pred}} = Q_{22} - \frac{(Q_{12} - 1)^2}{Q_{11}} - \frac{1}{n^2} \sum_{k=1}^{n} \sum_{k'=1}^{n} \gamma(s_k - s_{k'}).$$

Put together, these two quantities give a prediction interval for the average value NIST would have obtained on the day the SRM measurements were made. For example, a 95 % interval is given by

$$H_{\text{pred}} \pm 1.96\,\sigma_{\text{pred}} = \sum_{i=1}^{n_0} \lambda_i\, H(s_{0i}) \pm 1.96\,\sigma_{\text{pred}}.$$
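The four steps translate directly into code. The sketch below is our own illustration, not NIST software; it assumes a semivariogram function gamma (for example, exp_semivariogram above with fitted parameters) that returns 0 at distance 0, and it takes locations as (x, y) pairs:

```python
import numpy as np

def kriging_prediction(s0, H0, s, gamma):
    """Predict the average hardness over target locations s from the
    certificate measurements H0 at locations s0, following the four
    steps in the text. Returns (H_pred, sigma_pred)."""
    s0 = np.asarray(s0, dtype=float)
    s = np.asarray(s, dtype=float)
    H0 = np.asarray(H0, dtype=float)
    n = len(s)

    def gmat(a, b):
        # matrix of semivariogram values gamma(|a_i - b_j|)
        d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
        return gamma(d)

    # Step 1: invert the n0 x n0 matrix Gamma, Gamma_ij = gamma(s0i - s0j).
    G = np.linalg.inv(gmat(s0, s0))
    # Step 2: gamma-bar_i = (1/n) * sum_k gamma(s_k - s0i).
    gbar = gmat(s0, s).mean(axis=1)
    # Step 3: the three quadratic forms.
    Q22 = gbar @ G @ gbar
    Q12 = np.sum(G @ gbar)
    Q11 = G.sum()
    # Step 4: coefficients lambda_i and the prediction variance.
    lam = G @ gbar + ((1.0 - Q12) / Q11) * G.sum(axis=1)
    var_pred = Q22 - (Q12 - 1.0) ** 2 / Q11 - gmat(s, s).sum() / n ** 2
    return lam @ H0, np.sqrt(var_pred)
```

A 95 % prediction interval is then formed from the returned pair exactly as in the display above.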

An example of prediction for $n = 1$ is shown in Fig. 10. This figure shows the predicted hardness $H_{\text{pred}}$ for each location on a block, actually a low-range block, based on the seven measurements made on this block. We see that in terms of gross features, the nonuniformity of this block differs from the nonuniformity of block 95N63001 shown in Fig. 3.

The foregoing computational details provide little insight into the prediction itself. As a consequence of the semivariogram estimates and the locations of the NIST measurements on a 20 mm hexagon, the values of $\lambda_i$ that

Fig. 10. Hardness values of unmeasured locations on a NIST SRM block predicted from the seven measurements NIST made.


result from the above computation are positive (or at least close to being positive). This implies that the predicted value must lie between the smallest and the largest of the NIST measurements. In the case $n = 1$, $H_{\text{pred}}$ can be regarded as an interpolated value. In fact, if only two points were measured ($n_0 = 2$) and the prediction were for the point halfway between them, then the prediction would be the average of the two measurements.
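The two-point case is easy to check numerically with the sketches above; the locations, readings, and semivariogram parameters here are hypothetical:

```python
s0 = [(0.0, 0.0), (10.0, 0.0)]   # two measured locations (n0 = 2)
H0 = [44.8, 45.6]                # hypothetical hardness readings
s = [(5.0, 0.0)]                 # target: the point halfway between

gamma = lambda d: exp_semivariogram(d, c0=0.01, ce=0.05, ae=8.0)
H_pred, sd_pred = kriging_prediction(s0, H0, s, gamma)
print(H_pred)                    # ~45.2, the average of the two readings
```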

For the purpose of correcting for machine and indenter error as discussed in Sec. 5, and for other metrological purposes, it is best to use $H_{\text{pred}}$. Current practice, however, is to provide a single hardness value for a test block. On its certificates, NIST also provides a single value for a block. This certified average hardness value is an estimate of the hardness function integrated over the test block surface divided by the surface area. It is analogous to the certified value assigned to commercially produced hardness test blocks, which is usually calculated as the arithmetic average of the measurements. In the case of the NIST block, the certified average hardness value is the average of the predicted hardness values for all test surface locations, not the arithmetic average of the seven NIST measurements. Because the locations chosen for the seven NIST measurements provide a good representation of the range in surface hardness, the two averages are nearly identical in value.

Some blocks, such as the one portrayed in Fig. 3, we measured at more than the seven locations characteristic of the SRM blocks NIST offers. For these blocks, we can choose $n_0$ measurements, use these $n_0$ measurements to predict the other measurements, and compare the prediction $H_{\text{pred}}$ with the actualization $\tilde{H}_{\text{pred}}$, the value actually measured. The question of how well the prediction performs has two aspects. One is whether the prediction interval contains the actualization with the frequency implied by the percent confidence chosen. The other is whether subtracting the prediction from the actualization reduces the variation by a substantial amount. There are some details to be considered in this performance investigation. We used two sets of locations in filling blocks with measurements. This results in what we refer to as filled blocks and partially filled blocks. Unfortunately, neither set of locations contains the SRM locations. For this reason, we chose other locations as the basis for the prediction.

We consider the first aspect using a filled block for each hardness range. Because the locations actually measured on the SRM blocks were not measured on the filled blocks, we use as a basis for prediction the measurements at the points (0,0), (–20,0), (–10,15), (10,15), (20,0), (10,–15), and (–10,–15). For each location not in this set, we computed the predicted value, subtracted it from the actualization, and divided by the prediction standard deviation to obtain $(\tilde{H}_{\text{pred}} - H_{\text{pred}})/\sigma_{\text{pred}}$, where $n = 1$. Note that the actualization is the value we set out to predict, which is the value NIST obtained. We call these values standardized residuals and show them in Fig. 11.

The values shown in this figure should ideally lie outside (–1.96, 1.96) only one time in twenty. Since we have plotted the values versus the distance of the location from the center of the block, we can see that the statistical model on which the prediction interval is based does not hold exactly. We see an edge effect that is not surprising when one thinks of how test blocks are manufactured. On these blocks, the hardness near the edge varies more rapidly with distance than is portrayed by the semivariogram. This is particularly true of the HRC 63 block. The contour for this block is shown in Fig. 3. It is not clear what can be done about this edge effect. First, not all blocks have edge effects, and in fact, the contours for different blocks show little resemblance. Thus, use of a model of the block nonuniformity that takes the edges into account does not seem worthwhile. Second, in finding the difference between the user's equipment and NIST's, the user will make more than one measurement, and this will relieve the problem unless the user makes all the measurements near the edge. We conclude that if the user is somewhat cautious about measurements near the edge of the block, the prediction approach presented here is serviceable.

We investigate the second aspect of prediction performance using three partially filled blocks. We compare the variation of the actualization with the variation of the difference between the actualization and the prediction. Because of the locations measured on the partially filled blocks, we cannot predict on the basis of a hexagon pattern of points. We used instead the six points (–22, 0), (–5, 0), (5, 0), (23, 0), (0, –15), and (0, 15), a cross pattern. We predicted the other points on the block for which we already had hardness measurements, the actualizations. Let $S_1$ be the centered sum of squares of the actualizations $\tilde{H}_{\text{pred}}$ for the locations predicted, and let $S_2$ be the centered sum of squares of the corresponding residual values $\tilde{H}_{\text{pred}} - H_{\text{pred}}$. The difference $S_1 - S_2$ shows how well the prediction corresponds to the actualization. We compute

$$R^2 = (S_1 - S_2)/S_1.$$
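Given the actualizations and the corresponding predictions, this gauge is a few lines of code (a sketch; the function name is ours):

```python
import numpy as np

def r_squared(H_act, H_pred):
    """R^2 = (S1 - S2) / S1, where S1 is the centered sum of squares of
    the actualizations and S2 that of the residuals."""
    H_act = np.asarray(H_act, dtype=float)
    resid = H_act - np.asarray(H_pred, dtype=float)
    S1 = np.sum((H_act - H_act.mean()) ** 2)
    S2 = np.sum((resid - resid.mean()) ** 2)
    return (S1 - S2) / S1
```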

For the low range, mid range, and high range, we obtain for $R^2$ the values 0.69, 0.61, and 0.39. In the interpretation of these values, note first that $R^2$ behaves like the similar quantity in regression analysis; the value 1 corresponds to perfect prediction. Moreover, the value of $R^2$ will generally be higher for blocks that are more


Fig. 11. Measured value minus predicted value divided by standard deviation of the predicted value for a block of each hardness level.

nonuniform. The values we obtained from the blocks considered are evidence of how large $R^2$ might be among the first batch of NIST SRMs. The values for more uniform blocks would be smaller. What the effect of substituting the hexagon pattern for the cross pattern would be, we do not know. Nevertheless, we see that for the more nonuniform blocks in a batch, prediction reduces the variation substantially.

5. Measurement Correction

5.1 User-NIST Difference

People who make hardness measurements should expect their results to differ from results NIST would have obtained with its indenter and testing machine and, for this reason, should entertain correction of their measurements so that the agreement is better. The procedures in this section are a guide to making such corrections and, in addition, a guide to deciding whether such corrections are satisfactory. If they are unsatisfactory, equipment upgrades may be the only option.

Users of NIST's Rockwell C scale SRMs can observe the difference between their measurements and NIST's at the three hardness levels for which SRMs are offered. More precisely, as discussed in Sec. 4, users can observe the difference between the average of their measurements on an SRM block and a prediction of what NIST

would have obtained for the same locations. This section shows how a correction of a user measurement at any hardness level can be obtained from the differences at the three available hardness levels. Correction, however, may not provide sufficiently small measurement uncertainty. In this section, we discuss measurement correction including all the uncertainty components that a corrected measurement entails. After computing the total uncertainty from the components, the user can decide whether measurement correction is sufficient or whether an equipment upgrade is needed.

In Sec. 3.5, we expressed a hardness measurement as the sum of three terms, $H = \nu + \rho + \varepsilon$, where $\nu$ involves the actual hardness as well as the machine and indenter errors, $\rho$ is the error associated with lack of reproducibility, and $\varepsilon$ is the error associated with lack of repeatability. In the case of the user machine and indenter, we use the symbol $\nu$. In the case of the NIST machine and indenter, we use the symbol $\nu_{\text{NIST}}$. Ideally, one would like to correct $\nu$ to the actual hardness value. However, because the NIST SRMs involve machine and indenter error, it is better to think of correcting $\nu$ to $\nu_{\text{NIST}}$. The correction is

$$f(\nu_{\text{NIST}}) = \nu - \nu_{\text{NIST}},$$

and therefore, we have

$$H = \nu_{\text{NIST}} + f(\nu_{\text{NIST}}) + \rho + \varepsilon.$$


On this basis, we correct measurements to the scale defined by the NIST machine and indenter.

The correction is determined from user measurements on a NIST block for each hardness level. The first question is the choice of measurement locations. Actually, one can make measurements on a NIST SRM wherever one wants (subject to the minimum spacing requirements) and use the equations in Sec. 4 to obtain the NIST certified value (with uncertainty) for the average of these locations. Nevertheless, some choices of locations are better than others. Although we have not studied this issue, a reasonable possibility seems to be an extension of the 6 mm hexagons discussed above. One can lay out 6 mm hexagons around each of the NIST measurements and select one or more cross-hexagon pairs as one's measurement locations. In comparing an across-hexagon average with the center value, one should not assume that the block nonuniformity cancels as we do in Sec. 3. The prediction uncertainty produced by the equations in Sec. 4 takes into account the fact that the nonuniformity does not cancel exactly. Thus, even though one chooses locations based on 6 mm hexagons, one should use Sec. 4 to obtain the NIST prediction interval.

Say that one has made $n$ measurements on a NIST SRM and that the average of these measurements is $\bar{H}$. Moreover, say that one has followed Sec. 4 to obtain the NIST prediction for the average hardness of these locations, $H_{\text{pred}}$. The difference $\bar{H} - H_{\text{pred}}$ would not be zero even if $\bar{H}$ had been obtained with the same machine and indenter NIST used to certify its SRMs. Using bars to denote averages, we obtain from the equation for $H$ given above

$$\bar{H} = \nu_{\text{NIST}} + f(\nu_{\text{NIST}}) + \rho + \bar{\varepsilon}.$$

The standard deviation of $\rho$ is $\sigma_R$, and the standard deviation of $\bar{\varepsilon}$ is $\sigma_r/\sqrt{n}$. Moreover, think of $H_{\text{pred}}$ as composed of three components,

$$H_{\text{pred}} = \nu_{\text{NIST}} + \rho_{\text{NIST}} + e_{\text{pred}},$$

where $\nu_{\text{NIST}}$ is the actual hardness reading corrupted by the NIST machine and indenter error, $\rho_{\text{NIST}}$ reflects NIST's lack of reproducibility, and $e_{\text{pred}}$ is the prediction error discussed in Sec. 4. The standard deviation of $\rho_{\text{NIST}}$, which we denote $\sigma_{R\text{-NIST}}$, is given on the NIST certificate. We have

$$\bar{H} - H_{\text{pred}} = f(\nu_{\text{NIST}}) + \rho + \bar{\varepsilon} - \rho_{\text{NIST}} - e_{\text{pred}}.$$

The combined standard deviation for the last four terms in this equation is

$$\sigma = \sqrt{\sigma_R^2 + \sigma_r^2/n + \sigma_{R\text{-NIST}}^2 + \sigma_{\text{pred}}^2}.$$

Estimation of $\sigma_r^2$ is discussed in Sec. 3.3; estimation of $\sigma_R^2$ is discussed in Sec. 3.5; the value of $\sigma_{R\text{-NIST}}^2$ can be obtained from the certificate; and computation of $\sigma_{\text{pred}}^2$ is discussed in Sec. 4.
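In code, combining these components is a one-line root sum of squares; a minimal sketch (names ours), assuming the four component estimates are already in hand:

```python
import numpy as np

def combined_sigma(sigma_R, sigma_r, n, sigma_R_nist, sigma_pred):
    """Combined standard deviation of the random terms in H-bar - H_pred."""
    return np.sqrt(sigma_R ** 2 + sigma_r ** 2 / n
                   + sigma_R_nist ** 2 + sigma_pred ** 2)
```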

Comparison of one's hardness system with NIST's requires consideration of all three hardness levels for which NIST has issued SRMs. Let $m$ index these hardness levels. We write $\bar{H}_m - H_{\text{pred}(m)}$, $\nu_m - \nu_{\text{NIST}(m)}$, and $\sigma_m$. The random errors $\rho$, $\bar{\varepsilon}$, $\rho_{\text{NIST}}$, and $e_{\text{pred}}$ in the expression for $\bar{H}_m - H_{\text{pred}(m)}$ occur at each hardness level and thus should have the subscript $m$. Three of these random errors, $\bar{\varepsilon}$, $\rho_{\text{NIST}}$, and $e_{\text{pred}}$, are statistically independent from one hardness level to another. The fourth, $\rho$, the error associated with the user reproducibility, may not be statistically independent from one hardness level to another, but we assume that it is.

5.2 Curvature

Two aspects of $f(\nu_{\text{NIST}})$ distinguish hardness measurement correction from calibration problems in other fields. First, over the range of Rockwell C scale hardness, the function $f(\nu_{\text{NIST}})$ is smooth and small. This will be true if the user's machine and indenter are reasonably close to the Rockwell C scale prescription. On this basis, we can replace $f(\nu_{\text{NIST}})$ with $f(H_{\text{pred}})$. Second, the function $f$ exhibits some curvature. For this reason, we must include deviations of $f(\nu_{\text{NIST}})$ from linearity as an uncertainty component.

We approximate $f(\nu_{\text{NIST}})$ by $\alpha + (\beta - 1)\,\nu_{\text{NIST}}$. Because the NIST blocks have hardness values near 25, 45, and 65 HRC, we can check this approximation by considering the difference between the deviation for the middle block and the average of the deviations for the highest and lowest blocks. Let

$$r_3 = \frac{H_{\text{pred}(2)} - H_{\text{pred}(1)}}{H_{\text{pred}(3)} - H_{\text{pred}(1)}}$$

and let

$$r_1 = \frac{H_{\text{pred}(3)} - H_{\text{pred}(2)}}{H_{\text{pred}(3)} - H_{\text{pred}(1)}}.$$

If we ignore the error in these ratios, then this gauge of curvature is given by

$$\delta = r_3(\nu_3 - \nu_{\text{NIST}(3)}) + r_1(\nu_1 - \nu_{\text{NIST}(1)}) - (\nu_2 - \nu_{\text{NIST}(2)})$$

and is estimated by

$$\hat{\delta} = r_3(\bar{H}_3 - H_{\text{pred}(3)}) + r_1(\bar{H}_1 - H_{\text{pred}(1)}) - (\bar{H}_2 - H_{\text{pred}(2)}).$$


The standard deviation of $\hat{\delta}$ is given by

$$\sqrt{r_3^2\sigma_3^2 + r_1^2\sigma_1^2 + \sigma_2^2}.$$

Using this standard deviation, one can form a confidence interval for $\delta$ and judge how large $\delta$ might be in light of the various sources of random error. If $\delta$ is small in reference to the measurement application, then a linear measurement correction is reasonable. Because NIST issues blocks at only three hardness levels, we consider only the possibility of a correction linear in the user's reading, since there would be no way to check the fit of a curvilinear correction.
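A sketch of this computation, assuming the user averages, NIST predictions, and combined standard deviations are ordered from the low level ($m = 1$) to the high level ($m = 3$):

```python
import numpy as np

def curvature_gauge(H_bar, H_pred, sigma):
    """Return the curvature estimate delta-hat and its standard deviation."""
    H_bar = np.asarray(H_bar, dtype=float)
    H_pred = np.asarray(H_pred, dtype=float)
    r3 = (H_pred[1] - H_pred[0]) / (H_pred[2] - H_pred[0])
    r1 = (H_pred[2] - H_pred[1]) / (H_pred[2] - H_pred[0])
    dev = H_bar - H_pred                     # user-NIST deviations
    delta = r3 * dev[2] + r1 * dev[0] - dev[1]
    sd = np.sqrt(r3 ** 2 * sigma[2] ** 2 + r1 ** 2 * sigma[0] ** 2
                 + sigma[1] ** 2)
    return delta, sd
```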

5.3 Linear Correction

We now consider estimating the correction and the uncertainty associated with this estimation. First, we calculate the average of the NIST values

$$H_{\text{avg}} = \frac{1}{3} \sum_{m=1}^{3} H_{\text{pred}(m)},$$

the estimated slope

$$\hat{\beta} - 1 = \frac{\displaystyle\sum_{m=1}^{3} (\bar{H}_m - H_{\text{pred}(m)})(H_{\text{pred}(m)} - H_{\text{avg}})}{\displaystyle\sum_{m=1}^{3} (H_{\text{pred}(m)} - H_{\text{avg}})^2},$$

and the estimated intercept

$$\hat{\alpha} = \frac{1}{3} \sum_{m=1}^{3} (\bar{H}_m - H_{\text{pred}(m)}) - (\hat{\beta} - 1)\, H_{\text{avg}}.$$

Denote the hardness reading to be corrected as $U$. The correction to be subtracted from $U$ is

$$C = [\hat{\alpha} + (\hat{\beta} - 1)\,U]/\hat{\beta}.$$

Thus, the corrected reading is $U - C = (U - \hat{\alpha})/\hat{\beta}$, which inverts the linear approximation $U \approx \hat{\alpha} + \hat{\beta}\,\nu_{\text{NIST}}$. In terms of the standard deviation, the uncertainty in $C$ is given by

$$\sigma_C = \left[\, \sum_{m=1}^{3} \left( \frac{1}{3} + \frac{(U - H_{\text{avg}})(H_{\text{pred}(m)} - H_{\text{avg}})}{\sum_{m'=1}^{3} (H_{\text{pred}(m')} - H_{\text{avg}})^2} \right)^{2} \sigma_m^2 \,\right]^{1/2}.$$
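The estimation, the correction, and its standard uncertainty fit naturally into a few functions; this is a sketch under the same low-to-high ordering assumption as before:

```python
import numpy as np

def linear_correction(H_bar, H_pred):
    """Estimate alpha-hat and beta-hat and return a function that
    corrects a user reading U to the NIST scale."""
    H_bar = np.asarray(H_bar, dtype=float)
    H_pred = np.asarray(H_pred, dtype=float)
    H_avg = H_pred.mean()
    dev = H_bar - H_pred                       # user-NIST differences
    slope = np.sum(dev * (H_pred - H_avg)) / np.sum((H_pred - H_avg) ** 2)
    alpha = dev.mean() - slope * H_avg         # intercept alpha-hat
    beta = slope + 1.0                         # slope beta-hat
    def corrected(U):
        # corrected reading: U - C = (U - alpha-hat) / beta-hat
        return (U - alpha) / beta
    return alpha, beta, corrected

def sigma_C(U, H_pred, sigma_m):
    """Standard uncertainty of the correction C at reading U."""
    H_pred = np.asarray(H_pred, dtype=float)
    sigma_m = np.asarray(sigma_m, dtype=float)
    H_avg = H_pred.mean()
    w = 1.0 / 3.0 + ((U - H_avg) * (H_pred - H_avg)
                     / np.sum((H_pred - H_avg) ** 2))
    return np.sqrt(np.sum(w ** 2 * sigma_m ** 2))
```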

5.4 Uncertainty

Consider, finally, the uncertainty components of $U - C$, a user measurement corrected to the NIST scale. To begin, one must ask about the scale to which the uncertainty refers. Since the measurement is corrected to the NIST scale, it is reasonable to take the NIST scale as the truth to which the uncertainty refers. In this case, there are four uncertainty components: the one due to user lack of repeatability, the one due to user lack of reproducibility, the one due to error in estimation of the linear correction, and one that must account for the curvature in the relation between user measurements and NIST's. In terms of the usual uncertainty parlance, the first three of these are Type A [4]. The standard uncertainties of these components are given above. The uncertainty that arises because the correction really needed is nonlinear can be gauged by $\hat{\delta}$. From the confidence interval for $\delta$, we can develop bounds on the uncertainty due to this source. Such bounds should be serviceable if not completely defensible because $f(\nu_{\text{NIST}})$ could take an even greater excursion. With these uncertainty components assessed, one is in a position to decide whether one's measurements after correction are sufficient for the purpose.

The implications of $\hat{\delta}$ can be further understood in terms of its relation to $\hat{\alpha}$ and $\hat{\beta}$. The estimates $\hat{\beta}$, $\hat{\alpha}$, and $\hat{\delta}$ provide an approximate decomposition of $\bar{H}_m - H_{\text{pred}(m)}$, $m = 1, 2, 3$. This approximation is exact if $H_{\text{pred}(1)} = 25$, $H_{\text{pred}(2)} = 45$, and $H_{\text{pred}(3)} = 65$. We consider this special case because the NIST SRMs are approximately at these levels. With some algebra, we see that

$$\hat{\delta} = (\bar{H}_1 + \bar{H}_3)/2 - \bar{H}_2,$$
$$\hat{\alpha} = (\bar{H}_1 + \bar{H}_2 + \bar{H}_3)/3 - 45\,\hat{\beta},$$
$$\hat{\beta} = (\bar{H}_3 - \bar{H}_1)/40.$$

The decomposition is

$$\bar{H}_1 - 25 = \hat{\alpha} + 25(\hat{\beta} - 1) + \hat{\delta}/3,$$
$$\bar{H}_2 - 45 = \hat{\alpha} + 45(\hat{\beta} - 1) - 2\hat{\delta}/3,$$
$$\bar{H}_3 - 65 = \hat{\alpha} + 65(\hat{\beta} - 1) + \hat{\delta}/3.$$

We see that the user values are decomposed into a linear function of the NIST predictions $H_{\text{pred}(m)}$ and our gauge of curvature $\hat{\delta}$. Thus, $\hat{\delta}$ accounts for the part of the user measurements not amenable to a linear correction. Moreover, if $\sigma_m^2$ is the same for all $m$, then the estimation error for $\hat{\alpha} + H_{\text{pred}(m)}(\hat{\beta} - 1)$ is independent of the estimation error for $\hat{\delta}$. Thus, the uncertainty in the linear correction and the uncertainty in the gauge of curvature $\hat{\delta}$, when combined as the root sum of squares, give approximately the uncertainty in $\bar{H}_m - H_{\text{pred}(m)}$.
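A quick numerical check of this decomposition, with made-up user averages at the special levels 25, 45, and 65 HRC:

```python
H_bar = [26.1, 45.7, 64.9]   # hypothetical user averages, for illustration
delta = (H_bar[0] + H_bar[2]) / 2 - H_bar[1]
beta = (H_bar[2] - H_bar[0]) / 40
alpha = sum(H_bar) / 3 - 45 * beta
# each deviation equals alpha + level*(beta - 1) plus delta/3 or -2*delta/3
for H, level, c in zip(H_bar, (25, 45, 65), (1/3, -2/3, 1/3)):
    assert abs((H - level) - (alpha + level * (beta - 1) + c * delta)) < 1e-9
```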

These results show that because of the levels of the NIST SRMs, the quantities introduced in this section are more simply related than might appear at first. Of particular note is the fact that applying the linear correction does not increase the uncertainty in user measurements.

A completely defensible recipe for incorporating the curvature into the uncertainty does not seem possible. Our gauge of curvature, a confidence interval for $\delta$, does


not tell us about the differences between user and NIST measurements at hardness levels between the levels of the NIST SRMs. These differences could be larger than $2\delta/3$. Moreover, these differences could be material dependent. Such dependence might be especially severe when $\delta$ is large. To this must be added the complication that, as shown above, the effect of the curvature on the corrected user measurements is $\delta/3$ or $-2\delta/3$ depending on the level of the NIST SRM. A consensus recipe for the case of $\delta$ small may be possible, but if the confidence interval for $\delta$ suggests that $\delta$ may be large, the user of the uncertainty statement must be careful.

Unless all parties agree to correct their measurements to the NIST scale, one must consider the relation of measurements to the ideal Rockwell C scale. One way to do this is to add to the above uncertainty components the uncertainty components given on the NIST certificate for machine and indenter error. This will increase the total uncertainty. The advantage of agreement to correct to the NIST scale is that the NIST machine and indenter errors do not have to be considered in a comparison. Of course, it would be still better to improve machines and indenters so that errors attributable to these sources are reduced.

6. Summary

Rockwell hardness occupies a preeminent position in mechanical testing because of its long history and wide use. For this reason, Rockwell hardness is the natural choice for a paper on detailed understanding of test methods. This paper provides procedures that can be implemented to evaluate a hardness measurement system, concepts that help with the investigation of hardness measurement, and an example that can guide test method development.

This paper provides procedures for repeatability and reproducibility assessment, for control charts, for comparison of systems, for use of NIST SRM test blocks, and for measurement correction. These procedures give a fine understanding, but a reader may ask whether such an understanding is necessary. The answer is, of course, that it depends on the hardness application.

This paper explains concepts related to sources of measurement uncertainty, use of nonuniform blocks, the combined uncertainty of measurement, and evidence of the need for equipment upgrades. For a reader involved in hardness testing, these concepts are important even if the procedures in this paper are not implemented. For example, if one is puzzling over why two hardness measurements differ, one needs these concepts in thinking about possible causes.

Although it does not apply to all test methods, this paper does apply to many tests currently under development. Generally, this paper applies to tests for products with both upper and lower specification limits. One general aspect of test method development discussed in this paper is the need to manage two distinct areas of experimental effort: assessment of measurement uncertainty and determination of the relation between the test method and the product property critical to quality. A second aspect of broad interest is the use of test methods for local measurements of surfaces. Applicable to such use of test methods are approaches to surface variation such as trend elimination and spatial statistics. A third aspect is the need for reference materials to assure long-term comparability of test method results. In general, this paper shows how test methods can be treated with more care than is usual in current practice.

7. References

[1] ASTM E 18-97a, Standard Test Methods for Rockwell Hardness and Rockwell Superficial Hardness of Metallic Materials, American Society for Testing and Materials, West Conshohocken, PA (1998).

[2] ISO 6508-1986, Metallic Materials—Hardness Test—Rockwell Test (scales A-B-C-D-E-F-G-H-K), International Organization for Standardization (ISO), Geneva, Switzerland (1986).

[3] V. Czitrom and P. D. Spagon, Statistical Case Studies for Industrial Process Improvement, Society for Industrial and Applied Mathematics, Philadelphia, PA (1997).

[4] International Organization for Standardization (ISO), Guide to the Expression of Uncertainty in Measurement, Geneva, Switzerland (1993) (corrected and reprinted 1995).

[5] National Institute of Standards and Technology, Standard Reference Material Certificates for Rockwell C Scale Hardness, SRM 2810 (Low Range), SRM 2811 (Mid Range), and SRM 2812 (High Range).

[6] S. R. Low, R. J. Gettings, W. S. Liggett, and J. Song, Rockwell hardness—A method-dependent standard reference material, Proceedings of the 1999 NCSL Workshop and Symposium, National Conference of Standards Laboratories, Boulder, CO, 1999, pp. 565-589.

[7] V. E. Lysaght, Indentation Hardness Testing, Reinhold Publishing Corporation, New York, NY (1949).

[8] R. D. Cook and S. Weisberg, Applied Regression Including Computing and Graphics, John Wiley and Sons, New York, NY (1999).

[9] N. A. C. Cressie, Statistics for Spatial Data, John Wiley and Sons, New York, NY (1991).

[10] F. C. Curriero and S. Lele, A composite likelihood approach to semivariogram estimation, J. Agric. Biol. Environ. Stat. 4, 9-28 (1999).


About the authors: Walter S. Liggett is a mathematical statistician in the Statistical Engineering Division of the NIST Information Technology Laboratory. Samuel R. Low is a materials research engineer in the Metallurgy Division of the NIST Materials Science and Engineering Laboratory. David J. Pitchure is a physical science technician in the Metallurgy Division of the Materials Science and Engineering Laboratory. John Song is a mechanical engineer in the Precision Engineering Division of the Manufacturing Engineering Laboratory. The National Institute of Standards and Technology is an agency of the Technology Administration, U.S. Department of Commerce.
