INVESTIGATION, IMPROVEMENT,
AND EXTENSION OF TECHNIQUES
FOR MEASUREMENT SYSTEM
ANALYSIS
By
MUKUL PATKI
Bachelor of Engineering
Barkatullah University
Bhopal, India
1997
Master of Science
Oklahoma State University
Stillwater, Oklahoma
2000
Submitted to the Faculty of the
Graduate College of
Oklahoma State University
in partial fulfillment of
the requirements for
the Degree of
Doctor of Philosophy
December 2005
Thesis Approved:
Dr. Kenneth Case
Thesis Advisor
Dr. Camille Deyong
Dr. Manjunath Kamath
Dr. Mark Payton
Dr. David Pratt
Dr. A. Gordon Emslie
Dean of the Graduate College
ACKNOWLEDGEMENTS
We are blind until we see that in human plan
Nothing is worth the building unless it builds the man
Why build those cities glorious if man unbuilded goes
In vain we build the world, unless the builder also grows
– Edwin Markham
I would like to express my deepest gratitude towards the members of my PhD
committee for making the last three years of my life an extremely rewarding experience
and helping me grow as a professional. This is a group of some of the ¯nest teachers I
have known in my life.
I would like to thank Dr. Payton, my first, and the finest ever, statistics teacher, for his patience with this "statistically challenged" student of his, and for the hours and hours of his time that kept up the quality of my research. Dr. Deyong, over the years, has taught me a number of subtle lessons not only in Industrial Engineering, but in real life too. It is important to know, as Dr. Pratt often says, that you are doing the right thing before focusing on doing it right. Some of the greatest lessons in work ethic, professional integrity and discipline I have learned have come from Dr. Pratt, and he may not even know it. Having come to this country from halfway around the globe, the transition into a new culture and a new system of education would have been significantly harder had it not been for Dr. Kamath. He has been a mentor, a teacher, and an advisor in the true sense of the word. Ordinary people, as Dr. Case often says, can do extraordinary
things for short periods of time. It is the unspoken part of that sentence that has always intrigued me: that to become truly extraordinary requires attaining that level of performance and staying there, steady. Dr. Case has been more than an advisor. As a teacher he has taught me to think, to step back and actually think, something that, I have learned, is harder than it sounds. I want to express my gratitude towards him for listening to me and understanding even when he disagreed.
That I have reached here, surviving a couple of rough patches life threw my way, is
in a large part due to the rock-solid support I knew I could always count on, from my
family. The immense sacrifices my parents have made over the years can never be made up for, but it does feel good to make them proud. Last, but not least, my friend, my buddy, my soulmate: my wife, Rashmi. For not only standing by me during all that life threw our way the last few years, but picking up a "sword" and fighting with me, shoulder to shoulder, I will not belittle you by saying "thank you". All I have to say to you is: "We did it!"
– Mukul Patki
Stillwater, OK
December, 2005
Table of Contents
1 Introduction 1
1.1 Measurement Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Measurement System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Measurement System Analysis . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3.1 MSA Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3.2 Components of Variation . . . . . . . . . . . . . . . . . . . . . . . 6
1.4 Problems with the Current Model . . . . . . . . . . . . . . . . . . . . . . 8
1.4.1 Within-Appraiser Variation . . . . . . . . . . . . . . . . . . . . . 9
1.4.2 Equipment-to-Equipment Variation . . . . . . . . . . . . . . . . . 9
1.4.3 Adjusting the Estimate for Part Variation . . . . . . . . . . . . . 10
1.4.4 Applicability to Chemical and Process Industries . . . . . . . . . 10
1.5 Measurement System Acceptability Criteria . . . . . . . . . . . . . . . . 11
1.6 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2 Literature Review 14
2.1 Nomenclature and Notation . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2 Planning the Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.3 Analyzing the Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.3.1 Range-Based Estimation . . . . . . . . . . . . . . . . . . . . . . . 19
2.3.2 Analysis of Variance (ANOVA) . . . . . . . . . . . . . . . . . . . 21
2.3.3 Other Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.4 Some Problems With MSA . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.4.1 Within-Appraiser Variation . . . . . . . . . . . . . . . . . . . . . 23
2.4.2 Using Multiple Equipment . . . . . . . . . . . . . . . . . . . . . . 24
2.4.3 Correcting the Estimate of PV . . . . . . . . . . . . . . . . . . . . 25
2.4.4 MSA for Chemical and Process Industries . . . . . . . . . . . . . 25
2.5 Evaluating Measurement System Acceptability Criteria . . . . . . . . . . 28
2.5.1 Number of Data Categories and Discrimination Ratio . . . . . . . 29
2.5.2 Precision-to-Tolerance Ratio (PTR) . . . . . . . . . . . . . . . . . 30
2.5.3 Other measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3 Theoretical Background 33
3.1 Current Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.1.1 Range Based Estimates of Variance Components . . . . . . . . . . 34
3.1.2 ANOVA-Based Estimates of Variance Components . . . . . . . . 38
3.2 Accounting for Within-Appraiser Variation . . . . . . . . . . . . . . . . . 39
3.2.1 Estimating Bounds . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.2.2 Estimating Trivial Bounds . . . . . . . . . . . . . . . . . . . . . . 43
3.3 Correcting the Estimate for PV . . . . . . . . . . . . . . . . . . . . . . . 43
3.4 On Using Multiple Equipment . . . . . . . . . . . . . . . . . . . . . . . . 44
3.5 MSA for Destructive Testing . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.5.1 An In-Depth Look at the Process . . . . . . . . . . . . . . . . . . 49
3.6 Comparing Measurement System Acceptability Criteria . . . . . . . . . . 52
4 The Simulation Process 54
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.2 MSA and Within-Appraiser Variation . . . . . . . . . . . . . . . . . . . . 56
4.3 MSA Using Multiple Devices . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.4 MSA for Destructive Testing . . . . . . . . . . . . . . . . . . . . . . . . . 65
5 Analyzing the Results 69
5.1 MSA and Within-Appraiser Variation . . . . . . . . . . . . . . . . . . . . 69
5.1.1 Range-Based Estimation . . . . . . . . . . . . . . . . . . . . . . . 70
5.1.2 ANOVA-Based Estimation . . . . . . . . . . . . . . . . . . . . . . 73
5.1.3 Analyzing the Results . . . . . . . . . . . . . . . . . . . . . . . . 75
5.2 MSA Using Multiple Devices . . . . . . . . . . . . . . . . . . . . . . . . . 85
5.2.1 Analyzing the Results . . . . . . . . . . . . . . . . . . . . . . . . 86
5.3 Comparing Measurement System Acceptability Criteria . . . . . . . . . . 87
5.4 MSA for Destructive Testing . . . . . . . . . . . . . . . . . . . . . . . . . 90
5.4.1 Analyzing the Results . . . . . . . . . . . . . . . . . . . . . . . . 92
5.4.2 Adaptation for Chemical and Process Industries . . . . . . . . . . 99
6 Research Contributions and Summary 104
6.1 Research Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
6.2 Future Research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
A Abbreviations 109
B GUI Sample Screens 112
C Bounds Derivation 115
D Bounds Data 118
E Analysis of LBa and UBe 120
F Effectiveness of new PV estimate 121
G Results of MSA with multiple devices 123
H Verification of MSAD techniques 126
I MSAD Output 128
J MSAD Analysis 133
List of Figures
1.1 Amplification of true process variation . . . . . . . . . . . . . . . . . . . 4
1.2 Components of observed variation . . . . . . . . . . . . . . . . . . . . . . 7
3.1 Within-appraiser variation . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.1 Flow chart of the simulation process . . . . . . . . . . . . . . . . . . . . 56
5.1 Trivial bounds on EV and WIAV . . . . . . . . . . . . . . . . . . . . . . 77
5.2 Comparison of AV estimates with AV . . . . . . . . . . . . . . . . . . . . 79
5.3 Distribution of the two AV estimates over 1000 runs . . . . . . . . . . . . 80
5.4 Number of null estimates per 50 reps. of bounds Vs. f . . . . . . . . . . . 82
5.5 Estimated bounds on EV and WIAV . . . . . . . . . . . . . . . . . . . . 83
5.6 Trivial bounds on EV and WIAV . . . . . . . . . . . . . . . . . . . . . . 84
5.7 Relative performance of estimated bounds under varying conditions . . . 85
5.8 (a) Observed Vs. Recommended DR for PTR=10% and (b) Observed Vs.
Recommended PTR for DR=5 for various process capabilities . . . . . . 89
5.9 Number of null estimates for σ̂_e Vs UV and LV*EV . . . . . . . . . . . . 98
5.10 Number of null estimates for σ̂_eNew Vs UV and EV . . . . . . . . . . . . 98
5.11 Number of null estimates for σ̂_a Vs UV, EV, and AV . . . . . . . . . . . 99
B.1 Top-level interface allowing the user to choose application . . . . . . . . . 113
B.2 MSA with within-appraiser variation . . . . . . . . . . . . . . . . . . . . 113
B.3 MSA when testing is destructive . . . . . . . . . . . . . . . . . . . . . . . 114
List of Tables
2.1 Capability Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.1 Analysis of variance table . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.2 ANOVA for multiple equipment scenario . . . . . . . . . . . . . . . . . . 46
3.3 Estimates of variance components . . . . . . . . . . . . . . . . . . . . . . 46
3.4 Expected Mean Squares for Stage 1 . . . . . . . . . . . . . . . . . . . . 47
3.5 Expected Mean Squares for Stage 2 . . . . . . . . . . . . . . . . . . . . 48
3.6 Modified Expected Mean Squares for Stage 1 . . . . . . . . . . . . . . . 50
5.1 Results of hypothesis tests comparing ANOVA-based and range-based es-
timates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
5.2 Results of hypothesis tests testing effectiveness of VC estimates . . . . . 76
5.3 Pairwise comparisons of AV estimates and true AV . . . . . . . . . . . . 79
5.4 Factor levels for testing the effectiveness of within-appraiser variation . . 81
5.5 Relative performance of capability metrics . . . . . . . . . . . . . . . . . 88
5.6 Summarized results of MSAD hypothesis tests . . . . . . . . . . . . . . . 97
5.7 Null estimates for σ_e and σ_a . . . . . . . . . . . . . . . . . . . . . . . . . 97
D.1 Count of null estimates per 50 replications . . . . . . . . . . . . . . . . . 118
D.2 True values and bound estimates . . . . . . . . . . . . . . . . . . . . . . 119
F.1 Design Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
F.2 H0: No difference between the two estimates of σ_p . . . . . . . . . . . . . 121
F.3 Results of tests for effectiveness of σ̂_pOld (left) and σ̂_pNew (right) . . . . . 122
G.1 Design Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
G.2 Results of tests for effectiveness of σ̂_p (left) and σ̂_a (right) . . . . . . . . . 124
G.3 Results of tests for effectiveness of σ̂_e (left) and σ̂_pa (right) . . . . . . . . 124
G.4 Results of tests for effectiveness of σ̂_pe (left) and σ̂_ae (right) . . . . . . . . 125
H.1 Measurement data for Stage 1 . . . . . . . . . . . . . . . . . . . . . . . . 126
H.2 Measurement data for Stage 2 . . . . . . . . . . . . . . . . . . . . . . . . 127
H.3 Comparison of calculated with published estimates . . . . . . . . . . . . 127
I.1 5 factor CCD design matrix . . . . . . . . . . . . . . . . . . . . . . . . . 129
I.2 H0: The two approaches produce equal estimates (α = 0.01) . . . . . . . 130
I.3 H0: Both estimates of EV are good estimators of true EV (α = 0.05) . . 131
I.4 H0: Trimmed means of both estimates are good estimators of true EV (α = 0.05) . . . . . . . . . . . 132
Chapter 1
Introduction
Chapter Overview
The purpose of this chapter is to familiarize the reader with the need and importance of
having a precise and accurate measurement system in an industrial environment. The
current methodology (also referred to as the traditional method) for measurement system
analysis (MSA) is described briefly. Some problems with the current methodology are
discussed. The objectives of this research, which address these issues, are given.
A number of symbols and abbreviations have been used throughout this document.
A brief explanation of all these can be found in Appendix A.
1.1 Measurement Data
The success of any organization depends on its ability to produce consistently on target.
For the discrete part manufacturing industry, the target may be defined in terms of physical dimensions of the part produced such as length, radius, curvature or surface finish. For chemical and process industries, it may be defined in terms of physical properties such as moisture content, viscosity or chemical composition. In either case, the target is
the voice of customer translated into technical requirements and, to ensure success, the
capability to produce consistently on target should be evaluated.
To evaluate this, a typical approach used in the industry is to draw a random sample
of the product, and based on the measurements made on that sample, draw inferences
about the process such as its capability (e.g., Cp, Cpk) and its state of statistical control
(using control charts). These inferences help a company make critical decisions, such
as|whether or not the process needs adjustment; if yes then how, when and where are
the adjustments to be made; and if not, then how can the process capability be further
improved. Sometimes the measurement data are used to learn more about the process.
For example, designed experiments may use measurement data to study which factors
affect the characteristic of interest and what combination of these factors will allow us to
produce on target with minimum variation. A team of reliability engineers, on the other
hand, may use data to predict the probability that a given product-type will perform its
intended function satisfactorily for a certain period of time.
The above discussion reveals that critical decisions that affect customer satisfaction, which in turn affects the financial future of a company, are made based on measurement data. If the data cannot be trusted, the information drawn from them will be meaningless and the decisions based on that information useless at best. Such data may result in a false understanding of the system, unnecessary tampering with the process, or ignoring serious problems that need to be fixed. For this approach to work, the data must be trustworthy, and the precision and accuracy of the measurement system should be satisfactory and hence quantifiable. To make the measurement data more dependable, it is important to understand the effect of the measurement system in greater detail.
1.2 Measurement System
A measurement system can be viewed as a production system where the output is measurement data instead of parts. The measured value differs from the true value of the property being measured by an amount known as measurement bias or measurement error. This bias or error, however, is not constant. The variation in measurement error is known as measurement variation, which depends on the measuring device or equipment being used, the operators or appraisers using the equipment, the procedures used, and environmental and other conditions that may affect the measurement process. The process of understanding, estimating, analyzing and controlling the measurement effect is known as measurement system analysis (MSA), or a gauge repeatability and reproducibility (gauge
R&R) study.
Consider a random sample of ten parts drawn from a production process. The variation in the true value of the dimension of interest of these parts is known as part-to-part variation, part variation, or process variation (PV). To quantify this variation, these parts go through a measurement process which adds its own variation, known as measurement variation (MV), as mentioned above. The observed variation, which acts as an input to decision making, is essentially a combination of PV and MV. Figure 1.1 illustrates how the inherent process variation can be amplified by the measurement system. Measurement variation stems from the fact that neither the equipment used for measurement nor the appraisers using the equipment are perfect with respect to precision and accuracy. Ideally, we would like MV to be zero so that process variation can be estimated without error. But various factors, such as bias and inconsistency of operators, bias and inconsistency of measuring devices, environmental conditions, and
Figure 1.1: Amplification of true process variation. (The figure shows a flow: Production System → Measurement System → Observed Variation.)
inconsistent sample preparation and measurement processes can introduce measurement error. As noted by Barrentine (1991), measurement error manifests itself in the form of false conclusions about products with respect to specifications. For example, a product close to, but within, the specification limit may be classified as defective if the measurement error is large. Similarly, a product out of spec but close to the spec limit may be classified as non-defective due to measurement error. This increases both the producer's and the consumer's risk. If the data being collected are used for control charting purposes, measurement variation can mask the true process variation and make it difficult to identify special causes. In a designed experiment, measurement variation can dampen the significance of the effects being estimated. Hence it is essential to estimate measurement error, identify its sources, and control it within acceptable limits.
1.3 Measurement System Analysis
Measurement System Analysis (MSA) deals with identifying, estimating, analyzing and
controlling various components of measurement error. The Automotive Industry Action Group (AIAG), a group led by Ford, General Motors and Chrysler, has published a reference manual (AIAG; 1996) for MSA that has become a standard for MSA implementation across the manufacturing industry. Based on such a study, if the performance of a measurement system is found to be unsatisfactory, a company may allocate critical resources to identify and fix the problem. For example, if the analysis indicates that high measurement variation is primarily due to the equipment as opposed to the appraisers, then the company may choose to invest in equipment calibration or the purchase of new equipment rather than training the appraisers. Hence, it is essential not only that the overall measurement error be estimated accurately, but also that it be allocated accurately among its components: equipment and appraiser. These estimates are based on statistical properties of multiple measurements obtained from a measurement system operating under stable conditions (AIAG; 1996). The two primary techniques used to estimate the components of measurement variation are discussed in the following subsections.
1.3.1 MSA Techniques
Analysis of Variance (ANOVA)
The study is performed in the form of a designed experiment based on a two-way random effects model. Appraiser and part are treated as random effects. Expected mean squares and observed mean squares are used to obtain point estimates of the desired components. The primary advantages of using ANOVA are that confidence intervals can be calculated for these components of variance and that the interaction component can be estimated.
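As a sketch of how expected mean squares translate into point estimates, the helper below applies the standard method-of-moments solutions for the two-way random effects model (this is an illustrative assumption of the usual equating-of-mean-squares step, not a reproduction of any particular manual; negative solutions are truncated to zero, a common convention).

```python
def grr_anova_components(ms_part, ms_appraiser, ms_inter, ms_error,
                         n_parts, n_appraisers, n_reps):
    """Method-of-moments variance-component estimates for the two-way
    random effects model used in gauge R&R studies. Negative solutions
    are truncated to zero."""
    var_repeat = ms_error                                  # replication error
    var_inter = max(0.0, (ms_inter - ms_error) / n_reps)   # part-by-appraiser
    var_appr = max(0.0, (ms_appraiser - ms_inter) / (n_parts * n_reps))
    var_part = max(0.0, (ms_part - ms_inter) / (n_appraisers * n_reps))
    return {"part": var_part, "appraiser": var_appr,
            "interaction": var_inter, "repeatability": var_repeat}
```

For example, with observed mean squares of 29, 65, 5 and 1 from a study with 10 parts, 3 appraisers and 2 replicates, these formulas recover part, appraiser, interaction and repeatability variance estimates of 4, 3, 2 and 1, respectively.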
Range-Based Estimation
This technique uses the average range, adjusted with an appropriate factor (d2*), to obtain an unbiased estimate of the standard deviation. Despite the advantages of ANOVA, this method is still widely used in industry, primarily due to its simplicity. Data are collected in a spreadsheet format and a series of simple calculations is required to get the desired
estimates. This technique is very effective if the part-by-appraiser interaction component is believed to be small.
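The core of the range method can be sketched in a few lines. Here the widely tabulated d2 constants stand in for d2* (the small-sample variant the text refers to; the two agree closely when many subgroup ranges are averaged), so this is an approximation for illustration only.

```python
import random

# Standard d2 bias-correction constants by subgroup size (d2* differs
# slightly when only a few subgroup ranges are averaged).
D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326}

def range_based_sd(subgroups):
    """Estimate sigma as (average subgroup range) / d2."""
    d2 = D2[len(subgroups[0])]
    avg_range = sum(max(g) - min(g) for g in subgroups) / len(subgroups)
    return avg_range / d2

# Sanity check: many pairs drawn from a normal with sigma = 2.0 should
# yield an estimate close to 2.0.
rng = random.Random(7)
pairs = [[rng.gauss(0.0, 2.0), rng.gauss(0.0, 2.0)] for _ in range(5000)]
sigma_hat = range_based_sd(pairs)
```

The appeal in a spreadsheet setting is clear: each subgroup contributes only a max, a min, and one division at the end.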
1.3.2 Components of Variation
Figure 1.2 shows how the current model breaks down the observed variation. In order to
understand the problems with this model, enumerated in the next section, it is important
to understand what these components represent and how they are calculated.
Part Variation (PV)
Part variation is essentially part-to-part variation and represents the inherent variability in the production system. It manifests itself in the form of variation in the true value of the dimension of interest for different parts. Since the true dimension of a part is unknown and unknowable, the average of repeated measurements on a part, averaged over all appraisers, is treated as an estimate of the true value of the part. The range of these estimates is divided by the appropriate d2* to obtain an unbiased estimate of the standard deviation. This standard deviation is used as an estimator of PV. The deviation of the part dimension from the process mean is known as the part effect. The part effect is assumed to follow a normal distribution.
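A minimal sketch of this PV calculation follows (the data layout is a hypothetical convention for illustration: data[i][j] holds the repeated measurements on part i by appraiser j, and the caller supplies the d2* value for the number of parts).

```python
def pv_estimate(data, d2_star_parts):
    """Traditional range-based PV estimate: range the per-part grand
    averages (over all appraisers and replicates) and scale by the d2*
    constant for the number of parts."""
    part_means = [
        sum(sum(cell) for cell in row) / sum(len(cell) for cell in row)
        for row in data
    ]
    return (max(part_means) - min(part_means)) / d2_star_parts
```

For two parts whose per-part grand averages are 1.0 and 4.0, the estimate is simply the range 3.0 divided by the supplied d2*.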
Equipment Variation (EV)
Equipment variation exists due to the inability of the equipment to repeat measurements with perfect precision. It is the variation in multiple measurements taken by one appraiser on the same part and is also known as repeatability. In a designed experiment sense, it is essentially the replication error. To estimate EV, the range of repeated measurements on
Figure 1.2: Components of observed variation. (The figure shows observed variation splitting into process variation, i.e. part-to-part variation, and measurement variation; measurement variation in turn splits into equipment variation, i.e. within-equipment variation, and appraiser variation, i.e. appraiser-to-appraiser variation.)
a part, averaged over all parts and all appraisers, is divided by an appropriate d2* value to get an unbiased estimate of the standard deviation. EV, or replication error, is assumed to be normally distributed.
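The EV calculation can be sketched the same way (using the same hypothetical data[i][j] layout as above is not assumed here; the nested-list convention and the caller-supplied d2* for the number of replicates are illustrative choices).

```python
def ev_estimate(data, d2_star_reps):
    """Traditional range-based EV estimate: average the within-cell
    ranges of the repeated measurements over all parts and appraisers,
    then scale by the d2* constant for the number of replicates.
    data[i][j] holds the repeated measurements on part i by appraiser j."""
    ranges = [max(cell) - min(cell) for row in data for cell in row]
    return sum(ranges) / len(ranges) / d2_star_reps
```

Note that every repeated reading, whatever its cause of variation, lands in these within-cell ranges; that observation is exactly the within-appraiser issue raised later in this chapter.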
Appraiser Variation (AV)
Appraiser variation is the variation caused by using more than one appraiser (Dolezal et al.; 1998) in the measurement process and is also known as reproducibility or appraiser-to-appraiser variation. The first step in measuring AV is taking the range of appraiser averages and using an appropriate d2* value to get an unbiased estimate of the standard deviation. Some authors (see Vardeman and VanValkenburg (1999)) use this standard deviation itself to estimate AV, which is not correct. AIAG (1996) gives a correction by adjusting this estimate for a fraction of the equipment variation. AV adds an appraiser bias to the true value of the part. This bias varies by appraiser and is assumed to follow a normal distribution.
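The adjustment can be sketched as follows. This is an illustrative reading of the AIAG-style correction under stated assumptions: the squared EV contribution that leaks into the appraiser averages, EV²/(n·r) for n parts and r replicates, is subtracted from the squared raw estimate, with the usual truncation to zero when the correction dominates.

```python
import math

def av_estimate(appraiser_means, ev, n_parts, n_reps, d2_star):
    """AV estimate: range of appraiser averages scaled by d2*, then
    corrected for the fraction of EV that leaks into those averages.
    Truncated to zero if the correction exceeds the raw estimate."""
    raw = (max(appraiser_means) - min(appraiser_means)) / d2_star
    adjusted_sq = raw ** 2 - ev ** 2 / (n_parts * n_reps)
    return math.sqrt(adjusted_sq) if adjusted_sq > 0.0 else 0.0
```

Using the raw range-based value alone, as some authors do, skips the subtraction and so overstates AV whenever EV is nonzero.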
1.4 Problems with the Current Model
Some form of MSA has existed, at least for the discrete part manufacturing industry, for decades. There is still some ambiguity in the terminology and some disagreement among authors on what the terminology represents. There is also some disparity between definitions of terms and their mathematical expressions, along with the possibility that some components of measurement variation have not yet been accounted for. The applicability of these techniques to continuous process industries presents a whole new challenge.
As mentioned above, AIAG (1996) adjusts the "raw" estimate of AV for a fraction of the equipment variation. The adjusted quantity, also known as reproducibility, still does not represent true appraiser-to-appraiser variation. Nevertheless, the two terms are used interchangeably. It is easy to demonstrate a disparity between the definition of reproducibility and its formula (Vardeman and VanValkenburg; 1999).
Under the current model, replication error is entirely attributed to the equipment. As will be shown shortly, the replication error may have another component to it besides equipment variation. This makes the definition of repeatability a little ambiguous: does it now represent equipment variation or replication error, as they are no longer the same? Hence, there may be some disagreement about the definitions of the terms repeatability and reproducibility. As Burdick et al. (2003) noted, such labels do not add value to answering the questions of interest. Hence, we will refrain from using these terms throughout this document. The following four subsections summarize the problems in the current state of MSA that this research will focus on.
1.4.1 Within-Appraiser Variation
Within-appraiser variation is the variation due to the inability of an appraiser to repeat measurements with perfect precision. A closer look at EV above reveals an underlying assumption: that if identically performed measurements on a part are not exactly the same, then the variation must be due to the imprecision of the equipment. The fact that appraiser imprecision, if it exists, would also manifest itself in the same way is completely ignored. This is equivalent to assuming perfect precision within each appraiser. In practice, however, it is possible that variation in measurements on the same part (using the same equipment and appraiser) may be partly due to appraiser imprecision, or within-appraiser variation. It is easy to see that ignoring within-appraiser variation may produce inflated estimates of EV. Hence, it is possible that a company decides to invest in re-calibrating or buying new equipment based on high EV estimates when the real problem is appraiser imprecision, and training appraisers may be a more effective strategy. The traditional model (AIAG; 1996) does not account for appraiser imprecision or within-appraiser variation.
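The inflation is easy to demonstrate by simulation (a hypothetical sketch with arbitrary settings: equipment noise and appraiser imprecision are drawn as independent zero-mean normals, and the usual range-based EV estimate is computed from pairs of repeated readings).

```python
import random

def simulated_ev(sigma_equip, sigma_within_appr, n_cells=4000, seed=3):
    """Range-based EV estimate from pairs of repeated readings, where
    each reading carries both equipment noise and appraiser imprecision.
    Both act as replication error, so the estimate tracks their combined
    spread rather than the equipment spread alone."""
    rng = random.Random(seed)
    d2_pairs = 1.128  # d2 constant for ranges of two
    ranges = []
    for _ in range(n_cells):
        reps = [rng.gauss(0.0, sigma_equip) + rng.gauss(0.0, sigma_within_appr)
                for _ in range(2)]
        ranges.append(max(reps) - min(reps))
    return sum(ranges) / len(ranges) / d2_pairs

ev_hat = simulated_ev(sigma_equip=1.0, sigma_within_appr=1.0)
```

With equal equipment and appraiser spreads of 1.0, ev_hat settles near sqrt(2) ≈ 1.41, so the equipment would be blamed for roughly forty percent more variation than it actually contributes.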
1.4.2 Equipment-to-Equipment Variation
Typically, only one piece of equipment is used in an MSA study. EV is essentially the within-equipment variation, as indicated in Figure 1.2 above. This restricts the validity of the inferences drawn to that particular piece of equipment or measuring device. In practice, a measurement system may consist of multiple pieces of equipment, and a significant portion of the observed variation may be due to the fact that these are not consistent with each other. In other words, there may be a bias associated with each piece of equipment. A company may be interested in knowing the variation among equipment, and the current approach does not allow for that. Using multiple pieces of equipment in the study will allow us to estimate among-equipment or equipment-to-equipment variation and may produce more realistic estimates of the true process variation. It should be noted that each piece of equipment being used may have a different within-equipment variation. The model should explicitly account for that.
1.4.3 Adjusting the Estimate for Part Variation
The method for estimating PV was described in the previous section. It is easy to show that what this technique estimates is not just PV, but a sum of PV, a fraction of EV, and a fraction of the part-by-appraiser interaction. The current technique therefore overestimates PV. The magnitudes of EV and the interaction component, and the number of replications and appraisers used in the study, determine how significant this overestimation is.
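The size of the overestimation can be checked by simulation. The sketch below (all settings are arbitrary illustrations) generates many studies from a two-way random effects model and averages the traditional range-based PV estimate; under this model the per-part grand averages have variance σ_p² + σ_pa²/k + σ²/(kr), so the estimate converges to the square root of that quantity rather than to σ_p.

```python
import random

def avg_traditional_pv(sigma_p, sigma_a, sigma_pa, sigma_e, n_parts=10,
                       n_appr=3, n_reps=2, n_studies=2000, seed=5):
    """Average the traditional range-based PV estimate over many
    simulated gauge R&R studies drawn from a two-way random effects
    model with the given component standard deviations."""
    rng = random.Random(seed)
    d2_ten = 3.078  # d2 constant for the range of 10 values
    total = 0.0
    for _ in range(n_studies):
        appr = [rng.gauss(0.0, sigma_a) for _ in range(n_appr)]
        part_means = []
        for _ in range(n_parts):
            part = rng.gauss(0.0, sigma_p)
            readings = []
            for j in range(n_appr):
                inter = rng.gauss(0.0, sigma_pa)  # part-by-appraiser effect
                readings += [part + appr[j] + inter + rng.gauss(0.0, sigma_e)
                             for _ in range(n_reps)]
            part_means.append(sum(readings) / len(readings))
        total += (max(part_means) - min(part_means)) / d2_ten
    return total / n_studies

pv_hat = avg_traditional_pv(sigma_p=1.0, sigma_a=1.0, sigma_pa=1.0, sigma_e=1.0)
```

With all component standard deviations set to 1.0 and k = 3, r = 2, the average estimate lands near sqrt(1 + 1/3 + 1/6) ≈ 1.22, a clear overstatement of the true σ_p = 1.0.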
1.4.4 Applicability to Chemical and Process Industries
MSA in its current form uses statistical properties of multiple measurements on the same part to estimate the various components of measurement variation. In chemical and process industries, most tests are destructive in nature. For example, measuring the moisture content of a sample of a chemical compound requires a test that ends up destroying the sample. This makes it impossible to take multiple measurements on the same sample. Hence, the traditional approach to estimating components of variance cannot be used. There is a very high demand in this industry for a statistically sound approach that will identify and accurately estimate the various components of measurement variation.
1.5 Measurement System Acceptability Criteria
Once measurement variation and its variance components have been estimated, the goal is to reduce measurement variation to acceptable levels, if it is not there already. Hence, it is important to determine how much measurement variation should be considered acceptable and what criteria should be used to make that decision. A wide range of metrics can be found in the literature to evaluate measurement system capability: for example, the precision-to-tolerance ratio, percent total variation, percent process variation, intraclass correlation coefficient, discrimination ratio, number of distinct data categories (or classification ratio), and probable error. All these metrics come with certain recommended values that suggest the acceptability of the measurement system. The question that remains to be answered is whether these metrics, if used in the recommended manner, produce consistent outcomes with respect to the acceptability of the measurement system under study. In other words, is it possible that some of these metrics conclude that a measurement system is acceptable while others conclude otherwise? If so, then under what conditions does this discrepancy occur, and which metrics, if any, are relatively robust to variations in these conditions?
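For concreteness, a few of these metrics can be sketched from the variance components. The constants and exact forms vary across the literature, so the choices below are assumptions for illustration: a multiplier of 6 in the precision-to-tolerance ratio (5.15 is also common), ndc ≈ 1.41·(process sd)/(measurement sd), and a discrimination ratio of sqrt((1+ρ)/(1−ρ)) with ρ the intraclass correlation.

```python
import math

def capability_metrics(sigma_p, sigma_ms, tol_width, k=6.0):
    """Common measurement system acceptability metrics computed from
    the process sd, the measurement system sd, and the tolerance width."""
    sigma_total = math.hypot(sigma_p, sigma_ms)
    rho = sigma_p ** 2 / sigma_total ** 2        # intraclass correlation
    return {
        "PTR": k * sigma_ms / tol_width,         # precision-to-tolerance ratio
        "pct_total": sigma_ms / sigma_total,     # percent total variation
        "ndc": math.sqrt(2.0) * sigma_p / sigma_ms,  # distinct data categories
        "DR": math.sqrt((1.0 + rho) / (1.0 - rho)),  # discrimination ratio
    }

metrics = capability_metrics(sigma_p=1.0, sigma_ms=0.25, tol_width=8.0)
```

Comparing the outputs against their customary cutoffs (for instance, PTR below roughly 0.1 and ndc of at least 5) shows how a single measurement system can pass one criterion and fail another, which is exactly the inconsistency question raised above.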
1.6 Objectives
General Objective
Identify any components of variation ignored in the traditional MSA, improve upon the
existing estimates and expand the applicability of MSA to industries other than discrete
part manufacturing.
Speci¯c Objectives
1. Account for within-appraiser variation.
• Develop a mathematical model consistent with the concept of within-appraiser variation.
• Derive a lower bound on within-appraiser variation.
• Use the lower bound to adjust the EV estimate appropriately.
• Show that the estimates of other components of variance do not change as a result of this development.
• Use simulation to demonstrate the effectiveness of the bounds.
2. Enhance the current MSA approach so that inferences drawn will be applicable to all equipment in the measurement system.
• Develop guidelines for selecting and using multiple pieces of equipment and collecting data.
• Derive estimates for equipment-to-equipment variation.
• Appropriately adjust estimates of other components of variance that may have changed as a result of this development.
• Verify the estimates using simulation.
3. Derive a mathematically correct expression for PV and demonstrate its superior accuracy over the traditional estimate.
4. Evaluate various measurement system acceptability criteria.
• Conduct a simulated experiment by varying the sigma-capability of a process and draw conclusions about the relative merits and robustness of the metrics.
5. Develop a methodology similar to MSA for application in chemical and process industries.
• Develop guidelines for determining sample sizes, sample selection and data collection.
• Identify any sources of variation in addition to the conventional sources for the discrete part case.
• Develop a mathematical basis for estimating the contribution of each of these sources.
• Use simulation to verify the estimates.
Chapter 2
Literature Review
2.1 Nomenclature and Notation
Measurement system analysis (MSA), also known as gauge capability analysis or gauge repeatability and reproducibility analysis, is an effort to understand, identify, quantify and control the sources of measurement variation (Burdick et al.; 2003; Potter; 1990; Montgomery and Runger; 1994a; Dolezal et al.; 1998). There is a consensus among authors that variation in identically performed measurements on the same part is primarily due to measurement error. One of the primary objectives of MSA, though, is the isolation of the sources of variability in the measurement system (Burdick et al.; 2003). It is in this stage that a severe lack of standardization in nomenclature and notation becomes obvious (John; 1994). Authors disagree on everything from trivial matters, like the spelling of terms (gauge or gage) and the kind of alphabet used to represent the underlying model (Greek or Roman, capital or small), to more serious issues, like the meaning of repeatability and reproducibility and what they represent.
In order to understand these di®erences, it is important to introduce some notation
and de¯ne the basic underlying model. The notation used here will be consistent with
AIAG (1996) as it is the most widely used reference in the industry for MSA implemen-
tation. MSA is typically conducted in the form of a two-factor experiment based on
14
random-effects model (Vardeman and VanValkenburg, 1999; Dolezal et al., 1998). This means that a certain number of parts are randomly selected and are measured multiple times by randomly selected appraisers (or operators). The underlying model is given by

$$y_{ijm} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \epsilon_{ijm} \quad (2.1)$$

where $i = 1, \ldots, n$; $j = 1, \ldots, k$; $m = 1, \ldots, r$.

The subscripts $i$, $j$, and $m$ represent part, appraiser, and measurement (or replication), respectively. The term $y_{ijm}$ represents the $m$th measurement by the $j$th appraiser on the $i$th part, and $\mu$ is an unknown constant. The terms $\alpha_i$, the part effect; $\beta_j$, the appraiser effect; $(\alpha\beta)_{ij}$, the part-by-appraiser interaction; and $\epsilon_{ijm}$, the replication error, are independently and normally distributed with mean zero and variances $\sigma^2_p$, $\sigma^2_a$, $\sigma^2_{pa}$, and $\sigma^2$, respectively.
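The structure of model (2.1) can be made concrete with a short simulation. The following sketch is illustrative only (it is not from the original study design); the variance values and array layout are assumptions chosen for demonstration.

```python
import numpy as np

# Illustrative simulation of the two-factor random-effects model in Eq. (2.1).
# All variance values below are hypothetical.
rng = np.random.default_rng(0)

n, k, r = 10, 3, 2            # parts, appraisers, replicates
mu = 25.0                     # the unknown constant
sigma_p, sigma_a, sigma_pa, sigma_e = 2.0, 0.5, 0.3, 0.4

alpha = rng.normal(0.0, sigma_p, size=n)         # part effects
beta = rng.normal(0.0, sigma_a, size=k)          # appraiser effects
ab = rng.normal(0.0, sigma_pa, size=(n, k))      # part-by-appraiser interaction
eps = rng.normal(0.0, sigma_e, size=(n, k, r))   # replication error

# y[i, j, m]: the m-th measurement by appraiser j on part i
y = mu + alpha[:, None, None] + beta[None, :, None] + ab[:, :, None] + eps
```

Each random effect is drawn once at its own level (per part, per appraiser, per cell) and then broadcast across the replicates, which mirrors how the variance components enter the model.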
There is general agreement among authors that the variance of identically performed measurements on the same part, or the replication error variance ($\sigma^2$), is the repeatability of the measurement system. Reproducibility, however, has been defined in multiple ways. AIAG (1996) defines reproducibility as variation among appraisers, or simply appraiser variation (AV), and uses $\sigma^2_a$ as an estimate of AV or reproducibility. Mitchell et al. (1997) also use $\sigma^2_a$ to estimate reproducibility. Wheeler (1992) criticizes reproducibility because it tells only that the appraiser-to-appraiser differences are significant but gives no information about which appraiser(s) is the problem. He states that reproducibility is useless when applied to a measurement process used in-house. Montgomery and Runger (1994a) prefer to use $\sigma^2_a + \sigma^2_{pa}$ to estimate reproducibility, but they do not specifically call it appraiser-to-appraiser variation. They reason that since part and appraiser are the only two factors in the study and the interaction effect is essentially a measurement error, it should be included in the reproducibility. Vardeman and VanValkenburg
(1999) also use $\sigma^2_a + \sigma^2_{pa}$ to estimate reproducibility and specifically call it variation among operators. Their reasoning is a little more mathematical. They suggest that for any one part ($i = 1$), the model (2.1) reduces to $y_{1jm} = \mu + \alpha_1 + \beta_j + (\alpha\beta)_{1j} + \epsilon_{1jm}$. If $\gamma_j = \beta_j + (\alpha\beta)_{1j}$, then the variance of $\gamma_j$, $\sigma^2_a + \sigma^2_{pa}$, clearly represents variation among appraisers. Unfortunately, the same reasoning can be used to include $\sigma^2_{pa}$ in part or process variation (PV). For instance, the model in Eq. (2.1) reduces to $y_{i1m} = \mu + \alpha_i + \beta_1 + (\alpha\beta)_{i1} + \epsilon_{i1m}$ for any one appraiser ($j = 1$). Now, if $\gamma_i = \alpha_i + (\alpha\beta)_{i1}$, then the variance of $\gamma_i$, $\sigma^2_p + \sigma^2_{pa}$, clearly represents part-to-part variation. Vardeman and VanValkenburg (1999) recognize this anomaly and use it to dispute $\sigma^2_p$ as an estimate of PV. They argue that the interaction variance should be a part of both PV and AV.
All this discrepancy and confusion seems to originate from the obvious compulsion to label all the variance components. The term $\sigma^2_{pa}$ represents the variance of the interaction effect and should be recognized as such. Any attempt to arbitrarily include it with AV or PV will be misleading. The terms $\sigma^2_p$ and $\sigma^2_a$ are the only true estimates of part-to-part (PV) and appraiser-to-appraiser (AV) variation, respectively. It is acceptable, however, to label $\sigma^2_a + \sigma^2_{pa}$ as reproducibility as long as it is recognized that reproducibility and AV will not be the same in that case. As Burdick et al. (2003) have pointed out, such labels do not add value to our understanding of the system, and hence we will purposefully refrain from using them.
2.2 Planning the Study
Literature addressing issues related to measurement error can be traced back to the 1940s (Grubbs, 1948). Early research focused primarily on avoiding the potential loss due to measurement error. Eagle (1954) suggests tightening specifications for testing purposes to minimize the risk of committing a $\beta$-error (accepting a non-conforming part). This, however, increases the risk of committing an $\alpha$-error (rejecting conforming parts). Moreover, this is a reactive approach and does not help much in estimating and reducing measurement error.
Most techniques used today are proactive and concentrate on estimating and reducing the measurement error. Eagle (1954) states that determining measurement error requires repeated measures using one device and multiple operators, or multiple devices and one operator. The most widely used form of MSA today employs one device and multiple operators, though there is no reason why multiple devices and multiple operators cannot be used. Montgomery and Runger (1994b) give a mathematical model for such a case. Whenever an additional factor is added to the experiment, a decision must be made as to whether its levels will be selected randomly or not. If all factors are random, the underlying model is called a random-effects model; if all factors are fixed, it is called a fixed-effects model; and if some factors are random and others are fixed, it is called a mixed model. This becomes especially relevant when using the ANOVA technique described later in this section. It is important to note that if a factor is treated as random, the inferences about its effect are applicable to the entire population from which the sample was drawn. On the other hand, if a factor is treated as fixed, the inferences about its effect are restricted to the specific levels employed in the experiment. Hence,
the decision to treat a factor as fixed or random must be made judiciously, depending on the desired outcome. Hahn and Nelson (1970), for example, suggest using a mixed model with a single appraiser, randomly selected parts, and fixed measuring devices. Dolezal et al. (1998) show the analysis of a mixed-model case with fixed operators. It is interesting to see that whereas the confidence intervals (CI) of random effects are based on the chi-square distribution, the CIs of fixed effects in a mixed-model situation are based on a non-central chi-square distribution with a specific non-centrality parameter. Montgomery and Runger (1994a) recommend that even if the number of operators is very small and could be treated as a fixed factor, it should be treated as a random draw from the potential population of operators.
Montgomery and Runger (1994a) recommend using a larger number of parts with fewer measurements on each. They list multiple advantages of doing this: (i) a gauge may be more stable near the center of the operating range than towards the extremes, and using many parts increases the chances of detecting any such non-linearity; (ii) if the measurement variance depends on the mean measurement, this trend can be detected; and (iii) it is difficult to get complete replication of a measurement, and hence too many measurements on a part increase the chance of introducing other sources of variability. Wheeler (1991) recommends only two replications for the same reason. Montgomery and Runger (1994a) also caution against placing too much emphasis on keeping conditions "identical" during replications. Since such care is usually not taken during routine measurements, this may cause underestimation of measurement error. They recommend that if linearity is an issue, then parts must be chosen over the entire operating range of the instrument, even beyond the specifications. In such a case, using a random sample may not be the best choice.
Depending on the situation, there may be many factors that affect the measurement process. Montgomery and Runger (1994a) recommend using 25% or less of the total resources in the initial study for identifying important factors through fractional factorial or screening designs.
2.3 Analyzing the Data
The two most commonly used techniques to estimate measurement variance components,
as mentioned above, are the range-based method and ANOVA. These are discussed in
greater detail in the following sections.
2.3.1 Range-Based Estimation
Patnaik (1949) notes that the distribution of the range in normal samples is independent of the population mean but depends on the sample size and the population standard deviation. He gives the mathematical basis for the factor $d_2$, which is based on sample size and is used to estimate the standard deviation from the sample range. AIAG (1996) suggests that the value of $d_2$ should also depend on the number of samples used. They introduce a new factor, $d_2^*$, that varies with both the number of samples and the sample size, and that converges to $d_2$ as the number of samples becomes large (fifteen or more). Wheeler (1992) considers $d_2^*$ an unnecessary complication, as the uncertainty in the range will usually be greater than the difference between $d_2$ and $d_2^*$. It has, however, become common practice in MSA to use $d_2^*$, and we will continue with that practice. Vardeman and VanValkenburg (1999) provide a statistical basis for the range-based approach.

Most authors in the recent literature discourage the use of the range-based approach
(Montgomery and Runger, 1994b; Burdick et al., 2003; Vardeman and VanValkenburg, 1999; John, 1994). The main criticisms of this approach are that it does not allow for the estimation of the interaction variance component, does not allow for the construction of confidence intervals on the variance components, and gives a downwardly biased estimate of reproducibility. Patnaik (1949) himself notes that the range furnishes a less efficient estimate of the standard deviation. John (1994) uses an example from Wheeler (1992) to show that the estimates obtained using this approach differ significantly from the ANOVA-based estimates, and indicates that using ranges is inappropriate for the semiconductor industry. With modern-day computing power, most practitioners are moving away from this approach toward the ANOVA-based approach. However, many companies still use the range-based method, and hence it cannot be ignored.
There is a general consensus among authors on the way the repeatability (or replication error) standard deviation is calculated. The range of the multiple measurements taken by an appraiser on a given part is calculated. This range is averaged over all parts and appraisers, then divided by the appropriate $d_2^*$ to obtain an unbiased estimate of the standard deviation that represents repeatability. For calculating reproducibility, the multiple measurements taken by each appraiser are averaged over all parts. The range of these appraiser averages is divided by $d_2^*$ to estimate the reproducibility standard deviation. It is easy to show that the corresponding variance estimate represents $\sigma^2_a + \frac{1}{n}\sigma^2_{pa} + \frac{1}{nr}\sigma^2$. Vardeman and VanValkenburg (1999) note that some authors, such as Montgomery (1996) and Kolarik (1995), use this to estimate AV. This will obviously result in an overestimate of AV. Vardeman and VanValkenburg (1999) criticize AIAG for adjusting this estimate for the fraction of $\sigma^2$ but not for the fraction of the interaction variance, $\sigma^2_{pa}$. AIAG (1996), however, clearly indicates that the range-based approach should be used only if the additive model is deemed appropriate,
i.e., the interaction effect can be neglected. Montgomery and Runger (1994a) introduce an alternative way of calculating reproducibility. The average of the replicate measurements by an appraiser on each part is calculated, and the range of these averages is obtained for each part. The average of these ranges is used to estimate reproducibility. The variance calculated in this manner represents $\sigma^2_a + \sigma^2_{pa} + \frac{1}{r}\sigma^2$. Vardeman and VanValkenburg (1999) also use this estimate but emphasize that it must be adjusted for a fraction of $\sigma^2$.
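The range-based calculations described above can be sketched as follows. This is a minimal illustration on simulated data, using the classical $d_2$ constants ($d_2(2) = 1.128$, $d_2(3) = 1.693$) rather than the $d_2^*$ values AIAG tabulates; the data layout and variance values are assumptions.

```python
import numpy as np

# d2 constants for subgroup sizes 2 and 3 (classical values; AIAG's d2* would
# adjust these slightly for the number of ranges averaged).
D2 = {2: 1.128, 3: 1.693}

def range_based_estimates(y):
    """Range-based repeatability and reproducibility for y[i, j, m]
    (part, appraiser, replicate), as described in the text."""
    n, k, r = y.shape
    # Repeatability: average the per-(part, appraiser) range of r replicates.
    rep_ranges = y.max(axis=2) - y.min(axis=2)      # shape (n, k)
    sigma_repeat = rep_ranges.mean() / D2[r]
    # Reproducibility: range of the k appraiser averages. Before adjustment,
    # the corresponding variance estimates sigma_a^2 + sigma_pa^2/n
    # + sigma^2/(n*r), as noted in the text.
    appr_means = y.mean(axis=(0, 2))                # shape (k,)
    sigma_reprod = (appr_means.max() - appr_means.min()) / D2[k]
    return sigma_repeat, sigma_reprod

# Simulated balanced study (hypothetical variances, no interaction term).
rng = np.random.default_rng(1)
n, k, r = 10, 3, 2
y = (rng.normal(0, 2.0, (n, 1, 1))     # part effects
     + rng.normal(0, 0.5, (1, k, 1))   # appraiser effects
     + rng.normal(0, 0.4, (n, k, r)))  # replication error (sigma = 0.4)
ev, av = range_based_estimates(y)
```

With this layout the repeatability estimate should land near the true replication-error standard deviation of 0.4, while the unadjusted reproducibility figure illustrates why the fraction of $\sigma^2$ must be subtracted out.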
2.3.2 Analysis of Variance (ANOVA)
Analysis of variance is a technique used to partition the total sum of squares into a portion due to regression and a portion due to error (Walpole and Myers, 1985). The sum of squares due to regression is further partitioned among the various factors and interactions. The error mean square is pure replication error, or repeatability. Other variance components (VC) are not directly readable from the ANOVA table; their values must be calculated using expected mean square (EMS) values. Most statistical software can provide EMS values for all factors and interactions based on the assumptions of the underlying model. For guidelines on deriving EMS, the reader is referred to Kuehl (2000) or Montgomery (2001). Even though normality of effects is a basic assumption of ANOVA, Montgomery and Runger (1994b) state that normality is not essential for using EMS to obtain VC estimates. However, they note that if the assumption is met, it is easy to construct confidence intervals on the VCs.

Most authors agree that the estimates based on ANOVA are more accurate and allow for the construction of confidence intervals and the estimation of interaction effects, as stated above. One disadvantage of using ANOVA, however, is that the estimates of VCs may turn out to be negative (Montgomery and Runger, 1994b). Kuehl (2000)
suggests various remedies for this problem. One remedy is to assume the VC to be zero, but that may produce biased estimates of the other VCs, as noted by Montgomery and Runger (1994b). They suggest using a modified ANOVA, which is simply redoing the ANOVA with the insignificant term (usually the interaction term) removed. This allocates the degrees of freedom for that term to error. Another solution is to use other methods of estimating VCs (Kuehl, 2000; Montgomery and Runger, 1994b).
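The EMS-based procedure described above can be sketched for the balanced two-factor random-effects model. This is an illustrative method-of-moments implementation under assumptions (balanced data stored as `y[i, j, m]`), not the dissertation's own code; note that, as discussed, the solved estimates can come out negative.

```python
import numpy as np

def anova_variance_components(y):
    """Solve the EMS relations of the balanced two-factor random model for
    the variance components, given y[i, j, m] (part, appraiser, replicate)."""
    n, k, r = y.shape
    grand = y.mean()
    part_means = y.mean(axis=(1, 2))          # y_i..
    appr_means = y.mean(axis=(0, 2))          # y_.j.
    cell_means = y.mean(axis=2)               # y_ij.

    ms_p = k * r * ((part_means - grand) ** 2).sum() / (n - 1)
    ms_a = n * r * ((appr_means - grand) ** 2).sum() / (k - 1)
    inter = cell_means - part_means[:, None] - appr_means[None, :] + grand
    ms_pa = r * (inter ** 2).sum() / ((n - 1) * (k - 1))
    ms_e = ((y - cell_means[:, :, None]) ** 2).sum() / (n * k * (r - 1))

    # EMS relations for the balanced random model:
    #   E[MS_E]  = s2
    #   E[MS_PA] = s2 + r*s2_pa
    #   E[MS_P]  = s2 + r*s2_pa + k*r*s2_p
    #   E[MS_A]  = s2 + r*s2_pa + n*r*s2_a
    s2 = ms_e
    s2_pa = (ms_pa - ms_e) / r
    s2_p = (ms_p - ms_pa) / (k * r)
    s2_a = (ms_a - ms_pa) / (n * r)
    return s2_p, s2_a, s2_pa, s2

# Check on simulated data with known components (sigma^2: 4, 1, 0.25, 0.16).
rng = np.random.default_rng(3)
n, k, r = 100, 10, 3
y = (rng.normal(0, 2.0, (n, 1, 1))     # part effects
     + rng.normal(0, 1.0, (1, k, 1))   # appraiser effects
     + rng.normal(0, 0.5, (n, k, 1))   # part-by-appraiser interaction
     + rng.normal(0, 0.4, (n, k, r)))  # replication error
s2_p, s2_a, s2_pa, s2 = anova_variance_components(y)
```

Running the check recovers the simulated components to within sampling error, with the appraiser component the least precise because it rests on only $k - 1$ degrees of freedom.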
2.3.3 Other Techniques
Wheeler (1992) strongly recommends using graphical techniques in place of the traditional analysis. He plots an $\bar{X}$-chart of the operator averages of three operators and suggests that an out-of-control condition indicates a significant operator difference. With control limits based on just three points, however, it may be inappropriate to place too much confidence in the outcome.
Montgomery and Runger (1994b) suggest methods such as maximum likelihood estimation (MLE) or MINQUE. MLEs maximize the likelihood function of the sample subject to the constraint that each VC be non-negative. MINQUE produces estimates that are best quadratic unbiased and are guaranteed to be non-negative. Both procedures are iterative in nature. These estimates, as illustrated by Montgomery and Runger (1994b), give a covariance matrix of all VCs. The variance of each VC, obtained from the diagonal elements of this matrix, along with the assumption of normality, allows the construction of confidence intervals using z-values. These intervals are easy to construct and are usually narrower than those obtained from ANOVA. A non-iterative version of MINQUE also exists, but it is not guaranteed to produce non-negative estimates (Montgomery and Runger, 1994b). Both MLE and MINQUE require specialized software and give estimates close to those obtained using modified ANOVA. Hence, ANOVA has become the technique of choice.
2.4 Some Problems With MSA
Recall that the objectives of this research are to address the issues of within-appraiser variation and among-equipment variation, provide a correct estimate for PV, and adapt MSA to address the needs of the chemical and process industries. This section reviews previous work in these areas.
2.4.1 Within-Appraiser Variation
No previous research has explicitly acknowledged the existence of within-appraiser variation or otherwise addressed the issue. Some relevant work will be discussed here.

Burdick et al. (2003) note that the variance of measurements may not remain constant in all cases. One reason given for this variance changing over time is "operator fatigue". Fatigue results in the appraiser's inability to keep the bias constant over time. This variation in appraiser bias is essentially within-appraiser variation. The time-dependence in the case of operator fatigue makes such variation easy to spot by plotting residuals against time. If, however, the variation is due to inadequate training or other human factors, there may not be a covariate such as time.
Montgomery and Runger (1994a) mention that an out-of-control condition on the R chart plotting ranges of measurements would indicate that the appraiser is having difficulty using the equipment. If this is indeed the case, it is possible to get an R chart that is in a state of statistical control but has very wide control limits. This would happen if the appraiser inconsistency, or the within-appraiser variation, is randomly scattered over all measurements. Wide control limits on an R chart would lead to high estimates of equipment variation, since the variance of measurements, or replication error, is typically attributed to the equipment.
To summarize the above discussion, some authors have shown that within-appraiser variation can be detected either through residual plots, if a covariate is present, or through control charts, if appraiser inconsistencies are rare and sporadic. If, however, the appraiser is regularly inconsistent in the use of measuring devices, these tools may not detect within-appraiser variation. In such a case, a plot of measurement ranges sorted by appraiser may be helpful. For example, if the average range of an appraiser shows a significant shift, high variation can be assumed within that appraiser. The absence of a shift, however, should not be confused with the absence of appraiser inconsistency, as it will only indicate that appraiser inconsistency does not vary significantly from appraiser to appraiser.
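The screening idea above, comparing each appraiser's average measurement range, can be sketched as follows. The data are simulated, and the third appraiser is given extra replication noise as a hypothetical example of an inconsistent appraiser.

```python
import numpy as np

# Per-appraiser average ranges as a screen for within-appraiser variation.
# A markedly larger average range for one appraiser suggests that appraiser
# is inconsistent. All values below are hypothetical.
rng = np.random.default_rng(2)
n, k, r = 15, 3, 3
noise_sd = np.array([0.3, 0.3, 1.2])   # third appraiser is inconsistent

y = (rng.normal(0, 2.0, (n, 1, 1))                         # part effects
     + rng.normal(0, noise_sd[None, :, None], (n, k, r)))  # replication error

# One average range per appraiser, averaged over parts.
avg_range = (y.max(axis=2) - y.min(axis=2)).mean(axis=0)
for j, ar in enumerate(avg_range):
    print(f"appraiser {j}: average range = {ar:.2f}")
```

In this simulated case the inconsistent appraiser's average range stands well clear of the others, which is exactly the kind of shift the plot described above is meant to reveal.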
2.4.2 Using Multiple Equipment
As noted earlier, if routine implementation of the measurement system involves multiple pieces of equipment, there may be a significant equipment-to-equipment variance component involved. Montgomery and Runger (1994b) propose a model that adds equipment as another factor to the experiment. However, there has been no effort to estimate variance components under this scenario. Despite its potential usefulness, this model is rarely implemented in practice.
2.4.3 Correcting the Estimate of PV
The estimate of PV provided by AIAG has not been challenged by any researcher. Yet there is a small but definite error in the formula, as will be shown in the next chapter.
2.4.4 MSA for Chemical and Process Industries
Within MSA, the research effort devoted to the chemical and process industries (CPI), where testing is destructive, is an order of magnitude less than that devoted to the discrete-part manufacturing case.

Some authors who have addressed the issue focus on minimizing the within-sample variation and treating measurements on different subsamples as different measurements on the same sample (Spiers, 1989; Ackermann, 1993).
Spiers (1989) uses an example, measuring the tensile strength of tin, to illustrate this. Thirty samples were cut from a single sheet of tin, and ten were randomly assigned to each of three appraisers. Each sample was then cut into three subsamples, and the tensile strength measurement on each subsample was treated as a repeat measurement on the sample. The differences among subsamples were considered negligible due to their adjacency.
Ackermann (1993) recognizes that choosing samples for such a study is the most critical step and that care should be taken to minimize lot-to-lot, within-lot, and within-sample variation. She further states that high material variation can mask the appraiser-to-appraiser variation. Even though this is true, if all material variation is minimized, the results of such a study will be valid only for that narrow range of values. Any non-linearity in the measurement system will never be discovered under such circumstances. For example, if the objective is to measure the moisture content of a chemical compound and all material is chosen so that the moisture content is very close to 0.5%, then we will never know how the measurement system behaves if the moisture content in the sample rises to, say, 1%. This is why authors like Montgomery and Runger (1994a) have emphasized that parts should be chosen over the entire range of values that the measurement system can measure. This allows for the assessment of linearity in the measurement system. To minimize the masking of AV, as alluded to by Ackermann (1993), the first sample assigned to each appraiser should be nearly identical, as should the second sample, and so on.
Ackermann (1993) notes that merely minimizing material variation is not enough; the differences must be accounted for. Yet, in that paper, measurements by an appraiser on multiple subsamples are treated as multiple measurements on the same sample, and any differences among these subsamples are ignored. The work presented in that paper is based on the work of Spiers (1989), who also ignores these differences. The work of Spiers (1989) and Ackermann (1993) is relevant to this discussion in that they address the issue of destructive testing. However, the applications presented are not from the CPI, where sampling issues can be much more complex. It may be reasonable to ignore differences among subsamples in some of these cases, for example, the tensile strength measurement of a tin sheet.
The ASQ Chemical and Process Industries Division (2001) throws some light on sampling issues in the CPI. They recommend repetitive sampling of incoming material to establish homogeneity, but acknowledge that when a specimen is destroyed, there is additional variability from specimen to specimen, however small it has been made. Moreover, for a sample to be truly representative of the population, it must be random. For gases and liquids, sampling ports may limit any consideration of randomness. For example, accessibility restricts true randomness for materials in storage silos, railcars, etc. (ASQ Chemical and Process Industries Division, 2001). Randomness can be especially difficult to achieve if sampling is done on a time-frequency basis. The ASQ Chemical and Process Industries Division (2001) states that the element of time and its ramifications on sampling is one of the major differences between the CPI and the discrete-part manufacturing industry. They further state that in the CPI, production processes often drift over time and data are autocorrelated; data are based on individual measurements and many production processes are not in a state of control; and specifications are often not statistically based on production processes and measurement system knowledge.
Wheeler (1991) recommends using a range chart created by plotting measurement ranges. An out-of-control condition on such a chart can indicate either that the measurement system is out of control or that the samples measured were not homogeneous. Such a chart will be useful only if reasonable assurance of sample homogeneity can be achieved. The ASQ Chemical and Process Industries Division (2001) suggests plotting two types of control charts: a production process control (PPC) chart and a measurement system control (MSC) chart. The former is a regular control chart created by measuring material actually being produced. The latter, however, is created using control material (CM) local to the particular site. The objective of the former is to determine whether the production process is in control, while that of the latter is to ensure that the measurement system is accurate and precise enough for the PPC chart to be trusted. However, if the sampling frequency is not high enough, the MSC chart may fail to serve its intended purpose. For example, if the average run length (ARL) of the PPC chart for a shift of, say, $x$ units is less than the ARL of the MSC chart, and if a shift of $x$ units occurs in the measurement system, then the PPC chart will detect it before the MSC chart does, and unnecessary tampering with the process may occur (ASQ Chemical and Process Industries Division, 2001). To ensure that changes in the measurement system are detected by the MSC chart before the PPC chart, the ASQ Chemical and Process Industries Division (2001) recommends that $ARL_{MSC} = \frac{1}{2}ARL_{PPC}$.
Mitchell et al. (1997) have developed a very interesting approach to address the issue of using MSA for destructive testing. Bergeret et al. (2001) apply these results in three different applications, each chosen such that testing is not destructive; the technique is then used as if the testing were destructive, and the results are compared with a regular MSA study to assess the effectiveness of the technique. In this technique, the experiment is performed in two stages. In stage one, only one operator is used, who divides each sample into subsamples and measures them; the equipment variation is assumed to be confounded with subsample variation. In the second stage, multiple operators measure each sample only once; the equipment variation is now assumed to be confounded with sample variation. A simple manipulation of the expected mean squares from the two stages yields the equipment variation.
2.5 Evaluating Measurement System Acceptability Criteria

This section identifies the various metrics used to assess measurement system acceptability and the criteria associated with them. The metrics discussed here are summarized in Table 2.1.
Table 2.1: Capability Metrics

Metric                               Notation   Formula
Percent of process variation         %PV        $\sigma_m/\sigma_p$
Percent of total variation           %TV        $\sigma_m/\sigma_t$
Precision-to-tolerance ratio         PTR        $5.15\sigma_m/(USL - LSL)$
Intraclass correlation coefficient   $r$        $\sigma^2_p/\sigma^2_t$
Discrimination ratio                 DR         $\sqrt{(1+r)/(1-r)}$
Number of data categories            ndc        $1.41\sigma_p/\sigma_m$
Effective resolution                 ER         $0.67\sigma_m$
2.5.1 Number of Data Categories and Discrimination Ratio
A popular measure that shows whether the measurement system is capable of making distinctions between parts at the desired level of resolution is the "number of data/distinct categories" (ndc). AIAG (1996) defines ndc as $1.41(\sigma_p/\sigma_m)$, or $\sqrt{2(\sigma^2_p/\sigma^2_m)}$. This expression is widely used by authors (Vardeman and VanValkenburg, 1999; Burdick et al., 2002; Dolezal et al., 1998) and is inspired by Wheeler's "classification ratio" (Wheeler and Lyday, 1984). Wheeler later improved this metric and called it the discrimination ratio (DR) (AIAG, 2003). AIAG changed its recommended value for ndc from three in 1990 to five in 1996.
Nunes and Cherekdjian (1995) state that DR allows us to quantify the sensitivity of the measurement system by comparing process variation to measurement error. Before we can understand DR, however, it is essential to understand the intraclass correlation coefficient ($r$), which is a measure of the similarity of observations within groups relative to that among groups (Kuehl, 2000). In the context of MSA, $r$ can be defined as $1 - \sigma^2_{measurement}/\sigma^2_{observed}$, or $1 - \sigma^2_m/\sigma^2_o$, and DR has been defined by Wheeler (1992) as

$$DR = \sqrt{\frac{1+r}{1-r}} = \sqrt{\frac{2\sigma^2_o}{\sigma^2_m} - 1} \quad (2.2)$$

Wheeler (1991) recommends that DR should be five or more for the measurement system to be useful. Dolezal et al. (1998) state that the measurement process is adequate if the DR is greater than or equal to three. Burdick et al. (2003) define DR as $(1+r)/(1-r)$ (without the square root), which is incorrect.
Wheeler (1991) states that DR defines the number of product categories that the measurements will support. AIAG (1996) defines ndc as the number of distinct levels of product dimensions (or categories) that can be reliably obtained from the data. These two quantities are not only similar in concept but also turn out to be very similar in value.
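The closeness in value can be checked numerically. This is an illustration under the assumption that observed variation decomposes as $\sigma^2_o = \sigma^2_p + \sigma^2_m$, in which case Eq. (2.2) gives $DR = \sqrt{2\sigma^2_p/\sigma^2_m + 1}$, essentially $\sqrt{ndc^2 + 1}$; the sigma values below are hypothetical.

```python
import math

# Compare ndc (AIAG) with Wheeler's DR for hypothetical variance components.
sigma_p, sigma_m = 2.0, 0.5
sigma_o2 = sigma_p ** 2 + sigma_m ** 2   # observed = part + measurement

ndc = 1.41 * sigma_p / sigma_m           # number of distinct categories
r = 1 - sigma_m ** 2 / sigma_o2          # intraclass correlation coefficient
dr = math.sqrt((1 + r) / (1 - r))        # discrimination ratio, Eq. (2.2)

print(round(ndc, 2), round(dr, 2))
```

For these values the two metrics agree to within about 0.1, and the gap shrinks further as the measurement system becomes more capable.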
2.5.2 Precision-to-Tolerance Ratio (PTR)
Another measure of interest is the precision-to-tolerance ratio (PTR), given by

$$PTR = \frac{5.15\sigma_m}{USL - LSL} \quad (2.3)$$

PTR is considered to be the fraction of the tolerance consumed by the measurement system (Montgomery and Runger, 1994a; AIAG, 1996; Burdick et al., 2003; Mitchell et al., 1997). Wheeler (1992), however, states that whenever two quantities are compared by a ratio, there is an implicit assumption that the numerator can be added to some other quantity to yield the denominator. He presents a convincing argument that measurement error and tolerance are not additive, since standard deviations are not additive; hence the former does not "consume" the latter, and it is misleading to use such ratios.
Moreover, this ratio provides no information about the capability of a measurement system to detect product variation. For example, Spiers (1989) considers the micrometer under investigation to be capable because it consumes only 18% of the tolerance; in truth, this tells us nothing about the capability of the instrument to distinguish among product categories. Wheeler (1992) states that a measurement system with a high (undesirable) PTR can still be capable of detecting product variation if PV is large, and vice versa if PV is low. Some authors, such as Montgomery and Runger (1994a) and Burdick et al. (2003), strongly advocate the use of this ratio and state that a PTR of 10% or less indicates an adequate measurement system. Montgomery and Runger (1994a), however, acknowledge that PTR can be driven to any desired value by artificially inflating the specifications. Morchower (1999) indicates that a PTR of 5% or less is preferable.

Sometimes a multiplier of 6 instead of 5.15 is used in Eq. (2.3) (Burdick et al., 2003). Mitchell et al. (1997) also use the reciprocal of this ratio as a measurement capability index ($C_p$).
2.5.3 Other Measures
AIAG (1996) also uses "percent of total variation" (%TV = 100 MV/TV), where MV and TV represent measurement variation and total variation, respectively, as a measure of measurement system capability. Wheeler (1992) states that even though this measure is meaningful in terms of indicating the usefulness of the measurement system, it still suffers from the other problem stated above: total variation and measurement variation are also not additive and hence should not be compared using a ratio. For example, a %TV of 30% does not indicate, contrary to popular belief, that measurement variation is 30% of total variation. Taking the ratio of variances instead of standard deviations may eliminate Wheeler's concern with this ratio and may meaningfully represent the portion of observed variation that is due to measurement. Another performance measure, similar to %TV and often used, is "percent of process variation" (%PV = 100 MV/PV), where PV represents process or part variation. Montgomery and Runger (1994a) consider these two to be more useful than PTR.
Wheeler (1991) gives another measure, called "effective resolution", which is the maximum of two quantities, the probable error and the measurement unit, where the probable error is defined as $0.67\sigma_{measurement}$.
Chapter 3
Theoretical Background
Chapter Overview
The model presented by AIAG, which is the basis for most MSA studies, will be referred to as the "current model" or the "traditional model". A mathematical basis for the estimates of equipment variation (EV), appraiser variation (AV), and part variation (PV) using both the range-based method and ANOVA will be developed. In an attempt to resolve the issue of within-appraiser variation, a new model will be proposed. A mathematical basis will be developed for within-appraiser variation, and any effect of the changes made to the traditional model on the estimates of EV and AV will be investigated. To allow for multiple measuring equipment in the study, an enhanced version of the traditional model will be presented, and new estimates with respect to the additional equipment will be derived. A corrected formula, based on the traditional model, for PV and the discrimination ratio will be derived. Procedures for the verification of all enhancements and alterations to the traditional model will be outlined.
3.1 Current Model

The linear model underlying the traditional MSA techniques is given in Eq(2.1) as $y_{ijm} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \epsilon_{ijm}$. The meaning of these symbols is described in Section 2.1.
Consider a study conducted with one measuring device, k randomly selected ap-
praisers and n randomly selected parts. Each appraiser measures each part r times. The
estimates of various variance components will be derived for this scenario.
3.1.1 Range-Based Estimates of Variance Components

Using the average range to estimate a standard deviation has been an old tradition in the field of quality. A brief explanation of the statistical theory behind this is provided here. Vardeman and VanValkenburg (1999) is the primary source of this explanation. Suppose that $x_1, \ldots, x_n$ are IID normal$(\mu, \sigma^2)$ variables; then their range is given by
\[
R = \max x_i - \min x_i = \sigma\left(\max\left(\frac{x_i - \mu}{\sigma}\right) - \min\left(\frac{x_i - \mu}{\sigma}\right)\right)
\]
This implies that $R$ has the distribution of a $\sigma$ multiple of the range of $n$ standard normal variables. Now, consider $n$ standard normal variables, $z_1, \ldots, z_n$. Let their range be given by $W = \max z_i - \min z_i$ and the expected value of $W$ be given by $E[W] = d_2(n)$. Then, clearly, the expected value of $R$ is given by $E[R] = \sigma d_2(n)$ and $E\left[\frac{R}{d_2(n)}\right] = \sigma$; if the average of multiple such ranges is given by $\bar{R}$, then $\hat{\sigma} = \bar{R}/d_2(n)$. For details on the distribution of the range of standard normal variables, $W$, and the calculation of $d_2(n)$ values, see Patnaik (1949). Tabulated values of $d_2(n)$ for reasonable values of $n$ can be found in most quality control books. AIAG recommends using $d_2^*(n)$ instead, which takes into account the number of ranges that the average is based on. These values can be found in AIAG (1996) for subgroup sizes of up to fifteen and number of ranges up to fifteen, and in AIAG (2003) for subgroup sizes and number of ranges up to 20. For more details and other values of $d_2^*(n)$, the reader is referred to Duncan (1974) and Elam (2001).
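The relation $\hat{\sigma} = \bar{R}/d_2(n)$ can be checked numerically. The following minimal Python sketch (for illustration only; the comparison value 2.326 for $d_2(5)$ is the standard tabulated constant, and all sample sizes and seeds are arbitrary) estimates $d_2(n)$ by Monte Carlo and then recovers a known $\sigma$ from the average range of simulated subgroups.

```python
import random
import statistics

def d2(n, trials=200_000, seed=1):
    """Monte Carlo estimate of d2(n), the expected range of n standard normals."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        total += max(z) - min(z)
    return total / trials

d2_5 = d2(5)  # tabulated value is about 2.326

# Recover a known sigma from the average range of many subgroups of size 5
rng = random.Random(2)
sigma_true = 3.0
subgroups = [[rng.gauss(10.0, sigma_true) for _ in range(5)] for _ in range(2000)]
r_bar = statistics.mean(max(s) - min(s) for s in subgroups)
sigma_hat = r_bar / d2_5
```

With enough subgroups, `sigma_hat` converges on the true standard deviation, which is exactly the argument that justifies the range-based estimators used below.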
It would be advantageous, at this point, to introduce some notation. As noted earlier, $y_{ijm}$ represents the $m$th measurement by the $j$th appraiser on the $i$th part. The average of $r$ measurements taken by the $j$th appraiser on the $i$th part is given by $\bar{y}_{ij\cdot} = \frac{1}{r}\sum_m y_{ijm} = \mu + \alpha_i + \beta_j + \alpha\beta_{ij} + \bar{\epsilon}_{ij\cdot}$. The bar on the statistic indicates that it is an average, and the subscript over which the average is taken is replaced by a dot. Similarly, the range of $r$ measurements taken by the $j$th appraiser on the $i$th part is given by $R_{ij\underline{m}} = \max_m y_{ijm} - \min_m y_{ijm} = y_{ijm''} - y_{ijm'}$, where $m''$ and $m'$ represent the largest and the smallest measurement for a given $i, j$. The underlined subscript on $R$ is the one over which the range has been taken. The average of these ranges over all parts for the $j$th appraiser is given by $\bar{R}_{\cdot j\underline{m}} = \frac{1}{n}\sum_{i=1}^{n}(\max_m y_{ijm} - \min_m y_{ijm}) = \frac{1}{n}\sum_{i=1}^{n}(y_{ijm''} - y_{ijm'})$.
Equipment Variation

Using the linear model given in Eq(2.1), the range of multiple measurements by the $j$th appraiser on the $i$th part can be represented as $R_{ij\underline{m}} = y_{ijm''} - y_{ijm'} = \epsilon_{ijm''} - \epsilon_{ijm'}$, where $m''$ and $m'$ represent the largest and the smallest measurement for a given $i, j$. Since $\epsilon_{ijm}$ represents replication error, and is normally distributed with mean 0 and variance $\sigma^2$, we get
\[
\left(E\left[\frac{R_{ij\underline{m}}}{d_2(r)}\right]\right)^2 = \sigma^2 \tag{3.1}
\]
Using $\bar{R}_{\cdot\cdot\underline{m}}$ as an estimate of $E[R_{ij\underline{m}}]$, $\hat{\sigma} = \bar{R}_{\cdot\cdot\underline{m}}/d_2(r)$ can be used as an estimate of equipment variation, where
\[
\bar{R}_{\cdot\cdot\underline{m}} = \frac{1}{nk}\sum_{j=1}^{k}\sum_{i=1}^{n}(y_{ijm''} - y_{ijm'}) \tag{3.2}
\]
Appraiser Variation

The average of all measurements on all parts made by the $j$th appraiser can be expressed as
\[
\frac{1}{nr}\sum_{i=1}^{n}\sum_{m=1}^{r} y_{ijm} = \bar{y}_{\cdot j \cdot} = \mu + \bar{\alpha}_{\cdot} + \beta_j + \bar{\alpha\beta}_{\cdot j} + \bar{\epsilon}_{\cdot j \cdot} \tag{3.3}
\]
The range of these averages over all appraisers can be expressed as $R_{\cdot\underline{j}\cdot} = (\beta_{j''} - \beta_{j'}) + (\bar{\alpha\beta}_{\cdot j''} - \bar{\alpha\beta}_{\cdot j'}) + (\bar{\epsilon}_{\cdot j''\cdot} - \bar{\epsilon}_{\cdot j'\cdot})$, where $j''$ and $j'$ represent the appraisers with the largest and the smallest measurement averages respectively. Since $\beta_j$, $\alpha\beta_{ij}$, and $\epsilon_{ijm}$ are normally distributed with mean 0 and variance $\sigma_a^2$, $\sigma_{pa}^2$, and $\sigma^2$ respectively, we obtain
\[
\left(E\left[\frac{R_{\cdot\underline{j}\cdot}}{d_2^*(k)}\right]\right)^2 = \sigma_a^2 + \frac{\sigma_{pa}^2}{n} + \frac{\sigma^2}{nr} \tag{3.4}
\]
Since only one such range is obtainable for a given experiment, $E[R_{\cdot\underline{j}\cdot}] = R_{\cdot\underline{j}\cdot}$ and
\[
\frac{R_{\cdot\underline{j}\cdot}}{d_2^*(k)} = \sqrt{\hat{\sigma}_a^2 + \frac{\hat{\sigma}_{pa}^2}{n} + \frac{\hat{\sigma}^2}{nr}} \tag{3.5}
\]
The quantity of interest here is $\hat{\sigma}_a^2$, the appraiser variation or appraiser-to-appraiser variation. As mentioned earlier, some authors like Montgomery (1996) and Kolarik (1995) use the entire expression in Eq(3.4) to estimate AV. A slightly better estimate is given by AIAG by correcting for $\frac{\sigma^2}{nr}$, using EV as an estimate of $\sigma$. They recommend using the range-based approach only if the interaction can be ignored and hence do not adjust this estimate for $\frac{\sigma_{pa}^2}{n}$. The estimate for AV is thus given by
\[
AV = \hat{\sigma}_a = \sqrt{\left(\frac{R_{\cdot\underline{j}\cdot}}{d_2^*(k)}\right)^2 - \frac{EV^2}{nr}} \tag{3.6}
\]
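As a sketch of the range-based EV and AV calculations above (Python, illustrative only: the simulated data, sample sizes and true SDs are hypothetical, and the constants $d_2(2) = 1.128$ and $d_2^*(3) = 1.91$ are taken here as the usual tabulated values):

```python
import math
import random
import statistics

rng = random.Random(7)
n, k, r = 10, 3, 2                 # parts, appraisers, replications
sigma_a, sigma_e = 0.8, 0.5        # true appraiser and repeatability SDs
d2_r = 1.128                       # tabulated d2(2)
d2s_k = 1.91                       # tabulated d2*(3) for a single range

parts = [rng.gauss(20.0, 2.0) for _ in range(n)]
bias = [rng.gauss(0.0, sigma_a) for _ in range(k)]
y = [[[parts[i] + bias[j] + rng.gauss(0.0, sigma_e) for _ in range(r)]
      for j in range(k)] for i in range(n)]

# EV, Eq (3.2): average per-(part, appraiser) range divided by d2(r)
r_bar = sum(max(y[i][j]) - min(y[i][j])
            for i in range(n) for j in range(k)) / (n * k)
EV = r_bar / d2_r

# AV, Eq (3.6): single range of the k appraiser grand averages,
# corrected for replication error (a negative variance is truncated at zero)
appr_avg = [statistics.mean(y[i][j][m] for i in range(n) for m in range(r))
            for j in range(k)]
R_j = max(appr_avg) - min(appr_avg)
AV = math.sqrt(max(0.0, (R_j / d2s_k) ** 2 - EV ** 2 / (n * r)))
```

Because AV rests on a single range of only $k$ averages, it is far noisier than EV, which is exactly the weakness the alternative estimate discussed next tries to address.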
As mentioned previously, Montgomery and Runger (1994a) demonstrated a different way of calculating appraiser variation, which was later improved by Vardeman and VanValkenburg (1999). We will call this new estimate of AV $\hat{\sigma}_{aNew}$. In this approach the average of multiple measurements made by the $j$th appraiser on the $i$th part is calculated as
\[
\bar{y}_{ij\cdot} = \frac{1}{r}\sum_{m} y_{ijm} = \mu + \alpha_i + \beta_j + \alpha\beta_{ij} + \bar{\epsilon}_{ij\cdot} \quad \forall\, i, j \tag{3.7}
\]
The range of these averages calculated for the $i$th part over all appraisers is given by $R_{i\underline{j}\cdot} = \bar{y}_{ij''\cdot} - \bar{y}_{ij'\cdot} = (\beta_{j''} - \beta_{j'}) + (\alpha\beta_{ij''} - \alpha\beta_{ij'}) + (\bar{\epsilon}_{ij''\cdot} - \bar{\epsilon}_{ij'\cdot})$. Again, since $\beta_j$, $\alpha\beta_{ij}$, and $\epsilon_{ijm}$ are normally distributed with mean 0 and variance $\sigma_a^2$, $\sigma_{pa}^2$, and $\sigma^2$ respectively, we obtain
\[
\left(E\left[\frac{R_{i\underline{j}\cdot}}{d_2^*(k)}\right]\right)^2 = \sigma_a^2 + \sigma_{pa}^2 + \frac{\sigma^2}{r} \tag{3.8}
\]
There will be $n$ such ranges, one for each part. The average of these ranges, $\bar{R}_a = \frac{1}{n}\sum_i R_{i\underline{j}\cdot} = \frac{1}{n}\sum_i (\bar{y}_{ij''\cdot} - \bar{y}_{ij'\cdot})$, is used as an estimate of $E[R_{i\underline{j}\cdot}]$. Ignoring the interaction variance component as before, an estimate of appraiser variation is thus given by
\[
AV = \hat{\sigma}_{aNew} = \sqrt{\left(\frac{\bar{R}_a}{d_2^*(k)}\right)^2 - \frac{EV^2}{r}} \tag{3.9}
\]
Part Variation

AIAG argues that the average of all measurements on the $i$th part over all appraisers is the best obtainable estimate of the true value of the part. Hence variation among these grand averages of each part truly represents PV. The grand average of the $i$th part is given as
\[
\bar{y}_{i\cdot\cdot} = \frac{1}{kr}\sum_{m=1}^{r}\sum_{j=1}^{k} y_{ijm} \tag{3.10}
\]
PV, expressed as a standard deviation ($\sigma_p$), is then estimated as $\frac{R_{\underline{i}\cdot\cdot}}{d_2^*(n)}$, where $R_{\underline{i}\cdot\cdot}$ represents the range of these part grand averages taken over all parts. We will show, later in this chapter, that this method overestimates PV. An appropriate correction will be provided.
3.1.2 ANOVA-Based Estimates of Variance Components

Recall that the experiment being considered here is based on $k$ randomly chosen appraisers performing $r$ measurements on each of $n$ randomly selected parts. This is a two-factor random effects model. The variation in identically performed measurements on the same part is the replication error ($\sigma^2$), also called repeatability or equipment variation under the AIAG model. Table 3.1 summarizes the analysis of variance for such an experiment. The "df" column contains the degrees of freedom for each source listed under the "Source" column. The "MS" column contains the observed values of mean squares for a particular experiment, and the "EMS" column shows the expected values for those mean squares.
Table 3.1: Analysis of variance table

Source            df             MS         EMS
Part              $n-1$          $MS_p$     $\sigma^2 + r\sigma_{pa}^2 + kr\sigma_p^2$
Appraiser         $k-1$          $MS_a$     $\sigma^2 + r\sigma_{pa}^2 + nr\sigma_a^2$
Appraiser x Part  $(n-1)(k-1)$   $MS_{pa}$  $\sigma^2 + r\sigma_{pa}^2$
Error (equip.)    $nk(r-1)$      $MS_e$     $\sigma^2$

The components of variance can be easily estimated from this table as follows:
\[
\text{Equipment Variation} = \hat{\sigma}^2 = MS_e
\]
\[
\text{Interaction Variation} = \hat{\sigma}_{pa}^2 = \frac{MS_{pa} - MS_e}{r}
\]
\[
\text{Appraiser Variation} = \hat{\sigma}_a^2 = \frac{MS_a - MS_{pa}}{nr}
\]
\[
\text{Part Variation} = \hat{\sigma}_p^2 = \frac{MS_p - MS_{pa}}{kr}
\]
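These moment equations translate directly into code. The following Python sketch (illustrative only; the mean-square values are hypothetical) solves the EMS relations of Table 3.1, truncating negative variance estimates at zero as is conventional:

```python
def variance_components(MSp, MSa, MSpa, MSe, n, k, r):
    """Solve the EMS relations of Table 3.1 for the variance components;
    negative solutions are truncated at zero, as is conventional."""
    s2 = MSe                                  # equipment (replication) variance
    s2_pa = max(0.0, (MSpa - MSe) / r)        # part-by-appraiser interaction
    s2_a = max(0.0, (MSa - MSpa) / (n * r))   # appraiser-to-appraiser
    s2_p = max(0.0, (MSp - MSpa) / (k * r))   # part-to-part
    return s2, s2_pa, s2_a, s2_p

# Hypothetical mean squares for n = 10 parts, k = 3 appraisers, r = 2 replications
s2, s2_pa, s2_a, s2_p = variance_components(25.3, 4.5, 0.9, 0.5, n=10, k=3, r=2)
```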
3.2 Accounting for Within-Appraiser Variation

Consider the linear model underlying the traditional MSA given in Eq(2.1). The term $\beta_j$ represents appraiser bias. This model does not allow for any variation in this bias associated with an appraiser. For example, consider an appraiser measuring the length of a part, in millimeters, having a bias ($\beta_j$) of $-2$ mm. The current model assumes that every time this appraiser measures a part, he/she will measure it exactly 2 mm less than the true value. The concept of within-appraiser variation is more realistic in that it assumes $\beta_j$ to be only an average bias, with a certain variation from reading to reading.

The relationship between variation among appraisers and variation within appraisers is depicted in Figure 3.1 for three appraisers. The appraiser-to-appraiser variation, $\sigma_a^2$, governs the variation in mean appraiser biases, the $\beta_j$s. Hence, the $j$th appraiser performs with a mean bias of $\beta_j$. The bias associated with a particular measurement on a given part has a variance of $\sigma_{aj}^2$, which is the within-appraiser variation for the $j$th appraiser. Under the traditional model, $\sigma_{aj}^2 = 0$ for all appraisers. The difficulty in visually detecting (through control charts or residual plots) or estimating appraiser inconsistency arises primarily because it is confounded with equipment variation. Replication error, thus, has two components: equipment variation and within-appraiser variation. In other words, $\epsilon_{ijm} = \gamma_{ijm} + \nu_{ijm}$, where $\epsilon_{ijm}$ represents replication error from Eq(2.1), and $\gamma_{ijm}$, $\nu_{ijm}$ represent measurement error due to within-appraiser variation (for appraiser $j$) and equipment variation, respectively. It is easy to see why equating replication error to equipment variation will overestimate the latter. The model in Eq(2.1)
[Figure 3.1: Within-appraiser variation. The figure shows, for three appraisers, the individual bias distributions with means $\beta_1, \beta_2, \beta_3$ and variances $\sigma_{a1}^2, \sigma_{a2}^2, \sigma_{a3}^2$, where the mean biases themselves are distributed with variance $\sigma_a^2$.]
can thus be modified as follows:
\[
y_{ijm} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \gamma_{ijm} + \nu_{ijm} \tag{3.11}
\]
where $i = 1, \ldots, n$; $j = 1, \ldots, k$; $m = 1, \ldots, r$, and symbols have their usual meaning. The terms $\gamma_{ijm}$ and $\nu_{ijm}$ are independently and normally distributed with mean zero and variance $\sigma_{aj}^2$ (variance within appraiser $j$) and $\sigma_e^2$ (within-equipment variation), respectively. Note that whereas $\sigma_{aj}^2$ varies by appraiser, $\sigma_e^2$ is constant. This is because this model allows for only one equipment. This will change as we enhance the model to include multiple equipment in a subsequent section.

Let $R_{ij\underline{m}} = y_{ijm''} - y_{ijm'} = \epsilon_{ijm''} - \epsilon_{ijm'}$ represent the range of all measurements on the $i$th part taken by the $j$th appraiser, where $m''$ and $m'$ represent the largest and the smallest measurement for a given $i, j$. Then, from Eq(3.11), $R_{ij\underline{m}} = (\gamma_{ijm''} - \gamma_{ijm'}) + (\nu_{ijm''} - \nu_{ijm'})$. Since $\gamma_{ijm}$ and $\nu_{ijm}$ are normally distributed with mean 0 and variance $\sigma_{aj}^2$ and $\sigma_e^2$
respectively, we obtain
\[
\left(E\left[\frac{R_{ij\underline{m}}}{d_2(r)}\right]\right)^2 = \sigma_j^2 = \sigma_{aj}^2 + \sigma_e^2 \tag{3.12}
\]
where $\sigma_j^2$ is the replication error for the $j$th appraiser. The average of the ranges $R_{ij\underline{m}}$ taken over all parts for appraiser $j$, given by $\bar{R}_{\cdot j\underline{m}} = \frac{1}{n}\sum_{i=1}^{n}(y_{ijm''} - y_{ijm'})$, can be used as an estimate of $E[R_{ij\underline{m}}]$.

Since equipment and within-appraiser variation are confounded with each other, it is difficult, if not impossible, to obtain point estimates of them. It is, however, possible to estimate bounds on these quantities, as shown in the next subsection.
3.2.1 Estimating Bounds

From Eq(3.12), it is easy to see that the replication error for appraiser $j$ is $\sigma_j^2 = \sigma_{aj}^2 + \sigma_e^2$. A similar expression can be derived for each of the $k$ appraisers, and the set can be collectively expressed as $A_k$, where the subscript indicates the number of equations in the set.
\[
A_k = \begin{pmatrix} \sigma_{a1}^2 + \sigma_e^2 = \sigma_1^2 \\ \sigma_{a2}^2 + \sigma_e^2 = \sigma_2^2 \\ \vdots \\ \sigma_{ak}^2 + \sigma_e^2 = \sigma_k^2 \end{pmatrix}
\]
Assume that the equations are ordered such that $\sigma_1^2 > \sigma_2^2 > \ldots > \sigma_k^2$. Let $\kappa = \binom{k}{2}$. If $E_i$ is the $i$th equation in $A_k$, then performing the operations $E_i - E_{i+x}$, where $i = 1, \ldots, (k-1)$ and $x = 1, \ldots, (k-i)$, gives us the following $\kappa$ equations
\[
A_\kappa = \begin{pmatrix} \sigma_{a1}^2 - \sigma_{a2}^2 = \sigma_1^2 - \sigma_2^2 \\ \sigma_{a1}^2 - \sigma_{a3}^2 = \sigma_1^2 - \sigma_3^2 \\ \vdots \\ \sigma_{a,k-1}^2 - \sigma_{ak}^2 = \sigma_{k-1}^2 - \sigma_k^2 \end{pmatrix}
\]
Using the logic that if $a - b = c$ and $a, b, c$ are nonnegative, then $a + b \geq c$, we get
\[
A_{\kappa+} = \begin{pmatrix} \sigma_{a1}^2 + \sigma_{a2}^2 \geq \sigma_1^2 - \sigma_2^2 \\ \sigma_{a1}^2 + \sigma_{a3}^2 \geq \sigma_1^2 - \sigma_3^2 \\ \vdots \\ \sigma_{a,k-1}^2 + \sigma_{ak}^2 \geq \sigma_{k-1}^2 - \sigma_k^2 \end{pmatrix}
\]
Adding all inequalities in $A_{\kappa+}$, we get a generalized form for $k$ appraisers as
\[
(k-1)\sum_{j=1}^{k}\sigma_{aj}^2 \geq \sum_{j=1}^{k}(k - 2j + 1)\sigma_j^2
\;\Rightarrow\;
\frac{1}{k}\sum_{j=1}^{k}\sigma_{aj}^2 \geq \frac{1}{k(k-1)}\sum_{j=1}^{k}(k - 2j + 1)\sigma_j^2 \tag{3.13}
\]
A detailed description of how the generalized form was calculated can be found in Appendix C. The left side of the inequality is the average within-appraiser variation expressed as a variance, and the right side is an estimate of the lower bound on it ($LB_a$). The replication error for the $j$th appraiser is $\sigma_j^2$, and can be found using Eq(3.12).
By adding all equations in $A_k$ and dividing by $k$, the number of appraisers, we get
\[
\frac{1}{k}\sum_{j=1}^{k}\sigma_{aj}^2 + \sigma_e^2 = \frac{1}{k}\sum_{j=1}^{k}\sigma_j^2 \tag{3.14}
\]
The summation term on the left represents the average within-appraiser variation, for which a lower bound, $LB_a$, is given by Eq(3.13). Hence the above equation can be rewritten as
\[
LB_a + \sigma_e^2 \leq \frac{1}{k}\sum_{j=1}^{k}\sigma_j^2
\;\Rightarrow\;
\sigma_e^2 \leq \frac{1}{k}\sum_{j=1}^{k}\sigma_j^2 - LB_a \tag{3.15}
\]
The term on the right of the inequality represents an estimate of the upper bound on within-equipment variation ($UB_e$).
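As an illustration, the bound computation of Eqs (3.13) and (3.15) can be sketched as follows (Python; the input replication variances are hypothetical and assumed to have already been obtained via Eq (3.12)):

```python
def bounds(s2_reps):
    """Eq (3.13): lower bound LB_a on the average within-appraiser variance,
    and Eq (3.15): upper bound UB_e on within-equipment variance.
    s2_reps holds the per-appraiser replication variances sigma_j^2."""
    s2 = sorted(s2_reps, reverse=True)   # order so sigma_1^2 > ... > sigma_k^2
    k = len(s2)
    # coefficient (k - 2j + 1) for the j-th largest variance (1-indexed j)
    LB_a = sum((k - 2 * j - 1) * s2[j] for j in range(k)) / (k * (k - 1))
    UB_e = sum(s2) / k - LB_a
    return LB_a, UB_e

# Hypothetical per-appraiser replication variances from Eq (3.12)
LB_a, UB_e = bounds([0.5, 0.9, 0.4])
```

For three appraisers with $\sigma_1^2 = 0.9$, $\sigma_2^2 = 0.5$, $\sigma_3^2 = 0.4$, the coefficients are $2, 0, -2$, giving $LB_a = 1/6$ and $UB_e = 0.6 - 1/6$.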
3.2.2 Estimating Trivial Bounds

As has been shown earlier, $\sigma_j^2 = \sigma_{aj}^2 + \sigma_e^2 \;\forall j$. Clearly, $\sigma_e^2 \leq \sigma_j^2 \;\forall j$, where the equality holds when $\sigma_{aj}^2 = 0$. Hence $\sigma_e^2 \leq \min_j(\sigma_j^2)$. In other words, $UB_{eTriv} = \min_j(\sigma_j^2)$ places an upper bound on equipment variation. From Eq(3.14), and having an upper bound on $\sigma_e^2$ ($UB_{eTriv}$), we have $\frac{1}{k}\sum_{j=1}^{k}\sigma_{aj}^2 \geq \frac{1}{k}\sum_{j=1}^{k}\sigma_j^2 - UB_{eTriv}$, where the left side of the inequality represents the average within-appraiser variation and the right side gives an estimated lower bound on it ($LB_{aTriv}$).

Recall that all of these estimates ($LB_a$, $UB_e$, $LB_{aTriv}$, $UB_{eTriv}$) have been derived using expected values. When calculating an actual value for these estimates, estimates of these expected values will be used. This causes the estimated bounds to be ineffective at times. This will be discussed in more detail later.
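The trivial bounds are even simpler to compute; a sketch (Python, with the same hypothetical replication variances as before):

```python
def trivial_bounds(s2_reps):
    """UB_eTriv = min_j sigma_j^2; LB_aTriv = mean(sigma_j^2) - UB_eTriv."""
    UB_e_triv = min(s2_reps)
    LB_a_triv = sum(s2_reps) / len(s2_reps) - UB_e_triv
    return LB_a_triv, UB_e_triv

LB_a_triv, UB_e_triv = trivial_bounds([0.5, 0.9, 0.4])
```

For the same inputs as above, $UB_{eTriv} = 0.4$ and $LB_{aTriv} = 0.6 - 0.4 = 0.2$, which in this example is actually tighter than the $LB_a$ of Eq (3.13); which bound is tighter depends on the spread of the $\sigma_j^2$.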
3.3 Correcting the Estimate for PV

Consider the model given in Eq(2.1). The average of all measurements by all appraisers on the $i$th part can be represented as
\[
\frac{1}{kr}\sum_{j=1}^{k}\sum_{m=1}^{r} y_{ijm} = \bar{y}_{i\cdot\cdot} = \mu + \alpha_i + \bar{\beta}_{\cdot} + \bar{(\alpha\beta)}_{i\cdot} + \bar{\epsilon}_{i\cdot\cdot}
\]
The range of $\bar{y}_{i\cdot\cdot}$ over all parts is given by $R_{\underline{i}\cdot\cdot} = (\alpha_{i''} - \alpha_{i'}) + (\bar{\alpha\beta}_{i''\cdot} - \bar{\alpha\beta}_{i'\cdot}) + (\bar{\epsilon}_{i''\cdot\cdot} - \bar{\epsilon}_{i'\cdot\cdot})$. As shown earlier, $\frac{R_{\underline{i}\cdot\cdot}}{d_2(n)}$ is traditionally used to estimate part variation, even though it estimates not $\sigma_p^2$ but rather
\[
\left(\frac{R_{\underline{i}\cdot\cdot}}{d_2(n)}\right)^2 = \hat{\sigma}_p^2 + \frac{\hat{\sigma}_{pa}^2}{k} + \frac{\hat{\sigma}_e^2}{kr} \tag{3.16}
\]
Hence the correct estimate of PV is not $\frac{R_{\underline{i}\cdot\cdot}}{d_2(n)}$, but rather $\hat{\sigma}_p^2 = \left(\frac{R_{\underline{i}\cdot\cdot}}{d_2(n)}\right)^2 - \frac{\hat{\sigma}_{pa}^2}{k} - \frac{\hat{\sigma}_e^2}{kr}$. As mentioned previously, the range-based approach is appropriate when the additive model
is believed to adequately represent reality and hence the interaction term can be ignored, so that $\sigma_{pa}^2 = 0$. The current estimate of process variation should still be adjusted for $\frac{\hat{\sigma}_e^2}{kr}$, the last term in Eq(3.16), and be given as
\[
\hat{\sigma}_p^2 = \left(\frac{R_{\underline{i}\cdot\cdot}}{d_2(n)}\right)^2 - \frac{\hat{\sigma}_e^2}{kr}
\;\Longrightarrow\;
\hat{\sigma}_p^2 = \left(\frac{R_{\underline{i}\cdot\cdot}}{d_2(n)}\right)^2 - \frac{1}{kr}\left(\frac{\bar{R}_{\cdot\cdot\underline{m}}}{d_2(r)}\right)^2
\;\Longrightarrow\;
\hat{\sigma}_p^2 = \max\left(0,\; \left(\frac{R_{\underline{i}\cdot\cdot}}{d_2(n)}\right)^2 - \frac{1}{kr}\left(\frac{\bar{R}_{\cdot\cdot\underline{m}}}{d_2(r)}\right)^2\right) \tag{3.17}
\]
where $\bar{R}_{\cdot\cdot\underline{m}} = \frac{1}{nk}\sum_{i=1}^{n}\sum_{j=1}^{k}(\max_m y_{ijm} - \min_m y_{ijm})$ is the average of measurement ranges over all parts and all appraisers. Under the traditional model, $\frac{\bar{R}_{\cdot\cdot\underline{m}}}{d_2(r)}$ is typically used as EV to estimate $\sigma_e$, the equipment variation. The last equation follows because a negative estimate of variance is usually replaced by zero and counted as a null estimate. Under the traditional model (ignoring within-appraiser variation), the equation above can be rewritten as $PV_{new}^2 = PV_{old}^2 - \frac{EV^2}{kr}$.
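The corrected estimate of Eq (3.17) can be sketched as follows (Python, illustrative only; the range values are hypothetical, and $d_2(10) = 3.078$ and $d_2(2) = 1.128$ are the usual tabulated constants):

```python
import math

def corrected_pv(R_parts, r_bar, k, r, d2_n, d2_r):
    """Eq (3.17): subtract the EV contribution from the traditional PV
    estimate, truncating a negative variance at zero."""
    EV = r_bar / d2_r                 # equipment variation from replicate ranges
    pv_old = R_parts / d2_n           # traditional (uncorrected) PV
    pv2 = max(0.0, pv_old ** 2 - EV ** 2 / (k * r))
    return math.sqrt(pv2), pv_old, EV

# Hypothetical inputs: range of part averages 9.0, average replicate range 0.6,
# k = 3 appraisers, r = 2 replications, n = 10 parts
PV_new, PV_old, EV = corrected_pv(9.0, 0.6, k=3, r=2, d2_n=3.078, d2_r=1.128)
```

The correction always shrinks the traditional estimate, and the shrinkage grows as EV grows relative to the part-average range.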
3.4 On Using Multiple Equipment

As mentioned above, in a situation where multiple equipment or measuring devices of the same type are routinely used for measurement, a significant equipment-to-equipment variance component may exist. This is ignored under the current model and may result in overestimating the true process variation. Once the decision to use multiple equipment has been made, a decision as to whether equipment should be treated as a random or a fixed factor should be made. This decision depends on the situation and on the objectives of the study.

For example, if only a small and specific number of equipment are used during routine measurements, then all equipment may be used in the study and equipment may be treated as a fixed factor. The design would then be analyzed as a mixed model, assuming that parts and appraisers would still be random. Any interaction involving part or appraiser would also be treated as random. The objective with respect to equipment, however, should be to study the main effects of those specific equipment. Any attempt to generalize the conclusions to the entire population of equipment (if larger than the sample) will lead to erroneous conclusions. Besides, as mentioned earlier, Montgomery and Runger (1994a) recommend that even in such a case the factor should be treated as a random sample from the potential population.
For now, equipment will be considered to be a random factor. The objective of the study with respect to equipment would be to estimate the variance components due to equipment. The linear model that retains all the provisions of the AIAG model, accounts for within-appraiser variation, and allows for multiple equipment to be used is given below:
\[
y_{ijlm} = \mu + \alpha_i + \beta_j + \omega_l + \alpha\beta_{ij} + \alpha\omega_{il} + \beta\omega_{jl} + \alpha\beta\omega_{ijl} + \gamma_{ijlm} + \nu_{ijlm} \tag{3.18}
\]
where $i = 1, \ldots, n$; $j = 1, \ldots, k$; $l = 1, \ldots, q$; $m = 1, \ldots, r$, and symbols have their usual meaning. The term $\omega_l$ is the equipment effect and is normally distributed with mean zero and variance $\sigma_e^2$. The terms $\alpha\beta_{ij}$, $\alpha\omega_{il}$, and $\beta\omega_{jl}$ represent two-way interactions between part and appraiser, part and equipment, and appraiser and equipment, respectively. They are normally distributed with mean zero and variance $\sigma_{pa}^2$, $\sigma_{pe}^2$, and $\sigma_{ae}^2$, respectively. The three-way interaction among part, appraiser and equipment is denoted by $\alpha\beta\omega_{ijl}$, and is normally distributed with mean zero and variance $\sigma_{pae}^2$. The replication error is, as before, split into within-appraiser variation and within-equipment variation. Since the latter can be different for different equipment, the variance of $\nu_{ijlm}$ is now denoted by $\sigma_{el}^2$ and represents the variation within equipment $l$. Similar to previous models, there are $n$ parts and $k$ appraisers. Each appraiser measures each part using each of $q$ equipment $r$ times. The analysis of variance for this scenario is given in Table 3.2. The estimates of variance components can be obtained from Table 3.3.
Table 3.2: ANOVA for multiple equipment scenario

Source   DF                   MS          EMS
P        $n-1$                $MS_p$      $\sigma^2 + r\sigma_{pae}^2 + qr\sigma_{pa}^2 + kr\sigma_{pe}^2 + kqr\sigma_p^2$
A        $k-1$                $MS_a$      $\sigma^2 + r\sigma_{pae}^2 + qr\sigma_{pa}^2 + nr\sigma_{ae}^2 + nqr\sigma_a^2$
E        $q-1$                $MS_e$      $\sigma^2 + r\sigma_{pae}^2 + kr\sigma_{pe}^2 + nr\sigma_{ae}^2 + nkr\sigma_e^2$
PxA      $(n-1)(k-1)$         $MS_{pa}$   $\sigma^2 + r\sigma_{pae}^2 + qr\sigma_{pa}^2$
PxE      $(n-1)(q-1)$         $MS_{pe}$   $\sigma^2 + r\sigma_{pae}^2 + kr\sigma_{pe}^2$
AxE      $(k-1)(q-1)$         $MS_{ae}$   $\sigma^2 + r\sigma_{pae}^2 + nr\sigma_{ae}^2$
PxAxE    $(n-1)(k-1)(q-1)$    $MS_{pae}$  $\sigma^2 + r\sigma_{pae}^2$
Error    $nkq(r-1)$           $MS_E$      $\sigma^2$

Table 3.3: Estimates of variance components
\[
\hat{\sigma}_{ae}^2 = \frac{MS_{ae} - MS_{pae}}{nr} \qquad \hat{\sigma}_e^2 = \frac{MS_e - MS_{pe} - MS_{ae} + MS_{pae}}{nkr}
\]
\[
\hat{\sigma}_{pe}^2 = \frac{MS_{pe} - MS_{pae}}{kr} \qquad \hat{\sigma}_a^2 = \frac{MS_a - MS_{pa} - MS_{ae} + MS_{pae}}{nqr}
\]
\[
\hat{\sigma}_{pa}^2 = \frac{MS_{pa} - MS_{pae}}{qr} \qquad \hat{\sigma}_p^2 = \frac{MS_p - MS_{pa} - MS_{pe} + MS_{pae}}{kqr}
\]
\[
\hat{\sigma}_{pae}^2 = \frac{MS_{pae} - MS_E}{r}
\]
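The Table 3.3 calculations can be sketched as follows (Python, illustrative only; the mean-square inputs are hypothetical):

```python
def vc_three_way(MS, n, k, q, r):
    """Solve the EMS relations of Table 3.2 for the variance components
    (Table 3.3), truncating negative solutions at zero.
    MS keys: 'p', 'a', 'e', 'pa', 'pe', 'ae', 'pae', 'err'."""
    def pos(x):
        return max(0.0, x)
    return {
        'pae': pos((MS['pae'] - MS['err']) / r),
        'pa': pos((MS['pa'] - MS['pae']) / (q * r)),
        'pe': pos((MS['pe'] - MS['pae']) / (k * r)),
        'ae': pos((MS['ae'] - MS['pae']) / (n * r)),
        'p': pos((MS['p'] - MS['pa'] - MS['pe'] + MS['pae']) / (k * q * r)),
        'a': pos((MS['a'] - MS['pa'] - MS['ae'] + MS['pae']) / (n * q * r)),
        'e': pos((MS['e'] - MS['pe'] - MS['ae'] + MS['pae']) / (n * k * r)),
    }

# Hypothetical mean squares for n = 10 parts, k = 3 appraisers,
# q = 2 devices, r = 2 replications
vc = vc_three_way({'p': 50.0, 'a': 6.0, 'e': 4.0, 'pa': 1.2,
                   'pe': 1.0, 'ae': 0.8, 'pae': 0.6, 'err': 0.4},
                  n=10, k=3, q=2, r=2)
```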
3.5 MSA for Destructive Testing

An interesting approach to resolving the problem of destructive testing has been developed by Mitchell et al. (1997) by conducting the experiment in two stages. In this section, the mathematical background for their approach is developed. This approach was initially developed for the semiconductor industry, and the same example will be used. Also, the underlying assumptions will be examined, and the adaptation of this approach for the chemical and process industries will be discussed.
In the first stage, 10 parts (or wafers) ($n = 10$) are selected and 5 locations ($l = 5$) are marked on each wafer such that "location 1" on each wafer refers to the exact same location. The same statement can be made about the other locations. Only one operator is used in this stage. Mitchell et al. (1997) consider this to be a nested design with locations nested within parts. This choice of design is disputed later in the section. The linear model assuming a nested design, however, can be described as
\[
y_{it} = \mu + \alpha_i + \tau_{t(i)} \tag{3.19}
\]
where $i = 1, \ldots, n$; $t = 1, \ldots, l$. The term $y_{it}$ is the value of the $t$th location of the $i$th part, and $\mu$ is the overall part mean. It is important to note the difference between this model and the previous models. The term $y_{it}$ represents the true value and not the measurement value, unlike all previous models. The terms $\alpha_i$, the part effect, and $\tau_{t(i)}$, the effect of locations nested within parts, are independently and normally distributed with mean zero and variance $\sigma_p^2$ and $\sigma_{l(p)}^2$ respectively. The EMS for this scenario are shown in Table 3.4.

Table 3.4: Expected Mean Squares for Stage 1

Source            DF         MS           EMS
Part              $n-1$      $MS_p$       $\sigma_{l(p)}^2 + l\sigma_p^2$
Location (Part)   $n(l-1)$   $MS_{l(p)}$  $\sigma_{l(p)}^2$

The part-to-part variation can thus be calculated as shown in Eq(3.20).
\[
\sigma_p^2 = \frac{MS_p - MS_{l(p)}}{l} \tag{3.20}
\]
Since the variation in the measured values of multiple locations within a wafer may be due to location variance and/or variation in the measuring device, these two effects (location and equipment) are confounded ($\sigma_{l(p)}^2 = \sigma_l^2 + \sigma_e^2$).
In Stage 2, 3 appraisers are given 10 different parts each (30 parts total), and each appraiser measures the exact same location on the parts. Measuring the exact same location ensures that no within-wafer or location-to-location variation is included in the multiple measurements made by an appraiser. Since each appraiser receives a different set of 10 parts, this is a two-factor nested model with parts nested within appraisers. The appropriate linear model for this scenario is
\[
y_{ij} = \mu + \beta_j + \alpha_{i(j)} \tag{3.21}
\]
where $i = 1, \ldots, n$; $j = 1, \ldots, k$. The term $y_{ij}$ is the measurement by the $j$th appraiser on the $i$th part, and $\mu$ is the overall part mean. The terms $\beta_j$, the appraiser effect, and $\alpha_{i(j)}$, the effect of parts nested within appraisers, are normally and independently distributed with mean zero and variance $\sigma_a^2$ and $\sigma_{p(a)}^2$ respectively. The EMS for this stage are given in Table 3.5.

Table 3.5: Expected Mean Squares for Stage 2

Source            DF         MS           EMS
Appraiser         $k-1$      $MS_a$       $\sigma_{p(a)}^2 + n\sigma_a^2$
Part (Appraiser)  $k(n-1)$   $MS_{p(a)}$  $\sigma_{p(a)}^2$

The differences in the measurements made by an appraiser on multiple parts can be explained by either variation in part dimensions or inconsistency of the measuring device. Hence equipment variation ($\sigma_e^2$) is now confounded with part-to-part variation ($\sigma_p^2$). In other words, $\sigma_{p(a)}^2 = \sigma_p^2 + \sigma_e^2$, estimated by $MS_{p(a)}$. Substituting $\sigma_p^2$ from Eq(3.20), the equipment variation ($\sigma_e^2$) can be estimated.
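The two-stage substitution can be sketched as follows (Python, illustrative only; the mean-square inputs are hypothetical):

```python
def destructive_ev(MSp_stage1, MSlp_stage1, l, MSpa_stage2):
    """Two-stage estimate in the spirit of Mitchell et al. (1997):
    Stage 1 (Eq 3.20): sigma_p^2 = (MSp - MSl(p)) / l
    Stage 2: sigma_p(a)^2 = sigma_p^2 + sigma_e^2, estimated by MSp(a),
    so sigma_e^2 = MSp(a) - sigma_p^2."""
    s2_p = (MSp_stage1 - MSlp_stage1) / l
    s2_e = MSpa_stage2 - s2_p
    return s2_p, s2_e

# Hypothetical mean squares: Stage 1 gives MSp = 12.0 and MSl(p) = 2.0 with
# l = 5 locations; Stage 2 gives MSp(a) = 2.5
s2_p, s2_e = destructive_ev(12.0, 2.0, l=5, MSpa_stage2=2.5)
```

Here Stage 1 yields $\sigma_p^2 = 2.0$, and subtracting it from the Stage 2 mean square leaves $\sigma_e^2 = 0.5$, the equipment variance that destructive testing would otherwise leave hopelessly confounded.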
Hence, even though measurement variation in destructive testing is confounded with sample-to-sample variation, this approach developed by Mitchell et al. (1997) provides a very clever way of estimating it. This approach needs to be examined, however, for its underlying assumptions and how those assumptions hold in various scenarios. Also, it is necessary to show how this method can be adapted for the chemical and process industries and whether this adaptation violates some of the underlying assumptions.
3.5.1 An In-Depth Look at the Process

From the description of Stage 2 of the experiment, it appears that the difference in the measurement of the same location on two parts by the same appraiser is attributed only to equipment variation and part-to-part variation. The implicit assumption here is that the "location effect" is constant for a given location regardless of the part. In other words, if $\mu + \alpha_i + \tau_{t(i)}$ is the true value of the $t$th location on the $i$th part and $\mu + \alpha_{i'} + \tau_{t(i')}$ is the true mean of the $t$th location on the $i'$th part, then this assumption implies that $\tau_{t(i)} = \tau_{t(i')}$. This is clearly in conflict with the model for Stage 1.

To see this more clearly, consider taking the range of the 10 measurements made by an appraiser in Stage 2. This range can be expressed as $R_{\underline{i}j} = (y_{i''j} - y_{i'j}) = (\alpha_{i''(j)} - \alpha_{i'(j)})$, such that $i''$ and $i'$ refer to the parts with the maximum and minimum measured value, respectively. The corresponding variance can be expressed as
\[
\left(E\left[\frac{R_{\underline{i}j}}{d_2^*(n)}\right]\right)^2 = \sigma_{p(a)}^2 = E[MS_{p(a)}] = \sigma_p^2 + \sigma_e^2 \tag{3.22}
\]
Now consider taking, based on Stage 1, the range of measurements of the $t$th location on 10 parts by an appraiser. This range can be expressed as $R_{\underline{i}t} = (y_{i''t} - y_{i't}) = (\alpha_{i''} - \alpha_{i'}) + (\tau_{t(i'')} - \tau_{t(i')})$ and the corresponding variance as
\[
\left(E\left[\frac{R_{\underline{i}t}}{d_2^*(n)}\right]\right)^2 = \sigma_p^2 + \sigma_{l(p)}^2 = \sigma_p^2 + \sigma_e^2 + \sigma_l^2 \tag{3.23}
\]
The variance estimates in Eq(3.22) and Eq(3.23) should have been the same, as they are trying to estimate the same quantity: both were estimated using the range of measurements on the same location by the same appraiser over 10 parts.

One way these estimates can be the same is if $\tau_{t(i'')} = \tau_{t(i')} = \tau_t$ in Stage 1, i.e., if the "location effect" for a given location ($t$ in this case) is constant regardless of part ($i''$ and $i'$ in this case). This modifies Eq(3.19) so that $y_{it} = \mu + \alpha_i + \tau_t$, making location a crossed effect instead of a nested effect. This model is equivalent to selecting $l$ random location effects and crossing them with all parts. A problem with the model as defined above is that the difference between the true value of a location and the corresponding part mean is forced to be constant ($= \tau_t$). In practice, it is difficult to imagine an application where this would be true, especially in the chemical and process industry domain. Fortunately, using a crossed design allows for an interaction between part and location. Hence the true model can be described as
\[
y_{it} = \mu + \alpha_i + \tau_t + \alpha\tau_{it} \tag{3.24}
\]
This model is closer to the experimental scenario described by Mitchell et al. (1997), as it takes into account the natural variability that will exist despite using the same locations on each part. The corresponding EMS are given in Table 3.6.

Table 3.6: Modified Expected Mean Squares for Stage 1

Source         DF              MS         EMS
Part (P)       $n-1$           $MS_p$     $\sigma_{pl}^2 + l\sigma_p^2$
Location (L)   $l-1$           $MS_l$     $\sigma_{pl}^2 + n\sigma_l^2$
P x L          $(n-1)(l-1)$    $MS_{pl}$  $\sigma_{pl}^2$

The EMS in Table 3.4, based on the nested model, are related to the EMS shown here as shown in Eq(3.25):
\[
MS_{l(p)} = \frac{SS_l + SS_{pl}}{(l-1) + (n-1)(l-1)} \tag{3.25}
\]
where SS indicates Sum of Squares, subscripts have their usual meaning, and the denominator $(l-1) + (n-1)(l-1) = n(l-1)$ is the nested-model degrees of freedom. The part-to-part variation can be expressed as $\sigma_p^2 = \frac{MS_p - MS_{pl}}{l}$ and the location-to-location variation as $\sigma_l^2 = \frac{MS_l - MS_{pl}}{n}$. Equipment variation, under this model, will be confounded with the part-by-location interaction.
It is important to assess the impact of these changes to the Stage 1 model on the model for Stage 2. Since location effects are considered to be crossed, it may seem that each measurement for Stage 2 should be expressed as $y_{ij} = \mu + \beta_j + \alpha_{i(j)} + \tau_t$. But since only one (and the same) location is being measured on each part ($t = 1$) in all cases, $\tau_t = \tau_1$ is a constant. Hence $\mu$, the unknown constant, will automatically account for it. Since the $n$ parts for each appraiser were chosen separately at random, it is correct to treat part as a random factor nested within appraisers. No changes are required to the model in Stage 2.

Independent of the discussion above, consider the fact that Mitchell et al. (1997) state that in Stage 1, equipment variation is confounded with location-to-location variation. The argument that can be made in favor of this statement is that measured values of various locations on the same part are different not only because true location dimensions are different (location effect) but also due to the inconsistency of the measuring device. However, it can just as easily be argued that equipment variation is confounded with part-to-part variation: measured values at a particular location across different parts differ not only due to differing part effects (and location effects) but also due to an inconsistent measuring device. In fact, this is the exact argument made in Stage 2 to claim the confounding of equipment variation with part-to-part variation; only the location effect is ignored in Stage 2, as explained above. The suggested modification to the Stage 1 model resolves this ambiguity by recognizing that EV is essentially confounded with the part-by-location interaction and not with either the part effect or the location effect. This is also consistent with the assumption of Stage 2 that EV is confounded with PV because, by using only one location, the interaction manifests itself as PV.

The ideas presented here will be tested using simulation. Data will be simulated in a manner consistent with Eq(3.24) for Stage 1 and Eq(3.21) for Stage 2. The VCs will be estimated using both the approach given by Mitchell et al. (1997) and the modification proposed above. Statistical tests will be performed to test the hypotheses that the modified approach is significantly different from the old approach and that the VCs estimated using both approaches are, on average, equal to the quantities being estimated. A designed experiment will be simulated to evaluate the robustness of these hypotheses across various combinations of PV, LV, EV and the PL interaction.
3.6 Comparing Measurement System Acceptability Criteria

Among all metrics that assess measurement system acceptability, the precision-to-tolerance ratio (PTR) is the only one that takes into account the specification range, or tolerance, of the product. While this makes PTR a unique metric, it introduces the potential for inconsistency in conclusions when compared with other metrics. The tolerance of a process and its variation have a definite link through the sigma capability of the process. For example, in a Six-Sigma capable process, if the process standard deviation is given by $\sigma_p$, the specs are $\mu \pm 6\sigma_p$, where $\mu$ is the process mean. Hence the tolerance is $12\sigma_p$ and the process span is $6\sigma_p$ ($\mu \pm 3\sigma_p$). The convention of a $1.5\sigma$ shift in the process mean is inconsequential to this discussion. If the sigma capability of a process is given by $s$, then the tolerance of the process can be calculated as $2s\sigma_p$.

The sigma capability of the process is used as a common platform for comparison of these techniques. An arbitrary process standard deviation, say $\sigma_p$, will be selected. The sigma capability of the process will then be used to calculate tolerance values. The measurement standard deviation will then be systematically varied from extremely low values to values larger than $\sigma_p$. For each scenario all metrics will be calculated, thus revealing any systematic patterns or inconsistencies. The implementation of this part will be in Microsoft Excel.
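Although the comparison in this research is implemented in Excel, the sweep can be sketched in Python. In the sketch below, the PTR form $6\sigma_m/\text{tolerance}$ is one common convention (a multiplier of 5.15 is also used in the literature), and the grid of measurement SDs is arbitrary:

```python
import math

sigma_p, s = 1.0, 6                # arbitrary process SD; Six-Sigma capable process
tolerance = 2 * s * sigma_p        # tolerance = 2 s sigma_p, i.e. 12 sigma_p here

rows = []
for sigma_m in [0.05, 0.1, 0.3, 0.5, 1.0, 1.5]:
    total_sd = math.sqrt(sigma_p ** 2 + sigma_m ** 2)   # variances, not SDs, add
    ptr = 6 * sigma_m / tolerance                       # precision-to-tolerance
    pct_mv = 100 * sigma_m / total_sd                   # %MV against total SD
    rows.append((sigma_m, ptr, pct_mv))
```

Tabulating `rows` for several sigma capabilities is exactly the kind of systematic sweep described above: PTR scales linearly with $\sigma_m$, while %MV saturates, which is one source of the inconsistencies the comparison is meant to expose.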
Chapter 4
The Simulation Process
Chapter Overview
The validity of the ideas presented in the previous chapter is tested through Monte Carlo simulations of the measurement process, using MATLAB as the programming language. This chapter walks the reader through the simulation process by illustrating the output of these programs at each step. This acts not only as a justification of the approach used but also allows the reader to gain an appreciation for the analysis conducted and the conclusions drawn in the next chapter.

4.1 Introduction

There are four distinct areas into which this research can be divided: MSA with within-appraiser variation, MSA using multiple devices, MSA for destructive testing, and comparison of measurement system acceptability criteria. The software developed essentially consists of three different programs, addressing the first three areas, combined into a single application. This section provides a brief overview of each of these, and the subsequent sections discuss each program in more detail.
of within-appraiser variation. The latter estimates lower bounds for within-appraiser vari-
ation and upper bounds for equipment variation using two di®erent approaches discussed
in Chapter 3. Range-based estimates of variance components are typically considered to
be inferior to ANOVA-based estimates. This program calculates both, allowing for a
comparison between the two. In addition, this program calculates discrimination ratio
and number of distinct data categories and compares them. It also calculates Appraiser
Variation using a di®erent approach suggested by Montgomery and Runger (1994a).
The second program works under the assumption that the measurement process is
destructive. It tests and extends the approach suggested by Mitchell et al. (1997).
The third program extends the traditional model to include multiple measuring
devices. It uses a three-way random effects model (instead of a traditional two-way
model), hence adding a component of variation.
The general approach used to generate data in these programs is very similar across
the three and is shown in the form of a flow chart in Fig(4.1). It starts by taking an arbitrary
part mean (input from the user) and generating true part dimensions by adding part-to-part
variation to it. Next, each component of variation (appraiser, replication, equipment,
etc.) is added to the data in a stepwise manner. When the data are ready, in the form that
would be available if true part dimensions were not known, various estimation approaches
are used to estimate the variance components of interest. Eventually these estimates will
be compared to the true values being estimated.
Figure 4.1 shows the simulation process as a flow chart. Its steps are as follows.
1. START: take the maximum number of simulations to be run (maxSim) as input from the user; let the current simulation be i.
2. Take the part mean (μ) and the part-to-part SD (σ_p) as input from the user and generate n parts such that each part is μ + z1σ_p.
3. Take the measurement SDs (σ_a, σ_e) as input from the user and add measurement variation to each part, so that each part is now μ + z1σ_p + z2σ_a + z3σ_e.
4. Set the current simulation i = i + 1. If i < maxSim, return to step 2.
5. Otherwise, perform statistical tests to compare the improved estimates with the traditional ones and test the effectiveness of both; END.
Figure 4.1: Flow chart of the simulation process
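One pass through this loop can be sketched in a few lines. This is a minimal Python illustration only; the thesis programs are written in MATLAB, and all names and parameter values below are assumptions of the sketch, not the actual code.

```python
import random

def simulate_measurements(mu, sd_p, sd_a, sd_e, n_parts, k_appraisers, r_reps, seed=0):
    """One simulation run: true parts first, then appraiser bias and
    replication error added stepwise, as in the flow chart."""
    rng = random.Random(seed)
    parts = [mu + rng.gauss(0, sd_p) for _ in range(n_parts)]      # mu + z1*sd_p
    biases = [rng.gauss(0, sd_a) for _ in range(k_appraisers)]     # z2*sd_a
    # Columns: appraiser 1 rep 1, appraiser 1 rep 2, appraiser 2 rep 1, ...
    return [[p + b + rng.gauss(0, sd_e) for b in biases for _ in range(r_reps)]
            for p in parts]

data = simulate_measurements(50, 2, 1, 1.5, n_parts=10, k_appraisers=3, r_reps=2)
# 10 rows (parts) by 6 columns (3 appraisers x 2 replications)
```

In a full run this generation step would sit inside the maxSim loop, with the estimation and comparison steps applied to each generated data set.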
4.2 MSA and Within-Appraiser Variation
This experiment was simulated using 10 parts, 3 appraisers and 2 replications. The
appropriate linear model for this scenario is given by Eq(3.11) as y_ijm = μ + α_i + β_j +
(αβ)_ij + γ_ijm + ν_ijm. The presence of the interaction component is not critical to the
ideas being demonstrated here and will be ignored for this example. Based on a part
mean of 50 (μ) and a standard deviation of 2, the following ten parts were generated such
that each part is 50 + 2·N(0, 1²).
Each number in this vector represents μ + α_i. These ten parts will be the basis for
all future calculations in this example. To correspond with the number of appraisers and
replications, this vector is rearranged as follows.
trueParts=
48.7528
51.9829
48.6799
41.6799
44.8551
51.2155
43.7170
48.2641
45.2931
44.1272
truePartMatrix =
48.7528 48.7528 48.7528 48.7528 48.7528 48.7528
51.9829 51.9829 51.9829 51.9829 51.9829 51.9829
48.6799 48.6799 48.6799 48.6799 48.6799 48.6799
41.6799 41.6799 41.6799 41.6799 41.6799 41.6799
44.8551 44.8551 44.8551 44.8551 44.8551 44.8551
51.2155 51.2155 51.2155 51.2155 51.2155 51.2155
43.7170 43.7170 43.7170 43.7170 43.7170 43.7170
48.2641 48.2641 48.2641 48.2641 48.2641 48.2641
45.2931 45.2931 45.2931 45.2931 45.2931 45.2931
44.1272 44.1272 44.1272 44.1272 44.1272 44.1272
In the table above, the first two columns will eventually correspond to two replications
by the first appraiser on ten parts, the next two columns to two replications by the
second appraiser on all ten parts, and so on.
A bias for each of the three appraisers is calculated as a normal random variable with
mean zero and a standard deviation provided as an input by the user. These biases (β_j),
for this example, are
aBias=
-2.0423 -0.8033 0.3473
The following matrix is an elementary transformation of aBias to correspond with
the format of "truePartMatrix" shown above.
aBiases=
-2.0423 -2.0423 -0.8033 -0.8033 0.3473 0.3473
-2.0423 -2.0423 -0.8033 -0.8033 0.3473 0.3473
-2.0423 -2.0423 -0.8033 -0.8033 0.3473 0.3473
-2.0423 -2.0423 -0.8033 -0.8033 0.3473 0.3473
-2.0423 -2.0423 -0.8033 -0.8033 0.3473 0.3473
-2.0423 -2.0423 -0.8033 -0.8033 0.3473 0.3473
-2.0423 -2.0423 -0.8033 -0.8033 0.3473 0.3473
-2.0423 -2.0423 -0.8033 -0.8033 0.3473 0.3473
-2.0423 -2.0423 -0.8033 -0.8033 0.3473 0.3473
-2.0423 -2.0423 -0.8033 -0.8033 0.3473 0.3473
The flexibility in these transformations is obtained by using an elementary transformation
matrix, denoted by E1 in the program. E1 is obtained by taking the Kronecker
product I_k ⊗ j′_r, where I_k is the k × k identity matrix, k is the number of appraisers,
j′_r is a 1 × r row vector of ones, and r is the number of replications. The Kronecker
product is an operator, denoted by ⊗, that takes two matrix arguments of arbitrary sizes.
If A is an m × n matrix and B is an r × s matrix, then A ⊗ B is an mr × ns matrix,
formed from A by multiplying each element of A by the entire matrix B and putting the
result in the place of that element of A. For a formal definition, properties, and uses of
Kronecker products, the reader is referred to Christensen (2002) or Graybill (1983).
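The construction of E1 and its application to aBias can be illustrated with a minimal pure-Python sketch. The function and variable names are assumptions of the sketch; the aBias values are those shown above.

```python
def kron(A, B):
    """Kronecker product A (x) B: replace each element a_ij of A
    by the block a_ij * B."""
    m, n = len(A), len(A[0])
    r, s = len(B), len(B[0])
    return [[A[i // r][j // s] * B[i % r][j % s] for j in range(n * s)]
            for i in range(m * r)]

k, r = 3, 2                                    # appraisers, replications
I_k = [[int(i == j) for j in range(k)] for i in range(k)]
ones_row = [[1] * r]                           # j'_r, a 1 x r row of ones
E1 = kron(I_k, ones_row)                       # 3 x 6 pattern matrix
aBias = [-2.0423, -0.8033, 0.3473]             # appraiser biases from the text
# aBias * E1 spreads each bias across that appraiser's replication columns.
row = [sum(aBias[t] * E1[t][c] for t in range(k)) for c in range(k * r)]
print(row)   # [-2.0423, -2.0423, -0.8033, -0.8033, 0.3473, 0.3473]
```

Each row of the transformed matrix "aBiases" is simply this expanded row repeated once per part.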
For the purposes of this example, within-appraiser variation has been included (the
user has the option to ignore it). As shown earlier, replication error is the sum of within-
equipment variation and average within-appraiser variation. The user inputs a value of
replication error, expressed as a standard deviation, to be used in the simulation. This
replication error, after conversion to a variance, is split between equipment and within-
appraiser variation based on a fraction f such that σ²_e = (1 − f)σ². The default for
f is 0.5, which means the replication error is divided equally between its two components.
This default value has been retained for this example. The fraction attributed to within-
appraiser variation is used as an average, and the standard deviation for each appraiser is
calculated as a uniform random variable with variance V. To simulate within-appraiser
variation, a deviation is added to each measurement made by an appraiser based on that
appraiser's corresponding standard deviation. The matrix "wiav" shows these values in a format that
corresponds with "truePartMatrix". These values represent γ_ijm. If the user chooses
to use the traditional model, which does not account for within-appraiser variation, these
values will all be zeros.
wiav =
-0.4544 1.0205 3.8563 0.9547 0.0415 3.2377
4.1638 -0.0516 -2.7213 0.3244 3.8858 3.7676
-0.9602 -2.2705 0.4105 5.6130 -3.3534 2.9544
-5.9380 8.3592 -0.4223 -0.8355 -3.3497 6.0825
0.0381 -1.0080 -3.5332 6.7195 -0.9340 1.5187
0.2793 -5.5154 3.5950 4.5814 -0.0950 -3.1246
1.2386 6.9263 -0.0469 -5.9072 1.2948 -0.6933
1.9558 1.2738 1.6285 -5.1038 4.9135 -11.3703
5.0011 -4.3787 -2.1758 -1.7418 2.8863 2.1974
-2.1436 2.4274 -1.9909 -0.5643 -8.1278 0.5430
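The split of the replication error described above can be sketched as follows. This is a minimal Python illustration; the uniform half-width √(3V) implied by "variance V", and all parameter values, are assumptions of the sketch.

```python
import math
import random

def split_replication_error(sigma_rep, f=0.5, k=3, V=0.1, seed=1):
    """Split replication variance into an equipment share and an average
    within-appraiser share, then draw one per-appraiser variance."""
    rng = random.Random(seed)
    var_rep = sigma_rep ** 2
    var_e = (1 - f) * var_rep            # equipment share: (1 - f) * sigma^2
    var_wia_avg = f * var_rep            # average within-appraiser share
    # A uniform on [a - h, a + h] has variance h^2 / 3, so h = sqrt(3V).
    h = math.sqrt(3 * V)
    wia_vars = [var_wia_avg + rng.uniform(-h, h) for _ in range(k)]
    return var_e, wia_vars

var_e, wia_vars = split_replication_error(sigma_rep=2.0)
print(var_e)   # (1 - 0.5) * 4 = 2.0
```

With f = 0.5, as in this example, the equipment and average within-appraiser variances are equal.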
Similarly, replication errors (ν_ijm), calculated based on the standard deviation input
by the user, are shown in the matrix "ev".
ev=
-1.1822 2.5851 2.1678 -2.2490 -0.8936 -1.3398
-1.3094 0.8818 -1.9624 3.4714 2.1642 2.6819
-2.1613 2.5619 -1.3770 3.8749 4.7453 0.7762
-0.0955 -0.9955 2.6790 3.2701 0.4586 0.7861
0.7587 -2.2374 -1.8185 -2.5119 -0.5332 -3.4147
-0.6607 1.6153 -0.8257 -0.4271 1.4033 0.4557
-0.9998 0.0824 -1.0123 -0.3979 -0.9752 1.3713
-0.0720 -1.5124 3.2395 0.6150 3.7250 -1.2736
-0.3495 -0.1783 0.1618 -1.1447 2.2137 -2.0052
-1.9145 -4.0177 -2.1621 -1.9553 -2.4551 -0.3712
The final measurements, calculated as the sum of truePartMatrix, aBiases, wiav,
and the replication errors in ev, are shown as the matrix "measurements". These measurements
represent μ + α_i + β_j + γ_ijm + ν_ijm. They can now be used for estimating various
components of variance using both range-based and ANOVA-based approaches. In
measurements=
45.0740 50.3162 53.9737 46.6552 48.2480 50.9980
52.7950 50.7709 46.4959 54.9753 58.3802 58.7796
43.5161 46.9290 46.9102 57.3646 50.4192 52.7579
33.6042 47.0014 43.1333 43.3113 39.1361 48.8959
43.6096 39.5674 38.7001 48.2595 43.7352 43.3064
48.7917 45.2731 53.1814 54.5665 52.8711 48.8939
41.9136 48.6835 41.8545 36.6086 44.3840 44.7423
48.1056 45.9832 52.3287 42.9719 57.2498 35.9675
47.9024 38.6939 42.4758 41.6033 50.7405 45.8326
38.0268 40.4946 39.1708 40.8042 33.8916 44.6462
addition to estimation, range-based estimates will be compared with ANOVA-based
estimates; wherever improved estimates have been derived, these will be compared with the
traditional estimates; and all estimates will be tested for effectiveness. This analysis can
be found in Chapter 5.
4.3 MSA Using Multiple Devices
This experiment was simulated using 5 parts, 2 measurement devices, 2 appraisers and
2 replications. The appropriate model for this scenario is given by Eq(3.18) as y_ijlm =
μ + α_i + β_j + ω_l + (αβ)_ij + (αω)_il + (βω)_jl + (αβω)_ijl + γ_ijlm + ν_ijlm.
Based on a part mean of 100 (μ) and a standard deviation of 2, as supplied through
user input, 5 parts were generated such that each part is 100 + 2·N(0, 1²); they are
shown as the vector "trueParts". To correspond with the number of appraisers, equipment
trueParts =
98.7722
100.7510
102.9358
100.5557
100.6055
and replications, the above vector is modified as shown by "truePartMatrix". The part,
truePartMatrix=
98.7722 98.7722 98.7722 98.7722 98.7722 98.7722 98.7722 98.7722
100.7510 100.7510 100.7510 100.7510 100.7510 100.7510 100.7510 100.7510
102.9358 102.9358 102.9358 102.9358 102.9358 102.9358 102.9358 102.9358
100.5557 100.5557 100.5557 100.5557 100.5557 100.5557 100.5557 100.5557
100.6055 100.6055 100.6055 100.6055 100.6055 100.6055 100.6055 100.6055
appraiser and equipment associated with any cell in this matrix can be found from the
corresponding cell in each of the matrices, \parts", \appr" and \eq". Columns 1 & 2
represent two replications by appraiser 1 using equipment 1; columns 3 & 4 represent
replications by appraiser 2 using equipment 1; columns 5 & 6 correspond to replications
by appraiser 1 using equipment 2; and the last two columns represent replications by
appraiser 2 using equipment 2. A bias for each of the 2 appraisers is calculated as
parts = appr =
1 1 1 1 1 1 1 1 1 1 2 2 1 1 2 2
2 2 2 2 2 2 2 2 1 1 2 2 1 1 2 2
3 3 3 3 3 3 3 3 1 1 2 2 1 1 2 2
4 4 4 4 4 4 4 4 1 1 2 2 1 1 2 2
5 5 5 5 5 5 5 5 1 1 2 2 1 1 2 2
eq =
1 1 1 1 2 2 2 2
1 1 1 1 2 2 2 2
1 1 1 1 2 2 2 2
1 1 1 1 2 2 2 2
1 1 1 1 2 2 2 2
normal random variables with mean zero and a standard deviation provided as an input
by the user. These biases (β_j) are, for this example, given by "aBias".
An elementary transformation of this matrix is used so that it corresponds with the
format of "truePartMatrix". The transformation matrix is obtained as E1 = [A A], where
A = I_k ⊗ j′_r, I_k is the k × k identity matrix, k is the number of appraisers, j′_r is a
1 × r row vector of ones, and r is the number of replications. For this example, E1 is shown
below.
aBias= E1 =
-1.1236 -1.0313 1 1 0 0 1 1 0 0
0 0 1 1 0 0 1 1
The transformation of aBias obtained using E1 is shown below as aBiases.
aBiases =
-1.1236 -1.1236 -1.0313 -1.0313 -1.1236 -1.1236 -1.0313 -1.0313
-1.1236 -1.1236 -1.0313 -1.0313 -1.1236 -1.1236 -1.0313 -1.0313
-1.1236 -1.1236 -1.0313 -1.0313 -1.1236 -1.1236 -1.0313 -1.0313
-1.1236 -1.1236 -1.0313 -1.0313 -1.1236 -1.1236 -1.0313 -1.0313
-1.1236 -1.1236 -1.0313 -1.0313 -1.1236 -1.1236 -1.0313 -1.0313
A bias for each equipment is calculated using the equipment-to-equipment variation
given as an input by the user. For this example, these biases (ω_l) are given by the vector
eqBias, shown below.
Consider another transformation matrix E2, obtained as the Kronecker product
I_q ⊗ j′_kr, where the symbols have their usual meaning, j′_kr is a 1 × kr row vector of
ones, and q is the number of measuring devices used in the experiment.
eqBias= E2 =
4.6504 -0.5033 1 1 1 1 0 0 0 0
0 0 0 0 1 1 1 1
Using E2, eqBias is transformed so as to match the format of \truePartMatrix".
The next step is to simulate 2-way and 3-way interactions. To understand how
interaction is simulated, recall that the part effect α_i was simulated such that the effect
changed whenever i changed, but for a given value of i the effect remained the same
regardless of the values of j, m, and l. In other words, as seen in "truePartMatrix", the
eqBiases =
4.6504 4.6504 4.6504 4.6504 -0.5033 -0.5033 -0.5033 -0.5033
4.6504 4.6504 4.6504 4.6504 -0.5033 -0.5033 -0.5033 -0.5033
4.6504 4.6504 4.6504 4.6504 -0.5033 -0.5033 -0.5033 -0.5033
4.6504 4.6504 4.6504 4.6504 -0.5033 -0.5033 -0.5033 -0.5033
4.6504 4.6504 4.6504 4.6504 -0.5033 -0.5033 -0.5033 -0.5033
effect changes as the part changes, but for a given part it remains constant regardless of the
equipment, appraiser or replication associated with it.
Hence the part-by-appraiser interaction effect ((αβ)_ij) should be simulated such that
it changes with a change in part (subscript i) or appraiser (subscript j) but is
unaffected by a change in equipment (subscript l) or replication (subscript m). Hence
an n × k matrix of normal random variables was created using the part-by-appraiser
interaction standard deviation provided as an input to the simulation. It is then reorganized
using the transformation matrix E1 to create the interaction matrix paInt.
paInt =
-0.0620 -0.0620 0.5149 0.5149 -0.0620 -0.0620 0.5149 0.5149
-0.6349 -0.6349 -1.2682 -1.2682 -0.6349 -0.6349 -1.2682 -1.2682
1.0488 1.0488 0.5651 0.5651 1.0488 1.0488 0.5651 0.5651
-1.2179 -1.2179 0.1425 0.1425 -1.2179 -1.2179 0.1425 0.1425
0.4174 0.4174 -0.6530 -0.6530 0.4174 0.4174 -0.6530 -0.6530
Similarly, the part-by-equipment interaction effect ((αω)_il) and the appraiser-by-equipment
interaction effect ((βω)_jl) matrices were generated for this example, as shown in peInt and
aeInt. The matrix peInt was transformed using the elementary transformation matrix
E2, whereas aeInt was transformed using a new transformation matrix E3, shown below.
Similarly, the 3-way interaction ((αβω)_ijl) is simulated and transformed using E3 to
create paeInt.
The final measurements are calculated as the sum of truePartMatrix, aBiases, eqBiases,
paInt, peInt, aeInt, paeInt, and repErrors and are shown below as the matrix
peInt =
2.1053 2.1053 2.1053 2.1053 -0.8943 -0.8943 -0.8943 -0.8943
-0.5349 -0.5349 -0.5349 -0.5349 0.4797 0.4797 0.4797 0.4797
-0.4735 -0.4735 -0.4735 -0.4735 -0.0505 -0.0505 -0.0505 -0.0505
-1.2496 -1.2496 -1.2496 -1.2496 -1.0313 -1.0313 -1.0313 -1.0313
-1.5530 -1.5530 -1.5530 -1.5530 -0.5728 -0.5728 -0.5728 -0.5728
aeInt =
0.7654 0.7654 0.4799 0.4799 0.0134 0.0134 -0.4845 -0.4845
0.7654 0.7654 0.4799 0.4799 0.0134 0.0134 -0.4845 -0.4845
0.7654 0.7654 0.4799 0.4799 0.0134 0.0134 -0.4845 -0.4845
0.7654 0.7654 0.4799 0.4799 0.0134 0.0134 -0.4845 -0.4845
0.7654 0.7654 0.4799 0.4799 0.0134 0.0134 -0.4845 -0.4845
E3 =
1 1 0 0 0 0 0 0
0 0 1 1 0 0 0 0
0 0 0 0 1 1 0 0
0 0 0 0 0 0 1 1
\measurements".
paeInt =
-0.8813 -0.8813 -0.6358 -0.6358 -1.6277 -1.6277 0.0162 0.0162
-0.6440 -0.6440 0.6634 0.6634 0.4890 0.4890 -0.2430 -0.2430
-1.5190 -1.5190 1.6799 1.6799 -0.4235 -0.4235 1.9121 1.9121
0.0399 0.0399 1.2018 1.2018 1.7426 1.7426 -1.0231 -1.0231
0.6737 0.6737 0.0116 0.0116 0.7866 0.7866 -1.4837 -1.4837
repErrors =
-1.3700 -0.8832 -0.6980 2.5090 0.5900 2.1694 -1.3531 -3.7538
-1.3695 -1.3865 -1.5963 -1.7533 -2.3388 -0.4037 1.4838 -2.7851
-2.7193 -0.5678 2.4035 1.3479 0.2512 1.0028 -1.8466 1.0009
1.3186 -0.5846 -1.6701 0.0965 -2.3603 -0.7547 -0.7026 0.8232
-0.1622 -2.5199 0.9650 1.4954 2.3829 -2.8814 -2.6791 0.9175
Referring to the underlying model described earlier, this represents μ + α_i + β_j + ω_l +
(αβ)_ij + (αω)_il + (βω)_jl + (αβω)_ijl + ν_ijlm. The within-appraiser effect γ_ijlm is
ignored in this example. These measurements can now be used for estimating the various
components of variance, as discussed in the next chapter.
measurements =
102.8564 103.3431 104.1577 107.3647 95.1647 96.7440 95.0368 92.6361
101.8599 101.8429 102.1141 101.9570 97.1326 99.0677 99.1843 94.9154
103.5648 105.7163 111.2098 110.1541 102.1481 102.8998 101.4968 104.3442
103.7389 101.8357 103.0794 104.8460 96.0752 97.6808 95.9222 97.4480
104.2736 101.9158 103.4752 104.0055 102.0062 96.7419 93.1979 96.7945
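The column-expansion scheme used throughout this section can be sketched in Python as follows. This is a simplified illustration with the interaction terms omitted for brevity; the function names, seed, and standard deviations are assumptions of the sketch, not the thesis's MATLAB code.

```python
import random

def kron_row(identity_size, ones_len):
    """Rows of I_s (x) j'_r: row t has ones in columns t*r .. t*r + r - 1."""
    return [[1 if c // ones_len == t else 0
             for c in range(identity_size * ones_len)]
            for t in range(identity_size)]

def expand(effects, E):
    """Spread one effect per level across columns using a 0/1 pattern
    matrix E (one row per level)."""
    return [sum(effects[t] * E[t][c] for t in range(len(effects)))
            for c in range(len(E[0]))]

rng = random.Random(7)
n, k, q, r = 5, 2, 2, 2                      # parts, appraisers, devices, reps
mu, sd_p, sd_a, sd_q, sd_e = 100, 2, 1, 2, 1.5

E1 = [row + row for row in kron_row(k, r)]   # [A A], appraiser pattern
E2 = kron_row(q, k * r)                      # I_q (x) j'_kr, equipment pattern
alphas = [rng.gauss(0, sd_p) for _ in range(n)]               # part effects
betas = expand([rng.gauss(0, sd_a) for _ in range(k)], E1)    # appraiser biases
omegas = expand([rng.gauss(0, sd_q) for _ in range(q)], E2)   # equipment biases
measurements = [[mu + alphas[i] + betas[c] + omegas[c] + rng.gauss(0, sd_e)
                 for c in range(q * k * r)] for i in range(n)]
```

The interaction matrices paInt, peInt, aeInt, and paeInt would be generated the same way, by expanding a small matrix of normal draws through E1, E2, or E3 before summing.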
4.4 MSA for Destructive Testing
The program simulating MSAD uses the term "unit" instead of "part" to keep the
application more general for use in other industries, such as the chemical, food and other process
industries. The concept of distinct parts is not applicable to these industries, whereas a
unit is a more general entity. Depending on the context, a unit could be a barrel of
chemical/oil, a truckload of chemical/oil, or a grab sample from an ongoing production
process.
The results demonstrated here are from a single run of this experiment, simulated
with 10 units, 5 locations per unit, and 1 appraiser in Stage 1, and 3 appraisers,
10 units per appraiser, and 1 location per unit in Stage 2.
The matrix of units in Stage 1 is shown below, such that rows represent units and
columns represent locations. Each row represents μ + α_i.
unitsS1=
101.1062 101.1062 101.1062 101.1062 101.1062
103.9223 103.9223 103.9223 103.9223 103.9223
100.8282 100.8282 100.8282 100.8282 100.8282
94.8955 94.8955 94.8955 94.8955 94.8955
104.0019 104.0019 104.0019 104.0019 104.0019
98.5263 98.5263 98.5263 98.5263 98.5263
100.1815 100.1815 100.1815 100.1815 100.1815
98.6017 98.6017 98.6017 98.6017 98.6017
97.3815 97.3815 97.3815 97.3815 97.3815
101.7871 101.7871 101.7871 101.7871 101.7871
Next, a bias for each location is generated, representing τ_t, and is shown below as
the matrix "locations".
locations=
3.4521 -1.0078 0.3488 -2.4465 0.061
3.4521 -1.0078 0.3488 -2.4465 0.061
3.4521 -1.0078 0.3488 -2.4465 0.061
3.4521 -1.0078 0.3488 -2.4465 0.061
3.4521 -1.0078 0.3488 -2.4465 0.061
3.4521 -1.0078 0.3488 -2.4465 0.061
3.4521 -1.0078 0.3488 -2.4465 0.061
3.4521 -1.0078 0.3488 -2.4465 0.061
3.4521 -1.0078 0.3488 -2.4465 0.061
3.4521 -1.0078 0.3488 -2.4465 0.061
The unit-by-location interaction was simulated in a way similar to how interactions have
been simulated thus far. The values for the interaction effect, represented by (ατ)_it, are
shown below as "luInt".
luInt=
-0.3042 -1.6417 2.1093 1.2994 4.5557
1.3936 -0.6333 -1.7938 1.6452 -3.6292
-0.5294 0.0174 -1.0024 3.1723 -0.1666
-0.8656 0.3828 -0.9800 -0.4100 -4.6867
2.8739 0.4443 -3.2569 -1.4796 1.3193
-1.7767 -3.0632 0.5471 0.6830 -0.7703
0.9569 0.8673 -0.0083 1.9801 -1.8034
3.1287 -2.6568 -2.4960 0.3854 0.2352
-1.0183 1.0056 3.6519 -0.0756 -2.8437
-0.9384 0.0052 1.5422 -4.7941 4.0260
Equipment variation for stage 1 calculated based on user input is shown below as
EVS1.
EVS1=
-3.2135 -2.9833 1.5153 0.2675 -0.8581
-4.1411 -1.7613 5.6944 5.8495 0.6183
1.3406 1.4180 -0.5290 -2.7673 7.7009
-0.3689 1.4319 -2.3633 -2.8723 -0.1880
-0.7772 2.7384 2.9627 1.4757 3.1118
-6.7977 -0.1874 5.3360 0.2882 -1.7879
-0.4256 2.0502 -0.5807 -0.9225 -0.2151
-2.3654 3.3084 1.1593 0.8366 -0.8703
-2.0738 -4.6449 4.9490 1.9353 0.9668
0.6320 -7.2807 2.7138 2.9022 1.8614
The measurement data for Stage 1 are calculated as dataS1 = unitsS1 + locations +
luInt + EVS1 and are shown below. The matrix dataS1 represents y_it = μ + α_i + τ_t + (ατ)_it + ε_it.
Now, for Stage 2, 30 different units are used, 10 per appraiser. These values, for the
same unit mean and unit-to-unit variance as Stage 1, are shown as unitsS2.
dataS1=
101.0407 95.4735 105.0795 100.2267 104.8649
104.6269 100.5199 108.1716 108.9705 100.9723
105.0915 101.2559 99.6456 98.7868 108.4235
97.1132 95.7025 91.901 89.1668 90.0818
109.5508 106.1768 104.0565 101.5515 108.494
93.4041 94.2679 104.7581 97.0511 96.0292
104.165 102.0912 99.9413 98.7927 98.2241
102.8172 98.2455 97.6137 97.3772 98.0276
97.7415 92.7344 106.3311 96.7947 95.5657
104.9328 93.5037 106.3918 97.4487 107.7354
unitsS2=
95.4994 98.2792 102.3741
99.9099 97.2776 102.0766
101.6439 100.8262 103.0069
99.1996 103.179 95.8448
97.2365 100.3442 98.4453
99.0923 99.9631 99.82
96.7974 99.7633 95.5492
93.2081 100.4245 102.9516
96.3031 101.8382 99.6154
104.9658 101.0014 104.6096
The biases of each appraiser are shown below as apprBiases and the equipment
variation for this stage as EVS2.
apprBiases= EVS2=
4.3103 -1.5316 -3.1774 -7.5696 -4.0317 5.8562
4.3103 -1.5316 -3.1774 2.4576 -2.301 -3.0336
4.3103 -1.5316 -3.1774 2.4576 -2.301 -3.0336
4.3103 -1.5316 -3.1774 0.8135 1.0809 3.0664
4.3103 -1.5316 -3.1774 1.6084 6.299 0.3225
4.3103 -1.5316 -3.1774 -1.2671 -1.3991 6.2993
4.3103 -1.5316 -3.1774 -1.8971 -2.3282 2.002
4.3103 -1.5316 -3.1774 0.3366 5.6315 0.4305
4.3103 -1.5316 -3.1774 -3.7478 -3.3536 -1.9964
4.3103 -1.5316 -3.1774 -3.9796 0.4488 4.1241
The measurement data for Stage 2 are calculated as dataS2 = unitsS2 + apprBiases
+ EVS2 and are shown below. The matrix dataS2 represents y_ij = μ + β_j + α_i(j) + ε_ij.
The measurement data in dataS1 and dataS2 can now be used to estimate the EV
and AV. This will be discussed in the next chapter.
dataS2=
92.24 92.7159 105.053
104.5486 93.8848 94.9325
108.4117 96.9937 96.7959
104.3233 102.7283 95.7338
103.1551 105.1117 95.5904
102.1355 97.0324 102.9419
99.2106 95.9035 94.3739
97.8549 104.5245 100.2046
96.8656 96.953 94.4416
105.2964 99.9187 105.5563
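The two-stage data generation just described can be sketched as follows. This is a simplified Python illustration; the standard deviations, names, and seeds are assumptions of the sketch.

```python
import random

def destructive_stage1(mu, sd_u, sd_loc, sd_int, sd_e, n_units, n_locs, seed=0):
    """Stage 1: each unit measured (destructively) at several locations."""
    rng = random.Random(seed)
    units = [rng.gauss(0, sd_u) for _ in range(n_units)]       # alpha_i
    locs = [rng.gauss(0, sd_loc) for _ in range(n_locs)]       # tau_t
    return [[mu + units[i] + locs[t]
             + rng.gauss(0, sd_int)                            # (alpha tau)_it
             + rng.gauss(0, sd_e)                              # equipment error
             for t in range(n_locs)] for i in range(n_units)]

def destructive_stage2(mu, sd_u, sd_a, sd_e, n_units, k_appraisers, seed=1):
    """Stage 2: fresh units, one location each, one column per appraiser."""
    rng = random.Random(seed)
    biases = [rng.gauss(0, sd_a) for _ in range(k_appraisers)]  # beta_j
    return [[mu + rng.gauss(0, sd_u)                            # alpha_i(j)
             + biases[j] + rng.gauss(0, sd_e)
             for j in range(k_appraisers)] for _ in range(n_units)]

s1 = destructive_stage1(100, 3, 2, 2, 3, n_units=10, n_locs=5)
s2 = destructive_stage2(100, 3, 4, 3, n_units=10, k_appraisers=3)
```

Note that Stage 2 must use fresh units: because the testing is destructive, no unit from Stage 1 can be remeasured.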
Chapter 5
Analyzing the Results
Chapter Overview
The previous chapter focused on explaining the approach used to generate the data
used for analysis under various scenarios. This chapter takes those data, conducts
the analysis, and discusses the results in detail. These results are for a single simulation
run, giving us one estimate for each variable of interest. Hence, comparisons will not be
made to the true values here. Such comparisons will be presented using appropriate statistical
tests based on the results of multiple such simulation runs.
5.1 MSA and Within-Appraiser Variation
The final measurements obtained in Section 4.2 are reproduced below.
measurements=
45.0740 50.3162 53.9737 46.6552 48.2480 50.9980
52.7950 50.7709 46.4959 54.9753 58.3802 58.7796
43.5161 46.9290 46.9102 57.3646 50.4192 52.7579
33.6042 47.0014 43.1333 43.3113 39.1361 48.8959
43.6096 39.5674 38.7001 48.2595 43.7352 43.3064
48.7917 45.2731 53.1814 54.5665 52.8711 48.8939
41.9136 48.6835 41.8545 36.6086 44.3840 44.7423
48.1056 45.9832 52.3287 42.9719 57.2498 35.9675
47.9024 38.6939 42.4758 41.6033 50.7405 45.8326
38.0268 40.4946 39.1708 40.8042 33.8916 44.6462
Once these measurements are obtained, the next step is to estimate the various components
of variation in this data set using both range-based and ANOVA-based approaches.
5.1.1 Range-Based Estimation
First we calculate the range of the two measurements made by an appraiser on the same
part (R_ij = y_ijm'' − y_ijm'). Since this example contains ten parts and three appraisers, a
total of thirty ranges will be calculated. The transpose of this matrix of ranges is shown
below.
ranges=
5.2422 2.0242 3.4129 13.3973 4.0423 3.5186 6.7699 2.1224 9.2085 2.4678
7.3185 8.4794 10.4544 0.1779 9.5594 1.3851 5.2459 9.3568 0.8725 1.6334
2.7500 0.3995 2.3387 9.7597 0.4287 3.9772 0.3583 21.2823 4.9078 10.7546
The average of all these ranges, R̄ = (1/nk) Σ_{j=1..k} Σ_{i=1..n} (y_ijm'' − y_ijm'), as given
previously in Eq(3.2), is 5.4549. Dividing by the appropriate d2* value gives an estimate of
the replication error (σ̂) expressed as a standard deviation (sigmaE = 4.8359).
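This calculation can be reproduced directly from the thirty ranges listed above; the constant d2* = 1.128 for ranges of subgroups of size two is a standard table value.

```python
# Range-based replication error, reproducing the numbers quoted above.
ranges = [
    [5.2422, 2.0242, 3.4129, 13.3973, 4.0423, 3.5186, 6.7699, 2.1224, 9.2085, 2.4678],
    [7.3185, 8.4794, 10.4544, 0.1779, 9.5594, 1.3851, 5.2459, 9.3568, 0.8725, 1.6334],
    [2.7500, 0.3995, 2.3387, 9.7597, 0.4287, 3.9772, 0.3583, 21.2823, 4.9078, 10.7546],
]
r_bar = sum(sum(row) for row in ranges) / 30     # average of the 30 ranges
sigma_e = r_bar / 1.128                          # d2* for subgroup size 2
print(round(r_bar, 4), round(sigma_e, 4))        # 5.4549 4.8359
```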
Then the average of all measurements taken by an appraiser across all parts is
calculated (ȳ_·j· = (1/nr) Σ_{i=1..n} Σ_{m=1..r} y_ijm), as shown in Eq(3.3). The averages for the
three appraisers in this example are given below.
apprAvg=
44.8526
46.2672
47.6938
The range of these averages (rangeAppr = 2.8412), when divided by the appropriate d2*
value, gives an unadjusted estimate of appraiser variation (rawSigmaA = 1.4875), as given
in Eq(3.4). After adjustment, an estimate of the true appraiser variation (σ̂_a) is
calculated as sigmaA = 1.0215.
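Using the appraiser averages above, the adjusted estimate can be reproduced as follows. This sketch assumes d2* = 1.91 for a single range of three values (a table value) and an adjustment that subtracts the replication-error contribution σ̂²/(nr); both assumptions reproduce the quoted numbers.

```python
import math

appr_avgs = [44.8526, 46.2672, 47.6938]
n, r = 10, 2
sigma_e = 4.8359                                   # from the range-based step
range_appr = max(appr_avgs) - min(appr_avgs)       # 2.8412
raw_sigma_a = range_appr / 1.91                    # unadjusted estimate
# Adjustment: remove the replication-error share inflating the averages.
sigma_a = math.sqrt(raw_sigma_a ** 2 - sigma_e ** 2 / (n * r))
print(round(raw_sigma_a, 4), round(sigma_a, 4))    # 1.4875 1.0215
```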
Next, the replication error for each appraiser is calculated separately (σ²_j) and is
given by the following vector.
apprRepE =
4.9101
4.6968
4.5005
Substituting these values into Eq(3.13) gives an estimate of the lower bound on average
within-appraiser variation (LBa = 1.2847). Using this value of LBa, an estimate of the
upper bound on equipment variation is calculated (UBe = 20.8565) (refer to Eq(3.15)).
Recall that the portion of replication error assigned to within-appraiser variation based
on the fraction f is the true average within-appraiser variation. Hence, using Eq(3.14),
the true value of equipment variation is calculated (eqVar = 6.7779) for comparison with
the upper bound. Trivial bounds on equipment variation and average within-appraiser
variation are calculated as UBeTriv = 20.2547 and LBaTriv = 1.8866, respectively.
Next, the averages of the two measurements made by an appraiser on a given part
(ȳ_ij· = (1/r) Σ_m y_ijm) are calculated. Since this example contains ten parts and three
appraisers, a total of thirty averages will be calculated. The transpose of this matrix of
averages is shown below.
apprPartAvg =
47.6951 51.7829 45.2226 40.3028 41.5885 47.0324 45.2985 47.0444 43.2981 39.2607
50.3144 50.7356 52.1374 43.2223 43.4798 53.8739 39.2316 47.6503 42.0396 39.9875
49.6230 58.5799 51.5885 44.0160 43.5208 50.8825 44.5632 46.6087 48.2866 39.2689
To calculate appraiser variation using the approach suggested by Montgomery and
Runger (1994a), the range of these averages is calculated for each part, thus giving us ten
ranges, as shown below.
apprPartRange =
2.6194 7.8443 6.9148 3.7132 1.9323 6.8415 6.0669 1.0417 6.2470 0.7268
The average of these ranges is used to calculate a "raw" estimate of appraiser variation
(rawNewSigmaA = 2.5551), as shown in Eq(3.8). The adjusted estimate of AV is given
by Eq(3.9). When this estimate turns out to be negative, it is taken to be zero, as was
the case in this example.
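The Montgomery and Runger estimate can be reproduced from the ten ranges above. This sketch assumes d2* = 1.72 for ten ranges of three values (a table value) and an adjustment that subtracts σ̂²/r; both assumptions reproduce the quoted values.

```python
import math

appr_part_range = [2.6194, 7.8443, 6.9148, 3.7132, 1.9323,
                   6.8415, 6.0669, 1.0417, 6.2470, 0.7268]
r, sigma_e = 2, 4.8359
raw_new_sigma_a = (sum(appr_part_range) / len(appr_part_range)) / 1.72
adj = raw_new_sigma_a ** 2 - sigma_e ** 2 / r      # negative here...
new_sigma_a = math.sqrt(adj) if adj > 0 else 0.0   # ...so set to zero
print(round(raw_new_sigma_a, 4), new_sigma_a)      # 2.5551 0.0
```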
To calculate the part-to-part variation, the average of all measurements taken by all
appraisers is calculated for each part (ȳ_i·· = (1/kr) Σ_{m=1..r} Σ_{j=1..k} y_ijm), as given by
Eq(3.10). This is given by the vector partEstimate.
partEstimate =
49.2108 53.6995 49.6495 42.5137 42.8630 50.5963 43.0311 47.1011 44.5414 39.5057
Each number in this vector is the best obtainable estimate of the true dimension of the
corresponding part. The traditional estimate of PV was calculated as sigmaP = 4.4634.
After the correction recommended by Eq(3.16), the new estimate of PV was 4.003.
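Both PV estimates can be reproduced from the partEstimate vector. This sketch assumes d2* = 3.18 for a single range of ten values (a table value) and a correction that subtracts σ̂²/(kr); small last-digit differences come from using the rounded values quoted in the text.

```python
import math

part_est = [49.2108, 53.6995, 49.6495, 42.5137, 42.8630,
            50.5963, 43.0311, 47.1011, 44.5414, 39.5057]
k, r, sigma_e = 3, 2, 4.8359
sigma_p = (max(part_est) - min(part_est)) / 3.18       # traditional PV
# Correction: the part averages still carry replication error.
new_sigma_p = math.sqrt(sigma_p ** 2 - sigma_e ** 2 / (k * r))
print(round(sigma_p, 4), round(new_sigma_p, 3))        # 4.4635 4.003
```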
Measurement variation (σ_m) is calculated in this example as sigmaM =
√(sigmaA² + sigmaE²) = 4.9426. This relationship does not ignore within-appraiser variation.
Recall that sigmaE represents replication error, not just equipment variation, and as
such includes within-appraiser variation. If the user of the program chooses to ignore
the latter, sigmaE will then be numerically equal to equipment variation.
Total variation is calculated as sigmaT = √(sigmaM² + sigmaP²) = 6.6596, and the
intraclass correlation coefficient is calculated as r = sigmaP²/sigmaT² = 0.4492. The
discrimination ratio, calculated as √((1 + r)/(1 − r)), is found to be 1.6220.
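These summary metrics follow directly from the three estimated standard deviations. The fourth decimal of sigma_t differs slightly from the text because the inputs here are the rounded values quoted above.

```python
import math

sigma_a, sigma_e, sigma_p = 1.0215, 4.8359, 4.4634
sigma_m = math.sqrt(sigma_a ** 2 + sigma_e ** 2)     # measurement variation
sigma_t = math.sqrt(sigma_m ** 2 + sigma_p ** 2)     # total variation
icc = sigma_p ** 2 / sigma_t ** 2                    # intraclass correlation r
dr = math.sqrt((1 + icc) / (1 - icc))                # discrimination ratio
print(round(sigma_m, 4), round(sigma_t, 4), round(icc, 4), round(dr, 4))
# 4.9426 6.6597 0.4492 1.622
```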
This completes one simulation run. If multiple such runs are performed, the variables
described below track aggregate information.
The variable "errors" tracks the number of times the lower bound estimate (LBa)
of within-appraiser variation was violated. This estimated bound is said to be violated
if the true within-appraiser variation is less than LBa. The variables a2a, repE and p2p
calculate the means of the vectors sigmaA, sigmaE, and sigmaP, respectively. They represent
appraiser-to-appraiser, replication error and part-to-part variation averaged over all
simulation runs. The variable mv = √(a2a² + repE²) calculates the average measurement
variation, and tv = √(mv² + p2p²) calculates the average total variation. Measurement
variation as a percent of total variation, expressed as a standard deviation, is calculated as
percTVsd = mv/tv, and the same value expressed as a variance is calculated as percTVvar
= percTVsd².
5.1.2 ANOVA-Based Estimation
Consider the following matrices, part and appr. Each has n = 10 rows and k × r = 3 × 2 =
6 columns, where n, k, and r represent, as usual, the number of parts, the number of appraisers
and the number of replications. For the measurement value in any cell of the "measurement"
part =
1 1 1 1 1 1
2 2 2 2 2 2
3 3 3 3 3 3
4 4 4 4 4 4
5 5 5 5 5 5
6 6 6 6 6 6
7 7 7 7 7 7
8 8 8 8 8 8
9 9 9 9 9 9
10 10 10 10 10 10
matrix shown earlier, the corresponding cell in the \part" and \appr" matrices reveals
which part and appraiser the measurement is associated with.
Stacking the columns of each of the three matrices (measurements, part, and appr)
into a single column, and then merging these columns together, prepares the data for ANOVA
as shown in the table below.
appr =
1 1 2 2 3 3
1 1 2 2 3 3
1 1 2 2 3 3
1 1 2 2 3 3
1 1 2 2 3 3
1 1 2 2 3 3
1 1 2 2 3 3
1 1 2 2 3 3
1 1 2 2 3 3
1 1 2 2 3 3
data =
45.0740 1.0000 1.0000
52.7950 2.0000 1.0000
43.5161 3.0000 1.0000
33.6042 4.0000 1.0000
...
...
...
44.7423 7.0000 3.0000
35.9675 8.0000 3.0000
45.8326 9.0000 3.0000
44.6462 10.0000 3.0000
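The stacking just described can be sketched as follows, using a tiny example for brevity; the function name is an assumption of the sketch.

```python
def stack_for_anova(measurements, part, appr):
    """Stack three equally shaped matrices column-by-column into long
    format: one (measurement, part, appraiser) row per cell."""
    rows = []
    for c in range(len(measurements[0])):        # column-major, as in the text
        for i in range(len(measurements)):
            rows.append((measurements[i][c], part[i][c], appr[i][c]))
    return rows

# A 2-part, 2-appraiser, 1-replication illustration:
m = [[1.0, 2.0], [3.0, 4.0]]
p = [[1, 1], [2, 2]]
a = [[1, 2], [1, 2]]
print(stack_for_anova(m, p, a))
# [(1.0, 1, 1), (3.0, 2, 1), (2.0, 1, 2), (4.0, 2, 2)]
```

Each row of the long-format table then supplies one observation, with part and appraiser as the two random factors of the ANOVA.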
Running a 2-way ANOVA on this dataset generated the following output. Using
'Source' 'Sum Sq.' 'd.f.' 'Singular?' 'Mean Sq.' 'F' 'Prob>F'
'Part' [1.0778e+003] [ 9] [ 0] [119.7506] [4.6161] [6.9651e-004]
'Appr' [ 80.7241] [ 2] [ 0] [ 40.3620] [1.5559] [ 0.2276]
'Part*Appr' [ 214.8190] [ 18] [ 0] [ 11.9344] [0.4600] [ 0.9565]
'Error' [ 778.2499] [