DEN Discussion List Archive


Bias Correction Factors



The bias correction factors used to obtain the scaling factors for computing
limits for process behavior charts have a long and distinguished history.
First, in 1925, empirical values were published in a paper by L. H. C.
Tippett.  Next, in 1933, E. S. Pearson and A. T. McKay obtained the
exact distribution of the range for samples of size n = 3 drawn from a
normal distribution.  From this distribution the d2 and d3 values for
n = 3 could be obtained.  Then, in 1942, H. O. Hartley and E. S. Pearson
published tables of d2 and d3 for n = 2 to 20.  Since each of these
values required the numerical evaluation of a triple integral, they were
quite difficult to obtain initially.  In 1960, H. L. Harter published the
values of d2 and d3 to 13 decimal places for n = 2 to n = 100.
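As a rough check on these tabulated values: d2 and d3 are the mean and the standard deviation of the range of n independent standard normal values, so they can be approximated by simulation. The sketch below is a Monte Carlo approximation, not the numerical integration used by Hartley, Pearson, or Harter. The function name estimate_d2_d3, the trial count, and the seed are illustrative choices.

```python
import random
import statistics


def estimate_d2_d3(n, trials=100_000, seed=1):
    """Monte Carlo estimate of d2 (mean of the range) and d3 (standard
    deviation of the range) for subgroups of size n drawn from a standard
    normal distribution (sigma = 1, so no rescaling is needed)."""
    rng = random.Random(seed)
    ranges = []
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ranges.append(max(sample) - min(sample))
    return statistics.fmean(ranges), statistics.stdev(ranges)


if __name__ == "__main__":
    for n in (2, 3, 5, 10):
        d2, d3 = estimate_d2_d3(n)
        print(f"n = {n:2d}: d2 ~ {d2:.3f}, d3 ~ {d3:.3f}")
```

With enough trials the estimates should land close to the familiar table values (for n = 2, d2 = 1.128 and d3 = 0.853), with simulation error shrinking as the trial count grows.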

In 1967, Irving Burr examined the values of d2 and d3 for cases where
the original data were not normally distributed.  He used 27 different
non-normal distributions with skewness parameters as large as 1.9 and
kurtosis parameters as large as 12.4.  What he found is best summarized
as a remarkable robustness to non-normality.  The worst case for d2
occurred at n = 2, where the d2 values across these 27 distributions had
a coefficient of variation of 2.26%.  For n = 10 this coefficient of
variation dropped to 1.6%.

These coefficients of variation represent the uncertainty introduced into
the computed limits by not knowing the original distribution of the
individual values.  To put these values in perspective, it helps to
realize that the coefficient of variation for the Average Range equals
the inverse of the square root of twice the degrees of freedom for that
Average Range.  Consider the worst case, an XmR Chart, whose two-point
moving ranges correspond to n = 2.  There the degrees of freedom for the
Average Moving Range are approximately 0.61 times the number of two-point
moving ranges used to compute it, so a coefficient of variation of 2.26%
corresponds to a baseline of 1606 X values.
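The arithmetic behind the 1606 figure can be reproduced in a few lines. This is a sketch that assumes only the relationships stated above: CV = 1/sqrt(2 df), df of about 0.61 times the number of two-point moving ranges, and one more X value than moving ranges. The function name xmr_baseline_for_cv is hypothetical.

```python
def xmr_baseline_for_cv(cv, df_per_range=0.61):
    """Number of X values needed for the Average Moving Range of an XmR
    chart to have the given coefficient of variation.

    Inverts CV = 1 / sqrt(2 * df), with df ~ df_per_range * (moving ranges),
    where a baseline of m X values yields m - 1 two-point moving ranges.
    """
    df = 1.0 / (2.0 * cv ** 2)          # degrees of freedom giving this CV
    moving_ranges = df / df_per_range    # two-point moving ranges needed
    return moving_ranges + 1             # one more X value than moving ranges


print(round(xmr_baseline_for_cv(0.0226)))  # → 1606
```

The unrounded result is about 1605.8, which rounds to the 1606 X values quoted above.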

In other words, until you have obscene amounts of data, the uncertainties
introduced by using the normal-theory values of d2 and d3 will be completely
overwhelmed by the uncertainty in the Average Range value itself.  To worry
about the effects of non-normality is like going to the beach and using a
yardstick to try to verify a theoretical rise in the average sea-level of 3
millimeters.  You are just going to get in the way of those who are enjoying
the beach.
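To see how strongly the uncertainty in the Average Moving Range dominates for ordinary baselines, the same two relationships can be run forward. This is a sketch: the baseline sizes are arbitrary illustrations, the 0.61 factor is the approximation quoted above, and the name average_moving_range_cv is hypothetical.

```python
import math


def average_moving_range_cv(num_x, df_per_range=0.61):
    """Approximate coefficient of variation of the Average Moving Range
    for an XmR chart baseline of num_x individual values."""
    df = df_per_range * (num_x - 1)      # num_x values give num_x - 1 moving ranges
    return 1.0 / math.sqrt(2.0 * df)


# Worst-case CV of d2 across Burr's 27 distributions (n = 2), from the text.
BURR_WORST_CASE_CV = 0.0226

for num_x in (10, 25, 50, 100, 500, 1606):
    cv = average_moving_range_cv(num_x)
    print(f"{num_x:5d} values: CV of Average Moving Range ~ {cv:.1%}, "
          f"{cv / BURR_WORST_CASE_CV:.1f} times the d2 uncertainty")
```

For a baseline of 25 values the Average Moving Range has a coefficient of variation of roughly 18%, about eight times the worst-case uncertainty from non-normality. The two only become comparable near 1606 values.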

More on this can be found in my books "Advanced Topics in SPC" and
"Normality and the Process Behavior Chart."

Hope this helps.

-- 
Donald J. Wheeler, Ph.D.
Fellow American Statistical Association
Fellow American Society for Quality


