Kirk O. Monson and Dr. Bruce G. Schaalje, Statistics
The kappa statistic is defined as the proportion of agreement between two instruments after chance agreement is removed from consideration (Cohen 1960). It is useful for assessing how closely two instruments or procedures agree. Industries often use an automated system for testing their products to ensure a level of quality before packaging their goods. Since the parts of these systems must be replaced periodically, it is necessary to determine whether the new components have the same level of accuracy or performance as the old ones. The kappa statistic would be very useful in this situation if a more reliable method for determining its confidence intervals were available.
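For illustration, the quantity described above can be written as kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e is the agreement expected by chance from the marginal proportions. The short Python sketch below (not the study's SAS/IML code) computes kappa from a square table of counts; the three-category table in the example is made up for illustration, not taken from the simulations.

    # Illustrative sketch only (the study's code was written in SAS/IML, not Python).
    import numpy as np

    def cohens_kappa(table):
        """Cohen's kappa = (p_o - p_e) / (1 - p_e) for a square k x k table of counts."""
        table = np.asarray(table, dtype=float)
        n = table.sum()
        p_o = np.trace(table) / n                 # observed proportion of agreement
        row_marg = table.sum(axis=1) / n          # marginal proportions, instrument 1
        col_marg = table.sum(axis=0) / n          # marginal proportions, instrument 2
        p_e = np.sum(row_marg * col_marg)         # agreement expected by chance
        return (p_o - p_e) / (1.0 - p_e)

    # Made-up example: two instruments classifying 100 items into three categories.
    counts = [[40, 3, 2],
              [4, 30, 1],
              [1, 2, 17]]
    print(cohens_kappa(counts))                   # roughly 0.80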
I wrote code in the SAS/IML language that simulated random values of the kappa statistic in realistic situations involving true values of kappa close to 1, more than two categories, and non-uniform marginal distributions. The code calculated the kappa statistic and its confidence interval using three new methods: the bootstrap-t, the accelerated bias-corrected percentile method, and a new percentile method. I evaluated the results by comparing them with the results of previous research.
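The SAS/IML code itself is not reproduced here. As a rough illustration of the general bootstrap idea, the Python sketch below computes a simple nonparametric percentile interval for kappa by resampling rating pairs with replacement; it does not implement the bootstrap-t, accelerated bias-corrected, or new percentile methods studied, and all names and settings in it are illustrative assumptions.

    # Illustrative sketch only: a basic nonparametric bootstrap percentile interval
    # for kappa, resampling (rating1, rating2) pairs with replacement. This is NOT
    # the bootstrap-t, accelerated bias-corrected, or new percentile method from
    # the study; all names here are hypothetical.
    import numpy as np

    def bootstrap_percentile_ci(ratings1, ratings2, n_boot=2000, alpha=0.05, seed=0):
        rng = np.random.default_rng(seed)
        r1, r2 = np.asarray(ratings1), np.asarray(ratings2)
        cats = np.union1d(r1, r2)
        n = len(r1)

        def kappa(a, b):
            table = np.zeros((len(cats), len(cats)))
            for x, y in zip(a, b):
                table[np.searchsorted(cats, x), np.searchsorted(cats, y)] += 1
            p_o = np.trace(table) / n
            p_e = np.sum(table.sum(axis=1) * table.sum(axis=0)) / n**2
            return (p_o - p_e) / (1.0 - p_e)      # undefined if a resample has p_e = 1

        boot = np.empty(n_boot)
        for b in range(n_boot):
            idx = rng.integers(0, n, size=n)      # resample pairs with replacement
            boot[b] = kappa(r1[idx], r2[idx])
        lower, upper = np.quantile(boot, [alpha / 2, 1 - alpha / 2])
        return kappa(r1, r2), (lower, upper)

The bootstrap-t and accelerated bias-corrected methods add, respectively, a studentizing step and bias and acceleration corrections to this basic resampling scheme.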
Unfortunately, these three new confidence interval methods failed to improve the reliability of confidence intervals for the kappa statistic. In fact, two of the methods, the accelerated bias-corrected percentile and the new percentile, performed much worse than the methods previously studied, while the bootstrap-t performed about the same. We had expected these new methods, especially the accelerated bias-corrected percentile, to improve the reliability of the intervals, so we were surprised by the results.
These are the results of two different runs in which I varied the kappa, theta, and number-of-observations parameters.