The results from an interlaboratory evaluation are said to be consistent if their dispersion is not more than what can reasonably be attributed to their stated variances. A well-known test of consistency in interlaboratory evaluations is the Birge test, named after its developer, the physicist Raymond T. Birge. We show that the Birge test may be interpreted as a classical test of the null hypothesis that the variances of the results are less than or equal to their stated values against the alternative hypothesis that the variances of the results are greater than their stated values. A modern protocol for hypothesis testing is to calculate the classical p-value under the null hypothesis of realizing a value of the test statistic equal to or larger than the observed (realized) value, and to reject the null hypothesis when the p-value is too small. We show that, interestingly, the classical p-value of the Birge test statistic is equal to the Bayesian posterior probability of the null hypothesis based on commonly used non-informative prior distributions for the unknown statistical parameters. Thus the Birge test may also be interpreted as a Bayesian test of the hypothesis of consistency. The Birge test of consistency was developed for those interlaboratory evaluations where the results are uncorrelated. We present a general test of consistency for both correlated and uncorrelated results, of which the Birge test is a special case. Then we show that the classical p-value of the general test statistic under the null hypothesis of realizing a value equal to or larger than observed is likewise equal to the Bayesian posterior probability of the null hypothesis based on non-informative prior distributions. The general test makes it possible to check the consistency of correlated results from interlaboratory evaluations.
Pub Type: Journals
Keywords: Bayesian hypothesis testing, generalized Birge test, generalized least squares, interlaboratory comparisons
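The two tests described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's own code: it assumes the standard chi-squared form of the Birge statistic for uncorrelated results, and a generalized-least-squares chi-squared statistic for the correlated case (the function names `birge_test` and `general_test` are ours). The classical p-value is the upper-tail probability of a chi-squared distribution with n - 1 degrees of freedom.

```python
import numpy as np
from scipy.stats import chi2


def birge_test(x, u):
    """Birge consistency test for uncorrelated results.

    x: measured values from n laboratories
    u: their stated standard uncertainties
    Returns the chi-squared statistic and its classical p-value,
    P(chi^2_{n-1} >= observed).
    """
    x, u = np.asarray(x, float), np.asarray(u, float)
    w = 1.0 / u**2
    xbar = np.sum(w * x) / np.sum(w)       # weighted mean
    stat = np.sum(w * (x - xbar) ** 2)     # dispersion relative to stated variances
    return stat, chi2.sf(stat, len(x) - 1)


def general_test(x, V):
    """Generalized consistency test for possibly correlated results.

    V: stated covariance matrix of the results.
    Reduces to the Birge test when V is diagonal.
    """
    x = np.asarray(x, float)
    V = np.asarray(V, float)
    ones = np.ones(len(x))
    Vinv = np.linalg.inv(V)
    # generalized-least-squares estimate of the common mean
    mu = (ones @ Vinv @ x) / (ones @ Vinv @ ones)
    r = x - mu * ones
    stat = r @ Vinv @ r
    return stat, chi2.sf(stat, len(x) - 1)
```

With a diagonal covariance matrix `V = diag(u**2)`, `general_test` returns the same statistic and p-value as `birge_test`, consistent with the Birge test being a special case of the general test.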