We provide datasets with certified values for key statistics to assess the accuracy of ANOVA calculations in statistical software. For our datasets, computational inaccuracy may come from three sources: truncation error, cancellation error, and accumulation error.

The truncation error is the representation error incurred when decimal numbers are stored inexactly in binary under IEEE standard arithmetic. The cancellation error is caused by the so-called "stiffness" of a dataset. In "Assessing the Accuracy of ANOVA Calculations in Statistical Software" (*Computational Statistics & Data Analysis* 8 (1989), pp. 325-332), Simon and Lesage noted that as the number of constant leading digits of the observations in a dataset increases, the data grow more nearly constant (i.e., the stiffness increases), and accurate computation of sums of squares becomes more difficult because subtracting the treatment means or the overall mean from the data produces large cancellation error. They also noted that, for the accumulation error, as the number of observations within a cell increases, so does the total number of arithmetic operations required. This increases the accumulation of small errors, making accurate computation difficult.
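The three error sources can be observed directly in double precision. The following is a minimal sketch, not part of the certified calculations: the four observations are hypothetical values with 13 constant leading digits, and the function names (`one_pass_ss`, `two_pass_ss`, `exact_ss`) are illustrative only.

```python
import math
from decimal import Decimal
from fractions import Fraction

# Truncation error: the double nearest to 0.1 is not exactly 0.1.
truncation_error = Decimal(0.1) - Decimal("0.1")

# Hypothetical stiff observations (13 constant leading digits).
obs = ["1000000000000.2", "1000000000000.4",
       "1000000000000.2", "1000000000000.4"]
xs = [float(s) for s in obs]  # storing as doubles already truncates

def one_pass_ss(values):
    """Textbook formula sum(x^2) - (sum x)^2 / n; prone to cancellation."""
    n = len(values)
    return sum(v * v for v in values) - sum(values) ** 2 / n

def two_pass_ss(values):
    """Subtract the mean first, then square and sum the deviations."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def exact_ss(decimal_strings):
    """Reference value computed in exact rational arithmetic."""
    q = [Fraction(s) for s in decimal_strings]
    mean = sum(q) / len(q)
    return sum((v - mean) ** 2 for v in q)

reference = exact_ss(obs)
one_pass_error = abs(Fraction(one_pass_ss(xs)) - reference)
two_pass_error = abs(Fraction(two_pass_ss(xs)) - reference)

# Accumulation error: a plain running sum over many terms versus
# math.fsum, which tracks the lost low-order bits.
terms = [0.1] * 2001
accumulation_drift = abs(sum(terms) - math.fsum(terms))

print(truncation_error, float(one_pass_error), float(two_pass_error),
      accumulation_drift)
```

On data this stiff the one-pass formula loses essentially all significant digits, while the two-pass version is limited mainly by the truncation error already present in the stored observations.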

We include both generated and "real-world" data to allow computational accuracy to be examined at different stiffness levels and different accumulation error levels. Following the benchmark of Simon and Lesage (1989), our generated datasets are designed to have 1, 7, and 13 constant leading digits and 21, 201, and 2001 observations per cell. Real-world data from our statistical consulting work at NIST include a dataset with 7 constant leading digits, AtmWtAg, and a dataset with 3 constant leading digits, SiRstv.

Datasets are ordered by level of difficulty (lower, average, and higher) according to their stiffness, that is, the number of constant leading digits. In the lower difficulty group, we have the SiRstv data with 3 constant leading digits and the three generated datasets with 1 constant leading digit. In the average difficulty group, we have the AtmWtAg data with 7 constant leading digits and the three generated datasets with 7 constant leading digits. In the higher difficulty group, we have the three generated datasets with 13 constant leading digits. This ordering is simply meant to provide rough guidance for the user. Producing correct results on a dataset of higher difficulty does not imply that your software will correctly solve all datasets of average or even lower difficulty.

For all datasets, multiple precision calculations (accurate to 500 digits) were made using the preprocessor and FORTRAN subroutine package of Bailey (1995, available from NETLIB). Data were read in exactly as multiple precision numbers and all calculations were made with this very high precision. The results were output in multiple precision, and only then rounded to fifteen significant digits. These multiple precision results are an idealization. They represent what would be achieved if calculations were made without roundoff or other errors. Any typical numerical algorithm (i.e., not implemented in multiple precision) will introduce computational inaccuracies, and will produce results which differ slightly from these certified values.
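The same idea (read the data exactly, compute without roundoff, round only at the end) can be sketched with Python's exact rational arithmetic. This is an illustration under stated assumptions, not the Bailey FORTRAN package used for the certified values; the function names `exact_one_way_anova` and `to_sig_digits` and the small input are hypothetical.

```python
from decimal import Decimal, localcontext
from fractions import Fraction

def exact_one_way_anova(cells):
    """One-way ANOVA sums of squares and F statistic in exact arithmetic.

    `cells` is a list of treatments, each a list of decimal strings,
    so the observations are read in exactly (no binary truncation).
    """
    groups = [[Fraction(s) for s in cell] for cell in cells]
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, f_stat

def to_sig_digits(q, digits=15):
    """Round an exact rational to `digits` significant digits, once."""
    with localcontext() as ctx:
        ctx.prec = digits
        return Decimal(q.numerator) / Decimal(q.denominator)

# Hypothetical two-treatment example, three observations per cell.
ss_b, ss_w, f = exact_one_way_anova([["1.0", "2.0", "3.0"],
                                     ["2.0", "3.0", "4.0"]])
print(to_sig_digits(ss_b), to_sig_digits(ss_w), to_sig_digits(f))
```

Because every intermediate quantity is an exact fraction, the only rounding happens in `to_sig_digits`, which mirrors the "round only at the end" step described above.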

To improve the computational accuracy for a dataset, one remedial measure is to subtract the leading constant from all the observations in that dataset before analyzing it. For example, for dataset SmLs09 subtract 1e13 from all the observations before the analysis.
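The effect of this remedy can be sketched as follows. The observations and the constant `1e6` are hypothetical (7 constant leading digits, not taken from any dataset in the collection), and `naive_ss` stands in for whatever one-pass formula a package might use internally.

```python
from fractions import Fraction

def naive_ss(values):
    """One-pass sum of squares, sum(x^2) - (sum x)^2 / n."""
    n = len(values)
    return sum(v * v for v in values) - sum(values) ** 2 / n

# Hypothetical observations with 7 constant leading digits.
obs = ["1000000.2", "1000000.4", "1000000.2", "1000000.4"]
raw = [float(s) for s in obs]

# Remedy: subtract the leading constant before the analysis.
leading_constant = 1e6  # assumed value for this illustration
shifted = [x - leading_constant for x in raw]

# Exact reference computed from the decimal strings.
q = [Fraction(s) for s in obs]
q_mean = sum(q) / len(q)
exact = sum((v - q_mean) ** 2 for v in q)

raw_error = abs(Fraction(naive_ss(raw)) - exact)
shifted_error = abs(Fraction(naive_ss(shifted)) - exact)
print(float(raw_error), float(shifted_error))
```

Shifting leaves the sum of squares unchanged mathematically but removes the large common term that causes cancellation. The truncation error incurred when the original values were stored as doubles is still present after the shift, so the result is improved rather than exact.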

As noted in the General Background Information, producing correct results for all datasets in this collection does not imply that your software will do the same for your own particular dataset. It will, however, provide some degree of assurance, in the sense that your package provides correct results for datasets known to yield incorrect results for some software.

We plan to update this collection of datasets in the future, and welcome your feedback on specific datasets to include, and on other ways to improve this web service.

Created August 15, 2018