Spreadsheets and Good Statistics: A Tale of Love and Hatred

When statistical rigor and common practice meet, paradigms clash. In statistical computing, the use of spreadsheets, and of Microsoft Office Excel in particular, frequently provokes disagreement between those who argue for correctness and those who argue for ease of use. On the one hand, spreadsheets can be credited with giving the masses the ability to store, analyze, visualize, and even model data. I believe that a very large majority of all statistical work carried out in the world, and in particular in industry, is done exclusively using spreadsheets, most of it in Excel. The reason for this popularity is that spreadsheets provide intuitive, flexible, and comprehensive ways of arranging data, looking at them, performing calculations, building interfaces for day-to-day use, and even distributing applications within a company or organization. On the other hand, it is well known that many spreadsheets, and Excel in particular, have numerical flaws which render them unacceptable to statisticians concerned with numerical correctness and statistical rigor. Moreover, the way spreadsheets are constructed and used encourages data-analytic "video gaming" and does not provide a foundation for reproducible research. This talk is about reconciling the world of good statistics with the use of spreadsheets. It starts with a reminder of what any practicing statistician should know about the spreadsheet paradigm (automatic recalculation, tabulation of expressions, cross tabulation, optimization, and integrated interface design). It then describes a way of extending the ability of Excel to carry out "good" statistical computations by adding a bridge to the R language. Statisticians need not hate spreadsheets as long as they know how to use what's good and how to avoid what's bad. I argue that statisticians should even learn to love spreadsheets for their unique potential for giving statistical power to the people.

Christian Ritter
Institut de Statistique
Université Catholique de Louvain, Belgium

Ritter and Danielson Consulting
Brussels, Belgium

Created September 8, 2010, Updated May 13, 2016