This article examines foundational issues in data science, including current challenges, basic research questions, and expected advances, as the basis for a new Data Science Research Program and associated Data Science Evaluation (DSE) series, introduced by the National Institute of Standards and Technology (NIST) in the fall of 2015. The DSE series aims to address logistical and evaluation design challenges while providing rigorous measurement methods and emphasizing generalizability over domain- and application-specific approaches. Toward that end, each year the DSE will consist of multiple research tracks and will encourage the application of tasks that span multiple tracks. The evaluations are intended to facilitate research efforts and collaboration, leverage shared infrastructure, and effectively address cross-cutting challenges faced by diverse data science communities. The research tracks will be championed by members of the data science community, with the goal of enabling rigorous comparison of approaches through common tasks, datasets, metrics, and shared research challenges. The tracks will measure several different data science technologies in a wide range of fields and will address computing infrastructure, standards for an interoperability framework, and domain-specific examples. This article also summarizes lessons learned from the Data Science Evaluation Series Pre-Pilot, which was held in the fall of 2015.
Journal of Database Management
Data Science Evaluation Series, Data Science Standards, Data Science Metrics, Data Science Measurements