The goal of the NIST IAD Data Science Evaluation (DSE) Series is to contribute to research efforts and to the calibration of technical capabilities for data analytics of big and small, homogeneous and heterogeneous, structured and unstructured, and complete and incomplete data. The overarching objectives of the evaluations are to drive technology forward, to measure the state of the art, to develop effective measurement techniques, and to encourage promising algorithmic approaches. The DSE provides a framework for evaluating data analytic algorithms in a general setting, making it possible to measure how effective algorithms written for one domain are when applied to problems in multiple domains.
The DSE aims to address logistical and evaluation design challenges while providing rigorous measurement methods and emphasizing generalizability over domain- and application-specific approaches. Toward that end, the DSE will consist of multiple research tracks and will encourage tasks that span multiple tracks across multiple domains.
In the fall of 2015, NIST hosted a small-scale pre-pilot evaluation in the traffic domain, meant to exercise a range of evaluation metrics and measurement methods. The pre-pilot concluded with the DSE Workshop, where the DSE was launched and pre-pilot results were discussed. The DSE continued in 2016, when NIST followed with a Pilot Evaluation in the same traffic domain. Pilot evaluations are often run before a new large-scale evaluation effort is established, because they serve the following purposes: (1) allow researchers and evaluators to fine-tune the evaluation paradigm and protocols in the context of an actual evaluation environment; (2) gauge interest in, or develop research communities around, complementary tasks; (3) establish baseline performance; (4) expose issues on a smaller scale prior to large-scale data processing efforts; and (5) measure the impact of representational choices on algorithms and efficiency.
Now that the Pilot Evaluation has concluded, current work focuses on generalizing the DSE to evaluate data analytic algorithms applied to problems in a variety of domains, expanding to use cases other than vehicle traffic. Additionally, to facilitate participation, the plan is to move the DSE from an annual evaluation cycle to a continuous evaluation process that encourages ongoing submissions, using a leaderboard to display and encourage progress throughout the evaluation.
The Pilot Evaluation, held in 2016, refined the NIST Pre-Pilot evaluation and continued with the vehicle traffic use case. Participation in the Pilot Evaluation was free and open to the public. Registration opened on July 5, 2016, and is now closed because the Pilot Evaluation has concluded.
The Pilot Evaluation covered four data analytic tasks: cleaning, alignment, prediction, and forecasting, all designed for data in the vehicle traffic domain.
A description of the Pilot Evaluation, including the data, tasks, submission process, and metrics, is provided in the DSE Pilot Evaluation Plan.
IAD hosted a DSE Workshop at NIST on March 17–18, 2016, to launch the DSE. See the linked page for more information and slide decks.
In 2015–2016, NIST ran a pre-pilot of the DSE, using the vehicle traffic domain as a use case. The pre-pilot consisted of data and tasks set in the traffic domain, which was chosen for its relevance to the everyday life of the general public and for the accessibility and availability of large amounts of public data. The purpose of the pre-pilot was to exercise the evaluation of data analytic algorithms. It is important to note that the pre-pilot was not meant to solve any particular problem in the traffic domain, but rather to serve as an exemplar of a data science evaluation track. The objective is for the resulting measurement methods and techniques to apply to additional use cases, regardless of the domain and data characteristics.