Designing and implementing a performance evaluation of an emerging technology that presents a broad picture of how the technology would perform in its typical operating environment is a challenging goal. Personnel from the National Institute of Standards and Technology (NIST) have developed the System, Component, and Operationally-Relevant Evaluation (SCORE) framework as a formal guide for designing technology evaluations. SCORE captures both technical performance and end-user utility assessments of systems and their components within controlled and realistic environments, presenting an extensive (but not necessarily exhaustive) picture of how a system would behave in a set of realistic use-case domains. The framework has been applied to numerous evaluation efforts over the past three years, producing valuable quantitative and qualitative metrics. This paper presents the building blocks of the SCORE methodology, including the system goals and design criteria that drive the evaluation design process. An evolution of the SCORE framework to capture utility assessments at the capability level of a system is also presented. Examples are shown of SCORE's successful application to the evaluation of soldier-worn sensor systems and two-way, free-form spoken language translation technologies.
Proceedings Title: Proceedings of the 2008 Performance Metrics for Intelligent Systems (PerMIS) Workshop
Conference Dates: August 19-21, 2008
Conference Location: Gaithersburg, MD
Pub Type: Conferences
Keywords: SCORE, DARPA, ASSIST, TRANSTAC, performance evaluation, elemental tests, vignette tests, task tests, speech translation, soldier-worn sensor