SCORE (System, Component and Operationally Relevant Evaluations) is a unified set of criteria and software tools for defining a performance evaluation approach for complex intelligent systems. It provides a comprehensive evaluation blueprint that assesses the technical performance of a system and its components by isolating and varying individual variables, and that captures the end-user utility of the system in realistic use-case environments.
The SCORE framework has proven widely applicable, with equal relevance to technologies ranging from manufacturing to military systems. It has been applied to the evaluation of technologies in DARPA programs that range from soldier-worn sensors on patrol to speech-to-speech translation systems. It is also currently being applied to assessing the control of autonomous vehicles on a shop floor.
Intelligent systems tend to be complex and non-deterministic, involving numerous components that work together to accomplish some overall goal. Existing approaches to measuring such systems often focus either on evaluating the system as a whole or on evaluating a few of its components individually under very controlled, but limited, conditions. These approaches do not comprehensively and quantitatively assess the impact of environmental variables (e.g., lighting, external distances) and system variables (e.g., processing power, memory size) on the system’s overall performance. Through its comprehensive evaluation criteria and software tools, the SCORE framework has greatly enhanced the ability to quantitatively and qualitatively evaluate intelligent systems at both the component level and the system level in operationally relevant environments.
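As a rough illustration of isolating and changing variables, the sketch below builds a one-factor-at-a-time test plan: each condition differs from a baseline in exactly one environmental or system variable. This is a generic experimental-design pattern, not a description of the SCORE software tools; the function name, variable names, and values are all hypothetical.

```python
def one_factor_at_a_time(baseline, levels):
    """Return test conditions that differ from the baseline in exactly one variable.

    baseline: dict mapping each variable name to its reference value.
    levels:   dict mapping each variable name to the values to be tested.
    (Both are illustrative structures, not part of SCORE itself.)
    """
    conditions = [dict(baseline)]  # the baseline itself is the first condition
    for name, values in levels.items():
        for value in values:
            if value != baseline[name]:
                # Copy the baseline and change only this one variable.
                conditions.append({**baseline, name: value})
    return conditions


# Hypothetical environmental and system variables, echoing the examples above.
baseline = {"lighting": "daylight", "distance_m": 5, "memory_gb": 4}
levels = {
    "lighting": ["daylight", "dusk", "night"],
    "distance_m": [5, 20],
    "memory_gb": [4, 8],
}

plan = one_factor_at_a_time(baseline, levels)
for condition in plan:
    print(condition)
```

Because every non-baseline condition changes a single variable, a difference in measured performance between that condition and the baseline can be attributed to that variable, subject to run-to-run variation in a non-deterministic system.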
SCORE is unique in that:
SCORE was initially applied to intelligent systems developed under the DARPA (Defense Advanced Research Projects Agency) ASSIST and TRANSTAC programs. This work comprised eight evaluations, each involving over 60 personnel, that assessed the performance of technologies developed by twelve independent research teams. The SCORE-based evaluations also provided the researchers and end users with the information that they needed to determine if and when the technology would be ready to be put to use. SCORE allowed developers to identify the key components of the system and evaluate them both independently and as a whole, thus helping to determine the impact of the individual components on the performance of the overall system. This detailed analysis made it possible to target more accurately the aspects of the systems that were shown to provide the greatest benefit to the overall advancement of the technology, and therefore helped to identify where the program funding should be applied to get the most “bang for the buck.”
The Advanced Soldier Sensor Information Systems and Technology (ASSIST) program is an advanced technology research and development program whose objective is to exploit soldier-worn sensors to augment a Soldier’s mission recall and reporting capability, and thereby enhance situational knowledge within Military Operations in Urban Terrain (MOUT) environments.
The ASSIST evaluation design was driven by the two programmatic metrics laid out by DARPA:
Guided by the SCORE framework, several test types were developed to satisfy the above metrics. Elemental tests were used to measure Technical Performance at both the Component and System Levels. Vignette tests were created to perform Utility Assessments at the System Level. Task tests were generated to perform Utility Assessments at the Capability Level.
Spoken Language Communication and Translation System for Tactical Use (TRANSTAC) is a DARPA advanced technology and research program whose goal is to demonstrate capabilities to rapidly develop and field free-form, two-way speech-to-speech translation systems that enable English and foreign language speakers to communicate with one another in real-world tactical situations where an interpreter is unavailable. Technical performance evaluations are intended to support quantitative assessment of the performance of TRANSTAC technologies.
The TRANSTAC evaluation design was driven by the two programmatic metrics laid out by DARPA:
The SCORE framework was implemented to address these metrics in the form of several evaluation types. Offline Evaluations were developed to measure Technical Performance at the Component Level. Both Main Live Evaluations and Utility-Lab Evaluations were developed to measure Technical Performance and Utility Assessment at the System Level. The Names Evaluation was developed to measure both Technical Performance at the Component Level and Utility Assessment at the Capability Level. Lastly, Utility-Field Evaluations were specifically designed to measure Utility Assessment at the System Level.
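The mapping between TRANSTAC evaluation types and the levels they address can be laid out as a small coverage table. The sketch below transcribes the paragraph above into a Python dictionary and inverts it to show which evaluation types cover each combination of level and assessment type. The data structure and function name are illustrative assumptions, not part of the SCORE software tools.

```python
from collections import defaultdict

# (level, assessment) cells covered by each TRANSTAC evaluation type,
# transcribed from the description above. The layout is illustrative only.
COVERAGE = {
    "Offline": {("component", "technical")},
    "Main Live": {("system", "technical"), ("system", "utility")},
    "Utility-Lab": {("system", "technical"), ("system", "utility")},
    "Names": {("component", "technical"), ("capability", "utility")},
    "Utility-Field": {("system", "utility")},
}


def evaluations_by_cell(coverage):
    """Invert the mapping: for each (level, assessment) cell,
    return the sorted list of evaluation types that cover it."""
    cells = defaultdict(list)
    for name, covered in coverage.items():
        for cell in covered:
            cells[cell].append(name)
    return {cell: sorted(names) for cell, names in cells.items()}


if __name__ == "__main__":
    for (level, assessment), names in sorted(evaluations_by_cell(COVERAGE).items()):
        print(f"{level:<10} {assessment:<9} {', '.join(names)}")
```

Laid out this way, it is easy to see which cells are covered by more than one evaluation type (for example, System Level Utility Assessment) and which are covered by only one.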
The impact of this work has been far-reaching and substantial. This can be seen by: