The U.S. National Institute of Standards and Technology (NIST) is developing a comprehensive set of standard test methods and associated performance metrics to quantify key capabilities of emergency response robots. These test methods address responder-defined requirements for robot mobility, manipulation, sensors, energy, communications, operator proficiency, logistics and safety for remotely operated ground, aerial (under 2kg), and aquatic systems. The objective is to facilitate quantitative comparisons of different robot models based on statistically significant robot capabilities data, captured within the standard test methods, to guide purchasing decisions and understand deployment capabilities. The test methods also support operator proficiency training and foster development and hardening of advanced mobile robot capabilities. The process used to develop these test methods relies heavily on robot competitions to refine proposed test apparatuses and response robot evaluation exercises in responder training facilities to validate the test methods. The resulting test methods are being standardized through the ASTM International Standards Committee on Homeland Security Applications; Operational Equipment; Robots (E54.08.01). This work has been predominantly sponsored by the Department of Homeland Security (DHS), Science and Technology Directorate, Office of Standards; with substantial support by the Department of Justice (DOJ); NIST’s Office of Law Enforcement Standards; Army Research Laboratory (ARMY-ARL); Joint Improvised Explosive Device Defeat Organization (JIEDDO); and Defense Advanced Research Projects Agency (DARPA). This work is conducted in collaboration with many more organizations world-wide, civilian and military, who help develop, validate, and use the resulting standards.
Emergency responders literally risk life and limb interacting with known hazards to protect the public. They typically wear only conventional personal protective equipment while manually dealing with a variety of extreme hazards for which remotely operated robots should be well suited. Examples include disabling or dismantling improvised explosive devices (pipes, packages, vehicle); searching for survivors in collapsed or compromised structures; investigating illicit border tunnels; establishing situational awareness during police actions; monitoring large scale industrial or transportation accidents; or assessing potential terrorist attacks using chemical, biological, or radiological sources. Responders want to “start remote and stay remote” when dealing with such hazards and need capable robotic systems that can be remotely operated from safe stand-off distances to provide situational awareness, negotiate complex environments, perform dexterous manipulation of objects, and many other tasks necessary to mitigate hazardous situations. Many responder organizations already own robots but have had difficulty deploying them effectively. New robots are promising advanced capabilities and easier operator interfaces, but it is hard for responders to sift through the marketing. Responders need quantitative ways to measure whether any given robot is capable and reliable enough to perform specific missions. They also need ways to train and measure operator proficiency to isolate deficiencies in equipment and/or improve very perishable operator skills.
Since 2001, a series of Presidential Policy Directives on National Preparedness have prompted increased funding for new and better technologies for emergency responders, including purchasing of response robots. The most recent 2011 Directive outlines the need for strengthening the security and resilience of the United States through systematic preparation for threats including acts of terrorism, pandemics, significant accidents, and catastrophic natural disasters. The Directive emphasizes three national preparedness principles: 1) an all-hazards approach, 2) a focus on capabilities, and 3) outcomes with rigorous assessments to measure and track progress in building and sustaining capabilities over time. This project applies all three principles specifically for response robots.
In 2005, the U.S. Department of Homeland Security, Science and Technology Directorate (DHS S&T) engaged in a multi-year partnership with the National Institute of Standards and Technology (NIST) to develop a comprehensive suite of standard test methods to quantify key capabilities of robots for emergency response and other hazardous applications. The resulting suite of DHS-NIST-ASTM International Standard Test Methods for Response Robots measures robot maneuvering, mobility, manipulation, sensing, endurance, radio communication, durability, reliability, logistics, and safety for remotely operated ground vehicles, aquatic vehicles, and small unmanned aerial systems in FAA Group I under 2 kg (4.4 lbs). The objective is to facilitate quantitative comparisons of different robot configurations based on statistically significant robot capabilities data captured within standard test methods to understand deployment capabilities, guide purchasing decisions, and support operator training with measures of proficiency.
This suite of test methods is being standardized through the ASTM International Standards Committee on Homeland Security Applications; Operational Equipment; Robots (E54.08.01) which includes equal representation of robot developers, emergency responders, and civilian/military test administrators. Robot developers benefit by using the standard test methods as tangible representations of operational requirements to understand mission needs, inspire innovations, make trade-off decisions, measure incremental improvements, and highlight break-through capabilities. Responders and soldiers benefit by using robot capabilities data captured within the standard test methods to guide purchasing decisions, support training, and measure operator proficiency. Fifteen standards have been adopted internationally and dozens more are being validated with associated apparatuses, procedures, and performance metrics. This suite of test methods addresses a range of robot sizes and capabilities including throwable robots for reconnaissance tasks, mobile manipulator robots for package size and vehicle-borne improvised explosive devices, and rapidly deployable aerial and aquatic systems.
FIGURE 1: The development cycle for DHS-NIST-ASTM International Standard Test Methods for Response Robots is a responder driven process for generating, validating, and standardizing test methods.
These standard test methods measure baseline robot/operator capabilities necessary to perform operational tasks defined by emergency responders, soldiers, and their respective organizations. No single test method describes a robot’s overall capabilities. But any user can select a set of standard test methods, 20 or more, to represent capabilities necessary to perform intended missions. If a robot/operator cannot complete the specified set of standard test methods, they will not likely be able to perform the operational tasks during deployments. Conversely, if a robot/operator can practice and demonstrate success (to statistical significance) across the specified set of representative test methods, it is much more likely that the robot/operator will be able to perform the associated operational tasks during deployments, even with the increased complexity of unknown environments.
Repeated testing within standard test methods followed by operational scenarios with embedded test apparatuses can provide training with inherent measures of proficiency prior to deployment. These standard test methods are designed with increasing levels of apparatus difficulty to ensure that all robots, and even novice operators, can be measured performing basic tasks. The test apparatuses provide incrementally more difficult settings to challenge even the most advanced robot capabilities. The key to developing a good standard test method is ensuring that the figurative measuring stick is long enough to capture performance at both ends of the available robotic capabilities spectrum, and that it separates performance results in between. These standard test methods also enable controlled introduction of environmental complexity such as darkness, terrains, confined spaces, etc. The overall suite is expanding to answer new mission requirements every year, and some test methods have already been updated to widen their scope for testing autonomous systems.
FIGURE 2: The suite of standard test apparatuses provides a breadth-first approach toward testing. The apparatuses rely on inexpensive materials that can be sourced around the world. They can be embedded into operational scenarios to provide measures of operator proficiency.
This document describes how to use these standard test methods to evaluate robots, specify and defend purchasing decisions, and train operators with measures of proficiency. Appendices reference related documents with complete listings of test method descriptions and captured robot capabilities data so they can be updated more often. The test method descriptions contain brief discussions of the purpose, metric, apparatus, and procedure. More comprehensive discussions are available within the standards published through ASTM International. Test methods in the prototyping stage are not included. The robot capabilities data includes only data sets captured during comprehensive testing events using 20 or more test methods so the trade-offs in competing capabilities are clear. Robot data sets are presented within the class of similarly sized and equipped robots defined by particular test event sponsors to convey the overall capabilities of each robot within the general class of robots. Additional robot data collections are ongoing, and plans are being made to make all the robot data sets available on the DHS website for System Assessment and Validation for Emergency Response (SAVER) so they may be updated regularly with the latest test results and disseminated.
What is a Standard Test Method?
The difference between a “standard test method” and a “standard equipment specification” is that standard test methods focus on measuring capabilities while imposing no design constraints or other specifications on the robotic system. This approach doesn’t inhibit innovation while robot developers work toward implementing and hardening solutions to sometimes competing requirements. Standard test methods are essentially just agreed upon ways to test robotic capabilities. So we are not developing a specification for a “standard robot” of any kind, like an equipment standard. Rather, we are developing a standard way to test remotely operated robotic systems.
Our consensus standards development process is being conducted within the ASTM International Standards Committee on Homeland Security Applications; Operational Equipment; Robots (E54.08.01) which includes equal representation of robot developers, emergency responders, and civilian/military test administrators. Standard test methods include detailed descriptions of the following:
How Do These Standard Test Methods Work?
There are only a few simple rules:
FIGURE 3: The suite of DHS-NIST-ASTM International Standard Test Methods for Response Robots provides a breadth first approach to testing in order to capture statistically significant performance in a rapid and repeatable way. This allows testing more often to ensure that system changes quantifiably improve overall performance.
Who Benefits From These Standard Test Methods?
For robot developers, standard test methods provide tangible representations of operational requirements to help understand mission needs, make trade-off decisions, inspire innovation, measure incremental improvements, highlight break-through capabilities, and harden new approaches. The test apparatuses can be used to practice and refine systems during development to help debug issues, to identify necessary improvements, and then to convey system capabilities to interested users.
For program managers, standard test methods clearly articulate program goals in terms of desired combinations of robot capabilities. They can encourage innovation and measure outcomes, which can be remotely monitored. Program deliverables can be tied to demonstration of capabilities within specified combinations of standard test methods (to statistical significance). Final evaluations can be conducted using embedded test apparatuses in operational scenarios.
For responders and soldiers, and their respective organizations, standard test methods provide objective and repeatable robot capabilities data. Users can trust the data captured at any participating standard test facility, no matter when or where in the world the testing was conducted. This helps inform and guide purchasing decisions by clearly indicating the range of available robot capabilities in any given test method, and the particular combinations of capabilities available in certain robot configurations. Responders and soldiers should no longer specify a series of “requirements” to guide a robot purchase, because all too often those requirements are competing with each other in the context of technical practicality, reliability, cost, etc. Over and over again this process has led either to disappointment, excessive cost, or both. Rather, responders and soldiers should make purchasing decisions by specifying available combinations of robotic “capabilities” as demonstrated within a suite of standard test methods. This process recognizes that robot developers have already made trade-off decisions in trying to implement functional and affordable systems while considering technical practicality, reliability, cost, etc.
How Are These Standard Test Methods Developed?
The process used to develop standard test methods begins with specific robot capability requirements defined by emergency responders and soldiers that could make their deployments more effective, efficient, or safe. Each requirement must have an associated metric as a way to measure the capability. Sometimes the metric is as simple as elapsed time. But we try not to make every test trial a race. Rather, we try to establish a task-based testing paradigm emphasizing statistically significant repetitions with time per task as a secondary measure of efficiency. Users can specify capability objectives, along with lower thresholds below which performance is not acceptable, to communicate a range of acceptable performance that robot developers can use to make trade-off decisions. Where such robot requirements already exist, such as for some bomb squad applications, they may be used directly. Some responder communities, such as FEMA urban search and rescue teams, were solicited during the course of this project and have provided over 100 such requirements for 13 different robot categories.
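The idea of an objective paired with a lower threshold can be sketched in a few lines of Python. This is only an illustration: the class name, the rating labels, the assumption that a higher measured value is better, and the example numbers are all hypothetical and do not come from the standards.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CapabilityRange:
    """Hypothetical acceptable-performance range for one test method.

    Assumes a metric where higher is better (for example, a distance
    or a count of completed repetitions).
    """

    threshold: float  # lowest result the user will accept
    objective: float  # result the user would like to see

    def rate(self, measured: float) -> str:
        """Place a measured result relative to the threshold and objective."""
        if measured < self.threshold:
            return "below threshold"
        if measured < self.objective:
            return "acceptable"
        return "meets objective"


# Made-up numbers purely for illustration.
example_range = CapabilityRange(threshold=10.0, objective=20.0)
for result in (5.0, 15.0, 25.0):
    print(result, "->", example_range.rate(result))
```

Stating both values, rather than a single requirement, gives developers room to trade one capability against another while staying inside the range the user will accept.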
The requirements are prioritized by responders and prototype test apparatuses are generated to isolate, repeatably test, and measure robot performance. Response robot evaluation exercises are hosted in responder training facilities to allow responders to validate the test methods and learn about emerging robotic capabilities. International robot competitions featuring the prototype test apparatuses are used to inspire innovation and to leverage robot traffic (over 100 missions per competition) to refine apparatus designs. Robot competitions also support proliferation of the standard test methods for practice by encouraging benchmark comparisons for qualification. Once the apparatus is validated by responders and test administrators, it is balloted through the ASTM Standards Committee on Homeland Security Applications; Operational Equipment; Robots (E54.08.01).
FIGURE 4: The approach toward developing DHS-NIST-ASTM International Standard Test Methods for Response Robots includes iterative validation events on either side of the standards process shown down the center panel. The right panel shows outreach to robotic manufacturers and researchers in the form of practice events and robot competitions to inspire and guide robot development while gaining feedback on prototype test methods. The left panel shows outreach to responders and soldiers in the form of Response Robot Evaluation Exercises hosted at responder training facilities. This is where we validate the test methods and introduce embedded test apparatuses into operational scenarios.
Definition of a Response Robot (Ground, Aerial, Aquatic):
The suite of DHS-NIST-ASTM International Standard Test Methods for Response Robots was developed to measure the range of capabilities of response robots independent of robot size, and broadly applicable to a variety of missions intended by responders and soldiers. The working definition of a “response robot” is a remotely deployed device intended to perform operational tasks at operational tempos from safe operational stand-offs. It should provide remote situational awareness, a means to project operator intent through the equipped capabilities, and improve overall effectiveness of the mission while reducing risk to the operator. Key features include:
Response robots include ground vehicles up to about 500 kg (1100 lbs); small unmanned aerial systems (sUAS) within FAA Group I, defined as under 2 kg (4.4 lbs), under 30 knots maximum forward airspeed, and harmless upon impact; and aquatic systems including remotely operated vehicles for swift water rescue, inspection of bridges, ports, ship hulls, etc.
FIGURE 5: Over 100 robots have been tested to varying degrees of completeness across the roster of standard test methods. These are some examples of response robots that show the range of sizes of ground robots, some examples of vertical take-off and landing sUAS, and some small aquatic ROVs that are indicative of the robots targeted within this project.
Summary of This Standard Testing Approach:
The suite of standard test methods provides a rapid, quantitative, and comprehensive evaluation of remotely operated robots. Individual tests typically take less than an hour, except for certain endurance tests. So given typically 20-30 applicable test methods, a reliable robot can get through all the testing in less than a week. It is a purposefully breadth-first approach since robot capabilities data is short-lived as robot technologies and implementations change and mature. The resulting statistically significant capabilities data defines the overall characteristics of a given robot, and places that robot within context across its class of related robots. Robot configurations are typically subjected to 20-30 test methods chosen by a sponsor or procurer to capture baseline capabilities necessary for intended missions. The chosen combination of test methods helps determine capability trade-offs, reliability, etc. “Expert” operators provided by the developer perform the tests for the standards process to capture the best possible performance for comparison. Robot developers should not promise any better performance. The operator controls the robot from a remote location, typically out of sight and sound of the test apparatus but within radio communications range (except for the radio comms test methods), to maintain total reliance on their system interface at all times. Test trials include between 10 and 30 repetitions to achieve statistical significance (80% reliability with 80% confidence). Interactions with incapacitated robots during test trials are allowed to reset the robot to the start point or to make minor repairs with no spare parts. Up to three interactions are allowed within a thirty-repetition trial to maintain statistical significance. Every interaction is documented in Field Maintenance & Repair forms to identify indications of issues, remedies implemented, and tools used throughout the testing process.
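The "80% reliability with 80% confidence" figure can be explored with a short calculation. The text does not say which statistical model the test methods use, so the sketch below assumes a simple binomial model: repetitions are independent, and each interaction with an incapacitated robot counts as one failed repetition. The function names are illustrative only.

```python
from math import comb


def confidence_of_reliability(repetitions, failures, reliability=0.80):
    """Confidence that the true success rate is at least `reliability`.

    Assumed binomial model: this is one minus the probability of seeing
    `failures` or fewer failures if the true success rate were exactly
    `reliability`.
    """
    p_fail = 1.0 - reliability
    tail = sum(
        comb(repetitions, k) * p_fail**k * reliability ** (repetitions - k)
        for k in range(failures + 1)
    )
    return 1.0 - tail


def max_allowed_failures(repetitions, reliability=0.80, confidence=0.80):
    """Largest failure count that still demonstrates the target.

    Returns -1 when even a failure-free trial has too few repetitions.
    """
    allowed = -1
    for failures in range(repetitions + 1):
        if confidence_of_reliability(repetitions, failures, reliability) < confidence:
            break
        allowed = failures
    return allowed


for n in (10, 20, 30):
    print(n, "repetitions ->", max_allowed_failures(n), "failures allowed")
```

Under these assumptions, a 10-repetition trial tolerates no failures, a 20-repetition trial tolerates one, and a 30-repetition trial tolerates three, which is consistent with the limit of three interactions in a thirty-repetition trial.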
Testing events typically take less than one week and include some operational tasks with embedded standard apparatuses to leverage and extend the challenges imposed. Testing may take place at any robot test facility housing the suite of standard test methods, or at Response Robot Evaluation Exercises typically administered by NIST. Robot testing may also be conducted at events where the robots and user communities are typically assembled such as conferences or regional training events.
New test method requirements can come from any source: responder, robot developer, procurement sponsor, program manager, or other source. For a recent procurement process, NIST fabricated new draft standard test methods for a class of ultra lightweight reconnaissance robots under 10 kg (22 lbs). The sponsor had a specific requirement for durability with an emphasis on throwing the robot during deployment, possibly over a wall, onto a roof, or simply past some obstacles. NIST prototyped the Throw Distance test method to measure the down-range distance a robot could be thrown over a 2.4m (8ft) wall. After each throw, reconnaissance tasks ensured that the robot remained functional prior to the next throw. These tasks included driving the nearest circular line on the ground (control/latency), identifying hazmat labels on a barrel placed at the center of the circle (camera pointing and visual acuity), and listening to audible random numbers played within the barrel (audio acuity and 2-way communications, if equipped). As with the entire suite of standard test methods, other operational targets can be used for robots equipped to detect explosives, radiation sources, hazardous chemicals, etc. But this test was sufficient for the roster of robots tested. The test method validation process started immediately within the prototype apparatus. The developers all learned about their systems as they reluctantly began to throw (or haltingly toss) their robots over the wall. The engineers on the teams considered how to soften impact to survive 10 or more sequential repetitions. Some changed their wheel designs, sprocket designs, and/or materials. One developer used a more sophisticated “flight” behavior to maintain heading and orientation -- a real innovation.
FIGURE 6: A) The draft standard test apparatus Throw Distance includes a 2.4m (8ft) tall wall to throw over, an adjoining remote control station to limit sight lines down-range, and landing locations with 4m (13ft) diameter circular lines for the robot to follow. B and C) Robots may be thrown over the wall in any manner with a two-step approach while staying on the 2.4m (8ft) OSB panel on the ground. D) Hazmat targets are placed at the center of the circle for the robot to identify to demonstrate functionality after each throw. Colored discs on the ground mark the landing locations for each trial, which can add up to 30 repetitions.
Ultimately this process worked for robot developers, procurement sponsors, and the end users as the final robots delivered were clearly more capable and reliable than the initial set tested. Without this process and the design iterations and revisions it inspired, they almost certainly would have failed at some point in the hands of the end users in the field.
Outcomes: Response Robot Capabilities Compendium and Collaborating Test Facilities
There are several intangible outcomes from this process as well. For example, the standard test methods clarify communications and expectations between responders and robot developers. The physical test apparatuses and agreed upon metrics help convey mission requirements to robot developers while refining expectations of capabilities for responders. And, of course, they help measure the results.
However, the main outcome from this suite of standard test methods is a growing Response Robot Capabilities Compendium, which is a database of test results and associated bar charts describing the various baseline capabilities of tested robot configurations -- almost like robot DNA, where no two are exactly alike. Robot data generated by any participating standard test facility can be included in the capabilities compendium and compared no matter where or when the testing occurred (e.g. U.S., Germany, Japan, etc.). Bar charts for each individual test method show the capability of each robot configuration relative to the class of robots within that test method. Some missions may require “best-in-class” performance for a particular capability, while allowing average performance in other capabilities. In general, the capabilities compendium and the bar charts help inform responders, soldiers, and their respective organizations about the trade-offs of capabilities currently available, and begin to align expectations regarding what the robots can do during deployments.
FIGURE 7: Robot data in the form of bar charts provides an easy way to compare different robot models within a range of applicable test methods. Any end user can decide which test methods are important for their intended missions, and take a closer look at the robots that demonstrate the right combination of capabilities. The variety of mobility terrains is shown here as an example, but every sub-suite of standard test methods produces similar bar charts across all robots tested to statistical significance.
Currently, all robot capabilities data has been captured either at NIST or at a NIST hosted Response Robot Evaluation Exercise. The data was collected primarily to support the standardization process, but it has already proven useful for guiding several robot procurements. In 2013, NIST opened a new robot test facility on its main campus in Gaithersburg, MD. The nearly 1,000 square meter (10,000 square foot) facility will support continued development of the standards and maintain calibration experiments with a growing roster of collaborating test facilities in the U.S. and internationally. Many other locations host particular subsets of the standard test methods to support robotic development, program management, or procurement needs.
FIGURE 8: NIST’s new Robot Test Facility on the main campus in Gaithersburg, MD contains all the test methods and prototypes being validated.