Real-time three-dimensional vision has advanced rapidly over the past twenty years, leading to a number of successful laboratory demonstrations, including real-time visual servoing, autonomous vehicle navigation, and real-time people and vehicle tracking. However, these advances have frequently not yet made the transition to commercial products, due in part to a lack of objective methods for empirical performance evaluation. To ensure that a new algorithm or sensor system is reliable and accurate enough for commercial application in a safety- or performance-critical environment, the new system must be tested against rigorous standards and benchmarks.

In several areas of computer vision, there are well-established benchmarks for static problems where the solution need not run in real time, as well as challenge problems for real-time processing. The Middlebury Stereo Dataset and the NIST Face Recognition Grand Challenge have succeeded in advancing research in their respective areas by providing well-defined challenge tasks with ground truth and evaluation metrics. The success of these efforts led to workshop series such as BenCOS (Benchmarking Automated Calibration, Orientation and Surface Reconstruction from Images) and CLEAR (Classification of Events, Activities and Relationships) in video tracking. The DARPA Grand Challenge series for autonomous vehicles demonstrated that clear and well-motivated benchmark tasks could lead a research community to assemble disparate, incomplete solutions into successful full solutions in a few years.

Despite these successes, there are a number of computer vision tasks for which well-defined benchmarks do not yet exist, or do not exist at the level of precision and robustness required for commercial use. Of interest in this article is dynamic pose estimation under complex environmental conditions: tracking an object's pose and position as it moves through an environment with uncontrolled lighting and background.
This is a central task in robotic perception, and a robust, highly accurate solution would be of use in a number of applications. Creating a standard benchmark for this task would help advance the field, but it is difficult because the number of variables is very large and the development of ground truth data is complex. The variables in such a task include the size and shape of the object, the speed and nature of the object's motion, the complexity and motion of background objects, and the lighting conditions, among other elements.

The PerMIS 2008 Special Session on "Performance Metrics for Perception in Intelligent Manufacturing," held August 20, 2008, brought together academic, industrial, and governmental researchers interested in calibrating and benchmarking vision and metrology systems. Each paper addressed an individual problem of interest in its own right, but taken together the papers also fit into a framework for benchmarking complex perception tasks like dynamic pose estimation. A general framework for the evaluation of perception algorithms includes three steps:

1) Model the conditions and requirements of a task to best understand how to test sensor systems for that task.
2) Design and calibrate a standard measurement system to gather system data and ground truth under controlled conditions.
3) Establish metrics for comparing the performance of the system under test to the ground truth.

This chapter collects the PerMIS special session papers in summary form; they fit well into this framework.
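To make step 3 concrete for a 6DOF pose-estimation task, a common choice of metrics is the Euclidean distance between estimated and ground-truth positions, plus the geodesic angle of the relative rotation. The sketch below is illustrative only and not from the chapter; the function name and the NumPy-based rotation-matrix representation are assumptions for this example.

```python
import numpy as np

def pose_error(R_est, t_est, R_gt, t_gt):
    """Compare an estimated 6DOF pose (rotation matrix R, translation t)
    against ground truth. Returns (translation error in the units of t,
    rotation error in degrees). Illustrative metric, not the chapter's."""
    # Translation error: Euclidean distance between the two positions.
    trans_err = np.linalg.norm(np.asarray(t_est) - np.asarray(t_gt))
    # Rotation error: angle of the relative rotation R_gt^T * R_est,
    # recovered from its trace; clip guards against round-off outside [-1, 1].
    R_rel = R_gt.T @ R_est
    cos_angle = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
    rot_err = np.degrees(np.arccos(cos_angle))
    return trans_err, rot_err

# Example: estimate off by 5 degrees about z and 2 cm in x.
theta = np.radians(5.0)
R_gt = np.eye(3)
R_est = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta),  np.cos(theta), 0.0],
                  [0.0,            0.0,           1.0]])
t_gt = np.zeros(3)
t_est = np.array([0.02, 0.0, 0.0])
trans_err, rot_err = pose_error(R_est, t_est, R_gt, t_gt)
print(trans_err, rot_err)  # -> 0.02 and 5.0 (up to round-off)
```

Aggregating such per-frame errors (e.g., as RMS over a tracked trajectory) then gives a single benchmark score against the ground-truth measurement system of step 2.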
Citation: Performance Evaluation and Benchmarking of Intelligent Systems
Publisher Info: Springer, Norwell, MA
Pub Type: Book Chapters
Keywords: laser tracker, performance evaluation, large-scale metrology, sensitivity analysis, LADAR camera, super resolution, 6DOF, human perception experiment, target identification, robotics applications