Framework for Performance Evaluation of Face, Text, and Vehicle Detection and Tracking in Video: Data, Metrics, and Protocol


Rangachar Kasturi, D Goldgof, K Soundararajan, V Manohar, M Boonstra, V Korzhova, J Zhang, Rachel J. Bowers, John S. Garofolo


Common benchmark datasets, standardized performance metrics, and baseline algorithms have had considerable impact on research and development in a variety of application domains. These resources give both consumers and developers of technology a common framework for objectively comparing the performance of different algorithms and algorithmic improvements. In this paper, we present such a framework for evaluating object detection and tracking in video, specifically for face, text, and vehicle objects. The framework includes the raw data, ground truth annotations (along with annotation guidelines), performance metrics, evaluation protocols, and tools, including scoring software and baseline algorithms. For each detection and tracking task and supported domain, we developed a 50-clip training set and a 50-clip test set. Each clip is approximately 2.5 minutes long and has been completely annotated, spatially and temporally, at the I-frame level. Each task/domain therefore has an associated annotated corpus of approximately 450,000 frames. Annotation at this scope is unprecedented; it was designed to begin to supply the quantities of data needed for robust machine learning approaches to computer vision and for statistically significant comparison of algorithm performance. The goals of this work were to systematically address the challenges of object detection and tracking through a common evaluation framework that permits meaningful objective comparison of techniques, to provide the research community with sufficient data for exploring automatic modeling techniques, to encourage the incorporation of objective evaluation into the development process, and to provide lasting resources of a scale that will remain useful to the computer vision research community for years to come.
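The corpus size quoted in the abstract can be checked with a short calculation. The sketch below is illustrative only: the clip counts and clip length come from the abstract, but the frame rate of 30 frames per second is an assumption (the abstract does not state one), and the function name `corpus_frames` is hypothetical.

```python
# Rough check of the "approximately 450,000 frames" figure per task/domain.
# From the abstract: 50 training clips + 50 test clips, ~2.5 minutes each.
# ASSUMPTION: 30 frames per second; the abstract does not give a frame rate.

TRAIN_CLIPS = 50
TEST_CLIPS = 50
CLIP_MINUTES = 2.5
ASSUMED_FPS = 30  # hypothetical value, not taken from the source


def corpus_frames(clips: int, minutes_per_clip: float, fps: float) -> int:
    """Return the approximate number of frames in `clips` clips."""
    seconds_per_clip = minutes_per_clip * 60
    return round(clips * seconds_per_clip * fps)


total = corpus_frames(TRAIN_CLIPS + TEST_CLIPS, CLIP_MINUTES, ASSUMED_FPS)
print(total)
```

Under that assumed frame rate, 100 clips of about 150 seconds each land on the abstract's figure; a different frame rate would scale the total proportionally.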
IEEE Transactions on Pattern Analysis and Machine Intelligence


baseline algorithms; computer vision; evaluation framework; face, text, and vehicle detection and tracking; homeland security; information/knowledge management; performance assessment; video extraction


Kasturi, R., Goldgof, D., Soundararajan, K., Manohar, V., Boonstra, M., Korzhova, V., Zhang, J., Bowers, R. and Garofolo, J. (2009), Framework for Performance Evaluation of Face, Text, and Vehicle Detection and Tracking in Video: Data, Metrics, and Protocol, IEEE Transactions on Pattern Analysis and Machine Intelligence (Accessed April 14, 2024)
Created August 26, 2016, Updated January 27, 2020