TRECVid Semantic Indexing of Video: A 6-Year Retrospective
George M. Awad, Cees Snoek, Alan Smeaton, Georges Quenot
Semantic indexing, or assigning semantic tags to video samples, is a key component of content-based access to video documents and collections. The Semantic Indexing task ran at TRECVid from 2010 to 2015 with the support of NIST and the Quaero project. Like its predecessor, the High-Level Feature detection task, which ran from 2002 to 2009, the semantic indexing task aims at evaluating methods and systems for detecting visual, auditory, or multi-modal concepts in video shots. In addition to the main semantic indexing task, four secondary tasks were proposed, namely the localization task, the concept pair task, the no annotation task, and the progress task. The task attracted over 40 research teams during its running period. It was conducted using a total of 1,400 hours of video data drawn from Internet Archive videos with Creative Commons licenses, gathered by NIST. Each year, 200 hours of new test data were made available, plus 200 more hours as development data in 2010. The number of target concepts to be detected started at 130 in 2010 and was extended to 346 in 2011. Both the increase in the volume of video data and the increase in the number of target concepts favored the development of generic and scalable methods. Over 8 million direct shot×concept annotations, plus over 20 million indirect ones, were produced by the participants and the Quaero project on a total of 800 hours of development data. Significant progress was accomplished during the period, as accurately measured in the context of the progress task and also confirmed by some participants' contrast experiments. This paper describes the data, protocol, and metrics used for the main and secondary tasks, the results obtained, and the main approaches used by participants.
ITE Transactions on Media Technology and Applications