
Taking Measure

Just a Standard Blog

Information Retrieval: How NIST Helps You Find That Video Online

[Image: A YouTube-like interface on a laptop screen; the text on the screen reads "my videos," with several tiles of cat videos below. Credit: Georgejmclittle/]

The need for tools that help everyday users find the information they are looking for on the internet grows more urgent with the massive amount of data generated every day, hour and minute! This information can come in different formats, such as text (e.g., the latest news or answers to questions), audio (e.g., soundtracks on SoundCloud), images (e.g., pictures for a PowerPoint presentation) and video (e.g., movies, tutorials about how to make or fix stuff, cats).

For example, on YouTube each day about 1 billion hours of video are being watched, and every month more than 2 billion users log onto the service. How can YouTube make those billions of users happy by delivering them the videos they are looking for? This is where a major research field called information retrieval (IR) comes into play.

Information retrieval is the science of searching for information in a document or searching for the documents themselves. Here, “document” is a general term that can be used to describe text, image, video or sound data. Examples of well-known IR systems are the Google search engine or Microsoft Bing. Every day, billions of users go to those search engines to seek the information that they need. Sometimes they get lucky and find whatever they were looking for easily and quickly, sometimes it takes more time and effort, and sometimes they never find a satisfactory answer.
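At its simplest, a text retrieval system can be built around an inverted index, a structure that maps each term to the documents containing it. The sketch below is only an illustration of that idea, not any particular search engine's implementation, and the document names and texts are made-up examples:

```python
from collections import defaultdict

# Made-up example "documents" (in IR, a document can be text, image,
# video or sound data; here we use short text snippets).
documents = {
    "doc1": "how to fix a bike tire",
    "doc2": "funny cat video compilation",
    "doc3": "cat plays with bike bell",
}

# Build the inverted index: term -> set of documents containing it.
index = defaultdict(set)
for doc_id, text in documents.items():
    for term in text.lower().split():
        index[term].add(doc_id)

def search(query):
    """Return documents containing every query term (boolean AND retrieval)."""
    terms = query.lower().split()
    if not terms:
        return set()
    results = index.get(terms[0], set()).copy()
    for term in terms[1:]:
        results &= index.get(term, set())
    return results

print(sorted(search("cat bike")))  # ['doc3']
```

Real search engines layer ranking, spelling correction and much more on top of this basic lookup, but the core question is the same: which documents match the user's query?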

Although there has been a lot of progress in the performance of search engines lately, all search engines still depend mainly on the information the person uploading the website, audio track, video clip or image provides to the search engine, such as title, tags, description and keywords. This leads to the question: What if the user is searching for a piece of information the uploader didn't provide? In this situation it becomes very difficult for the search engine to satisfy the user, and that is why IR research is important.

In fact, research into how to compare two search engines’ results and how to promote progress in this field began in the mid-20th century. However, at that time the lack of large datasets to test search engines on was a big challenge. How can someone test the quality of a search engine if the set of documents to search on is very small? What are suitable measures to assess the performance of search engines and their results? How do you systematically test two or more search engines on the same exact user queries?

All these questions led to an initiative started in 1991, the Text Retrieval Conference (TREC) at the National Institute of Standards and Technology (NIST), to address the need for a more coordinated evaluation of search engines among researchers. Initially, TREC designed evaluation campaigns by distributing standard datasets, sample user queries, answers to those queries, and agreed-upon measures to give standard scores to results found by systems trying to answer those queries.

Those evaluation campaigns targeted text retrieval research across different task scenarios, such as a user searching for random information that a search engine had not encountered before, a user looking for an answer to a specific question, and the retrieval of information written in a language different from the language of the user's query, all while adopting different data domains such as blog data, enterprise data, genomic data, legal data, spam data and others.

In 2001, a new TREC evaluation task started to use video data, and in 2003 it became an independent international evaluation benchmark called TRECVID. The motivation was an interest at NIST, in industry and in academia in expanding the notion of "information" in IR beyond text, which exposed the difficulty of comparing research results in video retrieval because there was no common basis (data, tasks, measures) for scientific comparison.

NIST and TRECVID's goals of promoting research and enabling progress in video retrieval by developing measurement methods and performance benchmarks reflected how young the field was. From the start, TRECVID has followed the same annual agenda:

  • Acquire data and distribute it to participating researchers.
  • Formulate a set of search queries and release these to researchers.
  • Allow about one month before accepting each system's top-ranked video clips per search query.
  • Pool results from all researchers’ outputs to eliminate duplicates, and use human assessors to judge if each result is valid or not valid.
  • Calculate measures for submitted results, such as how many correct results the system was able to retrieve from the actual answer set and how many of the retrieved results were correct, and distribute scores back to researchers.
  • Host a workshop at NIST in November.
  • Make plans and repeat the process annually until we solve the search problem in the video domain!

During the workshop, researchers and task coordinators discuss which methods and techniques work or do not work and why. Moreover, after the annual evaluation cycle finishes, all data used during the previous year is made public (after obtaining agreement from the data vendors, of course) to be used again in subsequent years by TRECVID participants or any entity that would like to evaluate its own search engines.

It turns out that running evaluation campaigns such as TRECVID provides many benefits to the community, such as securing, preparing and distributing data, which used to be difficult to obtain in the early years and still is in some cases despite the abundance of datasets these days. The participants can then use the same data, the same agreed-upon metrics for evaluation and the same answer sheet for measurement, and this allows direct comparisons across and within groups.

The collaborations among researchers, such as sharing useful resources, also foster a community and make it easier to break into what is a new area for many people. All this helps to improve overall performance of the tasks being benchmarked. By following the published guidelines for evaluation, either within or outside a formal evaluation campaign, a research group can perform direct comparisons with the work of others and know that its test methodology is valid and acceptable.

Good performance results are often cited in response to research funding solicitations and form the basis for promoting research through publications. In fact, some participants in TRECVID used their research output as the basis for start-up companies. The different research groups usually learn from each other since they are working on the same problems and data and using the same measures, etc. Approaches that seem to work in one system can be incorporated into other systems and tested to see if they still work. This way, groups just getting started can reach better performance faster.

This year, TRECVID will turn 20 years old. I hope that the next time you search for a video, you remember all the work that has been done by researchers to make sure you get what you are looking for quickly and accurately.

About the author

George Awad

George Awad has been supporting the TRECVID project at NIST since 2007. He has a Ph.D. in computer science from Dublin City University (DCU), Ireland. His current main research activities include evaluating video search engines using real-world datasets. He previously co-organized international tutorials and workshops and has several publications in international conferences and academic journals. He jointly received the 57th Niwa-Takayanagi Best Paper Prize awarded by the Institute of Image Information and Television Engineers (ITE) and the 2018 IEEE Computer Society PAMI Mark Everingham Prize.
