Overview of the TREC-2012 Microblog Track

Published: June 02, 2014


Ian M. Soboroff, Iadh Ounis, Jimmy Lin, Craig Macdonald


The Microblog track examines search tasks and evaluation methodologies for information-seeking behaviours in microblogging environments such as Twitter. It was first introduced in 2011, addressing a real-time adhoc search task in which the user wishes to see the most recent but relevant information for the query. In 2012, the real-time adhoc task was changed slightly, and a new filtering task was added. The filtering task models a standing query, where the user wants to see relevant tweets as they are posted in the future.

For the second year of the track, we reused the Tweets11 corpus described below. The corpus comprises 16M tweets spread over two weeks, sampled courtesy of Twitter. It is designed to be a reusable, representative sample of the twittersphere; i.e., both important and spam tweets are included. As reusability is paramount for a TREC test collection, it must be possible to obtain this sample at any point in time. To accomplish this, the TREC Microblog track introduced a novel methodology whereby participants sign an agreement to obtain the ids of the tweets in the corpus. Tools are then provided that permit TREC participants to download the corpus directly from the Twitter website.

The first Microblog track in TREC 2011 was a remarkable success. In 2012, 40 groups participated in the track, with 33 groups submitting a total of 121 runs for the real-time adhoc task and 19 groups submitting a total of 60 runs for the filtering task.
