Given the size of the web, the search engine industry has argued that engines should be evaluated by their ability to retrieve highly relevant pages rather than all possible relevant pages. To explore the role highly relevant documents play in retrieval system evaluation, assessors for the TREC-9 web track used a three-point relevance scale and also selected a best page for each topic. The relative effectiveness of runs evaluated by different relevant document sets differed, confirming the hypothesis that different retrieval techniques work better for retrieving highly relevant documents. Yet evaluating by highly relevant documents alone can be unstable, since there are relatively few highly relevant documents. TREC assessors frequently disagreed in their selection of the best page, and subsequent evaluation by best page across different assessors varied widely. The discounted cumulative gain measure introduced by Järvelin and Kekäläinen increases evaluation stability by incorporating all relevance judgments while still giving precedence to highly relevant documents.
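The discounted cumulative gain idea can be sketched briefly: each retrieved document contributes its graded relevance gain, discounted by the logarithm of its rank, so highly relevant documents found early dominate the score while lower-graded judgments still contribute. The following is a minimal illustration of the measure as defined by Järvelin and Kekäläinen; the gain values and log base here are illustrative choices, not those of the TREC-9 experiments.

```python
import math

def dcg(gains, base=2):
    """Discounted cumulative gain for a ranked list of graded
    relevance gains. Ranks below the log base are not discounted
    (log would be <= 1 there); later ranks are divided by
    log_base(rank), so early highly relevant documents dominate."""
    total = 0.0
    for rank, gain in enumerate(gains, start=1):
        total += gain if rank < base else gain / math.log(rank, base)
    return total

# Hypothetical graded judgments for one ranked result list
# (e.g. 3 = highly relevant, 1 = marginally relevant, 0 = not relevant)
score = dcg([3, 2, 3, 0, 1, 2])
```

Because every judged document adds some gain, the measure remains stable even when highly relevant documents are scarce, while the log discount still rewards systems that rank them first.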
search engine evaluation, text retrieval evaluation
Evaluation by Highly Relevant Documents, ACM Special Interest Group in Information Retrieval (SIGIR)