Why batch and user evaluations do not give the same results

A. H. Turpin, W. Hersh

Research output: Contribution to journal › Conference article › peer-review

107 Scopus citations


Much system-oriented evaluation of information retrieval systems has used the Cranfield approach based upon queries run against test collections in a batch mode. Some researchers have questioned whether this approach can be applied to the real world, but little data exists for or against that assertion. We have studied this question in the context of the TREC Interactive Track. Previous results demonstrated that improved performance as measured by relevance-based metrics in batch studies did not correspond with outcomes on real user searching tasks. The experiments in this paper analyzed those results to determine why this occurred. Our assessment showed that while the queries entered by real users into systems yielding better results in batch studies gave comparable gains in ranking of relevant documents for those users, those gains did not translate into better performance on specific tasks. This was most likely due to users being able to adequately find and utilize relevant documents ranked further down the output list.

Original language: English (US)
Pages (from-to): 225-231
Number of pages: 7
Journal: SIGIR Forum (ACM Special Interest Group on Information Retrieval)
State: Published - 2001
Externally published: Yes
Event: 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval - New Orleans, LA, United States
Duration: Sep 9 2001 to Sep 13 2001


Keywords

  • Information retrieval evaluation
  • Interactive retrieval
  • Text Retrieval Conference (TREC)

ASJC Scopus subject areas

  • Management Information Systems
  • Hardware and Architecture

