TY - JOUR
T1 - Tasks, topics and relevance judging for the TREC Genomics Track
T2 - five years of experience evaluating biomedical text information retrieval systems
AU - Roberts, Phoebe M.
AU - Cohen, Aaron M.
AU - Hersh, William R.
N1 - Funding Information:
The TREC Genomics Track was funded by grant ITR-0325160 to W.R.H. from the U.S. National Science Foundation. The authors would like to thank the Genomics Track steering committee, especially Kevin Bretonnel Cohen and Anna Divoli, for helpful discussions about relevance judgments and guidelines.
PY - 2009/2
Y1 - 2009/2
AB - With the help of a team of expert biologist judges, the TREC Genomics Track has generated four large sets of "gold standard" test collections, comprising over a hundred unique topics, two kinds of ad hoc retrieval tasks, and their corresponding relevance judgments. Over the years of the track, increasingly complex tasks necessitated the creation of judging tools and training guidelines to accommodate teams of part-time, short-term workers from a variety of specialized backgrounds in the biological sciences, and to address the consistency and reproducibility of the assessment process. Important lessons were learned about factors that influenced the utility of the test collections, including topic design, the annotations provided by judges, the methods used for identifying and training judges, and the provision of a central moderator "meta-judge".
KW - Evaluation
KW - Information retrieval
KW - Inter-annotator agreement
KW - Reference standards
KW - Text mining
UR - http://www.scopus.com/inward/record.url?scp=58149218476&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=58149218476&partnerID=8YFLogxK
DO - 10.1007/s10791-008-9072-x
M3 - Article
AN - SCOPUS:58149218476
SN - 1386-4564
VL - 12
SP - 81
EP - 97
JO - Information Retrieval
JF - Information Retrieval
IS - 1
ER -