Optimizing feature representation for automated systematic review work prioritization.

Aaron M. Cohen

Medical Informatics and Clinical Epidemiology

Research output: Contribution to journal › Article › peer-review

62 Scopus citations

Abstract

Automated document classification can be a valuable tool for enhancing the efficiency of creating and updating systematic reviews (SRs) for evidence-based medicine. One way document classification can help is in performing work prioritization: given a set of documents, order them such that the most likely useful documents appear first. We evaluated several alternate classification feature systems including unigram, n-gram, MeSH, and natural language processing (NLP) feature sets for their usefulness on 15 SR tasks, using the area under the receiver operating curve as a measure of goodness. We also examined the impact of topic-specific training data compared to general SR inclusion data. The best feature set used a combination of n-gram and MeSH features. NLP-based features were not found to improve performance. Furthermore, topic-specific training data usually provides a significant performance gain over more general SR training.
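The evaluation measure named in the abstract can be illustrated with a minimal sketch. It assumes each document has a classifier score and a binary include/exclude label, and computes the area under the ROC curve as the fraction of (included, excluded) pairs that the ranking orders correctly. The function name, scores, and labels below are hypothetical and are not taken from the paper; this is not the author's implementation.

```python
def ranking_auc(scores, labels):
    """Area under the ROC curve for a ranking.

    Equals the probability that a randomly chosen included document (label 1)
    is scored above a randomly chosen excluded document (label 0).
    Tied scores count as half a correctly ordered pair.
    """
    included = [s for s, y in zip(scores, labels) if y == 1]
    excluded = [s for s, y in zip(scores, labels) if y == 0]
    if not included or not excluded:
        raise ValueError("need at least one included and one excluded document")
    correct = 0.0
    for p in included:
        for n in excluded:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(included) * len(excluded))


# Hypothetical classifier scores and reviewer decisions for five documents.
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
labels = [1, 0, 1, 0, 0]

# Work prioritization: present documents in descending score order.
order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
auc = ranking_auc(scores, labels)
```

An AUC of 1.0 means every included document is ranked ahead of every excluded one, while 0.5 corresponds to a random ordering. The pairwise loop is quadratic and is meant only to make the definition concrete.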

Original language	English (US)
Pages (from-to)	121-125
Number of pages	5
Journal	AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium
State	Published - 2008

ASJC Scopus subject areas

General Medicine

Cite this

@article{e6723bf2051843dc91e96d1595d45846,

title = "Optimizing feature representation for automated systematic review work prioritization",

abstract = "Automated document classification can be a valuable tool for enhancing the efficiency of creating and updating systematic reviews (SRs) for evidence-based medicine. One way document classification can help is in performing work prioritization: given a set of documents, order them such that the most likely useful documents appear first. We evaluated several alternate classification feature systems including unigram, n-gram, MeSH, and natural language processing (NLP) feature sets for their usefulness on 15 SR tasks, using the area under the receiver operating curve as a measure of goodness. We also examined the impact of topic-specific training data compared to general SR inclusion data. The best feature set used a combination of n-gram and MeSH features. NLP-based features were not found to improve performance. Furthermore, topic-specific training data usually provides a significant performance gain over more general SR training.",

author = "Cohen, {Aaron M.}",

note = "Copyright: This record is sourced from MEDLINE{\textregistered}/PubMed{\textregistered}, a database of the U.S. National Library of Medicine",

year = "2008",

language = "English (US)",

pages = "121--125",

journal = "AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium",

issn = "1559-4076",

publisher = "American Medical Informatics Association",

}

TY - JOUR

T1 - Optimizing feature representation for automated systematic review work prioritization.

AU - Cohen, Aaron M.

N1 - Copyright: This record is sourced from MEDLINE®/PubMed®, a database of the U.S. National Library of Medicine

PY - 2008

Y1 - 2008

N2 - Automated document classification can be a valuable tool for enhancing the efficiency of creating and updating systematic reviews (SRs) for evidence-based medicine. One way document classification can help is in performing work prioritization: given a set of documents, order them such that the most likely useful documents appear first. We evaluated several alternate classification feature systems including unigram, n-gram, MeSH, and natural language processing (NLP) feature sets for their usefulness on 15 SR tasks, using the area under the receiver operating curve as a measure of goodness. We also examined the impact of topic-specific training data compared to general SR inclusion data. The best feature set used a combination of n-gram and MeSH features. NLP-based features were not found to improve performance. Furthermore, topic-specific training data usually provides a significant performance gain over more general SR training.

AB - Automated document classification can be a valuable tool for enhancing the efficiency of creating and updating systematic reviews (SRs) for evidence-based medicine. One way document classification can help is in performing work prioritization: given a set of documents, order them such that the most likely useful documents appear first. We evaluated several alternate classification feature systems including unigram, n-gram, MeSH, and natural language processing (NLP) feature sets for their usefulness on 15 SR tasks, using the area under the receiver operating curve as a measure of goodness. We also examined the impact of topic-specific training data compared to general SR inclusion data. The best feature set used a combination of n-gram and MeSH features. NLP-based features were not found to improve performance. Furthermore, topic-specific training data usually provides a significant performance gain over more general SR training.

UR - http://www.scopus.com/inward/record.url?scp=69549134271&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=69549134271&partnerID=8YFLogxK

M3 - Article

C2 - 18998798

AN - SCOPUS:69549134271

SN - 1559-4076

SP - 121

EP - 125

JO - AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium

JF - AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium

ER -
