Abstract
Objectives: Identifying all published reports of randomized controlled trials (RCTs) is an important aim, but it requires extensive manual effort to separate RCTs from non-RCTs, even using current machine learning (ML) approaches. We aimed to make this process more efficient via a hybrid approach using both crowdsourcing and ML. Methods: We trained a classifier to discriminate between citations that describe RCTs and those that do not. We then adopted a simple strategy of automatically excluding citations deemed very unlikely to be RCTs by the classifier and deferring to crowdworkers otherwise. Results: Combining ML and crowdsourcing provides a highly sensitive RCT identification strategy (our estimates suggest 95%-99% recall) with substantially less effort (we observed a reduction of around 60%-80%) than relying on manual screening alone. Conclusions: Hybrid crowd-ML strategies warrant further exploration for biomedical curation/annotation tasks.
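The triage rule described in the Methods can be illustrated with a minimal sketch. It assumes each citation already carries a classifier score; the `Citation` type, the `triage` and `workload_reduction` helpers, and the 0.1 cutoff are hypothetical and chosen only for illustration, not taken from the study.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Citation:
    citation_id: str        # hypothetical identifier
    rct_probability: float  # assumed classifier score in [0, 1]


def triage(
    citations: List[Citation], exclude_below: float = 0.1
) -> Tuple[List[Citation], List[Citation]]:
    """Split citations into (auto-excluded, sent to crowdworkers).

    The 0.1 default is an illustrative assumption; in practice the cutoff
    would be tuned on labeled data to keep recall high.
    """
    auto_excluded: List[Citation] = []
    to_crowd: List[Citation] = []
    for citation in citations:
        if citation.rct_probability < exclude_below:
            # Classifier deems this very unlikely to be an RCT.
            auto_excluded.append(citation)
        else:
            # Anything else is deferred to human judgment.
            to_crowd.append(citation)
    return auto_excluded, to_crowd


def workload_reduction(
    auto_excluded: List[Citation], to_crowd: List[Citation]
) -> float:
    """Fraction of citations that never reach a human screener."""
    total = len(auto_excluded) + len(to_crowd)
    return len(auto_excluded) / total if total else 0.0
```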
Original language | English (US) |
---|---|
Pages (from-to) | 1165-1168 |
Number of pages | 4 |
Journal | Journal of the American Medical Informatics Association |
Volume | 24 |
Issue number | 6 |
State | Published - Nov 1 2017 |
Keywords
- Crowdsourcing
- Evidence-based medicine
- Human computation
- Machine learning
- Natural language processing
ASJC Scopus subject areas
- Health Informatics