UIC/OHSU CLEF 2018 Task 2 diagnostic test accuracy ranking using publication type cluster similarity measures

Aaron Cohen; Neil R. Smalheiser

Medical Informatics and Clinical Epidemiology

Research output: Contribution to journal › Conference article › peer-review

3 Scopus citations

Abstract

The CLEF 2018 Task 2 goal was to identify and rank retrieved articles relevant to conducting a systematic diagnostic test accuracy review on a given topic. The UIC/OHSU team did not attempt to rank retrieved articles by relevance directly, but rather explored the baseline value of ranking retrieved articles according to the probability that they are concerned with diagnostic test accuracy. First, a set of six publication type clusters, including a cluster of diagnostic test accuracy papers (DTAs), was built by searching PubMed from 1987-2015. We created several types of cluster similarity measures for each publication type. Similarity types included: implicit-term similarity, most important word similarity, journal similarity, and author count similarity. These similarity features were then used with weighted and unweighted linear SVM machine learning algorithms, which were trained with a data set retrieved from PubMed searches consisting of 3481 PMIDs likely to be DTAs, and 71684 PMIDs most of which are not likely to be DTAs. The trained models produce scores predicting the probability that an individual article is a DTA. The CLEF 2018 Task 2 Test PMIDs for each topic were scored and ranked, and the cutoff probability for each of the two models was determined by visual inspection of the score distribution on the test data. Cutoff probabilities chosen were 0.20 for the unweighted SVM model and 0.40 for the weighted SVM model.
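
The final scoring step described in the abstract (rank each topic's articles by predicted DTA probability, then apply a per-model cutoff) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' code: the function name `rank_and_cut`, the placeholder identifiers, and the toy scores are all hypothetical, and treating the cutoff as inclusive (`>=`) is an assumption. Only the two cutoff values come from the abstract.

```python
from typing import Dict, List, Tuple

# Cutoff probabilities reported in the abstract for the two SVM models.
CUTOFFS = {"unweighted": 0.20, "weighted": 0.40}


def rank_and_cut(scores: Dict[str, float], cutoff: float) -> Tuple[List[str], List[str]]:
    """Rank article IDs by descending score and list those at or above the cutoff.

    Ties are broken by article ID so the ordering is deterministic.
    Whether the cutoff is inclusive is an assumption of this sketch.
    """
    ranked = sorted(scores, key=lambda pmid: (-scores[pmid], pmid))
    kept = [pmid for pmid in ranked if scores[pmid] >= cutoff]
    return ranked, kept


# Hypothetical scores for one topic; these are made-up values, not results.
toy_scores = {"PMID_A": 0.91, "PMID_B": 0.35, "PMID_C": 0.12, "PMID_D": 0.55}

ranked, kept_unweighted = rank_and_cut(toy_scores, CUTOFFS["unweighted"])
_, kept_weighted = rank_and_cut(toy_scores, CUTOFFS["weighted"])
```

With these toy values, the higher cutoff for the weighted model keeps fewer articles than the lower cutoff for the unweighted model, while the ranking itself is unchanged.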

Original language	English (US)
Journal	CEUR Workshop Proceedings
Volume	2125
State	Published - 2018
Event	19th Working Notes of CLEF Conference and Labs of the Evaluation Forum, CLEF 2018 - Avignon, France
Duration	Sep 10 2018 → Sep 14 2018

Keywords

Diagnostic Test Accuracy
Machine Learning
Publication Types
Support Vector Machine

ASJC Scopus subject areas

General Computer Science

Cite this

@article{2a71268f7e4547bab99965738c438375,

title = "UIC/OHSU CLEF 2018 Task 2 diagnostic test accuracy ranking using publication type cluster similarity measures",

abstract = "The CLEF 2018 Task 2 goal was to identify and rank retrieved articles relevant to conducting a systematic diagnostic test accuracy review on a given topic. The UIC/OHSU team did not attempt to rank retrieved articles by relevance directly, but rather explored the baseline value of ranking retrieved articles according to the probability that they are concerned with diagnostic test accuracy. First, a set of six publication type clusters, including a cluster of diagnostic test accuracy papers (DTAs), was built by searching PubMed from 1987-2015. We created several types of cluster similarity measures for each publication type. Similarity types included: implicit-term similarity, most important word similarity, journal similarity, and author count similarity. These similarity features were then used with weighted and unweighted linear SVM machine learning algorithms, which were trained with a data set retrieved from PubMed searches consisting of 3481 PMIDs likely to be DTAs, and 71684 PMIDs most of which are not likely to be DTAs. The trained models produce scores predicting the probability that an individual article is a DTA. The CLEF 2018 Task 2 Test PMIDs for each topic were scored and ranked, and the cutoff probability for each of the two models was determined by visual inspection of the score distribution on the test data. Cutoff probabilities chosen were 0.20 for the unweighted SVM model and 0.40 for the weighted SVM model.",

keywords = "Diagnostic Test Accuracy, Machine Learning, Publication Types, Support Vector Machine",

author = "Cohen, Aaron and Smalheiser, {Neil R.}",

year = "2018",

language = "English (US)",

volume = "2125",

journal = "CEUR Workshop Proceedings",

issn = "1613-0073",

publisher = "CEUR-WS",

note = "19th Working Notes of CLEF Conference and Labs of the Evaluation Forum, CLEF 2018 ; Conference date: 10-09-2018 Through 14-09-2018",

}

TY - JOUR

T1 - UIC/OHSU CLEF 2018 Task 2 diagnostic test accuracy ranking using publication type cluster similarity measures

AU - Cohen, Aaron

AU - Smalheiser, Neil R.

PY - 2018

Y1 - 2018

N2 - The CLEF 2018 Task 2 goal was to identify and rank retrieved articles relevant to conducting a systematic diagnostic test accuracy review on a given topic. The UIC/OHSU team did not attempt to rank retrieved articles by relevance directly, but rather explored the baseline value of ranking retrieved articles according to the probability that they are concerned with diagnostic test accuracy. First, a set of six publication type clusters, including a cluster of diagnostic test accuracy papers (DTAs), was built by searching PubMed from 1987-2015. We created several types of cluster similarity measures for each publication type. Similarity types included: implicit-term similarity, most important word similarity, journal similarity, and author count similarity. These similarity features were then used with weighted and unweighted linear SVM machine learning algorithms, which were trained with a data set retrieved from PubMed searches consisting of 3481 PMIDs likely to be DTAs, and 71684 PMIDs most of which are not likely to be DTAs. The trained models produce scores predicting the probability that an individual article is a DTA. The CLEF 2018 Task 2 Test PMIDs for each topic were scored and ranked, and the cutoff probability for each of the two models was determined by visual inspection of the score distribution on the test data. Cutoff probabilities chosen were 0.20 for the unweighted SVM model and 0.40 for the weighted SVM model.

AB - The CLEF 2018 Task 2 goal was to identify and rank retrieved articles relevant to conducting a systematic diagnostic test accuracy review on a given topic. The UIC/OHSU team did not attempt to rank retrieved articles by relevance directly, but rather explored the baseline value of ranking retrieved articles according to the probability that they are concerned with diagnostic test accuracy. First, a set of six publication type clusters, including a cluster of diagnostic test accuracy papers (DTAs), was built by searching PubMed from 1987-2015. We created several types of cluster similarity measures for each publication type. Similarity types included: implicit-term similarity, most important word similarity, journal similarity, and author count similarity. These similarity features were then used with weighted and unweighted linear SVM machine learning algorithms, which were trained with a data set retrieved from PubMed searches consisting of 3481 PMIDs likely to be DTAs, and 71684 PMIDs most of which are not likely to be DTAs. The trained models produce scores predicting the probability that an individual article is a DTA. The CLEF 2018 Task 2 Test PMIDs for each topic were scored and ranked, and the cutoff probability for each of the two models was determined by visual inspection of the score distribution on the test data. Cutoff probabilities chosen were 0.20 for the unweighted SVM model and 0.40 for the weighted SVM model.

KW - Diagnostic Test Accuracy

KW - Machine Learning

KW - Publication Types

KW - Support Vector Machine

UR - http://www.scopus.com/inward/record.url?scp=85051077406&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85051077406&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:85051077406

SN - 1613-0073

VL - 2125

JO - CEUR Workshop Proceedings

JF - CEUR Workshop Proceedings

T2 - 19th Working Notes of CLEF Conference and Labs of the Evaluation Forum, CLEF 2018

Y2 - 10 September 2018 through 14 September 2018

ER -
