Automated Radiology-Arthroscopy Correlation of Knee Meniscal Tears Using Natural Language Processing Algorithms

Matthew D. Li; Francis Deng; Ken Chang; Jayashree Kalpathy-Cramer; Ambrose J. Huang

doi:10.1016/j.acra.2021.01.017

Medical Informatics and Clinical Epidemiology

Research output: Contribution to journal › Article › peer-review

5 Scopus citations

Abstract

Rationale and Objectives: Train and apply natural language processing (NLP) algorithms for automated radiology-arthroscopy correlation of meniscal tears.

Materials and Methods: In this retrospective single-institution study, we trained supervised machine learning models (logistic regression, support vector machine, and random forest) to detect medial or lateral meniscus tears in free-text MRI reports. We trained the models and evaluated their performance with cross-validation using 3593 manually annotated knee MRI reports. To assess radiology-arthroscopy correlation, we then randomly partitioned this dataset 80:20 for training and testing; in the test set, 108 MRIs were followed by knee arthroscopy within 1 year. These free-text arthroscopy reports were also manually annotated. The NLP algorithms trained on the knee MRI training dataset were then evaluated on the MRI and arthroscopy report test datasets. We assessed radiology-arthroscopy agreement using the ensembled NLP-extracted findings versus manually annotated findings.

Results: The NLP models showed high cross-validation performance for meniscal tear detection on knee MRI reports (medial meniscus F1 scores 0.93–0.94, lateral meniscus F1 scores 0.86–0.88). When these algorithms were evaluated on arthroscopy reports, despite never training on arthroscopy reports, performance was similar, though higher with model ensembling (medial meniscus F1 score 0.97, lateral meniscus F1 score 0.99). However, ensembling did not improve performance on knee MRI reports. In the radiology-arthroscopy test set, the ensembled NLP models were able to detect mismatches between MRI and arthroscopy reports with sensitivity 79% and specificity 87%.

Conclusion: Radiology-arthroscopy correlation can be automated for knee meniscal tears using NLP algorithms, which shows promise for education and quality improvement.
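The mismatch evaluation described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the three models were ensembled, so a simple majority vote is assumed here, and the function names and toy labels are hypothetical. It shows how per-report binary findings (1 = tear, 0 = no tear) could be combined, how MRI-arthroscopy mismatches could be flagged, and how sensitivity and specificity of the NLP-derived mismatch flags could be computed against manually annotated ones.

```python
from typing import List, Sequence, Tuple


def majority_vote(model_predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine binary predictions from several models by majority vote.

    Assumption: the source does not specify the ensembling rule; majority
    voting is used here purely for illustration.
    """
    n_models = len(model_predictions)
    return [int(2 * sum(votes) > n_models) for votes in zip(*model_predictions)]


def mismatch_flags(mri: Sequence[int], arthroscopy: Sequence[int]) -> List[int]:
    """Return 1 where the MRI and arthroscopy findings disagree, else 0."""
    return [int(m != a) for m, a in zip(mri, arthroscopy)]


def sensitivity_specificity(
    truth: Sequence[int], predicted: Sequence[int]
) -> Tuple[float, float]:
    """Sensitivity and specificity of binary predictions against a reference."""
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


if __name__ == "__main__":
    # Hypothetical toy labels for six MRI/arthroscopy report pairs.
    lr_mri = [1, 0, 1, 1, 0, 0]   # logistic regression output on MRI reports
    svm_mri = [1, 0, 1, 0, 0, 1]  # support vector machine output
    rf_mri = [1, 1, 1, 1, 0, 0]   # random forest output
    nlp_mri = majority_vote([lr_mri, svm_mri, rf_mri])
    nlp_arthroscopy = [1, 1, 1, 0, 0, 0]  # already-ensembled arthroscopy findings

    manual_mri = [1, 0, 1, 1, 0, 1]
    manual_arthroscopy = [1, 1, 1, 0, 0, 0]

    nlp_mismatch = mismatch_flags(nlp_mri, nlp_arthroscopy)
    manual_mismatch = mismatch_flags(manual_mri, manual_arthroscopy)
    sens, spec = sensitivity_specificity(manual_mismatch, nlp_mismatch)
    print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

In this sketch the manually annotated mismatches serve as the reference standard, which mirrors the abstract's comparison of NLP-extracted findings against manual annotations.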

Original language	English (US)
Pages (from-to)	479-487
Number of pages	9
Journal	Academic Radiology
Volume	29
Issue number	4
DOIs	https://doi.org/10.1016/j.acra.2021.01.017
State	Published - Apr 2022

Keywords

Knee MRI
Machine learning
Meniscal tear
Natural language processing
Radiology-arthroscopy correlation

ASJC Scopus subject areas

Radiology, Nuclear Medicine and Imaging

Cite this

@article{056b082498a242e498d0b6dbb07798f8,
title = "Automated Radiology-Arthroscopy Correlation of Knee Meniscal Tears Using Natural Language Processing Algorithms",
abstract = "Rationale and Objectives: Train and apply natural language processing (NLP) algorithms for automated radiology-arthroscopy correlation of meniscal tears. Materials and Methods: In this retrospective single-institution study, we trained supervised machine learning models (logistic regression, support vector machine, and random forest) to detect medial or lateral meniscus tears on free-text MRI reports. We trained and evaluated model performances with cross-validation using 3593 manually annotated knee MRI reports. To assess radiology-arthroscopy correlation, we then randomly partitioned this dataset 80:20 for training and testing, where 108 test set MRIs were followed by knee arthroscopy within 1 year. These free-text arthroscopy reports were also manually annotated. The NLP algorithms trained on the knee MRI training dataset were then evaluated on the MRI and arthroscopy report test datasets. We assessed radiology-arthroscopy agreement using the ensembled NLP-extracted findings versus manually annotated findings. Results: The NLP models showed high cross-validation performance for meniscal tear detection on knee MRI reports (medial meniscus F1 scores 0.93–0.94, lateral meniscus F1 scores 0.86–0.88). When these algorithms were evaluated on arthroscopy reports, despite never training on arthroscopy reports, performance was similar, though higher with model ensembling (medial meniscus F1 score 0.97, lateral meniscus F1 score 0.99). However, ensembling did not improve performance on knee MRI reports. In the radiology-arthroscopy test set, the ensembled NLP models were able to detect mismatches between MRI and arthroscopy reports with sensitivity 79% and specificity 87%. Conclusion: Radiology-arthroscopy correlation can be automated for knee meniscal tears using NLP algorithms, which shows promise for education and quality improvement.",

keywords = "Knee MRI, Machine learning, Meniscal tear, Natural language processing, Radiology-arthroscopy correlation",
author = "Li, {Matthew D.} and Francis Deng and Ken Chang and Jayashree Kalpathy-Cramer and Huang, {Ambrose J.}",
note = "Publisher Copyright: {\textcopyright} 2021 The Association of University Radiologists",
year = "2022",
month = apr,
doi = "10.1016/j.acra.2021.01.017",
language = "English (US)",
volume = "29",
pages = "479--487",
journal = "Academic Radiology",
issn = "1076-6332",
publisher = "Elsevier USA",
number = "4",
}

TY - JOUR
T1 - Automated Radiology-Arthroscopy Correlation of Knee Meniscal Tears Using Natural Language Processing Algorithms
AU - Li, Matthew D.
AU - Deng, Francis
AU - Chang, Ken
AU - Kalpathy-Cramer, Jayashree
AU - Huang, Ambrose J.
PY - 2022/4
Y1 - 2022/4
AB - Rationale and Objectives: Train and apply natural language processing (NLP) algorithms for automated radiology-arthroscopy correlation of meniscal tears. Materials and Methods: In this retrospective single-institution study, we trained supervised machine learning models (logistic regression, support vector machine, and random forest) to detect medial or lateral meniscus tears on free-text MRI reports. We trained and evaluated model performances with cross-validation using 3593 manually annotated knee MRI reports. To assess radiology-arthroscopy correlation, we then randomly partitioned this dataset 80:20 for training and testing, where 108 test set MRIs were followed by knee arthroscopy within 1 year. These free-text arthroscopy reports were also manually annotated. The NLP algorithms trained on the knee MRI training dataset were then evaluated on the MRI and arthroscopy report test datasets. We assessed radiology-arthroscopy agreement using the ensembled NLP-extracted findings versus manually annotated findings. Results: The NLP models showed high cross-validation performance for meniscal tear detection on knee MRI reports (medial meniscus F1 scores 0.93–0.94, lateral meniscus F1 scores 0.86–0.88). When these algorithms were evaluated on arthroscopy reports, despite never training on arthroscopy reports, performance was similar, though higher with model ensembling (medial meniscus F1 score 0.97, lateral meniscus F1 score 0.99). However, ensembling did not improve performance on knee MRI reports. In the radiology-arthroscopy test set, the ensembled NLP models were able to detect mismatches between MRI and arthroscopy reports with sensitivity 79% and specificity 87%. Conclusion: Radiology-arthroscopy correlation can be automated for knee meniscal tears using NLP algorithms, which shows promise for education and quality improvement.

KW - Knee MRI
KW - Machine learning
KW - Meniscal tear
KW - Natural language processing
KW - Radiology-arthroscopy correlation
UR - http://www.scopus.com/inward/record.url?scp=85100621697&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85100621697&partnerID=8YFLogxK
U2 - 10.1016/j.acra.2021.01.017
DO - 10.1016/j.acra.2021.01.017
M3 - Article
AN - SCOPUS:85100621697
SN - 1076-6332
VL - 29
SP - 479
EP - 487
JO - Academic Radiology
JF - Academic Radiology
IS - 4
ER -
