Synthetic Medical Images for Robust, Privacy-Preserving Training of Artificial Intelligence: Application to Retinopathy of Prematurity Diagnosis

Imaging and Informatics in Retinopathy of Prematurity Consortium

doi:10.1016/j.xops.2022.100126

Ophthalmology

Research output: Contribution to journal › Article › peer-review

11 Scopus citations

Abstract

Purpose: Developing robust artificial intelligence (AI) models for medical image analysis requires large quantities of diverse, well-chosen data that can prove challenging to collect because of privacy concerns, disease rarity, or diagnostic label quality. Collecting image-based datasets for retinopathy of prematurity (ROP), a potentially blinding disease, suffers from these challenges. Progressively growing generative adversarial networks (PGANs) may help, because they can synthesize highly realistic images that may increase both the size and diversity of medical datasets.

Design: Diagnostic validation study of convolutional neural networks (CNNs) for plus disease detection, a component of severe ROP, using synthetic data.

Participants: Five thousand eight hundred forty-two retinal fundus images (RFIs) collected from 963 preterm infants.

Methods: Retinal vessel maps (RVMs) were segmented from RFIs. PGANs were trained to synthesize RVMs with normal, pre-plus, or plus disease vasculature. Convolutional neural networks were trained, using real or synthetic RVMs, to detect plus disease from 2 real RVM test datasets.

Main Outcome Measures: Features of real and synthetic RVMs were evaluated using uniform manifold approximation and projection (UMAP). Similarities were evaluated at the dataset and feature level using Fréchet inception distance and Euclidean distance, respectively. CNN performance was assessed via area under the receiver operating characteristic curve (AUC); AUCs were compared via bootstrapping and DeLong's test for correlated receiver operating characteristic curves. Confusion matrices were compared using McNemar's chi-square test and Cohen's κ value.

Results: The CNN trained on synthetic RVMs showed a significantly higher AUC (0.971; P = 0.006 and P = 0.004) and classified plus disease more similarly to a set of 8 international experts (κ = 0.922) than the CNN trained on real RVMs (AUC = 0.934; κ = 0.701). Real and synthetic RVMs overlapped, by plus disease diagnosis, on the UMAP manifold, showing that synthetic images spanned the disease severity spectrum. Fréchet inception distance and Euclidean distances suggested that real and synthetic RVMs were more dissimilar to one another than real RVMs were to one another, further suggesting that synthetic RVMs were distinct from the training data with respect to privacy considerations.

Conclusions: Synthetic datasets may be useful for training robust medical AI models. Furthermore, PGANs may be able to synthesize realistic data for use without protected health information concerns.
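The outcome measures named in the abstract (AUC, a bootstrap comparison of two AUCs on the same test cases, Cohen's κ, and McNemar's chi-square) can be illustrated with a minimal standard-library sketch. This is not the authors' code: the function names, the paired percentile bootstrap, the 95% interval, and the continuity correction in McNemar's statistic are all assumptions chosen for illustration, and DeLong's test is not implemented here.

```python
import random


def auc(labels, scores):
    """AUC as the chance a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_diff(labels, scores_a, scores_b, n_boot=2000, seed=0):
    """Paired bootstrap of AUC_a - AUC_b (assumed scheme, not taken from the paper).

    Cases are resampled with replacement and both models are scored on the same
    resample. Returns the observed difference and a 95% percentile interval.
    """
    observed = auc(labels, scores_a) - auc(labels, scores_b)  # also validates input
    rng = random.Random(seed)
    n = len(labels)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if len(set(y)) < 2:
            continue  # a resample with a single class has no defined AUC
        a = auc(y, [scores_a[i] for i in idx])
        b = auc(y, [scores_b[i] for i in idx])
        diffs.append(a - b)
    diffs.sort()
    lower = diffs[int(0.025 * n_boot)]
    upper = diffs[int(0.975 * n_boot) - 1]
    return observed, (lower, upper)


def cohen_kappa(rater_1, rater_2):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_1)
    categories = set(rater_1) | set(rater_2)
    p_observed = sum(a == b for a, b in zip(rater_1, rater_2)) / n
    p_chance = sum(
        (rater_1.count(c) / n) * (rater_2.count(c) / n) for c in categories
    )
    if p_chance == 1.0:
        return 1.0  # both raters used a single identical category
    return (p_observed - p_chance) / (1.0 - p_chance)


def mcnemar_chi2(truth, pred_a, pred_b):
    """McNemar's chi-square on discordant cases, with continuity correction (assumed)."""
    only_a_right = sum(a == t and b != t for t, a, b in zip(truth, pred_a, pred_b))
    only_b_right = sum(a != t and b == t for t, a, b in zip(truth, pred_a, pred_b))
    discordant = only_a_right + only_b_right
    if discordant == 0:
        return 0.0
    return (abs(only_a_right - only_b_right) - 1) ** 2 / discordant
```

In this sketch, `scores_a` and `scores_b` would hold the plus disease probabilities from two CNNs on the same real test set, and `cohen_kappa` would compare one model's predicted labels against a reference diagnosis. A bootstrap interval for the AUC difference that excludes zero would point toward a real difference between the two models.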

Original language	English (US)
Article number	100126
Journal	Ophthalmology Science
Volume	2
Issue number	2
DOIs	https://doi.org/10.1016/j.xops.2022.100126
State	Published - Jun 2022

Keywords

Artificial intelligence
Deep learning
Generative adversarial network
Retinopathy of prematurity

ASJC Scopus subject areas

Ophthalmology

Access to Document

10.1016/j.xops.2022.100126

Cite this

@article{f1a6ad9a69a54ed28df9e9a3e33dc072,

title = "Synthetic Medical Images for Robust, Privacy-Preserving Training of Artificial Intelligence: Application to Retinopathy of Prematurity Diagnosis",

keywords = "Artificial intelligence, Deep learning, Generative adversarial network, Retinopathy of prematurity",

author = "{Imaging and Informatics in Retinopathy of Prematurity Consortium} and Coyner, {Aaron S.} and Chen, {Jimmy S.} and Ken Chang and Praveer Singh and Susan Ostmo and Chan, {R. V. Paul} and Chiang, {Michael F.} and Jayashree Kalpathy-Cramer and Campbell, {J. Peter}",

note = "Publisher Copyright: {\textcopyright} 2022 American Academy of Ophthalmology",

year = "2022",

month = jun,

doi = "10.1016/j.xops.2022.100126",

language = "English (US)",

volume = "2",

journal = "Ophthalmology Science",

issn = "2666-9145",

publisher = "Elsevier BV",

number = "2",

}

TY - JOUR

T1 - Synthetic Medical Images for Robust, Privacy-Preserving Training of Artificial Intelligence

T2 - Application to Retinopathy of Prematurity Diagnosis

AU - Imaging and Informatics in Retinopathy of Prematurity Consortium

AU - Coyner, Aaron S.

AU - Chen, Jimmy S.

AU - Chang, Ken

AU - Singh, Praveer

AU - Ostmo, Susan

AU - Chan, R. V. Paul

AU - Chiang, Michael F.

AU - Kalpathy-Cramer, Jayashree

AU - Campbell, J. Peter

PY - 2022/6

Y1 - 2022/6

KW - Artificial intelligence

KW - Deep learning

KW - Generative adversarial network

KW - Retinopathy of prematurity

UR - http://www.scopus.com/inward/record.url?scp=85130246215&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85130246215&partnerID=8YFLogxK

U2 - 10.1016/j.xops.2022.100126

DO - 10.1016/j.xops.2022.100126

M3 - Article

AN - SCOPUS:85130246215

SN - 2666-9145

VL - 2

JO - Ophthalmology Science

JF - Ophthalmology Science

IS - 2

M1 - 100126

ER -
