Weakly Semi-supervised Detector-based Video Classification with Temporal Context for Lung Ultrasound

Gary Y. Li; Li Chen; Mohsen Zahiri; Naveen Balaraju; Shubham Patil; Courosh Mehanian; Cynthia Gregory; Kenton Gregory; Balasundar Raju; Jochen Kruecker; Alvin Chen

doi:10.1109/ICCVW60793.2023.00262


Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

For many challenging medical imaging tasks involving sequences, video-level labels alone are insufficient to train accurate disease classification models and do not carry information about the locations of relevant features. Alternatively, localization-based models such as detectors offer much stronger interpretability by indicating areas of suspicion, but require comprehensive frame-by-frame annotations by experts. We propose a method to address the trade-off between annotation burden and interpretability by performing simultaneous detection and classification on medical video sequences while requiring very limited frame-level supervision. Specifically, our approach aggregates individual predictions from a detection model into "tracklets" representing temporally consistent regions of pathology along the sequence. The tracklets are classified in a second stage to arrive at an overall video-level prediction. Both the detector and tracklet classifier are trained in a weakly semi-supervised manner using a large amount of video-annotated data alongside a limited set of frame annotations. We apply the approach to several challenging medical imaging tasks, namely localizing and predicting the presence or absence of lung consolidation and pleural effusion in ultrasound videos. We show that, with only a very small amount of additional frame-annotated data, the method provides strong model interpretability through localization and achieves state-of-the-art detection and classification, outperforming both direct video classifiers and comparable frame-based detectors trained without the added temporal context.
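The abstract describes a two-stage pipeline: per-frame detections are linked into tracklets, and the tracklets are then scored to produce a video-level prediction. The sketch below illustrates that general idea only, under assumptions the abstract does not state: detections are linked greedily by bounding-box IoU, and the learned second-stage tracklet classifier is replaced by a simple stand-in (the mean detection confidence of sufficiently long tracklets). All names and parameters (`build_tracklets`, `classify_video`, `iou_thresh`, `max_gap`, `min_length`, `threshold`) are hypothetical and are not taken from the paper; this is not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


@dataclass
class Detection:
    frame: int
    box: Box
    score: float  # detector confidence in [0, 1]


@dataclass
class Tracklet:
    detections: List[Detection] = field(default_factory=list)

    @property
    def last(self) -> Detection:
        return self.detections[-1]

    def mean_score(self) -> float:
        return sum(d.score for d in self.detections) / len(self.detections)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def build_tracklets(frames: List[List[Detection]], iou_thresh: float = 0.3,
                    max_gap: int = 2) -> List[Tracklet]:
    """Hypothetical greedy linker: attach each detection to the best-overlapping
    tracklet that ended at most `max_gap` missed frames earlier."""
    tracklets: List[Tracklet] = []
    for dets in frames:
        # Highest-confidence detections claim tracklets first.
        for det in sorted(dets, key=lambda d: d.score, reverse=True):
            best: Optional[Tracklet] = None
            best_iou = iou_thresh
            for t in tracklets:
                gap = det.frame - t.last.frame
                # gap == 0 means the tracklet already took a box in this frame.
                if 0 < gap <= max_gap + 1:
                    overlap = iou(det.box, t.last.box)
                    if overlap >= best_iou:
                        best, best_iou = t, overlap
            if best is None:
                tracklets.append(Tracklet([det]))
            else:
                best.detections.append(det)
    return tracklets


def classify_video(tracklets: List[Tracklet], min_length: int = 3,
                   threshold: float = 0.5) -> Tuple[bool, float]:
    """Stand-in for the learned tracklet classifier: short tracklets are ignored
    as likely false positives, and the video score is the best tracklet score."""
    scores = [t.mean_score() for t in tracklets if len(t.detections) >= min_length]
    video_score = max(scores, default=0.0)
    return video_score >= threshold, video_score
```

For example, a box that persists across four frames forms one long tracklet, while an isolated high-confidence detection elsewhere in the image forms a one-frame tracklet that the `min_length` filter discards:

```python
frames = [
    [Detection(0, (10, 10, 50, 50), 0.8)],
    [Detection(1, (12, 11, 52, 51), 0.7), Detection(1, (200, 200, 230, 230), 0.95)],
    [Detection(2, (14, 12, 54, 52), 0.9)],
    [Detection(3, (15, 13, 55, 53), 0.6)],
]
tracklets = build_tracklets(frames)
positive, score = classify_video(tracklets)  # positive is True, score is about 0.75
```

The point of the temporal step is that a single spurious frame-level detection, however confident, does not by itself flip the video-level decision.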

Original language	English (US)
Title of host publication	Proceedings - 2023 IEEE/CVF International Conference on Computer Vision Workshops, ICCVW 2023
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	2475-2484
Number of pages	10
ISBN (Electronic)	9798350307443
DOIs	https://doi.org/10.1109/ICCVW60793.2023.00262
State	Published - 2023
Event	2023 IEEE/CVF International Conference on Computer Vision Workshops, ICCVW 2023 - Paris, France. Duration: Oct 2 2023 → Oct 6 2023

Publication series

Name	Proceedings - 2023 IEEE/CVF International Conference on Computer Vision Workshops, ICCVW 2023

Conference

Conference	2023 IEEE/CVF International Conference on Computer Vision Workshops, ICCVW 2023
Country/Territory	France
City	Paris
Period	10/2/23 → 10/6/23

Keywords

lung ultrasound
object detection
semi supervised learning
video classification
weakly supervised learning

ASJC Scopus subject areas

Artificial Intelligence
Computer Science Applications
Computer Vision and Pattern Recognition

Access to Document

10.1109/ICCVW60793.2023.00262

Cite this

Li, G. Y., Chen, L., Zahiri, M., Balaraju, N., Patil, S., Mehanian, C., Gregory, C., Gregory, K., Raju, B., Kruecker, J., & Chen, A. (2023). Weakly Semi-supervised Detector-based Video Classification with Temporal Context for Lung Ultrasound. In Proceedings - 2023 IEEE/CVF International Conference on Computer Vision Workshops, ICCVW 2023 (pp. 2475-2484). (Proceedings - 2023 IEEE/CVF International Conference on Computer Vision Workshops, ICCVW 2023). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICCVW60793.2023.00262

Weakly Semi-supervised Detector-based Video Classification with Temporal Context for Lung Ultrasound. / Li, Gary Y.; Chen, Li; Zahiri, Mohsen et al.
Proceedings - 2023 IEEE/CVF International Conference on Computer Vision Workshops, ICCVW 2023. Institute of Electrical and Electronics Engineers Inc., 2023. p. 2475-2484 (Proceedings - 2023 IEEE/CVF International Conference on Computer Vision Workshops, ICCVW 2023).


Li, GY, Chen, L, Zahiri, M, Balaraju, N, Patil, S, Mehanian, C, Gregory, C, Gregory, K, Raju, B, Kruecker, J & Chen, A 2023, Weakly Semi-supervised Detector-based Video Classification with Temporal Context for Lung Ultrasound. in Proceedings - 2023 IEEE/CVF International Conference on Computer Vision Workshops, ICCVW 2023. Proceedings - 2023 IEEE/CVF International Conference on Computer Vision Workshops, ICCVW 2023, Institute of Electrical and Electronics Engineers Inc., pp. 2475-2484, 2023 IEEE/CVF International Conference on Computer Vision Workshops, ICCVW 2023, Paris, France, 10/2/23. https://doi.org/10.1109/ICCVW60793.2023.00262

Li GY, Chen L, Zahiri M, Balaraju N, Patil S, Mehanian C et al. Weakly Semi-supervised Detector-based Video Classification with Temporal Context for Lung Ultrasound. In Proceedings - 2023 IEEE/CVF International Conference on Computer Vision Workshops, ICCVW 2023. Institute of Electrical and Electronics Engineers Inc. 2023. p. 2475-2484. (Proceedings - 2023 IEEE/CVF International Conference on Computer Vision Workshops, ICCVW 2023). doi: 10.1109/ICCVW60793.2023.00262

Li, Gary Y.; Chen, Li; Zahiri, Mohsen et al. / Weakly Semi-supervised Detector-based Video Classification with Temporal Context for Lung Ultrasound. Proceedings - 2023 IEEE/CVF International Conference on Computer Vision Workshops, ICCVW 2023. Institute of Electrical and Electronics Engineers Inc., 2023. pp. 2475-2484 (Proceedings - 2023 IEEE/CVF International Conference on Computer Vision Workshops, ICCVW 2023).

@inproceedings{932003eb34854a35b0111684563de5d5,

title = "Weakly Semi-supervised Detector-based Video Classification with Temporal Context for Lung Ultrasound",

abstract = "For many challenging medical imaging tasks involving sequences, video-level labels alone are insufficient to train accurate disease classification models and do not carry information about the locations of relevant features. Alternatively, localization-based models such as detectors offer much stronger interpretability by indicating areas of suspicion, but require comprehensive frame-by-frame annotations by experts. We propose a method to address the trade-off between annotation burden and interpretability by performing simultaneous detection and classification on medical video sequences while requiring very limited frame-level supervision. Specifically, our approach aggregates individual predictions from a detection model into {"}tracklets{"} representing temporally consistent regions of pathology along the sequence. The tracklets are classified in a second stage to arrive at an overall video-level prediction. Both the detector and tracklet classifier are trained in a weakly semi-supervised manner using a large amount of video-annotated data alongside a limited set of frame annotations. We apply the approach to several challenging medical imaging tasks, namely localizing and predicting the presence or absence of lung consolidation and pleural effusion in ultrasound videos. We show that, with only a very small amount of additional frame-annotated data, the method provides strong model interpretability through localization and achieves state-of-the-art detection and classification, outperforming both direct video classifiers and comparable frame-based detectors trained without the added temporal context.",

keywords = "lung ultrasound, object detection, semi supervised learning, video classification, weakly supervised learning",

author = "Li, {Gary Y.} and Li Chen and Mohsen Zahiri and Naveen Balaraju and Shubham Patil and Courosh Mehanian and Cynthia Gregory and Kenton Gregory and Balasundar Raju and Jochen Kruecker and Alvin Chen",

note = "Publisher Copyright: {\textcopyright} 2023 IEEE.; 2023 IEEE/CVF International Conference on Computer Vision Workshops, ICCVW 2023 ; Conference date: 02-10-2023 Through 06-10-2023",

year = "2023",

doi = "10.1109/ICCVW60793.2023.00262",

language = "English (US)",

series = "Proceedings - 2023 IEEE/CVF International Conference on Computer Vision Workshops, ICCVW 2023",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "2475--2484",

booktitle = "Proceedings - 2023 IEEE/CVF International Conference on Computer Vision Workshops, ICCVW 2023",

}

TY - GEN

T1 - Weakly Semi-supervised Detector-based Video Classification with Temporal Context for Lung Ultrasound

AU - Li, Gary Y.

AU - Chen, Li

AU - Zahiri, Mohsen

AU - Balaraju, Naveen

AU - Patil, Shubham

AU - Mehanian, Courosh

AU - Gregory, Cynthia

AU - Gregory, Kenton

AU - Raju, Balasundar

AU - Kruecker, Jochen

AU - Chen, Alvin

PY - 2023

Y1 - 2023

N2 - For many challenging medical imaging tasks involving sequences, video-level labels alone are insufficient to train accurate disease classification models and do not carry information about the locations of relevant features. Alternatively, localization-based models such as detectors offer much stronger interpretability by indicating areas of suspicion, but require comprehensive frame-by-frame annotations by experts. We propose a method to address the trade-off between annotation burden and interpretability by performing simultaneous detection and classification on medical video sequences while requiring very limited frame-level supervision. Specifically, our approach aggregates individual predictions from a detection model into "tracklets" representing temporally consistent regions of pathology along the sequence. The tracklets are classified in a second stage to arrive at an overall video-level prediction. Both the detector and tracklet classifier are trained in a weakly semi-supervised manner using a large amount of video-annotated data alongside a limited set of frame annotations. We apply the approach to several challenging medical imaging tasks, namely localizing and predicting the presence or absence of lung consolidation and pleural effusion in ultrasound videos. We show that, with only a very small amount of additional frame-annotated data, the method provides strong model interpretability through localization and achieves state-of-the-art detection and classification, outperforming both direct video classifiers and comparable frame-based detectors trained without the added temporal context.

AB - For many challenging medical imaging tasks involving sequences, video-level labels alone are insufficient to train accurate disease classification models and do not carry information about the locations of relevant features. Alternatively, localization-based models such as detectors offer much stronger interpretability by indicating areas of suspicion, but require comprehensive frame-by-frame annotations by experts. We propose a method to address the trade-off between annotation burden and interpretability by performing simultaneous detection and classification on medical video sequences while requiring very limited frame-level supervision. Specifically, our approach aggregates individual predictions from a detection model into "tracklets" representing temporally consistent regions of pathology along the sequence. The tracklets are classified in a second stage to arrive at an overall video-level prediction. Both the detector and tracklet classifier are trained in a weakly semi-supervised manner using a large amount of video-annotated data alongside a limited set of frame annotations. We apply the approach to several challenging medical imaging tasks, namely localizing and predicting the presence or absence of lung consolidation and pleural effusion in ultrasound videos. We show that, with only a very small amount of additional frame-annotated data, the method provides strong model interpretability through localization and achieves state-of-the-art detection and classification, outperforming both direct video classifiers and comparable frame-based detectors trained without the added temporal context.

KW - lung ultrasound

KW - object detection

KW - semi supervised learning

KW - video classification

KW - weakly supervised learning

UR - http://www.scopus.com/inward/record.url?scp=85182929826&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85182929826&partnerID=8YFLogxK

U2 - 10.1109/ICCVW60793.2023.00262

DO - 10.1109/ICCVW60793.2023.00262

M3 - Conference contribution

AN - SCOPUS:85182929826

T3 - Proceedings - 2023 IEEE/CVF International Conference on Computer Vision Workshops, ICCVW 2023

SP - 2475

EP - 2484

BT - Proceedings - 2023 IEEE/CVF International Conference on Computer Vision Workshops, ICCVW 2023

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 2023 IEEE/CVF International Conference on Computer Vision Workshops, ICCVW 2023

Y2 - 2 October 2023 through 6 October 2023

ER -
