TY - GEN
T1 - Weakly Semi-supervised Detector-based Video Classification with Temporal Context for Lung Ultrasound
AU - Li, Gary Y.
AU - Chen, Li
AU - Zahiri, Mohsen
AU - Balaraju, Naveen
AU - Patil, Shubham
AU - Mehanian, Courosh
AU - Gregory, Cynthia
AU - Gregory, Kenton
AU - Raju, Balasundar
AU - Kruecker, Jochen
AU - Chen, Alvin
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
AB - For many challenging medical imaging tasks involving sequences, video-level labels alone are insufficient to train accurate disease classification models and do not carry information about the locations of relevant features. Alternatively, localization-based models such as detectors offer much stronger interpretability by indicating areas of suspicion, but require comprehensive frame-by-frame annotations by experts. We propose a method to address the trade-off between annotation burden and interpretability by performing simultaneous detection and classification on medical video sequences while requiring very limited frame-level supervision. Specifically, our approach aggregates individual predictions from a detection model into "tracklets" representing temporally consistent regions of pathology along the sequence. The tracklets are classified in a second stage to arrive at an overall video-level prediction. Both the detector and tracklet classifier are trained in a weakly semi-supervised manner using a large amount of video-annotated data alongside a limited set of frame annotations. We apply the approach to several challenging medical imaging tasks, namely localizing and predicting the presence or absence of lung consolidation and pleural effusion in ultrasound videos. We show that, with only a very small amount of additional frame-annotated data, the method provides strong model interpretability through localization and achieves state-of-the-art detection and classification, outperforming both direct video classifiers and comparable frame-based detectors trained without the added temporal context.
KW - lung ultrasound
KW - object detection
KW - semi-supervised learning
KW - video classification
KW - weakly supervised learning
UR - http://www.scopus.com/inward/record.url?scp=85182929826&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85182929826&partnerID=8YFLogxK
U2 - 10.1109/ICCVW60793.2023.00262
DO - 10.1109/ICCVW60793.2023.00262
M3 - Conference contribution
AN - SCOPUS:85182929826
T3 - Proceedings - 2023 IEEE/CVF International Conference on Computer Vision Workshops, ICCVW 2023
SP - 2475
EP - 2484
BT - Proceedings - 2023 IEEE/CVF International Conference on Computer Vision Workshops, ICCVW 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2023 IEEE/CVF International Conference on Computer Vision Workshops, ICCVW 2023
Y2 - 2 October 2023 through 6 October 2023
ER -