Overview of the CLEF 2009 medical image retrieval track

Henning Müller; Jayashree Kalpathy-Cramer; Ivan Eggel; Steven Bedrick; Saïd Radhouani; Brian Bakke; Charles E. Kahn; William Hersh

Research output: Contribution to journal › Conference article › peer-review

18 Scopus citations

Abstract

The year 2009 was the sixth for the ImageCLEF medical retrieval task. Participation was strong again, with 38 registered research groups. Seventeen groups submitted runs and thus participated actively in the tasks. The database in 2009 was similar to the one used in 2008, containing scientific articles from two radiology journals, Radiology and Radiographics. The size of the database was increased to a total of 74,902 images. For each image, captions and access to the full-text article through the Medline PMID (PubMed Identifier) were provided. An article's PMID could be used to obtain the officially assigned MeSH (Medical Subject Headings) terms. The collection was entirely in English. However, the topics were, as in previous years, supplied in German, French, and English. Twenty-five image-based topics were provided, of which ten each were visual and mixed and five were textual. In addition, for the first time, five case-based topics were provided as an exploratory task. Here the unit of retrieval was intended to be the article and not the image. Case-based topics are designed to be a step closer to the clinical workflow. Clinicians often seek information about patient cases with incomplete information consisting of symptoms, findings, and a set of images. Supplying cases to a clinician from the scientific literature that are similar to the case (s)he is treating can be an important application of image retrieval in the future. As in previous years, most groups concentrated on fully automatic retrieval. However, four groups submitted a total of seven manual or interactive runs. The interactive runs submitted this year performed quite well compared to previous years but did not show a substantial increase in performance over the automatic approaches. In previous years, multimodal combinations were the most frequent submissions. However, this year, as in 2008, only about half as many mixed runs as purely textual runs were submitted.
Very few fully visual runs were submitted, and again, the ones submitted performed poorly. The best mean average precision (MAP) values were obtained using automatic textual methods. Some mixed feedback runs also had high MAP. The best early precision was also obtained using automatic textual methods, with a few mixed automatic runs also doing well. We had the opportunity to perform multiple judgments on some topics. The kappa values, used as the metric for inter-rater agreement, were mostly quite high (>0.7). However, one of our judges consistently had low kappas, as he was significantly more lenient than his colleagues. We evaluated the overall performance of groups using strict and lenient judges and found that there was high correlation even though the absolute values for the metrics were different. We also introduced a lung nodule detection task in 2009. This task used the CT slices from the Lung Image Database Consortium (LIDC), which included ground truth in the form of manual annotations. The goal of the task was to create algorithms to automatically detect lung nodules. Although there seemed to be significant interest in the task, as evidenced by the substantial number of registrations, only two groups submitted results, with proprietary software from an industry participant achieving impressive results.
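For readers unfamiliar with the two measures named in the abstract, the following is a minimal Python sketch of how mean average precision and an inter-rater kappa can be computed. It is an illustration only, not the track's evaluation code: it assumes binary relevance, uninterpolated average precision, and Cohen's kappa for a pair of judges, none of which the abstract specifies. All function names and the toy data are hypothetical.

```python
from typing import Dict, List, Sequence, Set


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Uninterpolated average precision of one ranked list for one topic."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    # Relevant items that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(run: Dict[str, List[str]],
                           qrels: Dict[str, Set[str]]) -> float:
    """Mean of per-topic average precision over all judged topics."""
    if not qrels:
        return 0.0
    total = sum(average_precision(run.get(topic, []), rel)
                for topic, rel in qrels.items())
    return total / len(qrels)


def cohen_kappa(judge_a: Sequence[int], judge_b: Sequence[int]) -> float:
    """Cohen's kappa for two judges labelling the same items."""
    if len(judge_a) != len(judge_b) or len(judge_a) == 0:
        raise ValueError("judgment lists must be non-empty and equal in length")
    n = len(judge_a)
    observed = sum(a == b for a, b in zip(judge_a, judge_b)) / n
    labels = set(judge_a) | set(judge_b)
    # Chance agreement from each judge's own label frequencies.
    expected = sum((list(judge_a).count(lab) / n) * (list(judge_b).count(lab) / n)
                   for lab in labels)
    if expected == 1.0:
        return 1.0  # both judges used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical toy data: two topics, image identifiers as strings.
    run = {"t1": ["a", "b", "c", "d"], "t2": ["x", "y"]}
    qrels = {"t1": {"a", "c"}, "t2": {"y"}}
    print(mean_average_precision(run, qrels))

    # A lenient judge (first list) marks more items relevant than a strict one.
    print(cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0]))
```

In this toy example the lenient judge accepts one image that the strict judge rejects. That lowers kappa without necessarily changing how systems rank relative to one another, which is the kind of effect the abstract describes.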

Original language	English (US)
Journal	CEUR Workshop Proceedings
Volume	1175
State	Published - 2009
Event	2009 Cross Language Evaluation Forum Workshop, CLEF 2009, co-located with the 13th European Conference on Digital Libraries, ECDL 2009 - Corfu, Greece
Duration	Sep 30 2009 → Oct 2 2009

Keywords

Image retrieval
Medical image retrieval
Multimodal retrieval

ASJC Scopus subject areas

General Computer Science

Cite this

@article{d6eb6594f19f47c29d5ac6eea519984f,

title = "Overview of the CLEF 2009 medical image retrieval track",

keywords = "Image retrieval, Medical image retrieval, Multimodal retrieval",

author = "Henning M{\"u}ller and Jayashree Kalpathy-Cramer and Ivan Eggel and Steven Bedrick and Sa{\"i}d Radhouani and Brian Bakke and Kahn, {Charles E.} and William Hersh",

year = "2009",

language = "English (US)",

volume = "1175",

journal = "CEUR Workshop Proceedings",

issn = "1613-0073",

publisher = "CEUR-WS",

note = "2009 Cross Language Evaluation Forum Workshop, CLEF 2009, co-located with the 13th European Conference on Digital Libraries, ECDL 2009 ; Conference date: 30-09-2009 Through 02-10-2009",

}

TY - JOUR

T1 - Overview of the CLEF 2009 medical image retrieval track

AU - Müller, Henning

AU - Kalpathy-Cramer, Jayashree

AU - Eggel, Ivan

AU - Bedrick, Steven

AU - Radhouani, Saïd

AU - Bakke, Brian

AU - Kahn, Charles E.

AU - Hersh, William

PY - 2009

Y1 - 2009

KW - Image retrieval

KW - Medical image retrieval

KW - Multimodal retrieval

UR - http://www.scopus.com/inward/record.url?scp=84922051601&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84922051601&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:84922051601

SN - 1613-0073

VL - 1175

JO - CEUR Workshop Proceedings

JF - CEUR Workshop Proceedings

T2 - 2009 Cross Language Evaluation Forum Workshop, CLEF 2009, co-located with the 13th European Conference on Digital Libraries, ECDL 2009

Y2 - 30 September 2009 through 2 October 2009

ER -
