Lucene, MetaMap, and language modeling: OHSU at CLEF eHealth 2013

Steven Bedrick; Golnar Sheikshabbafghi

Institute on Development and Disability

Research output: Contribution to journal › Conference article › peer-review

Abstract

The Oregon Health & Science University team's participation in task #3 ("addressing patients' medical questions") of this year's eHealth CLEF campaign included submissions from two different retrieval systems. The first was a traditional, Lucene-based system modified from one used in previous years' TREC-med campaigns; the second was a novel system that used statistical language modeling techniques to perform text retrieval. Since 2013 was the first year of our participation in this campaign, our focus was on familiarizing ourselves with working on a corpus of web text, as well as putting together a proof-of-concept implementation of a language-model retrieval system. We submitted three runs in total: one from the novel system, and two from our Lucene-based system, one of which made use of the National Library of Medicine's MetaMap tool to perform query expansion. In general, our runs did not perform particularly well, although there were several topics for which our language model-based retrieval system produced the best P@10. Future work will focus on pre-indexing text normalization as well as a more sophisticated approach to query parsing.

Original language	English (US)
Journal	CEUR Workshop Proceedings
Volume	1179
State	Published - 2013
Event	2013 Cross Language Evaluation Forum Conference, CLEF 2013 - Valencia, Spain Duration: Sep 23 2013 → Sep 26 2013

Keywords

Language model
Lucene
MetaMap
Skip-grams

ASJC Scopus subject areas

General Computer Science

Cite this

@article{a72a2d7ee5904268b4dcd88912437cb2,

title = "Lucene, MetaMap, and language modeling: OHSU at CLEF eHealth 2013",

abstract = "The Oregon Health & Science University team's participation in task #3 ({"}addressing patients' medical questions{"}) of this year's eHealth CLEF campaign included submissions from two different retrieval systems. The first was a traditional, Lucene-based system modified from one used in previous years' TREC-med campaigns; the second was a novel system that used statistical language modeling techniques to perform text retrieval. Since 2013 was the first year of our participation in this campaign, our focus was on familiarizing ourselves with working on a corpus of web text, as well as putting together a proof-of-concept implementation of a language-model retrieval system. We submitted three runs in total: one from the novel system, and two from our Lucene-based system, one of which made use of the National Library of Medicine's MetaMap tool to perform query expansion. In general, our runs did not perform particularly well, although there were several topics for which our language model-based retrieval system produced the best P@10. Future work will focus on pre-indexing text normalization as well as a more sophisticated approach to query parsing.",

keywords = "Language model, Lucene, MetaMap, Skip-grams",

author = "Steven Bedrick and Golnar Sheikshabbafghi",

year = "2013",

language = "English (US)",

volume = "1179",

journal = "CEUR Workshop Proceedings",

issn = "1613-0073",

publisher = "CEUR-WS",

note = "2013 Cross Language Evaluation Forum Conference, CLEF 2013 ; Conference date: 23-09-2013 Through 26-09-2013",

}

TY - JOUR

T1 - Lucene, MetaMap, and language modeling

T2 - 2013 Cross Language Evaluation Forum Conference, CLEF 2013

AU - Bedrick, Steven

AU - Sheikshabbafghi, Golnar

PY - 2013

Y1 - 2013

N2 - The Oregon Health & Science University team's participation in task #3 ("addressing patients' medical questions") of this year's eHealth CLEF campaign included submissions from two different retrieval systems. The first was a traditional, Lucene-based system modified from one used in previous years' TREC-med campaigns; the second was a novel system that used statistical language modeling techniques to perform text retrieval. Since 2013 was the first year of our participation in this campaign, our focus was on familiarizing ourselves with working on a corpus of web text, as well as putting together a proof-of-concept implementation of a language-model retrieval system. We submitted three runs in total: one from the novel system, and two from our Lucene-based system, one of which made use of the National Library of Medicine's MetaMap tool to perform query expansion. In general, our runs did not perform particularly well, although there were several topics for which our language model-based retrieval system produced the best P@10. Future work will focus on pre-indexing text normalization as well as a more sophisticated approach to query parsing.

KW - Language model

KW - Lucene

KW - MetaMap

KW - Skip-grams

UR - http://www.scopus.com/inward/record.url?scp=84922021504&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84922021504&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:84922021504

SN - 1613-0073

VL - 1179

JO - CEUR Workshop Proceedings

JF - CEUR Workshop Proceedings

Y2 - 23 September 2013 through 26 September 2013

ER -
