Abstract
Over the past several years, our team has focused on improving retrieval precision by combining visual and textual information. This year, we explored ways of using external data to enrich our retrieval system's data set; specifically, we annotated each image in the test collection with a set of MeSH headings from two different sources: human-assigned MEDLINE index terms and automatically-assigned MeSH headings (via the National Library of Medicine's MetaMap software). In addition to exploring these data enrichment techniques, we also revamped the architecture of the retrieval system itself. In past years, we used a two-tiered approach in which the data is stored in a relational database (RDBMS), but indexing and searching are done using a Lucene-like system. This year, we took advantage of our RDBMS's full-text search capabilities and performed both storage and searching in the RDBMS. This had both positive and negative effects at a practical level: using the database's built-in text retrieval subsystem resulted in improved retrieval speed and easier query analysis, but these gains came at the cost of reduced flexibility and increased code complexity. Our experiments investigated the effects of using various combinations of human- and automatically-assigned MeSH terms, along with several of the techniques that have proved useful in previous years. We found that including automatically-assigned MeSH terms sometimes provided a small improvement (in terms of bpref, MAP, and early precision) and sometimes hurt performance, whereas including the human-assigned MEDLINE index headings consistently yielded a sizable improvement in those same metrics.
Original language | English (US) |
---|---|
Journal | CEUR Workshop Proceedings |
Volume | 1176 |
State | Published - 2010 |
Event | 2010 Cross Language Evaluation Forum Conference, CLEF 2010 - Padua, Italy; Duration: Sep 22 2010 → Sep 23 2010 |
ASJC Scopus subject areas
- General Computer Science