TY - GEN
T1 - Finna
T2 - 2013 AAAI Fall Symposium
AU - Ambert, Kyle H.
AU - Cohen, Aaron M.
AU - Burns, Gully A.P.C.
AU - Boudreau, Eilis
AU - Sonmez, Kemal
PY - 2013
Y1 - 2013
AB - The emphasis on multilevel modeling techniques in the neurosciences has led to an increased need for large-scale, computationally accessible databases containing neuroscientific data. Despite this, such databases are not being populated at a rate commensurate with the demand for them among neuroinformaticians. The reason is common to scientific database curation in general, namely, limited resources. Much of neuroscience's long tradition of research has been documented in computationally inaccessible formats, such as PDF, making large-scale data extraction laborious and expensive. Here, we present a system for alleviating one bottleneck in the workflow for curating a typical knowledge base of neuroscience-related information. Finna is designed to rank-order the constituent paragraphs of a publication predicted to contain information relevant to a knowledge base, according to the probability that each paragraph documents relevant data. Our classifier achieved excellent performance (AUC > 0.90) on our manually curated neuroscience document corpus. Our approach would allow curators to read a median of only 2 paragraphs per document in order to identify information relevant to a neuron-related knowledge base. To our knowledge, this is the first system of its kind, and it will be a useful baseline for developing similar resources for the neurosciences and for curation in general.
UR - http://www.scopus.com/inward/record.url?scp=84898867244&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84898867244&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84898867244
SN - 9781577356394
T3 - AAAI Fall Symposium - Technical Report
SP - 2
EP - 7
BT - Discovery Informatics
PB - AI Access Foundation
Y2 - 15 November 2013 through 17 November 2013
ER -