TY - JOUR
T1 - Evaluation-as-a-Service for the computational sciences
T2 - Overview and outlook
AU - Hopfgartner, Frank
AU - Hanbury, Allan
AU - Müller, Henning
AU - Eggel, Ivan
AU - Balog, Krisztian
AU - Brodt, Torben
AU - Cormack, Gordon V.
AU - Lin, Jimmy
AU - Kalpathy-Cramer, Jayashree
AU - Kando, Noriko
AU - Kato, Makoto P.
AU - Krithara, Anastasia
AU - Gollub, Tim
AU - Potthast, Martin
AU - Viegas, Evelyne
AU - Mercer, Simon
N1 - Funding Information:
We acknowledge financial support by the European Science Foundation via its Research Network Program “Evaluating Information Access Systems” (ELIAS) and by the European Commission via the FP7 project VISCERAL (318068). Authors’ addresses: F. Hopfgartner, University of Sheffield, 211 Portobello St, Sheffield S1 4DP, United Kingdom; email: fhopfgartner@sheffield.ac.uk; A. Hanbury, TU Wien, Favoritenstrasse 9-11/194, 1040 Vienna, Austria; email: allan.hanbury@tuwien.ac.at; H. Müller and I. Eggel, University of Applied Sciences Western Switzerland (HES-SO), Rue du TechnoPôle 3, 3960 Sierre, Switzerland; email: henning.mueller@hevs.ch; K. Balog, University of Stavanger, NO-4036 Stavanger, Norway; email: krisztib@ntnu.no; T. Brodt, plista GmbH, Torstraße 33-35, 10119 Berlin, Germany; email: tb@plista.com; G. V. Cormack and J. Lin, University of Waterloo, 200 University Avenue West, Waterloo, ON N2L 3G1, Canada; emails: cormack@cormack.uwaterloo.ca, jimmylin@uwaterloo.ca; J. Kalpathy-Cramer, Athinoula A. Martinos Center for Biomedical Imaging at Massachusetts General Hospital and Harvard Medical School, 13th Street, Charlestown, MA 02129, USA; email: kalpathy@nmr.mgh.harvard.edu; N. Kando, National Institute of Informatics, 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430, Japan; email: noriko.kando@nii.ac.jp; M. P. Kato, Kyoto University, Yoshida Honmachi, Sakyo, Kyoto 606-8501, Japan; email: kato@dl.kuis.kyoto-u.ac.jp; A. Krithara, National Center for Scientific Research “Demokritos”, Ag. Paraskevi, 15310 Athens, Greece; email: akrithara@iit.demokritos.gr; T. Gollub, Bauhaus-Universität Weimar, Bauhausstraße 9a, Room 308, 99423 Weimar, Germany; email: tim.gollub@uni-weimar.de; M. Potthast, Leipzig University, Augustusplatz 10, 04109 Leipzig, Germany; email: martin.potthast@uni-leipzig.de; E. Viegas, Microsoft Research, Redmond, WA, USA; email: evelynev@microsoft.com; S. Mercer, Independent Consultant, undisclosed address.
Publisher Copyright:
© 2018 Association for Computing Machinery.
PY - 2018/10
Y1 - 2018/10
N2 - Evaluation in empirical computer science is essential to show progress and assess technologies developed. Several research domains such as information retrieval have long relied on systematic evaluation to measure progress: here, the Cranfield paradigm of creating shared test collections, defining search tasks, and collecting ground truth for these tasks has persisted up until now. In recent years, however, several new challenges have emerged that do not fit this paradigm very well: extremely large data sets, confidential data sets as found in the medical domain, and rapidly changing data sets as often encountered in industry. Crowdsourcing has also changed the way in which industry approaches problem-solving, with companies now organizing challenges and handing out monetary awards to incentivize people to work on their challenges, particularly in the field of machine learning. This article is based on discussions at a workshop on Evaluation-as-a-Service (EaaS). EaaS is the paradigm of not providing data sets to participants to work on locally, but of keeping the data central and allowing access via Application Programming Interfaces (APIs), Virtual Machines (VMs), or other means of shipping executables. The objectives of this article are to summarize and compare the current approaches and to consolidate the experiences of these approaches to outline the next steps of EaaS, particularly toward sustainable research infrastructures. The article summarizes several existing approaches to EaaS and analyzes their usage scenarios as well as their advantages and disadvantages. The many factors influencing EaaS are summarized, as is the environment in terms of motivations for the various stakeholders, from funding agencies to challenge organizers, researchers, and participants, to industry partners interested in supplying real-world problems for which they require solutions. EaaS solves many problems of the current research environment, where data sets are often not accessible to many researchers. Executables of published tools are equally often not available, making the reproducibility of results impossible. EaaS, however, creates reusable and citable data sets as well as available executables. Many challenges remain, but such a framework for research can also foster more collaboration between researchers, potentially increasing the speed of obtaining research results.
AB - Evaluation in empirical computer science is essential to show progress and assess technologies developed. Several research domains such as information retrieval have long relied on systematic evaluation to measure progress: here, the Cranfield paradigm of creating shared test collections, defining search tasks, and collecting ground truth for these tasks has persisted up until now. In recent years, however, several new challenges have emerged that do not fit this paradigm very well: extremely large data sets, confidential data sets as found in the medical domain, and rapidly changing data sets as often encountered in industry. Crowdsourcing has also changed the way in which industry approaches problem-solving, with companies now organizing challenges and handing out monetary awards to incentivize people to work on their challenges, particularly in the field of machine learning. This article is based on discussions at a workshop on Evaluation-as-a-Service (EaaS). EaaS is the paradigm of not providing data sets to participants to work on locally, but of keeping the data central and allowing access via Application Programming Interfaces (APIs), Virtual Machines (VMs), or other means of shipping executables. The objectives of this article are to summarize and compare the current approaches and to consolidate the experiences of these approaches to outline the next steps of EaaS, particularly toward sustainable research infrastructures. The article summarizes several existing approaches to EaaS and analyzes their usage scenarios as well as their advantages and disadvantages. The many factors influencing EaaS are summarized, as is the environment in terms of motivations for the various stakeholders, from funding agencies to challenge organizers, researchers, and participants, to industry partners interested in supplying real-world problems for which they require solutions. EaaS solves many problems of the current research environment, where data sets are often not accessible to many researchers. Executables of published tools are equally often not available, making the reproducibility of results impossible. EaaS, however, creates reusable and citable data sets as well as available executables. Many challenges remain, but such a framework for research can also foster more collaboration between researchers, potentially increasing the speed of obtaining research results.
KW - Benchmarking
KW - Evaluation-as-a-Service
KW - Information access systems
UR - http://www.scopus.com/inward/record.url?scp=85056446482&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85056446482&partnerID=8YFLogxK
U2 - 10.1145/3239570
DO - 10.1145/3239570
M3 - Review article
AN - SCOPUS:85056446482
SN - 1936-1955
VL - 10
JO - Journal of Data and Information Quality
JF - Journal of Data and Information Quality
IS - 4
M1 - a15
ER -