TY - JOUR
T1 - The health care and life sciences community profile for dataset descriptions
AU - Dumontier, Michel
AU - Gray, Alasdair J.G.
AU - Marshall, M. Scott
AU - Alexiev, Vladimir
AU - Ansell, Peter
AU - Bader, Gary
AU - Baran, Joachim
AU - Bolleman, Jerven T.
AU - Callahan, Alison
AU - Cruz-Toledo, José
AU - Gaudet, Pascale
AU - Gombocz, Erich A.
AU - Gonzalez-Beltran, Alejandra N.
AU - Groth, Paul
AU - Haendel, Melissa
AU - Ito, Maori
AU - Jupp, Simon
AU - Juty, Nick
AU - Katayama, Toshiaki
AU - Kobayashi, Norio
AU - Krishnaswami, Kalpana
AU - Laibe, Camille
AU - Le Novère, Nicolas
AU - Lin, Simon
AU - Malone, James
AU - Miller, Michael
AU - Mungall, Christopher J.
AU - Rietveld, Laurens
AU - Wimalaratne, Sarala M.
AU - Yamaguchi, Atsuko
N1 - Funding Information:
We would like to acknowledge the contributions made by all those involved in the development of the W3C Health Care and Life Sciences community profile. In particular, we would like to acknowledge Eric Prud'hommeaux, the W3C liaison for the Health Care and Life Sciences Interest Group, for his contributions in finalising the formatting of the W3C Interest Group Note. We also acknowledge the BioHackathon series (http://www.biohackathon.org/, accessed June 2016) for providing opportunities to discuss initial ideas for dataset descriptions. Funding for Michel Dumontier was provided in part by grant U54 HG008033-01 awarded by NIAID through funds provided by the trans-NIH Big Data to Knowledge (BD2K) initiative. Alasdair J.G. Gray was partly funded by the Open PHACTS project and Innovative Medicines Initiative Joint Undertaking under grant agreement number 115191, the resources of which are composed of financial contribution from the European Union's Seventh Framework Programme (FP7/2007-2013) and EFPIA companies' in-kind contribution. M. Scott Marshall was funded by the European Commission through the EURECA (FP7-ICT-2012-6-270253) project. Gary Bader was supported by the US National Institutes of Health grant (U41 HG006623). Jerven Bolleman's Swiss-Prot group activities are supported by the Swiss Federal Government through the State Secretariat for Education, Research and Innovation. Nicolas Le Novère was funded by the BBSRC Institute Strategic Programme BB/J004456/1. The BioHackathon series is supported by the Integrated Database Project (Ministry of Education, Culture, Sports, Science and Technology, Japan), the National Bioscience Database Center (NBDC-Japan), and the Database Center for Life Sciences (DBCLS-Japan). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Publisher Copyright:
© 2016 Dumontier et al.
PY - 2016
Y1 - 2016
AB - Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. However, while there are many relevant vocabularies for the annotation of a dataset, none sufficiently captures all the necessary metadata. This prevents uniform indexing and querying of dataset repositories. Towards providing a practical guide for producing a high-quality description of biomedical datasets, the W3C Semantic Web for Health Care and the Life Sciences Interest Group (HCLSIG) identified Resource Description Framework (RDF) vocabularies that could be used to specify common metadata elements and their value sets. The resulting guideline covers elements of description, identification, attribution, versioning, provenance, and content summarization. This guideline reuses existing vocabularies and is intended to meet key functional requirements including indexing, discovery, exchange, query, and retrieval of datasets, thereby enabling the publication of FAIR data. The resulting metadata profile is generic and could be used by other domains with an interest in providing machine-readable descriptions of versioned datasets.
KW - Data profiling
KW - Dataset descriptions
KW - FAIR data
KW - Metadata
KW - Provenance
UR - http://www.scopus.com/inward/record.url?scp=84992130197&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84992130197&partnerID=8YFLogxK
U2 - 10.7717/peerj.2331
DO - 10.7717/peerj.2331
M3 - Article
AN - SCOPUS:84992130197
SN - 2167-8359
VL - 2016
JO - PeerJ
JF - PeerJ
IS - 8
M1 - e2331
ER -