The health care and life sciences community profile for dataset descriptions

Michel Dumontier; Alasdair J.G. Gray; M. Scott Marshall; Vladimir Alexiev; Peter Ansell; Gary Bader; Joachim Baran; Jerven T. Bolleman; Alison Callahan; José Cruz-Toledo; Pascale Gaudet; Erich A. Gombocz; Alejandra N. Gonzalez-Beltran; Paul Groth; Melissa Haendel; Maori Ito; Simon Jupp; Nick Juty; Toshiaki Katayama; Norio Kobayashi; Kalpana Krishnaswami; Camille Laibe; Nicolas Le Novère; Simon Lin; James Malone; Michael Miller; Christopher J. Mungall; Laurens Rietveld; Sarala M. Wimalaratne; Atsuko Yamaguchi

doi:10.7717/peerj.2331

OHSU Library

Research output: Contribution to journal › Article › peer-review

18 Scopus citations

Abstract

Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. However, while there are many relevant vocabularies for the annotation of a dataset, none sufficiently captures all the necessary metadata. This prevents uniform indexing and querying of dataset repositories. Towards providing a practical guide for producing a high-quality description of biomedical datasets, the W3C Semantic Web for Health Care and the Life Sciences Interest Group (HCLSIG) identified Resource Description Framework (RDF) vocabularies that could be used to specify common metadata elements and their value sets. The resulting guideline covers elements of description, identification, attribution, versioning, provenance, and content summarization. This guideline reuses existing vocabularies, and is intended to meet key functional requirements including indexing, discovery, exchange, query, and retrieval of datasets, thereby enabling the publication of FAIR data. The resulting metadata profile is generic and could be used by other domains with an interest in providing machine-readable descriptions of versioned datasets.
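
To make the idea of an RDF dataset description concrete, the following is a minimal sketch of how a few description and versioning elements might be serialized as Turtle. It is an illustration only: the abstract does not name the vocabularies the guideline selects, so the use of Dublin Core Terms (`dct:`) and PAV (`pav:`) here is an assumption, and the dataset URI, field values, and the helper names `escape_literal` and `describe_dataset` are hypothetical.

```python
# Namespaces assumed for illustration; the abstract does not name
# the specific vocabularies chosen by the guideline.
PREFIXES = {
    "dct": "http://purl.org/dc/terms/",
    "pav": "http://purl.org/pav/",
}


def escape_literal(value: str) -> str:
    """Escape backslashes, double quotes, and newlines for a Turtle string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def describe_dataset(uri: str, fields: dict) -> str:
    """Serialize {prefixed property: literal value} pairs as one Turtle subject block."""
    if not fields:
        raise ValueError("at least one property is required")
    lines = [f"@prefix {prefix}: <{ns}> ." for prefix, ns in PREFIXES.items()]
    lines.append("")
    lines.append(f"<{uri}>")
    pairs = [f'    {prop} "{escape_literal(val)}"' for prop, val in fields.items()]
    # Predicate-object pairs are separated by ';' and the block ends with '.'.
    lines.append(" ;\n".join(pairs) + " .")
    return "\n".join(lines)


# Hypothetical dataset URI and values, used only to exercise the helper.
turtle = describe_dataset(
    "http://example.org/dataset/demo",
    {
        "dct:title": "Demo dataset",
        "dct:description": 'A "toy" example',
        "pav:version": "1.0",
    },
)
print(turtle)
```

The sketch handles plain string literals only; a fuller description would also need typed literals, IRIs as objects, and the provenance and content-summary elements the guideline covers, for which an RDF library would be a better fit than string formatting.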

Original language	English (US)
Article number	e2331
Journal	PeerJ
Volume	2016
Issue number	8
DOIs	https://doi.org/10.7717/peerj.2331
State	Published - 2016

Keywords

Data profiling
Dataset descriptions
FAIR data
Metadata
Provenance

ASJC Scopus subject areas

General Neuroscience
General Biochemistry, Genetics and Molecular Biology
General Agricultural and Biological Sciences

Cite this

Dumontier, M., Gray, A. J. G., Marshall, M. S., Alexiev, V., Ansell, P., Bader, G., Baran, J., Bolleman, J. T., Callahan, A., Cruz-Toledo, J., Gaudet, P., Gombocz, E. A., Gonzalez-Beltran, A. N., Groth, P., Haendel, M., Ito, M., Jupp, S., Juty, N., Katayama, T., ... Yamaguchi, A. (2016). The health care and life sciences community profile for dataset descriptions. PeerJ, 2016(8), Article e2331. https://doi.org/10.7717/peerj.2331

@article{2ed26c772b3d440c81c8133bf705ec76,
  title = "The health care and life sciences community profile for dataset descriptions",
  abstract = "Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. However, while there are many relevant vocabularies for the annotation of a dataset, none sufficiently captures all the necessary metadata. This prevents uniform indexing and querying of dataset repositories. Towards providing a practical guide for producing a high quality description of biomedical datasets, the W3C Semantic Web for Health Care and the Life Sciences Interest Group (HCLSIG) identified Resource Description Framework (RDF) vocabularies that could be used to specify common metadata elements and their value sets. The resulting guideline covers elements of description, identification, attribution, versioning, provenance, and content summarization. This guideline reuses existing vocabularies, and is intended to meet key functional requirements including indexing, discovery, exchange, query, and retrieval of datasets, thereby enabling the publication of FAIR data. The resulting metadata profile is generic and could be used by other domains with an interest in providing machine readable descriptions of versioned datasets.",
  keywords = "Data profiling, Dataset descriptions, FAIR data, Metadata, Provenance",
  author = "Michel Dumontier and Gray, {Alasdair J.G.} and Marshall, {M. Scott} and Vladimir Alexiev and Peter Ansell and Gary Bader and Joachim Baran and Bolleman, {Jerven T.} and Alison Callahan and Jos{\'e} Cruz-Toledo and Pascale Gaudet and Gombocz, {Erich A.} and Gonzalez-Beltran, {Alejandra N.} and Paul Groth and Melissa Haendel and Maori Ito and Simon Jupp and Nick Juty and Toshiaki Katayama and Norio Kobayashi and Kalpana Krishnaswami and Camille Laibe and {Le Nov{\`e}re}, Nicolas and Simon Lin and James Malone and Michael Miller and Mungall, {Christopher J.} and Laurens Rietveld and Wimalaratne, {Sarala M.} and Atsuko Yamaguchi",
  note = "Publisher Copyright: {\textcopyright} 2016 Dumontier et al.",
  year = "2016",
  doi = "10.7717/peerj.2331",
  language = "English (US)",
  volume = "2016",
  journal = "PeerJ",
  issn = "2167-8359",
  publisher = "PeerJ",
  number = "8",
}
