The health care and life sciences community profile for dataset descriptions

Michel Dumontier, Alasdair J.G. Gray, M. Scott Marshall, Vladimir Alexiev, Peter Ansell, Gary Bader, Joachim Baran, Jerven T. Bolleman, Alison Callahan, José Cruz-Toledo, Pascale Gaudet, Erich A. Gombocz, Alejandra N. Gonzalez-Beltran, Paul Groth, Melissa Haendel, Maori Ito, Simon Jupp, Nick Juty, Toshiaki Katayama, Norio Kobayashi, Kalpana Krishnaswami, Camille Laibe, Nicolas Le Novère, Simon Lin, James Malone, Michael Miller, Christopher J. Mungall, Laurens Rietveld, Sarala M. Wimalaratne, Atsuko Yamaguchi

Research output: Contribution to journal > Article > peer-review

17 Scopus citations


Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. However, while there are many relevant vocabularies for the annotation of a dataset, none sufficiently captures all the necessary metadata. This prevents uniform indexing and querying of dataset repositories. Towards providing a practical guide for producing a high-quality description of biomedical datasets, the W3C Semantic Web for Health Care and the Life Sciences Interest Group (HCLSIG) identified Resource Description Framework (RDF) vocabularies that could be used to specify common metadata elements and their value sets. The resulting guideline covers elements of description, identification, attribution, versioning, provenance, and content summarization. This guideline reuses existing vocabularies, and is intended to meet key functional requirements including indexing, discovery, exchange, query, and retrieval of datasets, thereby enabling the publication of FAIR data. The resulting metadata profile is generic and could be used by other domains with an interest in providing machine-readable descriptions of versioned datasets.

Original language: English (US)
Article number: e2331
Issue number: 8
State: Published - 2016


Keywords

  • Data profiling
  • Dataset descriptions
  • FAIR data
  • Metadata
  • Provenance

ASJC Scopus subject areas

  • Neuroscience (all)
  • Biochemistry, Genetics and Molecular Biology (all)
  • Agricultural and Biological Sciences (all)

