Towards a common approach to data versioning

Progress through the Research Data Alliance (RDA)

Data Versioning Working Group

To enable reproducibility of research results, researchers must be able to cite the exact dataset that underpins their research, especially when that dataset is large, dynamic and evolving over time. One aspect of reproducibility is the need for unambiguous references to a specific version of a dataset. However, systematic data versioning practices are not yet available.

Versioning procedures and best practices are well established for scientific software and can be used to enable reproducibility of scientific results. The code base of large software projects does bear some resemblance to large dynamic datasets. Are versioning practices for code also suitable for datasets? When are the differences sufficient to warrant defining a new version and minting a new Digital Object Identifier (DOI)?

Over the past two years, the Research Data Alliance (RDA) Data Versioning Working Group has collected numerous use cases of data versioning practices and extracted data versioning patterns from them. Many of these use cases were drawn from Australian research and data-intensive organisations representing a variety of perspectives and practices.

Join this webinar to hear a summary of the use cases collected, learn about the versioning patterns identified and review a draft of the Group’s final report and recommendations. There will be time for your input and questions.

The webinar will be presented by Working Group Co-Chairs, Dr Jens Klump and Dr Lesley Wyborn.
