Presented: 27 May 2020
Presenter: Liz Stokes
#4 in the 8-webinar series of FAIR data 101 training webinars.
Hello, everybody. Welcome to webinar number four, the second webinar of our second module on the FAIR Data Principles. My name is Liz Stokes. I’m from the ARDC skills team, and I’m going to talk to you about what the FAIR Data Principles have to say beyond protocols and into what repositories can do to make research data accessible to their users. I would like to acknowledge the traditional owners of the land on which I’m standing today, who are the Gadigal people of the Eora Nation. Sovereignty has never been ceded by these people, and I pay my respects to the traditional owners, both past and present, and welcome any First Nations people who are joining us today. So, let’s get straight into it.
So, a little bit of front matter perhaps, maybe I’ll call it. There’s a link down the bottom there to the code of conduct. Please have a read of that, and let us know if you have any issues. You are more than welcome to put questions or comments in the chat modules in GoToWebinar here today as I’m talking, but I won’t respond really to them until the end question and answer time. If you word it such that it is urgent, then one of my awesome ARDC team will no doubt respond promptly. Of course, I encourage you to take on any thoughts that you have and share them in our Slack channel this afternoon after this webinar.
Okay. I accidentally moved on. Great. Well, let’s keep going then. So, the overview for today is that I’m going to look at recapping over repositories’ role in enabling FAIR data, look at some examples about how different repositories mediate open and closed access to data, and then we’ll have a little Q and A session, and finish up with any questions you might have around the activities, quiz and community discussions for this coming module, okay? Right here.
Okay. So, the FAIR principle that I’m going to be covering today is really this one, this A2, that metadata are accessible even when the data are no longer available. Actually, the principle for this is really the backstop for repositories, our data repositories that we know and love. It cycles back to the undertaking of the other principles under findable, interoperable and reusable that repositories do, but today, I’m going to be concentrating on access to data via those repositories.
So, I would like to put a little disambiguation here, okay, in that for the FAIR principles, accessible means access to the data. It’s not necessarily about the web content accessibility guidelines, although that is certainly part of best practice in facilitating access to anything on the internet, but it’s really more about who can access what data under what conditions. As every repository and their use cases can be quite different from each other, there is no one standard to manage all of this.
Okay. Well, let’s get into answering some practical questions. So, what can repositories do to enable FAIR data? So, you recall, I shared a few examples. Looks like the slide’s not working. That’s interesting. So, you’ll recall, as I was, that a few repositories that I shared, for example, Zenodo and the Australian Data Archive, which featured a nice little introduction in our Slack channel last week … Thank you to Marina. I guess we’ll just wait for that slide to keep on loading, or I might move forward into that.
Often, when we’re looking at what repositories can do to enable FAIR data, we’re looking for examples of best practice and what the exemplary data repositories out there are. So, one reason why I chose some of those repositories, such as the Australian Data Archive and the ICPSR, the social science data archive hosted by the University of Michigan, is that both of those repositories are benchmarked and have gone through a certification process for the CoreTrustSeal. So, this idea of benchmarking against trusted data repository requirements is certainly one way that you might go about enabling FAIR data, by pursuing that certification process. It’s not necessarily easy, and it certainly takes a certain amount of time, but my colleagues at ARDC have assisted a few people in going there.
I think as I was saying, I was going to mention benchmarking against the trusted data repository requirements. Hey, it’s working now. Great. Okay. I’ve included a link down there to the CoreTrustSeal. I encourage you to follow that link, and have a look around because it’s a nice little map, which takes you to actually, the physical address that’s been registered against each repository. It’s a nice way of understanding Australian repositories and where they’re based, but okay. I’m just going to leave that there, not go deep into that certification process.
So, another thing that repositories can do in order to facilitate access is to implement a mechanism for authorization and authentication. Now, wow, two multisyllabic words starting with A. What does this really mean? So, for example, onto slide 11, is the ALA, the Atlas of Living Australia, okay? As you can see here in this slide, they have provided a range of different ways for authentication. So, to authenticate as a user, you can either sign in with the Australian Access Federation, or you can choose more social media and corporate type accounts with Google, Facebook, and Twitter there, or you can create your own account. So, they also provide a username and password account there. The point I’m making is that this is all for facilitating the authentication, which is what the machines take care of in terms of our FAIR Data Principles, humans and machines working on the same data together.
For authorization, on the other hand, this is something for humans to decide. So, I’m going to come back to a later discussion of mediating data in that way. Another thing that repositories can do is to expose the data and metadata with a well-documented API. Who remembers what API stands for from Matthias’s lecture on Monday? I welcome your answers in the chat there. I’m going to show you an example from the CSIRO Data Access Portal. This is a screen share, but if you take your cursor up to the top right-hand corner where it says API next to help there, you will see some pretty splendid and thorough documentation on how the API works and how you might get automated access to the data that CSIRO provides.
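As a rough illustration of what automated access through a documented API can look like, here is a minimal Python sketch that only builds a search request URL and does not contact any server. The host, path and parameter names are invented for the example and are not the CSIRO Data Access Portal’s actual API; the portal’s own documentation is the place to find the real endpoints.

```python
from urllib.parse import urlencode, urlunsplit

# Hypothetical host and path, for illustration only. A real repository
# publishes its own endpoints and parameter names in its API documentation.
API_HOST = "repository.example.org"
COLLECTIONS_PATH = "/api/v1/collections"


def build_search_url(keyword, page=1, rows=10):
    """Return the URL a script could request to search collection metadata."""
    query = urlencode({"q": keyword, "page": page, "rows": rows})
    return urlunsplit(("https", API_HOST, COLLECTIONS_PATH, query, ""))


if __name__ == "__main__":
    print(build_search_url("soil moisture"))
    # https://repository.example.org/api/v1/collections?q=soil+moisture&page=1&rows=10
```

The point of the sketch is that a well-documented API lets a machine construct a request like this without a human clicking through the website.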
Okay. So, this is really where I wanted to start talking about these different methods of facilitating access to data. As I mentioned before, a lot of this comes down to what people need to consider in terms of understanding the needs of their researchers and the people providing access to that data. So, for example, mediation, okay, is really all about respecting the wishes of the data generators and content owners or people who are responsible for providing that data, but it’s also about navigating any legal frameworks that we might operate in, which sometimes tend to value the individual’s rights over their intellectual property, right?
Of course, there are different cases where a researcher or certainly a repository manager may wish to provide more or higher security to data and to restrict access to it, but the people who had given their data or provided that may want more openness about that data. So, we could go to an oral history example where the people sharing data for perhaps a certain community, they may actually want to be named even if they are discussing something that is quite private, okay? So, the mediation that occurs between the repository managers and the researchers and the people providing that is, well, how many people is okay for us to share this data with? Okay.
Another example on the other hand might be thinking about medical data and access to that. So, people might never actually want to be necessarily identified, but they may be very happy for that information to go out wide and be shared with relevant researchers and other research groups to progress advances in medicine and combating things like a pandemic, for example. I suppose it’s also an interesting point to note, just thinking back to how ALA provides access through social media platforms, so not necessarily only through the AAF, but social media as a thing that has happened … I do apologize for my incoherence right now. The way that social media platforms restrict and enable information to go to people in your network and to advertisers is also actually an example of mediation, one that perhaps we have already signed onto in theory, if not necessarily having read all the details of that thing that you need to read to sign that you accept and agree, those terms and conditions, but I’m moving away now into analogy territory. So, I’ll just pull it back a little, going back.
So, for example, another concern that people might have is that the data might not necessarily be digital as well. So, I suppose many of the researchers on board here might be familiar with needing to organize paper forms for having a discussion with their research participants about consent for collecting the data and what might happen or what might be done with that data after they have collected it, okay? This negotiation over long-term consent for access, it’s not uncommon for it to be on paper, and certainly, that’s something to take into account for repository managers who may be concentrating largely on having a repository that is for digital objects only. So, there’s potential to branch out into physical holdings as well.
We could also look at commercially sensitive data and where decisions need to happen in terms of controlling the bounds of who might access that data. So, this kind of mediation might happen via legal instruments or providing a memorandum of understanding between different partners. So, it’s really all about ensuring that there is clarity for what the mechanism is to enter into negotiations for how to access that data, or for example, if we’re talking in the commercial sector, we might actually be talking about data science initiatives, so that includes … Sorry, that was my daughter. So, that includes access to perhaps the software and any code or algorithms or pipelines and workflows.
Ah, awesome. Finished that sentence. Of course, so maybe the collaborative research centers, where a university department or faculty might have organized a partnership with a commercial organization, so they might have federated agreements to share their data. So, these are only a few examples of that, but as you can appreciate, some of that data might need to always be closed. It’s really about having that clarity about what data or rather, what metadata is available so that people have a record of that. Okay.
Okay, so deciding on the access can be … So, that was all sort of pretty heavy actually, I’d like to say. It’s a veritable minefield when you’re talking about access to sensitive data and what you can enable to be open or closed. Hang on a moment. I just need to be talking right now. Thank you. So, I wanted to highlight the Coalition for Publishing Data in the Earth and Space Sciences, which is what the COPDESS acronym in that slide stands for, and how they decided to publish an agreement in 2014 about what they would do in order to facilitate the FAIR Data Principles, okay? So, onto slide 17.
So, they wrote a commitment statement, and they encouraged individuals and institutions and all kinds of organizations to sign on as signatories to this commitment statement. So, among those signatories, there are researchers, publishers, societies, institutes, infrastructure providers, and repositories as well, all coming together as a community to implement these principles. Of course, if you’re in the earth and space sciences, you could sign onto this too. So, I’ll just put that out there in case you were looking for something to do after the webinar.
Okay. So, now, it’s time for me to move on to this final … Okay. Let’s get back to the metadata, okay? So, making the metadata available even when the data are no longer available. So, what does this mean? Well, curating metadata indefinitely, which is ultimately what I suppose we’re expecting our data repositories to do, requires quite a lot of effort. So, it’s actually quite important for us to consider the end goal or what might happen if, for example, the metadata are to be moved, okay? Maybe your repository is changing platforms or infrastructure, so you need to migrate your metadata and your data and make sure they stay together, or perhaps the project closes for which the repository has been created, or it could be … I don’t know. Maybe even universities that have been around for centuries, but they may need to close, so developing an exit strategy for how the metadata will be available even if the data needs to be moved is an important thing.
So, moving on to other reasons why the data would no longer be available: file formats or standards may actually change, okay? These examples that I have up on this slide are all likely activities or things that may happen. So, the published data itself may have been withdrawn or retracted. The creators might have moved on. They may have changed institutions, for example, or the research project, ah, was a giant con, for example, or rather, it closed. Sorry. I don’t mean to cast aspersions on our research community. So, the research project concludes, or for example, maybe this has happened to you, that a government or other department changes its name. So, then, it’s very hard to find that data where you thought it was in the labyrinthine structure of their website.
So, how does it help to have the metadata available? So, moving on from like all of the problems, things that could go wrong, here are a few examples of where it would be good to have access to that metadata. So, it enables you to have contextual information to follow up. If you want to get in touch with the original data creators, you may want to find out what else this research data was related to, so other related-research outputs and to support meta-analysis and citation so that … We don’t necessarily want to break the citation chain, so especially if the data perhaps was retracted. At least you can look up the citation to that. You have some kind of provenance trail for the work that went into that.
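To make the A2 idea a little more concrete, here is a minimal Python sketch of a record whose metadata stays available after the data is withdrawn. The field names, the `withdraw` helper and the example identifier are all hypothetical and are not taken from any particular repository platform; it is an illustration under those assumptions rather than how a given repository implements it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetRecord:
    """Descriptive metadata plus an optional pointer to the data itself."""
    identifier: str
    title: str
    creators: List[str]
    citation: str
    data_url: Optional[str] = None          # None once the data is unavailable
    withdrawal_reason: Optional[str] = None


def withdraw(record, reason):
    """Remove access to the data but keep every descriptive field."""
    record.data_url = None
    record.withdrawal_reason = reason
    return record


def landing_page(record):
    """Render what a person or a machine still sees for this identifier."""
    lines = [
        record.title,
        "Creators: " + ", ".join(record.creators),
        "Cite as: " + record.citation,
    ]
    if record.data_url:
        lines.append("Data: " + record.data_url)
    else:
        reason = record.withdrawal_reason or "reason not recorded"
        lines.append("Data no longer available: " + reason)
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up example record, for illustration only.
    record = DatasetRecord(
        identifier="example-0001",
        title="Example survey responses",
        creators=["A. Researcher"],
        citation="A. Researcher (2020). Example survey responses.",
        data_url="https://repository.example.org/files/example-0001.csv",
    )
    withdraw(record, "retracted by the creators")
    print(landing_page(record))
```

After the withdrawal, the identifier, creators and citation are all still there, so the citation chain and the provenance trail are not broken even though the data link is gone.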
Indeed, when I say that meta-analysis is not really a pun, I know I love that, and that helps me understand it, but having metadata available can enable that distant reading by doing analysis on the metadata that is available for a certain discipline subject or field but also doing meta-analysis, which is the field that I personally really only have a cursory understanding of, although I appreciate that meta-analyses are an excellent field of research. Methodology, I should probably say. I have a librarian background and can gloss over things.
Moving right along. Okay. So, if you would like to take access conditions further, I would encourage you to join the sensitive data community of practice, which is actually convened and looked after by our colleague, Nicola Burton. We’ll put these links in the Slack chat. You can also continue the discussion around access. Maybe you have some examples that you would like, or thorny issues of moderating or mediating access to research data that you would like to discuss with your fellow participants. Also, there’s a link to the resources on the ARDC website. So, we have things for managing sensitive data. There’s a guide for publishing sensitive data and a flow chart for sharing sensitive data.
So, in summary, these are the four FAIR Data Principles that we have covered in our accessible module. Ultimately, to wrap all of this up, considering access to metadata as well as data into the long term, the idea is that it should be archived long term and made available in such a way that it can be easily retrieved by humans and machines or be used locally with the help of standard communication protocols. So, now, I guess it’s time for a Q and A. I’ll give you a few moments to ask some questions.
Okay. Okay. So, we do have some questions. A lot of people did answer correctly what API stands for, application programming interface. Good memory, guys. Okay. Now, and also, some people commiserating with the intrusion by your daughter. For example, one person’s saying that their 18-month-old is now all up to speed on FAIR Data Principles. Okay. So, we actually do have a question. Can you think of an example of a closed repository?
Yes. So, I personally can’t necessarily think of any repositories that are fully closed because, well, I’ve not seen them, but there certainly are a number of repositories that have closed data stored, and you can’t get access to that data. I’m pretty sure, although I might have to be corrected here, that the ADA, Australian Data Archive is one of those. It does have open data sets, but it also does have closed data sets that are stored for archiving and can’t necessarily be accessed by anybody else. Okay. A procedural request … Oh, sorry. Liz, did you have more on that?
I was just going to say that another example of closed data perhaps … I guess, do you mean fully closed forever, or do you mean closed until somebody asks for it? I ask this thinking about medical data collections, perhaps thinking about … and government departments. So, the collection of mortality and morbidity data across hospitals, which is collected and curated by data custodians, jurisdictional data custodians usually organized across the states and territories, right? So, there are processes for applying to access that data. Those are generally governed by advisory boards and other ethics clearance. So, that’s kind of my model for data that is normally closed but can be opened up where it’s appropriate if it falls under a research project.
In fact, there have been a lot of comments on this, popping in while we’ve been speaking. So, for example, somebody has shared that the Australian Geospatial-Intelligence Organisation has a fully closed spatial data repository and given geospatial intelligence, I’m not entirely surprised by that. We do have a clarifying note from a representative of the ADA. They would store part of the data as closed where it maintains the complete record of the project, but generally, they would only accept data where at least part is intended for sharing, whether that is mediated or open. So, I think that actually brings up an interesting discussion point in and of itself where from a project, some of the data can be made open, some of the data is mediated, but some will always remain closed. I suspect, in fact, an example of a piece of data that will always remain closed, from my personal experience, is, say, the names of participants in anonymous research. So, after the data has been anonymized, you still need to keep a list of the participants, but you can’t share that list, but you could share the anonymized data.
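The open, mediated and closed split within a single project can be sketched in a few lines of Python. This is a minimal illustration under assumptions: the three access levels follow the discussion above, but the file names, the `can_download` function and the idea of a simple approval flag are hypothetical, and a real mediation process involves people, ethics clearance and agreements rather than a boolean.

```python
from enum import Enum


class Access(Enum):
    OPEN = "open"          # anyone may download
    MEDIATED = "mediated"  # download only after a human approves a request
    CLOSED = "closed"      # stored for the record, never shared


def can_download(level, is_authenticated, has_approval):
    """Decide whether a download is allowed for one file.

    Authentication (who you are) is the part machines handle;
    the approval flag stands in for the human authorization decision.
    """
    if level is Access.OPEN:
        return True
    if level is Access.MEDIATED:
        return is_authenticated and has_approval
    return False


# Made-up files from one anonymous research project, for illustration only.
project_files = {
    "anonymized_responses.csv": Access.OPEN,
    "interview_transcripts.zip": Access.MEDIATED,
    "participant_names.csv": Access.CLOSED,
}

if __name__ == "__main__":
    for name, level in project_files.items():
        allowed = can_download(level, is_authenticated=True, has_approval=False)
        print(name, level.value, "allowed" if allowed else "not allowed")
```

In this sketch the list of participant names stays closed no matter who asks, while the anonymized data is open to everyone, which mirrors the example of keeping the participant list but sharing the anonymized data.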
Now, another question, nothing to do with accessibility materials, has your beard grown since the last session? It has grown. I haven’t cut my beard for quite some time. Same with my hair. Isolation life, really. Who has time to go to the hairdresser? Other than that, we have no more questions, but certainly more compliments about how we’ve been able to do so well despite the work from home situation. So, I think we should probably leave it there. Liz, did you have anything more to add? In fact, sorry. You have more slides. Let’s keep going through them.
Oh, do I? No. Oh, yes. It’s the feedback slide. Thanks. So, don’t forget to share your feedback from today’s webinar. Often, I don’t think of the question I really want to ask until at least two and a half minutes after the speaker has finished and packed up. I look forward to your discussions on the Slack and chatting away next week, so thanks, everybody.