Journals and Public Access to Research Data

This spring HighWire’s Senior Vice President and head of Product Management, Tony Alves, attended several industry events, including the National Academy of Sciences’ Journal Summit, STM’s Research Integrity Masterclass, and the annual meetings of both the Council of Science Editors and the Society for Scholarly Publishers. Over the next several weeks, Tony will provide summaries of some of the most important sessions, highlighting insights and hot takes from those meetings.

In March I attended the National Academy of Sciences’ Journal Summit. The theme was “Change in Context: Identifying the best routes to an open science world.” There were over one hundred and thirty attendees representing publishers, libraries, institutions, funders, government agencies, suppliers and non-profits. The attendee list was a real who’s who of scholarly communications. As the meeting was held under the Chatham House Rule, I will not identify individuals as I share some of what I learned that day.

In my previous post I covered the panel discussion on the impact of new public access requirements for US federally funded research. The other panel discussion I want to cover focused on the public access requirements for data underlying US federally funded research, mandated in the Nelson Memo, issued by the White House Office of Science and Technology Policy (OSTP) in August 2022. The requirement states that “Scientific data underlying peer-reviewed scholarly publications resulting from federally funded research should be made freely available and publicly accessible by default at the time of publication,” with certain restrictions. A panel representing different constituents in the scholarly publishing ecosystem weighed in on the implications of this mandate for publishers, researchers, and the research infrastructure.

One of the panelists started by advocating for the use of persistent identifiers (PIDs) throughout all facets of research and publishing. PIDs are the connective tissue between articles and data, allowing quick and easy access to an article’s underlying data wherever it exists on the Internet. The panelist did point out that there are cases when data should not be openly available, but safeguards can be built into the system to protect sensitive data. They also stressed that data management plans (DMPs) need to be thought out and in place before the publishing process begins: DMPs should be part of the research project, not part of the publishing process.
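To make the “connective tissue” idea concrete, here is a minimal sketch, assuming the article’s DOI is registered with DataCite and carries related-identifier links to its data; the DOI below is a hypothetical placeholder, not a real record. The sketch queries DataCite’s public REST API and keeps the related identifiers that point to supplementary datasets.

# A minimal sketch of following PID links from an article to its underlying
# data via DataCite's public REST API. ARTICLE_DOI is a hypothetical
# placeholder, and a real integration would need richer error handling.
import requests

ARTICLE_DOI = "10.1234/example.article"  # hypothetical, for illustration only

def find_linked_datasets(doi: str) -> list[str]:
    """Return DOIs of datasets linked to the article via relatedIdentifiers."""
    resp = requests.get(f"https://api.datacite.org/dois/{doi}", timeout=10)
    resp.raise_for_status()
    attributes = resp.json()["data"]["attributes"]
    return [
        rel["relatedIdentifier"]
        for rel in attributes.get("relatedIdentifiers") or []
        # DataCite uses the "IsSupplementedBy" relation to point from a
        # work to the data that supplements it.
        if rel.get("relationType") == "IsSupplementedBy"
        and rel.get("relatedIdentifierType") == "DOI"
    ]

if __name__ == "__main__":
    for dataset_doi in find_linked_datasets(ARTICLE_DOI):
        print(f"Underlying data: https://doi.org/{dataset_doi}")

Because every hop in this chain is a resolvable PID, the link from article to data survives even if the data moves between repositories, which is exactly the property the panelist was advocating for.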

One of the panelists, a longtime open data advocate, pointed out that high-quality, interoperable data has tremendous societal benefits. Publishers need to follow through on the FORCE11 FAIR data commitments that they have made. Researchers will always take the easiest path when complying with data mandates, so guidelines need to be clear, and the process of uploading data needs to be easy for the researcher. Data repositories, both commercial and institutional, are important players in open data. Repositories should not be the place where data goes to sleep; instead, they should bring data to life. Some of the challenges facing data repositories include easy integration with journal workflows and consistent, reliable curation of metadata. A future opportunity exists with preprints: as researchers rally around preprints, repositories will need to work out how to support them.
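As an illustration of what “consistent, reliable curation of metadata” can mean in practice, here is a small sketch of the kind of completeness check a repository integration might run against a DataCite record. The DOI and the list of expected fields are assumptions chosen for illustration, not a formal FAIR audit.

# A minimal sketch of a metadata completeness check against a DataCite
# record. DATASET_DOI and EXPECTED_FIELDS are illustrative assumptions.
import requests

DATASET_DOI = "10.1234/example.dataset"  # hypothetical, for illustration only

# Fields that FAIR-oriented curation guidelines commonly expect to be filled in.
EXPECTED_FIELDS = ["creators", "titles", "descriptions", "rightsList", "subjects"]

def missing_metadata(doi: str) -> list[str]:
    """Return the names of expected metadata fields the DOI record lacks."""
    resp = requests.get(f"https://api.datacite.org/dois/{doi}", timeout=10)
    resp.raise_for_status()
    attributes = resp.json()["data"]["attributes"]
    return [field for field in EXPECTED_FIELDS if not attributes.get(field)]

if __name__ == "__main__":
    gaps = missing_metadata(DATASET_DOI)
    print("Metadata gaps:", ", ".join(gaps) if gaps else "none")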

Other panelists, approaching the topic from the research institution perspective, felt that researchers need to overcome the tendency not to share, especially researchers in the life sciences who fear losing their publishing edge if their data is available to others in their field. However, it has been shown that researchers become interested in sharing data when they learn that data sharing will increase citations. In the end, data sharing is seen as an administrative burden, and such burdens are always growing; researchers just want to do their research. It is also important to remember that not all data is reusable or worth saving. Some data is useful only for a single piece of research, while other data, such as gene sequences or geologic measurements, is reusable.

Citing recent studies, one of the panelists summarized the discussion by noting that, overall, researchers are in favor of data sharing but, in practice, are less likely to do it. There can be resentment of data sharing mandates, especially if a researcher doesn’t see reuse value, or if they know that there is very limited interest in their data (such researchers would rather just send the data along to the few who are interested). Data deposit costs are growing, which means that people will cut corners and the data will be less useful. In the end, the community needs to ensure that data deposit is not just about checking a box; it should be about ensuring FAIR data.

By Tony Alves
