Trust in science was an issue long before the COVID crisis politicized science and made scientists and scholars targets of suspicion and doubt. Research integrity is a hot topic, and there is widespread recognition in scholarly publishing that we need to flush out fake science and prevent bad actors from disseminating false research. The pressures that drive researchers to take shortcuts or falsify findings are most often financial or linked to career advancement. There are multiple threats to the integrity of the scientific record, such as paper mills that sell authorship on fake research papers, manipulation of images to either enhance or alter actual findings, citation rings where authors cite each other’s papers even when they are not relevant, and fraudulent peer reviewers who provide positive reviews in exchange for cash or other favors. As a product manager with decades of experience, I have helped integrate processes and tools resulting from individual efforts into editorial systems. The tools that perform tasks like plagiarism checking, statistical analysis, and reference checking are all useful, but they tend to be focused on a single problem or they lack industry-wide commitment. That’s why I’m really excited about the efforts of the STM association to increase research integrity and protect the scholarly record.
For the past two years I have been involved in an effort to address these threats to scientific integrity, chairing an STM working group that is looking into the problem of simultaneous and duplicate submissions. The project began in December of 2020, with a directive to explore how we might determine, across the scholarly publishing industry, the percentage of articles that are being simultaneously submitted to, and thus under review at, multiple journals. First working with IJsbrand Jan Aalbersberg, of Elsevier, and later with Joris van Rossem of the newly formed STM Solutions, we put together a working group that included representatives from publishing houses and from major workflow systems. The working group was formed to help investigate this issue and define a testing protocol that could be used to evaluate the size of the problem. This was primarily intended to be an analysis and assessment activity that would result in a recommendation for a pilot project that would enable a journal to easily determine if a research article has been simultaneously submitted to another journal.
In the initial discussion, it was clear that sizing the problem might require the pooling of manuscript data, and several concerns were raised, such as GDPR compliance, supplying proprietary and strategically sensitive information to competitors, anti-trust laws, and the need to request permission from thousands of organizations that actually own the data processed by publishers. In the face of these challenges, we settled on an approach that each publisher could accomplish on their own, without pooling data. Each publisher would look at areas where they had journals with wide discipline coverage; an algorithm spearheaded by representatives from Elsevier and Sage was developed and tested; and an assumption was made that by combining individual publisher results, we could reach a valid estimate. After some experimentation with the algorithm, several of the publishers were able to estimate that around 2.5% to 4.25% of submissions can be considered “near duplicates”. This share grew over the 29-month period from January 2018 to May 2020, which showed that the problem was getting worse.
Having established that simultaneous submissions are indeed a problem that needs to be addressed, the working group turned to strategizing solutions. It was recognized right from the start that the only way to detect if a paper has been submitted to more than one journal at a time was for publishers to pool their submission data, and perhaps even the full text of all submissions. I’ve already discussed the limitations, and those limitations are not technical; they are business and policy limits. It was decided that the technical solution, to pool newly submitted articles from across publishers, would be built and managed by STM, a trusted third party. This effort was originally called the “collaboration hub”, a data lake to which everyone would contribute. STM’s newly established STM Solutions took on this effort, renaming the solution the “Integrity Hub”. As it currently stands, STM has built a prototype, has run a successful pilot, and is close to signing legal agreements with member organizations. In addition, STM Solutions has begun to explore other uses for the data lake, such as detecting paper mills and image duplication and manipulation. In their words, the Integrity Hub provides “a cloud-based environment for publishers to check submitted articles for research integrity issues, consistent with applicable laws and industry best practice and fully respecting the laws and ethics of data privacy and competition/anti-trust laws. In this environment, publishers may collaborate with other parties of their choosing to develop and operate screening tools for the benefit of the entire scholarly ecosystem.”
The simultaneous submission working group is now focused on developing policies around the use of the hub, such as how publishers are notified of integrity breaches, how long data is retained, and what researchers are told about the process. The working group is also giving feedback on the pilot solution, contributing content for testing, helping to solve workflow issues as they come up, and providing data sets to help train the Integrity Hub’s artificial intelligence engine. I am looking forward to the opportunity to officially connect HighWire’s own submission system, BenchPress, to the Integrity Hub and provide this service to clients who want to use this innovative tool designed to increase trust in science.
On December 7th I will be participating in STM’s “Research Integrity Master Class” with STM staff and with members of the various working groups. The master class will include sessions and workshops on retractions, integrity screening tools, and paper mills, as well as an update on guidelines for dealing with integrity issues from the Committee on Publication Ethics (COPE). This event is open to anyone who wants to join, and registration can be found here: https://www.stm-assoc.org/events/stm-research-integrity-master-class/.
By Tony Alves