Practical Uses of AI and ML in Scholarly Publishing

While technologies like Artificial Intelligence (AI) and Machine Learning (ML) offer unprecedented capabilities in automating editorial workflows and enhancing research integrity, they also introduce a new set of ethical and operational challenges that the industry is grappling with.

A few years ago, Highwire Press organized a workshop at Stanford University to explore the industry’s expectations from AI. The discussions were categorized into four key areas:

  • Editorial Strategy: The potential of AI to refine journal scopes, trace the lineage of academic papers, and spotlight impactful researchers.
  • Outreach and Author Prospecting: Leveraging AI for targeted communication, identifying prospective authors, and gaining nuanced insights into the author and reviewer pools, while also mitigating conflicts of interest.
  • Handling Papers: Utilizing AI to enhance workflow efficiency, screen submissions rigorously, identify suitable reviewers, and evaluate the novelty and impact of manuscripts.
  • Measuring and Monitoring: Employing AI to assess academic impact and market competition, and to monitor editorial policies and preprint behaviors. However, it’s important to note that many of these metrics are lagging indicators, often revealing their true value months after publication.

This blog post examines the dual nature of AI and ML in the scholarly publishing process. It is based on a recent webinar from Highwire’s Best Practices Webinar Series, featuring talks from industry experts Joris van Rossum, Product Director at STM Solutions; Ian Mulvany, CTO at the British Medical Journal (BMJ); and Josh Nicholson, Co-founder and CEO at Scite.

The Promise and Perils of AI in Scholarly Publishing: Insights from Joris van Rossum of STM

AI’s rise has had a positive impact on internal workflows “in terms of quality assurance, reproducibility checks, creating taxonomies, content enhancements and language improvement of manuscripts.”

But as Joris notes, the scholarly ecosystem has transformed from a small, intimate enterprise into a large, impersonal, and virtual environment. This shift has led to challenges, including the emergence of paper mills and fraudulent activities, as the anonymity of the virtual environment makes it easier for bad actors to exploit the system.

The Rise of Paper Mills

Last year, STM, in collaboration with COPE, published a report on paper mills, which are services that offer fake and manipulated manuscripts for a fee. These paper mills have led to numerous retractions and pose one of the main challenges for the academic community. Joris emphasized that the anonymity and virtual nature of the current research environment have made it fertile ground for such fraudulent activities.

Joris highlights the “arms race” we find ourselves in: while publishers use AI to screen manuscripts, bad actors employ it to create increasingly sophisticated fake articles. Skepticism about the effectiveness of tools that claim to identify AI-generated papers only adds to the complexity.

A Big Challenge: How to establish real science in a virtual environment?

AI is not new to publishers; it has been used for quality control, peer review, and internal workflows. However, the advent of generative AI like GPT-3 has intensified the challenges. On one hand, AI can be used to identify fake science through plagiarism tools and quality checks. On the other hand, it can also be used to create fake articles, data, and images, making it a double-edged sword.

Ethical and Trustworthy AI

Two years ago, STM produced a report outlining best practice principles for ethical and trustworthy AI. These principles focus on transparency, accountability, quality, integrity, data protection, privacy, security, and fairness. Joris stressed the need for publishers to be transparent about the use of AI, especially when it serves as a decision-making tool.

Integrity Hub Initiative: A Collaborative Response

To combat these challenges, the Integrity Hub Initiative was launched. Its mission is to equip the scholarly community with the data, intelligence, and technology needed to protect research integrity. The initiative focuses on sharing knowledge, creating policies, and building infrastructure that screens incoming manuscripts for various issues, including those generated by paper mills.

The Open Access Conundrum

AI’s Limitations Behind Paywalls

The reality is that a significant portion of scholarly articles is locked behind paywalls. These barriers limit the scope of AI’s capabilities in fact-checking and providing reliable information. For instance, AI tools designed to analyze the validity of scientific claims can only be as good as the data they can access. When crucial data is locked away, the AI’s output is not just incomplete but potentially misleading.

OpenAI’s Proprietary Nature: A Concern

While we’re on the subject of limitations, it’s crucial to discuss OpenAI’s proprietary nature. OpenAI, one of the leading organizations in AI research, has been criticized for its lack of transparency in its algorithms. This proprietary nature raises ethical questions, especially when these algorithms are applied to scholarly content. How can we trust the validity of AI outputs when we don’t fully understand the algorithms behind them?

The Need for Transparency

The scholarly community is built on the pillars of transparency, peer review, and open discourse. When AI tools that lack algorithmic transparency are introduced into this ecosystem, they create a dissonance. The proprietary nature of these tools is at odds with the scholarly community’s values, raising concerns about the future integration of AI into academic research.

Rethinking the Value of Scholarly Content

Navigating the Intersection of Scholarly Citations and AI: Insights from Josh’s Talk

Josh emphasized, “Citations are no longer just a numerical game; they are a qualitative measure that can be dissected through machine learning algorithms to reveal patterns and biases in scholarly work.” He presented data to show how AI can identify citation cartels and even predict future high-impact papers, thus revolutionizing the way we perceive academic validation.

The Scholarly Ecosystem: A Complex Adaptive System

Josh described the scholarly ecosystem as a “complex adaptive system,” where the actions of one stakeholder have ripple effects across the board.

The Dynamic Feedback Loop

He cited examples of how AI-driven peer reviews are forcing traditional publishers to adapt their models. This adaptation, in turn, creates a dynamic feedback loop that benefits researchers, publishers, and the scholarly community at large.

Contextual Citation Metrics: A Multi-Dimensional Approach

“Traditional citation metrics are like judging a book by its cover,” Josh noted. He introduced a machine learning model that goes beyond mere citation counts. This model evaluates the sentiment and context behind each citation, offering a multi-dimensional view of a paper’s impact.

The ‘Why’ Behind Citations

By analyzing the ‘why’ behind each citation, this model provides a nuanced understanding of a paper’s influence, revealing whether it is cited as a positive example, a point of critique, or merely for background information.
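To make the idea concrete, here is a minimal sketch of how citation statements might be sorted into the three roles described above and tallied into a per-paper profile. It is not the model Josh presented: the cue phrases, the label names (positive, critique, background), and the function names are all assumptions chosen for illustration, and a simple keyword match stands in for a trained machine learning classifier.

```python
from collections import Counter

# Hypothetical cue phrases. A real system would use a trained classifier
# rather than keyword matching; these lists are illustrative only.
POSITIVE_CUES = ("consistent with", "confirms", "in agreement with", "replicates")
CRITIQUE_CUES = ("in contrast to", "contradicts", "fails to replicate", "inconsistent with")

LABELS = ("positive", "critique", "background")


def classify_citation(statement: str) -> str:
    """Label one citation statement as positive, critique, or background."""
    text = statement.lower()
    # Check critique cues first, because "inconsistent with" contains
    # the positive cue "consistent with".
    if any(cue in text for cue in CRITIQUE_CUES):
        return "critique"
    if any(cue in text for cue in POSITIVE_CUES):
        return "positive"
    return "background"


def citation_profile(statements: list[str]) -> dict[str, int]:
    """Count how often a paper is cited in each role."""
    counts = Counter(classify_citation(s) for s in statements)
    return {label: counts.get(label, 0) for label in LABELS}


# Made-up citation statements that all refer to the same paper.
statements = [
    "Our results are consistent with Smith et al. (2020).",
    "In contrast to Smith et al. (2020), we observed no effect.",
    "Smith et al. (2020) studied a similar cohort.",
]
profile = citation_profile(statements)
print(profile)
```

A raw citation count would report three citations for this paper. The profile instead shows one citation in each role, which is the kind of multi-dimensional view the contextual approach aims for.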

Scite’s Assistant: Setting a New Standard

Josh unveiled Scite’s Assistant, describing it as “not just a tool but a standard against which all AI-generated scholarly content should be measured.” He demonstrated how Scite’s Assistant cross-verifies AI outputs against a database of peer-reviewed articles, setting a new benchmark for scholarly validation.
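The general idea of checking AI output against a trusted corpus can be sketched in a few lines. The example below is an assumption-laden illustration, not a description of how Scite’s Assistant works: it pulls DOI-like strings out of a generated answer and checks them against a small in-memory set standing in for a database of peer-reviewed articles. The DOIs, the `known_dois` set, and the function names are hypothetical.

```python
import re

# Simplified pattern for DOI-like strings; real DOIs can be more varied.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[^\s\"<>]+")


def extract_dois(text: str) -> list[str]:
    """Return DOI-like strings found in text, lowercased, with trailing punctuation removed."""
    return [match.rstrip(".,;)").lower() for match in DOI_PATTERN.findall(text)]


def verify_references(text: str, known_dois: set[str]) -> dict[str, list[str]]:
    """Split the DOIs cited in text into those found in known_dois and those not found."""
    report: dict[str, list[str]] = {"verified": [], "unverified": []}
    for doi in extract_dois(text):
        key = "verified" if doi in known_dois else "unverified"
        report[key].append(doi)
    return report


# Hypothetical stand-in for a database of peer-reviewed articles.
known_dois = {"10.1000/example.001"}

# Made-up AI-generated answer containing one known and one unknown DOI.
answer = "Prior work shows X (doi:10.1000/example.001). See also 10.1000/made-up.999."
report = verify_references(answer, known_dois)
print(report)
```

An unverified DOI is not proof of fabrication, since the reference set may simply be incomplete. It is a flag for a human to check, which also illustrates why paywalled or missing data limits what such checks can establish.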

The Future of Software and AI’s Broad Impact

One of the most striking points made during the webinar was the democratization of software development, enabled by AI. Advanced machine learning algorithms are making it easier for people with limited coding experience to create functional and efficient software. This is not only lowering the barriers to entry but also significantly reducing the cost of software creation.

BMJ’s Approach to AI and Product Innovation

Product Life Cycle: AI’s Multifaceted Impact

Discussing the life cycle of a product (early adoption, growth, maturity, and decline), Ian emphasized the role of AI at different stages. According to Ian, AI can help in two significant ways:

  1. By enabling “small bets” in the early stages to test new ideas quickly and cost-effectively.
  2. By extending the life cycle of mature products that are already generating most of your revenue.

Small Bets: The BMJ Way

BMJ is not shying away from taking risks. Ian shared that they are making “small bets” on new ideas, some of which fail while others succeed. For instance, they experimented with custom dashboards for disease tracking but found no takers. “It was a good example of an idea that we were able to bring to the market quickly, but it didn’t work because it wasn’t creating any value,” Ian explained. The agility to pivot is one of the advantages AI offers.

Legacy Systems: The Double-Edged Sword

One of the significant challenges BMJ faces is the legacy systems that host their journal portfolio. “Your product portfolio probably has good revenue, but you’ve scaled your systems already so making radical changes in that area in changing markets or new environments is costly,” Ian said. This is a crucial consideration for scholarly publishers who are often entangled in outdated systems but are looking to innovate.

Partnerships: The Collaborative Edge

Ian highlighted the importance of partnerships in BMJ’s AI journey. “Partnership will become a key theme of what I’m going to be talking about today,” he stated. Collaborations with other organizations and tech companies have enabled BMJ to leverage AI capabilities without bearing the brunt of risks and costs alone.

ChatGPT and Copilot: The New Workforce

BMJ is already using tools built on large language models, such as ChatGPT and GitHub Copilot, for tasks like code generation and documentation. “One of the software engineers in one of my teams, Daniel, is now using one of these tools and Copilot…he feels it speeds up his productivity by about a factor of two,” Ian shared. These tools are not just theoretical concepts but practical solutions that are enhancing productivity.

Closing Thoughts

The scholarly publishing industry is caught in a paradoxical “arms race,” where AI and ML technologies serve as both enablers and disruptors.

It’s clear that there’s an urgent need for a universally accepted set of ethical guidelines for AI in scholarly publishing, going beyond just transparency to include fairness, data protection, and accountability. AI literacy remains a potent solution, ensuring that both publishers and researchers understand the capabilities and limitations of these technologies.

Further, making “small bets” on AI-driven initiatives could be a game-changer, allowing publishers to rapidly test ideas and adapt to market needs without significant financial risk. However, the challenges are too complex for any single entity to solve. Partnerships, both within the industry and with tech companies, can accelerate the responsible and effective use of AI in the scholarly publishing process.

To sum it up, the scholarly ecosystem is indeed a “complex adaptive system,” and the integration of AI into it needs to be both strategic and ethical to ensure the integrity and advancement of academic research.
