While enthusiasm for AI in peer review is growing, implementation varies widely, especially for smaller publishers with limited resources. In this session, Lucia Steele of AboutScience shared the results of a bold experiment: what happens when you feed real manuscripts into generative AI tools and compare their reviews to human ones? Her presentation cuts through the hype and reveals where we stand—and where we might go next.
Testing AI in the Trenches: Lessons from a Small Publisher
With over 30 years in STM publishing, Lucia Steele manages the peer review process at AboutScience, a small open access publisher in Milan focused on clinical medicine and health sciences. When asked to participate in this webinar, she decided not just to theorize, but to test. Could freely available generative AI tools meaningfully support peer review?
She began with a daunting data point: an estimated 60 million reviewer invitations would have been needed to support peer review for the 2.9 million articles listed in Web of Science in 2024. Reviewers are fatigued, editors are overburdened, and the need for scalable support is undeniable.
The Experiment: Humans vs AI
Lucia and her team selected two published manuscripts that had undergone full peer review and received multiple reviewer reports. Then they turned to freely available tools to see how they would perform:
- ChatGPT Peer Reviewer (custom GPT)
- Peer Review Collaborator GPT
- AutoExpert Academic Improve
- Microsoft Copilot (MS 365)
The editors fed these tools the manuscript and asked them to generate a peer review report. The AI-generated reports were then evaluated by human editors against the original peer reviews using a structured scorecard.
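The session did not publish the scorecard itself, so as a purely illustrative sketch, here is one way such a structured comparison might be modeled. The criteria names, rating scale, and example numbers below are all assumptions, not the webinar's actual rubric or results.

```python
from dataclasses import dataclass, field
from statistics import mean

# Hypothetical criteria; the actual scorecard fields were not disclosed.
CRITERIA = ["relevance", "specificity", "methodological_insight", "actionability"]


@dataclass
class ReviewScore:
    """An editor's 1-5 ratings of one review report against the scorecard."""
    source: str                       # e.g. "Human reviewer" or a tool name
    ratings: dict = field(default_factory=dict)

    def overall(self) -> float:
        """Mean rating across all criteria."""
        return mean(self.ratings[c] for c in CRITERIA)


def rank_reports(scores):
    """Order reports by overall score, strongest first."""
    return sorted(scores, key=lambda s: s.overall(), reverse=True)


# Illustrative numbers only, not findings from the experiment.
reports = [
    ReviewScore("Human reviewer", {"relevance": 5, "specificity": 4,
                                   "methodological_insight": 5, "actionability": 4}),
    ReviewScore("ChatGPT Peer Reviewer", {"relevance": 4, "specificity": 3,
                                          "methodological_insight": 2, "actionability": 3}),
]
print([r.source for r in rank_reports(reports)])
```

Even a simple rubric like this forces editors to score AI and human reports on the same axes, which is what makes a "humans vs AI" comparison meaningful rather than impressionistic.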
The Results: Glimmers and Gimmicks
What did the editors find?
- ChatGPT stood out: Editors said it “seemed to have understood what the paper was about,” offering more focused feedback than the other tools.
- Others felt generic: Several reports offered vague, boilerplate commentary that could have applied to any article in any field.
- None were decision-ready: Editors universally agreed that AI-generated reviews required interpretation and supplementation by a human expert.
- Structure ≠ Substance: Even when a report was well-formatted, it often lacked the insight or specificity necessary to support editorial decisions.
Still, there were encouraging signs. The editors imagined a future where AI-generated preliminary reports could serve as scaffolding: summarizing key issues, checking formatting or methodology, and providing initial observations to guide a human reviewer’s focus.
Dream Tools vs. Real Limits
Lucia categorized AI use cases into two buckets: already implemented and still aspirational.
- In Practice: Similarity checks, language quality assessments, formatting verification, reviewer suggestions, and scope validation are already integrated into many workflows.
- Still a Dream: Reliable AI-generated peer review comments. While LLMs can mimic reviewer tone, they struggle with scientific nuance and often hallucinate critiques or fail to engage with core research content.
Lucia articulated a powerful vision: a future in which AI offloads the most tedious and mechanical parts of the review process (formatting, statistics checks, reference validation) so reviewers can focus solely on scientific merit. But we’re not there yet.
Policy Tensions: Ethics, Trust, and Transparency
Lucia then turned to the ethical thicket surrounding AI use in peer review. Drawing on current guidance from COPE, STM, EASE, WAME, ICMJE, and OASPA, she summarized key principles:
- No AI-only decisions: Final judgment must remain with human editors.
- AI ≠ Reviewer: Generative AI should not produce or replace peer reviews.
- Transparency: Reviewers and editors must disclose AI use.
- Confidentiality: Manuscripts under review should never be uploaded to public GenAI tools.
Even the idea of letting reviewers use AI to structure or rephrase their reports is fraught, especially since reviewers rarely know whether a journal permits it. Lucia strongly advocated for better reviewer guidance, clearer journal policies, and consistent disclosures.
Publisher To-Do List: Before the AI Revolution
Before publishers can confidently embrace AI in peer review, Lucia warned, they must build a solid policy foundation. Her “To-Do List” included:
- Revise reviewer, editor, and author guidelines to address AI use explicitly.
- Ensure tools support confidentiality and can be trusted across disciplines.
- Standardize processes to help reviewers engage more efficiently.
- Prepare for audits by indexing services and watchdogs that may soon require journals to disclose AI involvement.
Open Questions, Honest Answers
Lucia ended her presentation with a slide full of thorny but necessary questions:
- Can we trust AI to generate credible review content?
- Should authors be told their paper was evaluated (even in part) by AI?
- Is it acceptable for a reviewer to use AI? Or for AI to be the reviewer?
- What happens if an author challenges an AI-assisted rejection?
Lucia did not pretend to have all the answers, but insisted that by asking the right questions now, publishers can prepare themselves for a responsible future.
Between Curiosity and Caution
Lucia’s presentation stands out for its humility. Rather than claim certainty, she explored AI’s peer review potential through direct experience, and invited others to do the same. Her message was clear: there’s value in experimentation, but only when paired with strong editorial judgment, community input, and policy clarity.
Next up, we zoom out even further. What does AI mean for reviewer behavior, reviewer recognition, and the sustainability of peer review as a whole? Sven Fund of Reviewer Credits brings a systemic, and reviewer-first, perspective in the final installment.
AI Meets Peer Review – Insights from the Frontlines
The scholarly publishing landscape is being reshaped by artificial intelligence, and peer review is squarely in its path. At HighWire’s recent Best Practice webinar, “Understanding How AI Tools Are Used in Peer Review: Practical Insights for Editors and Publishers,” an international panel explored the current and future role of AI in the peer review process. This three-part blog series distills the insights of our expert speakers:
- Fabio Di Bello (University of Chieti) offered a structured deep dive into how AI is enhancing core editorial workflows.
- Lucia Steele (AboutScience) shared findings from a practical experiment comparing AI-generated peer review reports to human ones.
- Sven Fund (Reviewer Credits) zoomed out to analyze how AI fits into the broader system, from reviewer behavior to sustainability.
Each post aims to capture not only what’s happening now, but what may lie ahead as we grapple with the ethical, operational, and scientific implications of AI in peer review.
– By Tony Alves