At NISO Plus 2026, one message came through clearly: AI is no longer something happening to scholarly publishing. It is happening inside it.
The pre-conference workshop, “Keeping Robots in Line,” alongside sessions on metadata infrastructure and sustainable AI development, collectively reframed the conversation. The question is no longer whether AI will use scholarly content. It already does. The real question is whether publishers and infrastructure providers will design the rails on which AI runs, or whether AI will define those rails for them.
Across the sessions, three intertwined themes emerged: infrastructure, interoperability, and sustainability.
AI Doesn’t Read Articles — It Consumes Structure
One of the most important technical reframings at the conference was deceptively simple: large language models do not “read” articles. They tokenize, chunk, vectorize, and retrieve fragments of content.
In that world, a PDF is not a unit of knowledge. It is a legacy container optimized for human reading. Machines don’t care about layout or narrative flow. They need claims, figures, methods, and context, bound together in structured, machine-readable ways.
If content is poorly structured, meaning collapses. Retrieval becomes probabilistic. Attribution becomes blurry. Hallucinations become more likely. As speakers emphasized, “garbage in, garbage out” is not just a slogan; it is a structural risk.
The implication is profound: publishers are no longer just distributing documents. They are stewards of semantic architecture. Intelligent chunking, semantic enrichment, persistent identifiers, and Retrieval-Augmented Generation (RAG) frameworks are not technical luxuries; they are trust infrastructure.
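What “intelligent chunking” means in practice is that provenance must travel with every fragment, not just with the document. A minimal sketch (the DOI, field names, and section layout here are invented for illustration, not a description of any speaker’s system):

```python
# Sketch: binding retrieval chunks to persistent identifiers so that
# attribution survives chunking. All identifiers are hypothetical.
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    doi: str       # persistent identifier of the source article
    section: str   # e.g. "methods", "results"
    license: str   # machine-readable license URI
    position: int  # order within the article

def chunk_article(doi, license_uri, sections, max_chars=500):
    """Split each section into fixed-size windows, attaching provenance
    to every fragment rather than to the document as a whole."""
    chunks, pos = [], 0
    for name, text in sections.items():
        for start in range(0, len(text), max_chars):
            chunks.append(Chunk(text[start:start + max_chars],
                                doi, name, license_uri, pos))
            pos += 1
    return chunks

chunks = chunk_article(
    "10.1234/example.doi",  # hypothetical DOI
    "https://creativecommons.org/licenses/by/4.0/",
    {"methods": "We measured X..." * 40, "results": "X increased..." * 40},
)
# Every chunk now carries the identifiers a RAG pipeline needs in order
# to cite, license-check, and attribute what it retrieves.
```

The design choice is the point: once retrieval operates on fragments, any identifier that lives only at the document level is invisible to the machine consuming the content.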
Metadata Is the Hidden Spine of the Ecosystem
If AI is forcing us to rethink content structure, it is also exposing a longstanding weakness: metadata fragmentation.
The session “The Metadata Imperative” made clear that metadata most often breaks at system handoffs, when content moves from submission to production to hosting to indexing. Format mismatches, inconsistent standards, and incomplete records create cascading friction.
In a human-mediated world, that friction was annoying but survivable. In an AI-mediated world, it is destabilizing.
AI systems depend on high-quality, structured metadata: persistent identifiers for authors (ORCID), institutions (ROR), and funders, along with license terms and version markers. Without it, attribution fails. Without it, confidence classification is impossible. Without it, research integrity tools operate blind.
The session underscored an uncomfortable truth: “complete” metadata is not the same as “accurate” metadata. Publishers must shift from layout-first workflows to metadata-first, XML-driven processes. Downstream enrichment through Crossref and PID normalization is improving the situation, but enrichment that happens only inside proprietary silos does not solve ecosystem-wide fragmentation.
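The complete-versus-accurate distinction can be made operational. ORCID iDs, for example, carry a check character computed with the ISO 7064 MOD 11-2 algorithm, so a metadata pipeline can reject identifiers that are present and well-formed but simply wrong:

```python
# ORCID check-character validation (ISO 7064 MOD 11-2), as documented
# by ORCID. A record can be "complete" yet fail this test.
def orcid_checksum_valid(orcid: str) -> bool:
    """Validate the final check character of a 16-character ORCID iD."""
    digits = orcid.replace("-", "")
    if len(digits) != 16:
        return False
    total = 0
    for ch in digits[:-1]:
        total = (total + int(ch)) * 2
    check = (12 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return digits[-1] == expected

orcid_checksum_valid("0000-0002-1825-0097")  # True: ORCID's documented example iD
orcid_checksum_valid("0000-0002-1825-0098")  # False: complete, well-formed, inaccurate
```

Checks like this catch transcription errors, but not a valid iD attached to the wrong author; accuracy ultimately still requires metadata-first workflows upstream.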
If AI consumes structured knowledge, metadata becomes the connective tissue that makes that knowledge interpretable and trustworthy.
Rights Exist, But Machines Can’t See Them
Another recurring theme was that publishers do not lack rights frameworks. They lack machine-readable rights signaling.
Under Article 4 of the EU’s DSM Directive, rightsholders can reserve their content against text and data mining. But traditional defenses (robots.txt, IP filtering, terms and conditions) were built for human users and polite crawlers. They break under AI’s scale and autonomy.
The challenge is not inventing new rights. It is making existing rights visible to machines in interoperable ways.
Machine-readable licensing signals, standardized policy indicators, and content-level identifiers that “travel” beyond a URL are emerging as critical components of this new layer. Rights metadata must move from legal boilerplate to operational infrastructure.
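One concrete proposal in this space is the TDM Reservation Protocol (TDMRep), a W3C community group draft that expresses reservations through HTTP headers, HTML meta tags, or a well-known JSON file. A minimal sketch of interpreting the header form (the `tdm-reservation` and `tdm-policy` field names follow the draft; the policy URL and everything else is illustrative):

```python
# Sketch: reading TDMRep-style rights signals from HTTP response headers.
def tdm_status(headers: dict) -> dict:
    """Interpret TDMRep-style signals: is mining reserved, and where
    does the machine-readable policy live?"""
    h = {k.lower(): v for k, v in headers.items()}
    reserved = h.get("tdm-reservation")
    return {
        "reserved": reserved == "1",    # "1" = rights explicitly reserved
        "policy": h.get("tdm-policy"),  # URL of the licensing terms
        "signal_present": reserved is not None,
    }

status = tdm_status({
    "Content-Type": "text/html",
    "tdm-reservation": "1",
    "tdm-policy": "https://publisher.example/tdm-policy.json",  # hypothetical
})
absent = tdm_status({"Content-Type": "text/html"})  # no signal at all
```

The asymmetry in the last line is the real-world problem: when no signal is present, a crawler cannot distinguish “no reservation” from “rights reserved but never expressed in machine-readable form.”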
If AI operates at machine speed, rights recognition must operate at machine speed.
The Economic Illusion of Training
The session on Sustainable Approaches to AI Development challenged another widespread assumption: that AI training will become a durable revenue stream for publishers.
The current AI paradigm, built largely on indiscriminate web scraping, propagates misinformation, weakens attribution, and creates intellectual property ambiguity. Training on curated scholarly content offers a path toward more reliable systems. But training is episodic. It is front-loaded. It is unlikely to sustain a broad recurring licensing market.
Inference, not training, is where economic activity concentrates.
Every AI query that retrieves and synthesizes authoritative research is a discrete economic event. That is where pay-per-use models, bring-your-own-license (BYOL) integrations, and license-on-demand frameworks enter the picture. These emerging models tie compensation to usage rather than to bulk corpus delivery.
Crucially, sustainable inference depends on measurable attribution. Without usage tracking, reporting standards, and transparent contribution weighting, no market can stabilize.
This is why discussions at NISO Plus about COUNTER-like reporting for AI usage were not peripheral. They are foundational.
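What a COUNTER-like report for inference might aggregate can be sketched in a few lines. Note that no COUNTER release currently defines an AI-usage profile; the event fields, weights, and report shape below are assumptions about how such a standard could work:

```python
# Sketch: rolling per-inference retrieval events up into a usage report.
from collections import defaultdict

events = [  # one record per chunk retrieved while answering an AI query
    {"doi": "10.1234/a", "query_id": "q1", "weight": 0.6},
    {"doi": "10.5678/b", "query_id": "q1", "weight": 0.4},
    {"doi": "10.1234/a", "query_id": "q2", "weight": 1.0},
]

def usage_report(events):
    """Aggregate to per-DOI totals: raw retrieval counts plus a
    contribution-weighted count that could drive pay-per-use payouts."""
    report = defaultdict(lambda: {"retrievals": 0, "weighted": 0.0})
    for e in events:
        report[e["doi"]]["retrievals"] += 1
        report[e["doi"]]["weighted"] += e["weight"]
    return dict(report)

report = usage_report(events)
```

The hard part is not the aggregation but the `weight` field: transparent contribution weighting is exactly the measurement problem the sessions identified as a precondition for a stable market.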
Responsible AI Requires More Than Access
From the research perspective, responsible AI for science must be transparent, traceable, verifiable, and well-calibrated.
Scientists need traceability that connects claims to evidence. They need coherence and logical consistency. They need verifiability and benchmarking. They need confidence classification that goes beyond a binary “peer-reviewed vs. not” distinction.
This demands provenance embedded in the content itself: version of record markers, retraction signals, licensing data, authorship identifiers, and links to underlying datasets.
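Confidence classification beyond a binary peer-review flag could then be computed from those embedded signals. The field names, scoring rules, and thresholds below are invented purely to illustrate the idea of graded, provenance-driven trust:

```python
# Sketch: grading confidence from provenance embedded in the content,
# rather than a binary peer-reviewed / not-peer-reviewed flag.
def confidence_label(record: dict) -> str:
    """Map provenance signals to a graded label; a retraction signal
    overrides everything else."""
    if record.get("retracted"):
        return "do-not-use"
    score = 0
    score += 2 if record.get("version") == "version-of-record" else 0
    score += 1 if record.get("peer_reviewed") else 0
    score += 1 if record.get("datasets") else 0       # links to underlying data
    score += 1 if record.get("orcid_verified") else 0  # verified authorship
    if score >= 4:
        return "high"
    return "medium" if score >= 2 else "low"

label = confidence_label({
    "version": "version-of-record",
    "peer_reviewed": True,
    "retracted": False,
    "datasets": ["10.5061/example"],  # hypothetical dataset DOI
    "orcid_verified": True,
})
```

The retraction override is the key design choice: it only works if the retraction signal travels with the content itself, which is precisely the provenance requirement stated above.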
Trust is shifting from journal containers to structured context.
In an AI-mediated environment, the journal name may disappear from view. What persists are identifiers, metadata, and machine-readable trust signals.
From “Scrape First” to “Request, Verify, Deliver”
Perhaps the clearest operational takeaway from the conference was architectural: publishers must build “machine doors,” not just reinforce human ones.
That means:
- Separating human and machine access pathways.
- Implementing entitlement-aware APIs.
- Treating AI agents as licensed participants, not anonymous crawlers.
- Logging and auditing AI activity.
- Designing modular knowledge objects rather than static documents.
Emerging standards like Model Context Protocol (MCP) point toward a brokered future where AI systems request access through controlled service endpoints, entitlements are verified before retrieval, and interactions are auditable.
The sustainable model is not “scrape first, negotiate later.” It is “request, verify, deliver.”
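The request-verify-deliver flow can be sketched end to end. The token format, entitlement store, and content shape below are all assumptions for illustration; a real machine door would sit behind an authenticated API or an MCP-style service endpoint:

```python
# Sketch of a "machine door": verify entitlement, log the interaction,
# and only then deliver the knowledge object. All data is illustrative.
audit_log = []
ENTITLEMENTS = {"token-abc": {"client": "agent-1", "scope": {"10.1234/a"}}}
CONTENT = {"10.1234/a": {"doi": "10.1234/a", "text": "structured chunk..."}}

def machine_door(token: str, doi: str) -> dict:
    """Request -> verify -> deliver, with every step auditable."""
    grant = ENTITLEMENTS.get(token)
    allowed = grant is not None and doi in grant["scope"]
    # Log before responding, so denials are auditable too.
    audit_log.append({"token": token, "doi": doi, "allowed": allowed})
    if not allowed:
        return {"status": 403, "body": None}  # unlicensed or anonymous agent
    return {"status": 200, "body": CONTENT[doi]}

ok = machine_door("token-abc", "10.1234/a")  # licensed participant
denied = machine_door("scraper", "10.1234/a")  # scrape-first request
```

Note that the denied request still produces an audit record: in this model, even refused access is a governed, attributable event rather than silent scraping.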
The Inflection Point
NISO Plus 2026 did not present a finalized blueprint. It surfaced an inflection point.
Infrastructure, metadata, rights signaling, business models, and research integrity can no longer be treated as separate conversations. AI collapses them into one.
If attribution fails, trust erodes.
If entitlement fails, economics destabilize.
If metadata fails, integrity tools and AI systems alike misfire.
The robots are already in the ecosystem. The question is not whether they belong there. It is whether we design systems that make them governed, attributable, interoperable participants in scholarly communication.
Handled thoughtfully, AI can amplify research, improve discovery, and strengthen trust.
Handled passively, it will quietly reshape the system without guardrails.
NISO Plus made one thing clear: the future of scholarly publishing will not be decided by whether AI is allowed in.
It will be decided by how well we build the infrastructure around it.