Fifty shades of grey literature: preprint and the tempo of research
Lately I’ve been having a lot of conversations with clients who are working to address the needs of their new customers in the dawning era of public-access publishing. A reflection arises from these discussions that might best be summarised in musical terms, which is that you can be pitch-perfect to the needs of researchers, but unless you also understand the tempo of intellectual discovery in the digital age – and how that has changed – you’ll find it hard to orchestrate relationships with them effectively.
One area where this seems especially to be the case, in my view, is the one we currently call preprint.
Fifty shades of grey literature
Scholarly publication can be a slow process. In many subjects, more than a year can elapse between the submission of the latest brilliant discovery and its publication. But virtually any type of researcher will want news of the latest developments in their field much sooner than that. So the practice has been for researchers to circulate preprints informally among themselves.
In pre-digital days these were literally authors’ manuscripts, copied and passed from hand to hand. Correspondence between scientists served a similar purpose, as did conference proceedings and other types of ‘grey literature’.
Digital has given researchers an extended set of possibilities in preprint. We’ve seen the enormous popularity of repositories such as arXiv in the sciences and RePEc for economists, each of which serves as a mechanism for ‘rapid and broad’ communication of new research results in the fields they cover.
While these tend to be organised around disciplines, there are also institutional repositories maintained by many universities and institutions, and additionally ‘ahead-of-print’ materials that publishers are starting to carry online that have been accepted for publication (so meeting an initial quality bar) but which are still going through the review process.
Meanwhile there has been an explosion in social media, including researcher blogs, which provides the opportunity for the dissemination of more (and often greyer) preprint materials.
Issues and drivers in preprint
These new channels, especially the discipline-specific repositories such as arXiv, are clearly filling a need that has been largely unmet within the publishing industry.
Researchers are living in an environment of ‘publish or perish’; funders want to see more evidence (or some evidence, at any rate) of impact in the wider world from the work they fund; and the pace of change is hardly slowing in an increasingly globalised world, where the better-established powers face challenges from the emerging economies of China, India and others.
There has been a huge increase in the number of opportunities to air preprint research, and digital has also driven ever-wider dissemination of this material, exposing it to anyone who is interested via Google, as well as more specialised discovery engines. But the change has not just been quantitative.
Another result has been greater diversity in preprint, with a much wider range of quality: from material that is probably not going to get published anywhere (and is about as grey as it can get) to high-end ‘premium-brand’ research that is so close to the final peer-reviewed, published version as to be almost indistinguishable from it (prompting some to question where the value-add really is). All of it is available for free, 24/7, ranged along a spectrum of authoritativeness and hitting every point in between.
The indiscriminate nature of web searching has caused many to worry about the potentially chaotic nature of this picture, and the possible damage that might be caused to the process of scholarly enquiry when the ‘noise’ slider is turned up so high relative to the trustworthy, valuable ‘signal’.
Change the word, change the thought
So here’s the nub of the problem. The term ‘preprint’ doesn’t really describe what is going on here anymore. We might need a better term that will allow us a more encompassing notion. I would suggest we start talking about ‘pre-publication’ instead (perhaps ‘pre-pub’ for short?).
Before the internet came along, the idea of publishing something was extremely close to, if not exactly coterminous with, the act of printing it. The right to publish was inextricably bound up with the right to print. Obviously, this is not the case with digital and the read/write web. Now everyone’s a publisher.
The things that confer authority on a piece of content, and give it the ‘imprimatur’ of the ‘version of record’, are no longer bound up with an act of dissemination. The official publication of an article online by the publisher is now just one act of dissemination among many, not all of them carried out by the publisher.
A further thing that has happened with the move from print to digital is that individual articles have achieved greater mobility and visibility outside of the journal ‘wrapper’. In the pre-internet era they were bound within the physical pages of a journal, and their presence there gave an immediately apprehensible and unarguable message about status and validity. Now, wandering around the pages of the web like so many lost socks, it’s not so easy to make judgements about identity, validity and context for pre-pub articles.
This is a general problem with the web. How many times a day do we look at our screens and wonder what exactly is the provenance and reliability of the words we are reading? I would argue, however, that it is a problem we have to nail for scholarly publishing, where higher standards of reliability are required than in other types of information consumption.
Is peer review too opaque?
I’m not wishing here to cast any aspersions at all on the value of peer review in the scholarly publishing process. If anything, the peer-reviewed status of the ‘final’ article is of more importance when it comes to digital – because there is no printing ‘moment’ to signal a stage-gate has been passed. But for the very same reason, peer review would benefit from being more transparent.
Web searchers need to know the status of a paper they stumble upon mid-trawl, or are referred to by another paper – has this been published and peer reviewed; has it been accepted for publication but not peer reviewed yet; has it met any sort of initial quality bar at all – or is it just the ravings of a mendacious internet troll? To what extent has it been peer-reviewed? How rigorous was the process applied?
The need becomes even starker when papers are machine readable, and served up by semantic engines. Are the machines given any indication of the status? That leads to questions such as: do we have a metadata standard for peer review?
We are seeing an increasing level of specificity and transparency with licensing online – e.g. CC BY, CC BY-ND, CC BY-SA, etc. It would be really helpful if scholarly papers could carry a similarly specific and transparent indication of where they are in the pre-pub approval process, and also what peer review actually entails.
In fact something like this is essential if we are to meet the needs of researchers online in keeping up with the tempo of scholarly communication.
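To make the idea a little more concrete, here is a minimal sketch of what a machine-readable status label might look like. Everything in it is an assumption made for illustration: the stage names, the `StatusRecord` type, the `is_peer_reviewed` helper and the placeholder identifier are invented here, and none of them corresponds to an existing standard or to any publisher’s actual metadata.

```python
from dataclasses import dataclass
from enum import Enum


class PrePubStatus(Enum):
    """Hypothetical stages, loosely following the questions asked above."""
    UNVETTED = "unvetted"            # no quality bar met at all
    SCREENED = "screened"            # met some initial quality bar
    ACCEPTED = "accepted"            # accepted, but peer review not yet complete
    PEER_REVIEWED = "peer-reviewed"  # published and peer reviewed


@dataclass(frozen=True)
class StatusRecord:
    """A paper identifier paired with its (hypothetical) pre-pub status."""
    identifier: str  # placeholder; in practice this might be a DOI or repository ID
    status: PrePubStatus

    def label(self) -> str:
        # Short human-readable tag, in the spirit of a licence badge.
        return f"{self.identifier} [{self.status.value}]"


def is_peer_reviewed(record: StatusRecord) -> bool:
    """True only when the record carries the final, peer-reviewed status."""
    return record.status is PrePubStatus.PEER_REVIEWED


# Usage: a paper that has been accepted but is still under review.
record = StatusRecord("example-id-001", PrePubStatus.ACCEPTED)
print(record.label())            # example-id-001 [accepted]
print(is_peer_reviewed(record))  # False
```

The point of the sketch is only that a small, fixed vocabulary lets both a human reader and a search engine answer the ‘what am I looking at?’ question in one step; what the right vocabulary actually is would be for the community to agree.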
CrossRef to the rescue (again)
With this in view, it is interesting to look at what CrossRef has been doing with CrossMark, and in particular with the ‘publication record’ information under the ‘Record’ tab (an optional inclusion for publishers). This can tell you exactly where a document is in the publication process and whether it has been peer-reviewed or not.
I think there is a particularly strong argument for embedding this (or something very much like it) within digital publication workflows. It enables the researcher to establish really quickly the exact status of a paper she is looking at, and allows the publisher not only to be seen to meet the needs of researchers, but also to emphasise the value they are adding through the publication process.
One problem it does not address, however, is giving greater transparency to what peer review actually entails. Here our industry has the opportunity to define a few standard categories in a similar way to CC licenses, perhaps. This would help publishers to differentiate around the different practices in their peer review processes, and thus bolster their individual journal brands.
Opportunities – and the one big danger
It should be clear from the above that there is a strong relationship between being able to identify the authoritativeness of a given scholarly paper, and the speed at which researchers are able to work, allowing intellectual discovery to progress. It also ought to be clear that this is something publishers should be very interested in.
I can see many opportunities for publishers to add value in pre-pub literature – removing friction and adding services around the manuscript submission process – but also one big danger.
It is essentially the same danger I’ve identified with other aspects of digital publication – notably publication of research data – that the community meets its own needs through good-enough free or ‘open’ solutions, and publishers get left out of the equation.
The mere existence of repositories like arXiv leads some to wonder whether publishers are necessary at all. I’m not going to rehearse the slightly hoary arguments around that other than to say that it ought to be radiantly obvious that they are.
But in order for that obviousness to radiate in the right kind of way, publishers need to show a real commitment to digital by trail-blazing in the solution of problems such as the ones I’ve raised above relating to – for want of a better term – pre-pub.