Online Indexing of Scholarly Publications: Part 2, What Happens When Finding Everything is So Easy?

The transformation in discovery – and its consequences – was the topic of the opening keynote at the September 2015 ALPSP Annual Meeting. Anurag Acharya – co-founder of Google Scholar – spoke and answered questions for an hour. That’s forever in our sound-bite culture, but the talk was both inspirational – about what we had collectively accomplished – and exciting and challenging – about the directions ahead. Anurag’s talk and the Q&A are online. Part One covered Anurag’s presentation of what we have accomplished. The present post, Part Two, covers the consequences. Anurag has agreed to address questions about this post that readers put in the comments.

Here is my take on the key topics from Anurag’s keynote.

In Part One, I highlighted the factors that have transformed scholarly communication over the last 10-15 years:

  • Search is the new browse
  • Full text indexing of current articles plus significant backfiles joined with relevance ranking to change how we looked and what we did.
  • “Articles stand on their own merit”
  • “Bring all researchers to the frontier”
  • “So much more you can actually read”

In Part Two of this post, I cover Anurag’s view of What Happens When Finding Everything is So Easy?

Each of the above factors may seem to be incremental, but together they deliver so much impact that even a tradition-bound and well-practiced researcher workflow will change. In fact, while publishing behavior – often determined by a senior member of a research group, and by the senior editors of the journals they publish in – is slow to change, the “hunting behavior” of readers can shift more rapidly in response to adjustments made by the younger grad students and postdocs.

What are the effects of the transformation in finding and reading? Here Google Scholar has a lot of evidence – based on search and result behavior – to report from. While the evidence and its interpretation are two different things, the evidence alone of behavior shift is important for us to be aware of.

What do researchers look for? More queries, more words, more concepts, more areas

Scholar records these changes, per user:

  • growth in the number of articles clicked
    • growth is in both abstracts and fulltext clicked
      • but abstracts are growing more
  • growth in diversity of areas clicked on

What’s happening here? An iterative-filtering workflow is now common: search – scan titles and snippets – click on a number of abstracts – click on a few full texts – change query – lather, rinse, repeat. I think of this as a kind of hunt-then-gather mode: you hunt, you gather up, you move on to another venue, you repeat. I imagine people are determining relevance via the abstract – which loads more quickly and never hits a paywall – and then deciding whether to store (a PDF) or read.

Scholar has also found that abstracts that have full text links are more likely to be clicked on than those that do not have such links. Perhaps this is because the user is assured that full text is available if it is needed. And/or perhaps because the entries draw your eye.

PDF still wins the popularity contest

While there may be reasons that HTML full text is more “powerful” – especially for researchers who need access to high resolution figures or supplemental data sets – the PDF still wins the ‘conformance quality’ (1) award: a downloaded PDF ensures you will be able to read the article later. The impermanent nature of access rights – library subscriptions change, off-campus access is unreliable, a reader’s own job changes – leads to a need to store a local, permanent copy. As Anurag said in the Q&A, “Some downloadable form that is permanent will survive.”

The spread of attention

The ease of finding a great variety of items encourages what Anurag called a “spread of attention”. The spreading is on several dimensions: small journals, new journals, non-English journals, old(er) articles, non-articles (preprints, dissertations, working papers, proceedings, technical reports, patents) all get more attention when they are in the same query space with the “formal literature” that is found in the highly-curated databases.

The “article economy” is enabled by many things in our ecosystem, but the scholarly search engine which finds articles – not journals or issues – is key. The early user experience design decision that each full article would have an address and be on one page rather than be atomized across several is another key enabler. The SKU for an article is a URL or a DOI, if you will; an article doesn’t have several DOIs or URLs. (This wasn’t always the case. Some early scholarly-article web sites had each article section on its own page.)

Users want lots of abstracts and only some full text; metrics should recognize this

Literature review is inherently a filtering process, and abstracts are purpose-built to be the distillation of an article. Anurag believes that supplying users with full text when they can’t use it (because the part of the workflow a user is in is the filtering part, not the reading part) is not helpful and slows down the filtering. (There are probably exceptions to this, such as detailed methods searches when the information needed for filtering is not in the abstract.) Similarly, metrics that ignore abstracts are missing a lot of the utility a journal provides to its readers. Anurag encourages COUNTER to add abstracts and PDF downloads as required measurements in addition to the fulltext, gold OA and denials currently emphasized.

Abstracts should now be written for a broader, not just a specialized, audience

Research articles are typically written to the authors’ peers: subject-matter experts in their field. And so abstracts are typically written for the same audience. Relevance ranking in a comprehensive search will lead searchers outside the field of experts to these articles by their abstracts, attracting a larger research readership and giving the authors a wider impact. But abstracts written for one’s peers often have jargon (like blog posts for publishers…). Abstracts that are accessible to a broader audience, i.e., researchers in related fields, will help.

Anurag noted that Science and Nature have written broad-audience abstracts well for many years. We see journals beginning to attend to this by adding keywords to articles, and by including impact summaries and “take-home messages”. Readers appreciate these, and they expand the audience while also efficiently helping the broader audience contextualize a paper.

Now on to the Q&A, which was a wide-ranging one.

“What about searching books?”

To paraphrase Anurag again: ‘We need a representation of a monograph that functions like the abstract does for a journal article. This can’t be an introduction or preface, and it can’t be the first few pages – these things aren’t a representation of the whole.’

On the difference between book and journal searching

‘Users’ expectations are different for these two. When you search books you expect an answer; when you search journal articles, a scholar expects a list of things to read. A book represents late-stage work, not the early-stage work of journal articles.’

You can easily see the difference between these two modalities. Do a search in Google for “San Francisco weather” and the answer pops up: the current temperature and conditions, and the forecast. But for the “weather scholar” there is also a list of sites below the forecast that you can go to if you want to study the topic by reading web pages about San Francisco weather.

“Do you use things previously searched and found for ranking in Scholar?”

Anurag: “This isn’t as significant as you think. [Scholarly] queries are long and contain discipline-specific terms,” unlike in Google web search. He adds, “Personalization helps when queries are ambiguous. When queries are detailed and specific, as most Scholar queries are, personalization doesn’t add much.” There were follow-up questions on this theme, suggesting disbelief that Scholar doesn’t use frequency, location, etc. as a ranking signal. It doesn’t, Anurag repeated. He encouraged people who can’t believe it to try the experiment of doing the same Scholar queries in different countries, or with different people trying the same query.

“What is the role of the journal in the future?”

To paraphrase Anurag: ‘I have no good answer for that. The journal was a channel of distribution, and this is less important now. It is still important as a channel of recognition. There are three important relevance-ranking signals for a just-published article: Who wrote it? What is it about? Where was it published? The last of these, where it was published, covers many different indications.’

“The goal of Scholar…” 

“…is to make it easier for people solving difficult problems to do more.”


(1) Conformance quality: Quality of conformance is the ability of a product, service, or process to meet its design specifications. Design specifications are an interpretation of what the customer needs.
