For publishers who distribute content in both print and digital – which means almost everybody – it must seem that digital is the tail that wags the dog. In a well-established publishing business, digital revenues are often lower than those from print, yet experts have for years been recommending that publishers shake up their processes and workflows because of the coming changes digital will require. Many have not seen the need to jump on the bandwagon and have instead relied on one-time conversion, or on typesetters who provide XML.
Having spent a number of years working with publishers on many aspects of content workflows, I’ve seen first-hand the pain associated with transitioning to a new digital workflow. But I’ve also seen some great results come out of this work. Here are three smart strategies I’ve seen deliver value time and again:
- Content modeling
- Ontological enhancement (or semantic modeling)
- SEO and in-context search
This is not an unordered list: it’s a progression, starting from content creation, which is where the content model exists, through to enriching your content to be more findable and discoverable by customers.
I hope to cover all three of these bases on this blog, but for this post I’m going to focus on content modeling, the first and in some ways the toughest of these. I’m going to talk about some of the reasons why it can seem so hard to get started, what you actually get out of it, and why, in the final analysis, it’s something you can’t afford to ignore.
Let’s face it, stopping to think about the structure and rules of your published works (present and potential future) is like trying to run a marathon while tying your sneakers at the same time. It’s not impossible, but it can be distracting and sometimes dangerous, especially if there is a bus nearby.
Yet, for those who take the time to create a good content model there are compelling business reasons to enter into the exercise.
There is an explosion of information happening, yet people demand quick access to relevant content that cuts through the clutter.
(Anne M. Mulcahy – former chairperson and CEO of Xerox Corporation)
Content modeling is simply about structuring content. For editors and authors this is about the structural elements that make up the body of content: identifying them for what they are and what purpose they serve. In many ways it is the foundation upon which much logic can be applied. A content model will inform good composition, and how to contextually flow things in an ebook or mobile presentation.
People use the term differently in different business contexts, but for our purposes, we can say that the ‘Content Model’ refers to the manifestation of a set of rules (whether through schema, DTD, or other rules/logic) that can be used to verify and enforce the desired or expected structure of the content.
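To make the idea of "a set of rules that verifies structure" a little more concrete, here is a minimal Python sketch. The element names (article, title, section, heading, para) and the rule table are invented purely for illustration; in practice a content model would normally be expressed as a DTD or schema and checked with a validating parser rather than hand-written code like this.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model: each element maps to the children it may contain.
CONTENT_MODEL = {
    "article": {"title", "section"},
    "section": {"heading", "para", "section"},
    "title": set(),
    "heading": set(),
    "para": set(),
}


def validate(xml_text):
    """Return a list of structure violations (an empty list means none found)."""
    errors = []
    root = ET.fromstring(xml_text)

    def check(element):
        if element.tag not in CONTENT_MODEL:
            errors.append(f"unknown element <{element.tag}>")
            return
        allowed = CONTENT_MODEL[element.tag]
        for child in element:
            if child.tag not in allowed:
                errors.append(
                    f"<{child.tag}> is not allowed inside <{element.tag}>"
                )
            else:
                check(child)

    check(root)
    return errors


good = (
    "<article><title>T</title>"
    "<section><heading>H</heading><para>P</para></section></article>"
)
bad = "<article><para>Stray paragraph</para></article>"

print(validate(good))
print(validate(bad))
```

The point is only that the rules live in one place and can be checked mechanically: the first document passes, while the second is flagged because a paragraph sits directly inside the article.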
A key decision in creating your content model is how you are going to structure your XML, which in practical terms involves choosing a DTD. There is more choice in this field now than in the early days, when the NLM DTD gained a lot of traction in the scholarly publishing market. Some publishers found that it had more in it than they actually needed, and some considered it cumbersome for all but a few very specific uses. Its successor, JATS (Journal Article Tag Suite), is now widely used in that market, and formats such as DITA, which have toolsets around them, have also grown in popularity. Many publishers have simply chosen to start with a standard like NLM and customize it for their own uses. It may be easier to identify the elements you need than to weed through the things you don’t – but to each their own approach.
Perhaps the ideal scenario, however, is to create a content model that is a custom fit for your particular content and your particular content needs, allowing you to streamline that content and enhance it strictly for the uses you wish to make of it. This can pay off if you ever wish to automate some of the digital process; fewer elements = less time to market.
The true context for this choice lies in the nature of publishing itself. Publishing businesses are extremely diverse and individual; however, the work that an individual publisher produces tends to remain remarkably consistent. So a publisher that creates, for instance, medical or legal content can create a set of terms for structural elements which will make sense for both book and journal content (noting that these are very different beasts that may have very little “commonality”, structurally speaking). For publishers whose corpus is in the trade, university press, or higher ed markets this can be a bit more tricky because of the unique variety of much of their published work, but it is still achievable. Remember, we are talking about defining structure and not the way content is rendered (e.g. pedagogy, composition, style, design, etc.).
So a custom content model might take some work to achieve, but it will add longevity and business value, not the least of which is efficiency and the ability to semi-automate as the digital portion of your business grows and develops.
I’m not suggesting that every model should be written from scratch: as I indicated earlier in this post, you can create a content model by starting with one of the existing types and modifying it to suit your needs. However you do it, my experience has been that the further upstream you push this process into the editorial phase, the more effortless it becomes. And positive business value is sure to follow.
This process can be a little painful at first because it takes a lot of thought (working on the business, rather than in the business). It involves rejigging the way you do things, and – probably most painfully – changing the way you think about your publishing. But after it is in place, it soon becomes a way of life, and a more streamlined way of achieving the end goal – a highly successful digital product.
The publishing roots of content modeling
Creating a content model can feel like an exercise that has no precedent in traditional publishing workflows. However, editors have been doing something analogous for a long time in the directions they provide to typesetters (e.g. A-Head, Title, etc.), a set of instructions that has always been called the “design brief”.
Through these instructions, an editor delineates the structure of a given document and communicates that forward to the next stage of the process. But with print, that work the editor does is for one-time use only (if you like): these instructions are cast aside once the document is typeset, because this is the final version, to be published only within the medium of print.
Today’s publishers have the requirement to publish the same content in multiple formats both on and offline, and for a variety of very diverse delivery devices (smartphone, desktop, tablet, etc.) and even, in some cases, to serve it in chunked form for non-linear contexts such as online learning. Traditional processes and workflows suit this expanded requirement very poorly.
Publishers who don’t take the content modeling route not only face the costs and difficulties of repurposing content for different media, they are also vulnerable to rapid changes in the tech landscape: changes that in the past have left content too closely associated with deprecated formats and content platforms (e.g. Flash) high and dry.
This last reflection should serve, in part at least, to answer a further objection to such a strategy that I have occasionally seen: that it might in some way be a ‘fad’, and end in a wasted investment.
Overestimating the critical nature of the initial decisions a company makes when it creates a content model can have a paralysing effect – what if I get this wrong now? What if that mistake has unforeseen impacts on my business down the line – and wouldn’t I be better, faced with all that unquantifiable risk, just to do nothing?
My observation on that, frankly, is that many publishers, and not just scholarly publishers, have believed for a long time that it is better to do nothing than to do the wrong thing, and now many of them are paying the price. Digital products are becoming ubiquitous in the marketplace, and these publishers’ content is really not in good enough shape to go after those formats. A conversion project is not the only thing you need to do. You need to invest in a process that will build business value in your content.
What you need to do is understand what your content is about – which means you have to get into discussions within your organization about the content you have and how you wish it to be represented. Companies that have not had this discussion in some form are now experiencing difficulties. There are many competent people in the industry who perform these services with admirable foresight and longevity.
The important thing is to have engaged with the issue, to have started on the journey. Even to have settled on something that feels like a temporary fix, a preliminary stage on the way to something more concerted, will give you advantages. As always it pays to begin with the end in mind.
The whole beauty of XML is that, if you’ve done a fairly decent job of expressing the structure of your content, and you decide on something better later on, you can convert it: XML can always be transformed into something else. The fundamental point here is that a content model is a definition of how the content should be structured; simply stated, it is business rules around structure.
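As a small illustration of that convertibility, here is a hedged Python sketch that moves a document from one tag set to another by renaming elements. Both tag sets (doc/head/p and article/title/para) are hypothetical, and a real migration would more likely be written in XSLT and involve structural changes as well as renaming; this only shows why reasonably well-structured XML is not a dead end.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from an early, provisional tag set to a later one.
TAG_MAP = {"doc": "article", "head": "title", "p": "para"}


def convert(xml_text):
    """Rename every mapped element; unmapped elements are left as they are."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        element.tag = TAG_MAP.get(element.tag, element.tag)
    return ET.tostring(root, encoding="unicode")


old = "<doc><head>Title</head><p>First paragraph.</p></doc>"
print(convert(old))
```

Because the original markup already said what each piece of content was, the conversion is a mechanical mapping rather than a re-keying exercise.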
So rather than having to jump head first into this thing, you can dip a toe – without losing an arm and a leg to the unforeseen alligators that are always lurking in that chasm.
Meanwhile, the speed at which digital moves is not decreasing any time soon. So in this instance at least, to quote WPP boss Martin Sorrell, ‘a bad decision on Monday morning is better than a good decision on Friday night’.
Thanks for entertaining these technical thoughts of mine… of course I know I’m just preaching to the choir. Most publishers in scholarly publishing have their houses in good order. I look forward to discussing these and many other issues with you all in future blogs.