[DPUB] Please read re metadata before Monday's meeting

Hi, folks-

I have finally posted the summaries of my metadata interviews on the wiki here<https://www.w3.org/dpub/IG/wiki/Task_Forces/Metadata#NEAR_TERM_GOALS>. I really encourage you to read through them before the meeting on Monday morning. Although we will be talking about themes that have come out of those conversations, you will get a lot more insight by actually reading through the interviews. This is especially useful for seeing the differences in perspective and priority across different types of publishing: I deliberately selected folks who I knew would articulate issues from certain perspectives (news, journals, scholarly/STM, etc.) in addition to the important trade book publishing POV.

Here is just a quick summary of some of the issues that came out of these interviews:

--COMPLEXITY. Many, many folks lamented that there are so many metadata vocabularies, and that they are so complicated (and many of them are in constant evolution: ONIX, BISAC, PRISM, etc.), that (a) they are hard to understand, keep up with, and implement properly, and (b) they lead to . . .

--INCONSISTENCY. Despite the existence of well-established and widely used standards (again, e.g., ONIX, BISAC, PRISM, etc.), they are used inconsistently by both the creators and the recipients of metadata. Publishers feel that no two recipients want exactly the same things from them, and recipients lament that no two publishers give them the required metadata in exactly the same way.

--SACRIFICING RICHNESS FOR SIMPLICITY. While many folks on the trade side wish they had an "ONIX Lite," actually getting to that is not trivial because there is an inherent complexity in what they want to communicate. The clearest counterexample comes from the scholarly/STM publishing side, where many folks actually think of metadata as a "solved problem." Why does it work so well? Because CrossRef was created initially for a single purpose, to enable reference linking, and so they created a very simple spec for just the metadata necessary to make that work. On the other hand, folks now expect CrossRef to do Metadata Magic (my phrase, not theirs), and they can't: having collected only the subset of metadata needed for their initial use case, they don't have the metadata required for the _other_ things folks now want.

--ONIX vs. SUBJECT METADATA. In the book industry, there are really two distinct ways metadata gets used. Overwhelmingly the main way is via "ONIX feeds," the periodic batches of metadata that publishers or their service providers send out to the supply chain (retailers, aggregators, etc.). This is _separate from the book content_. In fact, ONIX was initially created to distribute supply chain metadata mainly for physical books; although it has been updated with many more features related to eBooks (ONIX 3.0), there is a lot of resistance in the US to moving off the older ONIX 2.1, because from the point of view of many publishers "it works," and also "it's what the supply chain is asking for." While yes, you can embed subject metadata in an ONIX record, that is only a tiny slice of what goes into an ONIX record. On the other hand, what is lacking is a way to embed subject metadata _in the book content itself_, either at the title level or at a component level (the most common need: chapters; but all agree that embedding subject metadata at a granular level in the content _should_ make it more discoverable, manageable, and useful). But . . .
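To make the "embed it in the content itself" idea concrete, here is a toy sketch (my illustration, not any spec's mechanism) of attaching chapter-level subject metadata to a chapter's XHTML head. The element names, the keywords, and the BISAC-style code are all invented for illustration:

```python
# Hypothetical sketch: embedding subject metadata directly in a chapter's
# XHTML head, rather than shipping it separately in an ONIX feed.
# The vocabulary choices and the subject code below are illustrative only.
import xml.etree.ElementTree as ET

chapter = ET.fromstring(
    "<html><head><title>Chapter 3: Reference Linking</title></head>"
    "<body><p>...</p></body></html>"
)

head = chapter.find("head")
# Free-text keywords plus an (invented) controlled-vocabulary subject code.
for name, content in [
    ("keywords", "metadata, reference linking, CrossRef"),
    ("subject-code", "COM060000"),  # illustrative BISAC-style code
]:
    meta = ET.SubElement(head, "meta")
    meta.set("name", name)
    meta.set("content", content)

embedded = ET.tostring(chapter, encoding="unicode")
print(embedded)
```

The point of the sketch is only that the subject terms travel _with_ the content file, so any downstream system (or search engine) that has the chapter also has its subject metadata.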

--FEW BOOKS ARE ONLINE ANYWAY. One reason metadata works so well in scholarly/STM journals is that the journal content is overwhelmingly online, so being able to click on a link in a reference (and journal articles often have hundreds of references) and _get right to the desired content_ is huge. While this is starting to happen with some scholarly books, it is extremely rare in any other side of book publishing. Books are products, whether print or eBooks, that are _discovered, sold, and often delivered online_, but the book content itself is rarely online.

--DISCOVERY (aka MARKETING) IS THE PRIORITY. Book publishers _do_ want to be able to do a better job of identifying the subjects of books, chapters, and components. They already have a lot of vocabularies designed to do just that. Plus many folks express the need for simple keywords: that is, NOT a controlled vocabulary, just let the publisher or the editor or the marketer or the author put in the damn file whatever words they think will make the right people find them and buy them. One contrasting example: Kevin Hawkins pointed out that the University of Michigan is actually spending LESS effort on cataloguing [from the library perspective], because for content that _is_ online, people use search engines to find things, not library catalogs. Tying those two points of view together, Thad McIlroy made the point that for discovery, what matters, really, is _discovery via Google_. (Thad has a very practical, down-to-earth, get-real orientation.)

--IDENTIFIERS, IDENTIFIERS, IDENTIFIERS. We can talk all day long about metadata but if we ain't got identifiers we ain't got nuthin'. (Again, my editorial opinion.)

--AND NOW FOR SOMETHING COMPLETELY DIFFERENT: NEWS. Please read the interviews with Vincent Baby<https://www.w3.org/dpub/IG/wiki/Task_Forces/Metadata/Vincent_Baby_Interview> and Michael Steidl<https://www.w3.org/dpub/IG/wiki/Task_Forces/Metadata/Michael_Steidl_Interview> of the IPTC. You will see that the news industry has done a TON of work on metadata since 1979; they've been involved with the W3C and the Semantic Web all along; they really grok metadata. And they have an interesting perspective because of (a) the enormous firehose of content and images and media they need to manage, and (b) the speed with which everything has to happen. They can't wait for some agency to issue an identifier for something; they need self-describing identifiers. It's creative work, so RIGHTS METADATA is crucial. In the new multimedia world we are living in, metadata standards don't align well (and the IPTC is working to help address this). They have a TON of vocabularies, standards, etc. (provided as links in the reports of my interviews with Vincent and Michael). They are even keeping up with "fashion": despite their long commitment to XML, they realize that JSON is ascendant (while nowhere near as rigorous or useful, it's just way easier to implement; my editorial comment, not theirs), so they are working on JSONizing the standards they've got as XML or RDFa.
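For a sense of what "JSONizing" an XML standard amounts to, here is a minimal sketch using an invented, flattened news-item record; the element names are illustrative only and are not actual NewsML-G2 or ninjs structures:

```python
# Illustrative sketch of "JSONizing" an XML metadata record.
# The <item> structure below is invented for this example, not a real
# IPTC format; the point is the shape of the conversion, not the vocabulary.
import json
import xml.etree.ElementTree as ET

xml_record = """
<item>
  <headline>Example headline</headline>
  <subject>economy</subject>
  <subject>markets</subject>
  <rights>Copyright 2014 Example News</rights>
</item>
"""

root = ET.fromstring(xml_record)
record = {}
for child in root:
    if child.tag in record:
        # Repeated elements (e.g. multiple subjects) become JSON arrays.
        if not isinstance(record[child.tag], list):
            record[child.tag] = [record[child.tag]]
        record[child.tag].append(child.text)
    else:
        record[child.tag] = child.text

print(json.dumps(record, indent=2))
```

The rigor lost in a conversion like this (attributes, namespaces, schema validation) is exactly the trade-off against ease of implementation mentioned above.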

I could go on and on. ;-) I'll stop there and encourage you to read the interviews in the wiki<https://www.w3.org/dpub/IG/wiki/Task_Forces/Metadata#NEAR_TERM_GOALS>. And don't miss the one with Len Vlahos and Julie Morris of BISG<https://www.w3.org/dpub/IG/wiki/Task_Forces/Metadata/Len_Vlahos_%26_Julie_Morris_Interview>, which I have not mentioned explicitly in this summary but which actually addresses many of the fundamental issues, because the BISG does so much metadata-related work with both book publishers and the whole book supply chain.

Thanks to those of you who have actually read this far. Talk to you Monday.

--Bill Kasdorf


Bill Kasdorf
Vice President, Apex Content Solutions
Apex CoVantage
W: +1 734-904-6252
M: +1 734-904-6252
@BillKasdorf<http://twitter.com/#!/BillKasdorf>
bkasdorf@apexcovantage.com
www.apexcovantage.com<http://www.apexcovantage.com/>


Received on Saturday, 26 April 2014 16:46:39 UTC