RE: [METADATA] Webbiness of publishing metadata (ISSUE-1)

Phil has touched on the question of metadata travelling with content (as opposed to being part of content). This is the issue that I had labeled metadata + content. As Ivan pointed out, these issues are very closely related.


1. How is publishing metadata best expressed? Is there need/desire/interest in mapping existing standards, such as ONIX, to schema.org, RDFa, et al? What is the purpose of mapping?

2. Many have made the point that metadata should be kept apart from content so that it can be updated easily and often. How does metadata then travel with the content? How does metadata that is marked up using OWP tools (see 1) travel with content?

I recently saw a sample of a book in HTML in which the keywords were simply tagged as <span class="keyword">. Well, the keywords travelled with the book, but they had no meaning (even in context), and making adjustments would have been a chore.
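
To make the contrast concrete, here is a minimal Python sketch (not taken from the sample book) that reads the same keyword two ways: from a bare class attribute, and from an RDFa-style property attribute. The schema.org "keywords" property is real; the markup strings, the keyword value and the helper names are assumptions made up for illustration.

```python
from html.parser import HTMLParser

# Hypothetical markup: PLAIN mirrors the sample described above; RDFA is an
# assumed RDFa Lite alternative using the schema.org "keywords" property.
PLAIN = '<p><span class="keyword">whaling</span></p>'
RDFA = ('<p vocab="http://schema.org/" typeof="Book">'
        '<span property="keywords">whaling</span></p>')


class PropertyCollector(HTMLParser):
    """Collect (property URI, text) pairs for elements with a property attribute."""

    def __init__(self):
        super().__init__()
        self.vocab = ""
        self.current = None  # property name of the element being read, if any
        self.found = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "vocab" in attrs:
            self.vocab = attrs["vocab"]
        self.current = attrs.get("property")

    def handle_data(self, data):
        if self.current:
            self.found.append((self.vocab + self.current, data))

    def handle_endtag(self, tag):
        self.current = None


def extract(markup):
    """Return every machine-readable statement found in the markup."""
    parser = PropertyCollector()
    parser.feed(markup)
    parser.close()
    return parser.found


plain_result = extract(PLAIN)  # the class attribute carries no shared meaning
rdfa_result = extract(RDFA)    # the property attribute names what the text is
print(plain_result)
print(rdfa_result)
```

A generic consumer finds nothing to act on in the first string, because "keyword" is only a styling hook agreed on privately. In the second, the same text is attached to a property that any schema.org-aware tool could recognise.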

We are veering in a very EPUB-centric direction.

****************************
Tzviya Siegman * Digital Book Standards & Capabilities Lead * John Wiley & Sons, Inc.
111 River Street, MS 5-02 * Hoboken, NJ 07030-5774 * 201-748-6884 * tsiegman@wiley.com<mailto:tsiegman@wiley.com>

From: Madans, Phil [mailto:Phil.Madans@hbgusa.com]
Sent: Monday, September 15, 2014 2:04 PM
To: Bill Kasdorf; Graham Bell; Ivan Herman; Siegman, Tzviya - Hoboken
Cc: W3C Digital Publishing IG; Madi Solomon
Subject: RE: [METADATA] Webbiness of publishing metadata (ISSUE-1)

I get very nervous when I hear talk about  including metadata in the epub file, like embedding ONIX or some other standard.  The issue is that metadata changes. If you are embedding metadata in the epubs then you get into the position of having to generate and distribute new epub files every time that metadata changes. I don't know how many publishers would be eager to do that.  We wouldn't. And I don't think our vendors would be too keen on that either.

Once an ebook publishes, a lot of the metadata probably isn't going to change: Title, Author, Imprint, etc. But the descriptive metadata that we are looking for to aid in discovery is far less static: Keywords, subject categories, descriptions, awards, quotes, author bios and, of course, price.  These elements can change often.

Embedding metadata in the epub file, to me, is trying to do for the epub what the book jacket does for the physical product. The book jacket is about marketing, discoverability. It has all of those elements, like author bio and quotes and subject categories, etc. And it is wrapped right around the content, and is also embedded in the content in the form of ad pages. The problem is the only time we can update with new metadata is when we reprint the book and/or the jacket, unless we want to sticker existing stock. In the same way, I don't think embedding metadata in the epub is going to be a dynamic or flexible enough solution for getting the most bang out of the metadata. Unless there is a constant regeneration of the epub, which, again, I think will turn into a supply chain issue.

That's my opinion.

Phil

------------------------------------------------------------
Phil Madans | Executive Director of Digital Publishing Technology | Hachette Book Group | 237 Park Avenue NY 10017 | 212-364-1415 | phil.madans@hbgusa.com

From: Bill Kasdorf [mailto:bkasdorf@apexcovantage.com]
Sent: Monday, September 15, 2014 10:01 AM
To: Graham Bell; Ivan Herman; Tzviya Siegman
Cc: W3C Digital Publishing IG; Madi Solomon
Subject: RE: [METADATA] Webbiness of publishing metadata (ISSUE-1)

+1

This is exactly what I was going to say but Graham beat me to the punch. ;-)

Especially his comment that "it is not an issue that most publishers are even aware of."

I want to especially emphasize the point that I think the Web should _enable_ the expression and conveyance of metadata, not specify what that metadata _is_.

Both schema.org and URIs are useful cases in point.

Schema.org provides a useful way to embed metadata in content, but I would say it is somewhat halfway on the "enable don't specify" path. It does specify properties (which is actually very helpful) but in many or most cases the actual vocabularies used to characterize those properties are not specified. While specifying down to that level of detail is of course very useful for interoperability, it tends to be too limiting, too restrictive, not expressive enough (have I been redundant enough?) for most specific communities of users. Thus the educational folks got a few of the things they need, the accessibility folks got a few of the things they need, etc.; both got _subsets_ of the vocabularies they really consider important within their domains. So I think on balance it is very useful to let those properties be described by whatever vocabularies are useful to a certain community of users.

My example for URI is the DOI. ;-) It is not a choice _between_ using DOI or URI: the recommended practice is to _express_ a DOI in the _form_ of a URI. While that was not the common practice at first, it has been recommended for the past year or two and is increasingly being done. Many identifiers can be expressed in the form of a URI, which I think is a Very Good Thing. URI doesn't attempt to _replace_ those identifiers, it makes them work better.
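
A minimal Python sketch of "expressing a DOI in the form of a URI", assuming a resolver prefix that the thread does not name. The function names and the resolver constant are hypothetical, and the sample DOI is used only as a placeholder.

```python
from urllib.parse import quote, unquote

# Assumed resolver prefix for illustration; the thread does not name one.
DOI_RESOLVER = "https://doi.org/"


def doi_to_uri(doi, resolver=DOI_RESOLVER):
    """Express a bare DOI as a URI without altering the DOI itself."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    if not doi.startswith("10."):
        raise ValueError("not a DOI: " + doi)
    # Keep '/' readable; percent-encode characters that are unsafe in a URI path.
    return resolver + quote(doi, safe="/")


def uri_to_doi(uri, resolver=DOI_RESOLVER):
    """Recover the original identifier from its URI form."""
    if not uri.startswith(resolver):
        raise ValueError("not a DOI URI: " + uri)
    return unquote(uri[len(resolver):])


uri = doi_to_uri("doi:10.1000/182")
print(uri)
print(uri_to_doi(uri))
```

The round trip is the point: the URI form wraps the identifier rather than replacing it, so the original DOI can always be recovered.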

--Bill K

From: Graham Bell [mailto:graham@editeur.org]
Sent: Monday, September 15, 2014 5:18 AM
To: Ivan Herman; Tzviya Siegman
Cc: W3C Digital Publishing IG; Bill Kasdorf; Madi Solomon
Subject: Re: [METADATA] Webbiness of publishing metadata (ISSUE-1)

I think it would be fair to say that the use of linked data and URIs as identifiers is "definitely not a 'solved issue' among publishers" -- and to a large extent is not an issue that most publishers are even aware of. While the book industry provides a fair amount of useful metadata, this metadata is not aimed at making the web more useful, but at making the supply chain for commercial books and e-books more useful.

I go back to the three cases I listed in a comment on the DPIG wiki (see the Phase 1 Strategy section).


  i. metadata delivered in bulk, separate from the content or resource itself (eg as part of the commercial supply chain)

  ii. metadata delivered embedded within the content or resource it describes (eg within an EPUB, within a web page)

  iii. metadata delivered embedded within web pages describing the content or resource (eg in an online store, repository or catalog), possibly separate from the metadata displayed (for humans) on those pages

(Actually there is a fourth case, which is metadata delivered on demand, separate from the content or resource, eg as part of a web service.)

Publishers have tackled case i. via ONIX, but not case ii. or iii. Case ii. is properly the domain of the content standards groups such as W3C DPIG and IDPF. Case iii. may also be something where W3C DPIG and schema.org have roles. But...

Given the reluctance of book publishers and retailers to invest more in metadata (viz lack of uptake of a work identifier like ISTC, lack of interest in a release identifier analogous to GRID, slow migration to ONIX 3.0 in countries where 2.1 was most firmly embedded...), it seems to me to be critical that we don't further burden the industry with 'yet another data format to ignore'. As Phil implies in his point 5, the important thing is to have good metadata, and it doesn't much matter how it is expressed - so long as it can be transformed from one expression to another easily and without loss of meaning. I suspect the best way around this is to retain as much of the semantics of ONIX, while thinking about a syntax that would allow that metadata to be embedded in e-publications and online content. This would avoid publishers having to manage two or three parallel and distinct sets of metadata. Separating ONIX semantics ('what do we mean by pub date, by imprint, by title?') from the XML message (which is 'merely' a convenient syntax used for transmitting the data along a data supply chain in bulk), and allowing ONIX-style data to be expressed in other syntaxes or data formats seems (to me) to be the way to go.
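
A rough Python sketch of what separating semantics from syntax could look like, under stated assumptions: one syntax-neutral record is written out both as a simplified XML message and as schema.org JSON-LD, and both convert back to the same record. The field names, XML element names and sample values are invented for illustration and are not ONIX reference tags; the mapping to schema.org Book properties (isbn, name, datePublished) is likewise only an assumed example.

```python
import json
import xml.etree.ElementTree as ET

# Syntax-neutral record. Field names and values are made up for illustration;
# they stand in for ONIX semantics and are not ONIX reference tags.
RECORD = {"isbn": "9780000000002", "title": "An Example Title", "pub_date": "2014-09-15"}

# Assumed mapping from the neutral fields to schema.org Book properties.
SCHEMA_ORG = {"isbn": "isbn", "title": "name", "pub_date": "datePublished"}


def to_xml(record):
    """Serialize to a simplified XML message (hypothetical element names)."""
    product = ET.Element("product")
    for field, value in record.items():
        ET.SubElement(product, field).text = value
    return ET.tostring(product, encoding="unicode")


def from_xml(text):
    """Read the simplified XML message back into the neutral record."""
    return {child.tag: child.text for child in ET.fromstring(text)}


def to_jsonld(record):
    """Serialize to JSON-LD that could be embedded in a page or e-publication."""
    doc = {"@context": "http://schema.org", "@type": "Book"}
    doc.update({SCHEMA_ORG[field]: value for field, value in record.items()})
    return json.dumps(doc, sort_keys=True)


def from_jsonld(text):
    """Read the JSON-LD back into the neutral record."""
    doc = json.loads(text)
    reverse = {prop: field for field, prop in SCHEMA_ORG.items()}
    return {reverse[key]: value for key, value in doc.items() if key in reverse}


xml_message = to_xml(RECORD)
jsonld_message = to_jsonld(RECORD)
# Both expressions carry the same meaning: each converts back to the same record.
round_trip_ok = from_xml(xml_message) == RECORD == from_jsonld(jsonld_message)
print(round_trip_ok)
```

The publisher maintains only the neutral record; each syntax is a view generated from it. The round-trip check is a crude stand-in for "transformed without loss of meaning", and a real mapping would need to cover far more than three fields.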

I think there is something significant to do, but let's not be reinventing the wheel.

Graham Bell
EDItEUR

Tel: +44 20 7503 6418
Mob: +44 7887 754958

EDItEUR Limited is a company limited by guarantee, registered in England no 2994705. Registered Office: United House, North Road, London N7 9DP, UK. Website: http://www.editeur.org




On 15 Sep 2014, at 10:59, Ivan Herman wrote:

Hi Tzviya

Let me try to clarify the issues you raised...

The description of ISSUE-1[1] is currently empty. (It only has a title, in the subject of this mail).

My interpretation of your question: is the published metadata web-friendly? For me, with my W3C/OWP goggles on, this means whether it is easy to use and combine metadata around a publication (or a family of publications). With my former Semantic Web hat on this time, this is very much related to the essence of RDF: setting aside the arcane syntax of RDF/XML and the various choices that have been made in its design, the real advantage of RDF is the ability to combine (meta)data coming from different sources. And the core of this is: use URI-s as unique identifiers wherever it makes sense and is useful.
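
A minimal Python sketch of that idea, with plain tuples standing in for RDF triples (a real application would more likely use an RDF library). All URIs under example.org, the two source names and the values are invented for illustration.

```python
# Plain tuples stand in for RDF triples: (subject, predicate, object).
# The book URI and all values are invented for illustration.
BOOK = "http://example.org/book/123"

publisher_feed = {
    (BOOK, "http://schema.org/name", "An Example Title"),
    (BOOK, "http://schema.org/datePublished", "2014-09-15"),
}
library_catalog = {
    (BOOK, "http://schema.org/name", "An Example Title"),
    (BOOK, "http://schema.org/keywords", "whaling"),
}


def merge(*graphs):
    """Combine graphs by set union; shared URIs make the statements line up."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def describe(graph, subject):
    """Gather everything known about one subject URI, grouped by predicate."""
    description = {}
    for s, p, o in graph:
        if s == subject:
            description.setdefault(p, set()).add(o)
    return description


combined = merge(publisher_feed, library_catalog)
about_book = describe(combined, BOOK)
print(len(combined), sorted(about_book))
```

Because both sources use the same URI for the book, no matching logic is needed: the duplicate title statement collapses, and the keyword from one source sits alongside the publication date from the other.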

So... is the usage of URI-s around publishing metadata a solved issue? I have the *impression* the answer is no (but Laura D. may shoot me.) If not, is there anything W3C can do around this? Honestly, I do not think so; it may be just as complex a task as defining a unified vocabulary to rule them all... Is there a way to at least help? Years ago a document was produced in the semantic web domain called 'Cool URI-s for the Semantic Web'[2]; would it be of any help if we tried to do something similar?

But I may completely misunderstand the issue.

Ivan

P.S. That being said, I would think that this whole issue SHOULD be listed in the metadata document we produce, spelling it out clearly.

[1] https://www.w3.org/dpub/IG/track/issues/1
[2] http://www.w3.org/TR/cooluris/

----
Ivan Herman, W3C
Digital Publishing Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
GPG: 0x343F1A3D
WebID: http://www.ivan-herman.net/foaf#me




Received on Monday, 15 September 2014 20:38:26 UTC