Re: [METADATA] Webbiness of publishing metadata (ISSUE-1) from Graham Bell on 2014-09-15 (public-digipub-ig@w3.org from September 2014)

From: Graham Bell <graham@editeur.org>
Date: Mon, 15 Sep 2014 10:17:49 +0100
To: Ivan Herman <ivan@w3.org>, Tzviya Siegman <tsiegman@wiley.com>
CC: W3C Digital Publishing IG <public-digipub-ig@w3.org>, Bill Kasdorf <bkasdorf@apexcovantage.com>, Madi Solomon <madi.solomon@pearson.com>
Message-ID: <20EBE032-672B-428D-9042-756364336EB3@editeur.org>

I think it would be fair to say that the use of linked data and URIs as identifiers is "definitely not a 'solved issue' among publishers" -- and to a large extent is not an issue that most publishers are even aware of. While the book industry provides a fair amount of useful metadata, this metadata is not aimed at making the web more useful, but at making the supply chain for commercial books and e-books more useful.

I go back to the three cases I listed in a comment on the DPIG wiki (see the Phase 1 Strategy section).

i. metadata delivered in bulk, separate from the content or resource itself (eg as part of the commercial supply chain)
ii. metadata delivered embedded within the content or resource it describes (eg within an EPUB, within a web page)
iii. metadata delivered embedded within web pages describing the content or resource (eg in an online store, repository or catalog), possibly separate from the metadata displayed (for humans) on those pages

(Actually there is a fourth case: metadata delivered on demand, separate from the content or resource, eg as part of a web service.)

Publishers have tackled case i. via ONIX, but not cases ii. or iii. Case ii. is properly the domain of the content standards groups such as W3C DPIG and IDPF. Case iii. may also be something where W3C DPIG and schema.org have roles. But...

Given the reluctance of book publishers and retailers to invest more in metadata (viz lack of uptake of a work identifier like ISTC, lack of interest in a release identifier analogous to GRID, slow migration to ONIX 3.0 in countries where 2.1 was most firmly embedded…), it seems to me to be critical that we don't further burden the industry with 'yet another data format to ignore'. As Phil implies in his point 5, the important thing is to have good metadata, and it doesn't much matter how it is expressed – so long as it can be transformed from one expression to another easily and without loss of meaning. I suspect the best way around this is to retain as much of the semantics of ONIX as possible, while thinking about a syntax that would allow that metadata to be embedded in e-publications and online content. This would avoid publishers having to manage two or three parallel and distinct sets of metadata. Separating ONIX semantics ('what do we mean by pub date, by imprint, by title?') from the XML message (which is 'merely' a convenient syntax used for transmitting the data along a data supply chain in bulk), and allowing ONIX-style data to be expressed in other syntaxes or data formats, seems (to me) to be the way to go.

I think there is something significant to do, but let's not be reinventing the wheel.

Graham Bell
EDItEUR

Tel: +44 20 7503 6418
Mob: +44 7887 754958

EDItEUR Limited is a company limited by guarantee, registered in England no 2994705. Registered Office: United House, North Road, London N7 9DP, UK. Website: http://www.editeur.org

On 15 Sep 2014, at 10:59, Ivan Herman wrote:

Hi Tzviya

I will try to clarify the issues you raised...

the description of ISSUE-1[1] is currently empty. (It only has a title, in the subject of this mail).

My interpretation of your question: is the published metadata web-friendly? For me, with my W3C/OWP goggles on, this means whether it is easy to use and combine metadata around a publication (or a family of publications). With my former Semantic Web hat on this time, this is very much related to the essence of RDF: forgetting about the arcane syntax of RDF/XML and the various choices that have been made in its design, the real advantage of RDF is the ability to combine (meta)data coming from different sources. And the core of this is: use URI-s as unique identifiers wherever it makes sense and is useful.

So... is the usage of URI-s around publishing metadata a solved issue? I have the *impression* the answer is no (but Laura D. may shoot me). If not, is there anything W3C can do around this? Honestly, I do not think so; it may be just as complex a task as defining a unified vocabulary to rule them all... Is there a way to at least help? Years ago a document was produced in the semantic web domain called 'Cool URI-s for the Semantic Web'[2]; would it be of any help if we tried to do something similar?

But I may completely misunderstand the issue.

Ivan

P.S. That being said, I would think that this whole issue SHOULD be listed in the metadata document we produce, spelling it out clearly.

[1] https://www.w3.org/dpub/IG/track/issues/1
[2] http://www.w3.org/TR/cooluris/

----
Ivan Herman, W3C
Digital Publishing Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
GPG: 0x343F1A3D
WebID: http://www.ivan-herman.net/foaf#me

Received on Tuesday, 16 September 2014 09:08:41 UTC