Re: [METADATA] Webbiness of publishing metadata (ISSUE-1) from Tom De Nies on 2014-09-15 (public-digipub-ig@w3.org from September 2014)

From: Tom De Nies <tom.denies@ugent.be>
Date: Mon, 15 Sep 2014 22:25:02 +0200
To: "Madans, Phil" <Phil.Madans@hbgusa.com>
Cc: Bill Kasdorf <bkasdorf@apexcovantage.com>, Graham Bell <graham@editeur.org>, Ivan Herman <ivan@w3.org>, Tzviya Siegman <tsiegman@wiley.com>, W3C Digital Publishing IG <public-digipub-ig@w3.org>, Madi Solomon <madi.solomon@pearson.com>
Message-ID: <CA+=hbbfh=7S4UsJzo9Y+6-ToEGhheWzzGPd_uyz_a_yO68+w8A@mail.gmail.com>
You make a valid point, Phil, but the alternative (not embedding the
metadata) is not ideal either.
Without being able to directly refer to certain parts of the content of an
epub, the possibilities to add descriptive metadata decrease significantly.

Ideally, you would be able to identify each part/fragment of an epub
individually (e.g., with a fragment URI), so you can describe it with its
metadata somewhere else.
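
As a rough sketch of that idea (not anything proposed in the thread itself): the snippet below keys descriptive metadata by fragment URI and keeps it outside the publication, so a record can change without regenerating the epub. The base URI, the fragment ids, the field names and the helpers `describe` and `update` are all hypothetical, chosen only for illustration.

```python
from urllib.parse import urldefrag

# Hypothetical URI for a publication; an assumption, not from the thread.
BOOK = "http://example.org/books/my-book.epub"

# Descriptive metadata stored outside the epub, keyed by fragment URI.
# Fragment ids and field names are made up for this sketch.
metadata = {
    BOOK + "#chapter-3": {
        "keywords": ["metadata", "ONIX"],
        "description": "Chapter on supply-chain metadata",
    },
    BOOK + "#figure-2": {
        "description": "Diagram of the metadata delivery cases",
    },
}


def describe(fragment_uri):
    """Return (publication URI, fragment id, metadata record)."""
    base, fragment = urldefrag(fragment_uri)
    return base, fragment, metadata.get(fragment_uri, {})


def update(fragment_uri, **fields):
    """Change the external record only; the epub file is not touched."""
    metadata.setdefault(fragment_uri, {}).update(fields)
```

Calling `update(BOOK + "#chapter-3", keywords=["discovery"])` changes only the external record, which is the property Phil's concern about redistributing epub files turns on.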

Tom


2014-09-15 20:03 GMT+02:00 Madans, Phil <Phil.Madans@hbgusa.com>:

>  I get very nervous when I hear talk about  including metadata in the
> epub file, like embedding ONIX or some other standard.  The issue is that
> metadata changes. If you are embedding metadata in the epubs then you get
> into the position of having to generate and distribute new epub files every
> time that metadata changes. I don’t know how many publishers would be eager
> to do that.  We wouldn’t. And I don’t think our vendors would be too keen
> on that either.
>
>
>
> Once an ebook publishes, a lot of the metadata probably isn’t going to
> change: Title, Author, Imprint, etc. But the descriptive metadata that we
> are looking for to aid in discovery is far less static: Keywords, subject
> categories, descriptions, awards, quotes, author bios and, of course,
> price.  These elements can change often.
>
>
>
> Embedding metadata in the epub file, to me, is trying to do for the epub
> what the book jacket does for the physical product. The book jacket is
> about marketing, discoverability.  It has all of those elements, like
> author bio and quotes and subjects categories, etc. And it is wrapped right
> around the content—and is also embedded in the content in the form of ad
> pages. The problem is the only time we can update with new metadata is when
> we reprint the book and/or the jacket, unless we want to sticker existing
> stock. In the same way, I don’t think embedding metadata in the epub is
> going to be a dynamic or flexible enough solution for getting the most bang
> out of the metadata. Unless there is a constant regeneration of the epub,
> which, again, I think will turn into a supply chain issue.
>
>
>
> That’s my opinion.
>
>
>
> Phil
>
>
>
> ------------------------------------------------------------
>
> Phil Madans | Executive Director of Digital Publishing Technology
> | Hachette Book Group | 237 Park Avenue NY 10017 |212-364-1415 |
> phil.madans@hbgusa.com
>
>
>
> *From:* Bill Kasdorf [mailto:bkasdorf@apexcovantage.com]
> *Sent:* Monday, September 15, 2014 10:01 AM
> *To:* Graham Bell; Ivan Herman; Tzviya Siegman
> *Cc:* W3C Digital Publishing IG; Madi Solomon
> *Subject:* RE: [METADATA] Webbiness of publishing metadata (ISSUE-1)
>
>
>
> +1
>
>
>
> This is exactly what I was going to say but Graham beat me to the punch.
> ;-)
>
>
>
> Especially his comment that "it is not an issue that most publishers are
> even aware of."
>
>
>
> I want to especially emphasize the point that I think the Web should
> *enable* the expression and conveyance of metadata, not specify what
> that metadata *is*.
>
>
>
> Both schema.org and URIs are useful cases in point.
>
>
>
> Schema.org provides a useful way to embed metadata in content, but I would
> say it is somewhat halfway on the "enable don't specify" path. It does
> specify properties (which is actually very helpful) but in many or most
> cases the actual vocabularies used to characterize those properties are not
> specified. While specifying down to that level of detail is of course very
> useful for interoperability, it tends to be too limiting, too restrictive,
> not expressive enough (have I been redundant enough?) for most specific
> communities of users. Thus the educational folks got a few of the things
> they need, the accessibility folks got a few of the things they need,
> etc. Both got *subsets* of the vocabularies they really consider
> important within their domains. So I think on balance it is very useful to
> let those properties be described by whatever vocabularies are useful to a
> certain community of users.
>
>
>
> My example for URI is the DOI. ;-) It is not a choice *between* using
> DOI or URI: the recommended practice is to *express* a DOI in the
> *form* of a URI. While that was not the common practice at first, it has
> been recommended for the past year or two and is increasingly being done.
> Many identifiers can be expressed in the form of a URI, which I think is a
> Very Good Thing. URI doesn't attempt to *replace* those identifiers, it
> makes them work better.
>
>
>
> --Bill K
>
>
>
> *From:* Graham Bell [mailto:graham@editeur.org]
> *Sent:* Monday, September 15, 2014 5:18 AM
> *To:* Ivan Herman; Tzviya Siegman
> *Cc:* W3C Digital Publishing IG; Bill Kasdorf; Madi Solomon
> *Subject:* Re: [METADATA] Webbiness of publishing metadata (ISSUE-1)
>
>
>
> I think it would be fair to say that the use of linked data and URIs as
> identifiers is "*definitely not a 'solved issue' among publishers*" --
> and to a large extent is not an issue that most publishers are even aware
> of. While the book industry provides a fair amount of useful metadata, this
> metadata is not aimed at making the web more useful, but at making the
> supply chain for commercial books and e-books more useful.
>
>
>
> I go back to the three cases I listed in a comment on the DPIG wiki (see
> the Phase 1 Strategy section).
>
>
>
>   i. metadata delivered in bulk, separate from the content or resource itself (*eg* as part of the commercial supply chain)
>
>   ii. metadata delivered embedded within the content or resource it describes (*eg* within an EPUB, within a web page)
>
>   iii. metadata delivered embedded within web pages *describing* the content or resource (*eg* in an online store, repository or catalog), possibly separate from the metadata *displayed* (for humans) on those pages
>
>   (actually there is a fourth case, which is metadata delivered on
> demand, separate from the content or resource (eg as part of a web service)).
>
>
>
> Publishers have tackled case i. via ONIX, but not case ii. or iii. Case ii.
> is properly the domain of the content standards groups such as W3C DPIG and
> IDPF. Case iii. may also be something where W3C DPIG and schema.org have
> roles. But...
>
>
>
> Given the reluctance of book publishers and retailers to invest more in
> metadata (*viz* lack of uptake of a work identifier like ISTC, lack of
> interest in a release identifier analogous to GRID, slow migration to ONIX
> 3.0 in countries where 2.1 was most firmly embedded…), it seems to me to be
> critical that we don't further burden the industry with 'yet another data
> format to ignore'. As Phil implies in his point 5, the important thing is
> to have *good metadata*, and it doesn't much matter how it is expressed –
> so long as it can be transformed from one expression to another easily and
> without loss of meaning. I suspect the best way around this is to retain as
> much of the semantics of ONIX, while thinking about a syntax that would
> allow that metadata to be embedded in e-publications and online content.
> This would avoid publishers having to manage two or three parallel and
> distinct sets of metadata. Separating ONIX semantics ('what do we mean by
> pub date, by imprint, by title?') from the XML message (which is 'merely' a
> convenient syntax used for transmitting the data along a data supply chain
> in bulk), and allowing ONIX-style data to be expressed in other syntaxes or
> data formats seems (to me) to be the way to go.
>
>
>
> I think there is something significant to do, but let's not be reinventing
> the wheel.
>
>
>
> Graham Bell
>
> EDItEUR
>
>
>
> Tel: +44 20 7503 6418
>
> Mob: +44 7887 754958
>
>
>
> EDItEUR Limited is a company limited by guarantee, registered in England
> no 2994705. Registered Office: United House, North Road, London N7 9DP,
> UK. Website: http://www.editeur.org
>
>
>
>
>
>
>
>
>
> On 15 Sep 2014, at 10:59, Ivan Herman wrote:
>
>
>
> Hi Tzviya
>
> I will try to clarify the issues you raised...
>
> the description of ISSUE-1[1] is currently empty. (It only has a title, in
> the subject of this mail).
>
> My interpretation of your question: is the published metadata
> web-friendly? For me, with my W3C/OWP goggles on, this means whether it is
> easy to use and combine metadata around a (or a family of) publication.
> With my former Semantic Web goggles on this time, this is very much
> related to the essence of RDF: forgetting about the arcane syntax of
> RDF/XML, the various choices that have been made in its design, the real
> advantage of RDF is the ability to combine (meta)data coming from different
> sources. And the core of this is: use URI-s as unique identifiers wherever
> it makes sense and is useful.
>
> So... is the usage of URI-s around publishing metadata a solved issue? I
> have the *impression* the answer is no (but Laura D. may shoot me.) If not,
> is there anything W3C can do around this? Honestly, I do not think so, it
> may be just as complex a task as defining a unified vocabulary to rule them
> all... Is there a way to at least help? Years ago a document was produced
> in the semantic web domain called 'Cool URI-s for the Semantic Web'[2];
> would it be of any help if we tried to do something similar?
>
> But I may completely misunderstand the issue.
>
> Ivan
>
> P.S. That being said, I would think that this whole issue SHOULD be listed
> in the metadata document we produce, spelling it out clearly.
>
> [1] https://www.w3.org/dpub/IG/track/issues/1
> [2] http://www.w3.org/TR/cooluris/
>
> ----
> Ivan Herman, W3C
> Digital Publishing Activity Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> GPG: 0x343F1A3D
> WebID: http://www.ivan-herman.net/foaf#me
>
>
>
>
>
Received on Monday, 15 September 2014 20:25:35 UTC