Re: [METADATA] Webbiness of publishing metadata (ISSUE-1) from Robert Sanderson on 2014-09-15 (public-digipub-ig@w3.org from September 2014)

From: Robert Sanderson <azaroth42@gmail.com>
Date: Mon, 15 Sep 2014 13:29:29 -0700
To: Tom De Nies <tom.denies@ugent.be>
Cc: W3C Digital Publishing IG <public-digipub-ig@w3.org>
Message-ID: <CABevsUEn_pK3OSSxMmVEZoeftNhKN8xi3LW=u243yA+h1BMOBw@mail.gmail.com>

Like with CFI? And Open Annotation? :)

http://www.idpf.org/epub/oa/

EPUBs are relatively straightforward in comparison to other content types,
in terms of referencing arbitrary components.
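
To make the CFI point concrete, here is a minimal Python sketch of how a
fragment URI for a location inside an EPUB could be assembled. It is an
illustration only, not a conforming implementation: the helper name
cfi_fragment and the file name book.epub are made up for this example, the
step values follow the style of example used in the EPUB CFI specification,
and the sketch ignores the spec's escaping rules for special characters in
ID assertions, as well as ranges and other location types.

```python
def cfi_fragment(package_steps, content_steps, char_offset=None):
    """Assemble an epubcfi(...) fragment string.

    Each step is a (child_index, id_or_None) pair. package_steps walks the
    package document down to a spine item reference; content_steps walks the
    referenced content document. The "!" marks the jump between the two.
    """
    def path(steps):
        # "/N" for a plain step, "/N[id]" when an ID assertion is supplied.
        return "".join(
            f"/{index}[{ident}]" if ident else f"/{index}"
            for index, ident in steps
        )

    cfi = path(package_steps) + "!" + path(content_steps)
    if char_offset is not None:
        # ":N" is a character offset into the addressed text node.
        cfi += f":{char_offset}"
    return f"epubcfi({cfi})"


# Hypothetical publication and location, for illustration only.
uri = "book.epub#" + cfi_fragment(
    [(6, None), (4, "chap01ref")],
    [(4, "body01"), (10, "para05"), (3, None)],
    char_offset=10,
)
print(uri)
```

A URI like this can serve as the target of an Open Annotation, or as the
subject of a metadata record that lives outside the EPUB file.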


On Mon, Sep 15, 2014 at 1:25 PM, Tom De Nies <tom.denies@ugent.be> wrote:

> You make a valid point, Phil, but the alternative (not embedding the
> metadata) is not ideal either.
> Without being able to directly refer to certain parts of the content of an
> epub, the possibilities to add descriptive metadata decrease significantly.
>
> Ideally, you would be able to identify each part/fragment of an epub
> individually (e.g., with a fragment URI), so you can describe it with its
> metadata somewhere else.
>
> Tom
>
>
> 2014-09-15 20:03 GMT+02:00 Madans, Phil <Phil.Madans@hbgusa.com>:
>
>> I get very nervous when I hear talk about including metadata in the
>> epub file, like embedding ONIX or some other standard.  The issue is that
>> metadata changes. If you are embedding metadata in the epubs then you get
>> into the position of having to generate and distribute new epub files every
>> time that metadata changes. I don’t know how many publishers would be eager
>> to do that.  We wouldn’t. And I don’t think our vendors would be too keen
>> on that either.
>>
>>
>>
>> Once an ebook publishes, a lot of the metadata probably isn’t going to
>> change: Title, Author, Imprint, etc. But the descriptive metadata that we
>> are looking for to aid in discovery is far less static: Keywords, subject
>> categories, descriptions, awards, quotes, author bios and, of course,
>> price.  These elements can change often.
>>
>>
>>
>> Embedding metadata in the epub file, to me, is trying to do for the epub
>> what the book jacket does for the physical product. The book jacket is
>> about marketing, discoverability.  It has all of those elements, like
>> author bio and quotes and subject categories, etc. And it is wrapped right
>> around the content—and is also embedded in the content in the form of ad
>> pages. The problem is the only time we can update with new metadata is when
>> we reprint the book and/or the jacket, unless we want to sticker existing
>> stock. In the same way, I don’t think embedding metadata in the epub is
>> going to be a dynamic or flexible enough solution for getting the most bang
>> out of the metadata. Unless there is a constant regeneration of the epub,
>> which, again, I think will turn into a supply chain issue.
>>
>>
>>
>> That’s my opinion.
>>
>>
>>
>> Phil
>>
>>
>>
>> ------------------------------------------------------------
>>
>> Phil Madans | Executive Director of Digital Publishing Technology
>> | Hachette Book Group | 237 Park Avenue NY 10017 |212-364-1415 |
>> phil.madans@hbgusa.com <david.young@hbgusa.com>
>>
>>
>>
>> *From:* Bill Kasdorf [mailto:bkasdorf@apexcovantage.com]
>> *Sent:* Monday, September 15, 2014 10:01 AM
>> *To:* Graham Bell; Ivan Herman; Tzviya Siegman
>> *Cc:* W3C Digital Publishing IG; Madi Solomon
>> *Subject:* RE: [METADATA] Webbiness of publishing metadata (ISSUE-1)
>>
>>
>>
>> +1
>>
>>
>>
>> This is exactly what I was going to say but Graham beat me to the punch.
>> ;-)
>>
>>
>>
>> Especially his comment that "it is not an issue that most publishers are
>> even aware of."
>>
>>
>>
>> I want to especially emphasize the point that I think the Web should
>> _*enable*_ the expression and conveyance of metadata, not specify what
>> that metadata _*is*_.
>>
>>
>>
>> Both schema.org and URIs are useful cases in point.
>>
>>
>>
>> Schema.org provides a useful way to embed metadata in content, but I
>> would say it is somewhat halfway on the "enable don't specify" path. It
>> does specify properties (which is actually very helpful) but in many or
>> most cases the actual vocabularies used to characterize those properties are
>> not specified. While specifying down to that level of detail is of course
>> very useful for interoperability, it tends to be too limiting, too
>> restrictive, not expressive enough (have I been redundant enough?) for most
>> specific communities of users. Thus the educational folks got a few of the
>> things they need, the accessibility folks got a few of the things they
>> need, etc.—both got _*subsets*_ of the vocabularies they really consider
>> important within their domains. So I think on balance it is very useful to
>> let those properties be described by whatever vocabularies are useful to a
>> certain community of users.
>>
>>
>>
>> My example for URI is the DOI. ;-) It is not a choice _*between*_ using
>> DOI or URI: the recommended practice is to _*express*_ a DOI in the
>> _*form*_ of a URI. While that was not the common practice at first, it
>> has been recommended for the past year or two and is increasingly being
>> done. Many identifiers can be expressed in the form of a URI, which I think
>> is a Very Good Thing. URI doesn't attempt to _*replace*_ those
>> identifiers, it makes them work better.
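
As a rough illustration of expressing a DOI in the form of a URI, the
Python sketch below prepends a resolver prefix to a bare DOI. The function
name doi_to_uri is hypothetical, and the default resolver prefix
(https://doi.org/) is an assumption of this sketch; it is passed as a
parameter so that another prefix, such as http://dx.doi.org/, can be
substituted. The validity check is deliberately crude.

```python
from urllib.parse import quote


def doi_to_uri(doi, resolver="https://doi.org/"):
    """Return a URI form of a DOI by prepending a resolver prefix."""
    doi = doi.strip()
    # Accept the common "doi:" label in either case and drop it.
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    # Crude sanity check: DOIs start with the directory indicator "10.".
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode characters that are unsafe in a URI path, keeping "/".
    return resolver + quote(doi, safe="/")


print(doi_to_uri("doi:10.1000/182"))
```

The identifier itself is unchanged; only its expression differs, which is
why the URI form complements the DOI rather than replacing it.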
>>
>>
>>
>> --Bill K
>>
>>
>>
>> *From:* Graham Bell [mailto:graham@editeur.org]
>> *Sent:* Monday, September 15, 2014 5:18 AM
>> *To:* Ivan Herman; Tzviya Siegman
>> *Cc:* W3C Digital Publishing IG; Bill Kasdorf; Madi Solomon
>> *Subject:* Re: [METADATA] Webbiness of publishing metadata (ISSUE-1)
>>
>>
>>
>> I think it would be fair to say that the use of linked data and URIs as
>> identifiers is "*definitely not a 'solved issue' among publishers*" --
>> and to a large extent is not an issue that most publishers are even aware
>> of. While the book industry provides a fair amount of useful metadata, this
>> metadata is not aimed at making the web more useful, but at making the
>> supply chain for commercial books and e-books more useful.
>>
>>
>>
>> I go back to the three cases I listed in a comment on the DPIG wiki (see
>> the Phase 1 Strategy section).
>>
>>
>>
>>   i. metadata delivered in bulk, separate from the content or resource itself (*eg* as part of the commercial supply chain)
>>
>>   ii. metadata delivered embedded within the content or resource it describes (*eg* within an EPUB, within a web page)
>>
>>   iii. metadata delivered embedded within web pages *describing* the content or resource (*eg* in an online store, repository or catalog), possibly separate from the metadata *displayed* (for humans) on those pages
>>
>>   (actually there is a fourth case, which is metadata delivered on
>> demand, separate from the content or resource, eg as part of a web service).
>>
>>
>>
>> Publishers have tackled case i. via ONIX, but not case ii. or iii. Case
>> ii. is properly the domain of the content standards groups such as W3C DPIG
>> and IDPF. Case iii. may also be something where W3C DPIG and schema.org
>> have roles. But...
>>
>>
>>
>> Given the reluctance of book publishers and retailers to invest more in
>> metadata (*viz* lack of uptake of a work identifier like ISTC, lack of
>> interest in a release identifier analogous to GRID, slow migration to ONIX
>> 3.0 in countries where 2.1 was most firmly embedded…), it seems to me to be
>> critical that we don't further burden the industry with 'yet another data
>> format to ignore'. As Phil implies in his point 5, the important thing is
>> to have *good metadata*, and it doesn't much matter how it is expressed
>> – so long as it can be transformed from one expression to another easily
>> and without loss of meaning. I suspect the best way around this is to
>> retain as much of the semantics of ONIX, while thinking about a syntax that
>> would allow that metadata to be embedded in e-publications and online
>> content. This would avoid publishers having to manage two or three parallel
>> and distinct sets of metadata. Separating ONIX semantics ('what do we mean
>> by pub date, by imprint, by title?') from the XML message (which is
>> 'merely' a convenient syntax used for transmitting the data along a data
>> supply chain in bulk), and allowing ONIX-style data to be expressed in
>> other syntaxes or data formats seems (to me) to be the way to go.
>>
>>
>>
>> I think there is something significant to do, but let's not be
>> reinventing the wheel.
>>
>>
>>
>> Graham Bell
>>
>> EDItEUR
>>
>>
>>
>> Tel: +44 20 7503 6418
>>
>> Mob: +44 7887 754958
>>
>>
>>
>> EDItEUR Limited is a company limited by guarantee, registered in England
>> no 2994705. Registered Office: United House, North Road, London N7 9DP,
>> UK. Website: http://www.editeur.org
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> On 15 Sep 2014, at 10:59, Ivan Herman wrote:
>>
>>
>>
>> Hi Tzviya
>>
>> I'll try to clarify the issues you raised...
>>
>> The description of ISSUE-1[1] is currently empty. (It only has a title,
>> in the subject of this mail).
>>
>> My interpretation of your question: is the published metadata
>> web-friendly? For me, with my W3C/OWP goggles on, this means whether it is
>> easy to use and combine metadata around a publication (or a family of
>> publications). With my former Semantic Web hat on this time, this is very much
>> related to the essence of RDF: forgetting about the arcane syntax of
>> RDF/XML, the various choices that have been made in its design, the real
>> advantage of RDF is the ability to combine (meta)data coming from different
>> sources. And the core of this is: use URI-s as unique identifiers wherever
>> it makes sense and is useful.
>>
>> So... is the usage of URI-s around publishing metadata a solved issue? I
>> have the *impression* the answer is no (but Laura D. may shoot me.) If not,
>> is there anything W3C can do around this? Honestly, I do not think so, it
>> may be just as complex a task as defining a unified vocabulary to rule them
>> all... Is there a way to at least help? Years ago a document was produced
>> in the semantic web domain called 'Cool URI-s for the Semantic Web'[2];
>> would it be of any help if we tried to do something similar?
>>
>> But I may completely misunderstand the issue.
>>
>> Ivan
>>
>> P.S. That being said, I would think that this whole issue SHOULD be
>> listed in the metadata document we produce, spelling it out clearly.
>>
>> [1] https://www.w3.org/dpub/IG/track/issues/1
>> [2] http://www.w3.org/TR/cooluris/
>>
>> ----
>> Ivan Herman, W3C
>> Digital Publishing Activity Lead
>> Home: http://www.w3.org/People/Ivan/
>> mobile: +31-641044153
>> GPG: 0x343F1A3D
>> WebID: http://www.ivan-herman.net/foaf#me
>>
>>
>>
>>
>>
>
>


-- 
Rob Sanderson
Technology Collaboration Facilitator
Digital Library Systems and Services
Stanford, CA 94305
Received on Monday, 15 September 2014 20:29:59 UTC