
Re: [METADATA] Webbiness of publishing metadata (ISSUE-1)

From: LAURA DAWSON <ljndawson@gmail.com>
Date: Tue, 16 Sep 2014 12:33:27 -0400
To: Robert Sanderson <azaroth42@gmail.com>, Ivan Herman <ivan@w3.org>
CC: Tom De Nies <tom.denies@ugent.be>, W3C Digital Publishing IG <public-digipub-ig@w3.org>
Message-ID: <D03DDDFD.7F131%ljndawson@gmail.com>
And there we have the fundamental problem of the book as we have known it
for 400 years, colliding with the web. Because ultimately what's to
distinguish an ebook from a website once the silos come down? What is the
canonical reference for any given book?

From:  Robert Sanderson <azaroth42@gmail.com>
Date:  Tuesday, September 16, 2014 at 12:27 PM
To:  Ivan Herman <ivan@w3.org>
Cc:  Tom De Nies <tom.denies@ugent.be>, W3C Digital Publishing IG
<public-digipub-ig@w3.org>
Subject:  Re: [METADATA] Webbiness of publishing metadata (ISSUE-1)
Resent-From:  <public-digipub-ig@w3.org>
Resent-Date:  Tue, 16 Sep 2014 16:28:02 +0000


What do you mean by "book" (and here lies the problem, as we know ...)

* The URI from which you get your personal copy of the EPUB version of that
version of the book
* The URI that identifies that particular EPUB version of the book,
regardless of ownership
* The URI that identifies digital copies of that version of the book,
regardless of format
* The URI that identifies digital copies of the book, regardless of version
* The URI that identifies that particular book, regardless of
digital/physical
* The URI that identifies that edition of the book, regardless of exact text
* The URI that identifies that creative work, regardless of medium?

(etc, I'm sure there's more)

And the answer is purely socio-political.  Publishers could easily mint
their own URIs for all of these things. They could get together and create a
shortened URI service to manage them centrally, like DOI.  The only
requirement from an information perspective is clarity on what the URI
actually identifies, such that it can be used appropriately.
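
The list above amounts to a hierarchy of identification levels. As a minimal sketch of the "clarity on what the URI actually identifies" requirement, the following mints one URI per level and attaches a plain statement of what it denotes. The `books.example.org` namespace, the path layout, and all names are hypothetical and only illustrate the idea; no publisher scheme is implied.

```python
from dataclasses import dataclass
from enum import Enum

BASE = "https://books.example.org"  # hypothetical publisher namespace


class Level(Enum):
    """Identification levels, most specific first (after the list above)."""
    PERSONAL_COPY = "copy"
    EPUB_VERSION = "epub-version"
    DIGITAL_VERSION = "digital-version"
    DIGITAL_BOOK = "digital"
    BOOK = "book"
    EDITION = "edition"
    WORK = "work"


# Human-readable statement of what a URI at each level identifies.
DESCRIPTIONS = {
    Level.PERSONAL_COPY: "a personal copy of an EPUB version",
    Level.EPUB_VERSION: "an EPUB version, regardless of ownership",
    Level.DIGITAL_VERSION: "digital copies of a version, regardless of format",
    Level.DIGITAL_BOOK: "digital copies of the book, regardless of version",
    Level.BOOK: "the book, regardless of digital/physical",
    Level.EDITION: "an edition, regardless of exact text",
    Level.WORK: "the creative work, regardless of medium",
}


@dataclass(frozen=True)
class Identifier:
    level: Level
    local_id: str  # hypothetical local identifier assigned by the minter

    @property
    def uri(self) -> str:
        # The level is part of the path, so the URI itself signals its scope.
        return f"{BASE}/{self.level.value}/{self.local_id}"

    def describe(self) -> str:
        return f"<{self.uri}> identifies {DESCRIPTIONS[self.level]}"


if __name__ == "__main__":
    for level in Level:
        print(Identifier(level, "w123").describe())
```

Whether the level belongs in the URI path or only in the accompanying description is a design choice; the sketch puts it in both places to make the scope hard to misread.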

Rob


On Tue, Sep 16, 2014 at 2:50 AM, Ivan Herman <ivan@w3.org> wrote:
> 
> On 15 Sep 2014, at 22:29 , Robert Sanderson <azaroth42@gmail.com> wrote:
> 
>> >
>> > Like with CFI? And Open Annotation? :)
>> >
>> > http://www.idpf.org/epub/oa/
>> >
>> > EPUBs are relatively straightforward in comparison to other content types,
>> in terms of referencing arbitrary components.
> 
> CFI is for the fragment. Which is fine. But what is the URI of the book? We
> are getting back to the fundamental problem of identification...
> 
> Ivan
> 
> 
>> >
>> >
>> > On Mon, Sep 15, 2014 at 1:25 PM, Tom De Nies <tom.denies@ugent.be> wrote:
>> > You make a valid point, Phil, but the alternative (not embedding the
>> metadata) is not ideal either.
>> > Without being able to directly refer to certain parts of the content of an
>> epub, the possibilities to add descriptive metadata decrease significantly.
>> >
>> > Ideally, you would be able to identify each part/fragment of an epub
>> individually (e.g., with a fragment URI), so you can describe it with its
>> metadata somewhere else.
>> >
>> > Tom
>> >
>> >
>> > 2014-09-15 20:03 GMT+02:00 Madans, Phil <Phil.Madans@hbgusa.com>:
>> > I get very nervous when I hear talk about  including metadata in the epub
>> file, like embedding ONIX or some other standard.  The issue is that metadata
>> changes. If you are embedding metadata in the epubs then you get into the
>> position of having to generate and distribute new epub files every time that
>> metadata changes. I don't know how many publishers would be eager to do that.
>> We wouldn't. And I don't think our vendors would be too keen on that either.
>> >
>> >
>> >
>> > Once an ebook publishes, a lot of the metadata probably isn't going to
>> change: Title, Author, Imprint, etc. But the descriptive metadata that we are
>> looking for to aid in discovery is far less static: Keywords, subject
>> categories, descriptions, awards, quotes, author bios and, of course, price.
>> These elements can change often.
>> >
>> >
>> >
>> > Embedding metadata in the epub file, to me, is trying to do for the epub
>> what the book jacket does for the physical product. The book jacket is about
>> marketing, discoverability.  It has all of those elements, like author bio
>> and quotes and subject categories, etc. And it is wrapped right around the
>> content -- and is also embedded in the content in the form of ad pages. The
>> problem is the only time we can update with new metadata is when we reprint
>> the book and/or the jacket, unless we want to sticker existing stock. In the
>> same way, I don't think embedding metadata in the epub is going to be a
>> dynamic or flexible enough solution for getting the most bang out of the
>> metadata. Unless there is a constant regeneration of the epub, which, again,
>> I think will turn into a supply chain issue.
>> >
>> >
>> >
>> > That's my opinion.
>> >
>> >
>> >
>> > Phil
>> >
>> >
>> >
>> > ------------------------------------------------------------
>> >
>> > Phil Madans | Executive Director of Digital Publishing Technology |
>> Hachette Book Group | 237 Park Avenue NY 10017 | 212-364-1415 |
>> phil.madans@hbgusa.com
>> >
>> >
>> >
>> > From: Bill Kasdorf [mailto:bkasdorf@apexcovantage.com]
>> > Sent: Monday, September 15, 2014 10:01 AM
>> > To: Graham Bell; Ivan Herman; Tzviya Siegman
>> > Cc: W3C Digital Publishing IG; Madi Solomon
>> > Subject: RE: [METADATA] Webbiness of publishing metadata (ISSUE-1)
>> >
>> >
>> >
>> > +1
>> >
>> >
>> >
>> > This is exactly what I was going to say but Graham beat me to the punch.
>> ;-)
>> >
>> >
>> >
>> > Especially his comment that "it is not an issue that most publishers are
>> even aware of."
>> >
>> >
>> >
>> > I want to especially emphasize the point that I think the Web should
>> _enable_ the expression and conveyance of metadata, not specify what that
>> metadata _is_.
>> >
>> >
>> >
>> > Both schema.org and URIs are useful cases in point.
>> >
>> >
>> >
>> > Schema.org provides a useful way to embed metadata in content, but I would
>> say it is somewhat halfway on the "enable don't specify" path. It does
>> specify properties (which is actually very helpful) but in many or most cases
>> the actual vocabularies used to characterize those properties are not
>> specified. While specifying down to that level of detail is of course very
>> useful for interoperability, it tends to be too limiting, too restrictive,
>> not expressive enough (have I been redundant enough?) for most specific
>> communities of users. Thus the educational folks got a few of the things they
>> need, the accessibility folks got a few of the things they need, etc. -- both
>> got _subsets_ of the vocabularies they really consider important within their
>> domains. So I think on balance it is very useful to let those properties be
>> described by whatever vocabularies are useful to a certain community of
>> users.
>> >
>> >
>> >
>> > My example for URI is the DOI. ;-) It is not a choice _between_ using DOI
>> or URI: the recommended practice is to _express_ a DOI in the _form_ of a
>> URI. While that was not the common practice at first, it has been recommended
>> for the past year or two and is increasingly being done. Many identifiers can
>> be expressed in the form of a URI, which I think is a Very Good Thing. URI
>> doesn't attempt to _replace_ those identifiers, it makes them work better.
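
As an illustration of expressing a DOI in the form of a URI, here is a minimal sketch. It assumes the `https://doi.org/` resolver prefix; the function name and the sample DOI used in the usage lines are made up for the example, and the validity check is deliberately loose.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"  # assumed resolver prefix


def doi_to_uri(doi: str) -> str:
    """Return a resolvable URI form of a bare or 'doi:'-prefixed DOI."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    # Loose sanity check only: DOIs start with '10.' and have a '/' separator.
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode characters that are not safe in a URI path.
    return DOI_RESOLVER + quote(doi, safe="/")


if __name__ == "__main__":
    print(doi_to_uri("doi:10.1234/example.5678"))  # hypothetical DOI
```

The identifier itself is unchanged; the URI form only wraps it so that it can be dereferenced and linked like any other web resource.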
>> >
>> >
>> >
>> > --Bill K
>> >
>> >
>> >
>> > From: Graham Bell [mailto:graham@editeur.org]
>> > Sent: Monday, September 15, 2014 5:18 AM
>> > To: Ivan Herman; Tzviya Siegman
>> > Cc: W3C Digital Publishing IG; Bill Kasdorf; Madi Solomon
>> > Subject: Re: [METADATA] Webbiness of publishing metadata (ISSUE-1)
>> >
>> >
>> >
>> > I think it would be fair to say that the use of linked data and URIs as
>> identifiers is "definitely not a 'solved issue' among publishers" -- and to a
>> large extent is not an issue that most publishers are even aware of. While
>> the book industry provides a fair amount of useful metadata, this metadata is
>> not aimed at making the web more useful, but at making the supply chain for
>> commercial books and e-books more useful.
>> >
>> >
>> >
>> > I go back to the three cases I listed in a comment on the DPIG wiki (see
>> the Phase 1 Strategy section).
>> >
>> >
>> >
>> >  i. metadata delivered in bulk, separate from the content or resource
>> itself (eg as part of the commercial supply chain)
>> >   ii. metadata delivered embedded within the content or resource it
>> describes (eg within an EPUB, within a web page)
>> >   iii. metadata delivered embedded within web pages describing the content
>> or resource (eg in an online store, repository or catalog), possibly separate
>> from the metadata displayed (for humans) on those pages
>> > (Actually there is a fourth case, which is metadata delivered on demand,
>> separate from the content or resource, eg as part of a web service.)
>> >
>> >
>> >
>> > Publishers have tackled case i. via ONIX, but not case ii. or iii. Case ii
>> is properly the domain of the content standards groups such as W3C DPIG and
>> IDPF. Case iii. may also be something where W3C DPIG and schema.org
>> have roles. But...
>> >
>> >
>> >
>> > Given the reluctance of book publishers and retailers to invest more in
>> metadata (viz lack of uptake of a work identifier like ISTC, lack of interest
>> in a release identifier analogous to GRID, slow migration to ONIX 3.0 in
>> countries where 2.1 was most firmly embedded...), it seems to me to be critical
>> that we don't further burden the industry with 'yet another data format to
>> ignore'. As Phil implies in his point 5, the important thing is to have good
>> metadata, and it doesn't much matter how it is expressed -- so long as it can
>> be transformed from one expression to another easily and without loss of
>> meaning. I suspect the best way around this is to retain as much of the
>> semantics of ONIX, while thinking about a syntax that would allow that
>> metadata to be embedded in e-publications and online content. This would
>> avoid publishers having to manage two or three parallel and distinct sets of
>> metadata. Separating ONIX semantics ('what do we mean by pub date, by
>> imprint, by title?') from the XML message (which is 'merely' a convenient
>> syntax used for transmitting the data along a data supply chain in bulk), and
>> allowing ONIX-style data to be expressed in other syntaxes or data formats
>> seems (to me) to be the way to go.
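
The separation of semantics from syntax described above can be sketched as one record serialized two ways, with a round-trip check that no meaning is lost. The field names, element names, and values below are illustrative assumptions; they are not actual ONIX tags and the XML is not a valid ONIX message.

```python
import json
import xml.etree.ElementTree as ET

# One set of agreed meanings (the "semantics"), independent of any syntax.
record = {
    "title": "An Example Title",
    "imprint": "Example Imprint",
    "publication_date": "2014-09-15",
}


def to_xml(rec: dict) -> str:
    """Serialize the record as a flat XML element (bulk-message style)."""
    root = ET.Element("product")
    for name, value in rec.items():
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")


def from_xml(text: str) -> dict:
    root = ET.fromstring(text)
    return {child.tag: child.text for child in root}


def to_json(rec: dict) -> str:
    """Serialize the same record as JSON (e.g. for embedding in a page)."""
    return json.dumps(rec, sort_keys=True)


def from_json(text: str) -> dict:
    return json.loads(text)


if __name__ == "__main__":
    # Both syntaxes carry the same meanings back out.
    assert from_xml(to_xml(record)) == record
    assert from_json(to_json(record)) == record
    print(to_xml(record))
    print(to_json(record))
```

The publisher maintains a single record; each syntax is just a view of it, which is what avoids managing parallel sets of metadata.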
>> >
>> >
>> >
>> > I think there is something significant to do, but let's not be reinventing
>> the wheel.
>> >
>> >
>> >
>> > Graham Bell
>> >
>> > EDItEUR
>> >
>> >
>> >
>> > Tel: +44 20 7503 6418
>> >
>> > Mob: +44 7887 754958
>> >
>> >
>> >
>> > EDItEUR Limited is a company limited by guarantee, registered in England no
>> 2994705. Registered Office: United House, North Road, London N7 9DP, UK.
>> Website: http://www.editeur.org
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > On 15 Sep 2014, at 10:59, Ivan Herman wrote:
>> >
>> >
>> >
>> > Hi Tzviya
>> >
>> > I try to clarify the issues you raised...
>> >
>> > the description of ISSUE-1[1] is currently empty. (It only has a title, in
>> the subject of this mail).
>> >
>> > My interpretation of your question: is the published metadata web-friendly?
>> For me, with my W3C/OWP goggles on, this means whether it is easy to use and
>> combine metadata around a publication (or a family of publications). With my
>> former Semantic Web hat's goggles on this time, this is very much related to the
>> essence of RDF: forgetting about the arcane syntax of RDF/XML, the various
>> choices that have been made in its design, the real advantage of RDF is the
>> ability to combine (meta)data coming from different sources. And the core of
>> this is: use URI-s as unique identifiers wherever it makes sense and is
>> useful.
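
To illustrate why shared URIs make combination easy, here is a minimal sketch that merges statements from two sources using plain Python structures rather than an RDF library. The book URI, the property names, and both sample sources are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Statement = Tuple[str, str, str]  # (subject URI, property, value)

BOOK = "https://books.example.org/edition/e42"  # hypothetical identifier

# Two independent sources describing the same thing under the same URI.
publisher_feed = [
    (BOOK, "title", "An Example Title"),
    (BOOK, "imprint", "Example Imprint"),
]
retailer_page = [
    (BOOK, "keyword", "metadata"),
    (BOOK, "title", "An Example Title"),
]


def merge(*sources: Iterable[Statement]) -> Dict[str, Dict[str, Set[str]]]:
    """Union statements from all sources, grouped by subject URI."""
    merged = defaultdict(lambda: defaultdict(set))
    for source in sources:
        for subject, prop, value in source:
            merged[subject][prop].add(value)
    return {subject: dict(props) for subject, props in merged.items()}


combined = merge(publisher_feed, retailer_page)

if __name__ == "__main__":
    for prop, values in sorted(combined[BOOK].items()):
        print(prop, sorted(values))
```

Because both sources use the same URI for the subject, the merge needs no matching or reconciliation step; without a shared identifier that step would be the hard part.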
>> >
>> > So... is the usage of URI-s around publishing metadata a solved issue? I
>> have the *impression* the answer is no (but Laura D. may shoot me.) If not,
>> is there anything W3C can do around this? Honestly, I do not think so, it may
>> be just as complex a task as defining a unified vocabulary to rule them
>> all... Is there a way to at least help? Years ago a document was produced in
>> the semantic web domain called 'Cool URI-s for the Semantic Web'[2]; would it
>> be of any help if we tried to do something similar?
>> >
>> > But I may completely misunderstand the issue.
>> >
>> > Ivan
>> >
>> > P.S. That being said, I would think that this whole issue SHOULD be listed
>> in the metadata document we produce, spelling it out clearly.
>> >
>> > [1] https://www.w3.org/dpub/IG/track/issues/1
>> > [2] http://www.w3.org/TR/cooluris/
>> >
>> > ----
>> > Ivan Herman, W3C
>> > Digital Publishing Activity Lead
>> > Home: http://www.w3.org/People/Ivan/
>> > mobile: +31-641044153
>> > GPG: 0x343F1A3D
>> > WebID: http://www.ivan-herman.net/foaf#me
>> >
>> >
>> >
>> >
>> >
>> >
>> > This may contain confidential material. If you are not an intended
>> recipient, please notify the sender, delete immediately, and understand that
>> no disclosure or reliance on the information herein is permitted. Hachette
>> Book Group may monitor email to and from our network.
>> >
>> >
>> >
>> >
>> > --
>> > Rob Sanderson
>> > Technology Collaboration Facilitator
>> > Digital Library Systems and Services
>> > Stanford, CA 94305
> 
> 
> ----
> Ivan Herman, W3C
> Digital Publishing Activity Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> GPG: 0x343F1A3D
> WebID: http://www.ivan-herman.net/foaf#me
> 
> 
> 
> 
> 



-- 
Rob Sanderson
Technology Collaboration Facilitator
Digital Library Systems and Services
Stanford, CA 94305
Received on Tuesday, 16 September 2014 16:34:04 UTC
