Re: [METADATA] Webbiness of publishing metadata (ISSUE-1)

From: Robert Sanderson <azaroth42@gmail.com>
Date: Tue, 16 Sep 2014 09:27:32 -0700
Message-ID: <CABevsUF+6ANSPzSCwQjF==h9SY-BjX5UAriZ9UsGrgubogAhKA@mail.gmail.com>
To: Ivan Herman <ivan@w3.org>
Cc: Tom De Nies <tom.denies@ugent.be>, W3C Digital Publishing IG <public-digipub-ig@w3.org>

What do you mean by "book"? (And here lies the problem, as we know ...)

* The URI from which you get your personal copy of the EPUB version of that
version of the book
* The URI that identifies that particular EPUB version of the book,
regardless of ownership
* The URI that identifies digital copies of that version of the book,
regardless of format
* The URI that identifies digital copies of the book, regardless of version
* The URI that identifies that particular book, regardless of
digital/physical
* The URI that identifies that edition of the book, regardless of exact text
* The URI that identifies that creative work, regardless of medium?

(etc.; I'm sure there are more)

And the answer is purely socio-political.  Publishers could easily mint
their own URIs for all of these things. They could get together and create
a shortened URI service to manage them centrally, like DOI.  The only
requirement from an information perspective is clarity on what the URI
actually identifies, such that it can be used appropriately.
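
To make that last requirement concrete, here is a minimal sketch, not
anything a publisher or the DOI system actually provides. The Level names
simply mirror the list above, and the Registry class and the example.org
URIs are hypothetical. The point is only that each minted URI carries an
explicit statement of what it identifies, and that a URI cannot quietly
come to mean something at a different level.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    """What a URI identifies, from most concrete to most abstract."""
    PERSONAL_COPY = 1    # your own copy of the EPUB version
    EPUB_VERSION = 2     # that EPUB version, regardless of ownership
    DIGITAL_VERSION = 3  # digital copies of that version, any format
    DIGITAL_BOOK = 4     # digital copies of the book, any version
    BOOK = 5             # the book, digital or physical
    EDITION = 6          # the edition, regardless of exact text
    WORK = 7             # the creative work, regardless of medium


@dataclass(frozen=True)
class Identifier:
    uri: str
    level: Level


class Registry:
    """Hypothetical central registry: one URI, one declared meaning."""

    def __init__(self):
        self._by_uri = {}

    def mint(self, uri, level):
        """Register a URI; refuse to change what an existing URI identifies."""
        existing = self._by_uri.get(uri)
        if existing is not None and existing.level is not level:
            raise ValueError(
                f"{uri} already identifies {existing.level.name}, "
                f"not {level.name}"
            )
        ident = Identifier(uri, level)
        self._by_uri[uri] = ident
        return ident

    def identifies(self, uri):
        """Return the declared level for a URI (KeyError if unknown)."""
        return self._by_uri[uri].level


# Illustrative URIs only; example.org is a placeholder domain.
registry = Registry()
registry.mint("https://example.org/work/some-novel", Level.WORK)
registry.mint("https://example.org/epub/some-novel/v2", Level.EPUB_VERSION)
```

A consumer can then ask registry.identifies(uri) before deciding whether a
statement such as a price or a file size makes sense for that URI.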

Rob


On Tue, Sep 16, 2014 at 2:50 AM, Ivan Herman <ivan@w3.org> wrote:

>
> On 15 Sep 2014, at 22:29 , Robert Sanderson <azaroth42@gmail.com> wrote:
>
> >
> > Like with CFI? And Open Annotation? :)
> >
> > http://www.idpf.org/epub/oa/
> >
> > EPUBs are relatively straightforward in comparison to other content
> types, in terms of referencing arbitrary components.
>
> CFI is for the fragment. Which is fine. But what is the URI of the book?
> We are getting back to the fundamental problem of identification...
>
> Ivan
>
>
> >
> >
> > On Mon, Sep 15, 2014 at 1:25 PM, Tom De Nies <tom.denies@ugent.be>
> wrote:
> > You make a valid point, Phil, but the alternative (not embedding the
> metadata) is not ideal either.
> > Without being able to directly refer to certain parts of the content of
> an epub, the possibilities to add descriptive metadata decrease
> significantly.
> >
> > Ideally, you would be able to identify each part/fragment of an epub
> individually (e.g., with a fragment URI), so you can describe it with its
> metadata somewhere else.
> >
> > Tom
> >
> >
> > 2014-09-15 20:03 GMT+02:00 Madans, Phil <Phil.Madans@hbgusa.com>:
> > I get very nervous when I hear talk about  including metadata in the
> epub file, like embedding ONIX or some other standard.  The issue is that
> metadata changes. If you are embedding metadata in the epubs then you get
> into the position of having to generate and distribute new epub files every
> time that metadata changes. I don’t know how many publishers would be eager
> to do that.  We wouldn’t. And I don’t think our vendors would be too keen
> on that either.
> >
> >
> >
> > Once an ebook publishes, a lot of the metadata probably isn’t going to
> change: Title, Author, Imprint, etc. But the descriptive metadata that we
> are looking for to aid in discovery is far less static: Keywords, subject
> categories, descriptions, awards, quotes, author bios and, of course,
> price.  These elements can change often.
> >
> >
> >
> > Embedding metadata in the epub file, to me, is trying to do for the epub
> what the book jacket does for the physical product. The book jacket is
> about marketing, discoverability.  It has all of those elements, like
> author bio and quotes and subject categories, etc. And it is wrapped right
> around the content—and is also embedded in the content in the form of ad
> pages. The problem is the only time we can update with new metadata is when
> we reprint the book and/or the jacket, unless we want to sticker existing
> stock. In the same way, I don’t think embedding metadata in the epub is
> going to be a dynamic or flexible enough solution for getting the most bang
> out of the metadata. Unless there is a constant regeneration of the epub,
> which, again, I think will turn into a supply chain issue.
> >
> >
> >
> > That’s my opinion.
> >
> >
> >
> > Phil
> >
> >
> >
> > ------------------------------------------------------------
> >
> > Phil Madans | Executive Director of Digital Publishing Technology |
> Hachette Book Group | 237 Park Avenue NY 10017 |212-364-1415 |
> phil.madans@hbgusa.com
> >
> >
> >
> > From: Bill Kasdorf [mailto:bkasdorf@apexcovantage.com]
> > Sent: Monday, September 15, 2014 10:01 AM
> > To: Graham Bell; Ivan Herman; Tzviya Siegman
> > Cc: W3C Digital Publishing IG; Madi Solomon
> > Subject: RE: [METADATA] Webbiness of publishing metadata (ISSUE-1)
> >
> >
> >
> > +1
> >
> >
> >
> > This is exactly what I was going to say but Graham beat me to the punch.
> ;-)
> >
> >
> >
> > Especially his comment that "it is not an issue that most publishers are
> even aware of."
> >
> >
> >
> > I want to especially emphasize the point that I think the Web should
> _enable_ the expression and conveyance of metadata, not specify what that
> metadata _is_.
> >
> >
> >
> > Both schema.org and URIs are useful cases in point.
> >
> >
> >
> > Schema.org provides a useful way to embed metadata in content, but I
> would say it is somewhat halfway on the "enable don't specify" path. It
> does specify properties (which is actually very helpful) but in many or
> most cases the actual vocabularies used to characterize those properties are
> not specified. While specifying down to that level of detail is of course
> very useful for interoperability, it tends to be too limiting, too
> restrictive, not expressive enough (have I been redundant enough?) for most
> specific communities of users. Thus the educational folks got a few of the
> things they need, the accessibility folks got a few of the things they
> need, etc.—both got _subsets_ of the vocabularies they really consider
> important within their domains. So I think on balance it is very useful to
> let those properties be described by whatever vocabularies are useful to a
> certain community of users.
> >
> >
> >
> > My example for URI is the DOI. ;-) It is not a choice _between_ using
> DOI or URI: the recommended practice is to _express_ a DOI in the _form_ of
> a URI. While that was not the common practice at first, it has been
> recommended for the past year or two and is increasingly being done. Many
> identifiers can be expressed in the form of a URI, which I think is a Very
> Good Thing. URI doesn't attempt to _replace_ those identifiers, it makes
> them work better.
> >
> >
> >
> > --Bill K
> >
> >
> >
> > From: Graham Bell [mailto:graham@editeur.org]
> > Sent: Monday, September 15, 2014 5:18 AM
> > To: Ivan Herman; Tzviya Siegman
> > Cc: W3C Digital Publishing IG; Bill Kasdorf; Madi Solomon
> > Subject: Re: [METADATA] Webbiness of publishing metadata (ISSUE-1)
> >
> >
> >
> > I think it would be fair to say that the use of linked data and URIs as
> identifiers is "definitely not a 'solved issue' among publishers" -- and to
> a large extent is not an issue that most publishers are even aware of.
> While the book industry provides a fair amount of useful metadata, this
> metadata is not aimed at making the web more useful, but at making the
> supply chain for commercial books and e-books more useful.
> >
> >
> >
> > I go back to the three cases I listed in a comment on the DPIG wiki (see
> the Phase 1 Strategy section).
> >
> >
> >
> >   i. metadata delivered in bulk, separate from the content or resource
> itself (eg as part of the commercial supply chain)
> >   ii. metadata delivered embedded within the content or resource it
> describes (eg within an EPUB, within a web page)
> >   iii. metadata delivered embedded within web pages describing the
> content or resource (eg in an online store, repository or catalog),
> possibly separate from the metadata displayed (for humans) on those pages
> > (Actually there is a fourth case, which is metadata delivered on demand,
> separate from the content or resource, eg as part of a web service.)
> >
> >
> >
> > Publishers have tackled case i. via ONIX, but not case ii. or iii. Case
> ii is properly the domain of the content standards groups such as W3C DPIG
> and IDPF. Case iii. may also be something where W3C DPIG and schema.org
> have roles. But...
> >
> >
> >
> > Given the reluctance of book publishers and retailers to invest more in
> metadata (viz lack of uptake of a work identifier like ISTC, lack of
> interest in a release identifier analogous to GRID, slow migration to ONIX
> 3.0 in countries where 2.1 was most firmly embedded…), it seems to me to be
> critical that we don't further burden the industry with 'yet another data
> format to ignore'. As Phil implies in his point 5, the important thing is
> to have good metadata, and it doesn't much matter how it is expressed – so
> long as it can be transformed from one expression to another easily and
> without loss of meaning. I suspect the best way around this is to retain as
> much of the semantics of ONIX, while thinking about a syntax that would
> allow that metadata to be embedded in e-publications and online content.
> This would avoid publishers having to manage two or three parallel and
> distinct sets of metadata. Separating ONIX semantics ('what do we mean by
> pub date, by imprint, by title?') from the XML message (which is 'merely' a
> convenient syntax used for transmitting the data along a data supply chain
> in bulk), and allowing ONIX-style data to be expressed in other syntaxes or
> data formats seems (to me) to be the way to go.
> >
> >
> >
> > I think there is something significant to do, but let's not be
> reinventing the wheel.
> >
> >
> >
> > Graham Bell
> >
> > EDItEUR
> >
> >
> >
> > Tel: +44 20 7503 6418
> >
> > Mob: +44 7887 754958
> >
> >
> >
> > EDItEUR Limited is a company limited by guarantee, registered in England
> no 2994705. Registered Office: United House, North Road, London N7 9DP, UK.
> Website: http://www.editeur.org
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > On 15 Sep 2014, at 10:59, Ivan Herman wrote:
> >
> >
> >
> > Hi Tzviya
> >
> > I will try to clarify the issues you raised...
> >
> > the description of ISSUE-1[1] is currently empty. (It only has a title,
> in the subject of this mail).
> >
> > My interpretation of your question: is the published metadata
> web-friendly? For me, with my W3C/OWP goggles on, this means whether it is
> easy to use and combine metadata around a publication (or a family of
> publications). With my former Semantic Web hat on this time, this is very much
> related to the essence of RDF: forgetting about the arcane syntax of
> RDF/XML, the various choices that have been made in its design, the real
> advantage of RDF is the ability to combine (meta)data coming from different
> sources. And the core of this is: use URI-s as unique identifiers wherever
> it makes sense and is useful.
> >
> > So... is the usage of URI-s around publishing metadata a solved issue? I
> have the *impression* the answer is no (but Laura D. may shoot me.) If not,
> is there anything W3C can do around this? Honestly, I do not think so, it
> may be just as complex a task as defining a unified vocabulary to rule them
> all... Is there a way to at least help? Years ago a document was produced
> in the semantic web domain called 'Cool URI-s for the Semantic Web'[2];
> would it be of any help if we tried to do something similar?
> >
> > But I may completely misunderstand the issue.
> >
> > Ivan
> >
> > P.S. That being said, I would think that this whole issue SHOULD be
> listed in the metadata document we produce, spelling it out clearly.
> >
> > [1] https://www.w3.org/dpub/IG/track/issues/1
> > [2] http://www.w3.org/TR/cooluris/
> >
> > ----
> > Ivan Herman, W3C
> > Digital Publishing Activity Lead
> > Home: http://www.w3.org/People/Ivan/
> > mobile: +31-641044153
> > GPG: 0x343F1A3D
> > WebID: http://www.ivan-herman.net/foaf#me
> >
> >
> >
> >
> >
> >
> > This may contain confidential material. If you are not an intended
> recipient, please notify the sender, delete immediately, and understand
> that no disclosure or reliance on the information herein is permitted.
> Hachette Book Group may monitor email to and from our network.
> >
> >
> >
> >
> > --
> > Rob Sanderson
> > Technology Collaboration Facilitator
> > Digital Library Systems and Services
> > Stanford, CA 94305
>
>
> ----
> Ivan Herman, W3C
> Digital Publishing Activity Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> GPG: 0x343F1A3D
> WebID: http://www.ivan-herman.net/foaf#me
>
>
>
>
>
>


-- 
Rob Sanderson
Technology Collaboration Facilitator
Digital Library Systems and Services
Stanford, CA 94305
Received on Tuesday, 16 September 2014 16:28:01 UTC
