Re: Identifying a book on the Web today

From: Hadrien Gardeur <hadrien.gardeur@feedbooks.com>
Date: Wed, 2 Aug 2017 23:35:03 +0200
Message-ID: <CA+KS-11qQZSUQmHMNMWaUZGvVckFwx+3LPrmKUYSr0aJ4Qv9KQ@mail.gmail.com>
To: Baldur Bjarnason <baldur@rebus.foundation>
Cc: Benjamin Young <byoung@bigbluehat.com>, David Wood <david.wood@ephox.com>, MURATA Makoto <eb2m-mrt@asahi-net.or.jp>, "public-publ-wg@w3.org" <public-publ-wg@w3.org>
Hello Baldur,

Thanks for the in-depth dive into this issue. I'll chime in to provide
additional context to the Readium Web Publication Manifest case that you've
mentioned.

> - Readium 2’s manifest requires both a ‘self’ link to identify the
> publication’s canonical URL (see below) as well as an identifier property
> (which, IIRC functions as a JSON-LD @id/linked data IRI[5] and [6])

Actually, the only requirement is a "self" link; the additional identifier
is recommended but optional.

Typically for a book this would be:
- the URL of the manifest for the canonical/self link
- an ISBN expressed as a URN for the secondary identifier

This secondary identifier does work as a JSON-LD @id if present.
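
As a rough illustration (a sketch, not the actual Readium-2 code), the rule
"self link required, identifier optional" could look like the following. The
"metadata"/"links" layout, the "application/webpub+json" media type, the
helper names, and the example URL and ISBN are all assumptions made for
this example.

```python
import json


def make_manifest(self_url, title, identifier=None):
    """Build a minimal manifest-like dict (illustrative shape only)."""
    metadata = {"title": title}
    if identifier is not None:
        # Optional secondary identifier, e.g. an ISBN expressed as a URN.
        metadata["identifier"] = identifier
    return {
        "metadata": metadata,
        "links": [
            # The one required piece: a "self" link to the manifest URL.
            {"rel": "self", "href": self_url, "type": "application/webpub+json"}
        ],
    }


def self_link(manifest):
    """Return the href of the first link whose rel is "self", or None."""
    for link in manifest.get("links", []):
        rel = link.get("rel")
        rels = rel if isinstance(rel, list) else [rel]
        if "self" in rels:
            return link.get("href")
    return None


def is_valid(manifest):
    """Only the "self" link is checked; a missing identifier is fine."""
    return self_link(manifest) is not None


# Placeholder URL and ISBN, not a real publication.
book = make_manifest(
    "https://example.org/moby-dick/manifest.json",
    "Moby-Dick",
    identifier="urn:isbn:9780000000002",
)
print(json.dumps(book, indent=2))
```

A manifest built without the identifier argument still passes is_valid(),
while one with an empty "links" array does not.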

As you've pointed out, requiring two identifiers, as Atom does, can be
confusing, which is the main reason why we only have one requirement.

There are a few points that I'd like to make about that:
- a publication that is primarily created for the Web is not very likely to
have such a secondary identifier
- but the opposite is true for EPUB: none of the EPUB files distributed
today have a canonical URL, but they all have such a secondary identifier
- for that reason, we do need both and might end up with different
requirements for WP vs PWP

In the context of Readium-2 this means:
- that when we ingest a packaged publication (EPUB, CBZ), we include the
secondary identifier in the manifest if we can identify one
- but we completely ignore the secondary identifier when we build a UA to
display and interact with a Web Publication (the canonical/self URL to the
manifest is far more important)
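
The two behaviours above can be sketched roughly as follows. This is a
simplified illustration, not the Readium-2 implementation: the function
names, the shape of the parsed package metadata, and the idea of passing in
the URL at which the manifest is served are all assumptions.

```python
def ingest_package(package_metadata, served_manifest_url):
    """Ingest side: build a manifest for a packaged publication (EPUB, CBZ).

    package_metadata is a hypothetical dict of already-parsed metadata.
    """
    manifest = {
        "metadata": {"title": package_metadata.get("title", "")},
        "links": [{"rel": "self", "href": served_manifest_url}],
    }
    identifier = package_metadata.get("identifier")
    if identifier:
        # Carry the secondary identifier over only if one was found.
        manifest["metadata"]["identifier"] = identifier
    return manifest


def user_agent_key(manifest):
    """UA side: key a Web Publication on its self URL alone.

    Any metadata.identifier is deliberately ignored here.
    """
    for link in manifest["links"]:
        if link.get("rel") == "self":
            return link["href"]
    raise ValueError("manifest has no self link")


# Placeholder values, not a real publication.
epub = ingest_package(
    {"title": "Moby-Dick", "identifier": "urn:isbn:9780000000002"},
    "https://example.org/pub/moby-dick/manifest.json",
)
comic = ingest_package({"title": "Some comic"}, "https://example.org/pub/comic/manifest.json")
print(user_agent_key(epub))
print("identifier" in comic["metadata"])
```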

> To summarise:
>
> * A URL as both a locator and identifier is a given—if it’s on the web,
> that’s how it’s going to work—but we can’t change how a URL functions or
> behaves.
> * Using a URL that doesn’t identify the publication (e.g. an external HTML
> page) to help people indirectly locate a publication should be a feature
> that we provide by specifying some form of discovery mechanism (some form
> of link—HTTP header or link tag—with a format-specific rel value is the
> usual way of doing this).
> * A secondary globally unique identifier that is separate from the
> identifying and locating URL is useful for a variety of reasons but
> requiring one has as many downsides as it has upsides—the biggest downside
> being that most developers won’t provide one even if that makes the web
> publication invalid. I’m sure we will debate this but given that the
> functional advantages are largely in the area of distribution and
> portability I don’t see why this should be a requirement for non-portable
> web publications.
> * We absolutely should not venture into the territory of extending
> existing protocols, minting new identifying schemes, or specifying a
> locator mechanism that mandates the implementation and maintenance of what
> are likely to be non-trivial server systems.
>

Fully agree, this is completely consistent with what we've decided for
Readium-2.

Hadrien
Received on Wednesday, 2 August 2017 21:35:47 UTC