Re: Identifying a book on the Web today

Hello Baldur,

Thanks for the in-depth dive into this issue. I'll chime in with some
additional context on the Readium Web Publication Manifest case that you
mentioned.

> - Readium 2’s manifest requires both a ‘self’ link to identify the
> publication’s canonical URL (see below) as well as an identifier property
> (which, IIRC functions as a JSON-LD @id/linked data IRI[5] and [6])
>

Actually, the only requirement is a "self" link; the additional identifier
is recommended but optional.

Typically for a book this would be:
- the URL of the manifest for the canonical/self link
- an ISBN expressed as a URN for the secondary identifier

This secondary identifier does work as a JSON-LD @id if present.
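
As a rough sketch of what that looks like in practice (the title, URL and
ISBN below are made up, and the layout with "metadata" and "links" keys is
my shorthand for the manifest rather than a quote from the spec), a check
for the one required link could be written like this in Python:

```python
# Hypothetical manifest for a book: the "self" link is the canonical URL,
# the ISBN URN is the optional secondary identifier.
manifest = {
    "metadata": {
        "title": "Example Book",
        "identifier": "urn:isbn:9780000000002",  # made-up ISBN
    },
    "links": [
        {"rel": "self", "href": "https://example.com/book/manifest.json"},
    ],
}


def find_self_link(manifest):
    """Return the href of the first link whose rel contains "self", or None."""
    for link in manifest.get("links", []):
        rel = link.get("rel", [])
        # Accept rel given either as a single string or as a list of strings.
        rels = [rel] if isinstance(rel, str) else rel
        if "self" in rels:
            return link.get("href")
    return None


def check_manifest(manifest):
    """Require a "self" link; treat the secondary identifier as optional."""
    self_url = find_self_link(manifest)
    if self_url is None:
        raise ValueError('manifest has no "self" link')
    identifier = manifest.get("metadata", {}).get("identifier")
    return self_url, identifier
```

A manifest without the "identifier" property still passes this check, while
one without a "self" link does not.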

As you've pointed out, requiring two identifiers as Atom does can be
confusing, which is the main reason why we only have one requirement.

There are a few points that I'd like to make about that:
- a publication that is primarily created for the Web is not very likely to
have such a secondary identifier
- but the opposite is true for EPUB: none of the EPUB files distributed
today have a canonical URL, but they all have such a secondary identifier
- for that reason, we do need both and might end up with different
requirements for WP vs PWP

In the context of Readium-2 this means:
- that when we ingest a packaged publication (EPUB, CBZ), we include the
secondary identifier in the manifest if we can identify one
- but we completely ignore the secondary identifier when we build a UA to
display and interact with a Web Publication (the canonical/self URL to the
manifest is far more important)
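
To make those two behaviours concrete, here is a minimal sketch in Python.
It is not Readium-2 code: the function names, the dict layout, and the
assumption that the ingesting side already knows the URL at which the
manifest will be served are all mine.

```python
def manifest_from_package(title, manifest_url, package_identifier=None):
    """Build a manifest for an ingested package (EPUB, CBZ).

    The secondary identifier is copied over only if one was found.
    """
    metadata = {"title": title}
    if package_identifier:
        metadata["identifier"] = package_identifier
    return {
        "metadata": metadata,
        "links": [{"rel": "self", "href": manifest_url}],
    }


def publication_key(manifest):
    """Key used by a UA to tell publications apart.

    Only the canonical/self URL is considered; any secondary identifier
    in the metadata is deliberately ignored.
    """
    for link in manifest.get("links", []):
        rel = link.get("rel", [])
        rels = [rel] if isinstance(rel, str) else rel
        if "self" in rels and link.get("href"):
            return link["href"]
    raise ValueError('manifest has no "self" link')
```

With this split, two manifests that carry the same ISBN but are served from
different URLs are treated by the UA as two distinct publications.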

> To summarise:
>
> * A URL as both a locator and identifier is a given—if it’s on the web,
> that’s how it’s going to work—but we can’t change how a URL functions or
> behaves.
> * Using a URL that doesn’t identify the publication (e.g. an external HTML
> page) to help people indirectly locate a publication should be a feature
> that we provide by specifying some form of discovery mechanism (some form
> of link—HTTP header or link tag—with a format-specific rel value is the
> usual way of doing this).
> * A secondary globally unique identifier that is separate from the
> identifying and locating URL is useful for a variety of reasons but
> requiring one has as many downsides as it has upsides—the biggest downside
> being that most developers won’t provide one even if that makes the web
> publication invalid. I’m sure we will debate this but given that the
> functional advantages are largely in the area of distribution and
> portability I don’t see why this should be a requirement for non-portable
> web publications.
> * We absolutely should not venture into the territory of extending
> existing protocols, minting new identifying schemes, or specifying a
> locator mechanism that mandates the implementation and maintenance of what
> are likely to be non-trivial server systems.
>

Fully agree. This is completely consistent with what we've decided for
Readium-2.

Hadrien

Received on Wednesday, 2 August 2017 21:35:47 UTC