Re: Identifying a book on the Web today from Ivan Herman on 2017-08-03 (public-publ-wg@w3.org from August 2017)

From: Ivan Herman <ivan@w3.org>
Date: Thu, 3 Aug 2017 11:54:04 +0200
To: Baldur Bjarnason <baldur@rebus.foundation>
Cc: Benjamin Young <byoung@bigbluehat.com>, David Wood <david.wood@ephox.com>, MURATA Makoto <eb2m-mrt@asahi-net.or.jp>, W3C Publishing Working Group <public-publ-wg@w3.org>
Message-Id: <1F82921D-68C3-48E3-A122-F109230612F2@w3.org>
[Admin comment: if we think this issue is a genuine discussion to have, we should open a GitHub issue, in this case probably at the global level, ie, on publ-wg. Referring back to discussion threads on a mailing list is WAY more difficult later (say, in 2 years) than it is on GitHub, and we will make our lives much easier if we do it that way. Baldur, if you agree, adding a new issue would be a good idea, with a cut-and-paste of your text.]

Hey Baldur,


I am not sure how the discussion got to these high level points, to be honest. I do not think (or at least I hope) anybody seriously considered defining our own identifier scheme, alternative protocols, etc; I think we should definitely keep away from those issues. We work with what is on the Web and, I believe, our mantra is to minimize any specification we do and definitely avoid touching the fundamentals. 

Ie, I basically agree with what is below, just let me add some non-fundamental comments. 


> On 2 Aug 2017, at 22:04, Baldur Bjarnason <baldur@rebus.foundation> wrote:

>
> * A URL as both a locator and identifier is a given—if it’s on the web, that’s how it’s going to work—but we can’t change how a URL functions or behaves.

I believe, as we emphasized in the PWP document in the DPUB IG[1], we have to be very clear that these two notions/roles are separate and they may or may not coincide. We have to accept that there are communities that do use identifiers that are not a URL (ISBN is the typical case, with all its flaws).

I think what _is_ a given is that we have a URL that acts as a locator on the Web, because it _is_ the Web. And, of course, we have to accept how URL-s are defined, and we have to accept (and possibly exploit!) how URL-s and HTTP behave. But let us not decide in general that this URL is the identifier or not (see also my comment below).

> * Using a URL that doesn’t identify the publication (e.g. an external HTML page) to help people indirectly locate a publication should be a feature that we provide by specifying some form of discovery mechanism (some form of link—HTTP header or link tag—with a format-specific rel value is the usual way of doing this).

I am not sure I 100% understand what you mean here. I guess you are referring to the (still undecided) issue of locating the WP's manifest (whatever it will look like) using a URL. If so then yes, I completely agree that we have to provide a discovery mechanism.

But… alas! It is not easy, at least for an average user, to set up a proper HTTP-based mechanism like, eg, content negotiation or control over the response headers. This is also a constraint we will have to work with: content negotiation should probably be one mechanism, but not _the_ mechanism, to achieve that.

(The difficulty of controlling those things is one of the reasons that the Web developer community often seems, these days, to reject any HTTP-based mechanism…)
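
To make the point concrete: a link tag inside the HTML page is the kind of discovery mechanism that requires no server configuration at all. What follows is only a minimal sketch, not a proposal. The rel value "publication", the file name manifest.json, and the function and class names are all hypothetical, since the WG has not decided on any of them.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ManifestLinkFinder(HTMLParser):
    """Collects the first <link> whose rel contains the wanted value."""

    def __init__(self, base_url, rel):
        super().__init__()
        self.base_url = base_url
        self.rel = rel
        self.manifest_url = None

    def handle_starttag(self, tag, attrs):
        # Only look at <link> elements, and stop after the first match.
        if tag != "link" or self.manifest_url is not None:
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        href = attr.get("href")
        if self.rel in rels and href:
            # Resolve a relative href against the URL of the page itself.
            self.manifest_url = urljoin(self.base_url, href)


def find_manifest(html, base_url, rel="publication"):
    """Return the absolute manifest URL found in the page, or None.

    The default rel value is an assumption made for illustration only.
    """
    finder = ManifestLinkFinder(base_url, rel)
    finder.feed(html)
    return finder.manifest_url
```

Called with the text of a page and the URL it was fetched from, find_manifest returns the resolved manifest URL, or None when the page carries no such link. An HTTP Link header could carry the same relation for authors who do control their server; the in-page variant is simply the one that works everywhere.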

> * A secondary globally unique identifier that is separate from the identifying and locating URL is useful for a variety of reasons but requiring one has as many downsides as it has upsides—the biggest downside being that most developers won’t provide one even if that makes the web publication invalid. I’m sure we will debate this but given that the functional advantages are largely in the area of distribution and portability I don’t see why this should be a requirement for non-portable web publications.

I would not refer to this as "secondary". As I said above, I believe we should separate the notion of a (globally unique) identifier from a locator and, hopefully, on Monday we could agree on some minimal level of requirements that we consider as fundamental in using something as an "identifier". We should be agnostic on whether a specific URL can be considered as an identifier or not; we should just recognize that these two notions are different and, in our manifest, we should provide a slot for each.

As for the requirement (or not) of having it: I guess, in spec parlance, what you say is that having at least one global identifier assigned to a WP is a SHOULD but not a MUST. And, for the reasons you cite, I agree with this. 

That being said, there may be communities (either via explicit profiles that we may define later or just through some social agreement) that would have that as an absolute requirement, ie, a MUST. Scholarly journals are a typical case: having a globally unique identifier assigned to an article (which should be a WP) is an absolute must in that community and, furthermore, URL-s are not necessarily accepted as such (DOI-s are used for it these days)[2]. I would expect legal documents to have a similarly strong requirement. Maybe the usage of MUST instead of SHOULD would be part of specific profiles, and could be a requirement for a PWP or an EPUB4. This is for later.
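
As an illustration of the SHOULD versus MUST distinction, here is a minimal sketch of a manifest that keeps the locator and the identifiers in separate slots, with a check whose strictness depends on the profile. Everything in it is an assumption made for the example: the field names, the Manifest class, the identifier_required flag, and the sample DOI are hypothetical and do not come from any WG document.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Manifest:
    # The URL where the WP lives on the Web; always present.
    locator: str
    # Zero or more global identifiers (a DOI, an ISBN, possibly a URL).
    identifiers: List[str] = field(default_factory=list)


def check_identifier(manifest: Manifest,
                     identifier_required: bool = False) -> Tuple[str, str]:
    """Return (level, message) for the identifier slot.

    identifier_required=False models the general SHOULD;
    identifier_required=True models a stricter profile (MUST).
    """
    if manifest.identifiers:
        return ("ok", "")
    if identifier_required:
        return ("error", "profile requires at least one global identifier")
    return ("warning", "no global identifier provided")


# A plain WP without an identifier only triggers a warning...
plain = Manifest(locator="https://example.org/book/")
# ...while a hypothetical scholarly profile would reject it.
article = Manifest(locator="https://example.org/article/",
                   identifiers=["doi:10.9999/example"])
```

With these definitions, check_identifier(plain) yields a warning, check_identifier(plain, identifier_required=True) yields an error, and check_identifier(article, identifier_required=True) passes. The locator is never consulted when looking for an identifier, which keeps the two notions separate.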

> * We absolutely should not venture into the territory of extending existing protocols, minting new identifying schemes, or specifying a locator mechanism that mandates the implementation and maintenance of what are likely to be non-trivial server systems.

Absolutely and completely true. Actually, the charter puts the definition of new identification schemes explicitly out of scope, but what you say here is even a bit more general. 

Thanks!

Ivan

[1] https://www.w3.org/TR/pwp/#identification
[2] I must note that there are serious debates about this in the scholarly publishing community, with some asking for the abolition of the predominance of DOI-s.

----
Ivan Herman, W3C 
Publishing@W3C Technical Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
ORCID ID: http://orcid.org/0000-0003-0782-2704
Received on Thursday, 3 August 2017 09:54:19 UTC