Re: Team discussion - on refining Best Practice around use of canonical URLs from John Arwe on 2014-06-10 (public-ldp-wg@w3.org from June 2014)

From: John Arwe <johnarwe@us.ibm.com>
Date: Tue, 10 Jun 2014 16:32:11 -0400
To: Linked Data Platform WG <public-ldp-wg@w3.org>
Message-ID: <OFA845342A.61DB642D-ON85257CF3.006A6CA5-85257CF3.0070CFFB@us.ibm.com>

[1] defines rel=canonical.  On re-reading, it seems quite compatible with the 
submission's usage except when it comes to retrievability; there is no 
overt conflict at least, but they're not completely aligned.

Using [2] alone does not even cover the "common case" cited in the text of 
two URLs that vary only by URI scheme.  Nor would [3].  The relevant 
difference is that [2] (see especially section 6.1, paragraph 1) is talking 
about the problem "given 2 URIs, that I do not control, are they 
equivalent" whereas [1] is 
talking about "how do I, as a server with full control over a resource 
that I know is accessible via multiple URI aliases, tell clients which of 
those URIs to prefer", and the latter is the problem that the submission 
was out to solve.
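
To make the first point concrete, here is a minimal sketch of a comparison in the spirit of [2].  It is a simplified normalization (lowercase scheme and host, drop the default port, treat an empty path as "/"), not a full implementation of RFC 3986 section 6, and the function names and example.org URLs are made up for illustration.  Even with normalization, two URLs that differ only by scheme do not compare as equivalent.

```python
from urllib.parse import urlsplit

# Default ports for the two schemes discussed in this thread.
DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize(uri):
    """Simplified normalization, loosely after RFC 3986 section 6.

    Lowercases scheme and host, drops an explicit default port, and
    treats an empty path as "/". Percent-encoding and dot-segment
    handling are deliberately left out of this sketch.
    """
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    if port == DEFAULT_PORTS.get(scheme):
        port = None
    path = parts.path or "/"
    return (scheme, host, port, path, parts.query)


def equivalent(a, b):
    """True if the two URIs normalize to the same components."""
    return normalize(a) == normalize(b)


# Case and default-port differences are normalized away ...
same = equivalent("HTTP://Example.org:80/r1", "http://example.org/r1")
# ... but a scheme difference is not, so the "common case" is not covered.
scheme_only = equivalent("http://example.org/r1", "https://example.org/r1")
```

Only the server knows that both URLs name the same resource, which is why a server-asserted preference like rel=canonical is needed for that case.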

The niggle is that none of these guarantees the canonical URI is 
retrievable.  In the case of [2], the URI scheme can literally be anything 
(including URNs, which are not retrievable). 
Strictly speaking, an HTTP/S URI need not be retrievable/accessible 
either.  My reading of [1] and [3] is that they'd treat the "not 
retrievable" case as an outlier and even somewhat contrary to [1]'s 
anticipated uses.  In the "common case" cited as well as the reverse proxy 
case I cited on the weekly call, it could very easily be true that no 
single URI is retrievable by all clients ... so in that sense the 
Submission's canonical URI was being used purely as an identifier, the way 
Linked Data treats namespace URIs (optionally retrievable).

In the special case of LDP Paging, we know the paged resource is 
retrievable by definition (can't be a paged resource if it can't respond 
to HEAD/GET) so rel=canonical makes perfect sense.

In the more general case of any LDPR, I keep bouncing off the problem of 
how rel=canonical works when the canonical URI is not retrievable by all 
clients.  I can't see any way to solve that beyond saying that clients are 
only licensed to use canonical for LDPR identity, and retrieval of the 
canonical might not work even if retrieval of the original request URI 
does work for any given client.  If everyone is OK with that, solved(?).
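
A rough sketch of what that client rule could look like, assuming the server sends a Link header with rel="canonical".  The function names, the returned dictionary keys, and the example.org / internal.example URLs are hypothetical, and the Link parsing is deliberately naive (it does not handle commas inside quoted parameter values).

```python
import re


def canonical_from_link(link_header):
    """Return the target of the first rel="canonical" link, or None.

    Naive parsing: splits on '<target>params' groups and looks for a
    rel parameter whose space-separated values include "canonical".
    """
    for target, params in re.findall(r'<([^>]*)>([^,]*)', link_header):
        m = re.search(r'\brel\s*=\s*"?([^";]*)"?', params)
        if m and "canonical" in m.group(1).lower().split():
            return target
    return None


def identify_and_locate(request_uri, link_header):
    """Use the canonical URI for identity only; keep retrieving via
    the URI that is already known to work for this client."""
    canonical = canonical_from_link(link_header or "")
    return {
        "identity": canonical or request_uri,
        "retrieve_from": request_uri,
    }


result = identify_and_locate(
    "http://internal.example/r1",
    '<https://example.org/r1>; rel="canonical", '
    '<http://internal.example/r1?page=2>; rel="next"',
)
```

Here the client would record https://example.org/r1 as the LDPR's identity but continue to send requests to http://internal.example/r1, never assuming the canonical URI is reachable from where it sits.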

Finally: we originally moved the submission's canonical text to BP after 
early feedback from Yves Lafon, who was worried IIRC about the 
authoritativeness of the submission's (then-) Location response header. 
I'd think rel=canonical has the same issue; [1] says nothing specific about 
that as far as I can see, but [4] basically says you can't trust Link headers 
unless you're using HTTPS, and even then only if the link context = the 
request URI (paraphrasing).
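
As a sketch of that paraphrase only (not the text of [4]), a client-side check might look like the following.  The function name is hypothetical; the `anchor` argument stands for the Link header's "anchor" parameter, which overrides the link context when present.

```python
from urllib.parse import urlsplit


def link_is_trustworthy(request_uri, anchor=None):
    """Apply the paraphrased rule: the response came over HTTPS, and
    the link context is the request URI (no anchor, or an anchor
    identical to the request URI)."""
    if urlsplit(request_uri).scheme.lower() != "https":
        return False
    return anchor is None or anchor == request_uri


over_https = link_is_trustworthy("https://example.org/r1")
over_http = link_is_trustworthy("http://example.org/r1")
other_context = link_is_trustworthy(
    "https://example.org/r1", anchor="https://other.example/x"
)
```

Under this reading, the "common case" of an http/https pair would leave the http variant unable to assert a canonical URI that clients can rely on.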



[1] http://tools.ietf.org/html/rfc6596
[2] http://tools.ietf.org/html/rfc3986#section-6
[3] http://tools.ietf.org/html/rfc7230#section-2.7.3
[4] http://tools.ietf.org/html/rfc5988#section-7

Best Regards, John

Voice US 845-435-9470  BluePages
Cloud and Smarter Infrastructure OSLC Lead

Received on Tuesday, 10 June 2014 20:42:54 UTC