Re: Important Change to HTTP semantics re. hashless URIs from Erik Isaksson on 2013-03-25 (public-lod@w3.org from March 2013)

From: Erik Isaksson <erikis@kth.se>
Date: Mon, 25 Mar 2013 10:07:30 +0100
To: Mo McRoberts <Mo.McRoberts@bbc.co.uk>
Cc: Kingsley Idehen <kidehen@openlinksw.com>, "public-lod@w3.org" <public-lod@w3.org>, "public-rww@w3.org" <public-rww@w3.org>, "public-webid@w3.org" <public-webid@w3.org>, "dbpedia-discussion@lists.sourceforge.net" <dbpedia-discussion@lists.sourceforge.net>
Message-ID: <CAK9bEGy4BD91RQ-70qJdWpo3e+ZMt5w9+ugpGCzrFrGBTGEB5w@mail.gmail.com>
On Mon, Mar 25, 2013 at 9:42 AM, Mo McRoberts <Mo.McRoberts@bbc.co.uk> wrote:
> On Sun 2013-Mar-24, at 17:39, Kingsley Idehen <kidehen@openlinksw.com> wrote:
>
>> All,
>>
>> Here is a key HTTP enhancement from the IETF draft "Hypertext Transfer Protocol (HTTP/1.1): Semantics and Content" [1].
>>
>> "
>>   4.  If the response has a Content-Location header field and its
>>       field-value is a reference to a URI different from the effective
>>       request URI, then the sender asserts that the payload is a
>>       representation of the resource identified by the Content-Location
>>       field-value.  However, such an assertion cannot be trusted unless
>>       it can be verified by other means (not defined by HTTP).
>> "
>>
>
>
> It's good to have the clarification (the wording in the new draft is nicer), but it's probably worth stressing that Content-Location isn't at all new, and this *mostly* amounts to a tidying-up of wording rather than a change in semantics.
>
> Section 14.14 of RFC2616 (HTTP/1.1) states:
>
> "The Content-Location entity-header field MAY be used to supply the resource location for the entity enclosed in the message when that entity is accessible from a location separate from the requested resource's URI."
>
> The biggest change here is actually the "However, such an assertion cannot be trusted..." part!
>
> M.
>

TimBL's proposal 25 was about introducing a new header (e.g.,
"Document:") with the semantics that I believe are being discussed
here.

At that time, I asked him about the relation between Content-Location
and Document [1]:

On Sun, Apr 1, 2012 at 5:14 PM, Tim Berners-Lee <timbl@w3.org> wrote:
>
> On 2012-03-29, at 18:26, Erik Isaksson wrote:
>
>> Would the Document header be sort of a strengthened version of
>> Content-Location, with the difference that the returned representation
>> is not a "representation of the target resource" (i.e., probe URI)?
>
> Yes, exactly.
> I thought of using Location: but hadn't checked that there wasn't any way in which this use
> would clash with the normal use of "Location:".

Quote from further down in Hypertext Transfer Protocol (HTTP/1.1):
Semantics and Content [2]:

"   o  For a response to a GET or HEAD request, this is an indication
      that the effective request URI refers to a resource that is
      subject to content negotiation and the Content-Location field-
      value is a more specific identifier for the selected
      representation."

So Content-Location provides a "more specific identifier", which I
don't think helps us with avoiding 303. Anyway, personally, I think
we're on the right track here.

Best regards,
Erik

[1] http://lists.w3.org/Archives/Public/www-tag/2012Apr/0009.html
[2] http://tools.ietf.org/html/draft-ietf-httpbis-p2-semantics-22#page-16


>>
>> Implications:
>>
>> This means that when hashless (aka slash) HTTP URIs are used to denote entities, a client can use the value of the Content-Location response header to distinguish the URI that denotes an Entity Description Document (Descriptor) from the URI of the Entity described by that document. Thus, if a client de-references the URI <http://dbpedia.org/resource/Barack_Obama> and gets a 200 OK from the server combined with <http://dbpedia.org/page/Barack_Obama> in the Content-Location response header, the client (user agent) can infer the following:
>>
>> 1. <http://dbpedia.org/resource/Barack_Obama> denotes the real-world entity 'Barack Obama'.
>> 2. <http://dbpedia.org/page/Barack_Obama> denotes the Web Document that describes the real-world entity 'Barack Obama' -- by virtue of the fact that the server has explicitly *identified* said resource via the Content-Location header.
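
The inference described in the two points above could be sketched in
Python roughly as follows. This is only an illustration under stated
assumptions: classify_lookup, its signature, and its return shape are
hypothetical and do not come from ODE, DBpedia, or the HTTP draft. It
also takes the server's assertion at face value, whereas the draft text
quoted earlier says the assertion cannot be trusted unless verified by
other means, so a real client would want an additional check.

```python
from urllib.parse import urldefrag, urljoin


def classify_lookup(request_uri, status, headers):
    """Return (entity_uri, document_uri) for one dereference, or None.

    Hypothetical helper: the name, signature, and return shape are
    assumptions made for this sketch, not an API from the thread.
    """
    # Header field names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}

    if status == 303 and "location" in lowered:
        # Existing pattern: 303 See Other points at the description.
        return request_uri, urljoin(request_uri, lowered["location"])

    if status == 200 and "content-location" in lowered:
        # Content-Location may be a relative reference; resolve it.
        document_uri = urljoin(request_uri, lowered["content-location"])
        # Only a URI different from the request URI triggers the heuristic.
        if urldefrag(document_uri).url != urldefrag(request_uri).url:
            return request_uri, document_uri

    # No basis for telling the entity apart from its description.
    return None


if __name__ == "__main__":
    print(classify_lookup(
        "http://dbpedia.org/resource/Barack_Obama",
        200,
        {"Content-Location": "http://dbpedia.org/page/Barack_Obama"},
    ))
```

The sketch works on an already-received status code and header mapping
rather than performing the request itself, so it can sit behind
whichever HTTP library a client already uses.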
>>
>> Basically, the Toucan Affair [2][3][4] has now been incorporated into HTTP, thereby providing an alternative to 303 redirection, which has troubled/challenged many folks trying to exploit Linked Data via hashless HTTP URIs.
>>
>> Implementations:
>>
>> As per my comments in the Toucan Affair thread, our ODE [5] Linked Data client has always supported this heuristic. In addition, I am going to propose implementing this heuristic in DBpedia, which will simply have the net effect of not sending a 303 to user agents that look up URIs in this particular Linked Data space.
>>
>> Linked Data Client implementation suggestions:
>>
>> I encourage clients to support this heuristic in addition to 303 for Linked Data URI disambiguation. Implementation costs are minimal, while the upside is extremely high for Linked Data comprehension, appreciation, and adoption.
>>
>> Links:
>>
>> 1. http://tools.ietf.org/html/draft-ietf-httpbis-p2-semantics-22#page-15
>> 2. http://blog.iandavis.com/2010/11/04/is-303-really-necessary/ -- "Is 303 Really Necessary?" post by Ian Davis
>> 3. http://lists.w3.org/Archives/Public/public-lod/2010Nov/0090.html -- mailing list thread
>> 4. http://linkeddata.uriburner.com/about/html/http/iandavis.com/2010/303/toucan -- example of heuristic handling
>> 5. http://ode.openlinksw.com -- ODE Linked Data consumer service, bookmarklets, and cross-browser extensions
>> 6. http://bit.ly/YxW21k -- Illustrating Semiotic Triangle using DBpedia's Linked Data URIs
>>
>> --
>>
>> Regards,
>>
>> Kingsley Idehen
>> Founder & CEO
>> OpenLink Software
>> Company Web: http://www.openlinksw.com
>> Personal Weblog: http://www.openlinksw.com/blog/~kidehen
>> Twitter/Identi.ca handle: @kidehen
>> Google+ Profile: https://plus.google.com/112399767740508618350/about
>> LinkedIn Profile: http://www.linkedin.com/in/kidehen
>>
>
>
> --
> Mo McRoberts - Analyst - BBC Archive Development,
> Zone 1.08, BBC Scotland, 40 Pacific Quay, Glasgow G51 1DA,
> MC3 D4, Media Centre, 201 Wood Lane, London W12 7TQ,
> 0141 422 6036 (Internal: 01-26036) - PGP key CEBCF03E
>
Received on Monday, 25 March 2013 09:08:14 UTC