Re: 2NN Contents Of Related (303 Shortcut) from Sandro Hawke on 2014-09-05 (ietf-http-wg@w3.org from July to September 2014)

From: Sandro Hawke <sandro@w3.org>
Date: Thu, 04 Sep 2014 23:18:24 -0400
To: "Roy T. Fielding" <fielding@gbiv.com>
CC: Martin Thomson <martin.thomson@gmail.com>, Eric Prud'hommeaux <eric@w3.org>, Mark Nottingham <mnot@mnot.net>, HTTP Working Group <ietf-http-wg@w3.org>, "Julian F. Reschke" <julian.reschke@gmx.de>
Message-ID: <54092B80.2060606@w3.org>
Thanks for the thorough response, Roy.   Details inline.

On 09/04/2014 09:14 PM, Roy T. Fielding wrote:
> On Sep 4, 2014, at 3:02 PM, Sandro Hawke wrote:
>
>> On 09/04/2014 01:50 PM, Martin Thomson wrote:
>>> On 2 September 2014 08:00, Eric Prud'hommeaux <eric@w3.org> wrote:
>>>> We could ask questions like "Is /Index?page=1 a representation of
>>>> /Index ?" and "What is the subject of the metadata in a 200+CL, the
>>>> effective request URI or the CL?" The end result of these is that we
>>>> evaluate the use cases for 303.
>>> I think that saying we end up re-evaluating the need for 303 is
>>> drawing a pretty long bow.
>>>
>>> Why don't we talk instead about semantics.  What semantic distinction
>>> are you looking to make?  There's a functional pattern you are looking
>>> to enable (request this, get that instead, don't pay extra round
>>> trips), but that pattern is supported by 200+CL.
>>>
>> It's a good question, and I'm not sure we have a great answer. Mostly, we want it to be possible for there to be semantic distinctions.   We're building infrastructure, not applications, so the distinctions are likely to be made at other layers.
> Given resources A and B,
>
>    GET A -> 200 OK, CL: B
>      implies that the payload is both a representation of A and of B
>      (assuming the origin server is authoritative for both).

Assuming you're right about this, and I agree you are, isn't that 
sufficient to answer Mark Nottingham's question of why not use 200+CL?   
In our use cases, the payload is not a representation of A.

>    GET A -> 303 See Other, Location: B
>      implies that we don't have a representation of A, but B is interesting
>      too so you might want to go over and get that if you haven't already.

Right.

>    GET A -> 2NN Related, CL: B

Minor detail: in the I-D it's the Location header, not the 
Content-Location header.  This is to allow the Content-Location header 
to still be used with its normal Con-Neg purpose, as shown in the 
example in the draft.

See: http://tools.ietf.org/html/draft-prudhommeaux-http-status-2nn-00

>      implies that we don't have a representation of A, but we know what you
>      really wanted (better than you) so here is a representation of B.

If I understand right, your "better than you" comment is about how there 
might already be a cached copy of B, and with 2NN that cached copy won't 
be used.  I agree that's a weakness in this proposal, although in all the 
scenarios I've seen discussed, it's unlikely B would be cached unless 
the 2NN response to A also was, so this weakness wouldn't be observed in 
practice.
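
To make the three cases concrete, here is a rough Python sketch of how a 
client might work out which resource(s) a payload represents.  The 
function name and the number 299 standing in for "2NN" are assumptions 
for illustration only; the draft does not assign a number, and the 2NN 
case uses Location as in the I-D.

```python
# Hypothetical stand-in for the unassigned "2NN" status code (assumption).
STATUS_2NN = 299


def payload_describes(request_uri, status, headers):
    """Return the set of URIs the response payload is a representation of.

    `headers` is a plain dict keyed by exact header name (a simplification;
    real HTTP header names are case-insensitive).
    """
    if status == 200:
        # 200 + Content-Location: payload represents both A and B,
        # assuming the origin server is authoritative for both.
        described = {request_uri}
        content_location = headers.get("Content-Location")
        if content_location:
            described.add(content_location)
        return described
    if status == 303:
        # No representation of A; B is merely somewhere else to look.
        return set()
    if status == STATUS_2NN:
        # No representation of A; the payload is a representation of B.
        return {headers["Location"]}
    raise ValueError(f"status {status} is not covered by this sketch")
```

The point of the distinction: only in the 200 case does the payload count 
as a representation of A.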

> Now, here's the problem:
>
> "It's a round trip short cut!"  No, because it won't be cached.

As above, in the scenarios we're looking at, it's unlikely B will be 
cached unless the 2NN response for A is also.

> 303 round trips to the same server are almost entirely free in HTTP/1.1
> because of persistent connections,

Free of connection setup overhead, but they still cost a second 
round-trip delay.

In LDP applications, these calls are more like RPC than like displaying 
a web page, so milliseconds may count for more than they do in more 
common existing applications.

>   so what you are really short-cutting
> here is the chance for the user's cache to discover it already has a
> representation of that other resource they didn't actually request and
> might not even be interested in retrieving.
>
> "Oh, but we know the user always wants that other resource because this
> is a semantic web system!"  Then do yourself a favor and use templated
> assertions to define links between those semanticky resources and the
> descriptions the user really wanted in the first place, and just skip the
> first request entirely.
>
>     urn:world:{stuff} -> describedBy ("http://encyclopedia/query?{stuff}")

The actual driving force behind this I-D is not about using 303s to deal 
with httpRange-14, it's to deal with paging.  That is, the client does a 
GET on A, including these request headers:

    Prefer: contents-of-related
    Prefer: return=representation; max-triple-count="100"

and now the server can directly provide the first hundred triples, via a 
representation of B, which is the first "page" of A.

So templates won't help with that.
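
As a rough illustration of the paging request above, here is a Python 
sketch that reads the two Prefer headers and cuts the first page.  The 
function names are made up for this example, and the parsing is a 
deliberate simplification: it flattens preferences and their parameters 
into one dict and does not handle quoted strings containing ";" or ",".

```python
def parse_prefer(header_values):
    """Flatten Prefer header values into a {name: value} dict (simplified)."""
    prefs = {}
    for value in header_values:
        # Commas separate preferences; semicolons separate parameters.
        for chunk in value.replace(",", ";").split(";"):
            chunk = chunk.strip()
            if not chunk:
                continue
            name, _, val = chunk.partition("=")
            prefs[name.strip().lower()] = val.strip().strip('"')
    return prefs


def first_page(triples, prefs):
    """Return the first "page" of triples, honoring max-triple-count if set."""
    limit = int(prefs.get("max-triple-count", len(triples)))
    return triples[:limit]


prefs = parse_prefer([
    "contents-of-related",
    'return=representation; max-triple-count="100"',
])
page = first_page(list(range(250)), prefs)  # integers stand in for triples
```

Here `page` would be the representation of B, i.e. the first hundred 
items of A.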

As an aside, is there a standard way to publish the kind of template 
assertion you provide above?   That would be useful for other things, 
indeed.


> "Oh, but this isn't *just* a semantic web system -- we expect this to be
> implemented by the entire Web!"  Well, then you don't know that the user
> really wants to get a 2NN instead of a 303,

We know the client wants a 2NN instead of a 303 because of the "Prefer: 
contents-of-related" header.  Without that, or some other indication 
not currently standardized, 2NN won't be sent.
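
A sketch of that server-side rule in Python, under the assumption that 
the only signal checked is the Prefer header.  The names and the number 
299 standing in for "2NN" are hypothetical; the draft does not assign a 
number.

```python
import re

# Hypothetical stand-in for the unassigned "2NN" status code (assumption).
STATUS_2NN = 299


def wants_contents_of_related(prefer_values):
    """True if any Prefer header value carries contents-of-related."""
    for value in prefer_values:
        for token in re.split(r"[;,]", value):
            if token.strip().lower() == "contents-of-related":
                return True
    return False


def choose_response(prefer_values, related_uri, related_body):
    """Return (status, headers, body) for a GET on A whose answer lives at B."""
    if wants_contents_of_related(prefer_values):
        # The client opted in: send B's representation directly.
        return STATUS_2NN, {"Location": related_uri}, related_body
    # No opt-in: fall back to the ordinary 303 redirect, with no payload.
    return 303, {"Location": related_uri}, None
```

A client that never sends the preference keeps getting plain 303s, so 
existing software sees no change.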

> and thus you are favoring
> your own club of implementations over the general performance value
> (and benefit to everyone on the Internet) of reusing cached representations.
> Maybe that's when Prefer should be used instead.
>
>> Some possible distinctions that come to mind:
>>
>> - search engines / indexing services -- these systems index which URLs provide content containing particular data items.   These systems are indexed by the URL; should they index the request URL or the CL?
> I doubt that search engines index content of arbitrary status codes.

Existing search engines won't get a 2NN since they won't be including 
that Prefer header.   New data search engines will be able to use it if 
they want.

> They do index the destinations of redirects, and they typically associate
> aliased content with the most-linked URL.  2NN doesn't help at all.

Again, the only point of 2NN is to help with the round-trip time. In the 
scenarios we're looking at, where there are many requests back to back 
over the same HTTP connection, the second round-trip can cut performance 
in half.
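
The "in half" figure is back-of-the-envelope arithmetic.  A minimal 
Python sketch, assuming strictly sequential requests on one persistent 
connection and negligible server processing and transfer time (the 50 ms 
round-trip time is an arbitrary example value, not a measurement):

```python
def requests_per_second(rtt_seconds, round_trips_per_request):
    """Throughput of back-to-back requests when latency dominates."""
    return 1.0 / (rtt_seconds * round_trips_per_request)


rtt = 0.05  # assumed round-trip time in seconds

with_2nn = requests_per_second(rtt, 1)  # GET A -> 2NN with B's content
with_303 = requests_per_second(rtt, 2)  # GET A -> 303, then GET B -> 200

slowdown = with_2nn / with_303  # how many times slower the 303 path is
```

Any real server time or payload transfer time shrinks that ratio, so two 
is the worst case under these assumptions.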

I'm going to skip the point-by-point on the following bits for now, 
unless you think they're still relevant given what I've said above.   
They were aimed at answering why 200+CL wasn't a solution.

>> - endorsement -- what we now see in social systems as Like/+1/star -- where the user sees something and gives it some kind of mark of approval.  Is that mark on just the first page of items, or the whole set?    Which URL should be considered endorsed?  It's possible the user will be frustrated, or worse, if they meant to mark one and the other was considered marked.
> 2NN is not going to help you there.  You can't tell if they like a picture,
> the person in the picture, the dog the person is holding in the picture,
> or the sweater on the dog the person is holding in the picture.  The user
> doesn't care either way -- they just want the owner to like them back.
> The only ones who really care are the advertisers trying to figure out
> which of those things the user might actually be interested in buying.
>
>> - link rel=alternate -- is that an alternate for this page or the whole thing?   Some alternates might be paged differently, so maybe it doesn't make sense for the page.
> Link relations are supposed to define what the relation applies to.
> The response isn't going to make any difference.
>
>> - link rel=copyright -- if the different items have different copyright, there might be multiple of these links to cover them all, and it will be different depending whether this is talking about the paged resource or just a page
> Copyright links point to a description of the copyright for an expression
> (the selected representation), but that description might encompass an
> entire site of resources (e.g., it might be a database of copyright info).
>
>> - link rel=next/prev -- at first glance these obviously are about the CL not the requested resources, but what if the requested resource were itself in some kind of sequence?    Or what if the redirect were for some reason other than paging?
> They are defined as navigation links.  What does paging have to do with
> any of this?  Whether "/Index?page=1" and "/Index" are the same or
> (more likely) different resources is not going to be discoverable by
> one look at their representations.  The response will be 200, regardless.
>
> I have no doubt that we'll end up with a 2NN code anyway, because it is
> easier to mint new codes than to explain why servers shouldn't use them.

I hope I speak for the entire LDP Working Group when I say we'd be 
thrilled to drop this proposal if we saw a viable alternative using 
existing standards.

          -- Sandro

>
> ....Roy
>
>
Received on Friday, 5 September 2014 03:18:34 UTC