Re: 2NN Contents Of Related (303 Shortcut) from Roy T. Fielding on 2014-09-05 (ietf-http-wg@w3.org from July to September 2014)

From: Roy T. Fielding <fielding@gbiv.com>
Date: Fri, 5 Sep 2014 10:54:30 -0700
To: Sandro Hawke <sandro@w3.org>
Cc: Martin Thomson <martin.thomson@gmail.com>, Eric Prud'hommeaux <eric@w3.org>, Mark Nottingham <mnot@mnot.net>, HTTP Working Group <ietf-http-wg@w3.org>, "Julian F. Reschke" <julian.reschke@gmx.de>
Message-Id: <03BF736A-ED87-4F1D-9799-5A7A2269237B@gbiv.com>
On Sep 4, 2014, at 8:18 PM, Sandro Hawke wrote:

> Thanks for the thorough response, Roy.   Details inline.
> 
> On 09/04/2014 09:14 PM, Roy T. Fielding wrote:
>> On Sep 4, 2014, at 3:02 PM, Sandro Hawke wrote:
>> 
>>> On 09/04/2014 01:50 PM, Martin Thomson wrote:
>>>> On 2 September 2014 08:00, Eric Prud'hommeaux <eric@w3.org> wrote:
>>>>> We could ask questions like "Is /Index?page=1 a representation of
>>>>> /Index ?" and "What is the subject of the metadata in a 200+CL, the
>>>>> effective request URI or the CL?" The end result of these is that we
>>>>> evaluate the use cases for 303.
>>>> I think that saying we end up re-evaluating the need for 303 is
>>>> drawing a pretty long bow.
>>>> 
>>>> Why don't we talk instead about semantics.  What semantic distinction
>>>> are you looking to make?  There's a functional pattern you are looking
>>>> to enable (request this, get that instead, don't pay extra round
>>>> trips), but that pattern is supported by 200+CL.
>>>> 
>>> It's a good question, and I'm not sure we have a great answer. Mostly, we want it to be possible for there to be semantic distinctions.   We're building infrastructure, not applications, so the distinctions are likely to be made at other layers.
>> Given resources A and B,
>> 
>>   GET A -> 200 OK, CL: B
>>     implies that the payload is both a representation of A and of B
>>     (assuming the origin server is authoritative for both).
> 
> Assuming you're right about this, and I agree you are, isn't that sufficient to answer Mark Nottingham's question of why not use 200+CL?   In our use cases, the payload is not a representation of A.

Yes, they have distinct meanings.  That doesn't mean the distinction is
useful, but it is the answer to his question.

>>   GET A -> 303 See Other, Location: B
>>     implies that we don't have a representation of A, but B is interesting
>>     too so you might want to go over and get that if you haven't already.
> 
> Right.
> 
>>   GET A -> 2NN Related, CL: B
> 
> Minor detail: in the I-D it's the Location header, not the Content-Location header.  This is to allow the Content-Location header to still be used with its normal Con-Neg purpose, as shown in the example in the draft.
> 
> See: http://tools.ietf.org/html/draft-prudhommeaux-http-status-2nn-00

The I-D is wrong.  Please fix it.  CL is a link assertion that isn't so
much about conneg, but rather that conneg is one way to cause that
assertion to exist.  2NN would be another such way. They cannot conflict.

>>     implies that we don't have a representation of A, but we know what you
>>     really wanted (better than you) so here is a representation of B.
> 
> If I understand right, your "better than you" comment is about how there might already be a cache of B, and with 2NN that cached copy won't be used.  I agree that's a weakness in this proposal, although in all the scenarios I've seen discussed, it's unlikely B would be cached unless the 2NN response to A also was, so this weakness wouldn't be observed in practice.

2NN responses cannot be cached by existing caches.  303 responses can, as
can the 200 responses after the redirection.  That's why we redirect (well,
that, and the fact that the destination might be in a different administrative
domain with its own authority and potential access controls).

>> Now, here's the problem:
>> 
>> "It's a round trip short cut!"  No, because it won't be cached.
> 
> As above, in the scenarios we're looking at, it's unlikely B will be cached unless the 2NN response for A is also.

2NN responses will not be cached because HTTP forbids caching of unknown
status codes.  It might be cached by the special-purpose tool's cache,
but that is only one of the potential opportunities to cache.  Both the
2NN and the redirected 200 response (via 303) are lost to intermediate
caching and the resource itself has to be marked as Vary: Prefer.
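
A minimal sketch of the caching point above, assuming a toy shared cache
that stores a response only when it understands the status code and the
response carries explicit freshness information. The set of understood
codes, the function name may_store, and the value 299 used as a stand-in
for the unassigned 2NN code are all illustrative assumptions, not taken
from the draft or from any real cache implementation.

```python
# Toy model of a shared cache's "may I store this?" decision.
# Assumption: this cache predates 2NN, so 2NN is not in the understood set.
UNDERSTOOD_STATUS_CODES = {200, 203, 204, 300, 301, 303, 404, 410}

# Hypothetical stand-in: the draft does not assign a number to 2NN.
HYPOTHETICAL_2NN = 299


def may_store(status, headers):
    """Return True if this toy cache would store the response."""
    if status not in UNDERSTOOD_STATUS_CODES:
        # An unknown status code is never stored, whatever the headers say.
        return False
    cache_control = headers.get("Cache-Control", "").lower()
    directives = {
        part.strip().split("=")[0]
        for part in cache_control.split(",")
        if part.strip()
    }
    if "no-store" in directives or "private" in directives:
        return False
    # Conservative simplification: require explicit freshness information.
    has_freshness = (
        "max-age" in directives
        or "s-maxage" in directives
        or "Expires" in headers
    )
    return has_freshness


# The 303 and the 200 that follows it can both be stored; the 2NN cannot.
print(may_store(303, {"Cache-Control": "max-age=60", "Location": "/B"}))
print(may_store(200, {"Cache-Control": "max-age=60"}))
print(may_store(HYPOTHETICAL_2NN, {"Cache-Control": "max-age=60"}))
```

Under these assumptions, the same freshness headers make the 303 and the
200 storable by an intermediary, while the 2NN response is dropped purely
because of its status code.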

Performance has to be evaluated from the perspective of systems design,
not individual requests.

>> 303 round trips to the same server are almost entirely free in HTTP/1.1
>> because of persistent connections,
> 
> Free of connection setup overhead, but they still cost a second round-trip delay.
> 
> In LDP applications, these calls are more like RPC than like displaying a web-page, so milliseconds might possibly count more than they do in more common existing applications.

No, they don't.  A web page is far more latency sensitive than any LDP
application ever deployed that makes HTTP requests, and yet a typical
web page consists of dozens of round trips to multiple servers.  The practical
impact of a single round trip per extremely rare 303 response is far less
than the impact of adding 40 or so bytes to the critical path of every
request in the form of Prefer header fields.  If the LDP knows it is going
to get a lot of 2NN responses (i.e., it is not sending requests to the open
Web), then it shouldn't be sending those requests in the first place.

>>  so what you are really short-cutting
>> here is the chance for the user's cache to discover it already has a
>> representation of that other resource they didn't actually request and
>> might not even be interested in retrieving.
>> 
>> "Oh, but we know the user always wants that other resource because this
>> is a semantic web system!"  Then do yourself a favor and use templated
>> assertions to define links between those semanticky resources and the
>> descriptions the user really wanted in the first place, and just skip the
>> first request entirely.
>> 
>>    urn:world:{stuff} -> describedBy ("http://encyclopedia/query?{stuff}")
> 
> The actual driving force behind this I-D is not about using 303s to deal with httpRange-14, it's to deal with paging.  That is, the client does a GET on A, including these request headers:
> 
>   Prefer: contents-of-related
>   Prefer: return=representation; max-triple-count="100"
> 
> and now the server can directly provide the first hundred triples, via a representation of B, which is the first "page" of A.

The first hundred triples is a representation of resource A.
There is no requirement, anywhere, that representations be complete.
Prefer in this case is just another form of content negotiation and
the response is 200.  Responding 303 in this case would be wrong,
as would 2NN.

> So templates won't help with that.

It doesn't seem to be a relevant question.  If the client knows enough to
send Prefer, then it can also know what CL means.  2NN should not be sent
in this case.
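
One way to picture the 200 answer described above is a handler that treats
the Prefer value as a negotiation hint, returns a partial representation of
the requested resource with status 200, and names the page in
Content-Location. This is only a sketch: the function names, the in-memory
list of triples, and the "?page=1" URI shape are assumptions made for
illustration.

```python
import re


def parse_max_triple_count(prefer_values):
    """Return the max-triple-count found in the Prefer values, or None."""
    for value in prefer_values:
        match = re.search(r'max-triple-count\s*=\s*"?(\d+)"?', value)
        if match:
            return int(match.group(1))
    return None


def get_resource(path, triples, prefer_values):
    """Answer GET on `path` with 200 and a possibly partial representation."""
    limit = parse_max_triple_count(prefer_values)
    # The response depends on Prefer, so downstream caches are told so.
    headers = {"Vary": "Prefer"}
    if limit is None or limit >= len(triples):
        return 200, headers, list(triples)
    # Hypothetical page URI; the payload is still a representation of `path`.
    headers["Content-Location"] = path + "?page=1"
    return 200, headers, list(triples[:limit])


triples = [("subject", "predicate", n) for n in range(250)]
prefer = ["contents-of-related",
          'return=representation; max-triple-count="100"']
status, headers, body = get_resource("/Index", triples, prefer)
print(status, headers, len(body))
```

A client that sends no Prefer field gets the full representation and no
Content-Location in this sketch, so existing clients see no change.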

> As an aside, is there a standard way to publish the kind of template assertion you provide above?   That would be useful for other things, indeed.

It is just a translation table with two URI Templates, like a combo of

   http://www.w3.org/People/Fielding/draft-ietf-uri-roy-urn-urc-00.txt

and

   http://tools.ietf.org/html/rfc6570
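
As a rough illustration of such a translation table, the sketch below pairs
a source template with a target template and handles only simple {name}
variables, which is a small subset of RFC 6570. The table contents reuse
the urn:world example from earlier in this thread; the function names and
the lookup strategy are assumptions, not a published mechanism.

```python
import re
from urllib.parse import quote


def template_to_regex(template):
    """Compile a template with simple {name} variables into a regex."""
    # Splitting on a capturing group alternates literal text and names.
    parts = re.split(r"\{(\w+)\}", template)
    pattern = ""
    for index, part in enumerate(parts):
        if index % 2:
            pattern += f"(?P<{part}>.+)"   # variable name
        else:
            pattern += re.escape(part)     # literal text
    return re.compile("^" + pattern + "$")


def expand(template, variables):
    """Fill {name} variables, percent-encoding each value."""
    return re.sub(
        r"\{(\w+)\}",
        lambda m: quote(str(variables[m.group(1)]), safe=""),
        template,
    )


# Translation table: (source template, describedBy target template).
TRANSLATIONS = [
    ("urn:world:{stuff}", "http://encyclopedia/query?{stuff}"),
]


def described_by(uri):
    """Return the description URI for `uri`, or None if no template matches."""
    for source, target in TRANSLATIONS:
        match = template_to_regex(source).match(uri)
        if match:
            return expand(target, match.groupdict())
    return None


print(described_by("urn:world:paris"))
```

With a table like this on the client side, the request for the description
can be made directly, without first asking for the urn:world resource.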

>> "Oh, but this isn't *just* a semantic web system -- we expect this to be
>> implemented by the entire Web!"  Well, then you don't know that the user
>> really wants to get a 2NN instead of a 303,
> 
> We know the client wants a 2NN instead of a 303 because of the "Prefer: contents-of-related" headers.  Without that, or some other indication not currently standardized, 2NN won't be sent.

Okay, then include that in your costs.

>> and thus you are favoring
>> your own club of implementations over the general performance value
>> (and benefit to everyone on the Internet) of reusing cached representations.
>> Maybe that's when Prefer should be used instead.
>> 
>>> Some possible distinctions that come to mind:
>>> 
>>> - search engines / indexing services -- these systems index which URLs provide content containing particular data items.   These systems are indexed by the URL; should they index the request URL or the CL?
>> I doubt that search engines index content of arbitrary status codes.
> 
> Existing search engines won't get a 2NN since they won't be including that Prefer header.   New data search engines will be able to use it if they want.
> 
>> They do index the destinations of redirects, and they typically associate
>> aliased content with the most-linked URL.  2NN doesn't help at all.
> 
> Again, the only point of 2NN is to help with the round-trip time. In the scenarios we're looking at, where there are many requests back to back over the same HTTP connection, the second round-trip can cut performance in half.

If the requests are always resulting in 303s, then you are requesting the
wrong things.  This is just like when XML parser developers complained that
the W3C site was slowing down their document parsing because it couldn't
handle dynamic requests for static DTDs.  The right solution was to fix the
tools.

....Roy
Received on Friday, 5 September 2014 17:54:54 UTC