Re: 2NN Contents Of Related (303 Shortcut) from Eric Prud'hommeaux on 2014-09-05 (ietf-http-wg@w3.org from July to September 2014)

From: Eric Prud'hommeaux <eric@w3.org>
Date: Fri, 5 Sep 2014 16:00:23 -0400
To: "Roy T. Fielding" <fielding@gbiv.com>
Cc: Sandro Hawke <sandro@w3.org>, Martin Thomson <martin.thomson@gmail.com>, Mark Nottingham <mnot@mnot.net>, HTTP Working Group <ietf-http-wg@w3.org>, "Julian F. Reschke" <julian.reschke@gmx.de>
Message-ID: <20140905200007.GA20351@w3.org>
* Roy T. Fielding <fielding@gbiv.com> [2014-09-05 10:54-0700]
> On Sep 4, 2014, at 8:18 PM, Sandro Hawke wrote:
> 
> > Thanks for the thorough response, Roy.   Details inline.
> > 
> > On 09/04/2014 09:14 PM, Roy T. Fielding wrote:
> >> On Sep 4, 2014, at 3:02 PM, Sandro Hawke wrote:
> >> 
> >>> On 09/04/2014 01:50 PM, Martin Thomson wrote:
> >>>> On 2 September 2014 08:00, Eric Prud'hommeaux <eric@w3.org> wrote:
> >>>>> We could ask questions like "Is /Index?page=1 a representation of
> >>>>> /Index ?" and "What is the subject of the metadata in a 200+CL, the
> >>>>> effective request URI or the CL?" The end result of these is that we
> >>>>> evaluate the use cases for 303.
> >>>> I think that saying we end up re-evaluating the need for 303 is
> >>>> drawing a pretty long bow.
> >>>> 
> >>>> Why don't we talk instead about semantics.  What semantic distinction
> >>>> are you looking to make?  There's a functional pattern you are looking
> >>>> to enable (request this, get that instead, don't pay extra round
> >>>> trips), but that pattern is supported by 200+CL.
> >>>> 
> >>> It's a good question, and I'm not sure we have a great answer. Mostly, we want it to be possible for there to be semantic distinctions.   We're building infrastructure, not applications, so the distinctions are likely to be made at other layers.
> >> Given resources A and B,
> >> 
> >>   GET A -> 200 OK, CL: B
> >>     implies that the payload is both a representation of A and of B
> >>     (assuming the origin server is authoritative for both).
> > 
> > Assuming you're right about this, and I agree you are, isn't that sufficient to answer Mark Nottingham's question of why not use 200+CL?   In our use cases, the payload is not a representation of A.
> 
> Yes, they have distinct meanings.  That doesn't mean the distinction is
> useful, but it is the answer to his question.
> 
> >>   GET A -> 303 See Other, Location: B
> >>     implies that we don't have a representation of A, but B is interesting
> >>     too so you might want to go over and get that if you haven't already.
> > 
> > Right.
> > 
> >>   GET A -> 2NN Related, CL: B
> > 
> > Minor detail: in the I-D it's the Location header, not the Content-Location header.  This is to allow the Content-Location header to still be used with its normal Con-Neg purpose, as shown in the example in the draft.
> > 
> > See: http://tools.ietf.org/html/draft-prudhommeaux-http-status-2nn-00
> 
> The I-D is wrong.  Please fix it.  CL is a link assertion that isn't so
> much about conneg, but rather that conneg is one way to cause that
> assertion to exist.  2NN would be another such way. They cannot conflict.

In my reply to Sandro, I attempted to demonstrate how they are both
used in a complementary way, Location identifying the target of a 303
redirect and C-L identifying the negotiated representation of that
Location. I understand that there are other variants which may affect
the C-L, but I don't see why it's wrong that 2NN preserves the meaning
of both of those headers.
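
A minimal sketch of the three readings discussed above, assuming the I-D's use of Location for 2NN. The function name payload_represents and the string "2NN" (standing in for an unassigned status code) are illustrative only and do not come from any specification.

```python
from typing import Dict, List


def payload_represents(request_uri: str, status: str,
                       headers: Dict[str, str]) -> List[str]:
    """Return the URIs the response payload is asserted to represent.

    "2NN" is a placeholder string for the proposed, unassigned status code.
    """
    content_location = headers.get("Content-Location")

    if status == "200":
        # 200 + C-L: the payload represents the requested resource (A)
        # and also the resource named by Content-Location (B).
        represented = [request_uri]
        if content_location and content_location != request_uri:
            represented.append(content_location)
        return represented

    if status == "303":
        # 303: no representation of A is supplied; Location only says
        # where to look next, which costs a second request.
        return []

    if status == "2NN":
        # Reading taken from the I-D: the payload represents the Location
        # target (B), not A. Content-Location stays free to name the
        # negotiated variant of B.
        represented = [headers["Location"]]
        if content_location and content_location not in represented:
            represented.append(content_location)
        return represented

    raise ValueError(f"status {status!r} is not covered by this sketch")
```

Under this reading a 2NN response to GET /A with Location: /B and Content-Location: /B.ttl describes /B and /B.ttl but never /A, which is the distinction from 200+C-L.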


> >>     implies that we don't have a representation of A, but we know what you
> >>     really wanted (better than you) so here is a representation of B.
> > 
> > If I understand right, your "better than you" comment is about how there might already be a cache of B, and with 2NN that cached copy won't be used.  I agree that's a weakness in this proposal, although in all the scenarios I've seen discussed, it's unlikely B would be cached unless the 2NN response to A also was, so this weakness wouldn't be observed in practice.
> 
> 2NN responses cannot be cached by existing caches.  303 responses can, as
> can the 200 responses after the redirection.  That's why we redirect (well,
> that, and the fact that the destination might be in a different administrative
> domain with its own authority and potential access controls).
> 
> >> Now, here's the problem:
> >> 
> >> "It's a round trip short cut!"  No, because it won't be cached.
> > 
> > As above, in the scenarios we're looking at, it's unlikely B will be cached unless the 2NN response for A is also.
> 
> 2NN responses will not be cached because HTTP forbids caching of unknown
> status codes.  It might be cached by the special-purpose tool's cache,
> but that is only one of the potential opportunities to cache.  Both the
> 2NN and the redirected 200 response (via 303) are lost to intermediate
> caching and the resource itself has to be marked as Vary: Prefer.
> 
> Performance has to be evaluated from the perspective of systems design,
> not individual requests.

I agree that both have to be taken into account. I suspect that this
situation is self-correcting. If 2NN represents a very small fraction
of web traffic, then there's little impact and little incentive for
proxies to become 2NN-aware. The cost of using 2NN is evaluated by the
clients and they can send the Prefer or not.

If deployment takes the web by storm (as I'm sure we're all
anticipating), there will be incentive for people to program proxies
which will cache 2NN. They may even want the proxy to send an out-of-
band GET to verify the final resource, which I admit is not fabulous
traffic-wise, but still appears to be an optimum for responsiveness on
the initial request and cache hits for the subsequent requests.
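
A minimal sketch of the caching behaviour under discussion: a cache that refuses to store status codes it does not understand, next to a hypothetical 2NN-aware one. The class TinyCache, the string "2NN", and the rule "store only with an explicit max-age" are simplifying assumptions, not a description of any real proxy.

```python
from typing import Dict, Iterable, Tuple

# Status codes a legacy cache in this sketch understands (illustrative subset).
LEGACY_KNOWN = frozenset({"200", "303"})


class TinyCache:
    """Toy shared cache keyed by request URI."""

    def __init__(self, known_status: Iterable[str] = LEGACY_KNOWN) -> None:
        self.known = set(known_status)
        self.store: Dict[str, Tuple[str, Dict[str, str], bytes]] = {}

    def maybe_store(self, uri: str, status: str,
                    headers: Dict[str, str], body: bytes) -> bool:
        # A cache must not store a response whose status code it does
        # not understand, so an unaware cache drops 2NN here.
        if status not in self.known:
            return False
        # Simplification: require explicit freshness information.
        if "max-age" not in headers.get("Cache-Control", ""):
            return False
        self.store[uri] = (status, headers, body)
        return True


fresh = {"Cache-Control": "max-age=60", "Location": "/B"}

legacy = TinyCache()
aware = TinyCache(LEGACY_KNOWN | {"2NN"})  # hypothetical upgraded proxy

legacy_303 = legacy.maybe_store("/A", "303", fresh, b"")
legacy_2nn = legacy.maybe_store("/A2", "2NN", fresh, b"page 1")
aware_2nn = aware.maybe_store("/A2", "2NN", fresh, b"page 1")
```

Here the legacy cache keeps the 303 but not the 2NN, while the upgraded one keeps both; whether anyone builds the upgraded one depends on the traffic share argued about above.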


> >> 303 round trips to the same server are almost entirely free in HTTP/1.1
> >> because of persistent connections,
> > 
> > Free of connection setup overhead, but they still cost a second round-trip delay.
> > 
> > In LDP applications, these calls are more like RPC than like displaying a web-page, so milliseconds might possibly count more than they do in more common existing applications.
> 
> No, they don't.  A web page is far more latency sensitive than any LDP
> application ever deployed that makes HTTP requests, and yet a typical
> web page consists of dozens of round trips to multiple servers.

A huge amount of effort has gone into progressive rendering to deal
with this sort of latency.

>                                                                  The practical
> impact of a single round trip per extremely rare 303 response is far less
> than the impact of adding 40 or so bytes to the critical path of every
> request in the form of Prefer header fields.  If the LDP knows it is going
> to get a lot of 2NN responses (i.e., it is not sending requests to the open
> Web), then it shouldn't be sending those requests in the first place.

I'm not sure I understand the last argument (which seems most germane
to our use cases). If LDP clients account for a small fraction of web
traffic, but most of their traffic involves 303s, the incentive to
optimize their traffic offsets the impact their traffic will have on
conventional proxies. (What's the deployment of caches today? I
imagine we could put numbers to this argument.)


> >>  so what you are really short-cutting
> >> here is the chance for the user's cache to discover it already has a
> >> representation of that other resource they didn't actually request and
> >> might not even be interested in retrieving.
> >> 
> >> "Oh, but we know the user always wants that other resource because this
> >> is a semantic web system!"  Then do yourself a favor and use templated
> >> assertions to define links between those semanticky resources and the
> >> descriptions the user really wanted in the first place, and just skip the
> >> first request entirely.
> >> 
> >>    urn:world:{stuff} -> describedBy ("http://encyclopedia/query?{stuff}")
> > 
> > The actual driving force behind this I-D is not about using 303s to deal with httpRange-14, it's to deal with paging.  That is, the client does a GET on A, including these request headers:
> > 
> >   Prefer: contents-of-related
> >   Prefer: return=representation; max-triple-count="100"
> > 
> > and now the server can directly provide the first hundred triples, via a representation of B, which is the first "page" of A.
> 
> The first hundred triples is a representation of resource A.
> There is no requirement, anywhere, that representations be complete.
> Prefer in this case is just another form of content negotiation and
> the response is 200.  Responding 303 in this case would be wrong,
> as would 2NN.

Is the second hundred triples also a representation of resource A?
Are they all resource A?

Perhaps your argument is that "Prefer" is the wrong header to signal
the client's ability to handle 2NN.
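
A minimal sketch of the opt-in as the draft describes it: the server answers 2NN only when the request carries "Prefer: contents-of-related", and otherwise falls back to 303. The helper names and the "2NN" placeholder string are assumptions, and the comma splitting is naive (it ignores commas inside quoted parameter values).

```python
from typing import Dict, Iterable, Set, Tuple


def preference_tokens(prefer_values: Iterable[str]) -> Set[str]:
    """Collect lower-cased preference names from Prefer header values."""
    tokens: Set[str] = set()
    for value in prefer_values:
        for pref in value.split(","):
            # Drop parameters (after ';') and any '=value' part.
            name = pref.split(";", 1)[0].split("=", 1)[0].strip().lower()
            if name:
                tokens.add(name)
    return tokens


def redirect_or_inline(prefer_values: Iterable[str],
                       related_uri: str) -> Tuple[str, Dict[str, str]]:
    """Choose between 303 and the proposed 2NN for a GET on resource A."""
    headers = {
        "Location": related_uri,
        # The response depends on the Prefer request header.
        "Vary": "Prefer",
    }
    if "contents-of-related" in preference_tokens(prefer_values):
        return "2NN", headers  # body would carry a representation of B
    return "303", headers      # client must make a second request


paging_request = [
    "contents-of-related",
    'return=representation; max-triple-count="100"',
]
opted_in = redirect_or_inline(paging_request, "/A?page=1")
plain = redirect_or_inline([], "/A?page=1")
```

A client that never sends the preference sees only the 303, so existing clients are unaffected under this assumption.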


> > So templates won't help with that.
> 
> It doesn't seem to be a relevant question.  If the client knows enough to
> send Prefer, then it can also know what CL means.  2NN should not be sent
> in this case.
> 
> > As an aside, is there a standard way to publish the kind of template assertion you provide above?   That would be useful for other things, indeed.
> 
> It is just a translation table with two URI Templates, like a combo of
> 
>    http://www.w3.org/People/Fielding/draft-ietf-uri-roy-urn-urc-00.txt
> 
> and
> 
>    http://tools.ietf.org/html/rfc6570
> 
> >> "Oh, but this isn't *just* a semantic web system -- we expect this to be
> >> implemented by the entire Web!"  Well, then you don't know that the user
> >> really wants to get a 2NN instead of a 303,
> > 
> > We know the client wants a 2NN instead of a 303 because of the "Prefer: contents-of-related" headers.  Without that, or some other indication not currently standardized, 2NN won't be sent.
> 
> Okay, then include that in your costs.

Fair enough, but 28 bytes is a small impact compared to a strictly
sequential extra round trip.


> >> and thus you are favoring
> >> your own club of implementations over the general performance value
> >> (and benefit to everyone on the Internet) of reusing cached representations.
> >> Maybe that's when Prefer should be used instead.
> >> 
> >>> Some possible distinctions that come to mind:
> >>> 
> >>> - search engines / indexing services -- these systems index which URLs provide content containing particular data items.   These systems are indexed by the URL; should they index the request URL or the CL?
> >> I doubt that search engines index content of arbitrary status codes.
> > 
> > Existing search engines won't get a 2NN since they won't be including that Prefer header.   New data search engines will be able to use it if they want.
> > 
> >> They do index the destinations of redirects, and they typically associate
> >> aliased content with the most-linked URL.  2NN doesn't help at all.
> > 
> > Again, the only point of 2NN is to help with the round-trip time. In the scenarios we're looking at, where there are many requests back to back over the same HTTP connection, the second round-trip can cut performance in half.
> 
> If the requests are always resulting in 303s, then you are requesting the
> wrong things.  This is just like when XML parser developers complained that
> the W3C site was slowing down their document parsing because it couldn't
> handle dynamic requests for static DTDs.  The right solution was to fix the
> tools.

Fair enough, but we need a fix. I'm not sure if you propose 200+C-L
(you appear to argue against it above) or that we discount use cases
like the Linked Data Platform which request resources which frequently
grow to be impractical to send in a single gulp.


> ....Roy
> 

-- 
-ericP

office: +1.617.599.3509
mobile: +33.6.80.80.35.59

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.

There are subtle nuances encoded in font variation and clever layout
which can only be seen by printing this message on high-clay paper.
Received on Friday, 5 September 2014 20:00:29 UTC