Re: draft of 209 proposal from Mark Nottingham on 2014-03-16 (www-tag@w3.org from March 2014)

From: Mark Nottingham <mnot@mnot.net>
Date: Sun, 16 Mar 2014 18:31:08 +1100
To: Eric Prud'hommeaux <eric@w3.org>
Cc: Tim Berners-Lee <timbl@w3.org>, TAG List <www-tag@w3.org>, Arnaud Hors <lehors@us.ibm.com>, Yves Lafon <ylafon@w3.org>, Philippe Le Hégaret <plh@w3.org>, Peter Linss <peter.linss@hp.com>, "Appelquist Daniel (UK)" <Daniel.Appelquist@telefonica.com>
Message-Id: <C5568096-2880-492C-AF2D-CCC1CD85A7FA@mnot.net>
On 16 Mar 2014, at 9:22 am, Eric Prud'hommeaux <eric@w3.org> wrote:
> 
> Jonathan Rees argues against this based on the philosophy of HTTP and I'll make that concrete

Thank you.

> with a paging example motivating that philosophy. Suppose github has users and replication partners. Replication partners can GET a large issue list but plebeian users get shunted off to paged access. By your proposal, GET <https://api.github.com/repos/w3c/web-platform-tests/issues> would provide inconsistent representations of that resource:
> 
> user: GET /issues => 200, Location: /issues?page=1
> repA: GET /issues Accept:text/turtle => 200, Location: /issues.ttl
> repB: GET /issues Accept:text/json => 200, Location: /issues.json

Content-Location, not Location. They're very different.

Aside - if you put all of these under the same URI, you're not going to get good cache efficiency, because you'll need to be doing Vary: Cookie or similar, and that's going to cause a lot of thrashing.
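To make the caching point concrete, here is a minimal sketch of a toy shared cache. It assumes, as a simplification, that the cache keys each stored response on the URL plus the values of the request headers named in Vary, and it ignores freshness, validation and header normalisation. The class, function and header values are hypothetical. With three media types, Vary: Accept produces three reusable entries. Vary: Cookie produces one entry per user, so no user's response can ever serve another user's request.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCache:
    """Hypothetical toy cache: one entry per (URL, values of the Vary'd request headers).

    A simplification of secondary-key matching for responses carrying Vary;
    it ignores freshness, validation and header-value normalisation.
    """
    entries: dict = field(default_factory=dict)

    def key(self, url, vary, request_headers):
        # Header field names are case-insensitive, so normalise them first.
        lowered = {k.lower(): v for k, v in request_headers.items()}
        return (url,) + tuple((name.lower(), lowered.get(name.lower())) for name in vary)

    def store(self, url, vary, request_headers, body):
        self.entries[self.key(url, vary, request_headers)] = body

    def lookup(self, url, vary, request_headers):
        return self.entries.get(self.key(url, vary, request_headers))


def fill(vary, requests):
    """Store one response for /issues per request; return the resulting entry count."""
    cache = ToyCache()
    for headers in requests:
        cache.store("/issues", vary, headers, b"...")
    return len(cache.entries)


# 1,000 users with distinct session cookies, but only three distinct Accept values.
requests = [
    {"Cookie": f"session={i}",
     "Accept": ["text/html", "text/turtle", "application/json"][i % 3]}
    for i in range(1000)
]

print(fill(["Accept"], requests))  # 3 entries, each reusable across many users
print(fill(["Cookie"], requests))  # 1000 entries, effectively one per user
```

Keying on the cookie is what turns a shared cache into a collection of private ones. That is the thrashing referred to above: entries churn without ever being reused.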


> REST says that /issues.ttl and /issues.json are representations of the same resource, as implied by a 200 + Content-Location, which is fine 'cause they have the same information. /issues?page=1 is markedly different, presenting only a piece of the requested resource.

I can see that. However, I strongly suspect that github (and anyone else) in this situation is going to just be giving the partners a different link (somehow; it could be via an API, or HTML, or a template, or a form, or...), rather than trying to jump through hoops by using a new status code (as well as having bad effects on caching; see above).


> POSTs and 303s relax the rules of Content-Location. 209 could as well, but relaxing them on 200 would be rather a surprise for REST.

Where do you see that (POST and 303 relaxing C-L)?


>>> GET->303 works fine but it requires two round trips. The purpose of 209 (2xx) is to avoid a round trip. This is expected to be used in high volume services in the Linked Data Platform.
>> 
>> Right, but what I'm saying is that you can achieve the desired effect with POST->200+Content-Location.
> 
> Sure, but I expect you wouldn't want us trying to guess which resources we should GET and which we should interrogate by POST.

Sorry, you lost me there. 

I'm referring to what's explained in p2-semantics 3.1.4.2:

>    If Content-Location is included in a 2xx (Successful) response
>    message and its field-value refers to a URI that differs from the
>    effective request URI, then the origin server claims that the URI is
>    an identifier for a different resource corresponding to the enclosed
>    representation.  Such a claim can only be trusted if both identifiers
>    share the same resource owner, which cannot be programmatically
>    determined via HTTP.
> 
>    o  For a response to a GET or HEAD request, this is an indication
>       that the effective request URI refers to a resource that is
>       subject to content negotiation and the Content-Location field-
>       value is a more specific identifier for the selected
>       representation.
> 
>    o  For a 201 (Created) response to a state-changing method, a
>       Content-Location field-value that is identical to the Location
>       field-value indicates that this payload is a current
>       representation of the newly created resource.
> 
>    o  Otherwise, such a Content-Location indicates that this payload is
>       a representation reporting on the requested action's status and
>       that the same report is available (for future access with GET) at
>       the given URI.  For example, a purchase transaction made via a
>       POST request might include a receipt document as the payload of
>       the 200 (OK) response; the Content-Location field-value provides
>       an identifier for retrieving a copy of that same receipt in the
>       future.
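The branches of that paragraph can be sketched as a small classifier. This is a rough illustration only: the function name and its return labels are hypothetical, it compares URIs after nothing more than resolving relative references with urljoin, and, as the quoted text says, none of these claims can be verified at the HTTP layer. The case where both URIs are identical comes from outside the quoted paragraph and is treated here simply as "no claim about a different resource".

```python
from typing import Optional
from urllib.parse import urljoin


def interpret_content_location(
    method: str,
    status: int,
    effective_uri: str,
    content_location: Optional[str],
    location: Optional[str] = None,
) -> str:
    """Hypothetical classifier for what a Content-Location in a 2xx response claims.

    Follows the quoted paragraph's bullets in order. The claim is only
    trustworthy if both URIs share a resource owner, which HTTP cannot check.
    """
    if not 200 <= status <= 299 or content_location is None:
        return "no-claim"
    # Content-Location may be a relative reference; compare in absolute form.
    cl = urljoin(effective_uri, content_location)
    if cl == effective_uri:
        # Same URI: no claim about a *different* resource.
        return "same-resource"
    if method.upper() in ("GET", "HEAD"):
        # Content negotiation: cl is a more specific identifier for the selected representation.
        return "negotiated-variant"
    if status == 201 and location is not None and cl == urljoin(effective_uri, location):
        # The payload is a current representation of the newly created resource.
        return "created-resource-representation"
    # Otherwise: a status report (e.g. a POST receipt) retrievable later via GET at cl.
    return "status-report"
```

Under these rules, GET /issues returning 200 with Content-Location: /issues.ttl reads as a negotiated variant, which is why /issues?page=1 in that position would be misread. POST followed by 200 with a Content-Location falls into the final "status-report" branch.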



>>> Using some syndication format like Atom can disambiguate this through a link rel="self" relationship, but our goal is to page resources directly rather than embedding them in a syndication framework.
>>> 
>> Sorry, what does that *mean*? Let's talk about formats and protocols, not frameworks.
> 
> I mean that Atom is a stack of a protocol, a format, and some discipline about a nested format. We're not using an intermediate format to contain our pages; we're just using HTTP to identify the pages.

So, if you really want to do that, it can be done with a HTTP header. However, I really question the need to do so; it seems like you're trying to simplify your format/application by pushing the complexity into HTTP itself -- thereby making the protocol more complex for everyone.


>>> in a request, we don't want the metadata about that first page to be taken as the data about the requested resource. For instance, <X> has 500 entries and <X;page=1> has 10 of them or <X> is Bob's patient record and <X/byClinic/Mayo> is Bob's history at Mayo Clinic.
>> 
>> Right, and you can make that explicit in the representations of the resource. What's the problem?
> 
> That would mean that HTTP clients in general couldn't tell whether they received a representation of the requested resource (which is the current expectation with 200) or were shunted off to a different resource. That would only be known to clients willing and able to parse that representation, which would be like requiring clients to parse 303 response bodies.

OK, I'm getting what you want to do now, I think.

The problem is that fundamentally, HTTP doesn't allow a representation about one resource to make an authoritative assertion about another one; that's all "above" HTTP, in applications that are using it. This is pretty fundamental.  

It's possible to build an application on top of HTTP that ties things together in pages, but building this capability into the protocol by having resource A make authoritative assertions about resource B is going to break things.

At most, your 2NN status code could have the semantic of "resource A asserts that this response carries a representation of resource B", but as discussed, HTTP caching couldn't do anything with that, nor could anything at "the HTTP layer" trust that assertion, so you *are* doing something outside of HTTP here.

Sorry if I seem obstinate, it's just that what you're trying to do seems quite... unnatural, from a HTTP perspective.

Cheers,

P.S. If you're just trying to avoid round trips, I will reiterate that HTTP/2 Server Push is what you're looking for; it allows the server to proactively push responses into a client cache.


--
Mark Nottingham   http://www.mnot.net/
Received on Sunday, 16 March 2014 07:31:42 UTC