Re: draft of 209 proposal

* Mark Nottingham <mnot@mnot.net> [2014-03-13 13:11+1100]
> 
> On 8 Mar 2014, at 2:56 am, Eric Prud'hommeaux <eric@w3.org> wrote:
> 
> > * Mark Nottingham <mnot@mnot.net> [2014-03-07 09:25+0000]
> >> Hi Eric,
> >> 
> >> PLH asked me to give some initial feedback on this draft. If you want proper feedback from the IETF, I’d encourage you to submit the I-D :)
> > 
> > Happy to, could you tell me what the "I-D" is?
> 
> Internet-Draft :)
> 
> 
> >> First of all, I’d like to understand what you think this status code is giving you over just using a 200 with Content-Location.
> > 
> > As you point out below, the semantics we want involve a redirect, specifically "I can't give you X but I can give you Y which describes it."
> 
> But it's not really a redirect; the semantics you want are "you asked for that, but I'm giving you this." That's 200 with a Content-Location, because the resource *is* making an assertion about something, even if it has a separate identity.

Jonathan Rees argued against this based on the philosophy of HTTP, and I'll make that concrete with a paging example motivating that philosophy. Suppose github has users and replication partners. Replication partners can GET a large issue list but plebeian users get shunted off to paged access. By your proposal, GET <https://api.github.com/repos/w3c/web-platform-tests/issues> would provide inconsistent representations of that resource:

user: GET /issues => 200, Content-Location: /issues?page=1
repA: GET /issues Accept:text/turtle => 200, Content-Location: /issues.ttl
repB: GET /issues Accept:text/json => 200, Content-Location: /issues.json

REST says that /issues.ttl and /issues.json are representations of the same resource, as implied by a 200 + Content-Location, which is fine 'cause they have the same information. /issues?page=1 is markedly different, presenting only a piece of the requested resource. POSTs and 303s relax the rules of Content-Location. 209 could as well, but relaxing them on 200 would be rather a surprise for REST.
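
A minimal sketch of that inconsistency in Python. The 500-entry list, the 10-entry page and all of the names here are assumptions for illustration, not anything github actually serves:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Response:
    status: int
    content_location: str
    entries: frozenset  # issue ids carried in the response body


# Assumption: the full issue list has 500 entries, a page has 10.
FULL = frozenset(range(1, 501))
PAGE_1 = frozenset(range(1, 11))

responses = {
    "user": Response(200, "/issues?page=1", PAGE_1),
    "repA": Response(200, "/issues.ttl", FULL),
    "repB": Response(200, "/issues.json", FULL),
}


def consistent(resps):
    """True when every 200 response carries the same information."""
    bodies = {r.entries for r in resps if r.status == 200}
    return len(bodies) <= 1
```

With these values, the two replication partners agree with each other, but adding the paged user response makes the set of 200 responses for /issues inconsistent.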


> > Hey TAG folks, is there some writeup of the http-range-14 decisions to use 303 for '/' and 200/appropriate media type for '#' URLs?
> > 
> > 
> >> The GET->303 use case can be met by that, I think, and so can POST->303; these are already widely-understood patterns of use.
> > 
> > GET->303 works fine but it requires two round trips. The purpose of 209 (2xx) is to avoid a round trip. This is expected to be used in high volume services in the Linked Data Platform.
> 
> Right, but what I'm saying is that you can achieve the desired effect with POST->200+Content-Location.

Sure, but I expect you wouldn't want us trying to guess which resources we should GET and which we should interrogate by POST.
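
For the round-trip point, here is a rough sketch against a toy in-memory server. The handler, the URIs and the client loop are hypothetical, and 209 is only the status code proposed in the draft, so nothing here reflects deployed software:

```python
def make_server(use_209):
    """Return a toy handler mapping a URI to (status, headers, body)."""
    page = "first 10 issues"

    def handle(uri):
        if uri == "/issues":
            if use_209:
                # Proposed behaviour: send the page now and name it in Location.
                return 209, {"Location": "/issues?page=1"}, page
            return 303, {"Location": "/issues?page=1"}, ""
        if uri == "/issues?page=1":
            return 200, {}, page
        return 404, {}, ""

    return handle


def fetch(handle, uri):
    """Follow 303s; return (body, URI the body describes, round trips)."""
    trips = 0
    while True:
        status, headers, body = handle(uri)
        trips += 1
        if status == 303:
            uri = headers["Location"]
            continue
        if status == 209:
            return body, headers["Location"], trips
        return body, uri, trips
```

Both variants end with the same body attributed to /issues?page=1; the 303 variant takes two requests to get there and the 209 variant takes one.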


> >> The third use case (partial feeds) is already indicated in content (as per RFC5005, which you reference), so I’m again not sure what a new status code brings to the table.
> >> 
> >> Specifically, how will HTTP software behave differently when receiving this status code?
> > 
> > RFC5005 encourages the same identifier to be used for both the entire resource and a page of that resource.
> 
> "encourages" is perhaps too strong; that pattern is used in some of the examples, but it isn't required nor encouraged by the prose, IIRC.
> 
> > Using some syndication format like Atom can disambiguate this through a link rel="self" relationship, but our goal is to page resources directly rather than embedding them in a syndication framework.
> 
> Sorry, what does that *mean*? Let's talk about formats and protocols, not frameworks.

I mean that Atom is a stack of a protocol, a format, and some discipline about a nested format. We're not using an intermediate format to contain our pages; we're just using HTTP to identify the pages.


> > If some server is unwilling to serve the whole resource
> 
> representation
> 
> > in a request, we don't want the metadata about that first page to be taken as the data about the requested resource. For instance, <X> has 500 entries and <X;page=1> has 10 of them or <X> is Bob's patient record and <X/byClinic/Mayo> is Bob's history at Mayo Clinic.
> 
> Right, and you can make that explicit in the representations of the resource. What's the problem?

That would mean that HTTP clients in general couldn't tell whether they received a representation of the requested resource (which is the current expectation with 200) or were shunted off to a different resource. That would only be known to clients willing and able to parse that representation, which would be like requiring clients to parse 303 response bodies.
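
Roughly, what a generic client can decide from the status line and headers alone might look like the sketch below. The function name is made up, and the 209 branch assumes the semantics proposed in the draft:

```python
def body_describes(status, request_uri, headers):
    """URI that a client which never parses the body takes the body to describe."""
    if status == 200:
        # Current expectation: a 200 body represents the requested resource,
        # whatever Content-Location says about a more specific identifier.
        return request_uri
    if status == 209:
        # Proposed: the body represents the resource named in Location.
        return headers["Location"]
    # Other status codes are out of scope for this sketch.
    return None
```

So a 200 carrying page 1 of <X> is attributed to <X> itself, while a 209 lets the same client attribute it to <X;page=1> without opening the body.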


> >> Also, you say:
> >> 
> >>> Caching semantics for the response are the same as for a 200 response to a GET on the target resource, though it is not necessary to include the Location header field as it is identical to the effective Request URI. Additionally, the 209 response to the initial request MAY be cached but it MUST include the Location header field in cached responses. Thus, a 209 response can be seen as providing two cache entries, a 200 response for the resource expressed in the Location header field and a 209 response for the initial resource.
> >> 
> >> That is not safe to do; in HTTP resource A can’t provide content to be cached under resource B’s URI. The security folks won’t let this happen, full stop, and the WG has demonstrated strong consensus on this matter in the past.
> > 
> > Yeah, I expect that even respecting a same-origin constraint won't be sufficient assurance to permit polluting other peoples' caches.
> 
> Indeed.
> 
> > That said, app-specific caches are likely to do this routinely; should an RFC mention this?
> 
> Probably not.
> 
> 
> --
> Mark Nottingham   http://www.mnot.net/
> 
> 
> 

-- 
-ericP

office: +1.617.599.3509
mobile: +33.6.80.80.35.59

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.

There are subtle nuances encoded in font variation and clever layout
which can only be seen by printing this message on high-clay paper.

Received on Saturday, 15 March 2014 22:23:23 UTC