Re: 303 for paging; was Re: 2NN Contents Of Related (303 Shortcut) from Sandro Hawke on 2014-09-07 (ietf-http-wg@w3.org from July to September 2014)

From: Sandro Hawke <sandro@w3.org>
Date: Sun, 07 Sep 2014 15:59:14 -0400
To: "Roy T. Fielding" <fielding@gbiv.com>
CC: Martin Thomson <martin.thomson@gmail.com>, Eric Prud'hommeaux <eric@w3.org>, Mark Nottingham <mnot@mnot.net>, HTTP Working Group <ietf-http-wg@w3.org>, "Julian F. Reschke" <julian.reschke@gmx.de>
Message-ID: <540CB912.2040404@w3.org>
On 09/07/2014 01:30 PM, Roy T. Fielding wrote:
> On Sep 7, 2014, at 7:43 AM, Sandro Hawke <sandro@w3.org> wrote:
>
>> On 09/05/2014 01:54 PM, Roy T. Fielding wrote:
>>>> >The actual driving force behind this I-D is not about using 303s to deal with httpRange-14, it's to deal with paging.  That is, the client does a GET on A, including these request headers:
>>>> >
>>>> >   Prefer: contents-of-related
>>>> >   Prefer: return=representation; max-triple-count="100"
>>>> >
>>>> >and now the server can directly provide the first hundred triples, via a representation of B, which is the first "page" of A.
>>> The first hundred triples is a representation of resource A.
>>> There is no requirement, anywhere, that representations be complete.
>>> Prefer in this case is just another form of content negotiation and
>>> the response is 200.  Responding 303 in this case would be wrong,
>>> as would 2NN.
>>>
>>
>> This is a crucial point.    If even responding 303 is wrong, we have 
>> a bigger problem.
>>
>> It sounds to me like you're saying that a chapter of a book is the 
>> same thing as the entire book.
>
> No, you are assuming it is a book.
>
>> Let's say we have an online textbook available at:
>>
>>     http://example.org/WebDesign
>>
>> and each of its 40 chapters is available as a separate web page, with 
>> chapter number CC being available at
>>
>>     http://example.org/WebDesign?chapter={CC}
>>
>> Now imagine a client does GET http://example.org/WebDesign, with a 
>> prefer header saying it's fine to just send the first chapter if the 
>> book is too big.
>
> Then that resource is more than just a book. It provides views of the 
> resource state, which is no more or less than what content negotiation 
> provides in HTTP.
>
>
>> I think you're saying it would be fine for the server to respond 200 
>> OK, Content-Location: http://example.org/WebDesign?chapter=1, and 
>> give the content of that first chapter.
>
> The server can respond however they like. It is only the consistency 
> of those responses that defines what the resource might be, to any 
> extent that matters.
>

Okay, thanks for the explanation.

I don't see this leading to 303 being wrong, though.   Isn't it up to us 
(the group defining how LDP servers behave when answering for LDP 
resources) to define whether LDP resources can or cannot be represented 
by pages that are subsets of them?

So far, we've said the representations of these resources (technically 
LDP RDF Sources) are RDF serializations of all the triples which 
comprise the state of the resource.   (There's one exception, where 
certain redundant triples can be omitted when the client sends the 
appropriate Prefer header.)

I hear you saying we wouldn't be violating any HTTP specs by saying, 
"Actually the representation could be subsets of that graph state."

But I don't see anything mandating that approach either.   My 
understanding is RDF software is pretty much always written with the 
assumption that the representation will include a serialization of every 
triple.   I think we'd prefer to align this work with that community 
practice.

>> To my understanding, that's wrong, because it violates the semantics 
>> of 200 OK and Content-Location.  Specifically, since a book is not 
>> the same thing as its first chapter, http://example.org/WebDesign and 
>> http://example.org/WebDesign?chapter=1 are distinct resources.  If 
>> they are distinct we can't use 200 OK+CL to respond to one with the 
>> other.
>
> But the first URI is not a book. You cannot define it as one thing and 
> then say its behavior doesn't fit that definition. It is not that thing.
>
>> I think I hear you saying that resources being distinct doesn't 
>> matter, that the notion of "representation" is much fuzzier than 
>> that.  I think you're saying that even though a chapter of a book and 
>> a book are different, it's fine to respond 200 OK and give the text 
>> of the chapter as a representation of the book (assuming there was 
>> some negotiation licensing such behavior).
>
> No, I am saying that the responses define the resource, not the other 
> way around. If you have distinct URIs with decidedly different 
> behavior, like your chapters, then they are of course distinct 
> resources. So is the resource that always responds with a complete 
> book. It is defined by what it does.
>

Right, that's our model.   We have resources like books, and we have 
resources like chapters.   We think of them as distinct types of things, 
and define specific behaviors for each of them, including headers they 
provide to indicate what they are, and restrictions on how they respond 
to GET, POST, etc.

We expect that sometimes some "books" will get to be quite large (many 
GB).  So we want the servers to be able to respond to a naive GET on a 
"book" with a 303 to the first chapter.   If it tried to stream the 
whole thing, we expect we'd have real problems.   We encourage clients 
to say what max response size they want; servers should use that 
information and send the biggest size they and the client are okay with.
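
To make that concrete, here is a rough sketch of the server-side choice 
I have in mind.   It is illustrative only: the function names, the 
server_limit default, and the "?page=1&size=N" URI shape are all made up 
for this example and are not from any LDP draft.   Only the 
max-triple-count parameter comes from the Prefer example earlier in this 
thread.

```python
import re


def parse_max_triple_count(prefer_value):
    """Return max-triple-count from a Prefer header value, or None if absent."""
    match = re.search(r'max-triple-count="?(\d+)"?', prefer_value or "")
    return int(match.group(1)) if match else None


def respond_to_get(resource_uri, triple_count, prefer_value=None, server_limit=1000):
    """Pick a status code and headers for GET on a "book"-like resource.

    server_limit is a hypothetical per-server cap on triples per response.
    """
    client_limit = parse_max_triple_count(prefer_value)
    # The biggest size both sides are okay with is the smaller of the two limits.
    page_size = server_limit if client_limit is None else min(client_limit, server_limit)
    if triple_count <= page_size:
        # Small enough: send the complete representation directly.
        return 200, {}
    # Too big: point the client at the first page, which is a distinct resource.
    # The query-string layout here is an assumption for illustration.
    return 303, {"Location": f"{resource_uri}?page=1&size={page_size}"}
```

A naive GET with no Prefer header falls back to the server's own limit, 
so a very large "book" still gets a 303 rather than a multi-GB stream.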

Do you see any problem with that use of 303?    I hope not.

The one problem we see is that it's an extra round trip.    One could 
argue that this doesn't matter, but our application developers seem to 
care, hence the idea of a 2NN, which has the semantics of bundling the 
303 and a subsequent 200 into one round trip.
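
Here is a small sketch of how a client might treat the two cases the 
same way, differing only in the number of round trips.   The status 
value 299 is a placeholder I picked for this example, since 2NN has no 
assigned number.   I am also assuming the 2NN response names the related 
resource in a Location header, and that "transport" is a hypothetical 
callable returning (status, headers, body).

```python
CONTENTS_OF_RELATED = 299  # placeholder only; "2NN" has no assigned number


def fetch(uri, transport):
    """Return (uri_of_resource_represented, body, round_trips).

    transport is a hypothetical callable: transport(uri) -> (status, headers, body).
    """
    status, headers, body = transport(uri)
    if status == 303:
        # Classic see-other: a second request retrieves the related resource.
        related = headers["Location"]
        _, _, related_body = transport(related)
        return related, related_body, 2
    if status == CONTENTS_OF_RELATED:
        # Bundled case: the body already represents the related resource.
        return headers["Location"], body, 1
    # Plain 200 (or anything else): the body represents the requested URI.
    return uri, body, 1
```

Either way the client ends up knowing that the body represents the 
related resource (the first page), not the resource it originally asked 
for.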

As you've pointed out, this won't be able to use existing caches, and has 
some other caching weaknesses.

> A successful response to GET is a representation of the target 
> resource. Range, for example, does not alter the representation; it 
> simply provides a range of that representation in the payload of a 
> 206. Conneg doesn't alter the representation either. It merely selects 
> one of the available representations.
>
> So, if 2NN is a success to GET, it has to convey a selected 
> representation of that resource.

Really?    Certainly for 200, but why for 2NN? 
http://tools.ietf.org/html/rfc7231#page-51 section 6.3 doesn't say every 
2xx has to be like that.

>  Not some other resource, though it is fine for a single 
> representation to represent multiple resources. 3xx, in contrast, says 
> the request was not successful but this other thing might be just as good.
>
> There is nothing wrong with having a representation of a resource that 
> is a paged view. The only need is that something in the representation 
> data or metadata indicates that view, preferably with prev, this, and 
> next links. CL is a "this" link. The resource is then partially 
> defined by the fact that it offers views of the overall state. All 200 
> responses. No big deal.
>
>> If the HTTP WG really has consensus on that idea, I guess I can live 
>> with it, even though it's counterintuitive to me.
>>
>> But how far does this go?    When is 200+CL not okay? Could the first 
>> sentence of the book be a representation of it?   How about the 
>> first letter?   How about the 10th letter?    How about the 13th, 
>> 7th, and 22nd letters, in that order?    How about the first word of 
>> a different book?   How about the first sentence of a different book? 
>>   How about the entire contents of a different book?    I don't see 
>> how you can draw a line here, with this way of thinking about it.
>
> There are no lines. There is no need for any.
>
>> To me, and the LDP WG, it's made a lot more sense to think of a 
>> chapter as simply being a different resource than the book, so if the 
>> server's going to give back a representation of the chapter, it can't 
>> use 200 OK.
>
> The server is in control of what its resources mean. If a resource is 
> not limited to being an entire book, then it isn't an entire book. 
> There's no reason to pretend otherwise.

LDP wants certain machine interoperability characteristics which are not 
in standard HTTP, so it's defining a more restrictive class of 
resources, which are less free to behave as they want.

          -- Sandro

>
> ....Roy.
Received on Sunday, 7 September 2014 19:59:25 UTC