Re: 303 for paging; was Re: 2NN Contents Of Related (303 Shortcut) from Amos Jeffries on 2014-09-08 (ietf-http-wg@w3.org from July to September 2014)

From: Amos Jeffries <squid3@treenet.co.nz>
Date: Mon, 08 Sep 2014 14:48:36 +1200
To: ietf-http-wg@w3.org
Message-ID: <540D1904.4040801@treenet.co.nz>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 8/09/2014 7:59 a.m., Sandro Hawke wrote:
> On 09/07/2014 01:30 PM, Roy T. Fielding wrote:
>> On Sep 7, 2014, at 7:43 AM, Sandro Hawke <sandro@w3.org> wrote:
>> 
>>> On 09/05/2014 01:54 PM, Roy T. Fielding wrote:
>>>>> The actual driving force behind this I-D is not about using
>>>>> 303s to deal with httpRange-14, it's to deal with paging.  That
>>>>> is, the client does a GET on A, including these request
>>>>> headers:
>>>>> 
>>>>> Prefer: contents-of-related
>>>>> Prefer: return=representation; max-triple-count="100"
>>>>> 
>>>>> and now the server can directly provide the first hundred
>>>>> triples, via a representation of B, which is the first "page"
>>>>> of A.
>>>> The first hundred triples is a representation of resource A. 
>>>> There is no requirement, anywhere, that representations be
>>>> complete. Prefer in this case is just another form of content
>>>> negotiation and the response is 200.  Responding 303 in this
>>>> case would be wrong, as would 2NN.
>>>> 
>>> 
>>> This is a crucial point.    If even responding 303 is wrong, we
>>> have a bigger problem.
>>> 
>>> It sounds to me like you're saying that a chapter of a book is
>>> the same thing as the entire book.
>> 
>> No, you are assuming it is a book.
>> 
>>> Let's say we have an online textbook available at:
>>> 
>>> http://example.org/WebDesign
>>> 
>>> and each of its 40 chapters is available as a separate web
>>> page, with chapter number CC being available at
>>> 
>>> http://example.org/WebDesign?chapter={CC}
>>> 
>>> Now imagine a client does GET http://example.org/WebDesign,
>>> with a prefer header saying it's fine to just send the first
>>> chapter if the book is too big.
>> 
>> Then that resource is more than just a book. It provides views of
>> the resource state, which is no more or less than what content
>> negotiation provides in HTTP.
>> 
>> 
>>> I think you're saying it would be fine for the server to
>>> respond 200 OK, Content-Location:
>>> http://example.org/WebDesign?chapter=1, and give the content of
>>> that first chapter.
>> 
>> The server can respond however they like. It is only the
>> consistency of those responses that defines what the resource
>> might be, to any extent that matters.
>> 
> 
> Okay, thanks for the explanation.
> 
> I don't see this leading to 303 being wrong, though.   Isn't it up
> to us (the group defining how LDP servers behave when answering for
> LDP resources) to define whether LDP resources can or cannot be
> represented by subset pages of them?
> 
> So far, we've said the representations of these resources
> (technically LDP RDF Sources) are RDF serializations of all the
> triples which comprise the state of the resource.   (There's one
> exception, where certain redundant triples can be omitted when the
> client sends the appropriate Prefer header.)
> 
> I hear you saying we wouldn't be violating any HTTP specs by
> saying, "Actually the representation could be subsets of that graph
> state."
> 
> But I don't see anything mandating that approach either.   My 
> understanding is RDF software is pretty much always written with
> the assumption that the representation will include a serialization
> of every triple.   I think we'd prefer to align this work with that
> community practice.
> 
>>> To my understanding, that's wrong, because it violates the
>>> semantics of 200 OK and Content-Location.  Specifically, since
>>> a book is not the same thing as its first chapter,
>>> http://example.org/WebDesign and 
>>> http://example.org/WebDesign?chapter=1 are distinct resources.
>>> If they are distinct we can't use 200 OK+CL to respond to one
>>> with the other.
>> 
>> But the first URI is not a book. You cannot define it as one
>> thing and then say its behavior doesn't fit that definition. It
>> is not that thing.
>> 
>>> I think I hear you saying that resources being distinct
>>> doesn't matter, that the notion of "representation" is much
>>> fuzzier than that.  I think you're saying that even though a
>>> chapter of a book and a book are different, it's fine to
>>> respond 200 OK and give the text of the chapter as a
>>> representation of the book (assuming there was some negotiation
>>> licensing such behavior).
>> 
>> No, I am saying that the responses define the resource, not the
>> other way around. If you have distinct URIs with decidedly
>> different behavior, like your chapters, then they are of course
>> distinct resources. So is the resource that always responds with
>> a complete book. It is defined by what it does.
>> 
> 
> Right, that's our model.   We have resources like books, and we
> have resources like chapters.   We think of them as distinct types
> of things, and define specific behaviors for each of them, including
> headers they provide to indicate what they are, and restrictions on
> how they respond to GET, POST, etc.


What you have is actually a "book with chapters" resource.

You can think of them as different things, but they are coming from
the same resource data, which makes them only representations of one
resource.


> 
> We expect that sometimes some "books" will get to be quite large
> (many GB).  So we want the servers to be able to respond to a naive
> GET on a "book" with a 303 to the first chapter.   If it tried to
> stream the whole thing, we expect we'd have real problems.   We
> encourage clients to say what max response size they want; servers
> should use that information and send the biggest size they and the
> client are okay with.
> 
> Do you see any problem with that use of 303?    I hope not.

303 is saying "don't fetch this book A, fetch book B instead".
It directs the client at a whole different "book" resource - not a
"chapter".

What you are trying to achieve with 2NN is respond "here is chapter 1
of book A". That is done with existing HTTP in either of two ways:

 1) by client requesting URI "/bookA" with Range:chapter=1 and
server responding with 206, Content-Range for that chapter, and the
payload for that chapter. The server advertises the unit with
Accept-Ranges:chapter.

 2) by server emitting 303 to URI "/bookA?chapter=1".
   then client re-requesting "/bookA?chapter=1"

#1 allows the number of ranges returned by the server to adapt
depending on size or bandwidth available. It is quite powerful.

#2 signals that this book A may *only* be fetched a chapter at a time,
beginning with 1. Fetching the /bookA without a chapter indicated is
"wrong" for this resource so try again with the chapter part.



To cover both clients who participate in #1 and clients who don't, both
methods should be implemented by the server. So clients not sending
the Range:chapter header can get the 303 with a
Vary:Range on the server responses.
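
A rough sketch of a server that implements both methods, in Python. It
assumes the client names the chapter in a Range request header and the
server answers with Content-Range, which RFC 7233 permits for custom
range units. The "chapter" unit, the "N/total" Content-Range layout,
the CHAPTERS table and handle_get_book are all made up for illustration
and are not taken from any spec or from LDP.

```python
from typing import Dict, Tuple

# Hypothetical book data: chapter number -> chapter text.
CHAPTERS = {1: "Chapter one text", 2: "Chapter two text"}


def handle_get_book(headers: Dict[str, str]) -> Tuple[int, Dict[str, str], str]:
    """Return (status, response headers, body) for GET /bookA."""
    unit, _, spec = headers.get("Range", "").partition("=")
    spec = spec.strip()
    if unit.strip() == "chapter" and spec.isdecimal():
        number = int(spec)
        total = len(CHAPTERS)
        if number in CHAPTERS:
            # Method #1: the client asked for a chapter range.
            resp = {"Content-Range": f"chapter {number}/{total}", "Vary": "Range"}
            return 206, resp, CHAPTERS[number]
        # The requested chapter does not exist.
        return 416, {"Content-Range": f"chapter */{total}"}, ""
    # Method #2: no usable chapter range, so point at the first chapter.
    return 303, {"Location": "/bookA?chapter=1", "Vary": "Range"}, ""
```

In this sketch any request without a chapter range, including one with
an unrelated unit such as bytes, falls through to the 303.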


> 
> The one problem we see is that it's an extra roundtrip.    One
> could argue so what, but our application developers seem to care,
> so thus the idea of a 2NN which has the semantics of bundling the
> 303  and a subsequent 200 into one round-trip.
> 

It is only one extra round trip for old/legacy clients who do not
advertise range conneg to use option #1 above. These are the same ones
which will also not support your new 2NN status properly. They can be
just as easily updated to support Range conneg as to support 2NN, and
Range does not lose any caching benefits.
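
To make the round-trip count concrete, here is a small Python sketch.
Everything in it is hypothetical: fake_server stands in for an origin
that answers a chapter Range with 206 and anything else with 303, and
fetch_first_chapter is an invented client helper. It only counts
request/response exchanges and does no real networking.

```python
from typing import Callable, Dict, Tuple

Response = Tuple[int, Dict[str, str], str]


def fake_server(path: str, headers: Dict[str, str]) -> Response:
    """Hypothetical origin: 206 for a chapter Range, 303 otherwise."""
    if path == "/bookA?chapter=1":
        return 200, {}, "chapter 1 text"
    if headers.get("Range") == "chapter=1":
        return 206, {"Content-Range": "chapter 1/40"}, "chapter 1 text"
    return 303, {"Location": "/bookA?chapter=1"}, ""


def fetch_first_chapter(
    send: Callable[[str, Dict[str, str]], Response], range_aware: bool
) -> Tuple[str, int]:
    """Return (body, number of round trips) for the first chapter."""
    headers = {"Range": "chapter=1"} if range_aware else {}
    status, resp_headers, body = send("/bookA", headers)
    trips = 1
    if status == 303:
        # A legacy client has to follow the redirect with a second request.
        status, resp_headers, body = send(resp_headers["Location"], {})
        trips += 1
    return body, trips
```

A range-aware client gets the chapter in one exchange, and a legacy
client needs two.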


> As you've pointed out, this won't be able to use existing caches,
> and has some other caching weaknesses.
> 
>> A successful response to GET is a representation of the target 
>> resource. Range, for example, does not alter the representation;
>> it simply provides a range of that representation in the payload
>> of a 206. Conneg doesn't alter the representation either. It
>> merely selects one of the available representations.
>> 
>> So, if 2NN is a success to GET, it has to convey a selected 
>> representation of that resource.
> 
> Really?    Certainly for 200, but why for 2NN? 
> http://tools.ietf.org/html/rfc7231#page-51 section 6.3 doesn't say
> every 2xx has to be like that.

Every 2NN has to act properly when handled as a 200 by a client or
intermediary which does not yet support that 2NN special handling.

This means that Vary:Prefer and Prefer headers are mandatory in the
server responses for "/bookA" when the Range support method (#1 above)
is not used [for the 1.1 implementations], plus something like an
Expires header indicating stale content, to prevent 1.0
implementations from caching the special payload.
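
A minimal Python sketch of why Vary matters here, under stated
assumptions: paged_response_headers and cache_key are invented names,
the cache model is deliberately simplified (case-sensitive header
lookup, no normalization), and an Expires date in the past is used as
one way of marking the payload stale for 1.0 caches.

```python
from email.utils import formatdate
from typing import Dict, Tuple


def paged_response_headers() -> Dict[str, str]:
    """Headers a server might add to the special (paged) response."""
    return {
        "Vary": "Prefer",
        # The epoch rendered as an HTTP date, i.e. already expired.
        "Expires": formatdate(0, usegmt=True),
    }


def cache_key(
    url: str, request_headers: Dict[str, str], response_headers: Dict[str, str]
) -> Tuple:
    """Simplified cache key: the URL plus the request headers named in Vary."""
    vary = response_headers.get("Vary", "")
    names = [name.strip() for name in vary.split(",") if name.strip()]
    varied = tuple((name.lower(), request_headers.get(name, "")) for name in names)
    return (url,) + varied
```

With Vary:Prefer in place, a request that carried Prefer and one that
did not produce different keys, so a 1.1 cache following this model
would not hand the paged payload to a client that never asked for it.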


> 
>> Not some other resource, though it is fine for a single 
>> representation to represent multiple resources. 3xx, in contrast,
>> says the request was not successful but this other thing might be
>> just as good.
>> 
>> There is nothing wrong with having a representation of a resource
>> that is a paged view. The only need is that something in the
>> representation data or metadata indicates that view, preferably
>> with prev, this, and next links. CL is a "this" link. The
>> resource is then partially defined by the fact that it offers
>> views of the overall state. All 200 responses. No big deal.
>> 
>>> If the HTTP WG really has consensus on that idea, I guess I can
>>> live with it, even though it's counterintuitive to me.
>>> 
>>> But how far does this go?    When is 200+CL not okay? Could the
>>> first sentence of the book be a representation of it?   How
>>> about the first letter?   How about the 10th letter?    How
>>> about the 13th, 7th, and 22nd letters, in that order?    How
>>> about the first word of a different book?   How about the first
>>> sentence of a different book? How about the entire contents of
>>> a different book?    I don't see how you can draw a line here,
>>> with this way of thinking about it.
>> 
>> There are no lines. There is no need for any.
>> 
>>> To me, and the LDP WG, it's made a lot more sense to think of
>>> a chapter as simply being a different resource than the book,
>>> so if the server's going to give back a representation of the
>>> chapter, it can't use 200 OK.
>> 
>> The server is in control of what its resources mean. If a
>> resource is not limited to being an entire book, then it isn't an
>> entire book. There's no reason to pretend otherwise.
> 
> LDP wants certain machine interoperability characteristics which
> are not in standard HTTP, so it's defining a more restrictive class
> of resources, which are less free to behave as they want.

I think you missed the point here.
The LDP defines how the resource acts; how it acts defines how HTTP
is used to transfer it.

The more I read of this thread the more I see Range being what you
really want to do.

Amos
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (MingW32)

iQEcBAEBAgAGBQJUDRkEAAoJELJo5wb/XPRjQj4H/1he7mRI0TU0kU+F+eNd3SP8
+TgrHlhIQ3frdjv+0OtYqeC/Xa1PxncMTNpGPDtr3tgk9QI5XiiSqfupRMoSbQ+1
HtCzjLC6PucXjZ11IVj+XZzFJ4FU25+4upxAw6ejIjTdzNMo0O/3ZOVyWgOdKj3v
mi/HFDx/i9P/1S3n6ebrINkoeSVwDqInvgQ0v46ktiQI+Mm7wTSzd8X8bCdRy/zn
NiWwCIhU0O6g7F2YRs0zZT7JdgC7yfECkues6jQ8BHV120aSVCptCy27CLjwfcPU
XaOKWS8OlqKnFp7yvNedrfeHsC1+21km5srbujs9IHPa8SFwv5u9fmmBGSmHz/s=
=kbfK
-----END PGP SIGNATURE-----
Received on Monday, 8 September 2014 02:49:13 UTC