Re: 303 for paging; was Re: 2NN Contents Of Related (303 Shortcut) from Sandro Hawke on 2014-09-08 (ietf-http-wg@w3.org from July to September 2014)

From: Sandro Hawke <sandro@w3.org>
Date: Mon, 08 Sep 2014 09:20:24 -0400
To: Amos Jeffries <squid3@treenet.co.nz>, ietf-http-wg@w3.org
Message-ID: <540DAD18.6030709@w3.org>
On 09/07/2014 10:48 PM, Amos Jeffries wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 8/09/2014 7:59 a.m., Sandro Hawke wrote:
>> On 09/07/2014 01:30 PM, Roy T. Fielding wrote:
>>> On Sep 7, 2014, at 7:43 AM, Sandro Hawke <sandro@w3.org> wrote:
>>>
>>>> On 09/05/2014 01:54 PM, Roy T. Fielding wrote:
>>>>>>> The actual driving force behind this I-D is not about
>>>>>>> using 303s
>>>>>> to deal with httpRange-14, it's to deal with paging.  That
>>>>>> is, the client does a GET on A, including these request
>>>>>> headers:
>>>>>>> Prefer: contents-of-related Prefer:
>>>>>>> return=representation; max-triple-count="100"
>>>>>>>
>>>>>>> and now the server can directly provide the first hundred
>>>>>>> triples,
>>>>>> via a representation of B, which is the first "page"
>>>>>> of A.
>>>>> The first hundred triples is a representation of resource A.
>>>>> There is no requirement, anywhere, that representations be
>>>>> complete. Prefer in this case is just another form of content
>>>>> negotiation and the response is 200.  Responding 303 in this
>>>>> case would be wrong, as would 2NN.
>>>>>
>>>> This is a crucial point.    If even responding 303 is wrong, we
>>>> have a bigger problem.
>>>>
>>>> It sounds to me like you're saying that a chapter of a book is
>>>> the same thing as the entire book.
>>> No, you are assuming it is a book.
>>>
>>>> Let's say we have an online textbook available at:
>>>>
>>>> http://example.org/WebDesign
>>>>
>>>> and each of its 40 chapters is available as a separate web
>>>> page, with chapter number CC being available at
>>>>
>>>> http://example.org/WebDesign?chapter={CC}
>>>>
>>>> Now imagine a client does GET http://example.org/WebDesign,
>>>> with a prefer header saying it's fine to just send the first
>>>> chapter if the book is too big.
>>> Then that resource is more than just a book. It provides views of
>>> the resource state, which is no more or less than what content
>>> negotiation provides in HTTP.
>>>
>>>
>>>> I think you're saying it would be fine for the server to
>>>> respond 200 OK, Content-Location:
>>>> http://example.org/WebDesign?chapter=1, and give the content of
>>>> that first chapter.
>>> The server can respond however they like. It is only the
>>> consistency of those responses that defines what the resource
>>> might be, to any extent that matters.
>>>
>> Okay, thanks for the explanation.
>>
>> I don't see this leading to 303 being wrong, though.   Isn't it up
>> to us (the group defining how LDP servers behave when answering for
>> LDP resources) to define whether LDP resources can or cannot be
>> represented by subset pages of it?
>>
>> So far, we've said the representations of these resources
>> (technically LDP RDF Sources) are RDF serializations of all the
>> triples which comprise the state of the resource.   (There's one
>> exception, where certain redundant triples can be omitted when the
>> client sends the appropriate Prefer header.)
>>
>> I hear you saying we wouldn't be violating any HTTP specs by
>> saying, "Actually the representation could be subsets of that graph
>> state."
>>
>> But I don't see anything mandating that approach either.   My
>> understanding is RDF software is pretty much always written with
>> the assumption that the representation will include a serialization
>> of every triple.   I think we'd prefer to align this work with that
>> community practice.
>>
>>>> To my understanding, that's wrong, because it violates the
>>>> semantics of 200 OK and Content-Location.  Specifically, since
>>>> a book is not the same thing as its first chapter,
>>>> http://example.org/WebDesign and
>>>> http://example.org/WebDesign?chapter=1 are distinct resources.
>>>> If they are distinct we can't use 200 OK+CL to respond to one
>>>> with the other.
>>> But the first URI is not a book. You cannot define it as one
>>> thing and then say its behavior doesn't fit that definition. It
>>> is not that thing.
>>>
>>>> I think I hear you saying that resources being distinct
>>>> doesn't matter, that the notion of "representation" is much
>>>> fuzzier than that.  I think you're saying that even though a
>>>> chapter of a book and a book are different, it's fine to
>>>> respond 200 OK and give the text of the chapter as a
>>>> representation of the book (assuming there was some negotiation
>>>> licensing such behavior).
>>> No, I am saying that the responses define the resource, not the
>>> other way around. If you have distinct URIs with decidedly
>>> different behavior, like your chapters, then they are of course
>>> distinct resources. So is the resource that always responds with
>>> a complete book. It is defined by what it does.
>>>
>> Right, that's our model.   We have resources like books, and we
>> have resources like chapters.   We think of them as distinct types
>> of things, and define specific behaviors for each of them, including
>> headers they provide to indicate what they are, and restrictions on
>> how they respond to GET, POST, etc.
>
> What you have is actually a "book with chapters" resource.
>
> You can think of them as different things, but they are coming from
> the same resource data which makes them only representations of one
> resource.

It's common to have many URIs for resources all coming from the same 
database, of course.

>
>> We expect that sometimes some "books" will get to be quite large
>> (many GB).  So we want the servers to be able to respond to a naive
>> GET on a "book" with a 303 to the first chapter.   If it tried to
>> stream the whole thing, we expect we'd have real problems.   We
>> encourage clients to say what max response size they want; servers
>> should use that information and send the biggest size they and the
>> client are okay with.
>>
>> Do you see any problem with that use of 303?    I hope not.
> 303 is saying "don't fetch this book A", fetch "book B instead".
> Directing the client at a whole different "book" resource - not a
> "chapter".

Why can't it redirect to a chapter?

>
> What you are trying to achieve with 2NN is respond "here is chapter 1
> of book A". That is done with existing HTTP in either of two ways:
>
>   1) by client requesting URI "/bookA" with Accept-Ranges:chapter and
> server responding with 206, Range:chapter=1, and payload for that chapter.

I'm not familiar with why we rejected the Range design.   I'll let 
someone else answer that, and/or look into it shortly.

>   2) by server emitting 303 to URI "/bookA?chapter=1".
>     then client re-requesting "/bookA?chapter=1"

That is in fact our primary mechanism.    But in one recent email, Roy 
said that wasn't okay.  So part of the point of this thread is to confirm 
that this protocol is actually okay.
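
To make the exchange concrete, here is a minimal sketch in Python of the
303 mechanism described in #2.  It is a toy in-memory model, not LDP or
any real server: the serve() and get_following_303() names, the chapter
text, and the lookup table are all made up for illustration.  It only
shows that a naive GET on /bookA costs two round trips.

```python
# Toy in-memory model; chapter text is invented for illustration.
CHAPTERS = {1: "Chapter 1 text", 2: "Chapter 2 text"}


def serve(uri):
    """Return (status, headers, body) for a GET on uri."""
    if uri == "/bookA":
        # Naive GET on the whole book: redirect to the first chapter.
        return 303, {"Location": "/bookA?chapter=1"}, ""
    prefix = "/bookA?chapter="
    if uri.startswith(prefix) and uri[len(prefix):].isdigit():
        number = int(uri[len(prefix):])
        if number in CHAPTERS:
            return 200, {}, CHAPTERS[number]
    return 404, {}, ""


def get_following_303(uri, max_hops=5):
    """GET uri, following 303s; return (final_uri, body, round_trips)."""
    round_trips = 0
    for _ in range(max_hops):
        status, headers, body = serve(uri)
        round_trips += 1
        if status == 303:
            # Re-request the URI named in Location, as in option #2.
            uri = headers["Location"]
            continue
        return uri, body, round_trips
    raise RuntimeError("too many redirects")


final_uri, body, round_trips = get_following_303("/bookA")
```

The second round trip counted here is the cost that the proposed 2NN is
meant to remove.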

> #1 allows the amount of ranges replied by the server to adapt
> depending on size or bandwidth available. It is quite powerful.

(will reply separately, as above)
> #2 signals that this book A may *only* be fetched chapter at a time,
> beginning with 1. Fetching the /bookA without a chapter indicated is
> "wrong" for this resource so try again with the chapter part.

Why are you being so restrictive?

What exactly would stop a client from doing a HEAD on /bookA and looking 
for rel=last if it wants to do reverse traversal?

What exactly would stop a server from providing /bookA instead of the 
redirect, if the client gives different Prefer headers (with preferred 
sizes), or maybe the server load goes down?
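
As a rough sketch of that kind of server-side choice, the Python below
picks between a full 200 and a 303 to the first page.  The
max-triple-count parameter is the one from the Prefer example earlier in
this thread; the function names, the server_limit argument (standing in
for "server load"), and the redirect target are assumptions for
illustration only.

```python
import re


def client_max_triples(prefer):
    """Extract max-triple-count from a Prefer header value, or None."""
    match = re.search(r'max-triple-count="?(\d+)"?', prefer or "")
    return int(match.group(1)) if match else None


def choose_response(total_triples, prefer, server_limit):
    """Return (status, headers): 200 for the whole thing, else 303."""
    client_limit = client_max_triples(prefer)
    if client_limit is None:
        limit = server_limit
    else:
        # Send the biggest size both sides are okay with.
        limit = min(server_limit, client_limit)
    if total_triples <= limit:
        return 200, {}
    return 303, {"Location": "/bookA?chapter=1"}
```

With the same resource, a client asking for at most 100 triples gets the
redirect, while a client with a larger limit (and a server willing to
send that much) gets the full representation.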

>
>
> To cover both clients who participate in #1 and clients who don't, both
> methods should be implemented by the server. So clients not sending
> the conneg Accept-Ranges:chapter header can get the 303 with a
> Vary:Accept-Ranges on the server responses.

Agreed that we need to have the 303 mechanism in place for clients that 
don't know to ask for paging/ranges themselves.   That's in our paging spec.

>
>> The one problem we see is that it's an extra roundtrip.    One
>> could argue so what, but our application developers seem to care,
>> so thus the idea of a 2NN which has the semantics of bundling the
>> 303  and a subsequent 200 into one round-trip.
>>
> It is only one extra round trip for old/legacy clients who do not
> advertise range conneg to use option #1 above. These are the same ones
> which will also not support your new 2NN status properly. They can be
> just as easily updated to support Range conneg as to support 2NN, and
> Range does not lose any caching benefits.

Yes, if Range worked that would be great.   We'll check again on that.

>
>
>> As you've pointed out, this won't be able to use existing caches,
>> and has some other caching weaknesses.
>>
>>> A successful response to GET is a representation of the target
>>> resource. Range, for example, does not alter the representation;
>>> it simply provides a range of that representation in the payload
>>> of a 206. Conneg doesn't alter the representation either. It
>>> merely selects one of the available representations.
>>>
>>> So, if 2NN is a success to GET, it has to convey a selected
>>> representation of that resource.
>> Really?    Certainly for 200, but why for 2NN?
>> http://tools.ietf.org/html/rfc7231#page-51 section 6.3 doesn't say
>> every 2xx has to be like that.
> Every 2NN has to act properly when handled as a 200 by a client or
> intermediary which does not yet support that 2NN special handling.

Can you point me to the part of a spec which says clients should treat 
all unknown 2xx codes as if they were 200?     I don't think that's how 
unknown 2xx codes are supposed to be handled.

>
> This means Vary:Prefer, and Prefer headers are mandatory in the server
> responses for "/bookA" when the Range support method (#1 above) is not
> used [for the 1.1 implementations] and something like Expires header
> indicating stale content to prevent 1.0 implementations caching the
> special payload.

Yes

>
>>> Not some other resource, though it is fine for a single
>>> representation to represent multiple resources. 3xx, in contrast,
>>> says the request was not successful but this other thing might be
>>> just as good.
>>>
>>> There is nothing wrong with having a representation of a resource
>>> that is a paged view. The only need is that something in the
>>> representation data or metadata indicates that view, preferably
>>> with prev, this, and next links. CL is a "this" link. The
>>> resource is then partially defined by the fact that it offers
>>> views of the overall state. All 200 responses. No big deal.
>>>
>>>> If the HTTP WG really has consensus on that idea, I guess I can
>>>> live with it, even though it's counterintuitive to me.
>>>>
>>>> But how far does this go?    When is 200+CL not okay? Could the
>>>> first sentence of the book be a representation of it?   How
>>>> about the first letter?   How about the 10th letter?    How
>>>> about the 13th, 7th, and 22nd letters, in that order?    How
>>>> about the first word of a different book?   How about the first
>>>> sentence of a different book? How about the entire contents of
>>>> a different book?    I don't see how you can draw a line here,
>>>> with this way of thinking about it.
>>> There are no lines. There is no need for any.
>>>
>>>> To me, and the LDP WG, it's made a lot more sense to think of
>>>> a chapter as simply being a different resource than the book,
>>>> so if the server's going to give back a representation of the
>>>> chapter, it can't use 200 OK.
>>> The server is in control of what its resources mean. If a
>>> resource is not limited to being an entire book, then it isn't an
>>> entire book. There's no reason to pretend otherwise.
>> LDP wants certain machine interoperability characteristics which
>> are not in standard HTTP, so it's defining a more restrictive class
>> of resources, which are less free to behave as they want.
> I think you missed the point here.
>   The LDP defines how the resource acts, how it acts defines how HTTP
> is used to transfer it.

I don't understand that sentence.   What does the second "it" refer 
to?   It looks like "the resource", but of course one doesn't transfer 
resources, one transfers representations.

>
> The more I read of this thread the more I see Range being what you
> really want to do.

Could be.   We'll check on that again.

It's possible we did this design before realizing Range could be used 
with units other than bytes.   Do you know of any deployments with 
non-byte units?

Is there a way to make Range work with changing resource state? What 
happens if you have 100k units, and the client is reading 100 at a time, 
and an item gets inserted or deleted near the beginning?   It looks to 
me like the client would miss a boundary item or see a duplicated item.   
(We've addressed this in paging by varying the page size when that 
happens.)    This seems like it'd be outside of what the folks designing 
Range were expecting, and so it wouldn't be properly handled.    Hoping 
I'm wrong....
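
Here is a small Python sketch of the failure I have in mind, scaled down
to 10 units read 5 at a time.  It models ranges as plain list offsets,
which is my assumption about how a non-byte range unit would behave, not
something taken from the Range spec; read_two_ranges() and the item
names are made up for illustration.

```python
def read_two_ranges(items, mutate, size=5):
    """Read offsets [0, size) and [size, 2*size), with the resource
    state changed by mutate(items) between the two reads."""
    seen = list(items[0:size])
    mutate(items)
    seen += items[size:2 * size]
    return seen


def fresh_items():
    return [f"item{i}" for i in range(10)]


# Insertion near the beginning: everything shifts right by one, so the
# last item of the first range shows up again in the second range.
after_insert = read_two_ranges(fresh_items(), lambda xs: xs.insert(1, "new"))

# Deletion near the beginning: everything shifts left by one, so the
# first item of the second range is never seen.
after_delete = read_two_ranges(fresh_items(), lambda xs: xs.remove("item1"))
```

In the insertion case item4 is seen twice and the new item is never
seen; in the deletion case item5 is skipped.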

        -- Sandro

>
> Amos
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v2.0.22 (MingW32)
>
> iQEcBAEBAgAGBQJUDRkEAAoJELJo5wb/XPRjQj4H/1he7mRI0TU0kU+F+eNd3SP8
> +TgrHlhIQ3frdjv+0OtYqeC/Xa1PxncMTNpGPDtr3tgk9QI5XiiSqfupRMoSbQ+1
> HtCzjLC6PucXjZ11IVj+XZzFJ4FU25+4upxAw6ejIjTdzNMo0O/3ZOVyWgOdKj3v
> mi/HFDx/i9P/1S3n6ebrINkoeSVwDqInvgQ0v46ktiQI+Mm7wTSzd8X8bCdRy/zn
> NiWwCIhU0O6g7F2YRs0zZT7JdgC7yfECkues6jQ8BHV120aSVCptCy27CLjwfcPU
> XaOKWS8OlqKnFp7yvNedrfeHsC1+21km5srbujs9IHPa8SFwv5u9fmmBGSmHz/s=
> =kbfK
> -----END PGP SIGNATURE-----
>
>
Received on Monday, 8 September 2014 13:20:38 UTC