- From: Sandro Hawke <sandro@w3.org>
- Date: Fri, 19 Sep 2014 13:20:57 -0400
- To: ietf-http-wg@w3.org
On 09/17/2014 12:14 AM, Amos Jeffries wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 17/09/2014 6:20 a.m., Sandro Hawke wrote:
>> On 09/16/2014 04:10 AM, "Martin J. Dürst" wrote:
>>> Hello Sandro, others,
>>>
>>> On 2014/09/16 10:13, Sandro Hawke wrote:
>>>> Earlier today the LDP Working Group discussed the matter of
>>>> whether we could use range headers instead of separate page
>>>> URIs. Use of Range headers was suggested on this list
>>>> recently.
>>>>
>>>> Our conclusion was still "no", for the following reasons.
>>>> Please let us know if you see a good solution to any/all of
>>>> them:
>>>>
>>>> 1. We don't know how the server would initiate use of Range.
>>>> With our current separate-page design, the server can do a 303
>>>> redirect to the first page if it determines the representation
>>>> of the entire resource is too big. The question here is what
>>>> to do when the client didn't anticipate this possibility.
>>>> True, the 303 isn't a great solution either, since unprepared
>>>> clients might not handle it well either. Perhaps one should
>>>> give a 4xx or 5xx when the client asks for a giant resource
>>>> without a range header...? But there's no "representation
>>>> too big" code defined.
>>> Can't you still use a 303 if there's no indication that the
>>> client understands tuple ranges?
>>>
>> What Location would the 303 redirect to? With Range, the
>> individual sub-parts wouldn't have their own URIs.
>>
>> Maybe it would redirect to a page which explained that the resource
>> was too big, and gave some metadata, possibly including the first
>> few and last few elements.
>>
>>>> 2. We don't know how we could do safe changes. With our
>>>> current design, it's possible for the resource to change while
>>>> paging is happening, and the client ends up with a
>>>> representation whose inaccuracy is bounded by the extent of the
>>>> change. The data is thus still usually perfectly usable.
>>>> (If such a change is not acceptable, the client can of course
>>>> detect the change using etags and restart.) This bounded
>>>> inaccuracy is a simple and practical concept with RDF (in a way it
>>>> isn't with arbitrary byte strings). Just using Range, a
>>>> deletion would often result in data unrelated to the change
>>>> being dropped from what the client sees.
>>> Why isn't this the case in your solution? In order to work, don't
>>> you essentially have to remember exactly how far the client read?
>>> If you have various clients, one that started before the first
>>> change, one after the first but before the second change, and so
>>> on, how is the server going to keep track of how far the client
>>> got?
>>>
>> You seem to be thinking that pages are numbered.
>>
>> Instead one can use HATEOAS and embed a place marker in the next
>> and prev URIs. If those place markers are data values instead of
>> indexes, then insert/delete are handled properly.
>>
>> This is explained in: http://www.w3.org/TR/ldp-paging/#ldpr-impl
>>
>>
>>>> I suppose perhaps one could use some kind of tombstones to
>>>> avoid this problem, not closing in gaps from deletion.
>>>> Basically, a client might ask for triples 0-9 and only get 3
>>>> triples because the others were deleted? Does that make sense
>>>> with Range? Is it okay to not have the elements be
>>>> contiguous?
>>> It definitely wouldn't make sense for byte ranges, but I think
>>> it should be okay if you define tuple ranges to work that way.
>>>
>> I appreciate that you think that. Do you have any evidence that
>> there is consensus around that idea? I can easily imagine other
>> people will come along who would have a big problem with
>> non-contiguous ranges.
> "contiguous" is optional. You are defining how the tuple range unit is
> syntaxed. The only restrictions HTTP places on it are that it conforms
> to token character set, and fetching a range tuple produces the same data.
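
The difference discussed above between paging by integer position and paging by a data-value place marker can be illustrated with a minimal sketch. It is not taken from the LDP Paging spec; the function names `page_by_index` and `page_by_marker` and the sample data are hypothetical, and a sorted in-memory list stands in for whatever ordered store a server might actually use.

```python
from bisect import bisect_right

def page_by_index(items, start, size):
    """Index-based page: positions shift when earlier items are deleted."""
    return items[start:start + size]

def page_by_marker(items, after, size):
    """Value-based page: up to `size` items strictly greater than `after`.

    `items` must be sorted; `after=None` means start from the beginning.
    """
    start = 0 if after is None else bisect_right(items, after)
    return items[start:start + size]

# Hypothetical sorted collection standing in for a server's triples.
data = ["a", "b", "c", "d", "e", "f"]

first_by_index = page_by_index(data, 0, 3)
first_by_marker = page_by_marker(data, None, 3)

# A deletion happens between the client's first and second request.
data.remove("b")

# The index-based client asks for positions 3..5 and never sees "d",
# which has shifted into a position it already fetched.
second_by_index = page_by_index(data, 3, 3)

# The marker-based client asks for "everything after c" and still gets "d".
second_by_marker = page_by_marker(data, first_by_marker[-1], 3)
```

In this sketch the marker-based client ends up with a stale "b" but no missing data, which is the bounded inaccuracy described in point 2; the index-based client silently loses "d", an element unrelated to the deletion.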
>
> You can even specify two tuple types, one for contiguous and one for
> non-contiguous if you really have to.
>
> It is also relative to ETag. With each resource edit the ETag needs to
> be updated to signal the change. The HTTP infrastructure treats two
> range responses with identical ETag as being combinable into one
> response, in either storage or delivery to the client. Differing Etag
> and the responses must be kept separate and fetched separately by the
> client.
>
>
>> It would be awkward if that happened after we re-did the spec to
>> use ranges.
>>
>> Also, does anyone know the standardization route for making a range
>> type of RDF triples? Does that have to be an RFC or can it be an
>> external spec, like media types?
> http://tools.ietf.org/html/rfc7233#section-5.1
>
> IETF review / RFC.

Thanks for the pointer. I still can't tell if the text defining the
new range type MUST be in an RFC or can be in a non-RFC formal open
specification, as it can with media type and link type registrations.

I also don't know (forgive me) what "IETF review" means. Who needs to
be convinced, and how many days will it take?

>>>> 3. Many of our usual RDF data systems don't support retrieval
>>>> of ranges by integer sequence numbers. While some database
>>>> systems have an internal integer row number in every table that
>>>> could be used for Range, many others do not, and we don't know
>>>> of a straightforward and appropriate way to add it.
>>> So how are you going to implement paged views? I'd be surprised
>>> if there are no sequence numbers but each tuple has a page
>>> number.
>>>
>> As above.
>>
>>>> 4. Finally, there was some question as to whether the Web
>>>> infrastructure has any useful support for non-byte ranges. This
>>>> is perhaps not an objection, but it came up during the
>>>> discussion, and we'd be interested in any data people have on
>>>> this.
>>> By infrastructure, do you mean caches?
>>> I don't think there is much support yet, but I'm not an expert.
>>>
>> Caches, server stacks, client stacks, deep packet inspectors, and
>> other things I probably don't know about.
> The infrastructure has mandatory support for the two failover actions:
> Either
>   ensure that non-byte Ranges are passed to the server and treated as
>   non-cacheable
> Or,
>   that the Accept-Ranges header is pruned such that the server is not
>   enticed to delivering non-byte ranges over infrastructure which will
>   break processing.
>
> In my experience the first action is more widely available from the
> middleware infrastructure which either ignores Range entirely, or
> caches selectively what it can and lets the rest pass untouched.

Sounds reasonable.

>
>>>> Bottom line is we still think just using
>>>> rel=first/last/next/prev, among distinct resources, is a pretty
>>>> reasonable design. And if we're doing that, it'd be nice to
>>>> have 2nn Contents-of-Related.
>>> Maybe this question has come up before: If you have 1M of tuples,
>>> and decide that you have to serve them in pages of 1K, how much
>>> efficiency do you gain by having the first download
>>> short-circuited, i.e. what's the efficiency gain of one roundtrip
>>> saved over 1000 roundtrips?
>>>
>> In this case, I'm just the messenger. I'll have to ask about that
>> and get back to you.
>>
>>> With a range-based design, various ranges can be downloaded in
>>> parallel,
>> Good point, I hadn't thought of that. Still, why would that ever
>> be useful?
> Collapsing those 1000 round trips into just 2 with a pipeline, and
> greatly reducing the opportunity for any parallel editing to interfere
> with the server responses.

If it was okay to stream it all, we wouldn't be trying to send it in
little chunks.

>
> Take the mythical foo range type, where each letter A to ZZ represents
> a block of data in the resource.
> In reality this could be a numeric
> chapter number or a row hash ID provided that sequence was predictable
> by the client.
>
> client:
>  GET / HTTP/1.1
>  Accept-Ranges:foo
>
> server:
>  HTTP/1.1 206 Partial
>  Range: foo=A/A-ZZ
>  ETag: "A-ZZ_hash"

This dialog does not appear correct. You seem to be using
Accept-Ranges as a request header, and allowing the server to specify
the range in a response header. As I read RFC-7233 Accept-Ranges is a
response header, Range is a request header, and then Content-Range is
the corresponding response header.

> client:
>  GET / HTTP/1.1
>  Accept-Ranges:foo
>  Range: foo=B/A-ZZ
>  ETag: "A-ZZ_hash"
>
>  GET / HTTP/1.1
>  Accept-Ranges:foo
>  Range: foo=C/A-ZZ
>  ETag: "A-ZZ_hash"
>
>  GET / HTTP/1.1
>  Accept-Ranges:foo
>  Range: foo=D/A-ZZ
>  ETag: "A-ZZ_hash"
>
>  ...
>
> The UI behaviour is that the first chapter/row/whatever is delivered
> immediately signalling how many there are and that range based support
> is working. The display or client processing can proceed incrementally
> like those update-on-scroll pages we see on some popular sites -
> without needing long-polling or WebSocket connections.

What's the advantage of asking for chapters 0, 1, and 2 in separate
requests? If the client knows it wants all three, why not ask for 0-2?

What does any of this have to do with long-polling or WebSockets?
Those are techniques for notifying a client of new information.

      -- Sandro

>
>>> or the client can adjust ranges based on throughput,..., but with
>>> your rel=first/last/next/prev design, you seem to be much more
>>> constrained.
>> We do have a Prefer header of page size, so clients can adjust
>> that. I'd say there are different constraints. With Range, the
>> server has less ability to negotiate, and there's no easy way to
>> offer metadata.
> Range has opportunities for metadata in the request/response message
> headers, in the multipart segment headers per-range within response
> payload, and again in the format of the data within those response
> payload segments.
>
> Amos
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v2.0.22 (MingW32)
>
> iQEcBAEBAgAGBQJUGQrBAAoJELJo5wb/XPRjyJ0H/2rA/zFe9sYm6NouZTZ8gBU+
> W7OA6YqDq3kVCp+l9FV+5a2YVL0xW+DZC1mcHNrVnDbMOXKEQ568Dyuw0QDYXieR
> NeeMLNpG4+UB18TKo4hs28R5pcgq4oXqo1IUTAg8vmhhAa2q1QMOEzvQQcDdjGMl
> Ax+ZcmVQMl0w4E36D2m61T65fYr/gRWrgJ10r/CpwgINpVXd3DpE4Ikccr8E1j8h
> Q9+wpwAyTLu5j+JFIU9kwlJMFEgxGnr4hG4crqufpx9dUkQX55HvNvSac1cu5UPh
> MB9auHuTxAilfvLlL2imJuzpXShL2cKUgQIhAmzxKV2+mvab3xaCBOC4p9Quxnw=
> =Zx48
> -----END PGP SIGNATURE-----
>
>
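
For reference, the header roles described in the thread (Range on the request, Content-Range and Accept-Ranges on the response, per RFC 7233) can be sketched as below. This is a minimal illustration under assumptions: the `triples` range unit is hypothetical and not registered, the `first-last` and `first-last/total` syntax is borrowed from byte ranges by analogy, and the helper names are invented for this example. If-Range is the RFC 7233 request header for making a range request conditional on an entity-tag.

```python
def range_request_headers(unit, first, last, etag=None):
    """Headers a client would send to ask for one range of a resource.

    Range is a request header. If the client already holds part of the
    representation, If-Range carries the ETag it was served with, so the
    server sends the range only if the resource is unchanged.
    """
    headers = {"Range": f"{unit}={first}-{last}"}
    if etag is not None:
        headers["If-Range"] = etag
    return headers

def partial_response_headers(unit, first, last, total, etag):
    """Status code and headers a server would send with the partial body.

    Content-Range describes what is actually enclosed; Accept-Ranges
    advertises which range units the server supports for this resource.
    """
    return 206, {
        "Accept-Ranges": unit,
        "Content-Range": f"{unit} {first}-{last}/{total}",
        "ETag": etag,
    }

# Hypothetical exchange: one request covering items 0-2 rather than
# three separate requests for 0, 1 and 2.
request = range_request_headers("triples", 0, 2, etag='"v1"')
status, response = partial_response_headers("triples", 0, 2, 1000, '"v1"')
```

A client that knows it wants several adjacent items can name them in a single Range value, as the last two lines show.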
Received on Friday, 19 September 2014 17:20:59 UTC