Re: Why Range doesn't work for LDP "paging" (cf 2NN Contents-of-Related)

On 09/16/2014 04:10 AM, "Martin J. Dürst" wrote:
> Hello Sandro, others,
>
> On 2014/09/16 10:13, Sandro Hawke wrote:
>> Earlier today the LDP Working Group discussed the matter of whether we
>> could use range headers instead of separate page URIs.  Use of Range
>> headers was suggested on this list recently.
>>
>> Our conclusion was still "no", for the following reasons. Please let us
>> know if you see a good solution to any/all of them:
>>
>> 1.  We don't know how the server would initiate use of Range. With our
>> current separate-page design, the server can do a 303 redirect to the
>> first page if it determines the representation of the entire resource is
>> too big.   The question here is what to do when the client didn't
>> anticipate this possibility.  True, the 303 isn't a great solution
>> either, since unprepared clients might not handle it well.
>> Perhaps one should give a 4xx or 5xx when the client asks for a giant
>> resource without a range header...?   But there's no "representation too
>> big" code defined.
>
> Can't you still use a 303 if there's no indication that the client 
> understands tuple ranges?
>

What Location would the 303 redirect to?   With Range, the individual 
sub-parts wouldn't have their own URIs.

Maybe it would redirect to a page which explained that the resource was 
too big, and gave some metadata, possibly including the first few and 
last few elements.
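For concreteness, here's roughly the fallback logic I have in mind (a toy sketch, not anything specified: the threshold, URIs, and data shapes are all made up):

```python
# Hypothetical sketch: how a server might fall back to a 303 when the
# client sent no Range header and the representation is too big.
# TOO_BIG, the "?view=about" URI, and the dict shapes are illustrative only.

TOO_BIG = 10_000  # made-up triple-count threshold

def respond(request, resource):
    """Return (status, headers) for a GET on a large LDP container."""
    if resource["triple_count"] <= TOO_BIG:
        return 200, {}
    if "Range" in request.get("headers", {}):
        # Client opted in to (hypothetical) tuple ranges.
        return 206, {"Content-Range": "tuples ..."}
    # Range-unaware client: redirect to a small descriptive resource
    # (metadata, maybe the first few and last few elements).
    return 303, {"Location": resource["uri"] + "?view=about"}

status, headers = respond(
    {"headers": {}},
    {"uri": "http://example.org/big", "triple_count": 1_000_000})
```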

>> 2.  We don't know how we could do safe changes.  With our current
>> design, it's possible for the resource to change while paging is
>> happening, and the client ends up with a representation whose inaccuracy
>> is bounded by the extent of the change.  The data is thus still usually
>> perfectly usable.  (If such a change is not acceptable, the client can
>> of course detect the change using etags and restart.)   This bounded
>> inaccuracy is a simple and practical concept with RDF (in a way it isn't
>> with arbitrary byte strings). Just using Range, a deletion would often
>> result in data unrelated to the change being dropped from what the
>> client sees.
>
> Why isn't this the case in your solution? In order to work, don't you 
> essentially have to remember exactly how far the client read? If you 
> have various clients, one that started before the first change, one 
> after the first but before the second change, and so on, how is the 
> server going to keep track of how far the client got?
>

You seem to be thinking that pages are numbered.

Instead one can use HATEOAS and embed a place marker in the next and 
prev URIs.   If those place markers are data values instead of indexes, 
then insertions and deletions are handled properly.

This is explained in: http://www.w3.org/TR/ldp-paging/#ldpr-impl
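The idea can be sketched like this (a toy model, not the spec's implementation: triples are kept in a sorted order, and the "next" marker is the last value served rather than a row number):

```python
# Toy sketch of value-based (HATEOAS-style) paging: the place marker
# carried in the "next" URI is the last data value served, not an index,
# so a concurrent insert or delete only perturbs results near the change.

def get_page(store, after=None, size=3):
    """Return one page of a sorted store plus the marker for the next page."""
    items = sorted(store)
    if after is not None:
        items = [x for x in items if x > after]
    page = items[:size]
    next_marker = page[-1] if len(items) > size else None
    return page, next_marker

store = {"a", "b", "c", "d", "e"}
page1, marker = get_page(store)           # first page: "a", "b", "c"
store.discard("a")                        # concurrent deletion
store.add("cc")                           # concurrent insertion
page2, _ = get_page(store, after=marker)  # resumes correctly after "c"
```

Because the marker is the value `"c"` rather than the index 3, the deletion of `"a"` doesn't shift anything the client hasn't seen into a gap, and the inserted `"cc"` shows up on the next page.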


>
>> I suppose perhaps one could use some kind of tombstones
>> to avoid this problem, not closing in gaps from deletion. Basically, a
>> client might ask for triples 0-9 and only get 3 triples because the
>> others were deleted?  Does that make sense with Range?   Is it okay to
>> not have the elements be contiguous?
>
> It definitely wouldn't make sense for byte ranges, but I think it 
> should be okay if you define tuple ranges to work that way.
>

I appreciate that you think that.   Do you have any evidence that there 
is consensus around that idea?  I can easily imagine other people will 
come along who would have a big problem with non-contiguous ranges.

It would be awkward if that happened after we re-did the spec to use ranges.
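To make the earlier tombstone question concrete, here is one way non-contiguous tuple-range semantics might behave (entirely hypothetical; no such range unit is standardized):

```python
# Hypothetical tuple-range semantics where deletions leave gaps rather
# than renumbering: a request for positions 0-9 returns only the
# surviving tuples, each keeping its original position.

def tuple_range(tuples, deleted, start, end):
    """Return the surviving (position, tuple) pairs in [start, end]."""
    return [(i, t) for i, t in enumerate(tuples)
            if start <= i <= end and i not in deleted]

tuples = [f"t{i}" for i in range(20)]
survivors = tuple_range(tuples, deleted={1, 2, 4, 5, 6, 7, 9},
                        start=0, end=9)
# Only positions 0, 3, and 8 survive: three tuples, non-contiguous,
# matching the "ask for triples 0-9, get 3 back" case above.
```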

Also, does anyone know the standardization route for defining a range 
unit for RDF triples?   Does that have to be an RFC, or can it be an 
external spec, as with media types?

>
>> 3.  Many of our usual RDF data systems don't support retrieval of ranges
>> by integer sequence numbers.   While some database systems have an
>> internal integer row number in every table that could be used for Range,
>> many others do not, and we don't know of a straightforward and
>> appropriate way to add it.
>
> So how are you going to implement paged views? I'd be surprised if 
> there are no sequence numbers but each tuple has a page number.
>

As above.

>
>> 4.  Finally, there was some question as to whether the Web
>> infrastructure has any useful support for non-byte ranges. This is
>> perhaps not an objection, but it came up during the discussion, and we'd
>> be interested in any data people have on this.
>
> By infrastructure, do you mean caches? I don't think there is much 
> support yet, but I'm not an expert.
>

Caches, server stacks, client stacks, deep packet inspectors, and other 
things I probably don't know about.

>
>> Bottom line is we still think just using rel=first/last/next/prev, among
>> distinct resources, is a pretty reasonable design.   And if we're doing
>> that, it'd be nice to have 2nn Contents-of-Related.
>
> Maybe this question has come up before: If you have 1M of tuples, and 
> decide that you have to serve them in pages of 1K, how much efficiency 
> do you gain by having the first download short-circuited, i.e. what's 
> the efficiency gain of one roundtrip saved over 1000 roundtrips?
>

In this case, I'm just the messenger.   I'll have to ask about that and 
get back to you.

> With a range-based design, various ranges can be downloaded in parallel, 

Good point, I hadn't thought of that.   Still, why would that ever be 
useful?

> or the client can adjust ranges based on throughput,..., but with your 
> rel=first/last/next/prev design, you seem to be much more constrained.

We do have a Prefer header for page size, so clients can adjust that.   
I'd say there are different constraints.  With Range, the server has 
less ability to negotiate, and there's no easy way to offer metadata.
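For example, a client hint might look like this (the "max-member-count" parameter name is my recollection of the LDP Paging draft's Prefer parameters; treat the exact syntax as illustrative and check the spec):

```python
# Illustrative only: a client hinting its preferred page size via the
# Prefer header. The parameter name follows my reading of the LDP Paging
# draft and may not match the final spec.

headers = {
    "Accept": "text/turtle",
    "Prefer": 'return=representation; max-member-count="100"',
}
# e.g. requests.get("http://example.org/big-container", headers=headers)
```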

BTW, I still have your calendar nature photos on the wall above the desk 
I inherited from you in 2004.

        - Sandro

>
>
> Regards,     Martin.
>

Received on Tuesday, 16 September 2014 18:20:44 UTC