RE: request for verification: paging in TPF from Markus Lanthaler on 2015-11-02 (public-hydra@w3.org from November 2015)

From: Markus Lanthaler <markus.lanthaler@gmx.net>
Date: Mon, 2 Nov 2015 20:29:04 +0100
To: <public-linked-data-fragments@w3.org>
Message-ID: <011b01d115a4$bc9adaa0$35d08fe0$@gmx.net>

On 1 Nov 2015 at 22:32, Ruben Verborgh wrote:
>>>    Any page other than the last page MUST NOT be empty.
>>> Would this make sense?
>> 
>> What would break if it were empty? Nothing AFAICT.
>> It would just become less efficient.
> 
> One case breaks, and it's one that already occurred in two implementations
> so far:
> 
> Suppose that a fragment has 253 triples, there are 100 triples per page.
> Imagine a client that needs to fetch the entire fragment.
> - The client fetches page 1. It has 100 triples, the count estimate is
>   250. It links to page 2.
> - The client fetches page 2. It has 100 triples, the count estimate is
>   250. It links to page 3.
> - The client fetches page 3. It has 53 triples, the count estimate is 250.
>   It links to page 4.
> - The client fetches page 4. It has 0 triples, the count estimate is 250.
>   It links to page 5.
> - The client fetches page 5. It has 0 triples, the count estimate is 250.
>   It links to page 6.
> - ... ad infinitum ...

The client should have some safety measures to catch a case like this, I
guess.
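
To make "safety measures" a bit more concrete, here is a rough sketch of what
I have in mind. It is not taken from any existing TPF client. The names
`Page`, `fetch_page`, `fetch_all_triples` and the `max_pages` limit are made
up for illustration, and treating the first empty page as the end of the
fragment is an assumption, not something the spec currently guarantees.

```python
from typing import Callable, Iterator, List, NamedTuple, Optional


class Page(NamedTuple):
    """Hypothetical parsed page: its data triples and its 'next' link."""
    triples: List[object]
    next_url: Optional[str]


def fetch_all_triples(
    first_url: str,
    fetch_page: Callable[[str], Page],  # hypothetical: dereferences one page
    max_pages: int = 10_000,            # arbitrary illustrative limit
) -> Iterator[object]:
    """Follow 'next' links, guarding against endless pagination."""
    seen = set()
    url: Optional[str] = first_url
    while url is not None:
        if url in seen:
            raise RuntimeError("'next' links form a cycle at " + url)
        if len(seen) >= max_pages:
            raise RuntimeError("gave up after %d pages" % max_pages)
        seen.add(url)
        page = fetch_page(url)
        if not page.triples:
            # Assumption: an empty page means there is no more data,
            # even if the server still advertises a 'next' link.
            return
        yield from page.triples
        url = page.next_url
```

With the 253-triple example above, such a client would fetch pages 1 to 4 and
stop at the first empty page instead of following the links forever.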


> So the case I want to avoid is "every page has a 'next' link",
> because the algorithm to fetch the entire fragment breaks,
> as we cannot be sure that there will be more data triples.
> And if we have to avoid that anyway, we might as well avoid unnecessary
> empty pages.
> 
> But perhaps, we could avoid the case above in different ways, like:
> - The number of pages (and next links) per fragment MUST be finite.
> - A page MUST NOT contain triples if the previous page did not contain
>   triples.
> However, I haven't found one that is not clumsy.

Why not simply "the next link SHOULD NOT point to an empty fragment"?

I used SHOULD NOT because it could indeed happen from time to time if the
underlying dataset changes, and that shouldn't break a client. I could live
with a MUST NOT as well, but I think a SHOULD NOT forces client developers to
program more defensively, which is a good thing.
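
On the server side, one possible way to honor that SHOULD NOT is to look one
triple ahead before emitting the next link. The sketch below is only an
illustration under assumptions: `build_page` is a made-up helper, pages are
numbered from 1, and the in-memory list stands in for whatever data source a
real server queries.

```python
from typing import Dict, List, Optional


def build_page(triples: List[object], page: int,
               items_per_page: int = 100) -> Dict[str, object]:
    """Return one page of triples, plus a next page number only if
    that next page would be non-empty (hypothetical helper)."""
    start = (page - 1) * items_per_page
    # Read one extra triple to learn whether anything follows this page.
    window = triples[start:start + items_per_page + 1]
    has_more = len(window) > items_per_page
    next_page: Optional[int] = page + 1 if has_more else None
    return {"triples": window[:items_per_page], "next": next_page}
```

For a fragment of 253 triples, page 3 would then carry 53 triples and no next
link. If the dataset shrinks between two requests, a client can still land on
an empty page, which is why I would not go further than SHOULD NOT.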


[itemsPerPage is...]

>>> b) Needed for some of the existing TPF query algorithms
>>> that do predictions based on page size. (Those could also count the
>>> number of items per page, but not in all cases.)
>> 
>> That's a good point. Let's discuss it as part of the "Client-initiated
>> pagination (ISSUE-102)" thread.
> 
> It's not client-initiated here, because the server chooses, right?
>
> For client cases, it actually makes sense to _omit_ it because the client
> already knows it.
> Here, it would be the server saying "client, I'm serving you pages of 100
> data triples".

Sure, but the concept is the same and we collect use cases at

   https://www.w3.org/community/hydra/wiki/Client-initiated_pagination

That's why I proposed to discuss it in that context and focus on the rest in
this thread.


--
Markus Lanthaler
@markuslanthaler

Received on Monday, 2 November 2015 19:29:34 UTC