Re: request for verification: paging in TPF from Ruben Verborgh on 2015-11-01 (public-hydra@w3.org from November 2015)

From: Ruben Verborgh <ruben.verborgh@ugent.be>
Date: Sun, 1 Nov 2015 22:32:07 +0100
To: Markus Lanthaler <markus.lanthaler@gmx.net>
Cc: public-linked-data-fragments@w3.org
Message-Id: <FDC71390-3DE4-4543-B889-77D8BD24C836@ugent.be>

Hi Markus,

>>    Any page other than the last page MUST NOT be empty.
>> Would this make sense?
> 
> What would break if it would be empty? Nothing AFAICT.
> It would just become less efficient.

One case breaks, and it's one that already occurred in two implementations so far:

Suppose that a fragment has 253 triples and there are 100 triples per page.
Imagine a client that needs to fetch the entire fragment.
– The client fetches page 1. It has 100 triples, the count estimate is 250. It links to page 2.
– The client fetches page 2. It has 100 triples, the count estimate is 250. It links to page 3.
– The client fetches page 3. It has 53 triples, the count estimate is 250. It links to page 4.
– The client fetches page 4. It has 0 triples, the count estimate is 250. It links to page 5.
– The client fetches page 5. It has 0 triples, the count estimate is 250. It links to page 6.
– … ad infinitum …

So the case I want to avoid is "every page has a 'next' link",
because the algorithm to fetch the entire fragment breaks,
as we cannot be sure that there will be more data triples.
And if we have to avoid that anyway, we might as well avoid unnecessary empty pages.
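
To make the failure concrete, here is a minimal sketch in Python that simulates the scenario above. Everything in it is hypothetical: `fetch_page` stands in for an HTTP request to a server, `always_link_next` models the faulty behaviour, and `max_requests` is a safety cap added only so that the demonstration terminates.

```python
from typing import List, NamedTuple, Optional, Tuple

PAGE_SIZE = 100       # triples per page, as in the example above
TOTAL_TRIPLES = 253   # actual size of the fragment


class Page(NamedTuple):
    triples: List[int]        # stand-ins for data triples
    next_page: Optional[int]  # number of the next page, or None if no link


def fetch_page(page: int, always_link_next: bool) -> Page:
    """Hypothetical server: returns the triples of one page and a 'next' link."""
    start = (page - 1) * PAGE_SIZE
    end = min(start + PAGE_SIZE, TOTAL_TRIPLES)
    triples = list(range(start, end))  # empty once start is past the end
    has_more = end < TOTAL_TRIPLES
    # A faulty server links to a next page unconditionally.
    next_page = page + 1 if (always_link_next or has_more) else None
    return Page(triples, next_page)


def fetch_fragment(always_link_next: bool,
                   max_requests: int = 10) -> Tuple[List[int], int, bool]:
    """Naive client: follows 'next' links until there are none.

    max_requests is an assumption of this sketch; without it, the loop
    would never end against the faulty server.
    """
    triples: List[int] = []
    requests = 0
    page: Optional[int] = 1
    while page is not None and requests < max_requests:
        result = fetch_page(page, always_link_next)
        requests += 1
        triples.extend(result.triples)
        page = result.next_page
    terminated = page is None  # False means we hit the safety cap
    return triples, requests, terminated


good = fetch_fragment(always_link_next=False)
bad = fetch_fragment(always_link_next=True)
```

With a well-behaved server, the client stops after three requests. With the faulty one, it has all 253 triples after three requests but keeps fetching empty pages until the artificial cap stops it.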

But perhaps, we could avoid the case above in different ways, like:
– The number of pages (and next links) per fragment MUST be finite.
– A page MUST NOT contain triples if the previous page did not contain triples.
However, I haven't found one that is not clumsy.
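
As a rough illustration of how these formulations differ, the sketch below checks two of them against a finite list of per-page triple counts. The function names are hypothetical, and a finite list necessarily satisfies the "finite number of pages" rule, so only the other two constraints are compared.

```python
from typing import List


def non_last_pages_non_empty(page_sizes: List[int]) -> bool:
    """'Any page other than the last page MUST NOT be empty.'"""
    return all(size > 0 for size in page_sizes[:-1])


def no_triples_after_empty_page(page_sizes: List[int]) -> bool:
    """'A page MUST NOT contain triples if the previous page did not.'"""
    return all(not (prev == 0 and cur > 0)
               for prev, cur in zip(page_sizes, page_sizes[1:]))


exact = [100, 100, 53]            # no empty pages at all
trailing = [100, 100, 53, 0, 0]   # empty pages after the data
gap = [100, 0, 53]                # empty page in the middle

results = {
    name: (non_last_pages_non_empty(sizes), no_triples_after_empty_page(sizes))
    for name, sizes in [("exact", exact), ("trailing", trailing), ("gap", gap)]
}
```

In this sketch, the `trailing` sequence violates the first rule but passes the second, so the second formulation by itself still allows unnecessary empty pages.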

> It would just become
> less efficient. RFC2119 defines MUST NOT as
> 
>   This phrase, or the phrase "SHALL NOT", mean that the
>   definition is an absolute prohibition of the specification.

That's how I intended it indeed.


>  <#dataset> a void:Dataset, hydra:Collection;
>    void:subset <?subject=...Tesla> .
> 
>  <?subject=...Tesla> hydra:view <?subject=...Tesla&page=2> .

+1

> This seems fine to me. <?subject=...Tesla> is also a void:Dataset,
> hydra:Collection. You need that to communicate the total number of items in
> that subset, right? 

Yes indeed.

>> a) Not needed for simple usage, but I want to replicate the human view,
>> which has this.
> 
> Machines are quite good at counting :-P

Sure, but then again the server might like to add additional data that's not part of the items,
which might make counting a little harder.
Or, in triple-based representations, metadata and controls are mixed with data.

>> b) Needed for some of the existing TPF query algorithms
>> that do predictions based on page size. (Those could also count the
>> number of items per page, but not in all cases.)
> 
> That's a good point. Let's discuss it as part of the "Client-initiated
> pagination (ISSUE-102)" thread.

It's not client-initiated here, because the server chooses, right?
For client cases, it actually makes sense to _omit_ it because the client already knows it.
Here, it would be the server saying "client, I'm serving you pages of 100 data triples".

Best,

Ruben

Received on Sunday, 1 November 2015 21:32:43 UTC