- From: Sandro Hawke <sandro@w3.org>
- Date: Mon, 17 Feb 2014 09:58:09 -0500
- To: John Arwe <johnarwe@us.ibm.com>, Linked Data Platform WG <public-ldp-wg@w3.org>
- Message-ID: <53022381.4060206@w3.org>
On 02/17/2014 08:33 AM, John Arwe wrote:
> > .... Lossy paging would result in postings not
> > being shown to some people in some circumstance, which is likely to
> > be unacceptable.
>
> This makes it sound as if the real chafing point is the inability for
> the client to detect when it never sees something (when it needs to
> "start over" if it cares about completeness), which is different than
> having a problem with lossy paging per se. In our current
> implementations (other email), we also ended up giving clients a
> signal by which they could Know that they missed something and hence
> need to start over if they care about completeness; [1] is the spec
> many of them are following.
>
> [1] http://open-services.net/wiki/core/TrackedResourceSet-2.0/
I'm not seeing an easy way to do that with that spec - it looks like it
does a lot more than is needed here.
>
> > .... As with static paging, the server can, at any time, give
> > up on a particular paging function and answer 410 GONE for those
> > URLs. ...
>
> This is an interesting variation. Many client apps are written to
> treat 4xx codes as errors. "page gone" is something of an "expected
> error" - more like a 5xx in some ways. It's not like there's anything
> wrong with the client's code to cause the 410 (but that would be true
> of 410 in general, aside from cases where the same code already
> deleted the request-URI for which the 410 is sent).
>
Yeah, I figure one can handle 410 intelligently when getting an SPR.
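A rough sketch of what "handling 410 intelligently" could look like on the client side (Python, with the fetch callback and its return shape assumed for illustration - not anything the spec defines): treat 410 on a page URL as "the paging function was retired, start over", not as a fatal error.

```python
class PageGone(Exception):
    """Raised when a page URL answers 410 GONE (assumed signaling)."""

def fetch_all_pages(fetch, first_page_url):
    """Collect every page body, restarting from page one on 410 GONE.

    fetch(url) is assumed to return (body, next_page_url_or_None),
    raising PageGone if the server has retired that page URL.
    """
    while True:
        pages, url = [], first_page_url
        try:
            while url:
                body, url = fetch(url)
                pages.append(body)
            return pages  # walked the whole sequence without a 410
        except PageGone:
            continue  # paging function changed under us; restart from page one
```

The point is just that a client caring about completeness restarts rather than reports an error, which is why 410 here feels more like an "expected error" than a typical 4xx.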
> Nit: "Stable" seems a bit strong. This is more a bounded-loss case,
> isn't it?
>
Well, what's stable is the assignment of triples into pages.
"Fixed-Boundary Paging".
> > ..., but each triple which could
> > theoretically ever be in the graph is assigned to a particular page.
>
> Does this imply that you need a closed model in order to implement it?
> Otherwise the number of triples which could theoretically ever be in
> the graph is infinite, so you fall somewhere in the space between
> needing infinite pages, having some pages that will be too large to
> transfer (defeating the purpose), and having an infinite number of
> mapping functions. It's sounding like some of the exchanges the WG
> has had on 'reasoning' ... theoretically NP, but in practice not so bad.
>
No, I think it's easy.
If the server is application-specific, it can just use days as the
buckets for events, for instance, or alphabet ranges.
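For instance, the day-bucket case could be as simple as this (a hypothetical sketch; the URL shape and the idea that each event carries a date are assumptions, not anything from LDP):

```python
from datetime import date

def page_for_event(event_date: date) -> str:
    """Assign an event triple to a page purely by its date.

    Every triple dated 2014-02-17 lands on the same page, forever,
    so the assignment of triples to pages is stable by construction.
    """
    return "/events/page/" + event_date.isoformat()
```

Any triple that could theoretically ever exist has a well-defined page, even though the set of pages is unbounded; only pages that actually contain triples need to be served.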
> I'm wondering if generic graph stores would have any problem with it,
> since they definitionally have open models and hence know basically
> nothing about the triples that might theoretically exist over time in
> a resource.
Yeah, for paging generic LDP-RRs I see two easy approaches, which kind
of correspond to the data structures one would probably use to store the
graph:
- if the triples are really in no special order, use a hash function on
the text of each triple. Pick a hash function that gives you a number
of buckets suitable for the number of triples you have and the page size
you want. Change the paging function when the number of triples
changes a lot.
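A minimal sketch of that first approach (Python; the use of SHA-1 over the N-Triples text is my choice for illustration - any stable hash works, but it must not be a per-process randomized hash, or the assignment changes across server restarts):

```python
import hashlib

def bucket_for_triple(ntriple_line: str, num_buckets: int) -> int:
    """Map a triple (as its N-Triples text) to a stable bucket/page number."""
    digest = hashlib.sha1(ntriple_line.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets

def choose_num_buckets(num_triples: int, page_size: int) -> int:
    """Pick a bucket count so pages average roughly page_size triples."""
    return max(1, -(-num_triples // page_size))  # ceiling division
```

Changing `num_buckets` is exactly "changing the paging function": every triple may move to a new page, so the old page URLs should start answering 410.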
- if the triples are conceptually sorted, then figure out and store
reasonable boundary values, as one would do if using a b-tree or
balanced binary tree for maintaining the sorting. Change the paging
function when adding or deleting a b-tree-node which has been given to a
client. If no client has seen the page corresponding to a node, then
it's okay to split or delete it.
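And a sketch of the second approach (Python; the idea of storing the page boundaries as a sorted list of inclusive upper-bound keys is my framing of "reasonable boundary values", as a b-tree's separator keys would be):

```python
import bisect

def page_for_key(sort_key: str, boundaries: list) -> int:
    """Assign a triple's sort key to a page via stored boundary values.

    boundaries holds the inclusive upper bound of pages 0..n-1, in order;
    any key above the last boundary falls on the final page n.
    """
    return bisect.bisect_left(boundaries, sort_key)
```

Splitting or merging a node whose page a client has already fetched changes these boundaries, i.e. retires the paging function; a node no client has seen can be split or deleted freely.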
-- Sandro
>
>
> Best Regards, John
>
>
Received on Monday, 17 February 2014 14:58:16 UTC