Re: stable paging

Oops, wrong mailing list.  Sorry, please ignore.

         -- Sandro


On 02/16/2014 02:49 PM, Sandro Hawke wrote:
> Sorry I didn't propose this months ago.   It's taken me some time to 
> see this solution clearly.   Thanks to Eric and Andrei for being 
> sounding boards.   They haven't seen this particular framing, though.
>
> This is my proposal for "stable paging", a way to prevent the "lossy 
> paging" problem in the currect spec and highlighted in RFC-5005.  I 
> believe it can be implemented efficiently, and isn't prohibitively 
> complicated.  I'd like to see it in the LC draft, marked "at risk" in 
> case it turns out to be harder to implement than I think.
>
> *static paging*
>
> First, let me define something simpler, which I'll call "static 
> paging".  With static paging, the server embeds the etag of the paged 
> resource into the paging URLs.  When it responds to the paging URLs, 
> it checks to make sure the etag is still the same. If it's not, then 
> the server responds with a 410 GONE.  Now the client knows it needs to 
> re-initiate paging.
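>
> To illustrate, here's a rough sketch of static paging in Python.  None 
> of these names come from the spec; the Resource class and helpers are 
> just mine, to show the etag-in-URL check:
>
>     import hashlib
>
>     PAGE_SIZE = 100
>
>     class Resource:
>         def __init__(self, url, triples):
>             self.url = url
>             self.triples = triples   # fixed order, so pages slice cleanly
>
>         @property
>         def etag(self):
>             # Content-derived etag: changes whenever the triples change.
>             return hashlib.sha1(repr(self.triples).encode("utf-8")).hexdigest()[:12]
>
>     def paging_url(resource, page_number):
>         # Embed the etag of the paged resource into each paging URL.
>         return "%s?page=%d&etag=%s" % (resource.url, page_number, resource.etag)
>
>     def serve_page(resource, page_number, url_etag):
>         # If the resource changed since the URLs were minted, the embedded
>         # etag no longer matches: answer 410 GONE so the client restarts.
>         if url_etag != resource.etag:
>             return (410, None)
>         start = page_number * PAGE_SIZE
>         return (200, resource.triples[start:start + PAGE_SIZE])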
>
> With static paging, the client gets a perfect view of the resource, 
> with no loss, no corruption.  And it's trivial to implement.  The 
> problem is that resources that are large enough to require paging are 
> likely to change frequently, so a client would be unable to follow 
> PREV/NEXT links very many hops before the resource changed and the 
> client had to start again.  A client might never get more than 10 
> pages in before being forced to restart.  That doesn't seem acceptable.
>
> *stable paging definition*
>
> With "stable paging" I propose a looser requirement.  We allow the 
> graph to change during paging, but each triple which could 
> theoretically ever be in the graph is assigned to a particular page. 
> This results in certain guarantees for the client. Specifically, if 
> the client traverses all the pages with stable paging, (1) every 
> triple it sees must have been in the graph at some point during the 
> traversal, and (2) every triple it does not see must have been absent 
> from the graph at some point during the traversal.  In other words, 
> the error is limited to exactly those triples that changed during the 
> traversal.  (With lossy paging, the client may miss triples that were 
> always in the graph.)
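>
> One way to picture the requirement: the page a triple lands on is a 
> pure function of the triple itself, never of what else happens to be 
> in the graph.  A toy sketch (using a hash where a real server would 
> more likely use a sort key, as in the examples below):
>
>     import zlib
>
>     NUM_PAGES = 16
>
>     def page_of(triple):
>         # Stable: depends only on the triple itself, so a triple can
>         # never silently move between pages during a traversal.
>         return zlib.crc32(repr(triple).encode("utf-8")) % NUM_PAGES
>
>     def page_contents(graph, page_number):
>         # Whatever is in the graph *right now* that maps to this page.
>         return {t for t in graph if page_of(t) == page_number}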
>
> *directory example *
>
> Let's consider the common case: we're paging an LDPC which has a sort 
> order.  (In fact, I expect this will be the only case where paging 
> will be implemented by most LDP servers.  When else would it ever 
> actually be useful?)  Maybe it's a directory of employees, sorted by 
> the foaf:name of each person.  The FIRST page has the folks at the 
> beginning of the alphabet; the LAST page has the folks at the end.
>
> It's a big directory, and people are being added and removed all the time.
>
> With lossy paging, and a server that naively just puts 100 people per 
> page, if a name were deleted during forward paging or added during 
> reverse paging, the client would simply, silently miss a person.  If 
> the results are just displayed to a user, that might be okay, but if 
> the client is the payroll system, or an ACL system, or a spam filter 
> lookup, or an emergency alert system, missing even a single person is 
> inexcusable.  Static paging would be nice, but it isn't practical. 
> Stable paging, though, is pretty good: we might still miss a person, but 
> only if that person were added or deleted during paging.  That seems 
> like perfectly reasonable behavior, about what people would expect.
>
> So, how does stable paging work?
>
> Instead of just putting 100 names per page, with stable paging the 
> server creates a function mapping names to pages independent of what 
> else is in the graph.  It might be the first page has names starting 
> with A, the second has names starting with B, etc.  Or it might be the 
> first is "" through "Anderson", the second is "Anderson" through 
> "Aubrey", etc.  The paging function can be created by looking at the 
> database, but once a page has been served to a client and certain 
> triples included/excluded from it because of some boundary test, that 
> boundary MUST NOT change.
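>
> A sketch of such a paging function for the directory, assuming names 
> sort as plain strings (the boundary values are made up):
>
>     import bisect
>
>     # Upper boundaries, chosen once (say, by sampling the database so
>     # pages come out roughly equal).  Once any page has been served,
>     # these MUST NOT change for the lifetime of this paging function.
>     BOUNDARIES = ["Anderson", "Aubrey", "Baker"]   # ...and so on
>
>     def page_for_name(name):
>         # Page 0 is "" through "Anderson", page 1 runs through "Aubrey",
>         # etc.; names past the last boundary land on the final page.
>         return bisect.bisect_left(BOUNDARIES, name)
>
>     def page_members(names, page_number):
>         return sorted(n for n in names if page_for_name(n) == page_number)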
>
> Each paging URL then includes an identifier for that paging function. 
> As with static paging, the server can, at any time, give up on a 
> particular paging function and answer 410 GONE for those URLs.  With 
> the directory, I imagine one would only do that when the directory 
> changes size significantly.  I can also imagine one would have two 
> paging functions, one for big pages and one for small pages, and the 
> server would pick one based on a client PREFER header.
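>
> Concretely, the lifecycle might look like this (the URL shape, the 
> function ids, and choose_fn are all made up for illustration):
>
>     import bisect
>
>     DIRECTORY = ["Abbott", "Anderson", "Avery", "Baker", "Ng", "Singh"]
>
>     # Live paging functions, keyed by an opaque id that appears in each
>     # paging URL, e.g. /directory?fn=small-v7&page=2.  Deleting an entry
>     # is how the server gives up on a function.
>     PAGING_FNS = {
>         "big-v7":   ["Baker"],                        # two big pages
>         "small-v7": ["Anderson", "Avery", "Baker"],   # four small pages
>     }
>
>     def choose_fn(prefer_header):
>         # Pick big or small pages based on the client's PREFER header.
>         return "small-v7" if "small" in prefer_header else "big-v7"
>
>     def serve_page(fn_id, page_number):
>         boundaries = PAGING_FNS.get(fn_id)
>         if boundaries is None:
>             return (410, None)   # function retired: re-initiate paging
>         members = [n for n in DIRECTORY
>                    if bisect.bisect_left(boundaries, n) == page_number]
>         return (200, members)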
>
> *microblogging example*
>
> In the application Andrei and I are working on (microblogging), the 
> server will have a container of postings on each "channel". It will 
> keep this sorted by the date of posting, so the LAST page will be the 
> most recent postings.  It's fine if some system posts every 10 seconds 
> (3 million posts per year) because the clients just jump to the LAST 
> page, and maybe follow a few PREV links, depending on how far back they 
> want to go.  Lossy paging would result in postings not being shown to 
> some people in some circumstances, which is likely to be unacceptable.  
> For us, the paging function will just be constructed by partitioning 
> the range of dates present, and re-partitioning if a lot of postings 
> are inserted into the middle or deleted.  In the normal case of 
> postings being added "now", we're just adding to the end, so no 
> re-partitioning is needed.
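>
> For what it's worth, the partitioning we have in mind is roughly this 
> (a sketch, with dates as ISO strings so they sort chronologically):
>
>     import bisect
>
>     # Upper date boundaries of the pages handed out so far.  Posts
>     # normally arrive "now", so we only append boundaries at the end;
>     # boundaries already used by served pages never move.
>     boundaries = ["2014-01-01", "2014-02-01"]
>
>     def page_for(post_date):
>         return bisect.bisect_left(boundaries, post_date)
>
>     def open_new_last_page(now):
>         # When "now" runs past the last boundary, cap the current LAST
>         # page and start a new one; nothing already served is disturbed.
>         if now > boundaries[-1]:
>             boundaries.append(now)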
>
> *conclusion*
>
> I know this is a little complicated, but without it, it seems to me 
> paging is worthless, or even harmful, at least for the applications 
> I'm thinking of.  I'm willing to help draft text for the spec if 
> people think this is reasonable to include.
>
>       -- Sandro
>
