- From: Sandro Hawke <sandro@w3.org>
- Date: Sun, 16 Feb 2014 14:49:59 -0500
- To: public-ldp <public-ldp@w3.org>
- Message-ID: <53011667.2060402@w3.org>
oops, wrong mailing list. sorry. please ignore. -- Sandro

On 02/16/2014 02:49 PM, Sandro Hawke wrote:

> Sorry I didn't propose this months ago. It's taken me some time to see this solution clearly. Thanks to Eric and Andrei for being sounding boards. They haven't seen this particular framing, though.
>
> This is my proposal for "stable paging", a way to prevent the "lossy paging" problem in the current spec, a problem also highlighted in RFC 5005. I believe it can be implemented efficiently, and it isn't prohibitively complicated. I'd like to see it in the LC draft, marked "at risk" in case it turns out to be harder to implement than I think.
>
> *static paging*
>
> First, let me define something simpler, which I'll call "static paging". With static paging, the server embeds the ETag of the paged resource in the paging URLs. When it responds to a paging URL, it checks that the ETag is still the same. If it isn't, the server responds with 410 GONE, and the client knows it needs to re-initiate paging.
>
> With static paging, the client gets a perfect view of the resource, with no loss and no corruption, and it's trivial to implement. The problem is that resources large enough to require paging are likely to change frequently, so a client would rarely be able to follow PREV/NEXT links many hops before the resource changed and it had to start again. It might never get more than 10 pages in before having to restart. That doesn't seem acceptable.
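A minimal sketch of the static-paging check described above, in Python. The store interface (`current_etag`, `page_contents`) is hypothetical and invented for illustration; nothing here comes from the LDP spec.

```python
# Sketch only: "store" is a hypothetical backend; a real server would
# also validate or sign the URL rather than trust it blindly.
from urllib.parse import parse_qs, urlencode

def paging_url(resource_uri, page_number, store):
    """Embed the paged resource's current ETag in each paging URL."""
    etag = store.current_etag(resource_uri)
    return f"{resource_uri}?{urlencode({'page': page_number, 'etag': etag})}"

def serve_page(resource_uri, query_string, store):
    """Answer a paging URL, or 410 GONE if the resource has changed."""
    params = parse_qs(query_string)
    if store.current_etag(resource_uri) != params["etag"][0]:
        # Any change to the resource invalidates every outstanding
        # paging URL; the client must re-initiate paging.
        return 410, "Gone"
    return 200, store.page_contents(resource_uri, int(params["page"][0]))
```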
> *stable paging definition*
>
> With "stable paging" I propose a looser requirement. We allow the graph to change during paging, but each triple which could theoretically ever be in the graph is assigned to a particular page. This gives the client certain guarantees. Specifically, if the client traverses all the pages with stable paging, (1) every triple it sees must have been in the graph at some point during the traversal, and (2) every triple it does not see must have been absent from the graph at some point during the traversal. In other words, the error is limited to exactly those triples that changed during the traversal. (With lossy paging, the client may miss triples that were always in the graph.)
>
> *directory example*
>
> Let's consider the common case: we're paging an LDPC which has a sort order. (In fact, I expect this is the only case where most LDP servers will implement paging. When else would it actually be useful?) Maybe it's a directory of employees, sorted by the foaf:name of each person. The FIRST page has the folks at the beginning of the alphabet; the LAST page has the folks at the end.
>
> It's a big directory, and people are being added and removed all the time.
>
> With lossy paging, and a server that naively puts 100 people per page, if a name were deleted during forward paging or added during reverse paging, the client would simply, silently miss a person. Displayed to a user this might be okay, but if the client is a payroll system, an ACL system, a spam-filter lookup, or an emergency alert system, missing even a single person is inexcusable. Static paging might be nice, but isn't practical. Stable paging, though, is pretty good: we might still miss a person, but only if that person were added or deleted during paging. That seems like perfectly reasonable behavior, about what people would expect.
>
> So, how does stable paging work? Instead of just putting 100 names per page, with stable paging the server creates a function mapping names to pages, independent of whatever else is in the graph. Maybe the first page has names starting with A, the second has names starting with B, etc. Or maybe the first is "" through "Anderson", the second is "Anderson" through "Aubrey", etc. The paging function can be created by looking at the database, but once a page has been served to a client and certain triples have been included in or excluded from it because of some boundary test, that boundary MUST NOT change (see the first sketch after this message).
>
> Each paging URL then includes an identifier for that paging function. As with static paging, the server can, at any time, give up on a particular paging function and answer 410 GONE for its URLs. With the directory, I imagine one would only do that when the directory changes size significantly. I can also imagine having two paging functions, one for big pages and one for small pages, with the server picking one based on a client Prefer header.
>
> *microblogging example*
>
> In the application Andrei and I are working on (microblogging), the server will have a container of postings on each "channel". It will keep this sorted by the date of posting, so the LAST page will hold the most recent postings. It's fine if some system posts every 10 seconds (3 million posts per year), because clients just jump to the LAST page and maybe follow a few PREV links, depending how far back they want to go. Lossy paging would result in postings not being shown to some people in some circumstances, which is likely to be unacceptable. For us, the paging function will just be constructed by partitioning the range of dates present, and re-partitioning if a lot of postings are inserted into the middle or deleted (see the second sketch after this message). In the normal case of postings being added "now", we're just adding to the end, so no re-partitioning is needed.
>
> *conclusion*
>
> I know this is a little complicated, but without it, it seems to me paging is worthless, or even harmful, at least for the applications I'm thinking of. I'm willing to help draft text for the spec if people think this is reasonable to include.
>
> -- Sandro
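To make the directory example concrete, here is one possible shape for a stable paging function, a hedged sketch rather than anything from the spec: sorted boundary keys that are frozen once any page has been served, plus an identifier embedded in each paging URL so the server can later retire the whole function and answer 410 GONE. All names here (`PagingFunction`, `graph.members()`, `sort_key`) are invented for illustration.

```python
import bisect
import uuid

class PagingFunction:
    """Maps a sort key (e.g. a foaf:name) to a page number using fixed
    boundaries. Once any page has been served, the boundaries MUST NOT
    change; to resize pages, the server retires this function and
    builds a new one (clients then see 410 GONE and restart)."""

    def __init__(self, boundaries):
        # boundaries[i] is the exclusive upper bound of page i, e.g.
        # ["Anderson", "Aubrey", ...]; keys at or past the last bound
        # fall on the final page.
        self.id = uuid.uuid4().hex          # embedded in every paging URL
        self.boundaries = sorted(boundaries)
        self.retired = False

    def page_for(self, key):
        return bisect.bisect_right(self.boundaries, key)

functions = {}  # paging-function id -> PagingFunction

def serve_page(function_id, page_number, graph):
    fn = functions.get(function_id)
    if fn is None or fn.retired:
        # The server gave up on this partitioning (e.g. the directory
        # changed size significantly); the client restarts paging.
        return 410, "Gone"
    # A member's page depends only on its own sort key, never on what
    # else happens to be in the graph at the moment of the request.
    page = [m for m in graph.members() if fn.page_for(m.sort_key) == page_number]
    return 200, page
```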
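And the microblogging case, under the same caveats: boundaries come from partitioning the posting dates actually present, and a posting made "now" sorts after every existing boundary, so it lands on the last page and nothing needs re-partitioning. The page size and dates below are made up.

```python
import bisect
from datetime import datetime

def partition_dates(dates_present, posts_per_page=100):
    """Build page boundaries from the posting dates present;
    boundaries[i] is the exclusive upper bound of page i."""
    ordered = sorted(dates_present)
    return [ordered[i] for i in range(posts_per_page, len(ordered), posts_per_page)]

def page_for(boundaries, posted_at):
    return bisect.bisect_right(boundaries, posted_at)

# 16 postings, 5 per page -> boundaries at the 6th, 11th, and 16th dates.
boundaries = partition_dates(
    [datetime(2014, 2, day) for day in range(1, 17)], posts_per_page=5)

# A posting made "now" is later than every boundary, so it falls on the
# last page; existing pages keep their contents and no re-partitioning
# is needed. Only a burst of inserts into the *middle* of the range
# would justify retiring this function and re-partitioning.
assert page_for(boundaries, datetime.now()) == len(boundaries)
```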
Received on Sunday, 16 February 2014 19:50:07 UTC