- From: Sandro Hawke <sandro@w3.org>
- Date: Sun, 16 Feb 2014 14:49:59 -0500
- To: public-ldp <public-ldp@w3.org>
- Message-ID: <53011667.2060402@w3.org>
oops, wrong mailing list. sorry. please ignore.
-- Sandro
On 02/16/2014 02:49 PM, Sandro Hawke wrote:
> Sorry I didn't propose this months ago. It's taken me some time to
> see this solution clearly. Thanks to Eric and Andrei for being
> sounding boards. They haven't seen this particular framing, though.
>
> This is my proposal for "stable paging", a way to prevent the "lossy
> paging" problem present in the current spec and highlighted in RFC
> 5005. I believe it can be implemented efficiently, and isn't
> prohibitively complicated. I'd like to see it in the LC draft, marked
> "at risk" in case it turns out to be harder to implement than I think.
>
> *static paging*
>
> First, let me define something simpler, which I'll call "static
> paging". With static paging, the server embeds the etag of the paged
> resource into the paging URLs. When it responds to the paging URLs,
> it checks to make sure the etag is still the same. If it's not, then
> the server responds with a 410 GONE. Now the client knows it needs to
> re-initiate paging.
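>
> A minimal sketch of that check (Python; the URL scheme and all names
> here are my own assumptions, not anything from the spec):
>
>     from urllib.parse import parse_qs, urlparse
>
>     def serve_page(request_url, current_etag, page_body):
>         # Hypothetical scheme: the paged resource's etag was embedded
>         # in the paging URL (e.g. ?etag=abc123) when paging started.
>         embedded = parse_qs(urlparse(request_url).query).get("etag", [None])[0]
>         if embedded != current_etag:
>             # The resource changed since paging began, so the paging
>             # URLs are stale; tell the client to re-initiate paging.
>             return 410, "GONE"
>         return 200, page_body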
>
> With static paging, the client gets a perfect view of the resource,
> with no loss and no corruption, and it's trivial to implement. The
> problem is that resources large enough to require paging are likely
> to change frequently, so a client would be unable to follow PREV/NEXT
> links for very many hops before the resource changed and the client
> had to start again. It might never get more than 10 pages in before
> having to start over. That doesn't seem acceptable.
>
> *stable paging definition*
>
> With "stable paging" I propose a looser requirement. We allow the
> graph to change during paging, but each triple which could
> theoretically ever be in the graph is assigned to a particular page.
> This results in certain guarantees for the client. Specifically, if
> the client traverses all the pages with stable paging, (1) every
> triple it sees must have been in the graph at some point during the
> traversal, and (2) every triple it does not see must have been absent
> from the graph at some point during the traversal. In other words,
> the error is limited to exactly those triples that changed during the
> traversal. (With lossy paging, the client may miss triples that were
> always in the graph.)
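>
> Those two guarantees can be stated as a small check (a sketch only;
> "snapshots" stands for the graph states during the traversal, "seen"
> for the union of all pages the client retrieved):
>
>     def stable_paging_errors(seen, snapshots):
>         # A triple seen but present in no snapshot would violate (1);
>         # a triple in every snapshot but never seen would violate (2).
>         ever_present = set().union(*snapshots)
>         always_present = set.intersection(*map(set, snapshots))
>         return (seen - ever_present) | (always_present - seen)
>
>     # Stable paging guarantees this set is empty; lossy paging does
>     # not, since it can drop triples that were always in the graph.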
>
> *directory example*
>
> Let's consider the common case: we're paging an LDPC which has a sort
> order. (In fact, I expect this will be the only case where paging
> will be implemented by most LDP servers. When else would it ever
> actually be useful?) Maybe it's a directory of employees, sorted by
> the foaf:name of each person. The FIRST page has the folks at the
> beginning of the alphabet; the LAST page has the folks at the end.
>
> It's a big directory, and people are being added and removed all the time.
>
> With lossy paging, and a server that naively just puts 100 people per
> page, if a name were deleted during forward paging or added during
> reverse paging, the client would simply, silently miss a person.
> Displayed to a user, this might be okay, but if the client is the
> payroll system, or an ACL system, or a spam-filter lookup, or an
> emergency alert system, missing even a single person is inexcusable.
> Static paging might be nice, but isn't practical. Stable paging,
> though, is pretty good: we might still miss a person, but only if
> that person were added or deleted during paging. That seems like
> perfectly reasonable behavior, about what people would expect.
>
> So, how does stable paging work?
>
> Instead of just putting 100 names per page, with stable paging the
> server creates a function mapping names to pages independent of what
> else is in the graph. It might be that the first page has names
> starting with A, the second names starting with B, and so on. Or the
> first might be "" through "Anderson", the second "Anderson" through
> "Aubrey", etc. The paging function can be created by looking at the
> database, but once a page has been served to a client, with certain
> triples included or excluded because of some boundary test, that
> boundary MUST NOT change.
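>
> One hypothetical implementation of such a paging function (Python;
> the boundary values are illustrative only):
>
>     import bisect
>
>     # Boundaries chosen by looking at the database once. After any
>     # page has been served, this list MUST NOT change for the
>     # lifetime of this paging function.
>     BOUNDARIES = ["Anderson", "Aubrey", "Baker"]
>
>     def page_for(name):
>         # Page 0 holds "" up to and including "Anderson"; page 1
>         # holds names after "Anderson" up to "Aubrey"; names past
>         # the final boundary all fall on the last page.
>         return bisect.bisect_left(BOUNDARIES, name)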
>
> Each paging URL then includes an identifier for that paging function.
> As with static paging, the server can, at any time, give up on a
> particular paging function and answer 410 GONE for those URLs. With
> the directory, I imagine one would only do that when the directory
> changes size significantly. I can also imagine one would have two
> paging functions, one for big pages and one for small pages, and the
> server would pick one based on the client's Prefer header.
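>
> Concretely, the mechanics might look like this (a sketch; the URL
> layout and names are hypothetical, not proposed spec syntax):
>
>     LIVE_FUNCTIONS = {"f1": "big pages", "f2": "small pages"}
>
>     def paging_url(base, function_id, page_number):
>         # The function id rides along in every paging URL, so the
>         # server can recognize URLs minted under a retired function.
>         return f"{base}?fn={function_id}&page={page_number}"
>
>     def status_for(function_id):
>         # Retiring a paging function invalidates all its URLs at once.
>         return 200 if function_id in LIVE_FUNCTIONS else 410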
>
> *microblogging example*
>
> In the application Andrei and I are working on (microblogging), the
> server will have a container of postings on each "channel". It will
> keep this sorted by the date of posting, so the LAST page will be the
> most recent postings. It's fine if some system posts every 10 seconds
> (3 million posts per year) because the clients just jump to the LAST
> page, and maybe follow a few PREV links, depending on how far back
> they want to go. Lossy paging would result in postings not being
> shown to some people in some circumstances, which is likely to be
> unacceptable.
> For us, the paging function will just be constructed by partitioning
> the range of dates present, and re-partitioning if a lot of postings
> are inserted into the middle or deleted. In the normal case of
> postings being added "now", we're just adding to the end, so no
> re-partitioning is needed.
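>
> A sketch of that paging function, under our assumptions (the boundary
> dates are illustrative only):
>
>     import bisect
>     from datetime import datetime
>
>     # Frozen date boundaries partitioning the postings. The newest
>     # partition is open-ended, so posts arriving "now" always land
>     # on the LAST page and no re-partitioning is needed.
>     DATE_BOUNDARIES = [
>         datetime(2013, 6, 1),
>         datetime(2013, 12, 1),
>         datetime(2014, 2, 1),
>     ]
>
>     def page_for_post(posted_at):
>         return bisect.bisect_right(DATE_BOUNDARIES, posted_at)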
>
> *conclusion*
>
> I know this is a little complicated, but without it, it seems to me
> paging is worthless, or even harmful, at least for the applications
> I'm thinking of. I'm willing to help draft text for the spec if
> people think this is reasonable to include.
>
> -- Sandro
>