Re: lossless paging using HATEOAS from Steve Speicher on 2014-02-19 (public-ldp-wg@w3.org from February 2014)

From: Steve Speicher <sspeiche@gmail.com>
Date: Wed, 19 Feb 2014 14:15:59 -0500
To: Sandro Hawke <sandro@w3.org>
Cc: Linked Data Platform WG <public-ldp-wg@w3.org>
Message-ID: <CAOUJ7JrjoHcx3dp1XFm+_g8iW4AerM9qwm_o1A7wYyf9kAW+6g@mail.gmail.com>
On Wed, Feb 19, 2014 at 12:12 PM, Sandro Hawke <sandro@w3.org> wrote:

> Here's an implementation technique servers can use to do lossless paging
> very cheaply (give or take whatever the underlying paging technology is).
>
> In the NEXT and PREV links, include the boundary value.   That way the
> server doesn't need to remember anything; the state is all in the URL.
>  HATEOAS to the rescue.   (I'm sure I'm not the first to think of this....)
>

We've used this model for some time, encoding the boundaries into the URLs
of the pages themselves.  Usually based on some time-based property of the
resources, such as when it was created, though others have used other
properties from the data that the servers assign to the resources.

-  Steve Speicher


> As an example for a directory with these names (people at the last
> meeting), in alphabetic order:
>
>         Arnaud Le Hors
>         Ashok Malhotra
>         Cody Burleson
>         Eric Prud'hommeaux
>         Henry Story
>         John Arwe
>         Miguel Aragón
>         Nandana Mihindukulasooriya
>         Roger Menday
>         Sandro Hawke
>         Steve Speicher
>
> The container offers these headers:
>
>        Link: <.../my_container?page_first> rel=FIRST
>        Link: <.../my_container?page_last> rel=LAST
>
> If you GET that rel=FIRST one, you'll get this:
>
>        Link: <.../my_container?page_after=Henry%20Story> rel=NEXT
>
>        member & container triples for Arnaud Le Hors
>        member & container triples for Ashok Malhotra
>        member & container triples for Cody Burleson
>        member & container triples for Eric Prud'hommeaux
>        member & container triples for Henry Story
>
> If you GET that rel=NEXT one, you'll get this:
>
>        Link: <.../my_container?page_after=Sandro%20Hawke> rel=NEXT
>        Link: <.../my_container?page_before=John%20Arwe> rel=PREV
>
>        member & container triples for John Arwe
>        member & container triples for Miguel Aragón
>        member & container triples for Nandana Mihindukulasooriya
>        member & container triples for Roger Menday
>        member & container triples for Sandro Hawke
>
> If you GET that rel=NEXT one, you'll get this:
>
>        Link: <.../my_container?page_before=Steve%20Speicher> rel=PREV
>
>        member & container triples for Steve Speicher
>
> Meanwhile, if you decided to traverse backwards from the container's
> rel=LAST one, you'd get this:
>
>        Link: <.../my_container?page_before=Miguel%20Arag%C3%B3n> rel=PREV
>
>        member & container triples for Miguel Aragón
>        member & container triples for Nandana Mihindukulasooriya
>        member & container triples for Roger Menday
>        member & container triples for Sandro Hawke
>        member & container triples for Steve Speicher
>
> If you GET that rel=PREV one, you'd get:
>
>        Link: <.../my_container?page_after=John%20Arwe> rel=NEXT
>        Link: <.../my_container?page_before=Ashok%20Malhotra> rel=PREV
>
>        member & container triples for Ashok Malhotra
>        member & container triples for Cody Burleson
>        member & container triples for Eric Prud'hommeaux
>        member & container triples for Henry Story
>        member & container triples for John Arwe
>
> If you GET that rel=PREV one, you'd get:
>
>        Link: <.../my_container?page_after=Arnaud%20Le%20Hors> rel=NEXT
>
>        member & container triples for Arnaud Le Hors
>
> See?  No state on the server, and lossless paging, with fully controlled
> page size.  Servers which are already saving paging state for some reason
> can do it some other way.
>
> A few details:
>
> The value in the page_after and page_before fields would be the value of
> the sort field, whatever it is.
>
> If the composite of the sort fields is not guaranteed to be unique, or
> there is no sort field, the server should add another sort field, some kind
> of internal rowid, to make it unique.
>
> If there are multiple sort fields, the server could mush them together, or
> use multiple page_before/page_after parameters in the URL.
>
> If the data is particularly sensitive, the server might want to encrypt
> the fields.
>
> As an middle-ground solution, with a little state, the server could hide
> the values and make the URLs a lot shorter by maintaining a cache of
> boundary values, like
>    { "bv1": "Henry Story",
>       "bv2": "John Arwe",
>      ... }
>
> then instead of
>        Link: <.../my_container?page_after=Henry%20Story> rel=NEXT
> it would do
>        Link: <.../my_container?page_after_code=bv1> rel=NEXT
>
> In this case, the client would need to be prepared for 410 GONE in case
> the bv1 value had expired.
>
>    -- Sandro
>
>
>
Received on Wednesday, 19 February 2014 19:16:27 UTC