lossless paging using HATEOAS from Sandro Hawke on 2014-02-19 (public-ldp-wg@w3.org from February 2014)

From: Sandro Hawke <sandro@w3.org>
Date: Wed, 19 Feb 2014 12:12:50 -0500
To: Linked Data Platform WG <public-ldp-wg@w3.org>
Message-ID: <5304E612.1010804@w3.org>
Here's an implementation technique servers can use to do lossless paging 
very cheaply (give or take whatever the underlying paging technology is).

In the NEXT and PREV links, include the boundary value.   That way the 
server doesn't need to remember anything; the state is all in the URL. 
  HATEOAS to the rescue.   (I'm sure I'm not the first to think of this....)

As an example for a directory with these names (people at the last 
meeting), in alphabetic order:

         Arnaud Le Hors
         Ashok Malhotra
         Cody Burleson
         Eric Prud'hommeaux
         Henry Story
         John Arwe
         Miguel Aragón
         Nandana Mihindukulasooriya
         Roger Menday
         Sandro Hawke
         Steve Speicher

The container offers these headers:

        Link: <.../my_container?page_first> rel=FIRST
        Link: <.../my_container?page_last> rel=LAST

If you GET that rel=FIRST one, you'll get this:

        Link: <.../my_container?page_after=Henry%20Story> rel=NEXT

        member & container triples for Arnaud Le Hors
        member & container triples for Ashok Malhotra
        member & container triples for Cody Burleson
        member & container triples for Eric Prud'hommeaux
        member & container triples for Henry Story

If you GET that rel=NEXT one, you'll get this:

        Link: <.../my_container?page_after=Sandro%20Hawke> rel=NEXT
        Link: <.../my_container?page_before=John%20Arwe> rel=PREV

        member & container triples for John Arwe
        member & container triples for Miguel Aragón
        member & container triples for Nandana Mihindukulasooriya
        member & container triples for Roger Menday
        member & container triples for Sandro Hawke

If you GET that rel=NEXT one, you'll get this:

        Link: <.../my_container?page_before=Steve%20Speicher> rel=PREV

        member & container triples for Steve Speicher

Meanwhile, if you decided to traverse backwards from the container's
rel=LAST one, you'd get this:

        Link: <.../my_container?page_before=Miguel%20Arag%C3%B3n> rel=PREV

        member & container triples for Miguel Aragón
        member & container triples for Nandana Mihindukulasooriya
        member & container triples for Roger Menday
        member & container triples for Sandro Hawke
        member & container triples for Steve Speicher

If you GET that rel=PREV one, you'd get:

        Link: <.../my_container?page_after=John%20Arwe> rel=NEXT
        Link: <.../my_container?page_before=Ashok%20Malhotra> rel=PREV

        member & container triples for Ashok Malhotra
        member & container triples for Cody Burleson
        member & container triples for Eric Prud'hommeaux
        member & container triples for Henry Story
        member & container triples for John Arwe

If you GET that rel=PREV one, you'd get:

        Link: <.../my_container?page_after=Arnaud%20Le%20Hors> rel=NEXT

        member & container triples for Arnaud Le Hors

See?  No state on the server, and lossless paging, with fully controlled 
page size.  Servers which are already saving paging state for some 
reason can do it some other way.

A few details:

The value in the page_after and page_before fields would be the value of 
the sort field, whatever it is.

If the composite of the sort fields is not guaranteed to be unique, or 
there is no sort field, the server should add another sort field, some 
kind of internal rowid, to make it unique.

If there are multiple sort fields, the server could mush them together, 
or use multiple page_before/page_after parameters in the URL.

If the data is particularly sensitive, the server might want to encrypt 
the fields.

As an middle-ground solution, with a little state, the server could hide 
the values and make the URLs a lot shorter by maintaining a cache of 
boundary values, like
    { "bv1": "Henry Story",
       "bv2": "John Arwe",
      ... }

then instead of
        Link: <.../my_container?page_after=Henry%20Story> rel=NEXT
it would do
        Link: <.../my_container?page_after_code=bv1> rel=NEXT

In this case, the client would need to be prepared for 410 GONE in case 
the bv1 value had expired.

    -- Sandro
Received on Wednesday, 19 February 2014 17:12:57 UTC