Re: Fwd: Re: Can we use TCP backpressure instead of paging? from Stian Soiland-Reyes on 2014-06-13 (public-ldp@w3.org from June 2014)

From: Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>
Date: Fri, 13 Jun 2014 11:47:32 +0100
To: Sandro Hawke <sandro@w3.org>
Cc: Andreas Kuckartz <a.kuckartz@ping.de>, LDP <public-ldp@w3.org>
Message-ID: <CAPRnXtmPjEqTqt7no_Uk_oadbc22zOVLhUPzsOKyvODhDtcsDw@mail.gmail.com>
On 11 Jun 2014 22:47, "Sandro Hawke" <sandro@w3.org> wrote:

> You seem to have an idea what paging is going to be used for.  I'm at a
> disadvantage, since I don't.  (Seriously.)  What use cases do you see for
> paging?  In particular, what use is reverse paging, and why might a client
> want to page slowly?

Paging is particularly useful for clients with small processing capability
and a requirement for short response time, such as mobile apps and web
applications. A typical example would be a "dynamic scroll list" which
loads its content on demand using REST API calls. If the user never scrolls
down to item 500, then there is no need to fetch beyond item 100. So the
user is in control of the scrolling - there is no point in spending battery
and limited network quotas on fetching too much content in advance (but a
good implementation would prefetch at least the next screen-full of items
to give a sense of instant response).
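
As a rough sketch of such a "dynamic scroll list" (not taken from any real
app): the class name LazyScrollList, the fetch_page callback and the
fake_server function below are all made up for illustration, with
fake_server standing in for a paged REST call. The list only requests a
page when the visible window plus one prefetched screen-full needs it.

```python
from typing import Callable, List


class LazyScrollList:
    """Hypothetical client-side list that fetches pages only on demand."""

    def __init__(self, fetch_page: Callable[[int, int], List[str]],
                 page_size: int = 100) -> None:
        # fetch_page is a stand-in for a REST call: (offset, limit) -> items
        self.fetch_page = fetch_page
        self.page_size = page_size
        self.items: List[str] = []
        self.exhausted = False
        self.requests = 0  # how many page requests have been made so far

    def _load_next(self) -> None:
        page = self.fetch_page(len(self.items), self.page_size)
        self.requests += 1
        self.items.extend(page)
        if len(page) < self.page_size:
            self.exhausted = True  # a short page means there is nothing more

    def visible(self, first: int, count: int) -> List[str]:
        """Return items [first, first + count), prefetching one more screen."""
        needed = first + 2 * count
        while len(self.items) < needed and not self.exhausted:
            self._load_next()
        return self.items[first:first + count]


def fake_server(offset: int, limit: int) -> List[str]:
    # Assumed data set of 1000 items; a real client would do an HTTP GET here.
    data = [f"item {i}" for i in range(1000)]
    return data[offset:offset + limit]


scroll = LazyScrollList(fake_server, page_size=100)
first_screen = scroll.visible(0, 20)
```

With a page size of 100, showing the first 20 items costs a single request;
a second request only happens once the user scrolls close to item 100.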

Reverse paging could be needed because the user clicks the column header to
view the list in the opposite order (e.g. oldest to newest vs. newest to
oldest). A Twitter-like app would, for instance, mainly be listing tweets
from now and backwards - it does not care about fetching the tweets from
last week while the user was offline (unless she scrolls that far down into
her stream). However, once that list is shown, the app also wants to check
for newer tweets arriving after that point.


For the server, paging means a reduction in long-living resource hogs (e.g.
a massive page generation eating away memory and CPU), at the cost of
potentially rerunning the query on the database (with appropriate LIMIT)
once the next page request comes in. As any server can respond to any page
(given a reasonably synchronized or shared database), administrators can
scale out with more server instances on high demand, such as when deploying
on the Amazon AWS cloud.
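
A minimal sketch of that stateless style, assuming a made-up "items" table
and a hypothetical get_page helper (neither comes from LDP or any real
deployment). Each request reruns the query with LIMIT/OFFSET, so nothing is
held between requests and any server sharing the database could answer any
page.

```python
import sqlite3


def get_page(conn: sqlite3.Connection, page: int, page_size: int = 3) -> dict:
    """Answer one page request by rerunning the query; no per-client state."""
    rows = conn.execute(
        "SELECT id, title FROM items ORDER BY id LIMIT ? OFFSET ?",
        # Ask for one extra row to learn whether a next page exists.
        (page_size + 1, page * page_size),
    ).fetchall()
    has_next = len(rows) > page_size
    return {"items": rows[:page_size], "next": page + 1 if has_next else None}


# In-memory database standing in for the shared backend database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO items (title) VALUES (?)",
    [(f"entry {i}",) for i in range(1, 8)],
)

page0 = get_page(conn, 0)
page2 = get_page(conn, 2)
```

The trade-off is the one described above: the query is executed again for
every page, but no long-lived generation job sits in memory.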


> Actually, I see now that ambiguity is there with or without paging.  A
> naive server could be serving a normal GET on an LDP container when a
> PATCH, PUT, or POST comes in and changes what triples are in the
> container.  Does that change get reflected in the remaining parts of the
> GET?  Maybe "No", but that's a fairly expensive answer.

I think this ambiguity has been largely ignored by the REST community, or
has not been a big issue because pagination often goes from oldest to
newest - the question is then just whether the new item appears on the last
page (as a new item or as a 'next page' link) or not.

Done as "proper REST", a pagination resource would have a unique ID
corresponding to a static backend response, which as you suggest would
eventually time out with 410 Gone. This is, however, rarely implemented in
practice, probably because common implementations use SQL or NoSQL
databases in the backend, hidden behind some abstraction layers. It is then
difficult to write a representation-generating implementation that uses
some form of saved snapshot of the whole database, because that would have
to be implemented either at a very low level (e.g. an SQL transaction) or
by generating the whole unpaged resource representation and storing it in
some kind of temporary file.
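
To illustrate the second option (materialise the whole result once, then
page over the stored copy), here is a sketch that keeps snapshots in memory
rather than in a temporary file. SnapshotStore, its methods and the
injectable clock are all assumptions made up for this example; only the
404 and 410 status codes are standard HTTP.

```python
import time
import uuid
from typing import Callable, Dict, List, Set, Tuple


class SnapshotStore:
    """Hypothetical store of frozen query results, addressed by a unique ID."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be shown without waiting
        self._snapshots: Dict[str, Tuple[float, List[str]]] = {}
        self._expired: Set[str] = set()

    def create(self, rows: List[str]) -> str:
        """Copy the full result and return the ID of the new snapshot."""
        snapshot_id = uuid.uuid4().hex
        self._snapshots[snapshot_id] = (self.clock(), list(rows))
        return snapshot_id

    def get_page(self, snapshot_id: str, page: int,
                 page_size: int) -> Tuple[int, List[str]]:
        """Return (HTTP-style status, items) for one page of a snapshot."""
        if snapshot_id in self._expired:
            return 410, []  # Gone: the snapshot existed but has timed out
        entry = self._snapshots.get(snapshot_id)
        if entry is None:
            return 404, []  # never existed
        created_at, rows = entry
        if self.clock() - created_at > self.ttl:
            del self._snapshots[snapshot_id]
            self._expired.add(snapshot_id)
            return 410, []
        start = page * page_size
        return 200, rows[start:start + page_size]


now = [0.0]  # fake clock value, advanced by hand below
store = SnapshotStore(ttl_seconds=60, clock=lambda: now[0])

live = ["a", "b", "c", "d", "e"]
sid = store.create(live)
live.append("f")  # a later POST changes the live data, not the snapshot

status_fresh, items_fresh = store.get_page(sid, 2, 2)
now[0] = 61.0  # pretend 61 seconds have passed
status_late, items_late = store.get_page(sid, 0, 2)
```

The cost is visible in create(): the whole unpaged result is copied up front
and kept per client until it expires.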


>> and so could be served by multiple backend servers (including caches
>> and proxies) without the client noticing.

> Again, I'm needing a use case where that would be useful.

Any of the big "web-scale" sites use this kind of deployment: Facebook,
Twitter, Google (incl. Gmail), Github:

stain@biggie-mint ~ $ dig google.com

; <<>> DiG 9.9.3-rpz2+rl.13214.22-P2-Ubuntu-1:9.9.3.dfsg.P2-4ubuntu1.1 <<>>
google.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 9221
;; flags: qr rd ra; QUERY: 1, ANSWER: 16, AUTHORITY: 4, ADDITIONAL: 5

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;google.com.            IN    A

;; ANSWER SECTION:
google.com.        28    IN    A    64.15.119.118
google.com.        28    IN    A    64.15.119.119
google.com.        28    IN    A    64.15.119.123
google.com.        28    IN    A    64.15.119.84
google.com.        28    IN    A    64.15.119.88
google.com.        28    IN    A    64.15.119.89
google.com.        28    IN    A    64.15.119.93
google.com.        28    IN    A    64.15.119.94
google.com.        28    IN    A    64.15.119.98
google.com.        28    IN    A    64.15.119.99
google.com.        28    IN    A    64.15.119.103
google.com.        28    IN    A    64.15.119.104
google.com.        28    IN    A    64.15.119.108
google.com.        28    IN    A    64.15.119.109
google.com.        28    IN    A    64.15.119.113
google.com.        28    IN    A    64.15.119.114

28 seconds later, it is 64.15.119.98 that is on top and that would handle
my "Next" click. Any of the servers can be taken offline after 255 seconds
without anyone noticing.

Other deployments may do this through routing, transparent proxies, etc.

In conclusion I would say that, given the way servers are now deployed
(relying on the stateless nature of HTTP) and the way software frameworks
are being used (abstracting away socket operations), TCP-based
back-pressure does not really solve any of the problems that paging is
already being used for in REST APIs around the world. It only solves being
able to download/store (and possibly directly process, e.g. with Jena's
RIOT streaming APIs) a large data stream. And for that it works fine, and
there is not really anything special that needs to be done except not
loading the whole stream into memory.
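
A small sketch of "not loading the whole stream into memory", in Python
rather than Jena: iter_lines and count_statements are hypothetical helpers,
and io.BytesIO stands in for a response body. The data is consumed in
fixed-size chunks, so only one chunk plus one partial line is buffered at a
time. The line-per-statement assumption fits N-Triples-like input only.

```python
import io
from typing import BinaryIO, Iterator


def iter_lines(stream: BinaryIO, chunk_size: int = 8192) -> Iterator[str]:
    """Yield decoded lines while buffering at most one chunk plus a partial line."""
    buffer = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buffer += chunk
        # Everything before the last newline is complete; keep the remainder.
        *lines, buffer = buffer.split(b"\n")
        for line in lines:
            yield line.decode("utf-8")
    if buffer:
        yield buffer.decode("utf-8")  # final line without a trailing newline


def count_statements(stream: BinaryIO, chunk_size: int = 8192) -> int:
    """Count non-empty, non-comment lines (one statement per line assumed)."""
    return sum(
        1
        for line in iter_lines(stream, chunk_size)
        if line.strip() and not line.startswith("#")
    )


# Assumed sample data; a real client would pass the HTTP response body here.
data = (
    b'<http://example.org/a> <http://example.org/p> "1" .\n'
    b"# a comment\n"
    b'<http://example.org/b> <http://example.org/p> "2" .\n'
    b'<http://example.org/c> <http://example.org/p> "3" .'
)
total = count_statements(io.BytesIO(data), chunk_size=16)
```

When the stream is a real socket, a consumer that reads this way only pulls
more data as it finishes processing, which is where TCP flow control slows
the sender down.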
Received on Friday, 13 June 2014 10:48:20 UTC