Re: Fwd: Re: Can we use TCP backpressure instead of paging?

On 06/13/2014 06:47 AM, Stian Soiland-Reyes wrote:
> On 11 Jun 2014 22:47, "Sandro Hawke" <sandro@w3.org> wrote:
>
> > You seem to have an idea of what paging is going to be used for.  I'm
> > at a disadvantage, since I don't.  (Seriously.)  What use cases do you
> > see for paging?  In particular, what use is reverse paging, and why
> > might a client want to page slowly?
>
> Paging is particularly useful for clients with small processing 
> capability and a requirement for short response time, such as mobile 
> apps and web applications. A typical example would be a "dynamic 
> scroll list" which loads its content on demand using REST API calls. 
> If the user never scrolls down to item 500, then there is no need to 
> fetch beyond item 100. So the user is in control of the scrolling - 
> there is no point in spending battery and limited network quotas on 
> fetching too much content in advance (but a good implementation would 
> prefetch at least the next screen-full of items to give a sense of 
> instant response).
>

Yes, I suppose that's the canonical answer.   I wonder a little how big a
scrollable list humans actually want, compared to the available
bandwidth.   I mean, I gather the typical web page is about 1MB, and 1MB
of human-readable text would take many hours to read.  So do you really
need to offer more text than that, dynamically available?
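
For concreteness, here's roughly how I picture the client side of that
pattern (a TypeScript sketch; the container URL and the rel="next" Link
header are my own assumptions, not anything the draft requires):

// Sketch of an on-demand "scroll list" client.  The UI calls
// loadNextPage() only when the user nears the bottom of the list, plus
// one prefetch so scrolling feels instant.
async function loadNextPage(pageUrl: string):
    Promise<{ body: string; next: string | null }> {
  const res = await fetch(pageUrl, { headers: { Accept: "text/turtle" } });
  const body = await res.text();                   // one screenful of members
  const link = res.headers.get("Link") ?? "";
  const m = link.match(/<([^>]+)>;\s*rel="next"/); // where the next page lives
  return { body, next: m ? m[1] : null };
}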

> Reverse paging could be needed because the user clicks the column
> header to view the list in the opposite order (e.g. oldest to newest
> vs. newest to oldest). A Twitter-like app, for instance, would mainly
> be listing tweets from now and backwards - it does not care about
> fetching the tweets from last week while the user was offline (unless
> she scrolls that far down into her stream). However, once that is
> listed, the app wants to check for updates in the future.
>

A lot of this comes down to data organization.    I'm somewhat dubious 
about servers organizing and ordering their LDP collections in the one 
true way that all their clients want.    If/when a client wants stuff in 
a different organization or a different ordering, all this paging 
machinery works against the client instead of for it.

Still, okay, maybe we'll solve that problem with a suitable LDP Module 
to come, some way for the client to express what organization and what 
ordering it wants -- and then paging will be useful.

>
> For the server, paging means a reduction in long-lived resource hogs
> (e.g. a massive page generation eating up memory and CPU), at the cost
> of potentially re-running the query on the database (with an
> appropriate LIMIT) once the next page request comes in. As any server
> can respond to any page (given a reasonably synchronized or shared
> database), administrators can scale out with more server instances
> under high demand, such as when deploying on the Amazon AWS cloud.
>
>

Yes, this independent scaling point is good.
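
To spell out what I take "any server can respond to any page" to mean,
here's a sketch (TypeScript; the handler, table, and LIMIT/OFFSET scheme
are hypothetical, just one way to do it):

// Stateless page handler: everything needed to build the page is in the
// request itself, so any instance behind the load balancer can answer
// it, at the cost of re-running a (LIMITed) query each time.
interface PageRequest { containerId: string; page: number; pageSize: number; }
interface Db { query(sql: string, params: unknown[]): Promise<string[]>; }

async function handlePage(req: PageRequest, db: Db) {
  const offset = req.page * req.pageSize;
  const members = await db.query(
    "SELECT member_uri FROM members WHERE container = ? " +
    "ORDER BY created LIMIT ? OFFSET ?",
    [req.containerId, req.pageSize, offset]);
  // A full page implies there may be another one after it.
  return {
    members,
    nextPage: members.length === req.pageSize ? req.page + 1 : null,
  };
}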

> > Actually, I see now that ambiguity is there with or without paging.
> > A naive server could be serving a normal GET on an LDP container when
> > a PATCH, PUT, or POST comes in and changes what triples are in the
> > container.  Does that change get reflected in the remaining parts of
> > the GET?  Maybe "No", but that's a fairly expensive answer.
>
> I think this ambiguity has largely been ignored by the REST community,
> or has not been a big issue, because pagination often goes from oldest
> to newest - the question is then just whether the new item appears on
> the last page (as a new item or as a 'next page' link) or not.
>

I think you missed my point there.   Isolation between readers and
writers is an issue even without paging.   With a single-page resource,
we still have exactly the same question.   Servers will need to be
serializing graphs to answer GET requests while PATCH, PUT, and POST
requests are changing the same graphs.   Does the server need to isolate
the two from each other?   Probably, but that can be quite expensive, and
I don't know of a spec that even says servers SHOULD do that.
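
To make the choice concrete (TypeScript sketch, names invented, nothing
from any spec): the cheap option serializes the live data and tolerates a
mixed-up result; the isolated option copies the triples first, which is
where the expense comes in.

type Triple = [string, string, string];

class Container {
  private triples: Triple[] = [];

  add(t: Triple) { this.triples.push(t); }   // the PATCH/PUT/POST path

  // Option 1: serialize the live array.  A concurrent add() may or may
  // not show up in the output - i.e. no isolation at all.
  *serializeLive(): IterableIterator<string> {
    for (const [s, p, o] of this.triples) yield `${s} ${p} ${o} .`;
  }

  // Option 2: snapshot first (copying under whatever locking the store
  // offers), then serialize the copy.  The reader gets a consistent
  // point in time, at the cost of the copy.
  serializeSnapshot(): string[] {
    return [...this.triples].map(([s, p, o]) => `${s} ${p} ${o} .`);
  }
}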


> Done as "proper REST", a pagination resource would have a unique ID 
> corresponding to a static backend response, which as you suggest would 
> eventually time out with 410 Gone. This is however not usually 
> practically implemented. It is probably due to the common 
> implementations using SQL or NoSQL databases in the backend, but 
> hidden through some abstraction layers. It is then difficult to write 
> an representation-generating implementation that uses some form of 
> saved snapshot of the whole database, because that would have to be 
> implemented either at a very low level (e.g. SQL transaction) or by 
> generating the whole unpaged resource representation and store it 
> temporarily in some kind of temporary file.
>
>

Yep.
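
Just to sketch what that "proper REST" shape would look like (TypeScript;
the snapshot store, TTL, and URI scheme are all hypothetical): generate
the whole representation once, hand out page URIs against that frozen
snapshot, and let it expire to 410 Gone.

import { randomUUID } from "crypto";

// Snapshot-backed paging: the first request materializes the full member
// list, pages are slices of that snapshot, and once the snapshot expires
// its page URIs answer 410 Gone.
const snapshots = new Map<string, { members: string[]; expires: number }>();

function createSnapshot(members: string[], ttlMs = 10 * 60 * 1000): string {
  const id = randomUUID();
  snapshots.set(id, { members, expires: Date.now() + ttlMs });
  return id;                       // embedded in the page URIs handed out
}

function getPage(id: string, page: number, pageSize: number):
    { status: number; body?: string[] } {
  const snap = snapshots.get(id);
  if (!snap || Date.now() > snap.expires) {
    snapshots.delete(id);
    return { status: 410 };        // 410 Gone, as discussed above
  }
  return {
    status: 200,
    body: snap.members.slice(page * pageSize, (page + 1) * pageSize),
  };
}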

> >> and so could be served by multiple backend servers (including
> >> caches and proxies) without the client noticing.
>
> > Again, I'm needing a use case where that would be useful.
>
> Any of the big "web-scale" sites use this kind of deployment: 
> Facebook, Twitter, Google (incl. Gmail), Github:
>

No, I understand the use case for scaling -- it was the use case for 
paging I was missing.

> stain@biggie-mint ~ $ dig google.com
>
> ; <<>> DiG 9.9.3-rpz2+rl.13214.22-P2-Ubuntu-1:9.9.3.dfsg.P2-4ubuntu1.1 <<>> google.com
> ;; global options: +cmd
> ;; Got answer:
> ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 9221
> ;; flags: qr rd ra; QUERY: 1, ANSWER: 16, AUTHORITY: 4, ADDITIONAL: 5
>
> ;; OPT PSEUDOSECTION:
> ; EDNS: version: 0, flags:; udp: 4096
> ;; QUESTION SECTION:
> ;google.com.            IN    A
>
> ;; ANSWER SECTION:
> google.com.     28    IN    A    64.15.119.118
> google.com.     28    IN    A    64.15.119.119
> google.com.     28    IN    A    64.15.119.123
> google.com.     28    IN    A    64.15.119.84
> google.com.     28    IN    A    64.15.119.88
> google.com.     28    IN    A    64.15.119.89
> google.com.     28    IN    A    64.15.119.93
> google.com.     28    IN    A    64.15.119.94
> google.com.     28    IN    A    64.15.119.98
> google.com.     28    IN    A    64.15.119.99
> google.com.     28    IN    A    64.15.119.103
> google.com.     28    IN    A    64.15.119.104
> google.com.     28    IN    A    64.15.119.108
> google.com.     28    IN    A    64.15.119.109
> google.com.     28    IN    A    64.15.119.113
> google.com.     28    IN    A    64.15.119.114
>
> 28 seconds later, it is 64.15.119.98 that is on top and that would 
> handle my "Next" click. Any of the servers can be taken offline after 
> 255 seconds without anyone noticing.
>
> Other deployments may do this through routing, transparent proxies, etc.
>
> In conclusion I would say that, given the way servers are now deployed
> (relying on the stateless nature of HTTP) and the way software
> frameworks are being used (abstracting away socket operations),
> TCP-based back-pressure does not really solve anything that paging is
> already being used for in REST APIs around the world. It only solves
> being able to download/store (and possibly process directly, e.g. with
> Jena's RIOT streaming APIs) a large data stream. And for that it works
> fine; there is not really anything special that needs to be done,
> except not loading the whole stream into memory.
>
>

Well, what needs to be done there is that browsers need a way to process
the stream (1) without loading the whole thing into memory, as you say,
and (2) without even buffering very much of it in memory, instead
exerting backpressure when the consuming app can't keep up with the flow
rate.
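
Concretely, I have in mind something like the pull-based shape below
(TypeScript sketch using the fetch/Streams API; the slow consumer is
stand-in code): the page only asks for the next chunk when it has dealt
with the previous one, so, as I understand it, the unread bytes sit in
the TCP window rather than in an ever-growing in-memory buffer.

// Consume a large response with backpressure: no read(), no fill, so
// flow control falls back on TCP instead of an unbounded buffer.
async function consumeSlowly(
    url: string,
    handleChunk: (chunk: Uint8Array) => Promise<void>) {
  const res = await fetch(url);
  if (!res.body) throw new Error("streaming response body not available");
  const reader = res.body.getReader();
  for (;;) {
    const result = await reader.read();   // pull the next chunk on demand
    if (result.done) break;
    await handleChunk(result.value);      // a slow consumer slows the pulls
  }
}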

But, yeah, I guess that's probably not good enough to make LDP paging 
unnecessary.

        -- Sandro

Received on Friday, 13 June 2014 14:08:05 UTC