- From: Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>
- Date: Fri, 13 Jun 2014 11:47:32 +0100
- To: Sandro Hawke <sandro@w3.org>
- Cc: Andreas Kuckartz <a.kuckartz@ping.de>, LDP <public-ldp@w3.org>
- Message-ID: <CAPRnXtmPjEqTqt7no_Uk_oadbc22zOVLhUPzsOKyvODhDtcsDw@mail.gmail.com>
On 11 Jun 2014 22:47, "Sandro Hawke" <sandro@w3.org> wrote:

> You seem to have some idea what paging is going to be used for. I'm at a
> disadvantage, since I don't. (Seriously.) What use cases do you see for
> paging? In particular, what use is reverse paging, and why might a client
> want to page slowly?

Paging is particularly useful for clients with limited processing capability and a requirement for short response times, such as mobile apps and web applications. A typical example would be a "dynamic scroll list" which loads its content on demand using REST API calls. If the user never scrolls down to item 500, then there is no need to fetch beyond item 100. The user is in control of the scrolling - there is no point in spending battery and a limited network quota on fetching too much content in advance (though a good implementation would prefetch at least the next screenful of items to give a sense of instant response).

Reverse paging could be needed because the user clicks a column header to view the list in the opposite order (e.g. oldest to newest vs. newest to oldest). A Twitter-like app would for instance mainly list tweets from now and backwards - it does not care about fetching the tweets from last week while the user was offline (unless she scrolls that far down into her stream). However, once that is listed, the app will want to check for newer updates in the future.

For the server, paging means a reduction in long-lived resource hogs (e.g. a massive page generation eating memory and CPU), at the cost of potentially rerunning the query on the database (with an appropriate LIMIT) when the next page request comes in. As any server can respond to any page (given a reasonably synchronized or shared database), administrators can scale out with more server instances under high demand, such as when deploying on the Amazon AWS cloud.
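To make the client side concrete, here is a minimal sketch in Java, assuming the server advertises the next page through an HTTP Link header with rel="next" (as in the LDP Paging drafts); the container URL and the fixed "item budget" are made up for illustration:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;

    // Sketch: fetch one page at a time, stopping as soon as the client
    // (here: a fixed item budget standing in for the user's scrolling)
    // no longer needs more content.
    public class PagedFetch {
        public static void main(String[] args) throws Exception {
            // Hypothetical first-page URL; a real server would advertise it.
            String next = "http://example.org/container?page=1";
            int itemBudget = 100;  // e.g. what fits on the first few screens

            while (next != null && itemBudget > 0) {
                HttpURLConnection conn =
                        (HttpURLConnection) new URL(next).openConnection();
                conn.setRequestProperty("Accept", "text/turtle");
                try (BufferedReader in = new BufferedReader(
                        new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
                    String line;
                    while ((line = in.readLine()) != null && itemBudget > 0) {
                        System.out.println(line);  // hand to the UI / RDF parser
                        itemBudget--;              // crude stand-in for "items shown"
                    }
                }
                next = nextLink(conn.getHeaderField("Link"));
            }
        }

        // Naive Link-header parsing: good enough for a sketch only.
        static String nextLink(String linkHeader) {
            if (linkHeader == null) return null;
            for (String part : linkHeader.split(",")) {
                if (part.contains("rel=\"next\"")) {
                    int lt = part.indexOf('<'), gt = part.indexOf('>');
                    if (lt >= 0 && gt > lt) return part.substring(lt + 1, gt);
                }
            }
            return null;
        }
    }

The loop simply stops requesting pages once it has enough; nothing beyond that is fetched, which is the battery/quota saving described above.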
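On the server side, the point about rerunning the query per page could look roughly like this with an SQL backend; the table and column names (members, uri, added) and the JDBC URL are hypothetical:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    // Sketch of a per-page handler: each page request reruns the query
    // with LIMIT/OFFSET instead of holding the whole result in memory.
    // A stable ORDER BY is what keeps pages consistent between requests.
    public class PageQuery {
        static final int PAGE_SIZE = 100;

        public static void main(String[] args) throws Exception {
            int page = Integer.parseInt(args.length > 0 ? args[0] : "0");
            try (Connection db = DriverManager.getConnection("jdbc:postgresql:ldp");
                 PreparedStatement q = db.prepareStatement(
                     "SELECT uri FROM members ORDER BY added LIMIT ? OFFSET ?")) {
                q.setInt(1, PAGE_SIZE);
                q.setInt(2, page * PAGE_SIZE);
                try (ResultSet rs = q.executeQuery()) {
                    while (rs.next()) {
                        System.out.println(rs.getString("uri"));
                    }
                }
            }
            // Because no state survives between requests, any server
            // instance behind the load balancer can answer the next page.
        }
    }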
> Actually, I see now that ambiguity is there with or without paging. A
> naive server could be serving a normal GET on an LDP container when a
> PATCH, PUT, or POST comes in and changes what triples are in the
> container. Does that change get reflected in the remaining parts of the
> GET? Maybe "No", but that's a fairly expensive answer.

I think this ambiguity has largely been ignored by the REST community, or has not been a big issue because pagination is often from oldest to newest - the question is then just whether the new item appears on the last page (as a new item or as a 'next page' link) or not.

Done as "proper REST", a pagination resource would have a unique ID corresponding to a static backend response, which as you suggest would eventually time out with 410 Gone. In practice, however, this is rarely implemented, probably because the common implementations use SQL or NoSQL databases in the backend, hidden behind abstraction layers. It is then difficult to write a representation-generating implementation that uses some form of saved snapshot of the whole database, because that would have to be implemented either at a very low level (e.g. an SQL transaction) or by generating the whole unpaged resource representation and storing it in some kind of temporary file.

>> and so could be served by multiple backend servers (including caches
>> and proxies) without the client noticing.
>
> Again, I'm needing a use case where that would be useful.

All of the big "web-scale" sites use this kind of deployment: Facebook, Twitter, Google (incl. Gmail), GitHub:

    stain@biggie-mint ~ $ dig google.com

    ; <<>> DiG 9.9.3-rpz2+rl.13214.22-P2-Ubuntu-1:9.9.3.dfsg.P2-4ubuntu1.1 <<>> google.com
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 9221
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 16, AUTHORITY: 4, ADDITIONAL: 5

    ;; OPT PSEUDOSECTION:
    ; EDNS: version: 0, flags:; udp: 4096
    ;; QUESTION SECTION:
    ;google.com.        IN  A

    ;; ANSWER SECTION:
    google.com.   28  IN  A  64.15.119.118
    google.com.   28  IN  A  64.15.119.119
    google.com.   28  IN  A  64.15.119.123
    google.com.   28  IN  A  64.15.119.84
    google.com.   28  IN  A  64.15.119.88
    google.com.   28  IN  A  64.15.119.89
    google.com.   28  IN  A  64.15.119.93
    google.com.   28  IN  A  64.15.119.94
    google.com.   28  IN  A  64.15.119.98
    google.com.   28  IN  A  64.15.119.99
    google.com.   28  IN  A  64.15.119.103
    google.com.   28  IN  A  64.15.119.104
    google.com.   28  IN  A  64.15.119.108
    google.com.   28  IN  A  64.15.119.109
    google.com.   28  IN  A  64.15.119.113
    google.com.   28  IN  A  64.15.119.114

28 seconds later, it is 64.15.119.98 that is on top and that would handle my "Next" click. Any of the servers can be taken offline after 255 seconds without anyone noticing. Other deployments may do this through routing, transparent proxies, etc.

In conclusion I would say that given the way servers are now deployed (which relies on the stateless nature of HTTP), and the way software frameworks are being used (which abstract away socket operations), a TCP-based back-pressure does not really solve anything that paging is already being used for in REST APIs around the world. It only solves being able to download/store (and possibly directly process, e.g. with Jena's RIOT streaming APIs) a large data stream. And for that it works fine, and there is not really anything special that needs to be done, except not loading the whole stream into memory.
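For that last point, a minimal sketch of streaming consumption with Jena RIOT (using current Apache Jena package names; the URL is hypothetical) - each triple is handled as it arrives, so the stream is never held in memory as a whole Model:

    import org.apache.jena.graph.Triple;
    import org.apache.jena.riot.RDFDataMgr;
    import org.apache.jena.riot.system.StreamRDFBase;

    // Sketch: consume a large RDF stream with Jena's RIOT streaming API,
    // processing each triple on arrival instead of loading a whole Model.
    public class StreamCount {
        public static void main(String[] args) {
            final long[] count = {0};
            // Hypothetical URL of a large RDF document.
            RDFDataMgr.parse(new StreamRDFBase() {
                @Override
                public void triple(Triple triple) {
                    count[0]++;  // or filter, index, forward to the UI, ...
                }
            }, "http://example.org/container");
            System.out.println(count[0] + " triples");
        }
    }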
Received on Friday, 13 June 2014 10:48:20 UTC