Fwd: Re: Can we use TCP backpressure instead of paging? from Andreas Kuckartz on 2014-06-10 (public-ldp@w3.org from June 2014)

From: Andreas Kuckartz <a.kuckartz@ping.de>
Date: 10 Jun 2014 19:34:04 +0200
To: public-ldp@w3.org
Message-ID: <5397418C.2040800@ping.de>
-------- Original Message --------
To: Sandro Hawke <sandro@w3.org>
CC: Linked Data Platform WG <public-ldp-wg@w3.org>

Thanks a lot for thinking outside the current box of paging. I have
looked at several paging approaches and do not really like any of them.
They contaminate the real data and/or seem to be unnecessarily complex
to implement.

Cheers,
Andreas
---

Sandro Hawke:
> Thinking about paging a little, I'm really wondering if one isn't better
> off using TCP backpressure instead of explicit paging.  It would have
> the huge advantage of requiring little or no special code in the client
> or the server, if they already implement high-performance streaming.   
> (I started thinking about this because as far as I can tell, if we want
> to allow LDP servers to initiate paging, we have to require every LDP
> client to understand paging.   That's a high bar.   If you want to
> respond to that particular point, please change the subject line!)
> 
> The key point here is that TCP already provides an elegant way to handle
> arbitrarily large data flows to arbitrarily small devices on poor
> connections.    If someone knows of a good simple explanation of this,
> please send along a pointer.   My knowledge is largely pre-web.
> 
> In web software we often tend to think of HTTP operations as atomic
> steps that take an arbitrarily long time.   With that model, doing a
> GET on a 100G resource is pretty much always a mistake.  But nothing in
> the web protocols requires thinking about it that way.   Instead, one
> can think of HTTP operations as opening streams which start data flowing.
> 
> In some situations, those streams will complete in a small number of
> milliseconds, and there was no advantage to thinking of it as a stream.
>   But once you hit human response time, it starts to make sense to be
> aware that there's a stream flowing.     If you're a client doing a GET,
> and it's taking more than maybe 0.5s, you can provide a better UX by
> displaying something for the user based on what you've gotten so far.
> 
> What's more, if the app only needs the first N results, it can read the
> stream until it gets N results, then .abort() the xhr.   The server may
> produce a few more results than were consumed before it knows about the
> .abort(), but I doubt that's too bad in most cases.
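A minimal Python sketch of the "read N results, then abort" idea above.
It assumes one result per line, which the original message does not
specify, and take_first is a hypothetical helper name. Any line-iterable
binary stream works, including an http.client.HTTPResponse; closing the
stream plays the role of xhr.abort().

```python
import io
from itertools import islice
from typing import BinaryIO, List


def take_first(stream: BinaryIO, n: int) -> List[bytes]:
    """Read at most n line-delimited results, then close the stream.

    Closing early tells the peer we are done, much like xhr.abort().
    """
    try:
        # islice stops pulling lines from the stream after n items.
        return [line.rstrip(b"\r\n") for line in islice(stream, n)]
    finally:
        stream.close()


if __name__ == "__main__":
    body = io.BytesIO(b"r1\nr2\nr3\nr4\n")  # stand-in for a response body
    print(take_first(body, 2))
```

With a real HTTP response, the server may already have written a few
more results into the connection before it notices the close.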
> 
> The case that's not handled well by current browsers is pausing the
> stream.   In theory, as I understand it (and I'm no expert), this can be
> done by simply using TCP flow control.   A non-browser app that stops
> reading data from its socket will exert backpressure, eventually
> resulting in the process writing data finding the stream's not ready for
> writing.   My sense is that can and does work rather well in a wide
> variety of situations.
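The flow-control behaviour described above can be observed on a loopback
TCP connection. This is a sketch under assumptions: the function names,
the 64 KiB chunk size and the 0.2 s settle time are illustrative choices,
and the exact number of bytes accepted before blocking depends on the
kernel's buffer sizes.

```python
import select
import socket


def fill_until_blocked(sender: socket.socket, settle: float = 0.2) -> int:
    """Write until the socket stays unwritable for `settle` seconds."""
    sender.setblocking(False)
    chunk = b"x" * 65536
    sent = 0
    while True:
        _, writable, _ = select.select([], [sender], [], settle)
        if not writable:
            return sent  # both kernel buffers are full: backpressure
        try:
            sent += sender.send(chunk)
        except BlockingIOError:
            pass  # raced with the buffer filling up; ask select again


def drain(receiver: socket.socket, nbytes: int) -> int:
    """Read up to nbytes from the receiver and return how many arrived."""
    remaining = nbytes
    while remaining > 0:
        data = receiver.recv(min(65536, remaining))
        if not data:
            break
        remaining -= len(data)
    return nbytes - remaining


def demo():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    sender = socket.create_connection(listener.getsockname())
    receiver, _ = listener.accept()
    receiver.settimeout(5)
    try:
        # The receiver reads nothing, so the writer eventually stalls.
        sent = fill_until_blocked(sender)
        blocked = not select.select([], [sender], [], 0)[1]
        # Once the receiver consumes data, the writer may proceed again.
        received = drain(receiver, sent)
        resumed = bool(select.select([], [sender], [], 5)[1])
        return sent, blocked, received, resumed
    finally:
        for sock in (sender, receiver, listener):
            sock.close()


if __name__ == "__main__":
    print(demo())
```

No application-level protocol is involved here: the writer learns that
it must wait purely from the socket no longer being ready for writing.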
> 
> Unfortunately, as I understand it, this doesn't work in WebApps today,
> because the browser will just keep reading and buffering until it runs
> out of VM.   If instead xhr (and websockets) had a limit on how much it
> would buffer, and webapps could set that (and probably it starts around
> 10M), then a WebApp that stopped consuming data would produce
> backpressure that would result in the server learning it can't send any
> more yet.     When the WebApp consumes more data, the server can start
> sending again.
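The proposed buffer limit could look roughly like the sketch below. The
class and its method names are hypothetical, not an existing browser or
library API; only the 10M default comes from the message. A network
loop would call recv() only while wants_more() is true, so an idle
consumer lets the kernel buffer fill and TCP flow control does the rest.

```python
from collections import deque


class BoundedReceiveBuffer:
    """Application-side buffer that refuses to grow past a limit."""

    def __init__(self, limit: int = 10 * 1024 * 1024) -> None:
        self.limit = limit
        self._chunks = deque()
        self._size = 0

    def wants_more(self) -> bool:
        # The network loop should stop reading the socket when this is
        # False; unread data then backs up into the TCP receive window.
        return self._size < self.limit

    def feed(self, data: bytes) -> None:
        """Store a chunk that was just read from the socket."""
        self._chunks.append(data)
        self._size += len(data)

    def consume(self) -> bytes:
        """Hand the oldest chunk to the application, freeing space."""
        if not self._chunks:
            return b""
        data = self._chunks.popleft()
        self._size -= len(data)
        return data


if __name__ == "__main__":
    buf = BoundedReceiveBuffer(limit=8)
    buf.feed(b"abcd")
    buf.feed(b"efgh")
    print(buf.wants_more())  # limit reached, stop reading the socket
    buf.consume()
    print(buf.wants_more())  # space freed, reading may resume
```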
> 
> I'm very curious if there's any technical reason this won't work.   I
> understand there may be problems with some existing software, including
> browsers, not handling this kind of streaming.  But is there anything in
> the basic internet protocols and implementations that makes this not
> work?     For instance, it may be that after blocking for a long time
> (minutes, waiting for the user to request more), restarting is too slow,
> or something like that.
> 
> One possible workaround for the lack of browser support would be for
> servers to be a bit smarter and make some guesses.  For example, a
> server might say that requests with User-Agent being any known browser
> should be served normally for the first 10s, then drop to a much slower
> speed, conserving resources in the server, the net, and the client.
> WebApps that want to sidestep this could do so with a Prefer header,
> like Prefer initial-full-speed-duration=1s or 1000s.    At some point,
> when browsers allow webapp backpressure, those browser User-Agent
> strings could be exempted from this slowdown.
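A sketch of the server-side guess described above. Everything concrete
here is an assumption made for illustration: the browser tokens, the
slow rate, and the function names. The Prefer token
initial-full-speed-duration is the hypothetical one floated in the
message, not a registered preference.

```python
import re
from typing import Optional

BROWSER_TOKENS = ("Mozilla/", "Opera/")  # assumed browser detection
DEFAULT_FULL_SPEED_S = 10.0              # "first 10s" from the message
SLOW_RATE = 16 * 1024                    # bytes/second, hypothetical

_PREFER_RE = re.compile(r"initial-full-speed-duration=(\d+(?:\.\d+)?)s\b")


def full_speed_window(prefer: Optional[str]) -> float:
    """Seconds of full-speed service, honouring the Prefer hint if any."""
    if prefer:
        match = _PREFER_RE.search(prefer)
        if match:
            return float(match.group(1))
    return DEFAULT_FULL_SPEED_S


def rate_limit(user_agent: str, elapsed_s: float,
               prefer: Optional[str] = None) -> Optional[int]:
    """Return a bytes/second cap, or None for no cap."""
    if not any(token in user_agent for token in BROWSER_TOKENS):
        return None  # non-browser clients can exert real backpressure
    if elapsed_s < full_speed_window(prefer):
        return None  # still inside the initial full-speed window
    return SLOW_RATE


if __name__ == "__main__":
    print(rate_limit("Mozilla/5.0 (X11)", 15.0))
    print(rate_limit("Mozilla/5.0 (X11)", 15.0,
                     "initial-full-speed-duration=1000s"))
```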
> 
>      -- Sandro
Received on Tuesday, 10 June 2014 17:34:31 UTC