Re: Fwd: Re: Can we use TCP backpressure instead of paging? from Sandro Hawke on 2014-06-11 (public-ldp@w3.org from June 2014)

From: Sandro Hawke <sandro@w3.org>
Date: Wed, 11 Jun 2014 17:46:36 -0400
To: Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>
CC: Andreas Kuckartz <a.kuckartz@ping.de>, LDP <public-ldp@w3.org>
Message-ID: <5398CE3C.8020309@w3.org>
On 06/11/2014 02:41 AM, Stian Soiland-Reyes wrote:
>
> Simple principles that seem reasonable, using existing infrastructure.
>
> However it would come at a large cost for the server to maintain a 
> large number of open TCP sockets. Paging has the advantage of allowing 
> a series of smaller, independent requests for different subsets of the 
> resource
>

You seem to have an idea what paging is going to be used for.   I'm at 
a disadvantage, since I don't.   (Seriously.)  What use cases do you see 
for paging?   In particular, what use is reverse paging, and why might a 
client want to page slowly?

> (and as we've discussed at the cost of ambiguous results on collection 
> modifications),
>


Actually, I see now that ambiguity is there with or without paging.    A 
naive server could be serving a normal GET on an LDP container when a 
PATCH, PUT, or POST comes in and changes what triples are in the 
container.  Does that change get reflected in the remaining parts of the 
GET?  Maybe "No", but that's a fairly expensive answer.
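A minimal sketch of that trade-off, assuming the container is just an in-memory list of triples. The names stream_live and stream_snapshot are made up for illustration and do not come from any LDP implementation:

```python
from typing import Iterator, List


def stream_live(triples: List[str]) -> Iterator[str]:
    """Serve straight from the live container (the "naive server").

    Cheap, but a write that lands mid-response leaks into the
    remainder of the GET.
    """
    i = 0
    while i < len(triples):
        yield triples[i]
        i += 1


def stream_snapshot(triples: List[str]) -> Iterator[str]:
    """Serve from a copy taken when the GET starts (the "No" answer).

    The copy is made eagerly, before any data is sent, so later
    writes are invisible to this response. The price is O(n) extra
    memory per in-flight request.
    """
    snapshot = tuple(triples)
    return iter(snapshot)
```

If a POST appends a triple after the first one has been sent, the remainder of stream_live includes the new triple, while the remainder of stream_snapshot does not. The copy is what makes "No" expensive for a very large container.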

> and so could be served by multiple backend servers (including caches 
> and proxies) without the client noticing.
>

Again, I'm needing a use case where that would be useful.

> A TCP stream would also fall apart as a mobile client moves from 3G to 
> WiFi, with no good way to "continue". (Or would you combine this with 
> byte level HTTP ranges?)
>

Switching networks is a problem for everyone, and I suppose HTTP Range 
is the way folks handle it, if they're going to at all, so why should we 
be any different?

Skimming http://tools.ietf.org/html/rfc7233 "Hypertext Transfer Protocol 
(HTTP/1.1): Range Requests" hot off the presses...

There's a lot to be said for using Range Requests instead of paging, 
too, even if it's just byte ranges.
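As a rough sketch of what resuming with byte ranges might look like on the client side: the helper names below are hypothetical, and only the Range, If-Range and Content-Range header syntax is taken from RFC 7233. No network I/O is done here; the functions only build and parse header values.

```python
import re
from typing import Dict, Optional, Tuple

# Matches a satisfied byte-range response, e.g. "bytes 1000-1999/5000"
# or "bytes 1000-1999/*" when the total length is unknown.
_CONTENT_RANGE = re.compile(r"bytes (\d+)-(\d+)/(\d+|\*)")


def resume_headers(bytes_received: int,
                   etag: Optional[str] = None) -> Dict[str, str]:
    """Build request headers asking for everything after what we have.

    If an ETag from the interrupted response is supplied, it is sent as
    If-Range so that a server whose representation has changed returns
    the whole thing instead of a mismatched tail.
    """
    headers = {"Range": f"bytes={bytes_received}-"}
    if etag is not None:
        headers["If-Range"] = etag
    return headers


def parse_content_range(value: str) -> Tuple[int, int, Optional[int]]:
    """Parse a Content-Range value into (first, last, total or None)."""
    match = _CONTENT_RANGE.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"unsupported Content-Range: {value!r}")
    first, last = int(match.group(1)), int(match.group(2))
    total = None if match.group(3) == "*" else int(match.group(3))
    return first, last, total
```

A client that lost its connection after 1000 bytes would send resume_headers(1000, etag) and check that the 206 response's Content-Range starts at byte 1000 before appending the new data.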

> In fact, this technique is now used for streaming video, by basically 
> having a playlist of a series of static, smaller video snippets which 
> are distributed on Content Delivery Networks using regular HTTP GET 
> (and so can be cached, proxied, etc) and joined together in the client.
>
> http://en.m.wikipedia.org/wiki/HTTP_Live_Streaming
> http://en.wikipedia.org/wiki/MPEG-DASH
>

Yeah, but in that case, the user really wants to skip around.   I see 
zero value in skipping around in a randomly ordered set of triples.  
There's some value in skipping around when the set is ordered, but I 
think the current ordering support is so weak as to be not worth 
implementing.   (In particular, it can only be determined by the server.)

> If you were to take this analogy into LDP, then one could request for 
> a paging content type, and get back a list of IRIs for pages which can 
> then be requested at the client's pace and choice.
>

Funny how much that sounds like an LDP container.   We have paging 
because THAT LIST might be very big.

> If a server insists on paging (collection is massive), but the client 
> can't Accept it, then it is a simple case of HTTP 406 Not Acceptable, 
> which should be more reasonable than giving only half the collection,  
> without the client understanding that there is paging.
>

My sense of the world is that if clients will break when the server uses 
a new protocol, and anyone important uses a client, then the server will 
never get away with adopting a new protocol. Extensibility in 
decentralized systems like the Web requires graceful fallback when the 
other party doesn't implement the new feature.

         -- Sandro


> On 10 Jun 2014 19:05, "Sandro Hawke" <sandro@w3.org> wrote:
>
>     On 06/10/2014 01:34 PM, Andreas Kuckartz wrote:
>
>         -------- Original Message --------
>         To: Sandro Hawke <sandro@w3.org>
>         CC: Linked Data Platform WG <public-ldp-wg@w3.org>
>
>         Thanks a lot for thinking outside the current box of paging. I
>         have looked at several paging approaches and do not really like
>         any of them. They contaminate the real data and/or seem to be
>         unnecessarily complex to implement.
>
>
>     You're welcome. I did a little more investigation today,
>     including writing a tiny node.js server that streams data and lets
>     me see what happens when I do things on the client. This little
>     bit of testing was encouraging.
>
>     I also spoke to a few people at lunch here at the W3C AC meeting
>     and again no one saw any serious problem.
>
>     No clue yet how likely it is that browser vendors might implement
>     an extension to xhr to allow WebApps to use this properly.
>
>             -- Sandro
>
>         Cheers,
>         Andreas
>         ---
>
>         Sandro Hawke:
>
>             Thinking about paging a little, I'm really wondering if one
>             isn't better off using TCP backpressure instead of explicit
>             paging. It would have the huge advantage of requiring
>             little or no special code in the client or the server, if
>             they already implement high-performance streaming. (I
>             started thinking about this because as far as I can tell,
>             if we want to allow LDP servers to initiate paging, we have
>             to require every LDP client to understand paging. That's a
>             high bar. If you want to respond to that particular point,
>             please change the subject line!)
>
>             The key point here is that TCP already provides an elegant
>             way to handle arbitrarily large data flows to arbitrarily
>             small devices on poor connections. If someone knows of a
>             good simple explanation of this, please send along a
>             pointer. My knowledge is largely pre-web.
>
>             In web software we often think of HTTP operations as atomic
>             steps that take an arbitrarily long time. With that model,
>             doing a GET on a 100G resource is pretty much always a
>             mistake. But nothing in the web protocols requires thinking
>             about it that way. Instead, one can think of HTTP
>             operations as opening streams which start data flowing.
>
>             In some situations, those streams will complete in a small
>             number of milliseconds, and there was no advantage to
>             thinking of it as a stream. But once you hit human response
>             time, it starts to make sense to be aware that there's a
>             stream flowing. If you're a client doing a GET, and it's
>             taking more than maybe 0.5s, you can provide a better UX by
>             displaying something for the user based on what you've
>             gotten so far.
>
>             What's more, if the app only needs the first N results, it
>             can read the stream until it gets N results, then .abort()
>             the xhr. The server may produce a few more results than
>             were consumed before it knows about the .abort(), but I
>             doubt that's too bad in most cases.
>
>             The case that's not handled well by current browsers is
>             pausing the stream. In theory, as I understand it (and I'm
>             no expert), this can be done by simply using TCP flow
>             control. A non-browser app that stops reading data from its
>             socket will exert backpressure, eventually resulting in the
>             process writing data finding the stream's not ready for
>             writing. My sense is that can and does work rather well in
>             a wide variety of situations.
>
>             Unfortunately, as I understand it, this doesn't work in
>             WebApps today, because the browser will just keep reading
>             and buffering until it runs out of VM. If instead xhr (and
>             websockets) had a limit on how much it would buffer, and
>             webapps could set that (and probably it starts around 10M),
>             then a WebApp that stopped consuming data would produce
>             backpressure that would result in the server learning it
>             can't send any more yet. When the WebApp consumes more
>             data, the server can start sending again.
>
>             I'm very curious if there's any technical reason this won't
>             work. I understand there may be problems with some existing
>             software, including browsers, not handling this kind of
>             streaming. But is there anything in the basic internet
>             protocols and implementations that makes this not work? For
>             instance, it may be that after blocking for a long time
>             (minutes, waiting for the user to request more), restarting
>             is too slow, or something like that.
>
>             One possible workaround for the lack of browser support
>             would be for servers to be a bit smarter and make some
>             guesses. For example, a server might say that requests with
>             User-Agent being any known browser should be served
>             normally for the first 10s, then drop to a much slower
>             speed, conserving resources in the server, the net, and the
>             client. WebApps that want to sidestep this could do so with
>             a Prefer header, like Prefer
>             initial-full-speed-duration=1s or 1000s. At some point,
>             when browsers allow webapp backpressure, those browser
>             User-Agent strings could be exempted from this slowdown.
>
>                   -- Sandro
>
>
>
>
>
>
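The flow-control behaviour described in the quoted message can be observed in a small experiment. This is only a sketch under stated assumptions: it uses a local socket pair from socket.socketpair() as a stand-in for a real TCP connection (the kernel buffering behaves analogously, but on most Unix systems this is not TCP itself), and the function names are invented for illustration.

```python
import select
import socket
from typing import Tuple

CHUNK = b"x" * 4096


def fill_until_blocked(writer: socket.socket) -> int:
    """Send until the kernel refuses more data; return bytes accepted."""
    sent = 0
    while True:
        try:
            sent += writer.send(CHUNK)
        except BlockingIOError:
            # Buffers are full: this is the backpressure signal the
            # "server" sees when the peer has stopped reading.
            return sent


def drain(reader: socket.socket) -> int:
    """Read everything currently buffered; return bytes consumed."""
    received = 0
    while True:
        try:
            data = reader.recv(65536)
        except BlockingIOError:
            return received
        if not data:
            return received
        received += len(data)


def demo_backpressure() -> Tuple[int, int, int]:
    """Return (bytes sent before blocking, bytes drained, bytes sent after)."""
    writer, reader = socket.socketpair()
    writer.setblocking(False)
    reader.setblocking(False)
    try:
        first = fill_until_blocked(writer)      # reader idle, buffers fill
        consumed = drain(reader)                # reader catches up
        select.select([], [writer], [], 5.0)    # wait until writable again
        second = fill_until_blocked(writer)     # writer can resume
        return first, consumed, second
    finally:
        writer.close()
        reader.close()
```

Calling demo_backpressure() shows the writer stalling after a bounded number of bytes while the reader is idle, and resuming once the reader has consumed data. The exact byte counts depend on the operating system's buffer sizes.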
Received on Wednesday, 11 June 2014 21:46:46 UTC