- From: Sandro Hawke <sandro@w3.org>
- Date: Wed, 11 Jun 2014 17:46:36 -0400
- To: Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>
- CC: Andreas Kuckartz <a.kuckartz@ping.de>, LDP <public-ldp@w3.org>
- Message-ID: <5398CE3C.8020309@w3.org>
On 06/11/2014 02:41 AM, Stian Soiland-Reyes wrote:
>
> Simple principles that seem reasonable, using existing infrastructure.
>
> However it would come at a large cost for the server to maintain a
> large number of open TCP sockets. Paging has the advantage of allowing
> a series of smaller, independent requests for different subsets of the
> resource
>
You seem to have some idea of what paging is going to be used for. I'm
at a disadvantage, since I don't. (Seriously.) What use cases do you
see for paging? In particular, what use is reverse paging, and why
might a client want to page slowly?
> (and as we've discussed at the cost of ambiguous results on collection
> modifications),
>
Actually, I see now that ambiguity is there with or without paging. A
naive server could be serving a normal GET on an LDP container when a
PATCH, PUT, or POST comes in and changes what triples are in the
container. Does that change get reflected in the remaining parts of the
GET? Maybe "No", but that's a fairly expensive answer.
> and so could be served by multiple backend servers (including caches
> and proxies) without the client noticing.
>
Again, I'm needing a use case where that would be useful.
> A TCP stream would also fall apart as a mobile client moves from 3G to
> WiFi, with no good way to "continue". (Or would you combine this with
> byte level HTTP ranges?)
>
Switching networks is a problem for everyone, and I suppose HTTP Range
is the way folks handle it, if they're going to handle it at all, so
why should we be any different?
Skimming http://tools.ietf.org/html/rfc7233 "Hypertext Transfer Protocol
(HTTP/1.1): Range Requests" hot off the presses...
There's a lot to be said for using Range Requests instead of paging,
too, even if it's just byte ranges.
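To make the Range-request alternative concrete, here is a minimal
sketch of parsing a single RFC 7233 byte-range spec on the server
side; the function name and return convention are my own illustration,
not from the RFC:

```python
def parse_byte_range(header, size):
    """Parse one RFC 7233 byte-range spec like 'bytes=500-999',
    'bytes=500-' or 'bytes=-200' against a representation of
    `size` bytes. Returns an inclusive (first, last) pair, or
    None if the range is unsatisfiable (HTTP 416)."""
    if not header.startswith("bytes="):
        return None
    spec = header[len("bytes="):]
    first, _, last = spec.partition("-")
    if first == "":                    # suffix form: the last N bytes
        n = int(last)
        if n == 0:
            return None
        return (max(size - n, 0), size - 1)
    first = int(first)
    if first >= size:
        return None                    # 416 Range Not Satisfiable
    last = int(last) if last else size - 1
    return (first, min(last, size - 1))
```

A client resuming after a network switch would send the offset where
it stopped, e.g. `Range: bytes=9500-`.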
> In fact, this technique is now used for streaming video, by basically
> having a playlist of a series of static, smaller video snippets which
> are distributed on Content Delivery Networks using regular HTTP GET
> (and so can be cached, proxied, etc) and joined together in the client.
>
> http://en.m.wikipedia.org/wiki/HTTP_Live_Streaming
> http://en.wikipedia.org/wiki/MPEG-DASH
>
Yeah, but in that case, the user really wants to skip around. I see
zero value in skipping around in a randomly ordered set of triples.
There's some value in skipping around when the set is ordered, but I
think the current ordering support is so weak as to be not worth
implementing. (In particular, it can only be determined by the server.)
> If you were to take this analogy into LDP, then one could request for
> a paging content type, and get back a list of IRIs for pages which can
> then be requested at the client's pace and choice.
>
Funny how much that sounds like an LDP container. We have paging
because THAT LIST might be very big.
> If a server insists on paging (collection is massive), but the client
> can't Accept it, then it is a simple case of HTTP 406 Not Acceptable,
> which should be more reasonable than giving only half the collection,
> without the client understanding that there is paging.
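The 406 flow described there could be sketched as content negotiation
on the server; the media type and threshold below are invented for
illustration, not anything the spec defines:

```python
# Hypothetical sketch of the proposed negotiation. The media type
# "application/ldp-paged+json" is made up here for illustration.
PAGED = "application/ldp-paged+json"
PLAIN = "text/turtle"

def respond(accept, collection_size, paging_threshold=10_000):
    """Pick an HTTP status and media type for a GET on a container,
    given the client's Accept header and the collection size."""
    accepts_paged = PAGED in accept
    if collection_size <= paging_threshold:
        return 200, PLAIN          # small enough: serve it whole
    if accepts_paged:
        return 200, PAGED          # serve a list of page IRIs
    return 406, None               # server insists on paging
```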
>
My sense of the world is that if clients will break when the server uses
a new protocol, and anyone important uses a client, then the server will
never get away with adopting a new protocol. Extensibility in
decentralized systems like the Web requires graceful fallback when the
other party doesn't implement the new feature.
-- Sandro
> On 10 Jun 2014 19:05, "Sandro Hawke" <sandro@w3.org
> <mailto:sandro@w3.org>> wrote:
>
> On 06/10/2014 01:34 PM, Andreas Kuckartz wrote:
>
> -------- Original Message --------
> To: Sandro Hawke <sandro@w3.org <mailto:sandro@w3.org>>
> CC: Linked Data Platform WG <public-ldp-wg@w3.org
> <mailto:public-ldp-wg@w3.org>>
>
> Thanks a lot for thinking outside the current box of paging. I have
> looked at several paging approaches and do not really like any of
> them. They contaminate the real data and/or seem to be unnecessarily
> complex to implement.
>
>
> You're welcome. I did a little more investigation today, including
> writing a tiny node.js server that streams data and lets me see what
> happens when I do things on the client. This little bit of testing
> was encouraging.
>
> I also spoke to a few people at lunch here at the W3C AC meeting
> and again no one saw any serious problem.
>
> No clue yet how likely it is that browser vendors might implement an
> extension to xhr to allow WebApps to use this properly.
>
> -- Sandro
>
> Cheers,
> Andreas
> ---
>
> Sandro Hawke:
>
> Thinking about paging a little, I'm really wondering if one isn't
> better off using TCP backpressure instead of explicit paging. It
> would have the huge advantage of requiring little or no special code
> in the client or the server, if they already implement
> high-performance streaming. (I started thinking about this because
> as far as I can tell, if we want to allow LDP servers to initiate
> paging, we have to require every LDP client to understand paging.
> That's a high bar. If you want to respond to that particular
> point, please change the subject line!)
>
> The key point here is that TCP already provides an elegant way to
> handle arbitrarily large data flows to arbitrarily small devices on
> poor connections. If someone knows of a good simple explanation of
> this, please send along a pointer. My knowledge is largely pre-web.
>
> In web software we often think of HTTP operations as atomic steps
> that take an arbitrarily long time. With that model, doing a GET on
> a 100G resource is pretty much always a mistake. But nothing in the
> web protocols requires thinking about it that way. Instead, one can
> think of HTTP operations as opening streams which start data flowing.
>
> In some situations, those streams will complete in a small number of
> milliseconds, and there is no advantage to thinking of it as a
> stream. But once you hit human response time, it starts to make
> sense to be aware that there's a stream flowing. If you're a client
> doing a GET, and it's taking more than maybe 0.5s, you can provide a
> better UX by displaying something for the user based on what you've
> gotten so far.
>
> What's more, if the app only needs the first N results, it can read
> the stream until it gets N results, then .abort() the xhr. The
> server may produce a few more results than were consumed before it
> knows about the .abort(), but I doubt that's too bad in most cases.
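The read-until-N-then-abort pattern can be sketched with an incremental
parser over a chunked byte stream; newline-delimited results and the
function name are assumptions for illustration:

```python
def first_n_results(chunks, n):
    """Consume an iterable of byte chunks (as a streaming GET delivers
    them), parse newline-delimited results, and stop reading as soon
    as n results are in hand -- the analogue of calling xhr.abort().
    Chunks after the stopping point are never pulled from the wire."""
    results, buf = [], b""
    for chunk in chunks:
        buf += chunk
        while b"\n" in buf and len(results) < n:
            line, _, buf = buf.partition(b"\n")
            results.append(line.decode())
        if len(results) >= n:
            break          # stop pulling; the server may still have
                           # produced a few results nobody reads
    return results
```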
>
> The case that's not handled well by current browsers is pausing the
> stream. In theory, as I understand it (and I'm no expert), this can
> be done by simply using TCP flow control. A non-browser app that
> stops reading data from its socket will exert backpressure,
> eventually resulting in the process writing data finding the
> stream's not ready for writing. My sense is that can and does work
> rather well in a wide variety of situations.
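That flow-control behaviour can be observed with a plain socket pair:
a writer whose peer never reads eventually cannot send any more. A
minimal sketch, with a local socketpair standing in for a real TCP
connection:

```python
import socket

# A connected pair stands in for the two ends of a TCP stream.
server, client = socket.socketpair()
server.setblocking(False)

# The "server" writes as fast as it can while the "client" reads
# nothing at all.
sent = 0
try:
    while True:
        sent += server.send(b"x" * 65536)
except BlockingIOError:
    # Kernel buffers are full: the non-reading client has exerted
    # backpressure. A blocking writer would simply be paused here.
    pass

drained = client.recv(65536)      # the client consumes some data...
more = server.send(b"y" * 1024)   # ...and the server can send again
```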
>
> Unfortunately, as I understand it, this doesn't work in WebApps
> today, because the browser will just keep reading and buffering
> until it runs out of VM. If instead xhr (and websockets) had a
> limit on how much it would buffer, and webapps could set that (and
> probably it starts around 10M), then a WebApp that stopped consuming
> data would produce backpressure that would result in the server
> learning it can't send any more yet. When the WebApp consumes more
> data, the server can start sending again.
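The proposed buffer cap behaves like a bounded producer/consumer
queue: the network side blocks whenever the app falls behind. A
sketch, with a 4-item queue standing in for the suggested ~10M limit:

```python
import queue
import threading

buf = queue.Queue(maxsize=4)   # stands in for a capped xhr buffer
received = []

def network_thread():
    """Delivers 10 chunks; put() blocks whenever the buffer is full,
    which is exactly the backpressure the mail asks browsers to
    exert on the sender."""
    for i in range(10):
        buf.put(f"chunk-{i}")
    buf.put(None)              # end-of-stream marker

t = threading.Thread(target=network_thread)
t.start()
while (chunk := buf.get()) is not None:
    received.append(chunk)     # the WebApp consumes at its own pace
t.join()
```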
>
> I'm very curious if there's any technical reason this won't work. I
> understand there may be problems with some existing software,
> including browsers, not handling this kind of streaming. But is
> there anything in the basic internet protocols and implementations
> that makes this not work? For instance, it may be that after
> blocking for a long time (minutes, waiting for the user to request
> more), restarting is too slow, or something like that.
>
> One possible workaround for the lack of browser support would be for
> servers to be a bit smarter and make some guesses. For example, a
> server might say that requests with User-Agent being any known
> browser should be served normally for the first 10s, then drop to a
> much slower speed, conserving resources in the server, the net, and
> the client. WebApps that want to sidestep this could do so with a
> Prefer header, like Prefer: initial-full-speed-duration=1s or 1000s.
> At some point, when browsers allow webapp backpressure, those
> browser User-Agent strings could be exempted from this slowdown.
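The "initial-full-speed-duration" token above is a hypothetical
preference, not a registered one; a server honouring it might parse
the Prefer header along these lines (the parser is a sketch):

```python
def initial_full_speed(prefer_header, default=10.0):
    """Return the requested full-speed duration in seconds, e.g.
    'initial-full-speed-duration=1s' -> 1.0. The preference token
    is the hypothetical one from the mail, not a registered one.
    Falls back to `default` if the token is absent or malformed."""
    for pref in prefer_header.split(","):
        name, _, value = pref.strip().partition("=")
        if name == "initial-full-speed-duration" and value.endswith("s"):
            try:
                return float(value[:-1])
            except ValueError:
                pass
    return default
```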
>
> -- Sandro
>
Received on Wednesday, 11 June 2014 21:46:46 UTC