- From: Sandro Hawke <sandro@w3.org>
- Date: Wed, 11 Jun 2014 17:46:36 -0400
- To: Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>
- CC: Andreas Kuckartz <a.kuckartz@ping.de>, LDP <public-ldp@w3.org>
- Message-ID: <5398CE3C.8020309@w3.org>
On 06/11/2014 02:41 AM, Stian Soiland-Reyes wrote:
>
> Simple principles that seem reasonable, using existing infrastructure.
>
> However it would come at a large cost for the server to maintain a
> large number of open TCP sockets. Paging has the advantage of allowing
> a series of smaller, independent requests for different subsets of the
> resource

You seem to have an idea what paging is going to be used for. I'm at a
disadvantage, since I don't. (Seriously.)

What use cases do you see for paging? In particular, what use is
reverse paging, and why might a client want to page slowly?

> (and as we've discussed at the cost of ambiguous results on collection
> modifications),

Actually, I see now that ambiguity is there with or without paging. A
naive server could be serving a normal GET on an LDP container when a
PATCH, PUT, or POST comes in and changes what triples are in the
container. Does that change get reflected in the remaining parts of
the GET? Maybe "No", but that's a fairly expensive answer.

> and so could be served by multiple backend servers (including caches
> and proxies) without the client noticing.

Again, I'm needing a use case where that would be useful.

> A TCP stream would also fall apart as a mobile client moves from 3G to
> WiFi, with no good way to "continue". (Or would you combine this with
> byte level HTTP ranges?)

Switching networks is a problem for everyone, and I suppose HTTP Range
is the way folks handle it, if they're going to at all, so why should
we be any different?

Skimming http://tools.ietf.org/html/rfc7233 "Hypertext Transfer
Protocol (HTTP/1.1): Range Requests" hot off the presses... There's
a lot to be said for using Range Requests instead of paging, too, even
if it's just byte ranges.
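As a rough illustration of the byte-range idea, here is a minimal Python sketch of how a client might ask to continue an interrupted GET. The helper names (resume_headers, parse_content_range) are made up for this example and are not part of LDP or of any library. Only the Range, If-Range and Content-Range header syntax comes from RFC 7233, and the parser handles only the satisfied-range form.

```python
import re
from typing import Dict, Optional, Tuple


def resume_headers(bytes_received: int, etag: Optional[str] = None) -> Dict[str, str]:
    """Build request headers asking for everything after the bytes already held.

    Hypothetical helper: `bytes_received` is how much of the body the client
    kept before the connection dropped.
    """
    if bytes_received < 0:
        raise ValueError("bytes_received must be non-negative")
    # "bytes=N-" means: from offset N to the end of the representation.
    headers = {"Range": f"bytes={bytes_received}-"}
    if etag is not None:
        # If-Range makes the server send the whole body again if the
        # resource changed since the first part was fetched.
        headers["If-Range"] = etag
    return headers


_CONTENT_RANGE = re.compile(r"^bytes (\d+)-(\d+)/(\d+|\*)$")


def parse_content_range(value: str) -> Tuple[int, int, Optional[int]]:
    """Parse a 206 response's Content-Range into (first, last, total or None)."""
    match = _CONTENT_RANGE.match(value.strip())
    if match is None:
        raise ValueError(f"unsupported Content-Range: {value!r}")
    first, last = int(match.group(1)), int(match.group(2))
    total = None if match.group(3) == "*" else int(match.group(3))
    return first, last, total
```

A client that had 1024 bytes before switching networks would send the headers from resume_headers(1024, etag) and check that the first offset returned by parse_content_range matches what it asked for. Note that the If-Range check only helps if the server can give a stable ETag for the container, which is the same ambiguity question as above.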

> In fact, this technique is now used for streaming video, by basically
> having a playlist of a series of static, smaller video snippets which
> are distributed on Content Delivery Networks using regular HTTP GET
> (and so can be cached, proxied, etc) and joined together in the client.
>
> http://en.m.wikipedia.org/wiki/HTTP_Live_Streaming
> http://en.wikipedia.org/wiki/MPEG-DASH

Yeah, but in that case, the user really wants to skip around. I see
zero value in skipping around in a randomly ordered set of triples.
There's some value in skipping around when the set is ordered, but I
think the current ordering support is so weak as to be not worth
implementing. (In particular, it can only be determined by the
server.)

> If you were to take this analogy into LDP, then one could request for
> a paging content type, and get back a list of IRIs for pages which can
> then be requested at the client's pace and choice.

Funny how much that sounds like an LDP container. We have paging
because THAT LIST might be very big.

> If a server insists on paging (collection is massive), but the client
> can't Accept it, then it is a simple case of HTTP 406 Not Acceptable,
> which should be more reasonable than giving only half the collection,
> without the client understanding that there is paging.

My sense of the world is that if clients will break when the server
uses a new protocol, and anyone important uses such a client, then the
server will never get away with adopting a new protocol. Extensibility
in decentralized systems like the Web requires graceful fallback when
the other party doesn't implement the new feature.

-- Sandro

> On 10 Jun 2014 19:05, "Sandro Hawke" <sandro@w3.org> wrote:
>
> > On 06/10/2014 01:34 PM, Andreas Kuckartz wrote:
> >
> > > -------- Original Message --------
> > > To: Sandro Hawke <sandro@w3.org>
> > > CC: Linked Data Platform WG <public-ldp-wg@w3.org>
> > >
> > > Thanks a lot for thinking outside the current box of paging. I
> > > have looked at several paging approaches and do not really like
> > > any of them. They contaminate the real data and/or seem to be
> > > unnecessarily complex to implement.
> >
> > You're welcome. I did a little more investigation today, including
> > writing a tiny node.js server that streams data and lets me see
> > what happens when I do things on the client. This little bit of
> > testing was encouraging.
> >
> > I also spoke to a few people at lunch here at the W3C AC meeting
> > and again no one saw any serious problem.
> >
> > No clue yet how likely it is that browser vendors might implement
> > an extension to xhr to allow WebApps to use this properly.
> >
> > -- Sandro
> >
> > > Cheers,
> > > Andreas
> > > ---
> > >
> > > Sandro Hawke:
> > >
> > > > Thinking about paging a little, I'm really wondering if one
> > > > isn't better off using TCP backpressure instead of explicit
> > > > paging. It would have the huge advantage of requiring little
> > > > or no special code in the client or the server, if they already
> > > > implement high-performance streaming. (I started thinking
> > > > about this because as far as I can tell, if we want to allow
> > > > LDP servers to initiate paging, we have to require every LDP
> > > > client to understand paging. That's a high bar. If you want to
> > > > respond to that particular point, please change the subject
> > > > line!)
> > > >
> > > > The key point here is that TCP already provides an elegant way
> > > > to handle arbitrarily large data flows to arbitrarily small
> > > > devices on poor connections. If someone knows of a good simple
> > > > explanation of this, please send along a pointer. My knowledge
> > > > is largely pre-web.
> > > >
> > > > In web software we often think of HTTP operations as atomic
> > > > steps that take an arbitrarily long time. With that model,
> > > > doing a GET on a 100G resource is pretty much always a mistake.
> > > > But nothing in the web protocols requires thinking about it
> > > > that way. Instead, one can think of HTTP operations as opening
> > > > streams which start data flowing.
> > > >
> > > > In some situations, those streams will complete in a small
> > > > number of milliseconds, and there was no advantage to thinking
> > > > of it as a stream. But once you hit human response time, it
> > > > starts to make sense to be aware that there's a stream flowing.
> > > > If you're a client doing a GET, and it's taking more than maybe
> > > > 0.5s, you can provide a better UX by displaying something for
> > > > the user based on what you've gotten so far.
> > > >
> > > > What's more, if the app only needs the first N results, it can
> > > > read the stream until it gets N results, then .abort() the xhr.
> > > > The server may produce a few more results than were consumed
> > > > before it knows about the .abort(), but I doubt that's too bad
> > > > in most cases.
> > > >
> > > > The case that's not handled well by current browsers is pausing
> > > > the stream. In theory, as I understand it (and I'm no expert),
> > > > this can be done by simply using TCP flow control. A
> > > > non-browser app that stops reading data from its socket will
> > > > exert backpressure, eventually resulting in the process writing
> > > > data finding the stream's not ready for writing. My sense is
> > > > that can and does work rather well in a wide variety of
> > > > situations.
> > > >
> > > > Unfortunately, as I understand it, this doesn't work in WebApps
> > > > today, because the browser will just keep reading and buffering
> > > > until it runs out of VM.
> > > > If instead xhr (and websockets) had a limit on how much it
> > > > would buffer, and webapps could set that (and probably it
> > > > starts around 10M), then a WebApp that stopped consuming data
> > > > would produce backpressure that would result in the server
> > > > learning it can't send any more yet. When the WebApp consumes
> > > > more data, the server can start sending again.
> > > >
> > > > I'm very curious if there's any technical reason this won't
> > > > work. I understand there may be problems with some existing
> > > > software, including browsers, not handling this kind of
> > > > streaming. But is there anything in the basic internet
> > > > protocols and implementations that make this not work? For
> > > > instance, it may be that after blocking for a long time
> > > > (minutes, waiting for the user to request more), restarting is
> > > > too slow, or something like that.
> > > >
> > > > One possible workaround for the lack of browser support would
> > > > be for servers to be a bit smarter and make some guesses. For
> > > > example, a server might say that requests with User-Agent being
> > > > any known browser should be served normally for the first 10s,
> > > > then drop to a much slower speed, consuming resources in the
> > > > server, the net, and the client. WebApps that want to sidestep
> > > > this could do so with a Prefer header, like Prefer
> > > > initial-full-speed-duration=1s or 1000s. At some point, when
> > > > browsers allow webapp backpressure, those browser User-Agent
> > > > strings could be exempted from this slowdown.
> > > >
> > > > -- Sandro
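
The backpressure behaviour described in the quoted message can be observed with a few lines of Python. This is only a sketch under assumptions: it uses a local socket pair from socket.socketpair() as a stand-in for a real TCP connection between an LDP server and client, and the function names are invented for the example. The point it shows is the same one, though: when the reader stops reading, the writer is eventually told it cannot write, and it can continue once the reader consumes data again.

```python
import socket


def fill_until_blocked(writer: socket.socket, chunk: bytes) -> int:
    """Write chunks until the kernel buffers are full; return bytes accepted."""
    sent = 0
    while True:
        try:
            sent += writer.send(chunk)
        except BlockingIOError:
            # The peer is not reading, so the writer is pushed back.
            return sent


def drain(reader: socket.socket, limit: int) -> int:
    """Read up to `limit` bytes that are already buffered; return bytes read."""
    got = 0
    while got < limit:
        try:
            data = reader.recv(min(65536, limit - got))
        except BlockingIOError:
            break  # nothing more buffered right now
        if not data:
            break  # peer closed the connection
        got += len(data)
    return got


# Stand-in for a server (writer) and a client (reader) on one connection.
writer, reader = socket.socketpair()
writer.setblocking(False)
reader.setblocking(False)

chunk = b"x" * 4096
first_burst = fill_until_blocked(writer, chunk)   # client is "paused"
consumed = drain(reader, first_burst)             # client resumes reading
second_burst = fill_until_blocked(writer, chunk)  # server can send again

writer.close()
reader.close()
```

The amount accepted before the writer blocks depends on the operating system's buffer sizes, so the sketch makes no claim about specific numbers. It also says nothing about the open question in the message, namely whether restarting after minutes of blocking is fast enough across a real network.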
Received on Wednesday, 11 June 2014 21:46:46 UTC