- From: Graham Klyne <Graham.Klyne@zoo.ox.ac.uk>
- Date: Wed, 27 Mar 2013 12:48:03 +0000
- To: Erik Wilde <dret@berkeley.edu>
- CC: LDP <public-ldp@w3.org>, W3C provenance WG <public-prov-wg@w3.org>
Hi Erik,

Thanks for your comments.

I think the point of your message is concern for scalability because of resource requirements for handling unknown numbers of subscriptions in a pub/sub message push environment. My short response is that the mechanism described is not a pub/sub mechanism, in that there is no subscription, so those concerns do not apply. (I think it's entirely possible that pingback can be used *with* a pub-sub service, and the scalability issues could indeed be of concern for any pub-sub mechanism used, but that's a separate discussion.)

More details below.

On 26/03/2013 21:26, Erik Wilde wrote:
> hello graham.
>
> thanks for your email!
>
> On 2013-03-14 9:44 , Graham Klyne wrote:
>> The section on pingbacks
>> (http://www.w3.org/TR/2013/WD-prov-aq-20130312/#forward-provenance) is
>> intended to provide a way for a publisher to learn about additional
>> provenance related to a published resource. We would be interested to
>> hear from web services experts if they have any experience of using HTTP
>> in this way, and if there are any known problems with the proposed
>> approach. (The PROV WG has agreed to drop the implied directionality in
>> the name used and description.)
>
> if i understand this correctly, this is supposed to be some kind of push
> mechanism, instead of the usual pull model. there is little in terms of
> standardized/widely deployed technology on the web so far. browsers have been
> using "long pulls", but that's not very scalable and mostly because of some
> restrictions inherent to browsers.

If you mean "push mechanism" in the sense of being initiated by the provider of information, then yes. But it is very different to techniques like long polling in that the provider of information is also the client (initiator) of the transaction. In this, I think it's no different to any other HTTP POST or PUT operation.

For long-polling (and pub-sub), the recipient of information is assumed to have some a priori awareness of the provider.
The pingback mechanism is the other way round: the provider is assumed to have a priori awareness of the recipient. Push mechanisms are often used as a way to avoid the inefficiencies (or effort/latency trade-off) of polling; pingback is different: it is designed to allow discovery of information that is not generally discoverable through polling.

>
> the connection to LDP is a very interesting one, because there could be an
> interesting opportunity to leverage LDP's model. for this, i'll explain how this
> actually does work in Atom (which has a similar model of collections/entries).
> Atom provides feeds that most often are sorted by date. PuSH (PubSubHubbub, a
> now defunct google activity) defined a model that allowed people to subscribe
> to feeds by registering a callback URI. for any update in the feed, the PuSH
> server would package the update as an Atom entry and then POST it to the
> callback URI.
>
> this being a pubsub model, this means that the PuSH servers must maintain
> subscriber lists (of all callback URIs). in PuSH, this can be layered, because a
> feed can advertise a hub for it (where clients can go and subscribe). While PuSH
> worked, it never gained critical mass, and was hampered by the fact that there
> was no standardized protocol how to subscribe/unsubscribe, so that was left for
> implementers to figure out. a more promising protocol should probably cover this
> aspect as well.
>
> to summarize: when LDP is stable, it would be conceivable for LDP services to
> support a similar service: clients interested in updates would subscribe to a
> URI, and would get pushed updates in the form of LDP data (which would be
> exactly the same as they would have gotten when GETting the updates resource),
> thanks to the RESTful design of the protocol: URIs are the interaction points for
> resources, and we can build protocols (such as this LDP/PuSH design) on top of it.
>
> in fact, some PuSH implementations were even smart enough to batch push
> messages: when a client subscribed to multiple collections, or several updates
> happened, they would send "batch updates" that would be POSTed to the callback
> URI. the listening "client" would then act as if it had seen multiple updates
> getting published in the feed (had it used pull interactions).
>
> LDP is definitely pull only, allowing you to GET resources at well defined URIs
> (GET the collection and GET all updates, GET individual resources and GET all
> data about them), so we will provide the right foundation in terms of a RESTful
> design. layering LDPush on it actually would be a nice validation of the
> benefits of RESTful design, but would require additional protocol parts
> (probably) such as how to handle subscription and unsubscription.
>
> implementation issues also arise in terms of scalability: how to deal with
> millions of subscribers? many PuSH implementations chose to handle this
> pragmatically and just automatically cancel subscriptions (requiring clients to
> refresh periodically), thus making it easier for servers to deal with the
> problem of subscriptions piling up because clients subscribed and never bothered
> to unsubscribe.

A difference between what is proposed for provenance pingback and pub/sub mechanisms is that there is no "fan out" of data, and hence no requirement to record subscriptions. It's more the reverse, declaring a kind of collection point for information, but (intentionally) being quite silent about what the recipient may do with that data. As such, it's more of a discovery mechanism than a propagation mechanism.

So, while I can appreciate that there may be applications that use pingbacks in conjunction with pub/sub (or other distribution mechanisms), I don't think such considerations have any direct bearing on the pingback as described.
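
To make the contrast between the two models concrete, here is a minimal in-memory sketch. It is an illustration only, not anything defined by PROV-AQ, PuSH or LDP: the class names `PubSubHub` and `PingbackCollector`, their methods, and the `max_items` limit are all hypothetical, and no HTTP handling is shown.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PubSubHub:
    """Pub/sub model: the hub records subscriptions and fans updates out."""

    # topic -> callback URIs; this state grows with the number of subscribers
    subscribers: dict[str, list[str]] = field(default_factory=dict)

    def subscribe(self, topic: str, callback_uri: str) -> None:
        self.subscribers.setdefault(topic, []).append(callback_uri)

    def publish(self, topic: str, update: str) -> list[tuple[str, str]]:
        # One outgoing delivery per recorded subscriber ("fan out").
        return [(uri, update) for uri in self.subscribers.get(topic, [])]


@dataclass
class PingbackCollector:
    """Pingback model: a collection point with no subscriptions or fan-out."""

    max_items: int = 1000  # hypothetical local policy, not part of any spec
    # resource URI -> provenance URIs reported by whoever chose to send them
    received: dict[str, list[str]] = field(default_factory=dict)

    def receive(self, resource: str, provenance_uris: list[str]) -> bool:
        stored = self.received.get(resource, [])
        if len(stored) + len(provenance_uris) > self.max_items:
            return False  # the receiver may simply ignore the pingback
        self.received.setdefault(resource, []).extend(provenance_uris)
        return True
```

The hub has to remember who subscribed and does work per subscriber on every update; the collector keeps no record of senders and only decides what, if anything, to do with what arrives.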

If LDP does, in due course, introduce frameworks that support pub-sub distribution, I would see the pingback as being complementary: some systems may choose to pass on information from incoming pingbacks to a set of subscribers using these mechanisms.

I also recognize that the pingback mechanism may be used as part of a larger pub-sub framework (in that subscriptions may be created for pingback resources), and any system that uses pub-sub in such a way will indeed need to be aware of subscription scaling issues. But such use is not required by, and is outside the scope of, the mechanism specified.

Thus, I feel the only scaling concern for pingback services as described is whether they can deal with the potential numbers of incoming messages. This is covered somewhat in the security considerations section. In particular, there is no requirement on a server to do anything in particular with a pingback, so it is free to take steps to protect its resources from abuse.

#g
--
Received on Wednesday, 27 March 2013 12:53:35 UTC