Re: Requesting reviews of Provenance Access and Query document. from Erik Wilde on 2013-03-26 (public-ldp@w3.org from March 2013)

From: Erik Wilde <dret@berkeley.edu>
Date: Tue, 26 Mar 2013 14:26:08 -0700
To: Graham Klyne <graham.klyne@zoo.ox.ac.uk>
CC: LDP <public-ldp@w3.org>, W3C provenance WG <public-prov-wg@w3.org>
Message-ID: <51521270.5080601@berkeley.edu>
hello graham.

thanks for your email!

On 2013-03-14 9:44 , Graham Klyne wrote:
> The section on pingbacks
> (http://www.w3.org/TR/2013/WD-prov-aq-20130312/#forward-provenance) is
> intended to provide a way for a publisher to learn about additional
> provenance related to a published resource.  We would be interested to
> hear from web services experts if they have any experience of using HTTP
> in this way, and if there are any known problems with the proposed
> approach.  (The PROV WG has agreed to drop the implied directionality in
> the name used and description.)

if i understand this correctly, this is supposed to be some kind of push 
mechanism, instead of the usual pull model. there is little standardized 
or widely deployed technology for this on the web so far. browsers have 
been using long polling, but that's not very scalable, and it is mostly a 
workaround for some restrictions inherent to browsers.

the connection to LDP is a very interesting one, because there could be 
an interesting opportunity to leverage LDP's model. for this, i'll 
explain how this actually does work in Atom (which has a similar model 
of collections/entries). Atom provides feeds that most often are sorted 
by date. PuSH (PubSubHubbub, a now defunct google activity) defined a 
model that allowed people to subscribe to feeds by registering a 
callback URI. for any update in the feed, the PuSH server would package 
the update as an Atom entry and then POST it to the callback URI.
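
to make the callback model above concrete, here is a minimal in-memory 
sketch in python. it is not taken from any PuSH implementation: the 
class name Hub, its methods, and the example URIs are all hypothetical, 
and the actual HTTP POST is replaced by an injected function so the 
sketch runs without a network.

```python
from collections import defaultdict


class Hub:
    """Toy hub: keeps callback URIs per feed and pushes each update to them."""

    def __init__(self, post):
        # post(callback_uri, body) stands in for an HTTP POST (assumption).
        self.post = post
        self.subscribers = defaultdict(set)  # feed URI -> set of callback URIs

    def subscribe(self, feed_uri, callback_uri):
        self.subscribers[feed_uri].add(callback_uri)

    def unsubscribe(self, feed_uri, callback_uri):
        self.subscribers[feed_uri].discard(callback_uri)

    def publish(self, feed_uri, entry):
        # Push the update to every callback registered for this feed.
        for callback_uri in sorted(self.subscribers[feed_uri]):
            self.post(callback_uri, entry)


# Usage: record the "POSTs" in a list instead of sending them.
sent = []
hub = Hub(post=lambda uri, body: sent.append((uri, body)))
hub.subscribe("http://example.org/feed", "http://client.example/callback")
hub.publish("http://example.org/feed", "<entry>update 1</entry>")
hub.unsubscribe("http://example.org/feed", "http://client.example/callback")
hub.publish("http://example.org/feed", "<entry>update 2</entry>")
```

after the unsubscribe, the second update is no longer delivered, so only 
the first entry ends up in the recorded list.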

this being a pubsub model, the PuSH servers must 
maintain subscriber lists (of all callback URIs). in PuSH, this can be 
layered, because a feed can advertise a hub for it (where clients can go 
and subscribe). while PuSH worked, it never gained critical mass, and 
was hampered by the fact that there was no standardized protocol for how 
to subscribe/unsubscribe, so that was left for implementers to figure out. 
a more promising protocol should probably cover this aspect as well.

to summarize: when LDP is stable, it would be conceivable for LDP 
services to support a similar service: clients interested in updates 
would subscribe to a URI, and would get pushed updates in the form of 
LDP data (which would be exactly the same as they would have gotten when 
GETting the updates resource), thanks to the RESTful design of the 
protocol: URIs are the interaction points for resources, and we can 
build protocols (such as this LDP/PuSH design) on top of it.

in fact, some PuSH implementations were even smart enough to batch push 
messages: when a client subscribed to multiple collections, or several 
updates happened, they would send "batch updates" that would be POSTed 
to the callback URI. the listening "client" would then act as if it had 
seen multiple updates getting published in the feed (had it used pull 
interactions).
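
the batching behavior can be sketched roughly as follows. this is only 
an illustration under assumptions: BatchingHub, flush, and the feed and 
callback URIs are made-up names, and a real hub would decide for itself 
when to flush and how to serialize the batch (for example as one Atom 
feed document containing several entries).

```python
from collections import defaultdict


class BatchingHub:
    """Toy hub that queues updates per callback and sends them in one batch."""

    def __init__(self, post):
        # post(callback_uri, entries) stands in for one HTTP POST (assumption).
        self.post = post
        self.subscribers = defaultdict(set)  # feed URI -> set of callback URIs
        self.pending = defaultdict(list)     # callback URI -> queued entries

    def subscribe(self, feed_uri, callback_uri):
        self.subscribers[feed_uri].add(callback_uri)

    def publish(self, feed_uri, entry):
        # Queue the update instead of pushing it right away.
        for callback_uri in self.subscribers[feed_uri]:
            self.pending[callback_uri].append(entry)

    def flush(self):
        # One POST per callback, carrying everything queued since the last flush.
        for callback_uri, entries in sorted(self.pending.items()):
            if entries:
                self.post(callback_uri, list(entries))
        self.pending.clear()


# Usage: one client subscribed to two collections receives a single batch.
batches = []
batching_hub = BatchingHub(post=lambda uri, entries: batches.append((uri, entries)))
callback = "http://client.example/callback"
batching_hub.subscribe("http://example.org/feed-a", callback)
batching_hub.subscribe("http://example.org/feed-b", callback)
batching_hub.publish("http://example.org/feed-a", "a1")
batching_hub.publish("http://example.org/feed-b", "b1")
batching_hub.publish("http://example.org/feed-a", "a2")
batching_hub.flush()
```

the client then processes the entries in the batch one by one, as if it 
had pulled them from the feeds itself.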

LDP is definitely pull only, allowing you to GET resources at well 
defined URIs (GET the collection and GET all updates, GET individual 
resources and GET all data about them), so we will provide the right 
foundation in terms of a RESTful design. layering LDPush on it actually 
would be a nice validation of the benefits of RESTful design, but would 
require additional protocol parts (probably) such as how to handle 
subscription and unsubscription.

implementation issues also arise in terms of scalability: how to deal 
with millions of subscribers? many PuSH implementations chose to handle 
this pragmatically and just automatically cancel subscriptions 
(requiring clients to refresh periodically), thus making it easier for 
servers to deal with the problem of subscriptions piling up because 
clients subscribed and never bothered to unsubscribe.
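
the "automatically cancel subscriptions" approach can be sketched as a 
lease: every subscription gets an expiry time, and a client that does 
not refresh simply drops out. this is a minimal sketch with assumed 
names (LeasedSubscriptions, lease_seconds, an injected clock for 
testing); it does not reproduce the lease handling of any particular 
PuSH implementation.

```python
class LeasedSubscriptions:
    """Toy subscription table where every entry expires unless refreshed."""

    def __init__(self, lease_seconds, clock):
        self.lease_seconds = lease_seconds
        self.clock = clock  # callable returning the current time in seconds
        self.expires = {}   # (feed URI, callback URI) -> expiry time

    def subscribe(self, feed_uri, callback_uri):
        # Subscribing again simply renews the lease.
        self.expires[(feed_uri, callback_uri)] = self.clock() + self.lease_seconds

    def active_callbacks(self, feed_uri):
        now = self.clock()
        # Forget every subscription whose lease has run out.
        self.expires = {key: t for key, t in self.expires.items() if t > now}
        return sorted(cb for (feed, cb) in self.expires if feed == feed_uri)


# Usage with a fake clock so the example is deterministic.
now = [0]
subs = LeasedSubscriptions(lease_seconds=60, clock=lambda: now[0])
feed = "http://example.org/feed"
subs.subscribe(feed, "http://a.example/callback")   # lease runs until t=60
now[0] = 30
subs.subscribe(feed, "http://b.example/callback")   # lease runs until t=90
now[0] = 70
after_expiry = subs.active_callbacks(feed)          # a's lease has run out
subs.subscribe(feed, "http://a.example/callback")   # a refreshes
after_refresh = subs.active_callbacks(feed)
```

the server never needs an explicit unsubscribe to clean up: stale 
entries disappear the next time the table is consulted.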

kind regards,

dret.

-- 
erik wilde | mailto:dret@berkeley.edu  -  tel:+1-510-2061079 |
            | UC Berkeley  -  School of Information (ISchool) |
            | http://dret.net/netdret http://twitter.com/dret |
Received on Tuesday, 26 March 2013 21:26:37 UTC