Re: Requesting reviews of Provenance Access and Query document. from Graham Klyne on 2013-03-27 (public-prov-wg@w3.org from March 2013)

From: Graham Klyne <graham.klyne@zoo.ox.ac.uk>
Date: Wed, 27 Mar 2013 19:37:08 +0000
To: Erik Wilde <dret@berkeley.edu>
CC: LDP <public-ldp@w3.org>, W3C provenance WG <public-prov-wg@w3.org>
Message-ID: <51534A64.3040004@zoo.ox.ac.uk>

Hi Erik,

I think you're extrapolating more complete service capabilities than are 
actually provided by the pingback mechanism alone.

The goal of the provenance pingback facility is to enable discovery of new 
related provenance information that is otherwise undiscoverable.  It ends there.
Other systems are free to build on this capability, but it's not the goal of 
this specification to specify what or how such additional systems may be 
constructed.

In particular, pingbacks may well be used with feeds in the sense you describe, 
but that's not required, and how to do so is intentionally not specified.  I 
think pingback should be complementary to the LDP containers you mention.

More details below...

On 27/03/2013 16:55, Erik Wilde wrote:
> hello graham.
>
> my apologies for obviously misinterpreting "pingback". i always found this term
> terribly confusing, and i think i have fallen into this terminology trap again
> by reading the term too literally.
>
> given this misunderstanding and now hopefully understanding better what you're
> asking for, it seems basically what you're looking for is the service model of
> feeds (which is what typical pingbacks are using). a way how clients can consume
> updates in a well-defined and unified way, based on a regular pull model, adding
> create/update/delete capabilities. clients creating pingbacks can POST to the
> pingback feed, clients interested in learning about pingbacks can GET the
> pingback feed.

I think you may be reading too much into what we are trying to achieve.  The 
pingback spec isn't trying to create any specific service model, but rather is a 
mechanism that enables certain capabilities that would not be possible in its 
absence.  A cog rather than a machine.

Certainly, there are service capabilities that have motivated its design, but 
the specification attempts to provide a minimum of additional functionality on 
which they can be built, and in the process (hopefully) supports other kinds of 
functionality we haven't yet thought of.

Thus, as with pub-sub which you mentioned previously, I can imagine the pingback 
mechanism being used in a wider context of feeds, but there's no intent that 
feeds are the only way it might be used.

> this interaction pattern in general is something that LDP wants to support. a
> container accepts POST, and then creates a new resource as a result. this
> resource would represent the pingback, and in your case might represent the
> information about how a resource has been used by somebody who now POSTs the
> pingback.
>
> as a result, the newly created pingback resource now is linked to from the
> container, and anybody GETting the pingback container will see this link, and
> then can GET the pingback resource itself (if they're interested in actually
> consuming it).

That's one possible use, but to reinforce my previous point, let me suggest an 
alternative possibility (somewhat contrived, maybe) that indicates an almost 
opposite service pattern.  It may be that the publisher of a resource uses 
provenance information made available through a pingback to determine that a 
published resource has been superseded and may be withdrawn from circulation.

So, yes, the pingback might be used in a context of information feeds, but in 
this respect, I would see it as complementary to any feed-supporting mechanism 
that is defined by the LDP group based around LDP containers.

>
> here are some things to look out for, given where we are:
>
> - this depends on date-ordered containers, so that pingback listeners can do
> conditional GETs, and will only see updates when the container has actually
> changed. as in feeds, we don't say anything about a "default order", but it
> certainly can be by update timestamp.

Actually, I could imagine, or even expect, a provenance container that does not 
depend on timestamps.  Provenance, by its nature, defines implicit (partial) 
ordering of entities through derivation and usage/generation relations.  I don't 
see any particular value in simply treating provenance submissions as chunks of 
data that are turned into a linear sequence.  Of course, it's not forbidden to 
do that, I'm just not seeing that as an especially useful function.
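
To illustrate the point about implicit ordering, here is a minimal Python
sketch (not part of the pingback specification) that recovers an ordering
purely from derivation relations, with no timestamps involved.  The entity
names and the `derived_from` mapping are invented for illustration, and
`graphlib` assumes Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Hypothetical derivation relations: each entity maps to the set of
# entities it was derived from (its predecessors in the provenance graph).
derived_from = {
    "report-v2": {"report-v1", "dataset-b"},
    "report-v1": {"dataset-a"},
    "dataset-b": {"dataset-a"},
}


def derivation_order(relations):
    """Return one linear order consistent with the derivation partial order.

    Sources always appear before the entities derived from them.
    """
    return list(TopologicalSorter(relations).static_order())


order = derivation_order(derived_from)
```

Only the constraints implied by derivation are fixed here: `report-v1` and
`dataset-b` may come out in either order, which is what makes the ordering
partial rather than a single linear sequence.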

>
> - we currently have no defined ordering. it is not sure whether we will have
> client-requested ordering. servers can order any way they like (including the
> default behavior that many feed servers use, which is update-based). we need a
> way how to represent order in containers at the very least, and that is
> currently being discussed.

See above.  This rather assumes that the incoming units are preserved as units 
of information.  For provenance, this isn't necessarily the case.

>
> - we may or may not have client-controlled paging. this again  might not matter,
> because as with regular feeds, pingbacks probably are based on a service model
> where all you GET to see is the last 10/20/100 pingbacks, and that might be
> enough for most clients. if you need a full history, a service could either
> always expose the complete history of pingbacks (which can of course become very
> large), or, should LDP not have paging at least in the form of "next page" links
> on containers, then pingback containers could introduce their own paging links.

See above.  The "last 10/20/100 pingbacks" isn't necessarily a meaningful unit 
of consideration.  We're more likely to be interested in the "last 2/5/10 steps 
of derivation", which may be quite unrelated.
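
As a rough sketch of what "last N steps of derivation" could mean in
practice, the following Python walks back a bounded number of derivation
hops from one entity.  The graph, the entity names and the `last_steps`
helper are all hypothetical; nothing here is defined by the pingback
specification or by LDP.

```python
# Hypothetical provenance graph: entity -> entities it was derived from.
derived_from = {
    "report-v3": ["report-v2"],
    "report-v2": ["report-v1", "dataset-b"],
    "report-v1": ["dataset-a"],
}


def last_steps(relations, entity, max_steps):
    """Map each entity within max_steps derivation hops to its hop count."""
    seen = {entity: 0}
    frontier = [entity]
    for step in range(1, max_steps + 1):
        next_frontier = []
        for current in frontier:
            for source in relations.get(current, ()):
                if source not in seen:
                    seen[source] = step
                    next_frontier.append(source)
        frontier = next_frontier
    return seen


recent = last_steps(derived_from, "report-v3", 2)
```

Note that the result depends only on the shape of the derivation graph, not
on when or in what order the provenance was submitted.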

>
> - for this to scale, common metadata is critical. feeds have "updated"
> timestamps, so that entries can be aggregated, filtered, and republished based
> on a common metadata model of entries. currently, it seems that the LDP WG
> majority does not want to have a metadata model for containers and resources. in
> that case, provenance/pingback would need to define their own timestamp model,
> and mechanics would depend on LDP servers supporting this particular metadata
> for ordering their containers.

That's good, because with provenance the common metadata models for (feed) 
containers probably wouldn't (always) be very useful;  timestamps have a role to 
play, but I think the inherent graph of a "provenance trace" is a more likely 
primary concern.

>
> - in terms of resource orientation, we should have what you need: you can POST
> to a container, which creates a new resource. this resource then represents the
> provenance pingback. clients GETting the pingback container will see this new
> member, and caching and conditional GETs work fine. clients can then GET the
> pingback resource to see the new provenance information. the original pingback
> provider can even go in an PUT an updated version of that resource (let's say
> there was an error in the initial POST), and it should show up again (with a
> newer "updated" timestamp) in the container. for removing a pingback, you can
> DELETE the pingback, and then LDP has the interesting problem of how to
> communicate that "this member now no longer is a member" (this problem was only
> recently solved in atom by http://tools.ietf.org/html/rfc6721 which adds an
> entry that, when showing up in feed, tells the feed consumer that a certain
> entry has been deleted from a collection). that last part (communicating DELETE)
> is not something we have discussed so far, and i don't see it being
> discussed/addressed for v1.
>
> to summarize: i think we have the resource model that supports the interactions
> you're looking for. we might not have the full data model support you need to
> make it work out of the box, because we might not give you the "updated"
> timestamp to work with. we also might not support communicating the deletion of
> resources. it's still not quite clear what LDP eventually will or will not
> provide out of the box. but i will suggest to the group that we add this example
> to our use cases. we're not far off, and it's very good input for us. thanks a lot!

That all sounds like a reasonable deployment scenario, and one that I think is 
broadly consistent with use of pingbacks to trigger such operations.  If a 
pingback service is modelled as a container, LDP-style, then the pingback POST 
might trigger the addition of further information to the container.  Note that 
the pingback doesn't contain content, just links to content, but I don't think 
that is a problem.
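
To make "just links to content" concrete, here is a hedged Python sketch
that builds (but does not send) such a pingback request.  The URIs and the
`build_pingback` helper are hypothetical, and the use of a `text/uri-list`
body is an assumption for illustration; this message does not state the
wire format.

```python
from urllib.request import Request


def build_pingback(pingback_uri, provenance_uris):
    """Build a POST whose body lists provenance URIs, one per line.

    Assumption: the body is sent as text/uri-list, so it carries only
    links to provenance, never the provenance content itself.
    """
    body = "\r\n".join(provenance_uris) + "\r\n"
    return Request(
        pingback_uri,
        data=body.encode("utf-8"),
        headers={"Content-Type": "text/uri-list"},
        method="POST",
    )


# Hypothetical URIs, for illustration only.
req = build_pingback(
    "http://example.org/pingback/resource1",
    ["http://example.net/provenance/use-of-resource1"],
)
```

A receiving service modelled as a container would then be free to
dereference the listed URIs, or not, as it sees fit.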

#g
--
Received on Wednesday, 27 March 2013 22:23:50 UTC