
Re: W3C Provenance Working Group Charter - another alternate version for discussion

From: James Cheney <jcheney@inf.ed.ac.uk>
Date: Fri, 19 Nov 2010 15:54:20 +0000
Cc: "Myers, Jim" <MYERSJ4@rpi.edu>, Paulo Pinheiro da Silva <paulo@utep.edu>, public-xg-prov@w3.org
Message-Id: <C3377F90-B3DB-4F67-A29B-3935058668AA@inf.ed.ac.uk>
To: Luc Moreau <L.Moreau@ecs.soton.ac.uk>
Hi,

Just to respond briefly since I will miss the teleconference (about to  
get on a train):

My objection in the offline discussion Luc refers to was based on the  
fact that there are not a lot of proposals that already deal with  
expressing history/dynamic provenance for versioned, structured data  
(e.g. databases), hence it may not be realistic to expect consensus to  
emerge within a WG.  Thus, I didn't want the charter to make it sound  
like solving this problem was a requirement (or to yield a solution  
that claimed to address this problem but didn't).

If there is near-consensus for the easier problem of expressing  
versioning or "source" relationships on non-structured data then I  
wouldn't object to that.  And in any case I don't have veto power.

--James

On Nov 19, 2010, at 2:37 PM, Luc Moreau wrote:

> Jim,
>
> Thanks for these constructive suggestions.
>
> I like the notion of source, it is definitely very useful, and is  
> necessary to address the News Aggregator scenario.
>
> The idea of a plan/recipe is also crucial for workflow based  
> systems. It would be good to have standardized conventions
> to refer to them.
>
> I also agree that developing a comprehensive solution for dealing  
> with mutable state is very challenging. This
> topic would probably involve notions of provenance from the database  
> community. In previous discussion with
> James Cheney, we felt that this should be out of scope of  
> standardization activity.  However, I agree with you
> that it would be nice to address a tractable subset of this problem.  
> In particular, the ability to relate versions/states
> of a resource would be useful.
>
> In summary, these are concrete terms: pml:Source, pml:Engine,  
> pml:Rule/Plan, pml:hasEngine, pml:hasRule/Plan.
>
> I would also add:  Resource, State Representation, Version
>
> I also recall Paulo mentioning Query. Should this be on the table of  
> a standardization activity?
>
> Thanks,
> Luc
>
> On 11/18/2010 03:55 PM, Myers, Jim wrote:
>>
>> Apologies for being silent this week – hard to get coherent time  
>> here, so some random thoughts. My take on the technical issues  
>> being raised in the edits is that:
>>
>> The basic core that was addressed by OPM is not controversial but
>> naming of concepts could be improved (the text changes are more
>> focused on making it clearer that OPM didn’t invent these concepts
>> - its value is really as evidence that this is roughly the right
>> scope to address (OPM was the set that we could get agreement on)).
>>
>> I do see a few places where people are suggesting stretching that  
>> scope a bit:
>>
>> Sources – the idea of an agent or mutable resource from which a
>> resource of interest (the thing we’re documenting the provenance of)
>> comes. Nominally this could be dealt with by recording an agent
>> controlling a publication process to produce the resource and I
>> think the question to resolve is whether a special construct would
>> be useful. I think the PML folks would argue that it is since an
>> agent-process-resource relation is too generic to signal that being
>> a source is special (i.e. an article derived from the NYTimes
>> differs in importance from the same article being handed to you by
>> Joe the newspaper seller, though both are just agent-process-resource
>> constructs). With others in the XG group having special constructs
>> for publication/retrieval from a service, it seems like consensus  
>> might be possible on this and I think having discussion of this be  
>> part of the working group scope would be useful.
>>
>> Another construct that looks useful is some link between provenance  
>> and the plan/recipe that was being followed. What that recipe is  
>> seems to differ – a workflow template, logical rules, mathematical  
>> function, scientific experiment protocol, a business contract, etc.  
>> – but the basic capability to make a link between a process and the  
>> recipe again seems like a useful and relatively non-controversial  
>> extension that a working group could address.
>>
>> A third area where it may make sense to do something would be to  
>> make a connection to mutable resources. I think this is a hard  
>> problem in the general case but some extension to standardize how  
>> one might link resources to a mutable thing as versions might be  
>> something that could be agreed to. Along the lines of the paper I  
>> sent in to IPAW this year, I think this is an area where a working  
>> group could really get stuck, but it’s also one where many groups  
>> have some capability and we’ve seen it arise in many use cases, so  
>> some capability here might broaden the usability. I tend to think
>> of this as a profile that connects provenance with an existing
>> versioning model rather than something new developed as part of a
>> language.
>>
>> Beyond this, I think we enter the area of research/domain  
>> extensions that showed up in the charter in the ‘however the  
>> languages also have lots of differences…’ part. (Other than  
>> wordsmithing -  to try to make it clearer that these differences  
>> are not a problem for reaching a standard but are instead a good  
>> way to delineate the scope of provenance that seems to have settled  
>> down and be done in common ways versus the set of advanced features  
>> where researchers are still experimenting, trying to discover what  
>> aspects of provenance provide the most value – I don’t think I’ve  
>> seen other concrete technical suggestions for more scope)
>>
>> The last thing I see is continuing wordsmithing to make it clear  
>> that OPM is not the only (or first) provenance language while also  
>> acknowledging that the XG group found it useful as evidence for  
>> what aspects of provenance were ready for standardization. I  
>> suspect that we could continue to edit this aspect forever (if  
>> Yolanda let us) – it will be important that we all let go of the
>> text when we can live with it versus when we are really happy with it.
>> I’ve started and stopped editing a couple of times this week to try  
>> and come up with text that would move this aspect of things  
>> forward, but have not succeeded.
>>
>>   Jim
>>
>> From: public-xg-prov-request@w3.org [mailto:public-xg-prov-request@w3.org] On Behalf Of Luc Moreau
>> Sent: Wednesday, November 17, 2010 5:27 PM
>> To: Paulo Pinheiro da Silva
>> Cc: public-xg-prov@w3.org
>> Subject: Re: W3C Provenance Working Group Charter - another  
>> alternate version for discussion
>>
>>
>>
>> Paulo,
>> Thanks for editing the draft charter and sending it to the group.
>>
>> Discussions with Satya have indicated that the *Name of the
>> Provenance Language* will
>> be controversial. I suggest we don't focus on this issue, and we
>> acknowledge the XG will
>> identify its name. I agree with your proposal of naming it XG, or  
>> FOO, NPL or something neutral.
>>
>> However, all the feedback I have heard from people involved in  
>> standardization activities,
>> is that we have to have a clear scope. By indicating OPM, we meant  
>> not just a name, but a precise list
>> of provenance concepts.
>> To avoid any ambiguity, I attach this list of terms.  I will argue
>> that each term in this list has got
>> a fairly precise meaning. I also acknowledge that we can revisit
>> the terminology, if appropriate.
>>
>> Your proposal is however vague about its starting point. A quick  
>> grep over pml-p indicates:
>>
>>
>> grep 'owl:Class ' pml-provenance.owl  | wc
>>
>>       32      64    1466
>>
>> grep 'Property ' pml-provenance.owl  | grep -v onProperty | wc
>>
>>       52     104    3018
>>
>>
>>
>> Are you telling us the starting point is 80+ concepts?
>>
>> Your document also indicates "The Working Group has an aggressive
>> timetable based on the premise that it builds on existing work once
>> we have a clear understanding of the boundaries of the new model."
>> So, you are explicitly leaving the scoping activity to the XG.
>> I feel this is not the right approach. It is up to us to scope this  
>> model, in the charter definition.  TBL's suggestion was to list the  
>> terms to take into consideration!
>>
>> A few further points.
>> a. While I am in favour of a graphical notation to illustrate  
>> provenance concepts, I think it is dangerous to
>> promise a full graphical language. Experience in OPM is that beyond  
>> nodes and edges, the rest is very textual,
>> and overall is not very visual beyond toy examples.  So, by all  
>> means, graphical illustration, but not a full
>> graphical language.
>>
>> b. I am strongly in favour of a definition of a language in plain  
>> English, independently of any representation language.
>> It's part of the "accessibility agenda". We should be able to  
>> describe the provenance language without referring to an OWL  
>> ontology.
>>
>> c.  I am keen to reach out to the non semantic web community. What  
>> about XML?
>>
>> Cheers,
>> Luc
>>
>> PS I can't believe SC has connectivity problems ;-)
>>
>>
>>
>>
>> On 17/11/2010 21:43, Paulo Pinheiro da Silva wrote:
>> Dear All,
>>
>> Deborah and I had a discussion on Monday.  This discussion was in  
>> follow up to the meeting that Jim, Deborah, and I had at RPI two  
>> weeks ago and that was reported by Jim through an email to the  
>> group. I did an editing pass in the original draft of the charter  
>> on Monday and Deborah took an edit pass on top of that late Monday.  
>> The updated version of the draft attached here is in review mode so  
>> that you can see the rationale behind our changes (and hopefully  
>> comment on them further).
>>
>> We were hoping that Jim would be able to do an edit pass but he
>> has been very busy at Supercomputing 2010 and probably with
>> challenging connectivity. This means that the comments in this  
>> updated draft may not necessarily reflect Jim’s opinions.
>>
>> We understand that the document is going to spur some discussion  
>> but we would like to highlight some of the principles used during  
>> our conversation and that Deborah and I considered in our comments:
>>
>> We understand the following:
>> 1)    The provenance community needs to make progress soon if the  
>> community wants the outcomes of the proposed working group to have  
>> impact;
>> 2)    Provenance has many dimensions, and the group has a good
>> understanding of some dimensions while our collective understanding
>> of other dimensions is still very superficial – thus the working
>> group will need to focus its efforts on the well-known parts of
>> provenance – the so-called core concepts of provenance;
>> 3)    No single provenance language can claim to have  
>> representation mechanisms for all already-identified core  
>> provenance concepts and just core provenance concepts (i.e., no  
>> language is a minimal representation of core provenance concepts).  
>> However, we also understand that the provenance languages discussed  
>> in the Provenance Incubator Group have ways of representing most of  
>> these core concepts and that the proposed working group needs to  
>> leverage all such languages in order to make progress fast.
>>
>> Many thanks,
>> Paulo (Deborah and Jim)
>
> -- 
> Professor Luc Moreau
> Electronics and Computer Science   tel:   +44 23 8059 4487
> University of Southampton          fax:   +44 23 8059 2865
> Southampton SO17 1BJ               email: l.moreau@ecs.soton.ac.uk
> United Kingdom                     http://www.ecs.soton.ac.uk/~lavm



The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
Received on Friday, 19 November 2010 15:55:35 GMT
