- From: Luc Moreau <L.Moreau@ecs.soton.ac.uk>
- Date: Thu, 10 May 2012 15:58:49 +0100
- To: Jun Zhao <jun.zhao@zoo.ox.ac.uk>
- CC: public-prov-comments@w3.org
Hi Jun, James, Stephen, all,

We have reopened a discussion which I believe had reached a satisfactory
conclusion a long time ago.

1. prov-dm does not require wasGeneratedBy to be functional.

2. Some constraints exist and must be satisfied, in particular, that
   a given entity cannot be generated at two different times.

3. There was consensus not to introduce notions of activity nesting
   into PROV.

4. prov-dm WD3 made the functional nature of wasGeneratedBy an option
   (then called structurally well-formed provenance), which allowed
   further inference:
   http://www.w3.org/TR/2012/WD-prov-dm-20120202/#structural-constraints
   This has been somewhat lost in the latest incarnation of
   prov-constraints.

Luc

On 05/10/2012 03:39 PM, Jun Zhao wrote:
> Hi Stephen and all,
>
> Yes, I remember this problem vividly!
>
> I think we should consider the requirement seriously and provide an
> effective way to support the community. James' proposal seems sensible
> to me!
>
> -- Jun
>
>
> On 10/05/2012 11:08, Daniel Garijo wrote:
>> +1
>>
>> 2012/5/10 Cresswell, Stephen <stephen.cresswell@tso.co.uk>
>>
>>> I ran into exactly this problem with legislation workflows (this was
>>> with OPMV, but the problem occurs the same way in PROV), and after some
>>> discussion with Jun, adopted a solution similar to Stian's option (d).
>>> However, I don't think any of these workarounds are really satisfactory,
>>> and am hugely in favour of PROV letting us describe activities at
>>> different levels of granularity, and to state the relationship between
>>> the activities across levels. We should be able to infer that an entity
>>> generated by a fine-grained activity can also be seen as having been
>>> generated by its coarse-grained parent, rather than regarding that as
>>> inconsistent.
>>>
>>> Apart from anything else, this sort of abstraction seems very helpful to
>>> enable presentation of provenance information for human consumption in a
>>> way which doesn't immediately overwhelm with detail.
>>>
>>> Stephen Cresswell
>>>
>>>> -----Original Message-----
>>>> From: stian@mygrid.org.uk [mailto:stian@mygrid.org.uk] On Behalf Of
>>>> Stian Soiland-Reyes
>>>> Sent: 10 May 2012 10:03
>>>> To: Paolo Ncl
>>>> Cc: Davide Ceolin; public-prov-comments@w3.org
>>>> Subject: Re: Activity composition
>>>>
>>>> I would also prefer a way to talk about activity composition and
>>>> entity composition.
>>>>
>>>> With Daniel and Khalid I earlier tried to reconcile how we could use
>>>> PROV to trace executions of nested scientific workflows. Let's say we
>>>> have a trace of the master workflow:
>>>>
>>>> wasGeneratedBy(value1, service1)
>>>> used(service2, value1)
>>>> wasGeneratedBy(value2, service2)
>>>> used(service3, value1)
>>>> used(service3, value2)
>>>> wasGeneratedBy(value3, service3)
>>>>
>>>>
>>>> service2 is a nested workflow, so while service1 and service3 are black
>>>> boxes, we also know the details of the 'inner workings' of service2:
>>>>
>>>> wasStartedByActivity(service2a, service2)
>>>> wasStartedByActivity(service2b, service2)
>>>> used(service2a, value1)
>>>> wasGeneratedBy(internalValue, service2a)
>>>> used(service2b, value1)
>>>> used(service2b, internalValue)
>>>>
>>>> The additional usage of value1 should be fine, but does not convey
>>>> that it was given to service2b by service2.
>>>>
>>>>
>>>> However we can't also state:
>>>>
>>>> wasGeneratedBy(value2, service2b)
>>>>
>>>> This is due to the functional constraint - this would make
>>>> service2b == service2
>>>>
>>>>
>>>>
>>>> Some current workarounds:
>>>>
>>>> a) Two entities, alternateOf
>>>>
>>>> wasGeneratedBy(value2Inside, service2b)
>>>> alternateOf(value2, value2Inside)
>>>> wasDerivedFrom(value2, value2Inside)
>>>>
>>>> I believe this is the cleanest solution.
>>>> Here the derivation can be
>>>> thought of as "Moving value2 from inside to outside". I added the
>>>> derivation so that the existential link from value2Inside to value2 is
>>>> stated.
>>>>
>>>> To 'close' value2Inside we can add:
>>>>
>>>> wasInvalidatedBy(value2Inside, service2)
>>>>
>>>>
>>>>
>>>> b) Two entities, common specializationOf super-entity
>>>>
>>>> wasGeneratedBy(value2Outside, service2)
>>>> wasGeneratedBy(value2Inside, service2b)
>>>> specializationOf(value2Inside, value2)
>>>> specializationOf(value2Outside, value2)
>>>> wasDerivedFrom(value2Outside, value2Inside)
>>>>
>>>> The specialization here is basically 'Being inside' and 'Being
>>>> outside' - think of it as the entity being in a door opening or coming
>>>> out of a pipe. It would allow you to break down the 'transfer' as
>>>> well:
>>>>
>>>> specializationOf(value2InTransit, value2)
>>>> wasDerivedFrom(value2Outside, value2InTransit)
>>>>
>>>> "value2" here is the "actual", pure Platonic value, which does not
>>>> easily have a wasGeneratedBy. For computer internals it can be thought
>>>> of in terms of the abstract "The number 14" and "The bytes [20, 65,
>>>> 66, 67]" - for real world examples it is "The concept of the thing".
>>>>
>>>>
>>>>
>>>> c) Use different accounts
>>>>
>>>> Each account can have a different view of how value2 was created.
>>>> However, if you have many activities, iterations etc., you will get a
>>>> whole lot of accounts, and growing query and representational issues.
>>>> Merging of these accounts will be more of a challenge, as you would
>>>> have to use one of the other solutions suggested here.
>>>>
>>>> We also don't have a way to say "This account shows the inner workings
>>>> of this activity". (or can we use PROV-AQ for that?
>>>> :activity1 prov:hasProvenance <activity1-provenance> )
>>>>
>>>>
>>>> d) Drop outer wasGeneratedBy
>>>>
>>>> Removing
>>>> wasGeneratedBy(value2, service2)
>>>>
>>>> But then you have not just opened the lid of service2, you have
>>>> removed the casing. This approach will mean that service2 did not have
>>>> anything to do with value2.
>>>>
>>>>
>>>>
>>>> If we are unhappy about these kinds of approaches, then I think a good
>>>> solution would be to have a construct for service composition. Then we
>>>> can relax the wasGeneratedBy functional requirement, and say that the
>>>> activities are the same, or one of the activities contains the other,
>>>> which can be expressed as some kind of "partOf" relation stronger than
>>>> wasStartedBy (without implying any tokens).
>>>>
>>>> This will add complications, for instance if you have (e=entity,
>>>> a=activity, -> = generated/used):
>>>>
>>>> a1 -> e1 -> a2 -> e2
>>>>
>>>> and you also decompose a1 to:
>>>>
>>>> e0 -> a1a -> ex -> a1b -> ey -> a1c -> e2
>>>>
>>>>
>>>> Now the question is where did e0 come from - was it by composition not
>>>> also used by a1? Can e0 also 'be part of a1' - an embedded entity,
>>>> like a part of the machine performing a1?
>>>>
>>>> (I think the opposite case is OK, if a1 consumes e0, but not seen
>>>> inside. This could just have been used for coordination purposes by
>>>> a1).
>>>>
>>>>
>>>>
>>>> However, I believe service composition is still easier to deal with
>>>> than a set of slightly unrelated 'mirror' entities at different
>>>> granularities; it's just a more detailed path of the same trace.
>>>>
>>>> I guess one question is if it is up to the asserter or the consumer of
>>>> the provenance trace to determine the granularity. The beauty of this
>>>> approach is that the consumer can mix and match: he can go into detail
>>>> for a2, but use the shortcut for a1.
>>>> The asserter just says everything
>>>> he knows, including the inner workings where it is known, and outer
>>>> abstractions where they make sense.
>>>>
>>>>
>>>>
>>>> A different solution would be to have a stronger kind of alternateOf
>>>> that includes the derivation and 'passing' nature rather than any kind
>>>> of 'change' derivation. Thus we use two entities, but have a
>>>> PROV-specific way to say 'This is the same thing, but as generated by
>>>> a different activity at a different scale'.
>>>>
>>>>
>>>> I believe that for almost all the examples we have, the activities
>>>> could also be expressed at a more granular level. For instance,
>>>> filling-petrol could be decomposed into opening-fuel-cap,
>>>> using-petrol-pump, closing-fuel-cap, paying.
>>>>
>>>> Is our stance that such decomposition must always be done through a
>>>> separate provenance account/graph?
>>>>
>>>>
>>>> On Wed, May 9, 2012 at 10:47 PM, Paolo Ncl <Paolo.Missier@ncl.ac.uk>
>>>> wrote:
>>>>> Davide
>>>>>
>>>>> I guess it depends on how you define "part of" in this setting. You
>>>>> can specify that an activity has started another, which makes,
>>>>> informally, the former a "parent" of the latter. You can use this to
>>>>> model forking, for example. This is about the observed behavior of a
>>>>> process and is within scope. But there is no way to express
>>>>> structural containment, or composition, because describing process
>>>>> models and structure (for instance, the structure of a program, a
>>>>> workflow, a script etc.) is not within the PROV scope.
>>>>> I hope others in the group concur with this interpretation
>>>>>
>>>>> Regards,
>>>>>
>>>>> P.Missier - paolo.missier@ncl.ac.uk
>>>>>
>>>>> On 7 May 2012, at 21:44, Davide Ceolin <davide.ceolin@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I am a PhD student of the VU University Amsterdam, and I would have
>>>>>> a question about the composition of activities in PROV.
>>>>>> I noticed that it is not possible to explicitly state that an
>>>>>> activity is actually part of another one.
>>>>>>
>>>>>> Suppose that a given entity is the result of an activity and, in
>>>>>> turn, this activity is part of a larger one.
>>>>>>
>>>>>> I can represent this scenario with two separate graphs stating that
>>>>>> each of the two activities generated the entity, and from them (and
>>>>>> their execution times, etc.) I may infer that one is part of the
>>>>>> other one, but I can't explicitly state that.
>>>>>>
>>>>>> Is there a specific reason for such a limitation?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Davide
>>>>>>
>>>>>> Davide Ceolin MSc.
>>>>>> PhD student
>>>>>> The Network Institute
>>>>>> VU University Amsterdam
>>>>>> d.ceolin@vu.nl
>>>>>> http://www.few.vu.nl/~dceolin/
>>>>>>
>>>>
>>>> --
>>>> Stian Soiland-Reyes, myGrid team
>>>> School of Computer Science
>>>> The University of Manchester

--
Professor Luc Moreau
Electronics and Computer Science   tel:   +44 23 8059 4487
University of Southampton          fax:   +44 23 8059 2865
Southampton SO17 1BJ               email: l.moreau@ecs.soton.ac.uk
United Kingdom                     http://www.ecs.soton.ac.uk/~lavm
Received on Thursday, 10 May 2012 14:59:27 UTC
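
To make the constraint discussed in this thread concrete, here is a minimal Python sketch. It assumes the optional "functional" reading of wasGeneratedBy (at most one generating activity per entity) and checks Stian's trace against it. The `part_of` mapping is hypothetical: PROV defines no such relation, and it only stands in for the composition construct proposed above, populated here from the two wasStartedByActivity statements. All function names are illustrative.

```python
from collections import defaultdict

# (entity, activity) pairs, following wasGeneratedBy(entity, activity).
# The last pair is the inner-workflow statement that the thread says
# cannot be added under the functional constraint.
was_generated_by = [
    ("value1", "service1"),
    ("value2", "service2"),
    ("value3", "service3"),
    ("internalValue", "service2a"),
    ("value2", "service2b"),
]

# Hypothetical composition relation (child activity -> parent activity).
part_of = {"service2a": "service2", "service2b": "service2"}


def functional_violations(statements):
    """Return entities that have more than one generating activity."""
    generators = defaultdict(set)
    for entity, activity in statements:
        generators[entity].add(activity)
    return {e: sorted(a) for e, a in generators.items() if len(a) > 1}


def ancestors(activity, parents):
    """Return the chain of enclosing activities, innermost first."""
    chain = []
    while activity in parents:
        activity = parents[activity]
        if activity in chain:  # guard against cyclic input
            break
        chain.append(activity)
    return chain


def explained_by_composition(generators, parents):
    """True if all generators lie on one nesting chain under `parents`."""
    return any(
        set(generators) - {g} <= set(ancestors(g, parents))
        for g in generators
    )


violations = functional_violations(was_generated_by)
print(violations)  # {'value2': ['service2', 'service2b']}
for entity, generators in violations.items():
    print(entity, explained_by_composition(generators, part_of))  # value2 True
```

Under these assumptions, value2 is flagged as having two generators, but the pair is accepted once the nesting of service2b inside service2 is stated, which is the inference Stephen asks for.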