- From: Jun Zhao <jun.zhao@zoo.ox.ac.uk>
- Date: Thu, 10 May 2012 15:39:10 +0100
- To: public-prov-comments@w3.org
Hi Stephen and all,

Yes, I remember this problem vividly! I think we should consider the
requirement seriously and provide an effective way to support the
community. James' proposal seems sensible to me!

-- Jun

On 10/05/2012 11:08, Daniel Garijo wrote:
> +1
>
> 2012/5/10 Cresswell, Stephen <stephen.cresswell@tso.co.uk>
>
>> I ran into exactly this problem with legislation workflows (this was
>> with OPMV, but the problem occurs the same way in PROV), and after some
>> discussion with Jun, adopted a solution similar to Stian's option (d).
>> However, I don't think any of these workarounds are really satisfactory,
>> and am hugely in favour of PROV letting us describe activities at
>> different levels of granularity, and to state the relationship between
>> the activities across levels. We should be able to infer that an entity
>> generated by a fine-grained activity can also be seen as having been
>> generated by its coarse-grained parent, rather than regarding that as
>> inconsistent.
>>
>> Apart from anything else, this sort of abstraction seems very helpful to
>> enable presentation of provenance information for human consumption in a
>> way which doesn't immediately overwhelm with detail.
>>
>> Stephen Cresswell
>>
>>> -----Original Message-----
>>> From: stian@mygrid.org.uk [mailto:stian@mygrid.org.uk] On Behalf Of
>>> Stian Soiland-Reyes
>>> Sent: 10 May 2012 10:03
>>> To: Paolo Ncl
>>> Cc: Davide Ceolin; public-prov-comments@w3.org
>>> Subject: Re: Activity composition
>>>
>>> I would also prefer a way to talk about activity composition and
>>> entity composition.
>>>
>>> With Daniel and Khalid I earlier tried to reconcile how we could use
>>> PROV to trace executions of nested scientific workflows.
>>> Let's say we have a trace of the master workflow:
>>>
>>> wasGeneratedBy(value1, service1)
>>> used(service2, value1)
>>> wasGeneratedBy(value2, service2)
>>> used(service3, value1)
>>> used(service3, value2)
>>> wasGeneratedBy(value3, service3)
>>>
>>> service2 is a nested workflow, so while service1 and service3 are
>>> black boxes, we also know the details of the 'inner workings' of
>>> service2:
>>>
>>> wasStartedByActivity(service2a, service2)
>>> wasStartedByActivity(service2b, service2)
>>> used(service2a, value1)
>>> wasGeneratedBy(internalValue, service2a)
>>> used(service2b, value1)
>>> used(service2b, internalValue)
>>>
>>> The additional usage of value1 should be fine, but does not convey
>>> that it was given to service2b by service2.
>>>
>>> However we can't also state:
>>>
>>> wasGeneratedBy(value2, service2b)
>>>
>>> This is due to the functional constraint - this would make
>>> service2b == service2
>>>
>>> Some current workarounds:
>>>
>>> a) Two entities, alternateOf
>>>
>>> wasGeneratedBy(value2Inside, service2b)
>>> alternateOf(value2, value2Inside)
>>> wasDerivedFrom(value2, value2Inside)
>>>
>>> I believe this is the cleanest solution. Here the derivation can be
>>> thought of as "Moving value2 from inside to outside". I added the
>>> derivation so that the existential link from value2Inside to value2 is
>>> stated.
>>>
>>> To 'close' value2Inside we can add:
>>>
>>> wasInvalidatedBy(value2Inside, service2)
>>>
>>> b) Two entities, common specializationOf super-entity
>>>
>>> wasGeneratedBy(value2Outside, service2)
>>> wasGeneratedBy(value2Inside, service2b)
>>> specializationOf(value2Inside, value2)
>>> specializationOf(value2Outside, value2)
>>> wasDerivedFrom(value2Outside, value2Inside)
>>>
>>> The specialization here is basically 'Being inside' and 'Being
>>> outside' - think of it as the entity being in a door opening or coming
>>> out of a pipe.
>>> It would allow you to break down the 'transfer' as well:
>>>
>>> specializationOf(value2InTransit, value2)
>>> wasDerivedFrom(value2Outside, value2InTransit)
>>>
>>> "value2" here is the "actual", pure Platonic value, which does not
>>> easily have a wasGeneratedBy. For computer internals it can be thought
>>> of in terms of the abstract "The number 14" and "The bytes [20, 65,
>>> 66, 67]" - for real world examples it is "The concept of the thing".
>>>
>>> c) Use different accounts
>>>
>>> Each account can have a different view of how value2 was created.
>>> However, if you have many activities, iterations etc, you will get a
>>> whole lot of accounts, and growing query and representational issues.
>>> Merging of these accounts will be more of a challenge, as you would
>>> have to use one of the other solutions suggested here.
>>>
>>> We also don't have a way to say "This account shows the inner workings
>>> of this activity". (or can we use PROV-AQ for that?
>>> :activity1 prov:hasProvenance <activity1-provenance> )
>>>
>>> d) Drop outer wasGeneratedBy
>>>
>>> Removing
>>> wasGeneratedBy(value2, service2)
>>>
>>> But then you have not just opened the lid of service2, you have
>>> removed the casing. This approach will mean that service2 did not have
>>> anything to do with value2.
>>>
>>> If we are unhappy about these kinds of approaches, then I think a good
>>> solution would be to have a construct for service composition. Then we
>>> can relax the wasGeneratedBy functional requirement, and say that the
>>> activities are the same, or one of the activities contains the other,
>>> which can be expressed as some kind of "partOf" relation stronger than
>>> wasStartedBy (without implying any tokens).
>>>
>>> This will add complications, for instance if you have (e = entity,
>>> a = activity, -> = generated/used):
>>>
>>> a1 -> e1 -> a2 -> e2
>>>
>>> and you also decompose a1 to:
>>>
>>> e0 -> a1a -> ex -> a1b -> ey -> a1c -> e2
>>>
>>> Now the question is where did e0 come from - was it by composition not
>>> also used by a1? Can e0 also 'be part of a1' - an embedded entity,
>>> like a part of the machine performing a1?
>>>
>>> (I think the opposite case is OK, if a1 consumes e0, but not seen
>>> inside. This could just have been used for coordination purposes by
>>> a1).
>>>
>>> However, I believe service composition is still easier to deal with
>>> than a set of slightly unrelated 'mirror' entities at different
>>> granularities, it's just a more detailed path of the same trace.
>>>
>>> I guess one question is if it is up to the asserter or the consumer of
>>> the provenance trace to determine the granularity. The beauty of this
>>> approach is that the consumer can mix and match, he can go into detail
>>> for a2, but use the shortcut for a1. The asserter just says everything
>>> he knows, including the inner workings where it is known, and outer
>>> abstractions where they make sense.
>>>
>>> A different solution would be to have a stronger kind of alternateOf
>>> that includes the derivation and 'passing' nature rather than any kind
>>> of 'change' derivation. Thus we use two entities, but have a
>>> PROV-specific way to say 'This is the same thing, but as generated by
>>> a different activity at a different scale'.
>>>
>>> I believe that for almost all the examples we have, the activities
>>> could also be expressed at a more granular level. For instance,
>>> filling-petrol could be decomposed into opening-fuel-cap,
>>> using-petrol-pump, closing-fuel-cap, paying.
>>>
>>> Is our stance that such decomposition must always be done through a
>>> separate provenance account/graph?
>>>
>>> On Wed, May 9, 2012 at 10:47 PM, Paolo Ncl <Paolo.Missier@ncl.ac.uk>
>>> wrote:
>>>> Davide
>>>>
>>>> I guess it depends on how you define "part of" in this setting. You
>>>> can specify that an activity has started another, which makes,
>>>> informally, the former a "parent" of the latter. You can use this to
>>>> model forking, for example. This is about the observed behavior of a
>>>> process and is within scope. But there is no way to express
>>>> structural containment, or composition, because describing process
>>>> models and structure (for instance, the structure of a program, a
>>>> workflow, a script etc.) is not within the PROV scope.
>>>>
>>>> I hope others in the group concur with this interpretation
>>>>
>>>> Regards,
>>>>
>>>> P.Missier - paolo.missier@ncl.ac.uk
>>>>
>>>> On 7 May 2012, at 21:44, Davide Ceolin <davide.ceolin@gmail.com>
>>>> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I am a PhD student of the VU University Amsterdam, and I have a
>>>>> question about the composition of activities in PROV. I noticed
>>>>> that it is not possible to explicitly state that an activity is
>>>>> actually part of another one.
>>>>>
>>>>> Suppose that a given entity is the result of an activity and, in
>>>>> turn, this activity is part of a larger one.
>>>>>
>>>>> I can represent this scenario with two separate graphs stating that
>>>>> each of the two activities generated the entity, and from them (and
>>>>> their execution times, etc.) I may infer that one is part of the
>>>>> other one, but I can't explicitly state that.
>>>>>
>>>>> Is there a specific reason for such a limitation?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Davide
>>>>>
>>>>> Davide Ceolin MSc.
>>>>> PhD student
>>>>> The Network Institute
>>>>> VU University Amsterdam
>>>>> d.ceolin@vu.nl
>>>>> http://www.few.vu.nl/~dceolin/
>>>
>>> --
>>> Stian Soiland-Reyes, myGrid team
>>> School of Computer Science
>>> The University of Manchester
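
For anyone who wants to play with the functional constraint discussed in
this thread, here is a minimal Python sketch. It is only an illustration
under assumptions: the function names (root, conflicts) and the part_of
mapping are made up for this example, and "partOf" is the hypothetical
composition relation floated in the thread, not PROV vocabulary. The
sketch reports entities that have more than one generating activity, and
shows that folding each activity into its outermost parent makes the
conflict on value2 disappear.

```python
from collections import defaultdict

# wasGeneratedBy(entity, activity) statements from the thread's example,
# including the inner-workflow statement that triggers the conflict.
generated_by = [
    ("value1", "service1"),
    ("value2", "service2"),
    ("value3", "service3"),
    ("value2", "service2b"),
]

# Hypothetical composition relation (NOT in PROV): child -> parent.
part_of = {"service2a": "service2", "service2b": "service2"}


def root(activity, part_of):
    """Follow part_of links up to the outermost enclosing activity."""
    seen = set()
    while activity in part_of and activity not in seen:
        seen.add(activity)  # guard against accidental cycles
        activity = part_of[activity]
    return activity


def conflicts(generated_by, part_of=None):
    """Return entities generated by more than one unrelated activity."""
    part_of = part_of or {}
    generators = defaultdict(set)
    for entity, activity in generated_by:
        generators[entity].add(root(activity, part_of))
    return {e: sorted(a) for e, a in generators.items() if len(a) > 1}


print(conflicts(generated_by))           # strict reading: value2 conflicts
print(conflicts(generated_by, part_of))  # with composition: no conflicts
```

Note that collapsing to the outermost parent is a deliberate
simplification: it would also accept two sibling sub-activities both
claiming to generate the same entity, which a real constraint would
probably want to reject.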
Received on Thursday, 10 May 2012 14:39:41 UTC