- From: Daniel Garijo <dgarijo@delicias.dia.fi.upm.es>
- Date: Thu, 10 May 2012 12:08:39 +0200
- To: "Cresswell, Stephen" <stephen.cresswell@tso.co.uk>
- Cc: Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>, Paolo Ncl <Paolo.Missier@ncl.ac.uk>, Davide Ceolin <davide.ceolin@gmail.com>, public-prov-comments@w3.org
- Message-ID: <CAExK0DcrB8siKF6OmFacsO=19+GAJCUz9nfiMkY2y10SUVU2cw@mail.gmail.com>
+1

2012/5/10 Cresswell, Stephen <stephen.cresswell@tso.co.uk>

> I ran into exactly this problem with legislation workflows (this was
> with OPMV, but the problem occurs the same way in PROV), and after some
> discussion with Jun, adopted a solution similar to Stian's option (d).
> However, I don't think any of these workarounds are really satisfactory,
> and am hugely in favour of PROV letting us describe activities at
> different levels of granularity, and to state the relationship between
> the activities across levels. We should be able to infer that an entity
> generated by a fine-grained activity can also be seen as having been
> generated by its coarse-grained parent, rather than regarding that as
> inconsistent.
>
> Apart from anything else, this sort of abstraction seems very helpful to
> enable presentation of provenance information for human consumption in a
> way which doesn't immediately overwhelm with detail.
>
> Stephen Cresswell
>
> > -----Original Message-----
> > From: stian@mygrid.org.uk [mailto:stian@mygrid.org.uk] On Behalf Of Stian Soiland-Reyes
> > Sent: 10 May 2012 10:03
> > To: Paolo Ncl
> > Cc: Davide Ceolin; public-prov-comments@w3.org
> > Subject: Re: Activity composition
> >
> > I would also prefer a way to talk about activity composition and
> > entity composition.
> >
> > With Daniel and Khalid I earlier tried to reconcile how we could use
> > PROV to trace executions of nested scientific workflows.
> > Let's say we have a trace of the master workflow:
> >
> > wasGeneratedBy(value1, service1)
> > used(service2, value1)
> > wasGeneratedBy(value2, service2)
> > used(service3, value1)
> > used(service3, value2)
> > wasGeneratedBy(value3, service3)
> >
> > service2 is a nested workflow, so while service1 and 3 are black
> > boxes, we also know the details of the 'inner workings' of service2:
> >
> > wasStartedByActivity(service2a, service2)
> > wasStartedByActivity(service2b, service2)
> > used(service2a, value1)
> > wasGeneratedBy(internalValue, service2a)
> > used(service2b, value1)
> > used(service2b, internalValue)
> >
> > The additional usage of value1 should be fine, but does not convey
> > that it was given to service2b by service2.
> >
> > However we can't also state:
> >
> > wasGeneratedBy(value2, service2b)
> >
> > This is due to the functional constraint - this would make
> > service2b == service2
> >
> > Some current workarounds:
> >
> > a) Two entities, alternateOf
> >
> > wasGeneratedBy(value2Inside, service2b)
> > alternateOf(value2, value2Inside)
> > wasDerivedFrom(value2, value2Inside)
> >
> > I believe this is the cleanest solution. Here the derivation can be
> > thought of as "Moving value2 from inside to outside". I added the
> > derivation so that the existential link from value2Inside to value2 is
> > stated.
> >
> > To 'close' value2Inside we can add:
> >
> > wasInvalidatedBy(value2Inside, service2)
> >
> > b) Two entities, common specializationOf super-entity
> >
> > wasGeneratedBy(value2Outside, service2)
> > wasGeneratedBy(value2Inside, service2b)
> > specializationOf(value2Inside, value2)
> > specializationOf(value2Outside, value2)
> > wasDerivedFrom(value2Outside, value2Inside)
> >
> > The specialization here is basically 'Being inside' and 'Being
> > outside' - think of it as the entity being in a door opening or coming
> > out of a pipe. It would allow you to break down the 'transfer' as
> > well:
> >
> > specializationOf(value2InTransit, value2)
> > wasDerivedFrom(value2Outside, value2InTransit)
> >
> > "value2" here is the "actual", pure Platonic value, which does not
> > easily have a wasGeneratedBy. For computer internals it can be thought
> > of in terms of the abstract "The number 14" and "The bytes [20, 65,
> > 66, 67]" - for real world examples it is "The concept of the thing".
> >
> > c) Use different accounts
> >
> > Each account can have a different view of how value2 was created.
> > However, if you have many activities, iterations etc., you will get a
> > whole lot of accounts, and growing query and representational issues.
> > Merging of these accounts will be more of a challenge, as you would
> > have to use one of the other solutions suggested here.
> >
> > We also don't have a way to say "This account shows the inner workings
> > of this activity". (Or can we use PROV-AQ for that?
> > :activity1 prov:hasProvenance <activity1-provenance> )
> >
> > d) Drop outer wasGeneratedBy
> >
> > Removing
> > wasGeneratedBy(value2, service2)
> >
> > But then you have not just opened the lid of service2, you have
> > removed the casing. This approach will mean that service2 did not have
> > anything to do with value2.
> >
> > If we are unhappy about these kinds of approaches, then I think a good
> > solution would be to have a construct for service composition. Then we
> > can relax the wasGeneratedBy functional requirement, and say that the
> > activities are the same, or one of the activities contains the other,
> > which can be expressed as some kind of "partOf" relation stronger than
> > wasStartedBy (without implying any tokens).
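The relaxed constraint proposed in the thread can be illustrated with a minimal sketch. It is not PROV tooling: the part_of mapping stands in for the hypothetical "partOf" relation (which PROV does not define), and the function names ancestors and conflicts are assumptions made for this example. An entity is treated as consistent when all of its generating activities lie on a single containment chain.

```python
from collections import defaultdict

# Generation statements from the thread, written as (entity, activity).
generated_by = [
    ("value1", "service1"),
    ("value2", "service2"),
    ("value2", "service2b"),  # the statement the thread says cannot be added
    ("value3", "service3"),
]

# Hypothetical "partOf" relation (child -> parent); NOT part of PROV.
part_of = {"service2a": "service2", "service2b": "service2"}


def ancestors(activity, part_of):
    """Return the activity followed by all of its enclosing activities."""
    chain = [activity]
    # Stop at the root, and guard against accidental cycles.
    while chain[-1] in part_of and part_of[chain[-1]] not in chain:
        chain.append(part_of[chain[-1]])
    return chain


def conflicts(generated_by, part_of=None):
    """Map each entity to its generators when they are not on one partOf chain."""
    part_of = part_of or {}
    by_entity = defaultdict(set)
    for entity, activity in generated_by:
        by_entity[entity].add(activity)
    bad = {}
    for entity, acts in by_entity.items():
        # Consistent if one generator's chain of parents covers all the others.
        if not any(acts <= set(ancestors(a, part_of)) for a in acts):
            bad[entity] = sorted(acts)
    return bad


strict = conflicts(generated_by)            # no composition knowledge
relaxed = conflicts(generated_by, part_of)  # with the hypothetical partOf
```

Without the containment mapping, value2 is reported as having two unrelated generators; with it, service2b is recognised as part of service2 and no conflict is reported.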
> >
> > This will add complications, for instance if you have (e=entity,
> > a=activity, -> = generated/used):
> >
> > a1 -> e1 -> a2 -> e2
> >
> > and you also decompose a1 to:
> >
> > e0 -> a1a -> ex -> a1b -> ey -> a1c -> e2
> >
> > Now the question is where e0 came from - by composition, was it not
> > also used by a1? Can e0 also 'be part of a1' - an embedded entity,
> > like a part of the machine performing a1?
> >
> > (I think the opposite case is OK, if a1 consumes e0, but it is not seen
> > inside. This could just have been used for coordination purposes by
> > a1).
> >
> > However, I believe service composition is still easier to deal with
> > than a set of slightly unrelated 'mirror' entities at different
> > granularities; it's just a more detailed path of the same trace.
> >
> > I guess one question is if it is up to the asserter or the consumer of
> > the provenance trace to determine the granularity. The beauty of this
> > approach is that the consumer can mix and match: he can go into detail
> > for a2, but use the shortcut for a1. The asserter just says everything
> > he knows, including the inner workings where they are known, and outer
> > abstractions where they make sense.
> >
> > A different solution would be to have a stronger kind of alternateOf
> > that includes the derivation and 'passing' nature rather than any kind
> > of 'change' derivation. Thus we use two entities, but have a
> > PROV-specific way to say 'This is the same thing, but as generated by
> > a different activity at a different scale'.
> >
> > I believe that for almost all the examples we have, the activities
> > could also be expressed at a more granular level. For instance,
> > filling-petrol could be decomposed into opening-fuel-cap,
> > using-petrol-pump, closing-fuel-cap, paying.
> >
> > Is our stance that such decomposition must always be done through a
> > separate provenance account/graph?
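The "mix and match" idea above, where the consumer picks the granularity, can be sketched as follows. This is only an illustration under stated assumptions: the trace is held as (source, target) flow edges taken from the decomposed path in the thread, part_of is again a hypothetical containment mapping that PROV does not provide, and collapse is a name invented for this example.

```python
# Fine-grained trace from the thread, as flow edges (source -> target).
edges = [
    ("e0", "a1a"), ("a1a", "ex"), ("ex", "a1b"),
    ("a1b", "ey"), ("ey", "a1c"), ("a1c", "e2"),
]

# Hypothetical containment of sub-activities in a1; NOT part of PROV.
part_of = {"a1a": "a1", "a1b": "a1", "a1c": "a1"}


def collapse(edges, part_of, collapsed):
    """Show every activity in `collapsed` as a black box, hiding its internals."""
    def outer(node):
        parent = part_of.get(node)
        return parent if parent in collapsed else node

    # Replace each sub-activity by its parent when that parent is collapsed.
    rewritten = [(outer(s), outer(t)) for s, t in edges]

    # Record, per node, where its edges come from and go to.
    neighbours = {}
    for s, t in rewritten:
        neighbours.setdefault(s, {"in": set(), "out": set()})["out"].add(t)
        neighbours.setdefault(t, {"in": set(), "out": set()})["in"].add(s)

    # An entity is internal if it is produced and consumed only by one
    # and the same collapsed activity.
    internal = {
        n for n, nb in neighbours.items()
        if n not in collapsed
        and len(nb["in"]) == 1
        and nb["in"] == nb["out"]
        and nb["in"] <= collapsed
    }

    result = []
    for edge in rewritten:
        if edge[0] in internal or edge[1] in internal:
            continue  # hide edges that touch internal entities
        if edge not in result:
            result.append(edge)
    return result


coarse = collapse(edges, part_of, {"a1"})  # a1 shown as a black box
fine = collapse(edges, part_of, set())     # nothing collapsed
```

Collapsing a1 leaves only e0 -> a1 -> e2, while an empty set of collapsed activities returns the detailed path unchanged, so the asserter can publish everything and the consumer chooses the view.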
> >
> > On Wed, May 9, 2012 at 10:47 PM, Paolo Ncl <Paolo.Missier@ncl.ac.uk> wrote:
> > > Davide
> > >
> > > I guess it depends on how you define "part of" in this setting. You can
> > > specify that an activity has started another, which makes, informally, the
> > > former a "parent" of the latter. You can use this to model forking, for
> > > example. This is about the observed behavior of a process and is within
> > > scope. But there is no way to express structural containment, or
> > > composition, because describing process models and structure (for
> > > instance, the structure of a program, a workflow, a script etc.) is not
> > > within the PROV scope.
> > > I hope others in the group concur with this interpretation.
> > >
> > > Regards,
> > >
> > > P.Missier - paolo.missier@ncl.ac.uk
> > >
> > > On 7 May 2012, at 21:44, Davide Ceolin <davide.ceolin@gmail.com> wrote:
> > >
> > >> Hello,
> > >>
> > >> I am a PhD student at VU University Amsterdam, and I have a
> > >> question about the composition of activities in PROV. I noticed that it is
> > >> not possible to explicitly state that an activity is actually part of
> > >> another one.
> > >>
> > >> Suppose that a given entity is the result of an activity and, in turn,
> > >> this activity is part of a larger one.
> > >>
> > >> I can represent this scenario with two separate graphs stating that
> > >> each of the two activities generated the entity, and from them (and their
> > >> execution times, etc.) I may infer that one is part of the other one, but
> > >> I can't explicitly state that.
> > >>
> > >> Is there a specific reason for such a limitation?
> > >>
> > >> Thanks,
> > >>
> > >> Davide
> > >>
> > >> Davide Ceolin MSc.
> > >> PhD student
> > >> The Network Institute
> > >> VU University Amsterdam
> > >> d.ceolin@vu.nl
> > >> http://www.few.vu.nl/~dceolin/
> >
> > --
> > Stian Soiland-Reyes, myGrid team
> > School of Computer Science
> > The University of Manchester
Received on Thursday, 10 May 2012 10:09:19 UTC