- From: Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>
- Date: Thu, 10 May 2012 10:03:16 +0100
- To: Paolo Ncl <Paolo.Missier@ncl.ac.uk>
- Cc: Davide Ceolin <davide.ceolin@gmail.com>, "public-prov-comments@w3.org" <public-prov-comments@w3.org>
I would also prefer a way to talk about activity composition and entity composition. With Daniel and Khalid I earlier tried to reconcile how we could use PROV to trace executions of nested scientific workflows.

Let's say we have a trace of the master workflow:

  wasGeneratedBy(value1, service1)
  used(service2, value1)
  wasGeneratedBy(value2, service2)
  used(service3, value1)
  used(service3, value2)
  wasGeneratedBy(value3, service3)

service2 is a nested workflow, so while service1 and service3 are black boxes, we also know the details of the 'inner workings' of service2:

  wasStartedByActivity(service2a, service2)
  wasStartedByActivity(service2b, service2)
  used(service2a, value1)
  wasGeneratedBy(internalValue, service2a)
  used(service2b, value1)
  used(service2b, internalValue)

The additional usage of value1 should be fine, but it does not convey that value1 was given to service2b by service2.

However, we can't also state:

  wasGeneratedBy(value2, service2b)

This is due to the functional constraint: an entity can have only one generating activity, so this would make service2b == service2.

Some current workarounds:

a) Two entities, alternateOf

  wasGeneratedBy(value2Inside, service2b)
  alternateOf(value2, value2Inside)
  wasDerivedFrom(value2, value2Inside)

I believe this is the cleanest solution. Here the derivation can be thought of as "Moving value2 from inside to outside". I added the derivation so that the existential link from value2Inside to value2 is stated. To 'close' value2Inside we can add:

  wasInvalidatedBy(value2Inside, service2)

b) Two entities, common specializationOf super-entity

  wasGeneratedBy(value2Outside, service2)
  wasGeneratedBy(value2Inside, service2b)
  specializationOf(value2Inside, value2)
  specializationOf(value2Outside, value2)
  wasDerivedFrom(value2Outside, value2Inside)

The specialization here is basically 'Being inside' and 'Being outside' - think of it as the entity being in a door opening or coming out of a pipe.
It would allow you to break down the 'transfer' as well:

  specializationOf(value2InTransit, value2)
  wasDerivedFrom(value2Outside, value2InTransit)

"value2" here is the "actual", pure Platonic value, which does not easily have a wasGeneratedBy. For computer internals it can be thought of in terms of the abstract "The number 14" and "The bytes [20, 65, 66, 67]" - for real world examples it is "The concept of the thing".

c) Use different accounts

Each account can have a different view of how value2 was created. However, if you have many activities, iterations etc., you will get a whole lot of accounts, and growing query and representational issues. Merging these accounts will be more of a challenge, as you would have to use one of the other solutions suggested here. We also don't have a way to say "This account shows the inner workings of this activity". (Or can we use PROV-AQ for that? :activity1 prov:hasProvenance <activity1-provenance> )

d) Drop outer wasGeneratedBy

Removing

  wasGeneratedBy(value2, service2)

But then you have not just opened the lid of service2, you have removed the casing. This approach will mean that service2 did not have anything to do with value2.

If we are unhappy about these kinds of approaches, then I think a good solution would be to have a construct for service composition. Then we can relax the wasGeneratedBy functional requirement, and say that the activities are the same, or that one of the activities contains the other, which can be expressed as some kind of "partOf" relation stronger than wasStartedBy (without implying any tokens).

This will add complications, for instance if you have (e=entity, a=activity, -> = generated/used):

  a1 -> e1 -> a2 -> e2

and you also decompose a1 to:

  e0 -> a1a -> ex -> a1b -> ey -> a1c -> e2

Now the question is where e0 came from: by composition, was it not also used by a1? Can e0 also 'be part of a1' - an embedded entity, like a part of the machine performing a1?
(I think the opposite case is OK, if a1 consumes e0 but e0 is not seen inside. It could just have been used for coordination purposes by a1.)

However, I believe service composition is still easier to deal with than a set of slightly unrelated 'mirror' entities at different granularities; it's just a more detailed path of the same trace.

I guess one question is whether it is up to the asserter or the consumer of the provenance trace to determine the granularity. The beauty of this approach is that the consumer can mix and match: he can go into detail for a2, but use the shortcut for a1. The asserter just says everything he knows, including the inner workings where they are known, and outer abstractions where they make sense.

A different solution would be to have a stronger kind of alternateOf that includes the derivation and 'passing' nature rather than any kind of 'change' derivation. Thus we use two entities, but have a PROV-specific way to say 'This is the same thing, but as generated by a different activity at a different scale'.

I believe that for almost all the examples we have, the activities could also be expressed at a more granular level. For instance, filling-petrol could be decomposed into opening-fuel-cap, using-petrol-pump, closing-fuel-cap, paying. Is our stance that such decomposition must always be done through a separate provenance account/graph?

On Wed, May 9, 2012 at 10:47 PM, Paolo Ncl <Paolo.Missier@ncl.ac.uk> wrote:
> Davide
>
> I guess it depends on how you define "part of" in this setting. You can specify that an activity has started another, which makes, informally, the former a "parent" of the latter. You can use this to model forking, for example. This is about the observed behavior of a process and is within scope. But there is no way to express structural containment, or composition, because describing process models and structure (for instance, the structure of a program, a workflow, a script etc.) is not within the PROV scope.
> I hope others in the group concur with this interpretation
>
> Regards,
>
> P.Missier - paolo.missier@ncl.ac.uk
>
> On 7 May 2012, at 21:44, Davide Ceolin <davide.ceolin@gmail.com> wrote:
>
>> Hello,
>>
>> I am a PhD student of the VU University Amsterdam, and I would have a question about the composition of activities in PROV. I noticed that it is not possible to explicitly state that an activity is actually part of another one.
>>
>> Suppose that a given entity is the result of an activity and, in turn, this activity is part of a larger one.
>>
>> I can represent this scenario with two separate graphs stating that each of the two activities generated the entity, and from them (and their execution times, etc.) I may infer that one is part of the other one, but I can't explicitly state that.
>>
>> Is there a specific reason for such a limitation?
>>
>> Thanks,
>>
>> Davide
>>
>> Davide Ceolin MSc.
>> PhD student
>> The Network Institute
>> VU University Amsterdam
>> d.ceolin@vu.nl
>> http://www.few.vu.nl/~dceolin/
>>
>

--
Stian Soiland-Reyes, myGrid team
School of Computer Science
The University of Manchester
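
The functional constraint on wasGeneratedBy discussed in the message above can be illustrated with a minimal Python sketch. It is not part of the original message and does not use any PROV library. The tuple encoding of statements and the helper name generation_conflicts are assumptions made only for this illustration. It checks that an entity has at most one generating activity, and shows that workaround a) avoids the conflict that the naive inner statement introduces.

```python
from collections import defaultdict

# Statements as (relation, first, second) tuples, mirroring the
# wasGeneratedBy(entity, activity) / used(activity, entity) notation above.
# This encoding is an assumption for illustration only.
master = [
    ("wasGeneratedBy", "value1", "service1"),
    ("used", "service2", "value1"),
    ("wasGeneratedBy", "value2", "service2"),
    ("used", "service3", "value1"),
    ("used", "service3", "value2"),
    ("wasGeneratedBy", "value3", "service3"),
]


def generation_conflicts(statements):
    """Map each entity generated by more than one activity to those activities."""
    generators = defaultdict(set)
    for relation, first, second in statements:
        if relation == "wasGeneratedBy":
            generators[first].add(second)
    return {
        entity: sorted(activities)
        for entity, activities in generators.items()
        if len(activities) > 1
    }


# Naive attempt: state that the inner activity also generated value2.
naive = master + [("wasGeneratedBy", "value2", "service2b")]

# Workaround a): a separate inner entity linked by alternateOf and derivation.
workaround_a = master + [
    ("wasGeneratedBy", "value2Inside", "service2b"),
    ("alternateOf", "value2", "value2Inside"),
    ("wasDerivedFrom", "value2", "value2Inside"),
]

print(generation_conflicts(naive))         # value2 has two generators
print(generation_conflicts(workaround_a))  # no conflicts
```

The check only counts generating activities per entity; it does not model the proposed "partOf" relation, under which service2 and service2b would both be allowed to generate value2.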
Received on Thursday, 10 May 2012 09:04:12 UTC