Re: Activity composition

Hi Jun, James, Stephen, all,


We have reopened a discussion which I believe had reached a
satisfactory conclusion a long time ago.

1. prov-dm does not require wasGeneratedBy to be functional.

2. Some constraints exist and must be satisfied, in particular,
    that a given entity cannot be generated at two different times.

3. There was consensus not to introduce notions of activity nesting
    into PROV.

4. prov-dm WD3 made the functional nature of wasGeneratedBy an option
    (then called structurally well-formed provenance),
    http://www.w3.org/TR/2012/WD-prov-dm-20120202/#structural-constraints
    which allowed further inference.
    This has been somewhat lost in the latest incarnation of
    prov-constraints.
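
To make points 1 and 2 concrete, here is a minimal Python sketch. It is
not part of prov-dm or prov-constraints: the (entity, activity, time)
tuple layout and both function names are assumptions made purely for
illustration. It accepts two generating activities for one entity, but
flags an entity asserted to have been generated at two different times.

```python
from collections import defaultdict


def generation_time_conflicts(generations):
    """Return entities asserted to be generated at more than one distinct time.

    generations: iterable of (entity, activity, time) tuples; this layout
    is an assumption for illustration, not a PROV serialization.
    """
    times = defaultdict(set)
    for entity, _activity, time in generations:
        if time is not None:
            times[entity].add(time)
    return {e: sorted(ts) for e, ts in times.items() if len(ts) > 1}


def is_functional(generations):
    """True if every entity has at most one generating activity.

    This is the optional, stricter reading (point 4), not a requirement.
    """
    activities = defaultdict(set)
    for entity, activity, _time in generations:
        activities[entity].add(activity)
    return all(len(acts) <= 1 for acts in activities.values())


# Two activities, one generation time: allowed under points 1 and 2.
trace = [
    ("value2", "service2", "2012-05-10T10:00:00"),
    ("value2", "service2b", "2012-05-10T10:00:00"),
]
conflicts = generation_time_conflicts(trace)
functional = is_functional(trace)
```

With this trace the time check finds no conflict, while the functional
check fails, which is exactly the gap between the required constraint
and the optional one.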

Luc



On 05/10/2012 03:39 PM, Jun Zhao wrote:
> Hi Stephen and all,
>
> Yes, I remember this problem vividly!
>
> I think we should consider the requirement seriously and provide an 
> effective way to support the community. James' proposal seems sensible 
> to me!
>
> -- Jun
>
>
> On 10/05/2012 11:08, Daniel Garijo wrote:
>> +1
>>
>> 2012/5/10 Cresswell, Stephen<stephen.cresswell@tso.co.uk>
>>
>>>
>>> I ran into exactly this problem with legislation workflows (this was
>>> with OPMV, but the problem occurs the same way in PROV), and after some
>>> discussion with Jun, adopted a solution similar to Stian's option (d).
>>> However, I don't think any of these workarounds is really satisfactory,
>>> and I am hugely in favour of PROV letting us describe activities at
>>> different levels of granularity, and state the relationship between
>>> the activities across levels.  We should be able to infer that an
>>> entity generated by a fine-grained activity can also be seen as having
>>> been generated by its coarse-grained parent, rather than regarding that
>>> as inconsistent.
>>>
>>> Apart from anything else, this sort of abstraction seems very helpful
>>> to enable presentation of provenance information for human consumption
>>> in a way which doesn't immediately overwhelm with detail.
>>>
>>> Stephen Cresswell
>>>
>>>> -----Original Message-----
>>>> From: stian@mygrid.org.uk [mailto:stian@mygrid.org.uk] On Behalf Of
>>>> Stian Soiland-Reyes
>>>> Sent: 10 May 2012 10:03
>>>> To: Paolo Ncl
>>>> Cc: Davide Ceolin; public-prov-comments@w3.org
>>>> Subject: Re: Activity composition
>>>>
>>>> I would also prefer a way to talk about activity composition and
>>>> entity composition.
>>>>
>>>> With Daniel and Khalid I earlier tried to reconcile how we could use
>>>> PROV to trace executions of nested scientific workflows. Let's say we
>>>> have a trace of the master workflow:
>>>>
>>>> wasGeneratedBy(value1, service1)
>>>> used(service2, value1)
>>>> wasGeneratedBy(value2, service2)
>>>> used(service3, value1)
>>>> used(service3, value2)
>>>> wasGeneratedBy(value3, service3)
>>>>
>>>>
>>>> service2 is a nested workflow, so while service1 and 3 are black
>>>> boxes, we also know the details of the 'inner workings' of service2:
>>>>
>>>> wasStartedByActivity(service2a, service2)
>>>> wasStartedByActivity(service2b, service2)
>>>> used(service2a, value1)
>>>> wasGeneratedBy(internalValue, service2a)
>>>> used(service2b, value1)
>>>> used(service2b, internalValue)
>>>>
>>>> The additional usage of value1 should be fine, but does not convey
>>>> that it was given to service2b by service2.
>>>>
>>>>
>>>> However we can't also state:
>>>>
>>>>    wasGeneratedBy(value2, service2b)
>>>>
>>>> This is due to the functional constraint - this would make
>>>> service2b == service2
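
A minimal Python sketch of why the functional reading forces
service2b == service2. This is an illustration only: the function name
and the (entity, activity) pair layout are assumptions, not PROV
vocabulary. If each entity may have only one generating activity, two
generations of value2 leave no option but to identify the two
activities.

```python
def merge_under_functional_constraint(generations):
    """Group activities that a functional wasGeneratedBy forces to be equal.

    generations: iterable of (entity, activity) pairs.
    Returns the list of activity sets with more than one member.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_generator = {}
    for entity, activity in generations:
        find(activity)  # register the activity
        if entity in first_generator:
            # Same entity, second generator: the activities must coincide.
            parent[find(activity)] = find(first_generator[entity])
        else:
            first_generator[entity] = activity

    groups = {}
    for activity in parent:
        groups.setdefault(find(activity), set()).add(activity)
    return [g for g in groups.values() if len(g) > 1]


forced_equal = merge_under_functional_constraint([
    ("value1", "service1"),
    ("value2", "service2"),
    ("value2", "service2b"),
])
```

Here forced_equal holds a single group containing service2 and
service2b, i.e. the inner step and the whole nested workflow collapse
into one activity, which is clearly not what the trace means.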
>>>>
>>>>
>>>>
>>>> Some current workarounds:
>>>>
>>>> a) Two entities, alternateOf
>>>>
>>>> wasGeneratedBy(value2Inside, service2b)
>>>> alternateOf(value2, value2Inside)
>>>> wasDerivedFrom(value2, value2Inside)
>>>>
>>>> I believe this is the cleanest solution. Here the derivation can be
>>>> thought of as "Moving value2 from inside to outside". I added the
>>>> derivation so that the existential link from value2Inside to value2 is
>>>> stated.
>>>>
>>>> To 'close' value2Inside we can add:
>>>>
>>>> wasInvalidatedBy(value2Inside, service2)
>>>>
>>>>
>>>>
>>>> b) Two entities, common specializationOf  super-entity
>>>>
>>>> wasGeneratedBy(value2Outside, service2)
>>>> wasGeneratedBy(value2Inside, service2b)
>>>> specializationOf(value2Inside, value2)
>>>> specializationOf(value2Outside, value2)
>>>> wasDerivedFrom(value2Outside, value2Inside)
>>>>
>>>> The specialization here is basically 'Being inside' and 'Being
>>>> outside' - think of it as the entity being in a door opening or coming
>>>> out of a pipe. It would allow you to break down the 'transfer' as
>>>> well:
>>>>
>>>> specializationOf(value2InTransit, value2)
>>>> wasDerivedFrom(value2Outside, value2InTransit)
>>>>
>>>> "value2" here is the "actual", pure Platonic value, which does not
>>>> easily have a wasGeneratedBy. For computer internals it can be thought
>>>> of in terms of the abstract "The number 14" and "The bytes [20, 65,
>>>> 66, 67]" - for real world examples it is "The concept of the thing".
>>>>
>>>>
>>>>
>>>> c) Use different accounts
>>>>
>>>> Each account can have a different view of how value2 was created.
>>>> However, if you have many activities, iterations etc, you will get a
>>>> whole lot of accounts, and growing query and representational issues.
>>>> Merging of these accounts will be more of a challenge, as you would
>>>> have to use one of the other solutions suggested here.
>>>>
>>>> We also don't have a way to say "This account shows the inner workings
>>>> of this activity".  (or can we use PROV-AQ for that?
>>>>    :activity1 prov:hasProvenance<activity1-provenance>     )
>>>>
>>>>
>>>> d) Drop outer wasGeneratedBy
>>>>
>>>> Removing
>>>>    wasGeneratedBy(value2, service2)
>>>>
>>>> But then you have not just opened the lid of service2, you have
>>>> removed the casing. This approach will mean that service2 did not have
>>>> anything to do with value2.
>>>>
>>>>
>>>>
>>>> If we are unhappy about these kinds of approaches, then I think a good
>>>> solution would be to have a construct for service composition. Then we
>>>> can relax the wasGeneratedBy functional requirement, and say that the
>>>> activities are the same, or that one of the activities contains the
>>>> other, which can be expressed as some kind of "partOf" relation
>>>> stronger than wasStartedBy (without implying any tokens).
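
As a rough sketch of what such a relation could buy us, assume a
hypothetical partOf mapping from an activity to its containing
activity. PROV defines no such relation, and the names below are made
up for illustration. A generation by an inner activity can then be
propagated to every enclosing activity instead of being treated as a
conflict.

```python
def infer_coarse_generations(generations, part_of):
    """Propagate generations from inner activities to their containers.

    generations: set of (entity, activity) pairs.
    part_of: dict mapping an activity to the activity that contains it
             (a hypothetical relation, not part of PROV).
    Returns the original pairs plus every inferred coarse-grained pair.
    """
    inferred = set(generations)
    for entity, activity in generations:
        seen = set()  # guards against an accidental cycle in part_of
        current = activity
        while current in part_of and current not in seen:
            seen.add(current)
            current = part_of[current]
            inferred.add((entity, current))
    return inferred


fine_grained = {("value2", "service2b")}
containment = {"service2b": "service2"}
all_generations = infer_coarse_generations(fine_grained, containment)
```

In this sketch wasGeneratedBy(value2, service2) falls out of
wasGeneratedBy(value2, service2b) plus the containment, so the asserter
only has to state the fine-grained fact once.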
>>>>
>>>> This will add complications, for instance if you have (e=entity,
>>>> a=activity, ->= generated/used):
>>>>
>>>> a1 ->  e1 ->  a2 ->  e2
>>>>
>>>> and you also decompose a1 to:
>>>>
>>>> e0 ->  a1a ->  ex ->  a1b ->  ey ->  a1c ->  e2
>>>>
>>>>
>>>> Now the question is where e0 came from: by composition, was it not
>>>> also used by a1? Can e0 also 'be part of a1' - an embedded entity,
>>>> like a part of the machine performing a1?
>>>>
>>>> (I think the opposite case is OK: a1 consumes e0, but e0 is not seen
>>>> inside. It could just have been used for coordination purposes by
>>>> a1.)
>>>>
>>>>
>>>>
>>>> However, I believe service composition is still easier to deal with
>>>> than a set of slightly unrelated 'mirror' entities at different
>>>> granularities; it's just a more detailed path of the same trace.
>>>>
>>>> I guess one question is if it is up to the asserter or the consumer of
>>>> the provenance trace to determine the granularity. The beauty of this
>>>> approach is that the consumer can mix and match, he can go into detail
>>>> for a2, but use the shortcut for a1. The asserter just says everything
>>>> he knows, including the inner workings where it is known, and outer
>>>> abstractions where they make sense.
>>>>
>>>>
>>>>
>>>> A different solution would be to have a stronger kind of alternateOf
>>>> that includes the derivation and 'passing' nature rather than any kind
>>>> of 'change' derivation. Thus we use two entities, but have a
>>>> PROV-specific way to say 'This is the same thing, but as generated by
>>>> a different activity at a different scale'.
>>>>
>>>>
>>>> I believe that for almost all the examples we have, the activities
>>>> could also be expressed at a more granular level. For instance,
>>>> filling-petrol could be decomposed into opening-fuel-cap,
>>>> using-petrol-pump, closing-fuel-cap, paying.
>>>>
>>>> Is our stance that such decomposition must always be done through a
>>>> separate provenance account/graph?
>>>>
>>>>
>>>> On Wed, May 9, 2012 at 10:47 PM, Paolo Ncl<Paolo.Missier@ncl.ac.uk>
>>>> wrote:
>>>>> Davide
>>>>>
>>>>> I guess it depends on how you define "part of" in this setting. You
>>>>> can specify that an activity has started another, which makes,
>>>>> informally, the former a "parent" of the latter. You can use this to
>>>>> model forking, for example. This is about the observed behavior of a
>>>>> process and is within scope. But there is no way to express structural
>>>>> containment, or composition, because describing process models and
>>>>> structure (for instance, the structure of a program, a workflow, a
>>>>> script etc.) is not within the PROV scope.
>>>>> I hope others in the group concur with this interpretation
>>>>>
>>>>> Regards,
>>>>>
>>>>> P.Missier - paolo.missier@ncl.ac.uk
>>>>>
>>>>> On 7 May 2012, at 21:44, Davide Ceolin<davide.ceolin@gmail.com> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I am a PhD student at VU University Amsterdam, and I have a
>>>>>> question about the composition of activities in PROV. I noticed that
>>>>>> it is not possible to explicitly state that an activity is actually
>>>>>> part of another one.
>>>>>>
>>>>>> Suppose that a given entity is the result of an activity and, in
>>>>>> turn, this activity is part of a larger one.
>>>>>>
>>>>>> I can represent this scenario with two separate graphs stating that
>>>>>> each of the two activities generated the entity, and from them (and
>>>>>> their execution times, etc.) I may infer that one is part of the
>>>>>> other one, but I can't explicitly state that.
>>>>>>
>>>>>> Is there a specific reason for such a limitation?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Davide
>>>>>>
>>>>>> Davide Ceolin MSc.
>>>>>> PhD student
>>>>>> The Network Institute
>>>>>> VU University Amsterdam
>>>>>> d.ceolin@vu.nl
>>>>>> http://www.few.vu.nl/~dceolin/
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> -- 
>>>> Stian Soiland-Reyes, myGrid team
>>>> School of Computer Science
>>>> The University of Manchester
>>>>
>>>>
>>>>
>>>
>>
>
>

-- 
Professor Luc Moreau
Electronics and Computer Science   tel:   +44 23 8059 4487
University of Southampton          fax:   +44 23 8059 2865
Southampton SO17 1BJ               email: l.moreau@ecs.soton.ac.uk
United Kingdom                     http://www.ecs.soton.ac.uk/~lavm

Received on Thursday, 10 May 2012 14:59:27 UTC