Re: Activity composition from Jun Zhao on 2012-05-10 (public-prov-comments@w3.org from May 2012)

From: Jun Zhao <jun.zhao@zoo.ox.ac.uk>
Date: Thu, 10 May 2012 15:39:10 +0100
To: public-prov-comments@w3.org
Message-ID: <4FABD30E.8060009@zoo.ox.ac.uk>
Hi Stephen and all,

Yes, I remember this problem vividly!

I think we should consider the requirement seriously and provide an 
effective way to support the community. James' proposal seems sensible 
to me!

-- Jun


On 10/05/2012 11:08, Daniel Garijo wrote:
> +1
>
> 2012/5/10 Cresswell, Stephen<stephen.cresswell@tso.co.uk>
>
>>
>> I ran into exactly this problem with legislation workflows (this was
>> with OPMV, but the problem occurs the same way in PROV), and after some
>> discussion with Jun, adopted a solution similar to Stian's option (d).
>> However, I don't think any of these workarounds are really satisfactory,
>> and am hugely in favour of PROV letting us describe activities at
>> different levels of granularity, and to state the relationship between
>> the activities across levels.  We should be able to infer that an entity
>> generated by a fine-grained activity can also be seen as having been
>> generated by its coarse-grained parent, rather than regarding that as
>> inconsistent.
>>
>> Apart from anything else, this sort of abstraction seems very helpful to
>> enable presentation of provenance information for human consumption in a
>> way which doesn't immediately overwhelm with detail.
>>
>> Stephen Cresswell
>>
>>> -----Original Message-----
>>> From: stian@mygrid.org.uk [mailto:stian@mygrid.org.uk] On Behalf Of
>>> Stian Soiland-Reyes
>>> Sent: 10 May 2012 10:03
>>> To: Paolo Ncl
>>> Cc: Davide Ceolin; public-prov-comments@w3.org
>>> Subject: Re: Activity composition
>>>
>>> I would also prefer a way to talk about activity composition and
>>> entity composition.
>>>
>>> With Daniel and Khalid I earlier tried to reconcile how we could use
>>> PROV to trace executions of nested scientific workflows. Let's say we
>>> have trace of the master workflow:
>>>
>>> wasGeneratedBy(value1, service1)
>>> used(service2, value1)
>>> wasGeneratedBy(value2, service2)
>>> used(service3, value1)
>>> used(service3, value2)
>>> wasGeneratedBy(value3, service3)
>>>
>>>
>>> service2 is a nested workflow, so while service1 and 3 are black
>>> boxes, we also know the details of the 'inner workings' of service2:
>>>
>>> wasStartedByActivity(service2a, service2)
>>> wasStartedByActivity(service2b, service2)
>>> used(service2a, value1)
>>> wasGeneratedBy(internalValue, service2a)
>>> used(service2b, value1)
>>> used(service2b, internalValue)
>>>
>>> The additional usage of value1 should be fine, but does not convey
>>> that it was given to service2b by service2.
>>>
>>>
>>> However we can't also state:
>>>
>>>    wasGeneratedBy(value2, service2b)
>>>
>>> This is due to the functional constraint - this would make
>>> service2b == service2
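
A minimal sketch of the conflict described above, in Python. It assumes a trace is simply a list of (entity, activity) generation pairs; the function name `generation_conflicts` and this representation are illustrative only and do not come from any PROV specification or library.

```python
from collections import defaultdict


def generation_conflicts(generations):
    """Return the entities that are claimed to be generated by more than one activity."""
    by_entity = defaultdict(set)
    for entity, activity in generations:
        by_entity[entity].add(activity)
    # Keep only entities with two or more distinct generating activities.
    return {e: sorted(acts) for e, acts in by_entity.items() if len(acts) > 1}


# Generation statements from the master workflow and the nested workflow,
# plus the extra statement that the functional constraint rules out.
trace = [
    ("value1", "service1"),
    ("value2", "service2"),
    ("value3", "service3"),
    ("internalValue", "service2a"),
    ("value2", "service2b"),
]

conflicts = generation_conflicts(trace)
```

Under a strict reading of the constraint, a checker like this would flag value2, because both service2 and service2b claim to have generated it.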
>>>
>>>
>>>
>>> Some current workarounds:
>>>
>>> a) Two entities, alternateOf
>>>
>>> wasGeneratedBy(value2Inside, service2b)
>>> alternateOf(value2, value2Inside)
>>> wasDerivedFrom(value2, value2Inside)
>>>
>>> I believe this is the cleanest solution. Here the derivation can be
>>> thought of as "Moving value2 from inside to outside". I added the
>>> derivation so that the existential link from value2Inside to value2 is
>>> stated.
>>>
>>> To 'close' value2Inside we can add:
>>>
>>> wasInvalidatedBy(value2Inside, service2)
>>>
>>>
>>>
>>> b) Two entities, common specializationOf  super-entity
>>>
>>> wasGeneratedBy(value2Outside, service2)
>>> wasGeneratedBy(value2Inside, service2b)
>>> specializationOf(value2Inside, value2)
>>> specializationOf(value2Outside, value2)
>>> wasDerivedFrom(value2Outside, value2Inside)
>>>
>>> The specialization here is basically 'Being inside' and 'Being
>>> outside' - think of it as the entity being in a door opening or coming
>>> out of a pipe. It would allow you to break down the 'transfer' as
>>> well:
>>>
>>> specializationOf(value2InTransit, value2)
>>> wasDerivedFrom(value2Outside, value2InTransit)
>>>
>>> "value2" here is the "actual", pure Platonic value, which does not
>>> easily have a wasGeneratedBy. For computer internals it can be thought
>>> of in terms of the abstract "The number 14" and "The bytes [20, 65,
>>> 66, 67]" - for real world examples it is "The concept of the thing".
>>>
>>>
>>>
>>> c) Use different accounts
>>>
>>> Each account can have a different view of how value2 was created.
>>> However, if you have many activities, iterations etc, you will get a
>>> whole lot of accounts, and growing query and representational issues.
>>> Merging of these accounts will be more of a challenge, as you would
>>> have to use one of the other solutions suggested here.
>>>
>>> We also don't have a way to say "This account shows the inner workings
>>> of this activity".  (or can we use PROV-AQ for that?
>>>    :activity1 prov:hasProvenance<activity1-provenance>     )
>>>
>>>
>>> d) Drop outer wasGeneratedBy
>>>
>>> Removing
>>>    wasGeneratedBy(value2, service2)
>>>
>>> But then you have not just opened the lid of service2, you have
>>> removed the casing. This approach will mean that service2 did not have
>>> anything to do with value2.
>>>
>>>
>>>
>>> If we are unhappy about these kinds of approaches, then I think a good
>>> solution would be to have a construct for service composition. Then we
>>> can relax the wasGeneratedBy functional requirement, and say that the
>>> activities are the same, or one of the activities contains the other,
>>> which can be expressed as some kind of "partOf" relation stronger than
>>> wasStartedBy (without implying any tokens).
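
One way the relaxed requirement could be checked is sketched below in Python. The "partOf" relation is the hypothetical one suggested above, not an existing PROV term; it is modelled here as a dict from each activity to its single containing activity, and the single parent is itself an assumption. The function names are illustrative.

```python
def ancestors(activity, part_of):
    """Yield the activity, then every activity that transitively contains it."""
    seen = set()
    while activity is not None and activity not in seen:
        seen.add(activity)
        yield activity
        activity = part_of.get(activity)


def compatible_generators(activities, part_of):
    """True if all activities lie on one containment chain (or are the same)."""
    acts = set(activities)
    # Some activity must have every other claimed generator among its ancestors.
    return any(acts <= set(ancestors(a, part_of)) for a in acts)


# Hypothetical partOf statements for the nested workflow.
part_of = {"service2a": "service2", "service2b": "service2"}

nested_ok = compatible_generators({"service2", "service2b"}, part_of)
siblings_ok = compatible_generators({"service2a", "service2b"}, part_of)
unrelated_ok = compatible_generators({"service2", "service3"}, part_of)
```

With this rule, value2 may be generated by both service2 and service2b, while two sibling or unrelated activities generating the same entity would still be treated as a conflict.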
>>>
>>> This will add complications, for instance if you have (e=entity,
>>> a=activity, ->= generated/used):
>>>
>>> a1 ->  e1 ->  a2 ->  e2
>>>
>>> and you also decompose a1 to:
>>>
>>> e0 ->  a1a ->  ex ->  a1b ->  ey ->  a1c ->  e2
>>>
>>>
>>> Now the question is where e0 came from: by composition, was it not
>>> also used by a1? Can e0 also 'be part of a1' - an embedded entity,
>>> like a part of the machine performing a1?
>>>
>>> (I think the opposite case is OK, if a1 consumes e0, but not seen
>>> inside. This could just have been used for coordination purposes by
>>> a1).
>>>
>>>
>>>
>>> However, I believe service composition is still easier to deal with
>>> than a set of slightly unrelated 'mirror' entities at different
>>> granularities, it's just a more detailed path of the same trace.
>>>
>>> I guess one question is if it is up to the asserter or the consumer of
>>> the provenance trace to determine the granularity. The beauty of this
>>> approach is that the consumer can mix and match: he can go into detail
>>> for a2, but use the shortcut for a1. The asserter just says everything
>>> he knows, including the inner workings where it is known, and outer
>>> abstractions where they make sense.
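
The consumer-side choice of granularity could look roughly like the Python sketch below. It assumes the asserter published fine-grained generation statements together with the hypothetical "partOf" links, modelled as an acyclic dict from activity to containing activity. The names `visible_activity`, `view` and `expand` are illustrative.

```python
def visible_activity(activity, part_of, expand):
    """Return the outermost containing activity that the consumer has not expanded."""
    chain = [activity]
    # Assumes part_of is acyclic; chain runs from fine-grained to outermost.
    while chain[-1] in part_of:
        chain.append(part_of[chain[-1]])
    for candidate in reversed(chain):
        if candidate not in expand:
            return candidate
    return activity


def view(generations, part_of, expand):
    """Rewrite generation statements to the granularity the consumer asked for."""
    return [(e, visible_activity(a, part_of, expand)) for e, a in generations]


part_of = {"service2a": "service2", "service2b": "service2"}
detailed_trace = [
    ("value1", "service1"),
    ("internalValue", "service2a"),
    ("value2", "service2b"),
]

coarse = view(detailed_trace, part_of, expand=set())
fine = view(detailed_trace, part_of, expand={"service2"})
```

In the coarse view internalValue is attributed to service2 as well; a consumer may prefer to hide such internal entities altogether, which this sketch does not attempt.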
>>>
>>>
>>>
>>> A different solution would be to have a stronger kind of alternateOf
>>> that includes the derivation and 'passing' nature rather than any kind
>>> of 'change' derivation. Thus we use two entities, but have a
>>> PROV-specific way to say 'This is the same thing, but as generated by
>>> a different activity at a different scale'.
>>>
>>>
>>> I believe that for almost all the examples we have, the activities
>>> could also be expressed at a more granular level. For instance,
>>> filling-petrol could be decomposed into opening-fuel-cap,
>>> using-petrol-pump, closing-fuel-cap, paying.
>>>
>>> Is our stance that such decomposition must always be done through a
>>> separate provenance account/graph?
>>>
>>>
>>> On Wed, May 9, 2012 at 10:47 PM, Paolo Ncl<Paolo.Missier@ncl.ac.uk>
>>> wrote:
>>>> Davide
>>>>
>>>> I guess it depends on how you define "part of" in this setting. You
>>>> can specify that an activity has started another, which makes,
>>>> informally, the former a "parent" of the latter. You can use this to
>>>> model forking, for example. This is about the observed behavior of a
>>>> process and is within scope. But there is no way to express
>>>> structural containment, or composition, because describing process
>>>> models and structure (for instance, the structure of a program, a
>>>> workflow, a script etc.) is not within the PROV scope.
>>>> I hope others in the group concur with this interpretation
>>>>
>>>> Regards,
>>>>
>>>> P.Missier - paolo.missier@ncl.ac.uk
>>>>
>>>> On 7 May 2012, at 21:44, Davide Ceolin<davide.ceolin@gmail.com> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I am a PhD student at the VU University Amsterdam, and I have a
>>>>> question about the composition of activities in PROV. I noticed
>>>>> that it is not possible to explicitly state that an activity is
>>>>> actually part of another one.
>>>>>
>>>>> Suppose that a given entity is the result of an activity and, in
>>>>> turn, this activity is part of a larger one.
>>>>>
>>>>> I can represent this scenario with two separate graphs stating that
>>>>> each of the two activities generated the entity, and from them (and
>>>>> their execution times, etc.) I may infer that one is part of the
>>>>> other one, but I can't explicitly state that.
>>>>>
>>>>> Is there a specific reason for such a limitation?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Davide
>>>>>
>>>>> Davide Ceolin MSc.
>>>>> PhD student
>>>>> The Network Institute
>>>>> VU University Amsterdam
>>>>> d.ceolin@vu.nl
>>>>> http://www.few.vu.nl/~dceolin/
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Stian Soiland-Reyes, myGrid team
>>> School of Computer Science
>>> The University of Manchester
>>>
>>>
>>>
>
Received on Thursday, 10 May 2012 14:39:41 UTC