RE: Activity composition

I ran into exactly this problem with legislation workflows (this was
with OPMV, but the problem occurs the same way in PROV), and after some
discussion with Jun, adopted a solution similar to Stian's option (d).
However, I don't think any of these workarounds are really satisfactory,
and am hugely in favour of PROV letting us describe activities at
different levels of granularity, and to state the relationship between
the activities across levels.  We should be able to infer that an entity
generated by a fine-grained activity can also be seen as having been
generated by its coarse-grained parent, rather than regarding that as
inconsistent.
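
As a rough illustration of that inference, here is a minimal Python
sketch.  It assumes a hypothetical "part_of" mapping from a fine-grained
activity to its coarse-grained parent; no such relation exists in PROV,
and the function name and data layout are invented for the example.  The
activity and entity names are borrowed from Stian's trace quoted below.

```python
# Sketch only: "part_of" is a hypothetical stand-in for an activity-composition
# relation; it is not defined by PROV.

def infer_generations(generated_by, part_of):
    """Return the asserted (entity, activity) pairs plus those implied by composition.

    generated_by: set of (entity, activity) pairs as asserted.
    part_of: dict mapping a fine-grained activity to its coarse-grained parent.
    """
    inferred = set(generated_by)
    for entity, activity in generated_by:
        # Walk up the composition chain; each ancestor is also treated
        # as having generated the entity. "seen" guards against cycles.
        seen = {activity}
        parent = part_of.get(activity)
        while parent is not None and parent not in seen:
            inferred.add((entity, parent))
            seen.add(parent)
            parent = part_of.get(parent)
    return inferred


asserted = {("value2", "service2b"), ("internalValue", "service2a")}
composition = {"service2a": "service2", "service2b": "service2"}
result = infer_generations(asserted, composition)
```

Under these assumptions, value2 is reported as generated by both
service2b and service2, and the two statements are read as the same
generation seen at two levels rather than as a conflict.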

Apart from anything else, this sort of abstraction seems very helpful to
enable presentation of provenance information for human consumption in a
way which doesn't immediately overwhelm with detail.  
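
One way such a presentation could work is sketched below in Python.
Again "part_of" is a hypothetical composition mapping, not a PROV
relation, and the tuple layout (relation, activity, entity) and the
function name are assumptions made for the example.  The sketch lifts
every activity to its outermost parent and hides entities that are only
produced and consumed inside that parent.

```python
# Sketch only: "part_of" is a hypothetical stand-in for an activity-composition
# relation; it is not defined by PROV.

def collapse(statements, part_of):
    """Rewrite a trace at the coarsest level of granularity.

    statements: iterable of (relation, activity, entity) tuples, where
                relation is "used" or "generated".
    part_of: dict mapping a fine-grained activity to its parent.
    """
    def top(activity):
        # Follow part_of links to the outermost activity (cycle-safe).
        seen = set()
        while activity in part_of and activity not in seen:
            seen.add(activity)
            activity = part_of[activity]
        return activity

    lifted = {(rel, top(act), ent) for rel, act, ent in statements}

    # An entity is internal detail if the only activity that uses it is
    # the same coarse activity that generated it.
    hidden = set()
    for rel, act, ent in lifted:
        if rel == "generated":
            users = {a for r, a, e in lifted if r == "used" and e == ent}
            if users == {act}:
                hidden.add(ent)

    return sorted(s for s in lifted if s[2] not in hidden)


trace = [
    ("generated", "service1", "value1"),
    ("used", "service2a", "value1"),
    ("generated", "service2a", "internalValue"),
    ("used", "service2b", "value1"),
    ("used", "service2b", "internalValue"),
    ("generated", "service2b", "value2"),
    ("used", "service3", "value1"),
    ("used", "service3", "value2"),
    ("generated", "service3", "value3"),
]
view = collapse(trace, {"service2a": "service2", "service2b": "service2"})
```

With the fine-grained trace above, the collapsed view contains only
service1, service2 and service3, and internalValue no longer appears.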

Stephen Cresswell

> -----Original Message-----
> From: stian@mygrid.org.uk [mailto:stian@mygrid.org.uk] On Behalf Of
> Stian Soiland-Reyes
> Sent: 10 May 2012 10:03
> To: Paolo Ncl
> Cc: Davide Ceolin; public-prov-comments@w3.org
> Subject: Re: Activity composition
> 
> I would also prefer a way to talk about activity composition and
> entity composition.
> 
> With Daniel and Khalid I earlier tried to reconcile how we could use
> PROV to trace executions of nested scientific workflows. Let's say we
> have a trace of the master workflow:
> 
> wasGeneratedBy(value1, service1)
> used(service2, value1)
> wasGeneratedBy(value2, service2)
> used(service3, value1)
> used(service3, value2)
> wasGeneratedBy(value3, service3)
> 
> 
> service2 is a nested workflow, so while service1 and 3 are black
> boxes, we also know the details of the 'inner workings' of service2:
> 
> wasStartedByActivity(service2a, service2)
> wasStartedByActivity(service2b, service2)
> used(service2a, value1)
> wasGeneratedBy(internalValue, service2a)
> used(service2b, value1)
> used(service2b, internalValue)
> 
> The additional usage of value1 should be fine, but does not convey
> that it was given to service2b by service2.
> 
> 
> However we can't also state:
> 
>   wasGeneratedBy(value2, service2b)
> 
> This is due to the functional constraint - this would make service2b ==
> service2
> 
> 
> 
> Some current workarounds:
> 
> a) Two entities, alternateOf
> 
> wasGeneratedBy(value2Inside, service2b)
> alternateOf(value2, value2Inside)
> wasDerivedFrom(value2, value2Inside)
> 
> I believe this is the cleanest solution. Here the derivation can be
> thought of as "Moving value2 from inside to outside". I added the
> derivation so that the existential link from value2Inside to value2 is
> stated.
> 
> To 'close' value2Inside we can add:
> 
> wasInvalidatedBy(value2Inside, service2)
> 
> 
> 
> b) Two entities, common specializationOf super-entity
> 
> wasGeneratedBy(value2Outside, service2)
> wasGeneratedBy(value2Inside, service2b)
> specializationOf(value2Inside, value2)
> specializationOf(value2Outside, value2)
> wasDerivedFrom(value2Outside, value2Inside)
> 
> The specialization here is basically 'Being inside' and 'Being
> outside' - think of it as the entity being in a door opening or coming
> out of a pipe. It would allow you to break down the 'transfer' as
> well:
> 
> specializationOf(value2InTransit, value2)
> wasDerivedFrom(value2Outside, value2InTransit)
> 
> "value2" here is the "actual", pure Platonic value, which does not
> easily have a wasGeneratedBy. For computer internals it can be thought
> of in terms of the abstract "The number 14" and "The bytes [20, 65,
> 66, 67]" - for real world examples it is "The concept of the thing".
> 
> 
> 
> c) Use different accounts
> 
> Each account can have a different view of how value2 was created.
> However, if you have many activities, iterations etc, you will get a
> whole lot of accounts, and growing query and representational issues.
> Merging of these accounts will be more of a challenge, as you would
> have to use one of the other solutions suggested here.
> 
> We also don't have a way to say "This account shows the inner workings
> of this activity".  (or can we use PROV-AQ for that?
>   :activity1 prov:hasProvenance <activity1-provenance>    )
> 
> 
> d) Drop outer wasGeneratedBy
> 
> Removing
>   wasGeneratedBy(value2, service2)
> 
> But then you have not just opened the lid of service2, you have
> removed the casing. This approach will mean that service2 did not have
> anything to do with value2.
> 
> 
> 
> If we are unhappy about these kinds of approaches, then I think a good
> solution would be to have a construct for service composition. Then we
> can relax the wasGeneratedBy functional requirement, and say that the
> activities are the same, or one of the activities contains the other,
> which can be expressed as some kind of "partOf" relation stronger than
> wasStartedBy (without implying any tokens).
> 
> This will add complications, for instance if you have (e=entity,
> a=activity, ->= generated/used):
> 
> a1 -> e1 -> a2 -> e2
> 
> and you also decompose a1 to:
> 
> e0 -> a1a -> ex -> a1b -> ey -> a1c -> e2
> 
> 
> Now the question is where e0 came from: by composition, was it not
> also used by a1? Can e0 also 'be part of a1' - an embedded entity,
> like a part of the machine performing a1?
> 
> (I think the opposite case is OK, if a1 consumes e0 but e0 is not seen
> inside. This could just have been used for coordination purposes by
> a1).
> 
> 
> 
> However, I believe service composition is still easier to deal with
> than a set of slightly unrelated 'mirror' entities at different
> granularities; it is just a more detailed path of the same trace.
> 
> I guess one question is if it is up to the asserter or the consumer of
> the provenance trace to determine the granularity. The beauty of this
> approach is that the consumer can mix and match: he can go into detail
> for a2, but use the shortcut for a1. The asserter just says everything
> he knows, including the inner workings where it is known, and outer
> abstractions where they make sense.
> 
> 
> 
> A different solution would be to have a stronger kind of alternateOf
> that includes the derivation and 'passing' nature rather than any kind
> of 'change' derivation. Thus we use two entities, but have a
> PROV-specific way to say 'This is the same thing, but as generated by
> a different activity at a different scale'.
> 
> 
> I believe that for almost all the examples we have, the activities
> could also be expressed at a more granular level. For instance,
> filling-petrol could be decomposed into opening-fuel-cap,
> using-petrol-pump, closing-fuel-cap, paying.
> 
> Is our stance that such decomposition must always be done through a
> separate provenance account/graph?
> 
> 
> On Wed, May 9, 2012 at 10:47 PM, Paolo Ncl <Paolo.Missier@ncl.ac.uk> wrote:
> > Davide
> >
> > I guess it depends on how you define "part of" in this setting. You can
> > specify that an activity has started another, which makes, informally,
> > the former a "parent" of the latter. You can use this to model forking,
> > for example. This is about the observed behavior of a process and is
> > within scope. But there is no way to express structural containment, or
> > composition, because describing process models and structure (for
> > instance, the structure of a program, a workflow, a script etc.) is not
> > within the PROV scope.
> > I hope others in the group concur with this interpretation.
> >
> > Regards,
> >
> > P.Missier - paolo.missier@ncl.ac.uk
> >
> > On 7 May 2012, at 21:44, Davide Ceolin <davide.ceolin@gmail.com> wrote:
> >
> >> Hello,
> >>
> >> I am a PhD student at VU University Amsterdam, and I have a question
> >> about the composition of activities in PROV. I noticed that it is not
> >> possible to explicitly state that an activity is actually part of
> >> another one.
> >>
> >> Suppose that a given entity is the result of an activity and, in turn,
> >> this activity is part of a larger one.
> >>
> >> I can represent this scenario with two separate graphs stating that
> >> each of the two activities generated the entity, and from them (and
> >> their execution times, etc.) I may infer that one is part of the other
> >> one, but I can't explicitly state that.
> >>
> >> Is there a specific reason for such a limitation?
> >>
> >> Thanks,
> >>
> >> Davide
> >>
> >> Davide Ceolin MSc.
> >> PhD student
> >> The Network Institute
> >> VU University Amsterdam
> >> d.ceolin@vu.nl
> >> http://www.few.vu.nl/~dceolin/
> >>
> >>
> >>
> >
> 
> 
> 
> --
> Stian Soiland-Reyes, myGrid team
> School of Computer Science
> The University of Manchester
> 
> 

Received on Thursday, 10 May 2012 10:06:42 UTC