Re: Activity composition from Daniel Garijo on 2012-05-10 (public-prov-comments@w3.org from May 2012)

From: Daniel Garijo <dgarijo@delicias.dia.fi.upm.es>
Date: Thu, 10 May 2012 12:08:39 +0200
To: "Cresswell, Stephen" <stephen.cresswell@tso.co.uk>
Cc: Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>, Paolo Ncl <Paolo.Missier@ncl.ac.uk>, Davide Ceolin <davide.ceolin@gmail.com>, public-prov-comments@w3.org
Message-ID: <CAExK0DcrB8siKF6OmFacsO=19+GAJCUz9nfiMkY2y10SUVU2cw@mail.gmail.com>
+1

2012/5/10 Cresswell, Stephen <stephen.cresswell@tso.co.uk>

>
> I ran into exactly this problem with legislation workflows (this was
> with OPMV, but the problem occurs the same way in PROV), and after some
> discussion with Jun, adopted a solution similar to Stian's option (d).
> However, I don't think any of these workarounds are really satisfactory,
> and am hugely in favour of PROV letting us describe activities at
> different levels of granularity, and to state the relationship between
> the activities across levels.  We should be able to infer that an entity
> generated by a fine-grained activity can also be seen as having been
> generated by its coarse-grained parent, rather than regarding that as
> inconsistent.
>
> Apart from anything else, this sort of abstraction seems very helpful to
> enable presentation of provenance information for human consumption in a
> way which doesn't immediately overwhelm with detail.
>
> Stephen Cresswell
>
> > -----Original Message-----
> > From: stian@mygrid.org.uk [mailto:stian@mygrid.org.uk] On Behalf Of
> > Stian Soiland-Reyes
> > Sent: 10 May 2012 10:03
> > To: Paolo Ncl
> > Cc: Davide Ceolin; public-prov-comments@w3.org
> > Subject: Re: Activity composition
> >
> > I would also prefer a way to talk about activity composition and
> > entity composition.
> >
> > With Daniel and Khalid I earlier tried to reconcile how we could use
> > PROV to trace executions of nested scientific workflows. Let's say we
> > have a trace of the master workflow:
> >
> > wasGeneratedBy(value1, service1)
> > used(service2, value1)
> > wasGeneratedBy(value2, service2)
> > used(service3, value1)
> > used(service3, value2)
> > wasGeneratedBy(value3, service3)
> >
> >
> > service2 is a nested workflow, so while service1 and 3 are black
> > boxes, we also know the details of the 'inner workings' of service2:
> >
> > wasStartedByActivity(service2a, service2)
> > wasStartedByActivity(service2b, service2)
> > used(service2a, value1)
> > wasGeneratedBy(internalValue, service2a)
> > used(service2b, value1)
> > used(service2b, internalValue)
> >
> > The additional usage of value1 should be fine, but does not convey
> > that it was given to service2b by service2.
> >
> >
> > However we can't also state:
> >
> >   wasGeneratedBy(value2, service2b)
> >
> > This is due to the functional constraint - this would make
> > service2b == service2
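
The clash described above can be checked mechanically. The following is a minimal Python sketch, not a PROV API: it assumes the statements are held as plain (entity, activity) tuples and that the functional constraint means at most one generating activity per entity. The function name generation_conflicts is made up for illustration.

```python
from collections import defaultdict


def generation_conflicts(generated_by):
    """Return {entity: activities} for entities with more than one generator.

    `generated_by` is a list of (entity, activity) pairs, mirroring the
    argument order of wasGeneratedBy(entity, activity) in the traces.
    """
    activities = defaultdict(set)
    for entity, activity in generated_by:
        activities[entity].add(activity)
    return {e: acts for e, acts in activities.items() if len(acts) > 1}


# wasGeneratedBy statements from the master trace and the nested trace
statements = [
    ("value1", "service1"),
    ("value2", "service2"),
    ("value3", "service3"),
    ("internalValue", "service2a"),
    ("value2", "service2b"),  # the extra statement we would like to add
]

conflicts = generation_conflicts(statements)
print(conflicts)
```

Under these assumptions only value2 is reported, with both service2 and service2b as generators.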
> >
> >
> >
> > Some current workarounds:
> >
> > a) Two entities, alternateOf
> >
> > wasGeneratedBy(value2Inside, service2b)
> > alternateOf(value2, value2Inside)
> > wasDerivedFrom(value2, value2Inside)
> >
> > I believe this is the cleanest solution. Here the derivation can be
> > thought of as "Moving value2 from inside to outside". I added the
> > derivation so that the existential link from value2Inside to value2 is
> > stated.
> >
> > To 'close' value2Inside we can add:
> >
> > wasInvalidatedBy(value2Inside, service2)
> >
> >
> >
> > b) Two entities, common specializationOf  super-entity
> >
> > wasGeneratedBy(value2Outside, service2)
> > wasGeneratedBy(value2Inside, service2b)
> > specializationOf(value2Inside, value2)
> > specializationOf(value2Outside, value2)
> > wasDerivedFrom(value2Outside, value2Inside)
> >
> > The specialization here is basically 'Being inside' and 'Being
> > outside' - think of it as the entity being in a door opening or coming
> > out of a pipe. It would allow you to break down the 'transfer' as
> > well:
> >
> > specializationOf(value2InTransit, value2)
> > wasDerivedFrom(value2Outside, value2InTransit)
> >
> > "value2" here is the "actual", pure Platonic value, which does not
> > easily have a wasGeneratedBy. For computer internals it can be thought
> > of in terms of the abstract "The number 14" and "The bytes [20, 65,
> > 66, 67]" - for real world examples it is "The concept of the thing".
> >
> >
> >
> > c) Use different accounts
> >
> > Each account can have a different view of how value2 was created.
> > However, if you have many activities, iterations etc, you will get a
> > whole lot of accounts, and growing query and representational issues.
> > Merging of these accounts will be more of a challenge, as you would
> > have to use one of the other solutions suggested here.
> >
> > We also don't have a way to say "This account shows the inner workings
> > of this activity".  (or can we use PROV-AQ for that?
> >   :activity1 prov:hasProvenance <activity1-provenance>    )
> >
> >
> > d) Drop outer wasGeneratedBy
> >
> > Removing
> >   wasGeneratedBy(value2, service2)
> >
> > But then you have not just opened the lid of service2, you have
> > removed the casing. This approach will mean that service2 did not have
> > anything to do with value2.
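
As a rough illustration, workarounds (a), (b) and (d) above can be compared against the same one-generator-per-entity check. This is only a sketch in Python: the tuple representation and the helper name are assumptions, and only the wasGeneratedBy statements of each workaround are encoded.

```python
from collections import defaultdict


def generation_conflicts(generated_by):
    """Return {entity: activities} for entities with more than one generator."""
    activities = defaultdict(set)
    for entity, activity in generated_by:
        activities[entity].add(activity)
    return {e: acts for e, acts in activities.items() if len(acts) > 1}


# (entity, activity) pairs for the generation statements of each workaround
workarounds = {
    "a": [("value2", "service2"), ("value2Inside", "service2b")],
    "b": [("value2Outside", "service2"), ("value2Inside", "service2b")],
    "d": [("value2", "service2b")],  # outer wasGeneratedBy dropped
}

results = {name: generation_conflicts(gen) for name, gen in workarounds.items()}
print(results)

# In (d) no generation statement ties value2 to service2 any more
d_generators = {act for ent, act in workarounds["d"] if ent == "value2"}
print("service2" in d_generators)
```

All three pass the check, but at different costs: (a) and (b) introduce extra entities, while (d) loses the link between value2 and service2.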
> >
> >
> >
> > If we are unhappy about these kinds of approaches, then I think a good
> > solution would be to have a construct for service composition. Then we
> > can relax the wasGeneratedBy functional requirement, and say that the
> > activities are the same, or that one of the activities contains the
> > other, which can be expressed as some kind of "partOf" relation
> > stronger than wasStartedBy (without implying any tokens).
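
One way to read the relaxed requirement is sketched below in Python. The "partOf" relation is hypothetical (it is not a PROV term), it is modelled here as a child-to-parent dictionary assumed to be acyclic, and all function names are made up. Two generators of the same entity are accepted only if they are the same activity or one contains the other.

```python
def enclosing(activity, part_of):
    """Return the activity followed by every activity that contains it.

    `part_of` maps a sub-activity to its parent and is assumed acyclic.
    """
    chain = [activity]
    while chain[-1] in part_of:
        chain.append(part_of[chain[-1]])
    return chain


def compatible(a, b, part_of):
    """True if a and b are the same activity or one contains the other."""
    return a in enclosing(b, part_of) or b in enclosing(a, part_of)


def relaxed_conflicts(generated_by, part_of):
    """List (entity, a, b) where a and b both generate entity but are unrelated."""
    by_entity = {}
    for entity, activity in generated_by:
        by_entity.setdefault(entity, []).append(activity)
    conflicts = []
    for entity, activities in by_entity.items():
        for i, a in enumerate(activities):
            for b in activities[i + 1:]:
                if not compatible(a, b, part_of):
                    conflicts.append((entity, a, b))
    return conflicts


part_of = {"service2a": "service2", "service2b": "service2"}

nested = relaxed_conflicts(
    [("value2", "service2"), ("value2", "service2b")], part_of)
siblings = relaxed_conflicts(
    [("value2", "service2a"), ("value2", "service2b")], part_of)
print(nested, siblings)
```

With this reading, service2 and service2b may both generate value2, while two sibling sub-activities generating the same entity are still flagged.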
> >
> > This will add complications, for instance if you have (e=entity,
> > a=activity, ->= generated/used):
> >
> > a1 -> e1 -> a2 -> e2
> >
> > and you also decompose a1 to:
> >
> > e0 -> a1a -> ex -> a1b -> ey -> a1c -> e2
> >
> >
> > Now the question is where e0 came from: by composition, was it not
> > also used by a1? Can e0 also 'be part of a1' - an embedded entity,
> > like a part of the machine performing a1?
> >
> > (I think the opposite case is OK, if a1 consumes e0, but not seen
> > inside. This could just have been used for coordination purposes by
> > a1).
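
The e0 question can be phrased as a simple set difference. The Python sketch below takes the decomposed chain exactly as written above and assumes a tuple representation; the function name unexplained_inputs is made up. It lists entities that the sub-activities use but that are neither produced inside the decomposition nor used by the enclosing activity.

```python
def unexplained_inputs(inner_used, inner_generated, outer_used):
    """Entities used inside a decomposition with no visible origin.

    inner_used:      (activity, entity) pairs, as in used(activity, entity)
    inner_generated: (entity, activity) pairs, as in wasGeneratedBy
    outer_used:      entities the enclosing activity is stated to use
    """
    inner_inputs = {entity for _, entity in inner_used}
    inner_outputs = {entity for entity, _ in inner_generated}
    return sorted(inner_inputs - inner_outputs - set(outer_used))


# e0 -> a1a -> ex -> a1b -> ey -> a1c -> e2
inner_used = [("a1a", "e0"), ("a1b", "ex"), ("a1c", "ey")]
inner_generated = [("ex", "a1a"), ("ey", "a1b"), ("e2", "a1c")]

# The coarse trace states no inputs for a1
print(unexplained_inputs(inner_used, inner_generated, []))

# If the coarse trace also said used(a1, e0), nothing is left unexplained
print(unexplained_inputs(inner_used, inner_generated, ["e0"]))
```

Whether a non-empty result should be an error, an implied used(a1, e0), or an "embedded" entity is exactly the open question; the sketch only detects the situation.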
> >
> >
> >
> > However, I believe service composition is still easier to deal with
> > than a set of slightly unrelated 'mirror' entities at different
> > granularities; it is just a more detailed path of the same trace.
> >
> > I guess one question is whether it is up to the asserter or the
> > consumer of the provenance trace to determine the granularity. The
> > beauty of this approach is that the consumer can mix and match: he can
> > go into detail for a2, but use the shortcut for a1. The asserter just
> > says everything he knows, including the inner workings where they are
> > known, and outer abstractions where they make sense.
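
The consumer-side "mix and match" could look roughly like the Python sketch below. It assumes a hypothetical child-to-parent "partOf" dictionary, statements normalised to (relation, activity, entity) triples, and a relaxed generation constraint; none of this is defined by PROV. The consumer passes the set of activities whose inner workings should be shown, and everything else is folded into its outermost collapsed parent.

```python
from collections import defaultdict


def collapse(statements, part_of, expanded):
    """Rewrite (relation, activity, entity) triples to a chosen granularity."""
    def visible(activity):
        # Chain from the activity up to its outermost enclosing activity
        chain = [activity]
        while chain[-1] in part_of:
            chain.append(part_of[chain[-1]])
        # Outermost activity that the consumer has not asked to open up
        for candidate in reversed(chain):
            if candidate not in expanded:
                return candidate
        return activity

    touched_by = defaultdict(set)
    made_inside = set()
    for rel, act, ent in statements:
        touched_by[ent].add(visible(act))
        if rel == "wasGeneratedBy" and visible(act) != act:
            made_inside.add(ent)
    # Entities made and seen only inside one collapsed activity are hidden
    internal = {e for e in made_inside if len(touched_by[e]) == 1}

    view = []
    for rel, act, ent in statements:
        triple = (rel, visible(act), ent)
        if ent not in internal and triple not in view:
            view.append(triple)
    return view


part_of = {"service2a": "service2", "service2b": "service2"}
statements = [
    ("wasGeneratedBy", "service1", "value1"),
    ("used", "service2", "value1"),
    ("wasGeneratedBy", "service2", "value2"),
    ("used", "service3", "value1"),
    ("used", "service3", "value2"),
    ("wasGeneratedBy", "service3", "value3"),
    ("used", "service2a", "value1"),
    ("wasGeneratedBy", "service2a", "internalValue"),
    ("used", "service2b", "value1"),
    ("used", "service2b", "internalValue"),
    ("wasGeneratedBy", "service2b", "value2"),
]

coarse = collapse(statements, part_of, expanded=set())
detailed = collapse(statements, part_of, expanded={"service2"})
print(len(coarse), len(detailed))
```

With nothing expanded the view reduces to the master workflow trace; with service2 expanded, the asserter's full set of statements is returned unchanged.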
> >
> >
> >
> > A different solution would be to have a stronger kind of alternateOf
> > that includes the derivation and 'passing' nature rather than any kind
> > of 'change' derivation. Thus we use two entities, but have a
> > PROV-specific way to say 'This is the same thing, but as generated by
> > a different activity at a different scale'.
> >
> >
> > I believe that for almost all the examples we have, the activities
> > could also be expressed at a more granular level. For instance,
> > filling-petrol could be decomposed into opening-fuel-cap,
> > using-petrol-pump, closing-fuel-cap, paying.
> >
> > Is our stance that such decomposition must always be done through a
> > separate provenance account/graph?
> >
> >
> > On Wed, May 9, 2012 at 10:47 PM, Paolo Ncl <Paolo.Missier@ncl.ac.uk>
> > wrote:
> > > Davide
> > >
> > > I guess it depends on how you define "part of" in this setting. You
> > > can specify that an activity has started another, which makes,
> > > informally, the former a "parent" of the latter. You can use this to
> > > model forking, for example. This is about the observed behavior of a
> > > process and is within scope. But there is no way to express
> > > structural containment, or composition, because describing process
> > > models and structure (for instance, the structure of a program, a
> > > workflow, a script etc.) is not within the PROV scope.
> > > I hope others in the group concur with this interpretation
> > >
> > > Regards,
> > >
> > > P.Missier - paolo.missier@ncl.ac.uk
> > >
> > > On 7 May 2012, at 21:44, Davide Ceolin <davide.ceolin@gmail.com> wrote:
> > >
> > >> Hello,
> > >>
> > >> I am a PhD student at VU University Amsterdam, and I have a
> > >> question about the composition of activities in PROV. I noticed
> > >> that it is not possible to explicitly state that an activity is
> > >> actually part of another one.
> > >>
> > >> Suppose that a given entity is the result of an activity and, in
> > >> turn, this activity is part of a larger one.
> > >>
> > >> I can represent this scenario with two separate graphs stating that
> > >> each of the two activities generated the entity, and from them (and
> > >> their execution times, etc.) I may infer that one is part of the
> > >> other one, but I can't explicitly state that.
> > >>
> > >> Is there a specific reason for such a limitation?
> > >>
> > >> Thanks,
> > >>
> > >> Davide
> > >>
> > >> Davide Ceolin MSc.
> > >> PhD student
> > >> The Network Institute
> > >> VU University Amsterdam
> > >> d.ceolin@vu.nl
> > >> http://www.few.vu.nl/~dceolin/
> > >>
> > >>
> > >>
> > >
> >
> >
> >
> > --
> > Stian Soiland-Reyes, myGrid team
> > School of Computer Science
> > The University of Manchester
> >
> >
> >
>
>
Received on Thursday, 10 May 2012 10:09:19 UTC