Re: Activity composition from Daniel Garijo on 2012-05-09 (public-prov-comments@w3.org from May 2012)

From: Daniel Garijo <dgarijo@delicias.dia.fi.upm.es>
Date: Thu, 10 May 2012 01:22:53 +0200
To: Paolo Missier <Paolo.Missier@ncl.ac.uk>
Cc: Jim McCusker <mccusj@rpi.edu>, Paolo Missier <paolo.missier@newcastle.ac.uk>, Stephan Zednik <zednis@rpi.edu>, Davide Ceolin <davide.ceolin@gmail.com>, "public-prov-comments@w3.org" <public-prov-comments@w3.org>
Message-ID: <CAExK0DfQEgfVpJwPVRtUcwLZ_fjBosu9mRS0ytb8CXFdhS19eQ@mail.gmail.com>
Hi Paolo,
I think it has more to do with granularity than with process description.
User A may see the experiment (ex1) as an activity that uses dataset d1
and produces result r1.

Another user may want a finer level of granularity, and for that user the
experiment ex1 has two intermediate steps, task123 and task124:
task123 uses d1 and produces r1', while task124 uses r1' to produce r1.

So, besides the fact that task123 and task124 can be considered part of
ex1, we have two provenance traces that correspond to two different
accounts in which r1 is produced by two different activities (ex1 in one,
task124 in the other). That is not currently supported in the DM, because
wasGeneratedBy is functional. Am I wrong?
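
To make the two accounts concrete, here is a minimal Python sketch. It is
not PROV tooling: the tuple encoding of statements, the name r1_prime
(standing in for r1'), and the helper generation_conflicts are assumptions
made only for illustration. It looks for entities with more than one
generating activity, first within each account and then in their union:

```python
from collections import defaultdict

# Each account is a set of (subject, predicate, object) statements.
# Identifiers follow the example above; "r1_prime" stands in for r1'.
coarse_account = {
    ("ex1", "used", "d1"),
    ("r1", "wasGeneratedBy", "ex1"),
}
fine_account = {
    ("task123", "used", "d1"),
    ("r1_prime", "wasGeneratedBy", "task123"),
    ("task124", "used", "r1_prime"),
    ("r1", "wasGeneratedBy", "task124"),
}


def generation_conflicts(statements):
    """Return {entity: [activities]} for entities with more than one generator."""
    generators = defaultdict(set)
    for subject, predicate, obj in statements:
        if predicate == "wasGeneratedBy":
            generators[subject].add(obj)
    return {
        entity: sorted(activities)
        for entity, activities in generators.items()
        if len(activities) > 1
    }


print(generation_conflicts(coarse_account))                 # {}
print(generation_conflicts(fine_account))                   # {}
print(generation_conflicts(coarse_account | fine_account))  # {'r1': ['ex1', 'task124']}
```

Each account is consistent on its own; only the merged set reports r1 with
two generating activities, which is the case a functional wasGeneratedBy
rules out.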

Best,
Daniel

2012/5/10 Paolo Missier <Paolo.Missier@ncl.ac.uk>

>  Absolutely, but what you are referring to with "steps within an
> experiment" seems to indicate that there is a process description which
> includes structural containment, and my understanding is that by design
> PROV does not include process description at all. What I believe you can
> say is that you observed one activity (the "experiment") start another
> ("task123"). Then, you can say that task123 generated entity e1, but no
> relationship between the experiment and e1 would follow.
>   So do we need to extend the model to capture process description?
>
> -Paolo
>
>
>
>
> On 5/9/12 11:50 PM, Jim McCusker wrote:
>
> If I have an experiment, and that experiment generates a data file, but
> there were steps within that experiment that actually did the work, I would
> think we should be able to talk about that within an account.
>
>  Jim
>
> On Wed, May 9, 2012 at 6:43 PM, Paolo Missier <Paolo.Missier@ncl.ac.uk> wrote:
>
>>  May I ask what /is/ activity composition? That is, what are the semantics of
>>
>>   :a2 a prov:Activity; dc:partOf :a1
>>
>>  (the use of dc:partOf seems to confirm that PROV does not include such a
>> concept).
>>
>> Also, I think what Davide has in mind with
>>
>>  " two separate graphs stating that each of the two activities generated
>> the entity"
>>  is a form of "bundling", or separate accounts, so the statement
>>
>>
>> :e1 a prov:Entity; prov:wasGeneratedBy :a1, :a2.
>>
>> would not hold within a single account, and thus the
>> generation-uniqueness rule does not apply?
>>
>> -Paolo
>>
>>
>>
>>
>> On 5/9/12 11:06 PM, Stephan Zednik wrote:
>>
>> Perhaps wasGeneratedBy should not be functional?
>>
>>  I think supporting activity composition will be heavily requested by
>> the provenance community. I know projects at RPI/HAO that I am a part of,
>> and provenance projects at CSIRO, have recognized it as an important
>> (potentially critical) aspect in generating provenance
>> presentations/visualizations for end users.
>>
>>  Perhaps if :a2 generated an entity :e2 that was a specialization of
>> :e1?
>>
>>  We ~should~ be able to record provenance at different, and logically
>> connected, levels of abstraction, and activity composition seems a natural
>> component for doing so.
>>
>>  --Stephan
>>
>>  On May 9, 2012, at 3:56 PM, Jim McCusker wrote:
>>
>> There are some problems here with composition though, specifically when
>> you try to say something like this:
>>
>>  :a1 a prov:Activity.
>> :a2 a prov:Activity; dc:partOf :a1.
>>
>>  :e1 a prov:Entity; prov:wasGeneratedBy :a1, :a2.
>>
>>  Basically, since :a2 is part of :a1, and :a2 served as a "final
>> activity" (there aren't any further activities that used :e1), :e1, by
>> virtue of being generated by :a2, was also generated by :a1. But since
>> wasGeneratedBy is functional, we cannot assert that without :a1 and :a2
>> becoming identical (sameAs).
>>
>> Jim
>>
>> On Wed, May 9, 2012 at 5:47 PM, Paolo Ncl <Paolo.Missier@ncl.ac.uk> wrote:
>>
>>> Davide
>>>
>>> I guess it depends on how you define "part of" in this setting. You can
>>> specify that an activity has started another, which makes, informally, the
>>> former a "parent" of the latter. You can use this to model forking, for
>>> example. This is about the observed behavior of a process and is within
>>> scope. But there is no way to express structural containment, or
>>> composition, because describing process models and structure (for instance,
>>> the structure of a program, a workflow, a script etc.) is not within the
>>> PROV scope.
>>> I hope others in the group concur with this interpretation.
>>>
>>> Regards,
>>>
>>> P.Missier - paolo.missier@ncl.ac.uk
>>>
>>> On 7 May 2012, at 21:44, Davide Ceolin <davide.ceolin@gmail.com> wrote:
>>>
>>> > Hello,
>>> >
>>> > I am a PhD student at VU University Amsterdam, and I have a
>>> > question about the composition of activities in PROV. I noticed that it is
>>> > not possible to explicitly state that an activity is actually part of
>>> > another one.
>>> >
>>> > Suppose that a given entity is the result of an activity and, in turn,
>>> > this activity is part of a larger one.
>>> >
>>> > I can represent this scenario with two separate graphs stating that
>>> > each of the two activities generated the entity, and from them (and their
>>> > execution times, etc.) I may infer that one is part of the other, but I
>>> > can't explicitly state that.
>>> >
>>> > Is there a specific reason for such a limitation?
>>> >
>>> > Thanks,
>>> >
>>> > Davide
>>> >
>>> > Davide Ceolin MSc.
>>> > PhD student
>>> > The Network Institute
>>> > VU University Amsterdam
>>> > d.ceolin@vu.nl
>>> > http://www.few.vu.nl/~dceolin/
>>> >
>>> >
>>> >
>>>
>>>
>>
>>
>>  --
>> Jim McCusker
>> Programmer Analyst
>> Krauthammer Lab, Pathology Informatics
>> Yale School of Medicine
>> james.mccusker@yale.edu | (203) 785-6330
>> http://krauthammerlab.med.yale.edu
>>
>> PhD Student
>> Tetherless World Constellation
>> Rensselaer Polytechnic Institute
>> mccusj@cs.rpi.edu
>> http://tw.rpi.edu
>>
>>
>>
>>
>>   --
>> -----------  ~oo~  --------------
>> Paolo Missier - Paolo.Missier@newcastle.ac.uk, pmissier@acm.org
>> School of Computing Science, Newcastle University, UK
>> http://www.cs.ncl.ac.uk/people/Paolo.Missier
>>
>>
>
>
>
>
Received on Thursday, 10 May 2012 03:35:18 UTC