Re: Activity composition from Jim McCusker on 2012-05-09 (public-prov-comments@w3.org from May 2012)

From: Jim McCusker <mccusj@rpi.edu>
Date: Wed, 9 May 2012 19:35:13 -0400
To: Daniel Garijo <dgarijo@delicias.dia.fi.upm.es>
Cc: Paolo Missier <Paolo.Missier@ncl.ac.uk>, Paolo Missier <paolo.missier@newcastle.ac.uk>, Stephan Zednik <zednis@rpi.edu>, Davide Ceolin <davide.ceolin@gmail.com>, "public-prov-comments@w3.org" <public-prov-comments@w3.org>
Message-ID: <CAAtgn=RpAfWykRHczuxZZgM-1gOLjU=RgW+vpp+4n9QkpB++cQ@mail.gmail.com>
Yes, granularity is the issue I'm referring to.
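
A minimal sketch of the granularity point, assuming each account is reduced to a set of (entity, activity) generation statements. The Python encoding and the name generation_conflicts are illustrative only and not part of PROV; r1_prime stands for r1' in the example quoted below. Each account satisfies generation uniqueness on its own, and the conflict appears only once the two accounts are merged:

```python
from collections import defaultdict

# Hypothetical encoding: one (entity, activity) pair per wasGeneratedBy statement.
coarse = {("r1", "ex1")}                                   # first user's account
fine = {("r1_prime", "task123"), ("r1", "task124")}        # second user's account


def generation_conflicts(statements):
    """Return the entities that have more than one generating activity."""
    generators = defaultdict(set)
    for entity, activity in statements:
        generators[entity].add(activity)
    return {e: sorted(a) for e, a in generators.items() if len(a) > 1}


# Taken separately, each account has at most one generator per entity.
print(generation_conflicts(coarse))         # {}
print(generation_conflicts(fine))           # {}

# Merged into a single account, r1 has two generators, which a
# functional wasGeneratedBy does not allow.
print(generation_conflicts(coarse | fine))  # {'r1': ['ex1', 'task124']}
```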

Jim

On Wed, May 9, 2012 at 7:22 PM, Daniel Garijo <dgarijo@delicias.dia.fi.upm.es> wrote:

> Hi Paolo,
> I think it has more to do with granularity than with process description:
> User A may see the experiment (ex1) as an activity which uses dataset d1
> and produces result r1.
>
> Another user may want a finer level of granularity, and for him the
> experiment ex1 had 2 intermediate steps,
> task123 and task124: task123 used d1 and produced r1', while task124 used
> r1' to produce r1.
>
> So, besides the fact that task123 and task124 can be considered part of
> ex1, we have 2 provenance traces
> that correspond to 2 different accounts where r1 is produced by 2
> different activities. And that is not currently
> supported in DM, because wasGeneratedBy is functional. Am I wrong?
>
> Best,
> Daniel
>
>
> 2012/5/10 Paolo Missier <Paolo.Missier@ncl.ac.uk>
>
>>  Absolutely, but what you are referring to with "steps within an
>> experiment" seems to indicate that there is a process description which
>> includes structural containment, and my understanding is that by design
>> PROV does not include process description at all. What I believe you can
>> say is that you observed one activity (the "experiment") start another
>> ("task123"). Then, you can say that task123 generated entity e1, but no
>> relationship between the experiment and e1 would follow.
>>   So do we need to extend the model to capture process description?
>>
>> -Paolo
>>
>>
>>
>>
>> On 5/9/12 11:50 PM, Jim McCusker wrote:
>>
>> If I have an experiment, and that experiment generates a data file, but
>> there were steps within that experiment that actually did the work, I would
>> think we should be able to talk about that within an account.
>>
>>  Jim
>>
>> On Wed, May 9, 2012 at 6:43 PM, Paolo Missier <Paolo.Missier@ncl.ac.uk> wrote:
>>
>>>  May I ask what /is/ activity composition? i.e. what is the semantics of
>>>
>>>   :a2 a prov:Activity; dc:partOf :a1
>>>
>>>  (the use of dc:partOf seems to confirm that PROV does not include such
>>> a concept).
>>>
>>> Also, I think what Davide has in mind with
>>>
>>>  " two separate graphs stating that each of the two activities generated
>>> the entity"
>>>  is a form of "bundling", or separate accounts, so the statement
>>>
>>>
>>> :e1 a prov:Entity; prov:wasGeneratedBy :a1, :a2.
>>>
>>> would not hold within a single account, and thus the
>>> generation-uniqueness rule does not apply?
>>>
>>> -Paolo
>>>
>>>
>>>
>>>
>>> On 5/9/12 11:06 PM, Stephan Zednik wrote:
>>>
>>> Perhaps wasGeneratedBy should not be functional?
>>>
>>>  I think supporting activity composition will be heavily requested by
>>> the provenance community. I know that projects at RPI/HAO that I am a part
>>> of, and provenance projects at CSIRO, have recognized it as an important
>>> (potentially critical) aspect in generating provenance
>>> presentations/visualizations for end users.
>>>
>>>  Perhaps if :a2 generated an entity :e2 that was a specialization of
>>> :e1?
>>>
>>>  We ~should~ be able to record provenance at different, and logically
>>> connected, levels of abstraction, and activity composition seems a natural
>>> component for doing so.
>>>
>>>  --Stephan
>>>
>>>  On May 9, 2012, at 3:56 PM, Jim McCusker wrote:
>>>
>>> There are some problems here with composition though, specifically when
>>> you try to say something like this:
>>>
>>>  :a1 a prov:Activity.
>>> :a2 a prov:Activity; dc:partOf :a1.
>>>
>>>  :e1 a prov:Entity; prov:wasGeneratedBy :a1, :a2.
>>>
>>>  Basically, since :a2 is part of :a1, and :a2 served as a "final
>>> activity" (there aren't any further activities that used :e1), :e1, by
>>> virtue of being generated by :a2, was also generated by :a1. But since
>>> wasGeneratedBy is functional, we cannot assert that without :a1 and :a2
>>> becoming identical (sameAs).
>>>
>>> Jim
>>>
>>> On Wed, May 9, 2012 at 5:47 PM, Paolo Ncl <Paolo.Missier@ncl.ac.uk> wrote:
>>>
>>>> Davide
>>>>
>>>> I guess it depends on how you define "part of" in this setting. You can
>>>> specify that an activity has started another, which makes, informally, the
>>>> former a "parent" of the latter. You can use this to model forking, for
>>>> example. This is about the observed behavior of a process and is within
>>>> scope. But there is no way to express structural containment, or
>>>> composition, because describing process models and structure (for instance,
>>>> the structure of a program, a workflow, a script etc.) is not within the
>>>> PROV scope.
>>>> I hope others in the group concur with this interpretation.
>>>>
>>>> Regards,
>>>>
>>>> P.Missier - paolo.missier@ncl.ac.uk
>>>>
>>>> On 7 May 2012, at 21:44, Davide Ceolin <davide.ceolin@gmail.com> wrote:
>>>>
>>>> > Hello,
>>>> >
>>>> > I am a PhD student at VU University Amsterdam, and I have a
>>>> > question about the composition of activities in PROV. I noticed that it is
>>>> > not possible to explicitly state that an activity is actually part of
>>>> > another one.
>>>> >
>>>> > Suppose that a given entity is the result of an activity and, in
>>>> > turn, this activity is part of a larger one.
>>>> >
>>>> > I can represent this scenario with two separate graphs stating that
>>>> > each of the two activities generated the entity, and from them (and their
>>>> > execution times, etc.) I may infer that one is part of the other one, but I
>>>> > can't explicitly state that.
>>>> >
>>>> > Is there a specific reason for such a limitation?
>>>> >
>>>> > Thanks,
>>>> >
>>>> > Davide
>>>> >
>>>> > Davide Ceolin MSc.
>>>> > PhD student
>>>> > The Network Institute
>>>> > VU University Amsterdam
>>>> > d.ceolin@vu.nl
>>>> > http://www.few.vu.nl/~dceolin/
>>>> >
>>>> >
>>>> >
>>>>
>>>>
>>>
>>>
>>>  --
>>> Jim McCusker
>>> Programmer Analyst
>>> Krauthammer Lab, Pathology Informatics
>>> Yale School of Medicine
>>> james.mccusker@yale.edu | (203) 785-6330
>>> http://krauthammerlab.med.yale.edu
>>>
>>> PhD Student
>>> Tetherless World Constellation
>>> Rensselaer Polytechnic Institute
>>> mccusj@cs.rpi.edu
>>> http://tw.rpi.edu
>>>
>>>
>>>
>>>
>>>   --
>>> -----------  ~oo~  --------------
>>> Paolo Missier - Paolo.Missier@newcastle.ac.uk, pmissier@acm.org
>>> School of Computing Science, Newcastle University, UK
>>> http://www.cs.ncl.ac.uk/people/Paolo.Missier
>>>
>>>
>>
>>
>


-- 
Jim McCusker
Programmer Analyst
Krauthammer Lab, Pathology Informatics
Yale School of Medicine
james.mccusker@yale.edu | (203) 785-6330
http://krauthammerlab.med.yale.edu

PhD Student
Tetherless World Constellation
Rensselaer Polytechnic Institute
mccusj@cs.rpi.edu
http://tw.rpi.edu
Received on Wednesday, 9 May 2012 23:36:05 UTC