- From: James Cheney <jcheney@inf.ed.ac.uk>
- Date: Thu, 10 May 2012 11:33:33 +0100
- To: Paolo Missier <Paolo.Missier@ncl.ac.uk>
- Cc: Stephan Zednik <zednis@rpi.edu>, Paolo Missier <paolo.missier@newcastle.ac.uk>, Daniel Garijo <dgarijo@delicias.dia.fi.upm.es>, Jim McCusker <mccusj@rpi.edu>, Davide Ceolin <davide.ceolin@gmail.com>, "public-prov-comments@w3.org" <public-prov-comments@w3.org>
- Message-Id: <0E4A43B9-AFBD-4239-80AD-C0C00D9F20CB@inf.ed.ac.uk>
Hi all,

Given where we are in the process, I think it would be good to keep activity granularity open as an option for the future and give a nudge towards a standard mechanism (e.g. as part of a best practice note) but not standardize it prematurely (perhaps similar to how collections may be handled).

Accounts don't seem suitable because there are (currently) no good ways of linking the finer- and coarser-grained activities: you can say "here are two different ways this happened" but can't make it clear that this activity in this account corresponds to these two sub-activities in the other.

I have also been thinking about this for other reasons: we have a workshop paper looking at process/activity granularity in an OPM-like model [1]. At the semantics level, I don't think it would be a huge problem to think of activities as forming a hierarchy, where if a1 is part of a2 then all of the events in a1 are also events of a2. But then we need to revisit lots of things. We could have issues similar to those with specializationOf and entity attributes, e.g. if one activity is part of another, do their shared attributes have to match?

--James

[1] http://homepages.inf.ed.ac.uk/jcheney/publications/drafts/granularity.pdf

On May 10, 2012, at 8:46 AM, Paolo Missier wrote:

> Stephan
>
> indeed two distinct but related entities. So uniqueness of generation ("functional") is not an issue here
>
> however having to use accounts for issues of granularity seems awfully heavyweight, this is not what accounts (or bundles) are for. I think Davide is right in that there is no suitable construct in prov. but again I believe structural containment has to do with Plans not with Activities, and may be addressed through extensions.
>
> -Paolo
>
> On 5/10/12 12:47 AM, Stephan Zednik wrote:
>>
>> On May 9, 2012, at 5:32 PM, Paolo Missier wrote:
>>
>>> Hi Daniel,
>>>
>>> I understand the setup, and what I am saying is if these are indeed two "accounts", i.e., two observers who see things at different levels (a la OPM), then generation is not "functional" (this is what I mean by generation-uniqueness) across accounts, and so there is no problem.
>>
>> Are you assuming a single r1 entity or an entity for r1 in each account? With the current PROVO we would have to have a separate r1 entity in each account, which we could then relate to each other via alternateOf.
>>
>> --Stephan
>>
>>> What we don't have is a way (within prov) to say task123 is part of ex1, task124 is part of ex2.
>>>
>>> (will continue tomorrow, too late now)
>>>
>>> -Paolo
>>>
>>> On 5/10/12 12:22 AM, Daniel Garijo wrote:
>>>>
>>>> Hi Paolo,
>>>> I think it has to do more with granularity than with process description:
>>>> A user A may see the experiment(ex1) as an activity which uses dataset d1 and produces result r1.
>>>>
>>>> Another user may want a lower level of granularity, and for him the experiment ex1 had 2 intermediate steps:
>>>> task123 and task124: task123 used d1 and produced r1', while task124 uses r1' to produce r1.
>>>>
>>>> So, besides the fact that task123 and task124 can be considered part of ex1, we have 2 provenance traces
>>>> that correspond to 2 different accounts where r1 is produced by 2 different activities. And that is not currently
>>>> supported in DM, because it's functional. Am I wrong?
>>>>
>>>> Best,
>>>> Daniel
>>>>
>>>> 2012/5/10 Paolo Missier <Paolo.Missier@ncl.ac.uk>
>>>> absolutely, but what you are referring to with "steps within an experiment" seems to indicate that there is a process description which includes structural containment, and my understanding is that by design prov does not include process description at all. What I believe you can say is that you observed one activity (the "experiment") start another ("task123"). Then, you can say that task123 generated entity e1, but no relationship between the experiment and e1 would follow.
>>>> So do we need to extend the model to capture process description?
>>>>
>>>> -Paolo
>>>>
>>>> On 5/9/12 11:50 PM, Jim McCusker wrote:
>>>>>
>>>>> If I have an experiment, and that experiment generates a data file, but there were steps within that experiment that actually did the work, I would think we should be able to talk about that within an account.
>>>>>
>>>>> Jim
>>>>>
>>>>> On Wed, May 9, 2012 at 6:43 PM, Paolo Missier <Paolo.Missier@ncl.ac.uk> wrote:
>>>>> May I ask what /is/ activity composition? i.e. what is the semantics of
>>>>>
>>>>> :a2 a prov:Activity; dc:partOf :a1
>>>>>
>>>>> (the use of dc:partOf seems to confirm that prov does not include such concept).
>>>>>
>>>>> Also, I think what Davide has in mind with
>>>>>
>>>>> " two separate graphs stating that each of the two activities generated the entity"
>>>>> is a form of "bundling", or separate accounts, so the statement
>>>>>
>>>>> :e1 a prov:Entity; prov:wasGeneratedBy :a1, :a2.
>>>>>
>>>>> would not hold within a single account, and thus the generation-uniqueness rule does not apply?
>>>>>
>>>>> -Paolo
>>>>>
>>>>> On 5/9/12 11:06 PM, Stephan Zednik wrote:
>>>>>>
>>>>>> Perhaps wasGeneratedBy should not be functional?
>>>>>>
>>>>>> I think supporting activity composition will be heavily requested by the provenance community. I know projects at RPI/HAO that I am a part of and provenance projects at CSIRO have recognized it as an important (potentially critical) aspect in generating provenance presentations/visualizations for end users.
>>>>>>
>>>>>> Perhaps if a :a2 generated an entity :e2 that was a specialization of :e1?
>>>>>>
>>>>>> We ~should~ be able to record provenance at different, and logically connected, levels of abstraction, and activity composition seems a natural component for doing so.
>>>>>>
>>>>>> --Stephan
>>>>>>
>>>>>> On May 9, 2012, at 3:56 PM, Jim McCusker wrote:
>>>>>>
>>>>>>> There are some problems here with composition though, specifically when you try to say something like this:
>>>>>>>
>>>>>>> :a1 a prov:Activity.
>>>>>>> :a2 a prov:Activity; dc:partOf :a1.
>>>>>>>
>>>>>>> :e1 a prov:Entity; prov:wasGeneratedBy :a1, :a2.
>>>>>>>
>>>>>>> Basically, since :a2 is part of :a1, and :a2 served as a "final activity" (there aren't any further activities that used :e1), :e1, by virtue of being generated by :a2 was also generated by :a1. But since wasGeneratedBy is functional, we cannot assert that without :a1 and :a2 becoming identical (sameAs).
>>>>>>>
>>>>>>> Jim
>>>>>>>
>>>>>>> On Wed, May 9, 2012 at 5:47 PM, Paolo Ncl <Paolo.Missier@ncl.ac.uk> wrote:
>>>>>>> Davide
>>>>>>>
>>>>>>> I guess it depends on how you define "part of" in this setting. You can specify that an activity has started another, which makes, informally, the former a "parent" of the latter. You can use this to model forking, for example. This is about the observed behavior of a process and is within scope. But there is no way to express structural containment, or composition, because describing process models and structure (for instance, the structure of a program, a workflow, a script etc.) is not within the PROV scope.
>>>>>>> I hope others in the group concur with this interpretation
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> P.Missier - paolo.missier@ncl.ac.uk
>>>>>>>
>>>>>>> On 7 May 2012, at 21:44, Davide Ceolin <davide.ceolin@gmail.com> wrote:
>>>>>>>
>>>>>>> > Hello,
>>>>>>> >
>>>>>>> > I am a PhD student of the VU University Amsterdam, and I would have a question about the composition of activities in PROV. I noticed that it is not possible to explicitly state that an activity is actually part of another one.
>>>>>>> >
>>>>>>> > Suppose that a given entity is the result of an activity and, in turn, this activity is part of a larger one.
>>>>>>> >
>>>>>>> > I can represent this scenario with two separate graphs stating that each of the two activities generated the entity, and from them (and their execution times, etc.) I may infer that one is part of the other one, but I can't explicitly state that.
>>>>>>> >
>>>>>>> > Is there a specific reason for such a limitation?
>>>>>>> >
>>>>>>> > Thanks,
>>>>>>> >
>>>>>>> > Davide
>>>>>>> >
>>>>>>> > Davide Ceolin MSc.
>>>>>>> > PhD student
>>>>>>> > The Network Institute
>>>>>>> > VU University Amsterdam
>>>>>>> > d.ceolin@vu.nl
>>>>>>> > http://www.few.vu.nl/~dceolin/
>>>>>>>
>>>>>>> --
>>>>>>> Jim McCusker
>>>>>>> Programmer Analyst
>>>>>>> Krauthammer Lab, Pathology Informatics
>>>>>>> Yale School of Medicine
>>>>>>> james.mccusker@yale.edu | (203) 785-6330
>>>>>>> http://krauthammerlab.med.yale.edu
>>>>>>>
>>>>>>> PhD Student
>>>>>>> Tetherless World Constellation
>>>>>>> Rensselaer Polytechnic Institute
>>>>>>> mccusj@cs.rpi.edu
>>>>>>> http://tw.rpi.edu
>>>>>
>>>>> --
>>>>> ----------- ~oo~ --------------
>>>>> Paolo Missier - Paolo.Missier@newcastle.ac.uk, pmissier@acm.org
>>>>> School of Computing Science, Newcastle University, UK
>>>>> http://www.cs.ncl.ac.uk/people/Paolo.Missier
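
The conflict Jim describes in the quoted thread (a functional wasGeneratedBy combined with dc:partOf) may be easier to see in a small sketch. The Python below is only an illustration, not anything PROV defines: triples are plain tuples, no RDF library is assumed, and the helper names forced_identities and conflicts_with_part_of are hypothetical. It treats "functional" in the usual OWL sense, where two values for the same subject are forced to be identical (sameAs).

```python
# Jim's example, written as plain (subject, predicate, object) tuples.
# No RDF library is assumed; this is an illustration only.
TRIPLES = [
    (":a1", "rdf:type", "prov:Activity"),
    (":a2", "rdf:type", "prov:Activity"),
    (":a2", "dc:partOf", ":a1"),
    (":e1", "rdf:type", "prov:Entity"),
    (":e1", "prov:wasGeneratedBy", ":a1"),
    (":e1", "prov:wasGeneratedBy", ":a2"),
]


def forced_identities(triples, functional_predicate):
    """Return pairs of objects that a functional predicate forces to be identical.

    Hypothetical helper: if one subject has two different objects for a
    functional predicate, those objects must denote the same thing (sameAs).
    """
    objects_by_subject = {}
    for s, p, o in triples:
        if p == functional_predicate:
            objects_by_subject.setdefault(s, set()).add(o)

    pairs = set()
    for objects in objects_by_subject.values():
        ordered = sorted(objects)
        for i, left in enumerate(ordered):
            for right in ordered[i + 1:]:
                pairs.add((left, right))
    return pairs


def conflicts_with_part_of(triples, pairs, part_of="dc:partOf"):
    """Keep only the forced identities between activities also linked by part-of."""
    links = {(s, o) for s, p, o in triples if p == part_of}
    return {(a, b) for a, b in pairs if (a, b) in links or (b, a) in links}


identities = forced_identities(TRIPLES, "prov:wasGeneratedBy")
print(identities)  # {(':a1', ':a2')}
print(conflicts_with_part_of(TRIPLES, identities))
```

Under these assumptions the sketch reports that :a1 and :a2 would have to be the same activity, even though one is asserted to be part of the other, which is the tension the thread is discussing.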
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
Received on Thursday, 10 May 2012 10:44:04 UTC