- From: Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>
- Date: Thu, 6 Sep 2012 11:11:39 +0100
- To: Luc Moreau <l.moreau@ecs.soton.ac.uk>
- Cc: Khalid Belhajjame <Khalid.Belhajjame@cs.man.ac.uk>, public-prov-wg@w3.org
> I don't think this example makes much sense:
>
> activity(a1,2011-11-16T00:00:00,2011-11-17T00:00:00) // in 2011
> activity(a2,2012-11-16T00:00:00,2012-11-17T00:00:00) // in 2012
> wasSubactivity(a1,a2)

I agree this would look stupid, but we have said before that the exact timestamps don't have any meaning in PROV-Constraints.

In particular for subactivities, it could very well happen that the times are recorded by different mechanisms. Perhaps a difference of a year is a glaring error, but being a few seconds off might be acceptable. (For instance, a shell script that does an SSH to a server that then does a wget to a web service: three different timestamps, not quite synchronized.)

Obviously this can easily be isolated using different accounts/bundles, but as has been discussed with workflow provenance, we often came to the conclusion that we don't want to split every subactivity into a new bundle, as it would mean hundreds of different standalone bundles, which would be trickier to do any kind of reasoning over.

> As indicated previously, it's a whole complete new design that
> we have to undertake, for which we don't have enough experience.

It seems that a wasSubActivity should have many of the characteristics of specializationOf, but it raises lots of discussion points for inferences:

* the subactivity must be fully contained within the duration of the superactivity (This is the easy one!)
* wasAssociatedWith(subAct, ag), then wasAssociatedWith(act, ag) ? Vice versa?
* wasGeneratedBy(e, subAct), then wasGeneratedBy(e, act) ? Vice versa?
* used(subAct, e), then used(act, e) ? Vice versa?
* Must subactivities be 'isolated', or are they allowed to communicate with activities which also communicate with the superactivity? (Imposes a theory of execution!)
* Can the superactivity communicate with the subactivity? Does it always?

So I agree it is a big can of worms.
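
The "easy" containment point in the list above, combined with the earlier remark about unsynchronized clocks, could be sketched roughly as follows. This is only an illustration: the function name is_contained and the 5-second default tolerance are assumptions made here, not anything defined by PROV-Constraints.

```python
from datetime import datetime, timedelta


def is_contained(sub_start, sub_end, super_start, super_end,
                 tolerance=timedelta(seconds=5)):
    """Return True if the subactivity interval lies within the superactivity
    interval, allowing each boundary to be off by up to `tolerance`.

    The tolerance is a hypothetical allowance for clock skew between the
    mechanisms that recorded the two activities.
    """
    return (sub_start >= super_start - tolerance
            and sub_end <= super_end + tolerance)


# The example from the quoted message: a1 is in 2011, a2 is in 2012.
a1 = (datetime(2011, 11, 16), datetime(2011, 11, 17))
a2 = (datetime(2012, 11, 16), datetime(2012, 11, 17))
year_apart = is_contained(*a1, *a2)

# A wget whose server clock runs two seconds behind the calling script.
script = (datetime(2012, 9, 6, 11, 0, 0), datetime(2012, 9, 6, 11, 0, 30))
wget = (datetime(2012, 9, 6, 10, 59, 58), datetime(2012, 9, 6, 11, 0, 29))
skewed = is_contained(*wget, *script)
```

With these values the year-apart pair is rejected, while the skewed wget is accepted only because of the tolerance; with a zero tolerance it would be rejected too.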

This was difficult enough to settle for entities; now we would not only have to think about activity-to-activity, but also about the implications for the other relations.

However, the arguments we used for adding prov:specializationOf and prov:alternateOf would very easily also apply to activities:

* Equivalent activities can be expressed at different granularities (prov:wasSubActivityOf ?)
* Equivalent activities can be expressed using alternate interpretations (prov:alternateActivity ?)

So given this, why do we allow nesting and alternatives for entities, and not for activities?

I strongly recognize the need for the expression of subactivities - but I am very afraid of all of these questions, and it is not like our model is not getting complex enough already.

I would prefer to simply introduce it as a dcterms:hasPart (please, don't use dc !) kind of notion with no particular interpretation attached - it is simply a guide to the reader, like prov:alternateOf. Perhaps prov:partOfActivity, to avoid the implications of "sub"? (i.e. are you allowed to be part of multiple activities? I think we should not restrict that.)

It still raises the question about entities generated by both activities and the generation-uniqueness constraint.

One way around it, as I've approached it for Taverna's workflow PROV, is to use prov:alternateOf between two entities, one per generation/invalidation. You can picture these entities as representing "The value at output gate X" and "The value at output gate Y" - almost like the old prov:EntityInRole.

By the same reasoning, a washed car coming out of the last-stage activity(polishing), and thereby completing the activity(carWashing), can be seen as generated twice, once as "polishedCar" and once as "washedCar" - even though there is nothing happening between the two activities and the two entities are equivalent.
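
A rough sketch of the prov:alternateOf twinning described above, assuming generations are available as plain (entity, activity) pairs. The function name twin_entities and the minted identifier scheme (entity, then "_at_", then activity) are hypothetical and chosen only for illustration; they are not how Taverna or any PROV tool names things.

```python
from collections import defaultdict


def twin_entities(generations):
    """Split every entity that has several generations into one twin per
    generating activity, and link the twins with alternateOf pairs.

    generations: iterable of (entity, activity) pairs.
    Returns (new_generations, alternates), where alternates is a list of
    (twin, first_twin) pairs standing in for prov:alternateOf statements.
    """
    by_entity = defaultdict(list)
    for entity, activity in generations:
        by_entity[entity].append(activity)

    new_generations = []
    alternates = []
    for entity, activities in by_entity.items():
        if len(activities) == 1:
            # A single generation already satisfies generation-uniqueness.
            new_generations.append((entity, activities[0]))
            continue
        # Mint one identifier per generation (hypothetical naming scheme).
        twins = [f"{entity}_at_{activity}" for activity in activities]
        new_generations.extend(zip(twins, activities))
        # Link every later twin back to the first one.
        alternates.extend((twin, twins[0]) for twin in twins[1:])
    return new_generations, alternates


# The car wash example: one car, generated by both activities.
car_generations, car_alternates = twin_entities(
    [("car", "polishing"), ("car", "carWashing")]
)
```

Each twin now has exactly one generation, at the cost of minting an extra identifier for every additional generating activity.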

If this is the recommended approach, then it would be good to have a property to clarify that this is not just any odd alternate; say prov:alternateInSubActivity (as a property on the prov:Entity, or a subproperty of prov:alternateOf). Otherwise it gets tricky to query across the provenance; we don't want to follow every odd alternate up and down the trace.

The strange thing here is that you don't *need* to do the prov:alternateOf wrapping for usage or association. The question then also becomes to what extent the subactivities should always twin the entities or not.

I don't particularly like that "work around" approach for subactivities, as it ends up making a verbose "twin world" with alternate identifiers (which you have to mint) - effectively making an inline bundle without clear boundaries.

The second way, much simpler and my preference, is to allow multiple generation, but only as long as one activity is a subactivity of the other. I guess we can't infer which one is the sub and which one is the super - so it would be a constraint rather than an inference, but this gets tricky with the open world assumption and the use of OR/NOT.

(This can be solved by adding a prov:alternateActivityFor as a symmetric superproperty of prov:wasSubActivityOf; then, instead of the constraint, we can simply infer prov:alternateActivityFor on multiple generations. The semantics of prov:alternateActivityFor would be particularly weak, similar to prov:alternateOf.)

This is indeed the approach we have taken for Wf4Ever's 'simplified' workflow provenance model wfprov:

http://wf4ever.github.com/ro/#wfprov

Here wfprov:wasPartOfWorkflowRun is the workflow equivalent of wasSubActivityOf, and both activities are allowed to have the same artifact (i.e. entity) as their wfprov:wasOutputFrom. Because of this, we currently can't make wfprov:wasOutputFrom a subproperty of prov:wasGeneratedBy without violating PROV-Constraints.
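
The second approach above can be sketched in two variants at once: the strict constraint (multiple generation is only allowed when one activity is asserted as a subactivity of the other) and the weaker inference (any pair of activities generating the same entity is inferred to be related by the proposed prov:alternateActivityFor). The function name check_generations and the data layout are assumptions made for this illustration; sub_activity_of is taken as asserted, without transitive closure.

```python
from collections import defaultdict
from itertools import combinations


def check_generations(generations, sub_activity_of):
    """Examine entities that were generated by more than one activity.

    generations: iterable of (entity, activity) pairs.
    sub_activity_of: set of asserted (sub, super) activity pairs.
    Returns (violations, inferred):
      violations: (entity, a, b) triples where neither activity is asserted
                  to be a subactivity of the other (strict constraint).
      inferred:   set of frozenset({a, b}) pairs, standing in for the
                  symmetric alternateActivityFor relation (weak inference).
    """
    by_entity = defaultdict(set)
    for entity, activity in generations:
        by_entity[entity].add(activity)

    violations = []
    inferred = set()
    for entity, activities in by_entity.items():
        # Sort so the pair order is deterministic.
        for a, b in combinations(sorted(activities), 2):
            inferred.add(frozenset((a, b)))
            if (a, b) not in sub_activity_of and (b, a) not in sub_activity_of:
                violations.append((entity, a, b))
    return violations, inferred


generations = [
    ("car", "polishing"), ("car", "carWashing"),   # nested activities
    ("report", "a1"), ("report", "a2"),            # unrelated activities
]
violations, inferred = check_generations(
    generations, sub_activity_of={("polishing", "carWashing")}
)
```

Here only the report is flagged under the strict constraint, while the inference variant flags nothing and simply records both activity pairs as alternates.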

As we don't want to make the model too verbose, we are trying to avoid adding the equivalent of the prov:alternateOf workaround I sketched above.

--
Stian Soiland-Reyes, myGrid team
School of Computer Science
The University of Manchester
Received on Thursday, 6 September 2012 10:12:30 UTC