Comments on Provenance Data Model (PROV-DM) Draft

Hi all,

As I mentioned in this morning's telecon, I went over the data model
document last week as I was thinking about scenarios and examples for
the primer.  I think there are key concepts that could be described
better, perhaps using more commonly occurring examples.

My main comments are summarized below.  I will add them to the issue  
tracker.

Yolanda




1) Definitions of "Activity"/"Event"/"ProcessExecution" should be
crisper and better differentiated

In Section 2.1, the distinction made between "activities" and "events"
is very unclear.  They should be better differentiated and, more
importantly, they should be related to ProcessExecution, which should
also be better defined.  Examples of all three should be given to
illustrate the distinctions.


2) Definition and examples of "agent" should be clarified

In Section 3, "agents" are defined as capable of controlling a
process.  This is a key concept, and I still have trouble with that
definition.  Later examples use the Royal Society.  I think it is
important that we explain that if Carol runs the process and works
for the Royal Society, it may be more important to record that the RS
ran it than that Carol herself did.  IMHO (and I brought this up on a
call some weeks ago), the notion of agent must be tied to a
participating entity (as described in Section 5.3.8) that is noted in
the provenance record as accountable (or, if that is too legalistic a
term, responsible) for the action.  In any case, the current
definition should be better supported by examples, like the Royal
Society one.  Also, Sections 4.1/4.2 have examples of agents, but they
are all people (all 5 of them).  It would be good to broaden the
examples to better illustrate what can be considered an agent.
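
To make the Carol / Royal Society point concrete, here is a rough
sketch in Python.  It is not PROV-DM syntax, and every name in it
(Agent, ProcessExecution, add_participant, accountable_agents, the
"publish report" label) is hypothetical and made up for illustration.
It only shows the idea that several entities can participate in a
process execution while the record marks which of them is held
responsible.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Agent:
    """Hypothetical agent: a person, an organization, or software."""
    name: str
    kind: str


@dataclass
class ProcessExecution:
    """Hypothetical record of one process execution."""
    label: str
    participants: list = field(default_factory=list)
    responsible: list = field(default_factory=list)

    def add_participant(self, agent, responsible=False):
        # Every agent is recorded as a participant; only some are
        # additionally noted as accountable for the action.
        self.participants.append(agent)
        if responsible:
            self.responsible.append(agent)


def accountable_agents(pe):
    """Names of the agents the record holds responsible."""
    return [agent.name for agent in pe.responsible]


carol = Agent("Carol", "person")
royal_society = Agent("Royal Society", "organization")

# Illustrative label only; not taken from the draft.
pe = ProcessExecution("publish report")
pe.add_participant(carol)                           # runs the process
pe.add_participant(royal_society, responsible=True)  # accountable for it
```

Here both Carol and the Royal Society participate, but only the Royal
Society is recorded as responsible, which is the distinction I think
the definition of agent should capture.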


3) Simplify the references to "recipe"

Section 4.1:  I'd suggest adding to the example a CSS stylesheet used
to create a web page as a "recipe".  It seems to me like a simple
example of a recipe that everyone with a web background can
understand.  It is also an example that no one would think of
representing formally (as opposed to a workflow or a series of steps).

Section 5.2.2: the recipe link is described as "a domain-specific
description of the activity".  It is not clear that the recipe link
needs to be a description (e.g., the CSS example above).


4) Improve the examples to make them more intuitive and of broader  
appeal

Section 4.2:  It seems to me we are using non-intuitive or incomplete
notions in the examples, which will make our documents that much
harder to understand and the standard therefore harder to adopt.  For
instance, if evt1, evt2, etc. are timestamps, why not label them t1,
t2, etc. so they don't have a label that makes them look like events?
Another case: it says "A file is read by a process execution".
Treating the reading of a file as a ProcessExecution seems to me a
very contrived example (I don't think we've ever discussed a
provenance scenario where file reading was considered, because there
are other more pressing processes to represent).  Another case:
somewhere it mentions "spellchecked" as an attribute.  If so, we should
really show how the spellchecker program plays a role in the
provenance record so that this attribute comes to hold.  Another case:
all the examples of agents are people, but agents can be other things
(e.g., the Royal Society, which is used in another section).  Perhaps
we could use a couple of scenarios of broad interest, for example
publishing a web page that has diverse and rich content, or an example
with linked data.

5) Producing and delivering resources as part of provenance

Section 5.3.3: It says "affected by".  This is an important notion
that is part of the XG's definition of provenance: "Provenance of a
resource is a record that describes entities and processes involved in
producing and delivering or otherwise influencing that resource."
(See http://www.w3.org/2005/Incubator/prov/wiki/What_Is_Provenance
for that definition.)  I think this issue of how manipulating or
delivering a resource can be part of the provenance should be
emphasized earlier and in other sections of the document.

Received on Thursday, 20 October 2011 18:40:28 UTC