Comment on the DM from Khalid Belhajjame on 2012-02-24 (public-prov-wg@w3.org from February 2012)

From: Khalid Belhajjame <Khalid.Belhajjame@cs.man.ac.uk>
Date: Fri, 24 Feb 2012 19:18:22 +0000
To: Luc Moreau <L.Moreau@ecs.soton.ac.uk>, Paolo Missier <Paolo.Missier@ncl.ac.uk>
CC: "<public-prov-wg@w3.org>" <public-prov-wg@w3.org>
Message-ID: <4F47E27E.2010907@cs.man.ac.uk>
Hi,

I mainly read Part-1, and briefly looked at Part-2.
I think that the simplification is going in the right direction.
However, I think part-1 can be simplified further by moving some
definitions and details to part-2. I give more details on this later in
the email.

Below are the comments.

- I think the title of part-2 is misleading, as it contains not only 
constraints but also definitions that are not present in part-1, and it 
revises other definitions to provide more details, e.g., Entity. 
Therefore, I wonder if it would be better to rename part-1 and part-2. I 
couldn’t find better titles, though. I thought of “core prov-dm” for 
part-1, and “extended prov-dm” for part-2, but that is not really what 
the two parts are about.

- ASN is used in part-1, but is not introduced. A brief definition where 
it is first used would be good.

- In the first paragraph of Section 2.1, it is said that “provenance of 
Entities, that is of things in the world”. I am not sure that is the 
case: the provenance of entities is not the same as the provenance of 
things.

- In the same Section 2.1, it is said that “The definition of agent 
intentionally stays away from using concepts such as enabling, causing, 
*initiating*, affecting…”. Isn’t wasStartedBy, which is defined in 
Section 4.2.2.2, used to specify that an agent initiated the execution 
of an activity?

- The examples of generation and usage that are given in Section 2.2 are 
complicated. They aim to give a precise definition of what generation 
and usage are by taking time into account, e.g., “Examples of 
generation are the *completed* creation of a file by a program”. 
However, I think that at this stage it would be less confusing for the 
reader simply to know that the creation of a file is an example of 
generation.

- In Section 2.3, the term “plan” is used in the text without having 
been introduced.

- I have the impression that the diagram presented in Section 2.5 would 
be more useful if placed at the beginning of Section 2. Also, this 
diagram was not clear when I printed it out on paper, i.e., the quality 
of the image is poor.

- The title of Section 3.2 “The Authors View” is confusing. A reader 
who is quickly browsing the document may think that this section gives 
the views of the prov-dm authors about the prov-dm document :-)

- In Section 4, first paragraph: “We revisit each concept *introduction* 
in Section 2” -> introduced

- In the definition of Entity in Section 4.1.1: “id: an identifier 
identifying an entity” -> “id: an entity identifier”.

- In the definition of Entity in Section 4.1.1: “attributes: an Optional 
set of attribute-value pairs *representing this entity’s situation in 
the world*” -> characterizing the thing that the entity represents. Or 
something along these lines.

- In the same section, the constraint that the sets of Activities and 
Entities are disjoint is presented; later on, in Section 4.1.2, this 
constraint is explained further. However, the explanation is based on 
details that are not present in part-1, but are presented later on in 
part-2, specifically that “an entity exists in full at any point in its 
lifetime, persists during this interval, and preserves the 
characteristics that makes it identifiable”. I would therefore suggest 
moving the discussion about the above constraint, i.e., that entities 
and activities are disjoint, to the constraint document.

- In Section 4.2.1.1 Generation, it is said that “While each of the 
components activity, time, and attributes is Optional, at least one of 
them must be present”. I wonder if there is a straightforward way to 
encode this constraint in the serializations of prov-dm, in particular 
prov-o.

- In Section 4.2.3.1 Responsibility Chain, in the definition of 
actedOnBehalfOf, it is specified that activity is optional. We need 
to add some details to specify the semantics of actedOnBehalfOf when 
activity is not given as an argument: does that mean that a given agent 
ag1 acts on behalf of another agent ag2 in all the activities that ag1 
is involved in?

- Section 4.2.3.2 presents derivation. If the objective is to simplify 
part-1, then this section needs serious simplification :-) In 
particular, there are three versions of derivation: precise-1, 
imprecise-1 and imprecise-n. I was thinking of presenting only one, 
e.g., imprecise, without saying that it is imprecise, and giving more 
details about the different kinds of derivations in the constraint 
document. Also, I think traceability, which is presented later on in 
Section 5, is a first-class relation, and therefore should be introduced 
when speaking about entity-entity relations in Section 4.2.3.

- Section 4.2.3.3 on Alternate and Specialization can be moved to 
part-2, since to grasp these relations one needs to have more details 
about what an entity represents, which are given in part-2.

- In Section 4.2 Relation, I think the order in which the subsections 
of this section are presented should be rethought. In particular, I have 
the impression that the reader would be interested to know about 
entity-entity relations, which are probably the most important relations 
in provenance, before getting to know the agent-activity and 
agent-agent relations.

- The table presented in Section 4.2 needs some text that explains to 
the reader how to read it.
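
To illustrate the Generation comment above (Section 4.2.1.1), the “at 
least one of” rule is easy to check in application code, even if it is 
unclear how to express it in prov-o. Below is a minimal Python sketch. 
The Generation class and its field names are hypothetical, chosen only 
for illustration; they are not part of prov-dm or prov-o, and the sketch 
says nothing about how a serialization would encode the constraint.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Generation:
    """Hypothetical in-memory stand-in for a generation record."""

    entity: str
    activity: Optional[str] = None
    time: Optional[str] = None
    attributes: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Each of activity, time and attributes is optional on its own,
        # but a record with none of the three is rejected.
        if self.activity is None and self.time is None and not self.attributes:
            raise ValueError(
                "generation needs at least one of activity, time, attributes"
            )


# Accepted: an activity is given, time and attributes are omitted.
ok = Generation(entity="e1", activity="a1")

# Rejected: none of the three optional components is present.
try:
    Generation(entity="e1")
except ValueError as err:
    print("rejected:", err)
```

A check of this kind lives outside the data model, which is the concern 
raised in the comment: the constraint is simple to state but may have 
no direct counterpart in a given serialization.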

Hope these comments will be of help, khalid
Received on Friday, 24 February 2012 19:18:45 UTC