Re: PROV-ISSUE-409 (prov-dm-review-LC): feedback on PROV-DM document (for last call release) [prov-dm] from Khalid Belhajjame on 2012-06-17 (public-prov-wg@w3.org from June 2012)

From: Khalid Belhajjame <Khalid.Belhajjame@cs.man.ac.uk>
Date: Sun, 17 Jun 2012 14:34:25 +0100
To: Provenance Working Group <public-prov-wg@w3.org>
Message-ID: <CAANah+E3oGDmGNupfgreJuf8k0wrGXMaXyb8yXh4ff5sXbJ2jQ@mail.gmail.com>
Hi,

I read the new draft of the prov-dm. You will find my comments below.
Regarding the editors' question about contextualization: I am not
opposed to its presence in the DM, but its definition should be simplified
substantially (see the comments below).

Regards, khalid

-----------
- At the beginning of the document (PROV Family Specification), it is
stated that "PROV-O, the PROV ontology, an OWL-RL ontology allowing the
mapping of PROV to RDF". I am not sure that PROV-O is entirely OWL-RL
compliant. In PROV-O we have been using the term OWL-RL++, because there are
minor violations of OWL-RL in a few places in the ontology.

- In the table of contents, the titles of Section 4.1 and Section 4.2 could
be more descriptive. As they stand, they are not informative to a reader
browsing the table of contents.

- In the introduction, the list that describes the components does not
match the table of contents: according to the list in the introduction,
component 2 is about agents and component 3 is about derivations, whereas
according to the table of contents, component 2 is about derivations and
component 3 is about agents.

- Section 2 is supposed to be an overview, but it is quite long.

- Section 2 distinguishes between binary and expanded relations. I
am not sure this distinction makes sense in the context of the DM. It was
introduced in PROV-O because the language we are using is not expressive
enough for specifying n-ary relations in a natural way. This is not the case
for PROV-DM: PROV-N allows expressing such relations without a problem. Also,
reading the section on expanded relations from the point of view of a
reader who is not part of the working group, it seems to be a source
of confusion, and I don't see a real benefit in keeping it in the DM.

- In Section 2, when explaining "Usage", it is said that "Usage is the
beginning of utilizing an entity by an activity. Before usage, the activity
had not begun to utilize this entity and could not have been afected by the
entity." This statement does not hold when an entity is used multiple
times by the same activity (e.g., to feed different parameters).
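To make the multiple-use case concrete, here is a minimal Python sketch. The `Usage` record, the integer timestamps and the role names are hypothetical illustrations, not PROV-DM or PROV-N constructs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    """Hypothetical usage record, for illustration only (not PROV-N syntax)."""
    activity: str
    entity: str
    time: int   # assumption: integer timestamps stand in for real times
    role: str   # assumption: the parameter the entity was fed to


def used_earlier(u: Usage, usages: list[Usage]) -> bool:
    """True if the same activity already used the same entity before u."""
    return any(
        o.activity == u.activity and o.entity == u.entity and o.time < u.time
        for o in usages
    )


# Activity a1 uses entity e1 twice, once per parameter.
usages = [
    Usage("a1", "e1", time=1, role="param1"),
    Usage("a1", "e1", time=5, role="param2"),
]

# For the second usage, a1 had already begun to utilize e1, so
# "before usage, the activity had not begun to utilize this entity" fails.
print([used_earlier(u, usages) for u in usages])  # [False, True]
```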

- The discussion following Example 3, which explains that one car is used
and another car is generated at the end of the journey, presents one
possible interpretation, and I don't think it is the most natural one.
A simpler interpretation, which the reader can grasp quickly, is that the
driving activity used a car, and that's it. Not every activity needs to
generate an entity.

- In Section 2.1.3, first paragraph: "more trustworthy that that from a
lobby organization" -> "more trustworthy than that from a lobby
organization"

- In Section 2.1.3, in the statement about Delegation, it may be worth
specifying the scope of delegation: is the delegation valid for a given
activity, or for all activities carried out by the agent?

- In Example 13, "[...] but also determine who its provenance is attributed
to [...]". This sentence implies that an agent is always a human; "who"
could be replaced by "the agent" to avoid confusion.

- The column "Core Structures" in Table 3 is confusing: components 1, 2
and 3 do not contain only core concepts.

- In the UML diagram in Figure 5, as well as in other UML diagrams,
"attributes" is defined as a field of Entity, Activity and others. Looking
only at the UML diagram, the reader may think that there is a field called
attributes!

- In the definition of communication, Section 5.1.5, it is stated that
"Communication is the exchange of an unspecified entity". Why do we require
that the entity be unspecified? Aren't we excluding users who may want
to specify the entity (or entities) exchanged between two activities? I
would suggest rephrasing that sentence along the following lines:
"Communication is the exchange of an entity that may be unspecified".

- I notice that Invalidation (Section 5.1.8) is not present in Figure 5.

- In section 5.2 (Component 2: Derivations), the first sentence in this
section says "The third component".

- I find the definition of "Primary Source" hard to follow. Can we
simplify it?

- In the definition of delegation, the activity is an optional argument.
What are the semantics of delegation when the activity is not specified? I
suspect that it means that the activity for which the delegation holds is
unknown. However, the reader may think that the delegation holds for all the
activities that are carried out by the agent in question.
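A small Python sketch of the two readings, assuming a hypothetical `Delegation` record (not PROV-N syntax) in which a missing activity is `None`; it shows that the two interpretations give different answers for the same statement.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Delegation:
    """Hypothetical record loosely mirroring actedOnBehalfOf(delegate, responsible, activity)."""
    delegate: str
    responsible: str
    activity: Optional[str] = None  # None: no activity given


def holds_unknown_reading(d: Delegation, activity: str) -> Optional[bool]:
    """Reading 1: a missing activity means the activity is unknown."""
    if d.activity is None:
        return None  # cannot tell whether the delegation covers `activity`
    return d.activity == activity


def holds_all_reading(d: Delegation, activity: str) -> bool:
    """Reading 2: a missing activity means every activity of the delegate."""
    return d.activity is None or d.activity == activity


d = Delegation(delegate="ag2", responsible="ag1")
print(holds_unknown_reading(d, "a1"))  # None
print(holds_all_reading(d, "a1"))      # True
```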

- In Section 5.4, the third sentence of the first paragraph: "It comprises
a Bundle class and a subclass of Entity" -> "It specifies that Bundle is a
sub-class of Entity".

- The first sentence in Example 40 states that "A provenance aggregator
could merge two bundles". The verb "merge" has strong semantics that do
not apply in this case. I think we could simply say "could union"?

- Section 5.5.3 on contextualization is difficult to follow. The third
paragraph in this section states that "A bundle's description provide a
context in which to interpret an entity in a domain-specific manner". This
is not reflected in the definition of bundle, which, from my understanding,
aggregates a number of provenance descriptions that happen (by accident) to
be in the same bundle, e.g., a file. The notion of context and domain
dependency introduced in contextualization seems to assume that the
provenance descriptions within a bundle are domain dependent and that they
have been specified within a given context. The notion of context is also
loose, and can mean different things to different people.

Now, looking at Example 45, it may be that the first paragraphs of
Section 5.5.3 are misleading, and that the intended purpose is something
simpler. If the objective is basically to specify that a given entity e1 is
a specialization of another entity e2 and to be able to locate the bundle
in which e2 is described, then we should do just that. In other words, we
should use "specializationOf" and add a construct that specifies the bundle
in which a given entity is described, e.g., isDescribedIn(e2,bundle2).

Therefore, to answer the question that the editor asked regarding
contextualization, I do not oppose its presence in the DM, but I think its
definition should be simplified substantially to reflect the way it will be
used in practice. I would also urge the editors to avoid using the term
contextualization as it is vague.

- In Section 5.6.1, it is stated that a collection is a multiset because it
may not be possible to verify that two distinct entity identifiers do not
denote the same entity. This is one reason, but not the main one.
Collection is a general construct, and we should allow people to construct
collections that contain duplicate entities, with either different or the
same identifiers.
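For illustration, a minimal Python sketch (hypothetical identifiers, with `collections.Counter` standing in for a multiset) of why set semantics would lose information the user intended to record:

```python
from collections import Counter

# Hypothetical collection members, listed by entity identifier.
# e1 occurs twice under the same identifier; e3 and e3-copy are distinct
# identifiers that a user may know denote duplicate entities.
members = ["e1", "e2", "e1", "e3", "e3-copy"]

as_multiset = Counter(members)  # keeps multiplicities
as_set = set(members)           # silently drops the repeated e1

print(as_multiset["e1"])          # 2
print(sum(as_multiset.values()))  # 5
print(len(as_set))                # 4
```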


On 14 June 2012 12:07, Provenance Working Group Issue Tracker <
sysbot+tracker@w3.org> wrote:

> PROV-ISSUE-409 (prov-dm-review-LC): feedback on PROV-DM document (for last
> call release) [prov-dm]
>
> http://www.w3.org/2011/prov/track/issues/409
>
> Raised by: Luc Moreau
> On product: prov-dm
>
>
> This is the issue to collect feedback on the prov-dm document.
>
> Document to review is available from:
>
>
> http://dvcs.w3.org/hg/prov/raw-file/default/model/releases/ED-prov-dm-20120614/prov-dm.html
>
> Question for reviewers:
> http://www.w3.org/2011/prov/wiki/Meetings:Telecon2012.06.14
>
> Cheers,
> Luc
>
>
>
>
Received on Sunday, 17 June 2012 13:34:54 UTC