Re: PROV-ISSUE-409 (prov-dm-review-LC): feedback on PROV-DM document (for last call release) [prov-dm]

Hi Luc, all.
I have gone through the latest draft. I think you and Paolo have done a
very good job. You will find my feedback below,
although I have just realized that some of my points overlap with the ones
posted by Khalid.
Best,
Daniel.

Document reviewed:
http://dvcs.w3.org/hg/prov/raw-file/default/model/releases/ED-prov-dm-20120614/prov-dm.html

*Questions for the reviewers:*

   1. Can the document be released as a WD? *Yes, but take into account the
   comments about contextualization (I think it is not necessary)*
   2. Can the document be released as a last call WD? *Same response as
   the previous one.*
   3. Renaming wasRevisionOf to wasRevisedFrom? *+0. Both terms are ok.*
   4. Primitive datatypes. Do we have to list them all? *I think it makes
   sense for the prov types, but table 8 is unnecessary.*

*Detailed feedback:*

Abstract:
    Provenance is information about entities, activities, and people,
involved in producing a piece of data or thing.
I think that:
    Provenance is information about entities, activities and people
involved in producing a piece of data or thing
is easier to read.

STATUS:
PROV-O is not completely within the OWL-RL profile. The PROV-O team decided
to relax the constraints a bit in order to stay closer to PROV-DM.

2. PROV Overview:
"A provenance description is an instance of a core and extended provenance
structure described below. "
    I don't understand this sentence very well. "Provenance description" is
referred to many times throughout the document, but the only definition I
found was this sentence. Please clarify.
2.1 Prov Core structures
Typo: the title of figure 1 is not placed under the figure.

"In the Core of PROV, all relations are binary".
So do we have 2 different relationships for qualifications? Are used(a1, e1)
and used(a1, e1, [ prov:role="inpu1" ]) different?
That is how it looks when you read this line, yet in DM both usages are
the same relation. Thus I suggest removing the binary vs n-ary distinction.
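
To illustrate the point, here is a minimal Python sketch (my own
illustration, not taken from the draft) in which usage is a single relation
whose attributes are simply optional. The class name Used, its fields and
the role value "input" are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Used:
    """A single usage relation; attributes are optional, not a second relation kind."""
    activity: str
    entity: str
    # Optional (name, value) pairs, e.g. (("prov:role", "input"),). Hypothetical layout.
    attributes: tuple = ()

    def core(self):
        """Return the (activity, entity) pair that both forms share."""
        return (self.activity, self.entity)


plain = Used("a1", "e1")
qualified = Used("a1", "e1", (("prov:role", "input"),))

# Both statements are instances of the same relation and link the same pair.
same_relation = type(plain) is type(qualified) and plain.core() == qualified.core()
```

Under this reading, used(a1, e1) is just the same relation with an empty
attribute list.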

2.1.1
"In PROV, things we want to describe the provenance of are called entities
and have some fixed aspect."
I would change the sentence to "In PROV, things we want to describe the
provenance of are called entities.", since you do not say what "some
fixed aspect" means.

The usage definition states that, before usage, the activity could not have
been affected by the entity. I remember discussing scenarios in PROV-O where
an activity uses the same entity twice with different roles to produce
different results. According to this definition, that scenario would not be
possible. Therefore I suggest removing the "affected" part of the
definition. This also applies to section 5.1.4.
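
A small Python sketch of the scenario I have in mind, under my own
assumptions: the Usage record, the role names and the integer timestamps
are all hypothetical and only serve to show that the second usage is
preceded by an earlier usage of the same entity by the same activity.

```python
from typing import NamedTuple


class Usage(NamedTuple):
    activity: str
    entity: str
    role: str
    time: int  # hypothetical logical timestamp


# The same activity uses the same entity twice, with different roles.
usages = [
    Usage("a1", "e1", "template", 1),
    Usage("a1", "e1", "parameter", 5),
]


def earlier_usages(records, usage):
    """Return usages of the same entity by the same activity that precede `usage`."""
    return [
        u for u in records
        if u.activity == usage.activity
        and u.entity == usage.entity
        and u.time < usage.time
    ]


second = usages[1]
previous = earlier_usages(usages, second)
```

Here previous is not empty, so at the time of the second usage the
activity may already have been affected by e1.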

(After example 3)
"This is answered by considering that a single artifact may correspond to
several entities;"
We have not talked about artifacts until now. I suggest replacing "artifact"
with "thing", which has already been used before in the examples.
We find the artifact reference again at the end of the paragraph: "This
breadth of provenance allows descriptions of interactions between physical
and digital artifacts"

2.2.2
Example 13 is not an example, it is just an explanation of the definition.
I suggest rephrasing it as an example (for instance, there could be many
provenance descriptions about a resource, and a client that wants to see who
created them in order to filter them by creator or date of creation).

Section 3
"PROV-N optional arguments need not be specified:"
->
typo: PROV-N optional arguments need not TO be specified:

4.1
Caption of figure 2 is out of place. Also happens in figure 3 (section 4.2).

5
Table 5. I think there is a typo with Revision, Quotation and Primary
Source. They do not appear as relationships.

Figure 5: the title is not centered under the figure.

5.2 (Derivations).
The third component of PROV-DM is concerned with: derivations of entities
from others, ...
I suggest changing the sentence slightly:
The third component of PROV-DM is concerned with: derivations of entities
from other entities, ...

Caption of Figure 6 is not under the figure.
Caption of Figure 7 is not under the figure.

Typo in Example 34: instead of using prov:role, the example uses prov:type

Figure 8: the caption is not under the figure.

Figure 9: the caption is not under the figure.

5.5.3: Contextualization
I'm not entirely convinced by this section. It reminds me a lot of what
the accounts were trying to do, and we decided to leave accounts
out of the DM because they were making the model complex.

Example 45 could be modeled without contextualization by just pinning the
rating to the activity and the agent. The rating tool could
just say: ex:Bob ex:hadGoodPerformanceIn ex:a1.

In example 46, I don't see why I have to create a different entity for
asserting the triples about the rendering. I would just reuse
the entity I had in the first bundle. Since the triples are asserted in
different bundles, I can always filter the statements about
an entity depending on the bundle they belong to. I thought that the bundles
were there for doing that too.
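
As a rough Python sketch of what I mean by filtering per bundle: the bundle
names, the triples and the helper statements_about are made up for
illustration and are not the identifiers used in Example 46.

```python
# Hypothetical bundles, each holding (subject, predicate, object) statements.
bundles = {
    "ex:bundle1": [
        ("ex:report1", "prov:wasGeneratedBy", "ex:a1"),
    ],
    "ex:bundle2": [
        ("ex:report1", "ex:renderedAs", "ex:chart1"),
        ("ex:other", "prov:wasGeneratedBy", "ex:a2"),
    ],
}


def statements_about(entity, bundle_id=None):
    """Return (bundle, statement) pairs whose subject is `entity`.

    If `bundle_id` is given, only that bundle is searched.
    """
    if bundle_id is None:
        selected = bundles
    else:
        selected = {bundle_id: bundles.get(bundle_id, [])}
    return [
        (name, statement)
        for name, statements in selected.items()
        for statement in statements
        if statement[0] == entity
    ]


everything = statements_about("ex:report1")
rendering_only = statements_about("ex:report1", "ex:bundle2")
```

The same entity identifier is reused in both bundles, and the bundle name
alone is enough to separate the rendering statements from the rest.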

From a usability point of view, contextualization makes it difficult to
retrieve provenance information about an entity, IMO.

However, I haven't been participating actively in the discussions about this
on the mailing list, so I don't object to having it.
I just think that it overcomplicates the model, and I don't think I'll use
it.

Figure 10: the caption is not under the figure.

Example 54: in the example, it should be Le Louvre instead of Le Louvres,
right?

Note after Example 57: No, I don't think prov:encoding is necessary. That
is domain specific.

I don't think table 8 is necessary. It would be enough to say that the
prov:values are compatible with the XSD standard.

2012/6/17 Khalid Belhajjame <Khalid.Belhajjame@cs.man.ac.uk>

> Hi,
>
> I read the new draft of the prov-dm. You will find my comments below.
> Regarding the question of the editors about contextualization: I am not
> opposed to its presence in the DM, but its definition should be simplified
> substantially (see the comments below).
>
> Regards, khalid
>
> -----------
> - In the beginning of the document (PROV Family Specification), it is
> stated that "PROV-O, the PROV ontology, an OWL-RL ontology allowing the
> mapping of PROV to RDF". I am not sure that PROV-O is entirely OWL-RL
> compliant. We have been using the term OWL-RL++ in PROV-O, because there
> are minor violations of OWL-RL in a few places in the ontology.
>
> - In the Table of Content, the titles of Section 4.1 and Section 4.2 may
> need to be detailed a bit more. As they are, they are not informative at
> the level of the table of content, when the reader is browsing.
>
> - In the introduction, in the list that describes the components, there is
> a mismatch between this list and the components in the table of contents:
> according to the list in the introduction, component 2 is about agents and
> component 3 is about derivations, whereas according to the table of
> contents, component 2 is about derivations and component 3 is about agents.
>
> - Section 2 is supposed to be an overview, but it is quite long.
>
> - Section 2 makes the difference between binary and expanded relations. I
> am not sure this makes sense in the context of the DM. It was introduced in
> PROV-O, because the language we are using is not expressive enough for
> specifying n-ary relations in a natural way. This is not the case for
> PROV-DM: PROV-N allows expressing such relations without a problem. Also,
> reading the section on Expanded relations from the point of view of a
> reader who is not part of the working group, it seems that this is a source
> of confusion, and I don't see a real benefit from its presence in the DM.
>
> - In Section 2, when explaining "Usage", it is said that "Usage is the
> beginning of utilizing an entity by an activity. Before usage, the activity
> had not begun to utilize this entity and could not have been affected by the
> entity." This statement does not hold when an entity is used multiple
> times by the same activity (e.g., to feed different parameters).
>
> - The discussion that follows Example 3 explains that a car is actually
> used and that another car is generated at the end of the journey. This is
> a possible interpretation, but I don't think it is the most natural one.
> A simpler interpretation, which the reader may grasp quickly, is that the
> driving activity used a car, and that's it. Not every activity
> needs to generate an entity.
>
> - In Section 2.1.3, first paragraph: "more trustworthy that that from a
> lobby organization" -> "more trustworthy than that from a lobby
> organization"
>
> - In Section 2.1.3, in the statement about Delegation, it may be worth
> specifying the scope of delegation: is the delegation valid for a
> given activity or for all activities carried out by the agent?
>
> - In example 13, "[...] but also determine who its provenance is
> attributed to [...]". This sentence implies that an agent is always a
> human. "who" can be replaced by "the agent" to avoid confusion.
>
> - The column "Core Structures", in Table 3, is confusing. Components 1, 2
> and 3 do not contain only core concepts.
>
> - In the UML diagram in Figure 5, as well as in other UML diagrams,
> "attributes" is defined as a field for Entity, Activity and others. Looking
> just at the UML diagram, the reader may think that there is a field called
> attributes!
>
> - In the definition of communication, Section 5.1.5, it is stated that
> "Communication is the exchange of an unspecified entity". Why do we require
> that the entity should be unspecified? Aren't we restricting those who may
> want to specify the entity (or entities) exchanged between two activities?
> I would suggest rephrasing that sentence along the following
> lines: "Communication is the exchange of an entity that may be unspecified".
>
> - I notice that Invalidation (Section 5.1.8) is not present in Figure 5.
>
> - In section 5.2 (Component 2: Derivations), the first sentence in this
> section says "The third component".
>
> - I find the definition of "Primary Source" hard to follow. Can we
> simplify it?
>
> - In the definition of delegation, the activity is an optional argument.
> What are the semantics of delegation when the activity is not specified? I
> suspect that it means that the activity for which the delegation holds is
> unknown. However, the reader may think that the delegation holds for all the
> activities that are carried out by the agent in question.
>
> - The first paragraph, 3rd sentence, in Section 5.4, "It comprises a
> Bundle class and a subclass of Entity"-> "It specifies that Bundle is a
> sub-class of Entity".
>
> - The first sentence in Example 40 states that "A provenance aggregator
> could merge two bundles". The verb "merge" has strong semantics that do
> not apply in this case. I think we could simply say "could union"?
>
> - Section 5.5.3 on contextualization is difficult to follow. The third
> paragraph in this section states that "A bundle's description provide a
> context in which to interpret an entity in a domain-specific manner".  This
> is not reflected in the definition of bundle, which, from my understanding,
> aggregates a number of provenance descriptions that happen (by accident) to
> be in a bundle, e.g., a file. The notion of context and domain dependency
> introduced in contextualization seems to assume that the provenance
> descriptions within a bundle are domain dependent and that they
> have been specified within a given context. The notion of context is also
> loose, and can mean different things to different people.
>
> Now, looking at example 45, it may be that the first paragraphs in
> Section 5.5.3 are misleading, and that the purpose is to have something
> simple. If the objective is basically to specify that a given entity e1 is
> a specialization of another entity e2 and to be able to locate the bundle
> in which e2 is described, then we should just do that. In other words, we
> should use "specializationOf", and add a construct that specifies the bundle
> in which a given entity is described, e.g., isDescribedIn(e2,bundle2)?
>
> Therefore, to answer the question that the editor asked regarding
> contextualization, I do not oppose its presence in the DM, but I think its
> definition should be simplified substantially to reflect the way it will be
> used in practice. I would also urge the editors to avoid using the term
> contextualization as it is vague.
>
> - In section 5.6.1, it is stated that collection is a multiset because it
> may not be possible to verify that two distinct entity identifiers do not
> denote the same entity. This is one reason, but not the main one.
> Collection is a general construct, and we should allow people to construct
> collections that contain duplicate entities with different or same
> identifiers.
>
>
> On 14 June 2012 12:07, Provenance Working Group Issue Tracker <
> sysbot+tracker@w3.org> wrote:
>
>> PROV-ISSUE-409 (prov-dm-review-LC): feedback on PROV-DM document (for
>> last call release) [prov-dm]
>>
>> http://www.w3.org/2011/prov/track/issues/409
>>
>> Raised by: Luc Moreau
>> On product: prov-dm
>>
>>
>> This is the issue to collect feedback on the prov-dm document.
>>
>> Document to review is available from:
>>
>>
>> http://dvcs.w3.org/hg/prov/raw-file/default/model/releases/ED-prov-dm-20120614/prov-dm.html
>>
>> Question for reviewers:
>> http://www.w3.org/2011/prov/wiki/Meetings:Telecon2012.06.14
>>
>> Cheers,
>> Luc
>>
>>
>>
>>
>

Received on Monday, 18 June 2012 15:13:14 UTC