Re: Comments on the current data model from Luc Moreau on 2011-10-10 (public-prov-wg@w3.org from October 2011)

From: Luc Moreau <L.Moreau@ecs.soton.ac.uk>
Date: Mon, 10 Oct 2011 17:05:48 +0100
To: public-prov-wg@w3.org
Message-ID: <EMEW3|20a2bd87c82ef6ddbad6f617bcbe55c0n99H5q08L.Moreau|ecs.soton.ac.uk|4E9317DC>

Hi Simon,

Thanks for your feedback.  Here is a first set of responses to your 
comments.
Our responses are interleaved.  Changes, where indicated, have already been
committed.

There is still a list of comments to address (marked with TODO).

Cheers,
Luc



 >Luc, Paolo,
 >
 >Here's my comments on the current data model document, annotated with
 >(T) for typo/text clarity or (C) for content comment/question. I think
 >most/all comments are small enough that an issue need not be raised.
 >
 >Throughout:
 >(T) Sections are referred to in the text by "Section Entity", "Section
 >Process Execution" etc. Shouldn't these be the section numbers?

TODO when stable

 >(T) There seems to be inconsistency in symbols following the change
 >from roles to qualifiers. Sometimes "q" is used in constraint
 >definitions, examples etc. and sometimes "r" is used. I suggest it
 >would be clearer to always use "q".

Done (hopefully everywhere)

 >(T) There are a few "characterised" in amongst the majority
 >"characterized" spelling.

Done

 >(C) At least one standard qualifier name, "role", is used in the
 >document, but it is not clear what namespace this name is in. Does it
 >mean no other "role"s from domain-specific ontologies may be used in
 >Prov data?

Added section 1.2, which explicitly declares the PROV-DM namespace,
and stated that role is declared in that namespace.

 >
 >Sec 2.1:
 >(T) paragraph 1: "Words such thing or activity" should be "Words such
 >as 'thing' or 'activity'"

Done

 >(C) paragraph 2: The first mention of "provenance" in the document
 >proper is in the second paragraph of this section, and is a bit out of
 >the blue ("unambiguously report provenance"). Can we add some
 >intuition about what provenance is (for this data model)?


Provenance is now introduced in section 1.


 >(T) Example paragraph 1: "perspectives about a resource" should be
 >"perspectives on a resource"

Done

 >(C) Example paragraph 1: "the report independent of where it is hosted
 >over time" - I suggest also saying "and of its content over time", to
 >distinguish this entity from the report version entity above it

Done

===================
 >(C) paragraph 6: "punctual events"? "punctual" as most commonly used
 >implies prior planning of when something should occur. I'm not sure
 >what you are intending in this context.

???? I don't understand
PM: "instantaneous"?
Luc: OK
===================

===================
 >(C) paragraph 6: "a partial order exists between events". I assume you
 >mean a temporal order? What kind of ordering do you mean?

... partial ;-) .... between events...
What is the issue?

PM: Agree, the statement seems clear to me.
It's an order amongst events, not instants in time.
It's partial: you can't always say ev1 is before ev2.

Luc: No change then.
===================
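
As an illustration of the point above (not part of the draft): in a
partial order some pairs of events are ordered and others are simply not
comparable. A minimal Python sketch, assuming hypothetical event names
ev1..ev3 and a hand-written set of "precedes" facts:

```python
from itertools import product

# Hypothetical events and direct "precedes" facts; not taken from the draft.
EVENTS = {"ev1", "ev2", "ev3"}
DIRECT = {("ev1", "ev2")}  # the only ordering we know: ev1 precedes ev2


def transitive_closure(pairs):
    """Return the smallest transitive relation containing `pairs`."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot so the set can grow inside the loop.
        for (a, b), (c, d) in product(tuple(closure), repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure


PRECEDES = transitive_closure(DIRECT)


def compare(x, y):
    """Classify two events as 'same', 'before', 'after' or 'unordered'."""
    if x == y:
        return "same"
    if (x, y) in PRECEDES:
        return "before"
    if (y, x) in PRECEDES:
        return "after"
    return "unordered"
```

Here compare("ev1", "ev2") gives "before", while compare("ev1", "ev3")
gives "unordered": nothing can be said about that pair.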



===================
 >(C) paragraph 6: "global notion of time and Lamport's style clocks" -
 >this seems like a weirdly specific level of detail for this overview
 >section, especially considering that many other aspects of the model
 >are not mentioned at all in the overview.

Given that time is so critical and the object of several issues, it's
important to state our assumptions.

PM: I think you wrote it correctly, but also we are not stating any
assumptions: you say that this is out of scope and point to a possible
frame of reference. I see no problems.
===================
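
For readers unfamiliar with the frame of reference mentioned above, a
Lamport-style logical clock can be sketched as follows. This is only an
illustration of the general technique; the class and method names are
hypothetical and nothing here is prescribed by the data model.

```python
class LamportClock:
    """Logical clock: a counter that respects message causality."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Advance the clock for a local event and return the new time."""
        self.time += 1
        return self.time

    def send(self):
        """Sending is an event; the returned timestamp travels with the message."""
        return self.tick()

    def receive(self, received):
        """On receipt, jump past both the local time and the message timestamp."""
        self.time = max(self.time, received) + 1
        return self.time
```

With two clocks a and b, a message sent by a always carries a smaller
timestamp than the one b assigns to its receipt, so the send is ordered
before the receive.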


 >
 >Sec 2.3:
 >(C) Regarding the note (not attempting to ensure consistency of an
 >asserter) - this seems practical. I'm not sure how we could enforce
 >consistency in any circumstance, only define what it means or say it
 >is application specific.

Rephrased.

 >
 >Sec 4.1:
 >(T) "We denote this e1." and the same for e2 etc. It is not entirely
 >clear whether "this" refers to the event or the entity.
 >

TODO

 >Sec 4.2:
 >(C) The fact that Alice is the creator of e1 seems to be expressed
 >twice, first as an attribute "creator=Alice", and secondly as the
 >"creator" role of an agent in the creation process. I don't think it
 >is a good idea for either clarity of use of the model or for ensuring
 >interoperability for there to be multiple ways to express the same
 >thing, if it can be at all avoided. Even if we cannot stop someone
 >using either method, can't we say which they *should* use to aid
 >interoperability?

TODO

 >(T) "Generation expressions... represent the event at which a file is
 >created". The surrounding text is generic rather than specific to the
 >example, implying this should be "entity" rather than "file",
 >Otherwise, readers may assume that all entities are files or that
 >generation only applies to files.

TODO

 >(T) Paragraph on wasComplementOf: in "attribute content" and
 >"attribute spellchecked", fixed width font (or another font) should be
 >used for the attribute names to show they are names, else the sentence
 >can be read in strange ways.

Done.

 >
 >Sec 4.3:
 >(T) Fig 1: The arrow from pe2 to a3 is a different direction to the
 >other "agent" links. It is also not clear if an "agent" link is the
 >same as a "wasControlledBy" link. If so, the pe2-a3 arrow direction
 >makes most sense, as the others seem to be saying the agent was
 >controlled by the process execution.
 >

TODO

 >Sec 5.1:
 >(T) The last sentence, regarding a "house-keeping construct" is rather
 >opaque. I'm not sure what the reader is supposed to understand from
 >this.
 >

Rewritten

 >Sec 5.2.1:
 >(C) First sentence: "entity expression" is given exactly the same
 >definition that "entity" was in Section 4. I think having two terms
 >for the same thing will cause confusion. I like addition of
 >"expressions" to the model in general, though, as I think this greatly
 >clarifies what is intended.

TODO: Well, the issue is that in the ER diagram we really talk about
Entity Expressions, not Entities.
Given that we are about to define entity as a characterized thing, should
the ER diagram change?

 >(C) "the meaning of attribute in the context of a process execution
 >expression is similar to the meaning of attribute for entity
 >expression" - I think the meaning should be exactly the same, not just
 >similar, else there will be confusion.

Replaced "similar" with "same", and "meaning" with "interpretation".


 >(C) Following from the above point: "A process execution expression's
 >attribute remains constant for the duration of the activity" - OK, but
 >does it also characterise the process execution, e.g. is the start
 >time part of what distinguishes one execution from others?

TODO: What if we say that attributes remain constant and characterize the
activity? What's the implication? What if we don't?

 >(T) "noted processExecution" - I think you mean "denoted" (or
 >"written" or "expressed")

Done, replaced by "written".

 >
 >Sec 5.2.3:
 >(T) "representation a characterized thing" - missing "of"

Done

 >(T) Last sentence, "On the contrary" should be "On the other hand",
 >and "inferred" should be "infer"

Done
 >
 >Sec 5.2.4:
 >(T) Last sentence: "expectede"

Done

===================
 >Sec 5.3.3.1:
 >(C) I suggest that, as accounts are not introduced until later in the
 >document, the generation-unicity constraint will not make sense here.
 >Moreover, I think the constraint is more about accounts and what it
 >means for them to be consistent than it is about generation events or
 >process executions. Therefore, I suggest moving this constraint to the
 >section on accounts.


I am not sure I agree. I think it says a lot about generation, since
provenance assertions are always in accounts (even if it is a default
account of the provenance container).

PM: The account is mentioned for accuracy of definition here. If you
didn't know about accounts, then this would just be correct without
qualification.
So I would leave it as is.

Luc: OK
===================


===================
 >(C) Given that constraint derivation-events applies, don't we just
 >have two ways of saying the same thing? Why use the long form of
 >wasDerivedFrom when the same can be expressed using wasGeneratedBy and
 >used? Which variety *should* be used?


It's not an equivalence, it's an implication.  We don't have two ways
of saying the same thing.

PM: Agree with you, but this set of constraints may also come across as
odd: there is derivation-use, but should there be derivation-generation
as well?

Luc: the text below derivation-use says that the symmetric inference 
does not hold.

Also, can we put derivation-attributes first? It is the one that defines
the meaning of derivation.

Luc: Done
===================
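
The distinction between implication and equivalence discussed above can
be sketched in Python. This is an illustration only: it assumes the
derivation-use constraint reads roughly as "if wasDerivedFrom(e2,e1) and
wasGeneratedBy(e2,pe) hold, then used(pe,e1) holds", it omits qualifiers,
and the function name is hypothetical.

```python
def infer_used(derived, generated):
    """Apply the assumed derivation-use rule.

    derived:   set of (e2, e1) pairs, read as wasDerivedFrom(e2, e1)
    generated: set of (e2, pe) pairs, read as wasGeneratedBy(e2, pe)
    Returns the set of (pe, e1) pairs, read as used(pe, e1).
    """
    return {
        (pe, e1)
        for (e2, e1) in derived
        for (gen_entity, pe) in generated
        if gen_entity == e2
    }


# Derivation plus generation lets us conclude a use ...
derived = {("e2", "e1")}
generated = {("e2", "pe1")}
used = infer_used(derived, generated)

# ... but the sketch deliberately has no rule in the other direction:
# used and wasGeneratedBy facts alone do not yield a wasDerivedFrom fact.
```

Calling infer_used with an empty derived set returns an empty set, which
is the sense in which the two forms are not interchangeable.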

 >
 >Sec 5.3.3.2:
 >(T?) Constraint "derivation-linked-independent" seems to be a
 >tautology. I guess this is a typo?

Fixed

 >
 >Sec 5.3.3.3:
 >(T) Paragraph 4: "In other word" should be "In other words"
 >

Done

 >Sec 5.3.4:
 >(C) This section seems to be confusingly expressed, implying that
 >non-agent entities can control executions, whereas the control-agent
 >constraint (in the section on agents) contradicts this. It is probably
 >just a matter of clarifying the text, e.g. if you mean that a
 >non-agent entity can be asserted to be controlling an execution but
 >from this inferred to be an agent.

TODO

===================
 >(T) The text may be read to imply that a control link has only one
 >qualifier, role, whereas I guess you mean that, like use/generate, it
 >can have multiple "modalities" as part of the qualifier?

TODO: what other meaningful qualifier could we use for control?

PM: Don't know, but for consistency I think we should add them.

Luc: I was thinking of properties such as synchronous/asynchronous 
control. Do they make sense?
===================


===================
 >Sec 5.3.5:
 >(C) I can see this section causing some difficulty... While that may
 >just be the nature of the topic, there seems an important thing
 >missing: what has complementarity got to do with provenance? In other
 >words, what value (with regards to provenance) is there in asserting
 >complementarity?

TODO

PM: We discussed this for ages. It should be there. To me this is about a
formal definition of how entities can be compared across accounts.
PM: There was a lot of noise about changing the term "complement-of". Have
we ever considered that?

Luc: We should put a note for now.
===================

 >(C) The text suddenly starts talking about "properties" from the
 >second paragraph. What are these, and do they have any relation to
 >attributes?

We had agreed it should be 'attribute(s)'. Text updated.


===================
 >(C) Should the justification of why the complementarity relation is
 >not transitive be in this document? I would expect this document to
 >just state that it is not transitive and, for brevity and simplicity,
 >leave justifications to another document.

At this time, there is no such other document. It also brings intuition.
So, no change.


PM: Agree, you can't just state it's not transitive. We have had a long
discussion on this, which indicates that readers would be puzzled if
there was no justification.

===================


 >
 >Sec 5.3.6:
 >(C) Similarly to above, I'm not sure the justification of why
 >wasInformedBy is not transitive should be in this document.

Same.

===================
 >Sec 5.3.8:
 >(C) Constraint participation: This seems odd to me. In what
 >circumstances would you not know or want to assert which of the three
 >possibilities (used/controlled/complement) applied for a given entity
 >and execution? Is hadParticipant as defined really useful?

I am not a fan of it.  This said, it's one of those "extensions" and
should probably be moved to section 7.

PM: Yes, it /is/ odd, but keep in mind that the OWL group is including it.

Luc: I think it's interesting for complementOf. So you can say that e0
participated in pe1, I think, in the example.
===================

 >
 >Sec 5.3.9:
 >(C) Grammar definition: I don't understand what the
 >"relationIdentification" stuff is about or what all the identifiers
 >identify.

Grammar revisited. Example extended. Explanation provided.
Not sure the grammar allows for all forms of relations to be captured.

===================
 >Sec 5.4.1:
 >(C) This appears to be yet another way to say the same thing,
 >following the comment on Sec 4.2 above. If A is an "asserter" of
 >expression E, then we can either (i) express E to be an entity and use
 >an attribute "asserter=E"; (ii) express E to be an entity and A to be
 >an agent playing "role=asserter"; or (iii) put A in the "asserter"
 >slot of an "account" expression containing E. Why do we need all three
 >ways? Isn't method (ii) most consistent with the rest of the model?


If the WG supports the idea that the asserter should be an agent, then
we'll go for it.

PM: Just leave the note there.
===================

 >
 >Sec 5.4.2:
 >(T) Second sentence: "return all the provenance assertions" - all the
 >assertions? or just "all the assertions in the container"?

Changed to "... to return assertions".

 >(C) Under the definition given, you cannot have expressions in a
 >container but not in an account. Does this imply that every Prov
 >expression is made accessible as part of an account? I think this
 >would be a good thing for clarity, but it is not explicit in the
 >document (and also differs from OPM).

Added sentence: Consequently, every provenance expression is always
expressed in the context of an account, either explicitly in an
asserted account, or implicitly in a container's default account.
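
The added sentence can be illustrated with a small Python sketch. It is
not taken from the document: the Container class, its method names and
the "default" account name are hypothetical, chosen only to show that
every expression ends up in some account.

```python
from collections import defaultdict

DEFAULT_ACCOUNT = "default"  # hypothetical name for the implicit account


class Container:
    """Toy provenance container: a mapping from account name to expressions."""

    def __init__(self):
        self.accounts = defaultdict(list)

    def assert_expression(self, expression, account=None):
        """Store an expression; fall back to the default account if none is given."""
        name = DEFAULT_ACCOUNT if account is None else account
        self.accounts[name].append(expression)
        return name

    def accounts_of(self, expression):
        """Return the names of all accounts that contain the expression."""
        return [name for name, exprs in self.accounts.items() if expression in exprs]


container = Container()
container.assert_expression("entity(e1)")                  # implicit account
container.assert_expression("entity(e2)", account="acc1")  # explicit account
```

In this sketch accounts_of never returns an empty list for an expression
that has been asserted, which mirrors the "always in the context of an
account" wording.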

 >
 >Section 5.5.1:
 >(C) I agree with the first note. If it is mandatory to say something
 >but that what we say can be nothing, that means that it is not
 >mandatory at all. The "mandatory" thing seems to be just saying
 >something about the ASN, and so is irrelevant as the ASN is just there
 >to make the model concrete and readable.

TODO: Should we change this to "MAY contain a qualifier. A qualifier is a
non-empty sequence ..."?

 >
 >Sec 5.5.4:
 >(C) Second note: Wouldn't this mean that either account IDs or entity
 >IDs can never be URIs, as a sequence of URIs would itself not be a
 >URI? If so, that seems to make RDF serialisation difficult to achieve.

TODO.

Why do we need to have a URI for a qualified identifier?

Why would this make serialization to RDF difficult? We are proposing that
a qualified identifier be usable only in wasComplementOf.


 >
 >Sec 5.5.6:
 >(C) I don't see the connection between the section's introductory text
 >and the content of the subsections.

TODO. It's unclear where this content should go.

 >
 >Sec 5.7.1:
 >(C) I think this section needs something introductory to say why it is
 >relevant to the data model, i.e. what has it to do with provenance,
 >why is it useful in the context of provenance, why is it standardised
 >rather than application-specific?

TODO (now 7.1).

 >(C) If my record of what occurred does not start with an empty
 >container, but one with contents, how do I say that the elements are
 >part of the container? Do I have to model this as a series of
 >wasAddedTo links, even if I know nothing about how the elements were
 >added? Or is it out of scope of the standard?


TODO: macro expand ...

 >
 >Sec 5.7.2:
 >(C) I don't see how wasQuoteOf is a sub-relation of wasRevisionOf, or
 >wasAttributedTo a sub-relation of wasEventuallyDerivedFrom, when the
 >super-relations do not contain reference to any agents but the
 >sub-relations do. What does it mean?


Updated as an implication.


 >(T) Last sentence of 5.7.2.2: "wasQuoteOf" should be "wasAttributedTo"

Updated (but it is in a note for now).

 >
 >Thanks,
 >Simon
 >
 >-- Dr Simon Miles Lecturer, Department of Informatics Kings College
 >London, WC2R 2LS, UK +44 (0)20 7848 1166
 >


On 09/24/2011 05:55 PM, Simon Miles wrote:
> [...]

-- 
Professor Luc Moreau
Electronics and Computer Science   tel:   +44 23 8059 4487
University of Southampton          fax:   +44 23 8059 2865
Southampton SO17 1BJ               email: l.moreau@ecs.soton.ac.uk
United Kingdom                     http://www.ecs.soton.ac.uk/~lavm
Received on Monday, 10 October 2011 16:06:33 UTC