- From: Simon Miles <simon.miles@kcl.ac.uk>
- Date: Sat, 24 Sep 2011 17:55:41 +0100
- To: Provenance Working Group WG <public-prov-wg@w3.org>
Luc, Paolo,

Here are my comments on the current data model document, annotated with (T) for typo/text clarity or (C) for content comment/question. I think most/all comments are small enough that an issue need not be raised.

Throughout:
(T) Sections are referred to in the text by "Section Entity", "Section Process Execution" etc. Shouldn't these be the section numbers?
(T) There seems to be inconsistency in symbols following the change from roles to qualifiers. Sometimes "q" is used in constraint definitions, examples etc. and sometimes "r" is used. I suggest it would be clearer to always use "q".
(T) There are a few "characterised" in amongst the majority "characterized" spelling.
(C) At least one standard qualifier name, "role", is used in the document, but it is not clear what namespace this name is in. Does it mean no other "role"s from domain-specific ontologies may be used in Prov data?

Sec 2.1:
(T) paragraph 1: "Words such thing or activity" should be "Words such as 'thing' or 'activity'"
(C) paragraph 2: The first mention of "provenance" in the document proper is in the second paragraph of this section, and is a bit out of the blue ("unambiguously report provenance"). Can we add some intuition about what provenance is (for this data model)?
(T) Example paragraph 1: "perspectives about a resource" should be "perspectives on a resource"
(C) Example paragraph 1: "the report independent of where it is hosted over time" - I suggest also saying "and of its content over time", to distinguish this entity from the report version entity above it.
(C) paragraph 6: "punctual events"? "punctual" as most commonly used implies prior planning of when something should occur. I'm not sure what you are intending in this context.
(C) paragraph 6: "a partial order exists between events". I assume you mean a temporal order? What kind of ordering do you mean?
(C) paragraph 6: "global notion of time and Lamport's style clocks" - this seems like a weirdly specific level of detail for this overview section, especially considering that many other aspects of the model are not mentioned at all in the overview.

Sec 2.3:
(C) Regarding the note (not attempting to ensure consistency of an asserter) - this seems practical. I'm not sure how we could enforce consistency in any circumstance, only define what it means or say it is application specific.

Sec 4.1:
(T) "We denote this e1." and the same for e2 etc. It is not entirely clear whether "this" refers to the event or the entity.

Sec 4.2:
(C) The fact that Alice is the creator of e1 seems to be expressed twice, first as an attribute "creator=Alice", and secondly as the "creator" role of an agent in the creation process. I don't think it is a good idea for either clarity of use of the model or for ensuring interoperability for there to be multiple ways to express the same thing, if it can be at all avoided. Even if we cannot stop someone using either method, can't we say which they *should* use to aid interoperability?
(T) "Generation expressions... represent the event at which a file is created". The surrounding text is generic rather than specific to the example, implying this should be "entity" rather than "file". Otherwise, readers may assume that all entities are files or that generation only applies to files.
(T) Paragraph on wasComplementOf: in "attribute content" and "attribute spellchecked", fixed width font (or another font) should be used for the attribute names to show they are names, else the sentence can be read in strange ways.

Sec 4.3:
(T) Fig 1: The arrow from pe2 to a3 is a different direction to the other "agent" links. It is also not clear if an "agent" link is the same as a "wasControlledBy" link. If so, the pe2-a3 arrow direction makes most sense, as the others seem to be saying the agent was controlled by the process execution.

Sec 5.1:
(T) The last sentence, regarding a "house-keeping construct" is rather opaque. I'm not sure what the reader is supposed to understand from this.

Sec 5.2.1:
(C) First sentence: "entity expression" is given exactly the same definition that "entity" was in Section 4. I think having two terms for the same thing will cause confusion. I like the addition of "expressions" to the model in general, though, as I think this greatly clarifies what is intended.
(C) "the meaning of attribute in the context of a process execution expression is similar to the meaning of attribute for entity expression" - I think the meaning should be exactly the same, not just similar, else there will be confusion.
(C) Following from the above point: "A process execution expression's attribute remains constant for the duration of the activity" - OK, but does it also characterise the process execution, e.g. is the start time part of what distinguishes one execution from others?
(T) "noted processExecution" - I think you mean "denoted" (or "written" or "expressed")

Sec 5.2.3:
(T) "representation a characterized thing" - missing "of"
(T) Last sentence, "On the contrary" should be "On the other hand", and "inferred" should be "infer"

Sec 5.2.4:
(T) Last sentence: "expectede"

Sec 5.3.3.1:
(C) I suggest that, as accounts are not introduced until later in the document, the generation-unicity constraint will not make sense here. Moreover, I think the constraint is more about accounts and what it means for them to be consistent than it is about generation events or process executions. Therefore, I suggest moving this constraint to the section on accounts.
(C) Given that constraint derivation-events applies, don't we just have two ways of saying the same thing? Why use the long form of wasDerivedFrom when the same can be expressed using wasGeneratedBy and used? Which variety *should* be used?

Sec 5.3.3.2:
(T?) Constraint "derivation-linked-independent" seems to be a tautology. I guess this is a typo?

Sec 5.3.3.3:
(T) Paragraph 4: "In other word" should be "In other words"

Sec 5.3.4:
(C) This section seems to be confusingly expressed, implying that non-agent entities can control executions, whereas the control-agent constraint (in the section on agents) contradicts this. It is probably just a matter of clarifying the text, e.g. if you mean that a non-agent entity can be asserted to be controlling an execution but from this inferred to be an agent.
(T) The text may be read to imply that a control link has only one qualifier, role, whereas I guess you mean that, like use/generate, it can have multiple "modalities" as part of the qualifier?

Sec 5.3.5:
(C) I can see this section causing some difficulty... While that may just be the nature of the topic, there seems to be an important thing missing: what has complementarity got to do with provenance? In other words, what value (with regards to provenance) is there in asserting complementarity?
(C) The text suddenly starts talking about "properties" from the second paragraph. What are these, and do they have any relation to attributes?
(C) Should the justification of why the complementarity relation is not transitive be in this document? I would expect this document to just state that it is not transitive and, for brevity and simplicity, leave justifications to another document.

Sec 5.3.6:
(C) Similarly to above, I'm not sure the justification of why wasInformedBy is not transitive should be in this document.

Sec 5.3.8:
(C) Constraint participation: This seems odd to me. In what circumstances would you not know or want to assert which of the three possibilities (used/controlled/complement) applied for a given entity and execution? Is hadParticipant as defined really useful?

Sec 5.3.9:
(C) Grammar definition: I don't understand what the "relationIdentification" stuff is about or what all the identifiers identify.

Sec 5.4.1:
(C) This appears to be yet another way to say the same thing, following the comment on Sec 4.2 above.
If A is an "asserter" of expression E, then we can either (i) express E to be an entity and use an attribute "asserter=A"; (ii) express E to be an entity and A to be an agent playing "role=asserter"; or (iii) put A in the "asserter" slot of an "account" expression containing E. Why do we need all three ways? Isn't method (ii) most consistent with the rest of the model?

Sec 5.4.2:
(T) Second sentence: "return all the provenance assertions" - all the assertions? or just "all the assertions in the container"?
(C) Under the definition given, you cannot have expressions in a container but not in an account. Does this imply that every Prov expression is made accessible as part of an account? I think this would be a good thing for clarity, but it is not explicit in the document (and also differs from OPM).

Section 5.5.1:
(C) I agree with the first note. If it is mandatory to say something but what we say can be nothing, that means that it is not mandatory at all. The "mandatory" thing seems to be just saying something about the ASN, and so is irrelevant as the ASN is just there to make the model concrete and readable.

Sec 5.5.4:
(C) Second note: Wouldn't this mean that either account IDs or entity IDs can never be URIs, as a sequence of URIs would itself not be a URI? If so, that seems to make RDF serialisation difficult to achieve.

Sec 5.5.6:
(C) I don't see the connection between the section's introductory text and the content of the subsections.

Sec 5.7.1:
(C) I think this section needs something introductory to say why it is relevant to the data model, i.e. what has it to do with provenance, why is it useful in the context of provenance, why is it standardised rather than application-specific?
(C) If my record of what occurred does not start with an empty container, but one with contents, how do I say that the elements are part of the container? Do I have to model this as a series of wasAddedTo links, even if I know nothing about how the elements were added?
Or is it out of scope of the standard?

Sec 5.7.2:
(C) I don't see how wasQuoteOf is a sub-relation of wasRevisionOf, or wasAttributedTo a sub-relation of wasEventuallyDerivedFrom, when the super-relations do not contain reference to any agents but the sub-relations do. What does it mean?
(T) Last sentence of 5.7.2.2: "wasQuoteOf" should be "wasAttributedTo"

Thanks,
Simon

--
Dr Simon Miles
Lecturer, Department of Informatics
Kings College London, WC2R 2LS, UK
+44 (0)20 7848 1166
Received on Saturday, 24 September 2011 16:56:09 UTC