- From: Graham Klyne <Graham.Klyne@zoo.ox.ac.uk>
- Date: Sat, 07 Apr 2012 08:20:07 +0100
- To: Paul Groth <p.t.groth@vu.nl>
- CC: W3C provenance WG <public-prov-wg@w3.org>
Paul,

Yes, it's largely a document/text quality thing - I feel it doesn't entirely lay things out clearly enough for its target audience, and in some cases is actively confusing. This may be "editorial", but I think it's important enough to need addressing to move forwards towards LC.

There are a few points of substance (mainly stuff that feels superfluous to me), but I wouldn't be surprised to be a lone voice on that.

I've indicated a number of specific points in the "details" part of my email, with suggested alternative phrasing, though there are many more (similar to those I detail) that I've skipped over in passing.

#g
--

On 06/04/2012 21:36, Paul Groth wrote:
> Hi Graham,
>
> Just for clarification, given that you think prov-dm is not ready for
> release, it's important to understand what exactly could be done to
> get it to the point where it is.
>
> Reading through your points, it seems to me that your comments are
> primarily editorial, in that it's the explanation, definition and
> organization of the terms that is the issue. Is that a correct
> interpretation?
>
> If not, can you identify the specific things that would need to be
> addressed for us to move forward on prov-dm?
>
> Regards
> Paul
>
>
> On Fri, Apr 6, 2012 at 9:51 PM, Graham Klyne <graham.klyne@zoo.ox.ac.uk> wrote:
>> Re:
>> http://dvcs.w3.org/hg/prov/raw-file/default/model/releases/ED-prov-dm-20120402/prov-dm.html
>> (Retrieved on 2012-04-03)
>>
>> While this has many improvements over previous documents, I still feel that
>> there are several respects in which the document does not really serve its
>> intended purpose.
>>
>> Generally, I found the tone and phrasing were more akin to academic rhetoric,
>> whose purpose is to persuade a peer of the truth of some proposition, than a
>> technical standard whose aim should be to *specify*, *inform* and where
>> necessary to *explain*. Especially for developers who will have to use this
>> material as a reference source.
>> Thus, I found much of what I read, particularly
>> in the introductory section, had far too much justification (some of which was
>> obvious, other aspects of which were just "noise") which didn't help to
>> understand what was being presented, or how to use it.
>>
>> I also still have problems with the overall organization. In particular, I
>> (still) find the example in section 3 breaks the hoped-for flow between the
>> section 2 overview (which I also now think is mis-titled) and the provenance
>> expression details in section 4. I also don't think the final two subsections
>> of section 2 belong there, as they deal with provenance expression details, not
>> concepts.
>>
>> Finally, I found many examples of unusual or awkward phrasing which I found to
>> be unhelpful, confusing or in some cases just plain wrong.
>>
>> To summarize: if we expect the next public working draft to be nearly ready for
>> last call, then I don't think this document is ready for release.
>>
>> Details follow.
>>
>> ...
>>
>>
>> == Abstract ==
>>
>> The phrase "derivations between entities" is strange and confusing. I think you
>> mean something like "derivation of entities from other entities".
>>
>> "Properties that link entities that refer to a same thing". I think this is
>> just wrong: I don't believe that entities *refer*. I think you mean something
>> like "Properties that link entities that are based on the same thing".
>>
>> "collections of entities, whose provenance itself can be tracked" - this feels
>> vaguely ungrammatical, and I'm not quite sure what this is trying to express.
>> In any case, I'll argue later that I don't see why this is necessary as part of
>> the provenance core model. (What I'm not seeing here is anything I can
>> recognize as the notion of accounts, which allow for provenance of provenance to
>> be expressed.)
>>
>> Here, and later in the document, there are references to "natural language".
>> I believe this is a term of art that is meaningful only to those who have exposure
>> to formal languages, as a way of distinguishing, and may be confusing to some
>> readers. In the abstract, I'd suggest just dropping this - the rest of the
>> sentence carries the intended meaning.
>>
>> I'm not sure what you mean by "systematically defines". Just "defines" would
>> do, I think.
>>
>> == Status of this document ==
>>
>> The heading "how to read this document" is, I think, both patronizing and
>> inaccurate. And the following comments seem to significantly replicate the
>> content of the preceding text. I'd suggest moving descriptive material about
>> the documents into the preceding text, and dropping the stuff that tries to tell
>> people what to read.
>>
>> "Fourth public working draft". Really!! Are we really up to 4 with this? I
>> lose count.
>>
>> == Introduction ==
>>
>> "how it should be integrated with other diverse information sources". I find
>> this phrase to be vague and unclear, and hence unhelpful. I'd suggest dropping
>> this, and changing "... help those users to make trust judgements" in the next
>> sentence to read:
>>
>> "... help those users to decide which information to include in their analyses,
>> and which to exclude."
>>
>> "The idea that ... a pragmatic approach is to consider ..." adds no useful
>> value. I suggest replacing all of this with "We consider ...".
>>
>> "the vision is that" is pure noise. Suggest deleting this. This whole
>> paragraph seems to be an unnecessary repetition of what the previous says.
>> While I sometimes think that a repeated summary can be useful, in this case I
>> think it would be more helpful to simplify the preceding paragraph.
>>
>> The material that starts with "A set of specifications, ..." seems to be pure
>> repetition of material contained in the "status of this document" - is it really
>> necessary to repeat it here?
>>
>> The listing of "components!" seems to be greatly redundant.
>> Each component is
>> both numbered (N) and introduced as "component N". I think a simple numbered
>> list without the "component N" tags would suffice.
>>
>> Two paragraphs starting with "This specification intentionally presents..." -
>> these paragraphs are loaded with unnecessary self-justification. I think a
>> simpler statement would do, along the lines of:
>>
>> "This specification presents the key concepts of the PROV data model and
>> provenance expressions, without specific concern for how they are applied. A
>> companion document [PROV-DM-CONSTRAINTS] discusses some possible constraints on
>> the application of this model, and corresponding useful inferences that may be
>> available when those constraints are known to be satisfied."
>>
>> [[The next comment is rendered moot if the previous one is accepted...]]
>> Paragraph: "However, if data changes...". To an uninitiated reader, it is not
>> at all clear what is meant by "data" here. I'd suggest something like "If a
>> thing about which provenance is expressed is subject to change, it is
>> challenging to express its provenance precisely (e.g. the data from which a
>> daily weather report is derived will change from day to day)." Drop the
>> reference to other metadata here - it adds nothing of value.
>>
>> @@(note to self) raise a separate issue about how to describe this "refinement".
>> I know I have argued for "refinement" over the idea of an "updated" or
>> "modified" provenance model, but the term is still a bit vague. I find myself
>> leaning toward a notion of a "strict" interpretation of provenance that in turn
>> allows certain inferences to be drawn if the supplied provenance satisfies
>> certain strictness criteria (constraints).
>>
>> == 1.2 PROV namespace ==
>>
>> This section glibly introduces the notion of a "namespace" without explaining
>> (or citing) what it means.
>>
>> "The PROV namespace is http://www.w3.org/prov#". This is WRONG.
>> http://www.w3.org/prov# is a URI, not a namespace (or, more precisely, it's a
>> string that conforms to URI syntax).
>>
>> What should be said is something like: "The names for concepts, attributes and
>> other reserved names introduced by this document belong to a namespace
>> identified by the URI http://www.w3.org/prov#".
>>
>> And: what is the consequence of these names belonging to a namespace? I think
>> it would be appropriate to cite the corresponding XML and RDF documents that
>> deal with namespace issues [1] [2].
>>
>> [1] http://www.w3.org/TR/REC-xml-names/
>>
>> [2] http://www.w3.org/TR/REC-rdf-syntax/ (sections 6.1.2, 6.1.4, etc. These
>> define how RDF/XML forms a URI-reference by appending a local name to a
>> namespace URI.)
>>
>> == Section 2, PROV-DM starting points ==
>>
>> I think this section is mis-titled.
>>
>> I think it should be: "2. Introduction to provenance concepts", since that is
>> what most of the section is about.
>>
>> In light of this, the final two sub-sections seem mis-placed, and I suggest they
>> should be part of the early material in section 4.
>>
>> "... that a novice reader would write in a first instance". Yuk! How
>> patronizing! Also, a reference here to "natural language" (see previous). I
>> would phrase this whole paragraph thus:
>>
>> "This section introduces provenance concepts with informal descriptions and
>> illustrative examples. Later (section @@ref), we describe how these concepts
>> are described using PROV-DM types and relations."
>>
>> (where @@ref should be in another section that actually deals with PROV-DM terms.)
>>
>> == 2.1 Entity and Activity ==
>>
>> "The term things encompasses..." - I find this phrasing awkward and potentially
>> confusing - are we talking here about things or entities? I suggest simply
>> "These encompass ..."
>>
>> The final sentence is mostly noise. Why not just "Any Web resource may be an
>> entity."?
>>
>> "For the purpose of this specification..." is just noise.
>> Also, confusing
>> reference to "entities" and "things". Suggest for this para: "An entity is a
>> thing one wants to provide provenance for, which may be physical, digital,
>> conceptual, or otherwise; entities may be real or imaginary."
>>
>> "This action can take multiple forms: ..." - this is confusing; are we talking
>> about a single activity having multiple forms, or different activities having
>> different forms? I think you mean the latter, hence I suggest: "An activity is
>> something that occurs over a period of time and acts upon or with entities. They
>> may include consuming, processing, transforming, modifying, relocating, using,
>> generating, or other associations with entities."
>>
>>
>> == 2.2, et seq. ==
>>
>> I find similar issues with the wording of subsequent sections, but I haven't
>> gone through every one for lack of time. But I hope you get the general thrust
>> from the above.
>>
>>
>> == 2.3 Agents and other types of entities ==
>>
>> I think this exhibits poor organization of the material. I think Agents and
>> Plans are related, and suggest a sub-section for them. Collections and accounts
>> don't have any obvious relationship, and IMO should be separated.
>>
>> Concerning collections, it is not at all clear to me that these need to be in
>> the core PROV-DM. By including them here, you impose a particular view of
>> collections that may not be appropriate (somewhere, though I can't immediately
>> find where, there is mention of a collection being a key-value map). Domains
>> that deal with collections have their own models for these, so why not let this
>> be an aspect for domain-specific extension?
>>
>>
>> I think accounts should have a section of their own, since they underpin the key
>> feature of supporting provenance-of-provenance.
>>
>> However, I have a problem with the description "An account is an entity that
>> contains a bundle of provenance descriptions."
>> I think that this should be "An
>> account *is* an entity that is a bundle of provenance descriptions." That is, I
>> don't think the core DM needs to or should expose the notion of containment,
>> since that begs more questions.
>>
>> == 2.4 Attribution, association and responsibility ==
>>
>> I find the expression of these ideas to be hopelessly muddled, and incoherent.
>> In particular, it seems to be self-contradictory with respect to the notion of
>> "responsibility" (also with section 2.3):
>>
>> "An agent is a type of entity that bears some form of responsibility for an
>> activity taking place."
>> "Software for checking the use of grammar in a document may be defined as an agent"
>> "Agents are defined as having some kind of responsibility for activities."
>> "[an association may be] an XSLT transform launched by a user ..."
>> "An activity association is an assignment of responsibility to an agent for an
>> activity"
>> "Responsibility is the fact that an agent is accountable for ..."
>>
>> At heart, I think the problem here is the notion that agents are "responsible".
>> Especially when "responsibility" is later defined in terms of accountability -
>> I can't see a software agent as being accountable. I don't know how to make
>> sense of this, so it's hard for me to suggest alternatives.
>>
>> == Section 2.5, Simplified overview diagram ==
>> == Section 2.6, PROV-N ... ==
>>
>> See earlier comments. These are about PROV-DM terms, not provenance concepts, so
>> I don't really think they belong here.
>>
>> I'd move them to the start of section 4.
>>
>> == Section 3, Illustration... ==
>>
>> I *still* think the positioning of this example disrupts the logical flow from
>> concepts (section 2) to PROV-DM expressions (section 4).
>>
>> (I haven't reviewed the content of this section.)
>>
>>
>> == 4. PROV-DM types and relations ==
>>
>> The enumeration of components seems to be repetitive. Numbered items *and*
>> component numbers? (See earlier comment.)
>>
>> "In the first column, one finds concept names directly linking to their English
>> definition. In the second column, ...". Why not just use column headings in the
>> table? The reference to "English" description seems redundant.
>>
>> "In the rest of the section, each concept and relation is defined, in English
>> initially, followed by a more formal definition and some example." Similar
>> comment. Suggest:
>> "In the rest of the section, each type and relation is defined informally,
>> followed by a summary of the information used to represent the concept, and
>> illustrated with PROV-N examples."
>>
>> == 4.1.1 Entity ==
>>
>> "An entity is a thing one wants to provide provenance for. For the purpose of
>> this specification, things can be physical, digital, conceptual, or otherwise;
>> things may be real or imaginary." confuses entities and things again. Suggest:
>> "An entity is a thing one wants to provide provenance for. It can be physical,
>> digital, conceptual, or otherwise, and may be real or imaginary."
>>
>> "An entity, written entity(id, [attr1=val1, ...]) in PROV-N, contains:" - I
>> think this is wrong - an entity does not (in general) *contain*. Suggest:
>> "An entity, written entity(id, [attr1=val1, ...]) in PROV-N, has:"
>>
>> "id: an identifier for an entity;" - this is redundant and potentially
>> confusing. Suggest "id: an identifier".
>>
>> "attributes: an optional set of attribute-value pairs ((attr1, val1), ...)
>> representing this entity's situation in the world." - I find this phrasing
>> awkward and unclear. Suggest:
>> "attributes: an optional set of attribute-value pairs ((attr1, val1), ...)
>> representing additional information about this entity."
>>
>> == 4.1.2, et seq ==
>>
>> (Similar editorial comments to those for 4.1.1 Entity. I'm not repeating them
>> all now for lack of time.)
>>
>>
>> == Section 4.1.5 Start ==
>>
>> I find this whole section is confusing.
>> Starting with:
>>
>> "trigger: an optional identifier (e) for the entity triggering the activity;" -
>> do you really mean to allow *any* entity here, rather than just agents?
>>
>> Looking forward to the example, I find the idea that an email (qua entity) can
>> "trigger" an activity is incoherent. Suppose the email is drafted and never
>> sent. It still exists as an entity, but can't be said to actually *trigger*
>> anything. For me, it is the act of actually sending (or receiving) an email
>> that may trigger something, not the email as a passive entity.
>>
>>
>> == Section 4.1.6, End ==
>>
>> (Similar comments to those above.)
>>
>>
>> == Section 4.1.7, Communication ==
>>
>> It seems strange to me, given the pattern used for other concepts/expressions,
>> that the communicated entity cannot be optionally named. I find myself
>> wondering if I've understood the definition properly.
>>
>>
>> == Section 4.2.1, Agent ==
>>
>> Continues the muddle about responsibility. I don't know what it all means
>> (especially when the agent is running software). See previous comments.
>>
>> Awkward and unnecessary phrase "situation in the world" again. See earlier for
>> suggested phrasing.
>>
>>
>> == Section 4.3.1 Derivation ==
>>
>> "A derivation is a transformation of an entity into another, a construction of
>> an entity into another, or an update of an entity, resulting in a new one."
>> seems ungrammatical. Suggest:
>> "A derivation is a transformation of an entity into another, a construction of
>> an entity *from* another, or an update of an entity, resulting in a new one."
>>
>>
>> == Section 4.5 Collections ==
>>
>> I'm not understanding why this needs to be part of the core PROV-DM, and cannot
>> be handled by domain specific notions of aggregation.
>>
>> The stated goal is that "it is also of interest to be able to express the
>> provenance of the collection itself" - this could be done equally well with a
>> domain-specific collection notion, AFAICT.
>>
>> See also earlier comments.
>>
>>
>> == Section 4.6, Annotations ==
>>
>> I'm still not seeing why these are needed as part of the core DM. There's no
>> associated inference that I am aware of, and additional information can be added
>> via attributes, so I'm not seeing what useful additional expressive capability
>> this affords.
>>
>>
>> == Section 4.7.4 Attribute ==
>>
>> Is an attribute really just a qualified name, or is it a pair consisting of a
>> qualified name and a value?
>>
>>
>> == Section 5, Extensibility points ==
>>
>> This section makes little sense to me. The obvious extensibility points of
>> sub-typing and sub-properties of defined PROV-DM terms aren't mentioned.
>>
>> The use of new attributes seems reasonable, though it's not entirely clear how
>> they act as extension points, and the mention of "perspective on the world"
>> doesn't mean anything to me.
>>
>> I cannot see how notes, which are defined to be pretty much semantics-free, can
>> be described as an extensibility point - they don't actually add any expressive
>> power that I can see.
>>
>> The remaining points I just don't get.
>>
>> I think this whole notion of extensibility needs to be treated more carefully
>> and comprehensively if it is to be taken seriously. Otherwise expect developers
>> to ignore this and just use extensibility options in the representation
>> substrate (e.g. RDF) used.
>>
>> == Section 6 ==
>>
>> I think this section is completely redundant and out-of-place, and could be
>> removed without any loss.
>>
>> ...
>>
>> That's it for now.
>>
>> (BTW, my email access is patchy, so I may not be able to respond promptly to any
>> follow-up discussion.)
>>
>> #g
>> --
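Illustration of the namespace point raised under "1.2 PROV namespace" in the quoted message: the URI only identifies the namespace, and a full name is formed by appending a local name to it, as the cited RDF/XML sections describe. The following is a minimal Python sketch of that idea, not anything taken from the PROV documents. The helper names `expand` and `local_name_of` are hypothetical, the URI is the one quoted in the message, and "entity" is used only as an illustrative local name.

```python
from typing import Optional

# Namespace URI exactly as quoted in the message (from the draft under review).
PROV_NS = "http://www.w3.org/prov#"


def expand(namespace_uri: str, local_name: str) -> str:
    """Form a full URI reference by appending a local name to a namespace URI."""
    return namespace_uri + local_name


def local_name_of(uri: str, namespace_uri: str) -> Optional[str]:
    """Return the local part if `uri` starts with the namespace URI, else None."""
    if uri.startswith(namespace_uri) and len(uri) > len(namespace_uri):
        return uri[len(namespace_uri):]
    return None


# "entity" is an illustrative local name, not a checked list of reserved PROV names.
entity_uri = expand(PROV_NS, "entity")
print(entity_uri)
print(local_name_of(entity_uri, PROV_NS))
print(local_name_of("http://example.org/other#entity", PROV_NS))
```

The sketch treats the namespace URI as a plain string prefix, which is why the message's distinction matters: the string identifies the namespace, while the names that belong to it are the expanded URIs.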
Received on Saturday, 7 April 2012 15:58:05 UTC