- From: Graham Klyne <graham.klyne@zoo.ox.ac.uk>
- Date: Thu, 28 Jul 2011 22:38:07 +0100
- To: W3C provenance WG <public-prov-wg@w3.org>
With reference to: http://dvcs.w3.org/hg/prov/raw-file/default/model/ProvenanceModel.html
Retrieved at about 17:30 on 28-Jul-2011

As promised, I've taken a tilt at reviewing the model draft. I must say, I've found it to be really hard going: many of the notions described are not making sense to me, and the language used sometimes seems to be unnecessarily obscure.

After a mammoth session going through this, I really don't have the time or energy to split my comments out into separate issues. I think many of them are purely editorial in nature, and as such could be cleaned up relatively easily. There are some substantive comments that I may separate out as formal issues later, but I'm rather hoping that won't be needed.

My comments follow:

3.1

The notation used is obscure. What does [...[ mean? It should be explained.

For a general audience, examples based on Unix command shell commands are probably not very helpful.

What is "characterized entity represented by the file"? As this is an example, just say "crime statistics" - would that be a correct interpretation?

3.2

Where did 'e0' come from? It's not mentioned in 3.1. What is it intended to denote?

The "agent" statements are completely impenetrable to me. How is the notation to be interpreted? It looks a bit like some kind of deviant Prolog, but either I've forgotten some of the basic constructs, or it's not entirely clear how the deviant bits are meant to be interpreted.

3.3

The graphical representation could be very useful, and would be much easier to follow if the illustration included a key.

What does it mean for an agent to be linked to a BOB as opposed to a process execution (cf. Alice and e0)?

4. About the Provenance Language

The introduction of "characterized entities": if this is something that really needs to be said, I think it needs to be clarified.
I spent some time thinking about these two sentences, trying to work out whether they could ever be completely correct, or whether I was just not understanding what they are intended to convey:

[[ Furthermore, this specification is concerned with characterized entities, that is, entities and their situation in the world, as perceived by their asserters. In the rest of the document, we are concerned with the representation of such entities; their situation in the world will be represented using sets of attributes. ]]

Why "characterized entities" as opposed to "perceived entities"? What's the important distinction here? The only interpretation I've found that makes sense to me is that the document is concerning itself with entities that are characterized by the values of some bounded set of attributes. But that interpretation, if correct, is not obvious to me from the wording here.

"PIL is a language by which representations of the world can be expressed using terms that are drawn from a controlled vocabulary."

I'm not sure how to interpret this. Does this "controlled vocabulary" include, for example, numbers? Is this controlled vocabulary expected to be the complete set of terms used in PIL expressions?

"These representations are relative to an asserter, and in that sense constitute assertions about the world."

What is this trying to say? I think you might mean something like: "These representations are relative to the context of an asserter, and in that sense constitute perceptions about the world." This ties back to the earlier statement about "as perceived by their asserters".

"All assertions in PIL SHOULD be interpreted as a record of what has happened, as opposed to what may or will happen."

I feel we should find a way to strengthen this SHOULD to a MUST, but comments from earlier discussions make this tricky to get right.
Maybe: "All assertions in PIL MUST be interpreted as a record of what has happened or been observed in some context, as opposed to what might happen or potential observations." In this, I am using the reference to a context to provide just enough wiggle-room for description in future or imagined contexts.

"This specification does not prescribe the means by which assertions are made, for example on the basis of observations, inferences, or any other means."

The phrasing "... assertions are made" here is jarring, if not confusing: I would think that assertions are made in PIL for the purposes of this spec. Suggest "... how assertions are arrived at, ...".

"The language introduces a notion of "provenance container", which provides a default scope for assertions."

The term "container" here is suggestive of a physical or logical encapsulation, which I don't think is meant. How about "provenance context"?

[[ ... The model may define additional scoping rules for assertions. Identifiers can safely be used within that scope. Optionally, identifiers can be exported so that they can be used outside their default scope. The language does not prescribe the mechanisms by which identifiers are generated. ]]

This spec is describing a data model, *not* a language. It says so at the top. As such, I think it's entirely inappropriate to start defining linguistic constructs such as identifiers and scoping. Assuming the actual language used will be RDF, I'm not seeing how what you describe will be possible.

"In this specification, when an assertion is defined to refer to another assertion about something, it does so by means of that thing's identifier."

I don't understand what this is trying to say.

5.1 BOB

"A BOB represents an identifiable characterized entity."

What does it mean to be "characterized" here? What does this tell us? What does it mean to not be "characterized"?
If this refers to the attribute-based assertions mentioned earlier, does this mean that if there are no such assertions, an entity cannot be a "BOB"?

[[ A BOB assertion is about a characterized entity, whose situation in the world is variant. A BOB assertion is made at a particular point and is invariant, in the sense that all the attributes are assigned a value as part of that assertion. ]]

This section is, according to its heading, about "BOB". But this is defining a different concept, so shouldn't it be in a separate section? It seems to me that what we're talking about here is a "provenance assertion". I think it would be clearer to just describe that, e.g.

[[ A provenance assertion is about an entity, whose situation in the world is generally assumed to be variable. ]]

I either don't understand or don't agree with the second part of that description. The notion of assigning values as part of an assertion seems wrong to me (I think constraining attributes is the job of the IVP-of relation). I would expect something like:

[[ A provenance assertion is made at a particular point and is invariant, in the sense that the attributes it mentions do not change for the entity concerned. ]]

[[ A BOB assertion must describe a characterized entity over a continuous time interval in the world (which may collapse into a single instant). Characterizing an entity over multiple time intervals requires multiple BOB assertions, each with its own identifier. Some attributes may retain their values across multiple assertions. ]]

This constraint seems rather unnecessary, and maybe counter-productive. Suppose we want to describe the collective observations of a particular telescope when pointed at a particular region of the sky. This might actually consist of a (possibly unknown) number of disjoint time-segments caused by the rotation of the Earth and other factors. I can't see any clear benefit in being forced to treat these observation-sets as distinct entities.
[[ There is no assumption that the set of attributes is complete and that the attributes are independent/orthogonal of each other. ]]

I don't see this adding any useful information here. Remove?

5.2 Process Execution

Thinking about today's teleconference (28 July) and reading this, I'm seeing the key distinction between Entity and Process execution as being like the philosophical distinction between continuants (endurants) and occurrents (perdurants) (http://en.wikipedia.org/wiki/Formal_ontology#Common_terms_in_formal_ontologies).

5.3 Generation

"characterized entity" is clumsy; suggest just "entity" (or whatever term is selected for "BOB").

If I had not previously read about OPM, I'd be completely confused by the introduction of "role" here. Following the hyperlink here does not help at all.

[[ Given an assertion isGeneratedBy(x,pe,r) or isGeneratedBy(x,pe,r,t), the activity denoted by pe and the entities used by pe dermine values of some of x's attributes. ]]

I've no idea what this is trying to say.

5.4 Use

Same problem with 'role' as above.

[[ A reference to a given BOB may appear in multiple use assertions that refer to a given process execution, but each of those use assertions must have a distinct role. ]]

In light of the above, this seems nonsensical to me.

[[ Given an assertion uses(pe,x,r) or uses(pe,x,r,t), at least one value of x's attributes is a pre-condition for the activity denoted by pe to terminate. ]]

As written, this doesn't make sense: a value of an attribute being a precondition seems like a type error to me. I think you mean something like the availability of an attribute value. But even that is hard to follow. Suggest simplifying this to just:

[[ Given an assertion uses(pe,x,r) or uses(pe,x,r,t), existence of x is a pre-condition for the activity denoted by pe to terminate.
]]

5.5 Derivation

[[ Given an assertion isDerivedFrom(B,A), one can infer that the use of characterized entity denoted by A precedes the generation of the characterized entity denoted by B. ]]

Where does this notion of "use" come from in the absence of some referenced activity?

Concerning transitivity of derivation, suppose:

- A has attributes a0, a1
- B, having attributes b0, b1, is derived from A, with b0 being dependent on a0
- C, having attributes c0, c1, is derived from B, with c1 being dependent on b1

Then none of the attributes of C can be said to be directly or indirectly dependent on attributes of A, which by the given definition is a requirement for derivation of C from A. Thus, as defined, derivation is not transitive.

I don't really know whether derivation should or should not be transitive, but the above looks to me like a problem of spurious over-specification. My suggestion for now would be to focus on what really matters and see what logical properties fall out later.

5.8 IVP of

The revised treatment of IVP-of (w.r.t. http://www.w3.org/2011/prov/wiki/F2F1ConceptDefinitions#IVP_of), and its relabeling as "complement-of", completely overturns my understanding of what this was intended to capture. I understood the whole point of A IVP-of B to be to capture the notion that A denotes a contextually constrained form of the entity denoted by B. I don't see what useful purpose the new complement-of relation serves. From a practical perspective, given the asymmetric nature of IVP-of (as was), it is easy to express the effect of complement-of in RDF by introducing a new entity node. But I see no way of constructing the strict constraining role of IVP-of using complement-of.

5.9 Time

[[ Time is defined according to [ISO8601]. ]]

I don't think it is appropriate for an open standard to be normatively dependent on a standard that is available only on payment of a charge for access.
In this case, we could make reference to the XML Schema datatypes, which would also require us to think about my next point...

As far as I'm aware, ISO 8601 covers both points in time and time intervals. As such, a bare reference to ISO 8601 is not really an adequate definition: which do we want? I suspect http://www.w3.org/TR/xmlschema-2/#dateTime.

5.10 Recipe Link

I don't see what useful purpose this serves.

5.11 Role

I can't completely follow the description given.

5.13 Ordering of Processes

This section confusingly changes the style of presentation from sections dedicated to specific concepts to a vague discussion of possible relationships between things.

5.14 Revision

This seems to be just a different form of Derivation that happens to mention an agent. I'm not sure why I'd choose one over the other. I think this may be unnecessary: would not a similar effect be achieved by having a process execution of "revision" that uses b1, generates b2 and is controlled by ag (possibly with role "revise")?

5.16 Provenance Container

It's not clear what this is intended to be (maybe unsurprising, since the definition is absent). But it looks as if it's intended to be a syntactic kind of thing, which I feel is out of place in a data model description (especially if we're expecting to use RDF to represent the data). The next version of RDF will probably formally define named graphs; I'm not seeing what additional definition would be needed here.
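To make the transitivity point under 5.5 concrete, here is a minimal Python sketch. It assumes the draft's definition of derivation amounts to "some attribute of the derived entity depends, directly or indirectly, on some attribute of the source entity". The dependency table and the function names (depends, derived_from) are hypothetical, chosen only to encode the A/B/C example from 5.5.

```python
# Hypothetical encoding of the A/B/C example from 5.5.
# Each key is (entity, attribute); its value is the set of
# (entity, attribute) pairs it directly depends on.
DEPENDS_ON = {
    ("B", "b0"): {("A", "a0")},
    ("C", "c1"): {("B", "b1")},
}

ATTRIBUTES = {
    "A": ["a0", "a1"],
    "B": ["b0", "b1"],
    "C": ["c0", "c1"],
}


def depends(attr, target, seen=None):
    """True if attr depends, directly or indirectly, on target."""
    if seen is None:
        seen = set()
    for dep in DEPENDS_ON.get(attr, set()):
        if dep == target:
            return True
        if dep not in seen:
            seen.add(dep)
            if depends(dep, target, seen):
                return True
    return False


def derived_from(b, a):
    """Assumed reading of the draft: some attribute of b depends,
    directly or indirectly, on some attribute of a."""
    return any(
        depends((b, x), (a, y))
        for x in ATTRIBUTES[b]
        for y in ATTRIBUTES[a]
    )


print(derived_from("B", "A"))  # True: b0 depends on a0
print(derived_from("C", "B"))  # True: c1 depends on b1
print(derived_from("C", "A"))  # False: no attribute chain links C to A
```

Under this reading, attribute-level dependency chains compose, but entity-level derivation does not: C is derived from B and B from A, yet derived_from("C", "A") is False because the chain runs through b0 on one side and b1 on the other.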
Received on Thursday, 28 July 2011 21:38:41 UTC