- From: Luc Moreau <L.Moreau@ecs.soton.ac.uk>
- Date: Fri, 29 Jul 2011 10:17:34 +0100
- To: public-prov-wg@w3.org
Thanks, Graham, for the extensive comments.

I raised issues on your behalf, since it's easier for us to
discuss issues separately and track them.

Luc

On 07/28/2011 10:38 PM, Graham Klyne wrote:
> With reference to:
> http://dvcs.w3.org/hg/prov/raw-file/default/model/ProvenanceModel.html
> Retrieved at about 17:30 on 28-Jul-2011
>
> As promised, I've taken a tilt at reviewing the model draft. I must
> say, I've found it to be really hard going - many of the notions
> described are not making sense to me, and the language used sometimes
> seems to be unnecessarily obscure.
>
> After a mammoth session going through this, I really don't have the
> time or energy to split my comments out into separate issues. I think
> many of them are purely editorial in nature, and as such could be
> cleaned up relatively easily. There are some substantive comments that
> I may separate out as formal issues later, but I'm rather hoping that
> won't be needed.
>
> My comments follow:
>
>
> 3.1 Notation used is obscure. What does [...[ mean? Should be
> explained.
>
> For a general audience, examples based on Unix command shell commands
> are probably not very helpful.
>
> What is "characterized entity represented by the file"? As this is an
> example, just say "crime statistics" - would that be a correct
> interpretation?
>
>
> 3.2 Where did 'e0' come from? - it's not mentioned in 3.1. What is it
> intended to denote?
>
> The "agent" statements are completely impenetrable to me.
>
> How is the notation to be interpreted? It looks a bit like some kind
> of deviant Prolog, but either I've forgotten some of the basic
> constructs, or it's not entirely clear how the deviant bits are meant
> to be interpreted.
>
>
> 3.3 Graphical representation: could be very useful, and would be much
> easier to follow if the illustration included a key.
>
> What does it mean for an agent to be linked to a BOB as opposed to a
> process execution (cf. Alice and e0)?
>
>
> 4. About the Provenance Language
>
> Introduction of "characterized entities" - if this is something that
> really needs to be said, I think it needs to be clarified. I spent
> some time thinking about these two sentences, trying to work out if
> they could ever be completely correct, or just not understanding what
> they are intended to convey:
> [[
> Furthermore, this specification is concerned with characterized
> entities, that is, entities and their situation in the world, as
> perceived by their asserters.
>
> In the rest of the document, we are concerned with the representation
> of such entities; their situation in the world will be represented
> using sets of attributes.
> ]]
>
> Why "characterized entities" as opposed to "perceived entities"?
> What's the important distinction here?
>
> The only interpretation I've found that makes sense to me is that the
> document is concerning itself with entities that are characterized by
> the values of some bounded set of attributes. But that
> interpretation, if correct, is not obvious to me from the wording here.
>
>
> "PIL is a language by which representations of the world can be
> expressed using terms that are drawn from a controlled vocabulary."
> I'm not sure how to interpret this. Does this "controlled vocabulary"
> include, for example, numbers? Is this controlled vocabulary expected
> to be the complete set of terms used in PIL expressions?
>
>
> "These representations are relative to an asserter, and in that sense
> constitute assertions about the world."
> What is this trying to say? I think you might mean something like:
> "These representations are relative to the context of an asserter, and
> in that sense constitute perceptions about the world."
> which ties back to the earlier statement about "as perceived by their
> asserters".
>
> "All assertions in PIL SHOULD be interpreted as a record of what has
> happened, as opposed to what may or will happen."
> I feel we should find a way to strengthen this SHOULD to a MUST, but
> comments from earlier discussions make this tricky to get right. Maybe:
> "All assertions in PIL MUST be interpreted as a record of what has
> happened or been observed in some context, as opposed to what might
> happen or potential observations." In this, I am using the reference
> to a context to provide just enough wiggle-room for description in
> future or imagined contexts.
>
> "This specification does not prescribe the means by which assertions
> are made, for example on the basis of observations, inferences, or any
> other means."
> The phrasing "... assertions are made" here is jarring, if not
> confusing - I would think that assertions are made in PIL for the
> purposes of this spec. Suggest "... how assertions are arrived at, ..."
>
> "The language introduces a notion of "provenance container", which
> provides a default scope for assertions."
> The term "container" here is suggestive of a physical or logical
> encapsulation, which I don't think is meant. How about "provenance
> context"?
>
> [[
> ... The model may define additional scoping rules for assertions.
> Identifiers can safely be used within that scope. Optionally,
> identifiers can be exported so that they can be used outside their
> default scope. The language does not prescribe the mechanisms by which
> identifiers are generated.
> ]]
> This spec is describing a data model, *not* a language. It says so at
> the top. As such I think it's entirely inappropriate to start
> defining linguistic constructs such as identifiers and scoping.
> Assuming the actual language used will be RDF, I'm not seeing how
> what you describe will be possible.
>
> "In this specification, when an assertion is defined to refer to
> another assertion about something, it does so by means of that thing's
> identifier."
> I don't understand what this is trying to say.
>
>
> 5.1 BOB
>
> "A BOB represents an identifiable characterized entity."
>
> What does it mean to be "characterized" here? What does this tell
> us? What does it mean to not be "characterized"? If this refers to
> the attribute-based assertions mentioned earlier, does this mean that
> if there are no such assertions, an entity cannot be a "BOB"?
>
> [[
> A BOB assertion is about a characterized entity, whose situation in
> the world is variant. A BOB assertion is made at a particular point
> and is invariant, in the sense that all the attributes are assigned a
> value as part of that assertion.
> ]]
>
> This section is, according to its heading, about "BOB". But this is
> defining a different concept, so shouldn't this be in a separate section?
>
> It seems to me that what we're talking about here is a "provenance
> assertion". I think it would be clearer to just describe that, e.g.
> [[
> A provenance assertion is about an entity, whose situation in the
> world is generally assumed to be variable.
> ]]
>
> I either don't understand or don't agree with the second part of that
> description. The notion of assigning values as part of an assertion
> seems wrong to me (I think the notion of constraining attributes is
> the job of the IVP-of relation). I would expect something like:
> [[
> A provenance assertion is made at a particular point and is invariant,
> in the sense that the attributes it mentions do not change for the
> entity concerned.
> ]]
>
> [[
> A BOB assertion must describe a characterized entity over a continuous
> time interval in the world (which may collapse into a single instant).
> Characterizing an entity over multiple time intervals requires
> multiple BOB assertions, each with its own identifier. Some attributes
> may retain their values across multiple assertions.
> ]]
> This constraint seems rather unnecessary, and maybe counter-productive.
>
> Suppose we want to describe the collective observations of a
> particular telescope when pointed at a particular region of the sky.
> This might actually consist of a (possibly unknown) number of
> disjoint time-segments caused by the rotation of the earth and other
> factors. I can't see any clear benefit in being forced to treat these
> observation-sets as distinct entities.
>
> [[
> There is no assumption that the set of attributes is complete and that
> the attributes are independent/orthogonal of each other.
> ]]
> I don't see this adding any useful information here. Remove?
>
>
> 5.2 Process Execution
>
> Thinking about today's teleconference (28 July) and reading this, I'm
> seeing the key distinction between Entity and Process execution being
> like the philosophical distinction between continuants (endurant) and
> occurrents (perdurant)
> (http://en.wikipedia.org/wiki/Formal_ontology#Common_terms_in_formal_ontologies)
>
>
>
> 5.3 Generation
>
> "characterized entity" is clumsy - suggest just "entity" (or
> whatever term is selected for "BOB").
>
> If I had not previously read about OPM, I'd be completely confused by
> the introduction of "role" here. Following the hyperlink here does
> not help at all.
>
> [[
> Given an assertion isGeneratedBy(x,pe,r) or isGeneratedBy(x,pe,r,t),
> the activity denoted by pe and the entities used by pe determine values
> of some of x's attributes.
> ]]
> I've no idea what this is trying to say.
>
>
> 5.4 Use
>
> Same problem with 'role' as above.
>
> [[
> A reference to a given BOB may appear in multiple use assertions that
> refer to a given process execution, but each of those use assertions
> must have a distinct role.
> ]]
> In light of the above, this seems nonsensical to me.
>
> [[
> Given an assertion uses(pe,x,r) or uses(pe,x,r,t), at least one value
> of x's attributes is a pre-condition for the activity denoted by pe to
> terminate.
> ]]
> As written this doesn't make sense - a value of an attribute being a
> precondition seems like a type error to me. I think you mean
> something like availability of an attribute value. But even that is
> hard to follow. Suggest simplifying this to just:
> [[
> Given an assertion uses(pe,x,r) or uses(pe,x,r,t), existence of x is a
> pre-condition for the activity denoted by pe to terminate.
> ]]
>
>
> 5.5 Derivation
>
> [[
> Given an assertion isDerivedFrom(B,A), one can infer that the use of
> characterized entity denoted by A precedes the generation of the
> characterized entity denoted by B.
> ]]
> Where does this notion of "use" come from in the absence of some
> referenced activity?
>
> Concerning transitivity of derivation:
>
> Suppose:
> A has attributes a0, a1
> B having attributes b0, b1 is derived from A, with b0 being dependent
> on a0
> C having attributes c0, c1, is derived from B with c1 being dependent
> on b1
>
> So none of the attributes of C can be said to be directly or
> indirectly dependent on attributes of A, which by the given definition
> is a requirement for derivation of C from A. Thus, as defined,
> derivation cannot be transitive.
>
> I don't really know if derivation should or should not be transitive,
> but the above seems to me like a problem of spurious
> over-specification. My suggestion for now would be to focus on what
> really matters and see what logical properties fall out later.
>
>
> 5.8 IVP of
>
> The revised (w.r.t.
> http://www.w3.org/2011/prov/wiki/F2F1ConceptDefinitions#IVP_of)
> treatment of IVP-of, and relabeling as "complement-of" completely
> overturns my understanding of what this was intended to capture. I
> understood the whole point of A IVP-of B was intended to capture the
> notion that A denotes a contextually constrained form of the entity
> denoted by B. I don't see what useful purpose this relation serves.
>
> From a practical perspective, given the asymmetric nature of IVP-of
> (as was) it is easy to express the effect of complement-of in RDF by
> introducing a new entity node. But I see no way of constructing the
> strict constraining role of IVP using complement-of.
>
>
> 5.9 Time
>
> [[
> Time is defined according to [ISO8601].
> ]]
>
> I don't think it is appropriate for an open standard to be normatively
> dependent on a standard that is available only on payment of a charge
> for access. In this case, we could make reference to the XML Schema
> datatypes, which would also require us to think about my next point...
>
> As far as I'm aware, ISO 8601 covers both points in time and time
> intervals. As such a bare reference to ISO 8601 is not really an
> adequate definition: which do we want? I suspect
> http://www.w3.org/TR/xmlschema-2/#dateTime.
>
>
> 5.10 Recipe Link
>
> I don't see what useful purpose this serves.
>
>
> 5.11 Role
>
> I can't completely follow the description given.
>
>
> 5.13 Ordering of Processes
>
> This section confusingly changes the style of presentation from
> sections dedicated to specific concepts to a vague discussion of
> possible relationships between things.
>
>
> 5.14 Revision
>
> This seems to be just a different form of Derivation that happens to
> mention an agent. I'm not sure why I'd choose one over the other.
>
> I think this may be unnecessary - would not a similar effect be
> achieved by having a process execution of "revision" that uses b1,
> generates b2 and is controlled by ag (possibly with role "revise")?
>
>
> 5.16 Provenance Container
>
> It's not clear what this is intended to be (maybe unsurprising, since
> the definition is absent). But it looks as if it's intended to be a
> syntactical kind of thing, which I feel is out of place in a data
> model description (especially if we're expecting to use RDF to
> represent the data). The next version of RDF will probably formally
> define named graphs - I'm not seeing what additional definition would
> be needed here.
>

--
Professor Luc Moreau
Electronics and Computer Science   tel:   +44 23 8059 4487
University of Southampton          fax:   +44 23 8059 2865
Southampton SO17 1BJ               email: l.moreau@ecs.soton.ac.uk
United Kingdom                     http://www.ecs.soton.ac.uk/~lavm
Received on Friday, 29 July 2011 09:18:05 UTC