- From: Graham Klyne <GK@ninebynine.org>
- Date: Thu, 25 Aug 2011 18:03:44 +0100
- To: W3C provenance WG <public-prov-wg@w3.org>
With reference to:
http://dvcs.w3.org/hg/prov/raw-file/71fa5079e6b3/model/ProvenanceModel.html
(Note this is the Mercurial version at 25-Aug-2011, about 16:30 UTC)

...

This is my second attempt to read this document, and while I have slightly better comprehension than the first time round, I'm still finding it very hard going. Rather than go through many small points, I'm going to try to focus on high-level issues.

I am really concerned that the direction of development of this specification will make it unusable for (what I take to be) its intended purpose, which is to provide a reference for developers who are generating and consuming provenance information. For this, what is needed first and foremost is a simple data model with a straightforward mapping to one or more common web representations. From the WG charter, I expect RDF to be one of those. If this is not provided, then I fear the specification will render itself irrelevant.

I find that the example in section 3, and the examples associated with the individual definitions, are really unhelpful. In particular, the "Abstract Syntax Notation" is used without any explanation of what it means. (There is an appendix with a BNF syntax for ASN, but no explanation of how to read or interpret it.)

What I think would be *far* more useful than section 3 in its current form is a "50,000-foot" view of the model, outlining the main classes and their possible relationships. A very nice example of such a view for a simpler provenance ontology (OPMV) can be seen in http://open-biomed.sourceforge.net/opmv/ns.html#sec-desc. Coupled with a 1-2 line summary of each class and relation, this would go a long way to making the whole framework more approachable.

In section 3.3, the diagram key shows what the different line styles represent, but not what the different shapes represent.

...

5.1 Entity

In sections 4 and 5.1, I think the definition of Entity, and its distinction as a "characterized thing", is unhelpful and confusing.
It is not clear to me what the distinction of being "characterized" means in any formal sense. There has been a lot of discussion on the mailing list about this, but I haven't seen a single argument that shows why it is *needed* to distinguish an "Entity" in any way from anything that can be identified; i.e. any web resource.

It seems to me that the primitive provenance assertions one might make about an "Entity" (e.g. dc:creator, doap:release, to use examples from OPMV) can be made regardless of any claim that the Entity is a "characterized thing". Similarly, I see no breakage arising from appropriate application of provenance relations (such as derivedFrom, generatedBy, etc.) to arbitrary entities. Using RDF, the normal approach would be to infer from the existence of such relations some information about the type of the things they relate. If they are used inappropriately, the resulting expression may disagree with reality, or be unsatisfiable - we can't stop people from making nonsense statements on the web.

....

5.8 isComplementOf

I struggle to understand from the text what practical application there is for the isComplementOf relation. From mailing list discussion, I have come to understand that it might be useful for talking about different provenance accounts, to understand when they are referring to the same underlying thing ("Royal Society", etc.). But I find the description given somewhat complicated and confusing, in part because of the enforced distinction between things and "Entities".

I find it much more natural to think of views (i.e. where I think we started out with "IVPof"), where different accounts may use different views (roughly corresponding to different observational constraints) of some thing, which can themselves be seen as things (e.g. in the examples given, M1, M2, etc. might be things corresponding to "Royal Society" in the period(s) when its membership was as indicated in each case, or L1, L2, etc., corresponding to it being located at specific places). The "established on" view could be considered as the underlying "Royal Society" itself, as that never changes.

From this notion of "view", the "complementOf" relation is easily derived: if two things are both views of some underlying thing, then they are complementOf each other. This all seems much simpler and more intuitive to me than the contortions used to explain complementOf in terms of "characterized entities" and attributes.

....

5.16 Provenance container

Leaving aside, for now, the notion of accounts, I find this concept completely unnecessary. Further, it is defined as having a set of "provenance constructs", but I see no defined concept for these, so the definition given is incomplete.

I think it would be easier to have a "Provenance expression" (roughly corresponding to an RDF expression) that is a provenance assertion about some thing or things, which can be evaluated to be true or false. At this level, I see no need for any kind of visibility of what the inside of such an expression looks like. The main requirement to be a valid provenance expression is that it is not dependent on any ways in which the referenced thing or things may vary (i.e. it talks only about invariant aspects of the thing(s)).

I accept some form of containment is needed for accounts, as it is important for some purposes to be able to consider them separately - but I can't see why that containment isn't just part of what it is to be an account.

My message to Daniel (http://lists.w3.org/Archives/Public/public-prov-wg/2011Aug/0242.html) expands a little on how I see this.

....

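As an illustration of the view-based derivation described under 5.8 above, here is a minimal sketch. It assumes a hypothetical "view of" relation recorded as (view, underlying thing) pairs; the names VIEW_OF and derive_complement_of, and the sample data, are invented for illustration and do not come from the model document.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical "view of" assertions: (view, underlying thing).
# M1, M2 and L1 echo the labels used in the Royal Society example.
VIEW_OF = [
    ("M1", "RoyalSociety"),
    ("M2", "RoyalSociety"),
    ("L1", "RoyalSociety"),
]


def derive_complement_of(view_of):
    """Return the unordered pairs of views that share an underlying thing.

    Rule sketched: if two things are both views of the same underlying
    thing, they are taken to be complementOf each other.
    """
    views_by_thing = defaultdict(set)
    for view, thing in view_of:
        views_by_thing[thing].add(view)

    pairs = set()
    for views in views_by_thing.values():
        # Every pair of distinct views of one thing is a complementOf pair.
        for a, b in combinations(sorted(views), 2):
            pairs.add(frozenset((a, b)))
    return pairs


complements = derive_complement_of(VIEW_OF)
```

Each pair is held as a frozenset because the derived relation is symmetric in this reading; views of different underlying things never end up paired.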
In summary: I've tried to focus on high-level issues that I think make this provenance model document difficult to understand, hence difficult to use in discussion of other areas of the provenance WG's work, and which I also fear may lead to it being ignored by developers in favour of simpler specifications like OPMV:

- I think a short, high-level overview is needed to illustrate how the various ideas work together.

- I think that too much formality is invoked too early on in the document, and that the formalism used is inadequately described. Also, I note that the formalism is applied to the examples, not to the definitions themselves, which seems a little odd to me.

- I think the definition of "Entity" is unnecessarily complex, and has been giving rise to much confusion in working group discussions.

- I think the definition of "Entity" gives rise to an over-complicated definition of "complementOf". I think the original notion of "IVPof" was more useful, and got lost along the way.

- I think the notion of provenance containers as a separate concept is not needed, and that a simpler concept of provenance expression (or assertion), which is just another "thing", could be all that is needed. Accounts could then be implicit containers for provenance expressions.

#g
--
Received on Thursday, 25 August 2011 17:05:25 UTC