Re: A proposal for reorganizing PROV materials

Hi Graham,

I have a naive question on the W3C model: is there a notion of different "compliance levels" with respect to a recommendation? This 
probably echoes Luc's earlier comment on your proposal -- it is unclear to me what the consequences are of cutting through the corpus of 
existing material in a particular way. Can an organization be partially compliant just by implementing the "core"? (This is 
genuinely a reflection of my ignorance!)

   In the specifics, two comments. I don't think that directing developers to the primer is an admission of failure. I have used it 
as the entry point for students on a number of local projects now, and it did a nice job of preparing them for the prescriptive 
language of the DM.
The second comment is that I wouldn't relegate PROV-N to the semantics docs. Developers need to be aware of PROV-N to both generate 
and consume provenance, regardless of the formal semantics (which most developers will probably ignore).
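To make that concrete, here is a minimal sketch of the kind of PROV-N a developer would generate or consume without ever touching the formal semantics; it builds a PROV-N document as a plain string, and all the ex:* names are hypothetical.

```python
# Emit a minimal PROV-N document from plain Python. The assertion
# syntax (entity, activity, wasGeneratedBy) is PROV-N; the ex:report
# and ex:compile identifiers are made up for illustration.

def provn_document(assertions):
    """Wrap a list of PROV-N assertion strings in a document block."""
    body = "\n".join("  " + a for a in assertions)
    return f"document\n  prefix ex <http://example.org/>\n{body}\nendDocument"

doc = provn_document([
    "entity(ex:report)",
    "activity(ex:compile)",
    "wasGeneratedBy(ex:report, ex:compile, -)",
])
print(doc)
```

A consumer faces the same notation in reverse: it has to parse exactly these assertion forms, which is why awareness of PROV-N cannot be confined to the semantics document.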

  But while I am happy that PROV goes beyond OPMV in many ways, I am also worried about some of the specific complications that we 
are introducing in the model; see, for instance, the ongoing discussion of the various wasStartedBy* relations. My concrete suggestion 
is that, if we decide it is OK to keep these relations in all their subtlety, at the very least we need to offer a 
non-normative "pattern book" specifically targeted at developers who need to generate "correct" provenance. It should reflect and be 
consistent with the constraints, but never mention them.  Thoughts?


On 5/8/12 1:20 PM, Graham Klyne wrote:
> On 06/05/2012 12:01, Paul Groth wrote:
>> It would really be good to get specific suggestions from you. What
>> should be cut? What should be changed?
> <TL;DR>
> For "normal" developers:
> 1. A simple structural core model/vocabulary for provenance, also identifying
> extension points
> 2. Common extension terms
> 3. Ontology (i.e. expressing provenance in RDF)
> 4. A simple guide for generating provenance information
> For advanced users of provenance:
> 5. Formal semantics (incorporating PROV-N)
> 6. An advanced guide for using and interpreting provenance
> </TL;DR>
> ...
> Paul, I've been thinking about your question, and will try to articulate here my
> thoughts.  They will be quite radical, and I don't really expect the group to
> accept them - but I hope they may trigger some useful reflection.  (Separating
> collections is a useful step, but I feel it's rather nibbling at the edge of the
> complexity problem rather than facing it head-on.)
> Before diving in, I think it's worth reviewing my motivation for this...
> At the heart of my position is the question:
>     "For provenance, what does success look like?"
> (a) Maybe it looks like this:  rich and fully worked out specifications which
> are shown to address a range of described use-cases, complete with a consistent
> underlying theory that can be used to construct useful proofs around provenance
> information, reviewed and accepted for standards-track publication in the W3C.
> Software implementations that capture and exploit this provenance information in
> all its richness, and peer reviewed papers showing how provenance information,
> if provided according to the specification, can be used to underpin a range of
> trust issues around data on the web.
> (b) Or maybe like this:  a compact easily-grasped structure that makes it easy
> for developers to attach available information to their published datasets with
> just a few extra lines of code.  So easy to understand and apply that it becomes
> the norm to provide for every published dataset on the web, so that provenance
> information about data becomes as ubiquitous as data on the web, as ubiquitous
> as FOAF information about people.
> I think we are pretty much on course for (a), which is a perfectly reasonable
> position, but for me the massive potential we have for real impact is (b), which
> I think will be much harder to achieve on the basis of the current specifications.
> (My following comments are based in part on my experience as a developer working
> with other complex ontologies (notably FRBR and CIDOC-CRM):  by isolating and
> clearly explaining the structural core, the whole ontology becomes much easier
> to approach and utilize.)
> So what does it take to stand a chance of achieving (b)?  My thoughts:
> 1. Identify the simple, structural core of provenance and describe that in a
> normative self-contained document for developers, with sufficient rigor and
> detail that developers who follow the spec can consistently generate basic
> provenance information structures, and with enough simplicity that developers
> whose primary interest is not provenance *can* follow the spec.  This should be
> fewer than 20 terms overall (the current "starting point" consists of 13 terms;
> OPMV has 15).
> This structural core should also identify the intended extension points, and how
> to add the "epistemic" aspects of provenance.  (That's a term I've adopted for
> this purpose -- meaning the vocabulary terms that convey specific knowledge in
> conjunction with the underlying provenance structure; e.g. the specific role of
> an agent in an activity, the author of a document.  Is there a more widely used
> term for this?)  The document at
> (esp. section 3) covers many of the relevant issues, including how to use common
> provenance-related vocabularies in concert with the structural core.
> (NOTE: I say "normative" here, because I think the approach of directing
> developers first to a non-normative primer is a kind of admission of failure,
> and still leaves a developer needing to master the normative documents if they
> are to be confident that their code is generating valid provenance information.)
> This could use information currently in the Primer (section 2, but not the stuff
> about specialization/alternative) and/or Ontology documents (section 3.1).
> 2. Introduce "epistemic" provenance concepts that deal with common specific
> requirements (e.g. collections, quotation, etc.), without formalization.  I
> would expect this to be organized as reference material, consisting of several
> optional and free-standing sub-sections (or even separate documents).  Examples
> of the kind of material might be
> This would cover the parts of the model corresponding to "Expanded terms" and
> "Dictionary terms" in the ontology document, and maybe aspects of "Qualified
> terms" (see below).
> 3. Ontology - specific terms for representing provenance in RDF.  The current
> provenance document seems to me to be pretty well organized from a high-level
> view.  (My assumption is that any of the subsections of "expanded terms",
> "qualified terms" and "Dictionary terms" can be skipped by anyone who does not
> need access to the capabilities they provide.)
> I have not been involved in the discussions about qualified terms, and I am
> somewhat concerned by the level of complexity they introduce into the RDF model
> (22 additional classes and 26 properties).  I can only hope that most
> applications that generate provenance information do not have to be concerned
> with these.  (Looking at figure 2 in the ontology document, it seems to me that
> for many practical purposes the intent of these properties could be captured by
> properties applied directly to the Activity ... it seems there's a kind of
> "double reification" going on here with respect to the naive presentation of
> provenance via something like DC.  In practice, if I were developing an
> application around this model using RDF that had to work with data at any
> reasonable scale, I'd probably end up introducing such properties in any case
> for performance reasons - cf.
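[Graham's "double reification" worry can be made concrete. The sketch below contrasts the direct form with the qualified pattern, using the PROV-O terms prov:wasAssociatedWith, prov:qualifiedAssociation, prov:agent and prov:hadRole; triples are shown as plain (subject, predicate, object) tuples, and all ex:* names are hypothetical.]

```python
# Direct (unqualified) form: one triple links the activity to the agent.
direct = [
    ("ex:compile", "prov:wasAssociatedWith", "ex:alice"),
]

# Qualified form: an intermediate Association node carries the role,
# at the cost of several extra triples per relationship -- this is the
# structure that applications may flatten back out for performance.
qualified = [
    ("ex:compile", "prov:qualifiedAssociation", "_:assoc1"),
    ("_:assoc1", "rdf:type", "prov:Association"),
    ("_:assoc1", "prov:agent", "ex:alice"),
    ("_:assoc1", "prov:hadRole", "ex:editor"),
]

print(len(direct), len(qualified))  # prints: 1 4
```

The qualified pattern pays four triples where the direct form pays one, which is exactly the kind of overhead that pushes implementers toward adding shortcut properties at scale.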
> 4. Describe how to generate provenance information in very simple terms for
> developers who are not, and do not want to be, specialists in provenance
> information (e.g. think of a developer creating a web site using Drupal - we
> want it to be really easy for them to design provenance information into their
> system).
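[To illustrate the bar Graham is setting in point 4: for a non-specialist developer, attaching basic provenance to a published dataset should take only a few lines. The sketch below emits Turtle from plain Python using the PROV-O terms prov:Entity, prov:wasAttributedTo and prov:generatedAtTime; the URIs and function name are hypothetical.]

```python
# A few-lines-of-code provenance attachment: given a dataset, its
# publishing agent, and a timestamp, emit minimal Turtle using core
# PROV-O terms. Prefix declarations for prov:/xsd: are assumed to be
# handled elsewhere in the publishing pipeline.

def basic_provenance(dataset_uri, agent_uri, generated_at):
    return (
        f"<{dataset_uri}> a prov:Entity ;\n"
        f"    prov:wasAttributedTo <{agent_uri}> ;\n"
        f"    prov:generatedAtTime \"{generated_at}\"^^xsd:dateTime .\n"
    )

print(basic_provenance("http://example.org/data/site-export",
                       "http://example.org/people/dev1",
                       "2012-05-08T13:20:00Z"))
```

If the simple guide in point 4 cannot be boiled down to roughly this much code for the common case, it is arguably not simple enough for the Drupal-developer audience.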
> 5. Formal semantics, including the formal definition of PROV-N upon which it is
> based.  This would include material from
> 6. Describe how to consume/interpret provenance information, in particular with
> reference to the formal semantics.  This would be aimed at more specialist users
> (and creators) of provenance information, and would address the subtleties such
> as specialization, alternative, etc.  Among other things, it would cover more
> formal aspects such as constraints, inferences, mappings from common patterns,
> mapping from subproperties of the basic structural properties, and other
> simplified ways of expressing information, to the qualified terms pattern, etc.
>    Much of the material currently in the DM "constraints" document might end up here.
> ...
> In summary:
> 1. A simple structural core model/vocabulary for provenance (Normative)
>      This should be the entry point, easy to read and absorb, for all users.
> 2. Common extension terms (Normative)
>      This should be structured more as a reference work,
>      so relevant parts are easily accessed and others can be ignored.
> 3. Ontology (i.e. expressing provenance in RDF) (Normative)
>      Pretty much as the current document.
> 4. A simple guide for generating provenance information (Informative)
>      This would contain primer material dealing with the core concepts.
> For most developers, the above would be all they need to know about.
> 5. Formal semantics (incorporating PROV-N) (Normative)
>      A dense, formal description of PROV-N syntax and model theoretic
>      formal semantics for a strict interpretation of the provenance model.
> 6. An advanced guide for using and interpreting provenance (Informative)
>      For advanced developers of provenance applications and/or theory,
>      exploring and explaining the more formal aspects of provenance and how
>      they might affect applications that use provenance.
> ...
> So those are my thoughts.  They involve a fairly radical reorganization of the
> material we have, but I don't think that they call for fundamental changes to
> the technical consensus, or for the creation of significant new material.  Existing
> material may need sub-editing, heavily in places.
> #g
> --

-----------  ~oo~  --------------
Paolo Missier
School of Computing Science, Newcastle University,  UK

Received on Tuesday, 8 May 2012 15:04:32 UTC