- From: Graham Klyne <graham.klyne@zoo.ox.ac.uk>
- Date: Thu, 17 May 2012 09:20:07 +0100
- To: Graham Klyne <graham.klyne@zoo.ox.ac.uk>
- CC: W3C provenance WG <public-prov-wg@w3.org>
I've gone back to the proposal I created at
http://www.w3.org/2011/prov/wiki/ProvDM_Proposal_for_restructuring
and added some explanation of my rationale. In summary, my goals are:

1. Separate core provenance patterns from specific applications
2. Maximize interoperability with other systems and mechanisms
3. Minimize ontological commitment required of users

For ease of reference, I also include my explanation of rationale below:

Rationale

This proposal attempts to facilitate take-up of the provenance model and vocabulary by addressing the following goals:

1. Separate core provenance patterns from specific applications

At heart, the provenance data model and ontology provide a framework and pattern for constructing a provenance trace; i.e. to enable traceability of artifacts to their ultimate sources, by whatever means they may have been produced. The proposed core patterns aim to capture the essence of this pattern, separately from the specific applications for which it may be used.

It has been my experience working with other complex ontologies that understanding the central patterns is the key to being able to work with the ontology as a whole. An example of this is FRBR (http://archive.ifla.org/VII/s13/frbr/frbr_current3.htm): if one goes to the original FRBR specification, it is bloated with detailed bibliographic record types, many of which are outdated (e.g. it has terms for the groove pitch of a gramophone record, but no term for the bit rate of an MP3). But at the heart of FRBR is a very simple structure which is timeless and is frequently referenced in discussions about information systems and catalogues (i.e. work, expression, manifestation, item and supporting relationships). That core structure of FRBR is not easy to discover unless one already understands it.

2. Maximize interoperability with other systems and mechanisms

The proposed core patterns substantially satisfy the "Test of independent invention" (cf.
http://www.w3.org/DesignIssues/Principles.html). There are many proposals for provenance information, all of which differ in their details and emphasis, but all of them use something like the core pattern described here. There are also ontologies that exhibit a similar structure that are not presented as provenance. By describing the central patterns separately from the application refinements, it becomes easier to locate and exploit the correspondences between the provenance ontology and other provenance-like structures that may be encountered on the web.

As a standards group, we do our most useful work by identifying and documenting the intersection of alternative provenance representation systems, rather than collecting and documenting their union. In my experience, a good standard is one that a reader can pick up and almost dismiss as being blindingly obvious, rather than one that demands detailed attention to a plethora of explicit detail; getting users "on the same page" is the first step to meaningful interoperability. Simplicity of essential concepts is a key for getting "ordinary developers" to accept and use a specification, without which there is no meaningful interoperability.

3. Minimize ontological commitment required of users

The core patterns embody very little actual knowledge of the artifacts and processes involved; i.e. they require very little ontological commitment to a specific application view of the world. Less ontological commitment means that there is less for different people or communities to disagree about, and hence provides a basis for wider uptake of the core ideas.

Provenance is one of those topics that many people agree is important, but very few actually agree what it must capture. It's a bit like the story of the blind men and the elephant - everyone sees it from their own perspective.
By digging down and exposing the core structures in very simple terms, we can make it easier for different communities to relate their views of provenance.

A symptom of ontological commitment can be seen within the working group: there is a large amount of discussion taking place which is concerned with the minutiae of exactly what term X or term Y means, and exactly what information needs to be represented. In practice, I submit, applications will record the information they have available, and for maximum fidelity will use whatever terms are "native" to the application. To the extent that the PROV application extension terms represent commonly understood operations, they are useful and may be used, but we should not expect them to displace more precise application ontologies. The challenge then becomes how to recognize that the application ontologies have a relationship with (or are specializations of) these common terms. The core structural patterns, through their lack of ontological commitment, are the natural highest common factor for such alignment and, as such, deserve to be presented very clearly as the central concepts around which other terms are assembled.

#g
--

On 12/05/2012 11:13, Graham Klyne wrote:
> Following Thursday's telecon, I've done an initial cut of a proposal for
> rearranging PROV-DM material:
>
> http://www.w3.org/2011/prov/wiki/ProvDM_Proposal_for_restructuring
>
> I've added abstracts to the document outline that try to capture the
> distinction/rationale for the proposed structure.
>
> For the most part, I find the distinction between essential structure and
> epistemic refinement has been fairly easy to call, but there are, inevitably, a
> couple of areas where it's not so clear for me.
>
> (a) wasInformedBy and wasStartedByActivity - I think these are both instances of
> an (as yet) unstated parent structure, which one might call "wasInfluencedBy" -
> i.e. any effect of one activity on another activity. My choice would be to have
> this new property in the core, and wasInformedBy and wasStartedByActivity as
> refinements (i.e. extensions)
>
> (b) wasInvalidatedBy - in terms of capturing the essence of a provenance trace,
> this seems of secondary importance, but it does seem to be the natural
> counterpart for wasGeneratedBy so I've left it in core for now.
>
> (c) entity specializationOf and alternateOf. These could be argued to be purely
> structural, but I felt that they aren't essential to representing a provenance
> trace, and they are sufficiently tricky that I didn't want to risk the potential
> distraction of including them in the core.
>
> In drawing up this proposal, I have tried to focus on reorganizing existing
> material. Separately from that, I think there are a number of possible
> improvements, some of which would be facilitated by the reorganization.
>
> I've also included in the core some of the auxiliary material that I think is
> needed to properly explain the core data model constructs (attributes, values,
> etc.) and have included it further to the front than in the current document,
> under "Preliminaries".
>
> #g
>
Received on Thursday, 17 May 2012 08:21:33 UTC