Re: Proposal for restructuring PROV-DM into core+extensions from Graham Klyne on 2012-05-17 (public-prov-wg@w3.org from May 2012)

From: Graham Klyne <graham.klyne@zoo.ox.ac.uk>
Date: Thu, 17 May 2012 09:20:07 +0100
To: Graham Klyne <graham.klyne@zoo.ox.ac.uk>
CC: W3C provenance WG <public-prov-wg@w3.org>
Message-ID: <4FB4B4B7.8040500@zoo.ox.ac.uk>

I've gone back to the proposal I created at 
http://www.w3.org/2011/prov/wiki/ProvDM_Proposal_for_restructuring and added 
some explanation of my rationale.  In summary, my goals are:

1. Separate core provenance patterns from specific applications
2. Maximize interoperability with other systems and mechanisms
3. Minimize ontological commitment required of users

For ease of reference, I also include my explanation of rationale below:

Rationale

This proposal attempts to facilitate take-up of the provenance model and 
vocabulary by addressing the following goals:

1. Separate core provenance patterns from specific applications

At heart, the provenance data model and ontology provide a framework and pattern 
for constructing a provenance trace; i.e. to enable traceability of artifacts to 
their ultimate sources, by whatever means they may have been produced. The 
proposed core patterns aim to capture the essence of this pattern, separately 
from the specific applications for which it may be used.

It has been my experience working with other complex ontologies that 
understanding the central patterns is the key to being able to work with the 
ontology as a whole. An example of this is FRBR 
(http://archive.ifla.org/VII/s13/frbr/frbr_current3.htm): if one goes to the 
original FRBR specification, it is bloated with detailed bibliographic record 
types, many of which are outdated (e.g. it has terms for the groove pitch of a 
gramophone record, but no term for the bit rate of an MP3). But at the heart of 
FRBR is a very simple structure which is timeless and is frequently referenced 
in discussions about information systems and catalogues (i.e. work, expression, 
manifestation, item and supporting relationships). That core structure of FRBR is 
not easy to discover unless one already understands it.

2. Maximize interoperability with other systems and mechanisms

The proposed core patterns substantially satisfy the "Test of independent 
invention" (cf. http://www.w3.org/DesignIssues/Principles.html). There are many 
proposals for provenance information, all of which differ in their details and 
emphasis, but all of them use something like the core pattern described here. 
There are also ontologies that exhibit a similar structure that are not 
presented as provenance.

By describing the central patterns separately from the application refinements, 
it becomes easier to locate and exploit the correspondences between the 
provenance ontology and other provenance-like structures that may be 
encountered on the web.

As a standards group, we do our most useful work by identifying and documenting 
the intersection of alternative provenance representation systems, rather than 
collecting and documenting their union. In my experience, a good standard is one 
that a reader can pick up and almost dismiss as being blindingly obvious, rather 
than one that demands detailed attention to a plethora of explicit detail; 
getting users "on the same page" is the first step to meaningful interoperability.

Simplicity of essential concepts is a key for getting "ordinary developers" to 
accept and use a specification, without which there is no meaningful 
interoperability.

3. Minimize ontological commitment required of users

The core patterns embody very little actual knowledge of the artifacts and 
processes involved; i.e. they require very little ontological commitment to a 
specific application view of the world. Less ontological commitment means that 
there is less for different people or communities to disagree about, and hence 
provides a basis for wider uptake of the core ideas.

Provenance is one of those topics that many people agree is important, but very 
few actually agree on what it must capture. It's a bit like the story of 
the blind men and the elephant - everyone sees it from their own perspective. By 
digging down and exposing the core structures in very simple terms, we can make 
it easier for different communities to relate their views of provenance.

A symptom of ontological commitment can be seen within the working group: there 
is a large amount of discussion taking place which is concerned with the 
minutiae of exactly what term X or term Y means, and exactly what information 
needs to be represented. In practice, I submit, applications will record the 
information they have available, and for maximum fidelity will use whatever 
terms are "native" to the application. To the extent that the PROV application 
extension terms represent commonly understood operations, they are useful and 
may be used, but we should not expect them to displace more precise application 
ontologies. The challenge then becomes how to recognize that the application 
ontologies have a relationship with (or are specializations of) these common 
terms. The core structural patterns, through their lack of ontological 
commitment, are the natural highest common factor for such alignment and, as 
such, deserve to be presented very clearly as the central concepts around which 
other terms are assembled.
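
As a rough illustration of that alignment, here is a minimal Python sketch, 
not a proposed implementation. It assumes a simple "specializes" table; the 
"app:" terms are invented for the example, and "wasInfluencedBy" is only the 
parent property suggested in my earlier message, not an agreed PROV term.

```python
# Hypothetical specialization table: each term maps to the more general term
# it refines. The "app:" names are invented; "wasInfluencedBy" is only a
# suggested parent for wasInformedBy and wasStartedByActivity.
SPECIALIZES = {
    "wasInformedBy": "wasInfluencedBy",
    "wasStartedByActivity": "wasInfluencedBy",
    "app:wasTriggeredByBuild": "wasStartedByActivity",
}


def ancestors(term):
    """Return the chain of more general terms, nearest first."""
    chain = []
    while term in SPECIALIZES:
        term = SPECIALIZES[term]
        chain.append(term)
    return chain


def matching(statements, core_term):
    """Select statements whose relation is core_term or a refinement of it."""
    return [s for s in statements
            if s[1] == core_term or core_term in ancestors(s[1])]


# An application records what it knows, in its own "native" terms.
statements = [
    ("deploy", "app:wasTriggeredByBuild", "build"),
    ("report", "wasInformedBy", "survey"),
    ("build", "app:usedCompiler", "gcc"),
]

# A consumer that understands only the core term still sees both influences.
core_view = matching(statements, "wasInfluencedBy")
```

The application keeps its precise terms; a reader who knows only the core 
pattern can still follow the trace through the specialization links.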

#g
--




On 12/05/2012 11:13, Graham Klyne wrote:
> Following Thursday's telecon, I've done an initial cut of a proposal for
> rearranging PROV-DM material:
>
> http://www.w3.org/2011/prov/wiki/ProvDM_Proposal_for_restructuring
>
> I've added abstracts to the document outline that try to capture the
> distinction/rationale for the proposed structure.
>
> For the most part, I find the distinction between essential structure and
> epistemic refinement has been fairly easy to call, but there are, inevitably, a
> couple of areas where it's not so clear for me.
>
> (a) wasInformedBy and wasStartedByActivity - I think these are both instances of
> an (as yet) unstated parent structure, which one might call "wasInfluencedBy" -
> i.e. any effect of one activity on another activity. My choice would be to have
> this new property in the core, and wasInformedBy and wasStartedByActivity as
> refinements (i.e. extensions).
>
> (b) wasInvalidatedBy - in terms of capturing the essence of a provenance trace,
> this seems of secondary importance, but it does seem to be the natural
> counterpart for wasGeneratedBy so I've left it in core for now.
>
> (c) entity specializationOf and alternateOf. These could be argued to be purely
> structural, but I felt that they aren't essential to representing a provenance
> trace, and they are sufficiently tricky that I didn't want to risk the potential
> distraction of including them in the core.
>
> In drawing up this proposal, I have tried to focus on reorganizing existing
> material. Separately from that, I think there are a number of possible
> improvements, some of which would be facilitated by the reorganization.
>
> I've also included in the core some of the auxiliary material that I think is
> needed to properly explain the core data model constructs (attributes, values,
> etc.) and have included it further to the front than in the current document,
> under "Preliminaries".
>
> #g
>
Received on Thursday, 17 May 2012 08:21:33 UTC