Re: A proposal for reorganizing PROV materials

Paul,

Summary:

- we clearly have differing perspectives and priorities; I'm OK with that.  What 
we have now is mostly OK, but IMO could be better.  Whether or not the effort to 
make it better is worthwhile, or even possible, is for the group to decide.

- I think the best possible outreach is a clear concise normative specification 
that doesn't try to define too much.

- Despite my reservations about the complexity introduced by the qualified 
relations pattern, I'm not disputing the technical consensus.

On 08/05/2012 21:41, Paul Groth wrote:
> Hi Graham,
>
> I actually think with some fine-tuning we can get to something that
> addresses your overarching comment of accessibility of the
> specifications.
>
> * I note that we already agreed at the F2F meeting about a reading of
> the specifications, namely prov-primer ->  prov-o ->  prov-dm. This is
> something we put in all documents.

Sure... I'm not a fan of that, but don't object.  But I'd still like the main 
DM, which is notionally the central document (IMO), to be more directly accessible.

> * I think the separation of collections into a separate document would
> help tremendously in terms of shortening the documents.

Yes it helps.

> * As you also note, most web developers will be coming at this through
> the ontology. As you note, you find this accessible. Hopefully we can
> spend some more time with an RDFa best practice once we have finished
> the core specs.

The ontology document is a saviour, IMO.  When I was thinking about "radical" 
proposals, I toyed with making a suggestion to drop the DM entirely. I think 
its essential normative content is covered elsewhere.  But I'm pretty sure 
others might think differently :)

> * The pattern of qualifiedRelations was *extensively* discussed in the
> prov-o team for many months. The conclusion the group came to follows
> a simple pattern that can be directly and systematically applied. In
> my opinion, this is settled business.

Sure.  I haven't been following that discussion in detail, but when I look at 
the resulting ontology I can't help but have reservations.  I can see how these 
structures capture information at a level of detail that is not otherwise easily 
captured in systematic form, but I think this will be of benefit to a relatively 
tiny minority of provenance specialists.  I'm not actually proposing a change, 
but that doesn't mean I like its presence here.

> * There is a larger question around core/extension in terms of
> editorial organization. I think Luc's email deserves thought. There
> were complaints about the original core/extension discussion in the
> group and no one defended the organization so it was reworked into the
> modular structure of components.

Yes, I responded to Luc.  I think the original core/extension presentation 
didn't really achieve the accessibility I'd like to see, so would have been hard 
to defend in that form.

> * I find the starting points structure very clear. When I have given
> presentations on prov, people quickly grasp this. I don't see how this
> differs much from your request for a clear core.

I somewhat agree with this, particularly as presented in the ontology document. 
But a presentation != reading a document in isolation, which is what most 
developers will have to do.  I find the notion of "starting point" is less 
useful than "core", but clearly you feel differently.  In particular, for me, 
the presentation of a core that is complete in itself is more useful.  And the 
"starting point" is contained within larger documents.

It's maybe unfortunate that I've focused most of my available reviewing effort 
on the Data Model document, which *was* really hard going, and I think is still 
the least easy part of the material to assimilate.  If I'd started elsewhere, 
maybe I would not be making these comments.  My concern is that (notwithstanding 
the exhortation to start with the primer), it will be treated as the definitive 
go-to place for developer information.

> In general, I think the group has achieved consensus on the
> constructs, organization and definitions of the model. We are in the
> process of cleaning things up. I would like to see us have some time to
> spend on all the outreach material (best practice, examples,
> implementations) that I think will make this a success.

This takes me back to my original motivation (cf. 
http://lists.w3.org/Archives/Public/public-prov-wg/2012May/0099.html - "what 
does success look like").  My feeling is that the best possible outreach 
material is a clear concise normative specification of the essential concepts.

I suppose we're having this discussion in part because the "cleaning things up" 
process started rather late in the day (following the last F2F, in my 
perception).  How much time and effort we can put into that is for the group to 
decide, and I'm just one voice here.  You asked me for specific suggestions, and 
I responded as best I could; I'm not really expecting my suggestions to be 
adopted, but if they suggest anything the group feels can usefully be done then 
that's a good thing.

#g
--

> On Tue, May 8, 2012 at 2:20 PM, Graham Klyne<graham.klyne@zoo.ox.ac.uk>  wrote:
>> On 06/05/2012 12:01, Paul Groth wrote:
>>> It would really be good to get specific suggestions from you. What
>>> should be cut? What should be changed?
>>
>> <TL:DR>
>> For "normal" developers:
>> 1. A simple structural core model/vocabulary for provenance, also identifying
>> extension points
>> 2. Common extension terms
>> 3. Ontology (i.e. expressing provenance in RDF)
>> 4. A simple guide for generating provenance information
>>
>> For advanced users of provenance:
>> 5. Formal semantics (incorporating PROV-N)
>> 6. An advanced guide for using and interpreting provenance
>> </TL:DR>
>>
>> ...
>>
>> Paul, I've been thinking about your question, and will try to articulate here my
>> thoughts.  They will be quite radical, and I don't really expect the group to
>> accept them - but I hope they may trigger some useful reflection.  (Separating
>> collections is a useful step, but I feel it's rather nibbling at the edge of the
>> complexity problem rather than facing it head-on.)
>>
>> Before diving in, I think it's worth reviewing my motivation for this...
>>
>>
>> At the heart of my position is the question:
>>
>>    "For provenance, what does success look like?"
>>
>> (a) Maybe it looks like this:  rich and fully worked out specifications which
>> are shown to address a range of described use-cases, complete with a consistent
>> underlying theory that can be used to construct useful proofs around provenance
>> information, reviewed and accepted for standards-track publication in the W3C.
>> Software implementations that capture and exploit this provenance information in
>> all its richness, and peer reviewed papers showing how provenance information,
>> if provided according to the specification, can be used to underpin a range of
>> trust issues around data on the web.
>>
>> (b) Or maybe like this:  a compact easily-grasped structure that makes it easy
>> for developers to attach available information to their published datasets with
>> just a few extra lines of code.  So easy to understand and apply that it becomes
>> the norm to provide for every published dataset on the web, so that provenance
>> information about data becomes as ubiquitous as data on the web, as ubiquitous
>> as FOAF information about people.
>>
>> I think we are pretty much on course for (a), which is a perfectly reasonable
>> position, but for me the massive potential we have for real impact is (b), which
>> I think will be much harder to achieve on the basis of the current specifications.
>>
>> (My following comments are based in part on my experience as a developer working
>> with other complex ontologies (notably FRBR and CIDOC-CRM):  by isolating and
>> clearly explaining the structural core, the whole ontology becomes much easier to
>> approach and utilize.)
>>
>>
>> So what does it take to stand a chance of achieving (b)?  My thoughts:
>>
>> 1. Identify the simple, structural core of provenance and describe that in a
>> normative self-contained document for developers, with sufficient rigor and
>> detail that developers who follow the spec can consistently generate basic
>> provenance information structures, and with enough simplicity that developers
>> whose primary interest is not provenance *can* follow the spec.  This should be
>> fewer than 20 terms overall (the current "starting point" consists of 13 terms;
>> OPMV (http://open-biomed.sourceforge.net/opmv/ns.html) has 15).
>>
>> This structural core should also identify the intended extension points, and how
>> to add the "epistemic" aspects of provenance.  (That's a term I've adopted for
>> this purpose, meaning the vocabulary terms that convey specific knowledge in
>> conjunction with the underlying provenance structure; e.g. the specific role of
>> an agent in an activity, the author of a document.  Is there a more widely used
>> term for this?)  The document at http://code.google.com/p/opmv/wiki/OPMVGuide2
>> (esp. section 3) covers many of the relevant issues, including how to use common
>> provenance-related vocabularies in concert with the structural core.
>>
>> (NOTE: I say "normative" here, because I think the approach of directing
>> developers first to a non-normative primer is a kind of admission of failure,
>> and still leaves a developer needing to master the normative documents if they
>> are to be confident that their code is generating valid provenance information.)
>>
>> This could use information currently in the Primer (section 2, but not the stuff
>> about specialization/alternative) and/or Ontology documents (section 3.1).
>>
>>
>> 2. Introduce "epistemic" provenance concepts that deal with common specific
>> requirements (e.g. collections, quotation, etc.), without formalization.  I
>> would expect this to be organized as reference material, consisting of several
>> optional and free-standing sub-sections (or even separate documents).  Examples
>> of the kind of material might be
>> http://code.google.com/p/opmv/wiki/GuideOfCommonModule,
>> http://code.google.com/p/opmv/wiki/OPMVExtensionsDataCollections.
>>
>> This would cover the parts of the model corresponding to "Expanded terms" and
>> "Dictionary terms" in the ontology document, and maybe aspects of "Qualified
>> terms" (see below).
>>
>>
>> 3. Ontology - specific terms for representing provenance in RDF.  The current
>> provenance document seems to me to be pretty well organized from a high-level
>> view.  (My assumption is that any of the subsections of "expanded terms",
>> "qualified terms" and "Dictionary terms" can be skipped by anyone who does not
>> need access to the capabilities they provide.)
>>
>> I have not been involved in the discussions about qualified terms, and I am
>> somewhat concerned by the level of complexity they introduce into the RDF model
>> (22 additional classes and 26 properties).  I can only hope that most
>> applications that generate provenance information do not have to be concerned
>> with these.  (Looking at figure 2 in the ontology document, it seems to me that
>> for many practical purposes the intent of these properties could be captured by
>> properties applied directly to the Activity ... it seems there's a kind of
>> "double reification" going on here with respect to the naive presentation of
>> provenance via something like DC.  In practice, if I were developing an
>> application around this model using RDF that had to work with data at any
>> reasonable scale, I'd probably end up introducing such properties in any case
>> for performance reasons - cf. http://code.google.com/p/milarq/).
>>
>>
>> 4. Describe how to generate provenance information in very simple terms for
>> developers who are not and do not want to be specialists in provenance
>> information (e.g. think of a developer creating a web site using Drupal - we
>> want it to be really easy for them to design provenance information into their
>> system).
>>
>>
>> 5. Formal semantics, including the formal definition of PROV-N upon which it is
>> based.  This would include material from
>> http://www.w3.org/2011/prov/wiki/FormalSemanticsWD3
>>
>>
>> 6. Describe how to consume/interpret provenance information, in particular with
>> reference to the formal semantics.  This would be aimed at more specialist users
>> (and creators) of provenance information, and would address the subtleties such
>> as specialization, alternative, etc.  Among other things, it would cover more
>> formal aspects such as constraints, inferences, mappings from common patterns,
>> mapping from subproperties of the basic structural properties, and other
>> simplified ways of expressing information, to the qualified terms pattern, etc.
>>   Much of the material currently in the DM "constraints" document might end up here.
>>
>> ...
>>
>> In summary:
>>
>> 1. A simple structural core model/vocabulary for provenance (Normative)
>>     This should be the entry point, easy to read and absorb, for all users.
>> 2. Common extension terms (Normative)
>>     This should be structured more as a reference work,
>>     so relevant parts are easily accessed and others can be ignored.
>> 3. Ontology (i.e. expressing provenance in RDF) (Normative)
>>     Pretty much as the current document.
>> 4. A simple guide for generating provenance information (Informative)
>>     This would contain primer material dealing with the core concepts.
>>
>> For most developers, the above would be all they need to know about.
>>
>> 5. Formal semantics (incorporating PROV-N) (Normative)
>>     A dense, formal description of PROV-N syntax and model theoretic
>>     formal semantics for a strict interpretation of the provenance model.
>> 6. An advanced guide for using and interpreting provenance (Informative)
>>     For advanced developers of provenance applications and/or theory,
>>     exploring and explaining the more formal aspects of provenance and how
>>     they might affect applications that use provenance.
>>
>> ...
>>
>> So those are my thoughts.  They involve a fairly radical reorganization of the
>> material we have, but I don't think that they call for fundamental changes to
>> the technical consensus, or for the creation of significant new material.  Existing
>> material may need sub-editing, heavily in places.
>>
>> #g
>> --
>>
>
>
>

Received on Wednesday, 9 May 2012 10:18:28 UTC