Re: Comments to the working draft 4 of DM from Paolo Missier on 2012-02-23 (public-prov-wg@w3.org from February 2012)

From: Paolo Missier <Paolo.Missier@ncl.ac.uk>
Date: Thu, 23 Feb 2012 15:44:08 +0000
To: Paul Groth <p.t.groth@vu.nl>
CC: Jun Zhao <jun.zhao@zoo.ox.ac.uk>, Provenance Working Group WG <public-prov-wg@w3.org>
Message-ID: <4F465EC8.4070502@ncl.ac.uk>
Jun

thank you for your feedback and for getting involved in your usual passionate way :-)

I'm jumping into the middle of a long ping-pong at this point, but briefly:

- I agree with Luc's replies in the later part of the thread, except possibly that I don't see anything wrong with "entities and 
relations" (vs classes/concepts and properties), as long as they are used consistently and not mixed.

Some specific comments:

- scruffy vs proper: Luc and Paul elaborated on this distinction. In editorial terms, a beneficial side-effect of this distinction 
is that it makes it possible to write part I /without mention/ of the "proper" semantics, which includes complications such as 
"partial states" and the whole baggage of constraints and interpretation axioms, which in fact have now been pushed into part II. So 
part I is simpler to read. On this note, Luc knows from a recent editorial chat that I don't completely agree on having sec. 7 in 
part I, as you risk turning the "teaser" into a confusion generator :-)  but this is beyond your comments and will be discussed later.

- the data model in 2.5 is a conceptual data model. It's deliberately written in very, very simple UML and it leaves out many 
concepts. It is designed to provide a simple baseline which roughly matches the concepts and relations in sec. 4 (it should match 
quite accurately, actually). And yes, I did write a narrative for it but it will appear in WD5 as it was too late for this round. 
Sorry about that.

- I agree that the terms "core" and "common relations" are not optimal. I would go for "additional" or "further" rather than 
"common". The intent of the separation, I think, is to indicate: "if you only have time for some of this, then 'core' is what you 
really, really need to know about PROV. It will be enough for you to write useful provenance for starters". So the ideal progression 
to complete PROV enlightenment is
(primer) -> core -> "more stuff" (common relations) -> part II -> Semantics
The choice of minimal model in 2.5 follows this same idea.

I guess more comments will be coming, and this is very shortly before the group feedback, so I had better stop here.

Cheers,
   -Paolo



On 2/21/12 8:00 PM, Paul Groth wrote:
> Hi Jun
>
> Thanks for the clarifications. I think they're good comments and should be addressed. They seem primarily editorial so I'll let the editors take them
>
> Luc, Paolo? :-)
>
> Paul
>
> On Feb 21, 2012, at 20:39, Jun Zhao <jun.zhao@zoo.ox.ac.uk> wrote:
>
>> Hi Paul,
>>
>> Sorry for the delay in my reply...
>>
>> On 19/02/2012 19:56, Paul Groth wrote:
>>> Hi Jun,
>>>
>>> I'll let the editors respond in more detail. Thanks for the review!
>>>
>>> ==Goal==
>>> I believe the goal of the first document (PROV-DM Part 1) is to present
>>> the terms of the data model in natural language. It is a conceptual
>>> model. At least this is what I think :-) Maybe it should be said more
>>> explicitly...
>> Ditto.
>>> ==Scruffy & Proper==
>>> In terms of "proper" and "scruffy" provenance here's what I believe we
>>> meant by these terms at F2F2. We identified two use cases:
>>>
>>> 1. The ability to use the PROV vocabulary to make provenance statements
>>> about existing things on the Web. Think for example adding simple
>>> provenance metadata (i.e. authorship) in a web page.
>>> 2. The ability to exchange PROV information between provenance systems
>>> where a static or fixed view of data is key. This is common in current
>>> provenance tracking systems. Think exchanging information between
>>> version control systems or two scientific workflow systems.
>>>
>>> Number 1 is the "scruffy" use-case: we don't want people to have to care
>>> about fixing the state of things whereas Number 2 is the "proper"
>>> use-case where being able to refer to a specific partial state is
>>> important. So scruffy and proper aren't about minimal and non-minimal.
>>> It's about what sort of semantics a user wants to support.
>> Ok, that's totally different from what I have in mind. I'll write
>> another email after thinking through whether/how we should get this
>> into the doc.
>>
>>> ==Lightweight??==
>>> I'm curious as to what you consider lightweight? Currently, we have 3
>>> "core" classes and edges between those. I guess the Figure in Section
>>> 2.5 seems fairly lightweight to me... I wonder what you think?
>> Yes, we have 3 classes and their edges in the overview section,
>> but many more in the core section. Again, what do we mean by core?
>> Section 2 is much more lightweight than sect 4, which is good. The name
>> of core is confusing, at least for me, who doesn't have all the context
>> to interpret its actual meaning.
>>
>> The figure in sect 2.5 is very lightweight, but it doesn't correspond to
>> the content in section 2. In that figure, there are edges like used,
>> wasGeneratedBy, wasAssociatedWith, are they meant to match to Use,
>> Generation, Association? This is not a precise matching. We don't have
>> an agreed definition of what we mean by data model. We are leaving
>> readers to interpret the figure and the content.
>>
>> And what about Plan, which is mentioned in the content but not in the
>> figure?
>>
>> And what about wasStartedBy, wasEndedBy, addedOnBehalfOf, and
>> wasDerivedFrom, they are not discussed in the content.
>>
>> And many other inconsistencies of this kind. Am I reading a different
>> version of draft from you and Luc?
>>
>>> Just a note on the goal of the prov-dm document. It is to be accessible
>>> but it's not the entry point for the set of specifications. At the F2F2,
>>> it was agreed that the entry point would be the Primer and then the
>>> Ontology (or other serialization) and then one could drill down to the
>>> data model and finally to the semantics document. So this document may
>>> have more than one would want in a brief introduction.
>> Is this also clear in the document?
>>> ==Definition Repetition==
>>> Section 4 repeats many definitions, actually at my request, so that for
>>> each term we have its definition. It acts as a glossary of terms.
>> I think section 4 is the right place for definitions, because that's
>> what that section is for. But section 2 is meant to give overviews,
>> right? Do we need that sort of formality in section 2? It just looks
>> complex and verbose, purely from a presentation point of view.
>>
>> HTH,
>>
>> -- Jun
>>> cheers
>>> Paul
>>>
>>>
>>> Jun Zhao wrote:
>>>> These comments are with respect to the DM working draft 4,
>>>> http://dvcs.w3.org/hg/prov/raw-file/default/model/working-copy/towards-wd4.html.
>>>> accessed on February 17, 2011.
>>>>
>>>> First of all, as this is my first time reading the DM working draft, with my
>>>> very fresh pair of eyes, I would like to say well done to the group.
>>>> There are a lot of very interesting ideas in the model document, clearly
>>>> reflecting a lot of deep thinking about the problem domain. And I like
>>>> very much the positioning of the DM as an interchange language. So well
>>>> done, guys!
>>>>
>>>> However, if the main goal of this new version of the working draft is to
>>>> simplify what we had, particularly to enable "an upgrade path, from
>>>> 'scruffy provenance' (term TBD), to 'precise provenance' (term TBD)", I
>>>> am not sure this goal was achieved!
>>>>
>>>> Here is what I think and why:
>>>>
>>>> 1. In the introduction section, there is no introduction of
>>>> 'scruffy provenance' (term TBD) or 'precise provenance' (term TBD). I
>>>> think this is a key point that should be brought to the front, and which
>>>> should be used to structure the rest of the document. And this is not
>>>> the case atm, IMO.
>>>>
>>>> 2. The Overview section: I am not sure I see much difference between
>>>> this section and the section giving definitions to the 'core'. I would
>>>> rather expect to see an overview of the model, for example, for the
>>>> scruffy and precise level, what terms and properties we have at each
>>>> level etc. I am sure Luc knows that the overview diagram needs an update,
>>>> and I couldn't read the figure properly even after printing the doc on a
>>>> high-resolution laser printer :)
>>>>
>>>> 3. I used the terminology of "terms" and "properties", but actually I
>>>> don't know what this data model is. What do we mean by "data model"? Is it a
>>>> conceptual model, logical model, entity relationship model, or something
>>>> else? It's not clearly stated and I am confused about what terminology I
>>>> should use when referring to the model :(
>>>>
>>>> 4. The Example section: Would it be a good idea to define an example up
>>>> in the front and use it throughout the whole document? I don't find a
>>>> description of an example in this section and I found it hard to
>>>> follow the 'examples' given in Section 3. And in the rest of the
>>>> document, examples from many different scenarios are used. I wonder
>>>> whether that prevents us from simplifying the reading of the spec.
>>>>
>>>> 5. Section 4, the PROV-DM Core: There is a lot of repetition with the
>>>> overview section. And I wonder what we mean by "core". The core almost
>>>> includes "all" the DM terms (apart from the few in section 5). My
>>>> understanding of "core" would be really the essential set of DM terms
>>>> that are must-haves to express the minimal provenance. IMO, the current
>>>> "core" is rather inclusive, and provides constructs that can be used to
>>>> support some rather complex provenance expressions.
>>>>
>>>> If we can agree on the notion of "scruffy" (minimal??) and "precise"
>>>> (extended??), maybe the core part can be used to correspond to the
>>>> "scruffy" part, and make it lighter, more succinct, and easier and
>>>> quicker to grasp and follow?
>>>>
>>>> 6. There are many cross-references that don't quite work in the current
>>>> working draft, like saying some terms are mentioned in the previous or
>>>> another section. I didn't include these problems here because I think
>>>> these were caused by the re-structuring. I could list them out once the
>>>> structure gets more stable.
>>>>
>>>> 7. There are also some technical points that I marked down in the
>>>> review, which I didn't raise here either, because I am 'new' to the
>>>> group and I don't want to re-open closed issues. What's the stage of the
>>>> technical part of DM? Are there still open technical discussions?
>>>>
>>>>
>>>> In my opinion, the document still needs some more work on the
>>>> structuring and organization front to make it simpler.
>>>>
>>>> I think we should make better use of the notion of "scruffy"
>>>> (minimal??) and "precise" (extended??), and use this to guide the
>>>> restructuring of the document.
>>>>
>>>> Thoughts?
>>>>
>>>> HTH,
>>>>
>>>> -- Jun
>>>>
>>>>


-- 
-----------  ~oo~  --------------
Paolo Missier - Paolo.Missier@newcastle.ac.uk, pmissier@acm.org
School of Computing Science, Newcastle University,  UK
http://www.cs.ncl.ac.uk/people/Paolo.Missier
Received on Thursday, 23 February 2012 15:44:32 UTC