Re: PROV-DM (DM4) - review up to section 4.2.3.3 from Graham Klyne on 2012-03-29 (public-prov-wg@w3.org from March 2012)

From: Graham Klyne <Graham.Klyne@zoo.ox.ac.uk>
Date: Thu, 29 Mar 2012 07:52:38 +0100
To: Luc Moreau <L.Moreau@ecs.soton.ac.uk>
CC: public-prov-wg@w3.org
Message-ID: <4F7406B6.60209@zoo.ox.ac.uk>
Yes indeed. I shall raise a new issue if I feel the need on my next cycle.
Please close this one.

#g
--

On 28/03/2012 16:31, Luc Moreau wrote:
> Hi Graham,
>
> Given this, I am closing ISSUE-274 (feedback on WD4).
> I assume you will create new issues, when you review the next version.
>
> Further responses interleaved.
>
> On 03/25/2012 10:06 AM, Graham Klyne wrote:
>> On 23/03/2012 13:09, Luc Moreau wrote:
>>> Hi Graham,
>>>
>>> Thanks for your feedback. We have incorporated some of your suggestions in the
>>> current editor's draft [1]
>>>
>>> Find below our response to your individual points.
>>>
>>> If you think that some of these points are going to be blockers for the release
>>> of WD5 or LC, it would
>>> be useful if you could raise them now, so that we can discuss them by email,
>>> and find a solution before you review again the document in 10 days time, or so.
>>>
>>> In particular, after careful consideration, Paolo and I think that:
>>> - Overview diagram should remain in section 2.5
>>
>> You offer no reasons to change my view. I'll see what I think on my next
>> review of the document. These are IMO document
>> quality/readability/approachability issues, not technical fundamentals, but
>> approachability of provenance is the issue that is supposed to have been
>> addressed by the reorganization.
>>
>> Let me try and explain my rationale for these suggestions:
>>
>> I approached this document with a mindset of a developer trying to understand
>> the provenance model. Ideally, I should be able to read the document once,
>> front-to-back, and know what I need to know. For this, it is really useful if
>> one of the first things I encounter is a high-level overview of what follows:
>> the diagram is a great way to do this (though the diagram itself could do with
>> some improvement). Without this high level overview, I have no conceptual
>> framework to relate the more detailed concepts that follow. Hence my suggestion
>> to include it at the start of section 2.
>>
>
> The starting points section is about explaining provenance concepts as well as
> types and relations of the data model.
>
> As soon as one starts with types and relations, there is some technicality
> involved (e.g. binary/n-ary relations, etc).
> In the definitions of concepts, we are staying away from this technicality.
> Hence, the order currently in the document.
>
>>> - Example of section 3 should remain there
>>
>> I find the example to be completely unhelpful, until I have a clearer view of
>> what it is meant to be an example *of*. It is demanding that I understand the
>> (relatively) complex scenario of the example when what I really want to
>> understand is the provenance model. It may serve a purpose for motivating
>> provenance, but it doesn't help me to understand the provenance model. In
>> practice, when reading the document, I looked at the early paragraphs and
>> skipped this section entirely. I think it breaks the flow between the
>> introductory material and the more detailed description of the DM.
>
> Hopefully the introduction to the example (which didn't exist when you read) helps.
>>
>> [Later] below, I make an alternative suggestion to put the example section
>> *before* the overview. Maybe also title it as a "motivating example".
>>
>>> - AlternateOf/SpecializationOf are part of prov-dm and should be presented in
>>> this document
>>
>> Again, no reason given to change my view - maybe there is good reason, but I
>> don't know what it is. And I note, per issue 29, it's still a challenge to
>> explain, which might be indicative. I think there's a danger that we've been
>> round this so much that the document/model is becoming too inward-looking as
>> opposed to considering the goals of its readers/users.
>>
>
> This is I hope a resolved issue now.
>>> - Notions of responsibility, agents and plan were debated at length in ISSUE-203
>>> which is now
>>> closed, and we are not proposing to reopen it, unless new evidence is offered.
>>
>> I'll accept this for now, pending review of a revised document. As I recall,
>> my comment was to do with lack of clarity of what is being described.
>>
>>
>>> [1] http://dvcs.w3.org/hg/prov/raw-file/default/model/prov-dm.html
>>>
>>> > Summary: I think the content is generally a big improvement, but there
>>> > are some possible further removals, and I think there remain a number
>>> > of document quality issues to be addressed before getting to last
>>> > call. Hopefully, these can be considered in DM5
>>> >
>>> > When the content stabilizes, I may offer some alternate drafting
>>> > suggestions, but I think it's in too much flux right now for that to
>>> > be worthwhile.
>>> >
>>> > ...
>>> >
>>> > Re: http://dvcs.w3.org/hg/prov/raw-file/f52c0bb53dd4/model/prov-dm.html
>>> > (Retrieved 2012-30-08)
>>> >
>>> > I'd wish to see all references to "things in the world" expunged: it's
>>> > an ugly expression that begs more questions than it answers, and IMO
>>> > runs the risk of confusing readers.
>>>
>>>
>>> OK, no longer talk about "thing in the world" but "thing".
>>
>> Thanks.
>>
>>> > Section 1 intro: rewording in 1st 3 paras.
>>> >
>>> > Suggest that the provenance notation be a part 1 appendix, not a
>>> > separate part/document. Drop references to ASN - it's *not* an
>>> > *abstract* syntax notation; indeed, I think that very expression is an
>>> > oxymoron.
>>>
>>> We now call it PROV-N.
>>
>> Ack.
>>
>>> Having gone through the process of writing productions fully, there
>>> are some grammatical syntactic details that have no place in the PROV-DM
>>> document.
>>> Also, PROV-N provides examples of instances to explain the grammar.
>>> This has no place in the PROV-DM document either.
>>>
>>> Furthermore, past experience has shown that readers confuse prov-dm and prov-n.
>>>
>>> So, the editor's recommendation is to keep the documents separate.
>>> >
>>> > Part 2 is *not* an upgrade path. Please don't say this. (It's a
>>> > refinement of use that allows provenance information from different
>>> > sources to be combined in meaningful ways.)
>>>
>>>
>>> Replaced 'upgrade path' by 'refinement'
>>
>> Thanks. (FWIW, I've started to think of it as a "strict interpretation", which
>> is a kind of refinement...)
>>
>>> > More text refinement in section 1.
>>> >
>>> >
>>> > Section 2.1
>>> >
>>> > Saying "Activity is anything ..." is confusing. It suggests a
>>> > continuant rather than an occurrent.
>>>
>>> Rephrased as follows:
>>>
>>> An activity is something that occurs and acts upon or with entities.
>>
>> Better.
>>
>>> > Sub-editing would improve this.
>>
>> Maybe...
>> "An activity occurs within some period of time and acts upon entities."
>> ?
>>
> An activity is something that occurs over a period of time and acts upon or with
> entities.
>
>>> >
>>> >
>>> > Section 2.2
>>> >
>>> > I think it would be clearer if generation and usage were introduced as
>>> > events associated with activities. (Discussion of them being
>>> > instantaneous can come in Part 2)
>>>
>>> It was agreed at F2F2 that we shouldn't introduce event in part 1.
>>> We followed this guidance. The term event is only defined in part 2.
>>
>> I have a vague recollection of this, and feeling uneasy at the time, but
>> unable to articulate why. It seems to me that an "event" (stripped of
>> subtleties) is a concept that is easy enough to grasp, and might make it
>> easier to describe the various types of events.
>>
>>> > Introducing generation as "completed production" reads really
>>> > strangely to me, and sounds as if it could be a produced artifact. I
>>> > think a form like "completion of production" is clearer. Similarly
>>> > for usage, something like "starting to consume".
>>> >
>>>
>>> Updated definitions as follows:
>>>
>>> Generation is the completion of production of a new entity by an activity.
>>>
>>> Usage is the beginning of consumption of a new entity by an activity.
>>>
>>>
>>> > Sub-editing would improve this.
>>> >
>>> >
>>> > Section 2.3:
>>> >
>>> > "AccountEntity" - why not just "Account". Also, I understood this was
>>> > to *be* a bundle, not a container for a bundle.
>>>
>>> To be addressed, once other editing work for WD5 is completed.
>>>
>>> The two notions (container vs bundle) are useful, for different purposes.
>>> To be investigated.
>>
>> At an implementation level it may be important to be clear about a distinction
>> between the contained and the container, but for a conceptual model I really
>> think we should try to focus on the contained ("bundle") and avoid talking about
>> containers - I think that adds confusion.
>>
>>> >
>>> > The example given has no clear relationship to the description. I
>>> > understood the key use-case here was to express provenance of
>>> > provenance, and that is why we have accounts. I think that should be
>>> > stated clearly; e.g.
>>>
>>> This is made clearer, following definition and in example.
>>>
>>> >
>>> > "An account is a bundle of provenance statements treated as an entity
>>> > which may itself have some associated provenance."
>>> >
>>>
>>> Subtle difference again: "... treated as an entity ..." vs " ... is an entity
>>> ..."
>>
>> I agree ...
>>
>>> We can definitely add "... which may itself have some associated provenance "
>>
>> I think that's the main point.
>>
>>> >
>>> > Agents. I think the notion of responsibility here is so loose as to
>>> > be of no practical value. When we say a text editor is "responsible
>>> > for" crashing a computer, that's a kind of anthropomorphism, not a
>>> > literal claim of responsibility. What we really mean is that the text
>>> > editor caused the crash. The notion of responsibility is generally
>>> > associated with duty, authority and/or accountability
>>> > (cf. http://oxforddictionaries.com/definition/responsibility?view=uk).
>>> > This is why persons and organizations are distinct from software
>>> > agents. I suggest that the text here should "stick to the knitting":
>>> > just state that these are commonly encountered kinds of agent, and
>>> > leave it at that.
>>>
>>>
>>> The example about software agent was simplified. Indeed no need to mention
>>> responsibility here.
>>> This is left to section 2.4.
>>
>> Thanks.
>>
>>> >
>>> > Section 2.4
>>> >
>>> > This continues the muddle about "responsibility", until the definition
>>> > of agent responsibility relation which seems about right to me (note
>>> > the phrase "accountable for" here).
>>> >
>>> > The use of responsibility in the description of association seems
>>> > completely wrong to me.
>>>
>>> What would you suggest?
>>
>> Focusing on the accountability aspect? I'll look again at your text in a
>> subsequent review
>>
>>> >
>>> > The discussion of activity association is surreal. A plan is defined
>>> > previously as an "Entity", but association relates an *agent* to an
>>> > activity.
>>>
>>> It's a ternary relation.
>>> This was discussed at length in ISSUE-203, which is now closed.
>>>
>>> I am not proposing to reopen it, unless new information is brought forward.
>>
>> (See comments at head - maybe the actual intent isn't coming through.)
>>
>>> >
>>> > I think this section needs re-drafting.
>>> >
>>> >
>>> > Section 2.5
>>> >
>>> > I think the intent and content of the diagram is generally good, but
>>> > that its visual presentation could usefully be improved. I think it
>>> > should appear as part of the introduction to section 2, not at the
>>> > end.
>>> >
>>>
>>> We are now generating a PNG, so hopefully it's better.
>>>
>>> After careful consideration, we felt it was better to leave it in section 2.5,
>>> in part,
>>> because we need to map the concepts (expressed in natural language) to prov-dm
>>> types/relations.
>>
>> I don't see how the diagram-at-end aids this. See comments at top.
>>
>>> > Generally in section 2, I think the examples are mostly well-chosen,
>>> > but their presentation breaks up the flow of the overview; I would
>>> > prefer that the examples were more succinct, maybe fewer, and
>>> > introduced inline in the descriptive overview text. Ideally the whole
>>> > overview would fit on just one or two pages (i.e. about half its
>>> > current length on a printed page). The key purpose here, IMO, is to
>>> > give a quick overview of how the various concepts are used together.
>>> >
>>> >
>>>
>>> Usual trade-off. Now that concepts seem clearer, then we don't need examples.
>>>
>>> I think that examples are clearly delimited and can be skipped if the reader
>>> wants.
>>
>> Maybe it's OK. But I don't think the "reader can skip" argument really works
>> when the quantity of material to be skipped is as much as the core material.
>> As you say, it's a trade-off; in an introductory/overview section, I'd wish
>> the trade-off to be more in favour of concision. IMO, a function of an
>> overview is to be easily scan-able, so physical proximity of concepts is a
>> real virtue.
>>
>> Also, in this case, I think the well-chosen and brief examples are actually a
>> useful part of the overview, and as such can be incorporated into the text
>> rather than set apart, making the whole more compact.
>>
>>> > Section 3:
>>> >
>>> > I don't find this example at all helpful. It requires too much effort
>>> > to understand, and I find the process view vs author view is
>>> > confusing. What is this section actually trying to tell the reader?
>>> > I can't tell.
>>>
>>> Publishing of documents and their provenance on the Web.
>>> It seems that it is a primary use case for this specification.
>>
>> I don't dispute that it describes a primary use case. I just don't find it
>> helpful for understanding the model.
>>
>>> >
>>> > I think a comprehensive example like this would be better sited as an
>>> > appendix, rather than an interruption to the main flow of the
>>> > document.
>>>
>>> We received positive feedback about the example, and in particular that
>>> it deals with attribution of provenance.
>>
>> That's a compelling argument. Another possibility might be to put it *before*
>> the overview, so that the overview and more detailed description are not
>> separated? I still don't understand what is being addressed by the process
>> view and author view.
>>
>>> >
>>> >
>>> > Section 4.1:
>>> >
>>> > I find the sub-heading "Element" is confusing/unhelpful.
>>> >
>>>
>>> Gone with the new component structure.
>>>
>>>
>>> >
>>> > Section 4.1.1 - verbatim repetition of text defining "Entity" already
>>> > present in section - this is unhelpful.
>>>
>>> Section 4 contains the systematic presentation of all types and relations.
>>> Given that many had not been (and should not be) introduced in the
>>> "starting point section", it is better to have *all* terms defined in section 4.
>>>
>>>
>>> >
>>> > The description of the provenance notation expressions should use the
>>> > same terms as are used in the template presented; i.e. *not* "[
>>> > attr=val1, ... ]" and "attributes".
>>> >
>>>
>>> The template shows instances of arguments, whereas the descriptions
>>> provide names for attributes.
>>
>> That is not clear. And even now I know this pattern exists, I still find it
>> awkward to use when trying to construct examples based on the provided text.
>> The main problem I have is the use of different names, so the example I picked
>> may not have been the best.
>>
>
> Check the latest version to see if this is addressed.
>>>
>>> > Don't need to say anything about disjointness of entities and
>>> > activities in Part 1.
>>> >
>>>
>>> This seems in conflict with the next comment. Or is it just about the
>>> English (avoiding disjoint term)?
>>
>> Yes, it's mainly about the language.
>>
>
> Dropped the sentence in the entity section.
>>> >
>>> > Section 4.1.2
>>> >
>>> > Similar comments to section 4.1.1
>>> >
>>> > (But I think the simple statement "An activity is not an entity ..."
>>> > is good.)
>>> >
>>> >
>>> > Section 4.1.3
>>> >
>>> > Similar comments to section 4.1.1
>>> >
>>> > Don't need to say why sub-categories of agent are introduced.
>>>
>>> why not? In particular, this was introduced in response to feedback
>>> from the working group.
>>
>> My point was not against introducing the sub-categories, but that the rationale
>> did not need to be explained here (as I found it cluttered the relevant text)
>>
> Text has been further trimmed.
>>> >
>>> > I would probably avoid making the mutual exclusivity claim (legally,
>>> > it may be or become a debatable point).
>>> >
>>>
>>> OK
>>>
>>> >
>>> > Section 4.1.4
>>> >
>>> > I don't see that notes are an essential part of the provenance
>>> > structure. I'd prefer to drop them, as I don't see them adding any
>>> > expressive capability.
>>>
>>> This is ISSUE-260, potentially related to account. We will tackle
>>> this once we have some bandwidth.
>>>
>>> To me, it's crucial to be able to annotate provenance, and to do so in
>>> an inter-operable way, whatever the serialization.
>>>
>>> The question is whether the mechanism presented here is the right
>>> one, or, as Tim suggests, Accounts take care of that.
>>
>> Let's see how this falls out. I was questioning the need for interoperability
>> and distinguished status within the core DM of a feature that has no
>> associated semantics. We already have attributes for interoperability of
>> additional information - aren't they enough?
>>
>
> Entity attributes and note attributes are being expressed at different times by
> different asserters.
> e.g. You generate some provenance for a document (and express attributes).
> I visualize it and my visualization tools adds other attributes.
>
>>> >
>>> > Section 4.2
>>> >
>>> > The table of different relation domain and range combinations is fair
>>> > enough, but I'm not convinced the additional level of document
>>> > structure reflecting this is useful.
>>>
>>> Table was kept as a form of index.
>>> Structure changed to components.
>>>
>>> >
>>> > Ideally, I think the relations would all appear at the same document
>>> > level as the concepts, so they have a similar "visual signature" when
>>> > scanning the document.
>>>
>>> All done.
>>>
>>> >
>>> > Most or all subsections have repetition of text from section 2 similar
>>> > to that noted for section 4.1.1
>>>
>>> Some are repeat, some are new, as indicated above.
>>>
>>> >
>>> > Also, most sections seem to suffer from a similar mismatch between the
>>> > provenance notation template given and the accompanying description of
>>> > the constituent elements.
>>>
>>> The template shows instances of arguments, whereas the descriptions
>>> provide names for attributes.
>>>
>>> >
>>> > I think generation and usage should be described as events (not
>>> > necessarily to introduce a formal notion of events, just make it
>>> > clear that they are events corresponding to some change in the
>>> > relationship between an entity and an activity)
>>> >
>>>
>>> See comment above.
>>>
>>> >
>>> > Section 4.2.2.1
>>> >
>>> > "Responsibility" again.
>>> >
>>> > There are two things going on here that I feel are very muddled:
>>> >
>>> > (a) this rather odd notion of responsibility, and
>>> >
>>> > (b) associating a plan with an activity.
>>> >
>>> > At the very least, I think these aspects should be separated, not just
>>> > lumped into a single overloaded element.
>>>
>>> This was discussed at length in ISSUE-203, which is now closed. See above.
>>>
>>> >
>>> > I'm not sure why some expression components are explicit and possibly
>>> > optional parameters, while others are attributes. What's the
>>> > intended difference here?
>>>
>>> For rationale see:
>>>
>>> http://dvcs.w3.org/hg/prov/raw-file/default/model/prov-n.html#positional-vs-named-attributes
>>>
>>
>> Ah, OK. I think this argues for annotations as attributes.
>>
>> From this reader's perspective, it still seems arbitrary - I'm not sure if
>> anything can be done about that.
>>
>>
>>> > Section 4.2.3.1
>>> >
>>> > Responsibility again. In this case, I think there may be some
>>> > justification for talking about responsibility, but earlier treatment
>>> > of this idea makes it hard for me to know what is really being
>>> > expressed. I think it is the notion that some actions of one agent
>>> > are authorized or controlled by another agent in the context of a
>>> > given activity, hence any accountability for the outcome may propagate
>>> > back to the controlling or authorizing agent. But that's not entirely
>>> > clear to me from the text.
>>> >
>>> > Also, I can't tell if the structures here would accommodate different
>>> > agents having different responsibilities. E.g. a manager authorizes
>>> > an engineer to purchase a component, but is then instructed by the
>>> > engineer in its deployment/installation... when the component fails
>>> > to achieve some required outcome, who is accountable? The manager for
>>> > not authorizing enough funds, or the engineer for not properly
>>> > explaining how to use the component?
>>> >
>>> >
>>>
>>> PROV-DM allows you to express the relations.
>>> If I understood correctly, we have:
>>>
>>> wasGeneratedBy(component,purchase)
>>> actedOnBehalfOf(engineer,manager,purchase, [role="line management"])
>>> actedOnBehalfOf(manager,engineer,deployment, [role="technical guidance"])
>>>
>>> PROV-DM does not say how to reason about responsibility.
>>> What is the answer to your question?
>>
>> I think the notion of roles does it. I guess I missed that on reading. I don't
>> know the answer to my question - was just trying to exemplify that
>> responsibility is not such a simple thing :)
>
> It is not, and we should not be overzealous on this front. Again, some further
> text was trimmed, to that end.
>>
>>> This said, did you mean
>>> actedOnBehalfOf(manager,engineer,deployment, [role="technical guidance"])
>>> or did you mean:
>>> wasInformedBy(manager,engineer)
>>
>> Your first interpretation is closer to what I was trying to uncover.
>>
>>> > Section 4.2.3.2
>>> >
>>> > Skipped - I understand this is due to be replaced. (Despite my
>>> > reservations expressed elsewhere, the replacement looks like a
>>> > significant improvement.)
>>> >
>>> >
>>> > Section 4.2.3.3
>>> >
>>> > Do we still need Alternate and Specialization in the provenance
>>> > notation?
>>>
>>>
>>> Do you mean in PROV-DM?
>>>
>>> Yes, I think these are relations of the data model. They need
>>> to be introduced in this document.
>>
>> See above - I don't understand what purpose these are intended to serve.
>>
>> #g
>> --
>>
>>
>>
> Luc
>
>
>>
>>
>
Received on Thursday, 29 March 2012 13:30:58 UTC
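
Illustrative note on the manager/engineer example discussed under
Section 4.2.3.1: the two actedOnBehalfOf statements quoted in the thread can
be treated as plain data, which may make the role-based reading easier to
follow. The following is only a minimal sketch in Python. The class name
Delegation, its field names (delegate, responsible, activity, role) and the
helper functions are hypothetical labels chosen for this illustration; they
are not taken from PROV-DM or PROV-N. Only the argument order and the role
values come from the statements as written in the thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Delegation:
    """One actedOnBehalfOf statement, with hypothetical field names."""
    delegate: str      # first argument in the quoted statement
    responsible: str   # second argument in the quoted statement
    activity: str      # third argument in the quoted statement
    role: str          # value of the role attribute


# The two delegation statements from the thread, transcribed as data.
statements = [
    Delegation("engineer", "manager", "purchase", "line management"),
    Delegation("manager", "engineer", "deployment", "technical guidance"),
]


def responsible_for(agent, activity, delegations):
    """List the (responsible agent, role) pairs asserted for an agent in an activity."""
    return [
        (d.responsible, d.role)
        for d in delegations
        if d.delegate == agent and d.activity == activity
    ]


def to_statement(d):
    """Render a Delegation back into the textual form used in the thread."""
    return (
        f"actedOnBehalfOf({d.delegate},{d.responsible},{d.activity}, "
        f'[role="{d.role}"])'
    )


if __name__ == "__main__":
    for d in statements:
        print(to_statement(d))
    print(responsible_for("engineer", "purchase", statements))
```

The lookup only returns what was asserted. It does not decide who is
accountable when the component fails, which is consistent with the remark
in the thread that PROV-DM does not say how to reason about responsibility.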