Re: PROV-DM (DM4) - review up to section 4.2.3.3

Hi Graham,

Given this, I am closing ISSUE-274 (feedback on WD4).
I assume you will create new issues when you review the next version.

Further responses interleaved.

On 03/25/2012 10:06 AM, Graham Klyne wrote:
> On 23/03/2012 13:09, Luc Moreau wrote:
>> Hi Graham,
>>
>> Thanks for your feedback. We have incorporated some of your 
>> suggestions in the
>> current editor's draft [1]
>>
>> Find below our response to your individual points.
>>
>> If you think that some of these points are going to be blockers for 
>> the release
>> of WD5 or LC, it would
>> be useful if you could raise them now, so that we can discuss them by 
>> email,
>> and find a solution before you review again the document in 10 days 
>> time, or so.
>>
>> In particular, after careful consideration, Paolo and I think that:
>> - Overview diagram should remain in section 2.5
>
> You offer no reasons to change my view.  I'll see what I think on my 
> next review of the document.  These are IMO document 
> quality/readability/approachability issues, not technical 
> fundamentals, but approachability of provenance is the issue that is 
> supposed to have been addressed by the reorganization.
>
> Let me try and explain my rationale for these suggestions:
>
> I approached this document with a mindset of a developer trying to 
> understand the provenance model.  Ideally, I should be able to read 
> the document once, front-to-back, and know what I need to know.  For 
> this, it is really useful if one of the first things I encounter is a 
> high-level overview of what follows: the diagram is a great way to do 
> this (though the diagram itself could do with some improvement).  
> Without this high level overview, I have no conceptual framework to 
> relate the more detailed concepts that follow.  Hence my suggestion to 
> include it at the start of section 2.
>

The starting points section is about explaining provenance concepts as 
well as types and relations of the data model.

As soon as one starts with types and relations, there is some 
technicality involved (e.g. binary/n-ary relations, etc).
In the definitions of concepts, we are staying away from this 
technicality. Hence, the order currently in the document.

>> - Example of section 3 should remain there
>
> I find the example to be completely unhelpful, until I have a clearer 
> view of what it is meant to be an example *of*.  It is demanding that 
> I understand the (relatively) complex scenario of the example when 
> what I really want to understand is the provenance model.  It may 
> serve a purpose for motivating provenance, but it doesn't help me to 
> understand the provenance model.  In practice, when reading the 
> document, I looked at the early paragraphs and skipped this section 
> entirely.  I think it breaks the flow between the introductory 
> material and the more detailed description of the DM.

Hopefully the introduction to the example (which didn't exist when you 
read it) helps.
>
> [Later] below, I make an alternative suggestion to put the example 
> section *before* the overview.  Maybe also title it as a "motivating 
> example".
>
>> - AlternateOf/SpecializationOf are part of prov-dm and should be 
>> presented in
>> this document
>
> Again, no reason given to change my view - maybe there is good reason, 
> but I don't know what it is.  And I note, per issue 29, it's still a 
> challenge to explain, which might be indicative.   I think there's a 
> danger that we've been round this so much that the document/model is 
> becoming too inward-looking as opposed to considering the goals of its 
> readers/users.
>

This is, I hope, a resolved issue now.
>> - Notions of responsibility, agents and plan were debated at length 
>> in ISSUE-203
>> which is now
>> closed, and we are not proposing to reopen it, unless new evidence is 
>> offered.
>
> I'll accept this for now, pending review of a revised document.  As I 
> recall, my comment was to do with lack of clarity of what is being 
> described.
>
>
>> [1] http://dvcs.w3.org/hg/prov/raw-file/default/model/prov-dm.html
>>
>> > Summary: I think the content is generally a big improvement, but there
>> > are some possible further removals, and I think there remain a number
>> > of document quality issues to be addressed before getting to last
>> > call. Hopefully, these can be considered in DM5
>> >
>> > When the content stabilizes, I may offer some alternate drafting
>> > suggestions, but I think it's in too much flux right now for that to
>> > be worthwhile.
>> >
>> > ...
>> >
>> > Re: 
>> http://dvcs.w3.org/hg/prov/raw-file/f52c0bb53dd4/model/prov-dm.html
>> > (Retrieved 2012-30-08)
>> >
>> > I'd wish to see all references to "things in the world" expunged: it's
>> > an ugly expression that begs more questions than it answers, and IMO
>> > runs the risk of confusing readers.
>>
>>
>> OK, we no longer talk about "thing in the world" but "thing".
>
> Thanks.
>
>> > Section 1 intro: rewording in 1st 3 paras.
>> >
>> > Suggest that the provenance notation be a part 1 appendix, not a
>> > separate part/document. Drop references to ASN - it's *not* an
>> > *abstract* syntax notation; indeed, I think that very expression is an
>> > oxymoron.
>>
>> We now call it PROV-N.
>
> Ack.
>
>> Having gone through the process of writing productions fully, there
>> are some grammatical syntactic details that have no place in the 
>> PROV-DM document.
>> Also, PROV-N provides examples of instances to explain the grammar.
>> This has no place in the PROV-DM document either.
>>
>> Furthermore, past experience has shown that readers confuse prov-dm 
>> and prov-n.
>>
>> So, the editor's recommendation is to keep the documents separate.
>> >
>> > Part 2 is *not* an upgrade path. Please don't say this. (It's a
>> > refinement of use that allows provenance information from different
>> > sources to be combined in meaningful ways.)
>>
>>
>> Replaced 'upgrade path' by 'refinement'
>
> Thanks.  (FWIW, I've started to think of it as a "strict 
> interpretation", which is a kind of refinement...)
>
>> > More text refinement in section 1.
>> >
>> >
>> > Section 2.1
>> >
>> > Saying "Activity is anything ..." is confusing. It suggests a
>> > continuant rather than an occurrent.
>>
>> Rephrased as follows:
>>
>> An activity is something that occurs and acts upon or with entities.
>
> Better.
>
>> > Sub-editing would improve this.
>
> Maybe...
> "An activity occurs within some period of time and acts upon entities."
> ?
>
An activity is something that occurs over a period of time and acts upon 
or with entities.

>> >
>> >
>> > Section 2.2
>> >
>> > I think it would be clearer if generation and usage were introduced as
>> > events associated with activities. (Discussion of them being
>> > instantaneous can come in Part 2)
>>
>> It was agreed at F2F2 that we shouldn't introduce event in part 1.
>> We followed this guidance. The term event is only defined in part 2.
>
> I have a vague recollection of this, and feeling uneasy at the time, 
> but unable to articulate why.  It seems to me that an "event" 
> (stripped of subtleties) is a concept that is easy enough to grasp, 
> and might make it easier to describe the various types of events.
>
>> > Introducing generation as "completed production" reads really
>> > strangely to me, and sounds as if it could be a produced artifact. I
>> > think a form like "completion of production" is clearer. Similarly
>> > for usage, something like "starting to consume".
>> >
>>
>> Updated definitions as follows:
>>
>> Generation is the completion of production of a new entity by an 
>> activity.
>>
>> Usage is the beginning of consumption of a new entity by an activity.
>>
>>
>> > Sub-editing would improve this.
>> >
>> >
>> > Section 2.3:
>> >
>> > "AccountEntity" - why not just "Account". Also, I understood this was
>> > to *be* a bundle, not a container for a bundle.
>>
>> To be addressed, once other editing work for WD5 is completed.
>>
>> The two notions (container vs bundle) are useful, for different 
>> purposes.
>> To be investigated.
>
> At an implementation level it may be important to be clear about a 
> distinction between the contained and the container, but for a 
> conceptual model I really think we should try to focus on the 
> contained ("bundle") and avoid talking about containers - I think that 
> adds confusion.
>
>> >
>> > The example given has no clear relationship to the description. I
>> > understood the key use-case here was to express provenance of
>> > provenance, and that is why we have accounts. I think that should be
>> > stated clearly; e.g.
>>
>> This is made clearer, following definition and in example.
>>
>> >
>> > "An account is a bundle of provenance statements treated as an entity
>> > which may itself have some associated provenance."
>> >
>>
>> Subtle difference again: "... treated as an entity ..." vs " ... is 
>> an entity ..."
>
> I agree ...
>
>> We can definitely add "... which may itself have some associated 
>> provenance "
>
> I think that's the main point.
>
>> >
>> > Agents. I think the notion of responsibility here is so loose as to
>> > be of no practical value. When we say a text editor is "responsible
>> > for" crashing a computer, that's a kind of anthropomorphism, not a
>> > literal claim of responsibility. What we really mean is that the text
>> > editor caused the crash. The notion of responsibility is generally
>> > associated with duty, authority and/or accountability
>> > (cf. http://oxforddictionaries.com/definition/responsibility?view=uk).
>> > This is why persons and organizations are distinct from software
>> > agents. I suggest that the text here should "stick to the knitting":
>> > just state that these are commonly encountered kinds of agent, and
>> > leave it at that.
>>
>>
>> The example about software agent was simplified. Indeed no need to 
>> mention
>> responsibility here.
>> This is left to section 2.4.
>
> Thanks.
>
>> >
>> > Section 2.4
>> >
>> > This continues the muddle about "responsibility", until the definition
>> > of agent responsibility relation which seems about right to me (note
>> > the phrase "accountable for" here).
>> >
>> > The use of responsibility in the description of association seems
>> > completely wrong to me.
>>
>> What would you suggest?
>
> Focusing on the accountability aspect?  I'll look again at your text 
> in a subsequent review
>
>> >
>> > The discussion of activity association is surreal. A plan is defined
>> > previously as an "Entity", but association relates an *agent* to an
>> > activity.
>>
>> It's a ternary relation.
>> This was discussed at length in ISSUE-203, which is now closed.
>>
>> I am not proposing to reopen it, unless new information is brought 
>> forward.
>
> (See comments at head - maybe the actual intent isn't coming through.)
>
>> >
>> > I think this section needs re-drafting.
>> >
>> >
>> > Section 2.5
>> >
>> > I think the intent and content of the diagram is generally good, but
>> > that its visual presentation could usefully be improved. I think it
>> > should appear as part of the introduction to section 2, not at the
>> > end.
>> >
>>
>> We are now generating a PNG, so hopefully it's better.
>>
>> After careful consideration, we felt it was better to leave it in 
>> section 2.5,
>> in part,
>> because we need to map the concepts (expressed in natural language) 
>> to prov-dm
>> types/relations.
>
> I don't see how the diagram-at-end aids this.  See comments at top.
>
>> > Generally in section 2, I think the examples are mostly well-chosen,
>> > but their presentation breaks up the flow of the overview; I would
>> > prefer that the examples were more succinct, maybe fewer, and
>> > introduced inline in the descriptive overview text. Ideally the whole
>> > overview would fit on just one or two pages (i.e. about half its
>> > current length on a printed page). The key purpose here, IMO, is to
>> > give a quick overview of how the various concepts are used together.
>> >
>> >
>>
>> Usual trade-off. Now that concepts seem clearer, then we don't need 
>> examples.
>>
>> I think that examples are clearly delimited and can be skipped if the 
>> reader wants.
>
> Maybe it's OK.  But I don't think the "reader can skip" argument 
> really works when the quantity of material to be skipped is as much as 
> the core material.  As you say, it's a trade-off;  in an 
> introductory/overview section, I'd wish the trade-off to be more in 
> favour of concision.  IMO, a function of an overview is to be easily 
> scan-able, so physical proximity of concepts is a real virtue.
>
> Also, in this case, I think the well-chosen and brief examples are 
> actually a useful part of the overview, and as such can be 
> incorporated into the text rather than set apart, making the whole 
> more compact.
>
>> > Section 3:
>> >
>> > I don't find this example at all helpful. It requires too much effort
>> > to understand, and I find the process view vs author view is
>> > confusing. What is this section actually trying to tell the reader?
>> > I can't tell.
>>
>> Publishing of documents and their provenance on the Web.
>> It seems that it is a primary use case for this specification.
>
> I don't dispute that it describes a primary use case.  I just don't 
> find it helpful for understanding the model.
>
>> >
>> > I think a comprehensive example like this would be better sited as an
>> > appendix, rather than an interruption to the main flow of the
>> > document.
>>
>> We received positive feedback about the example, and in particular that
>> it deals with attribution of provenance.
>
> That's a compelling argument.  Another possibility might be to put it 
> *before* the overview, so that the overview and more detailed 
> description are not separated?  I still don't understand what is being 
> addressed by the process view and author view.
>
>> >
>> >
>> > Section 4.1:
>> >
>> > I find the sub-heading "Element" is confusing/unhelpful.
>> >
>>
>> Gone with the new component structure.
>>
>>
>> >
>> > Section 4.1.1 - verbatim repetition of text defining "Entity" already
>> > present in section - this is unhelpful.
>>
>> Section 4 contains the systematic presentation of all types and 
>> relations.
>> Given that many had not been (and should not be) introduced in the
>> "starting point section", it is better to have *all* terms defined in 
>> section 4.
>>
>>
>> >
>> > The description of the provenance notation expressions should use the
>> > same terms as are used in the template presented; i.e. *not* "[
>> > attr=val1, ... ]" and "attributes".
>> >
>>
>> The template shows instances of arguments, whereas the descriptions
>> provide names for attributes.
>
> That is not clear.  And even now I know this pattern exists, I still 
> find it awkward to use when trying to construct examples based on the 
> provided text. The main problem I have is the use of different names, 
> so the example I picked may not have been the best.
>

Check the latest version to see if this is addressed.
>>
>> > Don't need to say anything about disjointness of entities and
>> > activities in Part 1.
>> >
>>
>> This seems in conflict with the next comment. Or is it just about the
>> English (avoiding disjoint term)?
>
> Yes, it's mainly about the language.
>

Dropped the sentence in the entity section.
>> >
>> > Section 4.1.2
>> >
>> > Similar comments to section 4.1.1
>> >
>> > (But I think the simple statement "An activity is not an entity ..."
>> > is good.)
>> >
>> >
>> > Section 4.1.3
>> >
>> > Similar comments to section 4.1.1
>> >
>> > Don't need to say why sub-categories of agent are introduced.
>>
>> why not? In particular, this was introduced in response to feedback
>> from the working group.
>
> My point was not against introducing the sub-categories, but that the 
> rationale did not need to be explained here (as I found it cluttered 
> the relevant text)
>
Text has been further trimmed.
>> >
>> > I would probably avoid making the mutual exclusivity claim (legally,
>> > it may be or become a debatable point).
>> >
>>
>> OK
>>
>> >
>> > Section 4.1.4
>> >
>> > I don't see that notes are an essential part of the provenance
>> > structure. I'd prefer to drop them, as I don't see them adding any
>> > expressive capability.
>>
>> This is ISSUE-260, potentially related to account. We will tackle
>> this once we have some bandwidth.
>>
>> To me, it's crucial to be able to annotate provenance, and to do so in
>> an inter-operable way, whatever the serialization.
>>
>> The question is whether the mechanism presented here is the right
>> one, or, as Tim suggests, Accounts take care of that.
>
> Let's see how this falls out.  I was questioning the need for 
> interoperability and distinguished status within the core DM of a 
> feature that has no associated semantics.  We already have attributes 
> for interoperability of additional information - aren't they enough?
>

Entity attributes and note attributes are being expressed at different 
times by different asserters.
E.g., you generate some provenance for a document (and express attributes).
        I visualize it and my visualization tool adds other attributes.
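
To illustrate the distinction, here is a minimal Python sketch. It is not taken from PROV-DM or from any PROV library: the class names (Entity, Note, Annotation), the asserter field, and the ex: identifiers are all hypothetical. It only shows attributes recorded when the entity is described being kept apart from attributes attached later by someone else.

```python
from dataclasses import dataclass, field

# Hypothetical structures for illustration only; not a PROV-DM API.

@dataclass
class Entity:
    identifier: str
    attributes: dict = field(default_factory=dict)  # set by the original asserter

@dataclass
class Note:
    identifier: str
    asserter: str                                   # who added the note (assumed field)
    attributes: dict = field(default_factory=dict)

@dataclass
class Annotation:
    target: str  # identifier of the annotated thing
    note: str    # identifier of the note

# You generate provenance for a document and express its attributes.
doc = Entity("ex:doc", {"ex:title": "PROV-DM review"})

# Later, my visualization tool attaches its own attributes via a note.
layout = Note("ex:n1", asserter="ex:vis-tool", attributes={"ex:color": "blue"})
link = Annotation(target=doc.identifier, note=layout.identifier)

def attributes_from(asserter, notes, annotations, target):
    """Collect note attributes that a given asserter attached to a target."""
    by_id = {n.identifier: n for n in notes}
    merged = {}
    for a in annotations:
        n = by_id.get(a.note)
        if a.target == target and n is not None and n.asserter == asserter:
            merged.update(n.attributes)
    return merged
```

In this sketch the entity's own attributes are never modified by the second asserter, which is the property the note mechanism is meant to preserve.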

>> >
>> > Section 4.2
>> >
>> > The table of different relation domain and range combinations is fair
>> > enough, but I'm not convinced the additional level of document
>> > structure reflecting this is useful.
>>
>> Table was kept as a form of index.
>> Structure changed to components.
>>
>> >
>> > Ideally, I think the relations would all appear at the same document
>> > level as the concepts, so they have a similar "visual signature" when
>> > scanning the document.
>>
>> All done.
>>
>> >
>> > Most or all subsections have repetition of text from section 2 similar
>> > to that noted for section 4.1.1
>>
>> Some are repeat, some are new, as indicated above.
>>
>> >
>> > Also, most sections seem to suffer from a similar mismatch between the
>> > provenance notation template given and the accompanying description of
>> > the constituent elements.
>>
>> The template shows instances of arguments, whereas the descriptions
>> provide names for attributes.
>>
>> >
>> > I think generation and usage should be described as events (not
>> > necessarily to introduce a formal notion of events, just make it
>> > clear that they are events corresponding to some change in the
>> > relationship between an entity and an activity)
>> >
>>
>> See comment above.
>>
>> >
>> > Section 4.2.2.1
>> >
>> > "Responsibility" again.
>> >
>> > There are two things going on here that I feel are very muddled:
>> >
>> > (a) this rather odd notion of responsibility, and
>> >
>> > (b) associating a plan with an activity.
>> >
>> > At the very least, I think these aspects should be separated, not just
>> > lumped into an single overloaded element.
>>
>> This was discussed at length in ISSUE-203, which is now closed. See 
>> above.
>>
>> >
>> > I'm not sure why some expression components are explicit and possibly
>> > optional parameters, while others are attributes. What's the
>> > intended difference here?
>>
>> For rationale see:
>>
>> http://dvcs.w3.org/hg/prov/raw-file/default/model/prov-n.html#positional-vs-named-attributes 
>>
>
> Ah, OK.  I think this argues for annotations as attributes.
>
> From this reader's perspective, it still seems arbitrary - I'm not 
> sure if anything can be done about that.
>
>
>> > Section 4.2.3.1
>> >
>> > Responsibility again. In this case, I think there may be some
>> > justification for talking about responsibility, but earlier treatment
>> > of this idea makes it hard for me to know what is really being
>> > expressed. I think it is the notion that some actions of one agent
>> > are authorized or controlled by another agent in the context of a
>> > given activity, hence any accountability for the outcome may propagate
>> > back to the controlling or authorizing agent. But that's not entirely
>> > clear to me from the text.
>> >
>> > Also, I can't tell if the structures here would accommodate different
>> > agents having different responsibilities. E.g. a manager authorizes
>> > an engineer to purchase a component, but is then instructed by the
>> > engineer in its deployment/installation... when the component fails
>> > to achieve some required outcome, who is accountable? The manager for
>> > not authorizing enough funds, or the engineer for not properly
>> > explaining how to use the component?
>> >
>> >
>>
>> PROV-DM allows you to express the relations.
>> If I understood correctly, we have:
>>
>> wasGeneratedBy(component,purchase)
>> actedOnBehalfOf(engineer,manager,purchase, [role="line management"])
>> actedOnBehalfOf(manager,engineer,deployment, [role="technical 
>> guidance"])
>>
>> PROV-DM does not say how to reason about responsibility.
>> What is the answer to your question?
>
> I think the notion of roles does it.  I guess I missed that on 
> reading.  I don't know the answer to my question - was just trying to 
> exemplify that responsibility is not such a simple thing :)

It is not, and we should not be overzealous on this front. Again, some 
further text was trimmed, to that end.
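
For what it is worth, the statements from the manager/engineer example can be written down as plain data. The following Python sketch is only an illustration: the tuple field names (subordinate, responsible) and the helper acted_for are assumptions made here, not PROV-DM terminology. The code records who acted on behalf of whom in which activity, and with what role; it makes no attempt to decide who is accountable.

```python
from typing import List, NamedTuple, Optional, Tuple

class ActedOnBehalfOf(NamedTuple):
    """One actedOnBehalfOf statement; field names are hypothetical."""
    subordinate: str
    responsible: str
    activity: str
    role: Optional[str] = None

# The two statements from the example, in the argument order used there.
statements = [
    ActedOnBehalfOf("engineer", "manager", "purchase", "line management"),
    ActedOnBehalfOf("manager", "engineer", "deployment", "technical guidance"),
]

def acted_for(agent: str, activity: str,
              stmts: List[ActedOnBehalfOf]) -> List[Tuple[str, Optional[str]]]:
    """Return (responsible agent, role) pairs for an agent within one activity."""
    return [(s.responsible, s.role)
            for s in stmts
            if s.subordinate == agent and s.activity == activity]

# Who was the engineer acting for during the purchase?
purchase_chain = acted_for("engineer", "purchase", statements)
# And the manager during the deployment?
deployment_chain = acted_for("manager", "deployment", statements)
```

Because the relation is scoped by activity, the direction between the two agents can differ from one activity to the next, and the role attribute is what distinguishes the kind of responsibility involved.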
>
>> This said, did you mean
>> actedOnBehalfOf(manager,engineer,deployment, [role="technical 
>> guidance"])
>> or did you mean:
>> wasInformedBy(manager,engineer)
>
> Your first interpretation is closer to what I was trying to uncover.
>
>> > Section 4.2.3.2
>> >
>> > Skipped - I understand this is due to be replaced. (Despite my
>> > reservations expressed elsewhere, the replacement looks like a
>> > significant improvement.)
>> >
>> >
>> > Section 4.2.3.3
>> >
>> > Do we still need Alternate and Specialization in the provenance
>> > notation?
>>
>>
>> Do you mean in PROV-DM?
>>
>> Yes, I think these are relations of the data model. They need
>> to be introduced in this document.
>
> See above - I don't understand what purpose these are intended to serve.
>
> #g
> -- 
>
>
>
Luc


>
>

-- 
Professor Luc Moreau
Electronics and Computer Science   tel:   +44 23 8059 4487
University of Southampton          fax:   +44 23 8059 2865
Southampton SO17 1BJ               email: l.moreau@ecs.soton.ac.uk
United Kingdom                     http://www.ecs.soton.ac.uk/~lavm

Received on Wednesday, 28 March 2012 15:31:48 UTC