PROV-DM (DM4) - review up to section

Date: Thu, 08 Mar 2012 15:49:46 +0000
Summary:  I think the content is generally a big improvement, but there are some 
possible further removals, and I think there remain a number of document quality 
issues to be addressed before getting to last call.  Hopefully, these can be 
considered in DM5.

When the content stabilizes, I may offer some alternate drafting suggestions, 
but I think it's in too much flux right now for that to be worthwhile.


Re: http://dvcs.w3.org/hg/prov/raw-file/f52c0bb53dd4/model/prov-dm.html
(Retrieved 2012-03-08)

I'd wish to see all references to "things in the world" expunged: it's an ugly 
expression that begs more questions than it answers, and IMO runs the risk of 
confusing readers.

Section 1 intro: rewording in 1st 3 paras.

Suggest that the provenance notation be a part 1 appendix, not a separate 
part/document.  Drop references to ASN - it's *not* an *abstract* syntax notation; 
indeed, I think that very expression is an oxymoron.

Part 2 is *not* an upgrade path. Please don't say this.  (It's a refinement of 
use that allows provenance information from different sources to be combined in 
meaningful ways.)

More text refinement in section 1.

Section 2.1

Saying "Activity is anything ..." is confusing.  It suggests a continuant rather 
than an occurrent.

Sub-editing would improve this.

Section 2.2

I think it would be clearer if generation and usage were introduced as events 
associated with activities.  (Discussion of them being instantaneous can come in 
Part 2)

Introducing generation as "completed production" reads really strangely to me, 
and sounds as if it could be a produced artifact.  I think a form like 
"completion of production" is clearer.  Similarly for usage, something like 
"starting to consume".

Sub-editing would improve this.

Section 2.3:

"AccountEntity" - why not just "Account".  Also, I understood this was to *be* a 
bundle, not a container for a bundle.

The example given has no clear relationship to the description.  I understood 
the key use-case here was to express provenance of provenance, and that is why 
we have accounts.  I think that should be stated clearly; e.g.

"An account is a bundle of provenance statements treated as an entity which may 
itself have some associated provenance."
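
To illustrate the reading I have in mind, here is a rough sketch in Python 
(not provenance notation).  The class names, the "ex:" identifiers and the 
relation names are all made up for illustration; nothing here is taken from 
the draft.  The only point is that an account *is* an entity that carries a 
bundle of statements, so a second account can make statements about the first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    identifier: str


@dataclass(frozen=True)
class Statement:
    # One provenance statement as a plain triple (illustrative shape only).
    relation: str
    subject: str
    obj: str


@dataclass(frozen=True)
class Account(Entity):
    # An account is itself an entity; it bundles provenance statements.
    statements: tuple = ()


# A first account holding provenance about a report.
inner = Account(
    "ex:acct1",
    (Statement("wasGeneratedBy", "ex:report", "ex:compile"),),
)

# A second account holding provenance about the first account:
# provenance of provenance.
outer = Account(
    "ex:acct2",
    (Statement("wasAttributedTo", inner.identifier, "ex:alice"),),
)
```

Because the account has its own identifier, no separate container for the 
bundle is needed in this sketch.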

Agents.  I think the notion of responsibility here is so loose as to be of no 
practical value.  When we say a text editor is "responsible for" crashing a 
computer, that's a kind of anthropomorphism, not a literal claim of 
responsibility.  What we really mean is that the text editor caused the crash. 
The notion of responsibility is generally associated with duty, authority and/or 
accountability (cf. 
http://oxforddictionaries.com/definition/responsibility?view=uk).  This is why 
persons and organizations are distinct from software agents.  I suggest that the 
text here should "stick to the knitting": just state that these are commonly 
encountered kinds of agent, and leave it at that.

Section 2.4

This continues the muddle about "responsibility", until the definition of agent 
responsibility relation which seems about right to me (note the phrase 
"accountable for" here).

The use of responsibility in the description of association seems completely 
wrong to me.

The discussion of activity association is surreal.  A plan is defined previously 
as an "Entity", but association relates an *agent* to an activity.

I think this section needs re-drafting.

Section 2.5

I think the intent and content of the diagram is generally good, but that its 
visual presentation could usefully be improved.  I think it should appear as 
part of the introduction to section 2, not at the end.

Generally in section 2, I think the examples are mostly well-chosen, but their 
presentation breaks up the flow of the overview; I would prefer that the 
examples were more succinct, maybe fewer, and introduced inline in the 
descriptive overview text.  Ideally the whole overview would fit on just one or 
two pages (i.e. about half its current length on a printed page).  The key 
purpose here, IMO, is to give a quick overview of how the various concepts are 
used together.

Section 3:

I don't find this example at all helpful.  It requires too much effort to 
understand, and I find the process view vs author view is confusing.  What is 
this section actually trying to tell the reader?  I can't tell.

I think a comprehensive example like this would be better sited as an appendix, 
rather than an interruption to the main flow of the document.

Section 4.1:

I find the sub-heading "Element" is confusing/unhelpful.

Section 4.1.1 - verbatim repetition of text defining "Entity" already present in 
section - this is unhelpful.

The description of the provenance notation expressions should use the same terms 
as are used in the template presented;  i.e. *not* "[ attr=val1, ... ]" and 

Don't need to say anything about disjointness of entities and activities in Part 1.

Section 4.1.2

Similar comments to section 4.1.1

(But I think the simple statement "An activity is not an entity ..." is good.)

Section 4.1.3

Similar comments to section 4.1.1

Don't need to say why sub-categories of agent are introduced.

I would probably avoid making the mutual exclusivity claim (legally, it may be 
or become a debatable point).

Section 4.1.4

I don't see that notes are an essential part of the provenance structure.  I'd 
prefer to drop them, as I don't see them adding any expressive capability.

Section 4.2

The table of different relation domain and range combinations is fair enough, 
but I'm not convinced the additional level of document structure reflecting this 
is useful.

Ideally, I think the relations would all appear at the same document level as 
the concepts, so they have a similar "visual signature" when scanning the document.

Most or all subsections have repetition of text from section 2 similar to that 
noted for section 4.1.1

Also, most sections seem to suffer from a similar mismatch between the 
provenance notation template given and the accompanying description of the 
constituent elements.

I think generation and usage should be described as events (not necessarily to 
introduce a formal notion of events, just make it clear that they are events 
corresponding to some change in the relationship between an entity and an activity)


"Responsibility" again.

There are two things going on here that I feel are very muddled:

(a) this rather odd notion of responsibility, and

(b) associating a plan with an activity.

At the very least, I think these aspects should be separated, not just lumped 
into a single overloaded element.
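
As a rough sketch of the separation I mean (Python, not provenance notation; 
the record types, field names and "ex:" identifiers are hypothetical and not 
taken from the draft), the agent's involvement and the plan could be carried 
by two distinct statements about the same activity:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Association:
    # Relates an *agent* to an activity; says nothing about plans.
    activity: str
    agent: str


@dataclass(frozen=True)
class PlanLink:
    # Relates a plan (an *entity*) to an activity; says nothing about agents.
    activity: str
    plan: str


statements = [
    Association("ex:edit", "ex:alice"),
    PlanLink("ex:edit", "ex:style-guide"),
]

# Each aspect can be queried on its own.
agents = [s.agent for s in statements if isinstance(s, Association)]
plans = [s.plan for s in statements if isinstance(s, PlanLink)]
```

Either statement can then be present without the other, which the single 
overloaded element does not make obvious.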

I'm not sure why some expression components are explicit and possibly optional 
parameters, while others are attributes.  What's the intended difference here?


Responsibility again.  In this case, I think there may be some justification for 
talking about responsibility, but earlier treatment of this idea makes it hard 
for me to know what is really being expressed.  I think it is the notion that 
some actions of one agent are authorized or controlled by another agent in the 
context of a given activity, hence any accountability for the outcome may 
propagate back to the controlling or authorizing agent.  But that's not entirely 
clear to me from the text.

Also, I can't tell if the structures here would accommodate different agents 
having different responsibilities.  E.g. a manager authorizes an engineer to 
purchase a component, but is then instructed by the engineer in its 
deployment/installation...  when the component fails to achieve some required 
outcome, who is accountable?  The manager for not authorizing enough funds, or 
the engineer for not properly explaining how to use the component?
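
To make the question concrete, here is a rough sketch in Python of what I 
think would need to be expressible.  The record shape, the "kind" qualifier and 
the "ex:" identifiers are my own assumptions, not anything in the draft, and 
the sketch does not settle who is accountable; it only shows the two agents 
holding different responsibilities in different activities.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Responsibility:
    acting_agent: str       # the agent carrying out the activity
    responsible_agent: str  # the agent authorizing or instructing it
    activity: str           # the activity in whose context this holds
    kind: str               # hypothetical qualifier for the responsibility


records = [
    # The manager authorizes the engineer's purchase.
    Responsibility("ex:engineer", "ex:manager", "ex:purchase", "authorization"),
    # The engineer instructs the manager during installation.
    Responsibility("ex:manager", "ex:engineer", "ex:installation", "instruction"),
]


def responsible_for(activity, records):
    """Return (agent, kind) pairs for agents responsible in the given activity."""
    return [
        (r.responsible_agent, r.kind)
        for r in records
        if r.activity == activity
    ]
```

Here responsible_for("ex:purchase", records) yields the manager with kind 
"authorization", and responsible_for("ex:installation", records) yields the 
engineer with kind "instruction".  If the structures in the draft cannot 
distinguish these two cases, I think that is a gap.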


Skipped - I understand this is due to be replaced.  (Despite my reservations 
expressed elsewhere, the replacement looks like a significant improvement.)


Do we still need Alternate and Specialization in the provenance notation?


I'm running out of time, so I'll stop here.
