Pre-last-call review of PROV-N from Graham Klyne on 2012-07-09 (public-prov-wg@w3.org from July 2012)

From: Graham Klyne <graham.klyne@zoo.ox.ac.uk>
Date: Mon, 09 Jul 2012 09:57:18 +0100
To: W3C provenance WG <public-prov-wg@w3.org>
Message-ID: <4FFA9CEE.7060805@zoo.ox.ac.uk>

All,

I read through PROV-N over the weekend, and (with one exception that I'll raise 
as a separate issue) I think it's fine for last call, even though I might 
personally (still) prefer to see it as an appendix to PROV-DM than as a separate 
document.

Some comments for consideration before or after last call:

Section 1.1:  since we don't have a candidate for PROV-XML going into last call, 
I think that the reference to [PROV-XML] should be dropped.

Also, for a section titled "purpose of this document...", I find it hard to dig 
out a description of the purpose of this document - it's all rather buried in 
(IMO) unnecessary justification and explanation.

Section 1.3:  rather than repeating the namespace URIs here, I would reference 
the corresponding section in PROV-DM (reducing possibilities for key information 
getting out of phase).

Section 2.4: I would have expected to see this section deal with the form of 
identifiers used in PROV-N.  At the very least, I'd suggest a forward reference 
to section 3.7 - but I think it would be better to deal with identifier format 
up-front, since it is, along with the use of resulting URIs, such a central 
aspect of PROV-N in particular, and provenance in general.

Section 3.*:  there are several references to "non-terminal".  This is a rather 
specialized term, maybe familiar to those who have studied formal grammar 
theory, but I think it is a bit obscure for a wider audience.  For example, "how 
each constituent of a PROV-DM Entity maps to a syntax element" would be more 
descriptive.

As I write this, I'm thinking that this mapping from PROV-DM descriptions to 
syntax elements is rather clumsy.  I'd suggest dropping the mapping tables 
completely:  the information is apparent from a combination of the PROV-DM 
description and the formal syntax given, so nothing useful would be lost and 
the description would be a lot more compact.  (This also suggests to me that 
having PROV-N in a separate document isn't really a useful separation - see 
comment at top).

I'd also suggest using production names that are a bit more evocative;  e.g., in 
section 3.1.2:

   activityExpression ::= "activity" "(" identifier
                          ( "," startTime "," endTime )?
                          optionalAttributes ")"

   startTime ::= timeOrMarker

   endTime   ::= timeOrMarker

etc.


Section 3.1.6:

The syntax for production startExpression suggests that the trigger, start 
activity and time must all be present or all absent.  I thought that in 
discussion it had been agreed that trailing optional parameters (other than 
attributes) could be omitted (though section 2.3 does not say this - maybe this 
changed?).  I don't think it's a blocker, but I find the all-or-nothing approach 
to optionals a bit unexpected.
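
To illustrate the difference between the two treatments of optionals, here is a 
minimal sketch.  It is not taken from the PROV-N draft:  it uses the 
activityExpression production quoted above (which has the same all-or-nothing 
shape) rather than startExpression, it leaves out optionalAttributes, and the 
token patterns IDENT and TIME are simplified assumptions made up for the example.

```python
import re

# Hypothetical, simplified token patterns; the real PROV-N tokens are richer.
IDENT = r"[A-Za-z_][\w:.-]*"
TIME = r"(?:-|[0-9T:.+Z-]+)"  # "-" stands for the "no time given" marker

# All-or-nothing: startTime and endTime appear together or not at all,
# mirroring ( "," startTime "," endTime )? in the production quoted above.
ALL_OR_NOTHING = re.compile(
    rf"activity\(\s*{IDENT}(?:\s*,\s*{TIME}\s*,\s*{TIME})?\s*\)"
)

# Trailing-optional: endTime may be omitted on its own.
TRAILING_OPTIONAL = re.compile(
    rf"activity\(\s*{IDENT}(?:\s*,\s*{TIME}(?:\s*,\s*{TIME})?)?\s*\)"
)


def accepts(pattern, text):
    """Return True if the whole text matches the given pattern."""
    return pattern.fullmatch(text) is not None


only_start = "activity(ex:a1, 2012-07-09T09:00:00)"
print(accepts(ALL_OR_NOTHING, only_start))     # start time alone is rejected
print(accepts(TRAILING_OPTIONAL, only_start))  # start time alone is accepted
```

Under the all-or-nothing form, an author who knows only the start time has to 
write the marker explicitly for the end time, e.g. 
activity(ex:a1, 2012-07-09T09:00:00, -).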

Section 3.7.2

This section introduces reserved attribute names, but there's no indication of 
where to look for a description of what they mean.

Section 3.7.3.1

This section introduces reserved type values, but there's no indication of where 
to look for a description of what they mean.

Section 3.7.4:

"Every qualified name with this prefix in the scope of this declaration refers 
to this namespace" - I find the use of "refers" here a bit confusing - I 
would expect the namespace URI to *refer to* the namespace.  I'd say something 
more like "belongs to" or "is part of".

I would not repeat the namespace URIs here.

Section 4

I think someone else has commented on the presentation of "toplevel bundle".  I 
also found the presentation a bit confusing.  The approach I'd take here would 
probably be to have an early (sub-)section called something like "overall 
structure of PROV-N description" - effectively introducing the distinguished 
term of the grammar and its immediate productions - and drop this section 4. 
Much of the content could come from the current section 4, but the reference to 
"housekeeping construct" can be dropped.

Indeed, it might be as simple as this:

[[
x. Overall structure of PROV-N

A PROV-N expression matches the _bundle_ syntax production.
]]


Section 5:

For this, especially as it's an IETF standards-tree registration, I would expect 
the change controller to be W3C, and possibly also the contact.

@sandro: does the W3C maintain email addresses as contact points for IETF 
registrations (URI schemes, MIME types, header fields, etc.)?

Encoding considerations:

Further to previous discussions, I propose:  "The encoding is always UTF-8 (or 
the ASCII subset of UTF-8)"
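
As a small illustration of why the parenthetical is harmless:  any pure-ASCII 
byte sequence is already valid UTF-8, so a consumer that always decodes as 
UTF-8 handles both cases.  The sketch below is only an assumption about how a 
consumer might behave;  decode_provn is a hypothetical helper, not part of any 
PROV tooling.

```python
def decode_provn(data: bytes) -> str:
    """Decode PROV-N bytes as strict UTF-8 (hypothetical helper)."""
    # Strict decoding raises UnicodeDecodeError on bytes that are not UTF-8.
    return data.decode("utf-8")


# A pure-ASCII document decodes unchanged, because ASCII is a subset of UTF-8.
ascii_doc = "activity(ex:a1)".encode("ascii")
print(decode_provn(ascii_doc))

# Bytes in another encoding (here Latin-1 with a non-ASCII character) fail.
latin1_doc = "entity(ex:caf\u00e9)".encode("latin-1")
try:
    decode_provn(latin1_doc)
    rejected = False
except UnicodeDecodeError:
    rejected = True
print(rejected)
```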

Security considerations:

**NOTE** I'm going to raise this as a separate issue for cross-document 
consideration before last call.

I think most of this material should appear in the PROV-DM document, and simply 
be referenced here.  Putting it all into the media type registration makes it 
seem that it's all a bit of an afterthought.  Also, I suspect it won't get 
reviewed so carefully here, and security is one area which really *needs* as 
much review as it can get.

The second paragraph doesn't make sense to me: PROV-N does NOT express arbitrary 
application data.

This section ignores (or obscures) the (IMO) fundamental issue that provenance 
is intended to be used to make trust decisions, and as such the reliability of 
provenance information used must be carefully considered according to the 
importance of the decision.

...

End of review comments

#g
--
Received on Monday, 9 July 2012 08:59:03 UTC