Re: PROV-ISSUE-438 (prov-n-post-f2f3-review ): Final review before last call vote [prov-n] from Luc Moreau on 2012-07-10 (public-prov-wg@w3.org from July 2012)

From: Luc Moreau <l.moreau@ecs.soton.ac.uk>
Date: Tue, 10 Jul 2012 18:13:26 +0100
To: public-prov-wg@w3.org
CC: l.moreau@ecs.soton.ac.uk
Message-ID: <EMEW3|6c6c2e4190d986c4072e007c67ee915bo69IDV08l.moreau|ecs.soton.ac.uk|4FFC62B6>
Hi Stian, and all prov-n reviewers

I have addressed Stian's comments first, since, in a first 
approximation, all the other reviews
on prov-n, raise similar issues.

Stian, Thanks for helping  draft some new sections.

The latest editor's draft [1] includes all changes.
Colour coded diffs appear [2]
Response to your comments appear in [3].

Your key concerns before release have been addressed:

* introduce "nonterminal"
You did it, thanks.
* fix syntax errors
addressed as you suggested
* memberOf -> hasMember
now hadMembers
* clarify/remote  % encoding in local part of QUALIFIED_NAME
issue solved with \-espcaping (PN_LOCAL_ESC)
  * clarifications about namespace scoping/overwrite
rules have been clarified

Can you let me know if there are still issues before Wednesday's call?
Thanks,
Luc


[1] http://dvcs.w3.org/hg/prov/raw-file/default/model/prov-n.html
[2] http://dvcs.w3.org/hg/prov/raw-file/default/model/diff-n.html
[3] 
http://dvcs.w3.org/hg/prov/raw-file/default/model/comments/issue-438-stian.txt

On 07/06/2012 03:02 PM, Stian Soiland-Reyes wrote:
> On Fri, Jun 29, 2012 at 1:23 PM, Provenance Working Group Issue
> Tracker <sysbot+tracker@w3.org> wrote:
>> PROV-ISSUE-438 (prov-n-post-f2f3-review  ): Final review before last call vote   [prov-n]
>> http://dvcs.w3.org/hg/prov/raw-file/default/model/releases/ED-prov-n-20120629/prov-n.html
>>
>> Question for reviewers: Can the document be published as Last Call working draft?
>
> Of my comments below, the following SHOULD be addressed before
> publishing as Last Call draft:
> * introduce "nonterminal"
> * fix syntax errors
> * memberOf -> hasMember
> * clarify/remote  % encoding in local part of QUALIFIED_NAME
> * clarifications about namespace scoping/overwrite
>
> Given a minimum of the above are addressed, then yes, the document can
> be published as Last Call WD.
>
> Loads of details follow.. don't despair!
>
>
> ===================================
>
>
>
>
>
> In 1.1, the use of the term "nonterminal" is not introduced. This
> computer science term used for parsers might be obvious for anyone who
> is reading the document to produce a PROV-N parser, but not for
> "Readers of the [PROV-DM] and of [PROV-CONSTRAINTS] documents". I see
> the term is used several times later in the document, and so should be
> introduced.
>
> A sentence like "Those readers may find the _expression_ nonterminal a
> useful entry point into the grammar" should be reformulated to
> something like "Those readers may find the definition _expression_ a
> useful entry point into the grammar".
>
>
>
>
>> 1.3 The PROV namespace (see Section 4.7.1)
> The section number should be 3.7.4
>
>
>> 2.1
>> All PROV data model relations involve two primary elements, the subject and the object, in this order
> Not true. Examples:
>
>> activity(a1)
>> entity(e1)
> So those might not be 'relations' - but that is not clear from the
> above. Rephrase to something like "Many PROV-N predicates denote a
> relationship between two primary elements, the _subject_ and the
> _object_, in this order".
>
> I would include an entity() example in this section, to show that not
> all have subject and object. I would do this before Example 1 and the
> above sentence.
>
>> Example 0:
>> In this example, _e1_ is asserted to be a PROV _entity_.
>>     entity(e1)
>
>
>> Example 2
>> In the following expressions, the optional activity a along with the generation and usage identifiers g2 and u1:
>> wasDerivedFrom(e2, e1, a, g2, u1)
> The sentence does not parse. I suggest "The following expression
> expands the above derivation relation by providing additional
> elements: the activity _a_, the generation _g2_ and the usage _u1_.
>
>> Extended Backus-Naur Form (EBNF) notation
> This is again well known in CS - but a hyperlink would be useful. Now
> I had to go to Wikipedia. However the mini-intro right below is
> useful. "The below is a summary of "
>
>
> I would be more precise, that "The PROV-N grammar is specified in this
> document using the .." - as knowledge of EBNF is not a requirement for
> using PROV-N - just for understanding the definitions in this
> document.
>
>
>
>> 2.2 EBNF Grammar
>> ..
>> Each expression non-terminal expression i.e., entityExpression, activityExpression etc., corresponds to
> expression non-terminal expression??
>
>
> Reformulate to something like "Each of the symbols included in
> _expression_ above, e.g. entityExpression, corresponds to one element
> (e.g. entity) in the PROV data model."
>
>
> Why hyperlinks on 'expression' here in this section? It goes 1 line up
> or down, it's very confusing.  Use <code>expression</code> or similar
> instead.
>
>
>> such that the text for an element matches the corresponding expression production of the grammar.
> What does this mean? That the text of the document corresponds to what
> you get by parsing the text? That does not tell me anything..
>
>
> Is the bundle construct now required for the PROV-N document? If so,
> almost none of the examples in this document or in PROV-DM are valid
> PROV-N documents. That might well be the case, as they want to reuse
> namespaces, etc - but that should then be stated explicitly, kind of
> like
>
>
>> 2.3 Optional attributes
> Should add something like "The interpretation of an optional attribute
> that is not provided is given by Section 4.1 in PROV-CONSTRAINTS"
> somewhere.
>
>
>> 2.4 Identifiers and attributes
>> Most expressions defined in the grammar include the use of two terms: an identifier and a set of attribute-value pairs, delimited by square brackets
> This reads like:
>
>> identifier [] attribute-value pairs
> Which is not what is meant! Something more like:
>
> "Almost all expressions defined in this grammar may also include an
> identifier for the expression.  Most expression can also include a set
> of attribute-value pairs, grouped within square brackets."
>
>
> "By convention, optional identifiers are separated using a semi-colon
> ';'."  By convention.. but I might do something else?  Just say that
> they ARE separated like that: "Optional identifiers are separated
> using a semi-colon ';', but where the identifiers are required, the
> regular comma ',' is used."
>
> Or do you say I need to also allow parsing with ',' for the
> permutations that are non-ambigious? The defition for
> optionalIdentifier says no. (good!)
>
>
>> Example 7
>> ..
>> The third example shows that one can optionally indicate the missing identifier using the - marker.
> Add "This is equivalent to the first expression".
>
>> Example 8
>> The first and second activities have no attributes.
> Add ", and are equivalent."
>
>
>> 2.5 Comments
>> ... such cooments ...
> "such comments"
>
>
>> 3.1.3 Generation
>> generationExpression    ::=    "wasGeneratedBy" "(" optionalIdentifier eIdentifier ( "," aIdentifierOrMarker "," timeOrMarker )? optionalAttributeValuePairs ")"
> This does not allow aIdentifierOrMarker without timeOrMarker - is this
> intentional?
>
>
> The following examples are not valid:
>
>> wasGeneratedBy(e2, a1, tr:WD-prov-dm-20111215)
>> wasGeneratedBy(ex:g1; e, a, tr:WD-prov-dm-20111215)
> because tr:WD-prov-dm-20111215 is not a valid timeOrMarker.
>
>
> I would also recommend that these show-cases use the same names, so
> rather than say intermixing a1 and ex:edit1 - just always use a1.
>
>
>> Even though the production generationExpression allows for expressions wasGeneratedBy(e2, -, -) and wasGeneratedBy(-; e2, -, -), these expressions are not valid in PROV-N, since at least one of id, activity, time, and attributes must be present.
> This is suboptimal, could it not be worked into the rules in time?
> Could we rather say this follows from PROV-CONSTRAINTS and PROV-DM?
> "Since" does not make sense here - where is it stated that they MUST
> be present? Here. So try rather:  "... are not valid in PROV-N; at
> least one of... MUST be present"
>
> Equivalent comment for this kind of paragraph for usage, startExpression, etc.
>
>
>
>> 3.1.4 Usage
> usageExpression    ::=    "used" "(" optionalIdentifier aIdentifier
> "," ( "," eIdentifierOrMarker "," timeOrMarker )?
> optionalAttributeValuePairs ")"
>
> This wrongly requires a double comma before
> eIdentifierOrMarker-timeormarker block:
>
> used(a1,,e,-)
>
> Should be:
>
>> usageExpression    ::=    "used" "(" optionalIdentifier aIdentifier ( "," eIdentifierOrMarker "," timeOrMarker )? optionalAttributeValuePairs ")"
>
> This rule also requires timeOrMarker if eIdentifierOrMarker is present
> - is this intentional? I won't ask this again for the following forms
> - but assume that the style is "no optionals or all optionals" - in
> which case this should be clarified in section 2.3 - the position
> argument hints that the old style of allowing to chop of trailing -'s
> is allowed.
>
>> Even though the production usageExpression allows for expressions used(a2, -, -) and used(-; e2, -, -),
> should be:
>    used(-; a1, -, -)
>
>> 3.1.6 Start
>> Note: Even though the production startExpression allows for expressions wasStartedBy(e2, -, -) and wasStartedBy(-; e2, -, -),
> Should be:
>     wasStartedBy(a1, -, -, -)
>     wasStartedBy(-; a1, -, -, -)
>
>
>> 3.1.7 End
>> wasEndedBy(e; ex:act2)
>> wasEndedBy(e; ex:act2, ex:trigger, -, 2011-11-16T16:00:00)
> I would not use 'e' as id here - that is confusing as 'e' is used for
> entity earlier. Try 'end'.
>
>> Note:Even though the production endExpression allows for expressions wasEndedBy(e2, -, -) and wasEndedBy(-; e2, -, -),
> Should be:
>   wasEndedBy(a1, -, -, -)
>   wasEndedBy(-; a1, -, -, -),
>
>
> 3.2.1 Derivation
>
> The following examples are invalid:
>
>    wasDerivedFrom(d, e2, e1, a, g2, u1, [ex:comment="a righteous derivation"])
>    wasDerivedFrom(d, e2, e1, a, g2, u1)
>    wasDerivedFrom(-, e2, e1, a, g2, u1)
>
> As they should use ; to terminate optionalIdentifier rather than ,
>
>
>
> 3.3.2 -> 3.2.4 (Revision, Quotation, Primary Source)
>
> An introduction on the top of these should be provided, to note that
> the regular syntax for wasDerivedFrom is used, with the additional
> prov:type attribute given to specify the type of derivation. As it
> stands it can read as if only the syntax used in the example is valid.
>    Make these subsections of 3.2.1 instead?
>
>
> 3.2.4
>> [ prove:type='prov:PrimarySource' ])
> Should be:
>
>    [ prov:type='prov:PrimarySource' ])
>
>
> The use of 'single quotes' for the prov:type as opposed to "double
> quote" for the other attribute is confusing. By reading deeply into
> section 3.7.3 I see this is a short hand for "ex:value" %%
> prov:QUALIFIED_NAME. Could a link be provided for this in 3.2.x?
>
> prov:type is not explained anywhere in PROV-N. Almost all examples
> using it with custom attributes use "double quotes", for instance:
>
>    >  actedOnBehalfOf(del1; ag2, ag1, a, [prov:type="contract"])
>
>    > agent(ag4, [ prov:type="prov:Person", ex:name="David" ])
>
> I assumed (wrongly?) that prov:type in a way has range
> prov:QUALIFIED_NAME union xsd:anyURI - but I'm not sure as "a Literal
> may be an IRI-typed string (with datatype xsd:anyURI); such IRI has no
> specific interpretation in the context of PROV.".
>
> So does prov:type="contract" simply mean "contract" out of thin air
> rather than bound to any namespace? We would struggle to translate
> this to PROV-O where prov:type maps to rdf:type. If this is the case,
> then I expect all prov:type="prov.*" should be with single quotes
> instead.
>
>
> http://dvcs.w3.org/hg/prov/raw-file/default/model/releases/ED-prov-dm-20120628/prov-dm.html#term-attribute-type
> says "PROV-DM is agnostic about the representation of types, and only
> states that the value associated with a prov:type attribute must be a
> PROV-DM Value." - but does not distinguish between 'single',  "double"
> quote and "qualified" %% prov:QUALIFIED_NAME notation.
>
>
>
>> 3.4.1 Bundle declaration
>> Example 26
>> bundle ex:author-view
>>    agent(ex:Paolo,   [ prov:type='prov:Person' ])
>>    agent(ex:Simon,   [ prov:type='prov:Person' ])
>>    ...
>> endBundle
> Nitpicking, but I would use " // ... " as "..." is not valid PROV-N.
>
>
> 3.4.1 should also provide a forward link to 4. Toplevel Bundle,
> because otherwise the example does not make sense (prefix ex: is not
> declared)
>
>
>
>
>> 3.5.3 Mention
>> mentionExpression    ::=    "mentionOf" "(" identifier "," identifier "," bIdentifier ")"
> Should be:
>
>    mentionExpression    ::=    "mentionOf" "(" eIdentifier ","
> eIdentifier "," bIdentifier ")"
>
>
>
>
> 3.6.1 Membership
>
>>   memberOf(mId, c, {e1, e2, e3}, [])   // Collection membership
> should be:
>
>>   memberOf(mId; c, {e1, e2, e3}, [])   // Collection membership
>
>
> This reads the wrong way - it says c is a member of {e1, e2, e3} by
> the previously mentioned subject-relation-object rule.
>
> Propose renaming 'memberOf' to 'hasMembers', 'hadMembers' or
> 'members'. (I do not propose to make the entity or entity set as first
> argument)
>
>
> Opposite of all other example, this example does not start with a
> 'full' expression, as the 'complete' argument is missing. Also the
> attributes are empty. I would change to:
>
>
>   memberOf(mId; c, {e1, e2, e3}, true, [dct:description="All of them"])
>    // Collection membership
>
>
>> // default "complete" flag is false
> This is a PROV-DM or PROV-CONSTRAINT matter and should not be
> described here - the remaining relations don't explain meaning of
> missing optionals.
>
>
>> memberOf(c3, ,[])
>> memberOf(c3, ,true, [])
> Invalid syntax, as entitySet    ::=    "{" (eIdentifier)* "}"
>
> Change to:
>
>    memberOf(c3, {} ,[])
>    memberOf(c3, {} ,true, [])
>
> First one of these should be invalid by the same reason as for usage,
> and thus should not be listed. (It would need to include some
> attributes in [] to be valid)
>
>
> 3.7.1 Identifier
>
>> dIdentifier    ::=    identifier
>
> This is not used anywhere and can be removed. (Dictionary?)
>
>
>> A qualified name's prefix is optional. If a prefix occurs in a qualified name, it refers to a namespace declared in a namespace declaration.
> I would change it to "it MUST refer to a namespace as declared in a.."
> - otherwise this would be valid:
>
> bundle
>    prefix ex <http://example.com/>
>    entity(fred:e1)
> endBundle
>
> without declaring the prefix "fred".
>
>
>
>
>> 3.7.2 Attribute
>> The reserved attributes in the PROV namespace are the following.
> add "Their meaning is explained by [PROV-DM]".
>
>
>  <INT_LITERAL>    ::=    ("-")? (DIGIT)+
>
> This *might* be in conflict with QUALIFIED_NAME which allows local
> names like "1337" (without quotes) - but I have not checked so
> thoroughly. It is at least confusing. It means that you can do:
>
> entity(1337, [1337=1337])
>
> where the first and second 1337 is the relative IRI reference <1337>
> based on the default namespace, while the third 1337 is the number
> "1337" %% xsd:int.
>
> However I don't think there is anywhere that allows both a literal and
> an identifier in the same position, so we MIGHT be safe. Parser heads
> converge here.
>
>
>> 3.7.3 Literal
>> Note:The productions for prov:QUALIFIED_NAME and INT_LITERAL are conflicting. In the context of a literal, a parser should give precedence to the production for INT_LITERAL.
> Again, I don't see this conflict, as the former requires 'single'
> quotes and the latter does not allow that.
>
>
> entity(e1, [ex:value='1337'])   // equivalent to
> entity(e1, [ex:value="1337" %% prov:QUALIFIED_NAME])
>
> entity(e1, [ex:value=1337])   // equivalent to
> entity(e1, [ex:value="1337" %% xsd:int])
>
>
> It would be good if a (separate) example in 3.7.3 showed all of these
> equivalences.
>
>
>
>> 3.7.3.1 Reserved Type Values
>> The reserved type values in the PROV namespace are the following
> Add "Their meaning is defined by [PROV-DM].
>
>
>
>
>
> 3.7.4 Namespace
>
>
>> namespaceDeclaration    ::=    "prefix" QUALIFIED_NAME namespace
>> <QUALIFIED_NAME>    ::=    ( PN_PREFIX ":" )? PN_LOCAL
>>                                                         | PN_PREFIX ":"
> This would allow:
>
> bundle
>    prefix fred:soup <http://example.com/>
> endBundle
>
> but all the examples define prefix with only the lefthand side of a
> QUALIFIED_NAME - ie. PN_PREFIX. So it should be:
>
>
>    namespaceDeclaration    ::=    "prefix" PN_PREFIX namespace
>
> To match all valid prefixes in QUALIFIED_NAME.
>
> Note that QUALIFIED_NAME allows the empty prefix, ie ":ex1" (which is
> different from "ex1"). ((And thus also ":"))
>
> However this would be difficult to declare ":" in the current prefix
> rule,  because unlike say in Sparql and Turtle, the prefix is not
> declared with the trailing ":". One would have to say:
>
> bundle
>    prefix  <http://example.com/> // Two spaces before < !
> endBundle
>
>
> I suggest to change the prefix declaration to include the trailing : - ie:
>
>   namespaceDeclaration    ::=    "prefix" QUALIFIED_NAME ":" namespace
>   bundle
>     prefix : <http://example.com/>
>     prefix ex: <http://www.example.org/>
>   endBundle
>
>
>>   Instead, they can be %-escaped or incorporated in the IRI denoted by a prefix.
>> <PERCENT>    ::=    "%" HEX HEX
> It is not defined what is the meaning of this escaping. What do the
> HEX represent? If you mean "as in section 3.1.  Mapping of IRIs to
> URIs of [RFC3987]" - then include this.
>
> Should be "in corporated in *a* namespace URI denoted by a prefix" -
> as presumably that specific namespace binding does not exist yet if
> you needed special characters in the local name.
>
> I'm still not sure what this mean.
>
> Assume you have:
>
> bundle
>    prefix s11: <http://s11.no/>
>    // ...
> endBundle
>
> And in there, you want to refer to the entity for
> <http://s11.no/?fred=soup>. Then presumably I could do:
>
>    entity(s11:?fred%3Dsoup)
>
> but only if we intend for %xx to be expanded before making the URI,
> rather than, as suggested by PROV-N, simply be valid parts of the URI
> by concatenation.
>
> If we say they are passed on directly - then I don't see a way to
> represent the above, as %3d escape for = is valid argument in query
> parameters - such as in ?q=1%2B1%3D2  (1+1=2) - and thus %3Dsoup not
> understood as the original =soup.
>
> Note - I am not arguing for double-escaping here, as I think that
> would become very confusing - I am just wondering why % was added here
> in the first place, if still only a selection of possibly local names
> (given a general prefix) is valid - I think it just adds potential
> complexity.
>
> Of course the reason why this happens is because we don't have a
> <URIREF> syntax allowed together with QUALIFIED_NAME - so the Sparql
> analogy breaks down.
>
>
>>    entity(ex:1234)  //  corresponds to IRI http://example.org/2/1234
> should be
>
>    entity(ex:1234)  //  corresponds to IRI http://example.org/1/1234
>
>
> I would add:
>
>    entity(c/)   //  corresponds to IRI http://example.org/2/c/
>    entity(ex:/)   //  corresponds to IRI http://example.org/1//   //
> Strict concatenation
>
>
>
> What does the following mean?
>
>> Note:The productions for qualifiedName and prefix are conflicting. In the context of a namespaceDeclaration, a parser should give precedence to the production for prefix.
>
> With 'prefix' here - do you mean PN_PREFIX or the keyword 'prefix' within
>
>> namespaceDeclaration    ::=    "prefix" QUALIFIED_NAME namespace
> ?
>
> I still don't see any conflict. namespaceDeclaration requires the word
> 'prefix', and so the following should be valid:
>
> bundle
>    prefix
>    prefix
>    <prefix>
> endBundle
>
> ie. binding the prefix "prefix:" to the relative IRI reference <prefix>.
>
> namespaceDeclarations is optional in both bundle and namedBundle - but
> "prefix" is not a valid expression, and can thus I don't see any
> conflict here.
>
>
>
> 3.7.4 does not define how to interpret re-declaration of the same prefix:
>
> bundle
>    prefix ex1 <http://example.org/1/>
>    prefix ex1 <http://example.org/2/>
>    entity(ex1:fred)
>   endBundle
>
> (Personally I think they should not be allowed - Turtle will overwrite
> sequentially - but nothing else in PROV-N depends on the sequence)
>
>
> 3.7.4 does neither not define that that prefixes and defaults are
> looked up in the named bundle before the top level bundle.
>
> bundle
>    default <http://example.com/>
>    entity(fred)
>    bundle fred  // same fred
>      default <http://www.example.org/>
>      entity(fred) // Different fred!
>    endBundle
> endBundle
>
>
>> Section 4
>> A toplevel bundle, written _bundle decls exprs bundles endBundle_ in PROV-N, contains:
> This 'written' thing is very confusing, as it is not written like
> that. I would move up the bundle production rule first - use the same
> style as for all the other productions. The following "Contains"
> bullet points also only make sense after seeing the production rule:
>
>> bundle    ::=    "bundle" (namespaceDeclarations)? (expression)* (namedBundle)* "endBundle"
> The bullet points are confusing, because they talk about decls and
> exprs vs namespaceDeclarations and expressions. Only one variable
> name, please!
>
>
> namespaceDeclarations are optional - both here and in the named
> bundle. This means that this would be valid:
>
> bundle
>    entity(e1) // What is the default namespace??
>    bundle b1
>      // or what is it here?
>      entity(e2)
>    endBundle
> endBundle
>
>
> I found it confusing that the top level bundle and named bundle have
> the same keyword. However I expect - although it is not stated
> explicitly here - that you can't have a free-standing named bundle -
> and that a PROV-N Document should have all expressions and named
> bundled within a single top level bundle.
>
>
>>   activity(a1, 2011-11-16T16:05:00, -,[prov:type="edit"])
> Should be (pretty printed ;) )
>
>   activity(a1, 2011-11-16T16:05:00, -, [prov:type="edit"])
>
>
>> The following container
>>   (..)
>> This container
> s/container/bundle/g
>
>
>
>
>> 5. Media Type
> We agreed some changes to this in the telcon 2012-07-05,
> "text/provenance-notation" and ".provn".
>
>> Published specification:
>>    This specification.
> In the actual submission I assume that this will rather refer to /tr/PROV-N/.
>
>
>> may have the strings 'bundle' near the beginning
> s/strings/string/
>
>
>> Section B
> I did not review section A or B in detail.
>
> Some whitespace needed after 'Cheney' both for [PROV-XML] and [PROV-RDF]
>
>
Received on Tuesday, 10 July 2012 17:14:05 UTC