Re: PROV-ISSUE-438 (prov-n-post-f2f3-review ): Final review before last call vote [prov-n]

On Fri, Jun 29, 2012 at 1:23 PM, Provenance Working Group Issue
Tracker <sysbot+tracker@w3.org> wrote:
> PROV-ISSUE-438 (prov-n-post-f2f3-review  ): Final review before last call vote   [prov-n]

> http://dvcs.w3.org/hg/prov/raw-file/default/model/releases/ED-prov-n-20120629/prov-n.html
>
> Question for reviewers: Can the document be published as Last Call working draft?


Of my comments below, the following SHOULD be addressed before
publishing as Last Call draft:
* introduce "nonterminal"
* fix syntax errors
* memberOf -> hasMember
* clarify/remote  % encoding in local part of QUALIFIED_NAME
* clarifications about namespace scoping/overwrite

Given a minimum of the above are addressed, then yes, the document can
be published as Last Call WD.

Loads of details follow.. don't despair!


===================================





In 1.1, the use of the term "nonterminal" is not introduced. This
computer science term used for parsers might be obvious for anyone who
is reading the document to produce a PROV-N parser, but not for
"Readers of the [PROV-DM] and of [PROV-CONSTRAINTS] documents". I see
the term is used several times later in the document, and so should be
introduced.

A sentence like "Those readers may find the _expression_ nonterminal a
useful entry point into the grammar" should be reformulated to
something like "Those readers may find the definition _expression_ a
useful entry point into the grammar".




> 1.3 The PROV namespace (see Section 4.7.1)

The section number should be 3.7.4


> 2.1
> All PROV data model relations involve two primary elements, the subject and the object, in this order

Not true. Examples:

> activity(a1)
> entity(e1)

So those might not be 'relations' - but that is not clear from the
above. Rephrase to something like "Many PROV-N predicates denote a
relationship between two primary elements, the _subject_ and the
_object_, in this order".

I would include an entity() example in this section, to show that not
all have subject and object. I would do this before Example 1 and the
above sentence.

> Example 0:
> In this example, _e1_ is asserted to be a PROV _entity_.
>    entity(e1)



> Example 2
> In the following expressions, the optional activity a along with the generation and usage identifiers g2 and u1:
> wasDerivedFrom(e2, e1, a, g2, u1)

The sentence does not parse. I suggest "The following expression
expands the above derivation relation by providing additional
elements: the activity _a_, the generation _g2_ and the usage _u1_.

> Extended Backus-Naur Form (EBNF) notation

This is again well known in CS - but a hyperlink would be useful. Now
I had to go to Wikipedia. However the mini-intro right below is
useful. "The below is a summary of "


I would be more precise, that "The PROV-N grammar is specified in this
document using the .." - as knowledge of EBNF is not a requirement for
using PROV-N - just for understanding the definitions in this
document.



> 2.2 EBNF Grammar
> ..
> Each expression non-terminal expression i.e., entityExpression, activityExpression etc., corresponds to

expression non-terminal expression??


Reformulate to something like "Each of the symbols included in
_expression_ above, e.g. entityExpression, corresponds to one element
(e.g. entity) in the PROV data model."


Why hyperlinks on 'expression' here in this section? It goes 1 line up
or down, it's very confusing.  Use <code>expression</code> or similar
instead.


> such that the text for an element matches the corresponding expression production of the grammar.

What does this mean? That the text of the document corresponds to what
you get by parsing the text? That does not tell me anything..


Is the bundle construct now required for the PROV-N document? If so,
almost none of the examples in this document or in PROV-DM are valid
PROV-N documents. That might well be the case, as they want to reuse
namespaces, etc - but that should then be stated explicitly, kind of
like


> 2.3 Optional attributes

Should add something like "The interpretation of an optional attribute
that is not provided is given by Section 4.1 in PROV-CONSTRAINTS"
somewhere.


> 2.4 Identifiers and attributes

> Most expressions defined in the grammar include the use of two terms: an identifier and a set of attribute-value pairs, delimited by square brackets

This reads like:

> identifier [] attribute-value pairs

Which is not what is meant! Something more like:

"Almost all expressions defined in this grammar may also include an
identifier for the expression.  Most expression can also include a set
of attribute-value pairs, grouped within square brackets."


"By convention, optional identifiers are separated using a semi-colon
';'."  By convention.. but I might do something else?  Just say that
they ARE separated like that: "Optional identifiers are separated
using a semi-colon ';', but where the identifiers are required, the
regular comma ',' is used."

Or do you say I need to also allow parsing with ',' for the
permutations that are non-ambigious? The defition for
optionalIdentifier says no. (good!)


> Example 7
> ..
> The third example shows that one can optionally indicate the missing identifier using the - marker.

Add "This is equivalent to the first expression".

> Example 8
> The first and second activities have no attributes.

Add ", and are equivalent."


> 2.5 Comments
> ... such cooments ...

"such comments"


> 3.1.3 Generation

> generationExpression    ::=    "wasGeneratedBy" "(" optionalIdentifier eIdentifier ( "," aIdentifierOrMarker "," timeOrMarker )? optionalAttributeValuePairs ")"

This does not allow aIdentifierOrMarker without timeOrMarker - is this
intentional?


The following examples are not valid:

> wasGeneratedBy(e2, a1, tr:WD-prov-dm-20111215)
> wasGeneratedBy(ex:g1; e, a, tr:WD-prov-dm-20111215)

because tr:WD-prov-dm-20111215 is not a valid timeOrMarker.


I would also recommend that these show-cases use the same names, so
rather than say intermixing a1 and ex:edit1 - just always use a1.


> Even though the production generationExpression allows for expressions wasGeneratedBy(e2, -, -) and wasGeneratedBy(-; e2, -, -), these expressions are not valid in PROV-N, since at least one of id, activity, time, and attributes must be present.

This is suboptimal, could it not be worked into the rules in time?
Could we rather say this follows from PROV-CONSTRAINTS and PROV-DM?
"Since" does not make sense here - where is it stated that they MUST
be present? Here. So try rather:  "... are not valid in PROV-N; at
least one of... MUST be present"

Equivalent comment for this kind of paragraph for usage, startExpression, etc.



> 3.1.4 Usage

usageExpression    ::=    "used" "(" optionalIdentifier aIdentifier
"," ( "," eIdentifierOrMarker "," timeOrMarker )?
optionalAttributeValuePairs ")"

This wrongly requires a double comma before
eIdentifierOrMarker-timeormarker block:

used(a1,,e,-)

Should be:

> usageExpression    ::=    "used" "(" optionalIdentifier aIdentifier ( "," eIdentifierOrMarker "," timeOrMarker )? optionalAttributeValuePairs ")"


This rule also requires timeOrMarker if eIdentifierOrMarker is present
- is this intentional? I won't ask this again for the following forms
- but assume that the style is "no optionals or all optionals" - in
which case this should be clarified in section 2.3 - the position
argument hints that the old style of allowing to chop of trailing -'s
is allowed.

> Even though the production usageExpression allows for expressions used(a2, -, -) and used(-; e2, -, -),

should be:
  used(-; a1, -, -)

> 3.1.6 Start

> Note: Even though the production startExpression allows for expressions wasStartedBy(e2, -, -) and wasStartedBy(-; e2, -, -),

Should be:
   wasStartedBy(a1, -, -, -)
   wasStartedBy(-; a1, -, -, -)


> 3.1.7 End

> wasEndedBy(e; ex:act2)
> wasEndedBy(e; ex:act2, ex:trigger, -, 2011-11-16T16:00:00)

I would not use 'e' as id here - that is confusing as 'e' is used for
entity earlier. Try 'end'.

> Note:Even though the production endExpression allows for expressions wasEndedBy(e2, -, -) and wasEndedBy(-; e2, -, -),

Should be:
 wasEndedBy(a1, -, -, -)
 wasEndedBy(-; a1, -, -, -),


3.2.1 Derivation

The following examples are invalid:

  wasDerivedFrom(d, e2, e1, a, g2, u1, [ex:comment="a righteous derivation"])
  wasDerivedFrom(d, e2, e1, a, g2, u1)
  wasDerivedFrom(-, e2, e1, a, g2, u1)

As they should use ; to terminate optionalIdentifier rather than ,



3.3.2 -> 3.2.4 (Revision, Quotation, Primary Source)

An introduction on the top of these should be provided, to note that
the regular syntax for wasDerivedFrom is used, with the additional
prov:type attribute given to specify the type of derivation. As it
stands it can read as if only the syntax used in the example is valid.
  Make these subsections of 3.2.1 instead?


3.2.4
> [ prove:type='prov:PrimarySource' ])

Should be:

  [ prov:type='prov:PrimarySource' ])


The use of 'single quotes' for the prov:type as opposed to "double
quote" for the other attribute is confusing. By reading deeply into
section 3.7.3 I see this is a short hand for "ex:value" %%
prov:QUALIFIED_NAME. Could a link be provided for this in 3.2.x?

prov:type is not explained anywhere in PROV-N. Almost all examples
using it with custom attributes use "double quotes", for instance:

  >  actedOnBehalfOf(del1; ag2, ag1, a, [prov:type="contract"])

  > agent(ag4, [ prov:type="prov:Person", ex:name="David" ])

I assumed (wrongly?) that prov:type in a way has range
prov:QUALIFIED_NAME union xsd:anyURI - but I'm not sure as "a Literal
may be an IRI-typed string (with datatype xsd:anyURI); such IRI has no
specific interpretation in the context of PROV.".

So does prov:type="contract" simply mean "contract" out of thin air
rather than bound to any namespace? We would struggle to translate
this to PROV-O where prov:type maps to rdf:type. If this is the case,
then I expect all prov:type="prov.*" should be with single quotes
instead.


http://dvcs.w3.org/hg/prov/raw-file/default/model/releases/ED-prov-dm-20120628/prov-dm.html#term-attribute-type
says "PROV-DM is agnostic about the representation of types, and only
states that the value associated with a prov:type attribute must be a
PROV-DM Value." - but does not distinguish between 'single',  "double"
quote and "qualified" %% prov:QUALIFIED_NAME notation.



> 3.4.1 Bundle declaration

> Example 26
> bundle ex:author-view
>   agent(ex:Paolo,   [ prov:type='prov:Person' ])
>   agent(ex:Simon,   [ prov:type='prov:Person' ])
>   ...
> endBundle

Nitpicking, but I would use " // ... " as "..." is not valid PROV-N.


3.4.1 should also provide a forward link to 4. Toplevel Bundle,
because otherwise the example does not make sense (prefix ex: is not
declared)




> 3.5.3 Mention
> mentionExpression    ::=    "mentionOf" "(" identifier "," identifier "," bIdentifier ")"

Should be:

  mentionExpression    ::=    "mentionOf" "(" eIdentifier ","
eIdentifier "," bIdentifier ")"




3.6.1 Membership

>  memberOf(mId, c, {e1, e2, e3}, [])   // Collection membership

should be:

>  memberOf(mId; c, {e1, e2, e3}, [])   // Collection membership



This reads the wrong way - it says c is a member of {e1, e2, e3} by
the previously mentioned subject-relation-object rule.

Propose renaming 'memberOf' to 'hasMembers', 'hadMembers' or
'members'. (I do not propose to make the entity or entity set as first
argument)


Opposite of all other example, this example does not start with a
'full' expression, as the 'complete' argument is missing. Also the
attributes are empty. I would change to:


 memberOf(mId; c, {e1, e2, e3}, true, [dct:description="All of them"])
  // Collection membership


> // default "complete" flag is false

This is a PROV-DM or PROV-CONSTRAINT matter and should not be
described here - the remaining relations don't explain meaning of
missing optionals.


> memberOf(c3, ,[])
> memberOf(c3, ,true, [])

Invalid syntax, as entitySet    ::=    "{" (eIdentifier)* "}"

Change to:

  memberOf(c3, {} ,[])
  memberOf(c3, {} ,true, [])

First one of these should be invalid by the same reason as for usage,
and thus should not be listed. (It would need to include some
attributes in [] to be valid)


3.7.1 Identifier

> dIdentifier    ::=    identifier


This is not used anywhere and can be removed. (Dictionary?)


> A qualified name's prefix is optional. If a prefix occurs in a qualified name, it refers to a namespace declared in a namespace declaration.

I would change it to "it MUST refer to a namespace as declared in a.."
- otherwise this would be valid:

bundle
  prefix ex <http://example.com/>
  entity(fred:e1)
endBundle

without declaring the prefix "fred".




> 3.7.2 Attribute

> The reserved attributes in the PROV namespace are the following.

add "Their meaning is explained by [PROV-DM]".


 <INT_LITERAL>    ::=    ("-")? (DIGIT)+

This *might* be in conflict with QUALIFIED_NAME which allows local
names like "1337" (without quotes) - but I have not checked so
thoroughly. It is at least confusing. It means that you can do:

entity(1337, [1337=1337])

where the first and second 1337 is the relative IRI reference <1337>
based on the default namespace, while the third 1337 is the number
"1337" %% xsd:int.

However I don't think there is anywhere that allows both a literal and
an identifier in the same position, so we MIGHT be safe. Parser heads
converge here.


> 3.7.3 Literal

> Note:The productions for prov:QUALIFIED_NAME and INT_LITERAL are conflicting. In the context of a literal, a parser should give precedence to the production for INT_LITERAL.

Again, I don't see this conflict, as the former requires 'single'
quotes and the latter does not allow that.


entity(e1, [ex:value='1337'])   // equivalent to
entity(e1, [ex:value="1337" %% prov:QUALIFIED_NAME])

entity(e1, [ex:value=1337])   // equivalent to
entity(e1, [ex:value="1337" %% xsd:int])


It would be good if a (separate) example in 3.7.3 showed all of these
equivalences.



> 3.7.3.1 Reserved Type Values
> The reserved type values in the PROV namespace are the following

Add "Their meaning is defined by [PROV-DM].





3.7.4 Namespace


> namespaceDeclaration    ::=    "prefix" QUALIFIED_NAME namespace
> <QUALIFIED_NAME>    ::=    ( PN_PREFIX ":" )? PN_LOCAL
>                                                        | PN_PREFIX ":"

This would allow:

bundle
  prefix fred:soup <http://example.com/>
endBundle

but all the examples define prefix with only the lefthand side of a
QUALIFIED_NAME - ie. PN_PREFIX. So it should be:


  namespaceDeclaration    ::=    "prefix" PN_PREFIX namespace

To match all valid prefixes in QUALIFIED_NAME.

Note that QUALIFIED_NAME allows the empty prefix, ie ":ex1" (which is
different from "ex1"). ((And thus also ":"))

However this would be difficult to declare ":" in the current prefix
rule,  because unlike say in Sparql and Turtle, the prefix is not
declared with the trailing ":". One would have to say:

bundle
  prefix  <http://example.com/> // Two spaces before < !
endBundle


I suggest to change the prefix declaration to include the trailing : - ie:

 namespaceDeclaration    ::=    "prefix" QUALIFIED_NAME ":" namespace
 bundle
   prefix : <http://example.com/>
   prefix ex: <http://www.example.org/>
 endBundle


>  Instead, they can be %-escaped or incorporated in the IRI denoted by a prefix.

> <PERCENT>    ::=    "%" HEX HEX

It is not defined what is the meaning of this escaping. What do the
HEX represent? If you mean "as in section 3.1.  Mapping of IRIs to
URIs of [RFC3987]" - then include this.

Should be "in corporated in *a* namespace URI denoted by a prefix" -
as presumably that specific namespace binding does not exist yet if
you needed special characters in the local name.

I'm still not sure what this mean.

Assume you have:

bundle
  prefix s11: <http://s11.no/>
  // ...
endBundle

And in there, you want to refer to the entity for
<http://s11.no/?fred=soup>. Then presumably I could do:

  entity(s11:?fred%3Dsoup)

but only if we intend for %xx to be expanded before making the URI,
rather than, as suggested by PROV-N, simply be valid parts of the URI
by concatenation.

If we say they are passed on directly - then I don't see a way to
represent the above, as %3d escape for = is valid argument in query
parameters - such as in ?q=1%2B1%3D2  (1+1=2) - and thus %3Dsoup not
understood as the original =soup.

Note - I am not arguing for double-escaping here, as I think that
would become very confusing - I am just wondering why % was added here
in the first place, if still only a selection of possibly local names
(given a general prefix) is valid - I think it just adds potential
complexity.

Of course the reason why this happens is because we don't have a
<URIREF> syntax allowed together with QUALIFIED_NAME - so the Sparql
analogy breaks down.


>   entity(ex:1234)  //  corresponds to IRI http://example.org/2/1234

should be

  entity(ex:1234)  //  corresponds to IRI http://example.org/1/1234


I would add:

  entity(c/)   //  corresponds to IRI http://example.org/2/c/
  entity(ex:/)   //  corresponds to IRI http://example.org/1//   //
Strict concatenation



What does the following mean?

> Note:The productions for qualifiedName and prefix are conflicting. In the context of a namespaceDeclaration, a parser should give precedence to the production for prefix.


With 'prefix' here - do you mean PN_PREFIX or the keyword 'prefix' within

> namespaceDeclaration    ::=    "prefix" QUALIFIED_NAME namespace

?

I still don't see any conflict. namespaceDeclaration requires the word
'prefix', and so the following should be valid:

bundle
  prefix
  prefix
  <prefix>
endBundle

ie. binding the prefix "prefix:" to the relative IRI reference <prefix>.

namespaceDeclarations is optional in both bundle and namedBundle - but
"prefix" is not a valid expression, and can thus I don't see any
conflict here.



3.7.4 does not define how to interpret re-declaration of the same prefix:

bundle
  prefix ex1 <http://example.org/1/>
  prefix ex1 <http://example.org/2/>
  entity(ex1:fred)
 endBundle

(Personally I think they should not be allowed - Turtle will overwrite
sequentially - but nothing else in PROV-N depends on the sequence)


3.7.4 does neither not define that that prefixes and defaults are
looked up in the named bundle before the top level bundle.

bundle
  default <http://example.com/>
  entity(fred)
  bundle fred  // same fred
    default <http://www.example.org/>
    entity(fred) // Different fred!
  endBundle
endBundle


> Section 4
> A toplevel bundle, written _bundle decls exprs bundles endBundle_ in PROV-N, contains:

This 'written' thing is very confusing, as it is not written like
that. I would move up the bundle production rule first - use the same
style as for all the other productions. The following "Contains"
bullet points also only make sense after seeing the production rule:

> bundle    ::=    "bundle" (namespaceDeclarations)? (expression)* (namedBundle)* "endBundle"

The bullet points are confusing, because they talk about decls and
exprs vs namespaceDeclarations and expressions. Only one variable
name, please!


namespaceDeclarations are optional - both here and in the named
bundle. This means that this would be valid:

bundle
  entity(e1) // What is the default namespace??
  bundle b1
    // or what is it here?
    entity(e2)
  endBundle
endBundle


I found it confusing that the top level bundle and named bundle have
the same keyword. However I expect - although it is not stated
explicitly here - that you can't have a free-standing named bundle -
and that a PROV-N Document should have all expressions and named
bundled within a single top level bundle.


>  activity(a1, 2011-11-16T16:05:00, -,[prov:type="edit"])

Should be (pretty printed ;) )

 activity(a1, 2011-11-16T16:05:00, -, [prov:type="edit"])


> The following container
>  (..)
> This container

s/container/bundle/g




> 5. Media Type

We agreed some changes to this in the telcon 2012-07-05,
"text/provenance-notation" and ".provn".

> Published specification:
>   This specification.

In the actual submission I assume that this will rather refer to /tr/PROV-N/.


> may have the strings 'bundle' near the beginning

s/strings/string/


> Section B

I did not review section A or B in detail.

Some whitespace needed after 'Cheney' both for [PROV-XML] and [PROV-RDF]


-- 
Stian Soiland-Reyes, myGrid team
School of Computer Science
The University of Manchester

Received on Friday, 6 July 2012 14:03:32 UTC