- From: Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>
- Date: Fri, 6 Jul 2012 15:02:43 +0100
- To: Provenance Working Group <public-prov-wg@w3.org>
On Fri, Jun 29, 2012 at 1:23 PM, Provenance Working Group Issue Tracker <sysbot+tracker@w3.org> wrote: > PROV-ISSUE-438 (prov-n-post-f2f3-review ): Final review before last call vote [prov-n] > http://dvcs.w3.org/hg/prov/raw-file/default/model/releases/ED-prov-n-20120629/prov-n.html > > Question for reviewers: Can the document be published as Last Call working draft? Of my comments below, the following SHOULD be addressed before publishing as Last Call draft: * introduce "nonterminal" * fix syntax errors * memberOf -> hasMember * clarify/remote % encoding in local part of QUALIFIED_NAME * clarifications about namespace scoping/overwrite Given a minimum of the above are addressed, then yes, the document can be published as Last Call WD. Loads of details follow.. don't despair! =================================== In 1.1, the use of the term "nonterminal" is not introduced. This computer science term used for parsers might be obvious for anyone who is reading the document to produce a PROV-N parser, but not for "Readers of the [PROV-DM] and of [PROV-CONSTRAINTS] documents". I see the term is used several times later in the document, and so should be introduced. A sentence like "Those readers may find the _expression_ nonterminal a useful entry point into the grammar" should be reformulated to something like "Those readers may find the definition _expression_ a useful entry point into the grammar". > 1.3 The PROV namespace (see Section 4.7.1) The section number should be 3.7.4 > 2.1 > All PROV data model relations involve two primary elements, the subject and the object, in this order Not true. Examples: > activity(a1) > entity(e1) So those might not be 'relations' - but that is not clear from the above. Rephrase to something like "Many PROV-N predicates denote a relationship between two primary elements, the _subject_ and the _object_, in this order". I would include an entity() example in this section, to show that not all have subject and object. I would do this before Example 1 and the above sentence. > Example 0: > In this example, _e1_ is asserted to be a PROV _entity_. > entity(e1) > Example 2 > In the following expressions, the optional activity a along with the generation and usage identifiers g2 and u1: > wasDerivedFrom(e2, e1, a, g2, u1) The sentence does not parse. I suggest "The following expression expands the above derivation relation by providing additional elements: the activity _a_, the generation _g2_ and the usage _u1_. > Extended Backus-Naur Form (EBNF) notation This is again well known in CS - but a hyperlink would be useful. Now I had to go to Wikipedia. However the mini-intro right below is useful. "The below is a summary of " I would be more precise, that "The PROV-N grammar is specified in this document using the .." - as knowledge of EBNF is not a requirement for using PROV-N - just for understanding the definitions in this document. > 2.2 EBNF Grammar > .. > Each expression non-terminal expression i.e., entityExpression, activityExpression etc., corresponds to expression non-terminal expression?? Reformulate to something like "Each of the symbols included in _expression_ above, e.g. entityExpression, corresponds to one element (e.g. entity) in the PROV data model." Why hyperlinks on 'expression' here in this section? It goes 1 line up or down, it's very confusing. Use <code>expression</code> or similar instead. > such that the text for an element matches the corresponding expression production of the grammar. What does this mean? That the text of the document corresponds to what you get by parsing the text? That does not tell me anything.. Is the bundle construct now required for the PROV-N document? If so, almost none of the examples in this document or in PROV-DM are valid PROV-N documents. That might well be the case, as they want to reuse namespaces, etc - but that should then be stated explicitly, kind of like > 2.3 Optional attributes Should add something like "The interpretation of an optional attribute that is not provided is given by Section 4.1 in PROV-CONSTRAINTS" somewhere. > 2.4 Identifiers and attributes > Most expressions defined in the grammar include the use of two terms: an identifier and a set of attribute-value pairs, delimited by square brackets This reads like: > identifier [] attribute-value pairs Which is not what is meant! Something more like: "Almost all expressions defined in this grammar may also include an identifier for the expression. Most expression can also include a set of attribute-value pairs, grouped within square brackets." "By convention, optional identifiers are separated using a semi-colon ';'." By convention.. but I might do something else? Just say that they ARE separated like that: "Optional identifiers are separated using a semi-colon ';', but where the identifiers are required, the regular comma ',' is used." Or do you say I need to also allow parsing with ',' for the permutations that are non-ambigious? The defition for optionalIdentifier says no. (good!) > Example 7 > .. > The third example shows that one can optionally indicate the missing identifier using the - marker. Add "This is equivalent to the first expression". > Example 8 > The first and second activities have no attributes. Add ", and are equivalent." > 2.5 Comments > ... such cooments ... "such comments" > 3.1.3 Generation > generationExpression ::= "wasGeneratedBy" "(" optionalIdentifier eIdentifier ( "," aIdentifierOrMarker "," timeOrMarker )? optionalAttributeValuePairs ")" This does not allow aIdentifierOrMarker without timeOrMarker - is this intentional? The following examples are not valid: > wasGeneratedBy(e2, a1, tr:WD-prov-dm-20111215) > wasGeneratedBy(ex:g1; e, a, tr:WD-prov-dm-20111215) because tr:WD-prov-dm-20111215 is not a valid timeOrMarker. I would also recommend that these show-cases use the same names, so rather than say intermixing a1 and ex:edit1 - just always use a1. > Even though the production generationExpression allows for expressions wasGeneratedBy(e2, -, -) and wasGeneratedBy(-; e2, -, -), these expressions are not valid in PROV-N, since at least one of id, activity, time, and attributes must be present. This is suboptimal, could it not be worked into the rules in time? Could we rather say this follows from PROV-CONSTRAINTS and PROV-DM? "Since" does not make sense here - where is it stated that they MUST be present? Here. So try rather: "... are not valid in PROV-N; at least one of... MUST be present" Equivalent comment for this kind of paragraph for usage, startExpression, etc. > 3.1.4 Usage usageExpression ::= "used" "(" optionalIdentifier aIdentifier "," ( "," eIdentifierOrMarker "," timeOrMarker )? optionalAttributeValuePairs ")" This wrongly requires a double comma before eIdentifierOrMarker-timeormarker block: used(a1,,e,-) Should be: > usageExpression ::= "used" "(" optionalIdentifier aIdentifier ( "," eIdentifierOrMarker "," timeOrMarker )? optionalAttributeValuePairs ")" This rule also requires timeOrMarker if eIdentifierOrMarker is present - is this intentional? I won't ask this again for the following forms - but assume that the style is "no optionals or all optionals" - in which case this should be clarified in section 2.3 - the position argument hints that the old style of allowing to chop of trailing -'s is allowed. > Even though the production usageExpression allows for expressions used(a2, -, -) and used(-; e2, -, -), should be: used(-; a1, -, -) > 3.1.6 Start > Note: Even though the production startExpression allows for expressions wasStartedBy(e2, -, -) and wasStartedBy(-; e2, -, -), Should be: wasStartedBy(a1, -, -, -) wasStartedBy(-; a1, -, -, -) > 3.1.7 End > wasEndedBy(e; ex:act2) > wasEndedBy(e; ex:act2, ex:trigger, -, 2011-11-16T16:00:00) I would not use 'e' as id here - that is confusing as 'e' is used for entity earlier. Try 'end'. > Note:Even though the production endExpression allows for expressions wasEndedBy(e2, -, -) and wasEndedBy(-; e2, -, -), Should be: wasEndedBy(a1, -, -, -) wasEndedBy(-; a1, -, -, -), 3.2.1 Derivation The following examples are invalid: wasDerivedFrom(d, e2, e1, a, g2, u1, [ex:comment="a righteous derivation"]) wasDerivedFrom(d, e2, e1, a, g2, u1) wasDerivedFrom(-, e2, e1, a, g2, u1) As they should use ; to terminate optionalIdentifier rather than , 3.3.2 -> 3.2.4 (Revision, Quotation, Primary Source) An introduction on the top of these should be provided, to note that the regular syntax for wasDerivedFrom is used, with the additional prov:type attribute given to specify the type of derivation. As it stands it can read as if only the syntax used in the example is valid. Make these subsections of 3.2.1 instead? 3.2.4 > [ prove:type='prov:PrimarySource' ]) Should be: [ prov:type='prov:PrimarySource' ]) The use of 'single quotes' for the prov:type as opposed to "double quote" for the other attribute is confusing. By reading deeply into section 3.7.3 I see this is a short hand for "ex:value" %% prov:QUALIFIED_NAME. Could a link be provided for this in 3.2.x? prov:type is not explained anywhere in PROV-N. Almost all examples using it with custom attributes use "double quotes", for instance: > actedOnBehalfOf(del1; ag2, ag1, a, [prov:type="contract"]) > agent(ag4, [ prov:type="prov:Person", ex:name="David" ]) I assumed (wrongly?) that prov:type in a way has range prov:QUALIFIED_NAME union xsd:anyURI - but I'm not sure as "a Literal may be an IRI-typed string (with datatype xsd:anyURI); such IRI has no specific interpretation in the context of PROV.". So does prov:type="contract" simply mean "contract" out of thin air rather than bound to any namespace? We would struggle to translate this to PROV-O where prov:type maps to rdf:type. If this is the case, then I expect all prov:type="prov.*" should be with single quotes instead. http://dvcs.w3.org/hg/prov/raw-file/default/model/releases/ED-prov-dm-20120628/prov-dm.html#term-attribute-type says "PROV-DM is agnostic about the representation of types, and only states that the value associated with a prov:type attribute must be a PROV-DM Value." - but does not distinguish between 'single', "double" quote and "qualified" %% prov:QUALIFIED_NAME notation. > 3.4.1 Bundle declaration > Example 26 > bundle ex:author-view > agent(ex:Paolo, [ prov:type='prov:Person' ]) > agent(ex:Simon, [ prov:type='prov:Person' ]) > ... > endBundle Nitpicking, but I would use " // ... " as "..." is not valid PROV-N. 3.4.1 should also provide a forward link to 4. Toplevel Bundle, because otherwise the example does not make sense (prefix ex: is not declared) > 3.5.3 Mention > mentionExpression ::= "mentionOf" "(" identifier "," identifier "," bIdentifier ")" Should be: mentionExpression ::= "mentionOf" "(" eIdentifier "," eIdentifier "," bIdentifier ")" 3.6.1 Membership > memberOf(mId, c, {e1, e2, e3}, []) // Collection membership should be: > memberOf(mId; c, {e1, e2, e3}, []) // Collection membership This reads the wrong way - it says c is a member of {e1, e2, e3} by the previously mentioned subject-relation-object rule. Propose renaming 'memberOf' to 'hasMembers', 'hadMembers' or 'members'. (I do not propose to make the entity or entity set as first argument) Opposite of all other example, this example does not start with a 'full' expression, as the 'complete' argument is missing. Also the attributes are empty. I would change to: memberOf(mId; c, {e1, e2, e3}, true, [dct:description="All of them"]) // Collection membership > // default "complete" flag is false This is a PROV-DM or PROV-CONSTRAINT matter and should not be described here - the remaining relations don't explain meaning of missing optionals. > memberOf(c3, ,[]) > memberOf(c3, ,true, []) Invalid syntax, as entitySet ::= "{" (eIdentifier)* "}" Change to: memberOf(c3, {} ,[]) memberOf(c3, {} ,true, []) First one of these should be invalid by the same reason as for usage, and thus should not be listed. (It would need to include some attributes in [] to be valid) 3.7.1 Identifier > dIdentifier ::= identifier This is not used anywhere and can be removed. (Dictionary?) > A qualified name's prefix is optional. If a prefix occurs in a qualified name, it refers to a namespace declared in a namespace declaration. I would change it to "it MUST refer to a namespace as declared in a.." - otherwise this would be valid: bundle prefix ex <http://example.com/> entity(fred:e1) endBundle without declaring the prefix "fred". > 3.7.2 Attribute > The reserved attributes in the PROV namespace are the following. add "Their meaning is explained by [PROV-DM]". <INT_LITERAL> ::= ("-")? (DIGIT)+ This *might* be in conflict with QUALIFIED_NAME which allows local names like "1337" (without quotes) - but I have not checked so thoroughly. It is at least confusing. It means that you can do: entity(1337, [1337=1337]) where the first and second 1337 is the relative IRI reference <1337> based on the default namespace, while the third 1337 is the number "1337" %% xsd:int. However I don't think there is anywhere that allows both a literal and an identifier in the same position, so we MIGHT be safe. Parser heads converge here. > 3.7.3 Literal > Note:The productions for prov:QUALIFIED_NAME and INT_LITERAL are conflicting. In the context of a literal, a parser should give precedence to the production for INT_LITERAL. Again, I don't see this conflict, as the former requires 'single' quotes and the latter does not allow that. entity(e1, [ex:value='1337']) // equivalent to entity(e1, [ex:value="1337" %% prov:QUALIFIED_NAME]) entity(e1, [ex:value=1337]) // equivalent to entity(e1, [ex:value="1337" %% xsd:int]) It would be good if a (separate) example in 3.7.3 showed all of these equivalences. > 3.7.3.1 Reserved Type Values > The reserved type values in the PROV namespace are the following Add "Their meaning is defined by [PROV-DM]. 3.7.4 Namespace > namespaceDeclaration ::= "prefix" QUALIFIED_NAME namespace > <QUALIFIED_NAME> ::= ( PN_PREFIX ":" )? PN_LOCAL > | PN_PREFIX ":" This would allow: bundle prefix fred:soup <http://example.com/> endBundle but all the examples define prefix with only the lefthand side of a QUALIFIED_NAME - ie. PN_PREFIX. So it should be: namespaceDeclaration ::= "prefix" PN_PREFIX namespace To match all valid prefixes in QUALIFIED_NAME. Note that QUALIFIED_NAME allows the empty prefix, ie ":ex1" (which is different from "ex1"). ((And thus also ":")) However this would be difficult to declare ":" in the current prefix rule, because unlike say in Sparql and Turtle, the prefix is not declared with the trailing ":". One would have to say: bundle prefix <http://example.com/> // Two spaces before < ! endBundle I suggest to change the prefix declaration to include the trailing : - ie: namespaceDeclaration ::= "prefix" QUALIFIED_NAME ":" namespace bundle prefix : <http://example.com/> prefix ex: <http://www.example.org/> endBundle > Instead, they can be %-escaped or incorporated in the IRI denoted by a prefix. > <PERCENT> ::= "%" HEX HEX It is not defined what is the meaning of this escaping. What do the HEX represent? If you mean "as in section 3.1. Mapping of IRIs to URIs of [RFC3987]" - then include this. Should be "in corporated in *a* namespace URI denoted by a prefix" - as presumably that specific namespace binding does not exist yet if you needed special characters in the local name. I'm still not sure what this mean. Assume you have: bundle prefix s11: <http://s11.no/> // ... endBundle And in there, you want to refer to the entity for <http://s11.no/?fred=soup>. Then presumably I could do: entity(s11:?fred%3Dsoup) but only if we intend for %xx to be expanded before making the URI, rather than, as suggested by PROV-N, simply be valid parts of the URI by concatenation. If we say they are passed on directly - then I don't see a way to represent the above, as %3d escape for = is valid argument in query parameters - such as in ?q=1%2B1%3D2 (1+1=2) - and thus %3Dsoup not understood as the original =soup. Note - I am not arguing for double-escaping here, as I think that would become very confusing - I am just wondering why % was added here in the first place, if still only a selection of possibly local names (given a general prefix) is valid - I think it just adds potential complexity. Of course the reason why this happens is because we don't have a <URIREF> syntax allowed together with QUALIFIED_NAME - so the Sparql analogy breaks down. > entity(ex:1234) // corresponds to IRI http://example.org/2/1234 should be entity(ex:1234) // corresponds to IRI http://example.org/1/1234 I would add: entity(c/) // corresponds to IRI http://example.org/2/c/ entity(ex:/) // corresponds to IRI http://example.org/1// // Strict concatenation What does the following mean? > Note:The productions for qualifiedName and prefix are conflicting. In the context of a namespaceDeclaration, a parser should give precedence to the production for prefix. With 'prefix' here - do you mean PN_PREFIX or the keyword 'prefix' within > namespaceDeclaration ::= "prefix" QUALIFIED_NAME namespace ? I still don't see any conflict. namespaceDeclaration requires the word 'prefix', and so the following should be valid: bundle prefix prefix <prefix> endBundle ie. binding the prefix "prefix:" to the relative IRI reference <prefix>. namespaceDeclarations is optional in both bundle and namedBundle - but "prefix" is not a valid expression, and can thus I don't see any conflict here. 3.7.4 does not define how to interpret re-declaration of the same prefix: bundle prefix ex1 <http://example.org/1/> prefix ex1 <http://example.org/2/> entity(ex1:fred) endBundle (Personally I think they should not be allowed - Turtle will overwrite sequentially - but nothing else in PROV-N depends on the sequence) 3.7.4 does neither not define that that prefixes and defaults are looked up in the named bundle before the top level bundle. bundle default <http://example.com/> entity(fred) bundle fred // same fred default <http://www.example.org/> entity(fred) // Different fred! endBundle endBundle > Section 4 > A toplevel bundle, written _bundle decls exprs bundles endBundle_ in PROV-N, contains: This 'written' thing is very confusing, as it is not written like that. I would move up the bundle production rule first - use the same style as for all the other productions. The following "Contains" bullet points also only make sense after seeing the production rule: > bundle ::= "bundle" (namespaceDeclarations)? (expression)* (namedBundle)* "endBundle" The bullet points are confusing, because they talk about decls and exprs vs namespaceDeclarations and expressions. Only one variable name, please! namespaceDeclarations are optional - both here and in the named bundle. This means that this would be valid: bundle entity(e1) // What is the default namespace?? bundle b1 // or what is it here? entity(e2) endBundle endBundle I found it confusing that the top level bundle and named bundle have the same keyword. However I expect - although it is not stated explicitly here - that you can't have a free-standing named bundle - and that a PROV-N Document should have all expressions and named bundled within a single top level bundle. > activity(a1, 2011-11-16T16:05:00, -,[prov:type="edit"]) Should be (pretty printed ;) ) activity(a1, 2011-11-16T16:05:00, -, [prov:type="edit"]) > The following container > (..) > This container s/container/bundle/g > 5. Media Type We agreed some changes to this in the telcon 2012-07-05, "text/provenance-notation" and ".provn". > Published specification: > This specification. In the actual submission I assume that this will rather refer to /tr/PROV-N/. > may have the strings 'bundle' near the beginning s/strings/string/ > Section B I did not review section A or B in detail. Some whitespace needed after 'Cheney' both for [PROV-XML] and [PROV-RDF] -- Stian Soiland-Reyes, myGrid team School of Computer Science The University of Manchester
Received on Friday, 6 July 2012 14:03:32 UTC