Re: SPARQL grammar W3C Working Draft 24 July 2012 - Feedback / questions to some rule definitions from Andy Seaborne on 2012-09-05 (public-sparql-dev@w3.org from July to September 2012)

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Wed, 05 Sep 2012 10:58:29 +0100
To: public-sparql-dev@w3.org
Message-ID: <50472245.6090205@epimorphics.com>
Some personal comments: this is not a formal working group response, but 
as it was sent to the SPARQL discussion list, I can try to provide some 
background.

On 02/09/12 19:49, Juergen Pfundt wrote:
> Hello,
>
> as I did not dive into the protocols and public archives, chances 
> might be good that the questions and remarks in this mail have already 
> been explained or resolved. If so, sorry for the inconvenience.
>
>
> The following excerpt of the W3C Working Draft from 24th of July 2012 
> shows the definition of rule [83] PropertyListPathNotEmpty of the 
> SPARQL 1.1 Query Language:
>
> [83]  PropertyListPathNotEmpty ::= ( VerbPath | VerbSimple ) ObjectListPath
>       ( ';' ( ( VerbPath | VerbSimple ) ObjectList )? )*
>
>
> My understanding is that the zero or more repetitions of
> ( ';' ( ( VerbPath | VerbSimple ) ObjectList )? )*
> in this definition allow queries as shown below:
>
> SELECT ?book ?title ?price
> {
>    ?book dc:title ?title ; ; ; ;
>          ns:price ?price ; ; .
> }

That is correct - it is true in SPARQL 1.0 and Turtle (as published by 
RDF-WG at last call) as well.

The form

    :s :p ?o ;
       :q ?z ;
    .

(i.e. trailing ;  followed by DOT) is not uncommon.

The grammar could have been

    ( ';' ( ( VerbPath | VerbSimple ) ObjectList ) )* ( ';' )?

but that is ambiguous to an LL(1) parser - it can't tell which part of 
the rule to enter when seeing ';'.
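
As a rough illustration of why the published form is easy for a 
one-token-lookahead parser, here is a minimal Python sketch.  It is not 
taken from the spec or from any real SPARQL parser: the function name 
parse_property_list is hypothetical, tokens are assumed to be plain 
strings, and each verb and object is assumed to be a single token.

```python
def parse_property_list(tokens):
    """Parse  Verb Object ( ';' ( Verb Object )? )*  with one token of lookahead.

    Simplifying assumptions: tokens are strings, and a verb or an
    object is always exactly one token.
    """
    pos = 0
    pairs = []

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def verb_object():
        nonlocal pos
        pairs.append((tokens[pos], tokens[pos + 1]))
        pos += 2

    verb_object()
    while peek() == ";":
        pos += 1  # consume ';' unconditionally: there is only one way in
        # The optional part is decided by the NEXT token alone.
        if peek() not in (";", ".", None):
            verb_object()
    return pairs, pos


tokens = ["dc:title", "?title", ";", ";", ";", ";",
          "ns:price", "?price", ";", ";", "."]
pairs, stopped_at = parse_property_list(tokens)
print(pairs, tokens[stopped_at])
```

With the alternative form, the parser would have to decide on seeing ';' 
whether it is inside the repetition or at the optional trailing ';', 
which a single token of lookahead cannot settle.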

>
> In case my first assumption is confirmed, should rule [83] therefore 
> be rewritten as
> [83]  PropertyListPathNotEmpty ::= ( VerbPath | VerbSimple ) ObjectListPath
>       ( ';' ( VerbPath | VerbSimple ) ObjectList )*
>
>
> Chapter 19, Status of this document, in 
> http://www.w3.org/TR/2012/WD-sparql11-query-20120724 lists the 
> non-editorial changes since the last publication. One of the changed 
> items refers to PN_LOCAL tokens:
> - Local part of prefix names can now include ":", in line with Turtle 
> standardization by the RDF Working Group.
>
> [169]  PN_LOCAL      ::=  ( PN_CHARS_U | ':' | [0-9] | PLX )
>                           ( ( PN_CHARS | '.' | ':' | PLX )*
>                             ( PN_CHARS | ':' | PLX ) )?
> [170]  PLX           ::=  PERCENT | PN_LOCAL_ESC
> [171]  PERCENT       ::=  '%' HEX HEX
> [172]  HEX           ::=  [0-9] | [A-F] | [a-f]
> [173]  PN_LOCAL_ESC  ::=  '\' ( '_' | '~' | '.' | '-' | '!' | '$' | '&'
>                           | "'" | '(' | ')' | '*' | '+' | ',' | ';' | '='
>                           | '/' | '?' | '#' | '@' | '%' )
>
>
> Chapter 19 lists at risk features:
> - Allow certain character escape sequences in the local part of 
> prefixed names. These are the non-alphanumeric characters allowed in 
> an IRI path. The characters are ~.-!$&'()*+,;=:/?#@%_
>
> Does that mean, that ':' will be removed from PN_LOCAL and shifted 
> into PN_LOCAL_ESC ?

No -- PN_LOCAL_ESC are characters that follow a '\' and ':' is used 
directly.

In order to be as closely aligned with Turtle as possible, the ':' can be 
used directly, unescaped, in the local part of a prefixed name.  It can't 
be escaped (in a previous draft (Jan 2012), it had to be escaped but Turtle 
changed and SPARQL followed).
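
To make the distinction concrete, here is a hedged Python sketch of rules 
[169] to [173] as a regular expression.  It is a simplification, not the 
spec's definition: PN_CHARS_U and PN_CHARS are assumed to be ASCII-only 
here (the real rules include many Unicode ranges), and the helper name 
is_pn_local is hypothetical.

```python
import re

# ASCII-only approximations (assumption; the spec allows far more characters).
PN_CHARS_U = r"[A-Za-z_]"
PN_CHARS = r"[A-Za-z0-9_\-]"

# PLX ::= PERCENT | PN_LOCAL_ESC.  Note that ':' is NOT in the escape list.
PLX = r"(?:%[0-9A-Fa-f]{2}|\\[_~.\-!$&'()*+,;=/?#@%])"

PN_LOCAL = re.compile(
    rf"(?:{PN_CHARS_U}|:|[0-9]|{PLX})"
    rf"(?:(?:{PN_CHARS}|\.|:|{PLX})*(?:{PN_CHARS}|:|{PLX}))?"
)


def is_pn_local(text):
    """Return True if text matches the simplified PN_LOCAL pattern."""
    return PN_LOCAL.fullmatch(text) is not None


print(is_pn_local("audio:title"))    # ':' used directly
print(is_pn_local(r"audio\:title"))  # '\:' is not a PN_LOCAL_ESC sequence
print(is_pn_local(r"a\.b"))          # '\.' is a PN_LOCAL_ESC sequence
```

Under these assumptions the first and third calls match and the second 
does not, which mirrors the rule that ':' is written directly and cannot 
be escaped.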


> The test query from the dawg test suite will therefore fail with the 
> current definition of the SPARQL syntax:
>
> PREFIX og: <http://ogp.me/ns#>
> SELECT *
> WHERE {
>    ?page og:audio\:title ?title
> }

Test syntax-query/qname-escape-01.rq (manifest entry :test_52) is not in 
the test suite - if you look in the manifest you'll see it's commented out

    # :test_52    # obsoleted by decision to allow colons in prefixname
    #             # local parts, but not allow them to be escaped

(the WG should probably remove the entry further down, which is confusing)

>
> The following annotations are clearly subjective. They affect the 
> readability of the SPARQL grammar only.

Writing the rules one way round or the other can make it easier to attach 
the query code generator, by picking out meaningful code paths.

>
> Rule [41]
>
> [41]  Modify ::= ( 'WITH' iri )? ( DeleteClause InsertClause? | InsertClause )
>       UsingClause* 'WHERE' GroupGraphPattern
>
>
>
> should be equivalent when written as
>
> [41]  Modify ::= ( 'WITH' iri )? DeleteClause? InsertClause?
>       UsingClause* 'WHERE' GroupGraphPattern
>
>
>
> Rule [44]
>
> [44]  UsingClause ::= 'USING' ( iri | 'NAMED' iri )
>
>
> in my opinion is better understood when converted to:
>
> [44]  UsingClause ::= 'USING' 'NAMED'? iri
>
>
This is a good example - the parser can attach code to the path USING 
NAMED iri separately from USING iri.  As you say, it's stylistic.
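
A small Python sketch of what "attaching code to each path" can look 
like.  This is only an illustration under stated assumptions: the 
function name parse_using_clause and the result tags are hypothetical, 
and the clause is assumed to arrive as a short list of string tokens.

```python
def parse_using_clause(tokens):
    """UsingClause ::= 'USING' ( iri | 'NAMED' iri )

    Each alternative of the rule gets its own branch, so each branch
    can build a different result without testing a flag afterwards.
    """
    if len(tokens) < 2 or tokens[0].upper() != "USING":
        raise SyntaxError("expected 'USING' followed by an IRI")

    if tokens[1].upper() == "NAMED":
        # Path: USING NAMED iri
        if len(tokens) != 3:
            raise SyntaxError("expected an IRI after 'USING NAMED'")
        return ("named_graph", tokens[2])

    # Path: USING iri
    if len(tokens) != 2:
        raise SyntaxError("unexpected tokens after the IRI")
    return ("default_graph", tokens[1])


print(parse_using_clause(["USING", "<http://example/g1>"]))
print(parse_using_clause(["USING", "NAMED", "<http://example/g2>"]))
```

With the 'USING' 'NAMED'? iri form, the same information is still 
recoverable, but the action after iri has to check whether NAMED was 
seen.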

> The same pattern as in rule [44] goes for rule [92]
>
> [92]  PathEltOrInverse ::= PathElt | '^' PathElt
>
>
> which can be simplified to
>
> [92]  PathEltOrInverse ::= '^'? PathElt
>
>
> Rule [94]
>
> [94]  PathPrimary ::= iri | 'a' | '!' PathNegatedPropertySet
>       | '(' Path ')' | 'DISTINCT' '(' Path ')'
>
>
> can be shortened to
>
> [94]  PathPrimary ::= iri | 'a' | '!' PathNegatedPropertySet
>       | 'DISTINCT'? '(' Path ')'
>
>
> This pattern also applies to rule [96]
>
> [96]  PathOneInPropertySet ::= iri | 'a' | '^' ( iri | 'a' )
>
>
> which results in
>
> [96]  PathOneInPropertySet ::= '^'? ( iri | 'a' )
>
>
> Regarding rule [64] my conjecture is with NIL defined as
> [161]  NIL ::= '(' WS* ')'
>
>
> and WS defined as
> [162]  WS ::= #x20 | #x9 | #xD | #xA
>
>
> and assuming that WS is skipped in the lexical part, rule [64]
>
> [64]  InlineDataFull ::= ( NIL | '(' Var* ')' )
>       '{' ( '(' DataBlockValue* ')' | NIL )* '}'
>
>
> might be rewritten as follows
>
> [64]  InlineDataFull ::= ( '(' Var* ')' )
>       '{' ( '(' DataBlockValue* ')' )* '}'
>
Tokenizers will generate a NIL token when seeing () because it is a 
token, so this has to be included in the parser rules.  It could have been

    ( NIL | '(' Var+ ')' )

NIL is called out as a token to deal with its use as rdf:nil in triple 
patterns.  It then contaminates the expression part of the grammar and 
other places.

It does mean

() .

is illegal at the grammar level while

(:a :b :c) .

is legal.  It may also make it easier to parse

:s :p () .

as rdf:nil.
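
The tokenizer behaviour can be sketched in Python as follows.  This is 
an illustration only, not how any particular SPARQL implementation 
works: the token names other than NIL and WS, the simplified NAME 
pattern, and the function name tokenize are assumptions made for the 
example.

```python
import re

# NIL is listed before LPAREN so "()" and "( )" become a single token.
TOKEN = re.compile(r"""
    (?P<NIL>\([\x20\x09\x0D\x0A]*\))       # '(' WS* ')'
  | (?P<LPAREN>\()
  | (?P<RPAREN>\))
  | (?P<DOT>\.)
  | (?P<NAME>[?:A-Za-z][A-Za-z0-9_:]*)     # crude stand-in for vars and names
  | (?P<WS>[\x20\x09\x0D\x0A]+)
""", re.VERBOSE)


def tokenize(text):
    """Split text into (kind, lexeme) pairs, dropping whitespace tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        if match.lastgroup != "WS":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize(":s :p () ."))
print(tokenize("(:a :b :c) ."))
```

Because the parser receives NIL rather than a '(' followed by a ')', 
any rule that wants to accept an empty pair of parentheses has to name 
NIL explicitly, which is why it appears in rule [64].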

>
> Kind regards
> Juergen Pfundt
>
     Thank you for the comments,
     Andy
Received on Wednesday, 5 September 2012 09:59:05 UTC