SPARQL grammar review

Some comments on the SPARQL grammar as included in "SPARQL Query
Language for RDF", rev 1.327 (2005/04/29). Comments on the grammar are
based on the assumption that an LALR-grammars [1] are pretty much ideal.

[1] http://en.wikipedia.org/wiki/LALR_parser


* Rule [1] seems too liberal: it doesn't make clear which clauses are
   allowed for which query types. For example, it allows a LIMIT-clause
   without a WHERE-clause and an ORDER BY-clause in combination with an
   AKS or DESCRIBE query. Also, it no longer allows ASK-queries without
   WHERE-clauses in the way that they are described in the text.

       [1] Query ::= Prolog
                     ( SelectClause | ConstructClause | DescribeClause | 
AskClause )
                     DatasetClause
                     ( WhereClause )?
                     ( OrderClause )?
                     ( LimitClause )?
                     ( OffsetClause )?

   It would be clearer to split this single rule into multiple rules, one
   for each query type, e.g.:

       Query ::= Prolog
                 ( SelectQuery | ConstructQuery | DescribeQuery | AskQuery )

       SelectQuery    ::= SelectClause DatasetClause
                          (
                            WhereClause
                            (OrderClause)?
                            (LimitClause)?
                            (OffsetClause)?
                          )?

       ConstructQuery ::= ConstructClause DatasetClause
                          (
                            WhereClause
                            (OrderClause)?
                            (LimitClause)?
                            (OffsetClause)?
                          )?

       DescribeQuery  ::= DescribeClause DatasetClause (WhereClause)?

       AskQuery       ::= AskClause DatasetClause GroupGraphPattern


* Rules [5] and [6] have unnecessary large lookaheads.

       [5] SelectClause   ::= 'SELECT' ( 'DISTINCT' )? ( Var )+
                            | 'SELECT' ( 'DISTINCT' )? '*'

       [6] DescribeClause ::= 'DESCRIBE' ( VarOrURI )+
                            | 'DESCRIBE' '*'

   LALR-alternatives:

       SelectClause      ::= 'SELECT' ('DISTINCT')? ( (Var)+ | '*' )

       DescribeClause    ::= 'DESCRIBE' ( (VarOrURI)+ | '*' )

* Rules [9] - [11] are not parseable without a lookahead > 1.

       [9]  DatasetClause      ::= ( DefaultGraphClause )? ( 
NamedGraphClause )*
       [10] DefaultGraphClause ::= 'FROM' SourceSelector
       [11] NamedGraphClause   ::= 'FROM' 'NAMED' SourceSelector

   So far, I have been unable come up with LALR-compatible alternative
   for these rules.

* I find rules [19] - [27] very hard to read and understand. Also, rule
   [24] requires an arbitraraly large lookahead for the choice between
   UnionGraphPattern and GroupGraphPattern.

       [13] WhereClause            ::= 'WHERE' QueryPattern
                                     | GroupGraphPattern

       [19] QueryPattern           ::= GraphPatternNotTriples ( '.' )? 
QueryPatternTail
                                     | Triples ( '.' )? ( 
GraphPatternNotTriples ( '.' )? QueryPatternTail )?
       [20] QueryPatternTail       ::= ( Triples ( '.' )? )? ( 
GraphPatternNotTriples ( '.' )? QueryPatternTail )?
       [21] GroupGraphPattern      ::= '{' GraphPatternList
       [22] GraphPatternList       ::= ( Triples ( '.' )? )? ( 
GraphPatternNotTriples ( '.' )? GraphPatternList | '}' )
       [23] GraphPattern           ::= Triples
                                     | GraphPatternNotTriples
       [24] GraphPatternNotTriples ::= OptionalGraphPattern
                                     | UnionGraphPattern
                                     | GroupGraphPattern
                                     | GraphGraphPattern
                                     | Constraint
       [25] OptionalGraphPattern   ::= 'OPTIONAL' GroupGraphPattern
       [26] GraphGraphPattern      ::= 'GRAPH' VarOrBNodeOrURI 
GroupGraphPattern
       [27] UnionGraphPattern      ::= GroupGraphPattern ( 'UNION' 
GroupGraphPattern )*

   I attempted to rewrite these rules and believe the following rules
   are equivalent (or at least come very close):

       WhereClause             ::= 'WHERE' GraphPattern
                                 | GroupGraphPattern

       GraphPattern            ::= GraphPatternElem ('.')? (GraphPattern)?

       GraphPatternElem        ::= Triples
                                 | 'OPTIONAL' GroupGraphPattern
                                 | 'GRAPH' VarOrBNodeOrURI GroupGraphPattern
                                 | GroupGraphPattern ( 'UNION' 
GroupGraphPattern )*
                                 | Constraint

       GroupGraphPattern       ::= '{' GraphPattern '}'

* I find the definition of rules [32], [34] and [36] with their
   completely optional productions a bit awkward.

       [31] Triples1              ::= VarOrTerm PropertyListNotEmpty
                                    | TriplesNode PropertyList
       [32] PropertyList          ::= ( PropertyListNotEmpty )?
       [33] PropertyListNotEmpty  ::= Verb ObjectList PropertyListTail
       [34] PropertyListTail      ::= ( ';' PropertyList )?
       [35] ObjectList            ::= Object ObjectTail
       [36] ObjectTail            ::= ( ',' ObjectList )?

       [40] BlankNodePropertyList ::= '[' PropertyList ']'

   IMHO, moving the optionality or even the entire production to the
   enclosing rules improves the readability of the grammar:

       Triples1             ::= VarOrTerm PropertyList
                              | TriplesNode PropertyList?

       PropertyList         ::= Verb ObjectList ( ';' PropertyList? )?

       ObjectList           ::= Object ( ',' Object )*

       BlankNodePropertyList ::= '[' PropertyList? ']'


Hope you can include these suggestions in the gramamr,

Arjohn

-- 
arjohn.kampman@aduna.biz
Aduna BV - http://aduna.biz/
Prinses Julianaplein 14-b, 3817 CS Amersfoort, The Netherlands
tel. +31-(0)33-4659987

Received on Monday, 2 May 2005 14:01:56 UTC