Re: SPARQL grammar review

Arjohn,

Thanks for thr comments: I have taken some of your suggestions and I am 
currently incorporating and testing them.  They haven't appeared in the editors' 
draft yet (i tend to batch grammar changes up).

The DAWG test area has syntax tests.

Comments inline.

	Andy

Arjohn Kampman wrote:
> Some comments on the SPARQL grammar as included in "SPARQL Query
> Language for RDF", rev 1.327 (2005/04/29). Comments on the grammar are
> based on the assumption that an LALR-grammars [1] are pretty much ideal.
> 
> [1] http://en.wikipedia.org/wiki/LALR_parser

The grammar is developed using LL (javacc to be exact).  It's LL(1) with some 
local, fixed lookahead.  I run the javacc LR check on the grammar as well.

There is always a tradeoff of clarity and extra states to remove lookahead needs.

> 
> 
> * Rule [1] seems too liberal: it doesn't make clear which clauses are
>    allowed for which query types. For example, it allows a LIMIT-clause
>    without a WHERE-clause and an ORDER BY-clause in combination with an
>    AKS or DESCRIBE query. Also, it no longer allows ASK-queries without
>    WHERE-clauses in the way that they are described in the text.
> 
>        [1] Query ::= Prolog
>                      ( SelectClause | ConstructClause | DescribeClause | 
> AskClause )
>                      DatasetClause
>                      ( WhereClause )?
>                      ( OrderClause )?
>                      ( LimitClause )?
>                      ( OffsetClause )?
> 
>    It would be clearer to split this single rule into multiple rules, one
>    for each query type, e.g.:
> 
>        Query ::= Prolog
>                  ( SelectQuery | ConstructQuery | DescribeQuery | AskQuery )
> 
>        SelectQuery    ::= SelectClause DatasetClause
>                           (
>                             WhereClause
>                             (OrderClause)?
>                             (LimitClause)?
>                             (OffsetClause)?
>                           )?
> 
>        ConstructQuery ::= ConstructClause DatasetClause
>                           (
>                             WhereClause
>                             (OrderClause)?
>                             (LimitClause)?
>                             (OffsetClause)?
>                           )?
> 
>        DescribeQuery  ::= DescribeClause DatasetClause (WhereClause)?
> 
>        AskQuery       ::= AskClause DatasetClause GroupGraphPattern

I have included this style.

> 
> 
> * Rules [5] and [6] have unnecessary large lookaheads.
> 
>        [5] SelectClause   ::= 'SELECT' ( 'DISTINCT' )? ( Var )+
>                             | 'SELECT' ( 'DISTINCT' )? '*'
> 
>        [6] DescribeClause ::= 'DESCRIBE' ( VarOrURI )+
>                             | 'DESCRIBE' '*'
> 
>    LALR-alternatives:
> 
>        SelectClause      ::= 'SELECT' ('DISTINCT')? ( (Var)+ | '*' )
> 
>        DescribeClause    ::= 'DESCRIBE' ( (VarOrURI)+ | '*' )

I have included this.

> 
> * Rules [9] - [11] are not parseable without a lookahead > 1.
> 
>        [9]  DatasetClause      ::= ( DefaultGraphClause )? ( 
> NamedGraphClause )*
>        [10] DefaultGraphClause ::= 'FROM' SourceSelector
>        [11] NamedGraphClause   ::= 'FROM' 'NAMED' SourceSelector
> 
>    So far, I have been unable come up with LALR-compatible alternative
>    for these rules.

Nor have I (at least for anything that I consider clear).
The difficulty is making DefaultGraphClause come first.

> 
> * I find rules [19] - [27] very hard to read and understand. Also, rule
>    [24] requires an arbitraraly large lookahead for the choice between
>    UnionGraphPattern and GroupGraphPattern.

The lookahead for UNION isn't arbitary - 3 is enough.
Depending on the parser technique, this can rewritten as a group-or-union 
intermediate state but I felt the grammar would be much less clear.

> 
>        [13] WhereClause            ::= 'WHERE' QueryPattern
>                                      | GroupGraphPattern

QueryPattern has been removed.  That simplifies things.

> 
>        [19] QueryPattern           ::= GraphPatternNotTriples ( '.' )? 
> QueryPatternTail
>                                      | Triples ( '.' )? ( 
> GraphPatternNotTriples ( '.' )? QueryPatternTail )?
>        [20] QueryPatternTail       ::= ( Triples ( '.' )? )? ( 
> GraphPatternNotTriples ( '.' )? QueryPatternTail )?
>        [21] GroupGraphPattern      ::= '{' GraphPatternList
>        [22] GraphPatternList       ::= ( Triples ( '.' )? )? ( 
> GraphPatternNotTriples ( '.' )? GraphPatternList | '}' )
>        [23] GraphPattern           ::= Triples
>                                      | GraphPatternNotTriples
>        [24] GraphPatternNotTriples ::= OptionalGraphPattern
>                                      | UnionGraphPattern
>                                      | GroupGraphPattern
>                                      | GraphGraphPattern
>                                      | Constraint
>        [25] OptionalGraphPattern   ::= 'OPTIONAL' GroupGraphPattern
>        [26] GraphGraphPattern      ::= 'GRAPH' VarOrBNodeOrURI 
> GroupGraphPattern
>        [27] UnionGraphPattern      ::= GroupGraphPattern ( 'UNION' 
> GroupGraphPattern )*
> 
>    I attempted to rewrite these rules and believe the following rules
>    are equivalent (or at least come very close):
> 
>        WhereClause             ::= 'WHERE' GraphPattern
>                                  | GroupGraphPattern
> 
>        GraphPattern            ::= GraphPatternElem ('.')? (GraphPattern)?
> 
>        GraphPatternElem        ::= Triples
>                                  | 'OPTIONAL' GroupGraphPattern
>                                  | 'GRAPH' VarOrBNodeOrURI GroupGraphPattern
>                                  | GroupGraphPattern ( 'UNION' 
> GroupGraphPattern )*
>                                  | Constraint
> 
>        GroupGraphPattern       ::= '{' GraphPattern '}'

Unfortunately that makes DOT (".") optional between triples.

"?a ?b ?c ?d ?e ?f"

parsing as one "Triples" then another "Triples".  Complexity of these rules 
comes from making DOT optional after blocks of triples, where the next parse 
iterm is either end of group "}" or something that is not just another block of 
triples (GraphPatternNotTriples) but not allowing omission of DOT between two 
triple patterns.

DOT is not strictly necessary anywhere but that makes triples very 
accident-prone and very non-Turtle.  One of those tradeoffs.

> 
> * I find the definition of rules [32], [34] and [36] with their
>    completely optional productions a bit awkward.
> 
>        [31] Triples1              ::= VarOrTerm PropertyListNotEmpty
>                                     | TriplesNode PropertyList
>        [32] PropertyList          ::= ( PropertyListNotEmpty )?
>        [33] PropertyListNotEmpty  ::= Verb ObjectList PropertyListTail
>        [34] PropertyListTail      ::= ( ';' PropertyList )?
>        [35] ObjectList            ::= Object ObjectTail
>        [36] ObjectTail            ::= ( ',' ObjectList )?
> 
>        [40] BlankNodePropertyList ::= '[' PropertyList ']'


These were based on cwm's "n3.n3" and in particular I have tried to keep the 
names the same.

One change I did make was to introduce PropertyListNotEmpty to remove the 
possibility of single graph node expressions like ":a ." (that parses to no 
triples with cwm).  Unlike Turtle, whitespace between S/P/O is only necessary to 
break between qnames. "<a><b><c>." is legal.

> 
>    IMHO, moving the optionality or even the entire production to the
>    enclosing rules improves the readability of the grammar:
> 
>        Triples1             ::= VarOrTerm PropertyList
>                               | TriplesNode PropertyList?
> 
>        PropertyList         ::= Verb ObjectList ( ';' PropertyList? )?
> 
>        ObjectList           ::= Object ( ',' Object )*
> 
>        BlankNodePropertyList ::= '[' PropertyList? ']'
> 
> 
> Hope you can include these suggestions in the gramamr,
> 
> Arjohn
> 

Received on Friday, 6 May 2005 12:18:09 UTC