Re: Grammar for DAWG query language

On Mon, 20 Sep 2004 17:41:00 +0100, "Seaborne, Andy" <andy.seaborne@hp.com> wrote:

> I have done a first attempt at a grammar that parses the outline syntax
> from the F2F.  
> It parses the example query.  There is also a rather long query below
> that exhibits most features at the end.
> 
> The grammar is more general than the example:
>  + Has OPTIONAL as well as [] for optionals

Yuck.  The whole point of suggesting [] was to ditch OPTIONAL and
have a single block form that bunches a set of triples together as
optional.  Having both defeats this point.  From what I see in the
WG, optionals are an important feature of this QL, important enough
to deserve a matching pair of brackets ([] () {} <> etc.) so that
they are easy to write down.

>  + Has grouping by () - this allows blocks after "SOURCE ?src"

Double yuck.  ()s everywhere.  Looks like inner ANDs are scoped, or
at least it is not clear, and will be hard to explain to users.

I'd prefer adding {}s for sets of triples or scoping them if you
really really really need them.  In that case, ditch [] for optionals
and use that mechanism.  Please don't do both without good reason.

At present we have 3 ways to do optionals and 2 ways to bind a
source.  That's 3 too many.



I took the html grammar and made it text to make it easier for me to
read your proposed changes.  My comments:

  Optional commas still live, so I killed them.

  Made some BNF fixes that I found improved readability:
   A ::= (B | C | D) is easier to read as A ::= B | C | D
   B ::= A A* changed to B ::= A+

  Added WhereClause to match the other *Clause terms in Query

  Moved the '?' after some 0 or 1 terms into the calling rule
  such as FunctionCall with ArgList - consistency with other rules.

  Listed the terminals, although they still have no lexical forms
  such as "(" for LPAREN etc.

  PatternLiteral is not defined

  Added ConstructTriple and ConstructTripleList since constructed
  triples are not patterns, but can for example take Blank Nodes.  

  Added Blank non-terminal to match.

I much prefer the EBNF used in the XML REC.


The result is the grammar below


> One issue arose:
> 
>   SOURCE ?src (?x ?y ?z) AND ?z < 2
>   
> Does the AND apply to the inner SOURCE triple
> e.g. is it:
> 
>  ( SOURCE ?src (?x ?y ?z) ) AND ?z < 2
> or 
>  SOURCE ?src ( (?x ?y ?z) AND ?z < 2 )
> 
> Because SOURCE is a conjunctive element, the answer is the same even
> though the parse trees are different.  I hope! 


I think it is time to propose an alternative, since abutting
SOURCE ?src against a triple isn't working as far as clarity goes.
This is because in
   SOURCE ?src (?x ?y ?z)
users are unsure whether SOURCE ?src belongs to the () that follows
or to whatever came before.  One simple approach is to move the term
into the triple:

   (?x ?y ?z SOURCE ?src)

using the keyword to separate the triple from the (need a better
word) property/attribute bound to the triple.  Better than quad since
you can see what the fourth thing is.  Also allows us to add other
keywords later.
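
To make the point concrete, here is a minimal sketch in Python of
reading the inline form.  It is only an illustration, not part of the
grammar below: the lexical forms (?name for variables, <...> for
URIs, bare words for keywords) and the name parse_triple are my
assumptions, since the terminals have no lexical forms yet.

```python
import re

# Assumed lexical forms (hypothetical; the grammar gives none):
# parentheses, ?name variables, <...> URIs, bare-word keywords.
TOKEN = re.compile(r"\(|\)|\?\w+|<[^>]*>|\w+")

def parse_triple(text):
    """Parse '(s p o)' or '(s p o SOURCE g)' into a dict."""
    tokens = TOKEN.findall(text)
    if len(tokens) < 2 or tokens[0] != "(" or tokens[-1] != ")":
        raise ValueError("triple pattern must be wrapped in ( )")
    body = tokens[1:-1]
    if "(" in body or ")" in body:
        raise ValueError("nested parentheses are not allowed in a triple")
    if len(body) == 3:
        s, p, o = body
        return {"subject": s, "predicate": p, "object": o, "source": None}
    if len(body) == 5 and body[3] == "SOURCE":
        # The keyword marks what the fourth term means.
        s, p, o, _, src = body
        return {"subject": s, "predicate": p, "object": o, "source": src}
    raise ValueError("expected 3 terms, optionally followed by SOURCE term")

print(parse_triple("(?x ?y ?z SOURCE ?src)"))
```

Because the SOURCE term sits inside the parentheses, the reader never
has to decide which group it attaches to.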


    Digression into a slightly more complex alternative which could
    also be used for more general extensions to the DAWG QL:

       (?x ?y ?z)->source(?src)

    I mention this as a possible extension method, allowing 

       (?x ?y ?z) "->" <QNAME> <LPAREN> ArgList? <RPAREN>

    analogous to what we have now inside constraint expressions,
    FunctionCall in the BNF:
      FunctionCall ::= <AMP> <QNAME> <LPAREN> ArgList? <RPAREN>

    which could be used like
       (?x ?y ?z)->foo:bar(1)

    where 'source' is a standard name.

    Downside: the DAWG source() BINDs a variable, but we don't
    propose that for user extension functions, which would be pure
    functions - this could be confusing.

    This could also be used if we added groups of triples using {}s
    like:
      { (?x ?y ?z) (?a ?b ?c) }->optional->source(?src)->foo:bar(1)

    I'm not particularly attached to -> as the operator.
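
    As a rough sketch of how such a chain could be read (Python, and
    purely illustrative): the function name parse_chain, the regular
    expression and the choice to allow a bare ->name with no
    argument list are all my assumptions, not anything in the
    grammar.

```python
import re

# One "->name" or "->name(args)" step; name may be a QName such as
# foo:bar.  The exact lexical rules here are assumed, not specified.
STEP = re.compile(
    r"->\s*([A-Za-z_]\w*(?::[A-Za-z_]\w*)?)\s*(?:\(([^()]*)\))?"
)

def parse_chain(text):
    """Split 'target->f(a)->g' into (target, [(name, [args]), ...])."""
    text = text.strip()
    first = text.find("->")
    if first == -1:
        return text, []
    target, rest = text[:first].strip(), text[first:]
    steps, pos = [], 0
    while pos < len(rest):
        m = STEP.match(rest, pos)
        if m is None:
            raise ValueError(f"bad modifier at: {rest[pos:]!r}")
        name, raw = m.group(1), m.group(2)
        # Empty or missing parentheses both give an empty argument list.
        args = [a.strip() for a in raw.split(",") if a.strip()] if raw else []
        steps.append((name, args))
        pos = m.end()
    return target, steps

target, steps = parse_chain(
    "{ (?x ?y ?z) (?a ?b ?c) }->optional->source(?src)->foo:bar(1)"
)
print(target)
print(steps)
```

    Each step comes out as a (name, args) pair, so the standard
    names and user extension names go through the same path.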



> Attached is an HTML file mechanically produced by jjdoc.
> Terminals can be found in the full grammar.
> 
> Full details, including terminals in:
> http://cvs.sourceforge.net/viewcvs.py/jena/BRQL/Grammar/dawg.jj?rev=1.5
> but do check for the latest version.  It takes a while for the web
> interface to catch up with the true state of CVS.  

I would prefer this grammar defined in a standard EBNF form document
not derived from a particular implementation.

Thanks

Dave

---------


                           BNF for DAWG QL

                            NON-TERMINALS

CompilationUnit ::= Query <EOF>

Query ::= PrefixDecl*
        ( SelectClause | ConstructClause | DescribeClause | AskClause )
        PrefixDecl* FromClause? WhereClause?

SelectClause ::= <SELECT> VarAsNode+ | <SELECT> <STAR>

DescribeClause ::= <DESCRIBE> VarOrURI+ | <DESCRIBE> <STAR>

ConstructClause ::= <CONSTRUCT> ConstructPattern | <CONSTRUCT> <STAR>

AskClause ::= <ASK>

FromClause ::= <FROM> FromSelector+

FromSelector ::= URL /* not URI - in syntax/BNF terms, no difference */

WhereClause ::= <WHERE> GraphPattern

GraphPattern ::= PatternGroup

SourceGraphPattern ::= <SOURCE> <STAR> PatternGroup1 |
         <SOURCE> VarOrURI PatternGroup1

OptionalGraphPattern ::= <OPTIONAL> PatternGroup1 |
         <LBRACKET> PatternGroup <RBRACKET>

PatternGroup ::= PatternElement+

PatternElement ::= TriplePatternList | ExplicitGroup | PatternElementForms

PatternGroup1 ::= TriplePattern | ExplicitGroup | PatternElementForms

PatternElementForms ::= SourceGraphPattern | OptionalGraphPattern |
         <AND> Expression

ExplicitGroup ::= <LPAREN> PatternGroup <RPAREN>

TriplePatternList ::= TriplePattern+

TriplePattern ::= <LPAREN> VarOrURI VarOrURI VarOrLiteral <RPAREN>

ConstructPattern ::= ConstructTripleList

ConstructTripleList ::= ConstructTriple+

ConstructTriple ::= <LPAREN> VarOrURIorBlank VarOrURI VarOrLiteralOrBlank <RPAREN>

VarOrURI ::= VarAsNode | URI

VarOrLiteral ::= VarAsNode | Literal

VarOrURIorBlank ::= VarAsNode | URI | Blank

VarOrLiteralOrBlank ::= VarAsNode | Literal | Blank

VarAsNode ::= <VAR>

VarAsExpr ::= <VAR>

PrefixDecl ::= <PREFIX> <NCNAME> <COLON> QuotedURI | <PREFIX> <COLON> QuotedURI

Expression ::= ConditionalOrExpression

ConditionalOrExpression ::= ConditionalXorExpression ( <SC_OR> ConditionalXorExpression )*

ConditionalXorExpression ::= ConditionalAndExpression

ConditionalAndExpression ::= ValueLogical ( <SC_AND> ValueLogical )*

ValueLogical ::= StringEqualityExpression

StringEqualityExpression ::= NumericalLogical ( <STR_EQ> NumericalLogical |
         <STR_NE> NumericalLogical |
         <STR_MATCH> PatternLiteral |
         <STR_NMATCH> PatternLiteral )*

NumericalLogical ::= EqualityExpression

EqualityExpression ::= RelationalExpression ( <EQ> RelationalExpression |
         <NEQ> RelationalExpression )?

RelationalExpression ::= NumericExpression ( <LT> NumericExpression |
         <GT> NumericExpression |
         <LE> NumericExpression |
         <GE> NumericExpression )?

NumericExpression ::= ShiftExpression

ShiftExpression ::= AdditiveExpression

AdditiveExpression ::= MultiplicativeExpression
        ( <PLUS> MultiplicativeExpression | <MINUS> MultiplicativeExpression )*

MultiplicativeExpression ::= UnaryExpression
        ( <STAR> UnaryExpression | <SLASH> UnaryExpression | <REM> UnaryExpression )*

UnaryExpression ::= <PLUS> UnaryExpressionNotPlusMinus |
         <MINUS> UnaryExpressionNotPlusMinus |
         UnaryExpressionNotPlusMinus

UnaryExpressionNotPlusMinus ::= ( <TILDE> | <BANG> ) 
         UnaryExpression | PrimaryExpression

PrimaryExpression ::= VarAsExpr | Literal | FunctionCall |
         <LPAREN> Expression <RPAREN>

FunctionCall ::= <AMP> <QNAME> <LPAREN> ArgList? <RPAREN>

ArgList ::= VarOrLiteral ( <COMMA> VarOrLiteral )*

Literal ::= URI | NumericLiteral | TextLiteral

NumericLiteral ::= <INTEGER_LITERAL> | <FLOATING_POINT_LITERAL>

TextLiteral ::= ( <STRING_LITERAL1> | <STRING_LITERAL2> )
         <LANG>? ( <DATATYPE> URI )?

PatternLiteral ::= /* not defined */

URL ::= URI

URI ::= QuotedURI | QName

QName ::= <QNAME>

QuotedURI ::= <URI>

Blank ::= <BLANK>
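
The layering of the expression rules above (Relational over Additive
over Multiplicative over Unary over Primary) is what gives the
operators their precedence.  Here is a minimal recursive-descent
sketch of that subset in Python.  It is an illustration only: the
operator spellings (+ - * / % < > <= >=), the ?name variable form and
all function and class names are assumptions, since the terminals
have no lexical forms, and the string, logical and function-call
rules are left out.

```python
import re

# Assumed spellings for <VAR>, <INTEGER_LITERAL>, <LE>, <GE> and the
# single-character operators; none of these are fixed by the grammar.
TOKEN = re.compile(r"\s*(\?\w+|\d+|<=|>=|[-+*/%<>()])")

def tokenize(text):
    text, tokens, pos = text.rstrip(), [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected input at {text[pos:]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

class Parser:
    def __init__(self, tokens):
        self.tokens, self.i = tokens, 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.i += 1
        return tok

    def relational(self):
        # RelationalExpression: at most one comparison, hence "if".
        left = self.additive()
        if self.peek() in ("<", ">", "<=", ">="):
            return (self.take(), left, self.additive())
        return left

    def additive(self):
        # AdditiveExpression: zero or more + / -, left associative.
        left = self.multiplicative()
        while self.peek() in ("+", "-"):
            left = (self.take(), left, self.multiplicative())
        return left

    def multiplicative(self):
        # MultiplicativeExpression: zero or more * / %.
        left = self.unary()
        while self.peek() in ("*", "/", "%"):
            left = (self.take(), left, self.unary())
        return left

    def unary(self):
        # Simplified UnaryExpression: optional sign, then a primary.
        if self.peek() in ("+", "-"):
            return (self.take(), self.primary())
        return self.primary()

    def primary(self):
        tok = self.take()
        if tok == "(":
            inner = self.relational()
            if self.take() != ")":
                raise ValueError("expected )")
            return inner
        if tok is not None and tok.startswith("?"):
            return tok
        if tok is not None and tok.isdigit():
            return int(tok)
        raise ValueError(f"unexpected token {tok!r}")

def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.relational()
    if parser.peek() is not None:
        raise ValueError(f"trailing input: {parser.peek()!r}")
    return tree

print(parse("?z < 2"))
print(parse("1 + 2 * 3 < 10"))
```

The trees are nested (operator, left, right) tuples, so the
multiplication ends up below the addition, which ends up below the
comparison.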


                              TERMINALS

<AMP>
<AND>
<ASK>
<BANG>
<BLANK>
<COLON>
<COMMA>
<CONSTRUCT>
<DATATYPE>
<DESCRIBE>
<EOF>
<EQ>
<FLOATING_POINT_LITERAL>
<FROM>
<GE>
<GT>
<INTEGER_LITERAL>
<LANG>
<LBRACKET>
<LE>
<LPAREN>
<LT>
<MINUS>
<NCNAME>
<NEQ>
<OPTIONAL>
<PLUS>
<PREFIX>
<QNAME>
<RBRACKET>
<REM>
<RPAREN>
<SC_AND>
<SC_OR>
<SELECT>
<SLASH>
<SOURCE>
<STAR>
<STRING_LITERAL1>
<STRING_LITERAL2>
<STR_EQ>
<STR_MATCH>
<STR_NE>
<STR_NMATCH>
<TILDE>
<URI>
<VAR>
<WHERE>

Received on Friday, 24 September 2004 10:45:19 UTC