RE: Grammar for DAWG query language

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Fri, 24 Sep 2004 17:30:54 +0100
Message-ID: <8D5B24B83C6A2E4B9E7EE5FA82627DC920E5DF@sdcexcea01.emea.cpqcorp.net>
To: "Dave Beckett" <dave.beckett@bristol.ac.uk>, "RDF Data Access Working Group" <public-rdf-dawg@w3.org>

> From: Dave Beckett <mailto:dave.beckett@bristol.ac.uk>
> Date: 24 September 2004 11:43
> 
> On Mon, 20 Sep 2004 17:41:00 +0100, "Seaborne, Andy"
> <andy.seaborne@hp.com> wrote: 
> 
> > I have done a first attempt at a grammar that parses the outline
> > syntax from the F2F. It parses the example query.  There is also a
> > rather long query below that exhibits most features at the end.
> > 
> > The grammar is more general than the example:
> >  + Has OPTIONAL as well as [] for optionals
> 
> Yuck.  The whole point of suggesting [] was to ditch OPTIONAL and
> blocks allowing bunching a set of triples as optional.  Having both
> defeats this point.  Optionals seem to be an important feature that
> this QL will be providing as far as I see from the WG, that they
> deserve a matching pair of brackets [] () {} <> etc. to easily write
> them down.

A difficulty with allocating one kind of symbol to each kind of grouping
is that it assumes we already know every kind of block that may be
introduced.  I suggested separating the grouping from the use of the
group - hence putting (parens) in as a general grouping.  It was {brace}
before.  Both have programming language analogies.

In working through some examples, it struck me as very odd that many
things had a keyword style but optionals used something different.  All
the clauses start with a keyword, as do SOURCE and AND.

> 
> >  + Has grouping by () - this allows blocks after "SOURCE ?src"
> 
> Double yuck.  ()s everywhere.  Looks like inner ANDs are scoped, or
> at least it is not clear, and will be hard to explain to users.

Respectfully, I disagree it will be hard to explain to users.  I think a
regular approach to the syntax is better because the application writer
is likely to have some programming experience and is used to regularity
and compositionality.

> 
> I'd prefer adding {}s for sets of triples or scoping them if you
> really really really need them.  In that case, ditch [] for optionals
> and use that mechanism.  Please don't do both without good reason.

We could have {braces} for groups, keep () for triples.

I just checked, and this would then not require the AND keyword.
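
A minimal sketch of why the keyword becomes unnecessary, assuming {braces}
for groups, (parens) for triples, and SOURCE/OPTIONAL as prefix keywords:
every element of a group then starts with a distinctive token, so a
constraint can simply follow a triple.  The grammar shape, the token set and
all function names here are hypothetical, not the dawg.jj grammar.

```python
import re

# Hypothetical token set: braces, parens, comparison operators, and bare words
# such as ?x, foaf:name, 2, SOURCE. URIs written as <...> are left out.
TOKEN = re.compile(r"[{}()]|<=|>=|!=|[<>=!]|[^\s{}()<>=!]+")
COMPARISONS = {"<", ">", "=", "<=", ">=", "!="}
STRUCTURAL = {"{", "}", "(", ")"}


def take(tokens, pos):
    """Return the token at pos and the next position, or fail at end of input."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    return tokens[pos], pos + 1


def expect(tokens, pos, wanted):
    tok, nxt = take(tokens, pos)
    if tok != wanted:
        raise SyntaxError(f"expected {wanted!r}, found {tok!r}")
    return nxt


def parse_group(tokens, pos):
    """Group ::= '{' Element* '}'  (assumed rule)"""
    pos = expect(tokens, pos, "{")
    elements = []
    while pos < len(tokens) and tokens[pos] != "}":
        element, pos = parse_element(tokens, pos)
        elements.append(element)
    pos = expect(tokens, pos, "}")
    return ("group", elements), pos


def parse_triple(tokens, pos):
    """Triple ::= '(' Term Term Term ')'  (assumed rule)"""
    pos = expect(tokens, pos, "(")
    terms = []
    for _ in range(3):
        term, pos = take(tokens, pos)
        if term in STRUCTURAL or term in COMPARISONS:
            raise SyntaxError(f"expected a term, found {term!r}")
        terms.append(term)
    pos = expect(tokens, pos, ")")
    return ("triple", *terms), pos


def parse_block(tokens, pos):
    """The body of SOURCE or OPTIONAL: a group or a single triple."""
    tok, _ = take(tokens, pos)
    if tok == "{":
        return parse_group(tokens, pos)
    if tok == "(":
        return parse_triple(tokens, pos)
    raise SyntaxError(f"expected '{{' or '(', found {tok!r}")


def parse_element(tokens, pos):
    """The first token alone decides the kind of element, so no AND is needed."""
    tok, nxt = take(tokens, pos)
    if tok in ("{", "("):
        return parse_block(tokens, pos)
    if tok == "SOURCE":
        var, nxt = take(tokens, nxt)
        body, nxt = parse_block(tokens, nxt)
        return ("source", var, body), nxt
    if tok == "OPTIONAL":
        body, nxt = parse_block(tokens, nxt)
        return ("optional", body), nxt
    if tok in STRUCTURAL or tok in COMPARISONS:
        raise SyntaxError(f"unexpected token {tok!r}")
    # Anything else starts a (much simplified) constraint: Term Op Term.
    op, nxt = take(tokens, nxt)
    if op not in COMPARISONS:
        raise SyntaxError(f"expected a comparison operator, found {op!r}")
    right, nxt = take(tokens, nxt)
    return ("constraint", tok, op, right), nxt


def parse_pattern(text):
    tokens = TOKEN.findall(text)
    tree, pos = parse_group(tokens, 0)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree
```

For instance, parse_pattern("{ SOURCE ?src { (?x ?y ?z) ?z < 2 } }") yields a
"source" node whose body is a group holding one triple and one constraint.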

> 
> At present we have 3 ways to do optionals and 2 ways to do
> bind source.  That's 3 too many.
> 
> I took the html grammar and made it text to make it easier for me to
> read your proposed changes.  My comments:
> 
>   Optional commas still live, so I killed them.
> 
>   Made some BNF fixes that I found improved readability:
>    A ::= (B | C | D) easier to read B | C | D
>    B ::= A A* change to B ::= A+
> 
>   Added WhereClause to match the other *Clause terms in Query
> 
>   Moved the '?' after some 0 or 1 terms into the calling rule
>   such as FunctionCall with ArgList - consistency with other rules.
> 
>   Listed the terminals, although they still have no lexical forms
>   such as "(" for LBRACE etc.
> 
>   PatternLiteral is not defined
> 
>   Added ConstructTriple and ConstructTripleList since constructed
>   triples are not patterns, but can for example take Blank Nodes.

The underlying proposal you are making is bNodes in construct patterns.
Let's sort that out separately.

> 
>   Added Blank non-terminal to match.

Am I right in guessing the form is "_:a"?  i.e. URI-like with an illegal
namespace prefix?

> 
> I much prefer the EBNF used in the XML REC.
> 
> 
> The result is the grammar below
> 
> 
> > One issue arose:
> > 
> >   SOURCE ?src (?x ?y ?z) AND ?z < 2
> > 
> > Does the AND apply to the inner SOURCE triple
> > e.g. is it:
> > 
> >  ( SOURCE ?src (?x ?y ?z) ) AND ?z < 2
> > or
> >  SOURCE ?src ( (?x ?y ?z) AND ?z < 2 )
> > 
> > Because SOURCE is a conjunctive element, the answer is the same even
> > though the parse trees are different.  I hope!
> 
> 
> I guess this is the time to propose an alternative, since the
> abutting of SOURCE ?src near a triple isn't working as far as clarity
> goes.  This is because in
>    SOURCE ?src (?x ?y ?z)
> users are unsure if SOURCE ?src is part of the () following or
> previous.  One simple approach is to move the term in the triple:
> 
>    (?x ?y ?z SOURCE ?src)

The rq23 doc talks in terms of graph patterns, not individual triples.
As it was possible to have a graph pattern after SOURCE I put it in.  We
have other conjunctive graph patterns - I much prefer a regular
language, built on some building blocks with regular composition.  See
{braces} discussion above.
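
A minimal sketch of the point in my earlier message that the two groupings
of "SOURCE ?src (?x ?y ?z) AND ?z < 2" give the same answers.  It assumes a
dataset modelled as named graphs, and that SOURCE evaluates its pattern
within each graph and binds the variable to the graph name.  That model, the
sample data and the function names are illustrative assumptions, not the
rq23 definitions.

```python
# Hypothetical dataset: graph name -> list of (subject, predicate, object).
DATASET = {
    "g1": [("a", "p", 1), ("b", "p", 5)],
    "g2": [("c", "p", 0)],
}


def is_var(term):
    return isinstance(term, str) and term.startswith("?")


def match_triple(pattern, triples):
    """Return one binding dict for each triple that matches the pattern."""
    solutions = []
    for triple in triples:
        binding = {}
        for wanted, actual in zip(pattern, triple):
            if is_var(wanted):
                # A repeated variable must take the same value each time.
                if binding.setdefault(wanted, actual) != actual:
                    break
            elif wanted != actual:
                break
        else:
            solutions.append(binding)
    return solutions


def keep(solutions, constraint):
    """AND with a constraint: drop the solutions that fail it."""
    return [s for s in solutions if constraint(s)]


def source(var, evaluate, dataset):
    """Assumed SOURCE semantics: evaluate per graph, bind var to the graph name."""
    solutions = []
    for name, triples in dataset.items():
        for s in evaluate(triples):
            solutions.append({**s, var: name})
    return solutions


def both_groupings():
    pattern = ("?x", "?y", "?z")

    def small(s):
        return s["?z"] < 2

    # ( SOURCE ?src (?x ?y ?z) ) AND ?z < 2
    outer = keep(source("?src", lambda t: match_triple(pattern, t), DATASET), small)
    # SOURCE ?src ( (?x ?y ?z) AND ?z < 2 )
    inner = source("?src", lambda t: keep(match_triple(pattern, t), small), DATASET)
    return outer, inner
```

In this model both groupings return the same two solutions.  The sketch only
covers a constraint that does not mention ?src; a constraint on ?src could
only be checked in the outer position here.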

> 
> using the keyword to separate the triple from the (need a better
> word) property/attribute bound to the triple.  Better than quad since
> you can see what the fourth thing is.  Also allows us to add other
> keywords later.
> 
> 
>     Digression into a slightly more complex alternative which could
>     also be used for more general extensions to the DAWG QL:
> 
>        (?x ?y ?z)->source(?src)
> 
>     I mention this as a possible extension method, allowing
> 
>        (?x ?y ?z) "->" <QNAME> <LPAREN> ArgList? <RPAREN>
> 
>     analogous to what we have now inside constraint expressions,
>     FunctionCall in the BNF
>       FunctionCall:: <AMP> <QNAME> <LPAREN> ArgList? <RPAREN>
> 
>     which could be used like
>        (?x ?y ?z)->foo:bar(1)
> 
>     where 'source' is a standard name.

I don't understand this.  What happens when a function is tied to a
triple pattern or a matched triple?

Could you give an example of such a function that might be used as an
extension?  I can see the case for extensions to test values, and that
might extend to functions that do bind variables.  But I can't think of
an example where the matched subgraph or triple is passed to the
function.  It would help me to have a concrete example of such a use
case.

Or have I misread your proposal?


> 
>     Downside: the dawg source() BINDs a variable, but we don't
>     propose that for user extension functions, which would be pure
>     functions - this would possibly be confusing.
> 
>     this could also be used if we added groups of triples using {}s
>     like:
>       { (?x ?y ?z) (?a ?b ?c) }->optional->source(?src)->foo:bar(1)
> 
>     I'm not particularly attached to -> as the operator.
> 
> 
> 
> > Attached is an HTML file mechanically produced by jjdoc.
> > Terminals can be found in the full grammar.
> > 
> > Full details, including terminals in:
> > http://cvs.sourceforge.net/viewcvs.py/jena/BRQL/Grammar/dawg.jj?rev=1.5
> > but do check for the latest version.  It takes a while for the web
> > interface to catch up with the true state of CVS.
> 
> I would prefer this grammar defined in a standard EBNF form document
> not derived from a particular implementation.

In the published document, there will be a better looking grammar.  Eric
has some tools to produce it.  While it's in flux, I need a way to
produce and test the grammar.  If there are tools to work on the EBNF
syntax then fine - but I don't think there are.  (By the way, javacc
does some LR testing as well so this at least points out factoring
points needed for yacc.)

If you want to wait until there is a prettier form, then fine.  I just
put out what I had before disappearing for a few days.

	Andy

> 
> Thanks
> 
> Dave
> 
Received on Friday, 24 September 2004 16:32:23 GMT
