Re: review of rq24-algebra from Seaborne, Andy on 2006-12-15 (public-rdf-dawg@w3.org from October to December 2006)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Fri, 15 Dec 2006 18:03:33 +0000
To: Lee Feigenbaum <feigenbl@us.ibm.com>
Cc: public-rdf-dawg@w3.org
Message-ID: <4582E375.7000808@hp.com>
Thanks for the comments, especially for making alternative suggestions where 
it doesn't work for you currently.

Lee Feigenbaum wrote:
> As promised. I've mainly focused on wording and presentation techniques in 
> an effort to make this as clear as possible. The details of the algebra 
> per se are being worked out as separate matters. 
> 
> Summary:I think it needs a bit more rigor in a few places I've tried to 
> explain below. I think it's excellent overall and (IBM hat on) will make 
> the specification far more accessible to implementors and those wishing to 
> study the query language. When we combine it with the example-driven 
> presentation of the rest of rq24, I think we'll be in very good shape 
> indeed. 
> 
> Lee
> 
> --
> 
> X SPARQL The SPARQL Algebra
> 
> """
> The algebraic expression is derived from a query string by parsing and 
> transforming the abstract syntax tree. 
> """
> 
> This sentence confuses me a bit as it can be read as "parsing ... the 
> abstract syntax tree". Perhaps better as:
> 
> """
> Parsing a query string yields an abstract syntax tree which can be 
> transformed into an algebraic expression.
> """

That leaves the option of not having an algebraic expression :-)

> 

Changed to:

"""
The algebraic expression is formed from the query string by first parsing, and 
then applying transformations as described below. The result of a query can 
then be calculated by the evaluation rules applied to the algebraic 
expression.  string by first parsing, and then applying transformations as 
described below.
"""

> 
> X.1.1 Mapping Graph Patterns
> 
> 
> Having implemented the mapping based on rq23, this section is very 
> appreciated by me as being helpful to implementors looking for guidance in 
> associating the syntax of a query with its logical meaning. I think that 
> the content in its look fine, but the presentation / elucidation needs 
> some work to make the mapping from concrete to abstract syntax more 
> formal. 
> In particular, I think that we should completely avoid conflating 
> concrete syntax tokens (e.g. GroupGraphPattern) with abstract syntax nodes 
> (e.g. Join(...)). So I think that we should list the relevant concrete 
> syntax rules (that's there already), list the abstract syntax nodes 
> (that's sort of there in the bullet list, but it should explicitly list 
> the names of the nodes -- "Join", "LeftJoin", etc.). Then when Step 1 says 
> "Replace all BasicGraphPattern  elements by BGP(list of triple patterns)", 
> "BGP" will be the name of an abstract syntax node already introduced. 
> Similarly for the other steps.

So you suggest moving the list just before X.1.2 to the beginning?

> 
> [I read to the end; I think perhaps you're distinguishing between 
> "abstract syntax tree" and "abstract query"? If that's the case, I don't 
> think I agree with the distinction. I see a concrete syntax tree full of 
> OPTIONAL and GroupGraphPattern and the like, and then an abstract syntax 
> tree (== abstract query) full of Join() and LeftJoin(). The algebra 
> operates over this latter tree.]

Abstract and concrete syntax tree terminology means different things to 
different people.  My idea of a concrete syntax is much more concrete than 
yours!  Checking around the web, the distinction isn't that automatic (just an 
example: concrete migh include token junk like whitespace and comments to some 
people).

I wanted to have a form that gets away from some of the structural, surface 
details of a query which an AST might be though to capture.  e.g. the rule 
about Union/Group might still be that in an AST.

So I prefer to have a SPARQL specific term for the "abstract query" so that it 
does not bring context with it.

Maybe just "syntax tree", not concrete or abstract.  The phrase "abstract 
syntax" only appears twice.

> 
> Step 4 - I still have a TODO to write up some examples to see if I can 
> concerns about this step, but I'd really like to hear someone else's take 
> on the algorithm in Step 4.

ARQ implements exactly this algorithm (access by using
arq.qparse --plan --engine=ref --query QueryFile.rq)

> 
> Step 5 Simplification - While this doesn't hurt, we don't really need to 
> specify this sort of trivial transformation -- the evaluation of the 
> algebra should yield the same result regardless of how simplified the 
> abstract syntax tree is.

It makes the examples more readable.  In the short cases it's OK but for the 
longer examples, I found it was adding noise, obscuring the point of the example.

> 
> X.1.2
> 
> At least one example (preferably one involving left associativity) would 
> be useful which included the full transform of query string > concrete 
> syntax tree > abstract syntax tree

It would be nice but I'd just make the counter argument that it runs the risk 
of needing to define the terminology for the syntax tree.  Doing that just to 
write an example might be more confusing than helpful.

If you want to write one out, we can see about including it.


The example:

{ ?s :p1 ?v1 OPTIONAL {?s :p2 ?v2 } OPTIONAL { ?s :p3 ?v3 } }

uses left associativity.


> 
> X.1.3
> 
> There's a typo - "Slide" should be "Slice"

Done.

> 
> [There's other typos in the document too, but I'm not enumerating them 
> until we've agreed on overall structure / semantics and integrated the 
> resulting text into rq24 proper.]

Probably :-) The symbols and technical words do somewhat confuse the spell 
checker.

> 
> X.1.3 could use the same, formal treatment which maps from concrete syntax 
> to abstract expressions that X.1.1 receives.

Which CVS version are you reviewing?

> 
> 
> X.2.1
> 
> Filter:
>   We don't define what mu(expression) means anywhere before this. It's 
> somewhat obvious, but should at least be defined. Also: "boolean effective 
> value" --> "effective boolean value". 

Will define.

expr(mu) is a bit better (the both work as mu is a mapping function but it's a 
better fit to sec 11 as expr(mu)).

> 
> Join:
>   You combine solutions here using just "x union y" rather than "x 
> set-union y" or "merge(x, y)" . Is this purposeful?

No purpose - merge might be better.  "union" is correct because the arguments 
are plain-old regular sets. I tried to put in set-union when it might be unclear.

> 
> X.3
> 
> I like the concept of "active graph." If I recall correctly, the current 
> spec. is worded such that GRAPH <g> { ... } - basically sets a new default 
> graph for the stuff inside { ... }, and then BGP matching is defined to be 
> against the default graph. If we adopt this active graph terminology, we 
> should use it throughout, and make sure the BGP matching section refers to 
> the active graph.

On balence, it's probably a useful term but one caveat: active graph can also 
lead to an implication that there is always one such.  But if compiling to a 
quad store, there isn't in "GRAPH ?g {}".  An effect of this is that FILTERs 
can not presume to be acting over a distinguished graph at all times and so 
mush not look inside the graph.  (I found out the hard way - ARQ's quad engine 
fails a filter function extension test because the extension FILTER function 
tests for list membership.)


	Thanks
	Andy
Received on Friday, 15 December 2006 18:04:04 UTC