Re: review of rq24-algebra from Lee Feigenbaum on 2006-12-29 (public-rdf-dawg@w3.org from October to December 2006)

From: Lee Feigenbaum <feigenbl@us.ibm.com>
Date: Fri, 29 Dec 2006 16:50:28 -0500
To: public-rdf-dawg@w3.org
Message-ID: <OF1F3D5219.0E947572-ON85257253.0076FB03-85257253.0077F975@us.ibm.com>
"Seaborne, Andy" <andy.seaborne@hp.com> wrote on 12/15/2006 01:03:33 PM:

> Thanks for the comments, especially for making alternative suggestions 
where 
> it doesn't work for you currently.
> 
> Lee Feigenbaum wrote:
> > As promised. I've mainly focused on wording and presentation 
techniques in 
> > an effort to make this as clear as possible. The details of the 
algebra 
> > per se are being worked out as separate matters. 
> > 
> > Summary:I think it needs a bit more rigor in a few places I've tried 
to 
> > explain below. I think it's excellent overall and (IBM hat on) will 
make 
> > the specification far more accessible to implementors and those 
wishing to 
> > study the query language. When we combine it with the example-driven 
> > presentation of the rest of rq24, I think we'll be in very good shape 
> > indeed. 
> > 
> > Lee
> > 
> > --
> > 
> > X SPARQL The SPARQL Algebra
> > 
> > """
> > The algebraic expression is derived from a query string by parsing and 

> > transforming the abstract syntax tree. 
> > """
> > 
> > This sentence confuses me a bit as it can be read as "parsing ... the 
> > abstract syntax tree". Perhaps better as:
> > 
> > """
> > Parsing a query string yields an abstract syntax tree which can be 
> > transformed into an algebraic expression.
> > """
> 
> That leaves the option of not having an algebraic expression :-)
> 
> > 
> 
> Changed to:
> 
> """
> The algebraic expression is formed from the query string by first 
> parsing, and 
> then applying transformations as described below. The result of a query 
can 
> then be calculated by the evaluation rules applied to the algebraic 
> expression.  string by first parsing, and then applying transformations 
as 
> described below.
> """

Great, except that end part is a bit fishy. Something lost in translation?

> > 
> > X.1.1 Mapping Graph Patterns
> > 
> > 
> > Having implemented the mapping based on rq23, this section is very 
> > appreciated by me as being helpful to implementors looking for 
guidance in 
> > associating the syntax of a query with its logical meaning. I think 
that 
> > the content in its look fine, but the presentation / elucidation needs 

> > some work to make the mapping from concrete to abstract syntax more 
> > formal. 
> > In particular, I think that we should completely avoid conflating 
> > concrete syntax tokens (e.g. GroupGraphPattern) with abstract syntax 
nodes 
> > (e.g. Join(...)). So I think that we should list the relevant concrete 

> > syntax rules (that's there already), list the abstract syntax nodes 
> > (that's sort of there in the bullet list, but it should explicitly 
list 
> > the names of the nodes -- "Join", "LeftJoin", etc.). Then when Step 1 
says 
> > "Replace all BasicGraphPattern  elements by BGP(list of triple 
patterns)", 
> > "BGP" will be the name of an abstract syntax node already introduced. 
> > Similarly for the other steps.
> 
> So you suggest moving the list just before X.1.2 to the beginning?

Yup, once I got down to X.1.2 I thought to myself "hey, this is exactly 
what I was talking about before."

> > [I read to the end; I think perhaps you're distinguishing between 
> > "abstract syntax tree" and "abstract query"? If that's the case, I 
don't 
> > think I agree with the distinction. I see a concrete syntax tree full 
of 
> > OPTIONAL and GroupGraphPattern and the like, and then an abstract 
syntax 
> > tree (== abstract query) full of Join() and LeftJoin(). The algebra 
> > operates over this latter tree.]
> 
> Abstract and concrete syntax tree terminology means different things to 
> different people.  My idea of a concrete syntax is much more concrete 
than 
> yours!  Checking around the web, the distinction isn't that 
> automatic (just an 
> example: concrete migh include token junk like whitespace and 
> comments to some 
> people).
> 
> I wanted to have a form that gets away from some of the structural, 
surface 
> details of a query which an AST might be though to capture.  e.g. the 
rule 
> about Union/Group might still be that in an AST.
> 
> So I prefer to have a SPARQL specific term for the "abstract query" 
> so that it 
> does not bring context with it.
> 
> Maybe just "syntax tree", not concrete or abstract.  The phrase 
"abstract 
> syntax" only appears twice.

Right, that would be fine with me -- my main point wasn't that my 
particular understanding of concrete vs. abstract syntax is correct but 
rather that the text as it was worded when I reviewed it confused me. If 
we define our terms and how they correspond to the various stages we're 
dealing with (surface syntax, parse tree, algebra tree, algebra 
expression... ?) I think we'll be fine. 

Fred's review pointed out a few places where the thread can be more 
tightly pulled through this entire chain of representations.

> > Step 4 - I still have a TODO to write up some examples to see if I can 

> > concerns about this step, but I'd really like to hear someone else's 
take 
> > on the algorithm in Step 4.
> 
> ARQ implements exactly this algorithm (access by using
> arq.qparse --plan --engine=ref --query QueryFile.rq)

(for the moment, defering to Fred's comments here)

> > Step 5 Simplification - While this doesn't hurt, we don't really need 
to 
> > specify this sort of trivial transformation -- the evaluation of the 
> > algebra should yield the same result regardless of how simplified the 
> > abstract syntax tree is.
> 
> It makes the examples more readable.  In the short cases it's OK but for 
the 
> longer examples, I found it was adding noise, obscuring the point of
> the example.

You've convinced me.


> > X.1.2
> > 
> > At least one example (preferably one involving left associativity) 
would 
> > be useful which included the full transform of query string > concrete 

> > syntax tree > abstract syntax tree
> 
> It would be nice but I'd just make the counter argument that it runsthe 
risk 
> of needing to define the terminology for the syntax tree.  Doing that 
just to 
> write an example might be more confusing than helpful.
> 
> If you want to write one out, we can see about including it.
> 
> 
> The example:
> 
> { ?s :p1 ?v1 OPTIONAL {?s :p2 ?v2 } OPTIONAL { ?s :p3 ?v3 } }
> 
> uses left associativity.

Fred mentioned that he'd like to see UNION followed by OPTIONAL as an 
example, which would be nice as well. Perhaps on Tuesday we can rustle up 
a volunteer to write out the text for a full example or two.

> > X.3
> > 
> > I like the concept of "active graph." If I recall correctly, the 
current 
> > spec. is worded such that GRAPH <g> { ... } - basically sets a new 
default 
> > graph for the stuff inside { ... }, and then BGP matching is defined 
to be 
> > against the default graph. If we adopt this active graph terminology, 
we 
> > should use it throughout, and make sure the BGP matching section 
refers to 
> > the active graph.
> 
> On balence, it's probably a useful term but one caveat: active graphcan 
also 
> lead to an implication that there is always one such.  But if compiling 
to a 
> quad store, there isn't in "GRAPH ?g {}".  An effect of this is that 
FILTERs 
> can not presume to be acting over a distinguished graph at all times and 
so 
> mush not look inside the graph.  (I found out the hard way - ARQ's 
> quad engine 
> fails a filter function extension test because the extension FILTER 
function 
> tests for list membership.)

I imagine GRAPH ?g can be explained in terms of an active graph in the 
same exact way that the algebra treats it -- it's the effective union of 
the results received when the active graph is set in turn to each named 
graph member of the RDF dataset and then the graph pattern is evaluated 
and the results Join()'ed with the ?g -> active-graph binding. 

Lee
Received on Friday, 29 December 2006 21:50:43 UTC