Re: Review of "The SPARQL algebra"

Fred Zemke wrote:
> review of "The SPARQL algebra"
> 
> Overall, I am pleased to see this.

Thanks for the comments

> 
> substance
> ---------
> 
> X.1 Mapping a SPARQL query to a SPARQL abstract query
> I believe this treatment omits what I would call step 0,
> removing abbreviations.  In this step, the various abbreviated
> syntaxes permitted by SPARQL would be removed.  
>  From RQ24 section 3 "SPARQL term syntax" this includes:
> 
> 1. IRI abbreviations using prefixes and IRIs that are
> relative to the base IRI (section 3.1.1 "Syntax for IRIs")
> 
> 2. abbreviated literals (section 3.1.2 "Syntax for literals")
> 
> 3. perhaps normalize variables to begin with a question mark
> rather than a dollar sign (section 3.1.3 "Syntax for query
> variables) though the case can be made that the punctuation
> is not part of the variable; the variable is the string
> corresponding to rule [90] VARNAME.
> 
> 4. blank nodes of the form [] and [ :p :v ]
> (Section 3.1.4 "Syntax for blank nodes")
> 
> 5. comma and semicolon delimited lists
> (Section 3.2.1 "Predicate-object lists" and 3.2.2 "Object lists")
> 
> 6. RDF collections (3.2.3 "RDF collections")
> 
> 7. Keyword "a" (section 3.2.4 "rdf:type")

A little context: the document is not a standalone algebra description but 
workspace for developing text that will go into rq24.  I intend to retain the 
material in rq24/section3 ("SPARQL term syntax") so these expansions will be 
covered.

> 
> As a result of removing these abbreviations, the query will
> conform to a simpler grammar which does not contain rule [22]
> BasicGraphPattern nor rule [32]
> TriplesSameSubject and many others that are referenced by
> expanding it.  Instead we can substitute the rules
> 
> [22.1] ListOfTriplePatterns ::= TriplePattern ('.' TriplePattern )?
> 
> [32.1] TriplePattern ::= Subject PureVerb Object
> [32.2] Subject ::= VarOrPureIRI
> [32.3] PureVerb ::= Q_IRI_REF
> [32.4] Object ::= VarOrPureIRI
> [32.5] VarOrPureIRI ::= Var | Q_IRI_REF
> 
> This transformation is necessary before we can process
> Step 1, which talks about "list of triple patterns" but does
> not indicate how to get that list of triple patterns.
> 
> We might also elect to treat "*" as an abbreviation and remove
> it at the outset.  This is probably just a question of taste,
> whether to handle it here or later when the Project operator
> is introduced.
> 
> 
> Step 1: With the foregoing in place, we can replace
> "list of triple patterns" by "ListOfTriplePatterns".

I'm not convinced that working from another grammar is the best way.  Maybe we 
could use the current definitions (without the matching part of each) to 
define the SPARQL Abstract Query.  The downside I see of a new grammar is that 
it's sole purpose is to explain so it introduces machinary that is not used much.

As the integrated document appears, we can see where we are with this.

> Below step 4, below the three bullets, the next sentence
> has the words "graph pattern".  I believe you mean "GraphPattern"
> as introduced prior to Step 1.

Either would be right "GraphPattern" is defined above and "graph pattern" is 
an English form.
> 
> 
> Step 4, indented para in smaller font says "This is the
> left-associative operator assignment... can take a filter from
> it's left subelement".  I believe you mean right subelement.
> Also, spelling correction "it's" should be "its".
> Possibly you want to capitalize "filter" (I'm not sure if you
> mean "Filter" here as the proper noun introduced by the third
> bullet above.)

It's a bit confusing but the small, italized font are notes (class="note" in 
the HTML).  They'll be subsumed or removed.  I've made the corrections anyway.

If they have actions to be completed they are marked "@@".

> 
> Line immediately prior to the pseudocode for OPTIONAL:
> "Write 'A' for an algebra expression: true for filter(true)".
> Moving from left to right in this sentence, I believe:
> -- you want a semicolon instead of a colon
> -- you want "Filter" instead of "filter"

Done

> -- if so, Filter requires two arguments.  The second argument
> can be the empty pattern {}
> -- possibly you want to capitalize the first "true" to
> distinguish it graphically from the second, which I take to be
> a Boolean literal.

True gets introduced in join() but that itself is wrong (I did have a version 
with an expression as thrid arg to join for symmetry but didn't go that way in 
the end).

Algorithm fixed.  Text here removed.

> 
> Pseudocode algorithm for interpreting OPTIONAL:
> I think you should adopt and state a convention that distinguishes
> assignment from equality testing.  Common conventions for
> assignment are "Set x = y" or "x := y".

Good idea - I've used ":=" and s/Set/Let/ in the initializers

> Common conventions for equality testing are "=" or "==".
> The pseudocode currently uses "=" for both meanings.
> 
> X.1.2 Examples
> Examples do not show removal of prefixes, which I think is
> acceptable for space reasons, though I think the algorithm
> should expand abbreviations.

It is for space reasons.

> 
> Third example, second line, shows a couple
> Join({}, BGP(...)).  When I work this example I do not get
> such Joins.  I don't see that either Step 1 or Step 2 creates
> any {} patterns.  I have not tried the other examples but I
> wonder if all the {} that are shown are actually generated by
> the algorithm.

This is the part that introduces the empty pattern:

"Set G = the empty pattern"

and the redundant join({},) removed at step 5.  I left some redundant 
operations in the hope it makes the key step 4 effect.

Changed to
"""
Let G := the empty pattern, {}
"""

> 
> I would like to see a worked example of my old bugbear
> {...} UNION {...} OPTIONAL {...}

Example added.

Just FYI: I implemented the algorithm so I could get mechanical translation of 
queries.  If you want to try, use ARQ in CVS (I can do a build for you if you 
want) then:

arq.qparse --engine=ref --plan --query Q.rq

--engine=ref ==> use the algebra directly.
--plan ==> print out what would be executed.

The grammar in the algebra document has OPTIONAL as a unary syntax rule so it 
makes {...} UNION {...} OPTIONAL {...} not dependent of left/right parsing. 
It makes the grammar simpler (optional is not a special syntax case).

> 
> Last example shows FILTER in the right operand of OPTIONAL.
> I'd like to see an example with FILTER in the left operand.
> Perhaps a single example with FILTER in both would suffice.

Example added.

If there are other good examples to give, please do write them out and send 
them to the list.  An HTML fragment using the <div> structure in the existing 
examples would be ideal for me.

> 
> 
> X.2.1 Algebra operators
> first @@ note says "Cardinality function is not defined for
> mu not in Omega".  I am not sure that there would ever be an
> attempt to evaluate Card[Omega](mu) if mu is not in Omega,
> in which case it is okay for this to be undefined.
> However, if desired, you could define Card[Omega](mu) = 0
> if mu is not in Omega.

Agreed - mu not in Omega should not happen (it's meaningless)

> 
> Definition of Diff
> the second operand of the set union contains "mu + mu'"
> but addition is not defined on mappings. You mean
> "mu and mu' are compatible"

Corrected.

> 
> 
> Definition of Union
> Second sentence of this definition includes "Omega1 + Omega2".
> It is not clear if this is intended to define addition on
> multisets of mappings, or if it is assumed that addition is
> already defined and can be taken as an alternative definition
> of Union.  Either way, I think we can do without this.

Multiset "+" does not give the right cardinality - removed as suggested.

> 
> Definition of ToList
> last line, after semicolon, defining card, it should be
> card[ToList(Omega)](mu) = card[Omega](mu).

Corrected

> 
> Definintion of Project
> the definintion of card is incorrect.  I believe it should be
> card[Project(Psi,vars)](mu) =
>    sum { card[Psi](mu') | mu' in Psi and mu = mu' restricted to vars }

Yes - the definition in the document is not right.

But because Psi is a sequence, not a multiset, so we can define the projection 
element by element which comes out simpler if we are happy that this is order 
preserving.

It must be order preserving because it is applied after ORDER BY.

I've also changed Distinct to preserve the order as well (I haven't found a 
good expression of that yet so it's just English for now).

Slice is already order preserving.

> 
> 
> X.3 Evaluation semantics
> 
> Definition of evaluation of a graph pattern.
> For eval(D(G), Graph(var,P)) I don't think it is correct to
> use set-union; instead we need a multiset-union, the distinction
> being that multiset-union preserves cardinalities.  The
> notion of multiset-union might be used profitably in section
> X.2.1 "Algebra operators" in the definition of LeftJoin
> as well.  

Corrected.

I didn't find it simplifies LeftJoin in X.2.1.

> 
> 
> 
> typos not noted sprinkled among the substance
> ---------------------------------------------
> 

All corrected

> X.1.1 Mapping graph patterns
> first sentence after second set of bullets:
> "The results is" -> "The result is" or "The results are".
> I prefer the former.
> 
> sentence continues "is a SPARQL abstract query uses..."
> Insert "that" before "uses"
> 
> After step 4, third para, "a group pattern is mapping..."
> change "mapping" to "mapped"
> 
> pseudocode for handling OPTIONAL: Filter is not always
> capitalized.  I spot the third line "If SP[i] is..."
> 
> 
> X.2.1 Algebra operators
> definition of substitution function needs a colon between
> mu and V.
> 
> Third sentence after that definition,
> "Write mu(?x->t) for the substitution function mapping x to
> RDF term : { (x, t) }"  needs "t" after "RDF term".
> 
> first @@ note
> 
> Definition of Union: the sentence before this says
> "As this are distinct...".  What you mean is
> "As these are disjoint".
> 
> Definition of OrderBy: second sentence "...and the sequence is
> satisfies" - delete "is"
> 
> X.3 Evaluation semantics
> 
> Definition of evaluation of a union pattern
> second occurrence of "union" should be capitalized
> 
> 
> Fred
> 

Received on Wednesday, 3 January 2007 15:16:34 UTC