Operational definition of SPARQL algebra

This message is an outline for an operational definition of the SPARQL algebra.

There is advocacy for this change in the working group; this message is to 
check that with everyone; to get early feedback, and to check we are agreed on 
what it means.  The style of approach isn't new - Fred's work embodies this 
style as does the Chilean paper.

This is not about syntax at this stage.

== Preliminaries

BGP matching: this is taken as an input and we assume that any BGP matches the
target graph and generates solutions with every mentioned named query variable
having a binding.  We continue to allows for different entailment regimes and
specifically use with OWL-DL.  There is no assumptions about blank nodes and
mappings (which only affects duplicates).

The output of a evaluating a BGP is a collection of solutions.

== Outline

The SPARQL algebra consists of the following operators for manipulating 
collections of solutions (type R):

    Join(R,R) -> R
    LeftJoin(R,R) -> R
    Union(R,R) -> R
    filter(R) -> R

I have no particular attachment to the choice of operator names.  In 
particular, calling it "optional", not "left join" might be better,

Join and left join need to be defined for the SPARQL case because they are not 
exactly as defined for SQL or in a relational algebra.

Evaluation of an expression is functional (AKA bottom-up; constructional 
(Fred's work [1] and as worked out by Jorge et al [2]).

    UNION is union
    Filters go at the end of groups conceptually
      (i.e. after compositional operations).
    OPTIONAL is left-join.

    A group is a sequence of the join/left-join the filters of its elements.
      Elements are already evaluated.
      Evaluation in a group is ordered left to right
       (left-nested; left-associative).
      Using {} will change the order of evaluation if required.


== Some interesting queries:

Some interesting queries to consider: not problems or show stoppers, just 
interesting.

{}

By pure construction, this is could be considered to be empty (no 
construction).  So inserting it into a pattern would be an operation on an 
empty solution collection resulting in an empty solution.

Currently, we define it to have one solution with no bindings - we should 
continue to do so.


{ OPTIONAL {...} }
evaluates as { {} OPTIONAL {...} }  The principle is that every group has at
least the unit table in it or that every group has the unit table on the left 
hand side.


{ FILTER(expr) } evaluates as { {} FILTER(expr) } which
means that { FILTER(expr) } does not propagate the filter effect.  This is a 
change from the declarative approach (top-down) where any pattern applies the 
whole solution being considered, not the part being constructed.


{ pattern[?x] OPTIONAL { pattern[!?x] OPTIONAL { pattern[?x] } }

[?x] means mentions ?x, [!?x] means does not mention ?x.
Like the filter case above, this is a change from the declarative approach.
Discussed in Jorge's paper and (for example) [3].


{ { P1 FILTER(expr) } P2 }
evaluates the filter before P2.  Might matter if filter has a
!bound() in it or relies on a bound variable, and P2 mentions that variable 
and p1 does not.


== Notes

The domain of solutions naturally emerges (it happens as a result of the
definitions) because the constructional nature.

No need for Chilean proposal 4 for additional restrictions - which I feel is
too strong anyway (example below):

SELECT *
{
     ?x rdf:type skos:Concept .
     OPTIONAL { ?x skos:prefLabel ?label }
     OPTIONAL { ?x skos:altLabel  ?label }
}

which gets labels in preferred order.

 Andy


[1] http://lists.w3.org/Archives/Public/public-rdf-dawg/2006JulSep/0033
[2] http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2006Jun/0008
[3] http://tech.groups.yahoo.com/group/jena-dev/message/25717?var=0&l=1

Received on Friday, 20 October 2006 13:18:48 UTC