Semantics of SPARQL (Perez, Arenas, and Gutierrez) from Lee Feigenbaum on 2006-10-26 (public-rdf-dawg@w3.org from October to December 2006)

From: Lee Feigenbaum <feigenbl@us.ibm.com>
Date: Thu, 26 Oct 2006 15:42:40 -0400
To: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-ID: <OF59BB689C.89ECA77D-ON85257213.0063C75E-85257213.006C4653@us.ibm.com>
In the hopes that we might be able to base or at least derive many of our 
definitions from the work done in the latest paper from Perez, Arenas, and 
Gutierrez, I've gone through the paper and scribbled down my reactions 
here. 

In summary: I think the formal definitions within this paper are clear and 
concise (and as best as I can tell from my examination, accurate). The 
paper deals with solutions to BGPs based on subgraph matching, and hence 
those definitions are likely not appropriate for the SPARQL QL 
specification. That said, those definitions are orthogonal to the 
definitions of the algebraic operations that combine sets of solutions, 
and I would be in favor of modeling the SPARQL QL specification's 
definitions of the operators on this work. 

More specific comments and notices follow:

Section1 - Preliminary Notions

The definitions in this section are straightforward and easy to 
understand. The main difference from the rq24 draft is that the paper 
defines a mapping (a.k.a pattern solution in rq24) as a partial function, 
whereas rq24 specifies that the domain of a solution is all variables.


Section 2 - Syntax and Semantics of Basic Graph Patterns

This section defines basic graph pattern matching as subgraph matching. It 
does not take into account blank nodes in triple patterns. It is clear to 
point out that the solution of an empty BGP is a single solution with no 
bindings. These definitions give a clear and concise definition of 
BGP-matching based on subgraph matching in the absence of blank nodes 
within triple patterns.


Section 3 - Syntax and Semantics of General Graph Patterns

This section presents the algebraic operations that create new sets of 
mappings (solutions) from operand sets of mappings. 

In Sec. 3.1, filters are modeled as a binary operator that apply to an 
arbitrary graph pattern and a constraint. Our recent discussions apply a 
FILTER to a group graph pattern. However, a group graph pattern has no 
existence within this paper's way of viewing the world. (In this paper, a 
group graph pattern is simply a set of patterns joined together. We can 
view a group graph pattern as an operational tree, (AND(x, AND(y, z))), 
then we're suggesting that FILTERs be attached to the node that represents 
this tree. I'm not sure how to formally state this constraint without 
mapping the curly braces in SPARQL syntax { ... } to a unary operator. 
Such an operator would have no semantics of its own. That is, if P is a 
graph pattern than GROUP(P) is a graph pattern and [[GROUP(P)]](D, G) = 
[[P]](D, G) (i.e., no semantics of its own). That would allow #4 in 
definition 3.3 to say something like if P is a graph pattern and R a value 
constraint, then ((GROUP P) FILTER R) is a graph pattern.

Alternatively, these definitions could be used (defined in terms of a 
FILTER applied to any graph pattern), and the scoping to a group could be 
done entirely via prose, but after spending a few minutes thinking about 
it, I'm having trouble figuring out a precise way to state that in a 
manner that directly relates back to the algebra.


Def. 3.4 defines how solutions satisfy constraints. If we adopt this 
definition (which is again a clear and consise definition), then we'd need 
to reference the full semantics of Section 11 to define truth for atomic 
value constraints.

I am a very big fan of definitions 3.5, 3.7, and 3.9. They pretty closely 
match how my implementation composes solutions, and provide useful 
terminology (e.g. "compatible mappings") to explain and discuss the 
algebra.

(As an aside which I'll send another note about, I agree with the 4th part 
of definition 3.9, and it's why I think the filter-scope-3 and 
filter-scope-4 tests are good tests, but I'll send another note about 
that.)


Section 4 - Semantics of Query Result Forms

These definitions are fine and seem accurate, but:

  a) I haven't seen nearly as many calls for more formal definitions of 
result forms, so I'd be less inclined to adopt these (though I don't see 
anything wrong with them on the surface)
  b) They don't include ASK (which would be trivial to add) or DESCRIBE 
(which can't be added since the semantics are purposefully vague)


Section 5 - Extensions to the Subgraph Matching Approach

5.1 extends BGP matching for graph patterns that contain blank nodes by 
adding an explicit mapping function from blank nodes to RDF terms. This is 
straightforward, but does not treat blank nodes as pure existentials (if I 
understand correctly, which I may not).

5.2 is a cardinality discussion. I must admit that I have been negligent 
in following the current cardinality discussion initiated by Fred Z. and 
continued in this past week's telecon and elsewhere -- in any case, 
section 5.2 seems very straightforward to me. In particular, the formal 
definition of cardinality based on the operations that combine BGPs are 
straightforward, and I imagine would apply regardless of the entailment 
regime (as long as the entailment regime defines the proper cardinality of 
solutions to BGPs).  (The authors acknowledge this later in Remark 6.3.)

Lee
Received on Thursday, 26 October 2006 19:42:55 UTC