# Semantics of SPARQL (Perez, Arenas, and Gutierrez)

From: Lee Feigenbaum <feigenbl@us.ibm.com>
Date: Thu, 26 Oct 2006 15:42:40 -0400
To: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-ID: <OF59BB689C.89ECA77D-ON85257213.0063C75E-85257213.006C4653@us.ibm.com>
In the hopes that we might be able to base or at least derive many of our
definitions from the work done in the latest paper from Perez, Arenas, and
Gutierrez, I've gone through the paper and scribbled down my reactions
here.

In summary: I think the formal definitions within this paper are clear and
concise (and as best as I can tell from my examination, accurate). The
paper deals with solutions to BGPs based on subgraph matching, and hence
those definitions are likely not appropriate for the SPARQL QL
specification. That said, those definitions are orthogonal to the
definitions of the algebraic operations that combine sets of solutions,
and I would be in favor of modeling the SPARQL QL specification's
definitions of the operators on this work.

More specific comments and notices follow:

Section1 - Preliminary Notions

The definitions in this section are straightforward and easy to
understand. The main difference from the rq24 draft is that the paper
defines a mapping (a.k.a pattern solution in rq24) as a partial function,
whereas rq24 specifies that the domain of a solution is all variables.

Section 2 - Syntax and Semantics of Basic Graph Patterns

This section defines basic graph pattern matching as subgraph matching. It
does not take into account blank nodes in triple patterns. It is clear to
point out that the solution of an empty BGP is a single solution with no
bindings. These definitions give a clear and concise definition of
BGP-matching based on subgraph matching in the absence of blank nodes
within triple patterns.

Section 3 - Syntax and Semantics of General Graph Patterns

This section presents the algebraic operations that create new sets of
mappings (solutions) from operand sets of mappings.

In Sec. 3.1, filters are modeled as a binary operator that apply to an
arbitrary graph pattern and a constraint. Our recent discussions apply a
FILTER to a group graph pattern. However, a group graph pattern has no
existence within this paper's way of viewing the world. (In this paper, a
group graph pattern is simply a set of patterns joined together. We can
view a group graph pattern as an operational tree, (AND(x, AND(y, z))),
then we're suggesting that FILTERs be attached to the node that represents
this tree. I'm not sure how to formally state this constraint without
mapping the curly braces in SPARQL syntax { ... } to a unary operator.
Such an operator would have no semantics of its own. That is, if P is a
graph pattern than GROUP(P) is a graph pattern and [[GROUP(P)]](D, G) =
[[P]](D, G) (i.e., no semantics of its own). That would allow #4 in
definition 3.3 to say something like if P is a graph pattern and R a value
constraint, then ((GROUP P) FILTER R) is a graph pattern.

Alternatively, these definitions could be used (defined in terms of a
FILTER applied to any graph pattern), and the scoping to a group could be
done entirely via prose, but after spending a few minutes thinking about
it, I'm having trouble figuring out a precise way to state that in a
manner that directly relates back to the algebra.

Def. 3.4 defines how solutions satisfy constraints. If we adopt this
definition (which is again a clear and consise definition), then we'd need
to reference the full semantics of Section 11 to define truth for atomic
value constraints.

I am a very big fan of definitions 3.5, 3.7, and 3.9. They pretty closely
match how my implementation composes solutions, and provide useful
terminology (e.g. "compatible mappings") to explain and discuss the
algebra.

(As an aside which I'll send another note about, I agree with the 4th part
of definition 3.9, and it's why I think the filter-scope-3 and
filter-scope-4 tests are good tests, but I'll send another note about
that.)

Section 4 - Semantics of Query Result Forms

These definitions are fine and seem accurate, but:

a) I haven't seen nearly as many calls for more formal definitions of
result forms, so I'd be less inclined to adopt these (though I don't see
anything wrong with them on the surface)
b) They don't include ASK (which would be trivial to add) or DESCRIBE
(which can't be added since the semantics are purposefully vague)

Section 5 - Extensions to the Subgraph Matching Approach

5.1 extends BGP matching for graph patterns that contain blank nodes by
adding an explicit mapping function from blank nodes to RDF terms. This is
straightforward, but does not treat blank nodes as pure existentials (if I
understand correctly, which I may not).

5.2 is a cardinality discussion. I must admit that I have been negligent
in following the current cardinality discussion initiated by Fred Z. and
continued in this past week's telecon and elsewhere -- in any case,
section 5.2 seems very straightforward to me. In particular, the formal
definition of cardinality based on the operations that combine BGPs are
straightforward, and I imagine would apply regardless of the entailment
regime (as long as the entailment regime defines the proper cardinality of
solutions to BGPs).  (The authors acknowledge this later in Remark 6.3.)

Lee
Received on Thursday, 26 October 2006 19:42:55 UTC

This archive was generated by hypermail 2.3.1 : Wednesday, 7 January 2015 15:00:52 UTC