Re: Semantics of SPARQL (Perez, Arenas, and Gutierrez) from Seaborne, Andy on 2006-10-27 (public-rdf-dawg@w3.org from October to December 2006)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Fri, 27 Oct 2006 16:15:19 +0100
To: Lee Feigenbaum <feigenbl@us.ibm.com>
CC: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-ID: <45422287.9030308@hp.com>
Lee Feigenbaum wrote:
> 
> In the hopes that we might be able to base or at least derive many of 
> our definitions from the work done in the latest paper from Perez, 
> Arenas, and Gutierrez, I've gone through the paper and scribbled down my 
> reactions here.

Thanks for doing that.

I'm leaning towards the multiset + blank node variation of their formalization.

> 
> In summary: I think the formal definitions within this paper are clear 
> and concise (and as best as I can tell from my examination, accurate). 
> The paper deals with solutions to BGPs based on subgraph matching, and 
> hence those definitions are likely not appropriate for the SPARQL QL 
> specification. That said, those definitions are orthogonal to the 
> definitions of the algebraic operations that combine sets of solutions, 
> and I would be in favor of modeling the SPARQL QL specification's 
> definitions of the operators on this work.

BGP matching is a nice formalization of the implementation : their 
formalization uses two mappings, one for blank nodes, then one for named 
variables.

...
> Section 3 - Syntax and Semantics of General Graph Patterns
> 
> This section presents the algebraic operations that create new sets of 
> mappings (solutions) from operand sets of mappings.
> 
> In Sec. 3.1, filters are modeled as a binary operator that apply to an 
> arbitrary graph pattern and a constraint. Our recent discussions apply a 
> FILTER to a group graph pattern. However, a group graph pattern has no 
> existence within this paper's way of viewing the world. (In this paper, 
> a group graph pattern is simply a set of patterns joined together. We 
> can view a group graph pattern as an operational tree, (AND(x, AND(y, 
> z))), then we're suggesting that FILTERs be attached to the node that 
> represents this tree. I'm not sure how to formally state this constraint 
> without mapping the curly braces in SPARQL syntax { ... } to a unary 
> operator. Such an operator would have no semantics of its own. That is, 
> if P is a graph pattern than GROUP(P) is a graph pattern and 
> [[GROUP(P)]](D, G) = [[P]](D, G) (i.e., no semantics of its own). That 
> would allow #4 in definition 3.3 to say something like if P is a graph 
> pattern and R a value constraint, then ((GROUP P) FILTER R) is a graph 
> pattern.

We could define group as a pair : an operator tree (join and left-join 
expression in the order given) and a set of filters.  Define GROUP(P, F) and 
define [[GROUP(P,F)]] as P FILTER G.

As this is quite central to SPARQL, it might be better to write all the 
expressions in functional form: JOIN(P1, P2) etc, not infix, because the left 
associativity is explicit.

> 
> Alternatively, these definitions could be used (defined in terms of a 
> FILTER applied to any graph pattern), and the scoping to a group could 
> be done entirely via prose, but after spending a few minutes thinking 
> about it, I'm having trouble figuring out a precise way to state that in 
> a manner that directly relates back to the algebra.
> 
> 

...
> Section 5 - Extensions to the Subgraph Matching Approach
> 
> 5.1 extends BGP matching for graph patterns that contain blank nodes by 
> adding an explicit mapping function from blank nodes to RDF terms. This 
> is straightforward, but does not treat blank nodes as pure existentials 
> (if I understand correctly, which I may not).

If the mu mapping is V and theta is B (variable and blank node), it depends on 
how you define card(V(B)).

The "implementation hint" says there is one mapping X covering both variables 
and blank nodes, then the solution is that mapping "restricted to just the 
variables".

If V is the projection of X on a restricted domain, project(X, dom(P)) then it 
seems natural that card(V) = card(X) : it's more like B parametrizes V and
   card(X) = card(V)*card(B)
It's as if the query reveals some of the reason behind the choices of B.
The projection can be moved out to the projection in SELECT.

If it is V is a function (set of pairs) of the restriction, then it's
card(V) = 1 (blank nodes do not contribute to counting).  It's 
distinct(project(X), dom(P)).

Which style do you prefer?

> 
> 5.2 is a cardinality discussion. I must admit that I have been negligent 
> in following the current cardinality discussion initiated by Fred Z. and 
> continued in this past week's telecon and elsewhere -- in any case, 
> section 5.2 seems very straightforward to me. In particular, the formal 
> definition of cardinality based on the operations that combine BGPs are 
> straightforward, and I imagine would apply regardless of the entailment 
> regime (as long as the entailment regime defines the proper cardinality 
> of solutions to BGPs).  (The authors acknowledge this later in Remark 6.3.)

GRAPH does not appear to introduce duplicates.  The proof of 5.9 does not 
cover this but I think that (informally) the substitution ?X->v labels each 
[[P]] in the union in 3.9/3 keeping them distinct.

	Andy

> 
> Lee
Received on Friday, 27 October 2006 15:15:39 UTC