Re: SPARQL Semantics document from Enrico Franconi on 2005-11-01 (public-rdf-dawg@w3.org from October to December 2005)

From: Enrico Franconi <franconi@inf.unibz.it>
Date: Tue, 1 Nov 2005 16:58:39 +0100
To: Pat Hayes <phayes@ihmc.us>
Cc: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-Id: <d8ff8f0d4046b2323814570fd3267b08@inf.unibz.it>
Hi Pat.

Most of your comments derive from the fact that the semantics document  
<http://www.inf.unibz.it/krdb/w3c/sparql/> is only a terse sequence of  
formal definitions, without explanations: I'll try to explain the sense  
of our definitions, and why we still believe that they are correct. I  
will also explicitly comment at the end of this message on your points  
8. and 9 in  
<http://lists.w3.org/Archives/Public/public-rdf-dawg/2005OctDec/ 
0139.html>.

First of all, let me summarise three different levels of requests (in  
relation to your second message  
<http://lists.w3.org/Archives/Public/public-rdf-dawg/2005OctDec/ 
0141.html>):

l1) the bnode names in the answer set are arbitrary, as they logically  
should (current version of the spec);

l2) the bnode names in the answer set are the same as the told bnode  
names in the dataset (so that their meaning/origin is clearer to the  
user (after all, in software engineering they keep telling us that the  
names of the variables are important), and so that (l3) becomes  
possible); of course, (l2) implies (l1);

l3) the bnodes names in a query have to be treated as told-bnodes,  
i.e., they have to match only with bnodes with the same name in the  
dataset.

Our current semantics document covers currently (l2), but at the end  
I'll show how we planned to have (l3) as well (even if this did not  
find its way in the semantics document yet), in a different way to what  
you are proposing in  
<http://lists.w3.org/Archives/Public/public-rdf-dawg/2005OctDec/ 
0141.html>.

Given the (l2) requirement, our definition of answer set is correct,  
including all the details you were arguing about; so:

G |= (G U+ Q)[s]

where there is RDF-merge U+ rather than union U;
where U+ considers the order in which the bnodes appear (from left to  
right);
where the substitution [s] is applied after the RDF-merge;
where the substitution [s] is using only bnodes from G.
Changing any of the above would lead to a wrong definition, we argue.

The RDF-merge is necessary to be sure that the query has bnode names  
that not clash with the bnode names of G; the order of the bnodes in  
RDF-merge is necessary since only bnode names in the query but not in  
the graph have to be changed in the case of clash; if the substitution  
[s] is applied directly to Q, rather than to the result of the  
RDF-merge, then bnodes in the assignment may erroneously clash with the  
names in G, even if you want them to corefer; if the substitution [s]  
is allowed to pick arbitrary bnodes, we wouldn't be able to distiguish  
the relevant told-bnodes in the answer from arbitrary renamings.

Regarding (l3), in the case a user wants to fix some bnode name in the  
query to refer exactly to the same bnode name in the dataset, our (yet  
informal) idea is to extend our setting to consider those told bnode  
names in the query as variables, and to constrain the substitution to  
fix the assignment of those variables to the told bnode names.

> 8. Is Lemma 1.3 really correct? The query might fail to notice the  
> structure which keeps the graph lean.

Oops, you are right. In fact, it holds only for lean graphs *without*  
bnodes (as explained in parenthesis). Note also that lemma 1.5 makes a  
sloppy use of the notion of "redundant" triple of a graph; we have to  
find a precise definition for that.

> 9. I do not understand the terminology used in lemma 1.4. What is a  
> subgraph isomorphism? What is a projection over bnodes? What is the  
> query variable of an isomorphism?

Standard notations in algebra: a subgraph isomorphism between two  
graphs G and H is a bijective map f from the vertices and edges of G to  
the vertices and edges of a subgraph of H that preserves the edge  
structure, in the sense that there is an edge e from vertex u to vertex  
v in G iff there is an edge f(e) from f(u) to f(v) in a subgraph of H;  
a projection of f over a subset of vertices and edges in G restricts  
the domain of f to those subset of vertices and edges.
In our case, vertices and edges are labelled by bnodes, URIs or  
literals; some bnodes of G may be originated by variables; we restrict  
f such that f(u)=u if u is labelled by a URI or literal (i.e., it is a  
matching), and f(u)=v for an arbitrary v in H if u is labelled by a  
bnode.

cheers
--e.
Received on Tuesday, 1 November 2005 15:58:51 UTC