Re: rq23 def'n "Pattern Solution" wrong? (and more on BGP') from Pat Hayes on 2006-03-05 (public-rdf-dawg@w3.org from January to March 2006)

From: Pat Hayes <phayes@ihmc.us>
Date: Sun, 5 Mar 2006 11:26:31 -0800
To: Enrico Franconi <franconi@inf.unibz.it>
Cc: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-Id: <p0623090ec030dfd9b11b@[192.168.1.4]>
>On 1 Mar 2006, at 22:08, Pat Hayes wrote:
>
>>>1) In the first part of my original message I argued that, by carefully
>>>looking at the normative semantics RDF-MT, bnodes are always
>>>interpreted autonomously within the RDF graph where they appear, no
>>>matter where bnodes come from or which (abstract) syntax identity they
>>>have. So, while this is an argument against having BGP', this also
>>>shows that it is absolutely harmless to have BGP'.
>>
>>The RDF MT defines a graph simply as a set of triples, so itself 
>>does not provide any way to determine the scope of a blank node. A 
>>single triple, and any bnode which it contains, might occur in many 
>>different RDF graphs. And the RDF MT refers to graphs, not to 
>>entities such as query patterns and answer binding sets, neither of 
>>which are RDF graphs. The SPARQL spec does not mention where the 
>>RDF graph boundaries are intended to be maintained, so it is up to 
>>us to make sure that the definitions draw them correctly.
>
>G' is a graph, S(BGP') is a graph, (G' union S(BGP')) is a graph, 
>the CONSTRUCTed graph from an answer set is a graph. The abstract 
>syntax representations of these graphs shouldn't care about the 
>identity of the bnodes in them (since their interpretation is 
>autonomously defined)

No, this is a mistake. The abstract syntax does care about identity 
of bnodes between graphs, because graphs behave differently with 
respect to combination depending on whether or not they share bnodes. 
This is exactly how the abstract syntax handles what is more usually 
described using some notion of identifier scope. That is, the 
abstract syntactic properties of RDF graphs are affected by the 
co-occurrence of blank nodes. This point is independent of the 
semantics of an RDF graph: it has to do with the nature of the ways 
that RDF graphs are combined. If A has three blank nodes and B has 
two blank nodes, then their union may have any number between three 
and five blank nodes, depending on the ways that A and B share or do 
not share blank nodes.
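
The counting argument above can be sketched as follows. This is only an illustration, assuming a graph is modelled as a Python set of triples and a blank node as an object compared by identity; the names BNode, bnodes and graph_with, and the ex: terms, are invented for the example and come from no spec or library.

```python
class BNode:
    """Blank node stand-in; equality is plain object identity."""

def bnodes(graph):
    """Return the set of blank nodes occurring anywhere in a set of triples."""
    return {term for triple in graph for term in triple if isinstance(term, BNode)}

def graph_with(nodes):
    """Build a toy graph with one triple per blank node (hypothetical data)."""
    return {(node, "ex:p", "ex:o") for node in nodes}

a1, a2, a3 = BNode(), BNode(), BNode()
fresh1, fresh2 = BNode(), BNode()

A = graph_with([a1, a2, a3])            # three blank nodes
B_disjoint = graph_with([fresh1, fresh2])  # shares nothing with A
B_one_shared = graph_with([a1, fresh1])    # shares one blank node with A
B_all_shared = graph_with([a1, a2])        # both blank nodes also occur in A

# The union is ordinary set union; only the sharing pattern changes the count.
counts = [len(bnodes(A | B)) for B in (B_disjoint, B_one_shared, B_all_shared)]
print(counts)  # [5, 4, 3]
```

In each case B has exactly two blank nodes; what differs is only whether they co-occur in A.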

As you point out, the particular identity of bnodes in RDF graphs is 
unimportant to their meaning, so imposing disjointness is not any 
limitation on expressiveness of answer sets. For the purposes of 
interpretation, we can think of an RDF graph as a mathematical ideal 
under the group of 1:1-onto mappings from the set of all bnodes to 
itself. (We did at one point consider making such a definition.) 
However, this view of graphs does not support the operations of the 
graph syntax: it does not allow for the possibility of forming the 
union of two graphs. For unions to be meaningful, we have to respect 
the particular identity of bnodes: but this is the only reason.
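
A small sketch of why renaming is harmless for a single graph but not for unions, under the same toy assumptions (a graph as a Python set of triples, blank nodes as objects compared by identity; BNode, bnodes, rename and the ex: terms are hypothetical names for this illustration):

```python
class BNode:
    """Blank node stand-in; equality is plain object identity."""

def bnodes(graph):
    """Return the set of blank nodes occurring anywhere in a set of triples."""
    return {term for triple in graph for term in triple if isinstance(term, BNode)}

def rename(graph, mapping):
    """Apply a blank-node renaming to every term of every triple."""
    return {tuple(mapping.get(term, term) for term in triple) for triple in graph}

x, y = BNode(), BNode()
G = {(x, "ex:p", "ex:a")}
H = {(x, "ex:q", "ex:b")}   # H shares the blank node x with G
G2 = rename(G, {x: y})      # same graph as G up to a 1:1 renaming

# Taken alone, G and G2 differ only by a renaming that can be undone.
round_trip_ok = rename(G2, {y: x}) == G

# Their unions with H differ: one shared blank node versus two unrelated ones.
shared_count = len(bnodes(G | H))
renamed_count = len(bnodes(G2 | H))
print(round_trip_ok, shared_count, renamed_count)  # True 1 2
```

G union H describes one thing with both properties, whereas G2 union H describes two possibly different things, so the union is not well defined on graphs taken only up to renaming.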

>, nor should limit them in any way.
>
>>>2) In the second part of my original message I argued that if we don't
>>>have BGP' then the abstract syntax of answer sets is limited in a very
>>>peculiar way, disallowing answer sets that contains bnodes that may
>>>appear in the query.
>>
>>But this is not a limitation, since given the document conventions 
>>already in use, there is no way they could possibly share a bnode.
>
>What you are saying has nothing to do with the abstract syntax; this 
>has to do with the concrete document syntax.

I am aware of the implications of what I say. My point was that the 
abstract syntactic conditions should accurately mirror the scoping 
conditions in place for the surface syntax that we use, in the spec, 
to describe the abstract syntax. These conditions require that the 
sets of bnodes in the answer bindings, and those in the query 
pattern, are disjoint.

>The definitions in 2.5 shape the way an answer set could look like 
>in its abstract syntax, independently on its linearisation.

Quite. But there is little point in phrasing them so as to allow 
shapes which cannot be specified by any surface syntax; and in fact, 
to do so is an error, IMO, since it implies a generality - in this 
case, the possibility of identifying a blank node in an answer with a 
blank node from the query - which we cannot in fact provide to users. 
If a user were to interpret the answer bindings in a way which 
conforms to this possibility, that would be a misinterpretation of 
the SPARQL spec.

>>>This restriction is useless, since we know (point
>>>1 above) that bnodes are always interpreted autonomously within the
>>>graph where they appear, so having the same bnodes as in the query is
>>>fine anyway. This restriction is bad, since not every equivalent answer
>>>set would be legal in sparql.
>>
>>Can you elaborate on that last point? Perhaps with an example? It 
>>seems to be a new point in this discussion.
>
>Uh? This has been my point since ever. Here we go again:
>
>For example, suppose to have two engines that, given the same data 
>and the same query, differ only on how they represent (in abstract 
>syntax)

The representation used by an engine will of necessity be some 
concrete syntax, not the abstract syntax itself.

>the final CONSTRUCTed graph. The first one chooses not to use in the 
>final CONSTRUCTed graph any bnode appearing in the query.

That does not make sense. Engines cannot use blank nodes from the 
abstract syntax, which are abstract mathematical entities: they must 
use some concrete data structure.

>The second one uses in the final CONSTRUCTed graph some bnode 
>appearing in the query.

What does that mean? Engines cannot access particular bnodes: bnodes 
are not data structures. Do you mean it to be the case that if an 
engine were to perform an operation on its representation 
corresponding to forming the union of the query and the CONSTRUCTed 
graph from this query, that they would have a common bnode? This 
would not be supported by the SPARQL spec, but it would follow from 
your definitions. I suggest that it should not follow from them, and 
that to allow this possibility is an error, as it could imply 
consequences which are not intended by the definitions, and 
entailments which are not supported by any intended reading of the 
SPARQL surface syntactic rules.

>The two CONSTRUCTed graphs are graph-equivalent, but if we choose 
>not to have BGP' then the second engine would not satisfy the 
>conditions in 2.5, and therefore could not be called SPARQL 
>compliant.

As it should not be. We should require that SPARQL engines keep 
bnodes in answer bindings distinct from bnodes in queries, as our own 
document scoping rules imply this separation.

>Compare this with "2.7 Blank Nodes in Query Results" 
><http://www.w3.org/2001/sw/DataAccess/rq23/#BlankNodesInResults>. 
>There it is said that the bnode identity in the result is obviously 
>irrelevant:
>"These two results have the same information: the blank nodes used 
>to match the query are different in the two solutions. There is no 
>relation between using _:a in the results and any blank node label 
>in the data graph."

This is talking about bnode LABELS in documents. The 'no relation 
between' is an informal way of saying that the query and results 
documents have disjoint label scopes, so the label "_:a" in one does 
not indicate the same bnode as the same label in the other. The 
corresponding mathematical way of saying the same thing is that the 
sets of bnodes in the underlying abstract structures are disjoint. If 
the sets were not disjoint, then it would be incorrect to say that 
the bnode label scopes of those documents were separate.
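
The correspondence between separate label scopes and disjoint bnode sets can be sketched like this. It is a toy illustration only: parse is a hypothetical whitespace-splitting reader, not any real RDF or SPARQL parser, and the triples are invented. The one assumption that matters is that each document gets its own label table.

```python
class BNode:
    """Blank node stand-in; equality is plain object identity."""

def bnodes(graph):
    """Return the set of blank nodes occurring anywhere in a set of triples."""
    return {term for triple in graph for term in triple if isinstance(term, BNode)}

def parse(lines):
    """Read 'subject predicate object' lines using a per-document label scope."""
    scope = {}   # label -> BNode, local to this one document
    graph = set()
    for line in lines:
        terms = []
        for token in line.split():
            if token.startswith("_:"):
                if token not in scope:
                    scope[token] = BNode()  # first use of the label in this document
                terms.append(scope[token])
            else:
                terms.append(token)
        graph.add(tuple(terms))
    return graph, scope

query_graph, query_scope = parse(["_:a ex:knows ex:bob"])
result_graph, result_scope = parse(["_:a ex:name ex:alice"])

# The same label in two documents does not denote the same blank node.
same_node = query_scope["_:a"] is result_scope["_:a"]
shared = bnodes(query_graph) & bnodes(result_graph)
print(same_node, len(shared))  # False 0
```

If the two calls shared one scope table instead, the label "_:a" would pick out a single blank node in both graphs, which is exactly the non-disjoint case.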

>However, without BGP' you limit the choice of bnodes in the result 
>*not* to include some bnodes, namely the ones appearing in the query.
>This is bad.

Not only is it not bad, it is required, in order to make the abstract 
syntax definitions correctly correspond to the document scope 
definitions.

There is of course no way in any surface syntax to identify a 
particular blank node, so discussions of absolute identity of blank 
nodes are syntactically meaningless: it is meaningful only to discuss 
sharing or not of bnodes between structures. Imposing disjointness 
between bnode sets is however a real syntactic constraint, one that 
is reflected in detectable properties of a surface syntax: in our 
case, it means that SPARQL does not support an answer semantics which 
would require bnodes to be shared between query patterns and answer 
bindings. And indeed, it does not. So, we should phrase the 
definitions to reflect this fact.

Pat

-- 
---------------------------------------------------------------------
IHMC		(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32502			(850)291 0667    cell
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Sunday, 5 March 2006 19:26:52 UTC