Re: Bnodes and the RDF graph syntax (part 1). from Bijan Parsia on 2006-08-22 (public-rdf-dawg@w3.org from July to September 2006)

From: Bijan Parsia <bparsia@cs.man.ac.uk>
Date: Tue, 22 Aug 2006 13:34:44 +0100
To: Pat Hayes <phayes@ihmc.us>
Cc: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-Id: <57AAB6BC-1839-4025-818A-767B562246EC@cs.man.ac.uk>

Let me, for the sake of argument and my other deadlines, concede some  
version of this. Indeed, let me strengthen it with a quote from the  
RDF semantics:

"""Notice also that the blank nodes themselves are perfectly well-defined
entities; they differ from other nodes only in not being  
assigned a denotation by an interpretation, reflecting the intuition  
that they have no 'global' meaning (i.e. outside the graph in which  
they occur).

...

This effectively treats all blank nodes as having the same meaning as  
existentially quantified variables in the RDF graph in which they  
occur, and which have the scope of the entire graph. In terms of the  
N-Triples syntax, this amounts to the convention that would place the  
quantifiers just outside, or at the outer edge of, the N-Triples  
document corresponding to the graph. This in turn means that there is  
a subtle but important distinction in meaning between the operation  
of forming the union of two graphs and that of forming the merge. The  
simple union of two graphs corresponds to the conjunction ( 'and' )  
of all the triples in the graphs, maintaining the identity of any  
blank nodes which occur in both graphs. This is appropriate when the  
information in the graphs comes from a single source, or where one is  
derived from the other by means of some valid inference process, as  
for example when applying an inference rule to add a triple to a  
graph. Merging two graphs treats the blank nodes in each graph as  
being existentially quantified in that graph, so that no blank node  
from one graph is allowed to stray into the scope of the other  
graph's surrounding quantifier. This is appropriate when the graphs  
come from different sources and there is no justification for  
assuming that a blank node in one refers to the same entity as any  
blank node in the other."""
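To make the quoted union/merge distinction concrete, here is a minimal Python sketch. It assumes graphs are sets of (subject, predicate, object) string tuples and that blank nodes are written with the N-Triples `_:` prefix; the function names `union` and `merge` and the `_:m0`-style fresh labels are illustrative only, not taken from any spec or library.

```python
from itertools import count

Triple = tuple[str, str, str]


def is_bnode(term: str) -> bool:
    # Assumption: blank nodes use the N-Triples "_:" label prefix.
    return term.startswith("_:")


def union(g1: set[Triple], g2: set[Triple]) -> set[Triple]:
    # Plain set union: a blank-node label occurring in both graphs is
    # treated as ONE node, i.e. a single quantifier spans both graphs.
    return g1 | g2


def merge(g1: set[Triple], g2: set[Triple]) -> set[Triple]:
    # Standardize apart: relabel every blank node of g2 with a label not
    # used in g1, so no bnode "strays" into the other graph's scope.
    used = {t for triple in g1 for t in triple if is_bnode(t)}
    fresh = (f"_:m{i}" for i in count())
    renaming: dict[str, str] = {}

    def rename(term: str) -> str:
        if not is_bnode(term):
            return term
        if term not in renaming:
            label = next(fresh)
            while label in used:  # skip labels already taken in g1
                label = next(fresh)
            renaming[term] = label
        return renaming[term]

    g2_renamed = {tuple(rename(t) for t in triple) for triple in g2}
    return g1 | g2_renamed
```

For example, if g1 says `_:b ex:name "Alice"` and g2 says `_:b ex:age "30"`, the union has one blank node carrying both properties, while the merge relabels g2's node (here to `_:m0`) and so asserts two possibly distinct things.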

(None of this requires the, well, unusual graph syntax stuff. BNodes  
could quite easily be lexical entities, with conditions on when you  
had to rewrite them in order to preserve meaning.)

Clearly, if you dork with the scope of the quantifier, the "lexical"  
distinctness or similarity (or BNode identity) will carry through  
all the formulae in the scope of the quantifier. So, yes, if you put  
the quantifier outside ...er...the source graph, query, and result  
set, you can have behavior in various variants as you've described.  
The scoping graph and scoping set are there to manage this, though  
they are atrociously described.

However, let me point out three things: 1) we use the term "graph"  
with abandon in the SPARQL query spec, 2) results have a life  
outside the "query process", and 3) answer sets are subject to  
interpretation and thus misinterpretation by users.

1) In point of fact, the background graph, named graphs, basic graph  
pattern, and the construct template (at the least) all are either  
straightforwardly RDF graphs, or have similar scoping conditions.  
Thus, the *decision* to, e.g., ignore the graph edges, at least some  
of the time, needs to be carefully examined and presented (e.g.,  
adding the results of a construct back to the graph in order to,  
say, reach a fixed point of some sort does not seem a good default  
behavior).  
Given that the queried graphs and the query document are going to,  
typically, be very distinct things (e.g., they are more like the case  
of things coming from different sources), I think we should consider  
that a straightforward reading is going to scope things rather  
narrowly. Now, common implementations scope bnodes rather widely,  
but I'm sceptical that that is due to this rather subtle reading of  
BNodes.

I'm still waiting for a case where the (narrowly scoped) existential  
reading is worth all this toil and misunderstanding. I really don't  
see it.

2) The particular situation we were discussing was the interpretation  
of DISTINCT. Let me rephrase the issue: we definitely can think of  
the results of a CONSTRUCT as being RDF redundant, i.e., non-lean.  
Why can we not think of answer sets in exactly the same way?

Yes, if we want to have sessions, we have to push the quantifiers out  
even further, but that's one reason I think it's not such a great  
idea. As a UMD rep, I pushed it, but I'll confess that I didn't  
expect *this* many ramifications and difficulties. As a UMan rep, I  
do not support it anymore. Personally, I think for portal-style cases  
(which drove the UMD position) I'd have my portal coin URIs. It  
solves a world of problems.

3) My usability point is that people want to see BNodes (and URIs!)  
as being subject to a form of the unique name assumption (UNA).  
Fred's examples confirm this. The specs should either embrace this  
(breaking with RDF) or firmly reject it and not make it so easy to  
get confused about it. Having a form of distinct that leans the  
answer set goes a long way toward doing that.
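A "distinct that leans the answer set" could be sketched roughly as follows. This is a hypothetical illustration, assuming answer rows are tuples of terms in a fixed column order, blank nodes carry the `_:` prefix, and the blank nodes' quantifiers scope over the whole answer set (as they would for a CONSTRUCT result graph), so a label shared by two rows names the same thing. It borrows the lean-graph idea from RDF Semantics: keep replacing the set with an instance of itself (blank nodes mapped to terms) that is a proper subset. The names `lean` and `instantiate` are made up for this sketch, and the brute-force search is exponential in the number of blank nodes.

```python
from itertools import product

Row = tuple[str, ...]


def is_bnode(term: str) -> bool:
    # Assumption: blank nodes in answers use the "_:" label prefix.
    return term.startswith("_:")


def instantiate(mapping: dict[str, str], rows: frozenset[Row]) -> frozenset[Row]:
    # Replace blank nodes according to `mapping`; other terms are unchanged.
    return frozenset(tuple(mapping.get(t, t) for t in row) for row in rows)


def lean(rows: frozenset[Row]) -> frozenset[Row]:
    """Repeatedly replace the answer set by an instance of itself that is
    a proper subset, until no such instance exists (illustration only)."""
    while True:
        bnodes = sorted({t for row in rows for t in row if is_bnode(t)})
        terms = sorted({t for row in rows for t in row})
        # Try every mapping from the blank nodes to terms in the set.
        for image in product(terms, repeat=len(bnodes)):
            candidate = instantiate(dict(zip(bnodes, image)), rows)
            if candidate < rows:  # proper subset: some rows were redundant
                rows = candidate
                break
        else:
            return rows  # no shrinking instance exists: the set is lean
```

With rows `("_:a", '"Alice"')` and `("ex:alice", '"Alice"')`, ordinary DISTINCT keeps both, since `_:a` and `ex:alice` are different terms; `lean` keeps only the `ex:alice` row, because mapping `_:a` to `ex:alice` yields a proper subset. Rows `("_:a", "ex:p")` and `("_:a", "ex:q")` are left alone, since the shared label ties them together.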

Finally, as a specification point, it would be very nice if BGP  
matching and the subsequent algebra were fairly distinct.

Cheers,
Bijan.
Received on Tuesday, 22 August 2006 12:35:39 UTC