# Re: Bnodes and the RDF graph syntax (part 1).

From: Bijan Parsia <bparsia@cs.man.ac.uk>
Date: Tue, 22 Aug 2006 13:34:44 +0100
Message-Id: <57AAB6BC-1839-4025-818A-767B562246EC@cs.man.ac.uk>
Cc: RDF Data Access Working Group <public-rdf-dawg@w3.org>
To: Pat Hayes <phayes@ihmc.us>
```
Let me, for the sake of argument and my other deadlines, concede some
version of this. Indeed, let me strengthen it with a quote from the
RDF semantics:

""". Notice also that the blank nodes themselves are perfectly well-
defined entities; they differ from other nodes only in not being
assigned a denotation by an interpretation, reflecting the intuition
that they have no 'global' meaning (i.e. outside the graph in which
they occur).

...

This effectively treats all blank nodes as having the same meaning as
existentially quantified variables in the RDF graph in which they
occur, and which have the scope of the entire graph. In terms of the
N-Triples syntax, this amounts to the convention that would place the
quantifiers just outside, or at the outer edge of, the N-Triples
document corresponding to the graph. This in turn means that there is
a subtle but important distinction in meaning between the operation
of forming the union of two graphs and that of forming the merge. The
simple union of two graphs corresponds to the conjunction ( 'and' )
of all the triples in the graphs, maintaining the identity of any
blank nodes which occur in both graphs. This is appropriate when the
information in the graphs comes from a single source, or where one is
derived from the other by means of some valid inference process, as
for example when applying an inference rule to add a triple to a
graph. Merging two graphs treats the blank nodes in each graph as
being existentially quantified in that graph, so that no blank node
from one graph is allowed to stray into the scope of the other
graph's surrounding quantifier. This is appropriate when the graphs
come from different sources and there is no justification for
assuming that a blank node in one refers to the same entity as any
blank node in the other."""

(None of this requires the, well, unusual graph syntax stuff.
BNodes could quite easily be lexical entities, with conditions on when
you had to rewrite them in order to preserve meaning.)

Clearly, if you dork with the scope of the quantifier, the "lexical"
distinctness or similarity (or BNode identity) will carry through
all the formulae in the scope of the quantifier. So, yes, if you put
the quantifier outside ...er...the source graph, query, and result
set, you can have behavior in various variants as you've described.
The scoping graph and set are all to manage this, though they are
atrociously described.

However, let me point out three things: 1) we use the term "graph"
with abandon in the SPARQL query spec, 2) results have a life
outside the "query process", and 3) answer sets are subject to
interpretation and thus misinterpretation by users.

1) In point of fact, the background graph, named graphs, basic graph
pattern, and the construct template (at the least) all are either
straightforwardly RDF graphs, or have similar scoping conditions.
Thus, the *decision* to, e.g., ignore the graph edges, at least some
of the time, needs to be carefully examined and presented (e.g., to
add the results of a construct back to the graph in order to, say,
reach a fixed point of some sort seems not a good default behavior).
Given that the queried Graphs and the query document are going to,
typically, be very distinct things (e.g., they are more like the case
of things coming from different sources), I think we should
consider that a straightforward reading is going to scope things
rather narrowly. Now the common implementation scopes bnodes rather
widely, but I'm sceptical that that is due to this rather subtle

I'm still waiting for a case where the (narrowly scoped) existential
reading is worth all this toil and misunderstanding. I really don't
see it.

2) The particular situation we were discussing was the interpretation
of DISTINCT. Let me rephrase the issue: We definitely can think of
the results of a CONSTRUCT as being RDF-redundant, i.e., non-lean.
Why can we not think of answer sets in exactly the same way?

Yes, if we want to have sessions, we have to push the quantifiers out
even further, but that's one reason I think it's not such a great
idea. As a UMD rep, I pushed it, but I'll confess that I didn't
expect *this* many ramifications and difficulties. As a UMan rep, I
do not support it anymore. Personally, I think for portally cases
(which drove the UMD position) that I'd have my portal coin URIs. It
solves a world of problems.

3) My usability point is that people want to see BNodes (and URIs!)
as being subject to a form of the UNA. Fred's examples confirm this.
The specs should either embrace this (breaking with RDF) or firmly
reject it and not make it so easy to get confused about it. Having a
form of distinct that leans the answer set goes a long way to doing
that.

Finally, as a specification point, it would be very nice if BGP
matching and the subsequent algebra were fairly distinct.

Cheers,
Bijan.
```
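
The union/merge distinction quoted from the RDF semantics in the message above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: triples are modelled as plain tuples, blank nodes as strings beginning with `_:`, and the helper names (`is_bnode`, `union`, `merge`, `bnodes`) and the `ex:` terms are hypothetical, not taken from any RDF library or from the SPARQL drafts.

```python
from itertools import count

# Assumption: a triple is a (subject, predicate, object) tuple of strings,
# and a blank node is any term whose label starts with "_:".
def is_bnode(term):
    return isinstance(term, str) and term.startswith("_:")

def union(g1, g2):
    """Simple union: blank nodes sharing a label stay identical."""
    return set(g1) | set(g2)

def merge(g1, g2):
    """Merge: rename blank nodes per graph so none is shared, then union."""
    fresh = count()  # one counter for both graphs, so new labels never clash

    def rename(graph):
        mapping = {}  # old label -> fresh label, local to this graph
        renamed = set()
        for triple in graph:
            new_triple = []
            for term in triple:
                if is_bnode(term):
                    if term not in mapping:
                        mapping[term] = f"_:m{next(fresh)}"
                    term = mapping[term]
                new_triple.append(term)
            renamed.add(tuple(new_triple))
        return renamed

    return rename(g1) | rename(g2)

def bnodes(graph):
    """Collect the distinct blank node labels used in a graph."""
    return {term for triple in graph for term in triple if is_bnode(term)}

# Two graphs that happen to use the same blank node label.
g1 = {("_:x", "ex:name", '"Alice"')}
g2 = {("_:x", "ex:age", '"30"')}

united = union(g1, g2)   # one blank node: both triples describe the same thing
merged = merge(g1, g2)   # two blank nodes: no claim that they co-refer
```

In this sketch the union keeps a single `_:x`, so the two triples are read as being about one existentially quantified thing, while the merge keeps each graph's quantifier scope separate. Which of the two behaviours applies between the queried graph, the query, and the results is the scoping question the message is raising.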
Received on Tuesday, 22 August 2006 12:35:39 UTC
