Re: bnodes from Bijan Parsia on 2007-10-02 (public-owl-dev@w3.org from October to December 2007)

From: Bijan Parsia <bparsia@cs.man.ac.uk>
Date: Tue, 2 Oct 2007 10:21:30 +0100
To: Reto Bachmann-Gmür <rbg@talis.com>
Cc: public-owl-dev@w3.org
Message-Id: <7A2ED682-93E3-46E3-896F-21104087B90C@cs.man.ac.uk>
On 2 Oct 2007, at 09:30, Reto Bachmann-Gmür wrote:

> Hi Bijan,
>
> You wrote:
>
>> However, the common, deployed semantics for BNodes is that they are
>> local names, not existential variables. SPARQL treats them that way.
> and
>> If so, then you are pro-existential variable semantics. If not, then
>> you are sane, er, in favor of a somewhat different approach like most
>> of the world.
> I'm wondering why you think most of the world is violating rdf-semantics,

Because it's true! The *best* reason to believe something.

"Violating" is a bit strong. "Ignoring" is better. It's just that  
most of the time ignoring is "compatible", in some sense, but the  
proof comes out when you put things to the test. In RDF, given the  
weakness of the language, the tests are easier to avoid. But, for  
example, no OWL engine treats bnodes the same as someValuesFrom restrictions.

> do you have any stats or at least examples? Note that a tool  
> doesn't need to enforce lean-graphs all of the time,

True. But how many leaners are out there? Production quality? How
many entailment testers are out there? Production quality? How many
applications make use of either? How many treat triples from
documents as stable and sacrosanct? (I mean, not eliminating the
*easy* cases?)

Usually it takes me anywhere from 15 minutes to a couple of hours to  
explain the existential semantics plus its implications in a variety
of contexts. And I'm always having to explain it ;)

> but for a tool compliant with rdf-semantics the following graph
>
> eg:joesText foaf:maker [ foaf:name "jo"].
>
> expresses the same content as
>
> eg:joesText foaf:maker [ foaf:name "jo"].
> eg:joesText foaf:maker [ foaf:name "jo"].

So, how do you test that? In my experience, if a store *doesn't*  
treat these as distinct, you get bug complaints.
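One way to test it, sketched in Python under stated assumptions: under the RDF Semantics two graphs express the same content when each simply entails the other, and G entails H when some mapping of H's blank nodes to terms of G sends every triple of H into G. The triple-as-tuple encoding, the "_:" prefix test and the `entails` name are choices made for this illustration, not any store's API. The search is brute force (exponential in the number of blank nodes), so it only suits toy graphs.

```python
from itertools import product

Triple = tuple[str, str, str]

def is_bnode(term: str) -> bool:
    # Assumption: blank nodes are strings written "_:label", as in Turtle.
    return term.startswith("_:")

def entails(g: set[Triple], h: set[Triple]) -> bool:
    """Simple entailment by brute force: try every mapping of h's blank
    nodes to terms of g and accept if one maps all of h into g."""
    h_bnodes = sorted({t for tr in h for t in tr if is_bnode(t)})
    g_terms = sorted({t for tr in g for t in tr})
    for image in product(g_terms, repeat=len(h_bnodes)):
        m = dict(zip(h_bnodes, image))
        if all(tuple(m.get(t, t) for t in tr) in g for tr in h):
            return True
    return False

# The two graphs from the quoted message, with each [] given a label.
one = {("eg:joesText", "foaf:maker", "_:a"), ("_:a", "foaf:name", '"jo"')}
two = {
    ("eg:joesText", "foaf:maker", "_:b"), ("_:b", "foaf:name", '"jo"'),
    ("eg:joesText", "foaf:maker", "_:c"), ("_:c", "foaf:name", '"jo"'),
}

print(len(one), len(two))                    # 2 4
print(entails(one, two), entails(two, one))  # True True
```

So the two graphs differ in size but are equivalent; a store that keeps both copies is still "correct" only in this entailment sense.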

> Not treating b-nodes as existential variables would mean that the
> union of a graph with itself would be a different graph.

Well, it's clearly *syntactically* a different graph, in a way that:

	s p o.
and
	s p o.
	s p o.

are not.
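A small Python sketch of that contrast, assuming triples are tuples of strings and blank nodes carry a "_:" prefix: plain set union of a ground graph with itself changes nothing, whereas an RDF merge (union after renaming blank nodes apart, as the RDF Semantics defines it) of a bnode graph with itself doubles the triples. The `merge` function and its "_:mN" naming scheme are illustrative assumptions.

```python
from itertools import count

def is_bnode(term: str) -> bool:
    return term.startswith("_:")

def merge(g1: set, g2: set) -> set:
    """Union after standardizing apart: every blank node of g2 gets a
    fresh label not used in g1 (hypothetical "_:mN" scheme)."""
    taken = {t for tr in g1 for t in tr if is_bnode(t)}
    fresh_names = (f"_:m{i}" for i in count() if f"_:m{i}" not in taken)
    renaming: dict[str, str] = {}

    def rename(t: str) -> str:
        if not is_bnode(t):
            return t
        if t not in renaming:
            renaming[t] = next(fresh_names)
        return renaming[t]

    return g1 | {tuple(rename(t) for t in tr) for tr in g2}

ground = {("s", "p", "o")}
with_bnode = {("s", "p", "_:x")}

print(len(ground | ground))                # 1: union is idempotent
print(len(merge(ground, ground)))          # 1: nothing to rename
print(len(merge(with_bnode, with_bnode)))  # 2: s p _:x. and s p _:m0.
```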

> Or for an aggregator:
> whenever we aggregate the first graph we add two triples to our  
> aggregated graph, and if I got your "sane" interpretation right  
> eg:joesText has a new maker, which without further knowledge is not  
> considered owl:sameAs any of the existing foaf:makers.

Note your language leans heavily toward the local name
interpretation. Under the existential reading, the bnode is *not an
entity* but a variable. I could understand if stores generally didn't
lean, for efficiency reasons. But they do collapse syntactically
identical triples (whereas databases may not, for the sake of
efficiency); what they don't lean is something like:
	s p o.
	s p _:x.

Likewise, SPARQL's "distinct" mechanism *does not lean answers*:
	http://www.w3.org/TR/rdf-sparql-query/#modDistinct
	http://www.w3.org/TR/rdf-sparql-query/#BGPsparqlBNodes
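For concreteness, a brute-force lean test in Python. Following the RDF Semantics definition, a graph is lean when no instance of it (blank nodes replaced by terms) is a proper subgraph; the s p o. / s p _:x. pair fails because mapping _:x to o collapses it onto s p o. The encoding and the `is_lean` name are assumptions made for this sketch.

```python
from itertools import product

def is_bnode(term: str) -> bool:
    return term.startswith("_:")

def is_lean(g: set) -> bool:
    """True if no instance of g (blank nodes mapped to terms of g) is a
    proper subgraph of g. Brute force, exponential in blank nodes."""
    bnodes = sorted({t for tr in g for t in tr if is_bnode(t)})
    terms = sorted({t for tr in g for t in tr})
    for image in product(terms, repeat=len(bnodes)):
        m = dict(zip(bnodes, image))
        instance = {tuple(m.get(t, t) for t in tr) for tr in g}
        if instance < g:  # strict subset: some triple was redundant
            return False
    return True

print(is_lean({("s", "p", "o"), ("s", "p", "_:x")}))  # False
print(is_lean({("s", "p", "o")}))                     # True
```

Only terms already in g need to be tried, since any instance that is a subgraph of g can only use g's terms.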

> I was wondering why you think SPARQL treats b-nodes as local
> names; afaik SPARQL doesn't guarantee that a query against two
> graphs expressing the same content yields the same result, but it
> doesn't require an implementation to keep redundant b-nodes either.

"SPARQL uses the subgraph match criterion to determine the solutions  
of a basic graph pattern. There is one solution for each distinct  
pattern instance mapping from the basic graph pattern to a subset of  
the active graph."

That's the requirement to keep redundant bnodes. You can, of course,
lean your graph in a separate operation. But note that this doesn't
solve all issues.
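A toy illustration of that subgraph-match rule, as a hedged Python sketch: each distinct mapping of the pattern's variables that sends every pattern triple into the graph counts as one solution, so a redundant s p _:x. contributes a row of its own until someone leans it away. This ignores blank nodes in the query pattern, FILTERs and the rest of SPARQL; `match_bgp` and the "?" variable convention are assumptions of the sketch.

```python
from itertools import product

def is_var(term: str) -> bool:
    return term.startswith("?")

def match_bgp(graph: set, bgp: list) -> list:
    """Naive subgraph match: one solution per distinct variable mapping
    that sends every triple pattern in bgp to a triple in graph."""
    variables = sorted({t for tp in bgp for t in tp if is_var(t)})
    terms = sorted({t for tr in graph for t in tr})
    solutions = []
    for image in product(terms, repeat=len(variables)):
        mu = dict(zip(variables, image))
        if all(tuple(mu.get(t, t) for t in tp) in graph for tp in bgp):
            solutions.append(mu)
    return solutions

non_lean = {("s", "p", "o"), ("s", "p", "_:x")}
leaned = {("s", "p", "o")}

print(match_bgp(non_lean, [("s", "p", "?v")]))  # [{'?v': '_:x'}, {'?v': 'o'}]
print(match_bgp(leaned, [("s", "p", "?v")]))    # [{'?v': 'o'}]
```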

"This is optimized for ease of computation rather than redundancy  
elimination. It allows query results to contain redundancies even  
when the active graph of the dataset is lean, and it allows logically  
equivalent datasets to yield query results."

I think the last should be "to yield different query results".

If you are existentially inclined, then a result set can contain
bnode redundancies *even if* the queried graph is lean. It's trivial, eh?

	_:x p o.
	_:y p z.

select ?a where {?a p ?y}

Result set:
    ?a
    _:x
    _:y

Which gives you no more information than:
   _:x

If you use CONSTRUCT, you'll get a non-lean graph.
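The example and the CONSTRUCT remark can be replayed with a hedged Python sketch, assuming the same naive subgraph matching: DISTINCT compares terms, so _:x and _:y survive as two rows although the data graph is lean, and instantiating a template such as { ?a ex:found ex:yes } (made up for this illustration) with those rows yields a graph that is not lean. `select_distinct`, `is_lean` and the ex: names are hypothetical.

```python
from itertools import product

def is_var(t: str) -> bool:
    return t.startswith("?")

def is_bnode(t: str) -> bool:
    return t.startswith("_:")

def select_distinct(graph: set, bgp: list, proj: list) -> list:
    """SELECT DISTINCT proj WHERE { bgp } via naive subgraph matching."""
    vs = sorted({t for tp in bgp for t in tp if is_var(t)})
    terms = sorted({t for tr in graph for t in tr})
    rows = []
    for image in product(terms, repeat=len(vs)):
        mu = dict(zip(vs, image))
        if all(tuple(mu.get(t, t) for t in tp) in graph for tp in bgp):
            row = tuple(mu[v] for v in proj)
            if row not in rows:  # DISTINCT compares terms, not meanings
                rows.append(row)
    return rows

def is_lean(g: set) -> bool:
    """No instance of g (bnodes mapped to g's terms) is a proper subgraph."""
    bs = sorted({t for tr in g for t in tr if is_bnode(t)})
    terms = sorted({t for tr in g for t in tr})
    for image in product(terms, repeat=len(bs)):
        m = dict(zip(bs, image))
        if {tuple(m.get(t, t) for t in tr) for tr in g} < g:
            return False
    return True

data = {("_:x", "p", "o"), ("_:y", "p", "z")}
rows = select_distinct(data, [("?a", "p", "?y")], ["?a"])
# Hypothetical CONSTRUCT { ?a ex:found ex:yes } over the same rows.
built = {(a, "ex:found", "ex:yes") for (a,) in rows}

print(rows)                           # [('_:x',), ('_:y',)]
print(is_lean(data), is_lean(built))  # True False
```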

(See <http://www.cs.man.ac.uk/~bparsia/2006/row-tutorial/#slide32>.
Note there seems to be a bug on slide 37; parts are just garbled;
I'll try to clean it up later today.)

Plus, people just don't think of bnodes as ever redundant. I can't  
find the email right now, but a DAWG member suggested that a pattern  
like:

_:x rdf:type Invoice; hasItem 4.
_:y rdf:type Invoice; hasItem 3.

would safely indicate that you had (only) two invoices. Note that
this *is* lean.
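Under the existential reading, though, lean is not the same as "two invoices". A hedged Python sketch of why: a hypothetical ground graph describing a single invoice, ex:inv1, that carries both items simply entails the two-bnode graph, so the data are true in a world with only one invoice. Literals are simplified to plain strings, and `entails` is a brute-force simple-entailment check written for this illustration.

```python
from itertools import product

def is_bnode(term: str) -> bool:
    return term.startswith("_:")

def entails(g: set, h: set) -> bool:
    """Brute-force simple entailment: does some mapping of h's blank
    nodes to terms of g send every triple of h into g?"""
    bnodes = sorted({t for tr in h for t in tr if is_bnode(t)})
    terms = sorted({t for tr in g for t in tr})
    for image in product(terms, repeat=len(bnodes)):
        m = dict(zip(bnodes, image))
        if all(tuple(m.get(t, t) for t in tr) in g for tr in h):
            return True
    return False

two_invoices = {
    ("_:x", "rdf:type", "Invoice"), ("_:x", "hasItem", "4"),
    ("_:y", "rdf:type", "Invoice"), ("_:y", "hasItem", "3"),
}
# Hypothetical ground graph: one invoice carrying both items.
one_invoice = {
    ("ex:inv1", "rdf:type", "Invoice"),
    ("ex:inv1", "hasItem", "4"),
    ("ex:inv1", "hasItem", "3"),
}

print(entails(one_invoice, two_invoices))  # True: both bnodes map to ex:inv1
print(entails(two_invoices, one_invoice))  # False: ex:inv1 is not in the data
```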

> More generally, what's your motivation to change the semantics of b-nodes,

Because they cause lots of problems and their semantics offer no  
gains. For example, with existential bnodes, SPARQL query answering
for RDFS is *NP-Complete* in *DATA COMPLEXITY*.

That should be a scary fact for anyone interested in scalability.
How is having true existential semantics worth that?

> if you don't like/need existential variables why don't you just  
> assign URIs (urn:uuid) to your nodes?

I do. But lots of people do not and I have to deal with their data.  
Plus when I write specs and software, if I have to treat:
	s p _:x.

as equivalent to:
	s:[...someValuesFrom...]

then many things become much, much, much harder. Interoperability  
becomes harder.

Now there are many ways to compromise. The sparql spec works hard and  
carefully to keep things at least *seeming* compatible. But really,
it's a waste of effort and confusing.

> What's the particularity of "local" names, do they have to change  
> when they change context? what's the advantage of it?

I have a proposal in mind that preserves just about all used behavior  
and makes life much easier, but I've not written it out. Leaning goes  
out the window. Yay. Keep it, if you like, as an unsanctioned  
operation alongside other graph munging. Parser behavior stays the
same, though we'd have a "merge" vs. "union" option, since some of the
time you want to keep nodeids in different graphs distinct and
sometimes you don't. Merge behavior (distinct) is the default, though we
could work out a proposal that allowed for roundtripping nodeids.
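Since the proposal isn't written out, the following is only a guess at the shape of such a switch, as a Python sketch: in "union" mode a nodeid like _:b denotes the same node in every document, while in "merge" mode (the default here, as proposed) each document's nodeids are kept distinct by scoping them to the document. `combine`, its `mode` parameter and the "_:d<i>_" relabelling are invented for the illustration.

```python
def combine(docs: list, mode: str = "merge") -> set:
    """Combine parsed documents (each a set of string triples).

    mode="union": a nodeid such as _:b names the same node across documents.
    mode="merge": nodeids are scoped to their document, so equal labels in
    different documents become distinct nodes (_:d0_b, _:d1_b, ...).
    """
    if mode not in ("merge", "union"):
        raise ValueError(f"unknown mode: {mode!r}")
    out = set()
    for i, doc in enumerate(docs):
        for tr in doc:
            if mode == "merge":
                tr = tuple(f"_:d{i}_{t[2:]}" if t.startswith("_:") else t
                           for t in tr)
            out.add(tr)
    return out

def bnodes(g: set) -> set:
    return {t for tr in g for t in tr if t.startswith("_:")}

doc1 = {("eg:alice", "foaf:knows", "_:b")}
doc2 = {("eg:carol", "foaf:knows", "_:b")}

print(sorted(bnodes(combine([doc1, doc2], mode="union"))))  # ['_:b']
print(sorted(bnodes(combine([doc1, doc2]))))                # ['_:d0_b', '_:d1_b']
```

Note that the relabelling in merge mode is exactly what makes roundtripping nodeids hard, which is the part the proposal would still have to work out.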

Cheers,
Bijan.
Received on Tuesday, 2 October 2007 09:20:12 UTC