Re: bnodes from Reto Bachmann-Gmür on 2007-10-02 (public-owl-dev@w3.org from October to December 2007)

From: Reto Bachmann-Gmür <rbg@talis.com>
Date: Tue, 02 Oct 2007 13:29:44 +0200
To: Bijan Parsia <bparsia@cs.man.ac.uk>
CC: public-owl-dev@w3.org
Message-ID: <47022BA8.3040608@talis.com>
Bijan Parsia wrote:
> On 2 Oct 2007, at 09:30, Reto Bachmann-Gmür wrote:
> [...]
>
> "Violating" is a bit strong. "Ignoring" is better. It's just that most
> of the time ignoring is "compatible", in some sense, but the proof
> comes out when you put things to the test. In RDF, given the weakness
> of the language, the tests are easier to avoid. But, for example, no
> owl engine treats bnodes the same as somevalues statements.
As I already wrote, it's perfectly ok for an application to keep a graph
non-lean. In fact an API to draw graphs (like Jena) may need to keep
redundant statements because the API user may still be adding properties
to those nodes.
>
>> do you have any stats or at least examples? Note that a tool doesn't
>> need to enforce lean-graphs all of the time,
>
> True. But how many leaners are out there? Production quality? How many
> entailment testers are out there? Production quality. How many
> applications make use of either. How many treat triples from documents
> as stable and sacrosanct? (I mean, not eliminating the *easy* cases?)
I guess that many applications don't need to care because they are not
aggregating b-nodes from external sources, or because they deal with
b-nodes that are grounded by functional or inverse functional properties,
so they can smush them without considering the whole extension. Other
applications do in fact perform some partial leanification, because users
do complain when the rss:items in their aggregation get (or appear to
get) a new maker whenever that feed is aggregated. At least a partial
leanification is done by tools using graph decomposition (such as
MSGs[1] or RDF Molecules[2]).
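
As an illustration of the smushing mentioned above, here is a minimal
Python sketch. It assumes triples are plain tuples of strings, that blank
nodes are written with a "_:" prefix, and that foaf:mbox is declared
inverse functional; the function name smush and the sample data are made
up for the example and do not come from any particular tool.

```python
# Sketch: merge ("smush") blank nodes that share a value for an
# inverse functional property (IFP). The IFP set below is an assumption.
IFPS = {"foaf:mbox"}


def is_bnode(term):
    return isinstance(term, str) and term.startswith("_:")


def smush(triples, ifps=IFPS):
    """Return a set of triples with IFP-identified blank nodes merged."""
    triples = set(triples)
    changed = True
    while changed:
        changed = False
        owner = {}  # (ifp, value) -> first subject seen with that value
        for s, p, o in sorted(triples):
            if p not in ifps:
                continue
            first = owner.setdefault((p, o), s)
            if first == s:
                continue
            # Two subjects share an IFP value, so they denote the same
            # thing. Replace a blank node by the other term.
            old, new = (s, first) if is_bnode(s) else (first, s)
            if not is_bnode(old):
                continue  # two distinct names: left alone in this sketch
            triples = {
                (new if a == old else a, b, new if c == old else c)
                for a, b, c in triples
            }
            changed = True
            break  # restart the scan on the rewritten graph
    return triples


# The same feed aggregated twice, with a fresh blank node each time.
first_fetch = {
    ("eg:item1", "foaf:maker", "_:a"),
    ("_:a", "foaf:mbox", "mailto:jo@example.org"),
}
second_fetch = {
    ("eg:item1", "foaf:maker", "_:b"),
    ("_:b", "foaf:mbox", "mailto:jo@example.org"),
}
merged = smush(first_fetch | second_fetch)
```

With the IFP in place the item keeps a single maker however often the
feed is fetched; without a grounding property this cheap merge is not
available and one has to look at the whole graph.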
>
> Usually it takes me anywhere from 15 minutes to a couple of hours to
> explain the existential semantics plus its implications in a variety
> of contexts. And I'm always having to explain it ;)
We all deal with existential semantics in daily life and natural language.

Sara says:  "there's a cat in the garden"
Peter says: "there's a cat in the garden"

If, after hearing and trusting Sara and Peter, someone asks us what there
is in the garden, we would usually not say "there's a cat and a cat in
the garden"; we would usually never have created a non-lean graph in
our minds in the first place.

Or a more complex situation:

Sara says:  "there's a cat in the garden"
Peter says: "there's a green cat in the garden"

After trusting those two assertions, most people would summarize what
there is in the garden as "a green cat", and these are mostly people who
never (explicitly) had existential semantics explained to them.

>> but for a tool compliant with rdf-semantics the following graph
>>
>> eg:joesText foaf:maker [ foaf:name "jo"].
>>
>> expresses the same content as
>>
>> eg:joesText foaf:maker [ foaf:name "jo"].
>> eg:joesText foaf:maker [ foaf:name "jo"].
>
> So, how do you test that? 
In the example, the two graphs contain an identical set of MSGs / RDF
molecules, but it's true that you can have graphs which decompose into
non-lean components, or into components which are made obsolete by the
union of other components.
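
For the general case, a brute-force leaning step can be sketched as
follows. This is only an illustration of the definition in RDF Semantics
(a graph is lean if no instance of it is a proper subgraph of it), not
how any particular store does it. The tuple representation, the "_:"
prefix for blank nodes and the name lean are assumptions of the sketch,
and the search is exponential in the number of blank nodes.

```python
from itertools import product


def is_bnode(term):
    return isinstance(term, str) and term.startswith("_:")


def lean(triples):
    """Brute-force leaning: keep mapping blank nodes to terms of the
    graph as long as the image is a proper subgraph of the graph."""
    graph = set(triples)
    while True:
        nodes = sorted({t for s, _, o in graph for t in (s, o)})
        bnodes = [t for t in nodes if is_bnode(t)]
        for images in product(nodes, repeat=len(bnodes)):
            mapping = dict(zip(bnodes, images))
            image = {
                (mapping.get(s, s), p, mapping.get(o, o))
                for s, p, o in graph
            }
            if image < graph:  # an instance that is a proper subgraph
                graph = image
                break
        else:
            return graph  # no such instance exists: the graph is lean


# The two-maker graph from the quoted example.
doubled = {
    ("eg:joesText", "foaf:maker", "_:a"),
    ("_:a", "foaf:name", "jo"),
    ("eg:joesText", "foaf:maker", "_:b"),
    ("_:b", "foaf:name", "jo"),
}
leaned = lean(doubled)
```

Applied to the doubled graph this leaves one maker with one name, i.e.
the first graph of the example up to the label of the blank node.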
> In my experience, if a store *doesn't* treat these as distinct, you
> get bug complaints.
and at least an RFE when you *do* treat them as distinct. It depends
solely on whether your users want a store for semantic content or a
store for non-lean graphs. In most cases they would be happy with
either: some users of the semantic content store may use it wrongly and
get unexpected results (and post your bug report), while some users of
the store for non-lean graphs will eventually get tired of removing
redundancies using ontology-specific knowledge and heuristics and move
to a semantic content store.

[...]
>
>> Or for an aggregator:
>> whenever we aggregate the first graph we add two triples to our
>> aggregated graph, and if I got your "sane" interpretation right
>> eg:joesText has a new maker, which without further knowledge is not
>> considered to be owl:sameAs any of the existing foaf:makerS.
>
> Note your language leans heavily toward the local name interpretation. 
Could it be that you missed that after 'if I got your "sane"
interpretation right' I tried to paraphrase your position?
> The bnode is *not an entity* but a variable. 
agreed.
> I could understand if stores generally didn't lean for efficiency
> reasons, but they lean graphs of syntactically identical triples.
>     s p o.
>     s p _:x.
Not leanifying for efficiency reasons is ok, and doing some leanification
(where this can be done cheaply) is ok as well, but complete
leanification is best when it comes to having the most compact
expression of knowledge and thus the most valuable triples.

Or to go back to the cat(s): after being told "there's a cat in the
garden" and "Fritz is in the garden, Fritz is a cat", a cheaper edition
of our house-robot would summarize the situation as "Fritz and a cat are
in the garden, Fritz is a cat", while the slightly more expensive edition
of our RDF-based robot would say "The cat Fritz is in the garden".
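
That the two robots say the same thing can be checked with a simple
entailment test: G simply entails E if some mapping of E's blank nodes
to terms of G turns E into a subgraph of G. The sketch below is a naive,
exponential version of that check. The function name, the eg: terms and
the tuple encoding (blank nodes prefixed with "_:") are assumptions made
for the example.

```python
from itertools import product


def is_bnode(term):
    return isinstance(term, str) and term.startswith("_:")


def simply_entails(g, e):
    """True if some instance of e (blank nodes of e mapped to terms
    of g) is a subgraph of g."""
    g, e = set(g), set(e)
    bnodes = sorted({t for s, _, o in e for t in (s, o) if is_bnode(t)})
    terms = sorted({t for s, _, o in g for t in (s, o)})
    for images in product(terms, repeat=len(bnodes)):
        m = dict(zip(bnodes, images))
        if {(m.get(s, s), p, m.get(o, o)) for s, p, o in e} <= g:
            return True
    return False


# "The cat Fritz is in the garden"
expensive_robot = {
    ("eg:Fritz", "rdf:type", "eg:Cat"),
    ("eg:Fritz", "eg:in", "eg:garden"),
}
# "Fritz and a cat are in the garden, Fritz is a cat"
cheap_robot = expensive_robot | {
    ("_:x", "rdf:type", "eg:Cat"),
    ("_:x", "eg:in", "eg:garden"),
}
same_content = (simply_entails(expensive_robot, cheap_robot)
                and simply_entails(cheap_robot, expensive_robot))
```

Each graph entails the other, so the two extra blank node triples of the
cheaper robot add nothing.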

[cutting SPARQL part. Summary: SPARQL works on lean and on non-lean graphs]
> Plus, people just don't think of bnodes as ever redundant. I can't
> find the email right now, but a DAWG member suggested that a pattern
> like:
>
> _:x rdf:type Invoice; hasItem 4.
> _:y rdf:type Invoice; hasItem 3.
>
> Would safely indicate that you had (only) two invoices. Note that this
> *is* lean.
If hasItem were a functional property, then you could safely conclude
that there are at least two invoices.
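
A rough sketch of that reasoning, assuming the values of the functional
property are literals that denote different things whenever they differ
(the helper name min_individuals and the tuple encoding are made up for
the illustration):

```python
def min_individuals(triples, cls, functional_prop):
    """Lower bound on the number of distinct members of cls.

    Two nodes with different values for a functional property cannot
    denote the same individual, so every distinct value accounts for at
    least one individual. A node carrying two different values would
    make the graph inconsistent; that case is not checked here.
    """
    members = {s for s, p, o in triples if p == "rdf:type" and o == cls}
    values = {
        o for s, p, o in triples
        if s in members and p == functional_prop
    }
    return max(len(values), 1 if members else 0)


invoices = [
    ("_:x", "rdf:type", "Invoice"), ("_:x", "hasItem", 4),
    ("_:y", "rdf:type", "Invoice"), ("_:y", "hasItem", 3),
]
at_least = min_individuals(invoices, "Invoice", "hasItem")
```

Without the assumption that hasItem is functional, nothing stops _:x and
_:y from denoting one invoice with two items, so the graph alone only
guarantees that there is at least one invoice.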
>> More generally, what's your motivation to change the semantics of
>> b-nodes,
> Because they cause lots of problems and their semantics offer no
> gains. For example, with existential Bnodes sparql query answering for
> RDFS is *NP-Complete* in *DATA COMPLEXITY*.
subgraph isomorphism is in NP even without existential variables in the
graph.
> That should be a scary fact for anyone interested in scalability.
Real-world graphs decompose into small components (especially when
taking functional and inverse functional properties into account, as in
RDF Molecules), so many of the operations which would be very expensive
on huge graphs consisting essentially of bnodes become quite cheap.
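
To make the decomposition concrete, here is a minimal sketch in the
spirit of MSGs: triples that share a blank node end up in the same
component, and every ground triple is a component of its own. The cited
tools may define their components differently (RDF Molecules also use
functional and inverse functional properties, which this sketch
ignores); the function name and the tuple encoding are assumptions.

```python
from collections import defaultdict


def is_bnode(term):
    return isinstance(term, str) and term.startswith("_:")


def decompose(triples):
    """Split a graph into components connected through blank nodes."""
    by_bnode = defaultdict(set)  # blank node -> triples mentioning it
    for triple in triples:
        for term in (triple[0], triple[2]):
            if is_bnode(term):
                by_bnode[term].add(triple)

    remaining = set(triples)
    components = []
    while remaining:
        start = remaining.pop()
        component, stack = {start}, [start]
        while stack:
            s, _, o = stack.pop()
            for term in (s, o):
                if not is_bnode(term):
                    continue
                # Pull in every triple that shares this blank node.
                for other in by_bnode[term]:
                    if other not in component:
                        component.add(other)
                        stack.append(other)
        remaining -= component
        components.append(component)
    return components


graph = {
    ("eg:joesText", "foaf:maker", "_:a"),
    ("_:a", "foaf:name", "jo"),
    ("eg:joesText", "foaf:maker", "_:b"),
    ("_:b", "foaf:name", "jo"),
    ("eg:joesText", "dc:title", "Joe's text"),
}
parts = decompose(graph)
```

Expensive operations such as leaning or entailment checking can then be
run per component, and components that are equal up to blank node
renaming can be dropped as duplicates (that comparison is not shown
here).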
> How is having true existential semantics worth that?
It is. Because the tension is between b-nodes and centralized
ontological naming authorities. To have interoperable descriptions of
the same universe we would need a central authority assigning names. It
does make computation easier - at least for what the tax-office needs to
compute, that's probably why most designed social systems are centralized.
>
>> if you don't like/need existential variables why don't you just
>> assign URIs (urn:uuid) to your nodes?
>
> I do. But lots of people do not and I have to deal with their data.
> Plus when I write specs and software, if I have to treat:
>     s p _:x.
>
> as equivalent to:
>     s:[...someValuesFrom...]
>
> then many things become much, much, much harder. Interoperability
> becomes harder.
Decentralized interoperability only becomes possible with your great,
great, great effort. But it's worth it :)

Cheers,
reto

1. http://www.dbin.org/RDFContextTools.php
2. www.ksl.stanford.edu/people/pp/papers/Ding_ISWC_2005.pdf

-- 
Reto Bachmann-Gmür
Talis Information Limited

Received on Tuesday, 2 October 2007 11:29:54 UTC