Re: bnodes from Reto Bachmann-Gmür on 2007-10-02 (public-owl-dev@w3.org from October to December 2007)

From: Reto Bachmann-Gmür <rbg@talis.com>
Date: Tue, 02 Oct 2007 18:07:19 +0200
To: Bijan Parsia <bparsia@cs.man.ac.uk>
CC: public-owl-dev@w3.org
Message-ID: <47026CB7.6010603@talis.com>
Bijan Parsia wrote:
> On 2 Oct 2007, at 12:29, Reto Bachmann-Gmür wrote:
>
>> Bijan Parsia wrote:
>>> On 2 Oct 2007, at 09:30, Reto Bachmann-Gmür wrote:
>>> [...]
>>>
>>> "Violating" is a bit strong. "Ignoring" is better. It's just that most
>>> of the time ignoring is "compatible", in some sense, but the proof
>>> comes out when you put things to the test. In RDF, given the weakness
>>> of the language, the tests are easier to avoid. But, for example, no
>>> owl engine treats bnodes the same as somevalues statements.
>> As I already wrote, it's perfectly ok for an application to keep a graph
>> non-lean.
>
> In practice, as far as I can tell, no store does *any* leaning (at
> least internally) not even cheap, but incomplete, leaning.
One exception would be http://gvs.hpl.hp.com/.
>
> [...]
>> In fact an API to draw graphs (like Jena)
>
> Jena is an api to draw graphs?
At least part of the API is for constructing graphs, and as long as we are
constructing a graph and touching its bnodes, those bnodes shouldn't be merged away.
>
>> may need to keep
>> redundant statements because the api user may still be adding properties
>> to those nodes.
>
> They don't do cheap leaning on parse either.
The world isn't perfect; but that's not a reason to go in the opposite
direction.

[...]
>> We all deal with existential semantics in daily life and natural
>> language.
>
> We also deal with complex modalities, but they are a pain to explain
> and hard to formalize and even harder to formalize in a useful way.
>
> This is a non-sequitur.
Granted.
>
>> We all deal with existential semantics in daily life and natural
>> language.
>>
>> Sara says:  "there's a cat in the garden"
>> Peter says: "there's a cat in the garden"
>>
>> If after hearing and trusting Sara and Peter someone asks us what there
>> is in the garden, we would usually not say "there's a cat and a cat in
>> the garden"; in fact, we would usually not have created a non-lean graph
>> in our minds at all.
> [snip]
>
> But we also don't think that there are 4 cats in the garden, which is
> what's compatible with treating that as a classic existential quantifier.
We think that there's a cat in the garden, which is true even when there
are 20 cats in the garden.
>
> This isn't going to work because really, the "a" isn't acting to
> suggest an indefinite *number*, but an indefinite *individual*, i.e.,
> an individual that we don't have more specific information about. It's
> a *demonstrative*, in this case.
>
> Consider:
>
> Bijan says: "There's a cat in the garden"
> Reto looks, then says: "Hey, there's like 10 there!"
> Bijan says: "Yes, but I was *speaking existentially* so what I say was
> TRUE!! There was *at least one* since there were 10. Ha ha ha ha,
> neener neener neener."
> Reto says: "Go to hell"
I probably wouldn't say that. I'd try to explain to you, calmly and politely,
that I wasn't expecting 10 cats, not because that would contradict what
you said, but because your statement may have violated the communicative
principle of relevance. This may, however, depend on the context: if we
are desperately hunting for a cat and our cat detector beeps for the
garden, your statement "there's a cat in the garden, get it!" may in
fact be more appropriate than "in the garden there are, hmm, currently
at least 10 cats, get one of them!".

So while some lean utterances can violate the principle of relevance (as
when a proper instance of the utterance could have been used instead),
non-lean utterances always violate it; that's why the robot that says
"Fritz and a cat are in the garden, Fritz is a cat" is the cheaper
edition.

 [...]
>
> I don't think people really do want a store for semantic content. I do
> think there are cases where they want redundancy eliminated or avoided
> in a  number of ways. But we can do that with URIs or with local names.
You can't remove all redundancies when using names unless these names
are assigned as unique names by a central authority, and in corner cases
not even then.

Say you have a universe with exactly two identical spheres and nothing
else in it; a b-node description of this universe
would be:

[a :Sphere] owl:differentFrom [a :Sphere].

Even if we receive that description repeatedly, our knowledge about that
universe doesn't change; but if each of our observers had assigned
random names to the spheres, our knowledge graph would grow constantly.
In this case "call IANA and ask for a name" wouldn't have worked, as
there is no means of telling them which of the spheres we would like
to look up the name for.
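A minimal Python sketch of the sphere example, assuming plain simple entailment only (no OWL reasoning, so owl:differentFrom is treated as an ordinary predicate) and a toy encoding in which triples are string tuples and any term starting with "_:" is a b-node. The helper names (entails, fresh_copy, named_copy) are hypothetical. G entails H when H's b-nodes can be mapped to terms of G so that every mapped triple of H is in G; merging further copies of the b-node description leaves a graph equivalent to the original, while merging copies in which the b-nodes were replaced by freshly minted URIs does not.

```python
import uuid

def is_bnode(term):
    # Toy convention (assumption): b-nodes are strings starting with "_:".
    return term.startswith("_:")

def entails(g, h):
    """Simple entailment by backtracking: map h's b-nodes to terms of g so
    that every triple of h, after mapping, is a triple of g."""
    h = sorted(h)

    def search(i, mapping):
        if i == len(h):
            return True
        for candidate in g:
            new = dict(mapping)
            ok = True
            for term, target in zip(h[i], candidate):
                if is_bnode(term):
                    # a b-node must map to the same term in every triple
                    if new.setdefault(term, target) != target:
                        ok = False
                        break
                elif term != target:
                    ok = False
                    break
            if ok and search(i + 1, new):
                return True
        return False

    return search(0, {})

def fresh_copy(graph, suffix):
    # The same description received again: new b-node labels, same shape.
    return {tuple(t + suffix if is_bnode(t) else t for t in tr) for tr in graph}

def named_copy(graph):
    # The same description after an observer minted random URIs for its b-nodes.
    names = {}
    return {tuple(names.setdefault(t, f"urn:uuid:{uuid.uuid4()}") if is_bnode(t) else t
                  for t in tr)
            for tr in graph}

SPHERES = {
    ("_:a", "rdf:type", ":Sphere"),
    ("_:b", "rdf:type", ":Sphere"),
    ("_:a", "owl:differentFrom", "_:b"),
}

merged = SPHERES | fresh_copy(SPHERES, "2") | fresh_copy(SPHERES, "3")
print(len(merged), entails(SPHERES, merged))  # 9 True: nothing new was learned

first = named_copy(SPHERES)
grown = first | named_copy(SPHERES)
print(len(grown), entails(first, grown))      # 6 False: the first graph no longer covers it
```

Both graphs grow in triple count, but only the URI version grows in what it says: the b-node merge is still entailed by the original three triples.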
>
> I don't want to force smushing *or* non-smushing. I just want to leave
> it to the application layer.
Some smushing can only be done with existential variables, regardless of
the layer.
>> in most cases they would be happy with both, some using
>> the semantic content store may use it wrong and have unexpected results
>> (and post your bug report), some users of the store for non-lean graphs
>> will eventually get tired of removing redundancies using ontology
>> specific knowledge and heuristics and move to a semantic content store.
>
> I don't believe there's a market for the latter.
It took me a couple of years of coding with RDF until I really felt the
urge for a store that does this: one where adding the same graph repeatedly
doesn't change the meaning.
>
>> [...]
>>>
>>>> Or for an aggregator:
>>>> whenever we aggregate the first graph we add two triples to our
>>>> aggregated graph, and if I got your "sane" interpretation right,
>>>> eg:joesText has a new maker, which without further knowledge is not
>>>> considered owl:sameAs any of the existing foaf:maker values.
>>>
>>> Note your language leans heavily toward the local name interpretation.
>> Could it be that you missed that after 'if I got your "sane"
>> interpretation right' I tried to paraphrase your position?
>
> Nope.
>
> Consider the following:
>
> s p o
> s1 p o
> _:x p o.
>
> I don't think _:x is "sameas" either s or s1 on any reading (absent
> specific assertions or cardinality restrictions, etc.)
>
> However, _:x p o. is entailed by s p o (alone) and s1 p o (alone).
Right, did I say otherwise?
>
> So the existential reading is not captured by thinking in terms of
> sameAs. *That's* the part that is "Thinking in individuals". I have no
> problem with that, fwiw :)
Every node is owl:sameAs itself, but you would hardly ever use owl:sameAs
with a b-node subject or object, because in that case you can just drop
the b-node and the owl:sameAs statement by attaching the b-node's
properties to the other node. But when you aggregate

eg:joesText foaf:maker [ foaf:name "jo"].

from different sources and you or the sources assign a URI to the b-node, the many named nodes you end up with will most likely all be owl:sameAs each other. It is true that the graph above evaluates to true in a world where eg:joesText has been made by multiple resources with foaf:name "jo", but this is also true for the graph

eg:joesText foaf:maker <urn:uuid:111>.
<urn:uuid:111> foaf:name "jo".

(apart from the possibility that none of the resources with foaf:name "jo" answers to <urn:uuid:111>, which would make this graph false)

>> agreed.
>>> I could understand if stores generally didn't lean for efficiency
>>> reasons, but they lean graphs of syntactically identical triples.
>>>     s p o.
>>>     s p _:x.
>> Not leanifying for efficiency reasons is ok, and doing some leanification
>> (where this can be done cheaply) is ok as well, but complete
>> leanification is best when it comes to having the most compact
>> expression of knowledge and thus the most valuable triples.
> The bnode is *not an entity* but a variable.
>
> My point is that people do, in fact, treat redundant bnode triples as
> carrying information. I've had long discussions on this with, e.g.,
> people in DAWG.
I would have hoped they didn't convince you :-(
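What "cheap, but incomplete" leaning could look like is sketched below in Python. This is one possible heuristic (an assumption; the thread names no specific algorithm), using a toy encoding in which triples are string tuples and terms starting with "_:" are b-nodes; cheap_lean is a hypothetical helper. It only catches the `s p o. s p _:x.` kind of redundancy, where the b-node occurs in a single triple, and each removal keeps the graph equivalent to the original.

```python
from collections import Counter

def is_bnode(term):
    # Toy convention (assumption): b-nodes are strings starting with "_:".
    return term.startswith("_:")

def cheap_lean(graph):
    """Cheap, incomplete leaning: drop a triple whose single b-node occurs in
    no other triple when replacing that b-node yields a triple already present.
    Each removal keeps the graph equivalent, but many redundancies are missed."""
    graph = set(graph)
    changed = True
    while changed:
        changed = False
        # number of triples each b-node occurs in
        occurrences = Counter(t for tr in graph for t in set(tr) if is_bnode(t))
        for s, p, o in sorted(graph):
            if is_bnode(s) and not is_bnode(o) and occurrences[s] == 1:
                # _:x p o is redundant next to any other s2 p o
                redundant = any(p2 == p and o2 == o and s2 != s for s2, p2, o2 in graph)
            elif is_bnode(o) and not is_bnode(s) and occurrences[o] == 1:
                # s p _:x is redundant next to any other s p o2
                redundant = any(s2 == s and p2 == p and o2 != o for s2, p2, o2 in graph)
            else:
                redundant = False
            if redundant:
                graph.remove((s, p, o))
                changed = True
                break  # recount occurrences after each removal
    return graph

print(cheap_lean({("s", "p", "o"), ("s", "p", "_:x")}))  # {('s', 'p', 'o')}
```

When the redundant b-node occurs in several triples, as in "there's a cat in the garden" merged with "Fritz is in the garden, Fritz is a cat", this heuristic removes nothing; catching that requires searching over b-node mappings, which is where the cost of complete leaning comes from.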
>
>> Or to go back to the cat(s): after being told "there's a cat in the
>> garden" and "Fritz is in the garden, Fritz is a cat" a cheaper edition
>> of our house-robot would summarize the situation as "Fritz and a cat are
>> in the garden, Fritz is a cat" while the slightly more expensive edition
>> of our RDF based robot  would say "The cat Fritz is in the garden".
>
> Yeah, your intuitions don't seem very natural to me. I mean, this is
> just saying "RDF semantics are good" in a pretty forced example.
If your, Bijan's and somebody's intuition is otherwise and you think
something, you think  non-lean communication is not handy then Reto,
somebody and I suggest you buy the thing that is a robot and is the same
as the thing that is cheaper.
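The difference between the two robots is the difference between merely merging what they were told and leaning the result. A brute-force Python sketch of complete leaning, assuming simple entailment only and a toy encoding where triples are string tuples and terms starting with "_:" are b-nodes (lean is a hypothetical helper): while some mapping of b-nodes to nodes of the graph sends the graph onto a proper subgraph, replace the graph by that image. It tries every mapping, so it is exponential in the number of b-nodes and only suitable for toy graphs like this one.

```python
from itertools import product

def is_bnode(term):
    # Toy convention (assumption): b-nodes are strings starting with "_:".
    return term.startswith("_:")

def lean(graph):
    """Complete leaning under simple entailment, by brute force."""
    graph = set(graph)
    while True:
        bnodes = sorted({t for tr in graph for t in tr if is_bnode(t)})
        nodes = sorted({t for s, _, o in graph for t in (s, o)})
        # Try every assignment of b-nodes to nodes already in the graph.
        for images in product(nodes, repeat=len(bnodes)):
            mapping = dict(zip(bnodes, images))
            image = {tuple(mapping.get(t, t) for t in tr) for tr in graph}
            if image < graph:  # proper subgraph: the graph was not lean
                graph = image
                break
        else:
            return graph  # no mapping shrinks the graph: it is lean

told = {
    ("_:c", "rdf:type", ":Cat"),     # "there's a cat in the garden"
    ("_:c", ":in", ":garden"),
    (":Fritz", "rdf:type", ":Cat"),  # "Fritz is in the garden, Fritz is a cat"
    (":Fritz", ":in", ":garden"),
}
print(sorted(lean(told)))
# [(':Fritz', ':in', ':garden'), (':Fritz', 'rdf:type', ':Cat')]
```

The result is equivalent to the input and is itself lean; the lean equivalent of a graph is unique up to renaming of b-nodes, so which shrinking mapping the loop happens to find first does not matter.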
>
>> [cutting SPARQL part. Summary: SPARQL works on lean and non-lean graphs]
>>> Plus, people just don't think of bnodes as ever redundant. I can't
>>> find the email right now, but a DAWG member suggested that a pattern
>>> like:
>>>
>>> _:x rdf:type Invoice; hasItem 4.
>>> _:y rdf:type Invoice; hasItem 3.
>>>
>>> Would safely indicate that you had (only) two invoices. Note that this
>>> *is* lean.
>> If hasItem were a functional property, then you could safely conclude that
>> there are at least two invoices.
>
> Sure. And if they were URIs, the lack of UNA, would *still* make it
> unsafe to conclude you had two. But that's extra.
But since 4 and 3 are distinct literals, you could safely conclude you have two.
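A minimal sketch of that inference, assuming :hasItem is declared functional (as in the hypothetical above), the graph is consistent, literals are compared by value (here plain Python ints), and nothing else is known; min_invoices is a hypothetical helper. Under those assumptions two invoice nodes with different :hasItem values cannot denote the same individual, so the number of distinct values is a lower bound; without functionality, a single invoice with two items would satisfy the graph.

```python
def min_invoices(graph, has_item_is_functional):
    """Smallest number of distinct invoices consistent with `graph`
    (a set of (s, p, o) tuples), assuming a consistent graph."""
    invoices = {s for s, p, o in graph if p == "rdf:type" and o == ":Invoice"}
    if not invoices:
        return 0
    if not has_item_is_functional:
        # All invoice nodes may denote one and the same invoice with many items.
        return 1
    # A functional property has at most one value per individual, so nodes
    # with different values must denote different invoices.
    items = {o for s, p, o in graph if s in invoices and p == ":hasItem"}
    return max(1, len(items))

graph = {
    ("_:x", "rdf:type", ":Invoice"), ("_:x", ":hasItem", 4),
    ("_:y", "rdf:type", ":Invoice"), ("_:y", ":hasItem", 3),
}
print(min_invoices(graph, True))   # 2
print(min_invoices(graph, False))  # 1
```

This is only a lower bound: under the open-world reading nothing rules out further invoices that the graph doesn't mention.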
>
>>>> More generally, what's your motivation to change the semantic of
>>>> b-nodes,
>>> Because they cause lots of problems and their semantics offer no
>>> gains. For example, with existential Bnodes sparql query answering for
>>> RDFS is *NP-Complete* in *DATA COMPLEXITY*.
>> subgraph isomorphism is in NP even without existential variables in the
>> graph.
>
> You don't understand my point. Think about data complexity and the
> relative sizes of queries and data. Think about how much of queries
> tend to be ground.
I still don't see it. What makes a query complex is the structure and
number of variables in the query, not the b-nodes in the graph.
[...]
>
>>> How is having true existential semantic worth that?
>> It is. Because the tension is between b-nodes and centralized
>> ontological naming authorities.
>
> Local singular terms work just fine, even with a default UNA. What are
> you missing? Think about the case of *adding* entailments.
A default UNA? That takes you even farther from your
ignoring-without-violating-too-badly principle.
[...]
>
> We can, I believe, capture all of the actual behaviors you rely on
> without forcing me to deal with the equivalence of bnodes and
> somevaluesfrom. I think it will better capture the way people
> understand bnodes (just as I think a default UNA would better capture
> how people want to work with URIs). I think it will have both better
> formal computational properties and better practical ones and be more
> usable.
People, at least some people, at least some programmers (being non-lean
again) would like to have tamper-proof bar codes on all material or
abstract things (and, of course, laws forbidding the addition of further
bar codes). That's not my world; I think this is ugly in the cathedral and
impossible in the bazaar.
>
> You disagree. I get it. File a bug report with SPARQL. File a bug
> report with RIF. 
I still don't see where RIF or SPARQL forbids leanifying graphs.

I hope you no longer disagree about existential b-nodes, but if you
still do, I'll read your proposals to revise the RDF semantics.

Cheers,
reto




-- 
Reto Bachmann-Gmür
Talis Information Limited

Received on Tuesday, 2 October 2007 16:07:29 UTC