Re: bnodes from Bijan Parsia on 2007-10-02 (public-owl-dev@w3.org from October to December 2007)

From: Bijan Parsia <bparsia@cs.man.ac.uk>
Date: Tue, 2 Oct 2007 14:44:29 +0100
To: Reto Bachmann-Gmür <rbg@talis.com>
Cc: public-owl-dev@w3.org
Message-Id: <8C5BA1E8-2299-4BCC-BFA7-8AB043E773AD@cs.man.ac.uk>
On 2 Oct 2007, at 12:29, Reto Bachmann-Gmür wrote:

> Bijan Parsia wrote:
>> On 2 Oct 2007, at 09:30, Reto Bachmann-Gmür wrote:
>> [...]
>>
>> "Violating" is a bit strong. "Ignoring" is better. It's just that  
>> most of the time ignoring is "compatible", in some sense, but the
>> proof comes out when you put things to the test. In RDF, given the
>> weakness of the language, the tests are easier to avoid. But, for
>> example, no OWL engine treats bnodes the same as someValuesFrom
>> statements.
> As I already wrote, it's perfectly ok for an application to keep graph
> unlean.

In practice, as far as I can tell, no store does *any* leaning (at  
least internally) not even cheap, but incomplete, leaning.

In practice, no store or OWL engine gives you access to the
equivalence between bnodes and someValuesFrom where they are equivalent.

Furthermore, users haven't requested it and would be, afaict,  
confused by it.

> In fact an API to draw graphs (like jena)

Jena is an api to draw graphs?

> may need to keep redundant statements because the api user may still
> be adding properties to those nodes.

They don't do cheap leaning on parse either.

>>> do you have any stats or at least examples? Note that a tool doesn't
>>> need to enforce lean-graphs all of the time,
>>
>> True. But how many leaners are out there? Production quality? How
>> many entailment testers are out there? Production quality? How many
>> applications make use of either? How many treat triples from
>> documents as stable and sacrosanct? (I mean, not eliminating the
>> *easy* cases?)
> I guess that many applications don't need to care because they are not
> aggregating b-nodes from external sources, or because they deal with
> b-nodes that are grounded by functional or inverse functional
> properties so they can smush them without considering the whole
> extension.

Sure.

> Other
> applications do in fact perform some partial leanification because
> users do complain when the rss:items in their aggregation get (or
> appear to get) a new maker whenever that feed is aggregated. At least
> a partial leanification is done by tools using graph-decomposition
> (such as MSGs[1] or RDF Molecules[2]).

Thanks for the reference. I still don't see that that needs to be  
built into the semantics, and esp. the semantics via treating them as  
existentials.

>> Usually it takes me anywhere from 15 minutes to a couple of hours to
>> explain the existential semantics plus its implications in a variety
>> of contexts. And I'm always having to explain it ;)
> We all deal with existential semantics in daily life and natural
> language.

We also deal with complex modalities, but they are a pain to explain  
and hard to formalize and even harder to formalize in a useful way.

This is a non-sequitur.

> We all deal with existential semantics in daily life and natural
> language.
>
> Sara says:  "there's a cat in the garden"
> Peter says: "there's a cat in the garden"
>
> If after hearing and trusting Sara and Peter someone ask us what there
> is in the garden we would usually not say "there's a cat and a cat in
> the garden" but we would usually never have created a non-lean graph
> in our minds
[snip]

But we also don't think that there are 4 cats in the garden, which is  
what's compatible with treating that as a classic existential  
quantifier.

This isn't going to work because really, the "a" isn't acting to  
suggest an indefinite *number*, but an indefinite *individual*, i.e.,  
an individual that we don't have more specific information about.  
It's a *demonstrative*, in this case.

Consider:

Bijan says: "There's a cat in the garden"
Reto looks, then says: "Hey, there's like 10 there!"
Bijan says: "Yes, but I was *speaking existentially* so what I said
was TRUE!! There was *at least one* since there were 10. Ha ha ha ha,  
neener neener neener."
Reto says: "Go to hell"

:)

BTW, if you have studies about what people do (along the lines of the  
cognitive adequacy studies of RCC8), I'd be interested in references.



>>> but for a tool compliant with rdf-semantics the following graph
>>>
>>> eg:joesText foaf:maker [ foaf:name "jo"].
>>>
>>> expresses the same content as
>>>
>>> eg:joesText foaf:maker [ foaf:name "jo"].
>>> eg:joesText foaf:maker [ foaf:name "jo"].
>>
>> So, how do you test that?
> In the example the two graphs contain an identical set of MSGs /rdf
> molecules, but it's true that you can have graphs which decompose into
> non-lean components or components which are obsoleted by the union of
> other components.
>> In my experience, if a store *doesn't* treat these as distinct, you
>> get bug complaints.
> and at least an RFE when you *do* treat them as distinct. It solely
> depends if your users want a store for semantic content or a store for
> non-lean graphs,

I don't think people really do want a store for semantic content. I  
do think there are cases where they want redundancy eliminated or  
avoided in a number of ways. But we can do that with URIs or with
local names.

I don't want to force smushing *or* non-smushing. I just want to  
leave it to the application layer.
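
For concreteness, here is a minimal Python sketch of what complete leaning would do to the two-maker graph quoted above. It assumes triples are plain tuples of strings and that any term starting with `_:` is a bnode; the function name `lean` and the brute-force search over bnode mappings are illustrative assumptions, not the behaviour of any existing store, and the search is exponential in the number of bnodes.

```python
from itertools import product

def is_bnode(term):
    # Assumption for this sketch: bnodes are written as "_:label".
    return term.startswith("_:")

def lean(graph):
    """Shrink a graph until no bnode mapping sends it onto a proper subgraph."""
    graph = set(graph)
    while True:
        bnodes = sorted({t for triple in graph for t in triple if is_bnode(t)})
        terms = sorted({t for triple in graph for t in triple})
        for values in product(terms, repeat=len(bnodes)):
            mapping = dict(zip(bnodes, values))
            image = {tuple(mapping.get(t, t) for t in triple) for triple in graph}
            if image < graph:
                # A proper subgraph that is an instance of the graph: keep only it.
                graph = image
                break
        else:
            # No mapping produced a proper subgraph, so the graph is lean.
            return graph

doubled = [
    ("eg:joesText", "foaf:maker", "_:a"), ("_:a", "foaf:name", '"jo"'),
    ("eg:joesText", "foaf:maker", "_:b"), ("_:b", "foaf:name", '"jo"'),
]
print(len(lean(doubled)))  # 2
```

A store that leans sees one maker here; a store that does not sees two. Whether to run something like this is exactly the choice that can be left to the application.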

> in most cases they would be happy with both, some using
> the semantic content store may use it wrong and have unexpected
> results (and post your bug report), some users of the store for
> non-lean graphs will eventually get tired of removing redundancies
> using ontology specific knowledge and heuristics and move to a
> semantic content store.

I don't believe there's a market for the latter.

> [...]
>>
>>> Or for an aggregator:
>>> whenever we aggregate the first graph we add two triples to our
>>> aggregated graph, and if I got your "sane" interpretation right
>>> eg:joesText has a new maker, which without further knowledge is not
>>> considered to be owl:sameAs any of the existing foaf:makerS.
>>
>> Note your language leans heavily toward the local name
>> interpretation.
> Could it be that you missed that after 'if I got your "sane"
> interpretation right' I tried to paraphrase your position?

Nope.

Consider the following:

s p o.
s1 p o.
_:x p o.

I don't think _:x is "sameas" either s or s1 on any reading (absent  
specific assertions or cardinality restrictions, etc.)

However, _:x p o. is entailed by s p o (alone) and s1 p o (alone).

So the existential reading is not captured by thinking in terms of  
sameAs. *That's* the part that is "Thinking in individuals". I have  
no problem with that, fwiw :)
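
To make the entailment claim concrete, here is a minimal Python sketch of simple entailment (no RDFS, no datatypes). It assumes triples are plain tuples of strings and that any term starting with `_:` is a bnode; the name `simply_entails` and the brute-force search are illustrative assumptions, not any engine's API.

```python
from itertools import product

def is_bnode(term):
    # Assumption for this sketch: bnodes are written as "_:label".
    return term.startswith("_:")

def simply_entails(g, h):
    """True if some mapping of h's bnodes to terms of g turns h into a subset of g."""
    g = set(g)
    bnodes = sorted({t for triple in h for t in triple if is_bnode(t)})
    terms = sorted({t for triple in g for t in triple})
    for values in product(terms, repeat=len(bnodes)):
        mapping = dict(zip(bnodes, values))
        image = {tuple(mapping.get(t, t) for t in triple) for triple in h}
        if image <= g:
            return True
    return False

bnode_triple = [("_:x", "p", "o")]
print(simply_entails([("s", "p", "o")], bnode_triple))   # True
print(simply_entails([("s1", "p", "o")], bnode_triple))  # True
print(simply_entails(bnode_triple, [("s", "p", "o")]))   # False
```

Each ground triple alone entails the bnode triple, the bnode triple entails neither of them, and nothing in the check ever identifies `_:x` with `s` or `s1`.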

>> The bnode is *not an entity* but a variable.
> agreed.
>> I could understand if stores generally didn't lean for efficiency
>> reasons, but they lean graphs of syntactically identical triples.
>>     s p o.
>>     s p _:x.
> Not leanifying for efficiency reasons is ok, doing some leanification
> (where this can be done cheaply) is ok as well but complete
> leanification is best when it comes to having the most compact
> expression of knowledge and thus the most valuable triples.

My point is that people do, in fact, treat redundant bnode triples as  
carrying information. I've had long discussions on this with, e.g.,  
people in DAWG.

> Or to go back to the cat(s): after being told "there's a cat in the
> garden" and "Fritz is in the garden, Fritz is a cat" a cheaper edition
> of our house-robot would summarize the situation as "Fritz and a cat
> are in the garden, Fritz is a cat" while the slightly more expensive
> edition of our RDF based robot would say "The cat Fritz is in the
> garden".

Yeah, your intuitions don't seem very natural to me. I mean, this is  
just saying "RDF semantics are good" in a pretty forced example.

> [cutting sparql-part. Summary: sparql works on lean and non-lean
> graphs ]
>> Plus, people just don't think of bnodes as ever redundant. I can't
>> find the email right now, but a DAWG member suggested that a pattern
>> like:
>>
>> _:x rdf:type Invoice; hasItem 4.
>> _:y rdf:type Invoice; hasItem 3.
>>
>> Would safely indicate that you had (only) two invoices. Note that
>> this *is* lean.
> if hasItem was a functional property then you could safely conclude
> that there are at least two invoices.

Sure. And if they were URIs, the lack of a unique name assumption (UNA)
would *still* make it unsafe to conclude you had two. But that's extra.

>>> More generally, what's your motivation to change the semantic of
>>> b-nodes,
>> Because they cause lots of problems and their semantics offer no
>> gains. For example, with existential Bnodes sparql query answering
>> for RDFS is *NP-Complete* in *DATA COMPLEXITY*.
> subgraph isomorphism is in NP even without existential variables in
> the graph.

You don't understand my point. Think about data complexity and the  
relative sizes of queries and data. Think about how much of queries  
tend to be ground.

>> That should be a scary fact for anyone interested in scalability.
> Real-world graphs decompose into small components (especially when
> taking functional and inverse functional properties into account as
> in RDF Molecules) so many of the operations which would be very
> expensive on huge graphs consisting essentially of bnodes become
> quite cheap.

DAWG people refused to have "really distinct" answer sets (where you  
leaned the *answers*) because it was "too expensive".

>> How is having true existential semantic worth that?
> It is. Because the tension is between b-nodes and centralized
> ontological naming authorities.

Local singular terms work just fine, even with a default UNA. What  
are you missing? Think about the case of *adding* entailments.

> To have interoperable descriptions of
> the same universe we would need a central authority assigning names.
> It does make computation easier - at least for what the tax-office
> needs to compute, that's probably why most designed social systems
> are centralized.

No idea what you're arguing here. Sorry.

>>> if you don't like/need existential variables why don't you just
>>> assign URIs (urn:uuid) to your nodes?
>>
>> I do. But lots of people do not and I have to deal with their data.
>> Plus when I write specs and software, if I have to treat:
>>     s p _:x.
>>
>> as equivalent to:
>>     s:[...someValuesFrom...]
>>
>> then many things become much, much, much harder. Interoperability
>> becomes harder.
> Decentralized interoperability only becomes possible with your great,
> great, great effort. But it's worth it :)

I don't know what you mean here, either.

We can, I believe, capture all of the actual behaviors you rely on  
without forcing me to deal with the equivalence of bnodes and  
someValuesFrom. I think it will better capture the way people
understand bnodes (just as I think a default UNA would better capture  
how people want to work with URIs). I think it will have both better  
formal computational properties and better practical ones and be more  
usable.

You disagree. I get it. File a bug report with SPARQL. File a bug  
report with RIF.

(Roughly, I'd prefer to have a sort of skolem constant instead of  
bnodes, if this helps you any :))
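
As a rough illustration of the skolem-constant reading, here is a minimal Python sketch that replaces each bnode with a fresh local name. It assumes triples are plain tuples of strings and that any term starting with `_:` is a bnode; the function name `skolemize` and the use of `urn:uuid:` names are assumptions made for the example, not something any spec mandates.

```python
import uuid

def is_bnode(term):
    # Assumption for this sketch: bnodes are written as "_:label".
    return term.startswith("_:")

def skolemize(graph):
    """Replace every bnode with a fresh constant, consistently within one graph."""
    constants = {}

    def rename(term):
        if not is_bnode(term):
            return term
        if term not in constants:
            # A fresh name per bnode label; urn:uuid is just one possible scheme.
            constants[term] = "urn:uuid:" + str(uuid.uuid4())
        return constants[term]

    return {tuple(rename(t) for t in triple) for triple in graph}

doc = [("s", "p", "_:x"), ("_:x", "q", "o")]
first = skolemize(doc)
second = skolemize(doc)
print(first != second)  # True: each load of the document mints its own names
```

Within one graph the two occurrences of `_:x` still co-refer, but the result is ground, so nothing is redundant by semantics alone and any smushing has to be an explicit application-level decision.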

Cheers,
Bijan.
Received on Tuesday, 2 October 2007 13:43:19 UTC