The Mary example (Re: Summary of BNode redundancy options (at the moment)) from Bijan Parsia on 2006-08-17 (public-rdf-dawg@w3.org from July to September 2006)

From: Bijan Parsia <bparsia@cs.man.ac.uk>
Date: Thu, 17 Aug 2006 13:38:28 +0100
To: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-Id: <F5B3B877-E02E-41DA-A048-752E4EB8725B@cs.man.ac.uk>

On Aug 17, 2006, at 6:00 AM, Pat Hayes wrote:

> :Mary :met :Bill
> :Bill :nationality USA
> :Mary :met _:x7
> _:x7 :nationality IRAQ
>
> SELECT DISTINCT ?x [:Mary :met ?x]
>
> If you are a government agency trying to keep track of who talked  
> to whom, it would be less than helpful to be told about Bill, but  
> not *anything at all* about the nameless Iraqi who Mary has been in  
> communication with, just because a semantic theory says that  
> {:Mary, _:x7} is technically redundant. DISTINCT in this case was  
> likely intended to mean, distinct *people*, and there is enough  
> information here to enable a human reader to know that _:x7,  
> whoever it is, is not Bill, even though an RDF engine might be too  
> dumb to figure this out. Getting the answer set {Bill, _:x7} in  
> this case tells you that there are two individuals about whom  
> something detectable is recorded, and if we have told bnodes  
> available then it allows a subsequent query to ask about the  
> nationality of _:x7. (They might be the same, of course, but then  
> so might :Bill and :Joe.)

This is interesting because the more I think about it, the more I
become convinced that this is poor modeling, at least in many
scenarios (i.e., the case is underdescribed). Given some remarks Pat
(perhaps privately) made about shifting the burden to data managers,
I think that I object to this modeling and would not recommend it.
There is a much more sensible approach, especially for a vertical,
curated collection like a gov agency's (yes, I know, they aren't that
good, but this isn't hard). BNodes are the wrong thing if this is the
kind of interpretation of the answers you want. For example:

	:Mary :met :Bill
	:Bill :nationality USA
	:Mary :met :unknownPerson1
	:unknownPerson1 :nationality IRAQ

The more I think about it, the better it seems. I can encode all  
sorts of information in the URI *or* the graph. It's easily stable.  
It's easy to *talk* about. If I have any sort of equality reasoning  
it's pretty easy to merge it when appropriate. It also allows for  
things like this:

	:Mary :met :Bill
	:Bill :nationality USA
	:Mary :met :unknownPerson1
	:unknownPerson1 :nationality IRAQ
	:Mary :met :unknownPerson2
	:unknownPerson2 :nationality IRAQ

If the unknowns were bnodes, the second one would get leaned away,
since nothing in the graph distinguishes it from the first.
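
To make the leaning point concrete, here is a rough sketch in Python. It is not taken from any RDF library: the names (lean, people_met, BNODE_GRAPH, URI_GRAPH) are made up for illustration, terms are plain strings, and the leaning is a brute-force search that only suits toy graphs. It assumes the usual definition that a graph is lean when no mapping of its blank nodes yields a proper subgraph.

```python
from itertools import product

def is_bnode(term):
    # Assumption for this sketch: blank nodes are strings starting with "_:".
    return term.startswith("_:")

def lean(graph):
    """Brute-force leaning: while some mapping of blank nodes to terms
    sends the graph onto a proper subgraph of itself, keep that subgraph."""
    graph = set(graph)
    while True:
        terms = sorted({t for s, p, o in graph for t in (s, o)})
        bnodes = [t for t in terms if is_bnode(t)]
        for images in product(terms, repeat=len(bnodes)):
            mapping = dict(zip(bnodes, images))
            mapped = {(mapping.get(s, s), p, mapping.get(o, o))
                      for s, p, o in graph}
            if mapped < graph:  # proper subgraph, so the graph was not lean
                graph = mapped
                break
        else:
            return graph  # no such mapping exists: the graph is lean

def people_met(graph, who=":Mary"):
    # Rough stand-in for: SELECT DISTINCT ?x [:Mary :met ?x]
    return {o for s, p, o in graph if s == who and p == ":met"}

# The six-triple graph with the unknowns as blank nodes.
BNODE_GRAPH = [
    (":Mary", ":met", ":Bill"),
    (":Bill", ":nationality", "USA"),
    (":Mary", ":met", "_:x1"),
    ("_:x1", ":nationality", "IRAQ"),
    (":Mary", ":met", "_:x2"),
    ("_:x2", ":nationality", "IRAQ"),
]

# The same graph with the unknowns as URIs.
URI_GRAPH = [
    (s.replace("_:x", ":unknownPerson"), p, o.replace("_:x", ":unknownPerson"))
    for s, p, o in BNODE_GRAPH
]

print(len(lean(BNODE_GRAPH)), len(people_met(lean(BNODE_GRAPH))))  # 4 2
print(len(lean(URI_GRAPH)), len(people_met(lean(URI_GRAPH))))      # 6 3
```

With bnodes, one unknown is mapped onto the other and only two of Mary's contacts survive; with URIs there is nothing to map, so all three remain.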

I agree that current practice uses BNodes exactly as if they were  
these :unknownUris, but that just reinforces my overall point: that  
interpretation is contrary to RDF semantics.  We either should  
change our semantics, stress this point strongly in the documents and  
solicit serious feedback far and wide at many different levels, or  
try to change RDF.

Cheers,
Bijan.

Received on Thursday, 17 August 2006 12:38:56 UTC