Re: Well Behaved RDF - Taming Blank Nodes, etc. from Lee Feigenbaum on 2012-12-18 (semantic-web@w3.org from December 2012)

From: Lee Feigenbaum <lee@thefigtrees.net>
Date: Tue, 18 Dec 2012 09:47:54 -0500
To: Thomas Passin <list1@tompassin.net>
CC: semantic-web <semantic-web@w3.org>
Message-ID: <50D0821A.9040704@thefigtrees.net>

On 12/15/2012 4:41 PM, Thomas Passin wrote:
> On 12/13/2012 4:48 AM, Bernard Vatant wrote:
>> I'm 100% with Pat here in the defence of blank nodes, as ever [1]. The
>> more I'm using RDF daily, the more I love them, and seems to me you
>> miss a lot of RDF expressivity by wanting everything to be uniquely
>> identified by a URI. Just another example: "John met a girl yesterday
>> in a cafe". Are you going to coin a URI for those ill-identified
>> girl, cafe, and event? Certainly not. But you want to record this
>> information in John's bio because you guess it's likely to be of some
>> importance later in his life, even if so far you don't know more
>> about it.
>
> Nicely stated.  More generally, there are two different ways to describe
> or identify something.  One uses a (probably arbitrary) identifier, such
> as a URI or a person's name.  The other uses a bundle of properties. For
> example, it has been reported that around 87% of the US population can
> be identified by the combination of {5-digit ZIP code, gender, date of
> birth} (see
> http://dataprivacylab.org/projects/identifiability/paper1.pdf). Of
> course, we usually don't even need to make a fully unique identification
> for a bundle of properties to be very useful, as Bernard illustrates.
>
> Using a bnode corresponds to using a bundle of properties to describe
> the subject.  If we don't allow the use of bnodes, we eliminate one of
> the two basic ways of describing something.  That's a loss of the
> expressiveness that Bernard talks about above. Why would we want to
> limit ourselves so severely? I wouldn't.

Thomas, I don't get this. Can you help me understand it better?

Suppose I described something with a blank node:

[] a ex:Person ; :zipCode "02111" ; :gender "male" ; :dob "1978-12-13" .


And then I do the same thing with an arbitrary URI that I mint:

<urn:uuid:abcde> a ex:Person ; :zipCode "02111" ; :gender "male" ; :dob "1978-12-13" .


What can I do in the blank node case that I can't in the URI case?

You say that the second case (with no blank nodes) "eliminate[s] one of 
the two basic ways of describing something". How is that the case?

If a second person comes along and makes another observation described 
by properties, we could get either:

# note that [] is going to mint a new blank node here

[] a ex:Person ; :zipCode "02111" ; :gender "male" ; :dob "1978-12-13" .

# note a new arbitrary URI has been minted for the second observation

<urn:uuid:fghi> a ex:Person ; :zipCode "02111" ; :gender "male" ; :dob "1978-12-13" .


So... what's the difference? In either case, I can use SPARQL queries to 
say that these resources might be the same thing. In either case, I 
could use owl:hasKey, owl:sameAs, and friends to establish an equivalent 
identity (if appropriate for my application). I feel like I must be 
missing something here?
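
To make the "might be the same thing" query concrete, here is a minimal 
sketch. The prefix IRIs are assumptions (the snippets in this message 
never declare them), and matching on all three properties is just one 
possible notion of "probably the same":

```sparql
# Assumed namespaces; substitute whatever the data actually uses.
PREFIX ex: <http://example.org/ns#>
PREFIX :   <http://example.org/ns#>

# Pairs of distinct resources that share zip code, gender, and date of birth.
SELECT ?a ?b
WHERE {
  ?a a ex:Person ; :zipCode ?zip ; :gender ?gender ; :dob ?dob .
  ?b a ex:Person ; :zipCode ?zip ; :gender ?gender ; :dob ?dob .
  FILTER (!sameTerm(?a, ?b))   # do not pair a resource with itself
}
```

The pattern is identical whether ?a and ?b end up bound to blank nodes 
or to minted URIs. Each matching pair comes back twice, once in each 
order.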

Of course, if the resource described by these triples is itself the 
object of some other triple, then the blank node version has all of the 
usage challenges that people are familiar with:

:Lee :satNextTo [ a ex:Person ; :zipCode "02111" ; :gender "male" ; :dob "1978-12-13" ] .


vs.

:Lee :satNextTo <urn:uuid:abcde> .

<urn:uuid:abcde> a ex:Person ; :zipCode "02111" ; :gender "male" ; :dob "1978-12-13" .


So now I come along and ask who sat next to me:

SELECT ?person { :Lee :satNextTo ?person }


In the URI case, I get a value that I can then directly interrogate further:

SELECT ?p ?o { <urn:uuid:abcde> ?p ?o }


In the blank node case... no such luck. I can't feed the blank node I 
get as a result into a subsequent query (because blank node identifiers 
are not stable, of course). So... I'm stuck having to do follow-on queries like:

SELECT ?p ?o { :Lee :satNextTo [ ?p ?o ] }


which are more complicated to author, potentially more complicated to 
execute, and not nearly as precise in the face of multiple people 
sitting next to Lee.
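
One partial workaround, sketched here under the assumption that 
everything can be fetched in a single query, is to name the neighbor 
with a variable rather than the [ ] shorthand. The prefix IRI is again 
an assumption:

```sparql
# Assumed namespace; substitute whatever the data actually uses.
PREFIX : <http://example.org/ns#>

# One row per (neighbor, property, value). Rows that share the same
# ?person binding describe the same neighbor.
SELECT ?person ?p ?o
WHERE {
  :Lee :satNextTo ?person .
  ?person ?p ?o .
}
```

Within that one result set the blank node labels are consistent, so the 
rows can be grouped per neighbor. The labels still can't be carried 
into a later query, which is the limitation described above.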

So, can you help me understand what I'm missing, please?

Lee
Received on Tuesday, 18 December 2012 14:48:26 UTC