Re: Well Behaved RDF - Taming Blank Nodes, etc. from Hugh Glaser on 2012-12-13 (semantic-web@w3.org from December 2012)

From: Hugh Glaser <hg@ecs.soton.ac.uk>
Date: Thu, 13 Dec 2012 11:59:28 +0000
To: Pat Hayes <phayes@ihmc.us>
CC: David Booth <david@dbooth.org>, semantic-web Web <semantic-web@w3.org>
Message-ID: <387E72E216DF1247A2F8ED4819C93BA71E384646@UOS-MSG00041-SI.soton.ac.uk>
Hi Pat,
I'm sorry to report, I am unconvinced.
The theme of my view is that I am not just publishing stuff now, it's that I am in a stream of publishing about stuff, which includes me in the future, and others.
And that the point of publishing is not to say what I want - it is to enable others to use what I publish, and in particular make statements themselves about what I publish.
Comments on your examples inline.

On 13 Dec 2012, at 07:00, Pat Hayes <phayes@ihmc.us> wrote:

> 
> On Dec 12, 2012, at 10:23 AM, Hugh Glaser wrote:
> 
>> OK, blank nodes :-)
>> Hi Pat,
>> On 12 Dec 2012, at 17:43, Pat Hayes <phayes@ihmc.us> wrote:
>> 
>>> 
>>> On Dec 12, 2012, at 9:01 AM, David Booth wrote:
>>> 
>> <snip>
>>>> A Well-Behaved RDF graph is an RDF graph that can be serialized 
>>>> as Turtle without the use of explicit blank node identifiers. 
>>>> I.e., only blank nodes that are implicitly created by the 
>>>> bracket "[ ... ]" or list "( ... )" notations are permitted. 
>>> 
>>> That is too restrictive. There is a real need to be able to describe things such as "Joe's father" or "a woman in a red dress" which are naturally phrased as bnodes with identifying descriptors attached to them. 
>>> 
>> I don't think I understand, or I may disagree :-)
>> Why is the node for "Joe's father" any different in character from the node for "Joe"?
>> (Assuming this is my data I'm publishing, and I'm not reusing external URIs.)
> 
> Ah, but I *am* re-using external URIs. Isn't that one of the points of linked data, to re-use pre-existing URIs where possible? Take a real example which came up in our work. I have a photograph of a statue taken inside Manchester Town Hall, and I want to say this. There is a URI in DBpedia for Manchester Town Hall, and I plan to use it. But I don't want to say that my picture is of MTH, because it's actually of a room inside MTH. I don't have a URI for this room and I don't want to coin one, because that URI would never get re-used by anyone and won't serve any useful purpose,
I think this is probably where we differ.
I worry about serendipitous reuse. Publishers should never make assumptions about how their data is to be reused - it always ends with the data being less used than it could be.
For example, later the council comes along and does a dataset of rooms in MTH. An ability to connect the rooms means that your beautiful photo is useful to the world much more easily.
When you or someone (Google Goggles?) finds the council's URI, it is easy to make that connection if you have your own URI.
> and in order to make it useful I would have to invent some kind of global naming discipline for URIs denoting rooms, and I don't know how to do that sensibly.
I don't see this either. I just need a way of generating URIs. This one happens to be a room. http://mydomain/foobar-unix-epoch will do.
> I just want to say the RDF equivalent of "**a** room in Manchester Town Hall". And that is exactly what blank nodes are for, so I would like to use them to do that. 
Yes. And the way to do it is make up a random URI.
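
A minimal sketch of the "just generate a URI" idea, assuming Python. The base `http://mydomain/` and the `foobar` label come from the message above; the function names and the fixed timestamp are hypothetical, chosen only for illustration.

```python
import time
import uuid

BASE = "http://mydomain/"  # placeholder domain taken from the message


def mint_uri(label, now=None):
    """Build a URI of the form BASE + label + '-' + Unix epoch seconds."""
    epoch = int(time.time() if now is None else now)
    return f"{BASE}{label}-{epoch}"


def mint_random_uri(label):
    """Alternative: append a random UUID instead of a timestamp."""
    return f"{BASE}{label}-{uuid.uuid4().hex}"


# A fixed timestamp is passed here only to make the result reproducible.
room_uri = mint_uri("foobar", now=1355399968)
print(room_uri)
```

Neither variant needs a global naming discipline for rooms; the only requirement is that the publisher does not hand out the same URI twice.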
> 
> Another example: a picture of some celebrity standing next to a horse. I have a URI for the celebrity, but I don't have and don't need one for the horse: and if I were to invent one for each horse, then I could no longer query for retrieval of a picture of that person with "a horse", but would have to remember the URI for each of the bloody horses. But nobody gives a damn about the particular horse. 
I actually don't understand this.
Probably my ignorance of RDF & SPARQL (I do appreciate you taking the time with me - really! I suspect I am embarrassing myself really seriously now)
Is this SPARQL?
Why is querying different? I don't get to know whether the match of ?a is a blank node or a URI, do I?
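
The question can be illustrated with a toy triple-pattern matcher in Python. This is a sketch of basic pattern matching only, not a SPARQL engine, and the `ex:` names, the `BNode` class and the `match` function are all hypothetical. It shows that a query for "a photo of the celebrity with a horse" returns the same photo whether the horse is a blank node or a URI, provided the horse is typed as a horse in both cases.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BNode:
    label: str  # only meaningful inside one graph


def is_var(term):
    return isinstance(term, str) and term.startswith("?")


def match(patterns, graph, binding=None):
    """Yield every variable binding that satisfies all triple patterns."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        new = dict(binding)
        ok = True
        for pat, term in zip(first, triple):
            if is_var(pat):
                # Bind the variable, or check it against an earlier binding.
                if new.setdefault(pat, term) != term:
                    ok = False
                    break
            elif pat != term:
                ok = False
                break
        if ok:
            yield from match(rest, graph, new)


query = [
    ("?photo", "ex:depicts", "ex:celebrity"),
    ("?photo", "ex:depicts", "?a"),
    ("?a", "rdf:type", "ex:Horse"),
]

horse = BNode("h")
with_bnode = {
    ("ex:photo1", "ex:depicts", "ex:celebrity"),
    ("ex:photo1", "ex:depicts", horse),
    (horse, "rdf:type", "ex:Horse"),
}
with_uri = {
    ("ex:photo1", "ex:depicts", "ex:celebrity"),
    ("ex:photo1", "ex:depicts", "ex:horse42"),
    ("ex:horse42", "rdf:type", "ex:Horse"),
}

photos_bnode = {b["?photo"] for b in match(query, with_bnode)}
photos_uri = {b["?photo"] for b in match(query, with_uri)}
print(photos_bnode == photos_uri)
```

The query never names the horse, so nobody has to remember a horse URI; the variable ?a simply binds to whatever node is there.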
> 
> Life is full of things which are important only by virtue of their existence and the relationships in which they stand, not by virtue of their particular identity. My car needs **a** battery to work. It does not need a particular battery identified by a URI. The conditions for my car to start involve a blank node, not a URI. Putting a URI to denote the battery would actually be misleading and inaccurate. Similarly my swiffer needs **a** refill, and the patient needs **a** course of antibiotics, and I need **a** drink when I am thirsty. 
Yes, these are all things that can be published.
But for each of them it might be the case that you want to use a URI when you consume it.
My car has two batteries - when one goes flat I would like to be able to say which one it is.
I don't see how putting a URI for a battery is "inaccurate"? Is that because it is false, or something?
A medical researcher may want to refer to the patient's course of antibiotics. etc.
> 
>> I had to make up a URI for Joe; it isn't a great hardship to ask me to make one up for his father as well.
>> The only difference between the two resources seems to be that you have a property for one that you don't have for the other (perhaps "name" in this case).
>> That doesn't seem a great way to choose what URIs you decide to mint and publish.
> 
> Well, frankly, my rationale for creating and publishing URIs is my business, not yours. I want to create them only when they are likely to be useful for re-use. In descriptions like the ones I am mentioning, they will never serve any useful purpose. 
Clearly I completely disagree.
> 
>> And the downside (apart from any processing problems that David might have), is that no-one can make statements about the resource I am referring to as Joe's father.
> 
> Sure they can. They can use the very same RDF construction that you use to refer to it. If that involves blank nodes, they can use blank nodes in the same way. This argument you are using (that a URI is necessary in order to refer to something) is complete baloney.
My ignorance again:
How do consumers of both datasets know that the blank nodes are the same resource?
I'm sure there is a paper you can point me at that tells me - sorry.
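
A small Python sketch of why this question arises, assuming the usual RDF merge behaviour in which blank nodes from different graphs are kept apart. The `ex:` names, `BNode`, `merge` and `fathers_of_joe` are hypothetical, and the sketch ignores any extra inference (for example a rule that a person has only one father) that could later identify the two nodes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BNode:
    label: str  # scoped to the graph it appears in


def merge(*graphs):
    """Union the graphs, renaming blank nodes so they never collide."""
    merged = set()
    for i, graph in enumerate(graphs):
        def rename(term, i=i):
            if isinstance(term, BNode):
                return BNode(f"g{i}_{term.label}")
            return term
        merged |= {tuple(rename(t) for t in triple) for triple in graph}
    return merged


def fathers_of_joe(graph):
    return {o for s, p, o in graph if s == "ex:joe" and p == "ex:father"}


# Publisher and consumer both describe Joe's father with a blank node.
published_bnode = {("ex:joe", "ex:father", BNode("f"))}
consumer_bnode = {
    ("ex:joe", "ex:father", BNode("x")),
    (BNode("x"), "ex:born", "1950"),
}

# Same data, but the publisher minted a URI that the consumer reuses.
published_uri = {("ex:joe", "ex:father", "ex:joes-father")}
consumer_uri = {
    ("ex:joe", "ex:father", "ex:joes-father"),
    ("ex:joes-father", "ex:born", "1950"),
}

n_bnode = len(fathers_of_joe(merge(published_bnode, consumer_bnode)))
n_uri = len(fathers_of_joe(merge(published_uri, consumer_uri)))
print(n_bnode, n_uri)
```

In the blank-node case the merged graph says that Joe has a father twice over, with nothing in the data itself stating that the two nodes are the same person; in the URI case the consumer's statement attaches directly to the published node.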

I'm not saying that you shouldn't publish blank nodes - feel free.
My concern is that the data is often then less useful - and for David's purposes it is possibly good to encourage people to make the little extra effort to make up a URI.

I consume a lot of other people's RDF data, and I do find that what should be very useful data is made almost useless because of blank nodes.

Best
Hugh
> 
> Pat
> 
> 
>> The only reason (apart from laziness in thinking up a URI) I can see for a bnode here is if I want to prevent people making statements about the resource, which is surely not something we want to encourage in a simple RDF profile description.
>> 
>> Cheers
> 
> ------------------------------------------------------------
> IHMC                                     (850)434 8903 or (650)494 3973   
> 40 South Alcaniz St.           (850)202 4416   office
> Pensacola                            (850)202 4440   fax
> FL 32502                              (850)291 0667   mobile
> phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Thursday, 13 December 2012 12:01:03 UTC