Re: Well Behaved RDF - Taming Blank Nodes, etc. from Hugh Glaser on 2012-12-19 (semantic-web@w3.org from December 2012)

From: Hugh Glaser <hg@ecs.soton.ac.uk>
Date: Wed, 19 Dec 2012 14:19:14 +0000
To: Pat Hayes <phayes@ihmc.us>
CC: Ivan Shmakov <oneingray@gmail.com>, Semantic Web <semantic-web@w3.org>, Lee Feigenbaum <lee@thefigtrees.net>
Message-ID: <387E72E216DF1247A2F8ED4819C93BA71E3BE288@UOS-MSG00041-SI.soton.ac.uk>
Thanks Pat.
I think you have two issues here.
The second one, about global responsibility (peer pressure?!) to be cool, is something I have avoided commenting on, and I have tried to carefully avoid the Linked Data-specific issues (this is the SemWeb list!).
In practice this has been a problem, but personally I think we should simply celebrate any decent RDF that is published, bnodes, unresolvable URIs and all.
An example is eprints - the only URIs that resolve are the ones for the papers - but they are great sources.

Your first issue I think comes down to publishing for consumption.
It is the case that if I never let anyone else see my RDF, or ensure that I am in control of everyone who sees my RDF and how they use/refresh it, your scenario holds.
But as soon as someone takes my RDF away, then there is a bnodeID that got taken away, and so it is no longer "local".
So I don't think bnodes solve the problem any better than URIs.
In fact, I think that every time I take a bnodeID away and put it in a SPARQL store for example, I have to treat it as a new bnodeID, so the problem is actually much worse. 1000 consumers of an RDF graph with one bnode will cause 1001 bnodeIDs to be brought into the world.
In my world I resolve URIs and cache the results. I do this by asserting the data into a store. Each time I refresh the cache with some RDF that has a bnode in it I get a new bnodeID. And we are back to the world where I have difficulty doing anything to reduce the number.
(Of course, I could try tracking it all and delete the old model before asserting the new one, but then the publisher has pushed more work on the consumer, which should never be done. And the RDF graphs that come in may well be from different sources or be common sub-graphs of different graphs.)
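
To make the refresh problem concrete, here is a minimal sketch in Python. It is only an illustration: ToyStore, the example.org URIs and the "_:x" label are all made up for this message, and the renaming-on-load behaviour is an assumption about how a store treats incoming blank node labels, not a description of any particular product.

```python
import itertools

# Hypothetical data: the same two statements, once with a blank node
# and once with a (made-up) URI naming the same person.
BNODE_GRAPH = [
    ("http://example.org/lee", "http://example.org/satNextTo", "_:x"),
    ("_:x", "http://example.org/name", '"Someone"'),
]
URI_GRAPH = [
    ("http://example.org/lee", "http://example.org/satNextTo",
     "http://example.org/neighbour"),
    ("http://example.org/neighbour", "http://example.org/name", '"Someone"'),
]


class ToyStore:
    """Toy triple store: a set of (subject, predicate, object) strings."""

    def __init__(self):
        self.triples = set()
        self._fresh = itertools.count()

    def load(self, graph):
        """Assert a graph. Blank node labels are only meaningful inside the
        incoming document, so each one is renamed to a label that is new
        to this store (assumed behaviour for this sketch)."""
        renamed = {}

        def rename(term):
            if term.startswith("_:"):
                if term not in renamed:
                    renamed[term] = f"_:b{next(self._fresh)}"
                return renamed[term]
            return term

        for s, p, o in graph:
            self.triples.add((rename(s), p, rename(o)))

    def nodes(self):
        """All terms used in subject or object position."""
        return {t for s, _, o in self.triples for t in (s, o)}


def bnode_count(store):
    """Number of distinct blank nodes currently in the store."""
    return sum(1 for n in store.nodes() if n.startswith("_:"))


def refresh(graph, times):
    """Simulate refreshing a cache by asserting the same graph repeatedly."""
    store = ToyStore()
    for _ in range(times):
        store.load(graph)
    return store


with_bnodes = refresh(BNODE_GRAPH, 3)
with_uris = refresh(URI_GRAPH, 3)
print(len(with_bnodes.triples), bnode_count(with_bnodes))  # 6 3
print(len(with_uris.triples), bnode_count(with_uris))      # 2 0
```

Three refreshes of the URI version leave two triples in the store; three refreshes of the bnode version leave six triples and three blank nodes, none of which the consumer can tell apart from a genuinely new thing without doing the tracking described above.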

As always, I think, it comes down to how people want to model/publish things compared with how convenient it is to consume.
I don't think I have yet seen a single person say they would rather consume something with bnodes in it.

The best books tell the reader what they need to know, not what the author wanted to say.
The best datasets tell the consumer what they need to know, not what the publisher wanted to say.

Best

On 19 Dec 2012, at 05:12, Pat Hayes <phayes@ihmc.us> wrote:

> 
> On Dec 18, 2012, at 1:52 PM, Hugh Glaser wrote:
> 
>> Thanks Ivan, really interesting.
>> I hadn't thought about the idea of having a blank node, and then later publishing the same data, but having managed to find a URI to replace the blank node with.
>> I think that is the nub of what you are saying in the bit that Lee has responded to.
>> At first sight, I thought, wow, that is quite a good point.
>> I can delay my choice of URI.
>> But when I take a process view, it seems to fall down:
>> 
>> a) I publish some stuff with a blank node.
>> b) Possibly people take my RDF away, with a bnode in it.
>> c) I find a new URI and republish the RDF with it.
>> 
>> So at the end of this, I am publishing RDF with a URI in it (great!).
>> People can come and get my RDF, and use it.
>> (Other) people might still have my original RDF.
>> I can't see any advantage if the RDF that is already "out there" has a bnode instead of a URI.
>> 
>> And of course there seem to be disadvantages.
>> I can't say that the graph that I previously published is about the same resource as my new graph - in fact I can't say much at all.
>> The people who took the original RDF can't describe any such equivalences/alignments either, and of course all the arguments about not being able to say other things still apply.
>> 
>> So yes, it is a pain if you end up with lots of URIs, but using bnodes actually doesn't solve the problem - there are still lots of references to nodes; it's just that they are bnodes, not URIs.
> 
> No, they do solve the problem, because bnodeIDs are local, so can be re-used. You never need more bnodeIDs than there are bnodes in your largest graph, and you incur no global responsibility to make your bnodeIDs "cool", or to provide something that will deliver a 200 response when HTTP is given the bnodeID.
> 
> Pat
> 
>> 
>> Best
>> On 18 Dec 2012, at 16:34, Lee Feigenbaum <lee@thefigtrees.net> wrote:
>> 
>>> On 12/18/2012 11:06 AM, Ivan Shmakov wrote:
>>>>>>>>> Lee Feigenbaum <lee@thefigtrees.net>
>>>>>>>>> writes:
>>>>>>>>> On 12/18/2012 10:23 AM, Ivan Shmakov wrote:
>>>>>>>>> 
>>>> […]
>>>> 
>>>>>> This way, one may easily end up with hundreds of URIs, each naming
>>>>>> the one and only person who was unfortunate enough to sit next to
>>>>>> our Lee.
>>>> 
>>>>>> … And don't forget about all the owl:sameAs arcs necessary to manage
>>>>>> this crowd!
>>>> 
>>>>> OK, sure.  Why is having hundreds of URIs for this person any worse
>>>>> than having hundreds of distinct blank nodes?
>>>> 
>>>> 	First of all, I'd assume that a typical RDF store implementation
>>>> 	will assign temporary identifiers (most likely integers) to
>>>> 	/all/ the nodes — both blank and named.  This way, one could
>>>> 	conserve space by /not/ storing permanent identifiers (URI's) in
>>>> 	addition to the temporary ones.
>>>> 
>>> 
>>> As you seem to acknowledge, storage conservation doesn't seem like a particularly compelling reason here :-)
>>> 
>>>> 
>>>> 	But perhaps an even more compelling reason to use blank nodes is
>>>> 	that instead of introducing owl:sameAs arcs, one may just
>>>> 	replace two (or more) distinct blank nodes, — found to be
>>>> 	representing the same entity, — with a sole node possessing the
>>>> 	union of the properties of such blank nodes.  (Provided we check
>>>> 	for, and resolve, any semantic conflicts there are, that is.)
>>>> 
>>>> 
>>> 
>>> OK, great, now we're getting somewhere. So, if:
>>> 	• Your system does not support owl:sameAs (or you prefer not to use it to avoid performance penalties or complications)
>>> 	• URIs that you mint are (potentially) used outside of your system (so that you can't just smush away extra URIs the way you would with blank nodes)
>>> 	• You don't support or are not worried about "told blank nodes" (that could be reused down the line)
>>> then, by using blank nodes rather than URIs, you can freely smush together resources that represent the same entity.
>>> In my particular experience, the cost of using blank nodes tends to be higher than the cost of maintaining (& relating) multiple URIs for the same entity. Perhaps my experience would be different if I were working more frequently with public data that is more likely to be reused and re-minted outside of my control. On the other hand, in cases in which data is created outside of your control, you also have no control over whether the people minting that data use blank nodes or URIs :)
>>> Lee
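
The "smushing" that Ivan and Lee discuss above can be sketched in a few lines of Python. This is a hedged illustration only: the smush function, the "_:a"/"_:b" labels and the ex: terms are hypothetical, and it deliberately skips the check for semantic conflicts that Ivan says would be needed.

```python
def smush(triples, keep, drop):
    """Return a new set of triples in which every occurrence of the node
    `drop` is replaced by `keep`. Because the result is a set, statements
    that become identical collapse into one, so `keep` ends up with the
    union of both nodes' properties. No conflict checking is done."""

    def swap(term):
        return keep if term == drop else term

    return {(swap(s), p, swap(o)) for s, p, o in triples}


# Two blank nodes that have been found (by some means outside this
# sketch) to represent the same person.
graph = {
    ("_:a", "ex:name", '"Alice"'),
    ("_:a", "ex:satNextTo", "ex:lee"),
    ("_:b", "ex:name", '"Alice"'),
    ("_:b", "ex:mbox", "mailto:alice@example.org"),
}

merged = smush(graph, keep="_:a", drop="_:b")
for triple in sorted(merged):
    print(triple)
```

The four input statements become three, all about "_:a". This is only safe when nothing outside the store still refers to the dropped node, which is the second condition in Lee's list.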
>>> 
>> 
>> 
>> 
> 
> ------------------------------------------------------------
> IHMC                                     (850)434 8903 or (650)494 3973   
> 40 South Alcaniz St.           (850)202 4416   office
> Pensacola                            (850)202 4440   fax
> FL 32502                              (850)291 0667   mobile
> phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Wednesday, 19 December 2012 14:20:27 UTC