Re: Well Behaved RDF - Taming Blank Nodes, etc. from Pat Hayes on 2012-12-19 (semantic-web@w3.org from December 2012)

From: Pat Hayes <phayes@ihmc.us>
Date: Tue, 18 Dec 2012 21:12:43 -0800
To: Hugh Glaser <hg@ecs.soton.ac.uk>
Cc: Ivan Shmakov <oneingray@gmail.com>, Semantic Web <semantic-web@w3.org>, Lee Feigenbaum <lee@thefigtrees.net>
Message-Id: <FD4C97FE-0834-4BEE-966E-B14529862E21@ihmc.us>
On Dec 18, 2012, at 1:52 PM, Hugh Glaser wrote:

> Thanks Ivan, really interesting.
> I hadn't thought about the idea of publishing data with a blank node, and then later republishing the same data once I had managed to find a URI to replace the blank node with.
> I think that is the nub of what you are saying in the bit that Lee has responded to.
> At first sight, I thought, wow, that is quite a good point.
> I can delay my choice of URI.
> But when I take a process view, it seems to fall down:
> 
> a) I publish some stuff with a blank node.
> b) Possibly people take my RDF away, with a bnode in it.
> c) I find a new URI and republish the RDF with it.
> 
> So at the end of this, I am publishing RDF with a URI in it (great!).
> People can come and get my RDF, and use it.
> (Other) people might still have my original RDF.
> I can't see any advantage if the RDF that is already "out there" has a bnode instead of a URI.
> 
> And of course there seem to be disadvantages.
> I can't say that the graph that I previously published is about the same resource as my new graph - in fact I can't say much at all.
> The people who took the original RDF can't describe any such equivalences/alignments either, and of course all the arguments about not being able to say other things still apply.
> 
> So yes, it is a pain if you end up with lots of URIs, but using bnodes actually doesn't solve the problem - there are still lots of references to nodes; it's just that they are bnodes, not URIs.

No, they do solve the problem, because bnodeIDs are local, so they can be re-used. You never need more bnodeIDs than there are bnodes in your largest graph, and you incur no global responsibility to make your bnodeIDs "cool", or to provide something that will deliver a 200 response when HTTP is given the bnodeID.
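
A minimal sketch of the locality point, assuming triples are modelled as plain Python tuples and blank nodes as strings beginning with "_:". The helper names (is_bnode, merge) and the foaf:name data are made up for illustration; this is not any particular RDF library's API. Two documents can both use the label _:b0 without clashing, because a merge renames blank nodes apart per source graph:

```python
from itertools import count


def is_bnode(term):
    """Treat any term written as '_:label' as a blank node (assumption)."""
    return term.startswith("_:")


def merge(*graphs):
    """Merge graphs, giving each graph's blank nodes fresh labels.

    A bnode label only has meaning inside the graph it came from, so the
    same label in two graphs must end up as two different nodes.
    """
    fresh = count()
    merged = set()
    for graph in graphs:
        renamed = {}  # local label -> fresh label, reset for every graph

        def rename(term):
            if not is_bnode(term):
                return term
            if term not in renamed:
                renamed[term] = f"_:g{next(fresh)}"
            return renamed[term]

        for s, p, o in graph:
            merged.add((rename(s), p, rename(o)))
    return merged


# Hypothetical data: both documents reuse the same local label "_:b0".
doc_a = {("_:b0", "foaf:name", '"Alice"')}
doc_b = {("_:b0", "foaf:name", '"Bob"')}

merged = merge(doc_a, doc_b)
subjects = {s for s, _, _ in merged}
```

Here merged holds two triples with two distinct blank-node subjects, even though each source document needed only the single label _:b0. A URI in the same position would instead denote the same resource in both documents.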

Pat

> 
> Best
> On 18 Dec 2012, at 16:34, Lee Feigenbaum <lee@thefigtrees.net> wrote:
> 
>> On 12/18/2012 11:06 AM, Ivan Shmakov wrote:
>>>>>>>> Lee Feigenbaum <lee@thefigtrees.net> writes:
>>>>>>>> On 12/18/2012 10:23 AM, Ivan Shmakov wrote:
>>>>>>>> 
>>> […]
>>> 
>>>>> This way, one may easily end up with hundreds of URIs, each naming
>>>>> one and the same person who was unfortunate enough to sit next to
>>>>> our Lee.
>>> 
>>>>> … And don't forget about all the owl:sameAs arcs necessary to manage
>>>>> this crowd!
>>> 
>>>> OK, sure.  Why is having hundreds of URIs for this person any worse
>>>> than having hundreds of distinct blank nodes?
>>> 
>>> 	First of all, I'd assume that a typical RDF store implementation
>>> 	will assign temporary identifiers (most likely integers) to
>>> 	/all/ the nodes — both blank and named.  This way, one could
>>> 	conserve space by /not/ storing permanent identifiers (URIs) in
>>> 	addition to the temporary ones.
>>> 
>> 
>> As you seem to acknowledge, storage conservation doesn't seem like a particularly compelling reason here :-)
>> 
>>> 
>>> 	But perhaps an even more compelling reason to use blank nodes is
>>> 	that instead of introducing owl:sameAs arcs, one may just
>>> 	replace two (or more) distinct blank nodes, — found to be
>>> 	representing the same entity, — with a sole node possessing the
>>> 	union of the properties of such blank nodes.  (Provided we check
>>> 	for, and resolve, any semantic conflicts there are, that is.)
>>> 
>>> 
>> 
>> OK, great, now we're getting somewhere. So, if:
>> 	• Your system does not support owl:sameAs (or you prefer not to use it to avoid performance penalties or complications)
>> 	• URIs that you mint are (potentially) used outside of your system (so that you can't just smush away extra URIs the way you would with blank nodes)
>> 	• You don't support or are not worried about "told blank nodes" (that could be reused down the line)
>> then, by using blank nodes rather than URIs, you can freely smush together resources that represent the same entity.
>> In my particular experience, the cost of using blank nodes tends to be higher than the cost of maintaining (& relating) multiple URIs for the same entity. Perhaps my experience would be different if I were working more frequently with public data that is more likely to be reused and re-minted outside of my control. On the other hand, in cases in which data is created outside of your control, you also have no control over whether the people minting that data use blank nodes or URIs :)
>> Lee
>> 
> 
> 
> 
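
The "smushing" that Ivan and Lee discuss in the quoted exchange can be sketched roughly as follows, again assuming triples are plain tuples and blank nodes are strings beginning with "_:". The function name smush, the ex: terms and the sample data are hypothetical, and the sketch does not check for the semantic conflicts Ivan mentions:

```python
def smush(graph, duplicates, keep):
    """Replace each blank node in `duplicates` with the node `keep`.

    The surviving node ends up with the union of the properties of all the
    nodes it replaces, and no owl:sameAs triples are added.
    """
    def canon(term):
        return keep if term in duplicates else term

    return {(canon(s), p, canon(o)) for s, p, o in graph}


# Hypothetical data: two blank nodes later found to be the same person.
graph = {
    ("_:p1", "ex:satNextTo", "ex:Lee"),
    ("_:p1", "ex:name", '"Sam"'),
    ("_:p2", "ex:satNextTo", "ex:Lee"),
    ("_:p2", "ex:employer", "ex:Acme"),
}

smushed = smush(graph, duplicates={"_:p2"}, keep="_:p1")
```

After the call, smushed contains three triples, all with subject _:p1; the duplicated ex:satNextTo statement collapses into one. If the two nodes had been URIs already used outside the system, the usual alternative would be to keep both and relate them with an owl:sameAs triple, which is the case Lee's second bullet describes.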

Received on Wednesday, 19 December 2012 05:13:12 UTC