Re: Well Behaved RDF - Taming Blank Nodes, etc. from Lee Feigenbaum on 2012-12-18 (semantic-web@w3.org from December 2012)

From: Lee Feigenbaum <lee@thefigtrees.net>
Date: Tue, 18 Dec 2012 11:34:55 -0500
To: Ivan Shmakov <oneingray@gmail.com>
CC: Semantic Web <semantic-web@w3.org>
Message-ID: <50D09B2F.4040009@thefigtrees.net>

On 12/18/2012 11:06 AM, Ivan Shmakov wrote:
>>>>>> Lee Feigenbaum <lee@thefigtrees.net> writes:
>>>>>> On 12/18/2012 10:23 AM, Ivan Shmakov wrote:
> […]
>
>   >> This way, one may easily end up with hundreds of URI's, each naming
>   >> one and the only person which was unfortunate enough to sit next to
>   >> our Lee.
>
>   >> … And don't forget about all the owl:sameAs arcs necessary to manage
>   >> this crowd!
>
>   > OK, sure.  Why is having hundreds of URIs for this person any worse
>   > than having hundreds of distinct blank nodes?
>
>  First of all, I'd assume that a typical RDF store implementation
>  will assign temporary identifiers (most likely integers) to
>  /all/ the nodes — both blank and named.  This way, one could
>  conserve space by /not/ storing permanent identifiers (URI's) in
>  addition to the temporary ones.

As you seem to acknowledge, conserving storage isn't a particularly
compelling reason here :-)

>
>  But perhaps even more compelling reason to use blank nodes is
>  that instead of introducing owl:sameAs arcs, one may just
>  replace two (or more) distinct blank nodes, — found to be
>  representing the same entity, — with a sole node possessing the
>  union of the properties of such blank nodes.  (Provided we check
>  for, and resolve, any semantic conflicts there are, that is.)
>

OK, great, now we're getting somewhere. So, /if/:

 1. Your system does not support owl:sameAs (or you prefer not to use it
    to avoid performance penalties or complications)
 2. URIs that you mint are (potentially) used outside of your system (so
    that you can't just smush away extra URIs the way you would with
    blank nodes)
 3. You don't support or are not worried about "told blank nodes" (that
    could be reused down the line)

/then/, by using blank nodes rather than URIs, you can freely smush 
together resources that represent the same entity.
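
As a rough illustration of the two options, here is a minimal sketch. It
assumes triples are plain Python tuples and that blank nodes are strings
starting with "_:" (both are assumptions made only for this example).
The helpers smush and link_same_as are hypothetical names, not part of
any RDF library, and the sketch skips the conflict checking mentioned
above.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
FOAF = "http://xmlns.com/foaf/0.1/"


def smush(triples, duplicates, keep):
    """Rewrite every node in `duplicates` to `keep` (hypothetical helper).

    The surviving node ends up with the union of the properties of all
    the nodes it replaces. No semantic conflict checking is done here.
    """
    def rename(node):
        return keep if node in duplicates else node

    # Only subjects and objects are renamed; blank nodes cannot be predicates.
    # Building a set drops triples that become identical after renaming.
    return {(rename(s), p, rename(o)) for s, p, o in triples}


def link_same_as(triples, duplicates, keep):
    """Keep every identifier and relate them with owl:sameAs arcs instead."""
    links = {(node, OWL_SAME_AS, keep) for node in duplicates if node != keep}
    return set(triples) | links


# Illustrative data: two blank nodes that turn out to be the same person.
graph = {
    ("_:b1", FOAF + "name", '"Alice"'),
    ("_:b2", FOAF + "name", '"Alice"'),
    ("_:b2", FOAF + "mbox", "mailto:alice@example.org"),
    ("http://example.org/lee", FOAF + "knows", "_:b1"),
    ("http://example.org/lee", FOAF + "knows", "_:b2"),
}

same_person = {"_:b1", "_:b2"}
smushed = smush(graph, same_person, keep="_:b1")
linked = link_same_as(graph, same_person, keep="_:b1")
```

With this data, smushing leaves a smaller graph with a single node for
the person, while the owl:sameAs route keeps every original triple and
adds one extra arc per redundant identifier. If "_:b2" had been a URI
already in use outside your system (condition 2 above), renaming it away
would break those outside references, which is where the owl:sameAs
route becomes the safer choice.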

In my experience, the cost of using blank nodes tends to be higher than
the cost of maintaining (and relating) multiple URIs for the same
entity. Perhaps my experience would be different if I worked more often
with public data, which is more likely to be reused and re-minted
outside of my control. On the other hand, when data is created outside
of your control, you also have no control over whether the people
minting that data use blank nodes or URIs :)

Lee

Received on Tuesday, 18 December 2012 16:35:21 UTC