Re: Blank Nodes Re: Toward easier RDF: a proposal from Hugh Glaser on 2018-11-29 (semantic-web@w3.org from November 2018)

From: Hugh Glaser <hugh@glasers.org>
Date: Thu, 29 Nov 2018 23:12:14 +0000
To: David Booth <david@dbooth.org>
Cc: semantic-web@w3.org, Henry Story <henry.story@bblfish.net>
Message-Id: <3C5B70E5-9F0A-4114-B439-0CBD59B462C2@glasers.org>
> In my own experience, objects composed of literal attributes like this generally *do* form a composite key, though perhaps other RDF developers have had different experience. 

Since you ask :-)

I’m sorry to report that my experience is that they often don’t.
And even where they appear to, it is often the case that once you understand the domain properly, they turn out not to.
(As per my observations about bibliographic citations and about JS Bach’s works, in an earlier post.)

So it would be folly to try to automagically generate URIs for such bNodes in general; generated unique URIs, or sufficiently large random ones, are the best that you can do.
I think that if you consider *all* the properties, essentially the SCBD (Symmetric Concise Bounded Description), you might get away with it, almost always.
(As someone else pointed out.)
But if you do that, you will not achieve the other objective people may have: actually getting the URIs for two addresses to collide.
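
The trade-off can be sketched in Python. This is a minimal illustration under stated assumptions, not anyone's actual implementation: the base IRI is hypothetical (modelled on the RDF 1.1 `.well-known/genid/` convention), each node is a flat set of `(predicate, literal)` pairs, and hashing that flat set is a simplification of hashing a full SCBD, which would also follow nested bNodes. Random IRIs never collide; content-hashed IRIs collide only when every single property matches.

```python
import hashlib
import json
import uuid

# Hypothetical base IRI, modelled on the RDF 1.1 ".well-known/genid/" convention.
BASE = "http://example.org/.well-known/genid/"

def random_iri() -> str:
    """Mint a fresh IRI: unique per call, so two addresses never collide."""
    return BASE + uuid.uuid4().hex

def content_iri(desc: frozenset) -> str:
    """Mint an IRI from *all* (predicate, object) pairs of a node.

    Sorting makes the result independent of triple order; JSON encoding keeps
    the serialisation unambiguous. Nested bNodes are not followed here, which
    a full SCBD would require.
    """
    canonical = json.dumps(sorted(desc))
    return BASE + hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# The two addresses from the thread, as flat sets of (predicate, literal) pairs.
common = {
    (":streetAddress", "123 West Jefferson Street"),
    (":addressLocality", "Phoenix"),
    (":addressRegion", "AZ"),
    (":postalCode", "85003"),
    (":addressCountry", "US"),
}
b1 = frozenset(common | {(":addressPlanet", "Earth")})
b2 = frozenset(common | {(":addressPlanet", "Alpha7")})

print(content_iri(frozenset(common)) == content_iri(frozenset(common)))  # True
print(content_iri(b1) == content_iri(b2))  # False: one property differs
print(random_iri() == random_iri())        # False (a collision is astronomically unlikely)
```

So the content hash is safe, but the useful collision only happens for descriptions that are identical down to the last property.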

If you do know the context, you may be able to make sensible decisions.
But consider, for example, the addresses below.
If they are locations, that is easier, but a postal address is more difficult.
For a start, it is very unusual to get such well and uniformly formatted addresses from disparate datasets.
They may include the County or Parish as well, or a comma after the house number, or Street, St, Str, Str., St., or be missing the Country.
In Germany (I think) and much of mainland Europe, you would need Joe’s or Monica’s family name in order to have a proper postal address (they won’t deliver to a property unless the letter is addressed with the right name).
And on and on, as you move around the world.
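
The variant problem above can be made concrete with a small normaliser. This is a hypothetical sketch: the variant table is an illustrative assumption, far smaller than real data needs, and the last example shows how a naive rule silently breaks "St." meaning "Saint", which is exactly why domain knowledge matters.

```python
import re

# Hypothetical variant table for illustration only; real data needs far more,
# and it is country-specific.
STREET_VARIANTS = {"street", "st", "str"}

def normalise_street(value: str) -> str:
    """Lower-case, turn commas and full stops into spaces, collapse runs of
    whitespace, and map each 'Street' variant to 'street'."""
    words = re.sub(r"[.,]", " ", value).lower().split()
    return " ".join("street" if w in STREET_VARIANTS else w for w in words)

print(normalise_street("123, West Jefferson St."))  # 123 west jefferson street
print(normalise_street("123 West Jefferson Str."))  # 123 west jefferson street
# The same rule silently breaks "St." meaning "Saint":
print(normalise_street("1 St. Mary's Road"))         # 1 street mary's road
```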

And you need to decide what to do about missing fields.
Oh, and by the way, you need to filter out all the variants of “Unknown” (and “Test”, too) that will be in the source, though that should already have been done.
I guess “Unknown”s really are “proper” bNodes.
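
One possible policy for placeholders and missing fields can be sketched as follows. The placeholder spellings and the choice of key properties are assumptions for illustration, not a standard; the sketch only mints a key-based IRI when no key field is missing after cleaning, and otherwise leaves the node as a genuine bNode.

```python
# Hypothetical placeholder spellings; every source has its own.
PLACEHOLDERS = {"", "unknown", "n/a", "test"}

# Assumed key properties for a postal address (an illustration, not a standard).
KEY_PROPS = (":streetAddress", ":addressLocality", ":postalCode", ":addressCountry")

def clean(desc: dict) -> dict:
    """Drop properties whose value is a placeholder rather than real data."""
    return {p: v for p, v in desc.items() if v.strip().lower() not in PLACEHOLDERS}

def usable_as_key(desc: dict) -> bool:
    """Only treat the address as keyed if no key field is missing after
    cleaning; otherwise leave the node as a genuine bNode."""
    cleaned = clean(desc)
    return all(p in cleaned for p in KEY_PROPS)

addr = {
    ":streetAddress": "123 West Jefferson Street",
    ":addressLocality": "Phoenix",
    ":postalCode": "85003",
    ":addressCountry": "US",
}
print(usable_as_key(addr))                                   # True
print(usable_as_key({**addr, ":streetAddress": "UNKNOWN"}))  # False
```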

But that doesn’t mean some useful stuff can’t be done.
The main address may need to have a unique ID, but there should have been plenty of other bNodes in that RDF.
I come back again (sorry!) to the point that 
>     :addressLocality "Phoenix" ;
>     :addressRegion "AZ" ;
>     :addressCountry "US" .

at least should be bNodes in this.
So it would have something more like:
_:b3 a :AddressRegion ;
 rdfs:label "AZ" .
So now, if you want to compose literal attributes to generate a URI for _:b3, you can confidently do that with the other literals from the SCBD.
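
For a node like _:b3, whose whole description is a type plus one label, a readable IRI can be composed directly. This is a sketch under assumptions: the namespace is hypothetical, and it presumes the label alone identifies the region within the data at hand (if "AZ" could name a region in another country, the country would need to be part of the region's description). Two datasets that both say "AZ" then get the same IRI, which is the collision people actually want.

```python
from urllib.parse import quote

# Hypothetical namespace for minted IRIs.
BASE = "http://example.org/id/"

def label_iri(rdf_type: str, label: str) -> str:
    """Build a readable IRI from a node's type and its rdfs:label.

    Assumes, as for _:b3, that type plus label is the node's whole SCBD,
    so the same label under the same type always yields the same IRI.
    """
    local_type = rdf_type.split(":")[-1]  # ":AddressRegion" -> "AddressRegion"
    return f"{BASE}{quote(local_type)}/{quote(label.strip())}"

print(label_iri(":AddressRegion", "AZ"))  # http://example.org/id/AddressRegion/AZ
```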

<moan>
I am disappointed that people seem to use the sort of RDF with an xsd:string “AZ” as if it could possibly be an AddressRegion.
This sort of RDF makes the discussion both simpler and more complex than it should be, and encourages misunderstandings.
Apart from that, it makes me feel sick when I look at it as an example of how RDF might be used - I can’t imagine anyone using that for anything serious.
</moan>

> On 29 Nov 2018, at 05:02, David Booth <david@dbooth.org> wrote:
> 
> On 11/28/18 10:41 AM, Henry Story wrote:
>> A person looking at the JSON [of a postal address] sees the same address because they think of a number of things:
>>   1. . . .
>>   2.  they could decide that two addresses are the same if they have exactly the same attributes and values.
> 
> You're right, I forgot to mention one very important assumption: that the attributes and values of those addresses constitute a composite *key*.  That is what allows us to logically conclude that they are the *same* address.  For something like a postal address the attributes and values naturally do constitute a composite key, and we humans are so accustomed to knowing this that it is easy to forget that the computer will *not* necessarily know this . . . unless we tell it to do so.  Let me explain further.
> 
> Suppose we have these address entries for joe and monica:
> 
> :joe :address _:b1 .
> _:b1 a "PostalAddress" ;
>     :streetAddress "123 West Jefferson Street" ;
>     :addressLocality "Phoenix" ;
>     :addressRegion "AZ" ;
>     :postalCode "85003" ;
>     :addressCountry "US" .
> 
> :monica :address _:b2 .
> _:b2 a "PostalAddress" ;
>     :streetAddress "123 West Jefferson Street" ;
>     :addressLocality "Phoenix" ;
>     :addressRegion "AZ" ;
>     :postalCode "85003" ;
>     :addressCountry "US" .
> 
> Do joe and monica have the *same* address?  In other words, is the above logically equivalent to writing the following?
> 
> :joe :address _:b1 .
> _:b1 a "PostalAddress" ;
>     :streetAddress "123 West Jefferson Street" ;
>     :addressLocality "Phoenix" ;
>     :addressRegion "AZ" ;
>     :postalCode "85003" ;
>     :addressCountry "US" .
> 
> :monica :address _:b1 .
> 
> If we know that those attributes form a composite key, then the answer is yes.  Otherwise, the answer is no, because for example the following two statements may show up elsewhere in the graph:
> 
> _:b1 :addressPlanet "Earth" .
> _:b2 :addressPlanet "Alpha7" .
> 
> In my own experience, objects composed of literal attributes like this generally *do* form a composite key, though perhaps other RDF developers have had different experience.  But even if the attributes do not form a composite key, I am convinced that in general objects like this *should* have some kind of key -- i.e., it is beneficial to give them a key if they don't naturally have one.  This corresponds directly to standard good practice for tables in relational databases: that every table should have a primary key (which could be composite).
> 
> David Booth
>
Received on Thursday, 29 November 2018 23:22:36 UTC