Re: Blank Nodes Re: Toward easier RDF: a proposal

On 11/27/2018 10:01 PM, David Booth wrote:
> On 11/27/18 2:04 PM, Nathan Rixham wrote:
> . . .
>> Here's an extract:
>> {
>>     ...
>>    "name": "County Assessor's Office",
>>    "address": {
>>      "@type": "PostalAddress",
>>      "streetAddress": "123 West Jefferson Street",
>>      "addressLocality": "Phoenix",
>>      "addressRegion": "AZ",
>>      "postalCode": "85003",
>>      "addressCountry": "US"
>>    },
>>    "geo": {
>>      "@type": "GeoCoordinates",
>>      "latitude": 33.4466,
>>      "longitude": -112.07837
>>    }
>> }
>> . . .
>> [To] have the same address or geo coordinates published on tens of 
>> thousands of different websites, all using a different ID (uri) would 
>> be a huge, horrible, mess.
> 
> Not so fast.  Two points:
> 
>   - Unless you make a unique name assumption with URIs, that huge, 
> horrible mess is pretty much the situation we already have using blank 
> nodes.  Except that in some ways the current situation is *worse*, 
> because the same data loaded twice causes duplicate triples (non-lean), 
> whereas that would be automatically avoided if URIs were used.
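
To make the duplicate-triples point concrete, here is a minimal Python 
sketch.  It is not a real RDF library: triples are plain tuples, the 
load() helper and the example.org URIs are made up for illustration, and 
the only assumption is the usual one that blank node labels are scoped to 
a single document, so every load mints fresh ones.

```python
import itertools

_ids = itertools.count()  # source of fresh blank node labels

def load(triples):
    """Simulate parsing one document: blank node labels ("_:x") are local
    to the document, so each load renames them to fresh labels."""
    fresh = {}

    def rename(term):
        if not term.startswith("_:"):
            return term  # URIs and literals pass through unchanged
        if term not in fresh:
            fresh[term] = f"_:b{next(_ids)}"
        return fresh[term]

    return {(rename(s), p, rename(o)) for s, p, o in triples}

# The same data, once with blank nodes and once with (hypothetical) URIs.
doc_bnode = [
    ("_:office", "name", "County Assessor's Office"),
    ("_:office", "address", "_:addr"),
    ("_:addr", "addressLocality", "Phoenix"),
]
doc_uri = [
    ("http://example.org/office", "name", "County Assessor's Office"),
    ("http://example.org/office", "address", "http://example.org/addr"),
    ("http://example.org/addr", "addressLocality", "Phoenix"),
]

store_b = load(doc_bnode) | load(doc_bnode)  # loaded twice
store_u = load(doc_uri) | load(doc_uri)      # loaded twice

print(len(store_b))  # 6: the second load adds a redundant copy (non-lean)
print(len(store_u))  # 3: identical triples collapse in the set
```

Note that the sketch only shows the mechanics.  It says nothing about 
whether two addresses published by different sites really are the same.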

But the key point here is that they might or might not be duplicates. 
And the types and predicates (the semantics of table and column names, 
if you get right down to it, since a lot of linked data comes from 
relational databases) might or might not be the same.  There has to be 
some way to get decent assurances that they *are* the same, before the 
graphs get merged.  Tinkering with the RDF specs, and having ways to 
canonically name blank nodes, won't handle this problem. It's a data and 
semantics problem instead.

Avoiding premature identifiers is actually helpful in this kind of 
situation.  I think that Nathan would agree with me here (right, Nathan?).

On top of that, many of the data graphs that one wants to merge will be 
neither isomorphic to each other nor subsets or supersets of one another.  
In that situation, I don't see how a blank node identifying algorithm that 
has to traverse and consider the whole graph can spit out identifiers that 
will make corresponding blank nodes in the various graphs reliably have 
the same identifier.  That's the kind of algorithm that Aidan Hogan's 
papers talk about, isn't it, one that considers the entire graph?  (Of 
course, maybe I misunderstand, because his work is pretty complex.  I 
understand it to be a way that different processors can consider the 
same graph - essentially the same graph except for some forms of 
canonicalization such as degree of leanness - and come up with the same 
identifiers for the blank nodes, and the same canonical form.)
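
A toy Python sketch of that concern follows.  To be clear, this is not 
Aidan Hogan's algorithm or any standard one.  The toy_labels() function 
is invented here, and it simply assumes that a blank node's label is 
derived from a digest of the whole graph plus the triples the node 
appears in.  It also makes no attempt to tell symmetric blank nodes apart.

```python
import hashlib

def is_bnode(term):
    return term.startswith("_:")

def mask(term, me=None):
    """Hide blank node labels so the result does not depend on them."""
    if not is_bnode(term):
        return term
    return "_:self" if term == me else "_:other"

def digest(obj):
    return hashlib.sha256(repr(obj).encode()).hexdigest()[:16]

def toy_labels(graph):
    """Label each blank node from (whole-graph digest, its own triples).
    Assumption for illustration: the whole graph feeds into every label."""
    whole = digest(sorted(tuple(mask(t) for t in triple) for triple in graph))
    bnodes = {t for triple in graph for t in triple if is_bnode(t)}
    labels = {}
    for b in bnodes:
        local = sorted(
            tuple(mask(t, b) for t in triple) for triple in graph if b in triple
        )
        labels[b] = digest((whole, local))
    return labels

g1 = {
    ("_:a", "streetAddress", "123 West Jefferson Street"),
    ("_:a", "postalCode", "85003"),
}
# Same graph, different blank node label.
g2 = {
    ("_:x", "streetAddress", "123 West Jefferson Street"),
    ("_:x", "postalCode", "85003"),
}
# Superset: one extra triple (hypothetical URI) that does not touch the address.
g3 = g1 | {("http://example.org/site", "name", "County Assessor's Office")}

print(toy_labels(g1)["_:a"] == toy_labels(g2)["_:x"])  # True
print(toy_labels(g1)["_:a"] == toy_labels(g3)["_:a"])  # False
```

Under that assumption the two isomorphic graphs agree on the label, but 
the same address inside a slightly larger graph gets a different one, 
which is the situation described above.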

TomP

Received on Wednesday, 28 November 2018 03:48:29 UTC