- From: Thomas Passin <tpassin@tompassin.net>
- Date: Tue, 27 Nov 2018 22:47:53 -0500
- To: semantic-web@w3.org
On 11/27/2018 10:01 PM, David Booth wrote:
> On 11/27/18 2:04 PM, Nathan Rixham wrote:
> . . .
>> Here's an extract:
>> {
>>   ...
>>   "name": "County Assessor's Office",
>>   "address": {
>>     "@type": "PostalAddress",
>>     "streetAddress": "123 West Jefferson Street",
>>     "addressLocality": "Phoenix",
>>     "addressRegion": "AZ",
>>     "postalCode": "85003",
>>     "addressCountry": "US"
>>   },
>>   "geo": {
>>     "@type": "GeoCoordinates",
>>     "latitude": 33.4466,
>>     "longitude": -112.07837
>>   }
>> }
>> . . .
>> [To] have the same address or geo coordinates published on tens of
>> thousands of different websites, all using a different ID (uri), would
>> be a huge, horrible mess.
>
> Not so fast. Two points:
>
> - Unless you make a unique name assumption with URIs, that huge,
> horrible mess is pretty much the situation we already have using blank
> nodes. Except that in some ways the current situation is *worse*,
> because the same data loaded twice causes duplicate triples (non-lean),
> whereas that would be automatically avoided if URIs were used.

But the key point here is that they might or might not be duplicates. And the types and predicates (the semantics of table and column names, if you get right down to it, since a lot of linked data comes from relational databases) might or might not be the same. There has to be some way to get decent assurance that they *are* the same before the graphs get merged. Tinkering with the RDF specs, and having ways to canonically name blank nodes, won't handle this problem; it's a data and semantics problem instead. Avoiding premature identifiers is actually helpful in this kind of situation. I think that Nathan would agree with me here (right, Nathan?).

On top of that, many of the data graphs that one wants to merge won't be isomorphic to each other, nor will they be subsets or supersets of each other.
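David's "non-lean" point can be sketched without any RDF library: a minimal plain-Python model of a triple store, where loading the same document twice mints a fresh blank node label each time, while a stable URI deduplicates under set semantics. (The URI http://example.org/assessor and the two sample triples are illustrative assumptions, not from the thread.)

```python
# Toy illustration of why re-loading blank-node data duplicates triples
# (non-lean graphs) while re-loading URI-identified data does not.
# Plain Python sets stand in for an RDF store; no RDF library needed.
import itertools

_counter = itertools.count()

def load(triples, store, use_uris):
    """Add triples to `store`. A subject of None marks the document's
    blank node; it gets a fresh label on every load unless use_uris."""
    bnode = f"_:b{next(_counter)}"          # fresh label per load
    for subj, pred, obj in triples:
        if subj is None:
            # hypothetical stable URI, for illustration only
            subj = "http://example.org/assessor" if use_uris else bnode
        store.add((subj, pred, obj))

data = [(None, "schema:name", "County Assessor's Office"),
        (None, "schema:addressLocality", "Phoenix")]

blank_store, uri_store = set(), set()
for _ in range(2):                          # load the same data twice
    load(data, blank_store, use_uris=False)
    load(data, uri_store, use_uris=True)

print(len(blank_store))  # 4: fresh blank nodes duplicate the triples
print(len(uri_store))    # 2: URI identity deduplicates automatically
```

Of course, as argued above, automatic deduplication is only safe when the identifiers really do denote the same thing, which is the data-and-semantics problem no labelling scheme settles by itself.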
In that situation, I don't see how a blank node identification algorithm that has to traverse and consider the whole graph can produce identifiers that reliably give corresponding blank nodes in the various graphs the same identifier. That's the kind of algorithm that Aidan Hogan's papers talk about, isn't it, ones that consider the entire graph? (Of course, maybe I misunderstand, because his work is pretty complex. I understand it to be a way that different processors can take the same graph - essentially the same graph, up to some forms of canonicalization such as the degree of leaning - and come up with the same identifiers for the blank nodes, and the same canonical form.)

TomP
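For readers unfamiliar with the whole-graph approach being described: the core idea can be sketched as iterative hash-based colour refinement, where each blank node's label is repeatedly rehashed from its neighbourhood until the partition settles, so that isomorphic graphs yield the same labels regardless of the input labels. This is a toy sketch in the spirit of, but far simpler than, Hogan's actual canonicalisation algorithm (it does no leaning and cannot break ties between automorphic blank nodes); the sample triples are invented for illustration.

```python
# Toy hash-based blank node labelling: a colour-refinement loop that
# assigns each blank node a label derived only from graph structure,
# so isomorphic graphs get identical label sets.
import hashlib

def h(*parts):
    """Stable short hash of an ordered tuple of strings."""
    return hashlib.sha256("|".join(parts).encode()).hexdigest()[:8]

def canonical_labels(triples):
    """Map each blank node (terms starting with '_:') to a label
    computed by iteratively hashing its neighbourhood."""
    bnodes = {x for t in triples for x in (t[0], t[2])
              if x.startswith("_:")}
    color = {b: h("init") for b in bnodes}
    for _ in range(len(bnodes)):   # enough rounds for partition to settle
        color = {b: h(color[b], *sorted(
                     h(p, color.get(o, o), "out") if s == b
                     else h(p, color.get(s, s), "in")
                     for s, p, o in triples if b in (s, o)))
                 for b in bnodes}
    return color

g1 = [("_:x", "schema:name", "Assessor"),
      ("_:x", "schema:geo", "_:y"),
      ("_:y", "schema:latitude", "33.4466")]
g2 = [("_:a", "schema:geo", "_:b"),       # same graph, relabelled
      ("_:b", "schema:latitude", "33.4466"),
      ("_:a", "schema:name", "Assessor")]

l1, l2 = canonical_labels(g1), canonical_labels(g2)
print(sorted(l1.values()) == sorted(l2.values()))  # True
```

The catch Tom raises survives even in this sketch: the labels depend on the *whole* surrounding graph, so the same real-world entity embedded in two non-isomorphic graphs will generally come out with different labels.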
Received on Wednesday, 28 November 2018 03:48:29 UTC