Re: Blank nodes must DIE! [ was Re: Blank nodes semantics - existential variables?]

Thanks for the pointer. We may use this work in a project where we try 
to normalise URIs for a German Touristic Knowledge Graph. There may be 
different URIs from different sources, and we normalise these and give 
them a normative URI from established data sources on the web if 
possible (sorry, in German [1]). Otherwise we use randomly generated 
IDs. Your work may also relate to ID generation, as studied in the 
OCCAM project or more generally in the IoT field (as Dan indirectly 
points out).

[1]
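To make the normalisation step concrete, here is a minimal Python sketch of the idea described above: prefer a normative URI from an established dataset when one is known, otherwise mint (and remember) a random ID. All names, prefixes, and the lookup-table shape are hypothetical, not taken from the actual project.

```python
import uuid

def normalise_uri(uri, normative_map, minted,
                  prefix="https://example.org/id/"):
    """Sketch of URI normalisation (hypothetical names and prefix).

    normative_map: known mappings to normative URIs from established
                   data sources (e.g., a curated lookup table).
    minted:        cache of randomly generated IDs already assigned,
                   so the same unknown URI always gets the same ID.
    """
    # Prefer an established normative URI if one is known.
    if uri in normative_map:
        return normative_map[uri]
    # Otherwise mint a random ID once and reuse it thereafter.
    if uri not in minted:
        minted[uri] = prefix + str(uuid.uuid4())
    return minted[uri]
```

Note that the `minted` cache matters: without it, the same unknown source URI would receive a fresh random ID on every lookup.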

On 02.07.2020 19:30, Aidan Hogan wrote:
> On 2020-07-01 9:09, Dieter Fensel wrote:
>> Actually skolemization is quite an old concept of computational logic 
>> (I guess older than most of us) to deal with existential variables. 
>> Unfortunately it comes along with the typical closed world 
>> assumptions of logic assuming that you know all terms and can safely 
>> generate a unique new term. In the open and dynamic environment of 
>> the web this may cause problems. What happens if two people use the 
>> same skolemization generator and their stuff gets merged?
>
> A few people (including myself) have tried to address that question 
> from different perspectives, and a promising proposal is to compute a 
> deterministic skolem (IRI) for each blank node based on the 
> information connected to it (possibly recursively through other blank 
> nodes).
>
> The idea is that the algorithm will assign the same skolem to two 
> blank nodes if and only if they have precisely the same surrounding 
> information in the local RDF graph (recalling that blank nodes are 
> local to an RDF graph, we can know that we have the full information 
> available within a single graph for each particular blank node).
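As a rough illustration of the hash-based idea, here is a minimal Python sketch (my own, not the blabel algorithm itself). It handles only the simple case of a blank node with no edges to other blank nodes, so no recursion is needed; triples are modelled as tuples of strings, with blank nodes written as "_:x" labels.

```python
import hashlib

def skolemise(graph, bnode,
              prefix="https://example.org/.well-known/genid/"):
    """Assign a deterministic skolem IRI to a blank node by hashing
    the triples that mention it.  Simplified sketch: assumes the
    blank node has no edges to other blank nodes, so no recursion
    or colour refinement is needed.  Prefix is hypothetical."""
    # Collect the surrounding information, replacing the blank node
    # itself with a fixed marker so its arbitrary label cannot
    # influence the hash.
    surrounding = sorted(
        tuple("@" if term == bnode else term for term in triple)
        for triple in graph
        if bnode in triple
    )
    digest = hashlib.sha256(repr(surrounding).encode("utf-8")).hexdigest()
    return prefix + digest
```

Two blank nodes with the same surrounding triples hash to the same skolem IRI regardless of their labels, while any difference in the surrounding information yields a different one.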
>
> Formally, the skolemisation algorithm preserves RDF isomorphism: two 
> RDF graphs are isomorphic if and only if their skolemised versions 
> are precisely equal.
>
> So this gets rid of the problems with clashes, creating skolem IDs 
> deterministically. It also means that if two parsers parse the same 
> RDF document into two graphs, but each parser assigns different labels 
> to the blank nodes, a canonical algorithm will subsequently assign the 
> corresponding blank nodes in both graphs the same skolem.
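The label-independence described above, including the recursive case where blank nodes connect to other blank nodes, can be sketched as a hash-based colour refinement (my own simplified sketch, not the published algorithm; in particular it omits the "distinguish" step the full algorithm needs when distinct blank nodes end up with equal hashes):

```python
import hashlib

def _sha(s):
    return hashlib.sha256(s.encode("utf-8")).hexdigest()

def colour_refine(graph):
    """Iteratively refine a hash ("colour") per blank node.  Each
    round mixes a blank node's context with the previous-round
    colours of its blank-node neighbours, so information propagates
    recursively through blank-blank edges, independently of labels.
    Simplified sketch: no "distinguish" step for hard cases."""
    def is_bnode(term):
        return term.startswith("_:")
    bnodes = {t for triple in graph for t in triple if is_bnode(t)}
    colour = {b: _sha("init") for b in bnodes}
    for _ in range(len(bnodes) + 1):
        nxt = {}
        for b in bnodes:
            parts = []
            for s, p, o in graph:
                # Record each edge touching b, substituting the
                # neighbour's current colour for its arbitrary label.
                if s == b:
                    parts.append(("out", p, colour[o] if is_bnode(o) else o))
                if o == b:
                    parts.append(("in", p, colour[s] if is_bnode(s) else s))
            nxt[b] = _sha(repr(sorted(parts)))
        colour = nxt
    return colour
```

Running this on two parses of the same document that chose different blank node labels yields matching colours for corresponding blank nodes, which can then serve as deterministic skolem identifiers.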
>
> There are some minor caveats to the process (e.g., there might be a 
> hash collision, and the worst cases are exponential), but in practice 
> it works well: the probability of there being a hash collision for 
> 256- or 512-bit hashes is astronomically lower than the probability 
> of a global catastrophe that would render the hash collision moot 
> (especially now that we are in 2020), and the worst cases are 
> exceedingly exotic.
>
> Some further reading for RDF graphs (including algorithms to further 
> preserve RDF equivalence):
>
>     http://aidanhogan.com/docs/rdf-canonicalisation.pdf
>
> An implementation of the algorithm in Java:
>
>     https://blabel.github.io/
>
> Work towards a spec along similar lines (including RDF datasets):
>
>     https://json-ld.github.io/normalization/spec/
>     https://github.com/iherman/canonical_rdf
>
>
>

-- 
Dieter Fensel
Chair STI Innsbruck
University of Innsbruck, Austria
www.sti-innsbruck.at/
tel +43-664 3964684

Received on Sunday, 5 July 2020 14:38:22 UTC