Re: Blank nodes must DIE! [ was Re: Blank nodes semantics - existential variables?]

On 2020-07-01 9:09, Dieter Fensel wrote:
> Actually skolemization is quite an old concept of computational logic (I 
> guess older than most of us) to deal with existential variables. 
> Unfortunately it comes along with the typical closed world assumptions 
> of logic assuming that you know all terms and can safely generate a 
> unique new term. In the open and dynamic environment of the web this may 
> cause problems. What happens if two people use the same skolemization 
> generator and their stuff gets merged?

A few people (including myself) have tried to address that question from 
different perspectives, and a promising proposal is to compute a 
deterministic skolem (IRI) for each blank node based on the information 
connected to it (possibly recursively through other blank nodes).

The idea is that the algorithm will assign the same skolem to two blank 
nodes if and only if they have precisely the same surrounding 
information in the local RDF graph (since blank nodes are local to an 
RDF graph, all of the information about a particular blank node is 
available within that single graph).

Formally, the skolemisation algorithm preserves RDF isomorphism: two 
RDF graphs are isomorphic if and only if their skolemised versions are 
exactly equal.

So this gets rid of the problems with clashes, creating skolem IDs 
deterministically. It also means that if two parsers parse the same RDF 
document into two graphs, but each parser assigns different labels to 
the blank nodes, a canonical algorithm will subsequently assign the 
corresponding blank nodes in both graphs the same skolem.
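To make the idea concrete, here is a minimal sketch in Python of this 
style of deterministic skolemisation (not the exact algorithm from the 
paper below; it omits the handling of the exotic symmetric cases the 
paper deals with). Blank nodes are iteratively re-hashed from the 
triples that surround them, so the resulting skolem IRIs depend only on 
the structure of the graph, never on the parser-assigned labels. The 
`/.well-known/genid/` prefix follows the skolem-IRI convention of RDF 
1.1 Concepts; the triple encoding as string tuples is an assumption for 
illustration.

```python
# Simplified sketch of hash-based deterministic skolemisation.
# Triples are (subject, predicate, object) tuples of strings; blank
# nodes are written "_:label" as in Turtle/N-Triples.
import hashlib

def is_bnode(term):
    return term.startswith("_:")

def skolemise(triples):
    """Return a copy of `triples` with every blank node replaced by a
    deterministic skolem IRI derived from its surrounding information
    (recursively, via iterated hashing)."""
    triples = set(triples)
    bnodes = {t for (s, p, o) in triples for t in (s, o) if is_bnode(t)}
    # every blank node starts with the same initial "colour"
    colour = {b: "init" for b in bnodes}
    for _ in range(max(len(bnodes), 1)):   # enough rounds to propagate
        nxt = {}
        for b in bnodes:
            # encode the triples touching b, replacing blank nodes by
            # their current colour so local labels never enter the hash
            enc = lambda t: "self" if t == b else (
                colour[t] if is_bnode(t) else t)
            parts = sorted((enc(s), p, enc(o))
                           for (s, p, o) in triples if b in (s, o))
            nxt[b] = hashlib.sha256(repr(parts).encode()).hexdigest()
        colour = nxt
    skolem = {b: "https://example.org/.well-known/genid/" + colour[b]
              for b in bnodes}
    return {(skolem.get(s, s), p, skolem.get(o, o))
            for (s, p, o) in triples}
```

For instance, two parses of the same document that happen to pick 
different blank-node labels skolemise to exactly equal graphs:

```python
g1 = {("_:a", ":knows", "_:b"), ("_:b", ":name", '"Bob"')}
g2 = {("_:x", ":knows", "_:y"), ("_:y", ":name", '"Bob"')}
assert skolemise(g1) == skolemise(g2)
```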

There are some minor caveats to the process (e.g., there might be a hash 
collision, and the worst cases are exponential), but in practice it 
works well: the probability of a hash collision for 256- or 512-bit 
hashes is astronomically lower than the probability of a global 
catastrophe that would render the collision moot (especially now that 
we are in 2020), and the worst cases are exceedingly exotic.
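As a back-of-envelope check on the collision claim, the standard 
birthday bound puts the probability of any collision among n hashed 
values with a b-bit hash at roughly n(n-1)/2^(b+1). Even a trillion-ish 
(~2^40) distinct blank nodes against SHA-256 gives odds on the order of 
one in 2^177 (the specific n here is just an illustrative assumption):

```python
# Birthday bound: log2 of the approximate probability that any two of
# n values collide under a uniform b-bit hash, i.e. n*(n-1)/2^(b+1).
from math import log2

def collision_log2_prob(n, bits):
    return log2(n) + log2(n - 1) - (bits + 1)

print(collision_log2_prob(2**40, 256))   # ~ -177, i.e. one in ~2^177
```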

Some further reading for RDF graphs (including algorithms to further 
preserve RDF equivalence):

 http://aidanhogan.com/docs/rdf-canonicalisation.pdf

An implementation of the algorithm in Java:

 https://blabel.github.io/

Work towards a spec along similar lines (including RDF datasets):

 https://json-ld.github.io/normalization/spec/
 https://github.com/iherman/canonical_rdf

Received on Thursday, 2 July 2020 17:31:08 UTC