- From: Aidan Hogan <aidhog@gmail.com>
- Date: Thu, 2 Jul 2020 13:30:53 -0400
- To: semantic-web@w3.org
On 2020-07-01 9:09, Dieter Fensel wrote:
> Actually skolemization is quite an old concept of computational logic (I
> guess older than most of us) to deal with existential variables.
> Unfortunately it comes along with the typical closed world assumptions
> of logic assuming that you know all terms and can safely generate a
> unique new term. In the open and dynamic environment of the web this may
> cause problems. What happens if two people use the same skolemization
> generator and their stuff gets merged?

A few people (including myself) have tried to address that question from
different perspectives, and a promising proposal is to compute a
deterministic skolem (IRI) for each blank node based on the information
connected to it (possibly recursively through other blank nodes). The idea
is that the algorithm will assign the same skolem to two blank nodes if
and only if they have precisely the same surrounding information in the
local RDF graph (recalling that blank nodes are local to an RDF graph, we
know we have the full information for each particular blank node available
within a single graph). (A toy sketch of this hashing idea appears after
the links below.)

Formally, the skolemisation algorithm preserves RDF isomorphism: two RDF
graphs are isomorphic if and only if their skolemised versions are equal
(equality trivially implies isomorphism). This avoids clashes by creating
skolem IRIs deterministically. It also means that if two parsers parse the
same RDF document into two graphs, each assigning different labels to the
blank nodes, a canonical algorithm will subsequently assign the
corresponding blank nodes in both graphs the same skolem.

There are some caveats to the process (e.g., hash collisions are possible,
and the worst cases are exponential), but in practice it works well: the
probability of a collision for a 256- or 512-bit hash is astronomically
lower than the probability of a global catastrophe that would render the
collision moot (especially now that we are in 2020), and the worst cases
are exceedingly exotic.

Some further reading for RDF graphs (including algorithms that further
preserve RDF equivalence):

http://aidanhogan.com/docs/rdf-canonicalisation.pdf

An implementation of the algorithm in Java:

https://blabel.github.io/

Work towards a spec along similar lines (including RDF datasets):

https://json-ld.github.io/normalization/spec/
https://github.com/iherman/canonical_rdf
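To make the hashing idea concrete, here is a minimal Java sketch. To be
clear, this is not Blabel nor the algorithm from the paper: it performs a
single, non-recursive hashing pass, so it only distinguishes blank nodes
whose immediate surrounding triples differ, and it omits the tie-breaking
and the recursion through connected blank nodes that the full algorithm
needs. The triple representation and the example.org prefix are
hypothetical (the /.well-known/genid/ path is the one RDF 1.1 suggests
for skolem IRIs).

  import java.nio.charset.StandardCharsets;
  import java.security.MessageDigest;
  import java.util.*;

  public class SkolemSketch {

      // Compute a hash-based skolem IRI for each blank node in a graph.
      // A triple is a String[3]; blank node labels start with "_:".
      static Map<String, String> skolemise(List<String[]> graph) throws Exception {
          Map<String, String> skolems = new HashMap<>();
          for (String[] t : graph)
              for (String term : t)
                  if (isBlank(term) && !skolems.containsKey(term))
                      skolems.put(term, hashOf(term, graph));
          return skolems;
      }

      static boolean isBlank(String term) { return term.startsWith("_:"); }

      // One hashing pass: serialise every triple touching the blank node,
      // replacing the node itself with "@self" and any other blank node
      // with the opaque marker "@bnode"; sort the lines so the result does
      // not depend on triple order; hash with SHA-256. (The full algorithm
      // instead recurses through neighbouring blank nodes and breaks ties,
      // so that nodes are distinguished beyond their immediate triples.)
      static String hashOf(String bnode, List<String[]> graph) throws Exception {
          List<String> lines = new ArrayList<>();
          for (String[] t : graph) {
              if (!bnode.equals(t[0]) && !bnode.equals(t[1]) && !bnode.equals(t[2]))
                  continue;
              StringBuilder sb = new StringBuilder();
              for (String term : t) {
                  if (term.equals(bnode)) sb.append("@self ");
                  else if (isBlank(term)) sb.append("@bnode ");
                  else sb.append(term).append(' ');
              }
              lines.add(sb.toString());
          }
          Collections.sort(lines);
          MessageDigest md = MessageDigest.getInstance("SHA-256");
          for (String line : lines)
              md.update(line.getBytes(StandardCharsets.UTF_8));
          // Hypothetical skolem prefix; RDF 1.1 suggests /.well-known/genid/.
          StringBuilder iri = new StringBuilder("https://example.org/.well-known/genid/");
          for (byte b : md.digest())
              iri.append(String.format("%02x", b));
          return iri.toString();
      }

      public static void main(String[] args) throws Exception {
          List<String[]> g = Arrays.asList(
              new String[]{"_:a", "<http://ex.org/name>", "\"Alice\""},
              new String[]{"_:b", "<http://ex.org/name>", "\"Bob\""});
          System.out.println(skolemise(g));
      }
  }

Running main prints the same two skolems no matter which labels a parser
happened to pick for the blank nodes, since each hash depends only on the
surrounding data, not on the labels.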
Received on Thursday, 2 July 2020 17:31:08 UTC