Re: Make processing of RDF deterministic

On 07/11/2018 12:57 PM, Victor Porton wrote:
> I am writing a program which takes decisions based on several RDF files 
> which it may download.
> How to make my program deterministic? (no change in the RDF files => no 
> change in program decisions)
> So I want to retrieve triples in a fixed ("deterministic") order, if 
> this is possible.
> I use Python with rdflib.

Short answer: canonicalize your RDF files when you receive them, by 
parsing and re-serializing using a suitable tool.  Then compare the 
newly received canonical file with the previous canonical file, using
standard text-based diff comparison, to find out if anything changed.

Longer explanation: This is a weakness in standard RDF, and it stems 
from the semantics of blank nodes.  Instead of being
able to easily compare two RDF graphs for equality, as you can do in 
most data representations, in RDF you have to check for graph 
isomorphism, which according to Wikipedia "is not known to be solvable 
in polynomial time nor to be NP-complete".  (I don't know if rdflib
offers a graph isomorphism function, but if so then you could use that.)

This graph isomorphism problem is why no RDF canonicalization algorithm 
has been adopted as a W3C standard to date.  However, most RDF graphs in 
practice do not cause the canonicalization algorithms to blow up.  And 
if blank node usage is modestly restricted to avoid blank node cycles, 
then the canonicalization algorithms are guaranteed to be easy and fast.  
This is a direction that I advocate and have described in "Well Behaved 
RDF: A Straw-Man Proposal for Taming Blank Nodes":
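The "no blank node cycles" condition can be made concrete with a small stdlib-only check. It makes two assumptions. First, a blank node cycle is taken to mean a directed cycle of triples whose subject and object are both blank nodes. Second, triples are given as N-Triples-style strings, with blank nodes written `_:label`. The proposal's own definition may be broader, for example also counting undirected cycles. Treat this as an illustration, not its actual test. The function names are hypothetical.

```python
from __future__ import annotations

from typing import Iterable, Tuple

Triple = Tuple[str, str, str]


def is_bnode(term: str) -> bool:
    # Assumption: blank nodes are written N-Triples style, e.g. "_:b0".
    return term.startswith("_:")


def has_blank_node_cycle(triples: Iterable[Triple]) -> bool:
    """Return True if a directed cycle runs through blank nodes only.

    Assumption: only triples whose subject and object are both blank
    nodes contribute edges. Cycles are found with an iterative
    three-colour depth-first search.
    """
    edges: dict[str, set[str]] = {}
    for s, _p, o in triples:
        if is_bnode(s) and is_bnode(o):
            edges.setdefault(s, set()).add(o)
            edges.setdefault(o, set())

    WHITE, GREY, BLACK = 0, 1, 2
    colour = {node: WHITE for node in edges}
    for start in edges:
        if colour[start] != WHITE:
            continue
        colour[start] = GREY
        stack = [(start, iter(edges[start]))]
        while stack:
            node, children = stack[-1]
            for child in children:
                if colour[child] == GREY:
                    return True  # back edge to a node on the current path
                if colour[child] == WHITE:
                    colour[child] = GREY
                    stack.append((child, iter(edges[child])))
                    break
            else:
                # All children explored: this node is finished.
                colour[node] = BLACK
                stack.pop()
    return False


if __name__ == "__main__":
    P = "<http://example.org/p>"
    cyclic = [("_:a", P, "_:b"), ("_:b", P, "_:a")]
    chain = [("<http://example.org/s>", P, "_:a"),
             ("_:a", P, "_:b"),
             ("_:b", P, '"x"')]
    print(has_blank_node_cycle(cyclic))  # True
    print(has_blank_node_cycle(chain))   # False
```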

One bit of good news is that there has been significant progress in 
JSON-LD toward adopting a canonicalization standard, in part because it 
is also needed for digital signatures.  A draft spec is here (though at 
the moment it is called "normalization" instead of "canonicalization"):
Unfortunately that document is out of scope for the current JSON-LD 
working group, so there is still no clear timeline for it to become a 
W3C standard:

I hope that helps.

David Booth

Received on Wednesday, 11 July 2018 19:09:13 UTC