Re: Make processing of RDF deterministic from David Booth on 2018-07-13 (semantic-web@w3.org from July 2018)

From: David Booth <david@dbooth.org>
Date: Thu, 12 Jul 2018 20:37:52 -0400
To: semantic-web@w3.org
Message-ID: <6a9ea7b4-0548-e20b-08c5-b7a90e905992@dbooth.org>
On 07/12/2018 08:01 PM, Sebastian Samaruga wrote:
> Hi, sorry for my ignorance but, could an XML DOM / XSL / XSLT approach 
> help with the declarative notion of templates working over a 
> canonicalized / normalized representation of a graph, maybe with help of 
> context resources / nodes?

If you are able to create a canonicalized representation of each graph 
then the main problem is solved, and I don't think it would matter much 
which serialization you use, as long as the serialization 
deterministically mirrors the canonicalization.
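
As a rough illustration of that point, here is a minimal sketch for the 
easy case only: graphs with no blank nodes.  It assumes each term is 
already a string in N-Triples syntax; the helper names 
canonical_ntriples and text_diff are made up for this example and are 
not part of rdflib or any standard.  Blank nodes are rejected rather 
than canonicalized, since that is the hard part.

```python
import difflib


def canonical_ntriples(triples):
    """Return a deterministic N-Triples-style text for ground triples.

    Assumption: every term is a string already written in N-Triples
    syntax, e.g. "<http://example.org/s>" or '"42"'.
    Blank nodes are rejected: giving them stable labels is the hard
    part of canonicalization and is not attempted in this sketch.
    """
    lines = set()  # a graph is a set, so duplicate triples collapse
    for s, p, o in triples:
        if s.startswith("_:") or o.startswith("_:"):
            raise ValueError("blank nodes are not handled by this sketch")
        lines.add(f"{s} {p} {o} .")
    # Sorting by code point fixes the order regardless of input order.
    return "".join(line + "\n" for line in sorted(lines))


def text_diff(old_text, new_text):
    """Plain text diff of two canonical files; empty list means no change."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(), new_text.splitlines(), lineterm=""
        )
    )
```

Two inputs that hold the same ground triples in a different order give 
byte-identical text, so an empty text_diff result means nothing changed.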

David Booth

> 
> Regards, Sebastian.
> 
> http://exampledotorg.blogspot.com
> 
> 
> On Wed, Jul 11, 2018, 4:15 PM David Booth <david@dbooth.org> wrote:
> 
>     On 07/11/2018 12:57 PM, Victor Porton wrote:
>      > I am writing a program which takes decisions based on several RDF
>      > files which it may download.
>      >
>      > How to make my program deterministic? (no change in the RDF files
>      > => no change in program decisions)
>      >
>      > So I want to retrieve triples in a fixed ("deterministic") order, if
>      > this is possible.
>      >
>      > I use Python with rdflib.
> 
>     Short answer: canonicalize your RDF files when you receive them, by
>     parsing and re-serializing using a suitable tool.  Then compare the
>     newly received canonical file with the previous canonical file, using
>     standard text-based diff comparison, to find out if anything changed.
> 
>     Longer explanation: This is a weakness in standard RDF, and the problem
>     originates in the semantics of blank nodes.  Instead of being able to
>     easily compare two RDF graphs for equality, as you can do in most data
>     representations, in RDF you have to check for graph isomorphism, which
>     according to Wikipedia "is not known to be solvable in polynomial time
>     nor to be NP-complete".  (I don't know if rdflib offers a graph
>     isomorphism function, but if so then you could use that.)
> 
>     This graph isomorphism problem is why no RDF canonicalization algorithm
>     has been adopted as a W3C standard to date.  However, most RDF graphs in
>     practice do not cause the canonicalization algorithms to blow up.  And
>     if blank node usage is modestly restricted to avoid blank node cycles,
>     then the canonicalization algorithms are guaranteed to be easy and fast.
>     This is a direction that I advocate and described in "Well Behaved
>     RDF: A Straw-Man Proposal for Taming Blank Nodes":
>     http://dbooth.org/2013/well-behaved-rdf/Booth-well-behaved-rdf.pdf
> 
>     One bit of good news is that there has been significant progress in
>     JSON-LD toward adopting a canonicalization standard, in part because it
>     is also needed for digital signatures.  A draft spec is here (though at
>     the moment it is called "normalization" instead of "canonicalization"):
>     https://json-ld.github.io/normalization/spec/index.html
>     Unfortunately that document is out of scope for the current JSON-LD
>     working group, so there is still no clear timeline for it to become a
>     W3C standard:
>     https://www.w3.org/2018/03/jsonld-wg-charter.html
> 
>     I hope that helps.
> 
>     David Booth
> 
> 
>
Received on Friday, 13 July 2018 00:38:16 UTC