
Re: Make processing of RDF deterministic

From: Sebastian Samaruga <ssamarug@gmail.com>
Date: Thu, 12 Jul 2018 21:01:41 -0300
Message-ID: <CAOLUXBvF2_EsQDUJsJFiQm-3XzupAqNGEbAOcUoMWEbbZv_DkA@mail.gmail.com>
To: David Booth <david@dbooth.org>
Cc: W3C Semantic Web IG <semantic-web@w3.org>
Hi, sorry for my ignorance, but could an XML DOM / XSL / XSLT approach help
with the declarative notion of templates working over a canonicalized /
normalized representation of a graph, maybe with the help of context
resources / nodes?

Regards, Sebastian.


On Wed, Jul 11, 2018, 4:15 PM David Booth <david@dbooth.org> wrote:

> On 07/11/2018 12:57 PM, Victor Porton wrote:
> > I am writing a program which makes decisions based on several RDF files
> > which it may download.
> >
> > How can I make my program deterministic? (no change in the RDF files => no
> > change in program decisions)
> >
> > So I want to retrieve triples in a fixed ("deterministic") order, if
> > this is possible.
> >
> > I use Python with rdflib.
>
> Short answer: canonicalize your RDF files when you receive them, by
> parsing and re-serializing them using a suitable tool.  Then compare the
> newly received canonical file with the previous canonical file, using
> standard text-based diff comparison, to find out if anything changed.
>
> Longer explanation: This is a weakness in standard RDF, and the problem
> originates in the semantics of blank nodes.  Instead of being
> able to easily compare two RDF graphs for equality, as you can do in
> most data representations, in RDF you have to check for graph
> isomorphism, which according to Wikipedia "is not known to be solvable
> in polynomial time nor to be NP-complete".  (I don't know if rdflib
> offers a graph isomorphism function, but if so then you could use that.)
>
> This graph isomorphism problem is why no RDF canonicalization algorithm
> has been adopted as a W3C standard to date.  However, most RDF graphs in
> practice do not cause the canonicalization algorithms to blow up.  And
> if blank node usage is modestly restricted to avoid blank node cycles,
> then the canonicalization algorithms are guaranteed to be easy and fast.
> This is a direction that I advocate and described in "Well Behaved
> RDF: A Straw-Man Proposal for Taming Blank Nodes":
> http://dbooth.org/2013/well-behaved-rdf/Booth-well-behaved-rdf.pdf
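A rough check for the "no blank node cycles" restriction mentioned above, as a sketch only. It assumes a blank node cycle means a directed cycle made up solely of triples whose subject and object are both blank nodes, which may be narrower or broader than the definition in the straw-man proposal. Terms are plain strings with blank nodes written `_:label` (an assumption for the example); `has_blank_node_cycle` and its `blank_test` parameter are hypothetical names.

```python
from collections import defaultdict


def default_blank_test(term):
    """Assumed convention for this sketch: blank nodes look like '_:label'."""
    return isinstance(term, str) and term.startswith("_:")


def has_blank_node_cycle(triples, blank_test=default_blank_test):
    """Return True if the blank-node-to-blank-node edges contain a cycle."""
    # Keep only edges that go from one blank node to another blank node.
    edges = defaultdict(set)
    for s, _p, o in triples:
        if blank_test(s) and blank_test(o):
            edges[s].add(o)

    WHITE, GREY, BLACK = 0, 1, 2
    color = defaultdict(int)  # every node starts WHITE (unvisited)

    def visit(node):
        color[node] = GREY  # node is on the current depth-first path
        for nxt in edges[node]:
            if color[nxt] == GREY:
                return True  # back edge to the current path: a cycle
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK  # node and everything reachable from it is done
        return False

    return any(color[n] == WHITE and visit(n) for n in list(edges))


print(has_blank_node_cycle([("_:a", "ex:p", "_:b"), ("_:b", "ex:p", "_:a")]))
```

With rdflib, the graph itself can be passed as `triples` together with `blank_test=lambda t: isinstance(t, BNode)`, since rdflib's `BNode` values do not carry the `_:` prefix as part of their string value. The recursive depth-first search is fine for small graphs but could hit Python's recursion limit on very long chains of blank nodes.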
>
> One bit of good news is that there has been significant progress in
> JSON-LD toward adopting a canonicalization standard, in part because it
> is also needed for digital signatures.  A draft spec is here (though at
> the moment it is called "normalization" instead of "canonicalization"):
> https://json-ld.github.io/normalization/spec/index.html
> Unfortunately that document is out of scope for the current JSON-LD
> working group, so there is still no clear timeline for it to become a
> W3C standard:
> https://www.w3.org/2018/03/jsonld-wg-charter.html
>
> I hope that helps.
> David Booth
Received on Friday, 13 July 2018 00:06:08 UTC
