
Re: 4. Lack of standard RDF canonicalization

From: Dan Brickley <danbri@danbri.org>
Date: Sat, 24 Nov 2018 10:36:12 -0800
Message-ID: <CAFfrAFrkehKYw42ihfttrq6eHV5ADG+x=wuyAsrh-QSppjdt8g@mail.gmail.com>
To: Tim Berners-Lee <timbl@w3.org>
Cc: ivan@w3.org, David Booth <david@dbooth.org>, SW-forum Web <semantic-web@w3.org>, Dan Brickley <danbri@google.com>, "Sean B. Palmer" <sean@miscoranda.com>, Olaf Hartig <olaf.hartig@liu.se>, "Prof. Axel Polleres" <axel@polleres.net>
On Sat, 24 Nov 2018, 05:16 Tim Berners-Lee <timbl@w3.org> wrote:

> On 2018-11-22, at 17:08, Ivan Herman <ivan@w3.org> wrote:
> Hi David,
> 4. Lack of standard RDF canonicalization.  Canonicalization
> is the ability to represent RDF in a consistent, predictable
> serialization.  It is essential for diff and digital signatures.
> Developers expect to be able to diff two files, and source
> control systems rely on being able to do so.  It is easy with
> most other data representations.  Why not RDF?  Answer: Blank
> nodes.  Unrestricted blank nodes cause RDF canonicalization
> to be a "hard problem", equivalent in complexity to the graph
> isomorphism problem.[6]
> Some recent good progress on canonicalization: JSON-LD
> https://json-ld.github.io/normalization/spec/ .  However, the
> current JSON-LD canonicalization draft (called "normalization")
> is focused only on the digital signatures use case, and
> needs improvement to better address the diff use case, in
> which small, localized graph changes should result in small,
> localized differences in the canonicalized graph.
> There have been some discussions around this lately. If you are interested,
> look at:
> https://github.com/w3c/strategy/issues/116
> In particular (specific comments as well as links from those comments):
> https://github.com/w3c/strategy/issues/116#issuecomment-383875628
> https://github.com/w3c/strategy/issues/116#issuecomment-384160630
> https://github.com/w3c/strategy/issues/116#issuecomment-395791130
> https://github.com/w3c/strategy/issues/116#issuecomment-435920927
> http://aidanhogan.com/docs/skolems_blank_nodes_www.pdf
> http://aidanhogan.com/docs/rdf-canonicalisation.pdf
> http://json-ld.github.io/normalization/spec/index.html
> https://github.com/iherman/canonical_rdf
> https://lists.w3.org/Archives/Public/www-archive/2018Oct/0011.html
> It is still not clear how exactly we will move forward, but I have some
> hopes that this will happen sometime in 2019. It depends on the
> availability of the people involved; the path to get this done is now
> relatively clear.
> All that being said: David's point is well taken on blank nodes. If there
> were no blank nodes around, it would be obvious. Looking at the details of
> the two available solutions (see points above) it is also true that there
> may be a middle ground: if the usage of blank nodes were somehow restricted
> to avoid circular patterns. I *think* (but I am not 100% sure) that if all
> blank nodes could be expressed by [] in Turtle without any need for
> explicit bnode identifiers then both algorithms referred to above would
> become way simpler.
> I think we should just do RDF canonicalization including blank nodes.
> It is not rocket science.
> I have a little Python program which does it; I used it a lot for comparing
> test results.
> An algorithm which works on real data is fine; it does not need to handle
> an n-dimensional hypercube of bnodes with no other nodes. It generates diffs.
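
As an illustration of the restricted case raised in the quoted text (every
blank node expressible as [] in Turtle, so no shared and no circular bnodes),
here is a minimal Python sketch. It is not Tim's program and not either of
the algorithms linked above. The function name canonical_lines, the
representation of triples as tuples of strings, and the ex:/foaf: names are
all assumptions made for the example. Because each bnode is written inline
and its properties are sorted, no bnode labels appear in the output, and
two graphs that differ only in bnode labels or triple order produce the same
sorted lines.

```python
from collections import Counter, defaultdict


def is_bnode(term):
    # Assumption for this sketch: blank nodes are strings starting with "_:".
    return term.startswith("_:")


def canonical_lines(triples):
    """Sorted, label-free lines for a graph whose bnodes fit Turtle's []."""
    triples = set(triples)  # an RDF graph is a set of triples
    outgoing = defaultdict(list)
    times_as_object = Counter()
    bnodes = set()
    for s, p, o in triples:
        outgoing[s].append((p, o))
        bnodes.update(t for t in (s, o) if is_bnode(t))
        if is_bnode(o):
            times_as_object[o] += 1
    if any(n > 1 for n in times_as_object.values()):
        raise ValueError("shared blank node: not expressible with [] alone")

    visited = set()

    def render(term):
        if not is_bnode(term):
            return term
        visited.add(term)
        # Sorting the nested property strings removes any dependence on
        # input order or on the original bnode labels.
        inner = sorted(f"{p} {render(o)}" for p, o in outgoing.get(term, []))
        return "[ " + " ; ".join(inner) + " ]" if inner else "[]"

    lines = [f"{s} {p} {render(o)} ." for s, p, o in triples if not is_bnode(s)]
    # Bnodes that are never an object become top-level [ ... ] statements.
    roots = [b for b in bnodes if times_as_object[b] == 0]
    lines += [render(b) + " ." for b in roots]
    if visited != bnodes:
        # Bnodes that were never reached sit on a cycle of bnodes.
        raise ValueError("blank nodes form a cycle: not expressible with [] alone")
    return sorted(lines)


g1 = [
    ("ex:alice", "foaf:knows", "_:a"),
    ("_:a", "foaf:name", '"Bob"'),
    ("_:a", "foaf:knows", "_:b"),
    ("_:b", "foaf:name", '"Carol"'),
]
# Same graph with different bnode labels and a different triple order.
g2 = [
    ("_:y", "foaf:name", '"Carol"'),
    ("_:x", "foaf:knows", "_:y"),
    ("ex:alice", "foaf:knows", "_:x"),
    ("_:x", "foaf:name", '"Bob"'),
]
print(canonical_lines(g1) == canonical_lines(g2))  # True
```

The sketch rejects any graph outside the restricted class instead of
guessing, so it says nothing about how hard the unrestricted problem is.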

Let's not mix up use cases. For diffing, it seems reasonable to appeal to
a commonsense idea of "typical" graph structure. But as soon as we get into
security and crypto territory it is probably a bit reckless to say "most
real graphs don't have ..." w.r.t. bnode stats/shapes. Atypical corner cases
show up all the time in security holes...
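
To make the corner-case concern concrete, here is a hedged Python sketch.
It labels bnodes by colour refinement, that is, by repeatedly hashing each
bnode together with the current hashes of its neighbours. A scheme of this
kind can separate bnodes that have distinguishing ground neighbours, but it
gives identical results for a ring of six bnodes and for two rings of three
bnodes, which are not isomorphic. The function refine, the predicate ex:next
and both graphs are made up for the illustration; this is not the algorithm
of the JSON-LD draft or of any program mentioned in this thread.

```python
import hashlib
from collections import Counter


def refine(triples, rounds=3):
    """Return the multiset of bnode hashes after a few refinement rounds."""
    bnodes = {t for s, p, o in triples for t in (s, o) if t.startswith("_:")}
    colour = {b: "init" for b in bnodes}
    for _ in range(rounds):
        new = {}
        for b in bnodes:
            parts = []
            for s, p, o in triples:
                # Neighbouring bnodes contribute their current colour,
                # ground terms contribute themselves.
                if s == b:
                    parts.append("out|" + p + "|" + colour.get(o, o))
                if o == b:
                    parts.append("in|" + p + "|" + colour.get(s, s))
            joined = "\n".join(sorted(parts))
            new[b] = hashlib.sha256(joined.encode()).hexdigest()
        colour = new
    return Counter(colour.values())


P = "ex:next"  # hypothetical predicate
hexagon = [(f"_:h{i}", P, f"_:h{(i + 1) % 6}") for i in range(6)]
triangles = (
    [(f"_:t{i}", P, f"_:t{(i + 1) % 3}") for i in range(3)]
    + [(f"_:u{i}", P, f"_:u{(i + 1) % 3}") for i in range(3)]
)

# Different graphs, identical hash multisets: a digest built only from
# these hashes could not tell the two graphs apart.
print(refine(hexagon) == refine(triangles))  # True
```

Graphs like these are unlikely to turn up in test results, but anyone who
wants to attack a signature scheme is free to construct them.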


> Or maybe we should just stick with the JSON-LD one and make sure it is in
> all the code bases.
> Tim
> Ivan
> ----
> Ivan Herman, W3C
> Publishing@W3C Technical Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> ORCID ID: https://orcid.org/0000-0003-0782-2704
Received on Saturday, 24 November 2018 18:36:48 UTC
