
Re: RDF graph serialization as bytes: A solved problem?

From: Ivan Herman <ivan@w3.org>
Date: Tue, 1 Sep 2020 09:15:01 +0200
Cc: Semantic Web <semantic-web@w3.org>, W3C LOD Mailing List <public-lod@w3.org>, Dave Longley <dlongley@digitalbazaar.com>, Manu Sporny <msporny@digitalbazaar.com>, Aidan Hogan <aidhog@gmail.com>, david@dbooth.org
Message-Id: <6CABE510-4CBD-4607-84FA-9574AF4260AC@w3.org>
To: Harry Halpin <hhalpin@ibiblio.org>
(+cc to the public-lod list, to join the two threads)

Hi Harry,

the problem you raise is absolutely real (as David also noted in [1]). There are some moves to set up a W3C Working Group to finally settle this issue via a proper Recommendation, but we are simply facing manpower issues (to be clear, by "we" I do not mean W3C, but rather having enough people in the community who can actively contribute to the necessary work). I have not given up hope of starting the necessary process/work in the coming months, but do not hold your breath.

Just some notes on your remarks and also where we are.

The normalization spec below [2] (cc-ing Dave Longley, one of the co-editors of that document) does not depend on any particular RDF serialization. What it does is provide a canonical renaming of all the blank nodes in an RDF graph. Once that hard problem is properly solved, the rest (e.g., defining a signature, the proof spec) becomes a set of manageable engineering issues (e.g., serialize the graph into N-Triples or N-Quads using that canonical set of names, and use some standard crypto to sign the result). The fact that VC uses some particular syntax is not relevant at this point.
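
To make the "manageable engineering" part concrete, here is a minimal sketch in Python of the step that comes after canonicalization. It assumes the blank nodes have already been given canonical labels (the labels `c14n0` and `c14n1` below are made up for illustration, and the hard relabeling step from [2] is not implemented here). The tuple representation of terms and the helper names `term_to_nt`, `canonical_bytes`, and `graph_digest` are hypothetical; the escaping covers only the most common cases and ignores datatypes and language tags.

```python
import hashlib

def term_to_nt(term):
    """Serialize one term given as (kind, value); kinds are illustrative."""
    kind, value = term
    if kind == "iri":
        return f"<{value}>"
    if kind == "bnode":
        # Assumption: the label is already canonical (output of [2]).
        return f"_:{value}"
    if kind == "literal":
        escaped = (value.replace("\\", "\\\\")
                        .replace('"', '\\"')
                        .replace("\n", "\\n")
                        .replace("\r", "\\r"))
        return f'"{escaped}"'
    raise ValueError(f"unknown term kind: {kind}")

def canonical_bytes(triples):
    """One N-Triples line per triple, deduplicated and sorted, as UTF-8."""
    lines = {
        f"{term_to_nt(s)} {term_to_nt(p)} {term_to_nt(o)} .\n"
        for s, p, o in triples
    }
    return "".join(sorted(lines)).encode("utf-8")

def graph_digest(triples):
    """SHA-256 over the canonical bytes; a signature would cover this."""
    return hashlib.sha256(canonical_bytes(triples)).hexdigest()

FOAF = "http://xmlns.com/foaf/0.1/"
graph = [
    (("bnode", "c14n0"), ("iri", FOAF + "name"), ("literal", "Alice")),
    (("bnode", "c14n0"), ("iri", FOAF + "knows"), ("bnode", "c14n1")),
]

# The order in which triples are stored does not affect the digest.
same = graph_digest(graph) == graph_digest(list(reversed(graph)))
print(same)
```

The point of the sketch is only that, with canonical blank node names in hand, the byte string to be hashed and signed is fully determined by the graph and not by the order or syntax in which it was stored.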

That normalization algorithm in [2] is very close in spirit to the one published by Aidan Hogan a few years ago (David also refers to it in [1]). Taking Aidan's approach and [2] forward towards a Recommendation requires, as you rightly mention below, a rigorous proof. Aidan's approach has had numerous reviews (it was published via traditional scholarly peer review and, at this point, I am confident in it), but we still need to have this work done for [2]. That work is ongoing: there is a manuscript giving a mathematical underpinning of [2], but there is still no independent review. This is what we need before W3C can enter into action and start a WG with the participation of Aidan (I hope :-) and the authors of [2]. We are at the point of finding people who are willing and able to do such an independent review.

As I said, my hope is that this preliminary background mathematical work can be done relatively soon, and we can then start the more tedious process of setting up a W3C Working Group (which of course needs, as you very well know, the backing of enough W3C members, but I believe that is doable). 



P.S. David, I believe that if the canonicalization work is done, the RDF diff problem can also be solved, although you are right that this is a slightly different issue.
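
As a rough illustration of why canonicalization helps with diffing, here is a minimal Python sketch. It assumes both graphs have already been canonicalized by the same algorithm and serialized as N-Triples lines; the function name `graph_diff` and the sample lines are hypothetical. It also hints at why this is a slightly different issue: a small change to a graph may change the canonical labels of its blank nodes, so a line-based diff like this one is not guaranteed to be minimal.

```python
def graph_diff(old_lines, new_lines):
    """Return (removed, added) N-Triples lines between two canonical graphs."""
    old, new = set(old_lines), set(new_lines)
    return sorted(old - new), sorted(new - old)

# Hypothetical canonical serializations of two versions of a graph.
old_graph = [
    '_:c14n0 <http://xmlns.com/foaf/0.1/name> "Alice" .',
    '_:c14n0 <http://xmlns.com/foaf/0.1/age> "30" .',
]
new_graph = [
    '_:c14n0 <http://xmlns.com/foaf/0.1/name> "Alice" .',
    '_:c14n0 <http://xmlns.com/foaf/0.1/age> "31" .',
]

removed, added = graph_diff(old_graph, new_graph)
print(removed)
print(added)
```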

[1] https://www.w3.org/mid/53d0e9f4-d00c-7830-5f60-cb215535a07b@dbooth.org
[2] http://json-ld.github.io/normalization/spec/

> On 31 Aug 2020, at 22:13, Harry Halpin <hhalpin@ibiblio.org> wrote:
> I am reading the W3C Verifiable Credentials Data Model, and I'm noticing there's not a W3C Verifiable Credentials Syntax (https://www.w3.org/TR/vc-data-model/#syntaxes). Instead, there is JSON and JWT, JSON-LD, perhaps with LD Proofs. The obvious problem is that you cannot specify a cryptographic signature scheme unless you have a concrete bytestring you are signing (you usually have to hash the message to sign). So, it's quite unclear what it means to "sign" a graph unless you have a single version of the graph as *bytes*.
> There's a Community Specification called "RDF Dataset Normalization":
> http://json-ld.github.io/normalization/spec/
> However, it does not actually specify a syntax, just a graph normalization algorithm (and it is unclear whether it actually works; usually you need proofs for these sorts of things).
> Second, there is Linked Data Proofs, which also does not actually seem to feature a way to convert arbitrary linked data graphs to bytes and is also not normative.
> https://w3c-ccg.github.io/ld-proofs/
> Perhaps this is just a solved problem, but given that the usage of signatures in Verifiable Credentials requires getting this right (see the various attacks on XML DSIG), I'd like to know 1) if there is a normative normalization to bytes of RDF graphs and 2) if it has some proofs or real interoperability, not just a JS library.
>    thanks,
>        harry

Ivan Herman, W3C 
Home: http://www.w3.org/People/Ivan/
mobile: +33 6 52 46 00 43
ORCID ID: https://orcid.org/0000-0003-0782-2704

Received on Tuesday, 1 September 2020 07:15:09 UTC
