Re: RDF graph serialization as bytes: A solved problem? from Tobias Kuhn on 2020-09-01 (public-lod@w3.org from September 2020)

From: Tobias Kuhn <kuhntobias@gmail.com>
Date: Tue, 1 Sep 2020 09:13:22 +0200
To: semantic-web@w3.org, Linking Open Data <public-lod@w3.org>
Message-ID: <721821b9-4cbc-c840-e621-e19b218bbd95@gmail.com>

Dear all,

On this topic, you might be interested in this related work of ours from 
a few years ago where we proposed Trusty URIs: 
https://arxiv.org/abs/1401.5775

It also includes a graph canonicalization/serialization algorithm. It 
doesn't cover blank nodes, but includes a canonical method to skolemize 
them *before* the data is published (in which case there is no hard 
problem of graph isomorphism).
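
As a rough illustration of skolemizing before publication (a minimal sketch only, not the method from the paper), the following replaces blank node labels in N-Triples text with URIs under a base. The base URI, the "#bnode" suffix scheme and the function name are assumptions made up for this example.

```python
import re

# Hypothetical base URI for the published resource (an assumption for this sketch).
BASE = "http://example.org/np1"

# Match IRIs and string literals first so they are skipped untouched;
# only a bare blank node label (_:name) fills capture group 1.
TOKEN = re.compile(r'<[^>]*>|"(?:[^"\\]|\\.)*"|_:([A-Za-z0-9_]+)')

def skolemize(ntriples: str, base: str = BASE) -> str:
    """Replace blank node labels with URIs, numbered by first appearance."""
    mapping = {}

    def repl(match):
        label = match.group(1)
        if label is None:
            # An IRI or a literal: leave it exactly as it was.
            return match.group(0)
        if label not in mapping:
            mapping[label] = f"<{base}#bnode{len(mapping) + 1}>"
        return mapping[label]

    return TOKEN.sub(repl, ntriples)

if __name__ == "__main__":
    src = '_:a <http://ex.org/p> _:b .\n_:b <http://ex.org/q> "_:a" .\n'
    print(skolemize(src), end="")
```

Once the publisher has done this, consumers only ever see URIs, so comparing or hashing two copies of the data no longer involves matching up blank nodes.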

This approach is designed for cases where you want to create a hash for 
RDF content and then include that hash in the URI that stands for the 
content, in a way that the URI can also occur in the data. In more 
recent work, we also used this for digital signatures. It is not aligned 
with the other initiatives mentioned earlier because, to the best of my 
knowledge, they happened after we started with our work.
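
The idea of a hash that is part of a URI which itself occurs in the hashed data can be sketched roughly as follows. This is a simplified illustration and not the Trusty URI algorithm itself: the "{HASH}" placeholder, the sort-the-lines canonical form, the choice of SHA-256 with URL-safe Base64, and all function names are assumptions for this example, and the input is assumed to be blank-node-free N-Triples lines.

```python
import base64
import hashlib

# Hypothetical marker standing in for the not-yet-known hash (an assumption).
PLACEHOLDER = "{HASH}"

def digest(lines):
    """Hash a simple canonical form: unique lines, sorted, newline-terminated."""
    canonical = "\n".join(sorted(set(lines))) + "\n"
    raw = hashlib.sha256(canonical.encode("utf-8")).digest()
    # URL-safe Base64 without padding, so the result can sit inside a URI.
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")

def publish(draft_lines):
    """Compute the hash with the placeholder in place, then fill it in."""
    h = digest(draft_lines)
    return h, [line.replace(PLACEHOLDER, h) for line in draft_lines]

def verify(h, lines):
    """Put the placeholder back where the hash occurs and recompute."""
    return digest([line.replace(h, PLACEHOLDER) for line in lines]) == h

if __name__ == "__main__":
    draft = [
        '<http://example.org/np.{HASH}> <http://purl.org/dc/terms/creator> "Alice" .',
        '<http://example.org/np.{HASH}#claim> <http://ex.org/p> <http://ex.org/o> .',
    ]
    h, final = publish(draft)
    print(h)
    print(verify(h, final))
```

Because the hash is computed over the content with the placeholder in place of the hash, the final URI can appear inside the data it identifies without creating a circular dependency, and a verifier needs only the URI and the data.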

Regards,
Tobias


On 01.09.20 00:14, David Booth wrote:
> On 8/31/20 4:13 PM, Harry Halpin wrote:
>> I am reading the W3C Verifiable Credentials Data Model, and I'm noticing 
>> there's not a W3C Verifiable Credentials Syntax 
>> (https://www.w3.org/TR/vc-data-model/#syntaxes). Instead, there is 
>> JSON and JWT, JSON-LD, perhaps with LD Proofs. The obvious problem is 
>> that you cannot specify a cryptographic signature scheme unless you 
>> have a concrete bytestring you are signing (you usually have to hash 
>> the message to sign). So, it's quite unclear what it means to "sign" a 
>> graph unless you have a single version of the graph as *bytes*.
> 
> The lack of a standard graph (or dataset) canonicalization for RDF is 
> recorded as issue #26, and remains an unsolved problem:
> https://github.com/w3c/EasierRDF/issues/26
> 
>> There's a Community Specification called "RDF Dataset Normalization":
>>
>> http://json-ld.github.io/normalization/spec/
> 
> AFAIK that is the closest we have come toward reaching a standard for 
> this, and I'm grateful that the JSON-LD group got as far as they did 
> with it.  However, it does have one very significant gap that I believe 
> is important to address: it is focused only on the digital signatures 
> use case.  The algorithm needs improvement to better address the diff 
> use case, in which small, localized graph changes should result in 
> small, localized differences in the canonicalized graph.  Aidan Hogan 
> has done a lot of work on blank nodes and canonicalization that could 
> probably help.  Here is one of his papers:
> http://aidanhogan.com/docs/rdf-canonicalisation.pdf
> 
> David Booth
> 
>>
>> However, it does not actually specify a syntax, just a graph 
>> normalization algorithm (and it is unclear whether it actually works; 
>> usually you need proofs for these sorts of things).
>>
>> Second, there is Linked Data Proofs, which also does not actually seem 
>> to feature a way to convert arbitrary linked data graphs to bytes and 
>> is also not normative.
>>
>> https://w3c-ccg.github.io/ld-proofs/
>>
>> Perhaps this is just a solved problem, but given that the usage of 
>> signatures in Verifiable Credentials requires getting this right (see 
>> the various attacks on XML DSIG), I'd like to know if 1) there is a 
>> normative normalization to bytes of RDF graphs and 2) if it has some 
>> proofs or real interoperability, not just a JS library.
>>
>>     thanks,
>>         harry
>>
>>
>>
>>
>

Received on Tuesday, 1 September 2020 07:13:37 UTC