- From: pukkamustard <pukkamustard@posteo.net>
- Date: Tue, 01 Sep 2020 06:53:16 +0200
- To: Harry Halpin <hhalpin@ibiblio.org>
- Cc: Linking Open Data <public-lod@w3.org>, semantic-web@w3.org, David Booth <david@dbooth.org>
Hi Harry,

I have recently been working on making RDF content-addressable. This includes
finding a canonical serialization of RDF [1]. The serialization I proposed is
based on Canonical S-Expressions [2], which are very simple to implement.
However, it is restricted to a subset of RDF without blank nodes.

It would be great to use the Skolemization proposed by Aidan Hogan [3] (also
linked in the issue [4] posted by David) to make the scheme above work for
general RDF. (Rough sketches of both ideas are appended at the end of this
message.)

There is also HDT [5], a binary serialization of RDF with some neat tricks. I
am looking into whether HDT could be used as a canonical serialization.

-pukkamustard

[1] https://openengiadina.net/papers/content-addressable-rdf.html
[2] https://people.csail.mit.edu/rivest/Sexp.txt
[3] http://aidanhogan.com/docs/skolems_blank_nodes_www.pdf
[4] https://github.com/w3c/EasierRDF/issues/26
[5] http://www.rdfhdt.org/

David Booth <david@dbooth.org> writes:

> On 8/31/20 4:13 PM, Harry Halpin wrote:
>> I am reading the W3C Verifiable Credentials Data Model, and I'm noticing
>> there is not a W3C Verifiable Credentials syntax
>> (https://www.w3.org/TR/vc-data-model/#syntaxes). Instead, there is JSON
>> and JWT, JSON-LD, perhaps with LD Proofs. The obvious problem is that you
>> cannot specify a cryptographic signature scheme unless you have a concrete
>> bytestring you are signing (you usually have to hash the message to sign).
>> So it's quite unclear what it means to "sign" a graph unless you have a
>> single version of the graph as *bytes*.
>
> The lack of a standard graph (or dataset) canonicalization for RDF is
> recorded as issue #26, and remains an unsolved problem:
> https://github.com/w3c/EasierRDF/issues/26
>
>> There's a Community Specification called "RDF Dataset Normalization":
>> http://json-ld.github.io/normalization/spec/
>
> AFAIK that is the closest we have come toward reaching a standard for
> this, and I'm grateful that the JSON-LD group got as far as they did with
> it. However, it does have one very significant gap that I believe is
> important to address: it is focused only on the digital signatures use
> case. The algorithm needs improvement to better address the diff use case,
> in which small, localized graph changes should result in small, localized
> differences in the canonicalized graph. Aidan Hogan has done a lot of work
> on blank nodes and canonicalization that could probably help. Here is one
> of his papers: http://aidanhogan.com/docs/rdf-canonicalisation.pdf
>
> David Booth
>
>> However, it does not actually specify a syntax, just a graph
>> normalization algorithm (and it is unclear whether it actually works;
>> usually you need proofs for these sorts of things).
>>
>> Second, there is Linked Data Proofs, which also does not seem to feature
>> a way to convert arbitrary linked data graphs to bytes, and is also not
>> normative. https://w3c-ccg.github.io/ld-proofs/
>>
>> Perhaps this is just a solved problem, but given that the usage of
>> signatures in Verifiable Credentials requires getting this right (see the
>> various attacks on XML DSIG), I'd like to know 1) whether there is a
>> normative normalization of RDF graphs to bytes, and 2) whether it has
>> some proofs or real interoperability behind it, not just a JS library.
>>
>> thanks,
>> harry
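Appended sketch 1: the canonical S-expression idea, as a rough Python sketch
under simplifying assumptions of my own (this is not the exact encoding used
in [1], and the helper names are made up for illustration). Every term is
treated as an opaque UTF-8 string, each triple becomes a list of three
length-prefixed atoms, and the encoded triples are sorted bytewise so the
result does not depend on input order. This only works because blank nodes
are excluded; the hash of the canonical bytes can then serve as the content
address.

    import hashlib

    def csexp_atom(data: bytes) -> bytes:
        # Canonical S-expression atom: <decimal length> ":" <raw bytes>
        return str(len(data)).encode("ascii") + b":" + data

    def csexp_list(items) -> bytes:
        # Canonical S-expression list: "(" <items> ")"
        return b"(" + b"".join(items) + b")"

    def encode_term(term: str) -> bytes:
        # Simplification: every term is an opaque UTF-8 string.  A real
        # encoder would distinguish IRIs, literals, datatypes and language
        # tags.
        return csexp_atom(term.encode("utf-8"))

    def canonical_graph(triples) -> bytes:
        # Encode each triple as a three-element list ...
        encoded = [csexp_list(encode_term(t) for t in triple)
                   for triple in triples]
        # ... then sort the encodings bytewise, so the result does not
        # depend on the order in which the triples were supplied.  This is
        # only safe because there are no blank nodes to rename.
        return csexp_list(sorted(encoded))

    triples = [
        ("http://example.org/alice", "http://xmlns.com/foaf/0.1/knows",
         "http://example.org/bob"),
        ("http://example.org/alice", "http://xmlns.com/foaf/0.1/name",
         "Alice"),
    ]

    canonical = canonical_graph(triples)
    print(canonical)
    # The digest of the canonical bytes is the content address.
    print(hashlib.sha256(canonical).hexdigest())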
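Appended sketch 2: a toy illustration of hash-based Skolemization of blank
nodes. This is only the rough idea behind canonical labelling, not the
algorithm in [3]: it does a single round of hashing each blank node over the
triples it occurs in, whereas the paper iterates until the labels stabilise
and also resolves the case where structurally distinct blank nodes still end
up with the same hash. The ".well-known/genid/" base follows the Skolem-IRI
convention from RDF 1.1 Concepts; the example base IRI itself is made up.

    import hashlib

    def is_blank(term: str) -> bool:
        return term.startswith("_:")

    def skolemize(triples, base="https://example.org/.well-known/genid/"):
        # Single round of hash-based labelling: each blank node is hashed
        # over the sorted set of triples it occurs in, with itself and other
        # blank nodes replaced by fixed markers so the hash does not depend
        # on the input labels.  Collisions between distinct blank nodes are
        # not handled here.
        blanks = {t for triple in triples for t in triple if is_blank(t)}
        labels = {}
        for b in blanks:
            keys = sorted(
                tuple("_:self" if t == b
                      else ("_:other" if is_blank(t) else t)
                      for t in (s, p, o))
                for s, p, o in triples
                if b in (s, p, o)
            )
            digest = hashlib.sha256(repr(keys).encode("utf-8")).hexdigest()
            labels[b] = base + digest
        # Replace every blank node with its freshly minted Skolem IRI.
        return [tuple(labels.get(t, t) for t in triple)
                for triple in triples]

    triples = [
        ("_:b1", "http://xmlns.com/foaf/0.1/knows", "_:b2"),
        ("_:b2", "http://xmlns.com/foaf/0.1/name", "Bob"),
    ]
    for triple in skolemize(triples):
        print(triple)

After a step like this, the graph has no blank nodes left and the encoding
from sketch 1 can be applied to it.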
Received on Tuesday, 1 September 2020 04:53:39 UTC