Signing and Verifying RDF Datasets for Dummies (like Me!)

From: Peter Patel-Schneider <pfpschneider@gmail.com>
Date: Mon, 07 Jun 2021 15:37:44 -0400
Message-ID: <4bf3cc612a74cf17dc19328933a0ee9b3348f9e2.camel@gmail.com>
To: semantic-web@w3.org
Here's my version of "Signing and Verifying RDF Datasets for Dummies".


If you want to sign and verify documents (sequences of Unicode code
points), encode the document in UTF-8 and sign and verify a hash of the
octet sequence.  Transmit the octet sequence along with the signed
hash.
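
A minimal sketch of the document case in Python, using only the
standard library. It is an illustration under stated assumptions, not a
recommended scheme: SHA-256 is an arbitrary choice of hash, and HMAC
with a shared key stands in for a real public-key signature (the
standard library has none). The function names are hypothetical.

```python
import hashlib
import hmac
from typing import Tuple


def sign_document(text: str, key: bytes) -> Tuple[bytes, bytes]:
    """Encode the document in UTF-8 and "sign" a hash of the octets.

    Assumption: HMAC-SHA256 with a shared key stands in for a signature.
    Returns the octet sequence to transmit and the tag that goes with it.
    """
    octets = text.encode("utf-8")
    digest = hashlib.sha256(octets).digest()
    tag = hmac.new(key, digest, hashlib.sha256).digest()
    return octets, tag


def verify_document(octets: bytes, tag: bytes, key: bytes) -> bool:
    """Recompute the hash of the received octets and check the tag."""
    digest = hashlib.sha256(octets).digest()
    expected = hmac.new(key, digest, hashlib.sha256).digest()
    # compare_digest avoids leaking information through timing.
    return hmac.compare_digest(expected, tag)
```

The receiver verifies the octets exactly as received, before decoding
them, so any change to even one octet makes verification fail.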

If you want to sign and verify RDF datasets, serialize the dataset in
N-Quads and sign and verify that document.  When a receiver
deserializes the document the result will be isomorphic to the dataset
that the sender had.   Don't use a syntax that allows relative IRIs
(e.g., Turtle) as relative IRIs may turn into different absolute IRIs
when the document is deserialized.  Don't use a syntax that allows
remote resources to affect deserialization (e.g., JSON-LD) as these
remote resources can be modified by an attacker.  Don't use a syntax
where parts of the document that don't serialize parts of the dataset
look as if they might be important (e.g., RDFa) as receivers might come
to depend on these non-coding parts.  Don't use a syntax where it is
not obvious which parts of the document serialize parts of the dataset
(e.g., JSON-LD) as receivers might be confused as to just what dataset
is being transmitted.  Don't use a syntax where the mapping from the
serialization to the dataset is poorly defined in practice (e.g., JSON-
LD).
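
To make the relative-IRI point concrete, here is a small Python sketch.
It assumes that urllib.parse.urljoin is a close enough approximation of
the reference resolution an RDF parser performs, and the base IRIs and
the helper name are made up for the example.

```python
from urllib.parse import urljoin


def resolve(relative_iri: str, base_iri: str) -> str:
    """Resolve a relative IRI against the base IRI of a document."""
    return urljoin(base_iri, relative_iri)


# The same octets, e.g. a Turtle document that mentions <alice>,
# deserialized from two different locations (hypothetical IRIs).
at_sender = resolve("alice", "http://example.org/data/doc.ttl")
at_receiver = resolve("alice", "http://mirror.example.net/copy/doc.ttl")

# The signed octets are identical, but the resulting RDF terms differ.
print(at_sender)
print(at_receiver)
```

N-Quads has only absolute IRIs, so the base IRI in effect at the
receiver cannot change which dataset the octets denote.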

If you want to sign and verify RDF datasets and you want isomorphic RDF
datasets to have the same signature, you first need to define a
canonical serialization for RDF datasets so that isomorphic RDF
datasets have the same canonical form.  To sign and verify, create the
canonical serialization for the RDF dataset and sign and verify that. 
Use N-Quads for this canonical form for the reasons above.  Don't
transmit any encoding other than the N-Quads canonical form for the
reasons above, and more.  If you don't want to depend on a complex
algorithm to produce the canonical form then forbid blank nodes.  
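
A sketch of what the no-blank-nodes case could look like in Python.
The assumptions here are mine: each quad arrives as a tuple of terms
that are already written in one agreed N-Quads spelling (so no
variation in escapes or whitespace), the canonical form is simply the
duplicate-free statements sorted by code point, and SHA-256 is the
hash. The function names are hypothetical.

```python
import hashlib


def canonical_nquads(quads):
    """Build a canonical N-Quads document for a dataset without blank nodes.

    Each quad is a tuple of 3 or 4 already-serialized terms, e.g.
    ("<http://example.org/s>", "<http://example.org/p>", '"o"').
    """
    lines = set()  # a dataset is a set, so duplicates collapse
    for quad in quads:
        if any(term.startswith("_:") for term in quad):
            raise ValueError("blank nodes are forbidden in this sketch")
        lines.add(" ".join(quad) + " .\n")
    # Sorting makes the output independent of the input order.
    return "".join(sorted(lines))


def dataset_hash(quads):
    """Hash the UTF-8 encoding of the canonical form; sign this value."""
    return hashlib.sha256(canonical_nquads(quads).encode("utf-8")).hexdigest()
```

Without blank nodes, two datasets are isomorphic exactly when they
contain the same quads, so sorting is enough to give them the same
canonical form. With blank nodes, the labels would have to be chosen
canonically as well, which is where the complex algorithm comes in.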

This pretty much boils down to using, and only transmitting, the
simplest and most transparent document format possible, because
anything else just adds extra problems.  N-Quads is the simplest and
most transparent document format for RDF datasets.


My takeaway from this is that any W3C WG that is trying to standardize
something that involves signing and verifying RDF datasets should only
use N-Quads to transmit these datasets.


peter
Received on Monday, 7 June 2021 19:39:16 UTC
