- From: Peter Patel-Schneider <pfpschneider@gmail.com>
- Date: Mon, 07 Jun 2021 15:37:44 -0400
- To: semantic-web@w3.org
Here's my version of "Signing and Verifying RDF Datasets for Dummies".

If you want to sign and verify documents (sequences of Unicode code points), encode the document in UTF-8 and sign and verify a hash of the resulting octet sequence. Transmit the octet sequence along with the signed hash.

If you want to sign and verify RDF datasets, serialize the dataset in N-Quads and sign and verify that document. When a receiver deserializes the document, the result will be isomorphic to the dataset that the sender had.

Don't use a syntax that allows relative IRIs (e.g., Turtle), as relative IRIs may turn into different absolute IRIs when the document is deserialized. Don't use a syntax that allows remote resources to affect deserialization (e.g., JSON-LD), as these remote resources can be modified by an attacker. Don't use a syntax where parts of the document that don't serialize parts of the dataset look as if they might be important (e.g., RDFa), as receivers might come to depend on these non-coding parts. Don't use a syntax where it is not obvious which parts of the document serialize parts of the dataset (e.g., JSON-LD), as receivers might be confused as to just what dataset is being transmitted. Don't use a syntax where the mapping from the serialization to the dataset is poorly defined in practice (e.g., JSON-LD).

If you want to sign and verify RDF datasets and you want isomorphic RDF datasets to have the same signature, you first need to define a canonical serialization for RDF datasets so that isomorphic RDF datasets have the same canonical form. To sign and verify, create the canonical serialization of the RDF dataset and sign and verify that. Use N-Quads for this canonical form, for the reasons above. Don't transmit any encoding other than the N-Quads canonical form, for the reasons above and more. If you don't want to depend on a complex algorithm to produce the canonical form, then forbid blank nodes.
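The blank-node-free recipe (serialize to a canonical N-Quads document, encode it in UTF-8, sign a hash of the octets, transmit the octets with the signed hash) can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not a standard canonicalization algorithm: the tuple-based term encoding and the function names are hypothetical, the literal escaping rule is an assumption, IRIs are assumed to be already absolute, and HMAC-SHA256 from the standard library stands in for a real public-key signature scheme.

```python
import hashlib
import hmac

# Hypothetical term encoding used only for this sketch (not any RDF library's API):
#   ("iri", "http://example.org/s")
#   ("literal", lexical_form, datatype_iri, language_tag_or_None)
#       (datatype_iri is RDF_LANG_STRING when a language tag is present)
#   ("bnode", label)   -> rejected, since blank nodes are forbidden here
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
RDF_LANG_STRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"


def escape_literal(text: str) -> str:
    # Assumed escaping rule: escape only backslash, double quote, LF and CR,
    # so every lexical form has exactly one spelling in the output.
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def term_to_nquads(term) -> str:
    kind = term[0]
    if kind == "iri":
        # Assumes the IRI is already absolute and needs no escaping.
        return "<" + term[1] + ">"
    if kind == "literal":
        _, lexical, datatype, lang = term
        quoted = '"' + escape_literal(lexical) + '"'
        if lang is not None:
            # Assumed normalization: BCP 47 tags are case-insensitive, so lowercase.
            return quoted + "@" + lang.lower()
        if datatype == XSD_STRING:
            return quoted  # plain strings are written without ^^xsd:string
        return quoted + "^^<" + datatype + ">"
    if kind == "bnode":
        raise ValueError("blank nodes are forbidden in this sketch")
    raise ValueError("unknown term kind: " + repr(kind))


def quad_to_line(quad) -> str:
    subject, predicate, obj, graph = quad
    parts = [term_to_nquads(subject), term_to_nquads(predicate), term_to_nquads(obj)]
    if graph is not None:  # None means the default graph
        parts.append(term_to_nquads(graph))
    return " ".join(parts) + " ."


def canonical_nquads(quads) -> str:
    # Without blank nodes, isomorphic datasets are equal sets of quads, so
    # dropping duplicates and sorting lines in code point order gives one
    # canonical document per dataset.
    lines = sorted({quad_to_line(q) for q in quads})
    return "".join(line + "\n" for line in lines)


def sign_octets(octets: bytes, key: bytes) -> bytes:
    # Hash the UTF-8 octets, then "sign" the hash. HMAC-SHA256 is a stand-in:
    # it is a shared-key MAC, not a public-key signature such as Ed25519.
    digest = hashlib.sha256(octets).digest()
    return hmac.new(key, digest, hashlib.sha256).digest()


def verify_octets(octets: bytes, tag: bytes, key: bytes) -> bool:
    # Recompute over the received octets and compare in constant time.
    return hmac.compare_digest(sign_octets(octets, key), tag)


if __name__ == "__main__":
    ex = "http://example.org/"
    dataset = [
        (("iri", ex + "alice"), ("iri", ex + "knows"), ("iri", ex + "bob"), None),
        (("iri", ex + "alice"), ("iri", ex + "name"),
         ("literal", "Alice", XSD_STRING, None), ("iri", ex + "g1")),
    ]
    key = b"hypothetical shared key"
    octets = canonical_nquads(dataset).encode("utf-8")
    tag = sign_octets(octets, key)  # transmit octets together with tag

    # The same quads listed in another order produce the same octets.
    reordered = canonical_nquads(list(reversed(dataset))).encode("utf-8")
    print(verify_octets(reordered, tag, key))  # True
    print(verify_octets(octets.replace(b"bob", b"eve"), tag, key))  # False
```

Swapping in a public-key signature scheme would change only `sign_octets` and `verify_octets`; the canonicalization step stays the same, and the receiver still verifies the exact octets it received before parsing them as N-Quads.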
This pretty much boils down to two points: use, and transmit, only the simplest and most transparent document format possible, because anything else just adds extra problems; and N-Quads is the simplest and most transparent document format for RDF datasets. My takeaway from this is that any W3C WG that is trying to standardize something that involves signing and verifying RDF datasets should use only N-Quads to transmit these datasets.

peter
Received on Monday, 7 June 2021 19:39:16 UTC